Image quality - a blast from the past

AzBat

The GeForce FX bench thread got me thinking of a way to quantify or qualify image quality. It reminded me of the image quality tests in 3DMark that can be viewed in the Results Browser. They let you compare an image from the 3D card with one from the reference rasterizer. I know I'm not really knowledgeable on this kind of stuff, so forgive me if I seem too ignorant :) , but I was thinking you might be able to take the same premise as 3DMark and use an XOR comparison to compare 3D quality against the software reference rasterizer. I figured there might be more discussion like this on the web, so I did a search on Google, and my first hit was an archived thread on pcstats.com titled "Different Take on Image Quality" that was posted on Beyond3D back in November 2001. :) It looks like it was a long drawn-out discussion, but I figured it would be interesting to bring this topic back up again.

Anyway, back to my idea. What's wrong with it, or why can't it be done? Couldn't you get the reference rasterizer to generate one frame using all the features like FSAA, anisotropic filtering, etc.? Then have the hardware generate that same frame with those same features enabled. Then create a program to do an XOR comparison and use some kind of algorithm to quantify the difference? The image with the most difference would have a higher score, and the product with the lower score would, in a sense, have the better quality. Would it not? Again, how and why is this idea wrong?

Tommy McClain
 
XOR is a crappy way of comparing images.

129 XOR 128 = 1...
128 XOR 127 = 255!


abs(image1-image2) is much better...
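To put numbers on it, a quick sketch of the two metrics on single 8-bit channel values:

```python
# Why XOR is a poor pixel-difference metric: two intensities that look
# nearly identical can XOR to a huge value, because XOR compares bit
# patterns rather than magnitudes. abs() tracks visual difference directly.

def xor_diff(a, b):
    return a ^ b

def abs_diff(a, b):
    return abs(a - b)

print(xor_diff(129, 128), abs_diff(129, 128))  # -> 1 1
print(xor_diff(128, 127), abs_diff(128, 127))  # -> 255 1
```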
 
In addition to the problem of an XOR comparison being a really crappy way of doing things, there is also the possibility that when using AA or AF the reference rasterizer isn't the best looking way of doing it.

IQ is a subjective thing... just is, no way around it. There is no rule that says an AA method that differs from the refras can't look better.

Now, if you want to compare two images with no filtering, no antialiasing, etc., then perhaps a simple XOR could establish the "correctness" in some limited form, but abs() would still be a far more useful test.
 
Who's to say the software rasteriser's more right than the video card when it comes to IQ, particularly when it comes to AF, AA, etc.? Besides, I think I heard someone say M$ removed the software renderer from DX9...


*G*
 
When talking IQ, people make it sound far more tedious and complicated than it has to be. We are looking for the obvious here, i.e.:

Level of detail (too aggressive or too low): blurry or aliased
Anisotropic filtering quality (not very hard to compare... in fact we did it here)
FSAA quality: verticals, angles, alpha textures, etc.

These are the three main areas IMO people would be concerned about; the other obvious one would be rendering errors.

Take Morrowind, since it's one of the prettier games out there, and select a spot to do all these tests, i.e. walk outside the city to give some depth perception and start taking screenshots?

If one card can't filter as well as another in the distance, well there is difference #1.
 
Grall said:
Who's to say the software rasteriser's more right than the video card when it comes to IQ, particularly when it comes to AF, AA, etc.? Besides, I think I heard someone say M$ removed the software renderer from DX9...


*G*

Yes, Noko proved that on Rage3D when Rosco posted his famous XOR test on the 8500 :LOL:
 
Thanks for the reply, guys. I wasn't aware that an XOR comparison was crappy. However, I wasn't specifically wanting to use an XOR comparison; it was just the only kind I knew of. If there are better ways, like abs(), then those should be used instead.

I completely understand that IQ is subjective, but doesn't the reference rasterizer render the correct way? So I guess I'm looking for a way to quantify and qualify rendering correctness instead of judging who has better quality. I was under the assumption that the reference rasterizer rendered all supported 3D features, like bilinear filtering, anisotropic filtering, etc., the correct way as they are defined.

Forgive me if what I'm about to talk about shows my ignorance, but if you wanted to do 4x MSAA, doesn't Microsoft have a correct way included in their reference rasterizer? So if a card does 4x MSAA and it doesn't match Microsoft's reference rasterizer, couldn't it be said it's not correct? I guess I'm only concerned with how a product's IQ compares in an apples-to-apples comparison against the reference rasterizer. So if the reference rasterizer doesn't define how the different FSAA methods should look, then I can understand it wouldn't be fair to do such comparisons.

So I guess it should be asked what FSAA, level of detail or filtering methods are defined by the reference rasterizer? And if this can't be done in DirectX, then what about OpenGL? Doesn't OpenGL provide for defining the correctness of all rendering features?

If there is no definition for the correct way of using rendering features, then I can see that there will be no way to quantify and qualify IQ. There has to be some base reference to determine the correctness. I mean somehow with the release of 3D WinBench 99 and later 3D WinBench 2000, some kind of calculation was performed to automatically determine if a quality test passed or failed. For more specifics check this URL...

http://www.etestinglabs.com/bi/cont1999/199912/mipmap.asp?visitor=C

With that said, couldn't something like this be done using anti-aliasing, anisotropic filtering, etc? But instead of passing or failing a test, why not quantify how bad or good it compares to a reference image with some kind of score?

P.S. The reference rasterizer still exists in DirectX 9. It just doesn't come with the runtime install.

Tommy McClain
 
DT,

I agree that IQ doesn't have to be complicated. And I also agree with your 3 main areas. The only thing I'm wondering about is do you think that you can quantify or qualify these 3 areas with some kind of automated test?

Tommy McClain
 
The problem lies in AA not really trying to be 'correct', in fact adding more theoretical error to the sample points results in a better looking image (AccuView-shifted samples, for example).

If the Reference Rasteriser's AA sample points could be manually set, it'd be a valid comparison...
 
AzBat said:
Forgive me if what I'm about to talk about shows my ignorance, but if you wanted to do 4x MSAA, doesn't Microsoft have a correct way included in their reference rasterizer? So if a card does 4x MSAA and it doesn't match Microsoft's reference rasterizer, couldn't it be said it's not correct? I guess I'm only concerned with how a product's IQ compares in an apples-to-apples comparison against the reference rasterizer. So if the reference rasterizer doesn't define how the different FSAA methods should look, then I can understand it wouldn't be fair to do such comparisons.

Even if you are talking about a mode as specific as "4x MSAA", there is still no one correct or "best" way to do it. For example you could use an ordered grid, a rotated grid, or a sparse distribution of samples. And even within those three categories, there are an infinite number of possible sample patterns. Each one will work better for some images, and worse for others. For edges at randomly distributed angles, sparse should look better than rotated which should look better than ordered in most cases, but not all. So even if the refrast implemented one of these methods, it could never be guaranteed to produce the "ideal" image.
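The three 4x sample layouts mentioned can be sketched in a few lines. The coordinates below are illustrative only, not any vendor's actual pattern; one quick way to see why rotated and sparse grids handle near-vertical edges better is to count the distinct horizontal sample offsets each pattern provides.

```python
import math

# Ordered grid: 4 samples on a regular 2x2 lattice inside one pixel
# (coordinates in [0,1) pixel space).
ordered = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]

def rotate(p, angle):
    """Rotate a sample point about the pixel centre (0.5, 0.5)."""
    x, y = p[0] - 0.5, p[1] - 0.5
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s + 0.5, x * s + y * c + 0.5)

# Rotated grid: the same lattice rotated, so near-horizontal and
# near-vertical edges cross more distinct sample rows/columns.
rotated = [rotate(p, math.atan(0.5)) for p in ordered]

# Sparse ("N-rooks") pattern: each sample occupies a unique row and a
# unique column of a 4x4 subgrid.
sparse = [(0.125, 0.625), (0.375, 0.125), (0.625, 0.875), (0.875, 0.375)]

# Count how many distinct x offsets each pattern covers -- more distinct
# offsets means more intensity steps available on a near-vertical edge.
for name, pat in [("ordered", ordered), ("rotated", rotated), ("sparse", sparse)]:
    print(name, len({round(x, 3) for x, _ in pat}), "distinct x offsets")
```

The ordered grid only covers two distinct x offsets, so a near-vertical edge gets just two intermediate intensity levels; the rotated and sparse layouts cover four.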

And of course, the same argument could be applied to texture filtering as well, complicating things even further.
 
The idea of a reference is not the correct way either. Who says the reference is better... Microsoft? Futuremark/MadOnion?

Ideally (who cares how it's done) we want the best image we can get without aliasing: clear and realistic, including buildings and telephone poles being smooth and not jaggy (I haven't seen a jaggy telephone pole... well, maybe on New Year's). We keep hearing "cinematic quality"; well, IMO for that to happen we need player models without pointy elbows, we need interactive environments (i.e. walking through the water in Morrowind and ripples come from your feet), we need FSAA to make buildings etc. look real... and of course good depth perception (clear), but not above what, say, 20/20 vision would give you.

So if X brand of card delivers this in a new and unorthodox way, do we really care as consumers how they did it? That's why, IMO, in these image comparison arguments we're far too worried about 'how' instead of what we see.

All IMO
 
The problem is that in these tests the GF2 rendered images closest to the reference image, but nearly everyone agreed that the Radeon (and also the V5, to a lesser extent) looked much better. Some things just can't be quantified...
 
Here are my ideas on measuring image quality:

1. Texture filtering. Essentially attempt to generate, in software, the most aggressive texture filtering implementation you can think of (ex. anisotropic filtering with no absolute degree limitation, possibly with texture-level supersampling thrown in that does not affect triangle edges). The basic idea is not to produce an image that is the goal of a given rendering algorithm, but is instead an attempt at a best-case scenario in output quality with a given input. The scene in question should have a number of features, including complex geometry, very high-resolution textures, and at least some textures tailor-made to showcase aliasing.

2. Anti-aliasing. Again, attempt to generate a best-case output. An easy way might be to do stochastic anti-aliasing at 64+ samples per pixel. To attempt to isolate the effects of anti-aliasing from texture-filtering, the best scene would be either wireframe or flat-shaded (not textured), with many different edge angles in view.

Some global notes: For optimal examination of the data, all averaging should be done in a gamma-correct fashion, where, before rendering, the user should calibrate the amount of gamma correction for the particular monitor being used (optimally the video card will also allow calibration of the degree of gamma correction, if any). As a side note, it might actually be better to just set a specific gamma correction level so that all cards will be on equal footing with any monitor, but for the best results on a specific monitor, custom tweaking is a necessity.
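A minimal sketch of the gamma-correct averaging point, assuming a display gamma of 2.2 (in practice the exact value would come from the calibration step described above):

```python
# Averaging stored (gamma-encoded) pixel values directly darkens the
# result on a real display; averaging in linear light does not.
# GAMMA = 2.2 is an assumed display gamma for illustration.

GAMMA = 2.2

def to_linear(v):
    """0..255 gamma-encoded value -> linear light in 0..1."""
    return (v / 255.0) ** GAMMA

def to_encoded(v):
    """Linear light in 0..1 -> 0..255 gamma-encoded value."""
    return round((v ** (1.0 / GAMMA)) * 255.0)

def naive_average(a, b):
    return round((a + b) / 2.0)

def gamma_correct_average(a, b):
    return to_encoded((to_linear(a) + to_linear(b)) / 2.0)

# Averaging full black and full white, e.g. for an AA'd edge pixel:
print(naive_average(0, 255))          # -> 128 (too dark on a 2.2 display)
print(gamma_correct_average(0, 255))  # -> 186
```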

On image analysis: This is the tough one. The easiest way would be to simply directly compare the images. The simple image subtraction and dot product techniques given previously would be valid. But this won't really tell which images look better to the user (for example, 4x OGMS and 4x RGMS may look identical to this test). This would require more complicated techniques based upon the human visual subsystem. As an example, large regions of the same color could be "extra-bad" for the final output. The program could be designed to count "jaggies" in an image, where there is a stairstep at above a given intensity. Another option might be to attempt to detect texture aliasing by examining a frequency-based analysis of the image (or at least one particular region of the image).
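As a toy illustration of the "count the jaggies" idea, here is a crude detector that flags adjacent pixels whose intensity jump exceeds a threshold; the images and the threshold value are invented purely for illustration.

```python
# Count "hard" intensity jumps between horizontally adjacent pixels --
# a crude proxy for visible stairstepping along edges. The threshold
# of 100 is an arbitrary choice for this sketch.

def count_hard_edges(image, threshold=100):
    steps = 0
    for row in image:
        for x in range(len(row) - 1):
            if abs(row[x] - row[x + 1]) > threshold:
                steps += 1
    return steps

# A hard-stepped (aliased) edge...
aliased = [
    [255, 255, 255, 0, 0, 0],
    [255, 255, 0, 0, 0, 0],
]
# ...versus the same edge with intermediate (antialiased) values.
antialiased = [
    [255, 255, 170, 85, 0, 0],
    [255, 170, 85, 0, 0, 0],
]

print(count_hard_edges(aliased), count_hard_edges(antialiased))  # -> 2 0
```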

Obviously this sort of image quality analysis is beyond the scope of most review sites, but hopefully somebody will release such a thing before long. We really need not just a way to compare specific rendering techniques, but also a way to give each specific set of rendering options on a specific video card a "quality score." As a side note, though I think it would be fun to put out something like this, I really don't have the time right now. If I get the time, perhaps I'll start working on it.
 
I doubt someone will take the time to write the software to do such an elaborate comparison, especially if the software is attempting to quantify the differences in IQ between a rendered image and the reference.

Why? Your eyes can tell the difference between two cards fairly easily, and that's all that's needed.
 
Bigus Dickus said:
Why? Your eyes can tell the difference between two cards fairly easily, and that's all that's needed.

No, it's not, because one image in any realistic situation cannot possibly show all of the ins and outs of the image quality of one video card vs. another. Playing a number of games and attempting to determine differences in image quality in that way is highly subjective. It would really be nice to have an image quality benchmark that really tries its absolute best to be as objective as is humanly possible.
 
Chalnoth said:
Bigus Dickus said:
Why? Your eyes can tell the difference between two cards fairly easily, and that's all that's needed.

No, it's not, because one image in any realistic situation cannot possibly show all of the ins and outs of the image quality of one video card vs. another. Playing a number of games and attempting to determine differences in image quality in that way is highly subjective. It would really be nice to have an image quality benchmark that really tries its absolute best to be as objective as is humanly possible.

I agree with both of you to an extent.

I think it has already been done, I just think the "application" is called "test programs and a selection of multiple games". I think the vacuum left by the "selection of multiple games" can be covered by a less comprehensive set of specific "test programs" (the "tube test" with the additions I requested, i.e., rotating the "tube" without the colored mip levels with multiple full screen resolutions selectable and some other refinements strikes me as an almost perfect start to analyze anisotropic filtering effectiveness if actually used in context). I don't think mathematical comparison will work, for reasons already stated, and that we've already established several ways in games and several test programs to compare what we see, and can continue to refine it with discussion of what comparisons mean to the gamer.

Other test programs include the AA sample pattern tool, which is already used to analyze the effectiveness of AA to good effect.

I agree with Chalnoth with the scope of the things to be tested, and even with most of the goals behind the types of images he would recommend generating, but disagree that we don't already have the tools basically necessary to test such things properly.

Furthermore, I don't think mathematical or analytical comparison is necessary or desirable...while it could prevent some of the more outstanding failings of some reviewers in regards to comparing image quality, there are too many opportunities for error, or outright "cheating" by IHVs, in trying to use a fixed algorithm as a "shortcut" to evaluation. Especially with the complexity of possible outputs making it difficult to establish the boundary for valid mathematical comparison for each of the test cases specified.

The proposed criteria for image analysis all seem flawed to me... how do you decide the balancing point between aliasing and texture detail (which would both decrease the number of "same color regions")? Or the difference between blurriness and anti-aliasing (which would both increase the frequency of certain color ranges in some regions)? This can be determined by human decision, and such efforts seem to me best aimed at isolating what to look at rather than coming up with a general algorithm for an "Image Quality Quotient".

That's not to say a utility that performs all the tests mentioned and offers highlights of pre-selected image segments for analysis wouldn't offer something valuable (as long as IHVs weren't optimizing for it) in ADDITION to other tests, just that it is a poor substitute for a good human analysis at this time. IMO.
 
AzBat said:
Thanks for the reply, guys. I wasn't aware that an XOR comparison was crappy. However, I wasn't specifically wanting to use an XOR comparison; it was just the only kind I knew of. If there are better ways, like abs(), then those should be used instead.
Tom,
MSE (mean squared error) measure of the difference is probably better still.
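A tiny sketch of what MSE buys over a plain mean absolute difference (the pixel values here are made up):

```python
# MSE penalises a few large errors more heavily than many tiny ones,
# which matches how visible rendering errors tend to behave. Images
# are represented as flat lists of 0..255 intensities for simplicity.

def mse(img_a, img_b):
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)

def mean_abs_diff(img_a, img_b):
    return sum(abs(a - b) for a, b in zip(img_a, img_b)) / len(img_a)

reference = [100, 100, 100, 100]
off_by_one = [101, 99, 101, 99]       # uniformly tiny errors
one_bad_pixel = [100, 100, 100, 104]  # a single larger error

# Same mean absolute difference (1.0), very different MSE:
print(mean_abs_diff(reference, off_by_one), mse(reference, off_by_one))        # -> 1.0 1.0
print(mean_abs_diff(reference, one_bad_pixel), mse(reference, one_bad_pixel))  # -> 1.0 4.0
```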

I completely understand that IQ is subjective, but doesn't the reference rasterizer render the correct way?
Debatable. I certainly have seen things in the refrast which you would definitely do differently in a HW solution.

As for correct filtering, perhaps if you rendered the image with, say, 100x100 supersampling, fourier transformed this high res image, applied 'the correct' low-pass filter, inverse fourier transformed, and then sub sampled, then maybe the result would be a safe 'reference' image with which to do quality comparisons.
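The supersample-filter-subsample pipeline described above can be approximated very crudely; in this sketch a plain box filter stands in for the 'correct' Fourier-domain low-pass, and the scene is a made-up analytic edge rather than a real renderer's output.

```python
# Render an analytic scene at N x N the target resolution, then average
# each N x N block down to one output pixel. A box filter is a crude
# stand-in for the ideal low-pass filter; the scene is invented.

N = 8  # supersampling factor per axis

def scene(x, y):
    """Analytic test scene: white where y > 0.3 * x, black elsewhere."""
    return 255 if y > 0.3 * x else 0

def render_downsampled(width, height, factor):
    out = []
    for py in range(height):
        row = []
        for px in range(width):
            total = 0
            for sy in range(factor):
                for sx in range(factor):
                    # Sample at sub-pixel centres.
                    x = px + (sx + 0.5) / factor
                    y = py + (sy + 0.5) / factor
                    total += scene(x, y)
            row.append(round(total / (factor * factor)))
        out.append(row)
    return out

image = render_downsampled(8, 8, N)
# Pixels crossed by the edge now hold intermediate grey values rather
# than hard 0/255 steps.
```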
 
Yes, that was very much my idea as well - to oversample in a massive way to create the 'perfect' image from a given source dataset.

In particular, filtering then isn't anything like as much of a problem: ideally, you'd sample at a high enough resolution that textures would always be magnified by several times, and therefore point sampling the highest LOD would always be sufficient.

In reality, 5000x5000 oversampling sounds pretty hard, so you'd still have to look at a few issues with filtering algorithms and mipmapping, which are likely to be the biggest potential points of contention between hardware and software...
 