Image quality - a blast from the past

Simon F said:
As for correct filtering, perhaps if you rendered the image with, say, 100x100 supersampling, Fourier transformed this high-res image, applied 'the correct' low-pass filter, inverse Fourier transformed, and then subsampled, then maybe the result would be a safe 'reference' image with which to do quality comparisons.
A single discrete image will not do the trick; images over a time segment (aka digital video :p) should be taken (texture aliasing, among other reasons).
Also, AFAIK APIs define "correct" differently, and specify some features quite loosely. Nobody has ever established what's the most "correct" way for scanline rendering.
Say an application uses the GL SGIS_texture_filter4 extension with its default settings; where is the reference "correct" image defined?
BTW, does anyone know which consumer graphics cards implement this extension?
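
For what it's worth, here's a rough sketch of the pipeline Simon F describes, assuming NumPy, a grayscale float image, and a brick-wall low-pass standing in for 'the correct' filter (the filter choice being exactly the debatable part):

```python
import numpy as np

def reference_image(hires, factor=100):
    """Band-limit a supersampled render, then subsample it.

    hires:  2D float array rendered at factor x factor supersampling
            (grayscale here; run once per channel for RGB).
    factor: supersampling rate per axis (100 in Simon F's example).
    """
    h, w = hires.shape
    spectrum = np.fft.fft2(hires)
    # Keep only frequencies below the Nyquist limit of the target
    # resolution: an ideal brick-wall low-pass.
    fy = np.fft.fftfreq(h)[:, None]   # cycles per hires sample
    fx = np.fft.fftfreq(w)[None, :]
    keep = (np.abs(fy) <= 0.5 / factor) & (np.abs(fx) <= 0.5 / factor)
    lowpassed = np.fft.ifft2(spectrum * keep).real
    # Subsample: one sample per output pixel.
    return lowpassed[::factor, ::factor]
```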
 
no_way said:
Simon F said:
As for correct filtering, perhaps if you rendered the image with, say, 100x100 supersampling, Fourier transformed this high-res image, applied 'the correct' low-pass filter, inverse Fourier transformed, and then subsampled, then maybe the result would be a safe 'reference' image with which to do quality comparisons.
A single discrete image will not do the trick
Why would image data N+1 be more likely to produce a significantly different MSE than its predecessor?
 
Although a bitmap is indeed fully digital (digital values of color components per pixel and a digital quantity of pixels), the overall image has to be considered something analog.

Any form of signal theory will prove this, especially if you throw in AA or anisotropic filtering. While you can improve an image overall through additional data per pixel, doing so would make it "fail" any reference comparison, even though the result would be judged "superior" almost unanimously...


I really wish noko's screenshots on that old thread were still there, as they illustrated this quite well. The three-color-swatch IQ test was the most telling, with the reference rasterizer and IHV A images being chunky and banded, and IHV B's being a smooth gradient of near-perfect color. Since the smooth gradient was far from the banded reference and IHV A images, the obviously better of the two "failed" the "IQ comparison" test by a larger margin.

The author of the "Different Take at Image Quality" thread was combating a growing trend of individuals commenting that the IQ of IHV A was substantially superior to the IQ of IHV B. When the real world proves otherwise, man has always run to mathematics and science to try and formulate wild theories and weird science in order to prove the age-old "suspend an elephant from a cliff hanging from a daisy" methodology. That discussion was an excellent example of such a condition.
 
Maybe you could break them up into separate tests?

Have a "plain" test with no-AA or AF and here test the ability of the card to reproduce the same color pattern (ie given image), using some sort of bit by bit color comparison. Then run a seperate test on the AA ability as well as a 3rd or 4th test on filtering. I would think that if you can seperate them at least you can gain some knowledge about each stage of the card.
 
While you guys are at it, why not add a feature to the program to make it a full-fledged art critic. I mean, people are so biased, how can we trust them to judge whether Picasso was a good painter? :rolleyes:
 
Chalnoth said:
No, it's not, because one image in any realistic situation cannot possibly show all of the ins and outs of the image quality of one video card vs. another.
When did I ever mention basing a subjective judgement on a single static image?

Playing a number of games and attempting to determine differences in image quality in that way is highly subjective.
And, since IQ is a highly subjective and personal thing, it seems a perfect fit. The only opinion that really matters is that of the actual user.

It would really be nice to have an image quality benchmark that really tries its absolute best to be as objective as is humanly possible.
Next you'll tell us that having a way to objectively decide if Mozart is better than Bach would be nice as well.
 
Again, I think Bigus is hitting the nail on the head here (what do we need artist judges for!!). Here are 2 screenshots @ HardOCP (great because the size of the pic is small)... we are getting far too technical (now looking at a program) instead of looking at the obvious... the obvious jumps out: which image is better, at least to 90% of the people anyways? Who likes which pic better, Bachelor #1 or Bachelor #2, and why?? (No cheating... no looking at the file name.) These are .jpgs, so the originals would look even better.

Bachelor 1
1010979849aGDrY8relO_3_1_l.jpg


Bachelor 2
1010979849aGDrY8relO_3_2_l.jpg
 
But those images do not tell the whole story. Just because a screenshot is sharper doesn't mean it will look better in motion. If there is significant aliasing, it will look quite a bit worse. Some time ago, I posted a couple of different shots from my GeForce4 with the LOD setting tweaked on one. The one with the more aggressive LOD made for a much nicer-looking screenshot, but looked very poor in motion in that same level (due to aliasing).

Anyway, for the exact same type of filtering, the Radeon series will always look very, very slightly worse than the GeForce series when LOD levels are properly adjusted (when the LOD levels are different, it's going to be dependent upon the scene and the person looking at the images). This is more true of the Radeon 7200 and Radeon 8500 than of the 9500-9700. This is just due to the MIP map selection algorithm.

As for anisotropic filtering, don't forget that I said the exact same type of filtering; the aniso of the GeForce series and the Radeon series cannot be directly compared, though in my experience the Radeon 9700 has more aliasing (albeit rarely noticeable) than the GeForce4.
 
Bigus Dickus said:
Chalnoth said:
No, it's not, because one image in any realistic situation cannot possibly show all of the ins and outs of the image quality of one video card vs. another.
When did I ever mention basing a subjective judgement on a single static image?

But usually such image quality comparisons are wholly inadequate. In particular, almost never do reviewers bother to look for texture aliasing.

Playing a number of games and attempting to determine differences in image quality in that way is highly subjective.
And, since IQ is a highly subjective and personal thing, it seems a perfect fit. The only opinion that really matters is that of the actual user.

But fundamentally, image quality is not subjective. For example, it can easily be shown that the Radeon 9700 has superior FSAA to any other video card out there right now. Nobody can dispute that. And, at least at the exact same degree of anisotropic, the GeForce4 Ti series has the best anisotropic filtering around. That is also undisputable. (I also think that the GF4 at 8-degree looks better than the 9700 at 16-degree, but since that is mostly based on aliasing, it is disputable)

The only place that the subjectiveness comes in is when there are tradeoffs (such as aniso vs. AA, and performance vs. image quality). This is, in particular, why I suggested attempting to separate a texture filtering test from an anti-aliasing test. It's not complete, but it's a start.

It would really be nice to have an image quality benchmark that really tries its absolute best to be as objective as is humanly possible.
Next you'll tell us that having a way to objectively decide if Mozart is better than Bach would be nice as well.

The problem is that artists aren't striving toward a specific, known reference in their art. That is, they don't have an image that they are attempting to recreate as accurately as possible (not anymore, anyway...). But video cards are. They have a job to do, and how well they do that job can be mathematically quantified.

But, as I said, it is beneficial to attempt to split up the image quality examinations, since each video card will have different tradeoffs. Beyond the two splits I mentioned above, within texture filtering alone it might be good to attempt a split between texture aliasing and high-angle texture clarity.
 
But usually such image quality comparisons are wholly inadequate. In particular, almost never do reviewers bother to look for texture aliasing.

Texture aliasing is very rare today with FSAA and AF; on the older cards it WAS an argument that held some water...
Sure, aggressive LOD can cause it, but again it falls back on the person's preference; they may prefer a bit of aliasing vs. blurry textures (I know I do)...

A card like a 9700 has the feature set to deliver texture clarity and speed.
 
Chalnoth said:
But usually such image quality comparisons are wholly inadequate. In particular, almost never do reviewers bother to look for texture aliasing.
Again, when did I say that current reviews do an adequate job of IQ assessment? You seem to be reading a lot of stuff I haven't written.

But fundamentally, image quality is not subjective.
Umm.... sure. How could it not be?

I also think that the GF4 at 8-degree looks better than the 9700 at 16-degree...
Precisely. You prefer less filtering depth with a perceived improvement in texture aliasing reduction, and the removal of any angle-dependent reductions in filtering depth. Others prefer the greater filtering depth, despite the cases where angled surfaces cause a temporary reduction in that filtering. How do you suggest we quantify that difference? Even supposing you came up with some magical algorithm that could give an output matching one person's assessment of the IQ, it wouldn't match everyone's. Why? IQ is subjective.

For example, it can easily be shown that the Radeon 9700 has superior FSAA to any other video card out there right now. Nobody can dispute that.
Really? What if some people think old RGSS is superior due to the cases where alpha textures are properly handled? Who are you to say that those cases aren't more important to them than the increased edge quality of the 9700? Again, IQ is subjective. Every user has their own opinion of which combinations look better. Perhaps some prefer the look of Quincunx-style AA, and would actually argue that to them the "softening" of textures is pleasing to the eye? How would an objective algorithm take that into consideration? It can't.

And, at least at the exact same degree of anisotropic, the GeForce4 Ti series has the best anisotropic filtering around. That is also undisputable.
Undisputable? Really? Man, you hold your own opinion in pretty high regard, don't you? I've seen screenshot comparisons where 8X AF on the GF4 has the same filtering depth as 4X AF on the 9700. In those cases, I prefer the look of the 9700's 8X to the GF4's 8X simply due to the increased filtering depth. Thus, the 9700 is better. "Undisputably." :rolleyes:

The only place that the subjectiveness comes in is when there are tradeoffs
Well, not the only place. I refer you to the QAA example. But, tradeoffs are a large part of it, and they are abundant. Edge quality vs. alpha textures vs. texture filtering. AF filtering depth vs. aliasing reduction vs. angle consistency. You could add AA sample number vs. pattern vs. gamma/non-gamma corrected.

There are of course other subjective areas as well that your "objective" algorithm would be sorely unable to cope with. How about color saturation or pixel sharpness?

That is, they don't have an image that they are attempting to recreate as accurately as possible (not anymore, anyway...). But video cards are. They have a job to do, and how well they do that job can be mathematically quantified.
Perhaps you could quantify image accuracy, but that is not equal to image quality. That argument has been made in the past, and repeatedly refuted. I can't believe you would make the same argument here.

The bottom line is that IQ can't be mathematically quantified in any consistent way, simply because "quality" is, in this case, a highly subjective property.
 
Perhaps you could quantify image accuracy, but that is not equal to image quality....

Agreed. I think we should expand on this; it can help nail down and focus this discussion. I see two different things being discussed here, and I'll henceforth term them:

1) Image "correctness"
2) Image "pleasantness" (Please...someone come up with a better word!) ;)

I would say that if you put BOTH of them together, you get what we call "overall image quality".

"Correctness" (accuracy) is more or less something that can be objectively quantified. We would require some reference image or scene that can be agreed is "100% accurate representation of the ultimate quality scene". In reality, that's an analog representation. In practice, it's going to be at any given resolution computed with "very high" color and z accuracy, and "very high" full scene antialiasing and advanced texture filtering.

"Pleasantness" is a purely subjective measurement of how acceptable the image is. Given the same image or scene rendered by two different pieces of hardware "at the same settings", different people can reach different conclusions about which one is more "pleasing." Given different implementations of certain features, rendering accuracy, trade-offs, etc.

Now, since pleasantness is subjective, overall image quality cannot be 100% objective, though there is an objective component.

I think the important thing to keep in mind is that "correctness" basically defines whether or not two images are even comparable in the first place. If two images are not "close enough" in correctness, they should not be directly compared. Once two images are sufficiently close in correctness, "pleasantness" then pretty much determines overall image quality.

That prevents something like one person "preferring" point-sampled graphics because they are "sharp" vs. filtered graphics. We all agree that's not an apples-to-apples comparison, yet one cannot argue with someone's opinion that point-sampled looks better. We can, however, argue that the two images are not close enough in relative "correctness" to be directly compared in the first place.

So, what can we do with this? What's the best way to approach the comparison of two different implementations?

Two-step process:

1) Come up with some quantifiable, 100% objective method to determine "correctness." Terribly difficult to do, I know, and even if it's done, I still don't see how a single number to "quantify correctness" is possible. But that aside, we then come up with some generally agreed threshold by which two implementations are agreed to be of "similar correctness." Very simplistically, say that they must be within 5% of each other in terms of the "correctness" score.

2) At that point, it is purely a subjective measure to determine which implementation is the most "pleasing", and therefore has the overall highest image quality.

In short:
1) Fundamentally, Image Correctness is quantifiable.
2) Fundamentally, Image Pleasantness is not quantifiable.

3) Overall Image quality has both components. The quantifiable component determines the degree to which the images can be compared. The subjective component determines the overall image quality of two images that are of comparable correctness.
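
A trivial sketch of the gating idea in step 1, assuming the correctness scores are already normalized to [0, 1] (the scores and the 5% threshold are the hypothetical ones from above):

```python
def comparable(correctness_a, correctness_b, threshold=0.05):
    """Step 1 gate: only images of similar correctness get compared.

    The [0, 1] scores and the 5% threshold are hypothetical figures
    from the discussion, not from any real benchmark.
    """
    return abs(correctness_a - correctness_b) <= threshold

# Step 2 (deciding which image is more "pleasing") stays with the viewer.
```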
 
GraphixViolence said:
Even if you are talking about a mode as specific as "4x MSAA", there is still no one correct or "best" way to do it. For example you could use an ordered grid, a rotated grid, or a sparse distribution of samples. And even within those three categories, there are an infinite number of possible sample patterns. Each one will work better for some images, and worse for others. For edges at randomly distributed angles, sparse should look better than rotated which should look better than ordered in most cases, but not all. So even if the refrast implemented one of these methods, it could never be guaranteed to produce the "ideal" image.

And of course, the same argument could be applied to texture filtering as well, complicating things even further.

I understand what you're saying now. Quantifying a comparison of FSAA methods seems to be impossible, as it is not an apples-to-apples comparison. Although you could still do an apples-to-oranges comparison at the same level of FSAA, like testing 2 cards that both have 4x FSAA (no matter the method), it would be a totally subjective comparison and thus couldn't be quantified. Basically what's already been going on. Again, as I mentioned earlier, I'm no longer wanting to quantify "image pleasantness", as Joe coined it. Instead, I now want to quantify "image correctness".
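
To illustrate GraphixViolence's point about how different those grids are, here's a hypothetical set of 4x sample positions (coordinates in the unit pixel square; the offsets are made up for illustration and don't match any IHV's actual pattern):

```python
def sample_patterns_4x():
    """Illustrative 4x AA sample positions inside one pixel, [0, 1)^2."""
    # Ordered grid: a regular 2x2 lattice.
    ordered = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
    # Rotated grid: the same lattice turned roughly 26.6 degrees, so
    # near-horizontal and near-vertical edges see more coverage steps.
    rotated = [(0.375, 0.125), (0.875, 0.375), (0.625, 0.875), (0.125, 0.625)]
    # Sparse ("N-rooks"): one sample per row and column of a 4x4 sub-grid.
    sparse = [(0.0625, 0.5625), (0.3125, 0.0625),
              (0.5625, 0.8125), (0.8125, 0.3125)]
    return {"ordered": ordered, "rotated": rotated, "sparse": sparse}
```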

BTW, nobody has ever answered my question as to which FSAA rendering methods have been defined in the reference rasterizer. Those are going to be the ones where you can do some kind of comparison that can be quantified.

Tagrineth said:
If the Reference Rasteriser's AA sample points could be manually set, it'd be a valid comparison...

Agreed.

Simon F said:
MSE (mean squared error) measure of the difference is probably better still.

Whatever. I don't care which method is used. Especially considering I'm not writing the app. :)
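
(For the curious, the MSE Simon F mentions is simple enough to sketch, assuming NumPy and two same-sized screenshots:)

```python
import numpy as np

def mse(image_a, image_b):
    """Mean squared error between two same-sized screenshots.

    Lower means closer to the reference; 0 means bit-identical.
    """
    a = np.asarray(image_a, dtype=np.float64)
    b = np.asarray(image_b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))
```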

Simon F said:
Debatable. I certainly have seen things in the refrast which you would definitely do differently in a HW solution.

Hmm. I'm not so concerned with how a rendering method is implemented, but rather with the output as compared to the reference output. I was under the impression the reference rasterizer defined the output of a given rendering feature and not the actual method for implementing the feature. So when you look at the output of bilinear filtering on a card, you would then make sure the output was as close as possible to the output of the reference rasterizer. If the output was identical, then that feature was implemented correctly, no matter what hoops were jumped through to get there.

I know that it is debatable as to whether Microsoft's reference rasterizer is correct, or even better for that matter. However, I contend that cards and apps that are used in Direct3D should strive to be correct as defined by the API. Even though DirectX is controlled by Microsoft, its features are somewhat dictated by the hardware vendors. So in a way they help define the reference rasterizer. If they're implementing rendering methods that don't match the reference rasterizer, then as consumers, I believe we need to know whether and how closely they match. By using the reference rasterizer, I believe we can quantify the comparisons. However, I understand that not every feature comparison can be quantified.
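
A minimal sketch of that kind of output check, assuming two framebuffer grabs as H x W x C NumPy arrays (the function name is hypothetical):

```python
import numpy as np

def refrast_match_fraction(card_frame, refrast_frame):
    """Fraction of pixels bit-identical to the reference rasterizer.

    A result of 1.0 means the feature's output matches the refrast
    exactly, regardless of how the hardware got there.
    """
    card = np.asarray(card_frame)
    ref = np.asarray(refrast_frame)
    per_pixel_match = np.all(card == ref, axis=-1)
    return float(per_pixel_match.mean())
```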

Tommy McClain
 
Joe DeFuria said:
We would require some reference image or scene that everyone agrees is a "100% accurate representation of the ultimate quality scene".
Well, I think that is a fundamentally impossible thing to do. It's like saying "let's find the perfect apple, and then all apple taste tests can be objectively compared to the reference apple." Do you see a problem there?

While I understand what you are saying, I just don't agree. Yes, there are some obvious ways in which "correctness" is objective. For example, Z-fighting issues, polygon tearing, color banding, missing fog, etc. are objectively "incorrectly" rendered components.

If someone wants to write a program that can detect and quantify these types of rendering errors, and establish an objective relative correctness based on those grounds, I'm all for it. IMO, that is the only "correctness" or "accuracy" that is truly objective.

That being said, consider this example: you have chosen your "100% accurate reference image." Card A renders a perfect replica, and is 100% correct. Card B renders in such a way that each and every pixel in the edge AA region differs from the reference image, and is only 85% correct. Now, to the eye, neither looks "better." Under close scrutiny, you might claim that you could see a difference, but they would just be different, without either being in any way objectively "better."

Why would Card A then be more "correct" than Card B? More telling, why shouldn't Card B's image have been the reference?

So you see, even choosing the reference image will be a subjective process, thus any future comparisons to that reference will ultimately produce subjective results. It is unavoidable.
 
AzBat said:
I know that it is debatable as to whether Microsoft's reference rasterizer is correct, or even better for that matter.
Bingo!

And granting that the above statement is true, how could closeness to the refrast ever "objectively" quantify image quality?
 
Joe DeFuria said:
2) Image "pleasantness" (Please...someone come up with a better word!) ;)

How about "allure" or "attractiveness"? :)


Joe DeFuria said:
I would say that if you put BOTH of them together, you get what we call "overall image quality".

I agree. There should definitely be 2 components to IQ.


Joe DeFuria said:
"Correctness" (accuracy) is more or less something that can be objectively quantified. We would require some reference image or scene that can be agreed is "100% accurate representation of the ultimate quality scene". In reality, that's an analog representation. In practice, it's going to be at any given resolution computed with "very high" color and z accuracy, and "very high" full scene antialiasing and advanced texture filtering.

So you're thinking that we should be doing correctness comparisons against an "ultimate quality scene" instead of against the reference rasterizer? I don't see how an "ultimate quality scene" could be agreed upon. The details of how that scene is described would be subjective as well. I think we should just stick to comparing defined rendering methods. The only thing I know of that has defined the output of different rendering methods is the reference rasterizer. In essence, continuing where 3D WinBench left off.


Joe DeFuria said:
"Pleasantness" is a purely subjective measurement of how acceptable the image is. Given the same image or scene rendered by two different pieces of hardware "at the same settings", different people can reach different conclusions about which one is more "pleasing." Given different implementations of certain features, rendering accuracy, trade-offs, etc.

Agreed.


Joe DeFuria said:
1) Come up with some quantifiable, 100% objective method to determine "correctness." Terribly difficult to do, I know, and even if it's done, I still don't see how a single number to "quantify correctness" is possible. But that aside, we then come up with some generally agreed threshold by which two implementations are agreed to be of "similar correctness." Very simplistically, say that they must be within 5% of each other in terms of the "correctness" score.

I can see how a single number is possible. Compare the difference between an image from a 3D card and a reference image. A score of 1 would mean the correctness is absolute, so the closer the score is to 1, the more correct the comparison. Something similar is done in the GPS work I do: I compare residuals of GPS signals to a signal from a reference satellite that has the highest signal-to-noise ratio. The farther the residuals move from 1, the lower the quality of the signal.
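
One hypothetical way to collapse the comparison into such a score (the inverse-MSE mapping and its scale are my own illustration, not from any benchmark):

```python
import numpy as np

def correctness_score(candidate, reference):
    """Collapse an image comparison into one number in (0, 1].

    A score of exactly 1 means the candidate is bit-identical to the
    reference; larger errors decay toward 0.
    """
    c = np.asarray(candidate, dtype=np.float64)
    r = np.asarray(reference, dtype=np.float64)
    return 1.0 / (1.0 + np.mean((c - r) ** 2))
```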


Joe DeFuria said:
3) Overall Image quality has both components. The quantifiable component determines the degree to which the images can be compared. The subjective component determines the overall image quality of two images that are of comparable correctness.

Good job. I agree that's a very good and accurate description of image quality.

Tommy McClain
 
Bigus Dickus said:
AzBat said:
I know that it is debatable as to whether Microsoft's reference rasterizer is correct, or even better for that matter.
Bingo!

And granting that the above statement is true, how could closeness to the refrast ever "objectively" quantify image quality?

First, I didn't say that I want to quantify image QUALITY. I said "image correctness". However, you also forgot to include the sentence in which I said...

AzBat said:
However, I contend that cards and apps that are used in Direct3D should strive to be correct as defined by the API.

My idea now is to strive for correctness as defined by the API, no matter whether the reference rasterizer is correct or not. You CAN objectively quantify a card's correctness as compared to the reference rasterizer. If it's decided that the reference rasterizer is wrong, then that's Microsoft's and/or the hardware vendors' fault and they need to fix it. I contend that the reference rasterizer should be just that: the REFERENCE that all hardware should strive for.

Tommy McClain
 
Doomtrooper said:
Texture aliasing is very rare today with FSAA and AF; on the older cards it WAS an argument that held some water...
Sure, aggressive LOD can cause it, but again it falls back on the person's preference; they may prefer a bit of aliasing vs. blurry textures (I know I do)...

A card like a 9700 has the feature set to deliver texture clarity and speed.

Texture aliasing is not very rare today. Anisotropic filtering does not reduce texture aliasing at all, and neither does the FSAA of most of today's video cards (GeForce3, GeForce4, Radeon 9500/9700).

Anyway, I think I'm going to attempt a somewhat more rigorous examination of the texture aliasing I've seen on the Radeon 9700 (texture aliasing that exceeds that of my GeForce4). I currently suspect that it's not primarily due to LOD selection, though that certainly is a factor, but due to the filtering method itself. This should be relatively easy to check, but I just haven't yet.
 
We had this argument before; you play with your blur crap and I'll stick to clear textures... aliasing can be controlled easily, and yes, anisotropic does help with aliasing by pushing the mip map border further out... especially 16X... it doesn't eliminate it, but it does make it far less noticeable.
 