Image Quality Comparisons: Some Thoughts

mjtdevries said:
Why couldn't you use uncompressed movies?

They needn't be very long or at very high resolution. You might make a movie of part of the screen to show the differences. (Yes, the worst-case scenario again.)

In any case, there are loads of people who download countless movies of several GB every day. So a 400MB movie every nine months to show trilinear filtering issues wouldn't be so bad?

If your servers can't handle the load, just put the movie on edonkey/emule or something like that?

It might not be the best solution for everybody, but it would work for most people here, wouldn't it?

In my country, the government-sponsored telco monopoly (which has since been 50% privatised, so it is only interested in the shareholders) only has to provide 19.2 Kbit connections.
 
bloodbob said:
The only problem is A) you can't use uncompressed movies, they are just too frigging massive, so you already get images different from the cards; B) even compressed movies are large, and there are hosting costs, and in many places around the world there are telco monopolies which hold back broadband.

True, very true. But how often have I seen reviews that contain hundreds of full-screen screenshots at every single combination of AF/AA settings for dozens of games? How much bandwidth does that waste while conveying almost no relevant information! :)

Yeah, I know it's tough; I guess I'm expressing a wish for the medium term rather than the short term.
 
Additional note: if it is true that R420-based products present more aggressive LOD values under specific circumstances, compared to R3xx for example, it might very well be insignificant to most of you.

I have found myself on more than one occasion on the R300 having to reduce the mipmap LOD slider by one notch in order to find a better balance between mipmap detail and aliasing. If the overall pattern should truly be slightly more aggressive on X800-based products, then it's not a minor issue to me. The point I still cannot comment on, without real-time experience of the latter, is how it would look if I reduced any LOD slider by more than one notch on those.

And yes, in-game or synthetic IQ-evaluating screenshots might show you the slightly more aggressive LOD pattern; what they will not show you are the side-effects that can mostly be seen only in motion.
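
For reference, here is a minimal sketch of the textbook mipmap LOD selection this is all about (standard D3D/OpenGL-style math, not any IHV's actual hardware; the names are mine). A driver defaulting to a "more aggressive" pattern behaves as if a small negative bias were baked into lambda:

Code:
#include <algorithm>
#include <cmath>

// Textbook mipmap LOD selection (illustrative only, not any IHV's
// actual hardware). 'rho' is the screen-space texel footprint of the
// fragment. A negative bias (LOD slider toward "sharper") selects a
// more detailed mip level -- more texture detail, but also the
// shimmering aliasing that only really shows up in motion.
int SelectMipLevel(float rho, float lodBias, int maxLevel)
{
    float lambda = std::log2(rho) + lodBias;   // base LOD plus bias
    lambda = std::max(lambda, 0.0f);           // never sharper than the base map
    return std::min(static_cast<int>(lambda + 0.5f), maxLevel);
}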


***edit: I don't think that large video files would be the solution either.
 
ChrisRay said:
Are worst-case scenarios really the best way to determine which IHV is better?

Your Thoughts Please.

Chris

To answer the question directly: yes. Further, I think the community has in fact been acting entirely in a reactive manner on this issue, and that the driving force has been entirely the IHVs. We are acting exactly in a mirror-image fashion to what they have been doing, though I would admit often with way too much emotionalism attached to it.

Consider: what you would call worse cases and worst cases, they are calling better cases and best cases --and taking advantage of them to do less work. They have been trotting out various adaptive algorithms and strategies to take advantage of those cases where they can do less work without (so they claim) impacting image quality. Sometimes (not nearly enough!) they are upfront and proactive about what is happening, and other times not. In either case the community feels a responsibility --correctly, in my view-- to test their claims and methods to see how well or poorly they meet that goal.

Doing less work is not *inherently* evil. Indeed, often it is widely praised. I'll give you a very non-controversial example: occlusion culling. You'll find many people who will rhapsodize on the joys of doing it well and point out who does it better with great approbation while regretting that other implementations don't do enough of it.

Having said that, there have been times in the past when occlusion culling was controversial and under great scrutiny as an image quality issue --remember 3dfx Hidden Surface Removal and how unreliable it was?

But it has been perfected over time by the major IHVs to where now it is a reliable general case optimization and a non-controversial "adaptive" technique. If it started getting flaky again, then you'd see the interest in scrutinizing it heat up as well.

And, in spite of the emotionalism I noted above, and the fact that the IHVs have not exactly been embracing the community's participation (else we wouldn't have had to "catch" them at these things), what we are engaged in is a fundamentally healthy process that in the end leads to better, more reliable techniques over time.
 
geo said:
I'll give you a very non-controversial example: occlusion culling. You'll find many people who will rhapsodize on the joys of doing it well and point out who does it better with great approbation while regretting that other implementations don't do enough of it.

Having said that, there have been times in the past when occlusion culling was controversial and under great scrutiny as an image quality issue --remember 3dfx Hidden Surface Removal and how unreliable it was?

But it has been perfected over time by the major IHVs to where now it is a reliable general case optimization and a non-controversial "adaptive" technique. If it started getting flaky again, then you'd see the interest in scrutinizing it heat up as well.

Funny you should say that, because one of the new DX "querying" features probably won't give the right results, because of the manner in which many developers will be forced to use it to get speed-ups. <shrug>
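
To make that concrete, here is a hedged sketch of the usage pattern I take this to be alluding to, assuming D3D9-style occlusion queries (DrawObject and DrawBoundingBoxOnly are hypothetical placeholders, not real API calls). Waiting for the query result stalls the pipeline, so in practice it gets read a frame late, and the visibility answer can be stale:

Code:
#include <d3d9.h>

// Hypothetical draw helpers -- placeholders for illustration.
void DrawObject(IDirect3DDevice9* dev);
void DrawBoundingBoxOnly(IDirect3DDevice9* dev);

DWORD g_visiblePixels = 1;   // assume visible until told otherwise

void DrawWithLaggedQuery(IDirect3DDevice9* dev, IDirect3DQuery9* query)
{
    // Poll *last* frame's result; if the GPU isn't done yet (S_FALSE),
    // keep the old value rather than stalling the pipeline.
    DWORD pixels = 0;
    if (query->GetData(&pixels, sizeof(pixels), D3DGETDATA_FLUSH) == S_OK)
        g_visiblePixels = pixels;

    query->Issue(D3DISSUE_BEGIN);
    if (g_visiblePixels > 0)
        DrawObject(dev);             // draw for real
    else
        DrawBoundingBoxOnly(dev);    // cheap proxy to re-test visibility
    query->Issue(D3DISSUE_END);

    // Fast, but the decision is based on a frame-old answer -- exactly
    // why the query can give "wrong" results in practice.
}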
 
Ailuros said:
Additional note: if it is true that R420 based products present under specific circumstances more aggressive LOD values, compared to R3xx's as an example it might very well be something insignificant to most of you.

The problem is that you don't know what else is going on there - is it just a simple case of LOD shifting, or is there other stuff going on as well? Is it the case that the LOD may do something different if presented with a different set of parameters?
 
Worst-case scenarios are useful in the same way that synthetic benchmarks are useful.

They can show you how a technique works and in which real-life cases you can expect more or less impact from it.

But an optimization is usually a tradeoff between quality and speed.
When a worst-case scenario happens during only 1% of gameplay, the tradeoff is obviously better than when it happens during 40% of gameplay.

Therefore it is wrong to look only at worst-case scenarios without analyzing how often they happen. (Most sites are guilty of this.)
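
To put toy numbers on that point (the figures are purely hypothetical), weight each artifact's severity by how often it actually shows up:

Code:
#include <cstdio>

// Expected IQ cost = how often the artifact appears * how bad it looks.
// The numbers below are invented for illustration.
struct Optimization { float frequency; float severity; };

float ExpectedIQCost(Optimization o) { return o.frequency * o.severity; }

int main()
{
    Optimization rareButUgly   = { 0.01f, 0.9f };  // 1% of gameplay, severe
    Optimization commonButMild = { 0.40f, 0.3f };  // 40% of gameplay, mild
    std::printf("%.3f vs %.3f\n", ExpectedIQCost(rareButUgly),
                                  ExpectedIQCost(commonButMild));
    // 0.009 vs 0.120 -- the optimization with the worst-looking
    // screenshot is still the better overall tradeoff.
}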

ATI's adaptive AF is a good example: everybody knows that there are serious IQ tradeoffs, but the performance advantage was so big that ATI was applauded for creating their AF algorithm.

If you can judge IQ with screenshots, then reviewers should first do a double-blind test themselves, and then give the readers the same double-blind test in the review too.

If you can only judge it in motion, then I am afraid that an uncompressed movie would be the only good way to show it to the readers. A reviewer can of course give their personal opinion of what they saw when they tested the card, but the risk of biased opinions is huge!
If you can trust a site to do their own double-blind test internally and publish the results, that might work too. (Unfortunately I don't trust most sites enough for this.)

Unfortunately this kind of IQ difference would not be measurable or reproducible, so reviewers will not like it. (And it is difficult for most readers to understand.)
But IMO it is the only proper way of judging IQ.

The only other option would be to establish a reference to which each card must comply, even when you compare bit for bit. That doesn't really measure IQ, but it measures differences against a reference that you define as having the highest possible IQ.
In comparison tests, the driver setting that accomplishes that feat would need to be used. Companies would be allowed to make their own optimizations, but not without giving users the option to use the above configuration.

But that would still leave a discussion about which card is fastest with "acceptable" IQ optimizations.

That would also mean that a standard like DX9 should define things better. Full precision would not be both FP24 and FP32, but only one of them. (Otherwise you cannot do a bit comparison.)

Then you can start to measure IQ differences objectively and reproducibly. The number of wrong pixels would be the IQ indicator.
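
A minimal sketch of that metric, assuming both outputs are captured as tightly packed 8-bit RGBA buffers (the function and parameter names are mine):

Code:
#include <cstddef>
#include <cstdint>

// Count pixels that differ bit-for-bit from the defined reference.
// The return value is the proposed objective IQ indicator.
std::size_t CountWrongPixels(const std::uint8_t* test,
                             const std::uint8_t* reference,
                             std::size_t pixelCount)
{
    std::size_t wrong = 0;
    for (std::size_t i = 0; i < pixelCount * 4; i += 4)   // RGBA stride
    {
        if (test[i]     != reference[i]     ||
            test[i + 1] != reference[i + 1] ||
            test[i + 2] != reference[i + 2] ||
            test[i + 3] != reference[i + 3])
            ++wrong;
    }
    return wrong;
}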
 
Personally, I don't really see a problem with looking for a worst-case scenario, be that synthetically or through screenshots in a real-world situation.

However, I do think that it very much has to be tempered and balanced against normal, real-world screenshots that show image quality in a more general sense, to give the average user an idea of what they will be seeing.
 
Worst-case scenarios do need to be shown because there's no telling how often that scenario may crop up in the future. However, this needs to be balanced with relevant, useful commentary from the person who holds the card. Is this likely to happen often? Did it lead to aliasing? I'm not sure movies are practical, as you would need several. We may be back in the "trusting the reviewer's opinion" arena.
 
mjtdevries said:
That would also mean that a standard like DX9 should define things better. Full precision would not be both FP24 and FP32, but only one of them. (Otherwise you cannot do a bit comparison.)

Then you can start to measure IQ differences objectively and reproducibly. The number of wrong pixels would be the IQ indicator.

Well, in most cases there shouldn't be a difference between FP24 and FP32, because it all gets rendered to an 8-bit framebuffer. In my opinion, +/- 1 from the proper result is fine; any more than that is a problem.
 
OK, so a +/- 1 difference is acceptable. That is fine by me, but we need to define it a little more.
Do you mean +/- 1 in the RGB value of a pixel that is displayed on screen? (In that case we can easily use our comparison tool on the pixels again.)
As shaders get longer, I can imagine that the difference between FP24 and FP32 results grows bigger.
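
A rough way to test that, assuming FP24 can be approximated by dropping the low 7 mantissa bits of an IEEE float (R3xx's FP24 has a 16-bit mantissa vs. FP32's 23 bits; the hardware's exact rounding is an assumption here, and the "shader" is an invented MAD chain):

Code:
#include <cstdint>
#include <cstdio>
#include <cstring>

// Approximate FP24 by zeroing the low 7 mantissa bits of a float
// (23-bit mantissa -> 16-bit). Illustrative only, not the hardware's
// exact rounding behaviour.
float ToFP24(float v)
{
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    bits &= 0xFFFFFF80u;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

int main()
{
    // A stand-in for a "long shader": 64 chained multiply-adds.
    float full = 0.1f, reduced = ToFP24(0.1f);
    for (int op = 0; op < 64; ++op)
    {
        full    = full * 1.01f + 0.001f;
        reduced = ToFP24(ToFP24(reduced * 1.01f) + 0.001f);
    }
    // Quantize to an 8-bit channel and check the +/- 1 rule; the
    // divergence grows with the length of the operation chain.
    int a = static_cast<int>(full    * 255.0f + 0.5f);
    int b = static_cast<int>(reduced * 255.0f + 0.5f);
    std::printf("float diff %g, 8-bit %d vs %d\n",
                static_cast<double>(full - reduced), a, b);
}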
 
About the "trust the reviewers" bit.

How many people here (including the ones who are reviewers themselves) trust other people's reviews enough for that?

How many trust other people to judge trilinear filtering IQ issues when no movies or screenshots are (/can be) supplied to let the reader judge for themselves?

Personally I only trust two reviewers enough for that.
 
I don't trust them either, but putting movies up for download simply isn't practical. One movie isn't enough, since then we're back to "worst case" syndrome.
 
That's certainly not a worst-case scenario, but a very good review of IQ.

I didn't know about the fallout from telling that truth. It's really sad that this happens.

Have other companies taken similar actions against review sites as a result of an unfavourable review? ATI, Matrox, XGI, S3, etc.?

I had almost forgotten how big IQ differences were just a short time ago. It makes the current IQ discussion about tiny differences look a bit silly.
 
IQ was never given ANY importance. I hate to bring up old reviews, but you can see that performance was the only thing people were concerned about a couple of years ago. Anandtech shows the image quality issues that plagued the old GeForce series with S3TC, and mentions the 50% performance hit for disabling it, yet at the end of the review showing performance leaves S3TC enabled :!:

http://www.anandtech.com/video/showdoc.html?i=1288&p=8

This trend has not changed; look at any Guru3D review for a good laugh.
 
The solution isn't huge uncompressed movies, enlarged high-contrast screenshots, or mathematical formulas that compare bit patterns. There is a much simpler way - a trusted reviewer's opinion. If Dave says he can't notice any IQ difference, then I'll trust him on that. That's why we have reviews.
 
The trusted reviewer is a simple solution, but not a very practical one, because there are very few reviewers I trust enough for that.

I also trust Dave, but he doesn't do many comparisons between competing cards, so that is not enough when I want to decide which card to buy.

I trust Brent, but he tests by setting speed to a fixed value and showing the resulting IQ.

It is a very good method of comparison, but I would then also like to see the opposite kind of comparison: keeping IQ fixed and showing the resulting speed difference. Unfortunately there is no site that does that.

And the second problem is that if there were a site that did that, I wouldn't trust the reviewer enough.

Because IQ is subjective and also depends on the type of games you play, I don't think that two reviews are enough to decide which card to buy.
 