Why don't benchmarks capture and compare some random frames?


g__day
19-Sep-2003, 14:36
Looking through Aquamark 3's ability to capture selected frames - which I very much like - I got to wondering why benchmarkers don't go one step further: capture maybe 10 - 30 random frames, compare these to software generated frames, and alert people to any loss of image fidelity detected?

Surely this would be easy to do and would practically eliminate all but incredibly sophisticated cheats that alter image quality?
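A minimal sketch of the proposal, under stated assumptions: frames are flat lists of (r, g, b) tuples, and the names `pick_frames`, `mismatch_fraction` and `check_run` are made up for illustration, not taken from any benchmark. A real benchmark would presumably choose the seed at run time so the captured frames can't be predicted.

```python
import random

def pick_frames(total_frames, count, seed):
    """Choose `count` distinct frame indices; the seed makes the choice repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), count))

def mismatch_fraction(captured, reference):
    """Fraction of pixels that differ at all between two same-sized frames."""
    if len(captured) != len(reference):
        raise ValueError("frames must have the same number of pixels")
    differing = sum(1 for c, r in zip(captured, reference) if c != r)
    return differing / len(captured)

def check_run(captured_frames, reference_frames, limit):
    """Return the frame indices whose mismatch fraction exceeds `limit`.

    Both arguments are dicts mapping frame index -> frame (hypothetical layout).
    """
    return [
        i for i in sorted(captured_frames)
        if mismatch_fraction(captured_frames[i], reference_frames[i]) > limit
    ]
```

An exact per-pixel test like this is the strictest possible check; whether real hardware could ever pass it is a separate question.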

What do you guys and gals think of this proposal? It might save reviewers a lot of time and preserve the integrity of posts on the hall of fame of quite a few benchmarks.

Imagine running 3DMark on Det xx and getting a message that says - "Congratulations - you got top score - however, as we detected a 68% variation in selected image quality in 20 random frames, we thought we'd tell you and then throw this benchmark result in the bin - Please feel free to press this button to send a 'how pissed off am I now' message to NVidia - and have a nice day. :)"

Ostsol
19-Sep-2003, 17:27
Graphics hardware is not standardized enough for such a comparison to be useful, most of the time. With the various differences in texture filtering, LOD settings, FSAA sample patterns, precision, etc., it is possible that no video card will be able to match the reference rasterizer.
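One way to picture the problem is a comparison with a per-channel tolerance instead of an exact match. This is only a sketch: the function name and the 0-255 channel scale are assumptions, and no tolerance value is suggested by anyone in the thread.

```python
def pixels_outside_tolerance(captured, reference, tolerance):
    """Count pixels where any channel differs by more than `tolerance` (0-255 scale)."""
    if len(captured) != len(reference):
        raise ValueError("frames must have the same number of pixels")
    return sum(
        1 for c, r in zip(captured, reference)
        if any(abs(a - b) > tolerance for a, b in zip(c, r))
    )

reference = [(100, 100, 100), (200, 50, 25)]
hardware = [(101, 99, 100), (180, 50, 25)]

print(pixels_outside_tolerance(hardware, reference, 0))  # 2: an exact match fails everywhere
print(pixels_outside_tolerance(hardware, reference, 2))  # 1: only the large deviation remains
```

The catch is that any tolerance loose enough to accept legitimate filtering or precision differences may also accept a small deliberate shortcut, so the threshold itself becomes a judgment call.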

MDolenc
19-Sep-2003, 17:45
Plus you would have to write your own software renderer, or require your users to install the DX SDK to get the reference (that is if you write for DX).

Ostsol
19-Sep-2003, 18:02
Well, technically the benchmark could simply come with some screens from the reference rasterizer in an uncompressed format (or lossless format). However, I guess the screenshots would have to account for the different precisions available on current DX9 cards. Thus, there would have to be screens for FP32/FP16 mixes and FP24.
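To illustrate why separate reference sets per precision would be needed, here is a toy calculation that rounds after every operation. It is an assumption-laden sketch: the `shade` function is invented, and only FP16 and FP32 are modelled because Python's `struct` module has no 24-bit float format.

```python
import struct

FP16 = "<e"  # IEEE 754 half precision
FP32 = "<f"  # IEEE 754 single precision

def round_to(fmt, value):
    """Round a Python float to the nearest value representable in `fmt`."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def shade(base, light, fmt):
    """Toy 'shader': base * light + ambient, rounded to `fmt` after each step."""
    ambient = round_to(fmt, 0.05)
    product = round_to(fmt, round_to(fmt, base) * round_to(fmt, light))
    return round_to(fmt, product + ambient)

half = shade(0.7, 0.9, FP16)
single = shade(0.7, 0.9, FP32)
print(half, single, abs(half - single))
```

The two results are close but not identical, and a longer shader gives such differences more chances to accumulate, so a frame rendered at one precision can't be expected to match a reference rendered at another bit for bit.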

BRiT
19-Sep-2003, 23:19
All software-based screenshot capturing of frames has to go through the video driver. It is therefore possible for the driver to alter what is captured. This effectively makes screenshot capturing 100% worthless.

This is discussed in other threads around here. Read them for more expansive answers on why this is the case.

Vince
20-Sep-2003, 09:46
This statistical analysis would also be ignorant of the duality that lies behind a conscious perception of a given image: between quantitative technical precision and qualitative esthetic properties.

The most precise hardware routines with the highest of accuracies are not guaranteed to produce the most esthetically pleasing image to the user.

A prime example of this is 3dfx's hybrid, T-Buffer subset, FSAA algorithms. Something which to this day (and don't ask me why) people claim has among the highest of perceived image qualities. Yet, every time the T-Buffer's jittered/rotated sampling patterns were compared to nVidia's equivalent against a baseline render - 3dfx had, by far, the more deviant image output.

It's very hard to draw a firm line as you propose, and it reduces to a more philosophical question of what qualities are superior: technical precision or esthetic perception. And, quite honestly, in the grand scheme of things, during which particular IHVs will come and go, it's not a very useful metric. Stick with Rev and Baumann, much better bet.

Chris123234
20-Sep-2003, 19:41
He was asking about it looking the same as the reference image, not if it's prettier.

You can always bench w/out AA and AF anyways.

g__day
21-Sep-2003, 02:52
Thanks guys - so the bottom line seems to be DX9.0 doesn't specify a rendered image precisely enough for an unbiased, expert reviewer to say if a hardware rendered frame has visual errors or not?

Shame about that - if the specs were tight enough, automated comparison to reference frames would be possible. What I am hearing is that this is impossible because the standard leaves too much room for interpretation for automated validation to work.

Any idea if DX9.1 or DX10 will close this loophole?

Vince
21-Sep-2003, 07:43
Shame about that - if the specs were tight enough, automated comparison to reference frames would be possible

It's hardly a shame. For example, why should an IHV be "punished" for implementing, say, a superior stochastic sampling construct or any that is pseudo-random, which provides the end-user with a much better esthetic result? Because that's what will happen if what you're advocating is put into practice.

This goes back to my first post which questions what's more important: output precision compared to an arbitrary baseline, or better esthetic quality which will deviate from said baseline.

Why you would want to implement such a system is beyond me. Well, it's obvious why you did - but as I said before, IHVs come and go. Let's not make a fundamentally bad decision in the interim based on one's views of a singular component that will have negative global effects in the long run.

There is no reason that we must turn to [blind] statistical analysis (and shun esthetics) at this juncture when the ability exists to present the end-user with the facts, photos and comparisons - and allow them to choose for themselves.

He was asking about it looking the same as the reference image, not if it's prettier

You totally missed everything I said.

Chris123234
21-Sep-2003, 18:36
You totally missed everything I said.

I thought I did, that's why I added that second part.

Vince
21-Sep-2003, 22:55
I thought I did, that's why I added that second part.

Ok, but then what good is even benchmarking it? If you're not going to run with AA or AF, then why stop there? Arbitrarily turn off all samplings or specific shader programs that don't give the results *you* seek to produce. :roll:

Kinda kills the whole concept, huh?

Chris123234
21-Sep-2003, 22:57
I thought I did, that's why I added that second part.

Ok, but then what good is even benchmarking it? If you're not going to run with AA or AF, then why stop there? Arbitrarily turn off all samplings or specific shader programs that don't give the results *you* seek to produce.

:roll:

A benchmark can tax a card w/o AA and AF. Look at the HL2 figures :roll:

Vince
21-Sep-2003, 23:03
A benchmark can tax a card w/o AA and AF. Look at the HL2 figures :roll:

What's the purpose of a benchmark to you?

Because to me, one which doesn't provide me with knowledge of how a game will play on my system is useless. And I expect to play with features like AA and AF active and influencing the overall speed of the system. I want to see the impact the features of my card have in terms of real world performance, thank you.

Now, if all you want is to produce some arbitrary numbers that show one IHV is *faster* or *bigger* or *better* than another, then your mentality is no better than that of an IHV who will cheat to gain that edge.

Hell, I'm sure you can produce a benchmark based on DX5 that "taxes" a system too - is that what you want?

Chris123234
22-Sep-2003, 00:12
Wow you must be in a bitchy mood today. I play games @ 1600x1200, which is a resolution where you don't really need AA. Maybe 2X but that's all. I only use AF when I play. AA is just an annoyance to me because it just blurs all the menus.

Myrmecophagavir
22-Sep-2003, 17:16
Then hassle the developers to turn AA off when rendering 2D things. Which game is it?

MfA
26-Sep-2003, 03:57
Maybe some review site should get a digital framegrabber and compare video output for 3DMark against refrast with some ridiculous supersampling ratio (16x16 should do, of course rendering would take a while) with some image quality metric (I like SSIM (http://www.cns.nyu.edu/~zwang/files/research/ssim/)). Dunno how well such tests approximate how offensive cheats/texture/line-aliasing/etc are to the human eye, but at least it's objective.
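For a feel of what an SSIM-style metric computes, here is a heavily simplified sketch. It assumes grayscale images given as flat lists of 0-255 values and scores the whole image as a single window, whereas the published method averages the index over many local windows; the constants 0.01 and 0.03 are the commonly quoted defaults.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length grayscale pixel lists."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilises the contrast/structure term
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

reference = [10, 50, 90, 130, 170, 210]
flattened = [110] * 6  # same average brightness, all detail lost

print(global_ssim(reference, reference))
print(global_ssim(reference, flattened))
```

An identical image scores 1, while the flattened one scores close to 0 even though its mean brightness is unchanged, which is the kind of structural loss a plain average-difference figure can understate.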

"Cheating" with software screencaptures would be easy enough to find with a framegrabber at the very least.

g__day
26-Sep-2003, 22:23
Which was the intent of my original post, MfA + others. I see 3d graphics as modelling the real world based on mathematics. If the formulae are specified precisely enough then they are comparable according to agreed standards and subjectivity is avoided.

I do not see this as a bad thing or one that limits creativity, so long as you have a standard and it's fit for purpose. Having a standard that isn't close enough for comparison lets a lot of subjectivity in and the marketers have a field day - battle by superlatives. I'd rather the engineers and game developers duke it out myself.

MfA
27-Sep-2003, 03:31
Well intent is nice, but the thread would have gone better if you had started with a concrete proposal how to go about it :)

g__day
28-Sep-2003, 01:09
Ah but if I knew that I'd be a very clever person indeed. Why presume someone who wants to learn already has the knowledge when they ask a question? I know what I'd like - I understand the maths - I don't know what the 3d API -> 3d driver will do, hence my questions about the standards and the challenges.

I raised this idea with Massive Development, who partially answered my questions about this and used a Reference software rasteriser to analyse frame 4,000 of their benchmark - so obviously something can be done.

http://arc.aquamark3.com/forum/showthread.php?s=&threadid=399

3dGPU has hosted a gif showing the rendering difference on a software renderer vs NVidia and ATi hardware - gamma differences aside. Interestingly the RefRast is rendering at 0.6 FPS which seems pretty fast.

http://www.beyond3d.com/forum/viewtopic.php?t=8184

this gif shows the results

http://www.3dgpu.com/files/refrastcomp.gif

So to all who said it can't be done - well you can head in this direction with some success!

MfA
29-Sep-2003, 02:01
It can be done, but as long as you are relying on software grabbing instead of an external TMDS grabber you are exposing yourself to cheats.

I disagree with wired, but I'm too lazy to register on that board.

Why is the vegetation entirely different between the reference shot and the 5600 one?

Ostsol
29-Sep-2003, 04:17
Why is the vegetation entirely different between the reference shot and the 5600 one?
The vegetation in the reference shot is entirely different from the 9800 Pro shot as well. Either everyone's cheating or that particular reference shot is kinda useless. Just goes to show ya that just because it's a reference shot, it doesn't mean it's right. After all, developers don't program games using a reference rasterizer.

MfA
29-Sep-2003, 05:44
Games yes, but benchmarks should aim at being reproducible and fair. Having the vegetation planted randomly, or being so sloppy with numerical precision that floating point imprecision in the vertex shader determines the final positions, is not something I appreciate in a benchmark.
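A sketch of what reproducible placement could look like, assuming a seeded generator and integer grid cells. The function name and grid layout are invented for illustration and say nothing about how Aquamark actually places its vegetation.

```python
import random

def place_vegetation(seed, count, grid_size):
    """Return `count` integer (x, y) grid cells; the same seed gives the same layout."""
    rng = random.Random(seed)
    return [(rng.randrange(grid_size), rng.randrange(grid_size)) for _ in range(count)]

layout_a = place_vegetation(42, 100, 1024)
layout_b = place_vegetation(42, 100, 1024)
print(layout_a == layout_b)  # True: the layout does not change between runs

# Floating point sums depend on evaluation order, which is the kind of
# imprecision that can move a position computed on the fly.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

Fixing the positions as integers up front means every renderer starts from identical input, so any remaining difference in the frame comes from rendering and not from where the plants ended up.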