3DMark03 Mother Nature IQ comparison b/w FX & R300

Joe, the original archive of cho contains a pic taken with the 42.82 drivers, which seem to render "correctly".

I hate to quote myself, but what the heck. :)

mr said:
First look:

There are slight discrepancies to the refrast in every pic, but as most of them are "single pixel errors" I guess these are just slight differences in sample positions? The seed values seem OK to me.

The R300 comes fairly close to the refrast. There are slight differences in the riverbed stones and the water. Overall the pic is IMO a shade darker.

The NV30 with 42.72 is obviously missing the bright sun. The waterbed differences are similar to the R300's, but the waterline seems to be lower.
The grass in the background seems to look different because of the lighting issue.
With the 42.82 drivers everything seems to be OK. This looks pretty much like the R300 pic, with the same (small) differences compared to the refrast.
 
sure enough it looks great. i checked it out and they are almost dead on with the reference image and the radeon as far as image quality goes. there were a few leaves out of place between all the shots, but so few that it is barely worth noting. now all we need is benchmarks for the 42.82 drivers to at least get a slightly better performance comparison of the cards.
 
d_player33 on Overclockers Australia pointed this out:

Radeon:
ATi.gif


GFFX:
FX.gif


The Radeon image isn't just brighter, it has more details.
 
GreenBeret said:
d_player33 on Overclockers Australia pointed this out:

Radeon:
ATi.gif


GFFX:
FX.gif


The Radeon image isn't just brighter, it has more details.

So he rendered the ATI picture in higher resolution and he got more details?
What a surprise...
 
pcchen said:
I wrote a shader to test R300's mantissa, and I found that it is 16 bits (not counting sign bit). So I guess R300's FP24 is s16e7?

My friend tested the same shader on NV30 and found out that its mantissa is 10 bits. So I think the shader is correct...

That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the 1.) for computations, when needed.

We don't have the 2^127 as the largest number, it's 1.999*2^63 -- Smallest is 2^-64. The range was deemed large enough for most items (1.8*10^19), while giving us 17b of mantissa, which is more than enough for texture lookup (2k texture requires 11b, plus 4b subprecision takes you to 15b -- The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation) Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.
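
The bit budget described above can be sanity-checked with a few lines of Python. This is only a rough sketch built from the numbers in the post; the names (`fp24_summary`, `texture_address_bits`) are made up for illustration, and it says nothing about how the hardware is actually implemented.

```python
# Sketch of the s16e7 bit budget, using only the figures quoted in the post.
# All names here are hypothetical; nothing below is actual driver or hardware code.

SIGN_BITS = 1
EXPONENT_BITS = 7
STORED_MANTISSA_BITS = 16
HIDDEN_BITS = 1  # the implicit leading "1." added for computations


def fp24_summary():
    """Return (total storage bits, working mantissa bits, largest, smallest)."""
    total_bits = SIGN_BITS + EXPONENT_BITS + STORED_MANTISSA_BITS
    working_mantissa = STORED_MANTISSA_BITS + HIDDEN_BITS
    # Largest value: mantissa just under 2.0, exponent 63 (as stated in the post).
    largest = (2.0 - 2.0 ** -STORED_MANTISSA_BITS) * 2.0 ** 63
    # Smallest value: 2^-64 (as stated in the post).
    smallest = 2.0 ** -64
    return total_bits, working_mantissa, largest, smallest


def texture_address_bits(texture_size=2048, subprecision_bits=4):
    """Bits needed to address a texel plus sub-texel precision."""
    texel_bits = (texture_size - 1).bit_length()  # 2048 texels -> 11 bits
    return texel_bits + subprecision_bits


if __name__ == "__main__":
    total, mantissa, largest, smallest = fp24_summary()
    needed = texture_address_bits()
    print("storage bits:", total)
    print("working mantissa bits:", mantissa)
    print("largest value: %.4e" % largest)
    print("smallest value: %.4e" % smallest)
    print("texture addressing bits:", needed)
    print("spare mantissa bits:", mantissa - needed)
```

The spare bits in the last line are the "extra two bits" the post mentions for absorbing error in the addressing math.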
 
sireric said:
That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the 1.) for computations, when needed.

We don't have the 2^127 as the largest number, it's 1.999*2^63 -- Smallest is 2^-64. The range was deemed large enough for most items (1.8*10^19), while giving us 17b of mantissa, which is more than enough for texture lookup (2k texture requires 11b, plus 4b subprecision takes you to 15b -- The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation) Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.

As a point of interest, nVidia claims that 32-bit FP is necessary for proper accuracy on texturing operations. This may be the reason that the Quadro FX apparently has more accurate rendering than the ATI FireGL X1.
 
sireric said:
That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the 1.) for computations, when needed.

We don't have the 2^127 as the largest number, it's 1.999*2^63 -- Smallest is 2^-64. The range was deemed large enough for most items (1.8*10^19), while giving us 17b of mantissa, which is more than enough for texture lookup (2k texture requires 11b, plus 4b subprecision takes you to 15b -- The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation) Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.

Thanks for your explanation. It clears up some questions :)

Chalnoth said:
As a point of interest, nVidia claims that 32-bit FP is necessary for proper accuracy on texturing operations. This may be the reason that the Quadro FX apparently has more accurate rendering than the ATI FireGL X1.

I don't know much about the effects of texture addressing precision. However, the accuracy of the pixel shader is not necessarily tied to texture addressing precision.
 
good point Joe. :D


also, it is bound to be worth well more than a million to them as they sure are not forthcoming with answers. :rolleyes:
 
sireric said:
That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the 1.) for computations, when needed.

We don't have the 2^127 as the largest number, it's 1.999*2^63 -- Smallest is 2^-64. The range was deemed large enough for most items (1.8*10^19), while giving us 17b of mantissa, which is more than enough for texture lookup (2k texture requires 11b, plus 4b subprecision takes you to 15b -- The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation) Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.

yes indeed, sireric, thanks for the explanation.
so you essentially traded ieee754 'forward' compatibility (i.e. fp24 could have been an ieee754 sans the last byte) for an extra bit of mantissa. the drop in range itself is really nothing to worry about, but you also lost precision in the [0, 1) range (particularly 0 + epsilon) - i'm very curious what considerations you had when deciding on the tradeoff (yeah, i know i may be asking for too much inside info)
 
Hyp-X said:
So he rendered the ATI picture in higher resolution and he got more details?
What a surprise...

They are the images you saw on the first page (in that review). He didn't enlarge the GFFX pic enough, though, hence the size difference.

Whoever can't see the differences will need an eye test, IMO.
 
Chalnoth said:
You know what? That's really sad. The NV30 emulation is many times faster. Takes a few seconds to render a frame in the Dawn demo.

Hmm. Do you think the 3DMark03 image tests take the screenshots while running the benchmark? That may be why it took 10 hours, because he had to go through the entire benchmark through the refrast.
 
pcchen said:
I wrote a shader to test R300's mantissa, and I found that it is 16 bits (not counting sign bit). So I guess R300's FP24 is s16e7?

My friend tested the same shader on NV30 and found out that its mantissa is 10 bits. So I think the shader is correct...

Does this mean that NVidia's drivers are using 16-bit FP for PS_2_0? i.e. the low pixel shader benchmarks are from FP16, and FP32 would be even lower?

If so, that doesn't look good at all for NV3x. Before, we thought that using FP16 all the time might be a hack that NVidia could use to save face, but it looks like they are already using it!

Humus, if you're reading this thread, could you put a "zoom factor" label in your mandelbrot demo so that we can see at what point we get precision artifacts?
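
For anyone wanting to reproduce the mantissa reasoning without a GPU, here is a rough CPU-side analogue in Python. It is not pcchen's shader (which isn't shown in this thread); it just rounds values through IEEE half and single precision using the standard `struct` module and counts how many fraction bits survive. The helper names are hypothetical, and it assumes round-to-nearest-even, which may not match what the hardware does.

```python
import struct


def round_to(fmt, x):
    """Round a Python float to the IEEE format named by a struct code.

    '<e' is half precision (16-bit), '<f' is single precision (32-bit).
    """
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def stored_mantissa_bits(fmt):
    """Count stored fraction bits by probing 1 + 2^-n until it rounds back to 1."""
    bits = 0
    while round_to(fmt, 1.0 + 2.0 ** -(bits + 1)) != 1.0:
        bits += 1
    return bits


if __name__ == "__main__":
    print("half precision mantissa bits:", stored_mantissa_bits("<e"))
    print("single precision mantissa bits:", stored_mantissa_bits("<f"))
```

IEEE half precision comes out at 10 stored bits with this probe, which would be consistent with the 10-bit result reported for NV30 if it was running the shader at FP16.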
 
Mintmaster said:
That may be why it took 10 hours, because he had to go through the entire benchmark through the refrast.

Nope. I rendered 1 frame(!) using the ref rast, and it took ~10 hours (using the IQ test). :D
 
wow, now if that is not a reason to be thankful for dedicated video hardware, i cannot think of what would be. ;)
 
kyleb said:
wow, now if that is not a reason to be thankful for dedicated video hardware, i cannot think of what would be. ;)

1 frame per ~10 hours with the CPU, and with today's cards you get ~20 frames/sec (or more) in the same scene! :D We have come a long way!
 