3DMark03 Mother Nature IQ comparison b/w FX & R300

worm[Futuremark] said:
1 frame/~10 hours with the CPU, and with today's cards you get ~20 frames/sec (or more) in the same scene! :D We have come a long way!

Or you used a crappy slow CPU... :rolleyes:
 
Kristof said:
worm[Futuremark] said:
1 frame/~10 hours with the CPU, and with today's cards you get ~20 frames/sec (or more) in the same scene! :D We have come a long way!

Or you used a crappy slow CPU... :rolleyes:
Weeeeelll.. Not THAT crappy! ;) Rendering a frame from GT4 (in 3DMark03) is pretty hefty for a CPU.
 
darkblu said:
sireric said:
That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the 1.) for computations, when needed.

We don't have the 2^127 as the largest number, it's 1.999*2^63 -- Smallest is 2^-64. The range was deemed large enough for most items (1.8*10^19), while giving us 17b of mantissa, which is more than enough for texture lookup (2k texture requires 11b, plus 4b subprecision takes you to 15b -- The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation) Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.

yes indeed, sireric, thanks for the explanation.
so you essentially traded ieee754 'forward' compatibility (i.e. fp24 could have been an ieee754 float sans the last byte) for an extra bit of mantissa. the drop in range itself is really nothing to worry about, but you also lost precision in the [0, 1) range (particularly 0 + epsilon) - i'm very curious what considerations you had when deciding on the tradeoff (yeah, i know i may be asking for too much inside info)
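(A minimal sketch of how a 24-bit s16e7 value could be decoded, assuming an exponent bias of 64 so the range matches the 2^-64 to 1.999*2^63 figures quoted above; the bias and the treatment of zero/denormals are my assumptions, not confirmed R300 details:)

```python
# Hypothetical decoder for a 24-bit float laid out as s1 e7 m16, following the
# s16e7 description above. The exponent bias of 64 is an assumption chosen so
# that the range matches the quoted 2^-64 ... 1.999*2^63; how zero and any
# denormals are encoded is not stated, so they are ignored here.

def decode_fp24(bits: int) -> float:
    s = (bits >> 23) & 0x1    # sign bit
    e = (bits >> 16) & 0x7F   # 7-bit exponent field, assumed bias of 64
    m = bits & 0xFFFF         # 16 stored mantissa bits

    # The implicit leading 1 gives the 17-bit mantissa used for computation.
    value = (1.0 + m / 2**16) * 2.0 ** (e - 64)
    return -value if s else value

# Largest value: e = 127, all mantissa bits set -> (2 - 2^-16) * 2^63 ~= 1.8e19
print(decode_fp24((127 << 16) | 0xFFFF))
# Smallest normalized value: e = 0, mantissa 0 -> 2^-64 ~= 5.4e-20
print(decode_fp24(0))
```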

Well, the exponent range is symmetrical, or almost, allowing us to get to a (epsilon) value of 1.0*2^(-64) (==5.4*10^(-20)) -- Which seems acceptable in the lower range. Given that the final displayable output is 10b only, both our range and precision have been found to be "perfect" in all test cases run through. I'm sure I could come up with some long shader that would show a few lsb bits of error, but that would not, generally, translate to something visible. For the test applications and games we've tested, (i.e. 3dmark, D3), we're within an lsb after the dozen or so instructions executed. Again, the balance seems to have been struck dead on. In every case, any visual errors found on the output will be due to the applications using textures that are too small or without enough precision -- Those errors are orders of magnitude larger than our internal precision possible errors. I could imagine that the VS might have more issues with a non-32b shader, but the R3xx class of products all have 32b VS, so there's no issue there.

The difference between 16b and 24b is significant, the difference, we've found, between 24b and 32b is very small and dwarfed by other limits (i.e. applications using textures that are too small or without sufficient precision).

We've even run RenderMan shaders, using our Ashli compiler, that are thousands of instructions long -- The output is perfect. You can run complex Maya models, and using the Ashli plug-in, run them in real time, and I cannot tell the difference between the software rendered and the HW rendered version, except that the HW version is much, much faster. Of course the prelim version of the plug-in doesn't offer all the features of the tool, so it cannot "replace" the SW renderer in all cases, but it's a great add-on and can be used for a lot of modeling.
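(As a rough illustration of the 16b vs 24b vs 32b point above, the relative error of one LSB implied by each format's stored mantissa width -- 10, 16 and 23 bits respectively -- can be compared against the 10-bit displayable output. This is just textbook float arithmetic, not a measurement of the actual hardware:)

```python
# Rough relative-precision comparison based only on stored mantissa width:
# fp16 (s10e5), fp24 (s16e7 as described above) and fp32 (s23e8).
# One LSB of relative error is about 2^-(stored mantissa bits).

for name, mant_bits in [("fp16", 10), ("fp24", 16), ("fp32", 23)]:
    ulp = 2.0 ** -mant_bits
    print(f"{name}: ~{ulp:.2e} relative error per LSB")

# Against a 10-bit displayable output (one step = 1/1024 ~= 9.8e-4), fp24's
# ~1.5e-5 per-operation error leaves a lot of headroom, while fp16's ~9.8e-4
# is already at the output quantization limit.
```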
 
Himself said:
BTW, worm, your 3dmark link is invalid, perhaps it has not been published. :)
:oops: I played around with some new things some time ago (only visible to us, sorry) and deselected it, and forgot to select it back. Works now! Thanks for pointing it out.. ;)
 
Mintmaster said:
Does this mean that NVidia's drivers are using 16-bit FP for PS_2_0? i.e. the low pixel shader benchmarks are from FP16, and FP32 would be even lower?

If so, that doesn't look good at all for NV3x. Before, we thought that using FP16 all the time might be a hack that NVidia could use to save face, but it looks like they are already using it!

Perhaps there are some problems in my shader program; I can't verify it, since both refrast and R300 give the same precision for the normal and PP versions of my pixel shaders, which is expected. My friend Cho tested the program on a FX and the result is 10 bits for both versions. It could be a driver issue, though.
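(For reference, a common way to arrive at such a "10 bits" figure is a probe that finds the largest k for which 1 + 2^-k is still distinguishable from 1; whether the shader in question works this way is not stated, so this is only a sketch of the idea, using numpy's float16/float32 as stand-ins for the GPU formats:)

```python
# Minimal sketch of the usual mantissa-width probe (not necessarily what the
# shader in question does): find the largest k for which (1 + 2^-k) - 1 is
# still non-zero in the format under test.
import numpy as np

def mantissa_bits(dtype) -> int:
    k = 0
    one = dtype(1.0)
    while dtype(one + dtype(2.0 ** -(k + 1))) - one != 0:
        k += 1
    return k

print(mantissa_bits(np.float16))  # 10 stored bits, what a pure fp16 path would report
print(mantissa_bits(np.float32))  # 23 stored bits
```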
 
It has been stated before that there is only a minor performance difference between fp16 and fp32 in the NV3x architecture. Therefore, it is hard to believe that performance would drop drastically if NVIDIA enabled fp32 in the drivers. Also, I believe the lack of precision is only present in the 42.72 drivers, which seem to have been developed for benchmarking against the R350. The 43-series drivers seem to bring performance back to normal for the CineFX architecture (at least in Tom's benchmarks).
 