darkblu said:
sireric said:
That is correct. s16e7 -- I store it normalized, and it's s+e7+16 = 24b of storage. I expand the mantissa to 17b (adding the implicit leading 1) for computations, when needed.
We don't have 2^127 as the largest number; it's 1.999*2^63, and the smallest is 2^-64. The range (up to about 1.8*10^19) was deemed large enough for most items, while giving us 17b of mantissa, which is more than enough for texture lookup: a 2k texture requires 11b, plus 4b of subprecision takes you to 15b. The extra two bits improve precision in computations and reduce the probability of introducing errors in the max texture addressing computation. Our choice of 24b total was based on this -- enough to cover all texture addresses and most numerical items as well; a "good" balance, imho.
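To make the numbers above concrete, here is a minimal sketch of decoding an s16e7 value. The bit order (sign, then exponent, then mantissa), the bias of 64, and the name `fp24_decode` are assumptions chosen to reproduce the stated range of 2^-64 to 1.999*2^63; the post does not say how zero, infinities, or NaNs are encoded, so the sketch ignores them.

```python
# Assumed layout (not confirmed by the post): [sign:1][exponent:7][mantissa:16]
EXP_BITS = 7
MAN_BITS = 16
BIAS = 64  # assumption: maps the 7-bit exponent field 0..127 to -64..+63


def fp24_decode(bits: int) -> float:
    """Decode a 24-bit s16e7 pattern, treating every pattern as normalized."""
    sign = (bits >> (EXP_BITS + MAN_BITS)) & 0x1
    exp = (bits >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = bits & ((1 << MAN_BITS) - 1)
    significand = (1 << MAN_BITS) | man  # 17 bits once the implicit 1 is added
    value = significand * 2.0 ** (exp - BIAS - MAN_BITS)
    return -value if sign else value


largest = fp24_decode(0x7FFFFF)   # all exponent and mantissa bits set
smallest = fp24_decode(0x000000)  # exponent field 0, mantissa 0

# Texture addressing budget from the post: 2k texels plus 4b of subprecision.
address_bits = (2048 - 1).bit_length() + 4
spare_bits = (MAN_BITS + 1) - address_bits

print(largest, smallest, address_bits, spare_bits)
```

Under these assumptions the largest value is just under 2^64 (about 1.8*10^19) and the smallest is 2^-64, and the 17b significand leaves two bits beyond the 15b needed for addressing.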
yes indeed, sireric, thanks for the explanation.
so you essentially traded ieee754 'forward' compatibility (i.e. fp24 could have been an ieee754 single sans the last byte) for an extra bit of mantissa. the drop in range itself is really nothing to worry about, but you also lost precision in the [0, 1) range (particularly at 0 + epsilon) - i'm very curious what considerations you had when deciding on the tradeoff (yeah, i know i may be asking for too much inside info)
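For comparison, the "ieee754 sans the last byte" alternative can be sketched by zeroing the low byte of a 32-bit float, which leaves 1 sign bit, 8 exponent bits and 15 mantissa bits. The helper name `truncate_fp32_to_24` is made up for illustration, and the sketch truncates rather than rounds.

```python
import struct


def truncate_fp32_to_24(x: float) -> float:
    """Keep only the top 3 bytes of the IEEE 754 single encoding of x."""
    raw = struct.pack(">f", x)  # big-endian: sign and exponent come first
    return struct.unpack(">f", raw[:3] + b"\x00")[0]


# 15 stored mantissa bits: a step of 2**-15 above 1.0 survives, 2**-16 does not.
print(truncate_fp32_to_24(1 + 2 ** -15) == 1 + 2 ** -15)
print(truncate_fp32_to_24(1 + 2 ** -16) == 1.0)

# The 8-bit exponent still reaches values far below 2**-64.
print(truncate_fp32_to_24(2.0 ** -100) == 2.0 ** -100)
```

So the truncated format keeps magnitudes well below 2^-64 at the cost of one mantissa bit, which is the tradeoff being asked about.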
Well, the exponent range is symmetrical, or almost, allowing us to get to an epsilon value of 1.0*2^(-64) (==5.4*10^(-20)) -- which seems acceptable in the lower range. Given that the final displayable output is 10b only, both our range and precision have been found to be "perfect" in all test cases run through. I'm sure I could come up with some long shader that would show a few lsbs of error, but that would not, generally, translate to something visible. For the test applications and games we've tested (e.g. 3dmark, D3), we're within an lsb after the dozen or so instructions executed. Again, the balance seems to have been struck dead on. In every case, any visual errors found on the output will be due to the applications using textures that are too small or without enough precision -- those errors are orders of magnitude larger than our possible internal precision errors. I could imagine that the VS might have more issues with a non-32b shader, but the R3xx class of products all have 32b VS, so there's no issue there.
The difference between 16b and 24b is significant. The difference between 24b and 32b, we've found, is very small and dwarfed by other limits (e.g. applications using textures that are too small or without sufficient precision).
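A rough way to get a feel for this claim is to round every intermediate result to a given number of significant bits and measure the drift in units of a 10b output lsb. The sketch below is illustrative only: the 12-step multiply-add chain and its constants are invented, "16b" is assumed to mean an s10e5 format with an 11b significand, exponent range limits are ignored, and `quantize` and `run_chain` are hypothetical helpers, not anything taken from the hardware.

```python
import math


def quantize(x: float, sig_bits: int) -> float:
    """Round x to sig_bits significant bits (implicit leading 1 included)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(round(m * (1 << sig_bits)), e - sig_bits)


def run_chain(sig_bits: int, steps: int = 12) -> float:
    """A made-up shader-like chain: 12 multiply-adds, rounded after every op."""
    x = quantize(0.7, sig_bits)
    for _ in range(steps):
        x = quantize(x * 0.83, sig_bits)
        x = quantize(x + 0.11, sig_bits)
    return x


reference = run_chain(53)  # full double precision as the baseline
LSB_10B = 1.0 / 1023.0     # one step of a 10b displayable output

# Significand widths: 11b (assumed 16b format), 17b (24b format), 24b (fp32).
errors_in_lsb = {
    bits: abs(run_chain(bits) - reference) / LSB_10B for bits in (11, 17, 24)
}
for bits, err in errors_in_lsb.items():
    print(f"{bits}b significand: {err:.6f} output lsb")
```

For this particular chain the 17b result stays well under one output lsb; a real shader with cancellation or large dynamic range could behave differently.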
We've even run RenderMan shaders, using our Ashli compiler, that are thousands of instructions long -- the output is perfect. You can take complex Maya models and, using the Ashli plug-in, run them in real time, and I cannot tell the difference between the software-rendered and the HW-rendered version, except that the HW version is much, much faster. Of course the prelim version of the plug-in doesn't offer all the features of the tool, so it cannot "replace" the sw renderer in all cases, but it's a great add-on and can be used for a lot of modeling.