Does crippled DP performance hold back gamers or not?

As Andrew said before, fixed point math doesn't need any extra hardware support. Regular integer math is enough.
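
Just to make that concrete, here is a minimal sketch (plain C++, the names are mine) of 16.16 fixed point add and multiply built entirely from ordinary integer instructions:

```cpp
#include <cstdint>

using Fix16_16 = int32_t;   // 16 integer bits . 16 fraction bits

// Addition is just integer addition.
inline Fix16_16 fixAdd(Fix16_16 a, Fix16_16 b) { return a + b; }

// Multiplication widens to 64 bits and shifts the extra fraction bits away.
inline Fix16_16 fixMul(Fix16_16 a, Fix16_16 b)
{
    return Fix16_16((int64_t(a) * int64_t(b)) >> 16);
}
```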

When converting a fixed point result (or loading a stored fixed point value) to float, you need one extra instruction (a 32 bit float multiply) to scale the fixed point value to the wanted range. Also, most GPUs support loading normalized fixed point values directly from memory as floats (normalized to [0,1] or [-1,+1]). This special case saves one ALU instruction.
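
A small sketch of that conversion (plain C++ rather than shader code; the scale constant is just an example):

```cpp
#include <cstdint>

// One extra multiply converts the raw fixed point value to the wanted range.
// For a [0,1] normalized 16 bit value the scale would be 1.0f / 65535.0f.
inline float fixedToFloat(uint16_t raw, float scale)
{
    return float(raw) * scale;   // the single 32 bit float multiply
}
```

With a normalized vertex or texture format (e.g. R16_UNORM) the fetch hardware applies this scale for you, which is the saved ALU instruction.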

16 bit fixed point values can be loaded directly into the floating point mantissa, and the (2^n) scaling factor can be loaded directly into the float exponent bits. This is only possible if the scaling factor is 2^n (... 1/8, 1/4, 1/2, 1, 2, 4, 8 ...). This way the conversion is completely lossless (no rounding errors from 16 bit fixed -> float). Some GPUs have a single instruction for this (and some shading languages, such as OpenGL ES 3.1's GLSL, have built-in support for it). This is nice if your scale factor is 2^n.
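
A small standalone check of the lossless claim (not GPU code; the 2^-4 scale factor is just an arbitrary power of two):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

int main()
{
    const float scale = std::ldexp(1.0f, -4);      // 2^-4, some 2^n factor
    for (uint32_t raw = 0; raw <= 0xFFFF; ++raw)
    {
        // A 16 bit value fits entirely in the 24 bit float mantissa, and
        // multiplying by 2^n only adjusts the exponent, so this is exact.
        float v = float(raw) * scale;
        // The round trip back to fixed point recovers the original value.
        assert(uint32_t(v / scale) == raw);
    }
    return 0;
}
```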

For example, in our vertex data preprocessor we round the mesh bounding box (xyz channels separately) up to the next power of two before we convert the floating point values to fixed point (scaled to fit the 2^n bounding box exactly). This way the float -> fixed -> back to float conversion doesn't incur any rounding error for vertices that are snapped to an origin centered grid (with 2^n spacing). This is important if you need to snap multiple objects side by side to a grid and don't want to see tiny holes between them (holes occur because of rounding differences).
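
In rough C++ (the function names are mine, not from our actual tool), the idea looks like this:

```cpp
#include <cmath>
#include <cstdint>

// Round a positive bounding box extent up to the next power of two (5.3 -> 8).
inline float roundUpToPow2(float extent)
{
    return std::exp2(std::ceil(std::log2(extent)));
}

// Quantize a coordinate in [-extent, +extent] to signed 16 bit fixed point.
// With a 2^n extent, vertices lying on an origin centered 2^n-spaced grid
// fall on exact integer steps, so the round trip introduces no error.
inline int16_t quantize(float x, float pow2Extent)
{
    long q = std::lround(x / pow2Extent * 32768.0f);
    if (q >  32767) q =  32767;   // +extent itself isn't representable
    if (q < -32768) q = -32768;
    return int16_t(q);
}

inline float dequantize(int16_t q, float pow2Extent)
{
    return float(q) * (pow2Extent / 32768.0f);
}
```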

NVIDIA Kepler ("GK") integer performance isn't as good as AMD GCN integer performance. Shifts are 1/3 rate, 32 bit integer multiply is 1/6 rate, etc. It's still much better than the 64 bit float performance on GeForce cards. Teslas/Quadros can actually do a 64 bit FMA twice as fast as many integer operations (including 32 bit integer multiply). Integer shift rate is also crippled on consumer cards (half rate compared to pro cards). Chart here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/#arithmetic-instructions
 
That said, I wish I could be more precise instead of always worrying about precision from others, when I ain't even an expert and can't even appreciate the difference half the time.
 
Star Citizen is supposed to go to 64 bit precision for larger environments sometime in the next six or so months, so we'll soon see how the cards handle this.
 
According to the Wiki, typical flight simulators have value tables where a plane's approximated aerodynamic characteristics are stored, while X-Plane can take arbitrary plane shapes and compute their characteristics.

Doing so requires solving nonlinear (Navier-Stokes) differential equations, which are generally extremely sensitive to initial conditions and computation precision. The computation is surely done by the CPU and in principle requires double precision.
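
Not a fluid solver, but a toy illustration of that sensitivity: iterate any strongly nonlinear map in float and in double and the two trajectories quickly stop agreeing (here the logistic map stands in for the real equations):

```cpp
#include <cstdio>

int main()
{
    const double r = 3.99;       // chaotic regime of x -> r * x * (1 - x)
    float  xf = 0.5f;
    double xd = 0.5;
    for (int i = 0; i < 500; ++i)
    {
        xf = float(r) * xf * (1.0f - xf);
        xd = r        * xd * (1.0  - xd);
    }
    // After a few hundred iterations the two results are unrelated.
    std::printf("float:  %f\ndouble: %f\n", xf, xd);
    return 0;
}
```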
 
Anyone else wish for a 48-bit word size? It seems like the sweet spot for so much.

Deep color with 16 bits per channel, audio processing with 24-bit PCM stereo pairs, fixed point 32.16 formats, and of course a float with a nice-sized mantissa would all seem to work well with a 48-bit word.
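
The arithmetic does line up neatly; a quick sketch (hypothetical layouts, just to show how the formats add up):

```cpp
#include <cstdint>

struct DeepColorPixel48 { uint16_t r, g, b; };           // 3 x 16 bits = 48
struct StereoSample48   { uint8_t left[3], right[3]; };  // 2 x 24 bit PCM = 48

// 32.16 fixed point: 32 integer bits + 16 fraction bits = 48 bits,
// here carried in the low 48 bits of a 64 bit integer.
using Fixed32_16 = int64_t;
constexpr int kFixed32_16FractionBits = 16;
```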
 