OpenGL guy said: But there's only one texture unit per pixel, so what's the point? Old games use textures, not shaders.
Well, you could still execute an 8-color-op shader in as little as 2 cycles. But the point would be this: if I didn't need the increased precision, then even in a DX9 title I could switch on 8-bit precision and increase my performance further.
When you write a C program, do you always use doubles instead of floats, or 64-bit long words instead of 32-bit? No, you pick the precision you need and no more, if you can get away with it and if it yields higher performance.
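To make that analogy concrete, here's a minimal C sketch of the same choice. The array size, the sum-of-squares workload, and the clock()-based timing are purely illustrative, not taken from any real benchmark:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)  /* illustrative array size */

/* Single precision: half the memory traffic of double, often faster,
 * at the cost of reduced precision. */
static float sumsq_f(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i] * a[i];
    return s;
}

/* Same computation in double precision: more accurate, more bandwidth. */
static double sumsq_d(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * a[i];
    return s;
}

int main(void)
{
    float  *af = malloc(N * sizeof *af);
    double *ad = malloc(N * sizeof *ad);
    if (!af || !ad)
        return 1;

    for (size_t i = 0; i < N; i++) {
        af[i] = (float)i / N;
        ad[i] = (double)i / N;
    }

    clock_t t0 = clock();
    float rf = sumsq_f(af, N);
    clock_t t1 = clock();
    double rd = sumsq_d(ad, N);
    clock_t t2 = clock();

    printf("float : %g (%ld ticks)\n", (double)rf, (long)(t1 - t0));
    printf("double: %g (%ld ticks)\n", rd, (long)(t2 - t1));

    free(af);
    free(ad);
    return 0;
}
```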
If I wrote a 100-op shader, but knew that I didn't need 16-bit FP and could deal with the error, and if running it in 8-bit mode allowed it to execute in 25 cycles, I would do it.
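In the same spirit, here's a toy C sketch of how you'd decide whether the lower-precision error is something you "can deal with". The function being evaluated and the tolerance are made up purely for illustration (compile with -lm):

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 0.73;

    /* "Full precision" path. */
    double hi = sin(x) * sin(x) + cos(x) * cos(x);

    /* "Reduced precision" path: the same math done in float. */
    float xf = (float)x;
    float lo = sinf(xf) * sinf(xf) + cosf(xf) * cosf(xf);

    double rel_err   = fabs(hi - (double)lo) / fabs(hi);
    double tolerance = 1e-3;  /* whatever error the application can live with */

    if (rel_err <= tolerance)
        printf("error %.2e is acceptable -- take the fast path\n", rel_err);
    else
        printf("error %.2e is too large -- stay at full precision\n", rel_err);
    return 0;
}
```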
This is a case where NVidia optimized their design along one axis (shader execution speed) at the cost of other things (AA quality, etc.). They made a tradeoff, and gave programmers more programmability and more control over the precision in the pipeline. ATI made different tradeoffs. The fact that 128-bit runs "slower" than 32-bit or 64-bit is a plus for NVidia, not a point in ATI's favor, no matter which way you slice it. Because, as I said, the real way to look at it is that NVidia runs 128-bit as fast as ATI, but up to 2x faster in 64-bit. It's not a question of "slowing down" but one of "I can choose to speed it up", the same way you can choose to reduce your screen resolution from 1280x1024 to 1024x768, or your color depth from 32-bit to 16-bit.