I was thinking about DemoCoder's statement that, based on the docs, he thought the NV40 was still going to be a multi-precision architecture like the NV30. I didn't read them thoroughly, but the only thing I remembered seeing was multiple-precision data types, not multiple-precision processing. So I decided to go back and see what I could find.
The obvious place to check whether the NV40 would be a multi-precision architecture seemed to be the DX9 optimization doc:
http://developer.nvidia.com/object/gdc_2004_presentations.html
Note that the DX9 optimization document makes no mention of using the _pp hint to optimize shaders, which you would expect to be there if it were still necessary for optimal performance.
Of course, it is possible that nVidia decided this was old news and didn't need mentioning. The paper only describes a couple of optimization techniques, and certainly does not cover all of the dos and don'ts of DX9 programming.
And, then there's this hint from the GLSL paper:
• Supports HLSL-style types – float, half, fixed and equivalent vector, matrix types
– half precision (fp16) is sufficient for most shading calculations (colors, unit vectors)
– faster on GeForce FX series processors
– no penalty on other hardware
Is nVidia only talking about current hardware here? Or the NV40 as well?
That said, the NV30 does benefit from using FP16 over FP32, and given nVidia's past history the NV40 will likely be more of an evolutionary step over the NV30 than a revolutionary one, so it may still be more likely that the NV40 gains from using FP16. This would be the case if nVidia has not nixed the FP register usage performance hit entirely, but has only reduced it. It would also be the case if nVidia has taken the multi-precision architecture a step further and physically increased processing power when using FP16.
So, will the NV40 actually benefit from using FP16 in the shader?
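As an aside on the slide's claim that fp16 is sufficient for colors: here is a rough sanity check of the precision side only (it says nothing about performance on any chip). It is a minimal sketch, assuming "fp16" means the IEEE 754 half-precision layout that Python's struct format 'e' implements. The names to_fp16, levels_ok and max_err are mine, not from any nVidia doc.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value and return it as a float.

    Assumption: this layout stands in for the fp16 format the slide refers to.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Store every 8-bit color level as fp16 in [0, 1], then convert back to 0..255.
levels_ok = all(round(to_fp16(i / 255.0) * 255.0) == i for i in range(256))

# Worst-case absolute rounding error over those same 256 values.
max_err = max(abs(to_fp16(i / 255.0) - i / 255.0) for i in range(256))

print("all 8-bit levels survive an fp16 round trip:", levels_ok)
print("max error below half an 8-bit step:", max_err < 0.5 / 255.0)
```

So storing a final color in half precision loses nothing that an 8-bit framebuffer could show. Error that builds up over a long chain of fp16 math is a separate question that this check does not cover.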