NV30, FP32, and long shaders

KimB

Legend
We all know about the NV3x's FP register usage issue.

But what hasn't been tested is how the performance hit from "too many" FP registers varies with the number of instructions.

Depending on how the NV3x architecture works, using more than a given number of registers may either reduce the IPC by a constant percentage, or only add a delay between successive pixels.

If the performance hit comes from an added delay between pixels, then very long shaders should lose very little performance from using the maximum number of registers available.
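To make the two possibilities concrete, here is a toy cost model in Python. It is only a sketch of the reasoning above, not a description of how the NV3x actually behaves: the function names, the 50% IPC penalty, and the 5-cycle per-pixel delay are all made-up numbers chosen for illustration, and it assumes one instruction per cycle as the baseline.

```python
# Toy model of the two hypotheses. All numbers are hypothetical.

def cycles_constant_ipc(n_instr, penalty):
    """Hypothesis A: every instruction gets slower by a fixed fraction."""
    return n_instr * (1.0 + penalty)

def cycles_per_pixel_delay(n_instr, delay):
    """Hypothesis B: a fixed delay is paid once per pixel."""
    return n_instr + delay

def slowdown(cycles, n_instr):
    """Slowdown relative to an ideal one-instruction-per-cycle baseline."""
    return cycles / n_instr

for n in (10, 100, 1000):
    a = slowdown(cycles_constant_ipc(n, 0.5), n)
    b = slowdown(cycles_per_pixel_delay(n, 5), n)
    print(f"{n:5d} instructions: constant-IPC x{a:.3f}, per-pixel delay x{b:.3f}")
```

Under hypothesis A the slowdown stays the same at any shader length, while under hypothesis B it shrinks toward 1.0 as the shader gets longer. So timing the same register count at several shader lengths should tell the two cases apart.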

Has anybody done a test like this? I don't have an NV3x, so I can't do it myself. Such a test would be best done in OpenGL, because we can be more sure of what the hardware is actually doing.
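For anyone who wants to try, here is a rough sketch of how the test shaders could be generated. It is a small Python script that emits ARB_fragment_program text with a chosen number of temporaries and a chosen number of MAD instructions. The function name and the shader layout are my own invention, and it rests on an untested assumption: that the driver keeps one hardware register per TEMP and does not optimise the chain away. The final ADDs are only there so that no temporary is obviously dead.

```python
def make_fragment_program(n_temps, n_mads):
    """Build ARB fragment program text using n_temps temporaries and n_mads MADs.

    Hypothetical test generator; the driver may still allocate registers
    differently from what the TEMP declarations suggest.
    """
    if n_temps < 2 or n_mads < 1:
        raise ValueError("need at least 2 temporaries and 1 MAD")
    lines = ["!!ARBfp1.0"]
    lines.append("TEMP " + ", ".join(f"r{i}" for i in range(n_temps)) + ";")
    # Write every temporary once so each one holds a defined value.
    for i in range(n_temps):
        lines.append(f"MOV r{i}, fragment.color;")
    # Chain of MADs that cycles through all the temporaries.
    for k in range(n_mads):
        dst = k % n_temps
        src = (k + 1) % n_temps
        lines.append(f"MAD r{dst}, r{dst}, r{src}, fragment.color;")
    # Fold every temporary into the output so none can be trivially dropped.
    for i in range(1, n_temps):
        lines.append(f"ADD r0, r0, r{i};")
    lines.append("MOV result.color, r0;")
    lines.append("END")
    return "\n".join(lines)

if __name__ == "__main__":
    print(make_fragment_program(4, 8))
```

The idea would be to time a full-screen quad with, say, 2 temporaries and then with the maximum, at several values of n_mads, and see whether the relative slowdown stays constant or fades as the shader gets longer.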
 
If anyone is actually willing to create something like this, I'm more than willing to try it on my 5800u.
 
You could always try ATI's Ashli demo. It's got a shader that compiles to 7 passes on an R3xx. The passes probably all use a heck of a lot of registers, but perhaps NVidia's Universal Compiler can reduce that.
 