We all know about the NV3x's FP register usage issue.
But what hasn't been tested is how the performance hit from using "too many" FP registers varies with the number of instructions in the shader.
Depending on how the NV3x architecture works, using more than a given number of registers may either reduce the IPC by a constant percentage or only add a fixed delay between successive pixels.
If the performance hit comes from an added delay between pixels, then very long shaders should see very little performance hit from using the maximum number of registers available, because that fixed delay is amortized over many instructions.
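To make the two hypotheses concrete, here is a toy cost model (a sketch only, not a description of how NV3x actually schedules work). Hypothesis A scales the cost of every instruction by a constant factor; hypothesis B adds a fixed per-pixel delay. The function names and the example numbers (`penalty=0.5`, `delay=16` cycles, `ipc=1.0`) are hypothetical values chosen for illustration, not measurements.

```python
def cycles_ipc_penalty(n_instr: int, ipc: float, penalty: float) -> float:
    """Hypothesis A: extra registers cut IPC by a constant fraction `penalty`."""
    return n_instr / (ipc * (1.0 - penalty))


def cycles_fixed_delay(n_instr: int, ipc: float, delay: float) -> float:
    """Hypothesis B: extra registers add a fixed delay (in cycles) per pixel."""
    return n_instr / ipc + delay


def slowdown(n_instr: int, ipc: float, penalty: float, delay: float):
    """Return (A, B): cycles per pixel relative to the no-penalty baseline."""
    base = n_instr / ipc
    return (cycles_ipc_penalty(n_instr, ipc, penalty) / base,
            cycles_fixed_delay(n_instr, ipc, delay) / base)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for n in (8, 32, 128, 512):
        a, b = slowdown(n, ipc=1.0, penalty=0.5, delay=16)
        print(f"{n:4d} instructions: A = {a:.2f}x, B = {b:.2f}x")
```

Under A the relative slowdown stays constant no matter how long the shader is; under B it tends toward 1x as the instruction count grows. So plotting measured slowdown against shader length should tell the two cases apart.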
Has anybody done a test like this? I don't have an NV3x, so I can't do it myself. Such a test would be best done in OpenGL, because we can be more sure of what the hardware is actually doing.
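For anyone with an NV3x who wants to try it, one possible approach (an untested sketch) is to generate ARB_fragment_program shaders that vary the number of simultaneously live temporaries independently of the instruction count. `make_register_test` is a hypothetical helper. It seeds each temporary with a distinct value, runs a MAD chain that cycles through all of them, and sums them at the end so none can be freed early. The driver's compiler may still reorder or fold instructions, so the generated text is not guaranteed to match what actually runs on the hardware.

```python
def make_register_test(num_regs: int, num_ops: int) -> str:
    """Build an ARB_fragment_program that keeps `num_regs` temporaries live
    across a chain of `num_ops` MAD instructions (hypothetical test generator)."""
    if num_regs < 1 or num_ops < 0:
        raise ValueError("need at least one register and a non-negative op count")
    lines = ["!!ARBfp1.0"]
    lines += [f"TEMP r{i};" for i in range(num_regs)]
    # Give every temporary a distinct value so none can be merged with another.
    for i in range(num_regs):
        c = float(i + 1)
        lines.append(f"MUL r{i}, fragment.texcoord[0], {{{c}, {c}, {c}, {c}}};")
    # Arithmetic chain that cycles through all temporaries.
    for k in range(num_ops):
        a = k % num_regs
        b = (k + 1) % num_regs
        lines.append(f"MAD r{a}, r{a}, r{b}, fragment.color;")
    # Sum every temporary so all of them stay live until the very end.
    for i in range(1, num_regs):
        lines.append(f"ADD r0, r0, r{i};")
    lines.append("MOV result.color, r0;")
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_register_test(num_regs=4, num_ops=8))
```

The idea would be to sweep `num_regs` at a fixed `num_ops` (and vice versa), load each string with `glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, ...)`, and measure fill rate. Querying `GL_PROGRAM_NATIVE_TEMPORARIES_ARB` and `GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB` with `glGetProgramivARB` would give some indication of whether the driver kept the intended register count.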