OpenGL guy said:
Some people have said that the extra passes are slow: This is true, but I really don't think the cost of extra passes is a big deal when you are talking about such long shaders!
If you consider that each pass changes the state, and therefore stalls the pipelines, I would think it could be a very large issue.
In order for multipass to have a small speed hit, this is what I see as necessary:
1. Either the state change stall is much smaller than the time to execute one pass, or:
2. The software manages the multiple passes itself (no auto-multipass, or some specialized auto-multipass...), so that the number of pipeline stalls is reduced (i.e. it runs one pass over a large number of triangles, then the next pass, and so on).
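To put rough numbers on the two conditions above, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the idea that each state change costs one fixed stall, and the sample figures are all made up, not measured on an R300, an NV30, or anything else.

```python
def multipass_cost(num_batches, passes, pass_time, stall_time, batched):
    """Total time (arbitrary units) to shade num_batches batches in `passes` passes.

    Hypothetical model: every state change costs one fixed pipeline stall.
      batched=False: each batch runs all its passes back to back, so every
                     pass on every batch pays a stall.
      batched=True:  each pass runs over all batches before the state
                     changes, so there is only one stall per pass.
    """
    work = num_batches * passes * pass_time
    stalls = passes if batched else num_batches * passes
    return work + stalls * stall_time


def stall_overhead(num_batches, passes, pass_time, stall_time, batched):
    """Stall time as a fraction of the useful shading work."""
    work = num_batches * passes * pass_time
    total = multipass_cost(num_batches, passes, pass_time, stall_time, batched)
    return (total - work) / work


if __name__ == "__main__":
    # Short passes, stall per pass on every batch: the stalls dominate.
    print(stall_overhead(1000, 4, pass_time=10, stall_time=50, batched=False))
    # Condition 1: same stall, but each pass is long compared to it.
    print(stall_overhead(1000, 4, pass_time=1000, stall_time=50, batched=False))
    # Condition 2: short passes again, but one pass runs over all batches.
    print(stall_overhead(1000, 4, pass_time=10, stall_time=50, batched=True))
```

In this toy model the unbatched overhead is just stall_time / pass_time, so it only shrinks if each pass is long relative to the stall, while batching divides the number of stalls by the number of batches. Real hardware will not be this simple, but it shows why either condition would be enough.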
The question is whether the PS program length available on the R300 is long enough to make the hit from worst-case multipass small compared to the time it takes to execute one max-length program. (As a side note, this topic has made me rethink my stance on the need for unlimited-size programs: large enough program sizes might be enough, though the VS really needs to have long or unlimited programs available.) The NV30 looks like it will take a much smaller speed hit from multipass than the R300.
And will any HLSL compilers (DX, OpenGL, Cg, RenderMonkey) generate auto-multipass?