They have constant registers, but I don't believe they have static branch support (though they do have hardware LOOP instruction support).
Look at it this way, Colorless: at a minimum, they could handle any static branch with predication, and it would run at the same speed as the no-branch CMP/SGE-style version. Secondly, any static branch can be handled by compiling two different versions of the shader.
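To make the two options concrete, here is a rough Python model (not real shader code, and not what any driver actually does). The names `shade_branch`, `shade_predicated`, `specialize`, and the `use_fog` flag are all made up for illustration. It assumes a single static boolean that would normally sit in a constant register, and shows that both the predicated (CMP/SGE-style select) form and the two-compiled-versions form give the same result as the branch.

```python
def shade_branch(color, fog, use_fog):
    """Reference version: an explicit static branch on a constant flag."""
    if use_fog:
        return color * (1.0 - fog)
    return color


def shade_predicated(color, fog, use_fog):
    """Branch-free version: compute both sides, then select with a 0/1 mask.

    This mimics what a CMP/SGE-style sequence does; the mask stands in
    for a value held in a constant register (an assumption of this sketch).
    """
    fogged = color * (1.0 - fog)
    mask = 1.0 if use_fog else 0.0
    return mask * fogged + (1.0 - mask) * color


def specialize(use_fog):
    """'Compile' two versions up front and pick one when the constant is set."""
    if use_fog:
        return lambda color, fog: color * (1.0 - fog)
    return lambda color, fog: color


if __name__ == "__main__":
    for flag in (True, False):
        specialized = specialize(flag)
        print(flag,
              shade_branch(0.5, 0.25, flag),
              shade_predicated(0.5, 0.25, flag),
              specialized(0.5, 0.25))
```

The predicated form pays for both sides of the branch every time, which is why it should cost about the same as the hand-written no-branch version; the specialized form pays nothing per pixel but needs one compiled shader per flag combination.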
The speed drop we are seeing (below the dynamic branch case!) clearly shows something is wrong with the driver, and my own investigations show that the optimizations the driver does are somewhat brittle.
I bet vertex texturing performance is even worse, because the compiler has to be much smarter about texture fetch latency, the MIMD pipeline, and how to hide that latency.