GD: NVIDIA and the gaming industry seem to be (eventually) moving to Pixel Shader 3.0. What advantages does 3.0 have over 2.0? What effects can be done in 3.0 that can't be done with 2.0?
Ujesh: Shader Model 3.0 offers many new features that developers are excited about. I’ll highlight a few: First, there is the ability to finally support loops and branches in programs. This is a fundamental requirement and will improve the efficiency in how programmers can write their code. Another cool feature is Geometry Instancing. This feature is particularly useful for real-time strategy games. Instead of sending small models down the graphics pipe one at a time, programmers can batch up the geometry and send it down the pipe as a single instance. They can then index into that geometry and apply specific attributes. The result will be larger, more epic battles that play at blazing-fast frame rates.
GD: How long of a shader is optimal versus multipassing? Isn't using 2.0 with multipassing still faster than using a longer 3.0 shader?
Tony: Multi-passing is never faster than single pass in the shader, as it will always be more bandwidth and more instructions.
GD: Do you expect that the GeForce 6800 series will be able to render a (long) 3.0 Pixel Shader and still perform at top speed? How about 5 of them?
Tony: The GeForce 6 architecture is designed to render an arbitrary-length pixel shader and still perform at full speed. Of course, that depends on how you define full speed. If by full speed you mean full hardware utilization, then yes. However, there is a direct correlation between rendered pixel rate and shader length. An extremely long shader will still get 100 percent utilization of the hardware, but will not run at full pixel rate. This is true for all programmable hardware, CPUs and GPUs alike. A 1000-instruction shader, regardless of whether it’s single pass or multi-pass, will not run at 16 pixels per clock, as it will take multiple cycles to compute the shader (unless of course that shader was running on pixels which got discarded by Z-Cull, but I'm pretty sure that's not what you mean...). Also, if that arbitrarily long shader program was run on only one pixel in the frame, the delivered frame rate might not be impacted at all by that one pixel.
In essence, performance is related to the number of pixels and the number of computations that need to be run on those pixels. One loophole in this question would be a 1M-pixel frame: if 999,998 pixels had a simple 4-operation shader, one pixel had a 1000-operation shader, and one pixel had five 1000-operation shaders, you would have a total of 4,005,992 operations. That same frame with all 1M pixels running the "simple" 4-operation shader would be 4M operations. In this case, the mixed frame would run at 99.8504% of 16 pixels per clock.
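The arithmetic in that example can be checked with a short script. This is only a sketch under the simplifying assumption the answer itself makes, namely that delivered pixel rate scales inversely with the total number of shader operations in the frame. The names `total_ops`, `mixed`, and `uniform` are made up for illustration and do not correspond to any NVIDIA tool or API.

```python
def total_ops(pixel_groups):
    """Sum shader operations over (pixel_count, ops_per_pixel) groups."""
    return sum(count * ops for count, ops in pixel_groups)


SIMPLE_OPS = 4  # the "simple" 4-operation shader from the example

# 1M-pixel frame: almost all pixels are simple, one runs a 1000-operation
# shader, and one runs five 1000-operation shaders.
mixed = [(999_998, SIMPLE_OPS), (1, 1000), (1, 5 * 1000)]

# Same frame with every pixel running the simple shader.
uniform = [(1_000_000, SIMPLE_OPS)]

mixed_ops = total_ops(mixed)
uniform_ops = total_ops(uniform)

# Assumption: throughput is inversely proportional to total operations,
# so the mixed frame runs at uniform_ops / mixed_ops of the peak rate.
relative_rate = uniform_ops / mixed_ops

print(mixed_ops, uniform_ops, f"{relative_rate:.4%}")
# prints: 4005992 4000000 99.8504%
```

Under this model the two expensive pixels add only 5,992 operations to a 4M-operation frame, which is why the frame as a whole stays within about 0.15% of the peak rate.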