"Advancements in CineFX’s technology have raised the capabilities of pixel shaders so they’re on par with vertex shader capabilities. With increased control at the pixel level, programmers can dramatically enrich their games, bringing more lifelike qualities to every character, object, and scene. The native 32-bit processing abilities of the GeForce 6 Series of GPUs also increase overall pixel shading precision, taking pixel shaders to new levels of unsurpassed image quality."
Just as there are new features with Vertex Shader 3.0, there are new features with Pixel Shader 3.0.
"No longer forced to restrict each pixel shader program to 96 instructions, programmers are now free from hardware limitations, and can implement more complex effects at the pixel level."
As with the Vertex Shader, the length of a Pixel Shader program is now effectively unlimited. The shader program limit for PS 2.0 was 96 instructions; with PS 3.0 on the NV40, NVIDIA says that limit is gone.
"Full support for subroutines, loops, and branches—including loop counter registers and condition codes—and a new back/face register gives complete control to the programmer."
The same flow control capabilities of loops, branches, and subroutines are also supported in the pixel shader.
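To make the flow control point concrete, here is a rough CPU-side sketch in Python (not actual shader code) of the kind of per-pixel logic that loops and branches allow: a loop over a variable number of lights with an early exit once the pixel saturates, and a branch on a front/back-facing flag standing in for the face register. The function name `shade_pixel`, the light intensities, and the choice to dim back faces by half are all invented for illustration.

```python
def shade_pixel(base, lights, front_facing):
    """Accumulate light contributions for one pixel (illustrative only)."""
    # Branch on a facing flag, standing in for the back/face register.
    if not front_facing:
        base = base * 0.5  # assumption: back faces are dimmed by half

    total = 0.0
    used = 0
    # Loop whose trip count depends on the data, not on a fixed unroll.
    for intensity in lights:
        total += base * intensity
        used += 1
        if total >= 1.0:  # early exit once the pixel is saturated
            total = 1.0
            break
    return total, used


if __name__ == "__main__":
    # Five lights are supplied, but the loop stops as soon as the pixel saturates.
    print(shade_pixel(0.5, [0.5, 0.5, 0.5, 0.5, 0.5], True))
    # A back-facing pixel takes the other branch and is dimmed first.
    print(shade_pixel(0.5, [0.5, 0.5], False))
```

Without dynamic branching, a shader of this kind would typically have to evaluate every light for every pixel; the early exit is where the potential savings come from.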
"The GeForce architecture has always enabled developers to choose the necessary level of precision for each image or scene. Now the choice is simpler, because the performance degradations associated with full 32-bit floating-point precision have been eliminated."
"Floating point operands can be treated in native 32-bit or optional 16-bit format, which are the standard formats in the film industry. Although both modes deliver equivalent performance, the 32-bit floating point mode uses twice as much memory to store the operands. Programmers can choose between native 32-bit mode and the optional 16-bit mode to achieve the required level of precision in each case. Plus, they can efficiently manage memory usage in situations where space is a consideration. Other data formats are also supported."
In DirectX 9.0, 24-bit floating point (FP24) was the minimum precision required for PS 2.0. For PS 3.0 the minimum requirement is FP32.
Floating Point 32 (FP32) is supported across the entire pipeline in NV40. The NV30 series also supported FP32, but its FP32 performance was very slow, and precision was often lowered to gain performance in games. With NV40, NVIDIA claims that FP32 performance has been improved and is now very fast. In fact, the passage quoted above states that FP32 and FP16 performance are “equivalent”, though keep in mind that FP32 uses twice as much memory to store operands.
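The storage and precision trade-off can be seen with a small Python sketch. It assumes the GPU's 16-bit and 32-bit formats behave like IEEE 754 half and single precision, which is what the standard library's `struct` module provides through the `e` and `f` format codes; the helper `round_trip` is a made-up name for this example, and nothing here measures GPU performance.

```python
import struct

FP16 = "<e"  # IEEE 754 half precision
FP32 = "<f"  # IEEE 754 single precision


def round_trip(value, fmt):
    """Store a Python float in the given format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    value = 0.1  # not exactly representable in either format
    for name, fmt in (("FP16", FP16), ("FP32", FP32)):
        stored = round_trip(value, fmt)
        size = struct.calcsize(fmt)
        error = abs(stored - value)
        print(f"{name}: {size} bytes per operand, rounding error {error:.3e}")
```

Each FP32 operand occupies four bytes against two for FP16, which is the doubling of memory mentioned above, and in exchange the value read back sits much closer to the original.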