One of the main features of pixel shader 3.0 is that is supports dynamic branching (also called data dependent branching), something that nVidia of course makes a big deal about. This demo however shows a way to implement dynamic branching without pixel shader 3.0, using basic graphics functionality such as alpha testing and stencil testing. Not only is it doable, but also gives every bit of the performance of real dynamic branching.
One scenario where dynamic branching comes in handy is when you have range-limited lights. Whenever a fragment is outside the range of the light, you can skip all the lighting math and just return zero immediately. With pixel shader 3.0 you'd simply implement this with an if-statement. Without pixel shader 3.0, you just move this check to another shader. This shader returns a truth value to alpha. The alpha test then kills all unlit fragments. Surviving fragments will write 1 to stencil. In the next pass you simply draw lighting as usual. The stencil test will remove any fragments that aren't tagged as lit. If the hardware performs the stencil test prior to shading you will save a lot of shading power. In fact, the shading workload is reduced to the same level of pixel shader 3.0, or in some cases even slightly below since the alpha test can do a compare that otherwise would have to be performed in the shader.
The result is that early-out speeds things up considerably. In this demo the speed-up is generally in the range of two to four times as fast than without early-out. And with more complex lighting, the gain would have been even larger.