As this topic seems rather interesting, I've decided to share my thoughts on it too.
First of all, thanks to Humus. He managed to use this old idea (stencil culling) in a rather new way. It's not as easy to do as some may think; this is a real invention.
Second: I wouldn't call it dynamic branching (it isn't), but rather a stencil emulation of it.
Third: the absence of any performance boost on NV cards shows the lack of early stencil discard. It's a disadvantage of Nvidia's hardware, although not a very important one. In this case Nvidia fails, but then again, I've never seen methods like this used commonly.
Fourth, and most important: where does the 2-4x performance boost on Radeons come from? Humus's demo performs dynamic stencil culling of lit fragments based on the attenuation limits of each light. But he uses several lights, each with a rather small attenuation radius, so each pixel is usually lit by only 1-2 lights. That is where the performance boost comes from.

I predict that using SM3.0 for this would give an even better boost, because it lets you avoid state changes and sending the geometry more than once. Could someone port this demo to SM3.0 and see if I'm right (I only own a crappy FX5600)? If my thoughts are right, stencil emulation will gain nothing or very little in real games like Far Cry, because the environment is different: in Far Cry each fragment is usually lit by all lights. NV4x gets some boost mainly from the one-pass solution (avoiding state changes). That's the reason why I think this optimisation is rather useless for real applications. It could however find use in scenes with lots of small lights; sadly enough, it will still be an ATI-only optimisation.
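To make the attenuation-radius argument concrete, here is a rough CPU-side sketch of my own (not Humus's actual code, and the resolutions and radii are made-up numbers): it counts how many per-light shader invocations survive when shading is restricted to pixels inside each light's attenuation radius, which is what the stencil pre-pass effectively achieves.

```python
def shaded_fragment_count(width, height, lights):
    """Count per-light shader invocations when only pixels inside a
    light's attenuation radius are shaded (what stencil culling gives you).
    Each light is a (center_x, center_y, radius) tuple in pixel units."""
    total = 0
    for (lx, ly, radius) in lights:
        for y in range(height):
            for x in range(width):
                # Inside the attenuation radius -> stencil test passes,
                # so the expensive lighting shader runs for this fragment.
                if (x - lx) ** 2 + (y - ly) ** 2 <= radius ** 2:
                    total += 1
    return total

width, height = 64, 64

# Demo-like case: several lights, each covering a small screen area.
small_lights = [(10, 10, 8), (50, 20, 8), (30, 50, 8)]
# Far-Cry-like case: every light reaches essentially every fragment.
big_lights = [(32, 32, 100), (16, 48, 100), (48, 16, 100)]

# Brute force: shade every pixel for every light, no culling at all.
brute_force = width * height * len(small_lights)
culled_small = shaded_fragment_count(width, height, small_lights)
culled_big = shaded_fragment_count(width, height, big_lights)

print(brute_force, culled_small, culled_big)
```

With small lights the culled count is a small fraction of the brute-force count, so skipping those fragments can plausibly buy a multiple-times speedup; with lights that cover the whole screen the culled count equals the brute-force count, and the stencil pre-pass is pure overhead.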