CPU processing of the shadow volumes was not the main reason stencil shadows got replaced by shadow maps in all recent engines. Extracting stencil shadow volumes on the GPU was possible, and pretty fast, even on DX8 GPUs. In our old DX9 engine we used fully GPU-generated stencil shadows extensively, and all our benchmarks indicated that GPU extraction was noticeably faster than CPU extraction (we had around 20-30 fully uniformly shadowed light sources in view at once).
Yes, stencil shadow extraction on DX9 GPUs doubled the vertex count of the volume, but since the volumes were rather low poly, this basically didn't affect the performance at all (the bottleneck was never the vertex shader performance). The biggest drawback of stencil shadows has always been the massive stencil fillrate they require when multiple shadow volumes cross the view at a bad angle. The performance is really view dependent, and a slight change in viewing angle can drop the performance dramatically. It's really difficult for artists to tweak the performance of stencil shadowed scenes, as the algorithm's performance is so erratic.
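To make the GPU extraction idea concrete, here is a minimal sketch of the per-vertex work, written in Python instead of shader code so it can be run anywhere. It assumes one commonly described scheme: every vertex of the shadow mesh carries the normal of the face it belongs to, and vertices whose face points away from the light get pushed away from the light, which stretches the degenerate quads on the silhouette edges into the sides of the volume. The post does not say which vertex layout the engine used, so the function name `extrude_vertex` and its parameters are hypothetical.

```python
import math


def extrude_vertex(position, face_normal, light_pos, extrude_distance):
    """Emulate the per-vertex shadow volume extrusion for a point light.

    Assumption: each vertex stores the normal of its own face. Vertices on
    faces that point away from the light are moved away from the light,
    all other vertices stay where they are.
    """
    # Vector from the light to the vertex.
    to_vertex = tuple(p - l for p, l in zip(position, light_pos))

    # The face points away from the light when its normal and the
    # light-to-vertex vector point in the same general direction.
    facing_away = sum(n * d for n, d in zip(face_normal, to_vertex)) > 0.0
    if not facing_away:
        return tuple(float(p) for p in position)

    # Push the vertex along the normalized light-to-vertex direction.
    length = math.sqrt(sum(d * d for d in to_vertex))
    return tuple(
        p + d / length * extrude_distance
        for p, d in zip(position, to_vertex)
    )
```

There is no silhouette search and no per-frame mesh rebuild here: the same static vertex buffer is submitted for every light, and the duplicated vertices are the only extra cost.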
Positive things about stencil shadows:
- Stencil shadows combine really well with deferred shading, especially with LiDR (light indexed deferred shading).
- With stencil shadows you have to light only the pixels that receive light. The stencil test skips the complex deferred lighting shader completely for pixels that are in shadow. This saves around 30% of the (deferred) lighting shader performance.
- Stencil shadows use much less memory than shadow maps.
- Stencil shadows use much less texture memory bandwidth than shadow maps. The render target bandwidth usage is however much larger (but in systems like Xbox 360, the render target is located in really fast EDRAM that provides practically unlimited bandwidth).
- Stencil shadows are pixel perfect and have no surface acne, no blockiness, no sample walking issues with moving lights, etc. For example, self shadowing looks really good and requires no tweaking (good looking self shadowing has always been hard to do with shadow maps).
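The point about skipping the lighting shader for shadowed pixels can be sketched as a tiny per-pixel simulation. This is an illustration only, assuming the plain depth-pass counting scheme (front faces of a volume increment the stencil value, back faces decrement it, and only faces in front of the scene surface count) and ignoring the case where the camera sits inside a volume. The function names and the data layout are made up for the example.

```python
def stencil_value(volume_faces, scene_depth):
    """Depth-pass stencil count for one pixel.

    volume_faces: list of (depth, is_front_face) pairs for every shadow
    volume face that covers this pixel.
    scene_depth: depth of the visible surface at this pixel.
    """
    count = 0
    for face_depth, is_front_face in volume_faces:
        # Only faces in front of the visible surface pass the depth test.
        if face_depth < scene_depth:
            count += 1 if is_front_face else -1
    return count


def light_pixels(pixels, lighting_shader):
    """Run the lighting shader only where the stencil value is zero.

    Returns the per-pixel light contribution and the number of times the
    (expensive) lighting shader actually ran.
    """
    results = []
    shader_runs = 0
    for volume_faces, scene_depth in pixels:
        if stencil_value(volume_faces, scene_depth) != 0:
            # Pixel is inside a shadow volume: the stencil test rejects it
            # before the lighting shader runs.
            results.append(0.0)
            continue
        results.append(lighting_shader())
        shader_runs += 1
    return results, shader_runs
```

A surface that lies between the front and back face of a volume ends up with a nonzero count and never reaches the lighting shader, which is where the saving comes from.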
Negative things:
- Performance is sometimes good and sometimes abysmal. A simple chain link fence can drop the game to 1 fps if the camera looks along it.
- No alpha masked shadow casters are possible. Plants, trees and vegetation are all almost impossible to render properly with decent performance.
- For fast (DX9) GPU volume extraction, the shadow meshes have to be closed surfaces. In DX10/11 you can likely use a geometry shader to render more freely formed geometry efficiently.
- Deferred shading light area combination tricks do not work with stencil shadows, since the stencil test can only fail or succeed (not succeed for one light and fail for the other). So you have to render each light separately (and this causes extra g-buffer reads and backbuffer blending).
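The erratic performance from the first drawback can be illustrated with a toy fillrate estimate. This is a rough model under loud assumptions, not a measurement: each shadow volume is reduced to the screen-space rectangle it covers, every covered pixel costs two stencil updates (one front face, one back face), and nothing is rejected by the depth test. The resolution, the volume count and the rectangle sizes below are invented for the example.

```python
def stencil_writes(volume_rects, width, height):
    """Estimate total stencil updates for a set of shadow volumes.

    volume_rects: list of (x0, y0, x1, y1) screen-space rectangles, one per
    shadow volume. Rectangles are clipped to the screen.
    Assumption: two stencil updates per covered pixel (front + back face).
    """
    total = 0
    for x0, y0, x1, y1 in volume_rects:
        clipped_w = max(0, min(width, x1) - max(0, x0))
        clipped_h = max(0, min(height, y1) - max(0, y0))
        total += 2 * clipped_w * clipped_h
    return total


WIDTH, HEIGHT = 1280, 720

# Hypothetical fence seen side-on: 100 volumes, each a 4 pixel wide strip.
side_on = [(i * 12, 0, i * 12 + 4, HEIGHT) for i in range(100)]

# Same fence with the camera looking along it: every volume fills the screen.
along = [(0, 0, WIDTH, HEIGHT) for _ in range(100)]

side_on_writes = stencil_writes(side_on, WIDTH, HEIGHT)  # 576,000
along_writes = stencil_writes(along, WIDTH, HEIGHT)      # 184,320,000
```

In this made-up case the same geometry costs 320 times more stencil fill from the bad angle (about 200 stencil updates per screen pixel instead of less than one), with no change in vertex count at all.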