Wow, 75% is a huge amount of time! I'm guessing the bulk of it is spent on vertex processing due to the 6-8 lights. Have you guys tried massively simplified proxy meshes and position-only vertex streams for the shadow passes? If you are fill rate bound, have 16-bit depth buffers helped at all, or is fill rate not the main cost? I'm curious where your main bottleneck is.
NOTE:
This is slightly old info, as I have optimized our shadowing system a bit lately (implemented Exponential Shadow Maps and started to utilize the free 4xMSAA). See the last post in this thread for more info:
http://forum.beyond3d.com/showthread.php?t=47528
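For readers unfamiliar with Exponential Shadow Maps: instead of a binary depth comparison, the shadow map stores exp(c * z_occluder), and visibility is exp(c * z_occluder) * exp(-c * d_receiver), clamped to 1. Because the stored value is a plain exponential, it can be linearly filtered before the test. The sketch below is a minimal illustration of the standard ESM test, assuming depths normalized to [0, 1]; the constant `kEsmC` and the function names are hypothetical, not values or code from our engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical sharpness constant (assumption, not from the post). Larger values
// give harder shadow edges; exp(80) ~ 5.5e34 still fits in a 32-bit float.
constexpr float kEsmC = 80.0f;

// Value written to the shadow map for an occluder at normalized depth z.
// Since it is linear in exp(c*z), it can be averaged or blurred before use.
inline float esmEncode(float occluderDepth) {
    return std::exp(kEsmC * occluderDepth);
}

// Visibility of a receiver at normalized depth d given the (possibly filtered)
// stored value: exp(c*z) * exp(-c*d) = exp(-c*(d - z)).
// ~1 when the receiver is at or in front of the occluder, -> 0 behind it.
inline float esmVisibility(float filteredValue, float receiverDepth) {
    assert(filteredValue > 0.0f);
    float v = filteredValue * std::exp(-kEsmC * receiverDepth);
    return std::min(v, 1.0f);  // v is always positive, so only clamp the top
}
```

A receiver slightly behind the occluder gets a smooth falloff rather than a hard cut, which is what lets averaged samples produce soft edges.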
---
The platform we use does not support 16-bit depth buffers (only 32-bit ones). We render to a D24FS8 depth buffer to get the benefit of double-rate z-only rendering. We are mostly fill rate bound during shadow map rendering. The shadow map vertex shader is very simple (just a transform), there is no pixel shader (no color writes), and the shadow object vertex format is optimized for bandwidth (16 bits per channel, position only). All the shadow meshes are simplified (hand made by artists).
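The post does not say how positions are packed into 16 bits per channel. One common approach, sketched here purely as an assumption, is to quantize each coordinate to a signed 16-bit integer relative to the mesh bounding box, halving the stream size versus float3, and to undo the scale and bias in the vertex shader (it can be folded into the transform matrix, so the shader stays "just transform"). `QuantizedMesh`, `quantizePositions` and `dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

using Float3 = std::array<float, 3>;
using Short3 = std::array<std::int16_t, 3>;  // 6 bytes per vertex vs 12 for float3

// Assumed encoding: position = center + scale * q, with q in [-32767, 32767].
struct QuantizedMesh {
    std::vector<Short3> positions;
    Float3 center;  // bias (bounding box center)
    Float3 scale;   // half-extent / 32767 per axis
};

QuantizedMesh quantizePositions(const std::vector<Float3>& src) {
    assert(!src.empty());
    Float3 lo = src[0], hi = src[0];
    for (const Float3& p : src)
        for (int i = 0; i < 3; ++i) {
            lo[i] = std::min(lo[i], p[i]);
            hi[i] = std::max(hi[i], p[i]);
        }
    QuantizedMesh out;
    for (int i = 0; i < 3; ++i) {
        out.center[i] = 0.5f * (lo[i] + hi[i]);
        float halfExtent = 0.5f * (hi[i] - lo[i]);
        // A flat axis (zero extent) always quantizes to 0; avoid dividing by zero.
        out.scale[i] = halfExtent > 0.0f ? halfExtent / 32767.0f : 1.0f;
    }
    out.positions.reserve(src.size());
    for (const Float3& p : src) {
        Short3 q;
        for (int i = 0; i < 3; ++i) {
            float n = std::round((p[i] - out.center[i]) / out.scale[i]);
            n = std::max(-32767.0f, std::min(32767.0f, n));
            q[i] = static_cast<std::int16_t>(n);
        }
        out.positions.push_back(q);
    }
    return out;
}

// What the vertex shader would compute before (or folded into) the transform.
Float3 dequantize(const QuantizedMesh& m, const Short3& q) {
    Float3 p;
    for (int i = 0; i < 3; ++i) p[i] = m.center[i] + m.scale[i] * q[i];
    return p;
}
```

The worst-case error per axis is half a quantization step (half-extent / 65534), which is typically negligible for simplified shadow proxies.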
In addition to depth, we render translucent light projectors (modulating RGB color filters) into our shadow maps. We use R8G8B8A8 textures for the color filters (the platform does not support R5G6B5 render targets). The shadow map's depth buffer (and hi-z) is used to properly depth-cull projectors that lie behind the first geometry intersection. With light projectors we get smooth, semi-transparent soft shadows from windows, fences, particles, etc.
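A CPU-side sketch of what the color filter pass computes for a single shadow map texel, under these assumptions: the filter target is cleared to white, each projector fragment multiplies its RGB into it (multiplicative blending, dest = dest * src), and fragments at or behind the first opaque occluder fail the depth test. `Rgb`, `ProjectorFragment` and `filterTexel` are hypothetical names; on the GPU this is done by the blend and depth units, not by a loop.

```cpp
#include <cassert>
#include <vector>

struct Rgb { float r, g, b; };

// One rasterized fragment of a translucent projector (window, fence, particle)
// as seen from the light.
struct ProjectorFragment {
    float depth;  // normalized light-space depth
    Rgb filter;   // per-channel transmission: 1 passes light, 0 blocks it
};

// Emulates a LESS depth test against the opaque shadow depth followed by
// multiplicative blending into a target cleared to white.
Rgb filterTexel(float opaqueDepth, const std::vector<ProjectorFragment>& frags) {
    assert(opaqueDepth >= 0.0f && opaqueDepth <= 1.0f);
    Rgb dest{1.0f, 1.0f, 1.0f};
    for (const ProjectorFragment& f : frags) {
        if (f.depth >= opaqueDepth) continue;  // behind first occluder: culled
        dest.r *= f.filter.r;
        dest.g *= f.filter.g;
        dest.b *= f.filter.b;
    }
    return dest;
}
```

Because multiplication is commutative, projectors can be drawn in any order without sorting, and the lighting pass can simply multiply the light color by this filter and by the depth-based visibility.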
Basically this is what we do every frame:
1. Render shadows for the 3-layered directional PSSM (3072x1024 texture with 3 layers in it) -
Fill rate bound (depth-only rendering for double fill rate)
2. Render translucent light projectors for the 3-layered directional PSSM -
Fill rate bound
3. Resolve the shadow maps one at a time from the 1024x1024 render target into a single 3072x1024 texture -
Fill rate limited
4. Render shadows for around 6-8 spotlights (256x256 to 1024x1024 resolution, depending on the light's screen-space bounding area) -
Fill rate bound (depth-only rendering for double fill rate)
5. Render translucent light projectors for the spotlights -
Fill rate bound
6. Resolve the shadow maps from render target to texture -
Fill rate limited
7. Calculate the shadow map position and sample the single combined PSSM texture (one fetch and transform) and around 2-3 spotlights for each pixel (2-3 texture fetches and transforms). This adds both vertex and pixel shader cost, but not much relative to the total, as our material system is very complex and we are pretty much evenly ALU and TEX bound.
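Two pieces of per-frame bookkeeping from the steps above can be sketched in code. This is a hypothetical outline, not our engine code: it picks each spotlight's shadow map size from its screen-space bounding area (the post gives only the 256x256 to 1024x1024 range; the square-root mapping and power-of-two snapping are assumptions), and maps a PSSM cascade index to its slice of the combined texture, which holds three 1024-wide layers side by side so a single fetch suffices.

```cpp
#include <cassert>
#include <cmath>

// Limits matching the range quoted in the post.
constexpr int kMinSpotSize = 256;
constexpr int kMaxSpotSize = 1024;

// Assumed heuristic: roughly one shadow texel per covered screen pixel per side,
// snapped down to a power of two and clamped to [256, 1024].
int spotShadowSize(float boundingAreaPixels) {
    assert(boundingAreaPixels >= 0.0f);
    int side = static_cast<int>(std::sqrt(boundingAreaPixels));
    int size = kMinSpotSize;
    while (size * 2 <= side && size < kMaxSpotSize) size *= 2;
    return size;
}

constexpr int kCascades = 3;
constexpr int kCascadeSize = 1024;
constexpr int kAtlasWidth = kCascades * kCascadeSize;
static_assert(kAtlasWidth == 3072, "three 1024-wide cascades side by side");

struct Uv { float u, v; };

// Map a cascade-local UV in [0,1]^2 into the combined atlas: the horizontal
// coordinate is squeezed into that cascade's third of the texture.
Uv cascadeToAtlas(int cascade, Uv local) {
    assert(cascade >= 0 && cascade < kCascades);
    return {(cascade + local.u) / kCascades, local.v};
}
```

Snapping to powers of two keeps the set of spotlight render target sizes small, which makes it easier to reuse them from frame to frame.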