Actually, with a good z-pass, your shading in a forward renderer is still almost strictly O(num_visible_pixels), leaving aside all the problems with heavily alpha-tested geometry.
_If_ you choose a deferred renderer for performance reasons, you are basically trading bandwidth for geometry.
I'd say that with decent object sorting even the z-only pass is unnecessary. The engine just needs to chop the scene up a little finer, but that pays dividends in frustum culling anyway.
There are other disadvantages with deferred too that rarely get brought up. In addition to the bandwidth load of writing and reading the G-buffers, you also decouple the texture and math operations so that they can't run in parallel. While filling the G-buffers you have almost no math, so you're texture or ROP limited. While doing the lighting, you have only the G-buffers to read, so you're math limited.
On PS3 this disadvantage may not be significant because the G7x/RSX architecture isn't made to do both simultaneously at full speed anyway. Other architectures, though, can get through the same operations quicker on a forward renderer. I'm going to try to quantify this with some variables:
k - net overdraw (number of pixels evading early Z culling divided by screen pixels)
A - # cycles per pix to read textures
B - # cycles per pix to perform lighting math
C - # cycles per pix to write G-buffer
For G8x or R5xx/R6xx/Xenos doing forward rendering (FR), total cycles per pixel would be:
k * max(A, B)
For a deferred renderer (DR), cost would be:
k * max(A, C) + B
I'm making a few assumptions here:
- Good thread handling to parallelize texture and math when possible
- The DR is not slowed down by reading the G-buffer due to sufficient math
- The DR is not slowed down writing the G-buffer due to texture BW when C > A
- The DR is not doing forward shadow mapping, though basically you just need to include this cost in B to keep the formulas the same.
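To make the two formulas concrete, here is a minimal sketch that just evaluates them. The function names and the sample values for k, A, B and C are made up for illustration; they are not measurements from any real GPU.

```python
def forward_cost(k, a, b):
    """Cycles per screen pixel for forward rendering: k * max(A, B)."""
    return k * max(a, b)


def deferred_cost(k, a, b, c):
    """Cycles per screen pixel for deferred rendering: k * max(A, C) + B."""
    return k * max(a, c) + b


# Hypothetical workloads as (k, A, B, C). The numbers are invented.
cases = {
    "light math, low overdraw": (1.5, 8, 10, 6),
    "heavy math, high overdraw": (4.0, 8, 40, 6),
}

for name, (k, a, b, c) in cases.items():
    fr = forward_cost(k, a, b)
    dr = deferred_cost(k, a, b, c)
    print(f"{name}: forward={fr:.1f} deferred={dr:.1f}")
```

With these invented numbers the first case comes out at 15 cycles forward versus 22 deferred, and the second at 160 forward versus 72 deferred, so the deferred path only pulls ahead once both overdraw and lighting math are large.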
Overall, it's not a clear win unless k is pretty big and B is a lot bigger than A and C. Take into account the work needed to get MSAA going, and I'm not a big fan of DR for ordinary workloads. Of course, these values can vary a lot from pixel to pixel, but it sort of shows why DR appears to have trouble getting the same framerate for similar FR scenes when comparing PC games.
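For what it's worth, the break-even point falls straight out of the two cost formulas. This is just algebra on the model above, assuming B > 0 and taking the formulas at face value:

```latex
\begin{align*}
\text{DR wins} &\iff k\max(A, C) + B < k\max(A, B) \\
\text{if } B \le A:&\quad k\max(A, B) = kA \le k\max(A, C) < k\max(A, C) + B
  \quad\text{(DR never wins)} \\
\text{if } B > A:&\quad k\max(A, C) + B < kB
  \iff B > \frac{k}{k-1}\,\max(A, C), \quad k > 1
\end{align*}
```

So at k = 1.5 the lighting math has to cost more than 3x the larger of A and C before DR comes out ahead, and even at k = 4 it still needs to be more than 4/3 of it. With k = 1 (no overdraw getting past early Z) DR can never win under this model.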
There is one situation where I do see the advantage of DR, but I never see it mentioned. If you have lots of local lights, then you can use the stencil buffer to mark pixels in the light's volume of influence (just like Doom3 does for a shadow volume) and only light those pixels. Good dynamic branching (again, a feature of R5xx/R6xx/G8x but not G7x/RSX) can largely negate this advantage, though, so even here I'm not convinced that DR is a big advantage.
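A rough back-of-the-envelope sketch of the local-lights case follows. It only counts lighting invocations and ignores the cost of the stencil pass itself and of any branching. The function names, resolution, light count and coverage figures are all hypothetical.

```python
def lighting_work_all_pixels(num_pixels, num_lights, cycles_per_light):
    """Every pixel evaluates every light (no stencil mask, no branching)."""
    return num_pixels * num_lights * cycles_per_light


def lighting_work_stenciled(pixels_in_volume, cycles_per_light):
    """Each light is evaluated only on the pixels marked inside its volume."""
    return sum(pixels_in_volume) * cycles_per_light


# Hypothetical scene: 1280x720 target, 32 local lights, each light's volume
# covering 2% of the screen, 10 cycles of math per light per pixel.
num_pixels = 1280 * 720
num_lights = 32
pixels_per_light = num_pixels // 50  # 2% of the screen
cycles_per_light = 10

brute = lighting_work_all_pixels(num_pixels, num_lights, cycles_per_light)
masked = lighting_work_stenciled([pixels_per_light] * num_lights, cycles_per_light)
print(f"all pixels: {brute} cycles, stenciled: {masked} cycles")
print(f"ratio: {brute / masked:.1f}x")
```

A forward renderer with good dynamic branching can skip most of the same work per pixel, which is why the gap in this sketch overstates the real advantage on R5xx/R6xx/G8x.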