I just tested performance for a pretty complex scene and material setup on PS3 with both deferred and forward lighting, and the results were similar. But I haven't tested on a unified shader architecture yet (Xbox 360, NVIDIA's G80 architecture on PC...).
Since a unified shader architecture can throw all of its ALUs at pixel shading during the lighting pass, it should be faster there than a non-unified architecture, and therefore faster than forward lighting.
Our shader that writes the g-buffers is heavily ALU bound (almost 2x compared to fill), and this completely masks out the halved fill rate you get when writing to 2 render targets. The EDRAM masks out the extra backbuffer bandwidth. In our case deferred (2x8888 g-buffers + depth) offers better pixel shader fill rate (with all bottlenecks counted) compared to our forward renderer (more ALU in the geometry pass because of lighting). This is a really good thing, since there is almost always some overdraw in the scene, and the overdraw count is unpredictable. Minimum frame rate is really important on consoles (you often want a v-sync locked 30 fps or 60 fps), and using techniques that do not dip that low in worst-case scenarios is always a good choice.
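To put rough numbers on that extra backbuffer bandwidth: a back-of-the-envelope sketch for the g-buffer layout mentioned above (720p and a 32-bit depth/stencil buffer are my own assumptions, not figures from the engine being discussed).

```python
# Back-of-the-envelope write bandwidth per geometry pass at 720p.
# Illustrative numbers only, not measured figures from the post.
WIDTH, HEIGHT = 1280, 720
pixels = WIDTH * HEIGHT

# Deferred geometry pass: two RGBA8 (8888) render targets + 32-bit depth/stencil.
deferred_write_bytes = pixels * (4 + 4 + 4)

# Forward geometry pass: one RGBA8 backbuffer + 32-bit depth/stencil.
forward_write_bytes = pixels * (4 + 4)

print(deferred_write_bytes / 2**20)  # ~10.5 MiB per pass
print(forward_write_bytes / 2**20)   # ~7.0 MiB per pass
```

The ~3.5 MiB difference per pass is the extra bandwidth that the EDRAM absorbs; with the shader ALU bound, it never becomes the bottleneck.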
Reading the g-buffers doesn't cause any performance loss on the unified shader console either, if you tile your light sources into screen-space tiles and do all your lighting in one pass (you read your depth buffer and your g-buffers only once). The shader that lights a tile is heavily ALU bound, and this completely masks out all the texture fetch and bandwidth cost. So you get the g-buffer reads basically for free.
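A minimal sketch of the screen-space light tiling idea (my own illustration of the general technique, not the actual console implementation; the 32-pixel tile size is an assumption): bin each light's screen-space bounding rectangle into fixed-size tiles, then light each tile in one pass.

```python
# Screen-space light tiling sketch: each light is projected to a screen-space
# position and radius, then binned into every tile its bounds overlap.
TILE = 32  # tile size in pixels (assumed)

def tile_lights(lights, width, height):
    """lights: list of (x, y, radius) already projected to screen space.
    Returns a dict mapping (tile_x, tile_y) -> list of light indices."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    bins = {}
    for i, (x, y, r) in enumerate(lights):
        x0 = max(0, int((x - r) // TILE))
        x1 = min(tiles_x - 1, int((x + r) // TILE))
        y0 = max(0, int((y - r) // TILE))
        y1 = min(tiles_y - 1, int((y + r) // TILE))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins

# One light at (100, 100) with a 40-pixel radius overlaps a 4x4 block of tiles.
bins = tile_lights([(100.0, 100.0, 40.0)], 1280, 720)
print(len(bins))  # 16 tiles touched
```

The lighting pass then reads depth + g-buffers once per pixel and loops only over the lights binned into that pixel's tile, which is what makes the single-pass read pattern possible.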
For blending multiple materials on terrain, there are performance gains too.
That's true, and it's the same for decals (if you want proper lighting on all your decals).
But in our case virtual texturing helps much more than deferred lighting, as we only end up blending terrain and decals over just a few 128x128 blocks every frame, instead of over the whole frame.
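A quick arithmetic sketch of why the virtual-texturing approach wins here (the per-frame page budget of 8 is my own assumed number, not from the post):

```python
# Pixels blended per frame: a handful of 128x128 virtual-texture pages
# versus re-blending terrain layers over every screen pixel.
pages_per_frame = 8          # assumed page update budget per frame
page_pixels = 128 * 128
vt_blend_pixels = pages_per_frame * page_pixels   # pixels blended with VT

full_screen_pixels = 1280 * 720                   # full-frame blending at 720p

print(vt_blend_pixels)                            # 131072
print(full_screen_pixels / vt_blend_pixels)       # ~7x fewer pixels blended
```

And once a page is blended it is cached, so on frames where little changes the blending cost drops toward zero, while a full-frame approach pays every frame.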
Not to mention the small triangles: the wasted shading calculations will be saved with deferred lighting.
That's true. Each polygon edge (that doesn't perfectly align with a hardware tile) also causes extra pixel shader processing for lights in forward rendering. So the smaller your polygons, the more wasted light processing you get in forward rendering.
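The waste at polygon edges can be sketched with the smallest hardware granularity, the 2x2 shading quad (real hardware tiles can be larger; this is my simplified illustration):

```python
# The pixel shader runs on whole 2x2 quads, so any quad a triangle edge
# touches pays for 4 pixels even if the triangle covers fewer of them.
def shaded_pixels(covered):
    """covered: set of (x, y) pixels a triangle actually covers.
    Returns the number of pixels the hardware shades (full 2x2 quads)."""
    quads = {(x // 2, y // 2) for x, y in covered}
    return 4 * len(quads)

# A 1-pixel-wide diagonal edge, 8 pixels long, touches 4 quads:
line = {(i, i) for i in range(8)}
print(shaded_pixels(line), "shaded for", len(line), "covered")  # 16 shaded for 8 covered
```

In a forward renderer that 2x waste multiplies the full lighting cost at every such edge; in a deferred renderer it only multiplies the cheap g-buffer write, since lighting runs later on final visible pixels.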
Yes, valid points. I was assuming a pre-z pass in the forward renderer, but of course that adds another geometry pass as well, which trades off against the additional image-space pass of doing it deferred.
And a pre-z pass doesn't do anything about the pixel shader overdraw caused by coarse hi-z culling (4x4-pixel or larger granularity at object edges, and the hi-z depth value being too inaccurate to distinguish nearby surfaces), or about the small-triangle overdraw (the pixel shader runs for the whole tile even if only one pixel is inside it).
A pre-z pass solves the (non-convex) object's own overdraw and the object-based sorting inefficiencies perfectly, but it has a performance hit that scales with scene complexity. I personally always prefer techniques that have a (near) constant performance hit on all scenes (the same amount of processing for every visible final screen pixel) over techniques whose cost varies wildly with scene complexity.
A perfect renderer would be one that scales purely with screen resolution: one rendered final screen pixel would always take constant time to produce, no matter the content. Currently we have two major components that break this rule: geometry processing (basically everything using the vertex shader) and overdraw (of various kinds). To lower the geometry cost, it's important to process all geometry just once and to make all vertex processing as cheap as possible (simple shaders, low vertex counts at far distances, etc.). To lower the overdraw, it's important to move as much work as possible to screen space, where all processing is guaranteed to be done only for final visible pixels.