Has anyone compared deferred and forward lighting for one light on a unified-shader GPU?

ccanan

Newcomer
As I remember, deferred lighting and forward lighting perform more or less the same for one full-screen light.
The hardware in that comparison was presumably a non-unified-shader GPU.
So has anyone tested how the two compare on a unified-shader GPU for one light?
I think that if they are similar on non-unified shaders, deferred lighting should be quicker on unified shaders.
 
If the light affects the whole screen, forward lighting will be slightly faster (deferred has an extra pass and extra storage with no gain in this case). If it affects none of the screen, deferred shading *may* be faster. With a single light, though, they're both going to be plenty fast. The real advantage of deferred shading comes when there are many lights that affect smaller regions of the screen or are occluded.
 
How about making a pixel-identical version of the one-light case? Is that possible, given no AA? Can stencil shadows be made pixel-identical to screen-space shadows?
 
Depends on the light source (do you have shadowmaps, how complex is the lighting equation and the shadowmap filtering & partitioning) and on your scene's overdraw properties. Usually the heavier the light source is to calculate per pixel, the better it is for deferred lighting (since you get zero light calculation overdraw).

In most engines, the scene is roughly sorted from front to back (by object center positions, for example). Object sorting reduces overdraw, but you still get overdraw when objects' bounding spheres intersect (per-object sorting is not 100% accurate, so it sometimes produces the wrong order). Also, any object that is not convex will overdraw itself.
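A minimal sketch of the per-object sorting described above, assuming each object exposes a world-space center point. The `SceneObject` class, the object names, and the camera position are all hypothetical and only there for illustration; a real engine would sort its own draw-call structures.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    center: tuple  # (x, y, z) world-space center; hypothetical layout


def sort_front_to_back(objects, camera_pos):
    """Order objects by the distance from the camera to each object's center.

    This is only approximate: large or intersecting objects can still be
    drawn in the wrong order for some pixels, which is where overdraw remains.
    """
    return sorted(objects, key=lambda o: math.dist(o.center, camera_pos))


objects = [
    SceneObject("far_wall", (0.0, 0.0, 50.0)),
    SceneObject("crate", (1.0, 0.0, 5.0)),
    SceneObject("pillar", (-2.0, 0.0, 20.0)),
]
draw_order = [o.name for o in sort_front_to_back(objects, (0.0, 0.0, 0.0))]
print(draw_order)
```

Sorting by center is cheap, which is why it is only a rough ordering and not a per-pixel guarantee.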

And even if you had a perfectly depth-ordered scene, there would still be extra pixel processing, because the z-test occurs after the pixel shader (so it does not save light processing cost), and the hierarchical z-test is coarse (usually 4x4 blocks or bigger). So at every object border (depth discontinuity) you will have some extra pixel shader overdraw. Additionally, if surfaces are near each other in the z-buffer (for example paintings on walls, leaves on the ground), the lower-precision depth representation of the hierarchical z-buffer is not enough to distinguish the surfaces, and both pixels are processed (extra overdraw for lighting). All this extra overdraw increases light calculation cost for forward rendering. A deferred renderer always processes each pixel just once, no matter how complex the scene. The more complex the light you have (sunlight with SDSM partitioned shadows and EVSM filtering, for example), the more you gain by using deferred rendering.

If the light source is cheap (no shadows, etc.), then forward rendering is most likely faster. But it all depends on many other factors as well (how heavy your g-buffers are, for example). I am assuming here that your engine doesn't support AA, or that you are using post-process AA (MLAA, for example). If you want to use MSAA, then deferred will always be considerably slower in a single-light scenario.
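The trade-off in this post can be put into a toy cost model. This is only a sketch under loud assumptions: all costs are in made-up "units per pixel", overdraw is a single average factor, and the function names and numbers are invented for illustration, not measured on any hardware.

```python
def forward_cost(pixels, overdraw, material_cost, light_cost):
    """Forward: every shaded fragment, overdrawn ones included, pays for lighting."""
    return pixels * overdraw * (material_cost + light_cost)


def deferred_cost(pixels, overdraw, material_cost, light_cost,
                  gbuffer_write, gbuffer_read):
    """Deferred: overdraw only multiplies the geometry pass; light runs once per pixel."""
    geometry_pass = pixels * overdraw * (material_cost + gbuffer_write)
    lighting_pass = pixels * (gbuffer_read + light_cost)
    return geometry_pass + lighting_pass


PIXELS = 1280 * 720   # assumed resolution
OVERDRAW = 1.5        # assumed average shaded fragments per screen pixel

for label, light in (("cheap light", 4), ("heavy light", 40)):
    fwd = forward_cost(PIXELS, OVERDRAW, 10, light)
    dfr = deferred_cost(PIXELS, OVERDRAW, 10, light, 2, 2)
    print(label, "forward:", fwd, "deferred:", dfr)
```

With these assumed numbers the cheap light favours forward and the heavy light favours deferred; the crossover moves with the overdraw factor and the g-buffer costs, which is the "it all depends" part.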
 
Yes, valid points. I was assuming a pre-z pass in the forward renderer, but of course that adds another geometry pass as well, which trades off against the additional image-space pass of doing it deferred.

Long story short, deferred is normally better nowadays, particularly if you can do it tiled :)
 
Hi, thanks for the nice replies.
I started this thread because our project is just beginning, and we can't be sure yet whether we'll have lots of local lights or lots of caustic-style effects, so the decision on deferred lighting is hard to make.
But if deferred lighting is already better with a single sun light (with shadows), everything is easier.

In theory, and generally speaking, my current conclusion is that deferred lighting should perform better on a unified-shader GPU.
I have just tested a pretty complex scene and material on PS3 with both deferred and forward lighting, and the two performed similarly.
But I haven't done the same on a unified-shader architecture (Xbox 360, NVIDIA's G80 architecture on PC...).

Since a unified-shader GPU can put all of its shader units to work on pixel shader calculation during the lighting pass, that pass will be quicker than on a non-unified-shader GPU, and so quicker than forward lighting.

My reasons:
* Separating the calculations parallelizes better for today's complex materials.
* Blending multiple materials for terrain gets a performance gain.
* In a complex scene even zcull is not enough, not to mention small triangles; the wasted shading calculation is saved with deferred lighting.
 
I have just tested a pretty complex scene and material on PS3 with both deferred and forward lighting, and the two performed similarly. But I haven't done the same on a unified-shader architecture (Xbox 360, NVIDIA's G80 architecture on PC...).

Since a unified-shader GPU can put all of its shader units to work on pixel shader calculation during the lighting pass, that pass will be quicker than on a non-unified-shader GPU, and so quicker than forward lighting.
Our shader that writes the g-buffers is heavily ALU bound (almost 2x compared to fill), and this completely masks out the halved fill rate you get when writing to 2 render targets. The EDRAM masks out the extra backbuffer bandwidth. In our case deferred (2x8888 g-buffers + depth) offers a better pixel shader fill rate (with all bottlenecks counted) than our forward renderer (which needs more ALU in the geometry pass because of lighting). This is a really good thing, since there always tends to be some overdraw in the scene, and the overdraw count is unpredictable. Minimum frame rate is really important on consoles (you often want a v-sync locked 30 fps or 60 fps), and using techniques that do not dip too low in worst-case scenarios is always a good choice.
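The "ALU masks fill" argument can be sketched with a simple bottleneck model. It assumes ALU work and render-target writes overlap perfectly, so a pass takes as long as its slowest unit; the time units are arbitrary and the ALU figure is rounded up to exactly 2x for clarity (the post says "almost 2x").

```python
def pass_time(alu_time, fill_time):
    """Idealized model: ALU and fill run in parallel, the slower one sets the pace."""
    return max(alu_time, fill_time)


alu = 2.0                       # assumed: ALU work is about 2x the single-target fill time
fill_one_rt = 1.0               # assumed: fill time with one render target
fill_two_rt = fill_one_rt * 2   # halved fill rate with 2 render targets = doubled fill time

time_one_rt = pass_time(alu, fill_one_rt)
time_two_rt = pass_time(alu, fill_two_rt)
print(time_one_rt, time_two_rt)
```

In this model the second render target costs nothing as long as the doubled fill time stays at or below the ALU time; a shader that was fill bound to begin with would see the full penalty.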

Reading the g-buffers doesn't cause any performance loss on the unified-shader console either, if you bin your light sources into screen-space tiles and do all your lighting in one pass (you read your depth buffer and your g-buffers only once). The shader that lights the tile is heavily ALU bound, and this completely masks out all the TEX/bandwidth cost. So you get the g-buffer reads basically for free.
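A rough CPU-side sketch of binning lights into screen-space tiles, as described above. It assumes each light has already been projected to a screen-space circle `(cx, cy, radius)` in pixels and uses a conservative bounding-box test; the tile size of 16 and the function name are assumptions for illustration, not the layout used by any particular engine.

```python
def bin_lights(lights, width, height, tile=16):
    """Return bins[ty][tx] = list of light indices whose bounding box touches that tile."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for index, (cx, cy, radius) in enumerate(lights):
        # Conservative tile range covered by the light's screen-space bounding box,
        # clamped to the screen. Off-screen lights produce an empty range.
        x0 = max(int((cx - radius) // tile), 0)
        x1 = min(int((cx + radius) // tile), tiles_x - 1)
        y0 = max(int((cy - radius) // tile), 0)
        y1 = min(int((cy + radius) // tile), tiles_y - 1)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[ty][tx].append(index)
    return bins


# 64x32 screen = 4x2 tiles; two hypothetical lights.
bins = bin_lights([(8, 8, 4), (32, 16, 10)], width=64, height=32)
for row in bins:
    print(row)
```

The lighting pass would then read depth and g-buffers once per pixel and loop over only the lights in that pixel's tile, which is where the single read comes from.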

Blending multiple materials for terrain gets a performance gain.
That's true, and it's the same for decals (if you want to have proper lighting on all your decals).

But in our case virtual texturing helps much more than deferred lighting, as we only end up blending terrain and decals over just a few 128x128 blocks every frame, instead of the whole frame :)

not to mention small triangles; the wasted shading calculation is saved with deferred lighting.
That's true. Each polygon edge (that doesn't perfectly fit a hardware tile) also causes extra pixel shader processing for lights in forward rendering. So the smaller your polygons are, the more light processing is wasted in forward rendering.

Yes, valid points. I was assuming a pre-z pass in the forward renderer, but of course that adds another geometry pass as well, which trades off against the additional image-space pass of doing it deferred.
And a pre-z pass doesn't do anything about the pixel shader overdraw caused by coarse hi-z culling (4x4 or bigger boundaries on object edges, and the hi-z depth value being too inaccurate to distinguish nearby surfaces) or the small-triangle overdraw (the pixel shader runs for the whole tile even if only one pixel is inside it).
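To get a feel for the tile-granularity waste, here is a small sketch that counts how many pixels get processed if shading runs for every tile a primitive touches. It simplifies the primitive to an axis-aligned pixel rectangle and assumes a 4x4 tile; real hardware granularity and coverage rules differ, so treat the numbers as illustrative only.

```python
def covered_pixels(x0, y0, x1, y1):
    """Pixels actually inside the half-open rectangle [x0, x1) x [y0, y1)."""
    return (x1 - x0) * (y1 - y0)


def shaded_pixels(x0, y0, x1, y1, tile=4):
    """Pixels processed when every tile touched by the rectangle is shaded in full."""
    tiles_x = (x1 - 1) // tile - x0 // tile + 1
    tiles_y = (y1 - 1) // tile - y0 // tile + 1
    return tiles_x * tiles_y * tile * tile


for rect in ((5, 5, 6, 6), (0, 0, 64, 64), (2, 2, 66, 66)):
    covered = covered_pixels(*rect)
    shaded = shaded_pixels(*rect)
    print(rect, "covered:", covered, "shaded:", shaded, "ratio:", shaded / covered)
```

A single-pixel primitive pays for a whole tile, while a large tile-aligned one wastes nothing, which matches the point that smaller polygons waste proportionally more light processing in forward rendering.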

A pre-z pass perfectly solves a (non-convex) object's own overdraw and the object-based sorting inefficiencies, but it has a performance hit that scales with scene complexity. I personally always prefer techniques that have a (near) constant performance hit on all scenes (the same amount of processing for every visible final screen pixel) over techniques whose cost varies wildly with scene complexity.

A perfect renderer would be one that scales perfectly with the screen resolution. One rendered final screen pixel would always take constant time to produce, no matter the content. Currently we have two major components that break this rule: geometry processing (basically everything that uses the vertex shader) and overdraw (various kinds of it). To lower the geometry cost, it's important to process all geometry just once and to make all vertex processing as cheap as possible (simple shaders, low vertex counts at far distances, etc.). To lower the overdraw, it's important to move as much work as possible to screen space (where all the processing is guaranteed to be done on final visible pixels only).
 