Shading models in deferred rendering

joysui

Newcomer
Hi guys,
I wonder if there is a solution for handling multiple shading models in deferred rendering. As far as I know, integrating different shading models into a deferred renderer isn't practical, so right now all I can do is use the Blinn-Phong shading model for all materials.
If I store a material ID in the G-buffer, how do I then use that ID to fetch the right shading model? Via branching?
 
You can use whatever shading model you want as long as your G-buffer has all the parameters you need for it. What you can't do is use different models at the same time on a per-material basis as with forward renderers, although material ID buffers allow you to do that to some extent, as you mentioned. That might lead to branching or extra passes, depending on how you implement it. You can mix and match different shading models per-light though, which can be very useful.
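
A minimal sketch of that material-ID branch in HLSL, assuming a hypothetical two-render-target G-buffer layout and made-up model IDs (nothing here comes from a particular engine):

```hlsl
// Hypothetical two-RT layout, for illustration only:
Texture2D<float4> gBuffer0;   // rgb: albedo,         a: material ID / 255
Texture2D<float4> gBuffer1;   // rgb: encoded normal, a: specular power / 255

float3 BlinnPhong(float3 albedo, float3 N, float3 L, float3 V, float specPower)
{
    float3 H     = normalize(L + V);
    float  nDotL = saturate(dot(N, L));
    float  spec  = pow(saturate(dot(N, H)), specPower);
    return (albedo + spec.xxx) * nDotL;
}

float3 ShadePixel(uint2 pixel, float3 L, float3 V)
{
    float4 g0 = gBuffer0.Load(int3(pixel, 0));
    float4 g1 = gBuffer1.Load(int3(pixel, 0));

    uint   materialId = (uint)(g0.a * 255.0 + 0.5);
    float3 N          = normalize(g1.rgb * 2.0 - 1.0);

    // Select the shading model per pixel via the stored ID.
    if (materialId == 1)
    {
        // e.g. wrapped diffuse as a toy stand-in for a skin/foliage model
        return g0.rgb * saturate((dot(N, L) + 0.5) / 1.5);
    }
    return BlinnPhong(g0.rgb, N, L, V, g1.a * 255.0);
}
```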
 
Thanks milk. But packing all the parameters into the G-buffer makes the G-buffer fat, so I think it may not be practical. Also, using branching to select shading models isn't performance friendly, though I haven't tested it. I wonder if there are other methods to tackle the problem. Any thoughts?
 
Physically based shading doesn't necessarily need multiple shader variations. A single shader can handle both dielectric and metal materials.

http://seblagarde.wordpress.com/2011/08/17/hello-world/

In games we deal with two categories of material: dielectric and metal.
Dielectric materials (water, glass, skin, wood, hair, leather, plastic, stone, concrete…) have white specular reflectance. Metals have spectral (i.e. RGB-colored) specular reflectance, though some metals have the same Fresnel behavior at all wavelengths (like aluminium). See [1] or [2].

With this in mind, it can be tempting for performance reasons to have a separate shader for non-spectral (the most common) materials and one for spectral specular materials. This would reduce storage (requiring 1 texture channel instead of 3) and save a few mul/mad instructions in the shader (the pow is not affected).

However, a common trick to save draw calls is to merge material IDs (in the 3DS Max sense) together so there is only one draw call per object. Dielectric and metal materials on an object then share the same texture and the same shader. To keep this advantage, we use only one spectral specular lighting shader. The instruction penalty is really low, and texture storage is not a big deal either, as specular tends to be constant and only requires low-resolution textures.
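
As an illustration of why a single shader can cover both categories, here is a small HLSL sketch assuming a metalness-style parameterization (one common way to express this; not necessarily the exact scheme from the linked post):

```hlsl
// One shading path for both categories, driven purely by input data.
// Dielectrics get a monochrome F0 (~0.04); metals use their base color.
float3 ComputeF0(float3 baseColor, float metalness)
{
    const float3 dielectricF0 = 0.04;  // typical dielectric reflectance
    return lerp(dielectricF0, baseColor, metalness);
}

// Schlick's Fresnel approximation works unchanged for both, since F0 is
// simply near-white for dielectrics and colored (spectral) for metals.
float3 FresnelSchlick(float3 F0, float VdotH)
{
    return F0 + (1.0 - F0) * pow(1.0 - VdotH, 5.0);
}
```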

It's very important to feed correct input data to your physically based shading pipeline:
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

If you go this route, you have to forget all the shading hacks of the DX9 era: no custom per-material/per-object fake highlight cube maps and/or custom blending formulas to emulate specific surface behavior (those only work well under a single lighting condition). You also need a good unified (global) lighting solution (and preferably area light sources for dynamic lights).

Link to Siggraph 2013 physically based shading course: http://blog.selfshadow.com/publications/s2013-shading-course/
 
Well, that's just the way of modern game rendering: find the right shaders and the right parameters to cover most of your assets. Many games get a lot of mileage out of a couple of parameters with physically based values. Classically you need an RGB albedo plus mono specular and roughness; that gets most stuff covered reasonably accurately. Different games might add extras like emissive, translucency (for screen-space SSS), colored specular, or extra normal layers (Samaritan used one for wet surfaces), depending on how important those effects are for that game specifically.

There have been advancements in packing techniques to keep all these G-buffer layers under control, from inferred rendering to chromatic sub-sampling. But usually this extra stuff is only really important for so few objects that devs simply render them in forward after the fact, much like their transparencies. If you've got a light accumulation buffer, that becomes even more straightforward.

Also, try to avoid good old Blinn-Phong for your lights and experiment with the better options, even if approximated; that's what most of the best looking games have done recently, and it adds a lot to their look.
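
For reference, the classic albedo + mono specular + roughness set above could be laid out in two render targets like this (the channel assignments are purely illustrative):

```hlsl
// Hypothetical compact G-buffer for the classic parameter set:
struct GBufferOutput
{
    float4 rt0 : SV_Target0; // rgb: albedo,         a: mono specular
    float4 rt1 : SV_Target1; // rgb: encoded normal, a: roughness
};
```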
 
There have been advancements in packing techniques to keep all these G-buffer layers under control, from inferred rendering to chromatic sub-sampling. But usually this extra stuff is only really important for so few objects that devs simply render them in forward after the fact, much like their transparencies.
I personally hate additional forward passes, because a single forward-rendered object near the camera covers a big screen area, and thus causes a surprisingly large (and unpredictable) additional cost (the lighting and shading cost is almost doubled for those pixels). The extra cost can be reduced by a depth prepass (and stencil marking), but that doubles the geometry cost for all objects. I hate that even more :(

Instead of having extra forward passes, I prefer interleaving transparency data into MSAA subsamples. This way the extra cost becomes more manageable, and everything can flow through the same lighting pipeline. Also, the maintenance cost of separate forward-rendered shaders (and all their permutations) can become quite large in a big project.

If you need lots of texture data per pixel and you are using virtual texturing, you are better off storing the (indirected) texture coordinate (pointing into the virtual texture tile cache atlas) in the g-buffer. This allows you to read the (DXT) compressed virtual texture data directly in the lighting shader. It saves bandwidth, since you don't need to store the uncompressed texture data (X times overdraw) and read the uncompressed data back later (in the lighting shader). 2x16i is enough for the UV in the bilinear case. Trilinear needs an extra 8 bits for the mip. Anisotropic needs another extra 8 bits (for UV gradients).

Mipmapped shadow mapping also needs UV gradients in the deferred lighting shader. Anisotropic filtering for EVSM shadow maps looks quite good indeed. UV gradients are also needed for g-buffer decal rendering (but you don't need that if you have virtual texturing, since it's faster to render decals directly to the VT tile cache). Basically you need gradients for every texture access that uses mipmaps in the lighting shader. Many current generation games sidestepped the problem with various hacks ("no mipmaps" being the most common one), but that's not going to be good enough in the long run (for either quality or performance). Mipmapping saves performance (and prevents oversampling noise), and anisotropic filtering prevents undersampling (blurriness at grazing angles). You want both.
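
A rough HLSL sketch of the trilinear case, with a hypothetical g-buffer layout and unpacking (the real format would depend on the bilinear/trilinear/anisotropic trade-offs described above):

```hlsl
// Reading compressed virtual texture data in the lighting shader via a
// UV stored in the g-buffer. Layout: xy = 2x16i indirected UV, z = mip.
Texture2D<uint4>  gBufferVT;       // e.g. an R16G16B16A16_UINT target
Texture2D<float4> tileCacheAtlas;  // DXT-compressed tile cache atlas
SamplerState      trilinearSampler;

float4 FetchAlbedo(uint2 pixel)
{
    uint4  g   = gBufferVT.Load(int3(pixel, 0));  // point-sampled load
    float2 uv  = float2(g.xy) / 65535.0;          // unpack 2x16i UV
    float  mip = (float)(g.z & 0xFF);             // unpack 8-bit mip level
    return tileCacheAtlas.SampleLevel(trilinearSampler, uv, mip);
}
```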

Branching in the lighting shader is also a good option (if you need multiple lighting models). Modern GPUs need much smaller branch coherency than old ones: a 64-wide wave maps to an 8x8 pixel tile, which is not bad. You only pay extra for object border pixels (in areas where the shading model changes). Be sure to design your lighting shader so that most of the code is identical between the branches (g-buffer unpacking, light loop bodies, etc). If only a small part of the shader is inside the lighting-model if-else branch, you only pay extra for that part (as both if-else sides are executed, in the border regions, for all 64 pixels of the 8x8 tile). A small branch also keeps the added GPR count low.

Also remember that you don't need to reserve space for every lighting model separately in the g-buffer. You can reserve some of the g-buffer channels for model-specific purposes (these channels have a different meaning for different lighting models). This doesn't cause any problems with filtering, since g-buffers are always point-sampled (use the DirectX Load instruction for best performance). If you are using the virtual texture UV based g-buffer, you can of course read as many extra texture layers inside a branch as you want (using the same shared g-buffer gradients for sampling, as gradient instructions can't be placed inside a branch).
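
A structural sketch of that advice in HLSL: the light loop stays shared, and only the BRDF evaluation sits inside the per-model branch (the two toy BRDFs and all names here are illustrative):

```hlsl
#define MODEL_STANDARD 0
#define MODEL_SKIN     1

cbuffer Lights
{
    uint   numLights;
    float4 lightPosRadius[8]; // xyz: position, w: radius (unused here)
    float4 lightColor[8];
};

float3 ShadeLights(float3 worldPos, float3 albedo, float3 N, uint modelId)
{
    float3 result = 0;
    for (uint i = 0; i < numLights; ++i)          // shared light loop body
    {
        float3 L     = normalize(lightPosRadius[i].xyz - worldPos);
        float  nDotL = saturate(dot(N, L));

        float3 brdf;
        if (modelId == MODEL_SKIN)                // small per-model branch
        {
            // Toy stand-in: wrapped diffuse for a softer skin-like falloff.
            brdf = albedo * saturate((dot(N, L) + 0.3) / 1.3);
        }
        else
        {
            // Toy stand-in: plain Lambert diffuse.
            brdf = albedo * nDotL;
        }
        result += brdf * lightColor[i].rgb;       // shared accumulation
    }
    return result;
}
```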
 
Another good optimization (for modern GPUs) is to use 4x16i (64-bit) formats for storing the g-buffer instead of the commonly used 8888 (32-bit) formats. 4x16i formats are full rate (at least on AMD GCN cards), and thus give double the fill rate compared to having twice as many 8888 MRTs. The extra bit packing at the end of the shader costs a few (integer) ALU instructions, but you should never be ALU bound in the g-buffer shader (so the cost should be practically zero). The (point) sampling of 4x16i textures is also full rate (at least on GCN).
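
A small sketch of the kind of integer bit packing this implies at the end of the g-buffer shader (the channel layout is made up):

```hlsl
// Pack two [0,1] values into one 16-bit integer channel (8 bits each).
uint Pack8x2(float a, float b)
{
    uint ia = (uint)(saturate(a) * 255.0 + 0.5);
    uint ib = (uint)(saturate(b) * 255.0 + 0.5);
    return ia | (ib << 8);
}

// e.g. at the end of the g-buffer pixel shader, writing to a 4x16i MRT:
// output.x = Pack8x2(albedo.r, albedo.g);
// output.y = Pack8x2(albedo.b, monoSpecular);
```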

It's a very good thing for deferred rendering that hardware manufacturers needed to improve 4x16-bit rendering/sampling speed to make HDR rendering fast. For g-buffer rendering we need to exploit it manually, but that's fine, as DX10 already brought a good set of integer operations for bit packing.
 
Wow, so much valuable information. Thanks, guys.
sebbbi, I also hate deferred + forward, so I want to know about other methods to deal with the problem of shading models. I don't think I can afford the branching; you said "Modern GPUs need much smaller branch coherency than old ones", and I have the "old ones": I'm still in the DX9 era. :(
And I can't open the link you posted.
Thanks again!
 
Even DX9 hardware can be quite capable, but optimizations depend on the particular hardware. E.g. mobile phones, although they look DX9-ish, usually cope well with branching. On consoles, although they're not branching friendly, you can use some CPU assistance to split buffers into material buckets depending on IDs.
 
If not all of them, then at least these:

Real Shading in Unreal Engine 4
Crafting a Next-Gen Material Pipeline for The Order: 1886
Physically Based Shading at Pixar
 
Back when we were doing "traditional" deferred rendering, we used to just branch per-pixel on a BRDF ID stored in the G-buffer in order to handle things like anisotropy, skin, hair, etc. For the most part it worked fine on the DX11-class GPUs we were targeting at the time. Hair gave us problems though, since it was alpha-tested and had poor coherency in a lot of places.
 
Hi, MJP.
Is branching in DX11 more efficient than in DX9?
Branching is the intuitive approach. I wonder if I can use some indirect method instead, like lookup tables.
 
How efficient branching is depends on the hardware and on the composition of the shader; it has little to do with the DirectX level. You can always use any method you want, but every single one is basically a trade-off. Lookup tables cost you a texture fetch (if your LUT is a texture) or a lot of registers (if you use constants). (And I don't really see how LUTs would eliminate branching; you can reduce it using more arithmetic, though.) Branching as such may hinder parallelism. What tends to kill performance with conditional branching is inconsistency between the results of comparisons in different pipes running on the hardware. But even this is heavily hardware dependent, so there's no best approach; only trade-offs.
 
Is branching in DX11 more efficient than in DX9?
Branching is the intuitive approach. I wonder if I can use some indirect method instead, like lookup tables.
Modern DX11 hardware has separate scalar execution units that handle branching instructions (and the math involved in them). Thus if your branches are coherent, they are usually almost free. Branch coherency requirements have also decreased: Nvidia GPUs have 32-wide warps and GCN has 64-wide waves. In many cases (especially when working with compute shaders) you can guarantee that all your branches are coherent.

On old DX9 GPUs branching was very slow. In most cases a multipass algorithm with stencil masking was faster than branching.
 
Many programming constructs can be done without branching. Selecting a value without branching is possible (a = b ? c : d compiles to a conditional move). Min/max (and many other) operations are also branchless. With these you can, for example, build a branchless binary search or tree iteration.
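
A couple of small HLSL examples of such branchless patterns (the names are illustrative):

```hlsl
// a = b ? c : d compiles to a conditional move, not a branch:
float SelectSpecPower(bool isMetal, float metalPower, float dielectricPower)
{
    return isMetal ? metalPower : dielectricPower;
}

// min/max are also branchless, e.g. a branchless clamp:
float ClampSpec(float s, float lo, float hi)
{
    return min(max(s, lo), hi);
}
```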

Long branches should be avoided because of the increase in GPR count: the more registers your shader needs, the fewer threads a single compute unit can run concurrently (latency hiding gets worse).
 
On old DX9 GPUs branching was very slow. In most cases a multipass algorithm with stencil masking was faster than branching.

Which can be an option if you want to do material-specific shading in a deferred renderer on that type of hardware, I think... The more distinct material IDs you have, the more passes (each with its own stencil mask) you will need, though.
 