Deferred normals from depth.

Graham

Hello :-)
I've been thinking on this one for quite a while.

Basically, I had this idea floating about - that for a deferred renderer, you could get away without using MRT at all. Simply render depth + diffuse colour.

Now, whether it's actually practical is totally beside the point (although I could see use on the 360 - tiling advantages... less bandwidth hit).
My 'ultimate' idea revolves around not using normal maps at all - go all out with displacement maps, and let the depth dictate the lighting. The 360 rears its head here again. However, one point crops up - I'll get to it in a moment.


Anyway, I was mucking about, and actually got the damned thing to work. Which is always a nice thing. And to an extent it works better than I expected.


So. The whole idea is that you can get the information you need for your light pass - i.e. position + normal - all from the depth buffer.

The first challenge was getting position. Naturally, being me, I didn't look for the answer but worked it out myself. It's a pretty simple quadratic equation problem. But enough of that.

The other bit was getting a normal.
And this is where things get fun, and SM3.0 only :)

So basically I finally have a use for ddx() and ddy() ;-). ddx and ddy are partial derivatives between adjacent pixels in the 2x2 quad, horizontally and vertically, so you can work out how much a value changes compared to the neighbouring pixel. This, as it turns out, is nice and easy for working out the change in surface position along the x and y axes in screen space. Cross those two deltas, and you have a normal :)


Anyway. Here is some shader code:

Code:
float4x4 wvpMatrixINV; // inverse world view projection
float near = 50.0; // near clip
float far = 5000.0; // far clip
float2 window = float2(800,600);
float2 windowHalf = window * 0.5;

...

float writtenDepth = ....; // load depth from the depth buffer

//quadratic to get real depth (aka w)
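// For reference: with a standard D3D projection, the stored value is
//   writtenDepth = far * (z - near) / (z * (far - near))
// and solving that for z (the view-space depth, aka w) gives
//   z = near * far / (far - writtenDepth * (far - near))
// which is what the next line computes, just rearranged.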
float depth = -near / (-1+writtenDepth * ((far-near)/far));

float2 screen = (vpos.xy-windowHalf)/windowHalf; // screen pixel point

float4 worldPoint = float4(screen, writtenDepth, 1) * depth; // get the vertex position; * depth accounts for the /w perspective divide
float4 worldPointInv = mul(worldPoint,wvpMatrixINV); // mul by inverse of the world view projection matrix

//yay
float3 normal = normalize(cross(ddx(worldPointInv.xyz), ddy(worldPointInv.xyz)));

//dance
Output.Colour = float4(normal*0.5+0.5,0);

And here is a not-so-pretty pictar.

[image: depth_normals.png]



There are lots of ways this could be modified of course - store height in the alpha channel of your colour map to jitter things around. Could work.
Of course the thing I didn't think of was that it's all flat shaded (*slap*) - Soo... Er.... MORE TESSELLATION!

I'm not actually too sure if you could realistically see a performance boost using this method. Although it does, in effect, provide a fairly potent form of normal map compression. :p
I suppose in bandwidth restricted environments, there could well be a boost.

Thoughts? Questions? Links to papers already demonstrating this? :)


* ohh, and yes that is a quake3 map in an XNA application. I'm slowly porting an old .net Q3 renderer I upgraded a year or so back... pics: 1, 2, 3 (that's not lightmapped btw :))
 
The problem with using ddx and ddy to determine normals in screen space is that ddx and ddy only work on a 2x2 pixel granularity. E.g. the horizontal rate of change you obtain for pixel n (with n even) is the same as pixel (n+1). This will cause continuity issues when lower-frequency normal maps are used which can result in banding artefacts.
Another issue is that an adjacent pixel may belong to a completely different object so computing a normal from those depths will be mathematically incorrect (and once again can create visual issues on polygon edges due to lighting). This issue is actually similar to the issue pertaining to the use of MSAA with deferred shading.
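One (more expensive) way around both problems is to skip ddx/ddy altogether and fetch the neighbouring depths yourself, then keep the smaller position delta on each axis so you don't straddle an edge. Very rough, untested sketch - names reused from Graham's snippet, position reconstruction as in his post:

Code:
sampler depthMap;
float near;
float far;
float4x4 wvpMatrixINV;

// reconstruct position from the depth buffer at an arbitrary uv,
// same maths as in the original post
float3 positionAt(float2 uv)
{
    float d = tex2D(depthMap, uv).r;                    // stored depth
    float w = -near / (-1 + d * ((far - near) / far));  // view-space depth (aka w)
    float2 clipXY = uv * float2(2, -2) + float2(-1, 1); // uv -> clip-space xy
    return mul(float4(clipXY, d, 1) * w, wvpMatrixINV).xyz;
}

float3 normalFromNeighbours(float2 uv, float2 texelSize)
{
    float3 p = positionAt(uv);

    // position deltas to the horizontal and vertical neighbours
    float3 dxPos = positionAt(uv + float2(texelSize.x, 0)) - p;
    float3 dxNeg = p - positionAt(uv - float2(texelSize.x, 0));
    float3 dyPos = positionAt(uv + float2(0, texelSize.y)) - p;
    float3 dyNeg = p - positionAt(uv - float2(0, texelSize.y));

    // keep the smaller delta on each axis, so we (mostly) avoid crossing an edge
    float3 dx = dot(dxPos, dxPos) < dot(dxNeg, dxNeg) ? dxPos : dxNeg;
    float3 dy = dot(dyPos, dyPos) < dot(dyNeg, dyNeg) ? dyPos : dyNeg;

    return normalize(cross(dx, dy));
}

That's four extra depth fetches per pixel though, so whether it's a win is another question.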
 
Yup.
I know :)

But I chose not to mention that, because I have a solution - it's more complex and I haven't implemented it yet :p It would end up not using ddx/ddy on the edges, branching off into more complex code instead. It also requires a good edge outline, which I've only managed to produce by rendering the scene twice. Which all kinda defeats the initial point :p
but it's still interesting.

It all sort of stems from the thing I did ages ago with the shadowed bunnies (two more pics); however, it extends into doing antialiasing as a post-process effect, similar to this post from a while back, but fully on the GPU. From what I can tell it'd be a DX10 or 360 thing. Not doable on SM3.


This was one of the steps along the path to figuring out if my overall grand idea would actually work.
So while in its current form it's utterly useless, it does prove a point.

Sorry if I didn't make that clearer in the original post :)
I guess I got a bit wrapped up in it all :p But I still think it's interesting :)


My main goal with doing this was to test if this part of the algorithm was feasible, and it seems it is. I was very worried that precision loss through all the calculations, etc, would screw it up.
 
The position-from-depth encoding is fairly standard for deferred shading - IIRC they discuss it a bit in the GPU Gems 2 chapter on deferred rendering.

We've used something similar in a different context to get face normals and it works pretty well. I'm a little unsure of how this helps deferred shading much, though. Just to avoid storing the normal? As you note it'll be flat shaded for one, and are the 2/3 floats really that bad?

You seem to imply that this is to avoid MRT - why would you want to avoid that, especially when MRT is 100% efficient on the G80 (and hopefully R600). I also don't see how you can avoid storing the rest of the BRDF inputs (data-driven stuff like specular colours/powers, emissive, etc).

I am intrigued by the idea, but is losing normal interpolation really worth it to save 2/3 float components in the G-buffer, or is there some other motivation?
 
So, could you use the alpha channel to store a material ID?

With a material ID you can then look up the correct combination of textures/maps for each pixel.

Depth + normal + material ID with MSAA samples available when reading from the depth buffer (so, SM4/XB360 only) to produce well-behaved edges and polygon intersections?
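
Hypothetically, the packing could be as dumb as this (made-up names):

Code:
// rgb = diffuse albedo, a = material ID (up to 256 materials in an 8-bit alpha)
float4 packColourAndMaterial(float3 albedo, int materialId) : COLOR0
{
    return float4(albedo, materialId / 255.0);
}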

Jawed
 
Very nice Graham - that certainly is pretty damn cool! :) To be honest, I fail to see how the lack of interpolation is even a problem; you just need to have your height and/or normal maps created appropriately. Interpolation is a good hack if you don't have normal maps, though, which you obviously don't have in a Quake3 renderer!

AndyTX: MRT might be 100% efficient on the G80, but it's still going to scale by the datapaths. Obviously, 3x64bpp is going to take longer than 1x32bpp. What this means is that if you begin outputting a lot via MRTs, it's going to begin making sense to render things front-to-back, or even doing an initial Z-Pass. The first kills batching, the second wastes geometry processing and rendering time. Take your pick. And front-to-back is never perfect, of course. So it's not a major optimization, but certainly not useless either.

Jawed: That's worthless if you don't also have the texture coordinates. I have been researching this approach (passively, but thinking about it every couple of days) for the last few weeks. It has its fair share of theoretical problems, which are pretty much the same ones as presented by SuperCow for Graham's algorithm. Nothing that can't be fixed, though - for a cost.

In practice, this is a very inelegant but potentially quite efficient way to emulate what a much smarter rasterizer would do: allow triangles sharing an edge to be rasterized in the same quad pipeline, while simultaneously (nearly) supersampling edges.



Uttar
 
To be honest, I fail to see how the lack of interpolation is even a problem; you just need to have your height and/or normal maps created appropriately. Interpolation is a good hack if you don't have normal maps, though, which you obviously don't have in a Quake3 renderer!
But you don't have texture coordinates either, so what good is a normal map? Besides, even with normal maps you want interpolation for the tangent space basis (excepting world space normal maps, but then they need to be unique, like light maps). Theoretically texture arrays can index enough data to allow for one gigantic atlas of all textures in the scene, but I can't see that being any more efficient than the alternative.

MRT might be 100% efficient on the G80, but it's still going to scale by the datapaths.
Granted, but in my experience reading/writing the G-buffer is still the cheap part of deferred shading. In the long term it won't be a problem at all since it's predictable streaming memory access.

Obviously, 3x64bpp is going to take longer than 1x32bpp.
I'm still missing how you avoid writing the rest of the G-buffer data... maybe I'm just being stupid. Someone care to explain? If you're not writing the BRDF inputs you're missing one of the key benefits of deferred shading: the ability to use the rasterizer to solve the light contribution problem per-pixel.
 
But you don't have texture coordinates either, so what good is a normal map?
Indeed, in a traditional deferred renderer you don't have the texture coordinates. There is nothing preventing you from writing the texture coordinates and a material (-> texture!) index, though, at least for a DX10-level-only renderer. If you don't have texture arrays, this has to become a switch statement, which isn't too efficient in hardware for obvious reasons!
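
With texture arrays the per-pixel lookup in the deferred pass could then be as simple as this (D3D10 HLSL, sketch only, names invented):

Code:
Texture2DArray diffuseMaps;
SamplerState   linearSampler;

// the material index stored in the G-buffer selects the array slice - no switch needed
float3 fetchAlbedo(float2 uv, float materialIndex)
{
    return diffuseMaps.Sample(linearSampler, float3(uv, materialIndex)).rgb;
}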

But then you need a way to detect discontinuities (and Graham's approach for normals might complicate that), and while I think I've got one that makes sense on paper, I obviously haven't implemented it yet. I am also a bit uncertain about its performance characteristics... assuming it even works at all, of course.


Uttar
 
Hello.

:)

I probably should have been more obvious in my original post I guess. This was simply an idea I had, and I wanted to see if it actually worked, and more importantly find out if precision problems made it useless.

The overall benefit I see is in the displacement-mapped case: there, storing displacement in the alpha of an existing texture is a lot simpler than adding a normal map with alpha. Then again, on such hardware it's probably possible to compute the normals in the vertex/geometry shader anyway, so it may well all be totally redundant. :)

At the end of the day, it was an interesting experiment, and had some interesting maths, so I thought I'd share it :)

I'll probably end up not even using it.
 
Indeed, in a traditional deferred renderer you don't have the texture coordinates. There is nothing preventing you from writing the texture coordinates and a material (-> texture!) index, though, at least for a DX10-level-only renderer.
Sure, but your "material index" is just a (potentially compressed) representation of all of the other BRDF inputs that you're neglecting to render, so I don't think much is gained. In particular, STALKER's approach of using a material index into a 3D texture of different BRDF slices seems to be a good trade-off, but still requires the other inputs to be saved. I don't think there's any getting around the fact that these inputs must be represented in some form or another - compressed or otherwise. Maybe packing more data into the material index could be handy for some of the inputs though.
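
In shader terms that material evaluation ends up being little more than a lookup along these lines (rough sketch, names invented - say N.L and N.H on the other two axes):

Code:
sampler3D materialLUT; // BRDF slices, one per material index

float2 evaluateMaterial(float NdotL, float NdotH, float materialIndex)
{
    // the material index selects the slice; returns e.g. diffuse and specular terms
    return tex3D(materialLUT, float3(NdotL, NdotH, materialIndex)).xy;
}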

At the end of the day, it was an interesting experiment, and had some interesting maths, so I thought I'd share it :)
Certainly, and I'm glad you did! I was just trying to understand the advantages and disadvantages - sorry if I implied that it was uninteresting (quite the opposite)!
 
Hmm.

I just had another idea..

I'm wondering if the use of ddx/ddy on a texture coordinate would give enough information to eliminate the need for tangent/binormal vertex data... :) :)

Be more accurate too, but per pixel transform, not per vertex.
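
Something like this is what I'm picturing - completely untested sketch, building the tangent frame per-pixel from the screen-space derivatives of position and uv:

Code:
// untested sketch: per-pixel tangent frame from screen-space derivatives.
// N = surface normal, p = world (or view) space position, uv = texture coordinate
float3x3 tangentFrameFromDerivatives(float3 N, float3 p, float2 uv)
{
    // edge vectors of the pixel's triangle, in position and uv space
    float3 dp1  = ddx(p);
    float3 dp2  = ddy(p);
    float2 duv1 = ddx(uv);
    float2 duv2 = ddy(uv);

    // solve for the tangent and binormal
    float3 dp2perp = cross(dp2, N);
    float3 dp1perp = cross(N, dp1);
    float3 T = dp2perp * duv1.x + dp1perp * duv2.x;
    float3 B = dp2perp * duv1.y + dp1perp * duv2.y;

    // scale-invariant frame
    float invmax = rsqrt(max(dot(T, T), dot(B, B)));
    return float3x3(T * invmax, B * invmax, N);
}

No idea yet how it behaves numerically, or at the quad boundaries SuperCow mentioned.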
 
Hmm.

I just had another idea..

I'm wondering if the use of ddx/ddy on a texture coordinate would give enough information to eliminate the need for tangent/binormal vertex data... :) :)

Be more accurate too, but per pixel transform, not per vertex.
You might want to read chapter 2.6 of ShaderX5: Normal Mapping without Precomputed Tangents, by Christian Schüler. ;)
 
You might want to read chapter 2.6 of ShaderX5: Normal Mapping without Precomputed Tangents, by Christian Schüler. ;)
My copy of ShaderX5 is in the mail, so I guess I know what article I'll begin with now, thanks! ;)


Uttar
 
Ok.

Ohh and btw, the next algorithm I'll post will be even awesomer.*


*provided I get it to work correctly.
 
You might want to read chapter 2.6 of ShaderX5: Normal Mapping without Precomputed Tangents, by Christian Schüler. ;)

Could you perhaps summarize his algorithm (I don't have the book)? Is my guess right that he uses derivatives of the texture coordinates and position to compute the tangent and binormal per-pixel?
 
Another deferred shading backer :D
Though GPUs have enough power to process pixel-sized polygons, it still might not be a win compared to bigger triangles + normal maps. So I prefer packing normals.
The XBOX 1 RT DR paper could be done on ATI cards with a D3D hack, but not as easy on NV (What a shame MS does not provide DST support); or the normal could be extracted by modifying the output depth, but that might be a very bad idea.
 
Another deferred shading backer :D
Though GPUs have enough power to process pixel-sized polygons

I have a fantasy of using dynamic displacement mapping to effectively do ~pixel-sized triangles in a single pass. Then do all your fancy stuff in deferred passes. One day... one day...


not as easy on NV (What a shame MS does not provide DST support)

[edit] Binary search!

:p :p :devilish:
 