Hardware two sided depth test?

51mon

Newcomer
Hi
I have been thinking about something for a while now. When performing the depth test, only one side of a mesh can be compared against the z-buffer, at least on the hardware and APIs I have come across. In several cases it would be great to run this test on both back and front faces in the same pass. Space clipping in deferred rendering could gain a lot, for example. I don’t have insight into the details of hardware or API implementations, but it doesn’t feel like an impossible request. Two-sided stencil features already exist. Is there any hardware or API where you can do this already? Does anyone know why it doesn’t exist? Does anybody know of any future plans to implement this?

Thanks
 
I'm not sure I understand what you are proposing.

Are you wanting that
  • if a triangle is clockwise choose depth test 1 (e.g. pass if new depth > old depth) else
  • if the triangle is anticlockwise choose depth test 2 (e.g. pass if new depth <= old depth)?

I don't understand how this would be used, but I'm keen to be enlightened.
 
In a deferred lighting pass you might want to render "light volumes" to check which pixels are affected by a certain light. Only those pixels with depth values that lie inside the light volume can be lit, thus you want to find out whether the depth buffer value at a given pixel is between the front faces and the back faces of the light volume.

Of course most GPUs handle only one triangle at a time and would have no way of finding out which other triangle is the back face for the front face fragment you're currently rendering.

It's also relatively easy to render the back faces of the light volume setting stencil to 1 when the depth test fails, then render the front faces with stencil test. Light volume geometry can be very simple, it does not have to be accurate since you have to calculate light attenuation anyway.
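As a rough per-pixel sketch of the two-pass stencil approach just described: the function name `light_volume_mask` and the flat lists of depths are made up for illustration, and it assumes smaller depth means closer to the camera, a "less" depth test, and a convex volume that covers every listed pixel.

```python
def light_volume_mask(scene_depth, front_depth, back_depth):
    """Return True for each pixel whose scene depth lies inside the volume.

    Hypothetical helper: each list holds one depth value per pixel.
    """
    lit = []
    for scene, front, back in zip(scene_depth, front_depth, back_depth):
        # Pass 1: back faces. Stencil is set to 1 where the depth test FAILS,
        # i.e. the scene surface is at or in front of the back face.
        stencil = 0 if back < scene else 1
        # Pass 2: front faces with stencil test "equal 1" and the normal
        # depth test. The fragment survives only if the front face is in
        # front of the scene surface.
        lit.append(stencil == 1 and front < scene)
    return lit


mask = light_volume_mask(
    scene_depth=[0.5, 0.2, 0.9],
    front_depth=[0.3, 0.3, 0.3],
    back_depth=[0.7, 0.7, 0.7],
)
```

Only the first pixel ends up marked as lit: the second has its surface in front of the volume, the third behind it.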
 
As you say, light volumes can be handled in the same manner as shadow volumes, so I think I'd need to see a more convincing use case. <shrug>
 
Yeah you’re right. Sorry, my mind slipped and I missed that obvious fact. To enable what I request you would need to extend early-z functionality and some fast stencil as well (1 bit would be enough). The test would look something like this:

If(Backfacing)
{
    If(PassTest1)
    {
        If(Stencil)
        {
            PerformShading
        }
        Else
        {
            SetStencil
            DiscardFragment
        }
    }
    Else
    {
        DiscardFragment
    }
}

Else
{
    If(PassTest2)
    {
        If(Stencil)
        {
            PerformShading
        }
        Else
        {
            SetStencil
            DiscardFragment
        }
    }
    Else
    {
        DiscardFragment
    }
}

After each instance
{
    ClearStencil
}


My idea is that you would be able to clip space in 3 dimensions completely, with all convex volumes (e.g. lights). With hardware instancing you would also be able to handle a huge number in one pass. I don’t know if this is possible or if the target applications would be enough to be worth the labor?

(hopefully I’m right this time)
 
As you say, light volumes can be handled in the same manner as shadow volumes, so I think I'd need to see a more convincing use case. <shrug>
Actually, one feature I would love is a separate stencil operation for stencil fail depth pass, so that you can distinguish between depth fail and depth pass for all pixels that fail stencil. I would love to be able to render a crapload of light volumes efficiently without having to change renderstates in between.

Right now, there are two options. One is to mark pixels with color writes disabled (as with shadow volumes), and then enable color write, send the geometry again, and run the shader on marked pixels while clearing the stencil buffer.

The other method doesn't need renderstate changes if you flip the Z-test, assuming you can lay out the index buffer to draw frontfaces first (easily doable with spherical light volumes, among other shapes):
1) All frontfaces pass the stencil test. They set stencil=1 on Z-pass, and you use early out in the pixel shader by testing the vFace register. Nothing should run the lighting shader.
2) For backfaces, the stencil test passes if equal to 0, and it clears the value too. The pixel shader will run on backfaces.
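A small per-pixel simulation of these two steps, using only the three stencil ops that exist today (fail, zfail, pass). This is a sketch under assumptions: the names `draw_fragment`, `shade_pixel` and the op strings are invented for illustration, smaller depth means closer, the depth func is flipped to "greater", stencil starts at 0, and the front face reaches the pixel before the back face. The vFace early out is modeled only as a counted shader invocation.

```python
def apply_op(op, value):
    """Apply a stencil op given by name (illustrative subset)."""
    if op == "incr":
        return value + 1
    if op == "zero":
        return 0
    return value  # "keep"


def draw_fragment(stencil, stencil_pass, depth_pass, ops):
    """Stencil test first, then depth. Returns (new_stencil, reaches_shader)."""
    if not stencil_pass:
        return apply_op(ops["fail"], stencil), False
    if not depth_pass:
        return apply_op(ops["zfail"], stencil), False
    return apply_op(ops["pass"], stencil), True


FRONT = {"fail": "keep", "zfail": "keep", "pass": "incr"}  # stencil func: always
BACK = {"fail": "zero", "zfail": "keep", "pass": "keep"}   # stencil func: equal 0


def shade_pixel(scene, front, back):
    """Return (lit, shader_invocations, final_stencil) for one pixel."""
    stencil, invocations = 0, 0
    # Step 1: front face marks the pixel when it lies behind the scene surface.
    stencil, ran = draw_fragment(stencil, True, front > scene, FRONT)
    invocations += ran  # the shader starts but exits early on the face check
    # Step 2: back face passes only on unmarked pixels; the fail op clears the mark.
    stencil, ran = draw_fragment(stencil, stencil == 0, back > scene, BACK)
    invocations += ran
    return ran, invocations, stencil
```

With a volume spanning depths 0.3 to 0.7, a surface at 0.5 is lit, while a surface at 0.2 is not lit but still costs one early-out shader invocation on the front face. In every case the stencil value is back at 0 afterwards, which is what lets the next volume be drawn without a state change.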

If we had a new D3DRS_STENCILFAILZPASS renderstate, then in step 1 we could fail all pixels in the stencil test but still mark only the Z-pass pixels. Then we wouldn't have to rely on dynamic branching, and the frontfaces are written super fast. It always seemed weird to me that there are only 3 renderstates for stencil operations when you have 4 different conditions: SPDP, SPDF, SFDP, SFDF. Why lump the last two together?

Anyway, I guess it's not a big deal. Such coherent early out is probably fast enough in the existing method.
 
My idea is that you would be able to clip space in 3 dimensions completely, with all convex volumes (e.g. lights). With hardware instancing you would also be able to handle a huge number in one pass. I don’t know if this is possible or if the target applications would be enough to be worth the labor?
I just realized that both you and I are trying to do exactly the same thing. You looked at the issue as a Z test inflexibility (you want different Z tests for each side), and I looked at it as a stencil test inflexibility (I want different stencil ops for both Z test results, even when stencil fails).

Try my method above. Early out in the pixel shader is not as good as failing the Z test, but it's a lot faster than doing the lighting computation.
 
It always seemed weird to me that there are only 3 renderstates for stencil operations when you have 4 different conditions: SPDP, SPDF, SFDP, SFDF. Why lump the last two together?
That doesn't look weird to me at all. The notion those are "lumped together" is inaccurate. Stencil test is clearly defined to happen BEFORE depth test, and if a fragment fails the stencil test it's discarded. How could you possibly take a different action based on the depth test if the fragment has never reached the depth test stage in the first place?
Granted, of course you could redefine stencil and depth test so they'd happen at the same time (and maybe current hw indeed does both tests at the same time anyway), but the logical view of the pixel pipeline as it has evolved from history just isn't like that.
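To make the ordering argument concrete, here is a minimal sketch of how the logical pipeline picks one of its three stencil ops. The function name and the returned labels are illustrative only and do not come from any API.

```python
def selected_stencil_op(stencil_pass, depth_pass):
    """Return which of the three stencil ops applies to a fragment."""
    # The stencil test runs first. A fragment that fails it is discarded
    # before the depth test, so depth_pass is never consulted here.
    if not stencil_pass:
        return "stencil_fail"
    return "depth_pass" if depth_pass else "depth_fail"


# The four conditions from the discussion, mapped to the op that fires.
conditions = {
    "SPDP": selected_stencil_op(True, True),
    "SPDF": selected_stencil_op(True, False),
    "SFDP": selected_stencil_op(False, True),
    "SFDF": selected_stencil_op(False, False),
}
```

In this model SFDP and SFDF end up on the same op simply because the depth result does not exist yet at the point where the stencil test fails.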
 
Granted, of course you could redefine stencil and depth test so they'd happen at the same time (and maybe current hw indeed does both tests at the same time anyway), but the logical view of the pixel pipeline as it has evolved from history just isn't like that.
Exactly. :smile:

I understand the definition of which is first, but the data is packed together anyway. There's no reason to specify an order like that. Look at this reference page, for example:
http://msdn2.microsoft.com/en-us/library/bb205120.aspx

I was hoping that at least for DX10 we'd see a change.
 
I was working on a pseudocode implementation and think I found a way to implement complete 3D clipping in a single pass. You need to distinguish StencilFailDepthPass from StencilFailDepthFail, as Mintmaster suggested. But you also need an individual depth test for front and back faces. The fragment cull has to be done before the PS (in the same way as early-z). The stencil write has to be placed early enough in the pipeline to ensure that the test is performed correctly. Multiple volumes can’t be open for evaluation in the same fragment at the same time. This restriction is probably not too hard to fulfill. Here is the pseudocode:


// The stencil needs to be cleared to a value greater than zero; this will also be the reference value.

CullMode = None;
DepthWriteMask = Zero;

BackFaceStencilFunc = Greater;
BackFaceDepthFunc = Greater;
BackFaceStencilPassDepthPass = Decr; // Execute PS
BackFaceStencilPassDepthFail = Decr;
BackFaceStencilFailDepthPass = Decr;
BackFaceStencilFailDepthFail = Keep;

FrontFaceStencilFunc = Less;
FrontFaceDepthFunc = Less;
FrontFaceStencilPassDepthPass = Incr; // Execute PS
FrontFaceStencilPassDepthFail = Incr;
FrontFaceStencilFailDepthPass = Incr;
FrontFaceStencilFailDepthFail = Keep;
 
I was working on a pseudocode implementation and think I found a way to implement complete 3D clipping in a single pass. You need to distinguish StencilFailDepthPass from StencilFailDepthFail, as Mintmaster suggested. But you also need an individual depth test for front and back faces.
If you read my post above, all you need is to distinguish between StencilFailDepthPass and StencilFailDepthFail. You need to draw frontfaces before backfaces, but that's not usually a problem.

Also, in the method you described, you'll have trouble when the near clip plane enters a light volume. I like how it doesn't matter what order the faces are drawn for a convex volume, but this is a pretty serious limitation, and trying to use a vertex shader to prevent clipping will have drawbacks. This is the big breakthrough of Carmack's "reverse" algorithm.

Without separate CCW/CW Z tests, you can do this:

- Order the light volume index buffer so that frontfaces get drawn first for any pixel
- DepthFunc = Greater

- FrontFaceStencilFunc = Never pass
- FrontFaceStencilFailDepthPass = Incr;
- FrontFaceStencilFailDepthFail = Keep;

- BackFaceStencilFunc = Equal; // (reference is zero)
- BackFaceStencilPassDepthPass = Keep; // Execute PS
- BackFaceStencilPassDepthFail = Keep;
- BackFaceStencilFailDepthPass = Decr;
- BackFaceStencilFailDepthFail = Decr;
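The state list above can be checked with a small per-pixel simulation. This is only a sketch: the four-way stencil ops (SPDP, SPDF, SFDP, SFDF) are hypothetical and not exposed by any API, the function names are invented, and it assumes smaller depth means closer, stencil cleared to 0, and the front face drawn before the back face at each pixel.

```python
def apply_op(op, value):
    """Apply a stencil op given by name (illustrative subset)."""
    if op == "incr":
        return value + 1
    if op == "decr":
        return max(value - 1, 0)
    return value  # "keep"


def draw_fragment_four_op(stencil, stencil_pass, depth_pass, ops):
    """Hypothetical pipeline where both test results select the stencil op."""
    key = ("SP" if stencil_pass else "SF") + ("DP" if depth_pass else "DF")
    new_stencil = apply_op(ops.get(key, "keep"), stencil)
    # The pixel shader runs only when both tests pass.
    return new_stencil, stencil_pass and depth_pass


FRONT = {"SFDP": "incr", "SFDF": "keep"}  # stencil func: never
BACK = {"SPDP": "keep", "SPDF": "keep", "SFDP": "decr", "SFDF": "decr"}  # equal 0


def shade_pixel_four_op(scene, front, back):
    """Return (lit, shader_invocations, final_stencil); depth func is greater."""
    stencil = 0
    stencil, front_ran = draw_fragment_four_op(stencil, False, front > scene, FRONT)
    stencil, back_ran = draw_fragment_four_op(stencil, stencil == 0, back > scene, BACK)
    return back_ran, int(front_ran) + int(back_ran), stencil
```

For a volume spanning depths 0.3 to 0.7, a surface at 0.5 is shaded once, and surfaces at 0.2 or 0.9 never reach the pixel shader at all. The stencil value returns to 0 in each case, so no clear is needed between volumes.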

However, we still have the problem that neither your suggestion of separate CCW/CW Z functions nor my suggestion of separate SFDF/SFDP stencilops are part of any API. The best we can do is use early out in the pixel shader to distinguish between CCW and CW, as I mentioned in my first post.

MOD: It would probably be good to move this into the 3D Algorithms & Coding forum.
 
Also, in the method you described, you'll have trouble when the near clip plane enters a light volume. I like how it doesn't matter what order the faces are drawn for a convex volume, but this is a pretty serious limitation, and trying to use a vertex shader to prevent clipping will have drawbacks. This is the big breakthrough of Carmack's "reverse" algorithm.

Yeah, that’s right: you have to watch out for light volumes that intersect the camera position, and maybe put them in a separate pass. I think that order independence would be a good thing when handling dynamic generation and transformation of the volumes. The method you suggested is of course an alternative, and better in predetermined cases, but it makes dynamic handling harder.

However, we still have the problem that neither your suggestion of separate CCW/CW Z functions nor my suggestion of separate SFDF/SFDP stencilops are part of any API. The best we can do is use early out in the pixel shader to distinguish between CCW and CW, as I mentioned in my first post.

Exactly, neither of us has the tools we need at our disposal. To make these techniques useful, stencil culling and writing have to be performed before the PS (no silent execution). I don’t know the state of modern graphics cards, but I have heard that ATI is better than NVIDIA at stencils.
 
Also, in the method you described, you'll have trouble when the near clip plane enters a light volume.
Actually, there is a very simple (HW) solution to capping volumes that are cut by the front clip plane. Mind you, it does have the same disadvantage as the "depth fail" approach in that it is patented.
 
Yeah, that’s right: you have to watch out for light volumes that intersect the camera position, and maybe put them in a separate pass. I think that order independence would be a good thing when handling dynamic generation and transformation of the volumes. The method you suggested is of course an alternative, and better in predetermined cases, but it makes dynamic handling harder.
Harder, but not impossible. Remember that you don't necessarily have to draw all the frontfaces first. You just need to make sure that for any given pixel, the frontface is drawn before the backface. A sphere, for example, could be drawn in a spiral from the inside out, starting from the point closest to the camera. An ellipsoid can be done with the same geometry if it is deformed using the axes instead of rotated. Even a bounding box can be done with a clever vertex shader. Since we're only talking about convex geometry, it shouldn't be too hard.

Exactly, neither of us has the tools we need at our disposal. To make these techniques useful, stencil culling and writing have to be performed before the PS (no silent execution). I don’t know the state of modern graphics cards, but I have heard that ATI is better than NVIDIA at stencils.
Actually I think NVidia has always been faster, but recently ATI caught up with (and maybe surpassed) it in stencil rejection, i.e. "keep" ops where you test but don't modify. For stencil writing, however, NVidia is still king AFAIK. G80 is stupendously fast here, like 50GPix/s or something.

Actually, there is a very simple (HW) solution to capping volumes that are cut by the front clip plane.
I'm sure there is, but if ATI/NV haven't implemented it and/or it isn't exposed to us, HW solutions don't do us much good...
 
Interesting, but does it work when the camera is actually inside a volume and not just clipping it? Would polygons behind the camera flip around and wind up filling the screen?

If so, that's exactly what 51mon's algorithm needs. Of course, there's still the matter of the two other HW features it needs.

BTW, can you or Simon think of a better way to cull unseen/unlit pixels in mass light volume rendering than my branching suggestion?
 