Early-out depth test

Nick

Veteran
Just checking whether I got this right...

Pixel shading (or texturing and blending for the fixed-function pipeline) can be skipped when the depth test fails and alpha testing is off. In other words, as soon as alpha testing is turned off, an early-out depth test can be used. Then only the stencil buffer has to be updated (when active) and the pixel is finished.

I'm just a tiny bit confused about the order of operations and their dependencies in a modern graphics pipeline. There's stencil testing, depth testing, alpha testing, the texture and blend stages (or the pixel shader), alpha blending, writing the depth value, writing the color value, and updating the stencil when the test passed, when it failed, or when it passed but the depth test failed...

Is there anywhere I can find a nice overview of this? I found the old Radeon pipeline, but it doesn't go into the details and doesn't feature any early-out mechanism or parallel processing. I'm sure I can figure it out myself just fine after a bit of sleep, but any clear resources would still be highly appreciated. I'm trying to work very analytically instead of using the test-and-compare approach...
 
Depth test is independent of alpha test... :oops: I was attempting to reorder the operations to allow early-out tests on everything. And by moving the stencil update operations I created false dependencies.

I just have to decouple tests from writes. It seems far easier to just do all the tests as early as possible, jumping over as many stages as possible when they fail, and keeping their results in boolean masks to do the writes later.

I found a nice way to do the stencil update in one go: (PASS * Z + ZFAIL * !Z) * S + FAIL * !S, where Z and S are the boolean results of the depth and stencil tests, and PASS, ZFAIL, FAIL are the updated stencil values for the corresponding conditions. Many optimizations are possible with special test conditions or matching stencil operations. It's very handy to have it all in one formula so I don't introduce extra branches in the pipeline.
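As a sketch, that formula maps directly to branch-free C++ (the function name and the concrete test values below are mine, purely for illustration):

```cpp
#include <cassert>

// Branch-free stencil update from the formula above.
// Z and S are the boolean depth/stencil test results; PASS, ZFAIL and
// FAIL are the precomputed replacement stencil values for each case.
unsigned char stencilUpdate(bool Z, bool S,
                            unsigned char PASS,
                            unsigned char ZFAIL,
                            unsigned char FAIL)
{
    // (PASS * Z + ZFAIL * !Z) selects between pass/zfail, gated by S;
    // FAIL * !S takes over when the stencil test itself failed.
    return (PASS * Z + ZFAIL * !Z) * S + FAIL * !S;
}
```

With matching operations (e.g. PASS == ZFAIL) or tests forced to always pass, whole terms collapse, which is where the special-case optimizations mentioned above come from.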
 
I'm just a tiny bit confused about the order of operations and their dependencies in a modern graphics pipeline.
OpenGL and D3D agree on how the pipeline looks, from a logical standpoint:

You can find it on page 199 of the OpenGL 2.0 specs:

Fragment (after shader) -> Pixel Ownership Test -> Scissor Test -> Alpha Test -> Stencil Test -> Depth test (updates stencil) -> Blending -> Dithering -> LogicOp -> Framebuffer.
 
And of course, any optimisations, like an early depth test, are acceptable as long as they cause results to be consistent with the conceptual order of operations in the spec.
 
Bob said:
OpenGL and D3D agree on how the pipeline looks, from a logical standpoint. You can find it on page 199 of the OpenGL 2.0 specs:

Fragment (after shader) -> Pixel Ownership Test -> Scissor Test -> Alpha Test -> Stencil Test -> Depth test (updates stencil) -> Blending -> Dithering -> LogicOp -> Framebuffer.
Hi Bob, Indeed that's the logical sequence of the operations, but it's without any optimization. Most importantly, it doesn't explain how an early-out depth test works. Anyway, the OpenGL specification seems quite detailed, so I might get more information/confirmation out of it. Thanks!
GameCat said:
And of course, any optimisations, like an early depth test, are acceptable as long as they cause results to be consistent with the conceptual order of operations in the spec.
That's exactly the problem. For example, at first I moved the depth test and write to the front of the pipeline (for the early-out mechanism), but the depth write shouldn't occur when the alpha test fails. There are a few solutions. I could do the depth test at the end of the pipeline when alpha testing is enabled, or I could do the test at the front and the write at the end (after the alpha test). I think the latter approach has some implementation advantages.

It's this kind of optimization information that is missing from the specifications (although I can agree it's not a part of it). Looking back, it's easy to see that splitting the test from the write was the right solution, but it would be very nice to have a document explaining an optimal pipeline.

Well, I'm probably just asking for too much so I'll just implement it incrementally and keep my eyes open... ;)
 
If any of the tests fails (scissor/depth/stencil/alpha), you can skip pixel processing. You can have early outs for all of those tests and do them in any order you like. Early out for alpha is probably the most complicated, since you need to know whether alpha is modified in the shader or whether it's just the alpha of a specific texture (a common case). For early depth with the latest pixel shaders you also need to check whether the pixel shader modifies the oDepth register (ps2.0 and up). Doing the early depth test first would probably be a good idea, since recent games usually have a z-pass to take advantage of z-cull.

Cheers, Altair
 
In hardware, the early Z test might not be usable in conjunction with alpha test because the Z writes should be committed right after the test. In software, however, this restriction doesn't exist as long as you don't try to simulate a pipeline. Also, early Z reject should be possible in any case.
 
Altair said:
If any of the test fails (scissor/depth/stencil/alpha), you can skip pixel processing. You can have early outs for all of those tests and do them in any order you like.
Hey Altair,

There's another exception. When the depth test fails, color is not written but the stencil is still updated. Ironically, the only test where I can skip the rest of the pipeline is the alpha test, but the whole shader has to run to get the alpha to compare with. :rolleyes:

My current view of the pipeline with early-out depth test is:
Code:
bool depthPass = depthTest();

if(depthPass || alphaTestEnabled)
{
	shader();
}

if(alphaTest())   // True when not enabled
{
	if(stencilTest())   // True when not enabled
	{
		if(depthPass)
		{
			writeColor();
			writeDepth();
			writeStencil(PASS);   // Does nothing when disabled
		}
		else
		{
			writeStencil(ZFAIL);   // Does nothing when disabled

		}
	}
	else
	{
		writeStencil(FAIL);   // Does nothing when disabled

	}
}
It took me longer than five minutes to write this. :( Skipping the shader is simply not possible when alpha testing is active. It has to compute alpha to know whether the stencil still has to be updated when the depth test fails. By the way, here I only keep the depth test boolean explicitly, but when working with quad pipelines it becomes more complicated than that. Oh, and I'm assuming the depth is not adjusted in the shader, of course...
 
Xmas said:
In Hardware, early Z test might not be usable in conjunction with alpha test because the Z writes should be committed right after the test.
Depth write is still after the stencil test I believe.
In software however, this restriction doesn't exist as long as you don't try to simulate a pipeline.
Could you explain a bit more what you mean?
Also, early Z reject should be possible in any case.
This too? ;)

As far as I know it doesn't really work any different in software. I want to implement the same operations, not simulate what every component of the pipeline does. When it's inactive, I remove it from the software pipeline. After "early Z reject" I still have to update the stencil, when the alpha test passes (if active).

Lots of conditionals. ;) Luckily most configurations don't use all tests simultaneously so it can be optimized greatly...
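One hypothetical way to exploit that (a toy sketch, not code from this thread): build a table of pipeline variants indexed by the enabled-test bits and select one per draw call, so disabled tests cost nothing per pixel. All names and the tiny "pipeline" here are illustrative:

```cpp
#include <cassert>

// Toy render-state bits for the tests that can be enabled.
enum { DEPTH = 1, STENCIL = 2, ALPHA = 4 };

// A toy "pixel pipeline": returns 1 if the pixel would be written.
typedef int (*PixelFunc)(int z);

static int passthrough(int)  { return 1; }        // no tests enabled
static int depthOnly(int z)  { return z < 100; }  // only depth test, vs. a fixed reference

// One specialized variant per state combination; only two are wired up
// in this sketch, the other six would be filled in the same way.
static PixelFunc pipeline[8] =
{
    passthrough,   // no tests
    depthOnly,     // DEPTH
};

int shade(int state, int z)
{
    return pipeline[state & 1](z);   // toy: only the depth bit is handled
}
```

The selection happens once per render-state change instead of per pixel, which is the point: most configurations don't use all tests, so most pixels run a variant with the conditionals compiled out.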
 
Nick said:
There's another exception. When the depth test fails, color is not written but stencil is still updated.
Actually no, stencil is not updated in that case. Or well, it depends on whether D3DRS_STENCILZFAIL != D3DSTENCILOP_KEEP; keep is the default and quite a common case (sorry, forgot that one).

Nick said:
Ironically, the only test where I can skip the rest of the pipeline is alpha test, but the whole shader has to run to get the alpha to compare with. :rolleyes:
You can also skip it if the stencil/scissor test fails. You don't need to run the whole shader to figure out alpha either. I guess at pixel shader creation stage you could analyze whether the alpha is the result of a simple operation (e.g. a plain texture fetch) and make a special test for it. AFAIK this is a pretty common case when alpha test is used, so you could gain something by doing this (I think ATI does something like this).

Nick said:
Skipping the shader is simply not possible when alpha testing is active.

I don't think your view is correct. I believe it's more like:
Code:
if(scissorTest() && stencilTest() && alphaTest())
{
  if(depthTest())
    shader();
  else
    if(stencilZFail()!=D3DSTENCILOP_KEEP)
      updateStencil(stencilZFail());
}
I believe the alpha test should be done last since it's probably the most expensive test. I just put it at the beginning to avoid redundant checks.
 
Altair said:
Actually no, stencil is not updated in that case. Or well, it depends on whether D3DRS_STENCILZFAIL != D3DSTENCILOP_KEEP; keep is the default and quite a common case (sorry, forgot that one).
The shadow volume algorithm defines stencil update operations for both the case where the depth test fails and where it passes. So, when stenciling is active, it's not uncommon to do an operation other than 'keep'.

Anyway, the general formula (PASS * Z + ZFAIL * !Z) * S + FAIL * !S is easy to optimize. I have already written the code that handles every possible situation optimally.
Altair said:
You can also skip it if stencil/scissor fails.
In the first case it still writes to the stencil buffer. ;) When the alpha test fails the whole pipeline can be exited. The scissor test is done at the clipping stage, by adjusting the frustum planes, so I don't consider it part of the pixel pipeline. I just need a framework that works under all conditions first, then optimize from that...
You don't need to run the whole shader to figure out alpha either. I guess at pixel shader creation stage you could analyze whether the alpha is the result of a simple operation (e.g. a plain texture fetch) and make a special test for it. AFAIK this is a pretty common case when alpha test is used, so you could gain something by doing this (I think ATI does something like this).
That's very true, but I don't really consider it a crucial optimization.
I don't think your view is correct. I believe it's more like:
Code:
if(scissorTest() && stencilTest() && alphaTest())
{
  if(depthTest())
    shader();
  else
    if(stencilZFail()!=D3DSTENCILOP_KEEP)
      updateStencil(stencilZFail());
}
I believe the alpha test should be done last since it's probably the most expensive test. I just put it at the beginning to avoid redundant checks.
You can't do the alpha test before the shader. And the stencil test also doesn't enclose everything, because when it fails it still updates the stencil buffer (when the alpha test passes). Of course it works with certain quite common settings, but I believe you'd get quite the same resulting pipeline with my structure.
 
Nick said:
The shadow volume algorithm defines stencil update operations for both when the depth test fails and passes. So, when stenciling is active, it's not uncommon to do another operation than 'keep'.
Yup, that's what the if(stencilZFail()!=D3DSTENCILOP_KEEP) was for. Anyway, you are absolutely right that when the stencil test fails, it may still update the stencil buffer depending on whether D3DRS_STENCILFAIL != KEEP, but for that the depth and alpha tests must also pass, IIRC. So let me have another try, just for fun :)
Code:
if(alphaTest())
{
  if(stencilTest())
  {
    if(depthTest())
      shader();
    else
      if(stencilZFail()!=D3DSTENCILOP_KEEP)
        updateStencil(stencilZFail());
  }
  else
    if(stencilFail()!=D3DSTENCILOP_KEEP)
      updateStencil(stencilFail());
}

Nick said:
You can't do the alpha test before the shader. And the stencil test also doesn't enclose everything, because when it fails it's still updates the stencil buffer (when the alpha test passes).
It depends on the utilization of the stencil buffer, of course. We use stencil for a few things, but not for shadows. I believe Mr. Carmack is one of the very few who use stencil for shadows in games, at least :) You can do the alpha test without executing the entire shader though, but obviously it depends on the shader whether you gain anything from it.

edit: Duh, that's exactly the code you wrote :oops: I should stop posting between rebuilds and pay more attention ;) So, the way you could organize it might be something like:
Code:
if(depthTest())
{
  if(stencilTest())
  {
    if(alphaTest())
      shadeRGB();
  }
  else
    if(stencilFail()!=D3DSTENCILOP_KEEP && alphaTest())
      updateStencil(stencilFail());
}
else
  if(stencilZFail()!=D3DSTENCILOP_KEEP && stencilTest() && alphaTest())
    updateStencil(stencilZFail());

bool alphaTest()
{
  shadeAlpha();
  ...
}
 
Not quite sure what the original question was, but does this answer it?

[attached diagram: Scissor Test.gif]
 
Ilfirin said:
Not quite sure what the original question was, but does this answer it?
Hi Ilfirin, the question was more specifically about the order of operations when using the early-out depth test optimization. It changes the simple 'linear' look of the pipeline quite a lot.

Anyway, I think I got it all figured out now. My quad pipeline will soon be completely implemented. The way things look now, it will be a nice factor faster than my previous pixel pipeline, with higher precision and closer to hardware specifications. And I haven't even started optimizing yet. ;)

Thanks all for the ideas! I'll keep you updated...
 
Good luck Nick! When will we have a d3d9 version with sm3.0 support that will give detailed statistics at the pixel level (z-cull, etc.)? ;)
 
It should be possible in some cases, via static analysis, to determine whether a shader will ever write to alpha, and in some cases it can even be determined what value it will write, if it is immutable (e.g. a constant expression). The drivers already have to scan the shaders to determine whether Z is modified, in order to perform similar analysis.
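A minimal C++ sketch of such a scan over a toy instruction list (the Instruction struct and register names here are illustrative, not a real driver IR):

```cpp
#include <string>
#include <vector>
#include <cassert>

// Toy shader instruction: just a destination register and write mask.
// Real drivers scan their own bytecode; this only shows the idea.
struct Instruction
{
    std::string dest;   // e.g. "r0", "oC0", "oDepth"
    std::string mask;   // e.g. "xyzw", "w"
};

// True if any instruction writes the alpha channel of the color output.
bool writesAlpha(const std::vector<Instruction>& shader)
{
    for(const Instruction& i : shader)
        if(i.dest == "oC0" && i.mask.find('w') != std::string::npos)
            return true;
    return false;
}

// True if the shader modifies the depth output, which would disable
// the early depth test.
bool writesDepth(const std::vector<Instruction>& shader)
{
    for(const Instruction& i : shader)
        if(i.dest == "oDepth")
            return true;
    return false;
}
```

Determining the *value* written (the constant-expression case mentioned above) would take constant propagation on top of this, but the reject/accept decision only needs the write scan.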
 
Altair said:
Good luck Nick! When will we have a d3d9 version with sm3.0 support that will give detailed statistics in pixel level (z-cull, etc?) ;)
Thanks!

Well, I'm currently finishing the quad pipelines. After that I think my goal will be to run Unreal Tournament 2003/2004 to test the fixed-function pipeline extensively and determine which optimizations are most successful for this new approach. Shader Model 3.0 is only a few steps away really, but I'd rather push out an entirely completed fixed-function pipeline first. I have to start selling this to pay my rent, so I need to get it to commercial quality first...

Anyway collecting statistics is something that is certainly going to be possible in one of the next versions. It could be an important feature to some. ;)
 
Nick said:
Depth write is still after the stencil test I believe.
Right, but hardware does both depth and stencil test at the same time.

In software however, this restriction doesn't exist as long as you don't try to simulate a pipeline.
Could you explain a bit more what you mean?
If you don't have a pipeline with several things going on in parallel, you don't have to worry about accessing or changing resources that other pipeline stages are accessing, too.

In other words, you can't do an early Z test and defer writing until the end of the pipeline, because the Z test for the next pixel needs to compare against the updated Z buffer. You can only do a conservative early reject if you have to write later.
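A sketch of that conservative reject, assuming a LESS depth compare with writes deferred to the end of the pipeline (the function name is made up):

```cpp
#include <cassert>

// With a LESS depth test, committed writes can only make the stored
// depth smaller. So a fragment that already fails against the stale
// buffer value is guaranteed to also fail against the up-to-date one
// and can be discarded immediately. A fragment that passes here must
// still be re-tested after the pending writes have been committed.
bool conservativeEarlyReject(float staleZ, float fragmentZ)
{
    return !(fragmentZ < staleZ);   // safe to discard right away
}
```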
 
Xmas said:
Right, but hardware does both depth and stencil test at the same time.
The stencil test can be moved around quite a bit, I see. What makes you think hardware does (read: has to do) the depth and stencil test at the same time? OK, it could reduce complexity and dependencies.
If you don't have a pipeline with several things going on in parallel, you don't have to worry about accessing or changing resources that other pipeline stages are accessing, too.
Ah, you mean a pipelined pipeline. ;) Yes I can see that things become more complex then. Your comment about software not having to deal with certain issues is becoming clear now.
In other words, you can't do an early Z test and defer writing until the end of the pipeline, because the Z test for the next pixel needs to compare against the updated Z buffer. You can only do a conservative early reject if you have to write later.
Aren't pixel pipelines emptied before going to the next triangle? :? I can see that for tiny triangles that isn't really beneficial... But I think I see your point. When alpha test and stencil test are disabled, the depth write can happen very early and the next pixel can enter the pipeline almost directly after it.

Thanks!
 