Carmack's comments on NV30 vs R300, DOOM developments

no_way said:
elchuppa said:
It's interesting how some people seem to hate Carmack, to me he has never been particularly abrasive nor judgemental in style or manner. ...

I always like reading his .plan updates, that's for sure.
Plus, he's a great rocket scientist. Er, no, make that engineer.
Still, I wonder what code path the original Radeon will use?

ARB path
 
Crusher:
What Mintmaster explained is exactly what I meant.

Just one addition.
Carmack's reverse does its stencil operations on Z-fail for both front- and back-facing polys.

One could imagine one optimization that HierZ could do for Z-fail: since it can tell that a whole block of pixels Z-fails, and that the Z-fail operation should be applied to all of them, it could skip the individual z-tests for each of the pixels in the block. But since the z-buffer and stencil buffer are stored together, you'd have to read the z-values anyway, and the z-test units would just sit idle. So there really isn't any benefit from it.
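
To make that concrete, the hypothetical "fast path" would amount to something like this (purely my mental model in C-ish code; all the names are invented, nothing from ATI's docs):

// Purely a sketch -- invented names, not real hardware logic.
struct HizTile { unsigned zMax; };                  // the per-tile HierZ value
struct Pixel   { unsigned z; unsigned stencil; };   // z and stencil live together in memory

// Hypothetical tile-level handling during a stencil filling pass (INC on z-fail).
void StencilFillTile(HizTile hiz, unsigned newZmin,
                     const unsigned fragZ[], Pixel pixels[], int count)
{
    if (newZmin > hiz.zMax)
    {
        // The whole incoming block z-fails, so the per-pixel compares could be
        // skipped and the z-fail op applied to every covered pixel...
        for (int i = 0; i < count; ++i)
            pixels[i].stencil++;
        // ...but pixels[] still had to be fetched from memory to touch the
        // stencil bits, so all that's saved is the compares themselves; the
        // z-test units just idle.
    }
    else
    {
        for (int i = 0; i < count; ++i)
            if (fragZ[i] > pixels[i].z)             // fails a LESSEQUAL z-test
                pixels[i].stencil++;                // ordinary per-pixel z-fail op
    }
}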

Mintmaster:
Thanks for the kind words. And for saving me a lot of work by giving a good explanation. :)
 
Maybe I'm misunderstanding why people think the situation is so bad. It sounds to me like people are either complaining that you can't use Hierarchical Z-Buffering at all when you are doing stencil shadows, or that it messes up the shadows if you do. I don't see how that can be the case. However, if the complaint is merely that the Hierarchical Z-Buffer doesn't help on the rendering of the shadows themselves, but still works everywhere else, then there's really not much to complain about (and this is how I understand it to work). Which is it?

If it's using a Hierarchical Z-Buffer and the scene is rendered normally with all the little overdraw-saving optimizations going on, you should be left with a z buffer filled with depth components for each pixel. If you aren't, then it's not really a z buffer anymore. If that's what you're left with, then certainly when you do z tests against it, the driver has to be able to allow you to test each pixel individually. If it doesn't, then I can't see how the driver is worth the bandwidth it takes to download it. Perhaps part of the problem lies in z buffer compression, which nobody has mentioned explicitly, but your comments about pixels being batched together suggest that it's at least part of the issue. From everything I've read, the z-buffer compression and the Hierarchical Z-Buffer are two separate features, which is why I'm having such a hard time seeing how Hierarchical Z-Buffering alone could be causing issues here. Are people mixing terms up and actually referring to HYPER Z instead of just Hierarchical Z-Buffering?

But even with compression on, it should still work fine. The depth values for 4 or 9 or 16 pixels, or however they batch them, would be the same, and calling a z-test function on any of those pixels would return the same value, but that shouldn't prevent the process of calling the z-test for each of the pixels. As you say, there really isn't any benefit from it *for that purpose*, but that's one of the requirements of stencil shadows. It should still operate correctly, and it certainly doesn't negate the benefit from using the Hierarchical Z-Buffer and the z-buffer compression for the rest of the rendering that takes place.

I guess I should break down and sign up as an ATI developer so I can download some docs to read through before I comment much more. I just find it hard to fathom that HZ would be such a horrible thing to have when running an application that has stencil shadows. It looks to me like it should have the same benefit there as it would in any other application, it just wouldn't speed up the shadowing part at all (which again, I can't see anyone really complaining about).

And you're right that the reverse does stencil operations on z-fail for both sides. I looked through my code in a hurry and I have both methods in there, must have looked at the wrong function while writing that part :)
 
BTW, I don't claim to know more about this than any of you, I'm just trying to explain the way I understand it to work, and what I think is the logical way it *should* work, and of course both could be wrong. I just find it hard to believe that they would go to the trouble of supporting a two-sided stencil operation, which is pretty much only useful for volumetric shadows, and then implement a Hierarchical Z-Buffer algorithm that would get f'd up by the z tests required to generate them.

I just thought of another possible cause of the problem... does the Z-buffer compression compress the stencil buffer as well? I've always thought of them as being independent of each other.
 
Let's see: he 1) creates a specific code path for the NV30, while 2) creating none for the R300.

Then 3) he pats Nvidia on the back for making such great drivers, when the fact is he is tailoring his code to run on their HW to begin with! How could things not work exactly how he wants, when he goes out of his way to program for Nvidia's hardware?

And saying that the R200 path is fine to use with the R300 is just ridiculous...why not just run the NV30 with the NV10 codepath too?
It's not that complicated or conspiratorial. Carmack codes for OGL and the way to expose new features is through extensions. It is entirely up to HW vendors to write extension specs and drivers to expose new HW features.

In the case of the NV30, Nvidia wrote specs and drivers for an advanced fragment shader long in advance of actual HW. This shader is more advanced than the ARB fragment program, especially when combined with float buffers, as Carmack mentions later in his .plan.

ATI, on the other hand, have been happy to support the ARB path, as the R300 doesn't have many more features than it exposes. The R200 path is coded using ATI's extensions, which have less functionality than the ARB extensions.

Basically he has coded paths to bring out the absolute best in each HW. All along he has said that the graphical quality on the paths will be almost exactly the same except for the NV10 path (from which he has removed specular highlights). The paths he has coded give maximum quality for minimum passes on each HW.
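
Roughly speaking, the path selection just comes down to asking the driver which extensions it advertises, something like this (the extension names are the real OpenGL ones; the logic itself is only my sketch of the idea, not Carmack's actual code, and it glosses over the NV10/NV20 split):

#include <string.h>
#include <GL/gl.h>

// Rough sketch of the idea only.
const char* PickRenderPath()
{
    const char* ext = (const char*)glGetString(GL_EXTENSIONS);

    if (strstr(ext, "GL_NV_fragment_program"))   // NV30: most capable vendor path
        return "NV30";
    if (strstr(ext, "GL_ARB_fragment_program"))  // R300 etc.: the standard ARB2 path
        return "ARB2";
    if (strstr(ext, "GL_ATI_fragment_shader"))   // R200: ATI's own fragment extension
        return "R200";
    if (strstr(ext, "GL_NV_register_combiners")) // GeForce-class hardware
        return "NV20";
    return "ARB";                                // fallback path, fewest features
}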
 
As I said in the original post.

HierZ won't help in stencil filling passes. (But the output should be rendered correctly.)

HierZ should work in the other passes. (Initial Z-pass, and the lighting passes that just read the stencil buffer.)

So Carmack's reverse will not "**** up" the HierZ, it will just make it useless during the stencil filling passes.
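
To put that in code terms, a frame would look roughly like this (an OGL-style sketch of the pass structure only, not Carmack's actual code; the draw* calls are placeholders):

#include <GL/gl.h>

void drawSceneGeometry();   // placeholders for the real geometry submission
void drawShadowVolumes();
void drawLitSurfaces();

void renderFrame()
{
    // 1. Initial Z-pass: lay down depth only. Ordinary depth-pass rendering,
    //    so HierZ can reject occluded blocks just like in any other app.
    glEnable(GL_DEPTH_TEST);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    drawSceneGeometry();

    // 2. Stencil filling pass (Carmack's reverse): z-writes off, stencil ops
    //    on z-FAIL. This is where HierZ can't discard anything and just idles.
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_LEQUAL);
    glEnable(GL_STENCIL_TEST);
    glEnable(GL_CULL_FACE);
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glCullFace(GL_FRONT);                       // draw back faces: INC on z-fail
    glStencilOp(GL_KEEP, GL_INCR, GL_KEEP);
    drawShadowVolumes();
    glCullFace(GL_BACK);                        // draw front faces: DEC on z-fail
    glStencilOp(GL_KEEP, GL_DECR, GL_KEEP);
    drawShadowVolumes();

    // 3. Lighting passes: back to depth-pass rendering (z-writes still off),
    //    the stencil buffer is only read (0 = not in shadow). HierZ helps again.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthFunc(GL_EQUAL);
    glStencilFunc(GL_EQUAL, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    drawLitSurfaces();
}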
 
Crusher, I had a problem understanding this too, but things became much clearer when I read this (note the bold text) by Mintmaster:

Mintmaster said:
In other words, pixels from the stencil volumes that pass the depth test will not change the stencil buffer, but cannot be discarded rapidly by HZ. Polygons that fail can be discarded, but you don't want to discard them since they need to update the stencil buffer one pixel at a time.
 
LeStoffer said:
Crusher, I had a problem understanding this too, but things became much clearer when I read this (note the bold text) by Mintmaster:

Mintmaster: In other words, pixels from the stencil volumes that pass the depth test will not change the stencil buffer, but cannot be discarded rapidly by HZ. Polygons that fail can be discarded, but you don't want to discard them since they need to update the stencil buffer one pixel at a time.

Ah, sorry, guess I read through that too fast and thought he was saying something different. That would explain the source of the problem, but that leaves the question of why they can't switch early Z rejection off in the driver when they encounter a set of flags like:

D3DRS_ZWRITEENABLE = FALSE
D3DRS_STENCILENABLE = TRUE
D3DRS_STENCILZFAIL = D3DSTENCILOP_KEEP

It should be pretty obvious that that combination of flags implies per-pixel z-depth and stencil tests are going to be occurring. In fact, couldn't they just disable the HZ rejections anytime D3DRS_ZWRITEENABLE is set to FALSE? If you're not going to update the Z buffer, HZ doesn't do you much good anyway.
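
The crude version of that heuristic would be something like this (a made-up driver-side hook, just to show what I mean; OnSetRenderState and earlyZRejectEnabled are invented names, I obviously have no idea what ATI's driver really does internally):

#include <d3d8.h>

// Imaginary internal driver switch -- pure speculation on my part.
static bool earlyZRejectEnabled = true;

void OnSetRenderState(D3DRENDERSTATETYPE state, DWORD value)
{
    if (state == D3DRS_ZWRITEENABLE)
    {
        // No z-writes coming: fall back to plain per-pixel z/stencil testing so
        // nothing gets thrown away in blocks before the stencil op can run.
        earlyZRejectEnabled = (value != FALSE);
    }
}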
 
Crusher said:
If you're not going to update the Z buffer, HZ doesn't do you much good anyway.

Are you sure? You still need to check the Z to see if your current polygon or pixel is occluded. Checking an on chip tree is cheaper than checking the z buffer in memory.
 
RussSchultz said:
Are you sure? You still need to check the Z to see if your current polygon or pixel is occluded. Checking an on chip tree is cheaper than checking the z buffer in memory.

Well, I guess that depends on whether you can do explicit z-tests through the tree without having it automagically discard polygons. If you could just disable the early-z polygon rejection and still use the tree, that would probably be best. Ultimately you want the shadows to be correct though, so if the polygon rejection is coupled tightly to the HZ functionality, it would be better to just disable the whole thing. I'm not sure how much performance you'd save using the hierarchy when you're testing every pixel anyway, though. I was under the impression that the bandwidth savings were mainly beneficial when doing the normal scene rendering, where you're both reading and writing to the z-buffer and you've got a higher depth complexity (when you test the stencil shadows, you're only doing one depth-complexity level of comparison).
 
Oh, you're still talking about stencil. Never mind. Part of your text suggested turning off HZ whenever you weren't writing to Z, which I interpreted to be a larger case than just stencil.
 
Crusher:
I think you put too much into what I said. HierZ won't break anything, it will just not help (in the stencil filling passes). It doesn't need to be disabled. But it wouldn't hurt to do it if the three flags are set. But I guess you meant STENCILZFAIL == INC or DEC. (I don't know much about D3D internals, but that seemed to make more sense.)

Just looking at D3DRS_ZWRITEENABLE would not be good. That would disable it in the Doom3 lighting passes. (Those that read the stencil buffer and do the lighting.)

RussSchultz:
Read my post on the top of this page.
Edit: I got in late here. Maybe you thought of some different case.
 
Well, yeah, I was talking in a general sense with that statement, because off the top of my head I can't think of many other situations where you'd disable z-writes, but I suppose you might want to for something. Still, I think the performance loss of disabling HZ anytime z-write is disabled would be preferable to not having the option of doing per-pixel z-tests.
 
Basic said:
Crusher:
I think you put too much into what I said. HierZ won't break anything, it will just not help (in the stencil filling passes). It doesn't need to be disabled. But it wouldn't hurt to do it if the three flags are set. But I guess you meant STENCILZFAIL == INC or DEC. (I don't know much about D3D internals, but that seemed to make more sense.)

Maybe that's why I couldn't get it to work :) Couldn't find any code examples, so I just modified the Microsoft depth-pass method to use depth-fail; maybe their testing doesn't work right in reverse. Gotta love the fact that all the DX sample code works just well enough to do what they're trying to do, but never what you want to do.

It's still not right, but at least it doesn't reverse the shadows when you're inside the volume now.
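
For reference, the state setup I ended up with looks roughly like this (DX8, two passes since there's no two-sided stencil in DX8; DrawShadowVolume is my own helper, and since the thing still isn't quite working, treat this as a sketch rather than gospel):

#include <d3d8.h>

void DrawShadowVolume(IDirect3DDevice8* device);   // my own helper, not shown

void RenderStencilShadowDepthFail(IDirect3DDevice8* device)
{
    // Common state for both volume passes: no colour, no z-writes, normal z-test,
    // stencil always "passes" so only the z result decides what happens.
    device->SetRenderState(D3DRS_COLORWRITEENABLE, 0);
    device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL);
    device->SetRenderState(D3DRS_STENCILENABLE, TRUE);
    device->SetRenderState(D3DRS_STENCILFUNC, D3DCMP_ALWAYS);
    device->SetRenderState(D3DRS_STENCILFAIL, D3DSTENCILOP_KEEP);
    device->SetRenderState(D3DRS_STENCILPASS, D3DSTENCILOP_KEEP);

    // Pass 1: back faces of the volume, increment where the z-test FAILS.
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_CW);
    device->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_INCR);
    DrawShadowVolume(device);

    // Pass 2: front faces, decrement where the z-test FAILS.
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_CCW);
    device->SetRenderState(D3DRS_STENCILZFAIL, D3DSTENCILOP_DECR);
    DrawShadowVolume(device);
}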
 
I guess I was thinking of multipassing the rendering to speed it up.

First pass you only write the Z buffer (not doing the fragment calculations), next pass you only read it and perform the complex fragment operations. If your HZ buffer can reject 80% of what's being drawn you cut out that much traffic to the memory.

Or is there some particular reason that won't work?
 
It's 9:20 AM and I haven't slept yet, and I don't know the first thing about fragments, so I think I'll stop pretending to be a developer for the night. ;)
 
RussSchultz said:
Heh. I was just trying to be current. Replace fragment with pixel and then think about it again.

Yeah, I know what you meant, I'm a DX7 level programmer so far though (even though I use the DX8 SDK). I don't have any pixel shading hardware, so I haven't bothered diving into that yet.
 
Crusher, I have a feeling you don't quite follow the principle behind HiZ. First, forget about Z-compression, as that's unrelated to this discussion, and is transparent to the pipeline. Follow my previous explanation carefully, and if you still don't get it, I'll try one more time:

Say a tile of pixels (previously drawn) has a set of z values {A}, and let "Zmax" be the maximum value in {A} (i.e. the HZ value). Now the next batch to be drawn fits in this tile, with a set of pixels in it, {B}, and the minimum value of {B} is "newZmin", which is calculated analytically from poly info rather than through rasterization. If "newZmin" > "Zmax", then that means each value in {B} is greater than any value in {A}.

With depth pass rendering, {B} can be thrown away without individually testing the contents of {B} with the z-buffer data in {A}, since we know all of {B} fails.

Summary: "Zmax" > {A}, "newZmin" < {B}. Therfore, if "newZmin" > "Zmax", then {A} < {B}, so throw {B} away.

With depth fail rendering, by definition you keep the pixels that fail the Z test (if a current pixel is greater than the value in the Z-buffer), and you ditch pixels that pass the Z test (if a current pixel is less than the value in the Z-buffer).

Crusher said:
Throwing away pixels that fail the z-test is PERFECTLY FINE. It should never be keeping those pixels anyway, since you explicitly disable z-buffer writes before you render the volume. All you care about is the result of the test--the fact that it did fail--so that you can alter the entry in the stencil buffer accordingly.

In Carmack's "reverse" algorithm, the pixels you keep aren't used to update the Z- or colour-buffer, but rather to increment/decrement the stencil buffer. If you throw them away rapidly one block at a time, you can't do individual stencil tests, so throwing them away is NOT FINE. I think you don't have a clear understanding of the graphics pipeline. You also said something about the driver that didn't make sense to me. This is all done in the hardware.

Now, the unfortunate part is you can't say whether all of {B} is less than all of {A} based on a comparison of "Zmax" with "newZmin", or even "newZmax" for that matter.

Summary: "Zmax" > {A}, "newZmin" < {B}, "newZmax" > {B}. Not enough info to ever say {A} > {B}, so {B} can't be discarded. What you need is a "Zmax" (Obviously, ATI is very aware of this ;) )


As for disabling early Z rejection, that's exactly our point. HZ can't make any conclusion by the above reasoning, so you're back to using the ordinary, full-resolution Z-buffer. The problem is not with the flags you mentioned, but rather with changing from depth-pass to depth-fail, which in D3D is changing the value of D3DRS_ZFUNC from D3DCMP_LESSEQUAL to D3DCMP_GREATER. As Basic said, HZ is useless here. HZ is not a "horrible thing to have". Listen to what we are saying before you comment. HiZ is just sitting there, unable to do anything to accelerate rendering of the stencil volumes, as it can't ditch batches of pixels that pass the z-test.

However, when you return to the lighting passes, you're back to ordinary depth pass rendering, and the HZ values can still help you reject pixels that won't be seen because of the Z buffer. The stencil test still occurs afterwards, for pixels that pass the Z-test.

If you still have any problems understanding, then I give up. Basic and LeStoffer, thanks for letting me know this was worth my while.
 