Crusher, I have a feeling you don't quite follow the principle behind HiZ. First, forget about Z-compression, as that's unrelated to this discussion, and is transparent to the pipeline. Follow my previous explanation carefully, and if you still don't get it, I'll try one more time:
Say a tile of pixels (previously drawn) has a set of z values {A}, and "Zmax" the maximum value in {A} (i.e. the HZ value). Now the next batch to be drawn fits in this tile, with a set of pixels in it, {B}, and the minimum value of {B} is "newZmin", which is calculated analytically from poly info rather than through rasterization. If "newZmin" > "Zmax", then that means each value in {B} is greater than any value in {A}.
With depth pass rendering, {B} can be thrown away without individually testing the contents of {B} with the z-buffer data in {A}, since we know all of {B} fails.
Summary: "Zmax" > {A}, "newZmin" < {B}. Therefore, if "newZmin" > "Zmax", then {A} < {B}, so throw {B} away.
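To make the depth-pass case concrete, here is a minimal sketch in Python. It is not how any particular chip implements HiZ: the function names and depth values are made up for illustration, and min(B) stands in for the "newZmin" that real hardware would derive analytically from the polygon.

```python
# Illustrative sketch only: names and depth values are hypothetical.

def tile_zmax(stored_depths):
    """The per-tile HZ value: the largest depth already in the tile."""
    return max(stored_depths)

def hiz_rejects(zmax, new_zmin):
    """Depth-pass rendering: the whole batch fails if its nearest
    possible depth is still behind the farthest stored depth."""
    return new_zmin > zmax

def per_pixel_passes(stored_depths, incoming_depths):
    """The full-resolution test HiZ lets us skip (incoming <= stored)."""
    return [b <= a for a, b in zip(stored_depths, incoming_depths)]

A = [0.30, 0.35, 0.40, 0.38]   # {A}: depths already in the tile
B = [0.55, 0.60, 0.58, 0.62]   # {B}: depths of the incoming batch

zmax = tile_zmax(A)            # 0.40
new_zmin = min(B)              # stand-in for the analytic bound
rejected = hiz_rejects(zmax, new_zmin)
passes = per_pixel_passes(A, B)
```

Here `rejected` is true and no entry of `passes` is true, so the single coarse comparison reaches the same verdict as the per-pixel tests.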
With depth fail rendering, by definition you keep the pixels that fail the Z test (if a current pixel is greater than the value in the Z-buffer), and you ditch pixels that pass the Z test (if a current pixel is less than the value in the Z-buffer).
Throwing away pixels that fail the z-test is PERFECTLY FINE. It should never be keeping those pixels anyway, since you explicitly disable z-buffer writes before you render the volume. All you care about is the result of the test--the fact that it did fail--so that you can alter the entry in the stencil buffer accordingly.
In Carmack's "reverse" algorithm, the pixels you keep aren't used to update the Z- or colour-buffer, but rather to increment/decrement the stencil buffer. If you throw them away rapidly one block at a time, you can't do individual stencil tests, so throwing them away is NOT FINE. I think you don't have a clear understanding of the graphics pipeline. You also said something about the driver that didn't make sense to me. This is all done in the hardware.
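As a rough illustration of why the failing pixels still matter, here is a minimal per-pixel sketch of a depth-fail stencil update in Python. It assumes the common convention of incrementing on back faces and decrementing on front faces, treats "fail" as the incoming depth being greater than the stored one, and assumes every pixel of the tile is covered by the face. The function name and all values are hypothetical.

```python
# Illustrative sketch only: names, convention and values are assumptions.

def depth_fail_stencil(scene_depth, stencil, face_depth, back_facing):
    """Apply one shadow-volume face to the stencil values of a tile.
    Depth and colour are never written; only the stencil changes."""
    out = list(stencil)
    for i, (z_buf, z_new) in enumerate(zip(scene_depth, face_depth)):
        if z_new > z_buf:                      # fragment fails the Z test
            out[i] += 1 if back_facing else -1
    return out

scene = [0.5, 0.5, 0.5, 0.5]       # depths laid down by the scene pass
stencil = [0, 0, 0, 0]

back_face = [0.7, 0.7, 0.4, 0.4]   # back faces of the volume
front_face = [0.3, 0.6, 0.3, 0.3]  # front faces of the volume

stencil = depth_fail_stencil(scene, stencil, back_face, back_facing=True)
stencil = depth_fail_stencil(scene, stencil, front_face, back_facing=False)
shadow_mask = stencil
```

With these numbers `shadow_mask` ends up as `[1, 0, 0, 0]`: only the first pixel sits inside the volume. Every fragment that failed the Z test had to reach the stencil stage individually for that count to come out right.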
Now, the unfortunate part is you can't say whether all of {B} is less than all of {A} based on a comparison of "Zmax" with "newZmin", or even "newZmax" for that matter.
Summary: "Zmax" > {A}, "newZmin" < {B}, "newZmax" > {B}. Not enough info to ever say {A} > {B}, so {B} can't be discarded. What you need is a "Zmin" for the tile (obviously, ATI is very aware of this).
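A small counterexample may help here. The Python sketch below uses two made-up tiles that share the same "Zmax" but give opposite answers for the same incoming batch, so no comparison against "Zmax" can settle the question. The per-tile minimum used at the end is a hypothetical extra value, not something the HZ buffer described above stores.

```python
# Illustrative sketch only: tiles, names and the per-tile minimum are hypothetical.

def all_in_front(stored_depths, incoming_depths):
    """Ground truth: is every incoming depth less than every stored depth?"""
    return max(incoming_depths) < min(stored_depths)

def cull_with_tile_zmin(tile_zmin, new_zmax):
    """The coarse test that WOULD work if a per-tile minimum were kept."""
    return new_zmax < tile_zmin

A1 = [0.80, 0.90, 0.85, 0.90]   # tile 1
A2 = [0.20, 0.90, 0.30, 0.90]   # tile 2, same maximum as tile 1
B = [0.50, 0.55, 0.52, 0.58]    # the same incoming batch for both

same_zmax = max(A1) == max(A2)          # HZ cannot tell the tiles apart
truth_1 = all_in_front(A1, B)           # B is entirely in front of A1
truth_2 = all_in_front(A2, B)           # B is not entirely in front of A2

cull_1 = cull_with_tile_zmin(min(A1), max(B))
cull_2 = cull_with_tile_zmin(min(A2), max(B))
```

Both tiles report "Zmax" = 0.90, yet only the first has {B} entirely in front of {A}. The hypothetical minimum separates the two cases; the maximum alone cannot.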
As for disabling early Z rejection, that's exactly our point. HZ can't make any conclusion by the above reasoning, so you're back to using the ordinary, full-resolution Z-buffer. The problem is not with the flags you mentioned, but rather with changing from depth-pass to depth-fail, which in D3D is changing the value of D3DRS_ZFUNC from D3DCMP_LESSEQUAL to D3DCMP_GREATER. As Basic said, HZ is useless here. HZ is not a "horrible thing to have". Listen to what we are saying before you comment. HiZ is just sitting there, unable to do anything to accelerate rendering of the stencil volumes, as it can't ditch batches of pixels that pass the z-test.
However, when you return to the lighting passes, you're back to ordinary depth pass rendering, and the HZ values can still help you reject pixels that won't be seen because of the Z buffer. The stencil test still occurs afterwards, for pixels that pass the Z-test.
If you still have any problems understanding, then I give up. Basic and LeStoffer, thanks for letting me know this was worth my while.