ATI Hierarchical-Z issue with Doom 3

Discussion in 'Architecture and Products' started by Wunderchu, Aug 2, 2004.

  1. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    NVidia has a form of early Z, but no hierarchical Z AFAIK. It can reject up to 16 quads per clock.

    I'm not sure about ATI's hierZ.

    The big difference between the two (apart from the hierarchical buffer) is that ATI only stores one value, while NVidia uses both min and max, which, btw, is another very much Doom3-centric feature.
     
  2. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    AFAIK this isn't exactly accurate, although it may have changed since I last saw low level docs on an NVidia chip. It's true that NVidia can determine the min and max of the samples in the quad, but that's a side effect rather than a design goal.

    It used to be that the early Z was implemented the obvious way, relying on the Z Compression (which has some of the Z buffer on chip) to save the bandwidth. It's possible that this has changed in later chips.
     
  3. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,045
    Likes Received:
    1,119
    Location:
    WI, USA
    So, is the rest of HyperZ working? Z clear and Z compression? It would be interesting to compare an 8500 to a 9000. The 9000 doesn't have HierZ, unlike the 8500... see what it does in Doom3 (if you can even notice, considering the general performance of those cards here anyway).
     
  4. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I missed the end of that thread. Interesting.

    I know, I was just comparing with ATI. For ATI, HiZ doesn't depend on FSAA due to its use of min/max, but early Z does, AFAIK. They can only test 2 individual samples per pipe per clock.

    It's too bad ATI didn't opt for a min/max HiZ implementation for R420. Maybe they just underestimated NVidia. The X800XT PE's 133 Gpix/s z-rejection rate is not applicable in Doom3. I'll try to remain hopeful that they can separate the sense of early Z and HiZ so that the former can be used during shading (which requires many cycles, so even 16 pix/cycle rejection is good enough) and HiZ can be used for stencil rendering by storing the max Z instead.
     
  5. jvd

    jvd
    Banned

    Joined:
    Feb 13, 2002
    Messages:
    12,724
    Likes Received:
    9
    Location:
    new jersey
    Same settings

    2x


     
  6. tcchiu

    Newcomer

    Joined:
    Jun 3, 2004
    Messages:
    22
    Likes Received:
    0
    Location:
    Taiwan
    http://www.3dcenter.org/artikel/2004/07-30_english.php

    I don't understand. The z-test mode doesn't have to be changed even if the z-fail shadow volume algorithm is used, right?

    The z-fail algorithm increments and decrements the stencil values when the z-test fails, but in both passes (the 1st z-only pass and the 2nd stencil shadow pass) the same z-test mode LT (less than) can be used.

    How could it break ATI's Hierarchical Z?
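
    The z-fail counting described above can be sketched per pixel as follows. This is only an illustration, not how any driver or chip implements it: the function names are made up, smaller z is assumed to be closer, depth writes are assumed off during the stencil pass, and the "back faces increment, front faces decrement" rule is the common convention rather than something stated in this thread. A signed Python int stands in for the real (unsigned) stencil value.

```python
# Sketch of per-pixel z-fail stencil counting. All names are hypothetical.
# Assumptions: smaller z is closer, depth test is LESS, depth writes are off.

def depth_test_less(fragment_z, stored_z):
    """The same LT depth test that the z-only pass uses."""
    return fragment_z < stored_z


def zfail_stencil_update(stencil, fragment_z, stored_z, front_facing):
    """Change the stencil value only when the depth test fails."""
    if depth_test_less(fragment_z, stored_z):
        return stencil  # depth test passed: z-fail leaves the stencil alone
    # Assumed convention: back faces increment, front faces decrement.
    return stencil - 1 if front_facing else stencil + 1


def shadow_count(pixel_z, volumes):
    """volumes: (front_face_z, back_face_z) pairs of volumes covering the pixel."""
    stencil = 0
    for front_z, back_z in volumes:
        stencil = zfail_stencil_update(stencil, front_z, pixel_z, True)
        stencil = zfail_stencil_update(stencil, back_z, pixel_z, False)
    return stencil
```

    A non-zero count means the pixel sits inside a shadow volume. Note that the depth function never changes; only the condition under which the stencil is touched does.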
     
  7. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    It may be that ATI doesn't implement a depth fail routine directly, but switches the depth function instead (and always assumes that a depth fail means do nothing).
     
  8. Thowllly

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    551
    Likes Received:
    4
    Location:
    Norway
    With hierarchical Z you store one z value for a tile of pixels (maybe 8x8). That one value is the depth of the pixel in the tile that is closest to the camera. If that pixel passes, they all pass. If you want to do a depth fail test, then you can't use the same 'closest z' value: even if that value passes the 'depth fail' test (iow, that pixel is closer than the new fragment you're testing against), there might be other pixels in the tile that do not pass the 'depth fail' test (iow, those pixels are further away than the new fragment you're testing against). To do a hierarchical Z depth fail test, you also need to store an additional value for each tile, the depth of the pixel furthest away. That would require a larger hierarchical Z buffer, and since ATI has that buffer on chip, they probably don't want to waste space on something so rarely used.
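
    To make the one-value versus two-value argument concrete, here is a rough sketch. It is a simplification under stated assumptions, not a description of real hardware: the names are hypothetical, smaller z is closer, the depth test is LESS, and the incoming fragment is treated as having a single depth across the whole tile.

```python
# Sketch: what a per-tile summary can decide about an incoming fragment.
# Hypothetical names; assumes smaller z is closer and a LESS depth test.

def classify_tile(fragment_z, tile_depths):
    """Return 'all_pass', 'all_fail' or 'mixed' for a LESS test on the tile."""
    nearest = min(tile_depths)   # the 'closest z' summary value
    farthest = max(tile_depths)  # the extra value a second summary would hold
    if fragment_z < nearest:
        return "all_pass"        # closer than every stored pixel
    if fragment_z >= farthest:
        return "all_fail"        # at or behind every stored pixel
    return "mixed"               # per-pixel testing is still required


def classify_with_nearest_only(fragment_z, nearest):
    """With only the nearest value, 'all_fail' can never be concluded."""
    return "all_pass" if fragment_z < nearest else "unknown"
```

    With only the nearest value stored, a fragment behind it might still be in front of other pixels in the tile, which is why the second value is needed before a whole tile can be declared failing.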
     
  9. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    In that case, it may have been better to change the depth function and keep the stencil function on zpass.
    I wonder how this will affect ATi and NV cards.

    Edit: With a simple test in my 3d engine on my Radeon 9600Pro I see no difference in performance between using zfail stenciling and a zless compare or zpass and a zgreaterequal compare.
    So either the hierarchical buffer works in both cases, or it works in neither.
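
    For reference, the two setups compared above should touch exactly the same pixels, so any speed difference would have to come from how the hardware handles each mode rather than from the amount of stencil work. A small sketch of that equivalence, with hypothetical names and assuming depth writes are disabled in both cases:

```python
# Sketch: 'stencil op on z-fail with a LESS compare' versus
# 'stencil op on z-pass with a GREATEREQUAL compare'. Hypothetical names.

def touched_zfail_with_less(fragment_z, stored_z):
    depth_pass = fragment_z < stored_z
    return not depth_pass  # stencil is updated when the depth test fails


def touched_zpass_with_gequal(fragment_z, stored_z):
    depth_pass = fragment_z >= stored_z
    return depth_pass      # stencil is updated when the depth test passes


def setups_agree(depth_values):
    """True if both setups update the stencil for the same (fragment, stored) pairs."""
    return all(
        touched_zfail_with_less(f, s) == touched_zpass_with_gequal(f, s)
        for f in depth_values
        for s in depth_values
    )
```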
     
  10. Thowllly

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    551
    Likes Received:
    4
    Location:
    Norway
    I think I was somewhat wrong in my previous post. Even if you only store one value per tile it should help no matter what z test you're doing, and storing both the largest and smallest z would help for normal rendering too, not only the unusual cases. Never mind...
     
  11. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    ATI stores the farthest Z value. So if an incoming tile is completely behind this value, you can trivially reject it. Trivial rejection is much more useful than trivial accept. (note that this also means that if a triangle partly covers a tile marked as cleared, that tile is effectively "lost")


    The Z-fail method now means that, if (the stencil test passes and) the Z test fails, you increase/decrease the stencil value. So even if hierZ knows that all pixels fail the Z test, it can't reject pixels because there's still work to be done for those pixels.
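
    The two points above can be put into a small sketch. It only illustrates the reasoning and makes assumptions that are not from any ATI documentation: the names are hypothetical, smaller z is closer, the depth test is LESS, and the stencil operation on depth fail is given as a plain string.

```python
# Sketch: farthest-Z tile rejection and why z-fail stenciling defeats it.
# Hypothetical names; assumes smaller z is closer and a LESS depth test.

def all_fragments_fail(tile_farthest_z, incoming_nearest_z):
    """True if every incoming fragment is at or behind every stored pixel."""
    return incoming_nearest_z >= tile_farthest_z


def hiz_can_skip_tile(tile_farthest_z, incoming_nearest_z, stencil_zfail_op):
    """A tile may be skipped only if failing pixels need no further work."""
    if not all_fragments_fail(tile_farthest_z, incoming_nearest_z):
        return False  # some pixels might pass, so per-pixel tests are needed
    # With z-fail stenciling the op is 'incr' or 'decr', so failing pixels
    # must still be processed and the tile cannot be thrown away.
    return stencil_zfail_op == "keep"
```

    In normal rendering the z-fail op is "keep", so a fully occluded tile is dropped; during the z-fail shadow pass the same tile has to go through per-pixel stencil updates.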
     
  12. Reznor007

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    633
    Likes Received:
    70
    Location:
    Norman, OK, USA
    I heard that H-Z was removed from RV350/RV360...
     