ATI Hierarchical-Z issue with Doom 3

Discussion in 'Architecture and Products' started by Wunderchu, Aug 2, 2004.

  1. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    Perhaps that's why JC wouldn't talk about it to Rev?
     
  2. Moloch

    Moloch God of Wicked Games
    Veteran

    Joined:
    Jun 20, 2002
    Messages:
    2,981
    Likes Received:
    72
    Holy crap... Could they make the game playable on a 6800 at 1600x1200 with 4xAA and, like, 8xAF?
     
  3. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I'm sure ATI wishes that's all it took...

    Check out this thread. With 4xAA enabled, the GT has 2.8 GPix/s stencil fillrate. The X800XT PE has 4.1 GPix/s stencil fillrate. Also, given that NV40's lead doesn't really change when AA is enabled, it seems stencil performance is not a very big deal.
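The two 4xAA stencil figures can be turned into per-clock rates. A minimal sketch, assuming core clocks of 350 MHz for the 6800 GT and 520 MHz for the X800 XT PE (the clocks are an assumption; the post only gives the GPix/s numbers):

```python
# Assumed core clocks in MHz (not stated in the post).
CLOCK_MHZ = {"6800 GT": 350, "X800 XT PE": 520}

# 4xAA stencil fillrates quoted in the post, in GPix/s.
STENCIL_GPIX = {"6800 GT": 2.8, "X800 XT PE": 4.1}


def pixels_per_clock(gpix_per_s: float, clock_mhz: float) -> float:
    """Implied pixels per clock = pixels per second / clocks per second."""
    return (gpix_per_s * 1e9) / (clock_mhz * 1e6)


for card in CLOCK_MHZ:
    ppc = pixels_per_clock(STENCIL_GPIX[card], CLOCK_MHZ[card])
    print(f"{card}: {ppc:.1f} stencil pixels/clock with 4xAA")

# Raw stencil fillrate advantage of the XT PE over the GT.
ratio = STENCIL_GPIX["X800 XT PE"] / STENCIL_GPIX["6800 GT"]
print(f"X800 XT PE / 6800 GT stencil fillrate: {ratio:.2f}x")
```

Under those assumed clocks both chips land at roughly 8 stencil pixels per clock with 4xAA, so the XT PE's roughly 1.46x stencil advantage would come almost entirely from clock speed.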

    I'm quite sure there's some Hi-Z problem, specifically changing the sense of the depth test.
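To illustrate why flipping the depth-test direction can defeat a hierarchical-Z scheme, here is a toy model. It assumes (hypothetically; R420's actual Hi-Z design isn't documented in this thread) that each tile stores only the farthest depth written so far. That value is a conservative bound for a LESS test but tells you nothing useful for a GREATER test:

```python
from dataclasses import dataclass


@dataclass
class HiZTile:
    """Hypothetical Hi-Z tile that tracks only the farthest (max) depth."""
    max_z: float = 1.0  # cleared to the far plane

    def can_reject(self, frag_min_z: float, depth_func: str) -> bool:
        """Return True if a whole block of fragments can be culled at once.

        frag_min_z is the nearest depth among the incoming fragments.
        """
        if depth_func == "LESS":
            # Every stored pixel is at most max_z deep. If even the nearest
            # incoming fragment is at or behind that, none can pass LESS.
            return frag_min_z >= self.max_z
        # Any other comparison (e.g. GREATER) would need the tile's minimum
        # depth, which this tile does not store, so it cannot reject safely.
        return False


tile = HiZTile(max_z=0.4)
print(tile.can_reject(0.6, "LESS"))     # True
print(tile.can_reject(0.6, "GREATER"))  # False
```

A real design could store both a min and a max per tile, at the cost of more on-chip storage; the sketch keeps only one value to show how a single stored bound ties Hi-Z to one test direction.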

    It seems that there's something more, though. NV40 can only reject at 64 samples/clk, and R420 can do 32 without HiZ (and 256 with). Maybe the shading is faster on NV40 as well, and their superscalar shading units are put to good use. Maybe it's the use of so many textures. I'm not sure, but this definitely goes beyond stencil shading power.
     
  4. Johnny Rotten

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    327
    Likes Received:
    0
    Location:
    Edmonton, Alberta
    That's quite the claim. Is that backed up anywhere else in the article? I had a quick scan and couldn't see anything.
     
  5. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    There's a console command that appears to correspond to the "UltraShadow" OGL extension name; enabling or disabling it does nothing on NV35/36/40.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    A 9800 PRO clocked at 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
     
  7. dan2097

    Regular

    Joined:
    May 23, 2003
    Messages:
    323
    Likes Received:
    0
    As NVIDIA are detecting Doom3, couldn't their optimized path use UltraShadow on supported cards regardless of whether Doom3 requests it?

    Thinking about it, though, I don't see why they would do that, as it would suggest to anyone who didn't know what was going on that UltraShadow gave no benefit.
     
  8. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    Hmmm. So, what does this tell us? Can we conclude from this alone that the stencil processing/rejection rate is not what's holding ATi back (in contradiction to aths' recent 3DC article, IIRC)? That the explanation for R420's relatively lackadaisical performance lies elsewhere, like in the drivers?

    Does JC have an explanation?
     
  9. tEd

    tEd Casual Member
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,105
    Likes Received:
    70
    Location:
    switzerland
    Funny thing is that Thief 3, which also uses a lot of stencil shadows, runs better on the X800. I was very surprised.
     
  10. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    Doesn't have to mean it's never enabled. Could just as well mean that it's always enabled, which I find more likely, unless Carmack just never got it working with his engine, but I doubt that.
     
  11. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Go find Rex and ask him...

    [edit by Reverend]Rex sounds nicer than Rev![/edit]
     
  12. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    NV40 can reject 64 pixels/clock (or rather 16 quads).

    NV40 almost certainly needs fewer clocks per pixel for the lighting (e.g. nrm_pp), but I'm not sure that's enough to offset the 30% clock speed advantage of the X800 XT PE.


    I've also heard this rumor of depth bounds test not being enabled before.

    Depth bounds test needs information on the range of the light. The app has to explicitly pass this information to the ICD.
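With EXT_depth_bounds_test, the app sets a [zmin, zmax] window with glDepthBoundsEXT, and the test compares the depth already stored at each pixel (not the incoming fragment's depth) against that window. The plain-Python sketch below models only the per-pixel comparison; the depth buffer and the light's depth range are made-up values standing in for what the app would compute from the light's bounds:

```python
def depth_bounds_mask(depth_buffer, zmin, zmax):
    """Per-pixel pass/fail of a depth bounds test.

    A pixel passes only if the depth already stored there lies in
    [zmin, zmax]. Fragments at failing pixels can be discarded regardless
    of their own depth, because no lit surface there is inside the range.
    """
    return [[zmin <= z <= zmax for z in row] for row in depth_buffer]


# Toy 2x4 depth buffer in window space (0 = near plane, 1 = far plane).
depth = [
    [0.10, 0.35, 0.50, 0.95],
    [0.20, 0.40, 0.60, 1.00],
]

# Hypothetical window-space depth range covered by one light's volume.
light_zmin, light_zmax = 0.30, 0.55

mask = depth_bounds_mask(depth, light_zmin, light_zmax)
passed = sum(row.count(True) for row in mask)
print(f"{passed} of {len(depth) * len(depth[0])} pixels survive the bounds test")
```

The narrower the range the app can supply, the more pixels drop out; a light whose range spans most of the scene's depth gains little.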
     
  13. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    Answers a few of your questions.
     
  14. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Are you suggesting NV40 can reject 64 pixels/clock even with AA? Can anyone run gl_ext_reme with and without AA for me? I found it works very well for calculating Z-reject rate.

    Yeah, I was thinking about that. Can R300 actually perform a nrm in one clock, or does the driver expand it to dp3/rsq/mul? NVidia was probably able to really hand-tune the shading well (not that there's anything wrong with that), since there are very few different shaders AFAIK. Still, a 50% clock advantage (XT PE vs GT) is very big. Hmm...
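For reference, the dp3/rsq/mul expansion normalizes a vector as v * (1 / sqrt(v . v)). A plain-Python sketch of the three steps (function names mirror the shader instructions; this is an illustration, not any vendor's actual driver code):

```python
import math


def dp3(a, b):
    """3-component dot product, as the dp3 instruction computes it."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):
    """Reciprocal square root, as the rsq instruction computes it."""
    return 1.0 / math.sqrt(x)


def mul(v, s):
    """Scale a 3-vector by a scalar, like mul with a replicated operand."""
    return (v[0] * s, v[1] * s, v[2] * s)


def nrm_expanded(v):
    """nrm emulated as three dependent steps: dp3 -> rsq -> mul."""
    len_sq = dp3(v, v)
    inv_len = rsq(len_sq)
    return mul(v, inv_len)


n = nrm_expanded((3.0, 0.0, 4.0))
print(n)
```

Each step depends on the previous one, so without a native nrm the normalize costs three instruction slots instead of one.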

    I don't really see how the depth bounds test would affect performance that much for NV40 with Doom3. NV40's Z-reject rate is only twice its stencil fill rate, and since Doom3 is indoors, it would be pretty hard to bound the shadows to a smaller region within a room, IMO.
     
  15. croc_mak

    Newcomer

    Joined:
    Mar 26, 2002
    Messages:
    46
    Likes Received:
    0
    Do you mean the "r_depthboundstest" cvar, Dave?
     
  16. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    r_usedepthboundstest
     
  17. Dave B(TotalVR)

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    491
    Likes Received:
    3
    Location:
    Essex, UK (not far from IMGTEC:)
    You don't think that ATi's lack of optimisation in Doom3 has anything to do with them leaking the alpha test last year? :wink:
     
  18. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Yes, that's what I'm suggesting. The early rejection is independent of pipelines and operates on quads, so MSAA doesn't matter here.
    Unfortunately, I don't have the hardware.

    AFAIK NV40 is the only chip with native nrm support.

    I don't think it's that much of a difference, either. But with 4xAA enabled, Z-reject is eight times faster.
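The "eight times" figure can be reproduced from per-clock numbers in this thread: 64 pixels/clock of early rejection versus stencil fill at roughly half that without AA, and roughly 8 pixels/clock with 4xAA (2.8 GPix/s at an assumed 350 MHz GT clock). A minimal sketch, assuming the stencil/Z-only path handles a fixed 32 samples per clock:

```python
# From the thread: 16 quads/clock of early rejection = 64 pixels/clock.
REJECT_PIXELS_PER_CLK = 16 * 4

# Assumption: the stencil/Z-only path processes 32 samples per clock, so a
# non-AA pixel costs 1 sample and a 4xAA pixel costs 4.
STENCIL_SAMPLES_PER_CLK = 32


def stencil_pixels_per_clk(aa_samples: int) -> float:
    """Stencil-only pixels per clock at a given MSAA sample count."""
    return STENCIL_SAMPLES_PER_CLK / aa_samples


def reject_advantage(aa_samples: int) -> float:
    """How many times faster early rejection is than stencil fill.

    Rejection works on whole quads before any per-sample work, so its
    rate is modeled as unaffected by MSAA; stencil fill is divided by
    the sample count.
    """
    return REJECT_PIXELS_PER_CLK / stencil_pixels_per_clk(aa_samples)


for samples in (1, 4):
    print(f"{samples} sample(s): reject is {reject_advantage(samples):.0f}x stencil fill")
```

Under this model the ratio is 2x without AA and 8x with 4xAA, matching both Mintmaster's "twice" and the "eight times" above.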
     
  19. AlphaWolf

    AlphaWolf Specious Misanthrope
    Legend

    Joined:
    May 28, 2003
    Messages:
    9,470
    Likes Received:
    1,686
    Location:
    Treading Water
    Wouldn't that be more than 2 years ago?
     
  20. Ante P

    Veteran

    Joined:
    Mar 24, 2002
    Messages:
    1,448
    Likes Received:
    0
    Has anyone tried enabling r_usedepthboundstest with the 62.20 drivers?
    I can't be bothered to add it to my config, and I assume it would need a vid_restart if I enabled it in the console.

    It's hard work being this lazy, ya know ;)
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.