ATI Hierarchical-Z issue with Doom 3

Holy crap... could they make the game playable on a 6800 at 1600x1200 with 4X AA and, like, 8X AF?
 
zeckensack said:
Maybe ATI hardware just doesn't have as much stencil fillrate. Ever thought of that?

I'm sure ATI wishes that's all it took...

Check out this thread. With 4xAA enabled, the GT has 2.8 GPix/s stencil fillrate. The X800XT PE has 4.1 GPix/s stencil fillrate. Also, given that NV40's lead doesn't really change when AA is enabled, it seems stencil performance is not a very big deal.
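The quoted stencil-fill figures roughly match a simple estimate: core clock × stencil samples per clock ÷ AA samples per pixel. This is a minimal sketch. The inputs are assumptions, not figures from the linked thread: 350 MHz for the 6800 GT, 520 MHz for the X800 XT PE, and 32 stencil samples per clock for both parts.

```python
def stencil_fill_gpix(core_mhz: float, samples_per_clk: int, aa_samples: int) -> float:
    """Stencil-only fill rate in GPix/s when every pixel costs aa_samples samples."""
    return core_mhz * samples_per_clk / aa_samples / 1000.0

# Assumed clocks and per-clock stencil sample throughput (not from the source thread).
gt = stencil_fill_gpix(350, 32, 4)     # GeForce 6800 GT, 4xAA
xt_pe = stencil_fill_gpix(520, 32, 4)  # Radeon X800 XT PE, 4xAA

print(f"6800 GT:    {gt:.2f} GPix/s")
print(f"X800 XT PE: {xt_pe:.2f} GPix/s")
```

With these assumptions the script prints 2.80 and 4.16 GPix/s, in line with the 2.8 and 4.1 quoted above.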

I'm quite sure there's some Hi-Z problem, specifically when the sense (direction) of the depth test is changed.
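Here is why flipping the depth-test direction can hurt a hierarchical-Z unit, shown with a deliberately simplified, hypothetical model that is not ATI's actual design. Each tile stores only a conservative farthest depth. That value can reject fragments under a LESS test but is useless under a GREATER test.

```python
from dataclasses import dataclass

@dataclass
class HiZTile:
    # Hypothetical tile record: only the farthest (maximum) stored depth is kept.
    max_z: float

def can_reject(tile: HiZTile, frag_min_z: float, depth_func: str) -> bool:
    """Return True if a whole block of fragments can be culled without per-pixel tests."""
    if depth_func == "LESS":
        # Every stored pixel is <= max_z, so a block whose nearest depth is >= max_z
        # fails the LESS test at every pixel.
        return frag_min_z >= tile.max_z
    # For GREATER/GEQUAL a conservative test would need the tile's minimum depth,
    # which this model does not store, so the coarse test cannot reject anything.
    return False

tile = HiZTile(max_z=0.4)
print(can_reject(tile, 0.7, "LESS"))     # True: block is behind everything in the tile
print(can_reject(tile, 0.7, "GREATER"))  # False: falls back to per-pixel testing
```

A real design might store both bounds or rebuild tile data when the test direction changes. The thread does not say which approach, if any, R3xx/R420 use.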

It seems that there's something more, though. NV40 can only reject at 64 samples/clk, and R420 can do 32 without HiZ (and 256 with). Maybe the shading is faster on NV40 as well, and their superscalar shading units are put to good use. Maybe it's the use of so many textures. I'm not sure, but this definitely goes beyond stencil shading power.
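The per-clock rejection numbers above become per-second rates once clock speed is factored in. This is a rough sketch that assumes 350 MHz for the 6800 GT and 520 MHz for the X800 XT PE. The clocks are assumptions; the per-clock figures are the ones quoted in the post.

```python
def reject_rate_gsamples(core_mhz: float, samples_per_clk: int) -> float:
    """Peak early-rejection rate in Gsamples/s."""
    return core_mhz * samples_per_clk / 1000.0

nv40_gt = reject_rate_gsamples(350, 64)      # NV40: 64 samples/clk
r420_hiz = reject_rate_gsamples(520, 256)    # R420 with Hi-Z: 256 samples/clk
r420_no_hiz = reject_rate_gsamples(520, 32)  # R420 without Hi-Z: 32 samples/clk

for name, rate in [("6800 GT", nv40_gt),
                   ("X800 XT PE (Hi-Z)", r420_hiz),
                   ("X800 XT PE (no Hi-Z)", r420_no_hiz)]:
    print(f"{name:22s} {rate:6.1f} Gsamples/s")
```

Under these assumptions, R420's rejection rate would fall below NV40's whenever Hi-Z is effectively disabled, despite R420's higher clock.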
 
Bjorn said:
Just saw this at Firingsquad:

http://www.firingsquad.com/hardware/doom3_perf/page11.asp

The most startling part about it is that NVIDIA’s ace in the hole, UltraShadow, isn’t even enabled currently in DOOM 3. With the extensive use of stencil shadows throughout DOOM 3’s dark levels, UltraShadow could play a huge role in improving NVIDIA’s current performance even more.

Doesn't say anything about why it isn't used though, if it's true that is.

That's quite the claim. Is that backed up anywhere else in the article? I had a quick scan and couldn't see anything.
 
There's a console command that appears to correspond to the "UltraShadow" OGL extension name - enabling or disabling it does nothing on NV35/36/40.
 
tEd said:
hasn't this been discussed here before? I thought this was fixed or improved with r350.

9800 PRO clocked @ 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
 
As Nvidia are detecting Doom 3, couldn't their optimized path use UltraShadow on supported cards regardless of whether Doom 3 requests it?

Thinking about it, though, I don't see why they would do that, since it would make UltraShadow appear to give no benefit to anyone who didn't know what was going on.
 
DaveBaumann said:
tEd said:
hasn't this been discussed here before? I thought this was fixed or improved with r350.

9800 PRO clocked @ 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
Hmmm. So, what does this tell us? Can we conclude from this alone that the stencil processing/rejection rate is not what's holding ATi back (in contradiction to aths' recent 3DC article, IIRC)? That the explanation for R420's relatively lackadaisical performance lies elsewhere, like in the drivers?

Does JC have an explanation?
 
Pete said:
DaveBaumann said:
tEd said:
hasn't this been discussed here before? I thought this was fixed or improved with r350.

9800 PRO clocked @ 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
Hmmm. So, what does this tell us? Can we conclude from this alone that the stencil processing/rejection rate is not what's holding ATi back (in contradiction to aths' recent 3DC article, IIRC)? That the explanation for R420's relatively lackadaisical performance lies elsewhere, like in the drivers?

Does JC have an explanation?

Funny thing is that Thief 3, which also uses a lot of stencil shadows, runs better on the X800. I was very surprised.
 
DaveBaumann said:
There's a console command that appears to correspond to the "UltraShadow" OGL extension name - enabling or disabling it does nothing on NV35/36/40.

Doesn't have to mean it's never enabled. Could just as well mean that it's always enabled, which I find more likely, unless Carmack just never got it working with his engine, but I doubt that.
 
Humus said:
Doesn't have to mean it's never enabled. Could just as well mean that it's always enabled, which I find more likely, unless Carmack just never got it working with his engine, but I doubt that.

Go find Rex and ask him...

(Edit by Reverend: Rex sounds nicer than Rev!)
 
Mintmaster said:
It seems that there's something more, though. NV40 can only reject at 64 samples/clk, and R420 can do 32 without HiZ (and 256 with). Maybe the shading is faster on NV40 as well, and their superscalar shading units are put to good use. Maybe it's the use of so many textures. I'm not sure, but this definitely goes beyond stencil shading power.
NV40 can reject 64 pixels/clock (or rather 16 quads).

NV40 almost certainly needs fewer clocks per pixel for the lighting (e.g. nrm_pp), but I'm not sure that's enough to offset the 30% clock speed advantage of the X800 XT PE.
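The trade-off can be put in numbers. Throughput scales roughly as clock ÷ cycles per pixel, so for NV40 to tie on shader-bound work, its cycles per pixel must shrink by the clock ratio. This sketch assumes 400 MHz for the 6800 Ultra, 350 MHz for the 6800 GT and 520 MHz for the X800 XT PE (the first pairing gives the 30% figure). It also assumes equal pipeline counts.

```python
def break_even_cycle_fraction(slow_mhz: float, fast_mhz: float) -> float:
    """Fraction of the faster-clocked chip's cycles/pixel the slower chip may use to tie."""
    # throughput ~ clock / cycles_per_pixel, so equal throughput requires
    # cycles_slow / cycles_fast == slow_mhz / fast_mhz
    return slow_mhz / fast_mhz

ultra = break_even_cycle_fraction(400, 520)  # 6800 Ultra vs X800 XT PE (assumed clocks)
gt = break_even_cycle_fraction(350, 520)     # 6800 GT vs X800 XT PE (assumed clocks)
print(f"6800 Ultra must need <= {ultra:.0%} of R420's cycles per pixel")
print(f"6800 GT must need    <= {gt:.0%} of R420's cycles per pixel")
```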


I've also heard this rumor of depth bounds test not being enabled before.

dan2097 said:
As Nvidia are detecting doom3 couldnt their optimized path use ultrashadow regardless of whether doom3 requests it on supported cards?
Depth bounds test needs information on the range of the light. The app has to explicitly pass this information to the ICD.
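This sketch models the information the application has to supply. Under EXT_depth_bounds_test, the app calls glDepthBoundsEXT(zmin, zmax) with window-space depths. A fragment is discarded when the depth already stored at its pixel lies outside that range. The Python below models only that logic. It assumes a glFrustum-style projection with depth range [0, 1] and measures the light's extent along the view axis. The helper names and light values are hypothetical.

```python
def window_depth(dist: float, near: float, far: float) -> float:
    """Map a positive eye-space distance to [0, 1] window depth (glFrustum-style projection)."""
    z_ndc = (far + near) / (far - near) - 2.0 * far * near / ((far - near) * dist)
    return 0.5 * z_ndc + 0.5

def light_depth_bounds(light_center_dist: float, light_radius: float,
                       near: float, far: float) -> tuple[float, float]:
    """Hypothetical helper: window-space depth range covered by a spherical light volume."""
    d_min = max(light_center_dist - light_radius, near)
    d_max = min(light_center_dist + light_radius, far)
    return window_depth(d_min, near, far), window_depth(d_max, near, far)

def depth_bounds_pass(stored_depth: float, zmin: float, zmax: float) -> bool:
    # The test uses the depth already in the buffer, not the incoming fragment's depth.
    return zmin <= stored_depth <= zmax

zmin, zmax = light_depth_bounds(20.0, 5.0, near=1.0, far=1000.0)
print(f"glDepthBoundsEXT({zmin:.4f}, {zmax:.4f})")
# False: a wall at distance 50 is beyond the light's reach, so no shadow/light work there.
print(depth_bounds_pass(window_depth(50.0, 1.0, 1000.0), zmin, zmax))
```

The driver cannot derive zmin/zmax itself, because only the application knows each light's extent.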
 
Reverend said:
John,

What's the situation with regards to depth bounds test implementation in the game? Doesn't appear to have any effect on an NV35 or NV40 using the cvar.

Also, enabling the use of a FP rendering buffer (r_hdr_usefloats) doesn't appear to result in any differences or improvements... or maybe I don't know where to look?

John Carmack said:
>John,

> What's the situation with regards to depth bounds test implementation
> in the game? Doesn't appear to have any effect on a NV35 or NV40
> using the cvar.

Nvidia claims some improvement, but it might require unreleased drivers. It's not a big deal one way or another.

> Also, enabling the use of a FP rendering buffer (r_hdr_usefloats)
> doesn't appear to result in any differences or improvements... or
> maybe I don't know where to look?

All of the r_sb* and r_hdr* cvars are for research code that wasn't included in the shipping build.

John Carmack

Answers a few of your questions.
 
Xmas said:
NV40 can reject 64 pixels/clock (or rather 16 quads).
Are you suggesting NV40 can reject 64 pixels/clock even with AA? Can anyone run gl_ext_reme with and without AA for me? I found it works very well for calculating Z-reject rate.

Xmas said:
NV40 almost certainly needs less clocks per pixel for the lighting (e.g. nrm_pp), but I'm not sure that's enough to equalize the 30% clock speed advantage of X800XTPE.
Yeah, I was thinking about that. Can R300 actually perform a nrm in one clock, or does the driver expand it to dp3/rsq/mul? NVidia was probably able to really hand-tune the shading well (not that there's anything wrong with that), since there are very few different shaders AFAIK. Still, a 50% clock advantage (XT PE vs GT) is very big. Hmm...

Xmas said:
I've also heard this rumor of depth bounds test not being enabled before.

dan2097 said:
As Nvidia are detecting doom3 couldnt their optimized path use ultrashadow regardless of whether doom3 requests it on supported cards?
Depth bounds test needs information on the range of the light. The app has to explicitly pass this information to the ICD.
I don't really see how the depth bounds test would affect performance that much for NV40 with Doom3. NV40's Z-reject rate is only twice its stencil fill rate, and since Doom3 is indoors, it would be pretty hard to bound the shadows to a smaller region within a room, IMO.
 
DaveBaumann said:
There's a console command that appears to correspond to the the "UltraShadow" OGL extension name - enabling or disabling it does nothing on NV35/36/40.

Do you mean the "r_depthboundstest" cvar, Dave?
 
Mintmaster said:
Xmas said:
NV40 can reject 64 pixels/clock (or rather 16 quads).
Are you suggesting NV40 can reject 64 pixels/clock even with AA? Can anyone run gl_ext_reme with and without AA for me? I found it works very well for calculating Z-reject rate.
Yes, that's what I'm suggesting. The early rejection is independent of pipelines and operates on quads, so MSAA doesn't matter here.
Unfortunately, I don't have the hardware.

Yeah, I was thinking about that. Can R300 actually perform a nrm in one clock, or does the driver expand it to dp3/rsq/mul?
AFAIK NV40 is the only chip with native nrm support.
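For reference, the expansion in question is the usual three-instruction normalize. dp3 gives the squared length, rsq gives its reciprocal square root, and mul scales the vector. This plain-Python sketch models only the math. It does not reflect any driver's actual instruction scheduling.

```python
import math

def dp3(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def rsq(x):
    return 1.0 / math.sqrt(x)

def nrm_expanded(v):
    """Normalize a 3-vector as a dp3 / rsq / mul sequence (three instructions)."""
    len_sq = dp3(v, v)                     # dp3 r0.w, v, v
    inv_len = rsq(len_sq)                  # rsq r0.w, r0.w
    return tuple(c * inv_len for c in v)   # mul r0.xyz, v, r0.w

print(nrm_expanded((3.0, 0.0, 4.0)))
```

If a native nrm issues as a single operation, each normalize saves two instruction slots, which is one plausible source of a per-clock advantage in lighting shaders.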

I don't really see how the depth bounds test would affect performance that much for NV40 with Doom3. NV40's Z-reject rate is only twice its stencil fill rate, and since Doom3 is indoors, it would be pretty hard to bound the shadows to a smaller region within a room, IMO.
I don't think it makes that much of a difference either. But with 4xAA enabled, Z-reject is eight times faster than stencil fill.
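The "twice" and "eight times" figures follow from early rejection working on whole pixels (64 per clock), independent of multisampling, while stencil fill is paid per sample. This sketch assumes 32 stencil samples per clock, the figure implied by "only twice".

```python
def reject_to_stencil_ratio(reject_pix_per_clk: int, stencil_samples_per_clk: int,
                            aa_samples: int) -> float:
    """How many times faster early Z-reject is than stencil fill, per pixel."""
    stencil_pix_per_clk = stencil_samples_per_clk / aa_samples
    return reject_pix_per_clk / stencil_pix_per_clk

print(reject_to_stencil_ratio(64, 32, 1))  # 2.0 without AA
print(reject_to_stencil_ratio(64, 32, 4))  # 8.0 with 4xAA
```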
 
Dave B(TotalVR) said:
You don't think that ATi's lack of optimisation in Doom3 has anything to do with them leaking the alpha test last year? ;)

Wouldn't that be more than 2 years ago?
 
Has anyone tried enabling r_usedepthboundstest with the 62.20 drivers?
I can't be bothered to add it to my config, and I assume it would need a vid restart if I enabled it in the console.

it's hard work being this lazy ya know ;)
 