perhaps that's why JC wouldn't talk about it to Rev?
zeckensack said:Maybe ATI hardware just doesn't have as much stencil fillrate. Ever thought of that?
Bjorn said:Just saw this at Firingsquad:
http://www.firingsquad.com/hardware/doom3_perf/page11.asp
The most startling part about it is that NVIDIA’s ace in the hole, UltraShadow, isn’t even enabled currently in DOOM 3. With the extensive use of stencil shadows throughout DOOM 3’s dark levels, UltraShadow could play a huge role in improving NVIDIA’s current performance even more.
Doesn't say anything about why it isn't used though, if it's true that is.
tEd said:hasn't this been discussed here before? I thought this was fixed or improved with r350.
DaveBaumann said:
tEd said:hasn't this been discussed here before? I thought this was fixed or improved with r350.
9800 PRO clocked @ 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
Hmmm. So, what does this tell us? Can we conclude from this alone that the stencil processing/rejection rate is not what's holding ATi back (in contradiction to aths' recent 3DC article, IIRC)? That the explanation for R420's relatively lackadaisical performance lies elsewhere, like in the drivers?
Pete said:
DaveBaumann said:
tEd said:hasn't this been discussed here before? I thought this was fixed or improved with r350.
9800 PRO clocked @ 9700 PRO speeds scores exactly the same as a 9700 PRO in Doom3 demo1.
Hmmm. So, what does this tell us? Can we conclude from this alone that the stencil processing/rejection rate is not what's holding ATi back (in contradiction to aths' recent 3DC article, IIRC)? That the explanation for R420's relatively lackadaisical performance lies elsewhere, like in the drivers?
Does JC have an explanation?
DaveBaumann said:There's a console command that appears to correspond to the "UltraShadow" OGL extension name - enabling or disabling it does nothing on NV35/36/40.
Humus said:Doesn't have to mean it's never enabled. Could just as well mean that it's always enabled, which I find more likely, unless Carmack just never got it working with his engine, but I doubt that.
Mintmaster said:It seems that there's something more, though. NV40 can only reject at 64 samples/clk, and R420 can do 32 without HiZ (and 256 with). Maybe the shading is faster on NV40 as well, and their superscalar shading units are put to good use. Maybe it's the use of so many textures. I'm not sure, but this definitely goes beyond stencil shading power.
NV40 can reject 64 pixels/clock (or rather 16 quads).
dan2097 said:As Nvidia are detecting Doom3, couldn't their optimized path use UltraShadow regardless of whether Doom3 requests it on supported cards?
Depth bounds test needs information on the range of the light. The app has to explicitly pass this information to the ICD.
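To make the point about the light's range concrete, here is a rough sketch of the numbers an app would have to work out and hand to the driver, e.g. via `glDepthBoundsEXT(zmin, zmax)` from EXT_depth_bounds_test. It assumes a standard OpenGL perspective projection with a finite far plane and the default [0, 1] depth range, and it treats the light as a sphere whose centre sits at a known eye-space depth. The function names and the example values are made up for illustration; this is not how Doom 3 itself computes its bounds.

```python
def window_depth(eye_dist, near, far):
    """Window-space depth in [0, 1] for a positive eye-space distance.

    Assumes a standard OpenGL perspective projection and glDepthRange(0, 1).
    """
    d = min(max(eye_dist, near), far)  # clamp to the view frustum
    return (far / (far - near)) * (1.0 - near / d)


def light_depth_bounds(light_dist, light_radius, near, far):
    """Depth interval covered by a light sphere (hypothetical helper).

    The result is the (zmin, zmax) pair an app could pass to
    glDepthBoundsEXT before drawing that light's shadow volumes.
    """
    zmin = window_depth(light_dist - light_radius, near, far)
    zmax = window_depth(light_dist + light_radius, near, far)
    return zmin, zmax


# Example values (assumptions): light 10 units away, radius 2, near=1, far=100.
zmin, zmax = light_depth_bounds(10.0, 2.0, 1.0, 100.0)
print(zmin, zmax)
```

Only the app knows `light_dist` and `light_radius`, which is why the driver cannot switch the test on by itself.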
Reverend said:John,
What's the situation with regards to depth bounds test implementation in the game? Doesn't appear to have any effect on a NV35 or NV40 using the cvar.
Also, enabling the use of a FP rendering buffer (r_hdr_usefloats) doesn't appear to result in any differences or improvements... or maybe I don't know where to look?
John Carmack said:>John,
> What's the situation with regards to depth bounds test implementation
> in the game? Doesn't appear to have any effect on a NV35 or NV40
> using the cvar.
Nvidia claims some improvement, but it might require unreleased drivers. It's not a big deal one way or another.
> Also, enabling the use of a FP rendering buffer (r_hdr_usefloats)
> doesn't appear to result in any differences or improvements... or
> maybe I don't know where to look?
All of the r_sb* and r_hdr* cvars are for research code that wasn't included in the shipping build.
John Carmack
Xmas said:NV40 can reject 64 pixels/clock (or rather 16 quads).
Are you suggesting NV40 can reject 64 pixels/clock even with AA? Can anyone run gl_ext_reme with and without AA for me? I found it works very well for calculating Z-reject rate.
Xmas said:NV40 almost certainly needs fewer clocks per pixel for the lighting (e.g. nrm_pp), but I'm not sure that's enough to equalize the 30% clock speed advantage of X800XTPE.
Yeah, I was thinking about that. Can R300 actually perform a nrm in one clock, or does the driver expand it to dp3/rsq/mul? NVidia was probably able to really hand tune the shading well (not that there's anything wrong with that), since there are very few different shaders AFAIK. Still, 50% clock advantage (XT PE vs GT) is very big. Hmm...
Xmas said:I've also heard this rumor of depth bounds test not being enabled before.
I don't really see how the depth bounds test would affect performance that much for NV40 with Doom3. NV40's Z-reject rate is only twice its stencil fill rate, and since Doom3 is indoors, it would be pretty hard to bound the shadows to a smaller region within a room, IMO.
croc_mak said:
DaveBaumann said:There's a console command that appears to correspond to the "UltraShadow" OGL extension name - enabling or disabling it does nothing on NV35/36/40.
Do you mean the "r_depthboundstest" cvar, Dave?
r_usedepthboundstest
Mintmaster said:
Xmas said:NV40 can reject 64 pixels/clock (or rather 16 quads).
Are you suggesting NV40 can reject 64 pixels/clock even with AA? Can anyone run gl_ext_reme with and without AA for me? I found it works very well for calculating Z-reject rate.
Yes, that's what I'm suggesting. The early rejection is independent of pipelines and operates on quads, so MSAA doesn't matter here.
Mintmaster said:Yeah, I was thinking about that. Can R300 actually perform a nrm in one clock, or does the driver expand it to dp3/rsq/mul?
AFAIK NV40 is the only chip with native nrm support.
Mintmaster said:I don't really see how the depth bounds test would affect performance that much for NV40 with Doom3. NV40's Z-reject rate is only twice its stencil fill rate, and since Doom3 is indoors, it would be pretty hard to bound the shadows to a smaller region within a room, IMO.
I don't think it's that much of a difference, either. But with 4xAA enabled, Z-reject is eight times faster.
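The "eight times" figure follows from the numbers already quoted in the thread, and a quick back-of-the-envelope check might look like the sketch below. It takes the 64 pixels/clock reject rate as given, infers 32 stencil samples/clock from the "only twice" remark, and assumes stencil fill in pixels drops in proportion to the number of AA samples while quad-based rejection does not. The names are illustrative and the model is deliberately crude.

```python
# Figures taken from the posts above.
Z_REJECT_PIXELS_PER_CLOCK = 64   # 16 quads, stated to be unaffected by MSAA
STENCIL_SAMPLES_PER_CLOCK = 32   # inferred: reject rate is twice stencil fill


def stencil_fill_pixels_per_clock(aa_samples):
    """Stencil fill in pixels/clock, assuming one stencil write per AA sample."""
    return STENCIL_SAMPLES_PER_CLOCK / aa_samples


def reject_to_fill_ratio(aa_samples):
    """How many times faster early rejection is than stencil fill."""
    return Z_REJECT_PIXELS_PER_CLOCK / stencil_fill_pixels_per_clock(aa_samples)


for samples in (1, 4):
    print(samples, "x AA:", reject_to_fill_ratio(samples))
```

Under these assumptions the ratio goes from 2 without AA to 8 with 4xAA, so anything that lets the chip reject shadow-volume pixels early matters more once AA is on.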
Dave B(TotalVR) said:You don't think that ATi's lack of optimisation in Doom3 has anything to do with them leaking the alpha test last year?