r420 may beat nv40 in doom3 with anti-aliasing

LokeshRay

Newcomer
My first post here.
I just posted this a few hours back at nvnews and
wanted to know what you guys have to say about this.

R420 does 32 z/stencil operations per clock cycle when antialiasing is enabled.
NV40 does 32 z/stencil operations per clock when no color is being dealt with.

This has been mentioned by AnandTech:
By contrast, R420 pushes 32 z/stencil operations per clock cycle when antialiasing is enabled (one z/stencil operation can be completed per clock at the end of each pixel pipeline, and one z/stencil operation can be completed inside the multisample AA unit).

You can read the entire article at AnandTech.


So in Doom 3, when FSAA is enabled,
R420 will be doing 32 z/stencil operations per clock all the time, while
NV40 does so only when no color is being used, or when a z/stencil-only pass is performed.

Thus, with FSAA,
R420 spends more of its time at 32 z/stencil operations per clock cycle than NV40 does in a typical Doom 3 map.
On an outdoor map, NV40 will probably reach 32 z/stencil operations per clock very rarely.
So, when FSAA is enabled:

NV40:
sometimes 32 z/stencil operations per clock
the rest of the time 16 z/stencil operations per clock

R420:
always 32 z/stencil operations per clock

In maps with lots of shadows, when FSAA is turned on, R420 will match or exceed NV40 performance because
32 x 520 > 32 x 450 (z/stencil operations per clock x core clock in MHz)
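
To make that comparison a bit more concrete, here is a rough back-of-the-envelope sketch. It takes the figures in this post at face value (16 or 32 z/stencil operations per clock, 520 MHz for R420, 450 MHz for NV40). The `double_rate_fraction` parameter, meaning the share of frame time a chip spends at the 32-op rate, is a made-up knob for illustration and not a measured number.

```python
def effective_zstencil_rate(clock_mhz, double_rate_fraction,
                            base_ops=16, double_ops=32):
    """Time-weighted z/stencil throughput in millions of ops per second.

    double_rate_fraction is a hypothetical share of time (0..1) spent
    at the 32 ops/clock rate; the rest is spent at 16 ops/clock.
    """
    if not 0.0 <= double_rate_fraction <= 1.0:
        raise ValueError("double_rate_fraction must be between 0 and 1")
    ops_per_clock = (double_rate_fraction * double_ops
                     + (1.0 - double_rate_fraction) * base_ops)
    return clock_mhz * ops_per_clock


# R420 is assumed to run at 32 ops/clock all the time with FSAA on.
r420 = effective_zstencil_rate(520, 1.0)

# NV40 is assumed to hit 32 ops/clock only part of the time.
for fraction in (0.0, 0.5, 1.0):
    nv40 = effective_zstencil_rate(450, fraction)
    print(f"NV40 at 32 ops/clock {fraction:.0%} of the time: "
          f"{nv40:.0f} vs R420 {r420:.0f} Mops/s")
```

Under these assumptions R420 stays ahead on raw z/stencil rate even if NV40 runs at 32 operations per clock the whole time. The sketch says nothing about UltraShadow or driver differences.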

The performance delta will be even larger in maps with hardly any shadows, especially outdoor maps.

But we all know that NVIDIA's OpenGL drivers are superior
and their performance in OpenGL games is better.
NV40 has UltraShadow technology built into the hardware, and Doom 3 takes advantage of this,
so it will all depend on how well the high core frequency of R420 is able to offset the advantages that NV40 has (UltraShadow, 32 z/stencil operations with no color).

And when FSAA is not enabled, there is no doubt that NV40 will beat R420.
 
LokeshRay said:
My first post here.
I just posted this a few hours back at nvnews and
wanted to know what you guys have to say about this.

R420 does 32 z/stencil operations per clock cycle when antialiasing is enabled.
NV40 does 32 z/stencil operations per clock when no color is being dealt with.
The NV40 also does 32 z/stencil operations per clock when AA is enabled.

With nVidia's heavy acceleration of stencil shadows, I expect the NV4x to perform much better with respect to the R4xx on DOOM3 than it will on other games.
 
I found it interesting that in 3DM03 GT2 and X2, the X800s were slightly ahead w/o AA+AF, yet fell behind when AA+AF were enabled (6800U was also ahead of X800 XT in Fablemark in B3D's review, but Dave didn't include AA+AF benches of it). This goes against the general pattern that the X800 loses less performance when enabling AA+AF, and it also seems to go against both the synthetic benchmarks that show the 6800 can perform far more z ops per cycle and the idea that the X800 can do more z ops per cycle with AA than without. Can I draw a conclusion about either card's stenciling ability from these two tests, or are they otherwise limited so that stencil performance isn't playing a big role in the framerate? Or am I barking up the wrong tree?
 
christoph said:
NV40@400MHz= 12.8 GPixels/sec stencil fill-rate
X800XT@520MHz= 8.3 GPixels/sec stencil fill-rate

I'm sorry, but what math did you use to reach these figures?
 
32 * 400 vs 16 * 520, and he's ignoring antialiasing. With antialiasing, it's 16.6 GPixels/sec for the XT (32 * 520). Whether or not the AA stencil/z fillrate can be used depends on what the algorithm does (e.g. render to texture).
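
Spelled out as a quick sketch, the arithmetic is just z/stencil operations per clock times core clock. The function name and the card labels here are only for illustration, and these are theoretical peaks, not measured numbers.

```python
def stencil_fill_gpixels(ops_per_clock, clock_mhz):
    """Peak stencil fill rate in GPixels/sec: ops per clock x MHz / 1000."""
    return ops_per_clock * clock_mhz / 1000.0


# (label, z/stencil ops per clock, core clock in MHz) as quoted in this thread
cards = [
    ("NV40 @ 400MHz", 32, 400),
    ("X800 XT @ 520MHz, no AA", 16, 520),
    ("X800 XT @ 520MHz, with AA", 32, 520),
]

for name, ops, mhz in cards:
    print(f"{name}: {stencil_fill_gpixels(ops, mhz):.2f} GPixels/sec")
```

That gives 12.8, 8.32 and 16.64 GPixels/sec respectively, which is where the rounded 8.3 and 16.6 figures come from.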
 
rwolf said:
Especially since the developers have sold their souls to Nvidia.

Well, a recent interview with JC mentioned that the NV3x rendering path was dropped and ARB2 is used instead.
 
DemoCoder said:
32 * 400 vs 16 * 520, and he's ignoring antialiasing. With antialiasing, it's 16.6 GPixels/sec for the XT (32 * 520). Whether or not the AA stencil/z fillrate can be used depends on what the algorithm does (e.g. render to texture).

Right, so wouldn't it be

NV40@400MHz = 12.8 GPixels/sec stencil fill-rate with FSAA
X800XT@520MHz = 16.6 GPixels/sec stencil fill-rate with FSAA

thus giving ATI the advantage with FSAA, just like the name of the thread?
 
That depends if:

a) you can even enable AA in D3 without a massive performance loss. That depends on what performance is like without AA enabled at modest resolutions.
b) you ignore UltraShadow culling
 
DemoCoder said:
That depends if:

a) you can even enable AA in D3 without a massive performance loss. That depends on what performance is like without AA enabled at modest resolutions.
b) you ignore UltraShadow culling

Can you give me a quick summary of UltraShadow culling?
 
3DMark03 Game Test 2 should give a good indication of how D3 will perform, I think. From the X-bit labs review here:
http://www.xbitlabs.com/articles/video/display/r420-2_32.html

[Image: 3dm2_1280_candy.gif]

[Image: 3dm2_1600_candy.gif]
 
Why does it say AF 8x/16x? Does that imply one card is using 8x while the other is using 16x?

Also, why would you give me a benchmark that I don't trust and haven't trusted in almost 2 years?
 
3DMark03 could also be limited by vertex shader performance. Remember, it does shadow volume extrusion in the vertex shaders.

It also might use the Z-buffer incorrectly, which could disable Hyper-Z HD. Dunno.
 
jvd said:
Evildeus said:
GF FX 5950 can't do more than 8*....

Right, so doesn't that mean the Radeons are doing more aniso?

That's what I get from seeing AF 8x/16x.

You're probably right, jvd, although we should check before accusing. If those tests are with R420 doing 16x and NV40 doing 8x, then I really do not see the point of running the test.
 