NV40: 6x2/12x1/8x2/16x1? Meh. Summary of what I believe

The scene is rendered in multiple passes; first the scene is rendered using ambient lighting, which also prepares the depth buffer for stencil lighting.
Well, that's one difference right off. We know that the GeForce FX is designed for a style of rendering where that first pass is done without color writes.
 
:?

Are you talking about the Doom 3 rendering style? Surely the GFFX wasn't designed entirely around just one game! :)
 
John Carmack has proven very influential in the past on directing the future of 3D graphics. Why would this be any different?
 
DemoCoder said:
OpenGL guy said:
How about 16 Z/stencil ops per cycle when AA is enabled?

Can I go now or is class still in session?
Well, let's talk per pipe: Any changes besides adding multisampling?
Just where do you think these extra Z/stencil test units reside?
Most of the discussion of zixel fillrate to date has been with multisampling off.

e.g. any stencil/Z fill changes with multisampling off?
Wouldn't you say that there's more need for these extra samples with AA enabled than with it disabled? I mean, since we're already doing 8 per cycle without AA...

But it seems clear that you want to restrict things enough to make your case that there are no improvements between R200 and R300 regarding Z/stencil.
 
Chalnoth said:
The scene is rendered in multiple passes; first the scene is rendered using ambient lighting, which also prepares the depth buffer for stencil lighting.
Well, that's one difference right off. We know that the GeForce FX is designed for a style of rendering where that first pass is done without color writes.
But it would still benefit from the stencil-only passes.
 
Ailuros said:
Vince said:
Ailuros said:
Now despite the obvious OT considering stencil performance, can someone kindly explain to me where exactly it has been proven so far that the NV30, due to its higher clock speed, yields better stencil performance, and that in a pure stencil-op-focused synthetic?
The discussion was on the differing ideologies behind each respective architecture, and that each IHV has talented people capable of producing valid and high-performance ICs that are specialized to their beliefs of what's important in the marketplace. So, it's a theoretical discussion and comments like this serve as the proof you seek:
OpenGL Guy said (in response to me, http://www.beyond3d.com/forum/viewtopic.php?p=225324#225324):
If you want to say that the 5800 Ultra was 50% faster because of its higher clock speed, I won't dispute that.
It's obvious that I haven't followed the discussion for the past few pages and I really don't have much to disagree with in the paragraph above. Au contraire, I DO disagree with OpenGL Guy's note that the NV30 was 50% faster. Faster than what, and exactly where? Because I'm obviously blind.

I can see more raw fill-rate on paper and that's about it; I can also see severe possible bandwidth constraints for that very same fill-rate, again on paper. In the case of the NV30 it's 8*500 = 4.0 GPixels/s stencil fill-rate vs. 8*325 = 2.6 GPixels/s on the R300, yet it's also 16 GB/s vs. 19.84 GB/s bandwidth. Performance numbers are up there.
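The back-of-envelope math above can be sketched as follows; a rough calculation using only the figures quoted in this post (clocks, 8 Z/stencil ops per clock, memory bandwidth), with "bytes per zixel" as an illustrative measure of how bandwidth-constrained each chip's paper fill-rate is:

```python
# Theoretical stencil fill-rate vs. memory bandwidth, using the numbers
# quoted above (clock in MHz, bandwidth in GB/s, 8 Z/stencil ops per clock).

def stencil_fillrate_gpix(clock_mhz, zstencil_per_clock):
    """Theoretical stencil fill-rate in GPixels/s."""
    return clock_mhz * zstencil_per_clock / 1000.0

chips = {
    "NV30": {"clock": 500, "units": 8, "bandwidth_gbs": 16.0},
    "R300": {"clock": 325, "units": 8, "bandwidth_gbs": 19.84},
}

for name, c in chips.items():
    fill = stencil_fillrate_gpix(c["clock"], c["units"])
    # Bandwidth available per theoretical zixel: higher means less constrained.
    print(f"{name}: {fill:.1f} GPix/s, "
          f"{c['bandwidth_gbs'] / fill:.1f} bytes per zixel")
```

This reproduces the 4.0 vs. 2.6 GPixels/s figures, and shows the R300 has noticeably more bandwidth headroom per theoretical zixel than the NV30.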
It's hopeless, Ailuros. It's clear that NV3x is more advanced because it can do more stencil ops than color ops; the fact that it is outperformed by a lower clocked chip is not relevant. Haven't you been paying attention? :LOL:

Never mind that the NV3x has a very similar pipe configuration to the NV25 (including such useful things as register combiners!). The fact that the engineers tossed in a couple of extra Z/stencil units is all that matters here. Never mind that the R200 was 4x2 and the R300 is 8x1 (even with color enabled!)... no change here at all!

And of course the NV3x has deeper pipelines than the R3x0, nvidia said so! I mean, you couldn't possibly achieve such awesome clock frequencies without more pipelining! Obviously using a smaller, faster process had nothing to do with it.

But I'm just an ATI zealot, pay no attention to me. I never state any facts, only my stupid, biased opinions which aren't worth the paper they aren't printed on.

P.S. Note that most of this is not directed at you, Ailuros.
 
Aren't you being a bit too vehement here? Sure, the R3** is a much better architecture with better performance than the NV3*, but that doesn't mean that in some very peculiar cases the NV3*'s architecture can't do things better than the R3**. Well, that's how I see this little debate.
Just my 2 cents :!:
 
Chalnoth said:
The scene is rendered in multiple passes; first the scene is rendered using ambient lighting, which also prepares the depth buffer for stencil lighting.
Well, that's one difference right off. We know that the GeForce FX is designed for a style of rendering where that first pass is done without color writes.

Which takes less than 5% of rendering time in this benchmark. :rolleyes:

If an architecture is "designed" for stencil it's the Kyro 2.
It has 2 pixel pipes and 16 "Zixel/stencil" pipes.
 
You should check out the benchmarks more often guys...

[Fill-rate graphs: 37045.gif, 37043.gif]

Now we can talk about what's optimized for what :rolleyes:

/Graphs by F-Center. The article is in Russian, but the graphs are in English./
 
Chalnoth said:
The scene is rendered in multiple passes; first the scene is rendered using ambient lighting, which also prepares the depth buffer for stencil lighting.
Well, that's one difference right off. We know that the GeForce FX is designed for a style of rendering where that first pass is done without color writes.

And? FableMark is a synthetic benchmark that measures stencil performance and in that respect IS fill-rate limited. If you have some other application in mind that can measure stencil fill-rate, there's not a single reason not to have a look at it either.

Of course there are also the graphs that Degustator just posted, yet the next best thing I'm expecting to hear is "oh well, there are textures in there..." LOL.

***edit: quite interesting is the fact that the XT with colour writes off/ Z writes on sustains a ~95% fill-rate efficiency from 0 to 4 textures, while the NV38 goes from ~85% with 0 textures to ~75% with 4 textures.

If an architecture is "designed" for stencil it's the Kyro 2.
It has 2 pixel pipes and 16 "Zixel/stencil" pipes.

2*16= 32 Z/stencil units. There is also one limitation to that, but the issue here isn't outdated value hardware either.

OpenGLguy,

But I'm just an ATI zealot, pay no attention to me. I never state any facts, only my stupid, biased opinions which aren't worth the paper they aren't printed on.

On a more humorous note I wouldn't expect anything else from you either. My father is proud of me too; or so at least I think... :oops:
 
Ailuros said:
And? FableMark is a synthetic benchmark that measures stencil performance and in that respect IS fill-rate limited. If you have some other application in mind that can measure stencil fill-rate, there's not a single reason not to have a look at it either.
Well, one could use Tenebrae, even though it's not a production game.

Of course there are also the graphs that Degustator just posted, yet the next best thing I'm expecting to hear is "oh well, there are textures in there..." LOL.
Um, no.

The main reason I doubted this was from my experience with the Radeon 9700. My experience has been that games that use the stencil buffer decrease the Radeon 9700's performance significantly with FSAA, to the point where with FSAA, it seemed to be only a little bit faster than my GeForce4 (whereas usually it's much faster...). Perhaps something has changed in the architecture since, but I'd still be interested to see some FSAA stencil benchmarks like those Degustator posted, particularly along with older versions of the R3xx cards.
 
Well, one could use Tenebrae, even though it's not a production game.

You do realise, I suppose, that FableMark is a LOT closer to the stencil shadow implementations of the immediate future than Tenebrae is?

However, let's see what that shoddy "let's bumpmap everything up the wazoo" approach looks like from Reverend's reviews (sans AA):

http://www.beyond3d.com/reviews/albatron/gffx5900pv/index.php?p=10

NV35: 400MHz*8= 3.2GPixels/s stencil fillrate

25.1 fps in 1024? errrr..... :rolleyes:

Now Reverend warns that the above is with a new version of Tenebrae and shouldn't be compared to older reviews. Older reviews (sadly only from mainstream parts) showed the following pattern:

9600PRO:

http://www.beyond3d.com/reviews/triplex/redair9600pro/index.php?p=9

1024@ 18.5 fps

420MHz*4= 1.68GPixels/s stencil fillrate

5600PRO:

http://www.beyond3d.com/reviews/albatron/gffx5600pturbo/index.php?p=7

1024@ 16.4 fps

325MHz*4=1.3GPixels/s stencil fillrate

I don't think AA could even be a consideration in any case with that one, heh.
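Purely as an illustration, the quoted numbers above can be reduced to "fps per theoretical GPixel/s of stencil fill"; a rough sketch using only the figures in this post (note the NV35 result comes from a newer Tenebrae version, so per Reverend's caveat it isn't directly comparable to the other two):

```python
# fps per theoretical GPixel/s of stencil fill, from the Tenebrae numbers
# quoted above (1024x768; clock in MHz, Z/stencil units per clock).

cards = {
    "NV35":    {"clock": 400, "units": 8, "fps": 25.1},
    "9600PRO": {"clock": 420, "units": 4, "fps": 18.5},
    "5600PRO": {"clock": 325, "units": 4, "fps": 16.4},
}

for name, c in cards.items():
    gpix = c["clock"] * c["units"] / 1000.0  # theoretical stencil fill-rate
    print(f"{name}: {gpix:.2f} GPix/s theoretical, "
          f"{c['fps'] / gpix:.1f} fps per GPix/s")
```

The point of the exercise: the chip with by far the most paper fill-rate extracts the fewest frames per GPixel/s of it, which is exactly why the raw fill-rate numbers alone prove little.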

The main reason I doubted this was from my experience with the Radeon 9700. My experience has been that games that use the stencil buffer decrease the Radeon 9700's performance significantly with FSAA, to the point where with FSAA, it seemed to be only a little bit faster than my GeForce4 (whereas usually it's much faster...). Perhaps something has changed in the architecture since, but I'd still be interested to see some FSAA stencil benchmarks like those Degustator posted, particularly along with older versions of the R3xx cards.

Usually the games with stencil shadows I've run across thus far have had shoddy implementations of stencil shadows anyway.

But yes, MSAA/stencil fill-rates from the same application as the graphs above would be interesting (are those from MDolenc's Fillrate Tester or not? If yes, then there are already results with MSAA on, posted by Wavey on these boards).
 
Ailuros said:
You do realise, I suppose, that FableMark is a LOT closer to the stencil shadow implementations of the immediate future than Tenebrae is?
I'm not so sure. Tenebrae, for example, got its basic stencil rendering algorithm directly from JC's shadow volume paper (obviously JC will be doing it differently for the final release of DOOM3, but not all that differently).

Anyway, things will become much clearer on the shadow volume front very soon. In the long run, of course, it really shouldn't mean all that much directly, as we'll be moving on to other shadowing techniques, but there are some aspects of shadow volume rendering that could well remain for later shadowing techniques (such as the initial z-only pass, and subsequent color write disabled passes for testing in/out of shadow).
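The pass structure described here (initial Z-only pass, then color-write-disabled stencil passes, then stencil-tested lighting) can be sketched as follows; a minimal model of a Carmack-style frame, where pass names and the helper function are illustrative, not a real API:

```python
# Minimal sketch of a stencil shadow volume frame as described above.
# Real GL state changes are replaced by descriptive strings, so only
# the pass ordering is modeled.

def shadow_volume_passes(num_lights):
    # One ambient pass fills the depth buffer for all later passes.
    passes = ["ambient/Z-fill pass (depth writes on, fills the depth buffer)"]
    for i in range(num_lights):
        # Volume pass: color and depth writes off; stencil inc/dec marks
        # pixels that lie inside this light's shadow volumes.
        passes.append(f"light {i}: stencil volume pass (color/depth writes off)")
        # Lighting pass: stencil test rejects shadowed pixels; results are
        # accumulated with additive blending.
        passes.append(f"light {i}: lighting pass (stencil test, additive blend)")
    return passes

for p in shadow_volume_passes(2):
    print(p)
```

This also makes the cost structure visible: each extra light adds two full-screen-ish passes, and only the lighting pass writes color, which is why Z/stencil throughput dominates.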
 
What I don't get is that the GFFX is 8x0/4x1, and the latest Radeons are 8x1/8x1, yet the GFFX has way more transistors even with seemingly not enough temporary registers. What are the extra transistors for? Maybe it just comes down to parts of the GFFX being "broken".

It's also interesting that the GFFX emphasizes Z performance so much over other issues. Is this just a miscalculation on nVidia's part, or did they really design it for Doom 3?
 
nobie said:
What I don't get is that the GFFX is 8x0/4x1, and the latest Radeons are 8x1/8x1, yet the GFFX has way more transistors even with seemingly not enough temporary registers. What are the extra transistors for? Maybe it just comes down to parts of the GFFX being "broken".
The GeForce FX has support for vastly more instructions per shader, more different instructions, better texture filtering, and higher-precision FP (off the top of my head).
 
nobie said:
Hmm, but wouldn't the higher-precision FP be offset by having fewer FP units?

The integer support, along with multiple FP precision levels and the PS 2.0+ support, are the most likely culprits. The better texture filtering claim is pure crap since there's not a shred of evidence that suggests nVidia has devoted more transistors to their filtering than ATi. And recent comparisons such as [H]'s certainly belie such claims.
 
John Reynolds said:
The better texture filtering claim is pure crap since there's not a shred of evidence that suggests nVidia has devoted more transistors to their filtering than ATi. And recent comparisons such as [H]'s certainly belie such claims.
Um, sure there is. Of course, we don't quite know just how many extra transistors that nVidia has devoted (it could be a negligible amount), but it is almost certainly more, for two main reasons:

1. Better MIP map/anisotropic degree selection algorithm.
2. Higher precision fraction allowed for bilinear filtering and for MIP map filtering (i.e. trilinear).

The end result of these two things is that nVidia's anisotropic filtering is more consistent, and the GeForces will have less texture aliasing in most situations.

The only way that ATI would have spent more transistors is if their implementation is not as efficient transistor-wise, since nVidia's solution does do more.
 
Chalnoth said:
Ailuros said:
You do realise, I suppose, that FableMark is a LOT closer to the stencil shadow implementations of the immediate future than Tenebrae is?
I'm not so sure. Tenebrae, for example, got its basic stencil rendering algorithm directly from JC's shadow volume paper (obviously JC will be doing it differently for the final release of DOOM3, but not all that differently).

Anyway, things will become much clearer on the shadow volume front very soon. In the long run, of course, it really shouldn't mean all that much directly, as we'll be moving on to other shadowing techniques, but there are some aspects of shadow volume rendering that could well remain for later shadowing techniques (such as the initial z-only pass, and subsequent color write disabled passes for testing in/out of shadow).

You know, it's still somehow funny how an NV35 yields more or less the same performance in FableMark as it did in the preliminary Doom3 demos. If NV35s were to get ~25 fps w/o AA/AF, D3 would be a total disaster.

Plus, Tenebrae looks IMHO God awful overall. As I said, there are all kinds of shadows in games so far; usually it's an exception if shadows in general have been worth the hassle. Q3a had some quirky stenciling too, but it belongs to the same category as most others. And yes, of course I expect that to change with the release of Doom3.
 