NV40: 6x2/12x1/8x2/16x1? Meh. Summary of what I believe

digitalwanderer said:
I agree with half of that statement....this is the most confusing thread I've tried to make sense of since joining this place! :oops:

Heh...;) Well, I suppose it is amusing precisely because it is so confusing...:) I find in times like these it pays me to keep a sense of humor on tap...more or less.

Confucius says:

"Man who speak in riddles has forked tongue."

Or, was it Sitting Bull who said that...? Hmmmm....:)
 
WaltC said:
digitalwanderer said:
I agree with half of that statement....this is the most confusing thread I've tried to make sense of since joining this place! :oops:

Heh...;) Well, I suppose it is amusing precisely because it is so confusing...:) I find in times like these it pays me to keep a sense of humor on tap...more or less.

Confucius says:

"Man who speak in riddles has forked tongue."

Or, was it Sitting Bull who said that...? Hmmmm....:)
That is NOT helping! :rolleyes: ;) :p

Damn it, this thread really is beyond my understanding and I really want to understand it...but I figure if I keep getting myself good and confuzled trying to understand then it'll make a bit of sense to me one day. :)

I'll try and avoid the commentary from now on here since I'm out of my league, but this thread DEFINITELY needed a bit of a low-geek-speak break! ;)
 
digitalwanderer said:
That is NOT helping! :rolleyes: ;) :p

Damn it, this thread really is beyond my understanding and I really want to understand it...but I figure if I keep getting myself good and confuzled trying to understand then it'll make a bit of sense to me one day. :)

I'll try and avoid the commentary from now on here since I'm out of my league, but this thread DEFINITELY needed a bit of a low-geek-speak break! ;)

Well, don't know if this helps, but what I was saying was that if you don't know what the physical pixel pipeline organization of a gpu is, you can't determine anything about it relative to its performance. That's about the size of it, seems to me. You need to know that as a starting place, else everything else you say about "ops" and "z-pixels" can have no meaning relative to performance. Basically, the pixel pipeline organization of a gpu is as fundamental a thing to know as the width of the local memory bus or the MHz clock of the gpu, etc. Without knowing the fundamentals, everything else is so much dross...imo, of course...;)

You know, the fact is that on these forums it wasn't until after nVidia decided to misrepresent the pixel pipeline organization of nV30/5/8 that anybody out here became confused about what a fundamental thing that knowledge is...;) If it brings to your mind the tangential confusion about "trilinear filtering," also authored by nVidia, you've got the right idea. nVidia is the author of the current confusion in many areas (fp16 vs. fp24, too, etc.), as far as I'm concerned. Misinformation and obfuscation for base, competitive reasons have no place in academic appreciations.
 
WaltC,
you're right that the physical organization is very important when estimating the performance of a chip. However, I think you're making too many assumptions based on "traditional" ways to build GPUs.
What if the TMUs aren't truly "attached" to the pipelines? What if some units can work on vertices as well as pixels? What if you have N pixel processors that take two clocks to perform an operation on a quad, but only one clock to simply pass on a quad to an appropriately sized back end with 2xN Z/stencil compare units?

The AxB scheme simply isn't enough any more.
 
So Nvidia created FP16 for the sole purpose of creating confusion, rather than trying to squeeze performance out of their architecture? And they decided to design their pipelines in a way different from other IHVs because, let me guess, they wanted to create confusion as part of their plan, muahaha.

Hey, why don't we propose that NVidia's engineering teams don't design anything, but are mere marketing abstractions, and in fact, the entire company is just a marketing abstraction.
 
DemoCoder said:
So Nvidia created FP16 for the sole purpose of creating confusion, rather than trying to squeeze performance out of their architecture? And they decided to design their pipelines in a way different from other IHVs because, let me guess, they wanted to create confusion as part of their plan, muahaha.

Hey, why don't we propose that NVidia's engineering teams don't design anything, but are mere marketing abstractions, and in fact, the entire company is just a marketing abstraction.

Hahahaha. Well said.
 
Overall, in many cases the clock-for-clock performance of R300 and NV35 is similar, but they take different approaches to getting there. Similar increases in performance in shader-dominated situations can be obtained by deepening the pipeline, but that requires more control logic (software and/or hardware) for the work to be scheduled correctly and the pipeline used efficiently.

How can clock-for-clock operation between R300 and NV35 be similar when in most tests (DX9 synthetic and game benchmarks) an RV360 with a 4*1 pipeline performs on par with an NV35 with a 4*2 pipeline, higher bandwidth and faster clock speeds? This of course taking out the NVidia driver treatment. ;)

Besides, when you add IQ (AA and AF) into that equation it gets more obvious. The performance loss is higher on these cards than on the R3x0 ones (which have slower clock speeds). Granted, the implementation of AA and AF is not the same, but in the end the resulting IQ for NV35 is at best similar to R3x0.

I can agree for DX8.1 and below, but tests done on this very site show that the NV3x architecture does have serious deficiencies when 2.0 shaders and high-IQ features come into play.
 
BetrayerX said:
How can clock-for-clock operation between R300 and NV35 be similar when in most tests (DX9 synthetic and game benchmarks) an RV360 with a 4*1 pipeline performs on par with an NV35 with a 4*2 pipeline, higher bandwidth and faster clock speeds? This of course taking out the NVidia driver treatment. ;)
Because pipelines aren't everything.
 
The Baron said:
BetrayerX said:
How can clock-for-clock operation between R300 and NV35 be similar when in most tests (DX9 synthetic and game benchmarks) an RV360 with a 4*1 pipeline performs on par with an NV35 with a 4*2 pipeline, higher bandwidth and faster clock speeds? This of course taking out the NVidia driver treatment. ;)
Because pipelines aren't everything.

I understand that, but it's part of the problem and at the end it affects NV3x performance.

I dunno if what I wrote was clear (English is my second language), but what I meant was that if the smaller version of the R3x0 core can keep up with NV3x's best, then how can NV3x have similar work-per-clock performance to its bigger brother?
 
Depends on what exactly is meant by each term, and implementation plays a major role too. In terms of multitexturing fill-rate, for instance, at the same clock speed it makes no difference whether the layout is 8*1 or 4*2.

The difference lies elsewhere and it apparently affects arithmetic efficiency.

On the other hand, who's willing to bet that it's impossible to equal the arithmetic efficiency of an R3xx with an entirely different 4*2-like design?
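
For what it's worth, the multitexturing fill-rate point above can be put into a few lines of Python. This is only a back-of-the-envelope sketch: the function name and the 400 MHz clock are made up for illustration, and it assumes every TMU delivers one texel per clock with no bandwidth or filtering limits.

```python
def texel_fill_rate(pipes, tmus_per_pipe, clock_mhz):
    """Peak texel rate in megatexels per second.

    Assumes each TMU delivers one texel per clock (an idealised peak,
    not a measured figure for any real chip).
    """
    return pipes * tmus_per_pipe * clock_mhz


CLOCK_MHZ = 400  # hypothetical clock, the same for both layouts

rate_8x1 = texel_fill_rate(8, 1, CLOCK_MHZ)
rate_4x2 = texel_fill_rate(4, 2, CLOCK_MHZ)

print("8*1:", rate_8x1, "MTexels/s")
print("4*2:", rate_4x2, "MTexels/s")
```

Both layouts have eight TMUs in total, so the peak texel rate comes out identical. Whatever separates the two designs in practice has to come from somewhere other than this number.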
 
I don't think the inefficiency has to do with two texture units. With smaller triangles, being able to work on two quads at once is a bonus. So an 8 pipe design is more efficient for workloads with lots of small tris. On the other hand, if the average workload samples 2 textures per quad, an 8x2 architecture would be even better, especially if one of the TMUs can do "double duty" as an extra FP unit.
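
A rough way to see the texture-count argument above, as a toy model only: assume a pipe has to loop through its TMUs ceil(textures / TMUs) times per pixel, and ignore everything else (arithmetic ops, bandwidth, small-triangle quad efficiency, a TMU doing "double duty"). The function name is hypothetical and none of the numbers are measurements.

```python
import math


def pixels_per_clock(pipes, tmus_per_pipe, textures_per_pixel):
    """Idealised pixel output per clock for a pipes x TMUs layout.

    Assumption: a pipe needs ceil(textures / TMUs) passes through its
    TMUs to finish one pixel, and nothing else limits throughput.
    """
    passes = max(1, math.ceil(textures_per_pixel / tmus_per_pipe))
    return pipes / passes


layouts = [("4x2", 4, 2), ("8x1", 8, 1), ("8x2", 8, 2)]

for name, pipes, tmus in layouts:
    # Pixel rate for workloads sampling 1 to 4 textures per pixel.
    rates = [pixels_per_clock(pipes, tmus, t) for t in (1, 2, 3, 4)]
    print(name, rates)
```

Under these assumptions 8x1 is ahead of 4x2 with a single texture, the two are level at two textures, and 8x2 is ahead of both from two textures onward.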
 
Thanks for the post, Walt, simple and complex at the same time.
Several questions from us laymen, or this layman, that I think relate to the topic:

1. Does Nvidia have a 4x1 part right now that they could convert to 8x1? (If I'm reading right, the 9700/9800 were basically two 4x1's combined on die with strict logic.)
2. Do you think Nvidia scrapped plans for the NV40 behind the scenes and followed ATi's 8x1 when the 9700 Pro A-bomb hit, or were they too locked in at that point?
3. Given that the 5900/5950 are thermally more inefficient than ATi's cards, where is Nvidia going to find the headroom to tack on pipelines and texture units if they go to 6x2 as some are suggesting?
 
TANTALIZING! =D

Oops, I dipped into the Inqwell again:

The marchitecture, just like the Nvidia NV45 case will end up exactly the same as R420, which means that the key thing about this card will be pixel shader 3.0 and of course the chip will be made in 0.13 marchitecture very likely with Low K.

As for clock speeds, it should end up close to 500/1000 MHz but these numbers are not finalised and literally have changed every working day.

The pipeline number of this part is still mystery to us but we are on very good and sensational lead that we have to verify. That’s where the real secret and power of this chip is, we are told.

Good Lord, can it really be true: an nVidia chip with a mysteriously powerful pipeline number? Secretly sensational!
 
16 would be sensational, since it would be a doubling of the pipelines but in a relatively confined transistor budget.

My bet on the R420 is 8 pipelines, but with 2-3 FP add/mul/dot3 units per pipe.
 
Bouncing Zabaglione Bros. said:
Question is, what would count as a "sensational" number of pipelines?

Well Fuad said their lead is "sensational" (the worst kind, LOL), not the number of pipelines.

MuFu.
 
DemoCoder said:
Err, the way I read that Inq post was that the R423 has a mysterious number of pipelines, not the NV40.
Right, not sure how I confused the two, as I was thinking of Anandtech's recent eight pipeline claim and Hellbinder's months-old 12-pipeline speculation. :oops:
 
There's a possibility both are wrong. ATi have certainly done a very good job of confusing people. :LOL:

That Synopsys announcement still haunts me, hehe.
 
MuFu said:
There's a possibility both are wrong. ATi have certainly done a very good job of confusing people. :LOL:
TELL ME ABOUT IT!!!! :oops:

I'm actually half-way convinced the R420 might just be a bloody FP32 card now. :|
 