NVIDIA GT200 Rumours & Speculation Thread

Kaotik · May 30, 2008

Cheers, guess it was a typo on the other forum then

leoneazzurro · May 30, 2008

Lukfi said:
Then we agree on this, don't we?

About the absolute performance, yes; on performance per transistor, I don't know. If RV770 is 850M transistors and ends up 20% slower than GT260, then it is even more powerful per transistor.
But anyway, comparing chips born for different markets is difficult when it comes to performance per transistor. Even within the same architecture and process node, if you compare e.g. the HD3870 and HD3650, the first is much more than 2 times as powerful as the second, with only a 60% increase in die area.
So the best comparison is with the chip the competition has in the same price range, which is the G92b, and as I said before, if the rumors are true Nvidia needs to increase the performance of the 55nm shrink. And yes, I know that G92 and G92b are limited by bandwidth, but that is a design decision, like having only 16 TMUs and 2 Z samples/clock on RV6X0 and derivatives.
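
As a rough check of the per-transistor claim, here is a minimal Python sketch. It assumes the rumoured 850M transistors for RV770, the 1.4 billion figure quoted for GT200 later in this thread, and the hypothetical "20% slower" performance gap. It also charges the GTX 260 with the full GT200 transistor count even though part of the chip is disabled on that card, so treat the result as illustrative only.

```python
def perf_per_mtransistor(relative_perf, transistors_millions):
    """Relative performance divided by transistor count (in millions)."""
    return relative_perf / transistors_millions

# Assumed inputs: GTX 260 performance normalised to 1.0, RV770 at 0.8 of that.
gt200 = perf_per_mtransistor(1.0, 1400)  # GT200: 1.4 billion transistors (thread figure)
rv770 = perf_per_mtransistor(0.8, 850)   # RV770: 850M transistors (rumoured)

advantage = rv770 / gt200
print(f"RV770 perf/transistor relative to GT200: {advantage:.2f}x")
```

Under these assumptions RV770 comes out roughly 1.3 times ahead per transistor, which is the sense in which a slower chip can still be "more powerful for each transistor".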

mczak · May 30, 2008

Lukfi said:
If you don't mind my asking, what's wrong with comparing transistor counts?

There were always rumours about companies counting transistors a bit differently. Also, it doesn't tell you that much about chip complexity anyway - transistors in caches can usually be packed way more densely than those in logic circuits. But most importantly, die area is a more direct indicator of cost than the number of transistors - what matters is how big your die is (and thus how many dies you can fit on a wafer), not how many transistors are in the chip. So sure, G94 looks better with its 500 million transistors vs. RV670 with its 666 million transistors considering the performance - but that doesn't make it cheaper to produce. Why there is this quite large discrepancy in transistor count vs. die size is of course interesting on its own - maybe G94 has relatively less cache and more logic, maybe they are counted differently, or maybe AMD was just able to pack them more tightly together for some other unknown reason...

Arty · May 30, 2008

That 933 GFlops just keeps hounding me, mainly because it is so close to the teraflop mark. Wonder if the chips couldn't be clocked higher or the yields were really that bad?

INKster · May 30, 2008

Interesting how they chose "Parallel computing Architecture" instead of "Shaders" or "Scalar Processors/Cores", and placed "Graphics Processing Architecture" only under that...

Does this mean that they intend to push CUDA even further, while diluting DX10/OpenGL marketing features, or was it merely an inconsequential decision?

Arun · May 30, 2008

INKster said:
Does this mean that they intend to push CUDA even further, while diluting DX10/OpenGL marketing features, or was it merely an inconsequential decision?

I have no idea if that's the case, but that'd seem like a pretty smart idea if you don't support DX10.1, doesn't it?

fellix · May 30, 2008

serenity said:
That 933 GFlops just keeps hounding me, mainly because it is so close to the teraflop mark. Wonder if the chips couldn't be clocked higher or the yields were really that bad?

Yep -- it's like $0.99 against $1.00.

kresek · May 30, 2008

Kaotik said:
Cheers, guess it was a typo on the other forum then

"A modern GeForce GPU has approximately 1.2 billion transistors [...]"

http://anandtech.com/weblog/showpost.aspx?i=453

Maybe an average of the GTX variants' "enabled" transistor counts (1.0/1.4)?

INKster · May 30, 2008

Arun said:
I have no idea if that's the case, but that'd seem like a pretty smart idea if you don't support DX10.1, doesn't it?

Did that hurt them much the last time with G92?
Then again, between having FP64 for CUDA and/or a minor update to DX10, I don't think there's any doubt about their relative importance...

Lukfi · May 30, 2008

leoneazzurro said:
But anyway, comparing chips born for different markets is difficult when it comes to performance per transistor. Even within the same architecture and process node, if you compare e.g. the HD3870 and HD3650, the first is much more than 2 times as powerful as the second, with only a 60% increase in die area.

Basically, you're saying that performance increases more than linearly with transistor count. And that higher-end chips are more efficient (because the fixed parts that always have to be there take up a relatively smaller share of the total transistor count) - of course, excluding the cases where the chips are bandwidth-limited, like G92, but I don't suppose this will be the case with RV770XT or GT200. So, theoretically, GT200 should have more performance per transistor than RV770.

mczak said:
There were always rumours about companies counting transistors a bit differently.

Sure, I was comparing official figures which may be counted differently. But it's absolutely irrelevant for the sake of the argument! We do know that 505 million in G94 is comparable to 666 million in RV670. As was already said, G92 is limited by memory bandwidth, so it's not as effective. GT200 has a higher ALU:tex ratio, which should give it even more perf/transistor than if it was just a bigger G80/G92. GT200 has 1,4 billion transistors, that is an official figure. If we were to compare its effectiveness against other GPUs, we need to factor in the NVIO, which is about 100 million transistors. So we have a 1,5 billion transistor beast. RV770 won't come anywhere near 75% of this number. I know that the mathematics involved are not exactly correct, but hey, it works.

AnarchX · May 30, 2008

Lukfi said:
If we were to compare its effectiveness against other GPUs, we need to factor in the NVIO which is about 100 million transistors.

So many transistors?

I thought NVIO (1.0) was in the low two-digit million range, and that most of the die space of this 110nm die was empty, just to get enough room for the pin-out.

Lukfi · May 30, 2008

=>AnarchX: Well apparently the 100 million figure is overstated. Google can't find any info on this, so I don't know. But if you just compare G80 and G92, the difference is 73 million and G92 has only two thirds the ROPs and memory controller channels. But frankly I don't know whether those >73 million came from NVIO or the modified texturing units.

AnarchX · May 30, 2008

Just a bit of calculation (all figures in millions of transistors):

A: TPCs with 8 TAs
B: ROP partitions
C: all the other stuff, of which a big part should be setup

G92: 8*A + 4*B + C = 754
G94: 4*A + 4*B + C = 505
G84: 2*A + 2*B + C = 289
=> A=62,25; B=45,75; C=73

So I would clearly put NVIO below 20-30M, and I still ask myself why it is not included in the 1.4B transistors.
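
The three equations above can be checked mechanically. The following is a minimal Python sketch, assuming the simple linear model from the post (A per texture cluster, B per ROP partition, C for everything else, all in millions of transistors). The helper `solve3` is a hypothetical name written for this illustration; it uses exact fractions so no rounding creeps in.

```python
from fractions import Fraction

def solve3(rows):
    """Solve a 3x3 linear system, given as rows [a, b, c, rhs], by Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot becomes 1.
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Coefficients of A, B, C and the official transistor counts (millions).
rows = [
    [8, 4, 1, 754],  # G92
    [4, 4, 1, 505],  # G94
    [2, 2, 1, 289],  # G84
]
A, B, C = (float(v) for v in solve3(rows))
print(A, B, C)  # 62.25 45.75 73.0
```

The result matches the figures in the post. It only confirms the arithmetic; whether three chips really share identical A, B and C blocks is the model's assumption, not something the calculation can verify.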

Lukfi · May 30, 2008

You're right, it's a mystery. If it's just 20-30 million or so then it's not a big deal compared to G80's 681 million and GT200's 1,4 billion. However, where would a simple chip like NVIO get more than 50-60 million? That gets you a fully functional GPU.

On a side note... can anybody compute 240 GT200 shaders × 1296 MHz (Charlie's shader domain frequency) to get a GFLOPS figure?

AnarchX · May 30, 2008

240 * (2 + 1) * 1.296 GHz = 933 GFLOPS (2 for the MADD, +1 for the MUL enabled for GPGPU/general shading).

Back to NVIO:
Just some weeks ago I heard a rumor in Asia that Nvidia is planning, from Q2 2009, to ship GeForce cards only without display outputs, while opening H-SLI to AMD, with the aim of staying in the Intel chipset market, even in Nehalem times.
Maybe NVIO is part of this strategy and NVIO can then only be ordered for Quadro.

INKster · May 30, 2008

AnarchX said:
Back to NVIO:
Just some weeks ago I heard a rumor in Asia that Nvidia is planning, from Q2 2009, to ship GeForce cards only without display outputs, while opening H-SLI to AMD, with the aim of staying in the Intel chipset market, even in Nehalem times.
Maybe NVIO is part of this strategy and NVIO can then only be ordered for Quadro.

Well, with memory controllers vacating die space on motherboard northbridges in both Intel and AMD line-ups, I wouldn't be surprised if NVIO-related functionality eventually appeared on them.
In fact, Hybrid-SLI/Hybrid Power is close to it right now -but not quite there yet-.

The only thing that's certain is that the GT200's "NVIO" isn't the same NVIO used on G80, judging by the resistor placement and -apparently- a slightly larger die.

mczak · May 30, 2008

Lukfi said:
=>AnarchX: Well apparently the 100 million figure is overstated. Google can't find any info on this, so I don't know. But if you just compare G80 and G92, the difference is 73 million and G92 has only two thirds the ROPs and memory controller channels. But frankly I don't know whether those >73 million came from NVIO or the modified texturing units.

I thought NVIO was quite small in terms of transistor count - maybe not even a two-digit million number. However, I'd certainly expect those to be "large transistors"... Compared to G80, G92 had NVIO integrated, the VP2 engine, and twice the texture addressing capability. Conversely, it certainly saved on ROPs (though while there are only 4 quad-ROPs instead of 6, each one is probably slightly more complex since they support better compression - still, those 4 ROPs should of course use fewer transistors than the old 6).

mczak · May 30, 2008

Lukfi said:
Sure I was comparing official figures which may be counted differently. But it's absolutely irrelevant for the sake of the argument!

Well you're right, but I still don't like using these numbers for anything. Though I admit you could make the same argument if you'd use die size for comparison too...

We do know that 505 million in G94 is comparable to 666 million in RV670. As was already said, G92 is limited by memory bandwidth, so it's not as effective. GT200 has a higher ALU:tex ratio, which should give it even more perf/transistor than if it was just a bigger G80/G92. GT200 has 1,4 billion transistors, that is an official figure. RV770 won't come anywhere near 75% of this number.

But you forget that current rumours suggest it's clocked quite a bit lower - so I don't really expect it to have a higher "per-transistor performance" than G92. Also, if you think G92 is really limited by memory bandwidth (which certainly seems to be a factor), then GT200 would be just the same if not worse, if you assume it's at least twice as fast...
Also, I didn't dispute that GTX 260 will be faster than a single RV770 - but two of them will certainly exceed the transistor count (if you insist on using these, or you could just look at die sizes too) of a single GTX280. If R700 is going to use GDDR5 ram, it will also have vastly more total memory bandwidth (though it might be wasted depending on how those rumours about a single address space turn out...)

ninelven · May 30, 2008

mczak said:
If R700 is going to use GDDR5 ram, it will also have vastly more total memory bandwidth

Where do you get vastly? Even at 4.5GHz, that is 144GB/s with a 256bit memory interface.
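
The 144GB/s figure follows from the usual peak-bandwidth arithmetic, sketched below in Python. The sketch assumes that "4.5GHz" means an effective data rate of 4.5 Gbit/s per pin. It further assumes that R700 carries two RV770 chips, each with its own 256-bit interface, which is how the "total" bandwidth in the quoted post would be counted.

```python
def bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits):
    """Peak memory bandwidth in GB/s: per-pin data rate times bus width, divided by 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8

per_gpu = bandwidth_gb_s(4.5, 256)  # single 256-bit interface
dual_gpu_total = 2 * per_gpu        # assumed: two GPUs, each with its own memory

print(per_gpu, dual_gpu_total)  # 144.0 288.0
```

The doubled figure is only a sum across two separate memory pools, so how much of it is usable depends on how data is shared between the GPUs.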

mboeller · May 30, 2008

AnarchX said:
240 * (2 + 1) * 1.296 GHz = 933 GFLOPS (2 for the MADD, +1 for the MUL enabled for GPGPU/general shading).

Oops!

Does that mean that the "missing MUL" is still missing when you use the 280GTX for gaming?

That would explain the Vantage scores!

280GTX: 240 x 2 x 1296 = 622 GFLOPS.
8800Ultra: 128 x 2 x 1512 = 387 GFLOPS

622/387 = 1,61

The 280GTX has a 1,66 times higher score in Vantage than the 8800Ultra.
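
The GFLOPS figures in this exchange all come from the same formula: shader count times flops per clock times shader clock. The following is a minimal Python sketch of that arithmetic, using only the shader counts and clocks quoted in the thread. The function name `peak_gflops` is made up for this illustration, and 2 versus 3 flops per clock stands for MADD only versus MADD plus the extra MUL.

```python
def peak_gflops(shaders, flops_per_clock, shader_mhz):
    """Theoretical peak throughput in GFLOPS (MHz * flops gives MFLOPS, hence / 1000)."""
    return shaders * flops_per_clock * shader_mhz / 1000.0

gtx280_madd_mul = peak_gflops(240, 3, 1296)  # MADD + MUL
gtx280_madd = peak_gflops(240, 2, 1296)      # MADD only
ultra_madd = peak_gflops(128, 2, 1512)       # 8800 Ultra, MADD only

ratio = gtx280_madd / ultra_madd

print(f"{gtx280_madd_mul:.0f} {gtx280_madd:.0f} {ultra_madd:.0f} {ratio:.2f}")
# 933 622 387 1.61
```

These are theoretical peaks, so the closeness of the 1.61 ratio to the 1.66 Vantage ratio is suggestive rather than proof that the MUL goes unused in games.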
