NVIDIA GT200 Rumours & Speculation Thread

XMAN26 · May 29, 2008

Jawed said:
What are you distinguishing here? Memory controllers and physical interfaces?

So, overall, 8 channels of memory amount to 22.25% of the die?

Jawed

OK, are you saying each MC is 64bit wide and connects to 2 mem chips each? The card is reported to have 16 chips for 512bit bus.

fellix · May 29, 2008

Sixteen 32-bit devices would fill eight 64-bit channels -- the same case as in R600.
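
The arithmetic behind that statement can be checked with a minimal sketch. The function name `channels_filled` is hypothetical; the 16 chips, 32-bit devices and 64-bit channels are the figures quoted in the posts above.

```python
def channels_filled(num_devices: int, device_width_bits: int,
                    channel_width_bits: int) -> tuple[int, int]:
    """Return (total bus width in bits, number of channels filled)."""
    total_bits = num_devices * device_width_bits
    if total_bits % channel_width_bits != 0:
        raise ValueError("devices do not fill a whole number of channels")
    return total_bits, total_bits // channel_width_bits


# Figures from the thread: 16 memory chips, 32 bits each, 64-bit channels.
bus_bits, channels = channels_filled(16, 32, 64)
print(f"{bus_bits}-bit bus across {channels} channels")
```

Each 64-bit channel is therefore served by two 32-bit chips, which matches the "2 mem chips each" reading.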

ShaidarHaran · May 29, 2008

XMAN26 said:
OK, are you saying each MC is 64bit wide and connects to 2 mem chips each? The card is reported to have 16 chips for 512bit bus.

Yes. NV has used this approach since the original Geforce DDR days, IIRC. Crossbar w/64-bit channels FTW.

Arun · May 29, 2008

nAo said:
Any more info on this big chunk?

Global Scheduler, Triangle Setup, Input Assembly, Rasterization, L2 Cache, I/O controllers, and so forth presumably - it's really just everything that isn't duplicated elsewhere on the chip.

Jawed said:
What are you distinguishing here? Memory controllers and physical interfaces?

The I/O interfaces, yes, that's what you see on the edges of the chip.

fellix said:
Any guess for the cache size? There are clearly 48 (16*3) SRAM banks in each triplet.

That's not cache; it's RF/Shared Memory. As for the amount, everything in due time...

Damn -- at least double the resolution of the die shot would reveal much more.

Yeah, it does, but I don't think there's anything revolutionary there that you couldn't notice from CJ's shot. Should even be possible to figure out what's copy-pasted and what's not. Alternatively, you could just wait for our article... As for when it'll come out, stay tuned! (Hi NV!)

nAo · May 29, 2008

Arun said:
Global Scheduler, Triangle Setup, Input Assembly, Rasterization, L2 Cache, I/O controllers, and so forth presumably - it's really just everything that isn't duplicated elsewhere on the chip.

Hi-Z culling is probably a part of this chunk too then.
Very interesting, Arun, thanks for the info. I'm waiting for the final article, hurry up!

Jawed · May 30, 2008

So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.

Hmm, so in G80 it's reasonable to say that 4x TAs+8xTFs+L1-cache are roughly 50% bigger than the cluster's ALUs+register-file+shared-memory ("16 ALUs"). Needless to say I feel vindicated after all the shit I've taken for suggesting that TMUs are costly.

A naive averaging, 364M transistors across 80 bilinears per clock, makes each bilinear result cost ~4.6M transistors.

Or if you prefer 40x fp16s, each of which costs ~9M transistors.

Jawed
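
The naive averaging in the post above can be reproduced with a short sketch. The 364M transistor total and the unit counts (80 bilinears, 40 fp16s) are taken as quoted; the helper name `cost_per_unit` is hypothetical, and nothing here says how the 364M figure was derived.

```python
def cost_per_unit(total_transistors_m: float, units: int) -> float:
    """Naive average: transistors (in millions) per unit of throughput."""
    return total_transistors_m / units


TMU_TRANSISTORS_M = 364  # figure quoted in the post, in millions

per_bilinear = cost_per_unit(TMU_TRANSISTORS_M, 80)  # 80 bilinears per clock
per_fp16 = cost_per_unit(TMU_TRANSISTORS_M, 40)      # 40 fp16 results per clock

print(f"per bilinear: {per_bilinear:.2f}M, per fp16: {per_fp16:.2f}M")
```

The results, 4.55M and 9.1M, are the values the post rounds to ~4.6M and ~9M.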

Vincent · May 30, 2008

Jawed said:
So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.

Hmm, so in G80 it's reasonable to say that 4x TAs+8xTFs+L1-cache are roughly 50% bigger than the cluster's ALUs+register-file+shared-memory ("16 ALUs"). Needless to say I feel vindicated after all the shit I've taken for suggesting that TMUs are costly.

A naive averaging, 364M transistors across 80 bilinears per clock, makes each bilinear result cost ~4.6M transistors.

Or if you prefer 40x fp16s, each of which costs ~9M transistors.

Jawed

Any justification for why NVIDIA added more TMUs in this brand-new ASIC?

Jawed · May 30, 2008

Vincent said:
Any justification for why NVIDIA added more TMUs in this brand-new ASIC?

NVidia's bumped up the ALU:TEX ratio quite considerably. If they'd used fewer clusters (hence fewer TMUs) and more multiprocessors per cluster (to end up with ~900+GFLOPs) the ALU:TEX ratio would have gone up even more.

For NVidia the meaning of ALU:TEX ratio is different than for ATI - I think it's reasonable to say that NVidia's thread scheduling (ALU and TEX instruction issue) means that a lower ratio is required to hide texturing latency.

But, with the magically-rediscovered MUL the effective ALU:TEX ratio goes up another notch (though ALU clock of ~1300MHz and TMU clock of ~600MHz makes for a slight lowering of the ratio in comparison with G80's 1350/575).

Anyway, in terms of overall performance I expect GT280's "80 TMUs" to look "better balanced" than G80's 64.

Jawed
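
As a rough sketch of the ratios discussed above: the clocks (~1300/~600MHz for GT280, 1350/575MHz for G80) and TMU counts (80 vs 64) are the figures quoted in the post. The ALU counts are an assumption, taken as 8 ALUs per multiprocessor with 30 multiprocessors for GT200 and 16 for G80, as mentioned elsewhere in this thread. The helper `ratio` is hypothetical and the "MUL" question is ignored entirely.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio helper, kept separate for readability."""
    return numerator / denominator


# Clock ratios (ALU clock : TMU clock), using the speculated figures.
gt200_clock_ratio = ratio(1300, 600)
g80_clock_ratio = ratio(1350, 575)

# Unit ratios (ALUs : TMUs). ALU counts are an ASSUMPTION:
# 8 ALUs per multiprocessor, 30 multiprocessors (GT200) vs 16 (G80).
gt200_unit_ratio = ratio(30 * 8, 80)
g80_unit_ratio = ratio(16 * 8, 64)

# Combined per-second ALU:TEX ratio under those assumptions.
gt200_effective = gt200_unit_ratio * gt200_clock_ratio
g80_effective = g80_unit_ratio * g80_clock_ratio

print(f"clock ratio  GT200 {gt200_clock_ratio:.2f}  G80 {g80_clock_ratio:.2f}")
print(f"unit ratio   GT200 {gt200_unit_ratio:.2f}  G80 {g80_unit_ratio:.2f}")
print(f"combined     GT200 {gt200_effective:.2f}  G80 {g80_effective:.2f}")
```

Under these assumptions the clock ratio drops slightly for GT280 while the unit ratio rises from 2:1 to 3:1, which is consistent with the "bumped up" and "slight lowering" remarks.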

nAo · May 30, 2008

Jawed said:
So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.

You are assuming that TMUs and SPs have not changed much from G80 to the new architecture, and at this time we really don't know.

Jawed · May 30, 2008

nAo said:
You are assuming that TMUs and SPs have not changed much from G80 to the new architecture, and at this time we really don't know.

The GT280 die picture makes this picture easier to interpret:

http://www.techpowerup.com/reviews/NVIDIA/G80/images/core.jpg

The extent of the ALUs is a bit hard to discern though, due to the way those sections fade at the edges. It seems the entire stretch of die between the multiprocessors on either side is TMUs. G80 doesn't have the "cross" that we see in GT280 - instead it seems to have 3 horizontal bands - apparently leaving the central region to TMUs.

It seems G80 clusters are 50:50 multiprocessors and TMUs.

Jawed

3dcgi · May 30, 2008

Domell said:
About GT200b. Well, if it is true that it has taped out, why is NVIDIA releasing its 65nm version next month rather than waiting about 2-3 months for the 55nm version?

It usually takes more than 3 months to go from tape out to production.

juan789123498 · May 30, 2008

Jawed said:
The GT280 die picture makes this picture easier to interpret:

http://www.techpowerup.com/reviews/NVIDIA/G80/images/core.jpg

The extent of the ALUs is a bit hard to discern though, due to the way those sections fade at the edges. It seems the entire stretch of die between the multiprocessors on either side is TMUs. G80 doesn't have the "cross" that we see in GT280 - instead it seems to have 3 horizontal bands - apparently leaving the central region to TMUs.

It seems G80 clusters are 50:50 multiprocessors and TMUs.

Jawed

Is today die shot day?

OK, I will contribute a G80 die shot that even babies will understand.
Jawed, it's a little easier to discern blocks with this picture, no?

And yes, G80 clusters seem to be 50:50 multiprocessors and TMUs.

And I almost forgot to say I saw shared memory per multiprocessor gets doubled in GT200 vs G80 (32KB vs 16KB). And remember there are 30 multiprocessors vs 16 in G80.

jimmyjames123 · May 30, 2008

Arun said:
It's 10, 3x8 in each, no coarse redundancy. Anyway, since we're letting the cat out of the bag here... This is also based on a die shot, although a better one so if you can't corroborate it, you'll just have to believe me:
~26.5% SMs, ~26% Non-SM Clusters (TMUs etc.), ~14.25% memory I/O, ~13.25% unique, ~8% MCs, ~6.25% ROPs, ~2% Misc. I/O, 4%+ Misc. (intrachip I/O, rounding errors, unidentifiable units, etc.)

Arun, did you mean 13.0% for unique?

Great breakdown, and even though you guys have a better shot, CJ's die shot was very nice!
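
The question about 13.0% comes down to whether the shares add up. A minimal sketch of that check follows, using the percentages exactly as quoted. Treating "4%+ Misc." as a flat 4.0 is an assumption, and the dictionary keys are just labels chosen for illustration.

```python
# Approximate die-area shares as quoted, in percent.
# "4%+ Misc." is treated as exactly 4.0 here, which is an ASSUMPTION.
breakdown = {
    "SMs": 26.5,
    "non-SM clusters": 26.0,
    "memory I/O": 14.25,
    "unique": 13.25,
    "MCs": 8.0,
    "ROPs": 6.25,
    "misc I/O": 2.0,
    "misc": 4.0,
}


def total(shares: dict[str, float]) -> float:
    """Sum of all shares in percent."""
    return sum(shares.values())


as_posted = total(breakdown)

# Memory-related share: physical memory I/O plus memory controllers.
memory_share = breakdown["memory I/O"] + breakdown["MCs"]

# Same breakdown with "unique" lowered to 13.0, as asked above.
adjusted = total({**breakdown, "unique": 13.0})

print(as_posted, memory_share, adjusted)
```

As posted the shares total 100.25%, and with 13.0% for "unique" they total exactly 100%. The memory I/O and MC shares together give the 22.25% mentioned earlier in the thread. Since the figures are approximate and Misc. is "4%+", the small overshoot may simply be rounding.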

satein · May 30, 2008

Arun said:
It's 10, 3x8 in each, no coarse redundancy. Anyway, since we're letting the cat out of the bag here... This is also based on a die shot, although a better one so if you can't corroborate it, you'll just have to believe me:
~26.5% SMs, ~26% Non-SM Clusters (TMUs etc.), ~14.25% memory I/O, ~13.25% unique, ~8% MCs, ~6.25% ROPs, ~2% Misc. I/O, 4%+ Misc. (intrachip I/O, rounding errors, unidentifiable units, etc.)

So, is there any part related to PV2 on this core?

Don't tell me that the same history as G80 will repeat itself on GT200 too.

trinibwoy · May 30, 2008

juan789123498 said:
And I almost forgot to say I saw shared memory per multiprocessor gets doubled in GT200 vs G80 (32KB vs 16KB). And remember there are 30 multiprocessors vs 16 in G80.

Well at least a doubling of the PDC was expected given the double precision support.

dnavas · May 30, 2008

Jawed said:
But, with the magically-rediscovered MUL

Is it really rediscovered, or has the marketing department "rediscovered" it?

-Dave

CJ · May 30, 2008

dnavas said:
Is it really rediscovered, or has the marketing department "rediscovered" it?

-Dave

The documents talk about "Improved Dual Issue"... so make of it what you will.... Also mentioned are "2x Registers" and "3x ROP blending performance".

Anarchist4000 · May 30, 2008

Sounds like they just fixed the MUL issue by adding register space and made a marketing point out of it. I guess it could be viewed as discovered since it wasn't really available for general use, even though they counted it towards performance.

AnarchX · May 30, 2008

According to a mod from the well-known PCinlife, who also has die shots and other information in hand, GTX 280 will offer twice the performance of the 8800 Ultra.

Tchock · May 30, 2008

AnarchX said:
According to a mod from the well-known PCinlife, who also has die shots and other information in hand, GTX 280 will offer twice the performance of the 8800 Ultra.

Last I heard, 2*9800GX2 performance came from their mouths too.
