NVIDIA GT200 Rumours & Speculation Thread

Status
Not open for further replies.
What are you distinguishing here? Memory controllers and physical interfaces?

So, overall, 8 channels of memory amount to 22.25% of the die?

Jawed

OK, are you saying each MC is 64bit wide and connects to 2 mem chips each? The card is reported to have 16 chips for 512bit bus.
 
nAo said:
Any more info on this big chunk?
Global Scheduler, Triangle Setup, Input Assembly, Rasterization, L2 Cache, I/O controllers, and so forth presumably - it's really just everything that isn't duplicated elsewhere on the chip.
Jawed said:
What are you distinguishing here? Memory controllers and physical interfaces?
The I/O interfaces, yes, that's what you see on the edges of the chip.
fellix said:
Any guess for the cache size? There are clearly 48 (16*3) SRAM banks in each triplet.
That's not cache; it's RF/Shared Memory. As for the amount, everything in due time...
Damn -- at least double the resolution of the die shot would reveal much more.
Yeah, it does, but I don't think there's anything revolutionary there that you couldn't notice from CJ's shot. Should even be possible to figure out what's copy-pasted and what's not. Alternatively, you could just wait for our article... As for when it'll come out, stay tuned! (Hi NV! ;))
 
Global Scheduler, Triangle Setup, Input Assembly, Rasterization, L2 Cache, I/O controllers, and so forth presumably - it's really just everything that isn't duplicated elsewhere on the chip.
Hi-Z culling is probably a part of this chunk too then.
Very interesting Arun, thanks for the info. I am waiting for the final article, hurry up! ;)
 
So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.

Hmm, so in G80 it's reasonable to say that 4x TAs+8xTFs+L1-cache are roughly 50% bigger than the cluster's ALUs+register-file+shared-memory ("16 ALUs"). Needless to say I feel vindicated after all the shit I've taken for suggesting that TMUs are costly.

A naive averaging, 364M transistors across 80 bilinears per clock, makes each bilinear result cost ~4.6M transistors.

Or if you prefer 40x fp16s, each of which costs ~9M transistors.

Jawed
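
As a quick check of the naive averaging above, here is a minimal Python sketch. The 364M budget and the 80 and 40 results-per-clock counts are taken straight from the post; the variable names are only illustrative.

```python
# Naive averaging from the post: the texture-cluster transistor budget
# spread evenly across texture results per clock.
tex_transistors_m = 364.0   # millions of transistors, figure quoted in the post
bilinears_per_clock = 80    # bilinear results per clock, from the post
fp16_per_clock = 40         # fp16 results per clock, from the post

cost_per_bilinear_m = tex_transistors_m / bilinears_per_clock
cost_per_fp16_m = tex_transistors_m / fp16_per_clock

print(f"per bilinear: ~{cost_per_bilinear_m:.2f}M transistors")
print(f"per fp16:     ~{cost_per_fp16_m:.2f}M transistors")
```

This lands on roughly 4.55M and 9.1M, in line with the ~4.6M and ~9M quoted. It is only an average: it says nothing about how the budget actually splits between addressing, filtering and cache.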
 
So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.

Hmm, so in G80 it's reasonable to say that 4x TAs+8xTFs+L1-cache are roughly 50% bigger than the cluster's ALUs+register-file+shared-memory ("16 ALUs"). Needless to say I feel vindicated after all the shit I've taken for suggesting that TMUs are costly.

A naive averaging, 364M transistors across 80 bilinears per clock, makes each bilinear result cost ~4.6M transistors.

Or if you prefer 40x fp16s, each of which costs ~9M transistors.

Jawed

Any justification for why NVIDIA is adding more TMUs in this brand new ASIC?
:?:
 
Any justification for why NVIDIA is adding more TMUs in this brand new ASIC?
:?:
NVidia's bumped up the ALU:TEX ratio quite considerably. If they'd used fewer clusters (hence fewer TMUs) and more multiprocessors per cluster (to end up with ~900+GFLOPs) the ALU:TEX ratio would have gone up even more.

For NVidia the meaning of ALU:TEX ratio is different than for ATI - I think it's reasonable to say that NVidia's thread scheduling (ALU and TEX instruction issue) means that a lower ratio is required to hide texturing latency.

But, with the magically-rediscovered MUL the effective ALU:TEX ratio goes up another notch (though ALU clock of ~1300MHz and TMU clock of ~600MHz makes for a slight lowering of the ratio in comparison with G80's 1350/575).

Anyway, in terms of overall performance I expect GT280's "80 TMUs" to look "better balanced" than G80's 64 ;)

Jawed
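
The ratios discussed above can be sketched numerically. This is a rough illustration using figures mentioned in this thread (10 clusters of 3x8 ALUs, 80 TMUs, ~1300/~600 MHz for GT200; 16 multiprocessors, 64 "TMUs", 1350/575 MHz for G80). The 8 ALUs per G80 multiprocessor is an assumption, the helper name `alu_tex` is made up for the example, and the extra MUL is not counted.

```python
def alu_tex(alus, tmus, alu_mhz, tmu_mhz):
    """Return (unit ratio, clock ratio, clock-weighted ratio) for ALU:TEX."""
    unit_ratio = alus / tmus
    clock_ratio = alu_mhz / tmu_mhz
    return unit_ratio, clock_ratio, unit_ratio * clock_ratio

# GT200 as rumoured in this thread: 10 clusters x 3 multiprocessors x 8 ALUs, 80 TMUs.
gt200 = alu_tex(10 * 3 * 8, 80, 1300, 600)
# G80: 16 multiprocessors x 8 ALUs each (assumed), 64 "TMUs".
g80 = alu_tex(16 * 8, 64, 1350, 575)

for name, (units, clocks, weighted) in (("GT200", gt200), ("G80", g80)):
    print(f"{name}: units {units:.2f}, clocks {clocks:.2f}, weighted {weighted:.2f}")
```

Under these assumptions the unit ratio rises from 2:1 to 3:1 while the clock ratio dips slightly, so the clock-weighted ALU:TEX ratio still goes up.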
 
So, what we're seeing here is that 3 multiprocessors are about the same size as 4x TAs and 8x TFs.
You are assuming that TMUs and SPs have not changed much from G80 to the new architecture, and at this time we really don't know.
 
You are assuming that TMUs and SPs have not changed much from G80 to the new architecture, and at this time we really don't know.
The GT280 die picture makes this picture easier to interpret:

http://www.techpowerup.com/reviews/NVIDIA/G80/images/core.jpg

The extent of the ALUs is a bit hard to discern though, due to the way those sections fade at the edges. It seems the entire stretch of die between the multiprocessors on either side is TMUs. G80 doesn't have the "cross" that we see in GT280 - instead it seems to have 3 horizontal bands - apparently leaving the central region to TMUs.

It seems G80 clusters are 50:50 multiprocessors and TMUs.

Jawed
 
About GT200b: if it's true that it has taped out, why is NVIDIA releasing its 65nm version next month rather than waiting 2-3 months for the 55nm version?
It usually takes more than 3 months to go from tape out to production.
 
The GT280 die picture makes this picture easier to interpret:

http://www.techpowerup.com/reviews/NVIDIA/G80/images/core.jpg

The extent of the ALUs is a bit hard to discern though, due to the way those sections fade at the edges. It seems the entire stretch of die between the multiprocessors on either side is TMUs. G80 doesn't have the "cross" that we see in GT280 - instead it seems to have 3 horizontal bands - apparently leaving the central region to TMUs.

It seems G80 clusters are 50:50 multiprocessors and TMUs.

Jawed

Is today die shot day?

OK, I will contribute a G80 die shot that even babies will understand.
Jawed, it's a little easier to discern blocks with this picture, no? :LOL:



And yes, G80 clusters seem to be 50:50 multiprocessors and TMUs.

And I almost forgot to say I saw shared memory per multiprocessor gets doubled in GT200 vs G80 (32KB vs 16KB). And remember there are 30 multiprocessors vs 16 in G80.
 
It's 10, 3x8 in each, no coarse redundancy. Anyway, since we're letting the cat out of the bag here... This is also based on a die shot, although a better one so if you can't corroborate it, you'll just have to believe me:
~26.5% SMs, ~26% Non-SM Clusters (TMUs etc.), ~14.25% memory I/O, ~13.25% unique, ~8% MCs, ~6.25% ROPs, ~2% Misc. I/O, 4%+ Misc. (intrachip I/O, rounding errors, unidentifiable units, etc.)

Arun, did you mean 13.0% for unique?

Great breakdown, and even though you guys have a better shot, CJ's die shot was very nice!
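
The 13.0% question presumably comes from adding the figures up. A small Python sketch of that sum, using the percentages from Arun's post; treating "4%+" as exactly 4 is an assumption, and the dictionary keys are just labels for the example.

```python
# Die-area breakdown as quoted in the post (percent of die).
breakdown = {
    "SMs": 26.5,
    "Non-SM clusters (TMUs etc.)": 26.0,
    "Memory I/O": 14.25,
    "Unique": 13.25,
    "MCs": 8.0,
    "ROPs": 6.25,
    "Misc. I/O": 2.0,
    "Misc.": 4.0,  # posted as "4%+", taken as exactly 4 here (assumption)
}

total = sum(breakdown.values())
memory_share = breakdown["Memory I/O"] + breakdown["MCs"]

print(f"total: {total}%")
print(f"memory I/O + MCs: {memory_share}%")
print(f"total with 13.0% unique: {total - 0.25}%")
```

The posted figures add up to 100.25%, and swapping in 13.0% for unique gives exactly 100%. The memory I/O and MC entries together come to the 22.25% mentioned earlier in the thread.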
 
It's 10, 3x8 in each, no coarse redundancy. Anyway, since we're letting the cat out of the bag here... This is also based on a die shot, although a better one so if you can't corroborate it, you'll just have to believe me:
~26.5% SMs, ~26% Non-SM Clusters (TMUs etc.), ~14.25% memory I/O, ~13.25% unique, ~8% MCs, ~6.25% ROPs, ~2% Misc. I/O, 4%+ Misc. (intrachip I/O, rounding errors, unidentifiable units, etc.)
So, is there any part of this core dedicated to PV2? :?:

Don't tell me G80's history will repeat itself with GT200 too. :devilish:
 
Is it really rediscovered, or has the marketing department "rediscovered" it?

-Dave

The documents talk about "Improved Dual Issue"... so make of it what you will.... Also mentioned are "2x Registers" and "3x ROP blending performance".
 
Sounds like they just fixed the MUL issue by adding register space and made a marketing point out of it. I guess it could be viewed as rediscovered, since it wasn't really available for general use, even though they counted it towards performance.
 
According to a mod from the well-known PCinlife forum, who also has die shots and other information in hand, the GTX 280 will offer twice the performance of the 8800 Ultra.
 