NVIDIA GT200 Rumours & Speculation Thread

So it's your opinion that Nvidia did indeed break up their clusters and decoupled the TMUs from the ALU blocks? No more "one quad-TMU per 16x SIMD"?
I'm simply agreeing with Arun's interpretation: each cluster contains 3 multiprocessors and 2 quad TMUs. This increases the ALU:TEX ratio, something NVidia has signalled (weakly, to be honest) will happen.

Overall I have to say the quality of this 240 SP rumour is a bit thin - NVidia can blind analysts pretty easily, e.g. G80 can be described as 160 SPs (8 MAD lanes + 2 MI lanes per multiprocessor). If you use that as the basis of "240 SPs", then GT200 is a 12-cluster design with 96 TMUs :rolleyes:
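To spell out that counting arithmetic (my own back-of-the-envelope sketch, purely illustrative):

Code:
# How the same hardware yields different "SP" counts depending on
# what you call a processor (illustrative figures, not official ones).
MULTIPROCESSORS = 16   # G80: 8 clusters x 2 multiprocessors
MAD_LANES = 8          # MAD ALU lanes per multiprocessor
MI_LANES = 2           # multifunction-interpolator/SFU lanes

print(MULTIPROCESSORS * MAD_LANES)               # 128, the usual count
print(MULTIPROCESSORS * (MAD_LANES + MI_LANES))  # 160, the inflated count

# If "240 SPs" were counted the inflated way, and GT200 kept G80's
# 2 multiprocessors and 8 filter units per cluster:
multiprocessors = 240 // (MAD_LANES + MI_LANES)  # 24
clusters = multiprocessors // 2                  # 12
print(clusters, clusters * 8)                    # 12 clusters, 96 TMUs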

Jawed
 
Yeah, I must have missed where that 240 SP number came from Nvidia. It could easily be referring to Tesla, or simply be a mistake on the part of the writer. Either way, a 240 SP, 80 TMU G8x variant would easily be twice as fast as Nvidia's fastest single-chip solutions today, given sufficient bandwidth. I don't get how that's a disappointment if it were true.
 
Yeah, I must have missed where that 240 SP number came from Nvidia. It could easily be referring to Tesla, or simply be a mistake on the part of the writer. Either way, a 240 SP, 80 TMU G8x variant would easily be twice as fast as Nvidia's fastest single-chip solutions today, given sufficient bandwidth. I don't get how that's a disappointment if it were true.


It is too good to be true. :D
 
http://messages.finance.yahoo.com/S...&tid=435392&mid=435402&tof=7&rt=2&frt=2&off=1

240 cores... 10 clusters x 3x8 SPs per cluster? I'm not exactly impressed, to say the least, unless each SP is more powerful than G80's or overall chip efficiency has somehow improved. The reason I bother posting this is that it's an excerpt from Barron's, which apparently met with Jen-Hsun last week.

In other news, my home processor contains XtReme Processors, a whole boatload of them: 1024! I don't know what the hell Nvidia is doing with only a crappy 240 cores. I thought they were on top of the game, but apparently they are just going to fade like everyone else.

First ATI, then AMD, and now Nvidia. Oh well.

And yes, I'm going to keep this crusade up until everyone ridicules anyone who refers to a single ALU as a processor or a core. You either count a single-bit XOR as a "processor/core" or you count something that actually is a processor as a processor.

Aaron Spink
speaking for myself inc.
 
And yes, I'm going to keep this crusade up until everyone ridicules anyone who refers to a single ALU as a processor or a core. You either count a single-bit XOR as a "processor/core" or you count something that actually is a processor as a processor.

Ugh, can you imagine always having to explain the difference between Nvidia and AMD "processors" if that were to happen? I think referring to the number of ALUs is a far lesser evil! :smile:
 
I'm going to keep this crusade up until everyone ridicules anyone who refers to a single ALU as a processor or a core. You either count a single-bit XOR as a "processor/core" or you count something that actually is a processor as a processor.

Aaron Spink
speaking for myself inc.

Amen.

Whoever in nVIDIA's marketing department decided to use the term "stream processor" should be shot. Now people refer to the RV670 as having 320 "stream processors", for the love of god. Since when did ALU = processor, anyway?

And then there's the whole blasted naming scheme that's been going out of control... :LOL:

End of rant.
 
/me adjusts his resume to fit the times. Guess I've designed 2 processors now. My third one will be a 3GHz 128-bit processor, capable of single-cycle XORs.
 
I thought the 65nm G100/GT200 was looking like a ~250W TDP; even cut down, it would probably be in the ~200W range. Two of them together: ~400W.
Power connectors? Is Nvidia going to 6+8-pin on each PCB?
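For reference, the PCI Express power ceilings that make that question concrete (spec maxima; the per-PCB split is my assumption):

Code:
# PCIe power budget per card (spec maxima, in watts).
SLOT = 75    # PCIe x16 slot, shared by the whole card
PIN6 = 75    # one 6-pin PEG connector
PIN8 = 150   # one 8-pin PEG connector

print(SLOT + PIN6 + PIN8)        # 300 W: a single 6+8-pin card
print(SLOT + 2 * (PIN6 + PIN8))  # 525 W: 6+8-pin on each PCB

So two ~200W chips (~400W total) wouldn't fit under a single 6+8-pin's 300W ceiling, which is why 6+8-pin on each PCB comes up at all.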

As per some research into the topic, 45nm seems possible for ATi in Q4 '08, meaning Nvidia might move to it sometime in Q2 '09. I doubt shrinking GT200 from 65nm down to 55nm will give Nvidia the TDP headroom to put a fully clocked high-end GT200 in a dual-chip/PCB product. (Edit: So right after I posted this, I found this article at TheInq... a 40nm half-node at TSMC? AMD/ATi planning a jump from 55nm to 40nm for R7**?)

Also, there was talk of ~2-2.4GHz shader domains for Nvidia's next high-end GPU. With 240 SPs each counted at G80's maximum 3 flops (2+1), this would put FP performance somewhere in the 1.4-1.7 TFLOPS range.
That would be impressive: roughly doubling the SP count while still increasing the shader domain clock a good amount.
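The arithmetic behind that range, using the rumoured numbers (the G80-style 3-flop count is my assumption):

Code:
# Theoretical shader throughput: each SP counted as MAD + MUL,
# i.e. 2 + 1 = 3 flops per clock, G80 marketing style.
SPS = 240
FLOPS_PER_SP = 3

for shader_ghz in (2.0, 2.4):
    tflops = SPS * FLOPS_PER_SP * shader_ghz / 1000
    print(f"{shader_ghz} GHz -> {tflops:.2f} TFLOPS")
# 2.0 GHz -> 1.44 TFLOPS, 2.4 GHz -> 1.73 TFLOPS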

Also, just to clarify, didn't G92 have 64 TAs and 64 TFs?
Does no one think that is sufficient for GT200?
 
And yes, I'm going to keep this crusade up until everyone ridicules anyone who refers to a single ALU as a processor or a core. You either count a single-bit XOR as a "processor/core" or you count something that actually is a processor as a processor.

I agree completely. This is not a PR forum, so we should keep the language technical.
 
NVidia could put two GT200s on a card to make a GX2, to compete with the RV770X2.
That would be overkill from every POV, even if G100 has 'only' 240 SPs.

So what you're proposing is a 16 cluster GPU, with 2 multiprocessors and 2 TMUs per cluster, with 1 cluster turned off for redundancy?

I'm dubious NVidia will retain the current ALU:TEX ratio, and it's also clear that significantly more TMUs are not needed to attain 2x G92's performance.
I think it's pretty clear right now that NV prefers to be B/W limited rather than texture-fetch limited (and that's the right way, as far as I can tell). So G100 can have some excessive texturing power which won't be used most of the time but will allow them to be quite a bit faster in several benchmarks -- as is the case with every G8x/9x chip right now.
But I'm not really buying this 240 SP rumour for now. As for the previous rumour of 384 SPs -- well, there things get a little more interesting, because 192 TMUs is really, really too much even for a 512-bit bus.
 
What I'd really like to see, but this sadly is most likely wishful thinking, is 128 TMUs/384 SPs, with the TMUs only being in charge of filtering INT8 and compressed formats. Certainly, if the datapaths to the ALUs aren't too expensive (they could be 1/4th for FP10/FP16/INT16 and 1/8th for FP32 given the number of TMUs, so that should help), that *might* become an appealing design.
 
What I'd really like to see, but this sadly is most likely wishful thinking, is 128 TMUs/384 SPs, with the TMUs only being in charge of filtering INT8 and compressed formats. Certainly, if the datapaths to the ALUs aren't too expensive (they could be 1/4th for FP10/FP16/INT16 and 1/8th for FP32 given the number of TMUs, so that should help), that *might* become an appealing design.

Sorry if this is obvious and I'm being dense, but exactly why would this be appealing, from either a design or a consumer standpoint?
 
Well, I'm no dev or architect, but if you were to design a chip with TMUs dedicated to INT8-only filtering, I would imagine that would increase your bilerp rate tremendously over current designs. Not that G8x/G9x are terribly texture-bound, but throw enough ALUs on a chip and you tend to become texture-bound again (see R6xx).
 
In my opinion it is unlikely that Jen-Hsun would, say, preannounce the number of cores to a Barron's writer. I think it is much more likely that there was some miscommunication on the part of the Barron's writer regarding the GX2. He is very careful with what he says, and I highly doubt he would let that slip, even off the record.

Sure I could be wrong, but Barron's is not the most reliable source for cold hard facts. And that is the understatement of the year.
 
5.1.2.1 Global Memory ... "We recommend fulfilling the coalescing requirements for the entire warp as opposed to only each of its halves separately because future devices will necessitate it for proper coalescing."

Is the intent for future hardware to double the internal bus width from 512-bit to 1024-bit (16-thread half-warp * 32 bit -> 32-thread warp * 32 bit)? Could this be matched with 1024-bit GDDR5? (I don't know anything about GDDR, so sorry if that was a dumb question.) There's a toy sketch of the old vs. new coalescing rule at the end of this post.

Also this:

5.1.2.2 Constant Memory ... "The cost scales linearly with the number of different addresses read by all threads. We recommend having all threads of the entire warp read the same address as opposed to all threads within each of its halves only, as future devices will require it for full speed read."

It does sound like the warp size will stay constant at 32 "threads" but the shared memory banks will double to 32, and the 2x8 ALU + 2 interp ALU per-cluster configuration might change to something like a 2x16 + 2 interp setup.

That would keep the cluster count for a 256-ALU GPU at 8 (if the interconnect is a crossbar, which I thought it was, then scaling up to double the number of ports might be an issue, no?). The control logic would have to be beefed up to deal with issuing 2x the number of instructions per cycle (and probably also tracking more warps). Texturing power (per clock, per cluster) would remain where it is now unless the TMUs are also widened...

Does that make much sense as a hypothetical design?
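Here's the toy model of the coalescing rule from those two guide quotes (my own sketch, modelling the strict compute-1.0-style rule: thread k must read word base+k of a segment aligned to its own size):

Code:
WARP = 32

def coalesces(addrs, group):
    """addrs[i] = word index read by thread i; group = 16 or 32."""
    for base in range(0, WARP, group):
        seg = addrs[base:base + group]
        if seg[0] % group != 0:   # segment must be aligned to its size
            return False
        if any(a != seg[0] + k for k, a in enumerate(seg)):
            return False          # and contiguous, in thread order
    return True

linear = list(range(WARP))                     # thread i reads word i
split = list(range(16)) + list(range(64, 80))  # two separate segments

print(coalesces(linear, 16), coalesces(linear, 32))  # True True
print(coalesces(split, 16), coalesces(split, 32))    # True False

The second pattern is exactly what the guide is warning about: each 16-thread half-warp coalesces fine under the current rules, but the full 32-thread warp doesn't form one segment, so a future warp-wide check would penalise it.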
 
In other news, my home processor contains XtReme Processors, a whole boatload of them: 1024! I don't know what the hell Nvidia is doing with only a crappy 240 cores. I thought they were on top of the game, but apparently they are just going to fade like everyone else.

First ATI, then AMD, and now Nvidia. Oh well.

And yes, I'm going to keep this crusade up until everyone ridicules anyone who refers to a single ALU as a processor or a core. You either count a single-bit XOR as a "processor/core" or you count something that actually is a processor as a processor.

Aaron Spink
speaking for myself inc.

Assuming there's any truth to that rumour: by your definition, if G80 can be seen as 16 processors, then GFNext would give you 24 of them, with as-yet-unknown capabilities.

...and of course here we go again with the senseless number crunching. Any sort of unit count is so useless until someone knows at least a few details of what each unit is capable of that it isn't funny anymore. Tendencies like that are nothing new; it happens every time a few tidbits/rumours (true or not) appear before the release of any GPU. There's no shock-and-awe reaction from me personally at a supposed gazillion SPs (like 800, f.e.), just as there isn't any sort of disappointment at a number that might sound too "little" (like 240, f.e.).

Equally idiotic is the TMU number crunching; are they 64, 96, 128, 192? Let me start somewhere else before everyone throws in his funky theory: what does anyone mean when he says "TMU" nowadays? In G80 we saw 64 TF and 32 TA units, while in midrange G8x and in G9x we see 64 TFs/TAs. So what's the current take on what each unit is capable of, before anyone else spins the lottery and comes out with the next funky theory?
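To make that concrete, here's how the two unit mixes compare per clock, under the usual assumption that a bilinear lookup needs one TA plus one TF, and that fp16 (or trilinear) filtering takes two TF passes:

Code:
# Per-clock texturing capability of the two unit mixes mentioned above
# (simplified model; real scheduling is messier).
configs = {
    "G80-style (32 TA / 64 TF)": (32, 64),
    "G9x-style (64 TA / 64 TF)": (64, 64),
}
for name, (ta, tf) in configs.items():
    int8_bilerps = min(ta, tf)       # 1 TA + 1 TF per int8 bilerp
    fp16_bilerps = min(ta, tf // 2)  # 2 TF passes per fp16 bilerp
    print(f"{name}: {int8_bilerps}/clk int8, {fp16_bilerps}/clk fp16")

Which is the crux of the counting problem: both mixes get called "64 TMUs", yet they differ by 2x in plain int8 bilinear rate while matching on fp16.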


LordEC911,

That ALU frequency might as well originate from G92b; while I have severe doubts it'll reach 2.4GHz, something at or below 2GHz doesn't sound impossible, since we're supposedly talking about 55nm. IHVs are most of the time way too enthusiastic and optimistic about the frequencies reached before they go into the final production stage. I recall also reading about frequencies beyond 1GHz for RV770. Folks like you are aware of how safety margins work for final products; all these rumours can indicate is what typical overclockability on a final product might look like.

Jawed,

If anything was scratched from any roadmap a long time ago, it was more likely something with a G9x-like internal codename than anything else.
 
LordEC911,

That ALU frequency might as well originate from G92b; while I have severe doubts it'll reach 2.4GHz, something at or below 2GHz doesn't sound impossible, since we're supposedly talking about 55nm. IHVs are most of the time way too enthusiastic and optimistic about the frequencies reached before they go into the final production stage. I recall also reading about frequencies beyond 1GHz for RV770. Folks like you are aware of how safety margins work for final products; all these rumours can indicate is what typical overclockability on a final product might look like.

Like the "65nm R600" supposedly hitting +900mhz.
And then RV670 supposedly clocking +900mhz.
BTW- Roughly the same GPU but different rumors with quite a bit of time between them.
 