ELSA hints GT206 and GT212

DegustatoR said:
Well you obviously can't have both
In the words of John McEnroe, "you can't be serious." I mean we all know the 8800GTS G92 had lower clocks than the 8800GT due to that extra TPC just having a devastating effect... oh wait.

I still don't understand your logic... a single TPC is insignificant... except when going from 8 to 9... then the world ends.

Maybe they should make a chip with 0 TPCs... surely they would be able to achieve an infinite clockrate.

Also, wouldn't a higher clockrate make each TPC more valuable... why yes, yes it would.
 
I think they should have gone with GDDR5 from the beginning on that 512-bit bus; the bus is already too expensive & complex on the PCB to waste on GDDR3. GDDR5 @ 240GB/s with their ROPs could have made the difference. Come on, they have been using GDDR3 since the NV40.

Shading power is great, but bandwidth is the foundation of a good gfx card (please, no R600 or Crysis references).
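For what it's worth, a quick back-of-envelope check of those numbers (the 3.75GT/s GDDR5 data rate below is my own assumption, picked to land on the 240GB/s figure):

Code:
# Memory bandwidth = bus width in bytes * effective data rate in GT/s.
def bandwidth_gb_s(bus_width_bits, data_rate_gtps):
    return bus_width_bits / 8 * data_rate_gtps

print(bandwidth_gb_s(512, 3.75))   # 240.0 -- the hypothetical 512-bit GDDR5 case
print(bandwidth_gb_s(512, 2.214))  # ~141.7 -- GTX 280's actual 512-bit GDDR3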
 
Come on, they have been using GDDR3 since the NV40.

Actually, it predates it. The GeForce FX 5700 Ultra (NV36) was their first card with GDDR3 onboard. ;)
And yes, GDDR3 is becoming a little long in the tooth, but that's also the fault of the alternatives, because GDDR4 failed to achieve practical performance/cost gains over it.
GDDR5 might just have better luck, though...
 
It is. GT206, however, should be on 55nm and it should have 216 SPs as well.

There's a somewhat weird rumour circulating that the last two digits in the 206 and 212 codenames indicate cluster counts. Such a hypothetical case would mean more SPs per cluster, fewer TMUs overall, and fewer ROPs/memory channels.

Those 216 SPs on the second variant of the 260@65nm aren't gaining them anything spectacular in performance, since the ALU:TMU frequency ratio is still stuck at GT200 levels. With a 2.5x ratio as on G92 they could easily close the gap of the "missing" 24 SPs.
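Rough numbers to illustrate the point, using SP count times hot clock as a crude throughput proxy (stock GTX 260 clocks assumed):

Code:
# Stock GTX 260 clocks assumed: 576MHz core, 1242MHz shader (a ~2.16x ratio).
core_mhz = 576
print(216 * 1242)            # ~268k SP-MHz: 216 SPs at the current ratio
print(240 * 1242)            # ~298k SP-MHz: the full 240-SP chip, same ratio
print(216 * core_mhz * 2.5)  # ~311k SP-MHz: 216 SPs at a G92-style 2.5x ratio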
 
I think they should have gone with GDDR5 from the beginning on that 512-bit bus; the bus is already too expensive & complex on the PCB to waste on GDDR3. GDDR5 @ 240GB/s with their ROPs could have made the difference. Come on, they have been using GDDR3 since the NV40.

Shading power is great, but bandwidth is the foundation of a good gfx card (please, no R600 or Crysis references).

IHVs usually equip their GPUs (and yes, there have been weird exceptions in the past) with as much bandwidth as they need. There's not a single sign so far that GT200 is in any serious way bandwidth constrained, and if they use GDDR5 on any future variant, they'll most likely adjust the number of ROPs/memory channels to the bandwidth their own simulators indicate each chip actually needs.
 
There's a somewhat weird rumour circulating that the last two digits in the 206 and 212 codenames indicate cluster counts.
You mean that GT206 would have six clusters whereas the GT212 would have 12, or that GT212 will be double GT206 no matter its specs?
My guess is the codenames don't mean much in this generation. Otherwise think about G86, G96 and G92 (80 + 12).
 
You mean that GT206 would have six clusters whereas the GT212 would have 12, or that GT212 will be double GT206 no matter its specs?

As I said, it's a theory that circulates; it might or might not have any relevance. It does make sense, though, if you figure that a 206 is going to be slightly faster than the existing 260 but on 55nm, and that after Q1 a 212 with twice the clusters @40nm ends up faster than the current 280.

The bottom line here should be that everyone counts on NV trying to save as many transistors as possible in order to cut manufacturing costs severely. The above theory does make sense to me if you consider that a 206 probably won't be fillrate "underpowered" with 48 TMUs and 16 ROPs, and what each of those units usually costs in transistor budget.

My guess is the codenames don't mean much in this generation. Otherwise think about G86, G96 and G92 (80 + 12).

Since the codenames here are mostly for high-end SKUs, I suppose the chain is rather GT200->GT206->GT212. While it is true that codenames don't have to indicate anything, the above makes more sense to me than your 216SP@55nm theory. Want to calculate what difference a 48-TMU gap would make in transistors alone?
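A purely illustrative sketch -- nobody outside NV knows the real per-TMU transistor cost, so the figure below is a placeholder assumption, not data:

Code:
# Placeholder assumption: per-TMU cost in millions of transistors is a pure
# guess for illustration; the real figure is not public.
assumed_transistors_per_tmu_m = 5
delta_tmus = 96 - 48  # GT200-style 96 TMUs vs the 48-TMU 206 theory
print(delta_tmus * assumed_transistors_per_tmu_m,
      "M transistors saved (under the assumption above)")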
 
My guess is the codenames don't mean much in this generation. Otherwise think about G86, G96 and G92 (80 + 12).
Every +2 halves the number of clusters. Which makes for a very simple-to-understand line-up for G8x/G9x; too bad it also means NV was too lazy to try and figure out a way to make a more optimal roadmap :p (something which AMD tried to do to a certain extent in the R6xx era; i.e. different ALU-TEX ratios per chip)
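Spelled out for G9x (these are the well-known shipping configs):

Code:
# Each +2 in the codename halves the cluster count (16 SPs per cluster in G9x).
for codename, clusters in [("G92", 8), ("G94", 4), ("G96", 2)]:
    print(codename, clusters, "clusters,", clusters * 16, "SPs")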
 
The bottom line here should be that everyone counts on NV trying to save as many transistors as possible in order to cut manufacturing costs severely. The above theory does make sense to me if you consider that a 206 probably won't be fillrate "underpowered" with 48 TMUs and 16 ROPs, and what each of those units usually costs in transistor budget.
Why 48 TUs? You think they're gonna redesign the clusters again to increase the ALU:TEX ratio? I don't think so. R6xx was built with this modularity in mind, so each chip of the lineup had a different ALU:TEX ratio, like Arun says. G8x/G9x, on the other hand, was designed with a fixed ALU:TEX ratio for all chips (similar to R5xx), and it probably took some work to fit 8 more SPs into each cluster in GT200.
There's also no indication there will actually be 16 ROPs/RBEs. If GT206 is supposed to be about as powerful, why should it have just 16?
 
IHVs usually equip their GPUs (and yes, there have been weird exceptions in the past) with as much bandwidth as they need. There's not a single sign so far that GT200 is in any serious way bandwidth constrained, and if they use GDDR5 on any future variant, they'll most likely adjust the number of ROPs/memory channels to the bandwidth their own simulators indicate each chip actually needs.

Yes, they adjust the bandwidth, but it's clear that the card suffers a lot at high res & AA. It could be due to the weaker nV ROPs, as you guys told me. But if nV ROPs are weaker than ATI's ROPs, that's a good reason to put more bandwidth in the chip, or to redesign the ROPs for higher efficiency.

On the GTX280, nV doubled or tripled the shading power, but they only raised the bandwidth of the chip by about 40% compared to the mighty 8800 Ultra. So I think the card will become bandwidth-bound sooner than shader-bound. When the shader programs aren't so complex, you need to feed more data to the chip. The real-world benchmarks showed only a 40-60% speed-up in some cases when compared to the 8800 Ultra. I know that everybody wants to run Crysis decently, but there are other games with different needs. :)
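The scaling, roughly, at stock clocks (MAD-only GFLOPS assumed for the 8800 Ultra, since G80's extra MUL was rarely usable; the MAD+MUL figure for GTX 280 is the usual marketing number):

Code:
# Bandwidth: bus width in bytes * effective data rate in GT/s.
bw_ultra  = 384 / 8 * 2.16    # ~103.7 GB/s (8800 Ultra)
bw_gtx280 = 512 / 8 * 2.214   # ~141.7 GB/s (GTX 280)
print(bw_gtx280 / bw_ultra)   # ~1.37 -- the "~40%" bandwidth bump

# Shader GFLOPS at stock clocks:
print(240 * 1.296 * 3)  # ~933 (GTX 280, MAD+MUL)
print(128 * 1.512 * 2)  # ~387 (8800 Ultra, MAD only) -> roughly 2.4x growth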
 
If clusters are to be upped to 32 SPs (four multiprocessors each), you could have a 192-SP GT206. That's a nice midrange GPU (with 48 TMUs, and 16 ROPs? They match the memory bus as usual).
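The cluster math for that scenario, spelled out (8 TMUs per cluster and 4 ROPs per 64-bit channel carried over from GT200 are assumptions):

Code:
clusters = 6
print(clusters * 32)  # 192 SPs
print(clusters * 8)   # 48 TMUs, assuming 8 TMUs/cluster carries over from GT200
print(16 // 4 * 64)   # 16 ROPs at 4 per 64-bit channel -> a 256-bit bus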
I would still rule out a GX2 if nvidia is making a bad ass 384SP monster.

Will they change the ROPs to get 8x MSAA as fast as RV7xx? That would be nice, and I would understand better their PR thing on "GPU 2013" where they emphasize "AA fillrate".
 
I would still rule out a GX2 if nvidia is making a bad ass 384SP monster.
GT212 isn't coming anytime soon and they need something to counter RV770X2 with.
On the other hand, 384 SPs and 96 TMUs (presuming they're going 32/8 TPCs for GT21x) on 40nm is nothing to shout about. Such a chip should end up in the same league as G92 in die size.

Will they change the ROPs to get 8x MSAA as fast as RV7xx? That would be nice, and I would understand better their PR thing on "GPU 2013" where they emphasize "AA fillrate".
I wouldn't count on any changes to the basic building blocks until DX11 GT300 or whatever they'll call it.
But since they'll still need to change the ROPs to support GDDR5, there is a possibility that they'll 'fix' MSAA 8x performance in their GDDR5 cards...

ninelven said:
I still don't understand your logic...
Yeah, you don't.
 
On the other hand, 384 SPs and 96 TMUs (presuming they're going 32/8 TPCs for GT21x) on 40nm is nothing to shout about. Such a chip should end up in the same league as G92 in die size.
Then perhaps it will have more units, nVidia still wants a monster chip, that's their philosophy. Still not sure about the 32/8 TPCs though. In a way, that is a change in one of the basic building blocks you're talking about.
 
Then perhaps it will have more units, nVidia still wants a monster chip, that's their philosophy.
I've never heard of such a philosophy from NV -)
They probably will be doing new top-end chips (CUDA and other professional applications demand it) but it doesn't mean that they won't do GX2-type products or that they will always provide mainstream cards on their 'big' chips.
From the looks of it the public is quite happy with dual-chip AFR boards -- so why bother? Slap two simple chips on one card and make everyone happy -- they've done it twice already, and I don't see any reason why they won't do it again.

Still not sure about the 32/8 TPCs though. In a way, that is a change in one of the basic building blocks you're talking about.
TPC is a building block, sure, but not basic.
They've rebalanced the TPCs in GT200, and they may well do it again -- especially since they need more math power to counter AMD's R7x0 architecture, not texturing power.
 
I've never heard of such a philosophy from NV -)
Jen-Hsun himself said that. nVidia will do dual-chip cards only if it is necessary and impossible to achieve the same performance with a single GPU.
They probably will be doing new top-end chips (CUDA and other professional applications demand it) but it doesn't mean that they won't do GX2-type products or that they will always provide mainstream cards on their 'big' chips.
Sure, GX2 is always a possibility as long as it makes economic sense. But nVidia will do a GX2 only in case something goes wrong with their big GPU (GT200 should originally have been released in Q4 2007).
They've rebalanced the TPCs in GT200, and they may well do it again -- especially since they need more math power to counter AMD's R7x0 architecture, not texturing power.
You're probably right. The architecture is getting a little long in the tooth and could use some ALU:TEX rebalancing. Now the question is whether nVidia optimizes the ALU:TEX ratio for newer games, or rather optimizes the current 24 SPs per cluster design to achieve higher frequencies of the shader core - an approach that would likely yield similar results while being easier from the engineering standpoint and also "cheaper" in terms of transistor count.
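The two options come out the same on paper, for what it's worth -- widening the cluster versus clocking the existing one higher (GTX 280's hot clock assumed as the baseline):

Code:
base_hot_mhz = 1296                      # GTX 280 shader clock as the baseline
print(32 * base_hot_mhz)                 # widen: 32 SPs/cluster at the same clock
print(24 * int(base_hot_mhz * 32 / 24))  # or clock 24 SPs ~33% higher (1728MHz)
# Both come out to 41472 SP-MHz per cluster; the difference is transistor
# budget versus the timing/engineering effort of a higher hot clock.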
 
Looking back at the leaked slide in the first post, something kept bugging me but I couldn't pinpoint it until now (my eyes must be tired, so bear with me :D).

Notice the red arrows to the right of the current 9800GTX+ feature list.
It looks as if both of them read "980GTX+" instead of "9800GTX+". Double typo, or what? Hope they're cutting down on the number of letters as well as the numerals. Anything to lessen confusion would be good ;)
 
Notice the red arrows to the right of the current 9800GTX+ feature list. It looks as if both of them read "980GTX+" instead of "9800GTX+". Double typo, or what? Hope they're cutting down on the number of letters as well as the numerals. Anything to lessen confusion would be good ;)

Yes, after I blew the image up to a larger size, I do see 980GTX+ as well; 3 digits now...
 
Why 48 TUs? You think they're gonna redesign the clusters again to increase the ALU:TEX ratio? I don't think so. R6xx was built with this modularity in mind, so each chip of the lineup had a different ALU:TEX ratio, like Arun says. G8x/G9x, on the other hand, was designed with a fixed ALU:TEX ratio for all chips (similar to R5xx), and it probably took some work to fit 8 more SPs into each cluster in GT200.
There's also no indication there will actually be 16 ROPs/RBEs. If GT206 is supposed to be about as powerful, why should it have just 16?

What do you mean by "just 16 ROPs"? RV770 has "just 16 ROPs" too.

Yes, they adjust the bandwidth, but it's clear that the card suffers a lot at high res & AA. It could be due to the weaker nV ROPs, as you guys told me. But if nV ROPs are weaker than ATI's ROPs, that's a good reason to put more bandwidth in the chip, or to redesign the ROPs for higher efficiency.

Who told you that NV has "weaker" ROPs? I'd rather suggest that there might be something else in their way regarding 8xMSAA performance; triangle setup is rumoured to be part of the problem, albeit I'm not entirely sure it's really relevant in this case. But if it should be true, then even hypothetically increasing throughput per ROP (which is already as high as it can be for this generation) would be a complete waste of transistors, since changing the triangle setup isn't something that can be done on such short notice.

On the GTX280, nV doubled or tripled the shading power, but they only raised the bandwidth of the chip by about 40% compared to the mighty 8800 Ultra. So I think the card will become bandwidth-bound sooner than shader-bound. When the shader programs aren't so complex, you need to feed more data to the chip. The real-world benchmarks showed only a 40-60% speed-up in some cases when compared to the 8800 Ultra. I know that everybody wants to run Crysis decently, but there are other games with different needs. :)

Before you jump to any weird conclusion that GT200 is bandwidth limited, you'd have to come up with some credible data (benchmarks from various sites, for instance) that indicates any of it. I haven't been able to see any indication so far, but feel free to link me to anything that I might have missed.
 
I don't know what ATI's and nVidia's ROPs are capable of, but I don't think they're directly comparable. If 16 were enough for a chip like GT200, why would they put 32 in there? (Well, maybe I know why, but since it's a bit contradictory to one of your theories, I'd like to hear your answer :) )
 