ELSA hints GT206 and GT212

CarstenS · Feb 3, 2009

Sorry for failing to be a bit funny.

Of course, I know that Arun was talking about the package.

Jawed · Feb 3, 2009

CarstenS said:
It'd only make sense after all, given the massive amount of die-space their scheduling logic and associated bulkhead takes, to go for the slightly bigger die and try and outperform AMD. It's not like they've got any other choice, given the enormous FLOPS/mm² AMD has already achieved with their 2nd 55nm generation.

GTX260-216 performs quite happily against HD4870 despite its FLOP shortfall.

Oh, and I don't include you in that group of peeps who didn't get the package thing!

I thought that was quite funny.

Jawed

Arun · Feb 3, 2009

CarstenS said:
Sorry for failing to be a bit funny.
Of course, I know that Arun was talking about the package.

hah!

Well, that's partially my fault. Given how many people seriously make that mistake all the time (and then tons of people don't spot it), you can't blame me for being a tad paranoid here...

I did realize you were kidding about the latter part (lol @ quadratic reasons), but I kinda assumed you were implying my measurement might be wrong. Oops.

ninelven · Feb 3, 2009

What you are talking about are physical SP vs. TMU counts, but that way you're ignoring that SPs are running at a much higher frequency. So while for example G92 is physically 2:1, effectively it is >4:1.

No. I think I know what I am talking about. When Nvidia made this statement, they were referring to 3 Vec4s per TMU (equal clocks). This would be equivalent to 12 SPs per TMU if they were clocked the same, but I took into account that the SPs are clocked 2x higher, which is why I said and meant 6:1.
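The arithmetic behind the 12:1 and 6:1 figures, and behind the "physically 2:1, effectively >4:1" remark for G92, can be sketched roughly as below. The function names are made up for illustration, and the 2.5x shader-to-core clock ratio used for G92 is an assumed round figure, not a number from this thread.

```python
def physical_sp_needed(equal_clock_ratio, shader_to_core_clock):
    """Scalar SPs per TMU needed to match a ratio quoted at equal clocks."""
    return equal_clock_ratio / shader_to_core_clock


def effective_alu_tex(physical_sp_per_tmu, shader_to_core_clock):
    """Scalar ALU lanes per TMU, normalised to the TMU (core) clock."""
    return physical_sp_per_tmu * shader_to_core_clock


# 3 Vec4 ALUs per TMU at equal clocks is 3 * 4 = 12 scalar lanes per TMU.
equal_clock_ratio = 3 * 4

# With SPs clocked 2x higher than the TMUs, half as many SPs do the same work.
print(physical_sp_needed(equal_clock_ratio, 2.0))  # 6.0

# G92 is physically 2 SPs per TMU; the 2.5x clock ratio is an assumption.
print(effective_alu_tex(2, 2.5))  # 5.0
```

So the two posters are quoting the same kind of ratio in different units: one counts physical units, the other normalises for clock.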

trinibwoy · Feb 3, 2009

Why is 3x4 = 8x1 ?

ninelven · Feb 3, 2009

Would you prefer a Flops to Texture Fill Rate comparison? I think you will end up in the same place...

CarstenS · Feb 4, 2009

Jawed said:
GTX260-216 performs quite happily against HD4870 despite its FLOP shortfall.

They do. But I was extrapolating the given routes for both. If NVidia and AMD were just adding further (40nm-shrunk) SIMDs like there was no tomorrow, AMD would surely outperform Nvidia, and with a much smaller chip to boot.

So my take would be - disregarding any IPC improvements in the individual SMs - that Nvidia is bound to increase not only the total # of FLOPS but also the FLOPS per TPC/clk.

Jawed · Feb 4, 2009

CarstenS said:
So my take would be - disregarding any IPC improvements in the individual SMs - that Nvidia is bound to increase not only the total # of FLOPS but also the FLOPS per TPC/clk.

Arguably GT21x GPUs were designed before the shock and awe of RV770 hit, so what are the chances NVidia will respond specifically by increasing ALU:TEX?

Also, like G71 was to G70, NVidia will, if anything, be concentrating on performance per mm and per watt, not absolute performance. Trying to deliver the most performance at each price point, but trying to avoid making GPUs larger than necessary.

Overall it seems to me that increasing ALU:TEX in the shadow of GT300 is unlikely. Sure there was prolly supposed to have been a longer gap between GT21x stragglers and GT300 than the single quarter it looks like it'll be, but I still think GT300's shadow is long enough to de-prioritise ALU:TEX increases.

Also, NVidia's architecture gets a free, implicit, ALU:TEX ratio increase from the more complex shaders because the less texturing-per-clock in a shader the more cycles are left over for math, due to the multifunction interpolator's duty to generate texel coordinates.

NVidia's recent problem has been that its GPUs are much larger than the bus size requires (excluding GT200, which appears to be that big solely because of its 512-bit bus), e.g. G94's 240mm² is quite a bit larger than the ~190mm² required for a 256-bit bus (subject to the vagaries of power pads...). I imagine NVidia would prefer to deliver smaller chips for a given bus size and 40nm is the node to do it, if ever there was. Additionally, most of GT21x could have an 18-month lifetime, like G92, which increases the importance of minimal die size.

Jawed

Lukfi · Feb 4, 2009

ninelven said:
No. I think I know what I am talking about. When Nvidia made this statement, they were referring to 3 Vec4s per TMU (equal clocks). This would be equivalent to 12 SPs per TMU if they were clocked the same, but I took into account that the SPs are clocked 2x higher, which is why I said and meant 6:1.

I am pretty sure CarstenS was talking about the SP:TMU ratios, which are 2:1 on G8x/G9x and 3:1 on G200, and which he expects to be 4:1 on GT21x. Since you were quoting his words, I supposed (and probably everyone else did as well) that you were talking about the same thing. I still don't see how the 6:1 number is relevant here...

Jawed said:
NVidia's recent problem has been its GPUs are much larger than the bus-size requires (excluding GT200 which appears to be that big solely because of its 512-bit bus)

Wasn't R600 smaller than G200b?

trinibwoy · Feb 4, 2009

ninelven said:
Would you prefer a Flops to Texture Fill Rate comparison? I think you will end up in the same place...

Sure, that's what I've been doing since G80 dropped. But even that is still a very rough estimation as different architectures exhibit different levels of effective flops throughput in real texturing scenarios.

DegustatoR · Feb 4, 2009

Lukfi said:
Alright, let's forget the rumours. What do you think of these?

GT212 | 288 SPs | 96 TMUs | 4 ROP blocks, 256-bit interface, GDDR5
GT214 | 144 SPs | 48 TMUs | 3 ROP blocks, 192-bit interface
GT216 | 72 SPs | 24 TMUs | 2 ROP blocks, 128-bit interface

It starts to look like this:

GT218 | 32 SP / 8 TU | 64-bit DDR3
GT216 | 96 SP / 24 TU | 128-bit G/DDR3
GT214 | 192 SP / 48 TU | 192-bit GDDR3
GT212 | 384 SP / 96 TU | 256-bit GDDR3/5

With GT212 covering everything above $200 with GDDR3 and GDDR5 and 2-chip AFR configs.
Not very impressive from where I'm standing. The only really good GPU in such a line-up would be GT212; everything below it is just a cost-cut version of a G9x chip with less texturing power, and hotter and noisier because of this.
I hope that's not how it'll turn out to be in the end =(
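For anyone comparing the two speculative line-ups above, the SP:TU ratio of each entry can be checked with a few lines. This is only a sketch using the numbers quoted in this post; the dictionary and function names are made up for illustration.

```python
from fractions import Fraction

# (SPs, texture units) per chip, as listed in the two speculative line-ups.
lukfi = {"GT212": (288, 96), "GT214": (144, 48), "GT216": (72, 24)}
degustator = {
    "GT218": (32, 8),
    "GT216": (96, 24),
    "GT214": (192, 48),
    "GT212": (384, 96),
}


def sp_per_tu(lineup):
    """Return the exact SP:TU ratio for every chip in a line-up."""
    return {chip: Fraction(sp, tu) for chip, (sp, tu) in lineup.items()}


for name, lineup in (("Lukfi", lukfi), ("DegustatoR", degustator)):
    for chip, ratio in sp_per_tu(lineup).items():
        print(f"{name:10s} {chip}: {ratio}:1")
```

Every chip in the first list works out to 3:1 and every chip in the second to 4:1, with the same TU counts for GT212, GT214 and GT216 in both.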

Jawed · Feb 4, 2009

Lukfi said:
Wasn't R600 smaller than G200b?

Thinking about it, GT200b is smaller but still has a 512-bit bus, so how big is GT200b?

Jawed

Arun · Feb 4, 2009

Quick point - why is everyone forgetting about GT215?

There are two ways to consider that chip: either they want a 5-chip line-up (wtf?) or they canned one or two of the four chips and replaced that with a new one. I'd bet on the latter...

My current guess (it changes about every 30 minutes) is their line-up will look roughly like this, using DegustatoR's nomenclature for simplicity's sake:
GT218: 32 SP / 8 TU | 64-bit G/DDR3 | 0 DP & 1 SFU per SM [Early Q2]
GT216: 120 SP / 24 TU | 192-bit GDDR3 | 1 DP & 1 SFU per SM [Late Q2]
GT215: 320 SP / 64 TU | 256-bit GDDR5 | 1 DP & 1 SFU per SM [Q3]
GT300: 512-bit GDDR5 [Q4]

I know that might sound counter-intuitive for GT215, but I'd argue after a certain point the goal is to confuse your competitor more than anything else - especially with late roadmap changes! AMD has done an amazing job misinforming NV about their late changes, so I wouldn't exclude anything on either side. Plus, maybe someone at NV is superstitious and didn't want a chip that would be half-way between GT214 & GT212 performance to be named GT213...

Jawed: R600 was ~420mm², GT200b is ~470mm² + NVIO...

KonKort · Feb 4, 2009

Arun said:
GT218: 32 SP / 8 TU | 64-bit G/DDR3 | 0 DP & 1 SFU per SM [Early Q2]
GT216: 120 SP / 24 TU | 192-bit GDDR3 | 1 DP & 1 SFU per SM [Late Q2]
GT215: 320 SP / 64 TU | 256-bit GDDR5 | 1 DP & 1 SFU per SM [Q3]
GT300: 512-bit GDDR5 [Q4]

Right, wrong, wrong, right.

How do you get to 120 SP/24 TU or 320 SP/64 TU? Do you think 40 SPs, 8 TUs per cluster or 20 SPs, 4 TUs per cluster?
I know it is wrong, but I am interested in how you are thinking.
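Both cluster layouts asked about here divide evenly into the guessed totals, which a quick check shows. This is a sketch only; `cluster_count` is a hypothetical helper, and the totals are the guessed figures quoted above, not confirmed specs.

```python
def cluster_count(total_sp, total_tu, sp_per_cluster, tu_per_cluster):
    """Number of clusters if SPs and TUs both divide evenly and agree, else None."""
    if total_sp % sp_per_cluster or total_tu % tu_per_cluster:
        return None
    by_sp = total_sp // sp_per_cluster
    by_tu = total_tu // tu_per_cluster
    return by_sp if by_sp == by_tu else None


guesses = {"GT216": (120, 24), "GT215": (320, 64)}
layouts = [(40, 8), (20, 4)]  # (SPs, TUs) per cluster, as asked in the post

for chip, (sp, tu) in guesses.items():
    for sp_c, tu_c in layouts:
        print(chip, f"{sp_c} SP / {tu_c} TU per cluster ->",
              cluster_count(sp, tu, sp_c, tu_c), "clusters")
```

With 40 SPs and 8 TUs per cluster the guesses come out as 3 and 8 clusters; with 20 SPs and 4 TUs they come out as 6 and 16. The totals alone cannot tell the two layouts apart.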

Jawed · Feb 4, 2009

Arun said:
Jawed: R600 was ~420mm², GT200b is ~470mm² + NVIO...

So GT200b is about 2.5mm smaller on each side than GT200: about 21.5x21.5mm versus 24x24mm. Looking at the die picture for the 65nm GT200, it doesn't look like there's much room to shave off the sides due to the sheer quantity of what appears to be the GDDR3 physical interface.

Clearly, with a rectangular die the area could be much smaller for a given perimeter (bus width) - but GPUs with more than a 64-bit bus seem to be square-ish as a rule.

Jawed
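The side lengths and the perimeter argument in this post can be reproduced with a small sketch. It assumes perfectly square or rectangular dies and takes the ~470mm² and 24x24mm figures from the thread at face value; the function names are made up for illustration.

```python
import math


def square_side_mm(area_mm2):
    """Side length of a square die with the given area."""
    return math.sqrt(area_mm2)


def rect_area_mm2(perimeter_mm, aspect):
    """Area of a rectangle with the given perimeter and long/short side ratio."""
    short = perimeter_mm / (2 * (1 + aspect))
    return short * (short * aspect)


print(round(square_side_mm(470), 2))  # GT200b at ~470 mm², about 21.68 mm per side
print(square_side_mm(24 * 24))        # GT200 at 24x24 mm, 24.0 mm per side

# Same 96 mm perimeter (same room for pads), different shapes:
print(rect_area_mm2(96, 1))  # square: 576.0 mm²
print(rect_area_mm2(96, 2))  # 2:1 rectangle: 512.0 mm²
```

For a fixed perimeter the square is the largest possible area, so any elongation buys the same pad space with less silicon.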

DegustatoR · Feb 4, 2009

Arun said:
Quick point - why is everyone forgetting about GT215? There are two ways to consider that chip: either they want a 5-chip line-up (wtf?) or they canned one or two of the four chips and replaced that with a new one. I'd bet on the latter...

There is another possibility which i consider to be more probable: some of these 5 GT21x chips will show up only as mobile GPUs.
And GT215 is probably a chip with 128-bit GDDR3/5 and 4, 5 or 8 TPCs. Probably 8, because GT212 isn't the best candidate for notebook usage, and GT214, if it's really 6 TPC/192-bit with GDDR3-only configs in mind, is a bit crap for notebooks too.
Hey, maybe GT215 is a late reaction to RV740? 8)

And i still think that GT300 with 512-bit GDDR5 is a bit extreme.

Jawed · Feb 4, 2009

DegustatoR said:
And i still think that GT300 with 512-bit GDDR5 is a bit extreme.

But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.

So GTX380 could easily be $650 and GTX260 $450 and HD5870's performance would price it at around $300 again, this time putting no pressure on NVidia.

I hope AMD goes for more than 16xZ per 64-bit channel - 16xZ per 32-bit channel would be cool. It's the only way to compete with NVidia, who'll prolly double Z per channel in GT300 in comparison with GT200. AMD could use the highest clocking GDDR5 to compensate too - higher-clocked than NVidia. That would give 16xZ per 32-bit channel bandwidth to breathe.

Jawed
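To put rough numbers on the bus-width gap discussed in this post, peak memory bandwidth scales as bus width times per-pin data rate. The sketch below uses a hypothetical 4.0 Gbps per pin for every configuration; that rate is an assumption for illustration, not a figure from the thread, and the bus widths are the speculative ones being discussed.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak memory bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


ASSUMED_RATE = 4.0  # Gbps per pin, hypothetical, same for all configurations

for width in (256, 384, 448, 512):
    print(f"{width}-bit: {bandwidth_gb_s(width, ASSUMED_RATE):.0f} GB/s")

# Data rate a 256-bit bus would need to match a 448-bit bus at the assumed rate.
needed = bandwidth_gb_s(448, ASSUMED_RATE) / (256 / 8)
print(f"256-bit needs {needed:.1f} Gbps per pin to match 448-bit")
```

At equal data rates a 448-bit bus has 1.75x the bandwidth of a 256-bit one, so faster GDDR5 alone would have to make up that whole factor.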

DegustatoR · Feb 4, 2009

Jawed said:
But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.

GT300 is (supposedly) a top-end GPU a la G80 and GT200. RV870 is (supposedly) a middle-class GPU a la RV670 and RV770. Any competition between one GT300 and one RV870 is a problem for NV, in much the same way it is now between GT200 and RV770.
As for the competition between GT300 and an RV870-based AFR top-end from AMD, I'm not so sure that you have to have the same bandwidth on the one-chip top-end to be able to counter an AFR system, which is quite ineffective in its memory usage. A smarter way is to have a less costly solution with the same performance, and 384-bit GDDR5 might do the trick here.
But who knows what they've planned for GT300...

Razor1 · Feb 4, 2009

Jawed said:
But AMD, with a 256-bit bus, will be completely unable to compete with "GTX360" if it has a 448-bit GDDR5 bus - which is quite unlike HD4870 versus GTX260.

So GTX380 could easily be $650 and GTX260 $450 and HD5870's performance would price it at around $300 again, this time putting no pressure on NVidia.
Jawed

Well, nV will probably adjust for their shortcomings from the last launch; that, or they are just stupid.
