ELSA hints GT206 and GT212

DegustatoR · Oct 16, 2008

Quadro CX
CUDA Parallel Processor Cores 192
Memory Interface 384-bit
GT200-based? Or not? =)

ShaidarHaran · Oct 16, 2008

DegustatoR said:
Quadro CX
CUDA Parallel Processor Cores 192
Memory Interface 384-bit
GT200-based? Or not? =)

Based on statements round these parts lately I'd guess that's GT206.

DegustatoR · Oct 16, 2008

ShaidarHaran said:
Based on statements round these parts lately I'd guess that's GT206.

Then this would lead to two things:
1. GT206 is not GT200b.
2. GT206 doesn't support GDDR5? Why in the hell would they need 384-bit bus if it does?

ShaidarHaran · Oct 16, 2008

DegustatoR said:
Then this would lead to two things:
1. GT206 is not GT200b.
2. GT206 doesn't support GDDR5? Why in the hell would they need 384-bit bus if it does?

1) quite possible. I'm not sold on the idea that GT200b==GT206, it was just a theory
2) I don't think that conclusion can be drawn based on a single SKU...

AnarchX · Oct 16, 2008

DegustatoR said:
Quadro CX
CUDA Parallel Processor Cores 192
Memory Interface 384-bit
GT200-based? Or not? =)

Device ID is a GT200 one.

marllt2 · Oct 18, 2008

DegustatoR said:
1. GT206 is not GT200b.

So, does the GT200b exist?

Or was it a speculative "journalistic" codename, based on the G92 -> G92b?

igg · Oct 18, 2008

@marllt2: According to some people around here it's shipping in tesla cards.

Arun · Oct 20, 2008

I suspect it's also the foundation of the Quadro CX. In fact, I wouldn't be surprised if that was the exact same GPU bin as for a potential GX2... (384-bit & 150W TDP are pretty strong hints towards that) - too bad the clocks aren't public, if shaders were at >1300MHz we'd have our answer...

DegustatoR · Oct 20, 2008

Arun, as AnarchX said, it has GT200 device id.

igg · Oct 20, 2008

@DegustatoR: I think the clock would indicate whether it's GT200b/GT206 or another GT200 derivative (like the GTX260 chip which also has a different memory interface).

Arun · Oct 20, 2008

DegustatoR: And so does G92b AFAICT... what's your point?

Of course, that doesn't answer the GT206 mystery...

BTW, tentative GT21x line-up possibilities:
1T|40A|1R -> 0.2TFlops+ -> GT218/???
3T|120A|3R -> 0.6TFlops+ -> GT216/Late March
6T|240A|6R -> 1.2TFlops+ -> GT214/Early May
12T|480A|12R -> 2.8TFlops+ -> GT212/Late June

OR

1T|32A|1R
4T|128A|3R
8T|256A|6R
16T|512A|12R

OR

2T|48A|2R
4T|128A|3R
8T|256A|6R
12T|480A|12R

OR

...

In the first possibility, the ALU ratio might seem high until you add this little catch.
GT212 ALUs: 8 MADDs, 8 MULs/2SFU/2DP
GT214+ ALUs: 8 MADDs, 4 MULs/1SFU/1DP
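
As a rough sanity check of the TFlops figures in the first line-up, here is a minimal sketch. It assumes the "8 MADDs, N MULs" figures are per group of 8 ALUs, counts a MADD as 2 flops and a MUL as 1, and assumes a 2.0GHz shader clock. That clock is not stated anywhere in the thread; it is simply the value that makes the quoted numbers line up, and the function and table names are made up for illustration.

```python
# Peak-throughput sketch for the first GT21x line-up possibility.
# ASSUMPTIONS (not from any official spec): 8 ALUs per group, and a
# hypothetical 2.0 GHz shader clock inferred from the quoted TFlops figures.
SHADER_CLOCK_GHZ = 2.0
ALUS_PER_GROUP = 8


def peak_tflops(alus, madds_per_group, muls_per_group,
                clock_ghz=SHADER_CLOCK_GHZ):
    """Peak TFlops, counting a MADD as 2 flops and a MUL as 1 flop."""
    flops_per_group_per_clock = 2 * madds_per_group + muls_per_group
    groups = alus / ALUS_PER_GROUP
    return groups * flops_per_group_per_clock * clock_ghz / 1000.0


# (ALU count, MADDs per group, MULs per group) for each rumoured chip.
LINEUP = {
    "GT218": (40, 8, 4),
    "GT216": (120, 8, 4),
    "GT214": (240, 8, 4),
    "GT212": (480, 8, 8),  # the "catch": full-rate MUL on the top part only
}

if __name__ == "__main__":
    for chip, (alus, madds, muls) in LINEUP.items():
        print(f"{chip}: {peak_tflops(alus, madds, muls):.2f} TFlops")
```

Under those assumptions the full-rate MUL is what takes GT212 past a plain doubling of GT214.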

David Kirk said very explicitly in one of his uni courses that they could fiddle with the MADD vs SFU ratio as they saw fit, and as the ALU-TMU ratio increases it makes a lot of sense to reduce it in my mind. This would also be fairly simple, and if you truly tied the SFU & DP units together it would only result in completely negligible limitations. This would also result in the MUL not being wasted *at all* in graphics, since for register file access reasons I won't go into here it is not realistic to expect more than half a MUL to be exposed anyway.

I also think the first scenario is more likely because it is likely more practical for them to tweak that rather than the SP-TMU ratio. As for why they'd make such big one-generation jumps in the ALU ratio, remember it's much easier to be pad-limited on 40nm for a given amount of memory bandwidth, and bandwidth requirements obviously scale faster with TMUs/ROPs than ALUs. A wider memory bus also increases the ratio of non-digital functionality which doesn't scale, and that's obviously not a good thing.

As usual, I expect to be horribly wrong here and to be massively disappointed by NVIDIA's frequent failure to come up with a coherent roadmap and by their fundamental misunderstanding of the difference between gross margins and gross profit. Oh well!

DegustatoR · Oct 21, 2008

Arun said:
DegustatoR: And so does G92b AFAICT... what's your point? Of course, that doesn't answer the GT206 mystery...

The point is that it's the same chip -)
It could be "GT200b", sure, but it's still 10 24/8 TPCs, 512-bit bus whatever you want to call it. Otherwise it would have another device id in the drivers.
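
To make the "same chip" argument concrete, here is a small sketch that maps the Quadro CX figures onto GT200 building blocks. The 24 SPs and 8 TMUs per TPC come from the "10 24/8 TPCs" description above; the 64 bits of bus per memory partition is an assumption (512-bit divided by 8 partitions), and the function name is hypothetical.

```python
# Map a core count and bus width onto GT200-style building blocks.
# ASSUMPTION: each memory partition contributes 64 bits (512-bit / 8).
SPS_PER_TPC = 24
TMUS_PER_TPC = 8
BITS_PER_PARTITION = 64


def describe_config(cores, bus_bits):
    """Return the TPC, TMU and memory-partition counts for a given SKU."""
    tpcs, leftover_cores = divmod(cores, SPS_PER_TPC)
    partitions, leftover_bits = divmod(bus_bits, BITS_PER_PARTITION)
    if leftover_cores or leftover_bits:
        raise ValueError("not a whole number of TPCs or memory partitions")
    return {
        "tpcs": tpcs,
        "tmus": tpcs * TMUS_PER_TPC,
        "partitions": partitions,
    }


if __name__ == "__main__":
    print("Full GT200:", describe_config(240, 512))
    print("Quadro CX: ", describe_config(192, 384))
```

Both the 192 cores and the 384-bit interface divide evenly into these units, which fits a cut-down GT200 but does not prove it.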

Arun said:
BTW, tentative GT21x line-up possibilities:
1T|40A|1R -> 0.2TFlops+ -> GT218/???
3T|120A|3R -> 0.6TFlops+ -> GT216/Late March
6T|240A|6R -> 1.2TFlops+ -> GT214/Early May
12T|480A|12R -> 2.8TFlops+ -> GT212/Late June

I think you're running a bit ahead of time =)

From my point of view if NV wants to be competitive with their GT21x parts (presuming that's GT20x architecture on 40nm) they'll need to do some rethinking and rebalancing of G8x architecture. Otherwise they'll end up being slower with the same complexity or even with more transistors.
I don't think that we'll see the return of 384-bit bus in GT21x chips -- 256-bit GDDR5 should be enough for them.

GT216 is probably a G94 replacement (128-bit GDDR5, 128 SP / 32 TMU?), GT212 is a GT200 replacement (256-bit GDDR5, 384 SP / 96 TMU?) and GT206 is a G92b/GT200 replacement for 9800GTX+/GTX260 parts (55nm, 256-bit GDDR5, 192 SP / 64 TMU?). Plus GT200b which should replace GTX280 and add GTX290 on top of the line.

But as i've said they'd probably want to do something with their SMs in GT21x parts otherwise they'll end up slower per transistor than RV8x0 line. Plus they need to fix AA and add 10.1 support maybe?

Arun said:
In the first possibility, the ALU ratio might seem high until you add this little catch.
GT212 ALUs: 8 MADDs, 8 MULs/2SFU/2DP
GT214+ ALUs: 8 MADDs, 4 MULs/1SFU/1DP

I really, REALLY don't think that going forward with separate DP units is the right thing to do. They may do this for the top-end part to be used in Teslas but for anything below that they'll probably go with version 1.2 CUDA compute capability without DP support.

igg · Oct 21, 2008

NordicHardware has a new article about GT206/212/216:

You have all heard about the GT206, also known as GT200b in the press. As we predicted back in May, this would be a part of the Fall/Winter refresh and it is due any day now. Specifications are of course sparse but expect a tweaked GT200 core running cooler and faster, it's all good.

After GT206, there is a GT212 core on roadmaps. There are rumors that this will be the first NVIDIA chip with GDDR5, but I wouldn't bet on it. It should arrive in Q2 2009.

Comment: According to Elsa slides GT212 will be produced in 40 nm already.

Sources have informed us of another chip that is in the works. NVIDIA is also working on GT216 which is said to be the first chip to reach the market using TSMC's 40nm process. It will go up against AMD's RV870 chip and should hit the market at about the same time. That time is late Q2 2009, or early Q3. There are rumors that this chip will be DirectX 11 compatible, which would put it even with RV870 in terms of DX support.

However, Arun kind of disagrees in his post:

Nice, that cens article is the first one I've seen mention this correctly, although they got most of their other facts wrong: GT216 will be the first NVIDIA 40nm GPU. However, it won't replace GT200, and it should be out in late Q1, not late Q2 (the latter is actually for the chip replacing GT200, so they likely just got some stuff confused). To get back on topic, presumably the roadmap calls for AMD's first 40nm chip to come out before that GT200 replacement too but after GT216, however who the hell knows at this point.

Arun · Oct 21, 2008

DegustatoR said:
The point is that it's the same chip -)
It could be "GT200b", sure, but it's still 10 24/8 TPCs, 512-bit bus whatever you want to call it. Otherwise it would have another device id in the drivers.

Yes, it's still the same thing fundamentally, no reason for it to have major changes. Whether GT200b and GT206 are the same thing is another debate completely, I still think it's more likely that they are not and GT206 is an ultra-low-end chip aimed at replacing G98/G86 in the Montevina Refresh timeframe but we'll see.

I think you're running a bit ahead of time =)

Is that really a problem?

From my point of view if NV wants to be competitive with their GT21x parts (presuming that's GT20x architecture on 40nm) they'll need to do some rethinking and rebalancing of G8x architecture. Otherwise they'll end up being slower with the same complexity or even with more transistors.

Okay let me put it this way: NVIDIA's perf/[transistor*mhz] is quite fine. What isn't fine is their transistor density and their clock speeds; the latter is in part because of the monstrous size of the chip which causes variability issues, but the former is very much both a failure *and* a design decision.

Part of their goal very very likely was to improve yields by reducing density, but I am skeptical they've got above-average yields for a chip of that size. I think they either screwed up at the implementation level or just overestimated the effect of density on variability (which clearly is still a big problem given GT200's awful clocks) and thus missed their original clock targets.

AMD's approach is to have a much denser but also more regular layout, combined with fine-grained redundancy to reduce the average impact of defects. Seemingly practice has proven their approach superior (although obviously they can't apply it universally). I would also be very interested in knowing the leakage vs performance characteristics of NVIDIA's gates versus ATI's; I quite suspect NV is lower on the performance curve, which is the kind of thing that seems to make sense on paper but might not in practice.

I don't think that we'll see the return of 384-bit bus in GT21x chips -- 256-bit GDDR5 should be enough for them.

It seems GT212 might be the only GDDR5 chip, unfortunately.

But as i've said they'd probably want to do something with their SMs in GT21x parts otherwise they'll end up slower per transistor than RV8x0 line. Plus they need to fix AA and add 10.1 support maybe?

Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations). Regarding SM density, I think the RTL itself is fine, it's more of a density issue. I also think my proposed half-SFU solution would be a good way to improve perf/mm² slightly.

I really, REALLY don't think that going forward with separate DP units is the right thing to do. They may do this for the top-end part to be used in Teslas but for anything below that they'll probably go with version 1.2 CUDA compute capability without DP support.

In the grand scheme of things, a single FP64 MADD unit is pretty damn cheap. And changing your 24x24 MADD units into 27x27 or 32x32 ones isn't free either, so for basic DP support along with proper denormal support etc. this isn't such an awful solution. I agree however that there is no good reason to keep it on the low-end parts such as GT218, and I wouldn't be surprised if their approach changed in the DX11 generation anyway (which is where most of the R&D dollars are right now obviously).

DegustatoR · Oct 21, 2008

Arun said:
GT206 is an ultra-low-end chip aimed at replacing G98/G86 in the Montevina Refresh timeframe but we'll see

Why would they want to replace ultra-low-end now?

Arun said:
Okay let me put it this way: NVIDIA's perf/[transistor*mhz] is quite fine. What isn't fine is their transistor density and their clock speeds; the latter is in part because of the monstrous size of the chip which causes variability issues, but the former is very much both a failure *and* a design decision.

Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.
Considering that RV770 is dangerously close to GT200 in performance i'd say that they definitely have an issue with their perf/transistor ratio right now.
But maybe it's an issue of GT200 more than an issue of G8x architecture.

Arun said:
It seems GT212 might be the only GDDR5 chip, unfortunately.

Well, if GT206 isn't a mainstream part then the next candidate is GT212, yeah.

Arun said:
Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations).

The thing is that if GT212 (GT216 is probably a low-end chip being the first on 40nm) will be out in 2Q09 then there's no point for them to support anything less than DX11 in it. I even think that they probably should scrap any hi-end part they have planned in the GT21x line and use it as a guinea pig for 40nm process while bringing GT30x DX11 stuff closer.
They're late on almost every front and it's time to do some roadmap rearranging imho.

Jawed · Oct 21, 2008

Arun said:
AMD's approach is to have a much denser but also more regular layout,

What does "more regular" mean? What's the advantage? What's more regular?

Jawed

Jawed · Oct 21, 2008

DegustatoR said:
Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.

Those sizes sound too large comparing 954M versus 1.4B transistors. So, how did you work that out?

Jawed

marllt2 · Oct 22, 2008

Arun said:
Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations).

Could you remind us what Mr Hara said about that please?

DegustatoR · Oct 22, 2008

Jawed said:
Those sizes sound too large comparing 954M versus 1.4B transistors. So, how did you work that out?

Yeah, my math is probably wrong there.
530 is correct, but at 55nm it would be 380mm^2 not 450.
Still pretty big compared to RV770 considering their performance figures.

Arun · Oct 22, 2008

DegustatoR said:
Why would they want to replace ultra-low-end now?

Because G98 is a POS and they need to compete against RV710?

Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.

What? If we exclude I/O & Analogue, I think it's pretty clear that ~260 * (1400+ / 965) = 380mm²+ on 55nm. This could be combined with 384-bit GDDR5 and higher clocks, which would result in similar perf/mm² (or, more accurately, similar perf/mm² to a hypothetical ATI part with the same performance target!)
[Pre-Publish EDIT: Oh, just noticed you corrected that yourself now, okay then]
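
The arithmetic behind those die-size figures can be sketched as follows. It takes the ~260mm² and 965M numbers for RV770 and the 1400M for GT200 from the post above, and it assumes ideal quadratic area scaling between 55nm and 65nm, which real chips do not achieve (I/O and analogue barely shrink). The function names are made up for illustration.

```python
# Die-size estimate: GT200's transistor count at RV770's transistor density.
# Figures are the approximate ones quoted in the thread, not official specs.
RV770_AREA_MM2 = 260.0        # ~55nm, as quoted above
RV770_TRANSISTORS_M = 965.0   # millions, as quoted above
GT200_TRANSISTORS_M = 1400.0  # millions, "1400+" in the post


def area_at_reference_density(transistors_m, ref_area_mm2, ref_transistors_m):
    """Area a chip would need if it matched the reference chip's density."""
    return ref_area_mm2 * transistors_m / ref_transistors_m


def rescale_to_node(area_mm2, from_nm, to_nm):
    """ASSUMPTION: ideal scaling, area proportional to feature size squared."""
    return area_mm2 * (to_nm / from_nm) ** 2


if __name__ == "__main__":
    at_55nm = area_at_reference_density(
        GT200_TRANSISTORS_M, RV770_AREA_MM2, RV770_TRANSISTORS_M)
    at_65nm = rescale_to_node(at_55nm, from_nm=55, to_nm=65)
    print(f"55nm: ~{at_55nm:.0f} mm^2, 65nm: ~{at_65nm:.0f} mm^2")
```

This lands close to the 380mm² and 530mm² figures being discussed, so the two ends of the estimate are at least consistent with each other.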

Considering that RV770 is dangerously close to GT200 in performance i'd say that they definitely have an issue with their perf/transistor ratio right now.

Notice that I said [transistor*mhz]... G92b can reach clocks very near HD4870, so I feel it's fair to say that's not a bad metric to consider.

But maybe it's an issue of GT200 more than an issue of G8x architecture.

My point is it's an issue of synthesis, not the actual RTL-level architecture (although the ALU-TEX ratio and the choice to stick to GDDR3 obviously don't help perf/mm² much either).

The thing is that if GT212 (GT216 is probably a low-end chip being the first on 40nm) will be out in 2Q09 then there's no point for them to support anything less than DX11 in it. I even think that they probably should scrap any hi-end part they have planned in the GT21x line and use it as a guinea pig for 40nm process while bringing GT30x DX11 stuff closer.

Good idea: let's quadruple risk for a company that badly needs to improve their position and can't afford any more screw ups! Given how a certain semi-risky decision on G96/G98 turned out, I'm sure Jen-Hsun will LOVE that idea!

They're late on almost every front and it's time to do some roadmap rearranging imho.

It's easy to forget that moving boxes on pieces of paper doesn't allow you to change reality. If your kind of strategy was pursued, NVIDIA could have canned G71/G72/G73 since G80 was originally scheduled to come up in a very similar timeframe. But new architectures are very prone to delays, and that kind of risk is absolutely senseless IMO. I think their current roadmap is pretty much as follows:
Q1: First DX10 or DX10.1 40nm chip (mid-range).
Q2: Other DX10 or DX10.1 40nm chips (family).
Q3: First DX11 40nm chip (ultra-high-end).
Q4: Other DX11 40nm chips (family).

It's perfectly plausible that each of these steps gets delayed by one quarter or more though, and it is not predictable which will be most delayed. By trying to get DX11 out of the door quicker, you just risk not having a competitive product in the market for an extra 6 months. These teams are already parallel and we're near enough tape-out even for DX11 that more people suddenly dedicated to the project just risks delaying everything, so I'm not sure I see the point.

marllt2 said:
Could you remind us what Mr Hara said about that please?

He indicated 40nm would be in H1 while the new arch would be in H2.

Jawed said:
What does "more regular" mean? What's the advantage? What's more regular?

I wasn't thinking specifically of this company or that approach, but this is not a bad start to see the kind of thing I mean: http://www.tela-inc.com/ - what I find particularly cool with Tela's tech, BTW, is if you can reduce leakage you can use transistors higher on the performance-leakage curve, which also means you can improve your perf/mm² more than the raw density impact of the approach!
