ELSA hints GT206 and GT212

Discussion in 'Architecture and Products' started by AnarchX, Sep 9, 2008.

  1. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Quadro CX
    CUDA Parallel Processor Cores 192
    Memory Interface 384-bit

    GT200-based? Or not? =)
     
  2. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    Based on statements round these parts lately I'd guess that's GT206.
     
  3. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Then this would lead to two things:
    1. GT206 is not GT200b.
    2. GT206 doesn't support GDDR5? Why in the hell would they need 384-bit bus if it does?
     
  4. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    1) quite possible. I'm not sold on the idea that GT200b==GT206, it was just a theory
    2) I don't think that conclusion can be drawn based on a single SKU...
     
  5. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34

    Device ID is a GT200 one.
     
  6. marllt2

    Newcomer

    Joined:
    Oct 18, 2008
    Messages:
    7
    Likes Received:
    0
    Location:
    France
    So, does the GT200b exist?

    Or was it a speculative "journalistic" codename, based on the G92 -> G92b precedent?
     
  7. igg

    igg
    Newcomer

    Joined:
    May 16, 2008
    Messages:
    63
    Likes Received:
    0
    @marllt2: According to some people around here it's shipping in Tesla cards.
     
  8. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I suspect it's also the foundation of the Quadro CX. In fact, I wouldn't be surprised if that was the exact same GPU bin as for a potential GX2... (384-bit & 150W TDP are pretty strong hints towards that) - too bad the clocks aren't public, if shaders were at >1300MHz we'd have our answer...
     
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Arun, as AnarchX said, it has GT200 device id.
     
  10. igg

    igg
    Newcomer

    Joined:
    May 16, 2008
    Messages:
    63
    Likes Received:
    0
    @DegustatoR: I think the clock would indicate whether it's GT200b/GT206 or another GT200 derivative (like the GTX260 chip, which also has a different memory interface).
     
  11. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    DegustatoR: And so does G92b AFAICT... what's your point? :) Of course, that doesn't answer the GT206 mystery...

    BTW, tentative GT21x line-up possibilities:
    1T|40A|1R -> 0.2TFlops+ -> GT218/???
    3T|120A|3R -> 0.6TFlops+ -> GT216/Late March
    6T|240A|6R -> 1.2TFlops+ -> GT214/Early May
    12T|480A|12R -> 2.8TFlops+ -> GT212/Late June

    OR

    1T|32A|1R
    4T|128A|3R
    8T|256A|6R
    16T|512A|12R

    OR

    2T|48A|2R
    4T|128A|3R
    8T|256A|6R
    12T|480A|12R

    OR

    ...


    In the first possibility, the ALU ratio might seem high until you add this little catch.
    GT212 ALUs: 8 MADDs, 8 MULs/2SFU/2DP
    GT214+ ALUs: 8 MADDs, 4 MULs/1SFU/1DP
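
    A rough sanity check of the TFlops figures in the first line-up, as a sketch only. It assumes each ALU issues one MADD (2 flops) per clock plus a fractional MUL (1 flop, scaled by the MUL-to-MADD ratio), and a 2.0 GHz shader clock. That clock is not stated anywhere in this thread; it is picked purely because it reproduces the quoted numbers. The function and constant names are hypothetical.

```python
# Peak single-precision throughput sketch for the first line-up above.
# ASSUMPTION: 2.0 GHz shader clock (hypothetical, not taken from the post).
ASSUMED_SHADER_CLOCK_HZ = 2.0e9


def peak_tflops(alus, muls_per_8_madds, clock_hz=ASSUMED_SHADER_CLOCK_HZ):
    """A MADD counts as 2 flops; the co-issued MUL as 1 flop, scaled by its ratio."""
    flops_per_alu_per_clock = 2.0 + muls_per_8_madds / 8.0
    return alus * flops_per_alu_per_clock * clock_hz / 1e12


# (name, ALU count, MULs per 8 MADDs) as listed in the post
lineup = [
    ("GT218", 40, 4),
    ("GT216", 120, 4),
    ("GT214", 240, 4),
    ("GT212", 480, 8),
]

for name, alus, muls in lineup:
    print(f"{name}: {peak_tflops(alus, muls):.2f} TFlops")
```

    Under these assumptions the four parts come out at 0.20, 0.60, 1.20 and 2.88 TFlops, which is consistent with the "+" figures quoted and shows why the full MUL ratio on GT212 pushes it past a simple 2x of GT214.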

    David Kirk said very explicitly in one of his uni courses that they could fiddle with the MADD vs SFU ratio as they saw fit, and as the ALU-TMU ratio increases it makes a lot of sense to reduce it in my mind. This would also be fairly simple: if you truly tied the SFU & DP units together, it would only result in completely negligible limitations. This would also result in the MUL not being wasted *at all* in graphics, since for register file access reasons I won't go into here it is not realistic to expect more than half a MUL to be exposed anyway.

    I also think the first scenario is more likely because it is likely more practical for them to tweak that rather than the SP-TMU ratio. As for why they'd make such big one-generation jumps in the ALU ratio, remember it's much easier to be pad-limited on 40nm for a given amount of memory bandwidth, and bandwidth requirements obviously scale faster with TMUs/ROPs than ALUs. A wider memory bus also increases the ratio of non-digital functionality which doesn't scale, and that's obviously not a good thing.

    As usual, I expect to be horribly wrong here and to be massively disappointed by NVIDIA's frequent failure to come up with a coherent roadmap and by their fundamental misunderstanding of the difference between gross margins and gross profit. Oh well! ;)
     
  12. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    The point is that it's the same chip -)
    It could be "GT200b", sure, but it's still 10 24/8 TPCs, 512-bit bus whatever you want to call it. Otherwise it would have another device id in the drivers.

    I think you're running a bit ahead of time =)

    From my point of view if NV wants to be competitive with their GT21x parts (presuming that's GT20x architecture on 40nm) they'll need to do some rethinking and rebalancing of G8x architecture. Otherwise they'll end up being slower with the same complexity or even with more transistors.
    I don't think that we'll see the return of 384-bit bus in GT21x chips -- 256-bit GDDR5 should be enough for them.
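
    The bus-width argument comes down to simple arithmetic: peak bandwidth is bus width in bytes times the per-pin data rate. A minimal sketch follows; the data rates are illustrative round numbers assumed for this example, not figures from this thread, and the function name is hypothetical.

```python
# Peak memory bandwidth = (bus width in bits / 8) * per-pin data rate.
# ASSUMPTION: the data rates below are illustrative round numbers,
# not figures quoted anywhere in this thread.
def bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    return bus_width_bits / 8 * data_rate_gbps_per_pin


configs = [
    ("512-bit GDDR3 @ 2.2 Gbps", 512, 2.2),
    ("384-bit GDDR3 @ 2.2 Gbps", 384, 2.2),
    ("256-bit GDDR5 @ 3.6 Gbps", 256, 3.6),
]

for label, width, rate in configs:
    print(f"{label}: {bandwidth_gb_s(width, rate):.1f} GB/s")
```

    With these assumed rates, 256-bit GDDR5 lands above 384-bit GDDR3 and within reach of 512-bit GDDR3, which is the reasoning behind expecting narrower buses on GDDR5 parts.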

    GT216 is probably a G94 replacement (128-bit GDDR5, 128 SP / 32 TMU?), GT212 is a GT200 replacement (256-bit GDDR5, 384 SP / 96 TMU?) and GT206 is a G92b/GT200 replacement for 9800GTX+/GTX260 parts (55nm, 256-bit GDDR5, 192 SP / 64 TMU?). Plus GT200b which should replace GTX280 and add GTX290 on top of the line.

    But as I've said, they'd probably want to do something with their SMs in GT21x parts, otherwise they'll end up slower per transistor than the RV8x0 line. Plus they need to fix AA and add 10.1 support maybe?

    I really, REALLY don't think that going forward with separate DP units is the right thing to do. They may do this for the top-end part to be used in Teslas but for anything below that they'll probably go with version 1.2 CUDA compute capability without DP support.
     
  13. igg

    igg
    Newcomer

    Joined:
    May 16, 2008
    Messages:
    63
    Likes Received:
    0
    NordicHardware has a new article about GT206/212/216:
    Comment: According to Elsa slides GT212 will be produced in 40 nm already.
    However, Arun kind of disagrees in his post:
     
    #253 igg, Oct 21, 2008
    Last edited by a moderator: Oct 21, 2008
  14. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Yes, it's still the same thing fundamentally; there's no reason for it to have major changes. Whether GT200b and GT206 are the same thing is another debate completely. I still think it's more likely that they are not, and that GT206 is an ultra-low-end chip aimed at replacing G98/G86 in the Montevina Refresh timeframe, but we'll see.

    Is that really a problem? :)

    Okay let me put it this way: NVIDIA's perf/[transistor*mhz] is quite fine. What isn't fine is their transistor density and their clock speeds; the latter is in part because of the monstrous size of the chip which causes variability issues, but the former is very much both a failure *and* a design decision.

    Part of their goal very very likely was to improve yields by reducing density, but I am skeptical they've got above-average yields for a chip of that size. I think they either screwed up at the implementation level or just overestimated the effect of density on variability (which clearly is still a big problem given GT200's awful clocks) and thus missed their original clock targets.

    AMD's approach is to have a much denser but also more regular layout, combined with fine-grained redundancy to reduce the average impact of defects. Seemingly practice has proven their approach superior (although obviously they can't apply it universally). I would also be very interested in knowing the leakage vs performance characteristics of NVIDIA's gates versus ATI's; I quite suspect NV is lower on the performance curve, which is the kind of thing that seems to make sense on paper but might not in practice.

    It seems GT212 might be the only GDDR5 chip, unfortunately.

    Yes, AA & 10.1 would be a good thing on GT21x (which doesn't support DX11 AFAIK, based on my parsing of public statements from Michael Hara of Investor Relations). Regarding SM density, I think the RTL itself is fine, it's more of a density issue. I also think my proposed half-SFU solution would be a good way to improve perf/mm² slightly.

    In the grand scheme of things, a single FP64 MADD unit is pretty damn cheap. And changing your 24x24 MADD units into 27x27 or 32x32 ones isn't free either, so for basic DP support along with proper denormal support etc. this isn't such an awful solution. I agree however that there is no good reason to keep it on the low-end parts such as GT218, and I wouldn't be surprised if their approach changed in the DX11 generation anyway (which is where most of the R&D dollars are right now obviously).
     
  15. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Why would they want to replace ultra-low-end now?

    Even with RV770 transistor density GT200 would still have 530mm^2 die size @65nm and ~450mm^2 @55nm.
    Considering that RV770 is dangerously close to GT200 in performance, I'd say that they definitely have an issue with their perf/transistor ratio right now.
    But maybe it's an issue of GT200 more than an issue of G8x architecture.

    Well, if GT206 isn't a mainstream part then the next candidate is GT212, yeah.

    The thing is that if GT212 (GT216 is probably a low-end chip, being the first on 40nm) will be out in 2Q09 then there's no point for them to support anything less than DX11 in it. I even think that they probably should scrap any high-end part they have planned in the GT21x line and use it as a guinea pig for the 40nm process while bringing GT30x DX11 stuff closer.
    They're late on almost every front and it's time to do some roadmap rearranging imho.
     
    #255 DegustatoR, Oct 21, 2008
    Last edited by a moderator: Oct 21, 2008
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    What does "more regular" mean? What's the advantage? What's more regular?

    Jawed
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Those sizes sound too large comparing 954M versus 1.4B transistors. So, how did you work that out?

    Jawed
     
  18. marllt2

    Newcomer

    Joined:
    Oct 18, 2008
    Messages:
    7
    Likes Received:
    0
    Location:
    France
    Could you remind us what Mr Hara said about that, please?
     
  19. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,244
    Likes Received:
    3,408
    Yeah, my math is probably wrong there.
    530 is correct, but at 55nm it would be 380mm^2 not 450.
    Still pretty big compared to RV770 considering their performance figures.
     
  20. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Because G98 is a POS and they need to compete against RV710?

    What? If we exclude I/O & Analogue, I think it's pretty clear that ~260 * (1400+ / 965) = 380mm²+ on 55nm. This could be combined with 384-bit GDDR5 and higher clocks, which would result in similar perf/mm² (or, more accurately, similar perf/mm² to a hypothetical ATI part with the same performance target!)
    [Pre-Publish EDIT: Oh, just noticed you corrected that yourself now, okay then]
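
    The die-size estimates traded in this exchange can be reproduced with a short sketch. It scales a reference die by transistor count and then by the square of the feature-size ratio. Both steps assume equal density and an ideal shrink, which real chips do not achieve; the function names are hypothetical and the inputs (260mm², 965M, 1400M) are the ones used in the formula above.

```python
# Die-area estimate sketch using the numbers quoted in this thread.
# ASSUMPTION: equal transistor density and an ideal (squared) node shrink.
def scaled_area_mm2(ref_area_mm2, ref_transistors_m, target_transistors_m):
    """Scale a reference die area linearly by transistor count."""
    return ref_area_mm2 * target_transistors_m / ref_transistors_m


def rescale_node(area_mm2, from_nm, to_nm):
    """Rescale an area between process nodes by the squared feature-size ratio."""
    return area_mm2 * (to_nm / from_nm) ** 2


# ~260 mm^2 reference die with ~965M transistors at 55nm, target of 1400M
area_55 = scaled_area_mm2(260, 965, 1400)
area_65 = rescale_node(area_55, from_nm=55, to_nm=65)

print(f"55nm: {area_55:.0f} mm^2")
print(f"65nm: {area_65:.0f} mm^2")
```

    This gives roughly 377mm² at 55nm and roughly 527mm² at 65nm, in line with the ~380mm² and ~530mm² figures mentioned in the thread.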

    Notice that I said [transistor*mhz]... G92b can reach clocks very near HD4870, so I feel it's fair to say that's not a bad metric to consider.

    My point is it's an issue of synthesis, not the actual RTL-level architecture (although the ALU-TEX ratio and the choice to stick to GDDR3 obviously don't help perf/mm² much either).

    Good idea: let's quadruple risk for a company that badly needs to improve their position and can't afford any more screw ups! Given how a certain semi-risky decision on G96/G98 turned out, I'm sure Jen-Hsun will LOVE that idea! :p
    It's easy to forget that moving boxes on pieces of paper doesn't allow you to change reality. If your kind of strategy was pursued, NVIDIA could have canned G71/G72/G73 since G80 was originally scheduled to come up in a very similar timeframe. But new architectures are very prone to delays, and that kind of risk is absolutely senseless IMO. I think their current roadmap is pretty much as follows:
    Q1: First DX10 or DX10.1 40nm chip (mid-range).
    Q2: Other DX10 or DX10.1 40nm chips (family).
    Q3: First DX11 40nm chip (ultra-high-end).
    Q4: Other DX11 40nm chips (family).

    It's perfectly plausible that each of these steps gets delayed by one quarter or more though, and it is not predictable which will be most delayed. By trying to get DX11 out of the door quicker, you just risk not having a competitive product in the market for an extra 6 months. These teams are already parallel and we're near enough tape-out even for DX11 that more people suddenly dedicated to the project just risks delaying everything, so I'm not sure I see the point.

    He indicated 40nm would be in H1 while the new arch would be in H2.

    I wasn't thinking specifically of this company or that approach, but this is not a bad start to see the kind of thing I mean: http://www.tela-inc.com/ - what I find particularly cool with Tela's tech, BTW, is if you can reduce leakage you can use transistors higher on the performance-leakage curve, which also means you can improve your perf/mm² more than the raw density impact of the approach! :)
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.