Llano IGP vs SNB IGP vs IVB IGP

Discussion in 'Architecture and Products' started by AnarchX, Oct 29, 2010.

  1. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    How do you think they will compare?

    Llano:
    - 32nm
    - 400SPs (5D VLIW) @ up to 600MHz
    - dual-channel DDR3 @ ~ 1.6Gbps
    - mid 2011

    Intel Graphics HD 200:
    - 32nm
    - 12 EUs (4D MADDs?), doubled throughput over last generation, 4 TMUs, clocks up to 1.35GHz
    - Direct3D 10.1 support, OpenCL, DirectCompute
    - connected to 8MiB LL-cache
    - dual-channel DDR3 @ ~ 1.6Gbps
    - early 2011

    Ivy Bridge Graphics:
    - 22nm
    - 16 EUs according to Intel
    - Direct3D 11 support
    - stacked DRAM?
    - early 2012
     
    #1 AnarchX, Oct 29, 2010
    Last edited by a moderator: Apr 14, 2011
  2. Chabi

    Newcomer

    Joined:
    Aug 2, 2010
    Messages:
    117
    Likes Received:
    0
    Location:
    Hungary
    Is the SNB IGP OpenCL compatible?
     
  3. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    The EUs still can't do MAD. They can, however, do MAC (with a special accumulator reg), and, in contrast to the last generation, enable/disable accumulator update per instruction, which might make it easier to exploit this. Earlier EUs were 4D physical, 8D logical (well, they had a 4D mode, but such a 4D instruction still took 2 cycles), so it's possible (but I don't know) they are 8D physical now (which would explain the "double throughput", but maybe that quote was meant to describe something else).
    I'm quite sure there were 8 TMUs even for i965 already (though not sure what they could do per clock), and I certainly wouldn't expect SNB to have fewer (in theory, it could have more: since it appears some versions will have 6 EUs and others 12 EUs, it's possible, at least on paper, that the TMU block isn't shared).
    In any case, texture fillrate should be quite good even with 8 TMUs (possibly approaching Llano levels), with the caveat that I've no idea about FP16 etc. For flops, if that's 4D units, you're looking at ~120GFlops if you count that MAC as 2 ops. If that's 8D units, then that's twice that, which would begin to look nearly comparable to Llano.
    So for Ivy Bridge, if that basically doubles SNB graphics performance, that could be quite a challenge for Llano. Though of course there's a lot more to graphics performance than just ALUs/TMUs - one area where Intel was very weak was what AMD initially named HyperZ, things like early-z (though Intel can do this now), z buffer compression etc. to save bandwidth. I think though SNB improves this quite a bit, and the 8MB cache could give it a huge advantage in some situations since these chips are quite a bit bandwidth-challenged.
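
    The flops estimate above is just EU count times lanes per EU times ops per lane times clock. A minimal sketch, assuming 12 EUs at the 1.35GHz peak clock from the first post and counting a MAC as 2 ops. The function name, and the idea that every lane can issue a MAC every clock, are assumptions for illustration, not anything Intel has confirmed:

```python
def peak_gflops(units, lanes_per_unit, clock_ghz, ops_per_lane=2):
    """Theoretical peak: units * lanes * ops per lane per clock * clock in GHz."""
    return units * lanes_per_unit * ops_per_lane * clock_ghz


# Assumed SNB configuration: 12 EUs, 1.35GHz, MAC counted as 2 ops.
snb_4d = peak_gflops(12, 4, 1.35)  # EUs are 4 lanes wide
snb_8d = peak_gflops(12, 8, 1.35)  # EUs are 8 lanes wide

print(f"4D EUs: {snb_4d:.1f} GFLOPS")
print(f"8D EUs: {snb_8d:.1f} GFLOPS")
```

    At the full 1.35GHz the 4D case lands slightly above the ~120GFlops ballpark, and the 8D case is exactly double that.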
     
  5. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Next-Gen Fusions Trinity and Komodo: http://www.abload.de/img/amddesktop126q72.jpg

    - still 32nm
    - probably L3-Cache connection for IGP
    - probably increased die-size (Thuban level ~300mm²) which should allow increasing SIMDs from 6 to 10 (800SPs @ 5D, 640SPs @ 4D)
    - probably mid 2012 release
    - Komodo probably with 3 memory channels or GDDR5 sideport
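
    The SP counts in the list above follow from SIMD count times SPs per SIMD. A rough sketch, assuming the Evergreen-style layout of 16 VLIW units per SIMD (so 80 SPs per SIMD at 5D and 64 at 4D); the constant and helper names are made up for illustration:

```python
# Assumption: each SIMD holds 16 VLIW units, as in Evergreen-class GPUs.
VLIW_UNITS_PER_SIMD = 16


def stream_processors(simds, vliw_width):
    """Total SPs = SIMDs * VLIW units per SIMD * ALUs per VLIW unit."""
    return simds * VLIW_UNITS_PER_SIMD * vliw_width


llano_6_simds = stream_processors(6, 5)     # the 6-SIMD Llano assumption
trinity_10_5d = stream_processors(10, 5)    # 10 SIMDs, 5D VLIW
trinity_10_4d = stream_processors(10, 4)    # 10 SIMDs, 4D VLIW

print(llano_6_simds, trinity_10_5d, trinity_10_4d)
```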
     
    #5 AnarchX, Nov 9, 2010
    Last edited by a moderator: Nov 9, 2010
  6. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    I had a feeling that Zacate has 2 SIMDs with 80SPs total
     
  7. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    The topic is about higher performance APUs/CPU-IGP-chips: Llano IGP vs SNB IGP vs IVB IGP.
     
  8. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    Yes, of course.

    I see nothing suggesting this.

    .. except that Llano will not have 6 but 3 SIMD cores (240 ALUs).
    And I don't expect them to increase die size much, would be too costly to manufacture.

    My estimate is an increase from 3(*80) to 4(*64)


    AMD has never used non-power-of-2 memory buses before. I don't expect them to do it with Komodo either.
     
  9. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    And there won't be a sideport in a chip which does not contain a GPU.

    AMD's PDF document for the investor day:

    http://phx.corporate-ir.net/External.File?item=UGFyZW50SUQ9Njk3NDJ8Q2hpbGRJRD0tMXxUeXBlPTM=&t=1

     
    #9 hkultala, Nov 10, 2010
    Last edited by a moderator: Nov 11, 2010
  10. caveman-jim

    Regular

    Joined:
    Sep 19, 2005
    Messages:
    305
    Likes Received:
    0
    Location:
    Austin, TX
    "designed to couple with" doesn't prove the existence of sideport.
     
  11. keritto

    Newcomer

    Joined:
    Apr 3, 2009
    Messages:
    143
    Likes Received:
    0
    Komodo is listed as a CPU, and you should differentiate it from Llano and NG-Trinity, as can be seen in the slides :wink:

    Komodo is a CPU, and I'm guesstimating that it will probably be augmented with a GPU similar to the one used in the Ontario/Zacate APUs, up to 80SPs (5D-VLIW) but more probably 64SPs "3rd Gen DX11" 4D-VLIW, with the rest (TMUs:ROPs) unchanged from O/Z. My guess is that Komodo will probably address the lack of IGPs in the new chipsets and also make it more comparable to Intel's SB. And it will be socket compatible with Zambezi (AM3r2).

    As for the Trinity APU, the slides show 2-4 BD cores; I in fact hope for 4-6 BD cores and "3rd Gen DX11" (SI) with maybe some minor upgrade from 480SPs 5D (EG/"NI" shaders) in Llano to 640SPs 4D (SI shaders). But then maybe AMD will stay with 2-4 BD cores just so they can add the necessary 4MB of L3 cache instead of an extra 2 BD cores.

    Trinity
    2-4BD cores (4MB L2 cache)
    4MB L3 cache
    640SP (4D DX11 gen3)
    sFM1/sFS1

    or better (?)
    4-6BD cores (6MB L2 cache)
    no L3 cache
    640SP (4D DX11 gen3)
    sFM1/sFS1

    The second solution would certainly need less work to adapt the Llano-style APU design to the Trinity design.

    And does the GPU really benefit from an additional 4MB of L3, instead of the already large 6MB of L2 (total for six BDv1 cores) available in the HPC case? For most 3D/gaming work, Llano and probably Trinity will rely on cheap 128-bit DDR3-1866 memory giving about 30GB/s of bandwidth in total (shared with the CPU), which is probably even good enough for budget dual-display 1080p noAA/noAF gaming (considering the hoped-for 640SPs), or single 1080p 2AA/16AF?
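
    The ~30GB/s figure is bus width in bytes times transfer rate. A quick sketch, assuming a 128-bit (dual-channel) bus and reading "1866MHz" as 1866 MT/s; the function name is hypothetical:

```python
def ddr_bandwidth_gbs(bus_width_bits, mega_transfers_per_sec):
    """Peak bandwidth in GB/s = bytes per transfer * millions of transfers/s / 1000."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec / 1000


ddr3_1866 = ddr_bandwidth_gbs(128, 1866)  # the Llano/Trinity guess above
ddr3_1600 = ddr_bandwidth_gbs(128, 1600)  # the ~1.6Gbps case from the first post

print(f"DDR3-1866, 128-bit: {ddr3_1866:.1f} GB/s")
print(f"DDR3-1600, 128-bit: {ddr3_1600:.1f} GB/s")
```

    These are theoretical peaks, and the IGP still has to share them with the CPU cores.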
     
  12. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    More than 4 Bulldozer cores/2 Bulldozer modules would make it too big.
    It's still manufactured at 32nm, and it's not a high-end product, so it must not be too big/too expensive to manufacture.

    And I don't see L3 cache as a "necessary thing" for this market segment. With 2*2 MB L2 cache there is already plenty of cache.
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    I really don't see the 480SPs in Llano - not with the flop numbers AMD quoted. More like 240SP IMHO.

    Well, the advantage of L3 is that you can use it for graphics too - L2 being exclusive to the cpu cores. This also probably means you can make the L2 cache attached to the ROPs smaller if you've got shared L3 and it's still faster (as the gpu l2 cache wasn't that large). Clearly, for Phenom II / Athlon II the L3 cache did not really help THAT much - but that balance should shift towards the solution with L3 cache in terms of performance benefits / area if you can also use it for the graphic core. It might require some changes to the MC/graphic core though, which might be something AMD isn't willing to do (as they couldn't just use basically unchanged discrete gpu cores).
     
  14. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    What makes this an advantage?
     
  15. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    Yep.

    And the size of the GPU part of the chip also seems to indicate it has 240 shader ALUs, not 480.
     
  16. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,537
    Likes Received:
    962
    They said 500+ GFLOPS. That sounds to me like 480SPs @ ~550MHz or maybe 400SPs @ ~630MHz.

    240SPs at ~1040MHz just doesn't seem realistic, power-wise.

    [IMG]

    That GPU-part looks to be around 100mm², which is close to Redwood's size, but on 32nm.
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    The quote was 400-500 GFlops. And from how it was worded, it was for the whole chip. Which leaves 300-400 GFlops for the GPU. With 240SPs that gives you 625-830MHz. Sounds doable to me.
    You are right it looks quite big.
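
    The clock figures being traded here come from inverting the peak-flops formula: clock = GFLOPS / (SPs * 2), counting a MAD as 2 flops per SP per clock. A small sketch with a hypothetical helper name, run over the SP counts and GFLOPS targets mentioned in this thread:

```python
def required_clock_mhz(target_gflops, sps, ops_per_sp=2):
    """Clock in MHz needed for `sps` shader ALUs to reach `target_gflops`."""
    return target_gflops * 1000 / (sps * ops_per_sp)


# GPU-only targets discussed in the thread, against each rumoured SP count.
for sps in (240, 400, 480):
    clocks = [required_clock_mhz(g, sps) for g in (300, 400, 500)]
    print(sps, "SPs:", ", ".join(f"{c:.0f}MHz" for c in clocks))
```

    With 240SPs the 300-400 GFlops range needs roughly 625-833MHz, while 500 GFlops needs over 1GHz with 240SPs, 625MHz with 400SPs, and about 520MHz with 480SPs.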
     
  18. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,537
    Likes Received:
    962
    There was another comment during analyst day, where the guy said 500+ GFLOPS, worded in a way that makes me think it was just for the GPU. I don't have time right now, but I'll try to find it and link it later today.
     
  19. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,021
    Likes Received:
    119
    Even with 500+ GFlops for the GPU, shouldn't 400 SPs be more than sufficient? That would only need 625MHz. Shouldn't the 32nm SOI process actually allow clock increases over 40nm bulk? Granted, the structure doesn't really look like that. But it would be strange IMHO if there were so many SIMDs (hence increasing cost) but then they were clocked so low.
     
  20. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,537
    Likes Received:
    962
    400 SPs seems plausible, but 240 doesn't, IMO.

    I can't find a free transcript for Tuesday's analyst day, but I think the quote in question was during the Client platforms breakout session, for which the webcast is still available.
     