Llano IGP vs SNB IGP vs IVB IGP

Discussion in 'Architecture and Products' started by AnarchX, Oct 29, 2010.

  1. Karoshi

    Newcomer

    Joined:
    Aug 31, 2005
    Messages:
    181
    Likes Received:
    0
    Location:
    Mars
    Why are there no APU GPUs running at 2+ GHz?
     
  2. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    348
    Likes Received:
    27
    It's all about balance. Remember, we aren't talking about the 1980s, when components had passive cooling and drew 3W. We are already limited by cooling and power consumption.

    It's probably better to get 400 SPs at 650MHz than 200 SPs at 1300MHz. GPU code has extremely high parallelism, so adding more SPs is easier than clocking them higher.
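    Rough back-of-the-envelope, with made-up voltages (peak rate taken as SPs x clock x 2 ops for a MAD, dynamic power as ~units x f x V^2; nothing here is measured):

        /* 400 SPs @ 650MHz vs 200 SPs @ 1300MHz: same peak throughput,
         * but the faster clock generally needs a higher voltage. */
        #include <stdio.h>

        static double gflops(int sps, double mhz)
        {
            return sps * mhz * 2.0 / 1000.0;      /* 2 ops/clock for a MAD */
        }

        static double rel_power(int units, double mhz, double volts)
        {
            return units * mhz * volts * volts;   /* ~C*f*V^2, arbitrary units */
        }

        int main(void)
        {
            printf("400 SP @  650MHz: %.0f GFLOPS, power %.0f\n",
                   gflops(400, 650.0), rel_power(400, 650.0, 1.00));
            printf("200 SP @ 1300MHz: %.0f GFLOPS, power %.0f\n",
                   gflops(200, 1300.0), rel_power(200, 1300.0, 1.20));
            return 0;
        }

    Same 520 GFLOPS either way, but with those assumed voltages the narrow-and-fast configuration burns roughly 40% more power.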

    Nvidia does have high clock speeds for its SPs, but again, it's just the SPs; all the other blocks clock much lower. ATI's design calls for everything to run at the base clock. I guess they could change that, but it's not something that'll happen overnight.

    Even if the process technology, thermal and power limits, and costs of development allow clocking the GPU at 2GHz, does the design allow it?
     
  3. hkultala

    Regular

    Joined:
    May 22, 2002
    Messages:
    296
    Likes Received:
    38
    Location:
    Herwood, Tampere, Finland
    It seems Intel is finally at least developing an OpenCL implementation for their integrated GPUs:

    They just sent an email to the llvm-developers list, recruiting people to develop their LLVM-based OpenCL implementation:
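    From an application's point of view, such an implementation would simply register as another platform. A minimal host-side sketch using only standard OpenCL 1.x calls (nothing Intel-specific assumed), just to show what would become visible:

        #include <stdio.h>
        #include <CL/cl.h>

        int main(void)
        {
            cl_platform_id platforms[8];
            cl_uint count = 0;

            /* Ask the runtime/ICD loader which platforms are installed. */
            if (clGetPlatformIDs(8, platforms, &count) != CL_SUCCESS || count == 0) {
                fprintf(stderr, "no OpenCL runtime found\n");
                return 1;
            }
            if (count > 8)
                count = 8;

            for (cl_uint i = 0; i < count; ++i) {
                char name[256], version[256];
                clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                                  sizeof(name), name, NULL);
                clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION,
                                  sizeof(version), version, NULL);
                printf("platform %u: %s (%s)\n", i, name, version);
            }
            return 0;
        }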

     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
  5. GZ007

    Regular

    Joined:
    Jan 22, 2010
    Messages:
    416
    Likes Received:
    0
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Not quite. Graphics can live with under a gig of memory; server workloads can't. It's also not immediately clear that it will have lower latency than normal DRAM, though that's likely. Besides, you would want to start with a lower-risk product.
     
  7. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    512-bit LPDDR2 stacks? That's unlikely, although not strictly impossible. Also, describing LPDDR2 as "old" when barely any smartphone uses it today proves only that Charlie doesn't know enough about that part of the market to speculate intelligently about it. It's interesting that nobody is planning that kind of bus width before the JEDEC Wide I/O standard with TSVs (Through-Silicon Vias, i.e. 3D packaging), but Intel is hardly using a traditional approach here, so standards are not very relevant.

    LPDDR2 chips are always 32-bit, and the only official JEDEC packages are for Package-on-Package configurations; the maximum is a 64-bit PoP package with 2 or more chips. But Intel could certainly buy the raw chips and stack them themselves (something they perhaps couldn't do with GDDR5) - they'd need one chip for every 32 bits. That would mean 16 chips for 512-bit (each 512Mbit, for 1GB total). That's a HUGE stack - this isn't going to be a thin package if true! Obviously that would be the top SKU and not aimed at ultraportables or netbooks - but the problem is that if you've got only 256MB then you can't have more than a 128-bit memory bus (there are no 256Mbit chips, and 512Mbit already isn't the most cost-efficient density). They'd also be wasting a fairly huge 384 bits' worth of their memory controller! It doesn't matter that the memory chips are closer and that the CPU's pitch is smaller; the memory controller (and probably the PHYs) are still going to cost the same - that is to say... quite a lot!
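    Just sanity-checking the arithmetic above (the 512-bit bus, x32 chips and 512Mbit density are the speculated figures, not a confirmed configuration):

        #include <stdio.h>

        int main(void)
        {
            const int bus_width_bits    = 512;
            const int chip_width_bits   = 32;    /* LPDDR2 devices are x32 */
            const int chip_density_mbit = 512;

            int chips    = bus_width_bits / chip_width_bits;   /* 16 chips */
            int total_mb = chips * chip_density_mbit / 8;      /* 1024 MB  */

            printf("%d-bit bus -> %d chips of %dMbit -> %dMB total\n",
                   bus_width_bits, chips, chip_density_mbit, total_mb);
            return 0;
        }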

    Could Ivy Bridge be doing something fancy with in-package memory, resulting in a substantial performance boost? Yes. But if so, I'm not convinced it's what Charlie is describing. Either way I'd like one of these - if he's wrong, deeply stacked enough to be smoked.
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,565
    Likes Received:
    4,744
    Location:
    Well within 3d
    Silicon interposers have been part of an FPGA vendor's product plans already, so that is doable.
    I think Altera was the one. (edit: Xilinx)
    I hope the drawn diagram isn't too accurate, since that would require one massive glob of thermal grease to reach from the CPU to the bottom of the heatspreader.

    It's a slight step backwards from the progressively more unified GPU/CPU memory hierarchy introduced by Sandy Bridge.
    The on-die memory hierarchy on the CPU could still be unified, but there would be a secondary memory controller that would be primarily useful to the GPU.
    Perhaps at some point it would just be a DRAM L4 cache? It seems like a waste to have it idling if someone opts out of the on-board graphics.
     
    #28 3dilettante, Dec 30, 2010
    Last edited by a moderator: Dec 30, 2010
  9. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Sounds like AMD *could* do it reasonably cheaply.

    I assume that the driver will preferentially allocate memory for graphics objects from this pool.

    Teaching libc's malloc not to touch this would be a different matter, though. Or can drivers lock down a segment of physical memory for themselves?
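    On Linux at least the answer is generally yes: a driver can claim a physical range for itself (e.g. RAM carved out with a memmap= boot parameter, or a dedicated aperture), so the kernel allocator - and therefore libc's malloc - never sees it. A very rough sketch; the base address, size and names are invented for illustration:

        #include <linux/module.h>
        #include <linux/ioport.h>
        #include <linux/io.h>

        #define POOL_BASE 0x80000000UL      /* hypothetical physical base   */
        #define POOL_SIZE (256UL << 20)     /* hypothetical 256MB pool size */

        static void __iomem *pool;

        static int __init gfxpool_init(void)
        {
            /* Mark the range as owned so nothing else claims it. */
            if (!request_mem_region(POOL_BASE, POOL_SIZE, "gfx-pool"))
                return -EBUSY;

            /* Map it for the driver's private use. */
            pool = ioremap(POOL_BASE, POOL_SIZE);
            if (!pool) {
                release_mem_region(POOL_BASE, POOL_SIZE);
                return -ENOMEM;
            }
            return 0;
        }

        static void __exit gfxpool_exit(void)
        {
            iounmap(pool);
            release_mem_region(POOL_BASE, POOL_SIZE);
        }

        module_init(gfxpool_init);
        module_exit(gfxpool_exit);
        MODULE_LICENSE("GPL");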
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I remember reading on these forums (not sure who said it) that LPDDR2 was expensive and hence wasn't being used. It COULD be that Charlie meant that the standard had been finalized a while ago.
     
  11. fehu

    Veteran Regular

    Joined:
    Nov 15, 2006
    Messages:
    2,016
    Likes Received:
    958
    Location:
    Somewhere over the ocean
    [IMG]

    He even added the watermark in the easiest spot to delete it :grin:
     
  12. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Pretty sure I'm the one who said that ;) Specifically that the Apple A4 used 64-bit LPDDR1 instead of 32-bit LPDDR2 because 512MB of the latter would have been a lot more expensive in that timeframe (and might not even have been available in the volumes Apple needs). Hopefully by early 2012 there won't be a huge price difference versus LPDDR1 anymore, but there will still be a big price difference versus DDR3. No idea how it would compare per megabyte versus GDDR5. Expect plenty of LPDDR2 devices in 1H11 (starting with the LG Optimus 2X using Tegra 2).
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,565
    Likes Received:
    4,744
    Location:
    Well within 3d
    Could, if that is the direction they choose. GlobalFoundries may have some input on this.
    The additional question is "when".

    It's not only process technology that Intel has historically beaten AMD on by a wide margin.
    Packaging technology has also been a strong suit for Intel, with AMD usually lagging by a fair amount.

    Also, I checked and it was Xilinx that had the silicon interposer tech.
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    What is there in packaging tech to beat your competitor with? PPro's L2 cache comes to mind but doesn't seem like that big a deal.
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,565
    Likes Received:
    4,744
    Location:
    Well within 3d
    Intel transitioned faster to organic substrates when that first came into use, and faster to use LGA packages.
    It was faster to eliminate lead from its packaging, and one of the first to get a handle on the reliability issues that arose because of it.

    Intel was also able to mass-produce dual-die packages much earlier than AMD. This was perhaps due to necessity, but this predates AMD's MCM by years.
    As a result, it beat AMD's single-chip multicores to market, both for the dual and quad-core transitions.
     
  16. TKK

    TKK
    Newcomer

    Joined:
    Jan 12, 2010
    Messages:
    148
    Likes Received:
    0
    These are simply situations where having tremendous resources to throw at certain issues pays off.
    Their resources sure allow them to react very quickly.


    Anyway, while you can never know with Intel, I wouldn't be surprised if the IB incarnation comes with something moderate like 128-bit/256MB, basically Intel's own answer to 'sideport memory'. Too much of this would drive up cost and make cooling rather challenging, I think.

    I also wouldn't be surprised if development of this started around the time AMD showcased sideport memory.
     
  17. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    348
    Likes Received:
    27
    Doesn't ANYONE remember the leaked slide with Gesher and Larrabee stating similar things?

    0-512MB 64GB/s bandwidth
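    Purely as arithmetic, here's what a 64GB/s target implies at a few candidate bus widths (the widths and per-pin rates are illustrative, not a known Gesher/Ivy Bridge configuration):

        #include <stdio.h>

        int main(void)
        {
            const double target_gbs = 64.0;
            const int widths[] = { 128, 256, 512 };   /* candidate bus widths, bits */

            for (int i = 0; i < 3; ++i) {
                double bytes_per_xfer = widths[i] / 8.0;
                double needed_gts = target_gbs / bytes_per_xfer;  /* GT/s per pin */
                printf("%3d-bit bus: needs %.1f GT/s for %.0f GB/s\n",
                       widths[i], needed_gts, target_gbs);
            }
            return 0;
        }

    So 64GB/s is 4 GT/s on a 128-bit bus, 2 GT/s on 256-bit, or 1 GT/s on 512-bit.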
     
  18. DarthShader

    Regular

    Joined:
    Jul 18, 2010
    Messages:
    350
    Likes Received:
    0
    Location:
    Land of Mu
  19. itsmydamnation

    Veteran Regular

    Joined:
    Apr 29, 2007
    Messages:
    1,337
    Likes Received:
    454
    Location:
    Australia
    Current high-end AMD GPUs have something like 7-8MB of on-chip memory; do they actually need to hit a cache at all? To me (and I'm a layman), thinking about this logically, it doesn't make much sense.

    GPUs are:
    high memory latency
    high memory throughput

    I would have thought that GPUs don't pull that much duplicate data over the memory bus, which is where a cache would help reduce memory bandwidth. Also, if it's getting a 60% hit rate, what about cache thrashing in CPU-intensive games?
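    To put numbers on what a 60% hit rate would even mean for bandwidth: off-chip traffic scales with (1 - hit rate), assuming those hits would otherwise have gone out to DRAM. The 20GB/s request stream below is just an invented figure:

        #include <stdio.h>

        int main(void)
        {
            const double requested_gbs = 20.0;          /* hypothetical GPU request rate */
            const double hit_rates[]   = { 0.0, 0.3, 0.6 };

            for (int i = 0; i < 3; ++i) {
                double dram_gbs = requested_gbs * (1.0 - hit_rates[i]);
                printf("hit rate %2.0f%% -> %.1f GB/s goes to DRAM\n",
                       hit_rates[i] * 100.0, dram_gbs);
            }
            return 0;
        }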

    If that tiny amount of cache actually helped a GPU, you'd think we would have seen it by now.

    The other thing a cache does is reduce latency, but who cares about that for a GPU? So to me that conclusion makes little sense, and until we know how Llano's memory controller works, how can anything be gauged?

    edit: also, is SB's cache structure still inclusive? Can SB prefetch from memory straight into the L3, or does it have to go straight to the L1 like AMD?
     
    #39 itsmydamnation, Jan 2, 2011
    Last edited by a moderator: Jan 2, 2011
  20. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    4,027
    Likes Received:
    90
    It confirms no such thing; it means exactly what it says. SNB graphics will potentially have access to a greater subset of data at a lower latency than Llano. Latency is practically irrelevant for a workload as parallel as graphics processing; throughput is key.
     