AMD Exascale Heterogeneous Processor

Discussion in 'Architecture and Products' started by ToTTenTranz, Aug 2, 2015.

  1. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    795
    Likes Received:
    78
    Location:
    'Zona
    Wouldn't this part, "the EHP is coupled to a second level of off package memory" hint at more than one APU per node?
     
  2. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,631
    Likes Received:
    5,200
    I think it most definitely means the APU can access more memory than the on-package HBM.
     
  3. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    As an example, Xbox One has one APU with two pools, 32MB + 8GB.

    Next-gen Xeon Phi does use HBM or an equivalent + six channels of DDR4, and explicitly goes into multi-socket motherboards.
     
  4. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    It is highly unlikely that this is in any way a valid way to "advertise" yourself for a governmental project (or any non-hobbyist work really). I know it is commonly accepted wisdom in the great Internet echo-chamber that governmental purchases are done by barely literate navel-gazers that need NICE BIG LETTERS TO UNDERSTAND ANYTHING, but the purchasing/financing process is a bit more involved. You'd better hope that AMD "advertised" itself actively for a pretty long time to even be in the running, and not by way of fluffy papers. In no small part because Bill Dally and his guys have been talking about pretty similar things for many years now. And Intel's Xeon Phi is narrowing down on those ideals while being an in-the-silicon product, as opposed to a vague "this is awesome" theoretisation.
     
  5. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,631
    Likes Received:
    5,200
    Furthermore, if AMD wanted to make an advertisement, I doubt they'd use a publication with an impact factor of 1.8 to do it.
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,295
    Likes Received:
    3,622
    Location:
    Well within 3d
    Another possible target is AMD's creditors and sources of investment/debt.
    Having been rated a default risk, one has to at least present a picture of having a future when asking for more money, which was indicated as a possible next step in the last financial call.


    The vision put forward in the EHP paper does show some of AMD's assumptions.
    On-interposer memory is the assumed source of high bandwidth, and the memory device count shows AMD hopes for a quadrupling of per-stack bandwidth. If Fiji and the cited Network on Chip paper are any guide, AMD is also hoping for a significant improvement in interposer complexity and bump pitch. What Fiji showed is an interposer solution that lost enough area for a naive packing of 2 additional stacks, and the NOC paper wants tighter pitches and discusses a potentially active interposer. Without such improvements, it would be interesting to know where AMD would fit 8 stacks given what Fiji showed in terms of ASIC size and interposer area.

    AMD's using an 8-stack interposer does go against what the PIM proposal had, for a number of reasons. Part of the PIM's benefit was that it adopted an HMC-like interface with the GPU and negated the need for an interposer.
    Having an interposer wouldn't necessarily mean you couldn't have PIM, but it takes away a benefit that might have pushed PIM over the top if other problems, like software complexity and cost to implement, outweighed its performance benefit in the workloads it worked well in.
    Additionally, the PIM's projected power ceiling, coupled with there being 8 stacks, would devote a huge chunk of the node's power budget to the PIM stacks.

    Power-wise, AMD's proposed node count is high enough that it probably could not meet the DOE's original 20MW ceiling, although the Obama administration's 30MW ceiling might still work. If each node and the hardware attached to it drew 200W, the proposed node count would leave 0 W of the 20MW budget for anything else, like the top-of-rack network and undefined storage nodes.
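
    The arithmetic behind that estimate can be sketched roughly as follows. This is a back-of-the-envelope sketch, not a figure taken from the paper: the node count is an assumption derived from a 1 exaFLOPS target at 10 TFLOPS per node (100,000 nodes), the 200W per node is the hypothetical used above, and the function names are made up for illustration.

```python
# Back-of-the-envelope exascale power budget.
# ASSUMPTIONS (not stated in the post): a 1 exaFLOPS target and
# 10 TFLOPS per node, which together imply 100,000 nodes.
# The 200 W per node is the post's hypothetical, not a measured value.

EXAFLOPS_IN_TFLOPS = 1_000_000  # 1 exaFLOPS expressed in TFLOPS
TFLOPS_PER_NODE = 10            # assumed per-node throughput
WATTS_PER_NODE = 200            # hypothetical node + attached hardware draw
WATTS_PER_MW = 1_000_000


def node_count(target_tflops: int, tflops_per_node: int) -> int:
    """Number of nodes needed to reach the target throughput."""
    return target_tflops // tflops_per_node


def headroom_watts(ceiling_mw: int, nodes: int, watts_per_node: int) -> int:
    """Power left under the ceiling once every node has been paid for."""
    return ceiling_mw * WATTS_PER_MW - nodes * watts_per_node


NODES = node_count(EXAFLOPS_IN_TFLOPS, TFLOPS_PER_NODE)

for ceiling_mw in (20, 30):
    left = headroom_watts(ceiling_mw, NODES, WATTS_PER_NODE)
    print(f"{ceiling_mw} MW ceiling, {NODES} nodes: "
          f"{left / WATTS_PER_MW:.1f} MW left for network and storage")
```

    Under these assumptions the 20MW ceiling is consumed entirely by the nodes, while the 30MW ceiling leaves about a third of the budget for everything else. Changing `WATTS_PER_NODE` shows how sensitive the conclusion is to the per-node figure.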

    AMD provided barely anything to describe what it would do about the networking side, although it said it might be neat to make the NIC an HSA device.
     
  7. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    271
    Likes Received:
    166
    As far as I have read, PIM in this concept was meant for the high-capacity, off-package layer of memory in the first place.
     
  8. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    271
    Likes Received:
    166
    Clarification: The EHP paper itself didn't say so explicitly, but the prior work, AFAIR, goes in this direction.
     
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,295
    Likes Received:
    3,622
    Location:
    Well within 3d
    The TOP-PIM paper placed PIM in opposition to HBM and WideIO, and it was originally evaluated as the primary memory pool.
    It was also evaluated in terms of being placed under a stack of DRAM, which was a significant constraint on its power ceiling.
    A non-volatile standard might be able to accept a higher power budget, although if that happened it would increase the amount of power budget taken away from the central APU.

    The original proposal is years old at this point, so it might predate AMD's current belief that it needs a tiered memory pool. It does seem to predate the idea that the off-package memory would be non-volatile.
     
  10. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    271
    Likes Received:
    166
    https://asc.llnl.gov/fastforward/AMD-FF.pdf
    This is also worth a read.
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,295
    Likes Received:
    3,622
    Location:
    Well within 3d
    That could be an evolution of the concept, although the paper cites the original concept.
    The EHP paper that gives each node 10 TFLOPS also gives the GPU in the node 10 TFLOPS as a baseline, which does not leave much room for processing in memory. The fastforward PDF diagram has enough PIM stacks that, if they drew what AMD originally proposed, they would significantly constrain the GPU.
    Possibly, the programmable logic patent that was posted in some of the other AMD threads might be more consistent with the NVRAM stacks, which might live with much lower power consumption since that logic deals with data movement and basic manipulation rather than computation.
    This may well be an area where AMD's vision is not fully nailed down.

    I also don't know what to make of the fastforward diagram drawing an arrow with optics and high-speed IO into the APU (is the APU stacked?). Optical in particular seems like it would have trouble fitting. In some implementations, optical waveguides are kept separate from the digital logic via some kind of mounting of heterogeneous materials, e.g. an interposer.
     