AMD: Speculation, Rumors, and Discussion (Archive)

Discussion in 'Architecture and Products' started by iMacmatician, Mar 30, 2015.

Thread Status:
Not open for further replies.
  1. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    Would they actually need to do anything here? I would hope the minimum needed is a mobo with the appropriate form factor, and perhaps some modification to existing after market cooler designs. They could charge a pretty penny and still come out cheaper than the price of a high-end discrete GPU + DDR4 + high-end Intel desktop CPU. And they'd presumably offer ECC, which isn't available from Intel in any consumer product as far as I know.

    Going further - this would also likely be very attractive for the high performance in a small form factor crowd. Personally, I would love a system with support for one or two NVMe SSDs (M.2 or 2.5"), a high performance/wattage APU with 16 or 32GB HBM, the usual outputs, an external power brick, ~250-300W, stuffed in a box not much larger than the double-height PCIe cards we see today.

    Anyway whatever :). I think they'd be idiots not to have some (relatively but not outrageously expensive) way for the high-end consumer market to get into one of these. Make it happen AMD!
     
  2. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    911
    It would be pretty great as a SteamBox as well.
     
  3. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
  4. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,073
    Likes Received:
    4,648
    HBM plus quad-channel DDR4 controller? What's the point?
     
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    911
    If this is both real and the HPC APU that was mentioned here, then the 16GB of HBM would be far too little, and a large pool of DDR4 would be required.
     
  6. smw

    smw
    Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    113
    Likes Received:
    43
    It says that it supports 256GB of DDR4 per channel across 4 channels, so I guess to achieve capacity parity between DDR4 and HBM you would need something like 5nm production for the memory chips and an HBM5 generation ;) On the other hand, at least from the schematic, the L3$ seems to be shared only between the CPU cores, so I am wondering: is it possible that the HBM would act as both GPU memory and a form of L4$ shared between the CPU and the GPU (since, again from the schematic, the HBM seems to be considered part of the APU die)? Perhaps part of it would be reserved for caching purposes?
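    To put the capacity gap in numbers, here is a quick Python check using only figures quoted in this thread (256GB per channel, 4 channels, and the 16GB of HBM from the post above). The variable names are just illustrative.

```python
# Capacity figures as quoted in the thread.
DDR4_GB_PER_CHANNEL = 256
DDR4_CHANNELS = 4
HBM_GB = 16

# Total DDR4 capacity across all channels.
ddr4_total_gb = DDR4_GB_PER_CHANNEL * DDR4_CHANNELS
# How many times larger the DDR4 pool is than the HBM pool.
ratio = ddr4_total_gb // HBM_GB

print(f"DDR4 total: {ddr4_total_gb} GB")       # DDR4 total: 1024 GB
print(f"DDR4 is {ratio}x the HBM capacity")    # DDR4 is 64x the HBM capacity
```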
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    The blocks are generic enough on that slide, so nothing sticks out as being egregiously wrong at first glance.
    I am not sure how to reconcile the specifics for the GPU architecture's name versus the other data points. The next-gen GPU in the prior slides that feeds into the HPC GPU would seem to be Fiji, with the next gen that might fit the Arctic Islands theme being the box after.
    The bandwidth numbers for the HBM look better if the APU is not put off until 2017 or later. Knights Landing has that bandwidth and Volta is set to double it by the time Volcanic Islands is replaced. The PCIe IO may be outmoded by then as well.

    The rest of the diagram's details are pretty non-specific. I may go into more of a comparison with Knights Landing in that thread, but the design wins for Intel and IBM+Nvidia in upcoming supercomputers show how broad a set of technologies matters in that space. Intel is pushing a more complete interconnect, software-defined networking, photonics, a file system, and so on. Nvidia and IBM have a number of technologies outside of the chip as well.

    If this set of details is all AMD is talking about (at best, leaking), the HPC APU may have a harder time of it.
     
  8. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    I'd guess that any HBM pool coming with an APU is likely a software-managed, independently addressable pool, as is done today. There is plenty of system memory bandwidth, and the CPU gets loads of cache. So a DRAM cache is only marginally better for the CPU in some niche cases, and it comes at a cost in DRAM bandwidth (in-memory tags, locality, latencies, etc.), which is the opposite of what you want for a GPU.
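    For illustration only, a toy Python model of what a software-managed, independently addressable HBM pool means in practice: software decides per buffer whether it lives in HBM or DDR4, rather than HBM acting as a hardware-managed cache with in-memory tags. The class, its placement rule (bandwidth-hungry buffers go to HBM when they fit) and the buffer sizes are all hypothetical; the pool sizes reuse the 16GB HBM and 4x256GB DDR4 figures from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPoolAllocator:
    """Hypothetical model: HBM and DDR4 are separate, software-visible pools."""
    hbm_free_gb: float
    ddr_free_gb: float
    placement: dict = field(default_factory=dict)

    def alloc(self, name: str, size_gb: float, bandwidth_hungry: bool) -> str:
        # Software places bandwidth-bound (e.g. GPU) buffers in HBM if they fit...
        if bandwidth_hungry and size_gb <= self.hbm_free_gb:
            self.hbm_free_gb -= size_gb
            pool = "HBM"
        # ...and everything else, or anything that overflows HBM, in DDR4.
        elif size_gb <= self.ddr_free_gb:
            self.ddr_free_gb -= size_gb
            pool = "DDR4"
        else:
            raise MemoryError(f"no pool can hold {name} ({size_gb} GB)")
        self.placement[name] = pool
        return pool


pools = TwoPoolAllocator(hbm_free_gb=16, ddr_free_gb=4 * 256)
print(pools.alloc("gpu_textures", 12, bandwidth_hungry=True))    # HBM
print(pools.alloc("gpu_buffers", 8, bandwidth_hungry=True))      # DDR4 (only 4 GB of HBM left)
print(pools.alloc("cpu_dataset", 200, bandwidth_hungry=False))   # DDR4
```

    The point of the sketch is that no HBM capacity or bandwidth is spent on tags: placement is an explicit software decision, at the cost of the program (or driver) having to make it.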
     
  9. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    This is an integrated solution after all. All the names you brought up here are discrete parts (KNL too, if one treats it as an accelerator) of systems that possibly burn far more power than this integrated one. Moreover, AMD would still release discrete parts to compete with these. That said, the main bullet point of an APU should be the integration. If there is nothing so different about it (oh, we already have tons of not-so-different parts today), I would agree with your cloudy forecast.

    Perhaps AMD has something planned for its, ahem, SeaMicro IP.
     
  10. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    The PCIe IO may be a little out of date, but I think 1GbE in 2017 (?) is pretty egregious. I was expecting to see at least 10GbE, maybe even support for one or more channels of the upcoming 25GbE standard.

    The CPU core count seems high. An 18-core Haswell Xeon EP is pretty huge and power hungry: 662mm² on 22nm, at 145W according to AnandTech (granted, the Xeon also has almost 6x the L3 of this rumored APU). Anyway, the less-than-amazing HBM2 bandwidth (given the time frame) may simply be what makes sense given the amount of streaming compute they have room for after laying down 16 cores. Still, for a higher-end gaming or HPC-focused APU, I would have expected more emphasis on streaming compute and HBM bandwidth than on so much general-purpose CPU.
     
  11. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    125
    Likes Received:
    23
    AMD published a paper last year about multi-level design in next-gen memory. It's possible they just wanted it out sooner and merely toyed with the idea as a means to that end.
    http://dl.acm.org/citation.cfm?id=2689667
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    We do not know the power of this solution, nor where that would go in the continuum of perf/W to know if it is a win.
    The high level of integration could help at a node level, but inter-node connectivity and platform organization is a big factor in maintaining scaling and power-efficiency for large HPC systems, which KNL and POWER9+Volta are going into.
    Intel and Cray, and IBM and Nvidia have various forms of interconnect and high-end infrastructure between compute cards and between blades/racks.
    Utilization and power consumption can suffer if scaling to large numbers of nodes is poor.

    Perhaps AMD is counting on third-party solutions to leverage its PCIe, but this does make AMD less of a one-stop shop compared to the other solutions. I would be curious to see if AMD expects a vector into that space, since oft-burned former partner Cray is hitched to KNL currently.

    Those discretes will need something corresponding to the expanded interconnects the competing discrete solutions have.

    SeaMicro's shared-nothing nodes do not quite fit the echelon of systems the others can scale to, and AMD has apparently flubbed the integration of it once already. I have not seen much on what the future versions would entail. If it's integrated in that diagram, there doesn't seem to be as much network capability.
     
  13. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    20,813
    Likes Received:
    5,911
    Location:
    ಠ_ಠ
    For the Inglorious Futar of Consoles!
     
  14. Inuhanyou

    Regular

    Joined:
    Dec 23, 2012
    Messages:
    786
    Likes Received:
    48
    Location:
    New Jersey, USA
    Oh my. The wait for a potential PS5 with all these new technologies popping up will be a wait indeed.
     
  15. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    While the exact number is not known yet, AMD said at their Japan Update event that their target for the HPC APU is 200 to 300 watts of TDP. I assume a top-of-the-line Volta or a KNL would consume a similar level of power (or at least the lower bound of that range).

    Agreed. That said, it would be interesting to know whether the APUs can scale vertically (MP) as a node, particularly since the alleged APU has a large number of PCIe lanes. Those could serve double duty as ccNUMA interfaces, in the same way as in the cancelled G2012 platform. Volta has NVLink to accomplish this together with the MP-scalable POWER9, while KNL as an accelerator can scale alongside Xeon MP systems.

    That was an ageing piece of IP even at the time of acquisition, and it is fair to assume that there is something new in the pipeline; recently filed patents suggest this, too. Moreover, SeaMicro has an integrated network, so MPI would not be an alien workload there. That said, as far as I understand, Intel's OmniScale fabric is similarly shared-nothing.
     
  16. Newguy

    Regular Newcomer

    Joined:
    Nov 10, 2014
    Messages:
    256
    Likes Received:
    112
    Hopefully this is the right place to post this, on AMD's "high performance server APU":

    http://techreport.com/r.x/amd-fad-2015/slide-datacenter.jpg

    It says multi-TFLOPS, as in >=2. Kaveri (7850K) was 856 GFLOPS total: about 736 GFLOPS from the GPU (512 cores at 720MHz) and 120 GFLOPS from the CPU:

    http://images.anandtech.com/doci/7507/KaveriPerf_575px.jpg

    I'm going to say triple the CPU part: Zen is ~1.4x Excavator, which I assume is 5-10% more than Steamroller, and then double the cores, which again I assume could happen for a big server part. 2x cores at 1.5x perf per core (assuming the same clocks) would make about 360 GFLOPS on the CPU side. That would leave 1640 GFLOPS for the GPU, so either 768 cores at roughly 1070MHz, which could be possible, or more likely 896 cores at ~915MHz.

    The only problem is that a 2 TFLOPS part at 200 watts (an assumption) would consume a lot more power than I'd expect given the "double perf/W" claim they've made for GPUs, and that's a lot of die area gone for what is probably a large server part. For example, the 270 is ~2.3 TFLOPS, 212mm² and consumes about 150W. Assume ~80W with a shrink (a little less than the claimed doubling of perf/W) and maybe 150mm², and you're looking at a lot left over in a big, 200W server part. What would be a reasonable assumption for the die size, 400mm² for the APU? Idk, maybe I'm underestimating the CPU core count; it could be 16, with them clocked lower. Server parts usually go for more, slower cores instead of fewer, faster ones (perf/W and all), don't they?
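    The arithmetic in this post can be checked with a short Python sketch. It assumes peak FLOPS = stream processors x 2 (one FMA per clock) x clock, which reproduces the ~736 GFLOPS Kaveri GPU figure; the 2 TFLOPS total and the 3x CPU scaling are the poster's assumptions, and the function names are hypothetical.

```python
def peak_gflops(shaders: int, clock_mhz: float) -> float:
    # Peak single precision: each stream processor does one FMA (2 FLOPs) per clock.
    return shaders * 2 * clock_mhz / 1000


def clock_for_gflops(target_gflops: float, shaders: int) -> float:
    # Invert peak_gflops: clock (MHz) needed to reach a target with a given shader count.
    return target_gflops * 1000 / (shaders * 2)


kaveri_gpu = peak_gflops(512, 720)   # ~737 GFLOPS, the "about 736" quoted for Kaveri
cpu_guess = 120 * 2 * 1.5            # 2x cores, 1.5x per-core throughput -> 360 GFLOPS
gpu_budget = 2000 - cpu_guess        # 1640 GFLOPS left for the GPU

for shaders in (768, 896):
    print(shaders, round(clock_for_gflops(gpu_budget, shaders)))
# 768 1068
# 896 915
```

    Solving for the clock gives roughly 1068MHz for 768 cores and roughly 915MHz for 896 cores.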
     
  17. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    911
    There's way too little information to really conclude anything. Given the vague timing (2016–2017) it might just be a plain old regular APU with an Opteron sticker. Think 4 Zen cores plus 16 CUs (1024 SPs) running at ~1GHz: you'd get over 2 TFLOPS from the GPU alone.

    But if it's an APU designed specifically for the HPC market, I'd expect something much bigger. Probably 4-8 Zen cores and 32-64 CUs (2048-4096 SPs).
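    A minimal Python sketch of the same peak-throughput arithmetic for both scenarios, assuming GCN's 64 stream processors per CU and one FMA (2 FLOPs) per SP per clock. Applying the ~1GHz clock to the 32-64 CU HPC configurations is an assumption; only the 16 CU case was given a clock.

```python
SPS_PER_CU = 64  # GCN: 4 SIMD16 vector units per compute unit


def gpu_tflops(cus: int, clock_ghz: float) -> float:
    # Peak single precision: one FMA (2 FLOPs) per stream processor per clock.
    return cus * SPS_PER_CU * 2 * clock_ghz / 1000


scenarios = {
    "regular APU, 16 CUs": 16,
    "HPC APU, 32 CUs": 32,
    "HPC APU, 64 CUs": 64,
}
for label, cus in scenarios.items():
    print(f"{label}: {cus * SPS_PER_CU} SPs, {gpu_tflops(cus, 1.0):.3f} TFLOPS at 1 GHz")
```

    That works out to about 2.05 TFLOPS for the regular APU and roughly 4.1 to 8.2 TFLOPS for the HPC range, before any clock or power limits.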
     
  18. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    Frankly, AMD might not need to roll out a new die design "specifically" for HPC. If AMD is perfectly fine with bringing 2.5D packaging elsewhere (and they hinted at it), they may go the multi-die route, where a few dies can form multiple SKUs facing different CPU, APU and GPU segments.

    Say they introduce a couple of high-end HBM GPUs and are okay with the overhead of a multi-die interface: they can then bring in a CPU die, and this already gives you two APU combinations without designing a new monolithic SoC from scratch. The CPU die or GPU die (Multiadapter, yah!) could also have multi-die variants of itself, if the interface supports it and the spec is meaningful.

    Moreover, as mentioned earlier in the thread, AMD's target is a 200W to 300W TDP, per the report on the Japan HPC presentation. I am a bit doubtful that a single monolithic die on 14nm could account for such a TDP.
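    The die-reuse idea can be sketched as a simple enumeration. This is a hypothetical Python model: the die names are placeholders, an APU is assumed to be exactly one CPU die plus one GPU die, and "multi-die variants" is assumed to mean two copies of the same die.

```python
from itertools import product

# Placeholder die catalogue: one CPU die and "a couple of" HBM GPU dies.
cpu_dies = ["CPU"]
gpu_dies = ["GPU-A", "GPU-B"]

# APU SKUs: one CPU die paired with one GPU die on a shared interposer.
apu_skus = [f"{cpu} + {gpu}" for cpu, gpu in product(cpu_dies, gpu_dies)]

# Multi-die SKUs: two copies of the same die, if the interface allows it.
multi_die_skus = [f"2x {die}" for die in cpu_dies + gpu_dies]

# Single-die SKUs: each die on its own.
single_die_skus = cpu_dies + gpu_dies

print(apu_skus)        # ['CPU + GPU-A', 'CPU + GPU-B']
print(multi_die_skus)  # ['2x CPU', '2x GPU-A', '2x GPU-B']
print(len(single_die_skus + apu_skus + multi_die_skus))  # 8
```

    Three die designs yield eight products under these assumptions, which is the economic argument for the multi-die route, provided the interposer and interface overheads stay reasonable.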

    Having said that, it all depends on the economies of 2.5D packaging...
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,137
    Likes Received:
    2,939
    Location:
    Well within 3d
    A confirmed interconnect adjusted to this reality would need to be fleshed out. The connection speeds and power consumption, while better than PCB traces, would not match going over the same die.

    Another unanswered question is where is the cost going to be eaten, besides the interposer.
    Will there be two variants of each die, an interposer CPU and a normal CPU, with the same split for the GPU? Or is each die going to bump up its complexity and size so that it can do both?
    Or is AMD going 2.5D almost universally? (That seems far off.)

    I don't see 200-300W from one die as a challenge. Large GPUs can do that readily, and the largest CPUs like the upper bins of Intel's EX Xeons have TDPs that an extra bin above could probably hit.
    Modern devices do not have a problem drawing power if allowed.
     
  20. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    Given that they are introducing a new SoC interconnect, and given how long their die-stacking program has been running, I guess they would be aware of the extensibility.

    For GPUs, if the GPU can make use of HBM, the interposer cost is always there and a "normal" version is off the table. For CPUs, that's my question too: say, whether they would still give the single-die part an interposer, make another non-interposer version, or use flip-chip bumps directly (but still 2.5D, so mixed bump sizes for the interposer...).

    I guess AMD might be fine with the fusing-off approach, and the die-area overhead might not be too high, since TSVs enable a smaller bump size, which in turn shrinks the PHY sizes. Moreover, trading some redundancy in the single-die variants for the ability to scale into more product SKUs (which may eat into the higher-margin markets) sounds like a nice investment IMO. At least it sounds more solid than designing multiple monolithic SoCs and hoping for profit "the old way".

    By the way, I bet AMD would still make (but not with high priority) low-end GPUs and APUs with low-end graphics that use external memory, since the market is still there anyway, and what AMD lacks in competency in the first place is not the graphics, but the CPU piece.


    A GPU in that range is often a really huge die... Anyway, one bullet point of 2.5D is breaking down monolithic SoCs, and since the GPU is likely getting HBM, a broken-up design seems a fairly natural move.
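    One common way to quantify why breaking up a big die helps is the Poisson yield model, Y = exp(-A x D0). The Python sketch compares silicon spent per good product for one 600mm² die versus two 300mm² dies; the defect density and die sizes are hypothetical, and it assumes dies are tested before assembly (known-good die) while ignoring interposer and bonding losses, which in practice eat into the advantage.

```python
import math

D0 = 0.002  # hypothetical defect density: 0.2 defects/cm^2 = 0.002 defects/mm^2


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    # Poisson die-yield model: probability that a die has zero defects.
    return math.exp(-area_mm2 * defects_per_mm2)


def silicon_per_good_product(die_areas_mm2):
    # Known-good-die assumption: each die is tested before assembly, so each
    # die slot costs (area / yield) of wafer area independently.
    return sum(a / poisson_yield(a, D0) for a in die_areas_mm2)


mono = silicon_per_good_product([600])
split = silicon_per_good_product([300, 300])
print(round(mono), round(split))  # 1992 1093
```

    Under these assumptions the split design spends a bit over half the wafer area per good product, which is the kind of margin that has to pay for the interposer and the die-to-die PHYs.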
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.