AMD CDNA Discussion Thread

Discussion in 'Architecture and Products' started by Frenetic Pony, Nov 16, 2020.

  1. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Of course.
    Those procurements stretch for years.
     
  2. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    One area that may still need work is software. Existing HPC applications will see MI200 as two GPUs rather than one, unlike MI100 (or A100), so extracting the highest performance and efficiency may take additional work beyond what is already available for optimizing workloads.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    That's not a problem for HPC. That software is already designed to target many CPUs and GPUs. The only real issue is that MI200 isn't actually the "first multi-die GPU" that AMD promised. It's actually 2 GPUs on a stick with faster I/O between them which isn't that exciting.
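
To illustrate the point that HPC codes already spread work over many devices, here is a minimal Python sketch of block decomposition over however many logical GPUs a node exposes. The node layout (4 packages, each seen as 2 devices) and the helper name `partition` are assumptions for illustration only; they do not come from AMD tooling or from any post in this thread.

```python
def partition(n_items: int, n_devices: int) -> list[range]:
    """Split n_items into n_devices contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_devices)
    chunks = []
    start = 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if dev < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# Assumption: 4 packages per node, each visible to software as 2 logical GPUs.
packages_per_node = 4
logical_gpus = packages_per_node * 2

for dev, chunk in enumerate(partition(1_000_000, logical_gpus)):
    print(f"device {dev}: items {chunk.start}..{chunk.stop - 1}")
```

The same code runs unchanged whether the eight devices are eight separate cards or four dual-die packages, which is the sense in which the device count alone is not a problem. Whether performance is equal in both cases is a separate question the sketch does not address.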

    [​IMG]
     
  4. Esrever

    Regular

    Joined:
    Feb 6, 2013
    Messages:
    846
    Likes Received:
    647
    "It's actually 2 GPUs on a stick with faster I/O between them which isn't that exciting."
    And this is somehow unexpected? Do you expect Nvidia's or Intel's solution to have some magic sauce connecting the dies so that it doesn't boil down to 2 GPUs with a faster interconnect between them?
     
    Krteq likes this.
  5. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    At Argonne National Laboratory, Polaris should come online in 2022 and should be able to validate AI workload performance.
    Argonne’s 44-Petaflops ‘Polaris’ Supercomputer Will Be Testbed for Aurora, Exascale Era (hpcwire.com)

    The testing should be interesting from the aspect of comparing performance based on the number of GPUs required by each system (Polaris 2,240 vs Aurora 9,000) to attain similar results.

    Edit: I doubt it will be used as a testbed but Leonardo will also be available with 10 exaflops of AI performance.
     
    #305 pharma, Nov 10, 2021
    Last edited: Nov 10, 2021
  6. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    Yes, AMD's upcoming RDNA3 GPUs will use MCM, and they will have to behave as one big homogeneous GPU or else performance will suffer considerably. Putting two GPUs on one stick is no different from putting two GPUs in SLI/Crossfire on one PCB; that was worthless, and the whole multi-GPU configuration trend died prematurely. We were led to believe CDNA2 is a true multi-core GPU acting as one, a true breakthrough, not some Crossfired GPUs on a single PCB.
    Bandwidth is still actually 200GB/s directional. Power consumption is 560W on water cooling.
     
    #306 DavidGraham, Nov 10, 2021
    Last edited: Nov 10, 2021
  7. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    603
    Likes Received:
    1,122
    AMD's own benchmarks were run on the 560W version. So MI250X is around 36% more efficient than A100 SXM while using twice the footprint, and only on par with the 300W PCIe version. And I guess the 20% increase in FP16 is only sustained at 560W, too. So it is less efficient while being around twice as big...

    For nVidia? Everything else doesn't make any sense. At least their interconnect has to be faster than NVLink.
     
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,462
    Location:
    Finland
    I want whatever you were smoking when figuring out Polaris "attaining similar results" to Aurora.
    Polaris is nothing but a quick'n'dirty "testbed" put together because Aurora is late. And Aurora won't have 9000 GPUs, it'll be over 9000 nodes, each of which has 6 Ponte Vecchios which could be counted as 2 GPUs each.
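
The node arithmetic in this post can be spelled out in a few lines of Python. This is only a sketch using the figures quoted above; since the post says "over 9000" nodes, the results are lower bounds, and the variable names are made up for illustration.

```python
nodes = 9000             # "over 9000" in the post, so treat this as a lower bound
packages_per_node = 6    # Ponte Vecchio packages per node, as stated in the post
gpus_per_package = 2     # counting each package as 2 GPUs, as the post suggests

packages = nodes * packages_per_node
logical_gpus = packages * gpus_per_package

print(f"at least {packages:,} packages, {logical_gpus:,} logical GPUs")
# prints "at least 54,000 packages, 108,000 logical GPUs"
```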
     
    Krteq, Bondrewd and Lightman like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    You’re asking this even after I shared AMD’s slide calling MI200 a multi-die GPU? Singular. As in one GPU.

    If Nvidia or Intel make the same claim we can talk about it then.
     
  10. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    My bad, typo on my part. I meant they could use Polaris as a Frontier testbed for AI workloads in 2022. The point is there will be options to test Frontier workloads against when it comes up to speed.
     
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    NVLink already provides up to 600GB/s bi-directional between GPUs on different PCBs. That's either using a switch or direct peer-to-peer connections. So 400GB/s between GPUs on the same interposer isn't earth shattering. Given current NVLink speeds you would expect much bigger numbers between Hopper dies on the same substrate.

    Someone mentioned it earlier in the thread but the big benefit of MI200 could be compute density. You can fit more MI200 dies than A100 dies in the same space.
     
    xpea likes this.
  12. Actually.. the first commercial rasterization GPU was a multi-die GPU.

    [​IMG]
    I'm also surprised and a bit disappointed that AMD is still calling MI200 a GPU.
     
    Lightman, Krteq and Bondrewd like this.
  13. Inter die links are 400GB/s while the inter GPU links are 800 GB/s. All coherent links.
    Direct from the CDNA2 paper.
    But somehow it seems like a lot of tech repurposed from EPYC.
    Anyway, the Trento CPU is a one-off project, and possibly MI200 too.

    The EFB packaging tech is interesting though; it will probably be useful in democratizing HBM by cutting Si interposer costs drastically.
    Real next gen is MI300.
    But I bet the system architect already evaluated everything and went with MI200 for this gen.
    They have Spock, they have the previous machine and they have the Exascale readiness task force from the ECP project who are evaluating all the architectures.
    I bet they are at least as smart as forum members here, if not smarter, with just about everyone there carrying a PhD on their name tag and all.

    Eventual goal is to ensure software from Frontier and Aurora can run on both.
    You can google around; a lot of work has been done by the gov agencies on software portability, and they awarded contracts to Codeplay (by Argonne) and Mentor Graphics (by ORNL) too for SW work on Intel and AMD systems.
    Their goal is basically to drop all non-open/non-industry-standard frameworks. It's public info, so I won't bother linking; it's just a few minutes of Google searching.
     
    Lightman and Krteq like this.
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    That 800 GB/s number isn't real though as you need to reserve some links for GPU<->CPU communication. Unless there's still an option to use PCIe.
     
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,211
    NVIDIA's NVLink3 is 300GB/s directional (600GB/s bi-directional), which is still faster than AMD's 200GB/s directional (400GB/s bi-directional) between dies.
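
Since the thread mixes per-direction and bi-directional figures, a small Python sketch may help keep them apart. It uses only the numbers quoted in this post; the `Link` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    per_direction_gbs: float  # GB/s in one direction, as quoted in the post

    @property
    def bidirectional_gbs(self) -> float:
        # Bi-directional figure is simply twice the per-direction figure.
        return 2 * self.per_direction_gbs


nvlink3 = Link("NVLink3, GPU to GPU", 300)
mi200_die = Link("MI200, die to die", 200)

ratio = nvlink3.per_direction_gbs / mi200_die.per_direction_gbs
for link in (nvlink3, mi200_die):
    print(f"{link.name}: {link.per_direction_gbs:.0f} GB/s each way, "
          f"{link.bidirectional_gbs:.0f} GB/s bi-directional")
print(f"ratio: {ratio:.1f}x")
```

The ratio is the same whichever convention is used, as long as both sides are quoted the same way.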
     
  16. Page 8
    https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf
    Inter die IF 4x links 400 GB/s
    GPU P2P 8x Links 800 GB/s (Long range MTK SerDes I think)
    Host interface 16x lanes in IF mode or PCIe mode
    Downstream 1x 25Gbps to NIC/Slingshot
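
A quick way to sanity-check the list above is to divide each aggregate figure by its link count. The Python sketch below assumes the bandwidth is spread evenly across links, which the list does not state explicitly; `per_link_gbs` is a made-up helper name.

```python
def per_link_gbs(aggregate_gbs: float, links: int) -> float:
    """Per-link bandwidth, assuming the aggregate is split evenly across links."""
    if links <= 0:
        raise ValueError("links must be positive")
    return aggregate_gbs / links


inter_die = per_link_gbs(400, 4)   # inter-die Infinity Fabric: 4 links, 400 GB/s
gpu_p2p = per_link_gbs(800, 8)     # GPU peer-to-peer: 8 links, 800 GB/s

print(f"inter-die: {inter_die:.0f} GB/s per link")
print(f"GPU P2P:   {gpu_p2p:.0f} GB/s per link")
```

Under that assumption both kinds of link work out to the same per-link rate, so the 400 vs 800 GB/s difference would come from link count alone.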
     
    #317 Deleted member 90741, Nov 10, 2021
    Last edited by a moderator: Nov 10, 2021
    Lightman, trinibwoy and Krteq like this.
  17. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    It's just ASE FOEB.
    Everyone will be using the things sooner or later.
     
  18. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    AMD Announces Instinct MI200 Accelerator Family: Taking Servers to Exascale and Beyond (anandtech.com)
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,110
    Location:
    New York
    It doesn't say whether the host interface is dedicated or uses one of the 8 external links. The latter certainly seems to be the case from the picture. 2 links to the CPU in PCIe mode and 6 links to other GPUs.
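
If the host interface does come out of the 8 external links, the peer-to-peer figure shrinks accordingly. The Python sketch below works through that scenario; it assumes the 800 GB/s is split evenly across the 8 links, and the 2-link host reservation is this post's reading of the picture, not a confirmed spec. The function name is hypothetical.

```python
def peer_bandwidth_gbs(total_links: int = 8,
                       total_gbs: float = 800.0,
                       host_links: int = 0) -> float:
    """GPU-to-GPU bandwidth left after reserving some external links for the host."""
    if not 0 <= host_links <= total_links:
        raise ValueError("host_links must be between 0 and total_links")
    per_link = total_gbs / total_links  # assumption: even split across links
    return (total_links - host_links) * per_link


print(peer_bandwidth_gbs(host_links=0))  # all 8 links to peers: 800.0
print(peer_bandwidth_gbs(host_links=2))  # 2 links to the CPU, 6 to peers: 600.0
```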
     