AMD CDNA Discussion Thread

Discussion in 'Architecture and Products' started by Frenetic Pony, Nov 16, 2020.

  1. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
An IP stash, yes, and AMD will leverage it in funny ways sooner or later.
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Does AMD even support SYCL? The govt can’t make it happen on their own.

    Also SYCL isn’t a replacement for ROCm. SYCL has to be compiled into something that will run on the target hardware and for AMD that’s ROCm. So either way ROCm needs to be fixed.
     
    #242 trinibwoy, Nov 9, 2021
    Last edited: Nov 9, 2021
  3. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    No.
Raw SYCL is a who-cares; time to board the oneAPI train.
     
  4. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
AMD's official stack is HIP+ROCm. Their best bet would be to abandon HIP and build out an official SYCL/ROCm stack, but there's no sign of that. There's no sign of them tossing ROCm in favor of oneAPI either.

    How and when exactly is AMD planning to hop on that train?
     
  5. Granath

    Newcomer

    Joined:
    Jul 26, 2021
    Messages:
    80
    Likes Received:
    81
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,109
    Location:
    New York
    Yup, notice the names of the organizations building SYCL solutions that would run on CDNA. None of them are named AMD. Question is whether the supercomputer guys will be willing to use these unofficial / unsupported solutions.
     
    DegustatoR likes this.
  7. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
SYCL has no future for interoperability since the only big corporation investing in it is Intel, and even then they expect developers to use DPC++, Intel-specific extensions to SYCL, so I doubt many vendors, AMD included, would support those extensions, let alone SYCL by itself. There's a ROCm backend for DPC++ being developed by the community but I doubt it works ...

SYCL for portability purposes is not that useful either, since AMD refuses to make a SPIR-V kernel compiler, which would ensure more consistent behaviour across vendors. You could have a SYCL implementation on ROCm, but the source code would get compiled into native GCN/CDNA bytecode which can't run anywhere but AMD HW, and that's already what's happening with the community DPC++ effort ...

HIP is arguably the saner solution since that's what developers are actually using right now, and its syntax is more familiar to CUDA users, but it doesn't make any guarantees about portability either. The compute world is just going to have to cope with multiple but similar-enough source languages rather than hoping for one intermediate bytecode to rule them all ...
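On the "familiar to CUDA" point: much of the porting is mechanical renaming, which is roughly what AMD's hipify tools automate. A minimal sketch of that idea — the API names in the table are real CUDA/HIP pairs, but the translation code itself is illustrative, not the actual hipify implementation:

```python
# A few of the mechanical CUDA -> HIP renames (real API-name pairs;
# the naive string-replace translator here is only an illustration).
cuda_to_hip = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

src = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, a, n, cudaMemcpyHostToDevice);"
for old, new in cuda_to_hip.items():
    src = src.replace(old, new)

print(src)
# hipMalloc(&d_a, n); hipMemcpy(d_a, a, n, hipMemcpyHostToDevice);
```

Note that even the `cudaMemcpyHostToDevice` enum falls out correctly here because its HIP counterpart is the same name with the prefix swapped.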

The CUDA, ROCm, and oneAPI software stacks are all built and specialized to extract maximum perf from each vendor's unique HW, so forcing them to converge is ill-suited when they have different priorities and standards ...
     
  8. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,210
    Some cases are not the same as ALL cases.

Nope: 200GB/s. The two dies are connected by 4 IF links, each capable of 50GB/s up/down, so 200GB/s in total.
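A quick check of how that figure adds up, taking the 4-link, 50GB/s-per-direction numbers from the post at face value:

```python
# Aggregate inter-die bandwidth, per the figures quoted above.
links = 4
gb_per_s_per_link = 50        # per direction, per link

total = links * gb_per_s_per_link
print(total)                  # 200 (GB/s aggregate, one direction)
```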
     
  9. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
The solution to the MI250X's segmented memory access between the dies is to just launch twice as many compute kernels for maximum performance, so that each die can run its own compute kernels in parallel, independent of the other. This shouldn't be much of a problem, if at all, since many existing workloads can be easily extended to extract more parallelism ...
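The pattern being described is split-and-launch-independently: halve the problem, give each GCD its own kernel over its own half, merge afterwards. A language-neutral sketch of just that pattern (the "kernel" and the two-worker pool are placeholders, not a real device API):

```python
# Sketch of per-die independent launches: split the data, run one
# "kernel" per die in parallel, merge. Placeholder compute only.
from concurrent.futures import ThreadPoolExecutor

def kernel(chunk):
    # stand-in for the per-die compute kernel
    return [x * x for x in chunk]

def run_on_two_dies(data):
    mid = len(data) // 2
    halves = [data[:mid], data[mid:]]             # one chunk per GCD
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(kernel, halves))  # independent launches
    return results[0] + results[1]                # merge

print(run_on_two_dies(list(range(8))))
# [0, 1, 4, 9, 16, 25, 36, 49]
```

In actual HIP code the equivalent would be selecting each GCD as a separate device before its launch; the point is that neither kernel ever touches the other die's memory.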
     
    Lightman likes this.
  10. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,210
    Then why bother putting two dies together with all the added complexity?
     
    DegustatoR, xpea and PSman1700 like this.
  11. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    415
    Likes Received:
    379
    #251 pTmdfx, Nov 9, 2021
    Last edited: Nov 9, 2021
    no-X and Lightman like this.
  12. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
Computation density in HPC. They need tons of FP64 flops in the smallest possible area.
What I don't get is why they sum memory bandwidth and capacity across the two GCDs when there's only a 200GB/s bi-directional interface between them; that looks silly.
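For context on what that summing looks like, here are the commonly quoted per-GCD MI250X figures (treat them as approximate; they're my numbers, not from the post):

```python
# Card-level specs as the sum of two per-GCD figures (approximate,
# commonly quoted MI250X numbers).
gcds = 2
bw_per_gcd_tb_s = 1.6      # ~1.6 TB/s of local HBM2e per GCD
cap_per_gcd_gb = 64        # 64 GB of HBM2e per GCD

print(gcds * bw_per_gcd_tb_s)   # 3.2 (TB/s "aggregate" bandwidth)
print(gcds * cap_per_gcd_gb)    # 128 (GB "aggregate" capacity)
# ...but any one kernel only sees its local ~1.6 TB/s; anything
# remote has to cross the far narrower inter-die link.
```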

    https://www.amd.com/system/files/documents/amd-cdna2-white-paper.pdf
Not a word on threads spawning on the GPU or on lock-free programming. I guess CDNA2 is still in the stone age in these regards?
     
    DegustatoR likes this.
  13. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
For higher compute density per node, of course, since it helps conserve I/O bandwidth between different nodes and cabinets. Supercomputers and servers are connected by very thick cables on high-speed networks, so data traffic congestion becomes a real problem on large systems ...
     
    DegustatoR likes this.
  14. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Same as here duh.
    400.
Each IFIS link at 25GT/s is 100GB/s bidir, much the same way it is on EPYC.
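The arithmetic behind "25GT/s → 100GB/s bidir", assuming a 16-lane link as on EPYC (the lane count is my assumption, not stated in the post):

```python
# 25 GT/s per lane over an assumed 16-lane IFIS link.
lanes = 16
gt_per_s = 25                      # transfers/s per lane, 1 bit each

gbit_per_dir = lanes * gt_per_s    # 400 Gbit/s one way
gbyte_per_dir = gbit_per_dir / 8   # 50 GB/s one way
bidir = gbyte_per_dir * 2
print(bidir)                       # 100.0 (GB/s bidirectional)
```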
     
  15. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
GA100 doesn't require any special cache treatment unless you want to reach the absolute maximum performance for a single-GPU config, but that's a special case of low-level opts to get max perf.
This is a single GPU and it's programmed accordingly; it has nothing in common with the two separate GPUs in the MI250.
     
    pharma, DegustatoR and PSman1700 like this.
  16. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
Well duh, welcome to the very definition of NUMA land.
You can treat those things as one bigass GPU.
You can treat the entire node as one bigass APU.
Unless you want to reach the absolute maximum performance for a single GPU config, that is.
     
  17. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,622
NUMA has nothing to do with this. Apparently A100 has full-speed access to all memory banks without any optimizations; the only difference can be in cache latencies.

Duh oh ah, you can treat an 8-GPU DGX system as one bigass GPU, what news!
     
  18. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,393
And you will likely get way less than A100 performance by treating them this way.
Which kinda makes no sense from any perspective other than proving some point to someone who's not even here.
     
    pharma and OlegSH like this.
  19. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    603
    Likes Received:
    1,122
With the different bandwidths between GCDs and HBM, it's not "one bigass GPU". In fact there are 8 or 16 independent GPUs attached with IF, and even then there are huge differences within the construct.
Either MI200 is one year too late, or the US government has become impatient to build Frontier. Are they even able to get 50% sustained performance? AMD's numbers against A100 point in the direction of less than 50% real FP64 performance...
     
  20. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Totally not a misery point I swear to god.
    Who knows haha.
    what the fuck.
    membw says hello.
     