AMD CDNA Discussion Thread

Discussion in 'Architecture and Products' started by Frenetic Pony, Nov 16, 2020.

  1. OlegSH

    OlegSH Regular

    For sure, all on-chip networks have variable latencies, nobody calls this NUMA.
     
    DavidGraham and pharma like this.
  2. Bondrewd

    Bondrewd Veteran

    Bad news: A100 is way less funny than usual.
     
    Tarkin1977 likes this.
  3. trinibwoy

    trinibwoy Meh Legend

    I'm sure you're right but there's so much hype around SYCL right now as the one language to rule them all. But there was a lot of hype for OpenCL at one time too so....

    It's still an open question, then, which stack people will use to get the most out of Frontier and El Capitan. ROCm seems very raw still. Ironically, the "Radeon" in ROCm doesn't really fit any more.
     
    Last edited: Nov 9, 2021
    PSman1700 likes this.
  4. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■) Moderator Legend Alpha

    Can folks remain civil and stop with the petty bickering that does not improve any discussion?
     
    Krteq, Malo, Lightman and 2 others like this.
  5. Granath

    Granath Newcomer

  6. Esrever

    Esrever Regular

    Without any Nvidia exascale supercomputers being built, how is anyone even going to validate any of the things being said without any data?
     
  7. Leoneazzurro5

    Leoneazzurro5 Regular

    Well, if we ever get a number for sustained FLOPs, then we can speak about efficiency. Quite frankly, even if the FP64 efficiency of MI200 were half that of A100, MI200 would still be more efficient at FP64, and I don't think that figure is beyond reach.
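    The arithmetic behind that claim is easy to check against the published peak figures (a sketch assuming A100's 9.7 TFLOPS and MI250X's 47.9 TFLOPS vector FP64 peaks; the 50% efficiency split is the post's hypothetical, not a measurement):

```python
# Published peak FP64 vector throughput, in TFLOPS (vendor spec sheets).
A100_PEAK_FP64 = 9.7     # NVIDIA A100, vector FP64
MI200_PEAK_FP64 = 47.9   # AMD MI250X, vector FP64

# Hypothetical from the post: MI200 sustaining only half the efficiency
# of an A100 running at full peak.
mi200_sustained = MI200_PEAK_FP64 * 0.5   # 23.95 TFLOPS
a100_sustained = A100_PEAK_FP64 * 1.0     # 9.7 TFLOPS

# Even with that handicap, MI200's sustained FP64 rate comes out ahead.
assert mi200_sustained > a100_sustained
print(f"MI200 at 50% efficiency: {mi200_sustained:.2f} TFLOPS "
      f"vs A100 at 100%: {a100_sustained:.2f} TFLOPS")
```

    In other words, the theoretical peak gap is large enough that MI200 could lose half its efficiency to A100 and still win on raw FP64 throughput.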
     
  8. xpea

    xpea Regular

    Wow, this one is priceless...
    As of today, A100's HPC/ML scaling performance is very well known, as it's the undisputed leader in this field. Just look at the Top500 list and the MLPerf website, or simply google A100 benchmarks; there are hundreds of pages...
    On the other side, except for a few selected benchmarks from AMD's presentation yesterday, we have nothing/nada/zip yet about MI200. I mean from an unbiased source in the real world...
     
    pharma likes this.
  9. trinibwoy

    trinibwoy Meh Legend

    Didn't AMD share a bunch of FP64 benchmarks showing MI200 well ahead?

    It's crazy that a dual-die MI200 only has 7% more transistors than A100. A100's count is probably inflated by its 40MB of L2 cache versus 16MB on MI200. I assume that's what those tweets are referring to.
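    The 7% figure checks out against the commonly cited die stats (a sketch assuming GA100's 54.2B transistors and MI250X's 58.2B across both GCDs, and 40 MB of L2 on A100 versus 2 × 8 MB on MI200):

```python
# Commonly cited transistor counts, in billions; treat as approximate.
GA100_XTORS = 54.2    # NVIDIA GA100 die
MI250X_XTORS = 58.2   # AMD MI250X, both GCDs combined

ratio = MI250X_XTORS / GA100_XTORS - 1.0
print(f"MI200 has ~{ratio:.1%} more transistors than A100")  # ~7.4%

# On-die L2 SRAM is one place the budgets diverge:
a100_l2_mb = 40        # A100: 40 MB L2
mi200_l2_mb = 2 * 8    # MI200: 8 MB L2 per GCD, two GCDs
assert a100_l2_mb > mi200_l2_mb
```

    So the dense SRAM in A100's much larger L2 plausibly accounts for a chunk of its transistor budget, which supports the "inflated count" reading.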
     
    Lightman likes this.
  10. Bondrewd

    Bondrewd Veteran

    Truly since NV won exactly 0 bids.
    Yea.
    It just has more memory per SM overall.
    A 192KiB combined L1/shmem slab versus 16+64 KiB for CDNA2.
     
    Krteq and Lightman like this.
  11. pharma

    pharma Veteran

    Take a look outside the US, specifically in the EU.

    We will know soon enough whether it's a stunt and whether that's the reason to move on to MI300 asap.
     
    Last edited: Nov 10, 2021
  12. DavidGraham

    DavidGraham Veteran

    No more than 2.5X ahead, despite being theoretically almost 5X faster. Some benchmarks don't even get beyond a 1.6X margin.
     
  13. Bondrewd

    Bondrewd Veteran

    The only quasi-announced EU exascale uses SiPearl + Ponte Vecchio...
    You can't win a Summit successor with a 'stunt'.
    The 'reason' is AMD cranking ~6Q prod to prod in DC GPUs.
    Been like that since Vega20.
     
  14. DavidGraham

    DavidGraham Veteran

     
    Last edited: Nov 10, 2021
    xpea and pharma like this.
  15. trinibwoy

    trinibwoy Meh Legend

    Presumably the people ordering these things aren’t idiots and MI200 had to be more than a benchmark queen to get the nod. But stranger things have happened.
     
    Lightman likes this.
  16. pharma

    pharma Veteran

    That's what I would assume. But the tweet above implies AMD is still not including pertinent specs in the white paper, which is somewhat unusual this late.
     
  17. Lurkmass

    Lurkmass Regular

    The source language doesn't matter as much as a cross-vendor intermediate bytecode representation, or the lack thereof. Even if vendors did agree on a common source language like SYCL, its progress would stall thereafter, since vendors can't see eye to eye on what the supported driver ingestion format should be. AMD and Intel will never agree to support PTX assembly as the standard format for compute kernel binaries, since it's a sub-optimal abstraction for their hardware in terms of performance. Nvidia will never agree to accept any other format either, because they don't want the software ecosystem advantage they've built up over the years to become redundant; why should they force themselves onto a level playing field with others when they can keep being on top?

    If the industry did start participating in the SYCL technical specifications, it would end up in the same deadlock that OpenCL was mired in. If adoption of SYCL is contingent on corporations making compromises to their own detriment for the greater goal of achieving portability, then OpenCL is the example of what the lack of such compromises produces ...

    You won't be thrilled with the answer, but developers will have to use the ROCm stack regardless, because it's the most production-ready option on AMD HW. Mesa's Clover project isn't functional yet. Others can try to make their own stack by studying ROCm itself, but that's far from ideal, since the public documentation is bad and the code is hard to follow without being a former AMD employee, so they may still have to do some reverse engineering despite it being an open-source project. ROCm is the only viable option left standing because it's a project that AMD officially supports, and it's part of their long-term corporate commitment as well, so if they don't ditch it by the end of this decade, ROCm will eventually reach maturity ... (ROCm was only made public a little over 5 years ago with the initial release.)

    Whether developers will want to maintain compatibility with multiple compute stacks is another problem altogether but given the politics they'll have no choice but to bite the bullet if they want to expand their customer base ...
     
    Lightman, trinibwoy and xpea like this.
  18. xpea

    xpea Regular

    Nearly every week, a government supercomputer with A100s is installed somewhere around the world...
    Nov 1st, Texas Advanced Computing Center (TACC):
    https://www.hpcwire.com/2021/11/01/tacc-unveils-lonestar6-supercomputer/
    Oct 18th, UAE National Center for meteorology:
    https://www.hpcwire.com/off-the-wir...ecasting-with-new-supercomputer-built-by-hpe/
    Sept 28th, Department of Energy’s National Nuclear Security Administration Tri-Lab CTS-2
    https://www.hpcwire.com/2021/09/28/nnsa-selects-dell-for-40m-cts-2-commodity-computing-contract/
    Sept 16th, Queen Máxima of the Netherlands
    https://www.hpcwire.com/off-the-wir...s-inagurates-supercomputer-for-dutch-science/

    and so on and so on...
     
    DegustatoR likes this.
  19. Bondrewd

    Bondrewd Veteran

    Yeah, but it's the Summit replacement, aka a pretty balanced system, so there ain't no way it actually sucks™ at real sciences.

    Where's the NV exascale?

    Even Intel won (effectively) 2 systems.
     
  20. troyan

    troyan Regular

    AMD has even shown as little as a 1.4x improvement over A100. One GCD is nearly as big as GA100 (*) while offering barely better FP64 performance, with less cache, on-chip bandwidth and off-chip interconnect.

    I'm curious about the PCIe version. Let's see what AMD can do with 300W.

    *
     