AMD CDNA Discussion Thread

Discussion in 'Architecture and Products' started by Frenetic Pony, Nov 16, 2020.

  1. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    ^^THIS
    I suspect the two-year-old A100 still beats MI200 in large AI/ML workloads where interconnect performance is the bottleneck. As I said previously, AMD saw a business opportunity in the traditional (and dying) FP64 HPC market, with the government exascale race and before Hopper availability. They executed well. Kudos to Lisa Su.


    It simply shows its GCN roots and the efficiency related to this old architecture. It's much more power efficient to use ML, tensors and lower precision to solve large scientific problems. Even the traditional FP64 workloads, like weather simulation, are migrating to ML. FP64 is now a niche and AMD must move quickly to a new arch. Maybe MI300...

    On a final note, am I the only one disappointed by MI200? We all know that it was on a tight schedule to win the exascale race, but still, except for the packaging (not even proprietary to AMD; the equivalent InFO-L is available at TSMC), MI200 brings nothing new. The two-year-old A100 is more feature packed. No sparsity! Few and slow interconnect links, and so on... In fact, it has huge flaws, as we can see in the typical HPC 4+1 (GPU+CPU) topology promoted by AMD, where not even all GCDs are linked! Going from the claimed 3.2TB/s to a mere 100GB/s bidirectional will look ugly in real-world performance with large datasets... It's no surprise that all AMD benchmarks are with a single MI250X vs a single A100. I guess Nvidia will fire back soon to show how A100 scaling beats MI250X in bandwidth-limited scenarios. Maybe even something new in a few hours at GTC 2021...

    Edit: typo
     
    #221 xpea, Nov 9, 2021
    Last edited: Nov 9, 2021
    pharma likes this.
  2. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    The best feature they brought back with the MI200 series, for the first time since the stillborn HSA project, is hardware-accelerated coherent unified memory interop with x86 CPUs. There are a lot of high-end compute systems where GPU acceleration wasn't feasible because customers didn't want to compromise by making too many invasive codebase changes or sacrificing performance targets in other parts of the system, or because they were often facing high CPU-GPU communication overhead ...

    Programming for another architecture such as IBM Power just to take advantage of NVLink wasn't an attractive concept and sometimes it meant a regression in CPU performance as well since IBM didn't have aggressive release schedules. In other cases opting to use GPU acceleration with PCIE wasn't ideal since PCIE became the bottleneck ...
     
    Lightman likes this.
  3. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    In these large HPC/ML systems, it's all about time to market and bottlenecks when scaling. For me, MI200 is a regression where each GCD accesses its HBM pool with 1.6TB/s bandwidth and goes off-die at 50GB/s... 32 times slower...
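As a quick check of the ratio quoted above, here is a minimal sketch that uses only the two figures from the post (1.6TB/s of HBM bandwidth per die and 50GB/s off-die). The variable names are illustrative, and decimal units (1TB/s = 1000GB/s) are an assumption:

```python
# Figures as quoted in the post; decimal units assumed (1 TB/s = 1000 GB/s).
hbm_bandwidth_gbs = 1600      # per-die HBM bandwidth, in GB/s (1.6 TB/s)
off_die_bandwidth_gbs = 50    # off-die link bandwidth as quoted, in GB/s

# How many times faster local HBM access is than going off-die.
ratio = hbm_bandwidth_gbs / off_die_bandwidth_gbs
print(f"Local HBM is {ratio:.0f}x faster than the off-die link")
```

This only reproduces the arithmetic in the post; it says nothing about how a real workload would split its traffic between local HBM and the link.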
     
  4. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    For many systems, I can see the MI250X being the no compromise option for those that didn't use GPU acceleration up until now. The alternatives meanwhile did have compromises like requiring a software rewrite (NVLink), changing to a system with lower CPU perf (IBM Power), or low interconnect perf (PCIE) ...

    MI250X with the highest-end server x86 CPUs today is the ideal solution for heterogeneous compute, and there's no platform that comes close to its capabilities outside of outdated IBM Power9/Nvidia Volta systems. High CPU perf and coherent CPU-GPU interconnect perf are paramount to a heterogeneous system, more so than pure GPU perf alone ...
     
    Lightman and no-X like this.
  5. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    You talk about software rewrite but CUDA is the standard for parallel workloads and it has been for years (NVIDIA GTC conference starting today has 200k registrations and CUDA has more than 3 million registered developers). Up to now, AMD is a no show, no go. So let's get back to reality, shall we ?

    To be successful, it's not only a question of hardware capability but, more importantly, a question of software tools, APIs, documentation, training, seminars, addressable market and so on. AMD's current software state is a misery land that nobody wants to walk through. An incomplete mess that is thrown at the open source community, praying that someone someday will do the job for AMD. Basically a recipe for failure.
    Just as a reminder, it took AMD two and a half years to (barely) support Navi in ROCm via the ROCr OpenCL runtime path. TWO AND A HALF YEARS, FOR GOD'S SAKE :no:
     
  6. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    Considering how much more prolific the incumbency of x86 software is compared to CUDA software, I'd be willing to think that the minority that's already using CUDA is willing to throw in the towel in favour of more official solutions like ROCm or oneAPI, if the ever-changing paradigm of graphics programming moving to new APIs serves as an example ...

    Well I guess AMD lucked out because most potential customers didn't care about GPU acceleration in the past so ROCm just like CUDA is equally of no value to them currently!



    The above illustrates the concept behind why the MI250X is a no compromise heterogeneous system ...
     
  7. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Please read carefully. I said CUDA is the standard for parallel computing. ROCm and oneAPI are not "official", they are currently... nothing

    So Nvidia built a datacenter business growing more than 100% Y/Y, with revenue that will exceed 10 billion dollars this year, and nobody cared. Yeah sure :roll:
     
  8. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Oh noes he's both seething and LARPing again.
    Goddamit
     
  9. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    ROCm or oneAPI might as well be the only official APIs since they offer superior interop with x86 CPUs. The few who were using CUDA are more likely to drop it rather than move off of x86 because there's more potential performance to extract with other APIs than changing CPU architectures ...

    Where are you getting your numbers ? NV's total DC revenue was slightly less than $3B for FY 2020. They'll barely be able to make it above $6B for FY 2021 if they can. GPU acceleration is still the clear distant second compared to pure x86 systems ...
     
  10. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    Sorry, but the HPC market isn't dying, it's growing. Not as fast as the AI market, but definitely not dying.
     
    Lightman likes this.
  11. tsa1

    Newcomer

    Joined:
    Oct 8, 2020
    Messages:
    89
    Likes Received:
    97
    I wonder how we can use 'superior' ML to solve problems where FP64 is the minimum required precision and a single error in the Nth decimal place will make the equations unsolvable...
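To make the precision point concrete, here is a minimal sketch of how a small term that FP64 keeps is lost entirely once a value is rounded to FP32. The helper `to_float32` is a hypothetical name introduced for illustration, and the example shows only rounding behaviour, not any particular solver or ML method:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]

small = 1e-8                         # a correction in the 8th decimal place

fp64_sum = 1.0 + small               # computed and kept in FP64
fp32_sum = to_float32(1.0 + small)   # the same result rounded to FP32

print(fp64_sum == 1.0)  # False: FP64 still sees the correction
print(fp32_sum == 1.0)  # True: FP32 has rounded it away
```

FP32 carries roughly 7 significant decimal digits against roughly 16 for FP64, so any contribution below that threshold disappears once values are stored at the lower precision.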
     
    Lightman likes this.
  12. Granath

    Newcomer

    Joined:
    Jul 26, 2021
    Messages:
    80
    Likes Received:
    82
    HPC exists because there are laws of physics described by equations. And for complex systems it's not possible to solve them by hand.
    Is that going to change overnight?
     
  13. Granath

    Newcomer

    Joined:
    Jul 26, 2021
    Messages:
    80
    Likes Received:
    82
    NumPy replacement by cuNumeric. Well, Nvidia can deliver; they understand that software is everything.
    AMD is far behind and I wonder if they can ever catch up.
     
  14. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    Off topic, but cuNumeric isn't actually a replacement for NumPy; think of it as a library for automatic NumPy acceleration on mGPU or multi-node systems ...
     
    #234 Lurkmass, Nov 9, 2021
    Last edited: Nov 9, 2021
  15. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    this
     
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    That can’t be true, otherwise El Capitan and Frontier wouldn’t exist. ROCm is basically non-existent in the industry at large, but maybe folks working on supercomputers don’t care about that. Software is king, so they must believe ROCm will mature into something useful one day.

    Nvidia on the other hand is out in the cold as the only platform without a coherent CPU to GPU interface. That puts them in a tough spot in the HPC game.
     
  17. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    2.4B last quarter in datacenter is ~10B yearly without counting any growth. But the reality is that this business is growing too fast to use a fixed number...
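The run-rate arithmetic behind that estimate can be sketched as follows, using only the quarterly figure quoted in the post. The variable names are illustrative, and flat quarter-over-quarter revenue is the stated simplification:

```python
# Quarterly datacenter revenue as quoted in the post, in billions of USD.
quarterly_revenue_busd = 2.4

# Annualized run rate assuming four identical quarters (no growth).
annualized_busd = quarterly_revenue_busd * 4
print(f"Annualized run rate: ~{annualized_busd:.1f}B USD")
```

With no growth assumed, the run rate comes out slightly under the rounded ~10B figure.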
     
  18. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Part of the price of these exascale systems is a $300 million government investment in the SYCL effort... So yeah, it proves that ROCm is nearly useless

    Grace-Hopper says hello
     
  19. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    2023, same Q as MI300 but way more barebones.
    NV gotta try harder, maybe they'll make it on time in 2025
     
  20. Let's see if the Xilinx merger gets through in the next week or two. If it does, I bet AMD will slot in a new MI accelerator series by rebranding some Xilinx domain-specific accelerator.
    SmartNIC will be the new IPU as well.
    Portfolio will expand overnight
     
    Lightman likes this.