AMD CDNA Discussion Thread

CUDA, ROCm, and oneAPI software stacks are all built and specialized to extract maximum perf from each vendor's unique HW, so forcing them to converge is a poor fit when the vendors have different priorities and standards ...

I'm sure you're right but there's so much hype around SYCL right now as the one language to rule them all. But there was a lot of hype for OpenCL at one time too so....

It's then still an open question of what stack people will use to get the most out of Frontier and El Capitan. ROCm seems very raw still. Ironically the "Radeon" in ROCm doesn't really fit any more.
 
Without any Nvidia exascale supercomputers being built, how is anyone even going to validate any of the things being said without any data?
 
Well, once we get a number for sustained FLOPs we can talk about efficiency. Quite frankly, even if the FP64 efficiency of MI200 were half that of A100, MI200 would still be more efficient at FP64, and I think that figure is not beyond reach.
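
To put rough numbers on that claim, here is a back-of-the-envelope sketch. It assumes "MI200" means the MI250X (47.9 TFLOPS peak FP64 vector, 560 W) compared against the A100 SXM4 (9.7 TFLOPS peak FP64 without tensor cores, 400 W), and that "efficiency" means sustained throughput as a fraction of peak. The utilization value is purely hypothetical, and using A100's FP64 tensor-core peak instead would change the picture.

```python
# Spec-sheet peaks (FP64 vector TFLOPS) and board power (W).
# Assumption: MI250X stands in for "MI200"; A100 is the SXM4 part.
MI250X_PEAK_TFLOPS = 47.9
MI250X_TDP_W = 560
A100_PEAK_TFLOPS = 9.7
A100_TDP_W = 400


def sustained_gflops_per_watt(peak_tflops, tdp_w, utilization):
    """Sustained GFLOPS per watt when running at `utilization` of peak."""
    return peak_tflops * 1000 * utilization / tdp_w


# Hypothetical utilization for A100; MI250X is assumed to reach only half of it.
a100_utilization = 0.9
mi250x_utilization = a100_utilization / 2

a100_eff = sustained_gflops_per_watt(A100_PEAK_TFLOPS, A100_TDP_W, a100_utilization)
mi250x_eff = sustained_gflops_per_watt(MI250X_PEAK_TFLOPS, MI250X_TDP_W, mi250x_utilization)

# Ratio > 1 means MI250X still delivers more sustained FP64 per watt.
ratio = mi250x_eff / a100_eff
print(f"A100:   {a100_eff:.1f} GFLOPS/W")
print(f"MI250X: {mi250x_eff:.1f} GFLOPS/W")
print(f"MI250X / A100: {ratio:.2f}")
```

Under those assumptions the ratio stays above 1, and it does not depend on the utilization value picked for A100 because that factor cancels out.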
 
Without any Nvidia exascale supercomputers being built, how is anyone even going to validate any of the things being said without any data?
Wow this one is priceless...
As of today, A100 HPC/ML scaling performance is very well known, as it's the undisputed leader in this field. Just look at the TOP500 list and the MLPerf website, or simply google A100 benchmarks; there are hundreds of pages...
On the other side, except for a few selected benchmarks from AMD's presentation yesterday, we have nothing/nada/zip yet about MI200. I mean from an unbiased source in the real world...
 
Didn't AMD share a bunch of FP64 benchmarks showing MI200 well ahead?

It's crazy that a dual-die MI200 only has 7% more transistors than A100? A100's count is probably inflated due to having 40MB L2 cache vs 16MB on MI200. I assume that's what those tweets are referring to.
 
Truly since NV won exactly 0 bids.
Take a look outside the US, specifically in the EU.

We will know soon enough whether it's a stunt and whether that's the reason to move on to MI300 asap.
 
Presumably the people ordering these things aren’t idiots and MI200 had to be more than a benchmark queen to get the nod. But stranger things have happened.
That's what I would assume. But the tweet above implies AMD is still not including pertinent specs in the white paper, which is somewhat unusual at this late date.
 
I'm sure you're right but there's so much hype around SYCL right now as the one language to rule them all. But there was a lot of hype for OpenCL at one time too so....

The source language doesn't matter as much as a cross-vendor intermediate bytecode representation, or the lack thereof. Even if vendors did agree to a common source language like SYCL, its progress would stall thereafter since vendors can't see eye to eye on what the supported driver ingestion format should be. AMD and Intel will never agree to support PTX assembly as the standard format for compute kernel binaries since it's a sub-optimal abstraction for their hardware in terms of performance. Nvidia will never agree to accept any other format either, because they don't want to make the software ecosystem advantage they've built up over the years redundant. Why should they force themselves onto a level playing field with others when they can keep being on top?

If the industry did start participating in the SYCL technical specifications, it would end up in the same deadlock that OpenCL was mired in. If adoption of SYCL is contingent on other corporations making compromises to their own detriment for the greater goal of portability, then we can take OpenCL as the example of what a lack of compromise leads to ...

It's then still an open question of what stack people will use to get the most out of Frontier and El Capitan. ROCm seems very raw still. Ironically the "Radeon" in ROCm doesn't really fit any more.

You won't be thrilled with the answer, but developers will have to use the ROCm stack regardless because it's the most production-ready option on AMD HW. Mesa's clover project isn't functional yet. Others can try to make their own stack by looking at ROCm itself, but that's far from ideal since public documentation is bad and it'll be hard to follow the code without being a former AMD employee, so they may still have to do some reverse engineering despite it being an open source project. ROCm is the only viable one left standing because it's a project that AMD officially supports and it's part of their long-term corporate responsibility as well, so if they don't ditch it by the end of this decade then ROCm will eventually reach maturity ... (ROCm was only made public a little over 5 years ago with the initial release)

Whether developers will want to maintain compatibility with multiple compute stacks is another problem altogether but given the politics they'll have no choice but to bite the bullet if they want to expand their customer base ...
 
Nearly every week a government supercomputer with A100 is installed somewhere around the world...
Nov 1st, Texas Advanced Computing Center (TACC):
https://www.hpcwire.com/2021/11/01/tacc-unveils-lonestar6-supercomputer/
Oct 18th, UAE National Center for meteorology:
https://www.hpcwire.com/off-the-wir...ecasting-with-new-supercomputer-built-by-hpe/
Sept 28th, Department of Energy’s National Nuclear Security Administration Tri-Lab CTS-2
https://www.hpcwire.com/2021/09/28/nnsa-selects-dell-for-40m-cts-2-commodity-computing-contract/
Sept 16th, Queen Máxima of the Netherlands
https://www.hpcwire.com/off-the-wir...s-inagurates-supercomputer-for-dutch-science/

and so on and so on...
 
No more than 2.5X ahead, despite being theoretically almost 5X faster. Some benches don't even advance beyond the 1.6X margin.
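
For anyone who wants to see how far those results sit from the paper numbers, a quick sketch follows. It assumes the "almost 5X" refers to peak FP64 vector throughput of the MI250X (47.9 TFLOPS) against the A100 (9.7 TFLOPS); the helper name is made up for illustration, and the 2.5X and 1.6X figures are the ones quoted above.

```python
# Spec-sheet peak FP64 vector TFLOPS.
# Assumption: "almost 5X" compares MI250X against A100 without tensor cores.
MI250X_PEAK = 47.9
A100_PEAK = 9.7

# Paper advantage of MI250X over A100.
theoretical = MI250X_PEAK / A100_PEAK


def fraction_realized(measured_speedup):
    """Share of the theoretical advantage that a measured speedup represents."""
    return measured_speedup / theoretical


print(f"theoretical advantage: {theoretical:.2f}X")
for measured in (2.5, 1.6):
    share = fraction_realized(measured)
    print(f"{measured}X measured = {share:.0%} of the paper advantage")
```

On those assumptions the best result realizes roughly half of the theoretical gap and the weakest about a third.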

AMD has even shown improvements as low as 1.4X over A100. One GCD is nearly as big as GA100 (*) while offering barely better FP64 performance with less cache, less on-chip bandwidth and less off-chip interconnect.

I'm curious about the PCIe version. Let's see what AMD can do with 300W.

*
 