AMD CDNA Discussion Thread

OlegSH · Nov 9, 2021

Bondrewd said:
Totally not a misery point I swear to god.

For sure, all on-chip networks have variable latencies, nobody calls this NUMA.

Bondrewd · Nov 9, 2021

OlegSH said:
For sure, all on-chip networks have variable latencies

Bad news: A100 is way less funny than usual.

trinibwoy · Nov 9, 2021

Lurkmass said:
CUDA, ROCm, and oneAPI software stacks are all meant to be built and specialized for extracting maximum perf behind each vendors unique HW so forcing either to converge is ill-suited when they have different priorities and standards ...

I'm sure you're right but there's so much hype around SYCL right now as the one language to rule them all. But there was a lot of hype for OpenCL at one time too so....

It's then still an open question of what stack people will use to get the most out of Frontier and El Capitan. ROCm seems very raw still. Ironically the "Radeon" in ROCm doesn't really fit any more.

BRiT · Nov 9, 2021

Can folks remain civil and stop with the petty bickering that does not improve any discussions.

Granath · Nov 9, 2021

Esrever · Nov 9, 2021

Without any nvidia exascale supercomputers being built, how is anyone even going validate any of the things being said without any data?

Leoneazzurro5 · Nov 9, 2021

Well, if we even will get a number about sustained FLOPs then we could speak about efficiency. Quite frankly, even if the FP64 efficiency of MI200 was half of A100, MI200 would be anyway more efficient at FP64, and I think that figure is not beyond reach.

xpea · Nov 9, 2021

Esrever said:
Without any nvidia exascale supercomputers being built, how is anyone even going validate any of the things being said without any data?

Wow this one is priceless...
As of today, A100 HPC/ML scaling performance is very well know as it's the undisputed leader in this field. Just look at HP500 list and MLperf website, or simply google A100 benchmarks, they are hundreds of pages...
On the other side, except few selected benchmarks from AMD yesterday presentation, we have nothing/nada/zip yet about MI200. I mean from unbiased source in the real world...

trinibwoy · Nov 9, 2021

Didn't AMD share a bunch of FP64 benchmarks showing MI200 well ahead?

It's crazy that a dual-die MI200 only has 7% more transistors than A100? A100's count is probably inflated due to having 40MB L2 cache vs 16MB on MI200. I assume that's what those tweets are referring to.

Bondrewd · Nov 9, 2021

xpea said:
Wow this one is priceless...

Truly since NV won exactly 0 bids.

trinibwoy said:
idn't AMD share a bunch of FP64 benchmarks showing MI200 well ahead?

Yea.

trinibwoy said:
A100's count is probably inflated due to having 40MB L2 cache vs 16MB on MI200

It just overall has more memories per SM.
192KiB L1/shmem slab versus 16+64 for CDNA2.

pharma · Nov 10, 2021

Bondrewd said:
Truly since NV won exactly 0 bids.

Take a look outside the US, specifically in the EU.

We will know soon enough whether it's a stunt and whether that's the reason to move on to MI300 asap.

DavidGraham · Nov 10, 2021

trinibwoy said:
New Didn't AMD share a bunch of FP64 benchmarks showing MI200 well ahead?

No more than 2.5X ahead, despite being theoretically almost 5X faster. Some benches don't even advance beyond the 1.6X margin.

Bondrewd · Nov 10, 2021

pharma said:
Take a look outside the US, specifically in the EU

The only quasi-announced EU exascale uses SiPearl + Ponte Vecchio...

pharma said:
We will know soon enough whether it's a stunt

You can't win a Summit successor with a 'stunt'.

pharma said:
whether that's the reason to move on to MI300 asap.

The 'reason' is AMD cranking ~6Q prod to prod in DC GPUs.
Been like that since Vega20.

DavidGraham · Nov 10, 2021

https://twitter.com/i/web/status/1458159517590384640

https://twitter.com/i/web/status/1458197980293566467

https://twitter.com/i/web/status/1458151728075849728

trinibwoy · Nov 10, 2021

Bondrewd said:
You can't win a Summit successor with a 'stunt'.

Presumably the people ordering these things aren’t idiots and MI200 had to be more than a benchmark queen to get the nod. But stranger things have happened.

pharma · Nov 10, 2021

trinibwoy said:
Presumably the people ordering these things aren’t idiots and MI200 had to be more than a benchmark queen to get the nod. But stranger things have happened.

That's what I would assume. But the tweet above implies AMD still not including pertinent specs in the white paper, which is somewhat unusual at this late date.

Lurkmass · Nov 10, 2021

trinibwoy said:
I'm sure you're right but there's so much hype around SYCL right now as the one language to rule them all. But there was a lot of hype for OpenCL at one time too so....

The source language doesn't matter as much as a cross-vendor intermediate bytecode representation or lack thereof. Even if vendors did agree to a common source language like SYCL it's progress would be stalled thereafter since vendors can't see eye to eye on what the supported driver ingestion format should be. AMD and Intel will never agree to support PTX assembly as the standard format for compute kernel binaries since it's a sub-optimal abstraction for their hardware in terms of performance. Nvidia will never agree to accept any other format either because they don't want to make their existing software ecosystem advantage that they've built up over the years to be redundant so why should they force themselves to be at level playing field with others when they can keep being on top ?

If the industry did start participating on SYCL technical specifications, it would end up in being the same deadlock that OpenCL was mired in. If adoption behind SYCL is contingent on other corporations making compromises at their own detriment for the greater goal of achieving portability then we take OpenCL to be the example of an end result from the lack of compromises ...

trinibwoy said:
It's then still an open question of what stack people will use to get the most out of Frontier and El Capitan. ROCm seems very raw still. Ironically the "Radeon" in ROCm doesn't really fit any more.

You won't be thrilled with the the answer but developers will have to use the ROCm stack regardless because it's the most production ready option on AMD HW. Mesa's clover project isn't functional yet. Others can try to make their own stack by looking at ROCm itself but that's far from ideal since public documentation is bad and it'll be hard to follow the code without being a former AMD employee so they may still have to do some reverse engineering despite being an open source project. ROCm is the only one left standing as being viable because it's a project that AMD officially supports and it's a part of their long term corporate responsibility as well so if they don't ditch it by the end of this decade then ROCm will eventually reach maturity ... (ROCm was only made public a little over 5 years ago with the initial release)

Whether developers will want to maintain compatibility with multiple compute stacks is another problem altogether but given the politics they'll have no choice but to bite the bullet if they want to expand their customer base ...

xpea · Nov 10, 2021

nearly every week a government supercompurter is installed with A100 around the world...
Nov 1st, Texas Advanced Computing Center (TACC):
https://www.hpcwire.com/2021/11/01/tacc-unveils-lonestar6-supercomputer/
Oct 18th, UAE National Center for meteorology:
https://www.hpcwire.com/off-the-wir...ecasting-with-new-supercomputer-built-by-hpe/
Sept 28th, Department of Energy’s National Nuclear Security Administration Tri-Lab CTS-2
https://www.hpcwire.com/2021/09/28/nnsa-selects-dell-for-40m-cts-2-commodity-computing-contract/
Sept 16th, Queen Máxima of the Netherlands
https://www.hpcwire.com/off-the-wir...s-inagurates-supercomputer-for-dutch-science/

and so on and so on...

Bondrewd · Nov 10, 2021

trinibwoy said:
But stranger things have happened.

Yeah but is the Summit aka a pretty balanced system replacement so ain't no way it actually sucks™ at real sciences.

Where's the NV exascale?

Even Intel won (effectively) 2 systems.

troyan · Nov 10, 2021

DavidGraham said:
No more than 2.5X ahead, despite being theoretically almost 5X faster. Some benches don't even advance beyond the 1.6X margin.

AMD's has shown even a low 1.4 improvement over A100. One GCD is nearly as big as GA100 (*) while offering barely better FP64 performance with less cache, on chip bandwidth and off chip interconnection.

I'm curios about the PCIe version. Lets see what AMD can do with 300W.

*

https://twitter.com/i/web/status/1458132215221702662

AMD CDNA Discussion Thread

OlegSH

Bondrewd

trinibwoy

Meh

BRiT

(>• •)>⌐■-■ (⌐■-■)

Granath

Esrever

Leoneazzurro5

xpea

trinibwoy

Meh

Bondrewd

pharma

DavidGraham

Bondrewd

DavidGraham

trinibwoy

Meh

pharma

Lurkmass

xpea

Bondrewd

troyan