AMD CDNA Discussion Thread

FP64 will always be inefficient. That is not a focus problem, it is reality. Nvidia solved this with their Tensor Cores. That is the reason why GA100 delivers 2.3x more performance than GV100 within the same power consumption.
Believing that CDNA2 will be 5x or 6x more efficient than CDNA1 for HPC workloads is ridiculous. MI100 only delivers ~7 TFLOPS within 300W; the 250W PCIe A100 is around 12.8 TFLOPS with Tensor Cores:
MI100: https://www.delltechnologies.com/en-us/blog/finer-floating-points-of-accelerating-hpc/
A100: https://infohub.delltechnologies.co...edge-r7525-servers-with-nvidia-a100-gpgpus-1/
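For reference, here's the perf/W gap those delivered numbers imply. This is a rough sketch that just divides the quoted throughputs by the board TDPs; actual power draw during a run will differ:

```python
# Perf-per-watt from the delivered figures quoted above:
# ~7 TFLOPS FP64 at 300 W (MI100) vs ~12.8 TFLOPS at 250 W (PCIe A100).
mi100 = 7.0 / 300    # TFLOPS per watt
a100 = 12.8 / 250

print(f"MI100: {mi100 * 1000:.1f} GFLOPS/W")
print(f"A100:  {a100 * 1000:.1f} GFLOPS/W")
print(f"A100 advantage: {a100 / mi100:.1f}x")
```

So by these numbers the A100 is roughly 2.2x more efficient, which is the whole point about Tensor Cores above.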

All this talk about AMD's 5nm products is far removed from reality. Apple's A14 is only 40% more efficient than the A12 on 7nm.
CDNA2 supposedly does full-rate FP64 and packed FP32 (so some FP32 code can run at twice the speed, but not all) and doubles the CU count compared to MI100 (each of the two chiplets having 128 CUs, like MI100).
AMD also confirmed 128 GB of HBM2e already.
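A back-of-the-envelope sketch of what that rumored config would mean for FP64 peak. Everything here is an assumption: the 2x128 CU layout is the rumor above, and the 64 FP64 lanes per CU, full-rate FMA, and ~1.5 GHz clock are my guesses, nothing confirmed:

```python
# Hypothetical CDNA2 FP64 peak under the rumored configuration.
cus = 2 * 128        # two chiplets, 128 CUs each (rumor)
lanes = 64           # assumed FP64 lanes per CU
flops_per_clock = 2  # FMA = multiply + add
clock_ghz = 1.5      # pure guess

peak_tflops = cus * lanes * flops_per_clock * clock_ghz / 1000
print(f"Hypothetical FP64 peak: {peak_tflops:.1f} TFLOPS")
```

That would land somewhere near 50 TFLOPS peak FP64, i.e. a large jump over MI100's 11.5, but nowhere near "5x or 6x more efficient" once you account for power.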
 
I think those are both important points.

A large scale purchase by the US Government is significant. There's no denying that. We've seen other governments follow the US (e.g. Australia) when it comes to DC procurement.
No one denies the importance of these deals. As I said, it's well done by AMD. But for now they only appear to be one-time, politically driven opportunities. You need more than that to flip the market. The proof is that one year later, the US government will bring online a Grace-Hopper supercomputer at Los Alamos National Laboratory:
https://www.lanl.gov/discover/news-release-archive/2021/April/0412-nvidia.php
And three more US government deals for Grace-Hopper supercomputers are nearly closed (announcements soon). The same goes for CSCS in Switzerland, which won't use AMD but Grace-Hopper:
https://www.cscs.ch/science/compute...orlds-most-powerful-ai-capable-supercomputer/
Why go Nvidia in 2023 if AMD will rule it all, as the one-liner suggests? The answer is simple: Nvidia will still be highly competitive, and CUDA won't die soon.

But even here at Oxford, Nvidia gives out Titan cards like free candy. They're making a big push in academia to keep CUDA relevant. But if government and industry start to move away from CUDA, then academia will follow...
Yes, Nvidia is pushing hard in academia, and it works: numerous startups are betting on CUDA. For AMD to succeed, great hardware is not enough. You must provide a commercial path after academia; otherwise no one will waste their time on unsupported hardware and/or without a widely accepted ecosystem. I've said it many times: NVIDIA is a software company first, and AMD must quadruple its software effort to get a chance of changing the market...
 
Bait again; quoting GEMM numbers again (dawg, they explicitly banned GEMM accelerators for Top500 HPL; see Perlmutter's Rmax getting halved).

Is this circlestrafing or what?
 

>120PF target went POOF.
The GPU partition, aka where the HPL bang-bang comes from, is here, and it's only 90PF.
 
Perlmutter Debuts in the Top 5 of the Top500 (nersc.gov)
June 29, 2021
In addition to the Top500 64.6 Pflop/s achievement in this latest round, Perlmutter recorded 1.91 HPCG-petaflops in the HPCG benchmark, earning it the #3 spot on that list; and a power efficiency of 25.55 gigaflops/watt in the Green500, which earned it the #6 spot on that list. It is also notable that these measurements were run using containers and NERSC's Shifter container runtime. Containers enabled various combinations of libraries and builds to be rapidly tested.

“We are pleased to see that Perlmutter is the one system in the top 5 of the Top500 list that is also in the top 10 of the Green500,” said Jay Srinivasan, the NERSC-9 (Perlmutter) project director at NERSC. “The confluence of high performance and power efficiency is a notable achievement.”
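Side note: the Green500 figure lets you back out the power draw during the HPL run itself. Rough estimate; it assumes the reported Rmax and GFLOPS/W come from the same run:

```python
# Implied power draw during Perlmutter's HPL run, from the numbers above.
rmax_gflops = 64.6e6  # 64.6 Pflop/s Rmax, in Gflop/s
green500 = 25.55      # Gflops/W from the Green500 listing

power_mw = rmax_gflops / green500 / 1e6  # watts -> megawatts
print(f"Implied power during HPL: {power_mw:.2f} MW")
```

That comes out to roughly 2.5 MW, which is where the later "6 MW vs 2.5" question in this thread comes from.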
 

>120PF target went POOF.
The GPU partition, aka where the HPL bang-bang comes from, is here, and it's only 90PF.

And how does the Rpeak get halved down to 90?
How do the A100s at 6144 x 9.7 TFLOPS FP64 and the 7763s at 1536 x ~2.5 TFLOPS, a combined theoretical peak of 63.4 PFLOPS, reach an Rmax of 64.6 PFLOPS?

How, if not by including Tensor math? I don't get it.
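The arithmetic behind that question, with the datasheet rates as the assumptions:

```python
# Theoretical peak from list-part datasheet rates vs the reported Rmax.
gpu_pf = 6144 * 9.7 / 1000  # A100s at the 9.7 TFLOPS FP64 (non-Tensor) rate
cpu_pf = 1536 * 2.5 / 1000  # EPYC 7763s at roughly 2.5 TFLOPS each

rpeak = gpu_pf + cpu_pf
rmax = 64.6
print(f"Rpeak from list parts: {rpeak:.1f} PF")
print(f"Implied HPL efficiency: {rmax / rpeak:.0%}")
```

An HPL efficiency above 100% is impossible, so either the parts aren't running at datasheet rates, or something beyond plain FP64 FMA throughput is in play.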

edit:
And how did your source not know about this half a year after A100's launch?
 
And how does the Rpeak get halved down to 90?
Cuz the planned one was close to Summit. Duh.
How do the A100s at 6144 x 9.7 TFLOPS FP64 and the 7763s at 1536 x ~2.5 TFLOPS, a combined theoretical peak of 63.4 PFLOPS, reach an Rmax of 64.6 PFLOPS?
List parts aren't the HPC ones here.
Gotta pamp some watts in.

You see, Rpeak for 400W A100 at 6k GPUs alone is >120PF.
 
Cuz the planned one was close to Summit. Duh.
Makes total sense to lowball to ">120" when I really mean >180. While others are down to a decimal digit... *doh*
List parts aren't the HPC ones here.
Gotta pamp some watts in.
LOL, still it's Rpeak vs. Rmax. Even if I grant around 20% more power/more TFLOPS just for shits and giggles, this is a ridiculously high efficiency.

Speaking of which: why does your sheet say 6 MW when its Rmax run drew only 2.5? Axed some nodes?
 
Major hopium it was, yeah.
Just like Aurora.
Nah, that'd be if they wrote >180 PFLOPS. ">120" is just lowballing.

YES.
A100 is unironically more efficient when you pump more watts into it.

In large scale clusters - of course. Losses through networking diminish.
But I'm talking Blue Gene-league efficiency of 85% Rmax-to-Rpeak. It doesn't work that way, except maybe for very small clusters.

Yep, CPU ones aren't here.
Nor are they good at HPL pumping anyway.
Altogether around 19.2 PFLOPS Rpeak; yes, not much compared to the accelerators, but 20% of the whole thing nevertheless.

edit:
So, to conclude: I don't see why you should not use your compute resources for HPL. I don't see any proof that Top500 has explicitly banned GEMM engines from its ranking. Can we please get back to the topic of CDNA/CDNA2?
 
Oh come on, it's the same slide that lists Aurora power at <60MW.
We're not debating Aurora. Or did you just discredit your own source?
Has it ever occurred to you that this slide of yours might be incorrect in more than one place?

You know, I'm asking because your slide is dated November 20th, 2020, right?
And then there's this PDF here, dated June 2020, five months earlier and just after the A100 launch:
https://www.energy.gov/sites/default/files/2020/06/f75/fy-2021-sc-ascr-cong-budget.pdf

Yeah, but the GPU numbers are still off, and we're still under target.
Funnily enough, it says on p.23:
"...and begin operations of the 75 petaflop NERSC-9 system, named Perlmutter after LBNL Nobel Laureate Saul Perlmutter."
 
Yeah, but the cards already started shipping to customers in Q2, launched or not.
For revenue or for bring-up? I'm asking because, you know, Intel's 10nm products have been "shipping to customers since 2017" (maybe they used a sailboat for that and had strong headwinds). So there clearly is a difference between shipping (samples for qualification and bring-up) and shipping (for actual market introduction).

Maybe AMD just replaced SPOCK systems at ORNL with some real ones for now.
 