AMD CDNA Discussion Thread

Kaotik · Jul 31, 2021

troyan said:
FP64 will always be inefficient. That is not a focus problem, it is reality. nVidia solved this with their TensorCores. That is the reason why GA100 delivers 2,3x more performance over GV100 within the same power consumption.
Believing that CDNA2 will be 5x or 6x more efficient than CDNA1 for HPC workloads is ridiculous. MI100 only delivers ~7 TFLOPs within 300W. A100 as 250W PCIe is around 12,8TFLOPs with TensorCores:
MI100: https://www.delltechnologies.com/en-us/blog/finer-floating-points-of-accelerating-hpc/
A100: https://infohub.delltechnologies.co...edge-r7525-servers-with-nvidia-a100-gpgpus-1/

All this talk about AMD's 5nm products is so far away from reality. Apple's A14 is only 40% more efficient than A12 on 7nm.

CDNA2 supposedly does full-rate FP64 and packed FP32 (so some FP32 can run at twice the speed, but not all) and doubles the CU count compared to MI100 (due to both chiplets having 128 CUs like MI100).
AMD also confirmed 128 GB of HBM2e already.

xpea · Jul 31, 2021

Wesker said:
I think those are both important points.

A large scale purchase by the US Government is significant. There's no denying that. We've seen other governments follow the US (e.g. Australia) when it comes to DC procurement.

No one denies the importance of these deals. As I said, it's well done by AMD. But for now they only appear to be a one-time, politically driven opportunity. You need more than that to flip the market. The proof is that one year later, the US government will put a Grace-Hopper supercomputer online at Los Alamos lab:
https://www.lanl.gov/discover/news-release-archive/2021/April/0412-nvidia.php
And 3 more government deals in the USA for Grace-Hopper supercomputers are nearly closed (announcement soon). Same for CSCS in Switzerland that won't use AMD but Grace-Hopper:
https://www.cscs.ch/science/compute...orlds-most-powerful-ai-capable-supercomputer/
Why go Nvidia in 2023 if AMD will rule it all, as the one-liner is suggesting? The answer is simple. Nvidia will still be highly competitive and CUDA won't die soon.

Wesker said:
But, even here at Oxford, Nvidia gives out Titan cards like free candy. They're making a big push in academia to keep CUDA relevant. But if government and industry start to move away from CUDA, then academia will follow...

Yes, Nvidia is pushing hard in academia, and it works, as numerous startups are betting on CUDA. For AMD to succeed, great hardware is not enough. You must provide a commercial path after academia, otherwise no one will waste their time on unsupported hardware and/or hardware without a widely accepted ecosystem. I've said it many times: NVIDIA is a software company first, and AMD must quadruple their software effort to get a chance of changing the market...

Bondrewd · Jul 31, 2021

LARP again, not funny.

xpea said:
Same for CSCS in Switzerland that won't use AMD but Grace-Hopper:

Oh should I count all the MI300 systems or nah yet?
That's gotta be a loooooong laundry list of stuff.

xpea said:
great hardware is not enough

Yeah they make even better one.

troyan · Jul 31, 2021

Dell is really the only one who did benchmarks with MI100. So FP64 sustained is 7.9TFLOPs. That doesn't look anywhere near the 12,6 TFLOPs a 250W A100 PCIe card gets:
MI100: https://infohub.delltechnologies.co...r7525-servers-with-the-amd-mi100-accelerator/
A100: https://infohub.delltechnologies.co...-nvidia-a100-nvlink-on-dell-poweredge-xe8545/

Makes A100 ~1.91x more efficient. So CDNA2 will be on par with A100 18 months later.
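
For reference, the ~1.91x figure follows from dividing sustained FP64 throughput by board power for each card. A minimal sketch of that arithmetic, assuming the 7.9 and 12.6 TFLOPs figures from this post and board powers of 300 W (MI100, as quoted earlier in the thread) and 250 W (A100 PCIe); the function name is purely illustrative:

```python
def tflops_per_watt(tflops: float, watts: float) -> float:
    """Sustained throughput per watt of board power for one accelerator."""
    return tflops / watts

# Figures as quoted in the thread, not independently measured.
mi100 = tflops_per_watt(7.9, 300.0)   # MI100, 300 W
a100 = tflops_per_watt(12.6, 250.0)   # A100 PCIe, 250 W

ratio = a100 / mi100
print(f"A100 vs MI100 perf/W: {ratio:.2f}x")
```

Note that this compares a Tensor Core GEMM result against a non-matrix FP64 result, which is exactly the point disputed in the replies that follow.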

Bondrewd · Jul 31, 2021

Bait again; quoting GEMM numbers again (dawg they explicitly banned GEMM acc for Top500 HPL, see Perlmutter going /2 in rmax).

Is this circlestrafing or what

CarstenS · Jul 31, 2021

Bondrewd said:
see Perlmutter going /2 in rmax).

How so? Perlmutter Phase 1 is 1536 nodes of 4x A100 and 1x Epyc 7763.

Bondrewd · Jul 31, 2021

CarstenS said:
How so?

>120PF target went POOF.
The GPU partition aka where HPL bang-bang comes from is here and it's only 90PF.

Deleted member 2197 · Jul 31, 2021

Perlmutter Debuts in the Top 5 of the Top500 (nersc.gov)
June 29, 2021

In addition to the Top500 64.6 Pflop/s achievement in this latest round, Perlmutter recorded 1.91 HPCG-petaflops in the HPCG benchmark, earning it the #3 spot on that list; and a power efficiency of 25.55 gigaflops/watt in the Green500, which earned it the #6 spot on that list. It is also notable that these measurements were run using containers and NERSC's Shifter container runtime. Containers enabled various combinations of libraries and builds to be rapidly tested.

“We are pleased to see that Perlmutter is the one system in the top 5 of the Top500 list that is also in the top 10 of the Green500,” said Jay Srinivasan, the NERSC-9 (Perlmutter) project director at NERSC. “The confluence of high performance and power efficiency is a notable achievement.”

CarstenS · Jul 31, 2021

Bondrewd said:
>120PF target went POOF.
The GPU partition aka where HPL bang-bang comes from is here and it's only 90PF.

And how is it halved going to RPeak to 90?
How is A100 with 6144 x 9.7 TFLOPS FP64 and 7763 with 1536 x 2.5-ish TFLOPS with a combined theoretical peak of 63.4 PFLOPS reaching an RMax of 64.6 PFLOPS?

How, if not including Tensor math? I don't get it.

edit:
And how did your source not know about this half a year after A100's launch?
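
The 63.4 PFLOPS figure can be checked directly from the per-device peaks given in the post. A rough sketch, assuming those per-device numbers and, for comparison, the 19.5 TFLOPS FP64 Tensor Core peak from NVIDIA's A100 datasheet (that last value is not taken from the thread; the helper name is illustrative):

```python
def rpeak_pflops(units: int, tflops_each: float) -> float:
    """Aggregate theoretical peak in PFLOPS for `units` identical devices."""
    return units * tflops_each / 1000.0

gpu = rpeak_pflops(6144, 9.7)    # A100, non-Tensor FP64 peak
cpu = rpeak_pflops(1536, 2.5)    # Epyc 7763, approximate FP64 peak
combined = gpu + cpu

# Same GPU count at the FP64 Tensor Core rate (datasheet value, an assumption here).
gpu_tensor = rpeak_pflops(6144, 19.5)

rmax = 64.6                      # Top500 result, June 2021
print(f"GPU {gpu:.1f} + CPU {cpu:.2f} = {combined:.1f} PFLOPS")
print(f"GPU at Tensor rate: {gpu_tensor:.1f} PFLOPS")
print(f"RMax exceeds non-Tensor RPeak: {rmax > combined}")
```

Under these assumptions the measured RMax is higher than the non-Tensor theoretical peak, which is only possible if the Tensor Core FP64 rate is what counts.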

Bondrewd · Jul 31, 2021

CarstenS said:
And how is it halved going to RPeak to 90?

Cuz the planned one was close to Summit. Duh.

CarstenS said:
How is A100 with 6144 x 9.7 TFLOPS FP64 and 7763 with 1536 x 2.5-ish TFLOPS with a combined theoretical peak of 63.4 PFLOPS reaching an RMax of 64.6 PFLOPS?

List parts aren't the HPC ones here.
Gotta pamp some watts in.

You see, Rpeak for 400W A100 at 6k GPUs alone is >120PF.

CarstenS · Jul 31, 2021

Bondrewd said:
Cuz the planned one was close to Summit. Duh.

Makes total sense to lowball to ">120" when I really mean >180. While others are down to a decimal digit... *doh*

Bondrewd said:
List parts aren't the HPC ones here.
Gotta pamp some watts in.

LOL - still it's RPeak vs. RMax. Even if I give around 20% more power/more TFLOPs just for shits and giggles, this is a ridiculously high efficiency.

Speaking of which: Why do they say on your sheet it's 6 MW when on its RMax run it was only 2.5? Axed some nodes?
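
The ~2.5 MW figure is consistent with the NERSC numbers posted above: dividing the HPL result by the Green500 efficiency gives the power drawn during the run. A small sketch of that conversion, using only the 64.6 Pflop/s and 25.55 gigaflops/watt values from the NERSC release (the function name is illustrative):

```python
def hpl_power_mw(rmax_pflops: float, gflops_per_watt: float) -> float:
    """Power implied by an HPL result and a Green500 efficiency, in megawatts."""
    rmax_gflops = rmax_pflops * 1e6        # 1 PFLOPS = 1e6 GFLOPS
    watts = rmax_gflops / gflops_per_watt
    return watts / 1e6                     # W -> MW

power = hpl_power_mw(64.6, 25.55)
print(f"Implied power during the HPL run: {power:.2f} MW")
```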

Bondrewd · Jul 31, 2021

CarstenS said:
Makes total sense to lowball to ">120" when I really mean >180

Major hopium it was, yeah.
Just like Aurora.

CarstenS said:
this is a ridiculously high efficiency

YES.
A100 is unironically more efficient when you pump more watts into it.

CarstenS said:
Axed some nodes?

Yep, CPU ones aren't here.
Nor are they good at HPL pumping anyway.

CarstenS · Jul 31, 2021

Bondrewd said:
Major hopium it was, yeah.
Just like Aurora.

Nah, that'd be if they wrote >180 PFLOPS. ">120" is just lowballing.

Bondrewd said:
YES.
A100 is unironically more efficient when you pump more watts into it.

In large scale clusters - of course. Losses through networking diminish.
But I'm talking Blue-Gene league of efficiency of 85% RMax of RPeak. Doesn't work that way except for maybe very small clusters.
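
HPL efficiency here means RMax divided by RPeak. A quick sketch of that ratio under three RPeak assumptions drawn from the thread: the 63.4 PFLOPS non-Tensor peak, the same peak with the hypothetical 20% uplift mentioned earlier, and the roughly 90 PFLOPS figure cited for the GPU partition. Reading the 85% figure as the 20% uplift case is an interpretation, not something the post states.

```python
def hpl_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Fraction of theoretical peak achieved in the HPL run."""
    return rmax_pflops / rpeak_pflops

RMAX = 64.6  # PFLOPS, Top500 June 2021

cases = {
    "non-Tensor RPeak (63.4)": 63.4,
    "non-Tensor RPeak + 20%": 63.4 * 1.2,
    "cited RPeak (~90)": 90.0,
}
efficiencies = {name: hpl_efficiency(RMAX, rpeak) for name, rpeak in cases.items()}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.1%}")
```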

Bondrewd said:
Yep, CPU ones aren't here.
Nor are they good at HPL pumping anyway.

Altogether around 19.2 PFLOPS RPeak - yes, not much compared to Accelerators, but 20% of the whole thing nevertheless.

edit:
So, to conclude: I don't see why you should not use your compute resources for HPL. I don't see any proof of whether or not Top500 has explicitly banned GEMM engines from their ranking. Can we please get back to the topic of CDNA/CDNA2?

Bondrewd · Jul 31, 2021

CarstenS said:
Nah, that'd be if they wrote >180 PFLOPS. ">120" is just lowballing.

Oh come on it's the same slide that lists Aurora power for <60MW.

CarstenS said:
Altogether around 19.2 PFLOPS RPeak - yes, not much compared to Accelerators, but 20% of the whole thing nevertheless.

Yeah but the GPU numbers are still off and we're still under target.

CarstenS · Jul 31, 2021

Bondrewd said:
Oh come on it's the same slide that lists Aurora power for <60MW.

We're not debating Aurora. Or did you just discredit your own source?
Ever occurred to you that this slide of yours might be incorrect in more than one place?

You know, I'm asking because your slide says it's dated November 20th, 2020, right?
And then there's this PDF here, dated June 2020, 5 months earlier and just after A100 launch:
https://www.energy.gov/sites/default/files/2020/06/f75/fy-2021-sc-ascr-cong-budget.pdf

Bondrewd said:
Yeah but the GPU numbers are still off and we're still under target.

Funnily enough, it says on p.23:
"...and begin operations of the 75 petaflop NERSC-9 system, named Perlmutter after LBNL Nobel Laureate Saul Perlmutter."

CarstenS · Aug 6, 2021

I'll just leave this here:

https://twitter.com/x/status/1423559224282521606

https://www.freepatentsonline.com/20210157588.pdf

CarstenS · Aug 7, 2021

CDNA2 launching end of year 2021, officially:

In the 3-month-old corporate presentation from April 2021, there was no date listed for CDNA2.

Kaotik · Aug 7, 2021

CarstenS said:
CDNA2 launching end of year 2021, officially:
View attachment 5779

In the 3-month-old corporate presentation from April 2021, there was no date listed for CDNA2.

Yeah, but the cards already started shipping to customers in Q2, launched or not.

CarstenS · Aug 7, 2021

Kaotik said:
Yeah, but the cards already started shipping to customers in Q2, launched or not.

For revenue or for bring-up? I'm asking because, you know, Intel's 10nm products have been "shipping to customers since 2017" (maybe they used a sailboat for that and had a strong headwind). So, there clearly is a difference between shipping (samples for qualification and bring-up) and shipping (for actual market introduction).

Maybe AMD just replaced SPOCK systems at ORNL with some real ones for now.

Bondrewd · Aug 7, 2021

CarstenS said:
For revenue or for bring-up?

The former.

CarstenS said:
Maybe AMD just replaced SPOCK systems at ORNL with some real ones for now.

Ehhhh yea but also no...
Also can be Polaris shipment.
Dunno.

CarstenS said:
CDNA2 launching end of year 2021, officially:

SC'21 to be precise.
I may even be there in-person just for kicks.
