AMD CDNA Discussion Thread

troyan · Jul 31, 2021

CNDA2 will be like RDNA2: To late, to outdated. HPC is a niche market and will move more and more to mixed precision. GA100 is more than twice as efficient as CDNA1 with FP64 workload and yet you have a guy here claiming AMD will improve efficiency by 6x with CDNA2 over CDNA1. This is such nonsense.

Bondrewd · Jul 31, 2021

Some guys aren't even trying anymore.
Sad!

troyan said:
CNDA2 will be like RDNA2: To late, to outdated. HPC is a niche market and will move more and more to mixed precision. GA100 is more than twice as efficient as CDNA1 with FP64 workload and yet you have a guy here claiming AMD will improve efficiency by 6x with CDNA2 over CDNA1. This is such nonsense.

wow primordial bait

Samwell · Jul 31, 2021

CDNA2 surely isn't outdated. All we heard so far it's a true HPC beast. But i agree, that HPC is a small market. Nvidia clearly decided to focus on the DeapLearning market,because it's probably 10x the size of the HPC market. AMD will win in HPC it seems, but it will be interesting, what AMDs next steps will be to gain traction in the DL market.

Bondrewd · Jul 31, 2021

Samwell said:
But i agree, that HPC is a small market

It's also how you enter the accelerator game.
NV did the same with Titan anyway.

Samwell said:
All we heard so far it's a true HPC beast

In a hilarious twist of fate it even hit the DOE's 5yo moonshot 20MW per EF target.

Samwell said:
what AMDs next steps will be to gain traction in the DL market.

Delivering on their long-term roadmap while introducing cool system-level innovation there.

troyan · Jul 31, 2021

Samwell said:
CDNA2 surely isn't outdated. All we heard so far it's a true HPC beast. But i agree, that HPC is a small market. Nvidia clearly decided to focus on the DeapLearning market,because it's probably 10x the size of the HPC market. AMD will win in HPC it seems, but it will be interesting, what AMDs next steps will be to gain traction in the DL market.

How do you know that? Competition is not Ampere. And yet A100 delivers 19,5 TFLOPs FP64 performance within 300W with 80GB vram und nearly 2TB/s bandwidth. GA100 PCIe is more than twice as efficient as M100.
FP64 is ineffcient from a calculcation standpoint. Yet you have a guy here saying that CDNA2 will deliver more than twice the performance of MI100 FP32 numbers with FP64. For that this product has to improve efficiency by more than 4x.

Bondrewd · Jul 31, 2021

Another bait; lame.
Even quoted GEMM numbers that got whacked even outta Top500.

Okay kids let's play a game.
Why do NV dudes (both orgs) at large think super-harsh DC comp is coming and that even breathing will be miserable and you don't?
You'd think you have the same mindsets of pro everything green and good and against everything bad and red, but alas!

xpea · Jul 31, 2021

Bondrewd said:
It's SYCL you *insult removed*

lol, SYCL really?
SYCL official forum: https://community.khronos.org/c/sycl/57
Not even a full page of topics !!!
Official website: https://www.khronos.org/sycl/ with last update from April... 2020 !!!
And you seriously believe it will take over CUDA ?
Come back on earth, your are clearly not living on our beautiful blue planet

Bondrewd said:
They're just barely programmable matrix engines; all stuff relevant is abstracted away.

another big lol. It's not about the hardware, it's about the software that enables maximum hardware acceleration on every framework and every API. Moreover, CUDA provides countless libraries, tools, Integrations, and source code. Have a look :
https://developer.nvidia.com/cuda-zone
Then go to AMD ROCm or SYCL (lol) to understand the infinite difference

Bondrewd said:
Oh jeez Codeplay is literally writing a Level Zero backend for gfx9/you name it.

Available in...2025... as an alpha version that will land abandoned on github like many other AMD projects... Thanks but no, it doesn't count.

Bondrewd said:
The opposite.
HGX ships first to super 8, then DGX, then ODM HGX units.

Nah, DGX is always the first as the internal reference platform. HGX comes after because need to take into account all the talks with the Hyperscalers.

Bondrewd · Jul 31, 2021

xpea said:
lol, SYCL really?

YES.
Intel flavour of, called OneAPI.

xpea said:
And you seriously believe it will take over CUDA ?

It already did as far as DOE is concerned.
SYCL runs on all 3 exascale systems.
No CUDA there at all.
Big daddy G said it all and we all shall obey.

xpea said:
it's about the software that enables maximum hardware acceleration on every framework and every API

Jeez.
I'm talking about matrix engines and he goes full abstract spergout.

xpea said:
Available in...2025... as an alpha version that will land abandoned on github like many other AMD projects

?
It's not even AMD.
Spergout again.

xpea said:
Nah, DGX is always the first as the internal reference platform. HGX comes after because need to take into account all the talks with the Hyperscalers.

Nnnnope.

Samwell · Jul 31, 2021

troyan said:
How do you know that? Competition is not Ampere. And yet A100 delivers 19,5 TFLOPs FP64 performance within 300W with 80GB vram und nearly 2TB/s bandwidth. GA100 PCIe is more than twice as efficient as M100.
FP64 is ineffcient from a calculcation standpoint. Yet you have a guy here saying that CDNA2 will deliver more than twice the performance of MI100 FP32 numbers with FP64. For that this product has to improve efficiency by more than 4x.

Of course Ampere is competition. MI200 is starting to ship now, Hopper should be mid 2022. CDNA3? Mid 2023? It seems they are one differenet shedules, but nevertheless competing against each other.
FP64 is inefficient, but if you fully design for it with full speed FP64. It can be possible. It's just a matter of focus, A100 is more of a DL accelerator with FP64 attached. Mi200 is the other way around.

Bondrewd said:
Okay kids let's play a game.
Why do NV dudes (both orgs) at large think super-harsh DC comp is coming and that even breathing will be miserable and you don't?
You'd think you have the same mindsets of pro everything green and good and against everything bad and red, but alas!

Everyone knows competition is coming. There's Ponte Vecchio, AMD, Google with their TPU, different start ups. DC wasn't so competitive in many years.

Bondrewd · Jul 31, 2021

Samwell said:
There's Ponte Vecchio

Too little too late.

Samwell said:
AMD, Google with their TPU

Yes.

Samwell said:
different start ups.

Those are a cointoss.
Intel killed Nervana and Habana is now in limbo (and their s/w sucks even more).
Graphcore is a disappointment in the world of model bloat.
Groq etc etc are MIA yet.
At least Cerebras are coolio.

Samwell said:
CDNA3? Mid 2023?

Earlier.
AMD cranks 6Q; i.e. MI300 prod kickoff should be somewhere Q4'22.

xpea · Jul 31, 2021

Bondrewd said:
YES.
Intel flavour of, called OneAPI.

It already did as far as DOE is concerned.
SYCL runs on all 3 exascale systems.
No CUDA there at all.
Big daddy G said it all and we all shall obey.

Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard. Their is a huge difference between a political decision and a commercial success. Hundreds of universities teach CUDA and thousand of companies depend on CUDA ecosystem. NOW. That's hard fact and current reality. Not the dream of an alternative future when you are living...

Bondrewd said:
Nnnnope.

How many times you repeat doesn't make it right. You are wrong. DGX is first. Stop to spread FUD

Bondrewd · Jul 31, 2021

xpea said:
Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard

Of course it does.
That's how CUDA became relevant!
Titan. Titan made the formerly toy real.
(ironically the same lab even)

xpea said:
Not the dream of an alternative future when you are living...

Too bad!
NV thinks much the same things as I do.

xpea said:
DGX is first.

Nope.
Please talk to AMZN ppl.

xpea · Jul 31, 2021

Bondrewd said:
Of course it does.
That's how CUDA became relevant!
Titan. Titan made the formerly toy real.

FUD FUD FUD
CUDA was first available on 2007, 5 years before Titan and already downloaded million times before the supercomputer came online...

Bondrewd said:
Too bad!
NV thinks much the same things as I do.

FUD again. The people I was working with at Nvidia doesn't believe it one second.

Bondrewd said:
Nope.
Please talk to AMZN ppl.

I don't need to talk to second hand AMZN. I talk directly to relevant people

Bondrewd · Jul 31, 2021

xpea said:
CUDA was first available on 2007, 5 years before Titan and already downloaded million times before the supercomputer came online...

Not relevant.
DOE is a kingmaker; always was and always will be.
CUDA wasn't real until GK110 and Titan.
Just the way things are.

xpea said:
The people I was working with at Nvidia doesn't believe it one second.

You should consult more people then.
Bretty senior too.
Just don't ask any codenames.

xpea said:
I don't need to talk to AMZN.

But you should.
They get the boxes first and to their spec.

xpea · Jul 31, 2021

Bondrewd said:
But you should.
They get the boxes first and to their spec.

Oh right, sorry, I forgot that Nvidia brings the first silicon to amazon and power it up in front of Bezos for the first time... [/sarcasm]
well that's enough for me, it's a waste of time

Bondrewd · Jul 31, 2021

xpea said:
Oh right, sorry, I forgot that Nvidia brings the first silicon to amazon and power it up in front of Bezos for the first time...

No jokes.

Wesker · Jul 31, 2021

xpea said:
FUD FUD FUD
CUDA was first available on 2007, 5 years before Titan and already downloaded million times before the supercomputer came online...

I don't have as much information on your other points, but I just wanted to say that Bondrewd is right about CUDA gaining traction around the time of Titan. Not sure if Titan was the cause (happy to read an article or some stats on this point), but CUDA uptick was correlated with the launch and rollout of Titan, i.e., CUDA was considered nothing but fringe until about the mid/early 2010s.

Wesker · Jul 31, 2021

xpea said:
Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard. Their is a huge difference between a political decision and a commercial success. Hundreds of universities teach CUDA and thousand of companies depend on CUDA ecosystem. NOW. That's hard fact and current reality. Not the dream of an alternative future when you are living...

I think those are both important points.

A large scale purchase by the US Government is significant. There's no denying that. We've seen other governments follow the US (e.g. Australia) when it comes to DC procurement.

But, even here at Oxford, Nvidia gives out Titan cards like free candy. They're making a big push in academia to keep CUDA relevant. But if government and industry start to move away from CUDA, then academia will follow...

troyan · Jul 31, 2021

Samwell said:
Of course Ampere is competition. MI200 is starting to ship now, Hopper should be mid 2022. CDNA3? Mid 2023? It seems they are one differenet shedules, but nevertheless competing against each other.
FP64 is inefficient, but if you fully design for it with full speed FP64. It can be possible. It's just a matter of focus, A100 is more of a DL accelerator with FP64 attached. Mi200 is the other way around.

FP64 will always be ineffcient. That is not a focus problem it is reality. nVidia solved this with their TensorCores. That is the reason why GA100 delivers 2,3x more performance over GV100 within the same power consumption.
Believing that CNDA2 will be 5x or 6x more effcient than CDNA1 for HPC workload is ridiculous. MI100 only delivers ~7 TFLOPs within 300W. A100 as 250W PCIe is around 12,8TFLOPs with TensorCores:
MI100: https://www.delltechnologies.com/en-us/blog/finer-floating-points-of-accelerating-hpc/
A100: https://infohub.delltechnologies.co...edge-r7525-servers-with-nvidia-a100-gpgpus-1/

All this talk about AMDs 5nm products is so far away from reality. Apples A14 is only 40% more effcient than A12 on 7nm.

Bondrewd · Jul 31, 2021

Wesker said:
Not sure if Titan was the cause

It was.
A big trial by fire for formerly hobbyist NV stuff now running real sciences!
Was the first GK110 shipment too iirc.

troyan said:
That is the reason why GA100 delivers 2,3x more performance over GV100 within the same power consumption.

lol Top500 lmao

troyan said:
MI100 only delivers ~7 TFLOPs within 300W

Uh.
11.5TF DPFP.
The green voices in your head need to check IHV datasheets.

Oh wait literally in the article you've quoted.

The world’s fastest HPC accelerator, with up to 11.5 TFLOPs peak double precision (FP64) performance¹

Cringe.

troyan said:
All this talk about AMDs 5nm products is so far away from reality

MI200 is N7p.