AMD CDNA Discussion Thread

CDNA2 will be like RDNA2: too late, too outdated. HPC is a niche market and will move more and more to mixed precision. GA100 is more than twice as efficient as CDNA1 on FP64 workloads, and yet you have a guy here claiming AMD will improve efficiency by 6x with CDNA2 over CDNA1. This is such nonsense.
 
Some guys aren't even trying anymore.
Sad!
CDNA2 will be like RDNA2: too late, too outdated. HPC is a niche market and will move more and more to mixed precision. GA100 is more than twice as efficient as CDNA1 on FP64 workloads, and yet you have a guy here claiming AMD will improve efficiency by 6x with CDNA2 over CDNA1. This is such nonsense.
wow primordial bait
 
CDNA2 surely isn't outdated. All we've heard so far says it's a true HPC beast. But I agree that HPC is a small market. Nvidia clearly decided to focus on the deep learning market, because it's probably 10x the size of the HPC market. AMD will win in HPC, it seems, but it will be interesting what AMD's next steps will be to gain traction in the DL market.
 
But I agree that HPC is a small market
It's also how you enter the accelerator game.
NV did the same with Titan anyway.
All we've heard so far says it's a true HPC beast
In a hilarious twist of fate it even hit the DOE's 5yo moonshot 20MW per EF target.
what AMD's next steps will be to gain traction in the DL market.
Delivering on their long-term roadmap while introducing cool system-level innovation there.
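For context, the DOE's 20 MW-per-exaflop target mentioned above boils down to a single efficiency number; a quick sanity check (plain Python, using no figures beyond what's quoted in the thread):

```python
# DOE's exascale power target: 1 EFLOP/s of sustained FP64 within 20 MW.
EXAFLOP = 1e18          # FLOP/s
TARGET_POWER_W = 20e6   # 20 MW

# Efficiency the whole system must hit, in GFLOPS per watt
gflops_per_watt = EXAFLOP / TARGET_POWER_W / 1e9
print(f"Required efficiency: {gflops_per_watt:.0f} GFLOPS/W")  # → 50 GFLOPS/W
```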
 
CDNA2 surely isn't outdated. All we've heard so far says it's a true HPC beast. But I agree that HPC is a small market. Nvidia clearly decided to focus on the deep learning market, because it's probably 10x the size of the HPC market. AMD will win in HPC, it seems, but it will be interesting what AMD's next steps will be to gain traction in the DL market.

How do you know that? Competition is not Ampere. And yet the A100 delivers 19.5 TFLOPs of FP64 performance within 300W, with 80GB of VRAM and nearly 2TB/s of bandwidth. The GA100 PCIe is more than twice as efficient as the MI100.
FP64 is inefficient from a computational standpoint. Yet you have a guy here saying that CDNA2 will deliver more than twice the MI100's FP32 numbers in FP64. For that, this product has to improve efficiency by more than 4x.
 
Another bait; lame.
Even quoted GEMM numbers that got whacked even outta Top500.
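For what it's worth, the ">4x" figure in the post above is easy to reproduce from the MI100 datasheet peaks (11.5 TFLOPs FP64 vector, 23.1 TFLOPs FP32 vector); a rough back-of-the-envelope sketch:

```python
# MI100 datasheet peak throughput, in TFLOPs
mi100_fp64 = 11.5   # FP64 vector
mi100_fp32 = 23.1   # FP32 vector

# The claim under debate: CDNA2 delivers more than twice MI100's
# FP32 rate, but in FP64.
cdna2_fp64_claim = 2 * mi100_fp32   # 46.2 TFLOPs FP64

# At a comparable power envelope, that implies this uplift in FP64/W:
uplift = cdna2_fp64_claim / mi100_fp64
print(f"Implied FP64 uplift: {uplift:.1f}x")  # → ~4.0x
```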

Okay kids let's play a game.
Why do NV dudes (both orgs) at large think super-harsh DC competition is coming, and that even breathing will be miserable, while you don't?
You'd think you'd have the same mindset of pro everything green and good and against everything bad and red, but alas!
 
It's SYCL you *insult removed*
lol, SYCL really?
SYCL official forum: https://community.khronos.org/c/sycl/57
Not even a full page of topics!!!
Official website: https://www.khronos.org/sycl/ with its last update from April... 2020!!!
And you seriously believe it will take over CUDA?
Come back to Earth; you are clearly not living on our beautiful blue planet.

They're just barely programmable matrix engines; all the relevant stuff is abstracted away.
Another big lol. It's not about the hardware, it's about the software that enables maximum hardware acceleration on every framework and every API. Moreover, CUDA provides countless libraries, tools, integrations, and source code. Have a look:
https://developer.nvidia.com/cuda-zone
Then go to AMD ROCm or SYCL (lol) to understand the infinite difference.

Oh jeez Codeplay is literally writing a Level Zero backend for gfx9/you name it.
Available in...2025... as an alpha version that will land abandoned on github like many other AMD projects... Thanks but no, it doesn't count.

The opposite.
HGX ships first to super 8, then DGX, then ODM HGX units.
Nah, DGX is always first, as the internal reference platform. HGX comes after because it needs to take into account all the talks with the hyperscalers.
 
lol, SYCL really?
YES.
Intel flavour of, called OneAPI.
And you seriously believe it will take over CUDA ?
It already did as far as DOE is concerned.
SYCL runs on all 3 exascale systems.
No CUDA there at all.
Big daddy G said it all and we all shall obey.
it's about the software that enables maximum hardware acceleration on every framework and every API
Jeez.
I'm talking about matrix engines and he goes full abstract spergout.
Available in...2025... as an alpha version that will land abandoned on github like many other AMD projects
?
It's not even AMD.
Spergout again.
Nah, DGX is always first, as the internal reference platform. HGX comes after because it needs to take into account all the talks with the hyperscalers.
Nnnnope.
 
How do you know that? Competition is not Ampere. And yet the A100 delivers 19.5 TFLOPs of FP64 performance within 300W, with 80GB of VRAM and nearly 2TB/s of bandwidth. The GA100 PCIe is more than twice as efficient as the MI100.
FP64 is inefficient from a computational standpoint. Yet you have a guy here saying that CDNA2 will deliver more than twice the MI100's FP32 numbers in FP64. For that, this product has to improve efficiency by more than 4x.

Of course Ampere is competition. MI200 is starting to ship now; Hopper should be mid 2022. CDNA3? Mid 2023? It seems they are on different schedules, but nevertheless competing against each other.
FP64 is inefficient, but if you fully design for it, with full-speed FP64, it can be done. It's just a matter of focus: the A100 is more of a DL accelerator with FP64 attached; the MI200 is the other way around.

Okay kids let's play a game.
Why do NV dudes (both orgs) at large think super-harsh DC competition is coming, and that even breathing will be miserable, while you don't?
You'd think you'd have the same mindset of pro everything green and good and against everything bad and red, but alas!

Everyone knows competition is coming. There's Ponte Vecchio, AMD, Google with their TPU, different start ups. DC hasn't been this competitive in years.
 
There's Ponte Vecchio
Too little too late.
AMD, Google with their TPU
Yes.
different start ups.
Those are a cointoss.
Intel killed Nervana and Habana is now in limbo (and their s/w sucks even more).
Graphcore is a disappointment in the world of model bloat.
Groq etc etc are MIA yet.
At least Cerebras are coolio.
CDNA3? Mid 2023?
Earlier.
AMD cranks 6Q; i.e. MI300 prod kickoff should be somewhere Q4'22.
 
YES.
Intel flavour of, called OneAPI.

It already did as far as DOE is concerned.
SYCL runs on all 3 exascale systems.
No CUDA there at all.
Big daddy G said it all and we all shall obey.
Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard. There is a huge difference between a political decision and a commercial success. Hundreds of universities teach CUDA and thousands of companies depend on the CUDA ecosystem. NOW. That's hard fact and current reality. Not the dream of an alternative future in which you are living...

Repeating it over and over doesn't make it right. You are wrong. DGX is first. Stop spreading FUD.
 
Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard
Of course it does.
That's how CUDA became relevant!
Titan. Titan made the former toy real.
(ironically the same lab even)
Not the dream of an alternative future in which you are living...
Too bad!
NV thinks much the same things as I do.
DGX is first.
Nope.
Please talk to AMZN ppl.
 
Of course it does.
That's how CUDA became relevant!
Titan. Titan made the former toy real.
FUD FUD FUD
CUDA first became available in 2007, 5 years before Titan, and had already been downloaded millions of times before the supercomputer came online...

Too bad!
NV thinks much the same things as I do.
FUD again. The people I was working with at Nvidia don't believe it for one second.

Nope.
Please talk to AMZN ppl.
I don't need to talk to AMZN second-hand. I talk directly to the relevant people.
 
CUDA first became available in 2007, 5 years before Titan, and had already been downloaded millions of times before the supercomputer came online...
Not relevant.
DOE is a kingmaker; always was and always will be.
CUDA wasn't real until GK110 and Titan.
Just the way things are.
The people I was working with at Nvidia don't believe it for one second.
You should consult more people then.
Bretty senior too.
Just don't ask any codenames.
I don't need to talk to AMZN.
But you should.
They get the boxes first and to their spec.
 
FUD FUD FUD
CUDA first became available in 2007, 5 years before Titan, and had already been downloaded millions of times before the supercomputer came online...

I don't have as much information on your other points, but I just wanted to say that Bondrewd is right about CUDA gaining traction around the time of Titan. Not sure if Titan was the cause (happy to read an article or some stats on this point), but the CUDA uptick was correlated with the launch and rollout of Titan, i.e., CUDA was considered nothing but fringe until about the early-to-mid 2010s.
 
Being chosen by ONE government of ONE country for ONE department of computing doesn't make it a standard. There is a huge difference between a political decision and a commercial success. Hundreds of universities teach CUDA and thousands of companies depend on the CUDA ecosystem. NOW. That's hard fact and current reality. Not the dream of an alternative future in which you are living...

I think those are both important points.

A large scale purchase by the US Government is significant. There's no denying that. We've seen other governments follow the US (e.g. Australia) when it comes to DC procurement.

But, even here at Oxford, Nvidia gives out Titan cards like free candy. They're making a big push in academia to keep CUDA relevant. But if government and industry start to move away from CUDA, then academia will follow...
 
Of course Ampere is competition. MI200 is starting to ship now; Hopper should be mid 2022. CDNA3? Mid 2023? It seems they are on different schedules, but nevertheless competing against each other.
FP64 is inefficient, but if you fully design for it, with full-speed FP64, it can be done. It's just a matter of focus: the A100 is more of a DL accelerator with FP64 attached; the MI200 is the other way around.

FP64 will always be inefficient. That is not a focus problem, it is reality. nVidia solved this with their Tensor Cores. That is the reason why GA100 delivers 2.3x more performance over GV100 within the same power consumption.
Believing that CDNA2 will be 5x or 6x more efficient than CDNA1 for HPC workloads is ridiculous. MI100 only delivers ~7 TFLOPs within 300W. The A100 as a 250W PCIe card is around 12.8 TFLOPs with Tensor Cores:
MI100: https://www.delltechnologies.com/en-us/blog/finer-floating-points-of-accelerating-hpc/
A100: https://infohub.delltechnologies.co...edge-r7525-servers-with-nvidia-a100-gpgpus-1/

All this talk about AMD's 5nm products is so far from reality. Apple's A14 is only 40% more efficient than the A12 on 7nm.
 
Not sure if Titan was the cause
It was.
A big trial by fire for formerly hobbyist NV stuff, now running real science!
Was the first GK110 shipment too iirc.
That is the reason why GA100 delivers 2.3x more performance over GV100 within the same power consumption.
lol Top500 lmao
MI100 only delivers ~7 TFLOPs within 300W
Uh.
11.5TF DPFP.
The green voices in your head need to check IHV datasheets.

Oh wait literally in the article you've quoted.
The world’s fastest HPC accelerator, with up to 11.5 TFLOPs peak double precision (FP64) performance¹
Cringe.
All this talk about AMD's 5nm products is so far from reality
MI200 is N7p.
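Using only the figures quoted in this exchange (the A100 PCIe at 12.8 TFLOPs within 250W, and the MI100 at 300W with either the ~7 TFLOPs figure or the 11.5 TFLOPs datasheet peak), the perf-per-watt gap works out as follows; a rough Python sketch:

```python
def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a peak TFLOPs figure and a power envelope into GFLOPS/W."""
    return tflops * 1000 / watts

a100 = gflops_per_watt(12.8, 250)         # A100 250W PCIe, FP64 tensor cores
mi100_low = gflops_per_watt(7.0, 300)     # the ~7 TFLOPs figure used above
mi100_sheet = gflops_per_watt(11.5, 300)  # the 11.5 TFLOPs datasheet peak

# The claimed "more than twice as efficient" only holds with the low figure:
print(f"A100 vs MI100 (7 TF):    {a100 / mi100_low:.2f}x")    # → ~2.19x
print(f"A100 vs MI100 (11.5 TF): {a100 / mi100_sheet:.2f}x")  # → ~1.34x
```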
 