AMD CDNA Discussion Thread

Also, in contrast to NVIDIA's A100, this isn't primarily a matrix/AI GPU but rather a vector GPU. As a result, the AI enhancements are very modest compared to A100, even when combining the two dies together; vector throughput, however, is through the roof. AMD and NVIDIA are diverging here: AMD is still not fighting NVIDIA in the AI market, focusing instead solely on the HPC market.
^^THIS
I suspect the two-year-old A100 still beats MI200 in large AI/ML workloads where interconnect performance is the bottleneck. As I said previously, AMD saw a business opportunity in the traditional (and dying) FP64 HPC market, with the government exascale race and before Hopper availability. They executed well. Kudos to Lisa Su.


Power consumption is 560W and the card is liquid cooled, which is curious considering the 6nm process.
It simply shows its GCN roots and the efficiency inherited from that old architecture. It's much more power efficient to use ML, tensor units and lower precision to solve large scientific problems. Even traditional FP64 workloads, like weather simulation, are migrating to ML. FP64 is now a niche, and AMD must move quickly to a new architecture. Maybe MI300...

On a final note, am I the only one disappointed by MI200? We all know that it was on a tight schedule to win the exascale race, but still, except for the packaging (not even proprietary to AMD; the equivalent InFO-L is available at TSMC), MI200 brings nothing new. The two-year-old A100 is more feature-packed. No sparsity! Few and slow interconnect links, and so on... In fact, it has huge flaws, as we can see in AMD's promoted typical 4+1 (GPU+CPU) HPC topology, where not even all GCDs are linked! Going from a claimed 3.2TB/s down to a mere 100GB/s bi-directional will look ugly in real-world performance with large datasets... It's no surprise that all AMD benchmarks are a single MI250X vs a single A100. I guess Nvidia will fire back soon to show how A100 scaling beats MI250X in bandwidth-limited scenarios. Maybe even something new in a few hours at GTC 2021...

 
The best feature they brought back to the MI200 series since the stillborn HSA project is the hardware-accelerated coherent unified memory interop with x86 CPUs. There are a lot of high-end compute systems where GPU acceleration wasn't feasible, either because customers didn't want to compromise with too many invasive codebase changes or sacrifice performance targets in other parts of the system, or because they were often facing high CPU-GPU communication overhead ...

Programming for another architecture such as IBM Power just to take advantage of NVLink wasn't an attractive concept, and it sometimes meant a regression in CPU performance as well, since IBM didn't have aggressive release schedules. In other cases, opting to use GPU acceleration over PCIe wasn't ideal since PCIe became the bottleneck ...
 
In other cases, opting to use GPU acceleration over PCIe wasn't ideal since PCIe became the bottleneck ...
In these large HPC/ML systems, it's all about time to market and bottlenecks when scaling. For me, MI200 is a regression: each GCD accesses its own HBM pool at 1.6TB/s of bandwidth but goes out of die at 50GB/s... 32 times slower...
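
To put numbers on it, a quick back-of-the-envelope sketch in Python using the figures above (the 64GB working set is just an arbitrary example, roughly one GCD's worth of HBM):

```python
# Rough sketch: time to stream a large working set from local HBM vs. pulling
# it over the die-to-die link, using the bandwidth figures quoted above.
dataset_gb = 64            # arbitrary example size
local_hbm_gbps = 1600      # ~1.6 TB/s per GCD
die_to_die_gbps = 50       # out-of-die figure discussed above

print(f"local HBM:   {dataset_gb / local_hbm_gbps:.2f} s")   # ~0.04 s
print(f"die-to-die:  {dataset_gb / die_to_die_gbps:.2f} s")  # ~1.28 s, 32x slower
```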
 
In these large HPC/ML systems, it's all about time to market and bottlenecks when scaling. For me, MI200 is a regression: each GCD accesses its own HBM pool at 1.6TB/s of bandwidth but goes out of die at 50GB/s... 32 times slower...

For many systems, I can see the MI250X being the no-compromise option for those that didn't use GPU acceleration up until now. The alternatives, meanwhile, did have compromises, like requiring a software rewrite (NVLink), changing to a system with lower CPU perf (IBM Power), or low interconnect perf (PCIe) ...

MI250X paired with the highest-end server x86 CPUs today is the most ideal solution for heterogeneous compute, and no platform comes close to its capabilities outside of outdated IBM Power9/Nvidia Volta systems. High CPU perf and coherent CPU-GPU interconnect perf are paramount to a heterogeneous system, more so than pure GPU perf alone ...
 
For many systems, I can see the MI250X being the no-compromise option for those that didn't use GPU acceleration up until now. The alternatives, meanwhile, did have compromises, like requiring a software rewrite (NVLink), changing to a system with lower CPU perf (IBM Power), or low interconnect perf (PCIe) ...

MI250X paired with the highest-end server x86 CPUs today is the most ideal solution for heterogeneous compute, and no platform comes close to its capabilities outside of outdated IBM Power9/Nvidia Volta systems. High CPU perf and coherent CPU-GPU interconnect perf are paramount to a heterogeneous system, more so than pure GPU perf alone ...
You talk about software rewrites, but CUDA is the standard for parallel workloads and has been for years (the NVIDIA GTC conference starting today has 200k registrations, and CUDA has more than 3 million registered developers). Up to now, AMD has been a no-show, no-go. So let's get back to reality, shall we?

To be successful, it's not only a question of hardware capability but, more importantly, a question of software tools, APIs, documentation, training, seminars, addressable market and so on. AMD's current software state is a wasteland that nobody wants to walk through: an incomplete mess thrown at the open source community, praying that someone someday will do the job for AMD. Basically a recipe for failure.
Just as a reminder, it took AMD two and a half years to (barely) support Navi in ROCm via the ROCr OpenCL runtime path. TWO AND A HALF YEARS, FOR GOD'S SAKE :no:
 
You talk about software rewrites, but CUDA is the standard for parallel workloads and has been for years (the NVIDIA GTC conference starting today has 200k registrations, and CUDA has more than 3 million registered developers). Up to now, AMD has been a no-show, no-go. So let's get back to reality, shall we?

Considering how much more prolific the x86 software incumbency is compared to CUDA software, I'm willing to think that the minority that's already using CUDA would throw in the towel in favour of more official solutions like ROCm or oneAPI, if graphics programming's ever-changing paradigm of moving to new APIs serves as an example ...

To be successful, it's not only a question of hardware capability but, more importantly, a question of software tools, APIs, documentation, training, seminars, addressable market and so on. AMD's current software state is a wasteland that nobody wants to walk through: an incomplete mess thrown at the open source community, praying that someone someday will do the job for AMD. Basically a recipe for failure.
Just as a reminder, it took AMD two and a half years to (barely) support Navi in ROCm via the ROCr OpenCL runtime path. TWO AND A HALF YEARS, FOR GOD'S SAKE :no:

Well, I guess AMD lucked out, because most potential customers didn't care about GPU acceleration in the past, so ROCm, just like CUDA, is of equally little value to them currently!


The above illustrates why the MI250X is a no-compromise heterogeneous system ...
 
Considering how much more prolific the x86 software incumbency is compared to CUDA software, I'm willing to think that the minority that's already using CUDA would throw in the towel in favour of more official solutions like ROCm or oneAPI, if graphics programming's ever-changing paradigm of moving to new APIs serves as an example ...
Please read carefully. I said CUDA is the standard for parallel computing. ROCm and oneAPI are not "official"; they are currently... nothing.

Well, I guess AMD lucked out, because most potential customers didn't care about GPU acceleration in the past, so ROCm, just like CUDA, is of equally little value to them currently!
So Nvidia built a datacenter business growing more than 100% Y/Y, with revenue that will exceed 10 billion dollars this year, and nobody cared. Yeah, sure :rolleyes:
 
Please read carefully. I said CUDA is the standard for parallel computing. ROCm and oneAPI are not "official"; they are currently... nothing.

ROCm or oneAPI might as well be the only official APIs, since they offer superior interop with x86 CPUs. The few who were using CUDA are more likely to drop it than to move off x86, because there's more potential performance to extract from other APIs than from changing CPU architectures ...
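
The porting cost also isn't what it used to be, since the big frameworks already hide the backend. A minimal sketch (assuming a GPU-enabled PyTorch install; PyTorch's ROCm builds expose the same "cuda" device API, so this runs as-is on either vendor's GPU):

```python
# Minimal sketch: framework-level code is already backend-agnostic.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm builds also report "cuda"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b              # dispatched to cuBLAS on NVIDIA, rocBLAS on AMD
print(device, c.sum().item())
```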

So Nvidia built a datacenter business growing more than 100% Y/Y, with revenue that will exceed 10 billion dollars this year, and nobody cared. Yeah, sure :rolleyes:

Where are you getting your numbers? NV's total DC revenue was slightly less than $3B for FY 2020. They'll barely make it above $6B for FY 2021, if they can. GPU acceleration is still a clear, distant second compared to pure x86 systems ...
 
I wonder how we can use 'superior' ML to solve problems where FP64 is the minimum required precision and a single error in the Nth decimal place makes the equations unsolvable...
 
HPC exists because there are laws of physics described by equations, and for complex systems it's not possible to solve them by hand.
Is that going to change overnight?
 
NumPy replacement by cuNumeric. Well, Nvidia can deliver; they understand that software is everything.
AMD is far behind, and I wonder if they can ever catch up.
 
NumPy replacement by cuNumeric. Well, Nvidia can deliver; they understand that software is everything.
AMD is far behind, and I wonder if they can ever catch up.

Off topic, but cuNumeric isn't actually a replacement for NumPy; think of it as a library for automatic NumPy acceleration on multi-GPU or multi-node systems ...
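
The usage pattern its docs push is basically a swapped import. A minimal sketch (array sizes are arbitrary, and it assumes a working Legate/cuNumeric install):

```python
# Sketch of the drop-in pattern: keep NumPy-style code, swap the import, and
# let the Legate runtime schedule the work across the available GPUs/nodes.
import cunumeric as np    # instead of: import numpy as np

x = np.random.rand(10_000, 10_000)
y = (x @ x.T).sum()       # same NumPy API, executed by the runtime
print(float(y))
```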
 
AMD's current software state is a wasteland that nobody wants to walk through.

That can't be true, otherwise El Capitan and Frontier wouldn't exist. ROCm is basically non-existent in the industry at large, but maybe folks working on supercomputers don't care about that. Software is king, so they must believe ROCm will mature into something useful one day.

Nvidia, on the other hand, is out in the cold as the only platform without a coherent CPU-to-GPU interface. That puts them in a tough spot in the HPC game.
 
Where are you getting your numbers? NV's total DC revenue was slightly less than $3B for FY 2020. They'll barely make it above $6B for FY 2021, if they can. GPU acceleration is still a clear, distant second compared to pure x86 systems ...
$2.4B last quarter in datacenter is ~$10B yearly without counting any growth. But the reality is that this business is growing too fast to use a fixed number...
 
That can't be true, otherwise El Capitan and Frontier wouldn't exist. ROCm is basically non-existent in the industry at large, but maybe folks working on supercomputers don't care about that. Software is king, so they must believe ROCm will mature into something useful one day.
Part of these exascale systems' price is a $300 million government investment in the SYCL effort... So yeah, it proves that ROCm is nearly useless.

Nvidia, on the other hand, is out in the cold as the only platform without a coherent CPU-to-GPU interface. That puts them in a tough spot in the HPC game.
Grace-Hopper says hello.
 
Let's see if the Xilinx merger gets through in the next week or two; if it does, I bet AMD will slot in a new MI accelerator series by rebranding some Xilinx domain-specific accelerator.
SmartNICs will be the new IPU as well.
The portfolio will expand overnight.
 