AMD CDNA: MI300 & MI400 (Analysis, Speculation and Rumors in 2024)

DavidGraham · Jun 26, 2024

trinibwoy said:
Testing AMD’s Giant MI300X

https://chipsandcheese.com/2024/06/25/testing-amds-giant-mi300x/

Chips and Cheese is also comparing the MI300X primarily to the PCIe version of the H100, which is the weakest version of the H100 with the lowest specs

Chips and Cheese also mentions getting specific help from AMD with its testing, but doesn't appear to have received equivalent input from Nvidia, so there could be some bias in the benchmark results

The introduction says, "We would also like to thank Elio from NScale who assisted us with optimizing our LLM runs as well as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems." No mention is made of any consultation with any Nvidia folks, and that suggests this is more of an AMD-sponsored look at the MI300X

AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for a single GPU

MI300X has a very strong architecture, based on testing by Chips and Cheese.

www.tomshardware.com

Deleted member 2197 · Jun 26, 2024

DavidGraham said:
AMD MI300X performance compared with Nvidia H100 — low-level benchmarks testing cache, latency, inference, and more show strong results for a single GPU

MI300X has a very strong architecture, based on testing by Chips and Cheese.

www.tomshardware.com

I got the same feeling that Tom's Hardware did when I read the C&C article. It is similar to the HPC performance tit-for-tat between AMD and Nvidia a few months ago, with optimizations for only one side. I think AMD sponsored a couple more tests earlier this year with AI partners using similar optimizations.

If there was any credibility to these claims, why avoid MLPerf, and why the recent AMD refusal to let Tiny Corp use the MI300X in MLPerf testing?
Can't blame marketing for these decisions ...

Lurkmass · Jun 27, 2024

BTW, the authors behind that C&C article updated their blog post to refute Tom's Hardware's claim of sponsorship partiality ...

https://twitter.com/x/status/1805992689814602021

What they've misinterpreted as a benign search for more information has now flared up into a storm ...

Granath · Jun 27, 2024

pharma said:
I got the same feeling that Tom's Hardware did when I read the C&C article. It is similar to the HPC performance tit-for-tat between AMD and Nvidia a few months ago, with optimizations for only one side. I think AMD sponsored a couple more tests earlier this year with AI partners using similar optimizations.

If there was any credibility to these claims, why avoid MLPerf, and why the recent AMD refusal to let Tiny Corp use the MI300X in MLPerf testing?
Can't blame marketing for these decisions ...

So you got the wrong feeling. They explained that AMD did not provide them with any special optimizations, just result validation.

xpea · Jun 27, 2024

pharma said:
I got the same feeling that Tom's Hardware did when I read the C&C article. It is similar to the HPC performance tit-for-tat between AMD and Nvidia a few months ago, with optimizations for only one side. I think AMD sponsored a couple more tests earlier this year with AI partners using similar optimizations.

If there was any credibility to these claims, why avoid MLPerf, and why the recent AMD refusal to let Tiny Corp use the MI300X in MLPerf testing?
Can't blame marketing for these decisions ...

The biggest issue I have with this article is that they use generic vLLM and not TensorRT-LLM or LMDeploy, which are much faster on Nvidia accelerators. AMD is clearly behind this article, and they are not in a good spot right now with all their recent benchmark shenanigans...

PS: a recent test that compares different LLM inference backends:

Benchmarking LLM Inference Backends

Compare the Llama 3 serving performance with vLLM, LMDeploy, MLC-LLM, TensorRT-LLM, and Hugging Face TGI on BentoCloud.

www.bentoml.com

Deleted member 2197 · Jun 27, 2024

Granath said:
So you got the wrong feeling. They explained that AMD did not provide them with any special optimizations, just result validation.

It is a bit weird to receive only AMD optimizations from Nscale when they could have provided the same for Nvidia. Granted, Nscale is a primary AMD partner, and though they do offer access to Nvidia hardware, they did not offer any optimizations for the testing. It could have something to do with an increased marketing effort for their huge MI300X purchase earlier this year.

Let's see if C&C provides balanced testing/optimizations with the Nvidia contacts provided to them by Tom's Hardware.

Granath · Jun 27, 2024

pharma said:
It is a bit weird to receive only AMD optimizations from Nscale when they could have provided the same for Nvidia. Granted, Nscale is a primary AMD partner, and though they do offer access to Nvidia hardware, they did not offer any optimizations for the testing. It could have something to do with an increased marketing effort for their huge MI300X purchase earlier this year.

Let's see if C&C provides balanced testing/optimizations with the Nvidia contacts provided to them by Tom's Hardware.

But they said that no sw optimizations were provided. Vanilla vLLM for both vendors.

Deleted member 2197 · Jun 27, 2024

Granath said:
But they said that no sw optimizations were provided. Vanilla vLLM for both vendors.

"NScale who assisted us with optimizing our LLM runs as well as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems."

Also, why would you need AMD engineers to help make sure your results were similar to those already published by AMD?

Granath · Jun 27, 2024

pharma said:
"NScale who assisted us with optimizing our LLM runs as well as a few folks from AMD who helped with making sure our results were reproducible on other MI300X systems."

Also, why would you need AMD engineers to help make sure your results were similar to those already published by AMD?

The guys from AMD only checked whether the results were consistent.

Deleted member 2197 · Jun 27, 2024

Granath said:
The guys from AMD only checked whether the results were consistent.

Your speculation or from the article? Why the need for the results to be consistent with AMD published numbers?
What happens in the case the results are not consistent? Independent or managed test results?

Granath · Jun 27, 2024

pharma said:
Your speculation or from the article? Why the need for the results to be consistent with AMD published numbers?
What happens in the case the results are not consistent? Independent or managed test results?

That's what C&C explained in the comments. It seems the word choice in the article was unfortunate. But you know it's a small site, privately run, right?

DavidGraham · Jun 27, 2024

Granath said:
The guys from AMD only checked whether the results were consistent.

If they did the same with NVIDIA, they would have been told that their results are not correct because they are using the slowest software stack for NVIDIA GPUs.

Granath · Jun 27, 2024

DavidGraham said:
If they did the same with NVIDIA, they would have been told that their results are not correct because they are using the slowest software stack for NVIDIA GPUs.

True.

Granath · Jun 28, 2024

Related to C&C article: https://www.nscale.com/blog/nscale-...improves-throughput-and-latency-by-up-to-7-2x

Granath · Jul 19, 2024

FP8 Achieved on AMD MI300X | TensorWave | The MI300X Cloud

Learn how TensorWave's support for FP8 on the MI300X can revolutionize model inference. Experience up to 1.6x faster speeds and improved efficiency.

tensorwave.com

Deleted member 2197 · Jul 21, 2024

AMD’s Long And Winding Road To The Hybrid CPU-GPU Instinct MI300A

Back in 2012, when AMD was in the process of backing out of the datacenter CPU business and did not really have its datacenter GPU act together at all,

www.nextplatform.com

del42sa · Aug 27, 2024

AMD Breaks Down Instinct MI300X MCM GPU: Full Chip Packs 320 “CDNA 3” Compute Units, 192 GB HBM3 With 288 GB HBM3e Upgrade This Year

Deleted member 2197 · Sep 11, 2024

AMD To Hold "Advancing AI" Event At October 10: Debut of Instinct MI325X AI Accelerator, 5th Gen EPYC CPUs & Much More

AMD will hold its "Advancing AI 2024" event, marking the debut of the firm's Instinct MI325X AI accelerators & 5th Gen EPYC server CPUs.

wccftech.com

pTmdfx · Sep 11, 2024

Instinct “annual cadence”:
MI300X / CDNA3: 23Q4
MI325X / CDNA3: 24Q4
MI350X / CDNA4: assumed 25Q4, and seemingly a rehash of CDNA3 plus an extra set of matrix instructions
MI400 / “CDNA Next”: assumed 26Q4

While for RDNA’s (lack of) cadence:
RDNA3: 22Q4
RDNA4: rumoured late 2024
RDNA5: rumoured late 2025
UDNA6: eh, logically late 2026 by extrapolation?

So there is a possibility of “CDNA Next” turning out to be the so-called “UDNA 6” with dates seemingly lining up. Let’s see if the upcoming event will reflect that.

LordEC911 · Sep 11, 2024

pTmdfx said:
So there is a possibility of “CDNA Next” turning out to be the so-called “UDNA 6” with dates seemingly lining up. Let’s see if the upcoming event will reflect that.

That is certainly a possibility... or it could mean that they are just now starting to flesh out UDNA, adding it to their current roadmap, and it is still more than 4-5 years out. If that is the case, that would put UDNA sometime in 2029 or later, after CDNA Next and its derivatives. UDNA5 would be the first in his scenario, since he was talking about "planning" 3 generations for backward and forward compatibility, RDNA5/6/7 and UDNA5/6/7.

The way Huynh phrased that "backward/forward compatibility" answer made it seem like they have been working on adding that to RDNA5/6/7 and UDNA is after that. The other clue is that when he was asked about the "when" for UDNA, he gave this answer: "We haven’t disclosed that yet. It’s a strategy. (...) They (devs) actually wish we did it sooner, but I can't change the engine when a plane’s in the air. I have to find the right way to setpoint that so I don’t break things." CDNA Next showing up on the roadmap with 2026 makes it seem "disclosed", which could mean CDNA Next isn't UDNA. Tom's Hardware article with Huynh's interview

Really just depends on when they finalized this "strategy" and started planning the "backwards compatibility" into their architectures.
