AMD CDNA: MI300 & MI400 (Analysis, Speculation and Rumors in 2024)

So there is a possibility of “CDNA Next” turning out to be the so-called “UDNA 6” with dates seemingly lining up. Let’s see if the upcoming event will reflect that.

That is certainly a possibility... or it could mean that they are just now starting to flesh out UDNA, adding it to their current roadmap, and it is still more than 4-5 years out. If that is the case, that would put UDNA sometime in 2029 or later, after CDNA Next and its derivatives. UDNA5 would be the first in this scenario, since he was talking about "planning" 3 generations for backward and forward compatibility: RDNA5/6/7 and UDNA5/6/7.

The way Huynh phrased that "backward/forward compatibility" answer made it seem like they have been working on adding that to RDNA5/6/7, and UDNA comes after that. The other clue is that when he was asked about the "when" for UDNA, he gave this answer: "We haven’t disclosed that yet. It’s a strategy. (...) They (devs) actually wish we did it sooner, but I can't change the engine when a plane’s in the air. I have to find the right way to setpoint that so I don’t break things." CDNA Next showing up on the roadmap with 2026 makes it seem "disclosed," which could mean CDNA Next isn't UDNA. (See the Tom's Hardware article with Huynh's interview.)

Really just depends on when they finalized this "strategy" and started planning the "backwards compatibility" into their architectures.
 
Really just depends on when they finalized this "strategy" and started planning the "backwards compatibility" into their architectures.

While they have started teasing this only now, I am more in the camp of "they wrapped up the planning and architectural designs for the 2026 IPs" as the reason why they are starting to talk about it.

They have had 5 years of insights from RDNA and CDNA products out in the wild. That ecosystem feedback and those observations did not surge out of nowhere in the past few weeks.

Also, given that the CDNA/RDNA run is now close to 5 years past its first product launches, it is not a total surprise that the next multi-year architectural roadmap (incl. high-level architectural features & big bets) is due or done by now.
 
This is a very detailed comparison of the H100/H200 and the MI300X. The sad thing for AMD is that even a year after the MI300X launch, the software still isn't stable enough to extract 100% of the performance from the MI300X.


Some quotes from the article:
A few days ago, after we informed both that we had confirmed an article publication date of December 20th, AMD requested that we delay publication to include results based on a beta WIP development build on an AMD developer’s branch. All of our benchmarking on Nvidia was conducted on publicly available stable release builds. In the spirit of transparency and fairness, we include these results as well as updated testing harness results on both the original November 25th deadline image and the latest publicly available software. However, we believe that the correct way to interpret the results is to look at the performance of the public stable release of AMD/Nvidia software.
Below is AMD’s December 21st development build docker image. As you can see, it uses a number of non-stable development branches for dependencies such as hipBLASLt, AOTriton, and ROCm Attention, and installs everything including PyTorch from source code, taking upwards of 5 hours to build. These versions of the dependencies haven’t even been merged into AMD’s own main branch yet. 99.9% of users will not be installing PyTorch and all of its dependencies from source code on development branches, but will instead use the public stable PyPI PyTorch.
AMD’s December 21st dev build is on a hanging development branch. That means it is a branch that has not been fully QA’ed and is a use-at-your-own-risk branch. There are many concerns about the validity of results obtained by using a development build, development branches, and building from source code, as most users are not doing this in real life. Most users will be installing AMD/Nvidia PyTorch from the PyPI stable release, so we recommend readers keep this in mind when analyzing these results.
 
This bit here is very very important.

The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs.

To get AMD to a usable state with somewhat reasonable performance, a giant ~60-command Dockerfile that builds dependencies from source, hand-crafted by an AMD principal engineer, was specifically provided for us, since the PyTorch Nightly and public PyTorch AMD images functioned poorly and had version differences.

This docker image requires ~5 hours to build from source and installs dependencies and sub-dependencies (hipBLASLt, Triton, PyTorch, TransformerEngine), a huge difference compared to Nvidia, which offers a pre-built, out-of-the-box experience that takes but a single line of code. Most users do not build PyTorch or hipBLASLt from source code but instead use the stable release.
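To make the contrast concrete, the "single line" path on each vendor looks roughly like this. This is a sketch; the container tag and ROCm wheel index version below are illustrative assumptions, not taken from the article:

```shell
# Nvidia: pull a pre-built NGC PyTorch container (tag is illustrative)
docker pull nvcr.io/nvidia/pytorch:24.12-py3

# AMD, the stable public path most users would take: a pre-built ROCm wheel
# from the official PyTorch index (ROCm version is illustrative)
pip3 install torch --index-url https://download.pytorch.org/whl/rocm6.2

# The dev-build path the article describes instead compiles hipBLASLt,
# Triton, TransformerEngine, and PyTorch itself from development branches
# via a ~60-command Dockerfile, taking around 5 hours.
```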

This echoes what I have been saying about that old ChipsandCheese article. They were in touch with many AMD contacts to deliver their questionable, divorced-from-reality results, yet they never contacted any NVIDIA rep to validate their results, and they also used very slow libraries on NVIDIA. When faced with such criticism, the editors shrugged it off and never cared to address it.
 
AMD's lackluster software would explain why NVIDIA is at 90%+ market share for AI hardware.

That article was horrific reading; it sounded like they were describing a start-up, not a long-running company.
 