AMD Execution Thread [2024]

This channel looks interesting. Haven't fully watched and can't attest to its accuracy, but I see der8auer in the comments praising it. That at least gives me some confidence that it's good.

 
Unlike desktop Zen 5, the server Zen 5 parts got a new IO die supporting up to 16 Zen 5 or 12 Zen 5c CCDs.
 

And DDR5-6400, along with a few other features. I think it was necessary since they went +50% cores. While desktop Zen 5 didn't increase the core count, it is disappointing that AMD made no change to the desktop IOD at all to enable higher Infinity Fabric and memory speeds.
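
As a rough sanity check on that (the channel count and max-core SKU figures below are my own assumptions, not from the video), here is the per-socket and per-core DRAM bandwidth at the officially supported speeds:

# Napkin estimate of per-socket / per-core DRAM bandwidth, assuming
# 12 DDR5 channels of 8 bytes each at the officially supported speed.
# Core counts are the max SKUs; back-of-envelope, not a benchmark.
def socket_bw_gbs(mt_per_s, channels=12, bytes_per_transfer=8):
    return mt_per_s * channels * bytes_per_transfer / 1000  # GB/s

configs = {
    "Genoa, 96 Zen 4 cores, DDR5-4800":   (4800, 96),
    "Turin, 128 Zen 5 cores, DDR5-6400":  (6400, 128),
    "Turin, 192 Zen 5c cores, DDR5-6400": (6400, 192),
}
for name, (speed, cores) in configs.items():
    bw = socket_bw_gbs(speed)
    print(f"{name}: {bw:.1f} GB/s per socket, {bw / cores:.2f} GB/s per core")

Even with the jump to DDR5-6400, the 192-core parts end up with less bandwidth per core (about 3.2 GB/s) than a 96-core Genoa had at DDR5-4800 (about 4.8 GB/s), so the faster memory was pretty much mandatory.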

MI325X should do well with its memory capacity. And with the roadmap they're targeting, AMD is clearly putting a lot of resources towards AI GPUs now. From what I've read, that department has the most resources and engineers allocated to it at the moment (not a surprise).
 
While the desktop IO die is the same, they did upgrade the official memory support from DDR5-5200 to DDR5-5600.
 
Is MI325X primarily used for training? I might have missed it but did not see any inference benchmarks in the presentation.
 
No, all AMD Instincts have been inference-first, but now they're starting to claim they're good for training too.
Edit: inference came before training in the presentation too.
 
The fifth-generation EPYC CPU offers up to 192 efficiency cores for web apps and up to 128 cores for performance-demanding apps. However, we note that AMD again chose to forgo on-chip AI acceleration, which Intel added to Xeon for the last two generations. It seems strange, but I suspect this is what the hyperscalers told AMD they preferred.
...
The primary difference between the MI300 and MI325 is the amount of HBM3, which we assume is stacked 12 DRAM dies high. The 256GB of HBM is more than three times the memory on the Nvidia H100 and about 80% more than the H200. This additional memory can matter a lot as models get larger. Meta said on stage that 100% of their inference processing on Llama 405B is being serviced by the MI300X. I wonder if that will remain the case now that Nvidia announced on the same day that they increased performance of Llama 405B on the H200 by another 50%.

How fast is the new MI325X? AMD claimed (without providing details like what software was used and the size of the input sequence) that the 325 is 40% faster than the soon-to-be-old H200 for certain inference workloads. If history is a guide, we can be certain this benchmark did not use Nvidia’s optimizing software which can easily increase performance by 2-4 times.
...
If you don’t think a developer would use Nvidia’s optimization software to increase performance by 2-4X (and I gotta meet this guy!), then one can conclude the MI325 is faster than the H200, provided you use AMD’s optimizations in ROCm.

The larger HBM will nonetheless attract a lot of users, like Meta. We will have to wait to see how the MI350 stacks up against Blackwell, which AMD ignored at the event.
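
To put the capacity argument in rough numbers, here is a sketch of how many GPUs it takes just to hold the Llama 3.1 405B weights; the HBM capacities are the commonly quoted ones, and the bytes-per-parameter figures plus the weights-only simplification (no KV cache, no activations) are my assumptions:

import math

# Napkin math on fitting Llama 3.1 405B weights into HBM. Weights only;
# real deployments also need headroom for KV cache and activations.
PARAMS = 405e9
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}
HBM_GB = {"H100": 80, "H200": 141, "MI300X": 192, "MI325X": 256}

for dtype, bpp in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bpp / 1e9
    print(f"{dtype}: ~{weights_gb:.0f} GB of weights")
    for gpu, cap in HBM_GB.items():
        print(f"  {gpu} ({cap} GB): at least {math.ceil(weights_gb / cap)} GPUs")

At FP8 the weights alone fit in two MI325X versus three H200s or six H100s, which is the kind of difference that shows up directly in serving cost for the largest models.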
 