AMD Execution Thread [2024]

Unlike desktop Zen 5, the server Zen 5 parts got a new IO die supporting up to 16 Zen 5 CCDs or 12 Zen 5c CCDs
 

And DDR5-6400, along with a few other features. I think that was necessary since they went +50% on cores. While desktop Zen 5 didn't increase the core count, it's disappointing that AMD made no changes to the desktop IOD at all to enable higher Infinity Fabric and memory speeds.

MI325X should do well with its memory capacity. And with the roadmap they're targeting, AMD is clearly putting a lot of resources toward AI GPUs now. From what I've read, that division has the most resources and engineers allocated to it at the moment (not a surprise).
 
While the IO die is the same, they did upgrade the official memory support from DDR5-5200 to DDR5-5600.
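Quick back-of-envelope on what those official speeds mean for theoretical peak bandwidth. The channel counts are my assumptions (2 channels on desktop AM5, 12 on the new server platform), and sustained bandwidth will of course be lower:

```python
# Theoretical peak DDR5 bandwidth: MT/s x 8 bytes per 64-bit channel.
# Channel counts are assumptions: 2 for desktop AM5, 12 for the server parts.

def peak_gbps(mts: int, channels: int) -> float:
    return mts * 8 * channels / 1000  # MT/s * 8 B/transfer -> GB/s

print(f"Desktop DDR5-5200, 2ch:  {peak_gbps(5200, 2):.1f} GB/s")    # 83.2
print(f"Desktop DDR5-5600, 2ch:  {peak_gbps(5600, 2):.1f} GB/s")    # 89.6
print(f"Server DDR5-6400, 12ch: {peak_gbps(6400, 12):.1f} GB/s")    # 614.4
```

So the desktop bump is worth about 8% on paper, while the server platform's combination of DDR5-6400 and 12 channels is in a different league.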
 
Is MI325X primarily used for training? I might have missed it but did not see any inference benchmarks in the presentation.
 
No, all AMD Instinct parts are inference-first, but now they're starting to claim they're good for training too.
Edit: inference also came before training in the presentation.
 
The fifth-generation EPYC CPU offers up to 192 efficiency (Zen 5c) cores for web apps and up to 128 performance (Zen 5) cores for demanding apps. However, we note that AMD again chose to forgo on-chip AI acceleration, which Intel has added to Xeon for the last two generations. It seems strange, but I suspect this is what the hyperscalers told AMD they preferred.
...
The primary difference between the MI300 and MI325 is the amount of HBM, which we assume is stacked 12 DRAM dies high. The 256GB of HBM is roughly 220% more memory than on the Nvidia H100 and roughly 80% more than the H200. This additional memory can matter a lot as models get larger. Meta said on stage that 100% of their inference processing on Llama 405B is being serviced by the MI300X. I wonder if that will remain the case now that Nvidia announced, on the same day, that they increased performance of Llama 405B on the H200 by another 50%.
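As a quick sanity check on those percentages, using Nvidia's published SXM capacities (80GB for the H100, 141GB for the H200):

```python
# Checking the memory-capacity comparison: MI325X capacity is from the
# article above; H100/H200 capacities are Nvidia's published SXM specs.

mi325x, h100, h200 = 256, 80, 141  # GB

print(f"MI325X vs H100: {mi325x / h100 - 1:+.0%}")  # +220%
print(f"MI325X vs H200: {mi325x / h200 - 1:+.0%}")  # +82%
```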

How fast is the new MI325X? AMD claimed (without providing details like which software was used or the input sequence length) that the 325 is 40% faster than the soon-to-be-old H200 for certain inference workloads. If history is a guide, we can be fairly certain this benchmark did not use Nvidia's optimizing software, which can easily increase performance by 2-4x.
...
If you don’t think a developer would use Nvidia’s optimization software to increase performance by 2-4x (and I gotta meet this guy!), then one can conclude the MI325 is faster than the H200, provided you also use AMD’s optimizations in ROCm.
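A toy way to see the author's point: which chip "wins" depends entirely on the software multiplier you grant each side. All throughput numbers below are hypothetical, not measurements:

```python
# Hypothetical illustration: AMD's "+40% vs H200" claim flips once you
# apply an optimization multiplier to the Nvidia side. Numbers are made up.

h200_base = 1.0   # unoptimized H200 throughput (arbitrary units)
mi325x = 1.4      # AMD's claimed +40%, presumably with ROCm optimizations

for sw_mult in (1.0, 2.0, 4.0):   # assumed Nvidia software-stack speedup
    h200 = h200_base * sw_mult
    leader = "MI325X" if mi325x > h200 else "H200"
    print(f"Nvidia sw x{sw_mult:.0f}: H200 {h200:.1f} vs MI325X {mi325x:.1f} -> {leader}")
```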

The larger HBM will still attract a lot of users like Meta, however. We will have to wait and see how much the MI350 can improve the comparison against Blackwell, which AMD ignored at the event.
 

If this is true, it's really interesting. The CCD with the cores is stacked on top of the 3D V-Cache. I guess that might be why they've enabled overclocking: the heatsink is closer to the CCD, so in theory it's less likely to cook the 3D V-Cache. Could be a really interesting CPU launch, just from a tech perspective.
 

It could also explain all those changes High Yield spotted on the Zen 5 die shot: the tighter-packed cache saving die space, and the smaller TSVs.
It seems so simple and obvious... now, when you think about all the advantages it provides.
I wonder if there is additional complexity to the packaging/assembly process because that is the only obvious potential downside I can think of.
 

It looks to me like this arrangement is going to be quite a bit more complex, because the CCD generally has more pads; by reversing the stacking, the CCD's connections will have to route either through the 3D V-Cache die or directly through TSVs.
As for the advantage, I can imagine the CCD draws more power, so it's better for it to sit closer to the heat spreader. Some of the current power limits on the X3D chips are there because AMD doesn't want to fry the cache die.
 
It very much is simple and obvious; people have talked about it before, and AMD still isn't fully taking advantage of it, or at least taking it to its logical conclusion (of what is currently possible, at least). If you make putting a cache chip underneath standard, you can remove all the L3 from the CCD, which makes up a significant portion of the die. That frees up a lot of room in terms of what to do with the CCD (rough numbers sketched at the end of this post): you could shrink it heavily, widen the architecture massively, add more cores, reduce transistor density to improve thermals and clock speeds, or obviously any combination of these.

You could even add more cache, or perhaps use an even lower-density process for the cache die (though I think TSMC has compatibility limits currently?), since you no longer have to worry about it being too big and covering up logic.

Honestly, with this being the third generation of chips since V-Cache was introduced, I'm a little disappointed they haven't gone this route yet.
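Purely illustrative numbers on that headroom argument; the CCD size and L3 share below are placeholders, not measured die figures:

```python
# Hypothetical sketch: how much die area moving L3 off the CCD could free.
# Both numbers are placeholders, not actual Zen 5 measurements.

ccd_area_mm2 = 70.0   # assumed CCD size
l3_fraction = 0.25    # assumed share of the die occupied by L3 SRAM

freed = ccd_area_mm2 * l3_fraction
print(f"~{freed:.0f} mm^2 ({l3_fraction:.0%} of the CCD) freed for wider "
      f"cores, more cores, or lower density for clocks/thermals")
```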
 
AMD's Q3 2024 results are out. Revenue: $6.82B, up 18% YoY; data center is massively up, client is up, embedded is down, and gaming is massively down. Q4 guidance is a bit cautious.

- Data Center: $3.55B, up 122% YoY and up 25% sequentially.
- Client: $1.88B, up 29% YoY and up 26% sequentially.
- Gaming: $462M, down 69% YoY and down 29% sequentially.
- Embedded: $927M, down 25% YoY but up 8% sequentially.

For the fourth quarter of 2024, AMD expects revenue to be approximately $7.5 billion, plus or minus $300 million. At the mid-point of the revenue range, this represents year-over-year growth of approximately 22% and sequential growth of approximately 10%.
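Sanity-checking the guidance math (the prior-year quarter below is inferred from the stated growth rate, not taken from the press release):

```python
# Implied figures from AMD's Q4 2024 guidance: $7.5B +/- $0.3B, which the
# release says is ~22% YoY and ~10% QoQ growth at the midpoint.

q3_2024 = 6.82    # $B, reported
guide_mid = 7.5   # $B, guidance midpoint

q4_2023_implied = guide_mid / 1.22
print(f"Implied Q4 2023 revenue: ~${q4_2023_implied:.2f}B")
print(f"QoQ growth at midpoint:  {guide_mid / q3_2024 - 1:+.1%}")  # ~+10%
```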
 
Pretty much on life support atm. Either RDNA 4 is a solid hit, or it's close to GG. And that's an awful thing for consumers from a competitive standpoint.

 