AMD Execution Thread [2024]

Unlike desktop Zen 5, the server Zen 5 parts got a new IO die supporting up to 16 Zen 5 CCDs or 12 Zen 5c CCDs
 

And DDR5-6400, along with a few other features. I think that was necessary since they went +50% on cores. While desktop Zen 5 didn't increase the core count, it's disappointing that AMD made no changes to the desktop IOD at all to enable higher Infinity Fabric and memory speeds.

MI325X should do well with its memory capacity. And with the roadmap they're targeting, AMD is clearly putting a lot of resources toward AI GPUs now. From what I've read, that division has the most resources and engineers allocated to it at the moment (not a surprise).
 
While the IO die is the same, they did upgrade the official memory support from DDR5-5200 to DDR5-5600.
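Quick back-of-envelope on what those official speeds mean for theoretical peak bandwidth. The channel counts are my assumptions (2 channels on desktop AM5, 12 on the new server platform), and sustained bandwidth will of course be lower:

```python
# Theoretical peak DDR5 bandwidth: MT/s x 8 bytes per 64-bit channel.
# Channel counts are assumptions: 2 for desktop AM5, 12 for the server parts.

def peak_gbps(mts: int, channels: int) -> float:
    return mts * 8 * channels / 1000  # MT/s * 8 B/transfer -> GB/s

print(f"Desktop DDR5-5200, 2ch:  {peak_gbps(5200, 2):.1f} GB/s")    # 83.2
print(f"Desktop DDR5-5600, 2ch:  {peak_gbps(5600, 2):.1f} GB/s")    # 89.6
print(f"Server DDR5-6400, 12ch: {peak_gbps(6400, 12):.1f} GB/s")    # 614.4
```

So the desktop bump is worth about 8% on paper, while the server platform's combination of DDR5-6400 and 12 channels is in a different league.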
 
Is MI325X primarily used for training? I might have missed it but did not see any inference benchmarks in the presentation.
 
No, all AMD Instinct parts are inference-first, but now they're starting to claim they're good for training too.
Edit: inference also came before training in the presentation.
 
The fifth-generation EPYC CPU offers up to 192 efficiency (Zen 5c) cores for web apps and up to 128 performance (Zen 5) cores for demanding apps. However, we note that AMD again chose to forgo on-chip AI acceleration, which Intel has added to Xeon for the last two generations. It seems strange, but I suspect this is what the hyperscalers told AMD they preferred.
...
The primary difference between the MI300 and MI325 is the amount of HBM, which we assume is stacked 12 DRAM dies high. The 256GB of HBM is roughly 220% more memory than on the Nvidia H100 and roughly 80% more than the H200. This additional memory can matter a lot as models get larger. Meta said on stage that 100% of their inference processing on Llama 405B is being serviced by the MI300X. I wonder if that will remain the case now that Nvidia announced, on the same day, that they increased performance of Llama 405B on the H200 by another 50%.
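As a quick sanity check on those percentages, using Nvidia's published SXM capacities (80GB for the H100, 141GB for the H200):

```python
# Checking the memory-capacity comparison: MI325X capacity is from the
# article above; H100/H200 capacities are Nvidia's published SXM specs.

mi325x, h100, h200 = 256, 80, 141  # GB

print(f"MI325X vs H100: {mi325x / h100 - 1:+.0%}")  # +220%
print(f"MI325X vs H200: {mi325x / h200 - 1:+.0%}")  # +82%
```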

How fast is the new MI325X? AMD claimed (without providing details like which software was used or the input sequence length) that the 325 is 40% faster than the soon-to-be-old H200 for certain inference workloads. If history is a guide, we can be fairly certain this benchmark did not use Nvidia's optimizing software, which can easily increase performance by 2-4x.
...
If you don’t think a developer would use Nvidia’s optimization software to increase performance by 2-4x (and I gotta meet this guy!), then one can conclude the MI325 is faster than the H200, provided you also use AMD’s optimizations in ROCm.
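A toy way to see the author's point: which chip "wins" depends entirely on the software multiplier you grant each side. All throughput numbers below are hypothetical, not measurements:

```python
# Hypothetical illustration: AMD's "+40% vs H200" claim flips once you
# apply an optimization multiplier to the Nvidia side. Numbers are made up.

h200_base = 1.0   # unoptimized H200 throughput (arbitrary units)
mi325x = 1.4      # AMD's claimed +40%, presumably with ROCm optimizations

for sw_mult in (1.0, 2.0, 4.0):   # assumed Nvidia software-stack speedup
    h200 = h200_base * sw_mult
    leader = "MI325X" if mi325x > h200 else "H200"
    print(f"Nvidia sw x{sw_mult:.0f}: H200 {h200:.1f} vs MI325X {mi325x:.1f} -> {leader}")
```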

The larger HBM will still attract a lot of users like Meta, however. We will have to wait and see how much the MI350 can improve the comparison against Blackwell, which AMD ignored at the event.
 

If this is true, it's really interesting. The CCD with the cores is stacked on top of the 3D V-Cache. I guess that might be why they've enabled overclocking: the heatsink is closer to the CCD, so in theory it's less likely to cook the 3D V-Cache. Could be a really interesting CPU launch, just from a tech perspective.
 

It could also explain all those changes High Yield spotted on the Zen 5 die shot: the tighter-packed cache saving die space, and the smaller TSVs.
It seems so simple and obvious... now, when you think about all the advantages it provides.
I wonder if there is additional complexity to the packaging/assembly process because that is the only obvious potential downside I can think of.
 

It looks to me like this arrangement is going to be quite a bit more complex, because the CCD generally has more pads; by reversing the stacking, the CCD's connections will have to route either through the 3D V-Cache die or directly through TSVs.
As for the advantage, I can imagine the CCD draws more power, so it's better for it to sit closer to the heat spreader. Some of the current power limits on the X3D chips are there because AMD doesn't want to fry the cache die.
 
It very much is simple and obvious; people have talked about it before, and AMD still isn't fully taking advantage of it, or at least taking it to its logical conclusion (of what is currently possible, at least). If you make putting a cache chip underneath standard, you can remove all the L3 from the CCD, which makes up a significant portion of the die. That frees up a lot of room in terms of what to do with the CCD (rough numbers sketched at the end of this post): you could shrink it heavily, widen the architecture massively, add more cores, reduce transistor density to improve thermals and clock speeds, or obviously any combination of these.

You could even add more cache, or perhaps use an even lower-density process for the cache die (though I think TSMC has compatibility limits currently?), since you no longer have to worry about it being too big and covering up logic.

Honestly, with this being the third generation of chips since V-Cache was introduced, I'm a little disappointed they haven't gone this route yet.
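Purely illustrative numbers on that headroom argument; the CCD size and L3 share below are placeholders, not measured die figures:

```python
# Hypothetical sketch: how much die area moving L3 off the CCD could free.
# Both numbers are placeholders, not actual Zen 5 measurements.

ccd_area_mm2 = 70.0   # assumed CCD size
l3_fraction = 0.25    # assumed share of the die occupied by L3 SRAM

freed = ccd_area_mm2 * l3_fraction
print(f"~{freed:.0f} mm^2 ({l3_fraction:.0%} of the CCD) freed for wider "
      f"cores, more cores, or lower density for clocks/thermals")
```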
 
AMD's Q3 2024 results are out. Revenue: $6.82B, up 18% YoY; data center is massively up, client is up, embedded is down, and gaming is massively down. Q4 guidance is a bit cautious.

- Data Center: $3.55B, up 122% YoY and up 25% sequentially.
- Client: $1.88B, up 29% YoY and up 26% sequentially.
- Gaming: $462M, down 69% YoY and down 29% sequentially.
- Embedded: $927M, down 25% YoY but up 8% sequentially.

For the fourth quarter of 2024, AMD expects revenue to be approximately $7.5 billion, plus or minus $300 million. At the mid-point of the revenue range, this represents year-over-year growth of approximately 22% and sequential growth of approximately 10%.
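Sanity-checking the guidance math (the prior-year quarter below is inferred from the stated growth rate, not taken from the press release):

```python
# Implied figures from AMD's Q4 2024 guidance: $7.5B +/- $0.3B, which the
# release says is ~22% YoY and ~10% QoQ growth at the midpoint.

q3_2024 = 6.82    # $B, reported
guide_mid = 7.5   # $B, guidance midpoint

q4_2023_implied = guide_mid / 1.22
print(f"Implied Q4 2023 revenue: ~${q4_2023_implied:.2f}B")
print(f"QoQ growth at midpoint:  {guide_mid / q3_2024 - 1:+.1%}")  # ~+10%
```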
 
Pretty much on life support atm. Either RDNA 4 is a solid hit, or it's close to GG. And that's an awful thing for consumers from a competitive standpoint.

 