AMD Execution Thread [2024]

The worst part by far about many of these "neural processing units" is that absolutely none of the shipping hardware vendors are interested in providing explicit programming support for them. There are no public compilers to support your usual compiled high-level languages like C++ or custom languages like HLSL, and they won't even let you create custom programs in assembly either. I've never seen a class of hardware designs before that was immediately outright hostile towards any software developers!

For all of its faults, at least Sony let developers directly program against the Cell Processor's SPUs despite featuring the very same 'dataflow' architecture. From a programming perspective, all of these 'NPUs' are beneath even things like the Arduino microcontrollers ...
 
So how do you target them?
 
AFAIK, NPUs are not really supposed to be directly programmable by the application developer, but rather by each vendor's video driver programmers who would adapt it to the specific needs of middleware libraries and runtimes.

For example, DirectML and WinML are designed to use Direct3D metacommands: proprietary, opaque implementations of standard reduced-precision inferencing algorithms that live in the user-mode video driver. They are intended to be consumed programmatically at runtime through the Direct3D metacommand APIs: ID3D12Device5::EnumerateMetaCommands() returns the number of metacommands and their names/GUIDs, ::EnumerateMetaCommandParameters() returns the names and types of the input and output parameters for each metacommand, ::CreateMetaCommand() creates one, and ID3D12GraphicsCommandList4::InitializeMetaCommand() and ::ExecuteMetaCommand() then initialize and execute it.

Recent generations of GPUs from NVIDIA, AMD, and Intel have settled on a common set of Direct3D metacommands, with some minor variations (NB the three stages - creation, initialization and execution):

NVIDIA GeForce GTX / RTX
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
MVN (Mean Variance Normalization) [67][1][6],
Pooling [56][3][4],
MHA (Multi-Head Attention) [299][13][16],
CopyTensor [3][1][31]

AMD RDNA2 / RDNA3
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
MVN (Mean Variance Normalization) [67][1][6],
MHA (Multi-Head Attention) [299][13][16]

Intel Arc / Xe
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
Pooling [56][3][4],
Pooling [44][1][4],
LSTM (Long Short-Term Memory) [252][10][13],
MHA (Multi-Head Attention) [299][13][16],

Current NPUs are built for the same DirectML / OpenVINO / ONNX workflow, and their Windows driver model is a compute-only subset of WDDM KMD / Direct3D UMD, with only a limited form of 'core compute' capability.

PS. If you want to check the names and types of parameters for each metacommand supported by your GPU, you can use my command-line feature reporting tool; its verbose mode will get you a dozen screens of C-style definitions for all these twelve hundred parameters.
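To make the "parameters per stage" notation concrete, here is a rough sketch that parses the Intel Arc / Xe listing above and totals the counts for the creation, initialization and execution stages. The function names and the regular expression are made up for illustration (they are not part of any reporting tool), and the listing is simply pasted in as text.

```python
import re

# Intel Arc / Xe listing copied from the post above. The three bracketed
# numbers are read as parameter counts for creation, initialization, execution.
LISTING = """\
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
Pooling [56][3][4],
Pooling [44][1][4],
LSTM (Long Short-Term Memory) [252][10][13],
MHA (Multi-Head Attention) [299][13][16],
"""

# Hypothetical pattern for one listing row: name, then three [count] groups.
ROW = re.compile(r"^(.+?)\s*\[(\d+)\]\[(\d+)\]\[(\d+)\],?$")


def parse_listing(text):
    """Return a list of (name, (creation, initialization, execution))."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts = tuple(int(match.group(i)) for i in (2, 3, 4))
            rows.append((match.group(1), counts))
    return rows


def totals_per_stage(rows):
    """Sum the parameter counts column by column."""
    return tuple(sum(counts[i] for _, counts in rows) for i in range(3))


rows = parse_listing(LISTING)
creation, initialization, execution = totals_per_stage(rows)
print(creation, initialization, execution, creation + initialization + execution)
# prints: 1092 44 67 1203
```

The grand total for this listing lands at roughly twelve hundred parameters, almost all of them in the creation stage.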
 


Bu but there's always AI! ...:(
In fact, gaming demand is down “a lot,” said AMD CFO Jean Hu, in the analyst call.

“If you look at gaming, demand has been quite weak. That’s well known. Also [they have] inventory issues. We guided down more than 30% in the first and second quarters, and the second half will be lower than the first half. That is how we are looking for the gaming business this year,” Hu said.

“Looking further ahead, AI represents an unprecedented opportunity for AMD. While there has been significant growth in AI infrastructure build outs, we are still in the very early stages of what we believe is going to be a period of sustained growth driven by an insatiable demand for both more specialized AI and high-performance general purpose compute,” Su said in an analyst call after the company reported its first quarter results for the period ended March 31.

Gaming Revenue down by 48%.. ouch.
 
Pretty dang good for what was supposed to be a really low Q1.
Gaming drop was expected but Embedded is just unfortunate since Q1 is typically the high water mark.
MI300 & Co. already hit over $1b in revenue ($400mil in Q4'23); they could hit up to ~$3b in DC GPU sales this year if demand holds.
 

AMD doesn't have leadership in either Gaming or AI, which is why neither of them actually saved the company from this weak quarter. Worse yet, they actually guided Gaming down way lower than that. They are not expecting the situation to improve this year at all. It could be why they cancelled high-end RDNA4.

Gaming revenue declined sequentially by 32.6 percent and 47.5 percent year-over-year to $922 million, due to lower demand for PC GPUs. AMD’s CFO, Jean Hu, said the company doesn’t expect the situation to improve this year.

“Based on the visibility we have, the first half […] we guided down sequentially more than 30 percent the first and second quarters, and the second half will be lower than the first half. That is how we are looking for the gaming business this year.”
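For anyone wanting to sanity-check those percentages, a quick back-of-the-envelope sketch: it only inverts the quoted declines (32.6 percent sequential, 47.5 percent year-over-year, down to $922 million) to get the implied earlier revenue. The helper name is made up, and the results are implied figures from the quote, not numbers taken from AMD's filings.

```python
def revenue_before_decline(current_musd, decline_pct):
    """Invert a percentage decline: current = previous * (1 - decline)."""
    return current_musd / (1.0 - decline_pct / 100.0)


current = 922.0  # Q1 gaming revenue in $M, from the quote above

prior_quarter = revenue_before_decline(current, 32.6)  # sequential decline
year_ago = revenue_before_decline(current, 47.5)       # year-over-year decline

print(f"implied prior quarter:    ~${prior_quarter:.0f}M")
print(f"implied year-ago quarter: ~${year_ago:.0f}M")
```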


 
On the Q1 results, it’s a little surprising that the drop in console revenue didn’t hurt operating margins in gaming. Console revenue should be basically free money at this point.
Agreed, I don't quite understand, does anyone know? How does AMD's semi-custom business work in terms of gross and operating margins (for both consoles and Samsung)? Given the amounts of money involved, I assume they legally buy the chip from TSMC and sell it back to MS so there are costs involved, rather than it being a pure IP deal (à la ARM/Imagination/Synopsys/etc.) that would result in ~100% gross margins? (side note: slightly surprised they expect gaming to be even lower in H2 despite PS5 Pro release, I guess royalties might not be more than licensing/R&D money they are already getting from Sony, or they expect Xbox and/or consumer GPUs to drop further?)

Irrespective of gross margins, in any technology business where you have very large fixed R&D costs, if your revenue drops by 50% then your operating margins should crash into negative territory.
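A toy illustration of that operating-leverage point, with completely made-up numbers (the revenue, gross margin and fixed cost below are hypothetical, not AMD's): operating margin is gross margin minus fixed costs divided by revenue, so halving revenue doubles the fixed-cost drag.

```python
def operating_margin(revenue, gross_margin, fixed_opex):
    """Operating margin = (gross profit - fixed operating costs) / revenue."""
    return (revenue * gross_margin - fixed_opex) / revenue


# Hypothetical segment: $1000M revenue, 45% gross margin, $300M fixed R&D/opex.
before = operating_margin(1000.0, 0.45, 300.0)

# Same fixed costs, revenue cut in half.
after = operating_margin(500.0, 0.45, 300.0)

print(f"before: {before:.0%}, after a 50% revenue drop: {after:.0%}")
```

With these assumed inputs the margin swings from positive to negative, which is why a stable margin on half the revenue suggests the cost base moved too.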

My best guess is they are (legally at least) moving R&D costs for GPU development between segments: previously a lot of that was in "Gaming" for consumer and console GPUs, and now those same GPU engineers are focused on data center products like MI400X and their operating costs go in the data center segment (legally and financially at least, in practice the division is unlikely to be so clear cut). I don't think that necessarily means they are significantly reducing R&D that will eventually impact gaming GPUs, e.g. shader processor improvements may be applicable to both segments but now costed as data center. And some of it may be temporary due to the smaller RDNA4 line-up and associated per-chip costs etc... I honestly wouldn't read too much into it R&D-wise, but it's the only explanation I can come up with for the weirdly stable gaming operating margins.

Curious if anyone else has any other explanations!
 
Could you please stop talking like a non-technical NV PR person?
Could you please stop discussing like a teenager and actually present counter arguments?

AFAICS DavidGraham's point is...on point. AMD don't have leadership in Gaming or AI. Indeed, they only have a small share of those markets. I don't feel evidence is needed to validate that as I think it's common knowledge. True or false? Make a counter-point and support it. Or challenge me, "where's the evidence AMD only has a small market share?" and I'll have to go dig it up.

One-liner assertions and ad hominems like this aren't constructive, friendly discussion. Please either engage in good faith or don't engage.
 
As you say, it's for sure not only an IP deal. Maybe there are some IP-related costs, but for most of the later cycle it's just AMD acting as a normal component supplier, like for a GPU/CPU in laptops. I remember that some years ago they changed their accounting for R&D costs that are revenue-relevant in the semi-custom business, from booking them at the moment of payment to booking them at the moment the costs are incurred. My impression was therefore that the customer pays for the R&D and the chip separately. The R&D costs are probably paid over a certain period, and after that it's just component supply.

Consoles are low-margin, but the costs on AMD's side are minuscule. Console revenue might've been 1.4 billion a year ago at a 20% margin, which means 280 million in operating income; now it's 700 million with 140 million operating income. The GPUs don't play a big role in AMD's gaming segment: maybe 400 million, reduced to 250 million now. But that's just pure guesswork. I only know the consoles are the much bigger part. Add some R&D shifting to Datacenter and the numbers make sense.
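Spelling out that guesswork as arithmetic (a sketch only: every input is the guess from the paragraph above, and treating the 400 and 250 million GPU figures as revenue is my assumption):

```python
def operating_income(revenue_musd, margin_pct):
    """Operating income in $M for a given revenue and percentage margin."""
    return revenue_musd * margin_pct / 100


# Guessed console revenue in $M and the assumed 20% margin.
console_year_ago, console_now, console_margin_pct = 1400, 700, 20

# Guessed GPU figures in $M, assumed here to be revenue.
gpu_year_ago, gpu_now = 400, 250

income_year_ago = operating_income(console_year_ago, console_margin_pct)
income_now = operating_income(console_now, console_margin_pct)

segment_year_ago = console_year_ago + gpu_year_ago
segment_now = console_now + gpu_now
decline = (segment_year_ago - segment_now) / segment_year_ago

print(f"console operating income: {income_year_ago:.0f}M -> {income_now:.0f}M")
print(f"segment revenue: {segment_year_ago}M -> {segment_now}M ({decline:.1%} decline)")
```

Under these guesses the combined segment decline comes out close to the reported year-over-year drop of about 47.5 percent, so the split is at least self-consistent.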

I'm not so optimistic that the R&D shift won't impact gaming. Architecture development which benefits both is a small part, but AMD needs to step up across the whole AI ecosystem, on both the hardware and software side. If you're thinking of 300 million dollars of GPU revenue vs 1.5 billion for datacenter GPUs by end of year, and more next year, it's clear AMD will focus their R&D on AI. I believe the real motivation for cancelling high-end RDNA4 was an R&D focus on AI. From a company perspective it makes no sense to allocate your limited R&D to consumer if you can gain 5x more in AI. Nvidia is in a different situation, because they have unlimited money at the moment, but AMD needs to manage their R&D very carefully because of the weakness of gaming and embedded.
 
My best guess is they are (legally at least) moving R&D costs for GPU development between segments: previously a lot of that was in "Gaming" for consumer and console GPUs, and now those same GPU engineers are focused on data center products like MI400X and their operating costs go in the data center segment
Nope, those are different teams under different BUs.
Agreed, I don't quite understand, does anyone know?
The answer is so beyond obvious it's silly: Radeon margins are nowhere near as dogshit as you thought they were.
RDNA3 did surprisingly well in the market given how mediocre the actual products are.
Do you have an alternate theory?
Yes, use some linkedin-fu and find out who's the Radeon silicon something senior something program manager and you'll understand everything.
 
Yes, use some linkedin-fu and find out who's the Radeon silicon something senior something program manager and you'll understand everything.

A simple no would suffice.

RDNA3 did surprisingly well in the market given how mediocre the actual products are.

Doesn’t explain why lower console revenue had no margin impact as that should be a low cost revenue stream.
 
A simple no would suffice.
It's not that hard.
Again, I'm asking you to do the lowest effort thing possible.
Hint: the guy ran Rome and Milan programs at AMD DC before.
Doesn’t explain why lower console revenue had no margin impact as that should be a low cost revenue stream.
It means Radeon ASPs went up the same quarter.
Again, an answer so blissfully trivial it always hides in plain sight.
it's clear AMD will focus their R&D on AI
No, they just focus on raw execution discipline.
Like the OG Krakens are also dead, those were RDNA4.5. Come on.
They were slipping and had to throttle cadence and kill a whole pile of would-be sidetrack parts.
 