AMD Execution Thread [2024]

The worst part by far about many of these "neural processing units" is that absolutely none of the shipping hardware vendors are interested in providing explicit programming support for them. There are no public compilers for the usual compiled high-level languages like C++ or domain-specific languages like HLSL, and they won't even let you write custom programs in assembly. I've never seen a class of hardware designs before that was so immediately and outright hostile towards software developers!

For all of its faults, at least Sony let developers program the Cell processor's SPUs directly, even though they featured much the same 'dataflow' architecture. From a programming perspective, all of these 'NPUs' are beneath even things like Arduino microcontrollers ...
 
So how do you target them?
 
AFAIK, NPUs are not really supposed to be directly programmable by the application developer, but rather by each vendor's video driver programmers, who adapt them to the specific needs of middleware libraries and runtimes.

For example, DirectML and WinML are designed to use Direct3D metacommands - proprietary, opaque implementations of standard reduced-precision inferencing algorithms in the user-mode video driver - which are intended to be consumed programmatically at runtime through the Direct3D metacommand APIs: ID3D12Device5::EnumerateMetaCommands() and ::EnumerateMetaCommandParameters() return the number and names/GUIDs of the available metacommands and the names and types of the input/output parameters for each one; ::CreateMetaCommand() creates an instance, which is then initialized and executed with ID3D12GraphicsCommandList4::InitializeMetaCommand() and ::ExecuteMetaCommand().
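
To make that first step concrete, here's a minimal C++ sketch (assuming you already have a valid ID3D12Device5 pointer; the function name is mine) that lists the metacommands the user-mode driver exposes:

Code:
#include <d3d12.h>
#include <vector>
#include <cstdio>

// List the metacommands exposed by the user-mode driver for this device.
void ListMetaCommands(ID3D12Device5* device)
{
    UINT count = 0;
    // First call with a null array just returns the number of metacommands.
    if (FAILED(device->EnumerateMetaCommands(&count, nullptr)) || count == 0)
        return;

    std::vector<D3D12_META_COMMAND_DESC> descs(count);
    if (FAILED(device->EnumerateMetaCommands(&count, descs.data())))
        return;

    for (const auto& d : descs)
        wprintf(L"%s\n", d.Name);   // e.g. "Conv", "GEMM", "MVN", ...
}

From there, ::CreateMetaCommand() takes the GUID from the returned D3D12_META_COMMAND_DESC plus a blob of creation parameters, and the command list records the initialization and execution calls.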

Recent generations of GPUs from Nvidia, AMD, and Intel have settled on a common set of Direct3D metacommands, with some minor variations (the bracketed numbers below are parameter counts for the three stages - creation, initialization and execution):

Nvidia GeForce GTX / RTX
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
MVN (Mean Variance Normalization) [67][1][6],
Pooling [56][3][4],
MHA (Multi-Head Attention) [299][13][16],
CopyTensor [3][1][31]

AMD RDNA2 / RDNA3
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
MVN (Mean Variance Normalization) [67][1][6],
MHA (Multi-Head Attention) [299][13][16]

Intel Arc / Xe
Code:
Metacommands [parameters per stage]:
Conv (Convolution) [84][1][6],
Conv (Convolution) [108][5][6],
GEMM (General matrix multiply) [67][1][6],
GEMM (General matrix multiply) [91][5][6],
MVN (Mean Variance Normalization) [91][5][6],
Pooling [56][3][4],
Pooling [44][1][4],
LSTM (Long Short-Term Memory) [252][10][13],
MHA (Multi-Head Attention) [299][13][16],

Current NPUs are built for the same DirectML / OpenVINO / ONNX workflow, and their Windows driver model is a compute-only subset of the WDDM KMD / Direct3D UMD stack, exposing only a limited 'core compute' capability.
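
If you want to see which adapters on a system expose that 'core compute' capability, one way might be to enumerate adapters through DXCore and filter on the D3D12 core-compute attribute. A rough sketch (assuming Windows 10 2004+ with the DXCore headers and dxcore.lib; the function and variable names are mine):

Code:
#include <dxcore.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

// Enumerate adapters that expose at least D3D12 "core compute"
// (this matches full GPUs as well as compute-only devices).
void ListCoreComputeAdapters()
{
    ComPtr<IDXCoreAdapterFactory> factory;
    if (FAILED(DXCoreCreateAdapterFactory(IID_PPV_ARGS(&factory))))
        return;

    const GUID attributes[1] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> list;
    if (FAILED(factory->CreateAdapterList(1, attributes, IID_PPV_ARGS(&list))))
        return;

    for (uint32_t i = 0; i < list->GetAdapterCount(); ++i)
    {
        ComPtr<IDXCoreAdapter> adapter;
        if (FAILED(list->GetAdapter(i, IID_PPV_ARGS(&adapter))))
            continue;

        // DriverDescription is a null-terminated char string.
        char description[256] = {};
        adapter->GetProperty(DXCoreAdapterProperty::DriverDescription,
                             sizeof(description), description);
        printf("%u: %s\n", i, description);
    }
}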

PS. If you want to check the names and types of the parameters for each metacommand supported by your GPU, you can use my command-line feature reporting tool; its verbose mode will print a dozen screens of C-style definitions for all twelve hundred of these parameters.
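
For reference, the per-metacommand query looks roughly like this (a sketch, not the tool's actual code; it assumes a valid ID3D12Device5 and a metacommand GUID obtained from the enumeration above, and only dumps the execution stage):

Code:
#include <d3d12.h>
#include <vector>
#include <cstdio>

// Dump the execution-stage parameters of one metacommand; the same query
// can be repeated for the creation and initialization stages.
void DumpExecutionParameters(ID3D12Device5* device, REFGUID commandId)
{
    UINT structSize = 0, paramCount = 0;
    // First call with a null array returns the parameter count
    // and the total size of the packed parameter structure.
    if (FAILED(device->EnumerateMetaCommandParameters(
            commandId, D3D12_META_COMMAND_PARAMETER_STAGE_EXECUTION,
            &structSize, &paramCount, nullptr)))
        return;

    std::vector<D3D12_META_COMMAND_PARAMETER_DESC> params(paramCount);
    if (FAILED(device->EnumerateMetaCommandParameters(
            commandId, D3D12_META_COMMAND_PARAMETER_STAGE_EXECUTION,
            &structSize, &paramCount, params.data())))
        return;

    for (const auto& p : params)
        wprintf(L"%s (type %d, offset %u)\n", p.Name, p.Type, p.StructureOffset);
}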
 