Yeah, there's a difference between supporting a data type and the range of instructions available to work with it. Looking at the RDNA 2 Shader ISA document (of which I claim to understand very little!!) you can see this near the top:
https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf
"Feature Changes in RDNA2 Devices
Dot product ALU operations added accelerate inferencing and deep-learning:
◦V_DOT2_F32_F16 / V_DOT2C_F32_F16
◦V_DOT2_I32_I16 / V_DOT2_U32_U16
◦V_DOT4_I32_I8 / V_DOT4C_I32_I8
◦V_DOT4_U32_U8
◦V_DOT8_I32_I4
◦V_DOT8_U32_U4"
As you can see, these are additions since RDNA1, specifically to "accelerate inferencing and deep-learning".
Perhaps these are the specific, ML-focused changes that MS requested (or there's some overlap). They're possibly what PS5 is lacking.
Could it be that the RDNA 1.0 ISA doc doesn't cover everything in 1.1, or whatever Navi 14 was? To my limited understanding, this passage from the RDNA Whitepaper suggests RDNA1 should already have at least some DOT2, DOT4 and DOT8 operations, yet they aren't present in that ISA doc:
"Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows."
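To make that whitepaper passage concrete, here's a rough Python sketch of what those dot instructions compute (my own illustration, not from AMD's docs; the real instructions operate on packed registers, and I've kept the function names loosely modeled on the ISA mnemonics):

```python
# Hedged sketch: the arithmetic behind the RDNA2-style dot instructions.
# Real hardware packs the narrow operands into 32-bit registers; here the
# inputs are just Python lists of the unpacked values.

def v_dot2_f32_f16(a, b, acc):
    # dot2: two half-precision multiplies, summed into an f32 accumulator.
    return acc + a[0] * b[0] + a[1] * b[1]

def v_dot4_i32_i8(a, b, acc):
    # dot4: four signed 8-bit products, summed into a 32-bit accumulator
    # (wide enough that the sum can't overflow).
    return acc + sum(x * y for x, y in zip(a, b))

def v_dot8_i32_i4(a, b, acc):
    # dot8: eight signed 4-bit products, summed into a 32-bit accumulator.
    return acc + sum(x * y for x, y in zip(a, b))

print(v_dot4_i32_i8([1, 2, 3, 4], [5, 6, 7, 8], 0))  # 5+12+21+32 = 70
```

The point being that one instruction does the whole multiply-and-accumulate chain, which is why they help inference throughput so much.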