PlayStation 5 [PS5] [Release November 12, 2020]

Yeah, there's a difference between supporting a data type and the range of instructions you have available to use with it. Looking at the RDNA 2 Shader ISA document (of which I claim to understand very little!!) you can see this near the top:

https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf

"Feature Changes in RDNA2 Devices

Dot product ALU operations added accelerate inferencing and deep-learning:
◦ V_DOT2_F32_F16 / V_DOT2C_F32_F16
◦ V_DOT2_I32_I16 / V_DOT2_U32_U16
◦ V_DOT4_I32_I8 / V_DOT4C_I32_I8
◦ V_DOT4_U32_U8
◦ V_DOT8_I32_I4
◦ V_DOT8_U32_U4"

As you can see, these are additions since RDNA1, specifically to "accelerate inferencing and deep-learning".

Perhaps these are the specific, ML-focused changes that MS requested (or there's some overlap). They're possibly what PS5 is lacking.

Could it be that the RDNA 1.0 ISA doesn't include everything for 1.1, or whatever Navi 14 was? At least to my limited understanding, this (from the RDNA whitepaper) suggests it should have at least some DOT2, DOT4 and DOT8 operations, which aren't present in the ISA doc.
Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows.
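
To make that concrete, here's a minimal C++ sketch of roughly what a dot2 and a dot4 compute, as I read those descriptions (illustrative semantics only, obviously not how the hardware implements them):

```cpp
#include <cstdint>

// V_DOT2_F32_F16-style op: two half-precision multiplies accumulated into
// an FP32 register. Plain float stands in for FP16 here, since standard
// C++ has no built-in half type.
float dot2_f32_f16(float a0, float a1, float b0, float b1, float acc) {
    return acc + a0 * b0 + a1 * b1;
}

// V_DOT4_I32_I8-style op: four INT8 multiplies accumulated into an INT32
// register, wide enough that the sum can't overflow the narrow inputs.
int32_t dot4_i32_i8(const int8_t a[4], const int8_t b[4], int32_t acc) {
    for (int i = 0; i < 4; ++i)
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}

// V_DOT8_I32_I4 is the same idea with eight 4-bit values per operand.
```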
 
Could it be that the RDNA 1.0 ISA doesn't include everything for 1.1, or whatever Navi 14 was? At least to my limited understanding, this (from the RDNA whitepaper) suggests it should have at least some DOT2, DOT4 and DOT8 operations, which aren't present in the ISA doc.
Could it simply be that MS added additional units that were only INT-capable, while RDNA 1 reused the FP units, so that doing INT would limit how many FP operations you could do? Kinda like how on an FP16 unit you could do two FP8 calculations?
 
Could it simply be that MS added additional units that were only INT-capable, while RDNA 1 reused the FP units, so that doing INT would limit how many FP operations you could do? Kinda like how on an FP16 unit you could do two FP8 calculations?
Don't think so; MS's presentation at Hot Chips showed the exact same CU layout we've become accustomed to, with no mention of any extra units or anything of the sort. RDNA 1 and 2 both use the FP units for INT.
 
Could it be that the RDNA 1.0 ISA doesn't include everything for 1.1, or whatever Navi 14 was? At least to my limited understanding, this (from the RDNA whitepaper) suggests it should have at least some DOT2, DOT4 and DOT8 operations, which aren't present in the ISA doc.

Yeah, I remembered that and so I went looking for differences in the programmer guides to see what was actually there. The RDNA 1.0 Shader ISA doc is dated 25th September 2020, so quite a bit later than the RDNA (1) white paper (August 2019) and only 2 months before the RDNA 2 shader ISA doc was published. I can't find any intermediate RDNA ISA.*

Perhaps "Additional Multi-Precision ops for some "Navi" variants" in the RDNA white-paper was referencing a semi-custom part that would later be marketed as RDNA2. XSX was probably already at initial tape-out at that point.

I suppose, really, PS5, XSX and Navi 21 are all Navi variants... ¯\_(ツ)_/¯

*Though I guess PS5 is a mix of 1 + 2 + custom??
 
Yeah, I remembered that and so I went looking for differences in the programmer guides to see what was actually there. The RDNA 1.0 Shader ISA doc is dated 25th September 2020, so quite a bit later than the RDNA (1) white paper (August 2019) and only 2 months before the RDNA 2 shader ISA doc was published. I can't find any intermediate RDNA ISA.*

Perhaps "Additional Multi-Precision ops for some "Navi" variants" in the RDNA white-paper was referencing a semi-custom part that would later be marketed as RDNA2. XSX was probably already at initial tape-out at that point.
Same or similar ops are also present for Vega20 (but not Vega1x), so I doubt that. I also have a distinct recollection* of reading confirmation of 4:1 INT8 and 8:1 INT4 for Navi14 somewhere.
*I have aphantasia; my memory is more "knowledge of things that happened" than what I assume are "regular memories", so pinning a recollection down to a specific source is hard.
 
Yeah, I remembered that and so I went looking for differences in the programmer guides to see what was actually there. The RDNA 1.0 Shader ISA doc is dated 25th September 2020, so quite a bit later than the RDNA (1) white paper (August 2019) and only 2 months before the RDNA 2 shader ISA doc was published. I can't find any intermediate RDNA ISA.*

Perhaps "Additional Multi-Precision ops for some "Navi" variants" in the RDNA white-paper was referencing a semi-custom part that would later be marketed as RDNA2. XSX was probably already at initial tape-out at that point.

I suppose, really, PS5, XSX and Navi 21 are all Navi variants... ¯\_(ツ)_/¯

*Though I guess PS5 is a mix of 1 + 2 + custom??
I had always assumed that "some variants" was likely an indirect reference to the AMD Instinct MI cards at the time of writing. From their CDNA white paper:

The AMD CDNA architecture builds on GCN’s foundation of scalars and vectors and adds matrices as a first class citizen while simultaneously adding support for new numerical formats for machine learning and preserving backwards compatibility for any software written for the GCN architecture. These Matrix Core Engines add a new family of wavefront-level instructions, the Matrix Fused Multiply-Add or MFMA. The MFMA family performs mixed-precision arithmetic and operates on KxN matrices using four different types of input data: 8-bit integers (INT8), 16-bit half-precision FP (FP16), 16-bit brain FP (bf16), and 32-bit single-precision (FP32). All MFMA instructions produce either 32-bit integer (INT32) or FP32 output, which reduces the likelihood of overflowing during the final accumulation stages of a matrix multiplication.

While it doesn't explicitly say RDNA CU, it's likely a reference to customized variants being able to take over some of these MI Instinct features.
With ML now at the forefront of gaming, they likely brought it over for RDNA 2 as well.

I suppose it is possible that AMD felt okay to speak about semi-custom units in that white paper as well, but the majority assumption is that they wouldn't mention semi-custom variants in a white paper.
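
For what it's worth, the MFMA family described there boils down to a matrix multiply-accumulate with narrow inputs and wide accumulators. A rough C++ sketch of the idea (the function name and free dimensions are mine, not AMD's; real MFMA instructions work on fixed hardware tile sizes):

```cpp
#include <cstdint>
#include <vector>

// MFMA-style mixed-precision matrix multiply-accumulate: INT8 inputs with
// INT32 accumulation, so the final sums don't overflow, per the white paper.
void mfma_i32_i8(const std::vector<int8_t>& A,  // M x K, row-major
                 const std::vector<int8_t>& B,  // K x N, row-major
                 std::vector<int32_t>& D,       // M x N accumulator, in/out
                 int M, int N, int K) {
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            int32_t acc = D[m * N + n];
            for (int k = 0; k < K; ++k)
                acc += int32_t(A[m * K + k]) * int32_t(B[k * N + n]);
            D[m * N + n] = acc;
        }
}
```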
 
All of them have INT8/4 support... even if they work by promoting the variables to FP16.
AFAIR from Locuza's and the Reddit analyses of the open-source drivers, all the RDNA GPUs have rapid packed math on INT8 and INT4, except for the very first, Navi 10.
That's Apple's exclusive Navi 12, then Navi 14, Navi 21, Navi 22 and Navi 23.
With both Series SoCs having that too, why would Sony very specifically ask for that feature to be excluded?
They had the very first AMD GPU with FP16 RPM, but now asked to leave out this "RDNA 1.1" functionality? In 2016-2017, with deep neural networks blowing up everywhere, Sony somehow thought ML wouldn't be a big thing?

I get that we don't have any statement from Sony claiming their GPU does higher-rate INT8 and INT4 (public information on the PS5's SoC is pretty scarce compared to the competition anyway), but assuming it doesn't have them seems odd to me.

I think some other people can answer the first half of your post better than myself, so I'd just like to focus on the second half of it. We've come to accept that both platform holders have made some customizations to their GPUs, and there's proof of this. But I don't see any particular motive Sony would have for extended hardware support at the silicon level for INT8/INT4 instructions, because they aren't involved in deep neural network industries to the same degree as companies like Microsoft. They don't have that type of vested interest in such fields, and there are other technologies they seemingly felt were more pertinent to premium performance benefits in a gaming context, which they specifically highlighted at their own event. Machine learning wasn't one of them.

They may've had FP16 support at the hardware level in PS4 Pro, but that doesn't guarantee they will continue to support all such extended lower-precision math, first or at all, just because they did it once before. Couple that with what a handful of other developers have already suggested and it would seem that there's some extended silicon within the Series systems for INT8/INT4 lower-precision math that likely isn't present on PS5. You can simulate INT8/INT4 on FP16 of course; there's probably some type of performance penalty but I dunno how much that would be, and I think in PS5's case the penalty would be lessened thanks to the faster GPU clock.
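
To illustrate the simulation point, something like the sketch below is all "INT8 on FP16" needs to mean. The inputs (-128..127) are exactly representable in half precision, and with a 32-bit accumulator, as here, the math stays exact; what you give up is the packed four-multiplies-per-instruction throughput of a native dot4 op (a hypothetical sketch, not any platform's actual codegen):

```cpp
#include <cstdint>

// Emulating an INT8 dot product with float multiplies: four separate
// multiply-adds instead of one packed V_DOT4-style instruction. Same
// answer, just no throughput win over plain FP16/FP32 math.
float dot4_int8_via_float(const int8_t a[4], const int8_t b[4], float acc) {
    for (int i = 0; i < 4; ++i)
        acc += float(a[i]) * float(b[i]);
    return acc;
}
```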

Yeah, there's a difference between supporting a data type and the range of instructions you have available to use with it. Looking at the RDNA 2 Shader ISA document (of which I claim to understand very little!!) you can see this near the top:

https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf

"Feature Changes in RDNA2 Devices

Dot product ALU operations added accelerate inferencing and deep-learning:
◦ V_DOT2_F32_F16 / V_DOT2C_F32_F16
◦ V_DOT2_I32_I16 / V_DOT2_U32_U16
◦ V_DOT4_I32_I8 / V_DOT4C_I32_I8
◦ V_DOT4_U32_U8
◦ V_DOT8_I32_I4
◦ V_DOT8_U32_U4"

As you can see, these are additions since RDNA1, specifically to "accelerate inferencing and deep-learning".

Perhaps these are the specific, ML-focused changes that MS requested (or there's some overlap). They're possibly what PS5 is lacking.

Yeah, this is what I think a few folks have been considering for a while. Microsoft would have more of a need for that type of lower-precision math work because Series X APUs will also be leveraged in Azure cloud servers for raw compute tasks (as well as xCloud streaming). It makes more sense in that case for them to add some silicon into the design to support it at the hardware level.

Sony doesn't have the same vested interests; I'd argue FP16 is more than enough for the image reconstruction techniques they are using, good enough for AMD's FidelityFX Super Resolution, foveated rendering techniques, etc. And even if there's a penalty in running INT8/INT4 calculations through FP16, I don't imagine that penalty would be too big (partly thanks to the PS5's faster GPU clocks and *maybe* its use of cache scrubbers... just taking some guesses at this point).

Also, it looks like the stuff linked in @iroboto's post supports a lot of what's being discussed here.
 
Same or similar ops are also present for Vega20 (but not Vega1x), so I doubt that. I also have a distinct recollection* of reading confirmation of 4:1 INT8 and 8:1 INT4 for Navi14 somewhere.
*I have aphantasia; my memory is more "knowledge of things that happened" than what I assume are "regular memories", so pinning a recollection down to a specific source is hard.

Yeah, the Vega 7nm ISA added dot product instructions, probably because Vega 20 was aimed primarily at business-oriented compute, including ML/inferencing and all that. I think they're the same as the ones added to RDNA 2.0, but I haven't checked thoroughly. No sign of "dot" anywhere in the RDNA 1.0 ISA.

I found a Hardware Times page about RDNA 2 that comments on an "RDNA 1.1" and speculates that RDNA 1.1 is for consoles. It doesn't say anything about Navi 14 being RDNA 1.1, though.

https://www.hardwaretimes.com/amd-r...rchitectural-deep-dive-a-focus-on-efficiency/

I suppose it being for a console might fit with there being no publicly available "RDNA 1.1" ISA that includes the dot product ops. So perhaps that's XSX, or, following on from what @iroboto says above, it could be for some other unspecified part.

BTW I don't think it actually matters if XSX is some "RDNA 1.1" GPU with all the RDNA2 features, and PS5 is some other RDNA 1.x GPU with some RDNA 2 features and whatever custom stuff they wanted. The numbers would only be interesting because of how they might hint at development timelines and priorities. They're both proving to be very solid and competitive machines!

I had always assumed that "some variants" was likely an indirect reference to the AMD Instinct MI cards at the time of writing. From their CDNA white paper:



While it doesn't explicitly say RDNA CU, it's likely a reference to customized variants being able to take over some of these MI Instinct features.
With ML now at the forefront of gaming, they likely brought it over for RDNA 2 as well.

I suppose it is possible that AMD felt okay to speak about semi-custom units in that white paper as well, but the majority assumption is that they wouldn't mention semi-custom variants in a white paper.

They could be talking about a completely different product in their white paper, yeah, or even just the range of things they can offer. It does kind of look like, a few years back, MS might have looked at the Vega 7nm roadmap and said "we'll have that mixed-precision dot product stuff in our RDNA GPU, thanks."

It would be fair enough for AMD to mention that they can do this I reckon!
 
You can simulate INT8/INT4 on FP16 of course; there's probably some type of performance penalty but I dunno how much that would be, and I think in PS5's case the penalty would be lessened thanks to the faster GPU clock.
There shouldn't be any performance penalty to running lower-precision computation. There just isn't any gain.
And if you support mixed-precision math, you will also gain.

Neither is required to run the model; it's just more performance to support native INT4/INT8 packed math, as well as mixed-precision dot product support.
 
There shouldn't be any performance penalty to running lower-precision computation. There just isn't any gain.
And if you support mixed-precision math, you will also gain.

Neither is required to run the model; it's just more performance to support native INT4/INT8 packed math, as well as mixed-precision dot product support.

Ah okay, that's cool. I did some brief reading on it a little while back and must've conflated some of the benefits I read about with penalties for hardware doing it a different way.

Specifically, data processed at lower precision consumes less bandwidth and has a smaller memory footprint than higher-precision calculations, so it effectively frees up a lot more bandwidth for more data. The trade-off, of course, is that not all data can effectively be calculated at such low precision, and it probably takes a lot of testing and time to figure out which data is best suited to lower-precision calculation.
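
A trivial C++ illustration of the footprint side of that (sizes purely illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// The same 1M-element tensor at different precisions: lower precision
// means proportionally less memory to store it and less bandwidth to
// fetch it each frame.
int main() {
    const std::size_t n = 1000000;
    std::printf("FP32: %zu bytes\n", n * sizeof(float));        // 4,000,000
    std::printf("FP16: %zu bytes\n", n * 2);                    // 2,000,000
    std::printf("INT8: %zu bytes\n", n * sizeof(std::int8_t));  // 1,000,000
    std::printf("INT4: %zu bytes\n", n / 2);  // 500,000, packed two per byte
}
```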

I'm extremely interested to see the benefits of this in real-time for actual games on the consoles going forward, hopefully by mid-2022 we can start seeing the first batch of games beginning to leverage this type of lower-precision ML.
 
Couple that with what a handful of other developers have already suggested and it would seem that there's some extended silicon within the Series systems for INT8/INT4 lower-precision math that likely isn't present on PS5.
You just won't do lower-precision, less accurate INT8/INT4 calculations if you don't get the benefit of higher performance.
 
You just won't do lower-precision, less accurate INT8/INT4 calculations if you don't get the benefit of higher performance.
You would, to avoid retraining the model. It really depends on how much the developers (whether AMD/MS/Sony or the individual studios) want to support. Lower precision is proving very successful at preserving accuracy; you're talking a < 1% differential.
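
As a sketch of why retraining can often be skipped: here's a minimal symmetric post-training quantization example in C++ (the scheme and toy numbers are mine, purely for illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Symmetric post-training quantization of already-trained FP32 weights to
// INT8: no retraining, just one per-tensor scale, dequantized on use.
int main() {
    std::vector<float> w = {0.73f, -0.12f, 0.05f, -0.98f, 0.41f};  // toy weights

    float max_abs = 0.0f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    const float scale = max_abs / 127.0f;  // map [-max_abs, max_abs] -> [-127, 127]

    for (float x : w) {
        const int8_t q = static_cast<int8_t>(std::lround(x / scale));  // quantize
        const float back = q * scale;                                  // dequantize
        std::printf("%+.4f -> %4d -> %+.4f (abs err %.4f)\n",
                    x, int(q), back, std::fabs(back - x));
    }
}
```

With well-behaved weight distributions the round-trip error stays small, which is the "< 1% differential" territory; whether a given model tolerates it still needs testing.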
 
You would, to avoid retraining the model. It really depends on how much the developers (whether AMD/MS/Sony or the individual studios) want to support. Lower precision is proving very successful at preserving accuracy; you're talking a < 1% differential.
OK, good point. It's still possible FP16 is used in AMD's FidelityFX Super Resolution and not INT8.
 
OK, good point. It's still possible FP16 is used in AMD's FidelityFX Super Resolution and not INT8.
PS5 will be alright. Just to clarify my stance here:

If you thought talking about performance expectations around a GPU was already fluffy, ML performance is all over the place in terms of consistency. Your model matters, your hardware matters, your choice of filters matters, and your training set absolutely matters in both quality and quantity. If you manage to produce a model that fits within your required criteria, that's already a huge win. Then it gets into crazy optimization where you're saving < 1% here and there all the time.
AMD particularly positions their ML hardware around 64-bit performance because that matters to their clients.

So consider industries like Tesla's, or the medical or marketing industries: fast is great, but at the end of the day they want to claw back as much accuracy as they can. For example, say Tesla gets 99.98499822% of million-mile stretches with no deaths on autopilot using an FP16 network, and moving to an FP64 network gets them to 99.998313%. It may not seem like a lot, but if your customers are driving 1 billion miles a year, suddenly this is going to matter, because that 0.01331478% * 1000 becomes 13.3 more people dead per year because you didn't use FP64.
The same thing goes for detecting cancer: when the numbers are that large, those minor sub-1% differences can lead to a ton of misdiagnoses.
So 64-bit and higher accuracy do in fact matter, and I should be clear that I would always want higher accuracy when it concerns my life. Some marketing companies will care too, if they're spending hundreds of millions on web page conversions, for instance.

But in the case of graphics, since the risk of getting it wrong is relatively low to nil (considering how much margin of error players already accept with TXAA, and how relatively happy they are with resolution reductions to gain frame rate), I think it's acceptable to trade off accuracy for higher performance.
 
PS5 will be alright. Just to clarify my stance here:

If you thought talking about performance expectations around a GPU was already fluffy, ML performance is all over the place in terms of consistency. Your model matters, your hardware matters, your choice of filters matters, and your training set absolutely matters in both quality and quantity. If you manage to produce a model that fits within your required criteria, that's already a huge win. Then it gets into crazy optimization where you're saving < 1% here and there all the time.
AMD particularly positions their ML hardware around 64-bit performance because that matters to their clients.

So consider industries like Tesla's, or the medical or marketing industries: fast is great, but at the end of the day they want to claw back as much accuracy as they can. For example, say Tesla gets 99.98499822% of million-mile stretches with no deaths on autopilot using an FP16 network, and moving to an FP64 network gets them to 99.998313%. It may not seem like a lot, but if your customers are driving 1 billion miles a year, suddenly this is going to matter, because that 0.01331478% * 1000 becomes 13.3 more people dead per year because you didn't use FP64.
The same thing goes for detecting cancer: when the numbers are that large, those minor sub-1% differences can lead to a ton of misdiagnoses.
So 64-bit and higher accuracy do in fact matter, and I should be clear that I would always want higher accuracy when it concerns my life. Some marketing companies will care too, if they're spending hundreds of millions on web page conversions, for instance.

But in the case of graphics, since the risk of getting it wrong is relatively low to nil (considering how much margin of error players already accept with TXAA, and how relatively happy they are with resolution reductions to gain frame rate), I think it's acceptable to trade off accuracy for higher performance.
I don't disagree, but we don't know how good AMD's algorithms are, or whether they've managed to get good results with INT8 models, etc. That's why I wrote it's possible FidelityFX Super Resolution doesn't even use INT8.
 
I don't disagree, but we don't know how good AMD's algorithms are, or whether they've managed to get good results with INT8 models, etc. That's why I wrote it's possible FidelityFX Super Resolution doesn't even use INT8.
I think there's a pretty good possibility that AMD has a solution that uses INT8, which they deploy on their desktop parts that support it; that it would also work on FP16-only parts, just more slowly; and that regardless of all this, Sony rolls their own solution that ends up being good as well. It would be short-sighted for AMD not to use INT8, since their newer parts support it and performance is better with it, and in the desktop graphics space they need to be performance-competitive with Nvidia. An FP16-only solution would be just as compatible as an INT8 one, but wouldn't have the performance benefits of INT8 on newer hardware.
 
Bloodborne has too much potential for a future remaster release.

Same as with Demon's Souls, it was the #1 requested game for re-release for years.
 