AMD FidelityFX on Consoles

I now believe AMD's semi-custom work is a lot more detailed than I first thought.
I always thought it was mainly the big blocks: media blocks, ROPs, front end, cache sizes, then specific customer customizations.

Now I think it comes down to individual features as well:
VRS, INT4/INT8, etc.

So it's not about removing features from RDNA2; it's about paying to have them included from a list of available features.

So then you have to consider Sony's and MS's possibly different reasoning.

Example: reduced precision.
Sony
  • What use will it have in games?
  • Cost of inclusion
  • Spend that money on other features (cache scrubbers etc.)
  • Can be done without reduced precision for the odd situation where it would be nice to have

MS
  • Hardly any die footprint
  • Light Azure ML workloads when not running xCloud

Back when these were being designed, DLSS wasn't a thing, etc.

Much like with Nvidia, it's now up to MS to make use of features that may be more unique to their hardware.
ML texture de/compression and ML upscaling, for example.
These are all things that can be done on all platforms but may have a decent performance benefit on Xbox. And they can be shared across internal studios and also be included in PlayFab. Same for AVX libraries.

Maybe AMD sells features like BMW: your hardware comes with all the bells and whistles, but AMD won't expose them through software unless you pay for them (you can actually buy features through your BMW dash, WTF). LOL.

It seems a bit much for AMD to go through the process of redesigning hardware blocks just to remove or add features, especially ones that cost very little in terms of silicon. Just disable them in the BIOS.

Also, MS was talking about super resolution using DirectML back in spring 2018, so it's not new to them.
https://devblogs.microsoft.com/directx/gaming-with-windows-ml/
 
RDNA 1 has dot product hardware? How do you use it? There is no mention of it in the RDNA 1 ISA from what I can tell, and the RDNA 2 ISA lists added dot product instructions as a notable feature change for RDNA 2.


In June 2019 (4 months before the release of RX5500 / Navi 14), these were added for the Navi architecture:

{"v_dot2_f32_f16", GCNENC_VOP3P, GCN_STDMODE, 19, ARCH_NAVI_DL },
{"v_dot2_i32_i16", GCNENC_VOP3P, GCN_STDMODE, 20, ARCH_NAVI_DL },
{"v_dot2_u32_u16", GCNENC_VOP3P, GCN_STDMODE, 21, ARCH_NAVI_DL },
{ "v_dot4_i32_i8", GCNENC_VOP3P, GCN_STDMODE, 22, ARCH_NAVI_DL },
{ "v_dot4_u32_u8", GCNENC_VOP3P, GCN_STDMODE, 23, ARCH_NAVI_DL },
{ "v_dot8_i32_i4", GCNENC_VOP3P, GCN_STDMODE, 24, ARCH_NAVI_DL },
{ "v_dot8_u32_u4", GCNENC_VOP3P, GCN_STDMODE, 25, ARCH_NAVI_DL



GFX1011 is Navi 12 (Apple's Radeon Pro 5600M) and GFX1012 is Navi 14 (RX/Pro 5500/M, RX/Pro 5300/M). In LLVM they're both listed as supporting dot4 INT8 and dot8 INT4 instructions with a 32-bit accumulator:
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1011.html

So far, Navi 10 is the only Navi GPU without support for higher throughput for INT8/INT4.
In fact, save for the 7nm APUs using older Vega iGPUs, Navi 10 is the only GPU chip released by AMD since 2019 that does not support dot4 INT8 / dot8 INT4.
Even Vega 20 already had support for this.






If I had to guess, there were never any plans to release any RDNA GPU without INT8/4 RPM, as it seems to be intrinsic to the base architecture. It was just broken on the latest Navi 10 tapeouts, and AMD chose not to delay the RX5700 release any longer (with rumors of Navi 10 being a headache for AMD giving some credence to this).

The "it's there as an option" theory would make sense if these were only available for e.g. Navi 12 on Apple computers, with that company pushing for better ML inference performance. But it makes no sense for Navi 14 - which is a medium-low range discrete GPU for gaming - to have dot4-INT8 + dot8-INT4 capabilities, as it's definitely not an important feature for its target market.


It seems a bit much for AMD to go through the process of redesigning hardware blocks just to remove or add features, especially ones that cost very little in terms of silicon. Just disable them in the BIOS.
Agreed.
Which is again another reason why it would be very odd if AMD had decided to redesign the RDNA1 CUs for Navi 12 + Navi 14 to have the hardware for faster ML inference, then take it off RDNA2 very specifically for the PS5, then put it on again for the Microsoft RDNA2 consoles, and leave it there again for all RDNA2 discrete GPUs.
 
Is there an ETA on this tech's implementation?
FidelityFX is just the latest name for AMD's set of optimization libraries; it's been around at least since mid-2019 and has been used in dozens of games already.

If you're talking about Super Resolution / FSR, then there's no date, but Scott Herkelman said they're planning to "release it this year".
The problem here is we don't know if FSR "will be shown this year" or if the first title supporting FSR is releasing this year. They could be two very different things.
 
Maybe AMD sells features like BMW: your hardware comes with all the bells and whistles, but AMD won't expose them through software unless you pay for them (you can actually buy features through your BMW dash, WTF). LOL.

It seems a bit much for AMD to go through the process of redesigning hardware blocks just to remove or add features, especially ones that cost very little in terms of silicon. Just disable them in the BIOS.
It doesn't need to be a redesign.
The opcodes at the hardware level just don't need to be implemented/exposed. That's much lower level than drivers, and it's done in hardware.

Also, MS was talking about super resolution using DirectML back in spring 2018, so it's not new to them.
https://devblogs.microsoft.com/directx/gaming-with-windows-ml/
This is about the benefits of ML.
The part about ML upscaling is specifically talking about Nvidia.

The only thing MS has demonstrated is that Nvidia's model could be moved to DirectML, and that it will use tensor cores if available and shaders etc. if not.
MS has been very involved in ML and AI though, which is one of the reasons I personally think they are investing in ML upscaling.

The trouble is that the demo MS showed is misunderstood by many people.
It's demoing DirectML; it could have used anything, ducks bobbing in a bath.

The part that they did was the easiest 5% of what's required. The models are where all the work is, and those were provided by Nvidia.
 
But it makes no sense for Navi 14 - which is a medium-low range discrete GPU for gaming - to have dot4-INT8 + dot8-INT4 capabilities, as it's definitely not an important feature for its target market.
This has been the strongest reason presented (for me) to believe it's just on every GPU.

But for semi-custom parts, if you don't think you need it and there's a cost, you may not take it.
I think I misrepresented my thoughts before, making it possibly sound like I was suggesting totally different hardware.

AMD must have many different features so they can sell custom parts at different price points.
Even if it just means not exposing them at the hardware level.

I'm also not saying Sony doesn't have it, just that I could see why they wouldn't.
 
This has been the strongest reason presented (for me) to believe it's just on every GPU.

But for semi-custom parts, if you don't think you need it and there's a cost, you may not take it.
I think I misrepresented my thoughts before, making it possibly sound like I was suggesting totally different hardware.

AMD must have many different features so they can sell custom parts at different price points.
Even if it just means not exposing them at the hardware level.

I'm also not saying Sony doesn't have it, just that I could see why they wouldn't.
The ALUs themselves are hardware blocks, as are the components in the CUs.
So I would disagree that it’s in every Navi.

Semi-custom is often just a selection of various component blocks: in this case, choosing mixed-precision dot products means selecting a different type of ALU block that supports them. It will run a higher silicon cost to support multiple pathways.

I think the microcode/firmware would be enabled if it's on every Navi. No point in holding it back. This isn't a scenario where Nvidia is purposely gimping FP16 so we are forced to buy Titans for work.

AMD's niche for ML is double-precision throughput, which is found on their CDNA lines.
 
The ALUs themselves are hardware blocks, as are the components in the CUs.
So I would disagree that it’s in every Navi.

Semi-custom is often just a selection of various component blocks: in this case, choosing mixed-precision dot products means selecting a different type of ALU block that supports them. It will run a higher silicon cost to support multiple pathways.
I'm unsure why they couldn't just not expose the lower precision functions though.
We've seen them disable features at the hardware level many times for many different reasons.

Can't say I disagree with your or @ToTTenTranz's reasoning. Just that either way I could see it not being exposed.
 
I'm unsure why they couldn't just not expose the lower precision functions though.
We've seen them disable features at the hardware level many times for many different reasons.

Can't say I disagree with your or @ToTTenTranz's reasoning. Just that either way I could see it not being exposed.
I’m pretty confident it’s a hardware pathway that requires silicon and a redesign of the ALU blocks to support it.
 
FidelityFX is just the latest name for AMD's set of optimization libraries; it's been around at least since mid-2019 and has been used in dozens of games already.

If you're talking about Super Resolution / FSR, then there's no date, but Scott Herkelman said they're planning to "release it this year".
The problem here is we don't know if FSR "will be shown this year" or if the first title supporting FSR is releasing this year. They could be two very different things.
Yeah, I should have written Super Resolution.
AMD has a page on FidelityFX "supported games", including upcoming games. But none of the ones listed include FSR as a planned feature.
 
The ALUs themselves are hardware blocks, as are the components in the CUs.
So I would disagree that it’s in every Navi.

Why does Navi 14 have dot4 INT8 / dot8 INT4, then?
Why would AMD decide to include an ML inference feature on a low/mid-end GPU focused on gaming?
If this isn't an inherent capability of the RDNA WGPs, why include it in a low-margin / low-cost chip where die area is critical to achieving profitability?


Out of all the RDNA GPUs that have launched so far and have their specifications publicly available (5 RDNA1/2 dGPUs + 2 Series SoCs), how many have this capability absent? There's Navi 10, which was rumored to be very problematic for AMD, and...?


Yeah, I should have written Super Resolution.
AMD has a page on FidelityFX "supported games", including upcoming games. But none of the ones listed include FSR as a planned feature.
AMD is being awfully secretive about FSR, yes.
For what it's worth though, Scott Herkelman seemed quite bullish on the technology. In the PC Gamer conversation I believe he said FSR would make Nvidia's RT performance advantage a "moot point" (though I don't know if he was taking DLSS into the equation).
 
In the PC Gamer conversation I believe he said FSR would make Nvidia's RT performance advantage a "moot point" (though I don't know if he was taking DLSS into the equation).
That would be quite stupid but wouldn't be surprising. There's no way AMD is going to match Nvidia in combined performance. If AMD can at least boost performance via FSR to make RT more viable for AMD customers, then it's great, at least for this gen.
 
GFX1011 is Navi 12 (Apple's Radeon Pro 5600M) and GFX1012 is Navi 14 (RX/Pro 5500/M, RX/Pro 5300/M). In LLVM they're both listed as supporting dot4 INT8 and dot8 INT4 instructions with a 32-bit accumulator:
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1011.html

So far, Navi 10 is the only Navi GPU without support for higher throughput for INT8/INT4.
In fact, save for the 7nm APUs using older Vega iGPUs, Navi 10 is the only GPU chip released by AMD since 2019 that does not support dot4 INT8 / dot8 INT4.
Even Vega 20 already had support for this.


If I had to guess, there were never any plans to release any RDNA GPU without INT8/4 RPM, as it seems to be intrinsic to the base architecture. It was just broken on the latest Navi 10 tapeouts, and AMD chose not to delay the RX5700 release any longer (with rumors of Navi 10 being a headache for AMD giving some credence to this).

The "it's there as an option" theory would make sense if these were only available for e.g. Navi 12 on Apple computers, with that company pushing for better ML inference performance. But it makes no sense for Navi 14 - which is a medium-low range discrete GPU for gaming - to have dot4-INT8 + dot8-INT4 capabilities, as it's definitely not an important feature for its target market.



Agreed.
Which is again another reason why it would be very odd if AMD had decided to redesign the RDNA1 CUs for Navi 12 + Navi 14 to have the hardware for faster ML inference, then take it off RDNA2 very specifically for the PS5, then put it on again for the Microsoft RDNA2 consoles, and leave it there again for all RDNA2 discrete GPUs.

Probably because the majority of the SKUs related to Navi 12 and Navi 14 are pro cards. The fact that they are in 5300- and 5500-based cards may be down to AMD not wanting to create a separate Navi 13 (or whatever) to accommodate lower-end designs specifically meant for consumer graphics.

The PS5 doesn't seem to have them, which would be weird if they were an intrinsic part of the RDNA design. How are they broken in the PS5 (Navi 14 predates the PS5 by almost a year) but not in the XSX, and if they aren't broken, why would Sony choose not to expose them? They are basically free in that circumstance.
 
Why does Navi 14 have dot4 INT8 / dot8 INT4, then?
Why would AMD decide to include an ML inference feature on a low/mid-end GPU focused on gaming?
If this isn't an inherent capability of the RDNA WGPs, why include it in a low-margin / low-cost chip where die area is critical to achieving profitability?


Out of all the RDNA GPUs that have launched so far and have their specifications publicly available (5 RDNA1/2 dGPUs + 2 Series SoCs), how many have this capability absent? There's Navi 10, which was rumored to be very problematic for AMD, and...?
Link?
I'm not sure I've seen dot4 and dot8 mixed-precision outputs on a standard Navi.
There's nothing in the whitepaper to suggest standard Navis have them.
 
That would be quite stupid but wouldn't be surprising. There's no way AMD is going to match Nvidia in combined performance. If AMD can at least boost performance via FSR to make RT more viable for AMD customers, then it's great, at least for this gen.

Stupid to say or stupid to do?
The statement as I recalled it does seem really farfetched, but I just revisited it here, at ~42m20s:


When asked about raytracing performance competitiveness:
(...)
So yes, we are at a deficit [against Nvidia]. There's no doubt about it. I think when we come out with FSR that point will be moot, and that's why we're working on that so significantly now.

Like I said, quite the bullish statement he made here, and you can see a somewhat proud smile when he says it.
I also find it hard to believe it can match Nvidia's RT+DLSS2, but it might level things up a bit.


Probably because the majority of the SKUs related to Navi 12 and Navi 14 are pro cards. The fact that they are in 5300- and 5500-based cards may be down to AMD not wanting to create a separate Navi 13 (or whatever) to accommodate lower-end designs meant for consumer graphics.
The only Pro SKUs using Navi 12 and Navi 14 are the ones going into the 16" MacBook Pro, which have zero mention of ML inference on their website, as they're presented as GPUs for 3D content creation and video content.

And if that capability is only interesting for Pro solutions, why include it in Navi 21 and Navi 22, which have no Pro counterparts at all?
Besides, is the 16" MacBook Pro really selling more than all the discrete cards + low-end system integrators?


The PS5 doesn't seem to have them, which would be weird if they were an intrinsic part of the RDNA design. How are they broken in the PS5 but not in the XSX, and if they aren't broken, why would Sony choose not to expose them? They are basically free in that circumstance.
There is no proof whatsoever pointing to the PS5 not having what seems to be a basic feature of almost all RDNA GPUs (all save one), and very little reason for Sony to just take it out.
Sony not mentioning it isn't proof of anything, nor is one guy stating "there is no ML stuff" in a tweet he was quick to delete afterwards.



Link?
I'm not sure I've seen dot4 and dot8 mixed-precision outputs on a standard Navi.
There's nothing in the whitepaper to suggest standard Navis have them.

What's "standard Navis"?
- For Navi 12 + Navi 14 you can follow the links in this post;
- For Navi 21 and Navi 22 it's well documented and reported;
- For the Series X SoC it's well documented and reported (on Microsoft slides);
- For the Series S SoC it's well documented and reported (on Microsoft slides).

If by "standard Navis" you mean "exclusively Navi 10", then there is no dot4/dot8 mixed precision output in there. Not functional, at least.
I'm not sure what makes Navi 10 more standard than any of the other two RDNA1 dGPUs or even the other Navi 2x GPUs out there, though.
 
When asked about raytracing performance competitiveness:


Like I said, quite the bullish statement he made here, and you can see a somewhat proud smile when he says it.
I also find it hard to believe it can match Nvidia's RT+DLSS2, but it might level things up a bit.

Maybe AMD can get a relatively good performance upgrade if FSR is combined with more optimal usage of Infinity Cache. Lower res should mean there is relatively more Infinity Cache to use for things other than textures/framebuffer/... I wonder if the current bottleneck for AMD in RT is HW performance or memory accesses.
 
Stupid to say or stupid to do?
The statement as I recalled it does seem really farfetched, but I just revisited it here, at ~42m20s:
I mean stupid to compare a possible RT+FSR solution against Nvidia RT only, without DLSS.
 
What's "standard Navis"?
- For Navi 12 + Navi 14 you can follow the links in this post;
- For Navi 21 and Navi 22 it's well documented and reported;
- For the Series X SoC it's well documented and reported (on Microsoft slides);
- For the Series S SoC it's well documented and reported (on Microsoft slides).

If by "standard Navis" you mean "exclusively Navi 10", then there is no dot4/dot8 mixed precision output in there. Not functional, at least.
I'm not sure what makes Navi 10 more standard than any of the other two RDNA1 dGPUs or even the other Navi 2x GPUs out there, though.
Yeah, for Navi 10 I'm just going off the whitepaper:
'Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows.'
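
As a small illustration of the dot2 FMA described in that passage (again just a hedged sketch: plain float stands in for the FP16 inputs, the function name is made up, and it only shows the shape of the operation), two half-precision products are added to a single-precision accumulator in a single operation, which appears to be what v_dot2_f32_f16 in the earlier listing corresponds to.

// Sketch of the mixed-precision FMA dot2 from the whitepaper passage above.
// In hardware the two multiplies are done at FP16; only the accumulator is FP32.
float dot2_f32_f16(float a0, float a1, float b0, float b1, float acc)
{
    return acc + a0 * b0 + a1 * b1;
}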

The Navi 2x family has it all, that we know for sure.

Hmm... yeah, not sure. It's possible Navi 12 and Navi 14 have it, looking at the ISA.
Thanks for sharing.

Not sure how we would test it either. Someone would need to run a mixed-precision NN on those GPUs and compare it to a Navi 10 to find the performance differential. AMD hasn't openly advertised it yet? But I guess according to the whitepaper these GPUs could be those variants.

Navi 12 and 14 are mobile GPUs, right? Hmm. Yeah, I guess that would make sense if you're a laptop person looking to do local machine learning. Navi 10 is their gaming GPU, however.

Hmmm. Yeah, I dunno. I don't know what more you can add hardware-wise for inference over and above what is there already. I think we already worked backwards to what MS did based on their XSX slides.
 
Stupid to say or stupid to do?
The statement as I recalled it does seem really farfetched, but I just revisited it here, at ~42m20s:


When asked about raytracing performance competitiveness:


Like I said, quite the bullish statement he made here, and you can see a somewhat proud smile when he says it.
I also find it hard to believe it can match Nvidia's RT+DLSS2, but it might level things up a bit.

The only Pro SKUs using Navi 12 and Navi 14 are the ones going into the 16" MacBook Pro, which have zero mention of ML inference on their website, as they're presented as GPUs for 3D content creation and video content.

Unless I am mistaken, Navi 12 only consists of pro cards (two): the Radeon Pro 5600M found in Macs and a discrete card (Radeon Pro V520). Navi 14 has 13 SKUs associated with it, with 7 being professional cards.

What would motivate AMD more? Including capabilities in low-end consumer cards, specifically for gaming use, that its mid and high range lack? Or adding the capability to low-end professional cards, where deep learning is more readily utilized, with the side effect that some low-end consumer cards have the hardware because it's cheaper to repurpose than redesign?

And if that capability is only interesting for Pro solutions, why include it in Navi 21 and Navi 22, which have no Pro counterparts at all?
Besides, is the 16" MacBook Pro really selling more than all the discrete cards + low-end system integrators?

Because AMD has CDNA-based cards now with more fleshed-out DL capabilities. CDNA sports Matrix Core Engines with MFMA instructions not found in RDNA. The most capable deep learning AMD GPUs are professional cards.

There is no proof whatsoever pointing to the PS5 not having what seems to be a basic feature of almost all RDNA GPUs (all save one), and very little reason for Sony to just take it out.
Sony not mentioning it isn't proof of anything, nor is one guy stating "there is no ML stuff" in a tweet he was quick to delete afterwards.

There is no proof that the PS5 has them either. The burden of proof shouldn't require any of us to prove a negative. Should we assume, or have assumed (without any proof), that the PS5 had VRS, mesh shaders and/or Infinity Cache because Sony made no declaration that the PS5 doesn't sport those features? Or should we assume that the XSX has cache scrubbers because neither AMD nor Microsoft has made declarations that their hardware lacks the functionality? Outside of Infinity Cache, the PS5 may have those features. But you can't treat possibilities as facts just because no one can prove otherwise.

The AMD RDNA whitepaper describes INT4 and INT8 as being part of variants of the RDNA CU. That implies there are RDNA CUs with no mixed-precision dot product functionality. There is enough hardware variation between RDNA-based processors that you can't readily assume all the hardware is present but simply turned off or broken.
 