AMD RDNA4 potential product value

Don't the XeSS fallbacks (via DP4a and compute), as an ML upscaler, already work on other IHVs' hardware without explicit driver support and optimizations on the part of those vendors?
That's a lighter model.
DP4a doesn't use tensor hardware, for example; the XeSS DP4a path is just shader compute.
We don't really have a suitable API that would work across tensor cores, XMX etc. Hence each vendor would need to implement it for their own hardware, say via CUDA for Nvidia (assuming that's suitable).
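Purely for illustration (not from any vendor SDK): a DP4a-style op is just a packed int8 dot product on the regular shader ALUs. CUDA exposes the same primitive as __dp4a, and HLSL SM 6.4 has equivalent dot4add intrinsics, so a minimal int8 layer on that kind of path looks like this - every MAC goes through the normal ALU issue path rather than a tensor/XMX matrix-tile instruction.

```
// Illustrative CUDA sketch only (not the XeSS code path): an int8
// fully-connected layer using the DP4a primitive. Weights/activations are
// packed four int8 values per 32-bit word; __dp4a does 4 multiply-adds per
// call on the ordinary shader ALUs - no tensor/matrix hardware involved.
__global__ void fc_int8_dp4a(const int* __restrict__ weights,     // [outDim][inDim/4] packed
                             const int* __restrict__ activations, // [inDim/4] packed
                             int* __restrict__ out,                // [outDim]
                             int outDim, int packedIn)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;

    int acc = 0;
    const int* w = weights + o * packedIn;
    for (int i = 0; i < packedIn; ++i)
        acc = __dp4a(w[i], activations[i], acc);  // 4 int8 MACs per instruction (SM 6.1+)
    out[o] = acc;
}
```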
 
XeSS SR does, but that version is significantly lower quality than the one which runs on XMX h/w. AMD doesn't need to copy that because they have FSR2/3 for that purpose already.
Also, XeSS FG is locked to Intel WMMA h/w because implementing that via DP4a seemingly wasn't possible.
If FSR4 AI SR aims at being close to DLSS SR then implementing a worse version a la XeSS seems pointless. They'd need all the power they can spend to reach DLSS IQ levels.

That's a lighter model.
DP4a doesn't use tensor hardware, for example; the XeSS DP4a path is just shader compute.
We don't really have a suitable API that would work across tensor cores, XMX etc. Hence each vendor would need to implement it for their own hardware, say via CUDA for Nvidia (assuming that's suitable).

Yes, I know it's a lighter model, but wasn't the question whether or not an ML upscaler (we can set aside the FG component for now) can be generically implemented in such a way, without explicit support by the vendor of the hardware used by the client?

Or are we asking whether we can implement an ML upscaler with identical output on the client side? And are we further asking whether we can implement one with "neutral" performance implications for all IHVs?
 
Well, with FSR4 being RDNA4-exclusive and everything else (presumably) continuing to rely on FSR3, there's a reason - DirectSR would provide such routing for AMD "automatically", without any need for them to update the FSR SDKs with that functionality (it would probably still be needed for non-Windows PCs though).
That's possibly the benefit for AMD.
Not the reason to build it from MS's POV though - they already have DLSS, XeSS, FSR3 etc., more than enough to make it worthwhile. Then when FSR4 turns up it would just automatically light up in the game. So it would actually be more beneficial for AMD if games were already built with DirectSR.
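To make "automatically light up" concrete, here's a hypothetical sketch (these names are made up for illustration, they are not the real directsr.h interfaces): the game codes against one generic selection path and simply picks whichever variant the OS/driver reports as native to the installed GPU.

```
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical illustration of DirectSR-style routing, not the actual API.
struct UpscalerVariant {
    std::string name;   // e.g. "FSR4", "DLSS", "XeSS", "Generic fallback"
    bool nativeToGpu;   // reported by the OS/driver for the installed GPU
};

// Prefer the IHV's own variant; otherwise fall back to a generic one.
static const UpscalerVariant* pickVariant(const std::vector<UpscalerVariant>& variants)
{
    for (const auto& v : variants)
        if (v.nativeToGpu) return &v;
    return variants.empty() ? nullptr : &variants.front();
}

int main()
{
    // On an RDNA4 box a driver update could add "FSR4" to this list and
    // existing DirectSR titles would pick it up with no game-side SDK update.
    std::vector<UpscalerVariant> installed = {
        { "Generic fallback", false },
        { "FSR4",             true  },
    };
    if (const UpscalerVariant* chosen = pickVariant(installed))
        std::printf("Selected upscaler: %s\n", chosen->name.c_str());
    return 0;
}
```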
 
Yes, I know it's a lighter model, but wasn't the question whether or not an ML upscaler (we can set aside the FG component for now) can be generically implemented in such a way, without explicit support by the vendor of the hardware used by the client?

What API would a generic implementation use?
 
What API would a generic implementation use?

Again, whatever XeSS does already accomplishes this?

What it doesn't do, as I elaborated, is have identical output and, I guess, identical performance regardless of the underlying hardware and vendor. However, are we asking for a solution with those criteria, or just an ML upscaler? The follow-up would be: why is that important?
 
Yes, I know it's a lighter model, but wasn't the question whether or not an ML upscaler (we can set aside the FG component for now) can be generically implemented in such a way, without explicit support by the vendor of the hardware used by the client?

Or are we asking whether we can implement an ML upscaler with identical output on the client side? And are we further asking whether we can implement one with "neutral" performance implications for all IHVs?
Such an implementation is needed only for compatibility with h/w which doesn't have fast WMMA. In AMD's case they have FSR2 for that, so there is no need.

And as I've said, since the aim is to compete with DLSS, going with a simpler, lighter model defeats the purpose of having an AI component there in the first place. For a relatively weak competitor they have FSR2 already; now they need something on par with DLSS, and you can't run that via DP4a (at least if XeSS is any indication).
 
Such an implementation is needed only for compatibility with h/w which doesn't have fast WMMA. In AMD's case they have FSR2 for that, so there is no need.

And as I've said, since the aim is to compete with DLSS, going with a simpler, lighter model defeats the purpose of having an AI component there in the first place. For a relatively weak competitor they have FSR2 already; now they need something on par with DLSS, and you can't run that via DP4a (at least if XeSS is any indication).

Now I'm not sure where this conversation is going.

Are we just answering this specifically? -

Is there something inherent to ML-based upscaling that means any model it uses can only ever be used on a specific vendor's GPU architecture? Obviously we've really only got two examples to go by, DLSS and XeSS XMX (I guess technically three with PS5 Pro's PSSR), which both do require vendor-specific hardware, but is this gonna be a hard rule going forward?

Is the idea of an 'open' ML upscaler usable by other vendors without explicit driver support and optimization impossible?

If we want to discuss the issue of what AMD should or should not do in general, I don't see the business case for any IHV providing an identical solution to their competitors for no gain on their end. If AMD's solution is "better" then they shouldn't provide it to their competitors unless they get something in return; if it's "neutral", the same; if it's "worse", well, their competitors shouldn't accept it.
 
Again, whatever XeSS does already accomplishes this?

What it doesn't do, as I elaborated, is have identical output and, I guess, identical performance regardless of the underlying hardware and vendor. However, are we asking for a solution with those criteria, or just an ML upscaler? The follow-up would be: why is that important?

Oh, you mean ML inferencing using normal shader instructions? Well, for one, AMD would first need to build such a model with decent performance. Secondly, if it's possible then certainly RDNA 3 would be a target.

I don’t think identical output is a feasible objective since clearly that varies widely across vendors and even across different versions from the same vendor. The goal is “passable quality” at useful performance.
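As a rough sketch of what inference "using normal shader instructions" means (again just an illustration in CUDA, nothing to do with AMD's actual implementation): without a WMMA/tensor path every matmul falls back to ordinary per-lane multiply-adds, which is exactly why the affordable model ends up smaller.

```
#include <cuda_fp16.h>

// Illustrative only: an fp16 fully-connected layer done with plain ALU math.
// One multiply-add per lane per step, versus a whole matrix tile per
// instruction on tensor/XMX/WMMA hardware - that gap sets the model budget.
__global__ void fc_fp16_plain(const __half* __restrict__ W, // [outDim][inDim]
                              const __half* __restrict__ x, // [inDim]
                              float* __restrict__ y,        // [outDim]
                              int outDim, int inDim)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;

    float acc = 0.0f;
    for (int i = 0; i < inDim; ++i)
        acc += __half2float(W[o * inDim + i]) * __half2float(x[i]); // plain FMAs
    y[o] = acc;
}
```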
 
Now I'm not sure where this conversation is going.

Are we just answering this specifically? -
I don't know what to answer specifically, but my point is that for a DLSS-class upscaler - which FSR4 hopefully is, as there isn't any reason not to just continue using FSR3 if that's not the case - you need WMMA performance, which is only accessible through IHV APIs: NGX, CUDA, OneAPI (or whatever Intel is using for the XMX version of XeSS). This means that implementing such an upscaler via "hacks" (DP4a) or the common ML APIs we have now (DirectML) likely isn't possible.

That being said, we don't really know yet how AMD has implemented FSR4 on PC. I would be surprised if it's a common API though; ROCm seems a more likely candidate.
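For reference, this is what the vendor-locked path looks like on Nvidia's side - the tensor-core matrix-tile op is only reachable through CUDA's nvcuda::wmma (or PTX mma), with AMD and Intel having their own equivalents, which is exactly the "only accessible through IHV APIs" problem:

```
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One 16x16x16 fp16 matrix-multiply-accumulate on the tensor cores.
// Launch with a single warp (32 threads); requires sm_70 or newer.
__global__ void wmma_tile(const half* A, const half* B, float* C)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> aFrag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> bFrag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> cFrag;

    wmma::fill_fragment(cFrag, 0.0f);
    wmma::load_matrix_sync(aFrag, A, 16);
    wmma::load_matrix_sync(bFrag, B, 16);
    wmma::mma_sync(cFrag, aFrag, bFrag, cFrag);        // whole tile per instruction
    wmma::store_matrix_sync(C, cFrag, 16, wmma::mem_row_major);
}
```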
 
nVidia is supporting the new Transformer model down to Turing. What have they spent on Tensor Cores? 20% of the transistor budget? So with ~2 billion transistors on 16nm they can not only support ML upscaling, they can use a 2x as heavy compute model.

Sorry, but AMD is just screwing their own customer base over to sell them basically the same product again.
 
nVidia is supporting the new Transformer model down to Turing. What have they spent on Tensor Cores? 20% of the transistor budget?
We actually have a fairly confident answer for Turing since nvidia released it with and without RTX - it works out to 8-9% of the die (depending on how much of it is SMs compared to everything else) for tensor and RT cores combined.
 
I don't see how "non-ML upscaling" would be a better fit for a 7900XTX with a 300mm^2 5nm die when DLSS runs just fine on a cut-down 300mm^2 16nm chip with 10.8 billion transistors (or 1/5 of RDNA3).
What's important for upscaling is the relative performance between rendering and the upscaler. The upscaler needs to be faster than generating the extra pixels via the normal render path, and it needs to achieve good quality. Thus, the larger the relative performance difference between specialised matmul hardware and combined raster/RT/shading, the more it makes sense to lean heavily on ML upscaling.

So yes, even on a low-end Nvidia GPU it makes sense to use DLSS because of the ratio of Tensor Core ops to shader ops. But source/target resolutions will be lower than on a high-end GPU.
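Put as a trivial check (illustrative numbers, nothing measured): upscaling is worth it when the low-res render plus the network costs less per frame than rendering natively, and the more matmul throughput you have, the more network you can fit into that gap.

```
#include <cstdio>

// Illustrative break-even check for ML upscaling (hypothetical timings).
static bool upscalingWins(float renderLowMs, float networkMs, float renderNativeMs)
{
    return renderLowMs + networkMs < renderNativeMs;
}

int main()
{
    // e.g. 1080p render + upscale network vs a native 4K render, in ms.
    std::printf("worth it: %s\n", upscalingWins(6.5f, 1.2f, 14.0f) ? "yes" : "no");
    return 0;
}
```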


Is there something inherent to ML-based upscaling that means any model it uses can only ever be used on a specific vendor's GPU architecture? Obviously we've really only got two examples to go by, DLSS and XeSS XMX (I guess technically three with PS5 Pro's PSSR), which both do require vendor-specific hardware, but is this gonna be a hard rule going forward?
It's not a hard rule. But as described above, the computational budget you have for upscaling depends on relative performance of the matmul hardware, which is architecture dependent. And which matrix shapes and data types are best is also architecture dependent. ML upscaling is a use case where optimising for a specific architecture pays off massively, so cross-vendor models should at best be a fallback in case of glitches, but we do need a common API that developers can target.
 
If we want to discuss the issue of what AMD should or should not do in general, I don't see the business case for any IHV providing an identical solution to their competitors for no gain on their end. If AMD's solution is "better" then they shouldn't provide it to their competitors unless they get something in return; if it's "neutral", the same; if it's "worse", well, their competitors shouldn't accept it.
It doesn't really hurt them to accept it, though. If the aim is simply to deter adoption, that's just kind of a petty response and probably won't work in the long run if FSR4 is any good at all.

I remember when people were arguing whether Nvidia should support FreeSync for a good while there, when that was still new. Sure, their proprietary tech was better, but if anything, being able to do anything AMD GPUs can do minimizes the selling point on AMD's side. I don't see that there's any 'waging war' possible here; it just seems sensible to adopt whatever the best solutions are, so that their users have options. Obviously DLSS will be an option in most cases where FSR4 will be an option, but if FSR4 is any good at all, there could be specific games where one has preferable characteristics to the other, even just depending on personal preference. It doesn't really hurt Nvidia to have somebody choose FSR4 over DLSS in the occasional game. I feel that's better than the occasional situation where it becomes a boon for AMD users because Nvidia users can't use it.

Again, all this is assuming that such an open-esque solution is possible to begin with.
 