The new SIMD configuration in each Compute Unit could be 4 x 32 instead of 4 x 16 or 2 x 32. Of course, this is only speculation on my part.
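Just to make the lane counts behind those configurations concrete (pure arithmetic, nothing architecture-specific):

```python
# Lanes per CU under each candidate SIMD configuration.
# "N x W" = N SIMD units, each W lanes wide.
configs = {"4 x 32": (4, 32), "4 x 16": (4, 16), "2 x 32": (2, 32)}

for name, (simds, width) in configs.items():
    print(f"{name} -> {simds * width} lanes per CU")

# 4 x 32 -> 128 lanes per CU
# 4 x 16 -> 64 lanes per CU
# 2 x 32 -> 64 lanes per CU
```

So 4 x 32 would double the lanes per CU relative to either GCN's 4 x 16 or RDNA's 2 x 32, which is why it reads as a bigger change than a mere reshuffle.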
Unlike a "CU" in the GCN lineage, which resembled a complete "core", a "CU" in the RDNA lineage has become more of an abstract box around 2 "SIMD"s sharing the memory pipeline (incl. texture and RT) and the L0 cache.
In RDNA, the "SIMD" is the complete core, with almost all CU-level blocks in GCN having become SIMD-dedicated resources. Each SIMD lane also now has 2x the L0 cache capacity & bandwidth (as does CDNA 3, by the way). I can't see them walking back any of these changes, and they all make "more SIMDs in a CU" seem more far-fetched than ever IMO: that would reduce the L0 capacity & bandwidth per SIMD lane, undoing the bump.
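The dilution argument is easy to see with numbers. A quick sketch (the cache size and lane counts here are illustrative round numbers I picked, not vendor figures):

```python
# Illustrative sketch of the per-lane L0 dilution argument.
# The 16 KiB size and SIMD counts are assumed round numbers,
# NOT quoted RDNA specifications.

def l0_bytes_per_lane(l0_bytes: int, simds_per_l0: int, lanes_per_simd: int) -> float:
    """L0 capacity available per SIMD lane when SIMDs share one L0."""
    return l0_bytes / (simds_per_l0 * lanes_per_simd)

L0 = 16 * 1024  # assume a 16 KiB L0 slice (illustrative)

# RDNA-style arrangement: 2 SIMD32s share one L0.
baseline = l0_bytes_per_lane(L0, simds_per_l0=2, lanes_per_simd=32)

# Hypothetical "more SIMDs in a CU": 4 SIMD32s on the same L0.
packed = l0_bytes_per_lane(L0, simds_per_l0=4, lanes_per_simd=32)

print(baseline)  # 256.0 bytes per lane
print(packed)    # 128.0 bytes per lane -- halved, undoing the bump
```

Whatever the real L0 size is, doubling the SIMDs behind a fixed-size L0 halves the per-lane capacity (and bandwidth share), which is exactly the regression argued against above.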
What most likely would happen IMO is:
1. A new but still 32-wide "SIMD" architecture with, e.g., the CDNA-style Matrix Core and CDNA 3's (presumably) proper dual-issue.
2. Wave64 mode stays for graphics (?) and to enable easier porting of existing GCN kernels.
3. Stack more CUs/WGPs. Heck, they introduced the middle-level cache (L1) in RDNA to help simplify the data fabric... which is a strong indicator that "more WGPs in an SE, then more SEs" are the intended scaling dimensions.
4. Don't bother stripping out the texture and RT units for big compute chips. Leave them in as dark silicon.
Voila, you get the one unified IP block to rule all GPU products.
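On point 2: wave64 on 32-wide hardware works by issuing the same instruction twice, once per 32-lane half. Here's a toy model of that double-pumping (the model is mine for illustration; only the "two passes over a 32-wide SIMD" idea comes from how RDNA is described):

```python
# Toy model: executing a wave64 instruction on a 32-wide SIMD
# by double-pumping, i.e. one issue per 32-lane half.

SIMD_WIDTH = 32

def exec_wave(op, srcs, wave_size):
    """Apply `op` across `wave_size` lanes, in passes of SIMD_WIDTH lanes."""
    result = [0] * wave_size
    for base in range(0, wave_size, SIMD_WIDTH):  # wave64 -> two passes
        for lane in range(base, base + SIMD_WIDTH):
            result[lane] = op(*(src[lane] for src in srcs))
    return result

# A wave64 add: 64 lanes, executed as two 32-lane passes.
a = list(range(64))
b = [1] * 64
out = exec_wave(lambda x, y: x + y, [a, b], wave_size=64)
print(out[:4])  # [1, 2, 3, 4]
```

The same loop runs wave32 kernels in a single pass (`wave_size=32`), which is why keeping wave64 mode around costs little while easing GCN kernel ports.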
I would be wary of anything else that sounds spectacular or novel. Well... unless you are very keen on some previous community-favourite speculations, like Super-SIMD.