AMD RDNA4 Architecture Speculation

Maybe they didn't have enough time to prepare it. I'd also expect the chiplet part would be 3nm. A 400mm² 3nm monolithic die could be quite expensive, while a 400mm² 4nm monolithic die would have quite high power consumption (lowering clocks would hurt performance = lower margins). I believe they found a better way to utilize the available manufacturing capacity, so they used it that way.
Well, the hope is that RDNA4 fixes a lot of the inefficiencies of RDNA3, so you could do performant clock speeds at 4nm with semi-reasonable power draw. RTX 4080 is a ~400mm² GPU on 4nm, and while it's rated at 320W, that could easily be lowered to ~250W without hurting its general competitiveness much. But that might have helped clue people in that AD103 wasn't actually a high end part...

Either way, if AMD wants to do anything meaningful here, they're gonna need to stop being so margin-obsessed. Pushing out clocks to scrounge an extra 3-5% performance to get it closer to some higher tier Nvidia competitor so they can charge as much as possible will just mean lackluster reviews and lower sales.
 
Chiplet-based GPUs would need CoWoS packaging. CoWoS capacities are limited, so it makes sense to use all the reserved capacities to manufacture high-margin products ($x0 000 accelerators) instead of low-margin ($x00) gaming GPUs.
RDNA3 doesn't use CoWoS.
 
Huh? Since when is "real chiplets" dictated by shader engines split between chiplets?

I was being facetious. Bandwidth requirements, and hence interconnect complexity, are on a different level to RDNA 3, hence the need for CoWoS.
 
Sure, but it doesn't explain why AMD didn't go for a ~400mm² RDNA 4.
They didn't design a big single shader die in time, as they were planning to go chiplets for high end. With high end scrapped late in the cycle, it was already guaranteed that a big die would be ready later than the rest. They probably also lack the manpower to allocate to this unexpected project.

This is the most likely explanation to me
 
Maybe. N48 was also a last-minute pivot, right? If they could do that, they could potentially have aimed for something bigger for plan B.
 
We can't rule out even a great reshaping of the long term goals of RTG. Seeing the characteristics of RDNA3-based products launched 2+ years after RDNA2 screams for reevaluation.

Btw RDNA3 family consists of only 3 chips + 1 APU, compared to RDNA2's 4 chips + 3(?) APUs. This means there were not that many designs to make this gen.
 
2 APUs; Phoenix and Phoenix2 are different chips. Also, should RDNA "3.5" be counted as separate? That's coming to at least 3 APUs by the looks of it (Strix Point, a smaller version of Strix Point, and Strix Halo).
 
I'm so curious what we'll actually see in RDNA4 benchmarks and what kind of improvements we'll see relative to RDNA3. I'm expecting mostly iterative improvements but a bigger than expected bump in raytracing performance. I'm doubly curious what we'll see from PS5 Pro's "RDNA3.5" implementation, and whether it will align with what we see from RDNA4 dGPUs.

The performance claims we've heard for PS5 Pro's raytracing sound like they'll exceed even RDNA4 dGPUs. Really hoping that AMD makes some tangible improvements and genuinely keeps up with (or even, in some areas, surpasses) Nvidia. We need the competition in the marketplace.
 
Inefficient work distribution and poor hardware utilization seem to be the biggest issues for 3D hardware these days. Will be interesting to see RDNA 4 move the needle but I suspect it will take work graphs to see real change there and who knows when that will start showing up in games. The API is still WIP.

A big boost to RT on desktop and console would be very welcome though I’m not getting hopes too high given AMD’s less than warm embrace of RT to date. Hopefully there’s some sort of hardware accelerated ray sorting to keep pace with Intel & Nvidia. MS really needs to standardize that functionality.
 
I do wonder: if Navi 48 gets 7900 XT to 4080 raster performance, how good will the RT/PT performance be? 4070 Ti? 4070 Ti Super? Even 4080? Because even just 4070 Ti RT perf for $500 with 16GB would be great for people on 1440p who just want 120+ FPS in the popular MP games and 60 and above in the 'cinematic' games. Especially if there's AI reconstruction to boot.
 
I don't know how accurate this is, but a well-known hardware leaker is saying no RDNA4 is going to be released this year.

Curiously, AMD also never mentioned RDNA4 in their latest financial results, despite mentioning the next Ryzen, Epyc and MI350 as slated for release this year.

but it is certain that AMD will not launch Radeon RX 8000 series graphics cards with RDNA 4 GPU architecture in 2024

 
I don't know if you know this but SER is just a poor man's version of callable shaders/function calls ...

On Intel HW, you don't need to expose an explicit API to do SER, since all of the RT shader stages (ray generation/intersection/any hit/closest hit/callable) in the RT pipeline are implemented purely as callable shaders. That makes it trivial for their driver/HW to determine whether or not to spill this state, since their reordering mechanism can run after every function call ...

Exposing a hobbled alternative (SER) that's less powerful/general than current functionality (callable shaders) doesn't seem all that attractive in the eyes of graphics programmers, and I'm not sure the industry is clamouring for more fixed function/"special state" HW. An explicit API for SER might be more interesting if it could be applied to graphics or compute shaders/pipelines, or if HW vendors could just straight up expose callable shaders/function calls for all other shader stages ...
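
For anyone wondering what reordering buys in the first place, here's a toy sketch in Python. It is not how any vendor's HW or driver actually does it; the function names, the wave size of 32 and the "one pass per distinct hit shader in a wave" cost model are all made up for illustration. The idea is only that a wave whose lanes hit many different shaders has to run them one after another, and grouping rays by shader before shading cuts that down.

```python
import random

WAVE_SIZE = 32  # assumed wave width, illustration only


def shader_passes(shader_ids, wave_size=WAVE_SIZE):
    """Count shader passes under a toy cost model: each wave runs every
    distinct hit shader found among its lanes, one after another."""
    total = 0
    for start in range(0, len(shader_ids), wave_size):
        wave = shader_ids[start:start + wave_size]
        total += len(set(wave))
    return total


def reorder_by_shader(shader_ids):
    """Toy 'reordering': group rays that hit the same shader together.
    A plain sort stands in for whatever the real mechanism is."""
    return sorted(shader_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    # 1024 rays, each hitting one of 8 hypothetical hit shaders at random
    rays = [rng.randrange(8) for _ in range(1024)]
    print("passes without reordering:", shader_passes(rays))
    print("passes with reordering:   ", shader_passes(reorder_by_shader(rays)))
```

The sort here is just the simplest grouping to write down; real schemes also have to weigh the cost of spilling and moving live state when rays change lanes, which is the part the posts above are arguing about.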
 
I mean, "mostly" isn't the best description lol. That is really just 51% of the issues. RDNA 2 was the first attempt, RDNA 3 was the second, 3.5 was the third ... 4 is the fourth try? Hopefully they get it mostly right lol

What the LLVM AMDGPU backend patches have shown so far for GFX12 is... no big change in ray tracing architecturally. It appears to be the same as RDNA 3.

At best one could say the new GFX12 memory temporal hints might help fine-tune the traversal kernel and reduce cache thrashing. I would not set my expectations too high based on this alone, or bank on any super uber secret sauce, despite the spinning rumor mills.

Even for RDNA 3, I would have guessed that the RT speedup was mostly from the L0 to L2 caches being doubled across the line...
 