AMD RDNA4 Architecture Speculation

If they can do 64 CUs in something so small, they've likely got a winner here, assuming no major architectural faults like with RDNA3. That'd be amazing, and the economics of it could allow them to sell it at around $400.

With such a dense design, I'd imagine clocks might not be >3GHz out of the box at stock, but I'd definitely hope for 2.7GHz minimum. Look at Nvidia's AD104 (4070 Ti): 295mm² and 60 SMs on TSMC 4N (a custom 5nm-class node), at 2.6GHz stock. So I wouldn't expect anything much higher than that.

But yeah, I'd think something more like 48 CUs makes sense, though I'd still be a little cautious on the >3GHz clock stuff if we're talking stock speeds. And I wouldn't expect that you can just add up the gains linearly. GPU scaling usually doesn't work out so neatly.

There's potential here, but no matter what reasoned-out math I can read or do myself, I always have to remember that this is exactly how people overhype themselves on Radeon stuff, and it rarely works out so peachy. I guess that's cynical, but I don't think AMD has really earned the benefit of the doubt. Fingers crossed for the best.
I don't know why we have to be cautious about 3GHz+.
It would be one thing if RDNA4 was a radical change of architecture, but everything so far points to that not being the case.
Did AMD miss and over-represent RDNA3? Yes, most definitely they did. But...
Most Navi 31 cards hit a minimum of 2.7-2.8GHz, with some even nudging right up close to 3GHz.
Some Navi 32/33 cards are more than capable of hitting 3GHz.
The issue with those higher clocks is reaching them at reasonable power levels.

If another generation later, with a decent process improvement, they can't get there, well... they should probably just close up shop on dGPUs.
Their performance and midrange GPUs have basically been stuck at 2.5-2.6GHz since RDNA2 in 2021. A ~10% increase in clocks is not enough to get them back on track.

IMO, they NEED to launch full chips of N44 and N48 at 3GHz+ for them to even make sense.
Yes, the currently rumored specs on N48 could get away with ~2.8GHz clocks, but that's only 10% over N22/23...
N44, with its rumored specs, I think we can all agree needs to be at ~3.3GHz to break away from the 7600XT.

We are used to seeing a ~20% increase in clocks with a new architecture and/or new node, though that is obviously shrinking with the complexity involved in newer advanced nodes.
AMD GPUs were at ~900MHz-1GHz for most of GCN1-3 on 28nm. GCN4 on 14LPP pushed them to 1.2-1.3GHz, a ~30% increase.
On that same 14LPP node, Vega pushed those clocks to ~1.6GHz, a ~23%-33% increase.
RDNA1 on 7nm pushed that to ~1.9GHz, a ~19%-25% increase.
On the same node, RDNA2's Navi 21 pushed those clocks to ~2.25GHz on the big die, an 18.4% increase.
Smaller and later RDNA2 dies hit 2.5-2.6GHz, a ~11%-15.5% increase on the same node and architecture, or +31% compared to RDNA1.
RDNA3 on 5nm stalled out, launching at 2.5GHz clocks: an ~11% increase compared to Navi 21, even with a new node and relatively small GCDs.
The small RDNA3 die on N6 pushed clocks up to 2.75GHz, a ~6% increase compared to the smaller RDNA2 dies.

RDNA4 needs to make up that 5-10% miss while it adds its own ~15-20% increase.
A 20% increase to clocks doesn't seem so unreasonable, right?
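
For anyone who wants to check the generational percentages above, here is a minimal Python sketch. The clock figures are the approximate ones quoted in the post. The helper names (`uplift_pct`, `uplift_range`) and the two RDNA4 targets at the end are illustrative assumptions, not confirmed specs.

```python
# Approximate clocks in GHz, copied from the figures quoted in the post.
# Each entry is (label, (old_low, old_high), (new_low, new_high)).
COMPARISONS = [
    ("GCN1-3 -> GCN4", (0.9, 1.0), (1.2, 1.3)),
    ("GCN4 -> Vega", (1.2, 1.3), (1.6, 1.6)),
    ("Vega -> RDNA1", (1.6, 1.6), (1.9, 1.9)),
    ("RDNA1 -> Navi 21", (1.9, 1.9), (2.25, 2.25)),
    ("Navi 21 -> small RDNA2", (2.25, 2.25), (2.5, 2.6)),
    ("small RDNA2 -> N33", (2.5, 2.6), (2.75, 2.75)),
]


def uplift_pct(old_ghz, new_ghz):
    """Percentage clock increase going from old_ghz to new_ghz."""
    return (new_ghz / old_ghz - 1.0) * 100.0


def uplift_range(old, new):
    """Smallest and largest uplift between two (low, high) clock ranges."""
    return uplift_pct(old[1], new[0]), uplift_pct(old[0], new[1])


for label, old, new in COMPARISONS:
    low, high = uplift_range(old, new)
    print(f"{label}: {low:.1f}% to {high:.1f}%")

# Hypothetical RDNA4 targets discussed in the thread, not confirmed specs.
print(f"2.5 -> 3.0 GHz: {uplift_pct(2.5, 3.0):.1f}%")
print(f"2.75 -> 3.3 GHz: {uplift_pct(2.75, 3.3):.1f}%")
```

`uplift_range` pairs the worst case (old high vs new low) with the best case (old low vs new high), so where both generations are given as ranges it prints a wider spread than the single rounded figures in the post.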
 
This certainly won’t be 7900XT performance in a mid-to-low-range price bracket. That would require a doubling of price-to-performance, which is not going to happen. 7800XT performance at $350-400 sounds about right.
 
If N48 is ~7900XT performance, I think most people are expecting $600, give or take ~$50.
7800XT perf at $350 offers better $/perf than 7900XT at $500.

Edit- Whoops, I was mixing up price discounts vs price/perf.
 
The Navi 31/32 GCD transistor density example only shows that AMD chose an extremely density-oriented variant of the 5nm process, while Nvidia, despite the "4N" moniker, preferred a less dense variant of 5nm in favor of higher clocks, I assume? In reality both chips use a custom TSMC 5nm process.

I think it is more that the logic/cache/IO ratio in the GCDs is very heavily skewed towards logic, and logic shrinks better than cache/IO. AD102, for example, has a higher density than AD104, at around 125.3M / mm². While it doubles the cache and IO, it slightly more than doubles the compute, going from 5 GPCs to 12 GPCs. That skews the logic/cache/IO ratio a bit more towards logic, so you can get a bit more density.

Another example would be Hawk Point, which is 140M / mm² and is also a design more skewed towards logic than a 128-bit 32MB L3 or 256-bit 64MB L3 GPU is going to be.

I think somewhere in the region of 120-130M / mm² is the kind of density to expect for N44 and N48, which would mean 15.6B - 16.9B xtors for N44 and 28.8B - 31.2B for N48. As I said in my previous post, that would be more than double the N33 transistor count, so there does look to be room to fit the specs mentioned.
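
The transistor budgets above are just area times density, which a short Python sketch can reproduce. The 240mm² figure for N48 is the rumoured die size discussed in the thread. The 130mm² figure for N44 is an assumption back-derived from the 15.6B - 16.9B range. The N33 count of 13.3B is the one quoted elsewhere in the thread. The function name is hypothetical.

```python
def transistor_budget_b(area_mm2, density_m_per_mm2):
    """Transistor count in billions for a die area (mm^2) and a density (M xtors / mm^2)."""
    return area_mm2 * density_m_per_mm2 / 1000.0


# Rumoured or back-derived die areas in mm^2; neither is a confirmed spec.
DIE_AREA_MM2 = {"N44": 130, "N48": 240}
N33_XTORS_B = 13.3  # N33 transistor count quoted in the thread

for name, area in DIE_AREA_MM2.items():
    low = transistor_budget_b(area, 120)
    high = transistor_budget_b(area, 130)
    print(
        f"{name}: {low:.1f}B - {high:.1f}B xtors "
        f"({low / N33_XTORS_B:.2f}x - {high / N33_XTORS_B:.2f}x N33)"
    )
```

This only checks whether the rumoured spec plausibly fits in the area. It says nothing about performance.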
 
Fair point. This is the essence of AMD's chiplet approach and the reason they made RDNA3 a chiplet design, right? With lower density, it logically follows that the new chips will be larger.
 
Navi 32 has 30 DCUs and a GCD size of about 200mm². N4 is expected to be about 6% denser, so practically 32 similar DCUs would fit into the same area.

So what is the 40mm² of extra area then?

It could be either
1) Moving the outer level cache from the IO dies to the GCD die (to make it much faster and to improve power efficiency)
2) Moving the memory controllers to the GCD die (to improve power efficiency) and using V-cache for the outer level cache, stacking the cache die on top of the GCD die.
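
The area arithmetic behind the question above can be sketched in a few lines of Python. All inputs are the approximate figures from the post. Scaling the whole GCD uniformly by the density gain is a simplifying assumption, since logic, cache and IO do not shrink equally. The names are hypothetical.

```python
N32_GCD_AREA_MM2 = 200.0  # approximate Navi 32 GCD area from the post
N32_DCUS = 30             # dual compute units on Navi 32
N4_DENSITY_GAIN = 1.06    # "about 6% denser", as stated in the post
N48_AREA_MM2 = 240.0      # rumoured die size under discussion, not confirmed


def dcus_in_area(area_mm2, density_gain=N4_DENSITY_GAIN):
    """DCUs that fit in area_mm2 if the whole GCD scaled uniformly with density."""
    dcus_per_mm2 = N32_DCUS / N32_GCD_AREA_MM2
    return area_mm2 * dcus_per_mm2 * density_gain


def spare_area_mm2():
    """Area left over once a Navi 32 sized GCD footprint is accounted for."""
    return N48_AREA_MM2 - N32_GCD_AREA_MM2


print(f"DCUs in {N32_GCD_AREA_MM2:.0f} mm^2 on N4: {dcus_in_area(N32_GCD_AREA_MM2):.1f}")
print(f"Spare area: {spare_area_mm2():.0f} mm^2")
```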
Navi 32 has 346mm² of total die area: the GCD is ~200mm² and the remaining 146mm² is MCDs. So 32 DCUs would probably fit in the same size on 4nm, but I am not sure it's possible to shrink 146mm² to just 40mm² going from 6nm to 4nm. Perhaps if they strip out all the chiplet communication logic and Infinity Fabric links and use a smaller cache, then yes, but I'd expect more transistors spent on AI and a beefier AV decode/encode block anyway.

Stacked 3D cache would raise the cost, so we can exclude this.
 
Stacked 3D cache (made on N6 or N7) would
1) Have cheaper wafer cost per capacity than cache made on N4/N4P
2) Be faster and more energy-efficient than cache on a separate die with traditional die-to-die interconnects (RDNA3)
3) Allow making the main die smaller. Cost scaling of die size is superlinear.

Do we have any actual data on how much the v-cache packaging/integration costs?
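
Without real packaging figures, the "cost scaling of die size is superlinear" point can at least be illustrated with a textbook model: a common gross-dies-per-wafer approximation combined with a Poisson yield model. This is a rough Python sketch, not AMD's or TSMC's actual cost model. The wafer cost and defect density below are placeholder assumptions, and V-cache packaging cost is not modelled at all.

```python
import math

# Placeholder inputs for illustration only; not real foundry numbers.
HYPOTHETICAL_WAFER_COST = 10000.0       # arbitrary currency units per wafer
HYPOTHETICAL_DEFECTS_PER_CM2 = 0.1      # assumed defect density


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation for gross dies per wafer (ignores scribe lines and edge exclusion)."""
    radius = wafer_diameter_mm / 2.0
    return (math.pi * radius ** 2) / die_area_mm2 - (
        math.pi * wafer_diameter_mm
    ) / math.sqrt(2.0 * die_area_mm2)


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Poisson yield model: expected fraction of dies with zero defects."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def cost_per_good_die(die_area_mm2, wafer_cost, defects_per_cm2):
    """Wafer cost divided by the expected number of defect-free dies."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies


for area in (120.0, 200.0, 240.0):
    cost = cost_per_good_die(area, HYPOTHETICAL_WAFER_COST, HYPOTHETICAL_DEFECTS_PER_CM2)
    print(f"{area:.0f} mm^2: {cost:.1f} per good die")
```

Under these assumed inputs a 240mm² die costs more than twice as much as a 120mm² one, which is the superlinear effect in point 3. Whether that saving outweighs the stacking cost is exactly the open question.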
 
3D stacked Infinity Cache on a smallish GCD would be interesting and a natural evolution of AMD's work. Does it make sense though, given GDDR7?
 
No, it doesn't make any sense for cost-sensitive products. You might put 3D cache on some halo product or high-end GPU variant, but on midrange chips whose primary target is cost? If that were viable, why has AMD never used V-cache on APUs, or on Navi 33?

By the way, AMD APUs will stick with an RDNA3+ GPU until at least 2027. Another "Vega" edition reincarnated?

https://videocardz.com/newz/amd-apus-rumored-to-use-rdna3-gpu-architecture-until-at-least-2027
 
This post’s accuracy is not substantiated. Exercise discretion and wait for verified sources before accepting its claims.
AMD APUs will stick with RDNA3+ GPU at least till 2027, another "Vega" edition reincarnated?
?
Medusa mobile is RDNA5.
4 is the one that gets skipped on APUs since the OG KRK1/2 are deader than dead.
 
Well, I mean, your argument here comes down to "I think it'll be such-and-such clock speed because anything less will be disappointing". But Radeon is, more often than not, disappointing. I'm not trying to be some hater by saying that, just being realistic.

And Navi 44 at such a tiny size would not likely be some 'next gen' version of the 7600XT, but rather an even lower-tier product, maybe even meant primarily for laptops/OEMs. At these stated die sizes, I imagine only N48 would be of interest to desktop/PC gaming enthusiast types.

I also think they don't need to make some giant leap in performance per mm² in order to release something worthwhile. It would be nice, but any moderate improvement that addresses some of their weaknesses and keeps the price/economics reasonable could be plenty. It may not set the world alight, but nobody should be thinking that's what AMD is aiming for here.
 
I think the issue is that N31 and N32 are bloated for no apparent reason.

N31 is around 58B transistors. N33, which is basically 1/3 of that spec-wise, is just 13.3B, so why is N31 using around 18B more transistors than three lots of N33 to deliver pretty much the same spec?

If you were to just double N33 up, that gets you 64 CUs, 4 SEs, 64MB cache and a 256-bit bus in 26.6B transistors. That would need a density of 110M xtors per mm² to fit in a 240mm² die area, so to me that seems entirely doable with margin to spare when Hawk Point is 140M xtors per mm² (although the logic/cache/IO ratios are a bit more friendly for high density).
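
The numbers in the two paragraphs above are easy to verify with a small Python sketch. The transistor counts and the 240mm² area are the approximate and rumoured figures from the post. The function names are hypothetical.

```python
N31_XTORS_B = 58.0        # approximate N31 transistor count (billions), from the post
N33_XTORS_B = 13.3        # N33 transistor count (billions), from the post
RUMOURED_AREA_MM2 = 240.0  # rumoured N48 die area, not confirmed


def excess_over_multiples(big_b, small_b, multiples):
    """Transistors (billions) the big chip uses beyond N copies of the small one."""
    return big_b - multiples * small_b


def required_density_m_per_mm2(xtors_b, area_mm2):
    """Density in M xtors / mm^2 needed to fit xtors_b billion transistors in area_mm2."""
    return xtors_b * 1000.0 / area_mm2


doubled_n33_b = 2 * N33_XTORS_B
print(f"N31 minus 3x N33: {excess_over_multiples(N31_XTORS_B, N33_XTORS_B, 3):.1f}B")
print(f"Doubled N33: {doubled_n33_b:.1f}B")
print(
    f"Density needed in {RUMOURED_AREA_MM2:.0f} mm^2: "
    f"{required_density_m_per_mm2(doubled_n33_b, RUMOURED_AREA_MM2):.1f}M / mm^2"
)
```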

I was one of those people who thought N33 could match the 6900XT at 1080p. There are 3 reasons for this.

1) AMD had already publicly announced a 1.5x perf/w uplift, granted that does not always apply to every SKU but it gives you a starting point.

2) There was zero indication from rumours that the increase in compute was from dual issue that works when the stars align rather than from doubling the shader count.

3) I was around when RV770 happened, where they increased core count by 2.5x with just 25% more die area on the exact same node and doubled performance vs the 3870 at high res / 4x AA / 16x AF.

Perhaps the excitement of another RV770 moment got the best of me, but there have been some amazing PPA uplifts in the past.
What sounds good on paper is rarely so simple in real life. Things like transistor counts and whatnot are not reliable for predicting performance, certainly not in any kind of linear way.

I realize there was optimism surrounding RDNA3 after RDNA2's success, and I was moderately optimistic myself for that reason, but 200mm² on 6nm should have at the very least made people extremely skeptical of claims of N21-like performance. Expecting current day Radeon group to make a super rare level of generational leap on largely the same process node was always a huge long shot, and I think we should remember that here as well with these rumors. Even if you can 'math it out' in a way that sounds plausible, there's all kinds of things that will likely prevent it from working out that way.
 
I am not really using it to predict performance, just trying to get a feeling for whether the spec mentioned would fit in 240mm² of die area, and superficially it seems doable. As always, lots and lots of caveats, but I don't think it is an outlandish rumour.
 
I'd call it very close to outlandish, if the 'not outlandish' take requires you to believe that this modern Radeon group is about to pull off the kind of once in a generation lift that they only once achieved decades ago.

I'll give it 'not theoretically impossible' at best. lol It's also not theoretically impossible that Jennifer Love Hewitt will show up to my house this evening for dinner and a cuddle, but I'm not gonna be preparing anything nice just in case, either....
 
They delivered RDNA 1, which was a big PPA uplift over Vega 20. Then they delivered RDNA 2, which had similar PPA but was a big PPW increase. Sure, they missed with RDNA 3, but they are nowhere near as awful as you are making them out to be.
 