RDNA4

If those die sizes are correct, then all the performance claims that have gone around are clearly total bunk. There's simply no way that a monolithic 240mm² GPU still on a 5nm class process is gonna perform similar to even cut down Navi 31. No possible way.

And these die sizes do match up with the initial claims that RDNA4 was focusing on lower/mid range. 240mm² is midrange, I think a lot of people have lost sight of this in the midst of these $600+ parts nowadays. Polaris 10 was 232mm². Navi 10 was just 251mm². The RX480 and 5700XT and whatnot were proper midrange products. Both of these had 256-bit buses, by the way. It could be they do without Infinity Cache, which is quite space inefficient?

Granted, 130mm² is very small, yet it will still likely be more performant than any APU until the larger Strix Point stuff comes along, which likely won't be cheap and will be primarily laptop-focused.

Some of these specs definitely don't all fit together properly, but I wouldn't be shocked if the die sizes are correct.
 
Does it make sense that N48 is bigger than N44? Thought the bigger dies get smaller numbers.

64 CU @ ~3.5GHz seems doable in 240mm². Maybe that’s just the size of the GCD. Like a souped up N32.
 
It would be pointless to present the GCD size if it's a monolithic chip...
 
Does it make sense that N48 is bigger than N44? Thought the bigger dies get smaller numbers.
Die naming comes from the order in which the projects were created.

Theory is that N41-43 were MCM and canned, while N44 is the small monolithic die leftover from original plans, and N48 is a newer monolithic project that came after plans were changed to compete in midrange.
 
If those die sizes are correct, then all the performance claims that have gone around are clearly total bunk. There's simply no way that a monolithic 240mm² GPU still on a 5nm class process is gonna perform similar to even cut down Navi 31. No possible way.
Why not?
7900GRE is heavily bandwidth limited, with only 576GB/s of memory bandwidth and 2250GB/s of IC bandwidth.
It heavily outclasses the 7800XT on paper, with 1.85x the GP/s and ~1.25x the CU processing, but is only ~10% faster, less than half the potential.
Overclocking the 7900GRE to a 2.7GHz core and ~18.5Gbps memory only gains another ~10% perf, around half its clock improvement.
Even the 7900XTX and 7900XT seem a bit bandwidth starved, especially compared to the 7800XT running free.

If specs are true, N48 seems like a very balanced design. Assuming they do have ~22Gbps GDDR6 to feed it, you have the upper end potential of ~20% more performance than the 7900GRE.
IMO, 10-15% more than the GRE seems possible, aka ~7900XT perf.
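
For anyone who wants to check the bandwidth side of this, here is a minimal sketch of the usual peak-bandwidth arithmetic (bus width times per-pin data rate, divided by 8). The 256-bit bus for N48 is an assumption taken from the rumored specs discussed in this thread, and the function name is just illustrative.

```python
def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


BUS_WIDTH = 256  # bits; matches the 7900 GRE, assumed (rumor) for N48

gre_bw = gddr_bandwidth_gbs(BUS_WIDTH, 18.0)   # 7900 GRE stock memory
n48_bw = gddr_bandwidth_gbs(BUS_WIDTH, 22.0)   # rumored ~22Gbps GDDR6
uplift = n48_bw / gre_bw - 1

print(f"7900 GRE @ 18 Gbps: {gre_bw:.0f} GB/s")
print(f"N48 (rumored) @ 22 Gbps: {n48_bw:.0f} GB/s")
print(f"raw bandwidth uplift: {uplift:.1%}")
```

The raw bandwidth ratio comes out a little over 20%, which is in the same ballpark as the upper-end figure above, though real performance rarely tracks memory bandwidth one to one.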

Does it make sense that N48 is bigger than N44? Thought the bigger dies get smaller numbers.

64 CU @ ~3.5GHz seems doable in 240mm². Maybe that’s just the size of the GCD. Like a souped up N32.
Doable but at what cost?
64CU and 128ROPs at 3.5GHz would be a bit bandwidth limited, but I could see a later release/refresh getting performance past 7900XT levels if they use 24Gbps GDDR6 or something even faster.
I'm guessing they don't push it too far, to keep power <250W; some are saying full N48 @ 3GHz is ~200W.
 
240mm²

vs

492mm²

Same 5nm process family for both. And you think the top part will perform not just as good, but better?

I'll just put it this way - if this is true, then RDNA3 should go down as an even bigger disaster than the regular bashing I treat it to.

It reminds me of all of the people who, after we learned Navi 33 would only be 200mm², were still trying to reason/math out some way that it could totally perform like Navi 21 as some BS rumors up to that point were suggesting it would.

EDIT: I should clarify that I don't think they'll be able to pack 64CUs into something that small, especially with a 256-bit bus and all. Would be amazing if true, it just seems a heavy ask.
 
Very fair point, and good reality check for my silly season calculations.
I was going off the numbers based on the 64CU, 3GHz+ rumors for N48.

48CUs at 3.4GHz would have a tough time matching 7900GRE performance levels.
1.5x the 32CU expected in N44 seems a bit more reasonable of an expectation.

Edit: That 48CU is interesting... 3SE, 48CU, 96ROPs @ 3.6GHz and 21.5Gbps GDDR6 lines up quite well.
That would potentially give ~20% more performance than the 7800XT, as the upper limit.
 
This is just crazy baseless speculation based on no insider information, but the only way I can see those die sizes & specs being realistic is if at least several of the following are true:
  1. Either:
    1. There is little or no Infinity Cache.
    2. ... OR: Both 44 & 48 have new 6nm MCDs that are not included in these die sizes. To save PHY area on both GCD and MCD, the interface bandwidth between GCD and MCD is 1/2 of Navi3x, possibly by combining two 64-bit MCDs into a single 128-bit MCD but with a lot less L3 (also making each MCD smaller than two of the previous-gen MCDs). This slightly reduces the benefit of the L3 (which is also why it would make sense to have less L3 per unit of external memory bandwidth).
  2. Possibly CU is smaller and a bit slower, e.g. by only having the 256KiB register file of Navi33 instead of the 384KiB RF of Navi31/32 which also helps with weak SRAM scaling on 4nm (that possibly goes against the idea of making raytracing faster?)
  3. Save some more die area by cutting PCI-Express from "4.0 x8" on Navi33 to "5.0 x4" on 44, and from "4.0 x16" to "4.0 x8" on 48.
If both these chips are monolithic on 4nm with large(-ish) Infinity Caches though, then I'd say no way (and/or I'd be extremely impressed).
 
This post’s accuracy is not substantiated. Exercise discretion and wait for verified sources before accepting its claims.
however NAVI48 die size doesn't fit
well yeah.
and that doesn't fit to 240mm² because the difference between 5nm and 4nm is absolutely marginal
Well yeah that's where the whole new microarch gimmick comes in.
I thought full N44 would need 19.5/20Gbps GDDR6 with that faster Infinity Cache to break +20% over 7600XT but that rumor says they are still using 18Gbps GDDR6.
Those really aren't b/w-capped parts.
N48 is basically the same size of N23 but doubles everything... near perfect logic scaling(?)
It's the 2nd time in a row they feed bits to the woodchipper for the sake of area efficiency and oh no, they're not stopping (granted RDNA2 also did that for a tiny bit by axing the # of wave slots per SIMD)
If both these chips are monolithic on 4nm with large(-ish) Infinity Caches though, then I'd say no way (and/or I'd be extremely impressed).
the whole idea is maxing the most PPAmaxed shader core possible then spamming it across the lineup.
The tiny mainstream peanut is 16 cores, the (now dead) chungus tiled la creatura is over a hundred!
Does it make sense that N48 is bigger than N44? Thought the bigger dies get smaller numbers.
number indicates either the design start date or the TO order (don't remember exactly which).
Theory is that N41-43 were MCM and canned
40. you forgot the forty. 43 never really existed for much the same reasons tiled N33 was abandoned.
Yes but do we know for sure that it’s monolithic?
yea.
 
Suddenly I get Microsoft's whole "RDNA5 2026 Next Gen console is best ever!" talk. Chiplet + packing 32CU into, what, 60mm of N3E? 160CU GPU or something?
 
If they can do 64CU's in something so small, they've likely got a winner here, assuming no major architectural faults like with RDNA3. That'd be amazing and the economics of it could allow them to sell it at like $400.

With such a dense design, I'd imagine clocks might not be >3GHz out of the box/stock, but I'd definitely hope for like 2.7GHz minimum. You can look at Nvidia's AD104 (4070Ti) at 295mm², with 60 SMs on an N4 variant running 2.6GHz stock, so I wouldn't expect anything much higher than that.

But yea, I'd think something more like 48CUs makes sense, though I'd still be a little cautious on the >3GHz clock stuff if we're talking stock speeds. And I wouldn't quite expect that you can just linearly add up gains and all that. It usually doesn't work quite so neatly for GPU scaling.

There's potential here, but no matter what reasoned-out math I can read or do myself, I always have to remember that this is exactly how people overhype themselves on Radeon stuff, and it rarely works out so peachy. I guess that's cynical, but I don't think AMD has really earned the benefit of the doubt. Fingers crossed for the best.
 
Definitely feels like there's ~50mm² or so missing from that 240mm² Navi 48 if it's supposed to be hitting 7900XT perf.

Ampere:
GA102 = 628 mm² = 28.3B transistors = 45.1M / mm²

Ada:
AD104 = 294 mm² = 35.8B transistors = 121.8M / mm²
AD106 = 188 mm² = 22.9B transistors = 121.8M / mm²

Let's assume AMD gets to that 121.8M / mm², 240 * 121.8 = ~29.2B transistors, so it's roughly a match for GA102. A 3090 Ti is maybe ~90% of 7900 XT, so it seems theoretically possible but it would be such a massive improvement in area over Navi 31/32. 64MB L3 cache would be like ~50mm² alone? Maybe 0MB L3 cache? With 22GT/s GDDR6 you'd get ~700GB/s bandwidth.
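
The density figures above are easy to reproduce. This is only a sketch using the transistor counts and die areas quoted in this post; the 240mm² figure is the rumored Navi 48 size, and assuming AMD reaches AD104-level density is the post's own hypothetical, not a known spec.

```python
def density_m_per_mm2(transistors_b: float, area_mm2: float) -> float:
    """Transistor density in millions per mm², from billions of transistors."""
    return transistors_b * 1000 / area_mm2


# (transistors in billions, die area in mm²), as quoted in the post above
dies = {
    "GA102": (28.3, 628),
    "AD104": (35.8, 294),
    "AD106": (22.9, 188),
}

for name, (xtors_b, area) in dies.items():
    print(f"{name}: {density_m_per_mm2(xtors_b, area):.1f}M / mm²")

# Hypothetical: a 240mm² die at AD104-level density
RUMORED_N48_AREA_MM2 = 240
ad104_density = density_m_per_mm2(*dies["AD104"])
projected_b = RUMORED_N48_AREA_MM2 * ad104_density / 1000
print(f"240mm² at AD104 density: ~{projected_b:.1f}B transistors")
```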

Perhaps they are "mobile first" GPUs and AMD has basically butchered everything else with minimal display engines (2-3 displays max), scaled down media engine to rely on the iGPU, only 8x PCIe 5.0 (Navi 48) and 4x PCIe 5.0 (Navi 44)?
 
so it seems theoretically possible but it would be such a massive improvement in area over Navi 31/32.
That's because Navi31/32 as you know them today are dogshit.
Clock like trash.
Perhaps they are "mobile first" GPUs and AMD has basically butchered everything else with minimal display engines (2-3 displays max), scaled down media engine to rely on the iGPU, only 8x PCIe 5.0 (Navi 48) and 4x PCIe 5.0 (Navi 44)?
nope, pretty standard parts actually.
also PCIe5 PHYs are real chungus, it's not an area win you think it is (power, yes, but who gives a shit about interface power on GPUs?).
 
That's because Navi31/32 as you know them today are dogshit.
Clock like trash.

Is there some specific architectural change in Navi31/32 that objectively should have resulted in higher clocks? Deeper pipelines, secret sauce TSMC tricks, something else?

For months you’ve been claiming RDNA 3 is trash because it didn’t hit AMD’s internal targets but that doesn’t say much of anything.
 
I think the issue is that N31 and N32 are bloated for no apparent reason.

N31 is around 58B transistors. N33, which is basically 1/3 of that spec-wise, is just 13.3B, so why is N31 using around 18B more transistors than 3 lots of N33 to deliver pretty much the same spec?

If you were to just double N33 up, that gets you 64CUs, 4SEs, 64MB cache and a 256-bit bus in 26.6B transistors. That would need a density of ~110M xtors per mm² to fit in a 240mm² die area, so to me that seems entirely doable with margin to spare when Hawk Point is 140M xtors per mm² (although the logic/cache/IO ratios are a bit more friendly for high density).
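
A rough sketch of that "double N33" arithmetic, using only the figures quoted in this post. The variable names are illustrative, and treating a doubled N33 as exactly 2x the transistors is the post's simplification (shared blocks like display and media would not actually double).

```python
N31_TRANSISTORS_B = 58.0      # approximate, as quoted above
N33_TRANSISTORS_B = 13.3
TARGET_AREA_MM2 = 240         # rumored Navi 48 die size
HAWK_POINT_DENSITY = 140.0    # M xtors per mm², as quoted above

# How much more N31 spends than three copies of N33
n31_excess_b = N31_TRANSISTORS_B - 3 * N33_TRANSISTORS_B

# Density needed to fit a doubled N33 into the rumored area
doubled_b = 2 * N33_TRANSISTORS_B
required_density = doubled_b * 1000 / TARGET_AREA_MM2

# Headroom relative to Hawk Point's density
headroom = HAWK_POINT_DENSITY / required_density - 1

print(f"N31 vs 3x N33: ~{n31_excess_b:.1f}B extra transistors")
print(f"2x N33 = {doubled_b:.1f}B -> needs {required_density:.1f}M / mm² in 240mm²")
print(f"Hawk Point density is {headroom:.0%} higher than that")
```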

I was one of those people who thought N33 could match the 6900XT at 1080p. There are 3 reasons for this.

1) AMD had already publicly announced a 1.5x perf/w uplift, granted that does not always apply to every SKU but it gives you a starting point.

2) There was zero indication from rumours that the increase in compute was from dual issue that works when the stars align rather than from doubling the shader count.

3) I was around when RV770 happened where they increased core count by 2.5x with just 25% more die area on the exact same node and doubled performance Vs the 3870 at high res / 4x aa / 16x af.

Perhaps the excitement of another rv770 moment got the best of me but there have been some amazing PPA uplifts in the past.
 
If it's not monolithic and the GCD die itself is 240mm², then the Navi 44 die size makes no sense at all

Actually, makes a lot of sense if the split between dies is different than in RDNA3

Navi 32 has 30 DCUs and a die size of 200mm². N4 is expected to be about 6% denser, so practically 32 similar DCUs would fit into the same area.

So what is the 40mm^2 of extra area then?

It could be either
1) Moving the outer level cache from the IO dies to the GCD die (to make it much faster and to improve power efficiency)
2) Moving the memory controllers to the GCD die (to improve power efficiency) and using V-cache for the outer level cache, stacking the cache die on top of the GCD die.
 
I think there are some wrong numbers floating around about transistor density for the Navi 31/32/33 dies.

The TPU database shows Navi 31 has a total of 57,700 million transistors, right?
The GCD is 304mm² with 45,400 million transistors, which means a transistor density of roughly 150M / mm² on 5nm.

For Navi 32 they state 28,100 million transistors, but I don't think that's the total, because you have to add 4x MCDs at 2,050 million transistors each. That gives a total of 36,300 million transistors and a GCD transistor density of ~143.4M / mm² (28,100 / 196mm² = 143.4M). If it were only the 81.2M / mm² given by TPU or the 101M / mm² given by Wikipedia, the GCD would have only 15,915 or 19,796 million transistors. The GCD area ratio between the two chips is 304mm² / 196mm², so the Navi 31 GCD is roughly 55% larger than the Navi 32 GCD, and 45,400 / 28,100 million transistors = ~1.6, which explains the difference in die size almost perfectly. Dunno why nobody noticed this discrepancy in the transistor numbers.
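
The cross-check in the paragraph above can be sketched as follows. All inputs are the figures quoted in this post (TPU and Wikipedia numbers as cited by the poster, not independently verified), and the premise that 28,100 million is the GCD-only count is the poster's hypothesis.

```python
# Figures as quoted in the post (millions of transistors, mm²)
N32_GCD_XTORS_M = 28_100    # assumed here to be GCD-only, per the post
MCD_XTORS_M = 2_050
N32_MCD_COUNT = 4
N32_GCD_AREA = 196
N31_GCD_XTORS_M = 45_400
N31_GCD_AREA = 304

# Total Navi 32 transistor count if the MCDs are added on top
n32_total_m = N32_GCD_XTORS_M + N32_MCD_COUNT * MCD_XTORS_M

# GCD density under the GCD-only reading
n32_gcd_density = N32_GCD_XTORS_M / N32_GCD_AREA

# GCD transistor counts implied by the published whole-package densities
implied_tpu_m = 81.2 * N32_GCD_AREA
implied_wiki_m = 101 * N32_GCD_AREA

# Navi 31 vs Navi 32 GCD: area ratio compared with transistor ratio
area_ratio = N31_GCD_AREA / N32_GCD_AREA
xtor_ratio = N31_GCD_XTORS_M / N32_GCD_XTORS_M

print(f"Navi 32 total: {n32_total_m:,}M transistors")
print(f"Navi 32 GCD density: {n32_gcd_density:.1f}M / mm²")
print(f"Implied GCD counts: {implied_tpu_m:,.0f}M (TPU), {implied_wiki_m:,.0f}M (Wikipedia)")
print(f"N31/N32 GCD area ratio {area_ratio:.2f} vs transistor ratio {xtor_ratio:.2f}")
```

The area and transistor ratios landing within a few percent of each other is what the post means by the numbers explaining the die size difference.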

Navi 33 has 13,300 million transistors on 6nm with a density of 65.2M / mm²; the N31/N32 MCD has a density of 54.64M / mm², which is a little less than the monolithic N33.

According to TSMC's own data, the difference between 5nm and 4nm is only ~6% in logic density, so a hypothetical Navi 32 GCD made on 4nm would be ~184mm² and a Navi 31 GCD ~286mm².
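
A minimal sketch of that shrink estimate, assuming the whole GCD scales with the ~6% logic density gain quoted above. That is an optimistic simplification, since SRAM and analog/IO blocks generally shrink less than logic.

```python
LOGIC_DENSITY_GAIN = 0.06  # 5nm -> 4nm, figure quoted in the post


def shrunk_area_mm2(area_mm2: float, density_gain: float = LOGIC_DENSITY_GAIN) -> float:
    """Area after a shrink, assuming the entire die scales like logic."""
    return area_mm2 / (1 + density_gain)


for name, area in [("Navi 32 GCD", 196), ("Navi 31 GCD", 304)]:
    print(f"{name}: {area}mm² on 5nm -> ~{shrunk_area_mm2(area):.0f}mm² on 4nm")
```

This lands within about a square millimeter of the figures in the post; either way, neither GCD gets anywhere near 240mm² with 64 CUs from the node change alone.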

Compare that with the density of:

AD104 = 294 mm² = 35.8B transistors = 121.8M / mm²
AD106 = 188 mm² = 22.9B transistors = 121.8M / mm²

Set against the Navi 31/32 GCD transistor densities, it only shows that AMD chose an extremely density-oriented flavor of the 5nm process, while Nvidia, despite the "4N" moniker, preferred a less dense 5nm option in favor of higher clocks, I assume? In reality both chips use a custom TSMC 5nm process.
 