AMD RDNA4 Architecture Speculation

Reference is 2x8-pin.
To quote TPU, "The card calls for three 8-pin PCIe power connectors. We've only seen one other custom RX 9070 XT come with three connectors, and that is the XFX RX 9070 XT Merc 319 Black. ... Most other cards, including the PowerColor Red Devil, come with just two 8-pin connectors (375 W)"
Yes, there are cards (more than the two they had seen at that point) with 3x 8-pin, but there's nothing new about AIBs adding extra power connectors. Heck, there's even one with a 12V-2x6, but that doesn't mean the card is drawing 600 W.
What's the point of putting three 8-pin connectors on if one of them is not required? If OC versions really need a third connector then it either clocks like a champ or there is something really wrong with its efficiency. I guess we'll know soon.
 
What's the point of putting three 8-pin connectors on if one of them is not required? If OC versions really need a third connector then it either clocks like a champ or there is something really wrong with its efficiency. I guess we'll know soon.
No, it doesn't mean either of those. We have countless examples from previous generations of cards with extra power connectors that offer no discernible benefit over the default. Most likely it's just the PR department telling the engineers that MOAR POWAH sells, even when it does nothing.
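For the record, the spec ceilings per connector layout are easy to tabulate (75 W from the slot, 150 W per 8-pin); the point being that the ceiling says nothing about actual draw:

```python
# Spec power ceilings per connector layout (PCIe spec values:
# 75 W from the slot, 150 W per 8-pin, up to 600 W for 12V-2x6).
SLOT_W = 75
EIGHT_PIN_W = 150

configs = {
    "2x 8-pin": 2 * EIGHT_PIN_W + SLOT_W,  # 375 W -- TPU's figure
    "3x 8-pin": 3 * EIGHT_PIN_W + SLOT_W,  # 525 W
    "12V-2x6":  600 + SLOT_W,              # 675 W ceiling
}

for name, watts in configs.items():
    print(f"{name}: {watts} W max")
```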
 
According to IGN, the RX 9070 (non-XT) reaches 99 FPS in COD: Black Ops 6 at 4K Extreme settings without FSR.
Not a very representative benchmark, but it puts it above the 7900 XTX.

[Image: performance-3840-2160.png (4K relative performance chart)]


Probably not a good idea to read much into it.
 
Well they'd better be looking to price the 9070XT at $399, $450 max. Honestly I don't see how anyone with a 6800XT or higher could even consider it. That's a huge chunk of their previous buyers with no upgrade path without going Nvidia.
If it essentially gives you the performance of a 7900XT, which was going for ~$650 I think at its lowest, why would you think they need to price it that low? It isn't a huge upgrade from a 6800XT, yes, but not everyone upgrades that often. The 7900XTX is still an option if someone really wants to go AMD.
That's how I'm interpreting this as well. If they were confident in having decided on some 'aggressive' pricing, they wouldn't have held off. They probably got word of Nvidia's 50 series pricing, and then realized they were not nearly as well positioned as they thought they'd be.

It's telling that in the interviews they did after the event, their talk about the importance of good pricing and learning lessons and all that really sounded like it was all in relation to their competition. That's probably obvious from a business standpoint, but from a consumer standpoint, that's still very much a "we're not really trying to provide great standalone value, only make it seem like good value when compared against our super-high-priced competition" giveaway, and it goes against what consumers are thinking when we talk about aggressive pricing.
Given that Nvidia is the market leader, naturally they can dictate pricing and AMD has almost no option but to follow. If AMD had announced a price and then had to announce a price cut due to Nvidia's aggressive pricing for the 5070, it would have been even worse.

Besides, they actually had great products in Krackan Point, Strix Halo, and the 9950X3D, where they have leadership performance to talk about. I think they'd rather get more positive spin out of those than the slight negative spin RDNA4 might have gotten them.
That would be strange.

Navi 32 with 60 CUs had a total die area of 346 mm², and that obviously included four 6nm MCDs totaling about 146 mm².

Navi 48 is supposed to be 64 CUs, but there should be die savings from moving to 4nm and putting the memory bus and L3 Infinity Cache on the main 4nm die as well (scaling here isn't great, but it's still something).

Perhaps 64 CUs is wrong and they actually upped the CU count a fair bit instead of going narrow with faster clockspeeds? Or maybe RDNA4 CUs are a really sizeable chunk wider to accommodate better RT/AI? I don't know, but that doesn't strike me as the sort of architectural or area efficiency gains they'd have liked, especially if the aim is to be able to price this GPU more aggressively.
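Running rough numbers on that (the N48 die size and uncore split below are guesses, not confirmed figures):

```python
# Rough area accounting: Navi 32 MCM vs. a speculated monolithic Navi 48.
n32_total = 346                  # mm^2, GCD + four MCDs
n32_mcds = 146                   # mm^2, four 6nm MCDs (L3 slices + GDDR6 PHYs)
n32_gcd = n32_total - n32_mcds   # ~200 mm^2 of 5nm logic (60 CUs + front end)

n48_guess = 350                  # mm^2, speculated monolithic 4nm die
uncore_guess = 130               # mm^2, PHYs + L3 folded back on-die (guess)
n48_logic = n48_guess - uncore_guess
print(f"N32 GCD logic: {n32_gcd} mm^2; implied N48 logic: {n48_logic} mm^2")
# 64 CUs in ~220 mm^2 vs 60 CUs in ~200 mm^2 of older-node logic suggests
# per-CU area grew despite the 4nm move -- hence the puzzlement above.
```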
I agree, the die size does seem too large, even if they beefed up the RT and ML hardware. The rumours were a sub-300 mm² die. Let's wait for official news from AMD before we jump to any conclusions, but it does seem larger than expected.
Not a very representative benchmark, but it puts it above the 7900 XTX.

[Image: performance-3840-2160.png (4K relative performance chart)]


Probably not a good idea to read much into it.

Yeah, too many variables to read into it. They might have used a totally different scene than TPU. And the CPU used is a 9950X3D vs what I think is a 13900K for TPU.
 
Some others have measured with different known references and got <350 mm².
It's hard to get solid measurements on such a wide picture at an angle; you can easily be off 10% on each side.
I would guess around 320-330 mm².
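To put numbers on that, a 10% read error on each edge compounds on area (the edge lengths below are made up for illustration):

```python
# A 10% error on each linear dimension compounds on area.
w, h = 21.0, 16.0                 # hypothetical die edge lengths, mm
nominal = w * h                   # 336 mm^2
low = (0.9 * w) * (0.9 * h)       # both edges read 10% short
high = (1.1 * w) * (1.1 * h)      # both edges read 10% long
print(f"nominal {nominal:.0f} mm^2, plausible range {low:.0f}-{high:.0f} mm^2")
# -> ~272-407 mm^2: easily spans the estimates floating around
```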

They supposedly doubled up RT and went to 4 SEs / 128 ROPs.
N31, and to some degree N32, were shader heavy designs.
N48 seems to be much better balanced, at least on paper.

Edit: Found your original post (from April '24) with the silly-season die sizes from Twitter for N48 and N44.
Going off my post/response, N48 was rumored to be ~240 mm² and N44 was ~120 mm² (I think).

Even then I thought ~280-320 mm² for N48 and ~140-150 mm² for N44 made more sense.
 
Seems like a solid enough leak.
7900XTX level in RT, >7900XT w/o RT.
TSE is skewed in RDNA's favor so the card being >4080S probably won't be representative overall.
Still the result suggests that it should be faster than 7900XT but slower than 7900XTX.
Combined with TDP increase it may mean that they are trying to reach some specific card on the Nvidia side - remains to be seen which one though.
 
Seems like a solid enough leak.
7900XTX level in RT, >7900XT w/o RT.
TSE is skewed in RDNA's favor so the card being >4080S probably won't be representative overall.
Still the result suggests that it should be faster than 7900XT but slower than 7900XTX.
Combined with TDP increase it may mean that they are trying to reach some specific card on the Nvidia side - remains to be seen which one though.

Some of the stuff posted there is a bit suspect, like how GPU-Z and/or the driver seem to think it's a 7800XT.
In particular, the FurMark shot stood out to me because it showed 329 W power draw, with a GPU voltage of... 0.765 V?

Unless that bit of telemetry is wrong, that's a strange part of the V/F curve to be hanging out in while still pulling down 330 W of power, especially considering the assumption is that 330 W is significantly raised from their original targets, and the board would be less power constrained than it would have been initially.

Edit: I guess I'm used to looking at the data for Nvidia cards more often, which don't throttle quite as badly in FurMark.
Going back to TPU's 7800XT review, the data looks similar, so I guess it could be legit: https://www.techpowerup.com/review/amd-radeon-rx-7800-xt/39.html
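For what it's worth, a crude dynamic-power sanity check (all numbers below are illustrative guesses, assuming P_dyn scales roughly with V²·f):

```python
# Crude check: dynamic power scales roughly with V^2 * f.
# v0/f0 are guessed nominal operating points, not official specs.
def rel_dyn_power(v, f, v0=1.05, f0=2970):
    return (v / v0) ** 2 * (f / f0)

# At 0.765 V the V^2 term alone drops dynamic power to ~53% of nominal,
# so sustaining ~330 W there means FurMark is forcing extreme switching
# activity at heavily throttled clocks -- consistent with TPU's 7800 XT data.
print(f"{rel_dyn_power(0.765, 1800):.2f}x nominal dynamic power")
```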
 
Some of the stuff posted there is a bit suspect, like how GPU-Z and/or the driver seem to think it's a 7800XT.
In particular, the FurMark shot stood out to me because it showed 329 W power draw, with a GPU voltage of... 0.765 V?

Unless that bit of telemetry is wrong, that's a strange part of the V/F curve to be hanging out in while still pulling down 330 W of power, especially considering the assumption is that 330 W is significantly raised from their original targets, and the board would be less power constrained than it would have been initially.

Edit: I guess I'm used to looking at the data for Nvidia cards more often, which don't throttle quite as badly in FurMark.
Going back to TPU's 7800XT review, the data looks similar, so I guess it could be legit: https://www.techpowerup.com/review/amd-radeon-rx-7800-xt/39.html
According to KitGuru, PowerColor has said that current drivers limit performance by around 20% compared to the release/media drivers.
 
Some others have measured with different known references and got <350 mm².
It's hard to get solid measurements on such a wide picture at an angle; you can easily be off 10% on each side.
I would guess around 320-330 mm².

They supposedly doubled up RT and went to 4 SEs / 128 ROPs.
N31, and to some degree N32, were shader heavy designs.
N48 seems to be much better balanced, at least on paper.

Edit: Found your original post (from April '24) with the silly-season die sizes from Twitter for N48 and N44.
Going off my post/response, N48 was rumored to be ~240 mm² and N44 was ~120 mm² (I think).

Even then I thought ~280-320 mm² for N48 and ~140-150 mm² for N44 made more sense.

We also have to keep in mind that the 256-bit memory interface will not scale much from 6nm to 4nm, and there is likely more control logic given that RDNA4 supposedly also supports GDDR7. I would be surprised if N44 is that low though; it's almost exactly half of N48, and we know there is a lot of "uncore" which doesn't scale linearly with specs. If N48 is ~320 mm², N44 should be around ~170-180 mm².
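A toy area model makes the point; every figure below is a guess for illustration only:

```python
# die_area = uncore (doesn't halve) + CU array (roughly does).
n48_area = 320                         # mm^2, assumed for N48
n48_uncore = 90                        # mm^2 guess: 256-bit PHYs, media, PCIe...
per_cu = (n48_area - n48_uncore) / 64  # ~3.6 mm^2 per CU

n44_uncore = 60                        # 128-bit PHYs etc. shrink, but not by half
n44_area = n44_uncore + 32 * per_cu
print(f"N44 estimate: ~{n44_area:.0f} mm^2")  # ~175 mm^2, well over n48_area / 2
```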
 
An AMD rep says performance of the 9070XT will be better than the leaks prior to CES suggested.

Raster performance will improve a little bit, while RT and ML will improve considerably, because the ecosystem has now evolved to make use of these features.

He also says AMD will focus mainly on value; there isn't going to be a focus on tech showcases, technology demos, heavy visual features, or any over-investment in features that won't be taken advantage of.

 
In the specs tab, it shows that the Radeon RX 9070 XT contains the Navi 48 XT die, which was expected. The base and boost clocks of the Gaming OC card are 2400 and 2970 MHz respectively, which means the reference edition will likely boast a lower boost clock. This also means that the rumors suggesting an up to 3.1 GHz boost clock are for premium AIB editions, not for the reference design.
Coming to the price, the ₱35,000 listing translates to US$593, which also includes 12% VAT. Excluding that, the price comes down to US$529. Hence, the reference card is likely going to launch for under $500, but remember that it is still too early to conclude this, as the price is subject to change.
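The VAT math checks out (assuming the ~59 PHP/USD exchange rate the article apparently used):

```python
# Reconstructing the price conversion (PHP/USD rate assumed ~59).
php_listed = 35_000
pre_vat = php_listed / 1.12            # strip the 12% VAT -> 31,250 PHP
rate = 59.0
print(f"incl. VAT: ${php_listed / rate:.0f}")  # ~$593
print(f"excl. VAT: ${pre_vat / rate:.0f}")     # ~$530 (article rounds to $529)
```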
 

If that thing has performance in the 4070 Ti region then it would be offering a lot of value at, say, $499. It will presumably be faster than the 5070 in raster and maybe comparable in RT. Plus now with AI-based FSR4, and I wouldn't be surprised to see AMD come out with their own version of MFG either. Hopefully this generation we'll see a bit more competition in the mid range.

It would also make a very compelling alternative to anyone with an older GPU considering getting a PS5 Pro.
 
That would be strange.

Navi 32 with 60 CUs had a total die area of 346 mm², and that obviously included four 6nm MCDs totaling about 146 mm².

Navi 48 is supposed to be 64 CUs, but there should be die savings from moving to 4nm

N4 is just a very slightly improved N5. No big area savings from that.

and putting the memory bus and L3 Infinity Cache on the main 4nm die as well (scaling here isn't great, but it's still something).

Perhaps 64 CUs is wrong and they actually upped the CU count a fair bit instead of going narrow with faster clockspeeds? Or maybe RDNA4 CUs are a really sizeable chunk wider to accommodate better RT/AI? I don't know, but that doesn't strike me as the sort of architectural or area efficiency gains they'd have liked, especially if the aim is to be able to price this GPU more aggressively.

The RDNA4 CU is probably quite different from the RDNA3 CU.

What is already publicly known is that the TMUs will be much beefier, capable of twice the texturing speed in many (but not all) situations. The beefier TMUs will both give a perf increase and also be bigger.

And then about the shaders themselves:

RDNA3 doubled the number of multipliers per CU, but could dual-issue multiplications only very rarely in real-world code, leading to a very small performance increase from those dual multipliers.

What I'm expecting is that they will get rid of many of the bottlenecks which prevented utilizing these dual multipliers often. That might mean more register file ports or a beefier frontend on the CUs.

Whatever they'll do, it will also mean the CUs might be considerably beefier.
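As a toy illustration of the dual-issue constraint (just the dependency pattern, not actual RDNA ISA):

```python
# VOPD-style dual issue needs two *independent* ops that also fit
# narrow operand/register-port limits -- rare in real shader code.
a, b, c, d = 1.5, 2.0, 3.0, 4.0

# Co-issuable in principle: independent multiplies, separate outputs.
x = a * b
y = c * d

# Not co-issuable: the second op consumes the first's result, so the
# extra multiplier sits idle no matter how many ALUs exist per CU.
z = x * c
print(x, y, z)
```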


Also, they might have finally added dedicated HW for BVH tree traversal, not doing it in software on the shader cores anymore. If added, this will also add more area.
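For reference, a minimal self-contained sketch of what software traversal entails (assumed data structures, nothing like AMD's actual node format): on RDNA2/3 the box and triangle tests are hardware-accelerated, but the loop and stack management below run as ordinary shader code, which a dedicated traversal unit would take over.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    lo: tuple                          # AABB min corner
    hi: tuple                          # AABB max corner
    children: List["Node"]             # empty list => leaf
    tri_hit_t: Optional[float] = None   # stand-in for triangle intersection

def aabb_hit(lo, hi, origin, inv_dir) -> bool:
    """Slab test: the part RDNA2/3 accelerate in the RT hardware."""
    tmin, tmax = -float("inf"), float("inf")
    for o, i, a, b in zip(origin, inv_dir, lo, hi):
        t0, t1 = (a - o) * i, (b - o) * i
        tmin, tmax = max(tmin, min(t0, t1)), min(tmax, max(t0, t1))
    return tmax >= max(tmin, 0.0)

def traverse(root: Node, origin, inv_dir) -> Optional[float]:
    best, stack = None, [root]
    while stack:                        # this loop is the "software" part
        n = stack.pop()
        if not aabb_hit(n.lo, n.hi, origin, inv_dir):
            continue
        if not n.children:              # leaf: triangle test (also HW)
            if n.tri_hit_t is not None and (best is None or n.tri_hit_t < best):
                best = n.tri_hit_t
        else:
            stack.extend(n.children)    # stack ops burn VALU/LDS in shaders

    return best

leaf = Node((0, 0, 0), (1, 1, 1), [], tri_hit_t=2.0)
root = Node((-1, -1, -1), (2, 2, 2), [leaf])
print(traverse(root, (0.5, 0.5, -5.0), (1e9, 1e9, 0.2)))  # -> 2.0
```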


And the final thing: cache. The L3 cache ("Infinity Cache") can consume lots of die area if it's big. AFAIK we do not know the size of the cache yet, but I'd love to see the same 128 MiB the 6800 XT had, even though I'm pessimistic and think 64 MiB is more probable than 128 MiB.

A big cache is expensive, but it can both improve performance by easing the bandwidth bottleneck and also increase energy efficiency, as DRAM accesses consume quite a lot of energy.
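A quick model of the bandwidth-amplification effect (the cache bandwidth and hit rates below are illustrative guesses):

```python
# effective_bw = hit_rate * cache_bw + (1 - hit_rate) * dram_bw
dram_bw = 640.0     # GB/s: 256-bit GDDR6 @ 20 Gbps
cache_bw = 2000.0   # GB/s: assumed Infinity Cache bandwidth

for hit_rate in (0.40, 0.55, 0.70):    # hit rate rises with cache size
    eff = hit_rate * cache_bw + (1 - hit_rate) * dram_bw
    print(f"hit rate {hit_rate:.0%}: ~{eff:.0f} GB/s effective")
```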



IMHO 32 improved DCUs (64 CUs) with (often) twice the texturing power, beefier register files and instruction frontends, a dedicated BVH tree traversal engine, and 128 MiB of L3 cache would be quite a nice, balanced, and not-very-small chip.
 
And the final thing: cache. The L3 cache ("Infinity Cache") can consume lots of die area if it's big. AFAIK we do not know the size of the cache yet, but I'd love to see the same 128 MiB the 6800 XT had, even though I'm pessimistic and think 64 MiB is more probable than 128 MiB.
128 MiB is (was) rather overkill. 64 MiB with better hit rates and lower latency will yield a better perf/area ratio.
 