AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.
With so many people unable to buy current-generation GPUs, it is going to be very hard to release a refreshed lineup, especially if the new GPUs will be just as hard to buy.
 
So in the same discussion the leaker says that the low end cards may appear as a N2X 6nm refresh, due in 2023. If this is true, I'd say that will include only the smallest cards.
 
Would Navi 33 need 256MB of Infinity Cache to exceed 6900XT and compensate for half the GDDR bus width? I don't think so. 128MB, the same as Navi 21, merely clocked higher should be dominant in terms of overall memory efficiency. And if we assume this GPU is aimed at 1440p gaming (instead of 4K) then we could say that this amount of Infinity Cache is comfortable at this performance level.

6600XT is essentially the same performance as 5700XT. The smaller bus (50%) and die (94%) along with 7% more transistors gives this performance despite using 71% of the power. Infinity Cache is somewhere in the region of 2% of the transistors if we use the 6 transistor per bit rule of thumb and add a bit for supporting hardware.

We could conclude that the narrower memory bus is allowing Navi 2x to spend more power on die for actual graphics work and that the power consumption of Infinity Cache is practically negligible, since most of those transistors are "idle" at any one time.

But Navi 3x won't get this Infinity Cache boost. So we come back to 128MB, on its own, as determining the performance of Navi 33.

I think it's reasonable to expect Navi 33 to be about 320-350mm². This article about TSMC 6nm:

TSMC Reveals 6 nm Process Technology: 7 nm with Higher Transistor Density (anandtech.com)

implies 15% more transistors in the same area. I suppose that puts it in the region of 21-23B transistors, assuming that some of those extra transistors would be available because of a reduction in GDDR PHY size from 192-bit to 128-bit (saves about 16mm²?). That's about 5B short of the transistor count of Navi 21 (6900XT).

So the extra transistors available for GPU work, versus Navi 22 (6700XT), is about 5B. So that's not going to take Navi 33 to the equivalent of 64 CUs - only about 52. So that implies much higher clocks, e.g. 3.2GHz+.

I do expect the ALU:TMU ratio to get doubled in Navi 3x, but that's not going to save a huge amount of transistors. 2% of an "equivalent-CU" saving?

It seems reasonable that ALU:RA will get halved or quartered in Navi 3x, but I wouldn't be surprised if that eats up all the savings of reduced TMU count. I'm assuming RA math doesn't use TMU math units, merely that data-paths and scheduling (queuing and coalescing memory operations) are largely common. If the TMU math units have been generalised to allow them to also do ray-box and ray-triangle tests, then I guess texturing throughput will continue rising to beyond-crazy levels, just to advance ray-acceleration throughput.

The big unknown is still the hints of a radically different WGP design. AMD's trend with RDNA has been to increase the quantity of "uncore" per WGP, so I think that would put a strong limit on the increase in "equivalent-CU" count.

In other words I think brute clocks are going to be more significant than brute SIMD count, in getting to 6900XT performance at 350mm² or less. Extra uncore may well unlock yet better "SIMD IPC".

We may also see that ">6900XT" actually only refers to ray-tracing performance. In pure rasterisation workloads Navi 33 might fall far short of 6900XT when ALU-limited.
 
The only talk of Q4 '21 or Q1 '22 I can remember is a few months back when the regular clickbait rumor sites were posting about tapeouts.

I'm only assuming N3x is Q2 2022 because @Bondrewd claimed AMD is on a 6 quarter cadence between GPU families. I don't follow random rumors from wccftech.



Would Navi 33 need 256MB of Infinity Cache to exceed 6900XT and compensate for half the GDDR bus width? I don't think so.
Going by AMD's famous graph of cache usage by target resolution, it does look like 256MB would fit some >75% of memory requests at 4K.

If 75% of the requests are done at ~2TB/s, then the remaining 25% can probably come at 256GB/s because the effective bandwidth will still end at ~1.6TB/s.
 
The targeted cadence is one thing; the real situation, affected by Covid, TSMC being overloaded, mining, etc., is another. That could easily cause a 4-month delay (Navi 23 was already delayed by several months).
 
The targeted cadence is one thing; the real situation, affected by Covid, TSMC being overloaded, mining, etc., is another. That could easily cause a 4-month delay (Navi 23 was already delayed by several months).
Then unless Nvidia is getting special treatment from TSMC, they're getting Lovelace pushed to 2023.
 
Going by AMD's famous graph of cache usage by target resolution, it does look like 256MB would fit some >75% of memory requests at 4K.

If 75% of the requests are done at ~2TB/s, then the remaining 25% can probably come at 256GB/s because the effective bandwidth will still end at ~1.6TB/s.
If there's a 25% miss rate and DRAM is 256 GB/s, the effective bandwidth can be no higher than 1 TB/s, no matter how fast the bandwidth is on hits.

If the cache is serving up 2 TB/s from hits and 0.25 TB/s from misses, by definition that's an 89% hit rate.
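
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes a simple model where every byte the GPU consumes is either a cache hit or a DRAM miss, and that DRAM is the bottleneck. The function names are made up for illustration.

```python
def max_effective_bandwidth(dram_gbps, miss_rate):
    """Upper bound on total bandwidth when DRAM must supply miss_rate of all traffic.

    If a fraction miss_rate of every byte has to come from DRAM, total
    traffic cannot exceed dram_gbps / miss_rate, however fast hits are.
    """
    if not 0 < miss_rate <= 1:
        raise ValueError("miss_rate must be in (0, 1]")
    return dram_gbps / miss_rate


def implied_hit_rate(hit_gbps, miss_gbps):
    """Hit rate implied by the bandwidth actually delivered from cache and DRAM."""
    return hit_gbps / (hit_gbps + miss_gbps)


# 25% miss rate with 256 GB/s of DRAM: capped at about 1 TB/s overall.
print(max_effective_bandwidth(256, 0.25))        # 1024.0

# 2 TB/s from hits plus 0.25 TB/s from misses implies roughly an 89% hit rate.
print(round(implied_hit_rate(2000, 250), 3))     # 0.889
```

Going the other way, sustaining ~1.6 TB/s effective from 256 GB/s of DRAM would need a miss rate of about 256/1600 = 16%, not 25%.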
 
Infinity Cache is somewhere in the region of 2% of the transistors if we use the 6 transistor per bit rule of thumb and add a bit for supporting hardware.
Sigh, when I was writing that I felt something was wrong, but couldn't put my finger on it. Bed resolved the problem: I hadn't accounted for bytes! So 8x 2%. ARGH.
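
A rough sketch of the corrected estimate, using the 6 transistors per bit rule of thumb from the original post. The Navi 23 figures (32 MiB of Infinity Cache, about 11.06B transistors) are my assumption from public spec sheets and are not stated in the thread; supporting hardware (tags, control logic) is ignored.

```python
# Assumed public figures for Navi 23 (6600XT); not taken from the thread.
NAVI23_TRANSISTORS = 11.06e9
NAVI23_CACHE_MIB = 32


def sram_transistors(capacity_mib, transistors_per_bit=6):
    """Transistors in the SRAM cells alone, using the 6T-per-bit rule of thumb."""
    bits = capacity_mib * 1024 * 1024 * 8   # MiB -> bytes -> bits
    return bits * transistors_per_bit


cache = sram_transistors(NAVI23_CACHE_MIB)
share = cache / NAVI23_TRANSISTORS

# The original slip: counting bytes as if they were bits (8x too small).
mistaken_share = (cache / 8) / NAVI23_TRANSISTORS

print(f"correct:  {share:.1%}")           # correct:  14.6%
print(f"mistaken: {mistaken_share:.1%}")  # mistaken: 1.8%
```

Adding a bit for supporting hardware lands near the "8x 2%" figure above.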

The ALU count is identical to N21, which is 5120.
Well, we can subtract 32mm² for the narrower 128-bit GDDR6 bus and save ~15% area from the 6nm shrink, which translates 520mm² on 7nm to about 424mm².

Perhaps the "new" WGP arrangement means there's only two shader engines, not four. This would reduce the count of ROPs, e.g. to 64. That would save a fair amount of die space... I reckon 32 ROPs are about the same area as a WGP, so just 64 ROPs saves in the region of 10mm². Perhaps 32mm² total saving with only two shader engines?

So (520-64)/1.15 takes us to 396mm² (ignoring the non-scaling of 128-bit GDDR6).
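
The two area estimates above as a small sketch. It assumes the whole remaining die scales by the 1.15x density figure (which, as noted, ignores that the GDDR6 PHY doesn't shrink), and the function name is just for illustration.

```python
def scaled_area(base_mm2, removed_mm2, density_gain=1.15):
    """Estimated area after removing blocks and applying a uniform density gain."""
    return (base_mm2 - removed_mm2) / density_gain


NAVI21_MM2 = 520          # Navi 21 on 7nm, as used in the post
GDDR_SAVING_MM2 = 32      # 256-bit -> 128-bit GDDR6 (poster's estimate)
SHADER_ENGINE_MM2 = 32    # two shader engines instead of four (poster's guess)

# Narrower bus only.
print(f"{scaled_area(NAVI21_MM2, GDDR_SAVING_MM2):.1f} mm²")                      # 424.3 mm²

# Narrower bus plus the shader engine saving.
print(f"{scaled_area(NAVI21_MM2, GDDR_SAVING_MM2 + SHADER_ENGINE_MM2):.1f} mm²")  # 396.5 mm²
```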

If this is really an 8GB card then it seems as if it would need to be positioned as a 1080p card, "7600XT".

I think power consumption is actually a bigger problem than performance, if this is really a 1080p card and around 150W.
 
So (520-64)/1.15 takes us to 396mm² (ignoring the non-scaling of 128-bit GDDR6).
Yep!
Sorta-kinda there.
If this is really an 8GB card then it seems as if it would need to be positioned as a 1080p card, "7600XT".
Unfortunately yes, clamshells are wildly impractical for anything resembling a mainstream GPU and 24Gb DRAMs are DDR5 only for now.
 