AMD: RDNA 3 Speculation, Rumours and Discussion

Status
Not open for further replies.
So hopefully the earlier rumors about two GCDs ultimately pan out, because while these small dies look great, they'd still end up second-best like Vega.
When exactly was the last time AMD was *not* second-best?

I guess they managed to stay on top with Hawaii for some time in 2013. Since then it has been the horrible GCN stagnation (4 GB Fury, no high-end with Polaris, hyped Vega 64, sad Radeon VII) and two pretty solid RDNA gens.
 
In individual games it might reach a 6900 XT, especially in RT. In a broader benchmark suite like Hardware Unboxed's, it will land below a 6800 XT unless it is clocked much higher than 3 GHz.
Navi 33 with its 4096 stream processors at a ~3 GHz game clock should reach 24.6 TFLOPS, while Navi 21 / Radeon RX 6900 XT with 5120 stream processors at 2015 MHz gets 20.6 TFLOPS. That's roughly 20 % higher arithmetic performance. It's hard to imagine Navi 33 being significantly slower than Navi 21 unless RDNA 3, e.g., halves the TEX:ALU ratio (like Nvidia did with Ampere) or significantly reduces the ROP:ALU ratio.
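A quick sanity check of the peak-FP32 arithmetic above, as a minimal sketch assuming the usual convention of one FMA (counted as 2 FLOPs) per stream processor per clock. The Navi 33 configuration is the rumoured one from the thread, not a confirmed spec.

```python
def fp32_tflops(stream_processors: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, assuming one FMA (2 FLOPs) per SP per clock."""
    return stream_processors * 2 * clock_ghz / 1000.0


navi33 = fp32_tflops(4096, 3.0)    # rumoured config at a ~3 GHz game clock
navi21 = fp32_tflops(5120, 2.015)  # RX 6900 XT at its 2015 MHz game clock

print(f"Navi 33: {navi33:.1f} TFLOPS")  # 24.6
print(f"Navi 21: {navi21:.1f} TFLOPS")  # 20.6
print(f"Ratio:   {navi33 / navi21:.2f}x")  # 1.19
```

Peak numbers like these say nothing about how well the ALUs are fed, which is exactly what the rest of the thread argues about.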
 
And that with 1/3 of the die. Seriously, do you people really believe that? That would be the first time in history that a company increased compute throughput by 2.5x (I guess AMD needs ~50% more transistors) while reducing the die size by 20% over the previous generation on a slightly better node.

Nvidia only doubled FP32 compute throughput from Turing to Ampere, and they used 30% more transistors (including all the other improvements) and a shrink from 12nm to 8nm (a full node step).
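To make the implied per-transistor and density gains explicit, here is a small sketch that only divides the ratios quoted in the post (2.5x throughput, ~1.5x transistors as the poster's own guess, 0.8x area for AMD; 2x throughput and ~1.3x transistors for Turing to Ampere). No die sizes beyond those ratios are assumed.

```python
# Generation-over-generation ratios quoted in the post (new / previous).
amd_throughput, amd_transistors, amd_area = 2.5, 1.5, 0.8  # 1.5x is the poster's guess
nv_throughput, nv_transistors = 2.0, 1.3                    # Turing -> Ampere

# Throughput gained per transistor spent.
amd_per_transistor = amd_throughput / amd_transistors  # ~1.67x
nv_per_transistor = nv_throughput / nv_transistors     # ~1.54x

# More transistors in less area implies this much higher density.
amd_density = amd_transistors / amd_area               # ~1.9x

print(f"AMD throughput per transistor:    {amd_per_transistor:.2f}x")
print(f"AMD transistor density required:  {amd_density:.2f}x")
print(f"Nvidia throughput per transistor: {nv_per_transistor:.2f}x")
```

The density line is the crux of the skepticism: a roughly 1.9x density gain is a lot to expect from "a slightly better node".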
 
Seriously, do you people really believe that?
No one told you AMD engineering was sane.

That would be the first time in history that a company increased compute throughput by 2.5x (I guess AMD needs ~50% more transistors) while reducing the die size by 20% over the previous generation on a slightly better node.
Hence why I've been shilling those parts for almost a year now, maybe more.
They're positively nuts.
 
While the 4870/4850 were good designs, AMD would've been even better off with a bigger chip that would have given them top-dog status for a couple of years. Having the fastest cards during the HD 4800 and HD 5800 series would've gained them a lot of mindshare.
Yep, a bitter lesson from history. A 384-bit version with 16 CUs would simply have trashed the GTX 280 and still would have been radically smaller (though with about the same power consumption).

The whole idea that Crossfire (x2 card) was viable as a flagship card :no:
 
And that with 1/3 of the die. Seriously, do you people really believe that? That would be the first time in history that a company increased compute throughput by 2.5x (I guess AMD needs ~50% more transistors) while reducing the die size by 20% over the previous generation on a slightly better node.

Nvidia only doubled FP32 compute throughput from Turing to Ampere, and they used 30% more transistors (including all the other improvements) and a shrink from 12nm to 8nm (a full node step).

It depends on where you spend the transistors. It wouldn’t be the first time AMD designed an architecture targeting high flops on paper. RDNA is actually very balanced though, aside from RT, so I would be surprised if they started chasing unusable flops again for their gaming arch.

Feeling major deja vu from the Navi3x hype. It’s almost like we heard the exact same promises for Navi2x from the same source.
 
I mean with Nvidia pushing >30 Tflops in the same area as Navi 21 it is rather imperative for AMD to improve their FP32 flops/die area if they don't want to start falling behind again with shaders becoming more and more math heavy.

That being said, Nvidia will likely be pushing ~40 Tflops from the same area in Lovelace, so improving that by 2-2.5x still won't give AMD an advantage in this metric.

And yeah, with WGP rebalancing we can probably forget about RDNA3 scaling anywhere near linearly with flops compared to RDNA2.
 
And that with 1/3 of the die. Seriously, do you people really believe that? That would be the first time in history that a company increased compute throughput by 2.5x (I guess AMD needs ~50% more transistors) while reducing the die size by 20% over the previous generation on a slightly better node.
Would it be?

R520 (Radeon X1800 XT) ⇾ R580 (Radeon X1950 XTX) was 3.25× with just 22 % more die space (including other improvements, e.g. beefier ROPs) on the same node and using the same architecture.
R520 (Radeon X1800 XT) ⇾ RV570 (Radeon X1950 PRO) was 2.16× with 20 % less die space on a half-node step (an inferior 80nm process, which resulted in reduced clock speed). With a clock increase comparable to Navi 21 ⇾ Navi 33, it would be 3.38× higher arithmetic performance at 20 % smaller area, a much bigger step than Navi 21 ⇾ Navi 33 (and achieved on the same architecture).
RV670 (Radeon HD 3870) ⇾ RV770 (Radeon HD 4870) was 2.41× with 33 % more die space (including much improved ROPs) on the same node, done within 7 months.
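Normalizing the historical examples above by die area gives throughput gained per mm². This sketch uses only the ratios quoted in the list; the clock-adjusted RV570 entry is the hypothetical figure from the second bullet, not a shipped product.

```python
# (throughput ratio, die-area ratio) for each transition, as quoted above.
transitions = {
    "R520 -> R580": (3.25, 1.22),
    "R520 -> RV570": (2.16, 0.80),
    "R520 -> RV570 (clock-adjusted, hypothetical)": (3.38, 0.80),
    "RV670 -> RV770": (2.41, 1.33),
}

# Throughput gained per unit of die area = throughput ratio / area ratio.
per_mm2 = {name: t / a for name, (t, a) in transitions.items()}

for name, gain in per_mm2.items():
    print(f"{name}: {gain:.2f}x arithmetic throughput per mm²")
```

Dividing by the area ratio puts same-node, shrink, and hypothetical steps on one scale, which is the comparison the "first time in history" claim implicitly makes.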

Don't forget that Navi 21 ⇾ Navi 33 (maybe increases arithmetic performance, but) reduces Infinity Cache by a factor of 4. Bus width is reduced by a factor of 2. There were hints about simplified ROPs (some ops could be moved to ALUs), etc.
 
I mean with Nvidia pushing >30 Tflops in the same area as Navi 21 it is rather imperative for AMD to improve their FP32 flops/die area if they don't want to start falling behind again with shaders becoming more and more math heavy.
It's worse than that, I reckon, since RT explicitly relies upon math throughput in AMD's design (assumption based upon there being no sight of a re-design for RT in patent documents). So Navi 31 merely catches up to where it should have been in Navi 21 from day one...

That being said, Nvidia will likely be pushing ~40 Tflops from the same area in Lovelace, so improving that by 2-2.5x still won't give AMD an advantage in this metric.
... while Nvidia will be leaping far ahead again, it seems.

And yeah, with WGP rebalancing we can probably forget about RDNA3 scaling anywhere near linearly with flops compared to RDNA2.
As far as I understand it, with two SIMDs sharing a register file it's guaranteed to scale worse than RDNA 2, because any 3-operand instruction in one SIMD is going to interfere with the throughput of the other SIMD, even if only momentarily. Despite all the possible compiler jiggery-pokery we've seen in open-source patches, we also know from history that relying upon the compiler is the best way to trash throughput.
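The register-file contention argument can be illustrated with a deliberately crude toy model. Everything here is a hypothetical assumption, not AMD's design: two SIMDs share a single pool of `read_ports` operand reads per cycle (4 is an arbitrary choice), a 2-operand op needs 2 reads and a 3-operand op (e.g. FMA) needs 3, and the SIMDs alternate issue priority each cycle. Real hardware has banking, operand reuse, and caches that this ignores.

```python
from itertools import cycle


def simulate(stream_a, stream_b, read_ports=4, cycles=1000):
    """Toy model of two SIMDs sharing one register file's read ports.

    Each stream is a repeating list of source-operand counts per instruction.
    Each cycle the SIMDs try to issue in alternating priority order; an
    instruction issues only if its reads fit in the ports left this cycle,
    otherwise that SIMD stalls and retries next cycle.
    Returns (instructions per cycle for SIMD A, for SIMD B).
    """
    streams = [cycle(stream_a), cycle(stream_b)]
    pending = [next(s) for s in streams]
    issued = [0, 0]
    for t in range(cycles):
        ports_left = read_ports
        for simd in ((0, 1) if t % 2 == 0 else (1, 0)):
            if pending[simd] <= ports_left:
                ports_left -= pending[simd]
                issued[simd] += 1
                pending[simd] = next(streams[simd])
    return issued[0] / cycles, issued[1] / cycles


print(simulate([2], [2]))           # (1.0, 1.0): no contention
print(simulate([3], [2]))           # (0.5, 0.5): A's FMAs starve B too
print(simulate([3, 2, 2, 2], [2]))  # occasional FMAs in A still cost B
```

The point is only qualitative: in this model, a 3-operand instruction on one SIMD lowers the other SIMD's issue rate even though the other SIMD never issues one itself.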
 
Don't forget that Navi 21 ⇾ Navi 33 (maybe increases arithmetic performance, but) reduces Infinity Cache by a factor of 4. Bus width is reduced by a factor of 2. There were hints about simplified ROPs (some ops could be moved to ALUs), etc.

AMD has Navi 24 on 6nm with a 64-bit bus, 16 MB of L3 cache, 1024 SPs, ~100 mm², ~50M transistors/mm² and high clocks (2900 MHz?!) at 105 W TDP.

We are talking about an achievement on the same process that laps anything TSMC has been able to provide with its process improvements in years. For example, Apple only improved efficiency by 20% with the M2 GPU over the M1.
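A back-of-the-envelope transistor budget, as a sketch. It assumes the Navi 24 density figure means about 50 million transistors per mm² (Navi 24 is roughly 5.4 billion transistors on ~107 mm²), and then hypothetically applies that same N6 density to the rumoured 203 mm² Navi 33 die.

```python
def transistors_billion(area_mm2: float, density_mtr_per_mm2: float) -> float:
    """Transistor count in billions for a die of the given area and density."""
    return area_mm2 * density_mtr_per_mm2 / 1000.0


NAVI24_DENSITY = 50.0  # assumed: million transistors per mm² on N6

print(f"Navi 24 (~100 mm²): ~{transistors_billion(100, NAVI24_DENSITY):.1f} B transistors")
# Hypothetical: rumoured Navi 33 die size at Navi 24's density.
print(f"Navi 33 (203 mm²):  ~{transistors_billion(203, NAVI24_DENSITY):.1f} B transistors")
```

If the density stays near Navi 24's, the extra shaders have to fit into roughly the same transistor budget as before, which is why the thread argues something else (TMUs, ROPs, cache) must give.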
 
Navi 33 with its 4096 stream processors at a ~3 GHz game clock should reach 24.6 TFLOPS, while Navi 21 / Radeon RX 6900 XT with 5120 stream processors at 2015 MHz gets 20.6 TFLOPS. That's roughly 20 % higher arithmetic performance. It's hard to imagine Navi 33 being significantly slower than Navi 21 unless RDNA 3, e.g., halves the TEX:ALU ratio (like Nvidia did with Ampere) or significantly reduces the ROP:ALU ratio.

But that's exactly what they're doing, if the data is right. Fitting 4096 shaders into a chip smaller than the 2048-shader Navi 23 is only possible if you cut other corners. Just cutting out the reorder buffer isn't nearly enough. It's impossible to double the TMU count in such a small chip; TMUs aren't small. Therefore the TEX:ALU ratio must be halved. The same should apply to ROPs. They are going from 237 mm² to 203 mm²; the transistor count might stay the same if they go for higher density, but not much more.

TESKATLIPOKA posted the most probable configuration already:
N33 -> 16WGP, 4096 Shaders, 128 TMUs, 64 ROPs, 32MB IC, 128bit GDDR6?
Only the shaders will be doubled compared to N23. The other stuff will stay the same.
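What "only the shaders double" does to the balance of the chip, as a small sketch. The Navi 23 baseline of 2048 shaders, 128 TMUs and 64 ROPs follows from the statement that everything except the shaders stays the same; the Navi 33 numbers are the rumoured configuration above.

```python
def alu_ratios(shaders: int, tmus: int, rops: int) -> tuple[float, float]:
    """Return (shaders per TMU, shaders per ROP)."""
    return shaders / tmus, shaders / rops


configs = {
    "Navi 23": (2048, 128, 64),
    "Navi 33 (rumoured)": (4096, 128, 64),
}

for name, (shaders, tmus, rops) in configs.items():
    per_tmu, per_rop = alu_ratios(shaders, tmus, rops)
    print(f"{name}: {per_tmu:.0f} shaders/TMU, {per_rop:.0f} shaders/ROP")
```

Shaders per TMU go from 16 to 32 and shaders per ROP from 32 to 64, i.e. texture and ROP throughput per ALU are both halved, the same kind of shift Ampere made on the texture side.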
 