AMD: RDNA 3 Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Oct 28, 2020.

Tags:
  1. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
As for the APUs and bandwidth: RDNA by itself is more bandwidth-efficient than the currently used Vega+, let's say 1.25×. Moving from DDR4 to DDR5 doubles raw bandwidth (2×). An Infinity Cache / SLC could roughly double effective bandwidth again (2×). So from the bandwidth perspective it would be possible to build integrated graphics (1.25 × 2 × 2 =) ~5 times faster than the current Vega+ used in Cezanne, using RDNA2/3 and standard dual-channel DDR5. I think such a configuration would be more TDP-limited than bandwidth-limited.
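The multipliers above compound as a simple product; a minimal sketch, where every factor is this post's assumption rather than a measured figure:

```python
# Back-of-the-envelope bandwidth headroom for a DDR5 RDNA APU vs. Cezanne's Vega.
# All three multipliers are the post's assumptions, not measured numbers.

ARCH_EFFICIENCY = 1.25  # assumed RDNA vs. Vega bandwidth efficiency
DDR5_VS_DDR4    = 2.0   # raw bandwidth doubling from DDR4 to DDR5
INFINITY_CACHE  = 2.0   # assumed effective-bandwidth gain from an SLC

headroom = ARCH_EFFICIENCY * DDR5_VS_DDR4 * INFINITY_CACHE
print(f"Effective bandwidth headroom: {headroom:.1f}x")  # 5.0x
```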
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
Technically they will all be quad channel: DDR5 puts two 32-bit channels on each DIMM instead of the current single 64-bit channel per DIMM (or rather, 2×40-bit vs. 72-bit, with 8 bits of ECC per channel).
    And I don't see mainstream platforms getting more channels beyond the natural doubling DDR5 brings.
     
    Lightman likes this.
  3. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    I'm wondering when chiplet APUs will be feasible, which would make this seem far more imminent. Drop in an IO die, an 8 core CPU die, and a smallish GPU, for example.

Let's say, for AMD in the next year or two: a 20 CU GPU; with TSMC's ultra-high-density SRAM libraries you could have 64 MB of LLC, and DDR5 is fast enough to provide the rest. Clocked relatively high you're looking at 5-6 teraflops, enough to keep up with minimum requirements thanks to the Series S. Then a 6-core Zen 4, where high clocks and higher efficiency should keep it level with a Zen 2 3700. All you'd need is an NVMe drive and you've got a game-ready SFF box, at a final cost of what, $650 maybe?
     
  4. Esrever

    Regular

    Joined:
    Feb 6, 2013
    Messages:
    846
    Likes Received:
    647
It would be economically limited at that point. Say they put in 20 RDNA2 CUs plus 64 MB of Infinity Cache to compensate, plus 8 Zen 3 cores. The chip would be around 300mm^2; with that much silicon, it's going to be expensive. Cost per good die rises faster than linearly with area, because yield drops as dies get bigger. It seems counterproductive to do this when even discrete CPUs and GPUs are moving towards chiplets.

A 100mm^2 CPU plus a 200mm^2 GPU will be cheaper to produce than a 300mm^2 APU by a long shot. I actually expect AMD to move in the other direction regarding APUs: separate GPU and CPU dies make sense economically even in small APUs. Two 80mm^2 chiplets would probably be cheaper than their 154mm^2 Cezanne die, though the power consumption and initial design investment have to be there.
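The yield argument above can be illustrated with a toy Poisson defect model; a minimal sketch, where the defect density D0 is purely hypothetical:

```python
import math

# Toy yield model (Poisson defect model) illustrating why two small dies
# can be cheaper per good die than one big one. D0 is illustrative only.
D0 = 0.001  # assumed defects per mm^2

def cost_per_good_die(area_mm2: float) -> float:
    """Relative cost: wafer area consumed divided by die yield."""
    yield_frac = math.exp(-area_mm2 * D0)
    return area_mm2 / yield_frac

monolithic = cost_per_good_die(300)
chiplets   = cost_per_good_die(100) + cost_per_good_die(200)
print(f"300 mm^2 monolithic      : {monolithic:.1f}")
print(f"100 mm^2 + 200 mm^2 dies : {chiplets:.1f}")
```

With these numbers the split dies come out cheaper, and the gap widens as D0 or die area grows; packaging and power costs, which the post notes, pull the other way.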
     
    CarstenS likes this.
  5. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
No, I mean that with 3X more raster performance, a future RTX 4090 / 7900 XT will definitely be CPU-limited at 4K.
    No, a 3090 / 6900 XT is twice as fast as a PS5 / Series X; add 3X on top of that, and current-generation games will become CPU-limited quickly. Which is why I suspect that 3X figure is for resolutions of 4K and up. There is no way in hell we can achieve 3X raster performance in most current games using any CPU we have today.

During the Xbox One / PS4 era we didn't suffer much from the stagnation of CPU performance: those consoles had very weak CPUs, and games didn't put much load on the CPU. Now things will change. CPUs will be used to run more complex simulations, so games will rely more on the CPU, especially given the largely single-threaded nature of games and their tendency not to scale well across many cores. With these super-powerful new GPUs, games will be more CPU-limited through the combination of complex simulation and high fps.
     
    PSman1700 likes this.
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    An RDNA 2 CU is about 2mm².


An RDNA 3 WGP with 8 SIMDs on 5nm would be around 4mm², I suppose.
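The estimate follows from a quick scaling calculation; a minimal sketch, where the N7-to-N5 density factor is an assumption:

```python
# Back-of-the-envelope area scaling behind the ~4 mm^2 WGP estimate.
# The N5 logic shrink factor is an assumed ~2x density gain, not a measured one.

CU_AREA_N7      = 2.0  # mm^2, RDNA 2 CU on N7 (from the die shot above)
SIMDS_PER_CU    = 2    # RDNA 2 pairs two SIMD32 units per CU
N5_LOGIC_SHRINK = 0.5  # assumed logic area scaling from N7 to N5

simds = 8
area_n7 = (simds / SIMDS_PER_CU) * CU_AREA_N7  # 8 mm^2 worth of N7 silicon
area_n5 = area_n7 * N5_LOGIC_SHRINK
print(f"Estimated 8-SIMD WGP on N5: ~{area_n5:.0f} mm^2")  # ~4 mm^2
```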
     
    T2098, Lightman and BRiT like this.
  7. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
That’s the question: why build a pipeline with such a big imbalance? When we think about micro-polygons, the second scan converter always runs empty. It only makes sense when you have polygons bigger than 16 pixels…
     
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
Not sure how you arrive at that conclusion from that math. Let’s use your 6x multiplier: 4K @ 120fps is 9x the pixel rate of 1440p (upscaled to 4K) @ 30fps. So still GPU limited.
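The 9x figure decomposes into a resolution ratio times a frame-rate ratio; a quick check:

```python
# Pixel-throughput comparison behind the "9x" figure.
px_4k    = 3840 * 2160  # 4K render target
px_1440p = 2560 * 1440  # 1440p internal resolution (pre-upscale)

rate_target  = px_4k * 120     # 4K @ 120 fps
rate_console = px_1440p * 30   # 1440p @ 30 fps

print(f"Resolution ratio : {px_4k / px_1440p}x")            # 2.25x
print(f"Frame-rate ratio : {120 / 30}x")                    # 4.0x
print(f"Pixel-rate ratio : {rate_target / rate_console}x")  # 9.0x
```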
     
  9. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    57
    Likes Received:
    107
I think the other part of his argument is that the massive disparity in CPU performance from last gen is no longer present. Skylake was vastly faster than Jaguar, so running at 4x the frame rate wasn't an issue on PC; now that consoles are on Zen 2, a developer making full use of it at 30fps is going to make 100+ fps a challenge.
     
    DavidGraham likes this.
  10. Putas

    Regular

    Joined:
    Nov 7, 2004
    Messages:
    738
    Likes Received:
    355
    Imagine the slowdown on macro polygons.
     
  11. tsa1

    Newcomer

    Joined:
    Oct 8, 2020
    Messages:
    89
    Likes Received:
    97
CPU performance is not something that is set in stone. Some engines (SoTTR, for example) are mostly GPU-limited even in extreme scenarios (25% scaling, 800x600), while decrepit things like Dunia in all the Ubisoft games are mostly CPU-limited even at Full HD on low / mid-range GPUs. And even in that case you will see _some_ (if not most) of the performance increase with a beefier GPU. For example, with 2x-3x more GPU dakka we'll be able to run DX:MD at 4K with 2x MSAA or whatever type of anti-aliasing is used there (it's brutal on FPS atm).

People get caught up in absolutes for some reason and start worrying about a bit of "missing performance" (due to GPU or CPU) instead of just playing the games and noting what changed or not. I'm pretty sure I did not get a 25-50% fps increase in all games after switching from a 3900X to a 5900X (apart from e-sports titles, where it actually happened), but everything still got a lot smoother. I fully expect something of the sort from a similar GPU upgrade.
     
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
It would be fantastic if developers found a way to make full use of the CPU such that an 8-core is required for 30fps. It’s highly unlikely this will happen though, as if there were such a workload we would have seen it in some form already - a demo, an academic paper, etc. Yes, 8th-generation console CPUs were weak, but that’s not the main reason for the lack of innovation in CPU usage.

    I would love to see high fidelity clothing simulation but that’s probably better suited for GPUs anyway. Maybe there’ll be a revolution in NPC AI. We can only hope.
     
  13. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    605
    Likes Received:
    1,126
So the 6600XT, with 10 TFLOPs and a 128-bit interface, needs 160W, of which the GPU alone should be over 100W. I hope that shows how ridiculous these RDNA3 rumors are. To deliver 2.7x more performance than that, the RDNA3 GPU would have to deliver 25 TFLOPs within ~100W of GPU power.
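Taking the post's figures at face value, the implied efficiency jump can be sketched like this:

```python
# Implied perf/W jump if the rumoured part hit 25 TFLOPs at the same GPU power.
# All figures are this post's estimates, taken at face value.
tflops_6600xt = 10.0
gpu_power_w   = 100.0  # estimated GPU-only share of the 160 W board power
tflops_rumour = 25.0

print(f"6600 XT : {tflops_6600xt / gpu_power_w:.2f} TFLOPs/W")
print(f"Rumour  : {tflops_rumour / gpu_power_w:.2f} TFLOPs/W "
      f"({tflops_rumour / tflops_6600xt:.1f}x the efficiency)")
```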
     
  14. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Good news!
    It's both a shrink and a new uArch.
     
  15. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
    If you understood the thread at this point, you should have considered that:

- RDNA3 is a new architecture on a new process node (5nm), where only the cache part is on 6nm (and caches are not the bulk of the power consumption)
    - The RX6600XT is clocked quite high, whereas the N31 clocks are supposed to be more conservative, hence at a much better point of the voltage/frequency curve, especially considering a new process that TSMC says can reach higher speeds.
    - N31 is supposed to have a higher power consumption than N21 anyway, while recent leaks point to the next high-end Nvidia card going over 400W, and that is the competition this MCM GPU will face
     
    Lightman and BRiT like this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
In the latest UE5 demo, a Series X achieves 1080p30, while a 3090 / 6900XT achieves 1080p60, with unoptimized PC code.

They will populate the screen with more characters, props and details, draw distance will be expanded, and physics will get more complex; this should bring a modern 8-core CPU to its knees even at 30fps.
     
    PSman1700 and DegustatoR like this.
  17. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    605
    Likes Received:
    1,126
RDNA2 is a new architecture, too. Yet with 70% more transistors than RDNA1 it is only ~30% more efficient. The 6600XT has a 128-bit interface but uses 160W. A 3060 has 40% more off-chip bandwidth and nearly the same efficiency.
    Navi23 is optimized for 1080p without raytracing and yet it is only slightly better than a 3060. Even the PS5 SoC is more efficient and overall a better chip.

    Don't believe it. Even the 350W of the 3090 and 3080 Ti is way too high.
     
    PSman1700 likes this.
  18. Bondrewd

    Veteran

    Joined:
    Sep 16, 2017
    Messages:
    1,682
    Likes Received:
    846
    Wut.
    Hugging the fmax is very, very nice.
    Who cares, AD102 is >450W.
     
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
N23 over N10 is 760M more transistors, or +7.3%.
    Power is down from 225W to 160W, or -29%.
     
  20. Leoneazzurro5

    Regular

    Joined:
    Aug 18, 2020
    Messages:
    335
    Likes Received:
    348
Frankly, you are comparing apples with oranges. RDNA2 is 30% (or MORE, as the comparison between N21 and N10 should have shown) more efficient than RDNA1 on the very same process. If you want perf/W, you need to spend something for it; nothing is free in the engineering world. Also, the 6600XT and the PS5 SoC have different clocks, different targets and different performance, even if the PS5 integrates a (mobile) Zen2 CPU. If the 6600XT had been clocked lower, it would have had way lower power consumption, and they are based on the same architecture. So you are saying that RDNA2 is better than RDNA2. Rigghttt. To me, it seems you are only trying to bash AMD without a minimal understanding of tech and tech compromises.

Lol, so Nvidia is able to put 144 SMs on their next gen, but these will magically consume much less because? Leakers are quite uniform on this point.
     
    #680 Leoneazzurro5, Jul 30, 2021
    Last edited: Jul 30, 2021


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.