RDNA3 Efficiency [Spinoff from RDNA4]

Arun

Unknown.
Moderator
Legend
[Albuquerquefx] Mod Mode: Arun's post seemed like the best place to start the spin-off RDNA3 efficiency thread. I've done a little bit of creative editing of some posts to keep the flow mostly linked to RDNA3 efficiency talk and less about "outlier" things. Please keep it civil...

I think it's fair to say RDNA3 has better power efficiency than RDNA2 probably even at iso-process (for non-raytracing especially); it just isn't "better enough". So how can it be an outlier, unless you're claiming AMD typically improves perf/W more than they did from RDNA2->3 (not sure that's been true in a long time, except maybe for RDNA1), or that AMD typically has the perf/W crown (the last time they did was arguably GCN vs Kepler, and definitely the RV670/RV770/RV870 era, where they had an unambiguous perf/mm2 and perf/W advantage in most markets)?

Let's face it: 18 months is a long time in a high functioning engineering organisation. A lot can change, a lot can improve, through either rearchitecture or low-level optimisation. So it's always dangerous to underestimate the competition, and it's always wise to consider David Hume's Problem of Induction. The next R300/RV670/RV770 (or G80) could come out of nowhere. But to call RDNA3 an outlier seems to be pushing it a little bit in my opinion...

I'd rather know how RDNA4 compares to RDNA3 than think about the competitive positioning at this point, really!
 
Last edited by a moderator:
I think it's fair to say RDNA3 has better power efficiency than RDNA2 probably even at iso-process
No.
Area, yes, power just no.
N33 is ~same perf for ~same power, the main difference is some 29mm^2 less area.
But to call RDNA3 an outlier seems to be pushing it a little bit in my opinion...
It's an outlier because they missed their publicly stated PPW targets.
Which is why RDNA4 has none communicated.
 
Yeah consult their FAD2022 slides about RDNA3 and then see the final result.

No.
Area, yes, power just no.
N33 is ~same perf for ~same power, the main difference is some 29mm^2 less area.
But we still don't know what caused that issue with RDNA3. Was it the chiplet design? I don't think so, because the RX 7600 suffers from the same thing. Or some issue at the silicon level, then? Maybe... Is it an architecture flaw/bug? Guess what...
I remember back then, when the first numbers showed up, it was you who said that AMD had identified a bug in the chip design and that it should be easy to fix, right? So why didn't it become a reality? sigh
 
Yeah consult their FAD2022 slides about RDNA3 and then see the final result.
Slideware aside, where AMD says it's the same 54% improvement gen over gen as last time, Computerbase's comparisons (and TechPowerUp's as well, ~one third better ppw for 7900 XTX vs. 6900XT and 6950 XT vs. 5700 XT), for example, seem to disagree with your unbacked statement.
Going by WQHD resolution, because RDNA1 only had 8 GByte and suffered disproportionately when going to UHD, RDNA2 was around 33% more efficient (5700 XT vs. 6900 XT). That went up to 47% in UHD resolution, though, because the 8 GByte framebuffer held the fps back disproportionately on RDNA1.

Going to RDNA3 vs. RDNA2 here, we see 34% better perf/watt for 7900 XTX compared to 6950 XT, which is anything but an outlier.

So based on the provided data points, I doubt your unbacked sentiment: RDNA3 is not an outlier wrt perf/watt compared to other recent Radeon generations.
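
For readers who want to check generational perf/W figures like the ones above against their own numbers, here is a minimal sketch of the arithmetic. The function names and the fps/watt inputs are hypothetical placeholders for illustration only; they are not taken from the Computerbase or TPU reviews.

```python
def perf_per_watt(fps: float, watts: float) -> float:
    """Performance per watt for one card in one test."""
    return fps / watts


def ppw_gain_percent(new_fps: float, new_watts: float,
                     old_fps: float, old_watts: float) -> float:
    """Percentage perf/W improvement of the new card over the old one."""
    ratio = perf_per_watt(new_fps, new_watts) / perf_per_watt(old_fps, old_watts)
    return (ratio - 1.0) * 100.0


# Placeholder inputs, NOT measurements: 50% more fps at 20% more power.
gain = ppw_gain_percent(new_fps=150.0, new_watts=300.0,
                        old_fps=100.0, old_watts=250.0)
print(f"perf/W gain: {gain:.1f}%")
```

With these placeholder inputs, a 50% performance uplift at 20% higher board power works out to a 25% perf/W gain, which is why the headline performance number and the efficiency number can differ so much.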
 
Framebuffer doesn't take up a huge amount of RAM, even at 4K. A full 64-bit render target is just ~64 MB; even 10 of those (that'd be huge) would be just over half a gig: tons of bandwidth, but not a huge pressure on RAM.
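
The render target figure is easy to verify. This is a minimal sketch that assumes an uncompressed 3840x2160 target with no MSAA, padding, or mip chain; the helper name `render_target_bytes` is made up for the example.

```python
MIB = 1024 * 1024


def render_target_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one uncompressed render target (no MSAA, padding or mips)."""
    return width * height * bits_per_pixel // 8


one_target = render_target_bytes(3840, 2160, 64)  # 4K UHD, 64 bits per pixel
ten_targets = 10 * one_target

print(f"one target:  {one_target / MIB:.1f} MiB")
print(f"ten targets: {ten_targets / MIB:.1f} MiB")
```

That comes to roughly 63 MiB per target and about 633 MiB for ten, in line with the "just over half a gig" estimate.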

RDNA3 is provably under target simply because the target clock increases were higher and were not hit. You can see they were aiming for about 2.75 GHz or so, but couldn't hit anything but 2.5 GHz with bad yields.
 
When talking about the graphics card as a physical object, framebuffer is often used as a synonym for graphics card memory, even though the two mean different things on the software side. I apologize if that caused confusion.

As my links have shown, RDNA3 is on the same trajectory compared to RDNA2 wrt perf/w. Refer to the TPU links if you will and want to directly compare UHD resolution as well. If you disagree, feel free to provide evidence to the contrary.
 
You can reduce clocks and voltage to exchange perf/mm^2 for perf/W
But you can't.
RDNA3 already sits close to Vmin for most (all) parts.
7600 XT has similar perf/W to 6600 XT (but much better perf/mm^2) while 7600 has ~20% better perf/W (and still slightly better perf/mm^2)
CP2077 is one of the few titles where it doesn't fall apart (somehow).
 
But you can't.
RDNA3 already sits close to Vmin for most (all) parts.

Not at full tilt it doesn't.
From TPU's 7900xtx review: https://www.techpowerup.com/review/amd-radeon-rx-7900-xtx/38.html

Pretty sure what Qesa's getting at is that if you are willing to burn a lot of die area, Apple-style, you could add 50% more functional units, and then run those at a lower, more efficient point on the V/F curve, which generally ends up with lower power consumption for the same performance level.

RDNA3 spends plenty of time up at 1.000v and beyond, it's not like it spends most of its time in gameplay down near 0.750v.
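
As a rough illustration of the wide-and-slow argument, the sketch below uses the textbook dynamic power approximation P ~ C * V^2 * f and assumes throughput scales perfectly with unit count and clock. The two operating points are hypothetical and are not measured from any RDNA3 part; leakage and power-limit behaviour are ignored.

```python
def dynamic_power(units: int, volts: float, ghz: float) -> float:
    """Relative dynamic power, P ~ units * V^2 * f (leakage ignored)."""
    return units * volts ** 2 * ghz


def throughput(units: int, ghz: float) -> float:
    """Relative throughput, assuming perfect scaling with units and clock."""
    return units * ghz


# Hypothetical operating points, not taken from any real V/F curve.
narrow_perf = throughput(units=100, ghz=2.4)
narrow_power = dynamic_power(units=100, volts=1.00, ghz=2.4)

# 50% more units at 2/3 of the clock, at an assumed lower voltage.
wide_perf = throughput(units=150, ghz=1.6)
wide_power = dynamic_power(units=150, volts=0.80, ghz=1.6)

print(f"relative performance: {wide_perf / narrow_perf:.2f}")
print(f"relative power:       {wide_power / narrow_power:.2f}")
```

Under these assumptions the wider configuration matches the narrow one's performance at about 64% of the dynamic power. Whether a given chip can actually drop to the lower voltage is exactly what is being argued in this thread.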

[Attached image: 1709752055653.png]
 
But you can't.
RDNA3 already sits close to Vmin for most (all) parts.
That can't possibly be true for desktop parts. I couldn't find very precise information, but online discussions on undervolting make me think voltage is typically way above 1.0v, which is already wayyyyy above 4nm Vmin (e.g. H100 goes down to 0.675v for low frequencies while AD102 is limited to 0.875v minimum).

EDIT: Ninjaed by T2098
 
Not at full tilt it doesn't.
Heavily depends on the game.
It's very weird.
Pretty sure what Qesa's getting at is that if you are willing to burn a lot of die area, Apple-style, you could add 50% more functional units, and then run those at a lower, more efficient point on the V/F curve
You can't for RDNA3, not the current one anyway, it chugs amps, see 12/15W PHX1 results.
It's weird.
RDNA3 spends plenty of time up at 1.000v and beyond,
Ehhh not really, it's like .9v in gaming most of the time for N5 parts due to power limit hits.
but online discussions on undervolting make me think voltage is typically way above 1.0v
No, kinda the issue, only like >400W AIBs sit there.
350W ref sits below 1v due to power limit smashing.
 
Heavily depends on the game.
It's very weird.

You can't for RDNA3, not the current one anyway, it chugs amps, see 12/15W PHX1 results.
It's weird.

Ehhh not really, it's like .9v in gaming most of the time for N5 parts due to power limit hits.

No, kinda the issue, only like >400W AIBs sit there.
350W ref sits below 1v due to power limit smashing.
The graph I posted was for the reference design - the AIB ones as you say are much closer to the ragged edge of the V/F curve: https://www.techpowerup.com/review/sapphire-radeon-rx-7900-xtx-nitro/38.html
The AIB cards make the point even better.

[Attached image: 1709753789959.png]
 
the AIB ones as you say are much closer to the ragged edge of the V/F curve
Yes and they draw horrific watts while doing that.
The thing was made to run 1.1v across all workloads while drawing 350W, not approaching 500W while burning your house down.
The AIB cards make the point even better.
My point is that measuring weirdo stuff like RDNA3 on a single game (CP2077) is idiotic.
It floats all over the place smashing into that 350W power limit.
I really need to find the appropriate mandarin rune combo, someone in the sinosphere probably graphed N31 v/f and power relations properly.
 
My point is that measuring weirdo stuff like RDNA3 on a single game (CP2077) is idiotic.
It floats all over the place smashing into that 350W power limit.
I agree with the first line there, but so far as I know, TPU's V/F curves are made from data taken from their entire 1440p benchmarking suite, which is some 25 games, and a good mix of new and old.

The top part of the page referring to CP2077 is only for the energy efficiency part, not the V/F curves.
 
but so far as I know, TPU's V/F curves are made from data taken from their entire 1440p benchmarking suite, which is some 25 games, and a good mix of new and old.
Oh, sorry, brainfart, I mistook it for how goldenpig graphed PHX1 v/f.
Either way this thing smashes the power limit while clocking on average no faster than RDNA2 refreshes while sitting at lower volts. Phoenix1 in particular is guilty of that.
[Attached image: 1709755106850.png]
 
CP2077 is one of the few titles where it doesn't fall apart (somehow).

I'm still waiting for any kind of proof or data point for your argument that's not based on your own personal opinion. Please add signal to the noise.
Two more data points:
 
I'm still waiting for any kind of proof or data point for your argument that's not based on your own personal opinion. Please add signal to the noise.
Two more data points:
a) N31 isn't more efficient than AD102/103 at any point so guru3d chart hits the fucking trashcan (did you really dump the data without any sanity checks?).
b) [Attached image: 1709762833896.png]
At iso config (same WGP count, same membw) against a more reasonably clocked N21 part (6800 runs ~1v versus .9v for 7800XT here), the PPW bump is 15%, aka the shrink.
Worse, this is CP2077, one of the two good cases for RDNA3.
Juice 1v into 7800XT and watch the ppw delta evaporate.
 
RDNA3 is a return to the status quo where AMD pushes its flagship chip to beat the second-best from Nvidia, which is comfortably clocked since they have a bigger chip that is >=50% larger.
 
RDNA3 is a return to the status quo where AMD pushes its flagship chip to beat the second-best from Nvidia, which is comfortably clocked since they have a bigger chip that is >=50% larger.
But it's just not true.
In no particular order, operating voltages of 4 GPU gens across both vendors. Find the outlier.
[Attached images: 1709774996618.png, 1709775002659.png, 1709775010049.png, 1709775018324.png, 1709775039259.png, 1709775048045.png, 1709775066561.png, 1709775160876.png]
 