AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Great article.
This page on CU scaling is also very useful:
https://www.computerbase.de/2021-03/amd-radeon-rdna2-rdna-gcn-ipc-cu-vergleich/2/

It might explain some of the issues XSX is having here. Reducing the resolution for XSX is not necessarily the answer, though the lower clock speeds are likely an impediment to performance.

It would be interesting to see some tests of how performance scales with clock speed and whether that's consistent across the number of CUs. So test at e.g. 1.9GHz, 2.0GHz, 2.1GHz etc. across a range of widths (40, 72, 80 CUs). If clock scales pretty linearly and width does not, then you can start making pretty good estimates of performance.
 
confirmation of Cerny talks about narrow and fast vs wide and slower

It could also be interesting to put a RX6800 vs RX6700XT at the same compute throughput (e.g. 6800 @1330MHz vs. 6700XT @2000MHz).

Or if we want a comparison that is closer to the consoles, we could have the 6700XT at 2GHz (10.2 TFLOPs) vs. a RX6800 at 1576MHz (12.1 TFLOPs).
This wouldn't be a great comparison because the RX6800 AFAIK has twice the shader engines, but it could give us some insights on which engines will favor the higher compute throughput over the higher clockrate nonetheless.
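For anyone wanting to check the clock/TFLOPs pairs above, here is a rough sketch of the arithmetic. It assumes the usual RDNA2 figures of 64 FP32 lanes per CU and 2 FLOPs per lane per clock (one FMA), with 40 CUs for the 6700XT and 60 CUs for the RX6800. The function names are made up for illustration.

```python
def fp32_tflops(cus, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    """Peak FP32 throughput in TFLOPs (assumes one FMA per lane per clock)."""
    return cus * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12


def iso_throughput_clock_mhz(cus, ref_cus, ref_clock_mhz):
    """Clock at which a `cus`-wide GPU matches the peak throughput of the reference."""
    return ref_cus * ref_clock_mhz / cus


# Console-like comparison from the post: 6700XT @ 2000MHz vs RX6800 @ 1576MHz
print(f"6700XT @ 2000MHz: {fp32_tflops(40, 2000):.1f} TFLOPs")
print(f"RX6800 @ 1576MHz: {fp32_tflops(60, 1576):.1f} TFLOPs")

# Equal-throughput comparison: where does a 60 CU part match 40 CUs at 2000MHz?
print(f"RX6800 iso-throughput clock: {iso_throughput_clock_mhz(60, 40, 2000):.0f} MHz")
```

This only covers peak ALU throughput. It says nothing about front end width, memory bandwidth or cache, which is exactly where the two cards differ.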
 
confirmation of Cerny talks about narrow and fast vs wide and slower

I'm not sure that's true because you're only widening the compute portion of the GPU. If you were to widen the front and back ends in equal measure we may see a different story. Also, how is memory bandwidth hindering the scaling? We might see similarly non-linear scaling if clock speeds were looked at in this way too without corresponding memory bandwidth increases. Certainly core overclocks on GPUs very rarely (basically never) result in a 1:1 performance uplift.
 
confirmation of Cerny talks about narrow and fast vs wide and slower
It's not a fast/narrow and wide/slow argument.

wider / slow is exactly why we continually obtain more performance.

However, XSX is disproportionately wider on ALU vs its front end. If they had made the front end wider, as the 6800+ series of GPUs do, then it wouldn't be an issue.
 
wider / slow is exactly why we continually obtain more performance
It's how AMD's GCN and Nvidia Turing+Ampere had been achieving better power efficiency, but it certainly isn't the case for RDNA2 or Pascal.

Regardless, GPUs have clearly been steadily raising their clocks. In 2012 the best we'd get was a little over 1GHz on 28nm, but on 7nm RDNA2 is at ~2.5x that clockrate on similarly sized chips and power consumption.



However, XSX is disproportionately wider on ALU vs its front end. If they had made the front end wider, as the 6800+ series of GPUs do, then it wouldn't be an issue.
Anandtech's piece on the SeriesX SoC points to Microsoft giving a big priority to reaching 12TFLOPs (at some point they considered more CUs at a lower clock rate which would consume 20% less power). This priority might have been more related to the HPC loads they're planning for the chip on Azure servers than for gaming.
 
In a disappointing blow to all the 78 people on the planet who own an RDNA2 card to actually play games, AMD's Scott Herkelman told PC World that FidelityFX Super Resolution (officially now called FSR) is still going to take some time to arrive but "they're confident they can bring it this year", which makes the old pessimist in me assume it's coming late December 2021 or January 2022.
He also said it may not even use machine learning at all.
Do I dare to ask for a timestamp?
 
I'm not sure that's true because you're only widening the compute portion of the GPU. If you were to widen the front and back ends in equal measure we may see a different story. Also, how is memory bandwidth hindering the scaling? We might see similarly non-linear scaling if clock speeds were looked at in this way too without corresponding memory bandwidth increases. Certainly core overclocks on GPUs very rarely (basically never) result in a 1:1 performance uplift.
In the end it shows that performance doesn't scale perfectly with more CUs, which would explain why the PS5 is so close to the XSX.
 
It's not a fast/narrow and wide/slow argument.

wider / slow is exactly why we continually obtain more performance.

However, XSX is disproportionately wider on ALU vs its front end. If they had made the front end wider, as the 6800+ series of GPUs do, then it wouldn't be an issue.

The 6700XT vs the 6800 is an interesting comparison in this regard since all parts of the GPU have been widened equally (by 50%) as far as I can tell. However the 34% higher clock speed of the 6700XT (game clock to game clock) equalises things to the point of the 6800 being only 12% faster on paper. So if clock is more important for overall performance than width I'd expect a less than 12% performance increase if memory bandwidth were equal. Unfortunately memory bandwidth is 33% higher on the 6800 (both main memory and IC) and it has 33% more infinity cache to boot so that would probably throw the comparison off, especially at higher resolutions.

However TPU does give the following performance uplifts:

1080p - 12%
1440p - 17%
2160p - 24%

Memory bandwidth is probably limiting performance a lot at 4K and possibly also somewhat at 1440p, but at 1080p the 6700XT should have more than enough bandwidth and we're still seeing a 12% uplift for the 6800 here. And that's probably being suppressed a little by CPU bottlenecks at that resolution.

So it's hardly conclusive but I see no evidence at all here to suggest fast and narrow is better than wide and slow provided all parts of the GPU are widened equally. In fact the evidence hints that the opposite may be true, at least in this single example.
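To put numbers on that, here is a quick sketch of the on-paper vs measured comparison. It only uses the figures quoted in this post (50% wider, 34% higher game clock, and the TPU uplifts); the function name and the "ratio" column are my own illustration, not anything TPU publishes.

```python
def paper_uplift(width_ratio, clock_ratio):
    """Peak compute advantage of the wide card over the narrow, faster-clocked one."""
    return width_ratio / clock_ratio


# RX6800 vs 6700XT: 1.50x the width, while the 6700XT clocks 1.34x higher
paper = paper_uplift(1.50, 1.34)

# Measured RX6800 uplift over the 6700XT, as quoted from TPU above
measured = {"1080p": 1.12, "1440p": 1.17, "2160p": 1.24}

for res, uplift in measured.items():
    # ratio > 1 means the wide card gains more than its compute advantage alone
    print(f"{res}: measured {uplift:.2f}x vs paper {paper:.2f}x, ratio {uplift / paper:.2f}")
```

A ratio at or above 1.0 at every resolution means the wider card never does worse than its paper compute advantage. The growing gap at higher resolutions would fit with the 6800's extra bandwidth and infinity cache, but this arithmetic can't separate those effects out.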
 
PC GPUs aren't hampered like the PS5. They go both very wide and very, very fast (over 2.6GHz, with boosts closing in on 3GHz), with Infinity Cache to help things out and gobs of power and temperature budget.
 