It wasn't a perfect test, no, but it's the best we have (unless you know of a better article?), and it's likely close enough to reality to be useful when discussing CU scaling for RDNA2-based GPUs.
It's also worth remembering that the lower-CU cards have a smaller memory bus, so they have less bandwidth than the 60+ CU cards. Bandwidth was not equalized between the cards.
Infinity Cache (IC) also does very little at 1080p, so running at 2 GHz wouldn't be an issue, and 1080p shows roughly the same CU scaling as 4K does.
There won't be a perfect test, and I don't rule out that the number is somewhat useful for discussing this. But the reason I raised the point to begin with is that, technically speaking, adding more CUs is not the issue.
As for the tests themselves, we don't need equalized bandwidth between cards; we want equalized bandwidth per CU.
The 6700XT has 1.5 TB/s of bandwidth for 40 CUs.
The 6800XT and 6900XT have 2.0 TB/s of bandwidth for 72 and 80 CUs respectively.
There's no contest over which CUs are being supplied better.
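To make that comparison concrete, here is a small sketch that divides the cache-inclusive bandwidth figures quoted above by each card's CU count. The 1.5 TB/s and 2.0 TB/s numbers are the ones used in this thread, not independently verified specs, and the helper name `per_cu` is just illustrative.

```python
# Cache-inclusive ("effective") bandwidth in GB/s and CU counts,
# using the figures quoted in this thread (assumed, not official specs).
effective_bw = {
    "6700XT": (1500, 40),
    "6800XT": (2000, 72),
    "6900XT": (2000, 80),
}

def per_cu(bw_gbs, cus):
    """Bandwidth available to each CU, in GB/s."""
    return bw_gbs / cus

for card, (bw, cus) in effective_bw.items():
    print(f"{card}: {per_cu(bw, cus):.1f} GB/s per CU")
```

With these inputs the 6700XT works out to 37.5 GB/s per CU versus roughly 27.8 and 25.0 for the 6800XT and 6900XT, which is the gap being described.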
Without the cache, they only have 512 GB/s of bandwidth. That's actually less than the XSX with 560 GB/s.
Meanwhile the 6700XT has 384 GB/s plus 1.5 TB/s of cache, compared to the PS5, which has 448 GB/s for 36 CUs. Here the gap is only about 64 GB/s. To double the memory system of a 6700XT while doubling the compute power, you would actually need over 768 GB/s of off-chip bandwidth plus 3 TB/s of cache. With the cache removed, the 6800/6900 series is pretty pitiful in bandwidth per CU, so pitiful it's hard to believe: looking at off-chip bandwidth alone, the PS5 has nearly 2x the bandwidth per CU.
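The arithmetic behind those claims can be sketched as follows, assuming the 6700XT figures quoted above (384 GB/s off-chip, 1.5 TB/s cache, 40 CUs), the PS5 at 448 GB/s for 36 CUs, and the 6900XT at 512 GB/s for 80 CUs. "Doubling the memory system" is modeled simply as scaling both bandwidth figures by the same factor as the CU count.

```python
# Off-chip (GDDR6) bandwidth in GB/s, cache bandwidth in GB/s, and CU counts,
# as cited in this post (the 1500 GB/s cache figure is the thread's number).
RX6700XT_DRAM, RX6700XT_CACHE, RX6700XT_CUS = 384, 1500, 40
PS5_DRAM, PS5_CUS = 448, 36
RX6900XT_DRAM, RX6900XT_CUS = 512, 80

# Difference in off-chip bandwidth between the PS5 and the 6700XT.
gap = PS5_DRAM - RX6700XT_DRAM

# Going from 40 to 80 CUs while keeping per-CU supply equal means scaling
# both off-chip and cache bandwidth by the same factor.
scale = RX6900XT_CUS / RX6700XT_CUS
needed_dram = RX6700XT_DRAM * scale
needed_cache = RX6700XT_CACHE * scale

# Off-chip bandwidth per CU: PS5 relative to the 6900XT.
ratio = (PS5_DRAM / PS5_CUS) / (RX6900XT_DRAM / RX6900XT_CUS)

print(gap, needed_dram, needed_cache, round(ratio, 2))
```

This prints `64 768.0 3000.0 1.94`: a 64 GB/s gap, 768 GB/s plus 3 TB/s needed for a true doubling, and the PS5 at about 1.94x the 6900XT's off-chip bandwidth per CU.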
The point of my post was to showcase that CU scaling works and can work perfectly provided the data can be fed to the CUs properly, which is a function of both programming and of course consumption of available bandwidth.
Performance doesn't get worse just because you added in more CUs. They just aren't fed well because the cost of bandwidth is significantly higher than the cost of adding more ALUs. It's not an architectural thing, it's a cost thing.
We add more levels of cache to the SoC to reduce how often we have to hit main memory, because that's cheaper than building more and more off-chip bandwidth.
Of the group, if we remove cache entirely, the PS5 still has the most off-chip bandwidth per CU, followed by the XSX, then the 6700XT, then the 6800 and 6900 series.
CU scaling tests will always favor the cards with fewer CUs in this case for those reasons; it really comes down to where the bottlenecks are.
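The ranking can be checked with a quick sketch that ignores cache entirely and divides off-chip bandwidth by CU count. It assumes the XSX's 560 GB/s pool with 52 CUs and includes the 60-CU RX 6800 alongside the 6800XT and 6900XT, all on a 512 GB/s bus.

```python
# Off-chip memory bandwidth (GB/s) and CU counts; cache ignored entirely.
# XSX uses its 560 GB/s pool; "6800" is the 60-CU non-XT card.
systems = {
    "PS5": (448, 36),
    "XSX": (560, 52),
    "6700XT": (384, 40),
    "6800": (512, 60),
    "6800XT": (512, 72),
    "6900XT": (512, 80),
}

# Sort from most to least off-chip bandwidth per CU.
ranking = sorted(systems, key=lambda s: systems[s][0] / systems[s][1], reverse=True)

for name in ranking:
    bw, cus = systems[name]
    print(f"{name:7s} {bw / cus:5.2f} GB/s per CU")
```

The resulting order is PS5 (about 12.44), XSX (10.77), 6700XT (9.60), 6800 (8.53), 6800XT (7.11) and 6900XT (6.40) GB/s per CU.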