Bandwidth requirements are resolution * framerate. If you double the framerate, you double the bandwidth regardless of resolution. The whole game is unlikely to fit into the 128MB IC at ultra settings; I think you were thinking of footprint requirements, which are a result of resolution. Since we know they cannot fit everything into 128MB, they are constantly hitting off-chip memory for everything else, so off-chip memory is a massive factor, and obviously the CUs rely heavily on the IC to make up for how little off-chip memory bandwidth is available. Normalizing the bandwidth on the IC would penalize it unfairly, because it needs as much bandwidth as possible to go faster.
But again, ComputerBase's results completely go against what you're saying.
IC has little effect at 1080p and becomes more important as the resolution increases, and yet in their testing 1080p still shows roughly the same 70% CU scaling as 4K does.
If it were a bandwidth issue, 1080p would scale the most (provided it's not CPU limited, etc.), as there's more bandwidth available, but it doesn't. That indicates bandwidth isn't really the issue.
Looking at your TechPowerUp example, specifically the overclocking section where they show average clocks at stock, it would seem that the clock speeds were not a match.
So it's not a CU scaling example but a CU+clock speed scaling example.
If increasing bandwidth results in increased performance, then bandwidth is the bottleneck. This is why, if you want to do CU scaling, you must normalize bandwidth per CU. If you look at the off-chip bandwidth per CU, the 6800XT has 7.1 GB/s per CU and the 6700XT has 9.6 GB/s per CU. 7.1 / 9.6 is about 74%, very close to your 72% scaling. Or put another way, 512 GB/s is only 33% more than 384 GB/s, once again very close to the "Despite 50 percent more compute units in the Radeon RX 6800 and a good deal more memory bandwidth, the performance only increases by an average of 28 percent compared to the 40 CU configuration."
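To make that arithmetic easy to check, here is a rough Python sketch. The 512 GB/s and 384 GB/s figures are the ones quoted above; the CU counts (72 for the 6800XT, 40 for the 6700XT) are my assumption from public spec sheets, and the variable names are just for illustration.

```python
# Off-chip bandwidth (GB/s) as quoted in the post.
# CU counts are an assumption taken from public spec sheets.
cards = {
    "6800XT": {"bandwidth_gbs": 512, "cus": 72},
    "6700XT": {"bandwidth_gbs": 384, "cus": 40},
}


def bandwidth_per_cu(card):
    """Off-chip bandwidth available to each CU, in GB/s."""
    return card["bandwidth_gbs"] / card["cus"]


per_cu = {name: bandwidth_per_cu(card) for name, card in cards.items()}

# How much per-CU bandwidth the bigger card has relative to the smaller one.
ratio = per_cu["6800XT"] / per_cu["6700XT"]

# How much more total off-chip bandwidth the bigger card has.
extra_bw = cards["6800XT"]["bandwidth_gbs"] / cards["6700XT"]["bandwidth_gbs"] - 1

for name, value in per_cu.items():
    print(f"{name}: {value:.1f} GB/s per CU")
print(f"per-CU bandwidth ratio: {ratio:.0%}")
print(f"extra total bandwidth: {extra_bw:.0%}")
```

This only looks at off-chip bandwidth; it says nothing about what the IC contributes.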
"A good deal more bandwidth" is relative, because that bandwidth gets used up outputting more frames per second. If you have unlimited bandwidth, you move the bottleneck to compute and processing. If you have more compute and processing available than bandwidth, the bottleneck is bandwidth. The only way to measure CU scaling is to either max out everything else or normalize everything else, especially the bandwidth per CU, if that's what you want to measure, and that wasn't done. But of course there are other factors too, like fixed-function hardware.
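As a toy illustration of that argument (not a model of a real GPU), the sketch below caps frame rate at whichever of compute or off-chip bandwidth runs out first. The `fps_per_cu` and `gb_per_frame` values are made-up numbers chosen only to show the effect, the CU counts are assumed from public specs, and the IC and fixed-function hardware are ignored entirely.

```python
def achievable_fps(cus, fps_per_cu, bandwidth_gbs, gb_per_frame):
    """Return (fps, limiting factor) for a simple two-limit model."""
    compute_limit = cus * fps_per_cu                # frames/s the CUs could produce
    bandwidth_limit = bandwidth_gbs / gb_per_frame  # frames/s the memory can feed
    limiter = "compute" if compute_limit <= bandwidth_limit else "bandwidth"
    return min(compute_limit, bandwidth_limit), limiter


# Hypothetical workload: 2 fps per CU, 4 GB of off-chip traffic per frame.
FPS_PER_CU = 2
GB_PER_FRAME = 4

small = achievable_fps(40, FPS_PER_CU, 384, GB_PER_FRAME)
big = achievable_fps(72, FPS_PER_CU, 512, GB_PER_FRAME)

# Same 72 CUs, but with bandwidth normalized to the smaller card's 9.6 GB/s per CU.
big_normalized = achievable_fps(72, FPS_PER_CU, 72 * 9.6, GB_PER_FRAME)

for label, (fps, limiter) in [
    ("40 CU, 384 GB/s", small),
    ("72 CU, 512 GB/s", big),
    ("72 CU, normalized bandwidth", big_normalized),
]:
    print(f"{label}: {fps:.0f} fps, {limiter}-bound")
```

With these made-up numbers, 1.8x the CUs gives only 1.6x the frame rate until bandwidth per CU is matched, at which point the full 1.8x shows up. Real cards won't behave this cleanly, but it shows why unmatched bandwidth muddies a CU scaling comparison.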