Would Navi 33 need 256MB of Infinity Cache to exceed 6900XT and compensate for half the GDDR bus width? I don't think so. 128MB, the same as Navi 21 but clocked higher, should be enough to come out ahead in overall memory efficiency. And if we assume this GPU is aimed at 1440p gaming (rather than 4K), then that amount of Infinity Cache is comfortable at this performance level.
6600XT delivers essentially the same performance as 5700XT. The smaller bus (50%) and die (94%), along with 7% more transistors, give this performance while using 71% of the power. Infinity Cache is somewhere in the region of 2B of those transistors if we use the 6-transistors-per-bit rule of thumb and add a bit for supporting hardware.
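The 6-transistors-per-bit rule of thumb is easy to turn into a quick estimate. This is only a rough sketch: the function name and the `overhead` parameter are made up for illustration, and 32 MB is my assumption for the 6600XT's cache size (128 MB is the Navi 21 figure from above).

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_BIT = 6  # 6T SRAM cell rule of thumb


def sram_transistors(megabytes: float, overhead: float = 0.0) -> float:
    """Estimate the transistor count of an SRAM array of the given size.

    `overhead` is a hypothetical fractional allowance for tags, control
    logic and other supporting hardware (0.2 means +20%).
    """
    bits = megabytes * 1024 * 1024 * BITS_PER_BYTE
    return bits * TRANSISTORS_PER_BIT * (1.0 + overhead)


# 32 MB is an assumed figure for the 6600XT; 128 MB matches Navi 21.
for size_mb in (32, 128):
    count = sram_transistors(size_mb)
    print(f"{size_mb} MB -> {count / 1e9:.2f}B transistors (cells only)")
```

The result covers the bit cells only, so whatever you allow for supporting hardware goes on top via `overhead`.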
We could conclude that the narrower memory bus is allowing Navi 2x to spend more power on die for actual graphics work and that the power consumption of Infinity Cache is practically negligible, since most of those transistors are "idle" at any one time.
But Navi 3x won't get this boost a second time, because Infinity Cache is already part of the Navi 2x baseline. So we come back to 128MB, on its own, as determining the performance of Navi 33.
I think it's reasonable to expect Navi 33 to be about 320-350mm². This article about TSMC 6nm, "TSMC Reveals 6 nm Process Technology: 7 nm with Higher Transistor Density" (anandtech.com), implies 15% more transistors. I suppose that puts it in the region of 21-23B transistors, assuming that some of those extra transistors would be available because of a reduction in GDDR PHY size from 192-bit to 128-bit (saves about 16mm²?). That's about 5B short of the transistor count of Navi 21 (6900XT).
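As a rough cross-check on that budget, here is the simplest possible scaling: take a 7nm density, apply the 15% gain, multiply by the die area. The Navi 22 figures (17.2B transistors, 335mm²) are my assumed baseline, not something from the article, and the function name is just for illustration.

```python
# Assumed baseline: Navi 22 at roughly 17.2B transistors on 335 mm^2.
NAVI22_TRANSISTORS = 17.2e9
NAVI22_AREA_MM2 = 335.0
DENSITY_GAIN_6NM = 0.15  # the 15% figure quoted above


def transistor_budget(area_mm2: float,
                      base_transistors: float = NAVI22_TRANSISTORS,
                      base_area_mm2: float = NAVI22_AREA_MM2,
                      density_gain: float = DENSITY_GAIN_6NM) -> float:
    """Scale the baseline's average density by the process gain and die area."""
    base_density = base_transistors / base_area_mm2  # transistors per mm^2
    return area_mm2 * base_density * (1.0 + density_gain)


for area in (320, 350):
    print(f"{area} mm^2 -> {transistor_budget(area) / 1e9:.1f}B transistors")
```

With these inputs the plain scaling comes out a little under the 21-23B range, which is where the assumption about reclaiming GDDR PHY area for denser logic comes in. Average density hides a lot, so treat this as ballpark only.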
So the extra transistors available for GPU work, versus Navi 22 (6700XT), come to about 5B. That's not going to take Navi 33 to the equivalent of 64 CUs, only about 52, which implies much higher clocks, e.g. 3.2GHz+.
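The arithmetic behind the CU and clock figures can be sketched like this. It assumes performance scales linearly with CUs × clock, which is a simplification, and the Navi 22 transistor count (17.2B) and the two 6900XT sustained clocks are my assumed inputs. The helper names are made up.

```python
def equivalent_cus(base_cus: int, base_transistors_b: float,
                   extra_transistors_b: float) -> float:
    """Scale CU count in proportion to the total transistor budget."""
    total = base_transistors_b + extra_transistors_b
    return base_cus * total / base_transistors_b


def clock_to_match(target_cus: int, target_clock_ghz: float, cus: float) -> float:
    """Clock needed for `cus` CUs to match target_cus * target_clock throughput."""
    return target_clock_ghz * target_cus / cus


# Navi 22: 40 CUs, assumed 17.2B transistors, plus about 5B extra.
cus = equivalent_cus(40, 17.2, 5.0)
print(f"equivalent CUs: {cus:.1f}")

# 6900XT: 80 CUs; 2.0 and 2.25 GHz are assumed sustained clocks.
for clock in (2.0, 2.25):
    needed = clock_to_match(80, clock, 52)
    print(f"6900XT at {clock} GHz -> 52 CUs need about {needed:.2f} GHz")
```

Depending on which 6900XT clock you believe, the required clock lands either side of 3.2GHz, before counting any per-CU throughput gains.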
I do expect the ALU:TMU ratio to get doubled in Navi 3x, but that's not going to save a huge number of transistors. Perhaps a saving of 2% of an "equivalent-CU"?
It seems reasonable that ALU:RA will get halved or quartered in Navi 3x, but I wouldn't be surprised if that eats up all the savings of reduced TMU count. I'm assuming RA math doesn't use TMU math units, merely that data-paths and scheduling (queuing and coalescing memory operations) are largely common. If the TMU math units have been generalised to allow them to also do ray-box and ray-triangle tests, then I guess texturing throughput will continue rising to beyond-crazy levels, just to advance ray-acceleration throughput.
The big unknown is still the hints of a radically different WGP design. AMD's trend with RDNA has been to increase the quantity of "uncore" per WGP, so I think that would put a strong limit on the increase in "equivalent-CU" count.
In other words I think brute clocks are going to be more significant than brute SIMD count, in getting to 6900XT performance at 350mm² or less. Extra uncore may well unlock yet better "SIMD IPC".
We may also see that ">6900XT" actually only refers to ray-tracing performance. In pure rasterisation workloads Navi 33 might fall far short of 6900XT when ALU-limited.