There are really no rumors / insiders suggesting 160 CUs.
True. AMD's counter-espionage has hit truly spectacular levels though. NVidia has done much the same, though it seems AMD has won this round.
The die analyses I've done of Navi variants would be trivial for NVidia to do too (and NVidia has had a very long time to do them). I believe this is why NVidia has marketed "value" so heavily: it doesn't take a rocket scientist to see that GPU compute is uber-cheap now.
Honestly, I'm gobsmacked by the weak compute of the two consoles (16% of the XSX die is for CUs). I see this as a major fail. Or, built-in extreme obsolescence. Perhaps the next console gen will consist of 2 refreshes?
Plus, 4x Navi10 CUs at 2.1GHz wouldn't compete with a 3080/3090; it'd be considerably faster.
Why would 6900XTX with ~41TF (2GHz) be on a completely different level from 3090 at 36TF? Only NVidia is now allowed to have huge amounts of compute?
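For anyone who wants to check the TF figures, here is a rough sketch of the usual peak-FP32 arithmetic (lanes x 2 FLOPs per FMA x clock). The 64 lanes per CU is the Navi 1x layout; the 3090's 10496 FP32 lanes and ~1.7GHz boost are assumptions taken from NVidia's public spec sheet, and the function name is just illustrative.

```python
def peak_fp32_tflops(fp32_lanes: int, clock_ghz: float) -> float:
    """Peak FP32 rate, assuming every lane retires one FMA (2 FLOPs) per clock."""
    return fp32_lanes * 2 * clock_ghz / 1000.0


# Navi 1x layout: 2 x SIMD32 per CU, so 64 FP32 lanes per CU.
RDNA_LANES_PER_CU = 64

# Hypothetical 160 CU part at 2.0 GHz, as discussed in the thread.
navi_160cu_2ghz = peak_fp32_tflops(160 * RDNA_LANES_PER_CU, 2.0)

# Assumption: 10496 FP32 lanes at ~1.7 GHz boost (public 3090 spec, rounded).
rtx_3090 = peak_fp32_tflops(10496, 1.7)

print(f"160 CU @ 2.0 GHz:   {navi_160cu_2ghz:.1f} TF")
print(f"RTX 3090 @ 1.7 GHz: {rtx_3090:.1f} TF")
```

These are paper numbers only; they say nothing about how well either architecture keeps its lanes fed.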
160 CUs or more SIMDs per CU?
I've contemplated more SIMDs per CU. ALU:TEX could double, sure, but I wonder about LDS space and LDS versus VGPR/lane-mechanics too. To be honest, I have no theory for or against.
If both XSX and PS5 have RDNA 1 CUs, then yes, PC Navi could be a 4-SIMD per CU "monster", being the only "RDNA 2" GPU that has RDNA 2 CUs.
Also, I think a patent that talks about CUs sharing L1 encourages lots of CUs per L1. One thing I haven't been able to work out is whether an L1 is per shader engine or per shader array.
Because an L0 (and LDS) is shared by two CUs in a WGP, the patent should probably be read with WGPs in mind, not CUs. The WGP is the real unit of compute in RDNA, not the CU.
Additionally there are vague rumours saying that RDNA 2 is real RDNA, not the GCN/RDNA hybrid seen in Navi 1x. This could be interpreted to mean that any rumours that talk about Navi 2x CUs should be re-interpreted with WGP replacing CU.
Both would be monstrous but I don’t see how the former is possible unless they’ve hit theoretical peak density numbers on 7nm.
I don't understand what you mean by theoretical peak density numbers and why hitting them is relevant.
A Navi 10, 14 or XSX CU is ~2mm². A 5xxmm² die with ~150mm² of "cache" doesn't make sense to me. No matter how exciting the idea of a solid 128MB lump of last level cache, I can't take it seriously. I believe that a cache is a cache precisely because it's a small, efficient, block of memory.
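A back-of-envelope sketch of that area budget, using the ~2mm² per CU and ~150mm² of cache figures above. The 500mm² die size is only a placeholder for "5xx", and the function names are made up for illustration.

```python
CU_AREA_MM2 = 2.0  # ~2 mm² per CU, the estimate quoted in the post


def cu_area_mm2(cu_count: int, cu_area: float = CU_AREA_MM2) -> float:
    """Total die area spent on CUs."""
    return cu_count * cu_area


def remaining_mm2(die_mm2: float, cu_count: int, cache_mm2: float) -> float:
    """Area left for everything else (front ends, memory PHYs, display, etc.)."""
    return die_mm2 - cu_area_mm2(cu_count) - cache_mm2


# Placeholder die size: 500 mm² stands in for "5xx".
for cus in (80, 160):
    left = remaining_mm2(500.0, cus, 150.0)
    print(f"{cus} CUs: {cu_area_mm2(cus):.0f} mm² of CUs, {left:.0f} mm² left over")
```

With these assumed inputs, 160 CUs plus a 150mm² cache leaves almost nothing for the rest of the chip, while 80 CUs leaves a large chunk unaccounted for.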
160 CUs, 2.4 GHz clocks.
Silly season is in full force.
We've seen a die that's considerably larger than 500mm². So take your pick:
- massive last level cache
- lots of CUs, about the same FLOPS as EDIT: GA102
- HBM 4096-bit bus plus 512-bit GDDR6 bus
- some combination of these
Navi 21 scaled up from Navi 10 with 80 CUs and 4 shader engines is only about 360mm².
I would expect the full die to run at lower clocks.