anexanhume
I think the coherency fabric blocks have significant caches, based on the die shot. I don't have my post handy, but there are also the L0 to L2 caches across the cores and CUs. I think around 40MB or so was "obvious", but I couldn't account for the rest, even counting registers and the like.
edit: nvm
Here it was:
Zen 2
-------
L1 = (32kB I$+ 32kB D$) * 8 = 512kB
L2 = 512kB * 8 = 4MB
L3 = 2*4MB = 8MB
Total CPU = 12.5MB
Registers/AVX etc. ???
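A quick sketch that re-runs the Zen 2 arithmetic above, assuming 1MB = 1024kB and leaving registers out since their size is unknown. The constant names are just illustrative labels for the figures listed.

```python
# Re-tally of the Zen 2 cache sizes listed above, in kB (1MB = 1024kB).
# Register files / AVX state are not counted (size unknown).
CORES = 8
L1_PER_CORE_KB = 32 + 32      # 32kB instruction cache + 32kB data cache
L2_PER_CORE_KB = 512
L3_PER_CCX_KB = 4 * 1024      # 4MB per CCX
CCX_COUNT = 2


def cpu_cache_total_kb() -> int:
    l1 = L1_PER_CORE_KB * CORES
    l2 = L2_PER_CORE_KB * CORES
    l3 = L3_PER_CCX_KB * CCX_COUNT
    return l1 + l2 + l3


if __name__ == "__main__":
    total = cpu_cache_total_kb()
    print(f"CPU caches: {total} kB = {total / 1024} MB")
```

This prints 12800 kB, i.e. the 12.5MB total above.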
Anaconda GPU
-----------------------
4 SIMD per WGP
I$ = 32kB per WGP
K$ = 16kB per WGP
Vector register file = 128kB * 4 SIMD
L0 Vector cache = 2 * 16kB per WGP
Scalar write-back cache = 16kB * 4 SIMD
Scalar RF = 10kB * 4 SIMD
LDS = 2*64kB per WGP
L1 = 128kB per shader array * 4 arrays
^ I got very confused trying to read the whitepaper on this one
28 WGP * 824kB per WGP = ~23MB ???
GPU L2 = 5MB
Render Back End caches = 128kB per RBE ??? (guesswork: I think sebbbi ran some tests back on GCN, but it's hard to say whether anything has changed, and I can't remember the size he deduced)
Total: ~41MB accounted for (12.5MB CPU + ~23MB WGP + 0.5MB GPU L1 + 5MB GPU L2; RBE caches and CPU registers not included)
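A rough sketch of the GPU tally and the grand total, using the sizes exactly as listed (including the scalar write-back cache entry, which I'm not certain of) and assuming 1MB = 1024kB. The RBE caches are left out because that number is a guess; the constant and function names are illustrative only.

```python
# Re-tally of the "Anaconda" GPU figures listed above, in kB (1MB = 1024kB).
# Sizes are as posted; RBE caches are excluded since their size is a guess.
SIMDS_PER_WGP = 4
WGP_COUNT = 28                # WGPs counted on the die shot
SHADER_ARRAYS = 4

PER_WGP_KB = {
    "instruction cache (I$)": 32,
    "scalar data cache (K$)": 16,
    "vector register file": 128 * SIMDS_PER_WGP,
    "L0 vector cache": 2 * 16,
    "scalar write-back cache": 16 * SIMDS_PER_WGP,
    "scalar register file": 10 * SIMDS_PER_WGP,
    "LDS": 2 * 64,
}
GPU_L1_KB = 128 * SHADER_ARRAYS   # 128kB per shader array, not per WGP
GPU_L2_KB = 5 * 1024
CPU_TOTAL_KB = 12800              # 12.5MB from the Zen 2 tally


def per_wgp_kb() -> int:
    """Sum of all per-WGP storage listed above."""
    return sum(PER_WGP_KB.values())


def grand_total_mb() -> float:
    """CPU caches + all WGPs + GPU L1 + GPU L2, in MB."""
    wgp_kb = per_wgp_kb() * WGP_COUNT
    return (CPU_TOTAL_KB + wgp_kb + GPU_L1_KB + GPU_L2_KB) / 1024


if __name__ == "__main__":
    print(f"per WGP: {per_wgp_kb()} kB")
    print(f"all WGPs: {per_wgp_kb() * WGP_COUNT / 1024:.1f} MB")
    print(f"total accounted for: {grand_total_mb():.1f} MB")
```

With 1024-based units this comes to 824kB per WGP, about 22.5MB across 28 WGPs and about 40.5MB overall; rounding the WGP figure up to ~23MB gives the ~41MB quoted above.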