Good start, but it would be nice to have a white paper explaining where all that SRAM went. There's enough unaccounted for to make a whole second chip. Most cache sizes in the ISA document were in line with previous generations, except for the 2MB larger L2.
I tried to find another GPU slide where AMD gave a figure for MB of SRAM per die, without much luck so far.
We may need to refrain from being too worried about that figure in the absence of data from other AMD GPUs.
The wafer shot showed command processors and geometry front ends with some markedly visible SRAM arrays, plus a few extra blocks with at least some SRAM of their own. Doubling L2 capacity means doubling the number of tags, and AMD indicated that much of its transistor growth went into extra clock speed and into measures like the buffers (SRAM) needed to compensate for that extra speed.
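As a rough illustration of the tag overhead, here's a back-of-envelope sketch. None of the parameters below (line size, associativity, physical address width, state bits) are AMD-published figures for Vega; they're generic guesses just to show the scaling when capacity doubles:

```python
# Back-of-envelope tag-array estimate: doubling L2 capacity doubles the line
# count and hence the number of tags. All parameters are assumptions, not
# disclosed Vega figures.

def tag_array_kib(capacity_mib, line_bytes=64, phys_addr_bits=48,
                  assoc=16, state_bits=4):
    """Rough tag+state SRAM for a set-associative cache, in KiB."""
    lines = capacity_mib * 1024 * 1024 // line_bytes
    sets = lines // assoc
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = phys_addr_bits - offset_bits - index_bits
    total_bits = lines * (tag_bits + state_bits)
    return total_bits / 8 / 1024

for cap in (2, 4):  # older 2MB L2 vs. the 2MB-larger L2 in the ISA document
    print(f"{cap} MiB L2 -> ~{tag_array_kib(cap):.0f} KiB of tag/state SRAM")
```

Even under those guesses the tag/state arrays only grow by a couple hundred KiB, so the tags alone don't explain the missing SRAM, but they're one more line item on the pile.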
The Vega ISA has significantly expanded the number of memory instructions that can be in flight, which may have implications for how many buffers exist for descriptor information and miss-handling resources, given that GCN tosses a fair amount of data to the L/S section. That is an area whose layout AMD decided to change, and it may have added more hardware in the process.
There's a band of blocks that AMD's power management controllers slide indicated could be a major component of the on-die Infinity Fabric, and each block appears to have at least some SRAM, perhaps for routing information and transaction state.
The memory controllers/HBCC would need to maintain a fair number of queues and a fair amount of state.
The virtualization section from AMD's slide is likely dominated by storage, and there's a security section that is likely an ARM A5 plus secure memory good for hundreds of KB at least. The video block may be good for some fraction of a MB, etc.
If we figure each of those extra blocks accounts for some fraction of a MB, or tens of extra KB replicated across 64 CUs, it starts adding up.
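To see how quickly that could sum up, here's a toy tally. Every figure is a placeholder I picked for illustration, not a measured or disclosed number:

```python
# Toy tally of the "it starts adding up" argument; all values are guesses.

num_cus = 64
per_cu_extra_kib = 32            # "tens of extra KB" per CU (assumed)
block_guesses_kib = {
    "command processor / geometry front ends": 256,
    "fabric stops (routing/transaction state)": 256,
    "memory controllers / HBCC queues": 512,
    "virtualization state": 256,
    "security processor (ARM A5 + secure RAM)": 512,
    "video block": 384,
}

total_kib = num_cus * per_cu_extra_kib + sum(block_guesses_kib.values())
print(f"~{total_kib / 1024:.1f} MiB of 'miscellaneous' SRAM under these guesses")
```

Under those made-up numbers you're already past 4 MiB outside the caches listed in the ISA document, which is why a per-die SRAM breakdown from AMD would be so useful.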