I guess for me the question is this: suppose they do bypass VRAM for SFS streaming, as per Beard Man's comments, and the data goes directly to the GPU.
Well then... where is it going?
L2 is connected to memory controllers...
L1 is per shader array...
L0 is per CU...
So where are the textures being dumped? How do we quickly distribute that incoming data to all the shader arrays that require it?
If there were a cache of some size (but not an L3) whose purpose is to hold one copy of everything, so that L1 or L2 can check it before going out to memory, then perhaps this setup might make sense even in a smaller configuration, even something as small as the ESRAM.
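To make the idea concrete, here's a toy sketch of that lookup order: L0, then L1, then L2, then a hypothetical "staging" cache that the IO side fills, and only then memory. Everything here (`CacheLevel`, `fetch`, `build_hierarchy`, the `staging` level itself) is made up for illustration. It ignores capacity, eviction and the fact that there are many L0s and L1s, and it says nothing about what the actual hardware does.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    """One level in a toy hierarchy; 'lines' maps address -> data."""
    name: str
    fill_from_memory: bool = True  # assumption: staging is filled only by IO
    lines: dict = field(default_factory=dict)


def fetch(addr, levels, memory):
    """Check each level in order; return (data, name of the level that hit)."""
    for i, level in enumerate(levels):
        if addr in level.lines:
            data = level.lines[addr]
            # Copy the data into every level closer to the CU.
            for closer in levels[:i]:
                closer.lines[addr] = data
            return data, level.name
    # Missed everywhere: go out to memory and fill the regular caches.
    data = memory[addr]
    for level in levels:
        if level.fill_from_memory:
            level.lines[addr] = data
    return data, "memory"


def build_hierarchy():
    """L0 -> L1 -> L2 -> hypothetical staging cache (not a real RDNA block)."""
    return [
        CacheLevel("L0"),
        CacheLevel("L1"),
        CacheLevel("L2"),
        CacheLevel("staging", fill_from_memory=False),
    ]
```

If you drop a streamed tile into the `staging` level and call `fetch` on its address, the first call hits in `staging` and fills L0 to L2 on the way back; a second call then hits in L0 without touching memory at all.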
This is an interesting question. I don't know enough about this even in general, but thinking about it, I suppose if you are indeed treating the data like it's in VRAM (which has been mentioned a few times), your IO unit would pass it to whatever requested it, in as similar a manner as possible to how VRAM is accessed. So maybe you evict something from some level of cache and dump it there. For textures I suppose L1 would make the most sense?
Those "SoC memory coherency" things on the die shot seem kind of beefy, and there seems to be one per shader engine. I'll point my finger at them and say, based on nothing in particular, that they manage the job.
I suppose you'd have to be able to choose whether data was copied into VRAM afterwards, or simply used and discarded, to be fetched again if needed.
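A rough sketch of what that choice might look like, purely as a thought experiment: each request says whether the streamed tile should be kept in VRAM or thrown away after use. The names (`Retention`, `StreamedTileStore`, `request`) are hypothetical and don't correspond to any real API; the `ssd` dict just stands in for the storage side.

```python
from enum import Enum


class Retention(Enum):
    COPY_TO_VRAM = "copy_to_vram"  # keep a resident copy after first use
    DISCARD = "discard"            # use once, re-fetch from storage if needed


class StreamedTileStore:
    """Toy model: tiles come from 'ssd' and are optionally kept in 'vram'."""

    def __init__(self, ssd):
        self.ssd = ssd      # backing store: tile id -> data
        self.vram = {}      # tiles that were kept resident
        self.ssd_reads = 0  # how many times we had to go back to storage

    def request(self, tile_id, retention):
        # A resident copy always wins, whatever the caller asked for.
        if tile_id in self.vram:
            return self.vram[tile_id]
        data = self.ssd[tile_id]
        self.ssd_reads += 1
        if retention is Retention.COPY_TO_VRAM:
            self.vram[tile_id] = data
        return data
```

The trade-off shows up in `ssd_reads`: with `DISCARD` every request for the same tile goes back to storage, while `COPY_TO_VRAM` pays once and then spends VRAM instead.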