On the topic of ESRAM, it is known to have low latency.
It is known that GPUs are good at hiding memory latency, but, AFAIK, that works in two ways. Cache prefetch is easy in graphics applications, as memory access is usually linear. Hiding the cache latency itself is done through "barreling" - cycling several threads over a single execution unit: one thread requests memory, the next step is done by another thread, then a few more, and by the moment the requesting thread gets its next step, the data has already arrived from cache. The cache, however, is small and is not explicitly controlled, so if prefetch fails, you get a long wait until the data arrives from main memory, and your thread has to skip many cycles. If the access pattern is random enough, the cache is much less of a help, as it would often be missed. For maximum utilisation of the GPU, one probably has to trick prefetch into loading a "working set" into the L2 cache and then do all the work within that set. But L2 isn't very big.
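To make the "barreling" part concrete, here is a minimal CUDA sketch (CUDA exposes the same latency-hiding model as these console GPUs; the kernel, names and sizes are my own illustration, not anything from a real engine):

```cuda
#include <cuda_runtime.h>

// Each thread issues one dependent load. The kernel itself does nothing
// to hide latency - that is the scheduler's job: with many warps resident
// per execution unit, a warp stalled on src[idx[i]] is simply parked and
// another warp runs, which is the "barreling" described above.
__global__ void gather(const float* __restrict__ src,
                       const int*   __restrict__ idx,
                       float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = src[idx[i]]; // sorted idx: near-linear, prefetch-friendly;
                              // shuffled idx: mostly misses, full DRAM latency
}

int main()
{
    const int n = 1 << 20;
    float *src, *out; int *idx;
    cudaMalloc(&src, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&idx, n * sizeof(int));
    cudaMemset(idx, 0, n * sizeof(int)); // placeholder indices for the demo

    // Launch far more threads than the GPU has lanes; the surplus warps
    // are what the scheduler cycles through to cover memory latency.
    gather<<<(n + 255) / 256, 256>>>(src, idx, out, n);
    cudaDeviceSynchronize();

    cudaFree(src); cudaFree(out); cudaFree(idx);
    return 0;
}
```

Whether idx is sorted (near-linear) or shuffled (random) is exactly the difference between the prefetch-friendly and cache-hostile cases above; the kernel is identical, only the miss rate changes.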
The ESRAM is much bigger and is controlled explicitly, so a programmer could pre-load everything that is needed for the current frame (or tile) and would never have any misses beyond ESRAM, maximising efficiency.
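CUDA's __shared__ memory is an explicitly managed on-chip scratchpad, so it can stand in for the idea (a sketch only; the tile size and the lut indirection are my assumptions, and lut is assumed to hold nonnegative indices):

```cuda
// Stage a per-block working set into the explicitly managed scratchpad
// (same idea as preloading a region into ESRAM), then do all the
// irregular reads against that on-chip copy.
__global__ void process_tile(const float* __restrict__ src,
                             const int*   __restrict__ lut,
                             float* __restrict__ out, int n)
{
    constexpr int TILE = 1024;   // per-block working set: 4 KB of floats
    __shared__ float tile[TILE];

    int base = blockIdx.x * TILE;

    // One cooperative, linear pass over DRAM: strided so the loads
    // coalesce - exactly the access pattern prefetch is good at.
    for (int j = threadIdx.x; j < TILE && base + j < n; j += blockDim.x)
        tile[j] = src[base + j];
    __syncthreads();

    // After the barrier the working set is on-chip. Reads against it can
    // be as random as the algorithm wants: there is no cache to miss,
    // same as working out of a preloaded ESRAM region.
    for (int j = threadIdx.x; j < TILE && base + j < n; j += blockDim.x)
        out[base + j] = tile[lut[base + j] % TILE]; // arbitrary in-tile indirection
}
```

A launch like process_tile<<<(n + 1023) / 1024, 256>>>(src, lut, out, n) covers the whole buffer; each block pays one linear trip to DRAM, and everything after the barrier stays on-chip.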
That, however, supposes that the algorithms used would request data from memory in a highly nonlinear pattern. One could speculate that in proper DX11 compute shaders this would be quite common - more so in physics, probably, but it could happen in graphics as well.
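For a physics-flavoured example of such a pattern, consider a per-particle neighbour-list gather (again a generic CUDA sketch; the struct and names are made up, not from any particular engine):

```cuda
struct Particle { float x, y, z; };

// Each particle walks its neighbour list through an index buffer. The
// indirection makes the position reads effectively random in memory
// unless the particles happen to be spatially sorted.
__global__ void accumulate(const Particle* __restrict__ pos,
                           const int* __restrict__ neighbours, // k ids per particle
                           float* __restrict__ out, int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float sum = 0.0f;
    for (int j = 0; j < k; ++j) {
        Particle p = pos[neighbours[i * k + j]]; // scattered reads: cache-hostile
        sum += p.x + p.y + p.z;                  // stand-in for a real force term
    }
    out[i] = sum;
}
```

Unless the particles are spatially sorted, neighbours[i * k + j] jumps all over the position buffer - precisely the case where a preloaded scratchpad beats a small, implicitly managed cache.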
Does anyone have info on such algorithms, or at least on how often GPU caches are missed in modern games?