And what is your source for this? MS's own patents pretty explicitly suggest otherwise, and by the sound of it they explain a lot of what we do know. Maybe the 'deferred' part is what's off? The patent below goes into detail about the procedure for rendering tile-based content and where this method's gains come from, and it seems to mesh well with what people like Arthur on GAF said months back. Additionally, the patent seems to directly allude to leaning on both the eSRAM and the display planes for this method.
Here is the patent link for a new page. This link is a bit better, actually, since it has the diagrams:
http://www.faqs.org/patents/app/20130063473
I read through it, and it sounds like there could be pretty considerable bandwidth and processing advantages to rendering things on a tile depth basis, as opposed to simply using tiles to construct layers and then processing those layers in the GPU. This (new?) method could possibly explain why the eSRAM is targeted at such low latency, as well as the murmurs from insiders about an exceptionally efficient GPU. It may not simply be a generic GCN setup making it "efficient" like many have asserted. It sounds like the more meaningful efficiency gains may come instead from (or rather, in addition to that, from) the way the content layers are being processed.
Someone can correct me here, but I think in the typical approach to tile-based rendering, even with stuff like the PRT support in AMD's recent hardware, the GPU waits around for the full layer to be stored in memory before it processes it all at once. That leaves the GPU sitting there with nothing to do on that work in the meantime, no? As such, your GPU efficiency is somewhat bound by latency.
In the approach described in the patent, instead of the GPU waiting until an entire layer full of tiles is ready, it handles the processing on a per-tile basis as those tiles are stored in memory. The result is that your GPU processing is bound by the latency of the eSRAM, which is reportedly extremely low (which is a good thing). At least, that's what it sounds like to me.
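To put rough numbers on the difference, here's a toy timing model in Python. To be clear, this is purely my own illustration and not anything taken from the patent: the function names, the tile arrival times and the per-tile cost are all made up, and it assumes a single GPU that works on one tile at a time.

```python
def finish_time_whole_layer(arrivals, process_time):
    """GPU waits until every tile of the layer is in memory, then processes them all."""
    return max(arrivals) + len(arrivals) * process_time


def finish_time_per_tile(arrivals, process_time):
    """GPU starts on each tile as soon as it has landed in memory and the GPU is free."""
    gpu_free_at = 0.0
    for arrival in sorted(arrivals):
        start = max(arrival, gpu_free_at)  # wait for the tile or for the GPU, whichever is later
        gpu_free_at = start + process_time
    return gpu_free_at


# Hypothetical numbers: 4 tiles landing 1 time unit apart, 1 unit of GPU work each.
arrivals = [1.0, 2.0, 3.0, 4.0]
whole = finish_time_whole_layer(arrivals, 1.0)
tiled = finish_time_per_tile(arrivals, 1.0)
print(f"whole layer: {whole}, per tile: {tiled}")
```

With those made-up numbers the whole-layer version finishes at 8 time units and the per-tile version at 5, because the per-tile GPU is already working while later tiles are still arriving. If every tile is already sitting in memory at time 0, the two come out the same, so in this simple model the gain only shows up when tiles trickle in.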
The image planes seem to play a meaningful role in this process too, so I'll re-read that patent tomorrow maybe and see if it adds anything interesting.