To make this work you need a ROP architecture that can work with non-uniformly organised memory for render targets.
I can see why you'd think that (I used to also), but no...
Mintmaster's comment explains why this is the case, so I won't bother unless you want me to elaborate on potential implementation details.
Mintmaster said:
The problem with this idea is that it improves the best case while keeping the worst case the same. When a tile is compressed there isn't much benefit from storing it in EDRAM because it's low BW. It's the uncompressed tiles that chew up BW.
You are working on two assumptions which are, as far as I can tell, far from perfectly accurate:
1) eDRAM used this way wouldn't improve the worst case: No, it would improve it, just by a lesser percentage. You could read three memory bursts instead of four, since one of them (even when it's for uncompressed data!) is in eDRAM. This saves 25% bandwidth in the *worst case* (a quick sketch follows this list).
2) The vast majority of bandwidth comes from non-compressed tiles. This is probably a gross simplification, see below.
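To put rough numbers on point 1, here's a quick Python sketch of a hypothetical worst-case tile: fully uncompressed data needing four memory bursts, with the eDRAM sized so that one of the four lives there. The burst count is purely an assumption for illustration.
Code:
# Hypothetical worst-case tile: fully uncompressed data needing 4 memory bursts,
# with eDRAM sized to hold 1 of the 4 (i.e. the 4:1 best-case footprint).
bursts_per_uncompressed_tile = 4
bursts_served_from_edram = 1  # assumption: that slice always resides in eDRAM

external_bursts = bursts_per_uncompressed_tile - bursts_served_from_edram
saving = bursts_served_from_edram / bursts_per_uncompressed_tile
print(f"{external_bursts} of {bursts_per_uncompressed_tile} bursts go to external memory "
      f"-> {saving:.0%} worst-case bandwidth saving")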
Any smart modern architecture wouldn't just have 'compressed' and 'uncompressed' tiles. You'll have different levels of compression most likely, ideally reusing the same techniques (thus sharing silicon) but less aggressively.
Nobody outside NVIDIA and ATI has any idea how fine or coarse these compression levels really are, but at the strict minimum I would expect you to have, say, 4:1, 2:1 and 1:1 for 4x MSAA's color buffer. I would also be surprised if there wasn't basically a '3.5:1' mode (or perhaps that really is 4:1!) to handle the common case of 'nearly-perfect-but-really-not' compressibility.
So, what I suspect is that a majority of the bandwidth is taken by mildly compressed tiles, not fully uncompressed ones, which are more the exception than the rule and are probably limited by, say, triangle setup anyway in current architectures.
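As a rough illustration of that suspicion, here's a toy Python model with a completely made-up tile mix (the percentages are guesses, not measured data): even when fully uncompressed tiles are rare, the mildly compressed ones can end up eating most of the bandwidth.
Code:
# Toy model: how much of the color bandwidth each compression level accounts for.
# Key = compressed size / uncompressed size, value = fraction of tiles at that level.
# The mix below is a pure guess for illustration, not measured data.
tile_mix = {0.25: 0.45, 0.50: 0.45, 1.00: 0.10}  # 4:1, 2:1, 1:1

total = sum(ratio * share for ratio, share in tile_mix.items())
for ratio, share in tile_mix.items():
    bw_share = ratio * share / total
    print(f"{1/ratio:.0f}:1 tiles: {share:.0%} of tiles -> {bw_share:.0%} of bandwidth")
Under that mix the 2:1 tiles account for about half the bandwidth, while the fully uncompressed ones, at 10% of the tiles, account for under a quarter of it.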
Furthermore, there is something else that might not be completely obvious. Assuming there is exactly enough eDRAM to fit everything (i.e. save 100% of framebuffer bandwidth) under maximum compression for all tiles, then the fraction of bandwidth you save is always exactly this, where both the final result and the average compression (compressed size divided by uncompressed size) are between 0 and 1:
Code:
Saved Bandwidth = eDRAM Amount / (Framebuffer Size * Average Compression)
It can be shown that 50% of the framebuffer compressing by 50% (and the other half being uncompressed) results in the same savings as any other way of achieving 25% overall framebuffer compression under the above rules. Thus, every tile being 25% compressed, or 75% of tiles being 33.3% compressed, results in the same bandwidth savings for a given amount of eDRAM.
Of course, that breaks down when you have more eDRAM than your framebuffer size multiplied by your best-case compression rate (unless you want to go non-uniform; ugh!) but the final results remain very impressive IMO.
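To sanity-check the formula and the 25%-compression equivalence above, here's a quick Python sketch; it assumes eDRAM sized for the 4:1 best case (a quarter of the framebuffer) and clamps the saving at 100% to cover the breakdown case just mentioned.
Code:
# Saved Bandwidth = eDRAM Amount / (Framebuffer Size * Average Compression),
# clamped at 100% for the case where eDRAM exceeds the compressed footprint.
def saved_bandwidth(edram, framebuffer, avg_compression):
    return min(1.0, edram / (framebuffer * avg_compression))

fb = 1.0      # framebuffer size, normalised
edram = 0.25  # sized for the 4:1 best case, per the assumption above

# Three ways of reaching 25% overall compression (average ratio = 0.75):
scenarios = {
    "half the tiles at 2:1, half uncompressed": 0.5 * 0.5 + 0.5 * 1.0,
    "every tile 25% compressed":                0.75,
    "75% of tiles compressed by 33.3%":         0.75 * (2 / 3) + 0.25 * 1.0,
}
for name, avg in scenarios.items():
    print(f"{name}: average ratio {avg:.3f} -> {saved_bandwidth(edram, fb, avg):.1%} saved")
All three print the same ~33.3% saving, and plugging in an average ratio of 1.0 (every tile uncompressed) gives back the 25% worst-case figure from point 1.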
So, the real question becomes: what do you think the average compression rates are for a 1920x1200 4x MSAA HDR framebuffer in, say, Oblivion? I'd expect them to be pretty damn good, otherwise the final performance doesn't make much sense in my mind. And as a logical consequence of this and the above, I would expect eDRAM bandwidth savings under my proposed approach to be pretty damn good too.