Let's just take that wavefront of 64 pixels, organized in some kind of rectangular fashion. If you use RGB & Alpha, that's 4 bytes per pixel, or 256 bytes. A memory atom is 32 bytes. With delta-coding followed by something like Huffman or arithmetic coding, you could probably get those 256 bytes down to 128 bytes or fewer in many cases.
There might be a special case for a zeroed buffer, so maybe a handful of bytes for a whole tile.
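As a rough illustration only (the actual hardware scheme is undisclosed; the 8x8 tile shape, the delta-against-first-pixel choice, and the cleared-tile fast path below are all my assumptions), the delta stage would amount to something like this:

```c
#include <stdint.h>

/* Hypothetical sketch: delta-code a 64-pixel RGBA8 tile (256 bytes) against
 * its first pixel, with a fast path for an all-zero (cleared) tile.
 * Illustrative only; the real compression scheme is undisclosed. */
enum tile_class { TILE_CLEARED, TILE_DELTA };

static enum tile_class delta_code_tile(const uint8_t src[256], int8_t deltas[256])
{
    int all_zero = 1;
    for (int i = 0; i < 256; i++)
        if (src[i]) { all_zero = 0; break; }
    if (all_zero)
        return TILE_CLEARED;          /* a handful of metadata bits suffice */

    /* Per-channel delta against the tile's first pixel; neighbouring pixels
     * tend to be similar, so most deltas cluster near zero and compress
     * well under a Huffman/arithmetic back end. */
    for (int p = 0; p < 64; p++)
        for (int c = 0; c < 4; c++)
            deltas[p * 4 + c] = (int8_t)(src[p * 4 + c] - src[c]);
    return TILE_DELTA;
}
```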
Still, going back to the original question, the start address of each tile would be fixed, and at the same location as the uncompressed tile. Otherwise, how could you randomly access an arbitrary tile in memory?
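Under that assumption, finding a tile is pure address arithmetic, no lookup structure needed. Something along these lines, where the 8x8 tile size and linear tile layout are guesses on my part:

```c
#include <stdint.h>

/* Hypothetical address math: if each 8x8 tile keeps the same base address as
 * in the uncompressed layout (256 bytes apart), the tile covering any pixel
 * can be located directly. Tile size and layout are assumptions. */
static uint64_t tile_base(uint64_t fb_base, uint32_t x, uint32_t y,
                          uint32_t width_px)
{
    const uint32_t TILE_DIM   = 8;    /* assumed 8x8-pixel tiles            */
    const uint32_t TILE_BYTES = 256;  /* 64 pixels * 4 bytes, uncompressed  */
    uint32_t tiles_per_row = width_px / TILE_DIM;
    uint32_t tx = x / TILE_DIM, ty = y / TILE_DIM;
    return fb_base + (uint64_t)(ty * tiles_per_row + tx) * TILE_BYTES;
}
```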
This comes back to an unknown: the granularity of the ROPs. There may be a level of tiling beyond the raster granularity, and the compression scheme might be quad-based.
8x8 and 16x16 of some base unit of representation are dimensions that come up fairly often.
At the larger granularity, and if a pixel quad is the base unit, that could reduce a 1080p buffer to a few thousand entries in some kind of indirection structure.
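Back-of-the-envelope, with 2x2-pixel quads as the base unit and a 16x16-quad upper tier (both tier sizes are my assumptions, not disclosed figures):

```c
#include <stdio.h>

/* Each indirection entry would then cover 32x32 pixels, so a 1920x1080
 * buffer needs roughly 60 * 34 = ~2040 entries. */
int main(void)
{
    const int W = 1920, H = 1080;
    const int QUAD = 2;                      /* 2x2-pixel quad           */
    const int TIER = 16;                     /* 16x16 quads per entry    */
    int span = QUAD * TIER;                  /* 32 pixels per entry side */
    int entries = ((W + span - 1) / span) * ((H + span - 1) / span);
    printf("indirection entries for 1080p: %d\n", entries);  /* 2040 */
    return 0;
}
```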
That does sound like a fair amount of work, though, so I find your theory of a buffer full of severely fragmented free space quite plausible.
It is primarily about bandwidth, which is why I noted earlier that it could be a win even if the buffers themselves are bigger than they would be without compression, as long as the number of bus transactions is reduced.
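To put numbers on that argument (the 128-byte compressed size is just the earlier plausible figure, not a disclosed one):

```c
#include <stdio.h>

/* Even if a compressed tile still occupies its full 256-byte slot in memory,
 * fetching only the compressed payload halves the 32-byte bus transactions. */
int main(void)
{
    const int ATOM   = 32;   /* memory atom in bytes             */
    const int RAW    = 256;  /* uncompressed 8x8 RGBA tile       */
    const int PACKED = 128;  /* plausible compressed payload     */
    printf("uncompressed: %d transactions\n", RAW / ATOM);                  /* 8 */
    printf("compressed:   %d transactions\n", (PACKED + ATOM - 1) / ATOM);  /* 4 */
    return 0;
}
```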
If there's cache between the ROP and the MC, that lockstep relationship will probably be gone for the most part?
The idea was that the small (16KB per RBE) caches have a thrash rate that the render export path is tuned to exploit. They have enough space for some number of active tiles, some number waiting for writeback on the bus, and some number in the process of being read in.
The caches aren't coherent and the ROPs themselves are statically allocated screen space on the input side and buffer space on the output, so whatever they read in or write back is heavily scheduled.
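A rough budget for that picture, where the three-way split is purely an illustrative assumption on top of the 16KB figure:

```c
#include <stdio.h>

/* A 16KB per-RBE cache holds 64 uncompressed 256-byte tiles, which could be
 * divided between tiles being rendered, tiles queued for writeback, and
 * tiles being read in. The split below is invented for illustration. */
int main(void)
{
    const int CACHE_BYTES = 16 * 1024;
    const int TILE_BYTES  = 256;
    int slots     = CACHE_BYTES / TILE_BYTES;    /* 64 tile slots      */
    int active    = slots / 2;                   /* being rendered     */
    int writeback = slots / 4;                   /* queued for the bus */
    int prefetch  = slots - active - writeback;  /* being read in      */
    printf("slots=%d active=%d writeback=%d prefetch=%d\n",
           slots, active, writeback, prefetch);
    return 0;
}
```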
I still don't see how any of that would work for random accesses.
Short answer: I'm waiting for a disclosure on what manner it can do so, if it does.
The frame buffer was the designated target for this technology, and its greatest measured benefits are in ROP throughput synthetics, which are fine with the granularity the ROPs have.
Perhaps more clarity will be provided at some point, but that's the extent of the disclosure at this point.
Going by the assumption that the compression method is built with the more rigidly defined ROP tiling in mind, those accesses would not be random at the level of the compression logic.
Going beyond that, nothing has been said about random access, although one could imagine that the GPU memory pipeline could detect an access to a buffer it knows to be compressed in this way and decompress on the fly. An access to a compressed buffer could have its address converted into an initial load of a pointer structure or an upper tier of a hierarchical structure, and then the logic could work its way down.
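Purely speculatively, that lookup could look something like the sketch below; every structure and field here is invented for illustration, since nothing of the sort has been disclosed:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Speculative "decompress on the fly" path: an incoming address is checked
 * against a buffer known to be compressed, and a hit redirects the access
 * through a per-tile metadata word describing how the tile is stored. */
typedef enum { TILE_RAW, TILE_CLEARED, TILE_PACKED } tile_state_t;

typedef struct {
    tile_state_t state;
    uint64_t     payload_addr;   /* where the (packed) tile data lives */
    uint32_t     payload_bytes;  /* how much of it to fetch            */
} tile_meta_t;

typedef struct {
    uint64_t     base, size;     /* range of the compressed buffer     */
    tile_meta_t *meta;           /* one entry per tile                 */
    uint32_t     tile_bytes;     /* uncompressed tile footprint        */
} comp_buffer_t;

/* Walk the metadata for an arbitrary address; the caller then issues either
 * a plain read or a fetch-and-decompress of the packed payload. */
static const tile_meta_t *lookup(const comp_buffer_t *buf, uint64_t addr,
                                 bool *is_compressed)
{
    if (addr < buf->base || addr >= buf->base + buf->size) {
        *is_compressed = false;   /* not a tracked buffer: raw access */
        return NULL;
    }
    *is_compressed = true;
    uint64_t tile_idx = (addr - buf->base) / buf->tile_bytes;
    return &buf->meta[tile_idx];
}
```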
Another possibility is a performance hit: if the driver/GPU detects this, it decompresses the target buffer, which is what happens with depth buffers bound as textures.
A less likely possibility is that such arbitrary access requires a conversion back to a non-compressed format. There wasn't any discussion of special measures needed for software to use this, so I'm not banking on this possibility beyond an "it's happened before" sort of thing in some now-ancient GPUs.