I don't know. Looks like they are still keeping the tiny caches within the RBEs (which handle compression and decompression) as L1 and just backing them with the L2. That means the L2 would only be exposed to the compressed render target tiles (probably limited in size to multiples of the cache line size) and wouldn't need to know anything about them in detail (it just caches cache lines mapping to certain memory addresses).
If the idea is that CUs can read that target back through the L2, it introduces a dynamically variable relationship between how many L2 lines belong to a tile and what their contents actually represent.
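A minimal sketch of that variability, assuming a 256-byte tile and 64-byte L2 lines (both numbers are my assumptions about the arrangement, not something confirmed for this design): depending on how well a given tile compressed, a different number of the lines covering its address range actually hold meaningful data, and nothing in the L2 itself tells a client which ones.

```python
import math

L2_LINE_BYTES = 64   # assumed L2 line size
TILE_BYTES = 256     # assumed uncompressed tile/block size

def live_lines(compression_ratio: int) -> int:
    """How many L2 lines hold meaningful data for one compressed tile."""
    compressed = math.ceil(TILE_BYTES / compression_ratio)
    return math.ceil(compressed / L2_LINE_BYTES)

for ratio in (1, 2, 4, 8):
    print(f"{ratio}:1 -> {live_lines(ratio)} of "
          f"{TILE_BYTES // L2_LINE_BYTES} lines valid")
# 1:1 -> 4 of 4, 2:1 -> 2 of 4, 4:1 -> 1 of 4, 8:1 -> 1 of 4
```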
The DCC pipeline is described as internally maintaining a secondary cache of compression metadata; whether that metadata is now also cached in (or kept coherent with) the L2 is an unknown.
Getting a consistent view across multiple locations when there's a hierarchical relationship involved seems like a complex undertaking.
I don't know if that would make the compression path an arbiter of access to compressed targets, since it would know how many lines are relevant and what values it generated. There are many more clients to service now than back when it was situated between a statically mapped RBE and its associated memory channel.
That's true if a shader access to the render target necessitates a decompression pass. If the TMUs can directly read compressed render targets, I don't see a problem.
The TMUs can already read DCC targets, at the cost of reduced compression efficiency versus targets declared off-limits to shader reads.
"Shader-readable targets are not as well compressed as when it is known that the shader will
not read them."
http://gpuopen.com/dcc-overview/
One possible scenario where this may happen is from the Polaris whitepaper, where 256-byte blocks can compress at an 8:1 ratio, which means a block can compress down to 32 bytes, less than the width of a 64-byte non-ROP cache line.
"A single pixel in each block is written using a normal representation and all other pixels in the block are encoded as a difference from the first value. The block size is dynamically chosen based on access patterns and the data patterns to maximize the benefits. The peak compression ratio is 8:1 for a 256-byte block."
http://radeon.wpengine.netdna-cdn.c...is-Architecture-Whitepaper-Final-08042016.pdf
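The actual DCC encoding isn't public, but a toy sketch of the scheme the quote describes (anchor pixel stored verbatim, remaining pixels as deltas) might look like the following; the function name, the 4-bit delta width, and the packing are all made up for illustration:

```python
def delta_encode(pixels, delta_bits=4):
    """Toy DCC-style encoder: first pixel verbatim, the rest as deltas.

    Returns a compressed byte count if every delta fits in delta_bits,
    otherwise falls back to the raw 4-bytes-per-pixel size.
    """
    anchor = pixels[0]
    deltas = [p - anchor for p in pixels[1:]]
    limit = 1 << (delta_bits - 1)              # signed range per delta
    if all(-limit <= d < limit for d in deltas):
        bits = 32 + len(deltas) * delta_bits   # anchor + packed deltas
        return (bits + 7) // 8
    return len(pixels) * 4                     # incompressible: store raw

flat = [0xDEADBEEF] * 64    # 256-byte block of one color
print(delta_encode(flat))   # 36 bytes, roughly 7:1
```

This toy version lands near, not exactly at, the quoted 8:1 peak; the real hardware presumably has more compact modes (a single "constant block" flag, variable block sizes chosen per access pattern as the whitepaper says) that this sketch doesn't model.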
I'm not sure if it's coincidental that it matches GDDR5's prefetch length of 8 and transfer size of 32B.
This may make sense since it's currently more about saving DRAM accesses, and any ratios that wind up creating an additional partial burst save nothing in that regard.
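Working that arithmetic out, assuming 32-byte GDDR5 bursts against the 256-byte block (the mapping of compressed sizes to bursts here is my assumption, not something the whitepaper spells out):

```python
import math

BURST_BYTES = 32    # GDDR5: 8n prefetch on a 32-bit channel
BLOCK_BYTES = 256   # uncompressed block = 8 bursts

for ratio in (1, 2, 3, 4, 8):
    compressed = math.ceil(BLOCK_BYTES / ratio)
    bursts = math.ceil(compressed / BURST_BYTES)
    print(f"{ratio}:1 -> {compressed:3d}B, {bursts} bursts "
          f"(saves {BLOCK_BYTES // BURST_BYTES - bursts})")
# A 3:1 ratio yields 86B -> still 3 bursts, no better than a plain
# 96B result; only sizes landing on burst multiples pay off fully.
```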