Sounds reasonable, but this would only work when the back buffer uses an integer 8-bit-per-component RGB/BGR format, and not with HDR formats such as R10G10B10A2 and 16-bit floating point R16G16B16A16. The same goes for many other render target formats, including one- and two-component 8-bit integer formats.
I don't see any problems with float formats. Delta compression with floats works exactly the same way it works with integers. You have a certain (hard-coded) rule for the estimate (guess) and store the distance from it (the rule can differ between formats, as the ROP knows the active format for each RT). Floats are binary numbers, so you can treat them exactly like integers. If you flip the sign bit (and ignore NaN and INF), the IEEE float bit representation produces a monotonically increasing numeric representation (numbers close to each other in value are also close to each other in the bit representation). You store the distance by binary value, not by float value. The same goes for decompression: the decompressor doesn't need to know that the value is a float, it just decompresses it with the same rules.
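A minimal sketch of that bit-level mapping (the function names are mine; the common variant of the trick also flips the remaining bits for negative values so the ordering holds across the whole range, NaN/INF aside):

```cpp
#include <cstdint>
#include <cstring>

// Map an IEEE-754 float's bit pattern to an unsigned integer whose
// ordering matches the float ordering: flip the sign bit for
// non-negative values, flip all bits for negative values.
static uint32_t FloatToOrderedBits(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Delta against some hard-coded predictor (e.g. a neighboring pixel):
// the stored value is the integer distance between the ordered bit
// patterns, so the compressor never interprets the data as floats.
static uint32_t DeltaEncode(float value, float predictor)
{
    return FloatToOrderedBits(value) - FloatToOrderedBits(predictor);
}
```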
Who needs a 16K framebuffer? More realistically, with 2 bits per 8x8 pixel tile, 1080p requires 8100 bytes, 2560x1600 requires 16000 bytes, and 4K (2160p) requires 32400 bytes.
I agree that any of the above is too large to have a dedicated cache though.
Older GPUs had dedicated caches for Hi-Z, fast clear and other acceleration structures. If you used a render target that was too large, the GPU disabled the optimizations for the lower part of it. GCN on the other hand is fully memory based. The delta color block data is most likely cached by the L2 in a similar way to all the other data (including HTILE and the other acceleration structures).
2 bits (per 8x8 tile) at 16k * 16k is only 1 MB. If you compare this to the actual data size of a 16k * 16k 32 bpp render target (1 GB), you notice that the size of the acceleration structure is negligible (it is 1024x smaller).
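For reference, a quick sketch reproducing the metadata sizes quoted above under the same assumption (2 bits per 8x8 pixel tile; the function name and the rounding to whole bytes are mine):

```cpp
#include <cstdint>
#include <cstdio>

// Size of a delta-compression tile table at 2 bits per 8x8 pixel tile.
static uint64_t MetadataBytes(uint64_t width, uint64_t height)
{
    const uint64_t tilesX = (width + 7) / 8;
    const uint64_t tilesY = (height + 7) / 8;
    return (tilesX * tilesY * 2 + 7) / 8;   // 2 bits per tile, rounded up to bytes
}

int main()
{
    std::printf("1920x1080   -> %llu bytes\n", (unsigned long long)MetadataBytes(1920, 1080));     // 8100
    std::printf("2560x1600   -> %llu bytes\n", (unsigned long long)MetadataBytes(2560, 1600));     // 16000
    std::printf("3840x2160   -> %llu bytes\n", (unsigned long long)MetadataBytes(3840, 2160));     // 32400
    std::printf("16384x16384 -> %llu bytes\n", (unsigned long long)MetadataBytes(16384, 16384));   // 1 MiB
    return 0;
}
```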
This would involve a thorough analysis of the shader compiler in the OpenGL driver.
Shader code just issues RT reads and writes. Shader execution (and the shader compiler) doesn't need to know about the RT/texture formats (or compression, tiling mode, etc.). Resource descriptors hide these details and are passed directly from scalar registers to the samplers. Shaders should not require any microcode changes to support delta color compression (depth compression doesn't need shader code support either).
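Purely as a conceptual illustration of that point (this is not the actual GCN descriptor layout; all names and fields below are mine): the format, tiling and compression state live in the descriptor the driver writes, and the shader only forwards that descriptor to the texture/RT hardware.

```cpp
#include <cstdint>

struct float4 { float x, y, z, w; };

// Conceptual resource descriptor: the driver fills in format, tiling and
// compression state; shader code never inspects these fields, it only
// hands the descriptor to the sampler / ROP hardware.
struct ResourceDescriptor
{
    uint64_t baseAddress;
    uint64_t metadataAddress;   // e.g. delta-compression / HTILE data
    uint32_t format;
    uint32_t tilingMode;
    bool     compressed;
};

// "Shader side" view: a texture read is an opaque descriptor plus
// coordinates, implemented by the hardware, so no shader or microcode
// change is needed when the driver enables compression or swaps formats.
float4 SampleTexture(const ResourceDescriptor& desc, float u, float v);
```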
Not saying you really need it, but this is what the d3d11 specification requires to be supported. Maybe some app is doing something crazy (like 4x supersampling with 4k...) and you don't want to fall off a performance cliff once you've reached some fixed size where you'd need the savings the most (and yes, chips like rv350 had such cliffs). And in the case of MRTs you need that structure for each RT, which further increases the size.
Not a problem on GCN. 2 bits per 8x8 block would be 1024x less data than the payload. GCN is fully memory based. The L2 would cache the delta compression acceleration structure (as it does for all the other existing acceleration structures).
Well, it doesn't support that feature for GCN 1.2 yet, but depth buffer compression is not really different. And the driver knows pretty much everything about this, including doing the decompress "blits" to uncompressed when necessary (fwiw it actually does this in-place: you essentially draw a quad and set up some RBE bits correctly, so depth values get read/written even though the depth test is always pass and the values don't change, with compression enabled for reads but disabled for writes, or something like that - it is quite possible the hw may even be able to skip tiles which are already uncompressed, though I'm not sure). I'm waiting for the driver to support this stuff on GCN 1.2 so I get a better understanding of how it's working (especially the access in the TMUs) ;-).
Yes, the driver needs to know when to queue a decompress command for a compressed resource (this is usually done before a RT needs to be sampled as a texture - the driver knows this by checking the texture bindings). In DirectX 12 you do this manually (resource barrier transition from RT -> SRV). The shader reading a texture or writing to a render target doesn't need to know anything (resource descriptors hide all the format details).
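For the DirectX 12 case, that RT -> SRV transition looks roughly like this (a minimal sketch; the helper name and the assumption that the whole resource transitions to a pixel-shader resource are mine):

```cpp
#include <d3d12.h>

// Record the render-target -> shader-resource transition so the driver/GPU
// can run any required decompress step before the texture is sampled.
void TransitionRenderTargetToSrv(ID3D12GraphicsCommandList* cmdList,
                                 ID3D12Resource* renderTarget)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = renderTarget;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cmdList->ResourceBarrier(1, &barrier);
}
```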