That's not quite an accurate description - there are several compression ratios available.
That's what I hinted at in the last paragraph. Block compression schemes naturally use coding selectors for alternative codings per block.
The coding is traditionally not written down outside of the block; you have a large bitfield indicating whether something is compressed or not, and this selects the decoder to be used. The decoder for the compressed scheme is an isolated piece of hardware which doesn't take "parameters" like a function call (the compression mode), but just the fixed-size memory chunk.
The selector can be a bit, or a few bits, but it can also be a violation of a convention (see the start > stop criterion in BC1-5), a variable-length prefix code (see BC6-7), or chained variable-length prefix codes (see ASTC). It is rather inconvenient to have the selector outside of the block.
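To make the "violation of a convention" concrete, here is a minimal sketch (only the palette setup, not a full decoder) of how BC1 works: there is no explicit mode bit in the block, the ordering of the two 16-bit endpoint values is itself the selector between the 4-color and the 3-color+transparent coding.

```c
#include <stdint.h>

/* BC1 block: two RGB565 endpoints plus 16 x 2-bit palette indices. */
typedef struct {
    uint16_t color0, color1;  /* RGB565 endpoints */
    uint32_t indices;         /* 4x4 texels, 2 bits each */
} bc1_block;

typedef struct { uint8_t r, g, b, a; } rgba8;

static rgba8 rgb565_to_rgba8(uint16_t c)
{
    rgba8 out;
    out.r = (uint8_t)(((c >> 11) & 31) * 255 / 31);
    out.g = (uint8_t)(((c >>  5) & 63) * 255 / 63);
    out.b = (uint8_t)(( c        & 31) * 255 / 31);
    out.a = 255;
    return out;
}

static void bc1_palette(const bc1_block *blk, rgba8 pal[4])
{
    rgba8 c0 = rgb565_to_rgba8(blk->color0);
    rgba8 c1 = rgb565_to_rgba8(blk->color1);
    pal[0] = c0;
    pal[1] = c1;

    if (blk->color0 > blk->color1) {
        /* Expected ordering: 4-color mode, interpolants at 1/3 and 2/3. */
        pal[2] = (rgba8){ (uint8_t)((2*c0.r + c1.r) / 3), (uint8_t)((2*c0.g + c1.g) / 3),
                          (uint8_t)((2*c0.b + c1.b) / 3), 255 };
        pal[3] = (rgba8){ (uint8_t)((c0.r + 2*c1.r) / 3), (uint8_t)((c0.g + 2*c1.g) / 3),
                          (uint8_t)((c0.b + 2*c1.b) / 3), 255 };
    } else {
        /* color0 <= color1 "violates" the convention and thereby selects
         * the 3-color mode: index 2 is the midpoint, index 3 is transparent. */
        pal[2] = (rgba8){ (uint8_t)((c0.r + c1.r) / 2), (uint8_t)((c0.g + c1.g) / 2),
                          (uint8_t)((c0.b + c1.b) / 2), 255 };
        pal[3] = (rgba8){ 0, 0, 0, 0 };
    }
}
```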
There have been multiple ratios since r3xx I think, for depth, but I doubt it's only one per color buffer either - so blocks can be compressed by 1:2, 1:4 and so on (not sure exactly which ratios are available, probably more than these two), hence you need more bits per block to identify the compression scheme: 2 bits would do for just 2 ratios, as you need fast cleared, uncompressed, ratio 1, ratio 2, ...
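As an illustration of that kind of per-block selector - the state names and ratios here are my assumptions, not any specific GPU's scheme - a sketch of how 2 bits of out-of-block state could steer the memory fetch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit-per-block selector. The selector lives outside the
 * block; consulting it before the memory request is what turns the scheme
 * into a bandwidth saving. */
enum block_state {
    BLOCK_FAST_CLEARED = 0,  /* no data in memory, use the clear color */
    BLOCK_UNCOMPRESSED = 1,  /* raw pixels, ratio 1:1 */
    BLOCK_RATIO_2      = 2,  /* payload compressed 1:2 */
    BLOCK_RATIO_4      = 3   /* payload compressed 1:4 */
};

/* Read the 2-bit state of one block out of the packed metadata surface. */
static enum block_state block_state_of(const uint8_t *meta, size_t block_index)
{
    return (enum block_state)((meta[block_index / 4] >> ((block_index % 4) * 2)) & 3);
}

/* How many bytes actually have to be fetched for a block whose uncompressed
 * size is block_bytes (e.g. 256 bytes for an 8x8 RGBA8 block). */
static size_t block_fetch_bytes(const uint8_t *meta, size_t block_index, size_t block_bytes)
{
    switch (block_state_of(meta, block_index)) {
    case BLOCK_FAST_CLEARED: return 0;
    case BLOCK_UNCOMPRESSED: return block_bytes;
    case BLOCK_RATIO_2:      return block_bytes / 2;
    case BLOCK_RATIO_4:      return block_bytes / 4;
    }
    return block_bytes;
}
```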
My belief is that fixed-rate coding is employed, as this reduces complexity by a large amount; it also prevents the encoder from having to deal with unnecessary decision problems. Encoding and decoding times have to be symmetric for the given problem.
You can select, for example, an encoding with more planes but less precise deltas, or more precise deltas but fewer planes. Different code-block sizes allow a better best case, as the data just might compress well, but fixed code-block sizes with a lot of different codings allow a better worst case, as many more blocks are compressible. It's a tradeoff.
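A toy illustration of that tradeoff under a fixed bit budget (all numbers here are made up): with a constant block size, every extra plane/endpoint has to be paid for with coarser per-pixel deltas.

```c
#include <stdio.h>

/* Fixed-rate budget for an 8x8 RGBA8 block at an assumed 1:2 ratio. */
#define BLOCK_PIXELS     64    /* 8x8 block */
#define BUDGET_BITS      1024  /* 128 bytes */
#define MODE_BITS        4     /* selector for the chosen coding */
#define BITS_PER_PLANE   64    /* one full-precision base/endpoint color */

/* Bits left for each per-pixel delta once the planes are paid for. */
static int delta_bits_per_pixel(int planes)
{
    int payload = BUDGET_BITS - MODE_BITS - planes * BITS_PER_PLANE;
    return payload / BLOCK_PIXELS;
}

int main(void)
{
    for (int planes = 1; planes <= 4; ++planes)
        printf("%d plane(s): %2d delta bits per pixel\n",
               planes, delta_bits_per_pixel(planes));
    return 0;
}
```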
Also, I don't think this buffer is really loaded as a whole nowadays. For color this would be very problematic as you'd waste _a lot_ of transistors - essentially it would have to hold that information for 8 color buffers of 16kx16k (which is the max size with d3d11, not 8kx8k), that is 8MB (with the assumption of 2 bits per block and your 8x8 block assumption, which I don't think is quite accurate either, since IIRC nowadays this is really done per "memory block", hence the amount of pixels covered differs depending on the buffer format).
I never tried allocating and binding 16kx16k rendertargets; my gut feeling was just that it might be stopped by some soft constraint. It's no less than 1GB for RGBA8.
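For what it's worth, those numbers check out under the stated assumptions (2 bits per 8x8 block, 8 bound 16kx16k targets):

```c
#include <stdio.h>

/* Back-of-the-envelope check of the sizes mentioned above. */
int main(void)
{
    const long long w = 16384, h = 16384;   /* D3D11 max texture size */
    const long long targets = 8;            /* max simultaneous render targets */

    long long blocks_per_rt = (w / 8) * (h / 8);               /* 8x8 blocks   */
    long long meta_bytes    = targets * blocks_per_rt * 2 / 8; /* 2 bits each  */
    long long rgba8_bytes   = w * h * 4;                       /* one raw RT   */

    printf("block metadata for 8 RTs: %lld MiB\n", meta_bytes >> 20);   /* 8    */
    printf("one 16kx16k RGBA8 RT:     %lld MiB\n", rgba8_bytes >> 20);  /* 1024 */
    return 0;
}
```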
I would be interested in reading a description of such a scheme - other than ASTC, which I consider not quite suitable for rendertarget compression, especially because of the encoder complexity and problems.
Sure, you could say you only support it when there's just one color buffer or some such - meaning you miss it when you need the feature the most... It should be more efficient to just hold that information like other data - though this would increase latency in the (hopefully rare) case where the block information itself isn't yet in the cache.
I don't think we're in disagreement on this one. That said, I do think the net effect of actually touching the rendertarget(s) and piping the compressed data through the ROP caches is that the compressed data is "on the chip" afterwards.