Color compression, is it this simple?

DemoCoder

Veteran
Does anyone think that ATI's and/or NVidia's color compression is anything more complex than the obvious trick of storing a flag for each pixel that says whether or not all of the subsamples are identical?

To do 4:1 (best case) lossless compression that can be efficiently random-accessed, I would bet that they don't even use something as trivial as run-length encoding, but simply optimize multisample reads and writes with a flag bit (rough sketch at the end of this post).

Does anyone have any information to the contrary? Does the compression actually work when multisample FSAA is turned off?
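
Roughly what I have in mind, as a sketch - 4x multisampling assumed, and all the names (Pixel, read_sample, etc.) are invented:

Code:
#include <stdint.h>

#define SAMPLES 4  /* assuming 4x multisampling */

/* One flag per pixel: set when all subsamples hold the same color.
 * Storage for all SAMPLES colors is always reserved (lossless worst
 * case), so the flag saves bandwidth, not space. */
typedef struct {
    uint32_t samples[SAMPLES]; /* RGBA8 subsample colors */
    uint8_t  identical;        /* 1 => only samples[0] needs fetching */
} Pixel;

/* A fully covered pixel writes one color and sets the flag -
 * one 32-bit write instead of four. */
static void write_covered(Pixel *p, uint32_t color) {
    p->samples[0] = color;
    p->identical  = 1;
}

/* The resolve/read path fetches one color instead of SAMPLES colors. */
static uint32_t read_sample(const Pixel *p, int i) {
    return p->identical ? p->samples[0] : p->samples[i];
}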
 
Sounds vaguely reasonable to me.

Though you might have trouble figuring out where to put the flag so that it doesn't destroy your coherency.
 
I would say it's all block-based: the buffer is divided into little 2x2 blocks, each of which is either compressed (interior of a polygon) or uncompressed (edge case). Alongside the actual storage blocks you have headers (probably stored in a completely different place, grouped for multiple blocks), and these headers are probably cached on-chip. The headers are quite simple, since it's just 1 bit to decide compressed/uncompressed. All edge cases go uncompressed, and since the number of edges skyrockets with the influx of smaller triangles, I'd have my doubts about the actual final total bandwidth compression rate. Storage space is not compressed at all, since the worst case has to be supported - which means all blocks could be edges and the full storage space is required.
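
Something like this is the layout I'm picturing - the buffer size and all the names are pure guesswork:

Code:
#include <stdint.h>

/* Hypothetical header layout for a 1024x1024 buffer of 2x2 blocks.
 * Headers are bit-packed separately from the color data, so many fit
 * in one burst and can live in a small on-chip header cache. */
#define BLOCKS_X (1024 / 2)
#define BLOCKS_Y (1024 / 2)

static uint8_t header_bits[BLOCKS_X * BLOCKS_Y / 8]; /* 1 bit per block */

static int block_is_compressed(int bx, int by) {
    int idx = by * BLOCKS_X + bx;
    return (header_bits[idx / 8] >> (idx % 8)) & 1;
}

static void mark_block(int bx, int by, int compressed) {
    int idx = by * BLOCKS_X + bx;
    if (compressed) header_bits[idx / 8] |=  (uint8_t)(1u << (idx % 8));
    else            header_bits[idx / 8] &= (uint8_t)~(1u << (idx % 8));
}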

All IMHO and guesses 8)
 
Dunno exactly what methods R300/NV30 use for color compression, although I suspect that it isn't really much more than a few flags to indicate which samples of a multisampled pixel have the same color - this would fit nicely with the claims that NV30 can do almost-free AA. In NV30's case, there is probably an additional flag per pixel block to indicate whether the pixel block has been rendered to at all yet - something like this is necessary in order to do the fast color buffer clearing that NV30 apparently supports (sketched below).

Run-length encoding doesn't sound very useful for color compression, as just about any rendering technique except flatshading would break it.
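
For the fast clear, my guess would be a second header bit per block, along these lines (entirely hypothetical, of course):

Code:
#include <stdint.h>

/* Guessed two header bits per pixel block: "all samples identical"
 * plus "untouched since the last clear". */
enum { HDR_COMPRESSED = 1, HDR_CLEARED = 2 };

static uint32_t clear_color; /* registered once per fast clear */

static uint32_t read_sample(uint8_t hdr, const uint32_t *samples, int i) {
    if (hdr & HDR_CLEARED)    return clear_color; /* no memory read at all */
    if (hdr & HDR_COMPRESSED) return samples[0];  /* one read, replicated */
    return samples[i];                            /* raw multisampled read */
}

/* A "fast clear" only rewrites the headers, never the color blocks. */
static void fast_clear(uint8_t *headers, int nblocks, uint32_t color) {
    clear_color = color;
    for (int i = 0; i < nblocks; i++) headers[i] = HDR_CLEARED;
}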
 
I believe you all are probably correct about the compression using a flag of some sort. If it works on 2x2 blocks then it should work well with AA, but it will rarely work without AA. This is consistent with Nvidia's claims.
 
Well, hopefully it manages to at least work somewhat without AA, or, more importantly, at edge pixels. But yes, I'm sure that the compression technique is optimized for AA, since that's where most of the memory bandwidth hit occurs.

Hopefully there's also been a lot of thought put into what happens at FSAA edges. Something akin to a coverage mask technique might be just the ticket, but I have no idea how it could be implemented with proper coherency.

Update: Actually, now that I think about it, there might be a way.

Instead of 2x2 blocks, you could have 2x6 blocks (would work well with 4:3 aspect ratios, but it could be another number). If the block is labeled as compressed (there might even be multiple compression techniques at work for different scenarios... akin to FXT), then the first 32 bits describe the coverage of each color, for a maximum of three colors for each 2x2 piece of the block. The first two colors take up four bits each, with the last color's mask being a simple NOR of the first two.

Obviously this exact technique is not optimal given the non-power-of-two block size, but I'm sure other variants are possible.
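
To make the idea concrete, here's roughly how one 2x2 piece might encode and decode - hypothetical code, not a claim about actual hardware:

Code:
#include <stdint.h>

/* One 2x2 piece: up to three distinct colors, two explicit 4-bit
 * coverage masks, the third mask implied as the NOR of the first two. */
typedef struct {
    uint32_t color[3];
    uint8_t  mask0, mask1;  /* 4 bits each; mask2 = ~(mask0|mask1) & 0xF */
} Quad;

/* Returns 0 if the four pixels need more than three colors
 * (such a piece would stay uncompressed, keeping things lossless). */
static int encode_quad(const uint32_t px[4], Quad *q) {
    int ncolors = 0;
    q->mask0 = q->mask1 = 0;
    for (int i = 0; i < 4; i++) {
        int c;
        for (c = 0; c < ncolors; c++)
            if (q->color[c] == px[i]) break;
        if (c == ncolors) {
            if (ncolors == 3) return 0;    /* too many distinct colors */
            q->color[ncolors++] = px[i];
        }
        if (c == 0) q->mask0 |= (uint8_t)(1 << i);
        if (c == 1) q->mask1 |= (uint8_t)(1 << i);
        /* c == 2 is implicit: neither bit set */
    }
    return 1;
}

static uint32_t decode_pixel(const Quad *q, int i) {
    if (q->mask0 & (1 << i)) return q->color[0];
    if (q->mask1 & (1 << i)) return q->color[1];
    return q->color[2];                    /* NOR of the two masks */
}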
 
Chalnoth said:
Update: Actually, now that I think about it, there might be a way.

Instead of 2x2 blocks, you could have 2x6 blocks (would work well with 4:3 aspect ratios, but it could be another number). If the block is labeled as compressed (there might even be multiple compression techniques at work for different scenarios... akin to FXT), then the first 32 bits describe the coverage of each color, for a maximum of three colors for each 2x2 piece of the block. The first two colors take up four bits each, with the last color's mask being a simple NOR of the first two.

Surely that is lossy, and the claims are for lossless compression.
 
I think both are block-based, and it's going to be factored by the number of AA samples - i.e. for 4X AA (on R300) it will be 2x2 blocks, and for 6X, 2x3 (or 3x2, dependent on how the buffer is arranged).

You can see the factoring taking place because ATI's Z-buffer compression is 4:1 in the first place; with 6X AA, however, they quote it as up to 24:1 (arithmetic sketched below).

I'd assume that NV30 operates in a similar fashion, the only difference being that NV30 appears to have it built in with AA off as well as on.
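
The arithmetic behind the 24:1 figure, as I read it (just my interpretation of the quoted numbers):

Code:
/* Best case is just the base block ratio multiplied by the sample
 * count, since with MSAA every sample of an interior pixel is
 * identical and only one copy needs to move across the bus. */
static int best_case_ratio(int base_ratio, int aa_samples) {
    return base_ratio * aa_samples; /* 4 * 6 = 24 -> "up to 24:1" */
}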
 