Will Microsoft ever adopt a new compression method?

Dio said:
I do think variable-rate compression is interesting and possibly has some potential, but if it starts needing index blocks and the like it's messy - and the surface simplicity of DXTC was a big help in its adoption, I think.
AFAICS, the only other option for variable rate would be to have "large" data blocks (e.g. 256-bit) that decode to large NxN pixel blocks with variable compression rates internally. I tried something a little less ambitious when I was researching texture compression methods, and decoding Huffman-like data in one or two cycles is deeply unpleasant!

As to what drove the decision - well, only the mathemagician knows that :). I think DXTC is a great format myself...

Well, there are a few other hints: S3TC clearly grew from "CCC" (described in one of the SIGGRAPH papers) which, IIRC, used 4x4 blocks of pixels. Each block stored two 8-bit palette indices (to select two base colours) and then each of the 16 pixels had a one-bit index to choose which base colour to use. S3TC (sort of) doubles the storage cost, adds the implied colours, and eliminates the palette indirection.
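
To make that concrete, here's a rough sketch of a CCC-style block decode (the struct layout and names are purely my illustration, not the actual format from the paper):
Code:
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

/* Illustrative CCC-style block: two palette indices plus a 16-bit mask,
   i.e. 32 bits for a 4x4 block of texels (2bpp, ignoring the palette). */
typedef struct {
    uint8_t  base[2];   /* indices into a shared 256-entry colour palette */
    uint16_t mask;      /* one bit per texel: selects base[0] or base[1]  */
} ccc_block_t;

/* Decode one texel (x, y in 0..3) of a CCC block. */
static rgb_t ccc_texel(const ccc_block_t *blk, const rgb_t palette[256],
                       int x, int y)
{
    int bit = (blk->mask >> (y * 4 + x)) & 1;
    return palette[blk->base[bit]];
}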

What did disappoint me WRT DXTC is that it didn't have a 4bpp variable-alpha variant. I did quite a few experiments when S3TC/DXTC first came out using a variant that had N levels of alpha, and in most cases the quality was fine. The 8bpp for the DXT2+ modes just seemed like overkill. <shrug>
 
Dave B(TotalVR) said:
Personally I think VQ compression is great. Sure, compressing the textures is an overnight job, which is an ass, but you get so much better compression ratios than with S3TC, especially as the texture gets larger. You can also read and decompress a VQ compressed texture quicker than you can read an uncompressed texture, which is stunning if you ask me.

You can get about twice the compression ratio of DXTC (for colour-only images), but the difference between 2bpp and 4bpp is not really of much interest in the video card market. Smaller than 4bpp is of interest mainly in areas where memory is at a huge premium (handheld devices etc.). As far as video cards go, if VQ's higher compression ratio couldn't win the day back when it was first introduced (when devices might typically have only about 8-16 MB of onboard RAM), then it is hardly likely to be a convincing argument now.

In the consumer 3D space the most interesting aspect of compression is increasing the efficiency of texturing, and DXTC solves that problem just fine - the added benefits of dropping to 2bpp vs. 4bpp in overall texturing efficiency are generally pretty marginal (considering you've already dropped from 24bpp->4bpp, and effectively from 32bpp->4 bpp, since most 3D hardware does not use packed texel formats).

Whether the image quality of VQ at 2bpp is equivalent to DXTC at 4bpp is a long and involved discussion in and of itself, but in most typical cases I believe it to be somewhat lower quality overall (although in the same ballpark). Of course each compression method has different strong and weak points in terms of IQ, and therefore the exact situation varies from image to image. I know that Simon had a comparison of some aspects of this on his homepage where he made some interesting observations on quality/bit.

VQ compression is also not great for hardware, as Simon has touched upon, since you need to hide an additional indirection. Also, for properly orthogonal support you have to be able to use N different sets of VQ palettes, where N is the number of simultaneous textures you support.

IMO the 4:1 compression ratio of alpha textures with S3TC is pitiful. I remember VQ getting 8:1 compression ratios; there was some comparison article written ages ago. I'll see if I can find it.

'Pitiful' is an interesting choice of words, and is probably taking things too far. It's certainly a low compression rate, and assigning as many bits of storage to one 8-bit component as to the other three is obviously not optimal. On the other hand it does its job well, and you get an alpha channel that is compressed with practically no fidelity loss. It could certainly be better, but if it were 'pitiful' it would not be very useful, and that is certainly not the case - going from 32bpp to 8bpp is very useful.
 
MDolenc said:
Once shaders are fast enough, why not do texture (de)compression with them?
As shaders get faster, developers are just as likely to want to use all those extra cycles for something else. Besides, doing random access of texture data and performing bit unpacking is not going to be fast with the current instruction set, so if you're going to add specialised HW to make the system faster, you might as well make it automatically decompress textures (all IMHO).

Hyp-X said:
I agree that space is the most important part (while being faster IS a nice thing as well). And you can no longer rely on AGP memory either, as you'll soon run out of base memory as well.
If you have lots of textures being used in the same image then it's not just space: bandwidth is still going to be a problem.

(To Humus:) I don't see bandwidth issues magically going away simply because there's now this great opportunity to write really slow shader code :)
Dave B said:
Personally I think VQ compression is great. Sure, compressing the textures is an overnight job...
Hardly. It took ~10-20% longer than the S3 compression tool on my old PC.
...but you get so much better compression ratios than with S3TC, especially as the texture gets larger.
I would also say that, on average, the quality was slightly lower with DC's VQTC than S3TC but, given the ~2-fold decrease in storage costs, this was completely acceptable.
You can also read and decompress a VQ compressed texture quicker than you can read an uncompressed texture, which is stunning if you ask me.
You'll get that with S3TC as well. The HW to do fast decompression has to be included in the system, and you get savings because compressed textures use the external memory bus far less, freeing it up for other tasks.

IMO the 4:1 compression ratio of alpha textures with S3TC
I personally dislike quoting a 'ratio' for texture compression since the schemes are nearly always lossy. FWIW, DXTC storage costs are 4bpp for opaque (and punch-through) and 8bpp for translucent textures, while the DC's VQ was always ~2bpp.
(Note that putting alpha in the VQ sometimes meant more degradation because it was trying to represent more data with the same number of bits.)
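
To put those costs in concrete terms, here's a quick back-of-the-envelope for a single 256x256 texture, no mipmaps (my own arithmetic; the VQ figure ignores the codebook overhead):
Code:
/* Back-of-the-envelope storage for one 256x256 texture (no mipmaps). */
enum { TEXELS = 256 * 256 };                      /* 65,536 texels         */
unsigned uncompressed_32bpp = TEXELS * 4;         /* 262,144 bytes: 256 KB */
unsigned dxt1_4bpp          = TEXELS / 2;         /*  32,768 bytes:  32 KB */
unsigned dxt2to5_8bpp       = TEXELS * 1;         /*  65,536 bytes:  64 KB */
unsigned dc_vq_2bpp         = TEXELS / 4;         /*  16,384 bytes:  16 KB */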

Simon, are you allowed to hint as to exactly how PVR-TC works? Is it an extension of VQ, or a whole new thing, or what?
It'll become public eventually, but for now I'll not say anything on it other than it's nothing like the VQ (i.e. it has no indirection) and is reasonably cheap to implement in HW (e.g. it's used in MBX which has a very tight gate budget).

andypski said:
VQ compression is also not great for hardware, as Simon has touched upon, since you need to hide an additional indirection. Also, for properly orthogonal support you have to be able to use N different sets of VQ palettes, where N is the number of simultaneous textures you support.
IIRC DC used a second cache stage and each texture had its own palette/codebook (although, having said this, for small textures where the compression ratio would effectively decrease, you could actually pack textures together so they borrowed codes from neighbouring textures).
 
Simon F said:
AFAICS, the only other option for variable rate would be to have "large" data blocks (e.g. 256-bit) that decode to large NxN pixel blocks with variable compression rates internally. I tried something a little less ambitious when I was researching texture compression methods, and decoding Huffman-like data in one or two cycles is deeply unpleasant!
Oh yes. Been there, when I was writing a JPEG decoder on a 56001 DSP. It's very clumsy to try to do in HW.

I think the problem with a 'fixed/variable' JPEG-based system is that the (compressed) size ratio between 'large' and 'small' blocks in a JPEG image is large, so I envisage that in this kind of system the 'easy' bits of the image get a larger percentage of the bandwidth, while the 'hard' bits get less....


Simon F said:
What did disappoint me WRT DXTC is that it didn't have a 4bpp variable-alpha variant. I did quite a few experiments when S3TC/DXTC first came out using a variant that had N levels of alpha, and in most cases the quality was fine. The 8bpp for the DXT2+ modes just seemed like overkill. <shrug>
I think the assumption was that the alpha channel would need higher precision. Certainly this is what I tend to see nowadays... of course there are trivial modifications that would give some alpha support for lower bit accuracy in RGB.
 
OK, I'm a bit late here, but I've been away.

andypski said:
Overall it might not be a gain at all - block->block noise (low frequency noise created by colour choice mismatches at block intersections) is one of the key problems that a high quality DXTC compressor needs to deal with, and is very difficult to solve optimally. In addition to this the low-frequency and structured nature of this noise makes it one of the most noticeable artifacts caused by the compression. With a larger block size the errors from block->block are likely to get larger.

If I got it right, DXT1 (I used that name to explicitly say "no alpha on the side") has two different compression block types: a 3-color block and a 2-color-1-transparent block. Each block has its own mini-palette. The type of the block can be chosen independently per block.

So the palette is chosen per 4x4 texel block, and the compression type per 4x4 texel block.

In the S3TC-like compression blocks, FXT1 selects the palette per 4x4 texel block. But the compression type is, as always in FXT1, selected per 8x4 block.

Are you saying that:
1) The slightly finer granularity in compression type selection will reduce block->block noise.
And/or
2) None of the extra modes in FXT1 would ever be used. Not even for, say, slow gradients, or multi-bit alpha modes still at 4bpp.
 
Basic said:
If I got it right, DXT1 (I used that name to explicitly say "no alpha on the side") has two different compression block types: a 3-color block and a 2-color-1-transparent block.
A minor correction: The two modes are "4-colours" (2 implied from the 2 stored base colours) and "3-colours + transparent black".
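
To illustrate, here's a minimal sketch of how the four palette entries of a DXT1 block are derived (exact rounding rules vary between implementations, so treat this as illustrative rather than as the spec):
Code:
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Expand a packed 5:6:5 colour to 8 bits per channel. */
static rgba_t expand565(uint16_t c)
{
    rgba_t o = { (uint8_t)(((c >> 11) & 31) * 255 / 31),
                 (uint8_t)(((c >>  5) & 63) * 255 / 63),
                 (uint8_t)(( c        & 31) * 255 / 31), 255 };
    return o;
}

/* Weighted blend of two colours: (wa*a + wb*b) / d, with wa + wb == d. */
static rgba_t blend(rgba_t a, rgba_t b, int wa, int wb, int d)
{
    rgba_t o = { (uint8_t)((wa * a.r + wb * b.r) / d),
                 (uint8_t)((wa * a.g + wb * b.g) / d),
                 (uint8_t)((wa * a.b + wb * b.b) / d), 255 };
    return o;
}

/* DXT1 per-block palette: the mode is selected simply by comparing the
   two stored 565 endpoints as unsigned integers. */
static void dxt1_palette(uint16_t c0, uint16_t c1, rgba_t pal[4])
{
    pal[0] = expand565(c0);
    pal[1] = expand565(c1);
    if (c0 > c1) {                              /* "4-colour" block          */
        pal[2] = blend(pal[0], pal[1], 2, 1, 3);
        pal[3] = blend(pal[0], pal[1], 1, 2, 3);
    } else {                                    /* "3 colours + transparent" */
        pal[2] = blend(pal[0], pal[1], 1, 1, 2);
        pal[3] = (rgba_t){ 0, 0, 0, 0 };        /* transparent black         */
    }
    /* Each of the 16 texels then picks one of pal[0..3] with a 2-bit index. */
}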
 
Basic said:
Are you saying that:
1) The slightly finer granularity in compression type selection will reduce block->block noise.
And/or
2) None of the extra modes in FXT1 would ever be used. Not even for, say, slow gradients, or multi-bit alpha modes still at 4bpp.

I had a long reply to this typed in but lost it in a crash, so this version is going to be fast and dirty, but I hope clear enough.

My recollection of FXT1 is that it had 4 compression modes -

CC_MIXED. 4x4 block, two 565 endpoints and two interpolant colours. Basically identical to S3TC.

CC_HI. 8x4 block, two 555 endpoints and five interpolants, with one explicit transparent encoding.

CC_CHROMA. 8x4 block, four explicit 555 colours.

CC_ALPHA. 8x4 block, three 5555 endpoints; two interpolants between endpoints 0 and 1 for the left 4x4 area and two between endpoints 1 and 2 for the right 4x4 area.
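
To make the CC_ALPHA layout concrete, here is a rough sketch of the palette construction as described above (the 1/3-2/3 interpolation weights and the component handling are my assumption, by analogy with S3TC, not taken from the FXT1 spec):
Code:
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Sketch of CC_ALPHA palette construction: three endpoints e[0..2]; the
   left 4x4 half interpolates between e[0] and e[1], the right half between
   e[1] and e[2].  Each half ends up with a 4-entry palette. */
static void cc_alpha_palettes(const rgba_t e[3],
                              rgba_t left[4], rgba_t right[4])
{
    for (int half = 0; half < 2; ++half) {
        rgba_t *pal      = half ? right : left;
        const rgba_t *lo = half ? &e[1] : &e[0];
        const rgba_t *hi = half ? &e[2] : &e[1];
        pal[0] = *lo;
        pal[3] = *hi;
        /* Assumed 1/3 and 2/3 interpolants, S3TC-style. */
        pal[1] = (rgba_t){ (uint8_t)((2*lo->r + hi->r)/3),
                           (uint8_t)((2*lo->g + hi->g)/3),
                           (uint8_t)((2*lo->b + hi->b)/3),
                           (uint8_t)((2*lo->a + hi->a)/3) };
        pal[2] = (rgba_t){ (uint8_t)((lo->r + 2*hi->r)/3),
                           (uint8_t)((lo->g + 2*hi->g)/3),
                           (uint8_t)((lo->b + 2*hi->b)/3),
                           (uint8_t)((lo->a + 2*hi->a)/3) };
    }
}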

The CC_MIXED mode could not coexist in an image with the other formats because this created a dependency between the format chosen per block and the texel addressing (how do you find a specific texel on a line that is composed of different block sizes?) You would need to add an index of some kind, which would get messy.

So in mixed-mode images only the 8x4 formats are available. The compression of both S3TC and FXT1 on colour images is 4bpp, so for each 8x4 FXT1 block you have 2 S3TC blocks in a direct apples->apples comparison. Looking at the descriptions above it can be seen that, for colour-only data, S3TC should always be a superior representation to either the CC_CHROMA or CC_ALPHA formats, and it is questionable whether CC_HI is better for gradients as well (lower endpoint precision with one extra interpolant vs. two extra explicit colours at higher precision).

So for colour images FXT1 is pretty much a bust - I would expect that on almost all images the best S3TC compressor would beat the best FXT1 compressor for quality.

For images with alpha the presence of the CC_ALPHA format makes things interesting because it allows 4bpp compression of complex alpha, but the image quality will be lower than the 8bpp S3TC equivalent, so it is questionable if this is a big advantage.

Overall FXT1 seemed like a bit of a 'me too' exercise on the part of 3dfx and didn't really offer anything compelling above what S3TC provided, so it's not surprising that it didn't generate much interest.

- Andy.
 
Simon:
DOH :oops: Are you sure that two bits can represent 4 values? :)
I'll blame it on my ever-recurring post-Xmas cold.

andypski:

CC_MIXED has two sub-modes (just as S3TC). CC_MIXED blocks are 8x4, but they are split into two 4x4 sub-blocks.
Sub-modes:
CC_MIXED non-transparent: One 555 endpoint and one 565 endpoint, 2 interpolant colors.
CC_MIXED transparent: One 555 endpoint and one 565 endpoint, 1 interpolant color, 1 transparent "color".

And it certainly is possible to mix CC_MIXED freely with the other modes in the same texture. The only limitation is that both S3TC-like blocks in a CC_MIXED block must use the same mode (transparent/non-transparent).

Enhancements possible with small changes to the standard (not using any more memory):
CC_MIXED: all endpoints 565
CC_HI: one endpoint 565
CC_CHROMA: all endpoints 565
CC_ALPHA: one extra bit in one of the endpoints - not really worth it.
There is coding space left over for other compression modes, if someone sees a new useful mode.
 
Basic said:
CC_MIXED has two sub-modes (just as S3TC). CC_MIXED blocks are 8x4, but they are split into two 4x4 sub-blocks.
Sub-modes:
CC_MIXED non-transparent: One 555 endpoint and one 565 endpoint, 2 interpolant colors.
CC_MIXED transparent: One 555 endpoint and one 565 endpoint, 1 interpolant color, 1 transparent "color".

And it certainly is possible to mix CC_MIXED freely with the other modes in the same texture. The only limitation is that both S3TC-like blocks in a CC_MIXED block must use the same mode (transparent/non-transparent).

Aha! My (somewhat old) memory of FXT1 must be playing tricks on me.

Being able to include CC_MIXED blocks does make it a bit more interesting, but the other block modes still seem to be fairly uninteresting in general. The CC_ALPHA mode is still the only really interesting extension as explained above. In addition, reducing the precision of one of the endpoints can cause some problems due to increased quantisation noise, and the ability to freely mix 3 and 4 colour blocks in DXTC (without restrictions) can also marginally improve compression quality in some cases with smart compressors.

Changing the encoding to keep higher resolution endpoints at all times would make the spec much more interesting as it should then beat S3TC in all cases (although perhaps only marginally) - can you outline this suggestion and the new encoding?

- Andy.
 
The redundancy in FXT1 is in the placement of the base colors. It's possible to swap the base colors and then make the corresponding changes in the index field.

This can be used in different ways:
You could treat the colors as 15-bit integers (dropping the green LSB) and do compares between them.
So, e.g., for (one half of) CC_MIXED:
Code:
/* Compare the two base colors as 15-bit values (green LSB stripped off). */
if (color0 < color1) {
    /* color0 is the "smaller" color: its green LSB is implied to be 0.   */
    color0.green_lsb = 0;
    color1.green_lsb = explicit_bit;   /* the single explicitly stored bit */
} else {
    /* Otherwise color1 is the "smaller" (or equal) color: its LSB is 1.  */
    color0.green_lsb = explicit_bit;
    color1.green_lsb = 1;
}
This scheme would btw be "compatible" with FXT1 in the sense that if you compress with FXT1 and decompress with this "FXT2" (or the other way around), the output will still be correct except for the green lsb.

Another way is to lock certain texels to certain (groups of) colors:
For CC_CHROMA:
Texel 0 always uses color0 => 2 bits freed in the index array
Texel 1 always uses color0/1 => 1 bit freed in the index array
And then do a comparison as above between color2/3 for the fourth bit.
That gives the four needed green LSB bits.
There is actually one bit that isn't used at all in this mode, so that one could be used as the fourth bit instead. But then you'd waste easy coding space for future enhancements with new compression modes.
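
As a sketch of how a decoder might put those four bits back together (this is just my reading of the scheme; the struct layout and which freed bit carries which color's LSB are illustrative choices, not a spec):
Code:
#include <stdint.h>

/* Hypothetical "FXT2" CC_CHROMA block as described above: four 555 base
   colors (to be upgraded to 565) plus a 2-bit selector per texel of the
   8x4 block. */
typedef struct {
    uint16_t color[4];    /* 555 colors, compared as 15-bit integers */
    uint8_t  index[32];   /* 2-bit selector per texel                */
} cc_chroma_t;

static void recover_green_lsbs(const cc_chroma_t *blk, int lsb[4])
{
    /* Texel 0 is locked to color0, so both of its index bits are free. */
    lsb[0] =  blk->index[0]       & 1;
    lsb[1] = (blk->index[0] >> 1) & 1;
    /* Texel 1 only ever selects color0/1, so one of its bits is free.  */
    lsb[2] = (blk->index[1] >> 1) & 1;
    /* The ordering of color2 vs color3 carries the fourth bit, as in
       the CC_MIXED comparison trick above.                             */
    lsb[3] = (blk->color[2] < blk->color[3]) ? 0 : 1;
}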
 
So in your encoding I have 1 explicit bit that I specify per (4x4) CC_MIXED block as 0 or 1, and an implicit encoding from the ordering of the endpoint values that manipulates the effective LSB that is then substituted back into the green channel of each endpoint? I worked your example through, but couldn't generate the case where I could have colour 0's LSB as 1 and colour 1's LSB as 0 - have I misunderstood the encoding?
 
You're right that color0.glsb=0 and color1.glsb=1 together isn't possible. But you don't need that!
Think of it like this: the color order determines the green LSB of the "smallest" color, while the explicit bit is the green LSB of the "largest" color. With that in mind, you can easily see that you can set the green LSB of both colors to anything you want.
You just never need to set color0.glsb=0 and color1.glsb=1; the colors would be swapped instead.

There is just one "problem": the case where the colors are equal once the green LSB is stripped off. But OTOH the solution is quite simple. If the colors are equal, then color1's green LSB is 1 according to the rules above. So set the explicit bit to 0, and you've got all cases covered.
 
I'll read that as "I see what you're saying". (Normally I would interpret "Gotcha" as "I nailed you down there".)

Btw, you didn't seem too impressed by the CC_CHROMA mode at all. I know that it doesn't have the fine gradients in it, so it's just 4 colors total over the whole 8x4 block. But remember that it's the only mode that breaks the "one-dimensional" color limit. A block where three different colors meet will get big errors in all other modes. Try to get red, green and blue into the other modes.


Now back to a different place where compression might be interesting.
What if a GPU has virtualized memory (like the P10)? Textures are split into blocks. (I don't remember the exact block size for the P10 - was it 4KB? => 32x32 texels @ 32bit.) If each of those 32x32 blocks were ~JPEG compressed, and decompressed by the GPU as they were loaded into gfx card mem, then AGP memory could suddenly become a lot more useful.

AGP bandwidth would be virtually multiplied by the JPEG compression ratio. The JPEG decompression wouldn't need to work in random order (as normal texture decompression schemes must). The possibly variable block size wouldn't be such a big problem here, since we could use a table to see where the blocks are stored. The table would be just 1/1024 of the (uncompressed) texture size, and the cost of the indirection isn't that bad since it's only incurred when loading new textures over AGP, and it comes paired with the transfer of a rather large block of data.

It should be noted, though, that the working set of textures should still fit into gfx card mem - the "working set" being those textures needed to render one entire frame, which will probably be there next frame too. In other words, textures should still only be loaded over AGP when getting into new areas/revealing new textures.
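
A rough sketch of the kind of block table I mean (the names and layout are just illustrative):
Code:
#include <stdint.h>
#include <stddef.h>

/* Hypothetical block table for JPEG-compressed 32x32-texel pages: one
   32-bit offset per 4KB (uncompressed) block, so the table itself costs
   roughly 1/1024 of the uncompressed texture size. */
typedef struct {
    int             blocks_w;   /* texture width  / 32                     */
    int             blocks_h;   /* texture height / 32                     */
    const uint32_t *offsets;    /* blocks_w*blocks_h + 1 entries; entry
                                   n+1 also marks where block n ends       */
    const uint8_t  *stream;     /* compressed data in system/AGP memory    */
} compressed_texture_t;

/* Look up one compressed block.  The indirection is only taken when a
   block is uploaded to gfx card mem, not per texel, so it stays cheap. */
static const uint8_t *block_data(const compressed_texture_t *t,
                                 int bx, int by, size_t *length)
{
    int i = by * t->blocks_w + bx;
    *length = t->offsets[i + 1] - t->offsets[i];
    return t->stream + t->offsets[i];
}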
 
Basic said:
I'll read that as "I see what you're saying". (Normally I would interpret "Gotcha" as "I nailed you down there".)

I see what you're saying :)

Btw, you didn't seem too impressed by the CC_CHROMA mode at all. I know that it doesn't have the fine gradients in it, so it's just 4 colors total over the whole 8x4 block. But remember that it's the only mode that breaks the "one-dimensional" color limit. A block where three different colors meet will get big errors in all other modes. Try to get red, green and blue into the other modes.

It may have some uses, but I suspect the number of instances where it would be selected over standard S3TC blocks is fairly rare in terms of overall error - it might reduce blocking slightly in some circumstances, but in real-world images/textures you can usually trade off some chroma accuracy in small areas without much visual blocking, particularly if there is any high-frequency noise structure in the area - this would tend to mask the chroma inaccuracy. I suspect CC_CHROMA would come into play if you were trying to compress images such as HUD displays (which might have widely different primary colours in a single block).

With the alterations in encoding, FXT(2!) becomes more interesting, but it's still not much of an advance over S3TC on the kinds of images it's designed to represent. It would be interesting to write a good encoder for this new format to see how it performs. 3dfx's original FXT1 encoder was really not very good at all (and it was so slow...)
 
You're probably right that the places where CC_CHROMA would be of most use are HUDs, or displays/signs/maps/diagrams, which can often be colorful. But I guess I can't be sure unless I have some example pictures. Maybe it's OK to have rather large chroma errors if they are confined to a small area. (But since it's for textures, you never know how large an area they will be expanded to.)

3dfx's original FXT1 encoder was really not very good at all (and it was so slow...)

Well, that's one thing we agreed on all along. If you have the encoder, then try anything with small black and white details (so that there's black and white in every block). Then try to compress it over a weekend or so... In some modes they actually did exhaustive searches, but used an incorrect measure of what was optimal. :oops: I think I have the FXT1 source lying around somewhere.

I actually started to write an FXT1/2 en-/de-coder, but kinda lost interest when no one seemed to be interested in the format, and then 3dfx disappeared.

Btw, I know that 3dfx's FXT1 encoder didn't attempt to do any dithering; did S3's compressor do that?
 
Basic said:
Btw, I know that 3dfx's FXT1 encoder didn't attempt to do any dithering; did S3's compressor do that?

Of course the source code for S3's encoder was never made public.

(But no, it didn't... ;))

You can check that out on example images easily enough.
 
But you know because you were involved in writing it? :)

I just have a feeling you're already closer to an FXT* compressor than I ever was, because you already have some nifty optimization source lying around.
 
Basic said:
But you know because you were involved in writing it? :)

I just have a feeling you're already closer to an FXT* compressor than I ever was, because you already have some nifty optimization source lying around.

I can't lay claim to writing the original S3 compressor, although I do work extensively with the guy who did write it. :) (On compression, amongst other things...)

Since I no longer work for S3, I'm afraid I don't have access to the original compressor any more (it was a very clever one in many ways).

It should be possible to write an even better version than the original S3 one, but I think it would take significant time and research to beat it by any meaningful margin. I know that Simon F. thought that his own S3TC compressor gave better results on some images at least - he commented on this on his home page, although we would respectfully disagree (on the limited basis we have for comparison). ;)
 