Will Microsoft ever adopt a new compression method?

Discussion in 'General 3D Technology' started by Brimstone, Dec 21, 2002.

  1. Dave B(TotalVR)

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    491
    Likes Received:
    3
    Location:
    Essex, UK (not far from IMGTEC:)
  2. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    AFAICS, the only other option for variable rate would be to have "large" data blocks (e.g. 256-bit) that decode to large NxN pixel blocks with variable compression rates internally. I tried something a little less ambitious when I was researching texture compression methods, and decoding huffman-like data in one or two cycles is deeply unpleasant!

    Well, there are a few other hints: S3TC clearly grew from "CCC" (described in one of the SIGGRAPH papers) which, IIRC, used 4x4 blocks of pixels. Each block stored two 8-bit palette indices (to select two base colours) and then each of the 16 pixels had a one-bit index to choose which base colour. S3TC (sort of) doubles the storage cost, adds the implied colours, and eliminates the palette indirection.
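
    As a rough bit-budget sketch of the two layouts (field sizes for CCC taken from my description above rather than from the original paper, so treat it as illustrative):
    Code:
/* CCC: per 4x4 block, two 8-bit palette indices pick the base colours,
   plus a 1-bit selector per texel. */
#define CCC_BLOCK_BITS   (2*8 + 16*1)    /* 32 bits -> 2 bpp */

/* S3TC/DXT1: per 4x4 block, two explicit 16-bit (565) base colours
   (the two in-between colours are implied), plus a 2-bit selector per texel. */
#define DXT1_BLOCK_BITS  (2*16 + 16*2)   /* 64 bits -> 4 bpp */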

    What did disappoint me WRT DXTC is that it didn't have a 4bpp variable alpha variant. I did quite a few experiments when S3TC/DXTC first came out using a variant that had N levels of alpha, and in most cases the quality was fine. The 8bpp for the DXT2+ modes just seemed like overkill. <shrug>
     
  3. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    You can get about twice the compression ratio of DXTC (for colour-only images), but the difference between 2bpp and 4bpp is not really of much interest in the video card market. Smaller than 4bpp is of interest mainly in areas where memory is at a huge premium (handheld devices etc). As far as video cards go, if VQ's higher compression ratio couldn't win the day back when it was first introduced (when devices might typically have only about 8-16 MB of onboard RAM), then it is hardly likely to be a convincing argument now.

    In the consumer 3D space the most interesting aspect of compression is increasing the efficiency of texturing, and DXTC solves that problem just fine - the added benefits of dropping to 2bpp vs. 4bpp in overall texturing efficiency are generally pretty marginal (considering you've already dropped from 24bpp->4bpp, and effectively from 32bpp->4 bpp, since most 3D hardware does not use packed texel formats).
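
    To put rough numbers on that (pure arithmetic for a hypothetical 1024x1024 texture, no mipmaps):
    Code:
#include <stdio.h>

/* Footprint of a 1024x1024 texture at the rates discussed above. */
int main(void)
{
    const unsigned texels = 1024 * 1024;
    printf("32 bpp: %u KB\n", texels * 32 / 8 / 1024);  /* 4096 KB               */
    printf(" 4 bpp: %u KB\n", texels *  4 / 8 / 1024);  /*  512 KB, 8x smaller   */
    printf(" 2 bpp: %u KB\n", texels *  2 / 8 / 1024);  /*  256 KB, only 2x more */
    return 0;
}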

    Whether the image quality of VQ at 2bpp is equivalent to DXTC at 4bpp is a long and involved discussion in and of itself, but in most typical cases I believe it to be somewhat lower quality overall (although in the same ballpark). Of course each compression method has different strong and weak points in terms of IQ, and therefore the exact situation varies from image to image. I know that Simon had a comparison of some aspects of this on his homepage where he made some interesting observations on quality/bit.

    VQ compression is also not great for hardware, as Simon has touched upon, since you need to hide an additional indirection. Also, for properly orthogonal support you have to be able to use N different sets of VQ palettes, where N is the number of simultaneous textures you support.

    'Pitiful' is an interesting choice of words, and is probably taking things too far. It's certainly a low compression rate, and assigning as many bits of storage to one 8-bit component as to the other 3 is obviously not optimal. On the other hand it does its job well, and you get an alpha channel that is compressed practically without any fidelity loss. It could certainly be better, but if it was 'pitiful' it would not be very useful, and that is certainly not the case - going from 32bpp to 8 bpp is very useful.
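
    To make the bit split concrete, here's a minimal sketch of the explicit-alpha (DXT2/3) layout as I understand it - the helper name is mine:
    Code:
#include <stdint.h>

/* DXT2/3: each 4x4 block carries a 64-bit alpha block (sixteen 4-bit values,
   one per texel) next to the 64-bit colour block, i.e. 128 bits -> 8 bpp,
   with as many bits spent on alpha as on the three colour channels combined. */
static uint8_t dxt3_alpha(uint64_t alpha_block, int texel /* 0..15 */)
{
    uint8_t a4 = (uint8_t)((alpha_block >> (4 * texel)) & 0xF);
    return (uint8_t)(a4 * 17);   /* replicate the nibble: 0x0 -> 0x00, 0xF -> 0xFF */
}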
     
  4. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    As shaders get faster, developers are just as likely to want to use all those extra cycles for something else. Besides, doing random access of texture data and performing bit unpacking is not going to be fast with the current instruction set and so, if you're going to add specialised HW to make the system faster, you might as well make it automatically decompress textures (all IMHO).

    If you have lots of textures being used in the same image then it's not just space: bandwidth is still going to be a problem.

    (To Humus:) I don't see bandwidth issues magically going away simply because there's now this great opportunity to write really slow shader code :)
    Hardly. It took about 10-20% longer than the S3 compression tool on my old PC.
    I would also say that, on average, the quality was slightly lower with DC's VQTC than with S3TC but, given the ~2-fold decrease in storage costs, this was completely acceptable.
    You'll get that with S3TC as well. Because the HW to do fast decompression has to be included in the system, you get savings because compressed textures use the external memory bus far less, freeing it up for other tasks.

    I personally dislike quoting a 'ratio' for texture compression since the schemes are nearly always lossy. FWIW, DXTC storage costs are 4bpp for opaque (and punch-through) and 8bpp for translucent textures, while the DC's VQ was always ~2bpp.
    (Note that putting alpha in the VQ sometimes meant more degradation because it was trying to represent more data with the same number of bits.)

    It'll become public eventually, but for now I'll not say anything on it other than it's nothing like the VQ (i.e. it has no indirection) and is reasonably cheap to implement in HW (e.g. it's used in MBX which has a very tight gate budget).

    IIRC DC used a second cache stage and each texture had its own palette/codebook (although, having said this, for small textures where the compression ratio would effectively decrease, you could actually pack textures together so they borrowed codes from neighbouring textures).
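
    For reference, the indirection looks roughly like this, assuming the layout I associate with the DC scheme (a 256-entry codebook per texture, each entry a 2x2 block of 16-bit texels, one 8-bit index per 2x2 block; twiddling and mipmaps ignored):
    Code:
#include <stdint.h>

typedef struct { uint16_t texel[2][2]; } VQEntry;   /* one codebook entry = 8 bytes */

/* Every texel fetch goes through the index map and then the codebook -
   this is the extra indirection the hardware has to hide. */
static uint16_t vq_fetch(const uint8_t *index_map,   /* one byte per 2x2 block    */
                         const VQEntry *codebook,    /* up to 256 entries/texture */
                         int width, int x, int y)
{
    int blocks_per_row = width / 2;
    uint8_t code = index_map[(y / 2) * blocks_per_row + (x / 2)];
    return codebook[code].texel[y & 1][x & 1];
}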
     
  5. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Oh yes. Been there, when I was writing a JPEG decoder on a 56001 DSP. It's very clumsy to try to do in HW.

    I think the problem with a 'fixed/variable' JPEG-based system is that the (compressed) size ratio between 'large' and 'small' blocks in a JPEG image is large, so I envisage that in this kind of system the 'easy' bits of the image get a larger percentage of the bandwidth, while the 'hard' bits get less....


    I think the assumption was that the alpha channel would need higher precision. Certainly this is what I tend to see nowadays... of course there are trivial modifications that would give some alpha support for lower bit accuracy in RGB.
     
  6. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    OK, I'm a bit late here, but I've been away.

    If I got it right, DXT1 (I use that name to explicitly say "no alpha on the side") has two different compression block types: a 3-colour block, and a 2-colour-1-transparent block. Each block has its own mini-palette. The type of the block can be chosen independently per block.

    So palette is chosen per 4x4 texel block, compression type per 4x4 texel block.

    In the S3TC-similar compression blocks, FXT1 selects the palette per 4x4 texel block. But the compression type is, as always in FXT1, selected per 8x4 block.

    Are you saying that:
    1) The slightly finer granularity in compression type selection will reduce block-to-block noise.
    And/or
    2) None of the extra modes in FXT1 would ever be used. Not even for, say, slow gradients, or multi-bit alpha modes still at 4bpp.
     
  7. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    A minor correction: The two modes are "4-colours" (2 implied from the 2 stored base colours) and "3-colours + transparent black".
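
    For anyone following along, the mode isn't signalled by a flag - it's implied by the order of the two stored base colours. A minimal sketch (struct and names are mine):
    Code:
#include <stdint.h>

typedef struct {
    uint16_t colour0, colour1;   /* explicitly stored RGB565 base colours */
    uint32_t indices;            /* 2-bit selector per texel, 16 texels   */
} DXT1Block;

typedef enum {
    DXT1_FOUR_COLOURS,           /* c0, c1 plus two implied 1/3-2/3 blends   */
    DXT1_THREE_PLUS_TRANSPARENT  /* c0, c1, their average, transparent black */
} DXT1Mode;

static DXT1Mode dxt1_mode(const DXT1Block *b)
{
    /* Comparing the two 565 words as unsigned integers: colour0 > colour1
       selects the opaque 4-colour mode, otherwise the 3-colour + transparent one. */
    return (b->colour0 > b->colour1) ? DXT1_FOUR_COLOURS
                                     : DXT1_THREE_PLUS_TRANSPARENT;
}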
     
  8. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    I had a long reply to this typed in but lost it in a crash, so this version is going to be fast and dirty, but I hope clear enough.

    My recollection of FXT1 is that it had 4 compression modes -

    CC_MIXED. 4x4 block, 2 565 endpoints and 2 interpolant colours. Basically identical to S3TC.

    CC_HI. 8x4 block, 2 555 endpoints and 5 interpolants with 1 explicit transparent encoding.

    CC_CHROMA. 8x4 block, 4 555 explicit colours.

    CC_ALPHA. 8x4 block. 3 5555 endpoints, 2 interpolants between endpoint 0 and 1 for the left 4x4 area and 2 between endpoints 1 and 2 for the right 4x4 area.

    The CC_MIXED mode could not coexist in an image with the other formats because this created a dependency between the format chosen per block and the texel addressing (how do you find a specific texel on a line that is composed of different block sizes?) You would need to add an index of some kind, which would get messy.
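
    To make the addressing point concrete: with a single fixed block footprint, finding the block that holds a given texel is a constant-time calculation (sketch below, assuming uniform 128-bit 8x4 FXT1 blocks - the same 4bpp as two 64-bit S3TC blocks). Allowing blocks with different footprints in one image breaks exactly this.
    Code:
#include <stddef.h>

/* Uniform 8x4 blocks of 16 bytes each: the block containing texel (x, y)
   sits at a directly computable offset, no per-block lookup needed. */
static size_t fxt1_block_offset(int tex_width, int x, int y)
{
    int blocks_per_row = (tex_width + 7) / 8;                /* blocks are 8 texels wide */
    return ((size_t)(y / 4) * blocks_per_row + x / 8) * 16;  /* 128-bit blocks           */
}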

    So in mixed-mode images only the 8x4 formats are available. The compression of both S3TC and FXT1 on colour images is 4bpp, so for each 8x4 FXT1 block you have 2 S3TC blocks in a direct apples->apples comparison. Looking at the descriptions above it can be seen that for colour-only data S3TC should always be a superior representation to either the CC_CHROMA or CC_ALPHA formats for any data, and it is questionable whether CC_HI is better for gradients as well (lower endpoint precision with 1 extra interpolant vs. 2 extra explicit colours at higher precision).

    So for colour images FXT1 is pretty much a bust - I would expect that on almost all images the best S3TC compressor would beat the best FXT1 compressor for quality.

    For images with alpha the presence of the CC_ALPHA format makes things interesting because it allows 4bpp compression of complex alpha, but the image quality will be lower than the 8bpp S3TC equivalent, so it is questionable if this is a big advantage.

    Overall FXT1 seemed like a bit of a 'me too' exercise on the part of 3dfx and didn't really offer anything compelling above what S3TC provided, so it's not surprising that it didn't generate much interest.

    - Andy.
     
  9. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    Simon:
    DOH :oops: Are you sure that two bits can represent 4 values? :)
    I'll blame it on my always-recurring post-Xmas cold.

    andypski:

    CC_MIXED has two sub-modes (just like S3TC). Both are 8x4 blocks, but they split this 8x4 block into two 4x4 sub-blocks.
    Sub-modes:
    CC_MIXED non-transparent: One 555 endpoint and one 565 endpoint, 2 interpolant colors.
    CC_MIXED transparent: One 555 endpoint and one 565 endpoint, 1 interpolant color, 1 transparent "color".

    And it certainly is possible to mix CC_MIXED freely with the other modes in the same texture. The only limitation is that both S3TC-like blocks in a CC_MIXED block must use the same mode (transparent/non-transparent).

    Enhancements possible by small changes in the standard (not using any more memory):
    CC_MIXED: All endpoints 565
    CC_HI: One endpoint 565
    CC_CHROMA: All endpoints 565
    CC_ALPHA: One extra bit in one of the endpoints, not really worth it.
    There is coding space left for other compression modes, if someone sees a new useful mode.
     
  10. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    Aha! My (somewhat old) memory of FXT1 must be playing tricks on me.

    Being able to include CC_MIXED blocks does make it a bit more interesting, but the other block modes still seem to be fairly uninteresting in general. The CC_ALPHA mode is still the only really interesting extension as explained above. In addition, reducing the precision of one of the endpoints can cause some problems due to increased quantisation noise, and the ability to freely mix 3 and 4 colour blocks in DXTC (without restrictions) can also marginally improve compression quality in some cases with smart compressors.

    Changing the encoding to keep higher resolution endpoints at all times would make the spec much more interesting as it should then beat S3TC in all cases (although perhaps only marginally) - can you outline this suggestion and the new encoding?

    - Andy.
     
  11. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    The redundancy in FXT1 is in the placement of the base colors. It's possible to swap the base colors and then make the corresponding changes in the index field.

    This can be used in different ways:
    You could see the colors as 15-bit integers (removing the last green bit), and do compares between them.
    So, e.g., for (one half of) CC_MIXED:
    Code:
// Decode side: compare the two stored endpoints as 15-bit values (green LSB stripped)
if(color0<color1) {
  color0.greenlsb=0;                               // implied by the ordering
  color1.greenlsb=the_other_bit_stored_explicitly;
}
else {
  color0.greenlsb=the_other_bit_stored_explicitly;
  color1.greenlsb=1;                               // implied by the ordering
}
    This scheme would btw be "compatible" with FXT1 in the sense that if you compress with FXT1 and decompress with this "FXT2" (or the other way around), the output will still be correct except for the green lsb.

    Another way is to lock certain texels to certain (groups of) colors:
    For CC_CHROMA:
    Texel0 always uses color0 => 2 bits freed in the index array
    Texel1 always uses color0/1 => 1 bit freed in the index array
    And then do a comparison as above between color2/3 for the fourth bit.
    That gives the four needed green lsb bits.
    There is actually one bit that isn't used at all in this mode, so that one could be used as the fourth bit. But then you'd waste coding space that could easily be used for future enhancements with new compression modes.
     
  12. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    So in your encoding I have 1 explicit bit that I specify per (4x4) CC_MIXED block as 0 or 1, and an implicit encoding from the ordering of the endpoint values that manipulates the effective LSB that is then substituted back into the green channel of each endpoint? I worked your example through, but couldn't generate the case where I could have colour 0's LSB as 1 and colour 1's LSB as 0 - have I misunderstood the encoding?
     
  13. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    You're right that color0.glsb=0 and color1.glsb=1 together isn't possible. But you don't need that!
    Think of it like this; the color order determines the green lsb of the "smallest" color, while the explicit bit is the green lsb of the "largest" color. With that in mind, you can easily see that you can set the green lsb of both colors to anything you want.
    You just never need to set color0.glsb=0 and color1.glsb=1; the colors would be swapped instead.

    There is just one "problem", if the colors are equal when the green lsb is stripped off. But otoh, the solution is quite simple. If the colors are equal, then color1.greenlsb=1 according to the rules above. So set the explicit bit to 0, and you've got all cases covered.
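
    To make the encoder side concrete too, here's a rough sketch of how I'd pick the stored order and the explicit bit (names are mine, it's just an illustration of the rules above, not any published spec):
    Code:
#include <stdint.h>

#define GLSB 0x0020u   /* green LSB = bit 5 of an RGB565 word */

/* 'a' and 'b' are the two desired full-precision 565 endpoints.  Picks the
   stored order (c0, c1) and the explicit bit so that the decode rules above
   reconstruct both green LSBs.  Only the 15-bit parts of c0/c1 plus the
   explicit bit would actually be stored; the full values are returned here so
   the round trip is easy to check.  If the chosen order differs from the
   caller's working order the per-texel index bits must be remapped as usual. */
static void pack_green_lsbs(uint16_t a, uint16_t b,
                            uint16_t *c0, uint16_t *c1, unsigned *explicit_bit)
{
    uint16_t a15 = a & (uint16_t)~GLSB;   /* what the decoder will compare */
    uint16_t b15 = b & (uint16_t)~GLSB;

    if (a15 == b15) {
        /* The "problem" case: the comparison can't order them, so colour1's
           LSB always decodes as 1.  Store an endpoint that wants LSB 1 second
           (if there is one); the other LSB goes in the explicit bit.  If both
           want 0 the two endpoints are identical anyway, so the encoder can
           just reference colour0 for every texel. */
        *c0 = (b & GLSB) ? a : b;
        *c1 = (b & GLSB) ? b : a;
        *explicit_bit = (*c0 & GLSB) ? 1u : 0u;
        return;
    }

    uint16_t small = (a15 < b15) ? a : b;   /* smaller/larger by the 15-bit compare */
    uint16_t large = (a15 < b15) ? b : a;

    if ((small & GLSB) == 0) {   /* smaller stored first -> its LSB decodes as 0 */
        *c0 = small;  *c1 = large;
    } else {                     /* smaller stored second -> its LSB decodes as 1 */
        *c0 = large;  *c1 = small;
    }
    *explicit_bit = (large & GLSB) ? 1u : 0u;   /* larger colour's LSB travels explicitly */
}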
     
  14. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
  15. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    I'll read that as "I see what you're saying". (Normally I would interpret "Gotcha" as "I nailed you down there".)

    Btw, you didn't seem too impressed by the CC_CHROMA mode at all. I know that it doesn't have the fine gradients in it, so it's just 4 colors total over the whole 8x4 block. But remember that it's the only mode that breaks the "one-dimensional" color limit. A block where three different colors meet will get big errors in all other modes. Try to get red, green and blue into the other modes.


    Now back to a different place where compression might be interesting.
    What if a GPU has virtualized memory (like P10)? Textures are split into blocks. (I don't remember the exact block size for P10, was it 4KB? => 32x32 texels @ 32 bit.) If each of those 32x32 blocks were ~JPEG compressed, and decompressed by the GPU as they were loaded into gfx card mem, then AGP memory could suddenly become a lot more useful.

    AGP bandwidth would be virtually multiplied by the JPEG compression ratio. The JPEG decompression wouldn't need to work in random order (as normal texture decompression schemes do). The possibly variable block size wouldn't be such a big problem here, since we could use a table to see where the blocks are stored. The table would be just 1/1024 of the (uncompressed) texture size, and the cost of the indirection isn't that bad since it's only used when loading new textures over AGP, and always together with a transfer of a rather large block of data.
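
    Quick sanity check on that table figure, assuming 32x32-texel 32-bit blocks and a 4-byte table entry per block (the entry size is my assumption):
    Code:
#include <stdio.h>

/* One table entry per 32x32-texel block, pointing at the block's
   compressed data. */
int main(void)
{
    const unsigned block_bytes = 32 * 32 * 4;   /* 4096 bytes uncompressed  */
    const unsigned entry_bytes = 4;             /* assumed table entry size */
    printf("table size = 1/%u of the uncompressed texture\n",
           block_bytes / entry_bytes);          /* prints 1/1024            */
    return 0;
}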

    It should be noted, though, that the working set of textures should still fit into gfx card mem - the "working set" being those textures needed to render one entire frame, which will probably still be needed next frame. Or in other words, textures should still only be loaded over AGP when getting into new areas/revealing new textures.
     
  16. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    I see what you're saying :)

    It may have some uses, but I suspect the number of instances where it would be selected over standard S3TC blocks is fairly rare in terms of overall error - it might reduce blocking slightly in some circumstances, but in real-world images/textures you can usually trade off some chroma accuracy in small areas without much visual blocking, particularly if there is any high-frequency noise structure in the area - this would tend to mask the chroma inaccuracy. I suspect CC_CHROMA would come into play if you were trying to compress images such as HUD displays (which might have widely different primary colours in a single block).

    With alterations in encoding, FXT(2!) becomes interesting, but still not much of an advance over S3TC on the kinds of images it's designed to represent. It would be interesting to write a good encoder for this new format to see how it performs. 3dfx's original FXT1 encoder was really not very good at all (and it was so slow...)
     
  17. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    You're probably right that the places where CC_CHROMA would be of most use would be HUDs, or displays/signs/maps/diagrams, which can often be colorful. But I guess I can't be sure unless I have some example pictures. Maybe it's OK with rather large chroma errors if it's on just a small area. (But since it's for textures, you'd never know how large an area they will be expanded to.)

    Well, that's one thing we agreed on all the time. If you have the encoder, then try anything with small black and white details (so that there's black and white in every block). Then try to compress it over a weekend or so... In some modes they actually did exhaustive searches, but used an incorrect measure for what was optimal. :shock: I think I have the FXT1 source lying around somewhere.

    I actually started to write an FXT1/2 en-/de-coder, but kinda lost interest when no one seemed to be interested in the format, and then 3dfx disappeared.

    Btw, I know that 3dfx's FXT1 encoder didn't attempt to do any dithering; did S3's compressor do that?
     
  18. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    Of course the source code for S3's encoder was never made public.

    (But no, it didn't... :wink:)

    You can check that out on example images easily enough.
     
  19. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    But you know because you were involved in writing it? :)

    I just have a feeling you're already closer to an FXT* compressor than I ever was, because you already have some nifty optimization source lying around.
     
  20. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    I can't lay claim to writing the original S3 compressor, although I do work extensively with the guy who did write it. :) (On compression, amongst other things...)

    Since I no longer work for S3 I'm afraid I don't have access to the original compressor any more (it was a very clever one in many ways).

    It should be possible to write an even better version than the original S3 one, but I think it would take significant time and research to beat it by any real margin. I know that Simon F. thought that his own S3TC compressor gave better results on some images at least - he commented on this on his home page, although we would respectfully disagree (on the limited basis we have for comparison). :wink:
     