High Rez Textures and Texture Compression

Reverend said:
Any thoughts on Tim's comment about IHVs not having any real incentive to innovate due to the DX standardization? Can't really be true because ATI must've invested quite a bit to come up with 3Dc.

Well, this was my concern re the convo with Ail upstream. At least with DX9 you had SM3 to aim at mid-life. I find it really hard to believe, as flexible as DX10 is supposed to be from a programmability POV, that the IHVs could really be happy with *only* performance improvements for three years after it -- or that the rest of us will be truly satisfied if such goodies as they come up with by necessity have to be proprietary and outside the standard (like 3Dc).
 
DemoCoder said:
I think the biggest impediment is that DXTC is perceived as "good enough"
Certainly, for a substantial amount of time, it was. Competition, hardware- and software-wise, has now ensured it no longer is.

and that the complexity needed to achieve any significant improvement is substantial.
I'm not sure I get what you mean -- "complexity" in terms of hardware cost or in terms of software cost? This is related to the above regarding competition -- hardware competitors want to outdo each other in terms of features and performance, while certain graphics-minded developers (like id) most definitely want their games to look better than the rest. Both IHVs and ISVs have complexities to solve to achieve substantial improvement.
 
If the IHVs could have implemented a significantly better compression algorithm given today's transistor budgets they would have done it already, regardless of DX9, since the driver could always force compression behind the scenes and gain a performance boost, just like the early Detonators got a 50% boost by turning on S3TC automatically.

API support is not necessary, as game profiles could easily work around it.

Complexity is hardware complexity. The cost of implementing it, vs the benefits. Any transistors chewed up by a much more complex algorithm means transistors not dedicated to shader ALUs, or ROPs, or some other improvement.
 
DemoCoder said:
Are people looking for better PSNR (quality) at a given compression ratio, or better ratios?
That probably depends on who you ask, though I suspect there's a "have your cake and eat it too" crowd who probably insist on both. :rolleyes:

One possibility is to use a modified DXTC or VQ on YUV data instead of RGB data. Human beings are much more responsive to luminance artifacts than chroma.
That's fine for "natural" images but you do get "graphics" images where that breaks down (eg adjacent red and blue stripes). FWIW, I did experiment with a YUV option for PVRTC but it didn't seem worth the extra expense.

IIRC, Ström and Akenine-Möller's PACKMAN "2", aka ETC, uses the YUV idea.

The entire 384-bit 4x4 block can be compressed to 96-bits. Slightly worse than DXTC 6:1, but perhaps with less visible artifacts?
If you mention 96 bits to a hardware engineer then one night you'll find yourself beaten up in a dark alley.... :p

Another option is to look for different mathematical representations for the interpolation. DXTC stores information as a linear parameterization. That is, it stores 2 endpoints of a line, and then stores positions along that line. One could look for non-linear representations, or a change in coordinate system.
Actually, the former is mentioned in the S3TC patent.
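
To make the "linear parameterization" concrete, here is a minimal sketch of the four-colour DXT1 decode path (my own illustration; the punch-through alpha mode is ignored and the rounding is approximate): two RGB565 endpoints define a line segment in colour space and each texel stores a 2-bit position along it.

Code:
/* Minimal sketch of the four-colour DXT1 decode path. A 64-bit block
   holds two RGB565 endpoints plus sixteen 2-bit indices; every texel
   is a point on the line segment between the endpoints. */
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

static rgb8 expand565(uint16_t c)
{
    rgb8 o;
    o.r = (uint8_t)(((c >> 11) & 31) * 255 / 31);
    o.g = (uint8_t)(((c >>  5) & 63) * 255 / 63);
    o.b = (uint8_t)(( c        & 31) * 255 / 31);
    return o;
}

void decode_dxt1_block(const uint8_t block[8], rgb8 out[16])
{
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    uint32_t idx = (uint32_t)block[4]         | ((uint32_t)block[5] << 8) |
                   ((uint32_t)block[6] << 16) | ((uint32_t)block[7] << 24);

    rgb8 p[4];
    p[0] = expand565(c0);
    p[1] = expand565(c1);
    /* the two implicit palette entries sit 1/3 and 2/3 along the line */
    p[2].r = (uint8_t)((2 * p[0].r + p[1].r) / 3);
    p[2].g = (uint8_t)((2 * p[0].g + p[1].g) / 3);
    p[2].b = (uint8_t)((2 * p[0].b + p[1].b) / 3);
    p[3].r = (uint8_t)((p[0].r + 2 * p[1].r) / 3);
    p[3].g = (uint8_t)((p[0].g + 2 * p[1].g) / 3);
    p[3].b = (uint8_t)((p[0].b + 2 * p[1].b) / 3);

    for (int i = 0; i < 16; i++)
        out[i] = p[(idx >> (2 * i)) & 3];
}

Note that each texel is decoded independently of the others, which is the property that makes cheap parallel hardware decode possible.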

Also, one might look to implement entropy coding by using a windowed algorithm on larger block sizes. Today, 4x4 block sizes are used, but an 8x8 block size would yield 256 values to play with, making entropy encoding much more useful.
There are trade-offs with such a scheme, but it could be worth investigating. 8x4 is the logical next size up and its "equivalent" is used in the PVRTC 2bpp version.

Another possibility is an entropy encoding algorithm that provides a compact index that allows the GPU to efficiently calculate which memory locations to fetch for decoding. Some sort of hybrid windowed approach which synchronizes the encoded stream to block boundaries every so often (giving up some efficiency) would have to be used to keep the index compact.
That sounds painful. BTW entropy encoding, even just within a block, is unpleasant because it is expensive to decode. Since you can't do random decoding, you'd have to decode the entire block and cache all the pixels. That effectively makes your cache less efficient because, for a given storage size, it will hold fewer pixels. Of course, you could make it multi-level (i.e. a "secondary" cache for compressed data and a "primary" for decompressed) but that adds further complexity.

The great thing about DXTC, PVRTC and PACKMAN is that they are simple to decompress on-the-fly.
 
DemoCoder said:
Perhaps the two-level compression technique is best. DXTC for video RAM decompression, and a different compression format to make PCIe uploads much faster (say, JPEG2000 compressed in main memory, and decompressed by the GPU into video RAM on texture upload)
Couldn't this be a function of the driver?

If one IHV offered it, and an "Ultra" texture quality mode came with a game, at the cost of an extra 500MB on the disk, would gamers go for it?

Would it make a good intermediate step before some distant future where functionality like this becomes part of the API?

Jawed
 
Would it be possible to "automatically" mip-map textures as well, which would also be a dramatic saving on texture sizes?

JPEG2000 has the concept of a "target resolution" built in, which implies that mip-mapped textures could be redundant in the on-disk version of the textures.

Jawed
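
For what it's worth, generating mip levels at load time is cheap; a naive box-filter sketch (my own illustration: RGBA8 data, even dimensions assumed, no gamma-correct averaging) looks like this:

Code:
/* Generate the next mip level from an RGBA8 image by averaging each
   2x2 quad. Real tools usually use better filters, but this is the
   sort of step a loader or driver could do "automatically". */
#include <stdint.h>

void downsample_rgba8(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; y++)
        for (int x = 0; x < dw; x++)
            for (int c = 0; c < 4; c++) {
                int sum = src[((2 * y)     * w + 2 * x)     * 4 + c]
                        + src[((2 * y)     * w + 2 * x + 1) * 4 + c]
                        + src[((2 * y + 1) * w + 2 * x)     * 4 + c]
                        + src[((2 * y + 1) * w + 2 * x + 1) * 4 + c];
                dst[(y * dw + x) * 4 + c] = (uint8_t)((sum + 2) / 4);
            }
}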
 
Jawed said:
Would it be possible to "automatically" mip-map textures as well, which would also be a dramatic saving on texture sizes?
Most games do that, yes. DDS iirc lets you store mipmaps however, which is only useful if the mipmap is different from a simple scaling, or if you feel that might accelerate loading time. But TBH, that is rather incredibly naive imo: unless you plan on loading a small mipmap first and loading the real thing later on, in order to benefit from on-the-fly streaming, this technique is counter-productive.

Uttar
 
That's what becomes of not being a developer :oops:

I thought most textures included mipmapped versions of themselves stored on disk.

Jawed
 
Simon F said:
BTW entropy encoding, even just within a block, is unpleasant because it is expensive to decode. Since you can't do random decoding, you'd have to decode the entire block and cache all the pixels. That effectively makes your cache less efficient because, for a given storage size, it will hold fewer pixels. Of course, you could make it multi-level (i.e. a "secondary" cache for compressed data and a "primary" for decompressed) but that adds further complexity.

The great thing about DXTC, PVRTC and PACKMAN is that they are simple to decompress on-the-fly.
I'd like to wholly second this. The major problem with, say, Huffman is that you can't start extracting the next element of data until you've got the code size of the previous one. I'm not sure I want to think how many transistors you'd need to evaluate a 64-value Huffman in parallel...

In contrast, you can evaluate a DXTC block mathematically - so with cheap hardware, in parallel with single-cycle throughput.
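
To make the serial dependency concrete, here's a hypothetical table-driven decoder sketch (the table layout and names are illustrative, not any real hardware): finding symbol N means walking every symbol before it, because each code's length is only known after it has been looked up.

Code:
#include <stdint.h>

typedef struct { uint8_t symbol; uint8_t length; } huff_entry;

/* Decode the Nth symbol of an MSB-first bitstream. The table is
   indexed by table_bits peeked bits and assumes no code is longer
   than that; the buffer is assumed padded so the peek never overruns.
   Note the loop over all preceding symbols just to locate symbol N. */
uint8_t decode_nth(const uint8_t *bits, const huff_entry *table,
                   unsigned table_bits, unsigned n)
{
    unsigned pos = 0;              /* bit position, only known incrementally */
    huff_entry e = {0, 0};
    for (unsigned i = 0; i <= n; i++) {
        unsigned word = 0;
        for (unsigned b = 0; b < table_bits; b++)
            word = (word << 1) |
                   ((bits[(pos + b) >> 3] >> (7 - ((pos + b) & 7))) & 1u);
        e = table[word];           /* gives the symbol and its true length */
        pos += e.length;           /* the next position depends on this one */
    }
    return e.symbol;
}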
 
Simon, good points. But I wouldn't give up on entropy encoding, perhaps one just needs to think outside the box, away from the traditional entropy encoding approaches.

In an abstract sense, one could think of DXTC as entropy encoding. It takes as input, a sequence of tristimulus values, and tries to reduce entropy by finding a way of encoding the sequence as a linear parameterization. If the encoding was lossless, it would in fact be comparable to other entropy encoding techniques.

So what we would like to do is to figure out alternative "lossy" entropy encoding schemes. First and foremost, we have to identify where the low entropy is in our input and how to represent it more compactly. Many image compression algorithms work on the theory that spatially coherent tristimulus samples are also coherent in colorspace, and also on the basis that many errors in chroma reproduction can't be perceptually detected.

That defines our algorithm search parameters. Now all we have to do is find some as yet unapplied techniques to bring to bear. One possibility is that if we compute the gradients of all possible 4x4 tiles, and look at the statistical distribution over an imagery corpus, the distribution is non-uniform. This might mean that the gradient structure of the tiles themselves might be useful. In a way, DCT-based schemes already take advantage of this, but we may wish to avoid DCT. Rather, perhaps a standard dictionary of candidate distributions may be statically determined, and the texel block is encoded by referencing a standard distribution, plus correction factors based on deviation.
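
As a quick sanity check of that intuition (just an illustration, not a proposed algorithm), one could histogram the mean horizontal and vertical gradients of every 4x4 tile of a greyscale corpus; img, w and h are assumed to come from whatever image loader is handy.

Code:
/* Bucket every 4x4 tile of a greyscale image by its mean horizontal
   and vertical absolute gradient and print the 2D histogram. */
#include <stdio.h>
#include <stdlib.h>

#define BUCKETS 16

void tile_gradient_histogram(const unsigned char *img, int w, int h)
{
    long hist[BUCKETS][BUCKETS] = {{0}};

    for (int ty = 0; ty + 4 <= h; ty += 4)
        for (int tx = 0; tx + 4 <= w; tx += 4) {
            long gx = 0, gy = 0;
            for (int y = 0; y < 4; y++)
                for (int x = 0; x < 4; x++) {
                    int v = img[(ty + y) * w + (tx + x)];
                    if (x < 3) gx += abs(img[(ty + y) * w + (tx + x + 1)] - v);
                    if (y < 3) gy += abs(img[(ty + y + 1) * w + (tx + x)] - v);
                }
            /* 12 horizontal and 12 vertical differences per tile */
            int bx = (int)(gx / 12 * BUCKETS / 256); if (bx >= BUCKETS) bx = BUCKETS - 1;
            int by = (int)(gy / 12 * BUCKETS / 256); if (by >= BUCKETS) by = BUCKETS - 1;
            hist[by][bx]++;
        }

    for (int y = 0; y < BUCKETS; y++, putchar('\n'))
        for (int x = 0; x < BUCKETS; x++)
            printf("%6ld", hist[y][x]);
}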

I think I have a couple of other interesting ideas, but I need to think about it, and I'm on the road right now. I'll try to congeal them when I fly back to the US.
 
DemoCoder said:
In an abstract sense, one could think of DXTC as entropy encoding.
I wouldn't agree, personally. It's more quantisation and not really entropy encoding. Sure, it reduces entropy, but by that definition just about any compression method is going to be entropy encoding, so all that's happened there is that the problem has a different name.
 
Dio said:
I wouldn't agree, personally. It's more quantisation and not really entropy encoding. Sure, it reduces entropy, but by that definition just about any compression method is going to be entropy encoding, so all that's happened there is that the problem has a different name.

Like I said, if it was lossless, it would be. Entropy encoding just means representing symbols so that code lengths match probabilities. If DXTC was lossless (say by having a fallback to completely uncompressed tiles whenever compression would result in ANY error) it would indeed be entropy encoding. The idea is to start with a thought exercise. Ignore for the moment that entropy coding a la Shannon is not feasible.

The challenge is to find a mathematical construct so that the most probable distributions can be represented compactly, while less common constructs are either "lost" or defaulted to another encoding.

I would claim that the linear parameterization used by DXTC is not optimal and that other representations might exist that can more compactly encode the distribution of inputs that we care about.

I just think it is worth thinking outside the box and not assuming that every method for reducing entropy matches Shannon's, i.e. that something either "is entropy encoding" or "is not". Rather, a hybrid scheme that is "almost" Shannon-style coding, but which deviates from it to achieve implementation simplicity, may be possible.

It may be a useful thought exercise to start out with a code that assigns code lengths to probability distributions, and then play around with the manner by which the codes are assigned. For example, choosing a fixed-length code for the area of most interest, and dealing with encoding the rest via auxiliary techniques (quantization, etc).

The idea that there can't exist non-fixed-length codes which can be randomly decoded without decoding all of the previous values in the block is IMHO wrong. A code that is self-syncing over a small window and requires at most 1 extra lookup (of the neighboring n bits) in some cases may be possible.

I think it is premature to give up and not take a look at all options.
 
Don't worry, I won't be giving up, but like so many others, I just wish I had more time to think about it, and maybe a bit more maths :D
 
DemoCoder said:
Simon, good points. But I wouldn't give up on entropy encoding, perhaps one just needs to think outside the box, away from the traditional entropy encoding approaches.
If you mean "find a way to parallelise the decode" then "good luck". Someone on this board recently gave a pointer to a parallel Huffman decoding technique and it didn't look pretty. Even with a tiny Huffman scheme (e.g. a max of, say, 2 or 3 bits per code) I failed to find a nice HW solution for decoding multiple symbols in one step. The cost just "blows up" exponentially.
In an abstract sense, one could think of DXTC as entropy encoding. It takes as input, a sequence of tristimulus values, and tries to reduce entropy by finding a way of encoding the sequence as a linear parameterization. If the encoding was lossless, it would in fact be comparable to other entropy encoding techniques.
Sorry, got to disagree with you here. DXTC is VQ compression with 2 implicit vectors.
 
Simon F said:
If you mean "find a way to parallelise the decode" then "good luck". Someone on this board recently gave a pointer to a parallel Huffman decoding technique and it didn't look pretty.

No, what I meant was to invent a code wherein the Nth symbol doesn't require decoding the previous N-1 symbols. I don't think all variable length codes impose this restriction. For example, I don't see why a code can't exist such that if you need to decompress the Nth symbol, a function f(N) exists that will return a location M near the start of the symbol, and guarantee at most 1 retry to "align" to the proper offset.

Just as an off-the-top-of-my-head analogy, imagine a code where symbol lengths are quantized to be either 2 bits, 4 bits, or 8 bits long. If you read 8 bits at a given location f(N) you have either read 1 compressed symbol, 2 symbols, or 4 symbols, or you have a misalignment. The encoding would contain a sync or check scheme, so that if in fact you needed to read an 8-bit symbol, but you were misaligned (by say 4 bits), you could detect this and "back up" or "advance" to the next 4-bit boundary, likewise for 2-bit misalignments.
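
For the sake of argument, here's one crude way to realize the quantized-length part of that (my own variant, not exactly the scheme above, and the sync/check part is left out): the leading bits signal the length (0 = 2-bit code, 10 = 4-bit code, 11 = 8-bit code) and the encoder is assumed to pad each byte with 1-bits so no symbol straddles a byte boundary. Any byte can then be decoded on its own, yielding between one and four symbols.

Code:
#include <stdint.h>

/* Decode one byte's worth of symbols from the hypothetical code
   described above: prefix 0 = 2-bit code (symbols 0..1), prefix 10 =
   4-bit code (symbols 2..5), prefix 11 = 8-bit code (symbols 6..69).
   Leftover bits are assumed to be 1-padding, which reads as the start
   of an 8-bit code that doesn't fit and so terminates the byte.
   Returns the number of symbols written to out[]. */
int decode_byte(uint8_t byte, uint8_t out[4])
{
    int bits_left = 8, n = 0;
    while (bits_left >= 2) {
        if ((byte & 0x80) == 0) {              /* prefix 0  : 2-bit code */
            out[n++] = (byte >> 6) & 0x1;
            byte <<= 2; bits_left -= 2;
        } else if ((byte & 0xC0) == 0x80) {    /* prefix 10 : 4-bit code */
            if (bits_left < 4) break;
            out[n++] = 2 + ((byte >> 4) & 0x3);
            byte <<= 4; bits_left -= 4;
        } else {                               /* prefix 11 : 8-bit code */
            if (bits_left < 8) break;          /* 1-padding ends the byte */
            out[n++] = 6 + (byte & 0x3F);
            break;
        }
    }
    return n;
}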

In a given 4x4 or 8x8 block, if one saved Y bits from entropy encoding, the extra space could be used for other purposes, like increased fidelity, perhaps by encoding as many correction factors as you could in the remaining space.

Anyway, I think entropy encoding is not necessarily the place to start. The place to start is at the input textures, and what non-perceptual information can be thrown away, as well as what low-entropy information can be interpolated or looked up, such as gradient or frequency distributions.

I also don't think implementing something like JPEG-2000 is "out of the question". I think you could use DXTC/VQ/PACKMAN/et al at the level of the L1 texture cache, and use JPEG-2000 to get data from system memory or the L2 cache. There's no shame in a tiered solution. The ultimate goal is to make better use of memory space as well as trade off computation for bandwidth.

One of the things that gives me hope that a conceptual breakthrough may still lurk out there is the awesome efficiency of EdgeBreaker as well as the simplicity of the decode.
 
YUV is unnecessarily complex, YCoCg(-R) is much simpler and still provides better compression.
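
For reference, the YCoCg-R lifting transform is tiny and exactly reversible in integer arithmetic (the sketch below is mine and assumes arithmetic right shift of negative values, which is the usual behaviour but strictly implementation-defined in C):

Code:
#include <stdint.h>

void rgb_to_ycocg_r(int r, int g, int b, int *y, int *co, int *cg)
{
    *co = r - b;
    int t = b + (*co >> 1);
    *cg = g - t;
    *y  = t + (*cg >> 1);
}

void ycocg_r_to_rgb(int y, int co, int cg, int *r, int *g, int *b)
{
    int t = y - (cg >> 1);
    *g = cg + t;
    *b = t - (co >> 1);
    *r = *b + co;
}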

DemoCoder, what's the point in saving bits in areas with low entropy and using the saved bits to get extra fidelity in that area? If it has low entropy it is unlikely to be of perceptual significance ... the areas you need the most bits in are the ones which are hardest to encode, where advanced entropy coding will get you the least. IMO, until you bite the bullet and go for globally variable rate coding (which needs an extra layer of pointer indirection to be fast decodable with random access) all you can do is screw around in the margins. That is pretty much the only way I see for Carmack to have made much headway. Another advantage of going there would be that you could also use variable resolution textures (a crude but effective way of compression in and of itself).
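
That extra layer of indirection could be as simple as a per-block offset table (the layout and names below are illustrative only, not any shipping format): one extra read buys random access into blocks that each compressed to a different size.

Code:
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint32_t *block_offset;  /* blocks_x*blocks_y+1 entries, in bytes */
    const uint8_t  *data;          /* concatenated variable-size blocks */
    int blocks_x;                  /* texture width  / 4 */
    int blocks_y;                  /* texture height / 4 */
} vr_texture;

/* Return a pointer to, and the size of, the compressed block holding
   texel (x, y). The trailing offset entry gives the last block's end. */
const uint8_t *fetch_block(const vr_texture *t, int x, int y, size_t *size)
{
    int bi = (y / 4) * t->blocks_x + (x / 4);
    uint32_t begin = t->block_offset[bi];
    uint32_t end   = t->block_offset[bi + 1];
    *size = end - begin;
    return t->data + begin;
}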
 