On textures and compression

Looking at the Xenos capabilities, it looks like you could set up a 128-bit paletted texture using only 4 bits per pixel in the index buffer. That would give you 32:1 compression, which I think is pretty darn good. Of course, if you factor in, say, a palette of 2048 colours with a 1024x1024 texture, then the compression is only about 30:1, still good though. Just looking at some pictures in Photoshop, even a 256-colour palette looks OK for large full-colour photos, so a 2048-colour palette would actually be pretty ridiculous. As for smooth gradients, you could probably just do them as shader math if you really had to. Other than the lack of AF, I think it would be a fantastic compression system, and you get HDR textures (which afaik don't have AF anyways).
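To make that arithmetic explicit (my reconstruction of the numbers - note that 4-bit indices can strictly only address 16 palette entries, which is the packing question raised further down, so treat these as best-case figures):
Code:
Source:  1024 x 1024 x 128 bits          = 134,217,728 bits (16 MB)
Indices: 1024 x 1024 x 4 bits            =   4,194,304 bits (512 KB)
Palette: 2048 colours x 128 bits         =     262,144 bits (32 KB)
Ratio:   134,217,728 / 4,456,448         ~ 30:1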
 
ShootMyMonkey said:
Makes sense given that J2K is a wavelet-based scheme, so you can probably just point to any old single pixel in the low-pass and, based on where it is in the image, you'll know where to look in all the high-pass subbands (assuming you've decoded down to an image of the transform itself). It also suggests that the image could be used in such a way as to get implicit miplevels for free (i.e. just don't decode all the way down the band hierarchy).

The page that was linked earlier did mention tiles and regions, but ignoring that, you could indeed use specific wavelet terms to reconstruct a particular pixel - except that those terms are surely entropy coded, making random access to them just as difficult.
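On the implicit-miplevels point, here's a minimal sketch (a single 1D Haar synthesis step of my own, not anything from the JPEG2000 spec): the low-pass band alone is already the half-resolution image, and one more pass with the high-pass band restores full resolution.
Code:
//Low[] by itself is the half-resolution signal - an implicit mip level.
//One inverse Haar step with High[] reconstructs the full-resolution signal:
for (int i = 0; i < HalfWidth; i++)
{
    Full[2 * i]     = Low[i] + High[i];
    Full[2 * i + 1] = Low[i] - High[i];
}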

Squeak said:
I wouldn't say the latest compression scheme from mister F is screwing around in the margins. 2bpp with alpha, and minimal silicon footprint is pretty darn impressive and a substantial improvement over S3TC.
But of course you are right that there is limited scope for further improvement on a general fixed rate scheme beyond this point.
Actually, I have had some ideas in that respect but just need a little bit of time to try them out.

DudeMiester said:
Looking at the Xenos capabilities, it looks like you could set up a 128-bit paletted texture using only 4 bits per pixel in the index buffer. That would give you 32:1 compression, which I think is pretty darn good. Of course, if you factor in, say, a palette of 2048 colours with a 1024x1024 texture, then the compression is only about 30:1, still good though.
But how do you pack the colours? A 2k palette => 11 bits per pixel, and that's not going to fit very well into a binary machine. You'd need horrible addressing calculations!
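To spell out the sort of addressing that implies (a sketch, assuming indices tightly packed at 11 bits into 32-bit words; all names are illustrative):
Code:
//Hypothetical addressing for tightly packed 11-bit indices:
uint bitOffset = i * 11;              //bit address of texel i's index
uint word      = bitOffset >> 5;      //which 32-bit word it starts in
uint shift     = bitOffset & 31;      //bit position within that word
uint index     = Words[word] >> shift;
if (shift > 21)                       //32 - 11 = 21: the index straddles two words
    index |= Words[word + 1] << (32 - shift);
index &= 0x7FF;                       //keep the low 11 bits
//None of this is something the texture hardware will do for you.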
 
ShootMyMonkey said:
Except it's not really entirely attributable to "compression" in the same sense being talked about -- that is to say file-level compression as opposed to "effective compression" of some "virtual" result constructed out of multiple files.

I don't follow. It's not multiple files. Could you clarify please?
 
Squeak said:
I wouldn't say the latest compression scheme from mister F is screwing around in the margins. 2bpp with alpha, and minimal silicon footprint is pretty darn impressive and a substantial improvement over S3TC.
In the (somewhat limited) testing I've done with PVRTC at 4bpp it's an interesting matchup when compared to S3TC - the blocking artifacts of S3TC are naturally not present with this method. On photographic-style data sets the most noticeable artifacts are some modulation around colour boundaries (introducing noise that can look a bit like ringing or dithering) and a tendency to blur fine details - not unexpected, and usually not particularly troubling.

It's hard to say how it's doing against S3TC in terms of RMS error - I'm not sure that the compressors are actually trying to minimise the same metric. The S3TC compressor I'm using typically targets weighted error based on perceived luminance, and generally does better at this than the PVRTC compressor, but the opposite is true of unweighted error. If I alter the S3TC compressor to minimise unweighted error then it trades blows with PVRTC. Overall I'm not sure that there's a clear winner in RMS terms, with different images seeming to favour one or the other technique.

It does have trouble with some types of image data other than photographic (as does S3TC in different ways) - if you have an image with clear colour boundaries they tend to become smeared, which could be undesirable. It's certainly a very interesting alternative to S3TC style encoding, and I'm not sure how much headroom is left with future improvements to the compressor - maybe quite a lot...? Writing a high-quality compressor for this format certainly seems to be a very different class of problem than writing a good S3TC compressor, so I expect there's quite a learning curve to get good results.

At 2bpp the smearing and modulation artifacts start to become pretty bad, but it is only 2bpp after all, and the quality on some images should be usable. In some environments you might not notice the low quality (small screens...). Then again, you could just take the 4bpp version and reduce the dimensions by 50% - it might not look much worse.
 
andypski said:
It does have trouble with some types of image data other than photographic (as does S3TC in different ways) - if you have an image with clear colour boundaries they tend to become smeared, which could be undesirable.
Weird, in theory it can do hard (aliased) boundaries between two color planes almost perfectly (ignoring the color quantization). I have never been entirely convinced by the initial estimate of A&B, or by the validity of the iterative procedure used to improve them in the original paper... but I have no idea where their present compressor is at now.
 
MfA said:
Weird, in theory it can do hard (aliased) boundaries between two color planes almost perfectly (ignoring the color quantization).
Yes - it is strange. I wasn't particularly expecting this sort of artifact - maybe a compressor issue.
 
Simon F said:
But how do you pack the colours? A 2k palette => 11 bits per pixel and that's not going to fit very well into a binary machine. You'd need horrible addressing calculations!
You wouldn't do that. I propose storing the palette in shader constant arrays. The reason you would limit your palette is for performance, not memory packing. Afaik the constant arrays should be mostly loaded into registers, so memory consumption is the concern here, not bandwidth. Thus you want to limit how many colours you need as much as possible. I suppose you could say the palette is packed into 128-bit FP 4-component vectors, but this would be entirely separate from the texture itself, which would just be a single-channel index table. Therefore lookup would be trivial: read the texture, look up the corresponding colour in the constant array, and render.
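A minimal sketch of that lookup (point sampling, no filtering yet; IndexTexture and Palette are illustrative names, with an assumed 8-bit/256-entry palette):
Code:
//Point-sample the single-channel index texture.
float Raw = tex2D(IndexTexture, coords).r;
//Undo the 0..1 normalisation to recover the palette entry number.
int Index = (int)round(Raw * 255.0);
//Fetch the colour from the constant array.
float4 Colour = Palette[Index];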

For more compression, you could arrange the colours in the palette by grouping similar colours. This would allow for a margin of error in the index value you read from the texture, since being off by +/- 3 or 4 values shouldn't be too horrible, in turn allowing you to use some basic single-channel compression. Of course, this may not work on all textures, but even without it you achieve significant compression. After all, while 8 bits gives you only 256 colours, which is useful enough, 16 bits gives you 65,536 potential colours, which is far more than necessary or than the hardware can provide. Thus at minimum you get 8:1 compression (128-bit to 16-bit). Most textures vary little in colour, like brick or dirt, so you shouldn't need a huge number of colours, and 8-bit should work there. After all, people still use GIFs for a reason. You could also pack 2 or 3 texture palettes into one array, and share palettes between textures, as other optimisations.

The way I see it, full 32-bit textures are only necessary for very colourful and moderately noisy textures, of which there are a limited number. With low noise, the error from compressing the index texture will usually result in picking like colours anyways, and with high noise you won't notice the error (or much of anything, really). The worst problem would be potential banding on smooth textures, but even then you can use bilinear filtering as an input to blend between palette colours (use the fractional part of the sample as the blending factor). Come to think of it, if you have a well-ordered and smooth palette you might be able to use AF too, but I'm not sure. Also, if you notice a texture has a very regular palette, it may be possible to replace it with a palette generated by a function (could be a complex waveform if you have the GPU power) - a sort of pseudo-procedural texture, which would free up registers for other palettes or things of interest. There's a host of other possibilities I'm sure, and this is all rendered in 128-bit FP colour.

EDIT: In PS I took a diagonal rainbow gradient at 1024x1024, converted it to a 256-colour GIF with random selective dithering, then applied a 1 px Gaussian blur to approximate bilinear filtering, and I'll be damned if it doesn't look smoother than the original somehow! Even a pure red/green gradient and some game screens I have look great.
 
DudeMiester said:
You wouldn't do that. I propose storing the palette in shader constant arrays. The reason you would limit your palette is for performance, not memory packing.
I wasn't concerned about the storage of the palette - that's meant to be an insignificant part of the texture data and so it can be padded. I was referring to the texture indices. 11 bits per pixel does not pack very nicely into power-of-2 words!


andypski said:
Yes - it is strange. I wasn't particularly expecting this sort of artifact - maybe a compressor issue.
Yes - it needs some looking at.
 
So? Then pad the indices to 16 bits, distribute the values over the added range, throw on some compression (DXN or CTX1 from the Xenos docs), and in the shader rectify the errors induced by the compression. It should work, assuming the errors are not too tremendously large. You would have a margin of error of +/- 16, and by grouping colours as I suggest, you could double or triple the acceptable error.

In the PS you would have something like this for a nicely filtered paletted texture:
Code:
//Get your index from the texture. Bilinear filtering should be OK to enable.
float2 IndexFS = tex2D(IndexTexture, coords).xy;
//Recombine the components, scaling to a max of 2048, which should mitigate compression/filtering error.
float IndexF = (IndexFS.x + IndexFS.y) * 1024;
//Get the two neighbouring integer indices, clamping to the end of the palette.
int Index1 = (int)floor(IndexF);
int Index2 = min((int)ceil(IndexF), 2047);
//Blend the palette colours together, where Palette is a constant array and frac(IndexF) is the blending factor.
float4 Colour = lerp(Palette[Index1], Palette[Index2], frac(IndexF));
Of course, 2048 is a totally arbitrary number. You could just as well use 256, which fits nicely into a single 8-bit integer texture (DXT3A or DXT5A compression possible). Although I would imagine that compression here wouldn't be such a good idea, because there is much less tolerance. Still, for textures with uniform colouring it may be possible.

I'm not saying this will work all the time. Rather, I'm saying each texture should be considered on its own for optimum compression. Different textures will have different needs, so going for a "universal" compression scheme will limit your results. Of course, that doesn't mean you can't make an app that analyses a texture and determines the optimum compression, be it a palette or some other system.
 
DudeMiester said:
So? Then pad the indices to 16 bits, distribute the values over the added range, throw on some compression (DXN or CTX1 from the Xenos docs), and in the shader rectify the errors induced by the compression. It should work, assuming the errors are not too tremendously large. You would have a margin of error of +/- 16, and by grouping colours as I suggest, you could double or triple the acceptable error.
Right.... so where is the (significant) compression? You've now got 16 bits/texel + 2k*32 bits of palette. If all the texels of, say, a 1kx1k texture are used in a single shader application, then you're achieving close to 16 bits/texel (so a moderate 50% compression), but if in a render there are only, say, 2000 texels being accessed, then you're achieving ~50 bits/texel, i.e. 1.5x expansion.
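Spelling that worst case out (my arithmetic, charging the whole palette against the 2000 accessed texels):
Code:
Accessed: 2,000 texels x 16 bits         =  32,000 bits
Palette:  2,048 entries x 32 bits        =  65,536 bits
Total:    97,536 bits / 2,000 texels     ~  49 bits/texel
Against 32-bit source texels: 49 / 32    ~  1.5x expansion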
 
I don't think that would happen that often. If you shared palettes, you would minimise the added load. Also, the palettes would be 128-bit, not 32-bit. Given that only 256 colours is OK for many images, you could definitely share a 2048-colour palette between many textures. Finally, I'm talking in reference to what DX10 offers, where constant arrays are stored separately and independently from the shaders. This way, if you have a series of shaders that use it, there is no additional BW cost.

Of course, if you only have one texture displayed over a small area of the screen, then you won't use paletting. However, how often do you have only one small compressed texture in a scene? If that were the case, I don't know why you'd bother to compress anything at all, lol. No, I imagine you would have many textures on screen sharing a small set of palettes. If there were still a problem, you would use LOD to enable paletting only on foreground objects and their textures. Memory consumption would be a bit higher, but not too much, since the distant textures are lower resolution. This way you maximise the savings, giving you 8 bits per pixel BW/storage (if you use DXN compression on a 16-bit index) on a 128-bit source texture. By my calculations, you could store about 200 such textures in 201MB of RAM (assuming 1 palette per 10 textures). If that's not good compression, then I don't know what is. (Maybe I really don't, though, heh.)
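For what it's worth, the 201MB figure breaks down like this (assuming 1024x1024 textures, 16-bit indices DXN-compressed to 8 bits/texel, and 2048-entry 128-bit palettes):
Code:
Indices:  200 x (1024 x 1024 x 8 bits)   =  200 MB
Palettes: 20 x (2048 x 128 bits)         =  0.625 MB
Total:                                   ~  201 MB
Uncompressed 128-bit source: 200 x 16 MB =  3,200 MB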
 
Filtering of palette indices will give you ugly artifacts and aliasing. Imagine two neighbouring texels with indices from opposite ends of the palette: as you move a polygon with this texture in tiny sub-pixel steps, you will cycle through the whole palette.
 
I see what you mean, heh. That seems obvious now. I was thinking about it, and the best balance I can come up with is something like this:

You take a 1024x1024 8-bit index texture and reduce it to a 512x512 8:8:8:8 texture, where the components are the indices of the colours the reduced pixel replaced. Then in the shader you calculate where the current pixel would be located relative to the four original texels represented in the reduced index texture, and blend appropriately (see the sketch below). You would still have some aliasing between the 2x2 blocks on high-contrast textures, but the pseudo-bilinear filtering you do get only requires one texture lookup. For textures without much colour variation it should be fine (but so would direct interpolation of indices), and for contrasting textures the aliasing would be somewhat limited - kind of like how 4xAA looks on polygon edges, I would think. Of course, this wouldn't work if you needed 16-bit indices, but that should be fairly rare. Still, you could pair indices and do two texture lookups for some improvement.
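A sketch of that lookup (my names; PackedIndices is the point-sampled 512x512 RGBA8 texture, with r/g/b/a holding the four indices of the original 2x2 block):
Code:
//Point-sample the packed texture; each channel holds one index of the 2x2 block.
float4 Packed = tex2D(PackedIndices, coords) * 255.0;
//Position of this pixel within its 2x2 block (the packed texture is 512 blocks across).
float2 f = frac(coords * 512.0);
//Fetch the four palette colours (round to safely undo the 0..1 normalisation)...
float4 c00 = Palette[(int)round(Packed.r)]; //top-left
float4 c10 = Palette[(int)round(Packed.g)]; //top-right
float4 c01 = Palette[(int)round(Packed.b)]; //bottom-left
float4 c11 = Palette[(int)round(Packed.a)]; //bottom-right
//...and blend them bilinearly using the in-block position.
float4 Colour = lerp(lerp(c00, c10, f.x), lerp(c01, c11, f.x), f.y);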

Obviously it's not usable everywhere, but neither are other types of compression. I'm sure it has its place somewhere. Certainly it's better than rendering 128-bit FP textures directly, so there's one definite avenue.
 
Is DXT adaptive? By adaptive I mean applying different compression ratios to different blocks of the same texture, thus achieving the maximum C:Q ratio.

DXT / S3TC is kinda old - it compresses textures in a fashion similar to JPG. Is it OK by today's standards, or are there newer compression schemes being developed?
 
Is DXT adaptive? By adaptive I mean applying different compression ratios to different blocks of the same texture, thus achieving the maximum C:Q ratio.
No, and that's what makes it good as a texture compression method, as opposed to a generic image compression method. Have a look at the first section of this paper on why this is a desirable feature.
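The key point is that a fixed rate gives random access: the address of any block is a closed-form expression. A sketch for DXT1 (names are illustrative; the width is assumed to be a multiple of 4):
Code:
//Byte offset of the 4x4 block containing texel (x, y) in a DXT1 texture
//(8 bytes per 4x4 block):
uint blocksPerRow = width >> 2;
uint blockOffset  = ((y >> 2) * blocksPerRow + (x >> 2)) * 8;
//A variable-rate (adaptive) scheme has no such closed form - you would have
//to decode or index everything preceding the block you want.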
DXT / S3TC is kinda old - it compresses textures in a fashion similar to JPG.
No, it doesn't really have that much in common with JPEG.
Is it OK by today's standards, or are there newer compression schemes being developed?
Yes, there are other compression methods "out there" and being developed.
 
Whatever happened to the free TC that 3dfx introduced? FXT1 had some definite advantages over S3TC back in the day, plus it was free. I'm surprised nobody ever picked it up. Then again, maybe something with NVIDIA buying the IP complicated that.

-Dave
(showing signs that I've been out of the loop)
 
Whatever happened to the free TC that 3dfx introduced? FXT1 had some definite advantages over S3TC back in the day, plus it was free.
IMHO, it probably infringes on S3TC's patent so "free" is not exactly the word I would use.
 
IMHO, it probably infringes on S3TC's patent so "free" is not exactly the word I would use.

If I recall correctly, there was prior work in the area upon which both were built. But I suspect that with 3dfx no longer around to defend it, nobody would want to risk it.

-Dave
 
If I recall correctly, there was prior work in the area upon which both were built.
If you're referring to CCC (I don't have the ref to hand, but it's in the paper I linked to earlier), then I would argue that S3TC is a valid improvement on it.
 