GameCube's S3TC: How does it work?

zeckensack said:
A certain IHV with strange capitalization has been caught decoding paletted textures as they are read into texture cache ;)
That at least solves the read-port problem.
All that does is trade a multi-port store for more texture cache plus serialising the decode :???:
 
Xmas said:
It depends on how the memory controller is designed. I can't think of a single PC GPU with memory access granularity of less than 64 bits, at least not in the last 5 years. And I'm pretty sure that applies to GC as well.
For portable platforms it could be less, and even when the block fits in the cache, you still need to read all the pixels in a block to get the one pixel you need for interpolation of the texels.
Mipmaps should be generated by filtering the base map, and filtering produces intermediate values. But since you are limited to a small CLUT, you either have to optimize the palette for all mip levels, meaning you get even less colors for the base map, or live with substantial color errors in mipmaps.
You are going to need intermediate values almost no matter what the subject matter, for AAing edges. If you have 3 base colours, you'll have 5 variations per colour, which should be enough.

It certainly is enough for most alpha test cases. However, I doubt a 4-bit CLUT is good enough when you need multiple alpha values as well.
Just one 0.5 alpha value is enough to make a huge difference on stuff like wire fences.
On leaves and grass you need almost nothing but alpha gradients, in which case 16 is more than enough.
Also, if you only need 4 colours you can bitmask the texture and halve the size.
 
Xmas said:
It depends on how the memory controller is designed. I can't think of a single PC GPU with memory access granularity of less than 64 bits, at least not in the last 5 years. And I'm pretty sure that applies to GC as well.
Are you including burst length or DDR factors there? Otherwise, the memory controller widths on all R5xx are 32-bit as are NV44 and probably G72.
 
Squeak said:
For portable platforms it could be less, and even when the block fits in the cache, you still need to read all the pixels in a block to get the one pixel you need for interpolation of the texels.
But the same is true for all texture formats where a texel is smaller than the memory access granularity. And luckily the probability of requiring multiple texels from a block is very high.
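
To put rough numbers on that, here is a minimal sketch that counts how many texels one memory access delivers for a few formats. It assumes the 64-bit access granularity mentioned above; the format table and the helper name `texels_per_access` are only illustrative.

```python
# Bits per texel for a few common formats.
# S3TC/DXT1 stores a 4x4 block in 64 bits, i.e. 4 bits per texel on average.
BITS_PER_TEXEL = {
    "RGBA8888": 32,
    "RGB565": 16,
    "CLUT8": 8,
    "CLUT4": 4,
    "S3TC/DXT1": 4,
}

def texels_per_access(bits_per_texel, access_bits=64):
    """Number of whole texels delivered by one memory access (assumed 64 bits)."""
    return access_bits // bits_per_texel

for name, bpt in BITS_PER_TEXEL.items():
    print(f"{name:10s} {texels_per_access(bpt):2d} texels per 64-bit access")
```

Under that assumption, a 4-bit CLUT texture and a DXT1 block both deliver 16 texels per access, so neither format can avoid fetching neighbours along with the texel that was asked for.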

You are going to need intermediate values almost no matter what the subject matter, for AAing edges. If you have 3 base colours, you'll have 5 variations per colour, which should be enough.
If you have only 3 base colors, S3TC isn't going to do that bad a job. And if your base map only contains 3 colors, the next mip level might contain up to 15 colors (3 base colors + 12 mixed) if you use a simple 2x2 box filter. And way more for the next mip level.
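
The count of 15 can be checked with a quick sketch like the one below. It assumes three linearly independent base colours (pure red, green and blue here, chosen only for illustration) and enumerates every possible 2x2 block; `box_filter_colors` is a made-up helper name.

```python
from itertools import combinations_with_replacement

def box_filter_colors(base_colors):
    """Distinct results of box-filtering every possible 2x2 block of base colors."""
    results = set()
    for block in combinations_with_replacement(base_colors, 4):
        # Keep the per-channel sum; dividing by 4 would not change distinctness.
        results.add(tuple(sum(channel) for channel in zip(*block)))
    return results

base = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
colors = box_filter_colors(base)
# The unmixed base colors, expressed as sums of four identical texels.
pure = {tuple(4 * c for c in color) for color in base}
print(len(colors), "colors in total,", len(colors - pure), "of them mixed")
```

With less conveniently chosen base colours some of the mixes can coincide, which is why 15 is an upper bound.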

Just one 0.5 alpha value is enough to make a huge difference on stuff like wire fences.
On leaves and grass you need almost nothing but alpha gradients, in which case 16 is more than enough.
Also, if you only need 4 colours you can bitmask the texture and halve the size.
For wire fences, the best idea is usually to rotate the texture 45 degrees. What do you mean by "bitmask the texture"?


Dave Baumann said:
Are you including burst length or DDR factors there? Otherwise, the memory controller widths on all R5xx are 32-bit as are NV44 and probably G72.
Yes, of course I am including burst length.
 
Last edited by a moderator:
Xmas said:
What do you mean by "bitmask the texture"?
He means using multiple palettes with one texture to get multiple textures of lower bit depth. (eg. 2x2bit).
I can't say I've ever used that for plain colormaps, but I did use it for luminance component of compressed textures (yielding effective ~2.5bits/texel for the whole map).
 
Fafalada said:
He means using multiple palettes with one texture to get multiple textures of lower bit depth. (eg. 2x2bit).
Exactly, "bit masking" was probably a wrong word to use, as it could imply some sort of bitwise operation.
 
Xmas said:
If you have only 3 base colors, S3TC isn't going to do that bad a job. And if your base map only contains 3 colors, the next mip level might contain up to 15 colors (3 base colors + 12 mixed) if you use a simple 2x2 box filter. And way more for the next mip level.
I think that's highly dependent on the subject matter. The higher the mip level, the less pixels you will have to spread those colour differences over. Of course there are always pathological cases where you have to do a lot of interpolation, but in the low mip levels bilinear should do a lot of the work.
 
Fafalada said:
He means using multiple palettes with one texture to get multiple textures of lower bit depth. (eg. 2x2bit).
Which hardware supports this?


Squeak said:
I think that's highly dependent on the subject matter. The higher the mip level, the less pixels you will have to spread those colour differences over. Of course there are always pathological cases where you have to do a lot of interpolation, but in the low mip levels bilinear should do a lot of the work.
Not sure if I understand you correctly regarding "less pixels to spread color differences over". How is bilinear filtering supposed to help?
 
I did a real quick test to see for myself.
See if you can spot the difference (original, 4bit and S3TC). I made no attempt at optimising either compression. The result is straight from the programs' first try (xpadie and compressonator).
texturetest.png


Xmas said:
Which hardware supports this?
Any.

Not sure if I understand you correctly regarding "less pixels to spread color differences over". How is bilinear filtering supposed to help?
When the maps get smaller there are fewer pixels, hence fewer pixels to AA.
Take a look at this example (4bit straight downscaled and same but boxfiltered).
mipmap.png
 
Squeak said:
I did a real quick test to see for myself.
See if you can spot the difference (original, 4bit and S3TC). I made no attempt at optimising either compression. The result is straight from the programs' first try (xpadie and compressonator).
texturetest.png
The block artifacts of S3TC compression are visible. But the compression algorithm seems to be concerned a bit too much about hue compared to brightness.


Could you please describe more specifically what you mean? Not every hardware supports CLUT textures. And not every hardware supports dependent texture reads, in case you're referring to using the texture content as index to a lookup texture (which doesn't work with filtering anyway).

When the maps get smaller there are fewer pixels, hence fewer pixels to AA.
Take a look at this example (4bit straight downscaled and same but boxfiltered).
mipmap.png
The ratio of texels to pixels on screen should still be about 1:1 (bi/trilinear filtering), and that's what matters. And I don't know what kind of box filter you applied there.
 
Squeak said:
I did a real quick test to see for myself.
See if you can spot the difference (original, 4bit and S3TC). I made no attempt at optimising either compression. The result is straight from the programs' first try (xpadie and compressonator).
texturetest.png
http://www.digit-life.com/articles/reviews3tcfxt1/

This link has a lot of good examples on how s3tc looks and compares it to fxt1 texture compression which I think I heard that the xbox1 uses.

Edit: Actually Xmas just said that xbox1 doesn't use that. sorry.
 
nintenho said:
This link has a lot of good examples on how s3tc looks and compares it to fxt1 texture compression which I think I heard that the xbox1 uses.
No, the only chip that supports FXTC is 3dfx VSA-100. The NV2a in Xbox supports S3TC(=DXTC).
 
Xmas said:
Could you please describe more specifically what you mean? Not every hardware supports CLUT textures. And not every hardware supports dependent texture reads, in case you're referring to using the texture content as index to a lookup texture (which doesn't work with filtering anyway).

Even if you emulate the CLUT it would still work. It's just a matter of interleaving two 2-bit textures in a 4-bit one, and then using a palette where the entries from the other texture are blanked out.

The ratio of texels to pixels on screen should still be about 1:1 (bi/trilinear filtering), and that's what matters.
My point is, colourspace requirements tend to shrink, not grow, with smaller pictures.
Unless the picture is really computer-like (aliased) to start with, a lot or all of the AA entries will be present in the source picture already.

And I don't know what kind of box filter you applied there.
There is more than one kind? Can't imagine.
 
Xmas said:
The ratio of texels to pixels on screen should still be about 1:1 (bi/trilinear filtering), and that's what matters. And I don't know what kind of box filter you applied there.
Actually, I'd have said the texels/pixels should be < 1 if you believe Nyquist.
 
nintenho said:
http://www.digit-life.com/articles/reviews3tcfxt1/

This link has a lot of good examples on how s3tc looks and compares it to fxt1 texture compression which I think I heard that the xbox1 uses.

Xmas said:
No, the only chip that supports FXTC is 3dfx VSA-100. The NV2a in Xbox supports S3TC(=DXTC).

Besides, as I've said before, some modes of FXTC looked, IMHO, like they might infringe on the S3TC patent, so I doubt anyone else would risk using it.
 
Squeak said:
Even if you emulate the CLUT it would still work. It's just a matter of interleaving two 2-bit textures in a 4-bit one, and then using a palette where the entries from the other texture are blanked out.
But you still only get one color from the TMU, not two. And I don't think "emulating" CLUT is a good idea if it breaks filtering and/or costs more performance than using uncompressed textures.
Besides, I thought you were arguing for hardware CLUT support. Dependent texture reads are not going to disappear, of course.

My point is, colourspace requirements tend to shrink, not grow, with smaller pictures.
If the number of pixels starts to become smaller than the number of colors, yes. But for medium-sized mipmap levels I don't think that's the case.

Unless the picture is really computer-like (aliased) to start with, a lot or all of the AA entries will be present in the source picture already.
But the problem is that those "AA entries" take up a huge part of the palette, so you really only have a handful of "base colors".

There is more than one kind? Can't imagine.
There's different sizes.
Resizing to 25% with point sampling and then resizing to 200%, I get your first image, but what I get with a box filter looks substantially different (and way better) than your second image.
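
For anyone who wants to compare the two downscaling methods, here is a minimal sketch on a tiny grayscale image. It only halves the image once with a 2x2 box (not the 25% resize from the test above), and the function names and the test image are made up for illustration.

```python
def point_sample_half(img):
    """Halve a grayscale image by keeping the top-left texel of each 2x2 block."""
    return [row[::2] for row in img[::2]]

def box_filter_half(img):
    """Halve a grayscale image by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

# A hard diagonal edge: 255 on and below the diagonal, 0 above it.
edge = [[255 if x <= y else 0 for x in range(4)] for y in range(4)]
print(point_sample_half(edge))
print(box_filter_half(edge))
```

Point sampling keeps only the two original values, while the box filter introduces an intermediate value along the edge. Those intermediate values are exactly the extra colours a small palette has to accommodate.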



Simon F said:
Actually, I'd have said the texels/pixels should be < 1 if you believe Nyquist.
If you believe Nyquist I think it should be a tad less than half, but the LOD formula proposed in the OpenGL spec targets a 1:1 ratio. Or rather, it's targeting an overlap of half a texel in each direction for the width 2 triangle filter kernel that bilinear represents.
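
As a rough sketch of that LOD calculation: the function below follows the idealised scale-factor formula from the OpenGL specification, with the derivatives given in texel units. It leaves out LOD bias and clamping, assumes a non-zero footprint, and real hardware is allowed to approximate the formula; the function name `lod` is just for illustration.

```python
import math

def lod(dudx, dvdx, dudy, dvdy):
    """Idealised mipmap LOD from screen-space derivatives of texel coordinates."""
    # rho is the longer of the two pixel footprint edges, measured in texels.
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    # Assumes rho > 0; bias and clamping to the mip range are omitted.
    return math.log2(rho)

print(lod(1.0, 0.0, 0.0, 1.0))  # one texel per pixel
print(lod(2.0, 0.0, 0.0, 2.0))  # two texels per pixel in each direction
```

A 1:1 texel to pixel ratio lands on LOD 0, and each doubling of the footprint moves one mip level down.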
 
("Bit masking" ...)
Xmas said:
Which hardware supports this?
Any hardware that supports paletted textures ;)
If you have hardware support for 4 bit palettes, and have two textures where you need just four indices each, you can pack them into a single 4-bit texture and use two palettes. In both palettes, you duplicate the entries in a way that makes the bits you don't need for the current texture irrelevant. This doesn't require dependent reads at all.

Let's put the two index bits for texture 0 into the two low bits of the texture and let's call the palette entries C00, C01, C02, C03.
Likewise put texture 1's two-bit indices into the high bits and let's name the colors C10, C11, C12, C13.

When you want to sample texture 0, you make this 4-bit palette current:
C00, C01, C02, C03
C00, C01, C02, C03
C00, C01, C02, C03
C00, C01, C02, C03
I.e. all rows are equal. The two higher bits don't affect the color that's pulled from the palette.

When you want to sample texture 1, you switch to this palette:
C10, C10, C10, C10
C11, C11, C11, C11
C12, C12, C12, C12
C13, C13, C13, C13
All columns are equal now. With this palette, the two lower bits are irrelevant.
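
The same scheme as a small software sketch, purely to illustrate the packing. The function names and the sample index data are made up, and a real implementation would simply upload the two 16-entry palettes to the hardware.

```python
def pack(tex0, tex1):
    """Interleave two lists of 2-bit indices into one list of 4-bit indices."""
    # Texture 0 goes into the low two bits, texture 1 into the high two bits.
    return [(hi << 2) | lo for lo, hi in zip(tex0, tex1)]

def palette_for_tex0(colors):
    """16-entry palette in which only the low two index bits matter."""
    return [colors[i & 3] for i in range(16)]

def palette_for_tex1(colors):
    """16-entry palette in which only the high two index bits matter."""
    return [colors[i >> 2] for i in range(16)]

def sample(packed, palette):
    """Point-sample every texel of the packed texture through a palette."""
    return [palette[i] for i in packed]

c0 = ["C00", "C01", "C02", "C03"]
c1 = ["C10", "C11", "C12", "C13"]
tex0 = [0, 1, 2, 3, 3, 0]
tex1 = [3, 3, 0, 1, 2, 2]
packed = pack(tex0, tex1)

print(sample(packed, palette_for_tex0(c0)))  # colors of texture 0
print(sample(packed, palette_for_tex1(c1)))  # colors of texture 1
```

Switching the palette is the only thing that changes between the two lookups; the packed index data stays the same.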
 
Thanks for the explanation, it wasn't obvious to me that this is about switching palettes.
 