GameCube's S3TC: How does it work?

sfried

Regular
I've heard claims in many reviews that the colors in most GameCube games have a tendency to look "more vibrant", notably in the sleeper hit Beyond Good and Evil. I don't know if this has something to do with how the GC handles textures or color compression. (I know, for one thing, that banding occasionally occurs in some of the more popular 3rd-party titles.)

I was wondering if all textures processed are passed on to the S3 processor or if it's something controlled in software. That could explain why some games look better on GC (RE4) while some do not (Metal Gear Solid: The Twin Snakes uses the MGS2 engine; I also noticed that a lot of PS2 games use a lot of blur effects to keep the detail "consistent" throughout, which is something Twin Snakes doesn't quite seem to "get", plus the fact that it was handled by a different dev than Konami). To put it simply, is S3 always "on"?

Besides that, the only game I've noticed so far that utilizes some S3-specific stuff is TimeSplitters 2 (such as when looking up close at the walls). RL and RS3 (and some other games, too) might also have used it, but I probably haven't noticed because the S3 "assets" were already part of the art direction.
 
I think this is the best info on the net on this topic.

Texture Compression

The Gamecube's GPU can use S3TC compressed textures, which provide a 6:1 compression ratio for 24-bit textures. For 16-bit textures the ratio is 4:1, and for 8-bit textures the ratio is 2:1.

Let us consider how many compressed textures the Gamecube can hold in its 24 MB of main memory for different memory size requirements for game code/geometry/etc.
Code/Geometry/etc. | Free Texture Space | 24-bit Textures (compressed 6:1)
 6 MB              | 18 MB              | 108 MB
 8 MB              | 16 MB              |  96 MB
10 MB              | 14 MB              |  84 MB
12 MB              | 12 MB              |  72 MB


As you can see, the Gamecube can store lots of textures in its memory using S3TC's texture compression format. Note that another benefit of using compressed textures is that the bandwidth requirements also decrease by the same ratio as the actual compression. At a ratio of 6:1, the memory bus can pass 6 times more texture data. That means the GPU's texture cache bus of 10.4 GB/sec can pass 62.4 GB of 24-bit compressed textures, and the external bus of 2.6 GB/sec can pass 15.6 GB of 24-bit compressed textures each second!

Should a developer use 16-bit textures over 24-bit textures in order to save space? Let us compare:
Texture Size: 512 x 512 | 16-bit | 24-bit
Uncompressed            | 525 KB | 786 KB
Compressed              | 131 KB | 131 KB


As you can see, with the greater 6:1 ratio for 24-bit textures it makes more sense for the developer to use only 24-bit compressed textures, as they are the same size as 16-bit compressed textures.

S3TC also allows compression of textures with transparency, which the Vector Quantization (VQ) texture compression on the Dreamcast could not do. This will allow the Gamecube to store lots of transparent textures in its main memory.

Thanks to S3TC's texture compression, the Gamecube's 1 MB of texture cache can hold the equivalent of 6 MB of 24-bit textures. That's roughly 8 x (512 x 512) 24-bit textures, or 32 x (256 x 256) 24-bit textures for example.

Virtual Texturing

Virtual texturing is a hardware feature that manages textures by breaking them up into smaller blocks. This can contribute to quite a saving in bandwidth and make for more efficient use of the texture cache, for example with sky textures, where in most games only half of the sky is visible in a typical scene. By keeping the most-used texture blocks in the texture cache, main memory bandwidth is used more efficiently. All of this is done automatically and does not have to be coded in by the developer.

Although I think it does not answer your question directly, I guess that GC textures look better because a 16-bit and a 24-bit texture end up the same size once compressed, while other consoles (the PS2?) use lower-quality compression/textures (16-bit); plus the 1 MB texture cache gives it some advantage. Basically, the GC makes much better use of its memory.
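Just to double-check the article's numbers, here is a quick C sketch (my own; it simply takes the quoted 6:1 and 4:1 ratios at face value and counts 1 KB as 1000 bytes, as the article's tables do):

Code:
#include <stdio.h>

int main(void)
{
    const long texels = 512L * 512L;

    long raw24 = texels * 3;   /* 24-bit, 3 bytes per texel */
    long raw16 = texels * 2;   /* 16-bit, 2 bytes per texel */
    long cmp24 = raw24 / 6;    /* quoted 6:1 ratio          */
    long cmp16 = raw16 / 4;    /* quoted 4:1 ratio          */

    printf("512x512 24-bit: %ld KB raw, %ld KB compressed\n", raw24 / 1000, cmp24 / 1000);
    printf("512x512 16-bit: %ld KB raw, %ld KB compressed\n", raw16 / 1000, cmp16 / 1000);

    /* Both compress to ~131 KB, so about 8 of them fit in the 1 MB texture
     * cache, and 18 MB of free main memory holds 18 * 6 = 108 MB worth of
     * raw 24-bit texture data, matching the tables above. */
    printf("512x512 compressed textures per 1 MB cache: %ld\n",
           (1024L * 1024L) / cmp24);
    return 0;
}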

Anyway, see the link; there is a lot of info in it.

Just as a side note: although the GC, with its 24 MB of RAM, already does a great job here, I do hope the Rev brings us something new, as many (all?) GC games (e.g. RE4) could look much better with some very high(er)-quality textures.
 
Thanks for the info. Although I'm still wondering whether some of the features mentioned (like the "zoomed-in" textures from TimeSplitters 2) are implemented in software or hardware. (Is this a feature they could turn off to save processing power, or does the way it looks depend on the programming?)
 
S3TC decompression is done on the fly. There's never a decompressed texture anywhere in memory, but the decompression is done "just in time" when a texel is fetched by Flipper.

It's universally how you do S3TC, including PC parts.

There is no such thing as an "S3 processor" in Gamecube, and it certainly is not software either.

Using an S3TC compressed texture is totally straight-forward, it's like using an uncompressed texture in almost every way, except that it is smaller (duh!) and it has a different data format (duh! again).
That's pretty much like the difference between RGBA5551 textures and RGBA8888 textures. The hardware cannot guess the format, so you just supply a magic number that describes it, which you either hammer into a hardware register yourself, or you supply it to the OpenGLish software layer that Nintendo supplies when you "select" that texture for rendering.

And in addition to magic numbers describing the uncompressed formats, there are also magic numbers for the S3TC compressed formats.
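For illustration only, here is roughly what that looks like through the homebrew libogc GX layer (used here as a stand-in for the official API; take the exact names like GX_InitTexObj and GX_TF_CMPR as my assumption of the closest public equivalent, and the pointer and dimensions as placeholders):

Code:
#include <gccore.h>

/* Bind an already-compressed ("CMPR"/S3TC-style) texture for rendering.
 * Apart from the format enum, this is identical to using an uncompressed
 * texture such as GX_TF_RGB565 or GX_TF_RGBA8. */
void use_compressed_texture(void *compressed_data)
{
    GXTexObj tex;

    GX_InitTexObj(&tex, compressed_data, 256, 256,
                  GX_TF_CMPR,          /* the "magic number" for the format */
                  GX_REPEAT, GX_REPEAT,
                  GX_FALSE);           /* no mipmaps in this sketch */

    GX_LoadTexObj(&tex, GX_TEXMAP0);   /* "select" it for the next draws */
}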

Making an S3TC compressed texture is a more involved process, but it doesn't actually happen on the Gamecube hardware, but during art production. Then you just put the ready compressed textures on the disc and you're done with it.
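To give a rough idea of what that offline step involves, here is a deliberately naive C sketch of the per-block work a DXT1-style encoder does; real art tools search for endpoints far more cleverly, and the bit packing shown is just one illustrative layout:

Code:
#include <stdint.h>

/* Pack 8-bit RGB into RGB565. */
static uint16_t to565(const uint8_t c[3])
{
    return (uint16_t)(((c[0] >> 3) << 11) | ((c[1] >> 2) << 5) | (c[2] >> 3));
}

/* Encode one 4x4 block of RGB texels into a 64-bit block:
 * two 16-bit endpoint colours plus sixteen 2-bit indices. */
uint64_t encode_block(const uint8_t rgb[16][3])
{
    /* 1. Naive endpoint choice: darkest and brightest texel. */
    int lo = 0, hi = 0;
    for (int i = 1; i < 16; i++) {
        int lum = rgb[i][0] + rgb[i][1] + rgb[i][2];
        if (lum < rgb[lo][0] + rgb[lo][1] + rgb[lo][2]) lo = i;
        if (lum > rgb[hi][0] + rgb[hi][1] + rgb[hi][2]) hi = i;
    }

    uint16_t c0 = to565(rgb[hi]);
    uint16_t c1 = to565(rgb[lo]);
    if (c0 < c1) {                      /* keep c0 > c1: opaque 4-colour mode */
        int t = lo; lo = hi; hi = t;
        uint16_t tc = c0; c0 = c1; c1 = tc;
    }

    /* 2. The four colours the block can represent: endpoints + two blends. */
    int pal[4][3];
    for (int k = 0; k < 3; k++) {
        pal[0][k] = rgb[hi][k];
        pal[1][k] = rgb[lo][k];
        pal[2][k] = (2 * rgb[hi][k] + rgb[lo][k]) / 3;
        pal[3][k] = (rgb[hi][k] + 2 * rgb[lo][k]) / 3;
    }

    /* 3. Store the 2-bit index of the nearest palette entry for each texel. */
    uint32_t indices = 0;
    for (int i = 0; i < 16; i++) {
        int best = 0, bestd = 1 << 30;
        for (int j = 0; j < 4; j++) {
            int dr = rgb[i][0] - pal[j][0];
            int dg = rgb[i][1] - pal[j][1];
            int db = rgb[i][2] - pal[j][2];
            int d  = dr * dr + dg * dg + db * db;
            if (d < bestd) { bestd = d; best = j; }
        }
        indices |= (uint32_t)best << (2 * i);
    }

    return (uint64_t)c0 | ((uint64_t)c1 << 16) | ((uint64_t)indices << 32);
}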
 
sfried said:
I've heard claims in many reviews that the colors in most GameCube games have a tendency to look "more vibrant", notably in the sleeper hit Beyond Good and Evil. I don't know if this has something to do with how the GC handles textures or color compression. (I know, for one thing, that banding occasionally occurs in some of the more popular 3rd-party titles.)
You have to understand what S3TC really is, and what you're comparing against. PS2 or XBox? It reads as if it's the PS2 you're comparing against.

S3TC does not "make textures look nicer" or something. It's just a lossy compression technique, it makes textures significantly smaller in memory, but loses some quality.
Due to compression, you may be able to use larger textures that are more detailed, and that will in most cases more than balance out the quality loss introduced by compression.
E.g. instead of an uncompressed 256x256 texture you can use a 512x512 compressed texture, that's four times the amount of texels, and it will still use the same amount of memory.

The PS2 doesn't directly support S3TC in hardware. However, it supports palettes. When you port games between these two systems either way, you're always going to lose some texture quality, though the Gamecube has the potential for higher texture quality (properly filtered to boot) in exclusive titles.

The Gamecube also has a rather unique RGBA6666 color format for its backbuffer. It isn't always used, but sometimes it's necessary because of the limited size of the embedded framebuffer memory. This is somewhat of a double-edged sword. The bad thing is that it's more prone to banding than RGBA8888, the format you'll usually use on the PC.
The good thing is that it is less prone to "color defects". Try running a PC game with lots of greys in it with a 16-bit framebuffer. The usual layout for that is five bits for red and blue plus six bits for green. You'll notice some shades of grey leaning towards purple (too little green) and others towards green.
Having the same amount of bits for all color channels avoids these artifacts.
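A tiny standalone C sketch (nothing GC-specific) makes the grey-tint point concrete: it quantizes a ramp of greys to 5-6-5 and to 6-6-6 and expands the results back to 8 bits per channel, so you can see green drifting away from red/blue in the 16-bit case.

Code:
#include <stdio.h>

/* Quantize an 8-bit value to 'bits' bits, then expand back to 8 bits
 * by bit replication (works for 5 and 6 bits). */
static int quant(int v, int bits)
{
    int q = v >> (8 - bits);
    return (q << (8 - bits)) | (q >> (2 * bits - 8));
}

int main(void)
{
    for (int g = 0; g <= 255; g += 25) {
        printf("grey %3d -> 565: (%3d,%3d,%3d)   666: (%3d,%3d,%3d)\n",
               g,
               quant(g, 5), quant(g, 6), quant(g, 5),   /* R5 G6 B5 */
               quant(g, 6), quant(g, 6), quant(g, 6));  /* R6 G6 B6 */
    }
    return 0;
}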

But I really don't think that one explains any higher quality. Perhaps vs an XBox version that uses the same textures but renders to a 16 bit framebuffer. I don't know if such a thing exists. It might just be that GC has a nicer deflicker filter going on or a better gamma curve or whatever.
 
zeckensack said:
S3TC decompression is done on the fly. There's never a decompressed texture anywhere in memory, but the decompression is done "just in time" when a texel is fetched by Flipper.

It's universally how you do S3TC, including PC parts.

There is no such thing as an "S3 processor" in Gamecube, and it certainly is not software either.

Using an S3TC compressed texture is totally straight-forward, it's like using an uncompressed texture in almost every way, except that it is smaller (duh!) and it has a different data format (duh! again).
That's pretty much like the difference between RGBA5551 textures and RGBA8888 textures. The hardware cannot guess the format, so you just supply a magic number that describes it, which you either hammer into a hardware register yourself, or you supply it to the OpenGLish software layer that Nintendo supplies when you "select" that texture for rendering.

And in addition to magic numbers describing the uncompressed formats, there are also magic numbers for the S3TC compressed formats.

Making an S3TC compressed texture is a more involved process, but it doesn't actually happen on the Gamecube hardware, but during art production. Then you just put the ready compressed textures on the disc and you're done with it.

I remember a claim at one point that the gamecube handled texture compression differently than standard PC parts. Something about standard PC parts (such as the NV2A) decompressing prior to use, whereas the gamecube was capable of using the texture while still compressed.
 
Fox5 said:
I remember a claim at one point that the gamecube handled texture compression differently than standard PC parts. Something about standard PC parts (such as the NV2A) decompressing prior to use, whereas the gamecube was capable of using the texture while still compressed.
Yeah, some NVIDIA parts, up to and including the NV3x family, decompress the texture as it is loaded into texture cache.
Even there it is as I've said, decompression happens on the fly, and the decompressed result isn't stored in memory. The hardware handles it all by itself and makes it transparent to the application.

The issue with the NVIDIA parts wasn't that they did this at all, it's totally legit IMO, but rather that they chose to decompress opaque textures ("DXT1") to a 16 bit format which often isn't good enough ... and that they ate a large portion of the potential performance gain, because the effective texture cache drops to a quarter of what it actually is. Both things should be fixed starting with the Geforce 6 series.

Competing chips from ATI or S3 themselves never had either issue, and I don't quite believe Flipper would have them (although I don't know for sure).
 
I remember a claim at one point that the gamecube handled texture compression differently than standard PC parts. Something about standard PC parts (such as the NV2A) decompressing prior to use, whereas the gamecube was capable of using the texture while still compressed.
Except that PC parts DO also use compressed textures in VRAM and decompress on the fly. It's convenient for the TMUs as well, since a single contiguous segment of x bits (usually 128 bits) happens to cover a square of multiple pixels (usually 4x4), so oftentimes you don't need to access anything else to get the block you need for interpolation.
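As a minimal sketch of why that works, here is how one texel could be reconstructed from a single 64-bit DXT1-style block; everything needed is in that one word. The bit packing is illustrative (the real DXT1/CMPR byte order differs in details) and the 1-bit-alpha mode is ignored.

Code:
#include <stdint.h>

/* Expand RGB565 to 8 bits per channel. */
static void from565(uint16_t c, int rgb[3])
{
    rgb[0] = ((c >> 11) & 31) * 255 / 31;
    rgb[1] = ((c >>  5) & 63) * 255 / 63;
    rgb[2] = ( c        & 31) * 255 / 31;
}

/* Reconstruct the texel at (x, y), both 0..3, inside one 4x4 block.
 * Layout assumed: bits 0-15 = endpoint c0, 16-31 = endpoint c1,
 * 32-63 = sixteen 2-bit indices. */
void decode_texel(uint64_t block, int x, int y, int out[3])
{
    int e0[3], e1[3];
    from565((uint16_t)(block & 0xFFFF), e0);
    from565((uint16_t)((block >> 16) & 0xFFFF), e1);

    int idx = (int)((block >> (32 + 2 * (4 * y + x))) & 3);

    for (int k = 0; k < 3; k++) {
        switch (idx) {
        case 0: out[k] = e0[k];                   break;
        case 1: out[k] = e1[k];                   break;
        case 2: out[k] = (2 * e0[k] + e1[k]) / 3; break;
        case 3: out[k] = (e0[k] + 2 * e1[k]) / 3; break;
        }
    }
}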

If you're referring to decompressing into a texture cache, that's a different matter, though it may affect how many times you have to refill cache. That's why for instance, on the PSP, it's generally faster to use indexed textures because you can store that many more pixels in a cache fill, whereas compressed textures will have as many cache refills as an uncompressed texture.
 
True, NV2a AFAIR decompresses the textures between the 128 KB L2 and 8 KB L1 cache.
GC reads textures directly from the 1 MB texture RAM without any intermediate buffers.

S3TC has a couple of disadvantages that IMO make it nice to also have CLUT and 16-bit as options.
- For every one texel you need, you have to fetch the whole 4x4 block.
In filtering operations with bilinear, trilinear and anisotropic filtering, the worst-case scenario is having to load 2x8 blocks for essentially one texel.
- It's not good for low-res textures (equal to or less than 256x256) where there are a lot of sharp colour changes. In that case, 4-bit CLUT textures might actually be preferable (with a largely monochrome texture the two methods perform equally well).
- Because of its block-truncation nature it is horrible for animated textures, and it doesn't have the ability of CLUT textures to cycle through the palette to make pulsing or simple changing patterns (see the sketch after this list).
- Alpha requires 4 extra bits per texel; CLUT textures don't have this problem.
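The palette-cycling trick mentioned above is trivial to sketch; this is generic C, not tied to any particular console API:

Code:
#include <stdint.h>

#define PAL_SIZE 16

/* Rotate all CLUT entries by one slot. Upload the rotated palette next
 * frame and every texel that indexes it changes colour "for free",
 * while the indexed texel data itself stays untouched. */
void cycle_palette(uint16_t palette[PAL_SIZE])
{
    uint16_t first = palette[0];
    for (int i = 0; i < PAL_SIZE - 1; i++)
        palette[i] = palette[i + 1];
    palette[PAL_SIZE - 1] = first;
}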

The differences between the texture formats in the Graphics Synthesizer and the Flipper have nothing to do with the possible vibrancy of the colours, though they might affect the total range of colours available within a given texture (see above).
If anything the difference is more likely to be in the DAC or artistic choice, than the texture formats.
 
zeckensack said:
You have to understand what S3TC really is, and what you're comparing against. PS2 or XBox? It reads as if it's the PS2 you're comparing against.

...Perhaps vs an XBox version that uses the same textures but renders to a 16 bit framebuffer. I don't know if such a thing exists. It might just be that GC has a nicer deflicker filter going on or a better gamma curve or whatever.

I don't think it's the flicker filter, as it makes most to practically all GC games look "washed out" when compared to the sharp "contrast" that PS2 games usually have.

It might be the case of the XBox where they just render it to a 16bit frame buffer. Or DirectX handles colors differently (which I also noticed since games tend to look "sharper" when rendered in OpenGL vs. DirectX, though DirectX can handle some effects better). Perhaps it's also the difference between T&L...dunno.

zeckensack said:
Making an S3TC compressed texture is a more involved process, but it doesn't actually happen on the Gamecube hardware, but during art production. Then you just put the ready compressed textures on the disc and you're done with it.
That probably explains it.
 
zeckensack said:
Yeah, some NVIDIA parts, up to and including the NV3x family, decompress the texture as it is loaded into texture cache.
<snip>
Competing chips from ATI or S3 themselves never had either issue, and I don't quite believe Flipper would have them (although I don't know for sure).
Even modern ATI and NVIDIA cards often decompress to texture cache... most have some formats they can read compressed from the texture cache (say DXT1) and a few they can't, so those get decompressed (say DXT5).
However, as they almost never tell you the size of their texture caches, it's all fairly academic in PC land...
 
Squeak said:
True, NV2a AFAIR decompresses the textures between the 128Kb L2 and 8Kb L1 cache.
GC reads textures directly from the 1Mb texture RAM without any intermediate buffers.

There is going to be an intermediate buffer of some sort; it may or may not attempt to cache results from previous decompression operations.
But there is no documented extra cache level.

But that's just arguing semantics; what matters isn't when it's decompressed, it's what the performance characteristics are.

GC has an enormous (by PC chip standards) fast RAM pool for textures, so indirect texture reads are fast regardless of spatial coherency in the texture fetches. Modern PC chips have small caches with high set associativity, so any reasonably spatially coherent access pattern is cheap. Non-spatially-coherent patterns are not.

PCs go this way for a reason: most useful access patterns are spatially coherent.
 
zeckensack said:
The PS2 doesn't directly support S3TC in hardware. However, it supports palettes. When you port games between these two systems either way, you're always going to lose some texture quality,
That's assuming the PS2 title authors its textures in indexed formats (in which case conversion to S3TC will indeed look pretty yucky). It really depends on how a particular developer works; personally I'm more of a proponent of authoring in 24/32-bit, and quantization tools have made quite impressive advancements over the last 5 years.

ShootMyMonkey said:
If you're referring to decompressing into a texture cache, that's a different matter, though it may affect how many times you have to refill cache.
It's really not a matter of "may"; it will affect hit and refill rates. That said, the GCN cache is big enough to easily hold several large uncompressed maps, so the ability to hold compressed texels is somewhat redundant for that purpose alone.
My guess is that the wider range of addressing modes supported by the texture cache (after all, it also supports all the indexed formats) is there because of the ability to configure it as user-addressable memory instead of a cache, akin to the PS2's eDRAM, in which case it comes in quite handy to store as much data in there as possible.
 
Squeak said:
True, NV2a AFAIR decompresses the textures between the 128Kb L2 and 8Kb L1 cache.
Why would NV2a need a L1/L2 organization? Those numbers sound quite big to me, do they come from official specs?

S3TC has a couple of disadvantages that IMO make it nice to also have CLUT and 16-bit as options.
- For every one texel you need, you have to fetch the whole 4x4 block.
In filtering operations with bilinear, trilinear and anisotropic filtering, the worst-case scenario is having to load 2x8 blocks for essentially one texel.
A single S3TC block is 64 bits in size. That's usually <= memory access granularity, so there is no disadvantage at all.
- It's not good for low-res textures (equal to or less than 256x256) where there are a lot of sharp colour changes. In that case, 4-bit CLUT textures might actually be preferable (with a largely monochrome texture the two methods perform equally well).
True, but cases where a 4-bit CLUT is good enough are rare. Mostly comic-style textures, and without mipmaps.
- Alpha requires 4 extra bits per texel; CLUT textures don't have this problem.
Binary alpha doesn't, and if you want an alpha gradient 4-bit CLUT is extremely limiting.
 
Squeak said:
S3TC has a couple of disadvantages that IMO make it nice to also have CLUT and 16-bit as options.
- For every one texel you need, you have to fetch the whole 4x4 block.
In filtering operations with bilinear, trilinear and anisotropic filtering, the worst-case scenario is having to load 2x8 blocks for essentially one texel.
... And CLUT textures have a similar problem in that your palette has to have 4/8/N read ports to support those filtering modes.
 
A lot of the discussion seems to be revolving around the "compression", but I've also read somewhere on IGN (back when the GC was only known by its codename, Dolphin) that S3TC also handles enlarged or stretched textures better than conventional bilinear resampling, thus addressing the common problem that resulted in most N64 games looking washed out. (This had nothing to do with the compression part, mind you.) Does S3 also allow its block decomposition to be used for scaling textures as well? (I believe the article also showed comparison pics of a certain PC racing game that utilized S3 to remove the "Gouraud-ness" effect that distant textures often had.)
 
Xmas said:
Why would NV2a need a L1/L2 organization? Those numbers sound quite big to me, do they come from official specs?
No, but they came from a source who had no reason to lie.

A single S3TC block is 64 bits in size. That's usually <= memory access granularity, so there is no disadvantage at all.
Well, that depends entirely on the kind of memory you are using, doesn't it?

True, but cases where a 4-bit CLUT is good enough are rare. Mostly comic-style textures, and without mipmaps.
If you by "comic" you mean manmade style textures like letters or drawings, then yes, but why not MIP maps? Also, the wast majorety of textures in our world are largely monochrome.
What 4bit CLUT textures isn't good at, is substituting geometric detail with flat 2d texture, like it's often necessary with the limitations of the current technology.

Binary alpha doesn't, and if you want an alpha gradient 4-bit CLUT is extremely limiting.
Binary alpha looks horrible for almost anything, even straight lines.
 
DeanoC said:
Even modern ATI and NVIDIA cards often decompress to texture cache... most have some formats they can read compressed from the texture cache (say DXT1) and a few they can't, so those get decompressed (say DXT5).
However, as they almost never tell you the size of their texture caches, it's all fairly academic in PC land...
On older NVIDIA parts the effects of the reduced* texture cache can be measured. I think NV2x and NV3x had 4kiB of texture cache, because that explains the performance profile of the chips. It might actually be more, data may be replicated across multiple caches or whatever, but determining the effective cache size isn't exactly rocket science either.
*It still holds as many texels as it would for an uncompressed 16-bit (default DXT1) or 32-bit (DXT5 or "tweaked" DXT1) texture, but it's a loss vs the competition regardless.
Simon F said:
... And CLUT textures have a similar problem in that your palette has to have 4/8/N read ports to support those filtering modes.
A certain IHV with strange capitalization has been caught decoding paletted textures as they are read into texture cache ;)
That at least solves the read-port problem.
 
Squeak said:
Well, that depends entirely on the kind of memory you are using, doesn't it?
It depends on how the memory controller is designed. I can't think of a single PC GPU with memory access granularity of less than 64 bits, at least not in the last 5 years. And I'm pretty sure that applies to GC as well.

If you by "comic" you mean manmade style textures like letters or drawings, then yes, but why not MIP maps? Also, the wast majorety of textures in our world are largely monochrome.
What 4bit CLUT textures isn't good at, is substituting geometric detail with flat 2d texture, like it's often necessary with the limitations of the current technology.
Mipmaps should be generated by filtering the base map, and filtering produces intermediate values. But since you are limited to a small CLUT, you either have to optimize the palette for all mip levels, meaning you get even fewer colors for the base map, or live with substantial color errors in the mipmaps.
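As a minimal illustration (generic C, nothing GC-specific): a 2x2 box filter produces averaged colours that a 16-entry palette generally does not contain, so each mip texel would have to be snapped back to the nearest palette entry afterwards.

Code:
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

/* Average a 2x2 footprint of the base map to get one mip texel.
 * For a CLUT texture this averaged colour still has to be re-indexed
 * into the (small) palette, which is where the error comes from. */
static rgb8 downsample_2x2(rgb8 a, rgb8 b, rgb8 c, rgb8 d)
{
    rgb8 avg = {
        (uint8_t)((a.r + b.r + c.r + d.r) / 4),
        (uint8_t)((a.g + b.g + c.g + d.g) / 4),
        (uint8_t)((a.b + b.b + c.b + d.b) / 4),
    };
    return avg;
}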

Binary alpha looks horrible for almost anything, even straight lines.
It certainly is enough for most alpha test cases. However I doubt 4-bit CLUT is good enough when you need multiple alpha values either.
 