S3 Savage, MeTaL API

Reading metal.h, it does not call OpenGL or Direct3D. So... yes, to me it looks more like a full-fledged API than just some extension to OpenGL. I stand corrected: it looks pretty complete to me. I'm seeing more than just texture compression and handling; there's a lot in here about buffers, masks, registers, triangles, the framebuffer, the z-buffer, lighting, shading... oh, and DOS compatibility for compiling (though I doubt anyone used it). Anyhow, it's more info than we had before.

I've done a lot of thinking about this, and I'm struck by a comment about the BitBoys Pyramid3D: "we only made our own API because Direct3D didn't do what we wanted at the time" (summary from an old Beyond3D thread). I think this is the same situation. They came up with texture compression, it wasn't natively supported by the hardware out there or by Direct3D, so they made their own (complete) API. Later, after the fact, their texture compression was added to all the major APIs.

But I'm no 3D expert so what do I know (calls in the experts). ;)
 
GeForce 1/2/3 suffered from a hardware bug (or was it a hack to circumvent an S3 patent? I think S3 sued Nvidia back in the day over unpaid royalties for implementing S3TC in NV1x) that specifically affected DXT1 textures. Certain Nvidia drivers included a way to automatically convert all DXT1 textures into DXT3/5, but at the price of lost efficiency: a DXT3 texture takes twice the memory of a DXT1 one at the same resolution, because of the additional 4 bpp alpha channel (DXT1 only allows 1 bit for masking).
My educated guess is that the old GeForce GPUs (like the old ATI GPUs) uncompressed DXT textures to the GPU L1 texture cache (to make filtering fast and simple). Since DXT color endpoints are 565 (16 bit), it might be tempting to uncompress them to the texture cache as 565 (16 bit). This way they use half the texture cache space (= fewer cache misses = faster). However, uncompressing to 8888 (32 bit) produces better quality, since the DXT 565 palette values are linearly interpolated. The two endpoint colors can always be represented in 565 (lossless), but the two interpolated middle colors need roughly 1.5 bits more precision per channel for lossless storage. This kind of image quality degradation is most clearly visible in smooth gradients (= palette endpoints are 1 step away from each other).
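To illustrate that precision point, here is a minimal sketch of my own (not any vendor's actual decode path): expand two neighbouring 565 red endpoints to 8 bits, build the four-entry DXT1 palette, and then see what happens if that palette is stored back at only 5 bits per channel.

```c
#include <stdio.h>

/* Expand a 5-bit channel value to 8 bits by bit replication. */
static unsigned expand5(unsigned v) { return (v << 3) | (v >> 2); }

int main(void)
{
    /* Two red endpoints that are 1 step apart in 565: a smooth-gradient case. */
    unsigned R0 = expand5(16), R1 = expand5(17);        /* 132 and 140 */

    /* DXT1 opaque mode: the two middle palette colours sit at 1/3 and 2/3
     * (ideal interpolation; real hardware may approximate this).            */
    unsigned mid1 = (2 * R0 + R1) / 3;                  /* 134 */
    unsigned mid2 = (R0 + 2 * R1) / 3;                  /* 137 */

    printf("8-bit palette:          %u %u %u %u\n", R0, mid1, mid2, R1);

    /* Storing the decoded palette back at 5 bits per channel rounds the
     * middle entries onto the endpoints, flattening the gradient.           */
    printf("re-quantized to 5 bits: %u %u %u %u\n",
           R0 >> 3, mid1 >> 3, mid2 >> 3, R1 >> 3);
    return 0;
}
```

Decoded to 8 bits the four entries stay distinct (132, 134, 137, 140); squeezed back into 5 bits they collapse to just 16 and 17, which is exactly the banding you see in smooth gradients.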

Similarly, DXT5 is uncompressed to 8888 (on all GPUs that uncompress data to the texture cache). This is good enough for the RGB, but the alpha channel has 8-bit palette endpoints and 3-bit (8-value) interpolation; it would need 10.5-bit storage to be lossless. On the other hand, BC5 compression (two DXT5 alpha channels, also known as ATI 3Dc) uncompresses to 16+16 bit (32 bits), giving better quality. I don't actually know how modern hardware handles DXT5. There shouldn't be a reason to degrade the alpha channel quality, since the uncompressed data is not stored in the cache at all. Modern hardware keeps the textures in their compressed (DXT) format in the L1 texture cache (this makes the caches practically 4x "larger"). The data is uncompressed and filtered on the fly (and moved to shader registers = 32-bit floats per channel = enough precision to keep everything intact).
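A similar sketch for the DXT5 alpha block (again my own illustration, not how any particular GPU does it): the eight-entry alpha palette is the two 8-bit endpoints plus six values interpolated in sevenths, so the exact in-between values are fractions that an 8-bit store has to round.

```c
#include <stdio.h>

int main(void)
{
    /* DXT5 alpha block, 8-entry mode: endpoints a0 and a1 plus six values
     * interpolated in sevenths between them. (The real index order puts the
     * endpoints first; here the palette is listed in gradient order.)       */
    unsigned a0 = 255, a1 = 0;

    for (unsigned i = 0; i < 8; i++) {
        double   exact  = ((7 - i) * a0 + i * a1) / 7.0;   /* true value      */
        unsigned stored = ((7 - i) * a0 + i * a1 + 3) / 7; /* rounded to 8 bit */
        printf("entry %u: exact %.3f -> stored %u\n", i, exact, stored);
    }
    return 0;
}
```

That fractional remainder is the extra couple of bits of precision the post refers to; BC5/3Dc sidesteps it by expanding each channel to 16 bits instead.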
 
I was reading about the GeForce DXT1 issue a few months ago and found out that the NV25/NV17 (GeForce 4 / GeForce 4 MX) implemented dithering for DXT1, and this fixed up the extreme banding. In other words, everything from the GeForce 256 to the GeForce 3 had the banding.

I've also seen that some games avoid DXT1 on early GeForce hardware, defaulting to DXT5, I think. I remember this in Soldier of Fortune 1/2; it's in the video settings menu. So manual tweaks were only needed for some games. Quake 3 was probably the most talked-about one.
 
My educated guess is that the old GeForce GPUs (like the old ATI GPUs) uncompressed DXT textures to the GPU L1 texture cache (to make filtering fast and simple).
My impression was that this was done to make the cache hold effectively 2x the number of texels when the source was DXT1 (though I'm not entirely sure how they would then distinguish the "fully transparent" pixels mode in DXT1. Perhaps they mapped everything to 1555 rather than 565 as in the original DXT1 base colours, which'd lead to even worse colour represntation).

Because DXT2~5 had to have a (non-trivial) alpha channel, those had to use an 8888 format in the texture cache.

This, unfortunately, led to the horrible, widespread misconception that DXT2~5 had better quality than DXT1 for opaque textures. Not only does DXT2~5 double the bandwidth; ironically, the quality is actually lower (thanks to some strange limitations that were added to the DX spec).

It was also very annoying, because I'd worked on a chip that had high-speed DXT1 support (compared to 2~5), but many devs simply weren't using that format because of the above.
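To put rough numbers on the cache-footprint argument in this post (purely illustrative figures; the 8 KiB cache size is my own assumption, not any specific GPU's):

```c
#include <stdio.h>

int main(void)
{
    /* How many texels fit in a hypothetical 8 KiB texture cache, depending
     * on the format the texels are held in.                                */
    const unsigned cache_bytes = 8 * 1024;

    unsigned texels_8888 = cache_bytes / 4;        /* 32-bit decoded texels   */
    unsigned texels_1555 = cache_bytes / 2;        /* 16-bit decoded: 2x more */
    unsigned texels_dxt1 = (cache_bytes / 8) * 16; /* kept compressed: DXT1 is
                                                      8 bytes per 4x4 block   */

    printf("decoded 8888 : %5u texels\n", texels_8888);
    printf("decoded 1555 : %5u texels\n", texels_1555);
    printf("DXT1 blocks  : %5u texels\n", texels_dxt1);
    return 0;
}
```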
 
My impression was that this was done to make the cache hold effectively 2x the number of texels when the source was DXT1 (though I'm not entirely sure how they would then distinguish the "fully transparent" pixels mode in DXT1. Perhaps they mapped everything to 1555 rather than 565 as in the original DXT1 base colours, which'd lead to even worse colour represntation).
Yeah, I agree with you. They most likely uncompressed DXT1 as 1555 in the texture cache (instead of 565), because DXT1 supports a single-bit-alpha block mode (if the first color endpoint value is less than or equal to the second in a 4x4 block, one of the palette entries encodes alpha=0 and there is only one interpolated value). The GPU cannot know whether the texture contains any alpha pixels, so it must assume it does and use 1555 (if you want to shave the storage cost down to 16 bits per pixel in the cache). That's kind of horrible. But not as horrible as the only available Nokia N-Gage back buffer format, which was 4444 (obviously the 4 alpha bits weren't even used for any purpose) :)
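For reference, a sketch of the mode selection being described (my own illustration, not any particular driver's code): comparing the two stored 565 endpoints picks the block mode, and a 1555 cache format can carry the punch-through bit at the cost of green's low bit.

```c
#include <stdio.h>
#include <stdint.h>

/* Repack a decoded 565 texel into 1555 with a 1-bit alpha: keeps the DXT1
 * punch-through flag but drops the least significant bit of green.         */
static uint16_t repack_565_to_1555(uint16_t c, unsigned alpha)
{
    unsigned r = (c >> 11) & 0x1f;
    unsigned g = (c >> 5)  & 0x3f;
    unsigned b =  c        & 0x1f;
    return (uint16_t)((alpha << 15) | (r << 10) | ((g >> 1) << 5) | b);
}

int main(void)
{
    /* The two 565 endpoint values stored in a DXT1 block header.           */
    uint16_t color0 = 0x001f, color1 = 0x07ff;   /* example values          */

    /* Mode select: color0 > color1 means four opaque colours; otherwise
     * three colours plus one palette entry that decodes to alpha = 0.      */
    if (color0 > color1)
        printf("opaque block: 4 colours, every texel alpha = 1\n");
    else
        printf("punch-through block: 3 colours + transparent entry\n");

    printf("endpoint repacked to 1555: 0x%04x (green low bit gone)\n",
           repack_565_to_1555(color0, 1));
    return 0;
}
```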
 