Simon F said:
But how do you pack the colours? A 2k palette => 11 bits per pixel and that's not going to fit very well into a binary machine. You'd need horrible addressing calculations!
You wouldn't do that. I propose storing the palette in shader constant arrays. The reason you would limit your palette is for performance, not memory packing. Afaik, the constant arrays should be mostly loaded into registers, so memory consuption is the concern here not bandwidth. Thus, you want to limit how many colours you need as much as possible. I guess you say that you would pack the palette in 128-bit FP 4 component vectors. However, this would be entirely seperate from the texture itself, which would just be a single channel index table. Therefore, lookup would be trivial: Read the texture, lookup the corresponding colour in the constant array and render.
For more compression, you could arrange the colours in the palette by grouping similar colours. This would allow for a margin of error in the index value you read from the texture, since if you're off by +/- 3 or 4 values it shouldn't be too horrible. In turn, allowing you to use some basic single channel compression. Of course, this may not work on all textures, but even then without it you achieve some significant compression. After all, while 8 bit would give you only 256 colours, which is useful enough, 16 bit gives you 65,536 potential colours, which is far more then necessary or that the hardware can provide. Thus at minimum you get 8:1 compression (128-bit to 16-bit). Most textures vary little in colour, like brick or dirt, so you shouldn't need a huge number of colours, so 8-bit should work here. After all, people still use GIFs for a reason. You could also pack 2 or 3 texture palettes into one array, and share pallettes between textures as other optimisations.
The way I see it, full 32-bit textures are only necessary for very colourful and moderately noisy textures, which are a limited number. With low noise, the error of compressing the index texture will usually result in picking like colours anyways, and with high noise you won't notice the error (or much of anything really). The worst problem would be potential banding on smooth textures, but even then you can use bilinear filtering as an input to blend between palette colours (use the fractional part of the sample as the blending factor). Come to think of it, if you have a well ordered and smooth palette you might be able to use AF too, but I'm not sure. Also, if you notice a texture has a very regular palette, it may be possible to replace it with a palette generated by a function (could be a complex waveform if you have the GPU power). A sort of pseudo-procedural texture, which would free up registers for other palettes or things of interest. There's a host of other possibilites I'm sure, and this is all rendered in 128-bit FP colour.
EDIT: Is PS I took a diagonal rainbow gradient at 1024x1024, converted to 256 colour GIF with random selective dithering, then applied 1 px guassian blur to approximate bilinear filtering, and I'll be damned if it doesn't look smoother then the original somehow! Even a pure red/green gradient and some game screens I have are great.