Most blocks of pixels can be accurately described by 2 planes and the edges between those planes, which is why S3TC was such a good idea. Simon improved upon that by letting the 2 planes be gradients (S3TC can encode a single gradient, but not a combination of 2 gradient planes), and he removed redundancy between the blocks: too often the coded colors of neighbouring blocks will be nearly identical, and that correlation is a bit too short-range to ignore. S3TC was very good, and PVR-TC is essentially S3TC++. I don't think you are going to be able to do much better for fast-decodable fixed-rate compression codes ... especially not with transforms. I'd sooner try my hand at forms of VQ with a fixed codebook.
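
To make the "single gradient" point concrete, here is a rough sketch of how one S3TC block decodes, assuming the DXT1 (BC1) layout: two RGB565 endpoint colors followed by sixteen 2-bit indices. The function names are my own, and expanding 565 to 888 by bit replication and interpolating with integer division are assumptions; real decoders may round slightly differently.

```python
import struct

def rgb565_to_rgb888(c):
    """Expand a packed RGB565 value to an (r, g, b) tuple of 8-bit channels.

    Bit replication is an assumption here; hardware may round differently.
    """
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))

def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into a 4x4 list of (r, g, b) tuples."""
    # Little-endian: endpoint 0, endpoint 1, then 32 bits of 2-bit indices.
    c0, c1, indices = struct.unpack("<HHI", block)
    p0 = rgb565_to_rgb888(c0)
    p1 = rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-color mode: two extra colors at 1/3 and 2/3 along the line.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:
        # Three-color mode: midpoint, plus an entry used for transparency
        # (alpha is not modelled in this sketch, so it is plain black).
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    palette = (p0, p1, p2, p3)
    # Texel (x, y) uses the 2-bit index at bit position 2 * (4 * y + x).
    return [
        [palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)]
        for y in range(4)
    ]

if __name__ == "__main__":
    # White and black endpoints; the first four texels use indices 0, 1, 2, 3.
    demo = struct.pack("<HHI", 0xFFFF, 0x0000, 0xE4)
    for row in decode_dxt1_block(demo):
        print(row)
```

Every texel in the block picks one of four colors lying on the same line segment between the two endpoints, so a block can follow one gradient but has no way to mix two independent ones.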