That's the problem though. If you use WebP or JPEG you'll have to decode using the CPU, which is slow.
Actually you don't. Well, kind of don't.
You can just take 95% of the WebP codec but substitute the final deflate implementation with GDeflate. Everything else in that codec is already a perfect fit for the GPU. And suddenly you've got it on the GPU, with all the benefits. Benefits such as: mipmap chains with perfect frequency-preserving sampling come for free as part of the compression scheme. You just need some extra metadata to correctly dispatch the reconstruction of the final target buffers, and you need a container format you can pass straight into GDeflate.
JPEG? Well, the same applies. I used that a decade ago; back then only the JPEG depacketization and the Huffman decoding needed to be done on the CPU, as they didn't scale properly on the GPU. Everything else was better placed on the GPU and scaled perfectly. Actually better scalability than h.264 via the hardware decoder at the time, at least if you didn't go for the bottom-of-the-line model...
Nowadays you would likewise take a modified JPEG format with the entropy-coding stage substituted, wrap it in a nicer container, but still keep the whole macro block handling as it is.
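To illustrate why the macro block handling is worth keeping on the GPU: after the serial bitstream/Huffman stage, every 8x8 block is just dequantization plus an inverse DCT, completely independent of its neighbours. A minimal NumPy sketch (the function names and the flat quantization table are my own, purely illustrative):

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix; JPEG's 8x8 transform."""
    k = np.arange(n)
    m = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    m[0] /= np.sqrt(2)
    return m * np.sqrt(2.0 / n)

D = dct_matrix()

def decode_block(quantized: np.ndarray, quant_table: np.ndarray) -> np.ndarray:
    """The per-block work that maps well onto the GPU: dequantize, then
    inverse-DCT. Each 8x8 block is independent, so this parallelizes
    trivially; only the serial Huffman/bitstream stage stays on the CPU."""
    coeffs = quantized * quant_table   # dequantization
    return D.T @ coeffs @ D            # 2D inverse DCT

# round trip on a random block with a flat quantization table
block = np.random.randn(8, 8)
coeffs = D @ block @ D.T               # forward DCT (encoder side)
qt = np.ones((8, 8))
assert np.allclose(decode_block(coeffs, qt), block)
```

On a GPU this would be one thread group per block; the NumPy version just shows that the math per block is tiny and uniform.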
The result is neither WebP nor JPEG in the sense that it would be binary-compatible, but the important part is neither the container format nor the bitstream; it's the know-how in the block encoding (JPEG) and the channel/frequency-domain isolation (WebP), respectively. Transcoding an existing WebP or JPEG file this way is trivial. It's literally just depacketizing, decompressing the payload, recompressing it, writing the metadata somewhere else, and that's it.
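The transcoding loop described above can be sketched in a few lines. This is a toy version, assuming the payload is plain deflate; the real container parsing is omitted, and stock zlib stands in for the GPU-side GDeflate encoder, which this sketch does not implement:

```python
import zlib

def transcode_payload(deflate_payload: bytes, recompress_level: int = 9) -> bytes:
    """Toy transcoding step: inflate the existing payload, then
    recompress it for the target decompressor. zlib stands in here
    for a GDeflate-compatible encoder."""
    raw = zlib.decompress(deflate_payload)       # deflate decompression
    return zlib.compress(raw, recompress_level)  # recompress for the new container

# toy round trip: the image data survives the transcode unchanged
original = zlib.compress(b"macro block data" * 100)
recompressed = transcode_payload(original)
assert zlib.decompress(recompressed) == b"macro block data" * 100
```

The point being that the decoded intermediate is the raw coefficient/prediction data, so no image-quality loss occurs anywhere in the transcode.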
Fun fact: even the AI decompression paper @DegustatoR mentioned largely builds on the design principles of the lossless WebP codec - at least the part where they break the image down into a mip-chain with isolated frequency ranges before applying their own encoder to each level individually.
For the lossy codec, WebP just falls back to VP8 intra-frame coding.
The interesting part about WebP is the lossless codec, which is the one I'm referring to when talking about switching to a multi-planar layout, run-level coding and built-in mip-chains.
I'm not so sure about using P and B frames for textures though. P and B frames tend to have less clarity, and that's fine for a video frame that's displayed for only 1/30 of a second, but for a static texture it can be too blurry.
That's just encoder configuration. Nothing else.
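For example, with ffmpeg and x264 (assuming those tools), forcing an all-intra stream is just two flags; this is an encoder-configuration sketch, not a recommendation of specific quality settings:

```shell
# All-intra x264 encode: no P or B frames at all.
# -bf 0 disables B-frames; -g 1 makes every frame a keyframe (I-frame).
ffmpeg -i textures.mov -c:v libx264 -bf 0 -g 1 -crf 18 all_intra.mp4
```

With every frame coded as an I-frame, the blurriness concern for static textures goes away; the tradeoff is a larger bitstream, since inter-frame prediction is what buys most of a video codec's compression.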