I wrote this several times, but let's try it again.
How Zlib works.
Zlib (the DEFLATE implementation behind ZIP, the most popular compression format), like the vast majority of compression formats, uses
single-threaded decompression.
It basically finds repeated sequences of bits and groups them together by giving them a different, shorter "name". As a
very simple example of compression:
"0000000 111111 0000 1111" can be described as alternating runs of "number of zeros"-"number of ones", of which you could say it's 7x0 + 6x1 + 4x0 + 4x1. Writing each run length as a 3-bit binary number gives "111 110 100 100".
With this, I "compressed" 21 digits into 12.
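To make the toy example concrete, here is a minimal Python sketch of that run-length scheme. It is only an illustration of the example above, not how Zlib actually encodes data, and the function names `rle_encode` and `rle_decode` are made up for this post. It assumes the input starts with a run of zeros and that no run is longer than 7, so every count fits in 3 bits.

```python
from itertools import groupby


def rle_encode(bits: str, width: int = 3) -> str:
    """Encode alternating runs of 0s and 1s as fixed-width binary counts.

    Assumption for this toy format: the input starts with '0' and every
    run length fits in `width` bits.
    """
    bits = bits.replace(" ", "")
    if bits and bits[0] != "0":
        raise ValueError("toy format assumes the input starts with a run of zeros")
    counts = []
    for _symbol, group in groupby(bits):
        run_length = len(list(group))
        if run_length >= 2 ** width:
            raise ValueError("run too long for the chosen width")
        counts.append(format(run_length, "0{}b".format(width)))
    return " ".join(counts)


def rle_decode(code: str) -> str:
    """Rebuild the original bits by walking the counts from front to back."""
    out = []
    symbol = "0"  # runs alternate, starting with zeros
    for count in code.split():
        out.append(symbol * int(count, 2))
        symbol = "1" if symbol == "0" else "0"
    return "".join(out)


if __name__ == "__main__":
    packed = rle_encode("0000000 111111 0000 1111")
    print(packed)              # 111 110 100 100
    print(rle_decode(packed))  # the original 21 digits, without spaces
```

Even in this tiny format, the decoder only knows where a run lands in the output after adding up all the runs before it, which is a small taste of why this kind of stream gets read front to back.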
But the result is one sequential stream: you can't reorder it or pull random blocks out of it, because each piece only makes sense once everything before it has been decoded. Or, as explained
in the blog post:
To make ZIP compression levels higher, you need bigger files. To make ZIP decompression parallel, you'd need to split the original file into smaller files. So to make ZIP decompression more parallel, you lose compression ratio, and effective IO throughput in the process.
So at least with Zlib or anything else that uses ZIP, you don't gain by adding more threads to it.
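The trade-off can be seen with Python's standard `zlib` module. The sketch below is a rough illustration, not a benchmark: the helper names are made up for this post, and the sample data is synthetic and deliberately chosen to exaggerate the effect (one random 4 KB block repeated many times, so all the redundancy is between chunks rather than inside them). It compresses the same data once as a single stream and once as independent 1 KB chunks. The chunks could each be handed to a different thread, but they compress far worse.

```python
import random
import zlib
from typing import List


def make_sample(block_size: int = 4096, repeats: int = 64, seed: int = 0) -> bytes:
    """Synthetic data: one pseudo-random block repeated many times."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * repeats


def compress_whole(data: bytes) -> bytes:
    """One sequential stream: best ratio, but decoded front to back."""
    return zlib.compress(data, 9)


def compress_chunked(data: bytes, chunk_size: int) -> List[bytes]:
    """Independent streams: each chunk can be decoded on its own."""
    return [
        zlib.compress(data[i:i + chunk_size], 9)
        for i in range(0, len(data), chunk_size)
    ]


def decompress_chunked(chunks: List[bytes]) -> bytes:
    """Chunks do not depend on each other, so this loop could be spread over threads."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


if __name__ == "__main__":
    data = make_sample()
    whole = compress_whole(data)
    chunks = compress_chunked(data, 1024)
    print("original bytes:  ", len(data))
    print("single stream:   ", len(whole))
    print("1 KB chunks sum: ", sum(len(c) for c in chunks))
```

With real game assets the gap would be smaller than with this contrived input, but the direction is the same: the smaller the independent pieces, the less history the compressor has to work with.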
GPUs aren't better than CPUs at running single-threaded code. They excel at highly parallel tasks. That is why making a GPGPU decompressor for ZIP makes no sense, other than to save CPU cycles when the raw I/O is slow enough that the GPU decompression doesn't become the bottleneck. The only way to make Zlib decompression faster than it is on a CPU is to make a dedicated hardware block for it.
Kraken is a very different compression format that was developed for
decompressing on 2 threads. It's not great for parallel work, but it's better than Zlib's one thread. Even so, it's not going to gain ridiculous amounts of performance from a GPU.
That is why, to get crazy high Kraken performance like more than 8 GB/s of output, a dedicated hardware block is once again needed, unless the game engine is consistently trying to load tens or hundreds of textures at the same time (which I don't think happens). Decompressing a single texture is always going to be faster on a 3GHz CPU than on a 1.9GHz shader processor.