See my previous post. You can't process those 256KB blocks in parallel and then stitch them back together into one compressed texture file. A highly parallel processing hardware architecture is worthless for those 64-256KB blocks.
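To make that serial dependency concrete, here's a minimal LZ77-style decode sketch (illustrative only, not ZLIB's or Kraken's actual bitstream): every back-reference copies bytes that were decoded an instant earlier, so the output has to be produced strictly in order.

```python
# Minimal LZ77-style decode loop (illustrative only, not ZLIB/Kraken's real format).
# Each (distance, length) match copies bytes the decoder produced moments earlier,
# so output byte N can't exist before byte N-1 -- you can't split one block's
# bitstream across thousands of ALUs and decode the pieces independently.

def lz_decode(tokens):
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):           # literal byte
            out.append(token)
        else:                                # (distance, length) back-reference
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])   # depends on just-decoded output
    return bytes(out)

print(lz_decode([97, 98, 99, (3, 6), 100]))  # b'abcabcabcd'
```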
The compression ratio is in figure 13 of the article that I posted here. Check the x axis.
It's >3:1 with ZLIB vs. ~2:1 with the GPU compressor, i.e. below 66% of ZLIB's compression effectiveness.
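That figure is just the division between the two ratios:

```python
zlib_ratio = 3.0   # >3:1 with ZLIB (figure 13, x axis)
gpu_ratio = 2.0    # ~2:1 with the GPU compressor
print(f"{gpu_ratio / zlib_ratio:.0%}")  # ~67% at exactly 3:1, lower since ZLIB is above 3:1
```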
A highly focused fixed-function hardware block makes sense for decompressing a file very fast. A highly parallel set of lower-performing ALUs does not. The commercial IP blocks I linked to explain in more detail why "throwing more cores at it" isn't the solution to the file decompression problem.
Why would the game engine request a random slice of a compressed texture file if it can't do anything with it?
At most, the game engine determines a desired LOD and requests a mipmap accordingly, but the smaller mip is still inside the larger texture file AFAIK. The advantage here is the engine doesn't need to put the large texture into the VRAM.
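As a rough sketch of what "the smaller mip is still inside the larger texture file" means - assuming a plain DXT5/BC3 layout with the mips stored largest-first and no container headers, so the numbers are only approximate:

```python
# Rough sketch of a DXT5/BC3 mip chain laid out largest-mip-first in one file.
# Assumption for illustration: no container headers, 16 bytes per 4x4 block.

def dxt5_mip_size(width, height):
    return max(1, (width + 3) // 4) * max(1, (height + 3) // 4) * 16

def mip_offsets(width, height):
    """Byte offset and size of each mip level inside the texture file."""
    offset, levels = 0, []
    while True:
        size = dxt5_mip_size(width, height)
        levels.append((width, height, offset, size))
        if width == 1 and height == 1:
            return levels
        offset += size
        width, height = max(1, width // 2), max(1, height // 2)

for w, h, off, size in mip_offsets(4096, 4096)[:4]:
    print(f"{w:>4}x{h:<4} at offset {off:>10,} size {size / 2**20:6.2f} MB")
# The 4096x4096 level alone is 16 MB and the whole chain is ~21 MB, so the
# 2048x2048 mip starts 16 MB into the file -- it lives inside the big texture.
```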
Well, if you're not reading all the explanations and quotes I wrote here, nor the documentation and scientific articles I linked to about how file compression works, then it's really easy not to be convinced of anything...
¯\_(ツ)_/¯
You don't wait a second for a texture to load if you're streaming textures on the fly like they envision doing in this new generation (i.e. with little to no prefetching, while walking and turning, not during a loading screen or a narrow corridor).
Ideally, you wait a couple of 16ms frames or one 33ms frame (like they did in the Unreal Engine demo). This means that, within 33ms, the system needs to, for each texture:
1 - Identify the texture file and mipmap level and request the file from storage
2 - Send the texture file at 5.5GB/s (probably into the ESRAM that sits in the IO complex)
3 - Decompress the file from the ESRAM into RAM, at 8-20GB/s
The bottleneck here is how many texture files you can send towards the ESRAM during step 2, within less than 33ms.
At 5.5GB/s, you can send around 180MB of compressed textures assuming you have the whole 33ms. Looking at the Unreal Engine's own documentation on DXT5 textures, that's just 9 (nine) 20MB 4K*4K textures being sent within 33ms, or maybe 18 if we assume a 2:1 Kraken compression afterwards.
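Spelling that budget out (a quick sketch with the same assumptions: the whole 33ms available for step 2, ~20MB per 4K DXT5 texture, 2:1 Kraken on top):

```python
# The 33 ms budget from steps 1-3 above, spelled out.
# Assumptions: the full 33 ms is available for step 2, 5.5 GB/s raw SSD rate,
# ~20 MB per 4K*4K DXT5 texture (base level plus mip chain), 2:1 Kraken on top.

SSD_RATE   = 5.5e9        # bytes per second into the ESRAM
FRAME_TIME = 0.033        # seconds (one 33 ms frame)
TEXTURE    = 20e6         # ~20 MB uncompressed DXT5 4K texture

budget = SSD_RATE * FRAME_TIME
print(f"Transfer budget per frame:   {budget / 1e6:.0f} MB")         # ~182 MB
print(f"Uncompressed textures/frame: {budget / TEXTURE:.0f}")        # ~9
print(f"With 2:1 Kraken compression: {budget / (TEXTURE / 2):.0f}")  # ~18
```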
What good are the 8704 parallel ALUs in an RTX 3080 for decompressing these 18 files that need to be processed in a mostly serial fashion? And how fast is each of these tiny ALUs at decompressing each of these 5-20MB files?
It's not important to be able to decompress 5000 textures within 2 seconds (something I'm sure GPUs would be very good at... until they ran out of VRAM, at least). What's important is being able to decompress one large compressed texture file - which is a mostly non-parallel task - within a couple of milliseconds.
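Put differently, a couple-of-milliseconds deadline on one file is a single-stream throughput requirement, not a core-count problem. A quick sketch with illustrative file sizes and deadlines:

```python
# What "one large file within a couple of milliseconds" means for a single
# decompression stream (file sizes and deadlines are illustrative).

def required_rate_gbs(file_mb, deadline_ms):
    return (file_mb * 1e6) / (deadline_ms / 1e3) / 1e9

for file_mb in (5, 20):
    for deadline_ms in (2, 4):
        rate = required_rate_gbs(file_mb, deadline_ms)
        print(f"{file_mb:>2} MB file in {deadline_ms} ms -> {rate:4.1f} GB/s on ONE stream")

# A 20 MB file in 2 ms needs 10 GB/s sustained on a single, mostly serial stream;
# splitting the work across 8704 slower ALUs doesn't get you that.
```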
Yes, he actually spent about half the presentation time talking about how the PS5's storage is so important for their next-gen vision, all of which depends on the SSD's raw speed and the decompressor's performance to drive data into RAM.
It's literally in the article you posted yourself. They're comparing DirectStorage vs. non-DirectStorage.
Proof #88 of how nVidia is being super shady about this: they're not measuring decompression speed in that demo either. Look at the graph - it says "level load time", and they also talk about CPU utilization.
Here's what Microsoft says about DirectStorage:
https://news.xbox.com/en-us/2020/03/16/xbox-series-x-glossary/
https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
DirectStorage is all about reducing the number of steps (i.e. CPU cycles and communication to IO latency) that are currently needed to perform an IO request.
Loading from an NVMe drive will already be much faster with DirectStorage even when using CPU decompression, and nVidia very conveniently left that info out of their "demo".
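To show the "fewer steps per request" argument in isolation - this is a toy model, not the DirectStorage API, and every number in it is a hypothetical placeholder:

```python
# Toy model only -- NOT the DirectStorage API. Hypothetical per-request overhead
# numbers, just to show why cutting the steps per IO request (and batching many
# requests into one submission) matters once a scene needs thousands of small reads.

REQUESTS       = 20_000   # hypothetical small streaming reads for one scene
PER_REQUEST_US = 30       # hypothetical CPU cost of today's full IO path, per request
PER_BATCH_US   = 30       # hypothetical CPU cost of submitting one batch of requests
BATCH_SIZE     = 256      # hypothetical number of requests grouped per submission

legacy_ms  = REQUESTS * PER_REQUEST_US / 1e3
batched_ms = -(-REQUESTS // BATCH_SIZE) * PER_BATCH_US / 1e3   # ceil division
print(f"One-request-at-a-time path: {legacy_ms:6.1f} ms of CPU overhead")
print(f"Batched submission path:    {batched_ms:6.1f} ms of CPU overhead")
```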