A question... Under the current model, the CPU copies data from storage into RAM, decompresses it, and then copies the decompressed data to VRAM for use by the GPU. When we talk about saturating storage bandwidth, we're referring to the CPU's ability to both copy AND decompress that data, right?
So under the DirectStorage model, the CPU will again copy from storage into RAM, and then copy the still-compressed data destined for the GPU over to VRAM for decompression there. How much CPU does it take to simply copy that data into system memory? Is the CPU able to easily saturate Gen3/4 NVMe speeds just copying the data, versus copying AND decompressing it?
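To get a feel for the gap, here's a rough Python sketch comparing a plain memcpy-style copy against copy-plus-decompress on the CPU. It uses zlib purely as a stand-in codec (real games use faster formats, and absolute numbers depend entirely on the machine), but the size of the gap is the point:

```python
import time
import zlib

# 4 MiB of repetitive data so it compresses well (illustrative only).
original = bytes(range(256)) * (4 * 1024 * 1024 // 256)
compressed = zlib.compress(original, level=6)

# Plain copy: roughly what "just moving bytes into RAM" costs the CPU.
start = time.perf_counter()
copied = bytes(original)
copy_time = time.perf_counter() - start

# Decompress: the extra work the CPU does in the traditional model.
start = time.perf_counter()
decompressed = zlib.decompress(compressed)
decomp_time = time.perf_counter() - start

assert decompressed == original
print(f"copy throughput:       {len(original) / copy_time / 1e9:.2f} GB/s")
print(f"decompress throughput: {len(original) / decomp_time / 1e9:.2f} GB/s")
```

On typical hardware the plain copy lands well past NVMe speeds while decompression lags far behind it, which is exactly why offloading decompression to the GPU is the interesting part.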
Or is it really the improvements to the storage stack, such as bypassing file system overhead, as well as batching I/O requests and not requiring notification for all completed requests, that allows the CPU to copy that data faster?
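The batching point can be sketched with a toy cost model: if every completed request costs the CPU some fixed overhead, then fewer, larger (or batched) requests leave more of the wall clock for the actual transfer. The overhead figure below is a made-up illustrative number, not a measurement of the Windows storage stack:

```python
def transfer_time(total_bytes, request_size, per_request_overhead_s, bandwidth_bps):
    """Time to move total_bytes when each request carries a fixed overhead cost."""
    n_requests = total_bytes / request_size
    return n_requests * per_request_overhead_s + total_bytes / bandwidth_bps

GB = 1e9
# Assume ~20 microseconds of CPU/kernel overhead per completed request.
legacy = transfer_time(1 * GB, 64 * 1024, 20e-6, 7 * GB)          # many 64 KiB reads
batched = transfer_time(1 * GB, 8 * 1024 * 1024, 20e-6, 7 * GB)   # few large reads

print(f"64 KiB requests: {legacy:.3f} s -> {1 / legacy:.1f} GB/s effective")
print(f"8 MiB requests:  {batched:.3f} s -> {1 / batched:.1f} GB/s effective")
```

With those assumed numbers, small requests spend more time on overhead than on the transfer itself and the drive never gets close to 7 GB/s, while the batched case nearly saturates it. That's the flavor of win the stack changes are after, separate from who does the decompression.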
DirectStorage seems like a pretty good step in the right direction on the PC side of things. Obviously not close to what the consoles have, but a workable solution for the short term until the hardware gets to where it needs to be... and as we know, those things can take time on PC.
I'm just thinking that if you have a 7GB/s NVMe drive, and the CPU can saturate that bandwidth, you're getting up to 14GB/s of effective data given a 2:1 compression ratio... which means the CPU should be able to fill RAM extremely quickly. And given that the texture/geometry data can remain compressed... that essentially doubles the effective RAM capacity. So a lot of data can be put into RAM very VERY quickly.
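The arithmetic there is simple enough to write down (the 16 GB streaming budget below is just a hypothetical number for illustration):

```python
# Back-of-the-envelope: effective fill rate when compressed data stays compressed.
drive_bw_gbps = 7.0       # raw NVMe read bandwidth from the example above
compression_ratio = 2.0   # assumed 2:1 average across assets

effective_gbps = drive_bw_gbps * compression_ratio  # uncompressed-equivalent rate
ram_budget_gb = 16.0      # hypothetical slice of RAM reserved for streaming

# The drive only has to deliver the compressed bytes:
seconds_to_fill = ram_budget_gb / drive_bw_gbps
print(f"Effective rate: {effective_gbps:.0f} GB/s of uncompressed-equivalent data")
print(f"Fill {ram_budget_gb:.0f} GB of RAM with compressed assets in ~{seconds_to_fill:.1f} s")
```

So at a 2:1 ratio, a little over two seconds fills 16 GB worth of compressed assets, which represent roughly 32 GB of uncompressed data.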
From that point, you're still sending compressed data over to the GPU, which means you can get that data to the GPU even faster than before, and the GPU can decompress it quicker than the CPU could, and at a more consistent rate.
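The same multiplier applies on the bus: if the upload stays compressed, the PCIe link effectively carries twice the texture data. Quick sketch, using the commonly cited ~31.5 GB/s usable figure for PCIe 4.0 x16 and the same assumed 2:1 ratio:

```python
# Sending compressed data over PCIe multiplies effective link bandwidth too.
pcie4_x16_gbps = 31.5     # approx. usable PCIe 4.0 x16 bandwidth
compression_ratio = 2.0   # same assumed 2:1 ratio as before

effective_upload_gbps = pcie4_x16_gbps * compression_ratio
print(f"Compressed upload worth ~{effective_upload_gbps:.0f} GB/s "
      f"of uncompressed texture data")
```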