Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

So it looks like they've brought GPUDirect Storage to the desktop. Awesome.

I found this quote particularly interesting from Microsoft on Nvidia's web page:



The emphasis is mine and it implies that it'll be a standard feature of Direct Storage games to use advanced compression. It would make sense if that compression scheme were the same as used in the XSX, i.e. BC-PACK. A possible further hint towards that is the compression ratio advertised for the XSX and that used by Nvidia in their slide are the same at 2:1.



I'm not sure why. 22GB/s is the theoretical limit of the PS5 decompressor, not what you're going to achieve with normal game data and Kraken. Sony advertised 8-9GB/s from a 5.5GB/s raw throughput for a reason: it's in line with typical Kraken compression ratios.

Microsoft are advertising more than that for BC-Pack at 2:1 and that's the same compression ratio used by Nvidia in their example too. So starting with a higher raw throughput (7GB/s vs 5.5GB/s) and adding a higher compression ratio results in the 14GB/s being advertised by Nvidia being reasonably comparable to the 8-9GB/s advertised by Sony. Note in their presentation Nvidia also mentioned the GPU decompression could run faster than the limits of a 7GB/s SSD. That's analogous to Sony's mention of the hardware decompression block being capable of 22GB/s peak.
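The arithmetic behind those figures is just raw drive throughput multiplied by compression ratio. A minimal sketch, using only the advertised numbers quoted in this thread (none of them measured); the function names are made up for illustration:

```python
def effective_throughput(raw_gbps: float, compression_ratio: float) -> float:
    """Decompressed GB/s delivered when compressed data is read at raw_gbps."""
    return raw_gbps * compression_ratio


def implied_ratio(effective_gbps: float, raw_gbps: float) -> float:
    """Compression ratio needed to reach effective_gbps from raw_gbps."""
    return effective_gbps / raw_gbps


# Advertised figures from the thread, not benchmarks.
ps5_typical = effective_throughput(5.5, 1.64)  # Kraken at roughly 1.64:1
rtx_io_slide = effective_throughput(7.0, 2.0)  # fastest PCIe 4.0 NVMe at 2:1
ps5_peak_ratio = implied_ratio(22.0, 5.5)      # ratio needed to hit the 22GB/s peak

print(f"PS5 typical:  {ps5_typical:.1f} GB/s")
print(f"RTX IO slide: {rtx_io_slide:.1f} GB/s")
print(f"PS5 peak would need {ps5_peak_ratio:.1f}:1 compression")
```

So the 22GB/s peak only applies to data that happens to compress at about 4:1, which is why it isn't the number to compare against 14GB/s.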



Nvidia clearly showed in their slide a 14GB/s decompression rate (2:1 compression ratio on the fastest PCIe 4.0 NVMe drives), and they go on to state that the GPUs are capable of more. It seems pretty open and shut to me, and I'm not seeing the basis for assuming a hardware decompression block is going to be more performant than a GPU packing tens of TFLOPs. The console manufacturers could easily have included those blocks simply because they were relatively cheap compared to including a slightly bigger, more powerful GPU capable of doing the same job in shaders.

EDIT: damn... @Dictator beat me by 60 seconds and said it better too.

This was before Oodle Texture. I suppose we will see games well above 10 GB/s with it. If not, Sony would have gone with a slower decompressor, since it would have been cheaper.
 
@chris1515 already answered that above. They're talking about different workloads. The 2-core requirement in NV's slide is purely to handle the IO, with no decompression involved. The PS5/XSX numbers include decompression. For comparable numbers, look to the 14 cores NV mentioned being required to handle both the IO and decompression at 7GB/s.

Thanks. Very clear.

I wonder what those cores are.
 
So it looks like they've brought GPUDirect Storage to the desktop. Awesome.

I found this quote particularly interesting from Microsoft on Nvidia's web page:



The emphasis is mine and it implies that it'll be a standard feature of Direct Storage games to use advanced compression. It would make sense if that compression scheme were the same as used in the XSX, i.e. BC-PACK. A possible further hint towards that is the compression ratio advertised for the XSX and that used by Nvidia in their slide are the same at 2:1.
That would be pretty interesting if Direct Storage were not only a new IO Standard to avoid Windows bottlenecks and serialisation, but also a standard compression Format from MS as well. It makes a lot of sense to me when you put it like that and also point out the 2:1 bit.


EDIT: damn... @Dictator beat me by 60 seconds and said it better too.
I spent those 60 seconds decompressing :)
 
This was before Oodle Texture. I suppose we will see games well above 10 GB/s with it. If not, Sony would have gone with a slower decompressor, since it would have been cheaper.

I'm still not convinced Oodle Texture wasn't accounted for in Sony's original numbers. They would have known about RDO texture compression long before then and presumably had the relationship with Oodle to know it was in the pipeline. Plus, while I've seen some wildly different compression ratios stated for Kraken and Kraken+Oodle Texture, the more consistent ones put Kraken+Oodle Texture more in line with Sony's already advertised 1:1.64 as far as I can tell.
 
Most likely yes. I'm not convinced they were designing something without knowing anything about the different compression ratios.
 
Zlib is apparently single threaded, and Kraken is 2 threads max.
I think what matters is whether it's random access. If the compression format isn't random access, it's pretty much pointless: you'd have to retrieve the whole texture to display a fraction of it. It's much more efficient to send only the portions you need to see instead of the whole texture.

I'm fairly positive that zlib, Kraken and BC Pack are all random access (for images). As far as I understand, the thread count isn't as important on the GPU, because you assign the texture blocks so that you can decompress everything you need across however many threads you have available.
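To illustrate what random access buys you, here's a minimal sketch of block-wise compression. It is not how Kraken, BC Pack or any shipping game format is actually laid out. As far as I know, a single zlib stream isn't seekable on its own, so the random access here comes purely from compressing fixed-size chunks independently. `BLOCK_SIZE`, `compress_blocks` and `read_range` are hypothetical names, and CPU threads stand in for GPU threads:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # hypothetical block size, chosen only for the example


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Compress each fixed-size block on its own so any block can be read alone."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_range(blocks: list[bytes], start: int, length: int,
               block_size: int = BLOCK_SIZE, workers: int = 4) -> bytes:
    """Return data[start:start+length], decompressing only the blocks it touches."""
    if length <= 0:
        return b""
    first = start // block_size
    last = (start + length - 1) // block_size
    # Blocks are independent, so they can be spread over any number of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(zlib.decompress, blocks[first:last + 1]))
    offset = start - first * block_size
    return b"".join(parts)[offset:offset + length]


if __name__ == "__main__":
    texture = bytes(range(256)) * 1024  # 256 KiB of stand-in "texture" data
    blocks = compress_blocks(texture)
    # Fetch 2000 bytes straddling the first block boundary: touches 2 of 4 blocks.
    piece = read_range(blocks, 65000, 2000)
    print(len(blocks), piece == texture[65000:67000])
```

The trade-off is that smaller blocks give finer-grained access but usually compress worse, since each block starts with no history.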
 
Yeah, what's important is getting throughput up to a reasonable baseline.
Even if it were slower than the dedicated blocks, it wouldn't matter, as the PC has compute to spare where the consoles don't.
If latency isn't as good, that also doesn't matter, since games need to handle the variety of performance that PCs have, including a wide range of SSD speeds.

I wondered whether it wasn't demoed because DirectStorage isn't 100% ready, or because they don't have drivers for it yet. It looks like the sort of thing they would have demoed.
 
Are we sure that PS5/XSX decompression parts aren't built into the GPU/RDNA2?

We don't know what's in RDNA2, but it's definitely not what's in both consoles since they're using different approaches.


@Dictator using what compression formats and assuming what compression ratio? Is it counting with delta color compression on the GPU or not?

That 14GB/s just looks like 2x the maximum raw throughput of an NVMe drive on PCIe 4.0 x4 (7GB/s). Unless there's a lot more info than that single slide (which ambiguously throws DirectStorage software optimizations into the mix), those are some pretty hefty claims based on very little data.
 
Thanks.
 
Well, Nvidia just announced hardware on their new cards to enhance data transfers.
20200901172408.jpg
Hasn't AMD had something like this for a long time now?
I forget what it was called, but back when GPUOpen had both the graphics side and the professional compute side, there was something about going direct to the GPU over the PCIe bus,
all without CPU intervention... I'll see if I can track down the name...

Edit: it's DirectGMA, from AMD's FirePro line, in case the link wasn't enough.

https://web.archive.org/web/20160318200424/http://gpuopen.com/compute-product/direct-gma/
 
The following quotes from the article really make it sound like you'll need an NVMe drive that supports DirectStorage, and maybe other components as well.

"With a DirectStorage capable PC and a DirectStorage enabled game..."

"DirectStorage will be supported on certain systems with NVMe drives and work to bring your gaming experience to the next level..."

"With a supported NVMe drive and properly configured gaming machine ..."

"This process has already begun for DirectStorage and we’re working with our industry partners right now to finish designing/building the API and its supporting components..."

https://devblogs.microsoft.com/directx/directstorage-is-coming-to-pc/
 
If the Turing support for RTX IO is the same as Ampere's, then there's no dedicated hardware for data decompression on Ampere and both architectures are using just the shader ALUs.

While it may not be dedicated hardware, it isn't necessarily using general compute shader cores. Considering the limitation to Turing and Ampere, it's far more likely that it's being done on the Tensor cores, which don't generally get fully utilized in games.

It'll certainly be interesting to see how AMD approaches this. Considering that MS was the key driver for having packed int4 and int8 added to the GPU in XBSX, I could see this potentially getting leveraged for DirectStorage if AMD decide to adopt packed int4 and int8 into RDNA2, assuming of course that NV's support is reliant on their Tensor cores.

Another possibility is the dual issue fp32. Considering that may or may not get leveraged much in games, it's possible that NV could use some of that capability for DirectStorage without hindering game performance.

Regards,
SB
 
That would be pretty interesting if Direct Storage were not only a new IO Standard to avoid Windows bottlenecks and serialisation, but also a standard compression Format from MS as well. It makes a lot of sense to me when you put it like that and also point out the 2:1 bit.

I was thinking of something like MKV for games: a package made of code + shaders + audio/music + assets,
with all of those assets compressed in the new format.
If your system has Direct Storage, the GPU is in charge; if not, a classic installation is needed.
 