Blazing Fast NVMEs and Direct Storage API for PCs *spawn*

For the existing situation, one of the diagrams shows the data going NVMe drive => RAM => CPU => RAM => GPU.

I thought that data could go CPU => GPU directly now?
If I recall correctly, they said they are looking into that for the future. It wasn't in the talk itself, but someone asked the question in chat while the presentation was going on, and they said they're working on getting it to that point.

In the talk Andrew Yeung stated that there is an upgrade path for DirectStorage, just as there is for D3D. It's something that will evolve and improve as the hardware support allows.
 
Isn't this exactly what AMD did with Radeon SSGs? To my understanding the GPUs communicated directly with the pair of SSDs via a PCIe bridge chip, without the round trip through system memory.
Whether it would be doable as a "universal solution" that works with every vendor is of course another matter.

Yes. If the SSD and the GPU are on the same PCIe controller, there should be no problem.
However, some systems (such as most consumer-oriented Intel systems) do not have enough PCIe lanes on the CPU, so most PCIe peripherals connect to a different chip (such as the PCH) while the GPU connects to the PCIe controller on the CPU. In theory, the PCH should be able to route the PCIe packets from the SSD to the GPU, but I don't know how well it works in practice.
 
one of the diagrams shows the data going NVMe drive => RAM => CPU => RAM => GPU.
The CPU is used for decompressing the data in System RAM (slide 13 above), that's why the data flows from System RAM to the CPU and back to System RAM.

Why not NVMe => GPU?
This requires an update to the WDDM driver model to support disk I/O operations. NVMe is a block I/O protocol, which uses LBA sector numbers to access disk data; actual flash memory addresses are only visible to the NVMe controller.
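
To illustrate what "block I/O with LBA sector numbers" means in practice, here is a minimal Python sketch that converts a byte range on a block device into the LBA range a read command would have to cover. The function name, the 512-byte default sector size and the partition offset parameter are assumptions made for illustration; a real file read also needs the file system to map file offsets to on-disk extents first, which is part of what the CPU-side storage stack does.

```python
def byte_range_to_lbas(offset_bytes, length_bytes, sector_size=512,
                       partition_start_lba=0):
    """Return (first_lba, sector_count) covering a byte range.

    Hypothetical helper: the device is addressed in whole sectors,
    so a byte range is widened to the sectors that contain it.
    """
    if offset_bytes < 0 or length_bytes <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    first = offset_bytes // sector_size
    last = (offset_bytes + length_bytes - 1) // sector_size
    return partition_start_lba + first, last - first + 1


if __name__ == "__main__":
    # 100 bytes starting at byte 1000 straddle sectors 1 and 2.
    print(byte_range_to_lbas(1000, 100))
```

Note that the result is still a logical address: where those sectors physically live in flash is decided by the NVMe controller, as the post says.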

Isn't this exactly what AMD did with Radeon SSGs? To my understanding the GPUs communicated directly with the pair of SSDs via PCIe bridge chip
Yes, there were actually four NVMe disks onboard, but they are only used to set up a software RAID disk in Windows 10, which could be used as a fast file cache with proprietary OpenGL/OpenCL/DirectX11 SSG extensions from the Radeon Pro SSG SDK, i.e. for a scratch disk in Adobe Premiere Pro etc. The Radeon Pro SSG card basically doubles as a 4-way NVMe PCIe card bundled with fast enterprise-grade Samsung SM961 disks, and the onboard Broadcom PEX8747 chip, which is a 5-port 48-lane PCIe 3.0 switch, eliminates the need for a HEDT chipset that supports PCIe bifurcation. In theory this switch would also facilitate P2P DMA transfers between the NVMe disks and the GPU, but I really doubt it was ever implemented in the Linux drivers, and WDDM 2.x cannot process disk I/O at all.


As for the claimed direct access to the NVMe disk from the GPU, IMHO paging from local video memory is only possible to system memory. AFAIK the Vega chip simply does not include a host processor which could translate LBA sectors into memory addresses (and modern NVMe controllers require several embedded ARM Cortex CPU cores to perform this task effectively), so it still needs the CPU to manage disk I/O.
 
A question... Under the current model, the CPU copies data from storage into RAM, decompresses it, then copies the decompressed data to VRAM for use by the GPU. When we talk about saturating storage bandwidth, we're referring to the CPU's ability to copy AND decompress that data, right?

So under the DirectStorage model, the CPU will again copy from storage into RAM, and then copy the data destined for the GPU again for decompression. How many resources does it take to simply copy that data into system memory? Is the CPU able to easily saturate Gen3/4 NVMe speeds when just copying the data, as opposed to copying and decompressing it?

Or is it really the improvements to the storage stack, such as bypassing file system overhead, as well as batching I/O requests and not requiring notification for all completed requests, that allows the CPU to copy that data faster?


DirectStorage seems like a pretty good step in the right direction on the PC side of things. Obviously not close to what the consoles have, but a workable solution for the short term until hardware can get to where it needs to be... as we know those things can take time on the PC side of things.

I'm just thinking that if you have a 7GB/s NVMe drive, and the CPU can saturate that bandwidth, up to 14GB/s given a 2:1 compression ratio... that means the CPU should be able to fill RAM up extremely quickly. And given that the texture/geometry data can remain compressed, that essentially doubles the RAM capacity. So a lot of data can be put into RAM very VERY quickly.
From that point, you're still sending compressed data over to the GPU, which means you can send that data to the GPU even faster than you could before, and the GPU can decompress that data quicker than the CPU could, and at a more consistent rate.
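
The arithmetic above can be written out as a small Python sketch. The 7 GB/s drive speed and the 2:1 ratio come from the post; the function names and the 16 GB example payload are assumptions for illustration, and the model ignores every real-world overhead (file system, request sizes, decompression cost).

```python
def effective_gb_per_s(raw_gb_per_s, compression_ratio):
    """Decompressed data delivered per second for a given raw read rate."""
    return raw_gb_per_s * compression_ratio


def seconds_to_deliver(decompressed_gb, raw_gb_per_s, compression_ratio=1.0):
    """Idealized time to deliver `decompressed_gb` of data after decompression."""
    return decompressed_gb / effective_gb_per_s(raw_gb_per_s, compression_ratio)


if __name__ == "__main__":
    print(effective_gb_per_s(7, 2))       # the 14 GB/s figure from the post
    print(seconds_to_deliver(16, 7, 2))   # hypothetical 16 GB of assets
    print(seconds_to_deliver(16, 7))      # same payload, uncompressed on disk
```

Under these idealized assumptions the compressed case takes half as long as the uncompressed one, which is the whole point of keeping the data compressed until it reaches the GPU.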
 
Marvell announced the Bravera SC5 enterprise-grade NVMe SSD controllers, MV-SS1331 (8-channel) and MV-SS1333 (16-channel), which support PCIe 5.0 x4.
Up to 14 GByte/s read with 2M IOPS, 7 GByte/s write with 1M IOPS, ZNS (Zoned Namespaces)
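
For context on the 14 GByte/s figure, here is a rough Python sketch of the theoretical link bandwidth for PCIe 3.0, 4.0 and 5.0. The per-lane transfer rates (8, 16 and 32 GT/s) and the 128b/130b line encoding are from the PCIe specifications; the function name is an assumption, and packet/protocol overhead is deliberately ignored, so real throughput is somewhat lower than these numbers.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# PCIe 3.0 and later use 128b/130b encoding: 128 payload bits per 130 sent.
ENCODING_EFFICIENCY = 128 / 130


def pcie_link_gb_per_s(gen, lanes):
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen in (3, 4, 5):
        print(f"PCIe {gen}.0 x4: {pcie_link_gb_per_s(gen, 4):.2f} GB/s")
```

A PCIe 5.0 x4 link comes out just under 16 GB/s by this estimate, so a quoted 14 GByte/s read speed sits close to the link ceiling, much as 7 GB/s drives do on PCIe 4.0 x4.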

https://www.marvell.com/products/ssd-controllers/mv-ss1331-1333.html
https://www.anandtech.com/show/16703/marvell-announces-first-pcie-50-nvme-ssd-controllers
https://www.tomshardware.com/news/marvell-announced-pcie-gen5-ssd-controllers

Such an architectural shift often means sacrificing flexibility, but Marvell doesn't expect that to be a problem thanks in large part to the Open Compute Project's Cloud SSD specifications. Those standards go beyond the NVMe spec and define which optional features should be implemented, plus target performance and power levels for different form factors. The Cloud SSD specs were initially a collaboration between Microsoft and Facebook but have caught on in the broader market and even have the support of traditional enterprise server vendors like Dell and HP.

I wonder if some of these features will be part of DirectStorage, and whether some have already been incorporated into the XBS consoles, since MS was integral in getting the specifications started. Or do these specifications have no application to the consumer I/O space?

Regards,
SB
 
I wonder when we will see PCIe Gen 5 in consumer hardware? Zen 4 looks to be PCIe 4 in a consumer context.

Without looking this up, my fuzzy memory wants to puke out a 2022 timeframe.
 
A recent Google hit from April 27, 2021 says AMD Zen 4 is a 2022 release but a 2021 announcement. Maybe something changed in one month?

https://www.tweaktown.com/news/7210...ie-5-0-in-2022-but-intel-has-first/index.html
The upcoming Zen 4 micro architecture that AMD is aiming for a 2021 announcement and 2022 release, will pack support for both DDR5 and PCIe 5.0. AMD will be taking a big leap with Zen 4 as it will be using the 5nm node from TSMC, and then on top of that supporting both DDR5 and PCIe 5.0 standards.
 
The "Zen 4 is PCIe 4" rumors started because an AM5 leak said PCIe 4. It's clear that "AM5 PCIe 4" refers to some specific products, not the socket's capabilities, and nothing says those products would be Zen 4. In fact the leaked roadmap doing the rounds specifies next-gen Rembrandt CPUs w/ iGFX (since they dropped the term APU already) with PCIe 4 for AM5, while all Zen 4 parts say PCIe 5.
 
More than top speed, real IOPS under different workloads is what's interesting. We'll have to wait for some benchmarks, as usual...
 
Maybe Zen 3+ with the new stacked cache chip is pushing the consumer version of Zen 4 further into the future? Datacenter Zen 4 should be coming out before the consumer version.

edit. The rumor I read is that consumer Zen 4 is still PCIe 4 for 2022. AMD will just add four more PCIe 4 lanes. So maybe it takes until 2023 for PCIe 5 to start getting real adoption in the consumer space? https://www.gamersnexus.net/news-pc/3574-hw-news-supercomputer-mining-malware-ddr5-amd
 
I wonder how PCIe 5 drives will impact GPU performance with DirectStorage? 28GB/s is a hell of a lot of real-time decompressed output. Will we see some GPUs unable to keep up with these drives?
 
My guess is that in real life you won't hit this top speed when loading game assets, so it's a non-issue. I still have a SATA SSD for my games, and I have never seen reads at 450-500 MB/s, except in very quick bursts; you won't get something like a 5-10 second read at top speed. Loading game assets is not always a single sequential read.
 
That's why we need DirectStorage and games using it. Currently, streaming speeds in games are limited by the slow API used to access the disk. Another upside of DirectStorage is decompression support in the API.

I'm starting to lean towards 2023 before PCIe 5 is integrated into consumer-level motherboards and CPUs.
 
PCIe 5 is already confirmed for Alder Lake in late 2021, isn't it?
 
Don't know. Intel is so far behind at the moment that I'm not spending time following their stuff. The rumor I saw about Zen 4 was that for consumers they would add four PCIe 4 lanes, and PCIe 5 would initially be a server-only thing. Zen 4 seems to be late 2022, as Zen 3+ is coming out in Q1 2022.

edit. Link to the zen4 rumor mill: https://wccftech.com/amd-zen-4-powe...-ddr5-5200-on-am5-lga1718-socket-2022-launch/ and https://wccftech.com/amd-raphael-ryzen-desktop-cpus-powered-zen-4-architecture-q4-2022-launch-rumor/
 
Looks like AMD will be well behind Intel on that front then. That said, it looks like Alder Lake will support a mix of PCIe 4 and PCIe 5 lanes in a 4:16 ratio, which suggests that PCIe 5 is targeted at GPUs and PCIe 4 at NVMe drives. Hopefully the allocation won't be that rigid.

https://www.guru3d.com/news-story/l...des-indicate-support-for-pcie-5-and-ddr5.html
 