DirectStorage GPU Decompression, RTX IO, Smart Access Storage

For the record, BitLocker enforcement (or the attempt at it) is where the TPM mandate in Win11 comes from.

The out-of-box experience wizard may very well squash the function due to the lack of MS sign-on integration. I'm not sure if it's linked to Hello per se... I'd have to investigate.

The prior point still stands: raw DMA from a storage block device to VRAM isn't something you can "just do" with modern file systems.
 
That diagram is a little misleading. In order to use BypassIO, you start with a standard file handle. Standard file handles are obtained by walking that entire stack, including Partition Management, Volume Management, the file system, all the filter managers (e.g. antivirus) and then the IO Manager. Once the handle is obtained, you can then leverage BypassIO to attempt to interact with it. There are a number of reasons why BypassIO can still fail to work, in which case Windows falls back to using standard file IO for your file handle.
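To make that ordering concrete, here's a toy model of the flow: open a normal handle through the full stack, then try to switch on the fast path, and quietly fall back if anything objects. Every name in it (`open_standard`, `try_enable_bypass`, the layer names) is made up for illustration and none of it is a Windows API. The veto rule, where one layer that hasn't opted in blocks the fast path, is my assumption about one of the failure reasons, not a full list.

```python
# Toy model of "standard handle first, then try the fast path".
# All names are hypothetical; nothing here is a Windows API.
from dataclasses import dataclass, field


@dataclass
class FileHandle:
    path: str
    # Layers that were walked to open the handle (the full stack).
    filters: list = field(default_factory=list)
    bypass_enabled: bool = False


def open_standard(path, filters):
    """Step 1: always open through the full stack to get a normal handle."""
    return FileHandle(path=path, filters=list(filters))


def try_enable_bypass(handle, bypass_aware):
    """Step 2: ask each layer whether it tolerates being bypassed.

    Returns the name of the first layer that vetoes, or None on success.
    Assumption: a single layer that has not opted in blocks the fast path.
    """
    for layer in handle.filters:
        if layer not in bypass_aware:
            return layer
    handle.bypass_enabled = True
    return None


def read_path(handle):
    """Step 3: use the fast path if enabled, else the normal IO path."""
    return "bypass" if handle.bypass_enabled else "standard"


if __name__ == "__main__":
    h = open_standard("assets.pak", ["antivirus", "encryption"])
    veto = try_enable_bypass(h, bypass_aware={"antivirus"})
    print("vetoed by:", veto, "-> reads use the", read_path(h), "path")
```

The point of the model is that the handle is valid either way; the app keeps reading, it just doesn't get the shortcut.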

Notice what BypassIO cannot bypass: the filesystem, which is the bulk of the CPU work for transactional access. Gaining the file handle did the work to figure out which physical storage device, partition, and volume we're targeting, which are relatively simple memory page table lookups. The big CPU consumer still exists in the BypassIO chain pictured above: filesystem semantics such as metadata, permissions, compression, multi-user lock controls, and EFS.

BypassIO isn't the big performance driver in the new IO stack. Rather, the huge winner for disk performance is the new IORing, a pretty obvious ripoff of io_uring from Linux. Good for Microsoft for finally ripping off what they couldn't figure out on their own (and I'm one of the regular forum Microsoft apologists...). It's a deeply threaded, asynchronous IO queue system which can group hundreds of thousands of I/Os into a single API submission. This is truly what allows app devs to pack the multiple NVMe disk I/O queues completely full, finally taking advantage of a hardware capability which never existed in prior disk types.
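For anyone who hasn't seen the ring pattern, here's a minimal sketch of its shape: queue many reads, submit once, then drain completions. It is a toy and is neither the Windows IoRing API nor Linux io_uring. `ToyRing` and its methods are invented names, and the reads run synchronously inside `submit()` where a real ring would complete them asynchronously. The only thing it demonstrates is that N queued operations cost one submission call.

```python
# Toy submission/completion ring. Not the Windows IoRing API and not
# io_uring; it only shows the queue-many, submit-once shape.
import tempfile
from collections import deque


class ToyRing:
    def __init__(self):
        self.submission = deque()  # requests queued by the app, not yet issued
        self.completion = deque()  # results waiting to be collected
        self.submit_calls = 0      # stands in for user/kernel transitions

    def queue_read(self, f, offset, length, tag):
        """Queue a read; nothing is issued yet."""
        self.submission.append((f, offset, length, tag))

    def submit(self):
        """Issue everything queued so far in a single call."""
        self.submit_calls += 1
        issued = 0
        while self.submission:
            f, offset, length, tag = self.submission.popleft()
            f.seek(offset)
            self.completion.append((tag, f.read(length)))
            issued += 1
        return issued

    def pop_completion(self):
        """Return one (tag, data) result, or None when drained."""
        return self.completion.popleft() if self.completion else None


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(bytes(range(256)) * 16)  # 4096 bytes of test data
        f.flush()
        ring = ToyRing()
        for i in range(8):               # queue 8 reads of 512 bytes each
            ring.queue_read(f, i * 512, 512, tag=i)
        issued = ring.submit()           # one submission covers all 8
        results = {}
        while (c := ring.pop_completion()) is not None:
            results[c[0]] = c[1]
        return issued, ring.submit_calls, results


if __name__ == "__main__":
    issued, calls, results = demo()
    print(f"{issued} reads issued with {calls} submit call(s)")
```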
 
I know things move slowly, but I'm somewhat disappointed in the lack of communication from MS about future iterations and improvements in the pipeline for DirectStorage.

We know developers like Nixxes and Square Enix have been implementing and testing DS for their games, and in the case of Nixxes, deciding to forgo using DS due to current limitations. I wish we had some idea of where things stand.

FF16, I think, is the latest game that uses DirectStorage, and it does indeed load blisteringly fast.

 
3DMark has added a new DirectStorage benchmark.


Today we’re excited to launch the 3DMark DirectStorage feature test. This feature test is a free update for the 3DMark Storage Benchmark DLC.

The 3DMark DirectStorage feature test helps gamers understand the potential performance benefits that Microsoft’s DirectStorage technology could have for their PC’s gaming performance.



DirectStorage is a Microsoft technology for Windows PCs with PCIe SSDs that reduces the overhead when loading game data. DirectStorage can be used to reduce game loading times when paired with other technologies such as GDeflate, where the GPU can be used to decompress certain game assets instead of the CPU. On systems running Windows 11, DirectStorage can bring further benefits with BypassIO, lowering a game’s CPU overhead by reducing the CPU workload when transferring data.

It can be difficult to accurately measure the performance benefits of DirectStorage in a typical game scene, as the engine is performing many other tasks in addition to loading game assets, such as rendering geometry. As no game is the same it can be very difficult to measure the benefits DirectStorage has, as there are often many other factors in a game that limit its benefits.
This test simulates a near-best-case scenario for a DirectStorage implementation, where asset loading is not impacted by other variables such as the game’s asset management system or other tasks being performed by the GPU. This means you can see a demonstration of the near-maximum potential performance benefits enabling DirectStorage could have for a system.

The DirectStorage feature test generates results showing the bandwidth differences when the DirectStorage API is used, compared to without.
 
Ran the test, here are my numbers. Mind you, my drive is more than half full; I don't know if that will have had any effect. But honestly, it's kind of a neat demo just by virtue of how it shows you each test and what part it's testing: from storage to RAM, then RAM over PCIe to VRAM, as well as DirectStorage decompression done on the CPU vs GDeflate on the GPU.
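A rough way to think about those stages is as a pipeline where the slowest stage sets the rate. The sketch below is my own simplification, not how 3DMark computes anything, and the figures in the example are made up rather than taken from my screenshot. The one structural assumption is where decompression happens: with GPU decompression the compressed data crosses PCIe, with CPU decompression the expanded data does.

```python
def effective_load_rate(ssd_gbps, ratio, decompress_out_gbps, pcie_gbps,
                        gpu_decompress):
    """Uncompressed GB/s landing in VRAM for a simple streaming pipeline.

    ssd_gbps            -- compressed read rate from the drive
    ratio               -- compression ratio (2.0 means 2:1)
    decompress_out_gbps -- uncompressed output rate of the decompressor
    pcie_gbps           -- usable RAM -> VRAM transfer rate
    gpu_decompress      -- True: GPU expands after the PCIe hop
    """
    if gpu_decompress:
        # Compressed data crosses PCIe, the GPU expands it afterwards.
        compressed_rate = min(ssd_gbps, pcie_gbps)
        return min(compressed_rate * ratio, decompress_out_gbps)
    # CPU expands first, so the uncompressed data has to cross PCIe.
    return min(ssd_gbps * ratio, decompress_out_gbps, pcie_gbps)


if __name__ == "__main__":
    # Made-up illustrative figures, all in GB/s.
    gpu = effective_load_rate(7.0, 2.0, 20.0, 12.0, gpu_decompress=True)
    cpu = effective_load_rate(7.0, 2.0, 6.0, 12.0, gpu_decompress=False)
    print(f"GPU decompress: {gpu} GB/s, CPU decompress: {cpu} GB/s")
```

With those invented inputs the GPU path is limited by the drive times the ratio, while the CPU path is limited by the decompressor itself.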

Screenshot-2024-12-04-163614.png



Of course, now that this test is out, I think the real benchmarking 3DMark should be doing with regard to DirectStorage is a test showing exactly how a highly demanding game asset streaming scenario would impact game performance on the GPU. That is the real issue. Bulk loading assets super quickly on a loading screen can of course scream, as it uses the CPU or GPU full out, but during streaming scenarios the important bit is how it affects game performance.
 
The 3DMark DirectStorage feature test helps gamers understand the potential performance benefits that Microsoft’s DirectStorage technology could have for their PC’s gaming performance.

But this test does nothing of the sort? Benchmarkers might care about throughput in GB/s but that is meaningless for actual gamers and conveys nothing.

You'd need a test that instead measures something like actual load times, time until an asset loads in, or maybe how many assets fully load in, or the maximum speed at which one can traverse an environment without outrunning the streaming.
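To show what I mean, here's how a GB/s figure could be turned into numbers a player would recognise. This is a back-of-the-envelope sketch with made-up inputs. The "data per metre" idea is my own simplification of streaming load and not anything a real engine or 3DMark reports.

```python
def load_time_s(asset_gb, rate_gbps):
    """Seconds to load asset_gb gigabytes at a sustained rate_gbps."""
    return asset_gb / rate_gbps


def max_traversal_speed_mps(rate_gbps, gb_per_metre):
    """Fastest movement speed (m/s) the streaming rate can keep up with,
    assuming each metre travelled needs gb_per_metre of new data."""
    return rate_gbps / gb_per_metre


if __name__ == "__main__":
    # Hypothetical level: 8 GB to load, 0.25 GB of new assets per metre.
    for rate in (2.0, 8.0):
        print(f"{rate} GB/s -> {load_time_s(8.0, rate)} s load, "
              f"{max_traversal_speed_mps(rate, 0.25)} m/s max traversal")
```

Same throughput numbers, but expressed as "seconds on the loading screen" and "how fast you can fly before pop-in", which is what a gamer would actually notice.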

Yes, I have this general issue with people conflating "benchmarkers" and "gamers."
 
So only 11 GB/s and 13 GB/s from system RAM to GPU memory!!
That's a lot worse than my rough guess would have had it.
But most of my personal experience is with only half of that job, i.e. PCIe device to RAM or the other way.
Guess that's why GPUDirect / RDMA is a thing.
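For reference, this is the theoretical ceiling I'm comparing against. The per-lane rates and 128b/130b encoding are the standard PCIe 3.0/4.0/5.0 figures. Which generation and link width the tested GPU was actually running at isn't stated, so the x16 comparison is an assumption.

```python
# Raw per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_LANE = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by gen 3 and later


def pcie_gbps(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_LANE[gen] * ENCODING * lanes / 8


if __name__ == "__main__":
    for gen in (3, 4, 5):
        peak = pcie_gbps(gen, 16)
        for measured in (11.0, 13.0):
            print(f"PCIe {gen}.0 x16: peak {peak:.2f} GB/s, "
                  f"{measured} GB/s is {measured / peak:.0%} of that")
```

So 13 GB/s would be a decent chunk of a gen 3 x16 link (about 15.75 GB/s peak) but well under half of a gen 4 x16 link (about 31.5 GB/s peak).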

It would be interesting to know the top read speed of the SSD in each case too.
 
It would be interesting to know the top read speed of the SSD in each case too.
Samsung 980 Pro
- Sequential read: 7,000 MB/s
- Sequential write: 5,100 MB/s
- Random read (4KB): 1,000,000 IOPS
- Random write (4KB): 1,000,000 IOPS

WD Black SN850X (1TB)
- Sequential read: 7,300 MB/s
- Sequential write: 6,300 MB/s
- Random read (4KB): 800,000 IOPS
- Random write (4KB): 1,100,000 IOPS

PS: Little rant about NVMe SSDs
It's getting a bit silly now
1733386876115.png
 
Seems like an interesting enough benchmark even if the results leave some room for interpretation.

980 Pro (CPU attached)
8v97yp2p.png


960 Pro (chipset attached)
obfz42jo.png


SATA TLC SSD (some Samsung enterprise one)
amddlois.png
 
Does DirectStorage support SATA SSDs now?
It did not the first time I checked ("AHCI.sys not supported", I think the message was).
 