Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

My theory is RTX I/O is GPUDirect Storage from 2019 that Nvidia have simply modified to work with Direct Storage.

And these slides are likely a half-arsed and lazy attempt by Nvidia to just re-use the old marketing material they made for GPUDirect Storage.

Again, back up your bold claims with evidence. Just calling it a half-arsed and lazy attempt isn't much of a discussion.

I think at this juncture it's better to take this at face value and then consider how it's achieved rather than dismiss it all as smoke and mirrors. Either that or don't involve oneself in the discussion until further info arises. Even if RTX IO turns out to be smoke and mirrors, the technical discussion favours the concept of working around Windows' present IO limitations. "Are nVidia doing it this way?" is a better question to ask, only to find out "they weren't doing that at all," than to ask, "is this all bullshit?" only to find out it wasn't. ;-)

@davis.anthony Let's take it the way Shifty implies: maybe they are lying, maybe they are not. The solution is supposed to be 'comparable to the PS5 IO'; I'll take that at face value, and if it isn't true, then they were lying.
 
We've seen one game with limited testing so it's very unclear exactly how much GPU and CPU performance is actually needed until the game is out and tested.

For GPU performance we've already been given information by Nvidia in that the performance hit from the GPU based decompression is barely measurable. So no issues there.

For CPU I agree, we need to wait for benchmarks, but common sense should tell you that when you're removing over 80% of the decompression workload and significantly reducing the I/O management workload, the CPU is going to be significantly freed up.

So as per above, why have Microsoft stated they are considering a hardware based solution?

Why did Nvidia release dedicated PhysX processors? Why did we still have standalone sound cards long after they were useful? Who knows what the business drivers are or even if it will ever actually materialise. My point isn't that there is zero benefit from a fully hardware based solution, but rather that my expectation is that the real world benefits will not be worth the increased cost and complexity.

Fixed function is far from being dead and if Microsoft are considering going that route they obviously feel it's worthwhile.

I didn't say it was dead, I said it was contrary to the general direction of the industry.

The capability already exists on certain GPUs.

RTX I/O requires an RTX GPU; plenty of people are yet to purchase one, so they do not already have a capable GPU.

And you're locked out of RT until you buy an RT GPU, you're locked out of mesh shaders until you buy a mesh shader capable GPU.....

You're always locked out of something on PC until you upgrade.

This isn't remotely comparable. Any Shader Model 6 GPU will run the GPU decompression of Direct Storage. That's pretty much every modern GPU in every PC on the market today.

The hardware based CPU solution would be starting from zero, expecting people to buy a new and more expensive (than if it didn't have the hardware unit) CPU to carry out a function their existing GPU already handles perfectly fine. I expect little appetite for that from the market.

There's no confusion, they were pretty clear there are other CPU related bottlenecks that need to be addressed.

PS5 is possibly not the best machine to use to prove your point as it seems that the I/O complex deals with everything the CPU would normally do when used correctly.

No, it doesn't. The PS5 CPU still has to handle all the other activities that need to be done when loading a game. It is these "CPU bottlenecks" along with the decompression remaining on the CPU that the GDC talk is most likely referring to.

Based on what I've found that PCIEX block on the diagram is actually a completely separate PCIEX switch which has nothing to do with the CPU.

I don't think that needs to be assumed. While it likely was derived from a diagram where the block represented a separate switch, as others have pointed out, it could easily be used here to represent the data flowing through the PCIe root complex in the CPU but without any CPU intervention.

Having a PCIEX switch like in the diagram would actually be a very good solution.

You could have, say, a 10-lane PCIe switch with a 4+6 set-up.

So you would connect your NVMe drive to this switch, and it would send 4 lanes' worth of bandwidth to the CPU/RAM and 6 lanes' worth to the GPU..... or the switch could be configured to send varying levels of bandwidth to where it's needed; for example, if you're not gaming it could give the GPU only 1 lane and the OS the remaining 9.
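
Purely to put rough numbers on that hypothetical switch, here is a trivial sketch assuming roughly 2 GB/s of usable bandwidth per PCIe 4.0 lane (an approximation) and the 10-lane split described above; none of this reflects a real product.

```cpp
#include <cstdio>

// Toy arithmetic for the hypothetical switch described above (not a real product).
// Assumes roughly 2 GB/s of usable bandwidth per PCIe 4.0 lane.
int main() {
    const double gbPerLane = 2.0;   // approximate usable PCIe 4.0 per-lane bandwidth
    const int totalLanes   = 10;    // the 10-lane switch with a 4 + 6 split

    int cpuLanes = 4, gpuLanes = totalLanes - cpuLanes;
    std::printf("Gaming:  CPU/RAM %d lanes (~%.0f GB/s), GPU %d lanes (~%.0f GB/s)\n",
                cpuLanes, cpuLanes * gbPerLane, gpuLanes, gpuLanes * gbPerLane);

    gpuLanes = 1; cpuLanes = totalLanes - gpuLanes;   // desktop use: give the OS almost everything
    std::printf("Desktop: OS %d lanes (~%.0f GB/s), GPU %d lane (~%.0f GB/s)\n",
                cpuLanes, cpuLanes * gbPerLane, gpuLanes, gpuLanes * gbPerLane);
    return 0;
}
```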

It's not necessarily needed. If RTX IO were somehow enabling P2P DMA through the root complex, the NVMe drive could use its full bandwidth to transfer data directly to the GPU, to main memory, or to both in any combination required.
 
Maybe Nvidia is privy to some knowledge that we aren't?

Anyway... no sense in fighting over a diagram which in my mind was pretty clearly just a repurposing of their GPUDirect diagrams to illustrate the basic idea of how data could be routed in the future using RTX I/O. RTX I/O could allow for anything... however, in its current form it's entirely dependent on DirectStorage, and we know precisely how DirectStorage is going to work in the short term... so as it stands I say just take it as a general diagram of the optimal flow of data that RTX I/O could allow.
 
Except those who store their Steam library on a NAS.

I'm waiting until I can move to 10 or 25 Gbit networking (with an NVMe-based NAS) in my home before doing that for most of my games. That said, I do have smaller, less load-intensive games on a NAS.

Although I guess you might just be storing them there, while I'm actually playing those less load-intensive games directly off the NAS, with the intention of eventually being able to play most of my games off the NAS, with it basically operating as my game drive.

Regards,
SB
 
As someone who does high performance storage as a component of their job, I still see an outstanding problem that's hinted at in this picture here:
[Image: geforce-rtx-30-series-rtx-io-announcing-rtx-io.jpg]


Several of you in this thread have been picking on the storage being on the "other side" of the NIC in this diagram. By keeping the storage on the "other side" of a network link, we avoid all conversations about how a filesystem abstraction has to be handled. This diagram really, truly has no bearing on how a consumer PC is constructed and the complexity therein.

Think about this: every file in a modern consumer-facing file system is a collection of hundreds, thousands, even millions of individual data blocks all mapped together by some sort of file system bitmap or journal index or similar. Basically, there's a master table somewhere in the filesystem's inner workings which translates the operating system's request for a file into the related zillions of pointers linking to the literal blocks inside a logical partition scheme. That single file access may very well live on multiple partitions simultaneously (spanned volumes in Windows are a thing that does occur) and those partitions may span multiple underlying storage devices. These partitions then map downwards again into the physical storage layer, which sometimes has its own set of pointer remappings into even lower-level storage devices (e.g. a RAID card).
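
To make that "master table of pointers" concrete, here is a rough, untested Windows sketch that asks NTFS for one file's extent map via FSCTL_GET_RETRIEVAL_POINTERS. The path is a placeholder, and a heavily fragmented file can have far more extents than this small buffer holds.

```cpp
#include <windows.h>
#include <winioctl.h>
#include <cstdio>

// Minimal sketch: ask NTFS for the cluster-level extent map of one file.
// This is exactly the translation a raw disk-to-GPU path would have to resolve itself.
int main() {
    HANDLE file = CreateFileW(L"C:\\game\\asset.pak", GENERIC_READ,   // placeholder path
                              FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    STARTING_VCN_INPUT_BUFFER in{};      // start from the first virtual cluster of the file
    BYTE out[4096]{};                    // room for a handful of extents only
    DWORD bytes = 0;
    if (DeviceIoControl(file, FSCTL_GET_RETRIEVAL_POINTERS,
                        &in, sizeof(in), out, sizeof(out), &bytes, nullptr)) {
        auto* rp = reinterpret_cast<RETRIEVAL_POINTERS_BUFFER*>(out);
        LONGLONG vcn = rp->StartingVcn.QuadPart;
        for (DWORD i = 0; i < rp->ExtentCount; ++i) {
            // Each extent maps a run of virtual clusters (VCN) to logical clusters (LCN) on disk.
            std::printf("extent %lu: VCN %lld -> LCN %lld\n",
                        i, vcn, rp->Extents[i].Lcn.QuadPart);
            vcn = rp->Extents[i].NextVcn.QuadPart;   // the next extent starts here
        }
    }
    CloseHandle(file);
    return 0;
}
```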

We also must consider the access and auditing controls built into modern filesystems. Are you permitted to read this file? If you do read this file, doesn't the last-access time need to be updated in the file system metadata? Does the file access itself need to be logged for security reasons? Someone brought up disk encryption, and TPM was hand-waved in as the solution; however, TPM solutions presume whole-disk encryption. As it turns out, file-system-level encryption is very much a thing, and isn't linked to TPM-based whole-disk encryption.

All this to say: a native PCI-E transaction from disk to memory only works if you can fully map every single one of those pointers from the parent file descriptor at the file system level into the literal discrete blocks of physical storage attached to the PCIE bus, only after determining you're allowed to make that read, possibly in parallel with still having to update the file system metadata and logging needs, and assuming a file system encryption scheme (read also: Microsoft EFS) isn't being used.

Flowing storage through a NIC as a data stream (presumably NVMEoF) completely removes all of this complexity.

Transferring raw block storage into memory pages is actually quite simple. Making a GPU call translate into a full-stack file system read access is not the same at all.
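
To illustrate the "quite simple" half of that, here is a hedged sketch of an unbuffered Windows read that pulls sector-aligned blocks straight into memory pages; everything described above (extent resolution, permissions, encryption) is precisely what this conveniently skips. The path and sizes are placeholders.

```cpp
#include <windows.h>
#include <cstdio>

// Sketch of the "easy" part: pull sector-aligned blocks straight into memory.
// FILE_FLAG_NO_BUFFERING bypasses the OS page cache, so the buffer, offset and
// length must all be multiples of the sector size (4 KiB assumed here).
int main() {
    const DWORD kChunk = 1 << 20;   // 1 MiB, a multiple of the assumed 4 KiB sector size
    HANDLE file = CreateFileW(L"C:\\game\\asset.pak", GENERIC_READ, FILE_SHARE_READ,  // placeholder
                              nullptr, OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    // VirtualAlloc returns page-aligned memory, which satisfies the alignment requirement.
    void* buffer = VirtualAlloc(nullptr, kChunk, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    DWORD read = 0;
    if (ReadFile(file, buffer, kChunk, &read, nullptr))
        std::printf("read %lu bytes of raw blocks\n", read);

    VirtualFree(buffer, 0, MEM_RELEASE);
    CloseHandle(file);
    return 0;
}
```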
 
I imagine that reads/writes through the file system are handled by DirectStorage and may not be a responsibility of RTX IO. However, GPUDirect makes access through the file system possible by enabling a distributed file system that runs in parallel to the OS-managed one. GPUDirect has the CPU write commands to the DMA engines on the storage device to drive data to and from the GPU. Nvidia states this minimizes interference with other commands that the CPU sends to the GPU.

https://on-demand.gputechconf.com/s...-to-gpu-memory-alleviating-io-bottlenecks.pdf

However, unless local game apps pulling rendering data over the network or cloud actually becomes a thing, this isn't all that relevant for consoles. But I can see a similar solution where apps that don't need DS use the traditional file system, and games use a DS path to create a low-latency and more direct route from the SSD to the GPU.
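
For reference, the GPUDirect Storage path those slides describe is exposed on Linux through Nvidia's cuFile API. A rough, untested sketch (error handling omitted; the file path and sizes are placeholders) of a disk-to-GPU read looks something like this: the CPU only sets up the request, and the drive DMAs the data into the pinned GPU buffer.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>   // GPUDirect Storage (Linux-only) cuFile API

// Rough sketch of a GPUDirect Storage read: the NVMe DMA engine writes
// directly into GPU memory, with the CPU only issuing the request.
int main() {
    cuFileDriverOpen();                                    // bring up the GDS driver

    int fd = open("/data/asset.pak", O_RDONLY | O_DIRECT); // placeholder path
    CUfileDescr_t descr{};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);

    const size_t size = 64 << 20;                          // 64 MiB, arbitrary
    void* devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);                    // pin the GPU buffer for DMA

    // Read straight from the file offset into GPU memory, no CPU bounce buffer.
    cuFileRead(handle, devPtr, size, /*file_offset=*/0, /*dev_offset=*/0);

    cuFileBufDeregister(devPtr);
    cuFileHandleDeregister(handle);
    cudaFree(devPtr);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```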
 
I imagine that reads/writes through the file system are handled by DirectStorage and may not be a responsibility of RTX IO.
I agree. Allowing any part of the graphics subsystem (GPU hardware, on-board controller, driver) to work around the filesystem's security/permissions model is unthinkable.
 
All this to say: a native PCI-E transaction from disk to memory only works if you can fully map every single one of those pointers from the parent file descriptor at the file system level into the literal discrete blocks of physical storage attached to the PCIE bus, only after determining you're allowed to make that read, possibly in parallel with still having to update the file system metadata and logging needs, and assuming a file system encryption scheme (read also: Microsoft EFS) isn't being used.

Flowing storage through a NIC as a data stream (presumably NVMEoF) completely removes all of this complexity.

Transferring raw block storage into memory pages is actually quite simple. Making a GPU call translate into a full-stack file system read access is not the same at all.

Yes I'm glad you've raised this. I've been thinking about the issue of peer-to-peer NVMe -> GPU transfers for a work project. My current conclusion is that the simplest solution is to treat the NVMe drives as raw block devices and write my own very simple file system to keep track of what is where. Having a GPU understand XFS, ext4 or ZFS seems like a bit of a stretch.

I guess this sort of solution is also possible in a console, but on a general purpose computer it sounds ... complicated (and a potential security nightmare).
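
For what it's worth, that "very simple file system" really can be little more than an extent table over raw LBAs. A toy sketch follows; all names are hypothetical, and persistence, allocation and crash safety are deliberately ignored.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Minimal sketch of the "very simple file system" idea: treat the NVMe as a raw
// block device and keep our own table of which LBA ranges belong to which asset.
// All names here are hypothetical; a real design also needs on-disk persistence,
// allocation, and crash safety.
struct Extent {
    uint64_t firstLba;     // first 4 KiB logical block of the extent
    uint64_t blockCount;   // length of the extent in blocks
};

struct ExtentTable {
    std::unordered_map<std::string, std::vector<Extent>> files;

    void add(const std::string& name, uint64_t lba, uint64_t blocks) {
        files[name].push_back({lba, blocks});
    }

    // Translate (file, block offset) into the LBA a P2P DMA request would target.
    uint64_t lbaFor(const std::string& name, uint64_t blockOffset) const {
        for (const Extent& e : files.at(name)) {
            if (blockOffset < e.blockCount) return e.firstLba + blockOffset;
            blockOffset -= e.blockCount;
        }
        return UINT64_MAX;  // past end of file
    }
};
```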
 
However, GPUDirect makes access through the file system possible by enabling a distributed file system that runs in parallel to the OS managed system.
What this means, said another way, is GPUDirect would require its own reserved storage space and methods of access, unrelated to the "user" file system. For a machine that already exists and didn't already have this reservation built in, this means a lot of funky partition management business will need to happen to shrink the existing partition(s) to then create a new and physically contiguous partition for use by this GPUDirect functionality.

I really can't see how Microsoft would push such a design into the commodity PC world. There's something else in this mix we aren't seeing yet...
 
What this means, said another way, is GPUDirect would require its own reserved storage space and methods of access, unrelated to the "user" file system.
If there is a Windows/NTFS mechanism for this, I've never used or heard of it. That may be why both Nvidia and AMD have experimented with attaching an SSD directly to the GPU to augment onboard memory.
 
It wouldn't be NTFS, that's at least part of the point. "Regular" file systems don't worry about file packing methods on the physical media; whatever this new tech is probably needs to consider it differently.

Here's another characteristic of storage worth noting: modern physical storage devices are built around 4K blocks and have been for a decade or more. One of the reasons for this move was how difficult logical block addressing was getting; the old 512-byte block size meant a single disk larger than 2TB needed internal LBA pointers bigger than 32 bits. Anyone who followed the tech at the time understood this obvious call-out.

But why did 4KB make sense for a modern storage block size?

For multiple decades, operating systems have managed main memory in -- can you guess? -- 4KB chunks. Yet all modern operating systems offer another option called Huge Pages (Linux), Super Pages (BSD and macOS) or Large Pages (Windows) to permit managing memory in far larger chunks: 2MB pages are the limit on BSD and Windows, while a whopping 1GB page size is available in modern Linux distros. Aligning I/O request sizes to memory page sizes is a significant efficiency play for "big I/O" workloads, and yet another example of how something like DirectIO / DirectStorage / GPUDirect / RTX IO needs even more non-trivial thought to accomplish the task set ahead of it.
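
As a concrete example of the page-size point, here is a hedged Windows sketch of allocating Large Pages. It needs the SeLockMemoryPrivilege enabled for the process token, which is not shown, and the allocation size must be a multiple of the large-page size.

```cpp
#include <windows.h>
#include <cstdio>

// Sketch: allocate memory backed by Large Pages (typically 2 MiB on x86-64)
// instead of the default 4 KiB pages. Requires SeLockMemoryPrivilege to be
// enabled for the process token (not shown) or the call fails.
int main() {
    SIZE_T large = GetLargePageMinimum();          // 0 if large pages are unsupported
    if (large == 0) return 1;
    std::printf("large page size: %zu bytes\n", large);

    SIZE_T bytes = 64 * large;                     // request must be a multiple of the page size
    void* p = VirtualAlloc(nullptr, bytes,
                           MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                           PAGE_READWRITE);
    if (!p) { std::printf("VirtualAlloc failed: %lu\n", GetLastError()); return 1; }

    // I/O into this region now touches far fewer page-table entries per request.
    VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```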
 
It wouldn't be NTFS, that's at least part of the point. "Regular" file systems don't worry about file packing methods on the physical media; whatever this new tech is probably needs to consider it differently.
OK, so not NTFS, but my question remains. On mountable storage, Windows has ultimate authority over which processes access anything within the storage hierarchy. I am not aware of any Windows filesystem feature that allows Windows to release a portion of storage, bypassing the rest of the security/permissions model.

Windows is going to need to have access in case the storage area needs maintenance, plus games need to be installed in the first place. You may wish to adjust the size parameters of the partition, or back it up.
 
Ok, so not NTFS but my question remains. On mountable storage, Windows has ultimate authority about what processes access anything within the storage hierarchy. I am not aware of any Windows filesystem feature the allows Windows to release a portion of storage - bypassing the rest of the security/permissions model.
Yup. Which is why I said earlier:
There's something else in this mix we aren't seeing yet...
 
The loading results for Forspoken disagree with that statement.
The results showed 0.1s to 0.4s difference between DirectStorage and Win32.
They optimized the path but it is still being limited by CPU Decompression speeds... that is why you don't see impressive results.

I could be wrong, but in Spider-Man isn't decompression the main load on the CPU?
 
DirectStorage right now is not that useful... they had to implement GPU decompression; that is the main takeaway of the API.
You have one point of reference thus far... way too early to make a statement like that.

Not only that, but we've only really seen cherry-picked loading screen comparisons of Forspoken... we don't really know what CPU utilization is like during gameplay and streaming. Perhaps CPU utilization is lower, perhaps frametimes are more consistent, perhaps FPS are slightly higher?

We don't know enough yet.
 
DirectStorage right now is not that useful... they had to implement GPU decompression; that is the main takeaway of the API.

DirectStorage is still in its infancy and there are a lot of moving parts. Nvidia's RTX IO and AMD's Smart Access Storage are supposed to plug into DirectStorage and provide the GPU decompression step. PCIe Resizable BAR (which gives the CPU access to the GPU's full frame buffer) is another feature that DS will take advantage of, but only a few devices support it. Only Zen 3, 10th-gen and newer Intel CPUs, a handful of motherboards and 6000-series AMD / 3000-series Nvidia GPUs have explicit support for Resizable BAR. Phison just announced it's releasing new firmware that supports DirectStorage to improve performance on its SSDs.
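
For anyone curious what that plugging-in looks like from the application side, here is a rough sketch of a DirectStorage 1.1 request asking for GDeflate GPU decompression, based on the public dstorage.h as I understand it. Device and buffer creation, error handling and completion fences are omitted, and the file name and sizes are placeholders; treat it as a sketch, not a reference implementation.

```cpp
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Sketch: enqueue one compressed read that DirectStorage decompresses on the GPU.
// Assumes `device` (ID3D12Device*) and `gpuBuffer` (ID3D12Resource*) already exist,
// and that the asset was packed with GDeflate; sizes are placeholders.
void LoadAssetWithDirectStorage(ID3D12Device* device, ID3D12Resource* gpuBuffer,
                                UINT32 compressedSize, UINT32 uncompressedSize) {
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"asset.gdeflate", IID_PPV_ARGS(&file));      // placeholder file name

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Device     = device;
    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // GPU decompression
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.Destination.Buffer.Resource = gpuBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;
    request.UncompressedSize            = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->Submit();          // completion would normally be tracked with an ID3D12Fence
}
```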
 
Adoption isn't going to be speedy then. I wonder how that'll impact game design the rest of the generation when games are cross-plat, even Sony ones? Maybe the full power of fast SSD storage won't happen until next gen when it's a standard feature in devices people own?

Edit: Let's make this a bit clearer. What proportion of existing PC hardware will be able to use Direct Storage via just software updates?
 
Adoption isn't going to be speedy then. I wonder how that'll impact game design the rest of the generation when games are cross-plat, even Sony ones? Maybe the full power of fast SSD storage won't happen until next gen when it's a standard feature in devices people own?

Edit: Let's make this a bit clearer. What proportion of existing PC hardware will be able to use Direct Storage via just software updates?

Easiest option on PC is just to bump up RAM requirements, 32GB instead of 16, and just try to cache as much in RAM as possible.

In theory Ratchet and Clank could work on PC if you have enough RAM to preload as many portal transitions as possible.

But that could also be a Direct Storage killer; some developers could just say 'increase the RAM requirements' rather than implementing Direct Storage, which, given the number of gamers who don't have a Direct Storage-compliant system (I don't), might be the direction they go.
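
The "just cache it in RAM" approach is, at its simplest, nothing more than preloading asset files into memory up front so later loads are memory copies rather than disk reads. A toy sketch, with hypothetical paths; a real streamer would budget and evict:

```cpp
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

// Toy sketch of the "just buy more RAM" approach: read whole asset files into
// memory up front so later "loads" are plain memory copies instead of disk I/O.
// Paths are placeholders; a real streamer would budget, prioritise and evict.
std::unordered_map<std::string, std::vector<char>> g_ramCache;

void Preload(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return;
    std::vector<char> data(static_cast<size_t>(in.tellg()));
    in.seekg(0);
    in.read(data.data(), static_cast<std::streamsize>(data.size()));
    g_ramCache[path] = std::move(data);
}

int main() {
    // Front-load the assets for every portal transition we might hit next.
    Preload("assets/portal_area_01.pak");
    Preload("assets/portal_area_02.pak");
    return 0;
}
```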
 