Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

If it breaks disk performance counters, then perhaps there's no weird read behavior in Forspoken to begin with?
That's what I was hoping for. My earlier testing (without BypassIO) had the game amassing 100GB+ of reads in the span of a few minutes even while standing still and I thought BypassIO might change that. But it doesn't seem to change anything judging by the drive's SMART info.
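For anyone repeating this kind of measurement from SMART data: NVMe drives report host reads in the SMART/Health log as "Data Units Read", where one data unit is 1,000 512-byte sectors (512,000 bytes). Below is a minimal sketch that turns two readings taken some seconds apart into a total and an average rate. The function name is made up, and the example readings are hypothetical, not numbers from my testing.

```python
# NVMe SMART/Health log: one "data unit" = 1000 * 512 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512


def reads_between(units_before: int, units_after: int, seconds: float) -> tuple[float, float]:
    """Return (gigabytes read, average GB/s) between two SMART readings.

    Uses decimal GB (1e9 bytes), the unit drive vendors use.
    """
    if units_after < units_before:
        raise ValueError("counter went backwards; readings swapped?")
    if seconds <= 0:
        raise ValueError("interval must be positive")
    total_bytes = (units_after - units_before) * BYTES_PER_DATA_UNIT
    gb = total_bytes / 1e9
    return gb, gb / seconds


if __name__ == "__main__":
    # Hypothetical readings taken three minutes apart.
    gb, rate = reads_between(1_000_000, 1_200_000, 180.0)
    print(f"{gb:.1f} GB read, {rate:.2f} GB/s average")
```

So 100GB+ over a few minutes works out to a sustained rate in the region of half a gigabyte per second while standing still.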
 
Compusemble put out a video testing DirectStorage 1.1 vs 1.2, and to say there's a big jump in bandwidth is an understatement lol... Those Gen5 drives can now really show some muscle.


And that's using "only" a 3080 Ti. So a 4090, for example, should go even faster.

Weird how the compression ratio is much better on the Gen 4 drive than the Gen 3, though (I'm ignoring the Gen 5, as that could be GPU-limited and we don't know its official speed rating). Gen 3 is about 2.4:1 while Gen 4 is about 3.1:1. I can only assume the real-world throughput of the Gen 3 is much lower than its rated potential.
 
I might be being a bit slow, but isn't the compression ratio the same for all of them?
 
Sorry yes, just poor phrasing on my part. I should have said that's what the compression ratio works out to if we assume each drive is operating at its maximum rated speed. So either the Gen 3 isn't, or some other issue with the API limits performance on those drives.

In fact, I do seem to recall reading some time ago that DS is optimised for, or works best on, Gen 4. At the time I thought that was just a raw speed thing, but maybe there's more to it?
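The arithmetic behind those figures, as a small sketch: if a drive is assumed to deliver its full rated read speed, the compression ratio a benchmark implies is the measured (decompressed) throughput divided by that rated speed. Since the assets are the same on every drive, the real ratio should be the same too, so a lower implied ratio suggests the drive is running below its rating. Function names are hypothetical and the numbers are placeholders, not the figures from the video.

```python
def implied_ratio(effective_gbps: float, rated_gbps: float) -> float:
    """Compression ratio implied if the drive really delivered its rated read speed."""
    if rated_gbps <= 0:
        raise ValueError("rated speed must be positive")
    return effective_gbps / rated_gbps


def implied_drive_speed(effective_gbps: float, true_ratio: float) -> float:
    """Raw read speed implied if the asset's true compression ratio is known."""
    if true_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return effective_gbps / true_ratio


if __name__ == "__main__":
    # Placeholder numbers, NOT the figures from the video.
    effective, rated = 8.4, 3.5
    print(f"implied ratio: {implied_ratio(effective, rated):.1f}:1")
    # If the real ratio were 3.1:1, the drive would only be delivering this much:
    print(f"implied raw speed: {implied_drive_speed(effective, 3.1):.2f} GB/s")
```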
 
Can this actually be useful on RTX cards with tensor cores? Just wondering...
It says this:

"this algorithm utilizes the matrix multiplication methods, which are now accelerated by modern GPUs. According to the paper, this makes the NTC algorithm more practical and more capable due to lower disk and memory constraints."

So the tensor cores should be able to be used for it, but it should work on GPUs without tensor cores as well, right?

Will this help with VRAM use? Like, are the textures stored compressed in VRAM until they're used? This quote kind of makes it sound like it, but it's still vague.

"At the same time, our method allows for on-demand, real-time decompression with random access similar to block texture compression on GPUs. This extends our compression benefits all the way from disk storage to memory."
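To make the random-access point concrete, here is a minimal, hypothetical sketch in which a texture is stored only as a small grid of feature vectors plus the weights of a tiny network, and any single texel is decoded on demand by sampling the grid and running two small matrix-vector products. It is loosely in the spirit of the paper, not its actual architecture: `ToyNeuralTexture` and all its parameters are made up, and the weights are random rather than trained, so the colours are meaningless. It does show why the quoted claims fit together: only the compact representation has to sit in memory (decoded per fetch, not unpacked up front), and the per-texel work is ordinary matrix math that tensor cores accelerate but that plain shader cores (or, here, a CPU) can also run, just more slowly.

```python
import random


def matvec(weights, bias, x):
    """y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def bilinear_features(grid, u, v):
    """Bilinearly sample a feature vector from a low-res grid at normalized (u, v) in [0, 1]."""
    h, w = len(grid), len(grid[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for k in range(len(grid[0][0])):
        top = grid[y0][x0][k] * (1 - fx) + grid[y0][x1][k] * fx
        bot = grid[y1][x0][k] * (1 - fx) + grid[y1][x1][k] * fx
        out.append(top * (1 - fy) + bot * fy)
    return out


class ToyNeuralTexture:
    """Hypothetical: a feature grid plus a 2-layer MLP. Untrained, random weights."""

    def __init__(self, grid_h=8, grid_w=8, channels=4, hidden=16, seed=0):
        rng = random.Random(seed)
        self.grid = [[[rng.uniform(-1, 1) for _ in range(channels)]
                      for _ in range(grid_w)] for _ in range(grid_h)]
        self.w1 = [[rng.uniform(-1, 1) for _ in range(channels)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(3)]
        self.b2 = [0.0] * 3

    def decode(self, u, v):
        """Decode one texel independently of all others (random access)."""
        feat = bilinear_features(self.grid, u, v)
        hidden = relu(matvec(self.w1, self.b1, feat))
        return matvec(self.w2, self.b2, hidden)  # "RGB"


if __name__ == "__main__":
    tex = ToyNeuralTexture()
    print(tex.decode(0.3, 0.7))
```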
 
Random-Access Neural Compression of Material Textures, Vaidyanathan et al. 2023

At this stage, on-the-fly decompression seems heavier than traditional filtering, even on an RTX 4090, but it sounds promising.


one of the co-authors

Promising and some very nice results. Probably not anything that will make it into a shipping game until all or at least the vast majority of video cards on the market can support it at good enough speeds.

It'll be interesting to see where they can make improvements to the decompression speed.

Regards,
SB
 
This could be the beginning of the end of dedicated texture units, though I'm not sure how filtering would be handled, or how efficiently something like anisotropic filtering could be done without dedicated texture units.
Or this could be incorporated into the texture units, but I guess that would need some kind of standardization before wide usage.
 
I'm getting a little pissed off with Microsoft over this now. Where are the damn DirectStorage games, Microsoft?
Being so late on PC has likely held back this paradigm shift heavily. For all the years of PC gamers complaining about consoles holding things back, I think we're experiencing very much the opposite situation here again. Windows on PC is probably one of the most complex constructions ever made by mankind, and while I get how difficult this must make introducing DirectStorage, it's still ultimately on Microsoft's shoulders to facilitate this and make sure it doesn't hinder development.

But it has, and it's undoubtedly causing problems that might otherwise be alleviated. And I think it's about time developers really made SSDs a mandatory requirement. Not just some bullshit 'minimum stated' requirement, but a straight-up, "This game literally won't run if you try to run it on an HDD" level requirement. But I think many devs/pubs are likely afraid of what that might do to sales. They shouldn't be, though. They have to be the ones who lead the way here, or the shift will never happen.
 
There's been no paradigm shift because there has been little incentive for developers to implement any of this stuff thus far. It's not Microsoft's fault. Developers haven't been altering their engines to render more efficiently with DX12... nor have they been altering their engines to load more efficiently either.

You've actually just outlined why consoles have indeed been holding things back... This is the first generation in which they've had SSDs that games could be designed around from the start. Almost NO games thus far have done so, because they're also cross-gen games. The only studios we know of that have really altered their engines for next-gen loading paradigms are Insomniac, and Square Enix with their proprietary Luminous engine. And even then, we can easily see that most of these lightning-fast speeds come from designing the game to load fast and simply having an SSD of any type.

From what we know, significant engine work is needed to remove many other bottlenecks that currently prevent games from utilizing and relying on SSD speeds. Insomniac have been working on that, and others are surely coming, but it was never going to happen en masse until these old cross-gen games go away for good.

I blame MS for their OWN developers not having pushed the tech so far... but not the wider industry. That will come when the consoles are finally ready to leave last gen fully behind. Hopefully from this year onward.
 
The technology needed to be ready and finished for devs to take into account with PC.


The same Nvidia guy said the first version of DirectStorage wasn't good and talked about using IoRing to improve it, and it is now used. Things take time.
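For context on why IoRing helps: it follows the same idea as Linux io_uring. The application queues many I/O requests in a submission queue, hands them to the kernel in one call, and then picks up results from a completion queue, which cuts the per-request overhead of the many small reads typical of asset streaming. The toy model below only illustrates that batching pattern. `ToyIoRing` and its methods are hypothetical, it is not the Windows IoRing API, and it performs the reads synchronously, whereas real completions arrive asynchronously and not necessarily in order.

```python
import io
from collections import deque
from dataclasses import dataclass


@dataclass
class ReadRequest:
    user_data: int  # caller's tag, echoed back in the completion
    offset: int
    length: int


@dataclass
class Completion:
    user_data: int
    data: bytes


class ToyIoRing:
    """Conceptual submission/completion ring. NOT the Windows IoRing API."""

    def __init__(self, source):
        self._source = source   # any seekable binary file-like object
        self._sq = deque()      # submission queue
        self._cq = deque()      # completion queue
        self.submit_calls = 0   # stands in for user->kernel transitions

    def queue_read(self, user_data, offset, length):
        # Queuing is cheap bookkeeping; nothing is read yet.
        self._sq.append(ReadRequest(user_data, offset, length))

    def submit(self):
        """Process every queued request in one call; return how many were handled."""
        self.submit_calls += 1
        handled = 0
        while self._sq:
            req = self._sq.popleft()
            self._source.seek(req.offset)
            self._cq.append(Completion(req.user_data, self._source.read(req.length)))
            handled += 1
        return handled

    def pop_completion(self):
        return self._cq.popleft() if self._cq else None


if __name__ == "__main__":
    ring = ToyIoRing(io.BytesIO(b"0123456789abcdef"))
    for tag, (off, n) in enumerate([(0, 4), (10, 6), (4, 2)]):
        ring.queue_read(tag, off, n)
    ring.submit()  # one submission for all three reads
    while (c := ring.pop_completion()) is not None:
        print(c.user_data, c.data)
```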
 
I had read that blog a while back, and by all measures DirectStorage is, and always was going to be, a "poor" implementation compared to what it is, and can be, on console. There's nothing new about that statement, especially for performance- and optimization-oriented programmers like Sherief. We already knew all of that. It's a stopgap measure to make the relatively inefficient storage-to-VRAM path on PC as efficient as it can be until actual hardware-based implementations can be created and adopted by the wider market.

-Data still has to go through system memory and be copied from there, creating multiple copies
-Decompression is still done on the CPU (1.0), taking CPU cycles
or
-Decompression has to be done on the GPU (1.1), taking GPU cycles
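A rough model of those two paths, to show where the copies and the decompression cost land. This is a sketch under loud assumptions: zlib stands in for the real codecs (DirectStorage 1.1's GPU path uses GDeflate), the function names and stage labels are made up, and the model only logs how many bytes each stage touches. It does not model timing or the actual runtime.

```python
import zlib

# Each step appends (stage name, bytes touched) to a log.


def cpu_decompress_path(compressed: bytes):
    """1.0-style model: SSD -> system RAM -> CPU decompress -> upload heap -> VRAM."""
    log = []
    staging = bytearray(compressed)              # SSD read lands in system memory
    log.append(("ssd->sysram", len(staging)))
    decompressed = zlib.decompress(staging)      # CPU cycles spent here
    log.append(("cpu decompress", len(decompressed)))
    upload_heap = bytearray(decompressed)        # copy into a CPU-visible upload buffer
    log.append(("sysram->upload heap", len(upload_heap)))
    vram = bytes(upload_heap)                    # full-size data crosses PCIe
    log.append(("upload heap->vram", len(vram)))
    return vram, log


def gpu_decompress_path(compressed: bytes):
    """1.1-style model: staged in RAM, copied to VRAM compressed, decompressed by the GPU."""
    log = []
    staging = bytearray(compressed)              # still staged in system memory
    log.append(("ssd->sysram", len(staging)))
    vram_compressed = bytearray(staging)         # only compressed bytes cross PCIe
    log.append(("sysram->vram (compressed)", len(vram_compressed)))
    vram = zlib.decompress(vram_compressed)      # stands in for a GPU compute pass
    log.append(("gpu decompress", len(vram)))
    return vram, log


if __name__ == "__main__":
    asset = b"texel" * 2000
    packed = zlib.compress(asset)
    for path in (cpu_decompress_path, gpu_decompress_path):
        _, log = path(packed)
        print(path.__name__, log)
```

Either way the data is staged in system memory first. The 1.1-style path mainly moves the decompression work from CPU to GPU and sends smaller, compressed payloads across the bus.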

These are all "poor" realities for performance-seeking developers, currently. He also talks about how some control is abstracted away from developers with DirectStorage, much like DX11 compared to DX12. These guys want as much control as possible... but that's not always the current goal. I saw that he had mentioned some issues in the Git repo, and that they had already implemented something that would solve his issue. Of course improvements take time... but improvements also don't happen until developers can get their hands on the code.

It's not that it isn't good... it's just that it's not good enough yet ;)

But I always thought this paragraph was interesting:

Let’s say I wanted to make the best out of this situation - I’ll manage my own upload heap, use my own CPU decompression and even write my own compute-based GPU decompression shaders for my own GPU-friendly compression format. In this case I just want DirectStorage to move compressed data from the SSD to either system memory, where I’ll run my decompression code and write the output to an upload heap and be responsible for submitting the copy command lists to do the actual upload, or move compressed data from the SSD to a video memory UAV and then once again I’m responsible for scheduling my own decompression work. The latter case works, and you can wait on a GPU fence for DirectStorage IO to complete (although it must be noted that the direct NVMe to video memory path is not currently implemented AFAIK, you can code today assuming it is and it will automagically work once the implementation is live, so zero issues there).

Here he is essentially saying that, while the direct NVMe-to-VRAM path isn't currently implemented in DirectStorage, you can write code that assumes it, and that code will work when the implementation finally goes live. I thought some people here had said this wasn't going to be possible, but it sounds like he either knows of a way it can work, or knows that it is being worked on.
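The "wait on a fence, then schedule your own decompression" pattern from the quote boils down to a monotonically increasing counter: the I/O side signals a value when a batch is done, and the consumer waits until the completed value reaches the one it needs. In D3D12 that counter is an ID3D12Fence, and a DirectStorage queue can signal one once its enqueued requests complete. The sketch below only models the semantics with a Python thread standing in for the I/O; `ToyFence` and `fake_io_batch` are hypothetical and are not the real API.

```python
import threading
import time


class ToyFence:
    """Monotonic fence modeled loosely on D3D12 fence semantics (not the real API)."""

    def __init__(self):
        self._value = 0
        self._cv = threading.Condition()

    def completed_value(self):
        with self._cv:
            return self._value

    def signal(self, value):
        with self._cv:
            if value > self._value:  # fence values only move forward
                self._value = value
                self._cv.notify_all()

    def wait(self, value, timeout=None):
        """Block until completed_value() >= value; return False on timeout."""
        with self._cv:
            return self._cv.wait_for(lambda: self._value >= value, timeout=timeout)


def fake_io_batch(fence, value, results):
    """Hypothetical stand-in for a batch of storage reads completing asynchronously."""
    time.sleep(0.05)
    results.append(b"compressed bytes")
    fence.signal(value)


if __name__ == "__main__":
    fence, results = ToyFence(), []
    threading.Thread(target=fake_io_batch, args=(fence, 1, results)).start()
    fence.wait(1)  # only now is it safe to schedule our own decompression work
    print(f"IO done, {len(results)} buffer(s) ready for decompression")
```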

Since that's an older blog post, perhaps things have changed since then... or perhaps not? I guess we'll just have to wait and see. But yes, by all measures DirectStorage for PC is a poor implementation compared to the console API. It was never going to be as efficient. But we do know that it will become more efficient in the future through dedicated silicon, and likely direct storage-to-VRAM access.
 
I am still shocked that AMD haven't announced a CPU that contains PS5's I/O complex.

An NVMe drive directly connected to an I/O complex in a Ryzen CPU, with that I/O complex having separate data paths directly to VRAM and RAM.

I honestly thought that was the way we would be going on PC.
 
Modern AMD CPUs can already do all of that on the hardware side. The only major unique aspect of the PS5 IO complex is the hardware decompressor which wouldn't make sense on the PC.
 
Considering how much the CPU gets hammered in TLOU and Spider-Man, I think it would be very beneficial on PC.

Also having it built in to the CPU would mean every piece of software could take advantage of it and not just games.
 