Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

So I was surprised by the results, given that I only have a SATA SSD for gaming right now (Samsung 850 Evo 1TB, and a 3090 in a PCIe Gen3 slot).

bench1.jpg




Then I remembered I have PrimoCache enabled on this drive. So, without it (I can see the drive reading at full speed during loading now):

bench2wopc.jpg



The GPU is working a lot during loading, btw. I wonder if a badly tuned engine could kill the framerate while loading assets because of the spike in GPU usage...

Interesting that you're hitting an almost 4:1 compression ratio there. The other tests on faster drives seem to be more in the 3:1 range, so I wonder if the GPU is actually a bottleneck on those drives. Certainly your drive speed should be much less taxing on the GPU.
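
To put rough numbers on that: the decompressed throughput the GPU has to sustain is just the raw read rate times the compression ratio. A quick sketch, where the drive speeds are illustrative assumptions (roughly SATA-class and PCIe 4.0-class) and not figures taken from the benchmarks above:

```python
def effective_gbps(raw_read_gbps: float, compression_ratio: float) -> float:
    """Decompressed throughput the GPU has to sustain, in GB/s."""
    return raw_read_gbps * compression_ratio


# Assumed raw read speeds, for illustration only (not measured values).
drives = {
    "SATA SSD (assumed 0.55 GB/s)": 0.55,
    "NVMe SSD (assumed 7.0 GB/s)": 7.0,
}

for name, raw in drives.items():
    for ratio in (3.0, 4.0):
        out = effective_gbps(raw, ratio)
        print(f"{name} at {ratio:.0f}:1 -> {out:.1f} GB/s decompressed")
```

Under those assumptions the GPU behind a SATA drive only has to produce a couple of GB/s of decompressed data, while a fast NVMe drive asks for an order of magnitude more, which is where a GPU-side limit would be more likely to show up.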
 
So I did another test (with PrimoCache on, so the bandwidth is not limited by my SATA drive). CPU usage is lower than in the last one. The only difference is that I ran bulkloaddemo.exe directly instead of running it via cmd.

About the GPU usage: I indicated a spike at 80%. Well, now it's around 54% during loading, but it's in fact what Task Manager calls the "Copy" engine that is spiking. The "3D" engine usage is pretty low.




bench2.jpg



EDIT: Using PrimoCache, the data is coming from RAM, not the drive. I don't know whether it interferes with the DirectStorage workflow much or not.
 
About the GPU usage: I indicated a spike at 80%. Well, now it's around 54% during loading, but it's in fact what Task Manager calls the "Copy" engine that is spiking. The "3D" engine usage is pretty low.

Confirms NV's negligible impact statement. Nice testing and benchmarks, it's interesting. Now we'll soon see with Forspoken what that will do.
 
Using PrimoCache, the data is coming from RAM, not the drive. I don't know whether it interferes with the DirectStorage workflow much or not.
That's correct.

GPU decompression just means the CPU sends the still compressed data from RAM to the GPU for decompression rather than sending them uncompressed to the GPU.
 
Oh ok. I was under the impression that it could be storage => GPU VRAM, without passing through RAM first.
 
It basically is.
  • Without caching it's effectively drive -> GPU. Technically it bounces off the CPU, as the PCIe lanes are routed through the CPU, with a tiny bit of CPU overhead, but it shouldn't touch system RAM.
  • With caching, as long as the data is in the RAM cache, it'll be RAM -> GPU instead.
Since PrimoCache inserts itself into the I/O flow, either bypassing or superseding Windows caching (I'm not about to go digging to see exactly how PrimoCache works), the benchmarking application isn't able to disable it on its own as it does with Windows caching.

The user can disable Windows write-behind caching (which doesn't help in this scenario), but users cannot disable the Windows I/O cache. Only the application can do this, through API calls which tell the loader not to cache a particular read. This is rarely used outside of filesystem copying operations, because there is no need to cache files being copied.

If there is some way to do this, I'd be interested to see it documented!
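
For reference, the application-side opt-out looks roughly like this. On Windows the usual mechanism is passing FILE_FLAG_NO_BUFFERING to CreateFileW; whether the benchmark discussed here uses exactly this flag is an assumption on my part. The helper names (`align_up`, `open_uncached`) are made up for this sketch, and the ctypes call only runs on Windows:

```python
import ctypes
import sys

# Win32 constants as documented for CreateFileW.
GENERIC_READ = 0x80000000
FILE_SHARE_READ = 0x00000001
OPEN_EXISTING = 3
FILE_FLAG_NO_BUFFERING = 0x20000000


def align_up(size: int, sector_size: int) -> int:
    """Round size up to a multiple of sector_size.

    Unbuffered reads must use sector-aligned sizes and offsets.
    """
    return -(-size // sector_size) * sector_size


def open_uncached(path: str) -> int:
    """Open path for reading with the system file cache bypassed (Windows only)."""
    if sys.platform != "win32":
        raise OSError("FILE_FLAG_NO_BUFFERING is a Win32-only flag")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = ctypes.c_void_p
    kernel32.CreateFileW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
        ctypes.c_uint32, ctypes.c_uint32, ctypes.c_void_p,
    ]
    handle = kernel32.CreateFileW(
        path, GENERIC_READ, FILE_SHARE_READ, None,
        OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, None,
    )
    # INVALID_HANDLE_VALUE is (HANDLE)-1.
    if handle is None or handle == ctypes.c_void_p(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle  # the caller is responsible for CloseHandle
```

The point is that this is a per-handle decision made by the program opening the file, which is why there is no equivalent switch exposed to the user.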

Yeah, I was aware of the user being able to disable write-behind cache, but while I knew that an application developer can explicitly disable Windows caching (e.g. storage benchmark applications), I wasn't sure whether or not a user could do something like that at the user level. Thus I just kind of left that caveat in there.

I'm pretty sure that some people I knew that complained about Windows "using all of their memory" mentioned installing or changing something so Windows wouldn't do that (basically caching of frequently used or recently used data), so there's likely something that someone could do if they REALLY wanted to. Although I never really understood why people would want to do that and thus slow down how Windows operates. :p So I never looked into it.

Regards,
SB
 
DirectStorage still copies data into a staging buffer in system memory as standard:

1672122904192.png
Only AMD's Smart Access Storage has claimed so far to completely bypass system memory, although we have yet to see it in action.
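
As a purely conceptual sketch of what "goes through a staging buffer" means (this is not the DirectStorage API; the function name, the callback, and the buffer size are all hypothetical): data is read in chunks into one reusable buffer in system memory, and each filled chunk is handed on to an upload step.

```python
import io

STAGING_BUFFER_SIZE = 32 * 1024 * 1024  # hypothetical size for this sketch


def stream_through_staging(source, upload, staging_size=STAGING_BUFFER_SIZE):
    """Read source in chunks via one reusable buffer, passing each chunk to upload.

    Returns the total number of bytes streamed.
    """
    staging = bytearray(staging_size)  # stands in for the system-memory staging buffer
    view = memoryview(staging)
    total = 0
    while True:
        n = source.readinto(staging)
        if not n:
            break
        upload(view[:n])  # stands in for the copy to the GPU
        total += n
    return total


# Tiny demonstration with an in-memory "file" and a 4-byte staging buffer.
chunks = []
total = stream_through_staging(
    io.BytesIO(b"x" * 10),
    lambda chunk: chunks.append(bytes(chunk)),
    staging_size=4,
)
print(total, [len(c) for c in chunks])
```

The data never has to sit in RAM as a whole file, but every byte does pass through that small buffer on its way to the GPU.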
 
I'm pretty sure that some people I knew that complained about Windows "using all of their memory" mentioned installing or changing something so Windows wouldn't do that (basically caching of frequently used or recently used data), so there's likely something that someone could do if they REALLY wanted to. Although I never really understood why people would want to do that and thus slow down how Windows operates. :p So I never looked into it.

Since Windows Vista, Task Manager does not show RAM used by the file system cache, i.e. it's not reported as part of "Cached Value" which includes OS and apps. The file system cache is the most ephemeral of internal pools and is considered available at all times.

You can change its maximum size (which requires a restart) but that's about all the control or visibility you have! You really don't want to mess about with it; Windows does a pretty good job of just using 'spare' RAM intelligently. If it determines you might need data again and you have free space to hold it in RAM, it does that. If something needs that RAM, it can use it. You lose nothing by letting Windows use RAM that isn't otherwise being used.
 
PS5 SSD tech showcase = Fully fledged high fidelity AAA game with ray tracing at 80+ frames that showcases the actual use case of said tech by paving the way for blazingly fast world-to-world transitions through portals, merely half a year after the release of the console.

DirectStorage = Announced 3 years ago, no games to speak of, no showcase to speak of. Oh, here's a wonky demo. Here's what you get: Some pears and numbers.

Whenever I see those pears I have to laugh. LOL
 
You are being disingenuous
 
PS5 SSD tech showcase = Fully fledged high fidelity AAA game with ray tracing at 80+ frames that showcases the actual use case of said tech by paving the way for blazingly fast world-to-world transitions through portals, merely half a year after the release of the console.

DirectStorage = Announced 3 years ago, no games to speak of, no showcase to speak of.
What's the purpose of this post? How does it contribute to the technical discussion? What exactly makes you laugh? Is this a commentary on the state of DirectStorage, the fact it's not going to happen, a commentary on the PC software landscape where new features aren't readily adopted, an observation about a weakness in MS's game studios that they aren't pioneering software features to set an example for other devs? What?? ¯\_(ツ)_/¯

You are being disingenuous
Indeed. Those poor avocados. 😥
 
PS5 SSD tech showcase = Fully fledged high fidelity AAA game with ray tracing at 80+ frames that showcases the actual use case of said tech by paving the way for blazingly fast world-to-world transitions through portals, merely half a year after the release of the console.

DirectStorage = Announced 3 years ago, no games to speak of, no showcase to speak of. Oh, here's a wonky demo. Here's what you get: Some pears and numbers.

Whenever I see those pears I have to laugh. LOL
I gotta admit I laughed at this. Although I disagree that Ratchet is the SSD technical showcase that people think it is... (DF proved it works on a bog standard drive) ... there IS a hint of truth to what you're saying overall.

DirectStorage has been an entire boatload of nothing up to this point... and yeah, demos of spinning planets and avocados are all we really have at this point... but it's coming, it's coming 😄
 
Have we had confirmation that Halo Infinite doesn't use SFS?

I've just been playing it and the textures seem to blend in really smoothly, like smoother than I've ever seen before.
 
Yes. VRAM usage on RDNA1 and Turing is identical, so this game definitely does not use SFS.
 
PS5 SSD tech showcase = Fully fledged high fidelity AAA game with ray tracing at 80+ frames that showcases the actual use case of said tech by paving the way for blazingly fast world-to-world transitions through portals, merely half a year after the release of the console.

DirectStorage = Announced 3 years ago, no games to speak of, no showcase to speak of. Oh, here's a wonky demo. Here's what you get: Some pears and numbers.

Whenever I see those pears I have to laugh. LOL

That's a pretty strange take. Of course a new technology is going to filter down into games faster on console than it will on PC but I don't see why that's surprising.

80% of the PS5's IO is just a fast NVMe drive/bus, another 10% is the hardware decompressor and the last 10% is the API/custom BIOS.

But none of it means much without software explicitly designed to take advantage of those features.

PCs have had the first 80% for a long time. DirectStorage probably delivers about half of the last 20%, including GPU decompression, and yes, we might be waiting a while to see the full fruits of that, but PCs already enjoy the vast majority of the fast NVMe/bus benefits that the PS5 introduced to consoles, and have done for some time. This is aptly demonstrated by the numerous loading time comparisons which show even PCIe 3.0 PCs keeping pace with (or not far behind) the PS5 while being far ahead of the HDD-based last gen consoles... all without DirectStorage.
 
I disagree with those numbers.

PS5's NVMe is actually nothing special; the secret sauce is the hardware decompressor and other I/O units along with the API.
 
I didn't mean to suggest the PS5's "secret sauce" was the NVMe. I meant its NVMe was 80% (a finger-in-the-air estimate) of what makes its IO impressive compared with the last gen consoles.

It can be argued that the PS5's "secret sauce" (compared to PC) is indeed the decompression unit + API, if we define "secret sauce" as what gets that last 10% of IO performance vs a fully DirectStorage-enabled PC, or the last 20% vs a PC/game combination without DirectStorage (assuming similar hardware). But I'm not sure that's going to lead to any record-breaking performance in the face of PC hardware that's potentially twice as fast.

DirectStorage is obviously great for the PC, but when looking at the scale of IO performance from last gen consoles to ultra high end PCs, it's really only adding very marginal IO performance benefits. The big impact will come from CPU cycle savings, where high end PCs aren't going to gain that much relative to consoles because they already have the CPU cycles to spare.
 
PS5 SSD tech showcase = Fully fledged high fidelity AAA game with ray tracing at 80+ frames that showcases the actual use case of said tech by paving the way for blazingly fast world-to-world transitions through portals, merely half a year after the release of the console.

DirectStorage = Announced 3 years ago, no games to speak of, no showcase to speak of. Oh, here's a wonky demo. Here's what you get: Some pears and numbers.

Whenever I see those pears I have to laugh. LOL

You mean like Prey was doing back in 2006? :p Of course, that's when it finally released. It showcased those portals back in 1997. :p You know going instantaneously between worlds instead of just rooms like the game Portal from 2007.

Sure, they didn't leverage it in quite the way that R&C did, but R&C isn't really doing anything with portals that wasn't done in games in the past other than using fast storage to stream in data rather than streaming it in from RAM. Hell, Prey also didn't use them as well in 2006 as they originally demo'd in 1997/98 because they had a hard time making compelling gameplay that relied on portals. Portal in 2007 finally showcased a really good gameplay mechanic that relied on portals and incorporated them into more fluid gameplay.

Even Tim Sweeney toyed with the idea of using portals in online connected "worlds" for a game in the Unreal Tournament series back in the late 90's to create this massive multiplayer game where you can chase other players through portals.

So, don't get me wrong, it's impressive that it's streaming it from SSD ... it's just that other than that it's nothing new and nothing that couldn't have been done with just ... more RAM.

But that's kind of the point. Consoles need super fast storage because they can't afford more RAM. So it's that constraint, combined with advances in storage technology and speed, that has allowed consoles to leverage fast SSDs (latency arguably being more valuable than bandwidth) to shift some memory operations (ones that don't require fast memory access) to storage.

Regards,
SB
 