Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

So I ran several benchmarks: DirectStorage decompression on CPU and on GPU, via SATA SSD, SATA HDD, and NVMe drives hooked up directly to the CPU and through the B550 chipset, etc. (a rough sketch of the request path is included below the results).

Ryzen 9 5950X (Curve Optimizer -30 on all cores), 4×8GB DDR4-3600 CL14,
Palit RTX 4090 GameRock OC @ 2745MHz core / 24000MHz memory, PCIe 4.0 x16,
Win11 Home 22H2, driver 528.02

DS @CPU, Samsung 970 PRO 512GB, PCIe 3.0 x4, Directly to CPU
bulkloaddemo_2023_01_z0cst.png


DS @GPU, Samsung 970 PRO 512GB, PCIe 3.0 x4, Directly to CPU
bulkloaddemo_2023_01_l2drl.png


DS @GPU, HikVision 2TB, PCIe 3.0 x4, Through B550 Chipset
bulkloaddemo_2023_01_6ai9r.png


DS @GPU, Samsung 850 EVO 256GB, SATA 3.0, Through B550 Chipset
bulkloaddemo_2023_01_the3u.png


DS @GPU, Western Digital Blue 4TB, SATA 3.0, Through B550 Chipset
bulkloaddemo_2023_01_99cvi.png
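
For context, this is roughly what I understand each read in the demo boils down to with the public DirectStorage 1.1 API. It's a rough sketch of my own, not the demo's actual code; the GPU and CPU paths differ mainly in the queue's device and the request's destination and compression options:

#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Rough sketch: enqueue one GPU-decompressed read. Names like "assets.bin",
// destBuffer, fence, compressedSize and uncompressedSize are placeholders.
void EnqueueBulkLoad(ID3D12Device* device, ID3D12Resource* destBuffer,
                     ID3D12Fence* fence, UINT64 fenceValue,
                     UINT32 compressedSize, UINT32 uncompressedSize)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;                      // D3D12 device enables the GPU decompression path

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets.bin", IID_PPV_ARGS(&file));

    DSTORAGE_REQUEST request{};
    request.Options.SourceType        = DSTORAGE_REQUEST_SOURCE_FILE;
    request.Options.DestinationType   = DSTORAGE_REQUEST_DESTINATION_BUFFER;  // GPU buffer; DESTINATION_MEMORY for a CPU-side load
    request.Options.CompressionFormat = DSTORAGE_COMPRESSION_FORMAT_GDEFLATE; // decompressed on the GPU
    request.Source.File.Source        = file.Get();
    request.Source.File.Offset        = 0;
    request.Source.File.Size          = compressedSize;
    request.Destination.Buffer.Resource = destBuffer;
    request.Destination.Buffer.Offset   = 0;
    request.Destination.Buffer.Size     = uncompressedSize;
    request.UncompressedSize            = uncompressedSize;

    queue->EnqueueRequest(&request);
    queue->EnqueueSignal(fence, fenceValue);             // fence signals once the data is resident
    queue->Submit();
}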

The HDD score doesn't make much sense there. It's way too fast. I wonder if it's using a cache somewhere?

The SATA SSD score with CPU decompression might be interesting, although I suspect your CPU will already max out the SATA link.
 
I double- and triple-checked; it is what it is. I can see its read speed locked at 550MB/s in HWiNFO64. The HDD has a 64MB cache, so it's probably in burst mode or something; the .exe file is smaller than 32MB.

People at 3DCenter are getting over 20GB/s with fast PCIe 4.0 drives.
 

Yeah it must be using the solid state cache memory. HDDs top out at around 120-150MB/s.
 
@Silent_Buddha @DSoup

I made a comment about the BulkLoadDemo's first run being fast and subsequent runs being slower than the initial run, suggesting that perhaps there's an issue with how they calculate the bandwidth on the first run.. but as it turns out, a member on a different forum informed me that I was wrong and that it IS accurate. What's actually happening is that the GPU runs flat out on the initial run, while the subsequent runs use a lower clock speed. They told me to go into my driver settings, add a new profile for the demo, and set it to "prefer maximum performance", and that it would then maintain the high throughput and high clock speeds.

And so I did, and yep, they were right. It now maintains that ~20GB/s bandwidth across all runs.

demotest.png


Damn, that's quite the rookie mistake, but I guess I just never bothered to check the clocks and always just assumed it was an issue with how it initialized and loaded that first run.

Which also means that what I said about Compusemble was completely wrong and off base. Their results are accurate.

Just thought I'd clear that up now that I know better. Of course, that doesn't mean the potential explanations any of us gave weren't valid, just that in this specific case, mine was wrong.
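
Side note: if you'd rather not create a per-app driver profile, locking the clocks from an elevated command prompt while you benchmark should, I believe, have the same effect on supported GPUs. The values here are just placeholders for my card, not a recommendation:

nvidia-smi --lock-gpu-clocks=2400,2745
nvidia-smi --reset-gpu-clocks

The first command holds the core clock in that range while the demo runs; the second restores normal clock management afterwards.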
 
Scratch that, I got it working :rolleyes:

Observations

When the demo first starts, the GPU load goes to 80-90% during the load, then drops to 19-20% once the load is complete, then goes back to 80-90% on the reload, and so on.

After 5-6 reloads the GPU use stays at 19-20%, even during the loading.

Then, after another 5-6 reloads, the GPU use on the reload goes back up to 80-90% again for 5-6 reloads, then back to staying at 19-20%, and so on.

You can see that behaviour in Afterburner and GPU-Z sensors.

The lowest and highest GPU clocks can be seen in the MSI Afterburner readout.

I applied a -502MHz GPU core clock offset (the GPU clock was now 1450MHz) and it made zero difference to the loading speed or the maximum GPU use. I also applied a -502MHz VRAM offset (GPU bandwidth went from 448GB/s to 416GB/s) and that also made zero difference.

I might try to capture a frame later and stick it in some frame-analysing software to see exactly how much GPU time it's taking.


Max power.png
 
Yours is working how you would expect it to. Some GPUs are not boosting back up like yours after the initial load, by the looks of it; what GPU are you using? I get an initial boost to 1935MHz and 17GB/s on the first load, then it bounces around between 720MHz and 810MHz at around 10GB/s until the program is exited and reloaded. With the max performance change it sits at 1900MHz with 17GB/s on each load, with usage maxing at 45% during reloads and 20-30% in between. At ~800MHz the usage hits 85% during loads and 40-50% in between. That's on a 3080.

edit: never mind, I can see the 3060 Ti label in GPU-Z
 
DirectStorage is intended to make loading times in PC games a thing of the past and enable larger, more detailed game worlds. To that end, access to storage is modernized, with impressive results in the first PCGH benchmarks with Intel Arc, AMD Radeon and Nvidia GeForce.
So are SSDs overrated for game design or not? I feel like I'll either hear that they barely do anything beyond faster loading screens and that everything could already be done before, or that they're the second coming.

From what DirectStorage seems to imply, it's what people were initially excited about with the new consoles' NVMe storage drives.

Allowing for larger, more detailed worlds, as well as eliminating loading in many cases. Now the PC will have the same streamlined approach, leading to total market saturation and devs perhaps being more daring about including this stuff in multiplatform releases?

Yeah it must be using the solid state cache memory. HDDs top out at around 120-150MB/s.
What was the projected speed of the old 5400rpm drives in the PS4 and Xbox One? Slower than that maximum?
 
HDDs can be a little bit faster than that now. My 8TB USB Western Digital can read and write around 180-200MB/s with pretty big files (so no cache skewing things)...

Of course, access time is still bad, so for gaming it's no longer a solution.
 
So I tried to find the GPU time that DirectStorage takes on my GPU using a frame capture, but while I can find the DirectStorage commands and tasks contained in several frames, I can't find the compute work within any of the frames to work out how much GPU time the decompression is taking.
 
What was the projected speed of the old 5400rpm drives in the PS4 and Xbox One? Slower than that maximum?

In Insomniac's Spider-Man tech post-mortem, they said the PS4 stock drive was good for about 50 megabytes/sec, but they knew that some users had 'upgraded' to larger-capacity drives which ran slower, so they built the game's entire streaming solution to work at 20 megabytes/sec.
 
Had a bit of time to kill so I decided to run it and let it do 12 refreshes with GPU-Z running. With GPU-Z running I was seeing the odd spike to about 980MHz; you can see them in the graph, but most of the time it was the usual ~800MHz. You can see the only time it hit max clocks was right at the start.

bulkload.gif
 
In Insomniac's Spider-Man tech post-mortem, they said the PS4 stock drive was good for about 50 megabytes/sec, but they knew that some users had 'upgraded' to larger-capacity drives which ran slower, so they built the game's entire streaming solution to work at 20 megabytes/sec.
SATA alone is around 10 times that optimistic speed and over 25 times the minimum....

And people say the SSDs in the consoles aren't revolutionary for game design?!?!?
 

If you watch the whole Spider-Man post-mortem, they talk about the impact of the I/O limitations, including the things they couldn't do and all the things they had to do to make the game work, such as spending effort to carefully design the city and restricting the diversity and quality of assets.

Not having to jump through hoops is genuinely liberating because, instead of spending time making the game not fall apart, you can make it better. You can make games better in the same time, or ship more games in less time. That is not to be underestimated.
 
It's always been my opinion that this was the main aim of the PS5's 'overkill' SSD specs. It wasn't because developers would necessarily need a constant 5GB/s, let alone 15GB/s+, of streaming data or anything; it was simply to help take the concern about this sort of thing off the table. Make their lives easier so developers could have a more optimized game with much less effort. Saves time and hassle.
 
I agree. In the first few minutes of Mark Cerny's Road to PS5 video, he revisits his 'time-to-triangle' model, i.e. the ease with which developers can make immediate progress on the development of games, and he compares the PS4 (which was on par with the original PlayStation) with the PS5.

Screenshot 2023-01-21 at 09.29.55-2.jpg

PS5 was all about making things as easy for developers as possible.
 
We kind of discussed this way back in this thread, but while 5 GB/s or 15 GB/s might sound like a lot from a constant-throughput-over-time perspective, it's only ~85MB and ~255MB respectively within a single frame (0.01667s) at 60 fps.
While true... it's still multiple orders of magnitude better than what devs have to contend with when catering to HDDs in many scenarios, no? So just getting rid of that limitation should be celebrated for devs making games...
 

People forget the goal is to load as much of the data as possible just in time. For example, in a third-person game, some data will stay in RAM, like the protagonist's geometry, his animations, and all the sound linked to him or some ambient sound. On the other hand, scenery and NPC textures, geometry and animations will be loaded by streaming. And you can do it just in time: Mark Cerny gave the example of an NPC's dying words, but why keep in memory any dialogue from an NPC you can't hear? Stream sound just in time, when the protagonist can begin to hear a sound or people talking, for example.
 
With the cache scrubbers the PS5 can get to JIT; some developers have indicated in-frame streaming before, I think Tim Sweeney did. But I think most developers will code around it, unfortunately. However, with the PS5, developers can freely call things in (within size limits) without needing to worry whether they'll arrive in time.
 