Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

I think it can be clearly stated that the current generation of consoles is built to stay relevant much longer than previous generations, precisely because of hardware solutions such as decompression blocks, SFS and, ultimately, solutions built around SSD I/O speed. The question here is not whether all these capabilities can be used on PCs; of course they can. But this requires much more expensive hardware components than those found in consoles since 2020.

More expensive hardware components? Sampler Feedback is an integrated feature of all DX12 Ultimate GPUs. GPU decompression is also (effectively) an integrated feature of the GPU, so there is no additional hardware cost to the PC gamer for these features over and above the GPU itself. Even the $299 RTX 3050 supports both. DirectStorage is of course free, and that just leaves the NVMe drive, where you get what you pay for.

Obviously the consoles will be cheaper because the core components are purchased for less and they can be sold with less or even no margin. But it's not as if you need expensive elaborate specialised hardware solutions on the PC side to achieve the same thing.

It is clear that hardware decompression blocks and hardware-integrated SFS are a much cheaper, more elegant (and forward-looking!) technical solution than simply entrusting the work to the GPU or CPU, which requires far more resources.

I disagree. These are integrated features of the GPU (aside from the mip blending unit in the Series consoles). For decompression, the silicon is arguably free because it re-uses the shader cores, which the GPU requires anyway. If done well, the decompression can use async compute to run when those shader cores aren't being used for rendering. This is definitely the case for load screens, and potentially also the case for the much less intensive streaming requirements.

I would personally say that it is a more elegant solution to re-use hardware that is already present in the system than to have a dedicated ASIC for one specific feature that can't be used for anything else. The same goes for the forward-looking aspect: how is a fixed hardware unit forward-looking? It does what it does and will never do anything more. The GPU-based approach allows more advanced compression routines that may be developed in the future to run on the same hardware. This is precisely why the hardware units in the PS4/XBO fell out of use: better compression routines became available that they were unable to process.

The ASIC absolutely will have an advantage from a power draw perspective though and that's certainly important for the consoles.

Anyway, it was not possible to take full advantage of these capabilities in consoles until enough high-performance PCs were on the market, precisely because of multiplatform development.

I really don't think this is an issue; Returnal is an example of that. It's trivial to bypass the high IO requirements on PC through the use of additional RAM for all but the absolute corner cases of game design. It's also trivial to reduce IO requirements by allowing settings to scale down. In the absolute worst case, developers could simply make an NVMe drive a recommendation, which is probably no more crazy than recommending 32GB of RAM.
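To put rough numbers on the "extra RAM instead of fast storage" idea, here's a minimal sketch (all figures are hypothetical examples, not measurements from any game):

```python
def prefetch_ram_needed(stream_rate_mb_s: float, storage_rate_mb_s: float,
                        horizon_s: float) -> float:
    """Extra RAM (MB) needed to buffer `horizon_s` seconds of streaming
    when the drive can't keep up with the instantaneous demand."""
    shortfall = max(0.0, stream_rate_mb_s - storage_rate_mb_s)
    return shortfall * horizon_s

# e.g. a burst demanding 2000 MB/s for 4 s, served by a 550 MB/s SATA SSD:
extra_mb = prefetch_ram_needed(2000, 550, 4)
print(f"{extra_mb:.0f} MB of extra RAM buffer")  # 5800 MB
```

So a few extra GB of prefetch buffer can stand in for a faster drive in most scenarios, which is exactly why an NVMe recommendation only matters in the corner cases.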
 
More expensive hardware components? Sampler Feedback is an integrated feature of all DX12 Ultimate GPUs. GPU decompression is also (effectively) an integrated feature of the GPU, so there is no additional hardware cost to the PC gamer for these features over and above the GPU itself. Even the $299 RTX 3050 supports both. DirectStorage is of course free, and that just leaves the NVMe drive, where you get what you pay for.
From the perspective of console design I think that's definitely still a lot more expensive than the PS5 hardware -- whatever fraction of the 3050's hardware is required to run decompression at the same rate as the PS5's dedicated unit is almost certainly more expensive to manufacture. Not to mention the difference in heat and power consumption, which is probably much larger than the difference in dollars -- and the fact that the 3050 was not out yet when the PS5 was first produced.
 
From the perspective of console design I think that's definitely still a lot more expensive than the PS5 hardware -- whatever fraction of the 3050's hardware is required to run decompression at the same rate as the PS5's dedicated unit is almost certainly more expensive to manufacture.

Yes, but the decompression is running on shader cores that are already there. They'd be there whether used for decompression or not, so from that perspective the decompression hardware is free. And when the bulk of the work is being done, i.e. at loading screens, they would otherwise be idle. So this is a really elegant solution in my view. The unknown, of course, is how much impact streaming decompression will have during gameplay. In ideal circumstances it'll be unnoticeable, as it'll use async compute and slot into spare shader-array capacity around the graphics workload. The actual streaming workload is unlikely to be large vs loading screens, but I guess there is always potential for the two workloads to clash within a frame and cause one or the other to miss the frame-time window. You would hope that in those circumstances the graphics workload is given priority.

I completely agree on the power argument though. It's doubtless more power efficient to use an ASIC, although I can't imagine the real-world impact being that significant when you consider that general streaming is very light and burst loading is very short.

The 3050 was just an example btw, hardware decompression works all the way down to Maxwell and GCN.
 
What sort of decompression? Archive deflate hasn't traditionally run on GPUs; certainly this 2021 paper claims it hasn't. Texture decompression, however, is of course very present on GPUs.
I may not be correct by any means, but video codecs are a form of compression that GPUs decompress regularly. This is not GPU decompression via compute aligned to DirectStorage, but in the CUDA space people have regularly been researching compression and decompression for handling large data sets or 4K video in data science, as a means to get around memory limitations.

Some projects that people have written:

These are quite different needs than what we see from GDeflate however, which is pretty novel and new to the scene.
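For context on why GDeflate is GPU-friendly at all: the publicly stated idea is to split the stream into independently decompressible chunks so many lanes can decode in parallel. A toy CPU-side analogy using zlib and threads, with an arbitrary chunk size (this is not GDeflate's actual format):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # arbitrary chunk size chosen for this sketch

def compress_chunked(data: bytes) -> list[bytes]:
    # Compress each chunk independently so decompression can parallelize.
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def decompress_chunked(chunks: list[bytes], workers: int = 8) -> bytes:
    # Independent chunks let many workers decode at once -- the same
    # property that lets a GPU format spread work across shader cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, chunks))

payload = b"example asset data " * 50_000
assert decompress_chunked(compress_chunked(payload)) == payload
```

A single monolithic deflate stream can't be split this way because each byte depends on the previous window, which is why ordinary archive deflate has historically resisted GPU acceleration.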
 
Fixed-function hardware in consoles has a very important role, because it is how these fixed console platforms can keep using modern functions and features in the future. No matter how we look at it, these are forward-looking, advanced technologies whose purpose is to ensure that the $500 console already has capabilities that PCs can only reproduce later, with extra power and components available at a much higher price. In my opinion, it is not a more elegant or better solution if something can only be achieved later, with hardware that costs more money...
 
Fixed-function hardware in consoles has a very important role, because it is how these fixed console platforms can keep using modern functions and features in the future. No matter how we look at it, these are forward-looking, advanced technologies whose purpose is to ensure that the $500 console already has capabilities that PCs can only reproduce later, with extra power and components available at a much higher price. In my opinion, it is not a more elegant or better solution if something can only be achieved later, with hardware that costs more money...

I'm not sure you really understood what I was saying. Nothing "extra" needs to be purchased on the PC side to achieve the functions you mentioned (Sampler Feedback streaming (small s), non-CPU-based data decompression) outside of an SSD itself, which is obviously the same or very similar to the SSDs in the consoles.

Every PC GPU released since the PS4 launched supports GPU-based decompression. This is a completely free feature to all PC gamers in hardware terms that requires no additional purchase whatsoever.

And Sampler Feedback has been a feature of PC GPUs since Turing launched, two years before the current generation of consoles. Again, this is an inbuilt GPU feature and does not require any additional hardware to be purchased separately. PCs do lack the mip blending hardware of the Series consoles, but then so does the PS5, and it isn't a requirement at all for Sampler Feedback streaming.

To leverage fast IO like the consoles in the PC space, the only thing you need to purchase (assuming you have a remotely modern GPU) is an NVMe SSD. And that's no different to the consoles themselves, which include NVMe SSDs.

The PC itself will still cost more than the console naturally, but that's nothing to do with needing to purchase additional hardware for these specific features, which are essentially free to the PC gamer if they have an NVMe drive. Nor do they require more performance to achieve on the PC side vs the consoles, since as I mentioned above you can enjoy fast IO with GPU-based decompression and Sampler Feedback all the way down to an RTX 3050 or RTX 2060, both of which are lower performance than the consoles.
 
No matter how we look at it, these are forward-looking, advanced technologies whose purpose is to ensure that the $500 console already has capabilities that PCs can only reproduce later, with extra power and components available at a much higher price.

Errr...... no.

The only reason these super-fast I/O solutions exist in consoles is because they only have 16GB of RAM.

So their purpose is to make up for the consoles' lack of RAM.

If the PS5 or Series S had 24 or 32GB of RAM, they would have used bog-standard SATA III SSDs and no fancy storage APIs.
 
Errr...... no.

The only reason these super-fast I/O solutions exist in consoles is because they only have 16GB of RAM.

So their purpose is to make up for the consoles' lack of RAM.

If the PS5 or Series S had 24 or 32GB of RAM, they would have used bog-standard SATA III SSDs and no fancy storage APIs.

In the end, that solution is not cost-effective for a console, and it takes more time to load data, meaning slower loading times. In fixed hardware, less RAM and faster storage is the better solution. The other solution is dumb because it is more expensive and the player loses comfort. If 16GB were limiting asset quality it would be a problem, but this is not the case; the limit on asset quality will be game size.

Having more RAM is a clever solution on an open platform like PC, because configurations aren't all the same.

EDIT: From a developer's perspective, having enough RAM and fast storage is a much better solution. It means less time lost trying to prefetch data depending on what the designers want: you optimize the engine for fast loading once, and after that it is no longer a concern. Fast storage means loading data just in time, and that is important for developers. They can find solutions with slower storage, but that means time spent carefully prefetching data for the scenes the designers want to create, time better spent on something more important, or, worse, telling them a scene is impossible because the storage is too slow.
 
I'm not sure you really understood what I was saying. Nothing "extra" needs to be purchased on the PC side to achieve the functions you mentioned (Sampler Feedback streaming (small s), non-CPU-based data decompression) outside of an SSD itself, which is obviously the same or very similar to the SSDs in the consoles.

Every PC GPU released since the PS4 launched supports GPU-based decompression. This is a completely free feature to all PC gamers in hardware terms that requires no additional purchase whatsoever.

And Sampler Feedback has been a feature of PC GPUs since Turing launched, two years before the current generation of consoles. Again, this is an inbuilt GPU feature and does not require any additional hardware to be purchased separately. PCs do lack the mip blending hardware of the Series consoles, but then so does the PS5, and it isn't a requirement at all for Sampler Feedback streaming.

To leverage fast IO like the consoles in the PC space, the only thing you need to purchase (assuming you have a remotely modern GPU) is an NVMe SSD. And that's no different to the consoles themselves, which include NVMe SSDs.

The PC itself will still cost more than the console naturally, but that's nothing to do with needing to purchase additional hardware for these specific features, which are essentially free to the PC gamer if they have an NVMe drive. Nor do they require more performance to achieve on the PC side vs the consoles, since as I mentioned above you can enjoy fast IO with GPU-based decompression and Sampler Feedback all the way down to an RTX 3050 or RTX 2060, both of which are lower performance than the consoles.
I'll make it clear then. I was just talking about the fact that on PC these features are solved by brute force, which requires a more powerful GPU, more CPU cores or more RAM to achieve the same efficiency as the console's fixed-function hardware. How would it be free for players on PC? After all, they have to buy their PC assembled from components sold at a much higher price.

And one more thing. In today's world, since console and PC games are developed together, this significantly limits how much of the consoles' capability can actually be used for graphical quality. It is known that a game can only be released if it runs acceptably on all hardware. If games were developed for consoles first and only brought to PC later, what I was talking about would be even more obvious. If all the features of the Xbox Series or PS5 had been used in 2021, what percentage of the PCs on the market at that time would have been able to run those features at a sufficiently good speed...
 
The only reason these super-fast I/O solutions exist in consoles is because they only have 16GB of RAM.

So their purpose is to make up for the consoles' lack of RAM.
Completely disagree. Storage IO is a limiting factor for game/engine design and updating it to a next-gen solution makes a lot of sense. That's exactly how Cerny described it, not as the only way they could work around being unable to afford more RAM. That's also why super-fast solutions are coming to PC, otherwise there'd be no point in DirectStorage.
 
Completely disagree. Storage IO is a limiting factor for game/engine design and updating it to a next-gen solution makes a lot of sense. That's exactly how Cerny described it, not as the only way they could work around being unable to afford more RAM. That's also why super-fast solutions are coming to PC, otherwise there'd be no point in DirectStorage.

People didn't understand that the Road to PS5 video was aimed at developers, and that what he was talking about was game design. Most of the advantage will come during game production, and people won't realize how different the game would have been with slow storage.


In the 2016 teaser of Spider-Man PS4, there is a part where Spider-Man goes from exterior to interior in a chase sequence inside a diner, and this was impossible to do in the final game. For the scene inside the Fisk building they needed to insert a cutscene to hide the data loading between open-world web swinging and interior gameplay, and it took months to make it work. After that they decided to limit exterior-to-interior transitions (and vice versa) as much as possible.

Aside from asset quality, faster loading and improved web-swinging speed, I doubt gamers will understand the SSD's usefulness in Spider-Man 2, for example.
 
I will not applaud the attitude that, according to some, it is normal to buy new hardware every two years, paying more and more to the hardware manufacturers, if you want to see better graphics...

Really taking advantage of the capabilities and possibilities of fixed hardware is what would be truly appropriate.
 
Completely disagree. Storage IO is a limiting factor for game/engine design and updating it to a next-gen solution makes a lot of sense. That's exactly how Cerny described it, not as the only way they could work around being unable to afford more RAM. That's also why super-fast solutions are coming to PC, otherwise there'd be no point in DirectStorage.

A more efficient I/O stack would have still worked wonders on a 500MB/s SATA III SSD.

Going from a 70MB/s 5400rpm HDD to a 500MB/s SSD and a more efficient I/O stack would still have been a 'next gen' update in my eyes.
 
It doesn't mean developers don't prefetch data using the priority levels of the API. The storage is fast, but developers can't load everything in one frame. People focus too much on the GB/s or MB/s of an NVMe SSD, when what matters most for streaming and game design is the number of MB loaded per frame. Outside of loading a level or booting a game, the GB/s of an NVMe SSD is not very important.
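A quick sketch of that per-frame framing, using assumed round numbers (~5.5 GB/s raw for the PS5, ~0.55 GB/s for SATA III):

```python
def mb_per_frame(bandwidth_gb_s: float, fps: int) -> float:
    """Sustained streaming budget per frame, ignoring latency and overhead."""
    return bandwidth_gb_s * 1000 / fps

# At 60 fps:
print(f"{mb_per_frame(5.5, 60):.1f} MB/frame")   # ~91.7 (PS5-class NVMe)
print(f"{mb_per_frame(0.55, 60):.1f} MB/frame")  # ~9.2  (SATA III)
```

Framed this way, the design question becomes "how many MB can I pull in during the next frame or two", which is a very different constraint from headline GB/s.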


A more efficient I/O stack would have still worked wonders on a 500MB/s SATA III SSD.

Going from a 70MB/s 5400rpm HDD to a 500MB/s SSD and a more efficient I/O stack would still have been a 'next gen' update in my eyes.

They decided to use NVMe because developers asked for it. And NVMe SSD latency is much better than that of a SATA SSD.


In this video, Mark Cerny says that an NVMe SSD was a demand from developers, and that Tim Sweeney was insisting they needed one inside the next generation of consoles.

[Image: SATA vs SAS vs NVMe comparison]
 
I will not applaud the attitude that, according to some, it is normal to buy new hardware every two years, paying more and more to the hardware manufacturers, if you want to see better graphics...

Really taking advantage of the capabilities and possibilities of fixed hardware is what would be truly appropriate.

I've changed my motherboard and CPU twice in the space of 3 months.

People often change their PC hardware at a steady rate. Why? Because we can.

Console gamers were offered the chance to "buy new hardware by paying more to the hardware manufacturers if you want to see better graphics" last generation.

It was called the PS4 Pro and Xbox One X, and both sold extremely well, indicating that, like some PC gamers, some console gamers would also upgrade to a more powerful console every couple of years if given the chance.
 
A more efficient I/O stack would have still worked wonders on a 500MB/s SATA III SSD.

Going from a 70MB/s 5400rpm HDD to a 500MB/s SSD and a more efficient I/O stack would still have been a 'next gen' update in my eyes.

That SATA solution would've run Rift Apart completely fine if the data is to be believed, especially when teamed with a more generous amount of RAM.

Actually, the more-RAM-with-SATA solution would have its own benefits. SSDs aren't even remotely close to what fast modern DDR RAM offers in terms of speed, latency etc. The SSD solution is partly a substitute for the tiny amount of RAM.

Also, I think GPU decompression is the way forward: GPUs are practically never fully saturated, they are flexible, extremely fast, and scaling is natural. A modern fast GPU and 32GB of high-bandwidth RAM tied to a PCIe 4/5 NVMe drive together with DirectStorage will be a generation apart in IO capabilities. Also consider the much larger VRAM allocations at a much higher transfer rate without CPU/GPU contention (dedicated RAM pools), and much faster CPUs; the ones in the consoles are Zen 2s, which aren't all that... it's no Zen 3 or Alder Lake, before even delving into clocks.

It's certainly not the PC that's behind.
 
That SATA solution would've run Rift Apart completely fine if the data is to be believed, especially when teamed with a more generous amount of RAM.

Actually, the more-RAM-with-SATA solution would have its own benefits. SSDs aren't even remotely close to what fast modern DDR RAM offers in terms of speed, latency etc. The SSD solution is partly a substitute for the tiny amount of RAM.

Also, I think GPU decompression is the way forward: GPUs are practically never fully saturated, they are flexible, extremely fast, and scaling is natural. A modern fast GPU and 32GB of high-bandwidth RAM tied to a PCIe 4/5 NVMe drive together with DirectStorage will be a generation apart in IO capabilities. Also consider the much larger VRAM allocations at a much higher transfer rate without CPU/GPU contention (dedicated RAM pools), and much faster CPUs; the ones in the consoles are Zen 2s, which aren't all that... it's no Zen 3 or Alder Lake, before even delving into clocks.

It's certainly not the PC that's behind.

Again, you don't understand why they have an NVMe SSD inside the console: it's not only because of the amount of RAM, but because it makes developers' lives easier. Insomniac Games, for example, would have lost time during on-rails sequences building a system to prefetch the data to load, and they would maybe have needed to modify a level if an on-rails sequence was too fast to load enough data. Here they just load the data when the player hits the crystal that warps the game to another dimension. 32GB is not useful in a console. You can have 256GB of RAM in a PC; at some point you still need to load the data from storage. And an SSD won't replace RAM...

GPUs are saturated; if they weren't, games would have infinite resolution and infinite framerate. DirectStorage uses async compute, but games are now using async compute heavily... Developers will need to find a moment when they can load data, and that means less GPU power available for graphics. It will probably be minimal for most games.

From the moment we saw the first UE5 demo, we knew that 16GB coupled with an NVMe SSD is enough and will not cripple asset quality. In that demo, asset quality is above what we will have in games, because 8K textures are too big compared to the practical limit on a game's size, and the same goes for the geometry of the statue asset.
 
SATA SSDs nowadays aren't really cheaper than NVMe drives anymore, and even in the past that was partly due to market segmentation in the consumer space. NVMe SSDs without DRAM (relying on HMB) would in theory be cheaper and higher performing than SATA drives with DRAM (DRAMless SATA does not work as well due to protocol limitations).

SATA SSDs have legacy implications and some additional flexibility (depending on the consumer's existing setup) as a product, but I don't feel those would be important considerations for Sony or Microsoft when designing their forward-looking plans for leveraging storage.
 
Again, you don't understand why they have an NVMe SSD inside the console: it's not only because of the amount of RAM, but because it makes developers' lives easier.
How would an NVMe SSD make life better than a SATA III SSD if transfer rates were the same?
Insomniac Games, for example, would have lost time during on-rails sequences building a system to prefetch the data to load, and they would maybe have needed to modify a level if an on-rails sequence was too fast to load enough data.
This is a baseless assumption for which you have no hard data.
Here they just load the data when the player hits the crystal that warps the game to another dimension.
So you're claiming they don't do any pre-loading at all for that?
32GB is not useful in a console.
I'm sure developers would disagree.
You can have 256GB of RAM in a PC; at some point you still need to load the data from storage.
The latest COD game is 125GB; it would take ~4 minutes for a 550MB/s SATA III SSD to load all of that.

Worst case, you have a 4-minute loading screen and then NEVER, EVER, EVER see a loading screen in the game. EVER.

Smaller games would fare even better.
GPUs are saturated
No they're not; in typical gameplay they're running below 100%, as developers have to leave headroom for more demanding scenes.

If I turn vsync on, then even in demanding games at demanding settings my GPU isn't at 99% load all the time.
From the moment we saw the first UE5 demo, we knew that 16GB coupled with an NVMe SSD is enough and will not cripple asset quality.
Another assumption; no one here knows for sure how asset size and quality will change over the next 5-6 years.

All we can do is guess, which is what you're doing here, so stop passing it off as factual.

You need to stop passing off your opinion and guesswork as factual.

Back on ignore.
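For what it's worth, the 125GB / ~4-minute arithmetic checks out as a quick script (drive speeds are the nominal figures quoted above):

```python
def full_load_minutes(game_gb: float, drive_mb_s: float) -> float:
    """Time to read an entire game's data sequentially, in minutes."""
    return game_gb * 1000 / drive_mb_s / 60

print(round(full_load_minutes(125, 550), 1))   # 3.8  (SATA III)
print(round(full_load_minutes(125, 5500), 1))  # 0.4  (PS5-class NVMe)
```

This is a best case, of course: it assumes sequential reads at the drive's rated speed and enough RAM to hold everything at once.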
 
How would an NVMe SSD make life better than a SATA III SSD if transfer rates were the same?

This is a baseless assumption for which you have no hard data.

So you're claiming they don't do any pre-loading at all for that?

I'm sure developers would disagree.

The latest COD game is 125GB; it would take ~4 minutes for a 550MB/s SATA III SSD to load all of that.

Worst case, you have a 4-minute loading screen and then NEVER, EVER, EVER see a loading screen in the game. EVER.

Smaller games would fare even better.

No they're not; in typical gameplay they're running below 100%, as developers have to leave headroom for more demanding scenes.

If I turn vsync on, then even in demanding games at demanding settings my GPU isn't at 99% load all the time.

Another assumption; no one here knows for sure how asset size and quality will change over the next 5-6 years.

All we can do is guess, which is what you're doing here, so stop passing it off as factual.

You need to stop passing off your opinion and guesswork as factual.

Back on ignore.

We have hard data about this. @HolySmoke, using an external SSD, measured the data loaded for the full game at 1,512 MB, and found that the bursts during portals are around 400 to 500 MB loaded in a few frames. They just need to call the right priority level. With a SATA SSD they would have had to prefetch the data a little in advance, taking into account the inferior bandwidth and latency, and that means dev time lost on it.

4 minutes is too long... The faster the loading, the better.

This is not an opinion: 8K textures are used in offline rendering for movies and TV series🤣. Likewise, they didn't say the statue asset is unusable in UE5 (the engine is used for TV production), just that it's too big for a game asset. If tomorrow a studio decided to make a 500GB UE5 game, we would maybe have 8K textures and massive geometry assets, but I don't think gamers would be very happy.

The Mandalorian's scenery uses 8K textures and a crazy amount of geometry. This is what they call the cinematic/virtual production mode in Unreal Engine, and they talk about it right at the beginning of the UE5 demo video.

EDIT: Priority levels are a form of prefetching, because the PS5 SSD is not fast enough to load enough data in one frame. It's not as if you can fully load an R&C Rift Apart level in one frame. But they don't do any prefetching of data before the player hits the crystal; the data is simply not loaded in a single frame. With a SATA SSD they would need to.

And devs don't design their games around typical gameplay but around the worst case.
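A rough sketch of what those portal bursts imply for bandwidth, assuming a hypothetical frame window (the exact frame count isn't in the measurement):

```python
def burst_bandwidth_gb_s(payload_mb: float, frames: int, fps: int = 30) -> float:
    """Bandwidth needed to deliver `payload_mb` within `frames` frames."""
    window_s = frames / fps
    return payload_mb / 1000 / window_s

# ~450 MB delivered within, say, 10 frames at 30 fps (frame count assumed):
print(round(burst_bandwidth_gb_s(450, 10), 2))  # 1.35 GB/s
```

Halve the window and the required bandwidth doubles, which is why "a few frames" of burst is so much harder on a SATA drive than the average streaming numbers suggest.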
 