Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

With the cache scrubbers, PS5 can get to JIT streaming; some developers have indicated in-frame requests are possible, Tim Sweeney among them, I think. But I think most developers will unfortunately code around it. With PS5, though, developers can freely call things (within size limits) without needing to worry whether they'll arrive in time.
But which ones will actually do that? Sony's big developers (like Guerrilla and Naughty Dog) doing multiplatform games (whether PS4 and/or PC) likely won't be able to use JIT data-streaming techniques, as it would cost them dev time to support two very different paradigms in their engines for streaming data during gameplay. For them it should be much easier (lowering dev time) to just use the CPU to decompress data, like on PC.

My guess is that those two teams are already targeting PC hardware before porting to PS5, based on the PC-type loading speeds we see in both of their latest PS5 games and the fact that neither of those games uses the most important specific hardware: RT. Even Housemarque, using Unreal Engine, does use hardware RT to render its GI in Returnal. They also use custom I/O. Why? Maybe because they didn't have to develop the game for PC simultaneously, so it was rather easy.

So yes, here I am very pessimistic. Naughty Dog and Guerrilla are releasing games more than a year into the console's life, having had devkits for more than three years (and they were among the first to have them), and still don't use the two most important features of the PS5. Why am I the only one bothered by this, when other, much smaller teams could do it while basically releasing during the launch window?
 
Yea, I don't necessarily look at it as something to be exploited. I am piggybacking off the commentary that it's just easier development: if you're lazy you can just call something from the NVMe in the game-update code and, by the time the GPU renders the frame, it has arrived.

Whereas if you knew it wasn’t going to arrive you may need to do some additional case handling to ensure the game does not stall.

From this perspective I think it makes more sense. The faster the bandwidth, the larger the chunk can be for JIT or within-frame recall. Which also decreases developer burden if they just want to be lazy about it.

To me, this falls in line with Cerny's desire for the PS5 to be easy to code for yet still maximize its potential.
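To make that contrast concrete, here is a minimal sketch in C++, with every name and number invented for illustration (there is no real console API here, just a simulated async read): on a fast, predictable I/O path you can issue the request during update and simply wait on it before render, while on slower storage the same request needs fallback handling so a late arrival never stalls the frame.

```cpp
// Hypothetical illustration only: contrast an in-frame ("JIT") read with a
// prefetch that needs fallback handling. ReadAssetAsync is a stand-in, not a real API.
#include <chrono>
#include <cstddef>
#include <future>
#include <string>
#include <thread>
#include <vector>

using Asset = std::vector<std::byte>;

// Stand-in for an asynchronous storage read with simulated latency.
std::future<Asset> ReadAssetAsync(const std::string& name, std::chrono::microseconds latency) {
    return std::async(std::launch::async, [name, latency] {
        (void)name;                                // the mock ignores the asset name
        std::this_thread::sleep_for(latency);      // pretend NVMe + decompression time
        return Asset(64 * 1024);                   // pretend 64 KiB payload
    });
}

// Fast, predictable I/O: request during update, wait briefly before render.
void FrameWithJitStreaming() {
    auto pending = ReadAssetAsync("rock_mip0", std::chrono::microseconds(500));
    // ... run the game update for this frame ...
    Asset asset = pending.get();                   // arrives well inside a 16.6 ms frame budget
    // ... render using `asset` ...
}

// Slow or unpredictable I/O: the request persists across frames and render
// needs extra case handling so a late arrival never stalls the game.
void FrameWithFallback(std::future<Asset>& inFlight, Asset& lastGoodAsset) {
    if (!inFlight.valid()) {
        inFlight = ReadAssetAsync("rock_mip0", std::chrono::milliseconds(40));
    }
    // ... run the game update for this frame ...
    if (inFlight.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
        lastGoodAsset = inFlight.get();            // fresh data finally made it
    }
    // else: keep rendering with lastGoodAsset (or a lower mip) and check again next frame.
    // ... render using `lastGoodAsset` ...
}

int main() {
    FrameWithJitStreaming();
    std::future<Asset> inFlight;
    Asset lastGood(64 * 1024);
    for (int frame = 0; frame < 5; ++frame) {
        FrameWithFallback(inFlight, lastGood);
    }
}
```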
 

The PC isn't GG's or ND's problem. DirectStorage is a reality on PC. Devs don't need cache scrubbers on PC; they help reduce the memory needed on PS5, and on PC there will be another buffer for streamed data anyway, with more RAM available even at 16 GB. The PS4 is the real problem for Sony first-party studios, but that stops from now on: Burning Shores is DLC for Forbidden West and it is PS5 only. And this console generation is here for a long time; in documents sent to regulators, Sony and Microsoft talk about a new generation in 2028. If we get mid-gen consoles in 2024, that will really be the middle of the generation.
 

There is no reason whatsoever for developers to use the CPU to decompress data on the PS5 for "compatibility with the PC". I'm not even sure if they have the option NOT to use the hardware decompressor. Also, they could obviously target GPU based decompression on the PC side if decompression speed is a critical factor.

This is an absolutely bizarre view of reality. Since when does the PC hold back the PS5 in implementing RT in games? RT featured in PC games for years before the PS5 launched, and in almost every single case where RT is present in a game, the PC implementation goes further than the PS5's. There are now multiple games that feature RT on the PC and not at all on the PS5. Hell, even your own example of a game that uses RT on PS5 because it didn't originally target PCs, Returnal, has more RT implemented in its PC version!

As for the argument of "not using the custom I/O of the PS5 because they're targeting PCs", this is again total rubbish. There is only one I/O system in the PS5; developers have to use it. If a game isn't loading as quickly as you'd like, it has absolutely nothing to do with PC compatibility or the PS5's SSD somehow performing worse, and everything to do with CPU-side limitations that either the devs didn't care enough to optimise away, or that the nature of the game simply makes impossible to set up faster on the PS5's CPU.

Optimising a game engine to load quickly will absolutely benefit PCs just as much as the PS5.
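On that CPU-side point, here is a minimal, platform-agnostic sketch of the usual fix (simulated reads and decompression, hypothetical names, standard C++ only): keep the next chunk's read in flight while the CPU decompresses the chunk that just arrived, so total load time trends toward the slower of the two stages instead of their sum. That structure pays off on a PC with DirectStorage and on PS5 alike.

```cpp
// Sketch: overlap storage reads with CPU-side work (decompression/parsing).
// File access and "decompression" are simulated so the example stays self-contained.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

using Chunk = std::vector<std::uint8_t>;

Chunk ReadChunk(std::size_t index) {               // stand-in for an asynchronous file read
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    return Chunk(256 * 1024, static_cast<std::uint8_t>(index));
}

std::uint64_t Decompress(const Chunk& c) {         // stand-in for CPU-side work
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    return std::accumulate(c.begin(), c.end(), std::uint64_t{0});
}

int main() {
    constexpr std::size_t kChunks = 16;
    std::uint64_t total = 0;

    // Always keep the next read in flight while the CPU chews on the chunk
    // that just arrived, instead of reading then processing serially.
    auto pending = std::async(std::launch::async, ReadChunk, std::size_t{0});
    for (std::size_t i = 0; i < kChunks; ++i) {
        Chunk current = pending.get();
        if (i + 1 < kChunks) {
            pending = std::async(std::launch::async, ReadChunk, i + 1);
        }
        total += Decompress(current);              // overlaps with the read kicked off above
    }
    return total == 0 ? 1 : 0;                     // use the result so it isn't optimized away
}
```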
 
Good counter and solid points, but I think you've missed the gist. It's not the IO system that'll be limiting, but the engine design (in Globby's hypothesis, which I don't necessarily agree with). An engine can't be designed around JIT content and then ported to a platform that cannot provide JIT content access. As such, if you know you are multiplatting to PC or last-gen or wherever, there's reason to design your engine around the limitations of the weakest IO. That means more conventional streaming, buffering ahead and caching content, and not 'using the PS5 IO system' to its potential.

Now possibly that doesn't change things much and a JIT platform can just cache less. Maybe the content streaming can be scaled that way? But even then, you're having to write an engine that'll scale and mitigate the easy-peasyness of the JIT platform, still having to design your engine around slow content access.

As you say though, there's plenty of reason why devs would still push a platform past lowest common denominators and weaker platforms. Higher tier devs, and probably UE5, will probably aim for best results. I hope. ;)
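One hedged way to picture "an engine that scales" is a single streaming planner driven by a per-platform budget: the JIT-capable platform runs with a small look-ahead and resident cache, the slower platforms with large ones. Everything below (names, numbers) is invented for illustration, not anyone's real engine.

```cpp
// Sketch: one streaming code path, tuned per platform by how far ahead it must fetch.
#include <cstddef>
#include <cstdio>

struct StreamingBudget {
    std::size_t lookaheadMeters;   // how far ahead of the player to request content
    std::size_t residentCacheMB;   // how much already-streamed content to keep around
};

// Illustrative numbers only; real budgets come from profiling each platform.
constexpr StreamingBudget kFastNvme { 10,   256 };   // JIT-ish: request late, cache little
constexpr StreamingBudget kSataSsd  { 60,  1024 };
constexpr StreamingBudget kHdd      { 250, 3072 };   // conventional: request early, cache a lot

void PlanStreaming(const StreamingBudget& b, float playerSpeedMps) {
    // The same planner runs everywhere; only the budget changes per platform.
    const float secondsOfHeadroom = b.lookaheadMeters / playerSpeedMps;
    std::printf("request content ~%.1f s ahead, keep %zu MB resident\n",
                secondsOfHeadroom, b.residentCacheMB);
}

int main() {
    PlanStreaming(kFastNvme, 10.0f);   // fast console NVMe path
    PlanStreaming(kSataSsd, 10.0f);    // typical PC SATA SSD
    PlanStreaming(kHdd, 10.0f);        // last-gen HDD
}
```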
 

When the cross-gen period is finished, there will be absolutely no reason not to design games around the SSDs inside the consoles, whether PS5 or XSX. PC has a solution with DirectStorage, and I am sure that with more RAM available to devs and at least a SATA SSD as a requirement, PC ports won't be very difficult to do.
 
An engine can't be designed around JIT content and then ported to a platform that cannot provide JIT content access. As such, if you know you are multiplatting to PC or last-gen or wherever, there's reason to design your engine around the limitations of the weakest IO. That means more conventional streaming, buffering ahead and caching content, and not 'using the PS5 IO system' to its potential.

Certainly this is true. But Globby did refer specifically to load times, which are a different use case to JIT streaming. I can definitely understand devs holding back on JIT streaming in favour of more traditional caching where a game is targeted for PCs, assuming it's additional work for them to implement both a smart caching system AND JIT streaming. But in terms of actual loading times, optimising your game to load as fast as possible on the CPU side, moving as much of the bottleneck to the I/O side as you can, has universal benefits for both platforms. Last-gen consoles may see no benefit here, though, due to being bottlenecked by their slow (even by HDD standards) HDDs, so perhaps developers simply won't bother and will prefer to dedicate development resources to more impactful areas. After all, while 1-2 second load times are pretty awesome, you don't really find people complaining about 20-second load times, let alone saying they'll boycott a game because of them. Bugs or poor performance, on the other hand...
 

More and more people are buying current-gen consoles and more and more games will use DirectStorage. I am sure people will begin to complain about 10-second load times one day...
 
Fast loading is a good quality-of-life improvement. It's not imperative, but it certainly makes games easier to tolerate when it's there, I think.
 

Cache scrubbing is a bandwidth-saving feature. If you are scrubbing the cache lines of dead data, then when you flush to memory you aren't burning bandwidth on unnecessary writes to DRAM. This isn't as big a problem on most PCs, since the GPU usually isn't sharing DRAM and bandwidth with the CPU. Plus, RDNA doesn't flush as often as the early iterations of GCN: AMD used "a lot" in their white papers when talking about flushing to VRAM on the first couple of generations of GCN, and "rare" for flushing on RDNA. But console DRAM has to deal with the CPU flushing to memory too, so the PS5's cache-scrubbing benefit may lie in saving bandwidth for the overall system.
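Here's a toy accounting model of that argument (not how the real hardware works; it just counts bytes): in a write-back cache, every dirty line that gets flushed costs a write to DRAM, so invalidating lines whose data is already dead, e.g. because the SSD just delivered a replacement, means those writes never touch the shared bus at all.

```cpp
// Toy accounting model: dirty-line evictions cost DRAM write bandwidth;
// invalidating ("scrubbing") lines whose data is known to be stale avoids that cost.
#include <cstdio>
#include <unordered_map>

constexpr unsigned long long kLineBytes = 64;

struct ToyCache {
    std::unordered_map<unsigned long long, bool> lines;  // line index -> dirty?
    unsigned long long writebackBytes = 0;

    void Write(unsigned long long addr) { lines[addr / kLineBytes] = true; }

    // Normal eviction/flush: every dirty line is written back to DRAM.
    void FlushAll() {
        for (const auto& kv : lines)
            if (kv.second) writebackBytes += kLineBytes;
        lines.clear();
    }

    // "Scrub": drop lines in a range whose backing data was just replaced
    // (e.g. by a fresh SSD transfer), so they never get written back.
    void ScrubRange(unsigned long long begin, unsigned long long end) {
        for (auto it = lines.begin(); it != lines.end();) {
            const unsigned long long addr = it->first * kLineBytes;
            if (addr >= begin && addr < end) it = lines.erase(it);
            else ++it;
        }
    }
};

int main() {
    ToyCache withScrub, withoutScrub;
    for (unsigned long long a = 0; a < (1u << 20); a += kLineBytes) {  // dirty 1 MiB of lines
        withScrub.Write(a);
        withoutScrub.Write(a);
    }
    withScrub.ScrubRange(0, 1u << 20);   // that region was just re-streamed: the old data is dead
    withScrub.FlushAll();
    withoutScrub.FlushAll();
    std::printf("write-back traffic: %llu bytes without scrubbing vs %llu with\n",
                withoutScrub.writebackBytes, withScrub.writebackBytes);
}
```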
 

Really great explanation!
 
Cache scrubbers redeemed?! 😂

That's actually interesting. So it might be due to the drawbacks inherent in unified system memory. Is that why MS went with a split RAM configuration again, I wonder?
 
No. Microsoft does NOT have a split RAM configuration on the Series consoles. It is one common pool that both CPU and GPU can access. The only difference is that some portion runs slower because of data width.
 
Explain a little more for me if you don't mind. What is data width, and why is it a factor on Xbox Series versus PS5?
 
Split memory is two physically different pools of memory. On PC, system memory is separate from GPU memory; the memory addresses are different.

With unified memory, the memory addresses are shared by the GPU and CPU. The only difference between PS5 and XSX is that PS5 shares 100% of its 16 GB of addresses at the same bandwidth, whereas 6 GB of the memory addresses on Series X run slower than the other 10 GB.
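To put numbers on "data width" (chip layout as publicly described for Series X; treat my phrasing as a summary, not a spec sheet): the first 10 GB is striped across all ten GDDR6 chips, so the full 320-bit bus applies, while the extra 6 GB only exists on the six 2 GB chips, i.e. a 192-bit slice of the bus.

```cpp
// Bandwidth = bus width in bytes per transfer x per-pin data rate.
#include <cstdio>

constexpr double GBps(int busBits, double gbpsPerPin) {
    return busBits / 8.0 * gbpsPerPin;   // bits -> bytes, times the per-pin rate
}

int main() {
    // Series X: ten GDDR6 chips on a 320-bit bus at 14 Gbps per pin.
    // The first 10 GB is striped across all ten chips -> full 320-bit width.
    std::printf("XSX 10 GB region: %.0f GB/s\n", GBps(320, 14.0));
    // The remaining 6 GB exists only on the six 2 GB chips -> a 192-bit slice.
    std::printf("XSX  6 GB region: %.0f GB/s\n", GBps(192, 14.0));
    // PS5: 16 GB on a 256-bit bus at 14 Gbps, uniform across the whole range.
    std::printf("PS5 16 GB       : %.0f GB/s\n", GBps(256, 14.0));
}
```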
 
Do not forget that accessing the slower memory reduces the available 560 GB/s bandwidth on average. When the slower memory is being accessed at 336 GB/s, the faster memory can't be accessed at all, so the average bandwidth drops for that duration. In the worst-case scenario it could potentially reduce the average available bandwidth by around 40 GB/s (from a calculation done in the old Era thread). But in practice the slower pool should hold data that isn't accessed very often, so it should not have much impact.

It does require more work from developers, though; for instance, Microsoft developers had to help the Epic team optimise the Unreal Engine 5 demo around that particular problem.

It could also be a problem in very specific cases where the developer really needs more than 10 GB at full bandwidth. This may be one of the reasons The Touryst could not be displayed at 8K on XSX.

There are good reasons GPU manufacturers very rarely use such a split pool of memory on their GPUs.
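To make the averaging concrete, here is a back-of-the-envelope sketch (my own illustrative fractions, not the old thread's exact numbers): if the controller spends a fraction t of its time serving the 336 GB/s region, the blended figure is 560·(1−t) + 336·t, so each percentage point spent there costs roughly 2.2 GB/s.

```cpp
// Blended bandwidth when a fraction of controller time goes to the slower 6 GB region.
#include <cstdio>
#include <initializer_list>

int main() {
    constexpr double fastGBps = 560.0;   // 10 GB region, full 320-bit width
    constexpr double slowGBps = 336.0;   // 6 GB region, 192-bit width
    for (double t : {0.0, 0.05, 0.10, 0.18}) {   // share of controller time on the slow region
        const double blended = fastGBps * (1.0 - t) + slowGBps * t;
        std::printf("%4.0f%% of time on slow region -> %5.1f GB/s (%4.1f GB/s lost)\n",
                    t * 100.0, blended, fastGBps - blended);
    }
    // Around 18% of time spent on the slow region, the loss reaches the ~40 GB/s ballpark above.
}
```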
 
So I guess the advantage to this is that Xbox gets to use the full bandwidth, but only at certain times, whereas PS5 has consistent bandwidth but a lower total figure?

I can see how that might give devs some headaches on the same level as split RAM, despite not being the same thing. It's still a balancing act in certain ways to make sure everything gets what it needs.

On the other hand, MS doing it in this particular fashion had the advantage of being able to feed its bigger GPU in the optimal scenario.
 

The cache scrubbers are linked to the coherency engines inside the I/O complex, so I think the RDNA documentation is not so useful in this case. I suppose the problem they are trying to solve is different from what we see inside a PC, where everything is less integrated. Apart from the people who developed the system at Sony, I suppose few people know exactly what the scrubbers do and why. Most of the data loaded from the SSD will be textures or geometry, so maybe they are trying to keep some coherency around texture caches, or something else entirely; we don't even know which caches the scrubbers actually touch. I suppose being able to load around 83 MB (the real case with R&C Rift Apart) to 183 MB every frame at 60 fps is very different from being able to load 25 to 50 MB of data per second on a PS4, depending on where the data sits on the HDD.

But for sure it is not useful on PC, and Mark Cerny said as much himself. With DirectStorage, the PC isn't bottlenecked on anything related to I/O.

And they did say it is linked to SSD speed, GPU performance (Road to PS5) and ease of development:

Behind the scenes, the SSD's dedicated Kraken compression block, DMA controller, coherency engines and I/O co-processors ensure that developers can easily tap into the speed of the SSD without requiring bespoke code to get the best out of the solid-state solution. A significant silicon investment in the flash controller ensures top performance: the developer simply needs to use the new API. It's a great example of a piece of technology that should deliver instant benefits, and won't require extensive developer buy-in to utilise it.
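Building on the earlier toy model: the publicly described idea (from the Road to PS5 talk) is that the coherency engines tell the GPU which address ranges an SSD transfer just overwrote, so the scrubbers can evict exactly those lines instead of the engine having to dump whole caches. A rough sketch of that bookkeeping, with all of the real hardware abstracted away:

```cpp
// Toy bookkeeping: evict only the cache lines overlapping a just-overwritten range,
// instead of dropping the entire cache whenever new data is streamed in.
#include <cstdio>
#include <set>

constexpr unsigned long long kLine = 128;           // illustrative line size

struct ToyGpuCache {
    std::set<unsigned long long> residentLines;     // line-aligned addresses currently cached

    void Touch(unsigned long long addr) { residentLines.insert(addr / kLine * kLine); }

    // Coarse approach: any streaming update invalidates everything, hot data included.
    void FlushEverything() { residentLines.clear(); }

    // "Scrubber" approach: the I/O complex reports the overwritten range; only
    // lines inside it are evicted, everything else stays warm.
    void ScrubRange(unsigned long long begin, unsigned long long end) {
        for (auto it = residentLines.lower_bound(begin / kLine * kLine);
             it != residentLines.end() && *it < end;)
            it = residentLines.erase(it);
    }
};

int main() {
    ToyGpuCache cache;
    for (unsigned long long a = 0; a < (1u << 20); a += kLine) cache.Touch(a);  // 1 MiB hot
    cache.ScrubRange(0, 64 * 1024);                 // the SSD just rewrote the first 64 KiB
    std::printf("lines still warm after targeted scrub: %zu\n", cache.residentLines.size());
    cache.FlushEverything();
    std::printf("lines still warm after full flush:     %zu\n", cache.residentLines.size());
}
```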
 
I agree with everything on the bandwidth side, but generally contention is not that big a deal in this situation. The CPU makes a call for memory and is given priority over the GPU, and this happens on both PS5 and XSX. The only difference is that PS5 serves the CPU request at a hypothetical 480 GB/s versus the slower hypothetical 336 GB/s.

Neither machine has split pools that could serve two separate requests to two separate systems simultaneously.

In this sense I don't see how there would be any additional bandwidth loss worth calculating, when so much is already lost just to read/write requests or general CPU contention. The obvious challenge is footprint, but 6 GB and 10 GB are quite spacious compared to the PC GPUs that used this kind of arrangement back in the day.
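A rough way to put numbers on that (purely illustrative model: CPU traffic is served first and serially, the GPU uses whatever bus time remains, and the 20 GB/s figure is made up; real memory controllers interleave requests far more cleverly): what CPU traffic really costs is the slice of bus time it occupies, which depends on the rate at which CPU requests get served.

```cpp
// Illustrative only: GPU-visible bandwidth once a fixed CPU demand is served at priority.
#include <cstdio>

double GpuShare(double gpuPeakGBps, double cpuPathGBps, double cpuDemandGBps) {
    const double timeOnCpu = cpuDemandGBps / cpuPathGBps;   // fraction of bus time the CPU occupies
    return (1.0 - timeOnCpu) * gpuPeakGBps;                 // GPU runs at its peak for the remainder
}

int main() {
    const double cpuDemand = 20.0;   // GB/s of CPU traffic; a made-up figure
    // Uniform pool (PS5-like): one 448 GB/s path shared by both clients.
    std::printf("uniform 448 GB/s  : GPU keeps ~%.0f GB/s\n", GpuShare(448.0, 448.0, cpuDemand));
    // Split arrangement (XSX-like): GPU peak 560 GB/s, CPU traffic lands in the 336 GB/s region.
    std::printf("split 560/336 GB/s: GPU keeps ~%.0f GB/s\n", GpuShare(560.0, 336.0, cpuDemand));
}
```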
 