Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

I wonder why they decided to have dedicated hardware for decompression if the GPUs are so good at it.

Their design probably started somewhere around 2015, with GPUs of perhaps 10TF of compute in mind. In the PC world everything is constantly evolving; now, with 20 to 36TF becoming available and more in the future, using the GPU makes more sense. It's much faster, more flexible and probably scales better. There's also less need for extra die logic.

Ampere's 14GB/s claim is similar to the PS5's theoretical max of 23GB/s claim. Don't be surprised if the PS5 gets closer to 23GB/s than any Ampere gets to 14GB/s*
* - assuming the graphics card hasn't died of a faulty PCB yet.

The 14GB/s figure can probably go much, much higher than that; with TFs to spare, these things can decompress much faster than the PS5 ever can.
The PS5 reaching 22GB/s will probably never happen, otherwise Cerny would have rubbed that in your face by now. It's probably more along the lines of 5.5GB/s to 9GB/s*.
* - assuming the ultra-high clock speeds with diminishing returns haven't resulted in some sort of yellow light of death from trying to maintain 10TF while the CPU has to do something interesting, all while keeping the monstrous (for a console) 350W in check. These heat issue rumours have somehow lit up again.

Besides, Sony is showing measured throughput numbers backed up by developers, with actual games running on it.
So far nvidia has shown a slide and PR answers to an online FAQ.

Yeah, maybe NV was lying. But the same can be said about Sony; we don't have any measurements from either hardware solution.
 
This is not correct.

Nvidia have been explicit that 2:1 is the expected typical compression ratio with that resulting in a 2x effective uplift to IO. See here:
I read that info from here which didn't say it was typical. But again, if the 2x number is typical, it sounded like too much rounding and somehow MS and Nvidia ended up in exactly the same place.
nvidia said:
Specifically, NVIDIA RTX IO brings GPU-based lossless decompression, allowing reads through DirectStorage to remain compressed while being delivered to the GPU for decompression. This removes the load from the CPU, moving the data from storage to the GPU in its more efficient, compressed form, and improving I/O performance by a factor of 2.

Again, Nvidia have been explicit that it can:
It can if the SSD is faster. Basically what Nvidia said is that if there is an SSD that is faster than a Gen4 SSD, it can still handle it. So if you have a 14GB/s raw SSD, Nvidia can potentially have 28GB/s throughput. But what is the limit when decompressing a 7GB/s stream of data? The impact may be tiny, but that is not the point if you are potentially bottlenecked somewhere.
Basically, if I have a big chunk of data with a 4:1 compression ratio, can RTX IO decompress it at 28GB/s throughput? With the PS5 it is 22GB/s max; with the Xbox Series it is over 6GB/s.
If Oodle Texture can somehow produce 5:1 compression, it doesn't mean the PS5 can decode it at 5:1 speed, since that would end up at about 27GB/s; it can only decode it at 4:1 speed.

Compression ratios are determined by the routine that's used. This is fixed on the consoles because they use hardware decompression blocks. So Microsoft is limited by the compression ratio of BCPACK and Sony is limited by the compression ratio of Kraken (with RDO encoded textures). Both end up around 2:1 and that's unlikely to go any higher. RTX IO appears to use compute shaders, so it may be able to leverage new, higher compression ratio algorithms in the future if such a thing becomes available, but I wouldn't count on it. It's probably a good assumption that we'll be at 2:1 for the remainder of this generation unless ML texture upscaling becomes a thing (in which case Nvidia would have a massive advantage thanks to the Tensor cores).
What people need to remember is that this compression thing is mainly about game install size, with the really nice side effect of improving the IO speed. A PC game needs to target various hardware, and on some low-end configurations too aggressive compression might make that game unplayable. Or maybe game installs will come in two flavors: one compressed with something that is supported by RTX IO / DirectStorage and another one without (thus a bigger install size).

There's no reason why games can't take full advantage of the highest speeds available from RTX IO while still scaling all the way down to SATA SSDs and maybe even mechanical HDDs (although I do expect SSDs to become a minimum requirement on many PC games in the near-mid term).
Yes, it can scale, but mainly because you built it to scale. If I give you an empty canvas and say you can have 14GB/s throughput from the SSD, the game would probably end up different than if I only give you 600MB/s throughput. PC will need a minimum storage speed requirement in the future if games are to truly take advantage of SSDs for something other than quicker level loads. Since I'm not a game dev, I can't say with 100% conviction that I could create a game requiring 5.5GB/s of storage throughput that would break no matter how much you try to scale it down when porting it to 600MB/s storage (max SATA speed), but I can definitely imagine a game designed around 5.5GB/s storage being broken/unplayable if it ran on 100MB/s storage.

I want to talk a bit about SFS. Yes, SFS is wonderful tech, but it doesn't really speed up the IO, nor does it make the game install size smaller. What it does is use the bandwidth more efficiently. You only load what you need, so you potentially save that precious bandwidth. It can also reduce memory usage. But unless you use it aggressively (keeping the texture data resident in memory as small as possible, so only loading the required texture data and immediately throwing away whatever is currently unused), I'm not really sure how useful it is on a system with 10 to 16GB of memory. So the way I see it, the benefit of SFS is mainly about efficient use of memory bandwidth rather than increasing IO speed like some people think. Having SFS will not give the Xbox Series storage IO a 2.5x multiplier on top of the 4.8GB/s throughput. It is good to have and definitely useful, but again, I'm not really sure about the impact in real games. I can imagine it will help games run smoother and prevent hitching when the Xbox Series is loading textures.
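To make the distinction concrete, here is a rough sketch of the point above: the drive's throughput stays fixed and only the amount of data requested changes. The 9.6GB working set is a made-up number for illustration, and treating the 2.5x figure as "SFS lets you skip all but 1/2.5 of the texture data" is an assumption, not a measured result.

```python
def load_seconds(data_gb: float, throughput_gb_s: float) -> float:
    """Time to pull a given amount of data at a fixed delivered throughput."""
    return data_gb / throughput_gb_s


XSX_THROUGHPUT_GB_S = 4.8    # compressed throughput figure quoted in the thread
WORKING_SET_GB = 9.6         # hypothetical amount of texture data, illustration only
SFS_SAVING_FACTOR = 2.5      # assumption: only 1/2.5 of the data is actually needed

without_sfs = load_seconds(WORKING_SET_GB, XSX_THROUGHPUT_GB_S)
with_sfs = load_seconds(WORKING_SET_GB / SFS_SAVING_FACTOR, XSX_THROUGHPUT_GB_S)

print(f"without SFS: {without_sfs:.2f}s for {WORKING_SET_GB:.2f}GB")
print(f"with SFS:    {with_sfs:.2f}s for {WORKING_SET_GB / SFS_SAVING_FACTOR:.2f}GB")
print(f"drive throughput in both cases: {XSX_THROUGHPUT_GB_S}GB/s")
```

The load finishes sooner, but only because less data crosses the bus; the IO path itself never moves more than 4.8GB/s.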
 
I wonder why they decided to have dedicated hardware for decompression if the GPUs are so good at it. At least then a non-open world game could have more shader power when not decompressing. Well, a little bit extra.

I imagine it's simply cheaper in silicon and power terms to have a dedicated hardware block than to try and do this in shaders. Consoles are all about efficiency after all.
 
I wonder why they decided to have dedicated hardware for decompression if the GPUs are so good at it. At least then a non-open world game could have more shader power when not decompressing. Well, a little bit extra.
It's about performance relative to die space.
The performance for the amount of space taken up outweighed doing it on the GPU, even if the cost of doing it on the GPU seems low. It still saves GPU compute for other tasks, and saves power as well.

Also things like deterministic latency for use with SFS.
 
Except compression ratios in games typically don't even reach 2:1 on average.

Using current compression routines, no. But both BCPACK and now Oodle Texture + Kraken are claiming just that. There's no reason RTX IO couldn't be using one of these or an equivalent.

Unless they're claiming those compression ratios already count the delta color compression that is natively processed by the GPU, at which point the 14GB/s statement is just dishonest.

The way it's presented in the slides precludes that possibility. It'd also be an absolutely bizarre way of claiming things, so I'd find it extremely unlikely even without the slides.

Ampere's 14GB/s claim is similar to the PS5's theoretical max of 23GB/s claim. Don't be surprised if the PS5 gets closer to 23GB/s than any Ampere gets to 14GB/s


No. It's absolutely not. The 14GB/s claim is based on a typical 2:1 compression ratio. Nvidia have been absolutely clear and explicit on that. The hardware itself can go beyond that should the source SSD be faster than 7GB/s.

The PS5's 22GB/s is a hardware limit of the decompression block. It will never be reached on average in typical workloads. Sony have been absolutely clear about that.

Sony's 8-9GB/s (11GB/s with Oodle Texture) is the figure that is comparable to Nvidia's 14GB/s. Sony's 22GB/s is comparable to Nvidia's statement that they can go beyond the limits of PCIe 4.0 SSDs, but we don't know how far.

Besides, Sony is showing measured throughput numbers backed up by developers, with actual games running on it.
So far nvidia has shown a slide and PR answers to an online FAQ.

So? DirectStorage isn't available yet on the PC, so clearly they can't demonstrate this in current games. If your argument is simply that "Nvidia is lying" then this isn't a technical discussion anymore and we should end it there.
 
I read that info from here which didn't say it was typical. But again, if the 2x number is typical, it sounded like too much rounding and somehow MS and Nvidia ended up in exactly the same place.

That may not be a coincidence. DirectStorage is directly linked to compression, as MS are marketing it on the PC as reducing game install sizes. Since DirectStorage is used in the XSX as well, which uses BCPACK, it's quite possible DS on the PC also uses that compression format.

It can if the SSD is faster. Basically what Nvidia said is that if there is an SSD that is faster than a Gen4 SSD, it can still handle it. So if you have a 14GB/s raw SSD, Nvidia can potentially have 28GB/s throughput. But what is the limit when decompressing a 7GB/s stream of data? The impact may be tiny, but that is not the point if you are potentially bottlenecked somewhere.
Basically, if I have a big chunk of data with a 4:1 compression ratio, can RTX IO decompress it at 28GB/s throughput? With the PS5 it is 22GB/s max; with the Xbox Series it is over 6GB/s.
If Oodle Texture can somehow produce 5:1 compression, it doesn't mean the PS5 can decode it at 5:1 speed, since that would end up at about 27GB/s; it can only decode it at 4:1 speed.

Yep agreed on all of that. We know the hard limit on both PS5 and XSX but with RTX IO it's unknown. We only know that it can go beyond 7GB/s input at a typical compression rate of 2:1. Since a "typical", aka average rate would include both files that compress more and less than a 2:1 ratio, it stands to reason that the decompression capabilities of the hardware will peak at higher than 14GB/s output.
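The arithmetic being traded back and forth here can be sketched as a simple model: output rate is the raw read rate times the compression ratio, clamped to whatever the decompressor can emit. This is a back-of-envelope sketch, not how any of the hardware is actually specified; the function name is made up, and the RTX IO cap is left as `None` because, as said above, it's unknown.

```python
from typing import Optional


def effective_throughput(raw_gb_s: float, ratio: float,
                         decomp_cap_gb_s: Optional[float] = None) -> float:
    """Decompressed output in GB/s: raw rate x ratio, clamped to the cap if one is known."""
    out = raw_gb_s * ratio
    if decomp_cap_gb_s is not None:
        out = min(out, decomp_cap_gb_s)
    return out


PS5_RAW, PS5_CAP = 5.5, 22.0   # figures quoted in the thread
GEN4_RAW = 7.0                 # the 7GB/s Gen4 SSD input discussed above

ps5_2to1 = effective_throughput(PS5_RAW, 2.0, PS5_CAP)   # 11.0
ps5_5to1 = effective_throughput(PS5_RAW, 5.0, PS5_CAP)   # clamped to 22.0 (27.5 uncapped)
rtx_2to1 = effective_throughput(GEN4_RAW, 2.0)           # 14.0, cap unknown
rtx_4to1 = effective_throughput(GEN4_RAW, 4.0)           # 28.0 only if no cap gets in the way

print(ps5_2to1, ps5_5to1, rtx_2to1, rtx_4to1)
```

The open question in the posts above is exactly what number belongs in the third argument for RTX IO.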


What people need to remember is that this compression thing is mainly about game install size, with the really nice side effect of improving the IO speed. A PC game needs to target various hardware, and on some low-end configurations too aggressive compression might make that game unplayable. Or maybe game installs will come in two flavors: one compressed with something that is supported by RTX IO / DirectStorage and another one without (thus a bigger install size).

Yes it'll be interesting to see how that's handled. As you say, dual install options might be the solution. Or maybe post install compression (via the GPU)?

Yes, it can scale, but mainly because you built it to scale. If I give you an empty canvas and say you can have 14GB/s throughput from the SSD, the game would probably end up different than if I only give you 600MB/s throughput. PC will need a minimum storage speed requirement in the future if games are to truly take advantage of SSDs for something other than quicker level loads. Since I'm not a game dev, I can't say with 100% conviction that I could create a game requiring 5.5GB/s of storage throughput that would break no matter how much you try to scale it down when porting it to 600MB/s storage (max SATA speed), but I can definitely imagine a game designed around 5.5GB/s storage being broken/unplayable if it ran on 100MB/s storage.

I see where you're coming from, but I think it's safe to say that no game will be designed with 14GB/s throughput as the baseline. Cross-platform games will likely use the XSX as the baseline and scale up and down from there. 4.8GB/s should scale down to 600MB/s SATA SSDs (1.2GB/s with RTX IO) quite easily. Simply reducing texture resolution from, say, 4k to 2k gets you there straight away. And inserting Oblivion-style in-game loading screens could take you even further.
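A quick sanity check of that scaling claim, as a sketch only. It assumes streamed data is dominated by textures and scales with texel count (so halving the resolution quarters the data), reads "4k to 2k" as 4096 to 2048 texels per side, and assumes the 2:1 ratio discussed earlier applies on top of the SATA drive.

```python
def scaled_requirement(base_gb_s: float, base_res: int, target_res: int) -> float:
    """Scale a streaming requirement by texel count, i.e. by resolution squared."""
    return base_gb_s * (target_res / base_res) ** 2


XSX_BASELINE_GB_S = 4.8   # compressed throughput figure from the thread
SATA_RAW_GB_S = 0.6       # max SATA speed from the thread
ASSUMED_RATIO = 2.0       # assumption: typical 2:1 compression via RTX IO

needed = scaled_requirement(XSX_BASELINE_GB_S, 4096, 2048)   # a quarter of 4.8
available = SATA_RAW_GB_S * ASSUMED_RATIO

print(f"needed after 4k -> 2k: {needed:.2f}GB/s")
print(f"SATA with 2:1 compression: {available:.2f}GB/s")
print("fits" if needed <= available + 1e-9 else "does not fit")
```

Under those assumptions the two numbers land on the same 1.2GB/s, which is why a single texture resolution step is enough.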

I suspect you're right about mechanical HDDs though. At some point in this gen games will likely make SSDs a requirement.
 
What kind of game is supposed to constantly shift 14GB of data from SSD to memory every frelled second?
Cerny described it: the PS5 SSD enables devs to create games where only the things that are visible in front of the player are held in RAM. As the player moves the camera, the engine streams in the assets that are about to become visible.

Current engines have to load the entire zone around the player, and then GPU culling ensures that only what is visible is actually rendered [but all of it is still in RAM].
 
Unreal 5 looks like an engine designed around constant streaming. How far Unreal 5 will scale, and what the minimum needed and maximum useful throughput are, is unknown for now. Unreal 5 streaming looks like it might scale based on the resolution used. Unreal 4 stays behind for legacy systems.

From the exclusives we know PS4 Spider-Man was already very streaming-heavy. Any open-world game is likely streaming-heavy. Piperun assets for open world incoming?
 
Cerny described it
I was talking about a real game, not one that loads 14GB of assets, then if you look left loads another different 14GB, and if you look right another 14GB.
And please don't look up for Cerny's sake!
A game where in a single moment of a single level you have at least 4x14GB.
What kind of game is it? Matrix?
 
I was talking about a real game, not one that loads 14GB of assets, then if you look left loads another different 14GB, and if you look right another 14GB.
And please don't look up for Cerny's sake!
A game where in a single moment of a single level you have at least 4x14GB.
What kind of game is it? Matrix?

It doesn't have to be 14GB/s all the time to be useful. It could also bring latency down so there are fewer pop-in artifacts. Something like NHL/MLB/FIFA close-ups with hero assets for replays and such. No need to waste RAM on what is not being used.

Or something like jumping out of plane in gtav, landing and entering random building/car.
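For a sense of scale on the "14GB every second" argument, a small sketch that turns a sustained rate into a per-frame budget and a time to turn over all of memory. The 30 and 60 fps targets are assumptions for illustration; the 14GB/s and 16GB figures come from earlier in the thread.

```python
def per_frame_budget_mb(throughput_gb_s: float, fps: float) -> float:
    """Data deliverable within one frame, in MB (1GB taken as 1000MB)."""
    return throughput_gb_s * 1000.0 / fps


def refill_seconds(ram_gb: float, throughput_gb_s: float) -> float:
    """Time to replace the full contents of RAM at a sustained rate."""
    return ram_gb / throughput_gb_s


PEAK_GB_S = 14.0   # the RTX IO figure being argued about
RAM_GB = 16.0      # upper end of the memory sizes mentioned in the thread

for fps in (30, 60):   # assumed frame rate targets
    print(f"{fps} fps: {per_frame_budget_mb(PEAK_GB_S, fps):.0f}MB per frame")
print(f"full {RAM_GB:.0f}GB turnover: {refill_seconds(RAM_GB, PEAK_GB_S):.2f}s")
```

Seen this way, the headline rate matters less as a constant firehose and more as how quickly a burst (a replay close-up, landing after a skydive) can be serviced.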
 
Cerny described it: the PS5 SSD enables devs to create games where only the things that are visible in front of the player are held in RAM. As the player moves the camera, the engine streams in the assets that are about to become visible.

Current engines have to load the entire zone around the player, and then GPU culling ensures that only what is visible is actually rendered [but all of it is still in RAM].
I'm not a programmer or game designer but that sounds like a really stupid and inefficient way to design a game... It honestly sounds like marketing talk meant to get Sony fanboys excited.
 
I was talking about a real game, not one that loads 14GB of assets, then if you look left loads another different 14GB, and if you look right another 14GB.
And please don't look up for Cerny's sake!
A game where in a single moment of a single level you have at least 4x14GB.
What kind of game is it? Matrix?
It's funny because if you go on the PS5 subreddit you will find tons of people who think the fast SSD will somehow unlock infinite levels of detail and graphical fidelity while ignoring the fact that there's only so much time available to actually render the frames.
 
I'm not a programmer or game designer but that sounds like a really stupid and inefficient way to design a game... It honestly sounds like marketing talk meant to get Sony fanboys excited.

It's much easier to load and use assets just in time than to rely on elaborate caching schemes with limited memory. It also removes load times, which is nice. Games are much bigger than the RAM available.

It's not only Sony. Microsoft does the same via DirectStorage for Xbox and PC. Nvidia has already shown something that sits on top of DirectStorage (RTX IO).

It's also the same stuff Unreal 5 completely relies on.
 
It's funny because if you go on the PS5 subreddit you will find tons of people who think the fast SSD will somehow unlock infinite levels of detail and graphical fidelity while ignoring the fact that there's only so much time available to actually render the frames.

At least it spawned funny memes like 'I'm upgrading my GPU, what SSD should I look at?'
 
It's funny because if you go on the PS5 subreddit you will find tons of people who think the fast SSD will somehow unlock infinite levels of detail and graphical fidelity while ignoring the fact that there's only so much time available to actually render the frames.
Speed is important and is critical to being able to pull any asset on demand. It allows for significantly more varied scenes and setups, and variability is the key component of what next gen will be.

More speed means you can allow more to happen in a given moment.

Rendering fidelity and performance are something else. If you lack hard drive performance you're left to rely heavily on prebuffering assets into memory. Your level and game design becomes constrained because the memory you would use to hold more game-related things is now being used as a buffer for areas of the map the player may never go to.
 