Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

We need to distinguish between sustained throughput and burst/latency. A single frame at 60 fps is only 16.67 ms. Pushing even just 500 MB through in that time frame would be equivalent to almost 30 GB/s. If we are talking about a move towards more reliance on real-time streaming of data, how we look at the numbers needs to change accordingly.

While I don't think we will actually see sustained throughput requirements in the multiple-GB/s range anytime soon (I'm not even sure what type of game design would make use of that), burst-rate performance could very well be meaningful in those ranges even if you only wanted to push a few hundred MB of data in a short period.

That said, given SSD speed limitations (relatively speaking), the drives will likely be the limiting factor rather than GPU processing speeds, going by the above numbers.
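
To put rough numbers on that (a quick back-of-the-envelope sketch; the 500 MB payload is just the example above, not anything measured):

```python
# Rough burst-rate arithmetic: delivering a payload within a small number of frames
# implies a very high instantaneous rate, even though the payload itself is small.

def burst_rate_gbps(payload_mb: float, frames: int, fps: float = 60.0) -> float:
    """Effective transfer rate (GB/s) needed to move `payload_mb` within `frames` frames."""
    window_s = frames / fps           # time window in seconds
    return (payload_mb / 1000.0) / window_s

print(burst_rate_gbps(500, frames=1))   # ~30.0 GB/s: 500 MB inside one 16.67 ms frame
print(burst_rate_gbps(500, frames=30))  # ~1.0 GB/s: same payload spread over half a second
```
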
Visuals aren't completely changing every frame though.. that's ridiculous.
 
Visuals aren't completely changing every frame though.. that's ridiculous.

I feel you're still approaching it from a sustained-throughput mindset with that remark. The issue isn't whether you need to load new data every single frame, but that if it's streaming in real time it might need to stream that set of data on any given frame. Forget a hard stutter for the moment; let's just say it's textures being slow to load in as you go into a new room while data streams in.

We can argue about how much of a delay ends up noticeable and detracting from the experience, but going by the numbers: if it's half a second, then 500 MB becomes 1 GB/s; 0.25 seconds (15 frames at 60 fps) is 2 GB/s; and so on, doubling each time. So while 500 MB (or whatever small number) may seem small, in a latency/burst scenario the processing requirement escalates quickly.
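
Here's that escalation written out, purely as illustrative arithmetic with the same 500 MB payload:

```python
# The same payload delivered inside shrinking latency windows: the required rate
# doubles every time the acceptable delay halves.

payload_gb = 0.5  # 500 MB
for window_s in (1.0, 0.5, 0.25, 0.125):
    frames = window_s * 60            # how many 60 fps frames the window spans
    rate = payload_gb / window_s      # required effective throughput in GB/s
    print(f"{window_s:>5} s ({frames:4.1f} frames): {rate:4.1f} GB/s")
# 0.5 s -> 1 GB/s, 0.25 s (15 frames) -> 2 GB/s, 0.125 s -> 4 GB/s, and so on.
```
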
 
I feel you're still approaching it from a sustained-throughput mindset with that remark. The issue isn't whether you need to load new data every single frame, but that if it's streaming in real time it might need to stream that set of data on any given frame. Forget a hard stutter for the moment; let's just say it's textures being slow to load in as you go into a new room while data streams in.

We can argue about how much of a delay ends up noticeable and detracting from the experience, but going by the numbers: if it's half a second, then 500 MB becomes 1 GB/s; 0.25 seconds (15 frames at 60 fps) is 2 GB/s; and so on, doubling each time. So while 500 MB (or whatever small number) may seem small, in a latency/burst scenario the processing requirement escalates quickly.
You're right, I am thinking about it in the wrong sense. I do understand what you are saying.

I still think these GPUs should easily be able to handle these types of scenarios.
 
30 to 40 GB/s is just crazy fast, and probably outside the realm of what DirectStorage-supported games will be streaming. However, it's nice to see the possibility being there; even, say, 50 GB/s isn't that much for a modern dedicated performance GPU, as they have quite a lot of dedicated bandwidth to themselves, and when it happens it will likely be short bursts of data at those higher speeds.
Realistically, though, we're probably looking at a max of 3.5 to 5 GB/s at the end of the generation when games fully utilize NVMe speeds, and only in the rarest cases, in short bursts. Even Rift Apart, which according to the developers 'fully makes use of the fast NVMe' by swapping out entire worlds in a fraction of a second, doesn't come close to even 5 GB/s.
 
Have we had any true next-gen games built with NVMe in mind yet?

People need to stop believing that just because games haven't historically pushed a lot of data, due to mechanical HDD limitations, they won't push beyond that now.

Some games could very well end up streaming that much data; that's what makes this generation so exciting, we don't know.

The limitation isn't historical precedent, it's storage space. Games are rarely more than 100GB on disk and often much smaller. Any game that required streaming at 5GB/s on a regular basis would exhaust all of its content in a few minutes.

Short-lived bursts of those speeds may occur, but as Remji said, a modern GPU shouldn't break much of a sweat handling 5GB/s via async compute.

Now if you were trying to decompress a stream at the max throughput of future PCIe 5 drives (say 14-15GB/s) on a 2060 or below then you might start hitting GPU limitations. But we won't see anything like those speeds this console generation.
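
As a sanity check on the content-exhaustion point, the arithmetic is easy to run (illustrative numbers only; real games re-read and reuse assets, so in practice the content lasts longer than this worst case):

```python
# How long a game's unique on-disk data lasts under sustained streaming,
# ignoring asset reuse entirely (a deliberately pessimistic simplification).

def minutes_of_unique_data(install_size_gb: float, sustained_rate_gbps: float) -> float:
    """Minutes until a sustained stream has read every byte of the install once."""
    return install_size_gb / sustained_rate_gbps / 60.0

print(minutes_of_unique_data(100, 5.0))   # ~0.33 min: a 100 GB install lasts ~20 s at 5 GB/s
print(minutes_of_unique_data(100, 0.5))   # ~3.3 min at a more modest 500 MB/s
```
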
 
I'm not talking about the processing side of decompression.

I'm talking about the memory and bandwidth side of it.

For the last few years we've been asking GPUs to do ray tracing, which is incredibly taxing on GPU bandwidth and memory.

And now we're about to ask that memory system, which is already getting absolutely hammered by ray tracing, to also handle multiple GBs of reads and writes from decompression.

I know ultra-high-end GPUs should be OK with that, but I don't care about the ultra high end and neither do developers.

How is the average, mid-range GPU's bandwidth and memory system going to handle it all?

Nothing in computing is free, there is always a hardware and performance cost and GPU decompression will be no different.
 
We're talking about 5-10GB/s at most in very short occasional bursts. Even a 2060 has 336GB/s. I'm really not seeing the problem.

Not to forget that these will be short bursts of data streaming; the full bandwidth of a GPU is rarely ever saturated to its full potential at all times. Adding decompression tasks to the GPU, which is the perfect architecture for such operations, actually seems like an effective use of your hardware.
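
To put a very rough number on that, here's a simplified model: assume a decompression burst reads the compressed stream once and writes the decompressed output once through VRAM, with an assumed 2:1 compression ratio, against the 336 GB/s figure quoted above. All of these are assumptions for illustration, not measured behaviour.

```python
# Back-of-the-envelope: fraction of GPU memory bandwidth consumed by a decompression
# burst, under a simplistic read-once (compressed) + write-once (decompressed) model.

def bandwidth_fraction(in_gbps: float, ratio: float, vram_bw_gbps: float) -> float:
    """Fraction of VRAM bandwidth used by `in_gbps` of compressed input expanding by `ratio`."""
    traffic = in_gbps + in_gbps * ratio   # bytes read + bytes written per second
    return traffic / vram_bw_gbps

print(f"{bandwidth_fraction(5, 2.0, 336):.1%}")   # ~4.5% of a 2060's memory bandwidth
print(f"{bandwidth_fraction(10, 2.0, 336):.1%}")  # ~8.9% even at a 10 GB/s burst
```
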
 
Not to forget that these will be short bursts of data streaming; the full bandwidth of a GPU is rarely ever saturated to its full potential at all times. Adding decompression tasks to the GPU, which is the perfect architecture for such operations, actually seems like an effective use of your hardware.

Bandwidth is not a problem, but for burst operations GPUs aren't the perfect architecture. This is the reason IHVs are thinking about adding ASICs to GPUs for decompression. If you need a burst of 5 GB/s, that's 10% of the GPU power for a high-end Ampere GPU; 10 to 11 GB/s is 20%; and for a lower-powered GPU like a 2080 Ti it can be 25 to 30% of the GPU at the moment of the burst. The GPU needs to run graphics too.

And this is another asynchronous process to add to the GPU's workload; you need to find the time slice during a frame to run it. I am sure that in a real game this is more complicated than in a benchmark.

EDIT: That said, this is much better than using the CPU, but not ideal.
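
Roughly, the proportion argument looks like this. The peak decode rates below are placeholders: the ~35 GB/s one is the NVIDIA benchmark figure discussed further down the thread, the others are purely assumed for illustration.

```python
# Share of a GPU's decompression throughput that a given burst would occupy,
# for a few assumed peak GPU decode rates.

def gpu_time_share(burst_gbps: float, peak_decomp_gbps: float) -> float:
    """Approximate share of the GPU's decode capacity a burst occupies while it runs."""
    return burst_gbps / peak_decomp_gbps

for name, peak in (("high-end (assumed)", 50.0),
                   ("benchmark figure", 35.0),
                   ("mid-range (assumed)", 18.0)):
    print(f"{name:20s}: 5 GB/s burst ~ {gpu_time_share(5, peak):.0%} of decode capacity")
```
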
 
Bandwidth is not a problem, but for burst operations GPUs aren't the perfect architecture. This is the reason IHVs are thinking about adding ASICs to GPUs for decompression. If you need a burst of 5 GB/s, that's 10% of the GPU power for a high-end Ampere GPU; 10 to 11 GB/s is 20%; and for a lower-powered GPU like a 2080 Ti it can be 25 to 30% of the GPU at the moment of the burst. The GPU needs to run graphics too.

And this is another asynchronous process to add to the GPU's workload; you need to find the time slice during a frame to run it. I am sure that in a real game this is more complicated than in a benchmark.

5GB/s is the absolute limit of what we'll potentially see this generation (due to PS5 limitations) and even that I expect is unlikely.

So unless the shader array is running at >90% at the moment the burst transfer is required, even that should have a negligible impact on a decent GPU.

Remember we're unlikely to be talking about actual 5GB data transfers here over a full second, but rather a fraction of that over a handful of frames. Losing say 20% of shader performance (a high estimate IMO) for 10 consecutive frames at 60fps (as an example) isn't going to have a significant impact on the overall experience.
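
Writing that estimate out with the same illustrative numbers (the 20% hit and 10 affected frames are the post's own example, not measurements):

```python
# Worst-case frame-time cost of temporarily losing a slice of shader throughput.

fps = 60
frame_ms = 1000 / fps                  # 16.67 ms budget per frame
hit, affected_frames = 0.20, 10        # the example figures above

# If shading filled the whole frame, losing 20% of throughput stretches it by 1/0.8.
extra_ms_per_frame = frame_ms * hit / (1 - hit)
total_extra_ms = extra_ms_per_frame * affected_frames

print(f"{extra_ms_per_frame:.2f} ms extra per affected frame")  # ~4.17 ms worst case
print(f"{total_extra_ms:.1f} ms added across those 10 frames")  # ~41.7 ms in total
# Frames rarely sit at 100% shader occupancy, so the real cost would be lower still.
```
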
 
Bandwidth is not a problem, but for burst operations GPUs aren't the perfect architecture. This is the reason IHVs are thinking about adding ASICs to GPUs for decompression. If you need a burst of 5 GB/s, that's 10% of the GPU power for a high-end Ampere GPU; 10 to 11 GB/s is 20%; and for a lower-powered GPU like a 2080 Ti it can be 25 to 30% of the GPU at the moment of the burst. The GPU needs to run graphics too.

And this is another asynchronous process to add to the GPU's workload; you need to find the time slice during a frame to run it. I am sure that in a real game this is more complicated than in a benchmark.

EDIT: That said, this is much better than using the CPU, but not ideal.
Everything is speculation at this point.

Regardless... it's 1000x more preferable to be GPU bound than CPU bound.

And again, I know some people are concerned with how we get there... but the goal is to have dedicated hardware handle this in the future. They simply need a stop-gap implementation until it can be developed and deployed to a large enough user base.
 
5GB/s is the absolute limit of what we'll potentially see this generation (due to PS5 limitations) and even that I expect is unlikely.

So unless the shader array is running at >90% at the moment the burst transfer is required, even that should have a negligible impact on a decent GPU.

Remember we're unlikely to be talking about actual 5GB data transfers here over a full second, but rather a fraction of that over a handful of frames. Losing say 20% of shader performance (a high estimate IMO) for 10 consecutive frames at 60fps (as an example) isn't going to have a significant impact on the overall experience.

Don't forget the numbers are the decompression rate, not the speed of the SSD. If the CPU code is performant enough, the rate can go as high as 11 GB/s on PS5.

No, we already have one game where the decompression rate is 5 GB/s, Ratchet and Clank: Rift Apart, and they aren't limited by the SSD but by the code they use on the CPU. They have the same problem as the Matrix Awakens demo: they can't initialize the entities of a level fast enough because they use a game-thread and render-thread architecture in the CPU code. They said in the Spider-Man post-mortem that this was OK for that game, but it was beginning to be a limitation, probably thinking about R&C Rift Apart. And they will probably tackle the problem for Spider-Man 2 or Wolverine, depending on the time they need to change the CPU code architecture.

People forget that CPU code can improve a lot in many game engines. I am sure that if Epic tackled the problem, the Matrix Awakens demo could run at 60 fps on a PC with a powerful CPU. Going from an OOP architecture to ECS-driven CPU code is a very long road, though, maybe for Unreal Engine 6.

EDIT: Some parts of UE5 use ECS, like Mass AI, but the bottleneck begins to appear in other parts of the code, like the game thread or the render thread I suppose.
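
For anyone unfamiliar with the distinction, here's a toy contrast of the two CPU-side styles being discussed. It isn't anyone's actual engine code, just a sketch of why the ECS layout is easier to batch and hand off to worker threads than per-object updates on a single game thread.

```python
# Toy comparison: per-object OOP updates vs an ECS-style struct-of-arrays pass.

from dataclasses import dataclass

# OOP style: each entity owns its data and updates itself (pointer-chasing, hard to batch).
@dataclass
class Entity:
    x: float = 0.0
    vx: float = 1.0
    def update(self, dt: float) -> None:
        self.x += self.vx * dt

entities = [Entity() for _ in range(1000)]
for e in entities:
    e.update(1 / 60)

# ECS style: components live in flat parallel arrays; one system updates them in bulk,
# which is trivial to split across a job system instead of serialising on the game thread.
xs = [0.0] * 1000
vxs = [1.0] * 1000

def movement_system(xs, vxs, dt):
    for i in range(len(xs)):
        xs[i] += vxs[i] * dt

movement_system(xs, vxs, 1 / 60)
```
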
 
Don't forget the numbers are the decompression rate, not the speed of the SSD. If the CPU code is performant enough, the rate can go as high as 11 GB/s on PS5.

I'm not sure that is the case. There is no specific detail in the figures provided so far as to whether they are using the pre- or post-decompression GB/s, but pre makes more sense IMO. Different workloads have vastly different compression ratios, which would make the results a bit meaningless if they were the post-decompression numbers.

Also, the 7GB/s result they demonstrated in the CPU vs GPU test specifically for the DirectStorage 1.1 demo seems suspiciously close to the throughput available on today's fastest NVMe drives.
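
The workload dependence is easy to see with made-up ratios; the 7 GB/s input figure below is just the number mentioned above, and the ratios are purely illustrative:

```python
# Same decompressor capability, very different "GB/s" depending on which side you quote.

def rates(compressed_gb: float, ratio: float, seconds: float):
    """Return (input GB/s, output GB/s) for a stream with the given compression ratio."""
    return compressed_gb / seconds, compressed_gb * ratio / seconds

for ratio in (1.5, 2.0, 3.0):                 # assumed ratios for different asset types
    i, o = rates(7.0, ratio, 1.0)             # 7 GB of compressed input per second
    print(f"ratio {ratio}: {i:.0f} GB/s in -> {o:.1f} GB/s out")
# The input figure stays fixed at 7 GB/s while the output figure swings from ~10.5 to 21 GB/s,
# which is why quoting post-decompression numbers would be workload-dependent.
```
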
 
I'm not sure that is the case. There is no specific detail in the figures provided so far as to whether they are using the pre- or post-decompression GB/s, but pre makes more sense IMO. Different workloads have vastly different compression ratios, which would make the results a bit meaningless if they were the post-decompression numbers.

Also, the 7GB/s result they demonstrated in the CPU vs GPU test specifically for the DirectStorage 1.1 demo seems suspiciously close to the throughput available on today's fastest NVMe drives.

I am speaking about the Nvidia benchmark. ;) I have never seen a 35 GB/s SSD. This is the only objective measure we have, and we know exactly which GPU they used. It is far from perfect and not a game or real game content, but at least it gives an idea.
 
I am speaking about the Nvidia benchmark. ;) I have never seen a 35 GB/s SSD.

I don't think the benchmark would require a fast SSD. It's only measuring GPU capability, so the compressed data could be stored in system RAM or even GPU memory. I would take 35GB/s (and the other results) to be the input speed which the GPU is able to decompress in real time, not the output rate, which would vary massively by workload.
 
I don't think the benchmark would require a fast SSD. It's only measuring GPU capability, so the compressed data could be stored in system RAM or even GPU memory. I would take 35GB/s (and the other results) to be the input speed which the GPU is able to decompress in real time, not the output rate, which would vary massively by workload.

It is easy to find out, because the file for the benchmark is available. If you know how much data you are decompressing, it is easy to find the truth. I am pretty sure this is the decompressed speed. There is no guesswork; I am sure someone will do the measurement.
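
A measurement along those lines is simple to sketch. zlib is used here purely as a stand-in for the actual GPU/GDeflate path and the payload is synthetic, so the absolute numbers mean nothing, but it shows how knowing both the compressed and decompressed sizes gives you both rates from one timed pass:

```python
# Timing one decompression pass yields both the input-rate and output-rate figures.

import os, time, zlib

raw = bytes(range(256)) * (1 << 15)           # ~8 MB of synthetic, highly compressible payload
compressed = zlib.compress(raw, level=6)

start = time.perf_counter()
out = zlib.decompress(compressed)
elapsed = time.perf_counter() - start

in_rate = len(compressed) / elapsed / 1e9     # GB/s of compressed input consumed
out_rate = len(out) / elapsed / 1e9           # GB/s of decompressed output produced
print(f"in: {in_rate:.2f} GB/s, out: {out_rate:.2f} GB/s, ratio: {len(out)/len(compressed):.1f}")
```
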
 
I feel you're still approaching it from a sustained-throughput mindset with that remark. The issue isn't whether you need to load new data every single frame, but that if it's streaming in real time it might need to stream that set of data on any given frame. Forget a hard stutter for the moment; let's just say it's textures being slow to load in as you go into a new room while data streams in.

We can argue about how much of a delay ends up noticeable and detracting from the experience, but going by the numbers: if it's half a second, then 500 MB becomes 1 GB/s; 0.25 seconds (15 frames at 60 fps) is 2 GB/s; and so on, doubling each time. So while 500 MB (or whatever small number) may seem small, in a latency/burst scenario the processing requirement escalates quickly.
Unless you get proper 'tiled assets', at which point bandwidth is moot. Two strategies for the same problem: one is to load massive datasets very quickly, the other is to optimise data access down to far smaller datasets. I feel we shouldn't really be caring about multi-GB/s so much as about more refined games, but perhaps that's impossible/unrealistic for reasons I don't understand? Considering how long ago Rage and Trials integrated tiled textures/streamed content, there's been very little real movement in that field. Is that simply because the engines (UE, Unity, etc.) aren't using them, and short of a proprietary engine, everyone's going to be reliant on the 'brute force' solution?
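
As a rough illustration of why tiling changes the bandwidth picture (the tile and texture sizes below are assumptions for the sketch, not figures from any particular engine):

```python
# Streaming only the tiles a frame actually samples vs streaming whole textures.

TILE_BYTES = 128 * 128 * 1          # a 128x128 block-compressed tile at ~1 byte/texel, ~16 KB

def megabytes_for(tiles: int) -> float:
    return tiles * TILE_BYTES / 1e6

full_4k_texture_mb = (4096 * 4096 * 1) / 1e6   # ~16.8 MB per 4K texture at the same 1 byte/texel
visible_tiles = 40                             # assumed: only a slice of the texture is on screen

print(f"whole texture: {full_4k_texture_mb:.1f} MB, visible tiles only: {megabytes_for(visible_tiles):.2f} MB")
# ~16.8 MB vs ~0.66 MB per material: the tiled approach needs a small fraction of the
# streaming bandwidth for the same on-screen result, which is the 'optimise data access' strategy.
```
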
 
Considering how long ago Rage and Trials integrated tiled textures/streamed content, there's been very little real movement in that field. Is that simply because the engines (UE, Unity, etc.) aren't using them, and short of a proprietary engine, everyone's going to be reliant on the 'brute force' solution?

There was a combination of factors associated with the idea of tiled megatextures which prevented it from gaining more mainstream adoption. Asset creation into one large megatexture was a challenge: although it technically allowed for greater diversity of textures (everything can theoretically be unique), in practice time constraints meant that artists were re-using texture "stamps". It was still better than the regularly ordered grids you'd get from tiling textures over a landscape, but it was far from ideal.

Storage requirements and access patterns were another factor. Getting it to fit into a reasonably sized shipping product meant that some areas had to be greatly compressed, much more so than with traditional texture streaming techniques.

Combined, that meant the industry at large focused more on improving traditional texture streaming techniques (lots of smaller texture files) versus larger megatextures. It was not only faster WRT asset creation, but it also generally leads to consistently higher texture quality overall, albeit at the cost of having to find ways to hide obviously repeating tiled textures.

Regards,
SB
 
Well, if megatexturing wasn't possible due to storage requirements, that'll be just as much the case with super-fast SSDs, surely? As mentioned earlier in this conversation, just how much data can a game realistically use given production limitations? I guess you could have a smaller download and bake textures for use via procedural generation. Not necessarily JIT, but in a background thread writing files to storage ahead of them being accessed. You could have 10 GB of current textures, and a buffer of 10 GB of next-area textures generated over 15 minutes of play in the current area. A complicated solution, though! The other obvious solution would be streaming over the internet, but that's a bit overkill and inefficient.
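
The arithmetic on that buffer idea is pretty forgiving (using the post's own hypothetical 10 GB over 15 minutes figures):

```python
# Average generation/write rate needed to bake the next area's textures in the background.

budget_gb = 10
window_min = 15
rate_mb_s = budget_gb * 1000 / (window_min * 60)
print(f"{rate_mb_s:.1f} MB/s")   # ~11.1 MB/s of generated texture data written ahead of time
```
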
 
Well, if megatexturing wasn't possible due to storage requirements, that'll be just as much the case with super-fast SSDs, surely? As mentioned earlier in this conversation, just how much data can a game realistically use given production limitations? I guess you could have a smaller download and bake textures for use via procedural generation. Not necessarily JIT, but in a background thread writing files to storage ahead of them being accessed. You could have 10 GB of current textures, and a buffer of 10 GB of next-area textures generated over 15 minutes of play in the current area. A complicated solution, though! The other obvious solution would be streaming over the internet, but that's a bit overkill and inefficient.

Storage requirements don't make it impossible, they just limit how unique and detailed the textures can be. As id Software did with Rage, you can have higher-quality textures in some areas at the expense of lower (in the case of Rage, far lower) quality textures in other areas. Keep in mind that Rage also needed to ship on DVDs.

You'd have the same potential issue even with the traditional method of using many small textures if you wanted to attempt to have unique textures everywhere.

Megatextures do solve some problems that arise with traditional texturing in games, however it doesn't come without its own set of issues that the developer has to deal with.

Larger amounts of memory likely helped make megatextures less desirable during the past console generation, as you could just keep more textures in memory. With significant increases in the amount of video memory for consoles likely a thing of the past, we may see a renewed interest in megatextures.

Being able to access and use just a fraction of any given texture will potentially be a huge benefit in games going into the future. And if you are going to have to do that anyway then megatextures might be seen as preferable to many smaller textures. Technology like SFS could eventually help make megatextures much more in demand than they currently are.

As it stands, we're sort of in a transition period WRT console games. Industry inertia is going to lean towards using existing techniques and only changing when a developer feels they absolutely have to. Hence we see relatively limited attempts to leverage the fast storage available. And similar to that we see most developers still texturing in their games in a traditional manner. Going forward, developers will have to leverage both fast storage as well as more efficient texture streaming techniques if they wish to advance the state and quality of textures within games.

However, until they feel they have to overhaul their engines in order to do that, they'll (in general) need to be forced to do it, otherwise they'll continue to do things as they've done it in the past. And they won't necessarily feel pressured to do it until another developer greatly advances the state of what is possible such that they feel they need to change things in order to "keep up." While Insomniac did some nice things with fast storage and streaming, it wasn't so incredibly advanced over what developers are already doing that they've felt any real pressure to significantly overhaul what they are doing.

Compounding this is that all the media buzz right now is RT RT RT. So developers are mostly focused on that at the expense of leveraging faster storage and more efficient texture streaming. Two things that, IMO, could bring similar or greater graphical improvements to games as the relatively limited RT that hardware (especially consoles) can currently leverage.

It's sort of like Audio. It could make an incredibly huge change in how games are experienced if used well, however, it gets relegated to ... well, if we have the time for most developers (who then don't have the time) due to lack of media attention which leads to lack of consumer demand.

Regards,
SB
 