Next-Generation NVMe SSD and I/O Technology [PC, PS5, XBSX|S]

Shifty Geezer

uber-Troll!
Moderator
Legend
I know what virtual texturing and virtual geometry (Nanite) are. One day Unreal Engine will use only Nanite for all geometry, plus virtual texturing. Since my first post I have said there is more than geometry and textures to load. It is cool to use virtual geometry, but if the game uses raytracing you need a BVH and some offscreen geometry. The game probably streams the BVH for static geometry. For the moment, it is using proxy geometry and not Nanite triangle data.

If the game uses this type of rendering, mixing SDF tracing and triangle-based raytracing, you can stream two data structures for static geometry. In the paper the SDF is generated at runtime.


After that, like I said, animation, sound, Alembic cache animation/destruction, or any other baked data can be streamed too. And if I remember the 2019 GDC Spider-Man postmortem well, animations take up tons of space on the disc.
RT does definitely throw a spanner in the mix. Of course, the flip side is people are coming off the back of conventional memory management. If prior to RTRT we were using SSDs, far slower than now, with effective virtual assets, the whole thought process towards an RT solution would be different. With things as they are now, the momentum is towards faster storage for all the reasons you state.

Another key point is the end-target. Potentially, for the Dr. Strange example you give, we need as many GB/s as possible. But realistically, how much do real games need? Is it worth designing hardware around a tiny percentile versus the 95th percentile, or even the 99th? On the one hand, it's always nice to give devs as much freedom as possible. On the other, limits have to be drawn, and I posit the alternative solution would be better overall, resulting in more efficient hardware that's either cheaper and lower in power draw or has a better processing/data ratio. But then maybe that'd be constraining and the future is inevitably Big Data? But then how do you get past the very real limits imposed by mammoth datasets, storage size and the cost to produce them?
 

chris1515

Legend
Supporter
RT does definitely throw a spanner in the mix. Of course, the flip side is people are coming off the back of conventional memory management. If prior to RTRT we were using SSDs, far slower than now, with effective virtual assets, the whole thought process towards an RT solution would be different. With things as they are now, the momentum is towards faster storage for all the reasons you state.

Another key point is the end-target. Potentially, for the Dr. Strange example you give, we need as many GB/s as possible. But realistically, how much do real games need? Is it worth designing hardware around a tiny percentile versus the 95th percentile, or even the 99th? On the one hand, it's always nice to give devs as much freedom as possible. On the other, limits have to be drawn, and I posit the alternative solution would be better overall, resulting in more efficient hardware that's either cheaper and lower in power draw or has a better processing/data ratio. But then maybe that'd be constraining and the future is inevitably Big Data? But then how do you get past the very real limits imposed by mammoth datasets, storage size and the cost to produce them?

By the end of this generation, I expect virtual texturing to be in nearly all game engines. Virtual geometry will not be very common.
 

dobwal

Legend
Yes, but 40 GB/s??


For what engines? Is that a perfect streaming engine, or the modern sort? For comparison, Trials had perfect streaming off HDD, every single element of the game could be put into one level.

But this is going too OT into a software architecture debate. Presently, the software state is what it is and isn't changing in a big way any time soon. SSDs and compression are going to get faster and faster. Lots of big numbers to post here and ogle over, and get excited when they get bigger and bigger. ;)

GB/s is just a way to make it easier for us to conceptualize RAM/storage data rates. But 40 GB per second is also 40 MB per millisecond, or 40 KB per microsecond. An app may never need 40 GB at a sustained rate over a full second. But can you say that, because of that, the app will never readily benefit from 40 MB over a millisecond, 40 KB over a microsecond or 40 bytes over a nanosecond?
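The unit conversions above can be sanity-checked with a quick sketch (decimal units assumed throughout):

```python
# Express a sustained 40 GB/s rate over successively smaller time windows.
rate_bytes_per_s = 40e9  # 40 GB/s, decimal gigabytes

per_millisecond = rate_bytes_per_s / 1e3  # bytes delivered per ms
per_microsecond = rate_bytes_per_s / 1e6  # bytes delivered per us
per_nanosecond = rate_bytes_per_s / 1e9   # bytes delivered per ns

print(per_millisecond / 1e6)  # 40.0 -> 40 MB per millisecond
print(per_microsecond / 1e3)  # 40.0 -> 40 KB per microsecond
print(per_nanosecond)         # 40.0 -> 40 bytes per nanosecond
```

So the arithmetic in the post holds: the same rate is 40 MB/ms, 40 KB/µs and 40 B/ns.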

All else considered, more bandwidth means better latency performance across all memory requests.
 
Last edited:

cwjs

Regular
For what engines? Is that a perfect streaming engine, or the modern sort? For comparison, Trials had perfect streaming off HDD, every single element of the game could be put into one level.
Trials as in the 2D game where you go one direction on a fixed track at a fixed max speed? I hate to break it to you, but most games have a different average use case.
 

Albuquerque

Red-headed step child
Veteran
All else considered, more bandwidth means better latency performance across all memory requests.
This is a somewhat common misperception; memory latency can very much be unlinked from throughput, and the lowest-latency memory is never de facto the highest-throughput. Although, to be fair, memory latencies are crazy multivariable calculus to get your head wrapped around, consisting of dozens of timing values in modern DDR4 that all interweave in seemingly mysterious ways to create the final performance result. I feel a better way to conceptualize this difference between storage bandwidth and latency might be to think about it from a networking perspective.

For those of you who play online games, you already understand how latency affects your game, and I'm sure some of you have a nice fat pipe at home but still occasionally get on a server with abysmal ping times. Gig pipe, slow ping? Yeah, you can reliably and consistently push gigabit++ (tens of gigs, hundreds of gigs, even terabit-scale) speeds with latencies measured in full seconds. How you accomplish this depends on the protocol; TCP will need correctly sized buffers, further helped by TCP window scaling and selective acknowledgement (SACK) to maintain continuity of speed. UDP on the other hand could just blast at full volume without caring.
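The buffer-sizing point can be made concrete with a bandwidth-delay product sketch; the link figures below are illustrative, not from any real connection:

```python
# Bandwidth-delay product (BDP): the amount of data that must be
# "in flight" (buffered/unacknowledged) to keep a link fully utilized.
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bytes in flight needed to saturate a link of the given rate and RTT."""
    return bandwidth_bits_per_s * rtt_s / 8

# Fat pipe, slow ping: 1 Gbit/s with a 200 ms RTT needs a ~25 MB window,
# far beyond the 64 KB ceiling of un-scaled TCP windows.
print(round(bdp_bytes(1e9, 0.200) / 1e6, 1))   # 25.0 (MB)

# Modest pipe, fast ping: 20 Mbit/s at 10 ms RTT needs only ~25 KB.
print(round(bdp_bytes(20e6, 0.010) / 1e3, 1))  # 25.0 (KB)
```

This is why window scaling matters on high-bandwidth, high-latency paths: without a window at least as large as the BDP, the sender stalls waiting for acknowledgements.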

The converse can also be true: you could reasonably connect to a server very close to you with a sub-10 ms ping, but only be able to establish a few tens of megabits in bandwidth between you. Lower-bandwidth connections end up needing far less buffering (which can have its own knock-on latency effect) and could reasonably operate with smaller frame sizes to force "more timely" updates in smaller chunklets. In a sense, lower-bandwidth connections might feasibly help latency, depending on a few of the factors I mentioned...

Anyway, all that to say: latency and bandwidth are related, to be sure, but lower latency is not directly correlated with higher bandwidth, for myriad reasons.
 

dobwal

Legend
This is a somewhat common misperception; memory latency can very much be unlinked from throughput, and the lowest-latency memory is never de facto the highest-throughput. Although, to be fair, memory latencies are crazy multivariable calculus to get your head wrapped around, consisting of dozens of timing values in modern DDR4 that all interweave in seemingly mysterious ways to create the final performance result. I feel a better way to conceptualize this difference between storage bandwidth and latency might be to think about it from a networking perspective.

For those of you who play online games, you already understand how latency affects your game, and I'm sure some of you have a nice fat pipe at home but still occasionally get on a server with abysmal ping times. Gig pipe, slow ping? Yeah, you can reliably and consistently push gigabit++ (tens of gigs, hundreds of gigs, even terabit-scale) speeds with latencies measured in full seconds. How you accomplish this depends on the protocol; TCP will need correctly sized buffers, further helped by TCP window scaling and selective acknowledgement (SACK) to maintain continuity of speed. UDP on the other hand could just blast at full volume without caring.

The converse can also be true: you could reasonably connect to a server very close to you with a sub-10 ms ping, but only be able to establish a few tens of megabits in bandwidth between you. Lower-bandwidth connections end up needing far less buffering (which can have its own knock-on latency effect) and could reasonably operate with smaller frame sizes to force "more timely" updates in smaller chunklets. In a sense, lower-bandwidth connections might feasibly help latency, depending on a few of the factors I mentioned...

Anyway, all that to say: latency and bandwidth are related, to be sure, but lower latency is not directly correlated with higher bandwidth, for myriad reasons.

I am not saying that latency and bandwidth are directly correlated on a hardware level. That's obvious from looking at system memory vs VRAM, where system memory favors latency reduction over bandwidth because of the serial nature of a CPU. But bandwidth and latency are intrinsically linked, as a GPU's dependence on greater bandwidth is still a matter of shortening the time it takes for data to reach its destination. VRAM is focused on reducing the latency of memory requests across thousands of processors operating in parallel, and the latency performance of a single request takes a back seat.

Inevitably the purpose of increasing bandwidth is to minimize latency. Can latency be reduced through other avenues? Sure. But that doesn’t change the motivating factor of why bandwidth is growing across the space.
 
Last edited:

Shifty Geezer

uber-Troll!
Moderator
Legend
Trials as in the 2D game where you go one direction on a fixed track at a fixed max speed? I hate to break it to you, but most games have a different average use case.
I knew that'd come up! Just because I used Trials as an example doesn't mean I equated every game to it! There are only two games I know of that pushed virtualised textures, Rage and Trials. The rest is theory. Reliance on examples isn't great for theoretical discussions. ;)

Sebbbi didn't go on to say, in his numerous posts on the matter, that "VT was great for our linear scroller but is otherwise limited." But for those who weren't around for those discussions years ago, I can't really do anything to tap you into what was said and what we learnt. I can't even recommend you check Sebbbi's post history, because it's long and deep and simply requires too much work to catch up! If you don't understand, that's okay, but the argument in favour of virtual assets isn't tied to a linear side-scroller...
 

Shifty Geezer

uber-Troll!
Moderator
Legend
I am not saying that latency and bandwidth are directly correlated on a hardware level. That's obvious from looking at system memory vs VRAM, where system memory favors latency reduction over bandwidth because of the serial nature of a CPU. But bandwidth and latency are intrinsically linked, as a GPU's dependence on greater bandwidth is still a matter of shortening the time it takes for data to reach its destination. VRAM is focused on reducing the latency of memory requests across thousands of processors operating in parallel, and the latency performance of a single request takes a back seat.

Inevitably the purpose of increasing bandwidth is to minimize latency. Can latency be reduced through other avenues? Sure. But that doesn’t change the motivating factor of why bandwidth is growing across the space.
I'm identifying two uses of the idea of 'latency' here. The usual one is the time to start receiving data. Your use here seems to be more about the time it takes to complete access to the necessary info. The latency you talk about is the time a process could be stalled waiting for some texture data. If you need 40 MB of data, that's available much faster on a 40 GB/s system than on a 4 GB/s system, resulting in overall more potential data accesses and less waiting around (latency).
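The 40 MB example works out as a factor-of-ten difference in stall time, which a two-line sketch makes explicit:

```python
# Time for a process to receive a fixed chunk of data at a sustained rate.
def transfer_time_ms(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Milliseconds to move size_bytes at rate_bytes_per_s."""
    return size_bytes / rate_bytes_per_s * 1e3

print(transfer_time_ms(40e6, 40e9))  # 1.0  -> 1 ms on a 40 GB/s system
print(transfer_time_ms(40e6, 4e9))   # 10.0 -> 10 ms on a 4 GB/s system
```

At 30 fps a frame is ~33 ms, so the slower system spends nearly a third of a frame on that one fetch while the faster one barely notices it.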

I agree with that. I don't think that defines the motivating factor for why bandwidth is growing, though. I think it's growing because it can, and people will always want more! No-one's likely to stop and say, "this is good enough." As more resources become available, such as processing power, the systems that consume them increase in complexity, hence you never have an excess until you reach the limits imposed at the human/physics level. And that doesn't make it useless or misplaced or redundant or non-beneficial. It just might not be ideal in contrast to what's possible with a clean-slate design.
 

chris1515

Legend
Supporter
I knew that'd come up! Just because I used Trials as an example doesn't mean I equated every game to it! There are only two games I know of that pushed virtualised textures, Rage and Trials. The rest is theory. Reliance on examples isn't great for theoretical discussions. ;)

Sebbbi didn't go on to say, in his numerous posts on the matter, that "VT was great for our linear scroller but is otherwise limited." But for those who weren't around for those discussions years ago, I can't really do anything to tap you into what was said and what we learnt. I can't even recommend you check Sebbbi's post history, because it's long and deep and simply requires too much work to catch up! If you don't understand, that's okay, but the argument in favour of virtual assets isn't tied to a linear side-scroller...

I think you need to see the Unreal Fest Day 2 talk where they discuss going from a proprietary engine to Unreal Engine. They like virtual texturing, but studios need to ship games, and it is sometimes difficult to do huge R&D. It could turn into another debate: will we end up with only two game engines, Unity and Unreal? I am sure, for example, that Brian Karis would not have had the luxury to do so much R&D on Nanite in a studio.
 

Shifty Geezer

uber-Troll!
Moderator
Legend
Umm...have you missed my, like, every post on this matter?
Is that simply because the engines (UE, Unity, etc) aren't using them and short of a proprietary engine, everyone's going to be reliant on the 'brute force' solution?

With SSD storage and compression getting so fast, it seems that'll be the method of choice and tiled resources will be on the back burner for years to come, if not completely relegated to 'novelty'.

And then the hardware leads, the software plays to its strengths, and you get an entrenched paradigm. By the time you get to a point where the software paradigm could shift thanks to the hardware, the opportunity for something new has passed.

Presently, the software state is what it is and isn't changing in a big way any time soon.

But that waste might be necessary for various reasons and the perfect streaming engine might be impossible. ¯\_(ツ)_/¯

There would be value in re-evaluating whole rendering systems, but that's now limited by business decisions.
 

chris1515

Legend
Supporter
Umm...have you missed my, like, every post on this matter?

I know, but in the end we will see other engines go to virtual texturing. From a rendering point of view I don't think there are tons of valid reasons to stay with a proprietary engine for AAA games using triangles, and in UE developers can customize the code. From a CPU point of view this is currently different, because I am sure Epic thinks a lot about moving more and more parts of the engine to ECS and improving CPU performance. It will be a very long process because it impacts tools. I think exclusive studios will stay with proprietary engines for a long time, but it will be more and more difficult for multiplatform devs, short of having a title pushing multithreaded CPU very far. And many proprietary engines are using the same architecture, relying a lot on one gameplay thread and one render thread.

But this is another debate.
 
Last edited:

Jawed

Legend
But from a rendering point of view I don't think there are tons of valid reasons to stay with a proprietary engine for AAA games using triangles, and one where developers can customize the code.
This is the crux. Which Sony platform AAA games are not going to end-up on PC?...

The writing is on the wall.
 

chris1515

Legend
Supporter
This is the crux. Which Sony platform AAA games are not going to end-up on PC?...

The writing is on the wall.

An engine is much more than a renderer; they would need to change tools, and some first parties have very performant multithreaded CPU code, like ND or Guerrilla Games. I think first parties will stay with proprietary engines for a long time. I expect Rockstar North and Take-Two to keep a proprietary engine too, EA to keep Frostbite, and Ubi to keep their different engines. But for AA or independent AAA studios this is a very difficult task.

Guerrilla Games said, in the notes of their SIGGRAPH presentation, that to improve sea water rendering they need 1 polygon per pixel:
Through the project, the triangle density wasn’t high enough to support what we wanted. For example, we had sims with high splashes, which had to get toned down to remove obvious polygonal edges. That’s why the deformations on rivers are so flat, and why we aren’t using deformations for water impact effects from machines. Also, since vertex color is used for properties like foam it can’t create features smaller than triangles. This causes undersampling problems, like foam popping on the front side of the breaking wave.
Several times during the project we dug into the system to see if we could squeeze more triangles out of it. At the very end we added a system to raise the tessellation level around the waves, which improved the detail a lot. We reduced the tessellation level elsewhere slightly to compensate. We can probably push the deformations a bit further with this tech.
Ideally, we’d like one vertex per pixel. But of course rendering a trimesh like that is really inefficient for the GPU. So maybe a splat-based solution is better fit.
One triangle per pixel is of course the same goal as Nanite, it’s a common theme these days…
 
Last edited:

Albuquerque

Red-headed step child
Veteran
I'm identifying two uses of the idea of 'latency' here. The usual one is the time to start receiving data. Your use here seems to be more about the time it takes to complete access to the necessary info. The latency you talk about is the time a process could be stalled waiting for some texture data. If you need 40 MB of data, that's available much faster on a 40 GB/s system than on a 4 GB/s system, resulting in overall more potential data accesses and less waiting around (latency).
Yes, this makes sense. To your point, this doesn't follow the typical definition of latency; however, I'm not sure what a better term might be. It's analogous to how an ecommerce site would measure "page completion time": the time between a page request being received and the page having been fully transmitted. In that specific instance, the time calculation doesn't generally consider network latency in the sense of latency incurred by the transport layer (e.g. the internet).

Good catch :)
 

cwjs

Regular
Sebbbi didn't go on to say, in his numerous posts on the matter, that "VT was great for our linear scroller but is otherwise limited." But for those who weren't around for those discussions years ago, I can't really do anything to tap you into what was said and what we learnt. I can't even recommend you check Sebbbi's post history, because it's long and deep and simply requires too much work to catch up! If you don't understand, that's okay, but the argument in favour of virtual assets isn't tied to a linear side-scroller...
Considering how long ago Rage and Trials integrated tiled textures/streamed content, there's been very little real movement in that field. Is that simply because the engines (UE, Unity, etc.) aren't using them, and short of a proprietary engine, everyone's going to be reliant on the 'brute force' solution?

Ok, I guess I misunderstood -- you're just talking about Virtual Texturing? It's pretty ubiquitous these days -- people don't talk about it much because there's not much to say anymore. It's a checkbox option in both ue4 and (finally) unity, a lot of AAA games use it, occasionally you see a new gdc talk or something about their innovations. I feel like basically everyone's custom engines have it -- halo infinite does, ubisofts games all do, cryengine does... Of course, it's not always used universally (I have to admit I don't really know the tradeoffs off the top of my head) but it's not a rare technique.
 

Ethatron

Regular
Supporter
Yes, this makes sense. To your point, this doesn't follow the typical definition of latency; however, I'm not sure what a better term might be.
What is meant by latency here is propagation delay, and the term for the other, bandwidth-related part is transmission delay. You can substitute latency for delay, e.g. propagation latency and transmission latency. If you want, you can add round-trip in front to make clear it's the aggregate of asking and answering time together.

Propagation delay
Amount of time required for a message to travel from the sender to receiver, which is a function of distance over speed with which the signal propagates.

Transmission delay
Amount of time required to push all the packet’s bits into the link, which is a function of the packet’s length and data rate of the link.
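The two definitions above combine additively for a single message; here's a toy sketch with illustrative numbers (not from any real link) showing why raising bandwidth shrinks only the transmission term:

```python
# One-way delivery time = propagation delay + transmission delay.
def propagation_delay_s(distance_m: float, signal_speed_m_s: float) -> float:
    """Time for the signal front to travel the link: distance / speed."""
    return distance_m / signal_speed_m_s

def transmission_delay_s(packet_bits: float, link_rate_bits_s: float) -> float:
    """Time to push all the packet's bits onto the link: length / rate."""
    return packet_bits / link_rate_bits_s

# Illustrative: 3000 km of fibre (~2e8 m/s) carrying a 12,000-bit packet.
prop = propagation_delay_s(3_000_000, 2e8)  # 0.015 s = 15 ms either way

# On a 1 Mbit/s link, transmission delay is comparable to propagation...
slow = transmission_delay_s(12_000, 1e6)    # 0.012 s = 12 ms

# ...on a 10 Gbit/s link it is negligible; the bandwidth changed
# enormously, but the propagation delay (the "ping" floor) did not.
fast = transmission_delay_s(12_000, 10e9)   # 1.2e-06 s = 1.2 us

print(round((prop + slow) * 1e3, 1))  # 27.0 -> 27 ms total
print(round((prop + fast) * 1e3, 1))  # 15.0 -> ~15 ms total
```

This is the mechanism behind the earlier networking analogy: more bandwidth attacks transmission delay only, which is why a fat pipe can still have a slow ping.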
 

pjbliverpool

B3D Scallywag
Legend
Returnal, a previously PS5-exclusive game, is recommending 32 GB of RAM for the PC version:


I'll wager this is because the game depends on the PS5's fast IO, and the additional RAM will be used for streaming to negate the need for a fast IO system on the PC.

It could be a simple omission, but note the SSD recommendation at min spec with 16GB RAM but no such recommendation at the higher spec with 32GB.
 

Shifty Geezer

uber-Troll!
Moderator
Legend
What are the streaming features of Returnal that require fast IO? I don't recall it being called out for doing anything special, in contrast to, say, R&C.
 