Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

I’m more forgiving of shader comp stutter as the engine is trying to work around an underlying issue with DirectX. Traversal stutter makes no sense to me with today's massive system RAM pools, SSDs and fast PCIe.
A big chunk of the stutter does not come from streaming in assets but from initializing objects in game. Some objects can also have scripts attached that start when the object first appears in a game. Think, as an example, of spawning an NPC in a game like Avowed or Oblivion. Most of those NPCs are not models you load straight from disk; they are usually built with the in-game character editor. This is all game-specific code and not engine code. So what you end up doing is loading a base model and tweaking a lot of parameters: changing material values, textures or even whole materials, swapping the hair. Heck, maybe even generating some textures on the fly depending on settings like the age of the NPC. It's up to the game devs to make this code run multi-threaded themselves (or to time-slice it, i.e. spread the initialization across multiple frames).

So faster SSDs or more PCIe bandwidth will not help here. Single-core CPU speed is what matters, and that has not really improved much in recent years. Tim Sweeney touched on this in his interview with Lex: he wants to make this all multi-threaded one day. But that is too much of a burden on gameplay programmers right now, and it would not have been a good idea given how many external studios use the engine. He is not the only one with this view; here is a Remedy employee who says the same.
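To make the time-slicing idea concrete, here is a minimal sketch in plain C++ (not actual UE5 API; NpcSpawnRequest, the step list and the 0.5 ms budget are all invented for illustration). Each spawn is broken into small steps, and an initializer runs as many as fit into a per-frame budget:

```cpp
#include <chrono>
#include <deque>
#include <functional>

// Hypothetical illustration, not Unreal API: each pending NPC carries a list of
// small initialization steps (load base model, swap materials, generate textures, ...).
struct NpcSpawnRequest {
    std::deque<std::function<void()>> steps;
};

class NpcInitializer {
public:
    void Enqueue(NpcSpawnRequest req) { pending_.push_back(std::move(req)); }

    // Called once per frame from the game thread; stops when the time budget is spent,
    // so a heavy spawn is spread over several frames instead of hitching one of them.
    void Tick(std::chrono::microseconds budget = std::chrono::microseconds(500)) {
        const auto start = std::chrono::steady_clock::now();
        while (!pending_.empty() &&
               std::chrono::steady_clock::now() - start < budget) {
            NpcSpawnRequest& req = pending_.front();
            req.steps.front()();                          // run one small step
            req.steps.pop_front();
            if (req.steps.empty()) pending_.pop_front();  // this NPC is now fully initialized
        }
    }

private:
    std::deque<NpcSpawnRequest> pending_;
};
```

The price is exactly what is described above: the gameplay code can no longer assume the NPC is ready on the frame it asked for it.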
 
A big chunk of the stutter does not come from streaming in assets but from initializing objects in game. Some objects can also have scripts attached that start when the object first appears in a game. Think, as an example, of spawning an NPC in a game like Avowed or Oblivion. Most of those NPCs are not models you load straight from disk; they are usually built with the in-game character editor. This is all game-specific code and not engine code. So what you end up doing is loading a base model and tweaking a lot of parameters: changing material values, textures or even whole materials, swapping the hair. Heck, maybe even generating some textures on the fly depending on settings like the age of the NPC. It's up to the game devs to make this code run multi-threaded themselves (or to time-slice it, i.e. spread the initialization across multiple frames).

So faster SSDs or more PCIe bandwidth will not help here. Single-core CPU speed is what matters, and that has not really improved much in recent years. Tim Sweeney touched on this in his interview with Lex: he wants to make this all multi-threaded one day. But that is too much of a burden on gameplay programmers right now, and it would not have been a good idea given how many external studios use the engine. He is not the only one with this view; here is a Remedy employee who says the same.
So why on PC with more memory can't more things be initialized and loaded into memory upfront?

What you're describing to me sounds like a similar issue to shader compilation. Why can't these entities be "instanced" ahead of time so that they're easier to stream in at runtime?
 

So why on PC with more memory can't more things be initialized and loaded into memory upfront?
Because you'll have long loading periods. Your options are:

1) Preload everything with a 5-minute initialisation and cache the results to disk to load when needed
2) Pre-cache everything so there's no runtime procedural content and require 500 GB of storage
3) Create things on demand the moment they're needed, with a moment's delay

And of course there's 4): have content-generation threads that create the necessary instantiations on the side.
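A rough sketch of what option 4 can look like, in plain C++ with invented names (BuiltNpc, BackgroundBuilder); a real implementation would be fed by a request queue rather than free-running, but the shape is the same: a worker thread does the expensive generation, and the game thread only pays for collecting finished results.

```cpp
#include <atomic>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct BuiltNpc { /* meshes, materials, generated textures, ... */ };

class BackgroundBuilder {
public:
    BackgroundBuilder() : worker_([this] { Run(); }) {}
    ~BackgroundBuilder() { stop_ = true; worker_.join(); }

    // Game thread: grab whatever the worker finished since last frame.
    std::vector<std::unique_ptr<BuiltNpc>> DrainFinished() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<std::unique_ptr<BuiltNpc>> out;
        while (!finished_.empty()) {
            out.push_back(std::move(finished_.front()));
            finished_.pop();
        }
        return out;
    }

private:
    void Run() {
        while (!stop_) {
            auto npc = std::make_unique<BuiltNpc>();  // expensive generation happens off the game thread
            std::lock_guard<std::mutex> lock(mutex_);
            finished_.push(std::move(npc));
            // A real system would wait on a request queue here instead of looping freely.
        }
    }

    std::mutex mutex_;
    std::queue<std::unique_ptr<BuiltNpc>> finished_;
    std::atomic<bool> stop_{false};
    std::thread worker_;
};
```

The game thread would call DrainFinished() once per frame and only do the cheap final placement/registration itself.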
 
Because you'll have long loading periods. Your options are:

1) Preload everything with a 5-minute initialisation and cache the results to disk to load when needed
2) Pre-cache everything so there's no runtime procedural content and require 500 GB of storage
3) Create things on demand the moment they're needed, with a moment's delay

And of course there's 4): have content-generation threads that create the necessary instantiations on the side.
We have SSDs with 5 GB/s+ read speeds, PCs with 32-64 GB of RAM, and CPUs and GPUs capable of decompressing at 30 GB/s+ at load time.

They could precache more at the expense of some storage space... it wouldn't have to be everything. I'm sure people would appreciate fewer stutters at the very least.

Also, devs could do a better job of timing these instantiations so they don't affect gameplay at all.
 
Because you'll have long loading periods. Your options are:

1) Preload everything with a 5-minute initialisation and cache the results to disk to load when needed
2) Pre-cache everything so there's no runtime procedural content and require 500 GB of storage
3) Create things on demand the moment they're needed, with a moment's delay

And of course there's 4): have content-generation threads that create the necessary instantiations on the side.

I don’t know why 4 isn’t the default answer. There should always be background threads doing this stuff proactively given how underutilized modern systems are.
 
I do get the feeling PCs with 32-64 GB of system memory could do with more caching before a game is entered, to prevent frame-time hitches. At the end of the day, having usage scale with available memory seems like a positive thing instead of everyone always streaming at the exact same point regardless of PC configuration.
It does seem like the right way to do it, but I think such a system would increase the burden on QA. The game as a whole would become less deterministic, certain hard-to-reproduce bugs could appear, and QA would have to test with an even wider variety of hardware configurations than they already do.
 
So why on PC with more memory can't more things be initialized and loaded into memory upfront?
Even with everything being loaded into RAM up front, you still have a CPU cost on object initialization (which is what Pjotr is saying), and depending on your target framerate that cost may lead to a hitch. Hiding that cost can be very non-trivial, especially on an engine you didn't write.
 

So why on PC with more memory can't more things be initialized and loaded into memory upfront?

What you're describing to me sounds like a similar issue to shader compilation. Why can't these entities be "instanced" ahead of time so that they're easier to stream in at runtime?
You can do all that, and UE5 gives you a lot of tools to do it. But the engine can't do it for you, as there is a lot of game-specific code involved. Take the NPC example: it might wear a certain outfit based on the weather or time of day. If you load everything upfront, the outfit will not match when you finally spawn the NPC in. So now you need to separate the things you can load upfront from the things you can't. It gets complex, and doing it right costs the game developers extra development time.
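As a hedged illustration of that split (all names here, like PreloadedNpc and FinalizeAtSpawn, are invented, not engine API): the parts that don't depend on spawn-time state can be built early, while the weather- or time-dependent choice has to wait until the actual spawn.

```cpp
enum class Weather { Clear, Rain, Snow };

struct PreloadedNpc {
    // Everything that does NOT depend on spawn-time state: base mesh, skeleton,
    // shared materials. Safe to build during a loading screen or in the background.
};

struct SpawnedNpc {
    PreloadedNpc base;
    int outfitId = 0;
};

// Cheap step that must run at spawn time, because its inputs (weather, time of day)
// are only known at that moment.
SpawnedNpc FinalizeAtSpawn(const PreloadedNpc& base, Weather weather, float hourOfDay) {
    SpawnedNpc npc{base};
    npc.outfitId = (weather == Weather::Rain) ? 2
                 : (hourOfDay > 20.0f || hourOfDay < 6.0f) ? 1
                 : 0;
    return npc;
}
```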

Also, most gameplay code is written as: a) I need this object, so spawn the object; b) do something with the object. All in one game tick (frame). So for gameplay programmers it is a big shift in thinking when dealing with delayed initialization or threading. This is what both Tim was hinting at in the interview and Lea was getting at on Twitter. It's all possible, but it adds a lot of complexity and bugs.
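A tiny contrast of those two shapes in plain C++ (Npc, SpawnNpcNow and SpawnNpcAsync are made-up stand-ins, not Unreal gameplay API):

```cpp
#include <deque>
#include <functional>
#include <vector>

struct Npc { void GiveQuest(int /*questId*/) {} };

// Trivial stand-ins so the sketch is self-contained; in a real engine the expensive
// initialization would happen inside these.
static std::deque<Npc> g_npcs;
Npc* SpawnNpcNow() { g_npcs.emplace_back(); return &g_npcs.back(); }

// The "async" version just queues the callback; the game loop would run it once the
// NPC is actually ready, possibly several frames later.
static std::vector<std::function<void()>> g_pendingCallbacks;
void SpawnNpcAsync(std::function<void(Npc*)> onReady) {
    g_pendingCallbacks.push_back([onReady] { onReady(SpawnNpcNow()); });
}

void OnPlayerEnteredVillage() {
    // Typical gameplay-code shape: spawn and use in the same tick.
    // Simple to write, but the whole initialization cost lands on this one frame.
    Npc* npc = SpawnNpcNow();
    npc->GiveQuest(42);
}

void OnPlayerEnteredVillageDeferred() {
    // Deferred version: the follow-up logic moves into a callback that runs whenever
    // the spawn completes. This is the shift in thinking described above.
    SpawnNpcAsync([](Npc* npc) { npc->GiveQuest(42); });
}
```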

These kinds of optimizations are usually done very late in the development cycle, as the added complexity makes it hard to prototype new game ideas, and you usually only do them where needed. So another thing you need is good debugging tools to actually find these issues, so you can spend the development time you have on the most severe cases. These debugging systems were not there in early UE4; most of them appeared pretty late in UE4, and some only in UE5.
 
You can do all that, and UE5 gives you a lot of tools to do it. But the engine can't do it for you, as there is a lot of game-specific code involved. Take the NPC example: it might wear a certain outfit based on the weather or time of day. If you load everything upfront, the outfit will not match when you finally spawn the NPC in. So now you need to separate the things you can load upfront from the things you can't. It gets complex, and doing it right costs the game developers extra development time.

Also, most gameplay code is written as: a) I need this object, so spawn the object; b) do something with the object. All in one game tick (frame). So for gameplay programmers it is a big shift in thinking when dealing with delayed initialization or threading. This is what both Tim was hinting at in the interview and Lea was getting at on Twitter. It's all possible, but it adds a lot of complexity and bugs.

These kinds of optimizations are usually done very late in the development cycle, as the added complexity makes it hard to prototype new game ideas, and you usually only do them where needed. So another thing you need is good debugging tools to actually find these issues, so you can spend the development time you have on the most severe cases. These debugging systems were not there in early UE4; most of them appeared pretty late in UE4, and some only in UE5.

That all makes sense but the issue is that these have been known problems for multiple console generations now. At what point can we stop using the excuse that it’s complicated? Multi-threading and resource management aren’t going to get any easier as games get bigger in the future. Core counts and memory pools and interface bandwidths will only keep growing. Software needs to catch up.
 
@trinibwoy But this is an issue in software in general, not just games. Some problems are very easy to split up into work that can be threaded, but many are not. It's a problem that a lot of systems languages are thinking about, but operating systems have a huge part to play. Unfortunately it's just not an easy problem.
 
@Pjotr actually covered a lot of good info and background so I'll maybe just add my 2c in a few places.

So why on PC with more memory can't more things be initialized and loaded into memory upfront?
Worth remembering this is not a black and white issue here. As Pjotr mentioned, you can indeed do a lot of things up front and games do tend to do this. In fact you can think of HLOD and similar systems as precisely this - bake down a pile of complicated editor things into a simpler representation that can be resident most of the time without redoing all of that work. But there are a number of considerations worth mentioning:

1) A lot of these streaming systems are not *just* about saving memory footprint, but rather various knock-on effects of having "more things". Ex. Nanite and virtual textures can generally handle streaming of the rendering of most objects independently at a fine-grain, so why do we even need HLOD anymore? A big one is just cutting down on total object/instance counts, which can affect Nanite but can have an even greater effect on lots of other parts of the engine. Ex. having a whole pile of physics objects loaded is not feasible, as even with acceleration structures there are some practical limits on how many active simulated objects there can be at a time. There are many more examples of places where systems that are optimized for thousands of objects just fall over entirely when presented with millions. Some of this can be and is being improved on the tech side of course, but there will always be a need for streaming and LOD for open world games because...

2) The asymptotic scaling of this stuff in a 3D world is awful. Even in a relatively 2D/flat plane kind of world, doubling a streaming radius is 4x the cost (footprint, instances, everything). If there's significantly more "3D" content/verticality, it's even worse (up to 8x). Even with hierarchical data structures and cleverness you will always need LOD, both in graphics and in "gameplay".

3) Given the above, having more RAM is not necessarily even a huge advantage. Sure you can potentially push the radius of "high fidelity loaded stuff" out a bit, but the scaling is poor and more importantly, it doesn't actually change the cost of how many things you need to swap in/out as you move throughout the world, which is really just a function of movement speed and asset density. i.e. you don't actually make the streaming problem any easier by increasing the streaming radius until you can actually load the entire world, which is basically impossible for a game of any reasonable size due to the scaling function.
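To put rough numbers on that radius scaling (a back-of-the-envelope sketch with a made-up content density, not engine code):

```cpp
#include <cstdio>

// Streamed cost grows with the area (flat world) or volume (vertical world) inside
// the streaming radius, so doubling the radius costs roughly 4x or 8x respectively.
int main() {
    const double kPi = 3.14159265358979;
    const double density = 10.0;  // made-up "things per unit of area/volume"
    for (double r : {100.0, 200.0, 400.0}) {
        const double flat  = density * kPi * r * r;                    // ~r^2
        const double solid = density * (4.0 / 3.0) * kPi * r * r * r;  // ~r^3
        std::printf("radius %.0f: flat ~%.0f things, volumetric ~%.0f things\n",
                    r, flat, solid);
    }
    return 0;
}
```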
That all makes sense but the issue is that these have been known problems for multiple console generations now. At what point can we stop using the excuse that it’s complicated? Multi-threading and resource management aren’t going to get any easier as games get bigger in the future. Core counts and memory pools and interface bandwidths will only keep growing. Software needs to catch up.
First, it's not an excuse, it's an explanation. But more importantly, there *has* been major progress on this front over the years; modern engines and consoles are significantly more efficient at streaming and handling higher-complexity scenes than they used to be. But along with the improvement comes the insatiable desire to push the content even further, which unfortunately has often entirely eclipsed the improvements. It's the same issue with much of computing... as the capabilities expand, so do the desires and expectations.

I will re-echo @Pjotr's point that part of the problem is that a lot of this code is on the game side, not the engine side, and it is often written by folks who are not performance/optimization experts primarily. Sometimes it'll be blueprint, but even if it's in C++ (in Unreal) it is often written in a very practical way. The systems have existed to make this stuff async and parallel for a long while, but it makes the code more complex and hard to maintain, particularly for non-experts. Stuff like MASS and PCG are trying to help address some of these cases in the short term in Unreal, but require rethinking how some of these systems are architected on the game side.

Ultimately though this is one of the main goals of Verse - to provide a programming language that is better designed for modern, parallel/async by default type needs while still letting people express the logic in a way that is more procedural and intuitive. I will not claim this is an obvious or easy thing to do... there are entire graveyards of languages that have attempted similar things and fallen short. Thus I think a "believe it when I see it" attitude is very warranted here, but conversely it's unreasonable to claim that this problem is being ignored; indeed they are trying to invent a new programming language with ambitious design goals pretty much precisely to try and address the issues of non-experts writing inefficient serial code that is impossible to optimize at an engine level.

This is all a good discussion, just don't expect a silver bullet here. This stuff is very adjacent to the "how do we parallelize general purpose code" questions that have been at the core of computer science research for decades.
 
So the Gears of War Reloaded official specs look like this:

"4K resolution
60 FPS in Campaign
120 FPS in Multiplayer"

I'll be curious to see what The Coalition shows with UE5 on console especially on Series X!
 
So the Gears of War Reloaded official specs look like this:

"4K resolution
60 FPS in Campaign
120 FPS in Multiplayer"

I'll be curious to see what The Coalition shows with UE5 on console especially on Series X!
It appears to be a remaster of the Ultimate Edition which used a heavily-modified version of Unreal Engine 3, not a full remake on a new engine. We'll have to wait until E-Day to see Gears on UE5.
 
why on PC with more memory can't more things be initialized and loaded into memory upfront?

I have tracked this issue for a long time (and made a thread about it), and came to the conclusion that certain things in graphics remain single-threaded no matter what you try. They need an algorithmic innovation, a breakthrough in software engineering, to solve.

Full object initialization, especially the part involving unpredictable game logic, is still mostly single-threaded because of deep-rooted issues.

Game object initialization typically involves allocating memory for the object, assigning a position in the world, linking components (mesh, physics, AI, etc.), running scripts or logic designed by game designers (Blueprints or C++), and registering the object with game systems like collision, rendering, navigation, etc.

Each of these steps often relies on the successful completion of the previous one, and many are deeply intertwined with core game logic. Because this logic can be customized per object or even per spawn instance, the engine can't easily predict or parallelize it.
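A minimal invented skeleton of that chain (none of these names are real engine functions) makes the ordering constraint visible: every step consumes the result of the previous one, so there is no obvious place to split the work across threads.

```cpp
#include <deque>

struct GameObject { float x = 0, y = 0, z = 0; bool registered = false; };
struct World { std::deque<GameObject> objects; };

GameObject& AllocateObject(World& w) { w.objects.emplace_back(); return w.objects.back(); }
void PlaceInWorld(GameObject& o, float x, float y, float z) { o.x = x; o.y = y; o.z = z; }
void LinkComponents(GameObject&) { /* mesh, physics, AI components wired up here */ }
void RunSpawnScripts(GameObject&, World&) { /* designer logic (Blueprints/C++) runs here */ }
void RegisterWithSystems(GameObject& o, World&) { o.registered = true; }

void SpawnObject(World& world) {
    GameObject& obj = AllocateObject(world);  // 1. allocate
    PlaceInWorld(obj, 0.f, 0.f, 0.f);         // 2. needs the allocation
    LinkComponents(obj);                      // 3. components need a placed object
    RunSpawnScripts(obj, world);              // 4. scripts read those components and global state
    RegisterWithSystems(obj, world);          // 5. only now do other systems see the object
}
```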

Worse yet, most game logic written in Blueprints or scripting languages is not thread-safe by default.

If an actor's script references global game state (e.g., weather, AI, save data), it must run in the correct order to avoid conflicts.

Ensuring thread safety in such a dynamic and designer-driven environment is incredibly hard. Mistakes could introduce bugs that only happen in specific conditions and are nearly impossible to reproduce.

As an example:

An enemy NPC may spawn with randomized weapons, AI behavior states, and custom shaders. Its behavior tree may need to be initialized in sync with level triggers.

If the AI controller tries to access the world state before the object is fully registered, it could cause a crash or undefined behavior.

Since these operations are sequentially dependent, running them on multiple threads risks race conditions (where two threads try to read or write the same data at the same time) unless developers write very careful thread-safe code. They also risk deadlocks, crashes, undefined behavior and hard-to-reproduce bugs.
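As a toy illustration of that kind of race (names made up, deliberately simplified): two threads finishing initialization and registering objects into a shared registry at the same time is a data race unless every such touch point is protected.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_registeredObjects;  // stand-in for a collision/render/nav registry
std::mutex g_registryMutex;

// Calling this from two threads at once is a data race (undefined behavior).
void RegisterObjectUnsafe(int id) {
    g_registeredObjects.push_back(id);
}

// The fix is easy here, but every shared touch point in an engine needs the same care.
void RegisterObjectSafe(int id) {
    std::lock_guard<std::mutex> lock(g_registryMutex);
    g_registeredObjects.push_back(id);
}

int main() {
    std::thread a([] { for (int i = 0;    i < 1000; ++i) RegisterObjectSafe(i); });
    std::thread b([] { for (int i = 1000; i < 2000; ++i) RegisterObjectSafe(i); });
    a.join();
    b.join();
    return 0;
}
```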

These systems often assume that operations occur in a strict order, often on the main thread, where each subsystem knows the global state is stable and consistent. If object initialization were multi-threaded, it would require each of these systems to be completely refactored for concurrent update, which is an enormously complex task.

Until game engines evolve to make these steps inherently safe and thread-aware, single-threaded object initialization remains a compromise between safety and performance, favoring stability, especially in complex or designer-driven games.

Thus, UE5 and most other engines (see the RE Engine) default to initializing objects on the main thread, where operations are predictable and sequential, reducing the potential for errors.

In the end, single-threaded execution ensures deterministic behavior, while current multi-threaded execution methods unfortunately do not.
 
@Pjotr actually covered a lot of good info and background so I'll maybe just add my 2c in a few places.


Worth remembering this is not a black and white issue here. As Pjotr mentioned, you can indeed do a lot of things up front and games do tend to do this. In fact you can think of HLOD and similar systems as precisely this - bake down a pile of complicated editor things into a simpler representation that can be resident most of the time without redoing all of that work. But there are a number of considerations worth mentioning:

1) A lot of these streaming systems are not *just* about saving memory footprint, but rather various knock-on effects of having "more things". Ex. Nanite and virtual textures can generally handle streaming of the rendering of most objects independently at a fine-grain, so why do we even need HLOD anymore? A big one is just cutting down on total object/instance counts, which can affect Nanite but can have an even greater effect on lots of other parts of the engine. Ex. having a whole pile of physics objects loaded is not feasible, as even with acceleration structures there are some practical limits on how many active simulated objects there can be at a time. There are many more examples of places where systems that are optimized for thousands of objects just fall over entirely when presented with millions. Some of this can be and is being improved on the tech side of course, but there will always be a need for streaming and LOD for open world games because...

2) The asymptotic scaling of this stuff in a 3D world is awful. Even in a relatively 2D/flat plane kind of world, doubling a streaming radius is 4x the cost (footprint, instances, everything). If there's significantly more "3D" content/verticality, it's even worse (up to 8x). Even with hierarchical data structures and cleverness you will always need LOD, both in graphics and in "gameplay".

3) Given the above, having more RAM is not necessarily even a huge advantage. Sure you can potentially push the radius of "high fidelity loaded stuff" out a bit, but the scaling is poor and more importantly, it doesn't actually change the cost of how many things you need to swap in/out as you move throughout the world, which is really just a function of movement speed and asset density. i.e. you don't actually make the streaming problem any easier by increasing the streaming radius until you can actually load the entire world, which is basically impossible for a game of any reasonable size due to the scaling function.

First, it's not an excuse, it's an explanation. But more importantly, there *has* been major progress on this front over the years; modern engines and consoles are significantly more efficient at streaming and handling higher-complexity scenes than they used to be. But along with the improvement comes the insatiable desire to push the content even further, which unfortunately has often entirely eclipsed the improvements. It's the same issue with much of computing... as the capabilities expand, so do the desires and expectations.

I will re-echo @Pjotr's point that part of the problem is that a lot of this code is on the game side, not the engine side, and it is often written by folks who are not performance/optimization experts primarily. Sometimes it'll be blueprint, but even if it's in C++ (in Unreal) it is often written in a very practical way. The systems have existed to make this stuff async and parallel for a long while, but it makes the code more complex and hard to maintain, particularly for non-experts. Stuff like MASS and PCG are trying to help address some of these cases in the short term in Unreal, but require rethinking how some of these systems are architected on the game side.

Ultimately though this is one of the main goals of Verse - to provide a programming language that is better designed for modern, parallel/async by default type needs while still letting people express the logic in a way that is more procedural and intuitive. I will not claim this is an obvious or easy thing to do... there are entire graveyards of languages that have attempted similar things and fallen short. Thus I think a "believe it when I see it" attitude is very warranted here, but conversely it's unreasonable to claim that this problem is being ignored; indeed they are trying to invent a new programming language with ambitious design goals pretty much precisely to try and address the issues of non-experts writing inefficient serial code that is impossible to optimize at an engine level.

This is all a good discussion, just don't expect a silver bullet here. This stuff is very adjacent to the "how do we parallelize general purpose code" questions that have been at the core of computer science research for decades.
Thanks for that explanation Andrew! I guess my curiosity is that it seems like a lot of games load these big zones and initialize/instance a lot of things at once when, judging by the game, it seems it could be done in a more fine-grained way. Obviously the more fine-grained they try to make it, the harder it becomes to manage, right? This is where I think people get stuck on it being an "engine issue". Ultimately how games load and stream in their data is up to the developers to manage, but I think when people refer to this stuff as an Unreal Engine issue... it's more about the way the devs choose to use it than the engine itself.

I look at FF7 Rebirth, and it's clear to me that those developers went above and beyond to tailor that game to loading and streaming properly, which is in line with their comments about taking pride and working hard to ensure that the game could handle traversing through the environment on Chocobos, and obviously for future work coming with the next installment and having the airship. The game deserves props for not really having any traversal stuttering. It shows it can be done.

I honestly think this topic would make for a good "tech talk" style Digital Foundry video going over which aspects of game are handled by the engine code and which are handled by the game code. Maybe if people had a better idea of how these things work, the complaints would hit the right people.. because as much as some of us rag on Unreal Engine for all these issues.. it doesn't do us any good if developers are able to handwave their issues away as some general engine issue!

Edit: @DavidGraham Nice explanation as well, thank you!
 
3) Given the above, having more RAM is not necessarily even a huge advantage. Sure you can potentially push the radius of "high fidelity loaded stuff" out a bit, but the scaling is poor and more importantly, it doesn't actually change the cost of how many things you need to swap in/out as you move throughout the world, which is really just a function of movement speed and asset density. i.e. you don't actually make the streaming problem any easier by increasing the streaming radius until you can actually load the entire world, which is basically impossible for a game of any reasonable size due to the scaling function.

Wouldn't an increased radius alleviate some pressure on the system to load things JIT? Assuming future games feature similar player movement speeds through the environment, a bigger "world cache" should make it a bit easier to swap stuff around in the background without disrupting the player's immediate surroundings.

First, it's not an excuse, it's an explanation. But more importantly, there *has* been major progress on this front over the years; modern engines and consoles are significantly more efficient at streaming and handling higher-complexity scenes than they used to be. But along with the improvement comes the insatiable desire to push the content even further, which unfortunately has often entirely eclipsed the improvements. It's the same issue with much of computing... as the capabilities expand, so do the desires and expectations.

That's fair. However, we can't expect gamers to accept the explanation that it's complicated... forever. I'm not trying to say it's easy or that people aren't working on it, but at some point fundamental engine and game architectures need to evolve to accommodate these intertwined systems and scale up from there. I remember back in the PS3 era there was a lot of excitement around evolving engines to take advantage of multi-threaded platforms, and it seems there's still a long way to go.

Ultimately though this is one of the main goals of Verse - to provide a programming language that is better designed for modern, parallel/async by default type needs while still letting people express the logic in a way that is more procedural and intuitive. I will not claim this is an obvious or easy thing to do... there are entire graveyards of languages that have attempted similar things and fallen short. Thus I think a "believe it when I see it" attitude is very warranted here, but conversely it's unreasonable to claim that this problem is being ignored; indeed they are trying to invent a new programming language with ambitious design goals pretty much precisely to try and address the issues of non-experts writing inefficient serial code that is impossible to optimize at an engine level.

Hope it works out! I suspect the solution is less about having a magic language and more about good software design.
 
I’m more forgiving of shader comp stutter as the engine is trying to work around an underlying issue with DirectX. Traversal stutter makes no sense to me with today's massive system RAM pools, SSDs and fast PCIe.

I'm jumping into this conversation late so this might be a bit disjointed with the other responses.

Those aside, however: while memory is faster and larger than in the past in absolute terms, it actually isn't in relative terms compared to processing performance. That is to say, in the time it took processing speeds to increase 100x, memory performance and capacity did not (and I don't just mean RAM). Also note that system RAM, depending on the context, can be considered very slow for what's needed.

The software side is advancing in being able to work around that limitation/disconnect that's been growing for decades. That ranges from the highest level (the game itself) all the way down to the lowest (such as prediction algorithms on the processors).
 