Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Wouldn't increased radius alleviate some pressure on the system to load things JIT? Assuming future games feature similar player movement speed through the environment a bigger "world cache" should make it a bit easier to swap stuff around in the background without disrupting the player's immediate surroundings.

An issue mentioned is that of scaling: you're likely looking at a superlinear capacity cost relative to the processing-speed gains.

An overarching problem, then, as mentioned in a previous post, is that memory scaling is poor on the hardware progression side. In a hypothetical world in which we had, say, 8x more memory capacity at each tier, the cost/benefit analysis would likely be different. The PS5 would really have 64GB or even 128GB of memory, with PC hardware also adjusted comparably, in line with that scaling. At something like 128GB I do wonder if the design paradigm would be akin to something like loading the entire game world essentially.
 
Wouldn't increased radius alleviate some pressure on the system to load things JIT? Assuming future games feature similar player movement speed through the environment a bigger "world cache" should make it a bit easier to swap stuff around in the background without disrupting the player's immediate surroundings.
It would seem to, but it doesn't actually :) Think about drawing a circle around a dot (the character) that you move around as the dot moves. You can think of the amount of stuff you need to cycle in/out as roughly the amount of area covered by the leading edge of the swept circle. If you move faster, that area will be larger per unit time. Critically though, making that circle bigger (i.e. a larger streaming radius) actually makes the amount of things you need to stream in/out slightly *worse* for the same movement speed, because you effectively pick up more area on the "sides" (relative to your movement direction). The only case in which it is a benefit is if the new area you are covering is "off the map", the limit being a circle big enough to cover all the assets in your level.

To your point of "it means things that get swapped at the edge of a larger radius might be less noticeable": that could certainly be the case, but it doesn't do anything for the worst case, i.e. a player proceeding at max speed in a single direction. This is also unfortunately a very common case as players traverse from point A to B. If your streaming throughput is not fast enough, then as you continue moving your effective streamed-in radius will decrease to the point where you are back to the original problem. This is similar to SSD SLC caches: having a larger cache can mitigate short bursts of I/O, but it doesn't help the case where someone writes a pile of sequential data, where it will eventually revert to the underlying non-cached throughput.
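To put rough numbers on that swept-circle picture (everything below is illustrative: uniform asset density, no LOD falloff, made-up speeds, radii and density):

```cpp
#include <cstdio>

int main() {
    // Illustrative numbers only: how much new world area a moving streaming
    // circle uncovers per second, and what that costs at a made-up density.
    const double assetDensityMBPerM2 = 0.01;    // streamed data per square metre (made up)
    const double speedsMPerS[] = { 6.0, 40.0 }; // running vs. driving
    const double radiiM[]      = { 250.0, 500.0 };

    for (double v : speedsMPerS) {
        for (double r : radiiM) {
            // A circle of radius r translating at speed v covers new area at a
            // steady-state rate of ~2*r*v (the leading-edge sweep).
            const double newAreaPerSec  = 2.0 * r * v;
            const double streamMBPerSec = newAreaPerSec * assetDensityMBPerM2;
            std::printf("v=%5.1f m/s  r=%5.0f m  ->  %8.0f m^2/s new area, ~%5.0f MB/s to stream\n",
                        v, r, newAreaPerSec, streamMBPerSec);
        }
    }
    // The radius appears as a multiplier: doubling the streaming radius
    // doubles the steady-state streaming rate for the same movement speed.
    return 0;
}
```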

That's fair. However we can't expect gamers to accept the explanation that it's complicated...forever.
And I'm not saying gamers should accept it at all. In these conversations I'm only trying to provide some explanations when people say things like "I don't understand why this isn't the case". I'm not trying to say people should ever judge things by anything other than the results. We have plenty of good and bad examples when it comes to this stuff; it's completely legitimate to call out the bad cases.
 
That all makes sense but the issue is that these have been known problems for multiple console generations now. At what point can we stop using the excuse that it’s complicated? Multi-threading and resource management aren’t going to get any easier as games get bigger in the future. Core counts and memory pools and interface bandwidths will only keep growing. Software needs to catch up.
Just increasing the multi-threadedness might not help the problem, though, because that introduces different bottlenecks depending on how CPUs handle multi-threaded workloads. There is latency between AMD CCXs, for instance, and you could in theory no longer be limited by the time it takes to calculate your way through a workload, but instead stall waiting for the parts of that workload to be assembled into a usable state. Just making it multithreaded might not solve how the problems present. I think we've seen something similar happen with the current implementations of DirectStorage. We may have had a stutter caused by the CPU being busy decompressing data, and moving it to the GPU causes a stutter because the GPU is busy decompressing data, sometimes even causing a larger performance hit.
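As a toy illustration of that "stall waiting for the parts to be assembled" point (this is just the generic fork/join shape of the problem, nothing from a real engine):

```cpp
#include <cstdio>
#include <future>
#include <numeric>
#include <vector>

// Stand-in for a per-chunk workload (e.g. part of a frame's update).
static long long processChunk(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0LL);
}

int main() {
    const std::size_t numChunks = 8;
    std::vector<std::vector<int>> chunks(numChunks, std::vector<int>(1000000, 1));

    // Fan out: each chunk may land on a different core (possibly a different CCX).
    std::vector<std::future<long long>> parts;
    for (const auto& c : chunks)
        parts.push_back(std::async(std::launch::async, processChunk, std::cref(c)));

    // Fan in: this thread now sits waiting for every part, then does the serial
    // "assemble into a usable state" step. If that wait/assembly (plus the
    // cross-core latency of moving results around) dominates, throwing more
    // threads at the fan-out phase stops helping.
    long long total = 0;
    for (auto& f : parts)
        total += f.get();

    std::printf("total = %lld\n", total);
    return 0;
}
```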

I also think it's important to recognize that you notice the stutter a lot more at higher framerates, and the framerates of consoles have shifted from mostly 30fps in the 360/PS3 era to mostly 60 now, with many titles offering a 120Hz mode.
 
@Andrew Lauritzen How well does the engineering/programming side of game dev scale with headcount?

From my, rather limited (in games anyway), experience:
It's less about scaling with headcount and more about scaling with EXPERT headcount.

Tech leads and senior-level programmers are always going to be your limiting factor, both in terms of gatekeeping the code that is accepted and because they will be the hardest people to find and keep!

20 junior and mid-level devs can produce lots of new code and thus new features, AND new bugs. But it takes expert programmers to review all that code, make and inform larger architectural decisions, point out pitfalls of certain designs and highlight problem areas. And experts are few and far between; for any decent-sized codebase, even a new expert will need months or years to become an expert in that codebase.

This is all assuming you have a really good org around you: fully staffed QA and test systems, and good DevOps to handle everything that is required but isn't actually writing code.

tldr; imho it really only scales with the top-level expert programmers, and they are VERY hard to find and keep. My ballpark guess would be that beyond 5-10 expert devs, plus at least the same number of supporting mid-level devs, is around the point where productivity starts to slow down if you're adding more people, for a single specific project. Obviously for something like UE you can break it up into many different components.

Thankfully the content side of game dev scales much better, i.e. art creation.
 
However we can't expect gamers to accept the explanation that it's complicated...forever.
Well it kinda is. ;) And even where a complicated problem is finally solved in principle, whether the solution gets applied always comes down to a financial, business question: what will it cost to solve this, and what is it worth?

Not wanting to derail this great tech talk with business talk! Just...that's what gamers are going to face. The world is imperfect as there aren't enough resources to make it perfect to the standard we want, although if we settled for an older standard, like 12th-century Western Europe or 1990s gaming, we could easily make it 'perfect'. The envelope is always going to be pushed to something shy of ideal until tech plateaus. Tables and wheels are pretty good and reliable these days...
 
I think another "limiting" factor with UE, that Sweeney may have brought up in that interview, is backwards/forwards compatibility. My understanding of UE is that you can take older projects and convert them, e.g. UE4 -> UE5. Things get deprecated, and I'm not sure exactly how that process works, but it could probably lead to people having to rewrite some sections of game code that they've written. There's probably been some hesitancy to change things on the engine side that relate more to gameplay code, so people can move to the new engine version more easily. A custom engine for one particular game doesn't really have to be "forward compatible" in any sense. Plus they have a huge user base with a wide range of ability, so re-architecting the engine in a major way means all of those people have to relearn how to use UE. They likely try to minimize the learning curve to move from UE4 to UE5. Verse will be a major change, if they try to steer people that way in the future, but they're taking a smart path by building it up in Unreal Editor for Fortnite first.

In any case, multi-threading can be hard to debug and is not as easy to do well as single-threaded code. UE5 is not just for AAA devs, and it's a difficult problem to work with even in the AAA case. Naughty Dog and id both have highly multi-threaded job systems that don't even have a main thread, but those two studios are kind of the cream of the crop in terms of technical ability. They also have engines that are highly tailored to the particular kinds of games they make, instead of being general-purpose engines.
 
At something like 128GB I do wonder if the design paradigm would be akin to something like loading the entire game world essentially.
It would make very little difference, unfortunately; you see, the bottleneck is not memory at all. CPUs with very large caches certainly help with performance, but they never help with traversal/compilation stutters at all.

The bottleneck is the CPU single threaded performance, which has been stagnating for a decade now.

Single threaded performance stagnated because transistor frequency/clock stagnated due to physics.

Also, and more importantly, IPC stagnated. To improve IPC, CPUs must decode, issue, and execute more instructions per cycle, which means increasing the width of the execution side, but this introduces timing and signal propagation issues, as wider execution = more logic = longer wire delays = harder to run at high clock speeds, and you are back where you started.

It's the same thing as GPUs, they are very wide, and thus run at much lower clocks than CPUs.

Even when you want to reach high clocks you need more pipelining, which increases the amount of logic you need = longer wire delays = harder to run at high clock speeds. The end result of all of this is that single-threaded performance is progressing at a snail's pace.
 
@DavidGraham Cache misses are a big part of game performance. It's why game companies pursue things like entity component systems as an architecture, though even that is starting to lose some favour because it can complicate some problems. A miss on the highest level of cache is a huge penalty on a modern CPU. It's why game companies get very focussed on cache alignments, packing flags into an int, and using structs of arrays. They want to make sure they don't waste cache lines, and make sure they minimize cache misses. Cache misses are basically stalls. CD Projekt Red is making a minimal game object for static content for those specific reasons. Whether your code is single- or multi-threaded, data layout is a huge win on modern CPUs.
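For a concrete picture of the kind of layout change I mean (types are purely illustrative, not from any particular codebase):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Array-of-structs: a loop that only updates positions still drags velocities,
// flags and material ids through the cache, wasting most of each 64-byte line.
struct ParticleAoS {
    float px, py, pz;
    float vx, vy, vz;
    std::uint32_t flags;      // several booleans packed into one int
    std::uint32_t materialId;
};

// Struct-of-arrays: the same update touches only tightly packed float arrays,
// so every cache line fetched is full of data the loop actually uses.
struct ParticlesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<std::uint32_t> flags;
    std::vector<std::uint32_t> materialId;
};

void integrate(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
```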
 
Here's the issue devs face.

GTA 6 will release, and it will likely feature an open world more interactive, more dense, and more visually impressive than anything else out there... and I'm going to bet it will not stutter as a city block streams in, or as you run/drive through the world... which ultimately suggests that it isn't a strict tech limitation... it's a code limitation... and yes, GTA 6 has billions invested in it of course... but that doesn't matter to the consumer. They expect products to work as they should... so this complicated issue for lower-skilled developers has to become less complicated.
 
Using Rust, you would be able to write multi-threaded systems much faster and with higher confidence, but that is a very big engineering effort, and there are reasons why all those projects (like Embark's) have stalled.
 
@Pjotr actually covered a lot of good info and background so I'll maybe just add my 2c in a few places.


Worth remembering this is not a black and white issue here. As Pjotr mentioned, you can indeed do a lot of things up front and games do tend to do this. In fact you can think of HLOD and similar systems as precisely this - bake down a pile of complicated editor things into a simpler representation that can be resident most of the time without redoing all of that work. But there are a number of considerations worth mentioning:

1) A lot of these streaming systems are not *just* about saving memory footprint, but rather about various knock-on effects of having "more things". Ex. Nanite and virtual textures can generally handle streaming the rendering of most objects independently at a fine grain, so why do we even need HLOD anymore? A big one is just cutting down on total object/instance counts, which can affect Nanite but can have an even greater effect on lots of other parts of the engine. Ex. having a whole pile of physics objects loaded is not feasible, as even with acceleration structures there are some practical limits on how many active simulated objects there can be at a time. There are many more examples of places where systems that are optimized for thousands of objects just fall over entirely when presented with millions. Some of this can and is being improved on the tech side of course, but there will always be a need for streaming and LOD for open world games because...

2) The asymptotic scaling of this stuff in a 3D world is awful. Even in a relatively 2D/flat plane kind of world, doubling a streaming radius is 4x the cost (footprint, instances, everything). If there's significantly more "3D" content/verticality, it's even worse (up to 8x). Even with hierarchical data structures and cleverness you will always need LOD, both in graphics and in "gameplay".

3) Given the above, having more RAM is not necessarily even a huge advantage. Sure you can potentially push the radius of "high fidelity loaded stuff" out a bit, but the scaling is poor and, more importantly, it doesn't actually change the cost of how many things you need to swap in/out as you move through the world, which is really just a function of movement speed and asset density. I.e. you don't actually make the streaming problem any easier by increasing the streaming radius until you can actually load the entire world, which is basically impossible for a game of any reasonable size due to the scaling function (some quick numbers on that below).
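To make the scaling concrete (this is just the r^2 / r^3 arithmetic, nothing engine-specific):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Streamed footprint grows roughly like r^d: d = 2 for flat-ish worlds,
    // d = 3 for heavily vertical ones.
    for (int d = 2; d <= 3; ++d) {
        const double costOfDoublingRadius = std::pow(2.0, d);
        const double radiusFrom8xMemory   = std::pow(8.0, 1.0 / d);
        std::printf("d=%d: 2x radius costs %.0fx the memory; 8x the memory buys only %.2fx the radius\n",
                    d, costOfDoublingRadius, radiusFrom8xMemory);
    }
    // And none of this changes the swap rate while moving, which scales with
    // movement speed and asset density, not with available RAM.
    return 0;
}
```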

First, it's not an excuse, it's an explanation. But more importantly, there *has* been major progress on this front over the years; modern engines and consoles are significantly more efficient at streaming and handling higher complexity scenes than they used to be. But along with the improvement comes the insatiable desire to push the content even further, which unfortunately has often entirely eclipsed the improvements. It's the same issue with much of computing... as the capabilities expand, so do the desires and expectations.

I will re-echo @Pjotr's point that part of the problem is that a lot of this code is on the game side, not the engine side, and it is often written by folks who are not performance/optimization experts primarily. Sometimes it'll be blueprint, but even if it's in C++ (in Unreal) it is often written in a very practical way. The systems have existed to make this stuff async and parallel for a long while, but it makes the code more complex and hard to maintain, particularly for non-experts. Stuff like MASS and PCG are trying to help address some of these cases in the short term in Unreal, but require rethinking how some of these systems are architected on the game side.

Ultimately though this is one of the main goals of Verse - to provide a programming language that is better designed for modern, parallel/async by default type needs while still letting people express the logic in a way that is more procedural and intuitive. I will not claim this is an obvious or easy thing to do... there are entire graveyards of languages that have attempted similar things and fallen short. Thus I think a "believe it when I see it" attitude is very warranted here, but conversely it's unreasonable to claim that this problem is being ignored; indeed they are trying to invent a new programming language with ambitious design goals pretty much precisely to try and address the issues of non-experts writing inefficient serial code that is impossible to optimize at an engine level.

This is all a good discussion, just don't expect a silver bullet here. This stuff is very adjacent to the "how do we parallelize general purpose code" questions that have been at the core of computer science research for decades.

The idea is not to simply increase the LOD radius naively, but to keep the simulation radius constant and have a secondary, larger, preemptive loading/initialization radius that proactively loads and initializes game objects (in a frozen, invisible state until they hit the actual sim ring) on a low-priority thread.

It runs ONLY when the CPU is free, and it won't stop the game simulation if it's not completed; it will wait for when it can continue its work. It just adds stuff to the back of a streaming/initialization queue. I mean, at least these open world games do have some sort of streaming queue, right? I'd spit out my drink if you told me they just load stuff instantly, exactly in the simulation tick in which an object is needed, with the whole thing expected to happen instantly or else the game WILL hitch.
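Something like this is the shape I have in mind (a rough, purely illustrative sketch; all names and numbers are made up, not from any real engine):

```cpp
#include <cmath>
#include <deque>
#include <unordered_set>
#include <vector>

struct ObjectDesc { int id; float x, y; };

// Two rings: a fixed simulation radius, plus a larger preload radius whose
// initialization work is only drained with leftover frame time.
struct Streamer {
    float simRadius     = 250.f;   // inside this ring: fully simulated
    float preloadRadius = 500.f;   // inside this ring: loaded + initialized, but frozen
    std::unordered_set<int> resident;
    std::deque<ObjectDesc> initQueue;   // low-priority initialization queue

    void update(float playerX, float playerY,
                const std::vector<ObjectDesc>& world,
                double spareFrameMs)
    {
        // Queue anything that entered the preload ring and isn't resident yet.
        for (const ObjectDesc& d : world) {
            const float dist = std::hypot(d.x - playerX, d.y - playerY);
            if (dist < preloadRadius && resident.insert(d.id).second)
                initQueue.push_back(d);
        }
        // Drain the queue only while this frame has CPU time to spare; whatever
        // doesn't fit simply waits for a later frame instead of causing a hitch.
        while (!initQueue.empty() && spareFrameMs > 0.5) {
            spareFrameMs -= initializeFrozen(initQueue.front());
            initQueue.pop_front();
        }
        // Frozen objects only get promoted to simulated once they cross simRadius.
    }

    double initializeFrozen(const ObjectDesc&) { return 0.1; } // stub: ms spent
};
```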

I mean, these systems seem like basic engineering to me. I assumed that's how games as early as GTA3 on the PS2 were handling that stuff. This sounds like stuff any dev would anticipate is necessary for an open world game and implement early in development, as a foundational piece of the engine. Just like basic object culling is.

If there are actual devs shoehorning this in post-hoc, late in development, because they are surprised that in a vast open world game many dozens of entities can end up being initialized simultaneously while traversing the world at speed, have they even stopped to try to visualize their game running in their head at any point throughout development? This sounds like stuff that a self-respecting software engineer would predict from the start.
 