Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Strangely, upscaling, be it TSR or DLSS, sometimes seems to tank framerate, or not improve it by much.
Why do you expect it to improve frame rate? Upscaling costs something, as a bunch of the pipeline still runs at the upscaled size. The way NVIDIA has marketed it as a performance improvement is by putting "super-sampling" in the title and constantly comparing it to running brute force at the upscaled resolution. Don't get me wrong, upscaling is great and a better use of resources in most cases than "native 4K" (as much as such a concept even really exists anymore), but the process itself costs a few ms of performance compared to not upscaling. Unless you're saying that upscaling some lower resolution *to* 1080p is slower than rendering at 1080p, in which case that's a bit odd, but it's possible it's not actually doing that, as there are a number of interacting cvars (also note the upscaling you see in the editor is driven by the editor controls, while in a cooked game it is based on some different cvars).

It also costs VRAM, as discussed. A bunch of the buffers are higher resolution, and the texture LOD bias will cause more virtual texture usage (although I'm not sure how the pool is sized). If performance is taking a noticeable dive, check VRAM usage, as that might be what is putting it over the top here.

Thus it is not unexpected that upscaling 1080p->4K will cost performance vs. just not upscaling the 1080p buffer. It should look notably better though... closer to the 4K image than the 1080p one in most cases.
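To make the "interacting cvars" point concrete, here is a minimal sketch of dumping the values that typically drive upscaling in a cooked UE build. It assumes the stock IConsoleManager API plus the r.ScreenPercentage and r.AntiAliasingMethod cvars; the exact names and defaults vary between engine versions, so treat them as assumptions rather than a definitive list.

```cpp
// Minimal sketch: log the cvars that usually control upscaling in a cooked
// build. Cvar names are assumptions; check your engine version's defaults.
#include "CoreMinimal.h"
#include "HAL/IConsoleManager.h"

void LogUpscalingState()
{
    // Internal rendering resolution as a percentage of the output resolution.
    if (IConsoleVariable* ScreenPct =
            IConsoleManager::Get().FindConsoleVariable(TEXT("r.ScreenPercentage")))
    {
        UE_LOG(LogTemp, Log, TEXT("r.ScreenPercentage = %f"), ScreenPct->GetFloat());
    }

    // Anti-aliasing / upscaler selection (assumed: 2 = TAA, 4 = TSR in UE5).
    if (IConsoleVariable* AAMethod =
            IConsoleManager::Get().FindConsoleVariable(TEXT("r.AntiAliasingMethod")))
    {
        UE_LOG(LogTemp, Log, TEXT("r.AntiAliasingMethod = %d"), AAMethod->GetInt());
    }

    // Note: in the editor, the viewport's own screen-percentage setting takes
    // precedence, which is why editor and cooked-game behaviour can differ.
}
```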
 
GI in the Matrix Awakens pre-compiled demo is a major CPU bottleneck:


but the GPUs are still not the actual bottleneck in this comparison.

Obviously, GI set to "1" looks like arse.

There's also more CPU memory usage with GI set to "3" instead of "1".
 
Why do you expect it to improve frame rate? Upscaling costs something as a bunch of the pipeline still runs at the upscaled size. The way NVIDIA has marketed it as a performance improvement is by putting "super-sampling" in the title and constantly comparing it to running brute force at the upscaled resolution.
I guess due to the last few years of DLSS being a significant performance boost compared to native rendering. DLSS costs 2 ms or so, but the performance increase from going from, say, 4K to 1440p is far more significant, so it provides an overall boost.

With the way UE5's Nanite and Lumen work, however, would we see a far less significant change in performance because of the way textures and geometry are streamed, so that resolution no longer has as much of an impact on performance as other factors?
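As a rough illustration of that tradeoff, here is a toy cost model using the ~2 ms figure mentioned above. The split between resolution-bound and fixed work, and the 25 ms baseline, are made-up assumptions purely for illustration, not profiled data.

```cpp
// Toy cost model: part of the frame scales with pixel count, the rest is
// fixed, and the upscaler adds a flat cost. All numbers are illustrative.
#include <cstdio>

int main()
{
    const double nativeFrameMs   = 25.0; // hypothetical frame time at native 4K
    const double resolutionBound = 0.8;  // assumed fraction that scales with pixels
    const double upscaleCostMs   = 2.0;  // the ~2 ms DLSS/TSR cost quoted above

    const double ratio1440p = (2560.0 * 1440.0) / (3840.0 * 2160.0); // ~0.44
    const double ratio1080p = (1920.0 * 1080.0) / (3840.0 * 2160.0); // 0.25

    auto estimate = [&](double pixelRatio) {
        return nativeFrameMs * (resolutionBound * pixelRatio + (1.0 - resolutionBound))
               + upscaleCostMs;
    };

    std::printf("native 4K   : %.1f ms\n", nativeFrameMs);        // 25.0
    std::printf("1440p -> 4K : %.1f ms\n", estimate(ratio1440p)); // ~15.9
    std::printf("1080p -> 4K : %.1f ms\n", estimate(ratio1080p)); // ~12.0
    // Big win versus native 4K, but the flat upscale cost never goes away,
    // which is why upscaling to 1080p (or not upscaling at all) can look
    // like a loss in the comparison made earlier in the thread.
    return 0;
}
```

If Nanite and Lumen shift more of the frame into the "fixed" bucket, the resolution-bound fraction shrinks and the win from upscaling shrinks with it, which is essentially the question being asked above.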
 
GI in the Matrix Awakens pre-compiled demo is a major CPU bottleneck:


but the GPUs are still not the actual bottleneck in this comparison.

Obviously, GI set to "1" looks like arse.

There's also more CPU memory usage with GI set to "3" instead of "1".

I have to say I kind of like the idea of the CPU being the major bottleneck. CPUs are much cheaper than GPUs, so I wouldn't mind if I needed a more powerful CPU than GPU.

Especially if it results in graphics (world density and complexity) of this quality.

Regards,
SB
 
I have to say I kind of like the idea of the CPU being the major bottleneck. CPUs are much cheaper than GPUs, so I wouldn't mind if I needed a more powerful CPU than GPU.

Especially if it results in graphics (world density and complexity) of this quality.

Regards,
SB

You're probably going to need both ;)
 
A bit on the UE5 release:


DF's take is that UE5 is far from its optimal state; performance and optimizations will come. Richard has talked to Epic, and the stutters are allegedly shader compilation, which they are working to fix. They do have a solution in place for Fortnite, where it has fixed the shader compilation problem.
 
IMO, UE5 will reach its optimal state when all the next gen tech like Sampler Feedback, Mesh Shaders, DirectStorage and VRS are fully implemented and working. :)
 
A bit on the UE5 release:


DF's take is that UE5 is far from its optimal state; performance and optimizations will come. Richard has talked to Epic, and the stutters are allegedly shader compilation, which they are working to fix. They do have a solution in place for Fortnite, where it has fixed the shader compilation problem.
I don't understand why games today have issues with shader compilation... why should it be in-game and on demand? Is there a reason for that?
 
IMO, UE5 will reach its optimal state when all the next gen tech like Sampler Feedback, Mesh Shaders, DirectStorage and VRS are fully implemented and working. :)

And when actual games release, as DF explains; then we can see what the performance will be across hardware.
 
I don't understand why games today have issues with shader compilation... why should it be in-game and on demand? Is there a reason for that?

Mostly because compiling all the shaders at the same time is a time-consuming process. Potentially tens of minutes.

Rather than have users wait while this is done at the start of the game or the start of the level, most engine devs have chosen to minimize initial load times in favor of inline (during gameplay) shader compilation.

So, the drawback of faster initial load is that you'll have stutters as shaders compile as you run into situations where a shader is being used for the first time. Generally it'll be mostly fine after that as most games will cache the shader after it has been compiled.

Regards,
SB
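To make the compile-on-first-use pattern above concrete, here is a minimal sketch of a lazy pipeline cache. The types and the CompilePipeline helper are hypothetical stand-ins rather than any engine's real API.

```cpp
// Sketch of lazy, during-gameplay shader/PSO compilation with caching.
// First use of a pipeline pays the full compile cost (the visible hitch);
// every later use is a cheap hash-map lookup.
#include <memory>
#include <string>
#include <unordered_map>

struct Pipeline { /* compiled GPU state would live here */ };

// Stand-in for the expensive driver/compiler work (potentially many ms).
std::unique_ptr<Pipeline> CompilePipeline(const std::string& /*shaderKey*/)
{
    return std::make_unique<Pipeline>();
}

class PipelineCache
{
public:
    Pipeline* GetOrCompile(const std::string& shaderKey)
    {
        auto it = cache.find(shaderKey);
        if (it == cache.end())
        {
            // Cache miss: this is the stutter on the first encounter.
            it = cache.emplace(shaderKey, CompilePipeline(shaderKey)).first;
        }
        return it->second.get();
    }

private:
    std::unordered_map<std::string, std::unique_ptr<Pipeline>> cache;
};
```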
 
I don't understand why games today have issues with shader compilation... why should it be in-game and on demand? Is there a reason for that?

Following on from Silent_Buddha above, shader compilation can be specific not just to the particular hardware, but I think also to the driver revision and game version. Normally you would want to cache them, but not every game does this, and seemingly minor changes can require recompilation. There are probably actually some benefits (on the developer end) to compiling on demand and taking the performance stutter - you don't need to worry about driver or shader updates, checking cached shader builds against changes, or whatever.

I also expect that as shaders have become more complex, compilation has become more demanding. GPU performance gains have outstripped the kind of gains software can easily get from CPUs, so maybe the tradeoff between shader complexity and compilation time has not been favourable.

Consoles ship a game with a specific version of an SDK and driver (Xbox uses a different and more efficient system of API & driver than is possible on PC), so you shouldn't really see these kinds of specific issues there.
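As a small illustration of why those caches invalidate so easily, here is a sketch of what a cache key conceptually has to include. The field names are illustrative, not taken from any particular engine or driver.

```cpp
// Sketch: a cached shader binary is only valid for one exact combination of
// shader source, GPU, driver version and game build, so all of them end up
// in the cache key. Changing any one of them forces recompilation.
#include <cstddef>
#include <functional>
#include <string>

struct ShaderCacheKey
{
    std::string shaderSourceHash; // hash of the HLSL/bytecode
    std::string gpuName;          // reported adapter description
    std::string driverVersion;    // changes with every driver update
    std::string gameBuildId;      // changes with every game patch

    std::string Combined() const
    {
        return shaderSourceHash + '|' + gpuName + '|' +
               driverVersion + '|' + gameBuildId;
    }
};

std::size_t CacheBucket(const ShaderCacheKey& key)
{
    // A driver or game update yields a new key, so every previously cached
    // shader misses and gets recompiled on first use again.
    return std::hash<std::string>{}(key.Combined());
}
```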
 
I don't understand why games today have issues with shader compilation... why should it be in-game and on demand? Is there a reason for that?
It's mostly the *driver* PSO/shader compilation that is the problem here. Since that depends on your specific SKU/driver combination, it can't easily be done in advance by the game (although IHVs will sometimes prepopulate caches for popular upcoming games). Sometimes it can be done at "load time" instead, but obviously making people wait for ages the first time they run the game (or whenever they update drivers) isn't ideal either. The other related issue is that there's not always a good way to know in advance which PSOs will actually be needed, so doing them *all* in advance is infeasible. That's why the PSO caching thing in Unreal basically logs which permutations are needed as you play the game, then lets you precache those. That said, it's still kind of tedious without significant automation (which is beside the point of this demo and would obfuscate the purpose to the developers somewhat).

PSO caching on PC is also made more complex by the fact that you have to take the conservative set of all state that any GPU/driver ever needed to bake into a shader compilation, and have unique PSOs for all of that. Individual drivers/hardware will only need unique shaders for some subset of those different states, but which ones will vary from SKU to SKU. Drivers will typically try and hash things to avoid recompiling based on state that they know doesn't actually affect the output, but that doesn't really help the high level issue that we had to generate all those "potential" permutations in the first place.

I agree with the commentary on the DF podcast - it would be great for there to be easier ways to automate this in Unreal, though. While IMO it isn't a big issue if a dev-focused tech demo doesn't have a precompiled PSO cache, some games have shipped in a similar state, which should never really be the case.

That said, on PC there's always a delicate dance between the game, the OS and the graphics driver. As I noted earlier, PSO compilation is a big piece of that, but there are similar problems with things like allocation patterns and the like. I imagine now that UE5 is out in the wild there will be significantly more attention on tuning it from all parties. As soon as anyone starts to benchmark anything the IHVs get really interested, really quickly ;)
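For readers unfamiliar with the idea, here is a rough sketch of the "log what was used during play, then precompile it next time" approach described above. It mimics the concept behind Unreal's PSO caching but uses hypothetical types and file formats, not the engine's actual API.

```cpp
// Sketch of PSO usage logging: record every permutation actually bound during
// a play session, persist the list, and compile that list up front on later
// runs so those permutations no longer hitch on first use.
#include <fstream>
#include <string>
#include <unordered_set>

class PsoUsageLog
{
public:
    // Called whenever the renderer binds a PSO during gameplay.
    void RecordUse(const std::string& psoDescriptionHash)
    {
        seen.insert(psoDescriptionHash);
    }

    // Persist the set gathered during a play-through (or automated run).
    void SaveToDisk(const std::string& path) const
    {
        std::ofstream out(path);
        for (const std::string& hash : seen)
            out << hash << '\n';
    }

    // On a later run, everything in the file is compiled before gameplay.
    static std::unordered_set<std::string> LoadRecordedSet(const std::string& path)
    {
        std::unordered_set<std::string> result;
        std::ifstream in(path);
        for (std::string line; std::getline(in, line);)
            result.insert(line);
        return result;
    }

private:
    std::unordered_set<std::string> seen;
};
```

The tedious part mentioned above is coverage: someone (or something automated) still has to actually hit all the relevant permutations during the logging runs.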
 
Consoles ship a game with a specific version of an SDK and driver (Xbox uses a different and more efficient system of API & driver than is possible on PC), so you shouldn't really see these kinds of specific issues there.
It's actually one step further - on most consoles you can ship the actual compiled GPU code, because effectively the equivalent of the user-mode driver is compiled right into the application and can't be changed once you have shipped it without patching the game, so it's safe to bake it all down.
 
Saw the always reliable MJP speculating about some sort of peer to peer distributed/torrent kind of shader cache. So if someone with your setup compiles and uploads you can just grab that and go. It'd be a more formal and easier system than what people already do for emulating more modern consoles. I'm sure Valve could manage it for Steam at the very least, and eventually Microsoft whenever they get their bureaucratic collective selves into gear.
 
I have to say I kind of like the idea of the CPU being the major bottleneck. CPUs are much cheaper than GPUs, so I wouldn't mind if I needed a more powerful CPU than GPU.

Especially if it results in graphics (world density and complexity) of this quality.

Regards,
SB
CPU performance improves some 10-15% every 2-3 years; CPU-limited games are not ideal.
 
GI in the Matrix Awakens pre-compiled demo is a major CPU bottleneck:

The 6800 XT is doing the same fps as the RTX 3080; this is obviously using the software ray tracing mode only.

I really wonder to what extent the City Sample uses HW ray tracing
I am convinced it is actually using none of it; see the video above. Heck, a GTX 1060 is already running the demo with full features.



Maybe you could stack up the 5700 against a 2060 Super again and see how they perform.
At 4K the 5700 XT performs exactly the same as the significantly superior (ray-tracing-wise) RTX 2070 Super, and the 5700 is actually faster than the RTX 2060 Super!! The 3080 Ti is just 6% faster than the 6900 XT (which is essentially the raster difference between them). Heck, the Radeon VII/5700 XT is the same speed as the 6600 XT and RTX 2070! Vega 64 is not that far off either!

Extrapolating from these results, the demo is definitely NOT using any kind of hardware ray tracing at all, just like I presumed earlier.

https://gamegpu.com/test-video-cards/the-matrix-awakens-demo-test-gpu-cpu

How would you draw any sort of conclusions like this without detailed profiling on the different platforms? Especially when almost everything is CPU bound on a 3090 to start with...
See above. I am deeply saddened by these results; there has to be some kind of switch to activate hardware ray tracing on supported GPUs.
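If someone wants to check this themselves, a minimal sketch along these lines could log which Lumen path is active. I'm assuming the r.Lumen.HardwareRayTracing cvar here; the exact cvar names and the related "Support Hardware Ray Tracing" project setting may differ between UE5 versions, so treat this as an assumption to verify rather than a confirmed switch.

```cpp
// Sketch: query whether Lumen's hardware ray tracing path is requested.
// The cvar name is an assumption and may vary between UE5 versions.
#include "CoreMinimal.h"
#include "HAL/IConsoleManager.h"

void LogLumenTracingMode()
{
    IConsoleVariable* HWRT =
        IConsoleManager::Get().FindConsoleVariable(TEXT("r.Lumen.HardwareRayTracing"));

    if (HWRT && HWRT->GetInt() != 0)
    {
        UE_LOG(LogTemp, Log, TEXT("Lumen: hardware ray tracing requested"));
    }
    else
    {
        UE_LOG(LogTemp, Log, TEXT("Lumen: software (distance field) tracing"));
    }
    // Whether the cooked City Sample build shipped with the hardware path
    // enabled (and the required project settings on) is exactly the open
    // question in this post.
}
```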
 
Mostly because compiling all the shaders at the same time is a time-consuming process. Potentially tens of minutes.

Rather than have users wait while this is done at the start of the game or the start of the level, most engine devs have chosen to minimize initial load times in favor of inline (during gameplay) shader compilation.

So, the drawback of faster initial load is that you'll have stutters as shaders compile as you run into situations where a shader is being used for the first time. Generally it'll be mostly fine after that as most games will cache the shader after it has been compiled.

Regards,
SB

I would rather have a compilation step that takes tens of hours versus experiencing stutter throughout the first playthrough. Let the game run a bot through the whole map while I wait, if real-time compilation is somehow required.

Patience is a far better path than tolerance. We all wait months to years for titles we are excited about, so what's a couple of hours (or days)? Devs should at least provide an option.

Leave the stuttering to those who can tolerate it.
 
The demo is just what it is. A demo.

You shouldn't judge it like it's a finished game. Game development involves a level of optimization and quality control that a demo will never have. It's meant for devs who have the knowledge and skills to play around with the engine and judge its capabilities accurately.
 
It's actually running on Unreal Engine 4, but they plan to upgrade to UE5 for the final game.

This makes me very hopeful that Final Fantasy 7 Remake Part 2 will also be using Unreal Engine 5.

Yesssss, such a good sign. The fact this is on UE4 and not UE5 means this is likely the floor of what we can expect.
 