Tom Looman said:
As of 5.1, the engine ships with two systems trying to solve the same problem: PSO Precaching (5.1+) and Bundled PSOs (UE4 and above). In this article I'll explain both systems and how they currently work together.

Although the intent is for the new Precaching solution to replace the bundled approach entirely, I did not find its coverage sufficient as of right now (UE 5.3) for a hitchless experience, even in a simple test project.

Automation Suggestions

Handling Bundled PSOs is a lot of work compared to the new PSO Precaching. Therefore, we can only hope that it will eventually be replaced entirely, saving everyone a ton of work. Until then, I'd like to suggest some ideas for streamlining this process, as implementing them falls outside the scope of this article.

  • Have QA or playtesters run the game with -logPSO; ideally this would automatically upload the generated file to a server to avoid manual work. Make sure they run on different scalability settings too, as these will create different PSOs.
  • Create a simple spline actor in every level that can do a flythrough to visit all locations. This might not cover everything, so keep cinematics and spawnables in mind. Perhaps these cinematics can be triggered as part of the automation after the flythrough has completed.
  • Have a custom map for PSO gathering. This can contain all your spawnables from gameplay, such as items and weapons.
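The QA suggestion multiplies quickly, since every map needs a run at every scalability setting. Here's a minimal Python sketch of tracking which combinations still need a recording. It assumes a hypothetical convention where each uploaded -logPSO recording is tagged with its map and scalability level; the tag keys, map names, and level list are illustrative, not Unreal's file format.

```python
from itertools import product

# Illustrative level names; adjust to however your project labels its
# scalability presets. These are assumptions, not read from Unreal.
SCALABILITY_LEVELS = ("Low", "Medium", "High", "Epic")


def missing_runs(maps, recorded_runs, levels=SCALABILITY_LEVELS):
    """Return the sorted (map, level) pairs that no recorded run covers yet.

    Each run is a dict with hypothetical "map" and "level" keys, e.g. parsed
    from metadata attached to an uploaded -logPSO recording.
    """
    covered = {(run["map"], run["level"]) for run in recorded_runs}
    return sorted(pair for pair in product(maps, levels) if pair not in covered)


maps = ["Hub", "Swamp", "PSOGallery"]  # "PSOGallery": hypothetical gathering map
runs = [
    {"map": "Hub", "level": "Epic"},
    {"map": "Hub", "level": "Low"},
    {"map": "Swamp", "level": "Epic"},
]
todo = missing_runs(maps, runs)
print(f"{len(todo)} of {len(maps) * len(SCALABILITY_LEVELS)} combinations still need a run")
for map_name, level in todo:
    print(f"  {map_name} @ {level}")
```

Fed from an upload server, a list like this tells QA exactly which map/setting pairs to replay next.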

The last suggestion (albeit intended for developers gathering PSOs pre-release) was also one of my suggestions early on in the #stutterstruggle talk: games that couldn't devote sufficient QA time to playing through the entire game could at least have a new variant of the in-game 'benchmark', basically an optional "shader warming" stage that blasts as many materials/effects as it can onto the screen and lets the driver cache do its thing. I recall hearing that a mobile developer essentially did that for one of their games during setup: the game would load all the assets it could in the background in a test level. It could even be a separate .exe like Metro 2033's Benchmark.exe, which just loads a level (or levels) and spawns an intense firefight. Not comprehensive, of course, but my guess was that simply creating a level in Unreal that loads as many assets as it can to reduce PSO stutter would be a relatively low-friction method for devs who are strapped for time/resources.
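A toy model of why a warm-up level helps, and why it isn't comprehensive. It assumes a hypothetical cache where the first request for a pipeline key pays a fixed compile stall and later requests are free; the 50 ms cost and the (material, mesh type) keys are made up for illustration. Real PSO keys carry far more state than this.

```python
class PipelineCache:
    """Toy model of a pipeline cache: the first request for a key pays a
    compile stall, later requests for the same key are free."""

    def __init__(self, compile_ms=50.0):
        self.compile_ms = compile_ms  # assumed per-PSO compile cost
        self.compiled = set()

    def request(self, key):
        """Return the stall (ms) caused by fetching the pipeline for `key`."""
        if key in self.compiled:
            return 0.0
        self.compiled.add(key)
        return self.compile_ms


def warm_up(cache, keys):
    """Benchmark-style warm-up: touch every key the warm-up level can show."""
    return sum(cache.request(key) for key in keys)


def play(cache, frames):
    """Return the stall per frame; each frame lists the pipeline keys it draws."""
    return [sum(cache.request(key) for key in frame) for frame in frames]


# Hypothetical (material, mesh type) keys.
frames = [
    [("Rock", "StaticMesh")],
    [("Rock", "StaticMesh"), ("Fireball", "Sprite")],
    [("Fireball", "Sprite"), ("Boss", "Skinned")],
]

cold = play(PipelineCache(), frames)
warm_cache = PipelineCache()
warm_up(warm_cache, [("Rock", "StaticMesh"), ("Fireball", "Sprite")])
warm = play(warm_cache, frames)
print("cold:", cold)
print("warm:", warm)
```

The boss never appears in the warm-up level, so its first appearance still hitches. That is the "not comprehensive" caveat in miniature.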

Also, while I suspected this was the case, Tom mentions that these QA runs need to be done at varying quality settings, as each one will generate its own PSOs. Yowza.

This article aims to fill some of the knowledge gaps left by the docs and release notes. After Precaching was announced, it took me longer than I care to admit to get it fully working. Partly this is because its claims are bigger than what it delivers: even a simple project can't seem to reach full coverage with Precaching and *needs* the old system for a solid 100% experience. I'm confident this will be addressed and improved in future versions; until then, combining both old and new seems like the way to go.

Another interesting thing about Lords of the Fallen is that when you toggle off Nanite via the console, you actually get a serviceable lower-LOD version of the game. As far as I know, this isn't toggleable via the in-game settings.

I wonder if this is authored by the devs? In other games I tried, it would just load the high-poly assets without utilizing Nanite, which would obviously lead to performance issues.

Any performance advantage when toggling this?
Why is Nanite fundamentally changing the objects on screen? Move the slider in this area and, unless I'm blind, it's like having two different buildings.

Weird, I did look specifically at those areas and didn't see anything. Object count shouldn't really affect things on this front, assuming they are Nanite. If it were at longer distances, it could be something like HLOD (also not related to Nanite, etc.), but that shouldn't be a factor in the scenes you mention. Not sure what could be going on... is it possible to capture a video? I'm not sure why it doesn't repro for me here. 🤷‍♂️

The demo has been disabled on Steam :(
Nanite has a fallback mesh function that creates a low-resolution mesh for hardware that doesn't support Nanite.
Indeed, and these are also the meshes that are (by default) used for other systems (ray tracing, etc.) that can't handle the full-quality Nanite meshes.
Oh, too bad :( I guess it's coming out soon, but it seems weird for them not to leave it up as additional advertising if they bothered to put it together in the first place 🤷‍♂️
And I don't even know the path tracing settings here, but for "real" reference quality versus real life you need at least 10 bounces, and this might be using fewer.
The path tracer in Unreal is targeted at being much more of a reference, physically correct path tracer than a "real-time" one. As such, I believe the defaults are more in the range of 32 bounces and 16k samples/pixel (no denoising by default, although offline CPU denoisers are supported), etc. Unless the defaults were altered here, it is generally pretty equivalent to classic offline path tracing.
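To put the bounce-count point in perspective, here's a simplified "furnace" model. This is an assumption for illustration, not how Unreal's path tracer counts bounces. If every surface reflects the same fraction `albedo` of incoming light, the converged result is a geometric series, and cutting paths off after k bounces keeps 1 - albedo^(k+1) of it.

```python
def captured_fraction(albedo, bounces):
    """Fraction of the converged result kept when paths stop after `bounces`
    bounces, in a toy scene where every surface has the same albedo.

    Full result:   sum(albedo**i for i in 0..inf) = 1 / (1 - albedo)
    Truncated:     sum(albedo**i for i in 0..bounces)
    Ratio:         1 - albedo**(bounces + 1)
    """
    if not 0.0 <= albedo < 1.0:
        raise ValueError("albedo must be in [0, 1) for the series to converge")
    return 1.0 - albedo ** (bounces + 1)


for albedo in (0.5, 0.9):
    for bounces in (3, 10, 32):
        kept = captured_fraction(albedo, bounces)
        print(f"albedo={albedo}, bounces={bounces:2d}: {kept:.1%} of the light")
```

Under this toy model, 10 bounces keep over 99.9% of the light at albedo 0.5 but leave roughly 31% missing at albedo 0.9. Bright, high-albedo scenes are therefore where a low bounce count shows.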
The Ubisoft folk have been bragging about cracking this problem ever since that Avatar game was shown off (2?) years ago. Unfortunately nothing's come out yet, so there's no conference talk about how they're moving all that foliage around in a BVH in realtime 😕
IIRC NVIDIA has shown a few things with foliage as well, but it was still pretty brute force, with heavy instancing of animated geometry to make it fast enough. That can work for some cases, but not all. Additionally, I believe the foliage was more similar to current-gen assets (fairly low poly with heavy alpha masking) than what we're dealing with in Nanite (300k+ poly trees). Very interested to see what Ubisoft and others come up with, but it definitely feels like some different rendering approaches are needed for foliage in general, as the current techniques are not really very efficient to rasterize *or* ray-trace.

I do wonder if the visibility bitmask strategy could work for virtual shadow map filtering (https://arxiv.org/pdf/2301.11376.pdf). They horizon-test multiple "sectors" over a hemisphere and can consequently get an independent depth/"thickness" estimate for each one.
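A simplified 2D sketch of the visibility-bitmask idea as I understand it. The sector count, the quarter-circle span, and extruding thickness straight down are all simplifying assumptions. The linked paper works per slice in screen space and extends samples along the view direction. Each depth sample becomes an angular interval between its front face and assumed back face, and the intervals are OR-ed into a bitmask. Sectors behind a thin occluder therefore stay open instead of being swallowed by a single max-horizon angle.

```python
import math


def sector_bits(lo, hi, sectors, span=math.pi / 2):
    """Bitmask of the sectors (splitting [0, span) evenly) overlapped by [lo, hi]."""
    lo, hi = max(lo, 0.0), min(hi, span)
    if hi <= lo:
        return 0
    first = int(lo / span * sectors)
    last = min(math.ceil(hi / span * sectors), sectors)  # exclusive
    mask = 0
    for s in range(first, last):
        mask |= 1 << s
    return mask


def occluded_fraction(profile, thickness, sectors=32):
    """Fraction of one quarter-circle (horizon to zenith) that is blocked.

    profile: (distance, height) samples along one side of a 2D slice, with the
    receiver at the origin and its normal pointing up. Each sample is treated
    as a slab extending `thickness` below its height (assumed constant).
    No cosine weighting is applied; sectors are uniform in angle.
    """
    mask = 0
    for x, h in profile:
        front = math.atan2(h, x)              # angle to the sample's top
        back = math.atan2(h - thickness, x)   # angle to its assumed back face
        mask |= sector_bits(back, front, sectors)
    return bin(mask).count("1") / sectors


pole = [(1.0, 1.0)]
print("thin pole :", occluded_fraction(pole, thickness=0.2))
print("thick wall:", occluded_fraction(pole, thickness=10.0))
print("two poles :", occluded_fraction(pole + [(2.0, 0.5)], thickness=0.2))
```

With a large thickness, the result matches what a horizon-only method would report: everything below the pole's top is blocked. With a thin guess, only a narrow band is marked.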
SMRT already shoots fairly arbitrary rays and computes a dynamic "thickness" guess, in addition to keeping track of depth slopes for extrapolation behind occluded objects. Unfortunately the remaining cases that produce the most noticeable artifacts and limit penumbra sizes are effectively unsolvable with a single depth layer - i.e. an important object is not in the shadow map at all because it is entirely behind another occluder based on the point projection. I think the paper you quoted also only does a single depth layer so it will have similar issues in those cases.

That's not to say that there are no improvements that can be made to the current filtering scheme, but we're pretty near the point where it's as good as it can get with a shadow map. For area lights we really do need a world space structure at some point.

In 2024, prepare to venture into the microscopic realm like never before!
Embark on an extraordinary journey through the infinitesimally small and guide your colony towards a brighter future.
As the Ant Savior, your mission is to lead your people, rebuild their home, ensure their safety and prosperity, and conquer new territories across changing seasons. Explore, strategize, confront, and engage in diplomacy to triumph over the myriad challenges that lie ahead.
Get ready for an epic battle in the world of the infinitely small, powered by Unreal Engine 5.
SMRT already shoots fairly arbitrary rays and computes a dynamic "thickness" guess, in addition to keeping track of depth slopes for extrapolation behind occluded objects. Unfortunately the remaining cases that produce the most noticeable artifacts and limit penumbra sizes are effectively unsolvable with a single depth layer.

Good to know the algos are already that smart. Do other depth-map-based ray traces, such as the screen-space ones (AO, reflections, Lumen SSGI, etc.), also use such thickness estimations?
Can devs pre-bake average thickness maps for their models, or has that solution been studied?

As for multiple depth layers... Has that been tested at all? Both for shadow maps and screen-space G-buffers? They might be helpful even at a considerably lower res... Is there a way to reuse samples that are discarded by the depth test when overdraw happens, to fill those out?
Depth layers always feel like piling expensive hacks one atop another to try and get back what world-space tracing already gives you. By the time you get there, spending time trying to make world-space tracing fast enough is very likely a better use of time.
Yes, in fact the SMRT algorithm was heavily inspired by the effectiveness of screen space contact shadows. It is similar to doing a "light space contact shadow" in some ways, albeit with fancier depth extrapolation. SMRT also uses a short screen space trace to disambiguate some situations close to surfaces.
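A minimal 1D sketch of a screen-space contact-shadow test with a constant thickness guess. The function name, the fixed-step march, and the constant thickness are assumptions for illustration, not Unreal's shader code. It shows why the thickness guess matters. Too thin, and the ray slips behind a real occluder. Too thick, and it would also "hit" objects that it actually passes behind.

```python
def contact_shadow(depth_buffer, start_px, start_depth, step_px, depth_step,
                   num_steps, thickness):
    """Short screen-space march from a receiver toward the light.

    depth_buffer: view depth per pixel (larger = farther from the camera).
    The ray is considered blocked at the first step where it lies behind the
    stored surface by less than `thickness`, the assumed object thickness.
    """
    px, depth = float(start_px), start_depth
    for _ in range(num_steps):
        px += step_px
        depth += depth_step
        i = int(round(px))
        if not 0 <= i < len(depth_buffer):
            return False  # left the screen: no data, treat as unshadowed
        behind = depth - depth_buffer[i]
        if 0.0 < behind < thickness:
            return True
    return False


# Floor at depth 10 with a pole in front of it at pixel 5 (depth 8).
depth_buffer = [10.0] * 10
depth_buffer[5] = 8.0
# March right, moving 0.3 units toward the camera per pixel.
args = dict(start_px=0, start_depth=10.0, step_px=1.0, depth_step=-0.3, num_steps=8)
print("thickness 1.0:", contact_shadow(depth_buffer, thickness=1.0, **args))
print("thickness 0.2:", contact_shadow(depth_buffer, thickness=0.2, **args))
```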

Can devs pre-bake average thickness maps for their models, or has that solution been studied?
I don't know that I've heard putting something like thickness directly in the gbuffer considered much. Usually the cases that cause the most issues are things like telephone poles and fences and such which have significant sizes in one or two dimensions, but are very thin in another one. Thus my intuition says we'd need some sort of view-dependent approximation, which is starting to sound kind of expensive...
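To illustrate why a single per-pixel thickness struggles with poles and fences, here's a small sketch that uses the standard slab method to measure how much of a line lies inside an axis-aligned box. The box dimensions and the function name are illustrative assumptions. The same 0.3 m x 0.3 m x 10 m "telephone pole" is 0.3 m thick seen from the side but 10 m thick seen end-on.

```python
import math


def chord_length(box_min, box_max, origin, direction):
    """Length of the line origin + t * direction (direction: unit vector) that
    lies inside an axis-aligned box, via the slab method; 0.0 on a miss."""
    t_near, t_far = -math.inf, math.inf
    for lo, hi, o, d in zip(box_min, box_max, origin, direction):
        if abs(d) < 1e-12:
            # Parallel to this slab: either always inside it or never.
            if o < lo or o > hi:
                return 0.0
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return max(0.0, t_far - t_near)


pole_min, pole_max = (-0.15, -0.15, 0.0), (0.15, 0.15, 10.0)  # z is up
side_on = chord_length(pole_min, pole_max, (-5.0, 0.0, 1.0), (1.0, 0.0, 0.0))
end_on = chord_length(pole_min, pole_max, (0.0, 0.0, 20.0), (0.0, 0.0, -1.0))
diagonal_dir = (1 / math.sqrt(2), 0.0, 1 / math.sqrt(2))
diagonal = chord_length(pole_min, pole_max, (-1.15, 0.0, 0.0), diagonal_dir)
print(f"side-on: {side_on:.2f} m, end-on: {end_on:.2f} m, 45 degrees: {diagonal:.2f} m")
```

One scalar per pixel can only capture one of these values. A view-dependent version needs the ray direction as well, which is where the extra cost comes in.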

As for multiple depth layers... Has that been tested at all? Both for shadow maps and screenspace g-buffers?
Yes, this has been tested a fair bit both in research and production, but the results are not terribly great. It is *significantly* more expensive to rasterize a multilayer depth buffer (depends on the specific form, but it's always at least 3+ times more expensive) so you start off at a pretty large disadvantage. Furthermore, tracing into such structures is a huge mess... the issue is that if you have something like depth peeling, then discontinuities affect *every layer* after the one in which they occur, so you end up having to do messy spatial searches in all layers to try and reconstruct what was actually a continuous surface, but is now "offset" in the data structure due to a surface unrelated to it. Representations that sample depth instead of encoding a list (layered VSMs, Fourier opacity maps, etc.) are significantly easier to deal with, but are practically limited to relatively short light ranges and thus are generally not suitable for directional lights.
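Here's a toy illustration of the depth-peeling "offset" problem. The function names and the 1D "pixels" are illustrative assumptions. Layer k holds each pixel's k-th nearest fragment, and on a GPU each extra layer means another full rasterization pass. In this example a continuous floor lands in different layers at neighbouring pixels because an unrelated thin post covers only one of them.

```python
def depth_peel(fragments_per_pixel, num_layers):
    """Emulate depth peeling: layer k stores each pixel's k-th nearest fragment
    depth, or None if the pixel has fewer than k + 1 fragments."""
    sorted_frags = [sorted(frags) for frags in fragments_per_pixel]
    return [
        [frags[k] if k < len(frags) else None for frags in sorted_frags]
        for k in range(num_layers)
    ]


def layer_of(layers, pixel, depth, tolerance):
    """Find which layer holds a surface near `depth` at `pixel`: the per-pixel
    search a tracer must do when a surface jumps between layers."""
    for k, layer in enumerate(layers):
        d = layer[pixel]
        if d is not None and abs(d - depth) <= tolerance:
            return k
    return None


# A floor at depth ~5 under three pixels; a thin post at depth 2 covers pixel 1 only.
fragments = [[5.0], [2.0, 5.1], [5.2]]
layers = depth_peel(fragments, num_layers=2)
print(layers)
print([layer_of(layers, px, 5.1, tolerance=0.2) for px in range(3)])
```

The floor lives in layer 0, then layer 1, then layer 0 again, so a tracer walking layer 0 sees a false jump to the post and has to search the other layers to recover the floor.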

These days it's hard to imagine that you could add 2-8x more cost to shadow maps and still come out with any sort of significant win vs. just switching to triangle ray-traced shadows, even with Nanite.

Awesome response. Thanks a lot.
I don't know that I've heard putting something like thickness directly in the gbuffer considered much. Usually the cases that cause the most issues are things like telephone poles and fences and such which have significant sizes in one or two dimensions, but are very thin in another one. Thus my intuition says we'd need some sort of view-dependent approximation, which is starting to sound kind of expensive...
Sampling against an SDF to derive hints?
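Here's a speculative sketch of the SDF idea. Nothing in the thread confirms that any engine does this, and the names, the infinite-cylinder "pole", and the two-sided sphere-tracing scheme are all assumptions. It estimates view-dependent thickness by sphere-tracing the SDF forward to find where the ray enters, then tracing backward from a far point to find where it leaves. Each query costs two traces. If the ray crosses several objects, the result spans from the first entry to the last exit.

```python
import math


def sphere_trace(sdf, origin, direction, max_dist, eps=1e-4, max_steps=256):
    """Distance along the ray to the first surface hit, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
        if t > max_dist:
            return None
    return None


def sdf_thickness(sdf, origin, direction, max_dist):
    """Entry-to-exit distance along the ray, found by tracing from both ends."""
    t_in = sphere_trace(sdf, origin, direction, max_dist)
    if t_in is None:
        return 0.0
    far = tuple(o + max_dist * d for o, d in zip(origin, direction))
    t_back = sphere_trace(sdf, far, tuple(-d for d in direction), max_dist)
    if t_back is None:
        return 0.0
    return max(0.0, (max_dist - t_back) - t_in)


def pole_sdf(p, radius=0.15):
    """Infinite vertical cylinder (z is up) standing in for a telephone pole."""
    x, y, _ = p
    return math.hypot(x, y) - radius


side_on = sdf_thickness(pole_sdf, (-5.0, 0.0, 1.0), (1.0, 0.0, 0.0), max_dist=10.0)
print(f"side-on thickness: {side_on:.3f}")
```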
The Ark and MGS3 remakes shown on the Xbox stream are both using UE5. Shame that MGS Δ: Snake Eater isn't using Fox Engine, but there it is.

We'll see what performance they deliver. The Ark developers have a long history with UE, so hopefully they avoid some of the issues the first round of titles showed.

Manor Lords is coming to PC Game Pass next year. Also UE5, from a solo dev!
