Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Yes, that'd be the most straightforward answer. But AFAIK Lumen doesn't trace against the full-polygon Nanite models; it uses low-poly proxy meshes instead.

So IDK, with this technique the difference in traced polygon count between Metro and Nanite-enabled games shouldn't be that drastic.

Are you so sure about that?

With respect, I think you guys need to read more details about how the various paths work if you want to engage in this discussion. It's not a software implementation of triangle RT, it uses completely different data structures with different tradeoffs.

Too many people are going off of very simplistic pre-conceived notions of what UE5, Nanite and Lumen are doing, how they work, and how they interact with existing hardware solutions.

Regards,
SB
 
If I were Epic, I would really try to get in touch with 4A Games, maybe there is something you guys are missing (though again I'm only a layman here).
If you know you're a layman, then why do you make such (quite amateurish / disrespectful) proposals?
Even I know how 4A's lighting stuff works, as they were quite open about it. Epic knows too, we can be sure of that.

I still don't know how Lumen works, so I can't say anything technical.
But there is a difference between a small engine made by a small team, primarily for just one game, with full control over its content,
and a general-purpose engine, which can't make many assumptions about content and games at all.
We cannot expect the latter to be as optimized for a specific case as the former, because that's simply not possible.
That's a big reason why I hope we'll still see custom engines in the future, and why I respect such devs.
But this does not mean UE devs care less about optimization, or deserve less respect.

It's also not fair to expect software devs to magically fix the flaws of HW raytracing and its APIs. We can't.
If you want improvements, join my whining and blame NV and Microsoft for the shortsighted and inflexible options they gave us.
 
If you know you're a layman, then why do you make such (quite amateurish / disrespectful) proposals?
I want to stress that this was not meant to be a disrespectful comment in any way or form. Rather, I would like the industry to work together and share ideas on how to tackle very complex topics such as HW-RT. You can have all the super talented devs in the world; sometimes all it takes is one person from a different company who has an entirely different perspective on the matter than your team.

UE5 is still a great engine made by very talented people, I want to make that very clear. It's just that from an end-user perspective, I see the RTGI in Metro EE running so fast for a similar level of quality, and I just wonder why Lumen can't have this performance. And then I combine it with the fact that in the past, UE4-powered games by far had the most performance-intensive RT integrations out there, for little to no visual benefit compared to other solutions from studios with different engines. Lumen is much improved from that, so that's a step in the right direction. But I can't help but wonder if there is any potential left to optimize to get it running with HW-RT at 60 FPS on lower-end GPUs and consoles.
 
But to me as an end user, I don't see any tradeoffs with the superb raytraced global illumination used in Metro Exodus Enhanced Edition compared to Lumen, which also uses triangle-based raytracing. In terms of quality they look(!) pretty much equal and both offer multi-bounce GI. Yet, the RTGI in Metro seems to be way faster than Lumen, running easily at 60 FPS on consoles.
I'm not super familiar with the specifics (keep in mind I don't work on Lumen), but wasn't their stuff diffuse only on consoles? That's a *far* simpler problem than specular reflections and something you can easily solve with a variety of methods (in Unreal or otherwise). Most of the demos with any significant GI they show are in small, static spaces, which is great for some games but dodges most of the issues with open world GI stuff, as you can precompute a lot of the BVHs. With mostly static geometry, arguably stuff like Enlighten was *far* cheaper for similar quality as well.

I would not be surprised if you could set up Lumen to provide similar quality/performance for such a situation if desired. That said, ironically diffuse-only GI is a case where RT doesn't make as much sense. It's not terrible, but it's also not really as appropriate as SDFs or other structures that can support direct area queries.

Ultimately I'd mainly just caution that you need to really compare this stuff on identical content, because there are a myriad of tradeoffs. Stuff that may not be obvious, like the amount of deforming geometry represented in sharp specular reflections and the like, is actually what's primarily responsible for the performance characteristics.
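To give a rough idea of what I mean by structures that support direct area queries: with an SDF you can approximate a cone query by sphere-tracing the field and accumulating partial occlusion, instead of firing lots of individual rays. A toy sketch of that idea (my own illustration, not Lumen code):

// Minimal sketch of a cone-traced soft-visibility query against a signed
// distance field, in the spirit of SDF-based diffuse GI. My own illustration,
// not code from Lumen or any engine.
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };
static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Example scene SDF: a single sphere of radius 1 at the origin.
static float sceneSDF(Vec3 p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Returns approximate visibility in [0,1] along a cone: 1 = unoccluded.
// coneTan controls how "blurry" the query is (larger = softer / more diffuse).
static float coneTraceVisibility(Vec3 origin, Vec3 dir, float maxDist, float coneTan) {
    float visibility = 1.0f;
    float t = 0.05f;                       // small offset to avoid self-hit
    for (int i = 0; i < 64 && t < maxDist; ++i) {
        Vec3 p = add(origin, mul(dir, t));
        float d = sceneSDF(p);
        float coneRadius = coneTan * t;    // cone widens with distance
        // Partial occlusion when the surface comes within the cone radius.
        visibility = std::min(visibility, std::max(d / coneRadius, 0.0f));
        if (visibility <= 0.0f) break;     // fully blocked
        t += std::max(d, 0.01f);           // sphere-trace step
    }
    return visibility;
}

int main() {
    // Query from a point beside the sphere, looking past it: partially occluded.
    float v = coneTraceVisibility({0.0f, -2.0f, 0.5f}, {0.0f, 1.0f, 0.0f}, 10.0f, 0.25f);
    std::printf("approximate visibility: %.2f\n", v);
    return 0;
}

One such query stands in for a whole bundle of rays, which is why this kind of structure suits blurry/diffuse effects.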
 
You can have all the super talented devs in the world; sometimes all it takes is one person from a different company who has an entirely different perspective on the matter than your team.
I should know, as I claim to be such a person. Once I'm ready to prove my point, I'll get in touch with the industry.
From my perspective I can assure you there is no secret or even innovation in 4A's approach, looking at it from a general view. Of course they'll have their tricks and optimizations they do not share in their GDC talks, but in general their approach is clear and nothing new - it's just a combination of well-known techniques.
It's just that from an end-user perspective, I see the RTGI in Metro EE running so fast for a similar level of quality, and I just wonder why Lumen can't have this performance.

I totally understand how you got to this point. Personally I like 4A's approach to raytracing the most. It's the best compromise you can make right now using state-of-the-art technology. They use something like RTXGI, which sucks and is a terrible approximation, besides being mindless brute force. But to compensate for this shortcoming, they use 'path tracing' for the first bounce, so the inaccuracy is toned down to 'just' the secondary bounces. (Thus, they do not really use path tracing, because there are no paths; the dynamic volume probes replace the paths with a simple lookup to deliver an approximated solution for the bounces.)
It's not perfect, but actually nobody (except me, as I claim) knows better. So I like that. (And I had proposed this approach here on this forum long before 4A used it - it's an obvious and trivial idea.)

However, the approach suffers from the need to place probe volume grids manually. Besides the manual work, it also does not scale. If you have big worlds, you'll likely use very low-resolution grids for the outdoor landscape, and the GI solution degrades to very low quality, up to the point where you can see no improvement between vanilla Exodus and RT Exodus. That's actually the case.
Thus, the solution is not general, and not an ideal choice for Epic.
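To sketch the idea in code (a toy illustration of my understanding, not 4A's actual implementation): trace the first bounce exactly, then replace all further bounces with a probe-grid lookup.

// Toy sketch of "one traced bounce + probe lookup for the rest". Names and
// structures are made up for illustration only.
#include <cstdio>

struct Vec3 { float x, y, z; };

struct ProbeGrid {
    // Imagine a 3D grid of probes, each storing pre-integrated irradiance.
    // Reduced here to a single constant for brevity.
    Vec3 sample(Vec3 /*worldPos*/, Vec3 /*normal*/) const {
        return {0.2f, 0.2f, 0.25f};   // stand-in "multi-bounce" ambient term
    }
};

struct Hit { bool valid; Vec3 position, normal, albedo, directLight; };

// Stand-in for a real ray cast (HW-RT or otherwise).
static Hit traceRay(Vec3 /*origin*/, Vec3 /*dir*/) {
    return {true, {0, 1, 0}, {0, 1, 0}, {0.8f, 0.8f, 0.8f}, {1.0f, 0.9f, 0.8f}};
}

// One-bounce GI for a shading point: trace the first bounce exactly,
// then approximate all further bounces with a probe-grid lookup.
static Vec3 oneBouncePlusProbes(Vec3 p, Vec3 dir, const ProbeGrid& probes) {
    Hit h = traceRay(p, dir);
    if (!h.valid) return {0, 0, 0};                        // sky handled elsewhere
    Vec3 indirect = probes.sample(h.position, h.normal);   // bounces 2..N
    return { h.albedo.x * (h.directLight.x + indirect.x),
             h.albedo.y * (h.directLight.y + indirect.y),
             h.albedo.z * (h.directLight.z + indirect.z) };
}

int main() {
    ProbeGrid probes;
    Vec3 gi = oneBouncePlusProbes({0, 0, 0}, {0, 1, 0}, probes);
    std::printf("GI: %.2f %.2f %.2f\n", gi.x, gi.y, gi.z);
    return 0;
}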

From what I know about Lumen, it's way more complex. The effort and amount of innovation is way higher. Their solution is a mix of many techniques, approximations and hacks: volume probe grids, attempts at surface caching, refinement in screen space, various tracing paths ranging from the global SDF to per-model SDFs, up to dynamic meshes with HW RT.
It's insanely complex, at very high cost. I am sure they are not happy with it.
But they try, and nobody else does better. It's still an open problem, and even though I'm disappointed the Matrix demo runs at 20 fps on my 10 TF GPU, it's still impressive they make this monster work at all.
It's also fully automated, and no manual placement is needed. That's simply important for production, even if we pay attention only to final FPS. No practical production - no games at all.

So you see I basically agree. I hope I did not sound too arrogant within my parentheses. I claim to have solved GI, but I have already spent years on making it a practical and general solution, speaking of fully automated tools. I might still fail at that, turning a decade of work useless and leaving me under the bridge.
It's way more difficult than the end user can see, which applies to game development as a whole.
We just make silly games, but it's hard work. Thus - even if we fail on so many things - it sucks that everybody always points fingers at us, calling us lazy, greedy, technically incompetent, socially incompetent, demotivated, and whatever else. So I just wanted to give the Epic guys some backing here. No big problem - just saying ; )

(my comments on various techniques are based on assumptions and may be partially incorrect)
 
So that's why I am wondering why Epic cannot achieve similar stuff for the HW-RT path in Lumen, and why Epic chooses to ditch hardware RT for the mode that everyone will use on console (the 60 fps mode).
I did mention this in a previous reply but in case you missed it - Epic is not ditching anything or forcing anyone to do anything. Those slides you are basing your rage on are literally just a high level recommendation for various performance targets, but users of the engine are free to make whatever tradeoffs they like. All the options are there, and as I mentioned default scalability settings are almost always modified by licensees because it's impossible to come up with a set that works for all content.

For me, the discussion board exists to inform and educate in a sociable learning structure.
That's fine and I didn't mean to pick on you specifically (just the comment you had was a good example). I'd definitely suggest broadly that if this is the goal please try and phrase your comments more as questions rather than coming out with rage based on a presentation that wasn't even targeted at end users. It's great if people want to learn more and I'm happy to help inform, but please be polite and minimize passing judgement and getting mad at things that you may not understand even at a high level.

I still don't know how Lumen works, so I can't say anything technical.
But there is a difference between a small engine made by a small team, primarily for just one game, with full control over its content,
and a general-purpose engine, which can't make many assumptions about content and games at all.
We cannot expect the latter to be as optimized for a specific case as the former, because that's simply not possible.
Yes this is perhaps a better way to say what I've been trying to get at. UE is providing multiple options that can be mixed and matched and specialized for a given game. You can almost certainly attain similar tradeoffs to Metro or other things by playing with the settings (or modifying the code a bit if necessary). It is both expected and often required that people will modify all of these finer grained settings to fit their use cases better.

That said, it's always great to see other takes on how to do things. We all build on other people's work explicitly and implicitly. None of these techniques come out of nowhere.

I want to stress that this was not meant to be a disrespectful comment in any way or form. Rather, I would like the industry to work together and share ideas on how to tackle very complex topics such as HW-RT. You can have all the super talented devs in the world; sometimes all it takes is one person from a different company who has an entirely different perspective on the matter than your team.
You can be assured this is the case. The very fact that UE is developed out in the open on GitHub, and you get to see work I did 2 days ago, should be ample proof of Epic's commitment to this. But furthermore I have never worked for a place that has been particularly secretive about rendering tech... that is very much a group effort as far as game developers go. Certain IHVs do occasionally try to fight against the openness that otherwise is common (I doubt anyone has to guess on this one...), but so far the game industry has been fairly resilient to that sort of insular behavior and I very much hope that will continue.

Not to derail this thread but as someone who has worked on both sides of the fence, I am far more concerned about IHVs owning any significant parts of the rendering stack than any engine or rendering middleware, as the former provably and consistently misbehave because their profit models are based on it. Even with game engines the rendering part is a much smaller part of the whole package and everyone is pretty aware that we effectively work together as one community to evolve it across the industry rather than particularly specifically to one place.

E.g. Nanite and Lumen are not secrets - the very presentations you reference spell out all the details, and you can go read the code if you want more. The value is in all the tooling and R&D and engine stuff around the techniques rather than the techniques themselves, which is what you'll also hear when you read stuff from licensees who use Unreal Engine but replace decent chunks of the renderer, etc.

But I can't help but wonder if there is any potential left to optimize to get it running with HW-RT at 60 FPS on lower-end GPUs and consoles.
There's always potential for optimization in any system. It's just not as simple as "HW-RT at 60fps" because it's so fundamentally content-dependent. You can absolutely do HW-RT at 60fps now if you are careful with your content and make sacrifices in other areas. There's no hard switch like you seem to be implying from your reading of that presentation.

Sorry to reiterate the same points a few times, but I want to try and be very clear.
 
Now that MS and Sony consoles, AMD GPUs, Intel GPUs, Nvidia GPUs and even Samsung phone SoCs all have hardware-based ray tracing, that sure has to mean something, right? Or am I misunderstanding things? :p If SW-RT is better, then yeah, so what; it's not like modern GPUs lack raw compute capability - we're up in the, what, 80 TF range next year?
 
RT shadows as in the engine today are unrelated to Lumen. RT shadows were indeed only used for the large area lights in the cutscenes (as shadow map methods can't really handle that kind of thing), whereas virtual shadow maps were used for the main directional light.

And yes the Matrix demo on consoles used HW-RT for Lumen and targeted 30fps.

Each game is going to have to find the best tradeoffs for the content and target frame rate/quality level. For Matrix I think it made sense to aim for the really cinematic settings, and there was a lot of content to show off mirror-like reflections and so on. For other projects, probably not so much. I don't imagine either will be clearly the better solution for all cases in the medium term, as there are too many tradeoffs that vary in terms of the costs of maintaining the various data structures and so on.

I don't think it's fair to say that SW-RT "lacks features" either. It obviously also implements reflections and indirect visibility and all of the other things; it's just that the data structure it samples to resolve the rays has different tradeoffs. It's generally very appropriate for blurry/diffuse type effects, and less appropriate for sharp reflections and so on. RT however has its own overheads both in terms of tracing and - often ignored but probably even more important - building and maintaining the BVHs. The latter is normally the actual limitation rather than the tracing performance in my experience. It's also totally fair to note that the HW-RT path often "skips" many things in the name of performance as well, including significant amounts of deformation (vertex animation, skinning, etc.), alpha testing/translucency and far field geometry. There are many considerations to both methods beyond just a single toggle on/off.

Very little of this is unique to Unreal Engine either. In cases where you find things that do significantly better on the performance/quality/whatever axis, they will be making tradeoffs on the other axes. That is the nature of the beast; at least for this generation games will have to make a series of content-specific tradeoffs in these areas.

[Edit] Also no one is "getting rid of any modes". The slides there are just talking about the default scalability options. Most developers can and do change those to suit their title and platforms. They are just general guidelines to start from. If you want to make a 60fps HW-RT game and are willing to make whatever other sacrifices are necessary to get to that performance level, nothing is stopping you.

Great, and we also learn in the presentation that the Land of Nanite demo is now running at 60 fps on PS5. Lumen went from 8 ms down to 4 ms and the lighting improved too. And I was impressed by the Tekken 8 UE5 teaser.
 
Now that MS and Sony consoles, AMD GPUs, Intel GPUs, Nvidia GPUs and even Samsung phone SoCs all have hardware-based ray tracing, that sure has to mean something, right? Or am I misunderstanding things? :p If SW-RT is better, then yeah, so what; it's not like modern GPUs lack raw compute capability - we're up in the, what, 80 TF range next year?
Sadly none of this helps against the API limitations ruling out any continuous LOD solution. Maybe we could rebuild the BVH for the whole scene every frame if GPUs were 1000 times more powerful, but even then it would just be a waste.

So the question is not which method is better for which specific application, but rather: 'Why the fuck is geometry for HW RT assumed to have static topology, although there is no technical reason this limitation should exist?'

The problem is completely unrelated to other topics like sympathizing more with flexible SW or accelerated HW, or judging how practical realtime raytracing is at all right now. It's a purely technical / API problem which requires a solution no matter what our individual mindset or opinion is.
But sadly, the related discussion is always dominated by such unrelated subjective arguments, which obscures the true problem and only helps to postpone an eventual solution, if anything.

People should focus on their common interests, of which there are plenty, instead of arguing about ideological differences. That's the only way, no matter if we want better tech, or to save the planet, or whatever else.
IDK why this is so hard. Probably the problem is identifying those common interests, or arguing just being more fun.
 
Sadly none of this helps against the API limitations ruling out any continuous LOD solution. Maybe we could rebuild the BVH for the whole scene every frame if GPUs were 1000 times more powerful, but even then it would just be a waste.

With this though, everyone's in the same boat.
 
That's the beauty I love about consoles. Even though they are weaker compared to high-end PCs, developers can squeeze more out of the given hardware than they can on a PC.

It's not the 6th generation anymore... While still somewhat true, it's by no means what it used to be. Actually, we can see an almost linear match between hardware even with the last generation.
Take 2077, the most taxing multiplatform game. Here in this video the guy is running an HD 7850 2GB with a slight OC, almost exactly matching the PS4's raw theoretical compute performance. Both GPUs are GCN. The game running on the HD 7850 actually performs better. While one could blame the CPU, the weakest link in this game is going to be the GPU.
I have tested this myself on an FX-8350 (stock) and a 920 with a 7870, and yes, it outperforms the base PS4 in this and many other games. Now we have to remember that 2GB is a huge limiting factor; 4GB variants of these GPUs fare much better.
Now with 1.6 and a bit of tweaking, these 2012 GCN midrange GPUs do surprisingly well, not just in 2077 but in other games as well.


Also of note: in the name of optimization on the consoles, that usually means lower-than-lowest values for certain settings compared to the PC versions - optimizations you can achieve on the PC too in many instances, via mods/tweaks. It can be quite hard to exactly match the console settings; like what we see in Spider-Man, even with the same settings the PS5 is running a setting at '8' whereas the PC won't really go lower than 10. A lot of the time consoles use dynamic resolution more aggressively as well.
More often than not, these optimizations nowadays are mostly fine-tuning custom settings. Yes the consoles have less overhead, but it's quite minimal compared to what it used to be.
For ports native to another platform that's a different story, mostly.
 
Sadly none of this helps against the API limitations ruling out any continuous LOD solution. Maybe we could rebuild the BVH for the whole scene every frame if GPUs were 1000 times more powerful, but even then it would just be a waste.
Where does BVH construction happen, CPU or GPU? I thought it was CPU?

The rest of your argument comes to how to accelerate space traversal. To date, the best ideas people have are trees, and using them shifts the cost for tracing from look-up to tree maintenance. I guess going forwards, now that tracing against these structures is 'fast enough', the interest will become how to create and maintain trees or other lookup structures most optimally, which is closely tied with how to optimally access RAM. A breakthrough data-structure would be very welcome!
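To illustrate the look-up side for readers following along, here's a generic, vendor-neutral BVH traversal sketch; the "maintenance" cost is everything needed to keep these boxes and links valid as geometry changes.

// Generic sketch of BVH traversal to show where the per-ray cost goes.
// Not any particular vendor's format.
#include <cstdio>
#include <vector>

struct AABB { float min[3], max[3]; };

struct BVHNode {
    AABB bounds;
    int left, right;     // child node indices, -1 for leaf
    int primitive;       // primitive index if leaf, else -1
};

static bool rayIntersectsAABB(const float o[3], const float invD[3], const AABB& b) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int i = 0; i < 3; ++i) {
        float t0 = (b.min[i] - o[i]) * invD[i];
        float t1 = (b.max[i] - o[i]) * invD[i];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
    }
    return tmin <= tmax;
}

// Stack-based traversal: each visited node is a ray-box test plus child
// fetches from scattered memory -- the "random access" pattern discussed below.
static int traverse(const std::vector<BVHNode>& nodes, const float o[3], const float d[3]) {
    float invD[3] = {1.0f / d[0], 1.0f / d[1], 1.0f / d[2]};
    int stack[64], sp = 0, hits = 0;
    stack[sp++] = 0;                                   // root
    while (sp > 0) {
        const BVHNode& n = nodes[stack[--sp]];
        if (!rayIntersectsAABB(o, invD, n.bounds)) continue;
        if (n.primitive >= 0) { ++hits; continue; }    // leaf: would test the triangle here
        stack[sp++] = n.left;
        stack[sp++] = n.right;
    }
    return hits;
}

int main() {
    // Two leaves under one root.
    std::vector<BVHNode> nodes = {
        {{{-1, -1, -1}, {1, 1, 1}}, 1, 2, -1},
        {{{-1, -1, -1}, {0, 1, 1}}, -1, -1, 0},
        {{{ 0, -1, -1}, {1, 1, 1}}, -1, -1, 1},
    };
    float o[3] = {-2, 0, 0}, d[3] = {1.0f, 0.0001f, 0.0001f};
    std::printf("candidate leaves hit: %d\n", traverse(nodes, o, d));
    return 0;
}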
 
A breakthrough data-structure would be very welcome!
I do not expect breakthroughs on a basic building block such as trees. There are subtle improvements - I remember 'MBVH' for example - but it's been more or less the same concept for decades.
Regarding optimal RAM access, it is natural that traversing trees causes random access, because it skips shitloads of work by doing so. Every cache miss saves us tons of work.
Of course there are ways to adapt, e.g. using trees with 64 children, so all threads of a GPU wavefront can read from linear memory and each process one child node in parallel.
But that's again just a variation of the same basic concept, which won't change.

Remember the memory access problem with classic RT comes from rays / threads traversing different portions of the tree. And the potential solution to this is reordering the rays to minimize it.
Thus the solvable problem is not about the tree data structure, but about the traversal algorithm and its desired combination with reordering.
Even after they tackle and solve this in hardware, we will still use tree data structures, I'm sure.

But none of this is my argument.
I want to have read / write access to the tree data structure, so I can make minor changes to enable continuous LOD of the geometry, and so I can stream it from disk instead of taxing both CPU and GPU with constantly building BVHs even for static models.
My argument is that this should be possible because it is required.
There is no HW acceleration for building trees, and even if there was, streaming and modifying would still be faster than building. This won't change, even if Jensen were wrong and Moore's Law were NOT dead.
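To make the request concrete, here is a purely hypothetical interface of the kind I mean - nothing like this exists in DXR or Vulkan today, and all names are invented:

// Purely hypothetical sketch of the kind of access I'm asking for.
// Nothing here corresponds to an existing DXR or Vulkan API.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Imagine the vendor specifying its node layout so tools can produce it offline.
struct SpecifiedBVHNode {
    float boundsMin[3], boundsMax[3];
    uint32_t childOrPrimOffset;
    uint32_t childCount;          // 0 = leaf
};

class EditableBVH {
public:
    // Stream a prebuilt subtree from disk instead of rebuilding on the GPU.
    void uploadSubtree(uint32_t nodeOffset, const std::vector<SpecifiedBVHNode>& nodes) {
        if (storage.size() < nodeOffset + nodes.size())
            storage.resize(nodeOffset + nodes.size());
        for (std::size_t i = 0; i < nodes.size(); ++i)
            storage[nodeOffset + i] = nodes[i];
    }
    // Minor in-place edit, e.g. swapping a cluster for a finer or coarser LOD.
    void patchNode(uint32_t index, const SpecifiedBVHNode& node) { storage[index] = node; }
    std::size_t nodeCount() const { return storage.size(); }
private:
    std::vector<SpecifiedBVHNode> storage;   // would live in GPU memory for real
};

int main() {
    EditableBVH bvh;
    std::vector<SpecifiedBVHNode> streamedCluster(8);   // pretend this came from disk
    bvh.uploadSubtree(0, streamedCluster);
    bvh.patchNode(3, SpecifiedBVHNode{{0, 0, 0}, {1, 1, 1}, 42, 0});  // refine one cluster
    std::printf("resident nodes: %zu\n", bvh.nodeCount());
    return 0;
}

The point is only that the node format would be specified and writable by the application, not that this exact interface is the right one.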

To date, the best ideas people have are trees, and using them shifts the cost for tracing from look-up to tree maintenance.
This somehow misses the point, I think, because there is no option for a constant-time lookup as long as memory is finite.
Plus, our options for tree maintenance do not actually exist, which is the problem. I can say 'build it for me', and I can say 'refit it for me'. This gives me power, but no options. If I want options, I need to do the actual work myself, so I can exercise options while doing so.

You are somewhat hopeful or optimistic a magic new data structure will replace trees.
If this were a realistic expectation, and GPU makers swapped out their data structures and algorithms every year, then my request would be somewhat impractical.
But that's not the case. It was and will remain trees, so nothing speaks against specifying their individual formats and exposing them to devs.
It's our data, so it is our right to access it.
 
Where does BVH construction happen, CPU or GPU? I thought it was CPU?
Lacking experience, I don't know precisely either, but the bulk of the work happens on the GPU. Devs often show their profiler output, where BVH builds may take something like 2-4 ms of async compute. Depends on the game, of course.
I also know from claims of tech sites benchmarking HW and games that RT is taxing on the CPU as well, but I can only make assumptions about the reason.
Possibly it's common to build the TLAS on the CPU and the BLAS on the GPU, or something like that. It may vary across GPU vendors and their drivers. But I'd like to know better too, if anybody can answer.
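From the API side at least, the DXR build call itself is recorded on a command list, so it executes on the GPU timeline; the CPU mostly prepares descriptors and buffers. A rough fragment (resource creation, barriers and error handling omitted; sketch only):

// Rough D3D12/DXR fragment: the BLAS build is a GPU command.
#include <d3d12.h>

void RecordBlasBuild(ID3D12Device5* device,
                     ID3D12GraphicsCommandList4* cmdList,
                     D3D12_GPU_VIRTUAL_ADDRESS vertexBuffer, UINT vertexCount,
                     ID3D12Resource* scratch, ID3D12Resource* dest)
{
    D3D12_RAYTRACING_GEOMETRY_DESC geom = {};
    geom.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geom.Triangles.VertexBuffer.StartAddress = vertexBuffer;
    geom.Triangles.VertexBuffer.StrideInBytes = 12;          // float3 positions
    geom.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
    geom.Triangles.VertexCount = vertexCount;

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS inputs = {};
    inputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
    inputs.Flags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_BUILD;
    inputs.DescsLayout = D3D12_ELEMENTS_LAYOUT_ARRAY;
    inputs.NumDescs = 1;
    inputs.pGeometryDescs = &geom;

    // CPU-side work: the driver only reports required buffer sizes here...
    D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO info = {};
    device->GetRaytracingAccelerationStructurePrebuildInfo(&inputs, &info);

    // ...while the actual build is recorded as a GPU command, often on an async queue.
    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC build = {};
    build.Inputs = inputs;
    build.ScratchAccelerationStructureData = scratch->GetGPUVirtualAddress();
    build.DestAccelerationStructureData = dest->GetGPUVirtualAddress();
    cmdList->BuildRaytracingAccelerationStructure(&build, 0, nullptr);
}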
 
Where does BVH construction happen, CPU or GPU? I thought it was CPU?

The rest of your argument comes to how to accelerate space traversal. To date, the best ideas people have are trees, and using them shifts the cost for tracing from look-up to tree maintenance. I guess going forwards, now that tracing against these structures is 'fast enough', the interest will become how to create and maintain trees or other lookup structures most optimally, which is closely tied with how to optimally access RAM. A breakthrough data-structure would be very welcome!


I guess it can happen on either the CPU or the GPU, depending on the implementation. There are at least a few blog articles from Nvidia about building BVHs on GPUs, and I wouldn't be surprised if that's the most common way to do it.


A good explanation starts at the 5 min mark. There is even a comparison between the efficiency of BVH traversal in a compute shader vs Intel Xe cores.
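For anyone curious, the Nvidia articles build the tree by sorting primitives along a Morton (Z-order) curve and then deriving the hierarchy in parallel; the Morton code itself is the standard bit-interleaving trick. A CPU paraphrase of that building block:

// Standard 30-bit 3D Morton code, the key used by LBVH-style GPU builders.
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Spread the lower 10 bits of v so there are two zero bits between each bit.
static uint32_t expandBits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// x, y, z are expected in [0,1]; the result orders points along a Z-curve.
// Sorting primitives by this key groups spatially close ones together,
// after which the tree topology can be emitted in parallel.
static uint32_t morton3D(float x, float y, float z) {
    x = std::min(std::max(x * 1024.0f, 0.0f), 1023.0f);
    y = std::min(std::max(y * 1024.0f, 0.0f), 1023.0f);
    z = std::min(std::max(z * 1024.0f, 0.0f), 1023.0f);
    uint32_t xx = expandBits(static_cast<uint32_t>(x));
    uint32_t yy = expandBits(static_cast<uint32_t>(y));
    uint32_t zz = expandBits(static_cast<uint32_t>(z));
    return xx * 4 + yy * 2 + zz;
}

int main() {
    std::printf("morton(0.5, 0.25, 0.75) = 0x%08x\n", (unsigned)morton3D(0.5f, 0.25f, 0.75f));
    return 0;
}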
 
There are at least a few blog articles from Nvidia about building BVHs on GPUs, and I wouldn't be surprised if that's the most common way to do it.
Pretty sure they use something like that. Although the shown algorithm does a lot of searching work just to get rid of the need for a dispatch and barrier per tree level, making it a typical NV proposal of solving problems with raw power instead of work-efficient algorithms. If we can run such a task async, it's likely better to accept the cost of the barriers and do less work, because the other async tasks should still be able to saturate the GPU.
Now, assuming recent NV GPUs finally got async compute right, the article is maybe outdated in parts.

But ignoring such details, I expect they still use similar methods, which give fast builds but low-quality trees.
So maybe the CPU is used to compensate, by using more advanced methods (e.g. including SAH) to build the top levels of each BLAS, or maybe just the TLAS, at higher quality.
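For context on build quality: the usual metric is the surface area heuristic (SAH), which scores a candidate split by each child's surface area (relative to the parent) weighted by its primitive count; the fast GPU builders tend to skip or only approximate this. A minimal version of the cost function:

// Minimal surface area heuristic (SAH) cost for one candidate split.
#include <cstdio>

struct AABB { float min[3], max[3]; };

static float surfaceArea(const AABB& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    return 2.0f * (dx * dy + dy * dz + dz * dx);
}

// Expected traversal cost of splitting a node into (left, right):
// probability of hitting each child (area ratio) times its primitive count,
// plus a constant cost for the traversal step itself.
static float sahCost(const AABB& parent, const AABB& left, int leftCount,
                     const AABB& right, int rightCount,
                     float traversalCost = 1.0f, float intersectCost = 1.0f) {
    float invParentArea = 1.0f / surfaceArea(parent);
    return traversalCost +
           intersectCost * (surfaceArea(left)  * invParentArea * leftCount +
                            surfaceArea(right) * invParentArea * rightCount);
}

int main() {
    AABB parent = {{0, 0, 0}, {2, 1, 1}};
    AABB left   = {{0, 0, 0}, {1, 1, 1}};
    AABB right  = {{1, 0, 0}, {2, 1, 1}};
    std::printf("SAH cost of mid split: %.2f\n", sahCost(parent, left, 4, right, 4));
    return 0;
}

A good builder evaluates many candidate splits against this cost; a fast LBVH-style builder mostly doesn't, which is where the quality gap comes from.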
 
Sadly none of this helps against the API limitations ruling out any continuous LOD solution. Maybe we could rebuild the BVH for the whole scene every frame if GPUs were 1000 times more powerful, but even then it would just be a waste.

So the question is not which method is better for which specific application, but rather: 'Why the fuck is geometry for HW RT assumed to have static topology, although there is no technical reason this limitation should exist?'
Can you define "continuous"? DXR with RTX allows updating pointers to the correct mesh LOD during top-level rebuilds, which most games do every frame anyway. Nvidia has even implemented stochastic LOD, but it is limited to 8 levels of transitions between LODs.

Are you arguing that DXR doesn't allow for enough transition levels between meshes, or that the solution itself isn't performant enough?
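For other readers, "updating pointers to the correct mesh LOD during top-level rebuilds" in DXR terms means roughly the following: each TLAS instance descriptor carries the GPU address of a BLAS, so you can point it at whichever prebuilt LOD you want each frame. The descriptor fields below are real DXR; the selection logic and helper types are invented for illustration.

// Sketch of per-instance LOD selection when refilling TLAS instance
// descriptors each frame.
#include <d3d12.h>
#include <cmath>
#include <cstddef>
#include <vector>

struct MeshLODs {
    // One prebuilt BLAS per discrete LOD (hypothetical bookkeeping struct).
    std::vector<D3D12_GPU_VIRTUAL_ADDRESS> blasPerLod;
};

static UINT selectLod(float distance, UINT lodCount) {
    // Crude distance-based pick; a real engine would use screen-space error.
    // Assumes lodCount >= 1.
    UINT lod = static_cast<UINT>(std::log2(1.0f + distance / 10.0f));
    return lod < lodCount ? lod : lodCount - 1;
}

// Fill the instance array that the TLAS rebuild consumes this frame.
static std::vector<D3D12_RAYTRACING_INSTANCE_DESC>
buildInstanceDescs(const std::vector<MeshLODs>& meshes, const std::vector<float>& distances)
{
    std::vector<D3D12_RAYTRACING_INSTANCE_DESC> descs(meshes.size());
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        D3D12_RAYTRACING_INSTANCE_DESC& d = descs[i];
        d = {};
        d.Transform[0][0] = d.Transform[1][1] = d.Transform[2][2] = 1.0f; // identity
        d.InstanceID = static_cast<UINT>(i);
        d.InstanceMask = 0xFF;
        UINT lod = selectLod(distances[i], static_cast<UINT>(meshes[i].blasPerLod.size()));
        d.AccelerationStructure = meshes[i].blasPerLod[lod];  // swap the BLAS per frame
    }
    return descs;   // upload and pass as InstanceDescs to the TLAS build inputs
}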
 
DXR with RTX allows updating pointers to the correct mesh LOD during top-level rebuilds, which most games do every frame anyway.
Honestly, rebuilding the entire TLAS is not sustainable in the long run. Nanite can (and does) handle millions of instances... the cost of rebuilding a TLAS for all of that is prohibitive and unnecessary. But even if all of this were static (which it isn't), you can't afford to bake it into BLASes, because the memory cost would be ridiculous, since these instances combine together to add all the unique detail everywhere.

The fixed two-level structure and opaque traversal/rebuilding is going to have to change to make truly "next gen" open world content feasible to ray-trace.
 
The fixed two-level structure and opaque traversal/rebuilding is going to have to change to make truly "next gen" open world content feasible to ray-trace.
Are there any working parties actually attempting this?

Traversal shaders seem to be the obvious suggestion, yet again.

I'm tempted to suggest that traversal shaders could also be used to build. Well, not perform the build, but to assess sections of the acceleration structure for their quality and generate a prompt for a section to be rebuilt. If traversal can assess distance from the camera and can be given a "quality" threshold, the assessment can be tuned in very useful ways.

I also wonder about compositing BVHs. It seems it's practical to do so, so the question is how much compositing is practical?

In the meantime, at least one of the consoles has non-opaque build/traversal because it's not constrained by DXR...
 