Unreal Engine 5, [UE5 Developer Availability 2022-04-05]

Yeah, those rocks... can't wait for artists to come up with detailed tech / sci-fi environments. I see some UE5 limitations for terrain, but I don't see any for tech stuff like that. Really jaw-dropping things ahead, I guess.
I've missed the feeling of being impressed by games, but I think it will come back very soon.
 
David Jaffe said Bend Studio worked on Days Gone 2, moving from UE to the Decima engine. And we now know it will be a new IP after the cancellation of Days Gone, but it seems it will be on Decima, not UE5.
Makes sense for any large publisher to prefer their own tech over a licensed one. And that's fine unless it's forced to the point where you can't use any tech but your own, even when it isn't well suited to the game you're making.
 
Nanite is just light years ahead of other rendering engines.
Pretty sure it is not. You can throw billions of polygons at RT too; any engine with RT support can handle similar or higher triangle counts in scenes with instancing.
This comes down to the simple fact that ray tracing scales logarithmically with the number of triangles (since the BVH is kind of a lossless LOD system by itself), so you can use this property to draw practically unlimited amounts of super-high-poly static meshes; video memory capacity is the only limitation.
I've tried primary rays with the prepass and other rasterization techniques disabled in UE5, and they work flawlessly with hundreds of millions of polygons, 100% Nanite proxies, and no LODs at all. Reflections and shadows work well in the same scenes too; there is a small issue with shadows, since Nanite loses some detail at distance while RT continues to use the highest LOD.
So I guess if a scene is not composed of clusters that are problematic for BVH builders, HW RT works just fine. There are a number of optimisations that can be applied to make Nanite clusters friendlier for RT. The real issue might be proxy sizes for large projects, so I don't think using multiple LODs for RT is a good idea.
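To put rough numbers on that log scaling (a back-of-the-envelope sketch, assuming an idealized binary BVH where traversal cost is the tree depth):

```latex
% a 1000x increase in triangle count costs only ~1.5x more traversal work:
\log_2 10^6 \approx 20, \qquad \log_2 10^9 \approx 30, \qquad
\frac{\log_2 10^9}{\log_2 10^6} \approx 1.5
```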
 
(since BVH is kind of lossless LOD system by itself)
That's simply wrong. It is only a spatial data structure for locating data; the data itself has no levels of detail.
But we can use the hierarchy of any tree to access hierarchies of levels of detail, which is what Nanite does but classical RT did not, and DXR does not yet allow.

Now you say you can trace bazillions of triangles and there is basically no more need for LOD at all. But even if that were true (it isn't: log n cost still increases as tree levels are added), we still cannot fit all of this in memory.
And it also causes aliasing. LOD is also necessary to prefilter geometry so that point-sampling methods like rasterization or RT work well enough.

You propose to solve the problem with pure HW brute-force power (assumed to keep growing forever and for free), which is the same as ignoring the problem.
I'm happy UE5 finally proves the alternative: using better algorithms gives the biggest leap seen in a decade.
 
It is only a spatial data structure for locating data; the data itself has no levels of detail.
The BVH is LOD for rays, not for geometry: instead of searching through billions of triangles, we search through just a small subset of the primitives encapsulated in the hierarchical data structure.

The main advantage of a BVH is that you don't need to simplify the original geometry to reduce the amount of work.
The main disadvantage is that it depends on scene topology to some extent, so mindlessly slapping geometry together may not always work, but those are rather corner cases and they can be worked around.
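Roughly, the "search just a small subset" idea in code (a toy CPU sketch; the node layout, names, and fixed-size stack are illustrative assumptions, not DXR or any real engine's structures, and triangle intersection is omitted):

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };

struct Node {
    AABB    box;
    int32_t left  = -1, right = -1;     // -1 on both marks a leaf
    int32_t firstTri = 0, triCount = 0; // leaf payload: range into a triangle array
};

// Slab test: does the ray (origin o, precomputed 1/direction invD) hit the box?
static bool rayHitsBox(const float o[3], const float invD[3], const AABB& b, float tMax) {
    float t0 = 0.0f, t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        float tNear = (b.lo[a] - o[a]) * invD[a];
        float tFar  = (b.hi[a] - o[a]) * invD[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        if (tNear > t0) t0 = tNear;
        if (tFar  < t1) t1 = tFar;
        if (t0 > t1) return false;
    }
    return true;
}

// Visits only nodes whose boxes the ray touches -- for a well-built tree this
// is a tiny subset of the scene, which is the point made above.
float traverse(const std::vector<Node>& nodes, const float o[3], const float invD[3]) {
    float closest = std::numeric_limits<float>::max();
    int stack[64]; int sp = 0;
    stack[sp++] = 0; // root is node 0
    while (sp > 0) {
        const Node& n = nodes[stack[--sp]];
        if (!rayHitsBox(o, invD, n.box, closest)) continue; // prune the whole subtree
        if (n.left < 0) {
            // leaf: intersect n.triCount triangles starting at n.firstTri,
            // shrinking 'closest' on a hit (omitted for brevity)
        } else {
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return closest;
}
```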

But we can use the hierarchy of any tree to access hierarchies of levels of detail, which is what Nanite does but classical RT did not, and DXR does not yet allow
Nanite uses clusters for LODs so that triangles stay close to pixel size, but not smaller. Does it really make sense to do the same with RT?
It does not, because RT performance does not depend on the pixel size of triangles, so keeping triangles at one-pixel size is pointless for RT; all the other problems can be solved with standard discrete LODs.
Moreover, the issues with Nanite and foliage (i.e. the topology of foliage) are caused exactly by the hierarchy of LODs Nanite uses: the system has no way of scaling down leaves and other parts of foliage so that the loss of detail stays invisible to us, whereas with something like RT it simply works, because there is no loss of detail in the underlying geometry.
Other than this, Nanite requires a hell of a lot of clustering, preprocessing, culling, etc., which is why it doesn't work with dynamic geometry. RT does work with dynamic geometry because it is far simpler and doesn't require expensive preprocessing (not feasible in real time).
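As a hedged sketch of what "standard discrete LODs" could look like on the RT side: one BLAS per LOD built offline, with the per-frame instance picked by distance. The function name and the distance-doubling rule are assumptions for illustration, not an engine API:

```cpp
#include <algorithm>
#include <cmath>

// Pick a discrete LOD for a ray-traced instance by distance. Assumes each LOD
// covers twice the distance range of the previous one; lod0Distance is the
// range where the full-detail BLAS is used.
int selectRtLod(float distance, float lod0Distance, int lodCount) {
    float ratio = std::max(distance / lod0Distance, 1.0f);
    int lod = static_cast<int>(std::floor(std::log2(ratio)));
    return std::min(lod, lodCount - 1);
}
// Each frame, the TLAS instance descriptor for a mesh would then point at
// blasPerLod[selectRtLod(...)].
```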

But even if that were true
It's true; there are tons of examples on YouTube, etc.:
Billions of triangles are hardly news to those who have followed RT development.
Moreover, the BVH is highly scalable, and you can test far more than the 4 boxes per level that AMD hardware checks.
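For context, "boxes per level" is the branching factor of the BVH node: a wider node tests several child boxes per traversal step. A rough sketch of such a node (the layout is illustrative; real implementations pack and quantize these fields):

```cpp
#include <cstdint>

// A "wide" BVH node: one traversal step tests N child boxes at once instead
// of 2 (AMD's RT hardware intersects 4 per step).
template <int N>
struct WideNode {
    float   lo[3][N], hi[3][N]; // child bounds, stored component-wise for SIMD tests
    int32_t child[N];           // child node index, or -1 for an empty slot
    uint8_t childIsLeaf[N];
};
using BVH8Node = WideNode<8>;   // e.g. the 8-wide nodes used by several software tracers
```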

we still cannot fit all of this in memory.
This certainly can be optimized; it's just that nobody has tried it yet.

LOD is also necessary to prefilter geometry so that point-sampling methods like rasterization or RT work well enough.
Unless all attributes are stored in the geometry (which is highly inefficient, since it takes far more space without texture compression formats), sampling textures for normals etc. should not be a problem, and it isn't a problem in UE5 with RT.

You propose to solve the problem with pure HW brute-force power (assumed to keep growing forever and for free), which is the same as ignoring the problem.
That's not the same as ignoring the problem; it's just a different way of solving the same problem.

I'm happy UE5 finally proves the alternative: using better algorithms gives the biggest leap seen in a decade.
I don't think it's a better algorithm: it's far more complex and has many limitations on supported geometry types because of that complexity; it is also made of several separate solutions and doesn't solve tons of problems in the nice, transparent, unified way RT does.
 
Jaffe claimed it was coming to PS4, which means it almost certainly isn't, as his predictions about every first-party PlayStation game have been wrong for years. By the law of averages some of his predictions should have panned out by now, but he's a barometer for what is not going to happen. And he isn't wrong only about games: in January 2020 he predicted Sony would show the PS5 within four weeks. If he predicts sunshine, buy an umbrella.

Wow, Jaffe actually got one right?
 
Pretty sure it is not. You can throw billions of polygons at RT too; any engine with RT support can handle similar or higher triangle counts in scenes with instancing.

I don't understand what you're proposing: using a ray tracer to produce your diffuse image instead of Nanite and the deferred pass, then letting the rest of the pipeline continue as-is? Nanite is fast; there's no way you're going to beat its performance on its scenes. Ray tracers may scale logarithmically with triangle count, but they scale very badly with resolution, even with one sample and no bounces.
 
Pretty sure it is not. You can throw billions of polygons at RT too; any engine with RT support can handle similar or higher triangle counts in scenes with instancing.
This comes down to the simple fact that ray tracing scales logarithmically with the number of triangles (since the BVH is kind of a lossless LOD system by itself), so you can use this property to draw practically unlimited amounts of super-high-poly static meshes; video memory capacity is the only limitation.
I've tried primary rays with the prepass and other rasterization techniques disabled in UE5, and they work flawlessly with hundreds of millions of polygons, 100% Nanite proxies, and no LODs at all. Reflections and shadows work well in the same scenes too; there is a small issue with shadows, since Nanite loses some detail at distance while RT continues to use the highest LOD.
So I guess if a scene is not composed of clusters that are problematic for BVH builders, HW RT works just fine. There are a number of optimisations that can be applied to make Nanite clusters friendlier for RT. The real issue might be proxy sizes for large projects, so I don't think using multiple LODs for RT is a good idea.

They mentioned log n scaling with ray tracing, but decided it was not fast enough. Their talk about evaluating all of the different options they looked at starts around the 57-minute mark.


Edit:
If we use a ray tracing approach, that scales with log n of the triangles, which is nice, but not enough. We couldn't fit all of the data of this demo in memory, even if we could render it fast enough. We still have to remember virtualized geometry is partly about memory. We're trying to virtualize the memory. Ray tracing isn't fast enough for our target, on all of the hardware that we want to support, even if it could fit in memory, so we really need something that is better than log n scaling. To think about this another way, there are only so many pixels on screen. Why should we draw more triangles than pixels?
- Brian Karis, around the 1:06 mark.
 
The BVH is LOD for rays, not for geometry: instead of searching through billions of triangles, we search through just a small subset of the primitives encapsulated in the hierarchical data structure.
A BVH is not LOD for rays. Rays obviously need no LOD because a ray is just a line.
So what a BVH achieves is reducing the need to iterate over all triangles, but you agree this has nothing to do with detail.

The main advantage of a BVH is that you don't need to simplify the original geometry to reduce the amount of work.
No.
This is the first match I found for 'level of detail raytracing': http://gamma.cs.unc.edu/RAY/ They talk about simplified geometry integrated into a kd-tree, and they get speed-ups of 2x-20x.
I didn't read the paper; the idea is obvious. Example:
BVH:
tree level 0: coarse geometry
level 1: a bit more detail.
level 2: medium detail.
level 3: high detail.
level 4: super high detail.
If we trace a ray, we look at its current length and know upfront: we are happy with medium detail because the model is distant. We only need to traverse the hierarchy down to level 2, so we save half of the work. Even on an RTX 3090 or any other magic black box, it is still half of the work.
Even better: based on camera distance, we set up each model with only the BVH levels, and eventually only the single level of detail, that match screen resolution. And we stream in our BVH only down to this level to save memory.
We get much higher performance AND better image quality.
This, and only this, is how realtime raytracing has to look.
If this is not possible, IMHO RT is inefficient and ill-defined for the needs of games.
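A sketch of that early-out, assuming a software tracer where interior nodes also carry coarse proxy geometry. DXR exposes no such hook today, so every name here is hypothetical:

```cpp
#include <cstdint>

// Hypothetical node for LOD-aware traversal.
struct LodNode {
    float   boxLo[3], boxHi[3];           // node bounds
    float   avgTriSize;                   // representative triangle size in this subtree
    int32_t left, right;                  // -1 on both marks a leaf
    int32_t proxyFirstTri, proxyTriCount; // coarse stand-in geometry per node (assumption)
};

// A pixel's footprint grows linearly with hit distance; once the subtree's
// triangles are smaller than that footprint, descending further cannot change
// the image, so we can intersect the node's coarse proxy and stop.
bool detailSufficient(const LodNode& n, float tCurrent, float pixelAngle) {
    return n.avgTriSize <= tCurrent * pixelAngle;
}

// Inside the traversal loop, instead of always descending to the leaves:
//   if (n.left < 0 || detailSufficient(n, tCurrent, pixelAngle)) {
//       intersectTriangles(n.proxyFirstTri, n.proxyTriCount, ...);
//       continue;   // the deeper, finer levels are never touched (or even streamed in)
//   }
```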

Conclusion: it's not Nanite that has to be fixed to support RT; RT has to be fixed to support Nanite. Because Nanite is clearly ahead in efficiency and scaling, and in the general understanding of how hierarchies can be used to solve many problems at once.
Wake me up when DXR is ready. Till then, have fun with marbles, Lego, and a niche market of rich gamers.

The main disadvantage is that it depends on scene topology to some extent, so mindlessly slapping geometry together may not always work, but those are rather corner cases and they can be worked around.
If you mean UE5's overlapping models, then I need to ask: rasterization has the exact same problem, causing overdraw. So why can we solve this problem now with HW rasterization, but not with the new and shiny HW raytracing? Do we really need to wait 20 years again? Is it OK to repeat the same mistakes using shallow excuses like 'we need to get started with something... it will get better, maybe...'? Nah, it's not. The damage has been done, and now they need to fix their broken designs at our cost.

Nanite uses clusters for LODs so that triangles stay close to pixel size, but not smaller. Does it really make sense to do the same with RT?
Sure it does! Fewer triangles is faster and takes less space; fewer tree levels is faster and takes less space. It's as simple as that.

It's true; there are tons of examples on YouTube, etc.:
Implement BVH traversal and tracing yourself and you'll see it's not true. What you see is not 'true'; it is only 'good enough'. But good enough is not optimal.
And I don't want 1000 instances of the same Ferrari in my GTA. I want 1000 cars, each looking different.

This certainly can be optimized; it's just that nobody has tried it yet.
We can't try at all, because it's black-boxed. There are many papers about compression of acceleration structures, though, and they all have one thing in common: more triangles means more memory.

That's not the same as ignoring the problem; it's just a different way of solving the same problem.
Waiting until somebody else (NV, MS...) solves your problem is the same as ignoring it. You may trust them to understand and solve our problems; I don't. Lately I even see a growing conflict of interest between HW vendors and the games industry, so I doubt it all the more. We need to guide them, not the other way around.

I don't think it's a better algorithm: it's far more complex and has many limitations on supported geometry types because of that complexity; it is also made of several separate solutions and doesn't solve tons of problems in the nice, transparent, unified way RT does.
Correct, but they focus on the low-hanging fruit first, which is opaque, static geometry, making up the vast majority of game worlds. The results speak for themselves. Also, geometry is generally harder than graphics, and their solution is a practical and very good compromise.
 
A BVH is not LOD for rays. Rays obviously need no LOD because a ray is just a line.
Thanks for explaining obvious things (I don't know to whom, but OK); that's not what I was talking about.
The purpose of LOD is to reduce the working set; the BVH has exactly the same purpose for triangle intersection tests, i.e. for rays.

but you agree this has nothing to do with detail.
And it's great that it has nothing to do with detail.

If we trace a ray, we look at its current length and know upfront: we are happy with medium detail because the model is distant.
The BVH already saves most of the work, and when you trace something at a distance you don't trace all 8 million rays of a full 4K frame for it; distant objects take just a small fraction of the pixels on screen. Tracing an object covering 128x128 pixels takes 16K rays, for example, and with variable-rate sampling you can save even more by tracing even fewer rays inside the geometry.
And then you can still use automatically generated LODs for these tiny distant objects to trace them even faster, still with conservative subpixel geometry detail so that you can still draw grass and foliage. There is no need for fancy lossy culling and clustering LOD schemes; that's overengineering at its best.

If this is not possible, IMHO RT is inefficient and ill-defined for the needs of games.
Keeping 1000 solutions for different cases is inefficient and ill-defined for the needs of games.
Getting rid* of the geometry-complexity limitation with Nanite (* without foliage, skinned, and deformable geometry) will not help solve lighting, etc.

Because Nanite is clearly ahead in efficiency and scaling, and in the general understanding of how hierarchies can be used to solve many problems at once.
It doesn't solve many problems; it merely solves one problem, with lots of 'don't do that' caveats.

So why can we solve this problem now with HW rasterization, but not with the new and shiny HW raytracing?
Because, obviously, zero time has been spent on solving this problem with Nanite.

Do we really need to wait 20 years again?
Nanite is here, and it works well with HW ray tracing with 100% proxies; download UE5 and check for yourself. I don't get why you are whining here instead of simply checking this stuff in the editor.
Besides, why would anybody need to wait 20 years? You can merge meshes right in UE5, so with a little optimization effort it's perfectly possible to optimize even the most difficult projects for RT.

Sure it does! Fewer triangles is faster and takes less space; fewer tree levels is faster and takes less space. It's as simple as that.
Nobody wants to keep millions of LODs in memory, especially if those LODs would end up being useless for performance and would degrade quality.

Implement BVH traversal and tracing yourself and you'll see it's not true.
Why would testing in UE5 suddenly not be enough? I already said that I tested this in the RT debug mode, where primary visibility is calculated with RT.

I want 1000 cars, each looking different.
So would I, but we have yet to see any racing game with Nanite, if that's possible at all.

Waiting until somebody else (NV, MS...) solves your problem is the same as ignoring it.
Solutions have been suggested many times in this thread; you can solve the problem right in the editor by spending time on project optimization (the issue is that this project was never intended as a Lumen or RT demonstration: zero work went into that, and it doesn't even use detail cone tracing for SW Lumen). It's also worth keeping in mind that these solutions were suggested for rather corner cases, not something expected in every UE5 game.
 
The idea of clusters in UE5 seems very similar to meshlets in the Mesh Shader API, even containing a similar number of triangles (~120, I think). Mesh shaders seem to be able to do all of the things that Nanite can't do right now (deformation, animation, tessellation), and with such a similar data concept in terms of breaking a model down into "meshlets" vs "clusters" of similar size, I just wonder if they can't feed into each other to get the best of both.

I'm also really curious to see how DXR changes if virtualized geometry becomes popular. It seems like there needs to be a way to dynamically adjust the LOD of the meshes in the AABBs without expensive operations. If the virtualized geometry resides fully on the GPU, they need some way to share data between the scene for rasterizing and the BVH for ray tracing. Maybe some pointer chasing.
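For reference, the rough shape of a meshlet record (field names and limits are illustrative; actual layouts vary by engine and API):

```cpp
#include <cstdint>

// A "meshlet"/"cluster": a fixed-size bundle of triangles with its own bounds.
// Mesh shaders consume bundles like this directly; Nanite's clusters are a
// similar idea, with an LOD hierarchy layered on top.
struct Meshlet {
    uint32_t vertexOffset;    // into a shared vertex buffer
    uint32_t triangleOffset;  // into a packed index buffer
    uint8_t  vertexCount;     // commonly capped around 64
    uint8_t  triangleCount;   // commonly on the order of ~120 triangles
    float    boundsCenter[3]; // bounding sphere, for per-cluster culling
    float    boundsRadius;
};
```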
 
I suspect in the long run the big innovation credited to UE5 is going to be artist/developer workflow and productivity. Doing away with extra steps and limitations is good. Increased productivity should turn into either budget savings or more impressive games, if budgets are not cut.
 
My understanding is that LOD isn't free with a BVH; it depends on how it's organized. I recall a SIGGRAPH 2014 panel about this, and while I didn't take good notes, Blue Sky Studios said they created a sparse voxel octree for each asset in Rio 2. This gave them level of detail but cost a lot of disk space. I don't know what their data structure looked like above each asset, but I assumed it was not an SVO, meaning they had a different top-level acceleration structure from the BLAS.
 
They mentioned log n scaling with ray tracing, but decided it was not fast enough. Their talk about evaluating all of the different options they looked at starts around the 57-minute mark.


Edit:
- Brian Karis, around the 1:06 mark.

Interesting that he stated it this way, though:

"Ray tracing isn't fast enough for our target, on all of the hardware that we want to support,"

That implies it's fast enough on some hardware, but not all.
 
The idea of clusters in UE5 seems very similar to meshlets in the Mesh Shader API, even containing a similar number of triangles (~120, I think). Mesh shaders seem to be able to do all of the things that Nanite can't do right now (deformation, animation, tessellation), and with such a similar data concept in terms of breaking a model down into "meshlets" vs "clusters" of similar size, I just wonder if they can't feed into each other to get the best of both.

I'm also really curious to see how DXR changes if virtualized geometry becomes popular. It seems like there needs to be a way to dynamically adjust the LOD of the meshes in the AABBs without expensive operations. If the virtualized geometry resides fully on the GPU, they need some way to share data between the scene for rasterizing and the BVH for ray tracing. Maybe some pointer chasing.

These features will arrive later in Nanite, and it has the advantage of working on GPUs without mesh shaders. When DXR evolves with more flexibility and LOD management, they will probably push RT too.



Another interesting tweet
 
The BVH already saves most of the work, and when you trace something at a distance you don't trace all 8 million rays of a full 4K frame for it; distant objects take just a small fraction of the pixels on screen. Tracing an object covering 128x128 pixels takes 16K rays, for example, and with variable-rate sampling you can save even more by tracing even fewer rays inside the geometry.
I know what you mean. In other words: if we have a 1M-poly model at a screen size of 128^2 pixels, we only trace 128^2 rays, not 1M. That's an advantage of LOD-less RT over LOD-less rasterization, which iterates over all triangles. But it does not compensate for the lost advantage of having LOD!
Example:
Tracing a binary tree over 1M triangles takes roughly 707 traversal steps to find the intersection (assuming traversal cost grows with the square root of the triangle count, not just the ~20-level tree depth).
A detail version of the model fitting our screen area has approx 128^2 * 2 (for back sides) triangles. We reach this level of detail after roughly 128 traversal steps, so we get a speed-up of 5.5x.
The memory difference is 500K vs. 16K triangles, so 31 times less of that.
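Spelling out that arithmetic (the square-root step model is an assumption of this example, not a general result):

```latex
\mathrm{steps}(n) \approx \sqrt{n/2}:\quad
\sqrt{10^6/2} \approx 707, \qquad
\sqrt{(2\cdot 128^2)/2} = \sqrt{128^2} = 128, \qquad
707/128 \approx 5.5
% memory, counting front-facing halves:
10^6/2 = 500\mathrm{K} \quad\text{vs.}\quad (2\cdot 128^2)/2 \approx 16\mathrm{K}, \qquad 500\mathrm{K}/16\mathrm{K} \approx 31
```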

Do you agree on this?

A lightning strike killed my connection, so I redid this example, but all my replies to the other comments got lost. We're turning in circles anyway, just repeating the same arguments.
 
This features will arrive later in Nanite and it has the advantage to work on GPU without mesh shader. When DXR will evolve with more flexibility and LOD management. They will probably push RT too.

Yeah, I suppose even if they could make things work perfectly by doing deformation and animation with mesh shaders, they'd still want a more general compute solution that would be portable to other platforms that don't use the DirectX or Vulkan mesh shader APIs.
 
Interesting that he stated it this way, though:

"Ray tracing isn't fast enough for our target, on all of the hardware that we want to support,"

That implies it's fast enough on some hardware, but not all.

From their tweets it appears that RT on all hardware isn't yet fast enough for primary rays, and current BVH implementations also have problems that need to be overcome. So any RT using current PC RT hardware will focus on secondary rays.

Regards,
SB
 