Mega Geometry sits in the middle of rasterization and raytracing

MfA

Doing on-the-fly construction of acceleration structures for potentially visible LOD'd geometry isn't any more expensive than Nanite and offers high-detail geometry for raytraced ambient occlusion and shadows, but it brings back problems raytracing is supposed to be ridding us of: "overdraw" and occlusion culling. It's a temporary hack to very specifically fit Nanite ... but Nanite is only the first production-ready smooth LOD solution, and it's not necessarily the best.

Everything is so rigid and boring ... sigh.
 
I don't agree, but it might be because you have a very specific implementation in your head.

Driving some of the RT LOD/AS streaming from primary rays is probably a good initial plan, but it's not an exclusive decision. Even with Nanite the streaming pool is separately sized and largely decoupled from the rendering portion, and various controls exist around root pages/minimum LODs/prestreaming and so on. But critically it's not just primary rays that feed into streaming requests... at a minimum shadow rays also do (when using VSMs, because primary ray LODs are absolutely not good enough even for shadows), and there's no reason other secondary rays for raytracing can't as well (with some additional ray differential complexity but that is already shared with other things).

I guess I'm a bit unclear on what you consider an alternative here. Obviously there's no world (now or in the future) in which you have BVHs that go all the way down to the finest LODs in memory for every object at the same time. Some sort of streaming system is required, even more so than with rasterization, as the storage overhead is higher with RT. That system needs to be fine-grained enough that it doesn't fall down on some big merged high-poly mesh (the way discrete LODs do), but the streaming and priorities can be driven from arbitrary heuristics that work for a given game.
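
To make that concrete, here's a minimal sketch of feedback-driven streaming under entirely made-up types (ClusterGroupId, StreamingManager and everything else here is illustrative, not any real API): any pass that touches geometry files a request, and the streaming system resolves the requests against a fixed-size pool once per frame.

```cpp
// A sketch only: feedback-driven streaming requests with made-up types,
// not any shipping API. Any pass that touches geometry - primary visibility,
// VSM shadow rays, GI rays - can file a request; the streaming system
// resolves them against a fixed pool each frame.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using ClusterGroupId = std::uint32_t;

struct StreamingRequest {
    ClusterGroupId group = 0;
    std::uint32_t desiredLod = UINT32_MAX; // 0 = finest; keep the finest ask
    float priority = 0.0f;                 // projected size, ray footprint, ...
};

class StreamingManager {
public:
    // Callable from any pass or heuristic, not just primary rays.
    void request(ClusterGroupId group, std::uint32_t lod, float priority) {
        StreamingRequest& r = pending_[group];
        r.group = group;
        r.desiredLod = std::min(r.desiredLod, lod);
        r.priority = std::max(r.priority, priority);
    }

    // Once per frame: fill the fixed-size pool in priority order.
    void resolve(std::size_t poolBudgetBytes) {
        std::vector<StreamingRequest> sorted;
        sorted.reserve(pending_.size());
        for (const auto& [id, r] : pending_) sorted.push_back(r);
        std::sort(sorted.begin(), sorted.end(),
                  [](const StreamingRequest& a, const StreamingRequest& b) {
                      return a.priority > b.priority;
                  });
        std::size_t used = 0;
        for (const StreamingRequest& r : sorted) {
            std::size_t cost = estimateBytes(r); // pages + BVH storage
            if (used + cost > poolBudgetBytes) break;
            used += cost;
            streamIn(r); // evicts colder groups as needed (not shown)
        }
        pending_.clear();
    }

private:
    std::size_t estimateBytes(const StreamingRequest&) const { return 1; }
    void streamIn(const StreamingRequest&) {}
    std::unordered_map<ClusterGroupId, StreamingRequest> pending_;
};
```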
 
Streaming is a separate issue.

LOD traversal should be done during ray traversal. Not by continually rejiggering a BVH based on camera distance and a potentially visible set.
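
For illustration, a rough sketch of what LOD selection during ray traversal could mean (hypothetical types throughout, not a shipping implementation): internal BVH nodes carry a geometric error bound plus coarse proxy geometry, and each ray stops descending once that error drops below its own footprint, so a single static BVH serves every ray at its own level of detail.

```cpp
// Illustrative only: LOD selection inside ray traversal instead of per-frame
// BVH rebuilds. Internal nodes carry an error bound plus coarse proxy
// geometry, and a ray stops descending once the error is smaller than the
// ray's footprint at the node.
struct RayDifferential {
    float widthAtOrigin; // world-space footprint of the ray at t = 0
    float spreadRate;    // footprint growth per unit distance
    float footprintAt(float t) const { return widthAtOrigin + spreadRate * t; }
};

struct BvhNode {
    float geometricError; // max deviation of this node's coarse geometry
    bool isLeaf;
    int firstChild;
    int childCount;
    int proxyClusters;    // coarse geometry to intersect if we stop here
};

// No camera-distance heuristic, no potentially visible set: the decision is
// per ray, made during traversal.
bool stopAtThisLod(const BvhNode& node, const RayDifferential& rd, float tEntry) {
    return node.isLeaf || node.geometricError < rd.footprintAt(tEntry);
}
```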
 
MfA said:
Streaming is a separate issue.

LOD traversal should be done during ray traversal. Not by continually rejiggering a BVH based on camera distance and a potentially visible set.
I think that's where you're assuming an implementation that certainly isn't how Nanite works. As far as I'm concerned the whole CLAS API is about streaming, *not* LOD. None of the changes affect what rays hit, only how we get the acceleration structure there. Very specifically Mega Geometry does not attempt to address tracking ray differentials (or equivalent) or having hierarchies of simplified geometry in the BVHs themselves so that rays can stop early. Maybe that's something that will get some attention in the future, but that is not what Mega Geometry is about.

However you decide to compute the levels of detail you want resident, stream them in and trace them is entirely up to you. I agree that driving it only from primary rays is not going to work very well... but that's not really even how Nanite works, let alone other systems I can imagine. Is there some reason why you don't think you can drive the streaming - and thus the available detail levels of resident geometry - from your traced rays or whatever heuristic you want?
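
To illustrate that separation, a purely conceptual sketch with made-up types (Cluster, ClasHandle and the rest are placeholders, not NVIDIA's actual cluster API): the streaming heuristic decides what is resident, and the acceleration structure is cheaply re-assembled from per-cluster pieces rather than rebuilt from triangles.

```cpp
// Conceptual only, with made-up types rather than NVIDIA's actual API:
// the per-frame flow a cluster-AS streaming path enables. The heuristic
// decides what is resident; the BLAS is cheaply re-linked from per-cluster
// pieces, and rays simply trace whatever detail is there.
#include <vector>

struct Cluster { int id = 0; };         // roughly a hundred triangles, streamed in pages
struct ClasHandle { int cluster = 0; }; // prebuilt per-cluster acceleration structure
struct Blas { std::vector<ClasHandle> parts; };

// Built once per cluster at stream-in time (stub).
ClasHandle buildClas(const Cluster& c) { return {c.id}; }

// Cheap: references existing CLASes rather than re-deriving geometry.
Blas assembleBlas(const std::vector<ClasHandle>& parts) { return {parts}; }

Blas updateSceneAs(const std::vector<Cluster>& resident,
                   std::vector<ClasHandle>& clasCache) {
    // 1. Build CLASes only for clusters that just streamed in (naive check).
    for (const Cluster& c : resident) {
        bool built = false;
        for (const ClasHandle& h : clasCache) built = built || (h.cluster == c.id);
        if (!built) clasCache.push_back(buildClas(c));
    }
    // 2. Re-link the BLAS from the resident set. Which clusters are resident
    //    is the streaming system's decision; rays hit the same surfaces either
    //    way, just at whatever detail happens to be loaded.
    return assembleBlas(clasCache);
}
```

Nothing in that flow cares whether residency was decided by primary rays, shadow rays, or any other heuristic.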
 
Mega geometry does a ray differential from the camera at cluster level :p

I don't care for the straight conversion of Nanite, with rasterization just replaced by throwing the clusters into a BVH. Why should it matter how Nanite works for ray tracing?

Intel's "Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail" would not be more difficult to add into Unreal just because it does not so closely parallel Nanite's way of rendering.
 
MfA said:
Mega geometry does a ray differential from the camera at cluster level :p

I don't care for the straight conversion of Nanite, with rasterization just replaced by throwing the clusters into a BVH. Why should it matter how Nanite works for ray tracing?

Intel's "Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail" would not be more difficult to add into Unreal just because it does not so closely parallel Nanite's way of rendering.
I think you're going to have to spell out your assumptions more here. Or maybe what you're calling "mega geometry" specifically. I honestly don't understand the argument you are making (or even how you claim things work) from the one-liners. To be clear, there's lots of degrees of freedom here and I'd love to understand your thoughts but your statements don't make sense to me... maybe it's just me though 🤷‍♂️
 
That last stage is roughly equivalent to the occlusion culling and cluster selection which kicks off rasterization in Nanite, but instead it updates a BVH for ray tracing.

It's as straight a conversion of Nanite to ray tracing as possible, but in doing so it sits in the middle of rasterization and ray tracing, inheriting things like occlusion culling, which ray tracing really should rid us of. "Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail" is a much purer ray tracing solution to the same problem.
 
Ok so first I see now that you're calling what that sample does "Mega Geometry", while that is just one potential way to implement various things as this is all in software control and none of the decisions you seem to have a problem with are baked into the actual API/hardware. That said, proceeding on the discussion around the sample I'm still a little confused as to the line you are drawing between it and the Intel paper.

MfA said:
That last stage is roughly equivalent to the occlusion culling and cluster selection which kicks off rasterization in Nanite, but instead it updates a BVH for ray tracing.

Yes, because you need some general pass that answers the question "what LODs do I want resident for my scene right now". The Intel paper has the same thing, and with very similar heuristics. I'll quote section 4.1:

4.1. LOD Cluster Selection

Given a DAG2 hierarchy of clusters (Sec. 3.3), a subset of clusters needs to be selected which will be decompressed and converted into a QBVH6 (Sec. 4.2). We therefore traverse the DAG2 in a top-down manner, starting at the root clusters, which represent the coarsest LOD. Each node gets to decide whether its LOD is sufficient based on the compressed AABB stored in the cluster header. A node and its neighbor arising from the same split (if any) always store the same AABB and thus they make the same LOD decision. If the LOD is sufficient, the cluster is selected for inclusion in the BVH. Otherwise, all of its children will be tested subsequently. In Appendix A, we prove that such a top-down traversal is guaranteed to give a complete and crack-free mesh.

The heuristic that determines whether a cluster has a sufficient LOD differentiates between clusters inside and outside the view frustum. For the sake of secondary rays, clusters completely outside the view frustum are not discarded. However, they use a coarser LOD, solely based on the distance to the viewer. For clusters inside the view frustum, the cluster’s compressed AABB (stored in the cluster header) is projected onto the image plane. Then the length of the diagonal of the projection of the 2D AABB is computed. The DAG2 top-down traversal stops if the diagonal length is smaller than a threshold, in our case 24 pixels. Note that LOD selection happens once per frame, not per ray, so these heuristics apply equally to all primary and secondary rays.

This also sounds extremely similar and sensible as a base implementation right? i.e. for onscreen geometry you project it and compute rough triangle sizes in camera space. For offscreen geometry you use some sort of distance-scaled representation. Regarding occlusion culling, that's entirely up to the implementation whether you use that as part of your heuristic or not. I would imagine for practical reasons you do want to scale the quality of occluded geometry - just like offscreen geometry - but how much is entirely up to the heuristic. Obviously if you want to just stream all your geometry at high LODs based on purely solid angles or something similar you are welcome to, but you will almost certainly run into VRAM problems with that approach.
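
As a simplified sketch of that selection pass (the camera math is stubbed out and the off-screen cutoff is a placeholder, not the paper's exact formulation):

```cpp
// Sketch of the quoted per-frame selection (Sec. 4.1) under simplifying
// assumptions: top-down over the cluster DAG, stopping where the LOD is
// deemed sufficient. Projection/frustum functions are stand-ins.
#include <cmath>
#include <vector>

struct Aabb { float min[3], max[3]; };

struct ClusterNode {
    Aabb aabb;                 // shared with the split neighbor, so both make
                               // the same decision (keeps the mesh crack-free)
    std::vector<int> children; // empty at the finest LOD
};

constexpr float kPixelThreshold = 24.0f; // the paper's on-screen threshold

// Stand-ins for real camera/projection code.
bool inFrustum(const Aabb&) { return true; }
float projectedDiagonalPixels(const Aabb&) { return 0.0f; }
float distanceToViewer(const Aabb&) { return 1.0f; }

float worldDiagonal(const Aabb& b) {
    float dx = b.max[0] - b.min[0];
    float dy = b.max[1] - b.min[1];
    float dz = b.max[2] - b.min[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

bool lodSufficient(const ClusterNode& n) {
    if (inFrustum(n.aabb))
        return projectedDiagonalPixels(n.aabb) < kPixelThreshold;
    // Off-screen clusters are kept for secondary rays, but at a coarser LOD
    // driven purely by distance (placeholder cutoff).
    return worldDiagonal(n.aabb) / distanceToViewer(n.aabb) < 0.05f;
}

// Collects the selected clusters, which then feed the per-frame BVH build.
void selectClusters(const std::vector<ClusterNode>& dag, int node,
                    std::vector<int>& selected) {
    const ClusterNode& n = dag[node];
    if (n.children.empty() || lodSufficient(n)) {
        selected.push_back(node);
    } else {
        for (int child : n.children) selectClusters(dag, child, selected);
    }
}
```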

You could of course imagine in the future driving or augmenting some of this selection with feedback from secondary rays/differentials themselves; that would be the only real way to capture stuff like refraction and the like in sufficient detail, but it also brings its own can of worms that probably doesn't make sense for the near term.

MfA said:
It's as straight a conversion of Nanite to ray tracing as possible, but in doing so it sits in the middle of rasterization and ray tracing, inheriting things like occlusion culling, which ray tracing really should rid us of. "Real-Time Ray Tracing of Micro-Poly Geometry with Hierarchical Level of Detail" is a much purer ray tracing solution to the same problem.
This stage of Nanite has nothing to do with rasterization; most people focus too much on the rasterization portion of Nanite, but that is a small part of what it does. Indeed the UE NVIDIA Mega Geometry demo obviously uses Nanite but didn't do any rasterization by my understanding.

I covered the Intel paper above... it seems to describe doing almost exactly the same thing to me sans occlusion culling (which as noted is optional in any implementation) so I'm not sure what differentiation you are drawing.
 
Oops, I completely misread how Intel was descending the hierarchy :/ Well, replace that with the AMD method of approximation using the BVH hierarchy then, which raytracing fan mentioned :)

Occlusion culling is an inherent part of rasterization IMO and I'd rather not see all its foibles inherited in a ray tracing engine. With primary-ray ray tracing, I was hoping it would disappear.
 