Sorry, a noob question: how is it possible that a software solution beats a hardware one?
The software one is doing less work for a lower-quality result.
If it were just that, they should be able to reduce (or in other cases increase) HW quality so that performance matches SW Lumen.
In terms of software RT versus hardware RT, it's due to the cost of building the BVH for hardware RT, combined with the geometric complexity of Nanite, which forces hardware RT to use a much lower-detail representation of the Nanite geometry so that things don't completely bog down.
So it's a complex problem for Epic: how to get performant hardware RT without having it completely choke on the geometric complexity that Nanite brings. It's something they've been chipping away at over time. Their first attempt at hardware RT looked like ass, IMO, because of how coarse the simplified geometry they used to build the BVH was.
That doesn't make perfect sense either. You say that low-poly proxies give worse results than low-resolution SDF volumes, but I'd assume the low-poly proxies are a better approximation than the SDF volumes. SDF volumes surely can't match the visual detail of Nanite either, because of memory limits.
In general, doubling the resolution multiplies volume data by 8, while surface data like triangles + BVH only grows by a factor of 4. So volume data becomes impractical much more quickly as detail increases, and it's very unlikely that the SDF gives them higher quality.
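To put rough numbers on that scaling argument, here's a small Python sketch. It assumes a dense SDF grid (no sparse bricks or compression) and a BVH whose size grows linearly with triangle count; both are simplifications, and the function names are made up for illustration.

```python
# Hypothetical illustration of how storage scales when linear resolution grows by k.
# Assumes a dense SDF grid (cost ~ k^3) and a triangle mesh whose BVH size is
# proportional to its triangle count (cost ~ k^2).

def volume_cells(base_res: int, scale: int) -> int:
    """Voxel count of a dense cubic grid after scaling its resolution per axis."""
    res = base_res * scale
    return res ** 3


def surface_triangles(base_tris: int, scale: int) -> int:
    """Triangle count of a surface after scaling its resolution along both surface axes."""
    return base_tris * scale ** 2


for k in (1, 2, 4, 8):
    vol = volume_cells(64, k) // volume_cells(64, 1)
    surf = surface_triangles(10_000, k) // surface_triangles(10_000, 1)
    print(f"resolution x{k}: volume data x{vol}, surface data x{surf}")
```

At 8x the resolution that's 512x the volume data versus 64x the surface data, which is why dense volumes run out of memory long before triangles do.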
I don't know the true reasons either. When the question first came up back then, the argument was that HW RT has a problem with kitbashed, intersecting models, like the many layers of rocks used to model the caves in the first UE5 demo. Basically a ray has to traverse many bottom-level BVH structures, which hurts performance.
And I had assumed SDF has an advantage here, because we could use the distance at ray entry to trace only the closest model while quickly skipping all the others.
But this was wrong. My assumption only holds for a closest-point query, like the ones we would use for physics collisions.
For a ray intersection test, knowing the distance at ray entry does not help at all. We can't skip other SDF models just because their distance there is larger: they might still have a closer intersection than the model with the shortest distance on entry.
So I was wrong: just like with HWRT, we need to process every model whose bounds the ray intersects. There is no SDF advantage here either.
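A minimal Python sketch of that counterexample, using two sphere "rocks" as stand-ins for SDF models. The names `Sphere`, `sphere_trace`, `closest_point_query` and `closest_hit` are hypothetical, and the ray origin stands in for the point where the ray enters the models' bounds. For the closest-point query the smallest distance really is the answer, but for the ray the model with the smallest distance at entry is hit last, so every model has to be traced.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    """Stand-in for one SDF model: a sphere 'rock' with an exact distance function."""
    center: tuple
    radius: float

    def sdf(self, p):
        return math.dist(p, self.center) - self.radius


def sphere_trace(sdf, origin, direction, t_max=100.0, eps=1e-6, max_steps=256):
    """March along a unit-length ray, stepping by the SDF value (a safe step).
    Returns the hit distance t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
        if t > t_max:
            return None
    return None


def closest_point_query(models, p):
    """Closest-point query (e.g. physics): the smallest SDF value is the answer."""
    return min(m.sdf(p) for m in models)


def closest_hit(models, origin, direction):
    """Ray query: every candidate model must be traced; entry distances can't rank them."""
    best = None
    for m in models:
        t = sphere_trace(m.sdf, origin, direction)
        if t is not None and (best is None or t < best[0]):
            best = (t, m)
    return best


origin, direction = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
a = Sphere((5.0, 0.0, 0.0), 0.25)  # small rock right on the ray
b = Sphere((6.0, 1.5, 0.0), 1.6)   # big rock off to the side
print("distance at entry:", round(a.sdf(origin), 3), round(b.sdf(origin), 3))
print("hit distance:", sphere_trace(a.sdf, origin, direction), sphere_trace(b.sdf, origin, direction))
```

Here `b` is nearer at the ray origin (about 4.58 vs 4.75), yet `a` is hit first (t = 4.75 vs about 5.44), so skipping `a` because of its larger entry distance would return the wrong hit.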
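Here's a rough sketch of why kitbashed, overlapping instances hurt either way: a ray must enter every instance whose bounds start in front of the closest hit, and each of those needs its own bottom-level traversal (or SDF march). This is a simplified model, not how any specific GPU or UE5 traverses a TLAS; it assumes axis-aligned instance boxes and a closest hit distance `t_hit` that is already known, and all names are hypothetical.

```python
import math


def ray_box_entry(origin, direction, lo, hi):
    """Slab test: distance at which the ray enters an axis-aligned box, or None on a miss."""
    t_enter, t_exit = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return None  # ray is parallel to this slab and outside it
            continue
        ta, tb = (a - o) / d, (b - o) / d
        if ta > tb:
            ta, tb = tb, ta
        t_enter, t_exit = max(t_enter, ta), min(t_exit, tb)
        if t_enter > t_exit:
            return None
    return t_enter


def bottom_level_visits(instance_boxes, origin, direction, t_hit):
    """Count instances whose bounds the ray enters in front of the closest hit.
    None of them can be culled by their box alone, so each needs its own
    bottom-level traversal (or SDF march) to rule out a closer intersection."""
    visits = 0
    for lo, hi in instance_boxes:
        t = ray_box_entry(origin, direction, lo, hi)
        if t is not None and t < t_hit:
            visits += 1
    return visits


origin, direction = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
# Eight kitbashed rock layers whose bounds heavily overlap along the ray...
kitbashed = [((1 + 0.5 * i, -1, -1), (6 + 0.5 * i, 1, 1)) for i in range(8)]
# ...versus the same eight rocks spread out with no overlap.
spread_out = [((1 + 6 * i, -1, -1), (6 + 6 * i, 1, 1)) for i in range(8)]
t_hit = 5.0  # assumed closest hit, inside the first rock's bounds
print(bottom_level_visits(kitbashed, origin, direction, t_hit))
print(bottom_level_visits(spread_out, origin, direction, t_hit))
```

With these made-up numbers the overlapping layout needs 8 bottom-level visits and the spread-out one only 1, even though both contain the same eight rocks.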
But there is one case left where an SDF advantage may indeed apply:
In the distance, AFAICT UE5 calculates a single global, static SDF from all models. So instead of 1000 rock models we only have one. There's no more overlap between multiple models and the resolution is low, so SDF tracing will be fast.
But there is probably no such globally merged low-res model for HWRT that removes all the hidden geometry. So a HW ray may still need to traverse multiple BLASs along its way, possibly at a higher level of detail than needed.
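A rough sketch of the global-SDF idea: bake the union of all instances into one coarse dense grid once, then march that instead of the individual models. This is a toy, not UE5's actual global distance field; the grid layout, the names and the "cell centre minus half a cell diagonal" lower bound are my assumptions. Because an SDF changes by at most the distance moved, that lower bound means the coarse march never steps past the real surface, though it may report the hit early.

```python
import math


def union_sdf(rocks, p):
    """Exact distance to the union of many sphere 'rocks': one evaluation per instance."""
    return min(math.dist(p, c) - r for c, r in rocks)


class GlobalSDF:
    """Toy coarse global distance field: the union of all instances baked once
    into a dense grid of cell-centre samples (static scene assumed)."""

    def __init__(self, rocks, lo, hi, res):
        self.lo, self.res = lo, res
        self.h = [(b - a) / res for a, b in zip(lo, hi)]
        self.grid = {}
        for i in range(res):
            for j in range(res):
                for k in range(res):
                    self.grid[i, j, k] = union_sdf(rocks, self._centre((i, j, k)))
        # A point inside a cell is at most half a cell diagonal from its centre.
        self.slack = 0.5 * math.sqrt(sum(s * s for s in self.h))

    def _centre(self, idx):
        return tuple(a + (n + 0.5) * s for a, n, s in zip(self.lo, idx, self.h))

    def lower_bound(self, p):
        """Lower bound on the true distance at p, or None outside the grid.
        One lookup, whatever the number of baked instances."""
        idx = []
        for x, a, s in zip(p, self.lo, self.h):
            n = math.floor((x - a) / s)
            if not 0 <= n < self.res:
                return None
            idx.append(n)
        return self.grid[tuple(idx)] - self.slack


def march(field, origin, direction, eps, max_steps=512):
    """Sphere trace with a distance (or lower-bound) function; returns (t or None, steps)."""
    t = 0.0
    for step in range(1, max_steps + 1):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = field(p)
        if dist is None:
            return None, step  # left the grid: treat as a miss
        if dist < eps:
            return t, step
        t += dist
    return None, max_steps


rocks = [((3.5 + 0.04 * i, 0.5, 0.5), 1.0) for i in range(20)]  # 20 overlapping rock layers
origin, direction = (0.5, 0.5, 0.5), (1.0, 0.0, 0.0)

t_exact, steps_exact = march(lambda p: union_sdf(rocks, p), origin, direction, eps=1e-6)
global_sdf = GlobalSDF(rocks, lo=(-8.0, -8.0, -8.0), hi=(8.0, 8.0, 8.0), res=16)
t_coarse, steps_coarse = march(global_sdf.lower_bound, origin, direction, eps=0.05)

print("per-instance:", t_exact, "with", steps_exact * len(rocks), "sphere evaluations")
print("global SDF:", round(t_coarse, 3), "with", steps_coarse, "grid lookups")
```

With these made-up numbers the coarse field stops short of the exact hit (it's conservative and low-res), but each step is a single lookup instead of one evaluation per rock, and the hidden inner rocks cost nothing after baking.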
Maybe the next Nvidia GPU will have hardware BVH acceleration. I imagine games would have to be designed for it specifically, or be patched, to move this off the CPU?
We know HWRT has a big CPU cost too, but I don't know why. My speculation is that the TLAS may be built on the CPU, possibly at higher quality than the BLASs, which are surely built on the GPU.
But that's just guessing. In any case, the API abstraction means IHVs can do what they want, so games would not need to be patched when that changes.
By BVH acceleration you probably mean fixed-function HW units dedicated to building the BVH, like ImgTech had long before.
But note that this would not solve the problem we have with Nanite. If only one patch of the model changes detail, a HW builder would still need to rebuild the BVH for the whole model from scratch.
HW acceleration might be faster than what we have now, but it would still be too inefficient to be practical for LOD.
That's why I hope we'll never see a HW BVH builder. What we really need is the flexibility to modify the BVH locally, to reflect those local changes in detail.
There is no way around that. A HW BVH builder would just be a short-sighted waste of time and chip area.
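To illustrate what changing the BVH locally can mean, here's a sketch of its simplest form, refitting. When one patch changes its bounds, only the boxes on the path from that leaf to the root need updating (about log2(n) nodes), while a full rebuild recreates all 2n - 1 nodes. This assumes the tree topology stays the same; real LOD changes that alter triangle counts would need a subtree to be rebuilt and swapped in, which this sketch doesn't cover. All names are hypothetical.

```python
class Node:
    """BVH node: axis-aligned bounds plus either two children or one patch id."""

    def __init__(self, lo, hi, patch=None, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.patch, self.left, self.right = patch, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self


def merge(a, b):
    """Bounds enclosing nodes a and b."""
    return tuple(map(min, a.lo, b.lo)), tuple(map(max, a.hi, b.hi))


def build(bounds, ids, leaves):
    """Full top-down build (median split on x centroid). Recreates every node."""
    if len(ids) == 1:
        lo, hi = bounds[ids[0]]
        leaves[ids[0]] = Node(lo, hi, patch=ids[0])
        return leaves[ids[0]]
    ids = sorted(ids, key=lambda i: bounds[i][0][0] + bounds[i][1][0])
    mid = len(ids) // 2
    left = build(bounds, ids[:mid], leaves)
    right = build(bounds, ids[mid:], leaves)
    return Node(*merge(left, right), left=left, right=right)


def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def refit_patch(leaf, lo, hi):
    """Local update: give one patch new bounds and refit only its ancestors.
    Returns the number of nodes touched (the leaf plus its path to the root)."""
    leaf.lo, leaf.hi = lo, hi
    touched, node = 1, leaf.parent
    while node is not None:
        node.lo, node.hi = merge(node.left, node.right)
        touched += 1
        node = node.parent
    return touched


# 1024 patches in a row; patch 500 then switches to a finer LOD with taller bounds.
bounds = [((i, 0, 0), (i + 1, 1, 1)) for i in range(1024)]
leaves = {}
root = build(bounds, list(range(1024)), leaves)
touched = refit_patch(leaves[500], (500, 0, 0), (501, 1.5, 1))
print("full rebuild:", count_nodes(root), "nodes; local refit:", touched, "nodes")
```

For 1024 patches that's 11 touched nodes versus recreating all 2047.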