Thanks for the insight. Can you elaborate on the scenarios where VSMs match or exceed RT accuracy? I haven’t seen a recent paper on VSMs but I’m sure the tech has advanced a lot in the past few years.
The concept of virtual shadow maps (sorry about the acronym... yes, it's endlessly confusing, but eventually you just give up) has not changed a lot; mainly it's just the "other 90%" of the work/details to make them actually usable in a general-purpose engine. We're not there yet for all workloads, but it's getting better over time.
In terms of what they tend to do better from a production point of view right now: large scenes with lots of geometric detail. I'll get into it a bit more below, but fundamentally RT runs into various walls with building and updating sufficiently large BVHs.
I mean, maybe let Lumen handle distant lights? If their contribution is low enough you might not notice artefacts if you splat lights into the distance trace gather after a certain cutoff, use something like ReSTIR to estimate occlusion, or dedicate a ray to trace directly? Anyway.
Yes indeed, that is something we are exploring, but we need solutions for when Lumen is disabled as well. Falling back to screen-space traces only may be a reasonable compromise for some scenes, but I imagine we'll need a middle ground too. We've thought about DF shadows as well, but again, mesh distance fields are not enabled in all projects. The joys of giving users tons of options and then trying to make all the paths as orthogonal as possible, I guess
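For illustration only, the kind of per-light fallback being discussed might look roughly like the sketch below; the enum, feature flags, and cutoff are all made up for the example, not anything from the actual engine:

```cpp
// Hedged sketch: pick a shadow path per light by distance and by which
// systems the project actually has enabled. Hypothetical names throughout.
enum class ShadowPath { VirtualShadowMap, LumenDistantTrace, DistanceFieldShadow, ScreenSpaceOnly };

struct ProjectFeatures
{
    bool lumenEnabled;
    bool meshDistanceFields;
};

ShadowPath ChooseShadowPath(float lightDistance, float nearCutoff, const ProjectFeatures& f)
{
    if (lightDistance < nearCutoff)
        return ShadowPath::VirtualShadowMap;     // full-detail path for nearby lights
    if (f.lumenEnabled)
        return ShadowPath::LumenDistantTrace;    // fold distant lights into the distance trace gather
    if (f.meshDistanceFields)
        return ShadowPath::DistanceFieldShadow;  // coarser occlusion, but cheap at range
    return ShadowPath::ScreenSpaceOnly;          // last-resort compromise
}
```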
It’s pretty wild that it’s still more desirable to rasterize and sample multiple cascades of 16k shadow maps per light than it is to cast shadow rays at geometry with importance sampling. I had thought we were at an inflection point where RT would be the more efficient choice for non-trivial lighting scenarios.
So the key is that we're not actually rasterizing 16k shadow maps... we're generally rasterizing comparable or even less resolution than conventional four 2k shadow cascades would have. We're just doing it in much smarter places with a lot less waste. As long as you can keep the culling fine-grained enough (as we can with Nanite), it is almost a pure win.
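To make "much smarter places" concrete, here's a rough sketch of the page-marking idea, assuming a toy directional-light projection; the names, page size, and projection are illustrative, not the actual UE implementation:

```cpp
// Rough sketch of virtual shadow map page marking. Only pages that visible
// screen pixels actually land in get allocated and rasterized.
#include <cstdint>
#include <unordered_set>
#include <vector>

constexpr int kVirtualResolution = 16384;                          // conceptual 16k x 16k map
constexpr int kPageSize          = 128;                            // texels per page side
constexpr int kPagesPerSide      = kVirtualResolution / kPageSize; // 128

struct Float3 { float x, y, z; };

// Toy orthographic projection into the light's [0,1)^2 shadow space, assuming
// a directional light over a fixed world extent. Stand-in for the real
// clipmap projection.
bool ProjectIntoShadowSpace(const Float3& p, float worldExtent, float& u, float& v)
{
    u = p.x / worldExtent + 0.5f;
    v = p.y / worldExtent + 0.5f;
    return u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f;
}

// For every visible screen pixel, reconstruct a world position from the depth
// buffer, project it toward the light, and mark the page it lands in. The
// rasterized area then tracks what the camera can actually see rather than
// the full 16k^2 virtual resolution.
std::unordered_set<uint32_t> MarkNeededPages(const std::vector<Float3>& visibleWorldPositions,
                                             float worldExtent)
{
    std::unordered_set<uint32_t> needed;
    for (const Float3& p : visibleWorldPositions)
    {
        float u, v;
        if (!ProjectIntoShadowSpace(p, worldExtent, u, v))
            continue;
        const int pageX = static_cast<int>(u * kPagesPerSide);
        const int pageY = static_cast<int>(v * kPagesPerSide);
        needed.insert(static_cast<uint32_t>(pageY * kPagesPerSide + pageX));
    }
    return needed;
}
```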
I would say it’s actually the opposite. Techniques like VSM are viable if you stay within strict bounds of scene complexity. I suspect RT scales more efficiently with the number of shadow-casting lights, for example, so it is more flexible from that perspective. Given UE5 supports both methods, it would be cool if someone did a head-to-head benchmark.
As usual, it just depends on how you define "complexity". Naively, RT lights deal more easily with higher shadow-casting light counts, but they run into significant denoising bottlenecks that undercut that benefit. There are ways around this, of course, but all with their own tradeoffs. Virtual shadow maps and Nanite, on the other hand, currently deal far better with highly complex geometry and large worlds than RT does. Sure, you can create entirely static scenes with the BVH precomputed and fully resident in VRAM where it works, but these are not realistic for many games, or even for most of our largely-static demos to date.
Ultimately the real limiting factor for RT is not how efficient it is to trace rays into a precomputed, tight BVH, but how efficient it is to maintain and update non-trivial amounts of a BVH to the required level of accuracy. Building a BVH is of similar complexity to rasterization, and generally with higher constant factors. Thus I think the comparison going forward should be less about tracing and more about how rasterizing something with Nanite compares to building/updating a BVH of that geometry as the camera/lights move around.
Cached virtual shadow maps sit in an interesting middle ground between rasterization and raytracing. They are relatively efficient with Nanite geometry, which precomputes some aspects of the topology and so on to allow finer-grained culling, but supports efficient streaming and LOD. They are less efficient with things like foliage using vertex animation and heavy alpha test.
But of course those things are *even worse* with raytracing, as the cost of rebuilding a reasonable BVH is even higher than the cost of re-rendering the appropriate virtual shadow map pages or similar. Indeed, everything that is bad for Nanite + virtual shadow maps is bad for raytracing as well, and more.
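To make the caching side concrete, here's a minimal sketch of page invalidation under the assumption that pages survive across frames and only dirtied ones are re-rendered; the names are hypothetical, and the real system tracks this per light and per clipmap level:

```cpp
// Minimal sketch of cached shadow page invalidation.
#include <cstdint>
#include <unordered_set>

constexpr int kPagesPerSide = 128;

struct PageRect { int minX, minY, maxX, maxY; };  // shadow-page-space bounds of an object

uint32_t PageKey(int x, int y)
{
    return static_cast<uint32_t>(y) * kPagesPerSide + static_cast<uint32_t>(x);
}

// Pages rendered on a previous frame that we would like to reuse this frame.
std::unordered_set<uint32_t> gCachedPages;

// Anything that moved, or that uses vertex animation / heavy alpha test,
// dirties every cached page its shadow-space bounds overlap. Those pages get
// re-rasterized; everything else is reused, which is where the win over
// redrawing the whole virtual map every frame comes from.
void InvalidatePagesFor(const PageRect& bounds)
{
    for (int y = bounds.minY; y <= bounds.maxY; ++y)
        for (int x = bounds.minX; x <= bounds.maxX; ++x)
            gCachedPages.erase(PageKey(x, y));
}
```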
The problem is that any continuous LOD solution requires dynamic geometry. Triangles subdivide or merge depending on camera distance, so our mesh no longer has static topology even if it's just static background. In other words: the whole scene becomes dynamic geometry.
Current RT APIs don't support this. They assume topology is static, even for skinned characters. Refitting a BVH instead of rebuilding it only works with static topology.
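As a concrete illustration, here's a small sketch assuming a DXR-style split between refitting (PERFORM_UPDATE on a BVH built with ALLOW_UPDATE) and a full rebuild; the types and classification function are made up for the example:

```cpp
// Sketch of why continuous LOD defeats BVH refitting: a refit only moves
// existing nodes and assumes the index buffer / triangle count are unchanged.
#include <cstdint>
#include <vector>

struct MeshSnapshot
{
    std::vector<uint32_t> indices;   // topology
    std::vector<float>    positions; // 3 floats per vertex
};

enum class BlasUpdateKind { Reuse, Refit, FullRebuild };

BlasUpdateKind ClassifyUpdate(const MeshSnapshot& prev, const MeshSnapshot& curr)
{
    // Topology changed (triangles split/merged by a continuous LOD scheme):
    // the BVH node structure no longer matches the mesh, so refitting would
    // produce a broken tree. The only option is a full rebuild, which is
    // roughly rasterization-cost work every time the LOD changes.
    if (prev.indices != curr.indices)
        return BlasUpdateKind::FullRebuild;

    // Same topology, vertices moved (e.g. skinning): a cheap refit suffices,
    // at the price of gradually degrading BVH quality.
    if (prev.positions != curr.positions)
        return BlasUpdateKind::Refit;

    // Fully static: keep the existing BVH as-is.
    return BlasUpdateKind::Reuse;
}
```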
I'm not even willing to go so far as to say that we will require continuous, topology-altering LOD in the future. Nanite fixes topology, after all, and still maintains watertight rasterization via other cleverness and avoids popping by achieving a low enough error threshold. Whether achieving error thresholds of that order is viable for raytracing remains to be seen, but I'm not yet willing to put my hat down on "definitely no".
That said, you've definitely pointed at the core issue here: RT needs *some* good solution for streaming and LOD that is much finer-grained than current APIs (and possibly hardware) support. I worry that recreating or streaming BVH nodes at the required granularity might be too expensive even if the APIs weren't in the way, but that remains to be seen.
The tracing part of the equation is almost irrelevant to me at this point, beyond the vague notion that we'll likely have to make it more expensive by sacrificing BVH quality to solve the real problem. The real problem preventing people from shipping this stuff, without resorting to lower-detail proxies for secondary rays, is the BVH streaming/update/LOD part. Certainly there are going to be cases where you can just preload a static BVH for the whole scene, in which case you can get some really nice area lighting and soft shadows and so on. But in terms of being an end-to-end replacement that can service high-frequency lookups like direct shadowing in large open worlds, there are many unknowns and problems still to be solved.