The gist is that instead of traversing deeper into the tree, you can decide to stochastically sample just one of the triangles underneath the current node and use it as an approximation for the pixel that node covers.
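Roughly, in code it would look something like the sketch below (a toy CPU version with a made-up node layout; in a real tracer the descent would be the usual ray/box tests, the random child pick here is just to keep it short):

```cpp
#include <random>
#include <vector>

struct Node
{
    bool  isLeaf;
    int   child[2];           // child node indices when internal
    int   firstTri, triCount; // triangle range of the whole subtree (triangles sorted in tree order)
    float screenSize;         // projected size of the node bounds, assumed precomputed per frame
};

// Traverse until the node is roughly pixel sized, then pick one triangle underneath it
// at random as the representative for this pixel, instead of going deeper.
int StochasticPick(const std::vector<Node>& nodes, int root, float pixelSize, std::mt19937& rng)
{
    int n = root;
    while (!nodes[n].isLeaf && nodes[n].screenSize > pixelSize)
        n = nodes[n].child[std::uniform_int_distribution<int>(0, 1)(rng)];
    return std::uniform_int_distribution<int>(
        nodes[n].firstTri, nodes[n].firstTri + nodes[n].triCount - 1)(rng);
}
```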
A way better approach: instead of making assumptions from internal BVH boxes (which won't represent the surface well at all), make those boxes the leaves, containing just one triangle (or two) which is a best fit of the geometry and materials.
Then you can use classical RT HW and get the same wins, with no need for new HW.
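To make 'best fit' a bit more concrete, here is a toy sketch of how such a leaf's two triangles could be generated: a quad through the box center, oriented along the average surface normal inside the box. That fit is made up and ignores materials; a real tool would do something smarter, but the point is that the HW then only ever sees ordinary triangles:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Tri  { Vec3 v0, v1, v2; };

static Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 cross(Vec3 a, Vec3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }
static Vec3 normalize(Vec3 a) { return mul(a, 1.0f / std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z)); }

// Crude proxy: a quad through the box center, perpendicular to the average normal,
// sized to the largest box extent. Returned as the two triangles the leaf would store.
void BuildProxyQuad(Vec3 boxMin, Vec3 boxMax, Vec3 avgNormal, Tri out[2])
{
    Vec3 c = mul(add(boxMin, boxMax), 0.5f);
    Vec3 n = normalize(avgNormal);
    Vec3 up = std::fabs(n.y) < 0.9f ? Vec3{0, 1, 0} : Vec3{1, 0, 0}; // any vector not parallel to n
    Vec3 t = normalize(cross(up, n));
    Vec3 b = cross(n, t);
    Vec3 e = mul(sub(boxMax, boxMin), 0.5f);
    float r = std::max(std::max(e.x, e.y), e.z); // half extent of the quad
    Vec3 p0 = add(c, add(mul(t, -r), mul(b, -r)));
    Vec3 p1 = add(c, add(mul(t,  r), mul(b, -r)));
    Vec3 p2 = add(c, add(mul(t,  r), mul(b,  r)));
    Vec3 p3 = add(c, add(mul(t, -r), mul(b,  r)));
    out[0] = {p0, p1, p2};
    out[1] = {p0, p2, p3};
}
```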
So basically we're talking about LOD here again. And this example is exactly what brought me to the conclusion that hierarchical LOD and acceleration structures can (and probably should) be the same thing.
But I don't think using one structure for both needs is practical yet. For now it would be good enough if we could convert one into the other, saving the cost of building at runtime and enabling fine-grained LOD for RT.
Do you have specific examples in mind of useful API extensions? The DXR interface is essentially “build an acceleration structure with this bag of triangles”. It does not mandate that the structure is a BVH or anything else. This gives maximum flexibility to the IHV and more room to innovate rapidly on the hardware side. The downside is that it’s completely opaque to developers.
Specifying the data structures they use needs no examples. But we can discuss what a 'BVH API' should look like.
Basically we would require new compute shader statements / built-in functions to traverse, modify, and create the tree.
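As a pure strawman, the surface could look something like the declarations below. None of this exists in HLSL/DXR today, every name is invented, and it's written in C++ syntax only for readability; in practice these would be shader intrinsics:

```cpp
#include <cstdint>

struct Float3        { float x, y, z; };
struct BvhNodeHandle { uint32_t index; };

// Traversal / inspection
BvhNodeHandle BvhGetRoot(uint64_t accelStruct);
bool          BvhIsLeaf(BvhNodeHandle node);
uint32_t      BvhGetChildCount(BvhNodeHandle node);
BvhNodeHandle BvhGetChild(BvhNodeHandle node, uint32_t i);

// Modification
void BvhSetBounds(BvhNodeHandle node, Float3 aabbMin, Float3 aabbMax);
void BvhMakeLeaf(BvhNodeHandle node, uint32_t firstTriangle, uint32_t triangleCount);
void BvhMakeInternal(BvhNodeHandle node, const BvhNodeHandle* children, uint32_t count);

// Creation / release of sub-branches (the memory management part)
BvhNodeHandle BvhAllocNodes(uint32_t count);
void          BvhFreeSubtree(BvhNodeHandle node);
```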
Following the Nanite example, there would be two things to handle: collapsing clusters into a single one with lower detail, and the other way around.
I make the assumption that the LOD hierarchy tree also makes a good BVH for tracing, which may not be the case for Nanite, but would be possible.
So, on the collapse, we would need to make internal nodes leaves, make the child pointers point to triangles instead of child nodes, and free the no-longer-needed sub-branch nodes from memory.
On the expand, we would need to allocate and generate the new child nodes, turn triangle pointers into child node pointers, and free the triangle memory.
(Something like that, to give a simplified example.)
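A toy CPU-side sketch of those two operations, with an invented BVH4-ish node layout and a trivial free list standing in for a real GPU allocator:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Tri { float v[3][3]; };

struct Node
{
    bool     isLeaf = false;
    uint32_t child[4]   = {}; // child node indices when internal (4-wide here, arbitrary)
    uint32_t childCount = 0;
    uint32_t firstTri   = 0;  // triangle range when leaf
    uint32_t triCount   = 0;
};

struct Bvh
{
    std::vector<Node>     nodes;
    std::vector<Tri>      tris;
    std::vector<uint32_t> freeNodes; // simple free list; a real version needs a GPU allocator

    // Collapse: turn an internal node into a leaf pointing at precomputed lower-detail
    // proxy triangles, and release the whole sub-branch. (The full-detail triangles of
    // that branch would also go back to a triangle allocator here.)
    void Collapse(uint32_t n, uint32_t proxyFirstTri, uint32_t proxyTriCount)
    {
        FreeSubtree(n);
        nodes[n].isLeaf     = true;
        nodes[n].childCount = 0;
        nodes[n].firstTri   = proxyFirstTri;
        nodes[n].triCount   = proxyTriCount;
    }

    // Expand: replace a leaf's proxy triangles with freshly allocated child nodes,
    // whose contents would be streamed in or generated elsewhere.
    void Expand(uint32_t n, uint32_t childCount)
    {
        assert(childCount <= 4);
        nodes[n].isLeaf     = false;
        nodes[n].childCount = childCount;
        for (uint32_t i = 0; i < childCount; ++i)
            nodes[n].child[i] = AllocNode();
        nodes[n].firstTri = nodes[n].triCount = 0; // proxy triangles could be freed here
    }

    uint32_t AllocNode()
    {
        if (!freeNodes.empty()) { uint32_t i = freeNodes.back(); freeNodes.pop_back(); return i; }
        nodes.push_back({});
        return uint32_t(nodes.size() - 1);
    }

    void FreeSubtree(uint32_t n) // returns all descendants of n to the free list
    {
        if (nodes[n].isLeaf) return;
        for (uint32_t i = 0; i < nodes[n].childCount; ++i)
        {
            FreeSubtree(nodes[n].child[i]);
            freeNodes.push_back(nodes[n].child[i]);
        }
    }
};
```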
The main challenge here seems to be the memory management. I'd be fine with doing this myself. Likely such functionality would be used only by a few, so it does not have to be super easy to use.
An expected difficulty would be different branching factors across vendors. We know AMD uses BVH4; NV might use BVH64, Intel might use binary trees, ARM might use BVH8 (random numbers, not real guesses).
We need to handle this in our compute shader that converts the offline data to the HW format.
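To illustrate, a toy version of that conversion: collapse a binary offline tree into N-wide nodes, where 'branching' would come from some device/vendor query. The layout and the negative leaf encoding are made up:

```cpp
#include <vector>

struct SrcNode  { int left = -1, right = -1; }; // binary offline tree; left < 0 means leaf
struct WideNode { std::vector<int> children; }; // entry >= 0: wide node index, < 0: ~sourceLeafIndex

// Emit a wide node for source node 'n' by growing a cut below it until the cut has
// 'branching' entries (or only leaves remain), then recurse on the internal entries.
int MakeWide(const std::vector<SrcNode>& src, int n, std::vector<WideNode>& out, int branching)
{
    std::vector<int> cut = {n};
    bool grew = true;
    while (grew && (int)cut.size() < branching)
    {
        grew = false;
        for (int i = 0; i < (int)cut.size(); ++i)
        {
            if (src[cut[i]].left >= 0) // internal: replace it by its two children
            {
                int l = src[cut[i]].left, r = src[cut[i]].right;
                cut[i] = l;
                cut.push_back(r);
                grew = true;
                break; // rescan after each split to keep the sketch simple
            }
        }
    }

    int self = (int)out.size();
    out.push_back(WideNode{});
    for (int c : cut)
    {
        int entry = (src[c].left < 0) ? ~c : MakeWide(src, c, out, branching);
        out[self].children.push_back(entry);
    }
    return self;
}
```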
Also, Nanite has X triangles per node, where X is much larger than one (or than however many triangles the HW actually has in a leaf node; Intel hints at having at least two, because of their native support for two-triangle quads with a shared edge).
Same is true for my stuff. One node maps to a whole cluster of triangles.
Thus we likely need to generate multiple bottom levels below the leaves in a single compute shader workgroup, to add the extra nodes and levels needed for RT. That's still fast, because there's no need for multiple dispatches with barriers between tree levels. It's actually a good trade-off, because the BVH on disk, missing the lowest levels, takes much less storage and streaming time.
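A plain CPU sketch of that per-cluster build, which on the GPU would map to one workgroup per cluster. It just sorts along the longest axis and cuts the range into fixed-size leaves to show the shape of the work; names, layout and 'leafSize' are all made up, and a real builder would also emit the internal nodes linking these leaves up to the cluster node:

```cpp
#include <algorithm>
#include <vector>

struct Aabb       { float mn[3] = {1e30f, 1e30f, 1e30f}, mx[3] = {-1e30f, -1e30f, -1e30f}; };
struct ClusterTri { float centroid[3]; Aabb box; int triIndex; };
struct LeafNode   { Aabb box; int first = 0, count = 0; }; // HW-style leaf over a triangle range

void BuildClusterBottomLevel(std::vector<ClusterTri>& tris, int leafSize,
                             std::vector<LeafNode>& outLeaves)
{
    // Find the longest axis of the cluster bounds.
    Aabb cluster;
    for (const ClusterTri& t : tris)
        for (int a = 0; a < 3; ++a)
        {
            cluster.mn[a] = std::min(cluster.mn[a], t.box.mn[a]);
            cluster.mx[a] = std::max(cluster.mx[a], t.box.mx[a]);
        }
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (cluster.mx[a] - cluster.mn[a] > cluster.mx[axis] - cluster.mn[axis])
            axis = a;

    // Sort triangles along that axis (a GPU version would use a workgroup-local sort).
    std::sort(tris.begin(), tris.end(), [axis](const ClusterTri& a, const ClusterTri& b)
        { return a.centroid[axis] < b.centroid[axis]; });

    // Cut the sorted range into fixed-size leaves and compute their bounds.
    for (int first = 0; first < (int)tris.size(); first += leafSize)
    {
        LeafNode leaf;
        leaf.first = first;
        leaf.count = std::min(leafSize, (int)tris.size() - first);
        for (int i = 0; i < leaf.count; ++i)
            for (int a = 0; a < 3; ++a)
            {
                leaf.box.mn[a] = std::min(leaf.box.mn[a], tris[first + i].box.mn[a]);
                leaf.box.mx[a] = std::max(leaf.box.mx[a], tris[first + i].box.mx[a]);
            }
        outLeaves.push_back(leaf);
    }
}
```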
A complication would be NV's 'treelets', which pack a branch of the tree spanning multiple levels contiguously in memory.
If they use this, things are no longer that trivial, but still possible.
You see, the concept is simple and the idea makes total sense. Though, I also assume high-density geometry with high and quite regular tessellation. That's not yet the standard, and of course working with trees isn't either, so I admit requesting such an API is quite a stretch.
Only the IHVs and API designers sitting at a table together can make a practical proposal on what's possible and meaningful.