Ext3h
> At each internal node, you could decide to switch LOD based on distance and random number.

Except you wouldn't want to allow this on every internal node of a BVH tree. Rather, you would link alternate BVH trees for the different LOD levels into the top-level acceleration structure. Keep traversal of all bottom-level structures fixed-function (or at least a tight loop) as much as possible, and only allow custom code where you have considerable cost or an unavoidable shader load either way.
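To make the per-instance alternative concrete, here is a minimal Python sketch. It assumes each top-level instance simply stores one alternate BLAS handle per LOD level. `LodInstance`, `select_lod`, the switch distances and the `blend` band are hypothetical illustrations, not part of any ray tracing API. The LOD is picked once, from distance plus one uniform random number, at the transition into the bottom-level structure, so traversal of the chosen BLAS itself can stay a plain fixed loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LodInstance:
    """Hypothetical TLAS instance linking one alternate BLAS per LOD level."""
    blas_per_lod: List[str]        # placeholder BLAS handles, finest LOD first
    switch_distances: List[float]  # distance where LOD i hands over to LOD i + 1 (ascending)


def select_lod(inst: LodInstance, distance: float, u: float, blend: float = 0.1) -> int:
    """Choose one LOD for this ray, once, when entering the instance.

    u is a uniform random number in [0, 1). Inside a band of relative width
    `blend` around a switch distance, the coarser LOD is taken with a
    probability that ramps linearly from 0 to 1 across the band.
    """
    lod = 0
    for d in inst.switch_distances:
        lo, hi = d * (1.0 - blend), d * (1.0 + blend)
        if distance >= hi:
            lod += 1                      # clearly past this switch point
            continue
        if distance > lo and u < (distance - lo) / (hi - lo):
            lod += 1                      # stochastic step to the coarser LOD
        break                             # bands are assumed not to overlap
    return min(lod, len(inst.blas_per_lod) - 1)
```

In a tracer, `u` would come from a per-ray random sequence (for example `random.random()` in this sketch). Dithering only inside a narrow band keeps most rays on a single LOD while smoothing the transition between levels.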
> Here's the link to the LOD paper in case you missed it: https://web.yonsei.ac.kr/wjlee/traversalshader.html

That paper also only suggests allowing bottom-level acceleration structures for different LOD levels to be entered from the same top-level acceleration structure, via a programmable traversal strategy in between acceleration structures. Not per individual BVH tree node.
> or better: Terminate traversal and store the ray, binned by LOD for later processing.

Traversal strategy is (currently) mostly focused on keeping rays in flight "minimal". Each BVH traversal step amortizes to less than a single memory access, amounting to less than 4 bytes per node (a wild guess; I strongly expect that at least a cluster of 2 or 3 levels of the octree is tightly bin-packed into a single cache line). The state of a ray, by contrast, is considerably larger: the minimum I can picture is 8 × 4 = 32 bytes.
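The size comparison can be spelled out with Python's `struct` module. The layout below (origin, direction, t_min, t_max as eight 32-bit floats) is an assumption chosen to match the 8 × 4 bytes above, and the 4-bytes-per-step figure is the post's own rough guess, not a measured value.

```python
import struct

# Assumed minimal ray state (illustrative layout, not any particular API):
# origin xyz, direction xyz, t_min, t_max -> eight 32-bit floats, no padding.
RAY_STATE = struct.Struct("<3f3f2f")

# Rough guess from the post: amortized memory traffic per BVH traversal step.
BYTES_PER_STEP_GUESS = 4


def steps_equivalent(ray_transfers: int) -> float:
    """Traversal steps whose amortized traffic matches moving the full ray
    state `ray_transfers` times (each transfer is one write or one read)."""
    return ray_transfers * RAY_STATE.size / BYTES_PER_STEP_GUESS


print(RAY_STATE.size)        # 32
print(steps_equivalent(2))   # 16.0: one store plus one reload
```

Since the per-step figure is an upper bound, a single transfer of the ray state already moves more bytes than 8 traversal steps, and a store plus reload more than 16.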
Binning per LOD gives you an expected cache hit only on the topmost node (or node cluster) of each tree. That comes at the high risk that the majority of binned rays are full misses and now have to go straight back into the bin for the top-level structure, at the overhead of 2 reloads of the ray state plus more than 2 write accesses to the binning data structure. In comparison, common screen-space binning of only coherent, neighboring primary rays actually has a much better chance of uniform memory accesses even when traversing deeper into the BVH, regardless of the implementation-specific traversal strategy.
Non-uniform traversal into bottom-level structures still isn't free, of course, but at least you don't end up with any write accesses or reloads of ray state.
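A rough per-ray accounting of the LOD-binning miss case versus descending in place, as a Python sketch. The model is an assumption: every bin insertion writes the full 32-byte ray state plus one 32-bit bin counter/index (hence 4 writes, i.e. more than 2), and every reload reads the full ray state back. Node fetches, which both approaches pay in some form, are left out; `Traffic`, `binned_lod_miss` and `in_place_descent` are hypothetical names.

```python
from dataclasses import dataclass

RAY_STATE_BYTES = 32  # 8 x 4 bytes, the minimal ray state assumed in the post
BIN_INDEX_BYTES = 4   # assumption: one 32-bit counter/index update per bin insertion


@dataclass
class Traffic:
    ray_reloads: int = 0
    writes: int = 0
    bytes_moved: int = 0


def binned_lod_miss() -> Traffic:
    """Ray is terminated and binned by LOD, reloaded, misses the bottom-level
    structure, is re-binned for the top-level structure and reloaded again."""
    t = Traffic()
    for _ in range(2):      # 1st: bin by LOD, 2nd: back into the top-level bin
        t.writes += 2       # ray state + bin counter/index
        t.bytes_moved += RAY_STATE_BYTES + BIN_INDEX_BYTES
        t.ray_reloads += 1  # the ray is later read back from that bin
        t.bytes_moved += RAY_STATE_BYTES
    return t


def in_place_descent() -> Traffic:
    """Ray state stays resident while descending into the bottom-level
    structure: no extra ray-state writes or reloads (node fetches not counted)."""
    return Traffic()


miss = binned_lod_miss()
print(miss)  # Traffic(ray_reloads=2, writes=4, bytes_moved=136)
```

Under these assumptions the miss case adds 136 bytes of ray-state and bin traffic per ray, none of which the in-place path pays; what the in-place path pays instead is the cost of divergent node fetches.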