Next gen lighting technologies - voxelised, traced, and everything else *spawn*

At each internal node, you could decide to switch LOD based on distance and a random number.
Except you wouldn't want to allow this on every internal node of a BVH tree, but rather link alternate BVH trees for the different LOD levels into the top level acceleration structure. Keep traversal of all bottom level structures potentially fixed function / a tight loop as much as possible, and only allow custom code where you have considerable cost / unavoidable shader load either way.
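For illustration, here is a minimal CPU-side sketch of what that per-instance decision could look like - purely hypothetical, with made-up names and the assumption of one LOD level per doubling of distance:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: choose a bottom-level LOD when a ray enters an instance,
// dithering between the two nearest discrete levels with a random number so the
// transition is stochastic instead of a hard switch. All names/constants are
// placeholders, not from any real API.
int selectStochasticLod(float distanceToInstance, float randomU01,
                        int lodCount, float lod0Distance)
{
    // Continuous LOD value: assume one level per doubling of distance.
    float continuous = std::log2(std::max(distanceToInstance / lod0Distance, 1.0f));
    int   base       = static_cast<int>(continuous);
    float fraction   = continuous - static_cast<float>(base);

    // Round up with probability equal to the fractional part, so neighbouring
    // rays pick either of the two adjacent LODs and the blend averages out.
    int lod = base + (randomU01 < fraction ? 1 : 0);
    return std::min(lod, lodCount - 1);
}
```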
Here's the link to the LOD paper in case you missed it: https://web.yonsei.ac.kr/wjlee/traversalshader.html
That paper also only suggests allowing different LOD-level bottom level acceleration structures to be entered from the same top level acceleration structure, via a programmable traversal strategy in between acceleration structures - not per individual BVH tree node.
Or better: terminate traversal and store the ray, binned by LOD for later processing.
Traversal strategy is (currently) mostly focused on keeping rays in flight "minimal". Each BVH traversal step amortizes to less than a single memory access, amounting to less than 4 bytes per node (wild guess - I strongly expect that at least a cluster of 2 or 3 levels of the octree is bin-packed tightly into a single cache line). The state of a ray, in contrast, is considerably more than that; the minimum I can picture is 8 x 4 = 32 bytes.
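Just to put a number on that, a tiny sketch of the 8 x 4 byte minimum - real ray state carries more (payload, traversal stack, current hit, flags):

```cpp
// Sketch of the "minimum I can picture" ray state mentioned above: 8 x 4 bytes.
// Origin and direction (3 floats each) plus the parametric interval [tMin, tMax].
struct MinimalRayState
{
    float ox, oy, oz;   // ray origin
    float dx, dy, dz;   // ray direction
    float tMin, tMax;   // active segment along the ray
};

static_assert(sizeof(MinimalRayState) == 32, "8 x 4 bytes = 32 bytes per ray");
```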

Binning per LOD gives you an expected cache hit only on the topmost node / node cluster per tree. And that comes at the high risk that the majority of binned rays is a full miss and now has to go straight back into the bin for the top level structure, at the overhead of 2 reloads of the ray state plus >2 write accesses to the binning data structure. In comparison, common screen space binning of only coherent, neighboring primary rays actually has a much better chance of uniform memory accesses even when traversing deeper into the BVH, regardless of the implementation specific traversal strategy.

Non-uniform traversal into bottom level structures still isn't free, of course, but at least you don't end up with any write accesses or reloads of ray state.
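Roughly, the binning I mean would look like this on the CPU side - an illustrative sketch only, with placeholder types; the point is the extra write of the full ray state on suspend and the reload later per bin:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch of suspending rays into per-LOD bins for later processing.
// Each suspend is a write of the full ray state and each bin pass is a reload,
// which is the overhead weighed against the better cache behaviour per bin.
struct Ray { float o[3], d[3], tMin, tMax; };

struct LodBins
{
    std::vector<std::vector<Ray>> bins;

    explicit LodBins(std::size_t lodCount) : bins(lodCount) {}

    // Write the ray state into the bin of the LOD it should continue in.
    void suspend(const Ray& ray, std::size_t lod) { bins[lod].push_back(ray); }

    // A later pass per bin reloads these rays and traverses that LOD's tree,
    // so at least the topmost node(s) of that tree stay hot across the bin.
};
```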
 
Would this be flexible enough to do skinning and cached subdivision/displacement?

PS. as I said before, we need ray differentials.
 
Let me know if anyone with experience on DXR thinks stochastic LOD by segmenting rays could be practically fast.
Technically it should work, both relative to the camera (so for the scene overall) and relative to the shading point (LOD aggressive enough for area light shadows / glossy reflections).
Just the overhead of a full traversal restart at each segment might be too much...
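To make the idea concrete, here is a hedged sketch of the segment loop - traceAgainstLod() is a made-up stand-in for a TraceRay() call against a per-LOD acceleration structure, and the switch distances are assumed to be known per ray:

```cpp
#include <cmath>
#include <vector>

// Hedged sketch of "stochastic LOD by segmenting rays": split the ray at the
// distances where the LOD should change and trace each segment against the
// acceleration structure for that LOD, restarting from the root every time.
struct Ray { float o[3], d[3]; };
struct Hit { bool valid = false; float t = 0.0f; };

Hit traceAgainstLod(const Ray&, int /*lod*/, float /*tMin*/, float /*tMax*/)
{
    return Hit{};   // placeholder stub: a real version would traverse that LOD's BVH
}

Hit traceSegmented(const Ray& ray, const std::vector<float>& lodSwitchDistances,
                   float tMax)
{
    float tMin     = 0.0f;
    int   lodCount = static_cast<int>(lodSwitchDistances.size()) + 1;
    for (int lod = 0; lod < lodCount; ++lod)
    {
        // End of this segment: the next LOD switch distance, or the ray's tMax.
        float tEnd = (lod + 1 < lodCount)
                         ? std::fmin(lodSwitchDistances[lod], tMax)
                         : tMax;

        Hit hit = traceAgainstLod(ray, lod, tMin, tEnd);   // full restart per segment
        if (hit.valid)
            return hit;                                    // closest hit within this segment

        tMin = tEnd;                                       // continue into the next LOD
        if (tMin >= tMax)
            break;
    }
    return Hit{};
}
```

With N LOD levels that is up to N launches per ray, which is exactly the restart overhead in question.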
 
Just the overhead of a full traversal restart at each segment might be too much...
You are foremost in for bugs if you accidentally start a segment within a span which was not yet a hit at the current level, but would have been at the next, resulting in missing the hit entirely. So worst case you will have to restart the next segment with a decent overlap, corresponding to the worst case size of such a span. And that worst case can be as bad as backstepping to the bounding box of all objects you had been within at the end point of the previous segment.

Deciding on a LOD per ray only when entering an instance is a surprisingly sensible design choice.
 
You are foremost in for bugs if you accidentally start a segment within a span which was not yet a hit at the current level, but would have been at the next, resulting in missing the hit entirely. So worst case you will have to restart the next segment with a decent overlap, corresponding to the worst case size of such a span. And that worst case can be as bad as backstepping to the bounding box of all objects you had been within at the end point of the previous segment.

Deciding on a LOD per ray only when entering an instance is a surprisingly sensible design choice.
Backstepping surely is an issue, but it should be solvable if the geometry has data to transform surface positions between the LODs, which I already have. Edit: dumb - there are no hit points in empty space - need to think more about it...

I assume it's not possible to preserve any traversal state when starting the next ray segment in any case. Even if it were possible, it would make no sense here because each segment works on a different, coarser-LOD BVH and is a full restart from its root.
(The whole idea is about LOD on current RTX and independent of any nodes, hits or traversal - unlike the ideas from the paper. Just to make sure we get each other right.)
Basically, if we had 8 LODs, each ray would become 8 rays in the worst case. Shorter rays, and less scene complexity, but still many more rays, which worries me. I assume that if it worked well in practice, we would have heard about it already.

I assume the instancing options we have now can't be of any help to achieve stochastic LOD.
 
I assume the instancing options we have now can't be of any help to achieve stochastic LOD.
I could think of a way, at least for primary rays baked for a specific origin. You can insert all LOD levels in the fuzzy transition ranges on top of each other, and then reject in an any hit shader if the hit occurred at a higher detail level beyond that ray's individual LOD max depth, still compared to the boundary of that instance.

If you can't control traversal, you have to filter the results instead.

You can bake adjacent levels into one, if you want to avoid multiple overlapping bottom level structures.
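As a sketch, the filtering could boil down to a predicate like this - field names are made up; in DXR the check would sit in an any hit shader and reject via IgnoreHit():

```cpp
// Sketch of the "filter the results instead" idea: all LOD levels of an object
// are inserted on top of each other (within the fuzzy transition range), and a
// hit is rejected if it comes from a more detailed level than the ray's
// individually chosen LOD max for that object.
struct HitCandidate
{
    int instanceLodLevel;   // detail level baked into the instance that was hit
    int objectId;           // which object / boundary the instance belongs to
};

bool acceptHit(const HitCandidate& hit, int rayLodMaxForObject)
{
    // Keep only hits up to the ray's max detail level for this object;
    // the closest accepted hit then wins as usual.
    return hit.instanceLodLevel <= rayLodMaxForObject;
}
```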
 
Neat idea!
Accepting LOD only relative to the camera as origin, maybe it would also work to have a small number of fixed configurations with random transition instances per frame - avoiding the need to have two overlapping LODs, and selecting configurations stochastically per pixel.
Would lose ray coherency with neighbouring pixels and have a higher cost of building the TLAS, though.
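A sketch of the per-pixel selection, assuming a handful of prebuilt TLAS variants and a generic integer hash (nothing API-specific):

```cpp
#include <cstdint>

// Sketch of the fixed-configurations idea: build a small number of TLAS
// variants per frame (each with one discrete LOD picked per transition
// instance) and choose one stochastically per pixel, so no single TLAS needs
// two overlapping LODs.
uint32_t hashPixel(uint32_t x, uint32_t y, uint32_t frame)
{
    uint32_t h = x * 73856093u ^ y * 19349663u ^ frame * 83492791u;
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}

uint32_t selectConfiguration(uint32_t x, uint32_t y, uint32_t frame, uint32_t configCount)
{
    return hashPixel(x, y, frame) % configCount;   // which TLAS variant this pixel traces
}
```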

At least there are some options to try... :)
 
Boah, this is the best gfx mod I've ever seen, I think.
Usually I do not pay attention, because 'HD texture pack' mostly means 'destroy art direction but claim improvement'.
But this is just awesome. Respect! Gimme that for San Andreas too! :)
 
@JoeJ

Has this been posted here/have you seen it? Now that's a remaster of a great classic.


Woah woah woah there, where's the NSFW-tag? I'm pretty sure there was a split-second 3D-nipple there! WON'T ANYONE THINK OF THE CHILDREN?! :runaway:

Boah, this is the best gfx mod I've ever seen, I think.
Usually I do not pay attention, because 'HD texture pack' mostly means 'destroy art direction but claim improvement'.
But this is just awesome. Respect! Gimme that for San Andreas too! :)

Agreed, it's great, but the lighting needs some work IMO - too often it looks way too undersaturated for Vice City, not pastel and neon enough.
 
Hehe, yeah - just give me some saturated colors, avoid ugly contrast in textures, add some reflections here and there, and it pops right into my eyes :)
It's great how well they use all of this here, and this SSRT crap really works well too.
 
Hehe, yeah - just give me some saturated colors, avoid ugly contrast in textures, add some reflections here and there, and it pops right into my eyes :)
It's great how well they use all of this here, and this SSRT crap really works well too.

I believe they are just using GTA V's engine there, so no SSRT. Just SSAO, plus a dynamic camera-centered cubemap and planar reflections for the most significant horizontal surface, and that's it.
 
Coherency gathering in ray tracing: the benefits of hardware ray tracking
11 February 2020 -
Rys Sommefeldt
What PowerVR’s implementation of ray tracing hardware acceleration does, which is unique compared to any other hardware ray tracing acceleration on the market today, is hardware ray tracking and sorting, which, transparently to the software, makes sure that parallel dispatches of rays do have similar underlying properties when executed by the hardware. We call that coherency gathering.

The hardware maintains a database of rays in flight that the software has launched and is able to select and group them by where they’re heading off to in the acceleration structure, based on their direction. This means that when they’re processed, they’re more likely to share the acceleration structure data being accessed in memory, with the added bonus of being able to maximise the amount of parallel ray-geometry intersections being performed by the GPU as testing occurs afterwards.
...
Without that hardware system in place to help the GPU process similar rays you’re left either hoping that the application or game developer took care of ray coherency on the host somehow, or you’re shooting for some middle ground of sorting them on the GPU using compute programs – if the way you process rays in hardware even allows for that in the first place. None of those options is compelling for performance and efficiency in a real-time system, yet Imagination is the only GPU supplier on the market with such a hardware ray tracking system.

https://www.imgtec.com/blog/coheren...racing-the-benefits-of-hardware-ray-tracking/
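For what it's worth, a rough software analog of that grouping - sorting queued rays by their direction octant so rays likely to visit similar parts of the acceleration structure are processed together (the hardware scheme is transparent to software; this is only to illustrate the idea):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Rough software analog of the coherency gathering described in the quote:
// sort queued rays by a key derived from their direction (here just the sign
// octant) so that rays heading into similar regions get processed together.
struct QueuedRay { float o[3], d[3]; };

uint32_t directionOctant(const QueuedRay& r)
{
    return (r.d[0] < 0.0f ? 1u : 0u) |
           (r.d[1] < 0.0f ? 2u : 0u) |
           (r.d[2] < 0.0f ? 4u : 0u);
}

void gatherByDirection(std::vector<QueuedRay>& rays)
{
    std::stable_sort(rays.begin(), rays.end(),
                     [](const QueuedRay& a, const QueuedRay& b)
                     { return directionOctant(a) < directionOctant(b); });
}
```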
 