Next gen lighting technologies - voxelised, traced, and everything else *spawn*

I knew someone would say this. Ok. Voxel crap or SDF crap is NOT raytracing crap.
The suggestion was: Implement your compute RT together with RTX RT, stop whining and enjoy full flexibility.
And this suggestion is pretty much crap, which is what i meant: This would require two implementations of BVH, so it's not the best of both worlds, it is just both worlds side by side.
What is so hard to understand when i say: Replicating Fixed Function functionality is just stupid?


It does handle occlusion by negative light. That's the whole (brilliant) idea.
An image like the Cornell Box looks just like a path traced image of the same scene. You or I could not spot which is which.

The problem is: The negative light does not exactly match the effect real occlusion would have, and this causes leaks and very ugly color shifts in interiors. I don't think this can be solved. If it could, i could stop working, and NV could focus on the 16XX.
Bunnell's suggestion of sectors to prevent the leaks is crap.
But for outdoors and some simple houses it would work and nobody would notice the color shifts. It would look much better than Metro or Q2.
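
For illustration, a minimal sketch of the "occlusion as negative light" idea, in the spirit of Bunnell-style disc elements. The Surfel struct, the form factor approximation, and the single gather pass are hypothetical stand-ins for the general idea, not the actual implementation discussed here:

```cpp
// Sketch: every disc adds light, and every disc also subtracts ("negative
// light") the portion of light it would block - no visibility test anywhere.
#include <vector>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Surfel {
    Vec3  position, normal;
    float area;       // disc area
    float emission;   // light it sends out (scalar for simplicity)
};

// Crude disc-to-disc form factor approximation (geometry term only).
static float formFactor(const Surfel& receiver, const Surfel& sender)
{
    Vec3  d     = sub(sender.position, receiver.position);
    float dist2 = dot(d, d) + 1e-6f;
    float cosR  = std::fmax(dot(receiver.normal, d), 0.0f) / std::sqrt(dist2);
    float cosS  = std::fmax(-dot(sender.normal, d), 0.0f) / std::sqrt(dist2);
    return sender.area * cosR * cosS / (3.14159265f * dist2 + sender.area);
}

float gatherWithNegativeLight(const Surfel& receiver,
                              const std::vector<Surfel>& surfels,
                              float ambient)
{
    float light = ambient;                    // start from the unoccluded result
    for (const Surfel& s : surfels) {
        float f = formFactor(receiver, s);
        light += f * s.emission;              // light emitted / bounced by the disc
        light -= f * ambient;                 // negative light: the disc blocks
                                              // roughly this much of the ambient.
                                              // This approximation is what leaks
                                              // and shifts colors indoors.
    }
    return std::fmax(light, 0.0f);
}
```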

Thanks for paying attention to alternatives!


You still don't get it: Research on faster RT algorithms will shrink to 1% of what it is now. Only NV and the other vendors will do it. No faster algorithms! Only faster blackboxed hardware.
All it will spur is moving offline tech to games, in a way that barely makes sense from an efficiency perspective. But that's how progress works nowadays. It's no longer important to be fast. Photorealism isn't important either. Only selling GPUs is. Progress only just fast enough to dominate the competition.


The best choice for primary visibility O(N^2) is not necessarily the best choice for lighting O(N^3). And they do not need to be the same.
1) Of course replicating functionality is bad but the point is that you can mix and match techniques depending on when they work best.

2) The Sponza test shows why it's so limited. No shadows from the ceiling. Metro and Q2 don't have this issue. It's only good enough for very diffuse lighting environments and only applied to small objects like characters. At that point, why even bother when you have a much more robust solution?

3) DXR is not RTX so alternatives to this specific hardware implementation will be researched. Without NVIDIA and MS the interest in RT would be minimal. In terms of speed, that's what RTX brings to the table.

4) For sharp features like reflections and shadows they pretty much do.
 
Impossible: Both worlds need their own acceleration structures - twice the work. DXR BVH is blackboxed and only accessible by tracing DXR rays, which further do not allow parallel algorithms.
So we have spent all of this time romancing over the probabilistic unlimited potential of RT compute "with no substantial actual on the ground data to back this up", and now -out of thin air- you are shutting the door completely on one possibility that involves both compute and fixed function? I am sorry but that doesn't seem like sound logic at all.

If we are doing so much guesswork, future prediction and wishful thinking with pure RT compute approaches, we might as well do the same with mixed approaches; it doesn't make sense to exclude one over the other. The justification of "too much work" sounds like a weak excuse to rule out a possibility based on a personal preference and not scientific evidence.
 
1) Of course replicating functionality is bad but the point is that you can mix and match techniques depending on when they work best.
... which is what i plan to do since i'm here. It's your questions forcing me to explain the downsides again and again, because you seem not to understand. Or you don't want to.

2) The Sponza test shows why it's so limited.
Which sponza test? Do you mean Bikker's test i've posted above?
He's the guy who brought up realtime path tracing, and he's the initial Brigade dev - i doubt he did anything wrong.
Likely he's the father of your dreams.
This really shows how biased your standpoints are - it's quite funny as well. :D :D :D

3) DXR is not RTX so alternatives to this specific hardware implementation will be researched. Without NVIDIA and MS the interest in RT would be minimal. In terms of speed, that's what RTX brings to the table.
Stop making claims you can't know. No matter how much pre RTX RT stuff we show here, you ignore it all and keep saying there would be no interest. And you say this to me, although i used RT years before MS and NV brought greatness to the table.

4) For sharp features like reflections and shadows they pretty much do.
Yep, i think that's the strength of RTX. For my personal application. Others will use it differently as well.


So we have spent all of this time romancing over the probabilistic unlimited potential of RT compute "with no substantial actual on the ground data to back this up", and now -out of thin air- you are shutting the door completely on one possibility that involves both compute and fixed function? I am sorry but that doesn't seem like sound logic at all.
You just did not understand my quote - that's all. Nothing you say is related to this simple quoted sentence of mine. I give it up.
 
1) Downsides... from your point of view. Maybe not to other developers.

2) No, I was talking about the Danger Planet tech. The slides I linked to have more pictures. One of them is a Sponza test. The demo at the link you posted looks nice. And talking about Brigade, maybe Otoy will have something new to show us soon now that they're about to support UE4 in addition to Unity.

3) Compared to before DXR, there was much less interest in RTRT. That's a fact. Now you see plenty of developers on Twitter learning and experimenting with ray tracers.

4) GDC is around the corner. Maybe we'll see some interesting algorithms/use cases.
 
1) Downsides... from your point of view. Maybe not to other developers.
The luckiest devs are those who approach RT just now, because they will never experience those limitations.
But remember sebbbi's twitter page i have posted, with some of the most recognized and experienced gfx devs making the exact same critique as me.
Still i need to defend myself constantly, which is exhausting.

2) No, I was talking about the Danger Planet tech. The slides I linked to have more pictures. One of them is a Sponza test.
Sorry man, i would not have thought this sponza scene had existed for such a long time already.

Of course there now is more RT development for games. My point is SSR would have been replaced by RT also without NV / MS.

You just did not understand my quote - that's all. Nothing you say is related to this simple quoted sentence of mine. I give it up.

Sorry for this as well. Likely you meant i would say RTX and alternatives can not be combined.
But no, the context was: Can BVH be shared? Can the combination be made ideal? And here the answer is no in my case.


Let's finally close this argument. My points seem much too technical and detailed to discuss here, and my intention never was to stretch minor issues over multiple pages.
I'll keep my development and experience out of discussion from now on.
 
My points seem much too technical and detailed to discuss here

I don't think you can go too technical here. I do think there's something on both sides: RTX RT isn't crap and neither is RT via compute; both have their advantages and disadvantages. I'm sure that when used well, in a game optimized for it, NVIDIA's current RT solution can show some interesting results that aren't obtainable with just compute, especially on anything below a Titan V. We're also quite fast at judging how good the tech is, or how bad for that matter.
Atomic Heart, Metro Exodus... and if the list is anything to go by, we will see more examples. I also hope the community comes up with things like the Quake 2 'mod'.

Hardware functions like bump mapping, T&L and other effects from the early 2000s were a thing; the PS2 could do those too, in a slower but more flexible way. That platform was pushed more than any other, but I don't think there was ever bump mapping on it - perhaps a title or two over its 13-year lifespan. This state of fixed function can exist; Turing and AMD's next GPU probably don't lack in compute either, so nothing to worry about.
 
Yes. The RTX units are invaluable to offline raytracing where nVidia has been working on GPU acceleration, and well worth including in GPUs designed for professionals.
The inclusion of RTX encourages devs to use an accelerated ray-tracing solution using nVidia's BVH structure rather than explore alternatives like cone-tracing.
From Nvidia's description of RTX, the driver handles building and refitting of the BVH, and the RT cores autonomously handle traversal. The BVH implementation used seems to be pretty black-box.
RTX has some of Nvidia's particular spin on the concept, but the overarching idea behind the API for ray tracing is that the low-level acceleration structures are encapsulated so that other implementors can have different structures while allowing them to plug into the API.
What specific elements of the BVH are developers exposed to?

Tensor cores are maths accelerators. They don't limit any ML algorithms and were included to solve the limitations of ML, not solve a specific problem. The BVH units in RTX are designed to solve a particular problem - traversing a particular memory structure - as opposed to being versatile accelerators.
Tensor cores accelerate ML in the form of weights and connections worked through dense matrix and vector multiplication with digital ALUs and crossbars that either map very well to existing hierarchies or extend them in a reasonable way.
By the logic applied to RTX and its BVH, they discriminate against various neuromorphic and analog methods, and steer devs away from optical and quantum methods as well.

RT cores do accelerate BVH traversal, and also the intersection tests (although the latter can be replaced). The BVH is the immediate implementation's solution for a more general problem.
Without an acceleration structure, no alternative methods have been able to get themselves close to practicality. BVH is the choice Nvidia went for in terms of what it thought it could map to the existing architecture. It's not the only one, but it's the one Nvidia seems to have been able to best map to the existing SIMD hardware for construction.
Traversal of the acceleration structure is a challenge for a lot of alternative methods, however. I thought cone tracing still had need of an acceleration structure, and its intersection evaluation would be more complex than for a ray. The latter point would seem to favor a different sort of hardware optimization, since Nvidia offers to accelerate intersection testing for a simpler case.
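
For reference, the intersection test being accelerated for the simpler ray case is essentially the textbook ray/triangle test; a minimal sketch (standard Möller-Trumbore, not how the RT core actually implements it):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns true and the hit distance t if the ray (orig, dir) hits the triangle.
bool rayTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;        // ray parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;        // outside barycentric range
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * inv;                          // distance along the ray
    return t > eps;
}
```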

They are akin to the inclusion of video decode blocks. These video decode blocks were included after the need for video decoding was ascertained as pretty vital to any computer and the codec defined, after years and years of software decoding on the CPU gravitated towards an 'ideal' solution worth baking into hardware.
It wasn't always the case that these were taken for granted. AMD got nailed for lying by omission about R600's lack of a UVD block, so today's settled question had a period of pathfinding work and initial efforts.
It wasn't settled whether there would be T&L hardware on the graphics chip, texture compression, AA, or AF until someone put in the hardware to do it--and there were any number of now-gone implementations before a rough consensus was reached.
If Nvidia's specific version of RT hardware doesn't catch on, it's no different from other items that didn't pan out in the long run (Truform, TrueAudio, quadric surfaces, etc.), or from many of the features we have now where someone had to commit hardware before there would be adoption.


My personal opinion about mobile RT is the exact opposite: Compute is no alternative and only FF can do it at all, and second: I do not understand the need for RT on mobile, while on PC / console i do.
I think there's a desire to have the same games or similar games with similar features on mobile platforms as there are in the PC and console. Among other things, it helps mobile devices steal more time from the other platforms, and can help the same product expand to multiple markets more readily.

Btw, most doubts efficient RT in compute can be done at all likely come from two arguments: Building acceleration structure takes too much time (solution: refit instead full rebuild),
Depending on the level of change in a scene, a refit can take a significant fraction of the cost of a rebuild. There's no theoretical ceiling to this, and if the cost of a rebuild is no longer the dominant one, the scene's complexity can rise until the refit becomes a similar limit.
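
For illustration, a minimal refit sketch: the tree topology is kept and only the boxes are recomputed bottom-up, which is why it is cheap but can degrade in quality and, as noted above, still scales with scene size. Node layout and names are hypothetical:

```cpp
#include <vector>
#include <algorithm>

struct AABB { float min[3], max[3]; };

struct Node {
    AABB bounds;
    int  left = -1, right = -1;       // -1 child index means leaf
    int  firstPrim = 0, primCount = 0;
};

static AABB merge(const AABB& a, const AABB& b)
{
    AABB r;
    for (int i = 0; i < 3; ++i) {
        r.min[i] = std::min(a.min[i], b.min[i]);
        r.max[i] = std::max(a.max[i], b.max[i]);
    }
    return r;
}

// Recompute bounds of node `i` from its children / primitives after movement.
// A rebuild would additionally choose new splits, which is the expensive part.
AABB refit(std::vector<Node>& nodes, const std::vector<AABB>& primBounds, int i)
{
    Node& n = nodes[i];
    if (n.left < 0) {                                 // leaf: union of primitives
        n.bounds = primBounds[n.firstPrim];
        for (int p = 1; p < n.primCount; ++p)
            n.bounds = merge(n.bounds, primBounds[n.firstPrim + p]);
    } else {                                          // inner: union of children
        n.bounds = merge(refit(nodes, primBounds, n.left),
                         refit(nodes, primBounds, n.right));
    }
    return n.bounds;
}
```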

and traversal per thread is too slow (solution: don't do it this way). Both of those arguments are outdated.
Traversal of some structure to find arbitrarily related geometry in an unknown place in DRAM or cache is a fundamental challenge. "Don't do that" is both true and unhelpful.

Mobile hardware often favors fixed-function more because it is cost and power constrained to a degree PCs and consoles are not. General compute resources have a higher baseline of area and power consumption, and the mm2 and milliwatts it takes are more costly.

Recently i have read a blog from a developer, and he made this interesting speculation: NV simply can not talk about how their RT works - they would have to expect legal issues from ImgTec patents.

This tends to be true in a wide range of cases, and if IMG really wanted to poke that hornet's nest it would likely have the resources to figure out if Nvidia was using patented tech.
Cross-licensing is common, and silently developing with disregard as to whether a competitor's technique is invented in parallel is constant. There's good odds that if there wasn't pre-existing licensing, there's a case of mutually assured destruction where IMG could infringe somewhere else that Nvidia hasn't yet taken them to task over.
Apple and Intel for example did get caught out for infringing on memory disambiguation hardware whose patents were enforced by an organization related to the University of Wisconsin, who couldn't be sued back like another IHV could.

One other way non-disclosure can help is if Nvidia finds a better method, they can change things while minimizing how much bleeds out from under the abstraction.
 
What specific elements of the BVH are developers exposed to?
None. The only way to access the BVH nodes is to trace rays against them. There is no box query, for example, so you could not implement a broad-phase collision detection for physics.
Though, a box query should be easy to add and to generalize in the API even if other vendors choose different BVH data structures, so i expect this to come in the future. (Pretty sure everyone agrees on BVH over octree or kd-tree etc., and AABB over OBB.)
Disclaimer: I could have missed a box query option while reading the API.
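
For illustration, a minimal sketch of the box query that is missing: walking a compute-side BVH and reporting every primitive whose bounds overlap a query box, e.g. as a physics broad phase. The node layout is a hypothetical stand-in:

```cpp
#include <vector>

struct AABB { float min[3], max[3]; };

static bool overlaps(const AABB& a, const AABB& b)
{
    for (int i = 0; i < 3; ++i)
        if (a.max[i] < b.min[i] || b.max[i] < a.min[i]) return false;
    return true;
}

struct Node {
    AABB bounds;
    int  left = -1, right = -1;       // -1 means leaf
    int  firstPrim = 0, primCount = 0;
};

// Collect indices of all primitives whose node bounds overlap `box`.
void boxQuery(const std::vector<Node>& nodes, int i, const AABB& box,
              std::vector<int>& hits)
{
    const Node& n = nodes[i];
    if (!overlaps(n.bounds, box)) return;
    if (n.left < 0) {
        for (int p = 0; p < n.primCount; ++p) hits.push_back(n.firstPrim + p);
    } else {
        boxQuery(nodes, n.left,  box, hits);
        boxQuery(nodes, n.right, box, hits);
    }
}
```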

I thought cone tracing still had need of an acceleration structure, and its intersection evaluation would be more complex than for a ray.
Cone tracing is an open problem and would be the holy grail in the field of ray tracing. Whenever you think about it, you come up with 2 options: 1. trace many rays instead, or 2. use a parallel-friendly approximation with techniques from signal processing.
Neither is good, but option one is preferable because it does not leak. To optimize this, it would make sense to use parallel algorithms for ray bundles, which the API does not allow. (Rays are single threaded, so isolated from each other, similar to pixel shaders.)
This restriction however makes sense to keep options open for changing HW design, and it makes it easier to put a common high-level API over multiple vendors.

The central idea of practicable realtime RT as shown however is: Avoid cone tracing and replace it with temporal accumulation of single rays (denoising).
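
A minimal sketch of that idea: instead of a cone, pick one jittered ray inside the cone per frame and blend the result into a history buffer. The sampling scheme and the blend factor are arbitrary illustration choices, not what any shipping denoiser does:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Build an orthonormal basis around the cone axis (assumed normalized).
static void basis(Vec3 n, Vec3& t, Vec3& b)
{
    t = std::fabs(n.x) > 0.5f ? Vec3{0, 1, 0} : Vec3{1, 0, 0};
    float d   = t.x * n.x + t.y * n.y + t.z * n.z;           // Gram-Schmidt
    t         = {t.x - d * n.x, t.y - d * n.y, t.z - d * n.z};
    float len = std::sqrt(t.x * t.x + t.y * t.y + t.z * t.z);
    t = {t.x / len, t.y / len, t.z / len};
    b = {n.y * t.z - n.z * t.y, n.z * t.x - n.x * t.z, n.x * t.y - n.y * t.x};
}

// One sample direction inside a cone of given half-angle; u1, u2 in [0,1)
// would come from a per-frame random or low-discrepancy sequence.
Vec3 jitteredConeDir(Vec3 axis, float halfAngle, float u1, float u2)
{
    Vec3 t, b; basis(axis, t, b);
    float cosA = 1.0f - u1 * (1.0f - std::cos(halfAngle));   // uniform in solid angle
    float sinA = std::sqrt(1.0f - cosA * cosA);
    float phi  = 6.2831853f * u2;
    return { axis.x * cosA + (t.x * std::cos(phi) + b.x * std::sin(phi)) * sinA,
             axis.y * cosA + (t.y * std::cos(phi) + b.y * std::sin(phi)) * sinA,
             axis.z * cosA + (t.z * std::cos(phi) + b.z * std::sin(phi)) * sinA };
}

// Temporal accumulation: blend this frame's single-ray result into the history.
float accumulate(float history, float newSample, float blend = 0.1f)
{
    return history + (newSample - history) * blend;
}
```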

I think there's a desire to have the same games or similar games with similar features on mobile platforms as there are in the PC and console.
Sure, but there also is the major performance difference. On mobile it makes more sense to stream data that is too expensive to calculate on chip, and FF is also more justified because the alternative is much more likely to be impossible.
However - i'm not sure if ImgTec ever made a mobile RT chip. I think i got all this wrong and they only made a co-processor for desktops, likely targeting AR and content creation there?

Depending on the level of change in a scene, a refit can take a significant fraction of the cost of a rebuild. There's no theoretical ceiling to this, and if the cost of a rebuild is no longer the dominant one, the scene's complexity can rise until the refit becomes a similar limit.
True. One option is to use a sphere tree. Then for a skinned character all that is necessary is to transform the node centers, so there is no dependency between tree levels and no need to update bounds at all. (The same can work for AABBs, just extend the bounds large enough that they cover any animation at any orientation.)
Because DXR is unaware of such things, optimizations like this are ruled out. Though, they also slightly reduce tracing efficiency.
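
For illustration, a minimal sketch of that sphere-tree refit: topology and radii stay fixed, and each node center just follows its bone, so there is no bottom-up pass at all. The single-bone binding and the names are simplifying assumptions:

```cpp
#include <vector>

struct Vec3   { float x, y, z; };
struct Mat3x4 { float m[3][4]; };                 // bone transform, row major

static Vec3 transformPoint(const Mat3x4& t, Vec3 p)
{
    return { t.m[0][0]*p.x + t.m[0][1]*p.y + t.m[0][2]*p.z + t.m[0][3],
             t.m[1][0]*p.x + t.m[1][1]*p.y + t.m[1][2]*p.z + t.m[1][3],
             t.m[2][0]*p.x + t.m[2][1]*p.y + t.m[2][2]*p.z + t.m[2][3] };
}

struct SphereNode {
    Vec3  bindCenter;   // center in bind pose
    float radius;       // padded so it bounds the subtree in any pose
    int   bone;         // bone this node follows
    Vec3  center;       // updated per frame
};

// No parent/child dependency: every node is refit independently.
void refitSphereTree(std::vector<SphereNode>& nodes, const std::vector<Mat3x4>& bones)
{
    for (SphereNode& n : nodes)
        n.center = transformPoint(bones[n.bone], n.bindCenter);
}
```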

Interesting: ImgTec seems to rebuild the tree per frame from vertex shader per object (per frame only for dynamic objects). But DXR allows refitting for lower level trees and rebuilds only to connect them, IIRC.

"Don't do that" is both true and unhelpful.
Good catch. It depends on granularity and on the algorithm suitable to the current task. Inefficient outliers exist in both approaches.


Likely i sound like criticizing again, but those are all minor things which will be outweighed by the FF speedup - i could agree to that eventually.
The main failure of DXR / RTX is the inability to allow any useful LOD mechanism. LOD is very hard, there is plenty of research going on.
It goes from 'how can i approximate branches of a tree with a 3D volume texture?', down to 'how can i grow procedural grass or other fine grained stuff?', or 'how can i merge complex materials from all this - how can a single pixel still represent the reflection properties of a whole tree, wood or mountain?'.
Thinking LOD is just about reducing poly count is like looking at only the tip of the iceberg. And DXR does not even allow reducing poly count.
This is the major restriction i see that slows down progress.
Of course this is hard to agree upon if you take the current state of games as reference, because they do not yet handle proper LOD at all. This is a result of GPU raster power - they did not need to care. But we must address this if we want realistic images. It's not only about lighting.
This is why i see the limitation to static triangle raytracing as a short sighted decision not in the name of progress. Dynamic geometry and BVH building must become programmable at day one.
 
Shouldn't LOD be implementable by building multiple BVH structures? Then collect rays and trace them against the BVH containing the desired LOD level.

LOD might be painful as cracks could happen, leading to artifacts, or shadow edges might not match silhouettes... There is a tradeoff to be made here between correct & consistent versus performance.

My view is that ray tracing against a BVH looks a lot like a texture operation... fairly hardcoded, but using the result of the hardwired operation is flexible.
 
Shouldn't LOD be implementable by building multiple BVH structures?
Yes, that's what you'll do if you have fixed LOD meshes. With RT the resulting popping becomes more visible because it appears also in reflections and GI, not only on the object itself. (Denoising could help a bit if its lag is large.)
But a continuous LOD mechanism can only be handled by complete BVH rebuild each frame. Continuous LOD is what we want, but RTX now makes it harder to get there.
Continuous LOD is much more attractive than you might think, because it not only solves the popping issues. If done right it also opens the door to combining / blending traditional triangle rendering with volumetric shells, voxels, SDF, point clouds etc. to achieve a true LOD solution.

LOD might be painful as cracks could happen, leading to artifacts, or shadow edges might not match silhouettes... There is a tradeoff to be made here between correct & consistent versus performance.
Of course a true LOD solution has no cracks. Seamless global parametrization resolves this issue and enables displacement mapping on any surface, not just terrain. (Replace the term 'displacement mapping' with any 'unlimited detail' tech you prefer here.)
To be clear, seamless global parametrization makes these promises:
Megatextures as seen in Rage
Object space lighting and irradiance caching
Displacement mapping everywhere
Volumetric shells for diffuse stuff impossible to handle by triangles (e.g. detailed foliage)
Continuous LOD
... all of this adaptable to a given performance budget. So it's not just about better performance by decreasing detail, it's also about increasing details.
It's very hard and will take many years to finally make all of this work, but keep in mind you cannot achieve 'consistency' with popping LOD levels.

My view is that ray tracing against a BVH looks a lot like a texture operation... fairly hardcoded, but using the result of the hardwired operation is flexible.
This seems the philosophy of DXR API as well: Treat a traversal as an atomic operation.
But a traversal is O(N log N), so seeing it this way is like ignoring this cost. Texture lookup is O(1) so you can not compare this.
Your view would be good if you worked on an offline renderer maybe, where you care about accuracy, features, and code maintenance more than about performance.
But for games this seems very wrong to me, and it does not fit into a low level API.

However, we cannot argue about this because we don't know how the hardware works. If there is no batching under the hood, the operation is atomic and you are right.
 
One interesting thing to do would be to write a benchmark and see how triangle counts and various organisations of triangles affect performance. In essence, figure out how much perf is left on the table if aggressive LODs cannot be used.

I wonder if the Quake 2 geometry could be tessellated and used to measure the effect geometry complexity has on Turing ray tracing performance.
 
But a traversal is O(N log N), so seeing it this way is like ignoring this cost. Texture lookup is O(1) so you can not compare this.
O(n) for worst case degenerated BVH. Just log n on average for non-degenerated.

But a continuous LOD mechanism can only be handled by complete BVH rebuild each frame.
Which isn't as bad as it sounds either. Mostly because it never is a full rebuild:
[Image: AccelerationStructure.svg - top-level / bottom-level acceleration structure diagram]

Keep the bottom levels for each LOD level, and only link them into the top level as required. The top level may need to be rebuilt, but if you are only replacing a single bottom-level instance with an identical scene-space bounding box, it may as well just recycle the instance slot, making a LOD swap potentially as cheap as just finding the correct slot in the top-level structure.
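
For illustration, roughly what that swap looks like, with hypothetical structs standing in for the API's instance descriptors (not the actual DXR types):

```cpp
#include <vector>
#include <cstdint>

struct Instance {
    float    transform[3][4];   // scene-space placement, stays identical
    uint64_t blasAddress;       // which bottom-level structure this slot uses
};

struct Object {
    std::vector<uint64_t> blasPerLod;   // one prebuilt BLAS per LOD level
    int                   topLevelSlot; // its slot in the top-level instance list
};

// The "swap": only the BLAS pointer changes; afterwards the top level is
// refit/rebuilt by the API as usual, which is cheap compared to bottom levels.
void applyLod(std::vector<Instance>& topLevel, const Object& obj, int lod)
{
    topLevel[obj.topLevelSlot].blasAddress = obj.blasPerLod[lod];
}
```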

The still nasty part is if you have to rebuild the bottom-level parts, because that more or less involves rasterizing into a 3D texture and then packing the sparse texture into a tree (or is the implementation straight out updating the tree from the triangle list? Who knows...). Not so problematic either though: if you go two-step you only have the one-time memory overhead for the 3D texture, (almost) fixed-function rasterization, and if the 3D texture was suitably swizzled, transformation into the packed tree is just a matter of compressing the bitstream representation of the 3D texture.

However, all of this is still insufficient for LOD. Because LOD when raytracing requires you to choose LOD based on effective cone width. (And keep in mind you need to carry that along anyway, or texture LOD doesn't work either, as seen with reflections in the Quake 3 demo.) You actually may require the highest LOD level for a primary hit or a clear reflection, and simultaneously the lowest LOD for a diffuse reflection, for an object right in front of the camera.
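
For illustration, a minimal sketch of choosing a discrete LOD from the effective cone width; the spread model and the mapping to levels are arbitrary illustration values:

```cpp
#include <algorithm>
#include <cmath>

struct RayCone {
    float width;        // footprint at the current hit, in world units
    float spreadAngle;  // how fast the footprint grows per unit distance
};

// Grow the cone along the ray, e.g. camera to primary hit, or a diffuse
// bounce (large spread) versus a mirror bounce (small spread).
RayCone propagate(RayCone cone, float hitDistance)
{
    cone.width += cone.spreadAngle * hitDistance;
    return cone;
}

// Map footprint to a discrete LOD: one level per doubling of the footprint
// relative to the finest level's feature size.
int lodFromCone(const RayCone& cone, float finestFeatureSize, int lodCount)
{
    float ratio = std::max(cone.width / finestFeatureSize, 1.0f);
    int lod = (int)std::floor(std::log2(ratio));
    return std::min(lod, lodCount - 1);
}
```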

So the current structure of the top-level acceleration structure is insufficient, as it does not support any LOD for the links to the bottom levels yet. Stupid slip in the API design.
 
O(n) for worst case degenerated BVH. Just log n on average for non-degenerated.
My bad, my initial N was a mistake. (confused by N rays times log n cost)

Because LOD when raytracing requires you to choose LOD based on effective cone width.
I would not go so far with my requests and would set the LOD simply relative to the camera, so just once per frame and not per ray. Distant objects need only little detail for reflections as well.

So the current structure of the top-level acceleration structure is insufficient, as it does not support any LOD for the links to the bottom levels yet. Stupid slip in the API design.
That's quite an interesting catch.
But a hard switch on cone width would also restrict you to either a unique traversal per ray or too little available work for iterating over rays per batch of geometry. Likely this can only be done reasonably fast with alternative geometry?
Also, LOD is not enough to support cone tracing. It can just help to approximate it better, as you still have sharp edges, and results at such 'depth discontinuities' would not be smooth. So you still need to jitter and denoise - no big win for the huge effort.
So no, only makes sense if you had total control over the BVH. Vendors would need to agree on BVH and expose it all. RT cores could become incompatible... will not happen anytime soon.

More likely they will allow for dynamic LOD from the camera in the form of support for geometry / tessellation shaders, at the cost of a complete rebuild.
ImgTec has FF hardware for tree building, so that's one option to make it faster maybe, but your conservative raster idea is hard to beat with more FF i guess.

Thinking about the big advantage of having geometry, LOD and material in one data structure (e.g. a point hierarchy or impractical voxels), it would be interesting to bring this to triangles.
This could work to some degree: If you have seamless parametrization, you also can make a quadrangulation. And those quads form the same nice hierarchy as texels and mip maps.
But it would be no solution for everything, and it would break above a certain level. Above this level connectivity breaks, no more continuous LOD.
Above this one could fall back to progressive meshes and geomorphing so at least the geometry is continuous but not textures.
Promising, but too complicated for FF. Sadly this is true for almost anything nowadays.

... I need to add, although RTX adds quite a restriction when thinking about LOD, this does not mean it is not possible.
The solution is to change LOD only on a subset of geometry per frame and rebuild the BVH just for that.
You want to do this in any case most likely, so the restriction is not that bad. (Problem is large continuous geometry like terrain and how to avoid visible discontinuities when splitting it into pieces.)
 
Unreal Engine 4.22 preview released.
https://forums.unrealengine.com/unr...d-releases/1583659-unreal-engine-4-22-preview

    • Real-Time Ray Tracing and Path Tracing (Early Access)
      • Added ray tracing low level support.
        • Implemented a low level layer on top of UE DirectX 12 that provides support for DXR and allows creating and using ray tracing shaders (ray generation shaders, hit shaders, etc) to add ray tracing effects.
      • Added high-level ray tracing features
        • Rect area lights
        • Soft shadows
        • Reflections
        • Reflected shadows
        • Ambient occlusion
        • RTGI (real time global illumination)
        • Translucency
        • Clearcoat
        • IBL
        • Sky
        • Geometry types
          • Triangle meshes
            • Static
            • Skeletal (Morph targets & Skin cache)
            • Niagara particles support
        • Texture LOD
        • Denoiser
          • Shadows, Reflections, AO
        • Path Tracer
        • Unbiased, full GI path tracer for making ground truth reference renders inside UE4.
Disappointed to see that there are no refractions yet.
 
Aren't those included under reflections? It's the same thing, only sending the ray through the other side of the surface. If not, it shouldn't be at all hard to rework the reflection shader to make a refraction shader.
 
Technically, though, this kind of parametric continuous LOD JoeJ describes (and which I used to think was the unavoidable direction we'd eventually go, but am still waiting) is only actually proper for primary visibility; for other ray tracing effects, rays often diverge in ways that require more or less detail in different areas than what can be seen from the camera perspective.
Imagine for example a curved glass or mirror that forms a kind of magnifying glass effect, ideally you'd want rays hitting that lens to reach a higher detail LOD because the objects reflected/refracted would appear bigger than they are. Of course, this is a hypothetical we are far from considering a real problem anytime soon considering the much cruder problems we face with modern rendering.
A more realistic example may be the long stretching shadow of a distant object that hits a surface close to the camera. That might give away your LOD system and look distracting.
 
Aren't those included under reflections? It's the same thing, only sending the ray through the other side of the surface. If not, it shouldn't be at all hard to rework the reflection shader to make a refraction shader.
Yes, for simple surfaces like a plane of glass it should be possible to use the reflection path to get it done. (Basically do the internal trace in the shader and spawn a ray outward according to the edge you hit.)

For more complex surfaces there is additional complexity and behaviour due to the index of refraction and possibly a few additional traces within the object before the ray gets out. (And possibly reflection rays spawned within.)
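
For illustration, the core of what such a refraction shader has to compute at each surface crossing is just Snell's law with a total-internal-reflection fallback; a minimal sketch (vector helpers and names are illustrative):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static Vec3  add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static float dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 reflectDir(Vec3 i, Vec3 n)   // i points toward the surface
{
    return add(i, scale(n, -2.0f * dot(n, i)));
}

// eta = ratio of indices of refraction (outside / inside when entering).
// Returns false on total internal reflection; then `out` is the reflection.
bool refractDir(Vec3 i, Vec3 n, float eta, Vec3& out)
{
    float cosI  = -dot(n, i);
    float sinT2 = eta * eta * (1.0f - cosI * cosI);
    if (sinT2 > 1.0f) {                  // total internal reflection
        out = reflectDir(i, n);
        return false;
    }
    float cosT = std::sqrt(1.0f - sinT2);
    out = add(scale(i, eta), scale(n, eta * cosI - cosT));
    return true;
}
```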
 
For more complex surfaces there is additional complexity and behaviour due to the index of refraction and possibly a few additional traces within the object before the ray gets out. (And possibly reflection rays spawned within.)
We're a long way from modelling that in games! Internal reflection is affected by the volume of the material and the IOR and can't be solved by a simple ray direction. You could fake it by using a texture map of refraction direction, which is probably the only sane way, and that should be nicely mappable to the existing raytrace shader code, just using a normal map.
 