Next gen lighting technologies - voxelised, traced, and everything else *spawn*

One thing I find curious about RTX examples is the temporal element. Here, it's quite apparent in the lag in the shadows. The volumetric solutions suffer from lighting lag as a result of the algorithm, but I don't understand it in the RT examples. Is it a result of the denoising accumulating temporal samples?
Yes. Denoising can only give a good result after some time, once enough matching samples are available. If you do it in screen space, artifacts similar to those of SS fakery are to be expected with fast motion.
In this example it is not acceptable; the same goes for other UE4 videos.
We see in the video that the performance never changes: 8.3 ms no matter whether it's SSAO, RTAO, or RTAO + denoising.
Either this is vsync, or they use a very fast denoiser that just is not good enough.
(still wondering about denoising costs...)

Edit: As they are the gods of TAA, maybe they are trying to do it with a simple extension of that for now.
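
For anyone wondering what that lag looks like in code terms, here's a minimal CPU sketch (my own toy example, not Epic's or NVIDIA's code) of the exponential-moving-average accumulation at the heart of most temporal denoisers and TAA: a sudden change in the signal takes roughly 1/alpha frames to mostly show up in the accumulated result, which is exactly the shadow lag discussed above.

```cpp
#include <cstdio>

int main() {
    float history = 0.0f;      // accumulated (denoised) value
    const float alpha = 0.1f;  // per-frame blend weight for the new sample

    for (int frame = 0; frame < 30; ++frame) {
        float noisySample = 1.0f;  // signal jumped from 0 to 1 at frame 0
        // Standard exponential moving average toward the new sample.
        history = history + alpha * (noisySample - history);
        printf("frame %2d: accumulated = %.3f\n", frame, history);
    }
    // After 10 frames history is only ~0.65; after 30 frames ~0.96.
    // That slow convergence is the visible lag on moving shadows.
}
```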
 
I love the look of the lighting in all such demos. The clean white focuses attention entirely on the beauty of the light and shadow in terms of form. A few games have gone with this style, and they'd be well served by a next-gen lighting solution.

One thing I find curious about RTX examples is the temporal element. Here, it's quite apparent in the lag in the shadows. The volumetric solutions suffer from lighting lag as a result of the algorithm, but I don't understand it in the RT examples. Is it a result of the denoising accumulating temporal samples?
The denoising right now is pretty terrible. Considering other DXR implementations don't look so bad I guess it's just a matter of time before they fix it. Preview 1 didn't even have GI support at all.

In terms of lag, NVIDIA's denoiser has both a temporal and a spatial component. You can turn off the first but the results are not as good. I guess a bit of lag will be expected for the foreseeable future.
 
The denoising right now is pretty terrible. Considering other DXR implementations don't look so bad I guess it's just a matter of time before they fix it. Preview 1 didn't even have GI support at all.

In terms of lag, NVIDIA's denoiser has both a temporal and a spatial component. You can turn off the first but the results are not as good. I guess a bit of lag will be expected for the foreseeable future.
Is the UE4 Nvidia denoiser using the Tensor cores yet, or via compute?
 
To start off on the right foot again!

So, looking at other ways of representation for GI, reflections, AO, etc. - voxels, SDFs, or other more exotic things - I guess a good alternative to triangle-based RT like in DXR would be one that has that same effect of fine granularity. When you think about it, the voxels in VXGI or CryEngine SVOGI have a fixed size in the world they occupy and represent, with cascades in the distance; regardless of how far you zoom into an object or how close you get, the minimum voxel size of the closest cascade remains the same. On the other hand, the RT we have seen in BFV and Metro is not limited in that fashion; it is limited only by the fidelity of the object being tested against as built up in the BVH (which could be a lower-LOD version of the primarily visible object, with a different mip of texture or something).

That means in BFV or Metro Exodus the fidelity does not necessarily break down into its constituent representation when you get closer (you do not see the voxels comprising it, for example), and it can represent macro and micro effects at the same time. I was thinking about that as I threw some throwing knives against a window sill in Metro and got this effect:

Ultra DXR:
ultra1arktm.png


High DXR:
high1vlk46.png


Tiny little dynamic objects thrown by the player affecting the GI, here casting dynamic indirect shadows on the window sill, which itself is only lit by indirect lighting from the outside.

I made these screenshots at native 4K at high and ultra settings for RT - without tessellation and utilising the game's high preset - which runs surprisingly OK (41 and 52 fps respectively at native 4K).
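
To make the voxel-cascade point above concrete, here's a tiny sketch (assumed numbers, not any engine's actual configuration) of how a clipmap-style cascade picks its voxel size: the finest level is a fixed world-space constant, so getting closer to a surface never reveals more detail, unlike a ray hit against the actual triangles in the BVH.

```cpp
#include <cmath>
#include <cstdio>

// World-space edge length of a voxel in the given cascade:
// finestSize * 2^cascade (e.g. 0.25m, 0.5m, 1m, ...).
float voxelSize(int cascade, float finestSize = 0.25f) {
    return std::ldexp(finestSize, cascade);
}

// Which cascade covers a point at the given camera distance, assuming
// each cascade spans twice the radius of the previous one.
int cascadeForDistance(float dist, float firstRadius = 8.0f) {
    int c = 0;
    float r = firstRadius;
    while (dist > r) { r *= 2.0f; ++c; }
    return c;
}

int main() {
    for (float d : {1.0f, 10.0f, 100.0f}) {
        int c = cascadeForDistance(d);
        printf("distance %6.1fm -> cascade %d, voxel size %.2fm\n",
               d, c, voxelSize(c));
    }
    // As d -> 0 the representation never gets finer than cascade 0's
    // 0.25m voxels; a BVH ray hit keeps the full triangle fidelity.
}
```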
 
I guess a good alternative to triangle-based RT like in DXR would be one that has that same effect of fine granularity.
So you want to increase voxel detail as the camera moves closer?
Interesting - that's one of those things I cannot do because of API limitations. Because I need to write commands from the CPU for each LOD level, the number of levels has to be finite. If I could dispatch tasks from the GPU instead, I would implement your suggestion... if there were no RTX.
My thought here is the opposite of yours: RTX has the problem that it does not scale well with distance if the scene is detailed, so you might want to use alternatives utilizing LOD to get a more constant cost independent of scene size.
But for close-ups (and more importantly, high-frequency details) I see no good alternatives.
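
A sketch of the CPU-side limitation being described, with purely hypothetical structures (no real API calls): every LOD level needs its own pre-recorded dispatch, so the level count must be fixed at command-recording time; GPU-driven command generation would remove that bound.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-ins for command-buffer recording - just enough to
// show the constraint, not a real graphics API.
constexpr int kMaxLodLevels = 12;  // bound baked in at record time

struct Dispatch { int lodLevel; };
using CommandBuffer = std::vector<Dispatch>;

void recordLodWork(CommandBuffer& cb) {
    for (int lod = 0; lod < kMaxLodLevels; ++lod)
        cb.push_back({lod});  // one pre-recorded dispatch per level; the GPU
                              // may leave some empty but cannot add levels
}

int main() {
    CommandBuffer cb;
    recordLodWork(cb);
    printf("recorded %zu dispatches, one per possible LOD level\n", cb.size());
}
```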

To start off on the right foot again!
I think my foot is much more wrong than yours.

I just decided to drop my plans to work on compute RT. Also, all the problems I see with RTX can easily be fixed with alternatives.
So what's my problem at all? I don't know myself anymore. Somehow all this turned into some kind of pointless mania, and I feel sorry, bad, and guilty for it.

My apologies to all the people here, and even more apologies to the folks at NV - it just works. :O

Better to focus on using all the tools properly instead of stupid ranting, sorry!
 
So you want to increase voxel detail as the camera moves closer?
Interesting - that's one of those things I cannot do because of API limitations. Because I need to write commands from the CPU for each LOD level, the number of levels has to be finite. If I could dispatch tasks from the GPU instead, I would implement your suggestion... if there were no RTX.
My thought here is the opposite of yours: RTX has the problem that it does not scale well with distance if the scene is detailed, so you might want to use alternatives utilizing LOD to get a more constant cost independent of scene size.
But for close-ups (and more importantly, high-frequency details) I see no good alternatives.


I think my foot is much more wrong than yours.

I just decided to drop my plans to work on compute RT. Also, all the problems I see with RTX can easily be fixed with alternatives.
So what's my problem at all? I don't know myself anymore. Somehow all this turned into some kind of pointless mania, and I feel sorry, bad, and guilty for it.

My apologies to all the people here, and even more apologies to the folks at NV - it just works. :O

Better to focus on using all the tools properly instead of stupid ranting, sorry!

Would Turing mesh shaders help here? AMD has their version also, but as the API is not public, it's difficult to know for sure what the capabilities are.
 
Is the UE4 Nvidia denoiser using the Tensor cores yet, or via compute?
Based on the presentations they did last year, it's compute based.

To start off on the right foot again!

So, looking at other ways of representation for GI, reflections, AO, etc. - voxels, SDFs, or other more exotic things - I guess a good alternative to triangle-based RT like in DXR would be one that has that same effect of fine granularity. When you think about it, the voxels in VXGI or CryEngine SVOGI have a fixed size in the world they occupy and represent, with cascades in the distance; regardless of how far you zoom into an object or how close you get, the minimum voxel size of the closest cascade remains the same. On the other hand, the RT we have seen in BFV and Metro is not limited in that fashion; it is limited only by the fidelity of the object being tested against as built up in the BVH (which could be a lower-LOD version of the primarily visible object, with a different mip of texture or something).

That means in BFV or Metro Exodus the fidelity does not necessarily break down into its constituent representation when you get closer (you do not see the voxels comprising it, for example), and it can represent macro and micro effects at the same time. I was thinking about that as I threw some throwing knives against a window sill in Metro and got this effect:

Ultra DXR:
ultra1arktm.png


High DXR:
high1vlk46.png


Tiny little dynamic objects thrown by the player affecting the GI, here casting dynamic indirect shadows on the window sill, which itself is only lit by indirect lighting from the outside.

I made these screenshots at native 4K at high and ultra settings for RT - without tessellation and utilising the game's high preset - which runs surprisingly OK (41 and 52 fps respectively at native 4K).
Yep, this is a massive advantage ray tracing has over the alternatives. Especially for shadows: no matter how much you zoom in, they remain razor sharp.

So you want to increase voxel detail as the camera moves closer?
Interesting - that's one of those things I cannot do because of API limitations. Because I need to write commands from the CPU for each LOD level, the number of levels has to be finite. If I could dispatch tasks from the GPU instead, I would implement your suggestion... if there were no RTX.
My thought here is the opposite of yours: RTX has the problem that it does not scale well with distance if the scene is detailed, so you might want to use alternatives utilizing LOD to get a more constant cost independent of scene size.
But for close-ups (and more importantly, high-frequency details) I see no good alternatives.


I think my foot is much more wrong than yours.

I just decided to drop my plans to work on compute RT. Also, all the problems I see with RTX can easily be fixed with alternatives.
So what's my problem at all? I don't know myself anymore. Somehow all this turned into some kind of pointless mania, and I feel sorry, bad, and guilty for it.

My apologies to all the people here, and even more apologies to the folks at NV - it just works. :O

Better to focus on using all the tools properly instead of stupid ranting, sorry!
It is true that LOD is a problem, but isn't it the same for rasterization? Either you render the objects twice during the transition, use impostors, or just suffer pop-in.
 
Yep, this is a massive advantage ray tracing has over the alternatives. Especially for shadows: no matter how much you zoom in, they remain razor sharp.

An irregular Z-buffer will almost certainly do simple pixel-perfect hard shadows more cheaply.
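
For those unfamiliar with the idea, here's a heavily simplified CPU sketch of the irregular Z-buffer concept (a toy construction of mine, not a production implementation): the exact light-space positions of the visible receiver points are bucketed into a grid, and occluder geometry is tested against those exact points, so the hard shadow edge is pixel perfect with no shadow-map resolution aliasing.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

struct P3 { float x, y, z; };   // light-space position (z = depth from light)
constexpr int N = 64;           // light-space grid resolution

int main() {
    // 1. Receivers: visible screen points projected into light space and
    //    bucketed by grid cell - the "irregular" part: arbitrary positions
    //    instead of a regular shadow-map texel grid.
    std::vector<P3> receivers = {{0.31f, 0.42f, 5.0f}, {0.33f, 0.40f, 2.0f}};
    std::vector<std::vector<int>> cells(N * N);
    for (int i = 0; i < (int)receivers.size(); ++i) {
        int cx = (int)(receivers[i].x * N), cy = (int)(receivers[i].y * N);
        cells[cy * N + cx].push_back(i);
    }

    // 2. "Rasterize" one flat occluder triangle in light space: visit the
    //    grid cells under its bounding box and depth-test the exact
    //    receiver points stored there.
    P3 tri[3] = {{0.2f, 0.3f, 3.0f}, {0.5f, 0.3f, 3.0f}, {0.3f, 0.6f, 3.0f}};
    std::vector<bool> shadowed(receivers.size(), false);
    auto edge = [](const P3& a, const P3& b, const P3& q) {
        return (b.x - a.x) * (q.y - a.y) - (b.y - a.y) * (q.x - a.x);
    };
    int cx0 = (int)(std::min({tri[0].x, tri[1].x, tri[2].x}) * N);
    int cx1 = (int)(std::max({tri[0].x, tri[1].x, tri[2].x}) * N);
    int cy0 = (int)(std::min({tri[0].y, tri[1].y, tri[2].y}) * N);
    int cy1 = (int)(std::max({tri[0].y, tri[1].y, tri[2].y}) * N);
    for (int cy = cy0; cy <= cy1; ++cy)
        for (int cx = cx0; cx <= cx1; ++cx)
            for (int i : cells[cy * N + cx]) {
                const P3& p = receivers[i];
                // Barycentric coverage test at the exact receiver position.
                float w0 = edge(tri[1], tri[2], p);
                float w1 = edge(tri[2], tri[0], p);
                float w2 = edge(tri[0], tri[1], p);
                bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                              (w0 <= 0 && w1 <= 0 && w2 <= 0);
                if (inside && tri[0].z < p.z)  // occluder in front of receiver
                    shadowed[i] = true;
            }

    for (size_t i = 0; i < receivers.size(); ++i)
        printf("receiver %zu: %s\n", i, shadowed[i] ? "shadowed" : "lit");
}
```

The catch, as the reply below notes, is that each receiver only gets a binary lit/shadowed answer: hard shadows only.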
 
Would Turing mesh shaders help here?
Good question. I assume they are not compatible with RTX - likewise geometry and displacement shaders - but I need to check the API docs.
ImgTec generates the BVH from vertex shaders - if DXR does the same, this would work, but the cost of BVH building would likely be higher than the savings from LOD, I'm pretty sure.

It is true that LOD is a problem, but isn't it the same for rasterization?
Yes, but with rasterization you iterate over each object only once, and dynamic LOD is possible.

But my critique mainly comes from the assumption that compute could compete in performance with acceptable quality, and I no longer think so. (I have not tried yet, but I underestimated some costs of my ideas.)
If this argument breaks, all the others become very minor.
For example, to support dynamic LOD the hardware could increase a lot in complexity, and we don't know yet what this LOD mechanism should look like.
If doing it all (including the BVH) in compute is an option, it can be extended later. It is not a must for day one. Getting going and initial performance are more important.

... just as many of you said.

And for me personally, LOD is no problem at all, because I can use my GI data as a fallback in cases where RTX starts to struggle. Their strengths are opposites.
So maybe I was just using this to argue. That's no better than angry gamers ranting against the Epic Store.
Throw some old eggs at me... :)
 
An irregular Z-buffer will almost certainly do simple pixel-perfect hard shadows more cheaply.
True, but as you say, it's limited to simple shadows. Contact-hardening shadows require both sharpness and softness at the same time. The most obvious use case I can think of is character close-ups. Per-object shadow maps help, but ray tracing is still a much better option.
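
The sharp-at-contact, soft-at-distance behaviour falls out of simple similar triangles; here's a minimal sketch of the standard PCSS-style penumbra estimate (textbook formula, not taken from any of the games discussed):

```cpp
#include <cstdio>

// All distances measured from the light, along the light direction.
// Penumbra width grows with the receiver-to-blocker distance ratio.
float penumbraWidth(float lightSize, float blockerDist, float receiverDist) {
    return lightSize * (receiverDist - blockerDist) / blockerDist;
}

int main() {
    const float lightSize = 0.5f;  // area-light diameter (meters)
    const float blocker = 2.0f;    // occluder at 2m from the light
    for (float receiver : {2.05f, 3.0f, 6.0f})
        printf("receiver at %.2fm -> penumbra %.3fm\n",
               receiver, penumbraWidth(lightSize, blocker, receiver));
    // Near contact (2.05m) the penumbra is ~0.013m (razor sharp);
    // at 6m it is 1m wide (soft) - both from the same light and blocker.
}
```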
 
But my critique mainly comes from the assumption that compute could compete in performance with acceptable quality, and I no longer think so. (I have not tried yet, but I underestimated some costs of my ideas.)
What's changed? What are the bottlenecks in compute RT and BVH/alternative traversal?
 
It really is impossible to say how performance and quality compare between schemes without some kind of reference implementations. Trying to judge it just from a game, with who knows how many hacks backing up the ray tracing, is folly. Ray tracing has the advantage of hardware support; it has the disadvantage of massively more aliasing than cone tracing, and it needs all the advantages it can get. I wish all researchers used a single engine to implement their schemes so that apples-to-apples comparisons were possible. The field also needs a standard scene with a mix of dense geometry up close and long views.

I doubt faking it till you make it will go away. In the end I mostly just want dark corners to get a little more light, without everything looking like the brightness was turned up and without noticeable artifacts. I'd rather sacrifice realism than have to wait a couple of frames for spatio-temporal denoising to kick in.
 
Dynamic geometry is also important to include in evaluations. Take for example the RT tree scene linked earlier - the trees are all static. How does ray tracing cope when they're all moving in the wind? Some of the prettiest, most convincing voxelised solutions have, like, half a second of lag on the lighting updates, which is just unusable. The temporal stuff is really disconcerting. It gets as smeary as gaming on the earliest LCDs! More samples would solve that, but we are far from having that amount of power.
 
What's changed? What are the bottlenecks in compute RT and BVH/alternative traversal?
My idea was to work with an approach centered on the geometry, not the rays: caching a branch of the BVH into LDS and iterating over all potentially intersecting rays. This does more work, but allows traversal without incoherent memory access.
But how to get the potential rays? I thought binning them to a coarse grid would be fast and good enough. That requires a prefix sum, which is usually a no-brainer, so I did not think about it further. Problem: 3D grids are always huge, so the prefix sum has quite a cost, and avoiding it requires too much memory :(
I'd need to do all this several times - per LOD level, eventually also per ray segment, front to back.
I assume it would end up much slower than a naive per-ray traversal of the full BVH.

I would still benefit from my LOD hierarchy with naive traversal, because I can stop the traversal much earlier, but this advantage alone is likely not enough for games. Even if it were, it's not worth it.

Now I've started thinking about something else, but if naive traversal really is the best option, I'm quite grateful NV hides this mess behind FF.
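
To illustrate where the cost sits, here's a toy CPU version of the binning step described above (my own sketch, not the actual implementation): histogram of rays per cell, exclusive prefix sum, then scatter. The scan runs over every cell of the 3D grid - a modest 128^3 grid is already ~2.1 million cells - regardless of how few rays there are.

```cpp
#include <cstdio>
#include <vector>

struct Ray { float ox, oy, oz; };  // only the origin matters for binning here

int main() {
    const int G = 128, cellCount = G * G * G;  // ~2.1M cells to scan
    std::vector<Ray> rays = {{0.10f, 0.20f, 0.30f}, {0.90f, 0.90f, 0.90f},
                             {0.11f, 0.21f, 0.29f}};
    auto cellOf = [&](const Ray& r) {
        int x = (int)(r.ox * G), y = (int)(r.oy * G), z = (int)(r.oz * G);
        return (z * G + y) * G + x;            // origins assumed in [0,1)
    };

    // 1. Histogram: number of rays per cell.
    std::vector<int> count(cellCount, 0);
    for (const Ray& r : rays) count[cellOf(r)]++;

    // 2. Exclusive prefix sum over ALL cells (the expensive part: the scan
    //    length is the grid size, not the ray count).
    std::vector<int> start(cellCount + 1, 0);
    for (int c = 0; c < cellCount; ++c) start[c + 1] = start[c] + count[c];

    // 3. Scatter: ray indices grouped by cell, so a traversal kernel with a
    //    BVH branch cached in LDS can fetch "its" rays contiguously.
    std::vector<int> binned(rays.size());
    std::vector<int> cursor(start.begin(), start.end() - 1);
    for (int i = 0; i < (int)rays.size(); ++i)
        binned[cursor[cellOf(rays[i])]++] = i;

    printf("scanned %d cells to bin %zu rays\n", cellCount, rays.size());
}
```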
 
Naive traversal is definitely the best solution if power is no object. For realtime though, it really is slow. We can only get decent-ish results from FF hardware by using very few rays and denoising. There has to be a better realtime solution that somehow compromises quality for speed. I suppose the balance will be:

speed : RAM : quality

As the ray sampling is so scattered, I guess we want more information per ray (a larger area of sampling than an individual point), so a ray gives a decent approximation of the amount of light between that point and its neighbours.
 
As the ray sampling is so scattered, I guess we want more information per ray (a larger area of sampling than an individual point), so a ray gives a decent approximation of the amount of light between that point and its neighbours.
Which brings us to cone tracing: a small framebuffer? Poly clipping? A signal-processing approach that accepts leakage? Or simply many rays?
Not good. This is something denoising solves much better. The artifacts you mention could be fixed in object space.
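
For context, the classic voxel cone-tracing loop being referred to looks roughly like this (generic textbook form with toy stand-in data, not any particular engine's code): step along the cone, sample a prefiltered voxel volume with a footprint matching the cone radius at that distance, and blend front to back. One cone gathers light over an area - the "more information per ray" trade - at the price of the leakage mentioned above.

```cpp
#include <cstdio>

// Stand-in for sampling a prefiltered voxel volume: returns (radiance,
// occlusion) at 'dist' for footprint 'diameter'. Purely hypothetical data.
void sampleVoxels(float dist, float diameter, float& radiance, float& alpha) {
    radiance = (dist > 2.0f) ? 0.5f : 0.0f;  // toy: emissive stuff past 2m
    alpha    = (dist > 1.0f) ? 0.2f : 0.0f;  // toy: partial occluders past 1m
    (void)diameter;  // a real volume would pick mip = log2(diameter/voxelSize)
}

int main() {
    const float tanHalfAngle = 0.3f;           // cone aperture
    float t = 0.1f, color = 0.0f, occ = 0.0f;  // front-to-back accumulators
    while (t < 10.0f && occ < 0.99f) {
        float diameter = 2.0f * t * tanHalfAngle;  // footprint grows with t
        float r, a;
        sampleVoxels(t, diameter, r, a);
        color += (1.0f - occ) * a * r;  // standard front-to-back blending
        occ   += (1.0f - occ) * a;
        t     += diameter * 0.5f;       // step proportional to the footprint
    }
    printf("cone result: color %.3f, occlusion %.3f\n", color, occ);
}
```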

Naive traversal is definitely the best solution if power is no object. For realtime though, it really is slow.
I still think the best solution is to process ray packets with BVH clusters. This completely solves the caching problem.
There are many ways to do this. The other idea I have is to traverse a BVH cluster and then store node pointers with the ray (likely two: one for the cluster parent and a second for the actual node).
After all clusters have been processed, sort the rays by their cluster pointers so the next clusters to be processed find their rays quickly.
This appears more elegant and work-efficient than the regular grid, but because stackless traversal only works depth-first, it requires processing the same branches multiple times, so I'm not sure.
However, the bottleneck is again the sorting/binning of millions of rays to the next clusters. ('Bottleneck' because this step kills the advantage over naive traversal, although otherwise it's all cache efficient.)
Assuming we have a tree of 20 levels and each cluster spans 3 levels of the BVH, we would need to do the binning 6 times per frame.

So, coming back to your question 'what would we need to add to compute to allow flexible RT?' - it seems, in any case, hardware-accelerated binning.

The traversal itself would be very efficient - slower than the RT cores, but flexible and fast enough, I guess. (It's LDS, so likely faster than SSR.)
Though, all the approaches I propose are unordered any-hit algorithms, clipping the ray with atomic ops but doing more work than naive traversal. This could be mitigated by processing segments of rays.
 
Hmmm... I'm realizing the second idea only needs to run the prefix sum over clusters, not rays. So thousands, not millions. That's little work.
So the cost of solving the caching problem would again mainly be the many dispatches, mostly with very little work. If those bubbles can be hidden behind async workloads, it makes sense.
Would be interesting to try... :)
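
A tiny variant of the earlier grid-binning sketch to show the difference in scan size (again just a toy, with an assumed cluster count): when each ray carries a next-cluster pointer, the histogram and prefix sum run over the cluster count only - thousands of entries instead of millions of grid cells.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int numClusters = 4096;                // thousands, not millions
    std::vector<int> rayCluster = {7, 42, 7, 1000, 42, 7};  // per-ray pointer

    // Counting sort keyed on the stored cluster pointer.
    std::vector<int> count(numClusters, 0);
    for (int c : rayCluster) count[c]++;
    std::vector<int> start(numClusters + 1, 0);  // scan is now nearly free
    for (int c = 0; c < numClusters; ++c) start[c + 1] = start[c] + count[c];

    std::vector<int> order(rayCluster.size());
    std::vector<int> cursor(start.begin(), start.end() - 1);
    for (int i = 0; i < (int)rayCluster.size(); ++i)
        order[cursor[rayCluster[i]]++] = i;      // rays grouped per cluster

    printf("binned %zu rays into %d clusters\n",
           rayCluster.size(), numClusters);
}
```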
 
Screen Space GI (SSGI) is in early access in the Unreal Engine 4.23 dev branch, similar to what Unigine 2.7 has.


Pros:
- Works on all modern GPUs unlike DXR (only Volta & some Turing GPUs)
- Performance

Cons:
- It's... in screen space...

Speaking of Unigine 2.7:

Their latest voxel-based GI (which doesn't require a UV layout!)


https://developer.unigine.com/en/devlog/20180426-unigine-2.7
Nice addition; it should be good with 'overscan' rendering.
I wonder when we'll get proper support for variable rate shading - it should be a good combination.
 