Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Crytek Ray Tracing demo info
How we made Neon Noir - Ray Traced Reflections in CRYENGINE and more!:
https://www.cryengine.com/news/how-we-made-neon-noir-ray-traced-reflections-in-cryengine-and-more
There it is:
"It runs in 1080p with 30 fps on a Vega 56"

So kind of like I heard - "injecting" geometry at those points when needed at a lower LOD for the ultra mirror-like reflections, and falling back to voxel cone tracing for less smooth surfaces + diffuse stuff. That explains a heck of a lot of the visual things going on in the Noir demo! I love that idea though, tiered quality and different techniques used. Though it does mean that real-time dynamic moving objects will not really affect diffuse GI, or reflections on less-than-mirror-like surfaces. AFAIK character models and moving objects are not voxelised in CryEngine, just the static geo.
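To make the tiering concrete, here's a minimal sketch of how such a hybrid pipeline might pick a technique per surface. The threshold, names, and structure are my own illustration, not CRYENGINE's actual code:

```cpp
#include <cstdio>

// Hypothetical per-pixel decision for a hybrid GI pipeline: very smooth
// surfaces get true ray-traced reflections against locally "injected"
// mesh geometry, everything else falls back to voxel cone tracing.
// The cutoff value is illustrative only, not taken from the demo.
enum class Technique { RayTracedReflection, VoxelConeTrace };

Technique pickTechnique(float roughness) {
    const float kMirrorThreshold = 0.15f;  // assumed cutoff
    return (roughness < kMirrorThreshold)
               ? Technique::RayTracedReflection   // sharp, mirror-like
               : Technique::VoxelConeTrace;       // glossy/diffuse fallback
}

int main() {
    for (float r : {0.05f, 0.3f, 0.8f}) {
        std::printf("roughness %.2f -> %s\n", r,
                    pickTechnique(r) == Technique::RayTracedReflection
                        ? "ray traced" : "cone traced");
    }
}
```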
"However, RTX will allow the effects to run at a higher resolution. At the moment on GTX 1080, we usually compute reflections and refractions at half-screen resolution. RTX will probably allow full-screen 4k resolution. It will also help us to have more dynamic elements in the scene, whereas currently, we have some limitations. Broadly speaking, RTX will not allow new features in CRYENGINE, but it will enable better performance and more details."

Hell yeah! This sounds awesome. It seems like the Noir demo being devoid of many dynamic elements is one of the things enabling that 30 fps on the Vega 56 even then!
 
PowerVR Ray Tracing is now available for licensing.
https://www.imgtec.com/powervr-ray-...g&hootPostID=0318b67c9e0e511dea25f13083b17c75
White Paper: http://cdn2.imgtec.com/whitepapers/powervr/ray-tracing/powervr-shining-a-light-on-ray-tracing.pdf

PowerVR's own "comparison":

[Images: three PowerVR slides comparing its ray tracing hardware with the competition]
 
So PVR's solution is significantly more engineered than nVidia's. The idea here is to license the tech for other GPU vendors such as AMD to incorporate? And presumably the ideas are patent protected where they can be, so acceleration concepts (scene hierarchy generator?) that ImgTec are first on will remain exclusive to them?

Could this actually explain some of nVidia's choices? Are some ideas locked IP?
 
*lights the @willardjuice, @iroboto @BRiT signal* What was I saying yesterday? xD

Phil walks on stage.
“It’s @Rys Tracer! Ryyyyyyyyyyyys Tracer!”

/flees :runaway:


... 3 hours of sleep.
That is a purely software-based implementation by AMD. It's an early graphic, so I'm not sure if it will need updating once Navi is released. But we will see.

oooooof if there is no change
 
So PVR's solution is significantly more engineered than nVidia's. The idea here is to license the tech for other GPU vendors such as AMD to incorporate? And presumably the ideas are patent protected where they can be, so acceleration concepts (scene hierarchy generator?) that ImgTec are first on will remain exclusive to them?

Could this actually explain some of nVidia's choices? Are some ideas locked IP?

Yeah, that's still the question.
ImgTec can generate the BVH in hardware while running a kind of vertex shader, as I've seen in their earlier docs. NV does this from compute, with the implementation (AFAIK) hidden behind the API.

But the more interesting claim in the whitepaper is that NV has nothing similar to the 'Ray Coherency Engine', which is the most interesting part for RT performance.
We know NV did most of the research in this field. Just because they do not talk about it does not mean they don't have it? I doubt this claim.

If it's true, however, we can expect a big perf boost with NV's next gen.
Personally I think patent issues could become a real problem here, and NV likely has patents over their RT research. But I'm no lawyer.

...oh - I've missed the most important part here actually:
If ImgTec's BVH generation is so fast that it allows per-frame rebuilds, this would fix the largest DXR/RTX problem: missing LOD support.
With the option to generate geometry and the BVH on the fly, any LOD mechanism can be implemented.
We still want to reuse data over multiple frames of course, so while moving through the world, stuff coming closer becomes more detailed, but only parts of the geometry change each frame.
I like this :) With such issues solved and everything just working, the mentioned black boxes are no longer a problem.
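A rough sketch of what that per-frame loop could look like. All types, functions, and the LOD policy here are hypothetical placeholders, not anything ImgTec has described:

```cpp
#include <cmath>
#include <vector>

// Sketch: pick LOD from camera distance each frame, regenerate geometry
// (and its BVH branch) only when the LOD changed, so most of the
// acceleration structure is reused across frames.
struct Object {
    float distanceToCamera;
    int   currentLod = -1;
    // Mesh mesh; Bvh subtree; ...   (omitted placeholders)
};

int lodForDistance(float d) {
    // one LOD step per doubling of distance (illustrative policy)
    return static_cast<int>(std::log2(1.0f + d));
}

void updateScene(std::vector<Object>& objects) {
    for (auto& obj : objects) {
        int lod = lodForDistance(obj.distanceToCamera);
        if (lod != obj.currentLod) {
            obj.currentLod = lod;
            // regenerateMesh(obj, lod);   // produce geometry on the fly
            // rebuildBvhSubtree(obj);     // only this branch changes
        }
        // unchanged objects keep last frame's geometry and BVH nodes
    }
    // refitTopLevel(objects);             // cheap top-level update
}

int main() {
    std::vector<Object> scene{{10.f}, {300.f}};
    updateScene(scene);  // first frame: both objects (re)built
    updateScene(scene);  // second frame: nothing changed, all reused
}
```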
 


Can't run it yet, but it's awesome how many ray-traced games are already available. For non-RTX GPUs you can get away with something like a Radeon VII.
Would be more interesting if it wasn't a screen-space tracer.
The inability to fall back to other methods on screen-space misses, or to combine it with existing GI methods, is quite a big limitation.
 
There's only so much reordering rays can do ... you trade incoherence in the intersection tests for incoherence in the memory accesses needed for the re-ordering.

It's curious, with how much NVIDIA has published on ray tracing in general recently, how little it has published on ray re-ordering in recent years. All the work of Timo Aila is relatively far in the past. Maybe it's just nowhere near relevant to real-time ray tracing on commodity hardware?
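For illustration, a toy version of such re-ordering: bucket rays by a coarse direction key so nearby rays traverse similar BVH nodes. The sort itself is the cost being traded, since it touches every ray record in memory before any intersection work happens. The key layout and quantisation are made up for the example:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// Quantize the direction to a 3x5-bit grid to form a coherence key
// (illustrative; real systems use e.g. Morton codes over origin+direction).
uint32_t coherenceKey(const Ray& r) {
    auto q = [](float v) { return uint32_t((v * 0.5f + 0.5f) * 31.0f); };
    return (q(r.dx) << 10) | (q(r.dy) << 5) | q(r.dz);
}

// The reorder step: this sort is exactly the extra memory traffic
// being traded against more coherent traversal afterwards.
void reorderRays(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) {
                  return coherenceKey(a) < coherenceKey(b);
              });
}

int main() {
    std::vector<Ray> rays = {
        {0, 0, 0, 1.f, 0, 0}, {0, 0, 0, -1.f, 0, 0}, {0, 0, 0, 0.9f, 0.1f, 0}};
    reorderRays(rays);  // similar directions now end up adjacent
}
```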
 
Perhaps we will see something like this on next-gen consoles too; a Radeon VII can do it, and it has no RT cores like Nvidia's RTX cards have.
 
There's only so much reordering rays can do ... you trade incoherence in the intersection tests for incoherence in the memory accesses needed for the re-ordering.

It's curious, with how much NVIDIA has published on ray tracing in general recently, how little it has published on ray re-ordering in recent years. All the work of Timo Aila is relatively far in the past. Maybe it's just nowhere near relevant to real-time ray tracing on commodity hardware?

Surely finding the best trade-off isn't easy, but there must be a sweet spot worth finding. Otherwise ImgTec would not have their Coherency Engine. Although they list this as optional. Is that because of the minimal power / die area cost, or because it is no big win? They don't give any data, just mentioning the additional silicon cost. It's also not clear if reordering happens only once at each hit or frequently during traversal.

I remember Aila's paper which somebody linked here recently, about 'treelets'. That's similar to an idea I have called 'caching branches of the BVH to LDS', so quite an interesting read. In the paper they mention wide trees and stackless traversal as interesting future work. I would consider both to be essential; the lost front-to-back order with stackless traversal could be counteracted by dividing rays into front-to-back segments. I agree the paper appears quite dated, but I doubt they stopped working on it. Maybe they changed from a 'publish and patent' to a 'keep secret' strategy for some reason.

From the DXR API we have the impression tracing rays is an atomic operation and execution waits on results. This would hint there is no batching, reordering, or whatever we call it. But that's no proof. It remains a mystery :)
 
Perhaps we will see something like this on next-gen consoles too; a Radeon VII can do it, and it has no RT cores like Nvidia's RTX cards have.
Oh no... more screenspace crap :)
I'd like to see something like what Crytek has shown instead. Likely we will, considering RT was mentioned for PS5.
 
It's possible that the core of the coherence work is hard to patent, so they'd rather just not talk about it.

I'd guess Kirill Garanzha's work describes quite a lot of the patently obvious mechanisms necessary and he did that before joining NVIDIA.
 
From the DXR API we have the impression tracing rays is an atomic operation and execution waits on results. This would hint there is no batching, reordering, or whatever we call it. But that's no proof. It remains a mystery :)

The API is very high level - shaders dispatch some rays, and at some point a hit/miss/whatever kernel is executed with the results. There's a lot of resemblance to Nvidia's dynamic parallelism in CUDA, honestly.

Rays are already inherently batched to some extent at the wavefront level. They're often even coherent here too!

Scheduling is complicated, to say the least. The last thing you want is SMs sitting idle for thousands of clock cycles waiting for rays! And the latency will be thousands of cycles, since traversing an acceleration structure is pointer chasing with a lot of it uncached.

So what do you do with the kernel while the ray dispatch is active? Evicting a block from an SM is an expensive operation since it has a huge chunk of registers and possibly shared memory too that needs to be persisted. We're talking kilobytes here. This is of course assuming the shader isn't just terminated at this point - one possible optimization would indeed be to put the dispatch at the end of a kernel, allowing it to terminate without waiting for results, and have the hit/miss kernels responsible for writing results to a UAV or something.
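A CPU-side model of that "terminate after dispatch" pattern, purely illustrative: the queue and callback stand in for GPU ray dispatch and hit shaders, and the plain array stands in for the UAV the results get written to:

```cpp
#include <cstdio>
#include <functional>
#include <queue>

// The caller enqueues a ray plus a continuation and returns immediately
// (freeing its registers, in the GPU analogy); the hit/miss stage later
// runs the continuation and writes the result to the output buffer.
struct RayJob {
    int pixel;
    std::function<void(float /*hitDistance*/)> onHit;
};

std::queue<RayJob> rayQueue;
float outputBuffer[4] = {};            // stand-in for a UAV

void shadePixel(int pixel) {
    // ... all work that does not need the ray result happens here ...
    rayQueue.push({pixel, [pixel](float t) { outputBuffer[pixel] = t; }});
    // the "shader" terminates here instead of stalling on the result
}

int main() {
    for (int p = 0; p < 4; ++p) shadePixel(p);
    while (!rayQueue.empty()) {        // later: traversal produces hits
        RayJob job = rayQueue.front();
        rayQueue.pop();
        job.onHit(job.pixel * 0.25f);  // fake hit distance
    }
    for (float t : outputBuffer) std::printf("%.2f ", t);
    std::printf("\n");
}
```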

Asynchronous compute could be very useful at covering stalls from ray dispatch. Though the presence of the stalled kernels limits occupancy.

A driver/compiler level optimization is to let the kernel continue running past the ray dispatch point as far as possible. This sort of thing is a common optimization used by many compilers - by putting as much code between a high latency operation and where the result is consumed, you can cover part or all of the stall. Static reordering!
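The same idea sketched on the CPU: issue the high-latency operation early, do independent work while it's in flight, and consume the result as late as possible. std::async stands in for the ray dispatch:

```cpp
#include <cstdio>
#include <future>

// Pretend this is the slow, pointer-chasing part.
float traceRay(int pixel) {
    return pixel * 0.5f;               // placeholder "hit distance"
}

int main() {
    auto hit = std::async(std::launch::async, traceRay, 42);  // dispatch early

    float independent = 0.f;           // work that does not need the ray
    for (int i = 0; i < 1000; ++i) independent += 0.001f;     // covers the stall

    float shaded = hit.get() + independent;  // consume as late as possible
    std::printf("%f\n", shaded);
}
```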

We can see that depending on where the ray results are used, tracing can either be latency sensitive or not. In the first case, large scale batching and reordering actually hurts performance. In the second, they might be a win. There's a lot of room for hardware, driver, and compiler optimization here.

Anyway, https://devblogs.nvidia.com/rtx-best-practices/ has a lot of indirect information about how RTX works under the hood.
 
More samples give better estimates whatever the estimator, and these methods all improvise heavily enough to need a long history; the lag comes with the territory.

As I said before, there's something to be said for faking it ...
 
If laggy light is part of real-time ray tracing, I'm starting to take umbrage at calling it 'real-time'. If the processing power needed to perform adequate sampling is so great that we need a half-second delay before our lighting catches up, resulting in smeary visuals on dynamic lights, that's significantly different from the ideal and the way RT has been portrayed. "It's a solution to all your lighting problems! (just please ignore the new one it introduces)"
 
If laggy light is part of real-time ray tracing, I'm starting to take umbrage at calling it 'real-time'. If the processing power needed to perform adequate sampling is so great that we need a half-second delay before our lighting catches up, resulting in smeary visuals on dynamic lights, that's significantly different from the ideal and the way RT has been portrayed. "It's a solution to all your lighting problems! (just please ignore the new one it introduces)"

Well, this has been a thing since the very first demos more than a year ago (Star Wars Reflections & everything after). This is the current price to pay for "noise free" IQ with an extremely low sample count.
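As a toy illustration of where that lag comes from: temporal accumulation is typically an exponential moving average, and the blend factor directly trades noise for lag (effective history is roughly 1/alpha frames). All the numbers below are made up:

```cpp
#include <cstdio>

// A light switches on instantly; the accumulated value only converges
// towards it over many frames. Smaller alpha = less noise, more lag.
int main() {
    float accumulated = 0.0f;
    const float alpha = 0.05f;         // ~20-frame history (illustrative)
    const float lightNow = 1.0f;       // the light just switched on

    for (int frame = 1; frame <= 60; ++frame) {
        accumulated += alpha * (lightNow - accumulated);
        if (frame % 15 == 0)
            std::printf("frame %2d: %.2f of final brightness\n",
                        frame, accumulated);
    }
}
```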
 