Next gen lighting technologies - voxelised, traced, and everything else *spawn*

Since Ray Tracing isn't all about graphics...
Valve has finally announced the availability of Steam Audio Radeon Rays support in the Beta 15
https://steamcommunity.com/games/596420/announcements/detail/1681419156989664451

We have just released Steam Audio 2.0 beta 15, which brings support for AMD Radeon Rays technology. Radeon Rays is a high-performance, GPU-accelerated software library for ray tracing, and works on any modern AMD, NVIDIA, or other GPU. Steam Audio uses ray tracing when baking indirect sound propagation and reverberation; using Radeon Rays lets Steam Audio achieve performance gains of 50x-150x over the built-in ray tracer running with a single thread during baking. For example, reverb bakes that required an hour using the built-in ray tracer with a single thread should now take less than a minute using Radeon Rays on a Radeon RX Vega 64 GPU.

Radeon Rays support is optional in Steam Audio; Steam Audio continues to work on any PC with any CPU or GPU, as well as on ARM-based Android devices.

How is Radeon Rays useful to Steam Audio?
Steam Audio uses ray tracing for baking indirect sound propagation. Rays are traced from a probe position and bounced around the scene until they hit a source. The surfaces hit by the rays determine how much energy is absorbed, and how much reaches the probe from the source. These energies, along with the arrival times of each ray, are used to construct the impulse response from the source to the probe. An impulse response (IR) is an audio filter that represents the acoustics of the scene; rendering a sound with the IR creates the impression that the sound was emitted from within the scene.
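As a rough illustration of the idea, here is a minimal toy sketch in Python, assuming a random mean-free-path "scene" and uniform absorption; the model and all names are illustrative, not Steam Audio's actual algorithm or API:

```python
# Toy IR bake: rays lose energy at each bounce and deposit their
# contribution into the impulse response at their arrival time.
# The "scene" is just a random mean-free-path model, for illustration.
import random

SPEED_OF_SOUND = 343.0    # m/s
SAMPLE_RATE = 44100       # IR sample rate, Hz
IR_LENGTH_S = 1.0         # length of reverb tail to capture, seconds

def bake_ir(num_rays=10000, absorption=0.3, mean_free_path=5.0, max_bounces=32):
    ir = [0.0] * int(IR_LENGTH_S * SAMPLE_RATE)
    for _ in range(num_rays):
        energy = 1.0 / num_rays   # total emitted energy normalised to 1
        distance = 0.0
        for _ in range(max_bounces):
            # Travel a random segment, then lose energy to the surface hit.
            distance += random.expovariate(1.0 / mean_free_path)
            energy *= (1.0 - absorption)
            # Deposit this bounce's contribution at its arrival time.
            idx = int(distance / SPEED_OF_SOUND * SAMPLE_RATE)
            if idx >= len(ir):
                break
            ir[idx] += energy
    return ir

ir = bake_ir()
print(max(ir))  # early reflections carry the most energy
```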

The above approach is also used when baking reverb; in this case rays are traced from a probe position and bounced around the scene until they hit the probe again. The IR constructed this way models the listener-centric reverb at the probe position.

When baking, ray tracing is used to simulate indirect sound propagation for hundreds, or even thousands of probes for a typical scene. Radeon Rays lets developers use the compute capabilities of their GPU to significantly accelerate baking, resulting in measurable time savings during the design process.

The current release of Steam Audio does not support real-time simulation with Radeon Rays.

What is Radeon Rays?
Radeon Rays is a software library that provides GPU-accelerated algorithms for tracing coherent rays (direct light) and incoherent rays (global illumination, sound propagation). Radeon Rays is highly optimized for modern GPUs, and provides OpenCL and Vulkan backends. Steam Audio uses the OpenCL backend, which requires a GPU that supports OpenCL 1.2 or higher.

Radeon Rays is not restricted to AMD hardware; it works with any device that supports OpenCL 1.2 or higher, including NVIDIA and Intel GPUs.

What are the benefits of Radeon Rays?
When using Steam Audio to bake indirect sound propagation or reverb, Radeon Rays provides significant speedups and time savings for designers:


Figure: Speedup when baking reverb using Radeon Rays vs. Embree (single-threaded) vs. Steam Audio's built-in ray tracer (single-threaded), for two scenes: Sibenik cathedral (80k triangles) and a Hangar scene from the Unity Asset Store (140k triangles). Speedups are averaged over a range of simulation settings and probe grid densities, and plotted using a logarithmic scale. Speedups shown in the graph are relative to Steam Audio's built-in ray tracer. For example, on the Sibenik cathedral, Embree on a single core is 5.2x faster than the built-in ray tracer on a single core; Radeon Rays on an RX Vega 64 is 153.9x faster than the built-in ray tracer on a single core.

The above performance measurements were obtained on an Intel Core i7 5930K (Haswell E) CPU, along with an AMD Radeon RX Vega 64 GPU, running Windows 10 64-bit.

More discussion in the Advanced Audio Technologies (HRTF, Dolby Atmos, etc) thread here: https://forum.beyond3d.com/threads/...logies-hrtf-dolby-atmos-etc-echo.58309/page-5
 

This sort of sums up why I feel ray tracing just isn't a good fit for real-time (yet).

It's just too precise. Real-time graphics are a trade-off of using approximations for performance. You can always be more accurate, but typically the cost-for-quality curve can get pretty frightening. Which is why I'm a big believer in spatial approximations like voxelization and signed distance fields (SDFs). They can't represent the ground truth, but when you don't have the performance to reach ground truth anyway, you might as well choose an approximation that gets you as close as possible for your budget.

Ray tracing, and variants thereof, are our best method of getting to ground truth, but arguably the slowest way of getting there; they just get further than other techniques given enough time. Time we don't have.

My understanding is Claybook is SDF based, which shows in its ray performance. They made good trade-offs for their needs.
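To make the appeal of SDFs concrete, here's a minimal sphere-tracing sketch; the toy scene and names are mine, and this is the general technique, not Claybook's implementation:

```python
# Minimal sphere tracing against a signed distance field (SDF).
# The cheapness comes from each step safely advancing by the SDF value:
# nothing in the scene can be closer than that distance.
import math

def sdf_scene(p):
    """Distance from point p to the nearest surface: a unit sphere at
    (0, 0, 5) unioned with a ground plane at y = -1."""
    x, y, z = p
    sphere = math.sqrt(x * x + y * y + (z - 5.0) ** 2) - 1.0
    plane = y + 1.0
    return min(sphere, plane)

def sphere_trace(origin, direction, max_steps=64, epsilon=1e-3, max_dist=100.0):
    """March a ray: step forward by the SDF value until we are within
    epsilon of a surface (hit) or exceed max_dist (miss)."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf_scene(p)
        if d < epsilon:
            return t       # hit: distance along the ray
        t += d             # safe step: nothing is closer than d
        if t > max_dist:
            break
    return None            # miss

# Example: trace straight down the +z axis into the sphere.
print(sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # 4.0, sphere surface at z = 4
```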


I look at the battlefield demo, and while yes it's very impressive I still see the shortcomings very strongly. They clearly have *massively* cut back the complexity of the scene representation for the ray traced environment. You can see it in the dramatically lower detail in the reflections showing the buildings, and in the first car being the only one casting a reflection (its underside appears to have been removed, and it looks to have been structurally simplified a lot; see the headlight artifacts). Even the tram appears to have had most of its interior removed, etc. The lighting for the ray samples also appears drastically simplified (presumably using a very simple forward shading path).

What this says to me is that in order to get acceptable performance they had to really gut their scene complexity. I don't find this surprising at all, but it comes back to what I said earlier: it's too precise. This reduction becomes strongly visible. Would a voxel based representation have been a better fit, trading ultimate precision for a closer overall match to the world structure (just lower fidelity)? Would that then allow them to unify their diffuse and specular? The ray traced reflections are quite a jarring mismatch to the non-traced diffuse...

Would a better trade-off have been to accept worse results for these ultra shiny/smooth surfaces, but more accurate results for the more common glossy and diffuse surfaces? I would personally make that trade-off.

I get the impression they are doing something to prevent self intersection with the simplified scene too, which may be making things look a little off. Say, projecting the ray start position in screen space forward from the high-res scene onto the lower-complexity scene's depth. Just a guess... But I think that's why some of the contoured reflective surfaces look... odd...
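For reference, the usual defence against self-intersection is nudging the ray origin off the surface. Here's a hedged sketch of that simple version (names illustrative; the depth-reprojection variant guessed at above would be more involved):

```python
# Standard self-intersection fix: push the reflection ray's origin
# slightly along the surface normal so it cannot immediately re-hit
# the surface it left. With a simplified proxy scene the bias may need
# to be large enough to clear the gap between the detailed and proxy
# surfaces, which can visibly distort reflections on curved geometry.
def offset_ray_origin(hit_point, normal, bias=1e-2):
    return tuple(p + bias * n for p, n in zip(hit_point, normal))

# Example: reflection ray leaving a floor at the origin.
print(offset_ray_origin((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # (0.0, 0.01, 0.0)
```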

I don't want to take away how impressive it still is. I just don't think it's the right solution.

It's all very interesting tech, but it's just not there in my mind. Once we have capability for ~50 rays per pixel per frame (rpp) I think we'll see a larger shift to using rays for things like shadows, where the performance cost of rays becomes worth the trade-off for increased quality. But I don't see larger scale adoption (diffuse, glossy, etc) until we're at ~500 rpp per frame, and that's a *long* way off. Let alone how to deal with scenes that have significant depth or movement complexity (dense foliage, open world, streaming, etc).
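For scale, here's a back-of-the-envelope conversion of those budgets into raw ray throughput, assuming 1080p at 60fps and counting only the stated per-pixel samples (no extra bounces):

```python
# Convert a rays-per-pixel budget into required rays per second.
def rays_per_second(width, height, rpp, fps):
    return width * height * rpp * fps

for rpp in (50, 500):
    total = rays_per_second(1920, 1080, rpp, 60)
    print(f"{rpp} rpp at 1080p60 = {total / 1e9:.1f} Grays/s")
# 50 rpp at 1080p60 = 6.2 Grays/s
# 500 rpp at 1080p60 = 62.2 Grays/s
```

(For context, NVIDIA's Turing launch marketing quoted ray rates on the order of 10 Gigarays/s, and real paths need multiple rays per sample.)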


In the meantime I see a few games experimenting with small scale uses, but ultimately I think most will fall back to tried and true methods when people realize the performance trade-offs just aren't there, instead using RT hardware for non-graphics uses, things like offloading AI visibility checks. For visuals I see cone tracing and variants thereof being far more prevalent.
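Since cone tracing keeps coming up, here's an illustrative sketch of the core loop (the prefiltered-mip sampler is a stand-in; this is the general VXGI-style idea, not any particular engine's code):

```python
# Cone tracing sketch: instead of many precise rays, march one cone
# through a prefiltered voxel grid, sampling coarser mip levels as the
# cone widens. `sample_mip(pos, level)` is a stand-in that returns
# (color, alpha) prefiltered at that level.
import math

def trace_cone(sample_mip, origin, direction, aperture, max_dist=50.0, voxel_size=0.25):
    color, alpha = (0.0, 0.0, 0.0), 0.0
    t = voxel_size  # start one voxel out to avoid self-sampling
    while t < max_dist and alpha < 0.99:
        radius = max(voxel_size, t * math.tan(aperture))  # cone width here
        level = math.log2(radius / voxel_size)            # matching mip level
        pos = tuple(o + t * d for o, d in zip(origin, direction))
        c, a = sample_mip(pos, level)
        # Front-to-back compositing: closer samples occlude farther ones.
        weight = (1.0 - alpha) * a
        color = tuple(col + weight * ci for col, ci in zip(color, c))
        alpha += weight
        t += radius  # step size grows with the cone, so few samples needed
    return color, alpha

# Example with a dummy field: uniform white fog, 10% opacity per sample.
print(trace_cone(lambda pos, level: ((1.0, 1.0, 1.0), 0.1),
                 (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), aperture=0.3))
```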

Of course I could be totally wrong :mrgreen:
 
Except that the above linked '5 faces' IS raytracing polygon soups...

Yes, but still in a very controlled environment. The guy hand-tuned the hell out of the scene, the number of visible polys, and the viewpoints to reach 30fps.

Please note that I love that prod as much as everyone else, it's pure art and of extreme quality, but still, imho not a direct comparison to DXR/RTX.
 
Yes, but still in a very controlled environment. The guy hand-tuned the hell out of the scene, the number of visible polys, and the viewpoints to reach 30fps.
note also that all the physics etc. probably isn't happening; everything will have been calculated beforehand, e.g.
time 1.34
- list of polygons visible at which positions, which polygons are reflected from these polygons, etc.
which really saves CPU/GPU work

compare this to a game, where anything could happen at time 1.34
 
Quick cross post from the Next-Gen console HW prediction thread in case some don't go there:

Regarding the debate around general-purpose compute vs dedicated "fixed-function" HW for RT here's Microsoft's stance on it:

https://blogs.msdn.microsoft.com/directx/2018/03/19/announcing-microsoft-directx-raytracing/

"You may have noticed that DXR does not introduce a new GPU engine to go alongside DX12’s existing Graphics and Compute engines. This is intentional – DXR workloads can be run on either of DX12’s existing engines. The primary reason for this is that, fundamentally, DXR is a compute-like workload. It does not require complex state such as output merger blend modes or input assembler vertex layouts. A secondary reason, however, is that representing DXR as a compute-like workload is aligned to what we see as the future of graphics, namely that hardware will be increasingly general-purpose, and eventually most fixed-function units will be replaced by HLSL code. "


So yeah, Turing's RT Cores go against Microsoft's DXR vision. But this just strengthens my belief that Turing is principally a pro-grade GPU aimed at conquering the deep learning industry and, most importantly (compared to Volta), the CGI industry by totally replacing CPU based render farms in the long run (which is IMO the right way to go, and I fully support NVidia in this endeavour).

EDIT: More DXR stuff

All RT operations go through DXR and are cross vendor/GPU arch compatible, but on Turing GPUs some of the calls are automatically translated to OptiX (CUDA) through the driver and accelerated by the (still mysterious) RT Cores.

Here's an example with ChaosGroup's (V-Ray) Project Lavina real-time RT renderer, which interestingly doesn't use OptiX AI denoising but their own cross-vendor AI denoising solution (V-Ray Next supports both for production rendering):


"What is Lavina built on?
Project Lavina is written entirely within DXR, which allows it to run on GPUs from multiple vendors while taking advantage of the RT Core within the upcoming NVIDIA “RTX” class of Turing GPUs. You will notice that there’s no noise or “convergence” happening on the frames, which is thanks to a new, real-time Chaos denoiser written in HLSL that also allows it to run on almost any GPU. With this, we aim to eventually deliver noise-free ray tracing at speeds and resolution suitable for a VR headset with Lavina."


"So, how much faster is it?

Lavina is already seeing a big boost from the RT Core on NVIDIA’s Turing GPU. How much of an increase is a little tricky to calculate right now because we don’t have a version that doesn’t use it. But, with some sleuthing, we believe we’re seeing about a doubling of performance beyond what the new GPU generation is already giving us – which is the equivalent of leapfrogging several years in hardware evolution. One thing’s for sure: the performance curve plotted by Moore’s Law just got a vertical stair step added to it, and Chaos is set to exploit it!"

https://www.chaosgroup.com/blog/ray-traced-tendering-accelerates-to-real-time-with-project-lavina
 
I look at the battlefield demo, and while yes it's very impressive I still see the shortcomings very strongly. They clearly have *massively* cut back the complexity of the scene representation for the ray traced environment. [...]

In my opinion, screen space reflections, for example, have really annoying artefacts in games. What good is UHD to me if I can still see the same mistakes as 15 years ago? At some point I get fed up with this level of graphics and I want something fresh. How do you solve the screen space artifacts of SSR etc. without using raytracing? The question is how to make raytracing performant enough to use in real time. That precision cuts have to be made is logical. The developer starts the ray with an offset, and even if it misses the odd object it's not a film-quality path tracer, it's still several leagues above any screen space effect. In the end it is up to the developer which trade-off between speed and quality they want.

This is subjective, and the same argument can be made for any effect. On the one hand, some will say that if UHD at 240fps is not possible they will turn raytracing off; on the other hand, some people prefer to see the nice reflections… Turing is the first GPU generation of this kind, and Battlefield etc. are the earliest raytracing implementations in games, so I think it can only get better and better.
 
I look at the battlefield demo, and while yes it's very impressive I still see the shortcomings very strongly. They clearly have *massively* cut back the complexity of the scene representation for the ray traced environment. You can see it in the dramatically lower detail in the reflections showing the buildings, and in the first car being the only one casting a reflection (its underside appears to have been removed, and it looks to have been structurally simplified a lot; see the headlight artifacts). Even the tram appears to have had most of its interior removed, etc.
These cutbacks are present normally in any Battlefield game, objects are not usually detailed unless they are controlled by the player. This is a 64 player mayhem multiplayer after all, cutbacks are needed everywhere. You can see the same in Battlefield 1 as well.
 
Cross posting this from the impact of RT on consoles thread as I figured folks here might be interested too.

Speaking of the Quadro market, ChaosGroup, the makers of V-Ray, spun up a demo named Project Lavina using RTX. While they are not making any commitment to releasing it into production, they will be taking the lessons back to V-Ray GPU and may release it down the line. Haven't seen any others yet
https://www.chaosgroup.com/blog/ray-traced-tendering-accelerates-to-real-time-with-project-lavina

I do wonder if we'll see any other serious rendering products offer this. I wouldn't expect them to ever reach final production quality, but higher-quality previews are always welcome
 
Cross posting this from the impact of RT on consoles thread as I figured folks here might be interested too. [...]
I cross posted it here just 2 messages up ;)
 
I do wonder if we'll see any other serious rendering products offer this. [...]
Octane is RTX accelerated as well.

 
It's still kind of an odd design choice. For some years GPU hardware has been progressing towards powerful, flexible cores that can be used for anything, and now we have this: bespoke processors. If these are not being used, are those resources just sitting there idle?

I'm not seeing much detail as to the particulars, but some counter-pressures to expanding the scope of the programmable units may be the way Nvidia's trying to fit more resources or performance into a transistor and power budget that has not been improved by going to 7nm. A dedicated hardware path can be added with limited impact to die size and power consumption. The scheduler and data paths dedicated to the more fully-featured units are already expensive and optimized towards a granularity that earlier research showed problems with divergence.
In this scenario, the programmable paths already have significant overheads to cover their broader use cases, and burdening them more would increase area/power/delay while at the same time using them less efficiently.

In terms of risk management, something as speculative as the RT hardware might have been better off as a separate and concurrent development project, less likely to be delayed by the standard units' own design projects or potential cancellations, and vice versa. Having multiple hardware variations come out even within a family, due to the timing of a specific tech's maturation relative to when specific GPUs were finalized, has happened before with double-rate FP16.

Perhaps at a later time with more silicon to play with and better evaluation on how the functionality can be more seamlessly carried out by the main SM units, that might change.

Other factors that I am curious about are whether there's something to Nvidia's implementation that makes the payoff from running in the SIMDs less important. Some of the workarounds for BVH traversal and testing use persistent threads to process ray packets, which means longer-lived threads whose relationship to primitives is the reverse of pixel or vertex work. Traversal sounds memory-limited, which may be why there are multiple ALU and tensor subcores, but the newer streaming memory unit of the SM is 1:1 with the TEX and RT blocks. There was a tweet about the RT core being a hardware path that scheduled work and L0 buffers for the BVH work, which may be a similar sequencer-heavy process to how texturing can generate multiple real memory addresses.
The allocation of L0 cache storage might point to a similarity with another form of special-purpose hardware whose job is to follow indirection through hierarchical tables to reach a leaf node or array, a page table walker and hierarchy of TLB and intermediate translation caches.
A pipeline like that can try to coalesce requests from across various warps and try to extract locality without bogging down their schedulers, registers, and instruction caches. Specific features to Nvidia's implementation may encode shortcuts or other methods for speeding up traversal or freeing/updating the hierarchy. The depth of the hierarchy and the mapping aren't as straightforward, but the concept of having a hardware unit able to navigate it is well-understood and there may be parallels in managing updates to the BVH somewhat more efficiently as there are for maintaining and updating the page table structures.
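To make the analogy concrete, here's a purely illustrative sketch of the traversal loop such a unit effectively implements, with batching and coalescing omitted; all types are simplified stand-ins:

```python
# Iterative BVH traversal with an explicit stack: follow indirection
# down the tree, like a page-table walk, testing boxes and intersecting
# primitives at the leaves. Real hardware batches many rays and
# coalesces node fetches; this is the single-ray skeleton.
class BVHNode:
    def __init__(self, bounds, left=None, right=None, primitives=None):
        self.bounds = bounds          # axis-aligned box: ((min xyz), (max xyz))
        self.left, self.right = left, right
        self.primitives = primitives  # non-None only for leaf nodes

def ray_hits_box(origin, inv_dir, bounds):
    """Slab test: does the ray's parametric interval inside the box exist?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, bounds[0], bounds[1]):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    return t_near <= t_far

def traverse(root, origin, direction, intersect_prim):
    """Return the closest hit distance, or None. `intersect_prim` is a
    stand-in for a ray/triangle test; inner nodes have two children."""
    inv_dir = tuple(1.0 / d if d != 0.0 else float("inf") for d in direction)
    stack, closest = [root], None
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, inv_dir, node.bounds):
            continue
        if node.primitives is not None:           # leaf: test primitives
            for prim in node.primitives:
                hit = intersect_prim(prim, origin, direction)
                if hit is not None and (closest is None or hit < closest):
                    closest = hit
        else:                                      # inner node: descend
            stack.append(node.left)
            stack.append(node.right)
    return closest
```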

Having hardware for that isn't strictly necessary, but even so having it has proven compelling for many architectures.
 
You've got to see the funny side of GPU vendors, display manufacturers and gamers going full steam ahead on the 4K hype train and the moment we get GPUs powerful enough to comfortably do 4k60, Nvidia announce raytraced 1080p60 is the new hotness. Like 4K was just something to throw new GPUs at until they figured out real-time raytracing. I can kinda see why there's such a backlash. The people buying high-end GPUs also probably bought higher res/refresh displays because that was the obvious direction of a number of industries. Are gamers going to find themselves in a 4K *or* raytracing situation?
 
They've always been in a "higher quality or higher resolution" situation - that's why there are settings for gamers to adjust to pick which compromise they prefer. This is no different to having 1 or 6 shadow casting lights on your Ti4600 back in the day.
The last decade has spoiled gamers with console ports. Now that an actual high-cost, high-end feature shows up many of them are losing their minds :LOL:
 
Since Ray Tracing isn't all about graphics...
Valve has finally announced the availability of Steam Audio Radeon Rays support in the Beta 15
https://steamcommunity.com/games/596420/announcements/detail/1681419156989664451



More discussion in the Advanced Audio Technologies (HRTF, Dolby Atmos, etc) thread here: https://forum.beyond3d.com/threads/...logies-hrtf-dolby-atmos-etc-echo.58309/page-5

Wait how is a picture of my house on there????? I’m scared now.

Edit. Never mind!
 
http://boostclock.com/show/000219/gpu-rendering-nv-fermat-gtx980ti-gtx1080-gtx1080ti-titanv.html

For this benchmarking session we set up NVIDIA's Fermat research oriented physically based rendering system - the project uses CUDA and the OptiX Prime intersection library to generate photorealistic images. To collect performance metrics we picked well known 3d models (Sponza Atrium, Natural History Museum, ...) and set Fermat's engine to opt for the path tracing backend at 4k resolution.

Wonder why the 980Ti is consistently faster than the 1080, up to 25% in one scene.
 