GART: Games and Applications using RayTracing

Ray intersection is BVH traversal, until it hits a triangle.

AMD does not have MIMD cores to optimize BVH traversal at all, and NVIDIA has said that AMD's approach of relying on SIMD cores is less performant.

https://www.techpowerup.com/review/nvidia-geforce-ampere-architecture-board-design-gaming-tech-software/4.html

With Ampere, NVIDIA introduces its 2nd generation RT core that aims to improve raytracing acceleration, as well as new effects, such as raytraced motion blur. An RT core is a fixed-function hardware component that handles two of the most challenging tasks for SIMD programmable shaders, bounding volume hierarchy (BVH) traversal and intersection; i.e., calculating the exact point where a ray collides with a surface, so its next course can be charted. Typical raytracing workloads in a raster+raytracing hybrid rendering path involve calculating steps of traversal and intersection across the BVH and bounding-box/triangle intersections, which is a very unsuitable workload for typical GPUs because of the nature of memory accesses involved. This kind of pointer chasing doesn't scale well with SIMD architectures (read: programmable shaders) and is better suited to special fixed-function hardware, like the MIMD RT cores.

[Image: raytracing-acceleration.jpg]


Without taking names, NVIDIA pointed out that a minimalist approach toward raytracing (possibly what AMD is up to with RDNA2) has a performance impact due to overreliance on SIMD stream processors. NVIDIA's RT cores offer a completely hardware-based BVH traversal stack, a purpose-built MIMD execution unit, and inherently lower latency from the hardware stack. The 2nd generation RT core being introduced with Ampere adds one more hardware component.
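To make the "pointer chasing" point concrete, here is a minimal sketch of what a software BVH traversal loop looks like (the node layout, names, and the triangle-test stub are illustrative assumptions, not any vendor's actual implementation). The address of each node loaded depends on the contents of the node loaded before it, which is exactly the dependent, irregular memory access pattern that maps poorly to SIMD:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative node layout; real drivers use vendor-specific formats.
struct BVHNode {
    float   bmin[3], bmax[3];   // axis-aligned bounding box
    int32_t left, right;        // child node indices, -1 if this is a leaf
    int32_t firstTri, triCount; // triangle range (leaves only)
};

struct Ray {
    float origin[3];
    float invDir[3]; // 1/direction, precomputed for the slab test
    float tMax;      // closest hit found so far
};

// Standard slab test: the ray hits the box if it enters on all three
// axes before it exits on any of them.
bool intersectAABB(const Ray& r, const BVHNode& n) {
    float tNear = 0.0f, tFar = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (n.bmin[a] - r.origin[a]) * r.invDir[a];
        float t1 = (n.bmax[a] - r.origin[a]) * r.invDir[a];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar, t1);
    }
    return tNear <= tFar;
}

// Stub standing in for a real triangle test (e.g. Moller-Trumbore).
bool intersectTriangle(const Ray&, int /*triIndex*/, float& tHit) {
    tHit = 0.0f;
    return false; // no triangle data in this sketch
}

// Stack-based traversal: each iteration loads a node whose address
// depends on the previous load -- the "pointer chasing" in question.
int traverse(const std::vector<BVHNode>& nodes, Ray& ray) {
    int hit = -1;
    int stack[64], sp = 0;
    stack[sp++] = 0; // start at the root
    while (sp > 0) {
        const BVHNode& n = nodes[stack[--sp]];
        if (!intersectAABB(ray, n)) continue; // cull this whole subtree
        if (n.left < 0) { // leaf: test its triangles
            for (int i = 0; i < n.triCount; ++i) {
                float t;
                if (intersectTriangle(ray, n.firstTri + i, t)) {
                    ray.tMax = t; // shrink the ray to keep the closest hit
                    hit = n.firstTri + i;
                }
            }
        } else { // interior node: push both children
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return hit;
}
```

An RT core runs the equivalent of this whole loop in fixed-function hardware; on a SIMD machine, every ray in a wave takes its own path through the tree, so the lanes fall out of step.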
 
Has there been any use of tensor cores for denoising outside of rendering productivity software?

No, I think denoising is currently done using shader cores, but at least it's a possibility. On AMD, denoising can only be done using shader cores.
 
There isn't any possibility: all DXR/Vulkan implementations use shader denoisers, even the ones made by NVIDIA for games such as Quake 2, Minecraft, or Watch Dogs Legion.

Thanks, I didn't know that, but since the cores are there I thought they could be leveraged for denoising; at least they are used for DLSS:
 
No, I think denoising is currently done using shader cores, but at least it's a possibility. On AMD, denoising can only be done using shader cores.
Yeah, definitely a possibility, and it was one of the marketing items when Turing was first released. But for gaming, DLSS became the focus for tensor cores (which is good imo).
 
Basically, as more and more games use Unreal Engine 5, we will see fewer and fewer games use raytracing until a new version of DXR arrives, probably in 2023.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.
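To picture that underutilization, here's a toy model (the 32-wide wave, even lane distribution, and all numbers are purely illustrative, not measured behavior): a SIMD wave can only execute one hit shader per pass, so rays that hit materials with different hit shaders serialize, and the other lanes idle.

```cpp
#include <cstdio>
#include <set>

// Toy model of SIMD hit evaluation: a 32-wide wave where each lane's
// ray has hit a triangle belonging to some hit-shader (material) ID.
int main() {
    const int WAVE = 32;
    int hitShader[WAVE];
    // Divergent case: rays scattered evenly across 8 different materials.
    for (int lane = 0; lane < WAVE; ++lane) hitShader[lane] = lane % 8;

    // SIMD can only run one program per pass over the wave, so it must
    // loop over every distinct shader ID with a lane mask.
    std::set<int> distinct(hitShader, hitShader + WAVE);
    int passes = (int)distinct.size();

    // With lanes spread evenly, each pass keeps only WAVE/passes lanes busy.
    printf("distinct hit shaders: %d\n", passes);
    printf("SIMD utilization during hit evaluation: %.1f%%\n",
           100.0 / passes); // 12.5% here: 8x the cycles of coherent rays
    return 0;
}
```

A unit that decides per ray what to do next avoids these serialized passes for the fixed-function part of the work.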

Basically, as more and more games use Unreal Engine 5, we will see fewer and fewer games use raytracing until a new version of DXR arrives, probably in 2023.
It's been confirmed many times already that UE5 will use DXR h/w.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.


It's been confirmed many times already that UE5 will use DXR h/w.


This is not the default option and is far from ideal. Brian Karis asked on Twitter for a better API to be able to use LOD.
 
This is not the default option and is far from ideal. Brian Karis asked on Twitter for a better API to be able to use LOD.
It's as "ideal" as Lumen is, while providing much better quality for reflections and likely a speed-up too. A better API will be needed to raytrace against Nanite meshes - something which isn't done by anything at the moment.
 
Ray intersection is hardware accelerated on AMD GPUs, but BVH traversal is not. NVIDIA's hardware acceleration uses the RT core for both BVH traversal and ray intersection. Denoising and DLSS are accelerated by tensor cores.

You are confusing BVH traversal (the logic that navigates the structure) with BVH the data structure itself. We are discussing the latter.

The quote you posted doesn't say that devs can create their own BVH data model, only that they can customize how it's populated.
 
Sure. I, OTOH, will be surprised if we get even one UE5 AAA title which won't use RT h/w in some capacity.

To be precise: I don't think many Nanite games will use triangle-based raytracing with proxies. They can use the RT h/w capacity for other types of primitives.
 
Ray intersection is BVH traversal, until it hits a triangle.
That's a really bad simplification given the context.
Tree traversal is a very general algorithm, involving the problem of cache misses while chasing pointers, but having the advantage of skipping unnecessary work. It's general, covering everything from sorting up to spatial queries.
Hitting a triangle or the bounding box of a node is specific to raytracing, but there is no 'problem' to solve here other than finding the minimal math to calculate those intersections.
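For illustration of how small that math is, here's a standard Moller-Trumbore ray/triangle test in C++ (a textbook sketch, not any vendor's hardware implementation):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y,
                                             a.z*b.x - a.x*b.z,
                                             a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Moller-Trumbore: returns true and the hit distance t if the ray
// (orig + t*dir) crosses the triangle (v0, v1, v2).
bool intersectTriangle(Vec3 orig, Vec3 dir,
                       Vec3 v0, Vec3 v1, Vec3 v2, float& t) {
    const float EPS = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < EPS) return false;  // ray parallel to triangle plane
    float invDet = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * invDet;            // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * invDet;          // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * invDet;                 // distance along the ray
    return t > EPS;                          // hit in front of the origin
}
```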
AMD's RT has no HW traversal. They implement DXR in custom code executed on regular CUs. They only have intersection instructions, which expect data to be accessed through the TMUs. This is why their RT is 'slow' in comparison to NV's.
NV has HW traversal units which do the whole traversal through the tree until they find a hit on a triangle, which of course is faster, and the SMs can do some other work while the RT cores are busy doing that.

The advantage on AMD's side is flexible and programmable traversal, which can't be utilized on PC. It could eventually be used to implement Intel's stochastic LOD, for example.
The restrictions seem to be: data must come from textures, so from video memory. We cannot generate triangles or BVH nodes on the fly in compute registers or LDS and intersect them. But I guess it's possible to 'transport the ray' or 'switch branches of the BVH' during traversal.
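As a purely hypothetical sketch of what programmable traversal could allow (every name and the node layout below are made up for illustration; this mirrors the idea of Intel's stochastic LOD paper, not shipped code): at a node splitting into coarse and fine LOD subtrees, each ray could randomly pick one child with a distance-dependent probability, instead of always descending into the full-detail geometry.

```cpp
#include <cstdint>

// Hypothetical "LOD split" node: two subtrees holding the same geometry
// at coarse and fine detail. All names here are made up for illustration.
struct LODSplitNode {
    int32_t coarseChild, fineChild;
    float   lodDistance; // beyond this distance, prefer the coarse subtree
};

// Tiny PCG-style hash giving a per-ray random number in [0, 1).
float rand01(uint32_t& state) {
    state = state * 747796405u + 2891336453u;
    uint32_t w = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return ((w >> 22u) ^ w) * (1.0f / 4294967296.0f);
}

// During a shader-driven traversal, a ray reaching an LOD split node picks
// exactly one subtree, with a probability that blends smoothly between
// fine (near) and coarse (far) -- stochastic LOD in one branch decision.
int32_t pickLODChild(const LODSplitNode& n, float rayT, uint32_t& rngState) {
    float pCoarse = rayT / (rayT + n.lodDistance); // 0 near, toward 1 far
    return (rand01(rngState) < pCoarse) ? n.coarseChild : n.fineChild;
}
```

A fixed-function traversal unit can't take this kind of per-node branch decision from the application, which is why this flexibility can't be exposed through DXR today.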

So the term 'ray accelerators' really only means intersection instructions, which can't be compared to NV's RT cores, which handle traversal as well.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.
Oh, interesting! Where did you get this information?
The intersection instruction is documented in the ISA docs, but if they had traversal units, there would be no direct need to expose such an instruction to compute?
 