GART: Games and Applications using RayTracing

Ray intersection is BVH traversal, until it hits a triangle.

AMD does not have MIMD cores to optimize BVH traversal at all, and NVIDIA has said that AMD's approach of relying on SIMD cores is less performant.

https://www.techpowerup.com/review/nvidia-geforce-ampere-architecture-board-design-gaming-tech-software/4.html

With Ampere, NVIDIA introduces its 2nd generation RT core that aims to improve raytracing acceleration, as well as new effects, such as raytraced motion blur. An RT core is a fixed-function hardware component that handles two of the most challenging tasks for SIMD programmable shaders, bounding volume hierarchy (BVH) traversal and intersection; i.e., calculating the exact point where a ray collides with a surface, so its next course can be charted. Typical raytracing workloads in a raster+raytracing hybrid rendering path involve calculating steps of traversal and intersection across the BVH and bounding-box/triangle intersections, which is a very unsuitable workload for typical GPUs because of the nature of memory accesses involved. This kind of pointer chasing doesn't scale well with SIMD architectures (read: programmable shaders) and is better suited to special fixed-function hardware, like the MIMD RT cores.

[Image: raytracing-acceleration.jpg]


Without taking names, NVIDIA pointed out that a minimalist approach toward raytracing (possibly what AMD is up to with RDNA2) has a performance impact due to overreliance on SIMD stream processors. NVIDIA's RT cores offer a completely hardware-based BVH traversal stack, a purpose-built MIMD execution unit, and inherently lower latency from the hardware stack. The 2nd generation RT core being introduced with Ampere adds one more hardware component.
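To make the "pointer chasing" point concrete, here is a minimal sketch of what a software BVH traversal loop looks like (the node layout, names, and the triangle-test stub are illustrative assumptions, not any vendor's actual implementation). The address of each node loaded depends on the contents of the node loaded before it, which is exactly the dependent, irregular memory access pattern that maps poorly to SIMD:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative node layout; real drivers use vendor-specific formats.
struct BVHNode {
    float   bmin[3], bmax[3];   // axis-aligned bounding box
    int32_t left, right;        // child node indices, -1 if this is a leaf
    int32_t firstTri, triCount; // triangle range (leaves only)
};

struct Ray {
    float origin[3];
    float invDir[3]; // 1/direction, precomputed for the slab test
    float tMax;      // closest hit found so far
};

// Standard slab test: the ray hits the box if it enters on all three
// axes before it exits on any of them.
bool intersectAABB(const Ray& r, const BVHNode& n) {
    float tNear = 0.0f, tFar = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (n.bmin[a] - r.origin[a]) * r.invDir[a];
        float t1 = (n.bmax[a] - r.origin[a]) * r.invDir[a];
        if (t0 > t1) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar, t1);
    }
    return tNear <= tFar;
}

// Stub standing in for a real triangle test (e.g. Moller-Trumbore).
bool intersectTriangle(const Ray&, int /*triIndex*/, float& tHit) {
    tHit = 0.0f;
    return false; // no triangle data in this sketch
}

// Stack-based traversal: each iteration loads a node whose address
// depends on the previous load -- the "pointer chasing" in question.
int traverse(const std::vector<BVHNode>& nodes, Ray& ray) {
    int hit = -1;
    int stack[64], sp = 0;
    stack[sp++] = 0; // start at the root
    while (sp > 0) {
        const BVHNode& n = nodes[stack[--sp]];
        if (!intersectAABB(ray, n)) continue; // cull this whole subtree
        if (n.left < 0) { // leaf: test its triangles
            for (int i = 0; i < n.triCount; ++i) {
                float t;
                if (intersectTriangle(ray, n.firstTri + i, t)) {
                    ray.tMax = t; // shrink the ray to keep the closest hit
                    hit = n.firstTri + i;
                }
            }
        } else { // interior node: push both children
            stack[sp++] = n.left;
            stack[sp++] = n.right;
        }
    }
    return hit;
}
```

An RT core runs the equivalent of this whole loop in fixed-function hardware; on a SIMD machine, every ray in a wave takes its own path through the tree, so the lanes fall out of step.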
 
Has there been any use of tensor cores for denoising outside of rendering productivity software?

No, I think denoising is currently done using shader cores, but at least it's a possibility. On AMD, denoising can only be done using shader cores.
 
There isn't any possibility: all DXR/Vulkan implementations use shader denoisers, even the ones made by NVIDIA for games such as Quake 2, Minecraft, or Watch Dogs Legion.

Thanks, I didn't know that, but since the cores are there I thought they could be leveraged for denoising; at least they are used for DLSS:
 
No, I think denoising is currently done using shader cores, but at least it's a possibility. On AMD, denoising can only be done using shader cores.
Yeah, definitely a possibility, and it was one of the marketing items when Turing was first released. But for gaming, DLSS became the focus for tensor cores (which is good imo).
 
Basically, as more and more games use Unreal Engine 5, we will see fewer and fewer games use raytracing until a new version of DXR arrives, probably in 2023.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.
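To picture that underutilization, here's a toy model (the 32-wide wave, even lane distribution, and all numbers are purely illustrative, not measured behavior): a SIMD wave can only execute one hit shader per pass, so rays that hit materials with different hit shaders serialize, and the other lanes idle.

```cpp
#include <cstdio>
#include <set>

// Toy model of SIMD hit evaluation: a 32-wide wave where each lane's
// ray has hit a triangle belonging to some hit-shader (material) ID.
int main() {
    const int WAVE = 32;
    int hitShader[WAVE];
    // Divergent case: rays scattered evenly across 8 different materials.
    for (int lane = 0; lane < WAVE; ++lane) hitShader[lane] = lane % 8;

    // SIMD can only run one program per pass over the wave, so it must
    // loop over every distinct shader ID with a lane mask.
    std::set<int> distinct(hitShader, hitShader + WAVE);
    int passes = (int)distinct.size();

    // With lanes spread evenly, each pass keeps only WAVE/passes lanes busy.
    printf("distinct hit shaders: %d\n", passes);
    printf("SIMD utilization during hit evaluation: %.1f%%\n",
           100.0 / passes); // 12.5% here: 8x the cycles of coherent rays
    return 0;
}
```

A unit that decides per ray what to do next avoids these serialized passes for the fixed-function part of the work.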

Basically, as more and more games use Unreal Engine 5, we will see fewer and fewer games use raytracing until a new version of DXR arrives, probably in 2023.
It's been confirmed many times already that UE5 will use DXR h/w.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.


It's been confirmed many times already that UE5 will use DXR h/w.


This is not the default option and is far from ideal. Brian Karis asked on Twitter for a better API to be able to use LOD.
 
This is not the default option and is far from ideal. Brian Karis asked on Twitter for a better API to be able to use LOD.
It's as "ideal" as Lumen is, while providing much better quality for reflections and likely a speed-up too. A better API will be needed to raytrace against Nanite meshes - something which isn't done by anything at the moment.
 
Ray intersection is hardware accelerated on AMD GPUs, but BVH traversal is not. NVIDIA's hardware acceleration uses the RT core for both BVH traversal and ray intersection. Denoising and DLSS are accelerated by tensor cores.

You are confusing BVH traversal (the logic that navigates the structure) with BVH the data structure itself. We are discussing the latter.

The quote you posted doesn't say that devs can create their own BVH data model, only that they can customize how it's populated.
 
Sure. I, OTOH, will be surprised if we get even one UE5 AAA title which won't use RT h/w in some capacity.

To be precise: I don't think many Nanite games will use triangle-based raytracing with proxies. They can use the RT h/w capacity for other types of primitives.
 
Ray intersection is BVH traversal, until it hits a triangle.
That's a really bad simplification given the context.
Tree traversal is a very general algorithm, involving the problem of cache misses while chasing pointers, but having the advantage of skipping unnecessary work. It's general, covering everything from sorting up to spatial queries.
Hitting a triangle or the bounding box of a node is specific to raytracing, but there is no 'problem' to solve here other than finding the minimal math to calculate those intersections.
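For illustration of how small that math is, here's a standard Moller-Trumbore ray/triangle test in C++ (a textbook sketch, not any vendor's hardware implementation):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y,
                                             a.z*b.x - a.x*b.z,
                                             a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Moller-Trumbore: returns true and the hit distance t if the ray
// (orig + t*dir) crosses the triangle (v0, v1, v2).
bool intersectTriangle(Vec3 orig, Vec3 dir,
                       Vec3 v0, Vec3 v1, Vec3 v2, float& t) {
    const float EPS = 1e-7f;
    Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    Vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (std::fabs(det) < EPS) return false;  // ray parallel to triangle plane
    float invDet = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * invDet;            // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * invDet;          // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;
    t = dot(e2, q) * invDet;                 // distance along the ray
    return t > EPS;                          // hit in front of the origin
}
```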
AMD's RT has no HW traversal. They implement DXR in custom code executed on regular CUs. They only have intersection instructions, which expect data to be accessed through the TMUs. This is why their RT is 'slow' in comparison to NV's.
NV has HW traversal units which do the whole traversal through the tree until they find a hit on a triangle, which of course is faster, and the SMs can do some other work while the RT cores are busy doing that.

The advantage on AMD's side is flexible and programmable traversal, which can't be utilized on PC. It could eventually be used to implement Intel's stochastic LOD, for example.
The restrictions seem to be: data must come from textures, so from video memory. We cannot generate triangles or BVH nodes on the fly in compute registers or LDS and intersect them. But I guess it's possible to 'transport the ray' or 'switch branches of the BVH' during traversal.
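As a purely hypothetical sketch of what programmable traversal could allow (every name and the node layout below are made up for illustration; this mirrors the idea of Intel's stochastic LOD paper, not shipped code): at a node splitting into coarse and fine LOD subtrees, each ray could randomly pick one child with a distance-dependent probability, instead of always descending into the full-detail geometry.

```cpp
#include <cstdint>

// Hypothetical "LOD split" node: two subtrees holding the same geometry
// at coarse and fine detail. All names here are made up for illustration.
struct LODSplitNode {
    int32_t coarseChild, fineChild;
    float   lodDistance; // beyond this distance, prefer the coarse subtree
};

// Tiny PCG-style hash giving a per-ray random number in [0, 1).
float rand01(uint32_t& state) {
    state = state * 747796405u + 2891336453u;
    uint32_t w = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return ((w >> 22u) ^ w) * (1.0f / 4294967296.0f);
}

// During a shader-driven traversal, a ray reaching an LOD split node picks
// exactly one subtree, with a probability that blends smoothly between
// fine (near) and coarse (far) -- stochastic LOD in one branch decision.
int32_t pickLODChild(const LODSplitNode& n, float rayT, uint32_t& rngState) {
    float pCoarse = rayT / (rayT + n.lodDistance); // 0 near, toward 1 far
    return (rand01(rngState) < pCoarse) ? n.coarseChild : n.fineChild;
}
```

A fixed-function traversal unit can't take this kind of per-node branch decision from the application, which is why this flexibility can't be exposed through DXR today.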

So the term 'ray accelerators' really only means intersection instructions, which can't be compared to NV's RT cores, which handle traversal as well.
 
You're confusing ray traversal with hit evaluation. RDNA2 does hit evaluation (what to do when a ray hits a BVH volume or a triangle) on shading h/w, but ray traversal is handled by dedicated RT h/w. Traversal is done by ray accelerators through the BVH until it ends at some triangle. The difference between NV and AMD is in triangle hits, where the RT core can decide what to do with rays on its own, while RDNA2 has to run a shader - and since rays may diverge at this point, the SIMD h/w on which the shader is running may end up severely underutilized, meaning that more cycles will be needed to fully evaluate the hit.
Oh, interesting! Where did you get this information?
The intersection instruction is documented in the ISA docs, but if they had traversal units, there would be no direct need to expose such an instruction to compute?
 