AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Kaotik · Dec 26, 2020

DavidGraham said:
In general, RDNA2 appears to use Level 2 acceleration for ray tracing, while Turing and Ampere are Level 3.
Levels 4 and 5 take more die area and require even more highly specialized units; this is an area where NVIDIA might venture in their upcoming architectures.

https://gfxspeak.com/2020/09/28/the-levels-tracing/

NightAntilli said:
I think both AMD's and nVidia's implementations are Level 3. The difference is in the traversal, not the BVH.

Nope, you should have opened the link too.

Level 0: Legacy solutions

Level 1: Software on traditional GPUs

Level 2: Ray/box and ray/tri-testers in hardware

Level 3: Bounding Volume Hierarchy (BVH) processing in hardware

Level 4: BVH processing and coherency sorting in hardware

Level 5: Coherent BVH processing with Scene Hierarchy Generation (SHG) in hardware

RDNA2 is Level 2, but I'm not sure NVIDIA counts as Level 3 either. At least reading through that, it's talking about more than just traversal of the BVH tree (which is what NVIDIA has over AMD).

3dcgi · Dec 27, 2020

Primitive shaders have a hardware-assisted mode that uses the input assembler and a fast launch mode that looks like compute. Mesh Shaders use the latter mode, and it has existed since Vega, though a few tweaks were needed to support the Mesh Shader API.

Ethatron · Dec 27, 2020

3dcgi said:
Primitive shaders have a hardware-assisted mode that uses the input assembler and a fast launch mode that looks like compute. Mesh Shaders use the latter mode, and it has existed since Vega, though a few tweaks were needed to support the Mesh Shader API.

AMD abandoned the IA a long time ago; for the better. They experimented a lot over the years and across products, regarding the primitive pipeline. It might not be so visible on the public surface.
I would say this is a fairly good proto-concept for mesh shaders: https://patents.google.com/patent/US20140362081A1/en
Basically, in that iteration there's a large degree of freedom possible in the span from "fetch shader" to "compute vertex front" to "vertex shader", where the purposes of the conceptual stages already overlap. I tend to think of "Primitive Shaders" as a cleanup of the inevitable mess from the experimentation, a consolidation into a cleaner, more unified hardware concept. Amplification isn't embedded into these as a first-class citizen yet, but if you've got no IA, you can do whatever you want through the draw parameters and just treat the whole thing as a procedural generation problem. You can see Amplification being realizable in terms of instancing. It's very interesting how this loops back to the DX9 tessellation add-on, where the amplification basically happened [conceptually] inside the IA, and the vertex shader got fed with barycentrics. Amazing flexibility. When looking at the AMD ISA, I feel the hardware can implement a large number of different abstract rasterization pipeline models without much of a problem.

I think this whole history and evolution of the primitive front-end would be a very nice article for Beyond3D.

Dampf · Dec 27, 2020

Kaotik said:
Nope, you should have opened the link too.

RDNA2 is Level 2, but I'm not sure NVIDIA counts as Level 3 either. At least reading through that, it's talking about more than just traversal of the BVH tree (which is what NVIDIA has over AMD).

Nvidia is Level 3, as it has hardware acceleration for the BVH traversal process.

Kaotik · Dec 27, 2020

Dampf said:
Nvidia is Level 3, as it has hardware acceleration for the BVH traversal process.

Does running it on MIMD cores make it "hardware accelerated" over SIMD cores? Because that's literally the difference: NVIDIA has a MIMD processor in the RT core for traversal, while AMD runs it on SIMD cores.

pTmdfx · Dec 27, 2020

Kaotik said:
Does running it on MIMD cores make it "hardware accelerated" over SIMD cores? Because that's literally the difference: NVIDIA has a MIMD processor in the RT core for traversal, while AMD runs it on SIMD cores.

By the Level 3 definition, “MIMD” is just a vague implementation detail of Nvidia’s hardware BVH processor (the “RT cores”).

It is debatable, though, given how loose the level definitions are. Say the BVH processor is a microprocessor core (as implied by “MIMD”) with special data paths (like many GPU subsystems): you are then free to argue that it is not Level 3, since it is controlled by software/microcode.

Likewise, RDNA 2 accelerates not only the intersection but also the memory access, with its vector gather memory pipeline. So even if it runs the traversal loop on the CU, one can’t truly say it is “just Level 2”, as if there were no hardware acceleration of BVH traversal/walking.
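To make the split concrete, here is a minimal sketch of a traversal loop in which the loop and the stack live in ordinary shader-style code while only the box test is a "hardware" call. Everything here is an assumption for illustration: `Node`, `ray_box_hit` and `traverse` are hypothetical names, the slab test merely stands in for a fixed-function ray/box tester, and this is not AMD's actual instruction or node layout. It also assumes every ray direction component is nonzero and stops at leaf boxes instead of running a ray/triangle test.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                       # min corner of the bounding box
    hi: Vec3                       # max corner of the bounding box
    children: List["Node"] = field(default_factory=list)
    prim_id: Optional[int] = None  # set on leaf nodes only


def ray_box_hit(origin: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test; stands in for the fixed-function ray/box tester."""
    t_near, t_far = 0.0, float("inf")
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t0, t1 = (a - o) * inv, (b - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Software traversal loop: the stack and control flow are plain code."""
    # Assumption: no direction component is exactly zero.
    inv_dir = tuple(1.0 / d for d in direction)
    stack, candidates = [root], []
    while stack:
        node = stack.pop()
        if not ray_box_hit(origin, inv_dir, node.lo, node.hi):
            continue                # prune this whole subtree
        if node.prim_id is not None:
            candidates.append(node.prim_id)  # would go to the ray/tri tester
        else:
            stack.extend(node.children)
    return candidates
```

In this picture the Level 2 versus Level 3 argument is about who runs the `while` loop, not about who runs `ray_box_hit`.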

DavidGraham · Dec 27, 2020

Kaotik said:
Because that's literally the difference: NVIDIA has a MIMD processor in the RT core for traversal, while AMD runs it on SIMD cores.

Yep, this makes all the difference: those MIMD cores are much more suitable for that type of workload than SIMDs.

HLJ · Dec 27, 2020

This should shed some more light on this: [image not preserved]

OlegSH · Dec 27, 2020

Kaotik said:
Does running it on MIMD cores make it "hardware accelerated" over SIMD cores?

It does, because these are specialized MIMD cores with a specialized ISA and likely specialized formats, which offload the main SIMD cores and make traversal as fast as possible (it would be weird to pick that number of cores if they couldn't saturate the intersection units).
On the other hand, there are general SIMD cores with general formats and precisions; these can be OK for coherent rays and bad at incoherent rays for a million reasons: divergence, being memory bound, etc.
General format and precision requirements might cause BVH bloat and increased memory traffic, since one of the reasons specialized HW is so efficient is that it uses the minimal precision for the task and specialized compact formats.
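To put a rough number on the divergence point, here is a toy cost model, not a measurement of any real GPU. It assumes each wave runs in lockstep until its slowest ray finishes, ignores memory latency and scheduling entirely, and uses a wave size of 32 purely as an illustrative default. The function name `simd_lane_utilization` is made up for this sketch.

```python
from typing import List


def simd_lane_utilization(steps_per_ray: List[int], wave_size: int = 32) -> float:
    """Fraction of issued lane-slots that do useful traversal work.

    Each wave is charged max(steps) * wave_size slots, because lanes whose
    rays finished early sit idle until the slowest ray in the wave is done.
    """
    useful = sum(steps_per_ray)
    issued = 0
    for start in range(0, len(steps_per_ray), wave_size):
        wave = steps_per_ray[start:start + wave_size]
        issued += max(wave) * wave_size
    return useful / issued if issued else 1.0


# Coherent wave: every ray needs the same number of traversal steps.
coherent = simd_lane_utilization([10] * 32)

# Incoherent wave: one long ray keeps 31 finished lanes waiting.
incoherent = simd_lane_utilization([2] * 31 + [64])
```

In this model the coherent wave is fully utilized, while the incoherent one spends most of its slots idle. A unit that steps each ray independently would not pay that particular cost.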

I guess a lot can be debated on the Level 4 and Level 5 though.
Coherency sorting does not seem to be a solved problem that can be generalized well in HW and work OK in most cases (and coherency is not really a problem for MIMD cores).
Imagination's point about coherency sorting for better memory accesses is arguable too; that can be handled by better memory request coalescing, better cache logic, larger caches, etc.
Doing BVH building completely in HW doesn't make a lot of sense if you can do the same efficiently on SIMDs and hide the processing time in async queues (I don't see modern games with millions of triangles suffering from this).
And the main criticism of the article: there is no evidence that the additional levels would bring any performance improvement, whereas there is evidence (real performance numbers) that current Level 3 works much better than Level 2.
Making things complex (sorting in HW) doesn't always work. Ironically, Imagination's retirement from desktop PCs was the best proof of that statement.
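For readers unfamiliar with what "coherency sorting" means in practice, one simple software form of the idea is to group rays by the sign of each direction component, so that rays in a group tend to walk the BVH in a similar order. This is only an illustrative sketch with made-up names (`octant`, `bin_by_octant`); it is not a description of Imagination's hardware or of any vendor's scheme.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def octant(direction: Vec3) -> int:
    """3-bit key: one bit per axis, set when that component is negative."""
    x, y, z = direction
    return int(x < 0) | (int(y < 0) << 1) | (int(z < 0) << 2)


def bin_by_octant(directions: List[Vec3]) -> Dict[int, List[int]]:
    """Group ray indices by direction octant, preserving input order."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for ray_id, d in enumerate(directions):
        bins[octant(d)].append(ray_id)
    return dict(bins)


rays = [(1.0, 1.0, 1.0), (-1.0, 1.0, 1.0), (1.0, 1.0, 2.0), (-1.0, -1.0, -1.0)]
groups = bin_by_octant(rays)
```

Rays 0 and 2 end up in the same group here. Whether the regrouping pays for itself is exactly the open question raised above.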

HLJ · Dec 27, 2020

I just found much better layouts.
Turing: [image not preserved]

Ampere: [image not preserved]

Looking at the RT cores,
Turing: [image not preserved]

Ampere: [image not preserved]
3dcgi · Dec 27, 2020

Ethatron said:
AMD abandoned the IA a long time ago; for the better. They experimented a lot over the years and across products, regarding the primitive pipeline. It might not be so visible on the public surface.
I would say this is a fairly good proto-concept for mesh shaders: https://patents.google.com/patent/US20140362081A1/en
Basically, in that iteration there's a large degree of freedom possible in the span from "fetch shader" to "compute vertex front" to "vertex shader", where the purposes of the conceptual stages already overlap. I tend to think of "Primitive Shaders" as a cleanup of the inevitable mess from the experimentation, a consolidation into a cleaner, more unified hardware concept. Amplification isn't embedded into these as a first-class citizen yet, but if you've got no IA, you can do whatever you want through the draw parameters and just treat the whole thing as a procedural generation problem. You can see Amplification being realizable in terms of instancing. It's very interesting how this loops back to the DX9 tessellation add-on, where the amplification basically happened [conceptually] inside the IA, and the vertex shader got fed with barycentrics. Amazing flexibility. When looking at the AMD ISA, I feel the hardware can implement a large number of different abstract rasterization pipeline models without much of a problem.

I think this whole history and evolution of the primitive front-end would be a very nice article for Beyond3D.

It depends on what you consider to be the IA. I was referring to the hardware that reads the index buffer, forms primitives, and performs vertex reuse. That patent describes a feature called Dispatch Draw. It predated Primitive Shaders and has similarities, though it's implemented very differently.

Scott_Arm · Dec 27, 2020

https://interplayoflight.wordpress.com/2020/12/27/rdna-2-hardware-raytracing/

Nice little blog post about RDNA2 ray tracing hardware.

trinibwoy · Dec 27, 2020

Scott_Arm said:
https://interplayoflight.wordpress.com/2020/12/27/rdna-2-hardware-raytracing/

Nice little blog post about RDNA2 ray tracing hardware.

Surely he's mistaken about one triangle per leaf node :-|

"It can be inferred from the return data of the intersection instruction that it is a BVH4 though, i.e. a BVH tree with 4 child nodes per node and one triangle in the leaf node."

andermans · Dec 28, 2020

Why do you think one triangle per leaf node is mistaken? I can think of some disadvantages but nothing particularly huge as far as I can tell.

Deleted member 2197 · Dec 29, 2020

Sapphire Radeon RX 6800 NITRO+ review
https://www.guru3d.com/articles-pages/sapphire-radeon-rx-6800-nitro-review,1.html

trinibwoy · Dec 29, 2020

andermans said:
Why do you think one triangle per leaf node is mistaken? I can think of some disadvantages but nothing particularly huge as far as I can tell.

Exponential increase in BVH memory footprint.

Frenetic Pony · Dec 29, 2020

trinibwoy said:
Exponential increase in BVH memory footprint.

Turns out it's 4 per leaf, which makes sense. (RDNA2 docs)

andermans · Dec 29, 2020

trinibwoy said:
Exponential increase in BVH memory footprint.

Why exponential? Pretty much the majority of the BVH is just going to be the raw triangle data, which would be a lower bound anyway (40 bytes for 9 floats + the triangle ID). For a packing with N triangles (and assuming each box node has at least 2 children) you need N triangle nodes + N/2 box nodes + N/4 box nodes, etc., coming to at most N triangle nodes + N-1 box nodes.
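The count above is easy to check with a few lines. This is a sketch under simplifying assumptions: a perfectly balanced tree, full leaves except possibly the last, and the 40-byte triangle figure quoted in the post. The box-node size is deliberately left out because the thread does not state it. `bvh_node_counts` and `triangle_data_bytes` are hypothetical helper names.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    return (a + b - 1) // b


def bvh_node_counts(num_triangles: int, tris_per_leaf: int = 1,
                    branching: int = 2) -> Tuple[int, int]:
    """Return (leaf_nodes, box_nodes) for a balanced BVH."""
    leaves = ceil_div(num_triangles, tris_per_leaf)
    box_nodes = 0
    level = leaves
    while level > 1:
        level = ceil_div(level, branching)  # parents needed for this level
        box_nodes += level
    return leaves, box_nodes


def triangle_data_bytes(num_triangles: int) -> int:
    """9 floats (3 vertices x 3 components) plus a 4-byte triangle ID."""
    return num_triangles * (9 * 4 + 4)


binary_small = bvh_node_counts(1024)             # one triangle per leaf
binary_large = bvh_node_counts(2048)             # twice the triangles
bvh4 = bvh_node_counts(1024, branching=4)        # 4 children per box node
```

Doubling the triangle count roughly doubles both node counts, so the footprint grows linearly in N, and a wider tree such as a BVH4 needs fewer box nodes than the binary bound.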

Malo · Dec 30, 2020

Why was this brought up again? There's a dedicated thread for the HU "anti-RT" arguments. Nothing has changed, and neither have any of your opinions. So why complain about your beloved Nvidia again when it's going to go nowhere?

BRiT · Dec 30, 2020

Discussion moved over to existing thread @ https://forum.beyond3d.com/threads/...ples-to-reviewers-that-follow-procedure.62170
