AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Nope, you should have opened the link too.
    RDNA2 is Level 2, but I'm not sure NVIDIA counts as Level 3 either. Reading through it, at least, Level 3 is about more than just traversal of the BVH tree (which is what NVIDIA has over AMD).
     
  2. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Primitive shaders have a hardware-assisted mode that uses the input assembler and a fast launch mode that looks like compute. Mesh Shaders use the latter mode, and it has existed since Vega, though a few tweaks were needed to support the Mesh Shader API.
     
  3. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    946
    Likes Received:
    413
    AMD abandoned the IA a long time ago; for the better. They experimented a lot over the years and across products, regarding the primitive pipeline. It might not be so visible on the public surface.
    I would say this is a fairly good proto-concept for mesh shaders: https://patents.google.com/patent/US20140362081A1/en
    Basically, in that iteration there's a large degree of freedom possible in the span from "fetch shader" to "compute vertex front" to "vertex shader", where the purposes of the conceptual stages already overlap. I tend to think of "Primitive Shaders" as a cleanup of the inevitable mess from the experimentation, a consolidation into a cleaner, more unified hardware concept. Amplification isn't embedded into these as a first-class citizen yet, but if you've got no IA, you can do whatever you want through the draw parameters and just treat the whole thing as a procedural generation problem. You can see Amplification being realizable in terms of instancing. It's very interesting how this loops back to the DX9 tessellation add-on, where the amplification basically happened [conceptually] inside the IA, and the vertex shader got fed with barycentrics. Amazing flexibility. When looking at the AMD ISA, I feel the hardware can implement a large number of different abstract rasterization pipeline models without much of a problem.

    I think this whole history and evolution of the primitive front-end would be a very nice article for Beyond3D. <3
     
    Digidi, PSman1700, pharma and 3 others like this.
  4. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    284
    Likes Received:
    474
    Nvidia is Level 3, as it has hardware acceleration for the BVH traversal process.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Does running it on MIMD cores make it "hardware accelerated" over SIMD cores? Because that's literally the difference: NVIDIA has a MIMD processor in the RT core for traversal, while AMD runs it on SIMD cores.
     
  6. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    416
    Likes Received:
    379
    “MIMD” is a vague implementation detail of Nvidia’s hardware BVH processor (“RT cores”) by the Level 3’s definition.

    It is debatable though by the loose level definitions. Say if the BVH processor is a microprocessor core (as implied with “MIMD”) with special data paths (like many GPU subsystems), you are free to argue it not being Level 3, since it is controlled by software/microcode.

    Likewise, RDNA 2 accelerates not only the intersection, but also the memory access with its vector gather memory pipeline. So even if it runs the traversal loop in the CU, one can’t truly say that it is “just Level 2”, as if there were no BVH traversal/walking acceleration by hardware.
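
    To make the split concrete, here is a rough sketch of a traversal loop that stays in "shader" code while the per-node tests are delegated. It is only an illustration under assumptions: `hit_box` and `hit_tri` are hypothetical stand-ins for whatever a hardware intersection instruction returns, the node classes are invented for the example, and none of this reflects AMD's actual node layout or ISA.

```python
import math
from dataclasses import dataclass

@dataclass
class Tri:
    v0: tuple
    v1: tuple
    v2: tuple

@dataclass
class BoxNode:
    children: list  # up to 4 entries of (lo, hi, child), child is BoxNode or Tri

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def hit_box(orig, inv_dir, lo, hi, t_max):
    """Slab test; stand-in for the box half of an intersection instruction."""
    t0, t1 = 0.0, t_max
    for o, inv, l, h in zip(orig, inv_dir, lo, hi):
        ta, tb = (l - o) * inv, (h - o) * inv
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
    return t0 if t0 <= t1 else None

def hit_tri(orig, direction, tri, t_max):
    """Moller-Trumbore test; stand-in for the triangle half."""
    e1, e2 = sub(tri.v1, tri.v0), sub(tri.v2, tri.v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < 1e-12:
        return None  # ray parallel to the triangle plane
    s = sub(orig, tri.v0)
    u = dot(s, p) / det
    q = cross(s, e1)
    v = dot(direction, q) / det
    t = dot(e2, q) / det
    if u < 0 or v < 0 or u + v > 1 or not (0 < t < t_max):
        return None
    return t

def closest_hit(root, orig, direction):
    """The traversal loop itself: plain software with an explicit stack."""
    inv_dir = tuple(1.0 / d if d else math.inf for d in direction)
    best_t, best_tri = math.inf, None
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, Tri):
            t = hit_tri(orig, direction, node, best_t)
            if t is not None:
                best_t, best_tri = t, node
            continue
        hits = []
        for lo, hi, child in node.children:
            t = hit_box(orig, inv_dir, lo, hi, best_t)
            if t is not None:
                hits.append((t, child))
        # Push far-to-near so the nearest child is popped first.
        for _, child in sorted(hits, key=lambda h: h[0], reverse=True):
            stack.append(child)
    return best_t, best_tri

# Tiny example scene: two triangles stacked along +z under one box node.
near = Tri((-1, -1, 1), (1, -1, 1), (0, 1, 1))
far = Tri((-1, -1, 2), (1, -1, 2), (0, 1, 2))
root = BoxNode([((-1, -1, 2), (1, 1, 2), far), ((-1, -1, 1), (1, 1, 1), near)])
```

    Everything inside the `while` loop (stack handling, ordering, culling against `best_t`) is the part that would run on the CU in this picture; only the two `hit_*` calls model fixed-function work.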
     
    #2066 pTmdfx, Dec 27, 2020
    Last edited: Dec 27, 2020
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Yep, this makes all the difference; those MIMD cores are much more suitable for that type of workload than SIMDs.
     
    chris1515, PSman1700 and Rootax like this.
  8. HLJ

    HLJ
    Regular

    Joined:
    Aug 26, 2020
    Messages:
    529
    Likes Received:
    869
    This should bring some more light onto this:
    [​IMG]
     
    pharma, PSman1700, xpea and 1 other person like this.
  9. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    801
    Likes Received:
    1,631
    It does, because these are specialized MIMD cores with a specialized ISA and likely specialized formats, which offload the main SIMD cores and make traversal as fast as possible (it would be weird to select the number of cores if they can't saturate the intersection units).
    On the other hand, there are general SIMD cores with general formats and precisions; these can be OK for coherent rays and bad at incoherent rays for a million reasons: divergence, memory-boundness, etc.
    General format and precision requirements might cause BVH bloat and increased memory traffic, since one of the reasons specialized HW is so efficient is that it uses minimal precision for the task and specialized compact formats.
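
    The divergence point can be illustrated with a toy model, assuming one wave steps all of its rays in lockstep until the slowest ray finishes. The function name and the step counts are made up for illustration; this ignores memory latency, wave sizes of 32/64, and everything else real hardware does, so treat it as a sketch and not a measurement.

```python
def simd_utilization(steps_per_ray):
    """Fraction of lane-steps doing useful work for one lockstep wave."""
    wave_steps = max(steps_per_ray)   # the wave runs until its slowest ray is done
    useful = sum(steps_per_ray)       # lane-steps that actually advanced a ray
    return useful / (wave_steps * len(steps_per_ray))

# Hypothetical traversal step counts for a 4-lane wave.
coherent = [10, 10, 10, 10]    # neighbouring rays walk similar paths
incoherent = [1, 1, 1, 13]     # one ray goes deep, the others exit early

print(simd_utilization(coherent))
print(simd_utilization(incoherent))
```

    In this model the coherent wave keeps every lane busy, while the incoherent one spends most lane-steps idle waiting on a single ray. Independent (MIMD-style) units would not pay that waiting cost, which is the argument being made above.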

    I guess a lot can be debated on the Level 4 and Level 5 though.
    Coherency sorting doesn't seem to be a solved problem (and it's not really a problem for MIMD cores) that can be generalized well in HW and be OK for most cases.
    Imagination's point on coherency sorting for better memory accesses is arguable too; that stuff can be handled by better memory request coalescing, better cache logic, larger caches, etc.
    Making BVH building completely in HW doesn't make a lot of sense if you can do the same efficiently on SIMDs and hide the processing time in async queues (I don't see modern games with millions of triangles suffering from this).
    And the main critique of the article: there is no evidence that additional levels would bring any performance improvements, though there is evidence (real performance numbers) that current Lvl 3 works much better than Lvl 2.
    Making stuff complex (sorting in HW) doesn't always work. Ironically, Imagination's retirement from desktop PCs was the best proof of that statement.
     
    Lightman, tinokun, PSman1700 and 5 others like this.
  10. HLJ

    HLJ
    Regular

    Joined:
    Aug 26, 2020
    Messages:
    529
    Likes Received:
    869
    I just found much better layouts.
    Turing:
    [​IMG]

    Ampere:
    [​IMG]

    Looking at the RT cores,
    Turing:
    upload_2020-12-27_18-35-10.png
    Ampere:
    upload_2020-12-27_18-35-33.png
     
    Alexko, pharma, Lightman and 4 others like this.
  11. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    It depends on what you consider to be the IA. I was referring to hardware that reads the index buffer, forms primitives, and performs vertex reuse. That patent refers to a concept with a feature name called Dispatch Draw. It predated Primitive Shaders and has similarities though it's implemented very differently.
     
    Lightman likes this.
  12. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Surely he's mistaken about one triangle per leaf node :-|.

    "It can be inferred from the return data of the intersection instruction that it is a BVH4 though, i.e. a BVH tree with 4 child nodes per node and one triangle in the leaf node."
     
  14. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43
    Why do you think one triangle per leaf node is mistaken? I can think of some disadvantages but nothing particularly huge as far as I can tell.
     
  15. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,891
    Likes Received:
    4,539
  16. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Exponential increase in BVH memory footprint.
     
  17. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    Turns out it's 4 per leaf, which makes sense. (RDNA2 docs)
     
  18. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43

    Why exponential? Pretty much the majority of the BVH is just going to be the raw triangle data, which would be a lower bound anyway (40 bytes for 9 floats + the triangle ID). For a packing with N triangles (and assuming each box node has at least 2 children) you need N triangle nodes + N/2 box nodes + N/2/2 box nodes, etc., coming to N triangle nodes + at most N-1 box nodes.
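
    The counting argument can be checked with a few lines. This is a minimal sketch: the 40-byte triangle node comes from the estimate above, while `BOX_NODE_BYTES` is a placeholder assumed purely for illustration, and the bottom-up packing (group up to `max_children` nodes per box, never wrap a single leftover) is just one simple way to satisfy the "at least 2 children" assumption.

```python
TRIANGLE_NODE_BYTES = 40  # from the post: 9 floats + the triangle ID
BOX_NODE_BYTES = 64       # assumption for illustration only

def box_node_count(n_triangles, max_children=2):
    """Count box nodes when packing bottom-up with 2..max_children children each."""
    level, boxes = n_triangles, 0
    while level > 1:
        full, rem = divmod(level, max_children)
        new_boxes = full + (1 if rem > 1 else 0)
        boxes += new_boxes
        # A single leftover node is passed up unwrapped to the next level.
        level = new_boxes + (1 if rem == 1 else 0)
    return boxes

def bvh_bytes(n_triangles, max_children=2):
    """Total size under the assumed node sizes, with one triangle per leaf."""
    boxes = box_node_count(n_triangles, max_children)
    return n_triangles * TRIANGLE_NODE_BYTES + boxes * BOX_NODE_BYTES

for n in (1_000, 2_000, 4_000):
    print(n, box_node_count(n), box_node_count(n, 4), bvh_bytes(n))
```

    Doubling the triangle count roughly doubles the total, so the footprint grows linearly in N, and a wider tree (4 children per box) needs fewer box nodes than the binary worst case of N-1.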
     
    PSman1700 and Dictator like this.
  19. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,931
    Likes Received:
    5,530
    Location:
    Pennsylvania
    Why was this brought up again? There's a dedicated thread for the HU "anti-RT" arguments. Nothing has changed, and neither have any of your opinions. So why complain about your beloved Nvidia again when it's going to go nowhere?
     
    BRiT likes this.
  20. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,411