AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    My understanding is the tensor cores only do matrix-matrix multiplication and accumulate.
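For reference, the matrix-multiply-accumulate primitive being described can be sketched in scalar code. This is only an illustration: real tensor cores operate on small fixed tiles (e.g. fp16 inputs with fp32 accumulation) in one hardware operation, and the tile size and types below are assumptions for clarity.

```cpp
#include <array>

// Scalar sketch of the tensor-core primitive: D = A*B + C on a small tile.
// Tile size (4x4) and float types are illustrative, not the real hardware shape.
constexpr int N = 4;
using Tile = std::array<std::array<float, N>, N>;

Tile mma(const Tile& a, const Tile& b, const Tile& c) {
    Tile d = c;  // start from the accumulator tile
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                d[i][j] += a[i][k] * b[k][j];  // fused multiply-accumulate
    return d;
}
```

The key point is that the entire triple loop is what the hardware collapses into one instruction; anything that can be phrased as "multiply two tiles, add into an accumulator" maps onto it.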
     
  2. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    I'd assume RDNA2 is fine when it comes to deploying neural networks. It supports quad-rate int8, so while the normal shader hardware will be occupied, inference should go quickly enough.
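The "quad rate int8" being referred to is a packed dot-product operation (DP4A-style): four int8 multiplies folded into one 32-bit accumulate per lane per clock. A scalar sketch of its semantics, with the packing convention assumed to be little-endian:

```cpp
#include <cstdint>

// Sketch of a DP4A-style packed-int8 dot product with 32-bit accumulate:
// treats each 32-bit word as four signed bytes, multiplies pairwise, and
// sums into the accumulator. This is what "quad rate int8" buys per lane.
int32_t dp4a(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t ai = static_cast<int8_t>(a >> (8 * lane));
        int8_t bi = static_cast<int8_t>(b >> (8 * lane));
        acc += int32_t(ai) * int32_t(bi);
    }
    return acc;
}
```

Chained over the elements of a weight and activation vector, this is the inner loop of quantized inference, which is why int8 rate matters for deploying networks on shader cores.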

    The real question is what to do with it. The actual game-dev answer seems to be "animation", because graph-based decision making is basically what animation is about anyway, and it's a huge pain to do by hand. As for image upscaling, I can see it for eliminating TAA artifacts; that's really what DLSS is good at (much less blur than TAA), but the upscaling itself is a bit of nonsense. You can clearly see the large amount of noise it introduces on Control's cleaner surfaces. If you wanted TAA-style noise you could just tweak TAA settings and post-sharpening.

    And as for practical compute efficiency in gaming, RDNA2 clearly wins. As long as it's not waiting on hardware RT (an obvious Nvidia win) and isn't bottlenecked by bandwidth to main memory (almost certainly deferred games during the G-buffer pass at 4K), the 6900 XT can equal a 3090 at over a hundred watts less power draw. All of Nvidia's theoretical compute power is useless from a gaming perspective, and even for pure compute loads it's less efficient per watt; though if you're using Blender or rendering video that doesn't matter, because what matters there is that Nvidia has the faster card with more RAM.

    Unfortunately for AMD, right now there are games with heavy RT use optimized for Nvidia, so they get clobbered in some benchmarks there, and they deserve it. Same with deferred 4K games. They should've seen the bottleneck during design and known they needed more bandwidth to main memory, but for whatever reason they didn't address it. And it's not as if deferred rendering is going anywhere, nor do the consoles have the same limitations.

    Both vendors made design mistakes concerning gaming this generation. For now Nvidia is on top though. Of course, a year from now there could easily be more Godfalls, where even people's "great deal, OMG the 3080 is the best" $700 cards can't hit max settings. But explaining that to consumers never seems to work until after the fact.
     
  3. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France
    Design mistakes = trade-offs, I think. They're not dumb; they know what's up, but you have a power/price/size/driver-friendliness/etc. balance to find, under time constraints (releasing a product between day X and day Y).
     
  4. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    I'm wondering how well a 6-SE, 120-CU part, without the cache taking up all that area and using HBM2 instead, would've worked with RT. 50% more RT units than the 6900 XT: enough to make it on par with a 3090?
     
    PSman1700 likes this.
  5. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    RDNA2 doesn’t have it but it has nothing to do with accelerating BVH building.

    It accelerates traversal in the presence of instanced geometry (e.g. building a forest by reusing the same tree many times with different poses).
     
  6. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    The TLAS contains instances for every object in a scene, whose geometry is stored in BLASes. If different instances refer to the same BLAS, that's instancing.
    Not sure why "Instance Transform Acceleration" should refer just to instancing; it may as well refer to instance and BLAS transformations in general.
    By accelerating AABB transformations, a lot of optimisations become possible at BLAS build time: faster refitting, better AABB alignment for geometry, etc.
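The TLAS/BLAS relationship being discussed can be sketched with a simplified instance descriptor, loosely modeled on the shape of D3D12_RAYTRACING_INSTANCE_DESC (the field names and types here are illustrative, not the exact API layout):

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch of TLAS input: each instance carries its own
// object-to-world transform plus a reference to a (possibly shared) BLAS.
struct InstanceDesc {
    float    transform[3][4];  // row-major 3x4 object-to-world matrix
    uint32_t instanceID;
    uint64_t blasAddress;      // GPU address of the bottom-level AS
};

// Instancing: many descriptors pointing at the same BLAS, differing only
// in their transforms (e.g. a forest built from one tree BLAS).
std::vector<InstanceDesc> makeForest(uint64_t treeBlas, int count) {
    std::vector<InstanceDesc> tlasInput;
    for (int i = 0; i < count; ++i) {
        InstanceDesc d{};
        d.transform[0][0] = d.transform[1][1] = d.transform[2][2] = 1.0f;
        d.transform[0][3] = float(i) * 10.0f;  // spread trees along X
        d.instanceID = uint32_t(i);
        d.blasAddress = treeBlas;              // shared geometry
        tlasInput.push_back(d);
    }
    return tlasInput;
}
```

The point of the structure is that the tree geometry exists once; only the small per-instance records (transform + reference) multiply with the instance count.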
     
    pharma and Dictator like this.
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    My understanding is that instance transforms are done just in time during intersection testing. It’s not relevant during BVH builds because those just use the “default” orientation for each instanced object.
     
    OlegSH likes this.
  8. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    What would happen if hardware doesn't support instance transforms?
    Following the description here - "This data structure is used in GPU memory during acceleration structure build" and "Per customer request, clarified for D3D12_RAYTRACING_INSTANCE_DESC that implementations transform rays as opposed to transforming all geometry/AABBs."
    You might be right that with HW acceleration it can happen during intersection testing, still some BVH builder assistance might be required for cases without HW acceleration.
     
    #2028 OlegSH, Dec 25, 2020
    Last edited: Dec 25, 2020
    pharma and Dictator like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    Yes the metadata for the orientation of each instance in world space is included in the TLAS structure. That data is provided by the application as is. No acceleration required here during BVH build.

    “This C++ struct definition is useful if generating instance data on the CPU first then uploading to the GPU.”

    The bit that seems to be accelerated on the GPU is the transformation of each individual instance based on its world space orientation during intersection testing. If AMD doesn’t have any special hardware to do that transform (either the ray or the instance) then presumably they’re doing it on the SIMDs.

    The alternative is to create unique BLAS entries for each instance during BVH build but that would likely be very wasteful.
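The spec language quoted earlier ("implementations transform rays as opposed to transforming all geometry/AABBs") amounts to the following: map the ray into each instance's object space with the inverse instance transform, so one BLAS serves every instance unchanged. A minimal sketch, assuming a translation-only instance transform to keep the inverse trivial:

```cpp
// Sketch of "transform the ray, not the geometry": instead of moving every
// triangle/AABB into world space per instance, the ray is mapped into the
// instance's object space and tested against the shared, untouched BLAS.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Illustrative instance transform: translation only, so the inverse is
// just a subtraction. Real transforms need a full inverse 3x4 matrix.
struct InstanceXform { Vec3 translation; };

Ray toObjectSpace(const Ray& world, const InstanceXform& xf) {
    Ray obj = world;
    obj.origin.x -= xf.translation.x;  // apply the inverse translation
    obj.origin.y -= xf.translation.y;
    obj.origin.z -= xf.translation.z;
    // the direction is unaffected by a pure translation
    return obj;
}
```

This is the per-instance work that either dedicated transform hardware or the SIMDs have to do just in time during traversal, which is what the discussion above is weighing.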
     
    #2029 trinibwoy, Dec 25, 2020
    Last edited: Dec 25, 2020
    pjbliverpool, OlegSH and BRiT like this.
  10. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    Yep, I thought about this variant, but doing the transforms per ray on the SIMDs is probably cheaper; I have no idea, to be honest.
     
  11. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    So Turing vs RDNA2 RT showdown:

    MineCraft RTX: 2080Ti is 35% faster than 6900XT
    Amid Evil RTX: 2080Ti is 45% faster than 6900XT
    Black Ops: 2080Ti is 12% faster than 6900XT
    Tomb Raider: 6900XT is 8% faster than 2080Ti
    Metro Exodus: 6900XT is 9% faster than 2080Ti
    Control: 2080Ti is equal to 6900XT
    Battlefield V: 2080Ti is equal to 6900XT

    The more ray tracing there is, the further Turing pulls ahead, confirming that Turing does indeed have better RT performance than RDNA2. I suspect the scenes WCCFTECH tested in Control, Battlefield, Metro and Tomb Raider didn't have much ray tracing in them, allowing the 6900XT to equal the 2080Ti; if RT were heavily present in the scene, the 2080Ti would pull ahead, just like in Minecraft. I am waiting for the Digital Foundry big showdown to confirm this.

     
  12. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    Is RDNA2 still known for having broken visuals across various games when using ray tracing? Anyway, Computerbase.de once again makes the point that RDNA2 competes better in the recently released Black Ops and Watch Dogs Legion. Black Ops even has better 0.2% lows on RDNA2, except in the 3840 × 2160 test.

    https://www.computerbase.de/2020-12...itt_benchmarks_in_sieben_topaktuellen_spielen

    It might very well turn out that RDNA2 will generally always be bad at ray tracing, but it feels like people put too much faith in the titles whose DXR was only optimized for Nvidia hardware, which was the only vendor to offer it from 2018 up until now.
     
    no-X and Lightman like this.
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    The Black Ops and Watch Dogs benches on Computerbase are old, taken with broken AMD drivers; the difference is rather large in these titles with proper drivers.

    Even the 3070 is 20% faster than 6800XT in Call Of Duty Black Ops with RT @4K.


    Watch Dogs Legion benchmarked after the AMD RT patch: the 6800XT remains slower than the 3070, while the 3080 is 37% faster @1440p and 50% faster @2160p.


    I also stress that it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures, it's not enough to generally select some random scenes and be done with it.
     
    HLJ, OlegSH, pharma and 2 others like this.
  14. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    The Computerbase test is actually for the 6900XT and from the eighth of December, so it's actually more than two weeks more recent than the Black Ops video you posted.

    The Watch Dogs video is from the 17th of December though, so nine days newer and probably a more representative test.
     
    no-X and Deleted member 13524 like this.
  15. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
    Both Cold War and Legion run better than average on AMD hardware without RT, and this likely skews the RT results in AMD's favor as well.
     
    PSman1700 likes this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Cold War RT @4K: the 3090 is 66% faster than the 6900XT, and the 3080 is 50% faster. The 1440p results are not logical, as they have the 3090 only 18% faster than the 3070, suggesting a different bottleneck in the scene they selected.

    https://www.computerbase.de/2020-12...-in-call-of-duty-black-ops-cold-war-3840-2160

    Again, it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures, it's not enough to generally select some random scenes and be done with it.
     
    pharma, PSman1700 and Rootax like this.
  17. The biggest difference I see between these two games and e.g. Control is that they have RT running on the RDNA2 consoles, meaning they had to include optimizations for AMD's ray tracing units.
    In early PC implementations like Control, the AMD RT hardware is only running code that was optimized for Nvidia's RT units.


    I always thought it a bit naive to assume RT performance in DXR is some super predictable process that scales linearly and equally across all GPU architectures,
    i.e. "it's just plain DXR, so there's no reason to believe this game, whose RT implementation was co-developed by Nvidia, would favor one architecture over the other."
    I guess this is just empirical proof of that.


    Perhaps the RT performance we're seeing in Cold War and Legion is more representative of what to expect from future multiplatform titles than what we've had with designed-for-RTX titles.
    The GA102 GPUs still get a substantial advantage in RT over the Navi 21 GPUs, but not the 30%+ deltas we're seeing in the older RTX titles.
     
    #2037 Deleted member 13524, Dec 25, 2020
    Last edited by a moderator: Dec 25, 2020
  18. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Yes, that makes sense; some seem to forget that normal rendering continues even during RT scenes.
     
  19. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    946
    Likes Received:
    413
    The raytracing pipe is basically the OptiX pipeline minus some flexibility; parts of Nvidia's OptiX software stack were [allegedly] recycled for RTX as well.

    Remember that HLSL itself comes from Nvidia; it was called Cg back then. Geometry shaders also stem from Nvidia, and constant buffers come from Nvidia too.

    Tessellation and mesh shaders can be traced to AMD in terms of functionality, but the pipeline-stage convention was brought forth by MS together with all the others.

    There never was something like an ISA (say, from Microsoft) which the hardware implemented, like ARM or x86. It was always opportunistic and reactive from MS, and at a really high level. I don't know who failed whom here, but I would prefer that MS actively invent a forward-looking ISA, one which can be extended and/or made optional (like SSE, AVX). Or a consortium could. Or AMD's involvement with Samsung could lead to basically an establishment of a situation like x86, where multiple vendors co-develop the ISA.
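The SSE/AVX model being invoked, a baseline ISA plus optional extensions that software probes at runtime, can be sketched as a feature-bit scheme. The bit assignments and names below are invented for illustration, not any real ISA's:

```cpp
#include <cstdint>

// Toy sketch of an extensible ISA: a fixed baseline plus optional
// extensions advertised through feature bits (the CPUID/SSE/AVX pattern).
// All names and bit positions here are hypothetical.
enum Feature : uint32_t {
    FEAT_BASELINE = 1u << 0,
    FEAT_SIMD128  = 1u << 1,  // hypothetical SSE-like extension
    FEAT_SIMD256  = 1u << 2,  // hypothetical AVX-like extension
};

bool supports(uint32_t featureMask, Feature f) {
    return (featureMask & f) != 0;
}

// Dispatch to the widest available code path, falling back to baseline.
// Returns 2 for the 256-bit path, 1 for 128-bit, 0 for baseline.
int pickKernelLevel(uint32_t featureMask) {
    if (supports(featureMask, FEAT_SIMD256)) return 2;
    if (supports(featureMask, FEAT_SIMD128)) return 1;
    return 0;
}
```

The appeal of this scheme is exactly what the post describes: vendors can ship hardware with different extension sets while software stays portable by probing and dispatching at runtime.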
     
    no-X and Deleted member 13524 like this.
  20. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Actually, there's a similar proposition by Agner Fog for a hybrid CISC/RISC forward-compatible ISA: https://www.forwardcom.info/
     