AMD Radeon RDNA2 Navi (RX 6800, 6800 XT, 6900 XT) [2020-10-28]

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,641
    Likes Received:
    6,664
    To me it reads like they're using RT hardware for tracing triangle meshes when the hardware is available, falling back to cone-tracing of voxels when it's not. It seems like they have a distance cut-off for meshes as well, beyond which they switch to some other representation.

    Edit:
    Nope, I'm wrong. It explicitly says they don't use the RT hardware.

    This is purely software-based, so any performance gains of Ampere over Turing are going to come from parts of the architecture other than the RT cores.
     
  2. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    5,968
    Likes Received:
    6,084
    Location:
    Barcelona Spain
    Yes, but the part where they use mesh tracing will benefit from hardware-accelerated raytracing in the future. I suppose we will begin to see this type of engine later.


    EDIT: They will not replace the current system, but will use hardware-accelerated raytracing for better performance.

     
  3. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,641
    Likes Received:
    6,664
    So looking at that Crysis Remastered benchmark, based on the clocks they're showing (why so low?), the RTX 3090 should be 36.8 TFLOPS, the RTX 3080 should be 31 TFLOPS and the RX 5700 XT should be 9 TFLOPS.

    The 3090 is scoring almost exactly 4x the 5700XT which scales perfectly with TFLOPS. The 3080 is scoring 3x the 5700XT which is a little short of the 3.4x TFLOPS differential, but it's fairly close. It actually looks like this title is ALU limited.
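
    Quick sanity check on that math (a rough sketch; the core counts are the public specs, the clocks are my read of the benchmark screenshots, so treat those as assumptions):

        #include <cstdio>

        // FP32 TFLOPS = 2 FLOPs per FMA * shader cores * clock (GHz) / 1000.
        // Core counts are public specs; the clocks are what the benchmark
        // appears to report (assumption), hence the low-looking figures.
        static double tflops(int cores, double clockGHz) {
            return 2.0 * cores * clockGHz / 1000.0;
        }

        int main() {
            double tf3090   = tflops(10496, 1.75); // ~36.7 TFLOPS
            double tf3080   = tflops(8704,  1.78); // ~31.0 TFLOPS
            double tf5700xt = tflops(2560,  1.76); // ~9.0  TFLOPS
            std::printf("3090/5700XT ratio: %.2f\n", tf3090 / tf5700xt); // ~4.1x
            std::printf("3080/5700XT ratio: %.2f\n", tf3080 / tf5700xt); // ~3.4x
            return 0;
        }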
     
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,445
    Likes Received:
    3,974
    Crysis Remastered uses RT cores on NVIDIA GPUs through NVIDIA's proprietary RT Vulkan extension, which is superimposed on the DX11 path the game is using.
     
    Scott_Arm, PSman1700, Krteq and 2 others like this.
  5. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,264
    Likes Received:
    1,910
    Yes, there are actual use cases for that amount of compute performance. Since UE5 it seems we are moving in that direction. Even though it goes against NV, I don't actually think they are doing it wrong. But as with anything, time will tell.
     
  6. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,906
    Likes Received:
    1,345
    Location:
    France
    I've read what AMD wrote about Smart Access Memory on their website, but I don't get where the speedup is.

    Could data sent to the GPU by the CPU go CPU => PCIe => GPU directly? Does it currently go through main PC RAM first?

    What kind of transfers or instructions would benefit from that?
     
  7. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    9,305
    Likes Received:
    2,987
    Perhaps transfers to the super-fast Infinity Cache? IIRC it's similar to RTX I/O but inverted: the CPU can DIRECTLY access part of the GPU memory. It works at the driver level, you can enable/disable it per game, and if a game is designed to use it, this technology performs better.
     
    PSman1700 likes this.
  8. Cyan

    Cyan orange
    Legend Veteran

    Joined:
    Apr 24, 2007
    Messages:
    9,305
    Likes Received:
    2,987
    This should be taught in schools. This guy knows his stuff. He's Spanish (use subs, it's worth it), but I haven't seen or heard a better explanation of the advantages of the new AMD GPUs, especially why the Infinity Cache is such a great idea when "slow" memories can't feed everything super fast and stay efficient. He mostly uses nVidia in his rigs, so he is not your typical biased fanboy. He has a way with words when explaining it.

     
    PSman1700 likes this.
  9. xEx

    xEx
    Veteran Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    1,054
    Likes Received:
    539
    Yeah, it was interesting. But we really need the architecture day to see what's inside that die. The artistic render is beautiful, but I'm very curious to see the real die. For sure this will be a game changer for mobile, but mostly for APUs, with the extra bandwidth, simplified design and lower power draw.

    One thing is for sure: I'm enjoying the Hitler videos about this whole AMD-beating-Nvidia thing. :lol2:
     
    Lightman, digitalwanderer and Malo like this.
  10. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    416
    Likes Received:
    474
    It's about what you need to do after the transfer. You only had a small window for uploading via the CPU with push semantics, and if you couldn't fit in there, you had to take the detour of writing to CPU address space first, ending up in RAM, and then trigger either shaders or the copy engines to perform the transfer from RAM into VRAM.

    That's twice the transfer size wasted in memory bandwidth on the CPU side (with a good chance of cache misses on the CPU), plus the transfer size wasted as buffer space in RAM, plus time wasted as the shaders/copy engines stall on the slow PCIe bus.

    Making the entire VRAM directly accessible from the CPU eliminates that staging buffer in RAM, and not only for a few selected resources as with the previous 256MB window, but for all of them. Effectively this shifts the stall on the slow PCIe bus from the GPU to the CPU, but that is actually mostly fine. You can afford to spare a CPU core (or even one per direction) to drive the data transfer.

    The speedup occurs only in titles which already made use of AMD's "exclusive" host-visible GPU memory pool. It's only "exclusive" because no other vendor in this market used the APIs; it's not as if it was locked away behind a proprietary extension.
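
    To make that concrete: that pool is simply a memory type advertising both DEVICE_LOCAL and HOST_VISIBLE, and with SAM / resizable BAR the heap behind it grows from 256MB to all of VRAM. A minimal Vulkan sketch (the helper name is mine; the flags and calls are standard Vulkan):

        #include <vulkan/vulkan.h>
        #include <cstdint>

        // Find a memory type that is both DEVICE_LOCAL (lives in VRAM) and
        // HOST_VISIBLE (CPU-mappable). Pre-SAM this pool was capped by the
        // 256MB BAR window on discrete cards; with resizable BAR the heap
        // behind it spans all of VRAM.
        int32_t findDeviceLocalHostVisible(VkPhysicalDevice gpu, uint32_t typeBits) {
            VkPhysicalDeviceMemoryProperties props;
            vkGetPhysicalDeviceMemoryProperties(gpu, &props);

            const VkMemoryPropertyFlags wanted =
                VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT |
                VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                VK_MEMORY_PROPERTY_HOST_COHERENT_BIT;

            for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
                const bool allowed = (typeBits & (1u << i)) != 0;
                const bool matches =
                    (props.memoryTypes[i].propertyFlags & wanted) == wanted;
                if (allowed && matches)
                    return static_cast<int32_t>(i);
            }
            return -1; // Not available: fall back to the staging-buffer
                       // double-copy path described above.
        }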
     
    #510 Ext3h, Nov 2, 2020
    Last edited: Nov 2, 2020
  11. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,370
    Likes Received:
    353
    RTX 3080 offers 52–59%* higher performance than the RTX 2080 while having 69% higher bandwidth.
    RTX 3090 offers 39–45%* higher performance than the RTX 2080 Ti while having 52% higher bandwidth.

    The difference is not big, but comparing product to product, bandwidth efficiency is worse for Ampere than for Turing. For many previous generations it was better than the preceding one. The RTX 3070 shows that the problem is not the architecture (which is in fact more bandwidth-efficient), but the configuration of particular products.

    *1440p–2160p, ComputerBase
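
    Put as a number (simple arithmetic on the figures above): performance gain divided by bandwidth gain gives a relative "bandwidth efficiency" below 1.0 for both Ampere parts.

        #include <cstdio>

        // Relative bandwidth efficiency vs. the predecessor:
        // (1 + perf gain) / (1 + bandwidth gain). Below 1.0 means the new
        // card gained less performance than bandwidth, product vs. product.
        int main() {
            // RTX 3080 vs 2080: +52..59% perf, +69% bandwidth
            std::printf("3080 vs 2080:    %.2f .. %.2f\n", 1.52 / 1.69, 1.59 / 1.69);
            // RTX 3090 vs 2080 Ti: +39..45% perf, +52% bandwidth
            std::printf("3090 vs 2080 Ti: %.2f .. %.2f\n", 1.39 / 1.52, 1.45 / 1.52);
            return 0; // both ranges land around 0.90..0.95
        }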
     
  12. JoeJ

    Veteran Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    1,053
    Likes Received:
    1,239
    It's probably the main difference between PC and console that is now being addressed? I think Intel has this feature too (for their new discrete GPUs), and I hope we get it for any CPU/GPU vendor combination soon.

    The slow PCIe data transfer forced me to implement everything on the GPU itself, including things like BVH build/refit, work generation, etc. Those are small tasks that cannot saturate the GPU, but doing them on the CPU would require too much data transfer and end up much slower.
    (For debug purposes I may still download all data from the GPU. If I do this, framerate drops to 1 fps, not using GCN's shared 256MB memory feature.)
    So this allows doing some things very differently. One case where it's very useful is simulation, e.g. fluid sim on GPU, rigid bodies on CPU, with proper interaction between them. Currently we might not want to do this because data transfer and sync within a single frame could add too much.
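
    (To illustrate the pattern: keeping work generation on-GPU basically means indirect dispatch, so the dispatch size never crosses PCIe. A minimal Vulkan-flavoured sketch; the function and pipeline names are hypothetical, only the vkCmd* calls are real API.)

        #include <vulkan/vulkan.h>

        // Keep work generation on the GPU: a first compute pass writes the
        // dispatch size (e.g. number of BVH nodes to refit) into a small GPU
        // buffer, and the second pass launches from that buffer directly.
        // Nothing is read back over PCIe. Pipelines/buffers are assumed to
        // be created elsewhere (hypothetical names).
        void recordGpuDrivenRefit(VkCommandBuffer cmd,
                                  VkPipeline countPass, VkPipeline refitPass,
                                  VkBuffer argsBuffer) {
            // Pass 1: count work items and write a VkDispatchIndirectCommand
            // {x, y, z} into argsBuffer.
            vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, countPass);
            vkCmdDispatch(cmd, 1, 1, 1);

            // Barrier: make the shader write visible to the indirect read.
            VkMemoryBarrier barrier{VK_STRUCTURE_TYPE_MEMORY_BARRIER};
            barrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
            barrier.dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
            vkCmdPipelineBarrier(cmd,
                VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT,
                0, 1, &barrier, 0, nullptr, 0, nullptr);

            // Pass 2: dispatch size comes from GPU memory, not the CPU.
            vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, refitPass);
            vkCmdDispatchIndirect(cmd, argsBuffer, 0);
        }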

    Both vendors did much better than I expected this time. I thought Moore's Law was dead and stagnation lay ahead, but it seems we're not there yet.
    I'm surprised RDNA2 can compete with Ampere with only about half the shader cores. I don't think Infinity Cache alone can explain this. Maybe the frontend situation has reversed between vendors as well, but for now it looks like 1 AMD TF > 1 NV TF in general.
    I'll get a 6800 next year to arrive at next gen. : )
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,147
    Likes Received:
    1,647
    Location:
    New York
    The mystery is why Nvidia bothered with GDDR6X in the first place.
     
  14. Putas

    Regular Newcomer

    Joined:
    Nov 7, 2004
    Messages:
    488
    Likes Received:
    157
    Right, so don't say they are less bandwidth-efficient when you deny it in the next sentence.
    We can also make a product-to-product comparison (you probably meant something like tiers) between the 3080 and the 2080 Ti and see the same bandwidth utilization. Just to explain my caution regarding the claim that "Ampere cannot utilize its bandwidth".

    It's possible Nvidia overshot simply because they can. But there is also the option that today's games are not a good workload for GA102 cards.
     
    PSman1700 likes this.
  15. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    829
    Likes Received:
    478
    Doubling shader cores means so much more than doubling the number of FP32 ALUs.
    There are the instruction decoders/dispatchers, the register file size, the number of INT32 units, the number of load/store units, the number of special function units, the number of texture units, and the size of the L0/L1/L2 caches, none of which have doubled.
     
  16. rikrak

    Newcomer

    Joined:
    Sep 16, 2020
    Messages:
    23
    Likes Received:
    16
    Because that's just shady marketing from Nvidia. The number of ALUs is identical between Turing and Ampere; they just extended the integer ALUs to handle FP data, basically going from FP32 + INT32 per cycle to FP32 + FP32/INT32 per cycle. That does allow you to claim that your peak FP32 throughput has doubled, but it will almost never happen in practice. Don't get me wrong, I think it was a smart move by Nvidia, modern GPU programs do need more floating-point performance, but marketing it as 2x shader cores is grossly misleading.

    At the same time, Ampere's focus on FP throughput should give it an edge in many GPGPU workloads. I would love to see some comparisons between Navi 2 and Ampere for compute.
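
    A toy throughput model for why the 2x almost never materializes (my own back-of-envelope, assuming Nvidia's oft-quoted ~36 INT instructions per 100 FP instructions in games, and both issue ports kept busy):

        #include <cstdio>

        // Per SM partition per cycle (simplified):
        //   Turing: 1 FP32 pipe + 1 INT32 pipe   -> INT runs alongside FP.
        //   Ampere: 1 FP32 pipe + 1 FP32/INT32 pipe -> INT steals FP slots.
        int main() {
            const double intFrac  = 36.0 / 136.0;       // ~36 INT per 100 FP
            const double turingFp = 1.0;                // INT pipe absorbs INT
            const double ampereFp = 2.0 * (1.0 - intFrac); // shared pipe splits
            std::printf("Ampere FP uplift over Turing: %.0f%%\n",
                        100.0 * (ampereFp / turingFp - 1.0)); // ~47%, not 100%
            return 0;
        }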
     
  17. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,849
    Likes Received:
    6,770
    They uh.. wanted to have the world's first 8K GPU?
     
    Lightman likes this.
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    11,147
    Likes Received:
    1,647
    Location:
    New York
    They also never marketed the INT cores on Turing, so that was misleading too. Essentially they undersold Turing and are overselling Ampere in comparison. However, relative to Pascal and AMD's stuff, Ampere's marketing is fine.

    Either way, RDNA 2 looks to be a far more balanced architecture for gaming. Idle ALUs are no good, and AMD seems to be tackling that head-on with their cache implementation. It will be interesting to see how it holds up across a broader set of games in third-party reviews.
     
  19. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    700
    Likes Received:
    391
    I may be going blind but I can't find captions.
     
  20. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,529
    Likes Received:
    477
    Location:
    Varna, Bulgaria
    It all boils down to balancing the die budget between compute and memory resources. With RDNA2, AMD stopped chasing raw FLOPS numbers and shifted the budget to the memory side of the equation with the Infinity Cache. I hope it pays off in the long term.
     
    Man from Atlantis and PSman1700 like this.