GPU Ray Tracing Performance Comparisons [2021] *spawn*

Discussion in 'Architecture and Products' started by DavidGraham, Mar 29, 2021.

  1. HLJ

    HLJ
    Regular

    Joined:
    Aug 26, 2020
    Messages:
    529
    Likes Received:
    869
    I too have problems with FPS games like CoD or BF due to being a veteran, so I avoid games trying to be "real" (but actually being arcade).
    One FPS (more of a mil-sim, actually) I do kinda enjoy is ARMA (on servers with 3rd person disabled)... the closest I have seen to a "real shooting/damage implementation".

    Hence I avoid games trying to be "realistic"..and go for games like CP2077 or Warhammer 40K...unless I am doing a mil-flight-sim.

    Shooters on rails bore me to death... autoaim/assist is a pestilence... and I hate controllers with a vengeance ;)
     
    PSman1700 and JoeJ like this.
  2. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    They are more efficient on simple and single-threaded workloads, but these aren't typical in GPU compute, since you generally use it when you have a massively parallel workload. Thus this efficiency is more about graphics and less about compute, where GCN may still end up being more efficient.

    And it's 32 and 64 wide, not 16 and 32. I don't think there has been any research on how WGPs compare to CUs when running old code, but I'd expect them to be less efficient, if only because the dCU mode isn't their native mode of operation. And in the case of RVII vs 6900XT specifically, memory bandwidth will play a huge role in RVII's advantage. Compute workloads don't do too well on IC.
     
    PSman1700 likes this.
  3. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    F1 2021 uses light RT for shadows and reflections. RT shadows in the game are low resolution, which causes shimmering, and their draw distance appears short. RT reflections are rendered at 1/4 resolution, which makes them blurry and flickery.

    Despite this, RDNA2 loses more performance with RT than even Turing: RT reflections incur a ~35% penalty on the 2070 Super and 3080 but a 42% hit on the 6800XT; RT shadows incur an 8% penalty on the 2070 Super and a 5% penalty on the 3080, while the 6800XT takes a 14% hit.

    In the end, the 3090 is 20% faster than the 6900XT at 4K.
    https://www.computerbase.de/2021-07/f1-2021-benchmark-test/3/#diagramm-f1-2021-3840-2160-raytracing
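
To make the percentages above concrete, here is a minimal sketch of how a relative RT penalty maps to frame rate and frame time. The 100 fps baseline is a made-up figure for illustration, not a ComputerBase result; only the penalty percentages come from the post, and the helper names are hypothetical.

```python
def fps_after_penalty(base_fps: float, penalty_pct: float) -> float:
    """Frame rate that remains after losing penalty_pct percent of base_fps."""
    return base_fps * (1.0 - penalty_pct / 100.0)


def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


# Hypothetical RT-off baseline, chosen only to keep the arithmetic readable.
BASE_FPS = 100.0

# Penalties as quoted in the post (percent of frame rate lost with the effect on).
REFLECTION_PENALTY = {"2070 Super": 35, "3080": 35, "6800XT": 42}
SHADOW_PENALTY = {"2070 Super": 8, "3080": 5, "6800XT": 14}

for gpu in REFLECTION_PENALTY:
    refl = fps_after_penalty(BASE_FPS, REFLECTION_PENALTY[gpu])
    shad = fps_after_penalty(BASE_FPS, SHADOW_PENALTY[gpu])
    print(f"{gpu}: reflections {refl:.1f} fps ({frame_time_ms(refl):.2f} ms), "
          f"shadows {shad:.1f} fps ({frame_time_ms(shad):.2f} ms)")
```

Note that a percentage loss in fps understates the added frame time: losing 42% of the frame rate means each frame takes roughly 1 / 0.58 ≈ 1.72 times as long.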
     
    pharma and PSman1700 like this.
  4. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Lightman and PSman1700 like this.
  5. Krteq

    Newcomer

    Joined:
    May 5, 2020
    Messages:
    149
    Likes Received:
    263
    CB updated their review with results from the 21.7.1 driver.

    CB.de - F1 2021 tested: benchmarks in Full HD, WQHD and UHD, frame times and the Adrenalin 21.7.1 (update)
     
    Lightman and pharma like this.
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Lightman and pharma like this.
  7. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    My guess is that most gains in RT workloads on RDNA2 are down to IC data management and devs won't do them since there is no IC anywhere but in RDNA2 PC GPUs. There's also no way for devs to do it through regular APIs either.
     
    Lightman and PSman1700 like this.
  8. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    When I asked around the RDNA2 launch timeframe, AMD said there was no mechanism to explicitly manage the ∞$.
     
  9. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,242
    Likes Received:
    3,405
    I'm sure that they can do this in drivers.
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    The gains aren't in RT workloads, the driver improves F1 2021 performance even more without RT.
     
  11. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
    For @JoeJ: Frostbite's next-gen solution uses surfels and hardware ray tracing acceleration. They will give a presentation at SIGGRAPH 2021.

    https://advances.realtimerendering.com/s2021/index.html

    [Image: Surfel]

     
    #631 chris1515, Jul 16, 2021
    Last edited: Jul 16, 2021
    milk, Kej, JoeJ and 11 others like this.
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Oh nice, it's a Lumen vs GIBS showdown.
     
    milk and Lightman like this.
  13. Lurkmass

    Regular

    Joined:
    Mar 3, 2020
    Messages:
    565
    Likes Received:
    711
    Some notes gathered ...

    Common:

    Rebuilding your TLAS is virtually always a good idea.
    Ray flags FORCE_OPAQUE, ACCEPT_FIRST_HIT_AND_END_SEARCH, and SKIP_PROCEDURAL_PRIMITIVES can help accelerate traversal of the acceleration structure.
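
As a rough illustration of combining the ray flags named above: in a real renderer these are HLSL `RAY_FLAG_*` constants passed to `TraceRay` or a `RayQuery` template, not Python. The sketch below only mirrors the numeric values (as I recall them from the DXR spec, so treat them as an assumption and check the spec) to show which flags an occlusion-style ray might set. `RayFlag` and `occlusion_ray_flags` are hypothetical names.

```python
from enum import IntFlag


class RayFlag(IntFlag):
    """Subset of DXR ray flags; values assumed to match the DXR spec."""
    NONE = 0x00
    FORCE_OPAQUE = 0x01
    ACCEPT_FIRST_HIT_AND_END_SEARCH = 0x04
    SKIP_PROCEDURAL_PRIMITIVES = 0x200


def occlusion_ray_flags(all_geometry_opaque: bool,
                        scene_has_procedural: bool) -> RayFlag:
    """Pick flags for a visibility (shadow-style) ray.

    Any hit answers the visibility question, so traversal can stop at the
    first one. The other two flags are only safe under the stated conditions.
    """
    flags = RayFlag.ACCEPT_FIRST_HIT_AND_END_SEARCH
    if all_geometry_opaque:
        # No alpha-tested geometry, so any-hit shaders never need to run.
        flags |= RayFlag.FORCE_OPAQUE
    if not scene_has_procedural:
        # Triangle-only scene: let traversal ignore procedural (AABB) primitives.
        flags |= RayFlag.SKIP_PROCEDURAL_PRIMITIVES
    return flags


print(hex(occlusion_ray_flags(all_geometry_opaque=True, scene_has_procedural=False)))
```

Keeping the flag choice a compile-time constant in the shader is what lets the compiler or hardware specialise traversal, which matches the vendor notes in this post.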

    AMD:

    Ray flags involving traversal can change the traversal programs. Using these flags will allow the compiler to generate the optimal traversal program for the hardware.
    General consensus is that building the acceleration structure is slow, so use the build flag PREFER_FAST_TRACE as much as possible for both static and low-deformation dynamic geometry. Avoid including high-deformation dynamic geometry in the acceleration structure. Static geometry will give the best performance, since you can build the highest-quality acceleration structure and it never needs to be rebuilt or updated.
    Avoid simultaneously tracing multiple ray query objects in the shaders.

    NV:

    Traversal is implied to be implemented as a state machine in the hardware. Using related ray flags will let the hardware change into a more optimal state for traversal.
    Using the build flag ALLOW_COMPACTION is a hint for the driver to apply compression to the acceleration structures.
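
The build-flag advice in these notes can be sketched as a small decision table. This is only an illustration under stated assumptions: the flag values mirror the D3D12 acceleration structure build flags as I recall them, the three deformation categories and all names (`BuildFlag`, `Deformation`, `BlasPlan`, `plan_blas`) are hypothetical, and pairing ALLOW_UPDATE with low-deformation geometry is my assumption rather than something the notes state.

```python
from dataclasses import dataclass
from enum import Enum, IntFlag


class BuildFlag(IntFlag):
    """Subset of acceleration structure build flags (values assumed from D3D12)."""
    NONE = 0x00
    ALLOW_UPDATE = 0x01
    ALLOW_COMPACTION = 0x02
    PREFER_FAST_TRACE = 0x04


class Deformation(Enum):
    STATIC = "static"
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class BlasPlan:
    include: bool      # whether the geometry goes into the acceleration structure
    flags: BuildFlag   # build flags for its bottom-level structure
    per_frame: str     # "none", "refit", or "skip"


def plan_blas(kind: Deformation) -> BlasPlan:
    if kind is Deformation.STATIC:
        # Build once at the highest quality; compaction is a one-off cost.
        return BlasPlan(True, BuildFlag.PREFER_FAST_TRACE | BuildFlag.ALLOW_COMPACTION, "none")
    if kind is Deformation.LOW:
        # Assumption: refit each frame instead of rebuilding from scratch.
        return BlasPlan(True, BuildFlag.PREFER_FAST_TRACE | BuildFlag.ALLOW_UPDATE, "refit")
    # High-deformation geometry is left out, per the AMD note above.
    return BlasPlan(False, BuildFlag.NONE, "skip")


# Per the "Common" note, the top-level structure is simply rebuilt every frame.
TLAS_PER_FRAME = "rebuild"

for kind in Deformation:
    print(kind.value, plan_blas(kind))
```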
     
    Lightman, Krteq, BRiT and 1 other person like this.
  14. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,426
    Likes Received:
    909
  15. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    Didn't know where to post this, it's Nvidia Real-time Neural Radiance Caching for Path Tracing:

    White paper:
    https://d1qx31qr3h6wln.cloudfront.net/publications/paper_4.pdf

    CUDA source code:
    https://github.com/nvlabs/tiny-cuda-nn
     
  16. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    PSman1700 likes this.
  17. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,401
    Likes Received:
    1,845
    Location:
    France
    So in the end, they're trading bandwidth usage / memory access for tensor core usage, right? If the tensor cores are not fully used even with DLSS, it makes sense, even if I didn't understand everything :D
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Wow the raytraced radiance caching techniques are coming out fast and furious.

    Those fps numbers do look promising. Granted it’s on a 3090 using tensors so we won’t be seeing fully path traced games anytime soon. The paper doesn’t mention anything about dynamic BVH update cost and the test scenes are pretty small so these numbers probably aren’t representative of an actual game.

    I don’t know if Quake 2 RTX already does importance sampling or any sort of irradiance caching but it’ll be interesting to see it updated with the latest techniques.

    I understood it as trading tensor core usage for a lower number of rays and a less noisy image for the denoiser to deal with.

    The point about avoiding trips to VRAM when training the network was an optimization of their DL training routine, not really a reduction in bandwidth usage compared to the baseline renderer. The actual performance savings come from casting fewer rays overall.
     
    PSman1700, DavidGraham and Rootax like this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.