GPU Ray Tracing Performance Comparisons [2021-2022]

It's worth remembering that AMD, with a compute-SIMD "slow" approach, has equalled NVidia's dedicated-MIMD approach in Turing. With Ampere, NVidia gained 40% on Turing in games, but it seems likely there are no major gains to be had from "better MIMD" in Lovelace.

Are you drawing that conclusion based on mixed workloads where RT is just one consideration? For reference, the 3080 and 2080 Ti have the same number of RT “cores”, yet the former is 85% faster in OptiX.
 
This?

Talk about finding a border case of unplayable architecture limits.
Ah, you must be new around here.

That's what DavidGraham suggested:

Nope, once you push RT workload upwards

That's what we do here at B3D: discuss what happens at the limits. If you don't like that, there's plenty of other forums.

Faulty testing for sure.
You have facts to base that conclusion on?

Is Metro Exodus: Enhanced Edition at 8K with Ultra ray tracing and no DLSS at 10.8 fps:

[TweakTown.com benchmark image]

faulty testing?
 
You have facts to base that conclusion on?

Is Metro Exodus: Enhanced Edition at 8K with Ultra ray tracing and no DLSS at 10.8 fps:

[TweakTown.com benchmark image]

faulty testing?
Yeah, unlikely.
I think 8K broke it. Bottleneck differences may be breaking the camel's back here, so to speak, for the Nvidia card. Might be useful to use Nvidia Nsight here to see what's happening at 8K. Worthwhile to explore if @Clukos or another member has some time to try it. I'd be curious to see what's happening here.

Would also be curious to see an equivalent AMD one for the 6900XT. But I don't know the name of that tool and whether it's free to download and use.
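For a bit of back-of-the-envelope context on why 8K is such a brutal RT workload (purely illustrative arithmetic, assuming one primary ray per pixel and ignoring secondary rays entirely):

# Rough, illustrative arithmetic only: one primary ray per pixel,
# ignoring reflection/shadow rays, denoising and shading costs.
resolutions = {"4K": (3840, 2160), "8K": (7680, 4320)}

for name, (w, h) in resolutions.items():
    pixels = w * h
    print(f"{name}: {pixels / 1e6:.1f} M pixels -> ~{pixels / 1e6:.1f} M primary rays per frame")

# 8K has 4x the pixels of 4K, so every per-pixel cost (ray count, G-buffer,
# denoiser working set, framebuffer bandwidth) scales by roughly 4x too,
# which is exactly the kind of regime where a different bottleneck can appear.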
 
Yeah, or they used ReBAR as their default setup; at high resolutions it had a negative effect on 3090s in Cyberpunk.

Weird, though from everything on here I get the impression Nvidia has somehow dropped the ball on their driver support, or some combination of such. ReBAR works great on AMD, and there's no "this driver works best for this game but it's not the newest driver" there. So... while it might be "detrimental" at some point, one has to start laying some blame on Nvidia for the underlying problems here.

If they'd clean up whatever issues they're having, no one would have to bring up driver versions or what exact setup settings you have for what exact games. It would certainly make life easier for everyone as well.
 
Weird, though from everything on here I get the impression Nvidia has somehow dropped the ball on their driver support, or some combination of such. ReBAR works great on AMD, and there's no "this driver works best for this game but it's not the newest driver" there. So... while it might be "detrimental" at some point, one has to start laying some blame on Nvidia for the underlying problems here.

If they'd clean up whatever issues they're having, no one would have to bring up driver versions or what exact setup settings you have for what exact games. It would certainly make life easier for everyone as well.
ReBAR sometimes also results in performance losses on AMD cards. For any "pure" testing they should just disable it.
 
Ah, you must be new around here.

That's what DavidGraham suggested:



That's what we do here at B3D: discuss what happens at the limits. If you don't like that, there's plenty of other forums.


You have facts to base that conclusion on?

Is Metro Exodus: Enhanced Edition at 8K with Ultra ray tracing and no DLSS at 10.8 fps:

[TweakTown.com benchmark image]

faulty testing?

All you show is that AMD is better at 8K in unplayable settings, which will tell you nothing outside this specific border case.
When native 8K RT becomes a real option, it will not be with current SKU designs or even 1st and 2nd gen RT cores.

But of course, if you want to frame AMD's lesser RT solution as "better"... this is all you've really got, but then you have to ignore all the real-world gaming that doesn't suit that agenda.


 
You have facts to base that conclusion on?

The small gap between the 3090 and 6900XT with RT on in TweakTown's testing.

In contrast to that, Computerbase shows that at 4K with Ultra RT, the 3080 is 84% faster than the 6800XT; the 3090 is likely even faster.
https://www.computerbase.de/2021-03...berpunk-2077-raytracing-und-dlss-in-3840-2160

PCGH shows the 3090 to be 92% faster than the 6900XT at 4K with Ultra RT.
https://www.pcgameshardware.de/Cybe...als/Update-120-Benchmarks-Raytracing-1369667/

Frankly, TweakTown hasn't been a reliable source of testing for quite a long time.
 
I can't post a link yet, but you can see the pure RT performance in GPSnoopy's RayTracingInVulkan demo (you can Google it).
The reality is that the 6900XT is slower than the 2080Ti. The 6900XT is only faster if the scene has very shallow BVH depth and no triangle geometry.
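To make the "shallow BVH" point a bit more concrete, here is a toy estimate (the per-ray node-visit model and the overlap factor are assumptions for illustration, not measured numbers): deeper BVHs mean more traversal steps per ray, and on RDNA2 those steps are issued from shader code rather than a dedicated traversal unit.

import math

# Toy model, illustration only: assume each ray visits roughly
# K_OVERLAP * log2(primitive_count) BVH nodes before finding its closest hit.
# Real traversal counts depend heavily on the scene and the BVH builder.
K_OVERLAP = 1.5

def node_visits_per_ray(primitive_count, k=K_OVERLAP):
    return k * math.log2(primitive_count)

rays_per_frame = 3840 * 2160  # one primary ray per pixel at 4K

for prims in (1_000, 100_000, 10_000_000):
    visits = node_visits_per_ray(prims)
    total = visits * rays_per_frame
    print(f"{prims:>10,} primitives: ~{visits:4.1f} node visits/ray, "
          f"~{total / 1e9:.2f} G node visits/frame")

# A procedural-sphere demo with a handful of primitives needs a fraction of the
# traversal work of a triangle-heavy game scene, which is where shader-based
# traversal starts to hurt.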
Not just that: here is PCGH's testing of the ART Mark RT demo, where the 2080Ti remains faster than the 6900XT and the 3090 is more than twice as fast.

[PCGH ART Mark benchmark chart]

https://www.pcgameshardware.de/Rayt...cials/ART-Mark-Raytracing-Benchmarks-1371125/

And here is the Boundary game demo, with the same story.

[Tom's Hardware Boundary benchmark chart]
https://www.tomshardware.com/reviews/amd-radeon-rx-6900-xt-review/3

Control, same story.

[Tom's Hardware Control benchmark chart]

https://www.tomshardware.com/reviews/amd-radeon-rx-6900-xt-review/3

COD Cold War too.

[Tom's Hardware COD Cold War benchmark chart]
 
Not just that: here is PCGH's testing of the ART Mark RT demo, where the 2080Ti remains faster than the 6900XT and the 3090 is more than twice as fast.

[PCGH ART Mark benchmark chart]

https://www.pcgameshardware.de/Rayt...cials/ART-Mark-Raytracing-Benchmarks-1371125/

And here is the Boundary game demo, with the same story.

[Tom's Hardware Boundary benchmark chart]
https://www.tomshardware.com/reviews/amd-radeon-rx-6900-xt-review/3

COD Cold War, same story.

[Tom's Hardware COD Cold War benchmark chart]

https://www.tomshardware.com/reviews/amd-radeon-rx-6900-xt-review/3

ART Mark is interesting there, clearly leveraging some advantage of Ampere's RT implementation over Turing that we don't typically see in games. I'd love to understand more about what the difference is there and why we're not seeing it in games.
 
ART Mark is interesting there, clearly leveraging some advantage of Ampere's RT implementation over Turing that we don't typically see in games. I'd love to understand more about what the difference is there and why we're not seeing it in games.

I guess in games other limiting factors are more frequently involved? Like, maybe you have 1.6-2x performance on the pure RT part / RT core work, but the shading part after that is not 2x, so we never see the RT gain?
 
I guess in games other limiting factors are more frequently involved? Like, maybe you have 1.6-2x performance on the pure RT part / RT core work, but the shading part after that is not 2x, so we never see the RT gain?

But we usually see the performance ratio between similar GPUs (say 2080Ti and 3070) remain pretty much the same with RT off and on. I'd expect the 3070 to get relatively faster with RT on if that part of the rendering process is faster on that card.

I'm wondering if there's some feature of Ampere RT that just isn't being used by games right now that the benchmark is using. If true, then I'd like to understand the likelihood of seeing that in future games.
 
I'm wondering if there's some feature of Ampere RT that just isn't being used by games right now that the benchmark is using.
Pretty sure it's not. Games just have too much other work going on on the GPU, including rasterization and compute, but also non-accelerated RT tasks like BVH build/refit due to animation and streaming, denoising, shading, ray generation, and optimizations like ray binning.
So what we see in game benchmarks depends more on the overall improvement of Ampere over Turing, and the RT cores - even if they are 4x faster - are just one factor of many. (IIRC, up to 4x was communicated, but I could be wrong.)

Personally I wonder much more about offline results. HW acceleration often shows only a net win of 2x. That's suspiciously small. I think it's a result of 'missing optimizations on all ends', e.g. using really complex materials, rebuilding the whole BVH each frame, huge uploads each frame, etc.
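To put a number on the "one factor of many" point, a quick Amdahl-style sketch (the frame-time split and the 4x figure are assumptions for illustration, not measurements):

# Illustrative Amdahl-style estimate: if only the HW-accelerated traversal part
# of the frame gets faster, the overall frame-time gain is capped by everything else.
def overall_speedup(rt_fraction, rt_speedup):
    # rt_fraction: share of frame time spent in traversal/intersection
    # rt_speedup: how much faster that portion runs on the newer architecture
    return 1.0 / ((1.0 - rt_fraction) + rt_fraction / rt_speedup)

for rt_fraction in (0.2, 0.4, 0.6):
    print(f"RT share {rt_fraction:.0%}: 4x faster RT cores -> "
          f"{overall_speedup(rt_fraction, 4.0):.2f}x overall")

# e.g. with 40% of the frame in traversal, a 4x traversal speedup only buys
# ~1.43x overall, which is roughly why game benchmarks barely move.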
 
ART Mark settings from PCGH:
[PCGH settings screenshot]
To highlight RT core improvement, we would want to turn TAA off and increase those settings to the max, even if the resulting FPS ends up 'unplayable'.
 
ART Mark is interesting there, clearly leveraging some advantage of Ampere's RT implementation over Turing that we don't typically see in games.
This benchmark features perfect mirror reflections and lighting, this lighting is also visible in reflections, so pretty sure this benchmark mostly tests shading performance rather than tracing.
Perfect mirror rays are so cheap that Crytek were able to trace them efficiently in SW on last-gen consoles; obviously, HW RT is still the way to go on PC for the best quality and performance.
Ampere has 2x FP32 SIMDs and 2x L1/texture bandwidth, so no wonder it works way better with shading heavy workloads. Even if there is some divergence due to materials (though all the shiny balls in this demo seem to be using the same materials), more SIMDs would mean better performance on scenes with lots of divergence.
I remember there were in-game Quake II RTX breakdowns of lighting, BVH and other passes somewhere here, and all compute-limited passes were close to 2x faster on the 3090 vs the 6900 XT.
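On the divergence point, a toy Monte Carlo sketch of what material divergence does to per-wave SIMD utilization (assumes a 32-wide wave, uniformly random material assignment and equal-cost material shaders, so the numbers are only illustrative):

import random

# Toy Monte Carlo, illustration only: a 32-wide SIMD wave shades ray hits whose
# materials are assigned uniformly at random. With divergence, the wave runs
# each distinct material's shader serially, so lane utilization drops as the
# material mix grows.
WAVE = 32

def avg_utilization(num_materials, trials=20_000):
    total = 0.0
    for _ in range(trials):
        mats = [random.randrange(num_materials) for _ in range(WAVE)]
        distinct = len(set(mats))
        total += 1.0 / distinct  # 32 useful lanes over distinct * 32 lane-slots
    return total / trials

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} materials: ~{avg_utilization(n):.0%} average SIMD utilization")

# The per-wave penalty is vendor-agnostic; the argument above is simply that a
# GPU with more SIMD/FP32 throughput can absorb the serialization better.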
 
This benchmark features perfect mirror reflections and lighting, this lighting is also visible in reflections, so pretty sure this benchmark mostly tests shading performance rather than tracing.
Seems it even uses shadow maps for shadows (impression from looking at the settings).
Perfect mirror rays are so cheap
I assume the 50 bounces mean reflections of reflections, so that's no longer cheap. Divergence will also increase with each bounce, so it's not a bad test for HW RT.
But idk if paths terminate after hitting some diffuse surface, and how long the average path really is.
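On the "how long is the average path" question, a small sketch (assuming, purely for illustration, that each hit continues as a mirror reflection with some probability p_mirror and terminates otherwise, with the benchmark's 50-bounce cap):

# Toy estimate: each hit continues as a mirror reflection with probability
# p_mirror, otherwise the path terminates (diffuse hit or miss), capped at
# 50 bounces. Gives the expected number of rays traced per pixel.
MAX_BOUNCES = 50

def expected_rays_per_pixel(p_mirror):
    rays = 1.0      # primary ray
    survive = 1.0
    for _ in range(MAX_BOUNCES):
        survive *= p_mirror  # probability the path is still bouncing
        rays += survive
    return rays

for p in (0.3, 0.6, 0.9):
    print(f"p_mirror = {p:.1f}: ~{expected_rays_per_pixel(p):.1f} rays per pixel on average")

# Unless most of the screen is covered by mirrors, the average path is far
# shorter than the 50-bounce cap.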
 
I assume the 50 bounces mean reflections of reflections, so that's no longer cheap. Divergence will also increase with each bounce, so it's not a bad test for HW RT.
It seems this benchmark is built on UE4, so one can easily check with Unreal Unlocker where this benchmark spends most of its time via the "stat GPU" command. Nsight profiling would be even more revealing.
 
This benchmark features perfect mirror reflections and lighting, this lighting is also visible in reflections, so pretty sure this benchmark mostly tests shading performance rather than tracing.
Perfect mirror rays are so cheap that Crytek were able to trace them efficiently in SW on last-gen consoles; obviously, HW RT is still the way to go on PC for the best quality and performance.
Ampere has 2x FP32 SIMDs and 2x L1/texture bandwidth, so no wonder it works way better with shading heavy workloads. Even if there is some divergence due to materials (though all the shiny balls in this demo seem to be using the same materials), more SIMDs would mean better performance on scenes with lots of divergence.
I remember there were in-game Quake II RTX breakdowns of lighting, BVH and other passes somewhere here, and all compute-limited passes were close to 2x faster on the 3090 vs the 6900 XT.
[Quake II RTX per-pass GPU timing chart]


https://forum.beyond3d.com/posts/2185240
 
Wow, seems AMD's BVH build needs some work. Also, the overall loss on denoising is unexpected to me.

Related to RT core benchmarks, Q2 RTX has (or had?) a mode with all mirror surfaces and 10 bounces. It would be ideal because denoising is off.
 