GPU Ray Tracing Performance Comparisons [2021-2022]

That's an odd one, as Nvidia claims multiple times in the Ampere whitepaper that Turing cannot.
Table 4 on page 18 pretty clearly says "Concurrent RT and Shading: NO" for Turing.

I mean...

NVIDIA-GeForce-RTX-30-Tech-Session-00033_FC7A1FBBF69C4EA2982432BF5515DFFE.jpg


This spreadsheet you've attached seems to imply that Turing simply doesn't have enough FP32 compute power to run anything concurrently with RT, which itself also needs FP32 compute to run.
 
NVIDIA-GeForce-RTX-30-Tech-Session-00033_FC7A1FBBF69C4EA2982432BF5515DFFE.jpg
I have been looking for this exact graph for so long. This is not the first time I've seen it, and I remember answering it as though Turing can do this. Good to see my memory wasn't scrambled on this.
 
Thank you @DegustatoR and @Dampf for clearing that up; clearly I needed more coffee this morning.
As a mea culpa, here's the best comparison I've been able to find of ray-box vs ray-triangle rates across the three architectures... although I don't know where the original source data is from.

Looks like the information I found earlier about the ratio of ray-box to ray-tri rates for Turing and Ampere was incorrect, although it does appear that Ampere did double the ray-triangle rate, as the Nvidia whitepaper mentioned, but not the ray-box rate.

 

Attachments

  • media_EjhhJR2UcAITXen.png
ray-box vs ray-triangle rates across the three architectures

Not sure how these are calculated (at least the triangle intersection rate), but if so, it shows why the XSX can't pull away from the PS5 in RT performance; it's really just a 15-18% difference.
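
For what it's worth, here's a minimal sketch of how those theoretical numbers are usually derived for RDNA2 parts. It assumes the often-quoted figure of one ray-triangle or four ray-box tests per Ray Accelerator per clock, and the CU counts and clocks are the published specs (real clocks, especially PS5's variable clock, will move the results around):

```python
# Rough sketch of how theoretical RDNA2 intersection rates are usually derived.
# Assumption: each Ray Accelerator (one per CU) does 1 ray-triangle OR 4 ray-box
# tests per clock; CU counts and clocks below are the commonly published specs.

parts = {
    # name: (compute units, clock in GHz) - illustrative figures
    "PS5":             (36, 2.23),
    "XSX":             (52, 1.825),
    "Navi21 (6900XT)": (80, 2.25),
}

for name, (cus, ghz) in parts.items():
    ray_tri = cus * ghz        # billions of ray-triangle tests per second
    ray_box = cus * ghz * 4    # billions of ray-box tests per second
    print(f"{name:16} ray-tri ~{ray_tri:6.1f} G/s   ray-box ~{ray_box:6.1f} G/s")

# XSX vs PS5: (52 * 1.825) / (36 * 2.23) is about 1.18, i.e. roughly an 18%
# gap, in line with the 15-18% difference mentioned above.
```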
 
ray-box vs ray-triangle rates across the three architectures


That massive difference between Navi21 and the consoles.
 
Nvidia did claim Ampere was almost twice as fast as Turing in the optimal case when doing RT, and there are an awful lot of Ampere-specific enhancements that might come into play in a legacy title 'converted' to path tracing in a way they may not elsewhere.

- GPU Accelerated RT Motion Blur
- Massive FP32 increase (rough numbers sketched after this list)
- RT + Compute concurrency
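
To put a rough number on the FP32 point, here's a back-of-the-envelope comparison, assuming the reference/FE core counts and boost clocks (actual boards boost differently, so treat these as approximations):

```python
# Back-of-the-envelope FP32 throughput: cores * 2 FLOPs per clock (FMA) * clock.
# Core counts and boost clocks are the reference figures; real boards vary.

def fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    return cuda_cores * 2 * boost_ghz / 1000.0

gpus = {
    "RTX 2080 Ti (Turing)": (4352, 1.545),
    "RTX 3080 (Ampere)":    (8704, 1.710),
}

for name, (cores, clock) in gpus.items():
    print(f"{name:22} ~{fp32_tflops(cores, clock):4.1f} TFLOPS FP32")

# Roughly 13.4 vs 29.8 TFLOPS, i.e. more than 2x on paper, though Ampere's
# second FP32 datapath is shared with INT32, so sustained gains are smaller.
```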

These path traced legacy games tend to have a much higher proportion of their frame time spent on RT vs legacy rasterization, so a lot more benefit to be seen.

In a modern AAA title that might only splash a couple of RT 'effects' on top of a render pipeline that's mostly legacy rasterization, if you're only spending, say, 25% of your frame time on RT, then even making RT infinitely fast (adding 0 ms to your frame time) only wins back that 25% of the frame time, which works out to ~33% more FPS.
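
Quick arithmetic on that point, using the hypothetical 25% figure from above:

```python
# Making the RT portion of a frame free only buys back that fraction of the
# frame time, so the FPS gain is capped at 1 / (1 - rt_fraction).

frame_ms = 16.7      # hypothetical total frame time
rt_fraction = 0.25   # hypothetical share of the frame spent on RT

old_fps = 1000.0 / frame_ms
new_fps = 1000.0 / (frame_ms * (1.0 - rt_fraction))  # RT cost driven to 0 ms
print(f"{old_fps:.1f} fps -> {new_fps:.1f} fps (+{(new_fps / old_fps - 1) * 100:.0f}%)")

# About 59.9 fps -> 79.8 fps, i.e. ~+33%. A path-traced title spending, say,
# 75% of its frame on RT would see up to +300% from the same improvement.
```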

Quake 2 RTX on my 3080 spends fully 30-35% of its frametime on denoising (which I believe leans heavily on FP32, giving Ampere a significant advantage) and who knows, they may have managed to take advantage of RT+compute concurrency too.
Can anyone with a Navi21 card post a similar screenshot of the built-in Q2RTX Mini-Profiler? The resolution/settings don't really matter so long as you post what they are; I can capture one from my RTX 3080 to match.
I can't find a single one despite a pretty comprehensive Google and YouTube search, and all of DF's videos (which DEFINITELY would include this information!) seem to be from before AMD GPUs were supported.

I've looked through half a dozen articles where sites posted basic FPS comparison graphs for Nvidia vs AMD cards in Q2RTX, but nobody seems to have captured/posted the frametime breakdown. It might help get to the bottom of just why Navi21 is so slow there, when it isn't in Doom/Serious Sam.
 

Attachments

  • q2rtx-3080.png
Can anyone with a Navi21 card post a similar screenshot of the built-in Q2RTX Mini-Profiler?
Can it be tested with the shareware demo or does it require the full game? I can test it on a 6800 XT if the demo is enough (also, which settings?)
 
why Navi21 is so slow there, when it isn't in Doom/Serious Sam

 
The 3090 is 67% faster than the 6950 XT in Serious Sam: The First Encounter path tracing @4K.


The 3090 is 77% faster than the 6950 XT in Doom path tracing @4K.


What's noticeable in these mods is the poor performance of Turing GPUs; they really appear to be doing something wrong with Turing. The 3070 is more than 65% faster than the 2080 Ti, which should never happen!

Not sure if I'm doing something wrong or if the scenario PCGH used is quite different, but I'm seeing a much bigger disparity when starting a new game: the 6800 XT is in the low 40s while the 3090 is over 110, at 2560x1080.

The 6800 XT is paired with a 12700KF but isn't maxing out its power draw, only 200-230 W, while the 3090 has no issue hitting its max power with an OG Ryzen 1600. There are also some glitches on the 6800 XT in the opening cinematic for some reason, and the game is quite laggy on controls even with the 3090.
 
Can anyone with a Navi21 card post a similar screenshot of the built-in Q2RTX Mini-Profiler?
You can use my video to compare. I just did a rerun with the same settings, and there is no difference in FPS compared to the driver I was on when I recorded that video.
 
the ratio of ray-box to ray-tri rates for Turing and Ampere
The Navi21 numbers are absolute theoreticals that have nothing to do with reality here; the RT cores in Navi21 are shared with the TMUs, which means their actual throughput is always significantly less than their theoretical capability. The RT cores also accelerate only a portion of the RT workload; the rest is done on the regular shader cores, which means that portion is shared as well.
why Navi21 is so slow there, where it isn't in Doom/Serious Sam.
Navi21 is slow in these two games as well, and it's also slow in Minecraft path tracing; it's the Turing cards that are the outliers here, not Navi21.
 
The Navi21 numbers are absolute theoreticals that have nothing to do with reality here; the RT cores in Navi21 are shared with the TMUs, which means their actual throughput is always significantly less than their theoretical capability.
As I understand it, in a conventional DXR 1.0 ray tracing shader there is no texturing. So the ray accelerator hardware has the TMU addressing, fetching, caching and data paths all to itself.
 
Not sure if I'm doing something wrong or if the scenario PCGH used is quite different

Going back to it, I realize that they are using FSRQ on AMD and DLSSQ on Nvidia for their benchmarks, not just showing the differences on the 3090 with different upscaling solutions.

The 6800 XT numbers then line up; surprisingly, FSRUQ by itself gives a huge 60-70% increase. However, the Nvidia numbers are still strange: I get around 140 fps without DLSS, and with DLSSQ it's over 200. This is at 1920x1080.
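
If FSRUQ here is FSR 1.0's Ultra Quality preset, that 60-70% gain lines up roughly with the reduction in rendered pixels. A minimal sketch, assuming the published FSR 1.0 per-axis scale factors and treating the FPS gain as an upper bound for a purely resolution-bound workload:

```python
# FSR 1.0 preset scale factors (per axis). The implied FPS gain is an upper
# bound that assumes performance scales linearly with rendered pixel count.

presets = {
    "Ultra Quality": 1.3,
    "Quality":       1.5,
    "Balanced":      1.7,
    "Performance":   2.0,
}

out_w, out_h = 1920, 1080  # output resolution used in the post above

for name, scale in presets.items():
    render_w, render_h = round(out_w / scale), round(out_h / scale)
    pixel_ratio = (render_w * render_h) / (out_w * out_h)
    print(f"{name:13} renders {render_w}x{render_h} "
          f"(~{pixel_ratio:.0%} of the pixels, up to ~{1 / pixel_ratio - 1:+.0%} fps)")

# Ultra Quality renders ~1477x831, about 59% of the pixels, so up to roughly
# +69% fps if entirely resolution-bound, which matches the observed 60-70%.
```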
 
So, I remember speculation during the Turing generation here about whether and when RT only games would start to appear. We're now getting close to the 2nd generation after Turing (so 3rd generation RT hardware on NV's side) and the only game requiring RT (that I'm aware of) is an enhanced edition of a game from 2019 (development started in 2014) which also has a version that doesn't require RT. So technically you don't need RT to play the game but you do need RT to play that version of the game.

I do wonder now whether we'll see any AAA developer even attempt a game that requires RT until the next console generation, as AAA game revenue and budgeting is so reliant on consoles, and an RT-only game on console is likely going to look like a "last gen" game with RT, like Metro: Exodus. IE - we may not see a AAA developer attempt an RT-only game until somewhere in the 2025-2030 timeframe (a potential new console generation), and more likely towards the latter part of that, as I'm not currently aware of any AAA game in development that will require RT. Obviously, that doesn't mean there isn't one, as I don't track all AAA games in development.

I do wonder if a non-AAA developer who might be less reliant on console gaming income would be more inclined to attempt a game that requires RT hardware in order to run, but then non-AAA developers are also more reliant on making their games available to the widest possible spectrum of hardware in order to attempt to get as many buyers as they can considering their limited consumer reach compared to AAA devs/publishers.

For me, that seems unfortunate as I'm really interested to see what a game would look like if it was crafted from the ground up with no requirement for any form of legacy lighting. I don't care about reflections nearly as much as I do about lighting, so I don't really care if they would still use legacy reflection techniques to save performance. BTW - when I reference RT only for a game, that's how I'm thinking about it. All the lighting in game done via hardware accelerated RT, but not reflections. Other people might also have reflections be included in that. :) Reflections might be more difficult to pull off since not only are they more immediately noticeable, but it's also much easier to immediately notice how they don't look right if there are multiple reflective surfaces and reflections of reflective surfaces don't themselves show reflections. If I'm going to be seeing jarring (to me) discontinuities in the presentation of reflections, I'll personally just disable them or reduce their quality (IE - non-RT reflections) so that performance can go towards something more important to me, lighting.

Metro: Exodus, while giving some hints, isn't a great showcase for that due to the game's low geometric detail combined with uneven use of the global RT lighting. For example, the lack of cinematic lighting compared to the base game meant the lighting looked natural thanks to the global RT lighting, but also flat in cinematic scenes, since there was no attempt at cinematic-style lighting. This was most noticeable on character faces during conversations, where they might often not be in direct lighting or might even be in shadow, making it harder to see facial detail or expressions. This is something I expect the art director and lighting teams will address once they have to contend with an RT-only lighting path, and I'm really excited to see what they do there. And the low geometric detail combined with fully RT'd lighting is itself jarring, as the RT lighting exposes just how low-poly everything is and thus reduces the graphical impact of the game.

I suspect that 4A Games (borderline a high-AA developer in terms of budget) might be the only AAA developer to attempt a game that requires RT in order to have fully global RT lighting and visuals designed from the ground up for it. Although I do wonder if their need to sell well on consoles will make them second-guess themselves on that. After all, will it be able to visually match and distinguish itself from the competition on consoles, who would be able to dedicate more of their performance budget to other, potentially more noticeable, graphical improvements?

In many ways, it's unfortunate that the current generation of consoles is basically using first-generation RT hardware, which will limit developers' ability to create a game from the ground up that relies on global RT illumination. OTOH - it's probably still better than no RT support, as even the limited RT hardware acceleration that the current-gen consoles have is enough to at least let developers experiment with limited RT effects, which can only help when the next generation of consoles comes out and they can start to think about creating a game centered around hardware-accelerated global RT lighting combined with high-density environmental detail.

Regards,
SB
 
Seeing that RDNA2 isn't even up to par with 2018's Turing in RT, we're indeed not going to see many RT-based games this generation unless another Crysis happens. It's practically the same story for AI acceleration, as I like to think we can do so much more with the tech than reconstruction. Also, CPU power: UE5 is so far proving to be a true CPU hog, and we have seen serious IPC improvements after Zen 2 (Alder Lake, Zen 3, and soon their successors), as well as 12 or more cores to play with (efficiency cores/performance cores).
We also see quite large jumps in normal raster and compute/FP32 performance. While RT obviously is the future, compute power is still needed, and well north of 20 TFLOPs of raw power (probably double that soon) is still interesting, especially if we can combine it with NVMe speeds that reach well above 20 GB/s with compression.

But who knows, maybe scaling is the future. It does seem to be, even across more advanced and wider system performance deltas. MS is basically already doing that in the console space (a bit off-topic, but ok).
 
they really appear to be doing something wrong with Turing; the 3070 is more than 65% faster than the 2080 Ti
This can actually happen because Ampere has been tweaked to minimize perf holes that were there with Turing. Generally, one of the perf optimizations for Turing was to simplify closest-hit shaders for GI/reflections, since lots of materials and complex lighting in reflections can cause not just divergence but also thrashing of instruction caches, which I guess is to blame for the low perf on Turing here.
 