GPU Ray Tracing Performance Comparisons [2021-2022]

OK, I went and checked out Darktide. I was slightly CPU-bound with DLSS on (about 93% GPU usage), so I turned DLSS off for the tests in the hub.

All non-RT settings on High.
Native 1440p: No RT 83 fps, Low RTGI 70 fps, High RTGI 71 fps.

Yeah, I know. I double-checked but still got numbers that were basically identical. I suspect there won't be a performance difference until I can find some dynamic lights. I tried in a mission, but I struggled to find something I could get an apples-to-apples comparison on.
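For what it's worth, looking at those numbers as frame times instead of fps makes the RTGI cost (and why a 1 fps gap between Low and High is basically noise) a bit clearer. Just arithmetic on the figures above, nothing measured beyond them:

```python
# Quick fps -> frame-time conversion for the Darktide numbers above.
# Purely illustrative arithmetic; the fps values are the ones I measured.

def frame_time_ms(fps: float) -> float:
    """Average frame time in milliseconds for a given average fps."""
    return 1000.0 / fps

results = {"No RT": 83, "Low RTGI": 70, "High RTGI": 71}
baseline = frame_time_ms(results["No RT"])

for setting, fps in results.items():
    ft = frame_time_ms(fps)
    print(f"{setting:9}: {fps} fps = {ft:.2f} ms ({ft - baseline:+.2f} ms vs No RT)")

# No RT    : 83 fps = 12.05 ms (+0.00 ms vs No RT)
# Low RTGI : 70 fps = 14.29 ms (+2.24 ms vs No RT)
# High RTGI: 71 fps = 14.08 ms (+2.04 ms vs No RT)
# The ~0.2 ms difference between Low and High RTGI is well within run-to-run noise.
```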

Overall, the improvement in performance with both RT effects on is massive. I played through a mission with DLSS Quality, RTGI High, and RT Reflections Low, and I only dropped below 60 fps once, when a Charger kind of sent my camera crazy. Completely playable with RT on for me now; most of the time it was comfortably above 60. Going to have to find some time to play this now.

[Attachment: darktide 1440p highrtgi in mission.jpg]
 
^^ There are no lower settings; ray tracing is either on or off. GI is the most expensive effect, the others cost much less. It's also the most visually impressive, depending on the area of course; after GI, shadows make the largest visual impact. AO has a near-zero performance hit from what I saw. I would say outside of the cities, in nature, GI and shadows make the most visual impact. In the cities, AO might be more noticeable, but I didn't bother to check it individually.

The performance in Novigrad and the other cities is atrocious.



[Comparison screenshots: AO vs. no AO, and no shadows]


I would not replay this game unless you have a 4090; it just runs too badly.
When I first confronted the beast in the Blood and Wine expansion, I was awed by the lighting inside the building where the fight takes place. I was equally awed by some other moments in the game where the lighting looked so close to real life.

On another note, has anyone tried Portal RTX in Vulkan mode with an Nvidia card? Having Vulkan on is the only way for now to get it working on my computer's GPU, but they say that even on Nvidia cards the game is graphics glitches galore when running on Vulkan. I am interested in that game; it'd be my first time playing a Portal game.
 
When the fps is that low, it doesn't mean much. The most shocking part is the 3090 Ti at 10 fps and the 4090 at 45.
I dunno. I am still impressed with how much AMD has improved despite being a generation behind in performance due to not having all the fancy additions Nvidia has (and obviously starting a generation later with RT). Eventually they will get to where Nvidia is, although by that point Nvidia will be even further ahead.

For me, though, I am not looking to compare them; I'm more interested in the point when both vendors reach performance where RT is fully doable in the industry at large. That's when full adoption of these kinds of technologies starts. In that sense, AMD is making good progress.
 
In addition to my previous post, @Dictator seems to think a big part of Portal's performance on AMD isn't even on the AMD side, but rather that it simply isn't optimized for anything beyond Nvidia GPUs in the first place. So it's probably not even a fair comparison.

How do you optimize path tracing for AMD? This is just pure brute force, and AMD has far fewer cores than Nvidia. The RT cores alone do more work, and they do it faster than AMD's compute units... and they can run concurrently with the CUDA cores on Nvidia hardware. Then there is the better BVH compression, better async functionality, better overall compute architecture, etc.
 
How do you optimize path tracing for AMD? This is just pure brute force, and AMD has far fewer cores than Nvidia. The RT cores alone do more work, and they do it faster than AMD's compute units... and they can run concurrently with the CUDA cores on Nvidia hardware. Then there is the better BVH compression, better async functionality, better overall compute architecture, etc.
I don't know. I am just parroting Alex. He would know a lot more than I do about this kind of thing.
 
How do you optimize path tracing for AMD? This is just pure brute force, and AMD has far fewer cores than Nvidia. The RT cores alone do more work, and they do it faster than AMD's compute units... and they can run concurrently with the CUDA cores on Nvidia hardware. Then there is the better BVH compression, better async functionality, better overall compute architecture, etc.

The ReSTIR stuff that Portal is using is also very shader-heavy; it's not all about casting rays. It's very likely that a bit of optimization would greatly benefit RDNA.
 
The ReSTIR stuff that Portal is using is also very shader-heavy; it's not all about casting rays. It's very likely that a bit of optimization would greatly benefit RDNA.
In other path-traced games like Minecraft RTX and Quake 2 RTX, the 3090 Ti is about 40% to 50% faster than the 7900 XTX.

And considering the 3090 Ti scores 10 fps here, I think even after optimizations the 7900 XTX would go from 6 fps to maybe 7 fps? The 6950 XT would definitely see the greatest benefit here, going from sub-1 fps to maybe 3 or 4 fps?

Portal RTX, 4K native:
6950 XT: 0.4 fps
7900 XTX: 6 fps
3090 Ti: 10 fps
4080: 30 fps
4090: 45 fps
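Rough back-of-the-envelope for that estimate, just dividing the 3090 Ti's Portal number by the 40-50% lead it has in the other path-traced games (no new data, only the figures quoted above):

```python
# Back-of-the-envelope estimate: where the 7900 XTX "should" land in Portal RTX
# if the gap matched other path-traced titles (3090 Ti roughly 40-50% faster).
# Purely illustrative; the fps figures are the ones listed above.

fps_3090ti_portal = 10.0            # measured in Portal RTX, 4K native
lead_low, lead_high = 1.40, 1.50    # 3090 Ti lead over 7900 XTX in Minecraft/Quake 2 RTX

optimistic = fps_3090ti_portal / lead_low
pessimistic = fps_3090ti_portal / lead_high

print(f"Expected 7900 XTX range: {pessimistic:.1f} to {optimistic:.1f} fps")
# Expected 7900 XTX range: 6.7 to 7.1 fps
# i.e. even a fully "fair" result only moves it from the 6 fps above to roughly 7 fps.
```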
 
In other path-traced games like Minecraft RTX and Quake 2 RTX, the 3090 Ti is about 40% to 50% faster than the 7900 XTX.

And considering the 3090 Ti scores 10 fps here, I think even after optimizations the 7900 XTX would go from 6 fps to maybe 7 fps? The 6950 XT would definitely see the greatest benefit here, going from sub-1 fps to maybe 3 or 4 fps?

Yeah we’re talking percentages here. Clearly the absolute fps numbers are pointless on most cards, even from Nvidia.
 
Yeah we’re talking percentages here. Clearly the absolute fps numbers are pointless on most cards, even from Nvidia.
Yeah, but percentages at native 4K nonetheless, and since DLSS is essential for the experience here, I'd guess a small percentage gain at native 4K would yield much bigger gains at the resolutions DLSS actually renders at: 1080p (DLSS Performance) or 1440p (DLSS Quality).
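To put rough numbers on that: for a 4K output, DLSS Quality renders internally at about 1440p and DLSS Performance at about 1080p, so the per-pixel ray-tracing work is a fraction of native 4K to begin with. A quick pixel-count sketch (fixed per-frame costs like BVH builds obviously don't scale with resolution):

```python
# Rough pixel-count comparison: native 4K vs the internal resolutions DLSS uses
# for a 4K output (Quality ~ 1440p, Performance ~ 1080p). Per-pixel ray-tracing
# cost scales roughly with pixel count; fixed costs (BVH builds etc.) do not.

resolutions = {
    "Native 4K":                  (3840, 2160),
    "DLSS Quality (~1440p)":      (2560, 1440),
    "DLSS Performance (~1080p)":  (1920, 1080),
}

native_pixels = 3840 * 2160
for name, (w, h) in resolutions.items():
    share = (w * h) / native_pixels
    print(f"{name:27}: {w * h / 1e6:.2f} MPix ({share:.0%} of native 4K)")

# Native 4K                  : 8.29 MPix (100% of native 4K)
# DLSS Quality (~1440p)      : 3.69 MPix (44% of native 4K)
# DLSS Performance (~1080p)  : 2.07 MPix (25% of native 4K)
```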
 
The cost of Witcher 3 with just RTXGI on is in the over-200% range: from 119 fps (GPU limited) down to 28 fps.

Curiosity got the better of me. The first scene in the game is a good RT test, and I measured the following frame times on my 3090 @ 3840x1600.

DX12, RT off: 8.1 ms
GI: +14.7 ms
AO: +2.3 ms
Shadows: +4.3 ms
Reflections: +4.3 ms

The GI cost includes all of the fixed RT overhead of BVH construction etc., but it is still very, very high. In the original RTXGI presentation, a 2080 Ti needed 2.5 ms to trace and update 16,000 probes with 144 rays per probe, which is pretty intense. Something is truly borked with RTXGI on PC in TW3.

[Attachment: rtxgi-perf.PNG]
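Two quick sanity checks on those figures, using only the numbers already quoted in this thread (and assuming the 119 -> 28 fps drop mentioned above comes from a comparable scene and system):

```python
# Sanity checks on the Witcher 3 RT frame-time figures above.
# Inputs are the numbers quoted in this thread; nothing new is measured here.

# 1) Do the per-effect costs roughly add up to the quoted fps drop (119 -> 28 fps)?
base_ms = 8.1                                               # DX12, RT off
rt_ms = {"GI": 14.7, "AO": 2.3, "Shadows": 4.3, "Reflections": 4.3}
total_ms = base_ms + sum(rt_ms.values())
print(f"All RT on: {total_ms:.1f} ms -> {1000 / total_ms:.1f} fps")
print(f"Quoted fps as frame times: 119 fps = {1000 / 119:.1f} ms, 28 fps = {1000 / 28:.1f} ms")
# All RT on: 33.7 ms -> 29.7 fps  (same ballpark as the quoted 28 fps)
# Quoted fps as frame times: 119 fps = 8.4 ms, 28 fps = 35.7 ms

# 2) How heavy was the RTXGI workload in the original presentation?
probes, rays_per_probe, time_ms = 16_000, 144, 2.5          # 2080 Ti figures
rays = probes * rays_per_probe
print(f"{rays / 1e6:.2f} M rays in {time_ms} ms ~ {rays / (time_ms / 1000) / 1e9:.2f} G rays/s")
# 2.30 M rays in 2.5 ms ~ 0.92 G rays/s, which makes a +14.7 ms GI cost
# on a 3090 look even more out of line.
```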
 
Curiosity got the better of me. The first scene in the game is a good RT test, and I measured the following frame times on my 3090 @ 3840x1600.

DX12, RT off: 8.1 ms
GI: +14.7 ms
AO: +2.3 ms
Shadows: +4.3 ms
Reflections: +4.3 ms

The GI cost includes all of the fixed RT overhead of BVH construction etc., but it is still very, very high. In the original RTXGI presentation, a 2080 Ti needed 2.5 ms to trace and update 16,000 probes with 144 rays per probe, which is pretty intense. Something is truly borked with RTXGI on PC in TW3.

[Attachment: rtxgi-perf.PNG]
Thank you for measuring in ms; that is indeed very helpful. Great analysis.
Over 14 ms just for GI is bonkers. For comparison, Lumen on the Epic scalability preset has a budget of 8 ms for high-quality GI and reflections; on High scalability, it's half of that.

However, it is apparently not borked, as CDPR is not listing this as a bug despite everyone telling them how much RT performance sucks. Sadly, this seems to be the expected performance from Nvidia and CDPR.
 
Curiosity got the better of me. The first scene in the game is a good RT test, and I measured the following frame times on my 3090 @ 3840x1600.

DX12, RT off: 8.1 ms
GI: +14.7 ms
AO: +2.3 ms
Shadows: +4.3 ms
Reflections: +4.3 ms

The GI cost includes all of the fixed RT overhead of BVH construction etc., but it is still very, very high. In the original RTXGI presentation, a 2080 Ti needed 2.5 ms to trace and update 16,000 probes with 144 rays per probe, which is pretty intense. Something is truly borked with RTXGI on PC in TW3.

[Attachment: rtxgi-perf.PNG]
I am going out on a limb here and am gonna say the GI cost is actually normal, but your number there in milliseconds is predominantly the BVH build/refit cost on the GPU, which is probably huge in TW3.
If you look at the RT shadow and RT reflection application distance, it is massive in TW3. It includes very expensive things too, like trees, grass, and bushes with their leaves.

Perhaps not a big deal, but! Since it is DX11 on 12, it might not use async compute... which means no hiding the GPU BVH build/refit costs.

Anyone wanna run a trace to see if async compute is in use? I am getting the feeling it might be a "no".
 
I am going out on a limb here and am gonna say the GI cost is actually normal, but your number there in milliseconds is predominantly the BVH build/refit cost on the GPU, which is probably huge in TW3.
If you look at the RT shadow and RT reflection application distance, it is massive in TW3. It includes very expensive things too, like trees, grass, and bushes with their leaves.

Perhaps not a big deal, but! Since it is DX11 on 12, it might not use async compute... which means no hiding the GPU BVH build/refit costs.

Anyone wanna run a trace to see if async compute is in use? I am getting the feeling it might be a "no".

That's a solid theory, but it doesn't explain why the cost is still so high indoors with static geo.
 
That's a solid theory, but it doesn't explain why the cost is still so high indoors with static geo.

Edit - OK, someone else went through it with Nsight and yeah, it looks like the BVH construction time is the major downside here, up to 15 ms on a 5800X3D/3090.

If it weren't for that, the performance would be pretty typical "tacked-on Nvidia SDK" RT stuff, not focused on optimization, but workable. Looks like CDPR's quick-and-dirty 11-on-12 hack has borked performance. I'm guessing the interior/exterior thing doesn't matter because it's a seamless open world (well, per map) and there are still animated characters doing stuff and foliage outside.
 
Since it is DX11 on 12, it might not use async compute... which means no hiding the GPU BVH build/refit costs. Anyone wanna run a trace to see if async compute is in use? I am getting the feeling it might be a "no".

BVH build takes 1.14 ms on my card, so it's not that. It runs async with a lot of other work on the fixed-function pipeline, and async compute is in full swing, so that's not the problem either. According to Nsight, the entire RT command list completes in under 3.2 ms, and that's in parallel with other graphics work.

I'm baffled as to where the extra 14 ms is coming from.

[Attachment: nsight-tw3.PNG]

Edit - OK, someone else went through it with Nsight and yeah, it looks like the BVH construction time is the major downside here, up to 15 ms on a 5800X3D/3090.

If it weren't for that, the performance would be pretty typical "tacked-on Nvidia SDK" RT stuff, not focused on optimization, but workable. Looks like CDPR's quick-and-dirty 11-on-12 hack has borked performance. I'm guessing the interior/exterior thing doesn't matter because it's a seamless open world (well, per map) and there are still animated characters doing stuff and foliage outside.

Definitely not seeing 15ms build times on my card.
 
BVH build takes 1.14 ms on my card, so it's not that. It runs async with a lot of other work on the fixed-function pipeline, and async compute is in full swing, so that's not the problem either. According to Nsight, the entire RT command list completes in under 3.2 ms, and that's in parallel with other graphics work.

I'm baffled as to where the extra 14 ms is coming from.

[Attachment: nsight-tw3.PNG]

Definitely not seeing 15ms build times on my card.

Weeeird, no idea then. Got it from here, but maybe they were misreading it? I dunno.
 