GPU Ray Tracing Performance Comparisons [2021-2022]

What performance bug are people encountering here with DLSS, exactly? I'm not sure if I've "had that" yet.
From the above Hitman 3 review, it seems to occur randomly.
Now as we said in our previous article, we encountered a DLSS glitch/bug which resulted in really low performance. We were able to replicate this by enabling/disabling Ray Tracing (without making any changes at all to DLSS). This issue appears randomly, so we don't know what is really causing it. Still, if you ever experience it, you can fix it by disabling and then re-enabling DLSS.
 
Not a good look that Nvidia brags about the DLSS performance improvement so much here. This is nothing to brag about. RT-enabled games should run fine without DLSS and very smoothly with it on, not run like a slideshow without DLSS and barely acceptably with it on. This is ruining the reputation of both their cards and ray tracing in general. These effects probably run at full resolution, unnecessarily so.

This is a disaster IMO. Thankfully it's an older game, so this might get overlooked.
I'd say it depends on what improvement you get from using RT. Here specifically it just isn't worth the performance hit.
This situation isn't helped by the weird settings UI, which nowhere explains that the "reflections quality" option also controls RT reflections. So most people just turn on RT and get their 30 fps with it as a result, while it's perfectly fine to use RT even with "low" reflections quality here IMO. The biggest visual impact of reflections, on transparent glass surfaces, is already there on "low".

It's difficult to know if DLSS/P is functioning normally, since the DLSS glitch/bug that results in very low performance is still evident in this game.
I haven't seen any issues with DLSS functioning normally here. DLSS scaling is in line with what you'd expect from internal resolution changes.
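For reference, a quick sketch of the internal render resolutions implied by the commonly documented DLSS 2.x scale factors at 4K output (exact in-game pixel counts can differ by a pixel due to rounding):

```python
# Internal render resolutions implied by the standard DLSS 2.x scale
# factors at 3840x2160 output. Exact in-game values may round differently.
output_w, output_h = 3840, 2160
modes = {
    "Quality": 2 / 3,           # ~66.7% per axis
    "Balanced": 0.58,
    "Performance": 0.50,        # DLSS/P
    "Ultra Performance": 1 / 3,
}
for mode, scale in modes.items():
    print(f"{mode}: {round(output_w * scale)}x{round(output_h * scale)}")
# Quality: 2560x1440, Balanced: 2227x1253,
# Performance: 1920x1080, Ultra Performance: 1280x720
```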
 
The 3090 is 67% faster than 6950XT in Serious Sam First Encounter Path Tracing @4K.


The 3090 is 77% faster than 6950XT in Doom Path Tracing @4K.


What's noticeable in these mods is the poor performance of Turing GPUs; they really appear to be doing something wrong on Turing, as the 3070 is more than 65% faster than the 2080Ti, which should never happen!
 

Nvidia did claim Ampere was almost twice as fast as Turing in the optimal case when doing RT, and there are a lot of Ampere-specific enhancements that might come into play in a legacy title 'converted' to path tracing in ways they wouldn't elsewhere:

- GPU Accelerated RT Motion Blur
- Massive FP32 increase
- RT + Compute concurrency

These path-traced legacy games tend to spend a much higher proportion of their frame time on RT vs legacy rasterization, so there's a lot more benefit to be seen.

In a modern AAA title that might only splash a couple of RT 'effects' on top of a render pipeline that's mostly legacy rasterization, if you're only spending, let's say, 25% of your frame time on RT, then even if you make RT infinitely fast (adding 0ms to your frame time) you still only get ~33% more FPS.
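To put that math in one place (a back-of-the-envelope sketch; the frame-time shares and speedups are assumed figures, not measurements):

```python
# Amdahl's-law style estimate: FPS gain from speeding up only the RT
# portion of the frame. All shares/speedups below are illustrative.

def fps_gain(rt_share: float, rt_speedup: float) -> float:
    """Fractional FPS gain when the RT share of frame time becomes
    rt_speedup times faster; the rest of the frame is untouched."""
    new_frame_time = (1 - rt_share) + rt_share / rt_speedup
    return 1 / new_frame_time - 1

# Hybrid AAA title: RT is 25% of frame time, RT made infinitely fast.
print(f"{fps_gain(0.25, float('inf')):.0%}")  # -> 33%

# Path-traced legacy title: RT is 75% of frame time, RT merely doubled.
print(f"{fps_gain(0.75, 2.0):.0%}")           # -> 60%
```

Which is why a doubling of RT throughput shows up much more clearly in these path-traced mods than in hybrid titles.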

Quake 2 RTX on my 3080 spends fully 30-35% of its frametime on denoising (which I believe leans heavily on FP32, giving Ampere a significant advantage), and, who knows, they may have managed to take advantage of RT + compute concurrency too.
 

Attachments

  • q2rtx.png (218.7 KB)
The 2080Ti is so far behind the 6900XT and 6800XT as well, which just doesn't happen elsewhere. It's certainly not the norm in either Quake 2 or Minecraft path tracing, where the 2080Ti dominates RDNA2 GPUs by a significant margin. Something is wrong in these two mods (Doom and Serious Sam) that causes Turing GPUs to exhibit awful performance.
 
That's a good point; there's clearly something different about the way those two titles are implemented, although it may help to look at it from the opposite viewpoint: don't ask 'why did they nerf poor Turing', but rather what they are leveraging that both RDNA2 and Ampere happen to be better at than Turing.

One such thing I was able to find:

"Ampere RT doubled ray-triangle performance vs Turing, which according to some evidence suggest it runs 2 ray-box and 1 ray-triangle per clock.
Ampere RT now does 2 ops/clk for both.
RDNA 2 RT has 4 ray-box and 1 ray-triangle per clock. Thus, RDNA2 has 2x faster ray-box, but only 50% ray-triangle vs Ampere."

I wasn't able to find concrete numbers from NVidia about Turing's ray-box rate per clock, but RDNA2's ray-box rate is definitely 4x its ray-triangle rate.
If Doom or Serious Sam were heavily leveraging ray-box intersections, and RDNA2 really is that much faster at them per clock than Turing, the results may make a bit more sense.
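As a very rough sketch of why that could matter, here's peak ray-box throughput if you take the per-clock rates quoted above at face value and multiply by approximate unit counts and boost clocks (the Turing per-clock rate in particular is unconfirmed; all of these figures are assumptions for illustration):

```python
# Paper peak ray-box throughput = RT units x clock x ray-box per clock.
# Per-clock rates are the unconfirmed figures quoted above; unit counts
# and boost clocks are approximate.
gpus = {
    #          (RT units, clock GHz, ray-box/unit/clock)
    "2080 Ti": (68, 1.6, 2),    # Turing rate is speculation, per the quote
    "3090":    (82, 1.7, 2),
    "6900 XT": (80, 2.25, 4),
}
for name, (units, ghz, box_per_clk) in gpus.items():
    print(f"{name}: ~{units * ghz * box_per_clk:.0f} G ray-box/s")
# 2080 Ti: ~218, 3090: ~279, 6900 XT: ~720
```

On those (shaky) numbers, RDNA2's paper ray-box rate comes out well ahead, which would fit the pattern if these two mods are unusually box-test heavy.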
 

Attachments

  • rdna2-rb.PNG (157.3 KB)
"Ampere RT doubled ray-triangle performance vs Turing, which according to some evidence suggest it runs 2 ray-box and 1 ray-triangle per clock.”

Out of curiosity, what's the evidence for 2 box intersections per clock? Nvidia's RT patents refer to 8 boxes per clock. Either way, the BVH should be relatively simple given the low-density geometry and low-detail environment being rendered. Is the BVH build being done on the CPU or the GPU?

It doesn't really make sense for RDNA 2 to gain ground on Turing in ray-traced versions of old games. RT should be an even greater percentage of the workload in such games, given the simple geometry and shading, and should therefore play to Turing's RT strengths.
 

This reminds me of the good old days before RDNA2 launched, when we were all comparing the XSX's advertised intersection rates (or some other paper RT metric) and some were concluding it was going to run rings around any Turing-based GPU.

Ahhh good times.
 

I've been trying to find details, but NV is pretty tight-lipped in their documentation.

Starting on page 17 of the Ampere whitepaper, they talk about the doubled ray-triangle intersection rate and are very careful not to mention a ray-box intersection rate increase anywhere, which leads me to believe the ray-box rate is the same as Turing's. Still trying to find concrete information on Turing's rate for each, though.

They also talk about Ampere's concurrent compute and RT capability on the next page, specifically the case where you do RT and denoising concurrently. Looking at the Quake 2 RTX frametime profile I posted above, an engine that leveraged that concurrency properly would see an enormous speedup, given that there looked to be at least 3ms each of denoising and RT in a 12ms frame. According to Nvidia, Turing simply isn't capable of this. They show their best-case example of the speedup from concurrent compute/RT, and it's nearly double. If RDNA2 is capable of the same thing, it's not too hard to envision a scenario where we get results like those in Doom/Serious Sam.
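As a toy model of what that overlap could buy (the 12ms frame with ~3ms each of RT and denoising is just my eyeballed reading of the Quake 2 RTX capture above, and perfect overlap is an idealization):

```python
# Toy frame-time model: serial RT + denoise vs. ideal full overlap.
# The split is eyeballed from the Quake 2 RTX capture, not measured.
other_ms, rt_ms, denoise_ms = 6.0, 3.0, 3.0    # assumed 12 ms frame total

serial = other_ms + rt_ms + denoise_ms          # run one after the other
overlap = other_ms + max(rt_ms, denoise_ms)     # RT and denoise concurrent

print(f"serial:  {serial:.0f} ms ({1000 / serial:.0f} fps)")
print(f"overlap: {overlap:.0f} ms ({1000 / overlap:.0f} fps)")
print(f"speedup: {serial / overlap:.2f}x")      # -> 1.33x for this split
```

Even this crude model shows a healthy uplift, and the larger the RT/denoise share of the frame, the closer the gap gets to Nvidia's near-2x best case.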
 
Turing is fully capable of running compute concurrently with RT.
What Ampere adds is the ability to run compute, tensor ops and RT concurrently.
I'm assuming that Nvidia is talking about denoising done on the tensor cores in this example. But that's hardly relevant to any real-world usage scenario, because no one is running denoising on the tensor cores due to the compatibility issues that would arise from it.
 
That's an odd one, as Nvidia claims multiple times in the Ampere whitepaper that Turing cannot.
Table 4 on page 18 pretty clearly says "Concurrent RT and Shading: NO" for Turing.

 

Attachments

  • ampere-concurrency.PNG (67 KB)
  • turing-rt-featurematrix.PNG (86.1 KB)