GPU Ray Tracing Performance Comparisons [2021-2022]

So I tried disabling Motion Blur in Portal RTX on a 2080 Ti: no change in fps, so it's not really accelerated by the RT cores.

NVIDIA says that Portal RTX differs from Minecraft RTX and Quake 2 RTX in two key ways: light bounces up to four times more, and new technologies such as ReSTIR are incorporated. I also think the number of materials has increased substantially, which suits Ampere and Ada hardware better.
 
Yeah, I think the art direction was sacrificed for this. But that's kind of true of most mods for games, so I'm not totally against what Remix tries to do.
 
What a joke... it seems it's been created just for marketing purposes.
I don't think NVIDIA should do AMD's work for them?

AMD's performance in Vulkan RT is notably lower than in DXR, which is low to begin with, especially in a more complex path-traced game where even Turing struggles. Turing is significantly faster than RDNA2 in Quake 2 RTX and Minecraft RTX, and Ampere is faster than Turing. In Portal RTX Turing falls so far behind Ampere that it's only natural RDNA2 falls even lower.
In Minecraft RTX, the 2080 Ti is 30% faster than the 6900 XT, and the 3070 is 66% faster.
In Quake 2 RTX, the 2080 Ti is 10% faster than the 6900 XT, and the 3070 is 35% faster.

https://www.comptoir-hardware.com/a...-test-nvidia-geforce-rtx-3070-ti.html?start=5

You can follow the discussion about this in the other thread, but basically this uber shader (which Portal RTX uses) causes low occupancy on AMD hardware due to divergent shading and secondary ray bounces (of which Portal RTX has an abundance, more than any other path-traced game so far), especially with AMD's software traversal.

Guess this uber shader doing badly on AMD is exactly in line with what I said before: "'uber-materials' cause low occupancy and divergent shading, that's why inlining imposes restrictions in practice".

It obviously includes tons of TraceRay() calls. Given that these calls are mostly for divergent secondary bounce rays, low occupancy is something I would expect on AMD hardware with SW traversal.
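To illustrate what I mean, a minimal DXR closest-hit sketch (hypothetical names, not actual Portal RTX code): every hit spawns another TraceRay() for the next bounce, and since the bounce direction is effectively random per thread, neighbouring lanes diverge fast.

struct Payload { float3 radiance; uint depth; };

RaytracingAccelerationStructure gScene : register(t0);

[shader("closesthit")]
void MainCHS(inout Payload p, in BuiltInTriangleIntersectionAttributes attr)
{
    if (p.depth >= 4) return;                      // the "up to four bounces" limit

    RayDesc bounce;
    bounce.Origin    = WorldRayOrigin() + WorldRayDirection() * RayTCurrent();
    bounce.Direction = SampleHemisphere(p.depth);  // hypothetical sampler; per-lane random direction
    bounce.TMin      = 1e-3f;
    bounce.TMax      = 1e4f;

    p.depth++;
    // Recursive TraceRay(): with HW traversal this goes to the RT cores,
    // with SW traversal it is just more shader code competing for occupancy.
    TraceRay(gScene, RAY_FLAG_NONE, 0xFF, 0, 1, 0, bounce, p);
}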

Even the 4090 spends the lion's share of its time in TraceRay(), and it doesn't suffer from the low-occupancy issue since traversal is in HW. There must be an order-of-magnitude difference or more between something like a 4090 with SER and OMMs and a 6900 XT, so 99 ms spent in TraceRay() calls on a 6900 XT sounds pretty realistic to me.
 
So I tried disabling Motion Blur in Portal RTX on a 2080 Ti: no change in fps, so it's not really accelerated by the RT cores.

Motion blur HW acceleration was added only with Ampere, in case that's what you meant.

Does Portal RTX use it? It would be interesting to post some screenshots to show it off.
 
AMD was matching 3070 performance with the 6900 XT in the Serious Sam and Doom ray tracing updates. Meanwhile the Turing analog, the 2080 Ti, was only matching the 3060.



I've run those games through the AMD profiler along with Q2RTX and Cyberpunk, and none of them are using inline RT, which seems to be the problem for AMD in this game. The updates also aren't utilizing the hardware fully either, with power usage stuck around 200-230 W.
 
Commenting on the occupancy, the UberShader, and the profiler results shown, since this is quite interesting.
In the profiler outputs Clukos posted a page up, I've noticed occupancy is just 25% with SER both on and off.
That's really bad, but I'm not sure 'SM occupancy' really means what I think it does. I take it to mean how many workgroups are in flight to compensate for VRAM latency.
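Rough back-of-the-envelope, using Ampere numbers from NVIDIA's docs (so treat it as a sketch): an SM has 65,536 32-bit registers and can hold up to 48 resident warps. If a fat path-tracing shader needs ~170 registers per thread, only 65,536 / 170 ≈ 385 threads ≈ 12 warps fit, i.e. 12 / 48 = 25% occupancy. Fewer resident warps means less latency hiding for the very incoherent memory traffic of BVH traversal and material fetches.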

If I had to implement DXR hit shaders in software, I would have to merge all material shaders into a single huge program, with a giant switch statement on top to jump to the proper material shader per ray (roughly like the sketch below).
This has many unacceptable problems:
In the worst case, each thread calls a different material, so execution is serialized and our awesome 32/64-thread multiprocessor behaves like a single-core processor, with the rest of the lanes idle.
Register usage from the worst shader affects all the others too, reducing occupancy.
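A hypothetical sketch of what I mean (not how any real driver does it):

struct HitInfo { float3 pos; float3 nrm; float2 uv; };
struct Payload { float3 radiance; uint depth; };

float3 ShadeDiffuse(HitInfo h) { return 0.5f.xxx; }  // trivial stand-ins; real materials
float3 ShadeMetal  (HitInfo h) { return 0.9f.xxx; }  // would differ wildly in register
float3 ShadeGlass  (HitInfo h) { return 0.1f.xxx; }  // usage and memory access

void ShadeHit(uint materialId, inout Payload p, HitInfo hit)
{
    // materialId can differ per thread within a 32/64-wide wave, so the wave
    // executes every taken case one after another with most lanes masked off,
    // and the register budget is set by the fattest case for all of them.
    switch (materialId)
    {
        case 0:  p.radiance += ShadeDiffuse(hit); break;
        case 1:  p.radiance += ShadeMetal(hit);   break;
        case 2:  p.radiance += ShadeGlass(hit);   break;
        // ... hundreds more in a real content set ...
        default: break;
    }
}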

Now I don't know how this is implemented for real, or which of these problems SER aims to address. But some profiler outputs and Twitter comments seemingly confirm some big inefficiency going on.

Also, the term UberShader seems weakly defined here. Some might associate it with the merged materials shader I've talked about. But I think that's exactly the opposite of what it means.
Instead, I thought an UberShader aims to solve those problems by using a common, unified material for (almost) everything, something like a basic PBR material for example.
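I.e. one parameter set and one code path for every hit, in the Disney/standard-PBR spirit. A toy sketch of what I mean (not any shipping material model):

struct UberMaterial
{
    float3 baseColor;
    float  metallic;
    float  roughness;
    float3 emissive;
};

// Every lane runs the same code; only the fetched parameters differ.
float3 EvaluateUber(UberMaterial m, float3 n, float3 v, float3 l)
{
    float3 h        = normalize(v + l);
    float3 diffuse  = m.baseColor * (1.0f - m.metallic) / 3.14159265f;
    float3 f0       = lerp(0.04f.xxx, m.baseColor, m.metallic);
    float3 fresnel  = f0 + (1.0f.xxx - f0) * pow(1.0f - saturate(dot(v, h)), 5.0f);
    float  specular = pow(saturate(dot(n, h)), 2.0f / max(m.roughness * m.roughness, 1e-4f));
    return m.emissive + (diffuse + fresnel * specular) * saturate(dot(n, l));
}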

Now that's all confusing, and it reminds me of my initial wondering about DXR's hit shaders. They're flexible and unlimited, but feel inefficient and ill-suited to how GPUs currently work.
 
AMD was matching 3070 performance with the 6900 XT in the Serious Sam and Doom ray tracing updates. Meanwhile the Turing analog, the 2080 Ti, was only matching the 3060.
Those were unofficial, unoptimized mods; they are the outlier here, not the rule.

none of them are using inline RT, which seems to be the problem for AMD in this game.
AMD recommends using, and only using, inline ray tracing for RDNA2 GPUs. NVIDIA doesn't make such a recommendation, as their performance is fine either way.
 
AMD recommends using, and only using, inline ray tracing for RDNA2 GPUs.
I remember they recommended trying both and picking the faster option. Inline is not always faster for AMD either. (Just don't ask me to look up the reference.)
I believe Intel also does not recommend using inline ray tracing for their GPUs.
Yes. Inline disables any reordering or material sorting in HW, so we should use it with care.
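For anyone following along: 'inline' here means RayQuery inside the calling shader instead of the separate hit/miss shader pipeline. A minimal sketch:

RaytracingAccelerationStructure gScene : register(t0);

// Inline shadow ray: traversal happens inside this shader invocation,
// so there is no shader table and nothing for the HW to reorder or sort.
float TraceShadow(float3 origin, float3 dir)
{
    RayDesc ray = { origin, 1e-3f, dir, 1e4f };

    RayQuery<RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;
    q.TraceRayInline(gScene, RAY_FLAG_NONE, 0xFF, ray);
    while (q.Proceed()) {} // nothing non-opaque to resolve in this sketch

    return (q.CommittedStatus() == COMMITTED_TRIANGLE_HIT) ? 0.0f : 1.0f;
}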

To me, what makes the most sense is to avoid inline tracing but use an uber shader for RT. That's some compromise on reflected-material fidelity, but the fastest option for any HW. I doubt AMD would take a hit from such an approach.

We have already seen and discussed the cost of material fidelity with BFV.
 
We have already seen and discussed the cost of material fidelity with BFV.
That is one thing that is enjoyable in Portal RTX from my perspective: things in reflections have visually similar quality to things in the primary view. GI, material response, reflections in their own right... I cannot wait to see more games getting to that level.


It is nice to see after playing Fortnite, where reflections are... not very good looking in the base setup. I really wish Epic allowed hit lighting in Fortnite as an option.
 
We've been through this before.
And please don't try to argue with some AMD vs. NV nonsense.
Isn't that what you are bringing up with this tweet? What would you expect an AMD fan armed with uncorroborated comments from an AMD Linux driver developer to say? And even if they were verified, all you need to do is consider the source of the comments. It will be interesting to see what they have to say about the next path-tracing game (which AMD hopefully contributes to optimizing for their hardware).

I can't bring myself to take his comments at face value. :whistle:
 
That is one thing that is enjoyable in Portal RTX from my perspective: things in reflections have visually similar quality to things in the primary view. GI, material response, reflections in their own right... I cannot wait to see more games getting to that level.
Yes, we want it. But maybe it's too expensive to be worth it. Depends, of course.
But to me this uber shader goal is not specific to RT; it's the final goal of material shading research in general. Disney tries to come up with a generalized material model, NV has related research too, everybody wants it.
But besides the complexity of the physics problem, technically we also need the transition to bindless rendering, so we no longer have the old-school 'one shader per texture per model' but a more efficient and flexible 'one general shader can access any texture it needs' approach. That's pretty common nowadays already. For ray tracing it seems the only way at all, if you care about performance.
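In D3D12 terms that's descriptor indexing. A minimal SM 6.6-style sketch (names hypothetical):

struct MaterialRecord { uint albedoTexIndex; };

StructuredBuffer<MaterialRecord> gMaterials : register(t1);
SamplerState gSampler : register(s0);

// One shader fetches whatever texture the hit needs, via an index.
float3 FetchAlbedo(uint materialId, float2 uv)
{
    uint idx = gMaterials[materialId].albedoTexIndex;
    // NonUniformResourceIndex because neighbouring rays may hit different materials.
    Texture2D tex = ResourceDescriptorHeap[NonUniformResourceIndex(idx)];
    return tex.SampleLevel(gSampler, uv, 0).rgb; // no derivatives along rays, so explicit LOD
}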

Notice there is no visual compromise to expect, because ideally the rasterizer would use the same system as well. If we use a visibility buffer, we have the same problem there, because a tile of threads will no longer process the same materials either.
The legacy pipeline of many materials and shaders seems a bit outdated to me, or at least inefficient.

In case we really need many different materials (which sure is the norm, but we could easily reduce it to something like 10), we can still bin pixels to materials to get coherent shading. IIRC, UE5 does this, for example.

So I ask: why isn't this the norm for RT as well? We could bin reflection hit points to materials in just the same way. And if we have so many hit points that this isn't practical (e.g. GI), then we could easily use a simplified, uniform material for everything, with no visual compromise in those blurry cases, because material details don't show.
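The binning I mean is nothing fancy, basically a counting sort over material IDs before shading. A hypothetical compute sketch (assumes the dispatch exactly covers the hit count, and a prefix sum between the passes):

StructuredBuffer<uint>   gHitMaterialId : register(t0); // material ID per hit point
RWStructuredBuffer<uint> gBinCounts     : register(u0); // one entry per material
RWStructuredBuffer<uint> gSortedHits    : register(u1); // hit indices grouped by material

[numthreads(64, 1, 1)]
void CountPass(uint3 id : SV_DispatchThreadID)
{
    InterlockedAdd(gBinCounts[gHitMaterialId[id.x]], 1);
}

// Run an exclusive prefix sum over gBinCounts in between (not shown),
// turning per-material counts into start offsets.

[numthreads(64, 1, 1)]
void ScatterPass(uint3 id : SV_DispatchThreadID)
{
    uint slot;
    InterlockedAdd(gBinCounts[gHitMaterialId[id.x]], 1, slot);
    gSortedHits[slot] = id.x; // shading then walks contiguous, coherent ranges
}

Then one dispatch (or one indirect dispatch per material) shades each range with a single, coherent shader.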

If nothing like this is implemented in Portal RTX, then questioning poor optimization, intended or not, is justified.
But I'm just speculating here that this might eventually be the reason the profiler outputs look so bad.
I don't know, and I see no point in discussing 'Portal RTX was made to only barely run on new 40-series high end', etc.
The interesting question is: could it run better, well enough on less powerful HW too?

This material shading issue really makes it hard to get an objective impression of RT performance, for me at least. But maybe I'm too naive in assuming this should be no big problem at all.
 
We've been through this before.

Isn't that what you are bringing up with this tweet? What would you expect an AMD fan armed with uncorroborated comments from an AMD Linux driver developer to say? And even if they were verified, all you need to do is consider the source of the comments. It will be interesting to see what they have to say about the next path-tracing game (which AMD hopefully contributes to optimizing for their hardware).

I can't bring myself to take his comments at face value. :whistle:

Report it; don't engage with these kinds of attempts to derail into discussions of whether there's a conspiracy or sabotage going on.

In case we really need many different materials (which sure is the norm, but we could easily reduce it to something like 10), we can still bin pixels to materials to get coherent shading. IIRC, UE5 does this, for example.

UE5 is in its first stages; it's newborn and has only seen use in a simple game like Fortnite. We will surely see better usage of ray tracing going forward, with heavier hardware demands, but also a prettier presentation.
 
UE5 is in its first stages; it's newborn and has only seen use in a simple game like Fortnite. We will surely see better usage of ray tracing going forward, with heavier hardware demands, but also a prettier presentation.
Yeah, but this has nothing to do with what I tried to say. I only mentioned UE5 as an example of state-of-the-art tech which uses a visibility buffer and material binning, AFAICT.
The same practices are available to RT. But DXR educates users to follow the basic, classic RT approach, ignoring the fact that this was the standard for single-threaded CPU architectures and isn't compatible with GPU parallel execution.
Recursion support, hit shaders... that's really easy to use, but it's not fast.
However, unlike my real API concerns, nobody forces us to use those options in such inefficient ways, so I don't complain.
But I speculate some devs might not care, or might wrongly assume SER already fixes the inefficiency.
Looking at the Quake 2 RTX code back then, I definitely had this impression of classical RT, and simplicity > performance. The whole approach just ignores how GPUs work. They are not single-threaded, and treating them as such performs poorly.

Now you might think: Those are experts, and they surely know what they do. I trust their expertise more than yours.

In my opinion, the pioneering research work on ReSTIR and spatiotemporal denoising is indeed the work of experts, no doubt. And the media spreads the word of how clever those things are and how much optimization they achieve.
From there, everybody assumes the whole game is cleverly optimized. Of course. Obviously. It must be.
But those profiler outputs we see seemingly prove that's actually not the case.
I just don't know whether such results will eventually be the norm with RT; better not.
 
Now you might think: Those are experts, and they surely know what they do. I trust their expertise more than yours.

Absolutely not, lol. Looking at UE5(.1) there's a lot to be desired, and that's from a big studio like Epic with massive amounts of resources. And that's just Epic and Unreal Engine. Looking at how some games ship, be it PC or console, I sure don't always trust that they know what they're doing.
 