GPU Ray Tracing Performance Comparisons [2021-2022]

Probably because they wanted it ultimately to stay in the hands of the IHVs to determine what would be best for them. And what may be better today may not necessarily be better for the same IHV tomorrow. Though I'm not sure, honestly.

Another way to look at it is that the BVH is hardware dependent. If the BVH is built in the driver, there's no problem for the IHV: they can add support for new hardware in a new driver. If the BVH were implemented in the game engine, each game would have to be patched with a new BVH implementation whenever new hardware (architecture or chip) became available, and games would also have to be specifically optimized for old hardware like the Turing variants. That creates a ton of work. Considering this and the first generation of RT games, it's very unlikely old games would get patched to function or perform optimally on new hardware. In that world it would have been tremendously difficult for AMD to bring RT support to games after the fact: AMD would have needed to go back to each developer and ask them to write an AMD-specific BVH implementation, or none of the "DXR" games would run on AMD. Each game would have to be patched and re-released, with the onus on the game developer to verify AMD hardware + RT, versus today, where the onus is on AMD to provide a performant driver.

Even simple things like the data format, and possibly the compression used to describe and store the BVH, can be (and are) hardware specific: compression, structuring of data to be hardware friendly, and so on. Far better, for now, to let the driver handle this than to force game developers to deal with each chip and architecture separately in their code. It gets more complicated once the developer has to implement hardware-friendly grouping of rays to go through the BVH efficiently; just using the hardware naively will likely be very cache unfriendly and perform poorly. The black box called BVH is not at all trivial to own at scale. Owning the BVH is fine if you're doing a demo for one specific GPU, or a console-exclusive game.
 
Probably because they wanted it ultimately to stay in the hands of the IHVs to determine what would be best for them. And what may be better today may not necessarily be better for the same IHV tomorrow.
Agreed, and that makes the goal of a general BVH format undesirable.
What might work is to query the max branching factor from the driver, then build nodes from compute, providing bounding boxes and child indices. The driver can then translate those commands into nodes in the HW format without needing to buffer them.
The developer does need to provide shaders supporting multiple branching factors, though, so there is a need to port from, say, a streamed octree to BVH4. That's not trivial, but it's still linear time and orders of magnitude faster than building from scratch.
We also need the ability to modify an existing BVH instead of just rebuilding it, meaning adding or removing leaf nodes for LOD support, to avoid a full rebuild just because some patches of surface change topology.
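To make that concrete, here is a minimal sketch of what such a driver interface could look like. Every name in it is hypothetical; nothing like this exists in DXR or Vulkan today:

```cpp
// Hypothetical interface -- every name here is made up for illustration.
#include <cstdint>
#include <vector>

struct Aabb { float mn[3]; float mx[3]; };

// A node as the application would describe it: bounds plus child indices.
// The driver translates these into its native (possibly compressed) layout
// in place, so no intermediate buffer is needed.
struct BvhNodeDesc {
    Aabb     bounds;
    uint32_t children[8];   // indices into the node array, or leaf payloads
    uint32_t childCount;    // must be <= maxBranchingFactor()
    bool     isLeaf;
};

struct BvhDevice {
    // Queried once; the app emits nodes at this width (e.g. 2, 4 or 8).
    virtual uint32_t maxBranchingFactor() const = 0;

    // Full build from app-provided descriptors (e.g. written from compute).
    virtual void build(const std::vector<BvhNodeDesc>& nodes) = 0;

    // Incremental edits for LOD support: add or remove leaves without a
    // full rebuild when some patch of surface changes topology.
    virtual uint32_t insertLeaf(uint32_t parentNode, const BvhNodeDesc& leaf) = 0;
    virtual void     removeLeaf(uint32_t leafNode) = 0;

    virtual ~BvhDevice() = default;
};
```

Porting from, say, a streamed octree to this would then be the linear-time pass mentioned above: walk the source tree and emit descriptors at the queried width, while the driver owns the native encoding.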

Surely hard to solve (e.g. thinking about avoiding dynamic memory allocation), but a solution has to be found.
For a start, vendor extensions to expose the BVH would help. Even if not popular, that's better than waiting an unknown amount of time until the big players agree on and specify something which might be fine or not.
 
But opening up the BVH is possible, and I don't see why waiting for even more HW architectures to appear would make opening it any easier in the future. The earlier it happens, the better.
It is also the main limitation: currently we are limited to static topology, so the existence of DXR actually prevents, or at least restricts, progress towards a solution of the LOD problem.
You can argue that current games do fine without that, but larger worlds and increasing detail are a trend that is not going to stop. It has been there for as long as video games have existed, and it defines their evolution to a large degree.

I don't think it's realistically possible to open up BVH traversal yet. To open it, every GPU manufacturer plus Microsoft plus Khronos would need to agree on the data formats and instructions used. Once those are in the spec, vendors can implement hardware that runs code using the standard data formats and instructions in a compatible way. I think this will happen once RT matures and the different parties come to an agreement on how a low-level API and the data formats (plus potential compression) should look.

The issue is that currently every piece of hardware can be (and is) very different. If you opened up BVH traversal, you would need to support and optimize for each chip separately in the game engine: optimize for specific data formats and use architecture-specific instructions to walk the BVH, and of course also implement grouping of coherent rays in a hardware-specific way to maximize throughput. The hardware architectures (Turing, Ampere, RDNA2, future architectures) are likely so different that what works on one wouldn't even run on another.

Another thing: if walking the BVH were done through an NV-proprietary mechanism in game engines, how would AMD ever support RT in already-released games? Reimplement the BVH in all the old game engines and re-release the games? A game developer is unlikely to want to do this after the fact (or at all, if the IHV is willing to do it in the driver; time is money). Nvidia would have had the same issue with Ampere versus Turing. It's even worse for Intel, who is arriving even later than AMD, and Intel would add yet more to the burden of the poor developer trying to do hardware-specific implementations.
 
Another way to look at it is that the BVH is hardware dependent. If the BVH is built in the driver, there's no problem for the IHV: they can add support for new hardware in a new driver. […]

Even simple things like the data format, and possibly the compression used to describe and store the BVH, can be (and are) hardware specific. […]
Yea. It seems the API is deliberately very high level to ensure that if there is progress in RT hardware R&D, the API doesn't force a specific path just yet.

I do agree a low-level DXR will eventually arrive, but only after the technology, the research, and the industry have had time to converge on a specific best path.
 
I don't think it's realistically possible to open up BVH traversal yet.
That's not what I meant - I was talking about access to the BVH data structure, not its traversal.

To open it, every GPU manufacturer plus Microsoft plus Khronos would need to agree on the data formats and instructions used. Once those are in the spec, vendors can implement hardware that runs code using the standard data formats and instructions in a compatible way.
No, it's not that bad. There is no need to agree on a HW data format; we only need basic agreement on using spatial, hierarchical data structures at all. Which is the case on any HW - the commonly used term 'BVH' is enough to ensure this.
We also need agreement on interface instructions to write and read this data, but IHVs can convert those to produce their native format in place, also handling compression / precision transparently.
So my request is only for a software interface over common ground. We want to prevent future restrictions on both ends, of course, but even the minimum common ground I can imagine is enough.

If you opened up BVH traversal, you would need to support and optimize for each chip separately in the game engine.
No. A game engine can use a dynamic BVH (or programmable traversal, if practical), but it does not have to.
And if developers decide to use it, the only per-chip difference I see is different branching factors. Everybody uses bounding boxes and child indices/pointers.
Effort is minimal on all ends, and for the game developer it is even optional. (I don't want to scrap DXR - I want to extend it.)
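If the abstract node really is just bounds plus child indices, the chip-specific part collapses into a single constant. A sketch of that claim - all types here are invented for illustration, not a real format - showing a traversal that is generic over the branching factor N:

```cpp
// Stack-based BVH traversal that is generic over the branching factor N.
// Node layout and types are invented for illustration, not a real format.
#include <cstdint>
#include <algorithm>

struct Ray  { float org[3]; float invDir[3]; float tMax; };
struct Aabb { float mn[3];  float mx[3]; };

template <uint32_t N>
struct Node {
    Aabb     childBounds[N];
    uint32_t childIndex[N];    // node index, or ~0u for an empty slot
    bool     childIsLeaf[N];
};

// Standard slab test of a ray against one child box.
inline bool hitAabb(const Ray& r, const Aabb& b) {
    float t0 = 0.0f, t1 = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float tNear = (b.mn[a] - r.org[a]) * r.invDir[a];
        float tFar  = (b.mx[a] - r.org[a]) * r.invDir[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
    }
    return t0 <= t1;
}

// The same loop serves BVH2, BVH4, BVH8, ...: only N changes per chip.
template <uint32_t N>
void traverse(const Ray& ray, const Node<N>* nodes, uint32_t root) {
    uint32_t stack[64];
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        const Node<N>& n = nodes[stack[--top]];
        for (uint32_t i = 0; i < N; ++i) {
            if (n.childIndex[i] == ~0u || !hitAabb(ray, n.childBounds[i]))
                continue;
            if (n.childIsLeaf[i]) {
                // intersect leaf primitives against the ray here
            } else if (top < 64) {
                stack[top++] = n.childIndex[i];
            }
        }
    }
}
```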

Another thing: if walking the BVH were done through an NV-proprietary mechanism in game engines, how would AMD ever support RT in already-released games?
If there ever were a game using, say, NV's device generated commands extension, then it would surely have a code path that works without the vendor extension, as usual.
But notice my proposal is to make an abstraction of the BVH - accessible to game devs, still allowing different implementations across chip generations and future changes from IHVs. So this problem would not come up at all.
I don't want to get some struct in memory and hack at all its bits - I only want to set / get the common variables expected from any generic implementation of BVH nodes.
 
You are oversimplifying. Reality is more complex.

The BVH you want to access is likely packed and compressed in a very hardware-dependent way to optimize cache-line and memory usage. You would need to be able to decode the BVH on a per-architecture/chip basis; there is no common BVH format that you could use. This is the whole point of the black box: it lets each vendor innovate, because the API doesn't limit the trickery the hardware is allowed to do.

The hardware that walks the BVH structure differs heavily between vendors and architectures. Even simple things like how many bounding boxes can be processed in parallel affect what the optimal BVH structure and data format are.
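For a concrete picture of what that hardware dependence can look like: several published BVH schemes store child boxes quantized to a few bits relative to the parent box. A sketch of one such encoding, which is to be clear not any vendor's actual format:

```cpp
// One plausible compression scheme: child boxes quantized to an 8-bit grid
// inside the parent box (6 bytes instead of 24 per child). Illustrative
// only -- real HW formats are proprietary and differ per vendor/generation.
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Aabb { float mn[3]; float mx[3]; };

struct CompressedChild { uint8_t qmn[3]; uint8_t qmx[3]; };

// Conservative rounding (min down, max up) so the decoded box always
// contains the original: rays may test a slightly larger box, but can
// never miss geometry. Assumes children lie inside a non-degenerate parent.
CompressedChild encode(const Aabb& parent, const Aabb& child) {
    CompressedChild c;
    for (int a = 0; a < 3; ++a) {
        const float scale = (parent.mx[a] - parent.mn[a]) / 255.0f;
        c.qmn[a] = (uint8_t)std::max(0.0f,   std::floor((child.mn[a] - parent.mn[a]) / scale));
        c.qmx[a] = (uint8_t)std::min(255.0f, std::ceil ((child.mx[a] - parent.mn[a]) / scale));
    }
    return c;
}

Aabb decode(const Aabb& parent, const CompressedChild& c) {
    Aabb b;
    for (int a = 0; a < 3; ++a) {
        const float scale = (parent.mx[a] - parent.mn[a]) / 255.0f;
        b.mn[a] = parent.mn[a] + c.qmn[a] * scale;
        b.mx[a] = parent.mn[a] + c.qmx[a] * scale;
    }
    return b;
}
```

Change the grid resolution, the node width, or how the fields are interleaved and you have an incompatible format, which is exactly why the driver currently owns the encoding.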

Creating a high-level API that is not dependent on the HW was a great first move. Once things mature, I bet the black boxes will be opened. But that requires AMD, Nvidia, Intel, Microsoft, Khronos etc. to work together and agree.

You definitely can do LODs. You can create multiple BVH structures: for example, trace primary rays against one BVH, and shoot secondary rays against a different BVH containing a lower LOD. This however runs into all the usual problems, like light bleeding. We kind of learnt this already with Doom 3 and tessellation: it's important that the visible geometry and the geometry used for lighting are the same, to avoid artifacts.
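The two-BVH idea in sketch form, with all types and functions invented for illustration (this is not DXR code); the light-bleeding caveat is precisely that secondary rays see the coarse geometry while the camera sees the detailed one:

```cpp
// Sketch: full-detail BVH for primary rays, coarser LOD BVH for secondary
// rays. All types and functions are invented for illustration.
struct Bvh { /* acceleration structure over one LOD of the scene */ };
struct Ray { float org[3]; float dir[3]; };
struct Hit { bool hitAnything; float position[3]; };

// Assume some traversal exists; stubbed out here.
Hit trace(const Bvh&, const Ray&) { return {}; }

struct Scene {
    Bvh fullDetail;   // matches the rasterized / visible geometry exactly
    Bvh coarseLod;    // cheaper tree for rays whose error is less visible
};

Hit shootRay(const Scene& s, const Ray& r, int depth) {
    // Primary rays must agree with what the camera sees; secondary rays
    // tolerate lower detail. The light-bleeding risk: where the two
    // geometries disagree, secondary rays can slip through gaps that the
    // visible surface doesn't have.
    const Bvh& bvh = (depth == 0) ? s.fullDetail : s.coarseLod;
    return trace(bvh, r);
}
```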
 
The IHVs know what will make RT faster as they develop each generation, and that feedback goes to MS. MS works with all the IHVs to define the behaviours of the next APIs, but all the IHVs have to agree. If they don't, you get Nvidia GameWorks/RTX titles versus Radeon FidelityFX solutions; when they do agree, you get Tier 2 and Tier 3 support. Overall, as much as I want to see higher performance, it makes sense to wait and give the IHVs time to come to their own conclusions on how best to move forward. Even though the API lags the hardware, it should, to ensure the API does not bake in strong performance biases for one vendor over another.
 
Completely agree. If MS had done the naive thing, we would have just gotten what AMD did: a few more instructions for compute shaders. At the moment, dedicated and fixed-function(?) hardware seems to be the way to go, if Ampere or even Turing is compared against RDNA2. It will be interesting to see how the hardware and APIs evolve in the years to come. I suspect in the end unified shaders will win again, but meanwhile there is a period where fixed function seems to be king.
 
June 18
let's GOOO
this is how you do a remaster.

edit: higher quality video

Just comparing console and PC footage, I think the difference is visible even without DF; reflections are really a big lift here.

4A Games is way out in front of the curve here in terms of RT fidelity and performance. I didn't expect this to be so far ahead of Resident Evil given the budget and studio sizes.

I will say it for the first time on this forum: these guys are wizards.
 
Good thing I knew the update would be free for Series X, so I picked it up during the last sale. Looking forward to giving it a go.

Yea, with the expansion etc. it's nearly a full-price game again.
But I'm okay with paying the full amount. Gotta support this type of development.
 
PCGH's comments regarding Resident Evil 8's reflections:

For windows and glass display cases, cube-map reflections are still used even with ray tracing enabled, and these only depict the surroundings very imprecisely. In some mirrors, on the other hand, we see "real" RT reflections, but of rather shadowy quality, and therefore less spectacular than in, for example, Control, Cyberpunk 2077 or Watch Dogs Legion.

Interesting: evidently the minimum roughness at which reflections kick in on a material is set a little lower for the ray-traced reflections than for the screen-space reflections, so some surfaces show reflections with RT that cannot be refined with screen-space reflections when ray tracing is disabled.

This is rather atypical, and exactly the other way around from many other games with ray tracing: there, screen-space reflections are often used on the rougher surfaces even with RT on, while the smoother ones are refined with RT reflections.

https://www.pcgameshardware.de/Resi...l-Village-Benchmarks-Resi-8-Review-1371311/2/
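What the article describes boils down to two roughness cutoffs with an unusual ordering. A sketch with invented threshold values; only their ordering (RT covering some surfaces that SSR skips) reflects the article:

```cpp
// The roughness-cutoff behaviour PCGH describes, as a selection function.
// Threshold values are invented; only their ordering reflects the article:
// the RT path covers some surfaces that the SSR path skips, where most
// games order these the other way around.
enum class ReflectionTech { None, Ssr, Rt };

ReflectionTech pickReflection(float roughness, bool rtEnabled) {
    const float kMaxRoughnessRt  = 0.30f;  // hypothetical value
    const float kMaxRoughnessSsr = 0.25f;  // hypothetical value
    if (rtEnabled)
        return roughness <= kMaxRoughnessRt  ? ReflectionTech::Rt  : ReflectionTech::None;
    else
        return roughness <= kMaxRoughnessSsr ? ReflectionTech::Ssr : ReflectionTech::None;
}
```

A surface at roughness 0.27 would then reflect with RT on but get no SSR fallback with RT off, matching the observation above.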
 
Yea, with the expansion etc. it's nearly a full-price game again.
But I'm okay with paying the full amount. Gotta support this type of development.

I bought Metro Exodus cheap in a Steam sale after the Enhanced Edition was announced. I'm not really the biggest fan of the genre, so I didn't want to pay full price.
 
this is how you do a remaster. […]

4A Games is way out in front of the curve here in terms of RT fidelity and performance. […]

Interesting ... if it's true that the console versions don't get any reflections at all, that's kind of disappointing. On the other hand, the load times, 3D audio and DualSense support are nice. This will be a very, very interesting comparison feature for Digital Foundry, and personally I'm very curious which version I would prefer to play, console or PC. I have a PC with the bare minimum GPU spec (an RTX 2060) but 32GB of RAM, and DLSS 2.1 support plus ray-traced reflections may still make that version visually preferable; really curious how that will pan out.
 
It's an interesting approach.

Maybe forgo RT reflections for a fully ray-traced lighting engine and the benefits that affords in art, asset and level creation etc.

The consoles' RTRT may be enough for that going forward.
Guess there's a lot of R&D in reflections still to be done.

The XSS will be interesting, especially in terms of texture quality, as it doesn't sound like it's using XVA.
 
if it's true that the console versions don't get any reflections at all, that's kind of disappointing
This was announced with the Enhanced Edition itself. Console versions get SSR; PC gets hybrid RT+SSR.

It is interesting, though, that they chose to spend the consoles' RT on GI and not reflections. You'd think RDNA2's RT would fare better with the latter than the former. But I guess it all boils down to having to use RT when you base your lighting on it, with no clear option of falling back to raster here.
 
You are oversimplifying. Reality is more complex.

The BVH you want to access is likely packed and compressed in a very hardware-dependent way to optimize cache-line and memory usage. You would need to be able to decode the BVH on a per-architecture/chip basis; there is no common BVH format that you could use. This is the whole point of the black box: it lets each vendor innovate, because the API doesn't limit the trickery the hardware is allowed to do.

I'd like to see an example of a BVH data structure which can't be built from bounding boxes and child pointers, or where compression prevents accessing / changing this data. It sounds much more like you're making assumptions than like I'm oversimplifying.

However, likely you are right and I have to wait for the IHVs to be done with their innovating, so I can start to innovate too. Parallelism is overrated.
 
I think one can innovate quite a lot already, as shown by Metro Exodus Enhanced Edition, CP2077, Minecraft, ... For the low-level work, maybe the best route is a console exclusive, or R&D inside AMD/Intel/Nvidia/Microsoft/Sony: either be allowed to optimize for one piece of hardware, or be allowed to work with future hardware. By the time something is out in stores, it's old news for the IHV; the new stuff is already well on its way to being designed and implemented. The replacement for RDNA2/Ampere is probably already deep in the design phase, and the next-next architectures are probably also in the works.

For the BVH you could also consider things like instancing. Is there a standard way to handle instances of the same object, or is it perhaps something hardware can do tricks with? What about rotating, resizing etc. the instances during BVH creation? Naively one could create unique geometry out of the instances (a bigger BVH); perhaps there are better ways to instance, with hardware tricks to support it efficiently. Another thing is the flexibility of the hardware: HW can be so hardcoded today that there is no point opening it up further. AMD is ahead here, as they use regular compute, but on the other hand this approach seems to carry a pretty huge performance hit (Blender, CP2077, ...).
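For reference, this is roughly what instancing means for a BVH (DXR's top/bottom-level split works along these lines, though the types here are illustrative, not a real API): a two-level structure where many instances share one bottom-level tree and only the transform and world-space bounds are unique per instance.

```cpp
// Two-level instancing in sketch form; types are illustrative, not a real API.
#include <cstdint>
#include <vector>

struct Float3x4 { float m[3][4]; };            // rotation/scale + translation
struct Aabb     { float mn[3]; float mx[3]; };

// Built once per unique mesh, in object space, and shared by all instances.
struct BottomLevelBvh { /* geometry-space BVH over one mesh */ };

struct Instance {
    Float3x4 objectToWorld;   // rotate / resize / place the shared mesh
    uint32_t blasIndex;       // which shared bottom-level BVH to enter
    Aabb     worldBounds;     // transformed bounds, seen by the top level
};

// The top level is just a BVH over Instance::worldBounds. On entering an
// instance, the ray is transformed into object space instead of duplicating
// the geometry -- the "unique geometry" alternative would blow up the BVH.
struct TopLevelBvh {
    std::vector<BottomLevelBvh> meshes;
    std::vector<Instance>       instances;
};
```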

edit: Your thought about parallelism is naive. Nvidia has separate fixed-function hardware to handle bounding boxes and triangles, and you need to keep all the units fed or you will go slow. Parallelism is very important, as is sorting rays for coherence. Luckily, for now, the black box hides a lot of the low-level stuff, so developers can focus on the big things, like how to cast the rays that contribute most and how to reuse results between frames.
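As an aside, "sorting rays for coherence" can start as simply as bucketing rays by a coherence key before traversal. A minimal sketch; the octant key here is a deliberately crude stand-in for the more elaborate schemes real implementations use:

```cpp
// Sketch of grouping coherent rays before traversal: sort by direction
// octant so rays walking similar parts of the BVH run next to each other.
// A real implementation would use a better key (e.g. origin Morton code
// plus direction); this just shows the idea.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Ray { float org[3]; float dir[3]; };

// 3-bit key: one bit per sign of the direction on each axis.
inline uint32_t directionOctant(const Ray& r) {
    return (r.dir[0] < 0.0f ? 1u : 0u)
         | (r.dir[1] < 0.0f ? 2u : 0u)
         | (r.dir[2] < 0.0f ? 4u : 0u);
}

void sortForCoherence(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) {
                  return directionOctant(a) < directionOctant(b);
              });
}
```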
 
Just comparing console and PC footage, I think the difference is visible even without DF; reflections are really a big lift here. […]

Rather huge difference, and that's just the early games. Anyway, load times will be fast on PC as well, I'm sure. DualSense is supported on W10; I might use that instead of kb/m.
 
I guess it all boils down to having to use RT when you base your lighting on it, with no clear option of falling back to raster here
Yea ;) with nothing to fall back on, the choice is basically made for you. Hopefully, in time, a developer will figure out how to incorporate all of this plus reflections on console.
 