GPU Ray Tracing Performance Comparisons [2021-2022]

JoeJ · Jul 8, 2021

PSman1700 said:
I doubt anyone would hire you as a dev for complaining about the ray tracing implementation with the largest user base.

Reasonable assumption. Interestingly, reality turned out to be the opposite.

PSman1700 said:
There's a reason why Rift Apart only has upscaled RT reflections, and at a cost.

I did not follow that, but I guess the reason is the same as why it's common on PC to upscale the whole frame? Oh, I forgot: the upscaled PC frame has even better IQ than native, as you have mentioned.

PSman1700 said:
Yeah i can refer to DF or Nvidia, no idea why you think your comments have more value.

IDK why you think that I would think my comments have more value than theirs?
However, the reason I do not just repeat their findings is that I can think for myself. Try it. It isn't that hard.

PSman1700 said:
It is what it is, consoles are way behind in ray tracing.

No.

PSman1700 · Jul 8, 2021

JoeJ said:
No.

Yeah, quite a lot actually.

Florin · Jul 8, 2021

JoeJ said:
No.

?

Scott_Arm · Jul 9, 2021

Console is ahead in terms of RT flexibility, but PC just offers brute performance with the highest end cards.

HLJ · Jul 9, 2021

Scott_Arm said:
Console is ahead in terms of RT flexibility, but PC just offers brute performance with the highest end cards.

The "flexibility" does not enable any RT effects that DXR cannot do on the PC.

Please define "flexibility", because without that definition your post has no "value" and simply sounds like a piss poor excuse for the real world lackluster RT on consoles.

Scott_Arm · Jul 9, 2021

HLJ said:
The "flexibility" does not enable any RT effects that DXR cannot do on the PC.

Please define "flexibility", because without that definition your post has no "value" and simply sounds like a piss poor excuse for the real world lackluster RT on consoles.

Console has a lower-level API which seems to allow direct access to the RT hardware and the BVH. On PC the BVH is entirely opaque, and the hardware is only accessible through a high-level API. BVH builds and updates have a huge cost, so any flexibility in how the BVH is built or updated is a big win. There's more possibility for developers to come up with novel algorithms in the console space. DXR 1.2 etc. will probably offer more and more flexibility.

Also, I'm fully aware that the console GPUs are quite a bit less powerful than the PCs top end GPUs, which puts them behind by default. And I'm fully aware that AMD is not as capable in raw RT performance as Nvidia in the PC space. So I'm not suggesting there's some level playing field between PC and console. The console space just has a less opaque solution for RT, and that could lead to some new approaches on how to do things that aren't available in the PC space yet.

HLJ · Jul 9, 2021

Scott_Arm said:
Console has a lower-level API which seems to allow direct access to the RT hardware and the BVH. On PC the BVH is entirely opaque, and the hardware is only accessible through a high-level API. BVH builds and updates have a huge cost, so any flexibility in how the BVH is built or updated is a big win. There's more possibility for developers to come up with novel algorithms in the console space. DXR 1.2 etc. will probably offer more and more flexibility.

Also, I'm fully aware that the console GPUs are quite a bit less powerful than the PCs top end GPUs, which puts them behind by default. And I'm fully aware that AMD is not as capable in raw RT performance as Nvidia in the PC space. So I'm not suggesting there's some level playing field between PC and console. The console space just has a less opaque solution for RT, and that could lead to some new approaches on how to do things that aren't available in the PC space yet.

"Seems to" is a pretty weak argument.
All I hear you say is that DXR will be updated in the future (a surprise to no one), but since DXR runs on the PC, your statement is kind of a paradox.

All I read is nothing concrete in reply to my question, so I will ask again:

Where does this flexibility SHOW?
Examples please.

chris1515 · Jul 9, 2021

HLJ said:
"Seems to" is a pretty weak argument.
All I hear you say is that DXR will be updated in the future (a surprise to no one), but since DXR runs on the PC, your statement is kind of a paradox.

All I read is nothing concrete in reply to my question, so I will ask again:

Where does this flexibility SHOW?
Examples please.

For example, support for LOD, offline BVH builds and partial BVH updates. With those, Unreal Engine 5 could support ray tracing without using proxies.

https://twitter.com/x/status/1397662748582944768

pjbliverpool · Jul 9, 2021

Yeah, it'll be nice to see flexibility parity in DXR, if indeed it's even possible on PC given that it has to cater for future hardware.

It'll certainly be interesting to see how much, if any, of the raw performance delta the additional flexibility can close before updated APIs land (if they do). The Series X looks to have RT performance somewhere around 2060S/2070 level and the PS5 obviously somewhat less, so that's a lot of raw performance delta to overcome to compete with the higher-end Ampere cards.

If we were to start seeing those kind of performance uplifts on console, how feasible would it be for AMD at least to release its own API/extensions to utilise the same optimisations on its desktop range?

JoeJ · Jul 9, 2021

HLJ said:
"Seems to" is a pretty weak argument.

Console APIs are under NDA, but multiple people have confirmed that BVH access, custom-built BVH, BVH streaming and custom traversal shaders are all possible on both consoles.

HLJ said:
Where does this flexibility SHOW?
Examples please.

Wait some time until consoles become more 'maxed out'. 4A Games said they used custom traversal, but we don't know for what benefit.
Examples would be:
Blending discrete LODs, see Intel's paper on 'Stochastic LOD', which successfully prevents visual popping. (Probably not possible on NV HW, although they showed some similar work, and MS listed 'Traversal Shaders' as an optional future DXR feature.)
Support for continuous LOD like Nanite does, by adjusting the BVH at runtime. (Possible on any HW, but currently impossible due to API restrictions / missing vendor extensions.)
Streaming the BVH from disk instead of building it at high GPU / CPU cost during gameplay in an open world game. (Possible, but would require runtime conversion of a compressed storage format into unknown GPU vendor formats. Still a huge win.)
Using the BVH for non-RT tasks, like finding all triangles at a given location, with applications in physics, audio, AI, etc. (Possible if vendors specified their data structures, or if the API provided an abstraction over the BVH.)
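As a rough illustration of the first item, here is a minimal sketch of the general idea behind stochastic LOD: each ray randomly picks one of the two neighbouring discrete LODs, weighted by the fractional part of a continuous LOD value, so the transition dissolves instead of popping. This is a CPU-side toy and not code from the paper or from any console or DXR API; `select_discrete_lod` and `average_selected_lod` are hypothetical names.

```python
import math
import random


def select_discrete_lod(continuous_lod: float, u: float) -> int:
    """Pick one of the two neighbouring discrete LODs for a single ray.

    continuous_lod: fractional level, e.g. 2.3 means 30% of the way from LOD 2 to LOD 3.
    u: uniform random number in [0, 1), drawn per ray (or per pixel).
    """
    base = math.floor(continuous_lod)
    frac = continuous_lod - base
    # With probability `frac` the ray sees the next LOD, otherwise the base one.
    return base + 1 if u < frac else base


def average_selected_lod(continuous_lod: float, rays: int, seed: int = 0) -> float:
    """Average LOD index over many rays; it should approach continuous_lod."""
    rng = random.Random(seed)
    total = sum(select_discrete_lod(continuous_lod, rng.random()) for _ in range(rays))
    return total / rays
```

Averaged over many rays the selected level converges to the continuous value, which is why the blend looks smooth once the image is accumulated or denoised.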

Thus, from the developer's perspective, RT is ahead on consoles because it enables things currently impossible on PC, and the higher performance of high-end GPUs cannot compensate for those gaps.
From the customer's perspective it's also ahead, if we did a fair comparison using entry-level GPUs with similar specs, e.g. the upcoming RX 6600 or maybe the RTX 2060. The console would show better performance if developers utilize the flexibility, which is pretty likely and easy at least in the case of 'stream instead of build BVH'.

trinibwoy · Jul 9, 2021

chris1515 said:
For example, support for LOD, offline BVH builds and partial BVH updates. With those, Unreal Engine 5 could support ray tracing without using proxies.

https://twitter.com/x/status/1397662748582944768

He also mentions tracing performance is important. So the real question is does the additional flexibility make up for raw speed. Long term this is most certainly true but it would be nice to see some examples of it today.

DXR already supports partial BVH updates. You only need to rebuild the BLAS for deformable or breakable objects. You can also run BVH updates in parallel with other GPU work. So it’s not obvious that additional flexibility will bring huge gains on today’s hardware.
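The two-level split described above can be sketched as a toy model: each mesh owns a bottom-level structure that is built once at load time, and per frame only the deformable ones are rebuilt, while the small top-level structure over instances is rebuilt. The class and method names are hypothetical and this does not call any real DXR function; rebuilding the TLAS every frame is an assumption based on common practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Blas:
    """Toy bottom-level structure, one per mesh (hypothetical name)."""
    name: str
    deformable: bool
    builds: int = 0

    def build(self) -> None:
        # Stand-in for an expensive per-mesh BVH build.
        self.builds += 1


@dataclass
class Scene:
    meshes: List[Blas] = field(default_factory=list)
    tlas_builds: int = 0

    def load(self) -> None:
        # Every BLAS is built once when its mesh is loaded.
        for blas in self.meshes:
            blas.build()

    def update_frame(self) -> None:
        # Per frame, only meshes whose geometry changed are rebuilt.
        for blas in self.meshes:
            if blas.deformable:
                blas.build()
        # The top-level structure over instances is small and rebuilt each frame
        # (assumption, see lead-in).
        self.tlas_builds += 1
```

After loading a static rock and a deformable cloth and running three frames, the rock has been built once and the cloth four times, which is the cost profile the post argues is already acceptable.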

Nanite meshes in particular should be perfect candidates for raw tracing speed as it’s all static. LOD may help but again it would be good to see the trade off in practice. We don’t really have an RT benchmark that uses super detailed geometry on Nanite’s level. Unfortunately 3dmark’s piss poor attempt didn’t even come close.

CarstenS · Jul 9, 2021

Just for clarification JoeJ, you are talking about DXR, not about Vulkan RT extensions? Is there something that's keeping IHVs from exposing their additional features there?

chris1515 · Jul 9, 2021

trinibwoy said:
He also mentions tracing performance is important. So the real question is does the additional flexibility make up for raw speed. Long term this is most certainly true but it would be nice to see some examples of it today.

DXR already supports partial BVH updates. You only need to rebuild the BLAS for deformable or breakable objects. You can also run BVH updates in parallel with other GPU work. So it’s not obvious that additional flexibility will bring huge gains on today’s hardware.

Nanite meshes in particular should be perfect candidates for raw tracing speed as it’s all static. LOD may help but again it would be good to see the trade off in practice. We don’t really have an RT benchmark that uses super detailed geometry on Nanite’s level. Unfortunately 3dmark’s piss poor attempt didn’t even come close.

They would probably use offline BVH creation too for current Nanite version because this is only static geometry.

HLJ · Jul 9, 2021

JoeJ said:
Thus, from the developer's perspective, RT is ahead on consoles because it enables things currently impossible on PC, and the higher performance of high-end GPUs cannot compensate for those gaps.

Show me... because every single game using RT on the consoles has been subpar compared to the PC implementation.
So show me.
Talk is cheap.

DavidGraham · Jul 9, 2021

JoeJ said:
Wait some time until consoles become more 'maxed out'. 4A Games said they used custom traversal, but we don't know for what benefit.

Sorry, I don't see that at all.

So far we've had dozens of console RT titles, 1st party or not, and they all showed the same lackluster performance/visual quality compared to even entry level Turing implementations: Metro Exodus, Call of Duty: Black Ops Cold War, Watch Dogs Legion, Doom Eternal, Spider-Man: Miles Morales, Ratchet and Clank, Control, Resident Evil Village, The Medium, etc. Upcoming announced RT titles don't deviate from that trend either (Forza Horizon 5, Bright Memory Infinite). So far, your argument rests on theoretical examples with no practical data to back them up.

Just like the argument of compute RT vs hardware RT, which managed to prove how superior the hardware RT approach is: the UE5 demo barely runs at 1080p60 on a frigging 3090 while doing only a somewhat limited form of GI. Contrast that with the implementation in Metro Exodus, which does full GI simulation + reflections and runs at 1440p90 on the same 3090.

We are back to the same circular RT arguments from 3 years ago: hardware RT is not needed, Turing's RT is not performant, consoles will never have RT, AMD will not ride the RT bandwagon, adoption of RT will be limited, compute RT is better than hardware, etc. All of them got proven blatantly incorrect with time. I thought we were supposed to go forward, not backward.

trinibwoy · Jul 9, 2021

chris1515 said:
They would probably use offline BVH creation too for current Nanite version because this is only static geometry.

Yeah probably. Theoretically though even on PC building the BVH for static geometry is a one time exercise if you had enough vram to spare. You wouldn’t rebuild it every frame. The main benefit of offline BVH would be streaming which is helpful for large levels or open world games while keeping memory usage in check.

xpea · Jul 9, 2021

JoeJ said:
Thus, from the developer's perspective, RT is ahead on consoles because it enables things currently impossible on PC, and the higher performance of high-end GPUs cannot compensate for those gaps.
From the customer's perspective it's also ahead, if we did a fair comparison using entry-level GPUs with similar specs, e.g. the upcoming RX 6600 or maybe the RTX 2060. The console would show better performance if developers utilize the flexibility, which is pretty likely and easy at least in the case of 'stream instead of build BVH'.

I'm not a game developer, but the last 30 years of 3D graphics hardware prove you wrong. Over and over it's the same story: for every feature, hardware acceleration enables what was/is impossible to do in real time in software (with maybe the exception of S3's broken Savage 2000 T&L).
It's always like that:
1- Software is too slow
2- dedicated Hardware acceleration comes to enable it
3- depending on the feature, dedicated Hardware moves to general compute block, when the overall processing power increases enough to absorb the performance penalty.

It's never the other way around. Never

JoeJ · Jul 9, 2021

trinibwoy said:
He also mentions tracing performance is important. So the real question is does the additional flexibility make up for raw speed.

BVH build / refit is entirely in software on any GPU. Exposing those data structures won't affect tracing performance at all.
Custom built BVH may hurt or help tracing perf., but effect won't be large in either case.

trinibwoy said:
DXR already supports partial BVH updates. You only need to rebuild the BLAS for deformable or breakable objects. You can also run BVH updates in parallel with other GPU work. So it’s not obvious that additional flexibility will bring huge gains on today’s hardware.

Ten years ago I proposed the same TLAS / BLAS strategy to people when they said 'Building AS is too expensive, so we can not do raytracing in realtime'. Back then I thought this strategy solved the problem in practice.
It's quite ironic that it's me again who now hits a wall here. As soon as we add continuous LOD, the two-level strategy no longer works. It's not fine grained enough. We need to edit the leaves of the BVH, by either removing them to decrease detail, or adding children to increase detail. With DXR, even when changing just one patch of surface on a static model, we need to rebuild the whole BVH. Refitting is not possible because of the changes to the leaves. Notice that refitting up the hierarchy is not even necessary on the static model: we only need to make some changes to the leaves, nothing else.
Rebuilding the whole tree is of course not an option because it is horribly inefficient. Thus DXR completely prevents any form of fine-grained LOD, which is a primary open problem in computer graphics.
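To make the difference between refitting and editing leaves concrete, here is a toy CPU-side sketch on a BVH the application owns itself. It is not DXR and not a console API, and `BvhNode`, `refit` and `refine_leaf` are hypothetical names. `refit` recomputes boxes bottom-up with fixed topology, while `refine_leaf` swaps one leaf for finer children and reports whether any ancestor box would have to change.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def union(a: Box, b: Box) -> Box:
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)


def contains(outer: Box, inner: Box) -> bool:
    lo_ok = all(o <= i for o, i in zip(outer[0], inner[0]))
    hi_ok = all(o >= i for o, i in zip(outer[1], inner[1]))
    return lo_ok and hi_ok


@dataclass
class BvhNode:
    box: Box
    children: List["BvhNode"] = field(default_factory=list)  # empty list means leaf


def refit(node: BvhNode) -> Box:
    """Recompute boxes bottom-up; which node is whose child stays fixed."""
    if node.children:
        boxes = [refit(child) for child in node.children]
        box = boxes[0]
        for other in boxes[1:]:
            box = union(box, other)
        node.box = box
    return node.box


def refine_leaf(leaf: BvhNode, finer: List[BvhNode]) -> bool:
    """Swap a leaf for finer children (a local LOD increase).

    Returns True if no ancestor needs updating, i.e. the finer geometry
    stays inside the leaf's existing box.
    """
    new_box = finer[0].box
    for node in finer[1:]:
        new_box = union(new_box, node.box)
    leaf.children = list(finer)
    return contains(leaf.box, new_box)
```

In this toy, a LOD change on a static model touches only the edited leaf, which is the kind of local edit the post says an opaque acceleration structure does not allow.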

CarstenS said:
Just for clarification JoeJ, you are talking about DXR, not about Vulkan RT extensions? Is there something that's keeping IHVs from exposing their additional features there?

Although I use VK myself, I do not check for updates that often, but there are no real differences to DX12. There is no extension to expose AMD's intersection instructions, and nothing to expose the BVH either.
AFAIK, AMD has some other work to do, e.g. RT on Linux does not work at all yet. And I assume NV would prefer to keep things black boxed, so adding more HW specialization / changing the BVH data structure remains easy.

chris1515 said:
They would probably use offline BVH creation too for current Nanite version because this is only static geometry.

The storage cost would be very high. The ideal case would be to have only one AS, shared for both LOD management and RT. Nanite's cluster hierarchy could eventually work, with some changes to ensure it suits RT demands. At runtime, they could then convert this LOD hierarchy into an RT BVH, adding some more levels to have only N triangles per leaf instead of 100. Adding those levels would be efficient within a single compute workgroup.
So pulling this off is some serious work, but in the long run clearly the way to go.

HLJ said:
Show me... because every single game using RT on the consoles has been subpar compared to the PC implementation.

Tell me some examples of 'subpar on console, but fine on PC'. I paid attention only to Exodus and how it runs even on Series S. That's impressive, because on PC we do not even have an RT GPU with such low specs.
You do not want to compare Series S vs. RTX 3090, I guess? And even if we did, that's about performance only, not about 'implementation'.
Also, please notice that current games have no continuous LOD yet. UE5 is the first, and we see it's impossible for them to use that same detail for RT. They can only use constant LOD proxies, which in comparison is inefficient and loses the RT advantage of accuracy. So shadow maps are more accurate than RT in that case.

PSman1700 · Jul 9, 2021

It's quite telling that a 2060 needs to be used to gauge PS5 RT abilities. And even in that case I'd say the 2060 has the option to give superior results, see DF's Doom Eternal PC (2060) vs PS5 analysis.
2060 is a 6.4TF gpu launched more than 3 years before the ps5.

You're one of the only ones claiming PC isn't ahead in RT. It's quite the opposite: PC RT is ahead by about two generations. It is what it is.
Results and facts do matter. Even just looking at the paper specs, it's quite clear who's ahead.

The sole argument remains API restrictions, and who's to say AMD/NV won't open those up more down the line anyway? You can claim console has an advantage, but that can't be proven with results, in the same way it can't be proven that software won't improve on the PC side of things.

Anyway, it says enough if you have to divert from hardware and fall back on a barebones 'muh API problem' lol. The hardware's there, and I don't see NV, AMD or Intel locking up their GPU capabilities so much that their HW advantages shrink to console level.
Fortunately that's not what we're seeing, as PC RT results are magnitudes ahead, while a PS5 is competitive in ray tracing with a 6.5TF 2060 from 2018. Meh.

Hope you're satisfied with reflections in next gen games (R&C).

Metro on XSS has lower settings, again.

JoeJ · Jul 9, 2021

xpea said:
I'm not a game developer, but the last 30 years of 3D graphics hardware prove you wrong. Over and over it's the same story: for every feature, hardware acceleration enables what was/is impossible to do in real time in software (with maybe the exception of S3's broken Savage 2000 T&L).
It's always like that:
1- Software is too slow
2- dedicated Hardware acceleration comes to enable it
3- depending on the feature, dedicated Hardware moves to general compute block, when the overall processing power increases enough to absorb the performance penalty.

It's never the other way around. Never

All this is tangential to my point. Blackboxing data structures has nothing to do with hardware, how it evolves, or which problems it addresses.
It is the first time GPUs use complex data structures like BVH at all. So there is no history here we could look back at.
RT is not the same as rasterization. We are talking about a paradigm shift, something entirely new that requires traversing trees. Rasterization is brute force O(1) and simple, RT is work efficient O(log n) and more complex.
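The O(log n) part can be illustrated with a toy tree over n unit intervals on a line, standing in for a BVH over n primitives. This is only a sketch under simplifying assumptions (1D, perfectly balanced, non-overlapping primitives), and `IntervalNode`, `build`, `query` and `brute_force` are hypothetical names. It counts how many nodes a point query visits and compares that with testing every primitive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IntervalNode:
    lo: float
    hi: float
    left: Optional["IntervalNode"] = None
    right: Optional["IntervalNode"] = None
    prim: Optional[int] = None  # set on leaves only


def build(first: int, last: int) -> IntervalNode:
    """Balanced tree over unit intervals [i, i+1] for i in [first, last)."""
    if last - first == 1:
        return IntervalNode(float(first), float(last), prim=first)
    mid = (first + last) // 2
    return IntervalNode(float(first), float(last), build(first, mid), build(mid, last))


def query(root: IntervalNode, x: float) -> Tuple[List[int], int]:
    """Return (hit primitives, number of nodes visited) for a point x."""
    hits: List[int] = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not (node.lo <= x <= node.hi):
            continue  # the whole subtree is culled with one test
        if node.prim is not None:
            hits.append(node.prim)
        else:
            stack.extend((node.left, node.right))
    return hits, visited


def brute_force(n: int, x: float) -> Tuple[List[int], int]:
    """Test every primitive; always n tests."""
    hits = [i for i in range(n) if i <= x <= i + 1]
    return hits, n
```

For n = 1024 and x = 0.5 the tree query visits 21 nodes (the root plus two children per level over ten levels), while the brute-force version performs 1024 tests for the same answer.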
