GPU Ray Tracing Performance Comparisons [2021-2022]

In path traced games like Quake 2 RTX, the A770 is massively behind NVIDIA, to the point that it behaves more like an AMD GPU: the 3060 is 70% faster than the A770, and 80% faster than the RX 6650 XT.

Elsewhere, the A770 is 40% faster than the 3060 in Metro Exodus, Dying Light 2 and Hitman 3 (as expected), but less than 10% faster in Cyberpunk 2077, Doom Eternal, Deathloop and Ghostwire Tokyo.

Worth remembering that the A770 is using a chip which is in fact more complex than GA104, so unless it beats the 3070 I don't see how we can say that Intel has (relatively) good RT performance.
 
The A770 is a good alternative to RDNA2, not so much a 3060/Ti. Though it sports a generous 16GB framebuffer, which is nice. It's also their first GPU.
 
The A770 is a good alternative to RDNA2, not so much a 3060/Ti. Though it sports a generous 16GB framebuffer, which is nice. It's also their first GPU.
Too bad it's priced against the 3060 for the most part, which makes it a lot less competitive with cheaper AMD options like the 6600 and 6600 XT.
 
Worth remembering that the A770 is using a chip which is in fact more complex than GA104
Agreed, the A770 enjoys a healthy lead in memory bandwidth over the RTX 3060 (560GB/s vs 360GB/s), operates at higher clocks (2100MHz vs 1800MHz), has a higher FP32 core count (4096 vs 3584) and a higher ray tracing core count (32 vs 28), so it's only natural that the A770 beats the 3060 in ray tracing. Though it's obvious it's beating it on raw brute-force specs alone (35% higher FLOPs and 55% higher bandwidth), and not by some better RT engine that Intel has.
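
Out of curiosity I sanity-checked those percentages. Quick throwaway C++ sketch, using the rounded clocks above (nothing here is official vendor data):

```cpp
// Quick sanity check of the raw-spec deltas quoted above.
// FP32 throughput = cores * 2 ops/clock (FMA) * clock.
#include <cstdio>

int main() {
    const double a770_tflops    = 4096 * 2 * 2.10 / 1000.0; // ~17.2 TFLOPS
    const double rtx3060_tflops = 3584 * 2 * 1.80 / 1000.0; // ~12.9 TFLOPS
    const double a770_bw = 560.0, rtx3060_bw = 360.0;       // GB/s

    // ~+33% with these rounded clocks (~+35% against the 3060's actual 1777 MHz boost)
    std::printf("FLOPs advantage:     %+.0f%%\n",
                (a770_tflops / rtx3060_tflops - 1.0) * 100.0);
    // ~+56%, rounded down to 55% above
    std::printf("Bandwidth advantage: %+.0f%%\n",
                (a770_bw / rtx3060_bw - 1.0) * 100.0);
}
```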
 
and not by some better RT engine that Intel has.

I don't know, some of the results for the games that have been 'optimised' (I use that term loosely) seem to perform better than the difference in RT units suggests.

Factor in that certain games will have been built with RTX in mind (Quake 2 RTX, for example) and I think Intel's RT implementation may actually be better than Nvidia's 3000 series efforts.

I would love to know why Metro Exodus and Hitman are so much faster on the A770 compared to the 3060.
 
I would love to know why Metro Exodus and Hitman are so much faster on the A770 compared to the 3060.
The difference in FLOPs alone explains that.
seem to perform better than the difference in RT units suggests
You also have the FP32 and memory bandwidth differences.

will have been built with RTX in mind (Quake 2 RTX, for example)
Quake 2 RTX is optimized for Vulkan RT now, for all GPUs.
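
For what it's worth, Quake 2 RTX originally required NVIDIA's vendor-specific VK_NV_ray_tracing extension and, if I remember right, was later ported (around v1.6) to the cross-vendor VK_KHR_ray_tracing_pipeline path, which is what lets it run on Intel and AMD at all. A minimal sketch of the kind of capability check involved (my own illustration, not the game's actual code):

```cpp
// Sketch: detect whether a Vulkan physical device exposes the cross-vendor
// ray tracing pipeline extension (plus its required companion extension).
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

bool supportsKhrRayTracing(VkPhysicalDevice device) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> exts(count);
    vkEnumerateDeviceExtensionProperties(device, nullptr, &count, exts.data());

    bool hasPipeline = false, hasAccel = false;
    for (const auto& e : exts) {
        if (std::strcmp(e.extensionName, VK_KHR_RAY_TRACING_PIPELINE_EXTENSION_NAME) == 0)
            hasPipeline = true;
        if (std::strcmp(e.extensionName, VK_KHR_ACCELERATION_STRUCTURE_EXTENSION_NAME) == 0)
            hasAccel = true;
    }
    // VK_KHR_ray_tracing_pipeline also requires VK_KHR_acceleration_structure.
    return hasPipeline && hasAccel;
}
```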

I think Intel's RT implementation may actually be better than Nvidia's 3000 series efforts
I don't think so, to be honest. The 3060 Ti is far ahead of the A770, despite being a closer match on specs than the 3060.
 
In path traced games like Quake 2 RTX, the A770 is massively behind NVIDIA, to the point that it behaves more like an AMD GPU: the 3060 is 70% faster than the A770, and 80% faster than the RX 6650 XT.

Elsewhere, the A770 is 40% faster than the 3060 in Metro Exodus, Dying Light 2 and Hitman 3 (as expected), but less than 10% faster in Cyberpunk 2077, Doom Eternal, Deathloop and Ghostwire Tokyo.

Great find for the Q2 RTX benchmarks. The one thing I found quite curious is how the A770 LE 16GB and the A750 8GB are within <1% of each other, essentially within the range of measurement error.

The A770 LE has a ~10% memory bandwidth uplift, along with ~14% more functional units across the board, including RT ones, and a small clockspeed bump vs the A750 8GB.
And yet performance is identical. If they were overflowing the on-chip caches you'd figure the memory bandwidth bump would help. I almost wonder if there's a CPU limitation or some other odd driver bottleneck at play there. I haven't been able to find any published benchmarks for BabyArc (A380), but I'd be very curious to see how it performed in Q2 RTX.

EDIT: Aha, I just need to search in German:

It's only 720p, but the A380 is doing way better than you'd think there, given that it's essentially 1/4 of an A770 LE.
 

Attachments: q2rtx-a380.PNG (13.2 KB)
One more curious Intel Arc performance result of note, comparing the A770 LE 16GB vs the A750 8GB:


Check out the Riftbreaker benchmarks with RT on. The A750 is roughly half the speed of the A770LE, even at 1080p, which makes no sense at all.
It doesn't appear that the 8GB of VRAM is limiting anything, as it's being beaten by the 6GB 2060, along with all the other 8GB cards in the comparison.

Even weirder, and more evidence that the VRAM capacity has nothing to do with it: relative to the rest of the product stack, it somehow does better at 4K.
It performs strangely poorly at all resolutions though, so it's also clearly not an artifact from one benchmark run gone wrong.
 

Attachments: riftbreaker-a750vs770le.PNG (64.5 KB)
Sorry if this isn't the right place to ask, but do we have a game with RT that supports both Vulkan and DX12? I'm curious to know whether Vulkan can be faster than DX for RT, or the opposite, or the same...
 
Sebbi made some very important discussion points regarding the thread sorting hardware; I will list them here for a quick read.

Let's discuss shader permutation hell with the latest hardware: Intel's Thread Sorting Unit (TSU) and Nvidia's Shader Execution Reordering (SER). Now that the RTX 4090 is massively CPU bound, could we spend 1% of that perf to get rid of shader permutations?

These new hardware blocks shuffle the registers of multiple SIMDs in a way that each SIMD can run coherent threads. This is super important for ray-tracing and explains why Intel's mid range GPU is so good at ray-tracing, but also explains why the RTX 4090 is such a beast in RT apps.

But these hardware blocks are not just a great fit for ray-tracing. They could be used to make GPU dynamic branching faster in all shaders. As a result, we could write CPU-style shader code with branches, instead of compiling (hundreds of) thousands of permutations.

Even with hardware like this, it's not free to shuffle SIMD data around. There would be a slight performance hit. CPUs have to pay similar costs for branches too. But CPUs are now fast enough to make this a minor annoyance. I think these GPUs are starting to be there too.

Also, the RTX 4090 is so fast that we desperately need better API support for GPU-driven rendering. We need a fine grained way of spawning new GPU work from shaders. Mesh shaders are great, but they are still lacking the ability to select the shader like ray-tracing does.

These thread sorting units finally make me want to do some HW ray-tracing. The DXR API is still not a perfect fit for GPU SIMD execution, but at least it's not dead stupid anymore. But please, give me access to this magical HW block also for traditional shaders!
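
To make the coherence argument concrete, here is a toy C++ model (my own sketch, not how either vendor's hardware actually works): it charges each 32-wide wave one serialized pass per distinct hit shader it contains, then compares random ray-to-shader assignment against threads globally sorted by shader ID, which is roughly the effect TSU/SER approximate in hardware at much finer granularity.

```cpp
// Toy model of why thread sorting/reordering helps under SIMD divergence:
// a 32-wide wave must execute every distinct hit shader present in it
// serially, so a wave's cost ~= the number of distinct shader IDs it holds.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

constexpr int kWaveSize = 32;

// Total passes = sum over waves of (distinct shader IDs in that wave).
long totalPasses(const std::vector<int>& shaderIds) {
    long passes = 0;
    for (size_t i = 0; i < shaderIds.size(); i += kWaveSize) {
        size_t end = std::min(i + kWaveSize, shaderIds.size());
        std::vector<int> wave(shaderIds.begin() + i, shaderIds.begin() + end);
        std::sort(wave.begin(), wave.end());
        passes += std::unique(wave.begin(), wave.end()) - wave.begin();
    }
    return passes;
}

int main() {
    const int numRays = 1 << 20, numShaders = 16;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist(0, numShaders - 1);

    std::vector<int> ids(numRays);
    for (int& id : ids) id = dist(rng);  // incoherent rays: random hit shader per ray

    long unsorted = totalPasses(ids);
    std::sort(ids.begin(), ids.end());   // "thread sorting": group rays by shader
    long sorted = totalPasses(ids);

    std::printf("passes unsorted: %ld, sorted: %ld (%.1fx fewer)\n",
                unsorted, sorted, double(unsorted) / sorted);
}
```

With 16 shaders, a random wave holds about 14 distinct shader IDs on average, while the sorted run lands near 1 per wave, roughly a 14x reduction in serialized passes in this toy. That's the order-of-magnitude win sebbi is pointing at.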


 
Let's discuss shader permutation hell with the latest hardware: Intel's Thread Sorting Unit (TSU) and Nvidia's Shader Execution Reordering (SER). Now that the RTX 4090 is massively CPU bound, could we spend 1% of that perf to get rid of shader permutations?
I believe this is the case:
SER requires developer integration
TSU is handled automatically

So I wonder if it's even possible to really expose the TSU.
It would definitely be interesting if they got exposed the way sebbi wants.
 
I don't see why not, it should be s/w controllable in any case.
A more interesting question is whether Alchemist would get a performance boost in some situations if it were disabled.
I would need to double check what they said, but I wouldn't assume it's software controllable if it's totally invisible and automatically used.
Especially if they didn't expect to expose it.
Remember this is also a first gen product, so things like making it accessible could've been very far down the list of design considerations if it's automatic.
I'm not saying it's definitely not viable.
 
I would need to double check what they said, but I wouldn't assume it's software controllable if it's totally invisible and automatically used.
I'm fairly sure that all GPU vendors can control most of a GPU's innards through s/w (BIOS/MC and drivers). Whether this can be exposed through drivers into a public API is another issue, which may not have a positive answer if such exposure would lead to more problems than performance wins.
 
Sebbi made some very important discussion points regarding the thread sorting hardware; I will list them here for a quick read.

Let's discuss shader permutation hell with the latest hardware: Intel's Thread Sorting Unit (TSU) and Nvidia's Shader Execution Reordering (SER). Now that the RTX 4090 is massively CPU bound, could we spend 1% of that perf to get rid of shader permutations?

These new hardware blocks shuffle the registers of multiple SIMDs in a way that each SIMD can run coherent threads. This is super important for ray-tracing and explains why Intel's mid range GPU is so good at ray-tracing, but also explains why the RTX 4090 is such a beast in RT apps.

But these hardware blocks are not just a great fit for ray-tracing. They could be used to make GPU dynamic branching faster in all shaders. As a result, we could write CPU-style shader code with branches, instead of compiling (hundreds of) thousands of permutations.

Even with hardware like this, it's not free to shuffle SIMD data around. There would be a slight performance hit. CPUs have to pay similar costs for branches too. But CPUs are now fast enough to make this a minor annoyance. I think these GPUs are starting to be there too.

Also, the RTX 4090 is so fast that we desperately need better API support for GPU-driven rendering. We need a fine grained way of spawning new GPU work from shaders. Mesh shaders are great, but they are still lacking the ability to select the shader like ray-tracing does.

These thread sorting units finally make me want to do some HW ray-tracing. The DXR API is still not a perfect fit for GPU SIMD execution, but at least it's not dead stupid anymore. But please, give me access to this magical HW block also for traditional shaders!


The least CPU-bound metrics I've seen for the RTX 4090 were in a GTA V video I shared in another thread, where the CPU shows 4% usage but the 4090 hits 100% usage, when running GTA V at 16K.

Whenever I get the A770 delivered, one of the games I want to play the most, and haven't seen benchmarked in any A770 video or article review, is Resident Evil 2 Remake, one of my favourite games ever, just to check how efficient it can be now that Capcom has added RT under DirectX 12 in the latest patch.
 