AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

I'd assume RDNA2 is fine when it comes to deploying neural networks. It supports quad-rate INT8, so even though inference takes up the normal shader hardware, it should go quickly enough.
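To make the "quad rate int8" point concrete, here's a minimal C++ sketch of a packed INT8 dot product: four 8-bit multiply-accumulates folded into one accumulator, which (assuming I'm reading the ISA and HLSL docs right) is what one V_DOT4_I32_I8 / dot4add_i8packed operation does per lane per cycle, hence the throughput advantage for inference.

```cpp
#include <cstdint>

// Plain-C++ spelling of a packed INT8 dot product: four 8-bit
// multiply-accumulates into a 32-bit accumulator. Assumption: this matches
// the semantics of RDNA2's dot-product instruction / HLSL dot4add_i8packed.
int32_t dot4_i8(uint32_t a_packed, uint32_t b_packed, int32_t acc)
{
    for (int i = 0; i < 4; ++i) {
        int8_t a = static_cast<int8_t>(a_packed >> (8 * i));
        int8_t b = static_cast<int8_t>(b_packed >> (8 * i));
        acc += static_cast<int32_t>(a) * static_cast<int32_t>(b);
    }
    return acc;
}
```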

The real question is what to do with it. The actual game-dev answer seems to be "animation", because graph-based decision making is basically what animation is about anyway, and it's a huge pain to do by hand. As for image upscaling, I can see it for eliminating TAA artifacts; that's really what DLSS is good at (much less blur than plain TAA), but the upscaling itself is a bit of a nonsense. You can clearly see the large amount of noise it introduces on Control's cleaner surfaces. If you wanted that level of noise you could just tweak TAA settings and post sharpening.

And as for per-task efficiency: practical compute efficiency for gaming clearly goes to RDNA2 here. As long as it's not waiting on hardware RT (an obvious Nvidia win) and isn't bottlenecked by bandwidth to main memory (nigh certainly deferred games in the G-buffer pass at 4K), the 6900XT can equal a 3090 at over a hundred watts less power draw. All of Nvidia's hypothetical compute power is useless from a gaming perspective; even for pure compute loads it's less efficient per watt, but if you're using Blender or rendering video that doesn't matter. What matters there is that Nvidia has the faster card with more RAM.

Unfortunately for AMD, right now there are games with heavy RT use that are optimized for Nvidia, so they get clobbered in some benchmarks there, and they deserve it. Same with deferred games at 4K: they should've seen the bottleneck during design and known they needed more bandwidth to main memory, but for whatever reason they didn't do it. And it's not like deferred rendering is going anywhere, nor do the consoles share the same limitation.

Both vendors made design mistakes concerning gaming this generation. For now Nvidia is on top, though. Of course, a year from now there could easily be more Godfalls, where even people's "great deal, OMG the 3080 is the best" $700 cards can't hit max settings. But explaining that to consumers never seems to work until after the fact.
 
I'm wondering how well a 6-SE, 120-CU part would've worked with RT, without the cache taking up all that area and using HBM2 instead. 50% more RT units than the 6900XT; enough to make it on par with a 3090?
 
https://www.nvidia.com/content/dam/...pere-GA102-GPU-Architecture-Whitepaper-V1.pdf
Look at Table 3 in the GA102 whitepaper.
There is a feature called "Instance Transform Acceleration"; it's related to BVH building (it probably costs the SIMDs in GA102 nothing to apply the instance transformation, i.e. move/rotate a box or a model).
Do we know whether RDNA2 supports this feature?

RDNA2 doesn't have it, but it has nothing to do with accelerating BVH building.

It accelerates traversal in the presence of instanced geometry (e.g. building a forest by reusing the same tree many times with different poses).
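For anyone unfamiliar with what instancing means at the DXR API level, here's a minimal CPU-side sketch (Windows SDK DXR headers assumed; treeBlasAddress and the transforms are placeholders I made up): every TLAS entry is a D3D12_RAYTRACING_INSTANCE_DESC carrying a 3x4 object-to-world transform plus the GPU address of a BLAS, so a whole forest can point at one tree BLAS.

```cpp
// Hypothetical helper: fill a TLAS instance buffer for a "forest" where every
// tree reuses the same BLAS with a different transform.
#include <d3d12.h>
#include <cstring>
#include <vector>

std::vector<D3D12_RAYTRACING_INSTANCE_DESC> BuildForestInstances(
    D3D12_GPU_VIRTUAL_ADDRESS treeBlasAddress, unsigned treeCount)
{
    std::vector<D3D12_RAYTRACING_INSTANCE_DESC> instances(treeCount);
    for (unsigned i = 0; i < treeCount; ++i) {
        D3D12_RAYTRACING_INSTANCE_DESC& desc = instances[i];
        // Row-major 3x4 object-to-world transform: identity rotation here,
        // each tree just offset along X. A real scene would vary rotation/scale.
        float tx = 10.0f * static_cast<float>(i);
        float transform[3][4] = {
            { 1.0f, 0.0f, 0.0f, tx   },
            { 0.0f, 1.0f, 0.0f, 0.0f },
            { 0.0f, 0.0f, 1.0f, 0.0f },
        };
        std::memcpy(desc.Transform, transform, sizeof(transform));
        desc.InstanceID = i;
        desc.InstanceMask = 0xFF;                      // visible to all rays
        desc.InstanceContributionToHitGroupIndex = 0;  // shared hit group
        desc.Flags = D3D12_RAYTRACING_INSTANCE_FLAG_NONE;
        desc.AccelerationStructure = treeBlasAddress;  // same BLAS every time
    }
    return instances;
}
```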
 
It accelerates traversal in the presence of instanced geometry (e.g. building a forest by reusing the same tree many times with different poses).
The TLAS contains an instance for every object in a scene, while the geometry itself is stored in BLASes. If different instances refer to the same BLAS, that's instancing.
Not sure why "Instance Transform Acceleration" should refer just to instancing; it may as well refer to instance and BLAS transformations in general.
By accelerating AABB transformations, a lot of optimisations become possible at BLAS build time: faster refitting, better AABB alignment for geometry, etc.
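To make "accelerating AABB transformations" concrete, here's an illustrative sketch (my own minimal types, not any vendor's API) of the kind of operation being discussed: taking a BLAS-local box through an instance's 3x4 object-to-world matrix to get a conservative world-space box, as you would when building or refitting the TLAS.

```cpp
#include <algorithm>

// Illustrative only: transform a local-space AABB by a 3x4 object-to-world
// matrix and return a conservative world-space AABB, using the per-element
// min/max trick instead of transforming all eight corners.
struct Aabb { float min[3]; float max[3]; };

Aabb TransformAabb(const Aabb& box, const float m[3][4])
{
    Aabb out;
    for (int r = 0; r < 3; ++r) {
        // Start from the translation column, then add each axis contribution,
        // picking whichever of (m*min, m*max) extends the bound.
        out.min[r] = out.max[r] = m[r][3];
        for (int c = 0; c < 3; ++c) {
            float a = m[r][c] * box.min[c];
            float b = m[r][c] * box.max[c];
            out.min[r] += std::min(a, b);
            out.max[r] += std::max(a, b);
        }
    }
    return out;
}
```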
 
TLAS contains instances for every object of a scene, which are stored in BLASes. If different instances refer the same BLAS, that's instancing.
Not sure why the "Instance Transform Acceleration" should refer just to instancing, it may as well be referring the instance and BLAS transformations in general.
By accelerating AABB transformations, a lot of optimisations become possible at BLAS build time - faster refitting, better AABB alignment for geometry, etc.

My understanding is that instance transforms are done just in time during intersection testing. They're not relevant during BVH builds because those just use the "default" orientation of each instanced object.
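A sketch of what the DXR spec's "implementations transform rays as opposed to transforming all geometry/AABBs" looks like in practice: at the TLAS-to-BLAS boundary the ray is taken into the instance's object space with the inverse of its 3x4 transform, so the BLAS itself stays in its default orientation. Types and names below are illustrative only, and the inverse matrix is assumed precomputed.

```cpp
// Illustrative types only. invTransform = inverse of the instance's
// object-to-world matrix (3x4, row-major), which takes a world-space ray
// into BLAS object space.
struct Ray { float origin[3]; float dir[3]; };

Ray WorldRayToObjectSpace(const Ray& worldRay, const float invTransform[3][4])
{
    Ray objectRay = {};
    for (int r = 0; r < 3; ++r) {
        // Points get the full affine transform (linear part + translation)...
        objectRay.origin[r] = invTransform[r][0] * worldRay.origin[0]
                            + invTransform[r][1] * worldRay.origin[1]
                            + invTransform[r][2] * worldRay.origin[2]
                            + invTransform[r][3];
        // ...directions only the linear part (no translation).
        objectRay.dir[r] = invTransform[r][0] * worldRay.dir[0]
                         + invTransform[r][1] * worldRay.dir[1]
                         + invTransform[r][2] * worldRay.dir[2];
    }
    return objectRay;
}
```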
 
My understanding is that instance transforms are done just in time during intersection testing
What would happen if the hardware doesn't support instance transforms?
Following the description here - "This data structure is used in GPU memory during acceleration structure build" and "Per customer request, clarified for D3D12_RAYTRACING_INSTANCE_DESC that implementations transform rays as opposed to transforming all geometry/AABBs."
You might be right that with HW acceleration it can happen during intersection testing, but some BVH builder assistance might still be required for cases without HW acceleration.
 
What would happen if the hardware doesn't support instance transforms?
Following the description here - "This data structure is used in GPU memory during acceleration structure build" and "Per customer request, clarified for D3D12_RAYTRACING_INSTANCE_DESC that implementations transform rays as opposed to transforming all geometry/AABBs."
You might be right that with HW acceleration it can happen during intersection testing, but some BVH builder assistance might still be required for cases without HW acceleration.

Yes, the metadata for the orientation of each instance in world space is included in the TLAS structure. That data is provided by the application as-is. No acceleration is required here during the BVH build.

“This C++ struct definition is useful if generating instance data on the CPU first then uploading to the GPU.”

The bit that seems to be accelerated on the GPU is the transformation of each individual instance based on its world space orientation during intersection testing. If AMD doesn't have any special hardware to do that transform (of either the ray or the instance), then presumably they're doing it on the SIMDs.

The alternative is to create unique BLAS entries for each instance during BVH build but that would likely be very wasteful.
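Rough back-of-the-envelope on why duplicating BLASes per instance would be wasteful: the 5 MB tree BLAS and the 1000-tree forest below are assumed numbers purely for illustration, while the 64 bytes is just the size of a D3D12_RAYTRACING_INSTANCE_DESC. Instancing pays for one BLAS plus N small descriptors; duplicating pays for N full BLASes.

```cpp
#include <cstdio>

int main()
{
    const double blasBytes     = 5.0 * 1024 * 1024;  // one tree BLAS (assumed size)
    const double instanceBytes = 64.0;               // one TLAS instance descriptor
    const int    treeCount     = 1000;               // assumed forest size

    const double instanced  = blasBytes + treeCount * instanceBytes;
    const double duplicated = static_cast<double>(treeCount) * blasBytes;

    std::printf("instanced:  %.1f MB\n", instanced  / (1024.0 * 1024.0));
    std::printf("duplicated: %.1f MB\n", duplicated / (1024.0 * 1024.0));
    return 0;
}
```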
 
So Turing vs RDNA2 RT showdown:

Minecraft RTX: 2080Ti is 35% faster than 6900XT
Amid Evil RTX: 2080Ti is 45% faster than 6900XT
Black Ops: 2080Ti is 12% faster than 6900XT
Tomb Raider: 6900XT is 8% faster than 2080Ti
Metro Exodus: 6900XT is 9% faster than 2080Ti
Control: 2080Ti is equal to 6900XT
Battlefield V: 2080Ti is equal to 6900XT

The more ray tracing there is, the further Turing pulls ahead, confirming that Turing does indeed have better RT performance than RDNA2. I suspect the scenes WCCFTECH tested in Control, Battlefield, Metro and Tomb Raider didn't have that much ray tracing in them, allowing the 6900XT to be equal to the 2080Ti; if RT were heavily present in the scene the 2080Ti would pull ahead, just like in Minecraft. I am waiting for the big Digital Foundry showdown to confirm this.

 
Is RDNA2 still known for having broken visuals across various games when using ray tracing? Anyway, Computerbase.de once again makes the point that RDNA2 competes better in the recently released Black Ops and Watch Dogs Legion. Black Ops even shows better 0.2% lows on RDNA2, except in the 3840x2160 test.

https://www.computerbase.de/2020-12...itt_benchmarks_in_sieben_topaktuellen_spielen

It might very well turn out that RDNA2 will generally always be bad at ray tracing, but it feels like people put too much faith in titles whose DXR was only optimized for Nvidia hardware, which was the only vendor to offer it from 2018 up until now.
 
Computerbase.de once again makes the point that RDNA2 competes better in the recently released Black Ops and Watch Dogs Legion. Black Ops even shows better 0.2% lows on RDNA2, except in the 3840x2160 test.

https://www.computerbase.de/2020-12...itt_benchmarks_in_sieben_topaktuellen_spielen
The Black Ops and Watch Dogs benches in Computerbase are old, done with broken AMD drivers; the difference is rather large in these titles with proper drivers.

Even the 3070 is 20% faster than 6800XT in Call Of Duty Black Ops with RT @4K.

Watch Dogs Legion benchmarked after the AMD RT patch, the 6800XT remains slower than the 3070, while the 3080 is 37% faster @1440p and 50% faster @2160p.

I also stress that it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures; it's not enough to just select some random scenes and be done with it.
 
The Computerbase test is actually for the 6900XT and from the eighth of December, so it's more than two weeks more recent than the Black Ops video you posted.

The Watch Dogs video is from the 17th December though, so nine days newer and probably a more representative test
 
The Computerbase test is actually for the 6900XT and from the eighth of December, so it's more than two weeks more recent than the Black Ops video you posted.
Cold War RT @4K: the 3090 is 66% faster than the 6900XT, and the 3080 is 50% faster. The 1440p results are not logical, as they show the 3090 being only 18% faster than the 3070, suggesting a different bottleneck in the scene they selected.

https://www.computerbase.de/2020-12...-in-call-of-duty-black-ops-cold-war-3840-2160

Again, it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures; it's not enough to just select some random scenes and be done with it.
 
Anyway, Computerbase.de once again makes the point that RDNA2 competes better in the recently released Black Ops and Watch Dogs Legion. Black Ops even shows better 0.2% lows on RDNA2, except in the 3840x2160 test.

The biggest difference I see between these two games and e.g. Control is that they have RT running on the RDNA2 consoles, meaning they had to get optimizations for AMD's ray tracing units.
In early PC implementations like Control, AMD's RT hardware is running code that was only optimized for nvidia's RT units.


I always thought it a bit naive to assume that RT performance in DXR is this super predictable process that will scale linearly and equally across all GPU architectures.
I.e. "it's just plain DXR, so there's no reason to believe this game, whose RT implementation was co-developed by nvidia, would be favoring one architecture over the other".
I guess this is just empirical proof of that.


Perhaps the RT performance we're seeing in Cold War and Legion is more representative of what to expect from future multiplatform titles than what we've had with designed-for-RTX titles.
The GA102 GPUs still get a substantial advantage in RT over the Navi 21 GPUs, but not the 30+% deltas we're seeing in the older RTX titles.
 
I.e. "it's just plain DXR, so there's no reason to believe this game, whose RT implementation was co-developed by nvidia, would be favoring one architecture over the other".
I guess this is just empirical proof of that.

The ray tracing pipeline is basically the OptiX pipeline without some of its flexibility; parts of Nvidia's OptiX software stack were [allegedly] recycled for RTX as well.

Remember that HLSL itself is from Nvidia, called Cg back then. Geometry shaders also stem from Nvidia. Constant Buffers come from Nvidia too.

Tessellation and mesh shaders can be traced to AMD in terms of functionality, but the pipeline stage convention was brought forth by MS together with all the others.

There never was something like an ISA (say, from Microsoft) which the hardware implemented, like ARM or x86. It was always opportunistic and reactive from MS, and at a really high level. I don't know who failed whom here. But I would prefer that MS invent an ISA actively (forward-looking), which can be extended and/or kept optional (like SSE, AVX). Or a consortium could. Or AMD's involvement with Samsung could lead to basically the establishment of a situation like x86, where multiple vendors co-develop the ISA.
 
But I would prefer that MS invent an ISA actively (forward-looking), which can be extended and/or kept optional (like SSE, AVX). Or a consortium could. Or AMD's involvement with Samsung could lead to basically the establishment of a situation like x86, where multiple vendors co-develop the ISA.
Actually, there's a similar proposition by Agner Fog for a hybrid CISC/RISC forward-compatible ISA: https://www.forwardcom.info/
 