AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    15,134
    Likes Received:
    7,679
    My understanding is the tensor cores only do matrix-matrix multiplication and accumulate.
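For reference, the matrix-multiply-accumulate primitive being described can be sketched in scalar code. This is only an illustration: real tensor cores operate on small fixed tiles (e.g. fp16 inputs with fp32 accumulation) in one hardware operation, and the tile size and types below are assumptions for clarity.

```cpp
#include <array>

// Scalar sketch of the tensor-core primitive: D = A*B + C on a small tile.
// Tile size (4x4) and float types are illustrative, not the real hardware shape.
constexpr int N = 4;
using Tile = std::array<std::array<float, N>, N>;

Tile mma(const Tile& a, const Tile& b, const Tile& c) {
    Tile d = c;  // start from the accumulator tile
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                d[i][j] += a[i][k] * b[k][j];  // fused multiply-accumulate
    return d;
}
```

The key point is that the entire triple loop is what the hardware collapses into one instruction; anything that can be phrased as "multiply two tiles, add into an accumulator" maps onto it.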
     
  2. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    I'd assume RDNA2 is fine when it comes to deploying neural networks. It supports quad-rate int8, so while the normal shader hardware will be occupied, inference should go quickly enough.
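The "quad rate int8" being referred to is a packed dot-product operation (DP4A-style): four int8 multiplies folded into one 32-bit accumulate per lane per clock. A scalar sketch of its semantics, with the packing convention assumed to be little-endian:

```cpp
#include <cstdint>

// Sketch of a DP4A-style packed-int8 dot product with 32-bit accumulate:
// treats each 32-bit word as four signed bytes, multiplies pairwise, and
// sums into the accumulator. This is what "quad rate int8" buys per lane.
int32_t dp4a(uint32_t a, uint32_t b, int32_t acc) {
    for (int lane = 0; lane < 4; ++lane) {
        int8_t ai = static_cast<int8_t>(a >> (8 * lane));
        int8_t bi = static_cast<int8_t>(b >> (8 * lane));
        acc += int32_t(ai) * int32_t(bi);
    }
    return acc;
}
```

Chained over the elements of a weight and activation vector, this is the inner loop of quantized inference, which is why int8 rate matters for deploying networks on shader cores.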

    The real question is what to do with it. The actual game-dev answer seems to be "animation", because graph-based decision making is basically what animation is about anyway, and it's a huge pain to do by hand. As for image upscaling, I can see it for eliminating TAA artifacts; that's really what DLSS is good at (much less blur than TAA), but the upscaling itself is a bit of nonsense. You can clearly see the large amount of noise it introduces on Control's cleaner surfaces. If you wanted TAA-style noise you could just tweak TAA settings and post-sharpening.

    And as for practical compute efficiency in gaming, RDNA2 clearly wins. As long as it's not waiting on hardware RT (an obvious Nvidia win) and isn't bottlenecked by bandwidth to main memory (almost certainly deferred games during the G-buffer pass at 4K), the 6900 XT can equal a 3090 at over a hundred watts less power draw. All of Nvidia's theoretical compute power is useless from a gaming perspective, and even for pure compute loads it's less efficient per watt; though if you're using Blender or rendering video that doesn't matter, because what matters there is that Nvidia has the faster card with more RAM.

    Unfortunately for AMD, right now there are games with heavy RT use optimized for Nvidia, so they get clobbered in some benchmarks there, and they deserve it. Same with deferred 4K games. They should've seen the bottleneck during design and known they needed more bandwidth to main memory, but for whatever reason they didn't address it. And it's not as if deferred rendering is going anywhere, nor do the consoles have the same limitations.

    Both vendors made design mistakes concerning gaming this generation. For now Nvidia is on top though. Of course, a year from now there could easily be more Godfalls, where even people's "great deal, OMG the 3080 is the best" $700 cards can't hit max settings. But explaining that to consumers never seems to work until after the fact.
     
  3. Rootax

    Veteran

    Joined:
    Jan 2, 2006
    Messages:
    2,400
    Likes Received:
    1,845
    Location:
    France
    Design mistakes = trade-offs, I think. They're not dumb; they know what's up, but you have a power/price/size/driver-friendliness/etc. balance to find, under time constraints (releasing a product between day X and day Y).
     
  4. gamervivek

    Regular

    Joined:
    Sep 13, 2008
    Messages:
    805
    Likes Received:
    320
    Location:
    india
    I'm wondering how well a 6-SE, 120-CU part, without the cache taking up all that area and using HBM2 instead, would've worked with RT. 50% more RT units than the 6900 XT: enough to make it on par with a 3090?
     
    PSman1700 likes this.
  5. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    RDNA2 doesn’t have it but it has nothing to do with accelerating BVH building.

    It accelerates traversal in the presence of instanced geometry (e.g. building a forest by reusing the same tree many times with different poses).
     
  6. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    The TLAS contains instances for every object in a scene, whose geometry is stored in BLASes. If different instances refer to the same BLAS, that's instancing.
    Not sure why "Instance Transform Acceleration" should refer just to instancing; it may as well refer to instance and BLAS transformations in general.
    By accelerating AABB transformations, a lot of optimisations become possible at BLAS build time: faster refitting, better AABB alignment for geometry, etc.
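The TLAS/BLAS relationship being discussed can be sketched with a simplified instance descriptor, loosely modeled on the shape of D3D12_RAYTRACING_INSTANCE_DESC (the field names and types here are illustrative, not the exact API layout):

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch of TLAS input: each instance carries its own
// object-to-world transform plus a reference to a (possibly shared) BLAS.
struct InstanceDesc {
    float    transform[3][4];  // row-major 3x4 object-to-world matrix
    uint32_t instanceID;
    uint64_t blasAddress;      // GPU address of the bottom-level AS
};

// Instancing: many descriptors pointing at the same BLAS, differing only
// in their transforms (e.g. a forest built from one tree BLAS).
std::vector<InstanceDesc> makeForest(uint64_t treeBlas, int count) {
    std::vector<InstanceDesc> tlasInput;
    for (int i = 0; i < count; ++i) {
        InstanceDesc d{};
        d.transform[0][0] = d.transform[1][1] = d.transform[2][2] = 1.0f;
        d.transform[0][3] = float(i) * 10.0f;  // spread trees along X
        d.instanceID = uint32_t(i);
        d.blasAddress = treeBlas;              // shared geometry
        tlasInput.push_back(d);
    }
    return tlasInput;
}
```

The point of the structure is that the tree geometry exists once; only the small per-instance records (transform + reference) multiply with the instance count.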
     
    pharma and Dictator like this.
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    My understanding is that instance transforms are done just in time during intersection testing. It’s not relevant during BVH builds because those just use the “default” orientation for each instanced object.
     
    OlegSH likes this.
  8. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    What would happen if hardware doesn't support instance transforms?
    Following the description here - "This data structure is used in GPU memory during acceleration structure build" and "Per customer request, clarified for D3D12_RAYTRACING_INSTANCE_DESC that implementations transform rays as opposed to transforming all geometry/AABBs."
    You might be right that with HW acceleration it can happen during intersection testing, still some BVH builder assistance might be required for cases without HW acceleration.
     
    #2028 OlegSH, Dec 25, 2020
    Last edited: Dec 25, 2020
    pharma and Dictator like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    Yes the metadata for the orientation of each instance in world space is included in the TLAS structure. That data is provided by the application as is. No acceleration required here during BVH build.

    “This C++ struct definition is useful if generating instance data on the CPU first then uploading to the GPU.”

    The bit that seems to be accelerated on the GPU is the transformation of each individual instance based on its world space orientation during intersection testing. If AMD doesn’t have any special hardware to do that transform (either the ray or the instance) then presumably they’re doing it on the SIMDs.

    The alternative is to create unique BLAS entries for each instance during BVH build but that would likely be very wasteful.
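The spec language quoted earlier ("implementations transform rays as opposed to transforming all geometry/AABBs") amounts to the following: map the ray into each instance's object space with the inverse instance transform, so one BLAS serves every instance unchanged. A minimal sketch, assuming a translation-only instance transform to keep the inverse trivial:

```cpp
// Sketch of "transform the ray, not the geometry": instead of moving every
// triangle/AABB into world space per instance, the ray is mapped into the
// instance's object space and tested against the shared, untouched BLAS.
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };

// Illustrative instance transform: translation only, so the inverse is
// just a subtraction. Real transforms need a full inverse 3x4 matrix.
struct InstanceXform { Vec3 translation; };

Ray toObjectSpace(const Ray& world, const InstanceXform& xf) {
    Ray obj = world;
    obj.origin.x -= xf.translation.x;  // apply the inverse translation
    obj.origin.y -= xf.translation.y;
    obj.origin.z -= xf.translation.z;
    // the direction is unaffected by a pure translation
    return obj;
}
```

This is the per-instance work that either dedicated transform hardware or the SIMDs have to do just in time during traversal, which is what the discussion above is weighing.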
     
    #2029 trinibwoy, Dec 25, 2020
    Last edited: Dec 25, 2020
    pjbliverpool, OlegSH and BRiT like this.
  10. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    797
    Likes Received:
    1,624
    Yep, I thought about this variant, but doing the transforms per ray on the SIMDs is probably cheaper; I have no idea, to be honest.
     
  11. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    So Turing vs RDNA2 RT showdown:

    MineCraft RTX: 2080Ti is 35% faster than 6900XT
    Amid Evil RTX: 2080Ti is 45% faster than 6900XT
    Black Ops: 2080Ti is 12% faster than 6900XT
    Tomb Raider: 6900XT is 8% faster than 2080Ti
    Metro Exodus: 6900XT is 9% faster than 2080Ti
    Control: 2080Ti is equal to 6900XT
    Battlefield V: 2080Ti is equal to 6900XT

    The more ray tracing there is, the further Turing pulls ahead, confirming that Turing does indeed have better RT performance than RDNA2. I suspect the scenes WCCFTECH tested in Control, Battlefield, Metro and Tomb Raider didn't have much ray tracing in them, allowing the 6900XT to equal the 2080Ti; if RT were heavily present in the scene, the 2080Ti would pull ahead, just like in Minecraft. I am waiting for the Digital Foundry big showdown to confirm this.

     
  12. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    Is RDNA2 still known for having broken visuals across various games when using ray tracing? Anyway, Computerbase.de once again makes the point that RDNA2 competes better in the recently released Black Ops and Watch Dogs Legion. Black Ops even has better 0.2% lows on RDNA2, except in the 3840 × 2160 test.

    https://www.computerbase.de/2020-12...itt_benchmarks_in_sieben_topaktuellen_spielen

    It might very well turn out that RDNA2 will generally always be bad at ray tracing, but it feels like people put too much faith in the titles whose DXR was only optimized for Nvidia hardware, which was the only vendor to offer it from 2018 up until now.
     
    no-X and Lightman like this.
  13. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    The Black Ops and Watch Dogs benches on Computerbase are old, taken with broken AMD drivers; the difference is rather large in these titles with proper drivers.

    Even the 3070 is 20% faster than 6800XT in Call Of Duty Black Ops with RT @4K.


    Watch Dogs Legion benchmarked after the AMD RT patch: the 6800XT remains slower than the 3070, while the 3080 is 37% faster @1440p and 50% faster @2160p.


    I also stress that it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures, it's not enough to generally select some random scenes and be done with it.
     
    HLJ, OlegSH, pharma and 2 others like this.
  14. Svensk Viking

    Regular

    Joined:
    Oct 11, 2009
    Messages:
    627
    Likes Received:
    208
    The Computerbase test is actually for the 6900XT and from the eighth of December, so it's actually more than two weeks more recent than the Black Ops video you posted.

    The Watch Dogs video is from the 17th of December though, so nine days newer and probably a more representative test.
     
    no-X and Deleted member 13524 like this.
  15. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,240
    Likes Received:
    3,397
    Both Cold War and Legion run better than average on AMD hardware without RT, and this likely skews the RT results in AMD's favor as well.
     
    PSman1700 likes this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Cold War RT @4K: the 3090 is 66% faster than the 6900XT, and the 3080 is 50% faster. The 1440p results are not logical, as they have the 3090 only 18% faster than the 3070, suggesting a different bottleneck in the scene they selected.

    https://www.computerbase.de/2020-12...-in-call-of-duty-black-ops-cold-war-3840-2160

    Again, it is very important to select scenes where RT is present in moderate to large amounts to properly test RT performance across architectures, it's not enough to generally select some random scenes and be done with it.
     
    pharma, PSman1700 and Rootax like this.
  17. The biggest difference I see between these two games and e.g. Control is that they have RT running on the RDNA2 consoles, meaning they had to include optimizations for AMD's ray tracing units.
    In early PC implementations like Control, the AMD RT hardware is only running code that was optimized for Nvidia's RT units.


    I always thought it a bit naive to assume RT performance in DXR is some super predictable process that scales linearly and equally across all GPU architectures,
    i.e. "it's just plain DXR, so there's no reason to believe this game, whose RT implementation was co-developed by Nvidia, would favor one architecture over the other."
    I guess this is just empirical proof of that.


    Perhaps the RT performance we're seeing in Cold War and Legion is more representative of what to expect from future multiplatform titles than what we've had with designed-for-RTX titles.
    The GA102 GPUs still get a substantial advantage in RT over the Navi 21 GPUs, but not the 30%+ deltas we're seeing in the older RTX titles.
     
    #2037 Deleted member 13524, Dec 25, 2020
    Last edited by a moderator: Dec 25, 2020
  18. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    Yes, that makes sense; some seem to forget that normal rendering continues even during RT scenes.
     
  19. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    946
    Likes Received:
    413
    The raytracing pipe is basically the OptiX pipeline minus some flexibility; parts of Nvidia's OptiX software stack were [allegedly] recycled for RTX as well.

    Remember that HLSL itself comes from Nvidia; it was called Cg back then. Geometry shaders also stem from Nvidia, and constant buffers come from Nvidia too.

    Tessellation and mesh shaders can be traced to AMD in terms of functionality, but the pipeline-stage convention was brought forth by MS together with all the others.

    There never was something like an ISA (say, from Microsoft) which the hardware implemented, like ARM or x86. It was always opportunistic and reactive from MS, and at a really high level. I don't know who failed whom here, but I would prefer that MS actively invent a forward-looking ISA, one which can be extended and/or made optional (like SSE, AVX). Or a consortium could. Or AMD's involvement with Samsung could lead to basically an establishment of a situation like x86, where multiple vendors co-develop the ISA.
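The SSE/AVX model being invoked, a baseline ISA plus optional extensions that software probes at runtime, can be sketched as a feature-bit scheme. The bit assignments and names below are invented for illustration, not any real ISA's:

```cpp
#include <cstdint>

// Toy sketch of an extensible ISA: a fixed baseline plus optional
// extensions advertised through feature bits (the CPUID/SSE/AVX pattern).
// All names and bit positions here are hypothetical.
enum Feature : uint32_t {
    FEAT_BASELINE = 1u << 0,
    FEAT_SIMD128  = 1u << 1,  // hypothetical SSE-like extension
    FEAT_SIMD256  = 1u << 2,  // hypothetical AVX-like extension
};

bool supports(uint32_t featureMask, Feature f) {
    return (featureMask & f) != 0;
}

// Dispatch to the widest available code path, falling back to baseline.
// Returns 2 for the 256-bit path, 1 for 128-bit, 0 for baseline.
int pickKernelLevel(uint32_t featureMask) {
    if (supports(featureMask, FEAT_SIMD256)) return 2;
    if (supports(featureMask, FEAT_SIMD128)) return 1;
    return 0;
}
```

The appeal of this scheme is exactly what the post describes: vendors can ship hardware with different extension sets while software stays portable by probing and dispatching at runtime.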
     
    no-X and Deleted member 13524 like this.
  20. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Actually, there's a similar proposition by Agner Fog for a hybrid CISC/RISC forward-compatible ISA: https://www.forwardcom.info/
     