AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

AMD Radeon Pro W6600 GPU review - AEC Magazine
August 27, 2021
To cater to CAD users who want to embrace more demanding workflows, AMD recently launched the Radeon Pro W6600, one of two pro GPUs based on the company’s 7nm RDNA 2 architecture.

With 8 GB of GDDR6 memory and 10.4 Teraflops of Single Precision compute performance, the Radeon Pro W6600 is significantly less powerful than the AMD Radeon Pro W6800 (32 GB, 17.83 Teraflops). But it’s much more affordable. With an estimated street price of $649, it fits within a price bracket that architects are more likely to be comfortable with.
...
One of the key features of the AMD Radeon Pro W6600 is that it has hardware ray tracing built in. To test this out we used Unreal Engine’s Audi A5 convertible Automotive Configurator with DirectX Raytracing (DXR) enabled.
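(Side note on the 10.4 Teraflops figure above: FP32 throughput is just 2 FLOPs per shader per clock, so assuming the commonly listed W6600 configuration of 28 CUs / 1792 stream processors and a boost clock around 2.9 GHz, you get roughly 2 × 1792 × 2.9 GHz ≈ 10.4 TFLOPS. The W6800's 17.83 TFLOPS works out the same way from 3840 shaders at roughly 2.32 GHz.)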
 
As I understand it, Nvidia RTX cards need to build BVH structures as trees with 2 children/leaves per node, whereas the AMD RT cards can support BVH structures with both 2 and 4 children/leaves per node.
I would be fascinated to see supporting evidence for this on both NVidia and AMD.

RDNA 2 accelerates at the rate of 4 child nodes per parent-node query. I've never seen anything stated for NVidia. I would tend to guess that NVidia uses 8 children per parent node (ignoring child nodes that are leaves - no idea).
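For what it's worth, here is a toy sketch of what a 4-wide node buys you over a binary one. It is purely illustrative: the actual node encodings used by RDNA 2 and Turing/Ampere are proprietary, and the struct layouts below are made up.

// Toy illustration only: NOT how RDNA 2 or Turing/Ampere actually encode
// their BVH nodes. It just shows why testing 4 children per traversal step
// roughly halves the tree depth compared with a binary BVH.
#include <cmath>
#include <cstdint>
#include <cstdio>

struct Aabb { float mn[3]; float mx[3]; };

// Binary node: one parent-node query tests 2 child boxes.
struct Bvh2Node { Aabb childBounds[2]; int32_t child[2]; };

// 4-wide node: one parent-node query tests 4 child boxes at once,
// matching hardware that intersects 4 boxes per step.
struct Bvh4Node { Aabb childBounds[4]; int32_t child[4]; };

int main() {
    // A balanced k-ary tree over N leaves is about log_k(N) levels deep,
    // so roughly log2(N) steps for a BVH2 versus log2(N)/2 for a BVH4.
    const double leaves = 1e6;
    std::printf("BVH2 depth ~ %.1f levels\n", std::log2(leaves));
    std::printf("BVH4 depth ~ %.1f levels\n", std::log2(leaves) / 2.0);
    return 0;
}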

Is this transparent to games? (I think not..?)
Console games, at least, can likely take advantage here, simply because the hardware is a fixed target. Metro Exodus: Enhanced Edition appears to benefit in this way:

On the fun side, due to the specific way we utilize Ray Tracing on consoles, we are tied to exact formats for the acceleration structures in our BVH: formats that were changing with almost every SDK release. Constant reverse-engineering became an inherent part of the work-cycle. :)

EVERYTHING TECHNICAL ABOUT METRO EXODUS UPGRADE FOR PLAYSTATION 5 AND XBOX SERIES X|S — 4A Games (4a-games.com.mt)

Does the BVH building actually happen on the CPU or the GPU? Or is it workload/engine dependent?
I don't really understand this topic. It seems it can be done by either the CPU or the GPU. There are API functions that provide built-in BVH building. The geometry that's provided to those API functions appears to be extremely important in terms of overall performance (build time and then query time).

The API functions offer flags with varying trade-offs, for example favouring faster builds versus faster traversal.

The APIs for ray tracing (DirectX and Vulkan) are seriously complex and subtle.
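To make that a bit more concrete, here is a rough sketch of what a DXR bottom-level acceleration structure build looks like on the app side. Treat it as illustrative only: the helper name and the vertexBuffer, scratchBuffer and blasBuffer variables are placeholders, resource creation and barriers are omitted, and the key point is that the app only supplies geometry plus build flags while the driver/hardware decide the actual BVH layout.

#include <windows.h>
#include <d3d12.h>

// Hypothetical helper, for illustration only. The app describes the geometry
// and picks a build-flag trade-off; the driver owns the resulting BVH format.
void BuildBlasSketch(ID3D12Device5* device,
                     ID3D12GraphicsCommandList4* cmdList,
                     ID3D12Resource* vertexBuffer, UINT vertexCount,
                     ID3D12Resource* scratchBuffer, ID3D12Resource* blasBuffer)
{
    D3D12_RAYTRACING_GEOMETRY_DESC geom = {};
    geom.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geom.Triangles.VertexBuffer.StartAddress = vertexBuffer->GetGPUVirtualAddress();
    geom.Triangles.VertexBuffer.StrideInBytes = sizeof(float) * 3;
    geom.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
    geom.Triangles.VertexCount = vertexCount;

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INPUTS inputs = {};
    inputs.Type = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL;
    // The main trade-off exposed to the app:
    //   PREFER_FAST_TRACE -> slower build, faster ray traversal (static scenery)
    //   PREFER_FAST_BUILD -> cheaper (re)builds, e.g. for deforming geometry
    inputs.Flags = D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PREFER_FAST_TRACE;
    inputs.NumDescs = 1;
    inputs.DescsLayout = D3D12_ELEMENTS_LAYOUT_ARRAY;
    inputs.pGeometryDescs = &geom;

    // Ask the driver how much result/scratch memory it wants (real code sizes
    // scratchBuffer/blasBuffer from prebuild.*SizeInBytes), then record the
    // build; the work itself runs on the GPU timeline when the list executes.
    D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO prebuild = {};
    device->GetRaytracingAccelerationStructurePrebuildInfo(&inputs, &prebuild);

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC build = {};
    build.Inputs = inputs;
    build.ScratchAccelerationStructureData = scratchBuffer->GetGPUVirtualAddress();
    build.DestAccelerationStructureData = blasBuffer->GetGPUVirtualAddress();
    cmdList->BuildRaytracingAccelerationStructure(&build, 0, nullptr);
}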
 
help me out with what might be a dumb Q.

As I understand it, Nvidia RTX cards need to build BVH structures as trees with 2 children/leaves per node, whereas the AMD RT cards can support BVH structures with both 2 and 4 children/leaves per node.

Is this transparent to games? (I think not..?)
Are there any games that support building BVH structures for RT using the 4 leaves per node, or any benchmarks?

Basically, is there any app or game that would allow for a like-for-like (or close to that) comparison of the perf impact of each approach?
Does the BVH building actually happen on the CPU or the GPU? Or is it workload/engine dependent?

Sorry if these are dumb questions.

Nvidia patents mention 8 child nodes but they’ve never said how it actually works on Turing and Ampere. What makes you think it’s 2?

Quake II RTX reports BVH update times. A comparison between the 3080 and 6800 XT was posted a while back. However, PC games have no control over the structure of the BVH. It’s up to the hardware/driver to decide.

https://forum.beyond3d.com/posts/2185240/
 
Thanks for the answers!
That forum post from earlier in this thread is very informative/useful.
I see that was posted Dec 24! Probably why I missed it; I started a 3-week holiday on the 22nd :)
I've still to get to the 4A Games one, but on a quick skim it looks very informative.

Thanks guys!

I can't find any solid info for my belief that Nvidia does 2-node BVHs vs AMD's 4-node trees;
after a bit of googling I'm going to guess I was simply wrong there.
I also saw that in the latest DF weekly video they mention BVH building is all on the GPU,
so I'm guessing my CPU expertise doesn't really match what's actually going on on the GPU.
 
Workstation GPU Viewport Performance: CATIA, SolidWorks, Siemens NX & More – Techgage
September 14, 2021
In this article, we’re going to get up to speed with the latest performance from our collection of cards, including those new Radeon Pro W6600 and W6800 models.

Unfortunately, our GPU lineup currently lacks any of the latest Ampere-based workstation GPUs from NVIDIA, but given the performance we’ll see from the Turing cards, it won’t be too challenging to surmise where NVIDIA’s latest (and greatest) cards would stand among the rest.
 
ComputerBase has made more extensive power consumption comparison between AMD and NVIDIA. https://www.computerbase.de/2021-09/grafikkarten-tests-leistungsaufnahme/

I picked the 6800 XT and 3080 because they are pretty close to each other in Doom Eternal consumption and performance. I also picked only 1440p and 1080p, since both can't achieve 144 FPS at 4K, which is relevant for this comparison (performance from https://www.computerbase.de/2021-07/doom-eternal-raytracing-dll-test/2/)
1440p: 6800 XT 249.2 FPS / 295W vs 3080 239.4 FPS / 319W
1080p: 6800 XT 323.8 FPS / 294W vs 3080 292 FPS / 316W

Not too far off. But things get really interesting when they turn on the FPS limiter at 144 FPS:
1440p: 6800 XT 194W vs 3080 252W
1080p: 6800 XT 153W vs 3080 213W

What kind of architectural differences explain the fact that RDNA2 consumption drops so low compared to Ampere when limiting performance? Infinity Cache eliminating memory accesses?
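Working the numbers above into rough efficiency figures: uncapped at 1440p the 6800 XT is at about 249.2 / 295 ≈ 0.84 FPS/W versus 239.4 / 319 ≈ 0.75 FPS/W for the 3080, roughly a 13% gap. With the 144 FPS cap at 1440p it becomes 144 / 194 ≈ 0.74 FPS/W versus 144 / 252 ≈ 0.57 FPS/W, roughly a 30% gap.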
 
Maybe they should do FPS-limiting by underclocking the CPU massively?

I don't understand how FPS limiters work...
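For what it's worth, at its simplest an FPS limiter just delays the next frame until a per-frame time budget has elapsed, so the GPU sits partially idle (and can drop clocks) instead of rendering flat out. A minimal CPU-side sketch; real limiters (in-game, driver, RTSS) do smarter pacing, and render_frame() here is a placeholder:

#include <chrono>
#include <thread>

// Minimal frame limiter sketch: render a frame, then sleep until the next
// frame deadline so the GPU is not kept 100% busy.
void run_frame_limited(double target_fps) {
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double> frame_budget(1.0 / target_fps);
    auto next_deadline = clock::now() + frame_budget;

    for (;;) {
        // render_frame();  // placeholder: submit one frame's worth of GPU work

        // If the frame finished early, wait out the remainder of the budget.
        std::this_thread::sleep_until(next_deadline);
        next_deadline += frame_budget;
    }
}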
 
What kind of architectural differences explain the fact that RDNA2 consumption drops so low compared to Ampere when limiting performance? Infinity Cache eliminating memory accesses?

It's possible: if you look at the numbers at 4K resolution, where the IC is not as effective, the differences become smaller. GDDR6X is also quite power hungry (the GeForce 3070, which does not use GDDR6X, is more comparable to the similarly positioned 6700 XT).
 
What kind of architectural differences explain the fact that RDNA2 consumption drops so low compared to Ampere when limiting performance? Infinity Cache eliminating memory accesses?
Ampere being clocked close to its limit to compete is probably a factor. Node differences as well. The cache is probably the biggest aspect, though. That highly clocked memory Nvidia is using has to be power hungry.
 
It's possible: if you look at the numbers at 4K resolution, where the IC is not as effective, the differences become smaller. GDDR6X is also quite power hungry (the GeForce 3070, which does not use GDDR6X, is more comparable to the similarly positioned 6700 XT).
4K results with the limiter aren't really relevant for comparison, since one card can't reach the FPS limit and the other is barely over it; it's pretty much the same as testing without any limiter.

edit: for clarification, the main point is whether it's the IC or something else that explains why RDNA2's consumption drops so much compared to Ampere while neither is pushed to its full capacity, given that at full capacity they're close to each other.
 
Ampere being clocked close to its limit to compete is probably a factor. Node differences as well. The cache is probably the biggest aspect, though. That highly clocked memory Nvidia is using has to be power hungry.

Yeah, it’s probably a combination of all these things. Do we know how much each card is dropping clocks? GDDR6X consumption probably doesn’t change much at all at lower fps. Then there’s the manufacturing process, cache, etc. Lots of variables.
 
Also, there's a difference in how AMD and Nvidia GPUs underclock: typically Radeons ramp their core frequency down quite aggressively (it drops to almost 2D clocks if GPU load is lower than 70% or 80%), while GeForces typically stay at almost maximum clocks until load drops to 40% or so (people who own an Nvidia GPU can correct me on that).
 
Also, there's a difference in how AMD and Nvidia GPUs underclock: typically Radeons ramp their core frequency down quite aggressively (it drops to almost 2D clocks if GPU load is lower than 70% or 80%), while GeForces typically stay at almost maximum clocks until load drops to 40% or so (people who own an Nvidia GPU can correct me on that).
This was more true on my Pascal 1080 Ti than it is on my current Ampere 3080 Ti. I've been playing a lot of Space Engineers with the kiddos lately, and even with graphics knobs cranked to 11, the GPU stays between 40 and 70 percent utilized. The clocks hover around the 500-1200 MHz area, quite a bit lower than the peak rate of ~1900 MHz it's capable of. However, to be fair, I have the GPU limited to 60 Hz because that's the upper refresh rate of my Dell U2711 monitor. Perhaps it's the frame limiter which allows this behavior, rather than just purely vsync? Dunno.
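If anyone wants to check this on their own card, core clock, utilization and power can be polled via NVML (the library behind nvidia-smi). A rough sketch, not from this thread; it assumes GPU index 0 and keeps error handling to a bare minimum:

#include <nvml.h>
#include <chrono>
#include <cstdio>
#include <thread>

// Poll graphics clock, GPU utilization and board power once a second for a
// minute, e.g. while a frame cap is active, to see how the card downclocks.
int main() {
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDevice_t dev;
    nvmlDeviceGetHandleByIndex(0, &dev);

    for (int i = 0; i < 60; ++i) {
        unsigned int coreMHz = 0, powerMilliwatts = 0;
        nvmlUtilization_t util = {};
        nvmlDeviceGetClockInfo(dev, NVML_CLOCK_GRAPHICS, &coreMHz);
        nvmlDeviceGetUtilizationRates(dev, &util);
        nvmlDeviceGetPowerUsage(dev, &powerMilliwatts);
        std::printf("core %u MHz, util %u%%, power %.1f W\n",
                    coreMHz, util.gpu, powerMilliwatts / 1000.0);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    nvmlShutdown();
    return 0;
}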
 
Kindly check all the company cheerleading at the door. It doesn't make for constructive technical discussions.
 
Memory bandwidth on consumer GPUs isn't doubling any time soon, so it seems likely NVidia will do something like Infinity Cache with Lovelace.

Also, performance per watt and per unit of bandwidth during maximum ray tracing (Cyberpunk on Psycho and Metro Exodus: Enhanced Edition, both without DLSS) should be a data point in a discussion of efficiency. Ray tracing on AMD eats tons of power, so NVidia should have a massive advantage there.

Cap the framerate to 30 fps at 1080p, comparing the 3090 and 6900 XT, and see what happens, if you also want to investigate an equal-framerate scenario...
 
We had 512-bit and 448-bit buses at one point, years ago. Why is that out of the question now? Price? It wasn't a massive problem before...
 