AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)


Here is what I have heard, and seen direct documentation of, regarding the next-gen console SDK environments.
Xbox Series X and PS5 actually do offer DXR equivalents to work with - which are black boxes just like DXR 1.0 and 1.1 are. From what I know already, a big-name game has shipped (or will ship) on PlayStation 5 using the DXR equivalent there with good results (some might even say great results). It makes a lot of sense to offer such a DXR-like API for RT on Xbox and PlayStation, as they want RT to be easy to implement and cross-platform.

The things consoles have that are different in terms of "low level" ray tracing beyond DXR are the ability to load up an offline BVH for static things and the ability to write your own traversal code. Unlike NV hardware (Turing, Ampere), which does traversal in the RT core (and accelerates it substantially), traversal on RDNA chips is done on the CUs, competing for compute resources, but it is consequently programmable. The traversal kernel is provided in an opaque, DXR-like way on both PlayStation 5 and Xbox Series X (although there will be differences here in how quick the traversal kernel is, as Sony and Microsoft wrote different default ones), but a developer can also go "to the metal" and elect to write their own traversal kernel for their own game should they want to. So you could use a different BVH depth if you wanted on console, or other custom things - something not at all provided in DXR. On the PC side, AMD has its own traversal kernel in the AMD driver and it cannot be accessed through DXR. Nvidia is doing whatever the heck Nvidia is doing there in hardware on the RT core.
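To make "write your own traversal kernel" a bit more concrete, here is a minimal CPU-style sketch of the shape such a kernel takes. The BvhNode/Ray types and names are made up for illustration - the real console kernels are GPU wave code built around the hardware box/triangle intersection instructions - but it shows where the programmability lives: the stack size, the visit order, the early-out policy and the leaf handling are all up to the developer.

```cpp
// Hypothetical types for illustration only.
struct AABB { float lo[3], hi[3]; };
struct BvhNode {
    AABB bounds;
    int  left = -1, right = -1;   // -1,-1 means leaf
    int  firstPrim = 0, primCount = 0;
};
struct Ray { float org[3], invDir[3], tMax; };

// Standard slab test against a node's bounds.
static bool hitAABB(const AABB& b, const Ray& r)
{
    float tNear = 0.0f, tFar = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float t0 = (b.lo[a] - r.org[a]) * r.invDir[a];
        float t1 = (b.hi[a] - r.org[a]) * r.invDir[a];
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tNear) tNear = t0;
        if (t1 < tFar)  tFar  = t1;
        if (tNear > tFar) return false;
    }
    return true;
}

// Returns how many leaves the ray touched; a real kernel would run
// ray-triangle tests (or a custom intersection routine) in the leaf branch.
int traverse(const BvhNode* nodes, const Ray& ray)
{
    int stack[32];      // the depth budget is the developer's choice here
    int sp = 0, leavesTouched = 0;
    stack[sp++] = 0;    // assume node 0 is the root
    while (sp > 0) {
        const BvhNode& n = nodes[stack[--sp]];
        if (!hitAABB(n.bounds, ray))
            continue;
        if (n.left < 0) {
            ++leavesTouched;        // leaf: intersect primitives here
        } else {
            stack[sp++] = n.left;   // a custom kernel could order children
            stack[sp++] = n.right;  // front-to-back, cull by tMax, etc.
        }
    }
    return leavesTouched;
}
```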

There is a bit of a double-edged sword to what Yuriy is mentioning there regarding the greater programmability, as the programmability comes at the expense of hardware acceleration to a degree on AMD. Yes, there is more control with the ability to write your own traversal on consoles versus the DXR spec. A custom traversal kernel *can* have the advantage on a per-application basis, depending on how quick the default traversal kernel provided by Sony or MS is in that same use case. Though I now know of at least two developers that have used the default Sony traversal kernel as they thought it was great as is. But at the same time, you are optimising something that runs *slower* on AMD hardware and competes for resources, precisely because AMD is not hardware-accelerating it like Nvidia is. On the NV side, you are not writing your own traversal due to DXR, but also, would you really need or want to if it is being hardware accelerated anyway and is faster?

That is the way I have come to understand this from developer discussions and looking at the behind-the-scenes documents.
 

I think his comment concerns ray tracing performance on AMD RDNA 2 GPUs; maybe he thinks more flexibility will help. For sure it does not affect RTX cards.

EDIT: This is the reason I replied to the video of Godfall running on a 6800 XT.
 
But at the same time, you are optimising something that runs *slower* on AMD hardware and competes for resources, precisely because AMD is not hardware-accelerating it like Nvidia is. On the NV side, you are not writing your own traversal due to DXR, but also, would you really need or want to if it is being hardware accelerated anyway and is faster?

Thanks for the insight. Presumably the benefit of programmable traversal is that you can implement more targeted optimizations for your specific app and ultimately beat the generic hardware accelerated implementation. Of course that depends on how much an advantage the hardware impl starts out with.

It would be weird if Sony’s and Microsoft’s default traversal shaders on consoles are more efficient than AMD’s on PC. This topic deserves more than tweets I think. Really interesting stuff.
 
Thx for the info. I had thought low-level programming on consoles was still quite complex. Would you say it's easier than a high-level PC API like DX11? Would these launch, cross-gen games already be exploiting enough console optimization to have such a gaping performance gap as alluded to by Frenetic Pony?

I'm certainly biased. I find no APIs overly complex, relatively speaking. What's bothersome on PC, with all APIs, is that you don't really know the cost of anything, drivers can change all the time, and the higher-level APIs can have dramatic shifts in function-call costs. New workarounds are bolted on frequently. Serving a market with an unknown performance budget (as the PC market is) creates polluted / complicated code-bases, and produces a lot of reasonable and unreasonable eye-balling regarding the quality presets and enabled features that are offered. A man-life is only so long.
The console APIs are reliable; they get more stable over time, but they don't suddenly change their performance. Mostly you have access to everything you could get access to, and long before the console's release. I personally can find my Zen moments programming there: I can know everything (including, at times, the precise implementation of the API itself), can count on it, can make good use of it, and know which space my creativity can explore.

Now if you throw cross-platform into the bag, all bets are off. The above is mostly for "one game, one architecture, one API, one platform". The capacity of developers and architects to design their code-base around this aspect is the distinguishing factor, not some particular API. There are not geniuses everywhere, and it's almost never a one-man show. It's annoying too, demotivating even at times. Poor performance on a console (with respect to what is theoretically possible on some HW) is rarely down to the API; poor performance on the PC (with respect to what is theoretically possible on some HW) is, more often than you would want, directly or indirectly down to the API.
Picking up a feature which is hard to get a grip on, on the PC, is up to each project. Obviously there are many stakeholders in the room together with you: the social sphere, the IHVs (some more than others), the company's bank account, the PR, the team composition and so on. There are cases where a correct technical judgement is overridden. "Drivers will improve", "Hardware will improve". This is not an argument on the consoles; you are unable to externalize a "problem" - if it's there, it stays there.

The public, in general, is often homing in on some sort of semi-understandable aspects, like hardware "numbers", or API "numbers" / revisions. But that's only a tiny fragment of everything.
I cannot tell you anything whatsoever about the problems or efficiency of a particular project I have not worked on, and I definitely cannot make a generalized statement about the entire game industry as a whole. Maybe there's no talent, maybe there's no time, maybe there's no money, maybe the software architecture is crappy, maybe the producer is crazy, maybe the wrong person got sick, maybe another company caused your company to panic, maybe there's a pandemic ... What I can tell you is that there are probably around 20-30 layers of operations involved in a AAA game, and those 20-30 stars often (!) don't align, and it's not correct to reflexively bash the IHV (which itself has the same 20-30 layers inside) and attribute all problems to "shitty" hardware, or incompetence, or whatever.

RDNA is what it is, and you can judge it based on its own merit. IMHO it's a fine architecture.

TL;DR: It is possible. But we just can't know.
 
The GeForce GTX 680 was considered to be more powerful than the Xbox One or PlayStation 4. These days there are games which run OK on those consoles but don't run acceptably on a GeForce GTX 680, even at the same resolution.

Yeah well, the Kepler GPUs just aged badly. The direct competitor to that 680 was the 7970 with either 3 or 6 GB of VRAM. Those GPUs aged much, much better, performing much better than the PS4/One S.

@Dictator
Thanks for the insight :)

This is the reason I replied to the video of Godfall running on a 6800 XT.

Without context it's hard to judge how well a 6800 performs compared to the PS5 version. The PS5 version is supposed to be running at 1440p, with some settings lowered and no ray tracing.

I'm certainly biased.

I think we're closing in on PC vs console system wars now ;)
 
Without context it's hard to judge how well a 6800 performs compared to the PS5 version. The PS5 version is supposed to be running at 1440p, with some settings lowered and no ray tracing.

I'm not speaking about the PS5 version. I'm speaking about Godfall's RT performance on RDNA2 PC GPUs. The Epic guy seems to think more flexibility can help on the AMD side.
 
I'm not particularly worried about RDNA2, Ampere, or Turing aging badly. I'd be a bit worried about RDNA because it lacks support for the DX Ultimate features that are performance-driven, like mesh shading and variable rate shading. If someone is happy to live without ray tracing, I don't think it's a big deal. I think it'll be supported more and more, but you definitely won't lose performance if you decide to forgo ray tracing for this console gen.
 
and ultimately beat the generic hardware accelerated implementation
Generic HW implementations tend to be faster by an order of magnitude in comparison with generic SW implementations (otherwise there is not much sense in doing a HW implementation).
My bet would be that specialized SW implementations can in some cases be way faster than generic SW implementations, but would still lose by a lot to a generic HW implementation.
I can't imagine what you need to do with SIMD to make it as fast on branchy code as MIMD. In the best case for SIMD, with coherent rays (no thread divergence), both should be limited by the ray-AABB and ray-triangle testing hardware, so the one with faster ray-AABB and ray-triangle testing hardware will win.
In the worst case, with incoherent rays, RT performance will be limited by data divergence with SW traversal on SIMD, and by memory with MIMD cores.
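To illustrate the divergence point with a toy example (not real GPU code, and the numbers are made up): a SIMD wave executes its lanes in lockstep, so it runs as long as its slowest ray, and with incoherent rays most lanes sit idle; on top of that, diverged lanes also fetch different BVH nodes, so the memory traffic stops coalescing.

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int WAVE = 32;
    // Hypothetical per-ray traversal lengths (number of BVH nodes visited).
    // Coherent rays visit similar node counts; incoherent rays vary wildly.
    std::vector<int> coherent(WAVE, 40);
    std::vector<int> incoherent;
    for (int i = 0; i < WAVE; ++i) incoherent.push_back(5 + (i * 7) % 90);

    auto utilization = [&](const std::vector<int>& steps) {
        int maxSteps = 0, total = 0;
        for (int s : steps) { if (s > maxSteps) maxSteps = s; total += s; }
        // Lockstep execution runs maxSteps iterations across all WAVE lanes,
        // but only 'total' lane-iterations do useful work.
        return double(total) / double(maxSteps * WAVE);
    };
    std::printf("coherent wave utilization:   %.0f%%\n", 100 * utilization(coherent));
    std::printf("incoherent wave utilization: %.0f%%\n", 100 * utilization(incoherent));
    return 0;
}
```

With the made-up numbers above, the incoherent wave lands at roughly half the lane utilization of the coherent one. Real workloads are messier, but that is the general shape of the problem.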
 
der8auer has his 6800 XT Red Devil running above 2700 MHz on the stock cooler. Bananas.

We're going to see higher, probably closing in on 3 GHz for this gen of RDNA2 GPUs. Very wide and blazing fast, with Infinity Cache to help out. In theoretical TF, that 6800 XT is probably not far off a 3080.
We thought 36 CUs at 2.2 GHz was narrow and fast. The 6800 XT: "hold my CUs".
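Back-of-the-envelope, using the usual peak-FP32 formula (shader count × 2 FLOPs/clock × clock): a 6800 XT pushed to 2.7 GHz is 4608 × 2 × 2.7 GHz ≈ 24.9 TF, while a 3080 at its 1.71 GHz boost is 8704 × 2 × 1.71 GHz ≈ 29.8 TF, with the usual caveat that Ampere's doubled FP32 doesn't translate one-to-one into game performance.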
 
Generic HW implementations tend to be faster by an order of magnitude in comparison with generic SW implementations (otherwise there is not much sense in doing a HW implementation).
My bet would be that specialized SW implementations can in some cases be way faster than generic SW implementations, but would still lose by a lot to a generic HW implementation.
I don't know if it's a useful comparison to make, but mesh shaders are a similar kind of "specialised software" replacing "generic hardware".

It's hard to beat the hardware if you're only "replicating" what the hardware does already. The reason to use mesh shaders is that you want to do something that the hardware can't do directly.

Yuriy O'Donnell appears to be complaining that an "opaque" BVH, which the developer cannot manipulate directly, is a serious problem all on its own. This data format is the core of the ray accelerator hardware, and so using something like "custom inline traversal" is doomed, because the hardware that accelerates use of the opaque BVH is the most serious bottleneck. You can't help that hardware with its data, because the BVH is inaccessible.

So customised inline traversal is pissing in the wind, since the BVH is "locked away".

One of the questions this all raises for me is the cost of building the BVH and the cost of updating it. If "next gen" games have geometry that's almost entirely dynamic (destruction everywhere as well as animation), what can you ray trace?
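For the update side of that question, the usual trade-off is refit vs. rebuild: keep the tree topology and just recompute the bounds bottom-up when geometry animates, and fall back to a full rebuild when things move too far (heavy destruction). Here is a rough sketch of a refit pass, assuming a hypothetical node layout where children are stored after their parent, so a single reverse sweep visits children before parents:

```cpp
#include <algorithm>
#include <vector>

struct AABB {
    float lo[3], hi[3];
    void grow(const AABB& o) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], o.lo[a]);
            hi[a] = std::max(hi[a], o.hi[a]);
        }
    }
};
struct Node {
    AABB bounds;
    int  left = -1, right = -1;  // -1,-1 means leaf
    int  prim  = -1;             // primitive index for leaves
};

// primBounds = freshly animated per-primitive AABBs for this frame.
// Assumes child indices are always greater than their parent's index.
void refit(std::vector<Node>& nodes, const std::vector<AABB>& primBounds)
{
    for (int i = int(nodes.size()) - 1; i >= 0; --i) {
        Node& n = nodes[i];
        if (n.left < 0) {
            n.bounds = primBounds[n.prim];      // leaf: take new bounds
        } else {
            n.bounds = nodes[n.left].bounds;    // interior: union of children
            n.bounds.grow(nodes[n.right].bounds);
        }
    }
}
```

The flip side is that a refitted BVH gets progressively worse as geometry deforms, and full rebuilds are expensive enough that they are usually amortized or done at lower quality, which is exactly why heavily dynamic scenes are the awkward case.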
 
I'm not particularly worried about RDNA2, Ampere, or Turing aging badly. I'd be a bit worried about RDNA because it lacks support for the DX Ultimate features that are performance-driven, like mesh shading and variable rate shading. If someone is happy to live without ray tracing, I don't think it's a big deal. I think it'll be supported more and more, but you definitely won't lose performance if you decide to forgo ray tracing for this console gen.

It will become a huge problem for RDNA, but I would assume it will take a while until mesh shaders and sampler feedback get fully used, as that needs significant changes to a game's code.

So for cross-gen, RDNA might still be fine, and RDNA users paid less than for competing GPUs; I guess they already expected limited longevity.

In regards to ray tracing, RDNA buyers clearly didn't care about ray tracing at all... the issue is, it will matter once ray tracing gets efficient enough to replace certain raster effects like global illumination (as we see with RTXGI), and then RDNA will lag significantly behind cards with DXR acceleration... if AMD cares about adding DXR support for RDNA, that is. It's fair to assume that will happen sooner rather than later: using ray tracing significantly decreases development time and cost, and by the end of cross-gen there will be a huge install base of cards with DXR acceleration, including the consoles. For cards without DXR acceleration, it can still be emulated in software with decent enough performance. I mean, yeah, visual fidelity and performance would have to be significantly dialed back for cards without DXR acceleration, but the game would still run, not excluding any users... again, if AMD cares about adding DXR support to RDNA. If not, then it's dead in the water.
 
I get a lot of Vulkan/DX12 vibes from DXR. Devs want low-level access. I wonder, if devs had low-level access, would it just mean a higher barrier of entry and even fewer games using ray tracing? Low-level access sounds like a great idea until one realises it means having to optimize for each HW architecture separately (Ampere, Turing, RDNA2, and small chips vs. large chips). It becomes a nightmare. At this point a black box might be better, trusting that the owner of the black box does the right optimizations for your game.

In an ideal world, devs would have the option to choose a low-level or high-level API. I suspect we will get there eventually, just like with DX12/Vulkan.
 