AMD RDNA3 Specifications Discussion Thread


This information came from AMD itself: a file called "performance.hpp," part of an update to the company's ROCm software platform, contained some exciting specifications. The file has since been taken down, but Kepler (@Kepler_L2) took a screenshot of a rather important part and posted it on social media.

The Navi 32 has 60 CUs (made up of 30 WGPs), and the Navi 33 has 32 CUs (16 WGPs). This means that the most shaders a Navi 32-based (RX 7700) graphics card can have is 3840, which isn't a lot, to be honest. However, the new architecture can double up on shader throughput, although this does not guarantee the same level of performance in practice.

Conversely, on paper, the little Navi 33 GPU (RX 7600) appears to be equivalent to the Radeon RX 6650 XT. Compared with RDNA 2, Navi 32 has a 50% higher CU count than Navi 22 - 20 extra CUs - while Navi 33 stays at the same 32 CUs as Navi 23. In practice, performance should be roughly on par with the Radeon RX 6800. Although it can achieve solid 4K60 performance in many games, we anticipate that Navi 33's small cache and smaller design will make it more challenging to scale to high resolutions.

 
The gist is that instead of traversing deeper into the tree you can decide to stochastically sample just one of the triangles underneath the current node and use that as an approximation for the pixel covered by that node.
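A rough sketch of that gist, with made-up data structures (the contiguous per-subtree triangle range and the footprint test are assumptions for illustration, not how any real traversal works):

```cpp
#include <cstdint>
#include <random>

// Toy node: assumes every node knows the contiguous range of triangles below it.
struct Node
{
    float    extent;        // world-space size of the node's bounds
    bool     isLeaf;
    uint32_t firstTriangle; // triangles of the whole subtree, stored contiguously
    uint32_t triangleCount;
    uint32_t child[2];      // binary tree, for simplicity
};

// Stop descending once the node is smaller than the ray/pixel footprint and
// return one random triangle below it as a stand-in for the whole subtree.
uint32_t RepresentativeTriangle(const Node* nodes, uint32_t idx,
                                float rayFootprint, std::mt19937& rng)
{
    const Node& n = nodes[idx];
    if (n.isLeaf || n.extent < rayFootprint)
    {
        std::uniform_int_distribution<uint32_t> pick(0, n.triangleCount - 1);
        return n.firstTriangle + pick(rng); // ideally area-weighted, not uniform
    }
    // Otherwise keep traversing (proper child selection / ray tests elided).
    return RepresentativeTriangle(nodes, n.child[0], rayFootprint, rng);
}
```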
A way better approach: Instead of making assumptions from internal BVH boxes (which won't represent the surface well at all), make those boxes the leaves, containing just one triangle (or two) which is a best fit of the geometry and materials.
Then you can use classical RT HW, you get the same wins, no need for new HW.
So basically we talk about LOD here again. And this example is exactly what brought me to the conclusion that hierarchical LOD and acceleration structures can (and probably should) be the same thing.
But i don't think using one for both needs is practical yet. For now it would be good enough if we could convert one to the other, saving the costs to build at runtime and enabling fine grained LOD for RT.

Do you have specific examples in mind of useful api extensions? The DXR interface is essentially “build an acceleration structure with this bag of triangles”. It does not mandate that the structure is a BVH or anything else. This gives maximum flexibility to the IHV and more room to innovate rapidly on the hardware side. The downside is that it’s completely opaque to developers.
Specifying the data structures they use needs no examples. But we can discuss what a 'BVH API' should look like.
Basically we would require new compute shader statements / built in functions to traverse, modify, and create the tree.
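Purely as a strawman - none of these functions exist in DXR or Vulkan, and the names are invented - such built-ins could look roughly like this:

```cpp
#include <cstdint>

// Invented declarations for a hypothetical 'BVH API'; the handle is opaque
// because the actual node format would stay vendor specific.
struct BvhNode { uint64_t opaque; };

BvhNode BvhGetRoot(uint64_t accelerationStructure);
BvhNode BvhGetChild(BvhNode node, uint32_t childIndex);
bool    BvhIsLeaf(BvhNode node);
// Turn an internal node into a leaf over a given triangle range (collapse).
void    BvhMakeLeaf(BvhNode node, uint32_t firstTriangle, uint32_t triangleCount);
// Allocate children under a leaf and hand back the first one (expand).
BvhNode BvhMakeInternal(BvhNode node, uint32_t childCount);
// Return a whole sub-branch to the driver's / application's allocator.
void    BvhFreeSubtree(BvhNode node);
```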
Following the Nanite example, there would be two things to handle: Collapsing clusters to a single one with lower details, and the other way around.
I make the assumption that the LOD hierarchy tree also makes a good BVH for tracing, which maybe isn't the case for Nanite, but would be possible.

So, on the collapse, we would need to make internal nodes leaves, make the child pointers point to triangles instead of child nodes, and free the no longer needed sub-branch nodes from memory.
On the expand, we would need to allocate and generate the new child nodes, turn triangle pointers into child node pointers, and free the triangle memory.
(Something like that, to make a simplified example)
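A minimal CPU-side sketch of that bookkeeping, with an assumed node layout (a real version would run in a compute shader against the vendor's opaque format, and the free-list handling is only hinted at in comments):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node layout for a BVH that doubles as a LOD hierarchy.
// All names and the layout are made up for illustration.
struct Aabb { float min[3]; float max[3]; };

struct LodNode
{
    Aabb     bounds;
    bool     isLeaf;     // leaf -> firstIndex points at triangles, else at child nodes
    uint32_t firstIndex; // index into the node pool or the triangle pool
    uint32_t count;      // number of children or triangles
};

struct LodBvh
{
    std::vector<LodNode>  nodes;     // node pool (free-list management omitted)
    std::vector<uint32_t> triangles; // triangle index pool

    // Collapse: turn an internal node into a leaf referencing a pre-simplified
    // triangle set; the sub-branch it replaced goes back to a free list.
    void Collapse(uint32_t nodeIdx, uint32_t simplifiedTris, uint32_t triCount)
    {
        LodNode& n = nodes[nodeIdx];
        // freeSubtree(n.firstIndex, n.count);   // release no-longer-needed child nodes
        n.isLeaf = true;
        n.firstIndex = simplifiedTris;           // now points at coarser triangles
        n.count = triCount;
    }

    // Expand: the reverse - child nodes holding the detailed geometry are
    // allocated, and the coarse triangles the leaf was holding are freed.
    void Expand(uint32_t nodeIdx, uint32_t newChildren, uint32_t childCount)
    {
        LodNode& n = nodes[nodeIdx];
        // freeTriangles(n.firstIndex, n.count); // release coarse triangles
        n.isLeaf = false;
        n.firstIndex = newChildren;              // now points at detailed child nodes
        n.count = childCount;
    }
};
```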

The main challenge here seems to be the memory management. I'd be fine with doing this myself. Likely such functionality is used only by a few, so it does not have to be super easy to use.
An expected difficulty would be different branching factors across vendors. We know AMD uses BVH4; NV might use BVH64, Intel might use binary trees, ARM might use BVH8 (random numbers, no guesses).
We need to handle this in our offline-data-to-HW-format conversion compute shader.
Also, Nanite has X triangles per node, where X is much larger than one (or than however many triangles the HW actually has in a leaf node; Intel hints at having at least two, because of their native support for two-triangle quads with a common edge).
Same is true for my stuff. One node maps to a whole cluster of triangles.
Thus we likely need to generate the multiple bottom levels down to the leaves in a single compute shader workgroup, to add the extra nodes and levels needed for RT. That's still fast, because there is no need for multiple dispatches with barriers between tree levels. It's actually a good trade-off, because the BVH on disk, missing the lowest levels, takes much less storage and streaming time.
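For a feel of the numbers, here is a small helper - branching factor and leaf capacity are placeholders, not vendor data - that counts how many extra levels and nodes one cluster would need:

```cpp
#include <cstdint>

// Rough sketch: how many extra BVH levels / nodes a cluster of clusterTris
// triangles needs when the HW wants 'branching'-wide nodes (>= 2) with at
// most 'leafTris' triangles per leaf. All numbers are assumptions.
struct ExtraLevels { uint32_t levels; uint32_t nodes; };

ExtraLevels ExtraLevelsForCluster(uint32_t clusterTris, uint32_t branching, uint32_t leafTris)
{
    uint32_t leaves = (clusterTris + leafTris - 1) / leafTris; // leaf nodes needed
    uint32_t levels = 0, nodes = leaves, width = leaves;
    while (width > 1)                                          // walk up until a single root remains
    {
        width = (width + branching - 1) / branching;
        nodes += width;
        ++levels;
    }
    return { levels, nodes };
}

// Example: a 128-triangle cluster on BVH4 HW with 2 triangles per leaf
// -> 64 leaves plus 3 internal levels (16 + 4 + 1 nodes), built per workgroup.
```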

A complication would be NV's 'treelets', which would be a packet of a tree branch spanning multiple tree levels stored contiguously in memory.
If they use this, things are no longer that trivial, but still possible.

You see, the concept is simple, and the idea makes total sense. Though, i also assume high density geometry with high and quite regular tessellation. That's not yet the standard, and ofc. working with trees isn't either, so i admit requesting such an API is quite a stretch.
Only the IHVs and API designers sitting at a table together can make a practical proposal on what's possible and meaningful.
 
Andrew Lauritzen's posts have confirmed my former speculations and critique, besides Karis' comments on Twitter.
It's true: RT APIs can't do LOD.
So? Why is this, of all possible reasons, the reason for Epic choosing a s/w RT implementation for their "fast" rendering path?

But if your agenda dictates
The only agenda I see here is you mentioning LOD in every discussion about RT as if this is the main issue we have with RT now.

I did not mean mobiles. I mean the PC platform. Last time i checked RT support in the Steam HW survey, it was less than 20%.
20% of the whole Steam userbase is 24M players and this is enough already to sell games which require RT h/w especially if they will also sell on consoles.

Idk what the percentage is now, but i would support old non-RT HW during the entire console generation, at least.
I'm about 100% sure that support for non-RT h/w will be dropped way sooner than 2028 - which is seemingly the extent of this console generation.
It will likely happen as soon as console developers drop support for non-RT consoles - which is very likely to happen en masse over the next couple of years.
DX12U feature set will be the main base API requirement from that point onward. (And it's fairly likely to be the case even for games which won't use RT.)

Yes, but it's all about games. World design, artist workflow, compression requirements are more important than photorealism.
And it's not a HW issue - it is an API issue, so actually SW.
It's not an API issue, it's an issue with how terrain design is handled in UE5 with Nanite. It can be both avoided (don't use Nanite for terrain?) and improved (make a new terrain generator which won't kit bash Nanite meshes together).
You're presenting this as an issue with RT h/w or RT API but it's not, it's an engine issue - and no other engine but UE5 has it.

Yes, but my point is: A software implementation of RT could support Nanite. Because there would be no API blackboxes preventing a 'lodable' BVH.
A h/w implementation of RT supports Nanite just fine right now. The issue isn't in Nanite per se.

All we need is to open up BVH.
Who's "we"? Most people here and elsewhere seem to need more performance from RT, and any flexibility in APIs will likely result in the opposite of that.
RT h/w will get there eventually (probably) but right now when we're still arguing if RT can even be used b/c of its performance implication it is too early to make it more flexible.
Once all games are using RT, then sure.

I also think it's funny how all the critics of DXR are at the same time the proponents of the idea that modern consoles won't be able to use RT due to performance implications - but there is no DXR on PS5! So is DXR really an issue here?
 
I think you should try harder. You have a reputation for delivering cutting edge stuff (assuming i draw the proper conclusions from your nickname). Don't give those US techies the lead in absolutely everything without a fight >:)
@OlegSH
Oops, somebody has just informed me my assumptions about your real world identity were totally wrong.
So, some of my comments did not make much sense, i guess. I have to apologize. Sorry for the confusion.
 
So? Why is this, of all possible reasons, the reason for Epic choosing a s/w RT implementation for their "fast" rendering path?
The problem is that RT can't handle the visible level of detail, so no precise shadows, no precise reflections, etc.
Also, it's sad that the problem they have just solved still comes back just to support RT, because they have to use discrete LODs and proxy geometry for that.

That's my only option too - proxy meshes. The same stuff i'll use for physics collisions.
It really sucks, because my GI already gives a complete lighting solution, with soft shadows and reflections, but it's too low frequency and blurry for gritty details.
So what i personally want from RT is precise direct lighting.
But i have to choose: Either that, or continuous LOD. Both at the same time does not work. Not really sure yet which option to pick.

The only agenda I see here is you mentioning LOD in every discussion about RT as if this is the main issue we have with RT now.
RT prevents any progress with LOD. Is this a main issue? Yes, if you want higher efficiency or high detail. No if you're happy with the current state of the art, but then you don't need RT either.

20% of the whole Steam userbase is 24M players and this is enough already to sell games which require RT h/w especially if they will also sell on consoles.
Obviously the industry disagrees. There is no existing or announced AAA game with RT on minimal specs.

I'm about 100% sure that support for non-RT h/w will be dropped way sooner than 2028
I have no problem with that if so.
But i guess in 2028 we'll still be stuck on static mesh topology. Which is a problem, if so.

It's not an API issue, it's an issue with how terrain design is handled in UE5 with Nanite. It can be both avoided (don't use Nanite for terrain?) and improved (make a new terrain generator which won't kit bash Nanite meshes together).
You're presenting this as an issue with RT h/w or RT API but it's not, it's an engine issue - and no other engine but UE5 has it.
I do not think this is a flaw of HW RT at all. It's just interesting that SDF can beat HW in certain cases, and that such cases are indeed common and important.
Artists are used to designing around limitations, agreed. I also dislike the idea of SDF volumes and sphere tracing in general, while i'm totally convinced about BVH.
So no, i do not present this as proof of failure for RT, i just replied to your claim that we would not know why Epic uses SW tracing although HW tracing exists. This kitbashing is quite an argument. Artists need to do it to turn instances into the illusion of variance and natural flows. And it's present in all engines and most games, because all and everything is based on instanced, modular models. We have been there since tilemaps in 2D games.


A h/w implementation of RT supports Nanite just fine right now.
There is NO hw support for Nanite at all, because it's impossible due to API black box.
Who's "we"? Most people here and elsewhere seem to need more performance from RT, and any flexibility in APIs will likely result in the opposite of that.
Well, no. Streaming BVH instead of building it at runtime as a background process gives better performance, not worse. Obviously, and as said before.

RT h/w will get there eventually (probably) but right now when we're still arguing if RT can even be used b/c of its performance implication it is too early to make it more flexible.

The HW already is there. The API is the issue. We request more flexibility because we want to improve performance. We want practical RT on affordable entry level HW.
Once the flexibility is given, i can take a look on how slow or fast RT actually is. And then i will find an application which fits the given performance.
If the flexibility isn't there so i can't use RT at all, then there is no point for me to look up performance at all. Thus my contribution here is focused on flexibility, not performance. I wish it would be different.
 
No, it's not the same process.
The process of making GPUs programmable required (huge) changes to the HW. So it couldn't have happened overnight, even if they knew it was going to come.

Comparing the flexibility issues of DXR with this long years progress is just as pointless as comparing rasterization with ray tracing. It's apples vs. oranges. You all need to get past that point.


It was rushed, and decisions were short sighted.
Idk, what's the reasons for that, but shit happens.
Intel had to kill AVX-512 on a whole architecture of chips. How could this happen? Why didn't they exclude the feature from the start, to save the die space and money?
Idk either, but it's quite the same shit which has happened, although they are experts and should have known what they were doing.
I see it as being the same. We go from generic transistors to dedicated accelerators and then back to generic silicon and back to dedicated silicon and back to generic silicon.

We saw this with math co-processors, 3D accelerators, RT accelerators, ML accelerators.

I think this is a normal cyclical progression that comes with technology. We are on our way back to centralized computing after having left mainframes for PCs.

I get why you feel it's rushed or doesn't resolve the challenges that you want to solve, but you have to know that your concerns must have been aired. There is a committee; there are major IHVs on this committee working together on what should be in DXR and what can be left as an extension. There is a long roadmap that all IHVs should be working towards, knowing well what's coming down the pipe, so that hardware and APIs can come together and release together.

There had to be reasons - economical, logistical, or both - for why it started out like this. LOD may be on the roadmap at a later time, but other issues need to be solved before they get there, so they released what can work today to get people moving over now; otherwise there may be no RT support for another 3-4 generations.

They want games to move now and it still solves major problems we have with static lighting today. Sure, you can use other methods, and those options are entirely up to the developer. You're not forced to use the hardware just like you're not forced to use tiled resources.
 
I see it as being the same. We go from generic transistors to dedicated accelerators and then back to generic silicon and back to dedicated silicon and back to generic silicon.
What you see is irrelevant to the current critique and requests, thus i assume you do not understand the problem, and what flexibility is missing.
So i need to explain again with examples:

Requests:
I want to implement reordering during traversal to improve performance.
I want to have traversal shaders to swap scene.
I want to intersect discs instead of triangles.
I want to have curved rays.

Those things would require new HW, and your mentioning of how HW evolves back and forth between generic and specific functionality is related.
Arguments like 'HW needs time to add such features', or questioning the usefulness of those requests, are justified.

Requests:
I want to create BVH from streamed data instead of building it at runtime.
I want to create BVH to support dynamic geometry, e.g. to achieve LOD.
I want to access BVH for other applications than ray tracing, e.g. range queries for collision detection.
I want to get better perf. by using a precomputed, higher quality BVH.

Those things have nothing to do with the HW, so a discussion of SW RT vs. HW RT, or of how HW usually progresses, is not relevant.

you have to know that your concerns must have been aired. There is a committee; there are major IHVs on this committee working together on what should be in DXR and what can be left as an extension. There is a long roadmap that all IHVs should be working towards, knowing well what's coming down the pipe, so that hardware and APIs can come together and release together.
Only this is relevant.
But LOD is not a concern that has only just been aired. Visibility and LOD have been the two main problems in computer graphics since the first attempts to render 3D graphics. Continuous LOD was used in games in the 2000s and before.
Just because progress wasn't that exciting back then does not mean an informed committee can just ignore the topic altogether, preventing any further attempt to solve it with their short sighted decisions.
They simply failed at doing their job, that's all.

LOD may be on the roadmap at a later time, but other issues need to be solved before
I'm sure it is on their roadmap. Now that they have learned unexpected progress can happen anytime.
However, any other issues are orthogonal to exposing BVH data structures. Thus they are irrelevant as well.
You’re not forced to use the hardware just like you’re not forced to use tiled resources.
I am forced to deliver efficient solutions. Thus i am forced to utilize HW acceleration where available.

Well, don't worry. I'll use an inefficient solution. That's not the point.
The point is: If even experts on a high brow tech forum can't understand the problem, turning it into a pointless war of fixed function vs. programmable, or AMD vs. NV, or apples vs. oranges...
Then how long will it take until i can expect a solution from the committee?

Additionally: If all AAA devs give up on in house engines and switch over to U engines, how many people are left to side with my requests?

Not enough, i guess. So the committee won't take action at all, because they have other things to do, like drinking coffee, listening to NVs proposals, and congratulating each other on their good work.
There is not much you could say to increase my trust in this system. No complaints, no improvements.
 
I would argue the complexities you identified lead to the opposite conclusion. It's actually not simple at all.
The concept is simple, but the implementation is not.
There is no better way, however.
Just expose the data structures. I'll deal with the complexity. I do not really believe in a BVH API, just mentioning it as an option.
 
  • We don't think the hardware is quite there yet for completely convincing RT without significant compromises in other areas that some of us find worse than what current RT implementations bring.

I agree with most of your post, but I'd have to challenge this bit a little. Granted you're just talking about the opinions of some people here and not stating anything as fact, so there's nothing specifically wrong with the statement, however I think the general sentiment that RT comes with significant compromises is too much of a blanket statement.

On what hardware?​
On what games?​
Granted, if you want to play Cyberpunk with RT on a 2060 - which is absolutely doable - the compromises may be too great for many people, perhaps even a majority.

But if you want to run RT in Resident Evil Village on a 4090, there are essentially no compromises at all. In this case, RT is an unambiguous win and the hardware is absolutely there for it. And in this case much weaker hardware than a 4090 will deliver the same result.

Most other game/hardware combinations sit somewhere between those two extreme examples but ultimately it boils down to this:

  1. RT always (or arguably very nearly always) improves the core visuals of the game, regardless of whether that's by a little, or a lot. That's something we all want.
  2. RT is almost always usable in some form on any decent RT capable piece of hardware (I'd exclude the really shitty implementations like the 6400/6500XT here).
  3. The only real question here is how much resolution and framerate you are willing to sacrifice for the core graphics improvement - if indeed you need to sacrifice any at all.
  4. And that final point is a factor of the hardware you're using, the specific game implementation, and your personal preferences for frame rate and image quality.

If for example you can hit min 60fps at 4K with good image reconstruction with some level of RT enabled - as you can on many game/hardware combinations, then I think a large majority of gamers would consider that to be a perfectly playable experience with no significant compromises brought on by the use of RT (particularly if they are limited to those or lower settings by their monitor/TV). Obviously there will be frame rate and resolution purists, particularly on enthusiast forums like this that will class 60fps as unplayable and/or any form of image reconstruction as a no go, but I'll wager they are a very small minority within the overall gamer population.
 
But if you want to run RT in Resident Evil Village on a 4090, there are essentially no compromises at all. In this case, RT is an unambiguous win and the hardware is absolutely there for it. And in this case much weaker hardware than a 4090 will deliver the same result.
Pretty sure some of the usual suspects have been pretty vocal here with RE:V not being "proper RT" because it's "used too little". These kinds of things cause lots of the conflicts.
The only real question here is how much resolution and framerate you are willing to sacrifice for the core graphics improvement.
But lowering resolution lowers the graphics quality directly, and this is an issue for many of us; is RT improving "core graphics" if by enabling it you have to lower everything by cutting the resolution?
 
The problem is that RT can't handle the visible level of detail, so no precise shadows, no precise reflections, etc.
Define precise. Lumen s/w is using SDF representation for those which are considerably less precise than proxy meshes used for Lumen h/w.
If you want 1:1 mapping then Nanite must be done in a different way - but is it really needed? Do you need pixel accurate surface details in reflections even? Most RT reflections examples are using lower than frame rendering resolution and simplified geometry anyway even without Nanite. I feel like this "issue" isn't really an issue of any kind for gaming purposes.

RT prevents any progress with LOD.
RT doesn't prevent any progress with LOD, what it does is prevent using said progress inside the results of RT itself. These are two completely different issues.

Obviously the industry disagrees. There is no existing or announced AAA game with RT on minimal specs.
There is at least one and considering that we're not out of crossgen just yet it's hardly surprising. Games these days are rarely announced more than a year before they are released.

I do not think this is a flaw of HW RT at all. It's just interesting that SDF can beat HW in certain cases, and that such cases are indeed common and important.
It can "beat" h/w by being considerably less precise and glitchy while running at the same or lower performance. Basically the ability to (somewhat) incorporate dynamic LOD into itself is the only area where it can "beat" h/w RT - and if the first visual results are anything to go by h/w RT "without LOD" looks better than s/w RT "with it".

This kitbashing is quite an argument. Artists need to do it to turn instances into the illusion of variance and natural flows. And it's present in all engines and most games, because all and everything is based on instanced, modular models.
But these, even if used during content creation, aren't present in shipping code as-is anywhere, for the same reasons they are introducing issues in UE5 - they are bad for actual rendering, with or without RT.

There is NO hw support for Nanite at all, because it's impossible due to API black box.
There is, it's using the same proxy mesh as Nanite is using anyway. Proxy meshes there are not for RT.

The HW already is there. The API is the issue.
We can safely say that this is not the case as otherwise we'd already see the h/w being used this way if not in Vulkan IHV EXTs then on PS5 at least.
 
We can safely say that this is not the case as otherwise we'd already see the h/w being used this way if not in Vulkan IHV EXTs then on PS5 at least.

Using the BVH for LOD is possible on consoles, not only on PS5. Xbox Series S|X can stream the static geometry of the BVH too. The consoles are underpowered, but they are much more flexible than PC on RT.
 
I think the general sentiment that RT comes with significant compromises is too much of a blanket statement.

We can put a finer point on it and clarify what those compromises are for the average gamer with a $300-$400 graphics card and a 1080p monitor. Yes, 65% of Steam users are apparently still on 1080p. So in reality it boils down to having to use upscaling, and potentially reducing some settings from ultra to medium, in order to hit 60 fps, depending on the game. This is probably an issue for many people and is probably not an issue at all for many others.

Like you said individual opinions are fine but the friction arises when those are extrapolated to represent market sentiment.
 
The consoles are underpowered, but they are much more flexible than PC on RT.

All that flexibility doesn't mean much if the hw is too weak to run any meaningful RT. And that's the whole discussion here: yes, flexible is nice, but it's too slow. Like with pixel shaders, hw T&L, 3D acceleration etc, you need to start somewhere. Intel and NV have started, AMD will follow.
 
All that flexibility doesn't mean much if the hw is too weak to run any meaningful RT. And that's the whole discussion here: yes, flexible is nice, but it's too slow. Like with pixel shaders, hw T&L, 3D acceleration etc, you need to start somewhere. Intel and NV have started, AMD will follow.
You do realize that the console solution is AMD's first gen solution, right? The flexibility is there from the hardware's perspective, at least on AMD (could be for NV/Intel too, dunno); it's the software (APIs) limiting it on PC.
 
You do realize that the console solution is AMD's first gen solution, right? The flexibility is there from the hardware's perspective, at least on AMD (could be for NV/Intel too, dunno); it's the software (APIs) limiting it on PC.
It's not "flexibility", it's a result of console h/w being closed and stable (the opposite of flexibility really, funnily enough). Another such result is the ability to ship precompiled shaders with console games for example.

Offline BVH build isn't related to the LOD issue which we were talking about.
 
All that flexibility doesn't mean much if the hw is too weak to run any meaningful RT. And that's the whole discussion here: yes, flexible is nice, but it's too slow. Like with pixel shaders, hw T&L, 3D acceleration etc, you need to start somewhere. Intel and NV have started, AMD will follow.

Like others said, this is AMD's first RT implementation. I think we will see mid-gen consoles, and they will be much more interesting than the base consoles from a performance point of view, with the improvements AMD made for 2nd generation RT.

And some Nvidia researchers are not very happy about DXR's lack of LOD support.


…the fastest total performance. Our new method for interpolating normals uses 1–7% more time compared to the baseline, but provides substantially smoother images when viewing surfaces from a short distance. For future work, it could be worthwhile to let ray cones [Akenine-Möller et al. 2019] control when to switch from a less-expensive, lower-quality method to our method. We recommend that the shadow ray optimization that we have proposed is always used for methods based on polynomial splitting. In the future, it would also be useful to evaluate all methods using level of detail with ray cones, so that traversal would be stopped at a level where the voxel is approximately as large as the cone diameter. DXR does not have support for this at the moment, however.
 