AMD RDNA3 Specifications Discussion Thread

It seems, according to this slide:

[slide image]


that there is native "fixed function primitive culling hardware to remove SW based culling overhead" which implies that AMD has gone "backwards" with NGG (primitive shaders), adding new hardware for primitive culling.

Is that the correct interpretation?
 
It seems, according to this slide:

[slide image]


that there is native "fixed function primitive culling hardware to remove SW based culling overhead" which implies that AMD has gone "backwards" with NGG (primitive shaders), adding new hardware for primitive culling.

Is that the correct interpretation?
I'm confused that they write 12 prim/clk in the headline and 24 prim/clk in the text. I think it normally runs at 12 prim/clk, and when you use a primitive shader and put it in your program you get 24 prim/clk?

Maybe @Rys can give a little more clarity on this? Maybe he'll write a benchmark ;) Waiting 10 years for Beyond3D Suite 2.0

I'm also wondering about Nvidia and its 12 rasterizers. I was thinking that Ada would have 8 rasterizers/GPC
 
I'm confused that they write 12 prim/clk in the headline and 24 prim/clk in the text. I think it normally runs at 12 prim/clk, and when you use a primitive shader and put it in your program you get 24 prim/clk?
"Primitives" and "vertices" in the text - they're not actually the same thing.

In a mesh there's approximately one triangle (primitive) per vertex, but when a mesh "wraps" around an object, going from front-facing to back-facing there are potentially some triangles that end up being viewed edge-on. They are degenerate (have 0 area measured in screen pixels), so should be culled, though they are not actually back-facing.

Another cause of culling within a mesh is when some triangles end up entirely outside of the screen. The mesh in this case is near the edge and only parts of it fall over the edge. Only triangles whose full set of vertices is outside of the screen can be culled though; you can't cull a triangle which only has one vertex that's outside.
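The two cases above (edge-on/degenerate triangles and fully off-screen triangles) can be illustrated with simple screen-space tests. This is a hypothetical Python sketch of the idea, not AMD's actual culling logic; the function names and the integer screen coordinates are my own assumptions:

```python
def signed_area_2x(v0, v1, v2):
    # Twice the signed screen-space area of a triangle (z of the 2D cross
    # product of its edge vectors). Zero means edge-on / degenerate.
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])

def cull_triangle(v0, v1, v2, width, height):
    # Degenerate / viewed edge-on: zero area measured in screen pixels.
    if signed_area_2x(v0, v1, v2) == 0:
        return True
    # Off-screen: safe to cull only if ALL three vertices are outside the
    # SAME screen edge. A triangle with just one vertex outside may still
    # cover visible pixels, so it must be kept.
    for axis, limit in ((0, width), (1, height)):
        if all(v[axis] < 0 for v in (v0, v1, v2)):
            return True
        if all(v[axis] > limit for v in (v0, v1, v2)):
            return True
    return False
```

Note the asymmetry: the degenerate test only needs one triangle's own vertices, while the off-screen test has to be conservative about triangles straddling the edge.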

But anyway, none of this relates to the question I have: the apparent non-use of NGG, given that there's fixed function hardware to remove software culling overhead.
 
"Primitives" and "vertices" in the text - they're not actually the same thing.

In a mesh there's approximately one triangle (primitive) per vertex, but when a mesh "wraps" around an object, going from front-facing to back-facing there are potentially some triangles that end up being viewed edge-on. They are degenerate (have 0 area measured in screen pixels), so should be culled, though they are not actually back-facing.

Another cause of culling within a mesh is when some triangles end up entirely outside of the screen. The mesh in this case is near the edge and only parts of it fall over the edge. Only triangles whose full set of vertices is outside of the screen can be culled though; you can't cull a triangle which only has one vertex that's outside.

But anyway, none of this relates to the question I have: the apparent non-use of NGG, given that there's fixed function hardware to remove software culling overhead.

Was NGG ever used?
 
Was NGG ever used?
Really don't know.

The reference to "software based culling overhead" is the key here and I can't tell what AMD is actually describing. This overhead has been "removed" in RDNA 3. Does that imply AMD has been using NGG, but with RDNA 3 that's been dropped in favour of fixed function hardware?
 
Really don't know.

The reference to "software based culling overhead" is the key here and I can't tell what AMD is actually describing. This overhead has been "removed" in RDNA 3. Does that imply AMD has been using NGG, but with RDNA 3 that's been dropped in favour of fixed function hardware?
I think they mean that you had to write some special code if you wanted to use the new pipeline in your software. Didn't we discuss here that there are special commands for NGG? And I think now you can use the normal pipelines (DX11, DX12 and Vulkan) without those commands, and the polygons will automatically be routed to NGG?
 
I think they mean that you had to write some special code if you wanted to use the new pipeline in your software. Didn't we discuss here that there are special commands for NGG? And I think now you can use the normal pipelines (DX11, DX12 and Vulkan) without those commands, and the polygons will automatically be routed to NGG?
Was NGG ever exposed in client-side software as something you can program for? AFAIR all geometry went through NGG automatically starting with RDNA1.
 
Didn't we discuss that here:
What you can do on PS5 rarely has any relevance to what you can do on PC.
 

I hadn't seen this article before today. It talks about RDNA, which in effect means RDNA 1 and 2.

There are now only 2 HW shader stages for vertex/geometry processing:
  • Surface shader which is a pre-tessellation stage and is equivalent to what LS + HS was in the old HW.
  • Primitive shader which can feed the rasterizer and replaces all of the old ES + GS + VS stages.

These are referred to as hardware stages, which I suppose at the least implies that they are initiated/scheduled by dedicated hardware, even though they run as shaders.

Compared to the old HW VS, a primitive shader has these new features:
  • Compute-like: they are running in workgroups, and have full support for features such as workgroup ID, subgroup count, local invocation index, etc.
  • Aware of both input primitives and vertices: there are registers which contain information about the input primitive topology and the overall number of vertices/primitives (similar to GS).
  • They have to export not only vertex output attributes (positions and parameters), but also the primitive topology, ie. which primitive (eg. triangle) contains which vertices and in what order. Instead of processing vertices in a fixed topology, it is up to the shader to create as many vertices and primitives as the application wants.
  • Each shader invocation can create up to 1 vertex and up to 1 primitive.
  • Before outputting any vertex or primitive, a workgroup has to tell how many it will output, using s_sendmsg(gs_alloc_req) which ensures that the necessary amount of space in the parameter cache is allocated for them.
  • On RDNA2, per-primitive output params are also supported.
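The allocation step in the list above (s_sendmsg with gs_alloc_req before any export) can be modeled as a simple contract. This is a toy Python sketch of that contract only, under my own assumptions; the class and method names mirror the message names from the quoted article and are not real ISA:

```python
class ParameterCache:
    """Toy model of the gs_alloc_req contract: a workgroup must declare its
    vertex/primitive output counts before any export is accepted, so the
    necessary parameter-cache space can be reserved up front."""

    def __init__(self):
        self.allocated = None   # (num_vertices, num_primitives) once requested
        self.exports = []

    def gs_alloc_req(self, num_vertices, num_primitives):
        # Models s_sendmsg(gs_alloc_req): reserve space before exporting.
        self.allocated = (num_vertices, num_primitives)

    def export_vertex(self, index, attrs):
        # Exporting outside the declared allocation is invalid.
        assert self.allocated is not None and index < self.allocated[0], \
            "export without (sufficient) allocation"
        self.exports.append(("vertex", index, attrs))
```

The point of the sketch is just the ordering constraint: the message traffic exists so the hardware knows how much output space each workgroup needs before anything is written.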

It's notable that RDNA 2 has functionality there that's not present on RDNA 1. Which may be relevant to general conclusions regarding NGG culling being of no benefit on RDNA 1.

Notes about hardware support

  • Vega had something similar, but I haven’t heard of any drivers that ever used this. Based on public info I could find, it’s not even worth looking into.
  • Navi 10 and 12 lack some features such as per-primitive outputs which makes it impossible to implement mesh shaders on these GPUs. We don’t use NGG on Navi 14 (RX 5500 series) because it doesn’t work.
  • Navi 21 and newer have the best support. They have all necessary features for mesh shaders. We enabled shader culling by default on these GPUs because they show a measurable benefit.
  • Van Gogh (the GPU in the Steam Deck) has the same feature set as Navi 2x. It also shows benefits from shader culling, but to a smaller extent.

The original merge request for culling is here:


The VS (or TES) is butchered into two parts:
  1. Top part: ES vertex threads compute only the position output and store that to LDS.
  2. Culling code
    • GS threads load the positions of each vertex that belongs to their triangle and decide whether to accept or cull the triangle.
    • Surviving vertices are repacked if needed.
  3. Bottom part: ES threads of the repacked, surviving vertices compute the other outputs.

which matches, as far as I can tell, the model of culling that was described for Vega (but never used).
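The three-step split described above can be sketched in software. This is an illustrative Python model, assuming a simple position-based cull callback; the function names are hypothetical and LDS is modeled as a plain list:

```python
def ngg_culled_vertex_shader(in_vertices, triangles,
                             compute_position, compute_attrs, cull):
    # 1. Top part: ES vertex threads compute ONLY positions, stored to LDS
    #    (modeled here as a list indexed by vertex).
    lds_positions = [compute_position(v) for v in in_vertices]

    # 2. Culling code: GS threads load the three positions belonging to
    #    their triangle and decide whether to accept or cull it.
    surviving = [t for t in triangles
                 if not cull([lds_positions[i] for i in t])]

    # Surviving vertices are repacked: only vertices still referenced by a
    # surviving triangle stay live, and indices are compacted.
    live = sorted({i for t in surviving for i in t})
    remap = {old: new for new, old in enumerate(live)}

    # 3. Bottom part: only the repacked, surviving vertices compute the
    #    remaining (non-position) outputs.
    out_vertices = [(lds_positions[i], compute_attrs(in_vertices[i]))
                    for i in live]
    out_triangles = [tuple(remap[i] for i in t) for t in surviving]
    return out_vertices, out_triangles
```

The saving is in step 3: attribute work is only done for vertices that survive culling, which is the whole point of splitting the vertex shader in two.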

This article refers explicitly to RDNA 3:


One interesting bit brought up by this patch series is that with GFX11, Next-Gen Geometry (NGG) is now always enabled.

and I think in the speculation thread we discussed NGG as being the only option for RDNA 3 in light of code commits.

The mystery remains: what hardware interaction does NGG use, and is that hardware being referenced in the slide I linked earlier? The only hint at hardware usage that I can find is the call to s_sendmsg, where space in the output buffer is requested for vertices/primitives to be stored. Messages, themselves, have to be buffered (queued) in order to be handled, and maybe the slide is referring to the intense message traffic produced by NGG (independent of culling)? Such intense message traffic could be interpreted as "software overhead", I suppose, and may have been improved in RDNA 3 versus RDNA 2.

Note for @Digidi : sorry, when I quoted the slide earlier I misquoted it, which is how you ended up talking about primitives while I corrected you with vertices...
 
RDNA3 really has dedicated matrix accelerators after all!

[attachment: RDNA 3 Work Group]

That's really exciting!

It's notable that RDNA 2 has functionality there that's not present on RDNA 1. Which may be relevant to general conclusions regarding NGG culling being of no benefit on RDNA 1.
Well of course, that's likely one reason why it doesn't support mesh shaders. RDNA1 is a dinosaur architecture.
 
RDNA3 really has dedicated matrix accelerators after all!

That's really exciting!


Well of course, that's likely one reason why it doesn't support mesh shaders. RDNA1 is a dinosaur architecture.
Yes but no. They're there to support the WMMA instructions, Dot2 (FP16, BF16, INT8) and Dot4 (INT4), but those still employ all the "normal ALUs" in the Compute Unit, not just those accelerators.
 
RDNA3 really has dedicated matrix accelerators after all!


That's really exciting!

I think that conclusion is wrong. This slide is pretty explicit in its title, in my opinion:

[slide image]


saying that the 64 lanes of the vector unit can also function as the "matrix accelerator" with two variants of dot product instructions, either DOT2 or DOT4. Well, these aren't traditional dot product instructions, because they are accumulating variants.
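The "accumulating variants" mentioned above behave like a small dot product fused with an add of a third operand. A hedged scalar sketch in Python of DOT2/DOT4-style semantics (one lane's worth of work; this models the arithmetic only, not the actual instruction encodings or data formats):

```python
def dot2_acc(a, b, c):
    # DOT2-style: two-element dot product plus an accumulator operand.
    # d = a0*b0 + a1*b1 + c
    return a[0] * b[0] + a[1] * b[1] + c

def dot4_acc(a, b, c):
    # DOT4-style: four-element dot product plus an accumulator operand.
    # d = a0*b0 + a1*b1 + a2*b2 + a3*b3 + c
    return sum(x * y for x, y in zip(a, b)) + c
```

The accumulator input is what makes these useful for matrix multiplication: each lane keeps folding partial products into a running sum instead of needing a separate add.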

Well of course, that's likely one reason why it doesn't support mesh shaders. RDNA1 is a dinosaur architecture.
This is factually the case according to one of the articles I've just linked: mesh shaders require per-primitive output parameters.
 