Digital Foundry Article Technical Discussion [2023]

IIRC, I read somewhere that Remedy commented that while they are using mesh shaders they aren't using amplification shaders for AW2. However, they are looking into using amplification shaders for their next title so that they can potentially have Nanite levels of geometric density.

This might imply a soft ceiling on what can be accomplished with mesh shaders alone, without amplification shaders.
Very roughly, standard mesh shaders give you a way to do cluster ("meshlet") culling, but don't really do anything for LOD. Amplification shaders let you do tessellation-like things (which is on one end of LOD), but you still need simplification. Certainly for big open world games you definitely need the whole gamut of LOD including simplification. That is one of the main aspects of Nanite that I think folks will want to duplicate in the future as in addition to largely eliminating pop-in, it's the aspect that is most responsible for the increased art production efficiency.
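
To make the culling half of that concrete: cluster culling usually reduces to a cheap bounds test per meshlet. A minimal sketch in HLSL, assuming each meshlet carries a bounding sphere and the frustum planes arrive in a constant buffer (all names here are hypothetical, not from any shipped engine):

// Hypothetical meshlet bounds + frustum test for cluster ("meshlet") culling.
struct Meshlet
{
    float4 boundingSphere; // xyz = center (object space), w = radius
};

cbuffer CullData
{
    float4   FrustumPlanes[6]; // world-space plane equations, normals pointing inward
    float4x4 World;            // object -> world transform
    float    MaxScale;         // conservative uniform scale applied to the radius
};

bool IsMeshletVisible(Meshlet m)
{
    float3 center = mul(World, float4(m.boundingSphere.xyz, 1)).xyz;
    float  radius = m.boundingSphere.w * MaxScale;

    // If the sphere sits fully behind any frustum plane, the whole cluster is skipped.
    [unroll]
    for (uint i = 0; i < 6; ++i)
    {
        if (dot(float4(center, 1), FrustumPlanes[i]) < -radius)
            return false;
    }
    return true;
}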
 
One intended usage for amplification shaders is culling entire meshlets that can be skipped if they’re out of view. AW2 may be processing every meshlet and doing per-triangle culling in the mesh shader instead.
“While the Mesh Shader is a fairly flexible tool, it does not allow for all tessellation scenarios and is not always the most efficient way to implement per-instance culling. For this we have the Amplification Shader. What it does is simple: dispatch threadgroups of Mesh Shaders.”

This could be the reason why Remedy decided to implement a custom culling method which they found to be more efficient. Without an amplification shader you need to be more creative.
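
For a concrete picture of what that quote describes, here is a hedged sketch of the standard amplification-shader culling pattern (similar in spirit to Microsoft's public meshlet samples; the Meshlet struct and IsMeshletVisible are the hypothetical helpers from the sketch above): one thread per meshlet, survivors compacted into the payload, and DispatchMesh launching one mesh shader group per survivor.

// Hedged sketch of amplification-shader meshlet culling (names illustrative).
struct Payload
{
    uint meshletIndices[32]; // one wave's worth of surviving meshlet indices
};

groupshared Payload s_payload;

StructuredBuffer<Meshlet> Meshlets;
cbuffer Constants { uint MeshletCount; };

[numthreads(32, 1, 1)] // assumes a 32-wide wave for brevity
void ASMain(uint dtid : SV_DispatchThreadID)
{
    bool visible = false;
    if (dtid < MeshletCount)
        visible = IsMeshletVisible(Meshlets[dtid]); // e.g. the sphere test above

    if (visible)
    {
        // Compact the indices of surviving meshlets into the payload.
        uint slot = WavePrefixCountBits(visible);
        s_payload.meshletIndices[slot] = dtid;
    }

    // Launch one mesh shader threadgroup per visible meshlet.
    uint visibleCount = WaveActiveCountBits(visible);
    DispatchMesh(visibleCount, 1, 1, s_payload);
}

A meshlet culled here never costs a mesh shader launch at all, which is the difference from doing per-triangle culling inside the mesh shader.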
 
Very roughly, standard mesh shaders give you a way to do cluster ("meshlet") culling, but don't really do anything for LOD. Amplification shaders let you do tessellation-like things (which is on one end of LOD), but you still need simplification. Certainly for big open world games you definitely need the whole gamut of LOD including simplification. That is one of the main aspects of Nanite that I think folks will want to duplicate in the future as in addition to largely eliminating pop-in, it's the aspect that is most responsible for the increased art production efficiency.
"Amplification" shaders are the opposite of 'simplification'? I've never understand where the term 'amplification' comes from. What's be amplified?
 
It’s fine. No worries. Just know that not all of us respond with the intention of winning an argument. I’m long past that point in my time here at b3d. Sometimes I see value in bringing up counter views just so that we don’t have an echo chamber snowball, and sometimes I’m trying to slightly nudge people towards the answers they seek.

In this case, it’s messy. And as others have written, each generation of card is actually getting better hardware support for mesh shaders, so it gets much more complex than just saying primitive or mesh. Some people are looking at the haves and have-nots as a performance indication, but it’s not really like that.

More like do it this way or that way. And right now it looks like compute shaders with mesh/primitive are winning out over amplification + mesh in the multiplatform space. But adoption of that approach is limited because it’s more challenging to do it this way.

Though, if you don’t have to support PS5, the latter is doable, and I suspect amplification + mesh is easier for most developers to do.
According to Timur, PS5 shouldn't have problems running amplification shaders as long as it can do compute shaders and mesh shaders (as it does in Alan Wake 2). Amplification shaders are simply the D3D12 feature name for task shaders.
The task shader (aka. amplification shader in D3D12) is a new stage that runs in workgroups similar to compute shaders... Task shader driver implementation on AMD HW
Task shaders on AMD HW...Under the hood, task shaders are compiled to a plain old compute shader... Even though they are compute shaders as far as the AMD HW is concerned, task shaders do not work like a compute pre-pass. Instead, task shaders are dispatched on an async compute queue while at the same time the mesh shader work is executed on the graphics queue in parallel.

 
"Amplification" shaders are the opposite of 'simplification'? I've never understand where the term 'amplification' comes from. What's be amplified?

The official documentation offers an explanation ...

Programmable Primitive Amplification

A limited form of primitive amplification as in GS-amplification is supported with Mesh shaders. It’s possible to amplify input point geometry with up to a 1:V and/or 1:P ratio, where V is the number of output vertices reported by the runtime and P is the number of output primitives reported by the runtime.

However, programmable amplification as in Amplification shaders can’t be done in a single threadgroup because expansion factors are decided by the program and can be huge.

The Amplification shader is intended to be the shader stage that enables programmable Amplification in Mesh shaders.

Amplification shaders are meant to be the optimal stage in the mesh shading pipeline for doing geometry expansion, much as is the case with tessellation ...
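
As a toy illustration of that 1:V / 1:P expansion, a mesh shader can turn each input point into a camera-facing quad. A minimal, hypothetical HLSL sketch (the buffer and constant names are assumptions):

// Minimal primitive amplification: 1 input point -> 4 vertices / 2 triangles.
struct PointIn   { float3 position; float size; };
struct VertexOut { float4 position : SV_Position; };

StructuredBuffer<PointIn> Points;
cbuffer Camera
{
    float4x4 ViewProj;
    float3   CameraRight;
    float3   CameraUp;
};

[outputtopology("triangle")]
[numthreads(1, 1, 1)]
void MSMain(uint gid : SV_GroupID,
            out vertices VertexOut verts[4],
            out indices  uint3     tris[2])
{
    // Declare the amplified output counts up front: the 1:V / 1:P ratio.
    SetMeshOutputCounts(4, 2);

    PointIn p = Points[gid];
    float3 r = CameraRight * p.size;
    float3 u = CameraUp    * p.size;

    verts[0].position = mul(ViewProj, float4(p.position - r - u, 1));
    verts[1].position = mul(ViewProj, float4(p.position + r - u, 1));
    verts[2].position = mul(ViewProj, float4(p.position - r + u, 1));
    verts[3].position = mul(ViewProj, float4(p.position + r + u, 1));

    tris[0] = uint3(0, 1, 2);
    tris[1] = uint3(2, 1, 3);
}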
 
According to Timur, PS5 shouldn't have problems running amplification shaders as long as it can do compute shaders and mesh shaders (as it does in Alan Wake 2). Amplification shaders are simply the D3D12 feature name for task shaders.

Timur doesn't mention PS5. He does, however, say some things that could indicate additional issues for a developer writing their own amplification shader stage for use with mesh shaders.

Bear in mind that Timur is speaking from the perspective of writing a driver; he's not just a regular developer. He's very low level and is likely messing with things most people never would.

Bolding is his.

The task+mesh dispatch packets are different from a regular compute dispatch. The compute and graphics queue firmwares work together in parallel

Side note, getting some implementation details wrong can easily cause a deadlock on the GPU. It is great fun to debug these.

The relevant details here are that most of the hard work is implemented in the firmware (good news, because that means I don’t have to implement it), and that task shaders are executed on an async compute queue and that the driver now has to submit compute and graphics work in parallel.


Keep in mind that the API hides this detail and pretends that the mesh shading pipeline is just another graphics pipeline that the application can submit to a graphics queue. So, once again we have a mismatch between the API programming model and what the HW actually does.

This leaves some questions up in the air about PS5.

Like, can it actually do what he's talking about here?

Alternatively, will doing the amplification shader stage on regular compute be as fast as the firmware-synchronised task + mesh that he's talking about here? E.g. would it involve more writes and reads to VRAM?

Clearly you can do anything you want in software - look at the amazing UE5 - but that doesn't mean there is no difference or advantage in having advances in hardware (or hardware + firmware in this case).
 

The official documentation offers an explanation ...

Amplification shaders are meant to be the optimal stage in the mesh shading pipeline for doing geometry expansion, much as is the case with tessellation ...
Okay. So 'Amplification' is the wrong term. If you have a tone generator producing a signal, amplifying doesn't change the tone but just increases it. Adding more tones would be something very different from amplification. Amplification takes a value and increases it. Adding more data is interpolation or, the opposite of simplification, complexification. I could never intuit what that part of the pipeline was doing.
 
Alternatively, will doing the amplification shader stage on regular compute be as fast as the firmware-synchronised task + mesh that he's talking about here? E.g. would it involve more writes and reads to VRAM?

Clearly you can do anything you want in software - look at the amazing UE5 - but that doesn't mean there is no difference or advantage in having advances in hardware (or hardware + firmware in this case).

It will likely not be as fast. The task -> mesh -> raster pipeline apparently keeps data on-chip without round trips to VRAM. That’s the benefit of the hardware-enforced limits (16KB maximum payload between task and mesh). Compute shaders have no such limits.

“Both mesh and task shaders follow the programming model of compute shaders, using cooperative thread groups to compute their results and having no inputs other than a workgroup index. These execute on the graphics pipeline; therefore the hardware directly manages memory passed between stages and kept on-chip.”
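
For reference, the mesh shader side receives the task payload directly as an input, and the D3D12 spec caps the payload at 16,384 bytes, which is what makes the on-chip hand-off practical. A hedged HLSL sketch (names are illustrative, and the body is a placeholder):

// Hedged sketch of the mesh shader side of the task -> mesh hand-off.
struct Payload
{
    uint meshletIndices[32]; // 32 * 4 bytes = 128 bytes, far below the 16KB cap
};

struct VertexOut { float4 position : SV_Position; };

[outputtopology("triangle")]
[numthreads(64, 1, 1)]
void MSMain(uint gid : SV_GroupID,
            in payload Payload p,
            out vertices VertexOut verts[64],
            out indices  uint3     tris[126])
{
    // SV_GroupID indexes the compacted list the amplification shader built;
    // the payload arrives without a round trip through a VRAM buffer.
    uint meshletIndex = p.meshletIndices[gid];

    // Placeholder: real code would load this meshlet's vertices and
    // triangles and emit them here.
    SetMeshOutputCounts(0, 0);
}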

 
Okay. So 'Amplification' is the wrong term. If you have a tone generator producing a signal, amplifying doesn't change the tone but just increases it. Adding more tones would be something very different from amplification. Amplification takes a value and increases it. Adding more data is interpolation or, the opposite of simplification, complexification. I could never intuit what that part of the pipeline was doing.

It doesn’t make sense in a signal processing context, but it seems fine in terms of plain old English. Amplify as in “increase the number of vertices/primitives”.
 
It will likely not be as fast. The task -> mesh -> raster pipeline apparently keeps data on-chip without round trips to VRAM. That’s the benefit of the hardware-enforced limits (16KB maximum payload between task and mesh). Compute shaders have no such limits.

“Both mesh and task shaders follow the programming model of compute shaders, using cooperative thread groups to compute their results and having no inputs other than a workgroup index. These execute on the graphics pipeline; therefore the hardware directly manages memory passed between stages and kept on-chip.”

Yup. But expecting all developers to be able to do it is not reasonable. Only Epic has pulled it off so far, and they still have some limitations on their Nanite system.

A lot of stuff can be done in compute that makes things like VRS, tiled resources, amplification shaders and mesh shaders not really that relevant. But it’s already a challenge to implement such a system, and making it work on every configuration is another level of difficulty. This is where API support at the hardware level helps.
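
As a sketch of the compute-based route being described: a culling pre-pass compacts visible meshlets into a buffer that a later draw or dispatch (e.g. via ExecuteIndirect) consumes. Unlike the task -> mesh hand-off, the compacted list round-trips through memory. This assumes the hypothetical Meshlet struct and IsMeshletVisible helper from the earlier sketches:

// Hedged sketch: emulating the amplification stage with a compute pre-pass.
StructuredBuffer<Meshlet> Meshlets;
RWStructuredBuffer<uint>  VisibleMeshlets; // read back by a later dispatch/draw
RWByteAddressBuffer       Counter;         // cleared to zero each frame

cbuffer Constants { uint MeshletCount; };

[numthreads(64, 1, 1)]
void CullCS(uint dtid : SV_DispatchThreadID)
{
    if (dtid >= MeshletCount)
        return;
    if (!IsMeshletVisible(Meshlets[dtid]))
        return;

    // Append the survivor. Both this write and the later read go through
    // memory, which is exactly the round trip the task -> mesh path avoids.
    uint slot;
    Counter.InterlockedAdd(0, 1, slot);
    VisibleMeshlets[slot] = dtid;
}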
 
Yup. But expecting all developers to be able to do it is not reasonable. Only Epic has pulled it off so far, and they still have some limitations on their Nanite system.

A lot of stuff can be done in compute that makes things like VRS, tiled resources, amplification shaders and mesh shaders not really that relevant. But it’s already a challenge to implement such a system, and making it work on every configuration is another level of difficulty. This is where API support at the hardware level helps.

I guess this is what sebbi and Locuza discussed recently on Twitter.

Locuza
“Primitive Shaders (NGG) was first implemented in GCN5, partially re-written again with RDNA1.
However, RDNA1 is lacking per-primitive output, so neither in DX12 or Vulkan Mesh Shaders are supported.
It requires >= RDNA2 (GFX10.3).
So many discussions boil down to the question if”

Sebbi
“Per-primitive output is very handy. I am glad that Microsoft didn't cave-in. One of the bottlenecks of the old pipelines is lack of per-primitive data, and the hacks around that are not pretty.”
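
To illustrate what per-primitive output buys: a D3D12 mesh shader can emit attributes once per triangle instead of duplicating them across vertices, and the same mechanism covers per-triangle culling via SV_CullPrimitive. A minimal hedged sketch (the MATERIAL_ID semantic name is arbitrary):

// Per-primitive outputs: one value per triangle, no per-vertex duplication hack.
struct VertexOut { float4 position : SV_Position; };

struct PrimOut
{
    uint materialId : MATERIAL_ID;      // flat per-triangle attribute
    bool culled     : SV_CullPrimitive; // per-triangle culling, also per-primitive
};

[outputtopology("triangle")]
[numthreads(1, 1, 1)]
void MSMain(out vertices   VertexOut verts[3],
            out indices    uint3     tris[1],
            out primitives PrimOut   prims[1])
{
    SetMeshOutputCounts(3, 1);

    verts[0].position = float4(-0.5, -0.5, 0, 1);
    verts[1].position = float4( 0.0,  0.5, 0, 1);
    verts[2].position = float4( 0.5, -0.5, 0, 1);

    tris[0] = uint3(0, 1, 2);

    prims[0].materialId = 7;     // the pixel shader reads this as one flat value
    prims[0].culled     = false; // flip to true to drop the triangle pre-raster
}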

 
Timur doesn't mention PS5. He does, however, say some things that could indicate additional issues for a developer writing their own amplification shader stage for use with mesh shaders.

Bear in mind that Timur is speaking from the perspective of writing a driver; he's not just a regular developer. He's very low level and is likely messing with things most people never would.

This leaves some questions up in the air about PS5.

Like, can it actually do what he's talking about here?

Alternatively, will doing the amplification shader stage on regular compute be as fast as the firmware-synchronised task + mesh that he's talking about here? E.g. would it involve more writes and reads to VRAM?

Clearly you can do anything you want in software - look at the amazing UE5 - but that doesn't mean there is no difference or advantage in having advances in hardware (or hardware + firmware in this case).

That depends on what Sony has done. Nothing has stopped or is stopping Sony from doing work similar to Timur's.

Given the secrecy of console manufacturers and dev NDAs, we shouldn't automatically conclude that a console is missing a feature because it's not revealed or talked about.

The PS4 had decoding hardware for lossless compressed data even though it was only MS that touted that capability with the XB1's DMEs. And the XB1 had a garlic and onion bus configuration even though only Sony talked about that aspect, which was standard for AMD APUs.
 
That depends on what Sony has done. Nothing has stopped or is stopping Sony from doing work similar to Timur's.

Given the secrecy of console manufacturers and dev NDAs, we shouldn't automatically conclude that a console is missing a feature because it's not revealed or talked about.
I think the reasonable default position is that it is on Sony to announce support at that level, and if they don’t and no developer verifies it, then it’s reasonable to say it doesn’t exist.

If it does exist, there should be evidence of it existing. That’s all I’m saying, and if evidence shows up that it exists we can shelve this discussion. I’m all for PS5 having support for it; part of me believes that if any company wanted to roll their own solution it would be them, but I haven’t seen any evidence of it yet.
 
I think the reasonable default position is that it is on Sony to announce support at that level, and if they don’t and no developer verifies it, then it’s reasonable to say it doesn’t exist.

If it does exist, there should be evidence of it existing. That’s all I’m saying, and if evidence shows up that it exists we can shelve this discussion. I’m all for PS5 having support for it; part of me believes that if any company wanted to roll their own solution it would be them, but I haven’t seen any evidence of it yet.

They need to have something, IMO. Capcom recently announced that their next REX Engine will also use mesh shaders. It would be silly to go all in on something the market-leading console doesn't support or is lacking.
 
They need to have something, IMO. Capcom recently announced that their next REX Engine will also use mesh shaders. It would be silly to go all in on something the market-leading console doesn't support or is lacking.
Technically they don’t. As you can see with UE, Epic rolled their own solution without it. And as AW2's results show, Remedy didn’t use it either.

I think if it were possible to implement both task and mesh shaders on Navi10, AMD would have. Instead it’s held at primitive shaders.
 
I think the reasonable default position is that it is on Sony to announce support at that level, and if they don’t and no developer verifies it, then it’s reasonable to say it doesn’t exist.

If it does exist, there should be evidence of it existing. That’s all I’m saying, and if evidence shows up that it exists we can shelve this discussion. I’m all for PS5 having support for it; part of me believes that if any company wanted to roll their own solution it would be them, but I haven’t seen any evidence of it yet.

There's absolutely no incentive for Sony to come out and announce that their hardware could support an API they don't use. 99.999999% of gamers who buy PS5s don't even know what a mesh shader is. Also, if you really think about it, the likely case is that the GPU and CPU hardware in the Series X and PS5 is nearly identical. Mesh shaders do not map directly to RDNA2 hardware; they get converted into primitive shaders in the GPU driver because primitive shaders map closer to the actual hardware. Just an educated guess, but the PS5 API likely looks a lot like primitive shaders. On Xbox you would write mesh shaders and they likely get converted into primitive shaders by the GPU driver, the same way it works on Windows, where primitive shaders are not exposed directly.

I'm just going by what I feel is most likely. There's no information out there that says PS5 is RDNA1. I just don't think it's likely that it is.
 
I think the reasonable default position is that it is on Sony to announce support at that level, and if they don’t and no developer verifies it, then it’s reasonable to say it doesn’t exist.

If it does exist, there should be evidence of it existing. That’s all I’m saying, and if evidence shows up that it exists we can shelve this discussion. I’m all for PS5 having support for it; part of me believes that if any company wanted to roll their own solution it would be them, but I haven’t seen any evidence of it yet.

I think that's unreasonable given the circumstances. PS5 info from Sony mostly revolves around one presentation and a handful of interviews/tweets, while MS presents their hardware at Hot Chips, on its developer blog, and on other sites. And because MS has a large presence in the PC market, where a ton of API and GPU related details aren't hidden behind NDAs, you get far more access to Xbox-relevant data. 99% of what we know about mesh shaders doesn't come from Xbox sources; it comes from the PC realm.

Plus, where does Sony's lack of mesh or task shaders manifest itself in terms of games? It doesn't seem to show up in AW2.

The absence of evidence is not evidence of absence.
 