Digital Foundry Article Technical Discussion [2023]

Metroid Prime Remake runs at 900p/60 FPS on the Switch. Why would it not be possible to do 1080p/60 FPS -> 4K upscaling?! I think this video is too negative. Most games won't be AAA games or using UE5. Indie games, smaller AA games and Nintendo games should run much better.
 
Yup. But expecting all developers to be able to do it is not reasonable. Only Epic has pulled it off so far, and they still have some limitations on their Nanite system.

A lot of stuff can be done in compute that makes things like VRS, Tiled Resources, amplification shaders and mesh shaders not really that relevant. But it's already a challenge to implement the system; having to make it work on every configuration is another level of difficulty. This is where API support at the hardware level helps.

Expecting all developers to do what?

Mesh shaders are basically vertex shaders that aren’t constrained by fixed input formats. Should be super easy to add to an engine.
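
Roughly, the host-side difference looks like this in D3D12 terms (just a sketch, not from any shipping engine; the PSO/buffer names are made up):

```cpp
// Sketch: contrasts fixed-format vertex fetch with free-form mesh-shader fetch.
// Assumes a valid ID3D12GraphicsCommandList6 and already-created PSOs/buffers;
// names like vbView and meshPso are placeholders.
#include <d3d12.h>

void DrawWithVertexShader(ID3D12GraphicsCommandList6* cmd,
                          ID3D12PipelineState* vsPso,
                          const D3D12_VERTEX_BUFFER_VIEW& vbView,
                          UINT vertexCount)
{
    // Classic path: the vertex format was baked into the PSO via
    // D3D12_INPUT_ELEMENT_DESC, and the IA stage fetches vertices for us.
    cmd->SetPipelineState(vsPso);
    cmd->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    cmd->IASetVertexBuffers(0, 1, &vbView);
    cmd->DrawInstanced(vertexCount, 1, 0, 0);
}

void DrawWithMeshShader(ID3D12GraphicsCommandList6* cmd,
                        ID3D12PipelineState* meshPso,
                        UINT meshletCount)
{
    // Mesh path: no input layout, no IA stage. The mesh shader reads meshlets
    // from whatever SRVs/root parameters its root signature exposes and emits
    // vertices/primitives itself.
    cmd->SetPipelineState(meshPso);
    cmd->DispatchMesh(meshletCount, 1, 1);
}
```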
 
Expecting all developers to do what?

Mesh shaders are basically vertex shaders that aren’t constrained by fixed input formats. Should be super easy to add to an engine.
To skip task/amplification shaders and roll their own compute based solution instead.
 
They need to have something, imo. Capcom recently announced that the next REX engine will also use mesh shaders. It would be silly to go all in on something that the market-leading console doesn't support.
Supposedly Capcom is shifting to be more PC-focused, so their investment in Mesh Shaders makes sense, and the feature would likely end up in a PS5 Pro and PS6, possibly even in future Nintendo hardware. Also, you have Alan Wake, where we are sure they are using something to achieve the same results. There is also a rumor that Capcom will be making the RE engine available for licensing soon.
 
I think there's serious confusion about what an API is. The whole point of the API is to standardize writing the application so it can run on a wide variety of different hardware. Mesh Shaders are written at the application level; the GPU driver supports the API and takes the shader and compiles it into something the hardware can run. There is no explicit section of the GPU for running mesh shaders. Mesh Shaders are not a fixed-function hardware block; they're a general compute approach to geometry. The RDNA1 ISA is likely missing a few instructions that would be required to make Mesh Shaders work as defined by the APIs. RDNA2 has those instructions, which allows it to support the API, though there seem to be some things about how RDNA2 schedules work that might not make its implementation ideal. Can't remember exactly, and don't feel like looking for the link. Now that there's a standardized API, AMD will likely target changes to the instruction set and scheduler to try to optimize mesh shader support, but don't forget it's not fixed-function hardware that runs mesh shaders. It's general-purpose compute with an API.
 
Oh, why would they do that? Task shaders likely have a fast path to the mesh shader. A compute shader implementation would be slower for no benefit.
My assumption is that PS5 has no support for task shaders. And in order to have similar and reproducible behaviour between PC and console, this would be one such method.
 
I think there's serious confusion about what an API is. The whole point of the API is to standardize writing the application so it can run on a wide variety of different hardware. Mesh Shaders are written at the application level; the GPU driver supports the API and takes the shader and compiles it into something the hardware can run. There is no explicit section of the GPU for running mesh shaders. Mesh Shaders are not a fixed-function hardware block; they're a general compute approach to geometry. The RDNA1 ISA is likely missing a few instructions that would be required to make Mesh Shaders work as defined by the APIs. RDNA2 has those instructions, which allows it to support the API, though there seem to be some things about how RDNA2 schedules work that might not make its implementation ideal. Can't remember exactly, and don't feel like looking for the link. Now that there's a standardized API, AMD will likely target changes to the instruction set and scheduler to try to optimize mesh shader support, but don't forget it's not fixed-function hardware that runs mesh shaders. It's general-purpose compute with an API.
Correct. I have no confusion around this.
How the scheduler runs, how memory is shared, and how the compute and graphics queues can signal each other is what makes mesh and task shaders work.

If the instructions don't exist for it, then in theory one could make it. But if any hardware is required to support the instruction call, even something as small as a larger instruction cache to fit it all in, then unless those modifications occur I can't see it working.

I don't really want to get into the whole "is PS5 RDNA 2" bit. It will derail this thread for sure, but these are semi-custom solutions, not fully custom. To me, PS5 has always been a Navi10 taking on some Navi20 features, and XSX is a Navi21 taking on Navi10 features.

Cerny presented the Primitive Shader and the Geometry Engine. If they had task shaders I'm positive it would have been announced there as well - and yet all through development and even today we've never once heard of it. It just seems implausible that all this time Sony first-party and multi-platform developers have been sleeping on mesh and task shaders - and instead opting for their own compute-based solutions which we know would run on all platforms.

As easy as it is to map mesh shaders to primitive shaders, it should be just as easy to map amplification shaders to task shaders on PS5. And I've not heard or seen it. No one talks about task shaders.

I’m more than happy to change my stance on this 3-5 years from now when more games are on mesh shaders and we hear more news about implementation. But we will need to wait for GDC for this. If it ever comes, and I’m certainly open to it - but I’ve not yet seen convincing evidence that it exists.
 
@iroboto but why does Sony have to support something analogous to mesh shaders? Why not design their API in a way that maps as closely as possible to the ideal operation of an RDNA GPU? Some people seem to feel there is some necessity to have mesh shaders. The PS5 API does not have to support a wide array of hardware like D3D or Vulkan. This is why the argument doesn't make sense. You're saying a lack of mesh shaders must mean the hardware couldn't support it, when the reality is there could be a better programming model specifically for RDNA2. The PS5 API/SDK is NDA'd as always.
 
@iroboto but why does Sony have to support something analogous to mesh shaders? Why not design their API in a way that maps as closely as possible to the ideal operation of an RDNA GPU? Some people seem to feel there is some necessity to have mesh shaders. The PS5 API does not have to support a wide array of hardware like D3D or Vulkan. This is why the argument doesn't make sense. You're saying a lack of mesh shaders must mean the hardware couldn't support it, when the reality is there could be a better programming model specifically for RDNA2. The PS5 API/SDK is NDA'd as always.

There's a bit of wiggle room for that. Nvidia's support for mesh shaders in their own extensions has more flexibility and features than DX, and that's what I would expect GNM and Xbox to have over the PC variant.

What I wouldn't expect is a completely different architecture - as if somehow Sony was able to develop something entirely better than what both AMD and Nvidia built. Right?

For me, the grounding point is that these are semi custom solutions, not fully custom. They can borrow and switch some blocks here and there, but they aren’t rolling silicon that doesn’t exist on PC.
 
My assumption is that PS5 has no support for task shaders. And in order to have similar and reproducible behaviour between PC and console, this would be one such method.

What exactly do you mean by “support for task shaders”?

DX12 task shaders are functionally pretty simple.
Input: anything
Processing: anything
Output: anything that fits in 16KB

So task shaders are basically compute shaders with a limited output size. The only other real difference is that task shaders can feed data to mesh shaders more efficiently while compute shaders likely have to write through VRAM.

So the PS5 definitely supports “task shaders”.
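
To put a number on that output restriction, a trivial sanity check (the payload struct here is invented purely for illustration; 16384 bytes is the per-group payload cap mentioned above):

```cpp
// Toy illustration of the DX12 task/amplification shader payload budget.
// The struct is made up; only the 16KB per-group limit comes from the spec.
#include <cstdint>

struct MeshletPayload {
    // e.g. indices of the meshlets that survived culling in this task group
    uint32_t meshletIndices[1024];
    uint32_t meshletCount;
};

static_assert(sizeof(MeshletPayload) <= 16384,
              "task -> mesh payload must fit in 16KB per group");

int main() { return 0; }
```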
 
What exactly do you mean by “support for task shaders”?

DX12 task shaders are functionally pretty simple.
Input: anything
Processing: anything
Output: anything that fits in 16KB

So task shaders are basically compute shaders with a limited output size. The only other real difference is that task shaders can feed data to mesh shaders more efficiently while compute shaders likely have to write through VRAM.

So the PS5 definitely supports “task shaders”.
According to Timur:
Even though they are compute shaders as far as the AMD HW is concerned, task shaders do not work like a compute pre-pass. Instead, task shaders are dispatched on an async compute queue while at the same time the mesh shader work is executed on the graphics queue in parallel.

The task+mesh dispatch packets are different from a regular compute dispatch. The compute and graphics queue firmwares work together in parallel:

  • Compute queue launches up to as many task workgroups as it has space available in the ring buffer.
  • Graphics queue waits until a task workgroup is finished and can launch mesh shader workgroups immediately. Execution of mesh dispatches from a finished task workgroup can therefore overlap with other task workgroups.
  • When a mesh dispatch from a task workgroup is finished, its slot in the ring buffer can be reused and a new task workgroup can be launched.
  • When the ring buffer is full, the compute queue waits until a mesh dispatch is finished, before launching the next task workgroup.

Side note, getting some implementation details wrong can easily cause a deadlock on the GPU. It is great fun to debug these.

The relevant details here are that most of the hard work is implemented in the firmware (good news, because that means I don’t have to implement it), and that task shaders are executed on an async compute queue and that the driver now has to submit compute and graphics work in parallel.

Keep in mind that the API hides this detail and pretends that the mesh shading pipeline is just another graphics pipeline that the application can submit to a graphics queue. So, once again we have a mismatch between the API programming model and what the HW actually does.

****
I think developers can emulate this on PS5, but as per above, it's likely painful. If Sony rolled their own task shader, that would be different; then developers could just call it and go.
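
As a toy CPU-side model of the ring-buffer hand-off described above (slot and workgroup counts are made up; this only illustrates the back-pressure, not real firmware behaviour):

```cpp
// Toy single-threaded model of the task/mesh ring-buffer hand-off quoted above.
// The "compute queue" launches task workgroups while ring slots are free; the
// "graphics queue" drains finished task groups into mesh dispatches, freeing slots.
#include <cstdio>
#include <queue>

constexpr int RING_SLOTS  = 4;   // space in the shared ring buffer (made up)
constexpr int TASK_GROUPS = 10;  // total task workgroups to run (made up)

int main() {
    std::queue<int> ring;        // task groups whose output awaits a mesh dispatch
    int launched = 0, meshed = 0;

    while (meshed < TASK_GROUPS) {
        // Compute queue: launch task workgroups while the ring has space.
        while (launched < TASK_GROUPS && (int)ring.size() < RING_SLOTS) {
            ring.push(launched);
            std::printf("compute queue: launched task group %d\n", launched++);
        }
        // Graphics queue: consume one finished task group, launch its mesh work,
        // and free the ring slot so the compute queue can proceed.
        int group = ring.front();
        ring.pop();
        std::printf("graphics queue: mesh dispatch for task group %d\n", group);
        ++meshed;
    }
    return 0;
}
```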
 
What exactly do you mean by “support for task shaders”?

DX12 task shaders are functionally pretty simple.
Input: anything
Processing: anything
Output: anything that fits in 16KB

So task shaders are basically compute shaders with a limited output size. The only other real difference is that task shaders can feed data to mesh shaders more efficiently while compute shaders likely have to write through VRAM.

So the PS5 definitely supports “task shaders”.

Your conclusion doesn't follow from the rest of your post. The specific API feature, convention, and ability to feed data to mesh shaders more efficiently (without writing to a buffer and later calling ExecuteIndirect, I guess?) are exactly the things it sounds like the PS5 doesn't have.

Of course, people have been culling clusters of geometry and dispatching work in compute shaders since well before mesh shaders were announced, but presumably devs that rely on this kind of rendering (and choose to use mesh/task shaders on supported platforms) are maintaining a separate "old-fashioned compute" path for platforms that don't support it. Maintaining multiple different solutions for a single problem is always a big strain on teams and leads to more bugs and less polish. How much of a perf impact that has is an open question; I assume it's negligible unless you have very specific bottlenecks.
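
For reference, the "old-fashioned compute" path usually looks something like this on the D3D12 host side (a sketch under the assumption that the cull shader writes indirect draw arguments; the PSOs, command signature and buffers are placeholders):

```cpp
// Sketch of the compute-then-ExecuteIndirect fallback path discussed above.
// Assumes the caller created the cull/draw PSOs, a command signature matching the
// arguments the cull shader writes, and the argument/count buffers. Names are made up.
#include <d3d12.h>

void RenderClustersWithoutMeshShaders(ID3D12GraphicsCommandList* cmd,
                                      ID3D12PipelineState* cullPso,
                                      ID3D12PipelineState* drawPso,
                                      ID3D12CommandSignature* drawSignature,
                                      ID3D12Resource* argsBuffer,
                                      ID3D12Resource* countBuffer,
                                      UINT maxClusterDraws,
                                      UINT cullGroupCount)
{
    // 1) Cull clusters in a compute shader; survivors' draw arguments land in
    //    argsBuffer (in VRAM), with the survivor count in countBuffer.
    cmd->SetPipelineState(cullPso);
    cmd->Dispatch(cullGroupCount, 1, 1);

    // 2) Transition the argument buffer for indirect consumption
    //    (countBuffer would need the same treatment in real code).
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = argsBuffer;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_UNORDERED_ACCESS;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_INDIRECT_ARGUMENT;
    cmd->ResourceBarrier(1, &barrier);

    // 3) Draw the surviving clusters via indirect arguments. A task->mesh path
    //    would keep this hand-off on-chip instead of going through memory.
    cmd->SetPipelineState(drawPso);
    cmd->ExecuteIndirect(drawSignature, maxClusterDraws,
                         argsBuffer, 0, countBuffer, 0);
}
```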
 
There's a bit of wiggle room for that. Nvidia's support for mesh shaders in their own extensions has more flexibility and features than DX, and that's what I would expect GNM and Xbox to have over the PC variant.

What I wouldn't expect is a completely different architecture - as if somehow Sony was able to develop something entirely better than what both AMD and Nvidia built. Right?

For me, the grounding point is that these are semi custom solutions, not fully custom. They can borrow and switch some blocks here and there, but they aren’t rolling silicon that doesn’t exist on PC.

Nvidia came up with the mesh shader concept. AMD was working with their own primitive shader concept. Nvidia's concept was adopted as the standard for the API, and AMD has made it work despite it not being the direction they'd chosen for their hardware; it's similar enough that they've made it work. Sony's solution is likely designed around, if not provided to them by, AMD. I'm guessing the Sony API is largely built around the actual hardware with large input from AMD, similar to how they've designed all of their previous APIs (get as close to the hardware as possible). That's why I don't understand why PS5 would or should adopt Mesh Shaders, even if they could, in the same way as AMD has on PC and Xbox.
 
I think you guys might be getting a bit too far into the weeds of all of this. At a basic level, the whole purpose of the mesh/primitive/task/amplification stuff is to provide more generic compute-like functionality while keeping the intermediate data on-chip rather than going through memory, as would have happened in the conventional VS/GS etc. pipeline. The implementation on a specific piece of hardware can be simple or complicated of course, but that's true of most graphics pipeline features.

Guaranteeing that the data stays on chip requires a lot of restrictions around buffer sizes and scheduling of course, which is the majority of what these features are doing: launching compute waves and letting you output to on-chip buffers with some amount of back-pressure on how additional ones are launched.

How important or unimportant it is to keep the data on chip depends a lot on the specific case. It's more relevant if you are doing vertex animation/skinning or tessellation as then you're adding (amplifying :p) data that never has to hit memory at all. That said, the importance of a lot of this is undercut if you are relying on raytracing heavily... if you're going to RT the geometry it has to end up in memory anyways. Once you are building a BVH for something dynamically skinned that is what is going to dominate performance considerations and you might as well just skin in compute shaders directly (as many engines do in these cases).
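
To make the last point concrete, a host-side sketch of the "just skin in compute and build the BVH from that buffer" sequence (assuming the BLAS build desc and skinned vertex buffer are set up by the caller; all names are placeholders, not any particular engine's API):

```cpp
// Sketch of compute-shader skinning feeding a BLAS rebuild, as described above.
// Assumes the skinned vertex buffer referenced by blasDesc was created as a UAV
// and that blasDesc (geometry, scratch, destination) is filled in by the caller.
#include <d3d12.h>

void SkinThenRebuildBlas(ID3D12GraphicsCommandList4* cmd,
                         ID3D12PipelineState* skinPso,
                         ID3D12Resource* skinnedVertexBuffer,
                         UINT skinGroupCount,
                         const D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC& blasDesc)
{
    // 1) Skinning in a plain compute shader: the result has to live in memory
    //    anyway, because the BVH builder reads it from there.
    cmd->SetPipelineState(skinPso);
    cmd->Dispatch(skinGroupCount, 1, 1);

    // 2) Make the skinned vertices visible to the acceleration-structure build.
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type          = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    barrier.UAV.pResource = skinnedVertexBuffer;
    cmd->ResourceBarrier(1, &barrier);

    // 3) Rebuild/refit the BLAS over the skinned geometry. This build, not the
    //    vertex work, usually dominates the frame cost for dynamic meshes.
    cmd->BuildRaytracingAccelerationStructure(&blasDesc, 0, nullptr);
}
```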
 
Your conclusion doesn't follow from the rest of your post. The specific API feature, convention, and ability to feed data to mesh shaders more efficiently (without writing to a buffer and later calling ExecuteIndirect, I guess?) are exactly the things it sounds like the PS5 doesn't have.

Yes, I already made that point. The mesh/task shader API defines limits that enable hardware optimization. However, it seems trivial to implement the same functionality as a compute shader. The only question is performance, and until we have evidence that the compute shader path is significantly slower, it's not clear what the debate is about.

Of course, people have been culling clusters of geometry and dispatching work in compute shaders since well before mesh shaders were announced, but presumably devs that rely on this kind of rendering (and choose to use mesh/task shaders on supported platforms) are maintaining a separate "old-fashioned compute" path for platforms that don't support it. Maintaining multiple different solutions for a single problem is always a big strain on teams and leads to more bugs and less polish. How much of a perf impact that has is an open question; I assume it's negligible unless you have very specific bottlenecks.

The “multiple different solutions” in this case aren’t that different at all. The actual shader code inside the compute and task shaders should be extremely similar. This should not be difficult for any reasonably competent developer.
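
As a CPU-side analogy of how little the two paths need to differ, the core per-cluster test can live in one shared function with two thin entry points (everything below is invented for illustration, not real shader code):

```cpp
// Toy C++ analogy: one shared culling routine, two thin entry points
// (a "task shader" flavour and a "compute fallback" flavour).
#include <cstdint>
#include <vector>

struct Cluster { float centre[3]; float radius; };

// The code both paths share: decide whether a cluster survives the frustum.
static bool ClusterVisible(const Cluster& c, const float planes[6][4]) {
    for (int i = 0; i < 6; ++i) {
        float d = planes[i][0] * c.centre[0] + planes[i][1] * c.centre[1] +
                  planes[i][2] * c.centre[2] + planes[i][3];
        if (d < -c.radius) return false;  // fully outside this plane
    }
    return true;
}

// "Task shader" flavour: append survivors to a payload for the mesh stage.
void TaskPath(const std::vector<Cluster>& in, const float planes[6][4],
              std::vector<uint32_t>& payload) {
    for (uint32_t i = 0; i < in.size(); ++i)
        if (ClusterVisible(in[i], planes)) payload.push_back(i);
}

// "Compute fallback" flavour: append survivors as indirect draw arguments.
struct DrawArgs { uint32_t clusterIndex, vertexCount; };
void ComputePath(const std::vector<Cluster>& in, const float planes[6][4],
                 std::vector<DrawArgs>& args) {
    for (uint32_t i = 0; i < in.size(); ++i)
        if (ClusterVisible(in[i], planes)) args.push_back({i, 64 * 3});
}

int main() { return 0; }
```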
 
Plus, where does Sony's lack of mesh or task shaders manifest itself in terms of games? It doesn't seem to show up in AW2.

AW2 also doesn't utilize amplification shaders, which, at least in terms of DX12 or the Xbox consoles, are considered part of "mesh" shaders.

I think that's where it starts to get really murky. PS5 obviously doesn't support amplification shaders, otherwise Remedy would have used those rather than coding something else that would work on the PS5.

So, in terms of what MS calls mesh shaders, the PS5 doesn't have it.

However, it's entirely possible that PS5 has mesh shaders without amplification shaders. Which means...

The absence of evidence is not evidence of absence.

That it could have it ... or it could not have it.

So, the absence of evidence also is not evidence that it exists.

We cannot definitively state that it does or does not have it.

Regards,
SB
 
if you're going to RT the geometry it has to end up in memory anyways. Once you are building a BVH for something dynamically skinned that is what is going to dominate performance considerations and you might as well just skin in compute shaders directly (as many engines do in these cases).
Wait. Mesh shaders must always write directly to the rasterizers? You can’t save the geometry output from mesh shaders?

Nvm. I already see the problem. If you cull back-face triangles and write that to the BVH, it becomes useless for RT, since you can no longer have reflections or bounce GI.
 
Metroid Prime Remake runs at 900p/60 FPS on the Switch. Why would it not be possible to do 1080p/60 FPS -> 4K upscaling?! I think this video is too negative. Most games won't be AAA games or using UE5. Indie games, smaller AA games and Nintendo games should run much better.
That assessment doesn't entirely match the author's interpretation. They thought they were being pretty generous by only modeling the performance of the new system's higher power mode (there's almost certainly going to be a lower-clocked mode), and we have no idea of the SoC's fixed-function setup (12 SMs with 1 rasterizer vs 12 SMs with 2 rasterizers) either, which can significantly impact metrics even further ...

Smaller studios are expected to use UE5 along with some of its advanced graphical features. Not being able to manage a locked 30 FPS (regardless of the combination of settings used) on one of the easier UE5 games with a theoretical higher power mode is a fairly scathing conclusion, especially considering that one of their own internal studios recently shipped a game using Unreal Engine ...
 
Wait. Mesh shaders must always write directly to the rasterizers? You can’t save the geometry output from mesh shaders?
You could write out to general UAVs of course, but it kind of defeats the purpose... at that point it might as well just be a compute shader. And yeah you can't really do LOD or fine-grained culling if you are going to feed it to a BVH builder.
 