Digital Foundry Article Technical Discussion [2023]

Below2D · Oct 29, 2023

Dictator said:
In case anyone is curious, all my performance captures from the video and here are with Vsync off on PC, running on the Ryzen 5 3600:
(this is an RTX 2080 Ti screenshot)

Interesting that the 3070 is beating the 2080 Ti by 11% when they’re usually neck-and-neck in non-VRAM constrained scenarios.

I wonder if it has anything to do with Turing vs Ampere because the 3070 is also faster than the 2070S to a greater degree than is customary.

Thanks for the screenshots!

trinibwoy · Oct 29, 2023

Is there any info on how AW2 is using mesh shaders?

Culling, LOD selection, tessellation?

davis.anthony · Oct 29, 2023

Below2D said:
Interesting that the 3070 is beating the 2080 Ti by 11% when they’re usually neck-and-neck in non-VRAM constrained scenarios.

I wonder if it has anything to do with Turing vs Ampere because the 3070 is also faster than the 2070S to a greater degree than is customary.

Thanks for the screenshots!

I feel it's mesh shaders being slightly more efficient on the newer architecture.

Below2D · Oct 29, 2023

davis.anthony said:
I feel it's mesh shaders being slightly more efficient on the newer architecture.

I would like to see how a 3060 performs. If this hypothesis is true, it should come very close to the 2070S in this game or even beat it.

iroboto · Oct 29, 2023

mr magoo said:
Amplification shader = task shader. According to Timur's blog:

“First things first. Under the hood, task shaders are compiled to a plain old compute shader”

Task shader driver implementation on AMD HW (timur.hu)

I think you missed the most important thing about what makes a task shader a task shader rather than just a plain old compute shader:

The relevant details here are that most of the hard work is implemented in the firmware (good news, because that means I don’t have to implement it), and that task shaders are executed on an async compute queue and that the driver now has to submit compute and graphics work in parallel.

Keep in mind that the API hides this detail and pretends that the mesh shading pipeline is just another graphics pipeline that the application can submit to a graphics queue. So, once again we have a mismatch between the API programming model and what the HW actually does.

Squeezing a hidden compute pipeline in your graphics
In order to use this beautiful scheme provided by the firmware, the driver needs to do two things:

1. Create a compute pipeline from the task shader.

2. Submit the task shader work on the async compute queue while, at the same time, also submitting the mesh and pixel shader work on the graphics queue.

We already had good support for compute pipelines in RADV (as much as the API needs), but internally in the driver we’ve never had this kind of close cooperation between graphics and compute.

When you use a draw call in a command buffer with a pipeline that has a task shader, RADV must create a hidden, internal compute command buffer. This internal compute command buffer contains the task shader dispatch packet, while the graphics command buffer contains the packet that dispatches the mesh shaders. We must also ensure correct synchronization between these two command buffers according to application barriers ― because of the API mismatch it must work as if the internal compute cmdbuf was part of the graphics cmdbuf. We also need to emit the same descriptors and push constants, etc. When the application submits the graphics queue, this new, internal compute command buffer is then submitted to the async compute queue.

Thus far, this sounds pretty logical and easy.

The actual hard work is to make it possible for the driver to submit work to different queues at the same time. RADV’s queue code was written assuming that there is a 1:1 mapping between radv_queue objects and HW queues. To make task shaders work we must now break this assumption.

So, of course I had to do some crazy refactor to enable this. At the time of writing the AMDGPU Linux kernel driver doesn’t support “gang submit” yet, so I use scheduled dependencies instead. This has the drawback of submitting to the two queues sequentially rather than doing everything in the same submit.

So Lurkmass is correct. On AMD, the most that could be done, at least with the documentation written so far at the time of RDNA 2, was to emulate the task shader at the driver level (though I'm unsure how it's implemented in DX vs Vulkan). It's nearly there. But you can see the issue with a developer trying to emulate task shaders on PS5. It's entirely possible that Sony has done something similar to what was done here; if not, they may have their own homebrew version of this (which, if it exists, would have more options imo); and if they don't have that either, it would be on the developer to make the emulation happen.

imo, major differences between how amplification shaders are implemented on DX and on PS5 are likely the culprit slowing the migration to mesh shaders.
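The driver flow described in the quoted blog post (split the app's command buffer into a graphics part and a hidden compute part, mirror descriptors and push constants into both, then submit to two hardware queues) can be modelled as a toy Python sketch. Everything here is hypothetical: the command names, `split_for_task_shader`, `ToyQueue` and the log strings are invented for illustration and are not RADV's code. Barriers are simply mirrored, which is far cruder than real synchronization, and the sketch assumes a "scheduled" dependency only waits for the other job to be scheduled, not to finish.

```python
from dataclasses import dataclass, field

# Hypothetical command names for illustration; NOT RADV's real packets or API.
STATE_CMDS = {"bind_descriptors", "push_constants", "barrier"}


@dataclass
class Cmd:
    name: str
    args: tuple = ()


def split_for_task_shader(app_cmds):
    """Lower one app-visible graphics cmdbuf into a graphics cmdbuf plus a
    hidden compute cmdbuf (toy model of the flow in the quoted post)."""
    graphics, compute = [], []
    task_bound = False
    for cmd in app_cmds:
        if cmd.name == "bind_pipeline":
            task_bound = bool(cmd.args and cmd.args[0])  # args[0]: has task shader?
            graphics.append(cmd)
            if task_bound:
                # The task shader was compiled into a plain compute pipeline.
                compute.append(Cmd("bind_task_as_compute_pipeline", cmd.args))
        elif cmd.name in STATE_CMDS:
            # Same descriptors / push constants (and, simplified, barriers) on
            # both sides, so the hidden compute work acts as if it lived in
            # the graphics cmdbuf.
            graphics.append(cmd)
            compute.append(cmd)
        elif cmd.name == "draw_mesh_tasks" and task_bound:
            # Task shader -> compute dispatch; mesh + pixel shaders -> graphics.
            compute.append(Cmd("dispatch_task", cmd.args))
            graphics.append(Cmd("dispatch_mesh", cmd.args))
        else:
            graphics.append(cmd)
    if not any(c.name == "dispatch_task" for c in compute):
        compute = []  # no task work, so nothing goes to async compute
    return graphics, compute


@dataclass
class ToyQueue:
    """One API-level queue backed by two HW rings (no longer a 1:1 mapping)."""
    gfx_ring: list = field(default_factory=list)
    compute_ring: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def submit(self, graphics, compute, gang_submit=False):
        if not compute:
            self.gfx_ring.append(graphics)
            self.log.append("gfx")
            return
        self.compute_ring.append(compute)
        self.gfx_ring.append(graphics)
        if gang_submit:
            # Both rings covered by a single submission.
            self.log.append("gang(compute+gfx)")
        else:
            # Two sequential submissions; graphics carries a "scheduled"
            # dependency on the compute job.
            self.log.append("compute")
            self.log.append("gfx(after compute scheduled)")


cmds = [Cmd("bind_pipeline", (True,)), Cmd("bind_descriptors", (0,)),
        Cmd("push_constants", (16,)), Cmd("draw_mesh_tasks", (64, 1, 1))]
gfx, comp = split_for_task_shader(cmds)
queue = ToyQueue()
queue.submit(gfx, comp)
```

The mesh-only case (a pipeline bound without a task shader) leaves the hidden compute buffer empty, so the queue falls back to a plain graphics submission.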

Nesh · Oct 29, 2023

Dictator said:
In case anyone is curious, all my performance captures from the video and here are with Vsync off on PC, running on the Ryzen 5 3600:
(this is an RTX 2080 Ti screenshot)

Curious how the 2070S handles the noisy artifacts compared to the console versions.
Are they still there, and is the internal resolution the same as the PS5's? How about the corresponding quality mode?

Remij · Oct 29, 2023

Really though... why does Alan Wake 2 not have any frame limiter or refresh-rate options?

They also need to change the graphics options menu so the right side is completely open, letting you see the graphics changes as they happen, and make the options side a bit more transparent. That would solve the issue of not understanding how each setting impacts the visuals.

Hopefully the game gets some decent patches to fix some annoying issues like the map texture not loading in quickly when playing as Saga. It's clearly loading the entire mind space every time, and the map is something you need to look at frequently enough that it's annoying to have to wait for it.

techuse · Oct 29, 2023

I would think the greater Ampere performance relative to Turing is down to 2xFP32.
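As a rough illustration of the 2xFP32 point: peak FP32 throughput is 2 FLOPs (one fused multiply-add) per FP32 lane per clock. The lane counts and boost clocks below are the reference-card figures as commonly listed (assumed here, not taken from this thread): 68 SMs x 64 lanes at 1.545 GHz for the 2080 Ti, and 46 SMs x 128 FP32-capable lanes at 1.725 GHz for the 3070.

```python
def peak_fp32_tflops(fp32_lanes: int, boost_clock_ghz: float) -> float:
    # One fused multiply-add (2 FLOPs) per FP32 lane per clock, in TFLOPS.
    return 2 * fp32_lanes * boost_clock_ghz / 1000.0


# Reference-card specs (assumed): Turing 68 SMs x 64 FP32 lanes,
# Ampere GA104 (3070) 46 SMs x 128 FP32-capable lanes.
rtx_2080_ti = peak_fp32_tflops(68 * 64, 1.545)
rtx_3070 = peak_fp32_tflops(46 * 128, 1.725)

print(f"2080 Ti: {rtx_2080_ti:.1f} TFLOPS")      # 13.4
print(f"3070:    {rtx_3070:.1f} TFLOPS")         # 20.3
print(f"ratio:   {rtx_3070 / rtx_2080_ti:.2f}")  # 1.51
```

On Ampere, half of those lanes share a datapath with INT32 work, so the roughly 51% peak advantage only shows up in FP32-heavy code. That may help explain why the two cards are usually neck-and-neck in games but can diverge in a title whose shaders lean harder on FP32.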

davis.anthony · Oct 29, 2023

Below2D said:
I would like to see how a 3060 performs. If this hypothesis is true, it should come very close to the 2070S in this game or even beat it.

Can't find a benchmark with the 2070 Super but I found this.

Shortbread · Oct 29, 2023

Nesh said:
Curious how the 2070S handles the noisy artifacts compared to the console versions.
Are they still there, and is the internal resolution the same as the PS5's? How about the corresponding quality mode?

Not everything is perfect on the PC side when it comes to noisy artifacts, either. There have been a few doors (like the rear door in the diner) with a specific reflective PBR material or surface that shows a visible noisy pattern when RT/PT is active. It looks like a simple fix, though.

Edit: Also, has anyone experienced a weird shadow issue with the stairs while walking up the steps in the morgue after the first fight with Nightingale? It's as if RT/PT is creating an odd 3D-like shadow of the stairs as you walk up.

pjbliverpool · Oct 29, 2023

davis.anthony said:
Can't find a benchmark with the 2070 Super but I found this.


Interestingly, the 2080Ti is on par with the 3070 there, although the 3060 is still punching above its weight compared to the 2070. Higher settings may favour the 2080Ti's larger VRAM or higher memory bandwidth, though (which would also help the 3060 on the VRAM capacity side).

Below2D · Oct 29, 2023

pjbliverpool said:
Interestingly, the 2080Ti is on par with the 3070 there, although the 3060 is still punching above its weight compared to the 2070. Higher settings may favour the 2080Ti's larger VRAM or higher memory bandwidth, though (which would also help the 3060 on the VRAM capacity side).

Yeah, but here it beats the 2080 Ti by 14%.

The 2080 Ti is also equal to the 3060 Ti whereas it's usually 10-15% faster.
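Because the percentages in this exchange are easy to mix up, note that "A is X% faster than B" and "B is X% slower than A" are not symmetric. A small sketch with made-up frame rates (57 and 50 fps are illustrative only, not measurements from this thread):

```python
def percent_faster(fps_a: float, fps_b: float) -> float:
    """How much faster A is than B, as a percentage of B."""
    return (fps_a / fps_b - 1.0) * 100.0


def percent_slower(fps_a: float, fps_b: float) -> float:
    """How much slower A is than B, as a percentage of B."""
    return (1.0 - fps_a / fps_b) * 100.0


# Hypothetical frame rates, for illustration only.
fast, slow = 57.0, 50.0
print(f"{percent_faster(fast, slow):.1f}% faster")  # 14.0% faster
print(f"{percent_slower(slow, fast):.1f}% slower")  # 12.3% slower
```

So when one card is 14% faster, the other trails by about 12.3%, which matters when comparing "X% faster" figures quoted from different reviews.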

DavidGraham · Oct 29, 2023

Below2D said:
3070 is beating the 2080 Ti by 11% when they’re usually neck-and-neck in non-VRAM constrained scenarios

Not that strange; we've observed this in Quake 2 RTX, Minecraft RTX and Portal RTX, where the 3070 is anywhere from 15% to 50% faster than the 2080Ti, and the 3070Ti is even faster. The more you push ray tracing, either through path tracing or by piling up ray-traced effects, the more the 3070/3070Ti comes out on top.

Test • Nvidia GeForce RTX 3070 Ti (www.comptoir-hardware.com)

Review of Nvidia's GeForce RTX 3070 Ti in its Founders Edition version. On the menu: 25 games tested, including 19 in rasterization and 11 with ray tracing, 11 benchmarks, plus measurements of power consumption, noise, temperatures and infrared imaging.

pjbliverpool · Oct 29, 2023

DavidGraham said:
Not that strange; we've observed this in Quake 2 RTX, Minecraft RTX and Portal RTX, where the 3070 is anywhere from 15% to 50% faster than the 2080Ti, and the 3070Ti is even faster. The more you push ray tracing, either through path tracing or by piling up ray-traced effects, the more the 3070/3070Ti comes out on top.

These are non-RT benchmarks though.

Below2D · Oct 29, 2023

DavidGraham said:
Not that strange; we've observed this in Quake 2 RTX, Minecraft RTX and Portal RTX, where the 3070 is anywhere from 15% to 50% faster than the 2080Ti, and the 3070Ti is even faster. The more you push ray tracing, either through path tracing or by piling up ray-traced effects, the more the 3070/3070Ti comes out on top.

Test • Nvidia GeForce RTX 3070 Ti (www.comptoir-hardware.com)

Review of Nvidia's GeForce RTX 3070 Ti in its Founders Edition version. On the menu: 25 games tested, including 19 in rasterization and 11 with ray tracing, 11 benchmarks, plus measurements of power consumption, noise, temperatures and infrared imaging.

Those benchmarks are without ray tracing and only use signed distance fields for reflections, so it doesn't sound like the difference should be that large.

Flappy Pannus · Oct 29, 2023

Remij said:
Really though... why does Alan Wake 2 not have any frame limiter or refresh-rate options?

Especially for a game this demanding on a lot of hardware, a properly frame-paced 30fps option would be a nice addition too.

davis.anthony · Oct 29, 2023

Flappy Pannus said:
Especially for a game this demanding on a lot of hardware, a properly frame-paced 30fps option would be a nice addition too.

Use RTSS if you have it, as it typically does a better job than built-in frame rate limiters anyway.

Remij · Oct 29, 2023

davis.anthony said:
Use RTSS if you have it, as it typically does a better job than built-in frame rate limiters anyway.

Yeah, but that's beside the point. Game loading is often tied to framerate, and limiting the overall framerate to 30 can sometimes cause extra-long loading times. A game with a proper frame limiter would disable the limit during loading screens.

We should always advocate for games including these options and doing proper jobs of them, regardless of how easy it is to use 3rd party software. Some day that software may no longer work.
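The loading-screen point can be illustrated with a minimal sleep-based frame cap that steps aside while loading. This is a sketch under stated assumptions, not how Alan Wake 2 or RTSS actually work: the `FrameLimiter` class and its method names are hypothetical, and it assumes the game loop calls `wait()` once per frame and knows whether it is currently loading.

```python
import time


class FrameLimiter:
    """Hypothetical frame cap that is bypassed while the game is loading."""

    def __init__(self, target_fps: float):
        self.frame_time = 1.0 / target_fps
        self.next_deadline = None  # time at which the next frame may present

    def delay_before_present(self, now: float, loading: bool) -> float:
        if loading:
            # Uncapped while loading, so work tied to the frame loop
            # (streaming, loading logic) runs as fast as it can.
            self.next_deadline = None
            return 0.0
        if self.next_deadline is None or now > self.next_deadline + self.frame_time:
            # First capped frame, or we fell more than a frame behind: resync.
            self.next_deadline = now
        delay = max(0.0, self.next_deadline - now)
        # Advance from the previous deadline (not from "now") for even pacing.
        self.next_deadline += self.frame_time
        return delay

    def wait(self, loading: bool) -> None:
        delay = self.delay_before_present(time.perf_counter(), loading)
        if delay > 0:
            time.sleep(delay)
```

`time.sleep` can be coarse on some platforms, so a real limiter would likely combine sleeping with a short spin before presenting.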

davis.anthony · Oct 29, 2023

Remij said:
Yeah, but that's beside the point. Game loading is often tied to framerate, and limiting the overall framerate to 30 can sometimes cause extra-long loading times. A game with a proper frame limiter would disable the limit during loading screens.

We should always advocate for games including these options and doing proper jobs of them, regardless of how easy it is to use 3rd party software. Some day that software may no longer work.

My point wasn't to excuse the developers for not including those options.

Merely to advise there's a way to do it.

Remij · Oct 29, 2023

davis.anthony said:
My point wasn't to excuse the developers for not including those options.

Merely to advise there's a way to do it.

You should know Flappy and me well enough to know we're aware RTSS exists... but it's all good.
