Digital Foundry Article Technical Discussion [2023]

In case anyone is curious - all my performance captures, from the video and here, are with Vsync off on PC, running on the Ryzen 5 3600:
(this is an RTX 2080 Ti screenshot)
Interesting that the 3070 is beating the 2080 Ti by 11% when they’re usually neck-and-neck in non-VRAM constrained scenarios.

I wonder if it has anything to do with Turing vs Ampere because the 3070 is also faster than the 2070S to a greater degree than is customary.

Thanks for the screenshots!
 

I feel it's mesh shaders being slightly more efficient on the newer architecture.
 
Amplification shader = task shader, according to Timur's blog:

“First things first. Under the hood, task shaders are compiled to a plain old compute shader”

I think you missed the most important thing about what makes a task shader a task shader vs. just a plain old compute shader (emphasis mine):

The relevant details here are that most of the hard work is implemented in the firmware (good news, because that means I don’t have to implement it), and that task shaders are executed on an async compute queue and that the driver now has to submit compute and graphics work in parallel.

Keep in mind that the API hides this detail and pretends that the mesh shading pipeline is just another graphics pipeline that the application can submit to a graphics queue. So, once again we have a mismatch between the API programming model and what the HW actually does.
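To make the API side of that concrete, here is a rough sketch of what the application sees with VK_EXT_mesh_shader (my own illustration, not code from the blog or from any game; the helper names and all the omitted pipeline state are mine): a task + mesh + fragment pipeline is created and drawn like any other graphics pipeline, on the graphics queue, with nothing telling the application that the task stage may actually run as compute underneath.

// Sketch only: the application-side view of VK_EXT_mesh_shader.
// Assumes a device created with the taskShader/meshShader features enabled,
// prebuilt VkShaderModule objects and an existing layout/render pass.
// Error handling and most of the required pipeline state are omitted.
#include <vulkan/vulkan.h>
#include <array>

VkPipeline createMeshShadingPipeline(VkDevice device,
                                     VkShaderModule taskModule,   // "amplification" stage
                                     VkShaderModule meshModule,
                                     VkShaderModule fragModule,
                                     VkPipelineLayout layout,
                                     VkRenderPass renderPass)
{
    std::array<VkPipelineShaderStageCreateInfo, 3> stages{};
    stages[0] = {VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, nullptr, 0,
                 VK_SHADER_STAGE_TASK_BIT_EXT, taskModule, "main", nullptr};
    stages[1] = {VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, nullptr, 0,
                 VK_SHADER_STAGE_MESH_BIT_EXT, meshModule, "main", nullptr};
    stages[2] = {VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO, nullptr, 0,
                 VK_SHADER_STAGE_FRAGMENT_BIT, fragModule, "main", nullptr};

    // No vertex input or input assembly state: the task/mesh stages replace them.
    // A real pipeline still needs rasterization, viewport, depth/blend state, etc.
    VkGraphicsPipelineCreateInfo info{VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO};
    info.stageCount = static_cast<uint32_t>(stages.size());
    info.pStages    = stages.data();
    info.layout     = layout;
    info.renderPass = renderPass;

    VkPipeline pipeline = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(device, VK_NULL_HANDLE, 1, &info, nullptr, &pipeline);
    return pipeline;
}

// From the app's point of view this is an ordinary graphics-queue draw, even if
// the driver ends up running the task stage on an async compute queue.
// (vkCmdDrawMeshTasksEXT is an extension entry point; load it via
// vkGetDeviceProcAddr or a loader such as volk in a real application.)
void recordDraw(VkCommandBuffer cmd, VkPipeline pipeline, uint32_t groupCountX)
{
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
    vkCmdDrawMeshTasksEXT(cmd, groupCountX, 1, 1);
}

Where that task stage actually executes is entirely the driver's business, which is exactly the mismatch the quote is describing.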

Squeezing a hidden compute pipeline in your graphics

In order to use this beautiful scheme provided by the firmware, the driver needs to do two things:

  • Create a compute pipeline from the task shader.
  • Submit the task shader work on the async compute queue while at the same time also submitting the mesh and pixel shader work on the graphics queue.
We already had good support for compute pipelines in RADV (as much as the API needs), but internally in the driver we’ve never had this kind of close cooperation between graphics and compute.

When you use a draw call in a command buffer with a pipeline that has a task shader, RADV must create a hidden, internal compute command buffer. This internal compute command buffer contains the task shader dispatch packet, while the graphics command buffer contains the packet that dispatches the mesh shaders. We must also ensure correct synchronization between these two command buffers according to application barriers ― because of the API mismatch it must work as if the internal compute cmdbuf was part of the graphics cmdbuf. We also need to emit the same descriptors and push constants, etc. When the application submits the graphics queue, this new, internal compute command buffer is then submitted to the async compute queue.
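To picture that mechanism, here is some purely illustrative pseudocode (every type and function below is invented for the sketch; it is not actual RADV source): on a mesh-task draw, the driver drops the task dispatch into a hidden compute command buffer and the mesh dispatch into the app's graphics command buffer, mirroring the bound state and the barriers into both.

// Purely illustrative pseudocode of the mechanism described above; every
// type and function here is invented for the sketch and is NOT RADV code.
#include <cstdint>

struct CmdBuffer;                  // records packets for one hardware queue
struct DrawState;                  // bound descriptors, push constants, pipelines
struct Barrier;                    // an application-visible pipeline barrier

// Hypothetical low-level helpers (declarations only, bodies omitted).
CmdBuffer* allocateComputeCmdBuffer();
void emitState(CmdBuffer*, const DrawState&);
void emitTaskDispatchPacket(CmdBuffer*, uint32_t x, uint32_t y, uint32_t z);
void emitMeshDispatchPacket(CmdBuffer*);
void emitBarrier(CmdBuffer*, const Barrier&);

struct GraphicsCmdBuffer {
    CmdBuffer* gfx = nullptr;              // the command buffer the app sees
    CmdBuffer* internalCompute = nullptr;  // hidden buffer for the task (compute) work
};

void drawMeshTasks(GraphicsCmdBuffer& cb, const DrawState& state,
                   uint32_t x, uint32_t y, uint32_t z)
{
    // The hidden compute command buffer is created on demand, the first time
    // a pipeline with a task shader is drawn.
    if (!cb.internalCompute)
        cb.internalCompute = allocateComputeCmdBuffer();

    // The same descriptors and push constants must be visible to both halves.
    emitState(cb.gfx, state);
    emitState(cb.internalCompute, state);

    // Task stage: a dispatch packet in the hidden compute command buffer.
    emitTaskDispatchPacket(cb.internalCompute, x, y, z);
    // Mesh (+ pixel) stages: a packet in the graphics command buffer that
    // consumes the payload produced by the task shader.
    emitMeshDispatchPacket(cb.gfx);
}

// Application barriers must behave as if the hidden compute buffer were part
// of the graphics one, so they are mirrored into it as well.
void cmdPipelineBarrier(GraphicsCmdBuffer& cb, const Barrier& barrier)
{
    emitBarrier(cb.gfx, barrier);
    if (cb.internalCompute)
        emitBarrier(cb.internalCompute, barrier);
}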

Thus far, this sounds pretty logical and easy.

The actual hard work is to make it possible for the driver to submit work to different queues at the same time. RADV’s queue code was written assuming that there is a 1:1 mapping between radv_queue objects and HW queues. To make task shaders work we must now break this assumption.

So, of course I had to do some crazy refactor to enable this. At the time of writing the AMDGPU Linux kernel driver doesn’t support “gang submit” yet, so I use scheduled dependencies instead. This has the drawback of submitting to the two queues sequentially rather than doing everything in the same submit.
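As a loose application-level analogy for that "two queues, one scheduled dependency" situation (my own sketch using standard Vulkan timeline semaphores, not what RADV does internally; in the real driver the two halves actually need to run in parallel, with the firmware handling the fine-grained task-to-mesh synchronization): the work goes to the hardware as two separate submits with an ordering constraint between them, which is what gang submit would collapse into a single submission.

// Sketch of the generic "two queues, one scheduled dependency" pattern at the
// Vulkan API level. Analogy only; not RADV internals.
#include <vulkan/vulkan.h>

void submitComputeThenGraphics(VkQueue computeQueue, VkQueue graphicsQueue,
                               VkCommandBuffer computeCmd, VkCommandBuffer graphicsCmd,
                               VkSemaphore timeline /* created as VK_SEMAPHORE_TYPE_TIMELINE */)
{
    const uint64_t signalValue = 1;

    // 1) Async compute submit: signals the timeline semaphore when done.
    VkTimelineSemaphoreSubmitInfo computeTimeline{VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO};
    computeTimeline.signalSemaphoreValueCount = 1;
    computeTimeline.pSignalSemaphoreValues    = &signalValue;

    VkSubmitInfo computeSubmit{VK_STRUCTURE_TYPE_SUBMIT_INFO};
    computeSubmit.pNext                = &computeTimeline;
    computeSubmit.commandBufferCount   = 1;
    computeSubmit.pCommandBuffers      = &computeCmd;
    computeSubmit.signalSemaphoreCount = 1;
    computeSubmit.pSignalSemaphores    = &timeline;
    vkQueueSubmit(computeQueue, 1, &computeSubmit, VK_NULL_HANDLE);

    // 2) Graphics submit, issued separately but scheduled after the compute work.
    VkTimelineSemaphoreSubmitInfo gfxTimeline{VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO};
    gfxTimeline.waitSemaphoreValueCount = 1;
    gfxTimeline.pWaitSemaphoreValues    = &signalValue;

    const VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT;
    VkSubmitInfo gfxSubmit{VK_STRUCTURE_TYPE_SUBMIT_INFO};
    gfxSubmit.pNext                = &gfxTimeline;
    gfxSubmit.waitSemaphoreCount   = 1;
    gfxSubmit.pWaitSemaphores      = &timeline;
    gfxSubmit.pWaitDstStageMask    = &waitStage;
    gfxSubmit.commandBufferCount   = 1;
    gfxSubmit.pCommandBuffers      = &graphicsCmd;
    vkQueueSubmit(graphicsQueue, 1, &gfxSubmit, VK_NULL_HANDLE);
}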

So, as per Lurkmass, he is correct. On AMD, the most that could be done, at least with the documentation available as of RDNA 2, was to emulate the task shader at the driver level (though I'm unsure how it's implemented in DX vs. Vulkan). It's nearly there. But you can see the issue for a developer trying to emulate task shaders on PS5. It's entirely possible Sony have done something similar to what was done here; if they haven't, they may have their own homebrew version of this (which would have more options, imo, if it does exist); and if they don't have that either, then it would be on the developer to make the emulation happen.

Imo, major differences between how amplification shaders are implemented on DX and on PS5 are likely the culprit slowing the migration to mesh shaders.
 
In case anyone is curious - all my performance captures, from the video and here, are with Vsync off on PC, running on the Ryzen 5 3600:
(this is an RTX 2080 Ti screenshot)
Curious how the 2070S is handling the noisy artifacts compared to the console versions?
Are they still there, and is the internal resolution the same as the PS5's? How about the corresponding quality mode?
 
Really though... why does Alan Wake 2 not have any frame limiter or refresh-rate options...

They also need to change the graphics options menu so that the right side is completely open, letting you see the graphics changes as they happen, and raise the transparency a bit on the options side. That would solve the issue of not understanding how each setting impacts the visuals.

Hopefully the game gets some decent patches to fix some annoying issues like the map texture not loading in quickly when playing as Saga. It's clearly loading the entire mind space every time, and the map is something you need to look at frequently enough that it's annoying to have to wait for it.
 
Curious how the 2070S is handling the noisy artifacts compared to the console versions?
Are they still there, and is the internal resolution the same as the PS5's? How about the corresponding quality mode?

Not all is perfect on the PC side when it comes to (no) noisy artifacts. There have been a few doors (like the rear door in the diner) with a specific reflective PBR material or surface that has a visible noisy pattern when RT/PT is active. It looks like a simple fix though.

Edit: Also, has anyone experienced a weird shadow issue with the stairs while walking up the steps in the morgue after the first initial fight with Nightingale? As if RT/PT is creating an odd 3D-like shadow of the stairs while walking up.
 
Interestingly, the 2080 Ti is on par with the 3070 there, although the 3060 is still punching above its weight compared to the 2070. Higher settings may favour the 2080 Ti's higher VRAM or memory bandwidth, though (which would also help the 3060 on the VRAM capacity side).
Yeah but here it wipes the 2080 Ti by 14%.

[1920x1080 performance chart]


The 2080 Ti is also equal to the 3060 Ti whereas it's usually 10-15% faster.
 
3070 is beating the 2080 Ti by 11% when they’re usually neck-and-neck in non-VRAM constrained scenarios
Not that strange; we've observed this in Quake 2 RTX, Minecraft RTX and Portal RTX, where the 3070 is anywhere from 15% to 50% faster than the 2080 Ti. The 3070 Ti is even faster. The more you push ray tracing, either through path tracing or through piling up ray-traced effects, the more the 3070/3070 Ti comes out on top.

 

These are non-RT benchmarks though.
 

Those benchmarks are without ray tracing and only use signed distance fields for reflections. Doesn’t sound like the difference should be that large.
 
Use RTSS if you have it, as it typically does a better job than built-in frame rate limiters anyway.
Yeah, but that's beside the point. Game loading is often tied to framerate, and limiting the overall framerate to 30 can sometimes induce extra-long loading times. A game with a proper frame limiter would disable the limit during those loading screens.

We should always advocate for games including these options and doing a proper job of them, regardless of how easy it is to use third-party software. Someday that software may no longer work.
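For what it's worth, the behaviour being asked for is cheap to express. A generic sketch (names are mine; this is not code from Alan Wake 2 or any other game): the limiter only sleeps during normal gameplay and steps aside while a loading screen is active.

// Generic sketch of a frame limiter that steps aside during loading screens.
// Illustrative only; not engine code from any actual game.
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

struct FrameLimiter {
    std::chrono::nanoseconds target;          // e.g. ~33.3 ms for a 30 fps cap
    Clock::time_point lastFrame = Clock::now();

    void endOfFrame(bool isLoadingScreen)
    {
        const auto elapsed = Clock::now() - lastFrame;

        // During loading, run uncapped so frame-rate-tied streaming/loading
        // isn't slowed down by the limiter.
        if (!isLoadingScreen && elapsed < target)
            std::this_thread::sleep_for(target - elapsed);

        lastFrame = Clock::now();
    }
};

// Usage: FrameLimiter limiter{std::chrono::nanoseconds(33'333'333)};
//        ... each frame: limiter.endOfFrame(game.isLoading());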
 
My point wasn't to excuse the developers for not including those options.

Merely to advise there's a way to do it.
 