Direct3D feature levels discussion

DavidGraham · Sep 2, 2023

DegustatoR said:
Advanced API Performance: Shaders | NVIDIA Technical Blog

This post covers best practices when working with shaders on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips. Shaders play a critical…

developer.nvidia.com

These bits are very interesting:

Don’t assume that half-precision floats are always faster than full precision and the reverse.

On NVIDIA Ampere GPUs, it’s just as efficient to execute FP32 as FP16 instructions. The overhead of converting between precision formats may just end up with a net loss.

NVIDIA Turing GPUs may benefit from using FP16 math, as FP16 can be issued at twice the rate of FP32.

So NVIDIA states that Turing can actually benefit from FP16 math (it's twice the FP32 rate), but Ampere (and consequently Ada) doesn't care. Probably because of their 2xFP32 design.

Don’t use raster order view (ROV) techniques pervasively.

Guaranteeing order doesn’t come for free.

Seems ROV remains expensive even on NVIDIA hardware; I am guessing that's why the feature didn't gain widespread adoption within DX12. It's even far more expensive on AMD hardware.

DegustatoR · Sep 2, 2023

DavidGraham said:
So NVIDIA states that Turing can actually benefit from FP16 math (it's twice the FP32 rate), but Ampere (and consequently Ada) doesn't care. Probably because of their 2xFP32 design

Turing+ run FP16 shader math on tensor cores, but in the case of Ampere+ FP32 runs at the same speed, so they are about equal. Ampere+ may still get some benefit from running FP16, but I think it needs to be an async compute workload, in which case they may be capable of running it in parallel with FP32 math, unless bandwidth is the limit.

DmitryKo · Sep 8, 2023

Windows Preview SDK build 25947 (Direct3D SDK v712) includes a new feature level, D3D_FEATURE_LEVEL_1_0_GENERIC = 0x100 :-?

BTW there is also D3D_FEATURE_LEVEL_1_0_CORE = 0x1000, a feature level for compute-only devices with MCDM (Microsoft Compute Driver Model) drivers - this was probably implemented for AMD Instinct MI, but so far no Windows drivers have ever been released, and this feature level was behind an experimental feature flag named D3D12ComputeOnlyDevices, which has been removed from the SDK some time ago.
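As a side note on why 0x100 looks odd next to 0x1000: the existing feature level constants appear to pack the major version into bits 12-15 and the minor version into bits 8-11 (0xb000 for 11_0, 0xc100 for 12_1). Below is a minimal C++ sketch of that reading. The enum is a local mirror rather than the SDK header, the helper names fl_major, fl_minor and at_least are made up for illustration, and the bit layout is inferred from the values, not taken from any documentation.

```cpp
#include <cstdint>

// Local mirror of a few D3D_FEATURE_LEVEL values so the sketch builds
// without the Windows SDK. The two 1_0 values are the ones quoted in the
// post above; the others are the long-standing d3dcommon.h constants.
enum FeatureLevel : std::uint32_t {
    FL_1_0_GENERIC = 0x100,
    FL_1_0_CORE    = 0x1000,
    FL_11_0        = 0xb000,
    FL_12_1        = 0xc100,
    FL_12_2        = 0xc200
};

// Assumption: major version in bits 12-15, minor version in bits 8-11.
constexpr std::uint32_t fl_major(std::uint32_t level) { return (level >> 12) & 0xF; }
constexpr std::uint32_t fl_minor(std::uint32_t level) { return (level >> 8) & 0xF; }

// Ranks two levels by raw numeric value (hypothetical helper).
constexpr bool at_least(std::uint32_t have, std::uint32_t want) { return have >= want; }
```

Under that reading, 0x1000 decodes as 1.0 while 0x100 decodes as 0.1, so GENERIC compares lower than CORE even though both are named 1_0.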

Pinstripe · Sep 21, 2023

I'm getting a crash error message "The D3D12 SDK-Version configuration is invalid" (or something like that) using Adrenalin 23.9.2.

Am I doing something wrong or does the tool need updating?

DmitryKo · Sep 21, 2023

You need to download the Agility SDK 1.711.3 Preview package, extract the D3D12Core.DLL, and enable the Developer mode, as detailed in the forum post.

Pinstripe · Sep 21, 2023

Ah, that's the one! Thanks.

I took the old stable one, hence the error.

DmitryKo · Sep 21, 2023

So the official AMD Adrenalin driver 23.9.2 (23.20.11.01), driver store version 31.0.22011.1008, comes with support for Work Graphs and WaveMMA in Agility SDK 1.711.3 Preview, which was previously released in a beta Adrenalin driver 23.10.01.14 from June 2023:

Article Number: RN-RAD-WIN-23-9-2

Additional SDK Support

Microsoft® Agility SDK Preview Release v1.711.3 including Shader Model 6.8 functionality for Work Graphs, Wave Matrix Multiply-Accumulate and AV1 Encode.

Microsoft® Agility SDK Retail Release 1.610.5 including Enhanced Barriers and Vulkan on DX12 compatibility features.

That would typically mean the design is mostly finished and no major changes to the Work Graphs specification are expected.

If you look at the Change Log section, the current version 0.44 from September 2023 includes Work Graphs Tier 1.0 feature option, D3D12_WORK_GRAPHS_TIER_1_0 = 10, to prepare for the "first full work graphs release", in addition to Tier 0.1 D3D12_WORK_GRAPHS_TIER_0_1 and D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED.
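To make the tier values concrete, here is a minimal C++ sketch of how an application might branch on them. It uses a local mirror of the enum instead of d3d12.h, the helper names are hypothetical, and only two values come from this thread: 10 for Tier 1.0 and 0 for NOT_SUPPORTED. The value 1 for Tier 0.1 is an assumption.

```cpp
// Local mirror of the experimental work graphs tier enum, so the sketch
// builds without the preview SDK headers.
enum WorkGraphsTier : int {
    WORK_GRAPHS_TIER_NOT_SUPPORTED = 0,
    WORK_GRAPHS_TIER_0_1           = 1,   // assumed value, not quoted in the thread
    WORK_GRAPHS_TIER_1_0           = 10
};

// True for any preview or final level of support (hypothetical helper).
constexpr bool has_any_work_graphs(WorkGraphsTier tier) {
    return tier != WORK_GRAPHS_TIER_NOT_SUPPORTED;
}

// True only for the "first full work graphs release" tier or anything later.
constexpr bool has_release_work_graphs(WorkGraphsTier tier) {
    return tier >= WORK_GRAPHS_TIER_1_0;
}
```

The gap between 1 and 10 would leave room for intermediate preview tiers while keeping a plain >= comparison valid, though the spec does not say that is the intent.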

The Agility SDK Preview has not been updated yet, and the whole Work Graphs functionality is still missing from the Windows Insider Preview SDK for the Gallium semester (builds 259xx).

DegustatoR · Sep 21, 2023

DmitryKo said:
D3D12_WORK_GRAPHS_TIER_0_1

This doesn't sound very promising.

Alessio1989 · Sep 21, 2023

DegustatoR said:
This doesn't sound very promising.

It's the support for the June 2023 work graphs preview;
tier 1.0 states support for the final version.

Pinstripe · Sep 22, 2023

It seems WorkGraphs and WaveMMA are only supported on RDNA3?

DegustatoR · Sep 22, 2023

Alessio1989 said:
It's the support for the June 2023 work graphs preview;
tier 1.0 states support for the final version.

Both are in the API, which means that both will be used in release, and suggests that there will be h/w with very basic support (0.1).

Pinstripe said:
It seems WorkGraphs and WaveMMA are only supported on RDNA3?

I've read that current insider build driver supports WG on RTX cards but didn't check.

Pinstripe · Sep 22, 2023

DegustatoR said:
I've read that current insider build driver supports WG on RTX cards but didn't check.

Well, it doesn't seem to be supported on my RDNA2 card:

Experimental.WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED (0)

WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)

DmitryKo · Sep 22, 2023

DegustatoR said:
Both are in the API, which means that both will be used in release, and suggests that there will be h/w with very basic support (0.1).

It's probably tied to specific user-mode display driver interface details, such as DDI callbacks and data structures in d3dumddi.h, rather than hardware limitations which are not defined anywhere in the specs (besides a few notes in the introductory section). BTW Tier 1.0 was first introduced back in June 2021, though the changelog doesn't specify why or when it was removed and replaced by 0.1.

Pinstripe said:
It seems WorkGraphs and WaveMMA are only supported on RDNA3?

Yes, these are for RDNA3 cards only - this was explicitly stated in the release notes for the beta Adrenalin drivers, as well as the requirements section of the DirectX Blog announcement post and GPUOpen blog posts (Work Graphs, Wave MMA):

Article Number: RN-RAD-MS-AGILITY-SDK-2023-6-711

Highlights
Support for:

Microsoft® Agility SDK Preview Release v1.711.3 including Shader Model 6.8 functionality for GPU Work Graphs (GWG).

Microsoft® Agility SDK Preview Release 1.710.0 including GPU Upload Heaps functionality.

GPU Work Graphs (GWG)

GPU Work Graphs (GWG) or Work Graphs allow the GPU to schedule and control its own work generation without requiring a round trip back to the CPU and the overhead involved with additional dispatches while simplifying typical GPU programming paradigms on Radeon™ RX 7000 series graphics cards.

See additional details, and how-to here

GPU Upload Heaps

Driver support to allow shared access of the GPU’s VRAM by both the CPU and GPU using the VRAM Resizable Base Address Register (REBAR). See Agility SDK 1.710.0 for additional details and downloads usable with this driver.

WaveMMA

New HLSL intrinsics added in Shader Model 6.8 allow applications and shader developers direct access to high-speed hardware-based Wave Matrix-Multiply-Accumulate (WaveMMA) operations on Radeon™ RX 7000 series graphics cards.

See an introduction, sample shaders and a detailed explanation of how WaveMMA works here.

Unfortunately these explanatory sections were omitted from the release notes for the official Adrenalin driver 23.9.2 (quoted in my post above).

Other features like Enhanced Barriers and GPU Upload Heaps (aka Resizable BAR) do work on earlier cards.

Alessio1989 · Sep 23, 2023

DegustatoR said:
Both are in the API, which means that both will be used in release, and suggests that there will be h/w with very basic support (0.1).

I've read that current insider build driver supports WG on RTX cards but didn't check.

Well, it's the Agility SDK; it's a totally different story if Microsoft adds both tiers in a non-Agility runtime update.

DegustatoR · Sep 23, 2023

Alessio1989 said:
Well, it's the Agility SDK; it's a totally different story if Microsoft adds both tiers in a non-Agility runtime update.

Why would that be a different story? "Agility SDK" is little more than the DX API distributed with games which are using the API. It is set up in a way where the loader DLL would pass the calls to either local or system API DLL(s) depending on their version. Which means that whatever will be there in "Agility SDK" will eventually be shipped with Windows, possibly as an updated version of what was shipped previously with some games. So it will have to support the exact same calls and features as the "Agility SDK" does.
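The loader behaviour described above can be modelled in a few lines of C++. This is only a simplified sketch, not the actual d3d12.dll logic: versions are reduced to plain integers (for example 610 for Agility SDK 1.610.5 and 711 for 1.711.3), the names CoreSource and choose_core are hypothetical, and resolving a tie in favour of the system copy is an assumption.

```cpp
// Which copy of D3D12Core the loader would hand the calls to.
enum class CoreSource { System, AppLocal };

// Simplified model: use the copy shipped with the application only when it
// is newer than the one installed with Windows; otherwise use the system one.
// Assumption: equal versions resolve to the system copy.
constexpr CoreSource choose_core(unsigned system_version, unsigned app_local_version) {
    return app_local_version > system_version ? CoreSource::AppLocal
                                              : CoreSource::System;
}
```

With a system runtime at 610 and a game shipping 711, the model picks the app-local copy; once Windows itself carries 711 or later, the same game would fall back to the system copy.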

DmitryKo · Sep 23, 2023

DegustatoR said:
it will have to support the exact same calls and features as the "Agility SDK" does
It is set up in a way where the loader DLL would pass the calls to either local or system API DLL(s) depending on their version. Which means that whatever will be there in "Agility SDK" will eventually be shipped with Windows, possibly as an updated version of what was shipped previously with some games.

Nope, they said this feature query will be changed in the release version: "When this feature goes out of experimental phase, the tier enum will move into a non-experimental OPTIONS struct"

There is no point in keeping this low-end tier when the source code will need to be changed anyway in order to support the final version.

Besides using an "experimental" naming convention for the feature query option, Work Graphs are only available through a separate "experimental" Direct3D device interface. That's on top of releasing this in an Agility SDK Preview and flagging this feature with D3D12EnableExperimentalFeatures() - any one of these would require the user to enable the Developer mode even in later Windows Insider Preview builds that supersede the Agility SDK version of d3d12core.dll.

So they basically went out of their way to prevent this preview from shipping to end users except consenting beta testers, since you don't want to tell regular customers to enable some developer feature just to run your application.

Alessio1989 · Sep 23, 2023

DegustatoR said:
Why would that be a different story? "Agility SDK" is little more than the DX API distributed with games which are using the API. It is set up in a way where the loader DLL would pass the calls to either local or system API DLL(s) depending on their version. Which means that whatever will be there in "Agility SDK" will eventually be shipped with Windows, possibly as an updated version of what was shipped previously with some games. So it will have to support the exact same calls and features as the "Agility SDK" does.

I don't remember what it was, but that's not the first time some "legacy"/dev-only tiers were added. Yes, the Agility SDK is meant to redistribute DX12 features not yet part of the client OS DirectX runtimes, but that's all. Naming conventions were never clever in Microsoft stuff; just remember VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation.
It may be a thing at the driver level or something else, and they kept that dumb name since someone is already shipping something with it and they want to avoid breaking things.
I wouldn't be surprised if they change something in the API they add to an OS update. Maybe just another tier with better naming but with the same integer/mask bit value.

DegustatoR · Oct 13, 2023

Work graphs API - compute rasterizer learning sample

Learn more about the power of work graphs API in our detailed blog, taking you step-by-step through an example which implements a scanline rasterizer.

gpuopen.com

Flappy Pannus · Oct 14, 2023

Interesting thread originating from this blog post Martin Fuller: Dynamic Resolution Scaling (DRS) Implementation Best Practice (Martin works on Microsoft’s Advanced Technology Group), talking about the issues with implementing DRS on PC. I'd like to see DRS implemented more widely on PC, but it's a tougher problem than consoles.

https://twitter.com/x/status/1712325976577941944

(For those without a logged-in Twitter account)

pjbliverpool · Oct 14, 2023

Flappy Pannus said:
Interesting thread originating from this blog post Martin Fuller: Dynamic Resolution Scaling (DRS) Implementation Best Practice (Martin works on Microsoft’s Advanced Technology Group), talking about the issues with implementing DRS on PC. I'd like to see DRS implemented more widely on PC, but it's a tougher problem than consoles.

https://twitter.com/x/status/1712325976577941944

(For those without a logged-in Twitter account)


Some very cool info in there, and it explains very nicely why we don't see DRS on PC nearly as often as on console, and why, when we do, it usually doesn't work as well.

Of course the PC has an easier time approaching the issue from the other direction via VRR, so it's a trade-off.
