Direct3D feature levels discussion

These bits are very interesting:
Don’t assume that half-precision floats are always faster than full precision and the reverse.
  • On NVIDIA Ampere GPUs, it’s just as efficient to execute FP32 as FP16 instructions. The overhead of converting between precision formats may just end up with a net loss.
  • NVIDIA Turing GPUs may benefit from using FP16 math, as FP16 can be issued at twice the rate of FP32.
So NVIDIA states that Turing can actually benefit from FP16 math (it's twice the FP32 rate), but Ampere (and consequently Ada) doesn't. Probably because of their 2xFP32 design.
Don’t use raster order view (ROV) techniques pervasively.
  • Guaranteeing order doesn’t come for free.
Seems ROV remains expensive even on NVIDIA hardware; I am guessing that's why the feature didn't gain widespread adoption within DX12. It's even far more expensive on AMD hardware.
 
Turing+ runs FP16 shader math on the tensor cores, but on Ampere+ FP32 runs at the same speed, so the two are about equal. Ampere+ may still get some benefit from FP16, but I think it needs to be an async compute workload, in which case it may be able to run in parallel with FP32 math, unless bandwidth becomes the limit.
 
Windows Preview SDK build 25947 (Direct3D SDK v712) includes a new feature level, D3D_FEATURE_LEVEL_1_0_GENERIC = 0x100 :-?

BTW there is also D3D_FEATURE_LEVEL_1_0_CORE = 0x1000, a feature level for compute-only devices with MCDM (Microsoft Compute Driver Model) drivers - this was probably implemented for AMD Instinct MI, but so far no Windows drivers have ever been released, and this feature level was behind an experimental feature flag named D3D12ComputeOnlyDevices, which was removed from the SDK some time ago.
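To make the numeric ordering of these values concrete, here is a minimal, platform-independent sketch. The enum below is a local mirror for illustration only (it is not the real d3dcommon.h header): the two 1_0 values are the ones quoted above, the others are the well-known D3D_FEATURE_LEVEL values, and `meetsMinimum` is a hypothetical helper name.

```cpp
#include <cstdint>

// Local mirror of a few D3D_FEATURE_LEVEL values, for illustration only.
// FL_1_0_GENERIC and FL_1_0_CORE use the values quoted in the posts above.
enum class FeatureLevel : std::uint32_t {
    FL_1_0_GENERIC = 0x100,
    FL_1_0_CORE    = 0x1000,
    FL_9_1         = 0x9100,
    FL_11_0        = 0xb000,
    FL_12_0        = 0xc000,
    FL_12_2        = 0xc200,
};

// Hypothetical helper: a plain numeric comparison, the same way an
// application would compare a reported level against its minimum.
constexpr bool meetsMinimum(FeatureLevel device, FeatureLevel required) {
    return static_cast<std::uint32_t>(device) >=
           static_cast<std::uint32_t>(required);
}
```

With these values, both 1_0 levels sort below every graphics feature level, so an application that asks for 11_0 or higher with a `>=` check would not accidentally accept a compute-only or generic device.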
 
I'm getting a crash error message "The D3D12 SDK-Version configuration is invalid" (or something like that) using Adrenalin 23.9.2.

Am I doing something wrong or does the tool need updating?
 
So the official AMD Adrenalin driver 23.9.2 (23.20.11.01), driver store version 31.0.22011.1008, comes with support for Work Graphs and WaveMMA in Agility SDK 1.711.3 Preview, which was previously released in a beta Adrenalin driver 23.10.01.14 from June 2023:

Article Number: RN-RAD-WIN-23-9-2

That would typically mean the design is mostly finished and no major changes to the Work Graphs specification are expected.

If you look at the Change Log section, the current version 0.44 from September 2023 includes Work Graphs Tier 1.0 feature option, D3D12_WORK_GRAPHS_TIER_1_0 = 10, to prepare for the "first full work graphs release", in addition to Tier 0.1 D3D12_WORK_GRAPHS_TIER_0_1 and D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED.
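A small sketch of why the gap between the tier values matters for application code. The enum is a local stand-in, not the SDK header: Tier 1.0 = 10 is the value quoted from the change log, while 0 and 1 for the other two entries are assumptions here, and `hasAtLeast` is a hypothetical helper.

```cpp
// Local stand-in for the work graphs tier enum, for illustration only.
enum class WorkGraphsTier : int {
    NotSupported = 0,   // assumed value
    Tier0_1      = 1,   // assumed value for the preview tier
    Tier1_0      = 10,  // value quoted from the spec change log
};

// Hypothetical helper: compare tiers by ordering rather than equality,
// so intermediate or later tiers keep working without code changes.
constexpr bool hasAtLeast(WorkGraphsTier reported, WorkGraphsTier required) {
    return static_cast<int>(reported) >= static_cast<int>(required);
}
```

Checking with `>=` instead of `==` means code written against the preview tier would still pass on a driver that reports the full tier, at least as far as the tier check itself goes.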

The Agility SDK Preview has not been updated yet, and the whole Work Graphs functionality is still missing from Windows Insider Preview SDK for the Gallium semester (builds 259xx).
 
Both are in the API which means that both will be used in release and suggest that there will be h/w with very basic support (0.1).

It's probably tied to specific user-mode display driver interface details, such as DDI callbacks and data structures in d3dumddi.h, rather than hardware limitations which are not defined anywhere in the specs (besides a few notes in the introductory section). BTW Tier 1.0 was first introduced back in June 2021, though the changelog doesn't specify why or when it was removed and replaced by 0.1.

It seems WorkGraphs and WaveMMA are only supported on RDNA3?

Yes, these are for RDNA3 cards only - this was explicitly stated in the release notes for the beta Adrenalin drivers, as well as the requirements section of the DirectX Blog announcement post and GPUOpen blog posts (Work Graphs, Wave MMA):

Article Number: RN-RAD-MS-AGILITY-SDK-2023-6-711

Highlights

Support for:

GPU Work Graphs (GWG)

  • GPU Work Graphs (GWG) or Work Graphs allow the GPU to schedule and control its own work generation without requiring a round trip back to the CPU and the overhead involved with additional dispatches while simplifying typical GPU programming paradigms on Radeon™ RX 7000 series graphics cards.
  • See additional details, and how-to here

GPU Upload Heaps

  • Driver support to allow shared access of the GPU’s VRAM by both the CPU and GPU using the VRAM Resizable Base Address Register (REBAR). See Agility SDK 1.710.0 for additional details and downloads usable with this driver.

WaveMMA

  • New HLSL intrinsics added in Shader Model 6.8 allow applications and shader developers direct access to high-speed hardware-based Wave Matrix-Multiply-Accumulate (WaveMMA) operations on Radeon™ RX 7000 series graphics cards.
  • See an introduction, sample shaders and a detailed explanation of how WaveMMA works here.

Unfortunately these explanatory sections were omitted from the release notes for the official Adrenalin driver 23.9.2 (quoted in my post above).

Other features like Enhanced Barriers and GPU Upload Heaps (aka Resizable BAR) do work on earlier cards.
 


I've read that current insider build driver supports WG on RTX cards but didn't check.
Well, it's the Agility SDK; it's a totally different story if Microsoft adds both tiers in a non-Agility runtime update.
 
Why would that be a different story? "Agility SDK" is little more than the DX API distributed with games which are using the API. It is set up in a way where the loader DLL would pass the calls to either local or system API DLL(s) depending on their version. Which means that whatever will be there in "Agility SDK" will eventually be shipped with Windows, possibly as an updated version of what was shipped previously with some games. So it will have to support the exact same calls and features as the "Agility SDK" does.
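The selection described above can be modelled roughly as follows. This is a simplified sketch of the idea only, not the actual loader: the struct, the enum and `chooseCore` are hypothetical names, and the only real-world detail assumed is that an application opts in by exporting a `D3D12SDKVersion` value next to its bundled D3D12Core.dll.

```cpp
#include <cstdint>

// Which copy of the D3D12 core runtime ends up being used.
enum class CoreSource { System, AppLocal };

// Hypothetical inputs to the decision.
struct LoaderInputs {
    std::uint32_t systemSdkVersion;  // version of the runtime shipped with the OS
    std::uint32_t appSdkVersion;     // version the app bundles; 0 = app bundles nothing
};

// Simplified model: the app-local runtime is used only when the app
// bundles one and it is newer than what the OS already has.
constexpr CoreSource chooseCore(const LoaderInputs& in) {
    return (in.appSdkVersion != 0 && in.appSdkVersion > in.systemSdkVersion)
               ? CoreSource::AppLocal
               : CoreSource::System;
}
```

Under this model, once Windows ships a runtime at least as new as the bundled one, the system copy takes over, which is why the system runtime has to keep accepting the same calls the game was built against.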
 


Nope, they said this feature query will be changed in the release version: "When this feature goes out of experimental phase, the tier enum will move into a non-experimental OPTIONS struct"

There is no point in keeping this low-end tier when the source code will need to be changed anyway in order to support the final version.


Besides using an "experimental" naming convention for the feature query option, Work Graphs are only available through a separate "experimental" Direct3D device interface. That's on top of releasing this in an Agility SDK Preview and flagging this feature with D3D12EnableExperimentalFeatures() - any one of these would require the user to enable the Developer mode even in later Windows Insider Preview builds that supersede the Agility SDK version of d3d12core.dll.

So they basically went out of their way to prevent this preview from shipping to end users except consenting beta testers, since you don't want to tell regular customers to enable some developer feature just to run your application.
 
I don't remember what it was, but that's not the first time some "legacy"/dev-only tiers were added. Yes, the Agility SDK is meant to redistribute DX12 features that are not yet part of the client OS DirectX runtimes, but that's all. Naming conventions were never clever in Microsoft stuff; just remember VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation.
It may be a thing at the driver level or something else, and they kept that dumb name since someone is already shipping something with it and they want to avoid breaking things.
I wouldn't be surprised if they change something in the API when they add it to an OS update. Maybe just another tier with better naming but with the same integer/mask bit value.
 
Interesting thread originating from this blog post Martin Fuller: Dynamic Resolution Scaling (DRS) Implementation Best Practice (Martin works on Microsoft’s Advanced Technology Group), talking about the issues with implementing DRS on PC. I'd like to see DRS implemented more widely on PC, but it's a tougher problem than consoles.


(For those without a logged-in Twitter account)

View attachment 9802

Some very cool info in there and explains very nicely why we don't see DRS on PC nearly as often as console, and when we do, it usually doesn't work as well.

Of course the PC has an easier time approaching the issue from the other direction via VRR, so it's a trade-off.
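For anyone unfamiliar with what a DRS controller does at its simplest, here is a generic sketch. It is not taken from Martin Fuller's post; it is a common textbook-style heuristic that assumes GPU cost scales with pixel count and that a reliable GPU frame time is available, and `DrsConfig` and `nextResolutionScale` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tuning parameters for the controller.
struct DrsConfig {
    double targetMs;  // GPU frame-time budget, e.g. 16.6 for 60 fps
    double minScale;  // lowest allowed per-axis resolution scale
    double maxScale;  // highest allowed per-axis resolution scale
};

// Pick the next per-axis scale from the last measured GPU time.
// Pixel count grows with scale squared, hence the square root.
inline double nextResolutionScale(double currentScale, double gpuFrameMs,
                                  const DrsConfig& cfg) {
    if (gpuFrameMs <= 0.0) {
        return currentScale;  // no usable measurement, keep the current scale
    }
    const double ideal = currentScale * std::sqrt(cfg.targetMs / gpuFrameMs);
    return std::min(std::max(ideal, cfg.minScale), cfg.maxScale);
}
```

For example, a frame that took twice the budget at full resolution would drop the per-axis scale to roughly 0.71. The arithmetic is the easy part; getting a trustworthy `gpuFrameMs` is presumably where PC gets harder than a fixed console.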
 