Direct3D feature levels discussion

It's tier 1_1, not 2_0
Technically we don't know what tier it will end up being. 1_1 is just a tier for features that are in development right now; these may land in 1_0 at release, 1_1 may ship as-is, or they may create a 2_0 for them.
It will depend on whether there's any point in making such tiers for the available h/w, and as of right now I don't think there is: all h/w supporting the current 1_0 supports mesh shaders, so it should in theory support mesh leaf nodes fine.
 
any point in making such tiers for the available h/w... I don't think that there is? All h/w supporting current 1_0 support mesh shaders

Feature options differentiate both hardware capabilities and the capabilities of the Direct3D 12 runtime shipped in different releases of Windows.

For example, although raytracing tier 1_1 and shader model 6_5 are supported by all tier 1_0 hardware, previous Direct3D runtime versions only supported tier 1_0 and shader model 6_3 - so a separate tier 1_1 has to be maintained.

Similarly, Work Graphs and shader model 6_8 will be released in Germanium, but Mesh Nodes and shader model 6_9 (and Wave MMA) are scheduled for Dilithium - so there are two separate Work Graphs tiers, one for the Direct3D runtime implementation included in each of these Windows releases.
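The interplay described above (hardware capability gated by what the installed runtime release knows about) boils down to the app seeing the lesser of the two caps. A toy C++ model of that reasoning - the enum values are hypothetical and NOT the real D3D12 headers; in real code this information surfaces through a feature-support query such as ID3D12Device::CheckFeatureSupport:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tier values mirroring the D3D12 tier-enum pattern
// (illustration only, not the real headers).
enum WorkGraphsTier { TierNotSupported = 0, Tier_1_0 = 10, Tier_1_1 = 11 };

// The tier reported to the application is capped by what the installed
// runtime release implements, even if the driver/hardware could do more.
WorkGraphsTier ReportedTier(WorkGraphsTier hardware, WorkGraphsTier runtimeMax) {
    return std::min(hardware, runtimeMax);
}
```

So tier-1_1-capable hardware on a runtime that only implements 1_0 still reports 1_0 - which is why each Windows release's runtime needs its own tier definitions.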
 
Some implementations may prefer to minimize work backpressure at the graphics transition (e.g. mesh nodes) in a graph. For example, some hardware may experience heavyweight transitions when switching between the execution of compute-based nodes and graphics leaf nodes.

Such implementations can glean some information about worst-case graph dataflow from the various maximum output declarations and grid/group size declarations that are statically declared in the graph. Beyond that, extra declarations are available here, specific to mesh nodes, to enable more accurate workload estimation given information from the application.
This section of the document likely refers to the fact that some architectures will naturally ALWAYS insert barriers between draw and dispatch commands when transitioning between them ... (AKA "subchannel switch")
A change of the graphics, compute, or copy workload type (also known as a subchannel switch) or the use of a UAV barrier on the same queue triggers a wait-for-idle (WFI). A WFI forces all warps on the same queue to be fully drained, leaving a workload gap in unit throughput and SM occupancy.

If a WFI is unavoidable and causes a large throughput gap, filling that gap with async compute could be a good solution. Common sources of barriers and WFIs are as follows:

  • Resource transitions (barriers) between draw, dispatch, or copy calls
  • UAV barriers
  • Raster state changes
  • Back-to-back mixing of draw, dispatch, and copy calls on the same queue
  • Descriptor heap changes
The cost of doing a lot of PSO switching w/ mesh nodes could become restrictive for those hardware designs due to the presence of extra barriers ...
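One mitigation the quoted guidance implies is batching submissions so that same-type work is adjacent, instead of mixing draws and dispatches back-to-back. An API-free toy C++ sketch (the record fields and names are hypothetical) that counts how many workload-type transitions - i.e. potential subchannel switches - a given submission order would incur:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of a queued submission: which PSO it uses and whether it
// is compute (dispatch) or graphics (draw) work.
struct Work { int psoId; bool isCompute; };

// Each change of workload type between adjacent submissions approximates
// one subchannel switch (and its WFI) on hardware with the behavior
// described in the quote above.
int CountTransitions(const std::vector<Work>& q) {
    int n = 0;
    for (size_t i = 1; i < q.size(); ++i)
        if (q[i].isCompute != q[i - 1].isCompute) ++n;
    return n;
}

// Stable-sort so all graphics work precedes all compute work, reducing
// back-to-back draw/dispatch mixing while keeping intra-type order.
void BatchByType(std::vector<Work>& q) {
    std::stable_sort(q.begin(), q.end(),
                     [](const Work& a, const Work& b) { return a.isCompute < b.isCompute; });
}
```

In this model an interleaved draw/dispatch/draw/dispatch queue incurs three transitions; after batching, only one. Real mesh-node graphs can't always be reordered like this, which is exactly why the extra barriers could become restrictive.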
 

Some major updates. The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated), and this decision is FINAL unless there is ever a future notice ...
 

Some major updates. The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated), and this decision is FINAL unless there is ever a future notice ...
Why on earth would they decide that? uhhh
 
Why on earth would they decide that? uhhh
Well, you should ask yourself this question: if the most compelling use case for AI in rendering so far is limited to just temporal upscaling, and there's an upcoming super resolution API to fulfill that exact need, are there any other applications out there that need explicit matrix operations exposed?
 
Well, you should ask yourself this question: if the most compelling use case for AI in rendering so far is limited to just temporal upscaling, and there's an upcoming super resolution API to fulfill that exact need, are there any other applications out there that need explicit matrix operations exposed?
The reason is for DX to be more forward-looking. They added compute shaders in DX11 back at a time when their use was limited to simple things in comparison to how they are now used. DXC supports half precision... why not matrix stuff?
 
Tbf there are no signs that the extension is "deprecated" and that "Microsoft have decided to NOT standardize Wave Matrix operations", that's pure fiction on Lurkmass's part. What this points to is them moving the WMMA into "reserved" status with no further info given. It is possible that they'll get back onto it after finalizing and launching the current scope.
 
The reason is for DX to be more forward-looking.
The Direct3D team doesn't believe that any new widespread applications will materialize for it ...
They added compute shaders in DX11 back at a time when their use was limited to simple things in comparison to how they are now used.
Compute shaders were introduced in D3D10.1. D3D11 added support for UAV resources to expose atomic operations ...
DXC supports half precision... why not matrix stuff?
Half-precision math can be used to optimize many different sets of shaders for higher throughput/lower register pressure. Can you describe what other applications there would be for matrix math hardware acceleration?
Tbf there are no signs that the extension is "deprecated" and that "Microsoft have decided to NOT standardize Wave Matrix operations", that's pure fiction on Lurkmass's part. What this points to is them moving the WMMA into "reserved" status with no further info given. It is possible that they'll get back onto it after finalizing and launching the current scope.

The comment above reiterates their latest stance to remove the feature altogether, and they've updated the documentation accordingly ... (even the tests have been removed too)
 
However, I would strongly advise against making any assumptions about the meaning of these opcodes being stable.
Not a great look, no matter how you slice it, for a feature that's been in incubation for over 2 years, since they probably don't intend to revisit it anytime soon ...
 
Not a great look, no matter how you slice it, for a feature that's been in incubation for over 2 years, since they probably don't intend to revisit it anytime soon ...
You don't know what they intend. It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely. All we can say for now is that the feature likely won't appear in SM 6.9 either.
 
You don't know what they intend. It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely. All we can say for now is that the feature likely won't appear in SM 6.9 either.
They've had 2 years (probably even longer) to consider all of your ideas that you brought up here and they scrubbed all traces of their work on Wave Matrix operations with their only HLSL compiler (DXC) ...
 
They've had 2 years (probably even longer) to consider all of your ideas that you brought up here and they scrubbed all traces of their work on Wave Matrix operations with their only HLSL compiler (DXC) ...
They've removed the feature from DXIL 1.8 scope (the reason is given in your first link) while the codes remain as reserved so they can be used in a future scope (1.9... 2.0... etc.)
 
They've removed the feature from DXIL 1.8 scope (the reason is given in your first link) while the codes remain as reserved so they can be used in a future scope (1.9... 2.0... etc.)
It's not going to be in DXIL 1.9/SM 6.9 if the release notes for "Upcoming Release" are to be believed in the documentation. Unlikely to hit DXIL 2.0 either if they are dropping all feature work in late stages of development ...

The team wants shader model 6.9 to be finalized in September for the release of their Mesh Nodes extension to Work Graphs ...
 
The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated)
DXC supports half precision... why not matrix stuff?

Well, it does look like the most recent WaveMMA spec and relevant DXIL opcodes are dead in the water - which is not surprising, considering it's been almost 5 years in the making, but its public implementation only appeared in the June 2023 Agility SDK Preview under the experimental shader model flag, only to be removed again this spring.

Both NVidia's Tensor Cores and AMD's RDNA4/RDNA5 would have capabilities that go beyond the scope of the original WaveMMA spec, i.e. SWMMA (Sparse Wave Matrix Multiply Accumulate) instructions, 8-bit floating point formats, and 16x32 matrix dimensions.


It doesn't mean the WaveMMA spec will be abandoned - though they probably need to rework it to better match current hardware.

Current DXIL opcodes may be reused by the new/updated WaveMMA specs, or they may be retired entirely and replaced by new opcodes and/or HLSL mnemonics - it's just that Microsoft has no solid plan right now, since their current focus is on the ARM64/Copilot+ enabled release of Windows Germanium 24H2 this November.
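For context on what's actually being argued over: a WaveMMA-style intrinsic computes a tile-level D = A×B + C, with the tile distributed across the lanes of a wave and low-precision inputs typically accumulated at higher precision. A scalar C++ reference of the semantics (a sketch only; the real HLSL intrinsics, data types, and supported tile dimensions are defined by the spec and differ per hardware):

```cpp
#include <array>
#include <cassert>

// Scalar reference for a wave-matrix multiply-accumulate: D = A*B + C,
// with A of shape MxK, B of shape KxN, and C/D of shape MxN (row-major).
// Real hardware spreads the tile across wave lanes and usually takes
// fp16/fp8 inputs with fp32 accumulation.
template <int M, int N, int K>
std::array<float, M * N> Mma(const std::array<float, M * K>& a,
                             const std::array<float, K * N>& b,
                             const std::array<float, M * N>& c) {
    std::array<float, M * N> d = c;  // start from the accumulator tile
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                d[i * N + j] += a[i * K + k] * b[k * N + j];
    return d;
}
```

The spec debate is over which (M, N, K) tile shapes and element formats get standardized - e.g. the 16x32 dimensions and fp8 formats mentioned above - not over this basic operation.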

It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely.
Yes, exactly. WaveMMA and Work Graphs had both been about 3 years in the making when they were released in the June 2023 preview Agility SDK - but WaveMMA seems to have attracted a lot less attention from developers, if you look at the sheer number of revisions to the Work Graphs spec leading to v0.43 and beyond, and compare it to the WaveMMA change log...
 
@DmitryKo If WaveMMA is on the back burner, then it is deservedly so, as there are no other proven major applications for it in rendering ...

Sure there's always going to be more advanced HW feature implementations in the future but that doesn't really serve as a very good reason to block the initial release of basic sets of functionality if they work on existing hardware. If we need to look at enhancing a feature then that's what we invented the concept of 'extensions' for ...
 
WaveMMA has never been intended for rendering applications, both NVidia and AMD advertised it for custom ML/AI and compute workloads, as opposed to a predefined set of 'fixed-function' Direct3D/DirectML metacommands implemented by the graphics card driver.


(NVidia Research did publish a paper on 'Neural Appearance Rendering', which should utilize PyTorch/TensorFlow models and HLSL/GLSL with WaveMMA instructions to generate neural textures in realtime).


Maybe this particular HLSL implementation wasn't well thought-out for seamless expansion, and sometimes it's better to start over from scratch rather than put patches on an existing design. It doesn't mean current hardware won't be supported by the new/updated specs, only that HLSL mnemonics/data types, DXIL opcodes and relevant Direct3D APIs would be organized a bit differently.
 
BTW, part of the reason why WaveMMA is postponed could be that Microsoft's DXC HLSL compiler is finally moving from a 10-year-old proprietary fork of Clang with LLVM 3.7 bytecode - which became impossible to update to current C++11/14/17 etc. standards - to an official 'HLSL' front-end in Clang and a new 'DirectX' back-end in LLVM, to support the upcoming C++ language features in the HLSL 202x / 202y specs.




Implementing the Unimplementable: Bringing HLSL's Standard Library into Clang
https://llvm.org/devmtg/2022-11/slides/TechTalk1-Implementing-The-Unimplementable-HLSL.pdf
 