Direct3D feature levels discussion

It's tier 1_1, not 2_0
Technically we don't know what tier it will end up being. 1_1 is just a tier for features that are in development right now; these may land in 1_0 at release, 1_1 may ship as-is, or they may create a 2_0 for them.
It will depend on whether there's any point in making such tiers for the available h/w, and as of right now I don't think there is: all h/w supporting the current 1_0 supports mesh shaders, so it should in theory support mesh leaf nodes fine.
 
any point in making such tiers for the available h/w... I don't think that there is? All h/w supporting current 1_0 support mesh shaders

Feature options differentiate both hardware capabilities and the capabilities of the Direct3D 12 runtime shipped in different releases of Windows.

For example, although raytracing tier 1_1 and shader model 6_5 are supported by all tier 1_0 hardware, previous Direct3D runtime versions only supported tier 1_0 and shader model 6_3 - so a separate tier 1_1 has to be maintained.

Similarly, Work Graphs and shader model 6_8 will be released in Germanium, but Mesh Nodes and shader model 6_9 (and Wave MMA) are scheduled for Dilithium - so there are two separate Work Graphs tiers, one for the Direct3D runtime implementation included in each of these Windows releases.
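The interplay described above (hardware capability gated by what the installed runtime release knows about) boils down to the app seeing the lesser of the two caps. A toy C++ model of that reasoning - the enum values are hypothetical and NOT the real D3D12 headers; in real code this information surfaces through a feature-support query such as ID3D12Device::CheckFeatureSupport:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tier values mirroring the D3D12 tier-enum pattern
// (illustration only, not the real headers).
enum WorkGraphsTier { TierNotSupported = 0, Tier_1_0 = 10, Tier_1_1 = 11 };

// The tier reported to the application is capped by what the installed
// runtime release implements, even if the driver/hardware could do more.
WorkGraphsTier ReportedTier(WorkGraphsTier hardware, WorkGraphsTier runtimeMax) {
    return std::min(hardware, runtimeMax);
}
```

So tier-1_1-capable hardware on a runtime that only implements 1_0 still reports 1_0 - which is why each Windows release's runtime needs its own tier definitions.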
 
Some implementations may prefer to minimize work backpressure at the graphics transition (e.g. mesh nodes) in a graph. For example, some hardware may experience heavyweight transitions when switching between the execution of compute-based nodes and graphics leaf nodes.

Such implementations can glean some information about worst-case graph dataflow from the various maximum output declarations and grid/group size declarations that are statically declared in the graph. Beyond that, extra declarations are available here, specific to mesh nodes, to enable more accurate workload estimation given information from the application.
This section of the document likely refers to the fact that some architectures will naturally ALWAYS insert barriers between draw and dispatch commands when transitioning between them ... (AKA "subchannel switch")
A change of the graphics, compute, or copy workload type (also known as a subchannel switch) or the use of a UAV barrier on the same queue triggers a wait-for-idle (WFI). A WFI forces all warps on the same queue to be fully drained, leaving a workload gap in unit throughput and SM occupancy.

If a WFI is unavoidable and causes a large throughput gap, filling that gap with async compute could be a good solution. Common sources of barriers and WFIs are as follows:

  • Resource transitions (barriers) between draw, dispatch, or copy calls
  • UAV barriers
  • Raster state changes
  • Back-to-back mixing of draw, dispatch, and copy calls on the same queue
  • Descriptor heap changes
The cost of doing a lot of PSO switching w/ mesh nodes could become restrictive for those hardware designs due to the presence of extra barriers ...
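One mitigation the quoted guidance implies is batching submissions so that same-type work is adjacent, instead of mixing draws and dispatches back-to-back. An API-free toy C++ sketch (the record fields and names are hypothetical) that counts how many workload-type transitions - i.e. potential subchannel switches - a given submission order would incur:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model of a queued submission: which PSO it uses and whether it
// is compute (dispatch) or graphics (draw) work.
struct Work { int psoId; bool isCompute; };

// Each change of workload type between adjacent submissions approximates
// one subchannel switch (and its WFI) on hardware with the behavior
// described in the quote above.
int CountTransitions(const std::vector<Work>& q) {
    int n = 0;
    for (size_t i = 1; i < q.size(); ++i)
        if (q[i].isCompute != q[i - 1].isCompute) ++n;
    return n;
}

// Stable-sort so all graphics work precedes all compute work, reducing
// back-to-back draw/dispatch mixing while keeping intra-type order.
void BatchByType(std::vector<Work>& q) {
    std::stable_sort(q.begin(), q.end(),
                     [](const Work& a, const Work& b) { return a.isCompute < b.isCompute; });
}
```

In this model an interleaved draw/dispatch/draw/dispatch queue incurs three transitions; after batching, only one. Real mesh-node graphs can't always be reordered like this, which is exactly why the extra barriers could become restrictive.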
 

Some major updates. The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated), and this decision is FINAL unless there is ever a future notice ...
 

Some major updates. The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated), and this decision is FINAL unless there is ever a future notice ...
Why on earth would they decide that? uhhh
 
Why on earth would they decide that? uhhh
Well, you should ask yourself this question: if the most compelling use case for AI in rendering so far is limited to just temporal upscaling, and there's an upcoming super resolution API to fulfill that exact need, are there any other applications out there that need explicit matrix operations exposed?
 
Well, you should ask yourself this question: if the most compelling use case for AI in rendering so far is limited to just temporal upscaling, and there's an upcoming super resolution API to fulfill that exact need, are there any other applications out there that need explicit matrix operations exposed?
The reason is for DX to be more forward-looking. They added compute shaders in DX11 back at a time when their use was limited to simple things in comparison to how they are now used. DXC supports half precision... why not matrix stuff?
 
Tbf there are no signs that the extension is "deprecated" and that "Microsoft have decided to NOT standardize Wave Matrix operations", that's pure fiction on Lurkmass's part. What this points to is them moving the WMMA into "reserved" status with no further info given. It is possible that they'll get back onto it after finalizing and launching the current scope.
 
The reason is for DX to be more forward-looking.
The Direct3D team doesn't believe that any new widespread applications will materialize for it ...
They added compute shaders in DX11 back at a time when their use was limited to simple things in comparison to how they are now used.
Compute shaders were introduced in D3D10.1. D3D11 added support for UAV resources to expose atomic operations ...
DXC supports half precision... why not matrix stuff?
Half-precision math can be used to optimize many different sets of shaders for higher throughput/lower register pressure. Can you describe what other applications there would be for matrix math hardware acceleration?
Tbf there are no signs that the extension is "deprecated" and that "Microsoft have decided to NOT standardize Wave Matrix operations", that's pure fiction on Lurkmass's part. What this points to is them moving the WMMA into "reserved" status with no further info given. It is possible that they'll get back onto it after finalizing and launching the current scope.

The comment above reiterates their latest stance to remove the feature altogether, and they've updated the documentation accordingly ... (even the tests have been removed too)
 
However, I would strongly advise against making any assumptions about the meaning of these opcodes being stable.
Not a great look, no matter how you slice it, for a feature that's been in incubation for over 2 years, since they probably don't intend to revisit it anytime soon ...
 
Not a great look, no matter how you slice it, for a feature that's been in incubation for over 2 years, since they probably don't intend to revisit it anytime soon ...
You don't know what they intend. It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely. All we can say for now is that the feature likely won't appear in SM 6.9 either.
 
You don't know what they intend. It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely. All we can say for now is that the feature likely won't appear in SM 6.9 either.
They've had 2 years (probably even longer) to consider all of your ideas that you brought up here and they scrubbed all traces of their work on Wave Matrix operations with their only HLSL compiler (DXC) ...
 
They've had 2 years (probably even longer) to consider all of your ideas that you brought up here and they scrubbed all traces of their work on Wave Matrix operations with their only HLSL compiler (DXC) ...
They've removed the feature from DXIL 1.8 scope (the reason is given in your first link) while the codes remain as reserved so they can be used in a future scope (1.9... 2.0... etc.)
 
They've removed the feature from DXIL 1.8 scope (the reason is given in your first link) while the codes remain as reserved so they can be used in a future scope (1.9... 2.0... etc.)
It's not going to be in DXIL 1.9/SM 6.9 if the release notes for "Upcoming Release" are to be believed in the documentation. Unlikely to hit DXIL 2.0 either if they are dropping all feature work in late stages of development ...

The team wants shader model 6.9 to be finalized in September for the release of their Mesh Nodes extension to Work Graphs ...
 
The Direct3D team at Microsoft have decided to NOT standardize Wave Matrix operations (marked as deprecated)
DXC supports half precision... why not matrix stuff?

Well, it does look like the most recent WaveMMA spec and relevant DXIL opcodes are dead in the water - which is not surprising, considering it's been almost 5 years in the making, but its public implementation only appeared in the June 2023 Agility SDK Preview under the experimental shader model flag, only to be removed again this spring.

Both NVidia's Tensor Cores and AMD's RDNA4/RDNA5 would have capabilities that go beyond the scope of the original WaveMMA spec, i.e. SWMMA (Sparse Wave Matrix Multiply Accumulate) instructions, 8-bit floating point formats, and 16x32 matrix dimensions.


It doesn't mean the WaveMMA spec will be abandoned - though they probably need to rework it to better match current hardware.

Current DXIL opcodes may be reused by the new/updated WaveMMA specs, or they may be retired entirely and replaced by new opcodes and/or HLSL mnemonics - it's just that Microsoft has no solid plan right now, since their current focus is on the ARM64/Copilot+ enabled release of Windows Germanium 24H2 this November.
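For context on what's actually being argued over: a WaveMMA-style intrinsic computes a tile-level D = A×B + C, with the tile distributed across the lanes of a wave and low-precision inputs typically accumulated at higher precision. A scalar C++ reference of the semantics (a sketch only; the real HLSL intrinsics, data types, and supported tile dimensions are defined by the spec and differ per hardware):

```cpp
#include <array>
#include <cassert>

// Scalar reference for a wave-matrix multiply-accumulate: D = A*B + C,
// with A of shape MxK, B of shape KxN, and C/D of shape MxN (row-major).
// Real hardware spreads the tile across wave lanes and usually takes
// fp16/fp8 inputs with fp32 accumulation.
template <int M, int N, int K>
std::array<float, M * N> Mma(const std::array<float, M * K>& a,
                             const std::array<float, K * N>& b,
                             const std::array<float, M * N>& c) {
    std::array<float, M * N> d = c;  // start from the accumulator tile
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                d[i * N + j] += a[i * K + k] * b[k * N + j];
    return d;
}
```

The spec debate is over which (M, N, K) tile shapes and element formats get standardized - e.g. the 16x32 dimensions and fp8 formats mentioned above - not over this basic operation.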

It is possible that the feature is "reserved" for a future scope, it is possible that they've decided to rethink the approach, it is possible that they may want to move that out of DX completely.
Yes, exactly. WaveMMA and Work Graphs had both been about 3 years in the making when they were released in the June 2023 preview Agility SDK - but WaveMMA seems to have attracted a lot less attention from developers, if you look at the sheer number of revisions to the Work Graphs spec leading to v0.43 and beyond, and compare it to the WaveMMA change log...
 
@DmitryKo If WaveMMA is on the back burner, then it is deservedly so, as there are no other proven major applications for it in rendering ...

Sure there's always going to be more advanced HW feature implementations in the future but that doesn't really serve as a very good reason to block the initial release of basic sets of functionality if they work on existing hardware. If we need to look at enhancing a feature then that's what we invented the concept of 'extensions' for ...
 
WaveMMA has never been intended for rendering applications, both NVidia and AMD advertised it for custom ML/AI and compute workloads, as opposed to a predefined set of 'fixed-function' Direct3D/DirectML metacommands implemented by the graphics card driver.


(NVidia Research did publish a paper on 'Neural Appearance Rendering', which should utilize PyTorch/TensorFlow models and HLSL/GLSL with WaveMMA instructions to generate neural textures in realtime).


Maybe this particular HLSL implementation wasn't well thought-out for seamless expansion, and sometimes it's better to start over from scratch rather than put patches on an existing design. It doesn't mean current hardware won't be supported by the new/updated specs, only that HLSL mnemonics/data types, DXIL opcodes and relevant Direct3D APIs would be organized a bit differently.
 
BTW, part of the reason why WaveMMA is postponed could be that Microsoft's DXC HLSL compiler is finally moving from a 10-year-old proprietary fork of Clang with LLVM 3.7 bytecode - which became impossible to update to current C++11/14/17 etc. standards - to an official 'HLSL' front-end in Clang and a new 'DirectX' back-end in LLVM, to support the upcoming C++ language features in the HLSL 202x / 202y specs.




Implementing the Unimplementable: Bringing HLSL's Standard Library into Clang
https://llvm.org/devmtg/2022-11/slides/TechTalk1-Implementing-The-Unimplementable-HLSL.pdf
 