Direct3D feature levels discussion

FYI the idea of strict feature sets tied to each major release of DirectX was abandoned some 12 years ago during the development of Windows 8.


First, a bit of historical overview that kind of sums up this entire thread. The original Direct3D 10 API released in Windows Vista (codenamed Longhorn) did not use feature levels - these were first introduced with the Direct3D 10.1 update, which required additional features on top of the 10.0 requirements. The Direct3D 10 runtime also required a new user-mode Display Driver Interface (DDI), so it would not work with DX9-class cards and drivers. More than that, you couldn't even use the updated 10.1 APIs on 10.0 hardware.

This design approach wasn't very developer-friendly. The new API had been refactored and cleaned up, making it source-code incompatible with Direct3D 9 (Microsoft even considered renaming it Windows Graphics Foundation (WGF) during the development of Longhorn / Windows Vista). But since developers still had to support older Direct3D 9 hardware, they needed to maintain separate rendering front-ends for Direct3D 10 and Direct3D 9. WDDM drivers also had to provide two separate user-mode DDIs to support the Direct3D 9 and 10 runtimes and applications.


Windows 7 came with Direct3D 11, which wasn't a refactoring like Direct3D 10 before it, but rather an update and extension of the existing Direct3D 10 APIs, so it would run on existing Direct3D 10 hardware. Feature level 11_0 was added as a superset of the existing levels 10_0 and 10_1 to allow the use of new APIs and hardware features.

Then the Direct3D 11.1 runtime in Windows 8 introduced a new feature level 11_1, with a few of its features exposed as optional on level 11_0 (and 10_0/10_1), because neither NVidia Fermi nor Kepler qualified for level 11_1, supporting only 8 memory descriptors, or 'UAV slots', across all stages in Direct3D 11 speak (BTW Fermi also had other quirks which had to be worked around in drivers during the transition to WDDM 2.0 and Direct3D 12, but its driver support was removed very soon).

Direct3D 11 had also added what they called '10 Level 9', a subset of the Direct3D 10 APIs that could run on top of features provided by DX9-class hardware, with three new 'downlevel' feature levels 9_1, 9_2 and 9_3 corresponding to shader model 2.0/3.0 hardware; to make this possible, the runtime directly called DDI9 interfaces in the WDDM drivers.

The runtime also used emulation of lower feature levels, the so-called 'feature level upgrade': the WDDM driver would only implement the highest feature level it supported, instead of explicitly supporting every feature level possible with DDI10, and the runtime would automatically convert API calls and data structures.
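For illustration only - a minimal sketch, not code from anyone in this thread - this is roughly how an application lets the runtime negotiate the feature level: you pass an array ordered from highest to lowest, and the runtime creates the device on the first level the driver can satisfy.

```cpp
// Minimal sketch: ask the Direct3D 11 runtime for the highest feature level
// the installed driver supports, from 11_1 all the way down to the 9_x
// "10 Level 9" levels. Error handling omitted for brevity.
#include <d3d11.h>

HRESULT CreateDeviceOnBestLevel(ID3D11Device** device, D3D_FEATURE_LEVEL* chosen)
{
    const D3D_FEATURE_LEVEL levels[] = {
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
        D3D_FEATURE_LEVEL_10_1, D3D_FEATURE_LEVEL_10_0,
        D3D_FEATURE_LEVEL_9_3,  D3D_FEATURE_LEVEL_9_2,  D3D_FEATURE_LEVEL_9_1,
    };
    // The runtime walks the array in order and creates the device on the first
    // level the driver can handle, reporting the result through 'chosen'.
    // (On a machine with only the original D3D11 runtime, asking for 11_1
    // fails with E_INVALIDARG, so apps typically retry starting at levels[1].)
    return D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                             levels, UINT(sizeof(levels) / sizeof(levels[0])),
                             D3D11_SDK_VERSION, device, chosen, nullptr);
}
```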

Windows 8.x didn't really take off with its controversial changes to the user interface, so Windows 7 remained the most used version - unfortunately, it was stuck with feature level 11_0, because Microsoft couldn't backport WDDM 1.2 to it. Therefore developers couldn't use significant new Direct3D 11.x features like uniform memory access (unordered access views, or UAVs, and shader resource views, or SRVs) and tiled resources (virtual GPU memory paging).



When Direct3D 12 in Windows 10 refactored the Direct3D 11 API to remove almost all data processing from the Direct3D runtime, making most APIs and features just a thin layer over DDI12 calls into the user-mode driver, this required redefining the basic feature set of Direct3D 11 hardware to support the GpuMmu/IoMmu virtual memory models (which, BTW, had been in the making since the WinHEC 2006 announcement of WDDM 2.x).

As a result, the difference between levels 11_0, 11_1 and the baseline level 12_0 is small in Direct3D 12, because most recent hardware at the time was capable of a great share of the features required on level 12_0, so these were exposed as optional on the lower levels 11_0 and 11_1. And level 12_1 was primarily about requiring rasterizer-ordered views, which didn't really take off in real-world usage.
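As a rough illustration (my own sketch, not from the post above): on Direct3D 12 a game typically creates the device on a low baseline level and then queries the caps it actually needs - including the rasterizer-ordered-views bit that level 12_1 makes mandatory - via CheckFeatureSupport.

```cpp
// Minimal sketch: query the maximum feature level and a couple of the optional
// caps on an already-created ID3D12Device. Error handling omitted.
#include <d3d12.h>

void QueryLevelsAndCaps(ID3D12Device* device)
{
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_12_1, D3D_FEATURE_LEVEL_12_0,
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels = UINT(sizeof(requested) / sizeof(requested[0]));
    levels.pFeatureLevelsRequested = requested;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                &levels, sizeof(levels));
    // levels.MaxSupportedFeatureLevel now holds the highest of the four.

    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                &options, sizeof(options));
    // options.ROVsSupported is the rasterizer-ordered-views cap required by
    // level 12_1; options.ResourceBindingTier and options.TiledResourcesTier
    // cover much of what level 12_0 otherwise makes mandatory.
}
```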

So it's not nearly like it used to be during the Direct3D 7/8/9 era, when hardware vendors could skip entire parts of the rendering pipeline by setting 'capability bits' (caps), which were counted by the hundreds and made the lives of both end users and Direct3D developers quite miserable...


Feature level 12_2 from 2021 remains the most significant upgrade: it provides raytracing to improve lighting calculations, mesh shaders to improve geometry processing, and also sampler feedback and tiled resources to improve the handling of large textures, which is part of what Microsoft calls the 'Xbox Velocity Architecture' (see the Game Stack Live 2021 'Game Asset Streaming' demo).
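A short sketch (mine, with the caveat that the full 12_2 definition also pulls in things like variable rate shading tier 2 and resource binding tier 3) of how the three headline features above are probed individually:

```cpp
// Minimal sketch: check the raytracing, mesh shader and sampler feedback tiers
// that feature level 12_2 / DirectX 12 Ultimate rolls up into one label.
#include <d3d12.h>

bool HasUltimateHeadlineFeatures(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 opt5 = {};
    D3D12_FEATURE_DATA_D3D12_OPTIONS7 opt7 = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5,
                                           &opt5, sizeof(opt5))) ||
        FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7,
                                           &opt7, sizeof(opt7))))
        return false;  // older runtime/driver that doesn't know these caps

    return opt5.RaytracingTier      >= D3D12_RAYTRACING_TIER_1_1 &&
           opt7.MeshShaderTier      >= D3D12_MESH_SHADER_TIER_1  &&
           opt7.SamplerFeedbackTier >= D3D12_SAMPLER_FEEDBACK_TIER_0_9;
}
```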
 
wow Dmitryko that's a hell of a post, thanks
but it doesn't really answer my question, which I guess is sort of moot now because afaik all the cards (nv, amd and intel) support dx12_2 (which I believe is the top feature level), which is what I was arguing for
Do b3d readers still think it would be better if one or more of the cards didn't support dx12_2?
 
The key difference here, I think, is that current APIs are being continuously updated, while major version changes previously usually meant a clean rewrite with a new runtime being introduced. It may not be much of a difference from the user's perspective, but looked at this way it makes sense that we're now getting new "feature levels" instead of new major versions.
 
There's no use in introducing new features when game developers cannot make proper use of existing technologies.

Direct3D 12 already has four separate rasterization pipelines - traditional triangle-based geometry shaders, meshlet-based geometry shaders, traditional pixel shaders, and raytracing shaders. How many premium Windows games are actually using meshlet geometry and mesh shaders (introduced 5 years ago), DirectX raytracing (introduced 5 years ago), tiled resources (10 years ago), and sampler feedback with DirectStorage streaming and GDeflate decompression (4 years ago)?

In another 5 years, how many games and game engines will actually be redesigned from the ground up to use micropolygon meshlets, path-traced global illumination, and gigapixel textures with direct disk streaming and GPU decompression? How many terabytes per second of memory and cache bandwidth - and hundreds of TFLOPS of compute, rasterization, ML, and BVH traversal performance - will be needed to use these technologies in actual 4K gaming?

I really doubt any of these will become commonplace in just 5 years and 2 hardware generations; the best we can expect is that current high-end performance levels will gradually trickle down to future mid-range GPU products.

I still remember how Microsoft advertised the original Xbox from 2001 with a pre-rendered demo video 'Two to Tango' produced by Tim Miller's Blur Studio, which featured the soon-forgotten Xbox mascots Raven and Robot (or were they Tamer and Robot?); the video was advertised as representing the actual tech demo and it received ecstatic reviews, but of course, when the actual realtime demo was shown, it didn't look anything close to the pre-rendered video, and it took a dozen or more years for actual realtime scene lighting to even approach that kind of image quality...
 
Speaking of 'brute-force' performance, Microsoft is planning to release a next-gen Xbox by 2028-2029, based on an APU combining a Zen 6 CPU with RDNA5 (Navi 5x) graphics, which would include 'next-gen DirectX raytracing', 'dynamic global illumination', 'micropolygon rendering optimizations', and 'ML based super resolution'. These seem like performance enhancements to currently existing technologies rather than new hardware-based features.


According to the projected development timeline for the new Xbox, developer kits would be distributed in early 2027 - these are traditionally built from high-end PCs running Windows, and I believe desktop RDNA5 parts are also expected to arrive around late 2026 or early 2027, since mid-range desktop RDNA4 parts should come in late 2024 or early 2025.

High-end RDNA5 chips are rumored to use a crazy high number of processing blocks, on the order of 300+ CUs which should have 300+ TFLOPS of compute performance... though this particular rumor looks like pure speculation to me.
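For what it's worth, a back-of-the-envelope check of that figure (all numbers below are my own speculative assumptions, counting RDNA3-style dual-issue CUs as 128 FP32 lanes each):

```cpp
// Rough sanity check of the "300+ CUs => 300+ TFLOPS" rumor; every input here
// is an assumption, not a leaked spec.
#include <cstdio>

int main()
{
    const double cus = 300.0, lanesPerCu = 128.0, flopsPerLane = 2.0; // FMA
    const double clocksGHz[] = { 2.5, 3.0, 3.9 };
    for (double ghz : clocksGHz)
        std::printf("%.1f GHz -> %.0f TFLOPS\n",
                    ghz, cus * lanesPerCu * flopsPerLane * ghz / 1000.0);
    // Prints ~192, ~230 and ~300 TFLOPS, so the rumored number implies either
    // very high clocks or an even wider CU than RDNA3's dual-issue design.
    return 0;
}
```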
 
It seems RDNA4 includes WaveMMA (a.k.a. 'AI') improvements like SWMMAC (Sparse Wave Matrix Multiply Accumulate) instructions, 8-bit floating point formats, and 16x32 matrix dimensions - that's probably the reason why WaveMMA support was recently moved to HLSL shader model 6_9, as the current WaveMMA spec for sm 6_8 does not support these data formats and dimensions...




Carried over from RDNA3:

Instruction | Multiplied Matrices (A and B) Format | Result/Accumulate Matrix (C) Format
V_WMMA_F32_16X16X16_F16 | FP16 | FP32
V_WMMA_F32_16X16X16_BF16 | BF16 | FP32
V_WMMA_F16_16X16X16_F16 | FP16 | FP16
V_WMMA_BF16_16X16X16_BF16 | BF16 | BF16
V_WMMA_I32_16X16X16_IU8 | INT8 | INT32
V_WMMA_I32_16X16X16_IU4 | INT4 | INT32


New in RDNA4:

Instruction | Multiplied Matrices (A and B) Format | Result/Accumulate Matrix (C) Format
V_WMMA_F32_16x16x16_FP8_BF8 and V_WMMA_F32_16x16x16_BF8_FP8 | FP8 and BF8, or BF8 and FP8 (matrix multiplication is not commutative) | FP32
V_WMMA_F32_16x16x16_BF8_BF8 | BF8 | FP32
V_WMMA_F32_16x16x16_FP8_FP8 | FP8 | FP32
V_WMMA_I32_16X16X32_IU4 | INT4 (A: 16×16, B: 16×32) | INT32 (16×32)
V_SWMMAC_F32_16X16X32_F16 | FP16 (A: 16×16 stored / 32×16 actual, B: 16×32) | FP32 (32×32)
V_SWMMAC_F32_16X16X32_BF16 | BF16 | FP32
V_SWMMAC_F16_16X16X32_F16 | FP16 | FP16
V_SWMMAC_BF16_16X16X32_BF16 | BF16 | BF16
V_SWMMAC_I32_16X16X32_IU8 | INT8 | INT32
V_SWMMAC_I32_16X16X32_IU4 | INT4 | INT32
V_SWMMAC_I32_16X16X64_IU4 | INT4 (A: 16×16 stored / 32×16 actual, B: 16×64) | INT32 (32×64)
V_SWMMAC_F32_16X16X32_FP8_FP8 | FP8 | FP32
V_SWMMAC_F32_16X16X32_FP8_BF8 | FP8 and BF8 | FP32
V_SWMMAC_F32_16X16X32_BF8_FP8 | BF8 and FP8 | FP32
V_SWMMAC_F32_16X16X32_BF8_BF8 | BF8 | FP32
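For readers not fluent in the mnemonics, here's a rough scalar reference (my sketch, not vendor code) of what the dense 16X16X16 WMMA instructions compute; as I read the table, the SWMMAC variants extend K to 32 (or 64 for INT4) but store only half of A's elements along K plus index metadata, i.e. structured sparsity.

```cpp
// Scalar reference for a dense 16x16x16 wave matrix multiply-accumulate:
// C[16x16] += A[16x16] * B[16x16]. On hardware A and B hold FP16/BF16/FP8/INT
// values and C accumulates in FP32 or INT32; plain float is used here only to
// keep the sketch portable. How elements are spread across wavefront lanes is
// a hardware detail not modeled here.
void wmma_16x16x16_ref(const float A[16][16], const float B[16][16],
                       float C[16][16])
{
    for (int m = 0; m < 16; ++m)
        for (int n = 0; n < 16; ++n)
            for (int k = 0; k < 16; ++k)
                C[m][n] += A[m][k] * B[k][n];
}
```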

BTW NVidia has a similar sparse matrix feature in its Tensor cores.
 
Microsoft DirectSR is coming, with collaboration from NVIDIA and AMD, to implement a standardized super resolution solution, alongside GPU work graphs.

The DirectX team will showcase the latest updates, demos, and best practices for game development with key partners from AMD and NVIDIA. Work graphs are the newest way to take full advantage of GPU hardware and parallelize workloads. Microsoft will provide a preview into DirectSR, making it easier than ever for game devs to scale super resolution support across Windows devices. Finally, dive into the latest tooling updates for PIX.
 
with collaboration from NVIDIA and AMD
That's Videocardz guessing from the fact that there are AMD and Nvidia engineers among the speakers, I think. But the wording in the description kinda implies that these will talk about GWG and not DirectSR (a somewhat unfortunate acronym btw, as DSR means a different thing).
It's somewhat probable that this DirectSR is the same thing as was discovered in the new insider builds previously.
 
From the wording on the page I think this will be an interface for other vendors to put their upscaling solution behind. That would kind of imply this is done in collaboration with NVIDIA and AMD for feedback on the interface design. One very big advantage of this is that they would be able to update their upscaling solution with drivers, so games are not stuck with the DLSS/FSR version they shipped with. They could even go one step further and tweak/optimize their solution per game.
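To make that idea concrete, a purely hypothetical sketch (none of these names are real DirectSR APIs - nothing had been published at this point): the game records a single upscale call, and whatever the driver plugs in behind the interface (DLSS, FSR, XeSS, or a generic DirectML fallback) does the actual work.

```cpp
// Hypothetical vendor-pluggable upscaler interface - illustrative only, not
// the actual DirectSR API.
#include <d3d12.h>

struct HypotheticalUpscaleInputs            // hypothetical type
{
    ID3D12Resource* color;                  // low-resolution color buffer
    ID3D12Resource* depth;                  // depth buffer
    ID3D12Resource* motionVectors;          // per-pixel motion vectors
    float           jitterX, jitterY;       // camera jitter for this frame
};

struct IHypotheticalSuperResEngine          // hypothetical interface
{
    // The IHV driver (or a generic DirectML fallback) implements this; the
    // game only records the call into its command list and never sees which
    // upscaler version actually runs.
    virtual void Upscale(ID3D12GraphicsCommandList* cmdList,
                         const HypotheticalUpscaleInputs& inputs,
                         ID3D12Resource* upscaledOutput) = 0;
    virtual ~IHypotheticalSuperResEngine() = default;
};
```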
 
From the wording on the page I think this will be an interface for other vendors to put their upscaling solution behind. That would kind of imply this is done in collaboration with NVIDIA and AMD for feedback on the interface design. One very big advantage of this is that they would be able to update their upscaling solution with drivers, so games are not stuck with the DLSS/FSR version they shipped with. They could even go one step further and tweak/optimize their solution per game.
You touched on the issue a bit in the next sentence - while you see it in a positive light, I wouldn't be so sure about that. Newer versions aren't always better for every game, and there might be per-game tweaks done by the devs. It would be a huge undertaking for IHVs to start managing those instead.
 
It remains to be seen whether DirectSR requires any support from GPU vendors. Microsoft could probably just use existing DirectML functionality, though it will be hard to equalise in-game performance between different GPU vendors and hardware generations. Or they could require new metacommands to help accelerate certain upscaling and frame generation tasks, and fall back to generic DirectML algorithms for non-supporting drivers, similarly to the custom DirectStorage metacommand and initial driver support for GDeflate decompression discussed above...
 
Article from Tim’s on DirectSR.

Nothing in particular, but if you prefer an article over reading the GDC description this will work.

The article is written as though DirectSR will replace DLSS and FSR which I find extremely unlikely given how huge a USP DLSS is for Nvidia. No way they would collaborate on a project that wipes out that competitive advantage.

Personally I'm a little concerned that DirectML just becomes yet another upscaling solution like FSR/XeSS that is (largely) vendor agnostic and makes everything even more complicated and fragmented.

Obviously simplifying the landscape under a single solution has benefits, but not if it means we lose a superior solution.

My best guess/hope is that this will simply be a unified interface for developers to implement against which the IHVs then plug their own solutions into at the back end. That would be the best of all worlds IMO.
 
Why not? If DirectSR uses a complex network and dedicated units, then it will be a huge plus for nVidia - like raytracing and tessellation (in the past).

But I think this is more like nVidia's AI upscaling from the Shield TV.
 
I doubt MS will train upscaling models for free ... they already offer a paid version of Copilot that has a few extras that could have been offered for no extra cost.
If they did try to replace DLSS and the continuous model training done by Nvidia, it would purely be for a financial incentive.
 
The article is written as though DirectSR will replace DLSS and FSR which I find extremely unlikely given how huge a USP DLSS is for Nvidia. No way they would collaborate on a project that wipes out that competitive advantage.
Hmm. It felt like finding a standard solution between them both. I can’t see it replacing IHV solutions. Investment by nvidia here is too large to ever let it go.
 