Direct3D feature levels discussion

DegustatoR · Oct 18, 2023

Nvidia's R545 driver release comes with support for GPU Work Graphs and WaveMMA:
D3D12_WORK_GRAPHS_TIER_0_1 (0x1)
WaveMMATier = D3D12_WAVE_MMA_TIER_1_0 (0xA)

A bunch of other stuff which was "false" is now "true" as well. I'll take a closer look tomorrow.
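
For reading the raw values: the two tier values quoted above (0x1 for tier 0.1, 0xA for tier 1.0) are consistent with an encoding of major * 10 + minor. The sketch below assumes that convention, which is inferred from these values and not taken from the headers. `decode_tier` is a hypothetical helper name.

```python
def decode_tier(value):
    """Split a tier value into (major, minor).

    Assumption: value == major * 10 + minor, inferred from the
    values quoted in the post (0x1 -> 0.1, 0xA -> 1.0).
    """
    return divmod(value, 10)


# Values quoted in the post above.
work_graphs = decode_tier(0x1)   # D3D12_WORK_GRAPHS_TIER_0_1
wave_mma = decode_tier(0xA)      # D3D12_WAVE_MMA_TIER_1_0

print("Work Graphs tier %d.%d" % work_graphs)
print("WaveMMA tier %d.%d" % wave_mma)
```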

Alessio1989 · Oct 18, 2023

So, so far no GPU supports work graphs tier 1.0.

eddman · Oct 18, 2023

I was looking around in DirectX SDK 1 and saw the following inside the ddraw.h file:

/****************************************************************************
*
* DIRECTDRAWSURFACE CAPABILITY FLAGS
*
****************************************************************************/
/*
* Indicates that this surface is a front buffer, back buffer, or
* texture map that is being used in conjunction with a 3DDDI or
* Direct3D HAL.
*/
#define DDSCAPS_3D 0x00000001l

/*
* Indicates that this surface can be used as a 3D texture. It does not
* indicate whether or not the surface is being used for that purpose.
*/
#define DDSCAPS_TEXTUREMAP 0x00001000l

What are those about? I was under the impression Direct3D was introduced with DirectX 2, considering there's no d3d.h or d3dim.dll in SDK 1.

DmitryKo · Oct 18, 2023

Alessio1989 said:
so far no GPU supports work graphs tier 1.0

No way to tell: the publicly released Agility SDK 1.711.3 Preview from June only supports tier 0.1.

eddman said:
Direct3D was introduced with DirectX 2, considering there's no d3d.h or d3dim.dll in SDK 1.

Direct3D was released in Summer 1996, but it had been in development since Spring 1995, and the initial versions used DirectDraw for surface management.

DegustatoR · Oct 19, 2023

So here are all the differences between the R545 and the previous R535 drivers on a 4090:

1. Regular "agile" caps:

Code:

MismatchingOutputDimensionsSupported : 0 -> 1
SupportedSampleCountsWithNoOutputs : 1 -> 31
PointSamplingAddressesNeverRoundUp : 0 -> 1
NarrowQuadrilateralLinesSupported : 0 -> 1
AnisoFilterWithPointMipSupported : 0 -> 1
MaxSamplerDescriptorHeapSize : 2048 -> 4080
Experimental.WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED (0) -> D3D12_WORK_GRAPHS_TIER_0_1 (1)

2. "Experimental" agile caps:

Code:

WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0) -> D3D12_WAVE_MMA_TIER_1_0 (10)
MismatchingOutputDimensionsSupported : 0 -> 1
SupportedSampleCountsWithNoOutputs : 1 -> 31
PointSamplingAddressesNeverRoundUp : 0 -> 1
NarrowQuadrilateralLinesSupported : 0 -> 1
AnisoFilterWithPointMipSupported : 0 -> 1
MaxSamplerDescriptorHeapSize : 2048 -> 4080
Experimental.WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED (0) -> D3D12_WORK_GRAPHS_TIER_0_1 (1)
WaveMMA operations supported : n/a -> 8
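
A list like the ones above can be produced by comparing two cap dumps key by key. This is only a minimal sketch of that comparison, not how the feature checker tool works; `diff_caps` is a hypothetical helper, and the two dictionaries hold a couple of the values quoted above as sample input.

```python
def diff_caps(old, new):
    """Return 'name : old -> new' lines for every cap whose value changed.

    Caps missing from one side are reported as 'n/a'.
    """
    lines = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "n/a")
        after = new.get(name, "n/a")
        if before != after:
            lines.append("%s : %s -> %s" % (name, before, after))
    return lines


# Sample input: two of the caps listed in the post.
r535 = {"MaxSamplerDescriptorHeapSize": 2048, "AnisoFilterWithPointMipSupported": 0}
r545 = {"MaxSamplerDescriptorHeapSize": 4080, "AnisoFilterWithPointMipSupported": 1}

for line in diff_caps(r535, r545):
    print(line)
```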

The WaveMMA operations supported are these:

Code:

D3D12_FEATURE_DATA_WAVE_MMA:
----------------------------
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_BYTE, M = D3D12_WAVE_MMA_DIMENSION_16, N = D3D12_WAVE_MMA_DIMENSION_16:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x1
        D3D12_WAVE_MMA_ACCUM_DATATYPE_INT32
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_BYTE, M = D3D12_WAVE_MMA_DIMENSION_16, N = D3D12_WAVE_MMA_DIMENSION_64:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x1
        D3D12_WAVE_MMA_ACCUM_DATATYPE_INT32
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_BYTE, M = D3D12_WAVE_MMA_DIMENSION_64, N = D3D12_WAVE_MMA_DIMENSION_16:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x1
        D3D12_WAVE_MMA_ACCUM_DATATYPE_INT32
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_BYTE, M = D3D12_WAVE_MMA_DIMENSION_64, N = D3D12_WAVE_MMA_DIMENSION_64:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x1
        D3D12_WAVE_MMA_ACCUM_DATATYPE_INT32
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_FLOAT16, M = D3D12_WAVE_MMA_DIMENSION_16, N = D3D12_WAVE_MMA_DIMENSION_16:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x6
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT16
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_FLOAT16, M = D3D12_WAVE_MMA_DIMENSION_16, N = D3D12_WAVE_MMA_DIMENSION_64:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x6
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT16
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_FLOAT16, M = D3D12_WAVE_MMA_DIMENSION_64, N = D3D12_WAVE_MMA_DIMENSION_16:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x6
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT16
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
InputDataType = D3D12_WAVE_MMA_INPUT_DATATYPE_FLOAT16, M = D3D12_WAVE_MMA_DIMENSION_64, N = D3D12_WAVE_MMA_DIMENSION_64:
    Supported = TRUE
    K = 16
    AccumDataTypes = 0x6
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT16
        D3D12_WAVE_MMA_ACCUM_DATATYPE_FLOAT
    RequiredWaveLaneCountMin = 32
    RequiredWaveLaneCountMax = 32
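
To make the dump easier to read: AccumDataTypes is a bitmask, and the eight entries are every combination of two input types and two sizes each for M and N. The sketch below decodes the masks under the assumption that 0x6 splits into FLOAT16 = 0x2 and FLOAT = 0x4; only INT32 = 0x1 and the combined 0x6 are visible in the dump itself. The class and function names are hypothetical.

```python
from enum import IntFlag
from itertools import product


class AccumDataType(IntFlag):
    # INT32 = 0x1 is visible in the dump; the 0x2 / 0x4 split of the
    # 0x6 mask is an assumption based on the order the names are printed.
    INT32 = 0x1
    FLOAT16 = 0x2
    FLOAT = 0x4


def decode_accum(mask):
    """Return the names of all accumulator types set in the mask."""
    return [flag.name for flag in AccumDataType if mask & flag]


# Two input types x two M sizes x two N sizes, as listed in the dump.
combos = list(product(["BYTE", "FLOAT16"], [16, 64], [16, 64]))

print(decode_accum(0x1))
print(decode_accum(0x6))
print(len(combos))
```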

eddman · Oct 19, 2023

DmitryKo said:
Direct3D was released in Summer 1996, but it had been in development since Spring 1995, and the initial versions used DirectDraw for surface management.

So it seems DDraw 1 received some Direct3D-related stuff before D3D itself was ready. I found another readme, this one in SDK 2, that talks about interface name changes from Reality Lab to Direct3D; I guess DDraw 1 was supposed to work with the Reality Lab API?

DmitryKo · Oct 19, 2023

eddman said:
I guess DDraw 1 was supposed to work with the Reality Lab API?

The DirectDraw effort started in 1994, long before the RenderMorphics acquisition, so it's rather the other way round: Reality Lab was ported to use DirectDraw surfaces, and this became the foundation of the DDI for the 'Direct3D HAL' device.

I guess the expectation was that DirectDraw would eventually support hardware-accelerated alpha-blended sprites. Direct3D retained mode was initially intended for entry-level accelerators and software 3D rendering into DirectDraw-maintained front/back buffers, while Project Talisman would introduce top-end multi-chip solutions for on-demand tiled rendering of game objects into alpha-blended sprites to save on video memory bandwidth.

Then OpenGL/MiniGL accelerators like 3dfx Voodoo Graphics and 3DLabs GLINT and 3D games like Quake took the gaming world by storm, so retained mode, Talisman, and hardware sprites were set aside, and Microsoft instead reimplemented the Direct3D 5.0 HAL device to expose the DrawPrimitive() method in immediate mode, essentially implementing the scene rendering pipeline from OpenGL. At the same time, the Nvidia Riva128 emerged as the first accelerator to natively support the Direct3D 5.0 DDI, with an enormous 128-bit bus and memory bandwidth of 1.6 Gbyte/s, which was twice that of Voodoo Graphics, so 2D rendering was effectively dead by the end of 1997 and hardware alpha-blending for sprites never materialized. DirectDraw was eventually replaced by DXGI/WDDM for resource and swapchain management starting with Windows Vista.
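
As a rough check of the Riva128 bandwidth figure above: peak bandwidth is bus width in bytes times the memory clock. The 100 MHz single-data-rate memory clock used here is an assumption that is not stated in the post; `peak_bandwidth_gb_s` is a hypothetical helper.

```python
def peak_bandwidth_gb_s(bus_bits, clock_mhz):
    """Peak bandwidth in GB/s for a single-data-rate memory bus."""
    bytes_per_clock = bus_bits // 8
    return bytes_per_clock * clock_mhz * 1e6 / 1e9


# Assumption: 100 MHz memory clock, one transfer per clock.
riva128 = peak_bandwidth_gb_s(128, 100)
print(riva128)
```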

cho · Oct 22, 2023

Is something wrong with the RTX 2070?

Code:

./D3D12CheckFeatureSupportAgile.exe
Direct3D 12 feature checker (July 2023) by DmitryKo (x64) (Agility SDK v711)
https://forum.beyond3d.com/posts/1840641/


Windows 10X version 22H2 (build 22621.2428 ni_release) x64


ADAPTER 0
"NVIDIA GeForce RTX 2070"
VEN_10DE, DEV_1F07, SUBSYS_12AD10DE, REV_A1
Dedicated video memory : 8019.0 MB (8408530944 bytes)
Total video memory : 24357.5 MB (25540673536 bytes)
BIOS string : Version90.6.b.0.1
Video driver version : 31.0.15.4584
WDDM version : KMT_DRIVERVERSION_WDDM_3_1 (3100)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Enabled, DXGK_FEATURE_SUPPORT_STABLE (2)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Failed to create Direct3D 12 device
Error 0x887e0003:

DegustatoR · Oct 22, 2023

cho said:
Is something wrong with the RTX 2070?

You have to enable developer mode in Windows settings and put the d3d12core.dll file from the 1.711.3 Agility SDK (download the package and unzip it) into the D3D12 folder.
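
A quick way to confirm the file ended up in the right place is to check for it before launching the tool. This is only a sketch: it assumes the D3D12 folder sits next to the executable, it compares the file name case-insensitively, and it does not check whether developer mode is enabled. `agility_runtime_present` is a hypothetical helper.

```python
from pathlib import Path


def agility_runtime_present(exe_dir):
    """True if a D3D12 subfolder containing d3d12core.dll exists in exe_dir."""
    folder = Path(exe_dir) / "D3D12"
    if not folder.is_dir():
        return False
    # Compare names case-insensitively, as Windows file systems usually do.
    return any(p.name.lower() == "d3d12core.dll" for p in folder.iterdir())


# Check the current directory, e.g. where the feature checker was unpacked.
print(agility_runtime_present("."))
```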

DmitryKo · Oct 22, 2023

cho said:
Is something wrong with the RTX 2070?

Code:

Error 0x887E0003: The D3D12 SDK version configuration of the host exe is invalid.

That's D3D12_ERROR_INVALID_REDIST, as described in the readme.txt and the actual forum post:

https://forum.beyond3d.com/posts/1840641/ - download the Agility SDK Preview runtime and unpack d3d12core.dll into the D3D12 folder as per above.
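
For anyone decoding such values by hand, an HRESULT splits into a severity bit, a facility, and a 16-bit code. The sketch below uses the same shifts and masks as the Windows HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE macros; the lookup table is a hypothetical helper that contains only the one name confirmed in this thread.

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    severity = (hr >> 31) & 0x1      # 1 means failure
    facility = (hr >> 16) & 0x1FFF   # subsystem that raised the error
    code = hr & 0xFFFF               # error number within the facility
    return severity, facility, code


# Only the name confirmed in this thread; other codes are left out on purpose.
KNOWN_ERRORS = {0x887E0003: "D3D12_ERROR_INVALID_REDIST"}

hr = 0x887E0003
severity, facility, code = decode_hresult(hr)
print("severity=%d facility=0x%X code=%d name=%s"
      % (severity, facility, code, KNOWN_ERRORS.get(hr, "unknown")))
```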

Lurkmass · Oct 26, 2023

Add SV_BaseVertexLocation and SV_StartInstanceLocation to VSIn by python3kgae · Pull Request #5770 · microsoft/DirectXShaderCompiler

New VS Input semantics SV_BaseVertexLocation and SV_StartInstanceLocation are added for BaseVertexLocation and StartInstanceLocation of https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-...

github.com

Not in the PR title but shader model 6.8 is finally going to support a gl_DrawID equivalent (SV_IndirectCommandIndex) which makes doing GPU driven rendering more attractive with the vertex shading pipeline ...

DegustatoR · Oct 28, 2023

Advanced API Performance: Descriptors | NVIDIA Technical Blog

By using descriptor types, you can bind resources to shaders and specify how those resources are accessed. This creates efficient communication between the CPU and GPU and enables shaders to access…

developer.nvidia.com

Samwell · Oct 28, 2023

Lurkmass said:
Add SV_BaseVertexLocation and SV_StartInstanceLocation to VSIn by python3kgae · Pull Request #5770 · microsoft/DirectXShaderCompiler

New VS Input semantics SV_BaseVertexLocation and SV_StartInstanceLocation are added for BaseVertexLocation and StartInstanceLocation of https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-...

github.com

Not in the PR title but shader model 6.8 is finally going to support a gl_DrawID equivalent (SV_IndirectCommandIndex) which makes doing GPU driven rendering more attractive with the vertex shading pipeline ...

It's good that DirectX is moving forward and that DirectStorage is making progress too. I just hope we also get DXR 2.0 sometime next year. It's frustrating that there's no progress here while Intel and Nvidia both already have architectures that support more features than DXR 1.1 does. Battlemage and Blackwell will just make things worse if we don't get a common new DXR 2.0 feature baseline. Additionally, there are plenty of developers who dislike DXR at the moment, so the software side needs strong improvement too.

Lurkmass · Oct 29, 2023

Samwell said:
It's good that DirectX is moving forward and that DirectStorage is making progress too. I just hope we also get DXR 2.0 sometime next year. It's frustrating that there's no progress here while Intel and Nvidia both already have architectures that support more features than DXR 1.1 does. Battlemage and Blackwell will just make things worse if we don't get a common new DXR 2.0 feature baseline. Additionally, there are plenty of developers who dislike DXR at the moment, so the software side needs strong improvement too.

Intel hates everything but pure RTPSOs/DXR 1.0 when it comes to ray tracing API abstractions. Intel implements the baseline RTPSO APIs in their "most obvious" possible incarnation, so they don't really have any other major features that could be exposed or would benefit from being more explicit, besides their "traversal shaders". Intel graphics hardware may support ray reordering, but unlike the latest Nvidia hardware, it doesn't need an "explicit" API to take advantage of this functionality. Even when you have two vendors that prefer a particular API abstraction (RTPSO), they can still disagree on how to "evolve" it ... (explicit traversal shaders vs explicit ray reordering)

That being said, it's up to the AAA game industry whether they want hardware ray tracing to thrive in the future. If AAA game developers ultimately come to the conclusion that the entire concept isn't the way forward, the industry shouldn't be forced to double down on that singular direction against their will ...

Lurkmass · Nov 9, 2023

vkd3d-proton/docs/sampler_feedback.md at master · HansKristian-Work/vkd3d-proton

Fork of VKD3D. Development branches for Proton's Direct3D 12 implementation. - HansKristian-Work/vkd3d-proton

github.com

Documentation of reverse engineered sampler feedback implementations on drivers for anyone interested ...

DmitryKo · Jan 27, 2024

FYI Work Graphs specs have been updated to v0.50 to include D3D12_WORK_GRAPHS_TIER_1_1 for prototyping of Graphics nodes, and the most recent Windows Insider SDK 26040 (Germanium semester) now provides definitions for D3D12_WORK_GRAPHS_TIER_1_0, D3D_SHADER_MODEL_6_9, and D3D12_PREVIEW_SDK_VERSION = 713 in d3d12.h, with D3D12DDI_SHADER_MODEL_6_8_RELEASE_0108 definition added in the user mode DDI d3d12umddi.h.

WaveMMA is still not supported in the Insider Preview SDK, and the Agility SDK has not been updated since the June 2023 1.711.3 Preview.

Lurkmass · Feb 1, 2024

Add DXIL 1.8 op code cap and move WaveMatrix intrisics to SM 6.9 by hekota · Pull Request #6163 · microsoft/DirectXShaderCompiler

Adds max opcode value for DXIL 1.8 and moves WaveMatrix intrinsics into future shader model 6.9. Contributes to #6133 Related to #6125

github.com

WaveMMA has been moved to shader model 6.9 ...

DmitryKo · Feb 2, 2024

So Work Graphs and shader model 6.8 would be tested for the remainder of the current Germanium semester, but WaveMMA and SM 6.9 seem to have slipped into the upcoming Dilithium semester, which wouldn't leave the Canary channel until late 2024 or early 2025 - if ever, considering the release cadence of recent semesters like Manganese, Iron, Cobalt, Copper, Zinc, Gallium, and Germanium, which never moved out of the rs_prerelease branch (except for Zinc Refresh).

This would mark the WaveMMA feature as being almost 5 years in development - it was initially released back in the Iron semester along with SM 6.7, but supporting drivers were never made public at the time, and then it was removed from the subsequent Insider SDK builds...

Davros · Feb 3, 2024

Back when feature levels were first introduced (with DX10, was it?), I made a post complaining that I shouldn't have to research which feature level a card supported; I should be able to, for example, just buy a DX10 card and know it supported all the feature levels. I was told I was wrong and that it was a bad idea (I never understood why). Is that view still the consensus here?

DmitryKo · Feb 4, 2024

GPU vendors advertise game performance and major new features like raytracing - they were never supposed to market feature levels or mention them outside of GPU specs and developer documentation. The single exception throughout all these years was the 'DirectX 12 Ultimate' campaign back in 2020 which promoted GPUs conforming to feature level 12_2 (which requires raytracing tier 1_1, mesh shaders, variable-rate shading, and sampler feedback).

Vendors used to advertise runtime versions like 'DirectX 11.1' and 'DirectX 12', but then Microsoft abandoned the practice of assigning minor versions to new releases of Direct3D 12 SDK and runtime, so it became meaningless for them to mention runtime versions (and it was rather meaningless and often misleading for end users even when they did).
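
To illustrate why a feature level is a bundle and not a single switch, here is a minimal sketch that checks the four 12_2 requirements named above against a set of reported caps. The dictionary keys and `meets_fl_12_2` are hypothetical, raytracing tier 1_1 is encoded as 11, and the real feature level has further requirements beyond these four that are not modelled here.

```python
def meets_fl_12_2(caps):
    """Check only the four requirements named in the post above.

    Assumption: caps is a plain dict with hypothetical keys; a real
    feature level 12_2 check covers more than these four items.
    """
    return bool(
        caps.get("raytracing_tier", 0) >= 11   # raytracing tier 1_1
        and caps.get("mesh_shaders", False)
        and caps.get("variable_rate_shading", False)
        and caps.get("sampler_feedback", False)
    )


full = {
    "raytracing_tier": 11,
    "mesh_shaders": True,
    "variable_rate_shading": True,
    "sampler_feedback": True,
}
no_mesh = dict(full, mesh_shaders=False)

print(meets_fl_12_2(full))
print(meets_fl_12_2(no_mesh))
```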