Direct3D feature levels discussion

I doubt it, unless AMD exposed something similar on Playstation. The reaction from game devs makes it appear as if it's something new.
I assume this is more a programming paradigm than it being hardware related. So don't think PS has anything to do with it.
It's possible that the PS api already made it available, or could be added, not a good fit, who knows. This is a DX thing though.

Sounds like it would help getting better utilisation out of the XSX gpu.
 

I haven't seen too many responses that are critical of the programming paradigm, but mostly just about the lack of debugging tools.
The paradigm itself isn't new and has been on the roadmaps (so to speak) for quite some time now.

The implementation presented here, though, may not be very suitable for anything besides GPU compute, and rather simple compute at that, as there are no synchronization options right now.

Still this is a preview and it will likely get expanded upon before release - at which point all IHVs will need to support it.

I assume this is more a programming paradigm than it being hardware related. So don't think PS has anything to do with it.
It's possible that the PS api already made it available, or could be added, not a good fit, who knows. This is a DX thing though.

Sounds like it would help getting better utilisation out of the XSX gpu.
I'd say that it's less interesting for UMA h/w like consoles but it may end up freeing up some CPU cycles there as well.
 
Timothy is complaining about the fact that the solution isn't more explicit and doesn't pivot hard enough to AMD HW despite the reception by his former colleagues over there ...
Launching compute kernels from device has been available in CUDA since Kepler.
CUDA Graphs have also been available for a while. Not sure how it compares to the DX Compute Graphs, but the concept seems to be the same.
Differences might lie in the interaction with graphics, but this has not been explored in the DX api yet.
D3D12 Work Graphs are a little bit more powerful than CUDA device graphs. CUDA graphs have restrictions, such as memcpy nodes not being usable with CUDA Arrays, which isn't ideal for Nanite-style producer-consumer queue work compaction ...

D3D12 Work Graphs shine over CUDA graphs in allowing implementations to efficiently pass registers from the producer to the consumer, which translates to significant memory bandwidth savings ...
 
One more piece of feedback from a developer.

"This is great, but adopting new Agility SDK features is difficult when the DirectX Shader Compiler binaries have multiple conflicting linkage requirements, meaning you can't migrate an app from the Windows SDK version of DXC to the more recent GitHub release..."


What happened to the DirectX installers that came with every game? Why can't they fulfill this duty?
 
Yet another feature that will take ages to be supported on a good amount of hardware and to be mastered, and I bet it will become bloated soonish :\
Going to a fully programmable pipeline should mean more simple solutions :V
But hell, I hope I am wrong :v
 
D3D12 Work Graphs shine over CUDA graphs in allowing implementations to efficiently pass registers from the producer to the consumer, which translates to significant memory bandwidth savings ...
Do you mean producer-consumer queues? They are supported in CUDA since A100, so you can allocate and pin a part of the L2 cache for these queues for the bandwidth savings.
 
In Workgraphs you have a blackbox data passing mechanism between nodes. You don't need (and should not) do it manually via something like UAV constructs. A hardware implementation can pass the data in any way it wants: registers, extra LDS, caches.
 
Or it solves a problem which doesn't exist on their h/w.
Hardware isn't self-serving. It appreciably simplifies development of types of pipelines for ISVs. Especially binning, reduction and occupancy related constructs benefit (runtime performance and development performance).
 
In Workgraphs you have a blackbox data passing mechanism between nodes. You don't need (and should not) do it manually via something like UAV constructs. A hardware implementation can pass the data in any way it wants: registers, extra LDS, caches.
Registers are private per thread, and LDS is too small, leaving caches as the only feasible solution for the task. However, the lack of restrictions and explicitness raises questions about its speed in practice. It remains to be seen whether this approach will, for example, save bandwidth.
 
They limited recursion to 32 steps, if I understood correctly. But I started getting nausea just from the word recursion xD. Anything that isn't tail recursion is bad, and tail recursion is generally trivial to turn into iteration. The fact that they limited it means they don't expect tail recursion at all... so yes, bloating caches?
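
To illustrate the tail-recursion point in plain C++ (a CPU-side sketch only, not work graph or HLSL code): the hypothetical functions `gcd_recursive` and `gcd_iterative` below compute the same result. The recursive call in the first one is in tail position, so it maps directly onto a loop. The `kMaxDepth` value of 32 simply mirrors the limit mentioned above and is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical depth limit, mirroring the 32-step limit mentioned in the post.
constexpr int kMaxDepth = 32;

// Tail-recursive form: the recursive call is the last thing the function does.
// Returns 0 if the depth limit is exceeded before the result is found.
std::uint32_t gcd_recursive(std::uint32_t a, std::uint32_t b, int depth = 0) {
    if (depth >= kMaxDepth) return 0;
    if (b == 0) return a;
    return gcd_recursive(b, a % b, depth + 1);
}

// Equivalent loop: each iteration replaces exactly one recursive call.
// Returns 0 if the iteration limit is exceeded before the result is found.
std::uint32_t gcd_iterative(std::uint32_t a, std::uint32_t b) {
    for (int depth = 0; depth < kMaxDepth; ++depth) {
        if (b == 0) return a;
        const std::uint32_t r = a % b;
        a = b;
        b = r;
    }
    return 0;
}
```

Both versions do the same work per step; the loop just keeps its state in two local variables instead of a new call frame per step.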
 
Seems like @DmitryKo 's feature checker is refusing to work with the latest Agility SDK lib (it's working fine with 1.710).
This is by design: the developer has to use exported symbols to specify the exact version of the Agility SDK to be loaded at runtime. Since the latest version of the tool was built with D3D12SDKVersion=710 embedded in the executable file, the OS will only load a matching D3D12Core.dll version 1.710, and will throw an error if you replace it with any other version. (BTW, they used to allow higher versions of the DLL, but this was changed with the transition to SDK 6xx/7xx.)
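
For readers who have not seen it, this is roughly what the version pinning looks like in the application source. It is a minimal sketch, not the tool's actual code: Microsoft's documentation uses `UINT` and `__declspec(dllexport)` directly, while the `SKETCH_EXPORT` macro here is a hypothetical stand-in so that the snippet also compiles outside MSVC, and the `.\\D3D12\\` folder is an assumed location for D3D12Core.dll.

```cpp
#include <cassert>

// Hypothetical portability macro; a real D3D12 application targets Windows
// and would simply write __declspec(dllexport).
#if defined(_WIN32)
#  define SKETCH_EXPORT __declspec(dllexport)
#else
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#endif

extern "C" {
// Agility SDK version the executable is built against (710 in the post above).
// The loader only accepts a D3D12Core.dll that matches this number.
SKETCH_EXPORT extern const unsigned int D3D12SDKVersion = 710;

// Folder, relative to the executable, that holds the matching D3D12Core.dll.
// The value below is an assumption for illustration.
SKETCH_EXPORT const char* D3D12SDKPath = ".\\D3D12\\";
}
```

Because the version number is compiled into the executable, swapping the DLL on disk without rebuilding the tool leads to the mismatch described above.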


Either way, simply replacing the D3D12Core.dll with a new version will not gain you anything, because the source code needs to be updated to use the new structures defined in the latest Agility SDK header files. Even though I made some changes to support Agility SDK 711, WaveMMA reporting only works on Radeon RX 7000 (RDNA3) and my current card is RX 5700 XT, so I cannot test it unless AMD implements it on RDNA1 cards (or I get a very good deal on a Radeon RX 7600).

EDIT: I've updated my feature checker tool to report new experimental features in the Agility SDK 1.711 preview.
 
This is how the new features are reported by the current redistributable WARP library 1.0.5 (DLL version 10.0.25321.1003).

Code:
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 31
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 2097152
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2097152
MaxViewDescriptorHeapSize : 2097152
ComputeOnlyCustomHeapSupported : 0

And this is a report by the Adrenalin driver 23.3.1 (build 31.0.14037.1007) on my Radeon 5700 XT (AMD has not released an Agility SDK 1.710.0-specific driver yet).
BTW the beta AMD Adrenalin driver 23.10.01.14 (build 31.0.21001.14018) now supports a few new features in the Agility SDK 1.710 and 1.706/1.606 even on the Radeon RX 5700 XT:

Code:
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
MismatchingOutputDimensionsSupported : 1
SupportedSampleCountsWithNoOutputs : 29
PointSamplingAddressesNeverRoundUp : 1
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 1
AnisoFilterWithPointMipSupported : 1
MaxSamplerDescriptorHeapSize : 67108864
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 67108864
MaxViewDescriptorHeapSize : 33554432
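
A note on reading `SupportedSampleCountsWithNoOutputs` (31 on WARP, 29 on the RX 5700 XT): the sketch below assumes the value is a bitmask in which bit n stands for a sample count of 2^n. That interpretation is my assumption and should be checked against the DirectX-Specs text; `decode_sample_counts` is a hypothetical helper, not part of any SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: bit n set means that a sample count of 2^n is supported.
std::vector<unsigned> decode_sample_counts(std::uint32_t mask) {
    std::vector<unsigned> counts;
    for (unsigned bit = 0; bit < 32; ++bit) {
        if (mask & (std::uint32_t{1} << bit)) {
            counts.push_back(1u << bit);
        }
    }
    return counts;
}
```

Under that assumption, 31 decodes to 1, 2, 4, 8 and 16 samples, while 29 decodes to 1, 4, 8 and 16, i.e. the same set without 2 samples.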
 
Has anyone run @DmitryKo 's utility on an RDNA3 card btw?
It is likely a copy of RDNA2 feature wise but just to be sure.
I've just got myself a Radeon RX 7600 (RDNA3) card, and there are no major new feature options compared to RDNA2, except for experimental WaveMMA and D3D12_WORK_GRAPHS_TIER_0_1. For reference, see the most recent report for the RX 6800 posted by CarstenS back in November 2020; his comparison with the Nvidia RTX series remains valid as well.

Here are the differences between the RX 7600 and the RX 5700 XT, as reported by the experimental WorkGraphs/WaveMMA driver 23.10.01.14 from the post above with the Agility SDK 1.711 preview:

Code:
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
BarycentricsSupported : 1
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 8
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_1_0 (100)
MeshShaderPipelineStatsSupported : 1
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
VariableRateShadingSumCombinerSupported : 1
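
As a side note on the hexadecimal value in the first line of this listing: `D3D_FEATURE_LEVEL` values pack the major version into bits 12-15 and the minor version into bits 8-11, so 0xc200 reads as 12_2. The sketch below shows that decoding; `decode_feature_level` and `feature_level_name` are hypothetical helpers written for this post.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct FeatureLevel {
    std::uint32_t major;
    std::uint32_t minor;
};

// Major version sits in bits 12-15, minor version in bits 8-11.
FeatureLevel decode_feature_level(std::uint32_t value) {
    FeatureLevel fl;
    fl.major = (value >> 12) & 0xFu;
    fl.minor = (value >> 8) & 0xFu;
    return fl;
}

// Builds the enum-style name, e.g. 0xc200 gives "D3D_FEATURE_LEVEL_12_2".
std::string feature_level_name(std::uint32_t value) {
    const FeatureLevel fl = decode_feature_level(value);
    return "D3D_FEATURE_LEVEL_" + std::to_string(fl.major) + "_" +
           std::to_string(fl.minor);
}
```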

Here are the differences compared to your RTX 4090 report posted this March:

Code:
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
PSSpecifiedStencilRefSupported : 1
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
AdditionalShadingRatesSupported : 0
ShadingRateImageTileSize : 8
BackgroundProcessingSupported : 0
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_1_0 (100)
WaveMMATier : D3D12_WAVE_MMA_TIER_1_0 (10)
MeshShaderPerPrimitiveShadingRateSupported : 0
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 0
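
To put the `MaxGPUVirtualAddressBits` values into perspective, the arithmetic is just a power of two: 44 bits cover 2^44 bytes, which is 16 TiB. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of an address space that is `bits` wide (valid for bits < 64).
std::uint64_t address_space_bytes(unsigned bits) {
    assert(bits < 64);
    return std::uint64_t{1} << bits;
}

// The same size expressed in tebibytes (1 TiB = 2^40 bytes).
std::uint64_t address_space_tib(unsigned bits) {
    return address_space_bytes(bits) >> 40;
}
```

So a 44-bit limit corresponds to 16 TiB, and a 48-bit per-process limit would correspond to 256 TiB.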

 
From D3D12FeatureOptionsAgile.txt file, the result from my Intel A770.

Code:
Direct3D 12 feature checker (July 2023) by DmitryKo (x64) (Agility SDK v711)

Windows 10X version 22H2 (build 22621.1928 ni_release) x64

ADAPTER 0
"Intel(R) Arc(TM) A770 Graphics"
VEN_8086, DEV_56A0, SUBSYS_10208086, REV_08
Dedicated video memory : 16256.0 MB (17045651456 bytes)
Total video memory : 24412.4 MB (25598205952 bytes)
BIOS string : Intel Video BIOS
Video driver version : 31.0.101.4314
WDDM version : KMT_DRIVERVERSION_WDDM_3_1 (3100)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_ALWAYS_OFF (0)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_TRIANGLE_BOUNDARY (2)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_THREAD_GROUP_BOUNDARY (2)
Maximum feature level : D3D_FEATURE_LEVEL_12_2 (0xc200)
DoublePrecisionFloatShaderOps : 0
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 1
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_1 (1)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 48
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_7 (0x0067)
WaveOps : 1
WaveLaneCountMin : 8
WaveLaneCountMax : 32
TotalLaneCount : 16384
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_2 (3)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_1 (1)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (111) (0b0110'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_2 (2)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 8
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
UnalignedBlockTexturesSupported : 1
MeshShaderPipelineStatsSupported : 1
MeshShaderSupportsFullRangeRenderTargetArrayIndex : 1
AtomicInt64OnTypedResourceSupported : 0
AtomicInt64OnGroupSharedSupported : 0
DerivativesInMeshAndAmplificationShadersSupported : 0
WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)
VariableRateShadingSumCombinerSupported : 1
MeshShaderPerPrimitiveShadingRateSupported : 1
AtomicInt64OnDescriptorHeapResourceSupported : 1
DisplayableTexture : 0
DisplayableTexture.SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_0 (0)
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 1
EnhancedBarriersSupported : 1
RelaxedFormatCastingSupported : 1
UnrestrictedBufferTextureCopyPitchSupported : 1
UnrestrictedVertexElementAlignmentSupported : 1
InvertedViewportHeightFlipsYSupported : 1
InvertedViewportDepthFlipsZSupported : 1
TextureCopyBetweenDimensionsSupported : 1
AlphaBlendFactorSupported : 1
AdvancedTextureOpsSupported : 0
WriteableMSAATexturesSupported : 0
IndependentFrontAndBackStencilRefMaskSupported : 1
TriangleFanSupported : 1
DynamicIndexBufferStripCutSupported : 1
DynamicDepthBiasSupported : 1
GPUUploadHeapSupported : 1
NonNormalizedCoordinateSamplersSupported : 1
ManualWriteTrackingResourceSupported : 0
RenderPassesValid : 1
MismatchingOutputDimensionsSupported : 0
SupportedSampleCountsWithNoOutputs : 1
PointSamplingAddressesNeverRoundUp : 0
RasterizerDesc2Supported : 1
NarrowQuadrilateralLinesSupported : 0
AnisoFilterWithPointMipSupported : 0
MaxSamplerDescriptorHeapSize : 2048
MaxSamplerDescriptorHeapSizeWithStaticSamplers : 2048
MaxViewDescriptorHeapSize : 1000000
ComputeOnlyCustomHeapSupported : 0
ComputeOnlyWriteWatchSupported : 1
Experimental.WorkGraphsTier : D3D12_WORK_GRAPHS_TIER_NOT_SUPPORTED (0)
Metacommands enumerated : 11
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], GEMM (General matrix multiply) [67][1][6], Pooling [44][1][4], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6], Pooling [56][3][4], LSTM (Long Short-Term Memory) [252][10][13], DStorageCustom Metacommand [4][0][11],  [1][0][9],  [4][0][11]
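
The flag lines in this report, such as `ShaderCache.SupportFlags` with the value 111, are plain bitmasks. The sketch below decodes that value using the `D3D12_SHADER_CACHE_SUPPORT_FLAGS` bit values as I recall them from d3d12.h; treat the table as an assumption and verify it against the header. `decode_shader_cache_flags` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct FlagName {
    std::uint32_t bit;
    const char* name;
};

// Bit values of D3D12_SHADER_CACHE_SUPPORT_FLAGS (assumed, check d3d12.h).
constexpr FlagName kShaderCacheFlags[] = {
    {0x01, "SINGLE_PSO"},
    {0x02, "LIBRARY"},
    {0x04, "AUTOMATIC_INPROC_CACHE"},
    {0x08, "AUTOMATIC_DISK_CACHE"},
    {0x10, "DRIVER_MANAGED_CACHE"},
    {0x20, "SHADER_CONTROL_CLEAR"},
    {0x40, "SHADER_SESSION_DELETE"},
};

// Returns the names of all flags that are set in `value`, lowest bit first.
std::vector<std::string> decode_shader_cache_flags(std::uint32_t value) {
    std::vector<std::string> names;
    for (const FlagName& flag : kShaderCacheFlags) {
        if (value & flag.bit) {
            names.emplace_back(flag.name);
        }
    }
    return names;
}
```

With this table, 111 (0b0110'1111) yields the six names printed in the report; the only bit that is not set is 0x10, DRIVER_MANAGED_CACHE.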
 
Could actually do with an easily digestible table with all the results from the different cards and what they mean. Or by the time that's done the actual detail becomes worthless?
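
A comparison table could be generated from the text reports themselves. The following is a rough sketch under the assumption that every option of interest sits on its own `Name : value` line, as in the listings in this thread; lines without the ` : ` separator (adapter node and metacommand lines, for example) are skipped. `parse_report` and `diff_reports` are hypothetical helpers, not part of the feature checker tool.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using Report = std::map<std::string, std::string>;
using Diff = std::map<std::string, std::pair<std::string, std::string>>;

// Strips leading and trailing whitespace, including a trailing '\r'.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Parses "Name : value" lines into a map; other lines are ignored.
Report parse_report(const std::string& text) {
    Report report;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto pos = line.find(" : ");
        if (pos == std::string::npos) continue;
        report[trim(line.substr(0, pos))] = trim(line.substr(pos + 3));
    }
    return report;
}

// Collects options whose values differ, or that appear in only one report.
Diff diff_reports(const Report& a, const Report& b) {
    Diff diff;
    for (const auto& kv : a) {
        const auto it = b.find(kv.first);
        const std::string other = (it == b.end()) ? "(missing)" : it->second;
        if (other != kv.second) diff[kv.first] = {kv.second, other};
    }
    for (const auto& kv : b) {
        if (a.find(kv.first) == a.end()) diff[kv.first] = {"(missing)", kv.second};
    }
    return diff;
}
```

Feeding two saved reports through `parse_report` and then `diff_reports` gives one row per differing option, which is most of what a support matrix needs; explaining what each option means would still have to come from the documentation.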
 
FYI, the Wikipedia article Feature levels in Direct3D does include a support matrix table for a few important feature options.

There are references to Microsoft Learn (formerly MSDN/Docs) online documentation for D3D12_FEATURE and currently supported feature options, the Direct3D 12 Programming Guide on major feature tiers, and DirectX-Specs (Engineering Specs for DirectX Features) for low-level features currently in development.

Unfortunately, the online documentation is poorly cross-referenced and does not cover Insider Preview SDK and Agility SDK releases, and DirectX-Specs only covers major developments...
 