Direct3D feature levels discussion

Interesting - that's a bit finer-grained compared to GCN/RDNA:
Code:
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
I got "DISPATCH_BOUNDARY (1)".
Could it be that your Hardware-accelerated scheduler is disabled?
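For context: the DXGI preemption granularities are ordered enums, coarsest to finest, so "DISPATCH_BOUNDARY (1)" is one step finer than "DMA_BUFFER_BOUNDARY (0)". A small lookup sketch of the values as declared in dxgi1_2.h (sketch only, not part of the tool):

```cpp
#include <cstdint>
#include <string>

// DXGI_GRAPHICS_PREEMPTION_GRANULARITY / DXGI_COMPUTE_PREEMPTION_GRANULARITY
// are ordered from coarsest (0) to finest (4) preemption boundary, so a
// numerically higher value means the GPU can be interrupted at a finer point.
std::string GraphicsPreemptionName(uint32_t v) {
    static const char* names[] = {
        "DXGI_GRAPHICS_PREEMPTION_DMA_BUFFER_BOUNDARY",   // 0
        "DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY",    // 1
        "DXGI_GRAPHICS_PREEMPTION_TRIANGLE_BOUNDARY",     // 2
        "DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY",        // 3
        "DXGI_GRAPHICS_PREEMPTION_INSTRUCTION_BOUNDARY",  // 4
    };
    return v < 5 ? names[v] : "unknown";
}

std::string ComputePreemptionName(uint32_t v) {
    static const char* names[] = {
        "DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY",    // 0
        "DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY",      // 1
        "DXGI_COMPUTE_PREEMPTION_THREAD_GROUP_BOUNDARY",  // 2
        "DXGI_COMPUTE_PREEMPTION_THREAD_BOUNDARY",        // 3
        "DXGI_COMPUTE_PREEMPTION_INSTRUCTION_BOUNDARY",   // 4
    };
    return v < 5 ? names[v] : "unknown";
}
```

The tool reads these from DXGI_ADAPTER_DESC2, which is per-adapter, not per-driver-mode.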
Code:
Windows 10 version 2004 (build 19041.329 vb_release) x64

ADAPTER 0
"AMD Radeon RX 5700 XT"
VEN_1002, DEV_731F, SUBSYS_E4111DA2, REV_C1
Dedicated video memory : 8148.5 MB (8544296960 bytes)
Total video memory : 40883.3 MB (42869293056 bytes)
Video driver version : 27.20.1017.4017
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Hardware-accelerated scheduler : Enabled, supported
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0:         TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_5 (0x0065)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 64
TotalLaneCount : 2560
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE (15) (0b0000'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_NOT_SUPPORTED (0)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED (0)
ShadingRateImageTileSize : 0
BackgroundProcessingSupported : 0
MeshShaderTier : D3D12_MESH_SHADER_TIER_NOT_SUPPORTED (0)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_NOT_SUPPORTED (0)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 4
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [67][1][6], GEMM (General matrix multiply) [91][5][6]
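Incidentally, the hex values in the output encode the version in nibbles: D3D_FEATURE_LEVEL_12_1 (0xc100) carries the major version in bits 15:12 and the minor in bits 11:8, while D3D12_SHADER_MODEL_6_5 (0x0065) packs major/minor into the low byte. A quick decoding sketch:

```cpp
#include <cstdint>
#include <utility>

// D3D_FEATURE_LEVEL: major version in bits 15:12, minor in bits 11:8,
// e.g. 0xc100 -> 12.1, 0xb000 -> 11.0.
std::pair<int, int> DecodeFeatureLevel(uint32_t fl) {
    return { (fl >> 12) & 0xF, (fl >> 8) & 0xF };
}

// D3D_SHADER_MODEL: major in bits 7:4, minor in bits 3:0,
// e.g. 0x0065 -> SM 6.5.
std::pair<int, int> DecodeShaderModel(uint32_t sm) {
    return { (sm >> 4) & 0xF, sm & 0xF };
}
```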
 
I've tweaked the output for the WDDM 2.7 path to report hardware-accelerated scheduling as one of 'Enabled by default', 'Enabled', 'Disabled', or 'Not supported' - so 'Enabled' / 'Disabled' now imply the feature is supported by the driver, whereas 'Not supported' means there is no driver support.
Please download the tool again.

Could it be that your Hardware-accelerated scheduler is disabled?
That's possible - I have a different driver installed, the WDDM 2.9 beta for WSL2, which doesn't support the hardware scheduler.
 

Q: Which hardware platforms will support feature level 12_2?
A: We’re pleased to report that:

  • Feature level 12_2 is supported on NVIDIA GeForce RTX and NVIDIA Quadro RTX GPUs.
  • AMD’s upcoming RDNA 2 architecture based GPUs will include full feature level 12_2 support.
  • Intel’s roadmap includes discrete GPUs that will empower developers to take full advantage of Feature Level 12_2.
  • Microsoft is collaborating with Qualcomm to bring the benefits of DirectX feature level 12_2 to Snapdragon platforms.
The powerful new capabilities in feature level 12_2 represent exciting new possibilities for game and application developers.
 
I thought DirectX 12 Ultimate was supposed to be available already, with all those Direct3D 12 features included, but they imply it's only upcoming and currently in Insider builds?
 
I thought DirectX 12 Ultimate was supposed to be available already, with all those Direct3D 12 features included, but they imply it's only upcoming and currently in Insider builds?
The features are all there. But Microsoft didn't actually add the feature level enumeration to Win10 2004.

https://devblogs.microsoft.com/directx/announcing-directx-12-ultimate/#comment-92

"We will be adding a 12_2 feature level in the API in the next update to Windows after 20H1. For now, all the features that make up DirectX 12 Ultimate are implemented ready for games to start using, but the feature level enum itself is not yet implemented,"
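When the enum did ship, D3D_FEATURE_LEVEL_12_2 got the value 0xc200, and probing works the same way it always has: the checker hands the runtime a candidate list and the runtime reports the highest level it both recognizes and the adapter supports, so a pre-12_2 runtime simply never returns it. A stand-alone sketch of that selection logic (the real tool gets this via ID3D12Device::CheckFeatureSupport with D3D12_FEATURE_FEATURE_LEVELS; the `runtimeSupported` set here stands in for the runtime+adapter):

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <vector>

// Mirrors the D3D12_FEATURE_DATA_FEATURE_LEVELS contract: the caller passes
// a candidate array, and the runtime reports the highest level it recognizes
// AND the adapter supports. A runtime that predates 12_2 ignores 0xc200.
uint32_t MaxSupportedLevel(const std::vector<uint32_t>& candidates,
                           const std::set<uint32_t>& runtimeSupported) {
    uint32_t best = 0;
    for (uint32_t level : candidates)
        if (runtimeSupported.count(level))
            best = std::max(best, level);
    return best;  // 0 means no candidate was supported
}
```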
 
The best is yet to come for D3D12/DXC. Shader model 6.6 introduces true bindless resources with the GetResourceFromHeap method.

I also heard from a former Intel engineer that some Intel HW is a closer match to a descriptor table than to actual GPU addresses like we see on other hardware (AMD/NV), so D3D12 started with descriptor indexing rather than pointers or texture handles ...
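To illustrate that distinction with a toy model (purely illustrative, not the D3D12 API): under descriptor indexing the shader carries an integer slot index into a driver-owned table, so the driver can rewrite a slot without invalidating anything the shader holds, whereas a raw GPU address baked into the shader would go stale:

```cpp
#include <cstdint>
#include <vector>

// Toy model, NOT the D3D12 API: a "descriptor" stands in for whatever
// opaque per-resource record the hardware actually uses.
struct Descriptor { uint64_t gpuAddress; };

struct DescriptorHeap {
    std::vector<Descriptor> slots;

    // Shaders hold an index; resolution happens through the table.
    uint64_t Resolve(uint32_t index) const { return slots.at(index).gpuAddress; }

    // The driver may move a resource and patch its slot; every shader
    // holding the index transparently sees the new location.
    void Relocate(uint32_t index, uint64_t newAddress) {
        slots.at(index).gpuAddress = newAddress;
    }
};
```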
 
The question I have (and why I've posted this here): considering that it routes decompression onto GPU hardware, will it be part of some feature level? And which GPUs will support it?
 
The question I have (and why I've posted this here): considering that it routes decompression onto GPU hardware, will it be part of some feature level? And which GPUs will support it?
For now we only know the RTX series will support it; we don't know why it's limited to those. GPU decompression over compute is supported on a lot of GPUs. If you use the Parquet format for data (I tend towards it, coming from Hadoop), RapidsAI supports GPU decompression of several algorithms for Parquet:

This adds an interface for using GPU-accelerated reading and converting of Parquet to cuDF.

List of changes:

  • adds a simple GPU-accelerated Parquet-to-cuDF reader
  • adds a top-level read_parquet interface and associated python bindings
  • adds an engine parameter to select between pyarrow and cudf implementations
  • adds to the existing parameterized pytest to check against pyarrow reference
  • adds GPU-accelerated decompression for Brotli, Gzip and Snappy compressed data
I use Snappy, which gives about 5:1 compression on my data sets.

None of this is related to DirectStorage though; I just wanted to point out that it's normal for GPUs to do this type of work. This is likely something else.
 
It would be nice if they supported LZX or XPress and made them the default compression in NTFS, with automatic re-compression on file changes.
 
GeForce RTX 3080

Code:
Direct3D 12 feature checker (May 2020) by DmitryKo (x64)
https://forum.beyond3d.com/posts/1840641/
 
Windows 10 version 2004 (build 19041.508 vb_release) x64
 
ADAPTER 0
"NVIDIA GeForce RTX 3080"
VEN_10DE, DEV_2206, SUBSYS_22061569, REV_A1
Dedicated video memory : 10078.0 MB (10567548928 bytes)
Total video memory : 18237.0 MB (19122927616 bytes)
Video driver version : 27.21.14.5616
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Hardware-accelerated scheduler : Disabled, supported
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 40
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_5 (0x0065)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 8704
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY (3) (0b0000'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_3 (3)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 16
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 7
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], CopyTensor [3][1][31], MVN (Mean Variance Normalization) [67][1][6], GEMM (General matrix multiply) [67][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6]
The only difference from Turing so far (though Turing was tested on an older driver) is PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
 
I've checked Turing (2080) on the 456.38 driver and it still shows "PerPrimitiveShadingRateSupportedWithViewportIndexing : 0", so this seems to be an Ampere-only feature for now.
 
RDNA2: Radeon RX 6800 (non-XT):
Code:
Direct3D 12 feature checker (July 2020) by DmitryKo (x64)
https://forum.beyond3d.com/posts/1840641/

Windows 10 version 2009 (build 19042.630 vb_release) x64
Checking for experimental features SM6 TR4 META

ADAPTER 0
"AMD Radeon RX 6800"
VEN_1002, DEV_73BF, SUBSYS_0E3A1002, REV_C3
Dedicated video memory : 16339.5 MB (17133170688 bytes)
Total video memory : 32694.6 MB (34282766336 bytes)
Video driver version : 27.20.14501.12006
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Not supported
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_4 (4)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0: TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_6 (0x0066)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 64
TotalLaneCount : 5120
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE (15) (0b0000'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 1
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 8
BackgroundProcessingSupported : 0
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_1_0 (100)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 4
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [67][1][6], GEMM (General matrix multiply) [91][5][6]

What seems interesting:
- GraphicsPreemptionGranularity worse than RTX 30
- ComputePreemptionGranularity worse than RTX 30
- TiledResourcesTier better than RTX 30
- PSSpecifiedStencilRefSupported better than RTX 30
- MaxGPUVirtualAddressBitsPerResource better than RTX 30
- MaxGPUVirtualAddressBitsPerProcess better than RTX 30
- HighestShaderModel better than RTX 30
- TotalLaneCount "worse" (i.e. less) than RTX 3080/3090
- ShaderCache.SupportFlags includes AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE (15), unlike RTX 30
- AdditionalShadingRatesSupported worse than RTX 30
- ShadingRateImageTileSize smaller (i.e. better) than RTX 30
- BackgroundProcessingSupported worse than RTX 30
- SamplerFeedbackTier better than RTX 30
- Metacommands less than RTX 30
 