Direct3D feature levels discussion

... didn't follow why, or whether they'll be Tier 0.9 on sampler feedback forever.. but something something, things are finally becoming interesting
Tier 0.9 sounds like Nvidia strong-arming: "Turing supports DX12 Ultimate, make it happen", at least if RDNA2 and Ampere have Tier 1 (or above, if there are more tiers).
 
No it's not. Sampler feedback has been available alongside texture space shading on Turing since day one, well before any DX12U announcements.
 
It likely means that Turing will have some limitations in supporting sampler feedback which won't be there in Ampere and RDNA2. Still, getting Turing even 90% compatible is good since there are a lot of Turings in user PCs while the install base of Ampere and RDNA2 is zero.
 
  • TIER_0_9 (i.e., version 0.9) indicates the following.
    • Sampler feedback is supported for samplers with these texture addressing modes:
      • D3D12_TEXTURE_ADDRESS_MODE_WRAP
      • D3D12_TEXTURE_ADDRESS_MODE_CLAMP
    • The Texture2D shader resource view passed in to feedback-writing HLSL methods has these restrictions:
      • The MostDetailedMip field must be 0.
      • The MipLevels count must span the full mip count of the resource.
      • The PlaneSlice field must be 0.
      • The ResourceMinLODClamp field must be 0.
    • The Texture2DArray shader resource view passed in to feedback-writing HLSL methods has these restrictions:
      • All the limitations as in Texture2D above, and
      • The FirstArraySlice field must be 0.
      • The ArraySize field must span the full array element count of the resource.
  • TIER_1_0 (i.e., version 1.0) indicates sampler feedback is supported for all texture addressing modes, and feedback-writing methods are supported irrespective of the passed-in shader resource view.

    https://microsoft.github.io/DirectX-Specs/d3d/SamplerFeedback.html
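For reference, the tier an adapter falls into can be queried through the usual CheckFeatureSupport path. A minimal sketch (untested), assuming an already-created ID3D12Device named device:
Code:
// Query the sampler feedback tier via D3D12_FEATURE_D3D12_OPTIONS7
// (the same options struct also carries MeshShaderTier).
D3D12_FEATURE_DATA_D3D12_OPTIONS7 options7 = {};
if (SUCCEEDED(device->CheckFeatureSupport(
        D3D12_FEATURE_D3D12_OPTIONS7, &options7, sizeof(options7))))
{
    switch (options7.SamplerFeedbackTier)
    {
    case D3D12_SAMPLER_FEEDBACK_TIER_1_0:
        // Any addressing mode, any SRV configuration.
        break;
    case D3D12_SAMPLER_FEEDBACK_TIER_0_9:
        // Restrict to WRAP/CLAMP samplers and the SRV layout
        // listed above (full mip chain, plane 0, min LOD clamp 0).
        break;
    default: // D3D12_SAMPLER_FEEDBACK_TIER_NOT_SUPPORTED
        break;
    }
}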
 
Several other technologies have likewise been available in hardware before DirectX, but that doesn't mean the earlier hardware implementations are compatible with the eventual DirectX spec. The fact that it's "0.9" instead of 1.0 says pretty clearly that it's not what MS intended Sampler Feedback to be; if it were, it would be called 1.0, and what's now called Tier 1 would be 1.1 or 2.

late edit: removed two sneaky words that slipped past my original edit
 
In practical terms does it even matter? If Turing's sampler feedback support is useful then it makes sense to expose it. What would be the benefit of not exposing it?
 
You tell me; this is the first time I can remember something like this happening. No matter how close or far earlier hardware implementations have been from what ended up in the DX spec, they've set a certain minimum level ("tier 1" in current terms), and if you're not there, you're not there.
 
Yeah I agree it's strange they didn't just label Turing as 1.0.

Was Nvidia first out of the gate, with Microsoft deciding after the fact that Turing's implementation wasn't good enough for a baseline feature? Or has Microsoft been thinking about this for a while and Turing didn't quite meet the minimum spec they had in mind? Either way the 0.9 thing is silly.
 
In fairness, Turing has supported 98% of DX12U for the last 2 years. MS probably just figured that under those circumstances it would be crazy to lock the architecture out from full compliance on the basis of one feature that only meets 90% of their ideal criteria. It'd likely hamper the uptake of that feature too, given the significant Turing install base on PC.
 
Is it "90%" though? Surely the numbering isn't based on percentages and the limitations seem (for someone who doesn't really know about those texturing modes and whatnot much) quite strict, when tier 1 supports "all texture addressing modes, and feedback-writing methods are supported irrespective of the passed-in shader resource view."
 
Tier 0.9 sounds like Nvidia strong-arming: "Turing supports DX12 Ultimate, make it happen", at least if RDNA2 and Ampere have Tier 1 (or above, if there are more tiers).
I'm curious: what would Nvidia's leverage be in using strong-arm tactics against Microsoft?
 
Market share, and the fact that MS could release DX12U with at least one set of cards supporting it at launch if they bent their requirements?
 
And exactly that is required to happen. MS would be unable to successfully launch DX12U as a certification program, presented to the consumer by both developers and IHVs, if Nvidia declared "RTX cards are practically compatible regardless of that certification".

I mean, the motivation is to label "first generation", EOL DX12 cards as outdated by denying them the certification, pressuring consumers into upgrading, so there's a lot of motivation for the IHVs. And MS is in need of a new consumer-friendly baseline label which developers can use to communicate minimum system requirements, not least because MS needs to get rid of "RTX" as the label used to reference that generation of GPUs.

And a 90% rating is perfectly reasonable in this case. It's an unexpected set of limitations, but not enough to impair the usability of the feature. There's no real value added from a possible Tier 2, other than not having a surprise limitation.
 
I've made an update to the Direct3D12 feature checker tool to report the state of the hardware-accelerated GPU scheduler, graphics/compute preemption granularity, and the virtual memory model (GpuMmu/IoMmu), as well as new options in Windows Insider Preview "Iron" (build 20161).
Code:
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_ALWAYS_OFF (0)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
MeshShaderPipelineStatsSupported : 0
DirectML maximum feature level : DML_FEATURE_LEVEL_2_1 (0x2100)

For WDDM 2.9 drivers (like the AMD DirectX WSL2 beta or NVIDIA CUDA WSL2 preview), it will report the current OS state of the hardware-accelerated GPU scheduler (enabled or disabled), followed by the driver support state - always off (not supported), experimental support, stable support, or always on (the feature is required by the driver to operate).

For WDDM 2.7 drivers, there are three possible OS support states (enabled by default, enabled, or disabled) followed by a driver support status (supported or not supported) - the latter only shows when the OS state is disabled.
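For the curious, those states don't come from DXGI or D3D12 but from the kernel-mode thunks. A rough sketch of what the WDDM 2.7 query looks like (untested, error handling omitted; I'm assuming the D3DKMT_WDDM_2_7_CAPS layout from d3dkmthk.h and an adapterLuid taken from the DXGI adapter desc):
Code:
// Open the adapter by LUID, then query its WDDM 2.7 caps, which
// carry the hardware scheduling (HwSch) bits.
D3DKMT_OPENADAPTERFROMLUID open = {};
open.AdapterLuid = adapterLuid;
D3DKMTOpenAdapterFromLuid(&open);

D3DKMT_WDDM_2_7_CAPS caps27 = {};
D3DKMT_QUERYADAPTERINFO query = {};
query.hAdapter = open.hAdapter;
query.Type = KMTQAITYPE_WDDM_2_7_CAPS;
query.pPrivateDriverData = &caps27;
query.PrivateDriverDataSize = sizeof(caps27);
D3DKMTQueryAdapterInfo(&query);

// caps27.HwSchSupported / HwSchEnabled / HwSchEnabledByDefault map to
// the "supported / enabled / enabled by default" states described above.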
 
Results from the updated tool:

Code:
ADAPTER 0
"NVIDIA GeForce RTX 2080"
VEN_10DE, DEV_1E87, SUBSYS_1E8710DE, REV_A1
Dedicated video memory : 8010.0 MB (8399093760 bytes)
Total video memory : 40734.5 MB (42713178112 bytes)
Video driver version : 27.21.14.5148
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Hardware-accelerated scheduler : Enabled, supported
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 40
Adapter Node 0:    TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_5 (0x0065)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 2944
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY (3) (0b0000'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_3 (3)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_1 (11)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 16
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 8
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], CopyTensor [3][1][31], 2x2 Nearest neighbour Upsample [15][1][2], MVN (Mean Variance Normalization) [67][1][6], GEMM (General matrix multiply) [67][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6]

And the only "experimental" feature available in this driver is TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_4 (4)
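In case anyone wants to poke at that: experimental features have to be opted into before device creation, and as far as I know the OS only allows it with developer mode enabled. A hedged sketch, assuming the D3D12TiledResourceTier4 UUID that newer d3d12.h headers declare for this feature:
Code:
// Opt into the experimental tiled resources tier before creating the
// device; D3D12EnableExperimentalFeatures fails if the OS disallows it.
UUID experimental[] = { D3D12TiledResourceTier4 };
HRESULT hr = D3D12EnableExperimentalFeatures(
    _countof(experimental), experimental, nullptr, nullptr);
// Only after a successful call will CheckFeatureSupport report TIER_4.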
 
Code:
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Interesting - that's a bit finer-grained compared to GCN/RDNA:
Code:
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
 
But what's the difference between DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY and DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY?
 
From the following, they are increasingly finer levels of granularity for preemption:
https://docs.microsoft.com/en-us/wi...e-dxgi1_2-dxgi_compute_preemption_granularity

Code:
DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY      Indicates the preemption granularity as a compute packet.
DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY        Indicates the preemption granularity as a dispatch (for example, a call to the ID3D11DeviceContext::Dispatch method). A dispatch is a part of a compute packet.
DXGI_COMPUTE_PREEMPTION_THREAD_GROUP_BOUNDARY    Indicates the preemption granularity as a thread group. A thread group is a part of a dispatch.
DXGI_COMPUTE_PREEMPTION_THREAD_BOUNDARY          Indicates the preemption granularity as a thread in a thread group. A thread is a part of a thread group.
DXGI_COMPUTE_PREEMPTION_INSTRUCTION_BOUNDARY     Indicates the preemption granularity as a compute instruction in a thread.

The earlier entries in the list may contain other components or multiple instances of entries later in the list. A given level of granularity indicates what chunk of work the system must complete before it can switch to another context. The coarser units can take longer to drain, or can suffer from more pathological cases that can hinder or stop context switching.
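Incidentally, those two values are reported per adapter by DXGI rather than D3D12, which is presumably where the tool reads them from. A minimal sketch (untested) using IDXGIAdapter2::GetDesc2:
Code:
#include <dxgi1_2.h>

// Enumerate adapter 0 (as in the dumps above) and read its
// preemption granularities from DXGI_ADAPTER_DESC2.
IDXGIFactory2* factory = nullptr;
CreateDXGIFactory1(IID_PPV_ARGS(&factory));

IDXGIAdapter1* adapter = nullptr;
factory->EnumAdapters1(0, &adapter);

IDXGIAdapter2* adapter2 = nullptr;
adapter->QueryInterface(IID_PPV_ARGS(&adapter2));

DXGI_ADAPTER_DESC2 desc = {};
adapter2->GetDesc2(&desc);
// desc.GraphicsPreemptionGranularity and desc.ComputePreemptionGranularity
// hold the DXGI_*_PREEMPTION_* values shown in the tool output.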
 
From a semi-Turing:
Code:
ADAPTER 0
"NVIDIA GeForce GTX 1660"
VEN_10DE, DEV_2184, SUBSYS_11673842, REV_A1
Dedicated video memory : 5991.0 MB (6282018816 bytes)
Total video memory : 14157.8 MB (14845515776 bytes)
Video driver version : 27.21.14.5148
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Enabled
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PIXEL_BOUNDARY (3)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DISPATCH_BOUNDARY (1)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 40
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_5 (0x0065)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 1408
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY (3) (0b0000'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_3 (3)
BarycentricsSupported : 1
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_1_0 (10)
AdditionalShadingRatesSupported : 1
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_2 (2)
ShadingRateImageTileSize : 16
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_1 (10)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_0_9 (90)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 7
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], CopyTensor [3][1][31], MVN (Mean Variance Normalization) [67][1][6], GEMM (General matrix multiply) [67][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [91][5][6], MVN (Mean Variance Normalization) [91][5][6]
 
Kepler
Code:
"NVIDIA GeForce GTX TITAN"
VEN_10DE, DEV_1005, SUBSYS_84511043, REV_A1
Dedicated video memory : 6096.2 MB (6392315904 bytes)
Total video memory : 22448.7 MB (23539169280 bytes)
Video driver version : 27.21.14.5148
WDDM version : KMT_DRIVERVERSION_WDDM_2_7 (2700)
Hardware-accelerated scheduler : Disabled, not supported
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
Maximum feature level : D3D_FEATURE_LEVEL_11_0 (0xb000)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_NONE (0) (0b0000'0000)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_1 (1)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_2 (2)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 0
ROVsSupported : 0
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED (0)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 0
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_1 (1)
MaxGPUVirtualAddressBitsPerResource : 40
MaxGPUVirtualAddressBitsPerProcess : 40
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 0
HighestShaderModel : D3D12_SHADER_MODEL_6_5 (0x0065)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 32
TotalLaneCount : 2688
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_NOT_SUPPORTED (0)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY (3) (0b0000'0011)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY | VIDEO_DECODE | VIDEO_PROCESS | VIDEO_ENCODE (127) (0b0111'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 0
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 0
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_NOT_SUPPORTED (0)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED (0)
ShadingRateImageTileSize : 0
BackgroundProcessingSupported : 1
MeshShaderTier : D3D12_MESH_SHADER_TIER_NOT_SUPPORTED (0)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_NOT_SUPPORTED (0)
DirectML maximum feature level : DML_FEATURE_LEVEL_2_0 (0x2000)
Metacommands enumerated : 0

Also got a laptop with a Vega 8 and a GTX 1050 that I'll try it on whenever WU officially says the system is ready.
 