Direct3D feature levels discussion

Nemo

Newcomer
Compute pipelines have no access to graphics states

ACE can operate in parallel with Graphics CP, ie ACE have access to shader engine with rasterizer via Graphics CP >> GDS for mixed mode.
 

Ryan Smith

Regular
Supporter
Ryan, are you planning on writing an article regarding feature level support of the various D3D12 supported architectures? It seems almost nobody outside of this forum even knows that D3D12 support != D3D12 feature level compliance. I think it'd get a lot of hits :p
Once I have some confirmed information out of the various GPU vendors, yes. Right now everyone is being very hush-hush, I cannot get confirmation on anything except that Maxwell 2 is FL12_1.
 

DmitryKo

Regular
AMD will update the driver later to implement support for tiled resources tier 3 on DX12, because GCN1.0 supports Texture3D as well.
I don't follow the logic.

Texture3D support for some resource formats does not automatically imply tiled resources support for these formats. These are three separate optional capabilities:

1. Texture3D is an optional resource type in Direct3D 10:
D3D10_FORMAT_SUPPORT
D3D11_FORMAT_SUPPORT
D3D12_FORMAT_SUPPORT1

2. Tiled resource is an optional resource type since Direct3D 11.2 :
D3D11_FORMAT_SUPPORT2
D3D12_FORMAT_SUPPORT2

3. Volume Tiled Resources (i.e. tiled resources with Texture3D support) in Direct3D 11.3 and 12.0 require GPUs conforming to Tiled Resources Tier 3
D3D11_FEATURE_DATA_D3D11_OPTIONS1
D3D11_TILED_RESOURCES_TIER
D3D12_FEATURE_DATA_D3D12_OPTIONS
D3D12_TILED_RESOURCES_TIER
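
To make the point about three separate capabilities concrete, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the real Direct3D structures; the only thing taken from the list above is that Texture3D format support, tiled format support, and the tiled resources tier are reported independently, and that volume tiled resources need Tier 3.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the three separate capability reports.
// These fields only mirror the idea of the real caps; they are NOT the API types.
struct FormatCaps {
    bool texture3d;                    // models a Texture3D bit in D3D12_FORMAT_SUPPORT1
    bool tiled;                        // models a tiled bit in D3D12_FORMAT_SUPPORT2
    std::uint32_t tiledResourcesTier;  // models D3D12_TILED_RESOURCES_TIER (0..3)
};

// Volume tiled resources need all three conditions at once;
// Texture3D support alone says nothing about the other two.
constexpr bool supportsVolumeTiledResources(const FormatCaps& caps) {
    return caps.texture3d && caps.tiled && caps.tiledResourcesTier >= 3;
}
```

With this model, a device that reports Texture3D and tiled support for a format but only Tier 2 still fails the check, which is the case being argued here.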

Tiers 2 and 3 require additional hardware features such as virtual memory/page tables and TLBs/caches, which are only accessible by the driver host, so this cannot be emulated with HLSL shaders.
It's like trying to implement protected mode flat addressing (IA32/i386) on a processor with real-mode segmented addressing (i8086).

Software Rasterization was planned for GCN1.0, but AMD can use ACE for Hardware Conservative Rasterization and ROVs.
Even if Conservative Rasterization and Rasterizer Ordered Views can be implemented with shaders, I honestly don't see how it is possible to program the fixed-function rasterizer stage from computing units...

You are quoting a shader-based "software" solution to occlusion culling, which involves rendering the scene in low-resolution, either on the CPU or by the compute pipeline. There are also algorithms to "emulate" conservative rasterization of triangles with vertex shaders.
If some application developers use these solutions in their applications, fine. But it's completely unrealistic to expect them to appear in a low-level WDDM 2.0 driver to substitute for missing Direct3D 11.3/12 hardware features.

ACE can operate in parallel with Graphics CP, ie ACE have access to shader engine with rasterizer via Graphics CP >> GDS for mixed mode.
ACE is a dedicated scheduler which operates independently of the CPU host (i.e. ExecuteIndirect and asynchronous Render/Compute/Copy).

Even if it can access the graphics pipeline, it cannot magically take the rasterizer block - that is, the rasterizer pipeline stage in between the vertex and pixel shader stages, which takes the triangles and viewports and generates pixels for pixel shaders to run on - and reprogram the rasterization algorithm if the rasterizer is not programmable in the first place.
 

DmitryKo

Regular
Ryan, are you planning on writing an article regarding feature level support of the various D3D12 supported architectures? It seems almost nobody outside of this forum even knows that D3D12 support != D3D12 feature level compliance. I think it'd get a lot of hits :p
Nobody even knows about feature levels in Direct3D 11 - and the fact that Windows Phone 8.x hardware is mostly feature level 9_1/9_3 and very rarely 10_1 or 11_0, so it's actually Direct3D 11.2 on top of feature level 9_1... guess which part is cited in most reviews? WDDM 2.0 now supports virtual CPU memory address space - which should be quite common on mobile GPU parts that lack their own dedicated graphics memory. This means that Windows Phone 10 devices could have WDDM 2.0 drivers and Direct3D 12 on top of feature level 9_1... my brain already hurts over this proposition.

Also, the publicly released Windows 10 SDK seems quite incomplete and so are the Direct3D 12 documentation, libraries and WDDM 2.0 drivers. While principal specs won't change at this stage, a lot of implementation details can and will change before the final release. The most recent pieces are still only available to DirectX 12 Early Access Program members, which only include established developers.

And what's more important, feature levels are not the only part of the equation since at least Direct3D 11.1 on Windows 8 which first introduced optional caps on all feature levels...
 

willardjuice

super willyjuice
Moderator
Veteran
Supporter
Kind of nitpicking I know, but all windows phone 8 devices are 9_3. I assume new devices will be at least 11_1 (adreno 4x0).
 

pTmdfx

Regular
ACE can operate in parallel with Graphics CP, ie ACE have access to shader engine with rasterizer via Graphics CP >> GDS for mixed mode.
Operating in parallel is one thing. Accessing graphics state is another thing, or why would you think the compute queues have to be separated from the graphics queue in the multi-engine model? Yeah, ACEs have access to the same set of shader engine, but rasterisers? I wouldn't be so sure.

Maybe "ACE can depend on part of the graphics pipe" gave you an illusion of ACEs and the dedicated compute pipeline having access to graphics state and fixed-function. Frankly, by context it is talking about forming a task graph, which is certainly possible with fences/barrier/special counters in memory or GDS. But none of your slides or the entire set has a word on what you might have thought it is. The compute pipelines are designed to bypass all the unnecessary states for compute from day one, anyway...
 

pjbliverpool

B3D Scallywag
Legend
I gave some of the info in my presentation at GDC: https://software.intel.com/sites/de...ndering-with-DirectX-12-on-Intel-Graphics.pdf

Note that what the driver returns at this point is fairly arbitrary across all implementations... that's literally just querying caps bits that the driver sets, it's not as if it's testing the features or anything so there are both cases where something that will be supported just isn't flicked on yet and other cases where things are set that may not even work yet. So while the stuff posted so far looks roughly accurate for those architectures (obviously tiled resources is not correct for GCN), do take it all with a grain of salt at this point :)

Anyways for Haswell/Broadwell it's roughly:
- Feature level 11_1
- Tier 1 binding
- ROVs, doubles, OM logic ops are supported
- No conservative raster, additional typed UAV formats, standard swizzle or ASTC
- Half precision (fp16) is supported on Broadwell, but not Haswell

Stuff I don't remember for sure off the top of my head:
- I believe both will ultimately support Tier 1 tiled resources, but may not be reported yet
- PS specified stencil ref is probably not supported on Haswell, don't remember if it is on Broadwell

In any case it's the basic set of ~DX11-level features + ROVs on Haswell/Broadwell. Those architectures obviously predate the interesting design changes in DX12 so it's more a question of fitting the new API onto existing hardware than designing hardware for the new API (ex. see what we have to do with resource binding in the presentation above). Definitely stay tuned and check again in the near future once new architectures come out :)

Awesome, thanks!
 

DmitryKo

Regular
So to confirm your assertion that ACEs can control the rasterizer, you provide a diagram from the GCN Architecture whitepaper where ACEs actually bypass the rasterizer?

Again, from all we know, rasterizer is a fixed-function unit controlled from the CPU host driver with parameters set by D3D11_RASTERIZER_DESC2 or D3D12_RASTERIZER_DESC, which is a part of the pipeline state D3D12_GRAPHICS_PIPELINE_STATE_DESC in Direct3D12.

Nothing in the publicly available GCN developer documentation suggests that the rasterizer stage is fully programmable in any way similar to the general purpose shader processors.

The rasterizer stage simply takes vertex data prepared by geometry/vertex/hull/domain shaders and generates pixel data for pixel shaders. It's just a GPU/API design peculiarity that all shader programs have the same SM5.0 limits and HLSL commands and run on the same shared programmable processing cores in the GPU.
 

DmitryKo

Regular
It looks like GCN 1.1/1.2 and Xbox One do support feature level 12_0 - i.e. at least Resource Binding Tier 2, Tiled Resources Tier 2, and Typed UAV Load with additional formats!

AMD released WDDM 2.0 driver 15.200.1018.1. Compared to 15.200.1012.2, the new driver reports Tiled Resources Tier 2 (previously None) and 38 bits for Max Virtual Address Bits Per Resource (previously 31 bits) on my Radeon R9 290X.

There is a minor fix to the command-line tool (it didn't report Virtual Address Bits); the updated Win32 executable file and C source code are in a ZIP file attached to the post below.

Code:
ADAPTER 0
"AMD Radeon R9 200 Series (Engineering Sample - WDDM v2.0)"
VEN_1002, DEV_67B0, SUBSYS_30801462, REV_00
Dedicated video memory : 3221225472  bytes
Direct3D 12 is supported
Maximum feature level :  D3D_FEATURE_LEVEL_11_1 (0xb100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_NONE (0)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_2 (2)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 0
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_NOT_SUPPORTED (0)
MaxGPUVirtualAddressBitsPerResource : 38
StandardSwizzle64KBSupported : 0
ASTCProfile : D3D12_ASTC_PROFILE_NOT_SUPPORTED (0)
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
Adapter Node 0:         TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0
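
As a rough cross-check of the claim that this report lines up with feature level 12_0, the sketch below tests only the three minimums named in this post (Resource Binding Tier 2, Tiled Resources Tier 2, Typed UAV Load with additional formats). The struct and function names are hypothetical, and the complete feature level 12_0 definition may include further requirements that are not modelled here.

```cpp
// Hypothetical container for three values copied from a feature report.
struct ReportedCaps {
    int tiledResourcesTier;              // e.g. 2 for D3D12_TILED_RESOURCES_TIER_2
    int resourceBindingTier;             // e.g. 3 for D3D12_RESOURCE_BINDING_TIER_3
    bool typedUAVLoadAdditionalFormats;  // 1 in the report means supported
};

// Checks ONLY the minimums listed in the post, not the full 12_0 definition.
constexpr bool meetsPostedFL12_0Minimums(const ReportedCaps& c) {
    return c.resourceBindingTier >= 2 &&
           c.tiledResourcesTier >= 2 &&
           c.typedUAVLoadAdditionalFormats;
}
```

Fed with the values from the report (tiers 2 and 3, typed UAV load 1) the check passes; with the older driver's Tiled Resources Tier of None (0) it would not.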
 
Hello.

Thanks all for this very interesting discussion.

Dmitry, on your last report you're talking about your R9 290X but the video memory is 3221225472 bytes. This is a 280X?
 

DmitryKo

Regular
you're talking about your R9 290X but the video memory is 3221225472 bytes. This is a 280X?

No, MSI R9 290X Gaming 4G, it's just running a 32-bit EXE.

The way DirectX reports video memory has always been tricky because the video card can allocate and access system memory over PCI/AGP/PCIe bus, so total video memory is counted as a sum of three pools: "dedicated video memory + dedicated system memory + shared system memory".
"Dedicated video" is complete onboard video memory mapped into virtual address space and shared over the PCIe bus, but this virtual address space is limited to 4GB in the 32-bit (x86) builds.

On my system, AMD Catalyst reports 3GB as "dedicated video" and ~1023.94MB as "shared system" memory in a 32-bit build of the tool; in x64 build there are ~4073.5MB of "dedicated video" and ~11.92GB of "shared system" memory (on a 16 GB machine).
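
The three-pool sum is easy to verify with a little arithmetic. This is only a sketch: the struct and function names are hypothetical, and the shared pool's byte count (1073676288) is an assumption chosen to match the ~1023.94 MB figure quoted for the 32-bit build.

```cpp
#include <cstdint>

// Hypothetical grouping of the three pools described above.
struct VideoMemoryPools {
    std::uint64_t dedicatedVideo;   // onboard video memory
    std::uint64_t dedicatedSystem;  // system memory reserved for the GPU
    std::uint64_t sharedSystem;     // system memory the GPU may borrow
};

// Total video memory is counted as the sum of all three pools.
constexpr std::uint64_t totalVideoMemory(const VideoMemoryPools& p) {
    return p.dedicatedVideo + p.dedicatedSystem + p.sharedSystem;
}

// Converts bytes to binary megabytes (1 MB = 1024 * 1024 bytes).
constexpr double toMegabytes(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

With 3221225472 bytes dedicated (3072 MB, i.e. 3 GB), nothing in the dedicated system pool and the assumed 1073676288 bytes shared, the total stays just under the 4 GB limit of a 32-bit address space.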

Intel Graphics reports 32 MB of "dedicated" and 1632 MB (?!!) of "shared system" memory - though Intel CPUs do not have dedicated video memory of their own, and both of these pools are actually allocated in system memory.

WARP12 reports 4095.94MB of "shared system" memory in 32-bit builds and 8134.3 MB in x64 builds.

"Dedicated system" memory pool seems to only be allocated on UMA notebook platforms with dedicated discrete GPUs with no onboard video memory; it's not allocated for desktop discrete cards with onboard memory or integrated CPU graphics.

PS: update May 1 2015
Direct3D12 MSDN documentation has been updated with the release of Visual Studio 2015 RC with Windows SDK build 10069, which requires the latest Windows 10 build 10074. There are some minor changes to the API - the SDK now defines feature levels 12_0 and 12_1, and feature levels 9_x and 10_x now fail to create on any device in Direct3D 12.
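
For readers decoding the hex values the tool prints next to feature levels (0xb100, 0xc100 and so on), here is a small sketch. It relies on the D3D_FEATURE_LEVEL constants packing the major number into bits 12-15 and the minor number into bits 8-11; the function names are made up for illustration, and the 11_0 cut-off simply restates the note above that 9_x and 10_x no longer create in Direct3D 12.

```cpp
#include <cstdint>
#include <string>

// Turns a D3D_FEATURE_LEVEL value such as 0xb100 into a "major_minor" string.
std::string featureLevelName(std::uint32_t value) {
    const std::uint32_t major = (value >> 12) & 0xF;  // 0xb -> 11, 0xc -> 12
    const std::uint32_t minor = (value >> 8) & 0xF;   // 0x1 -> 1
    return std::to_string(major) + "_" + std::to_string(minor);
}

// Per the note above, device creation fails below feature level 11_0 (0xb000).
bool creatableOnD3D12(std::uint32_t value) {
    return value >= 0xb000;
}
```

So featureLevelName(0xb100) gives "11_1" and featureLevelName(0xc100) gives "12_1", while a 10_1 value (0xa100) is rejected by creatableOnD3D12.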

Unfortunately some things are broken in Windows 10 build 10074 and Windows SDK build 10069. A Direct3D12 device can't be created on the R290X anymore, returning DXGI_ERROR_UNSUPPORTED/E_INVALIDARG - so there are probably some breaking changes which require updated drivers, though maybe someone would have better luck with Haswell/Broadwell or Kepler/Maxwell. DxCapsView performs erratically, now reporting Conservative Rasterization support (???) but only under the x64 version, not x86, and not reporting PS-Specified Stencil Ref anymore. No support for reporting Direct3D12 features either.
 

DmitryKo

Regular
Direct3D 12 feature checker (August 2022) by DmitryKo ( Agility SDK v.706, Direct3D12 SDK v.607)

D3D12CheckFeatureSupport.exe is a Windows 10 console app which calls D3DKMTQueryAdapterInfo, IDXGIAdapter4::GetDesc3 and ID3D12Device::CheckFeatureSupport to check the supported Direct3D 12 options for every graphics adapter in the system.

Executable file and C++ source code are included in the ZIP archive below.

Usage:
Run checkfeatures_agile.cmd to write a list of supported feature options to D3D12FeatureOptions.txt and D3D12FeatureOptions_exp.txt (enable the Developer mode in Windows Settings - Update & Security - For developers).

Redistributable runtime
Windows 10 version 1909 (build 18363.1350) and later support redistributable runtime from the Direct3D 12 Agility SDK; download the current Agility SDK 1706.4 Preview runtime and extract d3d12core.dll from bin\x64 (or bin\arm64) folder of the NuGet package to the D3D12\ subfolder.

NuGet package file (.NUPKG) is a ZIP format archive which can be opened by NuGet Package Explorer, 7-Zip, WinRAR, File Explorer etc.

Requires Visual C++ Redistributable for Visual Studio 2015-2022 - look under Downloads - Other Tools, Frameworks and Redistributables (direct download links: x64 ARM64).

What's new
July 31, 2019
Options 7 (mesh shader tier, sampler feedback tier), metacommand parameters, shader model 6_6, raytracing tier 1_1, shared resource compatibility tier 2 in Windows 10 version 2004 (build 19041).
May 5, 2020
DirectML feature level, sampler feedback texture formats and resource flags in version 2004 (build 19041). Options 8 in Windows 10 Manganese (build 19619). WDDM version and driver version for all adapters. Support Direct3D 12on7.
July 10, 2020
Options 9 and shader model 6_7 in Windows 10 Iron (build 20165), virtual memory model, scheduler preemption granularity and state of hardware-accelerated GPU scheduler.
April 21, 2021
Options 10, 11, displayable texture in Windows 10 Cobalt (build 21359); minor fixes.
August 31, 2022
Options 12, 13 in Windows 11 Nickel (build 22621), options 14, 15, 16, shader model 6_8 in Copper (build 25193); minor fixes.

Sample output
Code:
Direct3D 12 feature checker (August 2022) by DmitryKo (x64) (Agility SDK v706)
https://forum.beyond3d.com/posts/1840641/

Windows 10 version Dev (build 21390.2025 co_release) x64

ADAPTER 0
"AMD Radeon RX 5700 XT"
VEN_1002, DEV_731F, SUBSYS_05771043, REV_C1
Dedicated video memory : 8150.6 MB (8546471936 bytes)
Total video memory : 24496.6 MB (25686552576 bytes)
BIOS string : 115-D199PI0-101
Video driver version : 31.0.12019.15004
WDDM version : KMT_DRIVERVERSION_WDDM_3_0 (3000)
Virtual memory model : GPUMMU
Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_ALWAYS_OFF (0)
GraphicsPreemptionGranularity : DXGI_GRAPHICS_PREEMPTION_PRIMITIVE_BOUNDARY (1)
ComputePreemptionGranularity : DXGI_COMPUTE_PREEMPTION_DMA_BUFFER_BOUNDARY (0)
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1, ProtectedResourceSessionTypeCount: 1 D3D12_PROTECTED_RESOURCES_SESSION_HARDWARE_PROTECTED
HighestShaderModel : D3D12_SHADER_MODEL_6_7 (0x0067)
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 64
TotalLaneCount : 2560
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE | DRIVER_MANAGED_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (127) (0b0111'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_2 (2)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_NOT_SUPPORTED (0)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED (0)
ShadingRateImageTileSize : 0
BackgroundProcessingSupported : 0
MeshShaderTier : D3D12_MESH_SHADER_TIER_NOT_SUPPORTED (0)
SamplerFeedbackTier : D3D12_SAMPLER_FEEDBACK_TIER_NOT_SUPPORTED (0)
UnalignedBlockTexturesSupported : 1
MeshShaderPipelineStatsSupported : 0
MeshShaderSupportsFullRangeRenderTargetArrayIndex : 0
AtomicInt64OnTypedResourceSupported : 1
AtomicInt64OnGroupSharedSupported : 1
DerivativesInMeshAndAmplificationShadersSupported : 0
WaveMMATier : D3D12_WAVE_MMA_TIER_NOT_SUPPORTED (0)
VariableRateShadingSumCombinerSupported : 0
MeshShaderPerPrimitiveShadingRateSupported : 0
AtomicInt64OnDescriptorHeapResourceSupported : 1
DisplayableTexture : 0
DisplayableTexture.SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_0 (0)
MSPrimitivesPipelineStatisticIncludesCulledPrimitives : 0
EnhancedBarriersSupported : 0
RelaxedFormatCastingSupported : 0
UnrestrictedBufferTextureCopyPitchSupported : 1
UnrestrictedVertexElementAlignmentSupported : 1
InvertedViewportHeightFlipsYSupported : 1
InvertedViewportDepthFlipsZSupported : 1
TextureCopyBetweenDimensionsSupported : 1
AlphaBlendFactorSupported : 1
AdvancedTextureOpsSupported : 1
WriteableMSAATexturesSupported : 1
IndependentFrontAndBackStencilRefMaskSupported : 1
TriangleFanSupported : 0
DynamicIndexBufferStripCutSupported : 0
Metacommands enumerated : 4
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [67][1][6], GEMM (General matrix multiply) [91][5][6]
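
The flag fields in the output above are shown both as a decimal value and as a binary mask such as (127) (0b0111'1111). The sketch below reproduces that notation for an 8-bit mask; it is an illustration with a hypothetical function name, not the tool's actual code.

```cpp
#include <cstdint>
#include <string>

// Formats an 8-bit flag mask as "0bXXXX'XXXX", most significant bit first,
// with an apostrophe separating the two nibbles.
std::string formatMask8(std::uint8_t mask) {
    std::string out = "0b";
    for (int bit = 7; bit >= 0; --bit) {
        out += ((mask >> bit) & 1) ? '1' : '0';
        if (bit == 4) {
            out += '\'';  // nibble separator after the upper four bits
        }
    }
    return out;
}
```

Each set bit corresponds to one flag name in the list, so 15 (four command list types) formats as 0b0000'1111 and 127 (seven shader cache flags) as 0b0111'1111.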
 

Attachments

  • FormatSupport_XLSX.zip
    64.3 KB · Views: 10
  • D3D12CheckFeatureSupport.zip
    83.7 KB · Views: 5
  • D3D12CheckFeatureSupport_ARM64.zip
    40.3 KB · Views: 5

Mindtaker

Newcomer
Direct3D12 MSDN documentation has been updated with the release of Visual Studio 2015 RC with Windows SDK build 10069, which requires the latest Windows 10 build 10074. There are some minor changes to the API and the SDK now defines feature levels 12_0 and 12_1.

Unfortunately many things are broken now - a Direct3D12 device can't be created on the R290X anymore, returning E_INVALIDARG. DxCapsView performs erratically, now reporting Conservative Rasterization support (???) but only under the x64 version, not x86, and not reporting PS-Specified Stencil Ref anymore. Also feature levels 9_x and 10_x fail to create on any adapter. So there are probably some breaking changes which require updated drivers, but maybe someone would have better luck with Haswell/Broadwell or Kepler/Maxwell.

Code:
Direct3D 12 Feature Checker (May 2015) by DmitryKo
https://forum.beyond3d.com/posts/1838269/

ADAPTER 0
"NVIDIA GeForce GTX 980"
VEN_10DE, DEV_13C0, SUBSYS_236819DA, REV_A1
Dedicated video memory : 3221225472  bytes
Total video memory : 4294901760  bytes
Created Direct3D 12 device at feature level 11_0

Maximum feature level : D3D_FEATURE_LEVEL_11_1 (0xb100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_NONE (0)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_2 (2)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_1 (1)
MaxGPUVirtualAddressBitsPerResource : 38
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 0
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
Adapter Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0

ADAPTER 1
"Microsoft Basic Render Driver"
VEN_1414, DEV_008C, SUBSYS_00000000, REV_00
Dedicated video memory : 0  bytes
Total video memory : 4276551680  bytes
Failed to create Direct3D 12 device at feature level 11_0
Error 887A0004: The specified device interface or feature level is not supported on this system.

FINISHED running on 2015-05-02 15:37:12
2 display adapters enumerated
 