Direct3D feature levels discussion

DegustatoR · Nov 16, 2019

JoeJ said:
Do you know how this has improved with Volta / Turing?

Volta / Turing's async execution is similar to GCN / RDNA's. It's not that big of an uplift over Pascal though as NV's multiprocessors aren't idling too often in graphics in the first place.

JoeJ · Nov 16, 2019

DegustatoR said:
Volta / Turing's async execution is similar to GCN / RDNA's. It's not that big of an uplift over Pascal though as NV's multiprocessors aren't idling too often in graphics in the first place.

Sounds great

It really matters if your workloads are not like those huge brute force tasks seen in games

Ext3h · Nov 16, 2019

JoeJ said:
Actually, in Vulkan the number of queues is fixed, and for GCN i get one GFX+compute queue, and two pure compute queues, so i can only enqueue 3 different tasks concurrently.

Queues in Vulkan are not necessarily the same as queues on the GPU though, let alone the distinct engine types which are fed by each queue. They are merely to be read as the domain in which barriers are evaluated in submission order. As long as none of the synchronization constraints (barriers, semaphores) are violated, work may be scheduled wherever the driver sees fit. If the driver honors your choice by issuing work submitted on the 3D / compute / copy queue only to the specialized engines, then that's already voluntary.

Especially for work submitted to the copy queue, there are several cases where you actually end up with a kernel launch under the hood if the copy engine doesn't support the requested format conversion.
The other way around, submissions to the same logical queue in Vulkan may very well be split round-robin across several device-side queues in order to increase out-of-order execution depth, as long as no ordering constraints are violated. Except that, as far as I can tell, no Vulkan driver does this so far.

It mostly boils down to where the driver vendor draws the line between "implicit semantics the developer may assume even though not backed by the specification", and "optimization within bounds of specified observable effects".

JoeJ said:
If you enqueue multiple tasks to just one queue, but they have no dependencies on each other (no barriers in between), then GCN can and does run those tasks async. Likely also on DX11.

And that isn't even limited to GCN. What you are seeing there is pretty much just out-of-order execution for kernel launches from the same device-side queue, which is legal whenever there is no barrier in the queue. NVidia's GPUs do this as well, albeit historically with the catch that they could only do so for compatible kernel configurations, due to the design limitation of having to pre-configure SMs for a number of kernel properties and then getting stuck with that configuration. If that wasn't a feature, you couldn't have related features like efficient render passes for small geometry (-> no full device launch for the fragment shader) either.

Also, no, can't do that in DX11. Not unless you are strict about what you bind as write / read only, and everything you have bound as writable is distinct between launches. Two kernel launches having write access to the same resources imply a barrier. Only exception is for draw calls due to having the output merger in there giving defined behavior even for overlapping launches.
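
As a rough illustration of the rule described above, the sketch below models when two back-to-back launches would have to be serialized. It is a simplified model, not the actual D3D11 runtime logic: the `Launch` type and `implies_barrier` function are hypothetical names, the read-after-write and write-after-read checks are an assumption added on top of the write/write case named in the post, and the draw-call exception via the output merger is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Launch:
    """Hypothetical record of what one kernel launch has bound."""
    name: str
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)


def implies_barrier(first: Launch, second: Launch) -> bool:
    """Return True if `second` must wait for `first` in this simplified model.

    A barrier is implied when both launches can write the same resource,
    or when one launch reads a resource the other can write.
    """
    write_write = first.writes & second.writes
    read_after_write = first.writes & second.reads
    write_after_read = first.reads & second.writes
    return bool(write_write or read_after_write or write_after_read)


# Resource and kernel names are made up for the example.
a = Launch("blur", reads=frozenset({"color"}), writes=frozenset({"tmp0"}))
b = Launch("histogram", reads=frozenset({"color"}), writes=frozenset({"tmp1"}))
c = Launch("tonemap", reads=frozenset({"tmp0"}), writes=frozenset({"tmp1"}))

print(implies_barrier(a, b))  # distinct writable resources, shared read-only input
print(implies_barrier(a, c))  # c reads what a may write
```

In this model only `a` and `b` could overlap, which matches the "everything bound as writable is distinct between launches" condition.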

The other half, as mentioned, is that Vulkan queues are really just a synchronization domain:

The order that batches appear in pSubmits is used to determine submission order, and thus all the implicit ordering guarantees that respect it. Other than these implicit ordering guarantees and any explicit synchronization primitives, these batches may overlap or otherwise execute out of order.
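
To make the quoted rule a bit more concrete, here is a toy model of a single queue. It is deliberately simplified: every barrier is treated as a full barrier (real Vulkan barriers carry stage and access scopes), semaphores are ignored, and `overlap_groups` and the command names are invented for the example.

```python
def overlap_groups(commands):
    """Split one queue's commands, in submission order, at each barrier.

    Commands inside a group have no ordering constraint between them in
    this model, so an implementation would be free to overlap or reorder
    them. The groups themselves execute in order.
    """
    groups, current = [], []
    for cmd in commands:
        if cmd == "BARRIER":
            if current:
                groups.append(current)
            current = []
        else:
            current.append(cmd)
    if current:
        groups.append(current)
    return groups


queue = ["dispatch_a", "dispatch_b", "BARRIER", "dispatch_c", "copy_d"]
for index, group in enumerate(overlap_groups(queue)):
    print(index, group)
```

Whether a driver actually overlaps the commands within a group is its own decision; the model only shows what the ordering rules would permit.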

In comparison, it's actually funny that in NVidia's own CUDA API, you can't express that overlapped launches are legal within a single CUDA stream. Which is why they pretty much abandoned that "stream" concept recently in favor of explicit dependency annotations, too. These are then baked into chunks of kernels which can be launched together without barriers.
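
The idea of turning explicit dependency annotations into chunks that need no internal barriers can be sketched as a plain topological levelling. This does not use any CUDA API; `launch_waves` and the kernel names are hypothetical, and a real runtime may well group work differently.

```python
def launch_waves(dependencies):
    """Group kernels into waves that need no barrier inside a wave.

    `dependencies` maps each kernel name to the set of kernels it must
    wait for. Returns a list of sorted lists; each wave only depends on
    kernels in earlier waves (Kahn-style topological levels).
    """
    remaining = {k: set(v) for k, v in dependencies.items()}
    # Kernels that only appear as prerequisites have no dependencies.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())
    waves = []
    while remaining:
        ready = sorted(k for k, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for k in ready:
            del remaining[k]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves


graph = {
    "blur": set(),
    "histogram": set(),
    "tonemap": {"blur", "histogram"},
    "ui": set(),
    "present": {"tonemap", "ui"},
}
for wave in launch_waves(graph):
    print(wave)
```

A barrier is then only needed between waves, not between the kernels inside one.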

Per Lindstrom · Dec 15, 2019

Windows 1903, Radeon 5700 XT, Driver version 19.12.2, some errors.

Direct3D 12 feature checker (July 2019) by DmitryKo (x64)
https://forum.beyond3d.com/posts/1840641/

Windows 10 version 1909 (build 18363.535 19h1_release) x64

ADAPTER 0
"AMD Radeon RX 5700 XT"
VEN_1002, DEV_731F, SUBSYS_E4111DA2, REV_C1
Dedicated video memory : 8151.4 MB (8547397632 bytes)
Total video memory : 40886.3 MB (42872410112 bytes)
Video driver version : 26.20.15002.61
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0: TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1
Failed to query protected resource session type count
Error 80070057: The parameter is incorrect.

Failed to query protected resource session types
Error 80070057: The parameter is incorrect.

Failed to query maximum shader model
Error 80070057: The parameter is incorrect.

WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 64
TotalLaneCount : 2560
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE (15) (0b0000'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_1 (1)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_NOT_SUPPORTED (0)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED (0)
ShadingRateImageTileSize : 0
BackgroundProcessingSupported : 0
Failed to query feature data 7
Error 80070057: The parameter is incorrect.

Metacommands enumerated : 4
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [67][1][6], GEMM (General matrix multiply) [91][5][6]
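
A few of the encoded values in the report above can be decoded by hand. The sketch below shows one way to read the feature level (0xc100), a flags field such as ShaderCache.SupportFlags (15), and the byte counts. The helper names are hypothetical, and the bit values are assumed to follow the order in which the tool prints the flag names.

```python
def decode_feature_level(value):
    """Split a D3D_FEATURE_LEVEL value such as 0xc100 into (major, minor)."""
    return (value >> 12) & 0xF, (value >> 8) & 0xF


# Flag names as printed by the tool for ShaderCache.SupportFlags.
SHADER_CACHE_FLAGS = {
    0x1: "SINGLE_PSO",
    0x2: "LIBRARY",
    0x4: "AUTOMATIC_INPROC_CACHE",
    0x8: "AUTOMATIC_DISK_CACHE",
}


def decode_flags(value, names):
    """Return the names of all bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]


def bytes_to_mb(count):
    """Convert a byte count to the MB figure the tool prints (1 MB = 2**20 bytes)."""
    return round(count / 2**20, 1)


print(decode_feature_level(0xC100))
print(decode_flags(15, SHADER_CACHE_FLAGS))
print(bytes_to_mb(8547397632), bytes_to_mb(42872410112))
```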

DmitryKo · Dec 15, 2019

Per Lindstrom said:
Windows 1903, Radeon 5700 XT, Driver version 19.12.2, some errors.

These features are only available on Windows 20H1 - my tool assumes build 18363 belongs to 20H1 branch, but Microsoft assigned it to Windows 1909 since August... will fix it soon.
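
For illustration only, a build check along these lines would avoid the failed queries. This is not the tool's actual code: the function names are hypothetical, the 18363 and 19041 entries come from build numbers quoted in this thread, and 18362 for version 1903 is added from general knowledge. The cut-off is stricter than a tool targeting 20H1 Insider previews might want.

```python
# Release builds relevant to this thread; not an exhaustive table.
KNOWN_BUILDS = {
    18362: "1903",
    18363: "1909",
    19041: "2004 (20H1)",
}

# Assumption: first build on which the 20H1-only queries are expected to work.
FIRST_20H1_BUILD = 19041


def describe_build(build):
    """Return a label for a Windows 10 build number, or a fallback string."""
    return KNOWN_BUILDS.get(build, f"unknown build {build}")


def should_query_20h1_features(build):
    """Only attempt 20H1-only feature queries on builds that can answer them."""
    return build >= FIRST_20H1_BUILD


for build in (18363, 19041):
    print(build, describe_build(build), should_query_20h1_features(build))
```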

DavidGraham · Feb 28, 2020

Windows 10 20H1 (Version 2004) includes WDDM 2.7.

Available in Windows 10 Insider with Nvidia Driver 450.12, and Intel 27.20.100.7859, & AMD 27.20.1000.8009 or newer in insider builds starting from 10.0.19041.84.

Hardware-accelerated GPU scheduling: It allows the video card to manage its video memory directly, which in turn significantly improves minimum and average FPS and thereby reduces latency. It works regardless of the API used by games and applications (DirectX, Vulkan, OpenGL). (According to observations made before the release of Windows 10 version 2004, the option requires hardware support for Shader Model 6.3 or higher, which can be checked with AIDA64 but not with GPU-Z, as the latter displays unreliable information.) It is supported by Nvidia GeForce cards starting from the 10 series, as well as by Intel integrated graphics from HD 500 onward. It is not supported by AMD cards, since their shader model level has not been updated and remains at 5.1 in hardware and 6.2 for the latest cards, which is not enough to enable this option. It is also worth noting that forcing the option through registry keys has no effect on unsupported cards. It is also possible that this technology is related to the one described in this patent.
Shader Model 6.5
DirectX 12 Raytracing Tier 1.1
DirectX 12 Mesh Shader
DirectX 12 Sampler Feedback: Texture Streaming, Texture-Space Shading

https://en.wikipedia.org/wiki/Windows_Display_Driver_Model#WDDM_2.7

Malo · Feb 28, 2020

What hardware features in particular is AMD missing for shader model 6.3?

Kaotik · Feb 28, 2020

DavidGraham said:
Windows 10 20H1 (Version 2004) includes WDDM 2.7.

Available in Windows 10 Insider with Nvidia Driver 450.12, and Intel 27.20.100.7859, & AMD 27.20.1000.8009 or newer in insider builds starting from 10.0.19041.84.

Hardware-accelerated GPU scheduling: It allows the video card to manage its video memory directly, which in turn significantly improves minimum and average FPS and thereby reduces latency. It works regardless of the API used by games and applications (DirectX, Vulkan, OpenGL). (According to observations made before the release of Windows 10 version 2004, the option requires hardware support for Shader Model 6.3 or higher, which can be checked with AIDA64 but not with GPU-Z, as the latter displays unreliable information.) It is supported by Nvidia GeForce cards starting from the 10 series, as well as by Intel integrated graphics from HD 500 onward. It is not supported by AMD cards, since their shader model level has not been updated and remains at 5.1 in hardware and 6.2 for the latest cards, which is not enough to enable this option. It is also worth noting that forcing the option through registry keys has no effect on unsupported cards. It is also possible that this technology is related to the one described in this patent.

Hm? GCN 2.0 and newer have supported Shader Model 6.3 since the 18.10.1 drivers.

pharma · Feb 28, 2020

Malo said:
What hardware features in particular is AMD missing for shader model 6.3?

I imagine features related to DX12 raytracing acceleration. Not sure, but can AMD GPUs now utilize the DXR fallback layer if required?

JoeJ · Feb 28, 2020

pharma said:
I imagine features related to DX12 raytracing acceleration. Not sure, but can AMD GPUs now utilize the DXR fallback layer if required?

Guess no.
Just yesterday I installed the newest Visual Studio to try somebody's DX12 demo, but it failed at initializing the DXR fallback layer on a Vega 56 GPU.

Malo · Feb 28, 2020

pharma said:
I imagine features related to DX12 raytracing acceleration. Not sure, but can AMD GPUs now utilize the DXR fallback layer if required?

What does hardware-accelerated GPU scheduling have to do with raytracing capabilities?

pharma · Feb 28, 2020

Malo said:
What does hardware-accelerated GPU scheduling have to do with raytracing capabilities?

No direct correlation to raytracing capabilities, though I'd be surprised if hardware-accelerated raytracing was excluded from AMD's roadmap.

JoeJ · Feb 28, 2020

Malo said:
What does hardware-accelerated GPU scheduling have to do with raytracing capabilities?

I have been told NV's RTX was the sum of two things: fine-grained scheduling introduced with Volta, and Turing's RT cores.
The scheduling is likely there to switch between various programs like generation / hit shaders and recursion, and to reroute / shuffle rays between them to improve coherency, if that's a thing. Task shaders may also utilize it. (just guessing)

So, probably those scheduling options can be used for other things as well and MS is now utilizing this?
Mentioning video memory also hints at dynamic allocation at a finer, potentially programmable level, maybe?

It smells like there is a bunch of revolutionary options available, potentially fixing the second-class-citizen co-processor status that GPUs currently still have. But it has to be programmable and exposed...

Alessio1989 · Feb 28, 2020

DavidGraham said:
Windows 10 20H1 (Version 2004) includes WDDM 2.7.

Available in Windows 10 Insider with Nvidia Driver 450.12, and Intel 27.20.100.7859, & AMD 27.20.1000.8009 or newer in insider builds starting from 10.0.19041.84.

Hardware-accelerated GPU scheduling: It allows the video card to manage its video memory directly, which in turn significantly improves minimum and average FPS and thereby reduces latency. It works regardless of the API used by games and applications (DirectX, Vulkan, OpenGL). (According to observations made before the release of Windows 10 version 2004, the option requires hardware support for Shader Model 6.3 or higher, which can be checked with AIDA64 but not with GPU-Z, as the latter displays unreliable information.) It is supported by Nvidia GeForce cards starting from the 10 series, as well as by Intel integrated graphics from HD 500 onward. It is not supported by AMD cards, since their shader model level has not been updated and remains at 5.1 in hardware and 6.2 for the latest cards, which is not enough to enable this option. It is also worth noting that forcing the option through registry keys has no effect on unsupported cards. It is also possible that this technology is related to the one described in this patent.

Shader Model 6.5

DirectX 12 Raytracing Tier 1.1

DirectX 12 Mesh Shader

DirectX 12 Sampler Feedback: Texture Streaming, Texture-Space Shading

https://en.wikipedia.org/wiki/Windows_Display_Driver_Model#WDDM_2.7

stop posting this bullcrap. please..

pharma said:
I imagine features related to DX12 raytracing acceleration. Not sure, but can AMD GPUs now utilize the DXR fallback layer if required?

The fallback layer was primarily intended for debugging and development purposes and is no longer updated due to lack of interest from IHVs and third-party software developers.

Malo said:
What hardware features in particular is AMD missing for shader model 6.3?

Nothing related to WDDM 2.7, however it would be nice to have barycentrics support without AGS..

DavidGraham · Feb 28, 2020

Alessio1989 said:
stop posting this bullcrap. please..

???

Alessio1989 · Feb 28, 2020

DavidGraham said:
???

nothing against you... but those "statements" about hw scheduling are 90% fake.

DavidGraham · Feb 28, 2020

Alessio1989 said:
but those "statements" about hw scheduling are 90% fake

Could you elaborate more? Is the feature imaginary? Or does it do something entirely different from the above description?

Alessio1989 · Feb 28, 2020

DavidGraham said:
Could you elaborate more? Is the feature imaginary? Or does it do something entirely different from the above description?

There is a change (not publicly documented) in the GPU driver scheduler, but none of those claims are true or correlated.

All other new d3d12 features are already known and explained on the DirectX team dev blog: https://devblogs.microsoft.com/directx/

Malo · Feb 28, 2020

I'm sure we'll see an article on the usual clickbait sites soon about it.

DmitryKo · Apr 20, 2020

Direct3D feature level 12_2 is coming for Windows 20H2 (still missing from the most recent SDK 19592).

This will probably include individual features in DirectX12 Ultimate - that is DirectX Raytracing 1.1, mesh shaders, variable-rate shading, and sampler feedback - which shall be available in Windows 20H1 (build 19041) as well.
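
As a sketch of what such a combined check could look like, the snippet below tests a caps dictionary against the four features named above. Field names follow the feature checker output earlier in the thread where available; `MeshShaderTier` and `SamplerFeedbackTier` are hypothetical names for caps the July 2019 tool does not report. The minimum values are placeholders, since the post does not state which tiers would be required, apart from the raytracing entry, which assumes DXR 1.1 is encoded as 11.

```python
# Minimum values assumed for this sketch; see the notes above.
REQUIRED = {
    "RaytracingTier": 11,          # assumed encoding of raytracing tier 1.1
    "MeshShaderTier": 1,           # assumption: any supported tier
    "VariableShadingRateTier": 1,  # assumption: any supported tier
    "SamplerFeedbackTier": 1,      # assumption: any supported tier
}


def missing_features(caps):
    """Return the required caps that `caps` does not meet; absent keys count as 0."""
    return [name for name, minimum in REQUIRED.items()
            if caps.get(name, 0) < minimum]


# Values taken from the RX 5700 XT report earlier in the thread.
rx_5700_xt = {"RaytracingTier": 0, "VariableShadingRateTier": 0}
print(missing_features(rx_5700_xt))
```

With the values from that report, all four entries come back as missing.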
