Direct3D feature levels discussion

Volta / Turing's async execution is similar to GCN / RDNA's. It's not that big of an uplift over Pascal, though, as NV's multiprocessors don't sit idle very often in graphics workloads in the first place.
Sounds great :D
It really matters if your workloads are not like the huge brute-force tasks seen in games ;)
 
Actually, in Vulkan the number of queues is fixed, and on GCN I get one GFX+compute queue and two pure compute queues, so I can only enqueue three different tasks concurrently.
Queues in Vulkan are not necessarily the same as queues on the GPU, though, let alone the distinct engine types fed by each queue. They are merely to be read as the domain in which barriers are evaluated in submission order. As long as none of the synchronization constraints (barriers, semaphores) are violated, work may be scheduled wherever the driver sees fit. If the driver honors your choice by issuing work submitted on the 3D / compute / copy queue only to the specialized engines, it is doing so voluntarily.

Especially for work submitted to the copy queue, there are several cases where you actually end up with a kernel launch under the hood if the copy engine doesn't support the requested format conversion.
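As a toy illustration of the point that honoring the queue type is a driver choice, the sketch below routes each submission to a hardware engine. The engine names, the format strings and the rule that the copy engine only handles same-format copies are all assumptions made for the example; real drivers and hardware differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Work:
    queue: str            # logical queue used by the app: "graphics", "compute" or "copy"
    src_format: str = ""  # source/destination formats, only meaningful for copies
    dst_format: str = ""


def route(work: Work) -> str:
    """Toy driver policy: pick the hardware engine that will execute `work`."""
    if work.queue == "copy":
        # Assumption: the copy engine can only do raw copies, so any format
        # conversion is lowered to a shader (kernel) launch on the compute engine.
        return "copy_engine" if work.src_format == work.dst_format else "compute_engine"
    if work.queue == "compute":
        return "compute_engine"
    return "graphics_engine"


print(route(Work("copy", "R8G8B8A8_UNORM", "R16G16B16A16_FLOAT")))  # compute_engine
```

From the application's point of view the work still completes "on the copy queue"; only the engine that executes it changes.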
Conversely, submissions to the same logical queue in Vulkan may very well be split round-robin across several device-side queues in order to increase the out-of-order execution depth, as long as no ordering constraints are violated. As far as I can tell, though, no Vulkan driver does this so far.
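A minimal sketch of what such a round-robin split could look like, assuming a logical queue is just a list of dispatch names with a `BARRIER` marker in between. The function name and data layout are hypothetical; the point is that work between two barriers can be dealt out freely, while every barrier forces all device-side queues to drain before the next phase starts.

```python
BARRIER = None  # marks a barrier in the logical queue


def split_round_robin(logical_queue, num_device_queues):
    """Split one logical queue into phases of per-device-queue work lists.

    Dispatches between two barriers carry no ordering constraint, so they are
    dealt out round-robin. A barrier closes the current phase: all device queues
    must drain before the next phase starts, which preserves observable ordering.
    """
    phases = []
    current = [[] for _ in range(num_device_queues)]
    slot = 0
    for item in logical_queue:
        if item is BARRIER:
            phases.append(current)
            current = [[] for _ in range(num_device_queues)]
            slot = 0
        else:
            current[slot].append(item)
            slot = (slot + 1) % num_device_queues
    phases.append(current)
    return phases


print(split_round_robin(["a", "b", "c", BARRIER, "d"], 2))
# [[['a', 'c'], ['b']], [['d'], []]]
```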

It mostly boils down to where the driver vendor draws the line between "implicit semantics the developer may assume even though they are not backed by the specification" and "optimization within the bounds of the specified observable effects".

If you enqueue multiple tasks to just one queue, but they have no dependencies on each other (no barriers in between), then GCN can and does run those tasks async. Likely also on DX11.
And that isn't even limited to GCN. What you are seeing there is pretty much just out-of-order execution of kernel launches from the same device-side queue, which is legal whenever there is no barrier in the queue. NVidia's GPUs do this as well, albeit historically with the catch that they could only do so for compatible kernel configurations, due to the design limitation of having to pre-configure SMs for a number of kernel properties and then being stuck with that configuration. If this weren't possible, you couldn't have related features such as efficient render passes for small geometry (-> no full-device launch for the fragment shader) either.
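The out-of-order grouping described above can be sketched as a small model. `Launch`, its `config` string and `concurrent_windows` are hypothetical names; `require_same_config=True` crudely models the historical NVidia restriction by only letting launches with an identical configuration share a window, which simplifies "compatible" to "equal".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Launch:
    name: str
    config: str  # hypothetical stand-in for the per-SM setup a kernel requires


BARRIER = None  # marks a barrier in the queue


def concurrent_windows(queue, require_same_config=False):
    """Group launches from one queue into windows that may execute concurrently.

    Launches with no barrier between them may run out of order. With
    require_same_config=True, a launch whose config differs from the current
    window also starts a new window (a crude model of the older NVidia limit).
    """
    windows, current = [], []
    for item in queue:
        if item is BARRIER:
            if current:
                windows.append(current)
            current = []
        elif require_same_config and current and item.config != current[0].config:
            windows.append(current)
            current = [item]
        else:
            current.append(item)
    if current:
        windows.append(current)
    return [[launch.name for launch in window] for window in windows]


q = [Launch("a", "cfg1"), Launch("b", "cfg2"), Launch("c", "cfg2"), BARRIER, Launch("d", "cfg1")]
print(concurrent_windows(q))                            # [['a', 'b', 'c'], ['d']]
print(concurrent_windows(q, require_same_config=True))  # [['a'], ['b', 'c'], ['d']]
```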

Also, no, you can't do that in DX11. Not unless you are strict about what you bind as writable / read-only, and everything you have bound as writable is distinct between launches. Two kernel launches with write access to the same resources imply a barrier. The only exception is draw calls, because the output merger gives defined behavior even for overlapping launches.
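A simplified model of the DX11 rule above, assuming each launch is described by the sets of resources it binds read-only, writable (UAV-like) and as render targets. The names are hypothetical and real drivers track far more state; the sketch only encodes "writable bindings must be distinct between launches" plus the output-merger exception for draw calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Launch:
    kind: str                                # "dispatch" or "draw"
    reads: frozenset = frozenset()           # resources bound read-only
    writes: frozenset = frozenset()          # resources bound writable (UAV-like)
    render_targets: frozenset = frozenset()  # written via the output merger (draws)


def implies_barrier(a: Launch, b: Launch) -> bool:
    """Simplified model: must the driver serialize launches a and b?"""
    a_written = a.writes | a.render_targets
    b_written = b.writes | b.render_targets
    a_bound = a.reads | a_written
    b_bound = b.reads | b_written
    # Anything one launch may write and the other binds at all is a hazard.
    conflicts = (a_written & b_bound) | (b_written & a_bound)
    if a.kind == "draw" and b.kind == "draw":
        # Overlapping draws into the same render target are defined by the
        # output merger, so that overlap on its own needs no barrier.
        other_uses = a.reads | a.writes | b.reads | b.writes
        conflicts -= (a.render_targets & b.render_targets) - other_uses
    return bool(conflicts)


rt = frozenset({"rt"})
print(implies_barrier(Launch("draw", render_targets=rt), Launch("draw", render_targets=rt)))  # False
print(implies_barrier(Launch("dispatch", writes=rt), Launch("dispatch", writes=rt)))          # True
```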

The other half, as mentioned, is that Vulkan queues are really just a synchronization domain:
The order that batches appear in pSubmits is used to determine submission order, and thus all the implicit ordering guarantees that respect it. Other than these implicit ordering guarantees and any explicit synchronization primitives, these batches may overlap or otherwise execute out of order.
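To make the quoted rule concrete, the sketch enumerates the execution orders a queue could pick for three batches when the only explicit dependency is a semaphore wait of `C` on `A`. It is a simplification under stated assumptions: each batch is treated as indivisible (so partial overlap is not modelled), and the implicit ordering guarantees tied to submission order are ignored.

```python
from itertools import permutations


def valid_orders(batches, waits):
    """Execution orders allowed when only explicit waits constrain the queue.

    `waits` maps a batch to the batches it waits on (e.g. via semaphores).
    Simplifications: batches are indivisible (no partial overlap), and the
    implicit guarantees tied to submission order are not modelled.
    """
    allowed = []
    for order in permutations(batches):
        pos = {batch: i for i, batch in enumerate(order)}
        if all(pos[dep] < pos[batch] for batch, deps in waits.items() for dep in deps):
            allowed.append(order)
    return allowed


for order in valid_orders(["A", "B", "C"], {"C": ["A"]}):
    print(order)
# ('A', 'B', 'C')
# ('A', 'C', 'B')
# ('B', 'A', 'C')
```

Note that `B`, submitted after `A`, is still allowed to run first.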

In comparison, it's actually funny that in NVidia's own CUDA API, you can't express that overlapping launches are legal within a single CUDA stream. That is also why they have recently pretty much abandoned the "stream" concept in favor of explicit dependency annotations, which are then baked into chunks of kernels that can be launched together without barriers.
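One simple way to turn explicit dependency annotations into barrier-free chunks is to group the dependency graph into levels, sketched below with hypothetical kernel names. This is not meant to reflect how the CUDA runtime actually schedules graphs; it only shows why explicit edges let independent kernels be launched together, whereas a single in-order stream would serialize them.

```python
def launch_levels(deps):
    """Group a kernel dependency DAG into levels.

    `deps` maps every kernel (including ones without dependencies) to the kernels
    it depends on. All kernels in a level depend only on earlier levels, so each
    level can be launched as one chunk; a barrier is only needed between levels.
    """
    remaining = {kernel: set(d) for kernel, d in deps.items()}
    done, levels = set(), []
    while remaining:
        ready = sorted(k for k, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        levels.append(ready)
        done.update(ready)
        for k in ready:
            del remaining[k]
    return levels


print(launch_levels({"a": [], "b": [], "c": ["a"], "d": ["b", "c"]}))
# [['a', 'b'], ['c'], ['d']]
```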
 
Windows 10 1909, Radeon RX 5700 XT, driver version 19.12.2; some errors.
Direct3D 12 feature checker (July 2019) by DmitryKo (x64)
https://forum.beyond3d.com/posts/1840641/

Windows 10 version 1909 (build 18363.535 19h1_release) x64

ADAPTER 0
"AMD Radeon RX 5700 XT"
VEN_1002, DEV_731F, SUBSYS_E4111DA2, REV_C1
Dedicated video memory : 8151.4 MB (8547397632 bytes)
Total video memory : 40886.3 MB (42872410112 bytes)
Video driver version : 26.20.15002.61
Maximum feature level : D3D_FEATURE_LEVEL_12_1 (0xc100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT (2) (0b0000'0010)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_3 (3)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_3 (3)
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerResource : 44
MaxGPUVirtualAddressBitsPerProcess : 44
Adapter Node 0: TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0, IsolatedMMU: 1, HeapSerializationTier: 0, ProtectedResourceSession.Support: 1
Failed to query protected resource session type count
Error 80070057: The parameter is incorrect.

Failed to query protected resource session types
Error 80070057: The parameter is incorrect.

Failed to query maximum shader model
Error 80070057: The parameter is incorrect.
WaveOps : 1
WaveLaneCountMin : 32
WaveLaneCountMax : 64
TotalLaneCount : 2560
ExpandedComputeResourceStates : 1
Int64ShaderOps : 1
RootSignature.HighestVersion : D3D_ROOT_SIGNATURE_VERSION_1_1 (2)
DepthBoundsTestSupported : 1
ProgrammableSamplePositionsTier : D3D12_PROGRAMMABLE_SAMPLE_POSITIONS_TIER_2 (2)
ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE (15) (0b0000'1111)
CopyQueueTimestampQueriesSupported : 1
CastingFullyTypedFormatSupported : 1
WriteBufferImmediateSupportFlags : D3D12_COMMAND_LIST_SUPPORT_FLAG_DIRECT | BUNDLE | COMPUTE | COPY (15) (0b0000'1111)
ViewInstancingTier : D3D12_VIEW_INSTANCING_TIER_1 (1)
BarycentricsSupported : 0
ExistingHeaps.Supported : 1
MSAA64KBAlignedTextureSupported : 1
SharedResourceCompatibilityTier : D3D12_SHARED_RESOURCE_COMPATIBILITY_TIER_1 (1)
Native16BitShaderOpsSupported : 1
AtomicShaderInstructions : 0
SRVOnlyTiledResourceTier3 : 1
RenderPassesTier : D3D12_RENDER_PASS_TIER_0 (0)
RaytracingTier : D3D12_RAYTRACING_TIER_NOT_SUPPORTED (0)
AdditionalShadingRatesSupported : 0
PerPrimitiveShadingRateSupportedWithViewportIndexing : 0
VariableShadingRateTier : D3D12_VARIABLE_SHADING_RATE_TIER_NOT_SUPPORTED (0)
ShadingRateImageTileSize : 0
BackgroundProcessingSupported : 0
Failed to query feature data 7
Error 80070057: The parameter is incorrect.

Metacommands enumerated : 4
Metacommands [parameters per stage]: Conv (Convolution) [84][1][6], Conv (Convolution) [108][5][6], GEMM (General matrix multiply) [67][1][6], GEMM (General matrix multiply) [91][5][6]
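For readers decoding the dump above, a small helper sketch. The flag values are those of `D3D12_SHADER_CACHE_SUPPORT_FLAGS` and `D3D12_COMMAND_LIST_SUPPORT_FLAGS` (only the bits that appear here); that the checker's "MB" are binary megabytes (bytes / 2^20) is inferred from the printed numbers; and `0x80070057` splits into severity 1, facility 7 (FACILITY_WIN32) and code 87 (ERROR_INVALID_PARAMETER), i.e. E_INVALIDARG.

```python
# Bit values from the D3D12 headers (only the bits that appear in this dump).
SHADER_CACHE_SUPPORT = {
    0x1: "SINGLE_PSO",
    0x2: "LIBRARY",
    0x4: "AUTOMATIC_INPROC_CACHE",
    0x8: "AUTOMATIC_DISK_CACHE",
}
COMMAND_LIST_SUPPORT = {0x1: "DIRECT", 0x2: "BUNDLE", 0x4: "COMPUTE", 0x8: "COPY"}


def decode_flags(value, names):
    """List the names of all bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]


def to_mb(num_bytes):
    """Bytes -> binary megabytes with one decimal, as the checker appears to print them."""
    return round(num_bytes / 2**20, 1)


def split_hresult(hr):
    """Split an HRESULT into (severity, 11-bit facility, 16-bit code)."""
    return hr >> 31, (hr >> 16) & 0x7FF, hr & 0xFFFF


print(decode_flags(15, SHADER_CACHE_SUPPORT))
print(to_mb(8547397632), to_mb(42872410112))  # 8151.4 40886.3
print(split_hresult(0x80070057))              # (1, 7, 87)
```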
 
Windows 10 20H1 (Version 2004) includes WDDM 2.7.

Available in Windows 10 Insider builds starting from 10.0.19041.84, with Nvidia driver 450.12, Intel driver 27.20.100.7859, AMD driver 27.20.1000.8009, or newer.

  • Hardware-accelerated GPU scheduling: It allows the video card to directly manage its video memory[49], which significantly improves minimum and average FPS and thereby reduces latency. It works regardless of the API used by games and applications, such as DirectX, Vulkan or OpenGL. (According to observations made before the release of Windows 10 version 2004, the option requires hardware support for Shader Model 6.3 or higher, which can be checked with AIDA64 but not with GPU-Z, as the latter displays unreliable information.) It is supported by Nvidia GeForce cards starting from the 10 series, as well as by Intel integrated graphics from HD 500 onward. It is not supported by AMD cards, however, since their shader model level has not been updated and remains at 5.1 in hardware and 6.2 on the latest cards, which is not enough to enable this option. It is also worth noting that forcing the option on through registry keys has no effect on unsupported cards. It is also possible that this technology is associated with the description of this patent.
  • Shader Model 6.5
  • DirectX 12 Raytracing Tier 1.1
  • DirectX 12 Mesh Shader
  • DirectX 12 Sampler Feedback: Texture Streaming, Texture-Space Shading

https://en.wikipedia.org/wiki/Windows_Display_Driver_Model#WDDM_2.7
 
What hardware features in particular is AMD missing for shader model 6.3?
 
  • Hardware-accelerated GPU scheduling: [...] It is not supported by AMD cards, however, since their shader model level has not been updated and remains at 5.1 in hardware and 6.2 on the latest cards, which is not enough to enable this option. [...]
Hm? GCN 2.0 and newer have supported Shader Model 6.3 since the 18.10.1 drivers.
 
What does hardware-accelerated GPU scheduling have to do with raytracing capabilities?
No direct correlation to raytracing capabilities, though I'd be surprised if hardware-accelerated raytracing were excluded from AMD's roadmap.
 
What does hardware-accelerated GPU scheduling have to do with raytracing capabilities?
I have been told that NV's RTX is the sum of two things: the fine-grained scheduling introduced with Volta, and Turing's RT cores.
The scheduling is likely used to switch between the various programs, such as ray generation / hit shaders and recursion, and to reroute / shuffle rays between them to improve coherency, if that's a thing. Task shaders may also utilize it. (Just guessing.)
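The "shuffle rays to improve coherency" guess can be illustrated with a toy example: bin rays by the hit shader they need, then count how many distinct shaders each wave would have to run. Everything here (the shader IDs, a wave size of 4, and a plain sort instead of the bucketing a GPU would use) is an assumption for illustration, not a description of what Volta or Turing actually do.

```python
def distinct_shaders_per_wave(shader_ids, wave_size):
    """Number of different hit shaders each wave of `wave_size` rays must execute."""
    return [len(set(shader_ids[i:i + wave_size]))
            for i in range(0, len(shader_ids), wave_size)]


def bin_rays(shader_ids):
    """Ray indices reordered so rays that need the same hit shader are adjacent.

    sorted() is stable, so rays keep their original order within a bin.
    """
    return sorted(range(len(shader_ids)), key=lambda i: shader_ids[i])


# Example: 8 rays hitting three different materials, waves of 4 lanes (assumed).
ids = [0, 1, 2, 0, 1, 2, 0, 1]
print(distinct_shaders_per_wave(ids, 4))                              # [3, 3]
print(distinct_shaders_per_wave([ids[i] for i in bin_rays(ids)], 4))  # [2, 2]
```

Fewer distinct shaders per wave means less divergence, which is the kind of coherency gain such rerouting would be after.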

So, those scheduling options can probably be used for other things as well, and MS is now utilizing this?
Mentioning video memory also hints at dynamic allocation at a finer, potentially programmable level, maybe?

It smells like there are a bunch of revolutionary options available, potentially fixing the second-class-citizen coprocessor status that GPUs still have. But it has to be programmable and exposed...
 
Windows 10 20H1 (Version 2004) includes WDDM 2.7. [...]

https://en.wikipedia.org/wiki/Windows_Display_Driver_Model#WDDM_2.7
Stop posting this bullcrap, please..
I imagine features related to DX12 raytracing acceleration. Not sure, but can AMD GPUs now utilize the DXR fallback layer if required?
The fallback layer was primarily intended for debugging and development purposes, and it is no longer updated due to lack of interest from IHVs and third-party software developers.
What hardware features in particular is AMD missing for shader model 6.3?
Nothing related to WDDM 2.7; however, it would be nice to have barycentrics support without AGS.
 