Direct3D feature levels discussion

Programmable sample positions. This. Finally.
My Siggraph 2015 presentation wish list (http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf) is getting smaller every day. Still missing SV_Barycentric (pixel shader) and a shader language with generics/templates.

And where's my UpdateTileMappingsIndirect! Tiled resources (and 3D tiled resources) are much less useful when you can't change tile mappings on the GPU side. Still need to do software indirection for virtual shadow mapping :(
 
Something is moving on the DXIL wish-list/roadmap...
 
I was on the DX12 advisory board when I was still working at Ubisoft... Now I am just waiting for new info patiently like everybody else :(
They also no longer share info in advance like they did in the EAP :(
But I really appreciate the fact that the new compiler is open source on GitHub.
 
That sucks... but the new open source HLSL compiler on GitHub is a very good thing indeed. Hopefully there will be a steady stream of language improvements in the future. DirectCompute has practically not changed a bit in 8 years! CUDA is now even further ahead than it was at DirectCompute's launch. There are plenty of good, properly tested and optimized CUDA libraries for various tasks, but practically no DirectCompute libraries at all. Without any support for generics/templates, it is painful to write general-purpose algorithms and data structures. This is the main reason there are no libraries for DirectCompute.

Even simple stuff like passing a reference to a groupshared memory array (to a function) is missing in DirectCompute. You need to hardcode the access to groupshared memory inside each function (access groupshared globals directly), and copy the function if you want to do the same operation to another groupshared memory array. This is awful when you are programming reductions (prefix scan, averages/sums, etc.). And there's no way to reuse the same groupshared memory region either (overlap arrays in groupshared memory): two groupshared memory arrays both reserve their own space, even when you never use them at the same time. These two limitations combined make even simple things such as using functions hard in HLSL compute shaders. Copy & paste and macro hacks are often the only solutions. We need something better soon.
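A minimal HLSL sketch of the limitation (the names are mine): the same reduction has to be written twice, because a function cannot take a groupshared array as a parameter, and the two arrays cannot alias the same storage.

groupshared float g_SumsA[64];
groupshared float g_SumsB[64]; // reserves its own space even if it is never live at the same time as g_SumsA

// Reduction hardcoded against g_SumsA. The caller must have written the
// initial 64 values and issued a group sync before calling this.
void ReduceSumA(uint tid)
{
    for (uint stride = 32; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            g_SumsA[tid] += g_SumsA[tid + stride];
        GroupMemoryBarrierWithGroupSync(); // legal: the loop is group-uniform
    }
    // result ends up in g_SumsA[0]
}

// Identical logic, copy & pasted only to target the other array.
void ReduceSumB(uint tid)
{
    for (uint stride = 32; stride > 0; stride >>= 1)
    {
        if (tid < stride)
            g_SumsB[tid] += g_SumsB[tid + stride];
        GroupMemoryBarrierWithGroupSync();
    }
}

In CUDA the body would be written once, as a template over a shared-memory pointer.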

OpenCL 2.1 added a C++-based shading language with modern features. Vulkan uses the same SPIR-V intermediate language as OpenCL 2.1+, and there have been discussions since the Vulkan launch about a modern C++-based shading language. When that happens, DirectX (DirectCompute) would be left as the only relevant modern API without a modern C++-based shading language: Metal, OpenCL and CUDA already have one, and soon Vulkan will too.
 
Vulkan and OpenCL don't really share SPIR-V: there are two profiles, and Vulkan doesn't accept the profile used by OpenCL (logical vs. physical memory model). That is not to say the problem isn't being worked on, but currently just having a C++ -> SPIR-V compiler that works for OpenCL isn't enough for Vulkan to gain support.
 
There are differences, but OpenCL and Vulkan compute shaders try to solve the same problem and use the same intermediate language (albeit not 100% compatible). We already have a modern C++ compiler for OpenCL that produces SPIR-V code, and we eventually want an identical feature set for Vulkan compute shaders. I don't know what the main reason was for choosing a different memory model for Vulkan compute shaders versus OpenCL 2.1, but I would argue that the OpenCL (and CUDA) model is better for compute shaders, so I'd prefer Vulkan compute shaders to switch to the physical memory model as well (eventually). There is no reason to limit compute shader functionality just because you want simplicity in pixel/vertex shaders. Compute needs more flexibility and better language constructs for generic algorithms and data structures.
 
The updated MSDN documentation is live:

Windows 10, version 1703

The following topics have been added to the Direct3D documentation for Windows 10, version 1703:

Shader Model 6.0 is currently an experimental feature which needs to be explicitly enabled using the D3D12EnableExperimentalFeatures function.

https://github.com/Microsoft/DirectXShaderCompiler/wiki/FAQ

What is Experimental mode, and how do I enable experimental features?

Experimental mode is a new feature of Direct3D in Windows 10. It lets software developers collaborate with each other and with IHVs on prototyping new features in GPU drivers. Here is how to access it:

  1. Turn on Developer Mode in your OS:
    Settings -> Update & Security -> For Developers -> (*) Developer Mode
  2. Enable an experimental-mode feature in your app by calling this routine before calling CreateDevice():
    D3D12EnableExperimentalFeatures(1, &D3D12ExperimentalShaderModels, NULL, NULL);
  3. Acquire a driver (or software renderer) that supports experimental mode

d3d12.h
HRESULT WINAPI D3D12EnableExperimentalFeatures(
    UINT NumFeatures,
    _In_ const IID *pIIDs,
    _In_ void *pConfigurationStructs,
    _In_ UINT *pConfigurationStructSizes
);

// --------------------------------------------------------------------------------------------------------------------------------
// Experimental Feature: D3D12ExperimentalShaderModels
//
// Use with D3D12EnableExperimentalFeatures to enable experimental shader model support,
// meaning shader models that haven't been finalized for use in retail.
//
// Enabling D3D12ExperimentalShaderModels needs no configuration struct; pass NULL in the pConfigurationStructs array.
//
// --------------------------------------------------------------------------------------------------------------------------------

static const UUID D3D12ExperimentalShaderModels = { /* 76f5573e-f13a-40f5-b297-81ce9e18933f */
    0x76f5573e,
    0xf13a,
    0x40f5,
    { 0xb2, 0x97, 0x81, 0xce, 0x9e, 0x18, 0x93, 0x3f }
};
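Putting the pieces above together, here's a minimal usage sketch (my own example, not from the FAQ; the function name is made up). It assumes Developer Mode is on and a driver with experimental-mode support is installed.

#include <d3d12.h>

// Create a device with experimental shader models (SM6) enabled.
// Must run before any D3D12 device exists in the process.
bool CreateDeviceWithExperimentalSM6(ID3D12Device** outDevice)
{
    // SM6 needs no configuration struct, so the last two arguments are null.
    HRESULT hr = D3D12EnableExperimentalFeatures(
        1, &D3D12ExperimentalShaderModels, nullptr, nullptr);
    if (FAILED(hr))
        return false; // e.g. Developer Mode is off, or OS/driver too old

    hr = D3D12CreateDevice(nullptr, // default adapter
                           D3D_FEATURE_LEVEL_11_0,
                           IID_PPV_ARGS(outDevice));
    return SUCCEEDED(hr);
}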
 
So, there is also the depth-bounds test. Maybe a little too late, but it could still be useful in some cases. Most AMD and NVIDIA graphics cards should support it. Not sure about Intel.
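For reference, the Creators Update exposes it through a cap bit and a new command-list interface; a sketch under the assumption that a device and a command list already exist:

// Check the cap bit first.
D3D12_FEATURE_DATA_D3D12_OPTIONS2 options2 = {};
device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS2, &options2, sizeof(options2));

if (options2.DepthBoundsTestSupported)
{
    // The test lives on ID3D12GraphicsCommandList1: fragments whose stored
    // depth falls outside [0.25, 0.75] are discarded.
    ID3D12GraphicsCommandList1* cmdList1 = nullptr;
    if (SUCCEEDED(cmdList->QueryInterface(IID_PPV_ARGS(&cmdList1))))
    {
        cmdList1->OMSetDepthBounds(0.25f, 0.75f);
        cmdList1->Release();
    }
}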
 
As I understand it, the reason for Vulkan compute getting the logical memory model was concerns about validation, particularly among mobile and web clients. It's also worth noting that Khronos can sometimes be 'political': if Vulkan does everything OpenCL does, is OpenCL obsolete? And then you get all the fun of people protecting turf, etc. I've heard that getting any compute into Vulkan at all was hard, similar to why it came relatively late to GL; the orthodoxy is/was that CL is for compute and Vulkan is just for graphics, so why blur the lines?!

Personally I agree 100% that we need physical memory model Vulkan compute ASAP.
 
The only problem is that the lines were already blurred a long time ago. Nowadays all modern AAA renderers use a lot of compute shaders. There are games that are almost pure compute (see Media Molecule's game "Dreams"). Point clouds, distance fields and voxels are all rapidly gaining traction, and compute shaders are used to process and display them. Polygon rendering isn't the only way to render modern graphics anymore. GPU-driven renderers that process the scene data structures using compute shaders and perform render setup & culling (viewport & occlusion & sub-object) on the GPU are also getting popular.

A graphics API without good compute support is useless for modern rendering techniques. I won't be happy until DirectX and Vulkan compute shaders match CUDA in productivity and feature set. Shader Model 6.0 finally gave us cross-lane operations and other goodies, so at least there's hope. But HLSL/GLSL is still way behind CUDA in productivity features, and because of this there are barely any compute libraries for HLSL/GLSL.
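To illustrate what those cross-lane operations buy us, a small SM 6.0 sketch (the buffer names are mine): a per-wave sum needs no groupshared memory and no barriers.

Buffer<float> Input : register(t0);
RWBuffer<float> Output : register(u0);

[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    float value = Input[dtid.x];

    // Sum across all active lanes of the wave in one intrinsic.
    float waveSum = WaveActiveSum(value);

    // One writer per wave stores the partial sum.
    if (WaveIsFirstLane())
        Output[dtid.x / WaveGetLaneCount()] = waveSum;
}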
 
I wholeheartedly agree, but what do we know *sigh*
I personally see the lack of decent cross-platform compute rendering capabilities as the biggest thing holding rendering back at the moment. HIP gives us CUDA compatibility (to a degree) on AMD GPUs, but it needs Windows support first AND a way of interacting with Vulkan or DX12.
And that still leaves out Intel, so maybe even that wouldn't be enough...
 
I did some quick queries on the AMD drivers: the D3D12_COMMAND_QUEUE_PRIORITY_GLOBAL_REALTIME priority is reported as supported for both copy and compute queues on GCN3, and for neither on GCN1. My guess is this is something similar to the "quick response queue".
The depth-bounds test is also reported as supported on both architectures, as I expected.
Neither card's driver reports support for programmable sample positions for now.
Shader caching is reported as D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO and D3D12_SHADER_CACHE_SUPPORT_LIBRARY, but not D3D12_SHADER_CACHE_SUPPORT_AUTOMATIC_INPROC_CACHE or D3D12_SHADER_CACHE_SUPPORT_AUTOMATIC_DISK_CACHE for now.
The driver also reports both GPUs as having an "Isolated MMU".
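For anyone who wants to reproduce the queries, these are the CheckFeatureSupport calls involved (a sketch; assumes an existing device):

// Depth bounds and programmable sample positions.
D3D12_FEATURE_DATA_D3D12_OPTIONS2 opts2 = {};
device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS2, &opts2, sizeof(opts2));
// -> opts2.DepthBoundsTestSupported, opts2.ProgrammableSamplePositionsTier

// Shader cache support flags (SINGLE_PSO, LIBRARY, AUTOMATIC_*_CACHE bits).
D3D12_FEATURE_DATA_SHADER_CACHE cache = {};
device->CheckFeatureSupport(D3D12_FEATURE_SHADER_CACHE, &cache, sizeof(cache));
// -> cache.SupportFlags

// Global realtime priority on a compute queue.
D3D12_FEATURE_DATA_COMMAND_QUEUE_PRIORITY prio = {};
prio.CommandListType = D3D12_COMMAND_LIST_TYPE_COMPUTE;
prio.Priority = D3D12_COMMAND_QUEUE_PRIORITY_GLOBAL_REALTIME;
device->CheckFeatureSupport(D3D12_FEATURE_COMMAND_QUEUE_PRIORITY, &prio, sizeof(prio));
// -> prio.PriorityForTypeIsSupported

// "Isolated MMU" bit.
D3D12_FEATURE_DATA_ARCHITECTURE1 arch = {};
device->CheckFeatureSupport(D3D12_FEATURE_ARCHITECTURE1, &arch, sizeof(arch));
// -> arch.IsolatedMMU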
 
Worth noting is that AMD didn't include "official Creators Update support" in the 17.4.1 release notes, so maybe there will be some changes in the next driver set.
 
AMD never mentions non-gaming or non-multimedia related things in the release notes...

edit: just tried with 17.4.2 (the first AMD drivers marked as "WDDM 2.2"): no changes. I did not check with experimental mode enabled, I am too lazy.

edit2: with experimental shader models enabled, both of my AMD GPUs, a GCN1 and a GCN3, report SM6 support.
 