Alessio1989
Programmable sample positions. This. Finally.
My Siggraph 2015 presentation want list (http://advances.realtimerendering.c...siggraph2015_combined_final_footer_220dpi.pdf) is getting smaller every day. Still missing SV_Barycentric (pixel shader) and shader language with generics/templates.
Something is moving in the wish-list/roadmap of DXIL...
And where's my UpdateTileMappingsIndirect?! Tiled resources (and 3D tiled resources) are much less useful when you can't change tile mappings on the GPU side. I still need to do software indirection for virtual shadow mapping.
I was on the DX12 advisory board when I was still working at Ubisoft... Now I am just waiting for new info patiently like everybody else.
They also don't share info in advance anymore like they did in the EAP.
That sucks... but the new open source HLSL compiler on GitHub is a very good thing indeed. Hopefully there will be a steady stream of language improvements in the future. DirectCompute has practically not changed a bit in 8 years! CUDA is now even further ahead than it was at DirectCompute's launch. There are plenty of good, properly tested and optimized CUDA libraries for various tasks, but practically no DirectCompute libraries at all. Without any support for generics/templates, it is painful to write general purpose algorithms and data structures. This is the main reason there are no libraries for DirectCompute.
But I really appreciate the fact that the new compiler is open on GitHub.
OpenCL 2.1 added a C++ based shading language with modern features. Vulkan uses the same SPIR-V intermediate language as OpenCL 2.1+, and there have been discussions since the Vulkan launch about a C++ based modern shading language. When this happens, DirectX (DirectCompute) would be left as the only relevant modern API without a modern C++ based shading language; Metal, OpenCL, CUDA, and (soon) Vulkan would all have one.
Vulkan and OpenCL don't really share SPIR-V: there are two profiles, and Vulkan doesn't accept the profile used by OpenCL (logical vs physical memory model). That is not to say the problem isn't being worked on, but currently just having a C++ -> SPIR-V compiler that works for OpenCL isn't enough for Vulkan to get support.
There are differences, but OpenCL and Vulkan compute shaders try to solve the same problem and use the same intermediate language (albeit not 100% compatible). We already have a modern C++ compiler for OpenCL that produces SPIR-V code, and we eventually want an identical feature set for Vulkan compute shaders. I don't know what the main reason was for choosing a different memory model for Vulkan compute shaders versus OpenCL 2.1, but I would argue that the OpenCL (and CUDA) model is better for compute shaders, so I'd prefer Vulkan compute shaders to switch to the physical memory model as well (eventually). There is no reason to limit compute shader functionality because you want simplicity in pixel/vertex shaders. Compute needs more flexibility and better language constructs for generic algorithms and data structures.
There is always C++ AMP.
Did C++ AMP receive any major update in the upcoming SDK?
Windows 10, version 1703
The following topics have been added to the Direct3D documentation for Windows 10, version 1703:
- The ID3D12Device2::CreatePipelineState method and D3D12_PIPELINE_STATE_STREAM_DESC struct represent a new, more robust way to create PSOs, and unify the interface for creating graphics and compute pipelines (see the sketch after this list).
- The ID3D12Device1::CreatePipelineLibrary1 method expands the pipeline library interface to accept PSOs created with the new, unified D3D12_PIPELINE_STATE_STREAM_DESC structure.
- The D3D12EnableExperimentalFeatures function allows developers to experiment with certain in-development features using a machine in Developer Mode.
- There are five new interfaces (refer to the Interface Hierarchy).
- Refer to the HLSL Shader Model 6.0 Overview, which describes the wave intrinsic operations for multi-threaded pixel and compute shaders.
- The use of ID3D12Device::SetStablePowerState has changed.
- Some new features for Direct3D 11 are described in Direct3D 11.4 Features.
- AtomicCopyBufferUINT and AtomicCopyBufferUINT64 enable late-latching to reduce perceived latency.
- ID3D12Device2::CreatePipelineState and OMSetDepthBounds enable depth-bounds testing on supported hardware (see the command-list sketch after this list).
- ResolveSubresourceRegion enables partial resolution of subresources to help optimize performance.
- SetSamplePositions enables programmable sample positions on supported hardware.
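For anyone who wants to try the new path, here is a minimal sketch of stream-based PSO creation. It assumes the d3dx12.h helper types (CD3DX12_PIPELINE_STATE_STREAM_*) and an already created device, root signature and shader blobs; the function and variable names are just placeholders, not SDK names.

#include <d3d12.h>
#include <wrl/client.h>
#include "d3dx12.h"   // CD3DX12_PIPELINE_STATE_STREAM_* helper types

using Microsoft::WRL::ComPtr;

// Sketch: build a graphics PSO through the new stream-based interface.
// Only the subobjects we care about go into the stream; everything else keeps its default.
HRESULT CreateStreamPso(ID3D12Device2* device,
                        ID3D12RootSignature* rootSig,
                        ID3DBlob* vs, ID3DBlob* ps,
                        ComPtr<ID3D12PipelineState>& outPso)
{
    struct Stream
    {
        CD3DX12_PIPELINE_STATE_STREAM_ROOT_SIGNATURE        RootSignature;
        CD3DX12_PIPELINE_STATE_STREAM_VS                    VS;
        CD3DX12_PIPELINE_STATE_STREAM_PS                    PS;
        CD3DX12_PIPELINE_STATE_STREAM_PRIMITIVE_TOPOLOGY    Topology;
        CD3DX12_PIPELINE_STATE_STREAM_RENDER_TARGET_FORMATS RTVFormats;
        CD3DX12_PIPELINE_STATE_STREAM_DEPTH_STENCIL_FORMAT  DSVFormat;
        CD3DX12_PIPELINE_STATE_STREAM_DEPTH_STENCIL1        DepthStencil; // DESC1 adds DepthBoundsTestEnable
    } stream;

    stream.RootSignature = rootSig;
    stream.VS = CD3DX12_SHADER_BYTECODE(vs);
    stream.PS = CD3DX12_SHADER_BYTECODE(ps);
    stream.Topology = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE;

    D3D12_RT_FORMAT_ARRAY rtvs = {};
    rtvs.NumRenderTargets = 1;
    rtvs.RTFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM;
    stream.RTVFormats = rtvs;
    stream.DSVFormat = DXGI_FORMAT_D32_FLOAT;

    CD3DX12_DEPTH_STENCIL_DESC1 ds(D3D12_DEFAULT);
    ds.DepthBoundsTestEnable = TRUE;   // opt in to depth-bounds testing on this PSO
    stream.DepthStencil = ds;

    D3D12_PIPELINE_STATE_STREAM_DESC desc = { sizeof(stream), &stream };
    return device->CreatePipelineState(&desc, IID_PPV_ARGS(&outPso));
}

A compute PSO goes through the exact same D3D12_PIPELINE_STATE_STREAM_DESC / CreatePipelineState path, just with a CS subobject instead of VS/PS, which is what the "unified interface" part is about.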
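And on the command-list side, the depth-bounds and sample-position items above boil down to two new ID3D12GraphicsCommandList1 calls. A rough sketch (the sample pattern here is arbitrary, the function name is a placeholder, and the currently bound PSO needs DepthBoundsTestEnable set for the depth-bounds call to do anything):

#include <d3d12.h>

// Sketch: record the new Creators Update command-list state.
void RecordNewCommandListState(ID3D12GraphicsCommandList1* cmdList1)
{
    // Reject fragments whose depth falls outside [0.1, 0.9].
    cmdList1->OMSetDepthBounds(0.1f, 0.9f);

    // Programmable sample positions: 4 samples for 1 pixel, coordinates in
    // 1/16-pixel units relative to the pixel center (valid range -8..7).
    D3D12_SAMPLE_POSITION positions[4] =
    {
        { -6, -2 }, { 2, -6 }, { 6, 2 }, { -2, 6 }
    };
    cmdList1->SetSamplePositions(4, 1, positions);
}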
What is Experimental mode, and how do I enable experimental features?
Experimental mode is a new feature of Direct3D in Windows 10. It lets software developers collaborate with each other and with IHVs on prototyping new features in GPU drivers. Here is how to access it:
- Turn on Developer Mode in your OS: Settings -> Update & Security -> For Developers -> (*) Developer Mode
- Enable an experimental mode feature in your app by calling this routine before calling CreateDevice(): D3D12EnableExperimentalFeatures( D3D12ExperimentalShaderModels );
- Acquire a driver (or software renderer) that supports experimental mode.
HRESULT WINAPI D3D12EnableExperimentalFeatures(
UINT NumFeatures,
_In_ const IID *pIIDs,
_In_ void *pConfigurationStructs,
_In_ UINT *pConfigurationStructSizes
);
// --------------------------------------------------------------------------------------------------------------------------------
// Experimental Feature: D3D12ExperimentalShaderModels
//
// Use with D3D12EnableExperimentalFeatures to enable experimental shader model support,
// meaning shader models that haven't been finalized for use in retail.
//
// Enabling D3D12ExperimentalShaderModels needs no configuration struct, pass NULL in the pConfigurationStructs array.
//
// --------------------------------------------------------------------------------------------------------------------------------
static const UUID D3D12ExperimentalShaderModels = { /* 76f5573e-f13a-40f5-b297-81ce9e18933f */
0x76f5573e,
0xf13a,
0x40f5,
{ 0xb2, 0x97, 0x81, 0xce, 0x9e, 0x18, 0x93, 0x3f }
};
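To make the above concrete, this is roughly what opting in looks like before device creation; a minimal sketch, assuming Developer Mode is on and the driver (or WARP) supports experimental mode. The function name is just a placeholder.

#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Sketch: opt in to experimental shader models, then create the device.
// D3D12EnableExperimentalFeatures must be called before D3D12CreateDevice.
// D3D12ExperimentalShaderModels is the UUID shown above (declared in d3d12.h in recent SDKs).
HRESULT CreateDeviceWithExperimentalShaderModels(ComPtr<ID3D12Device>& outDevice)
{
    // One feature, no configuration struct needed for D3D12ExperimentalShaderModels.
    HRESULT hr = D3D12EnableExperimentalFeatures(1, &D3D12ExperimentalShaderModels,
                                                 nullptr, nullptr);
    if (FAILED(hr))
        return hr;   // typically fails when Developer Mode is off

    return D3D12CreateDevice(nullptr,                    // default adapter
                             D3D_FEATURE_LEVEL_11_0,
                             IID_PPV_ARGS(&outDevice));
}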
As I understand it, the reason for Vulkan compute being a logical memory model was concerns about validation, particularly among mobile and web clients. It's also worth noting that sometimes Khronos can be 'political': if Vulkan does everything OpenCL does, is OpenCL obsolete? And then you get all the fun of people protecting turf etc. I've heard that getting any compute into Vulkan at all was hard, similar to why it came relatively late to GL; the orthodoxy is/was that CL is for compute and Vulkan is just for graphics, so why blur the lines?!
The only problem is that the lines were already blurred a long time ago. Nowadays all modern AAA renderers use a lot of compute shaders. There are games that are almost pure compute (see Media Molecule's game "Dreams"). Point clouds, distance fields and voxels are all gaining traction rapidly, and compute shaders are used to process and display them. Polygon rendering isn't the only way to render modern graphics anymore. GPU-driven renderers that process the scene data structures with compute shaders and perform render setup & culling (viewport & occlusion & sub-object) on the GPU are also getting popular.
A graphics API without good compute support is useless for modern rendering techniques. I am not happy until DirectX and Vulkan compute shaders match CUDA in productivity and feature set. Shader Model 6.0 finally gave us cross-lane operations and other goodies, so at least there's hope. But HLSL/GLSL is still way behind CUDA in productivity features. Because of this there are barely any compute libraries for HLSL/GLSL.
I wholeheartedly agree, but what do we know *sigh*
I did some quick queries on the AMD drivers: the D3D12_COMMAND_QUEUE_PRIORITY_GLOBAL_REALTIME priority is reported as supported for both copy and compute queues on GCN3, and for neither on GCN1. I would guess this is something similar to the "quick response queue".
Depth bounds test is also reported as supported on both architectures, as I expected.
Neither card reports support for programmable sample positions for now.
Shader caching is reported as D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO and D3D12_SHADER_CACHE_SUPPORT_LIBRARY, but not D3D12_SHADER_CACHE_SUPPORT_AUTOMATIC_INPROC_CACHE or D3D12_SHADER_CACHE_SUPPORT_AUTOMATIC_DISK_CACHE, for now.
The driver also reports an "Isolated MMU" on both GPUs.
Worth noting is that AMD didn't include "official Creators Update support" in the 17.4.1 release, so maybe there will be some changes in the next driver set.
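For reference, those caps can all be queried through ID3D12Device::CheckFeatureSupport; a rough sketch of what such queries look like (the function name and output formatting are just illustrative, and "device" is assumed to be an already created ID3D12Device on a Creators Update system):

#include <d3d12.h>
#include <stdio.h>

// Sketch: query the Creators Update caps discussed above on an existing device.
void DumpCreatorsUpdateCaps(ID3D12Device* device)
{
    // Depth-bounds test and programmable sample positions tier.
    D3D12_FEATURE_DATA_D3D12_OPTIONS2 opts2 = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS2, &opts2, sizeof(opts2))))
    {
        printf("DepthBoundsTestSupported: %d\n", opts2.DepthBoundsTestSupported);
        printf("ProgrammableSamplePositionsTier: %d\n", (int)opts2.ProgrammableSamplePositionsTier);
    }

    // Shader cache flags (SINGLE_PSO / LIBRARY / AUTOMATIC_INPROC_CACHE / AUTOMATIC_DISK_CACHE).
    D3D12_FEATURE_DATA_SHADER_CACHE cache = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_SHADER_CACHE, &cache, sizeof(cache))))
        printf("ShaderCache SupportFlags: 0x%x\n", (unsigned)cache.SupportFlags);

    // Is GLOBAL_REALTIME priority supported on a compute queue?
    D3D12_FEATURE_DATA_COMMAND_QUEUE_PRIORITY prio = {};
    prio.CommandListType = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    prio.Priority = D3D12_COMMAND_QUEUE_PRIORITY_GLOBAL_REALTIME;
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_COMMAND_QUEUE_PRIORITY, &prio, sizeof(prio))))
        printf("Compute GLOBAL_REALTIME supported: %d\n", prio.PriorityForTypeIsSupported);

    // ARCHITECTURE1 adds the IsolatedMMU bit.
    D3D12_FEATURE_DATA_ARCHITECTURE1 arch = {};
    arch.NodeIndex = 0;
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_ARCHITECTURE1, &arch, sizeof(arch))))
        printf("IsolatedMMU: %d\n", arch.IsolatedMMU);
}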