Direct3D feature levels discussion

Ext3h · Feb 18, 2021

CarstenS said:
Mesh shaders, in contrast to raytracing, can help improve framerate, thus making games playable in the first place on lower tier integrated graphics.

Do they really, in practical applications? Or does that only apply to synthetic benchmarks which deliberately skip LOD for models in order to artificially trigger bottlenecks in the geometry pipeline?

I mean, sure, you can use a 2 million triangle character model. And then boast how you still managed to squeeze it down to only a few hundred thousand non-rejected triangles, so the impact has been reduced far enough to allow any form of multi-pass rendering.

Or you could just have provided a lower resolution model in the first place for slower hardware.

Lurkmass · Feb 18, 2021

Ext3h said:
Do they really, in practical applications? Or does that only apply to synthetic benchmarks which deliberately skip LOD for models in order to artificially trigger bottlenecks in the geometry pipeline?

I mean, sure, you can use a 2 million triangle character model. And then boast how you still managed to squeeze it down to only a few hundred thousand non-rejected triangles, so the impact has been reduced far enough to allow any form of multi-pass rendering.

Or you could just have provided a lower resolution model in the first place for slower hardware.

As explained in another thread, enabling tessellation or geometry shaders incurs additional overhead in the hardware pipeline, and a similar principle applies to the mesh shading pipeline. The mesh shading pipeline has two stages, the amplification shader and the mesh shader, and the overhead of running amplification shaders isn't trivial. A vertex-shading-only pipeline (no tess or GS in this case) with a trivial amount of work could potentially win on performance, given that it has only one stage ...

3dcgi · Feb 18, 2021

You can't just enable Mesh Shaders, provide them with the same content, and expect improved performance. For static content that can have vertex reuse performed offline, Mesh Shaders are a great fit and may be able to outperform the traditional pipeline. I say "may" because it really depends on the bottleneck. If you're limited by VS performance, Mesh Shaders will only be better if you can reduce the number of vertices.

If you want a GPU-driven approach like the one Sebbbi promoted, where compute is used for high-level culling instead of the CPU, then the combination of Amplification and Mesh Shaders should be a good fit.

I'm using generalizations here. We'll really need to wait and see real content to know if expectations like mine match reality.

To really know if Amplification/Mesh Shaders are better, we need optimized versions of both pipelines for comparison. As Rys said, the Mesh Shader off case for the 3DMark test isn't as fast as the traditional pipeline could go given a different implementation.
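As a rough illustration of the offline vertex reuse mentioned above, here is a minimal sketch of a greedy meshlet builder. It is not taken from any engine or from the 3DMark test; `Meshlet`, `buildMeshlets`, and the default limits of 64 vertices and 126 triangles are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// One meshlet: a small set of unique vertices plus triangles indexing into it.
struct Meshlet {
    std::vector<std::uint32_t> vertices;   // indices into the original vertex buffer
    std::vector<std::uint8_t>  triangles;  // 3 meshlet-local indices per triangle
};

// Greedy offline split of an indexed triangle list into meshlets.
// The default limits are illustrative assumptions, not values from the thread.
std::vector<Meshlet> buildMeshlets(const std::vector<std::uint32_t>& indices,
                                   std::size_t maxVertices = 64,
                                   std::size_t maxTriangles = 126) {
    assert(indices.size() % 3 == 0);
    assert(maxVertices >= 3 && maxVertices <= 256 && maxTriangles >= 1);

    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<std::uint32_t, std::uint8_t> localIndex;  // global -> local

    for (std::size_t i = 0; i < indices.size(); i += 3) {
        // Count the vertices of this triangle that the meshlet does not hold yet.
        std::size_t newVertices = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            bool seen = localIndex.count(indices[i + k]) != 0;
            for (std::size_t j = 0; j < k; ++j)
                seen = seen || indices[i + j] == indices[i + k];
            if (!seen) ++newVertices;
        }

        // Start a new meshlet when this triangle would exceed either limit.
        const bool full = current.vertices.size() + newVertices > maxVertices ||
                          current.triangles.size() / 3 + 1 > maxTriangles;
        if (full) {
            meshlets.push_back(std::move(current));
            current = Meshlet{};
            localIndex.clear();
        }

        // Append the triangle, reusing vertices already in the meshlet.
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t v = indices[i + k];
            auto it = localIndex.find(v);
            if (it == localIndex.end()) {
                const auto local = static_cast<std::uint8_t>(current.vertices.size());
                it = localIndex.emplace(v, local).first;
                current.vertices.push_back(v);
            }
            current.triangles.push_back(it->second);
        }
    }
    if (!current.triangles.empty()) meshlets.push_back(std::move(current));
    return meshlets;
}
```

A production tool would also reorder triangles for locality before splitting; this sketch only shows how shared vertices end up stored once per meshlet.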

Lurkmass · Feb 22, 2021

I heard that the Direct3D team are exploring exposing in D3D12 an equivalent of Vulkan's DEVICE_LOCAL and HOST_VISIBLE memory type. On AMD HW, this memory type amounts to 256 MB of VRAM. On AMD HW with resizable BAR, this memory type covers the entire VRAM.
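For readers unfamiliar with the Vulkan side, a minimal sketch of how an application typically looks for such a memory type follows. The structs and constants are local stand-ins so the snippet compiles without `vulkan.h`; `MemoryType` and `findMemoryType` are hypothetical names, and a real program would query the list with `vkGetPhysicalDeviceMemoryProperties`.

```cpp
#include <cstdint>
#include <vector>

// Local mirrors of VkMemoryPropertyFlagBits values (assumption: a real
// program would take these from the Vulkan headers instead).
constexpr std::uint32_t kDeviceLocal  = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t kHostVisible  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

struct MemoryType {             // stand-in for VkMemoryType
    std::uint32_t propertyFlags;
    std::uint32_t heapIndex;
};

// Returns the index of the first memory type that is allowed by typeBits and
// has every flag in `required`, or -1 when no such type exists.
int findMemoryType(const std::vector<MemoryType>& types,
                   std::uint32_t typeBits, std::uint32_t required) {
    for (std::uint32_t i = 0; i < types.size() && i < 32; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        if (allowed && (types[i].propertyFlags & required) == required)
            return static_cast<int>(i);
    }
    return -1;
}
```

Calling `findMemoryType(types, typeBits, kDeviceLocal | kHostVisible)` picks out the CPU-writable VRAM type discussed in the post when the driver exposes one.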

On D3D12, we don't have explicit access to this particular memory type like we do on Vulkan. When we create memory heaps on D3D12, there are 4 heap types we can specify: DEFAULT, UPLOAD, READBACK, or CUSTOM. If we create a DEFAULT, UPLOAD, or READBACK heap, its other properties, such as the CPU page property and the memory pool, must be set to UNKNOWN, which makes it impossible to express that memory type. With a CUSTOM heap we don't know whether the driver will use this memory type, so that isn't ideal either.

I suspect a part of the reason why AMD are seeing performance gains with resizable BAR is because their drivers are placing the UPLOAD heaps in VRAM. I assume it would be helpful information for the driver if we could extend our UPLOAD heaps to have other explicitly defined heap properties such as specifying our memory pool to be D3D12_MEMORY_POOL_L1.
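The heap property rule described in this post can be sketched as a small validity check. The enums below are local mirrors that only borrow the D3D12 names, so nothing here calls the real API. The CUSTOM branch (both properties must be specified) reflects the D3D12 documentation rather than the post, and further adapter-dependent restrictions are not modelled.

```cpp
// Local mirrors of the D3D12 heap enums named in the post (assumption:
// enumerator values are not tied to d3d12.h).
enum class HeapType { Default, Upload, Readback, Custom };
enum class CpuPageProperty { Unknown, NotAvailable, WriteCombine, WriteBack };
enum class MemoryPool { Unknown, L0, L1 };

struct HeapProperties {
    HeapType type;
    CpuPageProperty cpuPage;
    MemoryPool pool;
};

// DEFAULT, UPLOAD and READBACK heaps must leave both properties UNKNOWN;
// CUSTOM heaps must specify both.
bool isValidCombination(const HeapProperties& p) {
    const bool bothUnknown = p.cpuPage == CpuPageProperty::Unknown &&
                             p.pool == MemoryPool::Unknown;
    const bool bothKnown = p.cpuPage != CpuPageProperty::Unknown &&
                           p.pool != MemoryPool::Unknown;
    return p.type == HeapType::Custom ? bothKnown : bothUnknown;
}
```

Under this rule an UPLOAD heap with the pool set to L1 is rejected, which is exactly the combination the post wishes could be expressed.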

OlegSH · Feb 22, 2021

Lurkmass said:
I suspect a part of the reason why AMD are seeing performance gains with resizable BAR is because their drivers are placing the UPLOAD heaps in VRAM

I guess a few exe renaming tests might show whether it's a default driver behavior with resizable BAR or there are driver profiles per game (it doesn't look like there are any profiles since there are games which perform slower with rBAR on AMD).
It would also be interesting to know whether this driver promotion stuff works for the default 256 MB rBAR size (it would require some video memory management for the limited space, but still might capture a good fraction of the benefit).

Lurkmass · Feb 24, 2021

OlegSH said:
I guess a few exe renaming tests might show whether it's a default driver behavior with resizable BAR or there are driver profiles per game (it doesn't look like there are any profiles since there are games which perform slower with rBAR on AMD).
It would also be interesting to know whether this driver promotion stuff works for the default 256 MB rBAR size (it would require some video memory management for the limited space, but still might capture a good fraction of the benefit).

Here's a deeper explanation provided by an AMD devtech engineer with respect to Vulkan. AMD also recommends that you don't use the full 256 MB BAR, since "implicit resources" can take up that memory as well, so you'll want to leave 33% available for the driver.
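As a quick worked number for that recommendation, the sketch below computes what is left for the application when 33% of a 256 MB BAR heap is reserved. `appBarBudget` is a hypothetical helper, and treating "256 MB" as 256 MiB is an assumption.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Bytes an application might budget for itself in the BAR-visible heap when
// reservedPercent of the heap is left to the driver.
constexpr std::uint64_t appBarBudget(std::uint64_t heapBytes,
                                     std::uint64_t reservedPercent) {
    return heapBytes - heapBytes * reservedPercent / 100;
}
```

For a 256 MiB heap, `appBarBudget(256 * kMiB, 33)` comes to roughly 171 MiB.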

DmitryKo · Apr 8, 2021

AMD released beta Adrenalin driver 27.20.21001.7005 for Windows Insider Preview "Cobalt" on Windows Update. Here are the new features reported in build 21354, compared to the current Adrenalin 21.3.2:

Code:

Hardware-accelerated scheduler : Disabled, DXGK_FEATURE_SUPPORT_EXPERIMENTAL (1)
HighestShaderModel : D3D12_SHADER_MODEL_6_6 (0x0066)
AtomicInt64OnTypedResourceSupported : 1
AtomicInt64OnGroupSharedSupported : 1
DerivativesInMeshAndAmplificationShadersSupported : 1
DirectML maximum feature level : DML_FEATURE_LEVEL_4_0 (0x4000)

Note that shader model 6.6 and experimental shader model 6.7 are supported; experimental support for hardware-accelerated graphics scheduler is available, but the feature is not enabled by recent Windows Insider builds.
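The hexadecimal values in the dump encode version numbers. Here is a minimal sketch of decoding them, assuming the packing suggested by the values themselves (one nibble each for shader model major and minor; bits 12-15 and 8-11 for the DirectML level). `Version` and both function names are made up for illustration.

```cpp
#include <cstdint>

struct Version {
    std::uint32_t majorPart;
    std::uint32_t minorPart;
};

// Shader model values pack major and minor into one nibble each: 0x66 -> 6.6.
constexpr Version decodeShaderModel(std::uint32_t v) {
    return Version{(v >> 4) & 0xFu, v & 0xFu};
}

// DirectML feature levels pack major into bits 12-15 and minor into
// bits 8-11: 0x4000 -> 4.0.
constexpr Version decodeDmlFeatureLevel(std::uint32_t v) {
    return Version{(v >> 12) & 0xFu, (v >> 8) & 0xFu};
}
```

Applied to the dump, `decodeShaderModel(0x0066)` gives 6.6 and `decodeDmlFeatureLevel(0x4000)` gives 4.0.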

[EDIT] New feature options in Insider Preview SDK 'Cobalt', from latest Direct3D 12 Agility SDK headers:

Code:

ShaderCache.SupportFlags : D3D12_SHADER_CACHE_SUPPORT_SINGLE_PSO | LIBRARY | AUTOMATIC_INPROC_CACHE | AUTOMATIC_DISK_CACHE | DRIVER_MANAGED_CACHE | SHADER_CONTROL_CLEAR | SHADER_SESSION_DELETE (127) (0b0111'1111)
VariableRateShadingSumCombinerSupported : 0
MeshShaderPerPrimitiveShadingRateSupported : 0
AtomicInt64OnDescriptorHeapResourceSupported : 1
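The ShaderCache value 127 (0b0111'1111) is a bitmask with all seven flags set. A small sketch of decoding such a mask follows; the bit assignments are assumed to follow the order shown in the dump and should be checked against `d3d12.h`, and `FlagName` and `decodeShaderCacheFlags` are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct FlagName {
    std::uint32_t bit;
    const char* name;
};

// Assumed bit layout, lowest bit first, matching the order in the dump.
const FlagName kShaderCacheFlags[] = {
    {0x01, "SINGLE_PSO"},
    {0x02, "LIBRARY"},
    {0x04, "AUTOMATIC_INPROC_CACHE"},
    {0x08, "AUTOMATIC_DISK_CACHE"},
    {0x10, "DRIVER_MANAGED_CACHE"},
    {0x20, "SHADER_CONTROL_CLEAR"},
    {0x40, "SHADER_SESSION_DELETE"},
};

// Returns the names of all flags set in `value`, lowest bit first.
std::vector<std::string> decodeShaderCacheFlags(std::uint32_t value) {
    std::vector<std::string> names;
    for (const FlagName& f : kShaderCacheFlags)
        if (value & f.bit) names.push_back(f.name);
    return names;
}
```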

Pinstripe · Apr 9, 2021

Will this hardware-accelerated scheduler also help to alleviate Nvidia's driver overhead issue?

DegustatoR · Apr 9, 2021

Pinstripe said:
Will this hardware-accelerated scheduler also help to alleviate Nvidia's driver overhead issue?

It has been enabled on Pascal+ in the current Windows release since last year.
I was actually wondering why nobody bothered to check how it is influencing CPU limitations on GFs.
But from what I can tell the differences are fairly minor, and you'd generally be better off with it being off since it does introduce some compatibility issues.

Pinstripe · Apr 9, 2021

That's the same as HAGS? Okay.

Disappointing it does so little in practice.

DegustatoR · Apr 9, 2021

Pinstripe said:
That's the same as HAGS? Okay.

Disappointing it does so little in practice.

I may be wrong here but I think that it's more about future h/w and API changes than for what's currently available.

DmitryKo · Apr 11, 2021

DegustatoR said:
it's more about future h/w and API changes

Yes, that's exactly what they said in the DirectX Blog post on hardware accelerated GPU scheduling.

Lurkmass · Apr 14, 2021

I wonder if there are any plans from Microsoft to make PSO creation simpler or expose bindless buffers, because some Vulkan implementations are clearly ahead of D3D12 in these aspects ...

Dynamic states: EXT_extended_dynamic_state, EXT_color_write_enable, EXT_vertex_input_dynamic_state, EXT_extended_dynamic_state2 (unreleased/dynamic blend states ?)

Bindless: KHR_buffer_device_address & bindless texture handles (future)

It would be a massive turnaround if Vulkan somehow ended up being the easier API to use of the two on some driver implementations. Exposing more dynamic states would make pipeline state management simpler and would help reduce pipeline/shader compilation times: since the HW can make these state changes very cheaply, drivers would only need to compile several dozen pipelines instead of constantly compiling hundreds or even thousands of unique ones (fewer pipelines = lower compilation times).

D3D12 also exposes GPU virtual addresses, but we can't do anything interesting with them compared to Vulkan, such as building complex data structures like linked lists. Bindless is mostly 'implicit' on D3D12, and the only way to pass GPU VAs to shaders is through root descriptors, which live in the very limited space of the root signature (64 DWORDs), whereas buffer references in Vulkan can be stored anywhere in memory.
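To put rough numbers on the pipeline count argument, here is a back-of-the-envelope sketch. It treats every state baked into the pipeline as multiplying the number of unique pipelines, which is a worst-case upper bound; `pipelineCount` and the variant counts in the note below are invented for illustration, not measurements.

```cpp
#include <cstdint>
#include <vector>

// Each entry is how many distinct values one baked-in piece of state takes in
// a hypothetical renderer. Worst case, every combination needs its own
// pipeline, so the counts multiply.
std::uint64_t pipelineCount(const std::vector<std::uint32_t>& bakedStateVariants) {
    std::uint64_t count = 1;
    for (std::uint32_t v : bakedStateVariants) count *= v;
    return count;
}
```

With 12 shader sets, 4 blend modes, 3 cull modes, 4 depth compare ops and 5 vertex layouts all baked in, the bound is 2880 pipelines. If cull mode, depth compare and vertex input become dynamic state, only shaders and blend modes remain and the bound drops to 48.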

Remij · Apr 19, 2021

^that would be sweet.

I'm excited for Game Stack Live on the 20th+21st. We should get a good look into DirectStorage, Shader Model 6.6, the Velocity Architecture, and improvements to DX12U dev tools.

Kaotik · Apr 19, 2021

Remij said:
^that would be sweet.

I'm excited for Game Stack Live on the 20th+21st. We should get a good look into DirectStorage, Shader Model 6.6, the Velocity Architecture, and improvements to DX12U dev tools.

That makes it even worse, it's literally celebrating the absurd price hikes and calling it "gamers buying up"

DmitryKo · Apr 19, 2021

WindowsCentral assumes that the key GameStack Live announcement would be the new D3D12 'Agility' SDK, which supposedly includes a redistributable runtime with NuGet delivery and polyfill code to allow new DXIL features and APIs on older versions of Windows 10.

Microsoft already posted details about shader model 6.6 back in November 2020; the essential parts of Xbox Velocity Architecture were also detailed back in November 2019, then July and October 2020. It's made from a combination of Tiled Resources + Sampler Feedback + DirectStorage + LZ-family data decompression, which enables fast, efficient texture streaming from NVMe disks (though hardware requirements for DirectStorage and data decompression on Windows 10 are yet to be clarified).

DegustatoR · Apr 20, 2021

https://devblogs.microsoft.com/directx/announcing-dx12agility/
https://devblogs.microsoft.com/directx/gettingstarted-dx12agility/

Remij · Apr 20, 2021

Awesome stuff.

Krteq · Apr 20, 2021

So, basically we are back at "DirectX redist" model, right?

DegustatoR · Apr 20, 2021

Krteq said:
So we are back at "DirectX redist" model, right?

Not quite. The D3D12Core.dll will ship with games which make use of the Agility SDK.
This won't change anything for the user and there won't be a need to install a DX Redist during application setup.
Games which will make use of the Agility SDK may use latest D3D12 features on Win10 machines which are 19H2+.

This is more or less similar to how MS has provided D3D12 support on Win7 for some games I'd say.
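For context, the linked getting-started post describes the opt-in as two symbols exported from the game executable. The sketch below follows that pattern; the version number 4 and the relative path are placeholders to be replaced with the values for the SDK package actually shipped, and the `AGILITY_EXPORT` macro is a hypothetical helper so the snippet also compiles off Windows.

```cpp
#include <cstdint>

#if defined(_WIN32)
#define AGILITY_EXPORT __declspec(dllexport)
#else
#define AGILITY_EXPORT  // no-op elsewhere; the exports only matter on Windows
#endif

extern "C" {
// Version of the bundled D3D12Core.dll the game was built against
// (placeholder value).
AGILITY_EXPORT extern const std::uint32_t D3D12SDKVersion = 4;

// Folder, relative to the executable, that holds the bundled D3D12Core.dll
// (placeholder path).
AGILITY_EXPORT const char* D3D12SDKPath = ".\\D3D12\\";
}
```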
