Direct3D feature levels discussion

Yes, if you have one of the chips I encourage you to try it out :)
Awesome work! :D
I'm really curious about the technical details at the next IDF.
Maybe some details about the difficulties that come with implementing all the DX12 features.

Write an article talking about the differences between the API versions and feature levels, and why DirectX 12.1 has nothing to do with Feature Level 12.1.
DX12 is the API library; the feature levels are simply defined within the API.
In the future we could get an API update to DX12.1, introducing the new FL12.2, and then everyone who wrote FL12.1 = DX12.1 would have a problem.
 
I think we have to leave that to the press for a few reasons, if it even makes sense. Ryan Shrout was considering something like that earlier in the thread.

FWIW - there's no such thing as "DirectX 12.1" for starters :)
Andy, wrong Ryan! :p

Anyhow, the idea is still a WIP. Hard to find the time between Skylake and IDF, but it's definitely something I want to get done.
 
Andy, wrong Ryan! :p

Anyhow, the idea is still a WIP. Hard to find the time between Skylake and IDF, but it's definitely something I want to get done.
Whoops, good point :) I agree with your earlier assessment that it's unclear whether bringing this topic up benefits consumers, but you're better qualified to decide than I am :) In any case, now you have some additional information to work from on the Intel side.
 
Did you check results for Haswell/Broadwell? Because with the same driver (I think), my Haswell does not support some features that it supposedly should.
No, not yet. I can try tomorrow (edit: darn, later today, in a few hours...) when I'm back at my Haswell/Windows 10 rig at the office.
 
Awesome work! :D
I'm really curious about the technical details at the next IDF.
Maybe some details about the difficulties that come with implementing all the DX12 features.


DX12 is the API library; the feature levels are simply defined within the API.
In the future we could get an API update to DX12.1, introducing the new FL12.2, and then everyone who wrote FL12.1 = DX12.1 would have a problem.
That's not a problem at all; a feature level is just an enum name for the DDI...

Code:
typedef enum D3D_FEATURE_LEVEL
{
    D3D_FEATURE_LEVEL_9_1  = 0x9100,
    D3D_FEATURE_LEVEL_9_2  = 0x9200,
    D3D_FEATURE_LEVEL_9_3  = 0x9300,
    D3D_FEATURE_LEVEL_10_0 = 0xa000,
    D3D_FEATURE_LEVEL_10_1 = 0xa100,
    D3D_FEATURE_LEVEL_11_0 = 0xb000,
    D3D_FEATURE_LEVEL_11_1 = 0xb100,
    D3D_FEATURE_LEVEL_12_0 = 0xc000,
    D3D_FEATURE_LEVEL_12_1 = 0xc100
} D3D_FEATURE_LEVEL;
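To illustrate how an application consumes that enum: a minimal sketch, assuming an already created ID3D12Device, of querying the maximum supported feature level through CheckFeatureSupport (presumably similar to what produced the "Maximum feature level" line in the reports in this thread):

Code:
#include <d3d12.h>

// Ask the driver which of the requested feature levels is the highest
// one the adapter supports (error handling reduced to the essentials).
D3D_FEATURE_LEVEL QueryMaxFeatureLevel(ID3D12Device* device)
{
    static const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1
    };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels = _countof(requested);
    levels.pFeatureLevelsRequested = requested;
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                           &levels, sizeof(levels))))
        return D3D_FEATURE_LEVEL_11_0; // query failed; assume the minimum
    return levels.MaxSupportedFeatureLevel; // e.g. 0xb100 on Haswell
}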
 
Tiled resources have never been supported on Haswell. Even if they were, they would be mostly unusable with the 2GB VA limits.
So Haswell is the only GPU without TiledRes support? Even the old low-end HD 7700 GCN supports them. But yes, I understand that the 31-bit virtual address limit is quite awful.

By the way, do you know how many GPUs correctly report the new virtual address limits (per process and per resource)? I thought GCN 1.0 was limited to 31 bits per resource, but now I get 40, which is quite strange :\
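For anyone who wants to check their own card: a minimal sketch, assuming an already created device, of querying those two limits through D3D12_FEATURE_GPU_VIRTUAL_ADDRESS_SUPPORT:

Code:
#include <d3d12.h>
#include <cstdio>

// Print the per-resource and per-process GPU virtual address widths.
void PrintGpuVaLimits(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_GPU_VIRTUAL_ADDRESS_SUPPORT va = {};
    if (SUCCEEDED(device->CheckFeatureSupport(
            D3D12_FEATURE_GPU_VIRTUAL_ADDRESS_SUPPORT, &va, sizeof(va))))
    {
        // 31 bits corresponds to the 2GB limit discussed above for Haswell
        printf("MaxGPUVirtualAddressBitsPerResource: %u\n",
               va.MaxGPUVirtualAddressBitsPerResource);
        printf("MaxGPUVirtualAddressBitsPerProcess:  %u\n",
               va.MaxGPUVirtualAddressBitsPerProcess);
    }
}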
 
Maybe some details about the difficulties coming with implementing all DX12 features.
Come to my talk if you're at IDF :) I'm sure the slides will be posted otherwise. To be fair it's more about the DX12 API and how you can use it to drive Gen9 efficiently, but there's a little bit on new features in there too and some overlap.
 
Come to my talk if you're at IDF :) I'm sure the slides will be posted otherwise. To be fair it's more about the DX12 API and how you can use it to drive Gen9 efficiently, but there's a little bit on new features in there too and some overlap.
What time (and what's the session number) for your talk?
 
All D3D12 feature data options can be optionally supported at ANY feature level, from 11_0 to 12_1. A little example: ROVs and CR can be used with an FL 11_0 device if the driver/hardware allows it.
Does that mean that Fermi or GCN 1.1 cards can have ROVs and CR? What is needed for GCN-based cards to support these features?
 
Come to my talk if you're at IDF :) I'm sure the slides will be posted otherwise. To be fair it's more about the DX12 API and how you can use it to drive Gen9 efficiently, but there's a little bit on new features in there too and some overlap.
I gladly would, if I were a journalist or developer, but I'm just a guy who is generally interested in this topic.

Does that mean that Fermi or GCN 1.1 cards can have ROVs and CR? What is needed for GCN-based cards to support these features?
A certain degree of hardware support, of course.
The message was simply that developers don't need to check for a whole feature level; they can also check a bunch of features individually.

So it's possible to use ROVs on Haswell/Broadwell with FL 11_0.
And also Tiled Resources Tier 3 on Maxwell Gen 2 and Skylake, which isn't mandatory for any feature level yet.
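As an illustration, a minimal sketch of such individual checks through D3D12_FEATURE_D3D12_OPTIONS, assuming an already created device (the field names match the report output earlier in the thread):

Code:
#include <d3d12.h>

// Check individual caps instead of requiring a whole feature level.
void CheckOptionalFeatures(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return;

    // 1 on Haswell/Broadwell even though their max feature level is 11_1.
    bool hasROVs = options.ROVsSupported != FALSE;

    // Not supported on Haswell; exposed by Skylake and Maxwell Gen 2.
    bool hasCR = options.ConservativeRasterizationTier !=
                 D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;

    // Tier 3 isn't mandatory for any feature level yet.
    bool hasTiledTier3 =
        options.TiledResourcesTier >= D3D12_TILED_RESOURCES_TIER_3;

    (void)hasROVs; (void)hasCR; (void)hasTiledTier3; // use as needed
}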
 
Without taking away from the usefulness of other DX12 features, I'm surprised by how little asynchronous compute/shaders (aka "multi-engine" in Microsoft terms) is mentioned in this thread. It is one of the most important features of DX12 (along with multithreaded recording of command buffers) and is also the feature game developers are most excited about, because of its ability to increase performance via fuller utilization of the GPU. Indeed, Gen4 console developers are already getting multi-ms savings from scheduling and executing complementary workloads on different queues (check out some of the excellent presentations on this topic).
Multi-engine efficiency will vary across devices, but unfortunately there is no cap or feature level to indicate the level of support. Any 3D device can expose a compute or copy queue in addition to the graphics queue, but these won't necessarily translate into performance savings; that depends on several factors, including workload-switching granularity (on AMD GCN this is a wavefront).
 
If I may hazard a guess, it's maybe because we're in D3D space here*, eagerly waiting for the first DX12 titles to appear. And since async compute, like much other stuff, is primarily a performance "feature" (akin to having multiple compute units instead of one in the first place), it's something whose merits will only show once real applications arrive.


*as opposed to consoles where it's used already, but with no real means of performance comparison.
 
Without taking away from the usefulness of other DX12 features, I'm surprised by how little asynchronous compute/shaders (aka "multi-engine" in Microsoft terms) is mentioned in this thread. It is one of the most important features of DX12 (along with multithreaded recording of command buffers) and is also the feature game developers are most excited about, because of its ability to increase performance via fuller utilization of the GPU. Indeed, Gen4 console developers are already getting multi-ms savings from scheduling and executing complementary workloads on different queues (check out some of the excellent presentations on this topic).
Multi-engine efficiency will vary across devices, but unfortunately there is no cap or feature level to indicate the level of support. Any 3D device can expose a compute or copy queue in addition to the graphics queue, but these won't necessarily translate into performance savings; that depends on several factors, including workload-switching granularity (on AMD GCN this is a wavefront).

Multi-engine is supported across all D3D12-capable GPUs. If the hardware provides some sort of "dedicated" hardware engines, then graphics, copy and compute operations can be executed concurrently instead of only being serialized by the driver. As far as I know, all GPUs should take advantage of async copy operations, while not all GPUs are able to take advantage of async compute operations. If the GPU/driver cannot take advantage of async operations (no dedicated hardware engine, or simply the dedicated pipeline engine is "full"), then all operations are serialized by the driver with a minimal performance impact.
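To make that concrete, a minimal sketch of creating all three queue types; the API accepts this on any D3D12 device, and whether they actually run concurrently depends on the hardware/driver as described above (assuming an already created device, error handling omitted):

Code:
#include <d3d12.h>

// One queue per engine type.
void CreateQueues(ID3D12Device* device,
                  ID3D12CommandQueue** gfx,
                  ID3D12CommandQueue** compute,
                  ID3D12CommandQueue** copy)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics (3D) engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(gfx));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // async compute engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // copy/DMA engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(copy));
}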
 
Does that mean that Fermi or GCN 1.1 cards can have ROVs and CR?
No. It means that important features can be implemented independently of feature levels. In the case of CR and ROV, vendors don't need to support the full feature level 12_1; they can just expose these two features while supporting a maximum feature level of 11_1, as Intel did.

Without taking away from the usefulness of other DX12 features, I'm surprised by how little asynchronous compute/shaders (aka "multi-engine" in Microsoft terms) is mentioned in this thread.
It's because this thread is about feature levels, while multi-engine is neither a part of any feature level nor an optional capability.
It only concerns WDDM driver DDIs, and none of us writes graphics drivers, as far as I can tell.
 
Multi-engine is supported across all D3D12-capable GPUs. If the hardware provides some sort of "dedicated" hardware engines, then graphics, copy and compute operations can be executed concurrently instead of only being serialized by the driver. As far as I know, all GPUs should take advantage of async copy operations, while not all GPUs are able to take advantage of async compute operations. If the GPU/driver cannot take advantage of async operations (no dedicated hardware engine, or simply the dedicated pipeline engine is "full"), then all operations are serialized by the driver with a minimal performance impact.

Don't you need DMA for "copy" operations? That drastically reduces the possibilities... GCN and Maxwell 2.0... I don't know about Intel Gen.

Looking at how you describe it, I ask myself why it hasn't been used before...
 
Thanks, but Andrew Lauritzen already solved the mystery :)
For completeness's sake anyway:

Intel HD Graphics 4600 (i7-4790K, 10.18.15.4256)
ADAPTER 0
"Intel(R) HD Graphics 4600"
VEN_8086, DEV_0412, SUBSYS_85341043, REV_06
Dedicated video memory : 117964800 bytes
Total video memory : 4247742464 bytes
Maximum feature level : D3D_FEATURE_LEVEL_11_1 (0xb100)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_SUPPORT_NONE (0)
TiledResourcesTier : D3D12_TILED_RESOURCES_TIER_NOT_SUPPORTED (0)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_1 (1)
PSSpecifiedStencilRefSupported : 0
TypedUAVLoadAdditionalFormats : 0
ROVsSupported : 1
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED (0)
MaxGPUVirtualAddressBitsPerResource : 31
StandardSwizzle64KBSupported : 0
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_TIER_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
VPAndRTArrayIndexFromAnyShaderFeedingRasterizerSupportedWithoutGSEmulation : 1
ResourceHeapTier : D3D12_RESOURCE_HEAP_TIER_2 (2)
MaxGPUVirtualAddressBitsPerProcess : 31
Adapter Node 0: TileBasedRenderer: 0, UMA: 1, CacheCoherentUMA: 1
 
No. It means that important features can be implemented independently of feature levels. In the case of CR and ROV, vendors don't need to support the full feature level 12_1; they can just expose these two features while supporting a maximum feature level of 11_1, as Intel did.

Just to extend that comment...

You can consider feature levels a simplification of the process of supporting X features, a matter of convenience. If you create a device with FL12_1 (I mean a call to D3D12CreateDevice), you know that you will have support for CR and ROV, but if you create a device with FL11_0 then you must check individually whether CR or ROV are available.
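A minimal sketch of that pattern, assuming a chosen adapter (error handling omitted):

Code:
#include <d3d12.h>

// Try FL 12_1 first; fall back to FL 11_0, after which the caller
// must CheckFeatureSupport for CR/ROV individually.
ID3D12Device* CreateDeviceWithFallback(IUnknown* adapter)
{
    ID3D12Device* device = nullptr;
    if (SUCCEEDED(D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_12_1,
                                    IID_PPV_ARGS(&device))))
        return device; // CR and ROV guaranteed by FL 12_1

    if (SUCCEEDED(D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0,
                                    IID_PPV_ARGS(&device))))
        return device; // check optional caps before using CR/ROV

    return nullptr;
}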

It's kinda the same with OpenGL: if you create an OpenGL context capable of 4.x you will have n features available for sure, but if the context is 3.x that doesn't mean you can't use extensions to get the same features an OpenGL 4.x context is capable of; you just need to check that those extensions are available, one by one.
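For comparison, a minimal sketch of that per-extension check on a 3.x core context (glGetStringi is a GL 3.0+ entry point, so on Windows it has to be fetched through a loader; that setup is omitted here):

Code:
#include <string.h>
#include <GL/gl.h>  // core GL types; glGetStringi comes from a loader

// Returns 1 if the current context exposes the named extension.
int HasGLExtension(const char* name)
{
    GLint count = 0;
    glGetIntegerv(GL_NUM_EXTENSIONS, &count);
    for (GLint i = 0; i < count; ++i) {
        const char* ext = (const char*)glGetStringi(GL_EXTENSIONS, (GLuint)i);
        if (ext && strcmp(ext, name) == 0)
            return 1;
    }
    return 0;
}

// e.g. HasGLExtension("GL_ARB_compute_shader") on a 3.3 context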
 