Direct3D feature levels discussion

Discussion in 'Rendering Technology and APIs' started by DmitryKo, Feb 20, 2015.

  1. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,426
    Likes Received:
    2,081
    ARK: Survival Evolved hoping for DX12 support this summer ...

    http://steamcommunity.com/app/346110/discussions/0/613957600545694189/
     
  2. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    594
    Likes Received:
    304
  3. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    773
    Likes Received:
    746
    Location:
    55°38′33″ N, 37°28′37″ E
Since Direct3D 12 has hardware feature parity with Direct3D 11.3, it would make sense to have 11_2 as the level that requires tiled resources, since they were introduced in Direct3D 11.2, and level 11_3 that requires CR (conservative rasterization) and ROV (rasterizer ordered views). (I recall the original intention back in early 2014 was to have a new feature level in Direct3D 11.3 to support the new rendering features.)


    Feature levels worked this way in Direct3D 10.1, 11.0 and 11.1 - new hardware features supported in the new version of the runtime were tied to a new feature level with a strict set of requirements. This was a well-understood system - a new version of Direct3D runtime has new APIs and/or driver model improvements to better serve existing hardware, but new hardware features always require a new higher feature level, which is a strict superset of existing levels.

    What made it much more confusing was 10level9 in Direct3D 11.0, an attempt to lure "legacy" D3D9 developers with a taste of the Direct3D 10/11 API, and the reintroduction of optional capabilities in Direct3D 11.1/WDDM 1.2. This is when the support matrix between Windows version, WDDM driver version, Direct3D runtime version, and supported feature level became unmanageable for most developers - especially when you take the installed base of Windows 8/Direct3D 11.1/WDDM 1.2 and 8.1/11.2/1.3 into account, and recall that an attempt to port Direct3D 11.1/WDDM 1.2 downlevel to Windows 7 proved unsuccessful. So any new hardware features above level 11_0 (including the optional features for Nvidia Fermi+ cards!) were never exposed to most users, because Windows 8.x was very unpopular and Windows 7 continued to hold overwhelming market share.


    Yes, it's a sad reality that most people were limited to level 11_0 without even knowing it. HOWEVER it is different now because of the free upgrade to Windows 10, which includes WDDM 2.0, Direct3D 11.3 and 12.0. Most people should be running Windows 10 by the end of 2016, and with the "last Windows version" promise of continuous development, there should be much less confusion and support trouble in regard to OS/runtime/WDDM versions.


    AND since Direct3D 12 does not support downlevel anymore, i.e. levels 9_x and 10_x, it's a kind of clean break already - so it could be the opportunity to use a different numbering scheme for feature levels - including starting anew and renaming Direct3D 12 version of level 11_0 as D3D_FEATURE_LEVEL_A or *LEVEL_0, or maybe *LEVEL_12_0, *12_LOW etc.
     
  4. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    773
    Likes Received:
    746
    Location:
    55°38′33″ N, 37°28′37″ E
    Direct3D 10.0, 10.1 and 11.0 were quite consistent - you had graphics hardware conforming to the strict requirements of Direct3D 10.0, 10.1 and 11.0; the latest runtime supported all previous hardware, but that wasn't a problem, as vendors only advertised the version of the runtime they actually supported at the hardware feature level.

    This pattern was broken with 10level9 and Windows 8/Direct3D 11.1 optional capabilities - this is when Nvidia started advertising "DirectX 11.1" support though in fact having level 11_0 with some capabilities from 11_1, and then they started to advertise "DirectX 11.2" though it had no new mandatory hardware requirements...


    We had a discussion on this in another thread.
    https://forum.beyond3d.com/posts/1836875/

    Basically, capability bits in D3D9 were an unmanageable hell for developers because there never was a lowest common denominator set of features across different vendors.

    Direct3D 10 in Vista was designed to solve this problem radically by having absolutely no caps (besides resource format support) and strictly requiring a common "10.0" feature set that each GPU had to adhere to, and it wasn't until Direct3D 11.1 that optional capabilities were reintroduced again.


    NVidia said they only omitted the features which are not really useful for game development.

    http://www.guru3d.com/news-story/nvidia-kepler-not-fully-compliant-with-directx-11-1,3.html

    More important is that neither the options on level 11_0 nor the full feature level 11_1 can be supported in Windows 7 because it lacks WDDM 1.2. Or that most games were hardly even using level 11_0 features such as advanced pipeline stages, in spite of being advertised as "DirectX 11".
     
    #284 DmitryKo, Jun 9, 2015
    Last edited: Jun 9, 2015
  5. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    773
    Likes Received:
    746
    Location:
    55°38′33″ N, 37°28′37″ E
    The API is designed to be bindless - everything is in resource descriptor heaps/tables, and there is no automatic resource/memory management on the API side. You just populate the heaps with links to resources in local or system memory, and then use "root signatures" to link these resources with your shaders.

    However some hardware imposes restrictions on both the total number of descriptors for "access views" (UAVs, CBVs, SRVs), and the number of these "access view" descriptors and texture samplers that can be passed to each shader stage. So if you have Resource Binding Tier 1 (not bindless) or Tier 2 (partially bindless) hardware, you must track these limits for each shader stage or the runtime will throw errors. On Tier 3 (fully bindless) hardware, there are no practical limits for these descriptors.
     
    #285 DmitryKo, Jun 9, 2015
    Last edited: Jun 9, 2015
  6. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Thank you, so basically tiers 1 and 2 (maybe - forgot) don't support "true" bindless and it's emulated by the driver.
     
  7. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    594
    Likes Received:
    304
    It's not emulated, tier1 and tier2 limitations must be manually managed by the application.
     
  8. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    That's not true. Particularly stuff like resource binding (and bindless) is only exposed in D3D12 and has no equivalent in 11.3.

    There have actually been optional capabilities in every version of DX, but I agree they swung back in that direction a bit with 11.1.

    The alternative isn't any more manageable - you need the latest OS *and* the latest hardware to use a new API. Then developers would just ignore the new APIs until 5+ years after they ship, like they do with CPU ISAs. DX already takes flak about not having extensions for the bleeding edge stuff, so I doubt this would be acceptable to anyone.

    Partially the system was "simple" before because everything always starts simple. It's fairly unavoidable that when you actually build up a few years of legacy hardware and software it's going to get somewhat more complex. I don't see a clearly better way for it to be handled than what DX is doing, although it would be much nicer if feature levels just had unrelated names.

    "Emulate" and "true" bindless are the wrong ways to think about it. On any DX12 compatible hardware you can dynamically index portions of the descriptor heap. On lower resource binding tiers these ranges are fairly small (~128 resources) so you still need to move around descriptor tables. On higher resource tiers you can dynamically index the entire heap (~1 million resources) so you no longer need multiple descriptor tables - you can just have a single one that has a view of the entire heap. This is generally good for driver efficiency but more importantly it is required for global scene access (deferred texturing, ray tracing, etc) as you don't know up front which textures you may need to access.
     
  9. Osamar

    Newcomer

    Joined:
    Sep 19, 2006
    Messages:
    204
    Likes Received:
    21
    Location:
    40,00ºN - 00,00ºE
    So in plain English, and for sillies such as myself.

    DX11, use a lot of trains with one engine and one car.
    DX12 tiers 1 and 2, use much less trains, with one engine and some or some more cars.
    And DX12 tier 3 use one train with one engine and as many cars as needed.
     
  10. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,385
    Likes Received:
    299
    Location:
    NY
    I don't agree. We can aim a little higher than the status quo. :)

    I think your idea about separating feature level names and DirectX is a good start. I know this makes marketing (stickers on a box) harder, but I think each version of DirectX (e.g. 11, 11.1, 11.2, 11.3, 12, etc.) should have its own set of feature levels (e.g. feature level 1, 2, 3, etc.). While this may create many duplicate feature levels (it would be possible that feature level 1 in DX11 is the same as in DX11.1), it would make the question "Does this GPU support DirectX 12?" a simple yes or no question (no ambiguity with feature levels). It would make clear to users that they really want to be asking two questions: does this GPU support DirectX 12 (the software end), and if so, what feature level (the hardware end)?

    It would also allow you to more or less retroactively add, make optional, or remove features in a straightforward manner. Maybe you decided feature level 2 in DX11.1 had silly requirements that you could correct in DX11.2. Can you imagine the confusion if MS decided Kepler was a feature level 11_1 GPU in DX11.2 but not DX11.1/11.0? With this system they could make those changes without vast confusion. You could change how GPUs are grouped together whenever and however you wanted.

    Finally this may be the most controversial aspect but I think each version of DirectX should be set in stone. Want to add optional features or new feature levels? Create a new version of DirectX. I'm aware I haven't really thought this through, but I think part of the current problem is DirectX feels like it's a moving target. Having set apis would keep things crystal clear at any point in the life cycle (and on any platform). While I appreciate MS's effort to back port as much of DirectX as they can, I think ultimately ideas like the platform update for Windows 7 were mistakes (caused great confusion). I don't like how there's various optional features depending on what operating system (or worse Windows update) a user has installed. It's much easier checking if a whole api exists or not. And since we're setting the api in stone, I also think each api should contain a "maxed out" feature level (all optional features, formats, and tiers are supported at the highest level). That will allow developers to "future proof" their renderers more easily while also putting pressure on IHVs to adopt new features faster (who doesn't want a DirectX 12 feature level 5 card!?!). I don't think currently there's much pressure on IHVs to support some of these higher tiers that aren't feature level requirements.

    Hopefully this system eliminates more confusion than it adds. :razz:
     
  11. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    I'll review my DX12 resource binding "stuff" and see if I can see why you say its the wrong way to think about it. If not I'll be back with more questions. Thanks.
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Also 11.3 is lacking ExecuteIndirect, async copy and async compute. I would assume it is also lacking GPU buffer predicates (skip commands based on a GPU buffer value).
    Unfortunately DirectX had to introduce descriptor heaps. On a pure bindless hardware such as GCN, resource descriptors can be stored and loaded from anywhere (by a standard memory load/store). You could even create a resource descriptor by shader code (by scalar unit integer bit operations to a SGPR). DirectX and OpenGL bindless resources are a bit less powerful, as these APIs are designed to meet the hardware limitations of GPUs that are not pure bindless.

    Deferred texturing can be achieved by tiled resources (or with software virtual texturing). I would say that these alternatives are better for deferred texturing, but my reasoning is based on the current GPU implementations of bindless in cases where threads inside a single wave/warp access more than one resource (resource descriptor is not wave/warp invariant).
     
  13. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,514
    Likes Received:
    934
    I would assume that you're talking about what is possible on consoles, and probably exposed by Mantle as well. But what about Vulkan?
     
  14. MJP

    MJP
    Regular

    Joined:
    Feb 21, 2007
    Messages:
    566
    Likes Received:
    187
    Location:
    Irvine, CA
  15. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    594
    Likes Received:
    304
    I also wonder why Microsoft didn't add a cap-bit to expose 64 UAV slots and UAV access in all shader stages for what was, like... 50% of the GPU market?

    Two logical possibilities:

    1) Fermi and Kepler fail to manage the UAV "revision" introduced with D3D 11.1
    2) Microsoft was just evil or the very rich AMD bribed their monkey dancing CEO

    What's that?
     
  16. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,514
    Likes Received:
    934
    A wireframe rendering mode dedicated to Minecraft, perhaps? :p
     
  17. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yes and I'm surprised they didn't go this route in Mantle, but I would actually disagree with the notion that this represents the clear trend in hardware design. If you want to pass the full descriptor data to samplers then you need very wide SIMD to amortize the cost. GCN obviously has wide SIMD, but that's a fundamental architecture tradeoff that has as many negatives as positives, and I believe GCN is the only architecture that works that way.

    Fundamentally the data-paths between the execution units and samplers do have to handle a lot of bandwidth already and "compression" of the data passed between these two is highly desirable. In this case, the compression takes the form of passing an offset/index instead of the full descriptor data (usually ~128-256 bit) or even a 64-bit pointer. While it's convenient from a programming standpoint to have "just data" and "just pointers", I don't necessarily buy the argument that in this particular case it's functionally important enough to scatter descriptors around memory that it's worth constraining hardware designs or hampering performance for.

    Like I said, it's clearly useful to index a large set of descriptors and most hardware can support that without too much overhead. The cost/benefit ratio of putting descriptor data in non-contiguous memory is far less clear.

    It's almost always better to pack things into atlases/texture arrays where possible, and indeed this is precisely for the reasons I outlined above - it's data compression :) You're using fewer bits by just adding some address range to one texture than creating entirely new descriptors.
     
  18. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Yeah it's completely arbitrary what they put in those two lists... or more realistically they made lists based on what their hardware can/can't do and labeled one "gaming" to save face :)
     
  19. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,426
    Likes Received:
    2,081
    Almost deja vu ... I guess Hallock's recent comments should be taken with a grain of salt.

    http://www.computerbase.de/2015-06/directx-12-amd-radeon-feature-level-12-0-gcn/
     
  20. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    773
    Likes Received:
    746
    Location:
    55°38′33″ N, 37°28′37″ E
    I know, but what would be the practical implications?

    New rendering features in 11.3 are available on the same hardware as in Direct3D 12 and require the same WDDM 2.0 drivers - it's an update for those not wishing to go the 12.0 route just yet. It's not like developers will first make a Direct3D 12 version with bindless shaders and other 12_0 features, then port it down to Direct3D 11.3 just to run slower on the same hardware and rework all the shaders again.

    Yes, I said it before that resource formats and their supported operations are optional.
    D3D10_FORMAT_SUPPORT
    D3D11_FORMAT_SUPPORT
    D3D12_FORMAT_SUPPORT1


    At least on Windows 10, if you have the latest hardware, you can finally use its features - for AMD and NVidia, that's almost anything they released in the last 3-4 years. And the free OS upgrade is an in-place update process initiated through Windows Update.

    This should be clearly an improvement over current OS/WDDM/Direct3D fragmentation.

    Yes, apparently the source of the problem is not really with the naming of feature levels, but with hardware support.

    The confusion started when NVidia did not implement the full level 11_1 requirements in their hardware for some reason - and continued when Windows 8/WDDM 1.2 failed in the market, in effect making level 11_1 support much less relevant, since level 11_0 features were the highest users could practically get on Windows 7. Clever naming schemes are the most effortless way to minimize the consequences of moving off the "no options" strategy.


    I wonder how this would work for future versions of Direct3D 12.x - will they return to the more strict approach of Direct3D 10.x and 11.0, where the feature levels included an extended list of strict requirements, or will they follow the current path of 11.1, where only a few major features are required on the higher levels and most other features are optional? I guess it also depends on whether these new features will be driven by hardware vendors or application software developers...

    But these differences also apply to feature levels 11_0 and 11_1.

    Each major version should probably have its own feature levels. And thankfully Microsoft settled on 11_0 as a minimum supported in Direct3D 12 and excluded levels 9_x and 10_x.

    Trying to freeze minor revisions or build numbers is a bit extreme - Microsoft tried it with the HLSL compiler which had a different DLL name in each DirectX SDK revision, but finally settled with d3dcompiler_47.dll and made it part of the OS in Windows 8.1.

    Yes, this is how it works on the game consoles, but what would be the benefits of "freezing" minor revisions of Direct3D runtime in Windows, where both graphics drivers by vendors and the OS platform itself are indeed a moving target?


    Probably because Fermi/Kepler/Maxwell do not support it - at least the way it was implemented in Direct3D, since OpenGL limits seem to be quite different.
     
    #300 DmitryKo, Jun 10, 2015
    Last edited: Jun 10, 2015

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.