Direct3D feature levels discussion

I posted it since I am not aware of any NV-related public documentation. If the GPU provides some sort of hardware implementation, then copy and compute queues are executed in parallel where possible; otherwise they are serialized by the driver. I am pretty sure that all DX12-capable desktop GPUs take advantage of async copy operations. That's not the same for async compute operations, where only GCN (all revisions), Maxwell 2.0 and Skylake should take advantage of async compute. By "take advantage" I mean the GPU provides some sort of hardware implementation of the copy/compute engines that allows concurrent execution, increasing parallelism - but this is hidden from the D3D12 API.
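To be concrete, this is all the API exposes - one queue type per engine, with any concurrency left to the hardware and driver. A minimal sketch, assuming an existing ID3D12Device* device:

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

D3D12_COMMAND_QUEUE_DESC desc = {};
ComPtr<ID3D12CommandQueue> directQueue, computeQueue, copyQueue;

desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
device->CreateCommandQueue(&desc, IID_PPV_ARGS(&directQueue));

desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy
device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // copy only
device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));

Whether these three queues actually execute concurrently is exactly the hidden hardware detail described above.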
 
...async compute operations, where only GCN (all revisions), Maxwell 2.0 and Skylake should take advantage of async compute. By "take advantage" I mean the GPU provides some sort of hardware implementation of the copy/compute engines that allows concurrent execution, increasing parallelism - but this is hidden from the D3D12 API.
I am under the impression that Intel Haswell and Broadwell also support async compute, but not at the same time as graphics. You should get gains if you run multiple compute queues simultaneously (without graphics). This would be beneficial for games that do not use the rasterizer at all (like Media Molecule's Dreams).
Are Volume Tiled Resources the same as Tiled Resources with the exception of working with 3D textures?
Yes. For example, a 1 byte/pixel 2d texture tile is 256x256 pixels (64 KB page). A 3d texture tile of the same texture format would be 64x32x32 pixels.
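You can verify those numbers at runtime, since the API reports the standard tile shape per resource. A sketch, assuming an existing device and an already-created tiled texture (PrintTileShape is just an illustrative helper name):

Code:
#include <cstdio>
#include <d3d12.h>

void PrintTileShape(ID3D12Device* device, ID3D12Resource* texture)
{
    UINT numTiles = 0;
    D3D12_PACKED_MIP_INFO packedMips = {};
    D3D12_TILE_SHAPE tile = {};   // tile extents in texels
    UINT numSubresourceTilings = 0;
    device->GetResourceTiling(texture, &numTiles, &packedMips, &tile,
                              &numSubresourceTilings, 0, nullptr);
    // At 1 byte/pixel: expect 256x256x1 for Texture2D, 64x32x32 for Texture3D
    printf("tile = %u x %u x %u texels\n",
           tile.WidthInTexels, tile.HeightInTexels, tile.DepthInTexels);
}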
 
I am under the impression that Intel Haswell and Broadwell also support async compute, but not at the same time as graphics. You should get gains if you run multiple compute queues simultaneously (without graphics). This would be beneficial for games that do not use the rasterizer at all (like Media Molecule's Dreams).
That's nice to know, and this should add a little extra flexibility in heterogeneous multi-adapter scenarios with Intel Gen7.5/8. I had always considered async compute only in terms of concurrent execution alongside graphics.
 
What time (and what's the session number) for your talk?
I can't figure out how to link the session on the IDF site :S In any case it's GVCS004 - 3D Optimization for Intel Graphics, Gen9 at 9:30-10:30 AM, Room 2007.

Without taking away from the usefulness of other DX12 features I'm surprised by how little asynchronous compute/shaders (aka "multi-engine" in Microsoft terms) is mentioned on this thread.
I think that's just because this is a "feature levels" thread and as noted, all DX12 implementations must support asynchronous queues. From a developer perspective, the capability is not hidden behind a cap because it's supported everywhere, and that's great. There are tons of great DX12 features "in general" (I'll point to execute indirect as another) that are supported everywhere.

Incidentally there are several other threads that are entirely dedicated to async compute, so I don't think we need to derail this one too much :)

Multi-Engine efficiency will vary across devices but unfortunately there is no CAP or feature level to indicate the level of support.
A "cap" for that sort of thing is highly problematic for a number of reasons. How much better doing things with multiple queues is will depend not only on the hardware architecture, but the nature of the workload itself. For instance, it's obviously not going to be much faster to "asynchronously" run a sampler heavy task alongside another sampler heavy task regardless of the hardware. On the hardware side it becomes far trickier... depending on the architecture a given workload could already be efficiently mapped to use the majority of the machine, or there could be constraints that prevent that leaving much of the machine idle. Even different SKUs can behave very differently here with wider ones typically needing more explicit parallelism.

It's quite similar to CPUs though - the ideal is always to mortgage as little parallelism as possible to fill the machine, as there are always parallelization overheads (in this case largely due to the additional synchronization and scheduling). Unfortunately there's no easy way to know "how much" parallelism an implementation needs, and I don't think there's a simple caps bit that could express that given the inherent complexities.
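To make that overhead concrete: cross-queue ordering in D3D12 is expressed with fences, and every Signal/Wait pair below is synchronization and scheduling work the single-queue path never pays. A sketch with hypothetical computeQueue/directQueue/fence objects and pre-recorded command lists:

Code:
UINT64 fenceValue = 1;

computeQueue->ExecuteCommandLists(1, computeLists);
computeQueue->Signal(fence.Get(), fenceValue);   // GPU-side signal when done

directQueue->Wait(fence.Get(), fenceValue);      // GPU-side wait, no CPU stall
directQueue->ExecuteCommandLists(1, graphicsLists);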

(on AMD GCN this is a wavefront).
Not to get too off-topic, but can a single CU be running both compute and 3D at once? I thought there were resources that were overlapped and used by the 3D pipe such that the "granularity" of switching between 3D and compute is at least a whole CU, right? Otherwise is the smem just sitting around doing nothing while 3D is running? :O

Obviously for pure compute workloads most (all?) implementations can mix multiple kernels at a HW thread/"wavefront" granularity.
 
Do vendors expose Feature Levels to Direct3D through their drivers?

Is it up to vendors to expose or hide resources and Feature Levels?

Can a card that is FL11.1 today become FL12.0 after a driver update?
 
Do vendors expose Feature Levels to Direct3D through their drivers?

Is it up to vendors to expose or hide resources and Feature Levels?

Can a card that is FL11.1 today become FL12.0 after a driver update?

This post from nvidia seems to confirm what you said: http://blogs.nvidia.com/blog/2015/05/15/dx-12-game-ready-drivers/
Plus, our Maxwell and Kepler GPU architectures already support DX12, with support for Fermi coming later.
 
Hm?
Of course the vendors expose their cards' capabilities through their drivers. Mostly, they have a vital interest in not holding anything back that the ASIC can do and that is somewhere represented in DX space. Sometimes they are not quite ready with validation, and so a driver initially might not expose all features. Some might also think IHVs could be tempted to use this as a means of product differentiation - as is already the case between consumer and professional cards, though not strictly in terms of DX cap bits.
 
Do vendors expose Feature Levels to Direct3D through their drivers? Is it up to vendors to expose or hide resources and Feature Levels?
Yes, obviously.

Can a card that is FL11.1 today become FL12.0 after a driver update?
Features can be enabled gradually but only if the card does have the required hardware. Engineering resources are always limited, so development of new features and APIs can take some time - and comprehensive testing probably requires more time than actual development.

The chances for any existing card getting a higher feature level are very slim though. The hardware features of current cards have been known for years - even the new resource management model in Direct3D 12 with virtual memory addressing support from the GPU was planned by graphics vendors for like 8 years and the latest 2011+ GPUs were designed accordingly.
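For reference, this is what the runtime query looks like - if a driver update did enable a higher level on the same card, this is where an application would see it. A minimal sketch, assuming an existing ID3D12Device* device:

Code:
D3D_FEATURE_LEVEL requested[] = {
    D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
    D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1
};
D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
levels.NumFeatureLevels = _countof(requested);
levels.pFeatureLevelsRequested = requested;
device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                            &levels, sizeof(levels));
// levels.MaxSupportedFeatureLevel now holds the highest supported level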


Considering the possibility of feature level 11_1 being "updated" to 12_0: it was said multiple times that AMD's GCN 1.0 cards do not really differ much from GCN 1.1/1.2, even though they "only" conform to feature level 11_1 and not 12_0 - indeed they already have two of the three required 12_0 features enabled as options, specifically Resource Binding Tier 2 and Typed UAV Loads.
https://en.wikipedia.org/wiki/Feature_levels_in_Direct3D#matrix

The only actual difference is Tiled Resources Tier 1 in GCN 1.0 versus Tier 2, which is required for feature level 12_0 and implemented in GCN 1.1+; but these two tiers are very similar, with no performance difference or major missing features. NVIDIA considers the two tiers practically similar:

TIER_1
  • Tiled Resource and Tile Pool creation supported
  • Accessing (r/w) NULL mapped tiles has undefined behavior
    • Up to the user to define “default” tile and point all “unmapped” tile mappings to it
  • Available on all AMD and NVIDIA hardware from the past few years

TIER_2
  • Relaxes some restrictions
  • Accessing NULL mapped tiles now defined to return zero
    • Writes to NULL mapped discarded
  • Sample instructions for LOD clamp and getting feedback supported

TIER_1 vs. TIER_2

                                TIER 1      TIER 2
  Tiled Resources               √           √
  Tile Pool                     √           √
  LOD clamp Sample instruction  x           √
  Feedback Sample instruction   x           √
  NULL mapped behavior          undefined   zero

In general, almost all algorithms can be mapped to both tiers
  • For example, LOD clamp can be approximated with explicit LOD and gather4
  • Tier 2 generally just an optimization
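From the application side, the whole tier distinction collapses into a single enum query; a sketch, assuming an existing ID3D12Device* device:

Code:
D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                            &options, sizeof(options));
if (options.TiledResourcesTier >= D3D12_TILED_RESOURCES_TIER_2) {
    // NULL-mapped tiles read as zero; LOD clamp and feedback
    // sample instructions are available
}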
 
Are Volume Tiled Resources the same as Tiled Resources with the exception of working with 3D textures?
Yes. "Volume Tiled Resources" is a friendly name for Tiled Resources Tier 3 which indicates support for tiled Texure3D resources. Tiers 1 or 2 indicate support for both tiled Buffer and tiled Texture2D (plus some additional features detailed in the post above).

Note that this is format-dependent, i.e. the DXGI format used must have both the Texture3D (or Texture2D, Buffer) and Tiled Resource support flags; the tiers just indicate that all such formats support the relevant resource creation APIs and shader operations.

https://channel9.msdn.com/Events/Build/2013/4-063
https://msdn.microsoft.com/en-us/library/dn903951(v=vs.85).aspx
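The per-format check described above would look roughly like this (a sketch, assuming an existing device; DXGI_FORMAT_R8_UNORM is just an example format):

Code:
D3D12_FEATURE_DATA_FORMAT_SUPPORT fs = { DXGI_FORMAT_R8_UNORM };
device->CheckFeatureSupport(D3D12_FEATURE_FORMAT_SUPPORT, &fs, sizeof(fs));
bool tiledVolumeOk =
    (fs.Support1 & D3D12_FORMAT_SUPPORT1_TEXTURE3D) != 0 &&
    (fs.Support2 & D3D12_FORMAT_SUPPORT2_TILED) != 0;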

It's different - Fermi drivers currently don't support Direct3D 12 at all; it's not like they just miss a few important features which will be added later.
 
That table already looks suspect because GCN 1.1+ is FL 12.0 and does not support Volume Tiled Resources.
Link? GCN (1.1+) supports 3D swizzle, which should be enough for volume tiled resources when tiled resources are supported too?
 
Strange, I always assumed that "Volume textures", "3D textures" and "Texture3D" are synonyms.

Volume Tiled Resources says

Volume (3D) textures can be used as tiled resources, noting that tile resolution is three-dimensional.
but also that

D3D12_FEATURE_DATA_D3D12_OPTIONS: holds the supported tile resource tier level and a boolean, VolumeTiledResourcesSupported, indicating whether volume tiled resources are supported.
and there is no such VolumeTiledResourcesSupported member in the latest SDK.


Also D3D12_TILED_RESOURCES_TIER defines that D3D12_TILED_RESOURCES_TIER_3 adds Texture3D support:

Indicates that a superset of Tier 2 is supported, with the addition that 3D textures are supported.
 
Volume Textures, 3D Textures and Texture3D are synonyms. All GPUs for at least as long as I've been doing computer graphics have supported 3D Textures.

What Volume *Tiled* Resources introduces is the ability for 3D Textures to be used as Tiled Resources (aka Sparse Textures, aka Partially Resident Textures), and that is the capability GCN has never been reported to support. I have a demo out there at the moment that makes use of Volume Tiled Resources and know that the Fury X and 390X both don't report support for Tier 3 Tiled Resources.
 
VolumeTiledResourcesSupported was a caps-bit in an older never-public SDK... Volume tiled resources is a D3D12_TILED_RESOURCES_TIER / D3D11_TILED_RESOURCES_TIER tier.
 
What Volume *Tiled* Resources introduces is the ability for 3D Textures to be used as Tiled Resources (aka Sparse Textures, aka Partially Resident Textures), and that is the capability GCN has never been reported to support. I have a demo out there at the moment that makes use of Volume Tiled Resources and know that the Fury X and 390X both don't report support for Tier 3 Tiled Resources.
GCN hardware should support virtual memory for any resource (64 KB pages). As far as I understand, DX 12.1 volume tiled resources specifically means support for a 3d swizzle layout (the tiles are cubes, not slices with z=1).

Example 1 bpp 3d texture:
- With 3d swizzle layout = 64x32x32 pixel tiles
- With 2d swizzle layout = 256x256x1 pixel tiles

Data structures such as sparse (voxel/SDF) volumes NEED proper 3d swizzle layout to work properly. I don't know whether you can create 3d tiled textures in DirectX 11 (with 2d swizzle layout), since our main development computers are Windows 7 based (I have never been able to experiment with DirectX 11.2 tiled resources on PC).
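On D3D12, creating such a volume as a tiled resource would look roughly like this (a sketch, assuming a device that reports Tiled Resources Tier 3; the 512^3 extent and R8 format are placeholders):

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

D3D12_RESOURCE_DESC desc = {};
desc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE3D;
desc.Width = 512;
desc.Height = 512;
desc.DepthOrArraySize = 512;
desc.MipLevels = 1;
desc.Format = DXGI_FORMAT_R8_UNORM;  // 1 byte/texel -> 64x32x32 tiles
desc.SampleDesc.Count = 1;
// Reserved (tiled) resources must use the undefined-swizzle 64 KB layout
desc.Layout = D3D12_TEXTURE_LAYOUT_64KB_UNDEFINED_SWIZZLE;

ComPtr<ID3D12Resource> volume;
device->CreateReservedResource(&desc, D3D12_RESOURCE_STATE_COMMON,
                               nullptr, IID_PPV_ARGS(&volume));
// Physical 64 KB pages are then bound per-tile via
// ID3D12CommandQueue::UpdateTileMappings.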
 