What time (and what's the session number) for your talk?
I can't figure out how to link the session on the IDF site :S In any case, it's GVCS004 - 3D Optimization for Intel Graphics, Gen9, at 9:30-10:30 AM in Room 2007.
Without taking away from the usefulness of other DX12 features, I'm surprised by how little asynchronous compute/shaders (aka "multi-engine" in Microsoft terms) is mentioned in this thread.
I think that's just because this is a "feature levels" thread and as noted, all DX12 implementations must support asynchronous queues. From a developer perspective, the capability is not hidden behind a cap because it's supported everywhere, and that's great. There are tons of great DX12 features "in general" (I'll point to execute indirect as another) that are supported everywhere.
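To make the "no cap required" point concrete, here's a minimal sketch of creating separate direct and compute queues on a D3D12 device (error handling elided; assumes the standard d3d12 headers on Windows, and the function name is just illustrative):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// No cap check needed: any D3D12 device can create direct, compute and copy queues.
void CreateQueues()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC desc = {};            // Priority/Flags/NodeMask defaults
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;    // graphics + compute + copy
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;   // "async compute" queue
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```

Whether work submitted to those two queues actually overlaps on the hardware is, of course, exactly the implementation-dependent part being discussed below.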
Incidentally, there are several other threads that are entirely dedicated to async compute, so I don't think we need to derail this one too much.
Multi-Engine efficiency will vary across devices, but unfortunately there is no cap or feature level to indicate the level of support.
A "cap" for that sort of thing is highly problematic for a number of reasons. How much you gain from using multiple queues depends not only on the hardware architecture, but on the nature of the workload itself. For instance, it's obviously not going to be much faster to "asynchronously" run a sampler-heavy task alongside another sampler-heavy task regardless of the hardware, since both are contending for the same units. On the hardware side it gets far trickier... depending on the architecture, a given workload might already map efficiently onto the majority of the machine, or there could be constraints that prevent that, leaving much of the machine idle. Even different SKUs can behave very differently here, with wider ones typically needing more explicit parallelism.
It's quite similar to CPUs though - the ideal is always to mortgage as little parallelism as possible while still filling the machine, as there are always parallelization overheads (in this case largely the additional synchronization and scheduling). Unfortunately there's no easy way to know "how much" parallelism an implementation needs, and I don't think a simple caps bit could express that given the inherent complexities.
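The CPU analogy can be made concrete with a toy sketch (nothing here is from any graphics API; `parallel_sum` is just an illustrative name). Splitting a fixed amount of work across more threads gives the same answer, but every extra thread is "mortgaged" parallelism: more spawn/join, scheduling, and synchronization cost for the same total work.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Sum 0..n-1 using `workers` threads (workers >= 1). Each thread accumulates
// privately and synchronizes once at the end; the spawn/join and atomic
// traffic are the parallelization overhead you pay to fill the machine.
uint64_t parallel_sum(uint64_t n, unsigned workers) {
    std::atomic<uint64_t> total{0};
    std::vector<std::thread> pool;
    uint64_t chunk = (n + workers - 1) / workers;  // split work evenly
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            uint64_t begin = w * chunk;
            uint64_t end = std::min(n, begin + chunk);
            uint64_t local = 0;                    // accumulate privately...
            for (uint64_t i = begin; i < end; ++i) local += i;
            total += local;                        // ...sync once at the end
        });
    }
    for (auto& t : pool) t.join();                 // join/scheduling overhead
    return total;
}
```

The result is identical at any thread count; only the overhead changes, which is exactly why "how much parallelism do I need to fill this machine" doesn't reduce to a caps bit.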
(on AMD GCN this is a wavefront).
Not to get too off-topic, but can a single CU be running both compute and 3D at once? I thought some resources were shared with the 3D pipe, such that the "granularity" of switching between 3D and compute is at least a whole CU, right? Otherwise is the smem just sitting around doing nothing while 3D is running? :O
Obviously for pure compute workloads most (all?) implementations can mix multiple kernels at HW-thread/"wavefront" granularity.