Direct3D feature levels discussion

An ACE is an "engine" in this context, however. It is a unit that processes queue messages. An Nvidia front-end unit that processes queue messages sounds a lot like one of the ACEs for this purpose.
 
An ACE is an "engine" in this context, however. It is a unit that processes queue messages. An Nvidia front-end unit that processes queue messages sounds a lot like one of the ACEs for this purpose.

The problem is that the two are not implemented the same way, nor at the same level.
 
As DirectCompute guy "Chas Boyd" described in his Dx12/DirectCompute talk ... an "Engine" in the Dx12 world is any type of "core", whether it be a CPU core, GPU core, COPY core...


i.e. All these types of "cores"

[slide image]


are now just called "Engines" for "historical reasons within the Dx code"

[slide image]



and each "Engine" can have 1 "command" queue for independent async operations

This 1 queue per engine guarantees serial order of execution

You can also prioritize multiple queues (and as explained above each queue is a core "engine") ..
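
In API terms, creating one queue per engine type looks roughly like this. This is a minimal sketch, assuming an already-created ID3D12Device named device; the Priority field is where multiple queues get prioritized against each other:

Code:
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Sketch: one command queue per engine type (3D, compute, copy).
// A real app would keep these around; they are released at scope exit here.
void CreateEngineQueues(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL; // or _HIGH

    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue, copyQueue;

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // the 3D engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // the compute engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;     // the copy engine
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));
}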
 
Source code is attached for anyone who already has a Windows 10 preview with a D3D12-capable card and wants to play with the VS2015 CTP6 SDK. Absolutely no UI, you have to use the debugger.
I have compiled a small Win32 console app that enumerates the hardware adapters in the system, opens a Direct3D 12 device on each of them to get the options, and outputs them to the screen. It can also output to a file using simple output redirection (>).
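The gist of the approach is just DXGI adapter enumeration plus CheckFeatureSupport. Roughly like this (a simplified sketch, not the attached code verbatim; link with d3d12.lib and dxgi.lib):

Code:
#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        printf("ADAPTER %u\n\"%ls\"\n", i, desc.Description);

        ComPtr<ID3D12Device> device;
        HRESULT hr = D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                       IID_PPV_ARGS(&device));
        if (FAILED(hr))
        {
            printf("Failed to create Direct3D 12 device, error %08X\n", (unsigned)hr);
            continue;
        }

        // Highest supported feature level, probed from a list of candidates.
        static const D3D_FEATURE_LEVEL levels[] = {
            D3D_FEATURE_LEVEL_12_1, D3D_FEATURE_LEVEL_12_0,
            D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };
        D3D12_FEATURE_DATA_FEATURE_LEVELS fl = { _countof(levels), levels };
        device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS, &fl, sizeof(fl));
        printf("Maximum feature level : 0x%X\n", fl.MaxSupportedFeatureLevel);

        // All the caps in the output below come from this one struct.
        D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
        device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                    &options, sizeof(options));
        printf("TiledResourcesTier : %d\n", options.TiledResourcesTier);
        printf("ResourceBindingTier : %d\n", options.ResourceBindingTier);
        // ...and so on for the remaining fields.
    }
    return 0;
}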
Example as run on my system:
Code:
ADAPTER 0
"AMD Radeon R9 200 Series (Engineering Sample - WDDM v2.0)"
VEN 1002, DEV 67B0, SUBSYS 30801462, REV 00
Dedicated video memory : 3221225472 bytes
Direct3D 12 is supported
Maximum feature level :  D3D_FEATURE_LEVEL_11_1 (11.1)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_NONE (0)
TiledResourcesTier : D3D12_TILED_RESOURCES_NOT_SUPPORTED (0)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 0
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_NOT_SUPPORTED (0)
MaxGPUVirtualAddressBitsPerResource :
StandardSwizzle64KBSupported : 0
ASTCProfile : D3D12_ASTC_PROFILE_NOT_SUPPORTED (0)
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
GPU Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0

ADAPTER 1
"Intel(R) HD Graphics 3000"
VEN 8086, DEV 0112, SUBSYS 01121849, REV 09
Dedicated video memory : 33554432 bytes
Failed to create Direct3D 12 device
Error 887A0004: The specified device interface or feature level is not supported on this system.

FINISHED running on 2015.04.03 02:32.32
2 hardware adapters found
 
As DirectCompute guy "Chas Boyd" described in his Dx12/DirectCompute talk ... an "Engine" in the Dx12 world is any type of "core", whether it be a CPU core, GPU core, COPY core...


i.e. All these types of "cores"



are now just called "Engines" for "historical reasons within the Dx code"

and each "Engine" can have 1 "command" queue for independent async operations

This 1 queue per engine guarantees serial order of execution

You can also prioritize multiple queues (and as explained above each queue is a core "engine") ..
So how does this work with ACEs doing 8 queues each? Are they shown as 8 engines each?
 
So how does this work with ACEs doing 8 queues each? Are they shown as 8 engines each?
In HSA, if my memory is correct, the ACEs aka the packet processors are hidden behind their parent HSA Agent. Applications request the HSA runtime to create queues for a particular HSA Agent, but they have no control over, or visibility into, how the queues are pushed to the packet processors. In other words, it is the driver's responsibility to schedule and bind the queues to the ACEs.

I think these low-level graphics APIs will go the same way, as the queue semantics already imply concurrency, leaving it to the implementation to bind the queues to different packet processors (or "engines").
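
For illustration, queue creation through the HSA runtime looks roughly like this (a sketch against the HSA 1.0 API; gpu_agent is assumed to have been found earlier via hsa_iterate_agents). Note that nothing in the call identifies a particular ACE:

Code:
#include <hsa/hsa.h>
#include <stdint.h>

/* Sketch: the application only names the agent; which packet processor
   (ACE) ends up servicing the queue is decided behind the scenes. */
hsa_queue_t *create_queue(hsa_agent_t gpu_agent)
{
    hsa_queue_t *queue = NULL;
    hsa_queue_create(gpu_agent,
                     4096,                  /* capacity in packets (power of two) */
                     HSA_QUEUE_TYPE_MULTI,  /* multiple producers may enqueue */
                     NULL, NULL,            /* error callback and its data */
                     UINT32_MAX,            /* private segment size: unrestricted */
                     UINT32_MAX,            /* group segment size: unrestricted */
                     &queue);
    return queue;
}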
 
A few bits to add.

In other words, it is the driver's responsibility to schedule and bind the queues to the ACEs.
In fact, it can also be the hardware's responsibility. AMD's HSA driver by default enables hardware scheduling, where the driver provides only a run list to a hardware scheduler. The HW scheduler is independent from the "pipes" aka ACEs, if I am not mistaken, and is responsible for (i) scheduling, binding and unbinding the queues to/from hardware slots in the pipes (8 slots per pipe for CI), (ii) binding and unbinding the processes to/from the hardware (e.g. PASID-VMID) and (iii) handling doorbell signals of the queues. With it, oversubscription of queues is supported, and it is said to handle up to 1024 queues per process and up to 512 thousand queues per device...

I think these low-level graphics APIs will go the same way, as the queue semantics already imply concurrency, leaving it to the implementation to bind the queues to different packet processors (or "engines").
So I guess it would expose a 3D engine, a compute engine and a copy engine. The 3D engine should have just a single queue anyway, and applications are free to allocate an arbitrary number of queues for compute and copy.
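
In API terms, something like this (a sketch; the queue counts are arbitrary, and how the compute queues map onto ACE hardware slots would be entirely the driver's business):

Code:
#include <d3d12.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Sketch: a single queue for the 3D engine, and as many compute/copy
// queues as the application cares to create.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfx,
                  std::vector<ComPtr<ID3D12CommandQueue>>& compute,
                  std::vector<ComPtr<ID3D12CommandQueue>>& copy)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfx));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    for (auto& q : compute)   // e.g. resized to 16 by the caller
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    for (auto& q : copy)
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q));
}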
 
I have compiled a small Win32 console app that enumerates the hardware adapters in the system, opens a Direct3D 12 device on each of them to get the options, and outputs them to the screen. It can also output to a file using simple output redirection
Ryan, can you run this on your Windows 10 TP testbed please? I would be particularly interested in Maxwell-2 (GeForce GTX 980) and GCN 1.2 (Radeon R9 285).
 
Ryan, can you run this on your Windows 10 TP testbed please? I would be particularly interested in Maxwell-2 (GeForce GTX 980) and GCN 1.2 (Radeon R9 285).
(For anyone that runs Dmitry's program, you're going to want the VC++ 2015 CTP6 x86 redistributable)

Alright, here we go.

Same drivers as before, which are still the latest WDDM 2.0 drivers that I have.

(These are beta drivers on a not-even-beta OS, so as usual the standard disclaimers about results being subject to change do apply. Especially if not all features are currently being exposed in these drivers.)

http://images.anandtech.com/reviews/video/dx12/fls/GTXTitanX.txt
http://images.anandtech.com/reviews/video/dx12/fls/GTX680.txt
http://images.anandtech.com/reviews/video/dx12/fls/GTX750Ti.txt

http://images.anandtech.com/reviews/video/dx12/fls/290X.txt
http://images.anandtech.com/reviews/video/dx12/fls/285.txt
http://images.anandtech.com/reviews/video/dx12/fls/7970.txt

I think it's safe to say that AMD has not yet implemented support for tiled resources on D3D12. And thanks for the program, Dmitry. :)
 
Thanks Ryan, some interesting new bits of info there. It'd be cool to get the same outputs for a Fermi-based GPU + Haswell and Broadwell if you have easy access to any of those? We'd then have a full (provisional) picture of all the DX12-supporting architectures.
 
Thanks Ryan, some interesting new bits of info there. It'd be cool to get the same outputs for a Fermi-based GPU + Haswell and Broadwell if you have easy access to any of those? We'd then have a full (provisional) picture of all the DX12-supporting architectures.
NVIDIA still hasn't released WDDM 2.0 drivers for Fermi. As for HSW/BDW, I don't have any of those on me at this second, but I can poke the systems guys after Easter.
 
"NVIDIA GeForce GTX 750 Ti"
Maximum feature level : D3D_FEATURE_LEVEL_11_0 (11.0)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_2 (2)

"AMD Radeon HD 7900 Series (Engineering Sample - WDDM v2.0)"
Maximum feature level : D3D_FEATURE_LEVEL_11_1 (11.1)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)

TiledResourcesTier : D3D12_TILED_RESOURCES_NOT_SUPPORTED (0) -- AMD will update the driver later to implement Tiled Resources Tier 3 support on DX12, because GCN 1.0 supports Texture3D as well.


https://msdn.microsoft.com/en-us/library/windows/desktop/dn280435(v=vs.85).aspx



Software rasterization was planned for GCN 1.0, but AMD can use the ACEs for hardware conservative rasterization and ROVs. ACE units are like CELL's SPUs, as stated by Sony, so there's a workaround.

 
Software rasterization was planned for GCN 1.0, but AMD can use the ACEs for hardware conservative rasterization and ROVs. ACE units are like CELL's SPUs, as stated by Sony, so there's a workaround.
First of all, it is not "planned" but a potential application (say hi to Mantle for the same potential use case of async compute). The hardware rasteriser is still there and is here to stay. Secondly, I don't know if Sony has said such a thing, but ACEs are definitely not like SPUs in CELL. Last but not least, I doubt ACEs would ever be relevant to conservative rasterization or ROV. Compute pipelines have no access to graphics states, and ROV actually needs information from the rasteriser and likely new hardware IP for ordering (conceptually an ordered counter with lock on each pixel touched). Let alone the fact that ROV is limited to pixel shaders, and conservative rasterisation is a feature of the fixed-function pipeline stages...
 
It'd be cool to get the same outputs for a Fermi-based GPU + Haswell and Broadwell if you have easy access to any of those?
I gave some of the info in my presentation at GDC: https://software.intel.com/sites/de...ndering-with-DirectX-12-on-Intel-Graphics.pdf

Note that what the driver returns at this point is fairly arbitrary across all implementations... it's literally just querying caps bits that the driver sets, not actually testing the features, so there are both cases where something that will be supported just isn't flicked on yet and cases where things are set that may not even work yet. So while the stuff posted so far looks roughly accurate for those architectures (obviously tiled resources is not correct for GCN), do take it all with a grain of salt at this point :)

Anyways for Haswell/Broadwell it's roughly:
- Feature level 11_1
- Tier 1 binding
- ROVs, doubles, OM logic ops are supported
- No conservative raster, additional typed UAV formats, standard swizzle or ASTC
- Half precision (fp16) is supported on Broadwell, but not Haswell

Stuff I don't remember for sure off the top of my head:
- I believe both will ultimately support Tier 1 tiled resources, but may not be reported yet
- PS specified stencil ref is probably not supported on Haswell, don't remember if it is on Broadwell

In any case it's the basic set of ~DX11-level features + ROVs on Haswell/Broadwell. Those architectures obviously predate the interesting design changes in DX12 so it's more a question of fitting the new API onto existing hardware than designing hardware for the new API (ex. see what we have to do with resource binding in the presentation above). Definitely stay tuned and check again in the near future once new architectures come out :)
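
Given that, the sane thing for an app is to branch on the reported caps rather than on the architecture. Roughly (a sketch; exact enum names may differ slightly in the CTP-era SDK headers):

Code:
#include <d3d12.h>

// Sketch: pick code paths from the caps struct, not from the adapter name.
// Assumes an already-created ID3D12Device.
void SelectCodePaths(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS opts = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                              &opts, sizeof(opts))))
    {
        // fp16 min-precision: reported on Broadwell but not Haswell (per above).
        bool fp16 = (opts.MinPrecisionSupport &
                     D3D12_SHADER_MIN_PRECISION_SUPPORT_16_BIT) != 0;
        bool rovs = opts.ROVsSupported != 0;
        bool consRaster = opts.ConservativeRasterizationTier !=
                          D3D12_CONSERVATIVE_RASTERIZATION_TIER_NOT_SUPPORTED;
        (void)fp16; (void)rovs; (void)consRaster; // select shader variants here
    }
}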
 
NVIDIA still hasn't released WDDM 2.0 drivers for Fermi. As for HSW/BDW, I don't have any of those on me at this second, but I can poke the systems guys after Easter.
Ryan, are you planning on writing an article regarding feature level support of the various D3D12-supported architectures? It seems almost nobody outside of this forum even knows that D3D12 support != D3D12 feature level compliance. I think it'd get a lot of hits :p
 