An ACE is an "engine" in this context, however. It is a unit that processes queue messages. An Nvidia front-end unit that processes queue messages sounds a lot like one of the ACEs for this purpose.
I have compiled a small Win32 console app that enumerates hardware adapters in the system, opens a Direct3D 12 device on each of them to get the options, and outputs them to the screen. It can also output to a file using a simple output redirector, >. Source code is attached for anyone who already has Windows 10 preview with a D3D12-capable card and wants to play with SDK VS2015 CTP6. Absolutely no UI, you have to use the debugger.
ADAPTER 0
"AMD Radeon R9 200 Series (Engineering Sample - WDDM v2.0)"
VEN 1002, DEV 67B0, SUBSYS 30801462, REV 00
Dedicated video memory : 3221225472 bytes
Direct3D 12 is supported
Maximum feature level : D3D_FEATURE_LEVEL_11_1 (11.1)
DoublePrecisionFloatShaderOps : 1
OutputMergerLogicOp : 1
MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_NONE (0)
TiledResourcesTier : D3D12_TILED_RESOURCES_NOT_SUPPORTED (0)
ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
PSSpecifiedStencilRefSupported : 1
TypedUAVLoadAdditionalFormats : 1
ROVsSupported : 0
ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_NOT_SUPPORTED (0)
MaxGPUVirtualAddressBitsPerResource :
StandardSwizzle64KBSupported : 0
ASTCProfile : D3D12_ASTC_PROFILE_NOT_SUPPORTED (0)
CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_NOT_SUPPORTED (0)
CrossAdapterRowMajorTextureSupported : 0
GPU Node 0: TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0
ADAPTER 1
"Intel(R) HD Graphics 3000"
VEN 8086, DEV 0112, SUBSYS 01121849, REV 09
Dedicated video memory : 33554432 bytes
Failed to create Direct3D 12 device
Error 887A0004: The specified device interface or feature level is not supported on this system.
FINISHED running on 2015.04.03 02:32.32
2 hardware adapters found
> As DirectCompute guy "Chas Boyd" described in his Dx12/DirectCompute talk ... an "Engine" in the Dx12 world is any type of "core", whether it be a CPU core, GPU core, COPY core... i.e. all these types of "cores" are now just called "Engines" for "historical reasons within the Dx code", and each "Engine" can have 1 "command" queue for independent async operations.
> This 1 queue per engine guarantees serial order of execution.
> You can also prioritize multiple queues (and as explained above each queue is a core "engine") ..
So how does this work with ACEs doing 8 queues each? Are they shown as 8 engines each?
In HSA, if my memory is correct, ACEs, aka the packet processors, are hidden behind their parent HSA Agent. Applications would request the HSA runtime to create queues for a particular HSA Agent, but they have no control over or visibility into how the queues are pushed to the packet processors. In other words, it is the driver's responsibility to schedule and bind the queues to the ACEs.
In fact, it can also be the hardware's responsibility. AMD's HSA driver by default enables hardware scheduling, where the driver provides only a run list to a hardware scheduler. The HW scheduler is independent from the "pipes" aka ACEs, if I am not mistaken, and is responsible for (i) scheduling, binding and unbinding the queues to/from hardware slots in the pipes (8 slots per pipe for CI), (ii) binding and unbinding the processes to/from the hardware (e.g. PASID-VMID) and (iii) handling doorbell signals of the queues. With it, oversubscription of queues is supported, and it is said to support up to 1024 queues per process and up to 512 thousand queues per device...
> I think these low-level graphics APIs would go along the same way, as the queue semantics already imply concurrency to bind them to different packet processors (or "engines").
So I guess it would expose a 3D engine, a compute engine and a copy engine. The 3D engine should have just a single queue anyway, and applications are free to allocate an arbitrary number of queues for compute and copy.
Ryan, can you run this on your Windows 10 TP testbed please? I would be particularly interested in Maxwell-2 (GeForce GTX 980) and GCN 1.2 (Radeon R9 285).
Sure. I need to reload Windows 10, but I can probably get around to that tonight.
(For anyone that runs Dmitry's program, you're going to want the VC++ 2015 CTP6 x86 redistributable)
> Thanks Ryan, some interesting new bits of info there. It'd be cool to get the same outputs for a Fermi based GPU + Haswell and Broadwell if you have easy access to any of those? We'd then have a full (provisional) picture of all the DX12 supporting architectures.
NVIDIA still hasn't released WDDM 2.0 drivers for Fermi. As for HSW/BDW, I don't have any of those on me at this second, but I can poke the systems guys after Easter.
> What's the 1(2) on the Nvidia cards in the DMA Engines column? And where is this chart from?
1 DMA for GeForce cards and 2 DMA for Quadro/Tesla cards, I think. This chart was posted on AT-forum some days ago.
> Software Rasterization was planned for GCN1.0, but AMD can use ACE for Hardware Conservative Rasterization and ROVs. ACE units are like CELL's SPUs as stated by Sony, then there's a workaround.
First of all, it is not "planned" but a potential application (say hi to Mantle for the same potential use case of async compute). The hardware rasteriser is still there and is here to stay. Secondly, I don't know if Sony has said such a thing, but ACEs are definitely not like SPUs in CELL. Last but not least, I doubt ACEs would ever be relevant to conservative rasterisation or ROV. Compute pipelines have no access to graphics states, and ROV actually needs information from the rasteriser and likely new hardware IP for ordering (conceptually an ordered counter with a lock on each pixel touched). Let alone the fact that ROV is limited to pixel shaders, and conservative rasterisation is a feature of the fixed-function pipeline stages...
> It'd be cool to get the same outputs for a Fermi based GPU + Haswell and Broadwell if you have easy access to any of those?
I gave some of the info in my presentation at GDC: https://software.intel.com/sites/de...ndering-with-DirectX-12-on-Intel-Graphics.pdf
Ryan, are you planning on writing an article regarding feature level support of the various D3D12 supported architectures? It seems almost nobody outside of this forum even knows that D3D12 support != D3D12 feature level compliance. I think it'd get a lot of hits.