Direct3D feature levels discussion

Discussion in 'Rendering Technology and APIs' started by DmitryKo, Feb 20, 2015.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,120
    Likes Received:
    2,867
    Location:
    Well within 3d
    An ACE is an "engine" in this context, however. It is a unit that processes queue messages. An Nvidia front-end unit that processes queue messages sounds a lot like one of the ACEs for this purpose.
     
  2. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    The problem is both are not implemented the same way and at the same level..
     
    #142 lanek, Apr 2, 2015
    Last edited: Apr 3, 2015
  3. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    As DirectCompute guy "Chas Boyd" described in his Dx12/DirectCompute talk ... an "Engine" in the Dx12 world is any type of "core" wether it be a CPU core, GPU core, COPY core...


    i.e. All these types of "cores"

    [​IMG]

    are now just called "Engines" for "historical reasons within the Dx code"

    [​IMG]


    and each "Engine" can have 1 "command" queue for independent async operations

    This 1 queue per engine guarantees serial order of execution

    You can also prioritize multiple queues (and as explained above each queue is a core "engine") ..
     
  4. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    642
    Likes Received:
    485
    Location:
    55°38′33″ N, 37°28′37″ E
    I have compiled a small Win32 console app that enumerates hardware adapters in the system, opens a Direct3D12 device on each them to get the options, and outputs them to the screen. It can also output to a file using a simple output redirector, >.
    Example as run on my system:
    Code:
    ADAPTER 0
    "AMD Radeon R9 200 Series (Engineering Sample - WDDM v2.0)"
    VEN 1002, DEV 67B0, SUBSYS 30801462, REV 00
    Dedicated video memory : 3221225472 bytes
    Direct3D 12 is supported
    Maximum feature level :  D3D_FEATURE_LEVEL_11_1 (11.1)
    DoublePrecisionFloatShaderOps : 1
    OutputMergerLogicOp : 1
    MinPrecisionSupport : D3D12_SHADER_MIN_PRECISION_NONE (0)
    TiledResourcesTier : D3D12_TILED_RESOURCES_NOT_SUPPORTED (0)
    ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)
    PSSpecifiedStencilRefSupported : 1
    TypedUAVLoadAdditionalFormats : 1
    ROVsSupported : 0
    ConservativeRasterizationTier : D3D12_CONSERVATIVE_RASTERIZATION_NOT_SUPPORTED (0)
    MaxGPUVirtualAddressBitsPerResource :
    StandardSwizzle64KBSupported : 0
    ASTCProfile : D3D12_ASTC_PROFILE_NOT_SUPPORTED (0)
    CrossNodeSharingTier : D3D12_CROSS_NODE_SHARING_NOT_SUPPORTED (0)
    CrossAdapterRowMajorTextureSupported : 0
    GPU Node 0:     TileBasedRenderer: 0, UMA: 0, CacheCoherentUMA: 0
    
    ADAPTER 1
    "Intel(R) HD Graphics 3000"
    VEN 8086, DEV 0112, SUBSYS 01121849, REV 09
    Dedicated video memory : 33554432 bytes
    Failed to create Direct3D 12 device
    Error 887A0004: The specified device interface or feature level is not supported on this system.
    
    FINISHED running on 2015.04.03 02:32.32
    2 hardware adapters found
     
    homerdog, Kej, Ryan Smith and 4 others like this.
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,143
    Likes Received:
    1,830
    Location:
    Finland
    So how does this work with ACE's doing 8 queues each? Are they shown as 8 engines each?
     
  6. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    128
    In HSA, if my memory is correct, ACEs aka the packet processors are hidden behind its parent HSA Agent. Applications would request the HSA runtime to create queues for a particular HSA Agent, but they have no control over or visibility in how the queues are pushed to the packet processors. In other words, it is the driver's responsibility to schedule and bind the queues to the ACEs.

    I think these low-level graphics APIs would go along the same way, as the queue semantics already implies concurrency to bind them to different packet processors (or "engines").
     
  7. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    128
    A few bits to add.

    In fact, it can be also the hardware's responsibility too. AMD's HSA driver by default enables hardware scheduling, where the driver provides only a run list to a hardware scheduler. The HW scheduler is independent from the "pipes" aka ACEs, if I am not mistaken, and is responsible of (i) scheduling, binding and unbinding the queues to/from hardware slots in the pipes (8 slots per pipe for CI), (ii) binding and unbinding the processes to/from the hardware (e.g. PASID-VMID) and (iii) also handling doorbell signals of the queues. With it, oversubscription of queues is supported and it is said to support up to 1024 queues per process and up to 512 thousands of queues per device...

    So I guess it would expose a 3D engine, a compute engine and a copy engine. 3D engine should have just a single queue, anyway, and applications are free to allocate an arbitrary number of queues for compute and copy.
     
    pharma likes this.
  8. DmitryKo

    Regular

    Joined:
    Feb 26, 2002
    Messages:
    642
    Likes Received:
    485
    Location:
    55°38′33″ N, 37°28′37″ E
    Ryan, can you run this on your Window 10 TP testbed please? I would be particularly interested in Maxwell-2 (GeForce GTX 980) and GCN 1.2 (Radeon R9 285).
     
  9. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    125
    Likes Received:
    23
  10. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    609
    Likes Received:
    1,036
    Location:
    PCIe x16_1
    Sure. I need to reload Windows 10, but I can probably get around to that tonight.
     
  11. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    128
    Should've read the D3D12 API docs earlier. ID3D12Device::CreateCommandQueue depicts a very similar queuing model to HSA, where the application needs to specify only a queue type (Direct, Bundle, Compute or Copy) to create a queue specific to a particular device.
     
  12. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    609
    Likes Received:
    1,036
    Location:
    PCIe x16_1
    (For anyone that runs Dmitry's program, you're going to want the VC++ 2015 CTP6 x86 redistributable)

    Alright, here we go.

    Same drivers as before, which are still the latest WDDM 2.0 drivers that I have.

    (These are beta drivers on a not even beta OS, so as usual the standard disclaimers about results being subject to change do apply. Especially if not all features are currently being exposed in these drivers)

    http://images.anandtech.com/reviews/video/dx12/fls/GTXTitanX.txt
    http://images.anandtech.com/reviews/video/dx12/fls/GTX680.txt
    http://images.anandtech.com/reviews/video/dx12/fls/GTX750Ti.txt

    http://images.anandtech.com/reviews/video/dx12/fls/290X.txt
    http://images.anandtech.com/reviews/video/dx12/fls/285.txt
    http://images.anandtech.com/reviews/video/dx12/fls/7970.txt

    I think it's safe to say that AMD has not yet implemented support for tiled resources on D3D12. And thanks for the program, Dmitry.:)
     
    #152 Ryan Smith, Apr 4, 2015
    Last edited: Apr 4, 2015
    homerdog, silent_guy, pharma and 3 others like this.
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,583
    Likes Received:
    703
    Location:
    Guess...
    Thanks Ryan, some interesting new bits of info there. It'd be cool to get the same outputs for a Fermi based GPU + Haswell and Broadwell if you have easy access to any of those? We'd then have a full (provisional) picture of all the DX12 supporting architectures.
     
  14. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    609
    Likes Received:
    1,036
    Location:
    PCIe x16_1
    NVIDIA still hasn't released WDDM 2.0 drivers for Fermi. As for HSW/BDW, I don't have any of those on me at this second, but I can poke the systems guys after Easter.
     
    pjbliverpool likes this.
  15. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    396
    Likes Received:
    183
    Whats the 1(2) on the Nvidia Cards on the DMA Engines column? And where is this chart from?
     
  16. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    125
    Likes Received:
    23
    "NVIDIA GeForce GTX 750 Ti"
    Maximum feature level : D3D_FEATURE_LEVEL_11_0 (11.0)
    ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_2 (2)

    "AMD Radeon HD 7900 Series (Engineering Sample - WDDM v2.0)"
    Maximum feature level : D3D_FEATURE_LEVEL_11_1 (11.1)
    ResourceBindingTier : D3D12_RESOURCE_BINDING_TIER_3 (3)

    TiledResourcesTier : D3D12_TILED_RESOURCES_NOT_SUPPORTED (0) -- AMD will update the driver later for implemented support for tiled resources tier 3 on DX12, because GCN1.0 support Texture3D as well.

    [​IMG]
    https://msdn.microsoft.com/en-us/library/windows/desktop/dn280435(v=vs.85).aspx

    [​IMG]

    Software Rasterization was planned for GCN1.0, but AMD can use ACE for Hardware Conservative Raserization and ROVs. ACE units are like CELL's SPUs as stated by Sony, then there's a workaround.

    [​IMG]
     
  17. Nemo

    Newcomer

    Joined:
    Sep 15, 2012
    Messages:
    125
    Likes Received:
    23
    1 DMA for GeForce cards and 2 DMA for Quadro/Tesla cards, I think. This chart was posted on AT-forum some days ago.
     
  18. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    128
    First of all, it is not "planned" but a potential application (say hi to Mantle for the same potential use case of async compute). The hardware rasteriser is still there and is here to stay. Secondly, I don't know if Sony has said such thing, but ACEs are definitely not like SPUs in CELL. Last but not least, I doubt ACEs would ever be relevant to conservative rasterization or ROV. Compute pipelines have no access to graphics states, and ROV actually needs information from the rasteriser and likely new hardware IP for ordering (conceptually an ordered counter with lock on each pixel touched). Let alone the fact that ROV is limited to pixel shaders, and conservative rasterisation is a feature of the fixed-function pipeline stages...
     
    pjbliverpool likes this.
  19. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    I gave some of the info in my presentation at GDC: https://software.intel.com/sites/de...ndering-with-DirectX-12-on-Intel-Graphics.pdf

    Note that what the driver returns at this point is fairly arbitrary across all implementations... that's literally just querying caps bits that the driver sets, it's not as if it's testing the features or anything so there are both cases where something that will be supported just isn't flicked on yet and other cases where things are set that may not even work yet. So while the stuff posted so far looks roughly accurate for those architectures (obviously tiled resources is not correct for GCN), do take it all with a grain of salt at this point :)

    Anyways for Haswell/Broadwell it's roughly:
    - Feature level 11_1
    - Tier 1 binding
    - ROVs, doubles, OM logic ops are supported
    - No conservative raster, additional typed UAV formats, standard swizzle or ASTC
    - Half precision (fp16) is supported on Broadwell, but not Haswell

    Stuff I don't remember for sure off the top of my head:
    - I believe both will ultimately support Tier 1 tiled resources, but may not be reported yet
    - PS specified stencil ref is probably not supported on Haswell, don't remember if it is on Broadwell

    In any case it's the basic set of ~DX11-level features + ROVs on Haswell/Broadwell. Those architectures obviously predate the interesting design changes in DX12 so it's more a question of fitting the new API onto existing hardware than designing hardware for the new API (ex. see what we have to do with resource binding in the presentation above). Definitely stay tuned and check again in the near future once new architectures come out :)
     
    pjbliverpool likes this.
  20. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,124
    Likes Received:
    902
    Location:
    still camping with a mauler
    Ryan, are you planning on writing an article regarding feature level support of the various D3D12 supported architectures? It seems almost nobody outside of this forum even knows that D3D12 support != D3D12 feature level compliance. It think it'd get a lot of hits :razz:
     
Loading...

Share This Page

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.
Loading...