DirectX 12 API Preview

Discussion in 'PC Hardware, Software and Displays' started by PeterAce, Apr 28, 2014.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Same hardware, same game settings, same graphics driver:



    The only difference is that on the left we have Windows 10 and on the right Windows 8.1.

    [IMG]
     
  2. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,216
    Likes Received:
    1,001
    Location:
    still camping with a mauler
    WTF is that about? Something glitched in Win8.1?
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I think it's a side effect of AMD's driver, to be honest. The game is controversial because AMD performance is generally looking very bad, but less bad under W10.

    There may be an interaction with PhysX CPU code, since the game is known to use that library.

    Perhaps it's something to do with multi-threading, and whether AMD's driver behaves differently with respect to threading under W10.
     
  4. mosen

    Regular

    Joined:
    Mar 30, 2013
    Messages:
    452
    Likes Received:
    152
    Does the Windows 10 driver support WDDM 2.0?
     
  5. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    Yes. In Win10 it's WDDM 2.0 and DX12/11.3, both tightly coupled.
     
  6. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    In case anyone wanted to see what's in DX11.3:

    Direct3D 11.3 Features

    Adaptive Scalable Texture Compression

    ASTC provides developers with greater control over the size versus quality tradeoff with textures. ASTC is a lossy format, but one that is designed to provide an inexpensive route to higher-quality textures. The idea is that a developer can choose the optimum format without having to support multiple compression schemes.

    Conservative Rasterization

    Conservative rasterization adds some certainty to pixel rendering, which is helpful in particular to collision detection algorithms.
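
    To illustrate the "certainty" mentioned above, here is a minimal CPU-side sketch in Python, not Direct3D code. It compares standard coverage (the pixel centre must lie inside the triangle) with an overestimating conservative test (any overlap between the triangle and the pixel square counts). The function names and the 4x2 grid are assumptions made up for this example.

```python
def edge(a, b, p):
    """Signed area term: positive when p lies to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def center_covered(tri, px, py):
    """Standard rule (simplified): the pixel centre must be inside the triangle."""
    c = (px + 0.5, py + 0.5)
    w = [edge(tri[i], tri[(i + 1) % 3], c) for i in range(3)]
    return all(v >= 0 for v in w) or all(v <= 0 for v in w)


def overlaps_pixel(tri, px, py):
    """Conservative rule: any overlap between triangle and pixel square counts.

    Uses a separating-axis test with the two grid axes and the three
    triangle edge normals, which is exact for two convex shapes.
    """
    corners = [(px, py), (px + 1, py), (px, py + 1), (px + 1, py + 1)]
    axes = [(1.0, 0.0), (0.0, 1.0)]
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        axes.append((-(b[1] - a[1]), b[0] - a[0]))
    for ax in axes:
        t = [p[0] * ax[0] + p[1] * ax[1] for p in tri]
        s = [p[0] * ax[0] + p[1] * ax[1] for p in corners]
        if max(t) < min(s) or max(s) < min(t):
            return False  # found a separating axis, so no overlap
    return True


def rasterize(tri, width, height, test):
    """Return the set of (x, y) pixels that pass the given coverage test."""
    return {(x, y) for y in range(height) for x in range(width) if test(tri, x, y)}


# A thin sliver that slips between pixel centres.
sliver = [(0.1, 0.1), (3.9, 0.2), (0.1, 0.3)]
standard = rasterize(sliver, 4, 2, center_covered)
conservative = rasterize(sliver, 4, 2, overlaps_pixel)
print(len(standard), len(conservative))
```

    The sliver produces no pixels under the centre rule but marks every pixel it touches under the conservative rule, which is the property a collision or voxelisation pass relies on.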

    Default Texture Mapping

    The use of default texture mapping reduces copying and memory usage while sharing image data between the GPU and the CPU. However, it should only be used in specific situations. The standard swizzle layout avoids copying or swizzling data in multiple layouts.

    Rasterizer Ordered Views

    Rasterizer ordered views (ROVs) allow pixel shader code to mark UAV bindings with a declaration that alters the normal requirements for the order of graphics pipeline results for UAVs. This enables Order Independent Transparency (OIT) algorithms to work, which give much better rendering results when multiple transparent objects are in line with each other in a view.
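
    The reason ordering matters for transparency can be shown with a small Python model. This is not a ROV implementation, only a sketch of why the "over" blend gives different results depending on fragment arrival order, and why a per-pixel depth-ordered resolve does not. The fragment layout `(depth, colour, alpha)` and the function names are assumptions for illustration.

```python
def over(dst, src_color, src_alpha):
    """Classic 'over' blend of one translucent fragment onto a destination colour."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_color, dst))


def blend_in_arrival_order(background, fragments):
    """Blend fragments exactly in the order they arrive (order dependent)."""
    color = background
    for _depth, frag_color, alpha in fragments:
        color = over(color, frag_color, alpha)
    return color


def blend_sorted_by_depth(background, fragments):
    """Sort per pixel from far to near first, then blend (order independent).

    Larger depth means farther from the camera in this sketch.
    """
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)
    return blend_in_arrival_order(background, ordered)


black = (0.0, 0.0, 0.0)
near_red = (0.25, (1.0, 0.0, 0.0), 0.5)
far_blue = (0.75, (0.0, 0.0, 1.0), 0.5)

# The same two fragments, arriving in two different orders.
print(blend_in_arrival_order(black, [near_red, far_blue]))
print(blend_in_arrival_order(black, [far_blue, near_red]))
print(blend_sorted_by_depth(black, [near_red, far_blue]))
```

    On the GPU, maintaining such a per-pixel fragment list from many shader invocations is only safe when accesses to the same pixel are ordered, which is the guarantee ROVs add.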

    Shader Specified Stencil Reference Value

    Enabling pixel shaders to output the Stencil Reference Value, rather than using the API-specified one, enables very fine-grained control over stencil operations.

    Typed Unordered Access View Loads

    Unordered Access View (UAV) Typed Load is the ability for a shader to read from a UAV with a specific DXGI_FORMAT.

    Unified Memory Architecture

    Querying for whether Unified Memory Architecture (UMA) is supported can help determine how to handle some resources.

    Volume Tiled Resources

    Volume (3D) textures can be used as tiled resources, noting that tile resolution is three-dimensional.
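
    As a rough illustration of what a three-dimensional tile resolution means, the Python sketch below counts how many tiles a volume texture needs along each axis and maps a texel to its tile. The 32x32x16 tile shape for a 32-bit-per-texel format with 64 KB tiles is an assumption used for the example; the helper names are hypothetical and do not correspond to any Direct3D call.

```python
import math


def tiles_needed(texture_size, tile_shape):
    """Number of tiles along (x, y, z), rounding partial tiles up."""
    return tuple(math.ceil(t / s) for t, s in zip(texture_size, tile_shape))


def tile_index(texel, tile_shape):
    """Which tile (x, y, z) a given texel coordinate falls into."""
    return tuple(c // s for c, s in zip(texel, tile_shape))


# Assumed tile shape: 32 x 32 x 16 texels at 4 bytes each is 64 KB per tile.
TILE_SHAPE = (32, 32, 16)
BYTES_PER_TEXEL = 4
tile_bytes = TILE_SHAPE[0] * TILE_SHAPE[1] * TILE_SHAPE[2] * BYTES_PER_TEXEL

grid = tiles_needed((256, 256, 64), TILE_SHAPE)
print(tile_bytes, grid, tile_index((40, 10, 17), TILE_SHAPE))
```

    Only the tiles that actually hold data need backing memory, so a sparse volume can cover a large extent while mapping a small fraction of the tile grid.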
     
  7. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    So, the biggest performance boosting features were left out of DX11.3.

    There is no ExecuteIndirect. It would have been a super nice feature, especially for DX11, as DX11 is so slow on draw calls. This feature would have single-handedly brought DX11 draw call performance on par with DX12 (in cases where you don't need to change the GPU state between the draws).

    And there is no asynchronous compute. This was of course expected, as it would have required big API changes.
     
  8. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,451
    Likes Received:
    3,476
    Location:
    Pennsylvania
    Which is one of the main public features they're touting with DX12 which also requires Windows 10. Hence there's no way they'd want it ending up in DX11?
     
  9. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    ExecuteIndirect needs root constants/descriptors and resource barriers to be useful and efficient on a variety of hardware.

    IMHO while Microsoft did expose some of the easier GPU capabilities on DX11.3, if developers want to plan ahead it's better to start transitioning to DX12, even if it means making use of the DX11on12 layer in the short term to do partial ports.
     
    #209 Andrew Lauritzen, May 12, 2015
    Last edited: May 12, 2015
  10. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    That's true. Full ExecuteIndirect needs root constants/descriptors. However, they could have implemented a limited subset (equal to OpenGL multiDrawIndirect) in DX 11.3. That wouldn't need any API refactoring at all.
     
  11. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    I think it would need to at least have support for something like draw parameters to be very useful though. If you literally just need a sequence of DrawIndirect calls that's not terribly inefficient to do today; GPUs are pretty efficient at throwing out 0-length draws if you need to cull some of them out.

    Don't get me wrong, I like the feature but it's really the binding changes that make it cool.
     
  12. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    We ONLY need the ability to control the draw call count from the GPU side. Pushing a constant number of draw calls (most empty) from the CPU side wastes lots of GPU performance (empty draws cost surprisingly much). We don't need binding changes since we use virtual texturing (and all our mesh data is in a single big raw buffer). SV_DrawId would obviously be mandatory.
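
    The difference between the two submission styles can be sketched in Python as a toy model. In the first variant the culling pass fills a fixed number of slots and culled objects become zero-length draws that the command processor still has to walk through. In the second, visible draws are compacted to the front and a draw count is written alongside, which is what a GPU-controlled draw count would consume. All names here are hypothetical and nothing below is a D3D call.

```python
def cull_to_fixed_slots(index_counts, visible):
    """One argument slot per object; culled objects become zero-length draws."""
    return [n if v else 0 for n, v in zip(index_counts, visible)]


def cull_compacted(index_counts, visible):
    """Visible draws packed at the front, plus the count the GPU would read."""
    args = [n for n, v in zip(index_counts, visible) if v]
    return args, len(args)


index_counts = [300, 120, 900, 60]
visible = [True, False, False, True]

fixed_args = cull_to_fixed_slots(index_counts, visible)
compact_args, draw_count = cull_compacted(index_counts, visible)

# Both variants draw the same geometry, but the number of commands differs.
print(len(fixed_args), draw_count, sum(fixed_args) == sum(compact_args))
```

    In the compacted variant a per-draw identifier (the SV_DrawId mentioned above) is what lets the shader look up which object each packed draw refers to.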
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    With D3D11? What about 12?
     
  14. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    I am talking about the GPU cost. The command processor will be a big bottleneck if you push the maximum worst case (let's say 50k, mostly empty) draws for each viewport (let's say main + 4 shadow cascades + 10 shadow casting local lights). If you don't know what you are going to render on CPU side, it is hard to estimate tight (conservative) maximums that are never exceeded, especially when you use fine grained (sub object precision) occlusion culling for all viewports (including shadows).

    If DX11 had GPU buffer predicates (skip over set of commands if a GPU buffer memory location contains zero), you could divide the (potentially empty) draws in groups (of 1000 for example) and pay only for the GPU overhead of the last group on each viewport. Unfortunately this would not save any CPU cost.
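
    A back-of-the-envelope Python model of the numbers above: 50k worst-case draws for each of 15 viewports (main + 4 cascades + 10 local lights), compared with hypothetical predicated groups of 1000. It assumes the non-empty draws are packed at the front of each viewport's list, so every group after the last used one can be skipped. The 3200 actual draws per viewport and the function name are made up for illustration.

```python
import math


def draws_processed_with_predicates(actual_draws, max_draws, group_size):
    """Draw commands the GPU still walks when fully empty groups are skipped.

    Assumes real draws are packed at the front, so only the last
    partially filled group contributes empty draws.
    """
    groups_run = math.ceil(actual_draws / group_size)
    return min(groups_run * group_size, max_draws)


MAX_DRAWS = 50_000          # worst case pushed per viewport
VIEWPORTS = 1 + 4 + 10      # main + shadow cascades + local lights
GROUP_SIZE = 1_000
ACTUAL_DRAWS = 3_200        # hypothetical visible draws per viewport

without = MAX_DRAWS * VIEWPORTS
with_groups = draws_processed_with_predicates(ACTUAL_DRAWS, MAX_DRAWS, GROUP_SIZE) * VIEWPORTS
print(without, with_groups)
```

    The CPU still records all 50k draws per viewport in both cases, which is why this would save GPU cost only.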
     
  15. MistaPi

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    363
    Likes Received:
    7
    Location:
    Norway
    A quick question, perhaps not so quick an answer. Is it true that D3D12's lower draw call cost only helps bad console ports and bad coding in general? Is it true that writing better code and drawing things in batches would match every benefit D3D12 gets from cheaper draw calls?
     
  16. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,022
    Likes Received:
    875
    Location:
    Planet Earth.
    Simple answer: NO.
    Spending less time in the API means spending more time on the game's computations, which is always good.
     
  17. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    9,344
    Likes Received:
    8,239
    There are tradeoffs when you batch draw calls. Good code should leverage draw calls where necessary and batch where necessary.

    As the scene gets more graphically complex, and you want to reach a certain level of graphical fidelity, draw calls are likely going to increase with scene complexity.
     
    #217 iroboto, Jun 19, 2015
    Last edited: Jun 19, 2015
  18. MistaPi

    Regular

    Joined:
    Jun 12, 2002
    Messages:
    363
    Likes Received:
    7
    Location:
    Norway
    So is this in terms of cost efficiency, as in less time spent optimizing/minimizing draw calls and more time for other things? Or is it also a purely technical limitation of D3D11 which no optimizing can overcome? Either way, is the lower overhead going to be a big step forward in practice?
     
  19. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    9,344
    Likes Received:
    8,239
    I can't answer technically. Senior members here can give you a more accurate answer. But my understanding is that there is no way to optimize the API overhead itself; you can only optimize around it, hence batched draw calls. In D3D11, say you make a call to draw a triangle strip: maybe that unpacks to 50 instructions for the GPU (that the CPU needs to send), where with D3D12 maybe it only takes 8 instructions. As the instruction overhead drops, GPU saturation can increase. In this scenario the GPU is waiting for all the commands to come in before it starts doing work, so the fewer instructions it has to wait for before it starts doing work, the better. Lower overhead should result in immediate gains, and it also allows better control over the GPU, so there should be less time spent fighting against what the API is doing and more time programming the graphics for the game.
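
    Taking the purely illustrative 50-versus-8 figures from the paragraph above at face value, a tiny Python cost model shows how lower per-draw overhead and batching attack the same product from two sides. The object count, batch size and function names are assumptions invented for the example, not measurements.

```python
def cpu_submission_cost(draw_calls, cost_per_draw):
    """Total CPU-side cost as draw count times per-draw overhead."""
    return draw_calls * cost_per_draw


def batched_draw_calls(object_count, batch_size):
    """Draw calls left after merging objects into batches (rounded up)."""
    return -(-object_count // batch_size)


OBJECTS = 10_000
HIGH_OVERHEAD = 50   # illustrative per-draw cost with the older API
LOW_OVERHEAD = 8     # illustrative per-draw cost with the newer API

unbatched_high = cpu_submission_cost(OBJECTS, HIGH_OVERHEAD)
unbatched_low = cpu_submission_cost(OBJECTS, LOW_OVERHEAD)
batched_high = cpu_submission_cost(batched_draw_calls(OBJECTS, 64), HIGH_OVERHEAD)
print(unbatched_high, unbatched_low, batched_high)
```

    Batching shrinks the draw count but constrains how objects can differ from one another, while lower per-draw overhead helps every draw regardless, so the two are complementary rather than substitutes.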
     
  20. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,022
    Likes Received:
    875
    Location:
    Planet Earth.
    I suppose small studios might not have people proficient enough to use a low-level API, which is why MS is updating D3D11; the gain will therefore be freed CPU & GPU time for those able to use those APIs.
    Having finer control over memory and lower overhead in the API will make game streaming much easier, and with current hardware flexibility it should open the door to noticeably new/improved gfx.
    As you said, it should also free up dev time that can be spent on better shaders, less aliasing and new techniques...
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.