DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. CasellasAbdala

    Newcomer

    Joined:
    Aug 28, 2015
    Messages:
    11
    Likes Received:
    5
    Talking about gaming performance, how could this affect Maxwell 2 in the future compared to a Fury X or a 390X, for example?
    I'm a little bit lost with all these low-level definitions. I've only coded a little bit in MIPS assembly. That's mainly it lol.
     
  2. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    lol yeah, that's a good question. Too hard to say; nV can do tricks where it offloads things to the CPU and shuffles them back to the GPU, so.....

    But if Maxwell 2's queues are overloaded, expect to see a performance hit, and depending on how big the overload is, the hit can be large. It all depends on the game, though. Will soon-to-be-released games push it so hard that it creates a bottleneck that just shuts down Maxwell 2? I don't think they will, and by the time games like that come out, I think other parts of all the GPUs will be struggling.
     
  3. CasellasAbdala

    Newcomer

    Joined:
    Aug 28, 2015
    Messages:
    11
    Likes Received:
    5
    Then wouldn't it be better for them to just leave async shaders off and work as they used to with DX11?
     
  4. Darius

    Newcomer

    Joined:
    Sep 27, 2013
    Messages:
    37
    Likes Received:
    30
    Maybe I'm being naive, but no hardware is perfect and everything has limitations to work around. I'm hearing that the real-world best case is a 30% boost; if NVIDIA's implementation can capture even half of that, I think they'll be in a good spot, and either way it doesn't sound make-or-break. If the NVIDIA solution starts to fall apart under heavy load, I wonder whether devs will bother going the extra mile beyond that point, given that NVIDIA has the majority of the market. Although TBH I don't know whether this is something that requires careful optimization and hard work, or a simple variable (like the number of threads in HandBrake) that you turn up until it breaks.
     
  5. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Well, no, because there is performance to be gained, and that performance can be significant, similar to what we see from GCN now or more, as long as it's not pushed too hard.

    From a software project management point of view, the driver team will first sit down and go over what needs to be done, and define the critical path (the critical path is the set of tasks that have to be done before anything else can be). Async will be a low priority because games that are going to use it won't be out any time soon, so it gets pushed to the end of the backlog.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,081
    Likes Received:
    651
    Location:
    O Canada!
    Not sure why "games using async shaders won't be coming anytime soon", given that developers have had exposure to this on the consoles for a number of years now, and it sounds like it has already been implemented. DX12 just means those capabilities are available on more platforms.
     
  7. CasellasAbdala

    Newcomer

    Joined:
    Aug 28, 2015
    Messages:
    11
    Likes Received:
    5
    Alright, alright, now we're in my field (systems/software engineering), and I definitely agree with you: this likely won't be an issue for Maxwell 2 over the next few years, and by the time it is, just lowering some settings will be fair enough; all in all, other parts of the GPU will be struggling by then, as you said.
    Therefore, this puts Maxwell 2 in a similar position to Fiji (given its 4 GB of VRAM and other specs that limit performance)...

    Also, I think that given the priority async shaders have and the phase Nvidia is at in its development cycle, it's natural to see immature async support in their drivers. I'm pretty sure their marketing department wasn't expecting this at all...
     
    Razor1 likes this.
  8. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,326
    Likes Received:
    1,952
    Interesting comments from the Oxide dev ... guess creating a DX12 game isn't going to be as straightforward as they may have initially thought.
     
    Razor1 likes this.
  9. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I agree with that, but when creating a new driver set they still have to focus on current games and games just about to come out, so depending on how deep their driver team is, they might have prioritized those first.

    We have seen this before with Windows 7 and DX11 too.

    Actually, from a business point of view, getting drivers ready for current games and games just about to come out is more important too. Although it means putting off developers who want to use the new features, in the longer term they still have time to get to them. There are always both aspects when development is happening: business and development versus time and cost.
     
    #569 Razor1, Sep 5, 2015
    Last edited: Sep 5, 2015
  10. CasellasAbdala

    Newcomer

    Joined:
    Aug 28, 2015
    Messages:
    11
    Likes Received:
    5
    Well, Unreal Engine 4 just implemented it... Also, PS4 and Mantle have been able to use this for a long time, but there hasn't been a big focus on this subject yet. I'm pretty sure all the fuss about async development grew exponentially after this year's GDC. (Of course, not for devs, but for the general public.)

    Also, AMD got around a 10% perf boost using this in their demo, and some devs say it can reach up to 30%... (Oxide states they used a moderate amount of it)... which makes me wonder...

    How much will this really improve performance in real games...

    And (this is a question for you guys who actually know a LOT about this)... when and what for are these shaders used? (I'd love an easy example of the kind of game that would use them the most. Will FPSes or games like The Witcher use a lot of them?)
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Lighting algorithms, deferred rendering, physics, all can use heavy compute.
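    As a rough illustration of the deferred-rendering case, here is a hypothetical D3D12 sketch of a tiled light-culling pass (the pipeline state and resource names are made up for illustration, not from any shipped engine). The whole pass boils down to one compute dispatch with a thread group per 16x16 screen tile:

    // Hypothetical sketch: recording a tiled light-culling compute pass in D3D12.
    // cullingPSO, rootSig and lightTables are assumed to be set up elsewhere.
    #include <d3d12.h>

    void RecordLightCulling(ID3D12GraphicsCommandList* cmdList,
                            ID3D12PipelineState* cullingPSO,
                            ID3D12RootSignature* rootSig,
                            D3D12_GPU_DESCRIPTOR_HANDLE lightTables,
                            UINT width, UINT height)
    {
        const UINT kTileSize = 16; // one thread group per 16x16 screen tile

        cmdList->SetPipelineState(cullingPSO);
        cmdList->SetComputeRootSignature(rootSig);
        cmdList->SetComputeRootDescriptorTable(0, lightTables);

        // The compute shader bins lights into per-tile lists; one group per tile.
        cmdList->Dispatch((width  + kTileSize - 1) / kTileSize,
                          (height + kTileSize - 1) / kTileSize,
                          1);
    }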
     
    digitalwanderer likes this.
  12. huebie

    Newcomer

    Joined:
    Apr 10, 2012
    Messages:
    29
    Likes Received:
    5
    As nVidia said: heavy post-processing will slow down the GPU with async shading. Ashes of the Singularity uses heavy post-processing with a lot of units and lights, but that's a rare case in practice; with the other games out there, this won't happen often. But I don't disagree with your statement - quite the contrary. Thanks for sharing your knowledge.
     
  13. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,861
    Likes Received:
    6,001
    There is no 30% maximum. There's already one developer on this forum who has mentioned that certain tasks can see a greater than 50% improvement from async compute. It's all going to depend on whether a particular job is well suited to it or not. If it isn't, you aren't likely to see any performance gains; if it's particularly well suited, you may see large gains.

    Regards,
    SB
     
    Jackalito, BRiT and Razor1 like this.
  14. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Thanks to everyone who has provided us with numbers..... even if it has made the topic a bit hard to follow, since it's a bit noisy.

    Still, I don't see how they will do the scheduling efficiently at the driver level.. the only way is to have the devs do the pre-emption scheduling themselves (fixed tasks/queues)... but maybe the result won't be bad when well optimized.
     
  15. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,203
    Likes Received:
    2,915
    Location:
    Barcelona Spain
    People forget that changing a rendering engine is a long road. Some console games have some compute shaders but aren't compute-heavy... All tiled-rendering games use it, like BF4 and BF Hardline; particle physics in Infamous, force fields in KZ SF, some compute shaders too in The Order: 1886... Same thing for Forward+ games...

    Some 2015 titles will use compute heavily, like The Tomorrow Children, and maybe Rise of the Tomb Raider, Battlefront, Need for Speed...

    I think 2016 will probably be the year of compute-shader-heavy console titles... Dreams is 100% compute shaders, and maybe the RedLynx game, UC4, Quantum Break, Deus Ex: Mankind Divided and so on...

    Frostbite developers have said their games will have a DX12 mode on PC in fall 2016...
     
    #575 chris1515, Sep 5, 2015
    Last edited: Sep 5, 2015
    Razor1 and BRiT like this.
  16. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    365
    Likes Received:
    319
    What I meant by batching is actually stapling multiple tasks onto each other in sequential order, to keep the queues filled for longer. What you call "blending" is what the hardware is supposed to do on its own with the 32 queues. There are two cases you want to avoid:
    • Underutilizing the special-function units while the GPU is active
    • An idle GPU
    Blending instructions from multiple tasks helps with the first point. Enqueuing additional tasks right behind already-queued ones helps with the second, and that's what Nvidia appears to be missing so far, even though that is just a driver feature, not a hardware one.
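    As a minimal sketch of that second point, assuming the D3D12 API (the queue and command-list names here are illustrative): ExecuteCommandLists takes an array, so several pre-recorded tasks can be stapled into one submission and the queue doesn't drain between them.

    // Hypothetical sketch: submit a batch of compute tasks in one call so the
    // hardware queue stays fed instead of underrunning between submissions.
    #include <d3d12.h>
    #include <vector>

    void SubmitBatched(ID3D12CommandQueue* computeQueue,
                       const std::vector<ID3D12GraphicsCommandList*>& tasks)
    {
        // Upcast to the base interface ExecuteCommandLists expects.
        std::vector<ID3D12CommandList*> lists(tasks.begin(), tasks.end());

        // One submission carries the whole batch of sequential tasks.
        computeQueue->ExecuteCommandLists(static_cast<UINT>(lists.size()),
                                          lists.data());
    }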
     
    serversurfer likes this.
  17. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    365
    Likes Received:
    319
    Are you sure? I mean, register files being too small for truly parallel computation is a well-known issue, so that's a likely problem as parallelism rises.

    But the caches shouldn't limit the size of each queue. If you insist on pushing longer programs into each queue, the hardware should cope with that quite well. More cache misses, yes; possibly even running into the memory bandwidth limit. But I don't see how this would affect the refilling of the queues. Currently, it simply looks as if the queues are underrunning far too often because the available queue depth isn't being used.
     
    Razor1 likes this.
  18. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Also compute is used for: skinning, global illumination, occlusion culling, particle animation, depth of field, fast blur kernels (ex. bloom), reductions (ex. average luminance for eye adaptation), screen space reflections, distance field ray tracing (ex. soft shadows and AO in UE4), etc, etc.

    Media Molecule's new game "Dreams" has a GPU pipeline that is fully compute shader based (no rasterization at all).

    Q-Games (Tomorrow's Children) had quite nice performance improvements from asynchronous compute in their global illumination implementation. DICE (Frostbite) is using asynchronous compute for character skinning. Their presentation described skinning as being almost free this way, as the async skinning fills holes in GPU execution. Recent presentations from game/technology developers have shown lots of use cases for asynchronous compute (on consoles). As DX12 supports multiple compute queues, we will certainly see similar optimizations on PC.
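    For reference, a rough sketch of what such a multi-queue setup could look like in D3D12 (the device, fence, and command-list handles here are assumptions for illustration, not any shipped engine's code):

    // Hypothetical sketch: a dedicated compute queue so work like skinning can
    // overlap graphics, with a fence so graphics waits only where it must.
    #include <d3d12.h>

    // Created once at startup: a second queue of type COMPUTE, separate from
    // the direct (graphics) queue.
    ID3D12CommandQueue* CreateComputeQueue(ID3D12Device* device)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        ID3D12CommandQueue* queue = nullptr;
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
        return queue;
    }

    // Per frame: compute work runs on its own queue and overlaps graphics.
    void SubmitAsyncCompute(ID3D12CommandQueue* computeQueue,
                            ID3D12CommandQueue* directQueue,
                            ID3D12CommandList* computeWork,
                            ID3D12Fence* fence, UINT64 fenceValue)
    {
        computeQueue->ExecuteCommandLists(1, &computeWork);
        computeQueue->Signal(fence, fenceValue); // GPU signals when compute is done
        directQueue->Wait(fence, fenceValue);    // GPU-side wait; the CPU never blocks
    }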
     
    Lightman, drSeehas, Jackalito and 8 others like this.
  19. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,203
    Likes Received:
    2,915
    Location:
    Barcelona Spain
    Particle scatter/gather, like in the 2014 AMD presentation on rendering particles without overdraw... No implementation in any released game yet.
     
    Razor1 likes this.
  20. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Media Molecule must be using something similar in Dreams, since they don't use the rasterizer at all. So they will likely be the first to ship a fully compute-based particle engine (that I know of). I am sure many others are using/evaluating a similar tiled particle system as described in the AMD paper. It provides impressive gains in heavy-overdraw cases and greatly reduces bandwidth usage.
     
    Lightman, drSeehas, Jackalito and 2 others like this.