DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    Unreal and Unity GPU profiling tools aren't that great for graphics programmers. Only basic counters and timing brackets. You'd want to know the reason why the shader is slow. PC PIX is a good start. They are adding hardware GPU counter support soon. That will allow you to know something about the hardware bottlenecks. Hopefully it gets closer to the console profiling experience.
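A timing bracket is typically a pair of GPU timestamps written before and after a draw or dispatch. Below is a minimal sketch of turning such pairs into per-draw milliseconds. It assumes the raw tick values have already been read back (in D3D12, for example, from a timestamp query heap resolved with ResolveQueryData) and that the tick rate came from ID3D12CommandQueue::GetTimestampFrequency. The helper names and the 10 MHz clock are hypothetical.

```python
def ticks_to_ms(begin_ticks: int, end_ticks: int, frequency_hz: int) -> float:
    """Convert one begin/end pair of GPU timestamp ticks to milliseconds."""
    if frequency_hz <= 0:
        raise ValueError("timestamp frequency must be positive")
    if end_ticks < begin_ticks:
        raise ValueError("end timestamp precedes begin timestamp")
    return (end_ticks - begin_ticks) * 1000.0 / frequency_hz


def bracket_timings(timestamps: list, frequency_hz: int) -> list:
    """Turn a flat [begin0, end0, begin1, end1, ...] list into per-bracket ms."""
    if len(timestamps) % 2 != 0:
        raise ValueError("expected an even number of timestamps (begin/end pairs)")
    return [
        ticks_to_ms(timestamps[i], timestamps[i + 1], frequency_hz)
        for i in range(0, len(timestamps), 2)
    ]


# Hypothetical readback: two brackets on a GPU with a 10 MHz timestamp clock.
print(bracket_timings([0, 20_000, 20_000, 70_000], 10_000_000))  # [2.0, 5.0]
```

The result tells you how long each bracket took. It says nothing about which hardware unit was the bottleneck, and that missing information is what hardware counters would supply.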
     
    Pixel, Razor1, Lightman and 3 others like this.
  2. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,032
    Likes Received:
    3,428
    I thought this was demoed at Build last year or the year before.
    Has it really taken so long to be released?
     
  3. ajmiles

    Newcomer

    Joined:
    Feb 4, 2014
    Messages:
    7
    Likes Received:
    1
    Location:
    UK
    It was announced at GDC last year and I don't believe has been shown in any form until Wednesday.
     
  4. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha

    Joined:
    May 14, 2005
    Messages:
    1,386
    Likes Received:
    299
    Location:
    NY
    That's not completely accurate. Unity has a frame debugger that lets you step through draw calls. I solved a lot of problems this way. You're right, though: it won't help you solve all issues (especially ones that only exist on certain IHVs), but it's more than basic counters and timing brackets. :)
     
    Razor1 and pharma like this.
  5. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    sebbbi, Razor1 and pharma like this.
  6. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    946
    Likes Received:
    413
    No driver apart from possibly WARP supports it; it's completely academic currently.
     
  7. Pinstripe

    Newcomer

    Joined:
    Feb 24, 2013
    Messages:
    153
    Likes Received:
    133
    Shader Model 6.0 will arrive with the Creators Update. Nvidia/AMD should then have WDDM 2.2 drivers ready for release to support it.
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    I was talking about profiling. Frame debuggers are a different thing. Unity's frame debugger is nice, but RenderDoc clearly exceeds it and works with any engine. RenderDoc, for example, allows you to step through shaders and inspect variables (registers) on every line of shader execution. You can also edit shaders on the fly and check the modified results (and timings). RenderDoc has excellent resource viewers (with captured human-readable resource names). You can inspect buffers in a raw binary view or enter any type (and it reinterpret-casts the data before showing it). RenderDoc also captures constant buffer layouts, making it easy to check whether all constants are correctly set for draws and dispatches. PC graphics development without RenderDoc is awful. I would say RenderDoc matches console debugging tools.
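Reinterpreting a raw buffer just means reading the same bytes under a different type. Here is a minimal sketch of that idea using Python's struct module. It only illustrates the concept and is not how RenderDoc is implemented. It assumes tightly packed, little-endian, 32-bit elements, and the helper name is hypothetical.

```python
import struct


def reinterpret(raw: bytes, fmt: str) -> list:
    """View raw buffer bytes as a sequence of elements described by a struct format."""
    size = struct.calcsize(fmt)
    if len(raw) % size != 0:
        raise ValueError(f"{len(raw)} bytes is not a multiple of the {size}-byte element")
    return [struct.unpack_from(fmt, raw, offset) for offset in range(0, len(raw), size)]


# One float4 worth of data, as it might sit in a GPU buffer.
raw = struct.pack("<4f", 1.0, 0.5, -2.0, 0.0)

as_float4 = reinterpret(raw, "<4f")  # [(1.0, 0.5, -2.0, 0.0)]
as_uint4 = reinterpret(raw, "<4I")   # same bits: [(0x3F800000, 0x3F000000, 0xC0000000, 0)]
```

Seeing the integer view next to the float view is handy for spotting data that was written with one type and read back as another.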

    Console profiling tools, however, are way ahead of generic PC profiling tools. You get thousands of HW counters and various analysis reports based on the counter values (ALU, bandwidth, cache, bank conflicts, geometry pipeline stats, etc.). Occupancy and bottleneck graphs are invaluable in performance fine-tuning, especially with async compute and overlapping draws/dispatches. The new PC PIX will soon expose GPU counters. This will be a huge improvement over staring at a per-draw/dispatch millisecond counter without knowing why the GPU behaves the way it does.
     
    dogen, Razor1 and Ike Turner like this.
  9. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    And that is something of a real mystery to me. Given the nature of both systems, logically the PC should have a considerable lead in available tools (and that's not the case).
     
  10. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    Why is it a mystery? As sebbbi said... There are generic profiling tools on PC. The problem is that all the really interesting stuff is proprietary and may change drastically with a change of GPU architecture (for example from Kepler to Maxwell, or from Polaris to Vega). At the same time, it does not translate well across different vendors (what you might be interested in on NV hardware might not be the same as on AMD hardware). IHVs have much, much better tool suites for their own hardware.
     
    Razor1 and pharma like this.
  11. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
  12. Ike Turner

    Veteran

    Joined:
    Jul 30, 2005
    Messages:
    2,110
    Likes Received:
    2,304
    #1672 Ike Turner, Jan 30, 2017
    Last edited: Jan 30, 2017
  13. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,929
    Likes Received:
    5,529
    Location:
    Pennsylvania
    #1673 Malo, Jan 30, 2017
    Last edited: Jan 30, 2017
  14. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Normally, it should be automagically sorted with the highest performance on top; at least that's our standard way of doing it. I understand that clicking through does incur a minor inconvenience, though.

    As for the NaN errors: I cannot reproduce them. Might it have something to do with language settings and a different interpretation of the decimal point? In Germany, we use a „,“ as the decimal point instead of a "." (wow, that looked awkward with typographic quotation marks).
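Here is a minimal sketch of how a decimal-separator mismatch can surface as NaN. It assumes a results page that parses number strings and falls back to NaN when parsing fails. The helper names are hypothetical and not taken from the actual benchmark page.

```python
import math


def naive_parse(text: str) -> float:
    """Parse assuming '.' is the decimal point; fall back to NaN on failure."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def tolerant_parse(text: str) -> float:
    """Accept ',' or '.' as the decimal point (assumes no thousands separators)."""
    return naive_parse(text.strip().replace(",", "."))


print(naive_parse("54,17"))     # nan: the German-style value is rejected
print(tolerant_parse("54,17"))  # 54.17
print(tolerant_parse("54.17"))  # 54.17
```

Blindly swapping commas for points only works when the data has no thousands separators. A more robust fix is to write the data in one fixed, locale-independent format and format it for display only at the end.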
     
  15. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Most gains are at 720p, where the CPU of choice (the hobbled 8350) is the bottleneck; otherwise we are looking at the classic Ashes and Hitman DX12 poster children. Results from the rest of the games do more harm than good for the image of DX12, and some games even post negative results when CPU-limited (Deus Ex and Warhammer). I second the calls for testing with a better CPU and GPU.
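Here is a simplified model of why the gains cluster at 720p. If CPU and GPU work are pipelined, frame time is roughly the slower of the two, so cutting CPU-side API cost only shows up when the CPU is the limiter. The model is an assumption for illustration, and the millisecond figures are made up, not taken from the review being discussed.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float) -> float:
    """With CPU and GPU work pipelined, the slower side sets the frame time."""
    return max(cpu_ms, gpu_ms)


def api_speedup(cpu_old_ms: float, cpu_new_ms: float, gpu_ms: float) -> float:
    """Frame-time speedup when only the CPU-side cost changes (>1 is faster)."""
    return frame_time_ms(cpu_old_ms, gpu_ms) / frame_time_ms(cpu_new_ms, gpu_ms)


# Hypothetical numbers: the new API path trims CPU time from 20 ms to 14 ms.
low_res = api_speedup(20.0, 14.0, gpu_ms=8.0)     # CPU-bound: 20/14, about 1.43x
high_res = api_speedup(20.0, 14.0, gpu_ms=25.0)   # GPU-bound: 25/25 = 1.0x
regression = api_speedup(14.0, 20.0, gpu_ms=8.0)  # CPU cost went up: 14/20 = 0.7x
```

The last line shows how a CPU-limited game can post a negative result. If its DX12 path costs more CPU time than its DX11 path, the speedup drops below 1.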
     
  16. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    Though it did not improve performance at all in the AMD-modified D3D12nBodyGravity sample, it looks like the latest drivers (17.2.1) re-enabled async compute for GCN1 GPUs. Not sure how much it will impact games (currently my 280 is used as a secondary GPU for non-game applications), but at least this should cause fewer problems for developers.

    Here's a GPUView capture. As you can see, the compute queue is back and runs concurrently with the default queue. Please note that some flip and presentation issues are caused by the GCN1 GPU not outputting directly to the monitor (which is driven by an R9 380X, a GCN3 GPU); everything is handled by WDDM magic.

    [GPUView screenshot]

    And here is the log of that small CS test:

    Compute only:
    1. 54.19ms
    2. 54.17ms
    3. 54.17ms
    4. 54.18ms
    5. 54.17ms
    6. 54.18ms
    7. 54.17ms
    8. 54.16ms
    9. 54.17ms
    10. 54.18ms
    11. 54.19ms
    12. 54.18ms
    13. 54.17ms
    14. 54.17ms
    15. 54.17ms
    16. 54.17ms
    17. 54.18ms
    18. 54.16ms
    19. 54.18ms
    20. 54.17ms
    21. 54.15ms
    22. 54.14ms
    23. 54.14ms
    24. 54.14ms
    25. 54.15ms
    26. 54.15ms
    27. 54.14ms
    28. 54.14ms
    29. 54.14ms
    30. 54.17ms
    31. 54.15ms
    32. 54.14ms
    33. 54.14ms
    34. 54.14ms
    35. 54.14ms
    36. 54.14ms
    37. 54.14ms
    38. 54.14ms
    39. 54.14ms
    40. 54.15ms
    41. 54.14ms
    42. 54.15ms
    43. 54.15ms
    44. 54.15ms
    45. 54.15ms
    46. 54.15ms
    47. 54.15ms
    48. 54.15ms
    49. 54.14ms
    50. 54.15ms
    51. 54.15ms
    52. 54.15ms
    53. 54.15ms
    54. 54.16ms
    55. 54.15ms
    56. 54.16ms
    57. 54.15ms
    58. 54.15ms
    59. 54.15ms
    60. 54.16ms
    61. 54.15ms
    62. 54.15ms
    63. 54.16ms
    64. 54.18ms
    65. 54.17ms
    66. 54.15ms
    67. 54.15ms
    68. 54.15ms
    69. 54.15ms
    70. 54.15ms
    71. 54.15ms
    72. 54.16ms
    73. 54.15ms
    74. 54.15ms
    75. 54.17ms
    76. 54.15ms
    77. 54.14ms
    78. 54.15ms
    79. 54.15ms
    80. 54.16ms
    81. 54.14ms
    82. 54.16ms
    83. 55.18ms
    84. 54.19ms
    85. 54.15ms
    86. 54.14ms
    87. 54.14ms
    88. 54.15ms
    89. 54.16ms
    90. 54.15ms
    91. 54.14ms
    92. 54.15ms
    93. 54.17ms
    94. 54.15ms
    95. 54.14ms
    96. 54.15ms
    97. 54.15ms
    98. 54.15ms
    99. 54.17ms
    100. 54.15ms
    101. 54.14ms
    102. 54.15ms
    103. 54.15ms
    104. 54.16ms
    105. 54.18ms
    106. 54.18ms
    107. 54.15ms
    108. 54.14ms
    109. 54.15ms
    110. 54.14ms
    111. 54.14ms
    112. 54.14ms
    113. 54.14ms
    114. 54.15ms
    115. 54.14ms
    116. 54.15ms
    117. 54.15ms
    118. 54.14ms
    119. 54.14ms
    120. 54.14ms
    121. 54.14ms
    122. 54.14ms
    123. 54.14ms
    124. 54.14ms
    125. 54.15ms
    126. 54.14ms
    127. 54.14ms
    128. 54.14ms
    Graphics only: 56.29ms (29.81G pixels/s)
    Graphics + compute:
    1. 56.25ms (29.82G pixels/s)
    2. 56.23ms (29.84G pixels/s)
    3. 56.23ms (29.84G pixels/s)
    4. 56.25ms (29.83G pixels/s)
    5. 56.25ms (29.83G pixels/s)
    6. 56.23ms (29.84G pixels/s)
    7. 56.25ms (29.82G pixels/s)
    8. 56.24ms (29.83G pixels/s)
    9. 56.24ms (29.83G pixels/s)
    10. 56.23ms (29.84G pixels/s)
    11. 56.23ms (29.84G pixels/s)
    12. 56.24ms (29.83G pixels/s)
    13. 56.23ms (29.84G pixels/s)
    14. 56.24ms (29.83G pixels/s)
    15. 56.23ms (29.84G pixels/s)
    16. 56.24ms (29.83G pixels/s)
    17. 56.23ms (29.84G pixels/s)
    18. 56.24ms (29.83G pixels/s)
    19. 56.24ms (29.83G pixels/s)
    20. 56.23ms (29.84G pixels/s)
    21. 56.24ms (29.83G pixels/s)
    22. 56.23ms (29.84G pixels/s)
    23. 56.25ms (29.83G pixels/s)
    24. 56.23ms (29.84G pixels/s)
    25. 56.24ms (29.83G pixels/s)
    26. 56.24ms (29.83G pixels/s)
    27. 56.26ms (29.82G pixels/s)
    28. 56.23ms (29.83G pixels/s)
    29. 56.23ms (29.84G pixels/s)
    30. 56.24ms (29.83G pixels/s)
    31. 56.24ms (29.83G pixels/s)
    32. 56.24ms (29.83G pixels/s)
    33. 56.24ms (29.83G pixels/s)
    34. 56.24ms (29.83G pixels/s)
    35. 56.23ms (29.84G pixels/s)
    36. 56.25ms (29.83G pixels/s)
    37. 56.24ms (29.83G pixels/s)
    38. 56.24ms (29.83G pixels/s)
    39. 56.24ms (29.83G pixels/s)
    40. 56.24ms (29.83G pixels/s)
    41. 56.24ms (29.83G pixels/s)
    42. 56.23ms (29.84G pixels/s)
    43. 56.24ms (29.83G pixels/s)
    44. 56.24ms (29.83G pixels/s)
    45. 56.24ms (29.83G pixels/s)
    46. 56.23ms (29.84G pixels/s)
    47. 56.23ms (29.84G pixels/s)
    48. 56.24ms (29.83G pixels/s)
    49. 56.24ms (29.83G pixels/s)
    50. 56.25ms (29.83G pixels/s)
    51. 56.24ms (29.83G pixels/s)
    52. 56.24ms (29.83G pixels/s)
    53. 56.23ms (29.84G pixels/s)
    54. 56.24ms (29.83G pixels/s)
    55. 56.26ms (29.82G pixels/s)
    56. 56.24ms (29.83G pixels/s)
    57. 56.24ms (29.83G pixels/s)
    58. 56.24ms (29.83G pixels/s)
    59. 56.24ms (29.83G pixels/s)
    60. 56.25ms (29.82G pixels/s)
    61. 56.23ms (29.83G pixels/s)
    62. 56.23ms (29.84G pixels/s)
    63. 56.24ms (29.83G pixels/s)
    64. 56.23ms (29.84G pixels/s)
    65. 56.23ms (29.84G pixels/s)
    66. 56.23ms (29.84G pixels/s)
    67. 56.24ms (29.83G pixels/s)
    68. 56.23ms (29.84G pixels/s)
    69. 56.23ms (29.84G pixels/s)
    70. 56.23ms (29.84G pixels/s)
    71. 56.23ms (29.83G pixels/s)
    72. 56.23ms (29.84G pixels/s)
    73. 56.23ms (29.84G pixels/s)
    74. 56.23ms (29.84G pixels/s)
    75. 56.23ms (29.84G pixels/s)
    76. 56.24ms (29.83G pixels/s)
    77. 56.23ms (29.84G pixels/s)
    78. 56.23ms (29.84G pixels/s)
    79. 56.23ms (29.84G pixels/s)
    80. 56.23ms (29.83G pixels/s)
    81. 56.23ms (29.84G pixels/s)
    82. 56.23ms (29.84G pixels/s)
    83. 56.23ms (29.84G pixels/s)
    84. 56.23ms (29.84G pixels/s)
    85. 56.24ms (29.83G pixels/s)
    86. 56.23ms (29.84G pixels/s)
    87. 56.23ms (29.84G pixels/s)
    88. 56.23ms (29.84G pixels/s)
    89. 56.23ms (29.84G pixels/s)
    90. 56.23ms (29.84G pixels/s)
    91. 56.23ms (29.83G pixels/s)
    92. 56.23ms (29.84G pixels/s)
    93. 56.25ms (29.82G pixels/s)
    94. 56.24ms (29.83G pixels/s)
    95. 56.23ms (29.83G pixels/s)
    96. 56.24ms (29.83G pixels/s)
    97. 56.39ms (29.75G pixels/s)
    98. 56.47ms (29.71G pixels/s)
    99. 56.23ms (29.84G pixels/s)
    100. 56.23ms (29.84G pixels/s)
    101. 56.24ms (29.83G pixels/s)
    102. 56.23ms (29.84G pixels/s)
    103. 56.23ms (29.84G pixels/s)
    104. 56.23ms (29.84G pixels/s)
    105. 56.26ms (29.82G pixels/s)
    106. 56.23ms (29.84G pixels/s)
    107. 56.24ms (29.83G pixels/s)
    108. 56.24ms (29.83G pixels/s)
    109. 56.24ms (29.83G pixels/s)
    110. 56.23ms (29.83G pixels/s)
    111. 56.23ms (29.84G pixels/s)
    112. 56.23ms (29.84G pixels/s)
    113. 56.23ms (29.84G pixels/s)
    114. 56.23ms (29.84G pixels/s)
    115. 56.23ms (29.84G pixels/s)
    116. 56.23ms (29.84G pixels/s)
    117. 56.23ms (29.84G pixels/s)
    118. 56.23ms (29.84G pixels/s)
    119. 56.23ms (29.84G pixels/s)
    120. 56.23ms (29.84G pixels/s)
    121. 56.23ms (29.84G pixels/s)
    122. 56.23ms (29.84G pixels/s)
    123. 56.23ms (29.84G pixels/s)
    124. 56.24ms (29.83G pixels/s)
    125. 56.23ms (29.84G pixels/s)
    126. 56.23ms (29.83G pixels/s)
    127. 56.23ms (29.84G pixels/s)
    128. 56.23ms (29.84G pixels/s)
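Here is a back-of-the-envelope check of what the log shows. It assumes the "Graphics + compute" figure is the wall time with both workloads submitted together. The values below are representative numbers copied from the log.

```python
# Representative values copied from the log above.
compute_only_ms = 54.15   # typical "Compute only" entry
graphics_only_ms = 56.29  # "Graphics only"
combined_ms = 56.23       # typical "Graphics + compute" entry

# If the two workloads were serialized, their times would roughly add up.
serial_ms = compute_only_ms + graphics_only_ms
# With perfect overlap, the longer workload alone would set the time.
ideal_overlap_ms = max(compute_only_ms, graphics_only_ms)

# Fraction of the compute time hidden behind the graphics work (1.0 = fully hidden).
# It can land slightly above 1.0 because of run-to-run noise in the measurements.
hidden_fraction = (serial_ms - combined_ms) / compute_only_ms

print(f"serial {serial_ms:.2f} ms, ideal {ideal_overlap_ms:.2f} ms, "
      f"measured {combined_ms:.2f} ms, hidden {hidden_fraction:.1%}")
```

The combined time sits next to the graphics-only time, nowhere near the roughly 110 ms serial sum. That is the result you would expect if the compute queue really runs concurrently with the graphics queue, as the GPUView capture suggests.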
     
    #1676 Alessio1989, Feb 15, 2017
    Last edited: Feb 15, 2017
    sebbbi, Lightman, lanek and 2 others like this.
  17. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Back in March last year I raised:
    Finally looks like Nvidia answered this at the GDC event.
    Additionally, it will be interesting to see how this pans out in terms of multithreading support/performance for CPU-oriented features such as PhysX (not the only GameWorks feature able to utilise the CPU side), which scales well up to 3 threads.
    http://physxinfo.com/news/11327/multithreaded-performance-scaling-in-physx-sdk/

    Yeah, I appreciate GameWorks gets a love/hate reaction depending upon who you talk to, as does its influence on gaming, but fingers crossed this actually helps improve its "bolt-on" performance and makes it much less cumbersome.
    Anyway, the liquid and flame/smoke particle simulation demos have come a long way.
    Cheers
     
    #1677 CSI PC, Mar 1, 2017
    Last edited: Mar 1, 2017
    pharma likes this.
  18. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,213
    Seems they found good grounds for using Async Compute on their GPUs. GPU PhysX appears to be a poor case for GPU utilization in general, and NV intends to fix that through Async, given that Async is almost useless to them under normal rendering due to their very high utilization rate.
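Here is a toy model of the utilization argument. Suppose the only benefit of async compute is filling shader slots that the graphics work leaves idle. Then the saving is capped by that idle time. This simplification is an assumption for illustration: it ignores contention for caches, bandwidth and registers. The frame numbers are made up, and the helper name is hypothetical.

```python
def async_saving_ms(graphics_ms: float, compute_ms: float, utilization: float) -> float:
    """Upper bound on time saved by overlapping compute work with graphics.

    utilization: fraction (0..1) of the GPU kept busy by the graphics work alone.
    Toy assumption: compute can only run in the idle share of the graphics time.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_ms = (1.0 - utilization) * graphics_ms
    return min(compute_ms, idle_ms)


# Hypothetical frame: 10 ms of graphics plus 3 ms of compute (e.g. a physics pass).
print(async_saving_ms(10.0, 3.0, utilization=0.95))  # ~0.5 ms: little to gain
print(async_saving_ms(10.0, 3.0, utilization=0.60))  # 3.0 ms: compute fully hidden
```

Under this model, a GPU that is already well fed by normal rendering has little idle time for async compute to reclaim. A poorly utilizing workload like the GPU PhysX case described above leaves much more room.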

    I hope to see The Division and Tomb Raider implementing VXAO and HFTS under DX12. This will bring the DX12 path to parity with the DX11 path in the visual aspect.
     
  19. Malo

    Malo Yak Mechanicum
    Legend Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    8,929
    Likes Received:
    5,529
    Location:
    Pennsylvania
    Unfortunately, it's going to be quite a few years before consumers replace their pre-Pascal cards and start seeing gains from async. People will be hanging on to their Maxwell cards for a long time, since they still perform well under DX11 and simply don't see gains under DX12.
     
  20. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    PhysX would be better for Nvidia on their GPUs, but developers prefer to keep it more neutral and use the CPU option, as in, I think, Witcher 3.

    But more generally, beyond my post: yeah, Nvidia is suggesting you are going to see improvements in Tomb Raider and other games with the next 'Game Ready Driver Optimised for DX12'.
    http://nvidianews.nvidia.com/news/nvidia-announces-gameworks-dx12
    Link also covers aspects of GameWorks.

    Cheers
     
    #1680 CSI PC, Mar 1, 2017
    Last edited: Mar 1, 2017
    DavidGraham likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.