DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Even in the linux drivers the microcode was signed binaries last I checked.

    The big difference for ACEs is that GCN1.0 doesn't have microcode as far as I know. It should still work, just in a less intuitive way. It was the later programmable ACE/HWS designs, GCN1.2 and newer, that got the newer features backported.
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The capacity constraint for microcode I'm thinking of came up in the context of allowing the microcode engines to support the standard command packet types, HWS, and the AQL packets for HSA at the same time.
    At least the standard compute being used for the synthetic benchmark from this thread wouldn't be purposefully involving those extra sets of microcode since that's beyond what the program could control, and it hasn't been changed.
    Losing that same functionality now might mean that something changed the driver's threshold for serialization, that there was a choice to back off on AC for 1.0, or that there is a bug.

    There are features that AMD has introduced that may not filter back to anything older than Sea Islands.
    The front-end hardware for GCN 1.0 doesn't seem to have the foundation shared by the next revisions, in terms of the ability to update and the ability/hardware features for the microcode engine's interaction with the GPU back end. The underlying hardware of the front end for GCN 1.0 might have had more of a shared basis with Northern Islands and its introduction of compute.

    From posts in the following thread, it seems that going forward AMD's overall compute platform is based on 1.1 and higher.
    https://www.phoronix.com/forums/for...te-1-3-platform-brings-polaris-other-features

    I wouldn't think that would necessitate scrapping 1.0 support for basic AC, although perhaps there are details of how the stack communicates with the hardware that explain why this gets more difficult. Alternatively, moving on to bigger and better things can increase the chance of corner cases coming up and forcing a fallback when no additional hotfix is forthcoming. Applications that do try to use more recent features could prompt a drop back to standard execution rather than an attempt to infer how they can be massaged into 1.0 at runtime.
     
    pharma, sebbbi and DavidGraham like this.
  3. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    Just tried the latest drivers with my 280 and got a 4% performance gain with AC on the nBody sample that AMD modified some time ago to add naive "async. compute" support: https://github.com/GPUOpen-LibrariesAndSDKs/nBodyD3D12/tree/master/Samples/D3D12nBodyGravity

    But I suspect there is a bug in the current driver branch for GCN1 devices, since I got a lot less performance than with my other card, a 380, with and without "async. compute" enabled. The performance bottleneck looks to be the compute shader, with async both on and off.
     
    #1623 Alessio1989, Dec 7, 2016
    Last edited: Dec 7, 2016
    digitalwanderer and sebbbi like this.
  4. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    Also, "async compute on" on the GCN 1 card produces a stuttering pattern, while this is completely absent when running the demo on the 380. Moreover, the 380's performance is about double, even with async compute off, while the two cards should be more or less in the same performance range, with +10/-15% for the 280 vs the 380 depending on the context. In a worst-case scenario, like a synthetic benchmark, the 380 should perform 21% better than the 280 on single precision (and about 68% worse on double precision :p)... All this smells like a crap bug. Please note that I am running these two cards in a CPU-limited scenario, which should increase the performance benefit when the two queues (default and compute) are not serialized by the driver.
     
    Lightman, I.S.T., sebbbi and 2 others like this.
  5. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,044
    Likes Received:
    1,116
    Location:
    WI, USA
    I own AOTS Escalation and ran the benchmark test in both D3D11 and D3D12 modes on both my notebook with 980M and desktop with 1070. The 980M is slightly slower in D3D12 whereas the 1070 is slightly faster. Same drivers. No idea if "async compute" is enabled or disabled by default these days.
     
  6. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
  7. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
  8. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
  9. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    Same issues and same terrible performance on GCN1 with compute shaders under D3D12: still stuttering, and only a small performance improvement with async compute.
     
  10. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
  11. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
    I spent all night installing every driver between 16.3.1 and 16.9.2. The break point is driver 16.4.2. After this driver there is no more Async Compute on GCN 1.0. So Nixxes was aware that Async Compute was not active: they released the Async Compute patch for Rise Of The Tomb Raider in July, specifying that only GCN 1.1 and newer can take advantage of Async Compute.
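
Installing every release works, but if the regression is a clean cut (every driver before the break point behaves one way, every driver after it the other), a binary search needs far fewer installs, roughly log2 of the number of releases. Below is a rough Python sketch under that assumption. The function `first_failing`, the `works` callback and the placeholder labels `drv_a` to `drv_h` are hypothetical; they stand in for real driver versions and for a manual install-and-test run.

```python
from typing import Callable, Optional, Sequence


def first_failing(versions: Sequence[str],
                  works: Callable[[str], bool]) -> Optional[str]:
    """Return the first version for which works() is False.

    Assumes versions are ordered oldest to newest and that once a
    version fails, every later version fails too.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid + 1   # regression is in a later release
        else:
            hi = mid       # this release or an earlier one is the first bad one
    return versions[lo] if lo < len(versions) else None


# Placeholder labels, not real driver versions.
releases = ["drv_a", "drv_b", "drv_c", "drv_d",
            "drv_e", "drv_f", "drv_g", "drv_h"]

# Stand-in for "install the driver and run the benchmark by hand".
def async_compute_works(version: str) -> bool:
    return version < "drv_d"

print(first_failing(releases, async_compute_works))
```

If behaviour flips back and forth between releases, the clean-cut assumption fails and testing every release, as done here, is the safer route.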
     
  12. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    R9 280 (GCN1 Tahiti)
    Code:
    Compute only:
    1. 54.14ms
    2. 54.13ms
    3. 54.13ms
    4. 54.13ms
    5. 54.13ms
    6. 54.13ms
    7. 54.13ms
    8. 54.13ms
    9. 54.13ms
    10. 54.13ms
    11. 54.13ms
    12. 54.13ms
    13. 54.13ms
    14. 54.13ms
    15. 54.13ms
    16. 54.13ms
    17. 54.13ms
    18. 54.13ms
    19. 54.13ms
    20. 54.13ms
    21. 54.13ms
    22. 54.13ms
    23. 54.13ms
    24. 54.13ms
    25. 54.13ms
    26. 54.13ms
    27. 54.13ms
    28. 54.13ms
    29. 54.13ms
    30. 54.13ms
    31. 54.13ms
    32. 54.13ms
    33. 54.13ms
    34. 54.13ms
    35. 54.13ms
    36. 54.14ms
    37. 54.13ms
    38. 54.13ms
    39. 54.13ms
    40. 54.14ms
    41. 54.13ms
    42. 54.15ms
    43. 54.13ms
    44. 54.13ms
    45. 54.13ms
    46. 54.13ms
    47. 54.13ms
    48. 54.14ms
    49. 54.13ms
    50. 54.13ms
    51. 54.13ms
    52. 54.13ms
    53. 54.13ms
    54. 54.14ms
    55. 54.13ms
    56. 54.14ms
    57. 54.13ms
    58. 54.13ms
    59. 54.14ms
    60. 54.13ms
    61. 54.14ms
    62. 54.13ms
    63. 54.14ms
    64. 54.13ms
    65. 54.13ms
    66. 54.13ms
    67. 54.13ms
    68. 54.13ms
    69. 54.13ms
    70. 54.14ms
    71. 54.13ms
    72. 54.13ms
    73. 54.13ms
    74. 54.13ms
    75. 54.13ms
    76. 54.13ms
    77. 54.13ms
    78. 54.13ms
    79. 54.13ms
    80. 54.13ms
    81. 54.13ms
    82. 54.13ms
    83. 54.13ms
    84. 54.13ms
    85. 54.13ms
    86. 54.14ms
    87. 54.13ms
    88. 54.13ms
    89. 54.13ms
    90. 54.13ms
    91. 54.13ms
    92. 54.13ms
    93. 54.13ms
    94. 54.13ms
    95. 54.13ms
    96. 54.13ms
    97. 54.13ms
    98. 54.13ms
    99. 54.13ms
    100. 54.13ms
    101. 54.13ms
    102. 54.13ms
    103. 54.13ms
    104. 54.13ms
    105. 54.13ms
    106. 54.13ms
    107. 54.13ms
    108. 54.13ms
    109. 54.13ms
    110. 54.13ms
    111. 54.13ms
    112. 54.13ms
    113. 54.13ms
    114. 54.13ms
    115. 54.13ms
    116. 54.14ms
    117. 54.13ms
    118. 54.13ms
    119. 54.13ms
    120. 54.13ms
    121. 54.13ms
    122. 54.13ms
    123. 54.13ms
    124. 54.13ms
    125. 54.14ms
    126. 54.13ms
    127. 54.13ms
    128. 54.13ms
    Graphics only: 56.28ms (29.81G pixels/s)
    Graphics + compute:
    1. 110.47ms (15.19G pixels/s)
    2. 110.53ms (15.18G pixels/s)
    3. 110.53ms (15.18G pixels/s)
    4. 110.53ms (15.18G pixels/s)
    5. 110.53ms (15.18G pixels/s)
    6. 110.52ms (15.18G pixels/s)
    7. 110.52ms (15.18G pixels/s)
    8. 110.53ms (15.18G pixels/s)
    9. 110.53ms (15.18G pixels/s)
    10. 110.53ms (15.18G pixels/s)
    11. 110.53ms (15.18G pixels/s)
    12. 110.53ms (15.18G pixels/s)
    13. 110.54ms (15.18G pixels/s)
    14. 110.53ms (15.18G pixels/s)
    15. 110.53ms (15.18G pixels/s)
    16. 110.53ms (15.18G pixels/s)
    17. 110.53ms (15.18G pixels/s)
    18. 110.53ms (15.18G pixels/s)
    19. 110.54ms (15.18G pixels/s)
    20. 110.53ms (15.18G pixels/s)
    21. 110.53ms (15.18G pixels/s)
    22. 110.53ms (15.18G pixels/s)
    23. 110.53ms (15.18G pixels/s)
    24. 110.53ms (15.18G pixels/s)
    25. 110.56ms (15.18G pixels/s)
    26. 110.53ms (15.18G pixels/s)
    27. 110.53ms (15.18G pixels/s)
    28. 110.53ms (15.18G pixels/s)
    29. 110.53ms (15.18G pixels/s)
    30. 110.53ms (15.18G pixels/s)
    31. 110.54ms (15.18G pixels/s)
    32. 110.53ms (15.18G pixels/s)
    33. 110.54ms (15.18G pixels/s)
    34. 110.53ms (15.18G pixels/s)
    35. 110.53ms (15.18G pixels/s)
    36. 110.52ms (15.18G pixels/s)
    37. 110.53ms (15.18G pixels/s)
    38. 110.53ms (15.18G pixels/s)
    39. 110.52ms (15.18G pixels/s)
    40. 110.53ms (15.18G pixels/s)
    41. 110.53ms (15.18G pixels/s)
    42. 110.53ms (15.18G pixels/s)
    43. 110.53ms (15.18G pixels/s)
    44. 110.53ms (15.18G pixels/s)
    45. 110.53ms (15.18G pixels/s)
    46. 110.53ms (15.18G pixels/s)
    47. 110.53ms (15.18G pixels/s)
    48. 110.53ms (15.18G pixels/s)
    49. 110.53ms (15.18G pixels/s)
    50. 110.53ms (15.18G pixels/s)
    51. 110.54ms (15.18G pixels/s)
    52. 110.53ms (15.18G pixels/s)
    53. 110.55ms (15.18G pixels/s)
    54. 110.54ms (15.18G pixels/s)
    55. 110.52ms (15.18G pixels/s)
    56. 110.53ms (15.18G pixels/s)
    57. 110.54ms (15.18G pixels/s)
    58. 110.54ms (15.18G pixels/s)
    59. 110.54ms (15.18G pixels/s)
    60. 110.53ms (15.18G pixels/s)
    61. 110.53ms (15.18G pixels/s)
    62. 110.53ms (15.18G pixels/s)
    63. 110.55ms (15.18G pixels/s)
    64. 110.53ms (15.18G pixels/s)
    65. 110.54ms (15.18G pixels/s)
    66. 110.53ms (15.18G pixels/s)
    67. 110.54ms (15.18G pixels/s)
    68. 110.55ms (15.18G pixels/s)
    69. 110.55ms (15.18G pixels/s)
    70. 110.54ms (15.18G pixels/s)
    71. 110.54ms (15.18G pixels/s)
    72. 110.55ms (15.18G pixels/s)
    73. 110.55ms (15.18G pixels/s)
    74. 110.55ms (15.18G pixels/s)
    75. 110.55ms (15.18G pixels/s)
    76. 110.56ms (15.17G pixels/s)
    77. 110.54ms (15.18G pixels/s)
    78. 110.54ms (15.18G pixels/s)
    79. 110.54ms (15.18G pixels/s)
    80. 110.54ms (15.18G pixels/s)
    81. 110.55ms (15.18G pixels/s)
    82. 110.54ms (15.18G pixels/s)
    83. 110.55ms (15.18G pixels/s)
    84. 110.55ms (15.18G pixels/s)
    85. 110.57ms (15.17G pixels/s)
    86. 110.55ms (15.18G pixels/s)
    87. 110.57ms (15.17G pixels/s)
    88. 110.55ms (15.18G pixels/s)
    89. 110.55ms (15.18G pixels/s)
    90. 110.56ms (15.17G pixels/s)
    91. 110.55ms (15.18G pixels/s)
    92. 110.55ms (15.18G pixels/s)
    93. 110.55ms (15.18G pixels/s)
    94. 110.57ms (15.17G pixels/s)
    95. 110.56ms (15.17G pixels/s)
    96. 110.55ms (15.18G pixels/s)
    97. 110.57ms (15.17G pixels/s)
    98. 110.55ms (15.18G pixels/s)
    99. 110.56ms (15.18G pixels/s)
    100. 110.55ms (15.18G pixels/s)
    101. 110.54ms (15.18G pixels/s)
    102. 110.54ms (15.18G pixels/s)
    103. 110.55ms (15.18G pixels/s)
    104. 110.55ms (15.18G pixels/s)
    105. 110.55ms (15.18G pixels/s)
    106. 110.54ms (15.18G pixels/s)
    107. 110.54ms (15.18G pixels/s)
    108. 110.55ms (15.18G pixels/s)
    109. 110.55ms (15.18G pixels/s)
    110. 110.54ms (15.18G pixels/s)
    111. 110.54ms (15.18G pixels/s)
    112. 110.56ms (15.17G pixels/s)
    113. 110.55ms (15.18G pixels/s)
    114. 110.55ms (15.18G pixels/s)
    115. 110.55ms (15.18G pixels/s)
    116. 110.55ms (15.18G pixels/s)
    117. 110.57ms (15.17G pixels/s)
    118. 110.55ms (15.18G pixels/s)
    119. 110.55ms (15.18G pixels/s)
    120. 110.55ms (15.18G pixels/s)
    121. 110.55ms (15.18G pixels/s)
    122. 110.55ms (15.18G pixels/s)
    123. 110.55ms (15.18G pixels/s)
    124. 110.55ms (15.18G pixels/s)
    125. 110.55ms (15.18G pixels/s)
    126. 110.55ms (15.18G pixels/s)
    127. 110.55ms (15.18G pixels/s)
    128. 110.55ms (15.18G pixels/s)
    
    -.-
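
One way to read the numbers above: if the two queues really ran concurrently, the combined time would approach the longer of the two individual times; if the driver serializes them, it approaches their sum. A minimal Python sketch of that check follows. The function name `overlap_fraction` and the 0-to-1 scale are illustrative assumptions, not part of the benchmark tool; the three input values are representative timings taken from the R9 280 results in this post.

```python
def overlap_fraction(compute_ms: float, graphics_ms: float,
                     combined_ms: float) -> float:
    """Estimate how much of the shorter workload was hidden by overlap.

    0.0 means the combined run took at least as long as running both
    workloads back to back (serialized); 1.0 means the shorter workload
    was hidden completely behind the longer one.
    """
    serial_ms = compute_ms + graphics_ms          # back-to-back execution
    hideable_ms = min(compute_ms, graphics_ms)    # most that overlap can save
    if hideable_ms <= 0.0:
        return 0.0
    saved_ms = serial_ms - combined_ms
    # Clamp, since measurement noise can push the ratio slightly out of range.
    return max(0.0, min(1.0, saved_ms / hideable_ms))


# Compute only, graphics only, graphics + compute (ms) from the R9 280 run.
r9_280 = overlap_fraction(54.13, 56.28, 110.53)
print(f"R9 280 overlap: {r9_280:.2f}")
```

With these figures the result is 0.00: 110.53 ms is no better than 54.13 ms + 56.28 ms = 110.41 ms, which is what serialized execution of the two queues would look like.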
     
  13. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
    Which driver ?
     
  14. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    16.11.x, but with 16.12 I got the same results. I simply rolled back to 16.11.x since 16.12 has some power and fan control issues.
     
  15. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
    Okay, it's just for my database. I have received many results from all around the world. Thank you.
     
  16. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
  17. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
    Nice! I was testing this sample when you posted this ^^'
     
  18. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
    Note also there are some real issues on the flip queue with the R9 280. EDIT: I posted that on the AMD driver support forum too (there is no Direct3D developer forum at all on the AMD dev portal -.-), hope to get a real answer this time.

    https://community.amd.com/message/2765237
     
  19. swaaye

    swaaye Entirely Suboptimal
    Legend

    Joined:
    Mar 15, 2003
    Messages:
    9,044
    Likes Received:
    1,116
    Location:
    WI, USA
    Never mind. I see someone tested 16.12.1.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.