DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    Is he?

    i.e.
    25ms Graphics
    25ms Compute

    With Async Compute enabled, the combined Graphics + Compute task should be completed in...?

    With Async Compute disabled, the combined Graphics + Compute task should be completed in...?
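For what it's worth, the arithmetic behind that question can be sketched like this (a toy model using the illustrative 25ms figures above, not measured data):

```python
# Toy model: serial vs. ideal async execution of two GPU workloads.
graphics_ms = 25.0
compute_ms = 25.0

# Serial: one workload finishes before the other starts.
serial_ms = graphics_ms + compute_ms      # 50.0 ms

# Ideal async: the two overlap fully, so the total is bounded by
# the longer of the two workloads.
async_ms = max(graphics_ms, compute_ms)   # 25.0 ms

print(serial_ms, async_ms)
```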
     
  2. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    When he was talking about the first statement, he was referring to the first test. The second test doesn't show that at all:

    graphics + compute is always lower than the two run separately
    Titan X
    Compute
    38.26ms
    32.95ms
    35.61ms
    32.91ms
    32.93ms
    35.59ms
    32.93ms

    TitanX
    Graphics
    18.17ms

    TitanX
    Compute + Graphics

    28.96ms (57.93G pixels/s)
    29.01ms (57.84G pixels/s)
    29.14ms (57.57G pixels/s)
    29.11ms (57.64G pixels/s)
    29.13ms (57.59G pixels/s)
    29.07ms (57.71G pixels/s)
    29.11ms (57.64G pixels/s)

    If it were running serially, it should be well above 50ms.

    And this is why I said parts of it.
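As a rough sanity check of that claim, using representative figures from the numbers above (a sketch; the "serial estimate" simply sums the two standalone times):

```python
# Rough overlap check using the Titan X figures quoted above.
compute_ms  = 32.93   # representative compute-only time
graphics_ms = 18.17   # graphics-only time
combined_ms = 29.07   # representative graphics + compute time

# If the GPU ran the two workloads strictly serially, we'd expect
# roughly the sum of the standalone times.
serial_estimate_ms = compute_ms + graphics_ms   # ~51.10 ms

# The measured combined time is well below that, implying overlap.
overlap_ms = serial_estimate_ms - combined_ms   # ~22.03 ms saved

print(round(serial_estimate_ms, 2), round(overlap_ms, 2))
```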
     
    #282 Razor1, Sep 2, 2015
    Last edited: Sep 2, 2015
  3. Nub

    Nub
    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    10
    Likes Received:
    18
    Hey guys, someone help me out here.

    I don't quite understand the difference between the async+compute test set and the async+compute (single commandlist) test set.

    If someone could help me understand that, I can think of a way to properly visualize the data on the chart page I made.

    For now, the new test sets are added and highlighted in green where "async+compute (single commandlist)" data is available, but that data isn't presented yet.

    [chart image]
     
  4. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    With a single command list, execution is forced to be synchronized.
     
  5. Phyxsyus

    Joined:
    Sep 2, 2015
    Messages:
    3
    Likes Received:
    0
    R9 280 OC / Catalyst 15.8 beta

    Compute only:
    1. 49.14ms
    2. 49.13ms
    3. 49.16ms
    4. 49.14ms
    5. 49.13ms
    6. 49.16ms
    7. 49.14ms
    8. 49.13ms
    9. 49.15ms
    10. 49.16ms
    11. 49.15ms
    12. 49.10ms
    13. 49.15ms
    14. 49.15ms
    15. 49.14ms
    16. 49.14ms
    17. 49.14ms
    18. 49.45ms
    19. 49.14ms
    20. 49.14ms
    21. 49.12ms
    22. 49.13ms
    23. 49.16ms
    24. 49.14ms
    25. 49.10ms
    26. 49.11ms
    27. 49.13ms
    28. 49.16ms
    29. 49.15ms
    30. 49.12ms
    31. 49.14ms
    32. 49.14ms
    33. 49.12ms
    34. 49.15ms
    35. 49.12ms
    36. 49.14ms
    37. 49.14ms
    38. 49.12ms
    39. 49.13ms
    40. 49.13ms
    41. 49.10ms
    42. 49.15ms
    43. 49.17ms
    44. 49.15ms
    45. 49.15ms
    46. 49.15ms
    47. 49.12ms
    48. 49.39ms
    49. 49.15ms
    50. 49.13ms
    51. 49.14ms
    52. 49.14ms
    53. 49.15ms
    54. 49.16ms
    55. 49.14ms
    56. 49.14ms
    57. 49.16ms
    58. 49.16ms
    59. 49.14ms
    60. 49.14ms
    61. 49.14ms
    62. 49.14ms
    63. 49.15ms
    64. 49.12ms
    65. 49.16ms
    66. 49.14ms
    67. 49.14ms
    68. 49.15ms
    69. 49.17ms
    70. 49.16ms
    71. 49.15ms
    72. 49.13ms
    73. 49.17ms
    74. 49.13ms
    75. 49.14ms
    76. 49.14ms
    77. 49.18ms
    78. 49.13ms
    79. 49.12ms
    80. 49.13ms
    81. 49.13ms
    82. 49.10ms
    83. 49.13ms
    84. 49.15ms
    85. 49.14ms
    86. 49.15ms
    87. 49.15ms
    88. 49.15ms
    89. 49.14ms
    90. 49.14ms
    91. 49.13ms
    92. 49.15ms
    93. 49.13ms
    94. 49.15ms
    95. 49.17ms
    96. 49.15ms
    97. 49.13ms
    98. 49.16ms
    99. 49.13ms
    100. 49.14ms
    101. 49.14ms
    102. 49.16ms
    103. 49.16ms
    104. 49.14ms
    105. 49.16ms
    106. 49.14ms
    107. 49.28ms
    108. 49.15ms
    109. 49.14ms
    110. 49.15ms
    111. 49.13ms
    112. 49.17ms
    113. 49.13ms
    114. 49.13ms
    115. 49.14ms
    116. 49.15ms
    117. 49.15ms
    118. 49.16ms
    119. 49.11ms
    120. 49.13ms
    121. 49.12ms
    122. 49.15ms
    123. 49.14ms
    124. 49.15ms
    125. 49.14ms
    126. 49.15ms
    127. 49.15ms
    128. 49.14ms
    Graphics only: 47.23ms (35.52G pixels/s)
    Graphics + compute:
    1. 49.28ms (34.04G pixels/s)
    2. 49.30ms (34.03G pixels/s)
    3. 49.31ms (34.02G pixels/s)
    4. 49.34ms (34.00G pixels/s)
    5. 49.33ms (34.01G pixels/s)
    6. 49.36ms (33.99G pixels/s)
    7. 49.30ms (34.03G pixels/s)
    8. 57.71ms (29.07G pixels/s)
    9. 49.32ms (34.02G pixels/s)
    10. 49.33ms (34.01G pixels/s)
    11. 49.31ms (34.03G pixels/s)
    12. 49.30ms (34.03G pixels/s)
    13. 49.32ms (34.02G pixels/s)
    14. 49.30ms (34.03G pixels/s)
    15. 49.32ms (34.01G pixels/s)
    16. 49.32ms (34.02G pixels/s)
    17. 49.35ms (34.00G pixels/s)
    18. 49.33ms (34.01G pixels/s)
    19. 49.31ms (34.02G pixels/s)
    20. 49.34ms (34.00G pixels/s)
    21. 49.33ms (34.01G pixels/s)
    22. 49.37ms (33.99G pixels/s)
    23. 49.31ms (34.03G pixels/s)
    24. 49.32ms (34.02G pixels/s)
    25. 49.35ms (34.00G pixels/s)
    26. 49.36ms (33.99G pixels/s)
    27. 49.34ms (34.00G pixels/s)
    28. 49.34ms (34.00G pixels/s)
    29. 49.34ms (34.00G pixels/s)
    30. 49.33ms (34.01G pixels/s)
    31. 49.35ms (33.99G pixels/s)
    32. 49.36ms (33.99G pixels/s)
    33. 49.29ms (34.04G pixels/s)
    34. 49.36ms (33.99G pixels/s)
    35. 49.35ms (33.99G pixels/s)
    36. 49.32ms (34.02G pixels/s)
    37. 49.46ms (33.92G pixels/s)
    38. 62.19ms (26.98G pixels/s)
    39. 49.34ms (34.01G pixels/s)
    40. 49.33ms (34.01G pixels/s)
    41. 49.33ms (34.01G pixels/s)
    42. 49.32ms (34.02G pixels/s)
    43. 49.33ms (34.01G pixels/s)
    44. 49.33ms (34.01G pixels/s)
    45. 49.35ms (34.00G pixels/s)
    46. 49.35ms (33.99G pixels/s)
    47. 49.36ms (33.99G pixels/s)
    48. 49.33ms (34.01G pixels/s)
    49. 49.36ms (33.99G pixels/s)
    50. 49.33ms (34.01G pixels/s)
    51. 49.37ms (33.99G pixels/s)
    52. 49.35ms (34.00G pixels/s)
    53. 49.35ms (34.00G pixels/s)
    54. 49.37ms (33.98G pixels/s)
    55. 49.36ms (33.99G pixels/s)
    56. 49.36ms (33.99G pixels/s)
    57. 49.35ms (34.00G pixels/s)
    58. 49.34ms (34.00G pixels/s)
    59. 49.33ms (34.01G pixels/s)
    60. 49.35ms (34.00G pixels/s)
    61. 49.37ms (33.99G pixels/s)
    62. 49.34ms (34.00G pixels/s)
    63. 49.37ms (33.98G pixels/s)
    64. 49.34ms (34.00G pixels/s)
    65. 49.34ms (34.00G pixels/s)
    66. 49.43ms (33.94G pixels/s)
    67. 49.34ms (34.00G pixels/s)
    68. 49.34ms (34.00G pixels/s)
    69. 49.36ms (33.99G pixels/s)
    70. 49.34ms (34.00G pixels/s)
    71. 49.38ms (33.97G pixels/s)
    72. 49.36ms (33.99G pixels/s)
    73. 49.34ms (34.00G pixels/s)
    74. 49.35ms (34.00G pixels/s)
    75. 49.39ms (33.97G pixels/s)
    76. 49.35ms (34.00G pixels/s)
    77. 49.34ms (34.00G pixels/s)
    78. 49.36ms (33.99G pixels/s)
    79. 49.37ms (33.98G pixels/s)
    80. 49.36ms (33.99G pixels/s)
    81. 49.30ms (34.03G pixels/s)
    82. 49.39ms (33.97G pixels/s)
    83. 49.33ms (34.01G pixels/s)
    84. 49.35ms (34.00G pixels/s)
    85. 49.36ms (33.99G pixels/s)
    86. 49.33ms (34.01G pixels/s)
    87. 49.36ms (33.99G pixels/s)
    88. 49.37ms (33.98G pixels/s)
    89. 49.35ms (33.99G pixels/s)
    90. 49.38ms (33.98G pixels/s)
    91. 49.36ms (33.99G pixels/s)
    92. 49.35ms (33.99G pixels/s)
    93. 49.34ms (34.00G pixels/s)
    94. 49.36ms (33.99G pixels/s)
    95. 49.33ms (34.01G pixels/s)
    96. 61.37ms (27.34G pixels/s)
    97. 49.34ms (34.00G pixels/s)
    98. 49.37ms (33.98G pixels/s)
    99. 49.37ms (33.98G pixels/s)
    100. 49.38ms (33.98G pixels/s)
    101. 49.35ms (34.00G pixels/s)
    102. 49.34ms (34.00G pixels/s)
    103. 49.34ms (34.01G pixels/s)
    104. 49.38ms (33.98G pixels/s)
    105. 49.35ms (34.00G pixels/s)
    106. 49.35ms (34.00G pixels/s)
    107. 49.34ms (34.00G pixels/s)
    108. 49.39ms (33.97G pixels/s)
    109. 49.35ms (34.00G pixels/s)
    110. 49.35ms (34.00G pixels/s)
    111. 49.36ms (33.99G pixels/s)
    112. 49.34ms (34.00G pixels/s)
    113. 49.39ms (33.97G pixels/s)
    114. 49.39ms (33.97G pixels/s)
    115. 49.44ms (33.93G pixels/s)
    116. 49.43ms (33.94G pixels/s)
    117. 49.44ms (33.93G pixels/s)
    118. 49.42ms (33.95G pixels/s)
    119. 49.45ms (33.93G pixels/s)
    120. 49.46ms (33.92G pixels/s)
    121. 49.42ms (33.95G pixels/s)
    122. 49.45ms (33.93G pixels/s)
    123. 49.46ms (33.92G pixels/s)
    124. 49.46ms (33.92G pixels/s)
    125. 49.72ms (33.75G pixels/s)
    126. 49.45ms (33.93G pixels/s)
    127. 49.42ms (33.95G pixels/s)
    128. 49.47ms (33.91G pixels/s)
     
  6. Kobata

    Joined:
    May 18, 2015
    Messages:
    3
    Likes Received:
    0
    From the test results, I'd say it's not as forced as you might think. AMD at least is obviously capable of executing the compute portion of that test parallelized, although it does appear to cause the graphics portion to be completely separated.

    Even if you're assigning the same UAV to every dispatch, AMD does have a way of implementing GL_INTEL_fragment_shader_ordering, which performs (memory-visible) serialization, so they could be running the bulk calculations in parallel and arranging for the writes to happen serially after the computations are done, since the shader never reads a value from it. You would see a genuinely serial result if the single-commandlist shaders were dependent on the results of the previous invocations.
     
  7. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    So that's the 2nd test on a 980Ti,

    1st:
    Compute: 5.67ms
    Graphics: 16.77ms

    Graphics + Compute: 21.15ms
    Graphics + Compute (Single Commandlist): 20.70ms

    And for 512th:
    Compute: 76.11ms
    Graphics: 16.77ms
    Graphics + Compute: 97.38ms
    Graphics + Compute (Single Commandlist): 2294.69ms

    ---------------------------------------

    From the 1st through the 512th, async mode's combined time is additive. Single-commandlist mode went nuts.

    Serial:
    A (Compute) + B (Graphics) = A + B

    Async:
    A (Compute) + B (Graphics) = max(A, B)

    Right? Or is that not how we're meant to interpret the data from this test?
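That interpretation can be checked against the 980Ti numbers quoted above (a sketch; "serial" and "async" here are idealized models, not anything the benchmark itself reports):

```python
# Idealized models for the combined time of two GPU workloads.
def serial_model(compute_ms, graphics_ms):
    # Workloads run back to back.
    return compute_ms + graphics_ms

def async_model(compute_ms, graphics_ms):
    # Workloads overlap fully; the longer one dominates.
    return max(compute_ms, graphics_ms)

# 1st dispatch count (numbers from this post): measured
# Graphics + Compute was 21.15 ms -- close to the serial
# model, not the async one.
print(round(serial_model(5.67, 16.77), 2))    # 22.44
print(round(async_model(5.67, 16.77), 2))     # 16.77

# 512th: measured 97.38 ms, again near the serial sum.
print(round(serial_model(76.11, 16.77), 2))   # 92.88
```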
     
  8. Phyxsyus

    Joined:
    Sep 2, 2015
    Messages:
    3
    Likes Received:
    0
    I just ran the updated benchmark... File attached...
     

    Attached Files:

  9. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    Two queues (COMPUTE and DIRECT). The graphics + compute single-commandlist test packs everything into one command list (the given number of compute dispatches plus 100 draw calls) and executes it on the DIRECT queue.

    I knew I saw that somewhere... No, it's not used at the moment.

    P.S.: Numbers in [] are GPU timestamps from the beginning of the whole run to the end of the n-th dispatch, converted to ms. Fillrate in {} is calculated from the GPU timestamps taken before the clear and after all the draws.
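For anyone wanting to recompute the {} figure: the fillrate falls out of the timestamp interval and the number of pixels drawn. A sketch follows; the pixel count (100 draws over a 4096x4096 target) is my back-derivation from the posted numbers, not something confirmed from the benchmark source:

```python
# Recompute the fillrate-in-{} figure from an elapsed-time reading.
# ASSUMPTION: 100 full-screen draws over a 4096x4096 render target,
# inferred from the posted numbers rather than the benchmark source.
PIXELS_PER_RUN = 100 * 4096 * 4096   # ~1.678e9 pixels

def fillrate_gpix_s(elapsed_ms):
    # pixels drawn / elapsed seconds, expressed in Gpixels/s
    return PIXELS_PER_RUN / (elapsed_ms / 1000.0) / 1e9

# The R9 280 graphics-only log above reads "47.23ms (35.52G pixels/s)":
print(round(fillrate_gpix_s(47.23), 2))   # 35.52
```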
     
    #289 MDolenc, Sep 2, 2015
    Last edited: Sep 2, 2015
  10. Justin Cottrell

    Joined:
    Aug 30, 2015
    Messages:
    2
    Likes Received:
    0
    Here are my Radeon 285 results with TDR on; they're different with TDR off, I'll upload those shortly.
     

    Attached Files:

  11. doob

    Regular

    Joined:
    May 21, 2005
    Messages:
    394
    Likes Received:
    5
    Log from a 7950 (oc'd @ 1ghz with latest driver 15.8 beta)
     

    Attached Files:

    #291 doob, Sep 2, 2015
    Last edited: Sep 2, 2015
  12. Justin Cottrell

    Joined:
    Aug 30, 2015
    Messages:
    2
    Likes Received:
    0
    Scratch that, they aren't different with TDR off any more; not sure what caused it.
     
  13. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    Where is this data from, the Titan X SLI user?

    What's going on here? It seems to be splitting the workload across the two GPUs, whereas the single 980Ti config shows the same behavior as before: additive, the sum of compute + graphics.
     
  14. Kwee

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    17
    Likes Received:
    4
    Got strange results with the last version
     

    Attached Files:

  15. TinMan710

    Joined:
    Sep 2, 2015
    Messages:
    1
    Likes Received:
    0
    Here are my results from my single 980 Ti
     

    Attached Files:

  16. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    Compute only:
    1. 6.79ms

    Graphics only: 16.21ms

    Graphics + compute:
    1. 20.22ms

    Graphics, compute single commandlist:
    1. 20.04ms

    Your result is identical to the others. Running Graphics + Compute gives an additive result, close to the sum of compute + graphics.

    Also, your single-commandlist (forced) results show the ever-rising timings we've seen from the others, up to the 281st dispatch at 2117.00ms!

    Is this what Oxide is talking about? When they tried to force async mode directly, it would mess up.
     
  17. Nub

    Nub
    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    10
    Likes Received:
    18
    Hmm, I've added the data into the tooltip and plotted it on the chart as a single dot. It should be apparent just by looking at it.

    I've yet to add a label to the y-axis though :T

    [chart image]
     
    Razor1 likes this.
  18. Nub

    Nub
    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    10
    Likes Received:
    18
  19. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    So Titan X SLI can run compute + graphics async faster than compute + graphics serial in this app, but a single GPU (both 980Ti results so far) cannot.

    So it seems it's offloading compute to GPU 1 and graphics to GPU 2?
     
  20. Nobu

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    21
    Likes Received:
    1
    Maybe I'm reading the graphs wrong... are the GCN cards doing better with a single commandlist?
     