DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. dogen

    Newcomer

    Joined:
    Oct 27, 2014
    Messages:
    218
    Likes Received:
    143
    It doesn't. It jumps after every 16.
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,918
    Likes Received:
    5,218
    Location:
    Helsinki, Finland
    Slide 22 of this presentation (http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf) lists many GCN-specific things that would be hard to port to PC DirectX. And if people start to do crazy stuff such as writing to the GPU command queue from a compute shader (to spawn tasks from the GPU), then porting to PC becomes almost impossible.

    You could also write PS3 SPU rendering code that was very hard to port to PC. And we still got acceptable PC ports. However, this time both consoles support the same low-level GPU tricks, making PC the most different platform (last gen, the PS3 was the most different). It is interesting to see where this leads, and how fast the PC APIs will start to expose the remaining missing console GPU features.
     
    Razor1, BRiT, pharma and 2 others like this.
  3. Kodiack

    Joined:
    Sep 1, 2015
    Messages:
    1
    Likes Received:
    0
    GTX 980 Ti (ForceWare 355.82)

    Compute only:
    1. 11.53ms
    2. 11.01ms
    3. 10.31ms
    4. 10.43ms
    5. 10.40ms
    6. 9.50ms
    7. 9.28ms
    8. 9.29ms
    9. 9.32ms
    10. 9.33ms
    11. 9.12ms
    12. 9.10ms
    13. 9.15ms
    14. 9.17ms
    15. 9.16ms
    16. 9.18ms
    17. 9.17ms
    18. 9.15ms
    19. 9.12ms
    20. 9.16ms
    21. 9.16ms
    22. 9.16ms
    23. 9.19ms
    24. 9.18ms
    25. 9.21ms
    26. 9.15ms
    27. 9.13ms
    28. 9.22ms
    29. 9.18ms
    30. 9.21ms
    31. 9.14ms
    32. 18.02ms
    33. 18.05ms
    34. 22.44ms
    35. 20.28ms
    36. 20.24ms
    37. 20.20ms
    38. 18.05ms
    39. 20.27ms
    40. 20.21ms
    41. 20.24ms
    42. 18.05ms
    43. 20.23ms
    44. 20.26ms
    45. 22.47ms
    46. 20.22ms
    47. 20.25ms
    48. 18.05ms
    49. 20.26ms
    50. 20.27ms
    51. 20.23ms
    52. 18.02ms
    53. 20.23ms
    54. 20.25ms
    55. 20.22ms
    56. 18.05ms
    57. 20.24ms
    58. 20.26ms
    59. 20.25ms
    60. 18.08ms
    61. 20.24ms
    62. 22.48ms
    63. 18.07ms
    64. 28.96ms
    65. 28.98ms
    66. 31.31ms
    67. 29.21ms
    68. 26.98ms
    69. 26.97ms
    70. 26.98ms
    71. 31.40ms
    72. 26.97ms
    73. 26.96ms
    74. 29.16ms
    75. 31.41ms
    76. 27.03ms
    77. 27.01ms
    78. 29.14ms
    79. 29.22ms
    80. 31.39ms
    81. 27.01ms
    82. 26.98ms
    83. 29.15ms
    84. 31.39ms
    85. 26.95ms
    86. 26.98ms
    87. 27.01ms
    88. 27.06ms
    89. 29.18ms
    90. 26.95ms
    91. 29.18ms
    92. 27.07ms
    93. 27.02ms
    94. 31.39ms
    95. 27.00ms
    96. 40.13ms
    97. 37.90ms
    98. 35.84ms
    99. 38.15ms
    100. 38.12ms
    101. 35.91ms
    102. 35.91ms
    103. 40.29ms
    104. 35.89ms
    105. 38.09ms
    106. 38.16ms
    107. 38.08ms
    108. 35.89ms
    109. 35.98ms
    110. 35.90ms
    111. 35.90ms
    112. 40.36ms
    113. 40.41ms
    114. 40.35ms
    115. 36.14ms
    116. 40.28ms
    117. 35.89ms
    118. 38.10ms
    119. 35.94ms
    120. 38.14ms
    121. 38.09ms
    122. 35.96ms
    123. 35.91ms
    124. 35.93ms
    125. 36.03ms
    126. 35.90ms
    127. 35.93ms
    128. 44.67ms
    Graphics only: 16.40ms (102.31G pixels/s)
    Graphics + compute:
    1. 25.02ms (67.05G pixels/s)
    2. 24.96ms (67.23G pixels/s)
    3. 25.05ms (66.99G pixels/s)
    4. 25.25ms (66.44G pixels/s)
    5. 25.19ms (66.60G pixels/s)
    6. 25.10ms (66.83G pixels/s)
    7. 25.10ms (66.83G pixels/s)
    8. 25.17ms (66.65G pixels/s)
    9. 25.19ms (66.61G pixels/s)
    10. 25.19ms (66.60G pixels/s)
    11. 25.22ms (66.52G pixels/s)
    12. 25.20ms (66.56G pixels/s)
    13. 25.19ms (66.60G pixels/s)
    14. 25.21ms (66.56G pixels/s)
    15. 25.16ms (66.68G pixels/s)
    16. 25.15ms (66.70G pixels/s)
    17. 25.18ms (66.63G pixels/s)
    18. 25.22ms (66.51G pixels/s)
    19. 25.15ms (66.70G pixels/s)
    20. 25.20ms (66.57G pixels/s)
    21. 25.16ms (66.69G pixels/s)
    22. 25.15ms (66.71G pixels/s)
    23. 25.12ms (66.78G pixels/s)
    24. 25.13ms (66.76G pixels/s)
    25. 25.14ms (66.74G pixels/s)
    26. 25.20ms (66.58G pixels/s)
    27. 25.12ms (66.80G pixels/s)
    28. 25.13ms (66.76G pixels/s)
    29. 25.15ms (66.72G pixels/s)
    30. 25.16ms (66.68G pixels/s)
    31. 25.16ms (66.68G pixels/s)
    32. 34.02ms (49.32G pixels/s)
    33. 33.94ms (49.43G pixels/s)
    34. 34.10ms (49.20G pixels/s)
    35. 36.37ms (46.13G pixels/s)
    36. 34.15ms (49.12G pixels/s)
    37. 34.10ms (49.19G pixels/s)
    38. 34.14ms (49.14G pixels/s)
    39. 34.13ms (49.16G pixels/s)
    40. 34.12ms (49.17G pixels/s)
    41. 34.13ms (49.16G pixels/s)
    42. 34.17ms (49.10G pixels/s)
    43. 34.10ms (49.20G pixels/s)
    44. 34.10ms (49.19G pixels/s)
    45. 34.05ms (49.27G pixels/s)
    46. 36.30ms (46.22G pixels/s)
    47. 34.06ms (49.25G pixels/s)
    48. 34.10ms (49.21G pixels/s)
    49. 36.40ms (46.09G pixels/s)
    50. 34.12ms (49.17G pixels/s)
    51. 34.20ms (49.06G pixels/s)
    52. 34.13ms (49.16G pixels/s)
    53. 36.33ms (46.17G pixels/s)
    54. 34.08ms (49.23G pixels/s)
    55. 34.12ms (49.17G pixels/s)
    56. 34.14ms (49.14G pixels/s)
    57. 36.38ms (46.12G pixels/s)
    58. 33.98ms (49.37G pixels/s)
    59. 36.34ms (46.17G pixels/s)
    60. 34.04ms (49.29G pixels/s)
    61. 34.12ms (49.17G pixels/s)
    62. 34.22ms (49.03G pixels/s)
    63. 38.47ms (43.61G pixels/s)
    64. 44.98ms (37.30G pixels/s)
    65. 42.83ms (39.17G pixels/s)
    66. 42.94ms (39.07G pixels/s)
    67. 43.04ms (38.98G pixels/s)
    68. 43.04ms (38.98G pixels/s)
    69. 43.00ms (39.01G pixels/s)
    70. 42.98ms (39.04G pixels/s)
    71. 45.46ms (36.91G pixels/s)
    72. 47.66ms (35.20G pixels/s)
    73. 49.83ms (33.67G pixels/s)
    74. 43.31ms (38.73G pixels/s)
    75. 43.13ms (38.90G pixels/s)
    76. 45.22ms (37.10G pixels/s)
    77. 43.10ms (38.93G pixels/s)
    78. 43.00ms (39.02G pixels/s)
    79. 45.36ms (36.99G pixels/s)
    80. 49.69ms (33.76G pixels/s)
    81. 49.77ms (33.71G pixels/s)
    82. 43.09ms (38.93G pixels/s)
    83. 43.00ms (39.01G pixels/s)
    84. 45.43ms (36.93G pixels/s)
    85. 43.03ms (38.99G pixels/s)
    86. 43.03ms (38.99G pixels/s)
    87. 47.60ms (35.25G pixels/s)
    88. 42.97ms (39.04G pixels/s)
    89. 43.03ms (38.99G pixels/s)
    90. 45.36ms (36.99G pixels/s)
    91. 43.06ms (38.96G pixels/s)
    92. 43.12ms (38.91G pixels/s)
    93. 43.09ms (38.93G pixels/s)
    94. 45.32ms (37.02G pixels/s)
    95. 45.29ms (37.04G pixels/s)
    96. 53.97ms (31.08G pixels/s)
    97. 51.91ms (32.32G pixels/s)
    98. 51.88ms (32.34G pixels/s)
    99. 54.15ms (30.98G pixels/s)
    100. 54.24ms (30.93G pixels/s)
    101. 54.20ms (30.95G pixels/s)
    102. 52.13ms (32.19G pixels/s)
    103. 52.01ms (32.26G pixels/s)
    104. 54.23ms (30.94G pixels/s)
    105. 52.02ms (32.25G pixels/s)
    106. 51.98ms (32.28G pixels/s)
    107. 52.00ms (32.26G pixels/s)
    108. 52.00ms (32.27G pixels/s)
    109. 54.20ms (30.95G pixels/s)
    110. 52.04ms (32.24G pixels/s)
    111. 52.15ms (32.17G pixels/s)
    112. 51.97ms (32.28G pixels/s)
    113. 52.01ms (32.26G pixels/s)
    114. 51.97ms (32.29G pixels/s)
    115. 52.04ms (32.24G pixels/s)
    116. 52.15ms (32.17G pixels/s)
    117. 51.93ms (32.30G pixels/s)
    118. 52.07ms (32.22G pixels/s)
    119. 51.96ms (32.29G pixels/s)
    120. 54.22ms (30.94G pixels/s)
    121. 52.02ms (32.25G pixels/s)
    122. 51.98ms (32.28G pixels/s)
    123. 52.12ms (32.19G pixels/s)
    124. 51.95ms (32.29G pixels/s)
    125. 52.04ms (32.24G pixels/s)
    126. 51.96ms (32.29G pixels/s)
    127. 54.30ms (30.90G pixels/s)
    128. 60.72ms (27.63G pixels/s)
     
  4. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,415
    Likes Received:
    287
    Location:
    Varna, Bulgaria
    Not sure if AMD is preemptively downplaying the DX12 feature levels. :???:
     
    Razor1, pharma and Nemo like this.
  5. Alessio1989

    Regular Newcomer

    Joined:
    Jun 6, 2015
    Messages:
    540
    Likes Received:
    255
    Well, full support would also mean tiled-deferred support, which honestly I have never seen outside mobile and the Dreamcast @_@
     
  6. Forceman

    Newcomer

    Joined:
    Dec 23, 2010
    Messages:
    11
    Likes Received:
    10
    Yeah, I saw that after I looked closer. Shouldn't it also be 32 though, since it is supposed to be able to do 32 compute (as opposed to 1+31)?
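
    The "32 versus 1+31" question can be checked against the numbers already posted. Below is a minimal sketch, using the first 34 "Compute only" timings from the GTX 980 Ti run earlier in this thread, that finds the kernel count at which the time steps up. The function name and the 1.5x threshold are hypothetical choices for illustration, not part of the test tool.

```python
# First 34 "Compute only" timings (ms) from the GTX 980 Ti run posted in
# this thread, for kernel counts 1..34.
TIMES_MS = [
    11.53, 11.01, 10.31, 10.43, 10.40, 9.50, 9.28, 9.29, 9.32, 9.33,
    9.12, 9.10, 9.15, 9.17, 9.16, 9.18, 9.17, 9.15, 9.12, 9.16,
    9.16, 9.16, 9.19, 9.18, 9.21, 9.15, 9.13, 9.22, 9.18, 9.21,
    9.14, 18.02, 18.05, 22.44,
]


def find_jumps(times_ms, min_ratio=1.5):
    """Return 1-based kernel counts where the time is at least min_ratio
    times the previous entry. min_ratio is an assumed threshold."""
    jumps = []
    for i in range(1, len(times_ms)):
        if times_ms[i] >= min_ratio * times_ms[i - 1]:
            jumps.append(i + 1)  # list index i holds kernel count i + 1
    return jumps


if __name__ == "__main__":
    print(find_jumps(TIMES_MS))
```

    On this one run the first plateau covers 31 kernels and the step arrives with the 32nd. That is a single data set, so it says nothing definitive about why the slot seems to be missing.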
     
  7. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,415
    Likes Received:
    287
    Location:
    Varna, Bulgaria
    A test run on a GK110 board would give us a bit more clarity.
     
  8. drSeehas

    Joined:
    Sep 1, 2015
    Messages:
    5
    Likes Received:
    2
    GeForce GT 730 (GK208) Forceware 353.62
    Compute only:
    1. 21.50ms
    2. 21.46ms
    3. 21.45ms
    4. 21.47ms
    5. 21.46ms
    6. 21.55ms
    7. 42.63ms
    8. 42.66ms
    9. 42.78ms
    10. 42.72ms
    11. 42.71ms
    12. 42.77ms
    13. 63.86ms
    14. 63.86ms
    15. 63.86ms
    16. 63.91ms
    17. 63.93ms
    18. 63.87ms
    19. 85.03ms
    20. 85.10ms
    21. 85.06ms
    22. 85.07ms
    23. 85.08ms
    24. 85.16ms
    25. 106.70ms
    26. 106.88ms
    27. 106.38ms
    28. 106.67ms
    29. 106.93ms
    30. 106.48ms
    31. 127.43ms
    32. 148.42ms
    33. 164.57ms
    34. 164.99ms
    35. 153.84ms
    36. 153.89ms
    37. 148.41ms
    38. 169.69ms
    39. 174.89ms
    40. 169.53ms
    41. 174.85ms
    42. 174.89ms
    43. 174.96ms
    44. 190.76ms
    45. 190.64ms
    46. 196.04ms
    47. 196.02ms
    48. 192.24ms
    49. 201.42ms
    50. 211.86ms
    51. 217.15ms
    52. 217.24ms
    53. 217.16ms
    54. 217.28ms
    55. 217.17ms
    56. 233.23ms
    57. 254.90ms
    58. 238.60ms
    59. 238.31ms
    60. 238.26ms
    61. 250.63ms
    62. 254.64ms
    63. 270.24ms
    64. 286.86ms
    65. 280.66ms
    66. 280.79ms
    67. 275.22ms
    68. 280.57ms
    69. 280.54ms
    70. 301.76ms
    71. 301.84ms
    72. 301.70ms
    73. 301.77ms
    74. 312.61ms
    75. 301.74ms
    76. 322.90ms
    77. 322.87ms
    78. 322.86ms
    79. 322.88ms
    80. 328.20ms
    81. 351.39ms
    82. 354.82ms
    83. 371.05ms
    84. 355.59ms
    85. 371.04ms
    86. 354.73ms
    87. 362.46ms
    88. 375.93ms
    89. 378.02ms
    90. 387.64ms
    91. 393.14ms
    92. 376.66ms
    93. 370.47ms
    94. 397.34ms
    95. 397.43ms
    96. 412.72ms
    97. 419.23ms
    98. 419.82ms
    99. 419.05ms
    100. 435.07ms
    101. 410.72ms
    102. 433.87ms
    103. 428.58ms
    104. 433.91ms
    105. 428.59ms
    106. 428.55ms
    107. 433.89ms
    108. 455.07ms
    109. 449.74ms
    110. 474.66ms
    111. 466.78ms
    112. 466.16ms
    113. 455.24ms
    114. 481.76ms
    115. 487.04ms
    116. 481.76ms
    117. 481.89ms
    118. 486.94ms
    119. 481.60ms
    120. 508.05ms
    121. 502.92ms
    122. 502.77ms
    123. 508.17ms
    124. 502.82ms
    125. 508.19ms
    126. 524.03ms
    127. 521.02ms
    128. 550.32ms
    Graphics only: 240.26ms (6.98G pixels/s)
    Graphics + compute:
    1. 261.56ms (6.41G pixels/s)
    2. 261.33ms (6.42G pixels/s)
    3. 261.30ms (6.42G pixels/s)
    4. 261.33ms (6.42G pixels/s)
    5. 261.31ms (6.42G pixels/s)
    6. 261.82ms (6.41G pixels/s)
    7. 282.36ms (5.94G pixels/s)
    8. 282.44ms (5.94G pixels/s)
    9. 282.35ms (5.94G pixels/s)
    10. 282.57ms (5.94G pixels/s)
    11. 282.53ms (5.94G pixels/s)
    12. 282.42ms (5.94G pixels/s)
    13. 303.52ms (5.53G pixels/s)
    14. 303.53ms (5.53G pixels/s)
    15. 303.50ms (5.53G pixels/s)
    16. 303.56ms (5.53G pixels/s)
    17. 303.49ms (5.53G pixels/s)
    18. 303.60ms (5.53G pixels/s)
    19. 325.76ms (5.15G pixels/s)
    20. 324.78ms (5.17G pixels/s)
    21. 324.75ms (5.17G pixels/s)
    22. 324.66ms (5.17G pixels/s)
    23. 324.83ms (5.16G pixels/s)
    24. 324.73ms (5.17G pixels/s)
    25. 345.95ms (4.85G pixels/s)
    26. 345.96ms (4.85G pixels/s)
    27. 345.87ms (4.85G pixels/s)
    28. 345.81ms (4.85G pixels/s)
    29. 345.91ms (4.85G pixels/s)
    30. 346.03ms (4.85G pixels/s)
    31. 367.08ms (4.57G pixels/s)
    32. 388.30ms (4.32G pixels/s)
    33. 398.85ms (4.21G pixels/s)
    34. 398.80ms (4.21G pixels/s)
    35. 404.17ms (4.15G pixels/s)
    36. 398.84ms (4.21G pixels/s)
    37. 398.85ms (4.21G pixels/s)
    38. 409.41ms (4.10G pixels/s)
    39. 419.93ms (4.00G pixels/s)
    40. 414.65ms (4.05G pixels/s)
    41. 419.95ms (4.00G pixels/s)
    42. 419.97ms (3.99G pixels/s)
    43. 425.17ms (3.95G pixels/s)
    44. 430.60ms (3.90G pixels/s)
    45. 435.86ms (3.85G pixels/s)
    46. 446.44ms (3.76G pixels/s)
    47. 446.50ms (3.76G pixels/s)
    48. 435.88ms (3.85G pixels/s)
    49. 446.43ms (3.76G pixels/s)
    50. 451.93ms (3.71G pixels/s)
    51. 462.20ms (3.63G pixels/s)
    52. 462.28ms (3.63G pixels/s)
    53. 462.23ms (3.63G pixels/s)
    54. 457.06ms (3.67G pixels/s)
    55. 467.50ms (3.59G pixels/s)
    56. 472.97ms (3.55G pixels/s)
    57. 488.73ms (3.43G pixels/s)
    58. 478.13ms (3.51G pixels/s)
    59. 478.27ms (3.51G pixels/s)
    60. 488.91ms (3.43G pixels/s)
    61. 483.57ms (3.47G pixels/s)
    62. 493.98ms (3.40G pixels/s)
    63. 510.01ms (3.29G pixels/s)
    64. 536.43ms (3.13G pixels/s)
    65. 525.76ms (3.19G pixels/s)
    66. 531.28ms (3.16G pixels/s)
    67. 531.14ms (3.16G pixels/s)
    68. 531.11ms (3.16G pixels/s)
    69. 530.86ms (3.16G pixels/s)
    70. 541.71ms (3.10G pixels/s)
    71. 536.40ms (3.13G pixels/s)
    72. 546.96ms (3.07G pixels/s)
    73. 541.68ms (3.10G pixels/s)
    74. 536.38ms (3.13G pixels/s)
    75. 541.67ms (3.10G pixels/s)
    76. 578.68ms (2.90G pixels/s)
    77. 568.07ms (2.95G pixels/s)
    78. 557.46ms (3.01G pixels/s)
    79. 568.07ms (2.95G pixels/s)
    80. 573.27ms (2.93G pixels/s)
    81. 562.79ms (2.98G pixels/s)
    82. 589.27ms (2.85G pixels/s)
    83. 589.21ms (2.85G pixels/s)
    84. 589.29ms (2.85G pixels/s)
    85. 594.57ms (2.82G pixels/s)
    86. 584.10ms (2.87G pixels/s)
    87. 589.66ms (2.85G pixels/s)
    88. 600.39ms (2.79G pixels/s)
    89. 610.24ms (2.75G pixels/s)
    90. 610.28ms (2.75G pixels/s)
    91. 605.01ms (2.77G pixels/s)
    92. 615.60ms (2.73G pixels/s)
    93. 600.25ms (2.80G pixels/s)
    94. 631.56ms (2.66G pixels/s)
    95. 636.96ms (2.63G pixels/s)
    96. 647.93ms (2.59G pixels/s)
    97. 658.03ms (2.55G pixels/s)
    98. 657.94ms (2.55G pixels/s)
    99. 652.72ms (2.57G pixels/s)
    100. 647.38ms (2.59G pixels/s)
    101. 647.56ms (2.59G pixels/s)
    102. 668.41ms (2.51G pixels/s)
    103. 673.82ms (2.49G pixels/s)
    104. 668.60ms (2.51G pixels/s)
    105. 673.90ms (2.49G pixels/s)
    106. 675.27ms (2.48G pixels/s)
    107. 706.66ms (2.37G pixels/s)
    108. 708.39ms (2.37G pixels/s)
    109. 701.54ms (2.39G pixels/s)
    110. 695.48ms (2.41G pixels/s)
    111. 694.96ms (2.41G pixels/s)
    112. 695.47ms (2.41G pixels/s)
    113. 701.90ms (2.39G pixels/s)
    114. 720.94ms (2.33G pixels/s)
    115. 722.85ms (2.32G pixels/s)
    116. 716.19ms (2.34G pixels/s)
    117. 716.11ms (2.34G pixels/s)
    118. 713.24ms (2.35G pixels/s)
    119. 721.78ms (2.32G pixels/s)
    120. 744.78ms (2.25G pixels/s)
    121. 739.88ms (2.27G pixels/s)
    122. 752.32ms (2.23G pixels/s)
    123. 753.69ms (2.23G pixels/s)
    124. 744.38ms (2.25G pixels/s)
    125. 735.86ms (2.28G pixels/s)
    126. 780.35ms (2.15G pixels/s)
    127. 770.58ms (2.18G pixels/s)
    128. 780.66ms (2.15G pixels/s)
     
  9. vedivis

    Joined:
    Sep 1, 2015
    Messages:
    1
    Likes Received:
    0
    GeForce GTX 780 Ti (GK110)
    Compute only:
    1. 17.09ms
    2. 17.32ms
    3. 16.75ms
    4. 16.52ms
    5. 16.74ms
    6. 16.48ms
    7. 16.51ms
    8. 16.50ms
    9. 16.57ms
    10. 16.57ms
    11. 16.71ms
    12. 16.50ms
    13. 16.58ms
    14. 16.56ms
    15. 16.51ms
    16. 16.52ms
    17. 16.51ms
    18. 16.52ms
    19. 16.52ms
    20. 16.50ms
    21. 16.54ms
    22. 16.52ms
    23. 16.56ms
    24. 16.52ms
    25. 16.55ms
    26. 16.50ms
    27. 16.54ms
    28. 16.56ms
    29. 16.53ms
    30. 16.51ms
    31. 32.30ms
    32. 48.64ms
    33. 52.77ms
    34. 52.78ms
    35. 52.74ms
    36. 52.70ms
    37. 52.72ms
    38. 52.69ms
    39. 52.70ms
    40. 52.78ms
    41. 52.72ms
    42. 52.72ms
    43. 52.74ms
    44. 52.72ms
    45. 52.72ms
    46. 52.73ms
    47. 52.75ms
    48. 56.85ms
    49. 56.88ms
    50. 56.87ms
    51. 56.84ms
    52. 56.85ms
    53. 52.78ms
    54. 52.95ms
    55. 52.72ms
    56. 56.90ms
    57. 56.83ms
    58. 56.82ms
    59. 56.83ms
    60. 52.74ms
    61. 52.74ms
    62. 64.73ms
    63. 72.64ms
    64. 84.86ms
    65. 88.94ms
    66. 88.95ms
    67. 84.84ms
    68. 88.90ms
    69. 84.85ms
    70. 88.94ms
    71. 84.86ms
    72. 88.94ms
    73. 84.84ms
    74. 88.96ms
    75. 88.94ms
    76. 88.96ms
    77. 88.94ms
    78. 84.84ms
    79. 88.93ms
    80. 84.85ms
    81. 88.93ms
    82. 88.98ms
    83. 88.94ms
    84. 88.94ms
    85. 88.95ms
    86. 89.07ms
    87. 84.85ms
    88. 88.92ms
    89. 84.87ms
    90. 88.93ms
    91. 88.95ms
    92. 88.94ms
    93. 88.92ms
    94. 104.72ms
    95. 100.64ms
    96. 121.08ms
    97. 121.03ms
    98. 116.98ms
    99. 121.05ms
    100. 121.04ms
    101. 121.26ms
    102. 116.95ms
    103. 121.04ms
    104. 121.03ms
    105. 121.09ms
    106. 116.96ms
    107. 121.04ms
    108. 121.03ms
    109. 121.03ms
    110. 116.96ms
    111. 121.03ms
    112. 116.91ms
    113. 121.02ms
    114. 116.94ms
    115. 121.04ms
    116. 121.03ms
    117. 121.04ms
    118. 116.95ms
    119. 121.04ms
    120. 121.04ms
    121. 121.02ms
    122. 116.94ms
    123. 121.05ms
    124. 121.06ms
    125. 116.93ms
    126. 132.75ms
    127. 132.73ms
    128. 153.13ms
    Graphics only: 37.21ms (45.09G pixels/s)
    Graphics + compute:
    1. 53.49ms (31.36G pixels/s)
    2. 53.42ms (31.40G pixels/s)
    3. 53.43ms (31.40G pixels/s)
    4. 53.52ms (31.35G pixels/s)
    5. 53.54ms (31.34G pixels/s)
    6. 53.51ms (31.35G pixels/s)
    7. 53.51ms (31.35G pixels/s)
    8. 53.47ms (31.38G pixels/s)
    9. 53.46ms (31.38G pixels/s)
    10. 53.49ms (31.36G pixels/s)
    11. 53.50ms (31.36G pixels/s)
    12. 53.50ms (31.36G pixels/s)
    13. 53.51ms (31.35G pixels/s)
    14. 53.47ms (31.38G pixels/s)
    15. 53.45ms (31.39G pixels/s)
    16. 53.46ms (31.38G pixels/s)
    17. 53.49ms (31.37G pixels/s)
    18. 53.54ms (31.34G pixels/s)
    19. 53.58ms (31.31G pixels/s)
    20. 53.56ms (31.32G pixels/s)
    21. 53.51ms (31.35G pixels/s)
    22. 53.51ms (31.35G pixels/s)
    23. 53.49ms (31.36G pixels/s)
    24. 53.48ms (31.37G pixels/s)
    25. 53.46ms (31.38G pixels/s)
    26. 53.43ms (31.40G pixels/s)
    27. 53.43ms (31.40G pixels/s)
    28. 53.48ms (31.37G pixels/s)
    29. 53.52ms (31.35G pixels/s)
    30. 53.50ms (31.36G pixels/s)
    31. 69.27ms (24.22G pixels/s)
    32. 85.57ms (19.61G pixels/s)
    33. 85.56ms (19.61G pixels/s)
    34. 89.60ms (18.73G pixels/s)
    35. 89.65ms (18.71G pixels/s)
    36. 89.66ms (18.71G pixels/s)
    37. 85.58ms (19.60G pixels/s)
    38. 85.55ms (19.61G pixels/s)
    39. 89.61ms (18.72G pixels/s)
    40. 89.65ms (18.71G pixels/s)
    41. 89.66ms (18.71G pixels/s)
    42. 85.56ms (19.61G pixels/s)
    43. 89.61ms (18.72G pixels/s)
    44. 89.65ms (18.71G pixels/s)
    45. 89.66ms (18.71G pixels/s)
    46. 85.62ms (19.60G pixels/s)
    47. 85.61ms (19.60G pixels/s)
    48. 89.60ms (18.73G pixels/s)
    49. 89.65ms (18.71G pixels/s)
    50. 89.65ms (18.71G pixels/s)
    51. 89.66ms (18.71G pixels/s)
    52. 85.59ms (19.60G pixels/s)
    53. 89.63ms (18.72G pixels/s)
    54. 89.68ms (18.71G pixels/s)
    55. 89.70ms (18.70G pixels/s)
    56. 85.59ms (19.60G pixels/s)
    57. 85.62ms (19.59G pixels/s)
    58. 89.63ms (18.72G pixels/s)
    59. 89.68ms (18.71G pixels/s)
    60. 89.68ms (18.71G pixels/s)
    61. 89.68ms (18.71G pixels/s)
    62. 101.37ms (16.55G pixels/s)
    63. 105.45ms (15.91G pixels/s)
    64. 121.75ms (13.78G pixels/s)
    65. 121.77ms (13.78G pixels/s)
    66. 121.74ms (13.78G pixels/s)
    67. 121.78ms (13.78G pixels/s)
    68. 121.79ms (13.78G pixels/s)
    69. 121.75ms (13.78G pixels/s)
    70. 117.64ms (14.26G pixels/s)
    71. 121.71ms (13.79G pixels/s)
    72. 117.65ms (14.26G pixels/s)
    73. 121.69ms (13.79G pixels/s)
    74. 117.66ms (14.26G pixels/s)
    75. 121.68ms (13.79G pixels/s)
    76. 126.01ms (13.31G pixels/s)
    77. 121.79ms (13.78G pixels/s)
    78. 121.75ms (13.78G pixels/s)
    79. 121.70ms (13.79G pixels/s)
    80. 117.63ms (14.26G pixels/s)
    81. 121.76ms (13.78G pixels/s)
    82. 121.72ms (13.78G pixels/s)
    83. 121.75ms (13.78G pixels/s)
    84. 117.65ms (14.26G pixels/s)
    85. 125.84ms (13.33G pixels/s)
    86. 121.74ms (13.78G pixels/s)
    87. 125.77ms (13.34G pixels/s)
    88. 121.74ms (13.78G pixels/s)
    89. 125.81ms (13.34G pixels/s)
    90. 121.71ms (13.78G pixels/s)
    91. 121.69ms (13.79G pixels/s)
    92. 117.64ms (14.26G pixels/s)
    93. 121.73ms (13.78G pixels/s)
    94. 133.40ms (12.58G pixels/s)
    95. 137.53ms (12.20G pixels/s)
    96. 153.84ms (10.91G pixels/s)
    97. 153.75ms (10.91G pixels/s)
    98. 149.74ms (11.20G pixels/s)
    99. 153.80ms (10.91G pixels/s)
    100. 153.83ms (10.91G pixels/s)
    101. 157.93ms (10.62G pixels/s)
    102. 149.73ms (11.20G pixels/s)
    103. 149.78ms (11.20G pixels/s)
    104. 153.88ms (10.90G pixels/s)
    105. 153.87ms (10.90G pixels/s)
    106. 149.73ms (11.20G pixels/s)
    107. 153.89ms (10.90G pixels/s)
    108. 153.88ms (10.90G pixels/s)
    109. 157.99ms (10.62G pixels/s)
    110. 149.84ms (11.20G pixels/s)
    111. 149.77ms (11.20G pixels/s)
    112. 153.81ms (10.91G pixels/s)
    113. 153.96ms (10.90G pixels/s)
    114. 153.77ms (10.91G pixels/s)
    115. 149.77ms (11.20G pixels/s)
    116. 153.86ms (10.90G pixels/s)
    117. 153.82ms (10.91G pixels/s)
    118. 153.86ms (10.90G pixels/s)
    119. 153.76ms (10.91G pixels/s)
    120. 149.75ms (11.20G pixels/s)
    121. 153.86ms (10.90G pixels/s)
    122. 153.84ms (10.91G pixels/s)
    123. 153.77ms (10.91G pixels/s)
    124. 149.80ms (11.20G pixels/s)
    125. 153.82ms (10.91G pixels/s)
    126. 169.65ms (9.89G pixels/s)
    127. 169.72ms (9.89G pixels/s)
    128. 185.96ms (9.02G pixels/s)
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    8,414
    Likes Received:
    3,059
    The plot thickens:

    Thoughts?




    Is the EDRAM micromanaged? Honest question. I thought DX11.x handled that automatically, like Intel's IGPs using L3 and L4.
    As for the HSA-centric code, perhaps they can point many of those tasks towards the CPU anyways? I imagine most gaming PCs will have CPUs that are >3x faster than the 8x 1.6-1.75GHz Jaguars.


    Although it backfired really hard because of its even-worse-than-expected performance on Kepler cards, Arkham Knight is nVidia's dream come true.
    If only gamers would suck it up, buy the game and not complain like the entitled little whining brats that they are...

    Your selective quoting abilities are great. Let me try it too.
    Same link: https://docs.unrealengine.com/lates...ing/ShaderDevelopment/AsyncCompute/index.html

     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    745
    Location:
    NY, NY

    pointless, tier 2 binding isn't even being used fully and probably won't be for another few years.

    The bolded part is interesting, but without them profiling their shaders and showing that to their respective community, it's very hard to even say what is going on on Maxwell, so what is their point? The only other way around that is to see other DX12 async code at work.


    has nothing to do with this discussion

    yeah so, that's why I linked the entire page? your point? Outside of useless banter?
     
    #211 Razor1, Sep 1, 2015
    Last edited: Sep 1, 2015
    pharma likes this.
  12. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    8,414
    Likes Received:
    3,059
    Here's a reminder that back in 2013, Mark Cerny presented Async almost as the "holy grail" for this generation of consoles, and that we should be seeing titles taking advantage of that in.. 2016:


    I just find it curious that you chose to quote the only sentence in the whole damn page that seems to demean the use of Async compute, that's all.
    Very curious.
     
  13. Andrew

    Newcomer

    Joined:
    Jul 26, 2002
    Messages:
    58
    Likes Received:
    5
    Here are the results on my 780ti with 355.82

    Compute only:
    1. 20.27ms
    2. 19.38ms
    3. 18.32ms
    4. 17.98ms
    5. 16.55ms
    6. 16.56ms
    7. 15.88ms
    8. 15.55ms
    9. 15.53ms
    10. 15.52ms
    11. 15.52ms
    12. 15.58ms
    13. 15.56ms
    14. 15.51ms
    15. 15.56ms
    16. 15.54ms
    17. 15.53ms
    18. 15.53ms
    19. 15.53ms
    20. 15.52ms
    21. 15.54ms
    22. 15.53ms
    23. 15.59ms
    24. 15.54ms
    25. 15.53ms
    26. 15.56ms
    27. 15.51ms
    28. 15.59ms
    29. 15.54ms
    30. 15.55ms
    31. 31.01ms
    32. 46.38ms
    33. 50.23ms
    34. 54.15ms
    35. 58.11ms
    36. 54.17ms
    37. 50.24ms
    38. 54.16ms
    39. 54.13ms
    40. 50.36ms
    41. 50.27ms
    42. 54.12ms
    43. 58.04ms
    44. 50.24ms
    45. 58.08ms
    46. 54.13ms
    47. 50.24ms
    48. 50.27ms
    49. 54.24ms
    50. 54.18ms
    51. 50.24ms
    52. 50.25ms
    53. 54.15ms
    54. 58.09ms
    55. 54.17ms
    56. 50.24ms
    57. 58.06ms
    58. 50.25ms
    59. 50.31ms
    60. 54.14ms
    61. 50.24ms
    62. 61.87ms
    63. 69.63ms
    64. 81.12ms
    65. 85.01ms
    66. 88.94ms
    67. 81.14ms
    68. 92.77ms
    69. 85.05ms
    70. 81.12ms
    71. 88.88ms
    72. 85.07ms
    73. 84.99ms
    74. 85.00ms
    75. 85.10ms
    76. 84.97ms
    77. 85.00ms
    78. 85.10ms
    79. 84.98ms
    80. 85.00ms
    81. 85.09ms
    82. 81.11ms
    83. 85.01ms
    84. 85.09ms
    85. 81.09ms
    86. 85.05ms
    87. 85.09ms
    88. 81.14ms
    89. 81.12ms
    90. 85.10ms
    91. 81.10ms
    92. 85.01ms
    93. 88.96ms
    94. 96.51ms
    95. 104.37ms
    96. 111.99ms
    97. 115.86ms
    98. 119.76ms
    99. 115.85ms
    100. 115.90ms
    101. 115.88ms
    102. 119.85ms
    103. 115.86ms
    104. 123.72ms
    105. 115.85ms
    106. 115.94ms
    107. 111.99ms
    108. 123.69ms
    109. 115.84ms
    110. 119.80ms
    111. 112.00ms
    112. 111.97ms
    113. 119.78ms
    114. 112.01ms
    115. 119.76ms
    116. 115.85ms
    117. 119.84ms
    118. 115.88ms
    119. 115.95ms
    120. 115.85ms
    121. 123.70ms
    122. 111.99ms
    123. 123.70ms
    124. 115.84ms
    125. 119.81ms
    126. 131.27ms
    127. 135.22ms
    128. 150.60ms
    Graphics only: 35.75ms (46.93G pixels/s)
    Graphics + compute:
    1. 51.28ms (32.72G pixels/s)
    2. 51.17ms (32.79G pixels/s)
    3. 51.13ms (32.81G pixels/s)
    4. 51.20ms (32.77G pixels/s)
    5. 51.25ms (32.73G pixels/s)
    6. 51.22ms (32.76G pixels/s)
    7. 51.21ms (32.76G pixels/s)
    8. 51.20ms (32.77G pixels/s)
    9. 51.17ms (32.79G pixels/s)
    10. 51.28ms (32.72G pixels/s)
    11. 51.24ms (32.75G pixels/s)
    12. 51.20ms (32.77G pixels/s)
    13. 51.19ms (32.78G pixels/s)
    14. 51.19ms (32.77G pixels/s)
    15. 51.26ms (32.73G pixels/s)
    16. 51.23ms (32.75G pixels/s)
    17. 51.15ms (32.80G pixels/s)
    18. 51.14ms (32.81G pixels/s)
    19. 51.17ms (32.79G pixels/s)
    20. 51.27ms (32.72G pixels/s)
    21. 51.18ms (32.78G pixels/s)
    22. 51.16ms (32.80G pixels/s)
    23. 51.21ms (32.76G pixels/s)
    24. 51.19ms (32.78G pixels/s)
    25. 51.28ms (32.72G pixels/s)
    26. 51.22ms (32.75G pixels/s)
    27. 51.16ms (32.79G pixels/s)
    28. 51.21ms (32.76G pixels/s)
    29. 51.18ms (32.78G pixels/s)
    30. 51.19ms (32.78G pixels/s)
    31. 66.60ms (25.19G pixels/s)
    32. 82.16ms (20.42G pixels/s)
    33. 82.13ms (20.43G pixels/s)
    34. 85.94ms (19.52G pixels/s)
    35. 85.95ms (19.52G pixels/s)
    36. 82.15ms (20.42G pixels/s)
    37. 82.07ms (20.44G pixels/s)
    38. 85.90ms (19.53G pixels/s)
    39. 82.09ms (20.44G pixels/s)
    40. 85.93ms (19.52G pixels/s)
    41. 85.96ms (19.52G pixels/s)
    42. 89.84ms (18.67G pixels/s)
    43. 82.04ms (20.45G pixels/s)
    44. 82.03ms (20.45G pixels/s)
    45. 85.97ms (19.51G pixels/s)
    46. 82.04ms (20.45G pixels/s)
    47. 85.94ms (19.52G pixels/s)
    48. 86.05ms (19.50G pixels/s)
    49. 82.00ms (20.46G pixels/s)
    50. 85.97ms (19.51G pixels/s)
    51. 82.12ms (20.43G pixels/s)
    52. 85.89ms (19.53G pixels/s)
    53. 85.93ms (19.52G pixels/s)
    54. 89.90ms (18.66G pixels/s)
    55. 81.98ms (20.46G pixels/s)
    56. 85.94ms (19.52G pixels/s)
    57. 89.95ms (18.65G pixels/s)
    58. 82.01ms (20.46G pixels/s)
    59. 85.95ms (19.52G pixels/s)
    60. 86.05ms (19.50G pixels/s)
    61. 82.01ms (20.46G pixels/s)
    62. 97.56ms (17.20G pixels/s)
    63. 101.35ms (16.55G pixels/s)
    64. 112.91ms (14.86G pixels/s)
    65. 116.93ms (14.35G pixels/s)
    66. 116.78ms (14.37G pixels/s)
    67. 116.89ms (14.35G pixels/s)
    68. 112.92ms (14.86G pixels/s)
    69. 120.73ms (13.90G pixels/s)
    70. 116.75ms (14.37G pixels/s)
    71. 116.86ms (14.36G pixels/s)
    72. 116.77ms (14.37G pixels/s)
    73. 112.99ms (14.85G pixels/s)
    74. 116.79ms (14.37G pixels/s)
    75. 116.91ms (14.35G pixels/s)
    76. 112.91ms (14.86G pixels/s)
    77. 116.79ms (14.36G pixels/s)
    78. 120.74ms (13.90G pixels/s)
    79. 116.81ms (14.36G pixels/s)
    80. 124.65ms (13.46G pixels/s)
    81. 116.81ms (14.36G pixels/s)
    82. 120.76ms (13.89G pixels/s)
    83. 116.79ms (14.37G pixels/s)
    84. 120.76ms (13.89G pixels/s)
    85. 112.90ms (14.86G pixels/s)
    86. 116.85ms (14.36G pixels/s)
    87. 116.71ms (14.38G pixels/s)
    88. 116.88ms (14.35G pixels/s)
    89. 112.87ms (14.86G pixels/s)
    90. 120.75ms (13.89G pixels/s)
    91. 116.75ms (14.37G pixels/s)
    92. 113.00ms (14.85G pixels/s)
    93. 116.84ms (14.36G pixels/s)
    94. 132.31ms (12.68G pixels/s)
    95. 128.35ms (13.07G pixels/s)
    96. 151.59ms (11.07G pixels/s)
    97. 143.71ms (11.67G pixels/s)
    98. 147.74ms (11.36G pixels/s)
    99. 147.65ms (11.36G pixels/s)
    100. 151.61ms (11.07G pixels/s)
    101. 143.84ms (11.66G pixels/s)
    102. 147.66ms (11.36G pixels/s)
    103. 151.60ms (11.07G pixels/s)
    104. 143.73ms (11.67G pixels/s)
    105. 147.73ms (11.36G pixels/s)
    106. 143.84ms (11.66G pixels/s)
    107. 147.62ms (11.37G pixels/s)
    108. 147.70ms (11.36G pixels/s)
    109. 143.76ms (11.67G pixels/s)
    110. 147.75ms (11.35G pixels/s)
    111. 147.65ms (11.36G pixels/s)
    112. 147.68ms (11.36G pixels/s)
    113. 147.73ms (11.36G pixels/s)
    114. 147.62ms (11.37G pixels/s)
    115. 147.68ms (11.36G pixels/s)
    116. 143.80ms (11.67G pixels/s)
    117. 147.71ms (11.36G pixels/s)
    118. 143.83ms (11.66G pixels/s)
    119. 147.67ms (11.36G pixels/s)
    120. 151.58ms (11.07G pixels/s)
    121. 143.73ms (11.67G pixels/s)
    122. 147.70ms (11.36G pixels/s)
    123. 147.68ms (11.36G pixels/s)
    124. 151.53ms (11.07G pixels/s)
    125. 147.78ms (11.35G pixels/s)
    126. 159.17ms (10.54G pixels/s)
    127. 163.07ms (10.29G pixels/s)
    128. 178.62ms (9.39G pixels/s)
     
  14. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    745
    Location:
    NY, NY
    It was specific to what we were seeing on the sample program that everyone was testing.......
    Curiosity killed the cat ;). Be straightforward and you will get a straight answer.

    And the Oxide dev, or whoever he is, is not being straightforward: he is vague in his comments about the performance differences between the two architectures, when I would expect him to know exactly what is going on.
     
  15. madyasiwi

    Newcomer

    Joined:
    Oct 7, 2008
    Messages:
    194
    Likes Received:
    32
    Kepler vs Maxwell

    [​IMG]

    [​IMG]
     
  16. madyasiwi

    Newcomer

    Joined:
    Oct 7, 2008
    Messages:
    194
    Likes Received:
    32
    I made a mistake with the first chart (yesterday's): the data actually came from a 390X, not a Fury-X. Sorry for my blunder. This is the Fury-X.

    [​IMG]
     
  17. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    8,414
    Likes Received:
    3,059
    OK, so I'm looking at the results being posted from MDolenc's tool, and here's what I'm seeing:

    1 - All GCN chips seem to show an almost flat compute time of 50-60ms, regardless of the number of compute kernels and of whether the rendering task is enabled.

    2 - a) All nVidia chips seem to show a compute time that increases with the number of compute kernels. Maxwell chips show lower compute times than Kepler chips.
    b) If rendering task time is X for an nVidia chip and compute time for a given number n of kernels is Y(n), then the "async compute" time for all nVidia chips seems to be very close to Y(n)+X.

    If nVidia's chips need to add the rendering time to a compute task with even one active kernel, doesn't this mean that "Async Compute" is not actually working and nVidia's hardware, at least in this test, does not seem to support Async Compute? Even if the driver does allow Async Compute tasks to be submitted, the hardware just seems to be doing rendering+compute serially and not in parallel at all.
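
To make the Y(n)+X comparison concrete, here is a minimal sketch of how one could score a single measurement. It assumes you have three timings for the same n: graphics alone (X), compute alone (Y(n)), and the combined "async" run. The function name and the example timings are hypothetical; they are not output from MDolenc's tool.

```python
def overlap_fraction(render_ms, compute_ms, combined_ms):
    """Estimate how much of the shorter task was hidden behind the longer one.

    0.0 means combined == render + compute (fully serial, the Y(n)+X case).
    1.0 means combined == max(render, compute) (fully overlapped).
    """
    serial = render_ms + compute_ms
    ideal = max(render_ms, compute_ms)
    hideable = serial - ideal  # equals min(render_ms, compute_ms)
    if hideable <= 0:
        return 0.0
    frac = (serial - combined_ms) / hideable
    # Clamp so that measurement noise cannot push the score out of range.
    return max(0.0, min(1.0, frac))


# Hypothetical timings, for illustration only.
serial_case = overlap_fraction(render_ms=20.0, compute_ms=10.0, combined_ms=30.0)
parallel_case = overlap_fraction(render_ms=20.0, compute_ms=10.0, combined_ms=20.0)
```

A score near 0 across all n would match the serial behaviour described in 2b; a score near 1 would suggest the two workloads really ran side by side.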
     
    #217 ToTTenTranz, Sep 1, 2015
    Last edited: Sep 1, 2015
    RedditUserB, Jawed and fellix like this.
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,807
    Likes Received:
    2,072
    Location:
    Well within 3d
    One idea I had is that, if this is an internal processor or potentially a SIMD running a firmware routine, it's a 32-slot structure.
    Particularly if this runs as a shader, there could be a single lane allocated as a sort of queue manager thread.
    (edit: As an aside, if it were running as an internal compute kernel, that could push Nvidia's implementation outside of what HSA states should be done for queue management--if Nvidia cared.)

    Even if not running as a shader, a reserved queue for a sort of system-controlled queue manager might be prudent.
    That would provide a way for something to monitor the queue and this lane would be able to respond to prompts to kill a queue or suspend it.
    It could overlap its work with the rest of the lanes, while remaining separate from possible hangs, malformed commands, or floods.
    The downside, as slots reach the end of the bundle, is what to do at the 32 limit.

    One way to contain the cost of this without impacting successive sets of 32 is for the management software to double-book one of the lanes, which would take twice as long. (edit: Unless it can opportunistically schedule on a lane that finishes faster.) After that though, the queue manager's cost is hidden until the hardware's ability to take dispatches is exhausted, which seems far off with the current settings.
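
As a rough illustration of the reserved-lane idea, the toy model below assumes 32 slots with one held back for a queue manager, so 31 kernels fit in one batch and the 32nd spills into a second. The 9.0 ms batch time is an assumed round number, loosely based on the GTX 980 Ti compute-only figures posted earlier in the thread (about 9 ms up to 31 kernels, about 18 ms at 32); both parameters are guesses, not known hardware properties.

```python
import math


def predicted_compute_ms(num_kernels, usable_slots=31, batch_ms=9.0):
    """Toy step model: kernels run in batches of `usable_slots`.

    usable_slots=31 assumes 32 slots with one reserved (hypothetical).
    batch_ms is an assumed time for one full batch of kernels.
    """
    if num_kernels <= 0:
        return 0.0
    batches = math.ceil(num_kernels / usable_slots)
    return batches * batch_ms


# Where the model predicts the first two steps.
steps = {n: predicted_compute_ms(n) for n in (1, 31, 32, 62, 63)}
```

If the reserved-lane guess were wrong and all 32 slots were usable, the first step would be expected at 33 kernels rather than 32, which is something the posted timings can be checked against.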


    I was thinking of other ways to test this. The current method is potentially mixing concurrency testing with asynchronous execution.
    If the kernel can programmatically shift the loop duration to fractions of itself, it might tease out how coarsely execution is tracked, and whether we're looking at some non-compute issue that is confounding measurements.
    Something like having the shader loop terminate at 0, 1/4, 1/2, 1, 1.5, 2x rather than every one behaving the same, might show where the floor is in this measurement.
    If this could be done at specific counts, it might show if there is something physical to the grouping behavior for Nvidia if its timing behavior changes based on delays incurred before or after certain limits.
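
A sketch of how such a test schedule might be generated on the host side, assuming the shader's loop length can be passed in per dispatch. `base_iterations` and the round-robin assignment of scales are hypothetical choices; the real tool's loop length and dispatch setup are not known here.

```python
from fractions import Fraction

# The loop-duration multipliers suggested above: 0, 1/4, 1/2, 1, 1.5, 2x.
SCALES = [
    Fraction(0),
    Fraction(1, 4),
    Fraction(1, 2),
    Fraction(1),
    Fraction(3, 2),
    Fraction(2),
]


def loop_counts(base_iterations, num_dispatches):
    """Return one loop-iteration count per dispatch, cycling through SCALES."""
    return [
        int(base_iterations * SCALES[i % len(SCALES)])
        for i in range(num_dispatches)
    ]


# Example with a hypothetical base loop length of 1000 iterations.
schedule = loop_counts(1000, 8)
```

Cycling is only one option; placing the short or long dispatches just before or after a suspected boundary (such as the 32nd dispatch) would target the grouping question more directly.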

    Varying the iteration count might give some kind of idea if there is a single-threaded limitation on the GCN ones, or if it really is that insensitive to this level of load.
    Injecting stalls into the graphics portion and/or some of the dispatches might show how flexibly the GPUs can work around them.
     
    #218 3dilettante, Sep 1, 2015
    Last edited: Sep 1, 2015
    Jawed likes this.
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    7,807
    Likes Received:
    2,072
    Location:
    Well within 3d
    If the GPU is juggling two distinct modes internally, it might be that it cannot readily run both at the same time, hence the discussion of an expensive context switch. Rather than at a kernel or wavefront level, it might be a front-end context.
    If the GPU is also not able to readily preempt the graphics portion, it might be that it will keep its context at the forefront barring an explicit way of yielding priority.
     
    Jawed likes this.
  20. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    8,414
    Likes Received:
    3,059
    But isn't Async Compute supposed to be a feature where the GPU does not need to juggle between two distinct modes (graphics/compute)?

    So now we have a second test pointing to what both the Oxide employee and AMD_Robert claimed?

    BTW, what if we monitor CPU usage during MDolenc's test to see whether the "heavy CPU costs" claimed by Kollock are there for Kepler/Maxwell, and then compare with GCN GPUs?
     
    #220 ToTTenTranz, Sep 1, 2015
    Last edited: Sep 1, 2015
