Slide 22 of this presentation (http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf) lists many GCN-specific things that would be hard to port to PC DirectX. And if people start doing crazy stuff such as writing to the GPU command queue from a compute shader (to spawn tasks from the GPU), then porting to PC becomes almost impossible. You could also write PS3 SPU rendering code that was very hard to port to PC, and we still got acceptable PC ports. However, this time both consoles support the same low-level GPU tricks, making PC the most different platform (last gen, PS3 was the most different). It will be interesting to see where this leads, and how fast the PC APIs will start to expose the remaining missing console GPU features.

GTX 980 Ti (ForceWare 355.82) Spoiler Compute only: 1. 11.53ms 2. 11.01ms 3. 10.31ms 4. 10.43ms 5. 10.40ms 6. 9.50ms 7. 9.28ms 8. 9.29ms 9. 9.32ms 10. 9.33ms 11. 9.12ms 12. 9.10ms 13. 9.15ms 14. 9.17ms 15. 9.16ms 16. 9.18ms 17. 9.17ms 18. 9.15ms 19. 9.12ms 20. 9.16ms 21. 9.16ms 22. 9.16ms 23. 9.19ms 24. 9.18ms 25. 9.21ms 26. 9.15ms 27. 9.13ms 28. 9.22ms 29. 9.18ms 30. 9.21ms 31. 9.14ms 32. 18.02ms 33. 18.05ms 34. 22.44ms 35. 20.28ms 36. 20.24ms 37. 20.20ms 38. 18.05ms 39. 20.27ms 40. 20.21ms 41. 20.24ms 42. 18.05ms 43. 20.23ms 44. 20.26ms 45. 22.47ms 46. 20.22ms 47. 20.25ms 48. 18.05ms 49. 20.26ms 50. 20.27ms 51. 20.23ms 52. 18.02ms 53. 20.23ms 54. 20.25ms 55. 20.22ms 56. 18.05ms 57. 20.24ms 58. 20.26ms 59. 20.25ms 60. 18.08ms 61. 20.24ms 62. 22.48ms 63. 18.07ms 64. 28.96ms 65. 28.98ms 66. 31.31ms 67. 29.21ms 68. 26.98ms 69. 26.97ms 70. 26.98ms 71. 31.40ms 72. 26.97ms 73. 26.96ms 74. 29.16ms 75. 31.41ms 76. 27.03ms 77. 27.01ms 78. 29.14ms 79. 29.22ms 80. 31.39ms 81. 27.01ms 82. 26.98ms 83. 29.15ms 84. 31.39ms 85. 26.95ms 86. 26.98ms 87. 27.01ms 88. 27.06ms 89. 29.18ms 90. 26.95ms 91. 29.18ms 92. 27.07ms 93. 27.02ms 94. 31.39ms 95. 27.00ms 96. 40.13ms 97. 37.90ms 98. 35.84ms 99. 38.15ms 100. 38.12ms 101. 35.91ms 102. 35.91ms 103. 40.29ms 104. 35.89ms 105. 38.09ms 106. 38.16ms 107. 38.08ms 108. 35.89ms 109. 35.98ms 110. 35.90ms 111. 35.90ms 112. 40.36ms 113. 40.41ms 114. 40.35ms 115. 36.14ms 116. 40.28ms 117. 35.89ms 118. 38.10ms 119. 35.94ms 120. 38.14ms 121. 38.09ms 122. 35.96ms 123. 35.91ms 124. 35.93ms 125. 36.03ms 126. 35.90ms 127. 35.93ms 128. 44.67ms Graphics only: 16.40ms (102.31G pixels/s) Graphics + compute: 1. 25.02ms (67.05G pixels/s) 2. 24.96ms (67.23G pixels/s) 3. 25.05ms (66.99G pixels/s) 4. 25.25ms (66.44G pixels/s) 5. 25.19ms (66.60G pixels/s) 6. 25.10ms (66.83G pixels/s) 7. 25.10ms (66.83G pixels/s) 8. 25.17ms (66.65G pixels/s) 9. 25.19ms (66.61G pixels/s) 10. 25.19ms (66.60G pixels/s) 11. 25.22ms (66.52G pixels/s) 12. 25.20ms (66.56G pixels/s) 13. 
25.19ms (66.60G pixels/s) 14. 25.21ms (66.56G pixels/s) 15. 25.16ms (66.68G pixels/s) 16. 25.15ms (66.70G pixels/s) 17. 25.18ms (66.63G pixels/s) 18. 25.22ms (66.51G pixels/s) 19. 25.15ms (66.70G pixels/s) 20. 25.20ms (66.57G pixels/s) 21. 25.16ms (66.69G pixels/s) 22. 25.15ms (66.71G pixels/s) 23. 25.12ms (66.78G pixels/s) 24. 25.13ms (66.76G pixels/s) 25. 25.14ms (66.74G pixels/s) 26. 25.20ms (66.58G pixels/s) 27. 25.12ms (66.80G pixels/s) 28. 25.13ms (66.76G pixels/s) 29. 25.15ms (66.72G pixels/s) 30. 25.16ms (66.68G pixels/s) 31. 25.16ms (66.68G pixels/s) 32. 34.02ms (49.32G pixels/s) 33. 33.94ms (49.43G pixels/s) 34. 34.10ms (49.20G pixels/s) 35. 36.37ms (46.13G pixels/s) 36. 34.15ms (49.12G pixels/s) 37. 34.10ms (49.19G pixels/s) 38. 34.14ms (49.14G pixels/s) 39. 34.13ms (49.16G pixels/s) 40. 34.12ms (49.17G pixels/s) 41. 34.13ms (49.16G pixels/s) 42. 34.17ms (49.10G pixels/s) 43. 34.10ms (49.20G pixels/s) 44. 34.10ms (49.19G pixels/s) 45. 34.05ms (49.27G pixels/s) 46. 36.30ms (46.22G pixels/s) 47. 34.06ms (49.25G pixels/s) 48. 34.10ms (49.21G pixels/s) 49. 36.40ms (46.09G pixels/s) 50. 34.12ms (49.17G pixels/s) 51. 34.20ms (49.06G pixels/s) 52. 34.13ms (49.16G pixels/s) 53. 36.33ms (46.17G pixels/s) 54. 34.08ms (49.23G pixels/s) 55. 34.12ms (49.17G pixels/s) 56. 34.14ms (49.14G pixels/s) 57. 36.38ms (46.12G pixels/s) 58. 33.98ms (49.37G pixels/s) 59. 36.34ms (46.17G pixels/s) 60. 34.04ms (49.29G pixels/s) 61. 34.12ms (49.17G pixels/s) 62. 34.22ms (49.03G pixels/s) 63. 38.47ms (43.61G pixels/s) 64. 44.98ms (37.30G pixels/s) 65. 42.83ms (39.17G pixels/s) 66. 42.94ms (39.07G pixels/s) 67. 43.04ms (38.98G pixels/s) 68. 43.04ms (38.98G pixels/s) 69. 43.00ms (39.01G pixels/s) 70. 42.98ms (39.04G pixels/s) 71. 45.46ms (36.91G pixels/s) 72. 47.66ms (35.20G pixels/s) 73. 49.83ms (33.67G pixels/s) 74. 43.31ms (38.73G pixels/s) 75. 43.13ms (38.90G pixels/s) 76. 45.22ms (37.10G pixels/s) 77. 43.10ms (38.93G pixels/s) 78. 43.00ms (39.02G pixels/s) 79. 
45.36ms (36.99G pixels/s) 80. 49.69ms (33.76G pixels/s) 81. 49.77ms (33.71G pixels/s) 82. 43.09ms (38.93G pixels/s) 83. 43.00ms (39.01G pixels/s) 84. 45.43ms (36.93G pixels/s) 85. 43.03ms (38.99G pixels/s) 86. 43.03ms (38.99G pixels/s) 87. 47.60ms (35.25G pixels/s) 88. 42.97ms (39.04G pixels/s) 89. 43.03ms (38.99G pixels/s) 90. 45.36ms (36.99G pixels/s) 91. 43.06ms (38.96G pixels/s) 92. 43.12ms (38.91G pixels/s) 93. 43.09ms (38.93G pixels/s) 94. 45.32ms (37.02G pixels/s) 95. 45.29ms (37.04G pixels/s) 96. 53.97ms (31.08G pixels/s) 97. 51.91ms (32.32G pixels/s) 98. 51.88ms (32.34G pixels/s) 99. 54.15ms (30.98G pixels/s) 100. 54.24ms (30.93G pixels/s) 101. 54.20ms (30.95G pixels/s) 102. 52.13ms (32.19G pixels/s) 103. 52.01ms (32.26G pixels/s) 104. 54.23ms (30.94G pixels/s) 105. 52.02ms (32.25G pixels/s) 106. 51.98ms (32.28G pixels/s) 107. 52.00ms (32.26G pixels/s) 108. 52.00ms (32.27G pixels/s) 109. 54.20ms (30.95G pixels/s) 110. 52.04ms (32.24G pixels/s) 111. 52.15ms (32.17G pixels/s) 112. 51.97ms (32.28G pixels/s) 113. 52.01ms (32.26G pixels/s) 114. 51.97ms (32.29G pixels/s) 115. 52.04ms (32.24G pixels/s) 116. 52.15ms (32.17G pixels/s) 117. 51.93ms (32.30G pixels/s) 118. 52.07ms (32.22G pixels/s) 119. 51.96ms (32.29G pixels/s) 120. 54.22ms (30.94G pixels/s) 121. 52.02ms (32.25G pixels/s) 122. 51.98ms (32.28G pixels/s) 123. 52.12ms (32.19G pixels/s) 124. 51.95ms (32.29G pixels/s) 125. 52.04ms (32.24G pixels/s) 126. 51.96ms (32.29G pixels/s) 127. 54.30ms (30.90G pixels/s) 128. 60.72ms (27.63G pixels/s)

Well, full support would also mean tiled-deferred support, which honestly I have never seen outside mobile and the Dreamcast @_@

Yeah, I saw that after I looked closer. Shouldn't it also be 32 though, since it is supposed to be able to do 32 compute (as opposed to 1+31)?
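One way to check where the step actually falls is to scan a "Compute only" series for the first dispatch whose time jumps well above the fastest time seen so far. The sketch below is only an illustration: the function name `first_step` and the 1.5x threshold are assumptions, and the two short excerpts are copied from the GTX 980 Ti and GTX 780 Ti (GK110) runs posted in this thread.

```python
def first_step(times_ms, start=1, ratio=1.5):
    """Return the dispatch number where the time first exceeds `ratio`
    times the fastest time seen so far, or None if it never does.

    `start` is the dispatch number of times_ms[0]. The 1.5x threshold is
    an arbitrary choice for this sketch, not something the tool defines.
    """
    baseline = times_ms[0]
    for offset, t in enumerate(times_ms[1:], start=1):
        if t > ratio * baseline:
            return start + offset
        baseline = min(baseline, t)
    return None


# Excerpts from the "Compute only" lists above, both starting at dispatch 29.
gtx_980_ti = [9.18, 9.21, 9.14, 18.02, 18.05, 22.44]
gtx_780_ti = [16.53, 16.51, 32.30, 48.64, 52.77]

print("980 Ti first step at dispatch", first_step(gtx_980_ti, start=29))
print("780 Ti first step at dispatch", first_step(gtx_780_ti, start=29))
```

Running the same scan over a full 128-entry list would show whether later steps repeat at the same spacing.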

GeForce GT 730 (GK208) Forceware 353.62 Spoiler: 730 / GK208 Compute only: 1. 21.50ms 2. 21.46ms 3. 21.45ms 4. 21.47ms 5. 21.46ms 6. 21.55ms 7. 42.63ms 8. 42.66ms 9. 42.78ms 10. 42.72ms 11. 42.71ms 12. 42.77ms 13. 63.86ms 14. 63.86ms 15. 63.86ms 16. 63.91ms 17. 63.93ms 18. 63.87ms 19. 85.03ms 20. 85.10ms 21. 85.06ms 22. 85.07ms 23. 85.08ms 24. 85.16ms 25. 106.70ms 26. 106.88ms 27. 106.38ms 28. 106.67ms 29. 106.93ms 30. 106.48ms 31. 127.43ms 32. 148.42ms 33. 164.57ms 34. 164.99ms 35. 153.84ms 36. 153.89ms 37. 148.41ms 38. 169.69ms 39. 174.89ms 40. 169.53ms 41. 174.85ms 42. 174.89ms 43. 174.96ms 44. 190.76ms 45. 190.64ms 46. 196.04ms 47. 196.02ms 48. 192.24ms 49. 201.42ms 50. 211.86ms 51. 217.15ms 52. 217.24ms 53. 217.16ms 54. 217.28ms 55. 217.17ms 56. 233.23ms 57. 254.90ms 58. 238.60ms 59. 238.31ms 60. 238.26ms 61. 250.63ms 62. 254.64ms 63. 270.24ms 64. 286.86ms 65. 280.66ms 66. 280.79ms 67. 275.22ms 68. 280.57ms 69. 280.54ms 70. 301.76ms 71. 301.84ms 72. 301.70ms 73. 301.77ms 74. 312.61ms 75. 301.74ms 76. 322.90ms 77. 322.87ms 78. 322.86ms 79. 322.88ms 80. 328.20ms 81. 351.39ms 82. 354.82ms 83. 371.05ms 84. 355.59ms 85. 371.04ms 86. 354.73ms 87. 362.46ms 88. 375.93ms 89. 378.02ms 90. 387.64ms 91. 393.14ms 92. 376.66ms 93. 370.47ms 94. 397.34ms 95. 397.43ms 96. 412.72ms 97. 419.23ms 98. 419.82ms 99. 419.05ms 100. 435.07ms 101. 410.72ms 102. 433.87ms 103. 428.58ms 104. 433.91ms 105. 428.59ms 106. 428.55ms 107. 433.89ms 108. 455.07ms 109. 449.74ms 110. 474.66ms 111. 466.78ms 112. 466.16ms 113. 455.24ms 114. 481.76ms 115. 487.04ms 116. 481.76ms 117. 481.89ms 118. 486.94ms 119. 481.60ms 120. 508.05ms 121. 502.92ms 122. 502.77ms 123. 508.17ms 124. 502.82ms 125. 508.19ms 126. 524.03ms 127. 521.02ms 128. 550.32ms Graphics only: 240.26ms (6.98G pixels/s) Graphics + compute: 1. 261.56ms (6.41G pixels/s) 2. 261.33ms (6.42G pixels/s) 3. 261.30ms (6.42G pixels/s) 4. 261.33ms (6.42G pixels/s) 5. 261.31ms (6.42G pixels/s) 6. 261.82ms (6.41G pixels/s) 7. 
282.36ms (5.94G pixels/s) 8. 282.44ms (5.94G pixels/s) 9. 282.35ms (5.94G pixels/s) 10. 282.57ms (5.94G pixels/s) 11. 282.53ms (5.94G pixels/s) 12. 282.42ms (5.94G pixels/s) 13. 303.52ms (5.53G pixels/s) 14. 303.53ms (5.53G pixels/s) 15. 303.50ms (5.53G pixels/s) 16. 303.56ms (5.53G pixels/s) 17. 303.49ms (5.53G pixels/s) 18. 303.60ms (5.53G pixels/s) 19. 325.76ms (5.15G pixels/s) 20. 324.78ms (5.17G pixels/s) 21. 324.75ms (5.17G pixels/s) 22. 324.66ms (5.17G pixels/s) 23. 324.83ms (5.16G pixels/s) 24. 324.73ms (5.17G pixels/s) 25. 345.95ms (4.85G pixels/s) 26. 345.96ms (4.85G pixels/s) 27. 345.87ms (4.85G pixels/s) 28. 345.81ms (4.85G pixels/s) 29. 345.91ms (4.85G pixels/s) 30. 346.03ms (4.85G pixels/s) 31. 367.08ms (4.57G pixels/s) 32. 388.30ms (4.32G pixels/s) 33. 398.85ms (4.21G pixels/s) 34. 398.80ms (4.21G pixels/s) 35. 404.17ms (4.15G pixels/s) 36. 398.84ms (4.21G pixels/s) 37. 398.85ms (4.21G pixels/s) 38. 409.41ms (4.10G pixels/s) 39. 419.93ms (4.00G pixels/s) 40. 414.65ms (4.05G pixels/s) 41. 419.95ms (4.00G pixels/s) 42. 419.97ms (3.99G pixels/s) 43. 425.17ms (3.95G pixels/s) 44. 430.60ms (3.90G pixels/s) 45. 435.86ms (3.85G pixels/s) 46. 446.44ms (3.76G pixels/s) 47. 446.50ms (3.76G pixels/s) 48. 435.88ms (3.85G pixels/s) 49. 446.43ms (3.76G pixels/s) 50. 451.93ms (3.71G pixels/s) 51. 462.20ms (3.63G pixels/s) 52. 462.28ms (3.63G pixels/s) 53. 462.23ms (3.63G pixels/s) 54. 457.06ms (3.67G pixels/s) 55. 467.50ms (3.59G pixels/s) 56. 472.97ms (3.55G pixels/s) 57. 488.73ms (3.43G pixels/s) 58. 478.13ms (3.51G pixels/s) 59. 478.27ms (3.51G pixels/s) 60. 488.91ms (3.43G pixels/s) 61. 483.57ms (3.47G pixels/s) 62. 493.98ms (3.40G pixels/s) 63. 510.01ms (3.29G pixels/s) 64. 536.43ms (3.13G pixels/s) 65. 525.76ms (3.19G pixels/s) 66. 531.28ms (3.16G pixels/s) 67. 531.14ms (3.16G pixels/s) 68. 531.11ms (3.16G pixels/s) 69. 530.86ms (3.16G pixels/s) 70. 541.71ms (3.10G pixels/s) 71. 536.40ms (3.13G pixels/s) 72. 546.96ms (3.07G pixels/s) 73. 
541.68ms (3.10G pixels/s) 74. 536.38ms (3.13G pixels/s) 75. 541.67ms (3.10G pixels/s) 76. 578.68ms (2.90G pixels/s) 77. 568.07ms (2.95G pixels/s) 78. 557.46ms (3.01G pixels/s) 79. 568.07ms (2.95G pixels/s) 80. 573.27ms (2.93G pixels/s) 81. 562.79ms (2.98G pixels/s) 82. 589.27ms (2.85G pixels/s) 83. 589.21ms (2.85G pixels/s) 84. 589.29ms (2.85G pixels/s) 85. 594.57ms (2.82G pixels/s) 86. 584.10ms (2.87G pixels/s) 87. 589.66ms (2.85G pixels/s) 88. 600.39ms (2.79G pixels/s) 89. 610.24ms (2.75G pixels/s) 90. 610.28ms (2.75G pixels/s) 91. 605.01ms (2.77G pixels/s) 92. 615.60ms (2.73G pixels/s) 93. 600.25ms (2.80G pixels/s) 94. 631.56ms (2.66G pixels/s) 95. 636.96ms (2.63G pixels/s) 96. 647.93ms (2.59G pixels/s) 97. 658.03ms (2.55G pixels/s) 98. 657.94ms (2.55G pixels/s) 99. 652.72ms (2.57G pixels/s) 100. 647.38ms (2.59G pixels/s) 101. 647.56ms (2.59G pixels/s) 102. 668.41ms (2.51G pixels/s) 103. 673.82ms (2.49G pixels/s) 104. 668.60ms (2.51G pixels/s) 105. 673.90ms (2.49G pixels/s) 106. 675.27ms (2.48G pixels/s) 107. 706.66ms (2.37G pixels/s) 108. 708.39ms (2.37G pixels/s) 109. 701.54ms (2.39G pixels/s) 110. 695.48ms (2.41G pixels/s) 111. 694.96ms (2.41G pixels/s) 112. 695.47ms (2.41G pixels/s) 113. 701.90ms (2.39G pixels/s) 114. 720.94ms (2.33G pixels/s) 115. 722.85ms (2.32G pixels/s) 116. 716.19ms (2.34G pixels/s) 117. 716.11ms (2.34G pixels/s) 118. 713.24ms (2.35G pixels/s) 119. 721.78ms (2.32G pixels/s) 120. 744.78ms (2.25G pixels/s) 121. 739.88ms (2.27G pixels/s) 122. 752.32ms (2.23G pixels/s) 123. 753.69ms (2.23G pixels/s) 124. 744.38ms (2.25G pixels/s) 125. 735.86ms (2.28G pixels/s) 126. 780.35ms (2.15G pixels/s) 127. 770.58ms (2.18G pixels/s) 128. 780.66ms (2.15G pixels/s)

GeForce GTX 780 Ti (GK110) Spoiler Compute only: 1. 17.09ms 2. 17.32ms 3. 16.75ms 4. 16.52ms 5. 16.74ms 6. 16.48ms 7. 16.51ms 8. 16.50ms 9. 16.57ms 10. 16.57ms 11. 16.71ms 12. 16.50ms 13. 16.58ms 14. 16.56ms 15. 16.51ms 16. 16.52ms 17. 16.51ms 18. 16.52ms 19. 16.52ms 20. 16.50ms 21. 16.54ms 22. 16.52ms 23. 16.56ms 24. 16.52ms 25. 16.55ms 26. 16.50ms 27. 16.54ms 28. 16.56ms 29. 16.53ms 30. 16.51ms 31. 32.30ms 32. 48.64ms 33. 52.77ms 34. 52.78ms 35. 52.74ms 36. 52.70ms 37. 52.72ms 38. 52.69ms 39. 52.70ms 40. 52.78ms 41. 52.72ms 42. 52.72ms 43. 52.74ms 44. 52.72ms 45. 52.72ms 46. 52.73ms 47. 52.75ms 48. 56.85ms 49. 56.88ms 50. 56.87ms 51. 56.84ms 52. 56.85ms 53. 52.78ms 54. 52.95ms 55. 52.72ms 56. 56.90ms 57. 56.83ms 58. 56.82ms 59. 56.83ms 60. 52.74ms 61. 52.74ms 62. 64.73ms 63. 72.64ms 64. 84.86ms 65. 88.94ms 66. 88.95ms 67. 84.84ms 68. 88.90ms 69. 84.85ms 70. 88.94ms 71. 84.86ms 72. 88.94ms 73. 84.84ms 74. 88.96ms 75. 88.94ms 76. 88.96ms 77. 88.94ms 78. 84.84ms 79. 88.93ms 80. 84.85ms 81. 88.93ms 82. 88.98ms 83. 88.94ms 84. 88.94ms 85. 88.95ms 86. 89.07ms 87. 84.85ms 88. 88.92ms 89. 84.87ms 90. 88.93ms 91. 88.95ms 92. 88.94ms 93. 88.92ms 94. 104.72ms 95. 100.64ms 96. 121.08ms 97. 121.03ms 98. 116.98ms 99. 121.05ms 100. 121.04ms 101. 121.26ms 102. 116.95ms 103. 121.04ms 104. 121.03ms 105. 121.09ms 106. 116.96ms 107. 121.04ms 108. 121.03ms 109. 121.03ms 110. 116.96ms 111. 121.03ms 112. 116.91ms 113. 121.02ms 114. 116.94ms 115. 121.04ms 116. 121.03ms 117. 121.04ms 118. 116.95ms 119. 121.04ms 120. 121.04ms 121. 121.02ms 122. 116.94ms 123. 121.05ms 124. 121.06ms 125. 116.93ms 126. 132.75ms 127. 132.73ms 128. 153.13ms Graphics only: 37.21ms (45.09G pixels/s) Graphics + compute: 1. 53.49ms (31.36G pixels/s) 2. 53.42ms (31.40G pixels/s) 3. 53.43ms (31.40G pixels/s) 4. 53.52ms (31.35G pixels/s) 5. 53.54ms (31.34G pixels/s) 6. 53.51ms (31.35G pixels/s) 7. 53.51ms (31.35G pixels/s) 8. 53.47ms (31.38G pixels/s) 9. 53.46ms (31.38G pixels/s) 10. 53.49ms (31.36G pixels/s) 11. 
53.50ms (31.36G pixels/s) 12. 53.50ms (31.36G pixels/s) 13. 53.51ms (31.35G pixels/s) 14. 53.47ms (31.38G pixels/s) 15. 53.45ms (31.39G pixels/s) 16. 53.46ms (31.38G pixels/s) 17. 53.49ms (31.37G pixels/s) 18. 53.54ms (31.34G pixels/s) 19. 53.58ms (31.31G pixels/s) 20. 53.56ms (31.32G pixels/s) 21. 53.51ms (31.35G pixels/s) 22. 53.51ms (31.35G pixels/s) 23. 53.49ms (31.36G pixels/s) 24. 53.48ms (31.37G pixels/s) 25. 53.46ms (31.38G pixels/s) 26. 53.43ms (31.40G pixels/s) 27. 53.43ms (31.40G pixels/s) 28. 53.48ms (31.37G pixels/s) 29. 53.52ms (31.35G pixels/s) 30. 53.50ms (31.36G pixels/s) 31. 69.27ms (24.22G pixels/s) 32. 85.57ms (19.61G pixels/s) 33. 85.56ms (19.61G pixels/s) 34. 89.60ms (18.73G pixels/s) 35. 89.65ms (18.71G pixels/s) 36. 89.66ms (18.71G pixels/s) 37. 85.58ms (19.60G pixels/s) 38. 85.55ms (19.61G pixels/s) 39. 89.61ms (18.72G pixels/s) 40. 89.65ms (18.71G pixels/s) 41. 89.66ms (18.71G pixels/s) 42. 85.56ms (19.61G pixels/s) 43. 89.61ms (18.72G pixels/s) 44. 89.65ms (18.71G pixels/s) 45. 89.66ms (18.71G pixels/s) 46. 85.62ms (19.60G pixels/s) 47. 85.61ms (19.60G pixels/s) 48. 89.60ms (18.73G pixels/s) 49. 89.65ms (18.71G pixels/s) 50. 89.65ms (18.71G pixels/s) 51. 89.66ms (18.71G pixels/s) 52. 85.59ms (19.60G pixels/s) 53. 89.63ms (18.72G pixels/s) 54. 89.68ms (18.71G pixels/s) 55. 89.70ms (18.70G pixels/s) 56. 85.59ms (19.60G pixels/s) 57. 85.62ms (19.59G pixels/s) 58. 89.63ms (18.72G pixels/s) 59. 89.68ms (18.71G pixels/s) 60. 89.68ms (18.71G pixels/s) 61. 89.68ms (18.71G pixels/s) 62. 101.37ms (16.55G pixels/s) 63. 105.45ms (15.91G pixels/s) 64. 121.75ms (13.78G pixels/s) 65. 121.77ms (13.78G pixels/s) 66. 121.74ms (13.78G pixels/s) 67. 121.78ms (13.78G pixels/s) 68. 121.79ms (13.78G pixels/s) 69. 121.75ms (13.78G pixels/s) 70. 117.64ms (14.26G pixels/s) 71. 121.71ms (13.79G pixels/s) 72. 117.65ms (14.26G pixels/s) 73. 121.69ms (13.79G pixels/s) 74. 117.66ms (14.26G pixels/s) 75. 121.68ms (13.79G pixels/s) 76. 126.01ms (13.31G pixels/s) 77. 
121.79ms (13.78G pixels/s) 78. 121.75ms (13.78G pixels/s) 79. 121.70ms (13.79G pixels/s) 80. 117.63ms (14.26G pixels/s) 81. 121.76ms (13.78G pixels/s) 82. 121.72ms (13.78G pixels/s) 83. 121.75ms (13.78G pixels/s) 84. 117.65ms (14.26G pixels/s) 85. 125.84ms (13.33G pixels/s) 86. 121.74ms (13.78G pixels/s) 87. 125.77ms (13.34G pixels/s) 88. 121.74ms (13.78G pixels/s) 89. 125.81ms (13.34G pixels/s) 90. 121.71ms (13.78G pixels/s) 91. 121.69ms (13.79G pixels/s) 92. 117.64ms (14.26G pixels/s) 93. 121.73ms (13.78G pixels/s) 94. 133.40ms (12.58G pixels/s) 95. 137.53ms (12.20G pixels/s) 96. 153.84ms (10.91G pixels/s) 97. 153.75ms (10.91G pixels/s) 98. 149.74ms (11.20G pixels/s) 99. 153.80ms (10.91G pixels/s) 100. 153.83ms (10.91G pixels/s) 101. 157.93ms (10.62G pixels/s) 102. 149.73ms (11.20G pixels/s) 103. 149.78ms (11.20G pixels/s) 104. 153.88ms (10.90G pixels/s) 105. 153.87ms (10.90G pixels/s) 106. 149.73ms (11.20G pixels/s) 107. 153.89ms (10.90G pixels/s) 108. 153.88ms (10.90G pixels/s) 109. 157.99ms (10.62G pixels/s) 110. 149.84ms (11.20G pixels/s) 111. 149.77ms (11.20G pixels/s) 112. 153.81ms (10.91G pixels/s) 113. 153.96ms (10.90G pixels/s) 114. 153.77ms (10.91G pixels/s) 115. 149.77ms (11.20G pixels/s) 116. 153.86ms (10.90G pixels/s) 117. 153.82ms (10.91G pixels/s) 118. 153.86ms (10.90G pixels/s) 119. 153.76ms (10.91G pixels/s) 120. 149.75ms (11.20G pixels/s) 121. 153.86ms (10.90G pixels/s) 122. 153.84ms (10.91G pixels/s) 123. 153.77ms (10.91G pixels/s) 124. 149.80ms (11.20G pixels/s) 125. 153.82ms (10.91G pixels/s) 126. 169.65ms (9.89G pixels/s) 127. 169.72ms (9.89G pixels/s) 128. 185.96ms (9.02G pixels/s)

The plot thickens: Thoughts?

Is the EDRAM micromanaged? Honest question. I thought DX11.x handled that automatically, like Intel's IGPs using L3 and L4. As for the HSA-centric code, perhaps they can point many of those tasks towards the CPU anyways? I imagine most gaming PCs will have CPUs that are >3x faster than the 8x 1.6-1.75GHz Jaguars.

Although it backfired really hard because of its even-worse-than-expected performance on Kepler cards, Arkham Knight is nVidia's dream come true. If only gamers would suck it up, buy the game and not complain like the entitled little whining brats that they are...

Your selective quoting abilities are great. Let me try it too. Same link: https://docs.unrealengine.com/lates...ing/ShaderDevelopment/AsyncCompute/index.html

Pointless; tier 2 binding isn't even being used fully and probably won't be for another few years. The bolded part is interesting, and without them profiling their shaders and showing that to their respective community, it's very hard to even say what is going on on Maxwell, so what is their point? The only other way around that is to see other DX12 async code at work. Has nothing to do with this discussion. Yeah, so that's why I linked the entire page? Your point? Outside of useless banter?

Here's a reminder that back in 2013, Mark Cerny presented Async almost as the "holy grail" for this generation of consoles, and that we should be seeing titles taking advantage of that in.. 2016: I just find it curious that you chose to quote the only sentence in the whole damn page that seems to demean the use of Async compute, that's all. Very curious.

Here are the results on my 780ti with 355.82 Spoiler Compute only: 1. 20.27ms 2. 19.38ms 3. 18.32ms 4. 17.98ms 5. 16.55ms 6. 16.56ms 7. 15.88ms 8. 15.55ms 9. 15.53ms 10. 15.52ms 11. 15.52ms 12. 15.58ms 13. 15.56ms 14. 15.51ms 15. 15.56ms 16. 15.54ms 17. 15.53ms 18. 15.53ms 19. 15.53ms 20. 15.52ms 21. 15.54ms 22. 15.53ms 23. 15.59ms 24. 15.54ms 25. 15.53ms 26. 15.56ms 27. 15.51ms 28. 15.59ms 29. 15.54ms 30. 15.55ms 31. 31.01ms 32. 46.38ms 33. 50.23ms 34. 54.15ms 35. 58.11ms 36. 54.17ms 37. 50.24ms 38. 54.16ms 39. 54.13ms 40. 50.36ms 41. 50.27ms 42. 54.12ms 43. 58.04ms 44. 50.24ms 45. 58.08ms 46. 54.13ms 47. 50.24ms 48. 50.27ms 49. 54.24ms 50. 54.18ms 51. 50.24ms 52. 50.25ms 53. 54.15ms 54. 58.09ms 55. 54.17ms 56. 50.24ms 57. 58.06ms 58. 50.25ms 59. 50.31ms 60. 54.14ms 61. 50.24ms 62. 61.87ms 63. 69.63ms 64. 81.12ms 65. 85.01ms 66. 88.94ms 67. 81.14ms 68. 92.77ms 69. 85.05ms 70. 81.12ms 71. 88.88ms 72. 85.07ms 73. 84.99ms 74. 85.00ms 75. 85.10ms 76. 84.97ms 77. 85.00ms 78. 85.10ms 79. 84.98ms 80. 85.00ms 81. 85.09ms 82. 81.11ms 83. 85.01ms 84. 85.09ms 85. 81.09ms 86. 85.05ms 87. 85.09ms 88. 81.14ms 89. 81.12ms 90. 85.10ms 91. 81.10ms 92. 85.01ms 93. 88.96ms 94. 96.51ms 95. 104.37ms 96. 111.99ms 97. 115.86ms 98. 119.76ms 99. 115.85ms 100. 115.90ms 101. 115.88ms 102. 119.85ms 103. 115.86ms 104. 123.72ms 105. 115.85ms 106. 115.94ms 107. 111.99ms 108. 123.69ms 109. 115.84ms 110. 119.80ms 111. 112.00ms 112. 111.97ms 113. 119.78ms 114. 112.01ms 115. 119.76ms 116. 115.85ms 117. 119.84ms 118. 115.88ms 119. 115.95ms 120. 115.85ms 121. 123.70ms 122. 111.99ms 123. 123.70ms 124. 115.84ms 125. 119.81ms 126. 131.27ms 127. 135.22ms 128. 150.60ms Graphics only: 35.75ms (46.93G pixels/s) Graphics + compute: 1. 51.28ms (32.72G pixels/s) 2. 51.17ms (32.79G pixels/s) 3. 51.13ms (32.81G pixels/s) 4. 51.20ms (32.77G pixels/s) 5. 51.25ms (32.73G pixels/s) 6. 51.22ms (32.76G pixels/s) 7. 51.21ms (32.76G pixels/s) 8. 51.20ms (32.77G pixels/s) 9. 51.17ms (32.79G pixels/s) 10. 
51.28ms (32.72G pixels/s) 11. 51.24ms (32.75G pixels/s) 12. 51.20ms (32.77G pixels/s) 13. 51.19ms (32.78G pixels/s) 14. 51.19ms (32.77G pixels/s) 15. 51.26ms (32.73G pixels/s) 16. 51.23ms (32.75G pixels/s) 17. 51.15ms (32.80G pixels/s) 18. 51.14ms (32.81G pixels/s) 19. 51.17ms (32.79G pixels/s) 20. 51.27ms (32.72G pixels/s) 21. 51.18ms (32.78G pixels/s) 22. 51.16ms (32.80G pixels/s) 23. 51.21ms (32.76G pixels/s) 24. 51.19ms (32.78G pixels/s) 25. 51.28ms (32.72G pixels/s) 26. 51.22ms (32.75G pixels/s) 27. 51.16ms (32.79G pixels/s) 28. 51.21ms (32.76G pixels/s) 29. 51.18ms (32.78G pixels/s) 30. 51.19ms (32.78G pixels/s) 31. 66.60ms (25.19G pixels/s) 32. 82.16ms (20.42G pixels/s) 33. 82.13ms (20.43G pixels/s) 34. 85.94ms (19.52G pixels/s) 35. 85.95ms (19.52G pixels/s) 36. 82.15ms (20.42G pixels/s) 37. 82.07ms (20.44G pixels/s) 38. 85.90ms (19.53G pixels/s) 39. 82.09ms (20.44G pixels/s) 40. 85.93ms (19.52G pixels/s) 41. 85.96ms (19.52G pixels/s) 42. 89.84ms (18.67G pixels/s) 43. 82.04ms (20.45G pixels/s) 44. 82.03ms (20.45G pixels/s) 45. 85.97ms (19.51G pixels/s) 46. 82.04ms (20.45G pixels/s) 47. 85.94ms (19.52G pixels/s) 48. 86.05ms (19.50G pixels/s) 49. 82.00ms (20.46G pixels/s) 50. 85.97ms (19.51G pixels/s) 51. 82.12ms (20.43G pixels/s) 52. 85.89ms (19.53G pixels/s) 53. 85.93ms (19.52G pixels/s) 54. 89.90ms (18.66G pixels/s) 55. 81.98ms (20.46G pixels/s) 56. 85.94ms (19.52G pixels/s) 57. 89.95ms (18.65G pixels/s) 58. 82.01ms (20.46G pixels/s) 59. 85.95ms (19.52G pixels/s) 60. 86.05ms (19.50G pixels/s) 61. 82.01ms (20.46G pixels/s) 62. 97.56ms (17.20G pixels/s) 63. 101.35ms (16.55G pixels/s) 64. 112.91ms (14.86G pixels/s) 65. 116.93ms (14.35G pixels/s) 66. 116.78ms (14.37G pixels/s) 67. 116.89ms (14.35G pixels/s) 68. 112.92ms (14.86G pixels/s) 69. 120.73ms (13.90G pixels/s) 70. 116.75ms (14.37G pixels/s) 71. 116.86ms (14.36G pixels/s) 72. 116.77ms (14.37G pixels/s) 73. 112.99ms (14.85G pixels/s) 74. 116.79ms (14.37G pixels/s) 75. 116.91ms (14.35G pixels/s) 76. 
112.91ms (14.86G pixels/s) 77. 116.79ms (14.36G pixels/s) 78. 120.74ms (13.90G pixels/s) 79. 116.81ms (14.36G pixels/s) 80. 124.65ms (13.46G pixels/s) 81. 116.81ms (14.36G pixels/s) 82. 120.76ms (13.89G pixels/s) 83. 116.79ms (14.37G pixels/s) 84. 120.76ms (13.89G pixels/s) 85. 112.90ms (14.86G pixels/s) 86. 116.85ms (14.36G pixels/s) 87. 116.71ms (14.38G pixels/s) 88. 116.88ms (14.35G pixels/s) 89. 112.87ms (14.86G pixels/s) 90. 120.75ms (13.89G pixels/s) 91. 116.75ms (14.37G pixels/s) 92. 113.00ms (14.85G pixels/s) 93. 116.84ms (14.36G pixels/s) 94. 132.31ms (12.68G pixels/s) 95. 128.35ms (13.07G pixels/s) 96. 151.59ms (11.07G pixels/s) 97. 143.71ms (11.67G pixels/s) 98. 147.74ms (11.36G pixels/s) 99. 147.65ms (11.36G pixels/s) 100. 151.61ms (11.07G pixels/s) 101. 143.84ms (11.66G pixels/s) 102. 147.66ms (11.36G pixels/s) 103. 151.60ms (11.07G pixels/s) 104. 143.73ms (11.67G pixels/s) 105. 147.73ms (11.36G pixels/s) 106. 143.84ms (11.66G pixels/s) 107. 147.62ms (11.37G pixels/s) 108. 147.70ms (11.36G pixels/s) 109. 143.76ms (11.67G pixels/s) 110. 147.75ms (11.35G pixels/s) 111. 147.65ms (11.36G pixels/s) 112. 147.68ms (11.36G pixels/s) 113. 147.73ms (11.36G pixels/s) 114. 147.62ms (11.37G pixels/s) 115. 147.68ms (11.36G pixels/s) 116. 143.80ms (11.67G pixels/s) 117. 147.71ms (11.36G pixels/s) 118. 143.83ms (11.66G pixels/s) 119. 147.67ms (11.36G pixels/s) 120. 151.58ms (11.07G pixels/s) 121. 143.73ms (11.67G pixels/s) 122. 147.70ms (11.36G pixels/s) 123. 147.68ms (11.36G pixels/s) 124. 151.53ms (11.07G pixels/s) 125. 147.78ms (11.35G pixels/s) 126. 159.17ms (10.54G pixels/s) 127. 163.07ms (10.29G pixels/s) 128. 178.62ms (9.39G pixels/s)

It was specific to what we were seeing on the sample program that everyone was testing. Curiosity killed the cat; be straightforward and you will get a straight answer. And the Oxide dev, or whoever he is, is not being straightforward: he uses vagaries in his comments about performance differences between the two architectures, when I would expect him to know exactly what is going on.

I made a mistake with the first chart (yesterday's): the data actually come from the 390X, not the Fury-X. Sorry for my blunder. This is the Fury-X.

Ok so I'm looking at the several results being posted from MDolenc's tool and here's what I'm seeing:

1 - All GCN chips seem to present an almost flat compute time of 50-60ms regardless of the number of compute kernels and the rendering task being enabled or not.

2 - a) All nVidia chips seem to present a time that increases with the increase of compute kernels. Maxwell chips show lower compute times than Kepler chips.

b) If rendering task time is X for an nVidia chip and compute time for a given number n of kernels is Y(n), then the "async compute" time for all nVidia chips seems to be very close to Y(n)+X.

If nVidia's chips need to add the rendering time to a compute task with even one active kernel, doesn't this mean that "Async Compute" is not actually working and nVidia's hardware, at least in this test, does not seem to support Async Compute? Even if the driver does allow Async Compute tasks to be done, the hardware just seems to be doing rendering+compute in a serial fashion and not parallel at all.
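The Y(n)+X comparison can be turned into a single number per run. The sketch below is only one possible way to score it, not something MDolenc's tool reports: the name `overlap_fraction` and the scoring formula are assumptions, while the timings are the single-kernel (n = 1) figures from the nVidia results posted in this thread.

```python
def overlap_fraction(graphics_ms, compute_ms, combined_ms):
    """Score how much of the shorter task was hidden behind the longer one.

    0.0 means combined == X + Y (fully serial execution).
    1.0 means combined == max(X, Y) (the shorter task overlapped completely).
    """
    serial = graphics_ms + compute_ms
    hideable = min(graphics_ms, compute_ms)  # at most this much time can be hidden
    if hideable <= 0:
        return 0.0
    return (serial - combined_ms) / hideable


# (graphics only X, compute only Y(1), graphics + compute at n = 1), all in ms,
# copied from the result dumps in this thread.
runs = {
    "GT 730 (GK208)": (240.26, 21.50, 261.56),
    "GTX 780 Ti (GK110)": (37.21, 17.09, 53.49),
    "GTX 980 Ti": (16.40, 11.53, 25.02),
}

for name, (x, y, combined) in runs.items():
    print(f"{name}: overlap fraction {overlap_fraction(x, y, combined):.2f}")
```

The same function can be applied at every n in a run to see whether the score changes as more kernels are dispatched. Run-to-run noise in the standalone timings feeds straight into the score, so small nonzero values should be read with caution.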

One idea I had, if this is an internal processor or potentially a SIMD running a firmware routine, is that it's a 32-slot structure. Particularly if this runs as a shader, there could be a single lane allocated as a sort of queue manager thread. (edit: As an aside, if it were running as an internal compute kernel, that could push Nvidia's implementation outside of what HSA states should be done for queue management, if Nvidia cared.) Even if not running as a shader, a reserved queue for a sort of system-controlled queue manager might be prudent. That would provide a way for something to monitor the queue, and this lane would be able to respond to prompts to kill a queue or suspend it. It could overlap its work with the rest of the lanes, while remaining separate from possible hangs, malformed commands, or floods. The downside, as slots reach the end of the bundle, is what to do at the 32 limit. One way to contain the cost of this without impacting successive sets of 32 is for the management software to double-book one of the lanes, which would take twice as long. (edit: Unless it can opportunistically schedule on a lane that finishes faster.) After that, though, the queue manager's cost is hidden until the hardware's ability to take dispatches is exhausted, which seems far off with the current settings.

I was thinking of other ways to test this. The current method is potentially mixing concurrency testing with asynchronous execution. If the kernel can programmatically shift the loop duration to fractions of itself, it might tease out how coarsely execution is tracked, and whether we're looking at some non-compute issue that is confounding measurements. Something like having the shader loop terminate at 0, 1/4, 1/2, 1, 1.5, 2x, rather than every one behaving the same, might show where the floor is in this measurement. If this could be done at specific counts, it might show if there is something physical to the grouping behavior for Nvidia, if its timing behavior changes based on delays incurred before or after certain limits. Varying the iteration count might give some kind of idea if there is a single-threaded limitation on the GCN ones, or if it really is that insensitive to this level of load. Injecting stalls into the graphics portion and/or some of the dispatches might show how flexibly the GPUs can work around them.
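As a rough sketch of the loop-scaling idea, the helper below builds the per-run iteration counts for the 0, 1/4, 1/2, 1, 1.5, 2x schedule and fits a straight line to the resulting timings. The intercept of that line is one possible estimate of the fixed "floor" that does not depend on shader work. Everything named here (`SCALES`, `iteration_counts`, `fit_floor`, the base iteration count) is hypothetical; the actual test tool would have to be modified to accept a loop count, and a linear model is an assumption about how the timings behave.

```python
# Loop-duration multipliers taken from the proposal above.
SCALES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0]


def iteration_counts(base_iterations, scales=SCALES):
    """Shader loop iteration count to use for each scale factor."""
    return [int(round(base_iterations * s)) for s in scales]


def fit_floor(scales, times_ms):
    """Least-squares fit of time = floor + slope * scale.

    Returns (floor_ms, slope_ms): the time left at zero loop work and
    the time added per 1x of loop work.
    """
    n = len(scales)
    mean_s = sum(scales) / n
    mean_t = sum(times_ms) / n
    var_s = sum((s - mean_s) ** 2 for s in scales)
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scales, times_ms))
    slope = cov / var_s
    floor = mean_t - slope * mean_s
    return floor, slope
```

If the measured times refuse to fall on a line (for example, if they snap to a few discrete values), that would itself hint at coarse-grained tracking rather than a simple fixed overhead.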

If the GPU is juggling two distinct modes internally, it might be that it cannot readily run both at the same time, hence the discussion of an expensive context switch. Rather than at a kernel or wavefront level, it might be a front-end context. If the GPU is also not able to readily preempt the graphics portion, it might be that it will keep its context at the forefront barring an explicit way of yielding priority.

But isn't Async Compute supposed to be a feature where the GPU does not need to juggle between two distinct modes (graphics/compute)? So now we have a second test pointing to what both the Oxide employee and AMD_Robert claimed? BTW, what if we monitor CPU usage during MDolenc's test to see if the "heavy CPU costs" claimed by Kollok are there for Kepler/Maxwell, and then compare for GCN GPUs?