DX12 Performance Discussion And Analysis Thread

Discussion in 'Rendering Technology and APIs' started by A1xLLcqAgt0qc2RyMz0y, Jul 29, 2015.

  1. BlackScout

    Joined:
    Aug 31, 2015
    Messages:
    1
    Likes Received:
    0
    GTX 750 Ti - Driver Version: 10.18.13.5580 / 355.80
    Compute only:
    1. 12.36ms
    2. 12.35ms
    3. 12.24ms
    4. 11.26ms
    5. 11.29ms
    6. 11.24ms
    7. 11.06ms
    8. 10.92ms
    9. 10.90ms
    10. 10.88ms
    11. 10.90ms
    12. 10.90ms
    13. 10.88ms
    14. 10.91ms
    15. 10.90ms
    16. 10.90ms
    17. 21.59ms
    18. 21.59ms
    19. 21.62ms
    20. 21.64ms
    21. 21.64ms
    22. 21.62ms
    23. 21.62ms
    24. 21.60ms
    25. 21.64ms
    26. 21.62ms
    27. 21.64ms
    28. 21.62ms
    29. 21.64ms
    30. 21.61ms
    31. 21.61ms
    32. 32.40ms
    33. 37.72ms
    34. 35.00ms
    35. 40.45ms
    36. 35.03ms
    37. 35.01ms
    38. 34.99ms
    39. 37.74ms
    40. 37.73ms
    41. 35.01ms
    42. 37.76ms
    43. 35.00ms
    44. 35.01ms
    45. 37.75ms
    46. 35.04ms
    47. 37.71ms
    48. 43.05ms
    49. 48.49ms
    50. 45.69ms
    51. 51.13ms
    52. 45.70ms
    53. 48.44ms
    54. 48.49ms
    55. 45.70ms
    56. 48.45ms
    57. 48.43ms
    58. 48.43ms
    59. 51.18ms
    60. 45.74ms
    61. 51.21ms
    62. 48.44ms
    63. 48.42ms
    64. 59.18ms
    65. 56.42ms
    66. 61.85ms
    67. 56.38ms
    68. 64.58ms
    69. 56.39ms
    70. 59.15ms
    71. 56.39ms
    72. 61.90ms
    73. 56.41ms
    74. 56.44ms
    75. 59.13ms
    76. 59.11ms
    77. 59.19ms
    78. 56.42ms
    79. 56.46ms
    80. 69.81ms
    81. 67.17ms
    82. 67.09ms
    83. 69.85ms
    84. 72.57ms
    85. 69.84ms
    86. 72.61ms
    87. 67.09ms
    88. 72.56ms
    89. 67.11ms
    90. 69.87ms
    91. 72.57ms
    92. 69.81ms
    93. 72.58ms
    94. 67.11ms
    95. 72.57ms
    96. 77.85ms
    97. 80.53ms
    98. 85.96ms
    99. 77.79ms
    100. 80.56ms
    101. 85.96ms
    102. 77.82ms
    103. 80.59ms
    104. 83.25ms
    105. 77.79ms
    106. 80.66ms
    107. 80.57ms
    108. 80.53ms
    109. 80.55ms
    110. 80.54ms
    111. 80.52ms
    112. 93.93ms
    113. 91.22ms
    114. 88.51ms
    115. 91.24ms
    116. 91.24ms
    117. 93.96ms
    118. 91.22ms
    119. 91.26ms
    120. 93.93ms
    121. 93.98ms
    122. 91.22ms
    123. 91.24ms
    124. 91.24ms
    125. 88.48ms
    126. 91.22ms
    127. 91.25ms
    128. 104.67ms
    Graphics only: 109.40ms (15.34G pixels/s)
    Graphics + compute:
    1. 120.15ms (13.96G pixels/s)
    2. 120.13ms (13.97G pixels/s)
    3. 120.07ms (13.97G pixels/s)
    4. 120.06ms (13.97G pixels/s)
    5. 120.13ms (13.97G pixels/s)
    6. 120.11ms (13.97G pixels/s)
    7. 120.13ms (13.97G pixels/s)
    8. 120.17ms (13.96G pixels/s)
    9. 120.07ms (13.97G pixels/s)
    10. 120.11ms (13.97G pixels/s)
    11. 120.11ms (13.97G pixels/s)
    12. 120.10ms (13.97G pixels/s)
    13. 120.10ms (13.97G pixels/s)
    14. 120.07ms (13.97G pixels/s)
    15. 120.12ms (13.97G pixels/s)
    16. 120.21ms (13.96G pixels/s)
    17. 130.86ms (12.82G pixels/s)
    18. 130.89ms (12.82G pixels/s)
    19. 130.91ms (12.82G pixels/s)
    20. 130.84ms (12.82G pixels/s)
    21. 130.82ms (12.82G pixels/s)
    22. 130.93ms (12.81G pixels/s)
    23. 130.88ms (12.82G pixels/s)
    24. 130.82ms (12.82G pixels/s)
    25. 130.89ms (12.82G pixels/s)
    26. 130.87ms (12.82G pixels/s)
    27. 130.90ms (12.82G pixels/s)
    28. 130.88ms (12.82G pixels/s)
    29. 130.83ms (12.82G pixels/s)
    30. 130.84ms (12.82G pixels/s)
    31. 130.91ms (12.82G pixels/s)
    32. 141.57ms (11.85G pixels/s)
    33. 146.99ms (11.41G pixels/s)
    34. 144.31ms (11.63G pixels/s)
    35. 144.33ms (11.62G pixels/s)
    36. 149.63ms (11.21G pixels/s)
    37. 144.30ms (11.63G pixels/s)
    38. 141.53ms (11.85G pixels/s)
    39. 144.24ms (11.63G pixels/s)
    40. 147.01ms (11.41G pixels/s)
    41. 144.34ms (11.62G pixels/s)
    42. 146.92ms (11.42G pixels/s)
    43. 144.32ms (11.63G pixels/s)
    44. 141.54ms (11.85G pixels/s)
    45. 147.10ms (11.41G pixels/s)
    46. 148.51ms (11.30G pixels/s)
    47. 144.32ms (11.63G pixels/s)
    48. 152.27ms (11.02G pixels/s)
    49. 157.62ms (10.64G pixels/s)
    50. 154.91ms (10.83G pixels/s)
    51. 163.31ms (10.27G pixels/s)
    52. 158.13ms (10.61G pixels/s)
    53. 160.59ms (10.45G pixels/s)
    54. 159.35ms (10.53G pixels/s)
    55. 158.48ms (10.59G pixels/s)
    56. 152.25ms (11.02G pixels/s)
    57. 163.20ms (10.28G pixels/s)
    58. 158.19ms (10.61G pixels/s)
    59. 163.27ms (10.28G pixels/s)
    60. 160.40ms (10.46G pixels/s)
    61. 154.96ms (10.83G pixels/s)
    62. 154.96ms (10.83G pixels/s)
    63. 157.68ms (10.64G pixels/s)
    64. 165.59ms (10.13G pixels/s)
    65. 168.41ms (9.96G pixels/s)
    66. 168.42ms (9.96G pixels/s)
    67. 165.64ms (10.13G pixels/s)
    68. 168.37ms (9.96G pixels/s)
    69. 168.42ms (9.96G pixels/s)
    70. 162.93ms (10.30G pixels/s)
    71. 168.34ms (9.97G pixels/s)
    72. 168.38ms (9.96G pixels/s)
    73. 165.64ms (10.13G pixels/s)
    74. 168.35ms (9.97G pixels/s)
    75. 165.64ms (10.13G pixels/s)
    76. 165.64ms (10.13G pixels/s)
    77. 168.31ms (9.97G pixels/s)
    78. 165.64ms (10.13G pixels/s)
    79. 162.95ms (10.30G pixels/s)
    80. 176.45ms (9.51G pixels/s)
    81. 179.07ms (9.37G pixels/s)
    82. 173.66ms (9.66G pixels/s)
    83. 176.31ms (9.52G pixels/s)
    84. 179.10ms (9.37G pixels/s)
    85. 173.63ms (9.66G pixels/s)
    86. 173.74ms (9.66G pixels/s)
    87. 179.12ms (9.37G pixels/s)
    88. 176.33ms (9.51G pixels/s)
    89. 173.74ms (9.66G pixels/s)
    90. 179.11ms (9.37G pixels/s)
    91. 176.32ms (9.52G pixels/s)
    92. 179.08ms (9.37G pixels/s)
    93. 179.09ms (9.37G pixels/s)
    94. 173.67ms (9.66G pixels/s)
    95. 179.11ms (9.37G pixels/s)
    96. 184.39ms (9.10G pixels/s)
    97. 184.34ms (9.10G pixels/s)
    98. 189.73ms (8.84G pixels/s)
    99. 187.17ms (8.96G pixels/s)
    100. 187.10ms (8.97G pixels/s)
    101. 184.33ms (9.10G pixels/s)
    102. 189.73ms (8.84G pixels/s)
    103. 187.05ms (8.97G pixels/s)
    104. 187.03ms (8.97G pixels/s)
    105. 187.07ms (8.97G pixels/s)
    106. 203.42ms (8.25G pixels/s)
    107. 184.40ms (9.10G pixels/s)
    108. 187.04ms (8.97G pixels/s)
    109. 187.07ms (8.97G pixels/s)
    110. 187.01ms (8.97G pixels/s)
    111. 187.09ms (8.97G pixels/s)
    112. 200.46ms (8.37G pixels/s)
    113. 197.81ms (8.48G pixels/s)
    114. 206.55ms (8.12G pixels/s)
    115. 197.77ms (8.48G pixels/s)
    116. 197.80ms (8.48G pixels/s)
    117. 195.02ms (8.60G pixels/s)
    118. 203.18ms (8.26G pixels/s)
    119. 195.03ms (8.60G pixels/s)
    120. 200.51ms (8.37G pixels/s)
    121. 197.75ms (8.48G pixels/s)
    122. 203.19ms (8.26G pixels/s)
    123. 195.04ms (8.60G pixels/s)
    124. 200.51ms (8.37G pixels/s)
    125. 195.04ms (8.60G pixels/s)
    126. 203.19ms (8.26G pixels/s)
    127. 200.55ms (8.37G pixels/s)
    128. 253.66ms (6.61G pixels/s)
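    A side note on reading the list above: the G pixels/s figure looks like a fixed pixel count divided by the measured time, so it carries no information beyond the ms value. A quick check in Python, back-solving the implied pixel count from four of the rows above (the helper name is mine, not part of the benchmark):

```python
# Rows copied from the GTX 750 Ti results above: (time in ms, reported G pixels/s).
rows = [
    (109.40, 15.34),  # graphics only
    (120.15, 13.96),  # graphics + compute, 1 kernel
    (130.86, 12.82),  # graphics + compute, 17 kernels
    (253.66, 6.61),   # graphics + compute, 128 kernels
]

def implied_pixels(time_ms, gpixels_per_s):
    """Pixel count implied by one row: throughput * time."""
    return gpixels_per_s * 1e9 * time_ms / 1e3

counts = [implied_pixels(t, g) for t, g in rows]
spread = (max(counts) - min(counts)) / min(counts)

for (t, g), c in zip(rows, counts):
    print(f"{t:7.2f} ms at {g:5.2f} G pixels/s -> {c / 1e9:.3f} G pixels")
print(f"relative spread: {spread:.4%}")
```

    All four rows imply roughly 1.68 G pixels per run, with the small differences down to rounding in the printed figures.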
     
  2. InfYn

    Joined:
    Aug 31, 2015
    Messages:
    1
    Likes Received:
    0
    How'd you do the graph? :p

    Here's mine:
     
  3. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    Well, for the moment, the technical question is maybe just whether Nvidia really doesn't support it, or supports it in a different way (context switching), and if so, whether they could then enable it fully.

    Also, on the question of console ports to PC: if console devs are already investigating it and planning to use it, it will be pretty funny to see PC devs disable it for AMD GPUs... (And in that case, I already fear the tons of flaming threads that will sadly keep us busy in 2016.)

    Let's be honest, I'm not sure all engines / games will really benefit from it, and there's other stuff in DX12 that could benefit both when we compare to DX11.
     
  4. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to CPU or wait for the GPU on CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

    To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting for compute to finish on the CPU. For result timing, use GPU timestamps, not a CPU timer. CPU-side timing of the GPU results in lots of noise and even false results because of driver-related buffering.
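    To make the GPU timestamp point concrete: in D3D12 the usual route is a pair of timestamp queries (`EndQuery` with `D3D12_QUERY_TYPE_TIMESTAMP`) around the work, resolved to a buffer, with the tick rate taken from `ID3D12CommandQueue::GetTimestampFrequency`. The arithmetic afterwards is small. A sketch in Python, where the function name, the tick values and the frequency are made-up for illustration only:

```python
def gpu_ticks_to_ms(begin_ticks, end_ticks, ticks_per_second):
    """Convert a pair of GPU timestamp readings to elapsed milliseconds."""
    if ticks_per_second <= 0:
        raise ValueError("timestamp frequency must be positive")
    if end_ticks < begin_ticks:
        raise ValueError("end timestamp precedes begin timestamp")
    return (end_ticks - begin_ticks) * 1000.0 / ticks_per_second

# Hypothetical readings: a 25 MHz timestamp clock and two resolved query values.
elapsed = gpu_ticks_to_ms(1_000_000, 1_250_000, 25_000_000)
print(f"{elapsed:.2f} ms")
```

    Because both readings come from the GPU's own clock, driver-side buffering of command submission does not leak into the measured interval the way it does with a CPU timer around a fence wait.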
     
  5. Actually, AMD's presence in high-profile development crushes everything else due to GCN being in both Sony and Microsoft consoles (and most probably, next year's Nintendo NX).
    Even if AMD's market share on the PC is rather low nowadays, there isn't any honest reason to keep console-developed async compute benefits from carrying over to DX12 ports.

    We can definitely count on Gameworks to at least try to find a way to keep this from happening, but nVidia won't be able to cover all high-profile console ports with their trojan horse program. Developers with big financial backing (studios from EA, Rockstar, Activision-Blizzard, Microsoft, Valve, etc.) will probably refuse nVidia's attempts when they do their PC ports.

    Of course, if you're running a studio that has a fraction of the budget from the big guys (e.g. CD Projekt) or your budget is low because your publisher has been making terrible decisions over the years (e.g. anything Ubisoft lately), then Gameworks seems really nice because it could save a lot of money.
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    It's not even going to be a "port" soon. XBOX One will be Win 10 / DX12 and a lot of it will be directly transportable between the two. Given the XBOX One's GPU configuration, I would imagine a lot of performance optimization will go into ensuring that it gets the baseline quality / performance, and then the PC maybe gets extra bells and whistles.
     
    Deleted member 13524 likes this.
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    I think the EDRAM plus the HSA-like architecture of the consoles makes a load of console-specific performance-centric design decisions moot in the PC space.

    Also, if publishers hand over console games to some fly-by-night studio which solely has to get the game working on PC, given the art assets and the console gaming experience as a guide, then you get something like Batman: Arkham Knight.
     
  8. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
    That was only the first version. This second one only uses 2 queues and command lists are prepared in advance.

    It already doesn't wait for any copies back to CPU side. It does its timing on CPU side and waits for fences to signal. So that's actually a great idea to use GPU side timestamps! :) That should point out a few more things.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Which seems to be why 128 kernel launches appear to run simultaneously on GCN as, effectively, a single enqueue operation. It still sounds to me as if your code then waits for that single enqueue to drain.

    Obviously you're dependent upon the driver and hardware in this situation and it seems to me that AMD and NVidia are behaving quite differently. My theory is now that NVidia assigns each instance of a kernel that you enqueue to a single queue entry, which is why there's the 32-spaced stepping in the results on GM200. On AMD there is a single queue with all the kernels lined up.

    An alternative test would be to construct a set of 8 different kernels, each of which compiles to independent code and each of which is bound to a distinct output buffer. Issued to 8 distinct queues (round-robin), that should exercise GCN some more. It might also provide evidence of the theories going around that NVidia doesn't handle concurrent compute contexts gracefully.

    Another thing to try is simply to launch substantially more kernels than the count of concurrent work-groups that these GPUs can support, i.e. >2560 on Fiji and erm 1536 on GM200 (16 per SIMD, with 96 SIMDs)? At some point we should see a step up in time on GCN...
     
  10. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    pharma and Nemo like this.
  11. cmdrdredd

    Joined:
    Sep 1, 2015
    Messages:
    1
    Likes Received:
    0
    GTX 970 - Driver Version: 10.18.13.5582 (355.82 WHQL)
    Compute only:
    1. 9.57ms
    2. 9.56ms
    3. 9.61ms
    4. 9.59ms
    5. 9.60ms
    6. 9.60ms
    7. 9.61ms
    8. 9.60ms
    9. 9.61ms
    10. 9.60ms
    11. 9.62ms
    12. 9.61ms
    13. 9.62ms
    14. 9.61ms
    15. 9.61ms
    16. 9.60ms
    17. 9.63ms
    18. 9.62ms
    19. 9.60ms
    20. 9.61ms
    21. 9.62ms
    22. 9.62ms
    23. 9.61ms
    24. 9.65ms
    25. 9.64ms
    26. 9.63ms
    27. 9.67ms
    28. 9.64ms
    29. 9.64ms
    30. 9.63ms
    31. 9.63ms
    32. 19.08ms
    33. 22.23ms
    34. 21.55ms
    35. 19.11ms
    36. 19.11ms
    37. 21.48ms
    38. 21.53ms
    39. 23.91ms
    40. 21.50ms
    41. 19.12ms
    42. 19.12ms
    43. 19.14ms
    44. 21.52ms
    45. 19.15ms
    46. 23.89ms
    47. 19.13ms
    48. 21.50ms
    49. 19.14ms
    50. 19.15ms
    51. 23.89ms
    52. 27.95ms
    53. 24.43ms
    54. 26.93ms
    55. 24.34ms
    56. 24.45ms
    57. 26.94ms
    58. 24.44ms
    59. 24.34ms
    60. 26.82ms
    61. 24.38ms
    62. 26.71ms
    63. 21.55ms
    64. 28.59ms
    65. 33.35ms
    66. 31.08ms
    67. 28.62ms
    68. 28.62ms
    69. 28.62ms
    70. 28.69ms
    71. 31.04ms
    72. 28.64ms
    73. 30.99ms
    74. 35.83ms
    75. 28.66ms
    76. 28.63ms
    77. 28.64ms
    78. 33.46ms
    79. 28.64ms
    80. 28.66ms
    81. 28.64ms
    82. 35.83ms
    83. 28.68ms
    84. 28.65ms
    85. 28.65ms
    86. 33.47ms
    87. 28.66ms
    88. 28.67ms
    89. 28.65ms
    90. 33.49ms
    91. 28.67ms
    92. 33.41ms
    93. 28.64ms
    94. 33.48ms
    95. 31.02ms
    96. 38.09ms
    97. 38.12ms
    98. 42.92ms
    99. 38.15ms
    100. 40.55ms
    101. 42.92ms
    102. 42.91ms
    103. 40.57ms
    104. 42.92ms
    105. 40.50ms
    106. 45.32ms
    107. 42.89ms
    108. 42.90ms
    109. 38.22ms
    110. 42.91ms
    111. 38.15ms
    112. 42.96ms
    113. 38.16ms
    114. 40.52ms
    115. 42.98ms
    116. 38.16ms
    117. 40.51ms
    118. 45.33ms
    119. 40.53ms
    120. 42.90ms
    121. 42.94ms
    122. 38.16ms
    123. 40.53ms
    124. 40.58ms
    125. 40.52ms
    126. 40.53ms
    127. 40.60ms
    128. 47.60ms
    Graphics only: 31.28ms (53.64G pixels/s)
    Graphics + compute:
    1. 40.66ms (41.26G pixels/s)
    2. 40.66ms (41.26G pixels/s)
    3. 40.69ms (41.23G pixels/s)
    4. 40.72ms (41.20G pixels/s)
    5. 40.69ms (41.23G pixels/s)
    6. 40.71ms (41.21G pixels/s)
    7. 40.73ms (41.19G pixels/s)
    8. 40.71ms (41.22G pixels/s)
    9. 40.70ms (41.22G pixels/s)
    10. 40.76ms (41.16G pixels/s)
    11. 40.71ms (41.21G pixels/s)
    12. 40.70ms (41.22G pixels/s)
    13. 40.74ms (41.18G pixels/s)
    14. 40.70ms (41.22G pixels/s)
    15. 40.68ms (41.24G pixels/s)
    16. 40.69ms (41.23G pixels/s)
    17. 40.71ms (41.21G pixels/s)
    18. 40.73ms (41.19G pixels/s)
    19. 40.71ms (41.22G pixels/s)
    20. 40.71ms (41.21G pixels/s)
    21. 40.73ms (41.19G pixels/s)
    22. 40.73ms (41.19G pixels/s)
    23. 40.69ms (41.23G pixels/s)
    24. 40.75ms (41.17G pixels/s)
    25. 40.75ms (41.17G pixels/s)
    26. 40.73ms (41.19G pixels/s)
    27. 40.73ms (41.19G pixels/s)
    28. 40.72ms (41.20G pixels/s)
    29. 40.72ms (41.21G pixels/s)
    30. 40.74ms (41.18G pixels/s)
    31. 40.72ms (41.21G pixels/s)
    32. 50.19ms (33.43G pixels/s)
    33. 52.66ms (31.86G pixels/s)
    34. 50.25ms (33.38G pixels/s)
    35. 50.32ms (33.34G pixels/s)
    36. 52.63ms (31.88G pixels/s)
    37. 50.25ms (33.38G pixels/s)
    38. 50.24ms (33.39G pixels/s)
    39. 50.25ms (33.39G pixels/s)
    40. 50.33ms (33.33G pixels/s)
    41. 50.25ms (33.39G pixels/s)
    42. 50.26ms (33.38G pixels/s)
    43. 50.24ms (33.39G pixels/s)
    44. 50.27ms (33.38G pixels/s)
    45. 50.30ms (33.35G pixels/s)
    46. 50.27ms (33.38G pixels/s)
    47. 50.34ms (33.33G pixels/s)
    48. 50.29ms (33.36G pixels/s)
    49. 50.26ms (33.38G pixels/s)
    50. 52.68ms (31.85G pixels/s)
    51. 50.26ms (33.38G pixels/s)
    52. 52.71ms (31.83G pixels/s)
    53. 50.27ms (33.38G pixels/s)
    54. 50.23ms (33.40G pixels/s)
    55. 52.67ms (31.85G pixels/s)
    56. 52.64ms (31.87G pixels/s)
    57. 52.73ms (31.82G pixels/s)
    58. 50.28ms (33.37G pixels/s)
    59. 50.36ms (33.32G pixels/s)
    60. 50.27ms (33.38G pixels/s)
    61. 50.22ms (33.41G pixels/s)
    62. 52.65ms (31.87G pixels/s)
    63. 50.27ms (33.37G pixels/s)
    64. 62.10ms (27.02G pixels/s)
    65. 59.71ms (28.10G pixels/s)
    66. 62.17ms (26.99G pixels/s)
    67. 59.76ms (28.08G pixels/s)
    68. 59.81ms (28.05G pixels/s)
    69. 59.78ms (28.07G pixels/s)
    70. 62.22ms (26.96G pixels/s)
    71. 62.13ms (27.01G pixels/s)
    72. 62.17ms (26.99G pixels/s)
    73. 59.77ms (28.07G pixels/s)
    74. 62.19ms (26.98G pixels/s)
    75. 59.77ms (28.07G pixels/s)
    76. 64.60ms (25.97G pixels/s)
    77. 59.72ms (28.09G pixels/s)
    78. 62.21ms (26.97G pixels/s)
    79. 59.74ms (28.08G pixels/s)
    80. 62.23ms (26.96G pixels/s)
    81. 59.77ms (28.07G pixels/s)
    82. 62.21ms (26.97G pixels/s)
    83. 59.77ms (28.07G pixels/s)
    84. 62.17ms (26.98G pixels/s)
    85. 59.79ms (28.06G pixels/s)
    86. 62.18ms (26.98G pixels/s)
    87. 59.78ms (28.07G pixels/s)
    88. 62.18ms (26.98G pixels/s)
    89. 59.78ms (28.07G pixels/s)
    90. 62.22ms (26.96G pixels/s)
    91. 59.76ms (28.08G pixels/s)
    92. 62.19ms (26.98G pixels/s)
    93. 59.78ms (28.06G pixels/s)
    94. 62.22ms (26.97G pixels/s)
    95. 59.77ms (28.07G pixels/s)
    96. 69.25ms (24.23G pixels/s)
    97. 69.23ms (24.24G pixels/s)
    98. 69.23ms (24.23G pixels/s)
    99. 71.68ms (23.41G pixels/s)
    100. 69.28ms (24.22G pixels/s)
    101. 74.10ms (22.64G pixels/s)
    102. 69.28ms (24.22G pixels/s)
    103. 69.34ms (24.20G pixels/s)
    104. 69.33ms (24.20G pixels/s)
    105. 71.67ms (23.41G pixels/s)
    106. 74.10ms (22.64G pixels/s)
    107. 69.28ms (24.22G pixels/s)
    108. 74.11ms (22.64G pixels/s)
    109. 69.29ms (24.21G pixels/s)
    110. 71.71ms (23.40G pixels/s)
    111. 71.74ms (23.38G pixels/s)
    112. 69.24ms (24.23G pixels/s)
    113. 71.70ms (23.40G pixels/s)
    114. 69.28ms (24.22G pixels/s)
    115. 69.35ms (24.19G pixels/s)
    116. 71.71ms (23.40G pixels/s)
    117. 69.30ms (24.21G pixels/s)
    118. 71.77ms (23.38G pixels/s)
    119. 69.26ms (24.22G pixels/s)
    120. 69.35ms (24.19G pixels/s)
    121. 69.24ms (24.23G pixels/s)
    122. 69.32ms (24.20G pixels/s)
    123. 69.26ms (24.22G pixels/s)
    124. 69.26ms (24.22G pixels/s)
    125. 71.71ms (23.40G pixels/s)
    126. 71.66ms (23.41G pixels/s)
    127. 71.74ms (23.39G pixels/s)
    128. 78.74ms (21.31G pixels/s)
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
  13. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    Could some of you experts in the field help us laymen interpret the data correctly?

    For clarity's sake, are these analyses correct?

    https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

    https://www.reddit.com/r/oculus/comments/3j5h9y/put_that_popcorn_away_nvidia_maxwell_does/

    Some are saying that Maxwell is superior to GCN for compute, or that GCN itself is doing compute + graphics serially due to the time/ms count being much higher than NV's GPU.
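    One rough way to read the numbers posted so far: if graphics and compute ran fully concurrently, the combined time should be close to the longer of the two alone; if they ran back to back, it should be close to their sum. A sketch in Python using the 16-kernel rows from the two NVIDIA results posted above (the function name and the "overlap fraction" framing are my own, not something the benchmark reports):

```python
def overlap_fraction(compute_ms, graphics_ms, combined_ms):
    """0.0 = combined time equals the sum (serial); 1.0 = equals the max (fully hidden)."""
    serial = compute_ms + graphics_ms
    ideal = max(compute_ms, graphics_ms)
    if serial == ideal:
        raise ValueError("one of the workloads takes no time; nothing to overlap")
    return (serial - combined_ms) / (serial - ideal)

# (compute only, graphics only, graphics + compute) at 16 kernels, in ms.
results = {
    "GTX 750 Ti": (10.90, 109.40, 120.21),
    "GTX 970": (9.60, 31.28, 40.69),
}

for card, (c, g, both) in results.items():
    print(f"{card}: sum {c + g:.2f} ms, combined {both:.2f} ms, "
          f"overlap {overlap_fraction(c, g, both):.0%}")
```

    For these two sets of numbers the combined time sits within a couple of percent of the plain sum, i.e. near the serial end. That says nothing about GCN, and the timings are CPU-side, so treat it as a reading aid rather than a verdict.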
     
  14. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    I agree with that, Jawed, but this is the way the industry is going, so unfortunately for inexperienced programmers it's a much steeper learning curve where APIs and coding standards are concerned.
     
  15. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Still going over that ;)

    It seems that nV's hardware is better for compute on a per-unit basis, as with graphics. With graphics plus compute there seem to be some issues, but I think we need more data to make a conclusive assessment of what is going on.
     
  16. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    629
    Likes Received:
    1,131
    Location:
    PCIe x16_1
    For inexperienced programmers, the idea is really to use an off-the-shelf engine (Unity, Unreal, etc.) rather than rolling your own. Middleware is essentially the new high-level API.
     
    Lightman and Jawed like this.
  17. RedditUserB

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    24
    Likes Received:
    1
    Thank you.

    So in the meantime, is it accurate to use the results/data as a benchmark to compare how effective/good the architectures are in relation to compute, graphics or async compute, or is the test a case of function present/absent based on expected outcomes?
     
  18. dogen

    Regular

    Joined:
    Oct 27, 2014
    Messages:
    340
    Likes Received:
    260
    Doesn't seem like it. It definitely looks like something odd is going on with the AMD results.
     
  19. Forceman

    Newcomer

    Joined:
    Dec 23, 2010
    Messages:
    11
    Likes Received:
    10
    Why do the Maxwell cards take longer as the number of batches (or commands) increases, even in the compute only pass? Shouldn't that be running synchronously, and not be impacted by whether Maxwell supports async compute or not?
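    For reference, the stepping in question can be pulled out of the GTX 970 compute-only list above by bucketing the kernel counts in groups of 32 and taking the fastest time per bucket. A sketch in Python using ten rows copied from that list (the bucket width of 32 is an assumption taken from the pattern people have been describing, not something the driver documents):

```python
# (kernel count, time in ms) copied from the GTX 970 compute-only results.
samples = [
    (1, 9.57), (16, 9.60), (31, 9.63),
    (32, 19.08), (63, 21.55),
    (64, 28.59), (95, 31.02),
    (96, 38.09), (127, 40.60),
    (128, 47.60),
]

BUCKET = 32  # assumed step width

# Keep the fastest observed time for each bucket of 32 kernel counts.
fastest = {}
for count, ms in samples:
    bucket = count // BUCKET
    fastest[bucket] = min(ms, fastest.get(bucket, ms))

floors = [fastest[b] for b in sorted(fastest)]
steps = [round(b - a, 2) for a, b in zip(floors, floors[1:])]

print("floor per bucket (ms):", floors)
print("step between buckets (ms):", steps)
```

    Each step comes out at about 9.5 ms, roughly the time of a single kernel, which fits the idea that kernels beyond each batch wait for the previous batch to finish. It does not by itself explain why.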
     
  20. Forceman

    Newcomer

    Joined:
    Dec 23, 2010
    Messages:
    11
    Likes Received:
    10
    And why does the 750Ti show the same 32 command pattern as the 9 series? Shouldn't it be different since it doesn't support even the 31+1 of the 9 series?
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.