DX12 Performance Discussion And Analysis Thread

If that were the case, why is Fiji's performance right around where it should be, hanging around the 980 Ti? If what that Oxide dev stated is correct, I would expect it to crush the 980 Ti. Not to mention that he says he "thinks" so; he doesn't know, he is guessing. If I had access to that alpha/beta game, I would run it through a profiler and find out exactly what the difference is. It wouldn't take more than 5 minutes, and you don't even need to be an Oxide dev to see that.

Seems like AMD let go of a golden opportunity to pay back in kind.

As for async shaders, I thought it was merely bringing concurrent kernel execution to DX, so it sounds strange that Maxwell doesn't do it or doesn't do it well enough.
 
Yeah, that's a possibility too, as sebbbi stated:

Async compute is a little bit like hyperthreading. It helps some workloads a lot, while it (slightly) hurts some workloads. Because every GPU has slightly different bottlenecks, it is very hard to write async compute code that benefits them all. Console engines will of course be optimized first for GCN hardware. This might result in gains on the PC side as well, but it is too early to say really, since even AMD PC GPUs have different bottlenecks (config is not identical to consoles). Nobody has shipped a DX12 PC console port yet and the DX12 drivers are still quite immature.
 
As for async shaders, I thought it was merely bringing concurrent kernel execution to DX, so it sounds strange that Maxwell doesn't do it or doesn't do it well enough.
It's not just about concurrent kernel execution. Kepler can already do that, and probably all other DX12-capable hardware can too. The main advantage is running compute kernels in parallel with graphics rendering. As has already been mentioned, an example is doing some ROP-heavy rendering while running a compute-heavy compute shader at the same time.
 
It's not just about concurrent kernel execution. Kepler can already do that, and probably all other DX12-capable hardware can too. The main advantage is running compute kernels in parallel with graphics rendering. As has already been mentioned, an example is doing some ROP-heavy rendering while running a compute-heavy compute shader at the same time.
As far as I know, talking about current NV hardware, only Maxwell 2.0 should benefit from executing compute work and graphics work concurrently. All other hardware (Fermi, Kepler, Maxwell 1.0) should not see any noticeable performance improvement (nor any noticeable performance loss).

Is that a possible source of slowdowns, that might be driver-correctable?
On all GCN hardware, compute and graphics share many hardware resources, so to take full advantage of asynchronous operations engines should run complementary work concurrently (i.e. running graphics and compute on the same resources at the same time is not a great idea). Thanks to the great public documentation, I am not so sure about Maxwell 2.0 (but I guess it is something similar).
Asynchronous copy operation should be beneficial for all hardware (and probably even older non-DX12 hardware).
 
It is practically the same that happens on CPUs. You can create any amount of threads (queues) even if your CPU (GPU) just has a single core (runs a single command stream). If more threads (queues) are active at the same time than are supported by the hardware, they will be periodically context switched.
And this is why I've always felt it was an oversight not to allow queries for programs to see what kind of async functionality is available. In the long run all architectures will have oodles of async support, but right now not even Gen9 supports everything, and meanwhile Kepler + Maxwell 1 is going to be with us for a long time to come.

There are going to be async tasks that are suboptimal on non-async GPUs, and gamers will be counting on developers to optimize their games for these GPUs.
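
To make the CPU-thread analogy quoted above concrete, here is a rough sketch in Python. It is a toy model only: `run_queues`, `hw_slots` and `slice_ms` are made-up names, context-switch cost is ignored, and no real driver or GPU scheduler is claimed to work this way. It just shows that queues beyond what the hardware can run at once get time-sliced, so total time ends up as the sum of the work.

```python
from collections import deque

def run_queues(work_ms, hw_slots=1, slice_ms=1.0):
    """Round-robin software queues onto hw_slots hardware streams.

    work_ms  -- list with the amount of work (ms) submitted to each queue
    Returns the time at which each queue finishes.
    """
    remaining = list(work_ms)
    finish = [0.0] * len(work_ms)
    ready = deque(i for i, w in enumerate(remaining) if w > 0)
    clock = 0.0
    while ready:
        # Run as many queues as the hardware has slots for this slice.
        running = [ready.popleft() for _ in range(min(hw_slots, len(ready)))]
        step = min([slice_ms] + [remaining[i] for i in running])
        clock += step
        for i in running:
            remaining[i] -= step
            if remaining[i] > 1e-9:
                ready.append(i)      # not done: back of the line
            else:
                finish[i] = clock    # done: record completion time
    return finish

one_stream = run_queues([10.0, 10.0], hw_slots=1)
two_streams = run_queues([10.0, 10.0], hw_slots=2)
print(max(one_stream), max(two_streams))
```

With a single hardware stream the two queues finish after the combined 20 ms of work; with two streams both finish after 10 ms. An application cannot tell the two cases apart from the API alone, which is the point about missing capability queries.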
 
So here's the updated version of the benchmark. This time it runs compute only, graphics only and compute + graphics. This time just two command queues.
GTX 680:
Compute only:
1. 17.91ms
2. 18.03ms
3. 17.90ms
4. 17.98ms
5. 18.05ms
6. 18.09ms
7. 18.02ms
8. 18.03ms
9. 35.49ms
...
Graphics only: 50.75ms (33.06G pixels/s)
Graphics + compute:
1. 68.12ms (24.63G pixels/s)
2. 68.20ms (24.60G pixels/s)
3. 68.23ms (24.59G pixels/s)
4. 68.15ms (24.62G pixels/s)
5. 68.26ms (24.58G pixels/s)
6. 68.14ms (24.62G pixels/s)
7. 68.48ms (24.50G pixels/s)
8. 68.09ms (24.64G pixels/s)
9. 85.80ms (19.55G pixels/s)
...
So multiple dispatches do run concurrently, yes, and graphics + compute takes, as expected, the time of graphics plus compute (not async).
Anyone willing to give it a try on Maxwell 2 or GCN?
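
For anyone comparing their own numbers, one way to read the three timings is to check whether the combined time sits closer to the sum (serial) or to the longer of the two (overlapped). A minimal sketch in Python, where `overlap_fraction` is just a helper name made up here and the inputs are the first GTX 680 rows from above:

```python
def overlap_fraction(compute_ms, graphics_ms, combined_ms):
    """0.0 = combined time equals compute + graphics (fully serial),
    1.0 = combined time equals the longer of the two (fully overlapped)."""
    serial = compute_ms + graphics_ms
    ideal = max(compute_ms, graphics_ms)
    hideable = serial - ideal          # time that overlap could hide
    if hideable <= 0:
        return 0.0
    frac = (serial - combined_ms) / hideable
    return max(0.0, min(1.0, frac))    # clamp measurement noise

# GTX 680: compute 17.91 ms, graphics 50.75 ms, graphics + compute 68.12 ms
gtx680 = overlap_fraction(17.91, 50.75, 68.12)
print(f"GTX 680 overlap fraction: {gtx680:.2f}")
```

For the GTX 680 rows this comes out at roughly 0.03, i.e. essentially serial execution. It is only a rough indicator, since it assumes the two solo timings are representative of what each workload costs when run together.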
 

Attachment: AsyncCompute.zip (16.5 KB)

GCN 1.0 Tahiti - driver version 15.200.1062.1002

Compute only:
1. 58.64ms
2. 58.63ms
3. 58.63ms
4. 58.63ms
5. 58.64ms
6. 58.64ms
7. 58.64ms
8. 58.63ms
9. 58.64ms
10. 58.63ms
11. 58.64ms
12. 58.64ms
13. 58.64ms
14. 58.64ms
15. 58.63ms
16. 58.64ms
17. 58.63ms
18. 58.64ms
19. 58.63ms
20. 58.64ms
21. 58.63ms
22. 58.64ms
23. 58.63ms
24. 58.64ms
25. 58.64ms
26. 58.64ms
27. 58.63ms
28. 58.64ms
29. 58.64ms
30. 58.64ms
31. 58.64ms
32. 58.64ms
33. 58.64ms
34. 58.64ms
35. 58.64ms
36. 58.63ms
37. 58.64ms
38. 58.64ms
39. 58.64ms
40. 58.63ms
41. 58.64ms
42. 58.64ms
43. 58.64ms
44. 58.64ms
45. 58.64ms
46. 58.64ms
47. 58.63ms
48. 58.64ms
49. 58.63ms
50. 58.64ms
51. 58.64ms
52. 58.64ms
53. 58.64ms
54. 58.64ms
55. 58.64ms
56. 58.64ms
57. 58.64ms
58. 58.64ms
59. 58.63ms
60. 58.64ms
61. 58.64ms
62. 58.64ms
63. 58.64ms
64. 58.64ms
65. 58.64ms
66. 58.64ms
67. 58.64ms
68. 58.64ms
69. 58.64ms
70. 58.64ms
71. 58.64ms
72. 58.64ms
73. 58.64ms
74. 58.64ms
75. 58.64ms
76. 58.64ms
77. 58.64ms
78. 58.63ms
79. 58.64ms
80. 58.64ms
81. 58.64ms
82. 58.64ms
83. 58.64ms
84. 58.64ms
85. 58.64ms
86. 58.65ms
87. 58.64ms
88. 58.64ms
89. 58.64ms
90. 58.64ms
91. 58.64ms
92. 58.64ms
93. 58.64ms
94. 58.64ms
95. 58.63ms
96. 58.64ms
97. 58.64ms
98. 58.64ms
99. 58.64ms
100. 58.64ms
101. 58.64ms
102. 58.64ms
103. 58.64ms
104. 58.64ms
105. 58.64ms
106. 58.64ms
107. 58.65ms
108. 58.64ms
109. 58.64ms
110. 58.64ms
111. 58.64ms
112. 58.64ms
113. 58.64ms
114. 58.64ms
115. 58.64ms
116. 58.64ms
117. 58.64ms
118. 58.64ms
119. 58.64ms
120. 58.65ms
121. 58.64ms
122. 58.64ms
123. 58.66ms
124. 58.64ms
125. 58.64ms
126. 58.65ms
127. 58.64ms
128. 58.65ms
Graphics only: 56.41ms (29.74G pixels/s)
Graphics + compute:
1. 58.85ms (28.51G pixels/s)
2. 58.82ms (28.52G pixels/s)
3. 58.84ms (28.51G pixels/s)
4. 58.86ms (28.51G pixels/s)
5. 58.87ms (28.50G pixels/s)
6. 58.85ms (28.51G pixels/s)
7. 58.86ms (28.50G pixels/s)
8. 58.85ms (28.51G pixels/s)
9. 58.87ms (28.50G pixels/s)
10. 58.88ms (28.50G pixels/s)
11. 58.86ms (28.51G pixels/s)
12. 58.85ms (28.51G pixels/s)
13. 58.87ms (28.50G pixels/s)
14. 58.86ms (28.50G pixels/s)
15. 58.87ms (28.50G pixels/s)
16. 58.88ms (28.49G pixels/s)
17. 58.88ms (28.50G pixels/s)
18. 58.87ms (28.50G pixels/s)
19. 58.88ms (28.49G pixels/s)
20. 58.87ms (28.50G pixels/s)
21. 58.88ms (28.49G pixels/s)
22. 58.88ms (28.49G pixels/s)
23. 58.89ms (28.49G pixels/s)
24. 58.88ms (28.49G pixels/s)
25. 58.88ms (28.49G pixels/s)
26. 58.87ms (28.50G pixels/s)
27. 58.87ms (28.50G pixels/s)
28. 58.89ms (28.49G pixels/s)
29. 58.88ms (28.50G pixels/s)
30. 58.87ms (28.50G pixels/s)
31. 58.87ms (28.50G pixels/s)
32. 58.89ms (28.49G pixels/s)
33. 58.87ms (28.50G pixels/s)
34. 58.86ms (28.50G pixels/s)
35. 58.88ms (28.50G pixels/s)
36. 58.88ms (28.49G pixels/s)
37. 58.89ms (28.49G pixels/s)
38. 58.87ms (28.50G pixels/s)
39. 58.89ms (28.49G pixels/s)
40. 58.90ms (28.48G pixels/s)
41. 58.89ms (28.49G pixels/s)
42. 58.90ms (28.48G pixels/s)
43. 58.89ms (28.49G pixels/s)
44. 58.89ms (28.49G pixels/s)
45. 58.90ms (28.48G pixels/s)
46. 58.89ms (28.49G pixels/s)
47. 58.89ms (28.49G pixels/s)
48. 58.90ms (28.48G pixels/s)
49. 58.90ms (28.48G pixels/s)
50. 58.90ms (28.48G pixels/s)
51. 58.90ms (28.49G pixels/s)
52. 58.90ms (28.48G pixels/s)
53. 58.90ms (28.48G pixels/s)
54. 58.91ms (28.48G pixels/s)
55. 58.89ms (28.49G pixels/s)
56. 58.90ms (28.48G pixels/s)
57. 58.92ms (28.48G pixels/s)
58. 58.90ms (28.49G pixels/s)
59. 58.90ms (28.49G pixels/s)
60. 58.89ms (28.49G pixels/s)
61. 58.90ms (28.48G pixels/s)
62. 58.89ms (28.49G pixels/s)
63. 58.90ms (28.48G pixels/s)
64. 58.90ms (28.49G pixels/s)
65. 58.90ms (28.48G pixels/s)
66. 58.90ms (28.48G pixels/s)
67. 58.90ms (28.49G pixels/s)
68. 58.90ms (28.48G pixels/s)
69. 58.89ms (28.49G pixels/s)
70. 58.91ms (28.48G pixels/s)
71. 58.90ms (28.48G pixels/s)
72. 58.91ms (28.48G pixels/s)
73. 58.90ms (28.48G pixels/s)
74. 58.90ms (28.49G pixels/s)
75. 58.91ms (28.48G pixels/s)
76. 58.91ms (28.48G pixels/s)
77. 58.90ms (28.48G pixels/s)
78. 58.91ms (28.48G pixels/s)
79. 58.90ms (28.48G pixels/s)
80. 58.91ms (28.48G pixels/s)
81. 58.90ms (28.48G pixels/s)
82. 58.91ms (28.48G pixels/s)
83. 58.90ms (28.49G pixels/s)
84. 58.89ms (28.49G pixels/s)
85. 58.90ms (28.48G pixels/s)
86. 58.90ms (28.49G pixels/s)
87. 58.90ms (28.48G pixels/s)
88. 58.90ms (28.49G pixels/s)
89. 58.90ms (28.48G pixels/s)
90. 58.90ms (28.48G pixels/s)
91. 58.89ms (28.49G pixels/s)
92. 58.90ms (28.48G pixels/s)
93. 58.90ms (28.48G pixels/s)
94. 58.90ms (28.49G pixels/s)
95. 58.91ms (28.48G pixels/s)
96. 58.90ms (28.48G pixels/s)
97. 58.95ms (28.46G pixels/s)
98. 58.91ms (28.48G pixels/s)
99. 58.92ms (28.48G pixels/s)
100. 58.91ms (28.48G pixels/s)
101. 58.91ms (28.48G pixels/s)
102. 58.91ms (28.48G pixels/s)
103. 58.90ms (28.48G pixels/s)
104. 58.90ms (28.48G pixels/s)
105. 58.91ms (28.48G pixels/s)
106. 58.90ms (28.48G pixels/s)
107. 58.90ms (28.48G pixels/s)
108. 58.92ms (28.48G pixels/s)
109. 58.90ms (28.48G pixels/s)
110. 58.90ms (28.48G pixels/s)
111. 58.90ms (28.48G pixels/s)
112. 58.90ms (28.48G pixels/s)
113. 58.92ms (28.48G pixels/s)
114. 58.96ms (28.46G pixels/s)
115. 58.98ms (28.44G pixels/s)
116. 58.97ms (28.45G pixels/s)
117. 58.99ms (28.44G pixels/s)
118. 59.00ms (28.44G pixels/s)
119. 59.02ms (28.42G pixels/s)
120. 59.06ms (28.41G pixels/s)
121. 59.07ms (28.40G pixels/s)
122. 59.03ms (28.42G pixels/s)
123. 59.02ms (28.43G pixels/s)
124. 59.04ms (28.42G pixels/s)
125. 59.03ms (28.42G pixels/s)
126. 59.03ms (28.42G pixels/s)
127. 59.06ms (28.41G pixels/s)
128. 59.02ms (28.43G pixels/s)

I guess that numthreads(1, 1, 1) is not the best set-up for AMD hardware...
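
A quick back-of-the-envelope check of why that would hurt, as a Python sketch. It assumes the usual simplified model in which every thread group occupies at least one whole wavefront (64 lanes on GCN) or warp (32 lanes on NVIDIA); `lane_utilization` is a name made up for this post.

```python
import math

def lane_utilization(threads_per_group, wave_size):
    """Fraction of SIMD lanes doing useful work for one thread group,
    assuming the group is padded up to whole waves."""
    waves = math.ceil(threads_per_group / wave_size)
    return threads_per_group / (waves * wave_size)

gcn = lane_utilization(1, 64)      # numthreads(1, 1, 1) on a 64-wide wavefront
nv = lane_utilization(1, 32)       # numthreads(1, 1, 1) on a 32-wide warp
full = lane_utilization(64, 64)    # numthreads(64, 1, 1) on GCN
print(gcn, nv, full)
```

Under that assumption a single-thread group uses 1/64 of a GCN wavefront and 1/32 of an NVIDIA warp, so the compute-only times here say more about dispatch latency than about ALU throughput.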
 
Fable Legends closed beta starts ....

 
So here's the updated version of the benchmark.
What kind of workloads are you running on each thread? It is super easy to construct workloads with zero potential gains.

Use cases to avoid:
- Both queues run the same shader. Obviously this has zero advantage as bottlenecks are identical.
- Both queues are bottlenecked by the same resource (bandwidth, ALU, sampler cycles).

Good use cases:
- One queue is bandwidth heavy and other is ALU heavy.
- Graphics queue is fixed function heavy and you run compute in the other queue. For example high poly shadow map rendering (primitive setup + ROP, no pixel shader).
- One queue is running long lasting computation and the other is running small single lane tasks with data dependencies. The long running background task keeps the GPU occupied while the other queue waits for synchronization.
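
The first cases in both lists can be illustrated with a toy bottleneck model in Python. This is a deliberately idealized sketch: each workload is described by how many milliseconds it would need on each resource alone, the resource names and numbers are invented for illustration, and real hardware will not overlap this cleanly.

```python
def solo_time(work):
    """A workload running alone is limited by its most-used resource."""
    return max(work.values())

def serial_time(a, b):
    """Run one workload after the other (no async)."""
    return solo_time(a) + solo_time(b)

def concurrent_time(a, b):
    """Idealized overlap: each resource has to serve both workloads,
    and the busiest resource sets the total time."""
    resources = set(a) | set(b)
    return max(a.get(r, 0.0) + b.get(r, 0.0) for r in resources)

alu_heavy = {"alu": 10.0, "bandwidth": 2.0}
bw_heavy = {"alu": 2.0, "bandwidth": 10.0}

# Good case: complementary bottlenecks.
print(serial_time(alu_heavy, bw_heavy), concurrent_time(alu_heavy, bw_heavy))
# Case to avoid: both queues run the same shader.
print(serial_time(alu_heavy, alu_heavy), concurrent_time(alu_heavy, alu_heavy))
```

In this model the complementary pair drops from 20 ms serial to 12 ms concurrent, while two copies of the same shader stay at 20 ms either way, which matches the "zero advantage" case in the list.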
 
Radeon 290X Hawaii GCN 1.1
Driver version: 15.20.1062.1004
Compute only:
1. 55.17ms
2. 55.14ms
3. 55.13ms
4. 55.13ms
5. 55.15ms
6. 55.14ms
7. 55.14ms
8. 55.14ms
9. 55.14ms
10. 55.14ms
11. 55.13ms
12. 55.14ms
13. 55.15ms
14. 55.14ms
15. 55.13ms
16. 55.14ms
17. 55.15ms
18. 55.14ms
19. 55.13ms
20. 55.13ms
21. 55.15ms
22. 55.13ms
23. 55.14ms
24. 55.13ms
25. 55.14ms
26. 55.13ms
27. 55.13ms
28. 55.14ms
29. 55.14ms
30. 55.14ms
31. 55.13ms
32. 55.14ms
33. 55.13ms
34. 55.13ms
35. 55.16ms
36. 55.14ms
37. 55.14ms
38. 55.13ms
39. 55.15ms
40. 55.14ms
41. 55.16ms
42. 55.13ms
43. 55.14ms
44. 55.14ms
45. 55.13ms
46. 55.14ms
47. 55.13ms
48. 55.14ms
49. 55.15ms
50. 55.13ms
51. 55.14ms
52. 55.14ms
53. 55.14ms
54. 55.14ms
55. 55.14ms
56. 55.13ms
57. 55.15ms
58. 55.14ms
59. 55.15ms
60. 55.13ms
61. 55.14ms
62. 55.15ms
63. 55.15ms
64. 55.14ms
65. 55.14ms
66. 55.15ms
67. 55.13ms
68. 55.14ms
69. 55.13ms
70. 55.14ms
71. 55.14ms
72. 55.14ms
73. 55.14ms
74. 55.14ms
75. 55.15ms
76. 55.15ms
77. 55.16ms
78. 55.13ms
79. 55.14ms
80. 55.14ms
81. 55.14ms
82. 55.14ms
83. 55.13ms
84. 55.15ms
85. 55.14ms
86. 55.16ms
87. 55.13ms
88. 55.14ms
89. 55.14ms
90. 55.16ms
91. 55.14ms
92. 55.13ms
93. 55.15ms
94. 55.18ms
95. 55.15ms
96. 55.15ms
97. 55.15ms
98. 55.15ms
99. 55.16ms
100. 55.15ms
101. 55.14ms
102. 55.15ms
103. 55.14ms
104. 55.16ms
105. 55.14ms
106. 55.16ms
107. 55.15ms
108. 55.15ms
109. 55.14ms
110. 55.14ms
111. 55.16ms
112. 55.18ms
113. 55.18ms
114. 55.15ms
115. 55.15ms
116. 55.14ms
117. 55.15ms
118. 55.17ms
119. 55.15ms
120. 55.16ms
121. 55.15ms
122. 55.15ms
123. 55.14ms
124. 55.15ms
125. 55.14ms
126. 55.15ms
127. 55.16ms
128. 55.15ms
Graphics only: 26.98ms (62.18G pixels/s)
Graphics + compute:
1. 55.62ms (30.16G pixels/s)
2. 55.44ms (30.26G pixels/s)
3. 55.52ms (30.22G pixels/s)
4. 55.44ms (30.26G pixels/s)
5. 55.49ms (30.24G pixels/s)
6. 55.50ms (30.23G pixels/s)
7. 55.23ms (30.37G pixels/s)
8. 55.49ms (30.24G pixels/s)
9. 55.97ms (29.98G pixels/s)
10. 56.55ms (29.67G pixels/s)
11. 55.74ms (30.10G pixels/s)
12. 55.77ms (30.08G pixels/s)
13. 55.69ms (30.13G pixels/s)
14. 56.24ms (29.83G pixels/s)
15. 55.76ms (30.09G pixels/s)
16. 56.43ms (29.73G pixels/s)
17. 55.26ms (30.36G pixels/s)
18. 56.42ms (29.74G pixels/s)
19. 55.82ms (30.06G pixels/s)
20. 56.71ms (29.59G pixels/s)
21. 55.25ms (30.37G pixels/s)
22. 56.56ms (29.66G pixels/s)
23. 56.09ms (29.91G pixels/s)
24. 55.29ms (30.35G pixels/s)
25. 56.88ms (29.49G pixels/s)
26. 56.04ms (29.94G pixels/s)
27. 57.03ms (29.42G pixels/s)
28. 55.89ms (30.02G pixels/s)
29. 56.94ms (29.46G pixels/s)
30. 56.97ms (29.45G pixels/s)
31. 57.41ms (29.22G pixels/s)
32. 56.41ms (29.74G pixels/s)
33. 56.43ms (29.73G pixels/s)
34. 56.43ms (29.73G pixels/s)
35. 57.74ms (29.06G pixels/s)
36. 56.28ms (29.81G pixels/s)
37. 55.25ms (30.37G pixels/s)
38. 57.74ms (29.06G pixels/s)
39. 56.68ms (29.60G pixels/s)
40. 56.71ms (29.59G pixels/s)
41. 56.10ms (29.91G pixels/s)
42. 56.45ms (29.72G pixels/s)
43. 56.81ms (29.53G pixels/s)
44. 56.62ms (29.63G pixels/s)
45. 57.80ms (29.03G pixels/s)
46. 55.24ms (30.37G pixels/s)
47. 58.05ms (28.90G pixels/s)
48. 56.90ms (29.49G pixels/s)
49. 58.88ms (28.49G pixels/s)
50. 56.59ms (29.65G pixels/s)
51. 56.24ms (29.83G pixels/s)
52. 55.25ms (30.37G pixels/s)
53. 58.86ms (28.50G pixels/s)
54. 56.48ms (29.70G pixels/s)
55. 56.44ms (29.73G pixels/s)
56. 57.15ms (29.36G pixels/s)
57. 56.37ms (29.76G pixels/s)
58. 57.21ms (29.32G pixels/s)
59. 55.25ms (30.37G pixels/s)
60. 63.42ms (26.46G pixels/s)
61. 62.42ms (26.88G pixels/s)
62. 59.59ms (28.16G pixels/s)
63. 61.56ms (27.25G pixels/s)
64. 59.58ms (28.16G pixels/s)
65. 57.54ms (29.16G pixels/s)
66. 56.56ms (29.66G pixels/s)
67. 57.62ms (29.12G pixels/s)
68. 56.75ms (29.57G pixels/s)
69. 57.60ms (29.13G pixels/s)
70. 56.70ms (29.59G pixels/s)
71. 63.93ms (26.24G pixels/s)
72. 63.34ms (26.49G pixels/s)
73. 55.23ms (30.38G pixels/s)
74. 57.61ms (29.12G pixels/s)
75. 57.86ms (29.00G pixels/s)
76. 62.02ms (27.05G pixels/s)
77. 55.28ms (30.35G pixels/s)
78. 57.09ms (29.39G pixels/s)
79. 57.16ms (29.35G pixels/s)
80. 57.96ms (28.94G pixels/s)
81. 57.63ms (29.11G pixels/s)
82. 60.51ms (27.73G pixels/s)
83. 60.92ms (27.54G pixels/s)
84. 60.92ms (27.54G pixels/s)
85. 58.06ms (28.90G pixels/s)
86. 60.25ms (27.85G pixels/s)
87. 56.88ms (29.50G pixels/s)
88. 57.08ms (29.39G pixels/s)
89. 58.34ms (28.76G pixels/s)
90. 63.08ms (26.60G pixels/s)
91. 64.11ms (26.17G pixels/s)
92. 55.25ms (30.36G pixels/s)
93. 62.66ms (26.78G pixels/s)
94. 61.75ms (27.17G pixels/s)
95. 59.35ms (28.27G pixels/s)
96. 55.24ms (30.37G pixels/s)
97. 64.26ms (26.11G pixels/s)
98. 57.48ms (29.19G pixels/s)
99. 60.13ms (27.90G pixels/s)
100. 61.44ms (27.31G pixels/s)
101. 62.08ms (27.03G pixels/s)
102. 58.66ms (28.60G pixels/s)
103. 58.68ms (28.59G pixels/s)
104. 61.03ms (27.49G pixels/s)
105. 62.36ms (26.90G pixels/s)
106. 55.26ms (30.36G pixels/s)
107. 63.71ms (26.34G pixels/s)
108. 64.76ms (25.91G pixels/s)
109. 62.62ms (26.79G pixels/s)
110. 66.02ms (25.41G pixels/s)
111. 65.91ms (25.46G pixels/s)
112. 63.62ms (26.37G pixels/s)
113. 64.92ms (25.84G pixels/s)
114. 57.36ms (29.25G pixels/s)
115. 62.29ms (26.93G pixels/s)
116. 69.98ms (23.97G pixels/s)
117. 65.34ms (25.68G pixels/s)
118. 57.41ms (29.22G pixels/s)
119. 63.61ms (26.38G pixels/s)
120. 69.54ms (24.13G pixels/s)
121. 68.88ms (24.36G pixels/s)
122. 62.64ms (26.78G pixels/s)
123. 55.26ms (30.36G pixels/s)
124. 61.46ms (27.30G pixels/s)
125. 63.78ms (26.31G pixels/s)
126. 67.35ms (24.91G pixels/s)
127. 64.78ms (25.90G pixels/s)
128. 68.07ms (24.65G pixels/s)

How sure are we that this is an actual developer? There are NO "async compute" flags or cap bits.
What he claimed is that nVidia asked them to disable it because their current implementation is very slow.

This isn't painting a good picture for Nvidia.

Especially this part:
P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally.

This is not the first time we've seen reports of nVidia's marketing being very aggressive towards developers. Does anyone remember that famous blog post from a reputable dev who compared nVidia's engineers to CIA agents who always tried to force things to be the way they wanted?


but the comments around not seeing similar performance gains for Fiji also have a very valid point.

Actually, they don't. Fiji has the exact same number of ACEs as Hawaii (8), so if that's where the bottleneck is, the performance between the two wouldn't be very different.


So async compute is optional in D3D12. I suppose over time NVidia will work out which games to say "nope" to when queried for this then.

That is until Pascal comes out with an obnoxious amount of async units and their Gameworks DX12 program proceeds to crap all over GCN, Kepler and Maxwell users by forcing the async compute capabilities to become the bottleneck on all affected titles.
I have no idea if they can make this possible or not, but after the CPU PhysX path using x87 instructions, super concrete slabs, invisible oceans, sub-pixel polygons for Geralt's hair et al, anything can be expected of nVidia's underhandedness.
 
GTX 960 - Driver Version: 355.60

Compute only:
1. 11.21ms
2. 11.31ms
3. 11.31ms
4. 10.97ms
5. 10.16ms
6. 10.24ms
7. 10.26ms
8. 9.95ms
9. 9.22ms
10. 9.24ms
11. 9.22ms
12. 9.29ms
13. 9.83ms
14. 8.96ms
15. 9.01ms
16. 9.09ms
17. 9.06ms
18. 9.09ms
19. 9.05ms
20. 9.08ms
21. 9.01ms
22. 8.94ms
23. 8.98ms
24. 8.94ms
25. 8.97ms
26. 9.00ms
27. 9.03ms
28. 9.06ms
29. 9.13ms
30. 9.12ms
31. 9.07ms
32. 17.85ms
33. 20.06ms
34. 20.10ms
35. 20.16ms
36. 22.42ms
37. 17.90ms
38. 20.06ms
39. 20.07ms
40. 20.04ms
41. 20.13ms
42. 20.10ms
43. 20.07ms
44. 20.05ms
45. 20.07ms
46. 20.03ms
47. 22.42ms
48. 17.91ms
49. 20.05ms
50. 20.06ms
51. 20.03ms
52. 20.04ms
53. 22.27ms
54. 17.90ms
55. 20.04ms
56. 20.03ms
57. 20.04ms
58. 20.03ms
59. 20.24ms
60. 22.34ms
61. 20.07ms
62. 20.05ms
63. 20.03ms
64. 26.55ms
65. 35.47ms
66. 28.90ms
67. 26.65ms
68. 28.94ms
69. 35.51ms
70. 28.90ms
71. 28.88ms
72. 28.85ms
73. 26.81ms
74. 28.89ms
75. 28.85ms
76. 28.83ms
77. 31.26ms
78. 28.92ms
79. 29.11ms
80. 26.65ms
81. 37.79ms
82. 28.91ms
83. 26.67ms
84. 31.08ms
85. 31.14ms
86. 26.74ms
87. 28.92ms
88. 28.86ms
89. 33.72ms
90. 31.12ms
91. 28.86ms
92. 26.73ms
93. 31.22ms
94. 31.15ms
95. 26.70ms
96. 35.39ms
97. 37.72ms
98. 39.92ms
99. 37.65ms
100. 35.55ms
101. 39.91ms
102. 37.66ms
103. 37.67ms
104. 39.91ms
105. 37.68ms
106. 35.47ms
107. 39.98ms
108. 37.67ms
109. 37.64ms
110. 35.61ms
111. 37.70ms
112. 35.47ms
113. 42.11ms
114. 35.48ms
115. 46.50ms
116. 37.70ms
117. 37.65ms
118. 42.22ms
119. 35.48ms
120. 37.65ms
121. 35.53ms
122. 42.13ms
123. 35.49ms
124. 37.64ms
125. 39.92ms
126. 37.69ms
127. 35.45ms
128. 46.48ms
Graphics only: 41.80ms (40.14G pixels/s)
Graphics + compute:
1. 50.54ms (33.19G pixels/s)
2. 50.45ms (33.25G pixels/s)
3. 50.42ms (33.27G pixels/s)
4. 50.53ms (33.20G pixels/s)
5. 50.46ms (33.25G pixels/s)
6. 50.59ms (33.16G pixels/s)
7. 50.47ms (33.24G pixels/s)
8. 50.43ms (33.27G pixels/s)
9. 50.50ms (33.22G pixels/s)
10. 50.42ms (33.27G pixels/s)
11. 50.55ms (33.19G pixels/s)
12. 50.45ms (33.26G pixels/s)
13. 51.76ms (32.41G pixels/s)
14. 50.49ms (33.23G pixels/s)
15. 50.46ms (33.25G pixels/s)
16. 50.54ms (33.20G pixels/s)
17. 50.45ms (33.25G pixels/s)
18. 50.53ms (33.20G pixels/s)
19. 50.61ms (33.15G pixels/s)
20. 50.49ms (33.23G pixels/s)
21. 50.55ms (33.19G pixels/s)
22. 50.45ms (33.25G pixels/s)
23. 50.56ms (33.18G pixels/s)
24. 50.54ms (33.19G pixels/s)
25. 50.55ms (33.19G pixels/s)
26. 50.49ms (33.23G pixels/s)
27. 50.45ms (33.26G pixels/s)
28. 50.52ms (33.21G pixels/s)
29. 50.50ms (33.23G pixels/s)
30. 50.52ms (33.21G pixels/s)
31. 50.48ms (33.23G pixels/s)
32. 59.18ms (28.35G pixels/s)
33. 61.67ms (27.20G pixels/s)
34. 61.61ms (27.23G pixels/s)
35. 61.61ms (27.23G pixels/s)
36. 59.35ms (28.27G pixels/s)
37. 61.58ms (27.25G pixels/s)
38. 59.40ms (28.24G pixels/s)
39. 61.58ms (27.25G pixels/s)
40. 61.53ms (27.26G pixels/s)
41. 61.62ms (27.23G pixels/s)
42. 59.34ms (28.27G pixels/s)
43. 61.65ms (27.21G pixels/s)
44. 59.39ms (28.25G pixels/s)
45. 61.65ms (27.21G pixels/s)
46. 63.84ms (26.28G pixels/s)
47. 59.46ms (28.21G pixels/s)
48. 59.25ms (28.32G pixels/s)
49. 61.59ms (27.24G pixels/s)
50. 59.34ms (28.27G pixels/s)
51. 61.58ms (27.25G pixels/s)
52. 59.31ms (28.29G pixels/s)
53. 61.57ms (27.25G pixels/s)
54. 61.52ms (27.27G pixels/s)
55. 61.55ms (27.26G pixels/s)
56. 59.82ms (28.05G pixels/s)
57. 63.83ms (26.28G pixels/s)
58. 61.53ms (27.27G pixels/s)
59. 61.58ms (27.25G pixels/s)
60. 59.34ms (28.27G pixels/s)
61. 61.63ms (27.22G pixels/s)
62. 59.48ms (28.21G pixels/s)
63. 61.57ms (27.25G pixels/s)
64. 68.08ms (24.64G pixels/s)
65. 68.23ms (24.59G pixels/s)
66. 68.29ms (24.57G pixels/s)
67. 72.55ms (23.13G pixels/s)
68. 72.62ms (23.10G pixels/s)
69. 68.12ms (24.63G pixels/s)
70. 74.92ms (22.39G pixels/s)
71. 68.22ms (24.59G pixels/s)
72. 72.61ms (23.11G pixels/s)
73. 74.88ms (22.40G pixels/s)
74. 68.04ms (24.66G pixels/s)
75. 72.64ms (23.09G pixels/s)
76. 68.13ms (24.62G pixels/s)
77. 68.20ms (24.60G pixels/s)
78. 68.23ms (24.59G pixels/s)
79. 72.61ms (23.11G pixels/s)
80. 72.69ms (23.08G pixels/s)
81. 68.16ms (24.61G pixels/s)
82. 74.83ms (22.42G pixels/s)
83. 70.46ms (23.81G pixels/s)
84. 72.65ms (23.09G pixels/s)
85. 72.68ms (23.08G pixels/s)
86. 68.16ms (24.61G pixels/s)
87. 68.36ms (24.54G pixels/s)
88. 70.34ms (23.85G pixels/s)
89. 68.30ms (24.56G pixels/s)
90. 68.14ms (24.62G pixels/s)
91. 68.21ms (24.59G pixels/s)
92. 68.14ms (24.62G pixels/s)
93. 72.59ms (23.11G pixels/s)
94. 72.67ms (23.09G pixels/s)
95. 68.18ms (24.61G pixels/s)
96. 81.44ms (20.60G pixels/s)
97. 79.29ms (21.16G pixels/s)
98. 77.00ms (21.79G pixels/s)
99. 79.40ms (21.13G pixels/s)
100. 79.19ms (21.18G pixels/s)
101. 81.38ms (20.61G pixels/s)
102. 77.10ms (21.76G pixels/s)
103. 79.16ms (21.19G pixels/s)
104. 76.93ms (21.81G pixels/s)
105. 81.55ms (20.57G pixels/s)
106. 79.12ms (21.20G pixels/s)
107. 81.33ms (20.63G pixels/s)
108. 81.46ms (20.60G pixels/s)
109. 79.09ms (21.21G pixels/s)
110. 83.55ms (20.08G pixels/s)
111. 83.79ms (20.02G pixels/s)
112. 79.23ms (21.17G pixels/s)
113. 77.05ms (21.77G pixels/s)
114. 85.90ms (19.53G pixels/s)
115. 76.93ms (21.81G pixels/s)
116. 81.41ms (20.61G pixels/s)
117. 79.32ms (21.15G pixels/s)
118. 76.99ms (21.79G pixels/s)
119. 81.45ms (20.60G pixels/s)
120. 79.28ms (21.16G pixels/s)
121. 79.13ms (21.20G pixels/s)
122. 77.15ms (21.75G pixels/s)
123. 77.06ms (21.77G pixels/s)
124. 81.37ms (20.62G pixels/s)
125. 81.41ms (20.61G pixels/s)
126. 76.96ms (21.80G pixels/s)
127. 76.97ms (21.80G pixels/s)
128. 90.11ms (18.62G pixels/s)
 
Radeon 290X Hawaii GCN 1.1


Actually, they don't. Fiji has the exact same number of ACEs as Hawaii (8), so if that's where the bottleneck is, the performance between the two wouldn't be very different.


If the bottleneck were the ACEs, then changes in batch count and resolution would have had less effect than what we saw.

The rest is still up in the air. What is interesting is that when doing pure compute, Kepler has lower latency than the 290, and Maxwell 2 lower still. When doing graphics, NV hardware is faster too, but when doing both together there are issues.

I don't think the performance difference can be pinned purely on async shaders. That seems too shallow an explanation, and it is hard to know concretely what is going on.
 
sebbbi (or others) - this report indicates that XBOX One DX11 did not have async shaders exposed (where PS4 did). Do you know if that's accurate?


I don't think they will be exposed until DX12 is available on Xbox.

Then again, I just started working with the DX12 version of UE4, and the Xbox build of the DX12 version of the engine isn't ready yet.
 
sebbbi (or others) - this report indicates that XBOX One DX11 did not have async shaders exposed (where PS4 did). Do you know if that's accurate?
It came late and was added to the DX11 fast semantics in 2014. I don't think it requires DX12 to be exposed. I could be wrong, but I'm fairly confident from reading the leaked SDK documentation.
 
GTX 970 - 355.60
Compute only:
1. 9.77ms
2. 9.77ms
3. 9.79ms
4. 9.80ms
5. 9.76ms
6. 9.74ms
7. 9.89ms
8. 9.77ms
9. 9.74ms
10. 9.75ms
11. 9.77ms
12. 9.78ms
13. 9.78ms
14. 9.77ms
15. 9.76ms
16. 9.76ms
17. 9.77ms
18. 9.77ms
19. 9.78ms
20. 9.78ms
21. 9.77ms
22. 9.76ms
23. 9.77ms
24. 9.81ms
25. 9.76ms
26. 10.09ms
27. 10.05ms
28. 10.05ms
29. 9.82ms
30. 9.77ms
31. 9.80ms
32. 19.53ms
33. 21.83ms
34. 21.83ms
35. 24.79ms
36. 24.55ms
37. 21.86ms
38. 19.41ms
39. 21.81ms
40. 22.12ms
41. 24.69ms
42. 24.31ms
43. 21.95ms
44. 21.85ms
45. 21.85ms
46. 24.88ms
47. 24.64ms
48. 21.84ms
49. 21.84ms
50. 19.42ms
51. 24.55ms
52. 29.75ms
53. 22.04ms
54. 27.47ms
55. 19.43ms
56. 19.45ms
57. 25.19ms
58. 21.89ms
59. 19.43ms
60. 21.83ms
61. 19.44ms
62. 19.98ms
63. 24.62ms
64. 29.10ms
65. 31.59ms
66. 29.11ms
67. 32.31ms
68. 29.21ms
69. 29.08ms
70. 29.07ms
71. 32.35ms
72. 29.20ms
73. 29.19ms
74. 29.09ms
75. 32.10ms
76. 29.37ms
77. 29.10ms
78. 29.09ms
79. 34.20ms
80. 32.14ms
81. 29.22ms
82. 31.49ms
83. 31.78ms
84. 34.40ms
85. 29.12ms
86. 29.09ms
87. 32.08ms
88. 29.40ms
89. 29.20ms
90. 29.11ms
91. 31.83ms
92. 32.05ms
93. 31.64ms
94. 34.00ms
95. 34.34ms
96. 52.62ms
97. 47.68ms
98. 48.96ms
99. 46.38ms
100. 41.42ms
101. 43.93ms
102. 43.80ms
103. 45.15ms
104. 42.17ms
105. 44.21ms
106. 49.36ms
107. 44.31ms
108. 41.24ms
109. 47.44ms
110. 41.34ms
111. 41.44ms
112. 48.56ms
113. 48.03ms
114. 47.88ms
115. 45.92ms
116. 49.06ms
117. 50.09ms
118. 48.73ms
119. 44.53ms
120. 43.89ms
121. 46.97ms
122. 49.90ms
123. 43.88ms
124. 41.34ms
125. 46.85ms
126. 43.71ms
127. 43.86ms
128. 51.57ms
Graphics only: 32.13ms (52.22G pixels/s)
Graphics + compute:
1. 41.63ms (40.30G pixels/s)
2. 41.73ms (40.20G pixels/s)
3. 41.70ms (40.23G pixels/s)
4. 41.47ms (40.46G pixels/s)
5. 42.06ms (39.89G pixels/s)
6. 41.43ms (40.50G pixels/s)
7. 41.59ms (40.34G pixels/s)
8. 42.30ms (39.66G pixels/s)
9. 41.48ms (40.45G pixels/s)
10. 41.48ms (40.45G pixels/s)
11. 42.35ms (39.61G pixels/s)
12. 41.46ms (40.47G pixels/s)
13. 41.55ms (40.37G pixels/s)
14. 42.28ms (39.68G pixels/s)
15. 41.44ms (40.48G pixels/s)
16. 41.50ms (40.43G pixels/s)
17. 42.39ms (39.58G pixels/s)
18. 41.45ms (40.47G pixels/s)
19. 41.56ms (40.37G pixels/s)
20. 42.03ms (39.92G pixels/s)
21. 41.44ms (40.49G pixels/s)
22. 41.47ms (40.45G pixels/s)
23. 42.12ms (39.83G pixels/s)
24. 41.48ms (40.45G pixels/s)
25. 41.58ms (40.35G pixels/s)
26. 42.02ms (39.92G pixels/s)
27. 41.44ms (40.49G pixels/s)
28. 41.44ms (40.48G pixels/s)
29. 42.36ms (39.61G pixels/s)
30. 41.50ms (40.43G pixels/s)
31. 41.64ms (40.29G pixels/s)
32. 51.93ms (32.31G pixels/s)
33. 51.16ms (32.79G pixels/s)
34. 57.22ms (29.32G pixels/s)
35. 51.13ms (32.81G pixels/s)
36. 56.59ms (29.65G pixels/s)
37. 51.40ms (32.64G pixels/s)
38. 53.57ms (31.32G pixels/s)
39. 56.92ms (29.48G pixels/s)
40. 53.57ms (31.32G pixels/s)
41. 57.32ms (29.27G pixels/s)
42. 53.81ms (31.18G pixels/s)
43. 52.56ms (31.92G pixels/s)
44. 54.06ms (31.04G pixels/s)
45. 51.53ms (32.56G pixels/s)
46. 56.54ms (29.67G pixels/s)
47. 53.60ms (31.30G pixels/s)
48. 56.89ms (29.49G pixels/s)
49. 51.17ms (32.79G pixels/s)
50. 57.31ms (29.28G pixels/s)
51. 53.62ms (31.29G pixels/s)
52. 57.23ms (29.31G pixels/s)
53. 53.79ms (31.19G pixels/s)
54. 56.44ms (29.72G pixels/s)
55. 54.07ms (31.03G pixels/s)
56. 53.89ms (31.13G pixels/s)
57. 55.01ms (30.50G pixels/s)
58. 57.06ms (29.40G pixels/s)
59. 56.86ms (29.50G pixels/s)
60. 51.86ms (32.35G pixels/s)
61. 60.31ms (27.82G pixels/s)
62. 54.38ms (30.85G pixels/s)
63. 54.60ms (30.73G pixels/s)
64. 66.91ms (25.08G pixels/s)
65. 67.40ms (24.89G pixels/s)
66. 61.57ms (27.25G pixels/s)
67. 67.35ms (24.91G pixels/s)
68. 60.95ms (27.53G pixels/s)
69. 65.37ms (25.67G pixels/s)
70. 63.21ms (26.54G pixels/s)
71. 64.39ms (26.06G pixels/s)
72. 67.71ms (24.78G pixels/s)
73. 65.88ms (25.46G pixels/s)
74. 63.26ms (26.52G pixels/s)
75. 69.45ms (24.16G pixels/s)
76. 61.14ms (27.44G pixels/s)
77. 64.54ms (25.99G pixels/s)
78. 63.25ms (26.52G pixels/s)
79. 66.93ms (25.07G pixels/s)
80. 60.83ms (27.58G pixels/s)
81. 64.23ms (26.12G pixels/s)
82. 68.69ms (24.42G pixels/s)
83. 64.53ms (26.00G pixels/s)
84. 63.60ms (26.38G pixels/s)
85. 63.51ms (26.42G pixels/s)
86. 64.04ms (26.20G pixels/s)
87. 61.49ms (27.28G pixels/s)
88. 66.66ms (25.17G pixels/s)
89. 66.44ms (25.25G pixels/s)
90. 61.76ms (27.16G pixels/s)
91. 68.88ms (24.36G pixels/s)
92. 66.04ms (25.40G pixels/s)
93. 66.89ms (25.08G pixels/s)
94. 66.86ms (25.09G pixels/s)
95. 61.56ms (27.25G pixels/s)
96. 76.57ms (21.91G pixels/s)
97. 73.18ms (22.93G pixels/s)
98. 73.56ms (22.81G pixels/s)
99. 73.67ms (22.77G pixels/s)
100. 73.22ms (22.91G pixels/s)
101. 76.49ms (21.93G pixels/s)
102. 75.74ms (22.15G pixels/s)
103. 70.81ms (23.69G pixels/s)
104. 71.15ms (23.58G pixels/s)
105. 78.29ms (21.43G pixels/s)
106. 79.14ms (21.20G pixels/s)
107. 72.98ms (22.99G pixels/s)
108. 90.04ms (18.63G pixels/s)
109. 82.83ms (20.25G pixels/s)
110. 73.47ms (22.83G pixels/s)
111. 79.30ms (21.16G pixels/s)
112. 76.99ms (21.79G pixels/s)
113. 77.57ms (21.63G pixels/s)
114. 80.25ms (20.91G pixels/s)
115. 78.04ms (21.50G pixels/s)
116. 76.67ms (21.88G pixels/s)
117. 77.69ms (21.60G pixels/s)
118. 79.38ms (21.13G pixels/s)
119. 73.97ms (22.68G pixels/s)
120. 77.37ms (21.69G pixels/s)
121. 73.06ms (22.96G pixels/s)
122. 77.32ms (21.70G pixels/s)
123. 73.23ms (22.91G pixels/s)
124. 76.02ms (22.07G pixels/s)
125. 76.81ms (21.84G pixels/s)
126. 78.89ms (21.27G pixels/s)
127. 82.35ms (20.37G pixels/s)
128. 86.50ms (19.40G pixels/s)
 
I can't see an edit button?

Anyway, 750 Ti just for comparison (Maxwell 1):
Compute only:
1. 11.92ms
2. 11.90ms
3. 11.18ms
4. 10.75ms
5. 10.74ms
6. 10.75ms
7. 10.59ms
8. 10.54ms
9. 10.55ms
10. 10.54ms
11. 10.55ms
12. 10.56ms
13. 10.56ms
14. 10.55ms
15. 10.56ms
16. 10.55ms
17. 21.02ms
18. 21.01ms
19. 20.99ms
20. 20.99ms
21. 21.01ms
22. 21.02ms
23. 21.02ms
24. 21.02ms
25. 21.02ms
26. 21.02ms
27. 21.04ms
28. 21.00ms
29. 21.01ms
30. 21.00ms
31. 21.01ms
32. 31.48ms
33. 31.47ms
34. 31.44ms
35. 31.46ms
36. 31.47ms
37. 31.45ms
38. 31.45ms
39. 31.46ms
40. 31.46ms
41. 31.45ms
42. 31.47ms
43. 31.46ms
44. 31.47ms
45. 31.47ms
46. 31.48ms
47. 31.46ms
48. 41.91ms
49. 41.90ms
50. 41.90ms
51. 41.91ms
52. 41.90ms
53. 41.93ms
54. 41.92ms
55. 41.91ms
56. 41.91ms
57. 41.91ms
58. 41.94ms
59. 41.92ms
60. 41.94ms
61. 41.92ms
62. 41.91ms
63. 41.91ms
64. 52.38ms
65. 52.37ms
66. 52.37ms
67. 52.37ms
68. 52.37ms
69. 52.34ms
70. 52.37ms
71. 52.37ms
72. 52.40ms
73. 52.37ms
74. 52.39ms
75. 52.36ms
76. 52.39ms
77. 52.38ms
78. 52.37ms
79. 52.38ms
80. 62.83ms
81. 62.83ms
82. 62.81ms
83. 62.87ms
84. 62.82ms
85. 62.81ms
86. 62.81ms
87. 62.82ms
88. 62.82ms
89. 62.84ms
90. 62.83ms
91. 62.84ms
92. 62.83ms
93. 62.83ms
94. 62.83ms
95. 62.83ms
96. 73.29ms
97. 73.28ms
98. 73.28ms
99. 73.27ms
100. 73.28ms
101. 73.27ms
102. 73.28ms
103. 73.27ms
104. 73.27ms
105. 73.27ms
106. 73.28ms
107. 73.26ms
108. 73.27ms
109. 73.29ms
110. 73.28ms
111. 73.31ms
112. 83.73ms
113. 83.73ms
114. 83.74ms
115. 83.73ms
116. 83.74ms
117. 83.74ms
118. 83.75ms
119. 83.74ms
120. 83.73ms
121. 83.73ms
122. 83.76ms
123. 83.74ms
124. 83.74ms
125. 83.74ms
126. 83.75ms
127. 83.76ms
128. 94.18ms
Graphics only: 106.67ms (15.73G pixels/s)
Graphics + compute:
1. 117.15ms (14.32G pixels/s)
2. 117.19ms (14.32G pixels/s)
3. 117.12ms (14.32G pixels/s)
4. 117.15ms (14.32G pixels/s)
5. 117.15ms (14.32G pixels/s)
6. 117.14ms (14.32G pixels/s)
7. 117.15ms (14.32G pixels/s)
8. 117.18ms (14.32G pixels/s)
9. 117.14ms (14.32G pixels/s)
10. 117.15ms (14.32G pixels/s)
11. 117.13ms (14.32G pixels/s)
12. 117.18ms (14.32G pixels/s)
13. 117.15ms (14.32G pixels/s)
14. 117.15ms (14.32G pixels/s)
15. 117.15ms (14.32G pixels/s)
16. 117.16ms (14.32G pixels/s)
17. 127.60ms (13.15G pixels/s)
18. 127.60ms (13.15G pixels/s)
19. 127.61ms (13.15G pixels/s)
20. 127.59ms (13.15G pixels/s)
21. 127.60ms (13.15G pixels/s)
22. 127.60ms (13.15G pixels/s)
23. 127.60ms (13.15G pixels/s)
24. 127.59ms (13.15G pixels/s)
25. 127.59ms (13.15G pixels/s)
26. 127.60ms (13.15G pixels/s)
27. 127.61ms (13.15G pixels/s)
28. 127.63ms (13.15G pixels/s)
29. 127.63ms (13.15G pixels/s)
30. 127.60ms (13.15G pixels/s)
31. 127.60ms (13.15G pixels/s)
32. 138.05ms (12.15G pixels/s)
33. 138.06ms (12.15G pixels/s)
34. 138.07ms (12.15G pixels/s)
35. 138.08ms (12.15G pixels/s)
36. 138.06ms (12.15G pixels/s)
37. 138.02ms (12.16G pixels/s)
38. 138.05ms (12.15G pixels/s)
39. 138.05ms (12.15G pixels/s)
40. 138.05ms (12.15G pixels/s)
41. 138.04ms (12.15G pixels/s)
42. 138.06ms (12.15G pixels/s)
43. 138.05ms (12.15G pixels/s)
44. 138.06ms (12.15G pixels/s)
45. 138.07ms (12.15G pixels/s)
46. 138.06ms (12.15G pixels/s)
47. 138.06ms (12.15G pixels/s)
48. 148.54ms (11.30G pixels/s)
49. 148.50ms (11.30G pixels/s)
50. 148.57ms (11.29G pixels/s)
51. 148.54ms (11.29G pixels/s)
52. 148.51ms (11.30G pixels/s)
53. 148.51ms (11.30G pixels/s)
54. 148.49ms (11.30G pixels/s)
55. 148.49ms (11.30G pixels/s)
56. 148.52ms (11.30G pixels/s)
57. 148.49ms (11.30G pixels/s)
58. 148.52ms (11.30G pixels/s)
59. 148.52ms (11.30G pixels/s)
60. 148.50ms (11.30G pixels/s)
61. 148.53ms (11.30G pixels/s)
62. 148.56ms (11.29G pixels/s)
63. 148.52ms (11.30G pixels/s)
64. 158.96ms (10.55G pixels/s)
65. 158.95ms (10.56G pixels/s)
66. 158.96ms (10.55G pixels/s)
67. 158.96ms (10.55G pixels/s)
68. 158.95ms (10.55G pixels/s)
69. 158.96ms (10.55G pixels/s)
70. 158.94ms (10.56G pixels/s)
71. 158.97ms (10.55G pixels/s)
72. 158.95ms (10.56G pixels/s)
73. 158.97ms (10.55G pixels/s)
74. 158.96ms (10.55G pixels/s)
75. 158.95ms (10.56G pixels/s)
76. 158.99ms (10.55G pixels/s)
77. 159.00ms (10.55G pixels/s)
78. 159.01ms (10.55G pixels/s)
79. 159.01ms (10.55G pixels/s)
80. 169.43ms (9.90G pixels/s)
81. 169.45ms (9.90G pixels/s)
82. 169.42ms (9.90G pixels/s)
83. 169.42ms (9.90G pixels/s)
84. 169.42ms (9.90G pixels/s)
85. 169.43ms (9.90G pixels/s)
86. 169.40ms (9.90G pixels/s)
87. 169.47ms (9.90G pixels/s)
88. 169.43ms (9.90G pixels/s)
89. 169.44ms (9.90G pixels/s)
90. 169.42ms (9.90G pixels/s)
91. 169.46ms (9.90G pixels/s)
92. 169.44ms (9.90G pixels/s)
93. 169.43ms (9.90G pixels/s)
94. 169.42ms (9.90G pixels/s)
95. 169.43ms (9.90G pixels/s)
96. 179.87ms (9.33G pixels/s)
97. 179.90ms (9.33G pixels/s)
98. 179.87ms (9.33G pixels/s)
99. 179.86ms (9.33G pixels/s)
100. 179.89ms (9.33G pixels/s)
101. 179.88ms (9.33G pixels/s)
102. 179.88ms (9.33G pixels/s)
103. 179.90ms (9.33G pixels/s)
104. 179.90ms (9.33G pixels/s)
105. 179.91ms (9.33G pixels/s)
106. 179.92ms (9.33G pixels/s)
107. 179.91ms (9.33G pixels/s)
108. 179.88ms (9.33G pixels/s)
109. 179.89ms (9.33G pixels/s)
110. 179.88ms (9.33G pixels/s)
111. 179.87ms (9.33G pixels/s)
112. 190.36ms (8.81G pixels/s)
113. 190.33ms (8.81G pixels/s)
114. 190.30ms (8.82G pixels/s)
115. 190.42ms (8.81G pixels/s)
116. 190.35ms (8.81G pixels/s)
117. 190.36ms (8.81G pixels/s)
118. 190.32ms (8.82G pixels/s)
119. 190.36ms (8.81G pixels/s)
120. 190.34ms (8.81G pixels/s)
121. 190.31ms (8.82G pixels/s)
122. 190.34ms (8.81G pixels/s)
123. 190.35ms (8.81G pixels/s)
124. 190.36ms (8.81G pixels/s)
125. 190.33ms (8.81G pixels/s)
126. 190.34ms (8.81G pixels/s)
127. 190.37ms (8.81G pixels/s)
128. 200.82ms (8.35G pixels/s)
 
Radeon 290 Hawaii GCN 1.1
Driver version: 15.20.1062.1004

Compute only:
1. 52.71ms
2. 53.19ms
3. 52.75ms
4. 52.71ms
5. 52.66ms
6. 52.72ms
7. 52.70ms
8. 52.88ms
9. 52.92ms
10. 52.66ms
11. 53.13ms
12. 53.17ms
13. 52.84ms
14. 52.72ms
15. 53.05ms
16. 53.05ms
17. 52.71ms
18. 52.72ms
19. 52.67ms
20. 53.17ms
21. 53.14ms
22. 52.91ms
23. 52.87ms
24. 53.16ms
25. 53.11ms
26. 52.70ms
27. 52.95ms
28. 52.93ms
29. 52.70ms
30. 53.08ms
31. 52.62ms
32. 52.69ms
33. 52.67ms
34. 52.71ms
35. 52.77ms
36. 53.14ms
37. 52.89ms
38. 52.77ms
39. 53.13ms
40. 52.88ms
41. 52.69ms
42. 52.98ms
43. 52.79ms
44. 52.87ms
45. 52.79ms
46. 52.67ms
47. 53.07ms
48. 53.07ms
49. 53.14ms
50. 53.13ms
51. 53.06ms
52. 52.74ms
53. 52.67ms
54. 52.73ms
55. 53.12ms
56. 52.97ms
57. 53.22ms
58. 52.96ms
59. 53.20ms
60. 52.70ms
61. 52.75ms
62. 53.09ms
63. 53.02ms
64. 53.16ms
65. 52.67ms
66. 52.71ms
67. 52.68ms
68. 53.11ms
69. 52.98ms
70. 53.04ms
71. 53.15ms
72. 53.03ms
73. 52.96ms
74. 52.88ms
75. 52.75ms
76. 52.91ms
77. 52.69ms
78. 52.67ms
79. 53.10ms
80. 52.70ms
81. 52.62ms
82. 52.71ms
83. 52.69ms
84. 53.07ms
85. 53.17ms
86. 53.04ms
87. 52.76ms
88. 52.66ms
89. 52.70ms
90. 52.99ms
91. 53.00ms
92. 53.06ms
93. 52.75ms
94. 52.73ms
95. 53.11ms
96. 52.76ms
97. 53.13ms
98. 52.95ms
99. 53.13ms
100. 53.13ms
101. 52.66ms
102. 52.70ms
103. 52.86ms
104. 53.14ms
105. 52.70ms
106. 52.72ms
107. 52.98ms
108. 52.70ms
109. 52.72ms
110. 52.70ms
111. 53.23ms
112. 53.19ms
113. 53.08ms
114. 53.10ms
115. 53.11ms
116. 52.73ms
117. 52.99ms
118. 52.69ms
119. 52.73ms
120. 52.69ms
121. 52.74ms
122. 52.70ms
123. 52.86ms
124. 52.93ms
125. 53.11ms
126. 53.07ms
127. 52.90ms
128. 53.12ms
Graphics only: 26.25ms (63.90G pixels/s)
Graphics + compute:
1. 53.32ms (31.47G pixels/s)
2. 53.48ms (31.37G pixels/s)
3. 53.90ms (31.12G pixels/s)
4. 53.62ms (31.29G pixels/s)
5. 53.37ms (31.44G pixels/s)
6. 53.66ms (31.26G pixels/s)
7. 53.31ms (31.47G pixels/s)
8. 53.72ms (31.23G pixels/s)
9. 53.44ms (31.39G pixels/s)
10. 53.66ms (31.27G pixels/s)
11. 53.50ms (31.36G pixels/s)
12. 54.45ms (30.81G pixels/s)
13. 53.93ms (31.11G pixels/s)
14. 54.35ms (30.87G pixels/s)
15. 53.54ms (31.34G pixels/s)
16. 53.22ms (31.52G pixels/s)
17. 54.44ms (30.82G pixels/s)
18. 53.18ms (31.55G pixels/s)
19. 53.66ms (31.27G pixels/s)
20. 54.32ms (30.89G pixels/s)
21. 54.90ms (30.56G pixels/s)
22. 54.28ms (30.91G pixels/s)
23. 53.20ms (31.53G pixels/s)
24. 54.42ms (30.83G pixels/s)
25. 54.35ms (30.87G pixels/s)
26. 53.92ms (31.12G pixels/s)
27. 54.10ms (31.01G pixels/s)
28. 53.95ms (31.10G pixels/s)
29. 54.26ms (30.92G pixels/s)
30. 54.01ms (31.07G pixels/s)
31. 53.50ms (31.36G pixels/s)
32. 53.72ms (31.23G pixels/s)
33. 54.15ms (30.98G pixels/s)
34. 54.16ms (30.98G pixels/s)
35. 54.29ms (30.90G pixels/s)
36. 55.49ms (30.23G pixels/s)
37. 55.94ms (29.99G pixels/s)
38. 54.42ms (30.83G pixels/s)
39. 55.66ms (30.14G pixels/s)
40. 54.31ms (30.89G pixels/s)
41. 54.34ms (30.88G pixels/s)
42. 53.11ms (31.59G pixels/s)
43. 54.35ms (30.87G pixels/s)
44. 53.23ms (31.52G pixels/s)
45. 56.28ms (29.81G pixels/s)
46. 53.88ms (31.14G pixels/s)
47. 56.09ms (29.91G pixels/s)
48. 56.73ms (29.57G pixels/s)
49. 54.20ms (30.95G pixels/s)
50. 56.27ms (29.81G pixels/s)
51. 54.77ms (30.63G pixels/s)
52. 55.94ms (29.99G pixels/s)
53. 54.19ms (30.96G pixels/s)
54. 56.65ms (29.61G pixels/s)
55. 54.60ms (30.73G pixels/s)
56. 55.36ms (30.31G pixels/s)
57. 54.31ms (30.89G pixels/s)
58. 54.90ms (30.56G pixels/s)
59. 53.29ms (31.48G pixels/s)
60. 53.50ms (31.36G pixels/s)
61. 55.05ms (30.48G pixels/s)
62. 53.10ms (31.59G pixels/s)
63. 57.00ms (29.43G pixels/s)
64. 54.74ms (30.65G pixels/s)
65. 55.56ms (30.20G pixels/s)
66. 55.61ms (30.17G pixels/s)
67. 55.68ms (30.13G pixels/s)
68. 53.59ms (31.31G pixels/s)
69. 58.97ms (28.45G pixels/s)
70. 53.31ms (31.47G pixels/s)
71. 56.08ms (29.92G pixels/s)
72. 53.12ms (31.59G pixels/s)
73. 54.91ms (30.55G pixels/s)
74. 56.46ms (29.72G pixels/s)
75. 54.63ms (30.71G pixels/s)
76. 58.35ms (28.75G pixels/s)
77. 54.81ms (30.61G pixels/s)
78. 53.07ms (31.61G pixels/s)
79. 54.51ms (30.78G pixels/s)
80. 54.40ms (30.84G pixels/s)
81. 55.64ms (30.15G pixels/s)
82. 57.77ms (29.04G pixels/s)
83. 55.78ms (30.08G pixels/s)
84. 57.56ms (29.15G pixels/s)
85. 56.06ms (29.93G pixels/s)
86. 53.34ms (31.45G pixels/s)
87. 57.99ms (28.93G pixels/s)
88. 55.18ms (30.40G pixels/s)
89. 59.28ms (28.30G pixels/s)
90. 53.72ms (31.23G pixels/s)
91. 56.42ms (29.74G pixels/s)
92. 55.42ms (30.28G pixels/s)
93. 55.06ms (30.47G pixels/s)
94. 53.45ms (31.39G pixels/s)
95. 55.74ms (30.10G pixels/s)
96. 53.76ms (31.21G pixels/s)
97. 56.34ms (29.78G pixels/s)
98. 53.65ms (31.27G pixels/s)
99. 57.34ms (29.26G pixels/s)
100. 56.76ms (29.56G pixels/s)
101. 53.55ms (31.33G pixels/s)
102. 56.84ms (29.52G pixels/s)
103. 55.35ms (30.31G pixels/s)
104. 58.31ms (28.77G pixels/s)
105. 56.89ms (29.49G pixels/s)
106. 55.64ms (30.16G pixels/s)
107. 53.35ms (31.45G pixels/s)
108. 55.53ms (30.21G pixels/s)
109. 56.98ms (29.44G pixels/s)
110. 60.04ms (27.94G pixels/s)
111. 57.09ms (29.39G pixels/s)
112. 58.69ms (28.59G pixels/s)
113. 55.15ms (30.42G pixels/s)
114. 58.21ms (28.82G pixels/s)
115. 55.72ms (30.11G pixels/s)
116. 60.84ms (27.58G pixels/s)
117. 57.01ms (29.43G pixels/s)
118. 56.55ms (29.67G pixels/s)
119. 56.45ms (29.72G pixels/s)
120. 56.20ms (29.85G pixels/s)
121. 59.73ms (28.09G pixels/s)
122. 58.50ms (28.68G pixels/s)
123. 58.53ms (28.66G pixels/s)
124. 53.60ms (31.30G pixels/s)
125. 59.90ms (28.01G pixels/s)
126. 53.69ms (31.25G pixels/s)
127. 58.86ms (28.50G pixels/s)
128. 53.30ms (31.48G pixels/s)
 
GTX 980 Ti

Compute only:
1. 11.63ms
2. 11.62ms
3. 11.82ms
4. 10.68ms
5. 10.66ms
6. 10.65ms
7. 10.68ms
8. 9.97ms
9. 9.99ms
10. 9.99ms
11. 9.99ms
12. 9.98ms
13. 9.97ms
14. 10.01ms
15. 9.99ms
16. 9.99ms
17. 10.01ms
18. 9.99ms
19. 10.01ms
20. 9.99ms
21. 10.01ms
22. 9.97ms
23. 10.00ms
24. 10.02ms
25. 10.05ms
26. 10.07ms
27. 10.09ms
28. 10.07ms
29. 10.02ms
30. 10.01ms
31. 10.00ms
32. 19.82ms
33. 24.77ms
34. 19.89ms
35. 24.90ms
36. 22.35ms
37. 22.40ms
38. 22.36ms
39. 22.40ms
40. 24.96ms
41. 24.87ms
42. 22.39ms
43. 22.36ms
44. 19.93ms
45. 19.90ms
46. 24.87ms
47. 22.36ms
48. 22.40ms
49. 22.36ms
50. 19.90ms
51. 24.90ms
52. 24.86ms
53. 24.85ms
54. 22.37ms
55. 22.40ms
56. 24.87ms
57. 22.41ms
58. 24.87ms
59. 19.92ms
60. 22.36ms
61. 19.91ms
62. 27.40ms
63. 22.38ms
64. 32.21ms
65. 29.77ms
66. 34.75ms
67. 32.30ms
68. 34.75ms
69. 29.81ms
70. 34.81ms
71. 34.79ms
72. 32.30ms
73. 32.30ms
74. 34.83ms
75. 32.29ms
76. 32.30ms
77. 32.30ms
78. 29.85ms
79. 29.85ms
80. 32.25ms
81. 29.80ms
82. 34.81ms
83. 32.30ms
84. 32.30ms
85. 34.77ms
86. 32.31ms
87. 34.81ms
88. 32.31ms
89. 34.82ms
90. 32.34ms
91. 32.31ms
92. 32.25ms
93. 34.84ms
94. 34.76ms
95. 29.86ms
96. 47.08ms
97. 47.07ms
98. 44.65ms
99. 44.72ms
100. 39.69ms
101. 44.64ms
102. 47.17ms
103. 42.15ms
104. 44.66ms
105. 44.69ms
106. 39.74ms
107. 42.19ms
108. 39.73ms
109. 42.16ms
110. 44.65ms
111. 47.21ms
112. 44.65ms
113. 44.66ms
114. 44.70ms
115. 39.70ms
116. 47.16ms
117. 44.65ms
118. 42.16ms
119. 47.18ms
120. 42.20ms
121. 39.74ms
122. 47.18ms
123. 44.64ms
124. 44.64ms
125. 44.68ms
126. 44.68ms
127. 42.20ms
128. 54.52ms
Graphics only: 17.88ms (93.82G pixels/s)
Graphics + compute:
1. 27.69ms (60.59G pixels/s)
2. 27.62ms (60.74G pixels/s)
3. 27.76ms (60.45G pixels/s)
4. 27.68ms (60.61G pixels/s)
5. 27.67ms (60.64G pixels/s)
6. 27.76ms (60.44G pixels/s)
7. 27.79ms (60.37G pixels/s)
8. 27.82ms (60.31G pixels/s)
9. 27.75ms (60.46G pixels/s)
10. 27.76ms (60.44G pixels/s)
11. 27.75ms (60.46G pixels/s)
12. 27.74ms (60.49G pixels/s)
13. 27.76ms (60.44G pixels/s)
14. 27.72ms (60.51G pixels/s)
15. 27.57ms (60.85G pixels/s)
16. 27.78ms (60.39G pixels/s)
17. 27.70ms (60.56G pixels/s)
18. 27.75ms (60.45G pixels/s)
19. 27.77ms (60.41G pixels/s)
20. 27.82ms (60.31G pixels/s)
21. 27.79ms (60.36G pixels/s)
22. 27.70ms (60.57G pixels/s)
23. 27.80ms (60.36G pixels/s)
24. 27.81ms (60.33G pixels/s)
25. 27.74ms (60.49G pixels/s)
26. 27.78ms (60.39G pixels/s)
27. 27.73ms (60.49G pixels/s)
28. 27.70ms (60.56G pixels/s)
29. 27.82ms (60.31G pixels/s)
30. 27.80ms (60.34G pixels/s)
31. 27.77ms (60.41G pixels/s)
32. 37.61ms (44.61G pixels/s)
33. 40.10ms (41.84G pixels/s)
34. 37.69ms (44.51G pixels/s)
35. 37.72ms (44.48G pixels/s)
36. 40.20ms (41.74G pixels/s)
37. 40.11ms (41.83G pixels/s)
38. 37.74ms (44.45G pixels/s)
39. 37.72ms (44.48G pixels/s)
40. 40.19ms (41.75G pixels/s)
41. 40.17ms (41.77G pixels/s)
42. 37.65ms (44.56G pixels/s)
43. 40.19ms (41.75G pixels/s)
44. 37.66ms (44.55G pixels/s)
45. 40.17ms (41.77G pixels/s)
46. 40.25ms (41.68G pixels/s)
47. 40.14ms (41.80G pixels/s)
48. 40.18ms (41.76G pixels/s)
49. 37.74ms (44.46G pixels/s)
50. 40.11ms (41.82G pixels/s)
51. 40.13ms (41.81G pixels/s)
52. 40.25ms (41.68G pixels/s)
53. 37.64ms (44.57G pixels/s)
54. 37.59ms (44.64G pixels/s)
55. 40.24ms (41.70G pixels/s)
56. 40.20ms (41.73G pixels/s)
57. 37.68ms (44.53G pixels/s)
58. 37.75ms (44.45G pixels/s)
59. 40.19ms (41.74G pixels/s)
60. 37.72ms (44.48G pixels/s)
61. 37.68ms (44.52G pixels/s)
62. 37.77ms (44.42G pixels/s)
63. 40.15ms (41.78G pixels/s)
64. 49.98ms (33.56G pixels/s)
65. 47.56ms (35.28G pixels/s)
66. 50.09ms (33.49G pixels/s)
67. 47.61ms (35.24G pixels/s)
68. 47.61ms (35.24G pixels/s)
69. 47.55ms (35.28G pixels/s)
70. 52.59ms (31.90G pixels/s)
71. 47.62ms (35.23G pixels/s)
72. 50.18ms (33.43G pixels/s)
73. 47.57ms (35.27G pixels/s)
74. 47.56ms (35.28G pixels/s)
75. 47.67ms (35.19G pixels/s)
76. 49.99ms (33.56G pixels/s)
77. 50.14ms (33.46G pixels/s)
78. 47.67ms (35.19G pixels/s)
79. 47.59ms (35.25G pixels/s)
80. 47.71ms (35.16G pixels/s)
81. 47.63ms (35.22G pixels/s)
82. 50.09ms (33.49G pixels/s)
83. 50.05ms (33.52G pixels/s)
84. 47.60ms (35.25G pixels/s)
85. 52.61ms (31.89G pixels/s)
86. 47.53ms (35.30G pixels/s)
87. 50.08ms (33.50G pixels/s)
88. 50.07ms (33.51G pixels/s)
89. 47.47ms (35.34G pixels/s)
90. 50.09ms (33.50G pixels/s)
91. 47.63ms (35.22G pixels/s)
92. 47.59ms (35.25G pixels/s)
93. 52.59ms (31.90G pixels/s)
94. 47.59ms (35.26G pixels/s)
95. 47.72ms (35.16G pixels/s)
96. 59.85ms (28.03G pixels/s)
97. 57.36ms (29.25G pixels/s)
98. 57.44ms (29.21G pixels/s)
99. 57.50ms (29.18G pixels/s)
100. 57.53ms (29.16G pixels/s)
101. 60.01ms (27.96G pixels/s)
102. 60.05ms (27.94G pixels/s)
103. 57.53ms (29.16G pixels/s)
104. 57.54ms (29.16G pixels/s)
105. 59.90ms (28.01G pixels/s)
106. 60.04ms (27.94G pixels/s)
107. 59.97ms (27.98G pixels/s)
108. 57.64ms (29.11G pixels/s)
109. 57.41ms (29.22G pixels/s)
110. 60.07ms (27.93G pixels/s)
111. 57.50ms (29.18G pixels/s)
112. 57.53ms (29.16G pixels/s)
113. 60.01ms (27.96G pixels/s)
114. 59.98ms (27.97G pixels/s)
115. 57.46ms (29.20G pixels/s)
116. 57.54ms (29.16G pixels/s)
117. 57.49ms (29.18G pixels/s)
118. 57.53ms (29.16G pixels/s)
119. 62.53ms (26.83G pixels/s)
120. 60.00ms (27.96G pixels/s)
121. 62.51ms (26.84G pixels/s)
122. 60.01ms (27.96G pixels/s)
123. 60.03ms (27.95G pixels/s)
124. 57.46ms (29.20G pixels/s)
125. 57.44ms (29.21G pixels/s)
126. 57.51ms (29.18G pixels/s)
127. 57.54ms (29.16G pixels/s)
128. 72.22ms (23.23G pixels/s)
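
For anyone who wants to eyeball these logs quickly, here is a rough sketch of one way to read them. It is my own helper, not part of the benchmark tool. It takes the first "Compute only" line, the "Graphics only" line and the first "Graphics + compute" line from each of the three cards above, and checks whether the combined time sits closer to the sum of the two (work running back to back) or to the larger of the two (work overlapping). The `classify` function and the two labels are made up for illustration, and a single data point per card is obviously a crude test.

```python
# Timings in ms, copied from line 1 of each block in the logs above.
results = {
    "750 Ti":     {"compute": 11.92, "graphics": 106.67, "both": 117.15},
    "Radeon 290": {"compute": 52.71, "graphics": 26.25,  "both": 53.32},
    "GTX 980 Ti": {"compute": 11.63, "graphics": 17.88,  "both": 27.69},
}


def classify(compute, graphics, both):
    """Label a combined timing by whichever simple model it is closer to."""
    serial = compute + graphics          # one workload after the other
    overlapped = max(compute, graphics)  # both workloads fully in parallel
    if abs(both - serial) < abs(both - overlapped):
        return "serial-like"
    return "overlapped-like"


for card, t in results.items():
    label = classify(t["compute"], t["graphics"], t["both"])
    print(f"{card}: {t['both']:.2f}ms combined, "
          f"sum {t['compute'] + t['graphics']:.2f}ms, "
          f"max {max(t['compute'], t['graphics']):.2f}ms -> {label}")
```

This only says which simple model the numbers resemble. It does not explain why, and driver or test-tool behaviour could change the picture.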

[Attached image: pJqBBDS.png]
 