DX12 Performance Discussion And Analysis Thread

https://www.reddit.com/r/pcgaming/comments/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/

Err, I think Hallock doesn't understand what this little program does, well, at least parts of it, lol. He is misrepresenting the single command list to prove his points :)

This is almost like a soap opera! With tech

Is he?

E.g.:
25ms Graphics
25ms Compute

With Async Compute enabled, the combined Graphics + Compute task should be completed in.... ?

With Async Compute disabled, the combined Graphics + Compute task should be completed in....?
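
For reference, the idealized arithmetic behind those two questions can be sketched like this. It assumes the two tasks share no execution resources, so perfect overlap costs only as much as the longer task. Real hardware will usually land somewhere between the two bounds. The function names are made up for illustration.

```python
def serial_time_ms(graphics_ms: float, compute_ms: float) -> float:
    """Upper bound: the two tasks run back to back with no overlap."""
    return graphics_ms + compute_ms


def ideal_async_time_ms(graphics_ms: float, compute_ms: float) -> float:
    """Lower bound: the shorter task is completely hidden behind the longer one."""
    return max(graphics_ms, compute_ms)


# The 25ms + 25ms example from the post above.
graphics_ms = 25.0
compute_ms = 25.0
print("serial (async disabled):", serial_time_ms(graphics_ms, compute_ms), "ms")
print("ideal overlap (async enabled):", ideal_async_time_ms(graphics_ms, compute_ms), "ms")
```

So the expected answers are about 50ms without overlap and, at best, about 25ms with it.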
 
When he was talking about the first statement, he was referring to the first test. The second test doesn't show that at all:

Graphics + compute is always lower than the sum of the two run separately.
Titan X
Compute
38.26ms
32.95ms
35.61ms
32.91ms
32.93ms
35.59ms
32.93ms

TitanX
Graphics
18.17ms

TitanX
Compute + Graphics

28.96ms (57.93G pixels/s)
29.01ms (57.84G pixels/s)
29.14ms (57.57G pixels/s)
29.11ms (57.64G pixels/s)
29.13ms (57.59G pixels/s)
29.07ms (57.71G pixels/s)
29.11ms (57.64G pixels/s)

If it were running serially, it should be above 50ms.

And this is why I said parts of it.
 
Hey guys, someone help me out here.

I don't quite understand the difference between the async+compute test set and the async+compute(single commandlist) test set.

If someone could help me understand that, I can think of a way to properly visualize the data on the chart page I made.

For now, the new test sets have been added and are highlighted in green where "async+compute(single commandlist)" data is available, but the chart does not present that data yet.

ceoG5Mh.png
 
R9 280 OC / Catalyst 15.8 beta

Compute only:
1. 49.14ms
2. 49.13ms
3. 49.16ms
4. 49.14ms
5. 49.13ms
6. 49.16ms
7. 49.14ms
8. 49.13ms
9. 49.15ms
10. 49.16ms
11. 49.15ms
12. 49.10ms
13. 49.15ms
14. 49.15ms
15. 49.14ms
16. 49.14ms
17. 49.14ms
18. 49.45ms
19. 49.14ms
20. 49.14ms
21. 49.12ms
22. 49.13ms
23. 49.16ms
24. 49.14ms
25. 49.10ms
26. 49.11ms
27. 49.13ms
28. 49.16ms
29. 49.15ms
30. 49.12ms
31. 49.14ms
32. 49.14ms
33. 49.12ms
34. 49.15ms
35. 49.12ms
36. 49.14ms
37. 49.14ms
38. 49.12ms
39. 49.13ms
40. 49.13ms
41. 49.10ms
42. 49.15ms
43. 49.17ms
44. 49.15ms
45. 49.15ms
46. 49.15ms
47. 49.12ms
48. 49.39ms
49. 49.15ms
50. 49.13ms
51. 49.14ms
52. 49.14ms
53. 49.15ms
54. 49.16ms
55. 49.14ms
56. 49.14ms
57. 49.16ms
58. 49.16ms
59. 49.14ms
60. 49.14ms
61. 49.14ms
62. 49.14ms
63. 49.15ms
64. 49.12ms
65. 49.16ms
66. 49.14ms
67. 49.14ms
68. 49.15ms
69. 49.17ms
70. 49.16ms
71. 49.15ms
72. 49.13ms
73. 49.17ms
74. 49.13ms
75. 49.14ms
76. 49.14ms
77. 49.18ms
78. 49.13ms
79. 49.12ms
80. 49.13ms
81. 49.13ms
82. 49.10ms
83. 49.13ms
84. 49.15ms
85. 49.14ms
86. 49.15ms
87. 49.15ms
88. 49.15ms
89. 49.14ms
90. 49.14ms
91. 49.13ms
92. 49.15ms
93. 49.13ms
94. 49.15ms
95. 49.17ms
96. 49.15ms
97. 49.13ms
98. 49.16ms
99. 49.13ms
100. 49.14ms
101. 49.14ms
102. 49.16ms
103. 49.16ms
104. 49.14ms
105. 49.16ms
106. 49.14ms
107. 49.28ms
108. 49.15ms
109. 49.14ms
110. 49.15ms
111. 49.13ms
112. 49.17ms
113. 49.13ms
114. 49.13ms
115. 49.14ms
116. 49.15ms
117. 49.15ms
118. 49.16ms
119. 49.11ms
120. 49.13ms
121. 49.12ms
122. 49.15ms
123. 49.14ms
124. 49.15ms
125. 49.14ms
126. 49.15ms
127. 49.15ms
128. 49.14ms
Graphics only: 47.23ms (35.52G pixels/s)
Graphics + compute:
1. 49.28ms (34.04G pixels/s)
2. 49.30ms (34.03G pixels/s)
3. 49.31ms (34.02G pixels/s)
4. 49.34ms (34.00G pixels/s)
5. 49.33ms (34.01G pixels/s)
6. 49.36ms (33.99G pixels/s)
7. 49.30ms (34.03G pixels/s)
8. 57.71ms (29.07G pixels/s)
9. 49.32ms (34.02G pixels/s)
10. 49.33ms (34.01G pixels/s)
11. 49.31ms (34.03G pixels/s)
12. 49.30ms (34.03G pixels/s)
13. 49.32ms (34.02G pixels/s)
14. 49.30ms (34.03G pixels/s)
15. 49.32ms (34.01G pixels/s)
16. 49.32ms (34.02G pixels/s)
17. 49.35ms (34.00G pixels/s)
18. 49.33ms (34.01G pixels/s)
19. 49.31ms (34.02G pixels/s)
20. 49.34ms (34.00G pixels/s)
21. 49.33ms (34.01G pixels/s)
22. 49.37ms (33.99G pixels/s)
23. 49.31ms (34.03G pixels/s)
24. 49.32ms (34.02G pixels/s)
25. 49.35ms (34.00G pixels/s)
26. 49.36ms (33.99G pixels/s)
27. 49.34ms (34.00G pixels/s)
28. 49.34ms (34.00G pixels/s)
29. 49.34ms (34.00G pixels/s)
30. 49.33ms (34.01G pixels/s)
31. 49.35ms (33.99G pixels/s)
32. 49.36ms (33.99G pixels/s)
33. 49.29ms (34.04G pixels/s)
34. 49.36ms (33.99G pixels/s)
35. 49.35ms (33.99G pixels/s)
36. 49.32ms (34.02G pixels/s)
37. 49.46ms (33.92G pixels/s)
38. 62.19ms (26.98G pixels/s)
39. 49.34ms (34.01G pixels/s)
40. 49.33ms (34.01G pixels/s)
41. 49.33ms (34.01G pixels/s)
42. 49.32ms (34.02G pixels/s)
43. 49.33ms (34.01G pixels/s)
44. 49.33ms (34.01G pixels/s)
45. 49.35ms (34.00G pixels/s)
46. 49.35ms (33.99G pixels/s)
47. 49.36ms (33.99G pixels/s)
48. 49.33ms (34.01G pixels/s)
49. 49.36ms (33.99G pixels/s)
50. 49.33ms (34.01G pixels/s)
51. 49.37ms (33.99G pixels/s)
52. 49.35ms (34.00G pixels/s)
53. 49.35ms (34.00G pixels/s)
54. 49.37ms (33.98G pixels/s)
55. 49.36ms (33.99G pixels/s)
56. 49.36ms (33.99G pixels/s)
57. 49.35ms (34.00G pixels/s)
58. 49.34ms (34.00G pixels/s)
59. 49.33ms (34.01G pixels/s)
60. 49.35ms (34.00G pixels/s)
61. 49.37ms (33.99G pixels/s)
62. 49.34ms (34.00G pixels/s)
63. 49.37ms (33.98G pixels/s)
64. 49.34ms (34.00G pixels/s)
65. 49.34ms (34.00G pixels/s)
66. 49.43ms (33.94G pixels/s)
67. 49.34ms (34.00G pixels/s)
68. 49.34ms (34.00G pixels/s)
69. 49.36ms (33.99G pixels/s)
70. 49.34ms (34.00G pixels/s)
71. 49.38ms (33.97G pixels/s)
72. 49.36ms (33.99G pixels/s)
73. 49.34ms (34.00G pixels/s)
74. 49.35ms (34.00G pixels/s)
75. 49.39ms (33.97G pixels/s)
76. 49.35ms (34.00G pixels/s)
77. 49.34ms (34.00G pixels/s)
78. 49.36ms (33.99G pixels/s)
79. 49.37ms (33.98G pixels/s)
80. 49.36ms (33.99G pixels/s)
81. 49.30ms (34.03G pixels/s)
82. 49.39ms (33.97G pixels/s)
83. 49.33ms (34.01G pixels/s)
84. 49.35ms (34.00G pixels/s)
85. 49.36ms (33.99G pixels/s)
86. 49.33ms (34.01G pixels/s)
87. 49.36ms (33.99G pixels/s)
88. 49.37ms (33.98G pixels/s)
89. 49.35ms (33.99G pixels/s)
90. 49.38ms (33.98G pixels/s)
91. 49.36ms (33.99G pixels/s)
92. 49.35ms (33.99G pixels/s)
93. 49.34ms (34.00G pixels/s)
94. 49.36ms (33.99G pixels/s)
95. 49.33ms (34.01G pixels/s)
96. 61.37ms (27.34G pixels/s)
97. 49.34ms (34.00G pixels/s)
98. 49.37ms (33.98G pixels/s)
99. 49.37ms (33.98G pixels/s)
100. 49.38ms (33.98G pixels/s)
101. 49.35ms (34.00G pixels/s)
102. 49.34ms (34.00G pixels/s)
103. 49.34ms (34.01G pixels/s)
104. 49.38ms (33.98G pixels/s)
105. 49.35ms (34.00G pixels/s)
106. 49.35ms (34.00G pixels/s)
107. 49.34ms (34.00G pixels/s)
108. 49.39ms (33.97G pixels/s)
109. 49.35ms (34.00G pixels/s)
110. 49.35ms (34.00G pixels/s)
111. 49.36ms (33.99G pixels/s)
112. 49.34ms (34.00G pixels/s)
113. 49.39ms (33.97G pixels/s)
114. 49.39ms (33.97G pixels/s)
115. 49.44ms (33.93G pixels/s)
116. 49.43ms (33.94G pixels/s)
117. 49.44ms (33.93G pixels/s)
118. 49.42ms (33.95G pixels/s)
119. 49.45ms (33.93G pixels/s)
120. 49.46ms (33.92G pixels/s)
121. 49.42ms (33.95G pixels/s)
122. 49.45ms (33.93G pixels/s)
123. 49.46ms (33.92G pixels/s)
124. 49.46ms (33.92G pixels/s)
125. 49.72ms (33.75G pixels/s)
126. 49.45ms (33.93G pixels/s)
127. 49.42ms (33.95G pixels/s)
128. 49.47ms (33.91G pixels/s)
 
the single command list is forced synchronized
From the test results, I'd say it's not as forced as you might think. AMD at least is obviously capable of executing the compute portion of that test in parallel, although the graphics portion does appear to be completely separated.

Even if you're assigning the same UAV to every dispatch, AMD does have a way of implementing GL_INTEL_fragment_shader_ordering, which performs (memory-visible) serialization. So they could be running the bulk calculations in parallel and arranging for the writes to happen serially after the computations are done, since the shader never reads a value from the UAV. You should see a more genuinely serial result if the single commandlist shaders depended on the results of the previous invocations.
 
Thanks, did this. I was then able to finish without a driver crash. But I had to restart, and now the performance seems different from my previous runs: faster. The GPU now stays at 100% most of the time in the "Graphics, compute single commandlist" part, instead of constantly switching between 100% and 0% as it also did on the 970. I don't know why it runs better now...

980TI, 355.82 no crash:

Compute only:1. 5.67ms ~ 512. 76.11ms
Graphics only: 16.77ms (100.06G pixels/s)
Graphics + compute: 1. 21.15ms (79.34G pixels/s) ~ 512. 97.38ms (17.23G pixels/s)
Graphics, compute single commandlist: 1. 20.70ms (81.05G pixels/s) ~ 512. 2294.69ms (0.73G pixels/s)

So that's the 2nd test on a 980Ti,

1st:
Compute: 5.67ms
Graphics: 16.77ms

Graphics + Compute: 21.15ms
Graphics + Compute (Single Commandlist): 20.70ms

And for 512th:
Compute: 76.11ms
Graphics: 16.77ms
Graphics + Compute: 97.38ms
Graphics + Compute (Single Commandlist): 2294.69ms

---------------------------------------

In both the 1st and the 512th run, async mode roughly adds the two times together. Single commandlist mode went nuts.

Serial:
A (Compute) + B (Graphics) = A + B

Async:
A + B = A OR B

Right? Or is that not how we are meant to interpret the data of this test?
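
One way to put a number on that reading, as a rough sketch: treat A + B as the serial baseline and ask how much of the shorter task was hidden. The `overlap_fraction` helper is hypothetical, not something the benchmark reports. 1.0 would mean the shorter task was fully hidden (the combined time equals the longer task), 0.0 means purely additive, and a negative value means slower than simply running both back to back. The inputs are the 980 Ti figures above and the first R9 280 run posted earlier in the thread.

```python
def overlap_fraction(compute_ms: float, graphics_ms: float, combined_ms: float) -> float:
    """Share of the shorter task that was hidden by running both together.

    1.0 -> combined time equals the longer task alone (ideal overlap)
    0.0 -> combined time equals compute + graphics (serial behaviour)
    <0  -> combined run was slower than the serial sum
    """
    saved_ms = (compute_ms + graphics_ms) - combined_ms
    return saved_ms / min(compute_ms, graphics_ms)


# (compute only, graphics only, graphics + compute), all in ms, from the thread.
runs = {
    "980 Ti, 1st": (5.67, 16.77, 21.15),
    "980 Ti, 512th": (76.11, 16.77, 97.38),
    "R9 280 OC, 1st": (49.14, 47.23, 49.28),
}

for name, (compute_ms, graphics_ms, combined_ms) in runs.items():
    fraction = overlap_fraction(compute_ms, graphics_ms, combined_ms)
    print(f"{name}: overlap fraction {fraction:.2f}")
```

On these numbers the first 980 Ti run comes out at roughly 0.23, the 512th run is negative, and the R9 280 run is close to 1. A single ratio like this says nothing about why the overlap happens or fails, only how the totals compare.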
 
I just ran the updated benchmark... File attached...
 

Attachments

  • r9-280OC-perf.zip
    57.9 KB · Views: 14
How many compute queues are being set up with the latest test version?
I recall that it was a large number initially, but is it now one graphics and one compute for the mixed case?
Two queues (COMPUTE and DIRECT). The "graphics, compute single commandlist" test packs everything into one command list (the compute dispatches plus 100 draw calls) and executes it on the DIRECT queue.

MDolenc, did you use the disable timeout flag? I wonder if it completely disables it or just extends the timeout.
I knew I saw that somewhere... No, it's not used at the moment.

P.S.: Numbers in [] are GPU timestamps from the beginning of the whole run to the end of the n-th dispatch, converted to ms. Fillrate in {} is calculated from the GPU timestamps taken before the clear and after all the draws.
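
As a rough illustration of that conversion (not the test's actual code): GPU timestamps are tick counts, so turning them into milliseconds needs the counter frequency, which in D3D12 is what ID3D12CommandQueue::GetTimestampFrequency reports in ticks per second. The helper names, the 10 MHz frequency and the tick values below are made up. The pixel count is an assumption back-derived from the R9 280 log earlier in the thread (35.52G pixels/s over 47.23ms is about 1.68 billion pixels); the thread does not state it directly.

```python
def ticks_to_ms(start_ticks: int, end_ticks: int, ticks_per_second: int) -> float:
    """Convert a pair of GPU timestamp ticks into elapsed milliseconds."""
    return (end_ticks - start_ticks) * 1000.0 / ticks_per_second


def fillrate_gpixels_per_s(pixels_drawn: int, elapsed_ms: float) -> float:
    """Fillrate in G pixels/s for a given pixel count and elapsed time in ms."""
    return pixels_drawn / (elapsed_ms / 1000.0) / 1e9


# Hypothetical values: a 10 MHz timestamp counter and two tick readings.
TICKS_PER_SECOND = 10_000_000
before_clear_ticks = 0
after_draws_ticks = 472_300

# Assumed total pixels written by all draws, inferred from the posted logs.
ASSUMED_PIXELS = 1_677_721_600

elapsed_ms = ticks_to_ms(before_clear_ticks, after_draws_ticks, TICKS_PER_SECOND)
fillrate = fillrate_gpixels_per_s(ASSUMED_PIXELS, elapsed_ms)
print(f"{elapsed_ms:.2f}ms ({fillrate:.2f}G pixels/s)")
```

With these inputs the result lines up with the "Graphics only" line of the R9 280 log, which is only a consistency check on the assumed pixel count, not confirmation of how the test computes it.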
 
Log from a 7950 (OC'd @ 1 GHz, with the latest driver, 15.8 beta)
 

Attachments

  • perf.zip
    72.8 KB · Views: 8
When he was talking about the first statement, he was referring to the first test. The second test doesn't show that at all:

Graphics + compute is always lower than the sum of the two run separately.
Titan X
Compute
38.26ms
32.95ms
35.61ms
32.91ms
32.93ms
35.59ms
32.93ms

TitanX
Graphics
18.17ms

TitanX
Compute + Graphics

28.96ms (57.93G pixels/s)
29.01ms (57.84G pixels/s)
29.14ms (57.57G pixels/s)
29.11ms (57.64G pixels/s)
29.13ms (57.59G pixels/s)
29.07ms (57.71G pixels/s)
29.11ms (57.64G pixels/s)

If it were running serially, it should be above 50ms.

And this is why I said parts of it.

Where is this data from, the Titan X SLI user?

What's going on here? It seems to be splitting the workload across the 2 GPUs, whereas the single 980 Ti config shows the same behavior as before: additive, the sum of compute + graphics.
 
Got strange results with the last version
 

Attachments

  • perf GTX 960 OC 353.63-AsyncComputeV2.zip
    1.3 KB · Views: 9
  • perf R9 280X OC 15.7.1-AsyncComputeV2.zip
    1 KB · Views: 4
  • perf GTX 960 OC 353.62-AsyncComputeV3.zip
    50.2 KB · Views: 10
  • perf R9 280X OC 15.7.1-AsyncComputeV3.zip
    52.6 KB · Views: 5
Here are my results from my single 980 Ti

Compute only:
1. 6.79ms

Graphics only: 16.21ms

Graphics + compute:
1. 20.22ms

Graphics, compute single commandlist:
1. 20.04ms

Your result matches the others. Running Graphics + Compute gives an additive result, close to the sum of compute + graphics.

Also, your (forced) single commandlist results show ever-rising timings, as we've seen with the others, up to 2117.00ms at the 281st!

Is this what Oxide is talking about? When they try to force direct async mode, it would mess up.
 
the single command list is forced synchronized

Hmm, I've added the data into the tooltip and plotted it on the chart as a single dot. It should be apparent just by looking at it.

I've yet to add a label on the y-axis though :T

rZAQeRL.png
 
So, according to this app, Titan X SLI can do compute + graphics async faster than compute + graphics serial, but a single GPU (both 980 Ti results are the same so far) cannot.

So it seems it's offloading compute to GPU 1 and graphics to GPU 2?
 
Maybe I'm reading the graphs wrong... are the GCN cards doing better with a single commandlist?
 