DX12 Performance Discussion And Analysis Thread

The graphs made by ka_rf are very nice.

The graph actually shows the compute-only line (blue) diverging from the overall trend and slowing down in that region. If async compute were occurring, the orange line would have shifted downwards from its trend; it hasn't. Instead, the blue line has wobbled into slowness. The blue line returns to its trend near 360.

There's no async here.


Could that be the start of the single command list run?

I think that happened halfway through the single command list run. The GPU load starts going from 0 to 100% when the single command list starts.
 
So with TDR on, my graphs are consistent with the others' TDR-on graphs.

But with TDR off they aren't consistent with the others? May I ask everyone who contributed graphs with TDR off: did you still get the 0-100-0-100 GPU load switching with it off? If yes, I suspect your TDR wasn't actually off and you were still hitting the 2-second TDR timeout, even if the test finished successfully without the driver crashing.

Can I get specific confirmation that Forceman's TDR-off graph (he was the only one able to reproduce TDR off without the 0-100-0 GPU load switch) also doesn't match mine?
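
One way to check whether TDR is really off is to read the registry values Windows uses for it. The sketch below assumes the documented `TdrLevel` and `TdrDelay` values under `HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers`, with `TdrLevel` 0 meaning detection is disabled and a default delay of 2 seconds when nothing is set. The helper names `describe_tdr` and `read_tdr_values` are made up for illustration, and a changed value may not take effect until after a reboot.

```python
import sys

# Registry key where Windows keeps the TDR settings (assumed per Microsoft's docs).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def describe_tdr(level, delay):
    """Summarise TDR state from raw registry values (None = value not set)."""
    # Assumed defaults when the values are absent: TdrLevel 3, TdrDelay 2 s.
    level = 3 if level is None else level
    delay = 2 if delay is None else delay
    if level == 0:
        return "TDR detection disabled"
    return f"TDR enabled, timeout {delay} s"


def read_tdr_values():
    """Read TdrLevel and TdrDelay from the registry. Windows only."""
    import winreg  # imported here so the module still loads on other platforms

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
        for name in ("TdrLevel", "TdrDelay"):
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                values[name] = None  # value not present, default applies
    return values["TdrLevel"], values["TdrDelay"]


if __name__ == "__main__" and sys.platform == "win32":
    print(describe_tdr(*read_tdr_values()))
```

If this reports TDR as enabled on a machine where it was supposedly turned off, that would fit the suspicion above that the 2-second timeout was still in play.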

Pharma, thanks for the heads up on the edit count, good to know.

PadyEos, do you have any third-party software installed, such as MSI Afterburner or anything comparable (even if CPU-oriented), or is the system pretty clean in that respect?
I guess most people have some kind of third-party performance software installed, most likely MSI Afterburner. A correlation is very unlikely, I know, but it's a variable that can't be ignored.
Cheers
 
Compute only:
1. 9.76ms
2. 9.75ms
3. 9.75ms
4. 9.75ms
5. 9.13ms
6. 8.87ms
7. 8.87ms
8. 8.87ms
9. 8.81ms
10. 8.49ms
11. 8.49ms
12. 8.49ms
13. 8.51ms
14. 8.48ms
15. 8.49ms
16. 8.51ms
17. 8.48ms
18. 8.49ms
19. 8.51ms
20. 8.52ms
21. 8.50ms
22. 8.52ms
23. 8.48ms
24. 8.49ms
25. 8.49ms
26. 8.49ms
27. 8.48ms
28. 8.51ms
29. 9.18ms
30. 8.53ms
31. 8.50ms
32. 16.92ms
33. 19.02ms
34. 19.02ms
35. 19.02ms
36. 19.02ms
37. 19.03ms
38. 19.02ms
39. 19.02ms
40. 19.02ms
41. 19.02ms
42. 19.02ms
43. 21.17ms
44. 19.02ms
45. 19.02ms
46. 19.02ms
47. 19.02ms
48. 19.02ms
49. 19.02ms
50. 21.16ms
51. 19.02ms
52. 19.02ms
53. 19.02ms
54. 19.02ms
55. 19.02ms
56. 19.03ms
57. 19.02ms
58. 19.02ms
59. 19.02ms
60. 19.02ms
61. 19.02ms
62. 21.18ms
63. 19.03ms
64. 27.43ms
65. 27.43ms
66. 27.43ms
67. 29.59ms
68. 27.44ms
69. 27.43ms
70. 27.43ms
71. 27.44ms
72. 29.57ms
73. 27.43ms
74. 27.43ms
75. 27.43ms
76. 29.59ms
77. 27.43ms
78. 27.43ms
79. 27.43ms
80. 29.59ms
81. 27.44ms
82. 27.43ms
83. 27.43ms
84. 27.44ms
85. 27.45ms
86. 27.44ms
87. 27.44ms
88. 27.44ms
89. 29.59ms
90. 27.44ms
91. 27.44ms
92. 27.50ms
93. 27.44ms
94. 29.58ms
95. 27.44ms
96. 35.84ms
97. 35.84ms
98. 37.98ms
99. 37.97ms
100. 35.84ms
101. 35.87ms
102. 35.85ms
103. 35.85ms
104. 37.99ms
105. 35.85ms
106. 35.85ms
107. 35.85ms
108. 37.99ms
109. 35.85ms
110. 35.85ms
111. 38.00ms
112. 35.85ms
113. 35.85ms
114. 38.00ms
115. 35.85ms
116. 35.85ms
117. 35.85ms
118. 35.87ms
119. 35.85ms
120. 35.85ms
121. 38.00ms
122. 35.85ms
123. 35.85ms
124. 35.85ms
125. 37.99ms
126. 35.85ms
127. 35.85ms
128. 46.40ms
Graphics only: 14.24ms (117.80G pixels/s)
Graphics + compute:
1. 22.65ms (74.07G pixels/s)
2. 22.53ms (74.46G pixels/s)
3. 22.46ms (74.69G pixels/s)
4. 22.53ms (74.47G pixels/s)
5. 22.61ms (74.20G pixels/s)
6. 22.42ms (74.83G pixels/s)
7. 22.56ms (74.37G pixels/s)
8. 22.55ms (74.40G pixels/s)
9. 22.48ms (74.64G pixels/s)
10. 22.42ms (74.83G pixels/s)
11. 22.59ms (74.28G pixels/s)
12. 22.56ms (74.35G pixels/s)
13. 22.65ms (74.06G pixels/s)
14. 22.41ms (74.86G pixels/s)
15. 22.46ms (74.70G pixels/s)
16. 22.42ms (74.83G pixels/s)
17. 22.57ms (74.34G pixels/s)
18. 22.45ms (74.72G pixels/s)
19. 22.62ms (74.15G pixels/s)
20. 22.63ms (74.14G pixels/s)
21. 22.41ms (74.86G pixels/s)
22. 22.52ms (74.48G pixels/s)
23. 22.55ms (74.41G pixels/s)
24. 22.61ms (74.22G pixels/s)
25. 22.52ms (74.50G pixels/s)
26. 22.49ms (74.61G pixels/s)
27. 22.53ms (74.48G pixels/s)
28. 22.43ms (74.79G pixels/s)
29. 22.58ms (74.29G pixels/s)
30. 22.62ms (74.16G pixels/s)
31. 22.57ms (74.32G pixels/s)
32. 30.86ms (54.37G pixels/s)
33. 30.86ms (54.37G pixels/s)
34. 30.99ms (54.13G pixels/s)
35. 32.97ms (50.89G pixels/s)
36. 31.00ms (54.12G pixels/s)
37. 30.99ms (54.14G pixels/s)
38. 30.85ms (54.39G pixels/s)
39. 30.83ms (54.43G pixels/s)
40. 31.11ms (53.93G pixels/s)
41. 30.95ms (54.21G pixels/s)
42. 30.89ms (54.31G pixels/s)
43. 31.02ms (54.09G pixels/s)
44. 30.92ms (54.25G pixels/s)
45. 30.96ms (54.19G pixels/s)
46. 30.91ms (54.28G pixels/s)
47. 33.06ms (50.75G pixels/s)
48. 31.01ms (54.11G pixels/s)
49. 30.92ms (54.25G pixels/s)
50. 31.01ms (54.11G pixels/s)
51. 30.87ms (54.35G pixels/s)
52. 30.90ms (54.30G pixels/s)
53. 30.89ms (54.31G pixels/s)
54. 31.00ms (54.12G pixels/s)
55. 30.86ms (54.36G pixels/s)
56. 30.99ms (54.14G pixels/s)
57. 30.91ms (54.28G pixels/s)
58. 31.03ms (54.06G pixels/s)
59. 30.99ms (54.13G pixels/s)
60. 30.89ms (54.31G pixels/s)
61. 31.01ms (54.10G pixels/s)
62. 31.00ms (54.12G pixels/s)
63. 32.93ms (50.94G pixels/s)
64. 39.36ms (42.63G pixels/s)
65. 39.26ms (42.73G pixels/s)
66. 39.40ms (42.58G pixels/s)
67. 39.35ms (42.64G pixels/s)
68. 39.36ms (42.62G pixels/s)
69. 39.26ms (42.73G pixels/s)
70. 39.22ms (42.77G pixels/s)
71. 39.35ms (42.64G pixels/s)
72. 39.30ms (42.69G pixels/s)
73. 39.18ms (42.82G pixels/s)
74. 39.23ms (42.76G pixels/s)
75. 41.36ms (40.56G pixels/s)
76. 39.34ms (42.64G pixels/s)
77. 39.39ms (42.59G pixels/s)
78. 39.55ms (42.42G pixels/s)
79. 39.34ms (42.64G pixels/s)
80. 39.38ms (42.61G pixels/s)
81. 39.37ms (42.62G pixels/s)
82. 41.31ms (40.61G pixels/s)
83. 39.38ms (42.60G pixels/s)
84. 39.41ms (42.57G pixels/s)
85. 39.34ms (42.64G pixels/s)
86. 39.36ms (42.62G pixels/s)
87. 39.34ms (42.64G pixels/s)
88. 39.46ms (42.52G pixels/s)
89. 39.18ms (42.82G pixels/s)
90. 39.22ms (42.77G pixels/s)
91. 41.43ms (40.50G pixels/s)
92. 39.38ms (42.60G pixels/s)
93. 39.33ms (42.65G pixels/s)
94. 39.36ms (42.62G pixels/s)
95. 39.23ms (42.77G pixels/s)
96. 47.59ms (35.25G pixels/s)
97. 47.69ms (35.18G pixels/s)
98. 47.77ms (35.12G pixels/s)
99. 47.82ms (35.08G pixels/s)
100. 49.79ms (33.70G pixels/s)
101. 47.82ms (35.09G pixels/s)
102. 47.85ms (35.06G pixels/s)
103. 47.80ms (35.10G pixels/s)
104. 47.78ms (35.12G pixels/s)
105. 49.93ms (33.60G pixels/s)
106. 47.68ms (35.18G pixels/s)
107. 49.93ms (33.60G pixels/s)
108. 47.74ms (35.14G pixels/s)
109. 47.73ms (35.15G pixels/s)
110. 47.72ms (35.16G pixels/s)
111. 47.81ms (35.09G pixels/s)
112. 47.69ms (35.18G pixels/s)
113. 47.75ms (35.14G pixels/s)
114. 47.72ms (35.16G pixels/s)
115. 49.98ms (33.57G pixels/s)
116. 47.80ms (35.10G pixels/s)
117. 49.90ms (33.62G pixels/s)
118. 47.80ms (35.10G pixels/s)
119. 47.76ms (35.13G pixels/s)
120. 47.78ms (35.11G pixels/s)
121. 47.72ms (35.15G pixels/s)
122. 47.78ms (35.11G pixels/s)
123. 49.81ms (33.68G pixels/s)
124. 47.76ms (35.12G pixels/s)
125. 47.83ms (35.07G pixels/s)
126. 47.68ms (35.19G pixels/s)
127. 47.77ms (35.12G pixels/s)
128. 58.32ms (28.77G pixels/s)
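
For anyone who wants to work with a log like the one above rather than eyeball it, here is a minimal parsing sketch. It assumes the layout seen in this post: a section header ending in a colon, numbered rows of the form `N. X.XXms`, and a single `Graphics only:` line. The function name `parse_log` is hypothetical and not part of the test tool.

```python
import re

# "32. 30.86ms (54.37G pixels/s)" or "5. 9.13ms": kernel count, then milliseconds.
ROW_RE = re.compile(r"^(\d+)\.\s+([\d.]+)ms")
# "Graphics only: 14.24ms (117.80G pixels/s)": a single value, not a numbered row.
GFX_RE = re.compile(r"^Graphics only:\s+([\d.]+)ms")


def parse_log(text):
    """Return ({section: {kernel count: ms}}, graphics-only ms or None)."""
    sections, graphics_only, current = {}, None, None
    for raw in text.splitlines():
        line = raw.strip()
        gfx = GFX_RE.match(line)
        if gfx:
            graphics_only = float(gfx.group(1))
            continue
        if line.endswith(":"):
            # A header such as "Compute only:" starts a new section.
            current = sections.setdefault(line[:-1], {})
            continue
        row = ROW_RE.match(line)
        if row and current is not None:
            current[int(row.group(1))] = float(row.group(2))
    return sections, graphics_only


sample = """Compute only:
1. 9.76ms
32. 16.92ms
Graphics only: 14.24ms (117.80G pixels/s)
Graphics + compute:
1. 22.65ms (74.07G pixels/s)
32. 30.86ms (54.37G pixels/s)
"""
sections, graphics_only = parse_log(sample)
```

The pixel-rate figure in parentheses is ignored here, since only the millisecond timings are needed to compare runs.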
 

Think you need to run the updated version.
https://forum.beyond3d.com/posts/1869354/
 
I saw some running the earlier test, so I went back and tried it.

I get 52 ms throughout the whole test.

Edit: will rerun, I forgot I was running "Log.cmd" at the same time.
Edit2: came out the same
 

Attachment: Fury_older_test.zip

OK, so I recorded a session in WPA and opened it in GPUView, and if I'm interpreting it correctly, it may be doing a little async on my 980 Ti? It's hard to tell. It would be very helpful if someone with a Radeon could do this as well so we can compare.
 
I recorded the older test. I ran the newer one first, but its merged file was around 10 GB.
 
GPUView zoomed into the section with graphics and compute, older version.
GPUVIEW_Fury.jpg
 
Actually, I liked your original visualizer better. If everyone agrees this test is basically just useful for testing for the presence of fine-grained compute by measuring the savings offered by running jobs concurrently, your visualizer does a nice job of simplifying that as, "Look for overlap; gaps are bad."

If the overlap is really the most (or only) relevant information we can get from this test, I was wondering whether plotting it all by itself would be useful and further reduce confusion. Could you plot the Faster By as a line graph, and then give the user the ability to multi-select cards to compare savings? Bonus points if you can have buttons that plot averages for different GPU families. ;)

Hmm, I've added the "time saved by async" data into the scatter plot mode, in orange, as can be seen here:

PNMPzvT.png


I don't think a comparison feature is useful in this context. The test tool was never meant to be a benchmarking tool, so comparing numbers to numbers of different cards is pointless and misleading and may just start more pissing contests.

As long as you can see whether the orange values are mostly at 0 or tend to stay close to the blue line, that's enough information for our purpose.

As for plotting multiple charts side by side... it's a web page so just open multiple windows ;)
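
The visualizer's exact formula isn't given in this thread. As a rough sketch, assume "time saved by async" is the serial estimate (compute only plus graphics only) minus the measured graphics + compute time. The function name `time_saved` is hypothetical, and the numbers are taken from the log posted earlier in the thread.

```python
def time_saved(compute_ms, graphics_ms, combined_ms):
    """Serial estimate minus the measured combined time, in milliseconds."""
    return (compute_ms + graphics_ms) - combined_ms


GRAPHICS_ONLY_MS = 14.24

# kernel count -> (compute only ms, graphics + compute ms), from the posted log
runs = {
    1: (9.76, 22.65),
    32: (16.92, 30.86),
    64: (27.43, 39.36),
    128: (46.40, 58.32),
}

for n, (compute_ms, combined_ms) in runs.items():
    saved = time_saved(compute_ms, GRAPHICS_ONLY_MS, combined_ms)
    print(f"{n:>3} kernels: saved {saved:.2f} ms")
```

For that log the savings come out at roughly 0.3 to 2.3 ms against compute times of about 10 to 46 ms, so the points would sit near 0 rather than near the compute line. Under this assumption, a saving close to the shorter of the two workloads would indicate near-complete overlap.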
 

Am I reading this wrong, because it looks like a lot of Maxwell cards save a lot of time using async? And the Furys spend a lot of time saving virtually nothing? And then the 280X saves more time than anything?
 
Capture2.PNG

OK, so a very different picture on the 980 Ti. The way the graph works is that only the block on the bottom is being actively worked on; any blocks stacked on top are queued. Blocks of the same color correspond to the same work.

So in the device context you can see the two separate jobs: the top, longer one is the compute from the test app, and the one below it is the graphics load from the test app. Now here's the weird thing: both green blocks are pushed into the 3D queue, and they don't run asynchronously at all. The graphics runs first, then the compute. So what's that brown stuff in the compute queue? It's DWM.exe, the desktop compositor, completely separate from the test. I'm no graphics programmer, but doesn't that seem backwards? The "compute" job is going into the 3D queue, but the DWM, which I presume would be more graphics-related, is going into the compute queue. The DWM corresponds with the flip, so looking at the Fury graph, it's probably the light blue block that also corresponds with the flip, and there it's in the 3D queue, where I would assume it belongs.

So taking a closer look at a section where only the compute job is running:

Captur3.PNG


The asyncompute.exe compute job comes in 4 separate bursts. The brown DWM compute spike overlaps with the first of a new batch of four. The last three are precisely 36ms. The first is a little longer at 45ms, but the DWM compute spike is 15ms. If they had run serially, the first burst should have taken 36 + 15 = 51ms, so about 6ms of the DWM work appears to be hidden. And this pattern is repeatable: the sum is always larger than the actual run time. So it looks like there may be some asynchronous behavior here... just not with what we expected?

Now maybe someone else can explain why the compute test isn't going into the compute queue, but the thing that IS going into that queue, unless I'm reading this entirely incorrectly, is running asynchronously.
 