DX12 Performance Discussion And Analysis Thread

GTX 750 Ti - Driver Version 10.18.13.5580 (355.80)
Compute only:
1. 12.36ms
2. 12.35ms
3. 12.24ms
4. 11.26ms
5. 11.29ms
6. 11.24ms
7. 11.06ms
8. 10.92ms
9. 10.90ms
10. 10.88ms
11. 10.90ms
12. 10.90ms
13. 10.88ms
14. 10.91ms
15. 10.90ms
16. 10.90ms
17. 21.59ms
18. 21.59ms
19. 21.62ms
20. 21.64ms
21. 21.64ms
22. 21.62ms
23. 21.62ms
24. 21.60ms
25. 21.64ms
26. 21.62ms
27. 21.64ms
28. 21.62ms
29. 21.64ms
30. 21.61ms
31. 21.61ms
32. 32.40ms
33. 37.72ms
34. 35.00ms
35. 40.45ms
36. 35.03ms
37. 35.01ms
38. 34.99ms
39. 37.74ms
40. 37.73ms
41. 35.01ms
42. 37.76ms
43. 35.00ms
44. 35.01ms
45. 37.75ms
46. 35.04ms
47. 37.71ms
48. 43.05ms
49. 48.49ms
50. 45.69ms
51. 51.13ms
52. 45.70ms
53. 48.44ms
54. 48.49ms
55. 45.70ms
56. 48.45ms
57. 48.43ms
58. 48.43ms
59. 51.18ms
60. 45.74ms
61. 51.21ms
62. 48.44ms
63. 48.42ms
64. 59.18ms
65. 56.42ms
66. 61.85ms
67. 56.38ms
68. 64.58ms
69. 56.39ms
70. 59.15ms
71. 56.39ms
72. 61.90ms
73. 56.41ms
74. 56.44ms
75. 59.13ms
76. 59.11ms
77. 59.19ms
78. 56.42ms
79. 56.46ms
80. 69.81ms
81. 67.17ms
82. 67.09ms
83. 69.85ms
84. 72.57ms
85. 69.84ms
86. 72.61ms
87. 67.09ms
88. 72.56ms
89. 67.11ms
90. 69.87ms
91. 72.57ms
92. 69.81ms
93. 72.58ms
94. 67.11ms
95. 72.57ms
96. 77.85ms
97. 80.53ms
98. 85.96ms
99. 77.79ms
100. 80.56ms
101. 85.96ms
102. 77.82ms
103. 80.59ms
104. 83.25ms
105. 77.79ms
106. 80.66ms
107. 80.57ms
108. 80.53ms
109. 80.55ms
110. 80.54ms
111. 80.52ms
112. 93.93ms
113. 91.22ms
114. 88.51ms
115. 91.24ms
116. 91.24ms
117. 93.96ms
118. 91.22ms
119. 91.26ms
120. 93.93ms
121. 93.98ms
122. 91.22ms
123. 91.24ms
124. 91.24ms
125. 88.48ms
126. 91.22ms
127. 91.25ms
128. 104.67ms
Graphics only: 109.40ms (15.34G pixels/s)
Graphics + compute:
1. 120.15ms (13.96G pixels/s)
2. 120.13ms (13.97G pixels/s)
3. 120.07ms (13.97G pixels/s)
4. 120.06ms (13.97G pixels/s)
5. 120.13ms (13.97G pixels/s)
6. 120.11ms (13.97G pixels/s)
7. 120.13ms (13.97G pixels/s)
8. 120.17ms (13.96G pixels/s)
9. 120.07ms (13.97G pixels/s)
10. 120.11ms (13.97G pixels/s)
11. 120.11ms (13.97G pixels/s)
12. 120.10ms (13.97G pixels/s)
13. 120.10ms (13.97G pixels/s)
14. 120.07ms (13.97G pixels/s)
15. 120.12ms (13.97G pixels/s)
16. 120.21ms (13.96G pixels/s)
17. 130.86ms (12.82G pixels/s)
18. 130.89ms (12.82G pixels/s)
19. 130.91ms (12.82G pixels/s)
20. 130.84ms (12.82G pixels/s)
21. 130.82ms (12.82G pixels/s)
22. 130.93ms (12.81G pixels/s)
23. 130.88ms (12.82G pixels/s)
24. 130.82ms (12.82G pixels/s)
25. 130.89ms (12.82G pixels/s)
26. 130.87ms (12.82G pixels/s)
27. 130.90ms (12.82G pixels/s)
28. 130.88ms (12.82G pixels/s)
29. 130.83ms (12.82G pixels/s)
30. 130.84ms (12.82G pixels/s)
31. 130.91ms (12.82G pixels/s)
32. 141.57ms (11.85G pixels/s)
33. 146.99ms (11.41G pixels/s)
34. 144.31ms (11.63G pixels/s)
35. 144.33ms (11.62G pixels/s)
36. 149.63ms (11.21G pixels/s)
37. 144.30ms (11.63G pixels/s)
38. 141.53ms (11.85G pixels/s)
39. 144.24ms (11.63G pixels/s)
40. 147.01ms (11.41G pixels/s)
41. 144.34ms (11.62G pixels/s)
42. 146.92ms (11.42G pixels/s)
43. 144.32ms (11.63G pixels/s)
44. 141.54ms (11.85G pixels/s)
45. 147.10ms (11.41G pixels/s)
46. 148.51ms (11.30G pixels/s)
47. 144.32ms (11.63G pixels/s)
48. 152.27ms (11.02G pixels/s)
49. 157.62ms (10.64G pixels/s)
50. 154.91ms (10.83G pixels/s)
51. 163.31ms (10.27G pixels/s)
52. 158.13ms (10.61G pixels/s)
53. 160.59ms (10.45G pixels/s)
54. 159.35ms (10.53G pixels/s)
55. 158.48ms (10.59G pixels/s)
56. 152.25ms (11.02G pixels/s)
57. 163.20ms (10.28G pixels/s)
58. 158.19ms (10.61G pixels/s)
59. 163.27ms (10.28G pixels/s)
60. 160.40ms (10.46G pixels/s)
61. 154.96ms (10.83G pixels/s)
62. 154.96ms (10.83G pixels/s)
63. 157.68ms (10.64G pixels/s)
64. 165.59ms (10.13G pixels/s)
65. 168.41ms (9.96G pixels/s)
66. 168.42ms (9.96G pixels/s)
67. 165.64ms (10.13G pixels/s)
68. 168.37ms (9.96G pixels/s)
69. 168.42ms (9.96G pixels/s)
70. 162.93ms (10.30G pixels/s)
71. 168.34ms (9.97G pixels/s)
72. 168.38ms (9.96G pixels/s)
73. 165.64ms (10.13G pixels/s)
74. 168.35ms (9.97G pixels/s)
75. 165.64ms (10.13G pixels/s)
76. 165.64ms (10.13G pixels/s)
77. 168.31ms (9.97G pixels/s)
78. 165.64ms (10.13G pixels/s)
79. 162.95ms (10.30G pixels/s)
80. 176.45ms (9.51G pixels/s)
81. 179.07ms (9.37G pixels/s)
82. 173.66ms (9.66G pixels/s)
83. 176.31ms (9.52G pixels/s)
84. 179.10ms (9.37G pixels/s)
85. 173.63ms (9.66G pixels/s)
86. 173.74ms (9.66G pixels/s)
87. 179.12ms (9.37G pixels/s)
88. 176.33ms (9.51G pixels/s)
89. 173.74ms (9.66G pixels/s)
90. 179.11ms (9.37G pixels/s)
91. 176.32ms (9.52G pixels/s)
92. 179.08ms (9.37G pixels/s)
93. 179.09ms (9.37G pixels/s)
94. 173.67ms (9.66G pixels/s)
95. 179.11ms (9.37G pixels/s)
96. 184.39ms (9.10G pixels/s)
97. 184.34ms (9.10G pixels/s)
98. 189.73ms (8.84G pixels/s)
99. 187.17ms (8.96G pixels/s)
100. 187.10ms (8.97G pixels/s)
101. 184.33ms (9.10G pixels/s)
102. 189.73ms (8.84G pixels/s)
103. 187.05ms (8.97G pixels/s)
104. 187.03ms (8.97G pixels/s)
105. 187.07ms (8.97G pixels/s)
106. 203.42ms (8.25G pixels/s)
107. 184.40ms (9.10G pixels/s)
108. 187.04ms (8.97G pixels/s)
109. 187.07ms (8.97G pixels/s)
110. 187.01ms (8.97G pixels/s)
111. 187.09ms (8.97G pixels/s)
112. 200.46ms (8.37G pixels/s)
113. 197.81ms (8.48G pixels/s)
114. 206.55ms (8.12G pixels/s)
115. 197.77ms (8.48G pixels/s)
116. 197.80ms (8.48G pixels/s)
117. 195.02ms (8.60G pixels/s)
118. 203.18ms (8.26G pixels/s)
119. 195.03ms (8.60G pixels/s)
120. 200.51ms (8.37G pixels/s)
121. 197.75ms (8.48G pixels/s)
122. 203.19ms (8.26G pixels/s)
123. 195.04ms (8.60G pixels/s)
124. 200.51ms (8.37G pixels/s)
125. 195.04ms (8.60G pixels/s)
126. 203.19ms (8.26G pixels/s)
127. 200.55ms (8.37G pixels/s)
128. 253.66ms (6.61G pixels/s)
 
Unfortunately, given the current market share, Nvidia not being capable doesn't hurt Nvidia; it hurts AMD, which is supposed to gain from the use of async compute, because it somewhat reduces the incentive to use these techniques. Sorry for the business-related side of things on this technical thread.

Well, for the moment, the technical question is perhaps just whether Nvidia really doesn't support it, or supports it in a different way (context switching), and if so, whether they could enable it fully.

Also, regarding console-to-PC ports: if console devs are already investigating it and planning to use it, it will be pretty funny to see PC devs disable it for AMD GPUs... (And in that case, I already dread the tons of flame threads that will sadly keep us busy in 2016.)

Let's be honest, I'm not sure every engine or game will really benefit from it, and there are other things in DX12 that could benefit both vendors compared to DX11.
 
A 40+ ms launch overhead would indeed make it completely useless: it would cap any game making a dispatch call at 25 fps. Can you try shortening the loop to 1024 and see how it reacts? You have a GCN board, right?
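The 25 fps figure is just the reciprocal of the stall: if every frame has to absorb a fixed t ms before its dispatch completes, the frame rate cannot exceed 1000 / t. A minimal sketch of that arithmetic, assuming the stall is fully serialized with the rest of the frame (the helper names are illustrative, not from the test program):

```python
def max_fps(per_frame_ms: float) -> float:
    """Upper bound on frames per second when each frame takes per_frame_ms."""
    if per_frame_ms <= 0:
        raise ValueError("per_frame_ms must be positive")
    return 1000.0 / per_frame_ms


def max_fps_with_stall(stall_ms: float, other_work_ms: float = 0.0) -> float:
    """Bound when a fixed dispatch stall is added, serially, to the frame's other work."""
    return max_fps(stall_ms + other_work_ms)


if __name__ == "__main__":
    print(max_fps_with_stall(40.0))        # 25.0: the stall alone caps the frame rate
    print(max_fps_with_stall(40.0, 10.0))  # 20.0: any other work lowers the cap further
```

If the stall could overlap with other CPU or GPU work, the bound would be looser; the serial assumption is the worst case the post is describing.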
The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to CPU or wait for the GPU on CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.

To see realistic results, you should benchmark async compute in rendering tasks. For example, render a shadow map while you run a tiled lighting compute shader concurrently (for the previous frame). Output the result to the display instead of waiting on the CPU for compute to finish. For timing, use GPU timestamps, not a CPU timer. CPU-side timing of GPU work results in lots of noise, and even false results, because of driver-related buffering.
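In D3D12, GPU timing usually means recording timestamp queries (`ID3D12GraphicsCommandList::EndQuery` with `D3D12_QUERY_TYPE_TIMESTAMP` into a timestamp `ID3D12QueryHeap`), resolving them with `ResolveQueryData` into a readback buffer, and converting tick deltas with the rate returned by `ID3D12CommandQueue::GetTimestampFrequency`. The D3D12 calls are not shown here; this is only a sketch of the conversion step, assuming the readback buffer holds begin/end pairs in order (a layout chosen for illustration) and using a made-up 10 MHz frequency in the example.

```python
from typing import List, Sequence


def ticks_to_ms(begin: int, end: int, frequency_hz: int) -> float:
    """Elapsed milliseconds between two timestamps sharing the same tick frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    if end < begin:
        raise ValueError("end timestamp precedes begin timestamp")
    return (end - begin) * 1000.0 / frequency_hz


def resolved_intervals_ms(timestamps: Sequence[int], frequency_hz: int) -> List[float]:
    """Interpret a resolved readback buffer laid out as [begin0, end0, begin1, end1, ...]."""
    if len(timestamps) % 2 != 0:
        raise ValueError("expected begin/end pairs")
    return [
        ticks_to_ms(timestamps[i], timestamps[i + 1], frequency_hz)
        for i in range(0, len(timestamps), 2)
    ]


if __name__ == "__main__":
    # Hypothetical values: a 10 MHz timestamp clock and two measured regions.
    ticks = [0, 100_000, 200_000, 450_000]
    print(resolved_intervals_ms(ticks, 10_000_000))  # [10.0, 25.0]
```

Because both timestamps are written by the GPU itself, driver-side buffering on the CPU does not enter the measurement.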
 
Unfortunately, given the current market share, Nvidia not being capable doesn't hurt Nvidia; it hurts AMD, which is supposed to gain from the use of async compute, because it somewhat reduces the incentive to use these techniques. Sorry for the business-related side of things on this technical thread.

Actually, AMD's presence in high-profile development crushes everything else due to GCN being in both Sony and Microsoft consoles (and most probably, next year's Nintendo NX).
Even if AMD's market share on the PC is rather low nowadays, there isn't any honest reason to keep the benefits of console-developed async compute from carrying over to DX12 ports.

We can definitely count on Gameworks to at least try to find a way to keep this from happening, but nVidia won't be able to cover all high-profile console ports with their trojan horse program. Developers with big financial backing (studios from EA, Rockstar, Activision-Blizzard, Microsoft, Valve, etc.) will probably refuse nVidia's attempts when they do their PC ports.

Of course, if you're running a studio that has a fraction of the budget from the big guys (e.g. CD Projekt) or your budget is low because your publisher has been making terrible decisions over the years (e.g. anything Ubisoft lately), then Gameworks seems really nice because it could save a lot of money.
 
It's not even going to be a "port" soon. The Xbox One will run Windows 10 / DX12, and a lot of code will be directly transferable between the two. Given the Xbox One's GPU configuration, I would imagine a lot of performance optimization will go into ensuring it reaches the baseline quality/performance, and then the PC maybe gets extra bells and whistles.
 
I think the ESRAM plus the HSA-like architecture of the consoles makes a load of console-specific, performance-centric design decisions moot in the PC space.

Also, if publishers hand console games over to some fly-by-night studio whose sole job is to get the game working on PC, with the art assets and the console gaming experience as a guide, you get something like Batman: Arkham Knight.
 
As for the launch overhead, I think that's partially related to the test, which fills and then drains the device queues. Normally, games never let device queues run dry.
That was only the first version. This second one uses only 2 queues, and its command lists are prepared in advance.

The latency doesn't matter if you are using GPU compute (including async) for rendering. You should not copy the results back to CPU or wait for the GPU on CPU side. Discrete GPUs are far away from the CPU. You should not expect to see low latency. Discrete GPUs are not good for tightly interleaved mixed CPU->GPU->CPU work.
It already doesn't wait for any copies back to the CPU. It does its timing on the CPU side and waits for fences to signal. So using GPU-side timestamps is actually a great idea! :) That should point out a few more things.
 
That was only the first version. This second one uses only 2 queues, and its command lists are prepared in advance.
Which seems to be why 128 kernel launches appear to run simultaneously on GCN as, effectively, a single enqueue operation. It still sounds to me as if your code then waits for that single enqueue to drain.

Obviously you're dependent upon the driver and hardware in this situation and it seems to me that AMD and NVidia are behaving quite differently. My theory is now that NVidia assigns each instance of a kernel that you enqueue to a single queue entry, which is why there's the 32-spaced stepping in the results on GM200. On AMD there is a single queue with all the kernels lined up.
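One quick way to check the 32-spaced stepping against the posted numbers is to bucket the kernel counts in groups of 32 and compare each bucket's fastest time with the first bucket's. The sketch below uses a handful of points copied from the GTX 970 compute-only run in this thread (a GM204 card, not GM200), so it tests only the spacing of the steps, not the theory about how the driver maps kernels to queue entries.

```python
from typing import Dict, List

# Sample points from the GTX 970 "Compute only" run posted in this thread
# (kernel count -> ms): the first and last entry of each plateau.
GTX970_COMPUTE_ONLY_MS: Dict[int, float] = {
    1: 9.57, 31: 9.63,
    32: 19.08, 63: 21.55,
    64: 28.59, 95: 31.02,
    96: 38.09, 127: 40.60,
    128: 47.60,
}


def step_ratios(timings_ms: Dict[int, float], width: int) -> List[float]:
    """Bucket kernel counts n by n // width and return each bucket's fastest
    time as a multiple of the first bucket's fastest time (1 decimal place)."""
    if width <= 0:
        raise ValueError("width must be positive")
    fastest: Dict[int, float] = {}
    for n, t in timings_ms.items():
        bucket = n // width
        fastest[bucket] = min(t, fastest.get(bucket, t))
    if 0 not in fastest:
        raise ValueError("need at least one sample with n < width")
    base = fastest[0]
    return [round(fastest[b] / base, 1) for b in sorted(fastest)]


if __name__ == "__main__":
    # n // 32 puts n = 32 in the second bucket, which is where the first jump appears.
    print(step_ratios(GTX970_COMPUTE_ONLY_MS, 32))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Whole-number ratios are what one would expect if each further group of 32 kernels costs roughly one more single-batch pass; passing a different width (for example 16) tests a different step spacing.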

An alternative test would be to construct a set of 8 different kernels, each of which compiles to independent code and each of which is bound to a distinct output buffer. Issued to 8 distinct queues (round-robin), that should exercise GCN some more. It might also provide evidence of the theories going around that NVidia doesn't handle concurrent compute contexts gracefully.
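The bookkeeping for that alternative test is simple. The sketch below only builds the submission plan; the kernel and buffer names are hypothetical placeholders, and the D3D12 side (one `ID3D12CommandQueue` created with `D3D12_COMMAND_LIST_TYPE_COMPUTE` per queue index, each fed through `ExecuteCommandLists`) is assumed rather than shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dispatch:
    queue: int   # index of the compute queue the dispatch is submitted to
    kernel: str  # hypothetical name of a distinct compiled kernel
    output: str  # hypothetical name of that kernel's private output buffer


def round_robin_plan(num_dispatches: int, num_kernels: int = 8,
                     num_queues: int = 8) -> List[Dispatch]:
    """Dispatch i uses kernel i % num_kernels (with its own output buffer)
    on queue i % num_queues."""
    if num_kernels <= 0 or num_queues <= 0:
        raise ValueError("num_kernels and num_queues must be positive")
    plan = []
    for i in range(num_dispatches):
        k = i % num_kernels
        plan.append(Dispatch(queue=i % num_queues,
                             kernel=f"kernel_{k}", output=f"buffer_{k}"))
    return plan


if __name__ == "__main__":
    for d in round_robin_plan(10):
        print(d)
```

With eight kernels and eight queues, kernel k always lands on queue k, so each queue carries a stream of one independent kernel writing to its own buffer; making `num_kernels` differ from `num_queues` would mix kernels within a queue instead.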

Another thing to try is simply launching substantially more kernels than the number of concurrent work-groups these GPUs can support, i.e. >2560 on Fiji and, erm, >1536 on GM200 (16 per SIMD, with 96 SIMDs)? At some point we should see a step up in time on GCN...
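The 2560 and 1536 figures come from multiplying unit counts by resident wavefront/warp slots. A sketch of that arithmetic, assuming full-chip configurations (64 CUs for Fiji, 24 SMMs for GM200), 10 wavefront slots per GCN SIMD and 16 warp slots per Maxwell scheduler partition, and counting hardware slots only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    cores: int           # CUs on GCN, SMMs on Maxwell
    simds_per_core: int  # 4 SIMDs per GCN CU; 4 warp-scheduler partitions per SMM
    slots_per_simd: int  # resident wavefronts (GCN) or warps (Maxwell) per SIMD

    def simds(self) -> int:
        return self.cores * self.simds_per_core

    def resident_slots(self) -> int:
        """Hardware wavefront/warp slots across the whole chip."""
        return self.simds() * self.slots_per_simd


# Assumed full-chip configurations.
FIJI = Gpu("Fiji", cores=64, simds_per_core=4, slots_per_simd=10)
GM200 = Gpu("GM200", cores=24, simds_per_core=4, slots_per_simd=16)

if __name__ == "__main__":
    for gpu in (FIJI, GM200):
        # More one-wavefront/one-warp work-groups than this oversubscribes the chip.
        print(gpu.name, gpu.simds(), gpu.resident_slots())
    # Fiji 256 2560
    # GM200 96 1536
```

This counts slots only. Per-core caps on resident work-groups, register usage and LDS/shared-memory usage can all lower the practical limit, so the step on GCN may show up at fewer kernels than these totals suggest.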
 
GTX 970 - Driver Version 10.18.13.5582 (355.82 WHQL)
Compute only:
1. 9.57ms
2. 9.56ms
3. 9.61ms
4. 9.59ms
5. 9.60ms
6. 9.60ms
7. 9.61ms
8. 9.60ms
9. 9.61ms
10. 9.60ms
11. 9.62ms
12. 9.61ms
13. 9.62ms
14. 9.61ms
15. 9.61ms
16. 9.60ms
17. 9.63ms
18. 9.62ms
19. 9.60ms
20. 9.61ms
21. 9.62ms
22. 9.62ms
23. 9.61ms
24. 9.65ms
25. 9.64ms
26. 9.63ms
27. 9.67ms
28. 9.64ms
29. 9.64ms
30. 9.63ms
31. 9.63ms
32. 19.08ms
33. 22.23ms
34. 21.55ms
35. 19.11ms
36. 19.11ms
37. 21.48ms
38. 21.53ms
39. 23.91ms
40. 21.50ms
41. 19.12ms
42. 19.12ms
43. 19.14ms
44. 21.52ms
45. 19.15ms
46. 23.89ms
47. 19.13ms
48. 21.50ms
49. 19.14ms
50. 19.15ms
51. 23.89ms
52. 27.95ms
53. 24.43ms
54. 26.93ms
55. 24.34ms
56. 24.45ms
57. 26.94ms
58. 24.44ms
59. 24.34ms
60. 26.82ms
61. 24.38ms
62. 26.71ms
63. 21.55ms
64. 28.59ms
65. 33.35ms
66. 31.08ms
67. 28.62ms
68. 28.62ms
69. 28.62ms
70. 28.69ms
71. 31.04ms
72. 28.64ms
73. 30.99ms
74. 35.83ms
75. 28.66ms
76. 28.63ms
77. 28.64ms
78. 33.46ms
79. 28.64ms
80. 28.66ms
81. 28.64ms
82. 35.83ms
83. 28.68ms
84. 28.65ms
85. 28.65ms
86. 33.47ms
87. 28.66ms
88. 28.67ms
89. 28.65ms
90. 33.49ms
91. 28.67ms
92. 33.41ms
93. 28.64ms
94. 33.48ms
95. 31.02ms
96. 38.09ms
97. 38.12ms
98. 42.92ms
99. 38.15ms
100. 40.55ms
101. 42.92ms
102. 42.91ms
103. 40.57ms
104. 42.92ms
105. 40.50ms
106. 45.32ms
107. 42.89ms
108. 42.90ms
109. 38.22ms
110. 42.91ms
111. 38.15ms
112. 42.96ms
113. 38.16ms
114. 40.52ms
115. 42.98ms
116. 38.16ms
117. 40.51ms
118. 45.33ms
119. 40.53ms
120. 42.90ms
121. 42.94ms
122. 38.16ms
123. 40.53ms
124. 40.58ms
125. 40.52ms
126. 40.53ms
127. 40.60ms
128. 47.60ms
Graphics only: 31.28ms (53.64G pixels/s)
Graphics + compute:
1. 40.66ms (41.26G pixels/s)
2. 40.66ms (41.26G pixels/s)
3. 40.69ms (41.23G pixels/s)
4. 40.72ms (41.20G pixels/s)
5. 40.69ms (41.23G pixels/s)
6. 40.71ms (41.21G pixels/s)
7. 40.73ms (41.19G pixels/s)
8. 40.71ms (41.22G pixels/s)
9. 40.70ms (41.22G pixels/s)
10. 40.76ms (41.16G pixels/s)
11. 40.71ms (41.21G pixels/s)
12. 40.70ms (41.22G pixels/s)
13. 40.74ms (41.18G pixels/s)
14. 40.70ms (41.22G pixels/s)
15. 40.68ms (41.24G pixels/s)
16. 40.69ms (41.23G pixels/s)
17. 40.71ms (41.21G pixels/s)
18. 40.73ms (41.19G pixels/s)
19. 40.71ms (41.22G pixels/s)
20. 40.71ms (41.21G pixels/s)
21. 40.73ms (41.19G pixels/s)
22. 40.73ms (41.19G pixels/s)
23. 40.69ms (41.23G pixels/s)
24. 40.75ms (41.17G pixels/s)
25. 40.75ms (41.17G pixels/s)
26. 40.73ms (41.19G pixels/s)
27. 40.73ms (41.19G pixels/s)
28. 40.72ms (41.20G pixels/s)
29. 40.72ms (41.21G pixels/s)
30. 40.74ms (41.18G pixels/s)
31. 40.72ms (41.21G pixels/s)
32. 50.19ms (33.43G pixels/s)
33. 52.66ms (31.86G pixels/s)
34. 50.25ms (33.38G pixels/s)
35. 50.32ms (33.34G pixels/s)
36. 52.63ms (31.88G pixels/s)
37. 50.25ms (33.38G pixels/s)
38. 50.24ms (33.39G pixels/s)
39. 50.25ms (33.39G pixels/s)
40. 50.33ms (33.33G pixels/s)
41. 50.25ms (33.39G pixels/s)
42. 50.26ms (33.38G pixels/s)
43. 50.24ms (33.39G pixels/s)
44. 50.27ms (33.38G pixels/s)
45. 50.30ms (33.35G pixels/s)
46. 50.27ms (33.38G pixels/s)
47. 50.34ms (33.33G pixels/s)
48. 50.29ms (33.36G pixels/s)
49. 50.26ms (33.38G pixels/s)
50. 52.68ms (31.85G pixels/s)
51. 50.26ms (33.38G pixels/s)
52. 52.71ms (31.83G pixels/s)
53. 50.27ms (33.38G pixels/s)
54. 50.23ms (33.40G pixels/s)
55. 52.67ms (31.85G pixels/s)
56. 52.64ms (31.87G pixels/s)
57. 52.73ms (31.82G pixels/s)
58. 50.28ms (33.37G pixels/s)
59. 50.36ms (33.32G pixels/s)
60. 50.27ms (33.38G pixels/s)
61. 50.22ms (33.41G pixels/s)
62. 52.65ms (31.87G pixels/s)
63. 50.27ms (33.37G pixels/s)
64. 62.10ms (27.02G pixels/s)
65. 59.71ms (28.10G pixels/s)
66. 62.17ms (26.99G pixels/s)
67. 59.76ms (28.08G pixels/s)
68. 59.81ms (28.05G pixels/s)
69. 59.78ms (28.07G pixels/s)
70. 62.22ms (26.96G pixels/s)
71. 62.13ms (27.01G pixels/s)
72. 62.17ms (26.99G pixels/s)
73. 59.77ms (28.07G pixels/s)
74. 62.19ms (26.98G pixels/s)
75. 59.77ms (28.07G pixels/s)
76. 64.60ms (25.97G pixels/s)
77. 59.72ms (28.09G pixels/s)
78. 62.21ms (26.97G pixels/s)
79. 59.74ms (28.08G pixels/s)
80. 62.23ms (26.96G pixels/s)
81. 59.77ms (28.07G pixels/s)
82. 62.21ms (26.97G pixels/s)
83. 59.77ms (28.07G pixels/s)
84. 62.17ms (26.98G pixels/s)
85. 59.79ms (28.06G pixels/s)
86. 62.18ms (26.98G pixels/s)
87. 59.78ms (28.07G pixels/s)
88. 62.18ms (26.98G pixels/s)
89. 59.78ms (28.07G pixels/s)
90. 62.22ms (26.96G pixels/s)
91. 59.76ms (28.08G pixels/s)
92. 62.19ms (26.98G pixels/s)
93. 59.78ms (28.06G pixels/s)
94. 62.22ms (26.97G pixels/s)
95. 59.77ms (28.07G pixels/s)
96. 69.25ms (24.23G pixels/s)
97. 69.23ms (24.24G pixels/s)
98. 69.23ms (24.23G pixels/s)
99. 71.68ms (23.41G pixels/s)
100. 69.28ms (24.22G pixels/s)
101. 74.10ms (22.64G pixels/s)
102. 69.28ms (24.22G pixels/s)
103. 69.34ms (24.20G pixels/s)
104. 69.33ms (24.20G pixels/s)
105. 71.67ms (23.41G pixels/s)
106. 74.10ms (22.64G pixels/s)
107. 69.28ms (24.22G pixels/s)
108. 74.11ms (22.64G pixels/s)
109. 69.29ms (24.21G pixels/s)
110. 71.71ms (23.40G pixels/s)
111. 71.74ms (23.38G pixels/s)
112. 69.24ms (24.23G pixels/s)
113. 71.70ms (23.40G pixels/s)
114. 69.28ms (24.22G pixels/s)
115. 69.35ms (24.19G pixels/s)
116. 71.71ms (23.40G pixels/s)
117. 69.30ms (24.21G pixels/s)
118. 71.77ms (23.38G pixels/s)
119. 69.26ms (24.22G pixels/s)
120. 69.35ms (24.19G pixels/s)
121. 69.24ms (24.23G pixels/s)
122. 69.32ms (24.20G pixels/s)
123. 69.26ms (24.22G pixels/s)
124. 69.26ms (24.22G pixels/s)
125. 71.71ms (23.40G pixels/s)
126. 71.66ms (23.41G pixels/s)
127. 71.74ms (23.39G pixels/s)
128. 78.74ms (21.31G pixels/s)
 
Could some of you experts in the field help us laymen interpret the data correctly?

For clarity's sake, are these analyses correct?

https://www.reddit.com/r/nvidia/comments/3j5e9b/analysis_async_compute_is_it_true_nvidia_cant_do/

https://www.reddit.com/r/oculus/comments/3j5h9y/put_that_popcorn_away_nvidia_maxwell_does/

Some are saying that Maxwell is superior to GCN for compute, or that GCN itself is doing compute + graphics serially, because its times (in ms) are much higher than those of NV's GPUs.
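One way to check the "serial" reading against any card's numbers is to compare the combined time with two reference points: the sum of the graphics-only and compute-only times (fully serialized) and the longer of the two (the shorter job fully hidden). This sketch assumes the three runs are directly comparable (same kernels, same render target), which the test output alone doesn't guarantee; it measures how the totals add up, not why.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    serial_ms: float         # graphics-only + compute-only, run back to back
    ideal_ms: float          # the longer of the two, if the shorter were fully hidden
    overlap_fraction: float  # 0.0 = looks serialized, 1.0 = shorter job fully hidden


def overlap_report(graphics_ms: float, compute_ms: float, combined_ms: float) -> Overlap:
    serial = graphics_ms + compute_ms
    ideal = max(graphics_ms, compute_ms)
    hideable = serial - ideal  # i.e. the duration of the shorter workload
    if hideable <= 0:
        raise ValueError("both workloads must take positive time")
    return Overlap(serial, ideal, (serial - combined_ms) / hideable)


if __name__ == "__main__":
    # Single-kernel rows from the two NVIDIA runs posted in this thread.
    for name, g, c, gc in [("GTX 970", 31.28, 9.57, 40.66),
                           ("GTX 750 Ti", 109.40, 12.36, 120.15)]:
        r = overlap_report(g, c, gc)
        print(name, round(r.serial_ms, 2), round(r.overlap_fraction, 2))
```

For the single-kernel rows of the two NVIDIA runs this works out to roughly 0.02 (GTX 970) and 0.13 (GTX 750 Ti), i.e. the combined pass costs close to the serial sum. The same check can be run on the GCN results; on its own it says nothing about per-unit compute throughput.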
 
D3D12 should be used with caution as it requires more coding effort.


I agree with that, Jawed, but this is the way the industry is going, so unfortunately inexperienced programmers face a much steeper learning curve where APIs and coding standards are concerned.
 


Still going over that ;)

It seems that nV's hardware is better for compute on a per-unit basis, as it is for graphics. With graphics plus compute there seem to be some issues, but I think we need more data to make a conclusive assessment of what is going on.
 
I agree with that, Jawed, but this is the way the industry is going, so unfortunately inexperienced programmers face a much steeper learning curve where APIs and coding standards are concerned.
For inexperienced programmers, the idea is really to use an off-the-shelf engine (Unity, Unreal, etc.) rather than rolling your own. Middleware is essentially the new high-level API.
 

Thank you.

So in the meantime, is it accurate to use the results/data as a benchmark to compare how effective the architectures are at compute, graphics or async compute, or is the test just a check of whether the feature is present or absent, based on expected outcomes?
 

Doesn't seem like it. It definitely looks like something odd is going on with the AMD results.
 
Why do the Maxwell cards take longer as the number of batches (or commands) increases, even in the compute-only pass? Shouldn't that be running synchronously, and not be affected by whether Maxwell supports async compute or not?
 
And why does the 750 Ti show the same 32-command pattern as the 9 series? Shouldn't it be different, since it doesn't even support the 31+1 of the 9 series?
 