DX12 Performance Discussion And Analysis Thread

With all the papers presented this year by console developers, co-signed or co-presented with AMD, about how to code for the GCN architectures, what to do or not do, and how to extract the most performance from GCN: if they have not collaborated, how did they do it?

We have never learned so much about the GCN architectures as we have since the consoles started using them, simply because every conference game studios attend (SIGGRAPH etc.) is full of presentations about it by console developers, especially those who were not developing for the PC.

On consoles this surely happens on a large scale rather than strictly studio by studio, as console developers have had to learn a new API and new architectures, and they are really quick to explore and share their findings. (On PC it is maybe done, on a per-architecture basis, more privately between the brands and individual studios.)

Maybe I'm wrong, but looking at the difference between the old generation and the new consoles, there must have been collaboration on a large scale.
 
I am especially glad to have bought the RX 480 after having read the newest AMD press release saying
„AMD reaffirms its DirectX 12 performance leadership with Futuremark’s new 3DMark Time Spy benchmark
Futuremark® has today released its newest benchmark, 3DMark® Time Spy, and Radeon™ graphics results further demonstrate AMD’s performance leadership in DirectX® 12. “
Here's the same graph with current USD prices applied and sorted by price/perf rating:

rating graph.JPG
 
So you are saying that consoles are not affecting development decisions in terms of engine, rendering and post-processing, further exacerbated by low-level APIs for games released on multiple platforms?
What went wrong (and still is) with Quantum Break for Nvidia hardware?

I am not sure what we are arguing about here as there are overlaps: AMD is helping developers and consoles are a contributing factor, especially when also considering low-level APIs and async compute.
Although I assume others (I appreciate not all agree) would not credit consoles as the reason for the improved performance we are seeing for AMD on PC in AAA multi-platform DX11 games in general, especially with developers leaning more on rendering and post-processing effects that are very well optimised and designed around GCN relative to Nvidia.

But as I say, Square Enix (easiest example if ignoring DICE) has been pretty open about the assistance they receive from AMD in the development stage, and this has synergy between the various platforms.
In the same way, it could be said Mantle has synergy with consoles.
And the crux of it could be put as a question: would async compute be integral to the development and design of the games we are seeing on PC without the current consoles that AMD controls?
Cheers
 
Considering PS Vita used an ARM based core, many people were surprised when Sony didn't go with an ARM based solution for PS4.

Why? ARM64 was not there yet. ARM32 is *not* even an option to consider: you are not making a phone or a handheld device (and you don't want to be stuck with the 32-bit limits).
So no, AMD had all the numbers. Plus, offering a single SoC vs 2 chips means you are literally unbeatable price-wise.

Next gen could be interesting, but it is quite clear that backward compatibility is more important now, as more and more services get implemented on both platforms.
Next gen, you'll have to have them ready, or some of your customers will go to the other platform if it has them.
 
Actually, also on Pascal, scheduling is done on a per-SM-basis. So no mixing of workloads intra-SM.

This was my understanding as well. With Maxwell the partitioning was static at draw-call boundaries, but with a fixed-latency pipeline I would expect it to be possible to develop a heuristic that at the very least doesn't lead to performance loss :p Why does MDolenc's bench scale with 32 though? Why 32?
 
I am especially glad to have bought the RX 480 after having read the newest AMD press release saying
„AMD reaffirms its DirectX 12 performance leadership with Futuremark’s new 3DMark Time Spy benchmark
Futuremark® has today released its newest benchmark, 3DMark® Time Spy, and Radeon™ graphics results further demonstrate AMD’s performance leadership in DirectX® 12. “
Accompanying it was this remarkable benchmark-(ch)art, reaffirming the Radeon's performance leadership in DX12.
q86yCSR.png

;)
edit:
Credit where it's due though: props to the chart actually beginning at 0!!

more edit:
Thanks to this, I can also run it on my HD 7970. Awesome!

Guru3D's 3DMark Time Spy numbers are out, with async compute on and off. Pretty interesting numbers with Pascal.

http://www.guru3d.com/articles-pages/futuremark-3dmark-timespy-benchmark-review,1.html

I was going to comment on how much faster than the 980 Ti the 1070 is and why that might be, but then Lanek's post seems to have confirmed that Pascal is benefiting from async... nice!
 
MDolenc, I am getting large variations with the async compute test. Initially I thought it was because the browser was open, so I closed it, but I still had problems. It appears sporadic lol.

Anyway, I managed to get a log of one of the runs in which it worked properly:
CxUWWVT.png

Turns out all the work on the compute queue is dwm (the Desktop Window Manager).
 
Compute only:
1. 1.14ms
2. 1.16ms
3. 1.18ms
4. 1.15ms
5. 1.16ms
6. 1.16ms
7. 1.16ms
8. 1.16ms
9. 1.16ms
10. 1.16ms
11. 1.16ms
12. 1.17ms
13. 1.16ms
14. 1.16ms
15. 1.16ms
16. 1.16ms
17. 1.18ms
18. 1.18ms

91. 3.44ms
92. 3.47ms
93. 3.45ms
94. 3.44ms
95. 3.45ms
96. 3.46ms
97. 4.55ms

Graphics only: 19.71ms (108.96G pixels/s)
Graphics + compute:
1. 20.75ms (103.48G pixels/s)
2. 20.76ms (103.42G pixels/s)
3. 20.81ms (103.17G pixels/s)
4. 20.75ms (103.50G pixels/s)
5. 20.77ms (103.38G pixels/s)
6. 20.80ms (103.25G pixels/s)
7. 20.73ms (103.60G pixels/s)
8. 20.76ms (103.46G pixels/s)
9. 20.82ms (103.16G pixels/s)
10. 20.83ms (103.08G pixels/s)
11. 20.77ms (103.41G pixels/s)
12. 20.81ms (103.19G pixels/s)
13. 20.78ms (103.35G pixels/s)
14. 20.83ms (103.10G pixels/s)
15. 20.80ms (103.26G pixels/s)
16. 20.86ms (102.97G pixels/s)
17. 20.84ms (103.04G pixels/s)
18. 20.83ms (103.11G pixels/s)
19. 20.78ms (103.32G pixels/s)
20. 20.78ms (103.35G pixels/s)
21. 20.77ms (103.38G pixels/s)
22. 20.84ms (103.06G pixels/s)
23. 20.76ms (103.46G pixels/s)
24. 20.78ms (103.36G pixels/s)
25. 20.75ms (103.50G pixels/s)
26. 20.78ms (103.34G pixels/s)
27. 20.78ms (103.33G pixels/s)
28. 20.84ms (103.03G pixels/s)
29. 20.87ms (102.90G pixels/s)
30. 20.77ms (103.38G pixels/s)
31. 20.79ms (103.29G pixels/s)
32. 20.85ms (102.98G pixels/s)
33. 21.98ms (97.69G pixels/s)
34. 21.99ms (97.66G pixels/s)
35. 22.00ms (97.62G pixels/s)
36. 21.89ms (98.09G pixels/s)
37. 21.96ms (97.80G pixels/s)
38. 21.97ms (97.72G pixels/s)
39. 21.91ms (98.00G pixels/s)
40. 21.98ms (97.71G pixels/s)
41. 22.00ms (97.61G pixels/s)
42. 21.93ms (97.91G pixels/s)
43. 21.88ms (98.14G pixels/s)
44. 21.89ms (98.08G pixels/s)
45. 21.97ms (97.76G pixels/s)
46. 21.90ms (98.07G pixels/s)
47. 22.00ms (97.62G pixels/s)
48. 21.87ms (98.21G pixels/s)
49. 22.01ms (97.57G pixels/s)
50. 21.94ms (97.90G pixels/s)
51. 22.00ms (97.63G pixels/s)
52. 22.06ms (97.34G pixels/s)
53. 21.93ms (97.92G pixels/s)
54. 21.98ms (97.71G pixels/s)
55. 22.04ms (97.45G pixels/s)
56. 21.92ms (97.95G pixels/s)
57. 21.97ms (97.73G pixels/s)
58. 22.04ms (97.45G pixels/s)
59. 22.00ms (97.63G pixels/s)
60. 21.91ms (97.99G pixels/s)
61. 22.02ms (97.53G pixels/s)
62. 22.02ms (97.54G pixels/s)
63. 21.98ms (97.70G pixels/s)
64. 21.94ms (97.87G pixels/s)
65. 23.12ms (92.90G pixels/s)
66. 23.21ms (92.52G pixels/s)
67. 23.04ms (93.20G pixels/s)
68. 23.12ms (92.89G pixels/s)
69. 23.15ms (92.78G pixels/s)
70. 23.04ms (93.19G pixels/s)
71. 23.04ms (93.22G pixels/s)
72. 23.12ms (92.88G pixels/s)
73. 23.09ms (93.01G pixels/s)
74. 23.14ms (92.81G pixels/s)
75. 23.05ms (93.15G pixels/s)
76. 23.11ms (92.94G pixels/s)
77. 23.27ms (92.29G pixels/s)
78. 23.20ms (92.57G pixels/s)
79. 23.04ms (93.21G pixels/s)
80. 23.24ms (92.39G pixels/s)
81. 23.08ms (93.03G pixels/s)
82. 23.12ms (92.89G pixels/s)
83. 23.03ms (93.24G pixels/s)
84. 23.18ms (92.65G pixels/s)
85. 23.24ms (92.42G pixels/s)
86. 23.08ms (93.04G pixels/s)
87. 23.14ms (92.80G pixels/s)
88. 23.21ms (92.53G pixels/s)
89. 23.08ms (93.05G pixels/s)
90. 23.14ms (92.81G pixels/s)
91. 23.12ms (92.86G pixels/s)
92. 23.11ms (92.93G pixels/s)
93. 23.13ms (92.84G pixels/s)
94. 23.27ms (92.30G pixels/s)
95. 23.09ms (93.02G pixels/s)
96. 23.12ms (92.90G pixels/s)
97. 24.21ms (88.71G pixels/s)
98. 24.31ms (88.35G pixels/s)
99. 24.23ms (88.62G pixels/s)
100. 24.22ms (88.68G pixels/s)
101. 24.47ms (87.75G pixels/s)
102. 24.18ms (88.82G pixels/s)
103. 24.45ms (87.81G pixels/s)
104. 24.17ms (88.85G pixels/s)
105. 24.40ms (88.01G pixels/s)
106. 24.30ms (88.36G pixels/s)
107. 24.17ms (88.86G pixels/s)
108. 24.29ms (88.41G pixels/s)
109. 24.16ms (88.89G pixels/s)
110. 24.16ms (88.88G pixels/s)
111. 24.42ms (87.92G pixels/s)
112. 24.25ms (88.55G pixels/s)
113. 24.23ms (88.62G pixels/s)
114. 24.32ms (88.30G pixels/s)
115. 24.18ms (88.82G pixels/s)
116. 24.43ms (87.89G pixels/s)
117. 24.24ms (88.59G pixels/s)
118. 24.30ms (88.37G pixels/s)
119. 24.32ms (88.32G pixels/s)
120. 24.32ms (88.30G pixels/s)
121. 24.52ms (87.59G pixels/s)
122. 24.22ms (88.68G pixels/s)
123. 24.29ms (88.42G pixels/s)
124. 24.33ms (88.27G pixels/s)
125. 24.29ms (88.41G pixels/s)
126. 24.28ms (88.45G pixels/s)
127. 24.22ms (88.68G pixels/s)
128. 24.24ms (88.59G pixels/s)
Graphics, compute single commandlist:
1. 20.84ms (103.07G pixels/s)
2. 20.71ms (103.67G pixels/s)
3. 20.77ms (103.40G pixels/s)
4. 20.90ms (102.73G pixels/s)
5. 20.73ms (103.59G pixels/s)
6. 20.72ms (103.65G pixels/s)
7. 20.91ms (102.69G pixels/s)
8. 20.76ms (103.46G pixels/s)
9. 20.77ms (103.38G pixels/s)
10. 20.82ms (103.12G pixels/s)
11. 20.77ms (103.40G pixels/s)
12. 20.77ms (103.38G pixels/s)
13. 20.87ms (102.90G pixels/s)
14. 20.74ms (103.54G pixels/s)
15. 20.74ms (103.54G pixels/s)
16. 20.87ms (102.90G pixels/s)
17. 20.74ms (103.56G pixels/s)
18. 20.73ms (103.59G pixels/s)
19. 20.82ms (103.13G pixels/s)
20. 20.75ms (103.52G pixels/s)
21. 20.73ms (103.61G pixels/s)
22. 20.81ms (103.18G pixels/s)
23. 20.75ms (103.49G pixels/s)
24. 20.71ms (103.70G pixels/s)
25. 20.76ms (103.43G pixels/s)
26. 20.75ms (103.49G pixels/s)
27. 20.74ms (103.55G pixels/s)
28. 20.82ms (103.15G pixels/s)
29. 20.75ms (103.51G pixels/s)
30. 20.72ms (103.63G pixels/s)
31. 20.91ms (102.70G pixels/s)
32. 20.72ms (103.65G pixels/s)
33. 21.84ms (98.32G pixels/s)
34. 22.01ms (97.59G pixels/s)
35. 21.87ms (98.20G pixels/s)
36. 21.91ms (98.01G pixels/s)
37. 21.88ms (98.15G pixels/s)
38. 21.88ms (98.14G pixels/s)
39. 21.92ms (97.97G pixels/s)
40. 21.92ms (97.97G pixels/s)
41. 21.88ms (98.14G pixels/s)
42. 22.01ms (97.56G pixels/s)
43. 21.85ms (98.27G pixels/s)
44. 21.85ms (98.27G pixels/s)
45. 22.01ms (97.56G pixels/s)
46. 21.84ms (98.33G pixels/s)
47. 21.84ms (98.33G pixels/s)
48. 22.03ms (97.47G pixels/s)
49. 21.86ms (98.26G pixels/s)
50. 21.84ms (98.34G pixels/s)
51. 21.94ms (97.87G pixels/s)
52. 21.90ms (98.05G pixels/s)
53. 21.93ms (97.93G pixels/s)
54. 21.85ms (98.29G pixels/s)
55. 21.92ms (97.98G pixels/s)
56. 21.96ms (97.81G pixels/s)
57. 21.88ms (98.15G pixels/s)
58. 21.85ms (98.30G pixels/s)
59. 22.01ms (97.58G pixels/s)
60. 21.88ms (98.14G pixels/s)
61. 21.85ms (98.28G pixels/s)
62. 21.93ms (97.93G pixels/s)
63. 21.85ms (98.26G pixels/s)
64. 21.85ms (98.26G pixels/s)
65. 23.05ms (93.16G pixels/s)
66. 23.01ms (93.33G pixels/s)
67. 23.00ms (93.38G pixels/s)
68. 23.01ms (93.34G pixels/s)
69. 22.97ms (93.47G pixels/s)
70. 23.08ms (93.05G pixels/s)
71. 23.01ms (93.34G pixels/s)
72. 23.00ms (93.36G pixels/s)
73. 23.07ms (93.09G pixels/s)
74. 23.04ms (93.22G pixels/s)
75. 23.05ms (93.17G pixels/s)
76. 22.98ms (93.43G pixels/s)
77. 23.09ms (92.99G pixels/s)
78. 23.01ms (93.33G pixels/s)
79. 23.08ms (93.03G pixels/s)
80. 23.00ms (93.39G pixels/s)
81. 23.14ms (92.80G pixels/s)
82. 23.01ms (93.35G pixels/s)
83. 23.03ms (93.26G pixels/s)
84. 23.15ms (92.78G pixels/s)
85. 23.05ms (93.17G pixels/s)
86. 23.17ms (92.69G pixels/s)
87. 23.01ms (93.31G pixels/s)
88. 23.04ms (93.23G pixels/s)
89. 23.12ms (92.89G pixels/s)
90. 23.06ms (93.13G pixels/s)
91. 22.99ms (93.42G pixels/s)
92. 23.02ms (93.30G pixels/s)
93. 23.04ms (93.21G pixels/s)
94. 23.05ms (93.15G pixels/s)
95. 23.03ms (93.24G pixels/s)
96. 23.00ms (93.37G pixels/s)
97. 24.24ms (88.58G pixels/s)
98. 24.08ms (89.17G pixels/s)
99. 24.09ms (89.13G pixels/s)
100. 24.14ms (88.94G pixels/s)
101. 24.09ms (89.15G pixels/s)
102. 24.29ms (88.39G pixels/s)
103. 24.10ms (89.11G pixels/s)
104. 24.10ms (89.10G pixels/s)
105. 24.22ms (88.65G pixels/s)
106. 24.11ms (89.07G pixels/s)
107. 24.19ms (88.78G pixels/s)
108. 24.11ms (89.06G pixels/s)
109. 24.20ms (88.74G pixels/s)
110. 24.15ms (88.91G pixels/s)
111. 24.14ms (88.95G pixels/s)
112. 24.18ms (88.80G pixels/s)
113. 24.16ms (88.89G pixels/s)
114. 24.17ms (88.87G pixels/s)
115. 24.19ms (88.77G pixels/s)
116. 24.11ms (89.06G pixels/s)
117. 24.12ms (89.05G pixels/s)
118. 24.17ms (88.85G pixels/s)
119. 24.17ms (88.84G pixels/s)
120. 24.25ms (88.57G pixels/s)
121. 24.20ms (88.76G pixels/s)
122. 24.12ms (89.02G pixels/s)
123. 24.17ms (88.85G pixels/s)
124. 24.20ms (88.74G pixels/s)
125. 24.30ms (88.37G pixels/s)
126. 24.13ms (89.00G pixels/s)
127. 24.18ms (88.81G pixels/s)
128. 24.22ms (88.68G pixels/s)
Latency (compute starts 10ms after graphics):
1. 10.57ms
2. 11.31ms
3. 10.86ms
4. 10.64ms
5. 10.78ms
6. 10.45ms
7. 10.27ms
8. 10.29ms
9. 11.30ms
10. 10.92ms
11. 10.27ms
12. 10.27ms
13. 11.31ms
14. 10.78ms
15. 10.27ms
16. 11.31ms
17. 10.20ms
18. 10.20ms
19. 10.70ms
20. 10.40ms
21. 11.20ms
22. 10.20ms
23. 9.95ms
24. 11.20ms
25. 11.20ms
26. 10.20ms
27. 10.56ms
28. 11.22ms
29. 10.21ms
30. 10.20ms
31. 10.21ms
32. 11.21ms
33. 10.19ms
34. 10.20ms
35. 11.20ms
36. 10.20ms
37. 11.21ms
38. 10.56ms
39. 10.77ms
40. 10.20ms
41. 10.21ms
42. 10.66ms
43. 10.20ms
44. 10.20ms
45. 11.20ms
46. 10.73ms
47. 10.20ms
48. 10.19ms
49. 11.20ms
50. 10.64ms
51. 10.57ms
52. 10.77ms
53. 10.20ms
54. 10.20ms
55. 10.21ms
56. 10.78ms
57. 10.22ms
58. 10.20ms
59. 11.20ms
60. 11.71ms
61. 10.20ms
62. 10.21ms
63. 10.60ms
64. 11.21ms
65. 10.20ms
66. 11.21ms
67. 10.21ms
68. 10.20ms
69. 10.85ms
70. 10.20ms
71. 10.20ms
72. 10.20ms
73. 11.21ms
74. 10.70ms
75. 10.20ms
76. 10.20ms
77. 10.20ms
78. 10.71ms
79. 10.78ms
80. 10.76ms
81. 10.20ms
82. 10.20ms
83. 11.21ms
84. 10.20ms
85. 10.20ms
86. 10.20ms
87. 11.21ms
88. 10.20ms
89. 10.20ms

Average: 10.53ms
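
A minimal sketch (not part of MDolenc's tool) of how the step pattern in the numbers above can be picked out. The sample points are copied from the "Graphics + compute" list in this post; the function names and the 0.5ms threshold are arbitrary choices for illustration. The second helper assumes the G pixels/s figure is simply a fixed pixel workload divided by the measured time, which is an inference from the numbers and not something the tool documents.

```python
# Sample (kernel count, ms) points copied from the "Graphics + compute" list above.
SAMPLES = [
    (31, 20.79), (32, 20.85), (33, 21.98), (64, 21.94),
    (65, 23.12), (96, 23.12), (97, 24.21), (128, 24.24),
]

GRAPHICS_ONLY_MS = 19.71     # "Graphics only" time from the post
GRAPHICS_ONLY_GPIX = 108.96  # "Graphics only" throughput from the post


def find_steps(samples, threshold_ms=0.5):
    """Return the kernel counts at which the time jumps by more than
    threshold_ms compared to the previous sample (threshold is arbitrary)."""
    ordered = sorted(samples)
    steps = []
    for (_, prev_ms), (count, ms) in zip(ordered, ordered[1:]):
        if ms - prev_ms > threshold_ms:
            steps.append(count)
    return steps


def implied_gpixels(frame_ms):
    """Throughput in G pixels/s, assuming the pixel workload is fixed
    and only the measured time changes (an assumption, see above)."""
    workload = GRAPHICS_ONLY_GPIX * GRAPHICS_ONLY_MS
    return workload / frame_ms


if __name__ == "__main__":
    print(find_steps(SAMPLES))
    print(round(implied_gpixels(24.21), 2))
```

With these samples, find_steps returns [33, 65, 97], so each jump lands right after a multiple of 32, and each jump is roughly the size of a single compute-only dispatch (about 1.16ms in the "Compute only" list).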

So, it appears that if I run the test on the desktop, I get huge variation in the results, like so:
ODgasTh.png
Whereas if I have a video open in VLC (which lowers performance in the bench by a significant amount) or have a "dynamic" webpage open in the browser, it runs normally, like the first result in this post.

Just tested again with the browser open on this thread: inconsistent results. Opened cnn.com, back to normal! LOL!
 
I am especially glad to have bought the RX 480 after having read the newest AMD press release saying
„AMD reaffirms its DirectX 12 performance leadership with Futuremark’s new 3DMark Time Spy benchmark
Futuremark® has today released its newest benchmark, 3DMark® Time Spy, and Radeon™ graphics results further demonstrate AMD’s performance leadership in DirectX® 12. “
Accompanying it was this remarkable benchmark-(ch)art, reaffirming the Radeon's performance leadership in DX12.
q86yCSR.png

;)
edit:
Credit where it's due though: props to the chart actually beginning at 0!!

more edit:
Thanks to this, I can also run it on my HD 7970. Awesome!

What's up with all the sarcasm coming from you lately? Is it necessary?
 
Because technical marketing is giving us a good laugh! Just more of the same misinformation that we got at launch day ... performance leadership, please!
 
Performance leadership is a very broad term. And can anyone deny they pretty much 'lead' everything below the 1070, which is the majority of the GPU market?
 
Why? ARM64 was not there yet. ARM32 is *not* even an option to consider: you are not making a phone or a handheld device (and you don't want to be stuck with the 32-bit limits).
So no, AMD had all the numbers. Plus, offering a single SoC vs 2 chips means you are literally unbeatable price-wise.

Next gen could be interesting, but it is quite clear that backward compatibility is more important now, as more and more services get implemented on both platforms.
Next gen, you'll have to have them ready, or some of your customers will go to the other platform if it has them.

Originally the PS4 was only going to have 4 GB of memory. A 32-bit CPU would have been just fine. It wasn't until late in development, after the hardware was finalized, that the memory amount was upgraded to 8 GB, which would then have required a 64-bit CPU.

Regards,
SB
 
However, development kits frequently benefit from having more memory than the home version of the console. Even if the PS4 stuck with 4 GB, I think at some point 8GB would have become a desired development kit feature that would have been complicated by an architecture that didn't address it in a straightforward manner.
There are also various benefits to 64-bit mode, like more effective address space randomization (although Sony's kernel space apparently did not take advantage of this early on) and keeping up long-term with the ecosystem around it.
 
Wouldn't that have precluded ARM CPUs from being considered in the first place, then, if they already knew that before the hardware was finalized? As Sony has stated, an ARM-based design for PS4 was in the running all the way up until the hardware was finalized.

Or was that just one of the ancillary benefits of going with an AMD SOC? That development kits could then have 8 GB instead of just 4 GB for an ARM design?

Regards,
SB
 
Both x86 and ARM had ways of extending addressability on 32-bit systems, although not without complication.
It wouldn't be world-ending, but it would have been a benefit for the system that was 64-bit.
Similarly, it's not the end of the world if dev kits don't have double the RAM with the 8GB PS4 as an example. GDDR5 density has increased since then, however.
PAE was a pain to use for x86, but I don't know much about the success of the ARM equivalent.
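
For the arithmetic behind the 4 GB vs 8 GB point, a small sketch. The 32-bit figure is plain 2^32 bytes; the 36-bit and 40-bit widths are the commonly cited physical address sizes for original x86 PAE and for ARM's LPAE respectively, added here for illustration and not taken from the posts above. The function name is made up.

```python
def addressable_gib(address_bits):
    """Memory reachable with a flat byte address of the given width, in GiB."""
    return 2 ** address_bits / 2 ** 30


if __name__ == "__main__":
    widths = [
        ("plain 32-bit", 32),
        ("x86 PAE (36-bit physical, assumed)", 36),
        ("ARM LPAE (40-bit physical, assumed)", 40),
    ]
    for label, bits in widths:
        print(f"{label}: {addressable_gib(bits):g} GiB")
```

A flat 32-bit address covers exactly 4 GiB, so an 8 GB console or dev kit needs either 64-bit addressing or one of the extension schemes. Those schemes only widen physical addresses; each process still sees a 32-bit virtual address space, which is where the complication comes from.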
 