DX12 Performance Discussion And Analysis Thread

I played with the AoTS benchmark on my 970 a couple of weeks back, toggling the async option in the game's ini file on and off, and the game actually ran slightly slower with it enabled. It comes disabled by default. I've read that sometime in the last few months NV has added some kind of support for the feature. Mysterious.
 
I played with the AoTS benchmark on my 970 a couple of weeks back, toggling the async option in the game's ini file on and off, and the game actually ran slightly slower with it enabled. It comes disabled by default. I've read that sometime in the last few months NV has added some kind of support for the feature. Mysterious.
That's why it's running only slightly slower now. ;)
 
Just checked with Dan Baker. Async is still functionally disabled on Ashes when it detects an NVIDIA card, including the GTX 1080 (since they don't have one to test against yet).
Ah, great news - great in the sense of getting clarification from Oxide, not necessarily regarding the outcome :)
Here is a big question, and I think it comes back to what Kollock mentioned in the past: whatever performance change is seen between drivers comes down to whatever NVIDIA is doing on their side. So how is it that the 1080 does not see any performance hit, unlike Maxwell?
Is this down to the behaviour of pre-emption and how it now works at the hardware level on Pascal?
Anyway, this makes most of the async testing with Ashes a moot point for now; although it's nice to see that Pascal at least behaves better than Maxwell in DX12 and AoTS, even without async compute.
Thanks
 
Well, it also doesn't explain the increases at some resolutions either, lol.

Funny stuff is going on...
Not sure anyone has used Intel's PresentMon on AoTS; it might make sense rather than just relying on the internal benchmark measurement tool.
Cheers
 
Not sure anyone has used Intel's PresentMon on AoTS; it might make sense rather than just relying on the internal benchmark measurement tool.
Cheers

Hardware Canucks used PresentMon in their review:

Ashes of the Singularity is a real-time strategy game on a grand scale, very much in the vein of Supreme Commander. While this game is most known for its asynchronous workloads through the DX12 API, it also happens to be pretty fun to play. While Ashes has a built-in performance counter alongside its built-in benchmark utility, we found it to be highly unreliable, often posting substantial run-to-run variation. With that in mind, we still used the onboard benchmark since it eliminates the randomness that arises when actually playing the game, but utilized the PresentMon utility to log performance.

[chart: Hardware Canucks PresentMon results]
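For anyone wanting to reproduce that kind of summary at home, here is a minimal Python sketch for boiling a PresentMon log down to an average fps and a frame-time percentile. The column name (MsBetweenPresents) and the Ashes process name are assumptions on my part; check them against the header row of your own log.

```python
# Minimal sketch: summarise a PresentMon CSV instead of trusting the
# in-game counter. Column/process names below are assumptions; verify
# them against the header row of your own log.
import csv
import statistics

def summarize(csv_path, process_name="Ashes_DX12.exe"):  # process name is a guess
    frame_times = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if process_name.lower() not in row.get("Application", "").lower():
                continue  # ignore presents from other processes
            frame_times.append(float(row["MsBetweenPresents"]))
    frame_times.sort()
    avg_ms = statistics.mean(frame_times)
    p99_ms = frame_times[int(0.99 * (len(frame_times) - 1))]
    print(f"frames logged: {len(frame_times)}")
    print(f"average: {avg_ms:.2f} ms ({1000.0 / avg_ms:.1f} fps)")
    print(f"99th percentile frame time: {p99_ms:.2f} ms")

summarize("presentmon_log.csv")  # hypothetical log file name
```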
 
In overall performance, yes, but that would not explain how Pascal now always shows a slightly positive result with async on, where Maxwell was always slightly negative in AoTS.
Cheers
There's still some work re-org that goes on when async is turned on in Ashes. Apparently this is benefiting the Pascal cards ever so slightly.
 
For me, it didn't. It's actually a bit faster with async off via settings.ini (1080p low or extreme). Maybe I have an outlier, or maybe some were mixing up results with their expectations.
 
Hardware Canucks used PresentMon in their review:

[chart: Hardware Canucks PresentMon results]

Those are interesting results for the Fury/980 Ti.
We just need more sites to use it to build up a validated picture.
I always wondered whether the pre-set benchmark creates an unrealistic load as an additional synthetic test *shrug*.
Thanks.
 
Hardware Canucks used PresentMon in their review:

[chart: Hardware Canucks PresentMon results]

Looking at the performance difference, even between a 980 Ti and a Fury, I would like to see other sites and reviewers redo the tests, as those results are clearly inverted. And not only with PresentMon, of course.
 
Looking at the performance difference, even between a 980 Ti and a Fury, I would like to see other sites and reviewers redo the tests, as those results are clearly inverted. And not only with PresentMon, of course.
It would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.

Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do it). The same goes for a couple of other titles as well.

We have seen this before with AotS: on AMD hardware, only the frames actually displayed were presented, while the render engine internally ran at a higher framerate and only blitted the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.
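To answer the present-mode question for a given log, tallying the PresentMode column is usually enough. The column name and the per-card file names here are assumptions based on PresentMon's CSV output, not anything the reviewers have published.

```python
# Tally which present mode PresentMon recorded for a run. The PresentMode
# column name and the log file names are assumptions for illustration.
import csv
from collections import Counter

def present_modes(csv_path):
    modes = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            modes[row.get("PresentMode", "unknown")] += 1
    for mode, count in modes.most_common():
        print(f"{csv_path}: {mode}: {count} presents")

present_modes("fury_run.csv")    # hypothetical per-card logs
present_modes("980ti_run.csv")
```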
 
It would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.

Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do it). The same goes for a couple of other titles as well.

We have seen this before with AotS: on AMD hardware, only the frames actually displayed were presented, while the render engine internally ran at a higher framerate and only blitted the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.

This is probably a stupid question from me, but if it rendered internally at a higher framerate and the issue is frames dropped internally, why is AMD's performance at the displayed frames lower than NVIDIA's?
In the PresentMon utility, that is.
It also raises the question: shouldn't performance be based on the actual fps/frame times presented to the user rather than on an internal mechanism?
Because we are seeing a skew between much higher internal "frames" and what is actually seen by the user, and this matters if a large swing can take place between internal and displayed measurements.

This is a great analysis on PresentMon, one of the more detailed out there:
http://www.pcper.com/reviews/Graphics-Cards/PresentMon-Frame-Time-Performance-Data-DX12-UWP-Games

Cheers
 
This is probably a stupid question from me, but if it rendered internally at a higher framerate and the issue is frames dropped internally, why is AMD's performance at the displayed frames lower than NVIDIA's?
Your assumption that the frames have been displayed is wrong, in the case of Nvidia. Not unless you run PresentMon explicitly with "-exclude_dropped".

As soon as two frames are successfully rendered within the same refresh cycle, both are otherwise counted as presented, even though only one of them is actually displayed.
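To make that presented-versus-displayed gap concrete, a sketch like the one below (assuming the Dropped and TimeInSeconds columns PresentMon writes) reports both rates from the same log; with two presents landing in one refresh cycle, the first number can sit well above the second.

```python
# Sketch: count every present, but only count a frame as displayed when
# PresentMon did not flag it as dropped. Column names are assumptions.
import csv

def presented_vs_displayed(csv_path):
    presented = displayed = 0
    t_first = t_last = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            t = float(row["TimeInSeconds"])
            t_first = t if t_first is None else t_first
            t_last = t
            presented += 1
            if row.get("Dropped", "0").strip() != "1":
                displayed += 1  # this present actually made it to the screen
    duration = t_last - t_first
    print(f"presented: {presented / duration:.1f} fps")
    print(f"displayed: {displayed / duration:.1f} fps")

presented_vs_displayed("presentmon_log.csv")  # hypothetical log file name
```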
It also raises the question: shouldn't performance be based on the actual fps/frame times presented to the user rather than on an internal mechanism?
No, definitely not. If you did that, most VR-ready applications with async timewarp, for example, would all appear to achieve a solid 90fps, since the v-synced timewarp loop presents at a fixed 90fps.

There is neither a universal way to measure input delay, nor one to measure framerate at any point beyond the end of the internal render path. The goal for a smooth presentation is to have the start of each frame synchronized as well as possible with the screen refresh, but how that is achieved is an entirely different matter.

It gets even worse with the now-popular method of temporal anti-aliasing. It's only a matter of time until we see not only fixed-window temporal anti-aliasing, but dynamic variants which attempt to fit as many (unique) sub-frames as possible within each output frame. You would then see the game presenting precisely at the refresh rate, with minimal latency for the latest sub-frame, but at the cost of visual quality, which then depends on the internal frame rate.

Now ask yourself: does the display or present rate mean anything at all under such conditions? And if the actual visual quality or smoothness of the animation depends on the internal framerate, is it in any way valid to disregard it?
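A toy illustration of the timewarp point, not any real VR runtime: the display loop below re-presents the newest internally rendered frame at a fixed 90 Hz, so a present-rate measurement always reads 90 fps, while the rate of unique frames actually reaching the screen still tracks the internal render rate (capped at the refresh rate).

```python
# Toy simulation: a v-synced reprojection loop presents at a fixed 90 Hz
# regardless of how fast the application renders internally.
REFRESH_HZ = 90.0
SIM_SECONDS = 5.0

def measure(internal_fps):
    refresh_interval = 1.0 / REFRESH_HZ
    frame_interval = 1.0 / internal_fps
    presents = 0
    unique_frames_shown = set()
    t = 0.0
    while t < SIM_SECONDS:
        # each refresh, the warp loop re-presents the newest rendered frame
        unique_frames_shown.add(int(t / frame_interval))
        presents += 1
        t += refresh_interval
    return presents / SIM_SECONDS, len(unique_frames_shown) / SIM_SECONDS

for fps in (45, 60, 90, 120):
    present_rate, unique_rate = measure(fps)
    print(f"internal {fps:>3} fps -> present rate {present_rate:.0f} fps, "
          f"unique frames shown {unique_rate:.0f}/s")
```

Measured at the present API, every one of those configurations looks like 90 fps; only the unique-frame rate reveals the internal render rate the visual quality actually depends on.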
 