DX12 Performance Discussion And Analysis Thread

I played with the AoTS benchmark on my 970 a couple of weeks back, toggling the async option in the game's ini file on and off, and the game actually ran slightly slower with it enabled. It comes disabled by default. I've read that sometime in the last few months NV has added some kind of support for the feature. Mysterious.
 
I played with the AoTS benchmark on my 970 a couple of weeks back, toggling the async option in the game's ini file on and off, and the game actually ran slightly slower with it enabled. It comes disabled by default. I've read that sometime in the last few months NV has added some kind of support for the feature. Mysterious.
That's why it's running only slightly slower now. ;)
 
Just checked with Dan Baker. Async is still functionally disabled on Ashes when it detects an NVIDIA card, including the GTX 1080 (since they don't have one to test against yet).
Ah, great news - great in the sense of getting clarification from Oxide, not necessarily regarding the outcome :)
Here is a big question, and I think it comes back to what Kollock mentioned in the past: whatever performance change is seen between drivers comes down to whatever NVIDIA is doing on their side. So how is it that the 1080 does not see any performance hit, unlike Maxwell?
Is this down to the behaviour of pre-emption and how it now works at the hardware level on Pascal?
Anyway, this makes most of the async testing with Ashes a moot point for now; although it's nice to see that Pascal at least behaves better than Maxwell in DX12 and AoTS, even without async compute.
Thanks
 
Well, it also doesn't explain the increases at some resolutions either, lol.

Funny stuff is going on...
Not sure anyone has used Intel's PresentMon on AoTS; it might make sense rather than just relying on the internal benchmark measurement tool.
Cheers
 
Not sure anyone has used Intel's PresentMon on AoTS; it might make sense rather than just relying on the internal benchmark measurement tool.
Cheers

Hardware Canucks used PresentMon in their review:

Ashes of the Singularity is a real-time strategy game on a grand scale, very much in the vein of Supreme Commander. While this game is most known for its asynchronous workloads through the DX12 API, it also happens to be pretty fun to play. While Ashes has a built-in performance counter alongside its built-in benchmark utility, we found it to be highly unreliable, often posting substantial run-to-run variation. With that in mind, we still used the onboard benchmark since it eliminates the randomness that arises when actually playing the game, but utilized the PresentMon utility to log performance.

[chart: Hardware Canucks PresentMon results]
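For anyone wanting to reproduce that kind of summary at home, here is a minimal Python sketch for boiling a PresentMon log down to an average fps and a frame-time percentile. The column name (MsBetweenPresents) and the Ashes process name are assumptions on my part; check them against the header row of your own log.

```python
# Minimal sketch: summarise a PresentMon CSV instead of trusting the
# in-game counter. Column/process names below are assumptions; verify
# them against the header row of your own log.
import csv
import statistics

def summarize(csv_path, process_name="Ashes_DX12.exe"):  # process name is a guess
    frame_times = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if process_name.lower() not in row.get("Application", "").lower():
                continue  # ignore presents from other processes
            frame_times.append(float(row["MsBetweenPresents"]))
    frame_times.sort()
    avg_ms = statistics.mean(frame_times)
    p99_ms = frame_times[int(0.99 * (len(frame_times) - 1))]
    print(f"frames logged: {len(frame_times)}")
    print(f"average: {avg_ms:.2f} ms ({1000.0 / avg_ms:.1f} fps)")
    print(f"99th percentile frame time: {p99_ms:.2f} ms")

summarize("presentmon_log.csv")  # hypothetical log file name
```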
 
In overall performance, yes, but that would not explain how Pascal now always shows a slightly positive result with async on, where Maxwell was always slightly negative in AoTS.
Cheers
There's still some work re-org that goes on when async is turned on in Ashes. Apparently this is benefiting the Pascal cards ever so slightly.
 
For me, it didn't. It's actually a bit faster with async off via settings.ini (1080p low or extreme). Maybe I have an outlier, or maybe some were mixing up results with their expectations.
 
Hardware Canucks used PresentMon in their review:

[chart: Hardware Canucks PresentMon results]

Those are interesting results for the Fury/980 Ti.
We just need more sites to use it to build up a validated picture.
I always wondered whether the pre-set benchmark creates an unrealistic load as an additional synthetic test *shrug*.
Thanks.
 
Hardware Canucks used PresentMon in their review:

[chart: Hardware Canucks PresentMon results]

Looking at the performance difference, even between a 980 Ti and a Fury, I would like to see other sites and reviewers redo the tests, as those results are clearly inverted. And not only with PresentMon, of course.
 
Looking at the performance difference, even between a 980 Ti and a Fury, I would like to see other sites and reviewers redo the tests, as those results are clearly inverted. And not only with PresentMon, of course.
It would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.

Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do it). The same goes for a couple of other titles as well.

We have seen this before with AotS: on AMD hardware, only the frames actually displayed were presented, while the render engine internally ran at a higher framerate and only blitted the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.
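To answer the present-mode question for a given log, tallying the PresentMode column is usually enough. The column name and the per-card file names here are assumptions based on PresentMon's CSV output, not anything the reviewers have published.

```python
# Tally which present mode PresentMon recorded for a run. The PresentMode
# column name and the log file names are assumptions for illustration.
import csv
from collections import Counter

def present_modes(csv_path):
    modes = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            modes[row.get("PresentMode", "unknown")] += 1
    for mode, count in modes.most_common():
        print(f"{csv_path}: {mode}: {count} presents")

present_modes("fury_run.csv")    # hypothetical per-card logs
present_modes("980ti_run.csv")
```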
 
It would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.

Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do it). The same goes for a couple of other titles as well.

We have seen this before with AotS: on AMD hardware, only the frames actually displayed were presented, while the render engine internally ran at a higher framerate and only blitted the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.

This is probably a stupid question from me, but if it rendered internally at a higher framerate and the issue is frames dropped internally, why is AMD's performance at the displayed frames lower than NVIDIA's?
In the PresentMon utility, that is.
It also raises the question: shouldn't performance be based on the actual fps/frame times presented to the user rather than on an internal mechanism?
Because we are seeing a skew between much higher internal "frames" and what is actually seen by the user, and this matters if a large swing can take place between internal and displayed measurements.

This is a great analysis on PresentMon, one of the more detailed out there:
http://www.pcper.com/reviews/Graphics-Cards/PresentMon-Frame-Time-Performance-Data-DX12-UWP-Games

Cheers
 
This is probably a stupid question from me, but if it rendered internally at a higher framerate and the issue is frames dropped internally, why is AMD's performance at the displayed frames lower than NVIDIA's?
Your assumption that the frames have been displayed is wrong, in the case of Nvidia. Not unless you run PresentMon explicitly with "-exclude_dropped".

As soon as two frames are successfully rendered within the same refresh cycle, both are otherwise counted as presented, even though only one of them is actually displayed.
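To make that presented-versus-displayed gap concrete, a sketch like the one below (assuming the Dropped and TimeInSeconds columns PresentMon writes) reports both rates from the same log; with two presents landing in one refresh cycle, the first number can sit well above the second.

```python
# Sketch: count every present, but only count a frame as displayed when
# PresentMon did not flag it as dropped. Column names are assumptions.
import csv

def presented_vs_displayed(csv_path):
    presented = displayed = 0
    t_first = t_last = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            t = float(row["TimeInSeconds"])
            t_first = t if t_first is None else t_first
            t_last = t
            presented += 1
            if row.get("Dropped", "0").strip() != "1":
                displayed += 1  # this present actually made it to the screen
    duration = t_last - t_first
    print(f"presented: {presented / duration:.1f} fps")
    print(f"displayed: {displayed / duration:.1f} fps")

presented_vs_displayed("presentmon_log.csv")  # hypothetical log file name
```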
It also raises the question: shouldn't performance be based on the actual fps/frame times presented to the user rather than on an internal mechanism?
No, definitely not. If you did that, most VR-ready applications with async timewarp, for example, would all appear to achieve a solid 90fps, since the v-synced timewarp loop presents at a fixed 90fps.

There is neither a universal way to measure input delay, nor one to measure framerate at any point beyond the end of the internal render path. The goal for a smooth presentation is to have the start of each frame synchronized as well as possible with the screen refresh, but how that is achieved is an entirely different matter.

It gets even worse with the now-popular method of temporal anti-aliasing. It's only a matter of time until we see not only fixed-window temporal anti-aliasing, but dynamic variants which attempt to fit as many (unique) sub-frames as possible within each output frame. You would then see the game presenting precisely at the refresh rate, with minimal latency for the latest sub-frame, but at the cost of visual quality, which then depends on the internal frame rate.

Now ask yourself: does the display or present rate mean anything at all under such conditions? And if the actual visual quality or smoothness of the animation depends on the internal framerate, is it in any way valid to disregard it?
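A toy illustration of the timewarp point, not any real VR runtime: the display loop below re-presents the newest internally rendered frame at a fixed 90 Hz, so a present-rate measurement always reads 90 fps, while the rate of unique frames actually reaching the screen still tracks the internal render rate (capped at the refresh rate).

```python
# Toy simulation: a v-synced reprojection loop presents at a fixed 90 Hz
# regardless of how fast the application renders internally.
REFRESH_HZ = 90.0
SIM_SECONDS = 5.0

def measure(internal_fps):
    refresh_interval = 1.0 / REFRESH_HZ
    frame_interval = 1.0 / internal_fps
    presents = 0
    unique_frames_shown = set()
    t = 0.0
    while t < SIM_SECONDS:
        # each refresh, the warp loop re-presents the newest rendered frame
        unique_frames_shown.add(int(t / frame_interval))
        presents += 1
        t += refresh_interval
    return presents / SIM_SECONDS, len(unique_frames_shown) / SIM_SECONDS

for fps in (45, 60, 90, 120):
    present_rate, unique_rate = measure(fps)
    print(f"internal {fps:>3} fps -> present rate {present_rate:.0f} fps, "
          f"unique frames shown {unique_rate:.0f}/s")
```

Measured at the present API, every one of those configurations looks like 90 fps; only the unique-frame rate reveals the internal render rate the visual quality actually depends on.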
 