Meaningful Performance Analysis with Fixed Display Tech *spawn*

Ext3h

> Looking at the performance difference, even between a 980 Ti and a Fury, I would encourage other sites and reviewers to redo the tests, as those results are clearly inverted. But not only with PresentMon, of course.
Would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.

Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do this). The same goes for a couple of other titles as well.

We have seen this before with AotS: on AMD hardware, only frames that were actually displayed were also presented, while the render engine internally ran at a higher frame rate, blitting only the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.
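For anyone who wants to check that themselves, here is a rough sketch of how you could tally the detected present modes and the dropped-present count from a capture. The column names ("PresentMode", "Dropped") and the file name are assumptions based on common PresentMon CSV output and may differ between versions.

```python
# Rough sketch: tally which present modes PresentMon reported, and how many
# presents it flagged as dropped. Column names and the file name are
# assumptions and may differ between PresentMon versions.
import csv
from collections import Counter

def present_mode_summary(csv_path):
    modes = Counter()
    dropped = 0
    total = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            modes[row["PresentMode"]] += 1
            total += 1
            if row.get("Dropped", "0") == "1":
                dropped += 1
    return modes, dropped, total

if __name__ == "__main__":
    modes, dropped, total = present_mode_summary("presentmon_capture.csv")
    for mode, count in modes.most_common():
        print(f"{mode}: {count} presents")
    print(f"{dropped}/{total} presents were flagged as dropped (never displayed)")
```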
 
> Would be interesting to know which present mode PresentMon claimed to have detected for each of the architectures.
>
> Furthermore, at least in the case of AotS, there is also the possibility that frames are being dropped prior(!) to being presented (meaning the application culls stale frames itself instead of letting the OS do this). The same goes for a couple of other titles as well.
>
> We have seen this before with AotS: on AMD hardware, only frames that were actually displayed were also presented, while the render engine internally ran at a higher frame rate, blitting only the most recent frame per refresh out to the present API. Last time, this tripped up a reviewer attempting to measure frame times. There is a ton of vendor-specific code, heuristics and optimizations in the present-chain setup in AotS, so measuring at this point is very unlikely to produce comparable results across multiple vendors.

This is probably a stupid question from me, but if it renders internally at a higher frame rate and the issue is frames being dropped internally, why is the AMD performance lower than the NVIDIA performance for the displayed frames?
In the PresentMon utility, that is.
It does raise the question: shouldn't performance be based on the actual fps/frame time presented to the user rather than on an internal mechanism?
Because we are seeing a skew between the much higher internal "frames" and what is practically seen by the user, and this matters if a large swing can take place between the internal and displayed measurements.

This is a great analysis of PresentMon, one of the more detailed ones out there:
http://www.pcper.com/reviews/Graphics-Cards/PresentMon-Frame-Time-Performance-Data-DX12-UWP-Games

Cheers
 
> This is probably a stupid question from me, but if it renders internally at a higher frame rate and the issue is frames being dropped internally, why is the AMD performance lower than the NVIDIA performance for the displayed frames?
Your assumption that the frames have been displayed is wrong, in the case of Nvidia. Not unless you run PresentMon explicitly with "-exclude_dropped".

As soon as two frames are successfully rendered within the same refresh cycle, both are otherwise going to be counted as presented, even though only one of them is actually displayed.
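To make that distinction concrete, here is a minimal sketch computing the "presented" rate versus the "actually displayed" rate from a capture, assuming the CSV has "TimeInSeconds" and "Dropped" columns (names and file name are placeholders and may vary by version):

```python
# Minimal sketch: "presented" fps counts every Present() call in the capture,
# "displayed" fps only those PresentMon did not flag as dropped. Column names
# and the file name are assumptions.
import csv

def presented_vs_displayed_fps(csv_path):
    times = []
    displayed = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            times.append(float(row["TimeInSeconds"]))
            if row.get("Dropped", "0") != "1":
                displayed += 1
    duration = times[-1] - times[0]
    return len(times) / duration, displayed / duration

presented, shown = presented_vs_displayed_fps("presentmon_capture.csv")
print(f"presented: {presented:.1f} fps, actually displayed: {shown:.1f} fps")
```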
> It does raise the question: shouldn't performance be based on the actual fps/frame time presented to the user rather than on an internal mechanism?
No, definitely not. If you did that, most VR-ready applications with async timewarp, for example, would all achieve a solid 90 fps, since you have that v-synced timewarp loop which presents at a fixed 90 fps.

There is no universal way to measure input delay, nor to measure the frame rate at any point beyond the end of the internal render path. The goal for a smooth presentation is to have the start of each frame synchronized as well as possible with the screen refresh, but how that is achieved is an entirely different matter.

It's even worse with regard to the now-popular method of temporal anti-aliasing. It's only a matter of time until we see not only fixed-window temporal anti-aliasing, but also dynamic variants which attempt to fit as many (unique) sub-frames as possible within each output frame. Then you are going to see the game presenting precisely at the refresh rate, with minimal latency for the latest sub-frame, but at the cost of visual quality, which in turn depends on the internal frame rate.

Now ask yourself: does the display or present rate mean anything at all under such conditions? And if the actual visual quality or smoothness of the animation depends on the internal frame rate, is it in any way valid to disregard it?
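As a toy illustration of the timewarp point above (all numbers made up, nothing vendor specific): a warp loop that re-presents something on every 90 Hz refresh shows a flat 90 presents per second, even if the application itself only finishes half as many unique frames in that second.

```python
# Toy model, nothing vendor specific: the compositor/timewarp loop presents
# once per 90 Hz refresh no matter how fast the app renders, so a tool that
# counts presents would report a flat 90 fps while the app only produces
# half as many unique frames. All numbers are made up.
REFRESH = 1 / 90          # HMD refresh interval in seconds
APP_FRAME_TIME = 1 / 45   # the application itself only manages 45 fps

presents = 0
unique_frames = 0
next_app_frame_ready = 0.0
for tick in range(90):                      # simulate one second of refreshes
    now = tick * REFRESH
    if now >= next_app_frame_ready:         # a new app frame became available
        unique_frames += 1
        next_app_frame_ready += APP_FRAME_TIME
    presents += 1                           # timewarp re-presents every refresh

print(f"presents/sec: {presents}, unique app frames/sec: {unique_frames}")
```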
 
> So are you saying FCAT/FRAPS does not work and we should use an internal benchmark?
Yes, that's precisely what I'm saying. You can no longer rely on the numbers yielded by these tools alone for anything beyond DX11, or at least true DX11-style render paths.

Even in the PCPer article, that only worked for a single title, by chance, and only because the title tested used classic V-sync without async timewarp.
 
> Yes, that's precisely what I'm saying. You can no longer rely on the numbers yielded by these tools alone for anything beyond DX11, or at least true DX11-style render paths.
>
> Even in the PCPer article, that only worked for a single title, by chance, and only because the title tested used classic V-sync without async timewarp.
I know they are specific to DX11, but my point is that your logic would mean they should not have been used for DX11 either, or am I missing something when you say:
> There is no universal way to measure input delay, nor to measure the frame rate at any point beyond the end of the internal render path. The goal for a smooth presentation is to have the start of each frame synchronized as well as possible with the screen refresh, but how that is achieved is an entirely different matter.
PresentMon gives similarly identifiable result behaviour to those tools; the context here is more DX11 vs. DX12 rather than VR.

Ah, I need to re-read that article, as I thought it mentioned async timewarp in their VR learning experience; I must have misread its context.
Thanks
 
> I know they are specific to DX11, but my point is that your logic would mean they should not have been used for DX11 either, or am I missing something when you say:
What I mean by that is that the picture painted by these tools was somewhat flawed before as well. Even when we talk about a v-synced application, there are multiple possible implementations.
They range from "we start rendering immediately after V-sync, at the risk of delivering data which is stale by half a frame", to "we guess the render time and delay the start as long as possible to reduce input lag", to "screw V-sync, we implement a frame rate limiter instead and only flip buffers in time to avoid tearing".
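Just to illustrate the second of those strategies (delaying the render start to cut input lag), here is a little sketch; the function name, margin and timings are all made up:

```python
# Illustrative sketch: start rendering as late as possible so the frame
# finishes just before the next v-sync, trading safety margin for lower
# input lag. Function name, margin and timings are hypothetical.
def schedule_render_start(next_vsync, predicted_render_time, margin=0.002):
    # Never schedule in the past; leave a small safety margin before v-sync.
    return max(0.0, next_vsync - predicted_render_time - margin)

REFRESH = 1 / 60
predicted = 0.009   # rolling estimate of recent render times, in seconds
for frame in range(3):
    vsync = (frame + 1) * REFRESH
    start = schedule_render_start(vsync, predicted)
    print(f"frame {frame}: start render at {start * 1000:.1f} ms "
          f"for the v-sync at {vsync * 1000:.1f} ms")
```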

Going by "present" is entirely pointless, what you would actually want, is:
  • Simulation time stamp of frame
  • Time stamp of render start
  • Time stamp for render completion
  • Time stamp for display
If you are lucky, "present" somehow relates to the third timestamp. If you are unlucky, the engine handles that internally, "present" is always close to display time, and it doesn't even occur at the same rate.

For VR, you also have the "timewarp" time stamp in between the third and fourth step. So there are actually a lot of different latencies which are potentially of interest.
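If you wanted to log those four points yourself, a sketch of what such a per-frame record and the derived latencies could look like (all names and numbers here are illustrative; no existing tool exposes all four timestamps uniformly across engines):

```python
# Sketch of the per-frame record described above and the latencies you would
# derive from it. Names and numbers are illustrative only.
from dataclasses import dataclass

@dataclass
class FrameTimestamps:
    simulation: float       # game state sampled / input consumed (seconds)
    render_start: float     # first draw call / GPU work submitted
    render_complete: float  # GPU finished the frame
    displayed: float        # flip actually shown on screen

    def render_time(self):
        return self.render_complete - self.render_start

    def input_to_photon(self):
        # The latency the player actually experiences.
        return self.displayed - self.simulation

    def scanout_wait(self):
        # How long a finished frame sat around waiting for the display.
        return self.displayed - self.render_complete

frame = FrameTimestamps(0.000, 0.004, 0.012, 0.0167)
print(f"render {frame.render_time() * 1000:.1f} ms, "
      f"input-to-photon {frame.input_to_photon() * 1000:.1f} ms, "
      f"waited {frame.scanout_wait() * 1000:.1f} ms for scanout")
```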
 