DX12 Performance Discussion And Analysis Thread

My guess is that the internal benchmark counts the frame-to-frame time at the beginning of processing, not at the time the frame is presented (as PresentMon does).
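To illustrate the difference (purely a toy sketch, not the actual benchmark code; the work/present functions are made up), an in-engine counter that samples a timestamp at the start of each frame's processing will report different intervals than a tool such as PresentMon, which only sees when the frame is handed to the present API:

Code:
// Toy C++ sketch of the two sampling points (all names are hypothetical).
#include <chrono>
#include <cstdio>
#include <thread>

using Clock = std::chrono::steady_clock;

// Stand-ins for real work; the sleeps just make the intervals visible.
static void do_frame_work() { std::this_thread::sleep_for(std::chrono::milliseconds(12)); }
static void present_frame() { std::this_thread::sleep_for(std::chrono::milliseconds(4)); }

int main() {
    auto prev_start   = Clock::now();  // what an in-engine counter might sample
    auto prev_present = Clock::now();  // roughly what a present-based tool sees

    for (int frame = 0; frame < 5; ++frame) {
        auto start = Clock::now();
        do_frame_work();               // simulation + rendering
        present_frame();               // hand the finished frame to the present API
        auto presented = Clock::now();

        std::printf("frame %d: start-to-start %.2f ms, present-to-present %.2f ms\n",
                    frame,
                    std::chrono::duration<double, std::milli>(start - prev_start).count(),
                    std::chrono::duration<double, std::milli>(presented - prev_present).count());
        prev_start = start;
        prev_present = presented;
    }
    return 0;
}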
 
My guess is that the internal benchmark counts the frame-to-frame time at the beginning of processing, not at the time the frame is presented (as PresentMon does).
Yes and no. The above chart is simplified. The engine uses different present modes for different hardware, depending on what the corresponding company's engineers recommended for their hardware.

It does look like, on Nvidia hardware, every frame which has completed post-processing is presented right away, regardless of whether another frame has already been presented during the same screen refresh interval.

It doesn't look like that with the present mode used for AMD hardware. At most a single frame per refresh is actually handed to the present API, namely the last completed one. Well, at least to the part of the present API which PresentMon can monitor.

So your "average" and "peak" FPS are garbage/incomparable when your benchmark contains scenes in which the FPS would actually exceed the monitor refresh rate, as one of the implementations didn't even bother to hassle the OS with presenting those frames. You can verify that by comparing the FPS reported by the benchmark itself with the ones reported by PresentMon. They will only match for Nvidia's hardware, at least the peak and average ones. The minimum FPS should match much better, at least as long as you make sure to use the same sliding window size for computing the minimum. (Which means you don't just pick the single largest present-to-present interval found and try to calculate FPS from that.)
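To make the sliding-window point concrete, here is a minimal sketch (arbitrary sample data and window size, not tied to any particular tool's output) showing how "minimum FPS" changes depending on whether you average over a window of consecutive frames or just take the single worst present-to-present interval:

Code:
// Minimal sketch: average FPS vs. two ways of computing "minimum FPS"
// from a series of present-to-present intervals.
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // Present-to-present intervals in milliseconds (made-up sample data).
    std::vector<double> ms = {8.3, 8.4, 8.2, 40.0, 8.5, 8.3, 8.6, 8.2, 8.4, 8.3};

    const double total = std::accumulate(ms.begin(), ms.end(), 0.0);
    std::printf("average FPS: %.1f\n", 1000.0 * ms.size() / total);

    // Naive "minimum FPS": derived from the single largest interval only.
    const double worst = *std::max_element(ms.begin(), ms.end());
    std::printf("min FPS from worst single interval: %.1f\n", 1000.0 / worst);

    // Sliding-window minimum FPS: worst average over any N consecutive frames.
    const std::size_t window = 4;
    double min_fps = 1e9;
    for (std::size_t i = 0; i + window <= ms.size(); ++i) {
        const double sum = std::accumulate(ms.begin() + i, ms.begin() + i + window, 0.0);
        min_fps = std::min(min_fps, 1000.0 * window / sum);
    }
    std::printf("min FPS over a %zu-frame window: %.1f\n", window, min_fps);
    return 0;
}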
 
I am sorry Ext3h, but you do realise that problems have been identified in games using FCAT/FRAPS that would not have been possible to find without them?
Yes, those are DX11 games, but PresentMon provides a similar view for DX12 games.
This means they actually do provide important data on what happens between the user and the rendering engine.

I am surprised you are saying they are basically useless and do not help in providing a real-world picture of how a game/engine is performing and behaving.
If you disagree, maybe you should raise this as a technical debate with Intel, who developed PresentMon, or with those who developed FCAT/FRAPS, which have been used successfully for many years.
Cheers
 
FCAT/FRAPS had their use in the past, at least as long as you could rely on the fact that every rendered frame was presented the moment it was rendered. But it just no longer works like that.

And even with PresentMon at least being able to detect all of the different present methods available on Windows 10, that still doesn't mean it is possible to get any *meaningful* data from it. Well, it obviously does mean something, but not necessarily what you expected. The render paths and present methods just vary more and more from the standard layout they used to follow. There is no longer a generic method, valid for all titles and hardware platforms, to measure latencies and framerates. There is no other option but to carefully analyze each driver/hardware/software tuple manually to find a way to profile that specific title. For some titles, there is even the possibility that a former standard metric such as "FPS" is no longer tied to performance at all.

This was expected to happen with these new low-level APIs, and it's partially even happening in DX11 titles now, as concepts are being backported, diverging from the intended usage patterns.
 
What a weird title with the 1080 reigning supreme!
Which isn't surprising, since they didn't bench anything remotely close to it in price or capabilities. No Furies, 980 Ti, etc. It would have been nice if they had actually investigated async on the 1080 a bit more while running that benchmark. Regardless, the results don't look too dissimilar from what AOTS provided.
 

The other problem with their conclusion and focus is that they use reference NVIDIA cards and custom AIB AMD cards....
Which is really strange, considering they would know that custom AIB versions of NVIDIA models have a much greater performance window than the reference ones.
Taking that into account, it would probably show the custom AIB NVIDIA cards having either the same or slightly better FPS...
So it is strange how they can state that conclusion when it is the most extreme example of diverging card model performance.
I do not know what to make of it, actually. Look how close the 960 model is to the 970; reviews have the gap much wider if it is custom vs custom.
They also have the 380 with the same performance as a 390...
Cheers
 
Total War: WARHAMMER will not support DirectX 12 at Launch

We’re pleased to confirm that Total War: WARHAMMER will also be DX12 compatible, and our graphics team has been working in close concert with AMD’s engineers on the implementation.

This will be patched in a little after the game launches, but we’re really happy with the DX12 performance we’re seeing so far, so watch this space!

In GPU terms, we’ve shifted our particle simulation pipeline from the pixel shader to the compute shader, which is a more efficient use of the GPU’s time. In fact we’ve done this with several parts of the rendering pipeline, further utilizing the GPU and letting the CPU focus on everything else it has to do.


Long story short: all of this means we’re using the CPU and the GPU more efficiently. TW: Warhammer takes better advantage of multicore CPUs, balancing the load across the cores so that no single core is maxed out and limiting framerates while others sit idle.

We’ve also switched up the Total War engine from 32 to 64-bit. While this brings no tangible performance benefits, we no longer have the 32-bit restriction of a maximum of 2GB of memory devoted to processes.

The upshot is we can basically cram a greater variety of models, animations and textures into battles. One neat side benefit though is that it’s brought a reduction in end-turn times. Coupled with further optimisation we’ve done on the AI’s decision-making, this means you’ll enjoy quite noticeably reduced end-turn rounds while all the AI factions take their turns.

http://www.overclock3d.net/articles...ammer_will_not_support_directx_12_at_launch/1
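For anyone curious what the particle change mentioned above amounts to in API terms: roughly, a full-screen pixel-shader pass gets replaced by a compute dispatch with one thread per particle. This is not CA's actual code, just a hedged D3D12 sketch; the pipeline state, root signature layout and group size are assumptions.

Code:
// Hypothetical D3D12 fragment: record a particle update as a compute dispatch.
#include <d3d12.h>

constexpr UINT THREADS_PER_GROUP = 256;  // must match [numthreads(...)] in the shader

void RecordParticleUpdate(ID3D12GraphicsCommandList* cmd,
                          ID3D12PipelineState* particleComputePso,
                          ID3D12RootSignature* computeRootSig,
                          D3D12_GPU_VIRTUAL_ADDRESS particleBuffer,
                          UINT particleCount)
{
    cmd->SetPipelineState(particleComputePso);
    cmd->SetComputeRootSignature(computeRootSig);
    // Root parameter 0 is assumed to be a UAV of the particle buffer.
    cmd->SetComputeRootUnorderedAccessView(0, particleBuffer);

    // One thread per particle, rounded up to whole thread groups.
    const UINT groups = (particleCount + THREADS_PER_GROUP - 1) / THREADS_PER_GROUP;
    cmd->Dispatch(groups, 1, 1);

    // UAV barrier so subsequent passes see the updated particle data.
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    barrier.UAV.pResource = nullptr;  // nullptr = barrier on all UAV accesses
    cmd->ResourceBarrier(1, &barrier);
}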
 
Only the 980 was reference; the 960 was an Asus Strix.
Yeah, but it is strange to see results with a 960 so close to the 970 if both are custom AIB cards....
Look at other game reviews.
Also, they should have noticed the game is still not optimised, because the 380 (Tonga) shows the same performance as the 390 (Hawaii). We have seen this in the past with some other developments.
Cheers
 
Different resolutions.
Yes, good catch.
It is confusing how they do not bother to test most resolutions on each card, or split them up more like other reviews do.
I wonder why they used a reference 980, as that would end up being a very similar result to the 390X if custom, or possibly a bit higher depending upon the custom AIB.
Edit:
What also caught me is that at one stage the graph showed the 970 as reference (they corrected it); I double-checked, as I have it captured.
Cheers
 
We’ve also switched up the Total War engine from 32 to 64-bit. While this brings no tangible performance benefits, we no longer have the 32-bit restriction of a maximum of 2GB of memory devoted to processes.
It is 3GB on a 32-bit OS with the LARGEADDRESSAWARE flag, and 4GB on a 64-bit OS, so recent "AAA" games are usually not limited to a 31-bit virtual address space....
People too often forget that the x86-64 ISA exposes double the number of registers (both integer and floating point) to the compiler compared to the x86 (IA-32) ISA.
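If anyone wants to see those limits on their own machine, here is a small Windows-only sketch (nothing game-specific, just the stock GlobalMemoryStatusEx call) that reports the virtual address space available to the calling process: roughly 2GB for a plain 32-bit build, around 4GB for a 32-bit build flagged LARGEADDRESSAWARE running on a 64-bit OS, and vastly more for a native 64-bit build.

Code:
// Print the virtual address space available to this process.
#include <windows.h>
#include <cstdio>

int main() {
    MEMORYSTATUSEX status = {};
    status.dwLength = sizeof(status);
    if (GlobalMemoryStatusEx(&status)) {
        std::printf("virtual address space for this process: %.1f GB\n",
                    status.ullTotalVirtual / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}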
 
Yes and no. The above chart is simplified. The engine uses different present modes for different hardware, depending on what the corresponding company's engineers recommended for their hardware.

It does look like, on Nvidia hardware, every frame which has completed post-processing is presented right away, regardless of whether another frame has already been presented during the same screen refresh interval.

It doesn't look like that with the present mode used for AMD hardware. At most a single frame per refresh is actually handed to the present API, namely the last completed one. Well, at least to the part of the present API which PresentMon can monitor.

So your "average" and "peak" FPS are garbage/incomparable when your benchmark contains scenes in which the FPS would actually exceed the monitor refresh rate, as one of the implementations didn't even bother to hassle the OS with presenting those frames. You can verify that by comparing the FPS reported by the benchmark itself with the ones reported by PresentMon. They will only match for Nvidia's hardware, at least the peak and average ones. The minimum FPS should match much better, at least as long as you make sure to use the same sliding window size for computing the minimum. (Which means you don't just pick the single largest present-to-present interval found and try to calculate FPS from that.)

PresentMon is not as accurate as FCAT but works in a similar fashion, looking at the intervals at which frames are presented. Until recently, DX12 games on AMD (even Steam games, like Ashes) were forced into a borderless window mode, so no matter how many frames per second your game is running at, you get 60Hz (or whatever your refresh rate is) updates with v-sync on. But they changed this recently, implementing DirectFlip, and as of a Windows 10 update a few weeks ago, even UWP games can work in exclusive fullscreen (with v-sync on/off, and compatible with G-SYNC/FreeSync).
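As a side note on how that interval data is typically consumed: PresentMon writes a per-frame CSV, and the frame-to-frame column can be summed to get an FPS number. The rough sketch below assumes a column named "MsBetweenPresents" (as in the PresentMon builds I have seen; adjust if yours differs):

Code:
// Rough sketch: compute average FPS from a PresentMon CSV log.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: fps <presentmon.csv>\n"; return 1; }
    std::ifstream csv(argv[1]);

    std::string header;
    std::getline(csv, header);

    // Locate the frame-to-frame interval column (name assumed, see above).
    int target = -1, index = 0;
    std::stringstream hs(header);
    for (std::string col; std::getline(hs, col, ','); ++index)
        if (col == "MsBetweenPresents") target = index;
    if (target < 0) { std::cerr << "column not found\n"; return 1; }

    double total_ms = 0.0;
    std::size_t frames = 0;
    for (std::string line; std::getline(csv, line); ) {
        std::stringstream ls(line);
        std::string cell;
        for (int i = 0; std::getline(ls, cell, ','); ++i)
            if (i == target) { total_ms += std::stod(cell); ++frames; }
    }
    if (frames > 0 && total_ms > 0.0)
        std::cout << "average FPS: " << 1000.0 * frames / total_ms << "\n";
    return 0;
}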
 
PresentMon is not as accurate as FCAT but works in a similar fashion
On the CPU/Present side, PresentMon should give results that are basically identical to FRAPS. As I've explained in earlier threads, these measurements are effectively what game engines use to update their simulations, so they are important regardless of the display pipe's smoothness as well.

On the display side, outside of SLI I'm not aware of any situations in which PresentMon's counters are "less accurate". SLI can introduce some driver magic with frame pacing that causes the two to diverge somewhat, but that doesn't occur with single GPUs.

The "more accurate/less accurate" notion is a bit misguided though - there's not one true way to measure performance and the user experience. These various methods span the more/less useful spectrum but anyone telling you that one number is the full story is selling something :)
 
Stupid question - does triple buffering affect the accuracy of PresentMon, FRAPS or FCAT?
No. The things that affect the "accuracy" (or rather cause FCAT and PresentMon to diverge slightly on the display-side data) are the driver or hardware playing with the frame pacing/flip timing beyond what the OS is requesting. Generally folks only do this in the presence of driver SLI/AFR.
 