DX12 Performance Discussion And Analysis Thread

The regressions from DX11 to DX12 in some of the Nvidia test cases are interesting. They seem to happen more at higher settings and on stronger CPU cores. Possibly memory management is a little off for Nvidia? That seems like something that could be lost in the transition and could give the DX11 driver a bump in cases that should be less driver-limited.
 
DX12 getting lower performance at higher resolutions and settings for Nvidia is quite reminiscent of Fury's performance in BF4 with Mantle, where it shows good improvements at lower resolutions but falters at higher ones. Probably memory problems, which were often blamed in the early days of Mantle on Hawaii and Tahiti cards.
 
Or Nvidia's DX11 drivers already worked around a lot of the API deficiencies (compared to its competitors'). I'm curious to see whether AMD and Intel close the "driver performance gap" with Vulkan/DX12 now that some of the optimization work has moved from the IHV to the developer.
 
That's a lot of CPU usage:

[Image: CtL6hFN.jpg (performance monitor screenshot showing CPU usage across cores)]
 
It depends on what those cores are doing... Without power and temperature stats, a performance monitor showing only CPU usage is not very useful...
 
http://oxidegames.com/2015/08/16/the-birth-of-a-new-api/

The second interesting number is the CPU framerate. This calculation is an estimate of what the FPS framerate would be if the GPU could keep up with the CPU. It is a very accurate estimate of what would happen if you put in an infinitely fast GPU. Likewise, we have another mode which instead of blocking on the GPU, will do all the work but throw away the frame. This can be useful for measuring CPU performance. However, if you do this then be sure that you use the same video card and driver for a different CPU, as some of the measurement will be driver related.

What is fascinating about the CPU framerate is it demonstrates how much more potential D3D12 has over D3D11. D3D12 will not show its true CPU benefits in average frame rates while the GPU is full. One thing to consider is that we are often pairing 28nm GPUs with 14nm CPUs. Next year, when the GPUs move to a higher process, you’re going to see a huge jump in GPU performance. This means that the gap between D3D11 and D3D12 will not only grow, but D3D12 may well become essential to achieving performance on the coming GPU architectures.
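
As a rough illustration of those two modes, here's a minimal sketch of a timing loop that either submits frames normally or does all the CPU work and then throws the frame away. buildFrame() and submitAndPresent() are hypothetical stand-ins for an engine's per-frame work and its submission path, not Oxide's actual code.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-ins, stubbed out so the sketch compiles.
void buildFrame() { /* simulation + command-list recording: all the CPU-side work */ }
void submitAndPresent() { /* submit command lists to the GPU queue and present */ }

// The same CPU work runs either way; in throw-away mode nothing is submitted,
// so the loop rate approximates the "CPU framerate" with an infinitely fast GPU.
double measureFps(int frames, bool throwAwayFrames)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i)
    {
        buildFrame();
        if (!throwAwayFrames)
            submitAndPresent();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return frames / elapsed.count();
}

int main()
{
    std::printf("CPU framerate estimate: %.1f fps\n", measureFps(1000, true));
    return 0;
}
```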
 
It's a benchmark that should be trying to load the system pretty heavily just to show what the engine can do. Are the videos being shown from the current set of tests? I was hoping for a bit more happening, and less screen area devoted to hillsides this time around.

In a game scenario, hopefully some of that core time could be devoted to other things. Making everything a bullet sponge, with so much respect for every other unit's personal space and the safety of every blade of grass, makes it all seem a little too remote--although it's likely more consistent for testing.
 
http://arstechnica.co.uk/gaming/201...ly-win-for-amd-and-disappointment-for-nvidia/

While I wasn't expecting a particularly big jump in performance for team green, I certainly wasn't expecting performance to go down. It's not by a huge amount, but the results are consistent: Nvidia's GPU doesn't perform as well under DX12 as it does under DX11 in the Ashes benchmark.

Contrast that with the AMD results, which show a huge uplift in performance; as much as 70 percent in some cases. While you have to bear in mind that AMD is coming from a bad place here—its DX11 performance is nowhere near Nvidia's—that's still an impressive result. Under DX12, the much older and much cheaper R9 290X nearly matches the performance of the GTX 980 Ti. At 4K, it actually beats it, if only by a few frames per second. That's an astonishing result no matter how you slice it.

Finally we have the 99th percentile frame rates—that is, the minimum frame rate you can expect to see 99 percent of the time—calculated from the frame times that the Ashes benchmark spits out. This time, the R9 290X card actually manages to beat the 980 Ti when it comes to minimum frame rates, meaning that in Ashes of the Singularity at least, you'll have a slightly smoother experience with AMD. Given that, like its overall DX11 performance, AMD has suffered with erratic frame timings in the past, this is surprising to see.
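
For reference, here's a minimal sketch of how a 99th percentile frame rate can be derived from a dump of per-frame times, using the nearest-rank definition of a percentile; the frame times in main() are made up purely to exercise the function, not data from the Ashes benchmark.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Nearest-rank 99th percentile frame time, reported as an FPS figure: roughly
// 99 percent of frames complete at least this fast.
double percentile99Fps(std::vector<double> frameTimesMs)
{
    if (frameTimesMs.empty())
        return 0.0;
    std::sort(frameTimesMs.begin(), frameTimesMs.end());
    size_t rank = static_cast<size_t>(std::ceil(0.99 * frameTimesMs.size()));
    if (rank > 0)
        --rank;                               // 1-based rank -> 0-based index
    const double slowFrameMs = frameTimesMs[rank];
    return 1000.0 / slowFrameMs;
}

int main()
{
    // Made-up frame times in milliseconds, just to show the calculation.
    const std::vector<double> times = {16.7, 17.1, 16.9, 33.4, 16.8, 18.0, 17.5, 16.6};
    std::printf("99th percentile frame rate: %.1f fps\n", percentile99Fps(times));
    return 0;
}
```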
 
I'm having a bit of an issue trying to cross-reference the data and methods used by each site. I'm seeing mixing and matching of CPU generations, GPUs, core clocks, overclocks, and settings.

AMD's card comes from behind and trades blows with Nvidia, whether that card is a 290, 390, or Fury and whether Nvidia's is a 980 or 980 Ti. The transitive property is not much of a friend in cross-site comparisons. DX12's benefit is a little modest for the hype, so long as you don't include AMD's consistently measured poor DX11 implementation.

This might say more about the rough quality of the data we're getting and the inconsistency of testing methodologies across sites. Or is there something we can glean from this?
 
I don't know if this is correct, but the article suggests that Nvidia cards were built more for serialized commands while AMD's were not, so in situations where a lot of commands arrive serially Nvidia completely outperforms AMD. But in a parallel situation, where different parts of the GPU can be accessed simultaneously, perhaps that's where the parallelism argument starts to swing back in AMD's favour. I recall speaking with D3D lead Max McMullen at the Build conference after his DX12 advanced topics presentation, and he mentioned that Nvidia cannot handle async compute the same way AMD does; paraphrased: it's not an entirely separate unit like AMD has. This brought up an interesting question for me, which I posed to sebbbi: async compute optimization could be difficult on PC if the two vendors' cards do not operate the same way, and he agreed that it was the biggest unknown for async compute usage on PC.

AMD appears to be able to run three completely separate flows on their GCN hardware; it is quite possible this is not the case with Nvidia cards today.
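
To make that concrete, here's a minimal sketch of how D3D12 exposes those flows as separate command queue types (direct/graphics, compute, and copy). Whether they actually execute concurrently is up to the hardware and driver, which is exactly the open question here; the code below just creates one queue of each type.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    // Default adapter, lowest feature level D3D12 accepts.
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device))))
    {
        std::printf("No D3D12 device available.\n");
        return 1;
    }

    // One queue per engine type; concurrency between them is hardware/driver dependent.
    const D3D12_COMMAND_LIST_TYPE types[] = {
        D3D12_COMMAND_LIST_TYPE_DIRECT,   // graphics (can also do compute and copy)
        D3D12_COMMAND_LIST_TYPE_COMPUTE,  // compute (can also do copy)
        D3D12_COMMAND_LIST_TYPE_COPY      // copy only
    };

    ComPtr<ID3D12CommandQueue> queues[3];
    for (int i = 0; i < 3; ++i)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = types[i];
        if (FAILED(device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queues[i]))))
        {
            std::printf("Failed to create queue %d\n", i);
            return 1;
        }
    }

    std::printf("Created direct, compute, and copy queues.\n");
    return 0;
}
```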
 
The noise-to-signal ratio is higher than that. There are pairwise comparisons between AMD and Nvidia GPUs, but they avoid deriving any relative performance outside of that one comparison.
If the 980 is A, the 980 Ti is B, the 290 is C, the 390 is D, and Fury is E, there is an A to C to D to B to E chain where the sum total of the previews and their methods makes it look like A~B~C~D~E.
Never mind the CPU mix, which hops generations in the middle of previews and may or may not be overclocked.
 
Some (relatively) old hardware cannot take full advantage of asynchronous compute work running alongside graphics/raster work; that's a well-known fact. On those GPUs, async compute work is always serialized by the driver when graphics/raster work is in flight. But that's not such a big deal: Direct3D 12 is not only about asynchronous compute. Multi-engine support means async copy too (which should improve performance on all DX12 hardware, at least on PC), and D3D12 comes with other features like explicit multi-adapter support, ExecuteIndirect, HLSL 5.1, and many more.
Of course, if an application is highly intensive in both graphics and compute, hardware that takes advantage of asynchronous compute work will scale better.
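
As a rough illustration of the multi-engine point, here's a minimal sketch of cross-queue synchronization through a fence, so copy work on the copy engine can overlap other work and the graphics queue only waits where it has to. It assumes a device plus copy and direct queues created elsewhere (for example as in the earlier snippet); the function name is just illustrative.

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

// Copy work executed on the copy queue signals a fence; graphics work submitted to
// the direct queue after the Wait() will not start until the copies have retired.
void syncCopyToGraphics(ID3D12Device* device,
                        ID3D12CommandQueue* copyQueue,
                        ID3D12CommandQueue* directQueue,
                        UINT64 fenceValue)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // ...copy command lists would be executed on copyQueue before this point...
    copyQueue->Signal(fence.Get(), fenceValue);   // GPU-side signal when the copy engine gets here
    directQueue->Wait(fence.Get(), fenceValue);   // GPU-side wait; the CPU is never blocked here
}
```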
 
Was there a reason Intel GPUs were not tested? I suspect they would see similar gains to AMD's, but I'd like to see the numbers.
 