DX12 Performance Discussion And Analysis Thread

That's a lot of CPU usage:

[image: screenshot of heavy CPU usage across all cores]
Why waste 16 CPU cores to push draw calls (and determine visibility)? You can instead do the culling on GPU and perform a few ExecuteIndirects to draw the whole scene, saving you 15.9 CPU cores for tasks that are better suited for CPU :)
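For illustration, here is a minimal C++ sketch of what the submission side of such a GPU-driven pipeline could look like: a culling pass (not shown) writes the per-cluster draw arguments and a visible count, and a single ExecuteIndirect consumes them. The resource names and argument layout are purely illustrative, not Ashes' (or anyone's) actual code.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// One D3D12_DRAW_INDEXED_ARGUMENTS record per cluster; a GPU culling pass
// (not shown) compacts the visible ones into 'visibleClusterArgs' and writes
// the visible count into 'clusterCountBuffer'. All names here are hypothetical.
void DrawCulledScene(ID3D12Device* device,
                     ID3D12GraphicsCommandList* cmdList,
                     ID3D12Resource* visibleClusterArgs,  // argument buffer filled by the culling shader
                     ID3D12Resource* clusterCountBuffer,  // 4-byte visible-cluster count
                     UINT maxClusters)
{
    // Command signature: each indirect argument record is a plain indexed draw.
    // (In real code this would be created once at startup, not per frame.)
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs = &arg;

    ComPtr<ID3D12CommandSignature> cmdSig;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&cmdSig));

    // One call draws every visible cluster; the CPU never touches per-draw
    // data, so the CPU cost stays roughly constant regardless of scene size.
    cmdList->ExecuteIndirect(cmdSig.Get(), maxClusters,
                             visibleClusterArgs, 0,
                             clusterCountBuffer, 0);
}
```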
 
Why waste 16 CPU cores to push draw calls (and determine visibility)? You can instead do the culling on GPU and perform a few ExecuteIndirects to draw the whole scene, saving you 15.9 CPU cores for tasks that are better suited for CPU :)
Isn't the game just filling the CPU with AI/physics calculations, leaving a minimal amount of CPU time for graphics?
 
Was there a reason Intel GPUs were not tested? I suspect they would see similar gains to AMD, but I'd like to see the numbers.

From the article:

despite our best efforts, the benchmark won't run on the integrated GPU of a shiny new Intel Skylake Core i7-6700K. That's particularly disappointing, because it would have been interesting to see if the performance uplift from DX12 was enough to make Intel's integrated GPUs […]
 
Why waste 16 CPU cores to push draw calls (and determine visibility)? You can instead do the culling on GPU and perform a few ExecuteIndirects to draw the whole scene, saving you 15.9 CPU cores for tasks that are better suited for CPU :)
Sebbbi, quick question regarding D3D12 and explicit multi-adapter... can you use one GPU to do culling and indirect-execution buffer generation for another GPU?
 
+1 to test Intel GPUs with Ashes, especially someone with a Skylake HD 530.
Also interested to see the 3DMark API overhead D3D12 test on the new Gen9 Intel HD 530... but I can't find any review with those results. Anyone?
 
Sebbbi, quick question regarding D3D12 and explicit multi-adapter... can you use one GPU to do culling and indirect-execution buffer generation for another GPU?
In our case the visible cluster buffer is just a regular append buffer. The culler appends a single 32 bit integer to the buffer for each visible cluster (cluster = 64 vertices). You can copy this append buffer from one GPU to other just like any resource. So you could do the culling on integrated GPU and rendering on discrete.
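To make the idea concrete, here is a hedged C++ sketch of how the cross-GPU hand-off could look. It assumes a buffer placed in a cross-adapter shared heap; the names, offsets and synchronization details are illustrative only, not how any particular engine actually does it.

```cpp
#include <windows.h>
#include <d3d12.h>

// After the culling dispatch on GPU A (e.g. the integrated GPU), copy the
// append buffer (one 32-bit cluster index per visible cluster) plus its UAV
// counter into a buffer placed in a D3D12_HEAP_FLAG_SHARED_CROSS_ADAPTER heap.
// Resource names are hypothetical; barriers and error handling are omitted.
void PublishVisibleClusters(ID3D12GraphicsCommandList* cullCmdList,
                            ID3D12Resource* visibleClusters,  // append buffer written by the culling shader
                            ID3D12Resource* clusterCounter,   // UAV counter resource
                            ID3D12Resource* sharedClusters,   // cross-adapter shared buffer
                            UINT64 maxClusterBytes)
{
    // Cluster indices first, then the 4-byte visible count right after them.
    cullCmdList->CopyBufferRegion(sharedClusters, 0, visibleClusters, 0, maxClusterBytes);
    cullCmdList->CopyBufferRegion(sharedClusters, maxClusterBytes, clusterCounter, 0, sizeof(UINT));
}

// GPU B opens the same heap via CreateSharedHandle/OpenSharedHandle, places a
// buffer at the same offset, waits on a cross-adapter fence for the copy, and
// then feeds the cluster list and count straight into its ExecuteIndirect.
```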
 
+1 to test Intel GPUs with Ashes, especially someone with a Skylake HD 530.
I read somewhere that Ashes doesn't work yet with HD 530. It would be interesting to see the results, especially on a GT4e laptop with limited TDP. Current desktop Skylakes with low end GT2 graphics should be 100% GPU bound. DX12 shouldn't improve things much, unless Ashes uses async compute or some other new DX12 features that improve GPU utilization.
 
I read somewhere that Ashes doesn't work yet with HD 530.

It was in the Ars Technica review.

It would be interesting to see the results, especially on a GT4e laptop with limited TDP. Current desktop Skylakes with low end GT2 graphics should be 100% GPU bound. DX12 shouldn't improve things much, unless Ashes uses async compute or some other new DX12 features that improve GPU utilization.

In the Ars Technica review the scores are pretty much identical across the board whether they use a 4-core Haswell without HT or a 6-core Haswell with HT, which suggests that even on a 980 Ti and a 290X, whether they are using DX12 or DX11, the game is GPU bound - unless it can't take advantage of more than 4 cores, that is.

But the crazy thing is, the 290X still gets a huge leap when using DX12 as opposed to DX11. So that suggests a GPU limitation is being freed up by DX12. As you say, maybe async compute? I find it hard to imagine that GCN would get such a huge boost from async compute compared to Maxwell though, but perhaps it's a bug with Maxwell's implementation (driver or hardware)?
 
But the crazy thing is, the 290X still gets a huge leap when using DX12 as opposed to DX11. So that suggests a GPU limitation is being freed up by DX12. As you say, maybe async compute? I find it hard to imagine that GCN would get such a huge boost from async compute compared to Maxwell though, but perhaps it's a bug with Maxwell's implementation (driver or hardware)?
Have we considered more trivial explanations ahead of the fancier ones? I.e., if we compare the 290X and the 980 Ti pound for pound (i.e. spec for spec), it appears that moving to DX12, in Ashes, allows the former to perform more in line with (some of) its theoreticals (e.g. slightly less ALU, slightly more bandwidth, etc.).
 
As you say, maybe async compute? I find it hard to imagine that GCN would get such a huge boost from async compute compared to Maxwell though, but perhaps it's a bug with Maxwell's implementation (driver or hardware)?
I don't know anything about Maxwell's async compute implementation, but I know that GCN gets huge benefits from it. It is too early to speculate, since we don't even know whether Ashes of the Singularity uses async compute or not. If they use async compute, it might be that AMD is the only vendor that has implemented it in the drivers currently.

If I had time, I would write a DX12 microbenchmark at home (to see how well all the DX12 GPUs perform async compute, ExecuteIndirect and other new features)... but we have a newborn baby at home, taking all my free time :)
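As a rough idea of what the setup for such a microbenchmark might look like (a sketch only, not anyone's actual implementation), here is the C++ queue creation one would time against a serial run; the PSOs, command lists and test workloads are assumed to exist elsewhere.

```cpp
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create the two queues the benchmark would time: graphics work on the direct
// queue, the test compute workload on a dedicated compute queue. If the driver
// and hardware overlap them (async compute), the combined run should finish in
// clearly less time than running the two workloads back to back on one queue.
void CreateBenchmarkQueues(ID3D12Device* device,
                           ComPtr<ID3D12CommandQueue>& directQueue,
                           ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + everything else
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&directQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute-only queue
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```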
 
Have we considered more trivial explanations ahead of the fancier ones? I.e., if we compare the 290X and the 980 Ti pound for pound (i.e. spec for spec), it appears that moving to DX12, in Ashes, allows the former to perform more in line with (some of) its theoreticals (e.g. slightly less ALU, slightly more bandwidth, etc.).

That would certainly be one hell of a result for AMD and would probably benefit PC gaming as a whole, given the increased competition it would bring. I can't say I'm particularly optimistic about that option though, especially as we didn't see a similar "unleashing of potential" with Mantle, which you would assume would be even more likely to achieve that result.
 
That would certainly be one hell of a result for AMD and would probably benefit PC gaming as a whole, given the increased competition it would bring. I can't say I'm particularly optimistic about that option though, especially as we didn't see a similar "unleashing of potential" with Mantle, which you would assume would be even more likely to achieve that result.
I would not necessarily take the Mantle experiments to mean much (now that we've gotten over the "it's going to change the world" phase). Granted, I'd like to underline that drawing many conclusions from Ashes is, IMHO, unwise, as it's still rather early days. Having said that, it does not appear to me that simply performing somewhat closer to what the hardware specifications would suggest is such an otherworldly win. I also would not necessarily take it as a strong indication of the future, as there's room in DX12 for unmatchable investments in one's driver and developer outreach to act as the key differentiator.
 
It was in the Ars Technica review.
In the Ars Technica review the scores are pretty much identical across the board whether they use a 4-core Haswell without HT or a 6-core Haswell with HT, which suggests that even on a 980 Ti and a 290X, whether they are using DX12 or DX11, the game is GPU bound - unless it can't take advantage of more than 4 cores, that is.
PC Perspective's results show little difference for Intel when going from 4 to 8 cores, although that comparison mixes CPU architectures.
It takes an i3 or an AMD chip to tank the throughput. Within AMD's CPU range, going from 6 to 8 cores is not a major performance change.

As for being GPU bound?
A= 980
B= 980 Ti
C= 290
D= 390
E = Fury X

B to E
http://www.extremetech.com/gaming/2...-singularity-amd-and-nvidia-go-head-to-head/2
A to D
http://www.pcper.com/reviews/Graphi...ted-Ashes-Singularity-Benchmark/Results-Avera
C to B
http://arstechnica.com/gaming/2015/...ly-win-for-amd-and-disappointment-for-nvidia/

There are two notable performance tiers for each IHV.

So, given the lovely way the tech press, the IHVs, and Oxide have handled a very immature platform:
A is on par to D
B is on par to E
C is on par to B

I don't have a direct comparison that can fully close the loop. There are frame-rate numbers that give rough equivalences, although relying on them is risky given how noisy the set is.
However, even without the numbers, assume that the 290 <= the 390 and the 980 <= the 980 Ti.
The 290 is bracketed as being less than or equal to the 390, yet on par with the 980 Ti. That means there's no room for the <, and everything comes out the same. Either these chips are all the same, or there's plenty of room for crap in these results.
It is not a clear sign that we're getting what we should out of these GPUs, or out of the preview methods.

But the crazy thing is, the 290X still gets a huge leap when using DX12 as opposed to DX11.
The clearest constant among the previews is that AMD's DX11 implementation is inferior.
To me, this looks like one of the least crazy things about how the performance testing has been handled across all these sites.
 
I also would not necessarily take it as a strong indication of the future, as there's room in DX12 for unmatchable investments in one's driver and developer outreach to act as the key differentiator.
There's going to be an element of baseline coding/targeting that favours the architectures featured in the consoles, given that consoles already employ a software model akin to this, so that is pretty much what one will get.
 
An interesting hot-topic discussion about DX12 performance here:

http://www.overclock.net/t/1569897/...singularity-dx12-benchmarks/490#post_24325434

Apparently, ISV code will play a particularly important role in shaping the performance profile for specific architectures, not simply middleware-type add-on features.


Yes, this was and probably always will be the case: different architectures have affinities for how code is written. This is why GameWorks and TressFX will always work better on their respective IHV's hardware, unless the developer helps with the other IHV's path. DX12 doesn't solve this; no API really will.
 
The thread starter thinks that ROPs control front-end and tessellation performance, and that Fiji not improving them over Hawaii is the reason it isn't doing that much better.

http://www.overclock.net/t/1569897/...singularity-dx12-benchmarks/400#post_24321843

And his 'analysis' of the hardware, which has only come into prominence now, is supposedly all the rage right now.

The antagonist quoted above doesn't know that the CUDA miner was there before the OpenCL one for AMD.

The whole thing was a bit funny, like all OCN threads turn into, before it started being plastered everywhere. :-|
 