DX12 Performance Discussion And Analysis Thread

Indeed. As I read the Anandtech piece, there's not enough information to tell what those numbers mean in terms of async compute.

A GPU showing 5+1 numbers (graphics+compute) could be no faster overall than one showing 6+6.

(The first one would have been faster without the compute task, though. Which one would be faster on the compute task alone is not clear.)
 
The bench run is rather disappointing. It's just a fly-by sequence, mostly stressing asset streaming for static objects.
 
Sorry, but I do not believe reviews and "tech" sites anymore. I want to see GPUView graphs or a technical analysis/presentation of the rendering algorithm (what do they use, for example, for dynamic lighting? Tiled deferred, tiled forward, a clustered algorithm, pure deferred, linked light lists...?). They can write whatever they want, but I don't get any valuable data about how efficiently the hardware handles certain features and algorithms...
I want to see an updated UE with no more preprocessor errors noting the lack of an implementation. And I want to see working drivers.
All the rest, "manual resource barrier tracking and explicit memory management to help achieve maximum performance across a wide range of CPU and GPU hardware", is just the mandatory usage of the D3D12 API, so nothing special.
 
The bench run is rather disappointing. It's just a fly-by sequence, mostly stressing asset streaming for static objects.
Most games are composed almost entirely of static objects, so that's not a big deal. The big deal is to see what happens to the hardware while the code is running, i.e. how many stalls there are, how long they are, how they vary across different hardware and why (driver or application code).
It would also be interesting to see how different hardware handles async copy operations.
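For what it's worth, a minimal sketch of the API side of that, written as a hypothetical helper (error handling omitted); whether the copies actually overlap with graphics work depends on the hardware's copy/DMA engines:

Code:
#include <d3d12.h>
#include <wrl/client.h>

// Hypothetical helper: create a copy-only queue. On hardware with independent
// copy engines, uploads executed on it can overlap with work on the direct queue.
Microsoft::WRL::ComPtr<ID3D12CommandQueue> CreateCopyQueue(ID3D12Device* device)
{
    D3D12_COMMAND_QUEUE_DESC copyDesc = {};
    copyDesc.Type = D3D12_COMMAND_LIST_TYPE_COPY;   // copy queue type
    Microsoft::WRL::ComPtr<ID3D12CommandQueue> copyQueue;
    device->CreateCommandQueue(&copyDesc, IID_PPV_ARGS(&copyQueue));
    return copyQueue;
}

// Usage: record CopyBufferRegion / CopyTextureRegion calls into a COPY-type
// command list, execute it on this queue, then Signal a fence that the graphics
// queue Waits on before it first reads the uploaded resources.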
 
Sorry, but I do not believe reviews and "tech" sites anymore. I want to see GPUView graphs or a technical analysis/presentation of the rendering algorithm (what do they use, for example, for dynamic lighting? Tiled deferred, tiled forward, a clustered algorithm, pure deferred, linked light lists...?). They can write whatever they want, but I don't get any valuable data about how efficiently the hardware handles certain features and algorithms...
I want to see an updated UE with no more preprocessor errors noting the lack of an implementation. And I want to see working drivers.
All the rest, "manual resource barrier tracking and explicit memory management to help achieve maximum performance across a wide range of CPU and GPU hardware", is just the mandatory usage of the D3D12 API, so nothing special.

That was stated by the developer:

Compute shader simulation and culling is the cost of our foliage physics sim, collision and also per-instance culling, all of which run on the GPU. Again, this work runs asynchronously on supporting hardware.

This wasn't written by the author of the article.

Info on the GI system Fable Legends uses; I think UE4 now uses it as its standard GI too.

http://www.lionhead.com/blog/2014/april/17/dynamic-global-illumination-in-fable-legends/
 
Asynchronously along with other compute work, or asynchronously along with graphics work? Because the behaviour of the hardware would be different. This is not currently implemented in UE.
Good to know they do not use NVIDIA PhysX for foliage... Last time I checked there were different GI paths in development for UE, but maybe things have changed (sorry, I do not check the entire repo at every update; it is millions of lines of code). You see, there are tons of unanswered questions.

And if we are back to the benchmark graphs again, it is clearly strange that certain hardware with the same architecture (and generation) performs very differently on certain rendering features.
 
That really isn't async; async is only when using a graphics pipeline and a compute pipeline and combining the compute with the graphics (filling in when resources are available). Unless I'm reading it wrong, lol, I don't think I am.

Otherwise they wouldn't call it async; they would call it order-independent or out-of-order instructions.
 
Async work means that you submit work to be executed without explicit synchronization, because the workloads rely on different write resources (if there is hardware support, of course). With D3D12 you can have multiple compute queues and submit different workloads to run concurrently (this should work on most hardware, which is not the case for graphics + compute). They will execute in an asynchronous way. Of course you can synchronize everything if you need to manage write access to a resource.
Async work, or whatever fancy name you want to use, is not AMD vs NV propaganda; it's just a general concept of the API, and it applies to graphics + copy, copy + copy, graphics + compute, compute + compute and compute + copy (did I forget any use case?). And I bet Maxwell 2.0 will work better than all GCN parts on asynchronous (FP32, of course) compute work alone.
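A minimal sketch of what that looks like at the API level (the device, command lists and names here are hypothetical placeholders, error handling is omitted, and in a real application the queues would be created once at startup): work submitted to independent queues carries no implicit ordering, and a fence is only needed once one queue writes a resource another one reads.

Code:
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: submit two pre-recorded, independent command lists to
// separate queues. No cross-queue fence is used because they write different resources.
void SubmitIndependentWork(ID3D12Device* device,
                           ID3D12GraphicsCommandList* gfxList,
                           ID3D12GraphicsCommandList* computeList)
{
    // DIRECT queues accept graphics + compute + copy; COMPUTE queues accept compute + copy.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Two submissions with no ordering between them: asynchronous by construction.
    ID3D12CommandList* gfx[]  = { gfxList };
    ID3D12CommandList* comp[] = { computeList };
    gfxQueue->ExecuteCommandLists(1, gfx);
    computeQueue->ExecuteCommandLists(1, comp);

    // Only if the graphics pass read the compute output would you add something like:
    //   computeQueue->Signal(fence, ++fenceValue);
    //   gfxQueue->Wait(fence, fenceValue);
}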
 
[Charts: Fable Legends FPS results (fable4k-fps.gif, fable-fps.gif)]


The GeForce cards perform well generally, in spite of this game's apparent use of asynchronous compute shaders. Cards based on AMD's Hawaii chips look relatively strong here, too, and they kind of embarrass the Fiji-based R9 Fury offerings by getting a little too close for comfort, even in 4K. One would hope for a stronger showing from the Fury and Fury X in this case.

But, you know, it's just one benchmark based on an unreleased game, so it's nothing to get too worked up about one way or another. I do wish we could have tested DX12 versus DX11, but the application Microsoft provided only works in DX12. We'll have to grab a copy of Fable Legends once the game is ready for public consumption and try some side-by-side comparisons.
http://techreport.com/review/29090/fable-legends-directx-12-performance-revealed
 
Async work means that you submit work to be executed without explicit synchronization, because the workloads rely on different write resources. With D3D12 you can have multiple compute queues and submit different workloads to run concurrently. They will execute in an asynchronous way. Async shaders, or whatever fancy name you want to use, is not AMD vs NV propaganda. And I bet Maxwell 2.0 will work better than all GCN parts on asynchronous (FP32, of course) compute work alone.


Well, I understand that, but mixing different queues isn't really out-of-order instructions; it's different. Even Kepler had out-of-order compute shader capabilities, and I think Fermi might have too, I don't remember, but those two couldn't mix two different types of pipeline instructions together.
 
Yes. The big deal is how hardware behaves in different use cases. If an application is heavily compute-shader intensive, I would care more about the execution of different compute workloads (and this should work even on Haswell and Kepler IIRC; I don't remember about Fermi, which still lacks WDDM 2.0 drivers) than about compute + graphics. On the other hand, if the application is still heavily bound by "traditional" graphics work, I would be more interested in graphics + compute execution.
 
Nvidia Inspector shows only a bunch of unidentified entries in the profile. No standard settings are enabled (SLI or AA bits).
 
I have just seen the WCCF articles, and what I see is that not one site has the same results; I will try to look deeper when I can.

Edit: OK, I understand now... the reviewers don't use the same GPU: an Asus Strix 980 Ti in one, a Gigabyte 980 OC in a second, a stock 980 in another, with clock differences going from 100 to 200 MHz... well, etc. etc.
 
Agreed, something seems to be pulling Fiji back, mainly because of the short distance between it and Hawaii.
On the other hand, the 390 seems to be head and shoulders above the GTX 980 now.

ExtremeTech is getting different results from Anandtech regarding the comparison between the Fury X and the 980 Ti. The only discernible difference I see is that they're using Haswell-E with faster DDR4 memory:

[Benchmark comparison charts]

Those results are AMD PR-provided results, not results compiled by ExtremeTech.

In our initial coverage for this article, we included a set of AMD-provided test results. This was mostly done for practical reasons — I don’t actually have an R9 390X, 390, or R9 380, and therefore couldn’t compare performance in the midrange graphics stack. Our decision to include this information “shocked” Nvidia’s PR team, which pointed out that no other reviewer had found the R9 390 winning past the GTX 980.
 
"It runs asynchronously on supporting hardware"!

The fact that they're measuring how fast certain code (the compute shaders) is running does not tell you that it's running asynchronously with the graphics pipeline. You could put a Kepler chip running that benchmark and it would still tell you how long the compute shaders took to run.

It's even possible that the compute shaders are running faster on nVidia hardware because they're not running asynchronously. If all of the GPU's compute resources are dedicated to the compute tasks, there's a good chance they will take less time.
It is absolutely running asynchronously. The question is whether it is running CONCURRENTLY.

Please, Please, PLEASE pay attention to the words you use in technical discussion.
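To make the distinction concrete, here is a rough illustration (reusing the hypothetical gfxQueue/computeQueue/fence names from the earlier sketch): submission is asynchronous by construction, because ExecuteCommandLists returns immediately and nothing orders the two queues against each other, but whether the GPU actually executes the two workloads concurrently is a hardware/driver property that only something like a GPUView capture will show.

Code:
// Asynchronous: neither call blocks the CPU, and no fence orders the two
// queues against each other.
computeQueue->ExecuteCommandLists(1, comp);
gfxQueue->ExecuteCommandLists(1, gfx);

// Concurrent or not: invisible at this level. The hardware may overlap the two
// workloads or serialize them; a GPUView capture shows which actually happened.

// Waiting is a separate, explicit decision, e.g. for the CPU side:
//   computeQueue->Signal(fence.Get(), ++fenceValue);
//   fence->SetEventOnCompletion(fenceValue, fenceEvent);
//   WaitForSingleObject(fenceEvent, INFINITE);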
 