DX12 Performance Discussion And Analysis Thread

To be honest: I'm not sure. I'm not familiar enough with how the software stack is structured to give a proper explanation.
All I did understand from the explanation given to me is that the scheduler is in fact part of the OS.
Kernel mode or part of the user-space runtime? No clue, though kernel mode appears likely, since it's also responsible for scheduling concurrent execution of multiple 3D-accelerated applications. It is definitely not part of the driver, nor in any way exposed to it.
On hardware not supporting multiple queues of any of the 3 types, it performs a transparent mapping, both from the perspective of the application and the driver.

Does this mesh with Futuremark's description of its multi-queue process?

http://www.futuremark.com/pressreleases/a-closer-look-at-asynchronous-compute-in-3dmark-time-spy

Unlike the Draw/Dispatch calls in DirectX 11 (with immediate context), in DirectX 12, the recording and execution of command lists are decoupled operations. This means that recording can and does happen as soon as it has all available information and there is no thread limitation on it.

For GPU work to happen, command lists are executed on queues, which come in variants of DIRECT (commonly known as graphics), COMPUTE and COPY. Submission of a command list to a queue can happen on any thread. The D3D runtime serializes and orders the lists within a queue.

Once initiated, multiple queues can execute in parallel. But it is entirely up to the driver and the hardware to decide how to execute the command lists - the game or application cannot affect this decision with the DirectX 12 API.
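To put the quoted description in concrete terms, here is a minimal D3D12 sketch (illustrative only, not Futuremark's code): one DIRECT and one COMPUTE queue are created, command lists are recorded up front, and ExecuteCommandLists hands them to the runtime. The runtime orders the lists within each queue; whether the two queues actually overlap on the GPU is left to the driver and hardware. It assumes `device` is a valid ID3D12Device* and omits all error handling.

Code:
// Illustrative sketch only: one DIRECT (graphics) and one COMPUTE queue,
// command lists recorded up front, execution handed to the queues.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWork(ID3D12Device* device)
{
    // Queues come in DIRECT (graphics), COMPUTE and COPY variants.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Recording is decoupled from execution: each list could just as well be
    // built on its own thread as soon as the data it needs is available.
    ComPtr<ID3D12CommandAllocator> gfxAlloc, compAlloc;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&gfxAlloc));
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE, IID_PPV_ARGS(&compAlloc));

    ComPtr<ID3D12GraphicsCommandList> gfxList, compList;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, gfxAlloc.Get(), nullptr, IID_PPV_ARGS(&gfxList));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE, compAlloc.Get(), nullptr, IID_PPV_ARGS(&compList));

    // ... record draw calls into gfxList and dispatches into compList here ...
    gfxList->Close();
    compList->Close();

    // Execution: the runtime serialises lists within a queue; how the two
    // queues are interleaved on the GPU is up to the driver and hardware.
    ID3D12CommandList* gfxLists[]     = { gfxList.Get() };
    ID3D12CommandList* computeLists[] = { compList.Get() };
    gfxQueue->ExecuteCommandLists(1, gfxLists);
    computeQueue->ExecuteCommandLists(1, computeLists);
}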
 
Does this mesh with Futuremark's description of its multi-queue process?

http://www.futuremark.com/pressreleases/a-closer-look-at-asynchronous-compute-in-3dmark-time-spy
Almost. But I think the PR folks at Futuremark fell into the same trap we did:
As soon as the driver reports "I can't do any more queues of this type" (simplified; I don't know whether this is queried at device initialization or at run time), the compatibility layer takes over. It's not even within the driver's control to influence scheduling.
 
Almost. But I think the PR folks at Futuremark fell into the same trap we did:
As soon as the driver reports "I can't do any more queues of this type" (simplified; I don't know whether this is queried at device initialization or at run time), the compatibility layer takes over. It's not even within the driver's control to influence scheduling.

I am not sure Futuremark is the one most likely to be mistaken.
The press release itself explicitly discusses the role the OS plays with the CPU task scheduler, and then gives the analogous role for graphics to the driver.

Would it be possible to clarify what the term "scheduling" refers to? Is it the specific order in which commands issue on the GPU, which Futuremark describes as the purview of the driver and hardware? Futuremark disavows any ability on the part of the application to "schedule" command list execution, for whatever meaning they are using.
What obligation does the OS have to step in and emulate the driver if the driver cannot perform its function of tracking its queues? What is the emulation layer going to talk to as an intermediary between the OS and the hardware if the driver has fallen down?
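For what it's worth, the only cross-queue ordering control D3D12 gives the application is explicit synchronisation through fences; beyond that, how the driver and hardware interleave the two queues' work is out of the application's hands. A rough sketch of that mechanism (illustrative only; error handling omitted):

Code:
// Illustrative only: express a dependency between two queues with a fence.
// The application can say "graphics must not start until this compute work
// has finished", but it cannot dictate how the remaining work is scheduled.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitWithDependency(ID3D12Device* device,
                          ID3D12CommandQueue* computeQueue,
                          ID3D12CommandQueue* gfxQueue,
                          ID3D12CommandList* const* computeLists, UINT numCompute,
                          ID3D12CommandList* const* gfxLists, UINT numGfx)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Compute queue: run the async work, then signal the fence with value 1.
    computeQueue->ExecuteCommandLists(numCompute, computeLists);
    computeQueue->Signal(fence.Get(), 1);

    // Graphics queue: wait on the GPU timeline until the fence reaches 1
    // before running the lists that consume the compute results.
    gfxQueue->Wait(fence.Get(), 1);
    gfxQueue->ExecuteCommandLists(numGfx, gfxLists);
}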
 
Since the new driver branch (21.19.XXX.XX), AMD has disabled async compute on the old GCN architecture, artificially making the newer GCN architectures look shinier.

I started to suspect something was wrong when Nixxes (which ported the PC version of Rise of the Tomb Raider) stated that they enable async compute on GCN 1.1 and above.

Someone asked AMD about that, and about why GCN 1.0 can't run Total War: Warhammer in DirectX 12 mode. The answer? Ask the devs...

https://community.amd.com/thread/202794

AMD drivers 15.7.1 and others older than 16.9.2:

Compute only: 49.01 ms
Graphics only: 47.38 ms (35.41 Gpixels/s)
Graphics + compute: 49.12 ms (34.16 Gpixels/s)

Async compute works!

AMD drivers newer than 16.9.2:

Compute only: 45.25 ms
Graphics only: 47.14 ms (35.59 Gpixels/s)
Graphics + compute: 92.39 ms (18.16 Gpixels/s)

Async compute broken!
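A note on reading these numbers (my interpretation, not the original poster's wording): if the two workloads really overlap, the combined pass should cost roughly the longer of the two individual passes; if the driver serialises the queues, it costs roughly their sum, which is what the post-16.9.2 result looks like. A trivial sketch of that check, with the values hard-coded from the post above:

Code:
// Rough interpretation of the numbers above (values copied from the post).
#include <algorithm>
#include <cstdio>

int main()
{
    const double computeOnly  = 45.25;  // ms, driver newer than 16.9.2
    const double graphicsOnly = 47.14;  // ms
    const double combined     = 92.39;  // ms, measured graphics + compute

    const double ifOverlapped = std::max(computeOnly, graphicsOnly); // ~47 ms if async compute works
    const double ifSerialised = computeOnly + graphicsOnly;          // ~92 ms if the queues are serialised

    std::printf("expected if overlapped: %.2f ms, if serialised: %.2f ms, measured: %.2f ms\n",
                ifOverlapped, ifSerialised, combined);
    return 0;
}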

 

Attachments

  • perf R9 280X OC 15.7.1-AsyncComputeV2.zip
  • perf R9 280X OC 16.11.4-AsyncComputeV2.zip
I guess it's a sign that GCN 1.0 GPUs are starting to be phased out in regards to driver optimizations...

Truth be told, those GPUs will be turning 5 years in a month.
 
I guess it's a sign that GCN 1.0 GPUs are starting to be phased out in regards to driver optimizations...

Truth be told, those GPUs will be turning 5 years in a month.
Radeon 7970 was quite popular. I still use one frequently for testing. 7970 GE is very close to RX 470 in compute performance. Geometry performance is of course much worse, but our game doesn't render triangles. I wonder whether Vulkan async compute still works on GCN 1.0. AMD themselves recommend using just one compute queue (on PC), so GCN 1.0's queue count isn't the limiting factor here. IIRC somebody in B3D forums said that GCN 1.0 can't run the same microcode for ACEs as there's not enough room. Maybe there are load-balancing issues or some other bugs and the new code just doesn't fit on GCN 1.0. This is very unfortunate if true, since the 7970 is still widely used. Async compute would have extended its lifetime a bit, especially in compute-heavy games like ours.

Update: GCN 1.0 is also used in products such as the Radeon R9 270, 270X, 280, 280X, 370 and 370X. That's plenty of relevant GPUs.
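On the Vulkan question above: a quick way to see at least what the driver exposes on a given card is to enumerate the queue families and check whether a compute-capable family is reported alongside the graphics one. A minimal sketch (illustrative only; it shows what the driver advertises, not whether the work actually overlaps on the GPU):

Code:
// List the Vulkan queue families each GPU exposes. A family with the compute
// bit but not the graphics bit is the dedicated async compute path.
// Error handling omitted.
#include <vulkan/vulkan.h>
#include <cstdio>
#include <vector>

int main()
{
    VkInstanceCreateInfo ici = { VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO };
    VkInstance instance = VK_NULL_HANDLE;
    vkCreateInstance(&ici, nullptr, &instance);

    uint32_t gpuCount = 0;
    vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
    std::vector<VkPhysicalDevice> gpus(gpuCount);
    vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

    for (VkPhysicalDevice gpu : gpus)
    {
        uint32_t familyCount = 0;
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, nullptr);
        std::vector<VkQueueFamilyProperties> families(familyCount);
        vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, families.data());

        for (uint32_t i = 0; i < familyCount; ++i)
        {
            const VkQueueFlags f = families[i].queueFlags;
            std::printf("family %u: graphics=%d compute=%d queues=%u\n", i,
                        (f & VK_QUEUE_GRAPHICS_BIT) != 0,
                        (f & VK_QUEUE_COMPUTE_BIT) != 0,
                        families[i].queueCount);
        }
    }
    return 0;
}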
 
I'm not suggesting AMD dropping optimizations for GCN 1.0 cards is a good thing, on the contrary. But you have to wonder how much longer the current, resource-limited AMD can keep investing time and money in optimizing drivers for GPUs that were sold 3-5 years ago. There is a real driver effort happening at AMD, but they do have to choose their battles wisely.

Tahiti and Pitcairn's longevity is really the odd duck here. And it's that same odd longevity that still makes them relevant GPUs.
For example, who exactly is still very worried about the performance optimizations on 1st-gen Kepler cards like the GTX 680 or the GTX 660 (both were the original competitors to Tahiti and Pitcairn, respectively)?
 
Fortunately, with the coming of low-level APIs, AMD won't have to spend that much effort on drivers anymore, and it will at last reveal each vendor's hardware quality.
 
I'm not quite sure it's related to "compute" as used in this benchmark, but OpenCL compute is completely broken on the latest AMD drivers anyway. It can be shown by using Cycles (Blender's internal renderer) or LuxRender (many problems with the viewport render and when doing multiple renders in a row). So we have had to advise people to revert to older drivers (I use 2x 7970, but the problem seems to happen on some newer series as well).

(Well, we imagined it was based on OpenCL, but it could be something else.)
 
GCN 1.0 still accounts for a large user base within AMD's own market share, not without the contribution of the Bitcoin mining boom a few years ago, which caused a large circulation of second-hand boards to flood the market.
I can only imagine how much AMD would wish to flush out this lump of first-gen GCN.
 
I'm not suggesting AMD dropping optimizations for GCN 1.0 cards is a good thing, on the contrary. But you have to wonder how much longer the current, resource-limited AMD can keep investing time and money in optimizing drivers for GPUs that were sold 3-5 years ago. There is a real driver effort happening at AMD, but they do have to choose their battles wisely.
Many stores still sell brand new R9 370X today. It is a GCN 1.0 part. You expect full support when you buy a new (1 gen old) card.
 
Many stores still sell brand new R9 370X today. It is a GCN 1.0 part. You expect full support when you buy a new (1 gen old) card.


Pitcairn cards should have been completely replaced by Polaris 11 models by now, but let's all agree on one thing here: keeping Pitcairn in the 300 series was really stretching the GPU's life a lot beyond what it should have been. I don't know if there ever was another GCN GPU planned for that performance bracket. Maybe there was one for 20 nm, but since both major IHVs decided to can all their 20 nm solutions, a Pitcairn successor may simply have been left by the wayside.

That said, it's not like any of these games have stopped working on GCN 1.0. It's just that AMD seems to have stopped dedicating the same amount of time to optimizing driver code for games on 1st-gen GCN GPUs.
Will the 370X-equipped end user ever notice any difference, even in TW: Warhammer, with the new driver?
As seen before, Creative Assembly's DX12 implementation in TW: Warhammer didn't exactly break any records in performance boosts. Like most other implementations so far, it seems to be a port made to ramp up QA and customer feedback for future titles.
Which brings us back to AMD having to choose their battles carefully. With the Vega cards, and also Zen APUs with Vega iGPUs, coming up, someone at AMD may have decided it was time to allocate their efforts towards those, and the trade-off was to stop supporting all the latest features on cards using their old chips.

I too wish all features and full potential performance could be squeezed out of older cards for many, many years, but AMD is choking and the last thing they want is a Vega family launch with poorly developed drivers.
 
GCN 1.0 Tahiti (7900s, 8900s, 280s) is still such a lovely piece of silicon for FP64 :D

Hope AMD does not kill GCN 1.0 support soon; I have plans for my previous GPU ( ͡° ͜ʖ ͡°)
 
Many stores still sell brand new R9 370X today. It is a GCN 1.0 part. You expect full support when you buy a new (1 gen old) card.
I think that was expected; after all, it happened before (i.e. Mantle games not working on newer hardware). When you go lower level, you run the risk of introducing incompatibilities or dividing optimization efforts across too many fronts. I think developers themselves don't bother optimizing async compute for GCN 1.0. Restricted to Windows 10, the DX12 install base is still relatively small compared to DX11 and DX10; add to that people with AMD GPUs and the number gets smaller (as a result of their smaller market share), and it gets even smaller when you add GCN 1 GPUs to the mix. DX12 is hard enough work for developers already, and perhaps their code doesn't offer that much impact on GCN 1 anyway, so they choose to put their effort where it matters the most.
 
Just remember that, from a developer's point of view, GCN 1 vs GCN 2 is all about multi-sampling (well, min/max filters) on tiled resources and nothing more... But I guess AMD will not add optional SM 6.0 support.
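If anyone wants to check that difference programmatically, the tiled-resources tier is one of the caps reported through CheckFeatureSupport, and if I remember right, the min/max filtering on tiled resources mentioned above comes with the higher tier. A rough sketch (assumes `device` is a valid ID3D12Device*):

Code:
// Sketch: query the tiled-resources tier the driver reports.
// The developer-visible GCN 1 vs GCN 2 gap mentioned above shows up here.
#include <d3d12.h>
#include <cstdio>

void PrintTiledResourcesTier(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                              &options, sizeof(options))))
    {
        std::printf("Tiled resources tier: %d\n",
                    static_cast<int>(options.TiledResourcesTier));
    }
}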
 
It is about the one-year anniversary of their previous canning of all DirectX 11 GPUs and APUs. Errr, I mean, moving them to "legacy" status. Time for another round of "retirements".
 
Just remember that, from a developer's point of view, GCN 1 vs GCN 2 is all about multi-sampling (well, min/max filters) on tiled resources and nothing more...

But for a consumer, it's the lack of FreeSync. And that's a huge deal IMO.
Not to mention:

IIRC somebody in B3D forums said that GCN 1.0 can't run the same microcode for ACEs as there's not enough room.
 
Async compute is not working anymore on Ashes of the Singularity; here are the results:

http://imgur.com/a/2sSJr

Driver 16.3.1, DirectX 12, async compute off:
http://i.imgur.com/aiV1pSg.png

Driver 16.3.1, DirectX 12, async compute on:
http://i.imgur.com/CGrb4yM.png

Driver newer than 16.9.2, DirectX 12, async compute off:
http://i.imgur.com/yiSSRCE.png

Driver newer than 16.9.2, DirectX 12, async compute on:
http://i.imgur.com/Fch5V8w.png


I don't have AotS; however, on my R9 280 (Tahiti, GCN 1) I got small performance gains running pure compute workloads on a separate compute queue concurrently with the default queue. So no, AMD didn't (completely, at least) break or deactivate anything.
It is probably just hardware discrimination done by AotS (a simple filter could be the AMD vendor ID + FL 12_0 support, which excludes all GCN 1 GPUs).
On the latest Unreal Engine version (4.14), "async compute" is still discriminated by the AMD vendor ID only (so they do not look for NVIDIA Pascal GPUs at all).
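A filter like that is trivial to write, which is probably why engines do it. A hypothetical sketch of the kind of check described above (not actual AotS or Unreal Engine code): gate async compute on the AMD vendor ID plus feature level 12_0, which GCN 1 parts do not report.

Code:
// Hypothetical sketch of the filter described above; illustration only.
#include <d3d12.h>
#include <dxgi.h>

bool AllowAsyncCompute(IDXGIAdapter1* adapter, ID3D12Device* device)
{
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);
    const bool isAmd = (desc.VendorId == 0x1002);   // AMD's PCI vendor ID

    const D3D_FEATURE_LEVEL requested[] = { D3D_FEATURE_LEVEL_11_0,
                                            D3D_FEATURE_LEVEL_11_1,
                                            D3D_FEATURE_LEVEL_12_0,
                                            D3D_FEATURE_LEVEL_12_1 };
    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels = static_cast<UINT>(sizeof(requested) / sizeof(requested[0]));
    levels.pFeatureLevelsRequested = requested;
    device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS, &levels, sizeof(levels));

    // FL 12_0 excludes GCN 1 parts, which only report FL 11_1 under D3D12.
    return isAmd && levels.MaxSupportedFeatureLevel >= D3D_FEATURE_LEVEL_12_0;
}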
 