Ok, so to summarize my thoughts:
In the asynccompute test, I believe the crashing behavior is just a driver bug, or something the driver was never designed to handle. It's seeing a heavy graphics workload (made worse by the fact that the compute is also in the graphics queue), but there's no graphics activity on the screen. And since it's running in a window, it has to compete with rendering the Windows desktop. In that single-command-list test you can see the time spent processing the DWM commands get longer and longer as it goes on, and there's a corresponding increase in CPU usage in csrss.exe - both things that are a tiny sliver when they're not run alongside the benchmark get stretched out to extraordinary lengths. It's almost as if the driver isn't able to properly preempt the benchmark to run the DWM, and it just burns away CPU cycles as it switches between the DWM and an ever-increasing test load. To me it doesn't look like this is really revealing anything about whether or not Maxwell supports async compute; it's either just a bug or a normal reaction to an abnormal load, one that GCN happens to handle more gracefully.
The primary thing that GPUView revealed is that GCN treats the compute portion of the test as compute, while Maxwell still treats it as graphics. This tells us either that A) Maxwell has a smaller or different set of operations that it considers compute, B) Maxwell needs to be addressed in a certain/different way for the driver to treat the work as compute, or C) it's another corner case or driver bug. And it's possible that whatever was happening in the Ashes benchmark that was causing the performance issues is the same thing that's happening here. But we've got enough examples of stuff actually using the compute queue, from CUDA to OpenCL, that the queue itself is absolutely functional.
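For reference, here's a minimal sketch (C++ / D3D12) of what submitting compute work on a dedicated COMPUTE-type queue looks like - the kind of submission that shows up as a separate compute queue on the CPU side in GPUView. This assumes the device, compute PSO, and root signature already exist (their creation is omitted); how the driver then maps that queue onto hardware queues is entirely up to the driver, which is exactly the GCN-vs-Maxwell difference in question.

```cpp
// Sketch: compute work on a dedicated COMPUTE-type queue, as opposed to
// recording Dispatch() calls on the graphics (DIRECT) queue.
// Assumes `device`, `computePso`, and `computeRs` were created elsewhere.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void SubmitComputeOnDedicatedQueue(ID3D12Device* device,
                                   ID3D12PipelineState* computePso,
                                   ID3D12RootSignature* computeRs)
{
    // A COMPUTE-type queue; this is what GPUView shows as a separate
    // compute queue in the CPU-side timeline. Whether the driver runs it
    // on a separate hardware queue is the driver's decision.
    D3D12_COMMAND_QUEUE_DESC qDesc = {};
    qDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&qDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12CommandAllocator> alloc;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE,
                                   IID_PPV_ARGS(&alloc));
    ComPtr<ID3D12GraphicsCommandList> list;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE,
                              alloc.Get(), computePso, IID_PPV_ARGS(&list));

    list->SetComputeRootSignature(computeRs);
    // ... bind UAVs/CBVs for the kernel here ...
    list->Dispatch(1024, 1, 1);   // arbitrary workload size
    list->Close();

    ID3D12CommandList* lists[] = { list.Get() };
    computeQueue->ExecuteCommandLists(1, lists);
    // Fence/wait omitted for brevity.
}
```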
So first we need to find some way to submit, in DX12, a concurrent graphics workload alongside a compute workload that Maxwell actually recognizes as compute, and see if there's any async behavior in that context. Unless the async test can be modified to do this, I think its utility has run its course, and it's revealed a lot along the way.
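Something along these lines is what I have in mind - purely a sketch, not tested: record one graphics command list and one compute command list (recording omitted here), kick them off on a DIRECT queue and a COMPUTE queue at the same time with no dependency between them, and compare the combined wall-clock time against running each one alone. If the combined time is close to the longer of the two, the workloads overlapped; if it's close to their sum, they were serialized.

```cpp
// Sketch: launch pre-recorded graphics and compute command lists on their
// own queues with no cross-queue dependency, and time how long both take
// to drain. Queue and command-list creation are assumed to happen elsewhere.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <chrono>
#include <cstdio>
using Microsoft::WRL::ComPtr;

double RunConcurrent(ID3D12Device* device,
                     ID3D12CommandQueue* graphicsQueue,   // DIRECT type
                     ID3D12CommandQueue* computeQueue,    // COMPUTE type
                     ID3D12CommandList* graphicsList,
                     ID3D12CommandList* computeList)
{
    ComPtr<ID3D12Fence> gfxFence, cmpFence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&gfxFence));
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&cmpFence));
    HANDLE evt = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    auto t0 = std::chrono::high_resolution_clock::now();

    // Submit both workloads with no dependency between them, so the
    // driver/hardware is free to run them concurrently if it can.
    graphicsQueue->ExecuteCommandLists(1, &graphicsList);
    computeQueue->ExecuteCommandLists(1, &computeList);
    graphicsQueue->Signal(gfxFence.Get(), 1);
    computeQueue->Signal(cmpFence.Get(), 1);

    // Wait for both queues to drain.
    if (gfxFence->GetCompletedValue() < 1) {
        gfxFence->SetEventOnCompletion(1, evt);
        WaitForSingleObject(evt, INFINITE);
    }
    if (cmpFence->GetCompletedValue() < 1) {
        cmpFence->SetEventOnCompletion(1, evt);
        WaitForSingleObject(evt, INFINITE);
    }
    CloseHandle(evt);

    auto t1 = std::chrono::high_resolution_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    // Roughly max(graphics-only, compute-only) => the workloads overlapped;
    // roughly their sum => they were serialized.
    std::printf("combined time: %.2f ms\n", ms);
    return ms;
}
```

CPU-side timing like this is crude; GPU timestamp queries or another GPUView capture would give a cleaner picture, but it should be enough to show whether anything overlaps at all.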
And then we need to figure out why the compute queue isn't being used in the current version of the test. If it's not being used for a legitimate reason, rather than because of a bug or programming error, then this is one of the many things that GCN can treat as compute that Maxwell can't. I can certainly believe GCN is more capable in this regard, but I still find it very difficult to believe that NVIDIA outright lied about Maxwell 2 and its async capabilities.