This then raises the question: under what circumstances would async shaders even be beneficial to Maxwell/Kepler, other than the obvious advantages for VR from fine-grained preemption?
Not only "would", but "are", as we can see from examples like (the canceled) Fable Legends.
IMHO there are cases in which the scheduler does a better job of keeping the render pipeline free of stalls than the developers at Lionhead Studios managed when attempting to do it manually / statically, using only the 3D queue.
Apart from that: None.
Should even out at ±0 with use of async shaders. Slight gains if the scheduler can avoid a stall, slight losses when the scheduler messes up. And that holds for every architecture that doesn't support parallel execution in hardware, not just Maxwell and Kepler.
Actually, there are two aspects in which the hardware can aid with / profit from async shaders. One is the obvious parallel execution of independent command lists; the other is hardware support for queue synchronization, which avoids the CPU round trip for scheduling. Both require support for multiple queues in hardware, but the two features can be provided independently.
The first theoretically brings increased resource utilization, at the possible risk of cache thrashing. The second greatly reduces the penalty for synchronization.
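To make the second point concrete, here's a minimal D3D12 sketch of what that queue-to-queue synchronization looks like from the API side. This is just my illustration (device/allocator/command-list setup is omitted and all names are placeholders, not anyone's actual engine code); the relevant bit is that the Signal/Wait pair is handed to the scheduler, and on hardware with multiple queues it can be resolved without ever bouncing through the CPU:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical helper: submit graphics and compute work on separate queues
// and synchronize them with a GPU-side fence wait.
void SubmitWithAsyncCompute(ID3D12Device* device,
                            ID3D12CommandQueue* gfxQueue,
                            ID3D12CommandQueue* computeQueue,
                            ID3D12CommandList* gfxList,
                            ID3D12CommandList* computeList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Independent work on both queues: on hardware with multiple front
    // ends this can actually run in parallel; on a single hardware queue
    // the driver/scheduler has to serialize it.
    ID3D12CommandList* gfxLists[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfxLists);

    ID3D12CommandList* computeLists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, computeLists);

    // Queue-to-queue synchronization: the graphics queue waits for the
    // compute fence value. With hardware queue support this wait is
    // consumed on the GPU, with no CPU round trip.
    computeQueue->Signal(fence.Get(), 1);
    gfxQueue->Wait(fence.Get(), 1);
    // ...subsequent gfxQueue submissions can consume the compute results.
}
```

Without that hardware support, the same Wait has to be resolved by the driver/OS scheduler in software, which is exactly the round trip penalty mentioned above.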
If I'm not mistaken, GCN provides both features for the compute queues, while synchronization from the 3D queue to a compute queue is limited to barriers and fences in the compute queue.
Kepler and Maxwell provide neither of these features unless the GMU is brought back to life, in which case they should actually behave quite similarly to GCN. (The lack of mixed-mode operation on a single SMM unit aside.)