Has there ever been a PC game that actually used all 4096 SPs efficiently, through parallelism via async compute?
Games like Forza 7 and Forza Horizon 4 play very well on GCN, as do other console ports (UWP games in particular, it seems). Wolfenstein 2 isn't the only 'PC' game using async compute.
IMO Navi's HW scheduler might be a bit more robust, and its ACEs could be reorganized around a lower SP count. That would be something to take them beyond Async Shader quick response queuing.
It might help to clarify what specifically you mean by HW scheduler.
There is an HWS unit, at least on Polaris and later GPUs. Do your data points cover older chips like Hawaii? If so, that may mean the HWS is not making as much of a difference.
The HWS and ACEs sit at the front end of the GPU's command processing pipeline. Many games that perform well on GCN see big gains because their shaders and workload characteristics better fit the architecture of the back end, where the workgroups launched by the front end go to be executed.
Losing resources in the back end, past a relatively modest baseline for asynchronous compute, would generally make things harder for the front end. The HWS and ACEs have limited influence on how well wavefronts execute once their early involvement with them is over.
If there is any expectation of improvement in the graphics pipeline, removing compute resources would make it doubly hard for the ACEs and HWS, as they exist to fill in gaps in graphics execution. The fewer spare resources, the fewer gaps.
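To put rough numbers on that, here is a toy sketch in Python. It is not how GCN hardware actually schedules anything: the function name `spare_cu_slots`, the per-step graphics demand, and the CU counts are all made up for illustration. It only counts how many CU-slots graphics leaves idle for async compute to fill.

```python
def spare_cu_slots(total_cus, graphics_demand):
    """Count CU-slots left idle by graphics over a run of time steps.

    graphics_demand is a hypothetical list of how many CUs the graphics
    workload could use at each step. Anything it cannot use is a "gap"
    that async compute could fill.
    """
    spare = 0
    for want in graphics_demand:
        used = min(want, total_cus)  # graphics cannot use more than exists
        spare += total_cus - used    # leftover CUs are the gap this step
    return spare


# Illustrative demand only, not measured from any game.
demand = [64, 40, 20, 64, 30, 50]

print(spare_cu_slots(64, demand))  # wide GPU: 116 spare CU-slots
print(spare_cu_slots(40, demand))  # narrower GPU: 30 spare CU-slots
```

With the same graphics demand, the narrower configuration leaves far fewer slots for async work, which is the "fewer spare resources, fewer gaps" point in miniature.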
Quick response queuing and other forms of resource reservation or pre-emption do worse if there are fewer resources to spare. They explicitly sacrifice throughput for responsiveness, either by walling off resources that could have gone somewhere else "just in case" or by forcing wavefronts that were running at very high utilization to yield to something that is not.
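The reservation case can be sketched the same way. This is a toy model under loud assumptions, not a description of the real quick response queue: `utilization_with_reservation` is a hypothetical helper, bulk work is assumed to keep every non-reserved CU busy, and reserved CUs are assumed to sit idle unless urgent work shows up in that step.

```python
def utilization_with_reservation(total_cus, reserved_cus, urgent_demand):
    """Fraction of CU-slots doing useful work when some CUs are walled off.

    urgent_demand is a hypothetical list of how many CUs the urgent queue
    wants at each step. Reserved CUs serve only that queue.
    """
    steps = len(urgent_demand)
    # Assumption: bulk work saturates every CU that is not reserved.
    bulk_done = (total_cus - reserved_cus) * steps
    # Reserved CUs do work only when urgent demand actually arrives.
    urgent_done = sum(min(reserved_cus, want) for want in urgent_demand)
    return (bulk_done + urgent_done) / (total_cus * steps)


# Urgent work shows up in only two of eight steps (illustrative numbers).
urgent = [0, 0, 4, 0, 0, 8, 0, 0]

print(utilization_with_reservation(64, 8, urgent))  # about 0.90
print(utilization_with_reservation(16, 8, urgent))  # about 0.59
```

The same 8-CU reservation costs roughly 10% of throughput on the wide configuration and roughly 40% on the small one, which is why reservation gets harder to justify as spare resources shrink.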