> Preemption has a non-zero cost in terms of time and the non-workload consumption of resources for bookkeeping and running the special subroutines for moving data and execution context out of the way, and then later ramping it back up. That's injecting a second startup and flush in the middle.

> Thanks. Actually, the first phrase that came to my mind when trying to think of a way to describe this was "QoS mechanism". Would you mind going into the bolded a little more?
For the graphics pipeline, the priority queue slides show the stretch where the graphics workload slopes down, leaving resources it could be using sitting idle, yet the compute portion cannot start until the last bit of graphics execution is out of the way.
Context switching for compute isn't as global a switch, but individual wavefronts and kernels have to spend time moving data in and out instead of running their own code, and that can tie up a CU for a while, even for unrelated wavefronts resident on the same CU.
In either case, if it weren't for time pressure, the GPU would probably have just waited and filled in slots as they eventually opened up. That assumes there aren't extremely long-lived wavefronts, or, in another scenario, malicious ones trying to DoS the system.
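To put rough numbers on that trade-off, here is a toy C++ model of the two options when a high-priority compute job shows up mid-frame: let the in-flight work drain on its own, or preempt it and pay the save/restore cost. Every figure in it is an illustrative assumption, not a measurement from any real GPU.

```cpp
// Toy model of the drain-vs-preempt choice described above. Preemption buys
// latency for the incoming work at the cost of extra total work, which is
// the "second startup and flush" injected in the middle.
#include <cstdio>

int main() {
    // Assumed timeline values, in microseconds (made up for illustration).
    const double graphics_time_left = 800.0; // time until graphics drains naturally
    const double compute_duration   = 300.0; // the high-priority compute job itself
    const double context_save       = 50.0;  // moving data/context out of the way
    const double context_restore    = 50.0;  // ramping the evicted work back up

    // Option 1: wait for the drain. Compute latency is dominated by the
    // leftover graphics work; total GPU time carries no extra overhead.
    const double wait_compute_latency = graphics_time_left + compute_duration;
    const double wait_total_work      = graphics_time_left + compute_duration;

    // Option 2: preempt. Compute starts almost immediately (after the save),
    // but the save and restore are pure overhead added to total work.
    const double preempt_compute_latency = context_save + compute_duration;
    const double preempt_total_work      = graphics_time_left + compute_duration
                                         + context_save + context_restore;

    std::printf("wait:    compute latency %.0f us, total work %.0f us\n",
                wait_compute_latency, wait_total_work);
    std::printf("preempt: compute latency %.0f us, total work %.0f us\n",
                preempt_compute_latency, preempt_total_work);
    std::printf("latency saved %.0f us at a cost of %.0f us extra work\n",
                wait_compute_latency - preempt_compute_latency,
                preempt_total_work - wait_total_work);
    return 0;
}
```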
Reserving CUs keeps the GPU from using all of its resources for the problem at hand, in favor of holding them free for a workload that might not need them for a while.
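Purely as an illustration of what reserving CUs can look like from the software side on a current AMD stack, here is a sketch using HIP's CU-mask streams; it assumes a ROCm environment that exposes hipExtStreamCreateWithCUMask, and the 8/24 split and mask values are arbitrary.

```cpp
// Sketch of carving CUs between two compute streams with HIP's CU-mask
// extension (ROCm). Each set bit in a mask is a CU that stream's wavefronts
// may occupy; giving the streams complementary masks holds some CUs in
// reserve for latency-sensitive work instead of letting the bulk job fill them.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    // One 32-bit word covers CUs 0..31 on the assumed part.
    uint32_t latency_mask[1] = { 0x000000FFu }; // CUs 0..7 held in reserve
    uint32_t bulk_mask[1]    = { 0xFFFFFF00u }; // CUs 8..31 for everything else

    hipStream_t latency_stream, bulk_stream;
    if (hipExtStreamCreateWithCUMask(&latency_stream, 1, latency_mask) != hipSuccess ||
        hipExtStreamCreateWithCUMask(&bulk_stream, 1, bulk_mask) != hipSuccess) {
        std::printf("CU-masked stream creation failed\n");
        return 1;
    }

    // Kernels launched on bulk_stream can never spill onto CUs 0..7, so the
    // reserved CUs are free the moment latency-critical work shows up, at the
    // price of sitting idle whenever it doesn't.
    hipStreamDestroy(latency_stream);
    hipStreamDestroy(bulk_stream);
    return 0;
}
```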
Prioritization does mean the GPU starts picking winners and losers when it comes to competing for a CU, so the losers will see their wavefront launch rate drop.
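Queue-level prioritization is exposed fairly directly by the modern APIs. As a sketch (not tied to any particular hardware discussed above), D3D12 lets you create compute queues at different priorities; how aggressively the hardware then throttles wavefront launches on the lower-priority queue is left to the driver and GPU.

```cpp
// Sketch of queue-level prioritization in D3D12: two compute queues on the
// same device, one at normal and one at high priority. The API only expresses
// relative scheduling preference; the actual launch-rate penalty for the
// lower-priority queue is implementation-defined.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

HRESULT CreatePrioritizedQueues(ID3D12Device* device,
                                ComPtr<ID3D12CommandQueue>& normal_queue,
                                ComPtr<ID3D12CommandQueue>& high_queue) {
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    // The "loser": background compute at normal priority.
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    HRESULT hr = device->CreateCommandQueue(&desc, IID_PPV_ARGS(&normal_queue));
    if (FAILED(hr)) return hr;

    // The "winner": latency-sensitive compute at high priority. (There is
    // also GLOBAL_REALTIME, which requires elevated privileges.)
    desc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_HIGH;
    return device->CreateCommandQueue(&desc, IID_PPV_ARGS(&high_queue));
}
```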