No, it is blending them together just fine. One queue is filled commands from the render pipeline. The other 31 queues are either filled with commands from up to 31 dedicated (software) queues or with async tasks.
That's almost working as it is supposed to. There appear to be several distinct problem though:
- If a task is flagged as async, it is never batched with other tasks - so the corresponding queue underruns as soon as the assigned task finished.
- If a queue is flagged as async AND serial, apart from the lack of batching, all other 31 queues are not filled in either, so the GPU runs out of jobs after executing just a single task each.
- As soon as dependencies become involved, the entire scheduling is performed on the CPU side, as opposed to offloading at least parts of it to the GPU.
- Mixed mode (31+1) seems to have different hardware capabilities than pure compute mode (32). Via DX12, only mixed mode is accessible. (Not sure if this is true or if just the driver isn't making use of any hardware features.)
- The graphic part of the benchmark in this thread appears to keep the GPU completely busy on Nvidia hardware, so none of the compute tasks is actually running "parallel".
- "Real" compute tasks require the GPU to be completely idle first before switching context. This became prominent as Windows is dispatching at least one pure compute task on a regular base, which starved and caused Windows to kill the driver.
Well, at least the first two are apparently driver bugs:
- Nvidia COULD just merge multiple async tasks into batches in order to avoid underruns in the individual queues. This would reduce the number of roundtrips to the CPU enormously.
- Same goes for sequential tasks, they can be merged in batches as well. AMD already does that, and even performes additional optimizations on the merged code (notice how the performance went up in sequential mode?).
So AMD's driver is just far more matured, while Nvidia is currently using a rather naive approach only.
Well, and there's the confirmation.
I wouldn't expect the Nvidia cards to perform THAT bad in the future, given that there are still possible gains to be made in the driver. I wouldn't exactly overestimate them either, though. AMD has just a far more scalable hardware design in this domain, and the necessity of switching between compute and graphic context in combination with the starvation issue will continue to haunt Nvidia as that isn't a software but a hardware design fault.