To be honest, I'm not sure. I'm not familiar enough with how the software stack is structured to give a proper explanation.
All I understood from the explanation given to me is that the scheduler is in fact part of the OS.
Kernel mode or part of the user space runtime? No clue, although kernel mode appears likely, since it's also responsible for scheduling concurrent execution of multiple 3D-accelerated applications. It's definitely not part of the driver, nor exposed to it in any way.
On hardware that doesn't support multiple queues of any of the three types, it performs a mapping that is transparent to both the application and the driver.
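To illustrate what such a mapping could look like, here is a toy Python sketch. It is not how the OS scheduler actually does it (I don't know that). The fallback order is my own assumption, based only on the fact that in D3D12 a DIRECT queue accepts compute and copy work and a COMPUTE queue accepts copy work. `QueueType`, `FALLBACK` and `map_queues` are hypothetical names made up for this example.

```python
from enum import Enum


class QueueType(Enum):
    DIRECT = "direct"    # graphics + compute + copy
    COMPUTE = "compute"  # compute + copy
    COPY = "copy"        # copy only


# Assumed preference order: which engine kinds could serve each logical
# queue type, best match first. This table is an illustration, not a spec.
FALLBACK = {
    QueueType.DIRECT: [QueueType.DIRECT],
    QueueType.COMPUTE: [QueueType.COMPUTE, QueueType.DIRECT],
    QueueType.COPY: [QueueType.COPY, QueueType.COMPUTE, QueueType.DIRECT],
}


def map_queues(hardware_engines):
    """Return {logical queue type: engine type} for the engines the hardware has."""
    mapping = {}
    for logical, candidates in FALLBACK.items():
        for engine in candidates:
            if engine in hardware_engines:
                mapping[logical] = engine
                break
        else:
            # No candidate engine exists on this (hypothetical) hardware.
            raise ValueError(f"no engine can serve {logical.name}")
    return mapping
```

With only a DIRECT engine available, `map_queues({QueueType.DIRECT})` sends all three logical queue types to that one engine, which is the "transparent" case: the application still sees three queue types and nothing in its code changes.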
Does this mesh with Futuremark's description of its multi-queue process?
http://www.futuremark.com/pressreleases/a-closer-look-at-asynchronous-compute-in-3dmark-time-spy
Unlike the Draw/Dispatch calls in DirectX 11 (with immediate context), in DirectX 12, the recording and execution of command lists are decoupled operations. This means that recording can and does happen as soon as it has all available information and there is no thread limitation on it.
For GPU work to happen, command lists are executed on queues, which come in variants of DIRECT (commonly known as graphics), COMPUTE and COPY. Submission of a command list to a queue can happen on any thread. The D3D runtime serializes and orders the lists within a queue.
Once initiated, multiple queues can execute in parallel. But it is entirely up to the driver and the hardware to decide how to execute the command lists - the game or application cannot affect this decision with the DirectX 12 API.
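To make the quoted description a bit more concrete, here is a toy Python model of it. This is not the D3D12 API: `CommandList`, `Queue`, `record_list`, `run_frame` and the pass names are all hypothetical, and the round-robin loop merely stands in for the driver/hardware decision that the application cannot influence. It only shows the three points from the quote: recording is decoupled from execution and can happen on any thread, each queue keeps its own submission order, and the interleaving across queues is not under the application's control.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class CommandList:
    """A recorded, not yet executed, batch of commands (toy stand-in)."""

    def __init__(self, name):
        self.name = name
        self.commands = []

    def record(self, command):
        self.commands.append(command)


class Queue:
    """One queue; submissions are serialized in the order they arrive."""

    def __init__(self, kind):
        self.kind = kind
        self._pending = deque()
        self._lock = threading.Lock()

    def submit(self, command_list):
        # May be called from any thread; the lock keeps the order consistent.
        with self._lock:
            self._pending.append(command_list)

    def pop(self):
        with self._lock:
            return self._pending.popleft() if self._pending else None


def record_list(name):
    cl = CommandList(name)
    cl.record(f"{name}: work")
    return cl


def run_frame():
    queues = {kind: Queue(kind) for kind in ("DIRECT", "COMPUTE", "COPY")}

    # Recording: happens up front, spread across worker threads.
    names = ["shadow", "gbuffer", "lighting", "particles", "upload"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        recorded = dict(zip(names, pool.map(record_list, names)))

    # Submission: a separate step; order within each queue is preserved.
    for name in ("shadow", "gbuffer", "lighting"):
        queues["DIRECT"].submit(recorded[name])
    queues["COMPUTE"].submit(recorded["particles"])
    queues["COPY"].submit(recorded["upload"])

    # Execution: a made-up round-robin policy plays the role of the
    # driver/hardware, which alone decides how queues interleave.
    executed = []
    progressed = True
    while progressed:
        progressed = False
        for queue in queues.values():
            cl = queue.pop()
            if cl is not None:
                executed.append((queue.kind, cl.name))
                progressed = True
    return executed
```

Swapping the round-robin loop for any other policy changes how work from different queues interleaves, but never the order of lists within a single queue.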