geo said: So then, when are YOU expecting WDDM 2.0?
As this is a kernel change, I expect it to arrive as part of a service pack. Additionally, we will need hardware that makes use of it. This makes me believe that we are at least one year away from WDDM 2.0.
geo said: And how penal do we think the context switching in WDDM 1.0 will be in the meantime?
The WDDM 1.0 GPU scheduling is fully based on the command/DMA buffer system that the new driver model introduces. Command buffers are sent from the user-mode driver to the kernel graphics subsystem, and DMA buffers are used to transfer data to the GPU. First, we need to remember that the user-mode driver is loaded into every process that makes use of Direct3D. As every form of interprocess communication is expensive, these different instances should not need to talk to each other. But this is not the end of it: every instance of the driver needs to support multiple independent devices, even for the same GPU. These different devices should not need to talk to each other either.
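To make the layering a bit more concrete, here is a minimal sketch of that structure. The names are invented for illustration and are not the real WDDM DDI; it only shows that each process gets its own driver instance, and each instance can own several independent devices with their own virtual state and private command buffer.

```cpp
// Hypothetical layout, not actual WDDM interfaces.
#include <cstdint>
#include <vector>

struct VirtualPipelineState {     // the virtual 3D pipeline state one device exposes to the API
    uint32_t boundTextures[16];
    uint32_t blendMode;
    uint32_t depthFunc;
};

struct CommandBuffer {            // commands collected in the driver's private format
    std::vector<uint8_t> bytes;
};

struct Device {                   // one API-level device: own state, own command buffer
    VirtualPipelineState state;
    CommandBuffer pending;
};

struct UserModeDriverInstance {   // loaded separately into every process that uses Direct3D
    std::vector<Device> devices;  // several independent devices, even for the same GPU
};
```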
Every device provides a virtual 3D pipeline that matches the API requirements. The driver needs to store the current state of this pipeline (D3D10 offers another solution, but that has nothing to do with these buffers). Additionally, the runtime and the kernel graphics subsystem make sure that for every device in user space there is a device in kernel space too. Remember that the kernel driver is only loaded once. After the devices are created, the user-mode driver starts to collect the commands sent by the application, in a private format, in a command buffer. If such a command buffer is full, or needs to be flushed for any other reason, it is transferred to the kernel graphics subsystem. As we can have many devices, the subsystem can receive buffers from multiple processes and store them. Now, every time the GPU scheduler knows that the GPU needs more work, it selects one of these buffers and calls a driver function that translates the content of the command buffer into a DMA buffer. As the kernel-mode driver knows the current state of the physical GPU and the virtual state of the device that sent the command buffer, it can generate and add to the DMA buffer all the commands that are necessary to switch from the current state to the new one. After the DMA buffer is filled, it is handed to the GPU. The GPU processes it and generates an interrupt to inform the graphics subsystem. To prevent stalling, the GPU scheduler always tries to generate DMA buffers in advance: while the GPU is still working on one buffer, the system has already called the driver to generate the next one.
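The following is a minimal sketch of that scheduling flow, with invented names (the real kernel interfaces differ): the scheduler picks a queued command buffer, asks the kernel-mode driver to translate it into a DMA buffer, including whatever state-change commands are needed to move the GPU from its current state to that device's virtual state, and works one buffer ahead so the GPU never has to wait.

```cpp
// Illustrative only; not the real WDDM scheduler or driver callbacks.
#include <deque>
#include <optional>

struct QueuedCommandBuffer { int deviceId; /* virtual state + private-format commands */ };
struct DmaBuffer { /* GPU-readable commands, including inserted state changes */ };

struct KernelModeDriver {
    int currentHardwareState = -1;           // which device's state the GPU currently holds
    DmaBuffer Translate(const QueuedCommandBuffer& cb) {
        DmaBuffer dma;
        if (cb.deviceId != currentHardwareState) {
            // emit the commands that move the GPU from its current state
            // to the submitting device's virtual state
            currentHardwareState = cb.deviceId;
        }
        // ...then translate the private-format commands themselves...
        return dma;
    }
};

struct GpuScheduler {
    std::deque<QueuedCommandBuffer> queued;  // buffers received from all processes
    KernelModeDriver driver;

    // Called whenever the GPU signals (via interrupt) that it wants more work;
    // ideally this runs while the GPU is still busy with the previous DMA buffer.
    std::optional<DmaBuffer> BuildNext() {
        if (queued.empty()) return std::nullopt;
        QueuedCommandBuffer cb = queued.front();
        queued.pop_front();
        return driver.Translate(cb);
    }
};
```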
As long as only one process sends command buffers, the state changes in the DMA buffer are the same ones the application made. But if two or more devices send command buffers, the subsystem can interleave them, and the driver has to add additional state-change commands to switch the physical state of the GPU between the multiple virtual states. Depending on how different the virtual states of the multiple devices are, this can be very expensive. But as it was already possible under Windows XP to have more than one 3D device running at the same time, the problem is not new and has already been solved. Because of this we should not expect it to behave any worse than under Windows XP. The biggest performance problem when you use multiple devices at the same time is not the GPU context at all; it is the local RAM on the card. If all the devices together need more than you have, the GPU memory manager has to swap all the time.
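A toy example (again invented, not WDDM code) of why interleaving matters: each time the scheduler picks a buffer from a different device than the one the GPU last ran, the driver must insert extra state-change commands, so the cost depends on the submission order, not just the number of buffers.

```cpp
#include <iostream>
#include <vector>

// Count how many times the GPU state would have to be switched
// for a given order of per-device command buffer submissions.
int CountStateSwitches(const std::vector<int>& submissionOrder) {
    int switches = 0;
    int current = -1;                          // no device state loaded yet
    for (int deviceId : submissionOrder) {
        if (deviceId != current) { ++switches; current = deviceId; }
    }
    return switches;
}

int main() {
    // Two devices, A = 0 and B = 1, eight buffers either way.
    std::cout << CountStateSwitches({0,0,0,0, 1,1,1,1}) << "\n";  // grouped: 2 switches
    std::cout << CountStateSwitches({0,1,0,1, 0,1,0,1}) << "\n";  // interleaved: 8 switches
}
```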
Sorry for such a wall of text without any pictures to make it easier to understand. Maybe I should write a full introduction to WDDM instead of such snippets.