DX11 Compute Shader Dependencies

Rogon · Mar 13, 2013

Hi,

I was wondering if anyone here has a notion of how DX11 schedules compute jobs. As we all know, the threads in a thread group execute in lock-step, but other thread groups from the same Dispatch() call may execute on other units.

My question is, can jobs from other Dispatch() calls also execute in parallel? If so, how are dependencies tracked? For instance, how do you guarantee that compute job A has finished before compute job B starts (in case B reads an RWBuffer generated by A)?

Thanks.

3dcgi · Mar 14, 2013

Multiple dispatches can execute in parallel, but only if there are no dependencies. If the output of A is an input to B, B will not start until A finishes.

Ethatron · Mar 14, 2013

There are no guarantees. It's up to you to synchronize. You can, for example, use full memory barriers, and then you at least get the guarantee that outstanding writes from one group have been made visible to all others. In practice you can use that to run algorithms which rely on memory coherency, and to channel information from group to group in a lock-free setup.
You don't have mutexes in DC, and because of that you can't implement lock-based algorithms the regular way.
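
To make the "write, barrier, then signal" idea concrete, here is a rough CPU-side analogy in C++, not HLSL and not how DirectCompute itself is programmed. It assumes two ordinary threads standing in for two thread groups, and the helper name `publish_and_consume` is made up for this sketch. The release fence plays the role of the full memory barrier: once the consumer sees the flag and issues its own acquire fence, the earlier write to `payload` is guaranteed to be visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// CPU-side analogy only: HLSL barrier intrinsics have their own semantics.
// publish_and_consume is a hypothetical helper for this sketch.
int publish_and_consume(int value) {
    int payload = 0;                 // plain data written by the producer
    std::atomic<bool> ready{false};  // flag used to signal the consumer
    int seen = -1;

    std::thread producer([&] {
        payload = value;                                      // 1. write the data
        std::atomic_thread_fence(std::memory_order_release);  // 2. order the write before the flag
        ready.store(true, std::memory_order_relaxed);         // 3. raise the flag
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {      // spin until the flag is seen
            std::this_thread::yield();
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
        seen = payload;                                       // now guaranteed to read `value`
    });

    producer.join();
    consumer.join();
    assert(seen == value);  // no lock was taken, yet the data arrived intact
    return seen;
}
```

Calling `publish_and_consume(42)` hands the value from one thread to the other without a mutex, which is the kind of lock-free channel described above. On the GPU the details differ (there is no guarantee about which groups are running at a given moment), so treat this strictly as an illustration of the visibility guarantee.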

Rogon · Mar 14, 2013

First, thanks both!

@3dcgi: I didn't see this mentioned in Microsoft's SDK documentation. Would you mind sharing with us where this is specified?

@ethatron: How do you specify barriers in the DX11 command buffer? And you mention "DC", what is that?

Thanks!

Rogon · Mar 14, 2013

Thanks both!

3dcgi: Do you know where in Microsoft's documentation this is specified?

Ethatron: What is DC? Also, how do you make memory barriers in DX11? I didn't see any mention of them in the DeviceContext classes.

MJP · Mar 14, 2013

The D3D model requires that the inputs of a dispatch or draw call correctly reflect the results of a dispatch or draw call that was executed previously. Hence, there is no explicit dispatch-to-dispatch or draw-to-dispatch synchronization in D3D. The driver and GPU are able to implement it however they want behind the scenes, as long as they maintain correctness from the point of view of the programmer. In practice GPUs are certainly capable of having multiple draws and dispatches in flight simultaneously, and the driver is responsible for analyzing your commands to determine dependencies so that it can insert sync points.
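
As a minimal sketch of what "analyzing your commands to determine dependencies" could look like, here is a toy model in C++. It is not how any real driver is implemented. The names `ResourceId`, `Dispatch`, `depends_on` and `sync_points` are hypothetical, and the model assumes each dispatch simply lists the resources it reads and writes. A later dispatch needs a sync point if it touches a resource that a dispatch still in flight writes, or writes one that such a dispatch reads.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model only: real drivers track hazards internally and differently.
// ResourceId and Dispatch are hypothetical names for this sketch.
using ResourceId = int;

struct Dispatch {
    std::set<ResourceId> reads;   // resources bound as inputs
    std::set<ResourceId> writes;  // resources bound as outputs
};

// True if the two sets share at least one resource.
static bool overlaps(const std::set<ResourceId>& a, const std::set<ResourceId>& b) {
    for (ResourceId r : a) {
        if (b.count(r)) return true;
    }
    return false;
}

// True if `later` must wait for `earlier`: read-after-write,
// write-after-read, or write-after-write on a shared resource.
bool depends_on(const Dispatch& earlier, const Dispatch& later) {
    return overlaps(earlier.writes, later.reads) ||
           overlaps(earlier.reads, later.writes) ||
           overlaps(earlier.writes, later.writes);
}

// For each dispatch in submission order, report whether a sync point is
// needed before it. Dispatches between two sync points may overlap.
std::vector<bool> sync_points(const std::vector<Dispatch>& queue) {
    std::vector<bool> needs_sync(queue.size(), false);
    std::size_t batch_start = 0;  // first dispatch that may still be in flight
    for (std::size_t i = 0; i < queue.size(); ++i) {
        for (std::size_t j = batch_start; j < i; ++j) {
            if (depends_on(queue[j], queue[i])) {
                needs_sync[i] = true;
                batch_start = i;  // the sync drains everything submitted before i
                break;
            }
        }
    }
    assert(needs_sync.empty() || !needs_sync[0]);  // the first dispatch never waits
    return needs_sync;
}
```

With job A writing a buffer and job B reading it, `depends_on(A, B)` is true and B gets a sync point, while an unrelated job C submitted in between is free to overlap with A. That matches the behavior described in this thread: independent dispatches can run in parallel, dependent ones cannot.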

3dcgi · Mar 15, 2013

Rogon, I don't know where anything is documented, just what happens in the hardware. MJP summed it up well.

Ethatron · Mar 15, 2013

DC is just DirectCompute, too lazy to type.

Manual barriers are intrinsic functions, just like the atomics. If the driver detects they're redundant, they may be removed from the program flow:
http://msdn.microsoft.com/en-us/library/windows/desktop/hh447241%28v=vs.85%29.aspx
http://msdn.microsoft.com/de-de/library/windows/desktop/ff471351(v=vs.85).aspx
