That was not in conjunction with concurrent execution. That was talking about having different kernels execute in the same queue, when the program calls for stopping one kernel in order to execute another.
PS: fine-grained preemption is already enabled in Pascal; they have talked about it, anyway.
Maxwell 2 can do CUDA kernel execution in its graphics queue with no problem at all. It does have issues with doing DirectCompute in its graphics queue. This is the problem that we saw here: after doing one operation, the second operation just falls into serial execution. We also saw other issues where graphics instructions started to go into the compute queue with DX12 and DirectCompute. This should not be happening at all; something was or is messed up in the drivers. OpenCL behaves similarly to CUDA in this respect too.
They should not be using preemption for this kind of kernel execution. Preemption is only for when you need to force something to be done at a certain time, where latency matters for a certain operation.