So you envision a scenario, say next gen, where running 20 separate programs simultaneously isn't enough, and you need 60?
At the chip level, at least for CUDA, I was under the impression that the number of kernels running at any given instant was one.
The current methodology is context switching between kernels.
The overhead of this looks like a future design's low-hanging fruit.
I was only thinking of juggling 2-4 such contexts.
The minimal granularity is per-SIMD, just by virtue of the fact that SIMDs run the exact same instruction over all their units, so nothing smaller can be done.
I'm not advocating that things be split down to one kernel per SIMD, just that the current setups are very coarse.
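To make the granularity point concrete, here is a rough sketch of handing out whole SIMDs to a handful of kernel contexts. It is a toy model only: the function name `partition_simds`, the weights, and the SIMD count are all hypothetical and do not describe how any real scheduler, CUDA or CAL, divides the chip.

```cpp
#include <cstddef>
#include <vector>

// Toy model (hypothetical): split simd_count SIMDs among kernel contexts in
// proportion to their weights. A SIMD is never split, because all of its
// lanes execute the same instruction, so every share is a whole number.
std::vector<std::size_t> partition_simds(std::size_t simd_count,
                                         const std::vector<std::size_t>& weights) {
    std::vector<std::size_t> share(weights.size(), 0);

    std::size_t total = 0;
    for (std::size_t w : weights) total += w;
    if (total == 0) return share;  // no context asked for any SIMDs

    // First pass: proportional share, rounded down to whole SIMDs.
    std::size_t assigned = 0;
    for (std::size_t k = 0; k < weights.size(); ++k) {
        share[k] = simd_count * weights[k] / total;
        assigned += share[k];
    }

    // Second pass: hand leftover SIMDs, one at a time, to contexts that
    // have a nonzero weight, starting from the first context.
    std::size_t k = 0;
    while (assigned < simd_count) {
        if (weights[k] > 0) {
            ++share[k];
            ++assigned;
        }
        k = (k + 1) % weights.size();
    }
    return share;
}
```

With an assumed 10 SIMDs and three equally weighted contexts, this gives shares of 4, 3 and 3; the point is just that the split is in whole SIMDs, not that any particular ratio is sensible.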
As far as I can tell CAL and CUDA both support more than one kernel running simultaneously.
I was under the impression, from the last time we checked, that the threading hardware would only work from one CUDA kernel at a time.
Running multiple kernels at the chip level was not simultaneous; rather, there was a context switch and startup of the separate kernel.
We're looking at the following types of kernel in D3D11, I reckon:
- Control Point
- Vertex
- Geometry
- Pixel
- General Computation
Jawed
Kernel types or thread types? The usage prior to this indicated that multiple thread types could be applied to a kernel.