With all the multi-core systems coming out (hello, my nice little Mac Pro), it is becoming increasingly difficult to distribute your workload in a sensible manner. I plan on writing a small, multi-platform scheduler that takes care of those things for me, and I'll make it open source (as I'd like to use it in my other open source endeavours too). Note that one of its most important features is that it should be near-"lossless" on a single-core system, i.e. add next to no overhead, so I don't feel bad about creating thousands of "work-units" to get something done. Enter tish, the tiny scheduler. No code is written yet, but I wanted to discuss / get some feedback on some of the ideas I've had.
Essentially, tish is supposed to allocate an OS task (thread) for each available core and, via processor affinity, force it to always be scheduled on that same core. tish then dishes out work-units to the available worker threads, taking into account things like input / output regions (of memory), so that data already in the L2 cache can be reused when a work-unit gets scheduled on a core that has access to the same cache (not necessarily the same core, as an L2 cache can be shared).
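To make the one-pinned-thread-per-core idea concrete, here is a rough sketch (not tish code, since none exists yet). It assumes POSIX threads, and only pins on Linux, where pthread_setaffinity_np is available as a GNU extension; other platforms would need their own affinity call. All tish_* names are made up for illustration.

```c
#define _GNU_SOURCE            /* for pthread_setaffinity_np on Linux */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-worker record. */
typedef struct {
    pthread_t thread;
    int core;    /* index of the core this worker is meant to stay on */
    int pinned;  /* 1 if the affinity request succeeded, 0 otherwise */
} tish_worker;

/* Number of cores currently online, never less than 1. */
int tish_core_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

static void *tish_worker_main(void *arg)
{
    tish_worker *w = arg;
    w->pinned = 0;
#ifdef __linux__
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(w->core, &set);
    /* May fail, e.g. if the process is not allowed on that core. */
    if (pthread_setaffinity_np(pthread_self(), sizeof set, &set) == 0)
        w->pinned = 1;
#endif
    /* A real worker would loop here, pulling work-units until shutdown. */
    return NULL;
}

/* Starts one worker per core. Returns the worker array, or NULL on failure. */
tish_worker *tish_start_workers(int *count_out)
{
    assert(count_out != NULL);
    int n = tish_core_count();
    tish_worker *ws = calloc((size_t)n, sizeof *ws);
    if (ws == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        ws[i].core = i;
        if (pthread_create(&ws[i].thread, NULL, tish_worker_main, &ws[i]) != 0) {
            for (int j = 0; j < i; j++)   /* clean up what was started */
                pthread_join(ws[j].thread, NULL);
            free(ws);
            return NULL;
        }
    }
    *count_out = n;
    return ws;
}

/* Waits for all workers to finish; the caller still owns (and frees) ws. */
void tish_join_workers(tish_worker *ws, int count)
{
    for (int i = 0; i < count; i++)
        pthread_join(ws[i].thread, NULL);
}
```

Each thread pins itself rather than being pinned by its creator, which keeps the platform-specific part in one place. Knowing which cores actually share an L2 cache would need extra, OS-specific topology queries that this sketch leaves out.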
So work-units would need to specify their input data / arguments, what sort of output they're producing (e.g. a single number or a stream), possibly some sort of load vector (with components for integer, FPU, MMX, SSE, "bandwidth") that can be used to optimize the scheduling of differing workloads on cores that share execution units (hyperthreading-like), dependencies on other results / work-units (which could maybe be referenced as inputs), and of course the code that is supposed to be executed.
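As one possible shape for such a work-unit in plain C, here is a sketch. Every name, the fixed array sizes, and the choice of floats for the load vector are assumptions made for illustration, not a settled design.

```c
#include <assert.h>
#include <stddef.h>

#define TISH_MAX_INPUTS 4   /* arbitrary limits for the sketch */
#define TISH_MAX_DEPS   4

/* A region of memory a work-unit reads from or writes to. */
typedef struct { void *base; size_t len; } tish_region;

typedef enum { TISH_OUT_SCALAR, TISH_OUT_STREAM } tish_output_kind;

/* Rough estimate of which resources the unit stresses (0 = not at all). */
typedef struct { float integer, fpu, mmx, sse, bandwidth; } tish_load;

typedef struct tish_unit tish_unit;
typedef void (*tish_fn)(tish_unit *self);

struct tish_unit {
    tish_fn          run;                     /* the code to execute */
    tish_region      inputs[TISH_MAX_INPUTS];
    size_t           n_inputs;
    tish_region      output;
    tish_output_kind out_kind;
    tish_load        load;
    tish_unit       *deps[TISH_MAX_DEPS];     /* units that must finish first */
    size_t           n_deps;
    int              done;
};

/* A unit is ready once every unit it depends on has completed. */
int tish_unit_ready(const tish_unit *u)
{
    for (size_t i = 0; i < u->n_deps; i++)
        if (!u->deps[i]->done)
            return 0;
    return 1;
}

/* Runs a ready unit on the calling thread and marks it as done. */
void tish_unit_run(tish_unit *u)
{
    assert(tish_unit_ready(u));
    u->run(u);
    u->done = 1;
}

/* Example work function: sums the ints in inputs[0] into the int at output. */
void tish_example_sum(tish_unit *self)
{
    const int *in = self->inputs[0].base;
    size_t n = self->inputs[0].len / sizeof(int);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += in[i];
    *(int *)self->output.base = total;
}
```

A dependent unit can simply point one of its inputs at the output region of the unit it depends on, which is one way of "referencing results as inputs". The plain int done flag is only safe single-threaded; a real scheduler would need an atomic or some other synchronisation there.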
I'm not quite sure yet what the best way is to specify this graph, or how to make things like "inputs" and "outputs" generic enough without being too cumbersome. Similarly, some investigation will have to be done to figure out how to make the communication between the scheduler and its worker threads as lockless as possible. Also (at least when running on a desktop system), multiple applications using tish should ideally all share the same scheduler and worker threads, but that's not a high priority.
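One well-known candidate for the lockless part is a single-producer / single-consumer ring buffer per worker: the scheduler is the only writer and the worker the only reader, so two atomic counters are enough and no lock is needed. The sketch below assumes C11 atomics are available; the names and the tiny capacity are made up for illustration, and it is only one option among several.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TISH_QUEUE_CAP 8   /* small for illustration; must be a power of two */
static_assert((TISH_QUEUE_CAP & (TISH_QUEUE_CAP - 1)) == 0,
              "capacity must be a power of two");

/* One queue per worker: the scheduler pushes, the worker pops. */
typedef struct {
    void *slots[TISH_QUEUE_CAP];
    _Atomic size_t head;   /* next slot to read; written only by the consumer */
    _Atomic size_t tail;   /* next slot to write; written only by the producer */
} tish_queue;

void tish_queue_init(tish_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false if the queue is full. */
bool tish_queue_push(tish_queue *q, void *unit)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == TISH_QUEUE_CAP)
        return false;
    q->slots[tail & (TISH_QUEUE_CAP - 1)] = unit;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
bool tish_queue_pop(tish_queue *q, void **unit_out)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *unit_out = q->slots[head & (TISH_QUEUE_CAP - 1)];
    /* Release: the slot read finishes before the slot is handed back. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

This only works with exactly one producer and one consumer per queue. Letting idle workers steal from each other, or letting several applications feed the same workers, would need a different structure.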
Implementation-wise, I think the problem would be well suited to an OO approach (e.g. C++), but portability-wise I would probably prefer sticking to plain C (as otherwise all clients would have to either be written in C++ or use some sort of wrapper, neither of which is particularly appealing).
In any case, I would welcome some feedback / ideas / criticism, and in the best case, collaborators...