Programming model for Cell's SPEs

cbarcus

Programming model for distributed-memory parallel machines adapted for use with Cell.

The microtask model we propose here provides a programming model that frees programmers from local-store management and enables the preprocessor and runtime system to optimize the scheduling of computations and communications by taking advantage of the explicit communication model in the Message Passing Interface (MPI).

http://www.research.ibm.com/journal/sj/451/ohara.html
 
I agree, I think it's an interesting paper and certainly thread-worthy. I'll admit readily that I couldn't read the whole thing this morning - skipped around to key parts and read the conclusion, basically - but it seems to show some promise as something that might reduce the work overhead for making greater use of the SPEs.
 
An interesting thing I noticed... it seems that one of the Japanese research teams (either from the Sony or Toshiba camp?) must have been the one to explore this programming model research, as all the names on the paper are Japanese. Makes me wonder in what other ways and in what areas the various teams may have self-segregated their research efforts.

It'd be interesting to examine all of the slideshows, presentations, research papers (up on IBM and otherwise), and other available information when it's all said and done to get a sense of who worked on what throughout the course of the Cell project.
 
"In our new model, programmers do not need to manage the local store as long as they partition their application into a collection of small microtasks that fit into the local store."

Haven't had time to read it over in detail but that doesn't make it sound very exciting. It sounds like they are removing some of the headaches from manually managing local store but you've still got to break things into microtasks that fit which is the main difficulty of targeting the SPEs anyway. If you can break your application into microtasks that individually fit in local store you've already solved the difficult part of the problem.
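To make that sticking point concrete, here is a minimal plain-C sketch of the partitioning step. The 256 KB local-store size is real Cell hardware; the 192 KB data budget, the chunk size, and all the function names are illustrative assumptions, and the DMA transfers a real SPE would need are elided entirely:

```c
#include <assert.h>
#include <stddef.h>

/* Each SPE local store is 256 KB, shared between code and data, so a
 * microtask's working set must fit in some budget below that. The
 * 192 KB figure here is an illustrative assumption, not a real limit. */
#define LOCAL_STORE_BYTES   (256 * 1024)
#define DATA_BUDGET_BYTES   (192 * 1024)
#define CHUNK_ELEMS         (DATA_BUDGET_BYTES / sizeof(float))

/* "Microtask": scale one chunk that is guaranteed to fit in the budget. */
static void microtask_scale(float *chunk, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        chunk[i] *= k;
}

/* Driver: partition a large array into microtask-sized chunks. On real
 * Cell this loop would DMA each chunk into an SPE's local store; here it
 * just calls the kernel directly to show the partitioning structure. */
void scale_array(float *data, size_t total, float k)
{
    for (size_t off = 0; off < total; off += CHUNK_ELEMS) {
        size_t n = total - off < CHUNK_ELEMS ? total - off : CHUNK_ELEMS;
        microtask_scale(data + off, n, k);
    }
}
```

The point of the complaint above is that this decomposition - choosing the chunk size and restructuring the algorithm around it - is exactly the work the programmer still has to do.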
 
Yes, micro-tasks is definitely the way to go. But building your code that way is definitely non-trivial as well. It still requires the programmers to re-learn how to program. But it has to be done, and not only for Cell. Just a few big threads isn't even an optimal solution for Xbox360, let alone just about any multi-core processor. Large, independent tasks are only a good solution when you've got many different programs running at the same time.

Then again, streaming data instead of micro-tasks might be an even better paradigm for Cell, and probably for most future multi-core solutions as well.
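For contrast, a toy sketch of the streaming idea: data ping-pongs through two small on-chip buffers rather than being carved into independent microtasks. memcpy stands in for the asynchronous DMA a real SPE would issue, so this only shows the double-buffered structure, not the actual compute/transfer overlap; the tile size and all names are made up for illustration:

```c
#include <string.h>
#include <stddef.h>

#define TILE 1024  /* elements per streamed tile (illustrative size) */

/* Stream a large array through two small buffers, summing it. On Cell,
 * the memcpy into the "other" buffer would be an async DMA started
 * before computing on the current buffer, hiding transfer latency. */
void stream_sum(const int *src, size_t total, long long *out)
{
    int buf[2][TILE];
    long long sum = 0;
    int cur = 0;
    size_t off = 0;

    /* fetch the first tile */
    size_t n = total < TILE ? total : TILE;
    memcpy(buf[cur], src, n * sizeof(int));

    while (off < total) {
        size_t next_off = off + n;
        size_t next_n = 0;
        if (next_off < total) {
            /* on Cell: kick off DMA into the other buffer here */
            next_n = total - next_off < TILE ? total - next_off : TILE;
            memcpy(buf[cur ^ 1], src + next_off, next_n * sizeof(int));
        }
        /* compute on the current tile while the "DMA" is in flight */
        for (size_t i = 0; i < n; i++)
            sum += buf[cur][i];
        cur ^= 1;
        off = next_off;
        n = next_n;
    }
    *out = sum;
}
```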
 
xbdestroya said:
An interesting thing I noticed... it seems that one of the Japanese research teams (either from the Sony or Toshiba camp?) must have have been the one to explore this programming model research, as all the names on the paper are Japanese. Makes me wonder in what other ways and in what areas the various teams may have self-segregated their research efforts.
They are all IBMers at IBM Yamato Lab in Japan.
http://www.research.ibm.com/journal/sj/451/oharaaut.html
 
DiGuru said:
Yes, micro-tasks is definitely the way to go. But building your code that way is definitely non-trivial as well. It still requires the programmers to re-learn how to program. But it has to be done, and not only for Cell. Just a few big threads isn't even an optimal solution for Xbox360, let alone just about any multi-core processor. Large, independent tasks are only a good solution when you've got many different programs running at the same time.

Then again, streaming data instead of micro-tasks might be an even better paradigm for Cell, and probably for most future multi-core solutions as well.

I'm wondering, if programming trends move in this microtask direction, whether CPUs with L2 cache will amount to a big waste of space.
 
heliosphere said:
"In our new model, programmers do not need to manage the local store as long as they partition their application into a collection of small microtasks that fit into the local store."

Haven't had time to read it over in detail but that doesn't make it sound very exciting. It sounds like they are removing some of the headaches from manually managing local store but you've still got to break things into microtasks that fit which is the main difficulty of targeting the SPEs anyway. If you can break your application into microtasks that individually fit in local store you've already solved the difficult part of the problem.

I agree, breaking all your code into microtasks can't be all too easy, so if you go through that trouble, why not go the whole way, manage the SPEs manually as well, and get more out of them...
 
Wasn't this microtask concept fundamental to the idea of distributed processing across a Cell network? SPUlets they were called, and by breaking your code into SPUlets, they could be managed by the system across available resources. I would have thought this paper is looking at ways to implement this network-friendly programming model, which isn't necessarily what you'd want on PS3 (at least for games) where you know you have finite resources.
 
seismologist said:
I'm wondering, if programming trends move in this microtask direction, whether CPUs with L2 cache will amount to a big waste of space.

This doesn't look too likely in most general-purpose scenarios. The thing about cache that local store like that of Cell cannot match is transparency between implementations.

If another version of Cell came out with double the local store per SPE, software compiled for the original version would simply ignore it. Also, local store is not kept coherent with system memory, which means critical changes must be explicitly written out by the program through the DMA engine to keep shared data up to date.
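That coherency point can be modeled in a few lines of ordinary C. This is not Cell API code: memcpy stands in for the DMA get/put a real SPE program would issue, and the struct and function names are invented for illustration. The "shared" array stays stale until the explicit write-back:

```c
#include <string.h>

/* Toy model: an SPE works on a private local-store copy, and "system
 * memory" does not see the changes until they are explicitly written
 * back. There is no hardware coherency between the two arrays. */
typedef struct {
    int shared[4];  /* stands in for system memory */
    int local[4];   /* stands in for the SPE's local-store copy */
} model_t;

/* explicit pull into local store (memcpy standing in for a DMA get) */
void fetch(model_t *m)     { memcpy(m->local, m->shared, sizeof m->local); }

/* compute purely on the local copy; shared memory is untouched */
void compute(model_t *m)   { for (int i = 0; i < 4; i++) m->local[i] += 10; }

/* explicit push back out (memcpy standing in for a DMA put) */
void writeback(model_t *m) { memcpy(m->shared, m->local, sizeof m->shared); }
```

Forget the write-back and other processors keep reading stale data - that is the burden a coherent cache hierarchy lifts automatically.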

Cache, despite its somewhat unpredictable utilization, will be used if the program needs it. Coherency checks are also heavily leveraged for automatically keeping multiprocessor memory behavior consistent, though at a cost of hardware complexity and bandwidth inefficiency.

What may happen in the future is a more tightly controlled cache like Xenon's L2. It won't be explicitly addressed as a separate space, but there would be additional software hooks to get it to do what programmers want. Most architectures already have a few examples of basic cache instructions.
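One widely available example of such a hook: GCC and Clang expose __builtin_prefetch, which hints a line into cache ahead of use without making the cache a separately addressed space. The prefetch distance of 16 elements below is an arbitrary illustrative choice, and since it is only a hint, removing it changes nothing about the result:

```c
#include <stddef.h>

/* Sum an array, hinting each line into cache a fixed distance ahead.
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang builtin:
 * rw = 0 means prefetch for read, locality = 1 means low temporal
 * reuse. The hardware is free to ignore the hint entirely. */
long long sum_with_prefetch(const int *a, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 1);
        s += a[i];
    }
    return s;
}
```

This is the "software hooks without a separate address space" middle ground: the cache stays transparent and coherent, but the program can steer it.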
 