Brodda Thep
Newcomer
SPEs to LS
I don't see anything in there that says that SPEs can directly access the Local Store of other SPEs.
It's called the Element Interconnect Bus.
http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/D9439D04EA9B080B87256FC00075CC2D/$file/MPR-Cell-details-article-021405.pdf
The relevant quote in that document may be:
Similarly, another SPE can use the DMA controller to move data to an address range that is mapped onto a local store of another SPE or even to itself.
Which is, of course, not direct access.
I think the biggest problem is that SPEs do not mesh well with current popular programming paradigms. Ideally, you would want to load a small function/executable/whatever into a small part of the local store, then churn through a ton of data while issuing new DMAs, perhaps asking for 96k at a time: issue a request, work on the other 96k while waiting, then swap and issue another 96k request. But then your data needs to sit in sequential memory blocks. You certainly don't want the data you are working on intermixed with data that has no use in the current thread, or spread all over the memory space.
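To make that swap-and-request loop concrete, here is a minimal sketch in plain C. It is only an illustration under stated assumptions: `dma_get` and `dma_wait` are hypothetical stand-ins that simply copy with `memcpy`, not the real SPE DMA interface, and `stream_sum` and `process` are made-up names for the example. On real hardware the request would return immediately and the wait would block until the transfer finished.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_BYTES (96 * 1024)  /* the 96k request size mentioned above */

/* Hypothetical stand-in for an asynchronous DMA request. Here it copies
   synchronously; a real request would return before the data arrives. */
static void dma_get(uint8_t *local, const uint8_t *main_mem, size_t bytes) {
    memcpy(local, main_mem, bytes);
}

/* Hypothetical stand-in for "wait until the transfer into this buffer is
   done". Nothing to wait for in this simulation. */
static void dma_wait(int buffer_index) { (void)buffer_index; }

/* Example workload: sum every byte in the chunk. */
static uint64_t process(const uint8_t *buf, size_t bytes) {
    uint64_t sum = 0;
    for (size_t i = 0; i < bytes; i++) sum += buf[i];
    return sum;
}

/* Stream `total` bytes through two local buffers: while one chunk is being
   processed, the request for the next chunk has already been issued. */
uint64_t stream_sum(const uint8_t *data, size_t total) {
    static uint8_t local[2][CHUNK_BYTES];  /* two 96k buffers */
    assert(data != NULL || total == 0);

    size_t cur_bytes = total < CHUNK_BYTES ? total : CHUNK_BYTES;
    if (cur_bytes == 0) return 0;

    uint64_t sum = 0;
    size_t offset = 0;
    int cur = 0;
    dma_get(local[cur], data, cur_bytes);  /* prime the first buffer */

    while (cur_bytes > 0) {
        size_t next_off = offset + cur_bytes;
        size_t remaining = total - next_off;
        size_t next_bytes = remaining < CHUNK_BYTES ? remaining : CHUNK_BYTES;
        int next = cur ^ 1;

        /* Issue the request for the next chunk before touching this one. */
        if (next_bytes > 0) dma_get(local[next], data + next_off, next_bytes);

        dma_wait(cur);                         /* current chunk has arrived */
        sum += process(local[cur], cur_bytes); /* work while next is in flight */

        offset = next_off;                     /* swap buffers */
        cur_bytes = next_bytes;
        cur = next;
    }
    return sum;
}
```

Note that the loop only works because `data` is one contiguous block: each request is just a base pointer plus an offset, which is exactly the layout constraint described above.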
I like using object-oriented programming techniques myself, and the hoops I would have to go through to get my data into the proper format do not sound fun, but then I haven't done any graphics or physics work worth speaking about. Certainly you won't want to use anything that has dynamic memory needs, but I assume that is avoided anyway in the console space, as allocating memory tends to be expensive.
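As a rough picture of what those hoops might look like, here is a small C sketch that gathers one field out of separately allocated objects into a contiguous array that could then be streamed in big blocks. The `Entity` and `Position` types and the `pack_positions` function are hypothetical, invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object: the position we want is mixed in with data the
   worker thread does not need, and objects are linked by pointers. */
typedef struct Entity {
    char name[32];
    float x, y, z;
    struct Entity *next;
} Entity;

/* Just the data the worker needs, with nothing else in between. */
typedef struct {
    float x, y, z;
} Position;

/* Walk the linked objects and copy their positions into one contiguous
   array. Returns the number of entries written (at most `capacity`). */
size_t pack_positions(const Entity *head, Position *out, size_t capacity) {
    assert(out != NULL || capacity == 0);
    size_t n = 0;
    for (const Entity *e = head; e != NULL && n < capacity; e = e->next) {
        out[n].x = e->x;
        out[n].y = e->y;
        out[n].z = e->z;
        n++;
    }
    return n;
}
```

The packing pass is itself extra work on the main CPU side, which is part of why this layout is usually chosen up front rather than converted to on the fly.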
At any rate, you won't be putting normal threads on the SPEs without losing a lot of performance. It sounds like you will need a significantly different approach to get the most out of Cell. Luckily, that approach should work well with XeCPU and PCs, but going the other way is really not an option.