My argument boils down to this:
1. Because memory coherence is software-based in CELL, inter-process and inter-thread communication is expensive. That is counterproductive (and that's being nice) when scaling your architecture to a massively multicore implementation.
2. Local stores have no automatic way of sharing data. When you want to parallelize a task that has a large shared dataset, you end up with an individual copy in each SPE. In other words, each SPE only ever has fast access to (less than) 256KB. So even though 8 SPEs have a total of 2MB, you effectively have only 256KB per SPE. Going forward to future implementations with, say, 24 SPEs, each SPE still only sees 256KB. Contrast this with a processor that has a cache memory hierarchy: say you have a dual core with a 2MB shared L2 cache. Going forward and doubling that, you get 4MB that all processors might take advantage of (constructive interference).
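To put numbers on point 2, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and it assumes the whole 256KB local store is available for data, which is optimistic since code and stack live there too.

```python
KB = 1024
MB = 1024 * KB

def local_store_capacity(num_spes, local_store_bytes=256 * KB):
    """Return (fast bytes usable by one SPE, total SRAM on chip).

    Hypothetical helper: with private local stores, a shared dataset is
    copied into each SPE, so the per-SPE figure never grows with SPE count.
    """
    total = num_spes * local_store_bytes
    per_spe = local_store_bytes  # upper bound; code and stack eat into it
    return per_spe, total

def shared_cache_capacity(shared_cache_bytes):
    """Return (fast bytes reachable by one core, total SRAM on chip).

    Hypothetical helper: with a shared cache, any core can hit on any line,
    so the per-core figure grows when the cache grows.
    """
    return shared_cache_bytes, shared_cache_bytes

if __name__ == "__main__":
    for spes in (8, 24):
        per_spe, total = local_store_capacity(spes)
        print(f"{spes} SPEs: {per_spe // KB}KB per SPE, {total // MB}MB total")
    for cache in (2 * MB, 4 * MB):
        per_core, total = shared_cache_capacity(cache)
        print(f"shared L2: {per_core // MB}MB per core, {total // MB}MB total")
```

This ignores everything else that matters (latency, bandwidth, contention on the shared cache); it only shows how the per-core figure scales in each design.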
The alternative is to spatially divide the dataset among the SPEs. But as Patsu mentions, if you have a hot region with many agents, one SPE does all the work while the others are just very expensive (unused) SRAM.
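The hot-region problem can be sketched the same way. This is a toy model under my own assumptions: a 1D world cut into equal strips, one strip per SPE, with work proportional to the number of agents in the strip. All names are hypothetical.

```python
def partition_load(agent_positions, num_spes, world_size):
    """Count agents per SPE when the world is cut into equal strips."""
    strip = world_size / num_spes
    load = [0] * num_spes
    for x in agent_positions:
        # Clamp so an agent sitting exactly on the far edge lands in the last strip.
        idx = min(int(x / strip), num_spes - 1)
        load[idx] += 1
    return load

def utilization(load):
    """Fraction of SPE time doing useful work if all wait for the busiest one."""
    busiest = max(load)
    if busiest == 0:
        return 0.0
    return sum(load) / (len(load) * busiest)

if __name__ == "__main__":
    world, spes = 80, 8
    spread_out = list(range(80))              # one agent per unit of space
    hot_region = [x % 10 for x in range(80)]  # same 80 agents, all in the first strip
    for name, agents in (("spread out", spread_out), ("hot region", hot_region)):
        load = partition_load(agents, spes, world)
        print(name, load, utilization(load))
```

With the agents spread out, every SPE gets an equal share. With all of them in one strip, a single SPE carries the whole load and the other seven sit idle.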
Cheers
Gubbi