Jawed
One of the goals in the design of Cell is to support "wildly-abandoned gather".

MrWibble said:
There are loads of algorithms where currently we access memory with wild abandon, simply because it was easiest to write them that way and it's not too bad on a typical CPU (though it's never the most sensible thing to do). However, that's not to say that they *have* to be done that way.
The memory flow controller (the MFC, one per SPE) behaves like a small CPU that can access the SPE's local store (LS). In LS it finds lists of memory addresses, how much data to fetch from each, and where to put it: from main memory to LS or vice versa, or from one LS to another LS.
It then works out how best to carry out those transfers and issues the corresponding commands to its own private DMA unit.
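To make the list-driven model concrete, here is a rough Python simulation (not Cell SDK code) of a DMA-list gather. Main memory and the 256 KB local store are modelled as byte arrays, and each list element names an effective address and a size. The names `ListElement` and `dma_list_get`, the back-to-back packing of elements in LS, and the 16 KB per-element cap are assumptions for illustration; the real MFC also enforces alignment rules this sketch ignores.

```python
from dataclasses import dataclass

LS_SIZE = 256 * 1024          # an SPE's local store is 256 KB
MAX_ELEMENT_SIZE = 16 * 1024  # assumed cap on a single list element's transfer


@dataclass
class ListElement:
    """One entry of a (hypothetical) DMA list kept in LS."""
    ea: int    # effective address in main memory
    size: int  # number of bytes to transfer


def dma_list_get(main_mem: bytearray, ls: bytearray, ls_addr: int,
                 elements: "list[ListElement]") -> int:
    """Gather scattered main-memory blocks into one contiguous LS region.

    Simplified model of the MFC walking a DMA list and issuing one transfer
    per element. Returns the LS address just past the gathered data.
    """
    cursor = ls_addr
    for el in elements:
        if not 0 < el.size <= MAX_ELEMENT_SIZE:
            raise ValueError(f"bad transfer size {el.size}")
        if el.ea < 0 or el.ea + el.size > len(main_mem):
            raise ValueError("source block outside main memory")
        if cursor + el.size > len(ls):
            raise ValueError("gather would overflow local store")
        ls[cursor:cursor + el.size] = main_mem[el.ea:el.ea + el.size]
        cursor += el.size
    return cursor


# Usage: pull two non-adjacent blocks into the start of LS.
main_mem = bytearray(range(256)) * 16  # 4 KB of test data; byte i holds i % 256
ls = bytearray(LS_SIZE)
end = dma_list_get(main_mem, ls, 0, [ListElement(ea=512, size=16),
                                     ListElement(ea=64, size=32)])
```

The point of the list is that the SPU describes the whole gather once and hands it off, rather than issuing each scattered load itself.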
What's unclear to me is how predictable the latency of gathers and scatters is, and whether developers can program SPEs in a way that tolerates the unpredictable latency associated with "wildly abandoned gathers".
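One standard way to tolerate that latency is double buffering: start the transfer for the next chunk before processing the current one, so the wait overlaps useful work. This is a Python model using a toy, hypothetical `SimulatedMFC` whose `get()` only queues a transfer and whose `wait()` completes it (tags here are loosely analogous to MFC tag groups). It shows the ordering of operations, not real timing.

```python
class SimulatedMFC:
    """Toy stand-in for an asynchronous DMA engine (hypothetical API).

    get() only queues a transfer tagged with `tag`; the copy is performed when
    wait(tag) is called, mimicking "issue now, block later" semantics.
    """

    def __init__(self, main_mem: bytearray):
        self.main_mem = main_mem
        self.pending = {}  # tag -> (destination buffer, source address, size)
        self.log = []      # record of operations, to show their ordering

    def get(self, tag, dst, ea, size):
        self.pending[tag] = (dst, ea, size)
        self.log.append(("get", tag, ea))

    def wait(self, tag):
        dst, ea, size = self.pending.pop(tag)
        dst[:size] = self.main_mem[ea:ea + size]
        self.log.append(("wait", tag))


def double_buffered_sum(mfc: SimulatedMFC, total: int, chunk: int) -> int:
    """Sum `total` bytes of main memory, `chunk` bytes at a time, via two LS buffers."""
    n_chunks = (total + chunk - 1) // chunk
    if n_chunks == 0:
        return 0
    bufs = [bytearray(chunk), bytearray(chunk)]

    def size_of(i):
        return min(chunk, total - i * chunk)

    acc = 0
    mfc.get(0, bufs[0], 0, size_of(0))  # prime the pipeline with chunk 0
    for i in range(n_chunks):
        cur = i % 2
        if i + 1 < n_chunks:
            nxt = (i + 1) % 2
            # Start fetching the next chunk before touching the current one.
            mfc.get(nxt, bufs[nxt], (i + 1) * chunk, size_of(i + 1))
        mfc.wait(cur)  # block only on the buffer needed right now
        acc += sum(bufs[cur][:size_of(i)])
        mfc.log.append(("process", i))
    return acc


data = bytearray(range(256)) * 4  # 1 KB of test data
mfc = SimulatedMFC(data)
total = double_buffered_sum(mfc, len(data), 100)
```

This only hides latency if each chunk's processing takes roughly as long as its transfer; for truly random gathers the list of addresses itself has to be known early enough to issue ahead.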
That's not to say that a traditional L2 cache is better: it seems to me that a cache would also stall the thread while it waits for the gathered data to turn up.
I think the fact that LS isn't restricted to the "n-way" associativity normal for caches gives programmers finer-grained control over when data is loaded into, and "evicted" from, LS than caches normally allow.
The other side of the coin is the coding overhead of maintaining LS data, work that a cache normally does with significantly more hardware support.
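A minimal sketch of what explicit LS management might look like, assuming a fully associative pool of fixed-size slots. The 128-byte line size and the `ExplicitLS` class are hypothetical. Any line can go in any free slot, so there are no set conflicts of the kind an n-way cache suffers, but nothing leaves LS unless the program asks, and the bookkeeping a cache does in hardware becomes code.

```python
LINE = 128  # bytes per software-managed "line" (an assumption for this sketch)


class ExplicitLS:
    """Hypothetical software-managed LS region: fully associative slots,
    with eviction entirely under program control."""

    def __init__(self, main_mem: bytearray, n_slots: int):
        self.main_mem = main_mem  # assumed to be a multiple of LINE bytes long
        self.slots = [bytearray(LINE) for _ in range(n_slots)]
        self.where = {}           # line base address -> slot index
        self.free = list(range(n_slots))

    def fetch(self, ea: int) -> bytearray:
        base = ea - ea % LINE
        if base in self.where:    # already resident: no transfer needed
            return self.slots[self.where[base]]
        if not self.free:
            raise MemoryError("LS full: program must evict something first")
        slot = self.free.pop()
        self.slots[slot][:] = self.main_mem[base:base + LINE]
        self.where[base] = slot
        return self.slots[slot]

    def evict(self, ea: int, write_back: bool = False) -> None:
        base = ea - ea % LINE
        slot = self.where.pop(base)
        if write_back:            # copy modified data back to main memory
            self.main_mem[base:base + LINE] = self.slots[slot]
        self.free.append(slot)


mem = bytearray(1024)             # stand-in for main memory
pool = ExplicitLS(mem, n_slots=2)
pool.fetch(0)[0] = 7              # modify the LS copy of line 0
pool.fetch(300)                   # brings in the line starting at 256
pool.evict(0, write_back=True)    # the program decides what leaves LS, and when
```

The `where` map and free list are the coding overhead in miniature; a hardware cache tracks the same information without the programmer writing any of it.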
http://www-128.ibm.com/developerworks/power/library/pa-celldmas/