aaronspink
Veteran
DeanoC said:
1) Random access - you need to localise or predict data you need to read OR write (writing is even more interesting than reading...).
Christ on toast! I hadn't even thought of that one yet. In the normal processor world we do a lot of things that effectively defer writes to cache/memory. In essence, with a modern processor, you should rarely if ever encounter a stall caused by a store. This is primarily because we keep track of them and handle all the nasty coherence and memory-order-model issues. In the DMA/LS model you can't really do this, as you need the data space that you will be storing into to already be loaded in the LS. You can get away without doing that, but only if you can guarantee that you're overwriting the whole thing, which is unfortunately pretty hard to arrange in practice.
DeanoC said:
2) Synchronisation - It's fairly slow and has a minimum lockable node size (the atomic unit has a large quantum, so you can't lock data smaller than that).

Is this documented anywhere? How big is the quantum? I can see how that would present problems for a lot of known parallelized algorithms. Even when synchronization is currently rare, you generally split the locking structure into small enough quanta that the probability of an actual conflict is minimal.
DeanoC said:
3) Efficiency - Keeping the algorithm fast.

That damn Amdahl guy again.
Aaron spink
speaking for myself inc.