PDA

View Full Version : exponential backoff in atomics?


wadeb
20-Nov-2007, 20:31
Hello,

I am wondering if anyone else has found the need to implement exponential backoff in their low level atomics on Cell - atomic Add, Or, And, CompareAndSwap, etc.

We are seeing an issue in our job manager where all 6 SPUs and both PPUs are trying to lock the same mutex, and it *occasionally* takes several milliseconds of fighting before one of them gets it.

The locking mechanism is essentially:

while (!CompareAndSwap(&mutex->Lock, MyThreadID, 0));

The PPU threads are doing a yield in the while loop, the SPU threads are doing nothing.

CompareAndSwap is implemented using lwarx/stcwx on PPU and getllar/putllc on SPU.

So I'm wondering:

a) Has anyone else seen this behavior in their applications?
b) Is exponential backoff the right solution?

I'm thinking of something similar to what ethernet does: http://en.wikipedia.org/wiki/Truncated_binary_exponential_backoff
But I will need my random function to be fast and to not have degenerate cases that would still cause fighting.

Regards,
Wade

Vitaly Vidmirov
21-Nov-2007, 18:53
takes several milliseconds of fighting before one of them gets it

Sounds weird. Probably mutex is owned by blocked PPE thread?
One millisecond is 3.2 million cycles. More than enough to render a couple of frames in Quake1 :grin:
Did you tried non-blocking queues or something to reduce PPE<->SPE communication?

wadeb
24-Nov-2007, 15:48
Hi there, thanks for the response.

Yeah, it is possible in this situation that the PPU thread is being swapped out while having the mutex acquired, it might not be a real livelock situation. FWIW the problem was resolved for now by switching to a lock free queue implementation (versus head / tail mutex implementation), although that approach will encounter a similar problem when a thread pops a queue element it cannot handle, and gets interrupted before pushing it back on the queue, leaving the other threads waiting.

However, it seems unlikely - for the above reasons, we are careful to give each PPU core only one thread that takes any significant time. Any interference would be coming from the OS.

-----------------------------

I'm still wondering if having 8 cores simultaneously reserving, modifying and conditionally storing the same cache line as fast as they can is a good thing for overall latency, lock free queue or mutex notwithstanding.

I'm also wondering if the differences between PPU lwarx/stcwx and SPU getllar/putllc could contribute to livelocks in a way that exponential backoff would help.

Regards,

Wade

patsu
25-Nov-2007, 19:40
wadeb, you might also want to post your question at the CBE OSS mailing list: http://ozlabs.org/pipermail/cbe-oss-dev/2007-November/thread.html

The people working on PS3 Linux kernel are there.