That seems like just 1 dev, not multiple devs, and still using the same horribly wrong test program.
As Linus writes in one of his other RWT follow-ups:
See what I'm trying to explain here?
The fact is, doing your own locking is hard. You need to really understand the issues, and you need to not over-simplify your model of the world to the point where it isn't actually describing reality any more.
And no, any locking model that uses "sched_yield()" is simply garbage. Really. If you use "sched_yield()" you are basically doing something random. Imagine what happens if you use your "sched_yield()" for locking in a game, and somebody has a background task that does virus scanning, updates some system DB, or does pretty much anything else at the time?
Yeah, you just potentially didn't just yield cross-CPU, you were yielding to something else entirely that has nothing to do with your locking.
sched_yield() is not acceptable for locking. EVER. Not unless you're an embedded system running a single load on a single core.
If I haven't convinced you of that by now, I don't know what I can say.
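
To make the quoted point concrete, here is a rough sketch in C of the two approaches. It is not the test program under discussion, and every name in it (yield_lock, run_mutex_demo, and so on) is made up for illustration. The first half shows the kind of sched_yield() loop being criticized; the second half does the same job with a plain pthread mutex, where a blocked waiter is tied to the lock it is waiting on (on Linux, glibc builds this on futexes) instead of handing the CPU to whatever else happens to be runnable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical yield-based lock: the pattern the quote calls garbage.
 * Initialize with: yield_lock l = { ATOMIC_FLAG_INIT }; */
typedef struct { atomic_flag held; } yield_lock;

void yield_lock_acquire(yield_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* Gives up the CPU to *some* runnable thread. The scheduler has no
         * idea we are waiting for the lock holder in particular. */
        sched_yield();
    }
}

void yield_lock_release(yield_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* The alternative: let a real lock do the waiting. */
#define MAX_THREADS 64

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        pthread_mutex_lock(&counter_mutex);   /* waiter blocks on this lock */
        counter++;
        pthread_mutex_unlock(&counter_mutex); /* wakes a waiter if there is one */
    }
    return NULL;
}

/* Runs nthreads workers, each incrementing the shared counter iters times,
 * and returns the final count. */
long run_mutex_demo(int nthreads, long iters) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &iters);
        assert(rc == 0);
        (void)rc; /* silence unused warning when NDEBUG is defined */
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

Calling run_mutex_demo(4, 10000) from a main() and building with -pthread should return 40000. Both locks would give a correct count here; what differs is what the waiters do under contention, and that is exactly what a benchmark built around sched_yield() ends up measuring.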