On top of that, I find it rather absurd to think that grabbing an extra SPE on demand and taking over an existing thread is feasible. I mean, the contents of the LS are part of the state, which means that if you interrupt a working thread, you have to back all of that up somewhere and then restore it when you're done, which could easily take hundreds of thousands of cycles.
Yes, it's quite costly, yet that's exactly what happens. The OS needs to perform some (presumably audio) tasks on demand, and it swaps out the lowest-priority SPE thread to do so.
This is exactly how the Xbox 360 operates as well, hence the OS use of core 3, which I've seen reported at anywhere from 5% to 25%.
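
To put a rough number on the cost of backing up and restoring an SPE's state, here is a back-of-envelope sketch. It assumes a 256 KB local store plus 128 registers of 128 bits each as the state to move, and it treats the sustained DMA rate as a free parameter, since I don't have measured figures for an actual swap. The function name `swap_cycles` and the sample rates are made up for illustration; 8 bytes per cycle is meant as an optimistic peak, the lower rates as guesses at a contended bus.

```python
# Back-of-envelope estimate of an SPE context swap.
# The DMA rates below are assumptions, not measurements.

LS_BYTES = 256 * 1024   # SPE local store
REG_BYTES = 128 * 16    # 128 registers x 128 bits (16 bytes each)
CLOCK_HZ = 3.2e9        # Cell clock speed


def swap_cycles(bytes_per_cycle: float) -> int:
    """Cycles to save one thread's state and later restore it.

    Counts one full copy out and one full copy back in. It ignores
    loading the task that runs in between, plus any OS overhead.
    """
    state = LS_BYTES + REG_BYTES
    return int(2 * state / bytes_per_cycle)


if __name__ == "__main__":
    for rate in (8.0, 2.0, 1.0):  # assumed sustained bytes per cycle
        cycles = swap_cycles(rate)
        micros = cycles / CLOCK_HZ * 1e6
        print(f"{rate:>4} B/cycle: {cycles:>7} cycles (~{micros:.0f} us)")
```

Even at the optimistic rate this comes out in the tens of thousands of cycles, and at the lower assumed rates it reaches the hundreds of thousands, so the estimate in the quote is in a plausible range under these assumptions.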