I'd have thought tessellation on SPUs would work just as well. The difference with PS3 being that it'd take away from other activities, whereas on the XB360, if you don't use the tessellator it sits there idle. In PS3's case, you need those cycles 'going spare'.
There are three issues there. The first of course, as you mention, is that the SPUs are already busy doing other things. Look back on many threads here on B3D and all the things SPUs get mentioned with. Hey, let's throw vertex processing on SPUs, shadow calcs, lighting, post processing, tessellation, AI, culling, animation blending, texture decompression, etc, etc. The SPUs are fast but they are still finite, and they currently have a million other things to do!
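Just to put some rough shape on that "finite" point, here's a back-of-envelope sketch. Every number in it is invented for illustration (the per-job cycle costs are not from any real title), but it shows how fast a frame's worth of SPU time evaporates once everyone's pet job lands on the SPUs:

```python
# Hypothetical per-frame SPU budget (all numbers invented for illustration).
# 5 usable SPUs at 3.2 GHz, targeting 30 fps.
SPUS = 5
HZ = 3.2e9
FPS = 30
budget = SPUS * HZ / FPS  # ~533M SPU cycles available per frame

# Made-up cycle costs for jobs commonly suggested for SPUs:
jobs = {
    "vertex processing":   150e6,
    "shadow calcs":         80e6,
    "post processing":     100e6,
    "culling":              40e6,
    "animation blending":   60e6,
    "texture decompress":   50e6,
    "AI":                   70e6,
}
used = sum(jobs.values())  # 550M cycles requested

print(f"budget {budget / 1e6:.0f}M cycles, requested {used / 1e6:.0f}M")
print("over budget!" if used > budget else "fits")
```

With those invented costs the request already overshoots the frame budget before tessellation is even on the list.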
Secondly, tessellation was meant to reduce CPU load. So you do all your CPU processing on a reduced mesh, then let the tessellator expand it into something that looks nicer. But this new improved mesh still must run through the vertex shader before heading on to the pixel shader. In other words, you can still be bitten by vertex processing limitations even with SPUs tessellating. For example, let's say you have magically moved 100% of your vertex processing to the SPUs and your vertex shaders do nothing. They may do no shader math, but they still have to spend time fetching streams of data, interpolating them, and passing them on to the pixel shaders, which likely still have some work to do. CPU-side tessellation will increase this load. So if you unfortunately need, say, 3 or more vectors of graphics data on RSX (which causes performance hits), then those hits will be multiplied when SPU-side tessellation is used, even if your vertex shaders do nothing. I might be wrong here since I've never been allowed to use the 360's hardware tessellator, but I believe the limitation there is similar, in that the hardware tessellator does its thing pre-vertex-shader, then that new uber mesh in its entirety goes through the normal graphics pipeline of vertex and pixel processing. I'm sure someone will correct me if I'm wrong on that.
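Here's that "the fixed cost multiplies" point as a toy calculation. The cycles-per-attribute-vector figure and the 4x tessellation factor are both invented; the only thing the sketch demonstrates is that a per-vertex setup cost, paid even by a do-nothing vertex shader, scales straight up with the tessellated vertex count:

```python
# Illustrative sketch (invented numbers): even a "do nothing" vertex shader
# pays a fixed per-vertex cost for fetching attribute streams and setting up
# interpolants for the pixel shader, and that cost grows with the number of
# attribute vectors passed along.
def vertex_fixed_cost(verts, attribute_vectors, cycles_per_vector=4):
    # Hypothetical: each attribute vector costs some fetch/interpolate cycles
    # per vertex, regardless of shader math.
    return verts * attribute_vectors * cycles_per_vector

base_mesh = 100_000
tessellated = base_mesh * 4  # pretend SPU tessellation expands the mesh 4x

# With 3 attribute vectors, the fixed cost is multiplied right along with
# the vertex count -- no shader instructions required:
before = vertex_fixed_cost(base_mesh, 3)
after = vertex_fixed_cost(tessellated, 3)
print(after / before)  # 4.0 -- the tessellation factor, applied to the "free" shader
```

So any per-vertex penalty you were already eating on RSX gets scaled by whatever expansion factor the SPUs produce.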
Thirdly, the big issue with all this SPU graphics work is stalls. Using the SPUs in your graphics pipeline can result in large dependency chains: 'a' depends on the completion of 'b', which depends on 'c', which depends on 'd', etc. When all these dependencies are spread across GPU+CPU processing, the likelihood of stalls increases. Massive parallelism works great on GPUs because they have tons of processors at which the entirety of the graphics problem is hurled. In that methodology the "graphics processing" part can almost be seen as an atomic process the machine will happily grind away at, scheduling, shifting and prioritizing tasks as only it can across its myriad of processors to reduce stalls automatically. When you have humans trying to do the same with SPUs+GPU, the odds of keeping everything 100% fed with data are low, meaning some processing will idle/stall. Throw in the randomness of a game application and this likelihood increases somewhat. Throw in a severely fractured pipeline, say where CPU/GPU processing is heavily intermixed, and good luck hitting peak performance.
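A toy timeline makes the stall cost visible. The stage timings are completely made up; the point is just that once GPU stages depend on SPU output within the same frame, the whole chain serializes, and the gap between the serial time and the longest single stage is time something spends idle:

```python
# Toy frame timeline (made-up millisecond timings). Each stage depends on
# the previous one's output, so intra-frame intermixing serializes them.
def serial_frame_time(stages):
    # Fully dependent chain: stages run back-to-back, no overlap possible.
    return sum(t for _, t in stages)

pipeline = [
    ("SPU cull",       2.0),  # ms, invented
    ("SPU tessellate", 3.0),
    ("GPU vertex",     4.0),
    ("GPU pixel",      6.0),
]

serial = serial_frame_time(pipeline)          # 15.0 ms, dependent chain
ideal = max(t for _, t in pipeline)           # 6.0 ms if everything overlapped perfectly
print(serial, ideal)  # 15.0 6.0 -- the GPU's 10 ms of work sits inside a 15 ms frame
```

Pipelining across frames (SPUs prep frame N+1 while RSX draws frame N) can hide much of this, but the more the dependencies cross back and forth mid-frame, the less of that overlap you get to keep.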
EDIT: Oops, sorry Al, saw your reply after I posted.