Most GPUs seem to be designed so that both T&L and setup run at full speed with 1 vertex/tri transformed by a 4-instruction shader. With longer shaders you'll be T&L limited. Even with a really good mesh at 0.5 vertex/tri, you'd already be at the T&L limit with an 8-instruction shader.
That's a rather short shader. 4 instructions is enough for one matrix*vector operation. Do some matrix palette skinning, morphing and/or per-vertex lighting, and you'll run far more instructions than that. And then the setup engine has plenty of time to spare.
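To put rough numbers on that, here's a minimal Python sketch of the balance argument. It assumes that T&L and setup break even at 1 vertex/tri with a 4-instruction shader, and that T&L cost scales linearly with instruction count. The function names are made up for illustration, not taken from any real hardware spec.

```python
def tl_load(verts_per_tri, shader_instructions, balanced_instructions=4.0):
    """T&L work per triangle relative to setup work (1.0 = break-even).

    Assumption: the chip is balanced at 1 vertex/tri with a
    `balanced_instructions`-long shader, and cost is linear in length.
    """
    return verts_per_tri * shader_instructions / balanced_instructions


def bottleneck(verts_per_tri, shader_instructions):
    """Name the stage that limits throughput in this toy model."""
    load = tl_load(verts_per_tri, shader_instructions)
    if load > 1.0:
        return "T&L"
    if load < 1.0:
        return "setup"
    return "balanced"


if __name__ == "__main__":
    for verts, instr in [(1.0, 4), (1.0, 8), (0.5, 8), (0.5, 16), (1.0, 16)]:
        print(verts, instr, tl_load(verts, instr), bottleneck(verts, instr))
```

In this model 0.5 vertex/tri with 8 instructions sits exactly at break-even, and anything longer is T&L limited. A 16-instruction shader at 1 vertex/tri leaves setup idle three quarters of the time.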
The same goes for the sage/rampage bus. If you're running "advanced" shaders (which don't really need to be all that advanced), then you're not pushing that many vertices over that bus, so there is time left for two sages to push data.
There *could* be one shared bus. When a sage finishes a vertex, it sends it to both rampages simultaneously on one bus. When the other sage finishes its vertex, it does the same thing, over the same bus. The sages would need some extra connection to keep in sync so they know when they can send data.
If you run 4-instruction shaders on such a setup, there will of course be stalls that drag performance down to single rampage levels. But in such a situation you likely wouldn't be limited by T&L anyway.
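A rough sketch of that shared-bus idea in Python, under loudly stated assumptions: each sage retires one shader instruction per clock, and the shared bus needs 4 clocks to broadcast one vertex (picked so that a single sage running a 4-instruction shader exactly saturates it). Both numbers are illustrative guesses, not known figures for the real chips.

```python
def vertex_rate(n_sages, shader_instructions, bus_cycles_per_vertex=4):
    """Vertices per clock delivered over one shared bus.

    Assumptions (hypothetical): 1 instruction/clock per sage, and the
    bus carries one vertex every `bus_cycles_per_vertex` clocks.
    """
    compute_rate = n_sages / shader_instructions  # what the sages can produce
    bus_rate = 1 / bus_cycles_per_vertex          # what the bus can carry
    return min(compute_rate, bus_rate)


if __name__ == "__main__":
    for instr in (4, 8, 16, 32):
        single = vertex_rate(1, instr)
        dual = vertex_rate(2, instr)
        print(f"{instr:2d} instr: 1 sage {single:.4f}, 2 sages {dual:.4f} verts/clock")
```

With 4-instruction shaders the second sage buys nothing because the bus is already full. From 8 instructions up, two sages deliver twice the vertex rate of one.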
Geeforcer:
It wouldn't make it possible to run longer shaders. But "long" shaders would get higher throughput.