That's what I believe, too, tEd. Of course, smaller granularity is more efficient, but doubling the number of pipes is far more expensive than doubling the number of ALUs per pipe. Ignoring the mini ALUs, R420's arithmetic-to-texture ratio is 1:1, while R500's is 3:1 (counting vec4 ALUs). It seems plausible that R520 sits somewhere in between, with two full ALUs per pipeline (2:1) but still only 16 pipes.
As for NV47, however, if that part actually exists, NVIDIA certainly had less time and fewer resources to design it. So it's more likely they'll reuse the existing design and simply add more pipes. Additionally, the ROPs are already decoupled and run at a lower clock (at least in the high-end parts), and each pipe already has two shader units, though they aren't identical.