The pipeline configuration of a card is an implementation detail. Over the years, we have gone back and forth between 1 and 3 texture units per pipeline.
(Voodoo2: 2 units per pipe; TNT: 1 per pipe; Voodoo5: 1 per pipe; GF2: 2 per pipe; R100: 3 per pipe; R300: 1 per pipe; GF FX: variable pipes...)
The optimal pipeline configuration depends on the rendering task at hand. If most of your pixels are single textured, then an x1 config is better; if you are doing Quake-style multitexture or an even number of texture stages, an x2 config is better.
For games that switch back and forth between single textured passes and multitextured shader passes, it depends on which pass requires the highest fillrate. Obviously, it would be nice if the configuration of the pipelines themselves were programmable.
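To put rough numbers on the x1 versus x2 tradeoff, here is a toy model. It is only a sketch: it assumes a pipe needs ceil(layers / units per pipe) clocks to finish a pixel, and it ignores memory bandwidth, filtering cost, and everything else a real chip has to deal with. The function names are made up for illustration.

```python
from math import ceil


def pixels_per_clock(pipes, tmus_per_pipe, layers):
    """Peak pixels written per clock in the toy model.

    Assumption: each pipe spends ceil(layers / tmus_per_pipe) clocks
    on a pixel, and at least one clock even with no textures.
    """
    clocks_per_pixel = max(1, ceil(layers / tmus_per_pipe))
    return pipes / clocks_per_pixel


def tmu_utilization(tmus_per_pipe, layers):
    """Fraction of texture unit slots doing useful work per pixel."""
    clocks_per_pixel = max(1, ceil(layers / tmus_per_pipe))
    return layers / (tmus_per_pipe * clocks_per_pixel)


# Compare a 4x1 and a 4x2 layout for 1 to 4 texture layers.
for layers in (1, 2, 3, 4):
    print(
        layers,
        pixels_per_clock(4, 1, layers),
        pixels_per_clock(4, 2, layers),
        tmu_utilization(2, layers),
    )
```

In this model the second texture unit on a 4x2 part buys nothing for single textured pixels (half the units sit idle), pays off fully at two or four layers, and is only 75% used at three layers. An 8x1 and a 4x2 layout come out equal for dual texturing, which is why the pipe count alone does not tell you much.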
It is not true that 8 pipelines are always better than 4, since it depends on what you are rendering. Remember, both the Voodoo5 and TNT had "variable pipes". In single texturing, you had twice as many pipes available for writing pixels as in multitexturing mode. Also, depending on whether trilinear was enabled or not, some cards would "lose" a texture unit.
The GF FX's architecture is probably just another variation on the kind of limitation the V5, TNT, etc. had, where you lost resources depending on rendering state.
It would be nice if the NV35 could do something like: write 16 Z+Stencil values OR write 8 dual textured pixels per cycle. That would take care of the Doom3 case quite nicely.
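A back-of-the-envelope sketch of why that would help: the code below compares a chip that writes 8 pixels per clock in every mode against the hypothetical 16 Z+stencil / 8 dual textured design. The pixel counts are invented purely for illustration (a frame dominated by Z and stencil fill, as with stencil shadow volumes), and the model counts peak clocks only, with no bandwidth or setup costs.

```python
def clocks_for_pass(pixels, pixels_per_clock):
    """Clocks needed to write `pixels` at a given peak rate (rounded up)."""
    return -(-pixels // pixels_per_clock)


def frame_clocks(z_stencil_pixels, textured_pixels, z_rate, color_rate):
    """Total clocks for one Z/stencil-only pass plus one textured pass."""
    return (clocks_for_pass(z_stencil_pixels, z_rate)
            + clocks_for_pass(textured_pixels, color_rate))


# Invented workload: 3M Z/stencil-only pixels, 1M dual textured pixels.
Z_STENCIL_PIXELS = 3_000_000
TEXTURED_PIXELS = 1_000_000

fixed = frame_clocks(Z_STENCIL_PIXELS, TEXTURED_PIXELS, z_rate=8, color_rate=8)
two_mode = frame_clocks(Z_STENCIL_PIXELS, TEXTURED_PIXELS, z_rate=16, color_rate=8)

print(fixed, two_mode)  # 500000 312500
```

With these made-up numbers the two-mode design finishes the frame in about 62% of the clocks, and the gain grows with the share of Z/stencil-only pixels.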