If we consider the concept of a unified shader design with a farm of "ALUs" assigned work units (shader code) against a pool of pixels/vertices, would this be a strong argument against a dual-GPU graphics card?
I'm thinking that latency across the two GPUs' discrete pixel/vertex pools becomes prohibitive, or that the architectural benefits of the farm<->pool design are heavily diluted by halving (at least) the chance that any given ALU can work on any given pixel/vertex. i.e. if an ALU becomes "free" to work on a pixel, but the pixel that's "ready" at that moment sits in the other GPU's pool, then you've lost some of the benefits of the unification.
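To put a rough number on that "lost benefit", here's a toy sketch (purely illustrative, all parameters hypothetical) comparing a unified farm, where any free ALU can grab any ready pixel/vertex, against a split design, where half the ALUs are tied to each GPU's pool. It just counts idle ALU-slots per timestep when the ready work happens to pile up in the "wrong" pool:

```python
import random

def idle_slots(total_alus, ready_per_pool, unified):
    """Idle ALU-slots in one timestep.
    unified: any ALU may take work from any pool.
    split:   half the ALUs are tied to each pool."""
    if unified:
        return max(0, total_alus - sum(ready_per_pool))
    half = total_alus // 2
    return sum(max(0, half - r) for r in ready_per_pool)

def simulate(steps=10000, alus=16, mean_ready=16, seed=42):
    """Scatter each step's ready pixels/vertices randomly across
    two pools and accumulate idle slots for both designs."""
    rng = random.Random(seed)
    idle_unified = idle_split = 0
    for _ in range(steps):
        total = rng.randint(0, 2 * mean_ready)
        pool0 = rng.randint(0, total)
        pools = (pool0, total - pool0)
        idle_unified += idle_slots(alus, pools, unified=True)
        idle_split += idle_slots(alus, pools, unified=False)
    return idle_unified, idle_split
```

The split design can never do better than the unified one here (an ALU idles whenever its own pool runs dry, even if the other pool has a backlog), which is exactly the effect I'm describing.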
Alternatively, if you consider that the unified model is designed to hide the latency of memory, and that a moderate increase in latency caused by a multi-GPU architecture can be overcome by a small increase in the overall capacity of the farms and pools, then maybe multi-GPU isn't fatal.
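A back-of-the-envelope way to see how much extra capacity that would take (a Little's-law style estimate; the cycle counts below are made-up, not any real GPU's figures): to keep the ALUs busy you need enough pixels/vertices in flight to cover the memory latency, so added cross-GPU latency translates directly into a bigger required pool.

```python
import math

def threads_to_hide_latency(alus, compute_cycles, mem_latency):
    """Work items that must be in flight so the ALUs never stall:
    each item computes for compute_cycles, then waits mem_latency
    cycles on memory before it can be scheduled again."""
    return math.ceil(alus * (compute_cycles + mem_latency) / compute_cycles)

# Hypothetical numbers: 16 ALUs, 4 cycles of ALU work per fetch.
local = threads_to_hide_latency(16, 4, 100)   # single-GPU memory latency
remote = threads_to_hide_latency(16, 4, 140)  # +40 cycles crossing GPUs
```

With these made-up figures the in-flight pool grows from 416 to 576 items, i.e. the multi-GPU latency is survivable so long as the pools (and the scheduling hardware tracking them) scale up accordingly.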
Perhaps a single shared pool between multiple cores would be required. Blimey, what kind of memory controllers are you talking about then?
It's also interesting to think about whether ATI's "super-tiled" (R300) architecture naturally progresses (by way of increased granularity) into a unified architecture. This transition seems to require an increase in granularity in both functionality and time. If super-tiling provides such a neatly load-balanced approach to a multi-GPU architecture, then I suppose a unified architecture would follow quite smoothly, being nothing more than a finer-grained version of the super-tiled architecture.
Well, I expect I'm talking to myself...
Jawed