Freak'n Big Panda
Regular
It's not the memory BW that's the issue as much as the BW between the GPUs. Making a 50-100 GB/s connection between two GPUs or having a "northbridge" with 100 GB/s connections to the RAM and each GPU isn't particularly easy, and the latter would waste gobs of silicon. You need a lot of pins running at a very high speed to get that kind of a connection.
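To put rough numbers on the pin count, here's a back-of-envelope sketch. The 4 Gbit/s per-pin signalling rate is just a figure I picked for illustration, not a spec for any real part, and `pins_needed` is a made-up helper. It only counts data pins in one direction, so clocks, power/ground and any return path would push the real number higher.

```python
import math

def pins_needed(bandwidth_gbytes_per_s, per_pin_gbits_per_s):
    """Data pins required for a one-way link of the given bandwidth.

    Counts data signals only; clock, strobe, power and ground pins
    are ignored, so treat the result as a lower bound.
    """
    total_gbits_per_s = bandwidth_gbytes_per_s * 8  # bytes -> bits
    return math.ceil(total_gbits_per_s / per_pin_gbits_per_s)

# Assumed per-pin rate, chosen purely for illustration.
PER_PIN_GBITS = 4

for bw in (50, 100):
    print(f"{bw} GB/s -> {pins_needed(bw, PER_PIN_GBITS)} data pins")
```

Under that assumption, the 50-100 GB/s range works out to somewhere between 100 and 200 data pins per direction, which is why the link isn't cheap.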
I do think it will eventually happen, though. Maybe we'll see fibre-optic GPU interconnects in a few generations.
I think it's technically possible to use the GDDR protocol between the memory controllers on adjacent GPUs. It may also be possible to put switch chips between the DRAM channels of each GPU to create a UMA. Either of these solutions could potentially solve the persistent surfaces issue.