The main thing I was thinking is that the total size of a single die is largely limited by engineering concerns: beyond a certain size, yields start to fall significantly. So what tends to happen is that die size maxes out, and designs go from one die of a certain size to two or more dies at roughly the same size. Thus the distributed design tends to have a much greater total die area available to it.
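To make the yield part of that argument concrete, here's a rough sketch using the classic Poisson defect model (yield ≈ e^(−D·A), where D is defect density and A is die area). The defect density and die areas below are made-up illustrative numbers, not figures for any real process:

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of dies with zero defects under a simple Poisson defect model."""
    return math.exp(-defects_per_mm2 * area_mm2)

D = 0.002  # hypothetical defect density, defects per mm^2

# One big 600 mm^2 die vs. two 300 mm^2 dies giving the same total area.
big = poisson_yield(600, D)    # ~30%
small = poisson_yield(300, D)  # ~55%

print(f"600 mm^2 die yield: {big:.1%}")
print(f"300 mm^2 die yield: {small:.1%}")
# Expected good silicon per unit of wafer area, two small dies vs. one big one:
print(f"Relative good silicon: {2 * 300 * small / (600 * big):.2f}x")
```

With these toy numbers, splitting the same silicon across two dies yields nearly twice as much usable area per wafer, which is why multi-chip designs can afford so much more total die area.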
So if each individual chip is limited in its ability to dissipate heat efficiently, then you should be able to build a more powerful system by sharing the processing among a larger number of chips: the larger number of chips will be less efficient and consume more total power, but the dramatic increase in die area should allow for higher total performance.
Unless, that is, the inefficiencies of distributing the processing across different chips outweigh the benefit of the additional silicon. This might happen if total power consumption is the most pressing issue, or if the chips become so much faster than the communication buses that they are continually starved for data to work on.
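As a toy illustration of that second failure mode, here's a roofline-style sketch of total throughput when several chips share one bus. Every number in it is hypothetical, chosen only to show the shape of the problem:

```python
# Toy model: total throughput is capped either by compute (N chips, each
# with a fixed rate) or by the shared bus that feeds them data.
CHIP_GFLOPS = 500        # hypothetical compute rate of one chip
BUS_GB_S = 400           # hypothetical shared bus bandwidth
BYTES_PER_FLOP = 0.25    # hypothetical data needed per operation

def total_throughput(n_chips):
    compute_limit = n_chips * CHIP_GFLOPS
    bandwidth_limit = BUS_GB_S / BYTES_PER_FLOP  # GFLOPS the bus can sustain
    return min(compute_limit, bandwidth_limit)

for n in (1, 2, 4, 8):
    print(f"{n} chip(s): {total_throughput(n):6.0f} GFLOPS")
# 1 -> 500, 2 -> 1000, 4 -> 1600, 8 -> 1600: past a point, extra chips sit idle.
```

Once the bus is saturated, adding chips only adds power and cost, which is exactly the scenario where the single big die wins.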
As chips become denser and denser, and power constraints become more and more of an issue, I do wonder if the whole industry will move to SoC designs.
I would say that yes, it will. In hindsight, this has been the trend for decades.
FPUs and successive levels of cache were integrated into CPUs, networking chips were integrated into motherboard chipsets, along with sound chips, USB controllers, etc. Then northbridges (i.e. memory controllers) were integrated into CPUs, followed by PCI-Express controllers and then GPUs.
Now AMD's Kabini is about to be launched, and it's a full SoC. Haswell will not include anything on die that Ivy Bridge doesn't (I think), but it will have on-package VRMs. And apparently, there may be some SKUs with fast DRAM on an interposer.
So mainstream CPUs are looking more and more like SoCs. Meanwhile, ARM SoCs are getting more and more powerful, to the point that they can adequately power not just tablets but also light notebooks. Even large notebooks typically have no use for chips bigger than ~200 mm², whether pure CPUs or pure GPUs; only APUs exceed this size, and even they remain below 300 mm². The only reason notebooks still feature discrete graphics is memory bandwidth. Once that hurdle is overcome, discrete graphics in notebooks will be a thing of the past.
As long as die size remains a constraint on desktops, discrete graphics will make sense. But how long will that be? Processes continue to scale well in density, but not in power. What use are billions more transistors if you can't power them? Bill Dally recently stated that current GPUs are power-constrained more than transistor-constrained. When he said that, GF100 was NVIDIA's 550 mm² flagship, which made the statement seem rather strange, but I imagine he was speaking from his own perspective, that of someone looking at designs 2 to 6 years down the pipeline. So perhaps the days of the discrete GPU are numbered, even on desktops.