I was tempted to write an in-depth reply, but then I realised this entire discussion is just a short-term anomaly. All CPUs and GPUs are designed around one key limitation: massive external memory latency. It is no coincidence that the most interesting innovations of the last 10+ years have happened in DSP architectures with very small external bandwidth requirements (e.g. wireless basebands: Picochip, Icera, Coresonic, ...).
Is this a fundamental limitation of physics? No, it's only a fundamental limitation of electricity. While data movement will always be a potential bottleneck in a fixed-dimensional world (as opposed to a pointer-based world, as possibly implied by quantum teleportation), it seems unlikely that the cost of data movement per computation must be as high as it is today.
Imagine what would happen if external memory latency were so low that you had the equivalent of a 1GB L1 cache in terms of latency and bandwidth (with very low power consumption). Every single design consideration of modern architectures would fly right out of the window. Even fine-grained parallelism would be made orders of magnitude simpler. In the long term, silicon photonics is one viable contender - it might not get to that level of performance overnight, but it's far from impossible.
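To put a rough number on that thought experiment, here is a minimal sketch using Little's law (requests in flight = request rate x latency). The bandwidth, request size and both latency figures are made-up round numbers chosen for illustration only; they are not measurements of any real chip, and the function name is hypothetical.

```python
def requests_in_flight(bandwidth_bytes_per_s, latency_s, request_bytes):
    """Little's law: concurrency needed to sustain a given bandwidth.

    concurrency = (requests per second) * (seconds each request is outstanding)
    """
    requests_per_s = bandwidth_bytes_per_s / request_bytes
    return requests_per_s * latency_s


# Illustrative assumptions, not figures from any real part:
BANDWIDTH = 100e9        # 100 GB/s sustained bandwidth target
REQUEST_BYTES = 64       # one cache-line-sized request
DRAM_LIKE_LATENCY = 400e-9   # assumed external memory latency
L1_LIKE_LATENCY = 2e-9       # assumed L1-class latency

today = requests_in_flight(BANDWIDTH, DRAM_LIKE_LATENCY, REQUEST_BYTES)
imagined = requests_in_flight(BANDWIDTH, L1_LIKE_LATENCY, REQUEST_BYTES)

print(f"DRAM-like latency: ~{today:.0f} requests must be in flight")
print(f"L1-like latency:   ~{imagined:.1f} requests must be in flight")
```

Under these assumptions the requirement drops from hundreds of outstanding requests to a handful, which is the sense in which most of the latency-hiding machinery (deep queues, many threads, large caches) would stop being necessary.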
---
Nick, as to your points: even if you were right about area (which I'm very skeptical about), I think you're massively underestimating the data movement overhead and, more generally, the power consumption penalty of all this. And chip designers are more and more willing to sacrifice a LOT of area to save power, both directly at the architectural level and indirectly by reducing voltages.
So yeah, errr... we were talking about 22nm Larrabee I think? It is noteworthy that even Larrabee is severely limited by the cost of on-chip data movement (hi R5xx/R6xx ring bus). Adding more accelerators would have a lot of hidden costs. I think it's really important that both the software and hardware architectures are designed with data movement in mind, and this adds yet another layer of complexity that you wouldn't expect from a classical software renderer.