Let's not forget that a unified CPU + IGP would be for people who don't consider themselves gamers. So it's not that big a deal if graphics don't run at the highest possible efficiency. But merely by adding scatter/gather units there would be no need for wasting dedicated area on graphics (and note that scatter/gather helps multimedia and such as well).
I think the biggest problem is not the hardware, but rather the revolution it requires in terms of driver and software support.
It's hard enough to get something like CUDA/OpenCL/etc adopted.
So you could add nice 'graphics' features to your CPU, and even re-use them for the IGP functions of the same chip... but if no software is going to use such CPU extensions anyway, why would you even bother trying to integrate the two? In fact, it seems to be a bit of a catch-22... if this functionality is on the GPU anyway, and you can use something like OpenCL from your application... why would you even want it in your CPU? When using OpenCL, it really doesn't matter where the actual code is being run... whether it's on a CPU, IGP, discrete GPU... who cares?
We're starting to see this already... For years we bought fast CPUs for 3D rendering, video encoding, physics processing etc... now that GPUs are starting to take over these tasks, who is going to care how well a CPU does them? If the GPU is faster anyway, you might as well just leave the functionality out of the CPU to make it simpler, cheaper and faster at the things it DOES do better than a GPU.
I think it will be a long time before CPUs with integrated graphics are more than just an IGP circuit copy-pasted onto the CPU die.
I also think it will be a LONG time until a CPU core and a GPU core are going to be virtually identical.
I mean, if you look at Larrabee... while it is built on x86 technology, it doesn't resemble a Core2 or Core i7 in any way. With it being so different from x86 CPUs, I don't see Intel being able to merge the functionality of their GPUs with their CPUs anytime soon.
If it was just a case of taking a regular x86 CPU and adding some of the SIMD extensions that Larrabee receives, then that's what Intel would have done, but obviously that was not the way to go. You can say that 200 GFLOPS is enough for IGP-like graphics... but that only holds when you're talking about an actual IGP, which has more parallelism and various fixed-function units, making it far more efficient than a CPU with a few scatter/gather extensions.
And it's not just a high-end thing either, because Intel announced that their IGPs will be based on Larrabee... so it sounds like they'll just scale down Larrabee and copy-paste that onto their CPUs.
In fact, I wonder if that gap can ever be bridged... it's a case of a few complex execution cores aimed at maximum serial performance vs many simple execution cores aimed at maximum parallel performance. Sounds like Amdahl's law to me: you just need both types of cores, since software is built up from both types of code.
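To put some rough numbers on the Amdahl point, here's a quick sketch in Python. The function name and the 90% parallel fraction are just assumptions I picked for illustration, not measurements of any real workload or chip.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup when only part of the work scales.

    parallel_fraction: share of the runtime that can be spread over cores (0..1)
    n_cores: number of identical cores working on the parallel part
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Hypothetical workload: 90% of the runtime parallelises perfectly.
for cores in (1, 4, 32, 1024):
    print(cores, "cores ->", round(amdahl_speedup(0.9, cores), 2), "x")
```

Even with 1024 simple cores the speedup stays below 10x, because the 10% serial part never gets any faster. That's the part where the few fast, complex cores still matter.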