well, what springs to my mind right away is this: if you design a CPU+GPU package from the start, rather than as a way of condensing 2 originally separate dies, then you can potentially start off with much higher bandwidth between CPU and GPU on one chip, rather than having the chips communicate across a bus on the motherboard as is normally the case.
the way I see chips evolving [beyond CELL+RSX and Xenon+Xenos] is this: a new unified processing architecture. the first part is the CPU portion, which is designed to do CPU things.
the second part is the unified processing elements, which can be tasked to help either the CPU portion or the graphics portion. these units make up the majority of this new type of PROCESSOR in terms of transistors and space. they can be tasked on the fly to help the CPU part, the GPU part, or both, much like Xenos' unified shaders that can do either geometry processing or pixel processing.
the third part is the graphics rendering portion, which does only GPU things and contains hardwired functions for rasterizing and displaying graphics.
on a system level, you can have 1, 2, 3, 4 or as many of these processors as cost allows. graphics would enjoy a large boost in clock frequency, with much less latency, more bandwidth, and much more efficient use of processing / computational resources. it can do what CELL does not do well, that is, process graphics on its own, because it is a unified CPU+GPU with general purpose programmable units, CPU units and GPU units all in one chip.
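to make the "tasked on the fly" idea a bit more concrete, here is a toy sketch in Python. it is purely illustrative and not how any real chip schedules work: the function name `split_units`, the count of 32 unified units, and the backlog numbers are all made up, and I am simply assuming the units get divided in proportion to how much work the CPU side and the GPU side have queued.

```python
def split_units(total_units, cpu_backlog, gpu_backlog):
    """Divide a pool of unified units between CPU-side and GPU-side work,
    in proportion to how much work each side has queued (an assumption)."""
    if total_units < 0 or cpu_backlog < 0 or gpu_backlog < 0:
        raise ValueError("inputs must be non-negative")
    total_backlog = cpu_backlog + gpu_backlog
    if total_backlog == 0:
        return 0, 0  # nothing queued, so every unit stays idle
    # round the CPU share to the nearest whole unit using integer math;
    # the GPU side gets whatever is left, so the two always sum to total_units
    cpu_units = (total_units * cpu_backlog + total_backlog // 2) // total_backlog
    return cpu_units, total_units - cpu_units


# hypothetical backlogs for three different moments (made-up numbers)
for cpu_jobs, gpu_jobs in [(10, 90), (50, 50), (80, 20)]:
    cpu_units, gpu_units = split_units(32, cpu_jobs, gpu_jobs)
    print(f"cpu jobs={cpu_jobs} gpu jobs={gpu_jobs} -> "
          f"{cpu_units} units on CPU work, {gpu_units} on graphics")
```

the point is only that the same pool of units leans towards graphics in a render-heavy moment and towards CPU work in a physics/AI-heavy one, which is the efficiency win over having fixed, separate CPU and GPU resources.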
^just a dream but maybe that is the road Sony and Nvidia are going down....