Right... the layout will need rearranging so that it suits a rectangular die, since you shouldn't expect one side of the CPU to line up exactly with one side of the GPU. Then there's the I/O from each processor to its own RAM pool, as well as between the two processors, so just putting two chips together isn't going to work per se.
It'll depend on what you're trying to achieve by merging the two, and on whether there even is a second interface to RAM. Waternoose accesses RAM via Xenos, but other CPUs will naturally have their own I/O to DDR3 or XDR, for example. There are clear advantages to UMA, so... there will be some design changes. Llano only accesses DDR3, for instance.
Now, a monolithic design allows for a wider bus to RAM, but then they'll have to figure out just how small they expect to shrink the chip in the future, since the memory I/O pads have to fit along the die edge and don't shrink the way the logic does. That in turn decides whether to start out with a modest 128-bit interface or a 256-bit one; FWIW, the smallest 256-bit designs have been upwards of 190mm^2.
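To put rough numbers on the bus-width choice, here's a minimal sketch of the usual peak-bandwidth arithmetic (bus width in bytes times transfer rate). The function name and the 1.4 GT/s data rate are my own assumptions for illustration, not figures for any particular future chip.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Real-world throughput will be lower; this ignores refresh,
    bus turnaround, and contention between CPU and GPU.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


if __name__ == "__main__":
    assumed_rate = 1.4e9  # assumption: 1.4 GT/s effective data rate
    for width in (128, 256):
        bw = peak_bandwidth_gbs(width, assumed_rate)
        print(f"{width}-bit @ {assumed_rate / 1e9:.1f} GT/s: {bw:.1f} GB/s")
```

At the same data rate, doubling the interface width doubles the theoretical peak, which is the whole appeal of going 256-bit; the catch is the pad area it demands on a die you later want to shrink.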
Also, starting out with a larger die has implications for yield, manufacturing, clock speeds, TDP, and so on compared to producing two smaller chips, especially if you expect eDRAM, which has different process requirements from your regular CMOS stuff. Xenos's history so far is the perfect example.
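For a feel of the yield side of that trade-off, here's a rough sketch using the simple Poisson yield model, Y = exp(-D0 * A). The defect density and die areas are made-up illustrative numbers, and the helper names are hypothetical. It also ignores packaging cost, wafer edge loss, redundancy/repair, and the eDRAM process differences mentioned above, so treat it as a back-of-envelope comparison only.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_die(area_mm2: float, defects_per_cm2: float) -> float:
    """Average silicon area (mm^2) spent to get one working die."""
    return area_mm2 / poisson_yield(area_mm2, defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.5           # assumption: defects per cm^2
    cpu_mm2 = 150.0    # assumption: standalone CPU die area
    gpu_mm2 = 150.0    # assumption: standalone GPU die area

    # One merged die: a defect anywhere kills both halves.
    monolithic = silicon_per_good_die(cpu_mm2 + gpu_mm2, d0)

    # Two dies: any good CPU can be paired with any good GPU.
    split = silicon_per_good_die(cpu_mm2, d0) + silicon_per_good_die(gpu_mm2, d0)

    print(f"monolithic: {monolithic:.0f} mm^2 of silicon per good unit")
    print(f"two chips:  {split:.0f} mm^2 of silicon per good unit")
```

Under this model the merged die always costs more silicon per working unit, and the gap grows with defect density and die area. What the model leaves out is what you save by merging: one package, no external CPU-GPU link, and a simpler board.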