I think a dual APU/GPU setup could make sense in a two-SKU system: a low-priced tier consisting of only the APU, which runs only Kinect, XBLA, and apps (maybe without an optical drive as well), and a system with that SoC plus an additional GPU for the AAA games. Perhaps the second GPU is eventually merged into the SoC after a die shrink (or two).
I don't think that this would be an efficient implementation, but I guess it's a possibility.
I don't like this idea of different SKUs (to begin with) with that much disparity in their characteristics.
There were rumors about MS going that route, as well as about an announcement to be made at CES; so far they have proved to be BS.
I think that the second GPU could indeed get merged down the road, but that has implications for the memory organization. If the second GPU has its own RAM, most likely connected through a 128-bit bus, then once the whole thing is put together the resulting chip will have to accommodate two 128-bit buses, which has a strong impact on the minimum size of the chip.
Some pages ago I considered that both the SoC and the GPU could sit on the same package, like Xenos and its daughter die, connected by a high-bandwidth link. The problem is that I don't know what kind of bandwidth can be achieved at a reasonable cost.
Case 1. We might want something like 64GB/s, i.e. as much as the HD 6670 is provided with, if the second GPU is complete (i.e. the ROPs are on the chip).
Case 2. If in the end the second GPU is incomplete, akin to Xenos, and all the ROPs are on the SoC die, I don't know how much bandwidth would be needed to make things workable. I fished for information about it earlier and so far I got no answer. FYI, I thought that the bandwidth requirement would grow with the number of render targets, their resolution, and the precision used for colors.
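To put rough numbers on that intuition, here is a minimal Python sketch of how render target size scales with count, resolution, and color precision. The function names and the `overdraw` multiplier are my own illustration, and the model only counts each pixel being written once per frame (no blending reads, no Z traffic, no MSAA), so treat the output as a lower bound rather than a real requirement.

```python
def render_target_bytes(width, height, bytes_per_pixel, count=1):
    """Size in bytes of `count` render targets sharing one format."""
    return width * height * bytes_per_pixel * count


def write_traffic_gb_per_s(frame_bytes, fps, overdraw=1.0):
    """Lower-bound ROP write traffic in GB/s.

    `overdraw` is a hypothetical multiplier for pixels written more than once.
    """
    return frame_bytes * overdraw * fps / 1e9


if __name__ == "__main__":
    resolutions = {"720p": (1280, 720), "1080p": (1920, 1080)}
    formats = {"8-bit RGBA": 4, "FP16 RGBA": 8}  # bytes per pixel
    for res_name, (w, h) in resolutions.items():
        for fmt_name, bpp in formats.items():
            size = render_target_bytes(w, h, bpp, count=4)
            rate = write_traffic_gb_per_s(size, fps=30)
            print(f"{res_name}, 4 x {fmt_name}: "
                  f"{size / 1e6:.1f} MB/frame, {rate:.2f} GB/s at 30 fps")
```

Raising `overdraw` or `count` shows how quickly the figure grows, which is the part I can't pin down without real data.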
I tried to think more about it and here is my "thinking flow":
If MS sticks to 720p rendering, in the face of enthusiasts and most likely its own marketing division, a link of 32GB/s or a bit more, as in today's Xbox, is obviously doable.
Actually, if you need less than 32GB/s and the bus in the 360 was oversized to take into account the possibly bursty nature of the communication between shader cores and ROPs, that would be good news (one should not forget the communication with the main RAM either). As it is, the link in the 360 allows moving ~1GB of data per frame (at 33ms a frame); that's a lot more than a handful of render targets at any sane resolution. Basically you are held back by the bandwidth to the main RAM (22GB/s).
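The ~1GB per frame figure is just the link bandwidth multiplied by the frame time. A small Python sketch of that arithmetic, using only the 32GB/s, 22GB/s and 33ms numbers from this post; the helper names are mine, and the 720p 32-bit target is just a convenient yardstick, not a claim about how any game actually renders.

```python
LINK_GB_PER_S = 32.0      # shader cores <-> ROPs link, as quoted in the post
MAIN_RAM_GB_PER_S = 22.0  # main RAM bandwidth, as quoted in the post
FRAME_MS = 33.0           # one frame at roughly 30 fps


def bytes_per_frame(gb_per_s, frame_ms):
    """Bytes that can cross a link of `gb_per_s` during one frame."""
    return gb_per_s * 1e9 * frame_ms / 1000.0


def targets_per_frame(budget_bytes, width, height, bytes_per_pixel):
    """How many full render targets of this format fit in the budget."""
    return budget_bytes / (width * height * bytes_per_pixel)


if __name__ == "__main__":
    link_budget = bytes_per_frame(LINK_GB_PER_S, FRAME_MS)
    ram_budget = bytes_per_frame(MAIN_RAM_GB_PER_S, FRAME_MS)
    print(f"link budget: {link_budget / 1e9:.2f} GB per frame")
    print(f"main RAM budget: {ram_budget / 1e9:.2f} GB per frame")
    n = targets_per_frame(link_budget, 1280, 720, 4)
    print(f"720p 32-bit targets that fit in the link budget: {n:.0f}")
```

The link budget comes out at a few hundred full-screen 720p writes per frame, which is why I suspect the main RAM is the tighter constraint.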
There are also games that rendered at 1080p on the 360, and the 32GB/s link has not been raised as a concern as far as I remember.
So at this stage, and without insiders giving me a clue, I am starting to build the conviction that one may not need that much bandwidth between the shader cores and the ROPs.
In the 360 that bandwidth was also needed to move your render target to the main RAM. If, in a hypothetical system, the ROPs are on the SoC and render straight into the main RAM, the bandwidth requirement would be somewhat lowered.
In our hypothetical system the link would also be the only way for the GPU to access any kind of data (textures included), so we have to account for that; in the 360, Xenos had only 22GB/s to do so (shared with the CPU). For reference, a PCI Express x16 link provides up to 32GB/s.
To make a long (and iffy) story short I believe that it would be achievable to have a functional second GPU as long as all the ROPs are on the SoC.
I can't see MS shipping something that is basically an x-core SMP CPU + an HD 6670. Trying to make sense out of what we have heard so far, I could see a well-designed dual graphics solution surprising by its performance and its silicon footprint.
I will give another try at what could be a really cheap system to produce and that would in fact do pretty well as far as performance is concerned (obviously this gives more credibility than needed to all this early talk, but if we learn more I'll try to make sense out of it like anybody else).
SoC
6 tiny and power-efficient in-order CPU cores, 2- or 4-way SMT. Close relative of Xenon and POWER A2.
6 SIMD arrays so 96 VLIW5 units or 480 SP (as the hd 6670)
64 Z/Stencil ROP Units & 8 Color ROP Units (twice the hd 6670 so close to the hd 5770)
128-bit GDDR5 memory interface
1200MHz gddr5 => same bandwidth as hd 5770/6770 parts.
UVD3 engine
@ 32nm
GPU 2
"ROP-less" hd 6670, no UVD3 @ 28nm
I won't come up with FLOPS figures or clock speeds as it's not reasonable; we've seen that AMD lately uses pretty high voltages to make sure all their parts function properly. It could be even worse for a console manufacturer, as bad chips have no possible use. The good news is that for quite a few parts (Llano, or the HD 6670 and 6570) differences in GPU clock speed have a marginal impact on power consumption. On Llano the main offender in power consumption seems to be the CPU core clock speed, so console manufacturers may have more room to play than AMD in the part that interests us the most, i.e. GPU performance.
I believe that the silicon footprint for such a system would be really low: south of 200mm2 for the SoC, and around the size of today's daughter die for the second GPU.
A summary of the system could be: 6 cores, 960 SP, which would sound more sane to a lot of members here. Then it's a matter of clock speed, especially the CPU clock speed, to make things fit under a single heatsink.
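For anyone who wants to check the numbers in the spec list, here is a quick Python sketch of the arithmetic. The assumptions are mine: 16 VLIW5 units per SIMD array (which is what 6 arrays for 96 units implies), 5 lanes per VLIW5 unit, "1200MHz" read as the GDDR5 command clock with four bits moved per pin per clock, and the second GPU counted as a full 480 SP part. The function names are made up for the example.

```python
def stream_processors(simd_arrays, vliw_units_per_simd=16, lanes_per_unit=5):
    """SP count for a VLIW5 design; per-SIMD defaults are assumptions."""
    return simd_arrays * vliw_units_per_simd * lanes_per_unit


def gddr5_bandwidth_gb_per_s(bus_bits, command_clock_mhz, bits_per_pin_per_clock=4):
    """Peak bandwidth in GB/s, assuming 4 bits per pin per command clock."""
    bytes_per_clock = bus_bits / 8 * bits_per_pin_per_clock
    return bytes_per_clock * command_clock_mhz * 1e6 / 1e9


if __name__ == "__main__":
    soc_sp = stream_processors(6)    # SoC: 6 SIMD arrays
    gpu2_sp = stream_processors(6)   # second GPU: assumed same shader count
    print(f"SoC SP: {soc_sp}, total SP: {soc_sp + gpu2_sp}")
    bw = gddr5_bandwidth_gb_per_s(bus_bits=128, command_clock_mhz=1200)
    print(f"128-bit GDDR5 at 1200MHz: {bw:.1f} GB/s")
```

That gives 480 SP per chip, 960 SP in total, and about 76.8GB/s for the SoC's memory interface.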