I searched for HD 6850 power consumption figures; hardware.fr is pretty reliable in this regard, and they measure 107 W under 3DMark and 116 W under FurMark. That's close to acceptable for a console, especially as it takes into account 2GB of GDDR5.
A slight downclock would make it workable. Finer lithography may allow packing extra SIMDs to make up for the lower frequency. Still, it's not a completely straightforward claim: the 5850, while clocked a bit lower than the 6850, packs more SIMD/texture units and consumes a lot more (the comparison is valid: same memory speed, same bus width, etc.).
So sadly it's tough to pack more than HD 6850-class power within a console power budget (and we're speaking of a healthy power budget).
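To put the downclock idea in rough numbers: dynamic power scales roughly with frequency times voltage squared, so even a modest clock/voltage drop buys a lot. Everything in the little sketch below except the 107 W figure is an assumption I picked for illustration:

```c
/* Back-of-the-envelope on downclocking. Only the 107 W baseline comes from
 * hardware.fr; the scaling factors and the dynamic/static split are guesses. */
#include <stdio.h>

int main(void) {
    double p_stock = 107.0;  /* measured board power under 3DMark (W)          */
    double f_scale = 0.85;   /* hypothetical 15% downclock                     */
    double v_scale = 0.93;   /* hypothetical voltage drop the downclock allows */

    /* Assume ~80% of the measured power is dynamic (scales with f * V^2). */
    double dyn = 0.8 * p_stock;
    double sta = 0.2 * p_stock;
    double p_down = sta + dyn * f_scale * v_scale * v_scale;

    printf("rough estimate after downclock: %.0f W\n", p_down);  /* ~84 W */
    return 0;
}
```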
Hardware designers are facing quite a challenge. Clearly a one-chip solution is at the same time the most efficient and the cheapest. The problem is that it may not pack enough muscle to be a worthy replacement for our consoles (and I believe the replacement will have to last more than 5/6 years too).
Designing an efficient two-chip system is in fact non-trivial, especially if you want to provide a UMA space. There are plenty of trade-offs that tax the system's efficiency:
*You're likely to have the memory controller attached to the GPU, which implies that investment in the CPU will be somewhat hindered. Actually it may not make sense to invest too much in the CPU, as it could be quite bandwidth-constrained. In fact, with a 128-bit bus even the GPU could prove bandwidth-constrained.
*EDRAM is a solution to alleviate this constraint, but it adds extra cost. You need more than 10MB, and you may need a faster link than the one in the 360 (which provides 32GB/s) as you're likely to move more data. Actually, to make the most of this pool of EDRAM you may need extra logic, as it makes sense for it to be accessible from the GPU. Moving data to the EDRAM chip, then from EDRAM to main memory through the GPU, and then back to the GPU sounds pretty inefficient. Overall you would want the chip to be more complex than Xenos' daughter die to make the most of what should most likely be a healthy piece of silicon (some rough numbers just below).
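To give an idea of the numbers I have in mind for that EDRAM pool (resolution, MSAA level, overdraw and frame rate below are purely my assumptions; only the 10MB and 32GB/s figures come from the 360):

```c
/* Rough framebuffer footprint and traffic at 1080p with 4xMSAA.
 * How much of the traffic actually crosses the die-to-die link depends on
 * how many passes round-trip through it, so treat these as ballpark figures. */
#include <stdio.h>

int main(void) {
    const double pixels   = 1920.0 * 1080.0;
    const double samples  = 4.0;                  /* 4xMSAA                  */
    const double bpp      = 4.0 + 4.0;            /* 32-bit color + 32-bit Z */
    const double overdraw = 3.0;                  /* assumed average         */
    const double fps      = 60.0;

    double footprint = pixels * samples * bpp;                    /* on-die storage */
    double traffic   = pixels * samples * bpp * overdraw * fps;   /* per second     */

    printf("framebuffer footprint: %.1f MB (vs 10MB on the 360)\n",
           footprint / (1024 * 1024));
    printf("raw color+Z traffic:   %.1f GB/s (the 360's link does 32GB/s)\n",
           traffic / 1e9);
    return 0;
}
```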
Overall my understanding is that making an optimal two-chip system is a tough job; you may actually end up with a three-chip system, and with limiting constraints on the programming model (tiling, giving up on UMA, etc.).
I've wondered about the idea of using twice the same chip, so two APUs. Cost-wise it's the same as a CPU+GPU setup (x2 a bus from APU to RAM, plus a fast interconnect between the two APUs). There's a big but: that's a dual-GPU setup, and feeding two GPUs is still not trivial.
I've read the slides released about their upcoming GPUs, and they don't help on the matter. If I understand right, the "graphics pipeline" is still handled by the command processor in a monolithic fashion (no matter that geometry setup, tessellation and rasterization are now distributed). So it would still appear to programmers as two chips.
But there is another but: whereas the graphics pipeline is still monolithic, it seems that AMD broke it up when it comes to compute operations. As I understand it, their next GPUs can be viewed as multiple compute devices; there will be multiple ACEs, which stands for Asynchronous Compute Engines.
It's irrelevant for us whether the ACEs and the "command processor/thingy in charge of the graphics pipeline" are implemented as one monolithic piece.
This is of great help when we consider using two chips: developers wouldn't see 2 GPUs but, for example, a pool of 8 compute devices (say each GPU includes 4 such devices). As AMD states, those are asynchronous, which means devs can conveniently spread rendering tasks among those devices; they are like independent cores or SPUs.
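Here's a toy picture of the programming model I'm imagining; this is just an analogy in plain C threads, not AMD's actual API, and the 8-device count (4 per APU) is my assumption:

```c
/* Toy analogy: 8 "compute devices" (4 per APU x 2 APUs) behave like
 * independent workers, each consuming whole rendering tasks on its own,
 * much like cores or SPUs. Not AMD's API, just an illustration. */
#include <pthread.h>
#include <stdio.h>

#define NUM_DEVICES 8
#define NUM_TASKS   32

static void *compute_device(void *arg) {
    long dev = (long)arg;
    /* Static partitioning for simplicity: device 'dev' takes every 8th task.
     * A real scheduler would hand out tasks dynamically. */
    for (int task = (int)dev; task < NUM_TASKS; task += NUM_DEVICES)
        printf("device %ld runs task %d (a shadow tile, a post pass, ...)\n",
               dev, task);
    return NULL;
}

int main(void) {
    pthread_t devs[NUM_DEVICES];
    for (long d = 0; d < NUM_DEVICES; d++)
        pthread_create(&devs[d], NULL, compute_device, (void *)d);
    for (int d = 0; d < NUM_DEVICES; d++)
        pthread_join(devs[d], NULL);
    return 0;
}
```

The point being that whether those 8 devices physically sit on one chip or two is invisible to the developer.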
Actually it gets me thinking about the relevance of the graphics pipeline. I wonder whether, assuming a closed box, it could make sense to simply throw away the parts "dedicated to the graphics pipeline" altogether and move to a software implementation. I guess it has to do with their size; if they amount to a tiny fraction of the chip... who cares, but the cost should also be evaluated against the implementation cost. In the next AMD GPUs there is an export bus that moves data from the CUs to the "graphics pipeline". The overhead could be bigger than it looks, and passing on it may allow a more streamlined design, or at least keep it minimal.
Honestly I'm drooling to learn more about those parts (and to see how they perform). I drool even more over a further step in the direction of a pure compute architecture. If it means giving up on the rasterizer and tessellation units, I'm OK with that; it's a trade-off. Rendering through compute units only may allow techniques that lower the (external) bandwidth requirements. Why not implement something like the TBR intended for Larrabee?
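For reference, the core of a TBR-style approach is just binning triangles to screen tiles and then shading each tile entirely in on-chip memory, so external RAM only sees one final write per pixel. A minimal sketch of the binning step (tile size and resolution are arbitrary choices on my part):

```c
/* Minimal triangle-to-tile binning, the first step of a Larrabee-style TBR.
 * Each triangle's screen bounding box decides which tiles reference it;
 * each tile is later shaded from on-chip memory and written out once. */
#include <stdio.h>

#define TILE    64
#define TILES_X (1280 / TILE)
#define TILES_Y (720  / TILE)

typedef struct { float x[3], y[3]; } tri_t;

static int bin_count[TILES_Y][TILES_X];  /* a real binner stores triangle indices */

static void bin_triangle(const tri_t *t) {
    float minx = t->x[0], maxx = t->x[0], miny = t->y[0], maxy = t->y[0];
    for (int i = 1; i < 3; i++) {
        if (t->x[i] < minx) minx = t->x[i];
        if (t->x[i] > maxx) maxx = t->x[i];
        if (t->y[i] < miny) miny = t->y[i];
        if (t->y[i] > maxy) maxy = t->y[i];
    }
    for (int ty = (int)miny / TILE; ty <= (int)maxy / TILE && ty < TILES_Y; ty++)
        for (int tx = (int)minx / TILE; tx <= (int)maxx / TILE && tx < TILES_X; tx++)
            if (ty >= 0 && tx >= 0)
                bin_count[ty][tx]++;
}

int main(void) {
    tri_t t = { {100, 300, 200}, {50, 60, 400} };
    bin_triangle(&t);
    printf("tile (1,0) references %d triangle(s)\n", bin_count[0][1]);
    return 0;
}
```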
Basically, having two APUs that each deliver ~1 TFLOPS is achievable within a reasonable power envelope, so we could be looking at ~2 TFLOPS of pretty usable compute power (+ whatever the CPU cores add to that figure). That's as much as the figure Intel was aiming for with Larrabee, and most likely the architecture will be (way) more efficient. The cost, distributed over plenty of SIMDs, should be low enough.
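Just to show where a ~1 TFLOPS-per-APU figure could come from (every number below is hypothetical; this is only the arithmetic, not a leaked spec):

```c
/* Hypothetical FLOPS arithmetic for one APU: SIMDs x ALUs x 2 (multiply-add)
 * x clock. The unit counts and clock are made-up, power-friendly values. */
#include <stdio.h>

int main(void) {
    double simds = 10;      /* hypothetical SIMD count on the GPU side     */
    double alus  = 80;      /* hypothetical ALUs per SIMD                  */
    double fma   = 2;       /* a multiply-add counts as 2 FLOP             */
    double clock = 0.65e9;  /* hypothetical 650 MHz, downclocked for power */

    double per_apu = simds * alus * fma * clock;
    printf("per APU: %.2f TFLOPS, two APUs: %.2f TFLOPS\n",
           per_apu / 1e12, 2.0 * per_apu / 1e12);  /* ~1.04 and ~2.08 */
    return 0;
}
```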
I've read concerns about tessellation for Larrabee-like / TBR kinds of rendering. Honestly tessellation sounded great, but I see no great use of it coming; Carmack might be right about it. Anyway, taking into account the forecasts for the tech, I believe passing on it is not unreasonable.