All the AMD talk made me want to make some silicon budget comparisons.
CPU
AMD Zacate (Bobcat / E-350) was 75mm^2 on 40nm in Winter 2011 with 2 CPU cores (1.6GHz) and 80 stream processors (500MHz) at an 18W TDP. (Bulldozer, just to compare, has 8 CPU INT cores and was 315mm^2 on 32nm.) AMD’s new stream architecture has 64 stream processors per CU.
http://www.anandtech.com/show/4134/the-brazos-review-amds-e350-supplants-ion-for-miniitx
But it gets interesting as the GPU is nearly 75% of the Bobcat core:
http://www.chip-architect.com/news/AMD_Ontario_Bobcat_vs_Intel_Pineview_Atom.jpg
A CPU is more than the core, so there also needs to be room for caches of various sizes and levels, memory controllers, and so forth. Moving forward, Jaguar is set to introduce quite a few enhancements over Bobcat, so the core size will likely inflate quite a bit, especially as caches are very important to performance. 28nm is theoretically about 50% of the area of 40nm, yet Jaguar cores will, again, carry a fair number of enhancements and likely require more local memory, so it is not likely that 4 of them fit into the same area as the existing Bobcat cores, even with the die shrink.
But even being generous and saying the 2 Bobcat cores + cache and memory controller were half of the 75mm^2 die (so about 38mm^2 for 2 cores), it would seem 2 Jaguars could (conjecture) fit into that area on 28nm and 4 Jaguar cores into 75mm^2. 150mm^2 would be a rough guesstimate of the total area needed for 8 Jaguar cores.
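To make the armchair math explicit, here is a rough Python sketch of the estimate above. The 75mm^2 die, the "half the die is CPU" split, and the 40nm/28nm nodes come from the post; the `growth` factor (how much bigger a Jaguar core + cache is than a Bobcat one at the same node) is a made-up parameter, set to 2.0 only because that roughly cancels the ideal shrink and reproduces the ~75mm^2 and ~150mm^2 guesses.

```python
# Back-of-the-envelope Jaguar area estimate (a sketch, not real die data).
ZACATE_DIE_MM2 = 75.0   # from the post
CPU_SHARE = 0.5         # post's generous assumption: 2 cores + cache/MC = half the die
BOBCAT_CORES = 2

def ideal_area_scale(old_nm, new_nm):
    """Ideal area scaling between nodes: linear shrink squared."""
    return (new_nm / old_nm) ** 2

def jaguar_area_mm2(cores, growth=2.0):
    """Estimated area for `cores` Jaguar cores on 28nm.

    `growth` is a hypothetical factor for how much a Jaguar core (plus cache)
    outgrows a Bobcat core at the same node; 2.0 is an assumption.
    """
    per_bobcat_40nm = ZACATE_DIE_MM2 * CPU_SHARE / BOBCAT_CORES
    per_jaguar_28nm = per_bobcat_40nm * ideal_area_scale(40.0, 28.0) * growth
    return cores * per_jaguar_28nm

if __name__ == "__main__":
    print(f"ideal 40nm -> 28nm area scale: {ideal_area_scale(40.0, 28.0):.2f}")
    for n in (2, 4, 8):
        print(f"{n} Jaguar cores: ~{jaguar_area_mm2(n):.0f} mm^2")
```

Dropping `growth` toward 1.0 shows how optimistic the numbers get if Jaguar barely grows over Bobcat; real shrinks also rarely hit the ideal scale, so treat all of this as ballpark only.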
Because Al is so awesome:
http://beyond3d.com/showthread.php?t=62651
Xenon was 167mm^2 on 90nm in Fall 2005.
Cell was 235mm^2 on 90nm in Fall 2005.
>> 4 Jaguar cores would conjecturally take about half, or less, of the silicon budget of the 2005 consoles at a much lower TDP.
>> 8 Jaguar cores would be roughly in the ballpark of the silicon budget of the 2005 consoles but with a similar or lower TDP.
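For anyone who wants the ratios behind those two takeaways, a quick sketch in Python. The Xenon and Cell areas are the ones listed above; the 75mm^2 and 150mm^2 Jaguar figures are my own guesses from earlier, not measured die sizes, and the comparison is area only (it ignores the 90nm vs 28nm node gap).

```python
# Compare the guessed Jaguar areas against the 2005 console CPU dies (area only).
CONSOLE_CPUS_2005_MM2 = {"Xenon": 167.0, "Cell": 235.0}  # from the post
JAGUAR_GUESS_MM2 = {4: 75.0, 8: 150.0}                   # conjecture, not real data

def budget_ratios(jaguar_mm2):
    """Return the guessed Jaguar area as a fraction of each 2005 CPU die."""
    return {name: jaguar_mm2 / area for name, area in CONSOLE_CPUS_2005_MM2.items()}

if __name__ == "__main__":
    for cores, area in JAGUAR_GUESS_MM2.items():
        for name, ratio in budget_ratios(area).items():
            print(f"{cores} Jaguar cores vs {name}: {ratio:.0%} of the area")
```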
GPU
Pitcairn (Radeon 7850/7870) was 212mm^2 on 28nm in Winter 2012 with 20 CUs (1280 stream processors), 80 TMUs, and 32 ROPs on a 256-bit bus. The various models clock in from 860MHz to 1000MHz with various units disabled, and total GPU board TDP ranges from 130-175W.
Xenos was 262mm^2 (182mm^2 GPU and 80mm^2 Daughter Die) in Fall 2005.
RSX was 258mm^2 in Fall 2005.
Various things to consider: 28nm maturing (density, performance, TDP); the tradeoffs between fewer units/higher clocks and more units/lower clocks when working around the primary limiting factor (TDP); and that high frequency GDDR5 uses a lot of power, not necessarily just the chips and how many of them there are but also the memory controller. It is hard to guess the power and die size needed for eDRAM (especially if it is a cache instead of a write buffer like on Xenos), and both matter if stacked memory and a Silicon Interposer are used.
>> Pitcairn is in the general size class, if not on the slightly small side, compared to the console GPUs in 2005 (or about the same if 32-64MB of eDRAM was introduced).
Looking at these numbers, and the SA snapshot of a chip with memory on a Silicon Interposer, it seems really likely one could do the following:
~ 325mm^2 SoC
>> ~ 75mm^2 : 4 Jaguar cores
>> ~ 200mm^2 : Pitcairn class GPU with some CUs disabled
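A quick sanity check on that budget, as a Python sketch: the two line items only add up to ~275mm^2, so the ~325mm^2 total leaves roughly 50mm^2 unaccounted for. What would fill that remainder (memory controllers, I/O, glue logic, and so on) is not itemized here, so the label on it in the code is purely an assumption.

```python
# Sum the listed SoC blocks and see what the ~325mm^2 total leaves over.
SOC_TOTAL_MM2 = 325.0
BLOCKS_MM2 = {
    "4 Jaguar cores": 75.0,
    "Pitcairn-class GPU": 200.0,
}

def unallocated_mm2(total, blocks):
    """Area left after subtracting the listed blocks from the total."""
    return total - sum(blocks.values())

if __name__ == "__main__":
    left = unallocated_mm2(SOC_TOTAL_MM2, BLOCKS_MM2)
    # Assumption: the remainder covers uncore bits (memory controller, I/O, etc.).
    print(f"listed blocks: {sum(BLOCKS_MM2.values()):.0f} mm^2, remainder: {left:.0f} mm^2")
```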
Put two memory modules (stacked?) on the same SI and you lower the power and ramp up the bandwidth. You could even go with a relatively large bus from the entire SI to a more general memory pool.
So in theory you could have full “HSA” with a higher end GPU and CPU on the same die and a boatload of lower power memory next door. The only real major drawback seems to be the large step back in total CPU cores and peak throughput, but I don’t know how reasonable 8 CPU cores would be in a SoC.
Anyways, by my armchair (bad?) math and guesses, the AMD rumblings of a Pitcairn class GPU + Jaguar cores SoC, possibly on an SI, seem very reasonable based on last gen BOM budgets for the silicon.
I am sure a lot of people would consider such a design very elegant and balanced.