The technical specs will be dictated by economics.
A BOM of $500 seems fair for a $400-$450 launch price; I don't believe either Sony or MS will abandon the loss-leading model. This gives us a cost breakdown very similar to last gen:
Around 450 mm² die area in total for CPU and GPU.
4GB RAM
Optical drive
$30-$50 worth of permanent storage
Both MS and Sony had reliability problems related to heat/cooling. Microsoft had tons of heat-related RRODs and Sony had failures related to their neat low-noise centrifugal cooling system. Since quality cooling solutions cost money, I expect both to aim at a slightly lower power consumption of 120-150W.
Speed will be primarily dictated by power consumption, and secondly by yield. Power scales roughly with the cube of operating frequency (voltage scaling down along with the clock), so lowering speed by ~21% halves power consumption; this gives a lot of room for hitting the right power envelope.
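Quick back-of-the-envelope check of that cube rule (this just assumes P scales as f³, i.e. voltage tracking clock; the 150W part is only an example):

```python
# Rough check of the "power scales with the cube of frequency" rule of thumb
# (assumes voltage tracks clock, so P ~ V^2 * f ~ f^3).

def relative_power(clock_scale):
    """Power relative to stock at a given fraction of the stock clock."""
    return clock_scale ** 3

# Clock needed to halve power, and the corresponding clock reduction.
half_power_clock = 0.5 ** (1.0 / 3.0)
print(f"{half_power_clock:.2f}x clock halves power "
      f"(a ~{(1 - half_power_clock) * 100:.0f}% reduction)")

# Example with a made-up 150W part downclocked by ~21%.
print(f"150W part at 0.79x clock: ~{150 * relative_power(0.79):.0f}W")
```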
Cheers
I would like that to come true, as I managed to convince myself that a single pretty big APU was the way to go.
I'm no longer sure about the 4GB of RAM; Laa-Yosh's last post somehow managed to convince me that 2GB could be enough. It just sounds "wise".
I assume that you're thinking of a two-chip design (one CPU and one GPU). How would you split the budget?
I assume that 1/3 for the CPU and 2/3 for the GPU is a reasonable bet. That's ~150mm² for the CPU and ~300mm² for the GPU.
Assuming 32nm and 28nm processes would be used, one could fit 4-6 high-performance CPU cores (a gross estimate based on Llano's supposed size) and something bigger/more powerful than Cayman.
Without taking into account power consumption and, more importantly, thermal dissipation, that means really high single-thread performance and maybe ~3 TFLOPS from the GPU.
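Just to sanity-check where that ~3 TFLOPS guess sits, the usual peak formula is ALUs x 2 (one MAD per cycle) x clock; the HD 6970 numbers are the commonly quoted ones, and the scaled-up 28nm part is pure assumption on my part:

```python
def peak_gflops(alus, clock_ghz):
    """Peak GFLOPS assuming one multiply-add (2 FLOPs) per ALU per cycle."""
    return alus * 2 * clock_ghz

# Cayman / HD 6970 as a reference point: 1536 ALUs @ 880 MHz.
print(f"HD 6970: ~{peak_gflops(1536, 0.88):.0f} GFLOPS")

# Hypothetical, slightly bigger 28nm part (made-up numbers for illustration).
print(f"Hypothetical 28nm part: ~{peak_gflops(1792, 0.85):.0f} GFLOPS")
```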
I checked the HD 6970/50 characteristics in this regard again (idle / under load):
HD 6970: 22 W / 210 W //// 40°C / 92°C //// 39 dB / 49 dB
HD 6950: 20 W / 160 W //// 47°C / 90°C //// 39 dB / 47 dB
That's pretty ugly; manufacturers may have (as you say) to significantly cut the clock speed to meet the requirements of a pretty tiny closed box, especially as they would want to use every valid chip on the wafer no matter its power/thermal characteristics.
Cutting clock speed accordingly, the chip should still deliver ~2 TFLOPS (gross guesstimate). That's beefy: it's almost a ten-fold increase in raw power vs, say, Xenos (more when taking into account the gain in efficiency).
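Putting that next to Xenos (using the commonly quoted ~240 GFLOPS peak figure for the 360 GPU):

```python
xenos_gflops = 240          # commonly quoted peak for Xenos (48 ALUs @ 500 MHz)
guesstimate_gflops = 2000   # the downclocked ~2 TFLOPS guess above

print(f"~{guesstimate_gflops / xenos_gflops:.1f}x Xenos in raw FLOPS")
print(f"~{3000 / xenos_gflops:.1f}x at the full ~3 TFLOPS")
```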
I assume that the manufacturers would go with a UMA memory model, with the GPU integrating the north bridge (as in the 360). That means CPU performance will suffer a bit, from the latency of GDDR5 to begin with, and from the extra latency associated with an external memory controller.
I can't help but think that with that silicon budget and those power and thermal constraints, one could try to push out a "Larrabee-like". I've been trying to closely follow the conversation about the odds of Larrabee @22nm, and I wonder if it would be possible now in a closed box. The context is not the same: the aforementioned thread is about the relevance of CPU rendering vs using an IGP, and there are clearly different opinions, which may actually not be considering the same time frame.
Nick has some neat ideas to alleviate some of the shortcomings of today's CPUs (which were actually shared by the Larrabee cores we know of). He gave links to an Intel paper showing how scatter/gather could be implemented on a CPU. He also explained how extending the register width beyond the actual SIMD/ALU width could be beneficial. If I understand the benefit correctly, it hides X times the latency (X depending on the ratio between the actual SIMD width and the register width), and it also divides the number of instructions you need in flight to keep the core busy.
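A toy model of that second idea, as I understand it (the numbers are made up for illustration): if the architectural registers are N times wider than the execution unit, each instruction occupies the SIMD unit for N cycles, so you need N times fewer independent instructions in flight to cover a given latency.

```python
import math

def instructions_in_flight(latency_cycles, reg_bits, simd_bits):
    """Independent instructions needed to cover a latency when each
    instruction occupies the SIMD unit for reg_bits/simd_bits cycles."""
    cycles_per_instruction = reg_bits // simd_bits
    return math.ceil(latency_cycles / cycles_per_instruction)

# Made-up 32-cycle latency to hide:
print(instructions_in_flight(32, reg_bits=512,  simd_bits=512))   # 32
print(instructions_in_flight(32, reg_bits=2048, simd_bits=512))   # 8
```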
We're not that interested in IGP-level performance, at least not today's IGP-level performance, so texture units are needed in the design.
While trying to be conservative, I wondered what such a chip could do nowadays.
I considered 8 cores @2.4GHz.
The chip would mostly be a mix of the one described in the Intel paper and the original Larrabee, blending in Nick's insights along with features that are now common in today's CPUs (and more importantly in IBM's) and in Xenon/PPU:
64KB of L1 (32+32), I$ 2-way / D$ 4-way.
8x 256KB of L2 (8-way).
Lower latency than in Xenon/PPU.
16-wide SIMD (512-bit, as in Larrabee).
64-wide registers (2048-bit, as in, I guess, Xenos and lots of ATI GPUs).
A scalar pipeline that is an improved (though not drastically) version of Xenon's.
4-way multithreading (either barrel or round-robin).
A few MB of L3.
Up-to-date texture units.
Cores and texture units tied together by a ring bus.
Integrated memory controller (DDR3).
Fast chip-to-chip interconnect.
The CPUs would have pretty complete power management measures: turbo mode and clock gating. For example, if the SIMD unit is not used it gets power-gated and the scalar pipeline is allowed to run at a much higher frequency (at least Xenon's frequency), and depending on the temperature of the chip as a whole the frequency would adapt further (say, when plenty of cores are idle).
Such a chip would deliver way higher single-thread performance than, for example, the same number of Larrabee cores (not hard), and if the SIMD units are power-gated the goal should be to do a bit better per cycle than a PPU/Xenon. It should deliver 8 x 16 x 2 x 2.4 => ~614 GFLOPS, so two of them would give ~1.2 TFLOPS.
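Spelling that peak number out (assuming a 16-wide multiply-add, counted as 2 FLOPs per lane, per core per cycle):

```python
cores = 8
simd_lanes = 16       # 512-bit SIMD with 32-bit floats
flops_per_lane = 2    # one multiply-add counted as 2 FLOPs
clock_ghz = 2.4

gflops_per_chip = cores * simd_lanes * flops_per_lane * clock_ghz
print(f"Per chip:  ~{gflops_per_chip:.0f} GFLOPS")             # ~614
print(f"Two chips: ~{2 * gflops_per_chip / 1000:.2f} TFLOPS")  # ~1.23
```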
Say the system consists of two such chips and 2GB of DDR3.
Vs the first system (CPU + GPU, 2GB of GDDR5) it would fall short in raw power, that's for sure: there's less to begin with, and the first system doesn't have to dip into its resources to handle operations such as rasterization, for example.
On the other hand the system would be cheaper: R&D for only one chip, economies of scale (you produce twice as many), cheaper RAM. Actually the chip could end up reasonably tiny, and two might fit on the same package from the start. The 32-core Larrabee was big @45nm, but a large quarter of that @32nm, even on a non-Intel process, could be pretty reasonable. Larrabee was 500+ mm²; taking 1/3 of that and going by a 0.7 scaling factor while moving to 32nm gives ~120mm² (which leaves room for the L3, for optimizing the design for power vs density, and why not a video encoding/decoding unit).
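Spelling out that area guesstimate (Larrabee's often-quoted 500+ mm² at 45nm, keeping roughly a third of it, and the 0.7 scaling factor I assumed for the move to 32nm):

```python
larrabee_mm2_45nm = 500      # often-quoted die size of the 32-core part
kept_fraction = 1 / 3        # roughly 8 cores' worth plus a share of the uncore
scaling_45_to_32 = 0.7       # conservative area scaling factor assumed above

estimated_mm2 = larrabee_mm2_45nm * kept_fraction * scaling_45_to_32
print(f"~{estimated_mm2:.0f} mm^2 at 32nm")   # ~117 mm^2
```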
So it could end up with either more RAM, more flash on board, or simply a price advantage.
Overall it would be, I think, an interesting match: in standard situations a performance advantage for the first system (something like one rendering @1080p and the other at 1440x1080), but we might see things on the second system that would be impossible or tough to pull off on the first.