TDP is another obvious reason. I haven't seen any numbers, but I'd estimate the TDP of the entire chip to be in the neighborhood of 200W. That's acceptable for a server, but it's impossible to use something like this in a consumer device.
Of course, a scaled-down version of POWER7 could help (just like PowerPC 970 is a scaled-down version of POWER4), but without further details published I don't see any clear advantage of a scaled-down POWER7 over other chips.
I can't find any news on whether they brought back any of the out-of-order functionality they dropped in the Power6 model. I'm guessing no...
Edit: I see now, looks like they have put it back, at least to some degree.
Density is around 2.1 mm2 per MByte at 45nm compared to ~12.5
mm2 per MByte for the SRAM-based caches of Tukwila (at 65nm).
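To put the two density figures side by side, here is a rough back-of-the-envelope sketch. It only uses the 2.1 and ~12.5 mm2 per MByte numbers quoted above; the 16 MB cache size is just an illustrative assumption, and nothing here normalizes for the 45nm vs 65nm process difference.

```python
# Area-per-megabyte figures as quoted in the post above.
EDRAM_MM2_PER_MB = 2.1   # eDRAM L3 at 45nm
SRAM_MM2_PER_MB = 12.5   # Tukwila SRAM-based caches at 65nm (approximate)


def cache_area_mm2(size_mb, mm2_per_mb):
    """Die area taken by a cache of size_mb at the given density."""
    return size_mb * mm2_per_mb


# How many times more area the SRAM figure implies per megabyte.
density_ratio = SRAM_MM2_PER_MB / EDRAM_MM2_PER_MB

# Hypothetical cache size, chosen only for illustration.
example_mb = 16
edram_area = cache_area_mm2(example_mb, EDRAM_MM2_PER_MB)
sram_area = cache_area_mm2(example_mb, SRAM_MM2_PER_MB)

print(f"SRAM/eDRAM area ratio: {density_ratio:.2f}x")
print(f"{example_mb} MB as eDRAM: {edram_area:.1f} mm2")
print(f"{example_mb} MB as SRAM:  {sram_area:.1f} mm2")
```

So the quoted numbers work out to roughly a 6x difference in area per megabyte, although part of that gap comes from the process node rather than from the cell type.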
Would be more interested in Nehalem EX's L3 density than any IPF chip, given the older/larger process node(s) always involved in that sort of comparison.
Maybe this can help you. IBM says that it has a 65nm prototype eDRAM running with 1.5ns latency and 2ns random cycle time, speeds that are competitive with current SRAM.
Paul DeMone
A 4.7 GHz Power6 sees 160 cycles of latency to the L3, an external
device.
That is effectively a 34 ns access time.
If Power7's L3 latency really is a factor of six less then its L3 effective
access time is ~5.7 ns or about 23 cycles at 4 GHz. BTW that figure is
almost certainly for each CPU's associated local 3 MB L3 slice and would
be longer for accesses to the L3 slice associated with a different CPU
on the die. The best case L3 latency may also assume a hit to an open
page and an L3 access with a page miss is noticeably longer.
For comparison, the 6 MB SRAM based L3 associated with each CPU in
Tukwila has an access latency of 15 cycles or 7.5 ns @ 2 GHz.
If Power7's L3 latency really is on the order of 6 ns then that would be
a remarkable achievement in eDRAM circuit design and architecture
but I suspect the 6:1 quote is based on a different set of numbers and
assumptions than I used here and the real figure is higher.
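The arithmetic behind those figures can be reproduced in a few lines. This is only a sketch of the conversions used in the post: the cycle counts, clock speeds and the 6:1 factor are the ones quoted above, the 4 GHz POWER7 clock is the post's own assumption, and the helper names are made up for illustration.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in CPU cycles to nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Convert a latency in nanoseconds to CPU cycles."""
    return ns * clock_ghz


# Power6: 160 cycles to the external L3 at 4.7 GHz.
power6_l3_ns = cycles_to_ns(160, 4.7)

# Power7: take the quoted "factor of six less" at face value.
power7_l3_ns = power6_l3_ns / 6
power7_l3_cycles = ns_to_cycles(power7_l3_ns, 4.0)  # assumes a 4 GHz clock

# Tukwila: 15 cycles at 2 GHz for its SRAM-based L3.
tukwila_l3_ns = cycles_to_ns(15, 2.0)

print(f"Power6 L3:  {power6_l3_ns:.1f} ns")
print(f"Power7 L3:  {power7_l3_ns:.1f} ns, {power7_l3_cycles:.0f} cycles at 4 GHz")
print(f"Tukwila L3: {tukwila_l3_ns:.1f} ns")
```

If the 6:1 quote was actually measured against a different baseline, only the `power7_l3_ns` line changes; the rest of the comparison stays the same.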
A 3-core adaptation (even with 3 cores, the hardware thread count doubles), plus removal of MCM communication and just 9 MB of eDRAM, would keep a POWER7-based XCPU3 under 200 mm2.
"Shanghai" has a best case latency of 29 CPU clocks, whereas "Barcelona" had a best case latency of 34 CPU clocks. So the lower latency to data stored in L3 cache should also help to significantly boost performance.
Why 3-Core?
Why not a "simple" 1/2 POWER7 4-core CPU with 16MB eDRAM and ~280mm² @ 4GHz at ~65-70 Watt (based on pcchen's estimate of 130-140 Watt for the POWER7 CPU)? It could be competitive with other 4-core CPUs but with the energy consumption of some 2-core CPUs.
From what I've heard, the maximum die size Microsoft would accept is 180 mm2.
I'm not saying whether a down-sized POWER7 would make sense or not, but surely by the time an Xbox 3 appears it would use a CPU manufactured at 32nm at least, hence it would fit nicely.