IBM Power 7 @ HotChips

The 3.2 GHz POWER6 is rated at about 100 W TDP (65 nm SOI), and the 4.7 GHz POWER6 at 160 W. Since POWER7 is expected to run at 4 GHz, if its TDP is "roughly the same as POWER6", it will probably land around 120-130 W?
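A quick sanity check on that guess, as a minimal sketch: linearly interpolating between the two POWER6 data points above (100 W at 3.2 GHz, 160 W at 4.7 GHz) puts a 4 GHz part at roughly 132 W. This assumes TDP scales linearly with clock, which is a big simplification (dynamic power goes roughly as f·V², and voltage usually rises with frequency), and it ignores the move from 65 nm to 45 nm entirely.

```python
def interpolate_tdp(f_ghz, lo=(3.2, 100.0), hi=(4.7, 160.0)):
    """Linear TDP estimate between two (GHz, watts) points.

    Assumption: power scales linearly with clock between the two POWER6
    data points; real chips do not behave this simply.
    """
    (f1, w1), (f2, w2) = lo, hi
    watts_per_ghz = (w2 - w1) / (f2 - f1)  # about 40 W per GHz for these points
    return w1 + watts_per_ghz * (f_ghz - f1)


if __name__ == "__main__":
    print(f"{interpolate_tdp(4.0):.0f} W")  # prints: 132 W
```

That lands at the top of the 120-130 W range, so the estimate is at least self-consistent with the POWER6 numbers.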
 
TDP is another obvious reason. I didn't see any numbers, but I'd estimate the TDP of the entire chip to be in the neighborhood of 200 W. That's acceptable for a server, but it rules out using something like this in a consumer device.

Of course, a scaled-down version of POWER7 could help (just as the PowerPC 970 is a scaled-down version of POWER4), but until further details are published I don't see any clear advantage of a scaled-down POWER7 over other chips.

It looks like the TDP isn't completely out of the question, though I understand it's a very complicated issue. It's just great to know that there's another viable processor architecture out there for me to speculate about.
 
I can't find any news on whether they brought back any of the out-of-order functionality they dropped in POWER6. I'm guessing no...

Edit: I see now; it looks like they have put it back, at least to some degree.
 
? Either it's in-order or out-of-order; there's no in-between order :p
 
I believe POWER6 had some limited out-of-order capability in the floating-point unit. According to IBM's technical documentation: "In general, the POWER6 processor is an in-order machine, but the BFU instructions can execute slightly out of order.... At most, eight FPU instructions can be issued out of order from the FPQ."

So it sounds like even the floating-point unit didn't have out-of-order capability to the same degree as the older POWER CPUs, and I'm wondering whether they really brought it back to the level they had before.
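To illustrate how there can be something "in between": a core can issue strictly in program order in general, yet let one instruction queue pick ready instructions out of order within a small window. Below is a toy single-issue model, not IBM's actual FPQ logic; the instruction names and ready cycles are hypothetical, and the window of 8 only mirrors the "at most eight FPU instructions" wording in the quote.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ready_cycle: int  # hypothetical cycle at which this instruction's operands are available


def issue_schedule(instrs, window):
    """Toy single-issue scheduler.

    Each cycle, issue the oldest ready instruction among the first `window`
    pending instructions (kept in program order). window=1 is strict in-order
    issue; a larger window allows limited out-of-order issue.
    Returns a list of (cycle, name) pairs.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pending = list(instrs)
    schedule = []
    cycle = 0
    while pending:
        for i, ins in enumerate(pending[:window]):
            if ins.ready_cycle <= cycle:
                schedule.append((cycle, ins.name))
                del pending[i]
                break
        cycle += 1
    return schedule


program = [Instr("fdiv_dependent", 5), Instr("fadd_a", 0), Instr("fadd_b", 0)]
print(issue_schedule(program, window=1))  # strict in-order: everything waits for the stalled op
print(issue_schedule(program, window=8))  # limited OoO: the independent adds slip ahead
```

With window=1 the independent adds sit behind the stalled instruction until cycle 5; with a small window they issue at cycles 0 and 1, which is the kind of "slightly out of order" behaviour the documentation describes.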
 
An interesting note about the eDRAM density from Hans de Vries @ Ace's:

Density is around 2.1 mm² per MByte at 45nm, compared to ~12.5 mm² per MByte for the SRAM-based caches of Tukwila (at 65nm).

Would be more interested in Nehalem EX's L3 density than any IPF chip, given the older/larger process node(s) always involved in that sort of comparison.
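The node mismatch matters a lot here. As a rough sketch, assuming ideal scaling where area goes with the square of feature size, shrinking Tukwila's 65 nm SRAM figure to 45 nm cuts the eDRAM advantage from about 6x to under 3x. Real SRAM cells rarely shrink that well, so the ideal shrink flatters the SRAM side.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Ideal area shrink factor: area scales with the square of feature size.
    Real SRAM cells rarely shrink this well, so this flatters the older design."""
    return (new_nm / old_nm) ** 2


EDRAM_45NM = 2.1   # mm^2 per MB, POWER7 eDRAM at 45 nm (Hans de Vries' figure)
SRAM_65NM = 12.5   # mm^2 per MB, Tukwila SRAM L3 at 65 nm (Hans de Vries' figure)

raw_ratio = SRAM_65NM / EDRAM_45NM
sram_at_45 = SRAM_65NM * ideal_area_scale(65, 45)  # hypothetical ideal 45 nm Tukwila SRAM
normalized_ratio = sram_at_45 / EDRAM_45NM

print(f"raw: {raw_ratio:.1f}x, node-normalized: {normalized_ratio:.1f}x")
```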
 
Hans de Vries gave 5.7 mm²/MB for standard Nehalem (http://aceshardware.freeforums.org/finally-an-image-of-shanghai-t405-15.html); presumably EX will be similar? That would be nearly a factor of 3 difference in size, but without knowing the latency that doesn't mean much (though over at Ace's they seem to think the IBM eDRAM latency is quite low).
As a side note, as usual Intel gets much higher cache density than AMD (the L2 density is comparable, but Intel uses 8T SRAM for its L2...).
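Since both figures are for 45 nm, no node correction is needed for this comparison. A minimal sketch of what the ratio means in area, using 16 MB purely as an illustrative capacity and simply multiplying capacity by the quoted per-MB densities:

```python
EDRAM_MM2_PER_MB = 2.1      # POWER7 eDRAM at 45 nm (Hans de Vries)
NEHALEM_MM2_PER_MB = 5.7    # Nehalem SRAM L3 at 45 nm (Hans de Vries)


def cache_area_mm2(capacity_mb: float, mm2_per_mb: float) -> float:
    """Area for a given capacity at a given density (capacity times the quoted figure)."""
    return capacity_mb * mm2_per_mb


ratio = NEHALEM_MM2_PER_MB / EDRAM_MM2_PER_MB
capacity = 16  # MB, hypothetical capacity chosen for illustration
print(f"density ratio: {ratio:.2f}x")
print(f"{capacity} MB: eDRAM {cache_area_mm2(capacity, EDRAM_MM2_PER_MB):.1f} mm^2, "
      f"SRAM {cache_area_mm2(capacity, NEHALEM_MM2_PER_MB):.1f} mm^2")
```

The ratio comes out around 2.7x, which is the "nearly factor 3" above.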
 
I did some searching on the differences between DRAM and SRAM, and since I also wanted to learn more about eDRAM (forgive my ignorance :) ) I spent some time on Wikipedia. I found a link to this article (I know it's pretty old, but it's the only thing I found).
Here's a quote:
IBM says that it has a 65nm prototype eDRAM running with 1.5ns latency and 2ns random cycle time—speeds that are competitive with current SRAM.
Maybe it can help you.
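To put those nanosecond figures in core-clock terms, here is a small conversion sketch at the 4 GHz POWER7 clock mentioned at the start of the thread. Keep in mind the 1.5 ns figure is for a 65 nm prototype array, so a real L3 built from it would add wire, tag and controller delay on top.

```python
import math


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles (1 GHz = 1 cycle per ns)."""
    return ns * clock_ghz


CLOCK_GHZ = 4.0  # POWER7's expected clock, per the start of this thread
for label, ns in [("latency", 1.5), ("random cycle time", 2.0)]:
    cycles = ns_to_cycles(ns, CLOCK_GHZ)
    print(f"{label}: {ns} ns = {math.ceil(cycles)} cycles at {CLOCK_GHZ} GHz")
```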
 
from here: http://aceshardware.freeforums.org/power-7-t891-15.html

Paul DeMone

A 4.7 GHz Power6 sees 160 cycles of latency to the L3, an external
device.

That is effectively a 34 ns access time.

If Power7's L3 latency really is a factor of six less then its L3 effective
access time is ~5.7 ns or about 23 cycles at 4 GHz. BTW that figure is
almost certainly for each CPU's associated local 3 MB L3 slice and would
be longer for accesses to the L3 slice associated with a different CPU
on the die. The best case L3 latency may also assume a hit to an open
page, and an L3 access with a page miss is noticeably longer.

For comparison, the 6 MB SRAM based L3 associated with each CPU in
Tukwila has an access latency of 15 cycles or 7.5 ns @ 2 GHz.

If Power7's L3 latency really is on the order of 6 ns then that would be
a remarkable achievement in eDRAM circuit design and architecture
but I suspect the 6:1 quote is based on a different set of numbers and
assumptions than I used here and the real figure is higher.
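Paul's arithmetic can be reproduced with two unit conversions. This is a sketch of his reasoning only; the "6x lower" factor is the unconfirmed claim he is testing, not a measured value.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in ns for a cycle count at a given clock."""
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Cycle count for a latency in ns at a given clock."""
    return ns * clock_ghz


power6_l3_ns = cycles_to_ns(160, 4.7)          # about 34 ns to the external L3
power7_l3_ns = power6_l3_ns / 6                # only if the "6x lower" claim holds
power7_l3_cycles = ns_to_cycles(power7_l3_ns, 4.0)
tukwila_l3_ns = cycles_to_ns(15, 2.0)          # SRAM L3 for comparison

print(f"POWER6 L3:  {power6_l3_ns:.1f} ns")
print(f"POWER7 L3:  {power7_l3_ns:.1f} ns = {power7_l3_cycles:.0f} cycles @ 4 GHz")
print(f"Tukwila L3: {tukwila_l3_ns:.1f} ns")
```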
 
I saw Paul's comments in that same thread but didn't want to quote them because of his disclaimer at the end.

So no one outside IBM knows the real latency figure for Power 7's L3 as of yet. Will be interesting to revisit this thread once we do know, though.
 
A 3-core adaptation (even with 3 cores, the hardware thread count doubles) plus removal of the MCM communication and just 9 MB of eDRAM would keep a POWER7-based XCPU3 under 200 mm².
 
It's possible, but right now we don't know how this fares in a consumer-oriented application compared to, say, a Bloomfield, which is roughly 260 mm² with 4 cores.
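A rough area budget for the 3-core idea, as a sketch: it uses the ~30 mm² per core (including L2) measured from the POWER7 die shot later in this thread and Hans de Vries' 2.1 mm²/MB eDRAM figure, and deliberately leaves out memory controllers, I/O and interconnect, reporting instead how much of the 200 mm² target is left for them.

```python
CORE_WITH_L2_MM2 = 30.0   # per core incl. L2, measured from the POWER7 die shot (45 nm)
EDRAM_MM2_PER_MB = 2.1    # Hans de Vries' 45 nm eDRAM density figure


def core_and_l3_area(cores: int, l3_mb: float) -> float:
    """Area of cores (with L2) plus eDRAM L3 only; memory controllers, I/O
    and interconnect are deliberately left out."""
    return cores * CORE_WITH_L2_MM2 + l3_mb * EDRAM_MM2_PER_MB


budget = 200.0
used = core_and_l3_area(cores=3, l3_mb=9)
print(f"cores + L3: {used:.1f} mm^2, left for everything else: {budget - used:.1f} mm^2")
```

Roughly 109 mm² for cores and L3 leaves about 91 mm² for the uncore, so the "under 200 mm²" claim looks plausible on these numbers.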
 
Why 3-Core?

Why not a "simple" half POWER7: a 4-core CPU with 16 MB of eDRAM, ~280 mm² at 4 GHz and ~65-70 W (based on pcchen's estimate of 130-140 W for the full POWER7)? It could be competitive with other 4-core CPUs while having the energy consumption of some 2-core CPUs.
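A sketch of the "half POWER7" arithmetic. Halving the power assumes it scales with the fraction of the chip kept, which is optimistic because memory controllers and I/O don't halve. The area part reuses the ~30 mm² per core figure from later in the thread and Hans de Vries' 2.1 mm²/MB eDRAM density to see how much of the ~280 mm² would be cores and cache.

```python
CORE_WITH_L2_MM2 = 30.0   # per-core figure from the die-shot measurement later in this thread
EDRAM_MM2_PER_MB = 2.1    # Hans de Vries' 45 nm eDRAM density figure

full_chip_watts = (130.0, 140.0)                  # the estimate cited in the post
half_chip_watts = tuple(w / 2 for w in full_chip_watts)

cores, l3_mb, claimed_die_mm2 = 4, 16, 280.0
core_and_cache = cores * CORE_WITH_L2_MM2 + l3_mb * EDRAM_MM2_PER_MB
print(f"power: {half_chip_watts[0]:.0f}-{half_chip_watts[1]:.0f} W")
print(f"cores + L3: {core_and_cache:.1f} mm^2 of ~{claimed_die_mm2:.0f} mm^2; "
      f"{claimed_die_mm2 - core_and_cache:.1f} mm^2 left for uncore")
```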
 

Quoting myself ;)

According to this link: http://forums.amd.com/devblog/blogpost.cfm?catid=271&threadid=103010

the AMD Shanghai CPU has an L3 latency of 29 cycles.
"Shanghai" has a best case latency of 29 CPU clocks, whereas "Barcelona" had a best case latency of 34 CPU clocks. So the lower latency to data stored in L3 cache should also help to significantly boost performance.

So the POWER7 eDRAM L3 cache could be faster than the SRAM cache of AMD's Shanghai. If that is true, then the eDRAM L3 cache would IMHO really be a breakthrough.
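One caveat: cycle counts only compare fairly at the same clock, so it helps to convert both to nanoseconds. The thread doesn't state a Shanghai clock; 2.7 GHz below is an assumption for illustration (substitute the actual part's clock), and the POWER7 number is Paul DeMone's speculative ~23 cycles at 4 GHz, not a published figure.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Latency in ns for a cycle count at a given clock."""
    return cycles / clock_ghz


SHANGHAI_GHZ = 2.7   # ASSUMPTION: illustrative Shanghai clock, not stated in the thread
POWER7_GHZ = 4.0     # expected POWER7 clock from earlier in the thread

shanghai_ns = cycles_to_ns(29, SHANGHAI_GHZ)
power7_ns = cycles_to_ns(23, POWER7_GHZ)       # speculative ~23-cycle estimate

print(f"Shanghai L3: {shanghai_ns:.1f} ns, POWER7 L3 (est.): {power7_ns:.2f} ns")
```

Because POWER7 runs at a higher clock, the gap in nanoseconds is larger than the 23 vs 29 cycle counts suggest, but it hinges entirely on the unconfirmed POWER7 estimate.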
 
From what I've heard, the maximum die size Microsoft would accept is 180 mm².
 
16 MB of eDRAM is a waste for a console CPU anyway. Consoles stream a lot of data from disk to the GPU, so they probably need only about 4 MB of cache. After all, the 360 has only 1 MB of L2 cache.

And yes, I think they'll have more than 4 cores for the Xbox 720.
 
I'm not saying a downsized POWER7 would or wouldn't make sense, but surely by the time the third Xbox appears, its CPU would be manufactured at 32 nm or smaller, so it would fit nicely within that 180 mm² limit.
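A minimal sketch of that shrink argument, assuming ideal scaling (area proportional to the square of feature size). Real shrinks are usually worse because I/O pads, analog blocks and memory cells don't scale perfectly, so treat these as best-case numbers for the two 45 nm proposals above.

```python
def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Ideal die-area shrink: area scales with the square of the feature size.
    Real shrinks (I/O pads, analog blocks, SRAM/eDRAM cells) are usually worse."""
    return area_mm2 * (new_nm / old_nm) ** 2


LIMIT_MM2 = 180.0  # the rumoured Microsoft limit mentioned in this thread
for die_45nm in (200.0, 280.0):  # upper bounds of the two 45 nm proposals discussed above
    shrunk = ideal_shrink(die_45nm, 45, 32)
    print(f"{die_45nm:.0f} mm^2 @ 45 nm -> ~{shrunk:.0f} mm^2 @ 32 nm, fits: {shrunk <= LIMIT_MM2}")
```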
 
If you measure the POWER7 die shot, each core, with its associated L2 cache, is only about 30 mm². That's about the same size as one core of the current Xbox 360 CPU. A console-destined POWER7 variant could be even smaller because there would be no need for the full four double-precision floating-point units. (One should be enough for a console application.) I strongly suspect that IBM will use the POWER7 as the basis for the PPE' (<-- note the "prime") in the PowerXCell 32ii and 32iv.
 
Back
Top