IBM Power 7 @ HotChips

pcchen · Aug 26, 2009

POWER6 3.2GHz is rated at about 100W TDP @ 65nm SOI, and 4.7GHz POWER6 is 160W. Since POWER7 is expected to run at 4GHz, and if its TDP is "roughly the same as POWER6" then it probably will be at about 120W ~ 130W?
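One rough way to sanity-check that guess is a straight-line interpolation between the two POWER6 data points quoted above. This is only a sketch: it assumes TDP scales linearly with clock between 3.2GHz and 4.7GHz, which real chips do not strictly follow (voltage, process, and core count all change), and the function name `interpolate_tdp` is made up for illustration.

```python
def interpolate_tdp(freq_ghz, p0=(3.2, 100.0), p1=(4.7, 160.0)):
    """Linearly interpolate TDP (watts) between two (GHz, W) data points.

    The defaults are the POWER6 figures quoted in the post; the linear
    model itself is an assumption, not something IBM has published.
    """
    (f0, w0), (f1, w1) = p0, p1
    return w0 + (freq_ghz - f0) * (w1 - w0) / (f1 - f0)


if __name__ == "__main__":
    estimate = interpolate_tdp(4.0)
    print(f"Interpolated TDP at 4.0 GHz: {estimate:.0f} W")
```

Under that assumption a 4GHz part lands at the top of the 120W ~ 130W guess, so the ballpark looks plausible if POWER7 really does behave like POWER6 per chip.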

Squilliam · Aug 26, 2009

pcchen said:
TDP is another obvious reason. Although I didn't see any numbers, I'd estimate the TDP of the entire chip to be in the neighborhood of 200W. That's acceptable for a server, but it's impossible to use something like this in a consumer device.

Of course, a scaled-down version of POWER7 could help (just like PowerPC 970 is a scaled down version of POWER4), but I don't see any clear advantage of a scaled down POWER7 vs other chips without further details published.

It looks like the TDP isn't completely out of the question; however, I understand it's a very complicated issue. It's just great to know that there's another viable processor architecture out there for me to speculate about.

Mat3 · Aug 26, 2009

I can't find any news on whether they brought back any of the out-of-order functionality they dropped in the Power6 model. I'm guessing no...

Edit: I see now, looks like they have put it back, at least to some degree.

ShaidarHaran · Aug 26, 2009

Mat3 said:
I can't find any news on whether they brought back any of the out-of-order functionality they dropped in the Power6 model. I'm guessing no...

Edit: I see now, looks like they have put it back, at least to some degree.

? Either it's in-order or out-of-order, there's no in-between-order

Mat3 · Aug 27, 2009

I believe Power6 had some limited out-of-order stuff, it was for the floating point unit and according to their tech doc: "In general, the POWER6 processor is an in-order machine, but the BFU instructions can execute slightly out of order.... At most, eight FPU instructions can be issued out of order from the FPQ."

So it sounds like even the float unit didn't have OoO capability to the same degree as the older Power CPUs. So I'm just wondering if they really brought it back to the same level they had before.

ShaidarHaran · Aug 27, 2009

An interesting note about the EDRAM density from Hans de Vries @ Ace's:

Density is around 2.1 mm2 per MByte at 45nm compared to ~12.5
mm2 per MByte for the sram based caches of Tukwila (at 65nm).

Would be more interested in Nehalem EX's L3 density than any IPF chip, given the older/larger process node(s) always involved in that sort of comparison.

mboeller · Aug 27, 2009

ShaidarHaran said:
Would be more interested in Nehalem EX's L3 density than any IPF chip

...and latency. The EX L3-Cache architecture seems to be the same as the power7 architecture with 3 MB associated with each Core. Link: http://www.semiaccurate.com/2009/08/25/intel-details-becton-8-cores-and-all/

mczak · Aug 27, 2009

ShaidarHaran said:
Would be more interested in Nehalem EX's L3 density than any IPF chip, given the older/larger process node(s) always involved in that sort of comparison.

Hans de Vries said 5.7mm²/MB for standard Nehalem, http://aceshardware.freeforums.org/finally-an-image-of-shanghai-t405-15.html, presumably EX will be similar? That would be nearly a factor of 3 in size difference versus the 2.1mm²/MB eDRAM, but without knowing latency that doesn't mean much (though over at aceshardware they seem to think the IBM eDRAM latency is quite low).
As a side note, as usual Intel got way higher cache density than AMD (comparable density for L2, but Intel is using 8T-SRAM for that...).
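For anyone who wants to play with the density numbers quoted so far, here is a small sketch. The three mm²/MB figures are the ones from Hans de Vries as quoted in this thread; the 16 MB cache size in the example is a hypothetical chosen just to show the scale of the difference, and the helper names are made up.

```python
# Cache density figures quoted in this thread, in mm^2 per MByte.
DENSITY_MM2_PER_MB = {
    "POWER7 eDRAM (45nm)": 2.1,
    "Nehalem SRAM L3": 5.7,
    "Tukwila SRAM (65nm)": 12.5,
}


def area_ratio(a, b):
    """How many times larger cache type `a` is than `b` per MByte."""
    return DENSITY_MM2_PER_MB[a] / DENSITY_MM2_PER_MB[b]


def cache_area_mm2(name, megabytes):
    """Die area for `megabytes` of cache, ignoring tags and routing overhead."""
    return DENSITY_MM2_PER_MB[name] * megabytes


if __name__ == "__main__":
    edram = "POWER7 eDRAM (45nm)"
    for name in DENSITY_MM2_PER_MB:
        ratio = area_ratio(name, edram)
        area = cache_area_mm2(name, 16)  # 16 MB is a hypothetical size
        print(f"{name}: {ratio:.2f}x eDRAM area, {area:.1f} mm^2 for 16 MB")
```

The Tukwila ratio mixes process nodes (65nm vs 45nm), so the Nehalem figure is the fairer comparison of the two.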

liolio · Aug 27, 2009

I did some searching on the differences between DRAM and SRAM, and I also want to learn more about eDRAM (forgive my ignorance), so I spent some time on the wiki. I found a link to this article (I know it's pretty old, but it's the only thing I found).
Here's a quote:

IBM says that it has a 65nm prototype eDRAM running with 1.5ns latency and 2ns random cycle time—speeds that are competitive with current SRAM.

Maybe it can help you.

mboeller · Aug 27, 2009

from here: http://aceshardware.freeforums.org/power-7-t891-15.html

Paul DeMone

A 4.7 GHz Power6 sees 160 cycles of latency to the L3, an external
device.

That is effectively a 34 ns access time.

If Power7's L3 latency really is a factor of six less then its L3 effective
access time is ~5.7 ns or about 23 cycles at 4 GHz. BTW that figure is
almost certainly for each CPU's associated local 3 MB L3 slice and would
be longer for accesses to the L3 slice associated with a different CPU
on the die. The best case L3 latency may also assume a hit to an open
page and an L3 access with a page miss is noticeably longer.

For comparison, the 6 MB SRAM based L3 associated with each CPU in
Tukwila has an access latency of 15 cycles or 7.5 ns @ 2 GHz.

If Power7's L3 latency really is on the order of 6 ns then that would be
a remarkable achievement in eDRAM circuit design and architecture
but I suspect the 6:1 quote is based on a different set of numbers and
assumptions than I used here and the real figure is higher.
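The arithmetic in Paul's post is easy to reproduce. The sketch below only restates his numbers (160 cycles at 4.7 GHz, a 6:1 improvement, a 4 GHz POWER7, and Tukwila's 15 cycles at 2 GHz); the 6:1 factor is still the unconfirmed figure he warns about, and the helper names are made up for illustration.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


def ns_to_cycles(ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles."""
    return ns * clock_ghz


# POWER6: 160 cycles to the external L3 at 4.7 GHz (about 34 ns).
power6_ns = cycles_to_ns(160, 4.7)

# POWER7: assumes the claimed factor-of-six improvement applies to this figure.
power7_ns = power6_ns / 6
power7_cycles = ns_to_cycles(power7_ns, 4.0)

# Tukwila: 15 cycles at 2 GHz.
tukwila_ns = cycles_to_ns(15, 2.0)

if __name__ == "__main__":
    print(f"POWER6 L3:  {power6_ns:.1f} ns")
    print(f"POWER7 L3:  {power7_ns:.1f} ns (~{power7_cycles:.0f} cycles at 4 GHz)")
    print(f"Tukwila L3: {tukwila_ns:.1f} ns")
```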

ShaidarHaran · Aug 27, 2009

I saw Paul's comments in that same thread but didn't want to quote them because of his disclaimer at the end.

So no one outside IBM knows the real latency figure for Power 7's L3 as of yet. Will be interesting to revisit this thread once we do know, though.

TEXAN* · Aug 27, 2009

A 3-core adaptation (Even with 3-core, the hardware thread count doubles) plus removal of MCM communication and just 9 MB of eDRAM would keep a POWER7 based XCPU3 under 200 mm2.

pcchen · Aug 28, 2009

TEXAN* said:
A 3-core adaptation (Even with 3-core, the hardware thread count doubles) plus removal of MCM communication and just 9 MB of eDRAM would keep a POWER7 based XCPU3 under 200 mm2.

It's possible, but right now we don't know how this fares in a consumer oriented application compared to, say, a Bloomfield, which is roughly 260mm2 with 4 cores.

mboeller · Aug 29, 2009

TEXAN* said:
A 3-core adaptation (Even with 3-core, the hardware thread count doubles) plus removal of MCM communication and just 9 MB of eDRAM would keep a POWER7 based XCPU3 under 200 mm2.

Why 3-Core?

Why not a "simple" 1/2 POWER7 4-core CPU with 16MB eDRAM and ~280mm² @ 4GHz with ~65-70 Watt (based on pcchen's estimate of 130-140 Watt for the Power7 CPU)? It could be competitive with other 4-core CPUs but with the energy consumption of some 2-core CPUs.

mboeller · Aug 29, 2009

mboeller said:
from here: http://aceshardware.freeforums.org/power-7-t891-15.html

Quoting myself

According to this link: http://forums.amd.com/devblog/blogpost.cfm?catid=271&threadid=103010

the AMD Shanghai CPU has a L3-Latency of 29 cycles.

"Shanghai" has a best case latency of 29 CPU clocks, whereas "Barcelona" had a best case latency of 34 CPU clocks. So the lower latency to data stored in L3 cache should also help to significantly boost performance.

So the Power7 eDRAM L3-Cache could be faster than the SRAM cache of the AMD Shanghai. If that is true, then the eDRAM L3-Cache would IMHO really be a breakthrough.
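Cycle counts on their own don't settle this, since the two chips run at different clocks. Here is a rough sketch that puts both in nanoseconds. The POWER7 figure is the unconfirmed ~5.7 ns estimate derived from Paul DeMone's numbers above; the Shanghai clock speeds are assumptions picked for illustration and are not from the AMD blog post.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


# Unconfirmed POWER7 estimate: 160 cycles at 4.7 GHz on POWER6, divided by 6.
POWER7_L3_NS = cycles_to_ns(160, 4.7) / 6

SHANGHAI_L3_CYCLES = 29  # best case, per the AMD quote above
ASSUMED_SHANGHAI_CLOCKS_GHZ = [2.3, 2.7, 3.1]  # assumption: illustrative clocks


def break_even_clock_ghz(cycles, target_ns):
    """Clock at which `cycles` of latency would take exactly `target_ns`."""
    return cycles / target_ns


if __name__ == "__main__":
    print(f"POWER7 estimate: {POWER7_L3_NS:.1f} ns")
    for clock in ASSUMED_SHANGHAI_CLOCKS_GHZ:
        ns = cycles_to_ns(SHANGHAI_L3_CYCLES, clock)
        print(f"Shanghai at {clock} GHz: {ns:.1f} ns")
    needed = break_even_clock_ghz(SHANGHAI_L3_CYCLES, POWER7_L3_NS)
    print(f"Shanghai would need about {needed:.1f} GHz to match")
```

If the 6:1 claim holds, the gap in wall-clock time is larger than the 23 vs 29 cycle comparison suggests, because of the higher POWER7 clock.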

TEXAN* · Aug 30, 2009

mboeller said:
Why 3-Core?

Why not a "simple" 1/2 POWER7 4-core CPU with 16MB eDRAM and ~280mm² @ 4GHz with ~65-70 Watt (based on pcchen's estimate of 130-140 Watt for the Power7 CPU)? It could be competitive with other 4-core CPUs but with the energy consumption of some 2-core CPUs.

From what I've heard, the maximum die size Microsoft would accept is 180 mm2.

rpg.314 · Aug 30, 2009

16 MB eDRAM is a waste for console CPUs anyway. They stream a lot of data from disk to GPU, so they prolly need only 4 MB of cache. After all, the 360 has only 1 MB of L2 cache.

And yes, I think they'll have more than 4 cores for xbox720.

mboeller · Aug 31, 2009

TEXAN* said:
From what I've heard, the maximum die size Microsoft would accept is 180 mm2.

OK, somehow I overlooked the reference to the XBox3 CPU in your first posting.

mczak · Aug 31, 2009

TEXAN* said:
From what I've heard, the maximum die size Microsoft would accept is 180 mm2.

I'm not saying whether a downsized POWER7 would make sense, but surely by the time an Xbox 3 appears it would use a CPU manufactured on at least 32nm, hence it would fit nicely.
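To put a rough number on "fit nicely", here is a sketch that assumes ideal area scaling from 45nm to 32nm, i.e. area shrinking with the square of the feature size. That is an optimistic assumption (pads, analog, and I/O do not scale that well), and the die sizes fed in are just the guesses posted earlier in the thread.

```python
def ideal_shrink_area(area_mm2, from_nm=45.0, to_nm=32.0):
    """Scale a die area assuming perfect (linear dimension)^2 shrink.

    Real shrinks do worse than this; treat the result as a lower bound.
    """
    return area_mm2 * (to_nm / from_nm) ** 2


MAX_ACCEPTABLE_MM2 = 180.0  # the rumored Microsoft limit mentioned above

if __name__ == "__main__":
    guesses = {"3-core guess": 200.0, "half-POWER7 guess": 280.0}
    for label, area_45nm in guesses.items():
        shrunk = ideal_shrink_area(area_45nm)
        fits = "fits" if shrunk <= MAX_ACCEPTABLE_MM2 else "does not fit"
        print(f"{label}: {area_45nm:.0f} mm^2 -> {shrunk:.0f} mm^2 at 32nm ({fits})")
```

Even with some slack for imperfect scaling, both guesses would come in under the rumored 180 mm2 limit.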

paawl · Sep 1, 2009

If you measure the POWER7 die shot, each core, with its associated L2 cache, is only about 30 mm^2. That's about the same size as one core of the current Xbox 360 CPU. A console-destined POWER7 variant could be even smaller because there would be no need for the full four double-precision floating point units. (One should be enough for a console application.) I strongly suspect that IBM will use the POWER7 as the basis for the PPE' (<-- note the "prime") in the PowerXCell 32ii and 32iv.
