It'll depend on a number of factors so I don't think a single figure is possible.
A rule of thumb is not an absolute. It is a quick estimate that is usually roughly correct.
According to IBM the figure is marginally higher than the POWER5+.
Yes, another undisclosed and very high TDP.
I was off by roughly 50 watts. POWER6 would be 150-170W or more, without taking into account the TDP of the L3 cache it is tied to.
However, that is not the point I was making. I was answering the point that OOO = higher performance; it isn't. In the case of POWER6, IBM gained performance by dropping OOO.
And making a process transition, upping the clock speed, adding look-ahead load execution, fixing a number of design issues, expanding the caches, and seriously expanding socket level bandwidth.
Without a lot of extra logic and resources, POWER6 would suffer a huge dip due to its lower efficiency.
Sun's Niagara processors are also in-order and their performance is also massively higher than their predecessors, although it is for a relatively restricted range of tasks.
It's different enough that I'm hesitant to say Niagara has predecessors, except in a limited subset of SPARC machines.
As has been pointed out these are completely different processors for different applications. I was comparing POWER6 to POWER5+.
I think it would be an interesting comparison to see how much closer POWER5+ would be if it had the same expansion of bandwidth, better process, and time to refactor some implementation-specific faults, like its 2-cycle result forwarding.
Anyway, SPEC is more of a compiler test than anything else these days; since Niagara and Cell appeared it's become effectively useless.
Unless you use the real-world applications that are either the exact same applications or operate similarly to the real-world application exemplars in SPEC.
The key point is that there are measures of performance where a spectacularly aggressive in-order design loses to an OoO chip, and in some measures, such as power and cost, it loses incredibly badly.
For anything other than comparing Intel and AMD. We know that many things can run very fast on Cell; however, the SPEC rules mean it can't be tested properly, so SPEC would run like a dog on Cell.
What rule would that be?
It's the other way around: the PPC processors in Cell are based on a design which started back in 1997. High-end processors take a very long time to develop, so it's unlikely they could have learned anything from it that could have had any impact.
The circuit design techniques used in the PPE and Xenon did inform the final circuit design for POWER6. It's not entirely a coincidence that they are all on high-performance SOI processes. IBM has low-power and bulk processes as well, and they would have been cheaper.
Anyway this is all irrelevant.
My point was that you cannot add OOO and expect a doubling of performance without increasing power consumption. It's just not going to happen!
That would be true, but the point you used, specifically tying OoO to more register file read ports, was wrong.
Furthermore, using POWER6 as an example is fraught with danger because the chip uses a gigantic amount of resources to make up for its being in-order. It does a number of things that would make no sense for an SPE.
Since clock scaling leads to rapid climbs in power consumption and future processes are making it increasingly difficult to yield great clocks, irrespective of design, console chips will likely have modest gains in clock speed, if any.
Because clock speed is a prime factor in in-order performance (an in-order design often lacks so much else), there is a real benefit to slightly increasing design complexity in order to raise per-clock efficiency.
It may not benefit the SPE too much right now, but it might help the PPE.
Since the console market's TDP cap artificially limits clocks, there is a gray area where evaluating power consumption versus efficiency can lead to interesting outcomes.
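To put a rough number on the clock-versus-power argument above, here is a minimal sketch using the textbook CMOS switching-power approximation P = C * V^2 * f. It assumes supply voltage has to rise roughly in proportion to clock speed, which is a simplification, and it ignores leakage entirely. The function name and the 100W baseline are hypothetical, chosen only for illustration, and are not figures for any real chip.

```python
def scaled_power(base_power_w, clock_ratio, voltage_tracks_clock=True):
    """Estimate dynamic power after scaling the clock by clock_ratio.

    Uses P = C * V^2 * f. Capacitance C is treated as fixed.
    ASSUMPTION: if voltage_tracks_clock is True, supply voltage scales
    linearly with frequency, so power grows with the cube of the clock.
    Leakage power is ignored.
    """
    voltage_ratio = clock_ratio if voltage_tracks_clock else 1.0
    return base_power_w * voltage_ratio ** 2 * clock_ratio


if __name__ == "__main__":
    base = 100.0  # hypothetical baseline, in watts
    for ratio in (1.0, 1.1, 1.2, 1.5):
        cubic = scaled_power(base, ratio)
        linear = scaled_power(base, ratio, voltage_tracks_clock=False)
        print(f"clock x{ratio:.1f}: {cubic:6.1f} W if voltage rises, "
              f"{linear:6.1f} W at fixed voltage")
```

Under these assumptions a 20% clock bump costs roughly 73% more dynamic power if the voltage has to rise with it, against 20% more at fixed voltage. That gap is why a modest per-clock efficiency gain can look attractive under a fixed TDP cap.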