Which has zero significance. Power/megahertz is 100% pointless to compare.
In Power/Performance Woodcrest wins hands down.
I wouldn't be so sure about that. In floating point SIMD I'd expect a Xenon to demolish even a Woodcrest.
I disagree. OOO saves power. To get anywhere near the performance of an OOO core an in-order core would have to be:
1. Wider (or faster).
2. Spend significant resources (power) to make critical data-dependency latencies smaller, like L1 load-to-use latency.
3. Increase the size of on die caches because inlining and loop-unrolling will bloat code.
Xenon's SIMD engines have 128 registers and while it doesn't have a huge cache it gets around added latency by using 2 threads.
The cache isn't big because it's being accessed by 4 processors (the 3 cores and the GPU) and can do things like cache locking.
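On the two-threads point above, here is a rough sketch of why a second hardware thread helps cover memory latency. It is a toy model, not a description of how Xenon actually schedules its threads: `core_utilization` and all the cycle counts are made up for illustration.

```python
def core_utilization(threads, work_cycles, stall_cycles):
    """Fraction of cycles the core does useful work.

    Toy model (an assumption, not a real pipeline): each thread computes
    for work_cycles, then waits stall_cycles on memory. While one thread
    waits, another thread can use the core.
    """
    busy = threads * work_cycles
    period = work_cycles + stall_cycles
    return min(1.0, busy / period)

# Hypothetical figures, not measurements of any real chip.
for stall in (10, 30):
    one = core_utilization(1, 10, stall)
    two = core_utilization(2, 10, stall)
    print(f"stall={stall}: 1 thread {one:.2f}, 2 threads {two:.2f}")
```

With these made-up inputs the second thread doubles utilization in both cases, but it only hides the stall completely when the stall is no longer than the work done between stalls.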
If in-order CPUs had any kind of power/performance edge over OOO CPUs we'd see laptops with in-order CPUs in them.
But really low power devices like Phones and PDAs all use in-order processors. High throughput chips like GPUs (which completely destroy CPUs) are also in-order.
As I said, OOO is useful for certain types of workload.
What complicates matters is OOO is also a useful band-aid for designs which have small numbers of registers, e.g. x86. Without OOO you'd lose all the rename registers and performance would likely plummet (see VIA C3 benchmarks). In that case OOO probably does save power since it's boosting performance so much.
However, PowerPC has always had 32 registers, so it doesn't need OOO quite so much and OOO doesn't have as much of an effect. According to IBM's figures OOO only boosts performance by 30-40%.
That said the PPE is NOT a pure in-order machine, it does OOO loads...
Historically, almost all if not all in-order cores have been outperformed by OOOE equivalents.
If history goes back to 1998, yes. Before that the in-order Alpha was outgunning the out-of-order PA-RISC. Before the Alpha the fastest CPUs were all huge multi-chip things, the fastest of the fast being Cray's machines: all high-clocked in-order designs, all beating the living S**t out of IBM's OOO mainframes.
The savings from going in-order are vastly exaggerated. Your physical register file will have to be the same size as a renaming register file. Instruction caches will have to be bigger in order to get the same hit rates/performance because of inlining and unrolling bloat. That leaves the ROB, which was <10% of the core in PPRO, P3 and Pentium M, and is less than 8% in K8 (all of the schedulers combined).
It's also incredibly complex and needs to run very fast, i.e. it gets hot. 8% of a die may not sound like that much, but consider that more than half of the die is taken up by cache, and the cache only uses a few percent of the CPU's power budget. Being small doesn't mean it's not a potential problem.
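To put rough numbers on that argument: the 8% area figure is from the thread, but the 50% cache area and 5% cache power below are assumed stand-ins for "more than half" and "a few percent", and `power_share` is a hypothetical helper, not anyone's published model.

```python
def power_share(block_area, cache_area, cache_power, relative_density=1.0):
    """Estimate a logic block's share of total chip power.

    All arguments are fractions of the whole die (area) or of the whole
    power budget. relative_density scales the block against the average
    non-cache logic (1.0 = exactly average, 2.0 = twice as hot per mm^2).
    """
    logic_area = 1.0 - cache_area     # die area that is not cache
    logic_power = 1.0 - cache_power   # power that is not burned in cache
    return block_area / logic_area * logic_power * relative_density

# Assumed inputs: 8% of the die, cache = 50% of area and 5% of power.
average = power_share(0.08, 0.50, 0.05)
hot = power_share(0.08, 0.50, 0.05, relative_density=2.0)
print(f"average density: {average:.1%}, twice average: {hot:.1%}")
```

Under those assumptions an 8% block already accounts for about 15% of the power at average logic density, and about 30% if it runs twice as hot per unit area as the rest of the logic.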
Your statement is misleading. You imply that IPC for an in-order PC CPU would have remained constant at current clock speeds. That would not be the case because memory latencies (measured in CPU cycles) have increased significantly. So the IPC of an in-order CPU would be lower than it was 10 years ago. My guess is that the IPC of modern OOO x86 CPUs is somewhere around 3-5x that of a comparable in-order x86 CPU, maybe even more for memory intensive applications.
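The latency argument can be sketched with the usual stall-cycle model, CPI = base CPI + misses per instruction x miss penalty. This assumes a simple blocking core that stalls for the whole miss; the miss rate and penalties are invented for illustration and `effective_ipc` is a hypothetical name.

```python
def effective_ipc(base_ipc, misses_per_instr, miss_penalty_cycles):
    """IPC of a core that stalls completely on every cache miss.

    Simplifying assumption: no overlap between misses and other work,
    which is roughly the worst case for a blocking in-order design.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = misses_per_instr * miss_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)

# Same hypothetical core and miss rate; only the miss penalty (in cycles)
# grows, as it does when clock speed rises faster than DRAM latency falls.
older = effective_ipc(base_ipc=1.0, misses_per_instr=0.01, miss_penalty_cycles=20)
newer = effective_ipc(base_ipc=1.0, misses_per_instr=0.01, miss_penalty_cycles=200)
print(f"older: {older:.2f} IPC, newer: {newer:.2f} IPC")
```

With these numbers the same core drops from about 0.83 to about 0.33 IPC purely because each miss costs more cycles. How much of that an OOO core wins back depends on how many misses it can overlap, which this sketch does not model.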
IPC is generally limited by code, not the hardware. The average IPC you can extract from code is around 2, exactly what the PPE and SPEs were designed for. In reality, however, IPC is usually lower.
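One way to see what "limited by code" means: the data dependencies in a piece of code set a ceiling of instruction count divided by the longest dependency chain, no matter how wide the machine is. The sketch below assumes unit latency for every instruction and unlimited execution units; `ilp_limit` and the example instruction mix are made up for illustration.

```python
def ilp_limit(deps):
    """Upper bound on IPC for a block of code.

    deps maps each instruction name to the list of instructions whose
    results it needs. Assumes every instruction takes one cycle and the
    machine is infinitely wide, so only the dependencies limit it.
    """
    depth = {}

    def level(instr):
        # Earliest cycle this instruction can finish in.
        if instr not in depth:
            depth[instr] = 1 + max((level(d) for d in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(level(i) for i in deps)
    return len(deps) / critical_path

# Hypothetical six-instruction block: three loads feeding a short chain.
block = {
    "a": [],          # load
    "b": [],          # load
    "c": ["a", "b"],  # c = a + b
    "d": ["c"],       # d = c * 2
    "e": [],          # load
    "f": ["d", "e"],  # f = d + e
}
print(ilp_limit(block))
```

Here the chain a -> c -> d -> f is four instructions long, so six instructions can never finish in fewer than four cycles under these assumptions. A fully serial chain gives a limit of 1 regardless of issue width.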
This sounds very strange as IBM has plenty of OOO cores to pick from if OOO was really that important to MS.
440, 750, 970, POWER5 and probably several others besides. If OOO was that important they would have got it.
OOO was dropped because of space and power concerns, and because the workload (SIMD floating point) doesn't benefit from it much, if at all.
From what I've read Xenon and Cell were developed by entirely separate teams.
Yes, but the PPE/Xenon's integer core was from an older project and was a plug-in they could both use.