Except that you can't even mention "conservative" and Northwood in the same sentence.
The P4 line was an engineering freakshow and might as well have been put together by magic. Intel spent billions and billions of dollars to get the P4 line to the clock speeds it reached, until they finally hit the genuine clock speed brick wall with Prescott. IBM can't do that, or even think about doing that. They don't have a tiny fraction of the kind of resources Intel threw at the P4 line.
The Pentium 4 Northwood core had a pipeline roughly as long as the PPE's.
It was probably 2-3 times wider internally than the PPE.
The P4 had a lot of custom circuitry work, and its pipeline was designed to hit high clock speeds.
The PPE has a similar philosophy, but to an even greater degree. IBM's been harping on its great circuit design techniques that enable high clock speeds.
The P4 hit 3.4 GHz at 130nm (3.8 GHz only came later, with Prescott at 90nm).
IBM with the PPC 970 at 130 nm hit 2.2 GHz. That core was roughly twice as wide as the P4, and it had a shorter pipeline.
Are you trying to say that a more conservative OoO core--narrower than the P4, say half as wide, less aggressive, yet given the exact same long pipeline as the PPE and made at 90nm--couldn't hit 3+ GHz?
Totally agreed, except for the "conservative" OoO CPU. Not going OoO was in their best interest for the Xbox 360.
My argument is that there is no technical reason why it couldn't be done. Microsoft's more pressing constraints were the earlier release date and their price priorities.
Those are not technical reasons.
edit:
Is there much point to conservative OoO though? If it's slimmed down, its gains will be too, perhaps to the point of not benefiting much. We hear Xenon doesn't have the greatest implementation of features like branch prediction. To create a proper, well-rounded OoO processor that actually benefits from the OoO features, you'd be looking at bigger cores. I'd be surprised at more than dual-core in that case, which, if games do become vector heavy, would put the CPU at a considerable disadvantage.
It's difficult to say. The problem with finding examples is that no major designs that introduced OoOE had matching in-order counterparts.
Every time a manufacturer transitioned to an OoO core, it also widened the chip, upped the cache, and added a lot of other complex features.
The Pentium vs Pentium Pro is an example of this.
4.5 million transistors for the first, and 5.5 for the second.
With 32 KB of cache for the first and 16 KB for the second (at 6 transistors per bit, I'm guesstimating something like 1.5 million transistors in cache for the Pentium, and half that for the Pro), the logic section is about 3 million for the Pentium and roughly 4.7 million for the PPro.
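For anyone who wants to check that back-of-the-envelope estimate, here is a quick Python sketch of it. The function names are my own, and it only counts the cache data arrays at 6 transistors per bit (no tags or other overhead), so treat the results as rough.

```python
BITS_PER_BYTE = 8
TRANSISTORS_PER_BIT = 6  # assumption from the post: plain 6T SRAM cells


def cache_transistors(cache_kb):
    """Transistors in the cache data array only (tags and overhead ignored)."""
    return cache_kb * 1024 * BITS_PER_BYTE * TRANSISTORS_PER_BIT


def logic_transistors(total_transistors, cache_kb):
    """Whatever is left once the estimated cache transistors are subtracted."""
    return total_transistors - cache_transistors(cache_kb)


# Totals and cache sizes as quoted in the post.
pentium_logic = logic_transistors(4_500_000, 32)
ppro_logic = logic_transistors(5_500_000, 16)
ratio = ppro_logic / pentium_logic

print(f"Pentium logic: {pentium_logic / 1e6:.2f}M transistors")
print(f"PPro logic:    {ppro_logic / 1e6:.2f}M transistors")
print(f"PPro / Pentium logic ratio: {ratio:.2f}")
```

Run as-is, it gives roughly 2.9 million logic transistors for the Pentium, 4.7 million for the PPro, and a ratio of about 1.6.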
The PPro's logic is ~1.5 times larger than the Pentium's. It also tended to get 1.5x+ the performance in SPEC95.
The gains were variable.
However, at the same time, the Pentium Pro significantly widened the core and added a complex decoding scheme.
For an ISA not as burdened by CISC decoding and implemented less aggressively, I feel a good portion of the low-hanging fruit could be captured without expanding the core as much as the PPro did.