Is the legend about OoOE CPUs true?

Urian

The legend says that an in-order CPU can reach at most 33% of the maximum performance of an out-of-order CPU.

Is this true, or just shit-talking against Xenon and Cell?
 
Of course there is no such fixed "maximum performance" difference. You can have a tight loop that runs very well on an in-order CPU, or very branchy code with random memory accesses that runs like a dog on one. It all depends on what you do.
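
For instance, here's a minimal sketch of that contrast in C (the functions are illustrative, not from any real codebase):

```c
#include <stddef.h>

/* Tight loop: sequential access, independent iterations. A compiler can
 * schedule this well even for an in-order core. */
long sum_array(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

struct node { long value; struct node *next; };

/* Pointer chase: each load must complete before the next address is even
 * known. If the nodes are scattered in memory this runs like a dog, and
 * neither OoO hardware nor static scheduling can extract parallelism
 * from the serial dependence chain. */
long sum_list(const struct node *p) {
    long s = 0;
    while (p) {
        s += p->value;
        p = p->next;
    }
    return s;
}
```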

Out-of-order execution is a way to hide latency: the core tries to run something useful instead of waiting behind a blocking instruction. In-order CPUs can't do this, so they need another latency-hiding mechanism, such as multi-threading, to reach similar performance.
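
Here's a rough software analogue of that latency hiding, assuming two independent work streams (hardware multi-threading does the same interleaving at the pipeline level, with a register set per thread):

```c
struct node { long value; struct node *next; };

/* Walk two independent lists in lockstep. While one chain is stalled on
 * a cache miss, the loads and adds of the other chain are independent
 * work that can still proceed, which is the same trick OoO hardware and
 * SMT play automatically. */
long sum_two_lists(const struct node *a, const struct node *b) {
    long sa = 0, sb = 0;
    while (a && b) {
        sa += a->value;
        sb += b->value;
        a = a->next;
        b = b->next;
    }
    while (a) { sa += a->value; a = a->next; }  /* drain leftovers */
    while (b) { sb += b->value; b = b->next; }
    return sa + sb;
}
```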
 
Urian said:
The legend says that an in-order CPU can reach at most 33% of the maximum performance of an out-of-order CPU.

Is this true, or just shit-talking against Xenon and Cell?

Engineers working on the DEC Alpha pegged the general performance improvement in going OoO at about 50%.

If code can be statically scheduled to match the underlying resources of a processor, then an in-order design can match or better an OoO processor, everything else being equal.
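
To make that concrete, a sketch of static scheduling by hand: a single accumulator chains every add behind the previous one, so an in-order core eats the full FP-add latency on every iteration, while independent accumulators keep the pipeline busy with no OoO hardware at all. (The unroll factor of four is an assumption; the right count matches the add latency and issue width of the actual core.)

```c
#include <stddef.h>

float dot_product(const float *x, const float *y, size_t n) {
    /* Four independent dependence chains instead of one. */
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)  /* tail elements */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```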

If the software is too difficult or too expensive to optimize at that low level, then the OoO processor will probably do better.

It's all a trade-off. If the task is highly regular and predictable, an in-order core can be given highly optimized code and possibly be clocked higher, since the scheduling hardware in an OoO design is a serious drag on cycle times.

However, tasks that are harder to profile well in software, due to branches or memory accesses that are hard or impossible to predict at compile time, will lose out on an in-order core.

It gets worse if unpredictable memory latency is involved. OoO processors tend to be more tolerant of a first-level cache miss that stays on-chip.
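
When the access pattern is predictable, in-order code can fight back with software prefetching. This sketch uses GCC/Clang's __builtin_prefetch; the distance of 16 elements ahead is a made-up tuning value:

```c
#include <stddef.h>

void scale(float *a, size_t n, float k) {
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Request the line 16 elements ahead for writing; prefetches
         * past the end of the array are harmless hints. */
        __builtin_prefetch(&a[i + 16], 1, 1);
#endif
        a[i] *= k;
    }
}
```

This only works because the next addresses are known well in advance, which is exactly the predictability an in-order design depends on.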

Programmers have relied heavily on OoO because its greatest performance improvements come on relatively unoptimized and legacy code. It's a problem for chip designers, who have to make the extra hardware run fast.

It's different for every task.

Is it predictable/profilable?
Is it branchy?
Can it fit in cache?
Have compilers/tools advanced enough to optimize it well?
Do you have the time to hand-tune it?
Do you have the money to heavily optimize?

Media tasks, graphics especially, can run very well on optimized, statically scheduled code. This is why Cell packs so much FP power into its in-order SPEs.
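
A toy example of the kind of loop that suits them, in plain C: fixed trip count, streaming access, no data-dependent branches, independent iterations, so a compiler (or a programmer) can unroll and software-pipeline it completely at build time.

```c
#include <stddef.h>

/* Blend two pixel/sample buffers: out = a + t*(b - a). */
void lerp_buffers(float *out, const float *a, const float *b,
                  size_t n, float t) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + t * (b[i] - a[i]);
}
```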

Gameplay code like AI is difficult to schedule well, which is why some programmers are unhappy about the PPE and Xenon cores.
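
A hypothetical illustration of why (none of these names come from a real engine): which branch gets taken depends on runtime game state, and the dependent pointer loads defeat prefetching, so the compiler can't schedule around the stalls.

```c
struct entity {
    int health;
    int state;              /* 0 = IDLE, 1 = FLEE, 2 = ATTACK */
    struct entity *target;  /* scattered somewhere in memory */
};

void think(struct entity *e) {
    if (e->health < 20) {                  /* unpredictable branch */
        e->state = 1;                      /* FLEE */
    } else if (e->target &&
               e->target->health > 0) {    /* dependent load, then branch */
        e->state = 2;                      /* ATTACK */
    } else {
        e->state = 0;                      /* IDLE */
    }
}
```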

It may not be impossible to schedule such code well in software, but nobody has lengthened development times to match the extra effort.
 