For Core 2:
2.4.3 Execution Core
The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of
order. When a dependency chain causes the machine to wait for a resource (such as a second-level data
cache line), the execution core executes other instructions. This increases the overall rate of instructions
executed per cycle (IPC).
The execution core contains the following three major components:
• Renamer — Moves micro-ops from the front end to the execution core. Architectural registers are
renamed to a larger set of microarchitectural registers. Renaming eliminates false dependencies
known as write-after-write and write-after-read hazards.
• Reorder buffer (ROB) — Holds micro-ops in various stages of completion, buffers completed micro-ops,
updates the architectural state in order, and manages ordering of exceptions. The ROB has 96
entries to handle instructions in flight.
• Reservation station (RS) — Queues micro-ops until all source operands are ready, schedules and
dispatches ready micro-ops to the available execution units. The RS has 32 entries.
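To make the renamer's role concrete, here is a minimal sketch of register renaming in Python. It is a toy model, not the actual hardware algorithm: the instruction format (destination followed by sources), the function name `rename`, and the `p<N>` physical register names are all assumptions made for illustration.

```python
from itertools import count

def rename(instructions, arch_regs):
    """Map (dest, src, src, ...) tuples onto fresh physical registers (toy model)."""
    mapping = {r: i for i, r in enumerate(arch_regs)}  # architectural -> physical id
    fresh = count(len(arch_regs))                      # next unused physical id
    renamed = []
    for dest, *srcs in instructions:
        # Sources are looked up first, so they see the most recent producer.
        phys_srcs = tuple(f"p{mapping[s]}" for s in srcs)
        # Every write receives a brand-new physical register.
        mapping[dest] = next(fresh)
        renamed.append((f"p{mapping[dest]}", *phys_srcs))
    return renamed

program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r2", "r1", "r3"),  # overwrites r2, which the first instruction still reads
    ("r1", "r3", "r3"),  # reuses r1 as a destination, unrelated to the first r1
]
renamed = rename(program, ["r1", "r2", "r3"])
for uop in renamed:
    print(uop)
```

After renaming, every destination is a distinct physical register, so the only ordering constraint left is the true dependency of the second instruction on the result of the first.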
The initial stages of the out-of-order core move the micro-ops from the front end to the ROB and RS. In
this process, the out-of-order core carries out the following steps:
• Allocates resources to micro-ops (for example, load or store buffers).
• Binds the micro-op to an appropriate issue port.
• Renames sources and destinations of micro-ops, enabling out-of-order execution.
• Provides data to the micro-op when the data is either an immediate value or a register value that has
already been calculated.
For Nehalem specifically:
The IDQ (Figure 2-11) delivers the micro-op stream to the allocation/renaming stage (Figure 2-10) of the
pipeline. The out-of-order engine supports up to 128 micro-ops in flight. Each micro-op must be allocated
the following resources: an entry in the reorder buffer (ROB), an entry in the reservation
station (RS), and a load/store buffer if a memory access is required.
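The allocation rule can be sketched as a simple resource check. This is an illustrative model only: the class name `Allocator` and its fields are hypothetical, the ROB size is taken from the 128 in-flight micro-op limit, the RS size is the 36 entries quoted in this section, and the load/store buffer count is an arbitrary placeholder that does not come from the text.

```python
from dataclasses import dataclass

@dataclass
class Allocator:
    rob_free: int = 128         # from the 128 micro-ops in flight stated in the text
    rs_free: int = 36           # RS depth stated in this section
    mem_buffers_free: int = 8   # hypothetical load/store buffer count, not from the text

    def allocate(self, needs_memory: bool) -> bool:
        """Reserve resources for one micro-op; return False if allocation must stall."""
        if self.rob_free == 0 or self.rs_free == 0:
            return False
        if needs_memory and self.mem_buffers_free == 0:
            return False
        self.rob_free -= 1
        self.rs_free -= 1
        if needs_memory:
            self.mem_buffers_free -= 1
        return True

alloc = Allocator()
# Try to allocate 40 non-memory micro-ops while nothing dispatches or retires.
accepted = sum(alloc.allocate(needs_memory=False) for _ in range(40))
print(accepted, alloc.rob_free)
```

In this model the RS fills long before the ROB does, because a micro-op holds its RS entry only until dispatch but holds its ROB entry until retirement. The sketch never frees entries, so it shows only the stall condition.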
The allocator also renames the register file entry of each micro-op in flight. The input data associated
with a micro-op are generally read either from the ROB or from the retired register file.
The RS is expanded to 36 entries (compared to 32 entries in the previous generation). It can dispatch up
to six micro-ops in one cycle if the micro-ops are ready to execute. The RS dispatches a micro-op through
an issue port to a specific execution cluster; each cluster may contain a collection of integer/FP/SIMD
execution units.
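The dispatch limit can be illustrated with a short sketch of one scheduling cycle. It is a simplification under stated assumptions: the function name `dispatch_cycle` and the (name, sources) micro-op format are invented for illustration, selection is oldest-first, and the model ignores which issue port each micro-op is bound to, capturing only the limit of six per cycle.

```python
def dispatch_cycle(reservation_station, ready_regs, width=6):
    """Select up to `width` micro-ops whose sources are all ready, oldest first."""
    dispatched, waiting = [], []
    for uop in reservation_station:
        name, sources = uop
        if len(dispatched) < width and all(s in ready_regs for s in sources):
            dispatched.append(name)
        else:
            waiting.append(uop)  # not ready, or the cycle's dispatch width is used up
    return dispatched, waiting

# Seven micro-ops wait on p1 (already computed); one waits on p9 (not yet computed).
rs = [(f"u{i}", ("p1",)) for i in range(7)] + [("u7", ("p9",))]
dispatched, waiting = dispatch_cycle(rs, ready_regs={"p1"})
print(dispatched)
print(waiting)
```

Here `u6` is ready but stays in the RS because six micro-ops were already selected this cycle, while `u7` stays because its source operand is not yet available.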
The result from the execution unit executing a micro-op is written back to the register file, or forwarded
through a bypass network to an in-flight micro-op that needs the result.