The chip itself is a three-way SMP PowerPC with specialised-function VMX extensions and two threads per core. It has 1MB of L2 cache and a 21.6GB/s FSB. It has 165M transistors and is built on IBM's 10KE 90nm SOI process.
The L1 Icache is 32K, 2-way set associative, with a 128-byte cache line. The core can issue 2 instructions per clock, in order, but can delay execution to cover load-to-use delays. The chip, still somewhat unnamed, has 2 fixed-point units with a 2-cycle op latency. The Dcache is also 32K but is 4-way set associative and non-blocking. The FPU is combined with the VMX unit and can also handle two threads.
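To make the cache figures concrete, here is a rough sketch of the set counts they imply. The function and constant names are made up for illustration, and the Dcache line size is an assumption: the article gives 128 bytes only for the Icache.

```python
def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets = capacity / (associativity * line size)."""
    lines = size_bytes // line_bytes   # total cache lines
    return lines // ways               # lines are grouped into sets of `ways`

# Icache: 32K, 2-way, 128-byte lines (all three figures are from the article).
ICACHE_SETS = cache_sets(32 * 1024, ways=2, line_bytes=128)

# Dcache: 32K, 4-way (from the article).
# ASSUMPTION: 128-byte lines, same as the Icache; the article does not say.
DCACHE_SETS = cache_sets(32 * 1024, ways=4, line_bytes=128)

print("Icache sets:", ICACHE_SETS)
print("Dcache sets:", DCACHE_SETS)
```

Under those assumptions the Icache has 128 sets and the Dcache 64, so the higher associativity of the Dcache comes at the cost of half as many sets for the same capacity.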
The full pipeline is an 11 FO4 design, with a 10-cycle scalar DP FPU latency, a 2-cycle load latency, 4 cycles for simple VMX ops and 14 for VMX dot products. This matters because of the chip's target, gaming. The VMX extensions are going to be heavily used here, and part of the MS mods was upping the number of VMX registers from 32 to 128. The mods also add Direct3D pack and unpack instructions.
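A back-of-the-envelope sketch of why the larger register file plausibly matters on an in-order core with these latencies. This is not from the article: it assumes a fully pipelined VMX unit accepting one op per cycle and three distinct registers per op (two sources, one destination), and the helper names are hypothetical. The only hard fact used beyond the article is that VMX registers are 128 bits wide.

```python
VMX_REGISTER_BITS = 128  # VMX/AltiVec vector registers are 128 bits wide

def register_file_bytes(num_registers, bits=VMX_REGISTER_BITS):
    """Size of one architectural vector register file, in bytes."""
    return num_registers * bits // 8

def registers_in_flight(latency_cycles, regs_per_op=3, issue_per_cycle=1):
    """Registers tied up while keeping the pipeline full.

    ASSUMPTION: the unit is fully pipelined, so hiding the latency takes
    latency * issue_rate independent ops, each using `regs_per_op`
    distinct registers (two sources and one destination).
    """
    independent_ops = latency_cycles * issue_per_cycle
    return independent_ops * regs_per_op

print("32 registers :", register_file_bytes(32), "bytes")
print("128 registers:", register_file_bytes(128), "bytes")
print("simple VMX (4 cycles)  :", registers_in_flight(4), "registers")
print("dot product (14 cycles):", registers_in_flight(14), "registers")
```

With these assumptions, covering the 14-cycle dot product latency ties up 42 registers, more than the original 32 but comfortably within 128. Real code reuses operands, so treat the number as an upper-end illustration rather than a measurement.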
http://www.theinquirer.net/?article=27221