orionthehunter
Newcomer
Either way, the compiler/programmer has a pretty big load/use slot to fill on either chip. Of course, the XeCPU has two threads to hide some of the latency if needed.
Wait, doesn't this:
(From the IBM paper I linked earlier; I included the cycle times for anyone interested.)
"Up to two instructions are issued per cycle; one issue slot supports fixed- and floating-point operations and the other provides loads/stores and a byte permutation operation as well as branches. Simple fixed-point operations take two cycles, and single-precision floating-point and load instructions take six cycles. Two-way SIMD double-precision floating point is also supported, but the maximum issue rate is one SIMD instruction per seven cycles. All other instructions are fully pipelined."
accomplish the same kind of thing for the SPE that two threads do for the XeCPU, as far as latency is concerned?
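To make the question concrete, here is a rough toy model, not a description of how either chip actually schedules. It assumes a hypothetical in-order core that issues at most one instruction per cycle, where every instruction depends on the previous one in its own chain and takes the six-cycle latency quoted above for loads. The function name `cycles_to_finish` and the idea of "chains" are made up for illustration; a chain could stand for a hardware thread or for independent work the compiler finds within one thread. Dual issue is ignored.

```python
def cycles_to_finish(num_chains, chain_length, latency):
    """Toy in-order model: one issue per cycle, fixed priority among chains.

    Each chain is a sequence of dependent instructions, so a chain's next
    instruction cannot issue until `latency` cycles after its previous one.
    Returns the cycle at which the last result becomes available.
    """
    ready_at = [0] * num_chains             # earliest cycle each chain may issue
    remaining = [chain_length] * num_chains
    cycle = 0
    finish = 0
    while any(remaining):
        for c in range(num_chains):
            if remaining[c] and ready_at[c] <= cycle:
                remaining[c] -= 1
                ready_at[c] = cycle + latency
                finish = max(finish, cycle + latency)
                break                        # only one issue slot in this model
        cycle += 1
    return finish


LOAD_LATENCY = 6  # cycle count taken from the quoted paper

one = cycles_to_finish(1, 4, LOAD_LATENCY)   # a single dependent chain
two = cycles_to_finish(2, 4, LOAD_LATENCY)   # two chains, e.g. two threads
six = cycles_to_finish(6, 4, LOAD_LATENCY)   # enough chains to cover the latency
print(one, two, six)
```

In this simplified model a second chain roughly doubles the work done in about the same time, but the issue slot only stays busy once there are about as many independent chains as cycles of latency. Whether those chains come from a second hardware thread or from independent instructions scheduled within one thread is exactly the distinction the question is about.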