SIMD Vector Processor Comparison
PlayStation 3 - Cell
- Developed by IBM, Sony, and Toshiba - mostly IBM, in Austin, Texas
- Based on the Power Architecture, with a 64-bit PowerPC main core (the PPE)
- 7 SPUs, each with a 128-bit floating-point multiply-add unit
- 256 Kbytes of local memory for each SPU
- 1 PowerPC-based main core
- 512 Kbytes of L2 cache, connected to the 7 SPUs via a token-ring-style bus (the EIB), which the SPUs also use to communicate with each other
- About 200 GB/s of peak bandwidth on the EIB
- 3.2 GHz clock rate
- 230 million transistors
- $400 million to develop, $100 per chip to produce
- 4-way SIMD
PlayStation 4 - Vector Co-Processor
- Developed by AMD and me (just a joke)
- Based on the x86 architecture
- 1 512-bit floating-point multiply-add unit
- 1.6 GHz clock rate
- 512 Kbyte local buffer, plus the CPU core's main buffer
- Connected to the bus directly via a bus-interface unit, and to a CPU core
- About 300 million transistors
- $0 to develop, $80 to produce
- 307.2 GB/s from buffer to execution unit
- 16-way SIMD
- 8 x86-based cores, each with a 128-bit FP unit and L2 cache, plus local memory shared between pairs of CPUs and between all 8
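The SIMD widths and the 307.2 GB/s figure in the lists above can be reproduced with a little arithmetic. This is only a rough sketch: it assumes 32-bit single-precision floats, and it assumes the buffer feeds three 512-bit operands (a, b and c of a * b + c) to the execution unit every cycle. The post does not say how the figure was derived, and the function names are made up for illustration.

```python
# Rough arithmetic behind the spec lists. Assumptions (not stated in the
# original post): 32-bit floats, and three 512-bit source operands read
# from the buffer every cycle for one multiply-add (a * b + c).

FLOAT_BITS = 32  # assumed single-precision lanes


def simd_lanes(vector_bits: int) -> int:
    """Number of 32-bit float lanes that fit in one vector register."""
    return vector_bits // FLOAT_BITS


def operand_bandwidth_gb_s(vector_bits: int, clock_ghz: float, operands: int = 3) -> float:
    """Bytes read per cycle times clock rate, in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_cycle = operands * vector_bits // 8
    return bytes_per_cycle * clock_ghz


if __name__ == "__main__":
    print(simd_lanes(128))  # Cell SPU: 4-way SIMD
    print(simd_lanes(512))  # Vector Co-Processor: 16-way SIMD
    print(round(operand_bandwidth_gb_s(512, 1.6), 1))  # 307.2
```

Under these assumptions, 3 operands x 64 bytes x 1.6 GHz gives exactly the 307.2 GB/s listed.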
Doesn't seem as powerful as Cell, does it? In fact, it's way, way more powerful. I just found this an interesting comparison. This design is more powerful because the CPUs do not need direct developer intervention to perform well, whereas Cell only delivered when developers coded specifically for it.
16-way SIMD makes it 4 times faster per clock than a 4-way SPU on width alone. Take 16 FP multiply-adds. On the Cell, an SPU needs 8 cycles: 4 load-stores, followed by 4 multiply-adds. The Vector Co-Processor needs 1 cycle, because the out-of-order, 512-bit load-stores fetch the data while the Vector Co-Processor is processing the previous data. Thus, per clock, the Vector Co-Processor alone is equal to 8 SPUs.
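The cycle counts above follow from a very simple model, sketched here. It assumes, as the post does, that an SPU issues its loads and its multiply-adds one after the other, while the co-processor's out-of-order loads are fully hidden behind the arithmetic. The function name and the `overlap_loads` flag are hypothetical, and real hardware is not this tidy.

```python
import math


def cycles_for_madds(n_madds: int, lanes: int, overlap_loads: bool) -> int:
    """Cycles for n_madds float multiply-adds under the post's toy model.

    Each vector instruction covers `lanes` elements. Every vector
    multiply-add needs one vector load, which costs an extra cycle
    unless it overlaps with the arithmetic.
    """
    vector_ops = math.ceil(n_madds / lanes)
    load_cycles = 0 if overlap_loads else vector_ops
    return load_cycles + vector_ops


if __name__ == "__main__":
    spu = cycles_for_madds(16, lanes=4, overlap_loads=False)   # 4 loads + 4 madds
    vcp = cycles_for_madds(16, lanes=16, overlap_loads=True)   # loads hidden
    print(spu, vcp, spu // vcp)  # 8 1 8
```

The ratio is per clock only. The listed clock rates (3.2 GHz for Cell, 1.6 GHz for the co-processor) are not factored in.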