Update from San Francisco
First post is on SPU, next will be on overall CELL.
Presentations haven't happened yet, but here is some stuff from the conference proceedings (which anyone can buy as of this morning):
In the SPU paper they can't seem to make up their mind on the name. It's called an SPU (streaming processor unit) and also an SPE (synergistic processor element), and then in the overall CELL paper the 8 little boxes in the block diagram are labelled SXU. Seriously, I didn't make that second one up. The last one, I think, actually refers to the interconnect mechanism to the rest of the chip.
The core area of one SPU/SPE (of which there are 8 on the chip) is 2.5 x 5.81 mm (about 14.5 mm2) in 90nm.
Each SPU has 256KB of local SRAM which is not part of the system address space (referred to as "untranslated, unguarded and non-coherent"). There is a DMA unit per SPU to manage background transfers to/from system memory space (with MMU). There can be up to 16 pending DMA requests, each of up to 16KB.
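To make those limits concrete, here is a rough toy model in plain C, not real SPU code. It assumes the per-request limit means 16 kilobytes, uses memcpy as a stand-in for the DMA hardware, and every name in it (spu_model, dma_enqueue, dma_drain) is made up for illustration; the paper gives no programming interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    LOCAL_STORE_BYTES = 256 * 1024, /* local SRAM per SPU, from the paper */
    DMA_QUEUE_DEPTH   = 16,         /* max pending DMA requests */
    DMA_MAX_BYTES     = 16 * 1024   /* max size of one request (assumed to be KB) */
};

/* One queued transfer from "system memory" into the local store (hypothetical layout). */
typedef struct {
    const uint8_t *src;       /* source address in system memory */
    size_t         ls_offset; /* destination offset inside the local store */
    size_t         len;       /* bytes to move */
} dma_request;

typedef struct {
    uint8_t     local_store[LOCAL_STORE_BYTES];
    dma_request queue[DMA_QUEUE_DEPTH];
    int         pending;      /* number of requests currently queued */
} spu_model;

/* Queue one transfer. Returns 0 on success, -1 if it breaks one of the limits. */
static int dma_enqueue(spu_model *spu, const uint8_t *src, size_t ls_offset, size_t len)
{
    if (spu->pending >= DMA_QUEUE_DEPTH) return -1;   /* queue is full */
    if (len > DMA_MAX_BYTES) return -1;               /* request too large */
    if (ls_offset > LOCAL_STORE_BYTES ||
        len > LOCAL_STORE_BYTES - ls_offset) return -1; /* would run off the local store */
    spu->queue[spu->pending++] = (dma_request){ src, ls_offset, len };
    return 0;
}

/* Complete every pending transfer in order; memcpy stands in for the hardware. */
static void dma_drain(spu_model *spu)
{
    for (int i = 0; i < spu->pending; i++) {
        const dma_request *r = &spu->queue[i];
        memcpy(spu->local_store + r->ls_offset, r->src, r->len);
    }
    spu->pending = 0;
}
```

The point of the model is just that software has to chunk anything bigger than one request and keep track of how many transfers are in flight, since nothing pulls data into the local store behind your back.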
Each SPU has 128 128-bit registers. The text says there are both seven and eight execution units per SPU (doesn't anyone proofread their papers anymore?). There are fixed and floating point units, permute, some other stuff. Ask if you want details.
All data fetch and branch prediction is managed in software, i.e. you have to explicitly prefetch what you want when you want it. For branches, it mentions that "efficient S/W" manages them by replacing branches with bitwise select instructions, arranging common case code to be inline, and inserting branch hint instructions.
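For anyone who hasn't seen the bitwise select trick, here is a minimal sketch in plain portable C rather than SPU intrinsics. The function names (bit_select, max_u32) are mine, not from the paper, and the paper doesn't show code, so treat this as one way the idea can look rather than what their compiler actually emits.

```c
#include <stdint.h>

/* Bitwise select: takes bits from a where mask is 1, from b where mask is 0. */
static uint32_t bit_select(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Branch-free maximum of two unsigned values.
 * The comparison yields 1 or 0; subtracting it from 0 turns that into
 * an all-ones or all-zeros mask, which then drives the select. */
static uint32_t max_u32(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(a > b); /* 0xFFFFFFFF if a > b, else 0 */
    return bit_select(a, b, mask);
}
```

Both candidate values get computed and the mask picks one, so there is no branch to mispredict. That trade only pays off when both sides are cheap to compute, which is presumably why the paper also lists inlining the common case and branch hints as separate tools.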
They claim the SPU/SPE is programmable in C/C++ with intrinsics.
Clock rate ranges from 2-5 GHz over a voltage range of 0.9-1.3V, with power ranging from 1-11W.