Some thoughts that popped into my mind while browsing through this thread.
1.) Comparison of the PS2 VU and the PS3 FP APU: I don't expect these to differ as much in complexity as e.g. Deadmeat implies (for whatever reasons he has). First off, while the PS3 incarnation of a vector unit will feature more memory, logic for e.g. the PS2's coprocessor mode can be omitted. Also, I'd imagine that quite a lot of what is a VIF responsibility today might very well be relocated to the associated PE core in PS3. That's what I understand as the PE core's prime responsibility (naively said): serving as a communication endpoint for software cells and "dismantling" them, i.e. unpacking data & code, performing various binder/loader functions for its APUs, initiating code execution, etc.
2.) Two-chip solution: I am a bit sceptical of the two-chip version that many are expecting. It would make quite a bit of sense to me to go with a single-chip implementation where two PEs feature special APUs for setup, rasterization, shading, etc., while the remaining two PEs handle the application model. Such an approach would have the added benefit of being quite flexible with regard to geometry transformation / procedural texturing, memory use, general load balancing and general post-processing of the frame buffer, as on-chip bandwidth will be much larger than in a more traditional, separate implementation. IMO it also sounds more feasible with regard to the traditional price point of console systems, and it fits the references to an SOC implementation in all the official press releases (that I have read).
3.) Development costs: The $400 million figure, while a lot of money, is not an exceedingly large amount when compared to other high-performance, broad-scale, paradigm-introducing processor development efforts. E.g. figures thrown around for the development of IA-64 between Intel and HP (ISA, compiler(s) & the Merced implementation, but nothing production-related) range from high single-digit to low double-digit billion-dollar sums. Nvidia once revealed to EETimes that the nv20->nv25 development (including masking costs, though) cost in the range of $170 million (I'm too lazy to dig up the links now, but if you browse through the 3D technology forum you'll probably find lots of references to this).
4.) 4 GHz: The alleged 4 GHz design goal seems quite ambitious and a bit surprising to me. Attaining such short cycle times implies (IMO) quite lengthy pipelines for all of Cell's ALUs (even if fabbed at the 65nm node), which in turn implies more complex control. To maintain decent IPC (and therefore decent performance relative to Cell's theoretical specs, assuming a correspondingly distributed application), I'd imagine complex flow-control logic (branch prediction, out-of-order execution, etc.) might be required for general-purpose "spaghetti code". Still, I expect them to just swallow these inefficiencies: if they manage to sustain a five-percent real-world efficiency (50 Gops, i.e. five percent of a roughly 1 Tops theoretical peak), it'll be quite powerful for a 2005-2007 CPU.
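The numbers in point 4 can be checked with a bit of back-of-the-envelope arithmetic. This sketch only uses the figures quoted above (a 4 GHz target, five-percent efficiency, 50 Gops sustained); the roughly 1 Tops peak is simply what those two figures imply, not an official spec, and the extra efficiency levels in the loop are just illustrative.

```python
# Back-of-the-envelope numbers for point 4. Inputs are only the figures
# quoted in the post: 4 GHz target clock, 5% efficiency, 50 Gops sustained.

def cycle_time_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds: T = 1 / f, so with f in GHz, T = 1000 / f ps."""
    return 1000.0 / freq_ghz

def implied_peak_gops(sustained: float, efficiency: float) -> float:
    """Peak throughput implied by a sustained figure at a given fractional efficiency."""
    return sustained / efficiency

def sustained_gops(peak: float, efficiency: float) -> float:
    """Sustained throughput for a given peak and fractional efficiency."""
    return peak * efficiency

if __name__ == "__main__":
    print(f"cycle time at 4 GHz: {cycle_time_ps(4.0):.0f} ps")   # 250 ps
    peak = implied_peak_gops(50.0, 0.05)
    print(f"implied peak: {peak:.0f} Gops")                       # 1000 Gops
    # Illustrative only: how sustained throughput scales with efficiency.
    for eff in (0.05, 0.10, 0.25):
        print(f"{eff:.0%} efficiency -> {sustained_gops(peak, eff):.0f} Gops")
```

A 250 ps cycle leaves very little logic depth per stage, which is why such a clock target points towards long pipelines.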
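To make the PE-core role speculated about in point 1 a bit more concrete, here is a toy model of a PE core receiving a software cell, "dismantling" it into code and data, loading both into one APU's local memory and kicking off execution. Everything in it is hypothetical: the SoftwareCell/APU/PECore names, the cell layout, the 128 KB local-memory size, the "code first, data right after" placement and the eight APUs per PE are illustrative assumptions, not anything from an official spec.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL model: names, field layout and sizes are illustrative
# assumptions, not taken from any official Cell/PS3 documentation.
LOCAL_STORE_BYTES = 128 * 1024  # assumed per-APU local memory size

@dataclass
class SoftwareCell:
    target_apu: int  # index of the APU that should run this cell
    code: bytes      # program image for the APU
    data: bytes      # input data for that program

@dataclass
class APU:
    apu_id: int
    local_store: bytearray = field(
        default_factory=lambda: bytearray(LOCAL_STORE_BYTES))
    entry_point: int = -1   # -1 means "nothing loaded yet"
    running: bool = False

class PECore:
    """Toy PE core acting as the communication endpoint for software cells."""

    def __init__(self, num_apus: int):
        self.apus = [APU(i) for i in range(num_apus)]

    def dispatch(self, cell: SoftwareCell) -> APU:
        apu = self.apus[cell.target_apu]
        needed = len(cell.code) + len(cell.data)
        if needed > LOCAL_STORE_BYTES:
            raise MemoryError("software cell does not fit in APU local store")
        # "Dismantle" the cell: a trivial loader step that places the code
        # at address 0 and the data directly after it.
        apu.local_store[0:len(cell.code)] = cell.code
        apu.local_store[len(cell.code):needed] = cell.data
        apu.entry_point = 0
        apu.running = True  # stand-in for "initiate code execution"
        return apu

if __name__ == "__main__":
    pe = PECore(num_apus=8)  # assumed APU count per PE
    apu = pe.dispatch(SoftwareCell(target_apu=2, code=b"\x01\x02", data=b"\xff"))
    print(apu.apu_id, apu.running, bytes(apu.local_store[:3]))
```

The point of the sketch is only that unpacking, placement and start-up could live in the PE core, leaving the APU itself as a fairly plain compute engine, much like VIF work moving off the vector unit.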
I might be a little more sceptical about Cell than some others in this forum, but nevertheless I am really looking forward to seeing official info on its first implementations (and to having it show up beside my TV set someday...).