It seems to me a lot of people are missing the real argument here. It isn't really 'is OOOE better than IOE' or 'is SMP better than UMP' or 'is lots of cache better than little cache'. The question is whether we will get rid of IOE in future console processors, so that whatever multicore system you have, every core is OOOE.
Putting a better alternative to the PPE into Cell makes sense, and is something I'm sure everyone wants - but does improving Cell also mean adding OOOE to the SPEs? Or will future processors get rid of SPE-like processors altogether and go back to multi-core GP processors?
There's a lot of discussion here, but I think most of it isn't getting to the heart of the debate.
Which solution is best depends on the application.
1) File, web, and database servers:
Multiple symmetrical OOOE cores with lots of cache are the way to go. Servers are limited by I/O performance rather than CPU or FP performance. The workload relies on large stacks, caches, buffers, and other pointer-addressed data structures in RAM, which is exactly what the SPEs are rubbish at. The server also has to handle many independent connections and requests, which is most easily done by spawning new threads/processes and letting the OS schedule them on an SMP or NUMA architecture with many identical cores. There is no need for manual scheduling here; it is much easier to let the OS do it automatically.
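To illustrate the server case, here is a minimal sketch in Python of independent requests being handed to a pool of threads, with the OS left to decide which core runs each one. The names handle_request and serve are hypothetical, made up for this example, and the "request" is just an integer standing in for a connection.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id: int) -> str:
    # Hypothetical handler: each request is independent of all the others,
    # so no coordination between workers is needed.
    return f"response-{request_id}"


def serve(request_ids, workers: int = 4):
    # The pool hands requests to threads; the OS scheduler, not our code,
    # decides which core each thread runs on.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle_request, request_ids))


if __name__ == "__main__":
    print(serve(range(8)))
```

Note that nothing in the code says which worker gets which request. That is the point: with independent requests and identical cores, the placement can be left entirely to the runtime and the OS.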
2) HPC/supercomputing:
The Cell concept - one GP IOE core plus lots of asymmetric DSP cores - is ideal here. The problem needs to be forked out, and the results then need to be put together at various stages. For performance, the parallel processes therefore require close coordination, which requires manual scheduling. Using the OS to schedule processes as a means of distributing workload across multiple cores won't work well unless the processes can run independently. Once manual scheduling is required anyway, the advantage of easier programming with symmetric cores and automatic scheduling is lost. For hardware efficiency, Cell is the best approach.
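The fork-out-and-combine pattern described above can be sketched roughly as follows. This is only an illustration in Python with ordinary threads, not Cell or SPE code; the function name staged_sum_of_squares and the workload (summing squares) are assumptions chosen to keep the example short. What it does show is the manual part: the code itself decides which slice each worker gets and where the workers must wait for each other.

```python
import threading


def staged_sum_of_squares(data, n_workers: int = 4):
    # Manual scheduling: we decide up front which slice each worker owns.
    chunk = (len(data) + n_workers - 1) // n_workers
    partial = [0] * n_workers
    result = [0]
    barrier = threading.Barrier(n_workers)

    def worker(i: int) -> None:
        # Stage 1 (fork): each worker processes its own fixed slice.
        part = data[i * chunk:(i + 1) * chunk]
        partial[i] = sum(x * x for x in part)
        # Coordination point: nobody continues until every partial is ready.
        barrier.wait()
        # Stage 2 (combine): one designated worker puts the results together.
        if i == 0:
            result[0] = sum(partial)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    print(staged_sum_of_squares(list(range(10)), n_workers=3))
```

Because every worker has to reach the barrier before the combine step can run, the workers are not independent, which is why simply handing them to an OS scheduler gives no guarantee that they progress together.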
3) Games consoles:
The Cell concept wins here. Tight coordination is required between parallel processes, so automatic scheduling of processes by the OS can't be used. Hence there is no advantage in using symmetric cores - better to optimise each core for what it will be asked to do: GP code execution for the PPE, DSP-type applications for the SPEs. OOOE creates indeterminacy in timing, which is not particularly desirable in games or in tightly bound parallel code, so why bother with OOOE?
4) Desktop ix86 PC:
The optimum in terms of cost/performance is one OOOE core with lots of cache, to give the best possible Windows or Linux single-thread performance on non-optimised code (Windows OS and application code will always be generic, and for Linux on ix86 the same will be true of most distributions and applications), plus lots of SPEs to boost sound, media playback etc. and FP performance. Add to that a powerful GPU, maybe with SPEs to help with window and display management. AMD and Intel may be able to use the CPU-GPU-on-a-core concept to tailor the GPU and SPEs to complement each other for this. The only problem with this approach is that the SPEs, being asymmetric, can only be used to accelerate code that can be rewritten and optimised for the SPE architecture. For Windows that means only drivers and emulated devices can be accelerated; for Linux, drivers, emulated devices, and standard libraries. With Linux it is possible to accelerate any open source program if there is a need, but unless there is a standard ix86 architecture that includes a universal SPE architecture, who will bother for a small fraction of the market? Still, because certain things like media players and graphics are very speed critical, I think it is worthwhile even with only these accelerated.