I have read quite a few articles on the Internet about how badly Cell will perform compared with "normal" CPUs like Intel or AMD processors at general tasks like running an OS and things like AI, because of the lack of out-of-order execution in the PPE (the same applies to Xenon). The SPEs are criticised for lacking cache, out-of-order execution and branch prediction, supposedly making them unsuitable for AI as a consequence.
The PPE and Xenon cores are conventional processors with less cache than typical, but since most processor time is spent in a small number of small, tight loops, the resulting slowdown shouldn't be that large, and should be more than made up for by the code accelerated by the multi-core architecture of both Cell and Xenon.
Both the PPE and Xenon cores are in-order. This can be compensated for by using intelligent compiler technology to pre-optimise the order of execution at compile time. That won't work for binary OSes like Windows, which need to run code written for legacy x86 processors: Windows PCs must run binaries built for several generations of processor, which makes it impossible to rely on such compilers to make up for the lack of out-of-order execution. The vector (multimedia extension) features also differ between CPUs, so it is difficult to optimise for them across the range of processors Windows supports. However, for code compiled for one specific CPU (as on the PS3 or Xbox 360), an intelligent optimising compiler can take the place of out-of-order execution. The same applies to an OS like Apple OS X, which is machine specific, and to some extent to OSes like Linux, which exist as source code rather than binaries and so can easily be recompiled in a machine-specific way, especially with a source-based distribution like Gentoo Linux (the only problem being proprietary applications whose source code is not distributed and which are therefore not compiled for the specific CPU). It is certainly possible for a Sony Linux or a Microsoft Windows CE shipped with the PS3 or Xbox 360. A sketch of what this kind of compile-time scheduling looks like follows below.
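To make the idea of "pre-optimising the order of execution" concrete, here is a minimal sketch in plain C. The function names and the operation itself are made up for illustration; the second version simply shows the kind of unrolling and interleaving a scheduling compiler (or a careful programmer) performs when it knows the exact pipeline of an in-order core, so that independent work fills the latency slots instead of stalling.

#include <stddef.h>

/* Naive version: each iteration's multiply must finish before its add,
 * and the add before the store, so an in-order core stalls on the
 * latency of each operation in turn. */
void scale_add_naive(float *dst, const float *a, const float *b,
                     float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}

/* Scheduled version: the loop is unrolled and the independent
 * multiplies are all started before any of their results are needed,
 * hiding the multiply latency without any out-of-order hardware.
 * (Assumes n is a multiple of 4 for brevity.) */
void scale_add_scheduled(float *dst, const float *a, const float *b,
                         float k, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        float m0 = a[i]     * k;
        float m1 = a[i + 1] * k;
        float m2 = a[i + 2] * k;
        float m3 = a[i + 3] * k;
        dst[i]     = m0 + b[i];
        dst[i + 1] = m1 + b[i + 1];
        dst[i + 2] = m2 + b[i + 2];
        dst[i + 3] = m3 + b[i + 3];
    }
}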
As far as the SPEs are concerned, I can't understand how code written for the SPE units can run slower than on a conventional processor, unless you take typical procedural code written for conventional processors and try to run it on an SPE (i.e. load code into SPE local memory, use it only once, and then load the next segment of code). Nobody in their right mind would do that, would they? In most cases the code will be written to run within the local memory of the SPEs and to make use of DMA (including scatter lists to load data scattered across large expanses of main memory), using branch hints, careful coding and a stream-processing methodology to keep loops and frequently used data within the local store. In most cases such code will run an order of magnitude faster than on a conventional processor with a machine-managed cache, simply because this enforced minimisation of instruction and data transfer to and from slow main memory can be done far more effectively when you know and manually control what the algorithm does, rather than leaving it to a dumb piece of hardware.
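As a rough illustration of that working style, here is a minimal single-buffered sketch using the Cell SDK's spu_mfcio.h DMA interface; the helper function, chunk size and the dummy processing loop are made up for illustration. Real code would double-buffer so the next DMA overlaps with processing of the current block, and would use DMA lists (mfc_getl) for the scatter/gather case mentioned above.

#include <spu_mfcio.h>   /* Cell SDK SPU DMA interface */

#define CHUNK 4096       /* bytes per DMA transfer (hardware max is 16 KB) */

/* Local-store buffer; DMA buffers want strong alignment */
static unsigned char buf[CHUNK] __attribute__((aligned(128)));

/* Fetch one chunk from main memory, process it entirely in local
 * store, and write the result back.  'ea' is the effective
 * (main-memory) address handed over by the PPE. */
static void process_chunk(unsigned long long ea)
{
    const unsigned int tag = 1;

    /* Start the DMA get and wait for it to complete */
    mfc_get(buf, ea, CHUNK, tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* Work entirely out of local store: no cache misses are possible */
    for (int i = 0; i < CHUNK; i++)
        buf[i] ^= 0xFF;           /* stand-in for real processing */

    /* Write the result back to main memory */
    mfc_put(buf, ea, CHUNK, tag, 0, 0);
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();
}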
As for AI on the SPE, a procedural AI model where the logic state is represented by the program's position in the code is not appropriate for the SPE, because it involves long branches. Instead, AI state running on the SPE should be represented as boolean flags or flag-array data in main memory, loaded into SPE local memory for processing. For most things, logic operations performed on boolean flag tables in SPE local memory can take the place of branching and code position as a representation of AI state. In most cases it should be possible to carry out a sequence of boolean operations on an array of flags without branching, giving the same effect as the conventional approach of setting one bit at a time through decision branches. The code that does this runs in SPE local memory and can stream-process a large number of AI objects, e.g. updating the status flags for a large number of characters in a game; a sketch of such an update follows below. Large data structures in main memory can be parsed or traversed efficiently by using scatter-list DMA to bring the data into SPE local memory. If blocks of data can be processed together, the SPE is more efficient than a conventional processor; if not, it is no worse. The main thing is to keep the code in SPE local memory (i.e. to stream-process the AI data), otherwise it will be less efficient than a conventional processor. The overall AI/logic state of the main program (as opposed to per-object AI) may be better suited to conventional procedural, branch-represented logic, but that is no problem, because it can run on the PPE, which executes the main program thread.
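Here is a minimal sketch of the branchless flag-update idea in plain C. The flag names and the decision rules are invented purely for illustration; the point is that every character takes the same code path regardless of its state, so there are no data-dependent branches for the SPE to mispredict. On an SPE the same loop could be vectorised with SIMD intrinsics to process several characters' flag words at once.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-character status flags, one 32-bit word each */
enum {
    FLAG_ALIVE     = 1u << 0,
    FLAG_LOW_HP    = 1u << 1,
    FLAG_SEES_FOE  = 1u << 2,
    FLAG_FLEEING   = 1u << 3,
    FLAG_ATTACKING = 1u << 4,
};

/* Branchless update of a block of character flags held in SPE local
 * store: a character flees if it is alive, low on health and can see
 * an enemy; it attacks if it is alive, sees an enemy and is not low on
 * health.  Each rule is a pure logic expression over the flag word. */
void update_ai_flags(uint32_t *flags, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t f = flags[i];

        /* Expand each input bit into an all-ones or all-zeros mask */
        uint32_t alive = (uint32_t)0 - ((f & FLAG_ALIVE)    != 0);
        uint32_t low   = (uint32_t)0 - ((f & FLAG_LOW_HP)   != 0);
        uint32_t foe   = (uint32_t)0 - ((f & FLAG_SEES_FOE) != 0);

        uint32_t fleeing   = alive & low  & foe & FLAG_FLEEING;
        uint32_t attacking = alive & ~low & foe & FLAG_ATTACKING;

        /* Clear and re-derive the decision bits without a single branch */
        f = (f & ~(FLAG_FLEEING | FLAG_ATTACKING)) | fleeing | attacking;
        flags[i] = f;
    }
}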
It has also been suggested that programming Cell will be more difficult than programming a conventional processor. But will it? The SPE code will mostly be separate code, seen as libraries or devices called by the main game programmer. Is that any more difficult to use than any other kind of programming? The fact that Cell has a single general-purpose core, with the parallelisation handled through standardised libraries and device files, may in fact make it easier to program than the Xbox 360. The modularisation that the SPEs enforce can also make programming easier and cleaner by encouraging standardisation and code reuse.