Update 2007/08/20: Added two paragraphs at the end talking about the latest EA Madden porting issues
Update 2007/08/19: From a reference to this article on vgchartz.com I found this link to a thread on beyond3d.com where the concepts referred to here have been exemplified by various statements from PS3 developers on the usage of SPEs within their games.
What is one of the most commonly heard prejudices when you ask people about the PS3? I guess number one is "The PLAYSTATION 3 is difficult to program, more difficult than other platforms". Well, taken at face value this statement is of course stupid. Compared to, let's say, the PS2, the PS3 is a big improvement. While you had to use assembler code on the PS2 for almost all advanced stuff, the PS3 basically just needs C++ or C code in most cases. This is even true for the Cell SPEs, the main differentiator.
But before we even start, let me make a short disclaimer: the article itself as well as the material it references are not easily digestible.
Anyway, I tried to use common sense terminology wherever possible, as I am not a geek either.
I have also linked to several Wikipedia articles to explain some of the technologies that are used here in more detail.
The Beast
Of course, what people could mean by that statement is the fact that the next generation consoles as well as PCs are now based on multi-core architectures, i.e. several CPU cores can be used at any time to run all sorts of parallel code to achieve what we finally call next-gen gameplay. But then again, this is no different across the various platforms.
The real difference becomes obvious when we look at the guts, i.e. the different CPUs used within the various platforms / consoles. Starting with the 360, we find the so-called Xenon CPU, which features 3 symmetrical PPC cores, each able to run two hardware threads at a time. Each core features an additional vector processing unit that can be used for fast vector operations. Those three cores are typical multi-purpose CPUs and are all programmed the same way using the same memory resources, which makes the development of multi-threaded code relatively easy, as there is only one programming model.
The PC on the other side is very similar in the sense that the latest Intel and AMD chips all use symmetrical multi-cores (usually dual core technology), which basically leads to the same pros and cons in terms of writing multi-threaded code.
Looking at the Cell, things start to look a little different: one of the design goals of the Cell chip was to produce the best performance / size ratio of all the chips available at the time of production. This was achieved by breaking with a lot of the design principles in use at that time. First of all, the Cell is not fully symmetrical in the way the Xenon or Intel/AMD chips are.
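To make the "one programming model" point a bit more concrete, here is a minimal sketch in Python (chosen only for readability, not because any console is programmed this way). It assumes six identical workers, matching the 3 cores x 2 threads mentioned above, that all operate directly on the same shared list. The function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

HARDWARE_THREADS = 6  # assumption: 3 cores x 2 threads each, as described above


def scale_slice(data, start, stop, factor):
    """Scale data[start:stop] in place; every worker sees the same list."""
    for i in range(start, stop):
        data[i] *= factor


def scale_all(data, factor, workers=HARDWARE_THREADS):
    """Split the index range evenly and hand each part to an identical worker."""
    chunk = max(1, (len(data) + workers - 1) // workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(scale_slice, data, start, min(start + chunk, len(data)), factor)
            for start in range(0, len(data), chunk)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return data
```

Every worker runs the same kind of code against the same memory, so the only real decision is how to split the index range. That is the convenience a symmetrical design offers.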
[img=500x104, 39,0Kb]http://static.flickr.com/1222/1147256521_5d0678e078.jpg[/img]
Fig 1. The PS3 Hardware Architecture
The Cell features one dual-threaded Power Processor Element (PPE), very similar to one of the Xenon cores, that acts as a conductor for what seems to be the orchestra: a set of 7 Synergistic Processing Elements (SPEs), which are symmetric cores that are once again very much simplified compared to the PPE. Think of those SPEs as very fast and flexible vector units. Even though we are talking about 7 SPEs here, keep in mind that the PS3 dedicates one of them to the PS3's OS, which handles a couple of background tasks (e.g. content download, etc.), leaving six for the game itself. Looking at Fig. 1, the PPE as well as the SPEs are connected via a very fast Element Interconnect Bus (EIB), which allows the PPE to talk to the SPEs, as well as the SPEs to talk to each other. Finally, all memory access goes through the EIB and various memory and IO controllers.
What's interesting to note: even though the PS3 has distinct memory for the Cell CPU and the RSX GPU, this memory is somewhat "unified" in the way that the RSX can access both memory areas rather fast, with a peak of 20 GB/s. In that sense, the PS3 architecture combines the best of both worlds: separate memory areas for CPU and GPU (no concurrent access issues), while the GPU is still able to use the respective other memory area for its own purposes. The 360, for instance, uses a unified memory model where there is no distinction between different memory types, but with the problem of concurrent access to the same memory from CPU and GPU. Again, from a programming point of view, the 360 is easier to handle as you don't have to decide where certain stuff needs to reside.
One other complexity that needs to be handled by the programmer is the data feed to and from the SPEs. As the SPEs are very simplistic processing units, they only feature a relatively small local storage (LS) of 256 KB. This storage holds the program currently running on the SPE together with the data it is working on. The data that gets processed by the SPE needs to come from main memory via DMA commands, which have to be issued just in time for the data to be ready for processing. The same is true for writing results back. The other option, of course, is that the data will be consumed by another SPE after it has been processed by the first one.
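A common way to get the "just in time" part right is double buffering: request the next chunk before working on the current one. The following is only a rough Python model of that pattern, not real SPE code. `dma_get` is a hypothetical stand-in for a DMA transfer, main memory is modelled as a plain `bytes` object, and nothing actually runs in parallel here.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local store size given in the text


def dma_get(main_memory, offset, size):
    """Hypothetical stand-in for a DMA transfer from main memory into local store."""
    return main_memory[offset:offset + size]


def stream_process(main_memory, chunk_size, kernel):
    """Process main_memory chunk by chunk using two alternating buffers."""
    # Both buffers have to fit into local store (a real SPE program
    # would also need room for its own code, which is not modelled here).
    assert 0 < 2 * chunk_size <= LOCAL_STORE_BYTES
    result = bytearray()
    buffers = [None, None]
    buffers[0] = dma_get(main_memory, 0, chunk_size)  # prefetch the first chunk
    offset, current = 0, 0
    while offset < len(main_memory):
        next_offset = offset + chunk_size
        if next_offset < len(main_memory):
            # Request the next chunk before working on the current one, so that
            # on real hardware the transfer could overlap with the computation.
            buffers[1 - current] = dma_get(main_memory, next_offset, chunk_size)
        result += kernel(buffers[current])  # models writing the result back
        offset, current = next_offset, 1 - current
    return bytes(result)
```

For example, `stream_process(data, 16 * 1024, lambda chunk: bytes(b ^ 0xFF for b in chunk))` would invert every byte of `data` while never holding more than two 16 KB chunks at once.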
[img=400x235, 27,8Kb]http://playstation-disorder.com/uploads/ps2arch.gif[/img]
Fig 2: The PS2's Emotion Engine
Let me make one last comment on the PS3 architecture before we dive into the Beauty section.
The overall PS3 architecture is very similar to the PS2 from a certain viewing distance. Not really a surprise, as Ken Kutaragi was the responsible engineer for both platforms. Take the PPE on the one hand and compare it with the MIPS core within the PS2 Emotion Engine (Fig. 2). Then take a look at the PS2's 2 Vector Processing Units (VU0 & VU1), which are in a sense simplified SPEs. In a way, the PS3 architecture is a blown up and extended PS2 architecture. The design goals on both platforms are obviously very similar: powerful and flexible. Both platforms are optimized for streaming data processing. Keep in mind that the way this data gets routed through the Cell as well as through the Emotion Engine is not fixed: you can route data differently through the different units of the Emotion Engine as well as through the Cell SPEs. There are many configurations possible on both platforms, where of course the Cell is even more powerful in that discipline. Check out those references for a more detailed look into this topic [3] and [2 Slide 20 ff].
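The idea of routing that is not fixed can be illustrated with a tiny Python sketch. The stage names `animate` and `transform` and their arithmetic are pure placeholders, assumed here only for illustration. The point is just that the same units can be chained in a different order or number.

```python
def animate(x):
    """Hypothetical processing stage; the arithmetic is only a placeholder."""
    return x + 1


def transform(x):
    """Another hypothetical stage."""
    return x * 2


def run_route(stream, route):
    """Send every item of the stream through the stages in route, in order."""
    for stage in route:
        stream = map(stage, stream)  # chain lazily, like units feeding each other
    return list(stream)
```

`run_route([1, 2, 3], [animate, transform])` and `run_route([1, 2, 3], [transform, animate])` use the same two stages but produce different results, because the route is chosen by the programmer and not by the hardware.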
The Beauty
So far we have seen the beast part of the PS3, a platform that has some programming challenges in terms of keeping the various SPEs busy and managing the different memory types efficiently. Let's now come to the beauty aspect of it: its flexibility. As you can already see in Fig 1, the RSX GPU is the other major component within the PS3 architecture.
The RSX chip is very similar to any other standard PC based GPU, with the exception that it can access both memory types (main and graphics) at almost the same speed. Besides that, it most likely features (no official specs have been released so far) a graphics pipeline consisting of 24 pixel shaders, 8 vertex shaders, and 8 ROPs (raster operation units), running at a 550 MHz clock speed.
If you look at those shaders as very specialized vector units similar to the Cell's SPEs, you could think of off-loading some of the tasks within the GPU's graphics pipeline to the Cell. In other words, you can use the Cell to seamlessly extend the GPU's capability and power. Let's take a look at two examples where this can be seen in real life, to show that this is not just theory.
The first example is the already discussed "Deferred Rendering in Killzone 2" presentation that was given at the Develop conference in Brighton this July. Check out my article on that one if you need more details.
The second example I just found on the SCEA research website a couple of days ago. The article "Deferred Pixel Shading on the PLAYSTATION®3" perfectly describes the benefits of what we just discussed:
the Cell providing additional pixel shaders to the graphics pipeline that are even more powerful than the RSX built-in pixel shaders. In a nutshell, the result of the exercise (the implementation of a certain soft shadowing algorithm) described in the article is that
5 SPEs basically acting as pixel shaders achieve the same overall performance as the 24 pixel shaders of an NVIDIA GeForce 7800 GTX running on a Linux based system. By the way, that NVIDIA GPU is very similar to the RSX GPU used in the PS3.
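To give a rough feeling for what "SPEs acting as pixel shaders" means, here is a simplified Python sketch. It is not the algorithm from the SCEA article. It assumes a G-buffer that stores only an albedo value and a precomputed N·L term per pixel, uses a plain diffuse term as the shader, and splits the image into row bands for five workers.

```python
from concurrent.futures import ThreadPoolExecutor

SHADING_WORKERS = 5  # number of SPEs used as pixel shaders in the cited article


def shade_pixel(albedo, n_dot_l):
    """Simple diffuse term; only a stand-in for a real pixel shader."""
    return albedo * max(n_dot_l, 0.0)


def shade_band(gbuffer_rows):
    """Shade one band of rows read from the G-buffer."""
    return [[shade_pixel(a, d) for (a, d) in row] for row in gbuffer_rows]


def deferred_shade(gbuffer, workers=SHADING_WORKERS):
    """Split the G-buffer into row bands and shade the bands in parallel."""
    band = max(1, (len(gbuffer) + workers - 1) // workers)
    bands = [gbuffer[i:i + band] for i in range(0, len(gbuffer), band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        shaded = list(pool.map(shade_band, bands))  # results keep band order
    return [row for band_rows in shaded for row in band_rows]
```

Because every pixel is shaded independently from data already sitting in the G-buffer, the work splits cleanly into bands, which is what makes this kind of job a plausible fit for a handful of vector-oriented cores.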
Conclusion
Similar to the PS2, a platform that even by today's standards surprises us with visual quality we wouldn't have expected from the console in the first place, the PS3 is a very flexible gaming platform. Almost nothing is carved in stone and it is really up to the developers to unleash the power of the architecture. This makes perfect sense because, unlike in the PC world, hardware upgrades are not an option for console makers. With a lifecycle of at least 10 years, the PS3 needs to remain capable of running new algorithms and methods which we don't even know of today.
Those are just two examples of how the combination of Cell and the RSX can achieve things that sometimes even the designing engineers haven't thought about. A great outlook for the future if you ask me.
This is not possible with a fixed hardware architecture that is designed to make life easier for developers. Sure, on the PC side you can easily extend your hardware; for consoles that's not an option. Or maybe we will see another Xbox in just a couple more years, something that is most likely to happen for the Wii anyhow.
One last comment on the various statements regarding this, let's say, mediocre Madden port to the PS3. First of all, I don't think that this will have an impact on PS3 sales, as one analyst has predicted. Second, I think this is more of an EA issue, showing once again that those guys don't get the necessary resources to address the respective platforms appropriately. I don't know what the issues really are, but it seems to be a perfect example where developers didn't have the knowledge or the skills to do a proper PS3 port of the game. It almost looks as if the whole code is just sitting on the PPE, which, as I already said, is similar to the Xbox Xenon cores. No need for them to do any adaptation at all. Again, I don't know the real story. Is this going to happen more often in the future? Usually no. Again, similar to the PS2's VRAM issues (just 4MB graphics RAM), where developers had to learn that due to the PS2 architecture there is actually no need for a larger VRAM, best practices will be shared within the developer community.
This time things are also a little different: unlike in the PS2 days, when both the Emotion Engine & the Graphics Synthesizer were very exotic devices only used within the PS2, both the Cell and the RSX (aka G70 based) are already well known chips and are also supported outside the PS3 by IBM and NVIDIA. In other words, knowledge will spread a lot faster compared to the PS2. It just requires the EAs and Ubisofts of the world to apply those best practices within their own developments. What would be even more stupid is if those companies were not able to share already available knowledge within their own company. Maybe the GRAW guys should talk to the SC Conviction team?
Additional Links
[1] "Introduction to the graphics pipeline of the PS3" (PDF)
Presentation from Eurographics 07 by Cedric Perthuis, SCEA
[2] "CELL: A New Platform for Digital Entertainment" (HTML)
Presentation by Dominic Mallinson & Mark DeLoura, SCEA
[3] Cell Software Model (PDF)
IBM programming course
[4] "Deferred Pixel Shading on the PS3" (PDF)
Article by Alan Heirich and Louis Bavoil, SCEA
[5] "Deferred Rendering in Killzone 2" (PDF)
Presentation from Develop conference 07 by Michal Valient, Guerrilla