I've been reading a thread on PS3forums where a poster with significant exposure to the PS3's Cell is claiming that it is gimped by the disabled SPE and the hypervisor. Here are a few quotes. (Any thoughts?)
Link: http://ps3forums.com/showthread.php?t=22858&page=32
I'm mostly on this forum because I'm interested in Cell development, which I write about professionally. I do a lot of programming across a lot of architectures, and I've been published (like, "paying the mortgage" kind of money involved, not just posting to a blog) writing about both Cell and Xenon.
I'm making a technical claim, which is that the PS3's Cell is gimped in ways that make it very hard to achieve its theoretical performance abilities.
As I've said, repeatedly, even what you can get to on the gimped Cell runs rings around Xenon.
Thus, no amount of showing how Cell is outperforming Xenon contradicts my point -- which is that the PS3's Cell will not perform as well as the 8-SPE versions being used in supercomputing apps. It can't -- and it's not just the one SPE disabled, and one taken over by the hypervisor, but that the unpredictability means you have to avoid the top part of the performance envelope, or your game will have unpredictable performance problems on some people's PS3s but not on others.
Which is fine; the performance you can get to is unequivocally better than anything I expect to see out of the 360 any time soon.
I think the problem here is that I made a claim purely about the comparative merits of different Cell variants, and also comparing their ease of use to EE, and you're assuming that this somehow ought to be reflected by Xenon being faster than Cell or something. But that's silly. Xenon was rushed to market to try to get the first-to-launch advantage over the PS3. It's not otherwise particularly technically impressive.
But the thing you're trying to rebut isn't something I said; it's also not something I implied, suggested, hinted at, or intended. It's just something you assumed I'd say, because you've got me pigeon-holed as some kind of weird PS3-hater who doesn't believe the PS3 is powerful. Which is dumb, given that I've paid the bills on a number of occasions specifically by writing about how powerful Cell is and how to get more performance out of it!
A rough ballpark estimate: Start with a full-featured 8-SPE Cell, with support for SPE affinity. Now, take off one SPE. You've lost about 1/9 of your power; maybe a bit over, maybe a bit under... But you've also lost support for affinity, which kills your ability to push the system to the edge. Now, you have to design everything so that it doesn't rely on saturating EIB, and doesn't make any assumptions about being able to assign an adjacent pair of SPEs to a particular task, or about whether a given SPE is close to the PPE -- which matters, because the rings in EIB can do up to three non-overlapping transactions at once.
Now add in the hypervisor. You lose another SPE -- you've now lost a quarter of your heavy-duty vector processing. Furthermore, any and all attempts at affinity are totally shot. Worse, you cannot anticipate or plan for the hypervisor's bandwidth usage.
What that means is that you have to leave even MORE headroom, or have random, unpredictable losses of performance that are totally outside your code. A new system update could kill your performance by saturating EIB, at least to the extent the hypervisor can use it... And the SPEs are pretty powerful, and can use a lot of bandwidth.
Putting it in terms of your notion of estimating cars, I'd expect about a 20% loss over what a developer could do on an RSX+Cell system without the hypervisor and without the missing SPE. Some of the damage is done by the hypervisor, some by not being able to predict which SPE is missing.
The thing is, if you really want to get the best performance out of Cell, you need affinity. You need to be able to allocate adjacent SPEs, and you need to know what they're adjacent to when allocating your workload. Without that, you are going to be well short of the theoretical capacity of the machine.
What that means is that, not only are you missing two SPEs (remember, that's 1/4 the total gruntwork power of the machine), but that you're actually a lot worse off than you would be on a hypervisor-free machine that simply had exactly six SPEs to begin with, and had them in a predictable topology.
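To make the adjacency argument concrete, here is a toy sketch of a bidirectional ring. The ring order in `RING` is hypothetical and is not the real EIB layout, and the helper names (`hops`, `segments`, `can_share_ring`) are made up for illustration. It only shows the general idea that transfers between neighbouring units use short, non-overlapping stretches of the ring, while transfers between far-apart units collide.

```python
# Hypothetical ring order, for illustration only; NOT the real EIB layout.
RING = ["PPE", "SPE0", "SPE1", "SPE2", "SPE3", "SPE4", "SPE5", "SPE6", "SPE7"]


def hops(a, b, ring=RING):
    """Shortest hop count between two units on a bidirectional ring."""
    d = abs(ring.index(a) - ring.index(b))
    return min(d, len(ring) - d)


def segments(a, b, ring=RING):
    """Ring links used by the shortest path from a to b.

    Link k is the connection between ring[k] and ring[(k + 1) % n].
    """
    n = len(ring)
    i, j = ring.index(a), ring.index(b)
    clockwise = (j - i) % n
    counter = (i - j) % n
    if clockwise <= counter:
        return {(i + k) % n for k in range(clockwise)}
    return {(j + k) % n for k in range(counter)}


def can_share_ring(t1, t2, ring=RING):
    """True if two transfers (src, dst) use no common link."""
    return segments(*t1, ring=ring).isdisjoint(segments(*t2, ring=ring))


# Neighbouring pairs stay out of each other's way...
adjacent_ok = can_share_ring(("SPE0", "SPE1"), ("SPE2", "SPE3"))
# ...while a long transfer blocks a short one that sits inside its path.
spread_ok = can_share_ring(("SPE0", "SPE3"), ("SPE1", "SPE2"))
```

If the program cannot know where its SPEs sit in `RING`, it cannot choose the first pattern over the second, which is the loss of affinity the poster describes.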
So, while it's 25% off just the SPEs, not off the PPE, it's also a substantial hit to the optimization techniques you need if you're going to take full advantage of EIB. EIB's an absolutely gob-smacking technical achievement, and Sony crippled it. I know they had business reasons, but it's still a crying shame.
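The fractions quoted in the post can be checked with a few lines of arithmetic. This is only a sketch: it counts cores, treating the PPE as one unit alongside the 8 SPEs for the "about 1/9" figure (an assumption, since the poster hedges it as "a bit over, a bit under"), and counting SPEs alone for the "quarter" figure. The function name `lost_fraction` is made up for illustration.

```python
from fractions import Fraction

SPES_FULL = 8  # SPEs on a full Cell, per the post
PPES = 1       # one PPE


def lost_fraction(lost_spes, count_ppe):
    """Fraction of units lost, with or without the PPE in the total."""
    total = SPES_FULL + (PPES if count_ppe else 0)
    return Fraction(lost_spes, total)


# One SPE disabled, measured against all 9 cores.
one_disabled = lost_fraction(1, count_ppe=True)      # 1/9

# Disabled SPE plus the hypervisor's SPE, measured against SPEs only.
two_gone = lost_fraction(2, count_ppe=False)         # 1/4

print(one_disabled, two_gone)
```

Note that this is a raw unit count; the poster's 20% estimate also folds in the lost affinity and the headroom left for the hypervisor, which this arithmetic does not model.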