Acert93 said:
The above statement is for 10k items, so obviously it could be toned down for a game. But, this raises a question: The PPE is doing a lot of work, and the SPEs not so much. Due to its nature the PPE is a precious resource. I would have assumed the SDK demo would have been designed with the idea to touch the PPE minimally and use the SPEs as the work horses.
Not necessarily, at all. The current PhysX implementation on Cell, for example, touches the PPE far more than an optimal implementation would need to, because that was the easiest way to avoid compatibility concerns with other parts of the library and to keep the code portable.
I don't think it's unusual or surprising that there's more dependence on the PPE than you might like at the moment. That'll improve over time, I've no doubt (it could be improved by reducing the amount of PPE involvement per "unit" of SPU work, or by increasing the amount of SPU work per "unit" of PPE involvement). The current approach is often "start with the PPE and see what you can move off", but the ultimate approach will be splitting work evenly between every processor - which is a tougher nut to crack, for sure. This very presentation seems to suggest avenues for improvement though, perhaps with streaming buckets and the like.
Oh, and yeah, this is old, and has been discussed previously. And as for games applications, there are lots. Think of a GTA-style game with crowds of people on streets etc.
edit - oh, and another consequence of this is that they could probably reduce the number of SPUs used and maintain roughly the same performance. Again, the same phenomenon can be seen in the AGEIA library, where in one particular demo that was presented, performance really didn't scale beyond 3 SPUs, IIRC. You could employ more, but because of the PPE dependency it would only leave each SPU increasingly idle (like our crowds demo above!) rather than improving the absolute performance - so in such situations you're not likely to throw all your SPUs at the problem, you'd throw as many as improve the performance and use the others elsewhere. So while you could say that 6 SPUs get you x amount of performance in a certain benchmark, in cases like this you could probably say that 5 or 4 or 3 get you roughly the same.
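To put rough shape on the scaling argument, here's a toy model in Python. It is only a sketch under assumptions of my own: the PPE does a fixed amount of serial work per item, the SPUs split their per-item work evenly, and the two stages overlap so the slower one sets the frame time. The function names and the cost numbers (1 unit of PPE work and 3 units of SPU work per item) are made up for illustration, not measurements from the PhysX or crowd demos.

```python
import math


def frame_time(n_items, n_spus, ppe_cost, spu_cost):
    """Frame time in a toy pipelined model (hypothetical, not measured).

    The PPE handles every item serially; the SPUs share their per-item
    work evenly. The stages overlap, so the slower stage sets the pace.
    """
    ppe_time = n_items * ppe_cost
    spu_time = n_items * spu_cost / n_spus
    return max(ppe_time, spu_time)


def spu_utilization(n_items, n_spus, ppe_cost, spu_cost):
    """Fraction of the frame each SPU spends busy rather than waiting."""
    spu_time = n_items * spu_cost / n_spus
    return spu_time / frame_time(n_items, n_spus, ppe_cost, spu_cost)


def useful_spus(ppe_cost, spu_cost):
    """Smallest SPU count at which the PPE becomes the bottleneck."""
    return math.ceil(spu_cost / ppe_cost)


if __name__ == "__main__":
    # Assumed costs: 1 unit of PPE work and 3 units of SPU work per item.
    for n in range(1, 7):
        t = frame_time(10_000, n, ppe_cost=1, spu_cost=3)
        u = spu_utilization(10_000, n, ppe_cost=1, spu_cost=3)
        print(f"{n} SPUs: frame time {t:.0f}, SPU utilization {u:.0%}")
```

With those assumed costs the frame time stops improving at 3 SPUs, and every SPU added after that just lowers utilization. Both levers mentioned earlier show up in the same ratio: lowering `ppe_cost` or raising `spu_cost` pushes `useful_spus` up, which is what would let more SPUs contribute.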