Crowd Technology PS3 Presentation

So in short: (correct me where I'm no doubt wrong..)

It's taken 1 year to speed the demo up from 150 people @ 60fps to 10,000 @ 60 fps. (?!)
In theory it should be capable of 16,000.
Flocking logic is the most expensive part (?) so it's only computed every X frames?
The SPEs are mostly idle because the PPU can't send data to them fast enough (this sounds worrying)?
10,000 & 60fps is only for 2D simulation (and rendering?), where full 3D was significantly less (3000?), at a lower frame rate..? (30?)

Personally, 10,000 items for n log n processing doesn't sound all that impressive, even if the logic is apparently fairly heavy. But maybe that's just me...
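To make the "only computed every X frames" point concrete, here is a minimal sketch of one way such staggering could work. It is an assumption, not something taken from the slides: `agents_due` and `interval` (the "X") are hypothetical names, and the demo may slice its work differently.

```python
def agents_due(frame, num_agents, interval):
    """Indices of agents whose flocking logic is refreshed on this frame.

    Hypothetical scheme: agent i is refreshed when frame % interval == i % interval,
    so each frame pays for roughly num_agents / interval flocking updates.
    """
    return range(frame % interval, num_agents, interval)

num_agents, interval = 10_000, 4  # interval is the assumed "X"

# Work per frame over one full cycle of X frames.
per_frame = [len(agents_due(f, num_agents, interval)) for f in range(interval)]

# Every agent should be refreshed exactly once per cycle.
covered = set()
for f in range(interval):
    covered.update(agents_due(f, num_agents, interval))
```

With these numbers each frame runs the expensive logic for a quarter of the population, at the cost of each agent reacting up to X frames late.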
 
Just a side note: I noticed NBA 2K7 had very nice crowds, and they seemed to be moving independently (or at least in groups dispersed enough that it did not look too fake). PGR3 also has nice large crowds. This is definitely one of the perks of next gen, one I really enjoy.

I am reading over the PPT and one of the cool things they mention is that this is being developed as sample SDK code for developers. I like seeing/hearing that!
 
150 to 10,000 in about a year's time really puts the PS3's learning/programming curve in perspective. I can't imagine how PS3 games will look compared to launch titles in 4-5 years, when its lifecycle is almost up.
 
Slide 39 (Pie Chart) and Slide 40:

Faster in the future?
SPUs idle more than half of each frame
PPU spends about half its time spoon-feeding the SPUs new Bucket assignments

The above statement is for 10k items, so obviously it could be toned down for a game. But this raises a question: the PPE is doing a lot of work, and the SPEs not so much. Due to its nature the PPE is a precious resource. I would have assumed the SDK demo would have been designed with the idea to touch the PPE minimally and use the SPEs as the work horses. This only makes sense because you have 5+ SPEs dedicated to developer use 100% of the time. If I am doing a game, let's say a car game like PGR3, a crowd is very beneficial BUT not central to the game. You don't want that eating into your PPE, but if you can leverage the SPEs (since you have so many and they are quick) that would appear ideal.

I liked the fish demo and I am impressed with seeing 10k individuals interacting, but it doesn't seem very practical for a "perfect world" scenario. It is a nice start of course, but I expected more SPE interaction and less use of the PPE. Given the PPE load, I am kind of surprised they did not dedicate an SPE to the task of handing out new bucket assignments and leave the PPE alone, on the premise that many games will need the PPE for other tasks.
 
kabacha said:
ok so we see all these fancy tech demos, but where is the translation into game code?

This code is being incorporated into the SDK as sample code for developers to use, modify, and implement in their software. This code translates into having thousands of active elements reacting, in realtime, to other elements. The SDK example is fish that are navigating on their own, but this basic structure and premise could be used for any number of crowd-based scenarios. Heck, it could make an uber-Lemmings game! On a practical level, the N3/Heavenly Sword style games with hundreds of individual members in large crowds could be made more independent and dynamic. A wildlife hunting game (be it human or animal hunters) could use something like this to create a world where some species can co-exist and others cannot, with natural and automated grouping and movement toward items of need (water, shelter, food, mating) and away from unpleasant items (tourists, predators, hunters, fires, stampedes, game trails, etc). It could also be diversified so that some animals like grouping with various species and leave when an unwanted species joins, or so that when most animals run away from humans another species comes close to beg for food.

We have not seen a lot of crowd behavior in games, certainly not large crowds numbering into the thousands that react dynamically to their environment and to each other. This type of technology can translate into all sorts of living worlds.
 
kabacha said:
ok so we see all these fancy tech demos, but where is the translation into game code?
Perhaps you don't get it. This is MEANT to be a tech demo. You don't bash a tractor for being slow and heavy, therefore it's pointless to bash tech demos for being tech demos.

Stop trolling and go back to opa-ages or wherever.
 
Acert93 said:
The above statement is for 10k items, so obviously it could be toned down for a game. But, this raises a question: The PPE is doing a lot of work, and the SPEs not so much. Due to its nature the PPE is a precious resource. I would have assumed the SDK demo would have been designed with the idea to touch the PPE minimally and use the SPEs as the work horses.

Not necessarily, at all. The current PhysX implementation on Cell, for example, touches the PPE way more than it would need to in an optimal implementation, because that was the easiest way to avoid compatibility concerns with other parts of the library and to keep the code portable.

I do not think it unusual or surprising that there is a dependence on the PPE more than you might like at the moment. That'll improve over time, I've no doubt (could be improved by reducing the amount of PPE involvement per "unit" of SPU work or increasing the amount of SPU work per "unit" of PPE involvement). The current approach is often "start with the PPE and see what you can move off", but the ultimate approach will be splitting work evenly between every processor - which is a tougher nut to crack, for sure. This very presentation seems to suggest vectors for improvement though, perhaps with streaming buckets and the like.

Oh, and yeah, this is old, and discussed previously. And as for games applications, there's lots. Think of a GTA-style game with crowds of people on streets etc.

edit - oh, and another consequence of this is also that they could probably reduce the number of SPUs used and maintain roughly the same performance. Again, the same phenomenon can be seen in the AGEIA library, where under one particular demo presented performance really didn't scale beyond 3 SPUs IIRC. You could employ more, but it would only result in each SPU being increasingly more idle (like our crowds demo above!) because of the PPE dependency, rather than improving the absolute performance - so in such situations you're not likely to throw all your SPUs at the problem, you'd throw as many as improves the performance and use the others elsewhere. So while you could say that 6 SPUs get you x amount of performance in a certain benchmark, in cases like this you could probably say that 5 or 4 or 3...get roughly the same.
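A toy model of why adding SPUs stops helping once the PPE is the bottleneck. This is only a sketch under stated assumptions: the PPE hands out buckets serially at a fixed cost each, the SPUs split the bucket work evenly, and the two overlap perfectly. The function names and all the numbers are made up for illustration, not measured from the crowd demo or the AGEIA library.

```python
def frame_time(num_spus, buckets, feed_us, work_us):
    """Frame time in microseconds for one PPE feeding num_spus SPUs."""
    ppe_time = buckets * feed_us             # serial: PPE assigns every bucket
    spu_time = buckets * work_us / num_spus  # parallel: work split across SPUs
    return max(ppe_time, spu_time)           # the slower side sets the frame time

def spu_idle_fraction(num_spus, buckets, feed_us, work_us):
    """Fraction of the frame each SPU spends waiting for the PPE."""
    busy = buckets * work_us / num_spus
    return 1.0 - busy / frame_time(num_spus, buckets, feed_us, work_us)

# Hypothetical costs: processing a bucket takes 3x as long as assigning one.
params = dict(buckets=1000, feed_us=1.0, work_us=3.0)
times = {n: frame_time(n, **params) for n in range(1, 7)}
idle = {n: spu_idle_fraction(n, **params) for n in range(1, 7)}
```

In this model the frame time stops improving at 3 SPUs, and by 6 SPUs each one is idle half the frame. The two fixes mentioned above correspond to lowering `feed_us` or raising `work_us` per PPE hand-off.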
 
Agreed. Here the tech demo is looking at finding the limits, but in reality the SPEs will be used for a greater number of smaller problems, like the hair and cloth routine (which is the same routine, as I understand it) in Heavenly Sword, which is located on one SPE, with no doubt other physics routines located on other SPEs.

By the way, I should have checked it was posted before, my apologies. I guess though it'll be interesting to discuss a little again now that we know a little bit more about what devs are actually doing.
 
Don't compare crowd simulation to the stuff in PGR3 or NBA games. Those crowds are static - the individuals do not have to sense their environment, do not have to move around in it, just produce simple reactions to gameworld events. The simulations in this pdf are far, far more complex.
 
I'd rather see what type of crowd simulations occur in The Getaway. Sony seems to be saying that The Getaway is a realistic moving world, so I'd assume it won't be like GTA, where they just walk up and down the street.
 
Graham said:
So in short: (correct me where I'm no doubt wrong..)

It's taken 1 year to speed the demo up from 150 people @ 60fps to 10,000 @ 60 fps. (?!)
In theory it should be capable of 16,000.
Flocking logic is the most expensive part (?) so it's only computed every X frames?
The SPEs are mostly idle because the PPU can't send data to them fast enough (this sounds worrying)?
10,000 & 60fps is only for 2D simulation (and rendering?), where full 3D was significantly less (3000?), at a lower frame rate..? (30?)

Personally, 10,000 items for n log n processing doesn't sound all that impressive, even if the logic is apparently fairly heavy. But maybe that's just me...

The fact that the SPEs are underutilised and the PPE is overutilised indicates to me that there is a huge scope for further performance improvement.

The key to increased SPE utilisation, I believe, is to separate the AI and parametric motion algorithms from the physics/collision detection parts of the problem, so that program and data fit into the local store of the SPE. The AI, parametric movement, geometry generation and collision detection parts can then be stream processed in different SPEs or processed in turn by a single SPE. For example, collision or proximity data from the last frame can be stored as data by the collision detecting algorithm; the AI algorithm can pick this up and determine actions/reactions, stored in some coded form. This can then be converted into movement stored in a parametric form, which can then be converted into geometry which is rendered, and then collision or proximity detection can be performed. Each stage can be handled by an SPE.
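The staged split described above could look roughly like this. It is a plain Python sketch, not Cell code: each function stands in for a stage that might live on its own SPE, and the `Agent` record, the "avoid"/"wander" action codes and the steering rule are all hypothetical choices made to keep the example small.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    x: float
    y: float

def proximity_stage(agents, radius):
    """For each agent, list the indices of other agents within radius."""
    r2 = radius * radius
    return [[j for j, b in enumerate(agents)
             if j != i and (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= r2]
            for i, a in enumerate(agents)]

def ai_stage(near):
    """Reduce last frame's proximity data to one coded action per agent."""
    return ["avoid" if neighbours else "wander" for neighbours in near]

def motion_stage(agents, near, actions, dt):
    """Turn coded actions into new positions (reads old positions only)."""
    moved = []
    for a, neighbours, action in zip(agents, near, actions):
        vx = vy = 0.0  # "wander" is left as standing still in this sketch
        if action == "avoid":
            # Steer directly away from the centroid of nearby agents.
            cx = sum(agents[j].x for j in neighbours) / len(neighbours)
            cy = sum(agents[j].y for j in neighbours) / len(neighbours)
            vx, vy = a.x - cx, a.y - cy
        moved.append(Agent(a.x + vx * dt, a.y + vy * dt))
    return moved

def geometry_stage(agents):
    """Stand-in for geometry generation: one render point per agent."""
    return [(a.x, a.y) for a in agents]

def run_frame(agents, near_prev, radius, dt):
    """AI -> motion -> geometry -> proximity, fed by last frame's proximity."""
    actions = ai_stage(near_prev)
    agents = motion_stage(agents, near_prev, actions, dt)
    points = geometry_stage(agents)
    near_next = proximity_stage(agents, radius)
    return agents, points, near_next

agents = [Agent(0.0, 0.0), Agent(1.0, 0.0), Agent(10.0, 0.0)]
near = proximity_stage(agents, radius=2.0)
agents, points, near = run_frame(agents, near, radius=2.0, dt=0.5)
```

Each stage only reads the previous stage's output, which is what would let the stages be streamed through separate SPEs. The brute-force proximity check here is O(n^2) and would need a spatial structure at 10,000 agents.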

The SPE can do scatter/gather list DMA, so it can extract the data from a suitable data structure (e.g. a simple list of objects for each stage to be processed) all on its own without the PPE having to feed it, provided that the problem can be broken up into small enough units for code and data to fit in the SPE's local store.
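For anyone unfamiliar with the idea, here is a small simulation of what a gather/scatter list does. It is ordinary Python over a `bytearray` and does not use the real Cell SDK calls: `gather`, `scatter` and the `(offset, size)` list format are illustrative assumptions. The 256 KB figure is the size of an SPE's local store, which in practice is shared with code and other buffers.

```python
LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size; code must fit in here too

def gather(main_memory, dma_list, budget=LOCAL_STORE_BYTES):
    """Copy the (offset, size) chunks named in dma_list into one contiguous buffer."""
    total = sum(size for _, size in dma_list)
    if total > budget:
        raise ValueError("transfer does not fit in the local-store budget")
    local = bytearray()
    for offset, size in dma_list:
        local += main_memory[offset:offset + size]
    return local

def scatter(main_memory, dma_list, local):
    """Write a contiguous local buffer back out to the scattered chunks."""
    pos = 0
    for offset, size in dma_list:
        main_memory[offset:offset + size] = local[pos:pos + size]
        pos += size

# Simulated main memory and a list naming two scattered objects.
main_memory = bytearray(range(32))
dma_list = [(4, 2), (16, 3)]

local_in = gather(main_memory, dma_list)          # pull the objects in
local_out = bytearray(b + 100 for b in local_in)  # stand-in for SPE processing
scatter(main_memory, dma_list, local_out)         # push the results back
```

The point is that the list of objects is the only thing the worker needs in order to fetch its own data, so nothing has to be copied or scheduled for it by another processor.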

The other thing to be considered is that in many games there may be hundreds of thousands of characters, but it may only be necessary to generate geometry for and render those in the field of vision. Also, as you get further away from the player, the simulation may not need to be so detailed. For example, characters in the immediate locale may be fully animated; in neighbouring locales, only the position and orientation of the characters may be calculated; and far away from the player's locale, a cruder top-down algorithm may be used to simulate crowd clustering behaviour. This can allow a huge increase in the number of characters in the simulation.
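A minimal sketch of that kind of distance-based tiering, assuming simple circular "locales" around the player. The tier names and the `near`/`mid` thresholds are hypothetical, picked only to illustrate the idea.

```python
import math
from collections import Counter

def simulation_tier(agent_pos, player_pos, near=50.0, mid=200.0):
    """Pick how much simulation an agent gets from its distance to the player."""
    d = math.dist(agent_pos, player_pos)
    if d <= near:
        return "full"           # full animation and individual behaviour
    if d <= mid:
        return "position_only"  # track position and orientation only
    return "cluster"            # handled by a coarse top-down crowd model

player = (0.0, 0.0)
crowd = [(10.0, 0.0), (30.0, 40.0), (100.0, 0.0), (0.0, 199.0), (1000.0, 0.0)]
tiers = [simulation_tier(p, player) for p in crowd]
counts = Counter(tiers)
```

Only the "full" agents would need the expensive per-individual logic each frame, so the total population can grow much faster than the per-frame cost.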
 