Another New Watch Impress GDC/PS3 article

Titanio

Seems to be lots of nice info in this one - still trying to make my way through it all:

http://watch.impress.co.jp/game/docs/20060331/ps3_2.htm

Some slides:

ps3_206.jpg


A Rapidmind demo:

ps3_228.jpg
 
Another article by Zenji Nishikawa @ Game Watch on PS3 presentations by SCE
http://www.watch.impress.co.jp/game/docs/20060331/ps3_2.htm

There are presentations from IBM but I omit them as they are old and well-known.

Cooperation of SPE and RSX on graphics
http://www.watch.impress.co.jp/game/docs/20060331/ps3_206.htm
Skinning and other geometry processing can be done on SPE while RSX does pixel rendering.
Besides, the SPEs can apparently access VRAM via DMA, so they can also do some post-processing.

PS3 system offers 2 models of SPE abstraction
http://www.watch.impress.co.jp/game/docs/20060331/ps3_208.htm

SPU Threads is provided by the OS and implements a job queue
http://www.watch.impress.co.jp/game/docs/20060331/ps3_209.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_210.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_211.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_212.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_213.htm

SPURS (SPU Runtime System) is provided as a library and implements a self-multitasking microkernel
http://www.watch.impress.co.jp/game/docs/20060331/ps3_214.htm

What tasks should be run on SPE
http://www.watch.impress.co.jp/game/docs/20060331/ps3_215.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_216.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_217.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_218.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_219.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_220.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_221.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_222.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_223.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_224.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_225.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_226.htm

Chameleon Fish demo
http://www.watch.impress.co.jp/game/docs/20060331/ps3_238.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_239.htm
http://www.watch.impress.co.jp/game/docs/20060331/ps3_240.htm

Chameleon Fish demo movie (17.7MB zipped WMV)
http://cgi1.watch.impress.co.jp/cgi....impress.co.jp/game/docs/20060331/ps3_2m3.zip
A demo movie of PSCrowd simulation library distributed with PS3 devkit (38.9MB zipped WMV)
(10,000 2D sprite-based fish in the first half, 7,000 polygon-based textureless fish (36 polys each) with vertex-level animation in the second half; both run at 60fps on a 3.2GHz Cell)
http://cgi1.watch.impress.co.jp/cgi....impress.co.jp/game/docs/20060331/ps3_2m4.zip

The workload of Chameleon Fish demo (5000 fishes (2D&3D) each with AI @ 30fps)
http://www.watch.impress.co.jp/game/docs/20060331/ps3_241.htm
A "bucket" is one flock of fish, and a "lattice" consists of multiple buckets. Every fish has both individual AI and crowd AI.
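
To picture how a bucket/lattice split might hand out work, here is a minimal sketch. It assumes (my assumption, the article does not spell it out) that the lattice is a uniform grid and that a bucket is simply the set of fish inside one cell. `build_lattice` and `cell_size` are hypothetical names, not anything from PSCrowd.

```python
from collections import defaultdict


def build_lattice(positions, cell_size):
    """Group fish indices into buckets keyed by integer grid cell.

    Assumption: a "bucket" is the set of fish in one cell of a uniform
    grid (the "lattice"). The source article does not confirm this layout.
    """
    lattice = defaultdict(list)
    for index, (x, y) in enumerate(positions):
        cell = (int(x // cell_size), int(y // cell_size))
        lattice[cell].append(index)
    return dict(lattice)


# Four fish; the first two share a cell and so land in the same bucket.
fish = [(0.5, 0.5), (0.9, 0.1), (3.2, 0.4), (3.8, 3.9)]
buckets = build_lattice(fish, cell_size=1.0)
```

Each bucket is then an independent chunk of work, which is the kind of unit that could be queued up for an SPE.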

According to Nishikawa, some developers at GDC are griping that, in some cases, the CPUs in the Xbox or PS2 may be faster than what you get from the PPE alone. So Nishikawa's impression is that the PPE has very little margin and that utilizing the SPEs is the key to the full performance of PS3 (duh)
 
Here's a summary from what I could gather:


SPEs and graphics


Geometry processing, deformation etc. in advance of rendering on RSX.

SPEs can DMA in from video memory, thus post-processing is also possible - could be done in parallel with RSX rendering of next frame. Writer seems to speculate about glare processing, HDR, 'photoshop retouch' etc.

Shows the slide relating to Cell/RSX synchronisation posted in the OP, but doesn't seem to explain it.


Task distribution between PPE and SPE:


Suitable for PPE:
OS
File I/O
Memory
Peripherals
Timer
Scripting

Suitable for SPEs:
Math
Networking
Sound and Music
Cameras and Lights
Animation Track
Menu System
Artificial Intelligence
Particles
Font System
Physics ('star' SPE task, as described by the Watch Impress writer)
Graphics

A further breakdown of all the tasks is provided in the slides one linked to above.

SPE threading models

- Priority queue, with PPE management
- SPURS (SPU Runtime System), a small 2KB SPE kernel with a 24KB Policy Manager, that allows the SPEs to 'self-manage' without PPE intervention; included in the PS3 development library
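
To make the 'self-manage' idea concrete, here is a rough analogy in plain Python threads. It is not the SPURS or SPU Threads API; `run_self_managed` and the job tuples are made up for illustration. Each worker pulls the next-highest-priority job from a shared queue by itself. In the PPE-managed model, a single controlling thread would do that `get` and hand the job out instead.

```python
import queue
import threading


def run_self_managed(jobs, num_workers=6):
    """Run (priority, name, fn) jobs on workers that fetch their own work.

    Illustrative analogy only: the workers stand in for SPEs, and no
    central dispatcher (the "PPE") is involved once the queue is filled.
    """
    pending = queue.PriorityQueue()
    for seq, (priority, name, fn) in enumerate(jobs):
        # seq breaks ties so the queue never has to compare functions.
        pending.put((priority, seq, name, fn))

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                _, _, name, fn = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do, this worker retires
            value = fn()
            with lock:
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


jobs = [
    (0, "physics", lambda: sum(range(1000))),
    (1, "particles", lambda: 2 + 2),
    (2, "audio", lambda: "mixed"),
]
results = run_self_managed(jobs, num_workers=2)
```

The point of the pull model is that an idle worker never waits on a busy controller to be told what to do next.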

Particle Simulation with Rapidmind

Unfortunately there's little detail on the specifics of the simulation, just the results (see slide in original post).

Basically, with 1.6m particles, a single SPE @ 2.1GHz is 1.5x faster than a Pentium 4 at 3.2GHz.

Performance appears to scale linearly with SPEs, so 6 SPEs would be circa 9 times faster than the Pentium 4 - and could handle those 1.6m particles at >30fps. If we also scale performance with clockspeed, 6 SPEs at 3.2GHz would be nearly 14 times faster than the Pentium 4 (9 x 3.2/2.1 = ~13.7), and would offer closer to 50fps.
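
For anyone who wants to check the arithmetic, here it is spelled out. It takes the quoted 1.5x figure at face value and assumes perfectly linear scaling with both SPE count and clock speed, which is an optimistic simplification rather than something the slides promise. Under those assumptions the 3.2GHz case comes out at about 13.7x.

```python
# Figures as quoted from the slide; everything else is derived from them.
SPEEDUP_ONE_SPE = 1.5     # one SPE @ 2.1 GHz relative to a Pentium 4 @ 3.2 GHz
TEST_CLOCK_GHZ = 2.1      # SPE clock used in the demo
RETAIL_CLOCK_GHZ = 3.2    # clock assumed for the extrapolation


def speedup_vs_p4(num_spes, clock_ghz=TEST_CLOCK_GHZ):
    """Relative speed vs the Pentium 4, assuming ideal linear scaling."""
    return SPEEDUP_ONE_SPE * num_spes * (clock_ghz / TEST_CLOCK_GHZ)


six_spes_test_clock = speedup_vs_p4(6)                       # 9.0
six_spes_retail_clock = speedup_vs_p4(6, RETAIL_CLOCK_GHZ)   # about 13.7

# Taking 30fps as the floor at 2.1 GHz, clock scaling alone gives about 45.7fps.
fps_at_retail_clock = 30.0 * RETAIL_CLOCK_GHZ / TEST_CLOCK_GHZ
```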

Similar relative performance results with 5m particles.

Presumably even better performance is possible if hand-coded, without a runtime dev platform like Rapidmind?

"Chameleon Fish" demos

Demoing PSCrowd Simulation library included in PS3 dev kit

10,000 Fish, Sprites, Goal following with obstacle avoidance and flocking behaviour, 60fps

7,000 Fish, 3D with animation - 60fps

5,000 3D fish in 'nicely rendered' water environment, 30fps (a key point is that the framerate is due to rendering time - underlying simulation can run at 60fps)
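
For a feel of what each fish is doing per frame, here is a minimal 2D sketch of goal following plus classic boids-style flocking (cohesion, alignment, separation). It is not PSCrowd code: `flock_step`, its parameters and the unit weights are all my own assumptions, and obstacle avoidance is left out to keep it short.

```python
import math


def flock_step(positions, velocities, goal, dt=1.0 / 60.0,
               neighbor_radius=2.0, max_speed=3.0):
    """Advance every fish by one frame; returns new positions and velocities."""
    new_positions, new_velocities = [], []
    for i, ((px, py), (vx, vy)) in enumerate(zip(positions, velocities)):
        # Goal following: steer toward the shared target point.
        ax, ay = goal[0] - px, goal[1] - py

        # Naive O(n^2) neighbour search within the perception radius.
        neighbors = []
        for j, (qx, qy) in enumerate(positions):
            d = math.hypot(qx - px, qy - py)
            if j != i and 0.0 < d < neighbor_radius:
                neighbors.append((j, d))

        if neighbors:
            n = len(neighbors)
            # Cohesion: steer toward the neighbours' centre of mass.
            ax += sum(positions[j][0] for j, _ in neighbors) / n - px
            ay += sum(positions[j][1] for j, _ in neighbors) / n - py
            # Alignment: match the neighbours' average velocity.
            ax += sum(velocities[j][0] for j, _ in neighbors) / n - vx
            ay += sum(velocities[j][1] for j, _ in neighbors) / n - vy
            # Separation: push away from close neighbours, stronger when nearer.
            for j, d in neighbors:
                ax += (px - positions[j][0]) / (d * d)
                ay += (py - positions[j][1]) / (d * d)

        vx += ax * dt
        vy += ay * dt
        speed = math.hypot(vx, vy)
        if speed > max_speed:  # clamp so the fish cannot accelerate forever
            vx, vy = vx / speed * max_speed, vy / speed * max_speed
        new_velocities.append((vx, vy))
        new_positions.append((px + vx * dt, py + vy * dt))
    return new_positions, new_velocities


school = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
speeds = [(0.0, 0.0)] * 3
school, speeds = flock_step(school, speeds, goal=(10.0, 0.0))
```

The all-pairs neighbour search is the expensive part, which is presumably why a crowd library would split the fish into smaller groups before handing them to the SPEs.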

Funny, but in some ways I found the initial demo with the 2D sprites changing colours prettier than the last demo :smile:
 
The Physics solver presentation I linked to above is looking very interesting - from the SCEA guys who're porting AGEIA to Cell. They talk about the port, the 'lots of ducks' demo etc. etc.

Here's a couple of slides to give you a taster of each presentation (out of context, but go look at all of them!).

PSCrowd:

14uu.jpg


One of his colleagues bet 15 months ago that he'd hit 16,000 at 60fps. So they think they can do better still - SPUs are idle over half the time, PPE spends half its time feeding SPUs new assignments.

AGEIA/Physics solver:

15zb2.jpg

27sk2.jpg

(With multiple SPEs, effect of more on overall speed-up is eventually limited by PPE pre and post-processing of AGEIA data structures)
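
That flattening-out is what Amdahl's law predicts when part of each frame stays serial on the PPE. A quick sketch follows. The 20% serial fraction is a made-up illustrative number, not something read off the slides, and `amdahl_speedup` is just a helper name.

```python
def amdahl_speedup(serial_fraction, num_spes):
    """Overall speed-up when only the non-serial part is spread across SPEs."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_spes)


# Hypothetical: 20% of the frame is PPE pre/post-processing that cannot be split.
SERIAL_FRACTION = 0.2
curve = {n: amdahl_speedup(SERIAL_FRACTION, n) for n in range(1, 7)}
ceiling = 1.0 / SERIAL_FRACTION  # limit with infinitely many SPEs
```

With those numbers six SPEs only get you about 3x, and no number of SPEs gets past 5x, so shrinking the PPE portion matters more than adding SPEs.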
30us.jpg

(Gives you an idea of how under-utilised a lot of power actually was in that demo! They say this themselves, with more time they could have done better, but this was the easiest approach given the time).
 
These slides are quite interesting from a technical point of view, but they also show that a lot of effort has to go into utilizing the available resources well (and I am glad that IBM and Sony are both doing this on multiple fronts and making it part of their SDKs).
Nevertheless, you can see the progress being made, from working on trivially parallelizable problems to more realistic ones where it's not as easy and obvious anymore (which in turn usually end up with the PPE being the limiting factor); although this is still a bit off from a real "game" workload, where many more components are fighting for resources (execution, storage, or bandwidth) with hard constraints.
 
[maven] said:
Nevertheless, you can see the progress being made, from working on trivially parallelizable problems to more realistic ones where it's not as easy and obvious anymore (which in turn usually end up with the PPE being the limiting factor); although this is still a bit off from a real "game" workload, where many more components are fighting for resources (execution, storage, or bandwidth) with hard constraints.

I agree, of course, but the funny thing is how free resources appear to be in some of these examples. I mean, looking at the 'lots of ducks' demo, they obviously were simply splitting up work to run on separate processors just because they could, which results in quite wasteful usage. I mean, they started by thinking of doing all the cloth sails together, but then they appear to have thought "oh, but wait! we've multiple boats, so let's just use multiple processors!". But the result is that you've got two SPUs lying idle most of the time. Unless I'm missing something, cloth simulation for both could have been easily accommodated on one SPU, in sequence perhaps. Similarly with the two components of the fluid simulation - though there is a dependency there, looking at the total execution time between the two, it seems like something they surely could have fit on one SPU. Certainly they're only using about 1 SPU's "worth" of execution time for the frame. The PPU is obviously quite free also; the only SPU that's actually busy most of the frametime is the one taking care of rigid bodies.
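
To illustrate the kind of consolidation I mean, here is a small first-fit-decreasing packing sketch. The task names and millisecond figures are invented for the example, not measurements from the presentation, and it ignores the dependency between the two fluid steps, which a real scheduler would have to respect.

```python
def pack_tasks(task_ms, budget_ms):
    """First-fit decreasing: put each task on the first SPU with room left."""
    spus, loads = [], []
    # Longest tasks first; ties keep their original order.
    for name, ms in sorted(task_ms.items(), key=lambda kv: kv[1], reverse=True):
        if ms > budget_ms:
            raise ValueError(f"{name} does not fit in one frame on one SPU")
        for i, load in enumerate(loads):
            if load + ms <= budget_ms:
                spus[i].append(name)
                loads[i] += ms
                break
        else:
            # No existing SPU has room, so bring another one into use.
            spus.append([name])
            loads.append(ms)
    return spus


# Hypothetical per-frame costs in milliseconds, against a 60fps budget.
tasks = {
    "rigid_bodies": 12.0,
    "cloth_boat_a": 3.0,
    "cloth_boat_b": 3.0,
    "fluid_density": 4.0,
    "fluid_forces": 4.0,
}
layout = pack_tasks(tasks, budget_ms=16.6)
```

With these made-up numbers everything fits on two SPUs instead of five, which is roughly the impression the timing slide gives.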
 
I do find the consumption of SPE processing for cloth surprisingly high. I'd like to know more detail on that and see how it'd relate to cloth simulations in game (such as capes). When I've a moment I'll check the .pdf. Thanks for your useful links, Titanio
 
Shifty Geezer said:
I do find the consumption of SPE processing for cloth surprisingly high. I'd like to know more detail on that and see how it'd relate to cloth simulations in game (such as capes). When I've a moment I'll check the .pdf. Thanks for your useful links, Titanio

We don't know how complex those cloth simulations are compared to what we see today in videogames.
 
Also, the Chameleon Fish demo uses only 2 SPEs (30%) while the PPE is at 84%, so freeing up the PPE is the obvious priority for now.
 
london-boy said:
We don't know how complex those cloth simulations are compared to what we see today in videogames.

Quite a lot more sophisticated, judging by just the demos - they appeared not to be self-penetrating, they could be ripped etc. There were multiple sails per boat, also. I'll have a look again at the presentation, but IIRC it doesn't give much clue as to mesh density.

The water simulation bit is interesting also.
 
Titanio said:
The water simulation bit is interesting also.

I'm sure you meant to say mind blowing ;)

Considering the model they used allowed them to throw water, causing it to spray, slosh, and splash, as well as simply pouring it and it maintaining cohesion. The only thing I would have liked to see from that demo was them crossing the streams :D

But to be doing this on top of all the simulations running for the LOD demo (and compiled for E3 2005 and on what now must have been quite a dated CELL), it's just mind blowing.
 
london-boy said:
We don't know how complex those cloth simulations are compared to what we see today in videogames.
Of course. That's why I'd like to see the info, to see what level of detail there is and how it compares to the gaming side. Like the Alias cloth simulator didn't get amazing performance in terms of calculating 500 soldier capes and flags simultaneously, but we don't know what level of detail etc. it had.
 
Mmmkay said:
I'm sure you meant to say mind blowing ;)

Considering the model they used allowed them to throw water, causing it to spray, slosh, and splash, as well as simply pouring it and it maintaining cohesion. The only thing I would have liked to see from that demo was them crossing the streams :D

But to be doing this on top of all the simulations running for the LOD demo (and compiled for E3 2005 and on what now must have been quite a dated CELL), it's just mind blowing.

Agreed, that's stuff that i've only seen the AGEIA PPU do in realtime. I LOVE fluid dynamics, it's been one of my fetishes since i was messing around with Maya.
 
Titanio said:
(Gives you an idea of how under-utilised a lot of power actually was in that demo! They say this themselves, with more time they could have done better, but this was the easiest approach given the time).

That's actually a scary piece of information. If it is hard to make demos fully utilize the system's resources, it is even harder for real games.
 
jimpo said:
That's actually a scary piece of information. If it is hard to make demos fully utilize the system's resources, it is even harder for real games.

True, but it's both normal and expected for a completely new CPU that's not even been released to the public yet, not to be used fully.
 
So they are not tasking the gpu but send the information just to be output, simplified but you know what i mean.

Would be interesting to know how they did the Getaway demo, as it was at least said to be running purely on Cell, but I find that a little optimistic.
 
Mmmkay said:
Considering the model they used allowed them to throw water, causing it to spray, slosh, and splash, as well as simply pouring it and it maintaining cohesion. The only thing I would have liked to see from that demo was them crossing the streams :D

Anyone know how intensive SPH fluid simulation is? I did some googling, but only found links in the context of astrophysics rather than games ;)
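
For a rough feel of where the cost goes, here is a minimal sketch of just the density step of SPH, using the poly6 smoothing kernel that is a common choice in game-oriented SPH. This is a generic textbook-style illustration, not the method from the presentation, and `densities` is a hypothetical helper. Done naively, it is an all-pairs loop, so the cost grows with the square of the particle count unless you add a neighbour grid.

```python
import math


def poly6(r, h):
    """Poly6 smoothing kernel in 3D; zero beyond the smoothing radius h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, mass, h):
    """Naive all-pairs SPH density estimate: O(n^2) kernel evaluations."""
    result = []
    for pi in positions:
        rho = 0.0
        for pj in positions:
            # Every particle, including pi itself, contributes to pi's density.
            rho += mass * poly6(math.dist(pi, pj), h)
        result.append(rho)
    return result


# Two particles close together and one far outside the smoothing radius.
particles = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
rho = densities(particles, mass=1.0, h=1.0)
```

Pressure and viscosity forces need similar neighbour sums on top of this, every frame, so the per-particle work adds up quickly.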

They did show crossing of streams, but it was very brief, and hard to tell how or if they were interacting:

water3rd.jpg


Shifty Geezer said:
Like the Alias cloth simulator didn't get amazing performance in terms of calculating 500 soldier capes and flags simultaneously

Which demo was that? The only one I'd heard of previously was the different simulations running on each SPE, in a cube..

jimpo said:
That's actually a scary piece of information. If it is hard to make demos fully utilize the system's resources, it is even harder for real games.

Not sure if I'd agree. You'd have more tasks in a game to help soak up the power, and more time than these guys had for the demo. Like I said, also, a lot of the SPU idling seems rather wanton more than anything else - they spread what looks like the work of 2-3 SPUs across 5. I just found that interesting given the approach many take to tech demos, the attitude that they represent a level of technical achievement that could never be realised in a game. In this instance, many PS3 games might have enough 'spare' to do all the CPU calcs for a demo like this on the side, on top of whatever they're doing for their own game!
 