SPU usage in games

betan

I wanted to create a thread to keep track of how SPUs are used in games.

From a Q&A with Dylan Jobe of Warhawk [Official PS Blog]:

"4. What did you do on this game that you couldn’t do on another platform?
It’s hard to answer this and not sound like a gratuitous SONY sales pitch :)

Although I would say it’s the sum-total of all of our natural phenomenon in the game. Our clouds, procedural water, atmospheric scattering, terrain, etc. All of this stuff runs in parallel on all 7 SPUs simultaneously every frame – I’m still not sure if the game community is giving enough credit to just how fast the SPUs really are."

Of course the number 7 is a little puzzling.
 
Resistance:

Even such perfect fits require some compromises. Ideally, software could automatically allocate tasks to whichever of the SPEs has the most time on its hands, but in order to simplify the programming, Insomniac was forced to dedicate two SPEs exclusively to collisions. Two processors are needed in the most demanding situation, one with lots of players, monsters, and bullets all moving around at once. “In games you’re more concerned about the worst-case scenario rather than the average,” explains Hastings. If you aim for the average, then there will be many times when the processors can’t finish the job in time and the game stalls. But by the same reasoning, most of the time those two processors aren’t being used to the fullest.

Ultimately, in future games, Insomniac will try to get almost all tasks running on the SPEs. “The holy grail that people writing games on the Cell are ultimately trying to reach is to get…the real highest-level decision making [onto the SPEs],” says Hastings. “I think that, based on where we are now, that’s still a few years away.”

http://www.spectrum.ieee.org/dec06/4745/2
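
To put rough numbers on the worst-case point Hastings makes above, here is a small sketch. The per-frame costs and the 16 ms budget are made up for illustration and are not from Insomniac; it also assumes the collision work splits perfectly across SPEs.

```python
import math

def spes_needed(frame_costs_ms, budget_ms):
    """SPEs to reserve so that even the heaviest frame finishes in time."""
    return math.ceil(max(frame_costs_ms) / budget_ms)

def average_utilisation(frame_costs_ms, budget_ms, spes):
    """Fraction of the reserved SPE time actually used, averaged over frames."""
    mean_cost = sum(frame_costs_ms) / len(frame_costs_ms)
    return mean_cost / (spes * budget_ms)

# Hypothetical collision cost per frame, in SPE-milliseconds (made-up numbers).
costs = [6.0, 7.0, 5.0, 30.0, 8.0, 6.0]
budget = 16.0  # assumed time one SPE can spend on collisions per frame

reserved = spes_needed(costs, budget)
print(reserved)
print(round(average_utilisation(costs, budget, reserved), 2))
```

With these numbers the one heavy frame forces two SPEs to be reserved, while on average only about a third of that reserved time is used, which is the trade-off the article describes.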




Motorstorm:

MotorStorm only uses between 15 and 20 percent of available SPU resource, so we’re aiming to achieve a 5 fold increase in SPU performance, which should allow us to do some awesome stuff!

Our SPU exploiting systems consist of:

i) Havok physics.
ii) Determination of object visibility.
iii) Concatenation of hierarchies.
iv) Billboard object culling and vertex buffer creation.
v) Updating of particles and vertex buffer creation.
vi) Updating of vehicle dynamics.
vii) Updating of vehicle suspension constraints.
viii) Audio (MultiStream).
ix) Video decoding.

http://www.beyond3d.com/content/interviews/38/



I guess these guys could be lying, so who knows for sure.
 
Thanks Todd, I have been planning to track down the Spectrum article, as I read it in the printed copy.
I assume you know the Motorstorm interview is from Beyond3d.

Another quote with the number 7, from the Formula 1 CE interview [Newsweek]:
Please detail how F1 CE is using the Cell processor's components, the PPU and the seven SPUs. (Example from an actual launch title: PPU for game logic; SPU 1 and 2 for shader effects; SPU 3, 4, 5 for PhysX physics simulation; SPU 6 for particle effects; SPU 7 for audio. Also SPUs 1-5 used during loading to reduce load times.)

We don't really use the concept of reserving certain SPUs for specific tasks. Instead we employ the concept of prioritized job lists that are executed by the SPUs whenever one is available. We use the SPUs for the following jobs: audio effects, particle system, physics (landscape collision, narrow phase and collision resolution), rain effects (rain droplets and rain splashes) and various render side jobs. The game logic is driven largely by the PPU. We use the SPUs together to collaborate on working through each frame that's displayed by the game. The SPUs are extremely versatile so they can be used to accelerate any in-game system.
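
A minimal sketch of the "prioritized job list" idea described in the answer above, as a plain simulation: whichever worker frees up first takes the next job in priority order. The job names, costs and the `run_job_list` helper are hypothetical and are not the F1 CE scheduler.

```python
import heapq

def run_job_list(jobs, num_workers):
    """Simulate workers pulling jobs from one prioritised list.

    jobs: list of (priority, name, cost_ms); a lower priority number runs first.
    Returns (schedule, finish_ms), where schedule holds
    (name, worker, start_ms, end_ms) tuples in dispatch order.
    """
    # Sort by priority only; the sort is stable, so ties keep submission order.
    pending = sorted(jobs, key=lambda job: job[0])
    # Min-heap of (time_when_free, worker_id): the next free worker is on top.
    workers = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(workers)
    schedule = []
    for _priority, name, cost in pending:
        free_at, worker = heapq.heappop(workers)
        end = free_at + cost
        schedule.append((name, worker, free_at, end))
        heapq.heappush(workers, (end, worker))
    finish = max((end for _n, _w, _s, end in schedule), default=0.0)
    return schedule, finish

# Hypothetical per-frame jobs (made-up costs in milliseconds).
jobs = [
    (0, "physics", 4.0),
    (1, "particles", 2.0),
    (1, "rain", 1.0),
    (2, "audio", 1.5),
    (0, "render_prep", 3.0),
]
schedule, finish = run_job_list(jobs, num_workers=2)
for entry in schedule:
    print(entry)
print(finish)
```

No job is tied to a particular worker here, so a worker that finishes early simply picks up the next item, which is the contrast with the fixed "SPU 6 for particles" style allocation in the question.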

How is F1 CE using the RSX graphics processor? Do the Cell and RSX work together on any part of the graphics pipeline, and if so, which one?

The SPUs are heavily involved in the graphics pipeline and do an enormous amount of work to eliminate inefficiency before anything arrives at the PPU and RSX. For example, the SPUs are powerful enough to decompress and check every triangle [polygon] before passing it on to the RSX. Triangles that are facing away from the player, or that are not on the screen can be 'trimmed' away by the SPUs, which hugely reduces the amount of redundant work sent to the RSX. This in turn lets the RSX get on with what it does best--drawing stuff on screen.

The SPUs can also be used to augment the RSX vertex shaders, making far more vertex-heavy tasks possible which is very useful for character animation. Additionally, the SPUs can be used to implement behavior very similar to geometry shaders--F1 CE uses them in this way to render seamless interpolated levels of detail for some scene elements. So in answer to the question "Do the Cell and RSX work together?" the answer is a resounding "Yes," and I think this is one of the real strengths of Playstation 3 that we'll see increasingly exploited by development teams going forward.
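
The triangle "trimming" described in the answer above can be sketched roughly like this. It only shows the logic, assuming vertices already projected to screen coordinates in [-1, 1] and counter-clockwise front faces; both conventions and the function names are assumptions of this sketch, not details from F1 CE, and real SPU code would work on compressed data with SIMD.

```python
def signed_area(a, b, c):
    """Twice the signed area of a 2D triangle; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])

def off_screen(tri):
    """True if all three vertices lie beyond the same edge of the view square."""
    for axis in (0, 1):
        if all(v[axis] < -1.0 for v in tri) or all(v[axis] > 1.0 for v in tri):
            return True
    return False

def cull(vertices, indices):
    """Return a reduced index list containing only triangles worth drawing."""
    kept = []
    for i in range(0, len(indices), 3):
        ia, ib, ic = indices[i:i + 3]
        tri = (vertices[ia], vertices[ib], vertices[ic])
        if signed_area(*tri) <= 0.0:
            continue  # back-facing or degenerate
        if off_screen(tri):
            continue  # entirely outside the screen
        kept.extend((ia, ib, ic))
    return kept

# Hypothetical projected vertices and an index list of three triangles.
vertices = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
indices = [0, 1, 2,   # front-facing and on screen
           0, 2, 1,   # same triangle with reversed winding (back-facing)
           3, 4, 5]   # front-facing but off screen
print(cull(vertices, indices))
```

Only the first triangle survives, so the GPU would receive a third of the original index list.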

PS: I wasn't quote-formatting because it makes replying somewhat more difficult, but wth, consistency is better.
 
Is this why the 7th SPU is reserved then? Not so much for OS tasks, but audio processing etc.

I never understood why they need a dedicated SPU for the OS; it's not like the OS is running on the SPU. Is it just polling the PS button? I think they will give it back to the devs soon, like they are slowly doing with the OS memory.
 
Is this why the 7th SPU is reserved then? Not so much for OS tasks, but audio processing etc.

I don't think it makes sense to reserve an SPU for OS-independent, game specific functions. Also Croal was just giving an allocation example, corrected by "We don't really use the concept of reserving certain SPUs for specific tasks.", but not regarding the number.
 
I never understood why they need a dedicated SPU for the OS; it's not like the OS is running on the SPU. Is it just polling the PS button? I think they will give it back to the devs soon, like they are slowly doing with the OS memory.

They've done that for the PS2 and PSP so possibly. My thought was that rather than have dedicated sound processing hardware, the SPU can convert it etc. and pass it to RSX (for inclusion in the HDMI output). However, others suspect RSX has a large portion dedicated to sound processing (hence the extra transistors).
 
http://gamers-creed.com/?p=239
Q: Is Drake’s Fortune using the full PS3 potential?
CB: To be honest, I think we only use 30% of the Cell's capacity. When we started development of the game, we worked on PC without yet knowing the exact potential of the console. Let me just tell you that we had a heavy hand! Once we really started developing on the console, we were stunned by so much power. Moreover, as we continue development, we are still impressed by the possibilities offered to us.
 

More SPU-related talk from the Beyond3D MotorStorm interview:

B3D: Were Evolution's thoughts that the application of Cell towards such tasks might go too far in removing SPEs from being available for work on AI, physics, and other gameplay-related code?

Scott Kirkland: Cell’s SPUs provide a huge amount of processing power. Early adopters tended to bias usage towards either RSX or PPU support (we fall into the latter category). I’m confident that over the coming months, exploitation of this resource will become far more balanced.

B3D: Further to that, do you believe that as the generation progresses, cooperative rendering techniques will become a larger part of what grows to define baseline PS3 rendering methods, or are your thoughts that such efforts will play out more or less in niche areas?

Scott Kirkland: If by “cooperative rendering” you’re referring to SPUs supporting the RSX, I strongly believe that this approach will become far more widespread. In addition to reducing the vertex load on the RSX through the use of culling and vertex pre-processing, this approach also provides an efficient mechanism to introduce procedural geometry.

Historically, CPUs have provided coarse-grain scene culling using view frustums, occlusion planes, portal visibility and BSP-trees, with GPUs left to perform fine-grain rejection using guard band clipping, occlusion and backface culling. While such features improve fragment performance, they don’t reduce vertex processing overhead.

The leap in performance provided by Cell gives us the bandwidth to significantly reduce RSX time spent processing vertices that don’t contribute to the final scene. The favoured approach is to use SPUs to generate minimal scene/instance specific index and vertex buffers from compressed data.
 
Lair & Cell

In what other ways does Lair take advantage of the Cell?

We have all of our animations running on the S.P.U.s of the Cell's chip because you couldn't draw armies or basically animate armies of that amount and size without it. And our physics are completely on there. We are also doing fluid dynamics for the first time in a game, as far as I know. Water is not basically a sheet of a base surface, but completely animated and sub-divided, and you actually can direct with it thanks to the Cell. We actually do part of our rendering on the Cell. Simply because it's so powerful, we spent months and months moving more and more systems onto the S.P.U.'s.

Do you dedicate one S.P.U. to enemy A.I.?

Not a specific S.P.U. Our S.P.U. code works dynamically, so we are not locking up one S.P.U. and saying, "OK, you are the A.I. S.P.U.," but we instead say, "OK, here are these 15 things including A.I." We run them on the S.P.U., and the code automatically distributes them. And sometimes, yes, A.I. certainly can take up a full S.P.U.

Rest of interview here: http://www.gamepro.com/news.cfm?article_id=110368
 
David Kirk: SPE and RSX can work together. SPE can preprocess graphics data in the main memory or postprocess rendering results sent from RSX.

Nishikawa's speculation: for example, when you have to create a lake scene by multi-pass rendering with plural render targets, SPE can render a reflection map while RSX does other things. Since a reflection map requires less precision it's not much of overhead even though you have to load related data in both the main RAM and VRAM. It works like SLI by SPE and RSX.

David Kirk: Post-effects such as motion blur, simulation for depth of field, bloom effect in HDR rendering, can be done by SPE processing RSX-rendered results.

Nishikawa's speculation: RSX renders a scene in the main RAM then SPEs add effects to frames in it. Or, you can synthesize SPE-created frames with an RSX-rendered frame.

David Kirk: Let SPEs do vertex-processing then let RSX render it.

Nishikawa's speculation: You can implement a collision-aware tesselator and dynamic LOD by SPE.

David Kirk: SPE and GPU work together, which allows physics simulation to interact with graphics.

Nishikawa's speculation: For expression of water wavelets, a normal map can be generated by pulse physics simulation with a height map texture. This job is done in SPE and RSX in parallel.

Enjoy :)
 
I'm not sure how much I buy into developers giving a percentage of usage figure for Cell, but maybe they have a standard way of determining that, for comparison (?)

FWIW, in the latest issue of GamesTM, Dylan Jobe says he reckons they're using about a third of the SPU's potential, which is the same as what ND reckons with Uncharted. That may seem odd at first glance, but I guess Warhawk is doing some intensive stuff on Cell i.e. the clouds etc.
 
I'm not sure how much I buy into developers giving a percentage of usage figure for Cell, but maybe they have a standard way of determining that, for comparison (?)

FWIW, in the latest issue of GamesTM, Dylan Jobe says he reckons they're using about a third of the SPU's potential, which is the same as what ND reckons with Uncharted. That may seem odd at first glance, but I guess Warhawk is doing some intensive stuff on Cell i.e. the clouds etc.

Why is this so hard to believe?

Most profilers will give you a good indication of how much idle time per core occurs per frame..

Then you can just work it out from there..
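
As a rough sketch of "working it out" from profiler numbers: add up the busy time per SPU and divide by the time available. The frame time, the per-SPU figures and the `spu_usage` helper are all made up for illustration; they are not from any shipped game or real profiler output.

```python
def spu_usage(busy_ms_per_spu, frame_ms):
    """Return (overall usage fraction, number of SPUs that ran any work)."""
    total_available = frame_ms * len(busy_ms_per_spu)
    usage = sum(busy_ms_per_spu) / total_available
    active = sum(1 for busy in busy_ms_per_spu if busy > 0.0)
    return usage, active

frame_ms = 30.0  # assumed frame time

# Two hypothetical profiles with the same total amount of SPU work.
concentrated = [30.0, 30.0, 0.0, 0.0, 0.0, 0.0]  # two SPUs fully busy
spread = [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]    # every SPU busy a third of the frame

print(spu_usage(concentrated, frame_ms))
print(spu_usage(spread, frame_ms))
```

Both profiles come out at one third usage, but one keeps two SPUs busy and the other touches all six, so a quoted "1/3 usage" figure on its own says nothing about how many SPUs are running code.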
 
I'm not sure how much I buy into developers giving a percentage of usage figure for Cell, but maybe they have a standard way of determining that, for comparison (?)
Although the line between reality and PR sht may stay blurry for some time, they really do have profilers.
FWIW, in the latest issue of GamesTM, Dylan Jobe says he reckons they're using about a third of the SPU's potential, which is the same as what ND reckons with Uncharted. That may seem odd at first glance, but I guess Warhawk is doing some intensive stuff on Cell i.e. the clouds etc.

Dylan Jobe actually said 1/3 SPU usage? Not totally inconsistent with 7 SPUs but still....

I believe Uncharted's figure first surfaced in one of the Full Moon podcasts, where Evan Wells said they ran the code through the profiler after Gamer's Day and found about 1/3 SPU usage. About 2 (possibly distributed) SPUs seems reasonable for that particular game. After all, we know EDGE's animation code that runs on an SPU was written by Naughty Dog. Along with some minor stuff, 2 SPUs isn't unlikely.

I find it more interesting that the figure was found out after Gamer's Day.
 
Well, let's not confuse the number of SPUs used with SPU usage. You could be running code on all SPUs and only be using some given fraction of their potential.
 
Even if someone was using 100% of the SPUs' time, it doesn't mean they are doing so efficiently either.
 
Well, let's not confuse the number of SPUs used with SPU usage. You could be running code on all SPUs and only be using some given fraction of their potential.

Granted.. but nine times out of ten you're going to be running heavy processes on them anyway which aid a particular system of your engine..

Whether it be physics calcs, vertex processing, AI or something else these areas are still going to take up the vast bulk of processing work being done per frame on each core..

I guess the ideal situation is choosing optimal algorithms which allow you to process the greatest possible data load at run time..

The hardest part is choosing the optimal algorithm in the first place.. Then after that it's just a case of optimising your implementation of it to get it running in the best way possible..

It's not particularly likely that the average PS3 game engine is going to be spending a significant margin of its processing time dealing with redundant/poor code that hasn't been optimised out at some point...
 
Granted.. but nine times out of ten you're going to be running heavy processes on them anyway which aid a particular system of your engine..

Whether it be physics calcs, vertex processing, AI or something else these areas are still going to take up the vast bulk of processing work being done per frame on each core..

Not sure if I agree. We've had a variety of examples from papers and presentations where SPUs were in use but also sat idle a lot of the time. Not from games specifically, because no developer has given us that level of insight, but from various physics demos, the 'lots of ducks' demo etc., and the reasons why that happened probably arise in some games too.

You can be sure most of these games indicating x% of SPU usage are probably spreading that over all SPUs, so in such cases you'd be having a lot of idle-time there..most of it would be idle-time, in fact.
 