Georgia Tech STI Cell/B.E. Workshop 2007


one
23-Jun-2007, 08:13
Presentation PDFs from the Georgia Tech STI Cell/B.E. Workshop 2007 (June 18-19) are now up on their web site.

http://sti.cc.gatech.edu/
http://sti.cc.gatech.edu/program.html

Among them, the game-related talks:
Cell/B.E. Powered Games, Jim Tilander (http://sti.cc.gatech.edu/Slides/Tilander-070619.pdf), LucasArts
Cell/B.E.: Programming for Digital Animation and Visual F/X (http://sti.cc.gatech.edu/Slides/DAmora-070619.pdf), Bruce D'Amora (IBM T. J. Watson Research)
Developing Technology for Insomniac's Ratchet and Clank Future: Tools of Destruction (http://sti.cc.gatech.edu/Slides/Acton-070619.pdf), Mike Acton, CellPerformance

Bruce D'Amora's presentation is the latest update on server-side physics-based modeling, which I think will be the basis of a PS4-era system architecture.
Motivation: Server-side Physically Based Modeling
Enable the next generation MMOGs & virtual environments
– Current online video games perform limited amount of physical simulation
– Not enough client CPU resources
– Bandwidth & Latency between processing nodes prohibitive to achieving real time performance
Enable complex visual F/X movies on large servers

Jim Tilander thinks new languages may be needed for better SPU programming. I find this part interesting:
Visual Basic for CELL?
● Visual interface to a dataflow language.
● Targeting non CELL programmers.
● SPU constraints can be built into the language.
● Visual debugging and single stepping.
● Race condition analysis, deadlock analysis
But Mike Acton can't wait for a new language to come.

Excerpts from Mike Acton:
SPU Management, Why, Oh Why?
● Trying to find a single SPU job manager is a lost cause, e.g. SPURS.
● Dynamic job management is just like dynamic memory management (but time vs. space)
● ... and we wouldn't use malloc/free, so why would we want to use a job manager/scheduler?

Away from SPURS?
● Resistance used a mix of job management and manual. We're moving toward much more manual control.
● But what about middleware?
– Red herring.
– Use mix of PPU and SPU libraries. Pass whatever data is needed.
What's been our strategy for RCF?
● Put more on the SPUs, less on the PPU.
● Less PPU/SPU synchronization.
● Better dataflow organization.
● Better SPU code design.
● Concentrating on major engine systems
Sideline: Basic Philosophies
● Not porting to the Cell, designing for the Cell.
– SPUs are the core of the Cell, the PPU is a minor player.
● THE DATA IS EVERYTHING.
– Good code = Data transformation kernel
● Small
● Fast
● Does nothing more than it needs to
● No extra complexity
Example: Physics, Before
● Heavy PPU Synchronization
● SPU was used as a coprocessor
● SPU “packets” were built on the PPU
– For small jobs, this often took more time than the processing!
● SPU code was a scalar port
● Many stage pipeline with PPU synchronization at each stage.
● Scattered data (No dataflow design)
Physics, Now
● Pipeline well defined, SPU-driven
– e.g. Code uploading is controlled by SPU
● SPU processing completely asynchronous
● Data well-organized and defined.
● No (or minimal) PPU intervention
● Also, will be:
– Reorganizing data for better SIMD processing
– Prefer si_* intrinsics and asm to spu_* intrinsics
– Better local store management for bigger jobs.
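To illustrate the "reorganizing data for better SIMD processing" bullet, here is a minimal portable C sketch (not SPU code; the type and function names are hypothetical) that converts an array-of-structures particle layout into structure-of-arrays, so a kernel can stream over contiguous x, y and z arrays. On an SPU the inner loop would work on 128-bit vectors of four floats; plain C is used here so it runs anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array-of-structures (AoS) layout: fields are interleaved,
   so a single 128-bit load mixes x, y, z and w of one particle. */
typedef struct {
    float x, y, z, w;
} ParticleAoS;

/* Hypothetical structure-of-arrays (SoA) layout: each field is contiguous,
   so four consecutive x values fill one 128-bit register. */
#define MAX_PARTICLES 256
typedef struct {
    float x[MAX_PARTICLES];
    float y[MAX_PARTICLES];
    float z[MAX_PARTICLES];
    size_t count;
} ParticlesSoA;

/* Reorganize AoS into SoA (the unused w component is dropped). */
void aos_to_soa(const ParticleAoS *in, size_t n, ParticlesSoA *out)
{
    assert(n <= MAX_PARTICLES);
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
    out->count = n;
}

/* A SIMD-friendly kernel: every lane does identical work on contiguous
   data. Here it moves all particles by a constant velocity times dt. */
void integrate_soa(ParticlesSoA *p, float vx, float vy, float vz, float dt)
{
    for (size_t i = 0; i < p->count; ++i) {
        p->x[i] += vx * dt;
        p->y[i] += vy * dt;
        p->z[i] += vz * dt;
    }
}
```

The conversion is a one-off cost; the payoff comes when kernels like integrate_soa run over the same SoA data many times per frame.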
Random Stuff to Fill Time
● Auto-SIMDization. No value for games.
● Auto-parallelization. Dead end.
● How much parallelism is in games?
– A whole lot. Will be able to fill 32 Cells.
● What do we want in Cell v2?
– Half precision floats (full native math)
– Bit interleave instruction. Maybe dot product too.
– Full 128 bit shifts and rotates.
– Fill out the integer instruction sets.
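The "bit interleave instruction" wish presumably refers to operations like Morton (Z-order) encoding, used for spatial hashing and cache-friendly grid layouts; that reading is an assumption, since the slide gives no detail. Without hardware support, interleaving takes a sequence of shifts and masks, as in this minimal C sketch (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v apart so there is a zero bit between
   each original bit: ...dcba -> ...0d0c0b0a */
static uint32_t spread_bits16(uint32_t v)
{
    assert((v >> 16) == 0);            /* caller passes a 16-bit value */
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Interleave x and y into a 32-bit Morton code:
   bit 2i holds bit i of x, bit 2i+1 holds bit i of y. */
uint32_t interleave16(uint16_t x, uint16_t y)
{
    return spread_bits16(x) | (spread_bits16(y) << 1);
}
```

For example, interleave16(5, 3) yields 27 (binary 011011). A single native interleave instruction would replace the four shift/OR/AND rounds per operand.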


EDIT: Forgot to add another SCE presentation about SPURS!

Beyond the GFLOPS (http://sti.cc.gatech.edu/Slides/Mallinson-070618.pdf), Dominic Mallinson, Vice President, US Research and Development, Sony Computer Entertainment Inc.

patsu
23-Jun-2007, 08:40
Very interesting! Thanks for posting this. I am still digesting the comment about moving away from SPURS towards manual control. Is this because Insomniac does not perform triangle culling on the SPUs, and so they need more careful/absolute control over what gets sent to RSX (and when)?

...which makes me wonder what kind of custom programs run on their SPUs. I heard they wrote their own physics engine for Resistance. What else?


EDIT: Just came across this...

Another real world application for PS3:
http://www.kndo.com/Global/story.asp?S=6688685&nav=menu484_2_10

Titanio
23-Jun-2007, 09:44
Very interesting! Thanks for posting this. I am still digesting the comment about moving away from SPURS towards manual control. Is this because Insomniac does not perform triangle culling on the SPUs, and so they need more careful/absolute control over what gets sent to RSX (and when)?

...which makes me wonder what kind of custom programs run on their SPUs. I heard they wrote their own physics engine for Resistance. What else?

From the presentation:

Resistance, Launch title 2006
– Collision detection on 2 SPUs
– Physics, SPU managed jobs
– Special FX, SPU managed jobs
– Geometry culling, SPU managed jobs
– Low-level animation, SPU managed jobs
– ...and lots more!
● Resistance did take advantage of the Cell, but...
– We could still do much better.

I don't think the move towards more manual control has much to do with SPURS managing the timing of data sent to RSX. SPURS is a general framework for managing SPU tasks, but I think Acton is saying that one-size-fits-all isn't really possible, and he pointed out one bad example of SPURS usage. I think that if you were using SPURS, however, you would still have control over when data is passed to RSX, since it's still just running your code.

Arwin
23-Jun-2007, 13:13
I think it's a matter of designing the software in such a manner that you can predict the load, and can divide the loads as efficiently as possible, and give priority to loads as efficiently as possible. You'll want to change jobs as little as possible because changing the jobs requires a lot of overhead, no matter how well you do it, compared to not changing the jobs. In this case, tailoring your application to your exact needs should always allow more performance to be squeezed out of the system than any kind of general job management ever could.

patsu
24-Jun-2007, 06:15
I think that if you were using SPURS, however, you would still have control over when data is passed to RSX, since it's still just running your code.

This is what I thought too. Generally speaking, something like SPURS should be able to balance the load dynamically instead of statically. For them to prefer manual control, off the top of my head, it would mean that:
(i) The workload is predictable (so they can allocate them statically to cut down overhead), or
(ii) They are using some "hacks" to manage the data sharing and data flow carefully (like Deano's cache). So leaving it to SPURS can be "unpredictable".
(iii) They want to optimize the cooperation between RSX and Cell "fully" (i.e., utilize both the SPUs and RSX fully, not just use the SPUs to help RSX)

Of course I am guessing in the dark. :) It also depends on how high they are shooting (e.g., SPURS may be great, but they want "best in every situation").

EDIT: Yeah... I'd kinda lump Arwin's response under (i).

Shifty Geezer
24-Jun-2007, 09:42
Of course I am guessing in the dark. :) It also depends on how high they are shooting (e.g., SPURS may be great, but they want "best in every situation").

I think this is actually a key thing. It's all very well for a high-class, low-level dev like Insomniac to say 'don't use SPURS - it's rubbish!' but for ordinary devs who want decent results with minimum effort, SPURS might be a smart choice. SPURS seems to me like a conventional sort of API overhead: it simplifies the job of development at the cost of peak attainable power.

inefficient
24-Jun-2007, 11:04
I think Mike Acton is basically saying SPURS-type systems are still very immature and create more problems than they solve.

A complicated engine, although theoretically better, is harder to optimize compared to a simpler engine. A simpler engine that is theoretically less efficient, might use a brute force technique that you understand very well and can optimize the hell out of.

Similarly, there's their position on "Do we need a new language?", to which his answer is basically NO. It's better at this point to use the languages we already have and understand very well.

I think the main takeaway is that you can't just "hide" the main issues with extracting performance out of Cell by using some clever library like SPURS, some fancy new programming language, or by sprinkling pixie dust over your code. There's no way out of getting your programmers to really understand the issues. And the best solution at this point seems to be "better programmers" rather than "better tools". So they have been focusing on a training program and knowledge sharing to improve their programming team.

It seems the quality of your coders is going to have the biggest impact on this particular platform. Which makes me smile a little bit.

SPM
24-Jun-2007, 15:39
Interesting.


Physics, Now
● Pipeline well defined, SPU-driven
– e.g. Code uploading is controlled by SPU
● SPU processing completely asynchronous
i.e. you should never, ever make SPUs wait for other processes like you would on a conventional system. In a conventional shared-memory multiprocessor system you spawn many threads and make them wait on other threads in order to make use of their results. On SPEs, context switching to allow multiple threads to run on a single SPE is very expensive, so SPEs should never wait. They should be run as batch processors (or chained batch processors, with the output of one SPE feeding the next) which load the processing code into local store, run it on many datasets in turn, and store each result in memory in turn. When the batch processing for that type of dataset is complete, load another batch process into local store, process another type of dataset, and so on. In games, the screen frames may be used to synchronise batch processing with real-time processing.
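The batch-processing pattern described above can be sketched as a double-buffered streaming loop: the kernel stays resident while fixed-size chunks of data are pulled through two local buffers, with the next chunk fetched while the current one is processed. This is a minimal portable C sketch under stated assumptions: memcpy stands in for the asynchronous DMA a real SPU would issue, so nothing actually overlaps here, and CHUNK, fetch, store, kernel_scale and stream_batch are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 64   /* hypothetical chunk size (floats) that fits local store */

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Stand-ins for DMA into and out of local store. On real hardware these
   would be asynchronous transfers waited on later; memcpy is synchronous,
   so this only shows the structure of the loop. */
static void fetch(float *ls, const float *ea, size_t n) { memcpy(ls, ea, n * sizeof *ls); }
static void store(float *ea, const float *ls, size_t n) { memcpy(ea, ls, n * sizeof *ls); }

/* The "kernel": one small data transformation applied to a whole chunk. */
static void kernel_scale(float *data, size_t n, float s)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= s;
}

/* Stream `count` floats from main memory through two local buffers. */
void stream_batch(float *main_mem, size_t count, float s)
{
    static float buf[2][CHUNK];               /* two local-store buffers */
    assert(count == 0 || main_mem != NULL);
    if (count == 0)
        return;

    size_t nchunks = (count + CHUNK - 1) / CHUNK;
    fetch(buf[0], main_mem, min_size(count, CHUNK));   /* prime the pipeline */

    for (size_t k = 0; k < nchunks; ++k) {
        size_t cur = k & 1u;
        size_t off = k * CHUNK;
        size_t len = min_size(count - off, CHUNK);

        /* Start fetching chunk k+1 into the other buffer before working
           on chunk k, so the processor need not wait for data. */
        if (k + 1 < nchunks) {
            size_t noff = off + CHUNK;
            fetch(buf[cur ^ 1u], main_mem + noff, min_size(count - noff, CHUNK));
        }

        kernel_scale(buf[cur], len, s);        /* process chunk k */
        store(main_mem + off, buf[cur], len);  /* write results back */
        /* With real async DMA, this store must complete before buf[cur]
           is refilled on the next iteration (e.g. by waiting on its tag). */
    }
}
```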


● THE DATA IS EVERYTHING.
– Good code = Data transformation kernel
● Small
● Fast
● Does nothing more than it needs to
● No extra complexity
i.e. the SPE is an awesome data processor, but not a good branch processor, nor is it good at elegantly manipulating complex data structures in RAM. Rethink everything you have learned about data structures on conventional shared-memory processors: treat AI, as far as possible, as boolean data to be batch processed, with the results left to be acted on later.

All this is simple and obvious, but conventional thinking gets in the way of writing good code.
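As a rough illustration of "treat AI as boolean data to be batch processed", here is a hypothetical two-pass sketch in C: one tight loop turns per-agent inputs into decision bits using comparisons and bit operations instead of if statements, and a separate pass consumes those bits later. The agent fields, thresholds and flag layout are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-agent inputs, stored as separate arrays (SoA). */
typedef struct {
    const float *dist_to_player;  /* distance from each agent to the player */
    const float *health;          /* current health of each agent */
    size_t count;
} AgentInputs;

/* Pass 1: a small data transformation kernel. Each comparison yields
   0 or 1, so the loop body has no if statements and a compiler can
   typically use compare/select instead of branches.
   Flag bit 0 = "attack", bit 1 = "flee". */
void decide(const AgentInputs *in, float attack_range, float flee_health,
            uint8_t *flags)
{
    for (size_t i = 0; i < in->count; ++i) {
        uint8_t in_range = (uint8_t)(in->dist_to_player[i] < attack_range);
        uint8_t weak     = (uint8_t)(in->health[i] < flee_health);
        uint8_t attack   = (uint8_t)(in_range & (weak ^ 1u));
        flags[i] = (uint8_t)(attack | (weak << 1));
    }
}

/* Pass 2, run later: consume the flags. Counting each action stands in
   for whatever the game would actually do with the decisions. */
void tally(const uint8_t *flags, size_t n, size_t *attackers, size_t *fleers)
{
    assert(attackers != NULL && fleers != NULL);
    *attackers = 0;
    *fleers = 0;
    for (size_t i = 0; i < n; ++i) {
        *attackers += flags[i] & 1u;
        *fleers    += (flags[i] >> 1) & 1u;
    }
}
```

Keeping the decision pass separate from the action pass means the first loop touches only flat arrays, which is the kind of work the SPE is good at.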

Jesus2006
25-Jun-2007, 07:07
i.e. the SPE is an awesome data processor, but not a good branch processor, nor is it good at elegantly manipulating complex data structures in RAM. Rethink everything you have learned about data structures on conventional shared-memory processors: treat AI, as far as possible, as boolean data to be batch processed, with the results left to be acted on later.

I think the BFS tree examples have shown that it can also perform quite well with lots of random access patterns; it just depends on the method of implementation (as so often).
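For readers unfamiliar with the workload, a level-synchronous breadth-first search over a graph stored in compressed sparse row (CSR) form is a typical irregular, random-access kernel of the kind being discussed. This is a generic C sketch, not the implementation from those Cell BFS examples; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Level-synchronous BFS over a CSR graph.
   row[v] .. row[v+1]-1 index into col[], which lists v's neighbours.
   level[v] receives v's distance from src, or -1 if unreachable.
   frontier and next are caller-provided scratch arrays of n ints. */
void bfs_csr(const int *row, const int *col, int n, int src,
             int *level, int *frontier, int *next)
{
    assert(src >= 0 && src < n);
    for (int v = 0; v < n; ++v)
        level[v] = -1;

    int fsize = 0, depth = 0;
    level[src] = 0;
    frontier[fsize++] = src;

    while (fsize > 0) {
        int nsize = 0;
        ++depth;
        /* Expand the whole frontier as one batch; the neighbour lookups
           into col[] and level[] are the random accesses. */
        for (int i = 0; i < fsize; ++i) {
            int v = frontier[i];
            for (int e = row[v]; e < row[v + 1]; ++e) {
                int w = col[e];
                if (level[w] < 0) {
                    level[w] = depth;
                    next[nsize++] = w;
                }
            }
        }
        /* Swap the frontier buffers for the next level. */
        int *tmp = frontier;
        frontier = next;
        next = tmp;
        fsize = nsize;
    }
}
```

Processing one level at a time turns an inherently pointer-chasing traversal into batches, which is what makes it possible to schedule the memory traffic explicitly.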

Mike Acton
28-Jun-2007, 04:39
I think this is actually a key thing. It's all very well for a high-class, low-level dev like Insomniac to say 'don't use SPURS - it's rubbish!' but for ordinary devs who want decent results with minimum effort, SPURS might be a smart choice. SPURS seems to me like a conventional sort of API overhead: it simplifies the job of development at the cost of peak attainable power.

We are not trying to be elitist here. I absolutely would make the same recommendation to any developer. As a matter of fact, for developers with less experience in parallel development, and in SPU development specifically, I would even say it's more important that they learn to do things manually.

Yes, there're optimizations that can only be done when you know the data and have designed the dataflow and system load yourself. But more than that, it's about what happens when things go wrong: If you don't understand how things work, "under the hood" as they say, you're going to be stuck spinning your wheels and trying totally random things if a system like SPURS crashes or doesn't give you the results you expect.

It's about fundamentally realizing that it's the developer that's responsible for how things are running (and the performance) - not the library or some other tool.

It's akin to my experience with good assembly coders: Good assembly coders usually know how to write high-level code that's fast. That's because they fundamentally understand what the compiler is doing and what trade-offs it's making and they are using it as a tool to speed their development - not a tool to replace their brain.

And, honestly, it's not that hard. We're talking about how to balance a load across six SPUs, not 128,000. Six. I'd hope that'd be approachable by just about anyone with a whiteboard and a pen. (And if it isn't, it's probably a sign that the systems are unnecessarily complicated anyway.)

Mike.
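To make the "six SPUs, a whiteboard and a pen" point concrete: when per-job costs can be estimated up front (patsu's case (i) above), a static schedule can be computed with something as simple as the classic greedy longest-processing-time heuristic. This C sketch is a generic illustration, not Insomniac's scheduler; the Job fields, cost units and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SPUS 6

typedef struct {
    int id;     /* hypothetical job identifier */
    int cost;   /* estimated cost, e.g. microseconds per frame */
} Job;

/* qsort comparator: larger cost first. */
static int by_cost_desc(const void *a, const void *b)
{
    const Job *ja = a, *jb = b;
    return (jb->cost > ja->cost) - (jb->cost < ja->cost);
}

/* Greedy longest-job-first static assignment: sort jobs by estimated
   cost (jobs[] is sorted in place), then give each job to the currently
   least-loaded SPU. assign[i] receives the SPU index for jobs[i];
   load[s] receives the total estimated cost on SPU s.
   Returns the makespan, i.e. the largest per-SPU load. */
int assign_static(Job *jobs, int n, int assign[], int load[NUM_SPUS])
{
    assert(n >= 0);
    qsort(jobs, (size_t)n, sizeof *jobs, by_cost_desc);
    for (int s = 0; s < NUM_SPUS; ++s)
        load[s] = 0;

    for (int i = 0; i < n; ++i) {
        int best = 0;
        for (int s = 1; s < NUM_SPUS; ++s)
            if (load[s] < load[best])
                best = s;
        assign[i] = best;
        load[best] += jobs[i].cost;
    }

    int makespan = 0;
    for (int s = 0; s < NUM_SPUS; ++s)
        if (load[s] > makespan)
            makespan = load[s];
    return makespan;
}
```

Because the schedule is computed once from known costs, there is no runtime scheduler to debug, which is in the spirit of the argument above.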

Mike Acton
28-Jun-2007, 05:09
I think Mike Acton is basically saying SPURS-type systems are still very immature and create more problems than they solve.

Yes, I am saying that these kinds of solutions cause more problems than they solve. The problem with them is that they disguise something which is very simple (load code onto SPU, DMA data around, repeat) with something which is both more complex and more prone to failure. And that is nearly always because "general" solutions like this (and malloc, etc.) try to solve every problem developers could conceivably have, all at the same time. How could that not be more complicated than just solving the problem you need to solve?

Similarly, there's their position on "Do we need a new language?", to which his answer is basically NO. It's better at this point to use the languages we already have and understand very well.

Right. As far as I'm concerned it's a completely academic discussion, which I'll let the academics work out. We need to make a game right now and anything that's not available to us right at this moment is off the table. If and when something is there, we can talk about it.

On top of that, for what we're doing at least, the language is simply not a bottleneck. A new language isn't going to make us think faster. A new language isn't going to help us design our data faster. At best we'd be able to type stuff in a little faster, but it's never once been the case where some feature couldn't get in the game just because we couldn't type it in fast enough.


I think the main takeaway is that you can't just "hide" the main issues with extracting performance out of Cell by using some clever library like SPURS, some fancy new programming language, or by sprinkling pixie dust over your code. There's no way out of getting your programmers to really understand the issues. And the best solution at this point seems to be "better programmers" rather than "better tools". So they have been focusing on a training program and knowledge sharing to improve their programming team.

I don't think it's ever been about "better tools", regardless of what anyone wants to believe. The problem comes in when people forget that tools are there to help us do our job better or faster, not to do the job for us. Our team is responsible for the product that gets into the hands of the players. We're never, ever going to tell them that we couldn't do something "because the tools wouldn't allow it" or because "it was too hard." Those are the poorest of excuses.

As programmers, we're not even close to perfect. But we're working hard to do the best we can with what we have and hopefully that shows in the end.

archangelmorph
28-Jun-2007, 09:55
I don't think it's ever been about "better tools", regardless of what anyone wants to believe. The problem comes in when people forget that tools are there to help us do our job better or faster, not to do the job for us. Our team is responsible for the product that gets into the hands of the players. We're never, ever going to tell them that we couldn't do something "because the tools wouldn't allow it" or because "it was too hard." Those are the poorest of excuses.

Amen to that!

I wouldn't be anywhere near where I am today career-wise (or even throughout my academic training) if I had adopted that mentality.

Crossbar
29-Jun-2007, 07:55
... It's akin to my experience with good assembly coders: Good assembly coders usually know how to write high-level code that's fast. That's because they fundamentally understand what the compiler is doing and what trade-offs it's making and they are using it as a tool to speed their development - not a tool to replace their brain.
Thanks Mike, everything you write makes perfect sense to anyone with experience developing performance-critical software.

I think every programmer involved in such development should have a basic understanding of their target system's assembly language, so they can evaluate the compiler's assembly output. There are so many ways you can help the compiler: by resolving dependencies at a high level with proper declarations of access functions and such, or by organizing your code and data so as to enable efficient prefetch loads into registers, loop unrolling, etc.

The compilers are often pretty clever, but if you don't pay attention you can screw things up pretty badly by making a function call in the wrong place or something similar. After all, the compiler cannot be smarter than the programmer's input allows it to be.
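The point about helping the compiler through declarations can be illustrated with the C99 `restrict` qualifier. In this minimal sketch (function names hypothetical), the first version leaves the compiler guessing about aliasing; the second promises that the buffers do not overlap, reads the scale factor once into a local, and unrolls by hand. Whether this actually speeds anything up depends on the compiler and target; inspecting the generated assembly, as suggested above, is the way to find out.

```c
#include <assert.h>
#include <stddef.h>

/* Without restrict, the compiler must assume a store to dst[i] might
   modify *scale or src, so it reloads *scale every iteration and has
   less freedom to reorder, unroll or vectorize the loop. */
void scale_may_alias(float *dst, const float *src, const float *scale,
                     size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * *scale;
}

/* With restrict, the programmer promises dst and src do not overlap.
   The scale factor is passed by value, and the loop is unrolled by four
   so the scheduler sees independent multiplies it can interleave. */
void scale_restrict(float *restrict dst, const float *restrict src,
                    float scale, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = src[i + 0] * scale;
        dst[i + 1] = src[i + 1] * scale;
        dst[i + 2] = src[i + 2] * scale;
        dst[i + 3] = src[i + 3] * scale;
    }
    for (; i < n; ++i)          /* leftover elements */
        dst[i] = src[i] * scale;
}
```

Note that `restrict` is a promise, not a check: calling scale_restrict with overlapping buffers is undefined behaviour.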

Fdooch
29-Jun-2007, 14:40
Hello Mike!
You mentioned that some problems with devkits increase your iteration time.
Did you mean write code - compile - run (debug) iterations?
Do I understand correctly that you started using PS3 with Linux installed?

Mike Acton
29-Jun-2007, 20:36
Do I understand correctly that you started using PS3 with Linux installed?

Yes. Not exclusively obviously, since there isn't access to the Sony SDK nor the RSX under Linux. But for SPU systems work, it's an ideal development environment.

We can compile, build and test quickly and easily under Linux without going through the extra steps needed with our development kits. A few of us have been doing this with good success, and I think we'll start to see some more use made of Linux over time. I'm personally more of a Unix guy anyway, so it's great for me.

I would like to make something clear, though: the PS3/Linux kits are great for the development process of creating SPU systems, but I would not want to develop a game that runs on Linux. The OS would simply be too slow and get too much in the way. We much prefer the very thin OS that's running on the PS3; ideally, there's as little as possible between us and the hardware.

Mike.

Shifty Geezer
29-Jun-2007, 20:49
While you're here, is anyone looking into physics via rods, as demoed in the Rigs of Rods (http://rigsofrods.blogspot.com/) program that impressed Brian Beckman so much? This seems like a physics model ideally suited to SPEs.

Titanio
29-Jun-2007, 21:21
Interesting. Looks like spring-and-mass systems, but using very stiff springs, and building what would traditionally be rigid bodies out of them. They are indeed very simple to implement. One of my first larger programming projects was a simple spring-and-mass simulation for cloth and soft bodies.

And yeah, for the same reason that SPEs are great with cloth and particle systems, they would be good for this kind of stuff. I guess it's simulated the same way you'd simulate soft bodies, just applied to harder bodies too. I found stability was always a problem in my simulations if I tried to make the springs very stiff, but evidently there are ways around that.

Fdooch
29-Jun-2007, 22:15
...
Thanks for the explanation Mike!

DegustatoR
29-Jun-2007, 22:45
We can compile, build and test quickly and easily under Linux without going through the extra steps needed with our development kits.
Can you tell us what are these extra steps or is this NDAed?

Mike Acton
30-Jun-2007, 04:18
Can you tell us what are these extra steps or is this NDAed?

There's nothing to tell really. We have special development kits which have features and tools not found in the retail PS3s. We use special tools to communicate with the devkits, and to debug and profile on them. Having those extra features implies a bit of extra complexity in the system. The details of those things are of course covered by NDA, but it's not really dissimilar from any other embedded development platform, or even that much different from PS2 development.

Sometimes we just don't need all the extra stuff and something simple like just writing and running the code natively on the retail box is a win.

Mike.