John Carmack on PS3 Video

DiGuru said:
Simple objects within small boundaries, with only a single goal at any one time and only a rough idea about their surroundings, seem to create much better models. "Look at mobs, don't try to be a puppet player" seems to be the lesson in this.

Boids are the perfect example: extremely simple, yet they yield complex flocking behavior.
 
DemoCoder said:
I don't see why agents need access to complete information about the world, or even perfect information.

It seems that an agent needs access to the following information:

1) detailed and up-to-date information about their immediate vicinity
2) rough/semi-accurate but possibly stale information about areas not in view or in the vicinity (e.g. not necessarily updated every tick/frame, and not necessarily detailed down to the mesh level or even a fine-grained waypoint graph)

The AI will have to burn more cycles on planning than traditional "omniscient" AIs, and it can be wrong and surprised from time to time, but on the other hand, it will behave more realistically.

Then the question becomes: how easy is it to gather #1, and how expensive is it to run the AI planning?
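As a rough illustration of that two-tier split, here's a minimal C++ sketch (all struct and field names are made up for this example, not from any real engine): tier 1 is fresh and detailed, tier 2 is coarse and allowed to go stale, and the planner can judge staleness from the update tick.

```cpp
// Sketch of the two-tier view described above: fresh, detailed data for
// the immediate vicinity, plus a coarse snapshot of the rest of the
// world that may be several ticks old.
#include <cstdint>
#include <vector>

struct NearbyEntity {      // tier 1: detailed, refreshed every tick
    int   id;
    float x, y, z;
    float vx, vy, vz;
};

struct RegionSummary {     // tier 2: coarse, refreshed only occasionally
    int      regionId;
    int      hostileCount;    // approximate
    uint32_t lastUpdateTick;  // lets the planner judge staleness
};

struct AgentPerception {
    std::vector<NearbyEntity>  vicinity;     // accurate, small
    std::vector<RegionSummary> worldCoarse;  // stale, cheap
};

// The planner can decide how much to trust tier-2 data from its age.
bool regionDataIsStale(const RegionSummary& r, uint32_t nowTick,
                       uint32_t maxAge) {
    return nowTick - r.lastUpdateTick > maxAge;
}
```

The point of the `lastUpdateTick` field is that being "wrong and surprised from time to time" becomes a measurable property: a planner can weight tier-2 information by how old it is.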

I think we can safely eliminate the scripting issue by designing an "AI shader" language coupled with an SPU compiler, rather than relying on p-code interpretation. Either that, or just create a bunch of high-level C++ classes/templates which can be "scripted" by composition.
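A hypothetical sketch of what "scripted by composition" could look like: behavior is built from small C++ objects combined at build time instead of p-code run through an interpreter. All class names here are illustrative, not from any real engine.

```cpp
// The "script" is just the order in which behavior objects are composed
// into a priority selector; no interpreter is involved.
#include <cstring>
#include <memory>
#include <vector>

struct Agent {
    float       health = 100.f;
    bool        enemyVisible = false;
    const char* action = "none";
};

struct Behavior {
    virtual ~Behavior() = default;
    virtual bool tick(Agent& a) = 0;  // true if this behavior fired
};

struct Flee : Behavior {
    bool tick(Agent& a) override {
        if (a.health >= 25.f) return false;
        a.action = "flee";
        return true;
    }
};
struct Attack : Behavior {
    bool tick(Agent& a) override {
        if (!a.enemyVisible) return false;
        a.action = "attack";
        return true;
    }
};
struct Idle : Behavior {
    bool tick(Agent& a) override { a.action = "idle"; return true; }
};

// Runs children in priority order; the first one that fires wins.
struct Selector : Behavior {
    std::vector<std::unique_ptr<Behavior>> children;
    bool tick(Agent& a) override {
        for (auto& c : children)
            if (c->tick(a)) return true;
        return false;
    }
};

// Small helper for checking which behavior fired.
bool actionIs(const Agent& a, const char* s) {
    return std::strcmp(a.action, s) == 0;
}
```

A designer "scripts" an agent by choosing which behaviors to push into the selector and in what order; since everything is plain compiled C++, it would run on an SPU without any interpreter in local store.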

Sure, it is tougher than PC coding, but I am not convinced that the problem is intractable from a parallelization point of view. It seems hard from a data-management point of view to optimize data structures and get the right data into the SPUs at the right time. But this doesn't mean the core problem isn't embarrassingly parallel, just that CELL makes it a little difficult to code.

Even if we consider hundreds or thousands of agents, the AI doesn't need to know about the state of each and every one of them. Just think of a human being at a football stadium. There are tens of thousands of people. The only detailed information real people utilize is the state of the people immediately around them (such as the guy who is blocking your view, or talking on his cell phone). The rest of the audience is perceived as a big blob with only its macroscopic properties noted. Your attention is only called to someone far away when something surprising happens to make them stand out from the macroscopic mass.

In game AI, with a huge army for example, it seems to me that the AI only needs to care about the enemies immediately around each agent, as well as note the large-scale macroscopic properties of the actors far away, except for those that call your attention, such as someone far away hitting you with a ranged weapon, or a target that you need to get to.


Yes, agreed. However, it isn't necessarily clear which agents I need to know about to compute a given agent's behavior.

On an SPU-type model you have basically two solutions to this: the agent makes requests for data in a previous frame and stores local copies of the data it cares about, or you make multiple DMA requests for agents that can affect you.

The first is an extension of the perception/knowledge phase, but this is usually not cleanly decoupled from the decision making phase.

The second really collapses perception into decision making, and the multiple DMA requests are suboptimal.

Either one is non-trivial and significantly more work than, say, using an SPU to do some cloth simulation.

I'm not arguing you can't do these types of work on an SPU, just that it's a lot of work, and SPUs will likely be used for trivially parallel tasks.
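The first of those two options can be sketched in a few lines of plain C++ (illustrative names, no real DMA): requests are queued one frame, and the next frame's fill step gathers local copies that the decision phase then runs against.

```cpp
// Sketch of the "request in a previous frame" pattern: the agent keeps
// local, one-frame-stale copies of only the entities it asked about.
#include <vector>

struct EntityState { int id; float x, y; };

struct AgentTask {
    std::vector<int>         wanted;    // ids requested last frame
    std::vector<EntityState> snapshot;  // local copies, one frame stale
};

// On a real SPU this fill would be the DMA transfer into local store,
// servicing the requests queued on the previous frame.
void fillSnapshot(AgentTask& t, const std::vector<EntityState>& world) {
    t.snapshot.clear();
    for (int id : t.wanted)
        for (const EntityState& e : world)
            if (e.id == id) t.snapshot.push_back(e);
}
```

The awkward part ERP points at is visible even in this toy: `wanted` has to be decided before the agent knows what this frame's decision will actually need, which is exactly why perception and decision making are hard to decouple cleanly.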
 
ERP said:
It starts to become hard to parallelize when old data is not sufficient.
Why would it be necessary? Humans don't react that fast ...
 
MfA said:
Why would it be necessary? Humans don't react that fast ...

You have to stop thinking in terms of humans...
Say, for example, I have a requirement that only one thing in a group can do X. I can't just poll the group and choose to do X, because other things in the group could choose X at the same time.

This case is trivially solved by negotiating as you would over a network. In fact, a lot of parallel issues start looking like network problems, simply because they are network problems once the memory pools are disparate.
 
ERP said:
Yes, agreed. However, it isn't necessarily clear which agents I need to know about to compute a given agent's behavior.

On an SPU-type model you have basically two solutions to this: the agent makes requests for data in a previous frame and stores local copies of the data it cares about, or you make multiple DMA requests for agents that can affect you.

The first is an extension of the perception/knowledge phase, but this is usually not cleanly decoupled from the decision making phase.

The second really collapses perception into decision making, and the multiple DMA requests are suboptimal.

Either one is non-trivial and significantly more work than, say, using an SPU to do some cloth simulation.

I'm not arguing you can't do these types of work on an SPU, just that it's a lot of work, and SPUs will likely be used for trivially parallel tasks.
Why would you want to do that? Turn it around: when the bullet hits them, they become interested. They have no need to know that the sniper exists before that bullet hits them or someone in their direct view.

Just add the processing of the event to the end of the list.
 
Sounds to me like a variation on the Dining Philosophers problem, and there are solutions to DP that do not require negotiation. There are also versions of distributed deadlock-free locks that do not require "negotiating" (e.g. voting) or communication with any nodes other than an immediately adjacent one.

If I wanted to impose mutual exclusion on a group based on a state variable, e.g. that only one of them could go into state X at any given time, I don't think I would choose an algorithm that requires network negotiation.

In the worst case, you could allow more than one to go into state X optimistically. Then, on your second pass (say, a render pass done by a separate thread that consolidates all the actors being updated by various SPUs), you "roll back" any actors who violate preconditions to some pre-defined collision state. As long as collisions are relatively rare, it's a win.
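A minimal sketch of that optimistic scheme (names made up for the example): every actor claims freely during the parallel update, and a later consolidation pass keeps one winner and rolls the rest back to a collision state.

```cpp
// Optimistic claim + second-pass rollback: no locks during the update;
// conflicts are repaired afterwards by the consolidating thread.
#include <vector>

struct Actor {
    int  id;
    bool claimedChair = false;  // set optimistically during the update
    bool rolledBack   = false;  // set by the consolidation pass
};

void consolidate(std::vector<Actor>& actors) {
    bool taken = false;
    for (Actor& a : actors) {
        if (!a.claimedChair) continue;
        if (!taken) taken = true;  // first claimant keeps the chair
        else {                     // everyone else is rolled back
            a.claimedChair = false;
            a.rolledBack   = true;
        }
    }
}
```

The cost model is exactly as stated above: the rollback pass is cheap when conflicts are rare, and nothing in the hot parallel path ever touches a shared lock.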

I don't think anyone is claiming that there aren't some problems for which global coordination (or semi-global) would be necessary. I just don't think this negates the advantages that a highly parallel chip has. Unless your workload is dominated by such paradigms, I'd rather have the more parallel chip than the SMP or single threaded chip.
 
DemoCoder said:
I don't think anyone is claiming that there aren't some problems for which global coordination (or semi-global) would be necessary. I just don't think this negates the advantages that a highly parallel chip has. Unless your workload is dominated by such paradigms, I'd rather have the more parallel chip than the SMP or single threaded chip.
Agreed.
 
BTW, lightweight threading inside an SPU makes sense. If one is waiting on a DMA request, one would like to continue working until the request has been satisfied. For example, if I am processing an actor and need to request something from external memory, I'd make the request and then move on to the next actor or piece of processing. As requests come back from main memory, I'd place them in appropriate queues and wake up the threads waiting for them. As a developer, you then decide how much local storage you want to dedicate to context saving (typically registers and/or stack) and how to partition local storage into several streams that can be populated separately and worked on.
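A toy model of that idea (plain C++, no real DMA): a handful of actor contexts live in local store, and when one would stall on a fetch, the scheduler simply moves on to the next context. On a real SPU the "wait" would be an outstanding MFC DMA tag rather than a countdown.

```cpp
// Round-robin over lightweight contexts: issue a fetch, switch away,
// come back when the data has "arrived". The counter stands in for
// DMA latency purely for illustration.
#include <array>
#include <cstddef>

enum class State { NeedFetch, Waiting, Ready, Done };

struct Context { State state = State::NeedFetch; int wait = 0; };

template <std::size_t N>
int runAll(std::array<Context, N>& ctxs, int fetchLatency) {
    int switches = 0;
    bool anyLeft = true;
    while (anyLeft) {
        anyLeft = false;
        for (Context& c : ctxs) {       // round-robin scheduler
            switch (c.state) {
            case State::NeedFetch:      // issue the fetch, don't stall
                c.wait  = fetchLatency;
                c.state = State::Waiting;
                ++switches; anyLeft = true; break;
            case State::Waiting:        // fetch still in flight
                if (--c.wait <= 0) c.state = State::Ready;
                anyLeft = true; break;
            case State::Ready:          // data arrived: do the work
                c.state = State::Done;
                ++switches; anyLeft = true; break;
            case State::Done: break;
            }
        }
    }
    return switches;  // two per context: one issue, one compute
}
```

The developer-tunable trade-off described above shows up here as `N`: more contexts hide more latency, but each one costs local-store space for its saved state.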
 
DemoCoder said:
BTW, lightweight threading inside an SPU makes sense. If one is waiting on a DMA request, one would like to continue working until the request has been satisfied. For example, if I am processing an actor and need to request something from external memory, I'd make the request and then move on to the next actor or piece of processing. As requests come back from main memory, I'd place them in appropriate queues and wake up the threads waiting for them. As a developer, you then decide how much local storage you want to dedicate to context saving (typically registers and/or stack) and how to partition local storage into several streams that can be populated separately and worked on.
And you can even use a solution where you split the local storage into multiple independent stores, and store the least-used one to global memory if you need the space.
 
DiGuru said:
Why would you want to do that? Turn it around: when the bullet hits them, they become interested. They have no need to know that the sniper exists before that bullet hits them or someone in their direct view.

Just add the processing of the event to the end of the list.

It's an example and not a particularly unusual one.
 
DiGuru said:
But it would depend on your model, right?

Define "model".

OK, let me give you a variation on the problem: I have a chair in a room, and actors can choose to sit in it.

Actors either have to poll the chair to see if it becomes occupied on the way to it, or at some point they have to "claim" the chair. When you would claim it is somewhat dependent on the behavior you want to model.

For "chair" you can read any limited resource in the world; these race conditions are why you can dupe in online games.

As I've said, you can build something that will work on SPUs, but it's a lot of work, and you rule out time savers like scripting languages because they are not very distributed-memory friendly.

In the short term I see SPUs being used for things that can be trivially moved there; necessity might later force other tasks there. Right now I would consider anything that can be put on an SPU a win (even if it runs MUCH slower there), because I strongly believe that games will be largely PPU-bound.
 
The example you give, however, is local. Let's change it to a parking lot: a limited number of parking spaces, hundreds of drivers looking for spots. At any given time, each driver can only look at a few parking spots, and they make erroneous conclusions as to whether spots are empty, full, or in the process of being filled. This leads many drivers to conclude a spot is empty and plan to fill it, right up until the final point, when they realize someone else is filling it. Then they back off.

Both of these scenarios work well on SPUs without contention or global locks, except in the rare circumstance when two actors reach the same point at the same time. But even in the worst case, the lock is local, not global. I don't see any need for SPUs to lock global shared data, and even if they do, it should be rare, and done with a protocol that does not require negotiation. (Distributed locks with this property exist and are absolutely deadlock-free.)

For RPG duping, one can simply use double-entry accounting plus journaling. Each actor has a balance sheet. If Actor A gives Actor B a sword, Actor A is first debited one sword, and Actor B is then credited one sword (they both have a transaction log). With a two-phase transaction, this can't be duped. You can always detect a dupe on reset by recomputing the inventory from the balance sheet. The fallacy of most MMORPGs with duping is the failure to follow basic accounting techniques long used in the banking and database industries: anything that crashes a transaction in progress, or escapes it, can leave it in an invalid state.

(Transaction logs are also lock-free, highly parallelizable, and streamable to boot.)
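A minimal sketch of that scheme (illustrative names): each transfer writes two journal entries, a debit and a credit, so every balance is recomputable from the journal alone and the sum over any transfer is always zero. A dupe then shows up as a mismatch on audit.

```cpp
// Double-entry journal for item transfers: append-only, auditable, and
// trivially replayable to recompute any actor's inventory.
#include <string>
#include <vector>

struct Entry { int actor; std::string item; int delta; };

struct Ledger {
    std::vector<Entry> journal;

    void transfer(int from, int to, const std::string& item) {
        journal.push_back({from, item, -1});  // debit the giver
        journal.push_back({to,   item, +1});  // credit the receiver
    }

    // Net change for one actor, recomputed from the journal on demand.
    int balance(int actor, const std::string& item) const {
        int n = 0;
        for (const Entry& e : journal)
            if (e.actor == actor && e.item == item) n += e.delta;
        return n;
    }
};
```

Because the journal is append-only, it also has the properties claimed above: writers never modify existing entries, so it streams well and needs no locks beyond ordering the appends.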
 
DemoCoder said:
For RPG duping, one can simply use double-entry accounting plus journaling. Each actor has a balance sheet. If Actor A gives Actor B a sword, Actor A is first debited one sword, and Actor B is then credited one sword (they both have a transaction log). With a two-phase transaction, this can't be duped. You can always detect a dupe on reset by recomputing the inventory from the balance sheet. The fallacy of most MMORPGs with duping is the failure to follow basic accounting techniques long used in the banking and database industries: anything that crashes a transaction in progress, or escapes it, can leave it in an invalid state.

(Transaction logs are also lock-free, highly parallelizable, and streamable to boot.)
Combine that with a single function that is allowed to change those values (i.e. make it into a property with get and set methods), and it's pretty watertight.
 
ERP said:
Actors have to either poll the chair to see if it becomes occupied on the way to it, or at some point they have to "claim" the chair. When you would claim it is somewhat dependant on the behavior you want to model.
Timestep simulation would produce the correct results: all the actors who happen to notice that the chair became empty in the previous timestep would start claiming it by moving towards it. Next timestep, the chair's state has a to-be-claimed-by list with multiple actors, and appropriate action can be taken by the actors based on that knowledge (fight!). In the end only one can sit in the chair; collision detection will take care of that.

Hell, that's way more realistic than just picking one.
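A sketch of that timestep scheme (names made up for the example): claims are only gathered during a tick, and resolution happens between ticks from the claim list, so no cross-agent locking is needed while agents run in parallel. The lowest-id tie-break here simply stands in for "whoever wins the scuffle".

```cpp
// Gather claims during the tick, resolve between ticks: the claim list
// replaces any need for a lock while actors are being updated.
#include <algorithm>
#include <vector>

struct Chair {
    int occupant = -1;               // -1 == empty
    std::vector<int> toBeClaimedBy;  // appended to during the tick
};

void claim(Chair& c, int actorId) {  // any actor may append
    c.toBeClaimedBy.push_back(actorId);
}

void resolve(Chair& c) {             // run once, between ticks
    if (c.occupant == -1 && !c.toBeClaimedBy.empty())
        c.occupant = *std::min_element(c.toBeClaimedBy.begin(),
                                       c.toBeClaimedBy.end());
    c.toBeClaimedBy.clear();
}
```

Note that next tick the losing actors can see both the occupant and who else was on the claim list, which is exactly the knowledge they'd need to pick a fight, or a different chair.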
 
To make the problem more realistic, what happens if the actors decide their next moves by observing other actors? E.g., if actor A realizes after a few steps that it can't compete with B for chair X, it decides to go for chair Y instead. A greedy, localized algorithm may lead to a locally optimal solution (which may or may not be acceptable). What happens if, at the same time, actor Z is user-controlled (and does not obey the "localized" algorithm)?

The general problem seems to require different parallelizing approaches for different situations. Does that mean we have to (re)load the SPUs with different APUlets for every "mini-problem"? (Just asking people in the know.)

Depending on the problem size, the most general approach seems to be the job queue (with multiple SPU workers). But in some cases, allocating SPUs to predefined data partitions and stream processing gives better results.
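The job-queue shape can be sketched like this, with `std::thread`s standing in for SPU workers and an atomic index standing in for whatever mechanism hands out the next job (the payloads here are just ints to sum; everything is illustrative):

```cpp
// One shared queue of jobs; each worker grabs the next index until the
// queue is drained. Work distribution is automatic and load-balanced.
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

int runJobs(const std::vector<int>& jobs, int workers) {
    std::atomic<std::size_t> next{0};
    std::atomic<int> total{0};
    auto worker = [&] {
        for (;;) {
            std::size_t i = next.fetch_add(1);  // claim the next job
            if (i >= jobs.size()) break;        // queue drained
            total += jobs[i];                   // the "work" for job i
        }
    };
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return total;
}
```

The appeal of this shape is that the answer is the same regardless of worker count, which is why it suits heterogeneous "mini-problems": the workers don't need to know how the jobs were partitioned.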

The "small" local memory problem may cause issues, but they can (mostly) be solved on a case-by-case basis. However, if there are many such instances that each require a different solution, it becomes frustrating to hide them.

Is this the general problem that makes it hard to parallelize Cell programs (other than the regular OOE stuff)?

As for using two-phase commit, how's the performance these days? I tried so-called two-phase commits long ago, and the performance was not good at peak load. Someone may know better.
 
I thought this interview with an NV rep was interesting and semi-related to this thread, in that it shows Carmack's technology knowledge and foresight are still respected in the industry.

TG Daily: Do you ever have a developer come to you and say, "We have a concept which we believe will require this much graphical throughput..."

DP: John Carmack from id [Software] does it all the time.

TG Daily: And you give him an idea of when that future window is going to be?

DP: We have a technical board - John Carmack's on it, the guys from Epic are on it, and a couple of guys from EA. They tell us what they're shooting for. That snapshot gives you a piece of the market. Based on that, we can help them, and they help us.
 
I don't think the guy is disrespected in the industry for his views; he knows his stuff about 3D engines. I just don't think his views are the one and only truth. He's not a code god, just another talented guy who revolutionized 3D gaming with his first Doom game, and after that had a steady rate of very well-optimized engines. His D3 engine is certainly not the only brilliant piece on the market; actually, I think the work of Crytek, Epic and Valve is a bit more evolved. I still think Far Cry was a better game and engine than Doom 3, even though FC came out a full year earlier.

I do think JC is a bit lazy in terms of finding new ways to solve problems he's already solved on a platform he's used to. He didn't feel coding for the PS2 was worth the effort because of the complexity of the VU engines and the EE, together with the lack of a GPU; he couldn't get the performance he was looking for out of that thing for his games. He also didn't bother with the GC; maybe he didn't like the IBM CPU in it. The Xbox, with its standard Intel CPU and GF3, did hold his interest for porting Doom 3, although the main part of that job was done by a separate company.

We've been hearing a lot lately from PS3 devs that support is a huge step forward on the PS3 compared to the PS2; it seems to be a lot easier. Lots of middleware is available now, or in the works for release in July/August. We've also been hearing from several devs at E3 that they have multiple SPUs up and running successfully in their early builds. Resistance, HS, Motorstorm and Mercenaries 2 come to mind first, but I don't think any PS3 game shown at E3 was using just the PPC core without SPUs.

On the other hand we have Epic, also regarded as being on top of 3D stuff with their talent and experience, running only one core with Gears, and struggling for a year already with their multi-core renderer, although they seem close to finishing it. I mean, if Epic has a problem getting this up and running, what are the odds that lots of other 360 devs do? I say high. Most of the 360 launch games only used one core too, from what I heard.

Stepping back and looking at these tiny bits of info, one would say either: a) PS3 coding isn't that hard after all, since engines are up and running successfully at this early stage; b) 360 multi-core coding is harder than some might think; or c) maybe there's just more talent, motivation and effort in some PS3 teams than Carmack has, and instead of complaining they may be working hard to solve problems and focus on the fresh opportunities a new architecture brings.
 