John Carmack on PS3 Video

Fox5 said:
Hmm....you know, I've been wondering something. I don't at all understand the inner workings of CPUs, but it seems to me that a design like Cell would offer a developer more control over the processor's abilities, whereas a design like the X360 CPU, which is more of a traditional PC design, offers less control and unfortunately doesn't share the OOE of PC CPUs to automatically make the best use of its resources. It seems to me that if the two designs are approached from a standard PC viewpoint, Cell would be bad and Xenon would be less bad, whereas from a more hands-on approach, Cell would be good and Xenon not so good.

Can anyone satisfy my suppositions, or crush them bitterly into the ground?
I agree.
 
hey69 said:
but he said the ps3 has more peak performance ..

and how is 3 cores symmetrical, actually?

the nuclear logo is symmetrical :)
[image: radiation warning symbol]

Sure, most of the time we think of SMP as 2, 4 or 8 processors, but Google tells you that you could have 3-way SMP on Alpha and other Unix servers.


A big weakness of Cell is that the PPE's FPU is very weak, so you can't get away with lots of FPU usage in your general-purpose code. The X360 CPU, while in-order, has the powerful VMX128 units. This is an incentive to move as much code as possible to the SPEs, but then I expect first-gen PS3 titles to have bad framerate slowdowns.
 
Fox5 said:
Hmm....you know, I've been wondering something. I don't at all understand the inner workings of CPUs, but it seems to me that a design like Cell would offer a developer more control over the processor's abilities, whereas a design like the X360 CPU, which is more of a traditional PC design, offers less control and unfortunately doesn't share the OOE of PC CPUs to automatically make the best use of its resources.

Sure, the lack of OOO makes it worse than an Athlon X2, for instance. But my understanding is that it would be efficient with lots of threads; I see it kind of like a smaller-scale Sun Niagara. But having six well-balanced threads would be as much of a nightmare as making equal use of all the SPEs, I guess.
 
ERP said:
I think there is a real desire to scale gameplay. I want to put 100 people in the streets of a city to populate it, instead of the 20 or so in GTA. I don't want them to disappear when I turn around. I'd like them to exhibit reasonable behavior in response to what's happening, and I'd rather they didn't run into each other all the time and look stupid. All this stuff will likely increase the load on the main game thread.
You can't get realistic group behaviour from timestep simulation? (Which is embarrassingly parallel as long as you have enough actors.)
 
Blazkowicz_ said:
<snip>
But having six well-balanced threads would be as much of a nightmare as making equal use of all the SPEs, I guess.

Right. Microsoft's attempt does seem half-baked.

They have six hardware contexts, but at the same time context switching is expensive, so you can't really go for scores (or even hundreds) of threads as you would on a more multithreaded CPU (like Niagara). So you end up with a pseudo-fixed upper limit on the number of threads, set not by software or hardware but by context-switching performance.

Of course, CELL is a lot worse in this regard.

Cheers
 
Gholbine said:
Which can be reduced to: "Cell means more work for me therefore it was the wrong decision."

Exactly... Console systems have always been harder to program for than PCs... why is he going out of his way to voice his opinion? I smell a rat.
 
Gubbi said:
Right. Microsoft's attempt does seem half-baked.

They have six hardware contexts, but at the same time context switching is expensive, so you can't really go for scores (or even hundreds) of threads as you would on a more multithreaded CPU (like Niagara). So you end up with a pseudo-fixed upper limit on the number of threads, set not by software or hardware but by context-switching performance.

Of course, CELL is a lot worse in this regard.

Cheers

What's the limit?

Exactly... Console systems have always been harder to program for than PCs... why is he going out of his way to voice his opinion? I smell a rat.

Not always. The PlayStation was notably easier to program for than the competition, the GameCube and Xbox notably easier than the PS2 or previous consoles, and the Xbox 360 easier than the PS3.
 
DiGuru said:
Simple breakdown: John Carmack isn't happy about changing his habits. While there is much to gain by doing it the PS3 way, it's a lot of hard work. And he was doing very well doing what he did, thank you very much. And the Xbox 360 is very much how he is used to things: Visual Studio and all the common classes and objects.

Cynical? Yes. True? Yes. The higher people are elevated, the less they like having to start over and prove themselves once again. When you're considered a demi-god, you're not eager to change your territory and run the risk that you get your ass kicked by the inhabitants.
I'm not going to comment on this with respect to John Carmack as this forum has had way too much drama in the past over this exact topic and I don't want to get into it again.

What I would like to say is that, in a general sense, there is a lot of wisdom in what you are saying here. Adaptability is one of the most important traits in this field. I'd like to add that it is also necessary to be able to defer to other people and trust their expertise. Personally, I think the days when a single person can be brilliant at every aspect of game development (and even smaller subsets like engine work) are numbered. With how fast technology and knowledge have been improving, it is going to be impossible for any single person to do it all.

This brings me to a couple of thoughts.

1) Well understood and documented public interfaces are extremely important. This is pretty much classic OO school of thought.

2) It's probably ok if the team lead isn't an expert at programming the hardware so long as he/she can make informed design decisions based on someone else's more intimate knowledge of the hardware.

3) Because of 2, it is probably OK if a CPU or part of a CPU requires a steep learning curve, if in the end code can be written for it that does not violate the first point.

Nite_Hawk
 
If the Xenon had 6 symmetrical cores (or whatever) and a higher theoretical peak performance than the Cell processor in the PS3, he would still prefer the symmetrical design. That's important. He's evaluating each company's approach to the 'problem' of multicore CPU programming and saying that a symmetrical design is easier for all developers.

Given the position of the Cell processor in Sony's long-term plans, and of Xenon in Microsoft's, it makes perfect sense that Xenon *should* be easier to program games for; that was one of its primary design requirements.

Cell, on the other hand, had to have a more scalable and flexible design for a myriad of functions, not just a games console.

It has nothing to do with laziness or changing his ways; it's about his preference for spending more time designing the game rather than designing around a CPU.

I don't even understand the debate here.
 
Gubbi said:
By not making the local stores coherent with main memory, they become part of the SPU context, making context switching hugely expensive.

Cheers

Only if you need to save and restore an entire SPU. But one could just as easily implement an N-M model, like on Solaris, where you have N real threads, and M lightweight threads. The lightweight threads run within the SPU and all share the same local store. You then partition the local store on thread creation depending on how much memory each thread may require.

Context switching then amounts to register save/restore.
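
To make the idea concrete, here is a minimal, purely illustrative sketch of that N:M scheme in plain C++ (hypothetical names, not the actual Cell SDK): the local store is partitioned once at thread-creation time, so switching between lightweight threads on the same SPU reduces to a register save/restore.

[code]
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLocalStoreSize = 256 * 1024;   // SPE local store
constexpr std::size_t kMaxLightThreads = 4;           // illustrative

struct LightThread {
    std::size_t ls_offset;   // start of this thread's slice of local store
    std::size_t ls_size;     // size of the slice (code + data + stack)
    uint32_t    regs[128];   // saved register image (illustrative size)
    bool        in_use;
};

struct SpuScheduler {
    std::size_t ls_watermark = 0;          // next free byte in local store
    LightThread threads[kMaxLightThreads] = {};
    int         current = -1;

    // Partition the local store at creation time; fail if it would overflow.
    int create(std::size_t bytes_needed) {
        if (ls_watermark + bytes_needed > kLocalStoreSize) return -1;
        for (int i = 0; i < (int)kMaxLightThreads; ++i) {
            if (!threads[i].in_use) {
                threads[i] = {ls_watermark, bytes_needed, {}, true};
                ls_watermark += bytes_needed;
                return i;
            }
        }
        return -1;
    }

    // Switching costs only a register save/restore, since each thread's
    // working set already lives in its own slice of local store.
    void switch_to(int next, uint32_t live_regs[128]) {
        if (current >= 0)
            std::memcpy(threads[current].regs, live_regs, sizeof(threads[current].regs));
        std::memcpy(live_regs, threads[next].regs, sizeof(threads[next].regs));
        current = next;
    }
};
[/code]

Of course, this only pays off if the combined working sets of the lightweight threads actually fit in the 256K, which is exactly the objection raised below.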
 
MfA said:
You can't get realistic group behaviour from timestep simulation? (Which is embarrasingly parallel as long as you have enough actors.)

The issue is more about interactions between agents, and about knowledge: what an agent can see and respond to, etc. Pixels in a pipeline don't impact adjacent pixels.

Knowledge gathering is hard to compartmentalize effectively, which makes the 256K local store constraining. It's often an area where scripting is employed, and that's not very SPU-friendly either. All this could be done on an SPU; it's just that there are a lot of other things that are easier to do.
 
DemoCoder said:
Only if you need to save and restore an entire SPU. But one could just as easily implement an N-M model, like on Solaris, where you have N real threads, and M lightweight threads. The lightweight threads run within the SPU and all share the same local store. You then partition the local store on thread creation depending on how much memory each thread may require.

Context switching then amounts to register save/restore.

Only if your average thread is using much less than the 256K for code and data.

Both platforms are more amenable to run-to-completion job models for medium-grained parallelism anyway.
 
ERP, so you do admit that the problem in and of itself is embarrassingly parallel? Just that Cell has problems?
 
Execution to completion == streaming. And you can solve all interdependencies (when using a fixed time step) the brute force way: just add them to the end of the list and reprocess them. There is plenty of hardware available to do that.

And it's technically not hard to change the object model to favor local storage, although it is hard with the current compilers and libraries.
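
A minimal sketch of that brute-force re-queueing idea, assuming for simplicity a single dependency per actor (hypothetical types, not from any particular engine):

[code]
#include <cstddef>
#include <deque>
#include <vector>

struct Agent {
    int  depends_on = -1;   // index of a neighbour whose new state we need; -1 = none
    bool stepped    = false;
};

// Brute-force fixed-timestep pass: anything we can't resolve yet goes back
// to the end of the queue and gets reprocessed later in the same tick.
void step_all(std::vector<Agent>& agents) {
    std::deque<int> work;
    for (int i = 0; i < (int)agents.size(); ++i) work.push_back(i);

    std::size_t spins = agents.size() * agents.size();  // crude cycle guard
    while (!work.empty() && spins--) {
        int i = work.front();
        work.pop_front();
        int dep = agents[i].depends_on;
        if (dep >= 0 && !agents[dep].stepped) {
            work.push_back(i);        // not ready yet: reprocess later
        } else {
            agents[i].stepped = true; // integrate this agent's state here
        }
    }
    // Anything still queued is part of a dependency cycle; fall back to old data.
    for (int i : work) agents[i].stepped = true;
}
[/code]

The guard at the end just keeps a circular dependency from spinning forever; in practice you would fall back to last tick's data for those actors.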
 
I don't see why agents need access to complete information about the world, or even perfect information.

It seems that an agent needs access to the following information:

1) detailed and up-to-date information about their immediate vicinity
2) rough/semi-accurate but possibly stale information about areas not in view or in the vicinity (e.g. not necessarily updated every tick/frame, and not necessarily detailed down to the mesh level or even a fine-grained waypoint graph)

The AI will have to burn more cycles on planning than traditional "omniscient" AIs, and it can be wrong and surprised from time to time, but on the other hand, it will behave more realistically.

Then the question becomes, how easy is it to gather #1 and how expensive is it to run the AI planning.

I think we can safely eliminate the scripting issue by designing an "AI shader" language coupled with an SPU compiler rather than relying on p-code interpretation. Either that, or just create a bunch of high-level C++ classes/templates which can be "scripted" by composition.

Sure, it is tougher than PC coding, but I am not convinced that the problem is intractable from a parallelization point of view. It seems hard from a data-management point of view to optimize data structures and get the right data into the SPUs at the right time. But this doesn't mean the core problem isn't embarrassingly parallel, just that CELL makes it a little difficult to code.

Even if we consider hundreds or thousands of agents, the AI doesn't need to know about the state of each and every one of them. Just think of a human being at a football stadium. There are tens of thousands of people. The only detailed information real people utilize is the state of the people immediately around them (such as the guy who is blocking your view, or talking on his cell phone). The rest of the audience is perceived as a big blob with only its macroscopic properties noted. Your attention is only called to someone far away when something surprising happens to make them stand out from the macroscopic mass.

In game AI, with a huge army for example, it seems to me that the AI only needs to care about the enemies immediately around each agent, as well as note the large-scale macroscopic properties of the actors far away, except for those that call your attention, such as when someone far away hits you with a ranged weapon, or you have a target that you need to get to.
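
As a rough illustration of that two-level knowledge model (all names are hypothetical, not taken from any real engine), the per-agent view could be a small, self-contained structure that is streamed into an SPU's local store along with the agent:

[code]
#include <cstddef>
#include <vector>

struct AgentState { float x, y; int faction; };

// Coarse, possibly-stale summary of a far-away region of the world.
struct RegionSummary {
    float centre_x, centre_y;
    int   approx_count;      // "big blob" statistics only
    int   dominant_faction;
    int   last_update_tick;  // allowed to lag the simulation
};

struct AgentKnowledge {
    std::vector<AgentState>    nearby;    // detailed, refreshed every tick
    std::vector<RegionSummary> far_away;  // macroscopic, refreshed occasionally
};

struct Target { bool is_region; int index; };  // index into nearby or far_away; -1 = none

// Planning only ever touches this small structure, never the whole world.
Target choose_target(const AgentKnowledge& k, int my_faction) {
    // Detailed, nearby enemies take priority.
    for (std::size_t i = 0; i < k.nearby.size(); ++i)
        if (k.nearby[i].faction != my_faction)
            return {false, (int)i};
    // Otherwise head for the largest hostile blob we know about, even if stale.
    int best = -1, best_count = 0;
    for (std::size_t i = 0; i < k.far_away.size(); ++i)
        if (k.far_away[i].dominant_faction != my_faction &&
            k.far_away[i].approx_count > best_count) {
            best = (int)i;
            best_count = k.far_away[i].approx_count;
        }
    return {true, best};
}
[/code]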
 
DemoCoder said:
I don't see why agents need access to complete information about the world, or even perfect information.

It seems that an agent needs access to the following information:

1) detailed and up-to-date information about their immediate vicinity
2) rough/semi-accurate but possibly stale information about areas not in view or in the vicinity (e.g. not necessarily updated every tick/frame, and not necessarily detailed down to the mesh level or even a fine-grained waypoint graph)

<snip>

Even if we consider hundreds or thousands of agents, the AI doesn't need to know about the state of each and every one of them. Just think of a human being at a football stadium. There are tens of thousands of people. The only detailed information real people utilize is the state of the people immediately around them (such as the guy who is blocking your view, or talking on his cell phone). The rest of the audience is perceived as a big blob with only its macroscopic properties noted. Your attention is only called to someone far away when something surprising happens to make them stand out from the macroscopic mass.

The challenge would be to build a data structure to quickly resolve which other agents are "in the vicinity" and which are "in the blob" for each individual agent (to "cull" as many agents as possible). Agents move around (not only thinking in 3D here, but in general graph terms), so you need a data structure that is easy to update. That seems like an awfully dynamic beast to fit into the block load/store model of the SPUs.

Cheers
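
One possible shape for such a structure, sketched here purely for illustration (names and sizes are invented): a flat uniform grid that is simply rebuilt every tick, which sidesteps the incremental-update problem and keeps each cell's agent list contiguous, so a 3x3 neighbourhood can be pulled into local store as a block.

[code]
#include <algorithm>
#include <vector>

struct Agent { float x, y; };

struct UniformGrid {
    static constexpr int   kCells    = 64;     // 64x64 grid (illustrative)
    static constexpr float kCellSize = 8.0f;   // world units per cell

    std::vector<int>              cell_of_agent;  // agent index -> cell index
    std::vector<std::vector<int>> cells;          // cell index -> agent indices

    // Rebuilding from scratch each tick avoids incremental updates entirely.
    void rebuild(const std::vector<Agent>& agents) {
        cells.assign(kCells * kCells, {});
        cell_of_agent.resize(agents.size());
        for (int i = 0; i < (int)agents.size(); ++i) {
            int cx = std::clamp((int)(agents[i].x / kCellSize), 0, kCells - 1);
            int cy = std::clamp((int)(agents[i].y / kCellSize), 0, kCells - 1);
            int c  = cy * kCells + cx;
            cell_of_agent[i] = c;
            cells[c].push_back(i);
        }
    }

    // "Vicinity" = the 3x3 block of cells around an agent; everything outside
    // it is only seen through per-cell summaries (the "blob").
    std::vector<int> vicinity(int agent) const {
        int c = cell_of_agent[agent];
        int cx = c % kCells, cy = c / kCells;
        std::vector<int> out;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                int nx = cx + dx, ny = cy + dy;
                if (nx < 0 || ny < 0 || nx >= kCells || ny >= kCells) continue;
                const std::vector<int>& bucket = cells[ny * kCells + nx];
                out.insert(out.end(), bucket.begin(), bucket.end());
            }
        return out;
    }
};
[/code]

Whether rebuilding every tick is affordable obviously depends on agent counts; the point is just that the "vicinity vs. blob" split can map onto contiguous blocks rather than onto a heavily mutated pointer structure.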
 
As for a nice example, some people built a model some years ago that only looks at the direct surroundings, with very simple goals in mind (like where to put the next step), and used it to simulate everything from people running to the fire exits or waiting at the elevators in large buildings, through how traffic behaves, to things like the hunting patterns of animals or combat in war.

And despite the simplicity of the model, it turned out to behave far more lifelike than just about any much more complex behavioural model, because people are pretty single-minded when you get down to it, and they don't have that omniscient view.

Simple objects within small boundaries, with only a single goal at any one time and only a rough idea of their surroundings, seem to create much better models. "Look at mobs, don't try to be the puppet player" seems to be the lesson in this.
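
For illustration only, a toy version of that kind of model might look like this (all constants invented; a real implementation would prune the neighbour loop with a spatial grid like the one sketched above). Each person has a single goal and only reacts to whoever is within arm's reach, and the new state is computed purely from the old state, so every agent can be processed in parallel.

[code]
#include <cmath>
#include <cstddef>
#include <vector>

struct Person { float x, y; float goal_x, goal_y; };

// One fixed timestep: walk toward your goal, push away from anyone too close.
void step(std::vector<Person>& crowd, float dt) {
    std::vector<float> vx(crowd.size()), vy(crowd.size());

    for (std::size_t i = 0; i < crowd.size(); ++i) {
        const Person& p = crowd[i];
        float dx = p.goal_x - p.x, dy = p.goal_y - p.y;
        float len = std::sqrt(dx * dx + dy * dy) + 1e-6f;
        vx[i] = dx / len;                        // desire: toward the exit
        vy[i] = dy / len;

        for (std::size_t j = 0; j < crowd.size(); ++j) {
            if (j == i) continue;
            float rx = p.x - crowd[j].x, ry = p.y - crowd[j].y;
            float d2 = rx * rx + ry * ry;
            if (d2 > 1.0f) continue;             // only the direct surroundings
            float d = std::sqrt(d2) + 1e-6f;
            vx[i] += rx / d * (1.0f - d);        // simple repulsion from neighbours
            vy[i] += ry / d * (1.0f - d);
        }
    }
    // Apply all movement after the read-only pass, so agents see old state only.
    for (std::size_t i = 0; i < crowd.size(); ++i) {
        crowd[i].x += vx[i] * dt;
        crowd[i].y += vy[i] * dt;
    }
}
[/code]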
 
Gubbi said:
The challenge would be to build a data structure to quickly resolve which other agents are "in the vicinity" and which are "in the blob" for each individual agent (to "cull" as many agents as possible). Agents move around (not only thinking in 3D here, but in general graph terms), so you need a data structure that is easy to update. That seems like an awfully dynamic beast to fit into the block load/store model of the SPUs.

Cheers
Determining locality will keep on being a problem, true. But that also means that once you can gather all the relevant data about the surrounding area, it becomes a localized set of functions which can be executed within those bounds. Great for streaming and local-storage solutions. There is no need for an AI to be omniscient. Lots of lost individuals who only react to their direct surroundings is much more realistic.
 
MfA said:
ERP, so you do admit that the problem in and of itself is embarrassingly parallel? Just that Cell has problems?


I'm saying it has a large degree of potential parallelism, just not at the level of pixels or verts.
Where it starts to become hard to parallelize is where old data is not sufficient. But even where old data is sufficient, you potentially have large data dependencies between actors.
 