Ars Technica part 2: Inside the Xbox 360

...though people at MS claimed that they focused the XeCPU more towards console code - branch prediction, AI, and physics code; what's all the hype for when the damn thing seems marginal at best...

"Rumors and some game developer comments (on the record and off the record) have Xenon's performance on branch-intensive game control, AI, and physics code as ranging from mediocre to downright bad."
 
Why does Hannibal think the X360 cpu core is based on the PPE when J. Allard said in that interview with Goto:

A: Simpler and advanced. Basically we adopted the same CPU core as PowerPC G5. It's based on PowerPC G5, but we removed unimportant features from it. For example instead of having L2 cache for each core we adopted L2 cache shared by 3 cores.
 
bbot said:
Why does Hannibal think the X360 cpu core is based on the PPE when he said in that interview with Goto:

A: Simpler and advanced. Basically we adopted the same CPU core as PowerPC G5. It's based on PowerPC G5, but we removed unimportant features from it. For example instead of having L2 cache for each core we adopted L2 cache shared by 3 cores.

Cause that just sounds totally retarded...?
 
shaderguy said:
Isn't game physics (polygon-soup collision detection and reaction) more of a database search-and-update problem? While there's some spatial coherency you can take advantage of, isn't most of the time in these algorithms spent searching data structures?

A physics engine consists primarily of three steps: collision detection, constraint solving, and integration. The final step is highly parallel: once you've calculated the interactions, applying time integration to every object in the game database is inherently parallel.
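Just to make that concrete (my own sketch, not anything from an actual physics SDK): time integration needs nothing more than a parallel loop. The Body struct and integrate function are hypothetical names, and I'm using modern C++ parallel algorithms purely for brevity.

Code:
#include <algorithm>
#include <execution>
#include <vector>

// Hypothetical rigid-body state -- the names are illustrative only.
struct Body {
    float pos[3];
    float vel[3];
    float acc[3];
};

// Once all interactions have been resolved, each body's update is
// independent, so the loop can be handed to a parallel algorithm as-is.
void integrate(std::vector<Body>& bodies, float dt) {
    std::for_each(std::execution::par, bodies.begin(), bodies.end(),
                  [dt](Body& b) {
                      for (int i = 0; i < 3; ++i) {
                          b.vel[i] += b.acc[i] * dt;   // semi-implicit Euler
                          b.pos[i] += b.vel[i] * dt;
                      }
                  });
}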

Constraint solving is as parallelizable as any matrix-solving approach you're used to, and since the collision detection phase hands back multiple groups of mutually non-contacting objects, each group's constraints can be solved in parallel. Collision detection inherently partitions the constraint-solving problem set: it finds multiple mutually non-interacting collision groups, each with its own disjoint set of constraints.
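A rough sketch of that island-style partitioning (again my own illustration - Constraint, Island, and solveIsland are made-up names, and a real engine would use its own job scheduler rather than std::async):

Code:
#include <future>
#include <vector>

// Hypothetical types: a Constraint couples two bodies, and an "island"
// is a group of constraints whose bodies touch no other group.
struct Constraint { int bodyA; int bodyB; /* contact data, ... */ };
using Island = std::vector<Constraint>;

// Placeholder for whatever iterative solver (e.g. sequential impulses)
// the engine runs over one island's constraints.
void solveIsland(Island& island, float dt) {
    (void)island; (void)dt;  // ... iterate over the island's constraints ...
}

// Because islands share no bodies, each one can be solved on its own
// thread with no locking between them.
void solveAllIslands(std::vector<Island>& islands, float dt) {
    std::vector<std::future<void>> jobs;
    jobs.reserve(islands.size());
    for (Island& island : islands)
        jobs.push_back(std::async(std::launch::async, solveIsland,
                                  std::ref(island), dt));
    for (auto& job : jobs)
        job.get();  // wait for every island before moving on to integration
}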

That leaves collision detection, which is the "search" problem. Sure, this problem is *less* easily parallelizable than the other two, but it is not impossible to parallelize. A smart physics engine is going to have some kind of scene database partitioning scheme anyway, as well as multiple collision detection phases (gross, midrange, and fine-detail).

Once you determine that a particular gross volume has no collisions with the rest of the database, you can proceed to run the deeper-level collision checks within that volume in a separate thread, while the rest of the database search proceeds.

But even in cases where you don't partition like this, database searches and updates can be parallelized; you just need efficient concurrency primitives.

Let's consider a simplified 1-dimensional example: I have an array of 1000 integers A[], and I wish to determine "collisions" between integers in this array. A collision is defined as abs(A[i] - A[j]) < epsilon. The output is C, a list of pairs of array positions that are in collision.

I can partition this problem into solving the collision problem on A[0..499], solving it on A[500..999], and finding collisions between numbers A[i] where i=0..499 and numbers A[j] where j=500..999. Thus, I can run all three searches concurrently. You can either surround access to the output list C with a critical section, use a lock-free concurrent data structure such as a concurrent queue (no blocking, no spin locks), or output three sublists and then merge them at the end.
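Here's that 1-D example sketched out, using the three-sublists-then-merge option. This is my own code with plain std::thread for clarity; collide and findCollisions are just names I picked, not anything from a real SDK.

Code:
#include <algorithm>
#include <cstdlib>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<int, int>>;

// Record every pair (i, j) with i in [loI, hiI), j in [loJ, hiJ), i < j,
// and abs(A[i] - A[j]) < epsilon. Each caller supplies its own output
// sublist, so no locking is needed while searching.
void collide(const std::vector<int>& A, int loI, int hiI, int loJ, int hiJ,
             int epsilon, Pairs& out) {
    for (int i = loI; i < hiI; ++i)
        for (int j = std::max(i + 1, loJ); j < hiJ; ++j)
            if (std::abs(A[i] - A[j]) < epsilon)
                out.emplace_back(i, j);
}

Pairs findCollisions(const std::vector<int>& A, int epsilon) {
    const int n = static_cast<int>(A.size());   // e.g. 1000
    const int mid = n / 2;
    Pairs lower, upper, cross;                  // three private sublists

    // Lower half vs. itself, upper half vs. itself, lower half vs. upper
    // half -- each search runs in its own thread.
    std::thread t1(collide, std::cref(A), 0, mid, 0, mid, epsilon, std::ref(lower));
    std::thread t2(collide, std::cref(A), mid, n, mid, n, epsilon, std::ref(upper));
    std::thread t3(collide, std::cref(A), 0, mid, mid, n, epsilon, std::ref(cross));
    t1.join(); t2.join(); t3.join();

    // Merge the three sublists into the final list C.
    lower.insert(lower.end(), upper.begin(), upper.end());
    lower.insert(lower.end(), cross.begin(), cross.end());
    return lower;
}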

I'm not saying it isn't a difficult implementation, but claiming that the problem can in no way be parallelized strikes me as extreme.

There is a reason these physics SDKs haven't been parallelized before: the target markets did not have SMP-capable CPUs, so there was little to gain from true concurrency. But now we have dual-core CPUs, the tri-core Xbox 360, and the octo-core PS3, which means a lot more work and research will be put into efficient parallel implementations.
 
archie4oz said:
bbot said:
Why does Hannibal think the X360 cpu core is based on the PPE when he said in that interview with Goto:

A: Simpler and advanced. Basically we adopted the same CPU core as PowerPC G5. It's based on PowerPC G5, but we removed unimportant features from it. For example instead of having L2 cache for each core we adopted L2 cache shared by 3 cores.

Cause that just sounds totally retarded...?


Go back and read it now.
 
aaaa0 said:
Major Nelson vindicated?
In fairy land maybe.
Compared to cramming the most processing-intensive physics components into VU0 (and I know I'm not the only one to have done that on PS2), using SPEs for the same job(s) is a walk in the park.

bbot said:
Go back and read it now.
I did, still sounds retarded :p
 
This doesn't add up - Hannibal's hang-up on the lack of branch prediction is so severe that he completely rules out the possibility of physics, AI, or gameplay code being run on the SPEs - something that I have seen debated and debated myself enough to know that he is likely coming to a false conclusion there. But all that aside, it was still a useful article in terms of exploring the architecture.

Hannibal can be hit or miss sometimes - after all, he himself admits to being the one primarily responsible for starting the theory that the XeCPU is derived from the PowerPC 970, and that at a time when so little information was available that such a certain statement seemed overly bold, even for him.
 
xbdestroya said:
This doesn't add up - Hannibal's hang-up on the lack of branch prediction is so severe that he completely rules out the possibility of physics, AI, or gameplay code being run on the SPEs - something that I have seen debated and debated myself enough to know that he is likely coming to a false conclusion there. But all that aside, it was still a useful article in terms of exploring the architecture.

I think Hannibal is just complaining that the performance of what he considers branchy memory-intensive code will be bad on Xenon/CELL's PPCs, and even worse on CELL's SPEs.

I don't think he means that it is impossible to run such code on an SPE if you really wanted to.
 
Fafalada said:
Compared to cramming the most processing-intensive physics components into VU0 (and I know I'm not the only one to have done that on PS2), using SPEs for the same job(s) is a walk in the park.

OK, so what Hannibal says is just plain wrong?

The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window.
 
aaaaa00 said:
I don't think he means that it is impossible to run such code on an SPE if you really wanted to.

Well, from reading this text of his:

The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window. (Don't bother suggesting that the PS3 can use its SPEs for branch-intensive code, because the SPEs lack branch prediction entirely.)

...one definitely gets the feeling that he thinks it more or less impossible/irrelevant. ;)
 
xbdestroya said:
aaaaa00 said:
I don't think he means that it is impossible to run such code on an SPE if you really wanted to.

Well, from reading this text of his:

[...]

...one definitely gets the feeling that he thinks it more or less impossible/irrelevant. ;)

I think that if you read the whole paragraph, Hannibal is claiming that CELL's SPEs won't be enough to make up for the 2 extra PPCs in Xenon, specifically running what he considers to be branchy memory-intensive code.

The whole paragraph is:

At any rate, Playstation 3 fanboys shouldn't get all flush over the idea that the Xenon will struggle on non-graphics code. However bad off Xenon will be in that department, the PS3's Cell will probably be worse. The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window. (Don't bother suggesting that the PS3 can use its SPEs for branch-intensive code, because the SPEs lack branch prediction entirely.) Furthermore, the PS3's L2 is only 512K, which is half the size of the Xenon's L2. So the PS3 doesn't get much help with branches in the cache department. In short, the PS3 may fare a bit worse than the Xenon on non-graphics code, but on the upside it will probably fare a bit better on graphics code because of the seven SPEs.

That says to me Hannibal thinks the SPEs won't be enough to catch up to the 2 extra PPCs, specifically for the type of code he cites.

Now whether he's right or not is a different matter, but I think that's what he's trying to say.
 
I think he overestimates the workload that is branch-bound. There aren't many games that are AI-bound or game-logic-bound. AI typically is a small fraction of the overall workload. And game logic is typically so low-tier that it is done in a scripting language. If my game logic is run in interpreted C, LISP, Basic, Python, or some other scripting language, that says a lot about how much a branch is affecting my performance (the indirection in the scripting code will impose orders-of-magnitude higher costs).
 
aaaaa00 said:
I think Hannibal is just complaining that the performance of what he considers branchy memory-intensive code will be bad on Xenon/CELL's PPCs, and even worse on CELL's SPEs.

I don't think he means that it is impossible to run such code on an SPE if you really wanted to.

To be fair to Hannibal, I think he does come back a couple of times to a conclusion that is relevant to the discussion: namely, that these limitations may be less of an issue for a closed-box design, but, as powerful as these chips are, it would be very difficult for them to perform at these levels in an open framework dominated by general-purpose needs (and not 3D), such as the PC and Mac.

At least that point stuck out to me.

Obviously he is making a lot of assumptions (e.g. he is not even certain how many load-store units the XeCPU has) and drawing conclusions based on the CELL PPE. But overall I think his discussion of the architecture and its strengths and weaknesses is a good contribution to the discussion.

I think his point about 1st-generation software will prove very true. I think it will be 2006 and 2007 before we start seeing games designed to take advantage of the special nuances of each system. Both have awesome GPUs and a lot of memory. With a closed box and no legacy demands (like DX7 or DX8), I think either system is capable of turning out a jaw-dropping game using a single PPE and the GPU.
 
DemoCoder said:
I think he overestimates the workload that is branch-bound. There aren't many games that are AI-bound or game-logic-bound. AI typically is a small fraction of the overall workload. And game logic is typically so low-tier that it is done in a scripting language. If my game logic is run in interpreted C, LISP, Basic, Python, or some other scripting language, that says a lot about how much a branch is affecting my performance (the indirection in the scripting code will impose orders-of-magnitude higher costs).

Has anyone actually sat down and profiled a bunch of modern games and counted the execution time spent doing various types of instructions, or are we all just making assertions from gut feeling and the seat of our pants?

I'm sure Microsoft sat down and did tons of performance profiling before they called IBM and asked for 3 PPC cores, knowing that on paper they'd look much worse than CELL with its 8 SPUs.
 
aaaaa00 said:
I think that if you read the whole paragraph, Hannibal is claiming that CELL's SPEs won't be enough to make up for the 2 extra PPCs in Xenon, specifically running what he considers to be branchy memory-intensive code.

The whole paragraph is:

Quote:
At any rate, Playstation 3 fanboys shouldn't get all flush over the idea that the Xenon will struggle on non-graphics code. However bad off Xenon will be in that department, the PS3's Cell will probably be worse. The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window. (Don't bother suggesting that the PS3 can use its SPEs for branch-intensive code, because the SPEs lack branch prediction entirely.) Furthermore, the PS3's L2 is only 512K, which is half the size of the Xenon's L2. So the PS3 doesn't get much help with branches in the cache department. In short, the PS3 may fare a bit worse than the Xenon on non-graphics code, but on the upside it will probably fare a bit better on graphics code because of the seven SPEs.


That says to me Hannibal thinks the SPEs won't be enough to catch up to the 2 extra PPCs, specifically for the type of code he cites.

Now whether he's right or not is a different matter, but I think that's what he's trying to say.

Well, that's not what I get from that text - I more get the feeling of:

"Even though Cell is apocalyptic at running AI, physics, and gameplay... to XeCPU's merely horrendous... don't worry Cell fans - because those SPE's are still good for graphics!"

I hardly think he's implying that the SPEs are worthwhile for the kinds of code mentioned above, whatever the case. I'll have to reiterate, I just think he's off the mark this time on some of these conclusions - but oh well, whatever. 8)
 
Has anyone actually sat down and profiled a bunch of modern games and counted the execution time spent doing various types of instructions, or are we all just making assertions from gut feeling and the seat of our pants?

Is Hannibal a game programmer? Has he ever written any game engines?


I'm sure Microsoft sat down and did tons of performance profiling before they sat down and asked IBM for 3 PPC cores.

And Sony didn't? On what basis do you think Microsoft, which is a recent contender in the games market, has more experience and wisdom than Sony? And how do you know if EITHER of them is studying the right problem (the type of HW needed for *future* game requirements, not *current* game requirements)?


Profiling current Xbox titles will tell you nothing about the workload for next-gen titles.
 
Fafalada said:
aaaa0 said:
Major Nelson vindicated?
In fairy land maybe.
Compared to cramming the most processing-intensive physics components into VU0 (and I know I'm not the only one to have done that on PS2), using SPEs for the same job(s) is a walk in the park.

bbot said:
Go back and read it now.
I did, still sounds retarded :p


Exactly what part sounds retarded? Care to enlighten me?
 
-tkf- said:
OK, so what Hannibal says is just plain wrong?
I'd say dismissing those types of workloads on SPEs would be wrong.

But he's right in the sense that a lot of early titles (especially the PC-originating next-gen stuff) will cram most of that onto the PPE core, as it's the easiest thing to do - just recompile your existing C++ codebase and away you go.
Then again, something similar applies to utilizing the multiple cores on the 360.
 
DemoCoder said:
Has anyone actually sat down and profiled a bunch of modern games and counted the execution time spent doing various types of instructions, or are we all just making assertions from gut feeling and the seat of our pants?

Is Hannibal a game programmer? Has he ever written any game engines?

I don't know. I asked the question of anyone here on the forum.

I'm sure Microsoft sat down and did tons of performance profiling before they sat down and asked IBM for 3 PPC cores.

And Sony didn't? On what basis do you think Microsoft, which is a recent contender in the games market, has more experience and wisdom than Sony? And how do you know if EITHER of them is studying the right problem (the type of HW needed for *future* game requirements, not *current* game requirements)?

Profiling current Xbox titles will tell you nothing about the workload for next-gen titles.

First, who says they profiled current Xbox titles? Maybe they profiled UE3? Or other unannounced titles? Next-generation PC stuff? Or maybe whatever their internal teams have been working on in secret?

I dunno. It seems pretty rational to me to start with workloads you know, not ones you're not sure will even be used sometime in the future.

I'm sure Sony did the same kind of profiling, maybe with the assumption that they'd be building a system without a normal GPU, but with 4 or more CELLs, exactly like the patent.

When that assumption changed and they swapped in a GPU, maybe the system they built was left with a weird balance.

But that's just speculation.
 
Sony had always planned to build a GPU. They actually built one, but for various reasons, they had to go with the RSX. There was never any realistic chance IMHO that the PS3 was designed for pure software rendering. The SPEs can in no way compete with GPU pixel pipelines in terms of fetching texels, filtering, and doing ROP operations.
 