Ars Technica part 2: Inside the Xbox 360

To go back to the AI performance on Cell, I want to add that I recently read a post by an AI researcher that was not only non-critical of Cell, it was actually a glowing appraisal of the sorts of things AI programmers might be able to do with the SPEs. So it seems that, as with most things Cell-related, two schools of thought exist within the fog of confusion that surrounds this architecture.
 
DemoCoder said:
But physics isn't inherently serial, in fact, most of it is inherently parallelizable. That's why the government keeps buying supercomputers like ASCI White and Purple to run weapons simulations. Or why Japan built the Earth Simulator to simulate weather. Or why Boeing and NASA use supercomputers to do computational fluid dynamics.
May I ask, since we are talking about games here (hardly critical research): where there may be inherent problems, could simple hacks be used to sidestep them, and what would they be? Similar to the debates on IEEE 754 32-bit compliance in shaders: since exact and repeatable results are not required, some trade-offs could be of significant benefit.
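To make DemoCoder's "inherently parallelizable" point concrete, here is a toy sketch of why particle-style physics splits cleanly across cores: each chunk's integration step reads and writes only its own data, so chunks can run independently. Plain Python; the thread pool, chunk sizes and the Euler step are illustrative assumptions, not how any engine actually does it (a console engine would pin real cores or SPEs rather than use a thread pool).

```python
from concurrent.futures import ThreadPoolExecutor

DT = 0.016      # one ~60 Hz frame
GRAVITY = -9.81

def step_chunk(chunk):
    """Advance one chunk of particles (x, y, vx, vy) by a single Euler step."""
    out = []
    for x, y, vx, vy in chunk:
        vy += GRAVITY * DT
        out.append((x + vx * DT, y + vy * DT, vx, vy))
    return out

def step_parallel(particles, workers=4):
    """Split the particle list into chunks that share no state at all,
    so each chunk can be integrated on its own core independently."""
    n = max(1, len(particles) // workers)
    chunks = [particles[i:i + n] for i in range(0, len(particles), n)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = ex.map(step_chunk, chunks)
    return [p for chunk in results for p in chunk]
```

Because the chunks are fully independent, serial and parallel runs give bit-identical results; that independence is exactly the property that makes this kind of workload safe to farm out across cores.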
 
Titanio said:
wco81 said:
Could it be that in the end, games on both consoles offer big improvement in graphics but not as much improvement in AI or physics over current games?

It's already been discussed, but I don't agree with Hannibal's suggestions about physics on SPEs or X360's cores. AI is another matter, and that really depends on what you're doing. As far as physics go, we've already had AGEIA and Epic touting Cell's suitability for physics at least.

Neither of these chips' primary work is going to be graphics-related. I certainly don't think there's enough headroom on X360 to allow it to spend a whole lot of time on procedural vertex work and such, unless you want to hold everything else to a single core. Cell has a bit more headroom, and Sony has openly discussed Cell/RSX collaboration on graphics, but obviously the chip needs to be doing physics, AI and the like, like any other CPU. If it's good at physics, you may as well let it rip on that. Of course, it all depends on the priorities of your game... if you want something that looks truly "standout", perhaps you'd allocate more resources to graphics work on the CPU versus other areas.

But as a practical matter, will game companies devote resources to optimizing or parallelizing physics code in the manner discussed in this thread if they have to end up porting the game to hardware which doesn't have SPEs or triple-core CPUs?

Even between X360 and PS3, it sounds like tailoring for one results in code which really can't be leveraged on the other. Will multiplatform publishers let developers do such work if that's the case?
 
Even between X360 and PS3, it sounds like tailoring for one results in code which really can't be leveraged on the other. Will multiplatform publishers let developers do such work if that's the case?

Well isn't that why you have the Novodex API?
 
wco81 said:
Even between X360 and PS3, it sounds like tailoring for one results in code which really can't be leveraged on the other. Will multiplatform publishers let developers do such work if that's the case?


Well, that's where Sony and MS's tools (or middleware) come into the picture to make the programmers' jobs easier...
 
wco81 said:
But as a practical matter, will game companies devote resources to optimizing or parallelizing physics code in the manner discussed in this thread if they have to end up porting the game to hardware which doesn't have SPEs or triple-core CPUs?

Even between X360 and PS3, it sounds like tailoring for one results in code which really can't be leveraged on the other. Will multiplatform publishers let developers do such work if that's the case?

That's a good question, although separate from the issue of technical potential.

I guess it'll depend on how easy it is to leverage the "beyond the common denominator" performance - certainly it should be easier in many senses to tap the SPEs than it was to tap the vector units on PS2. I suppose it also depends on the market - if you've a massive market for one system, there's more incentive to optimise for it.

If they do restrict themselves to the common denominator, though, it'll only highlight the gulf between exclusive titles harnessing that power and multiplatform titles that don't, much more so than with the current generation - ignoring the SPEs, for example, would mean ignoring far more "extra" power than ignoring the vector units on PS2 did. That may compel multiplatform SKUs to optimise more explicitly for different platforms in order to remain competitive with exclusive content... and once a few start doing that, more will feel they need to as well.

I guess, ultimately, though it's a cost/reward issue ;)

As others mentioned, middleware will, to a certain degree, allow you to "optimise" for different platforms without much effort of your own.
 
PC-Engine said:
Even between X360 and PS3, it sounds like tailoring for one results in code which really can't be leveraged on the other. Will multiplatform publishers let developers do such work if that's the case?

Well isn't that why you have the Novodex API?

Exactly. Novodex is already completely multi-threaded!
 
MfA said:
Nao : unless he means the "loop" from fetch to FX2 ... but in that case I think he missed the fact that branch hints can be stored well before the actual branch.
I also assumed branch hints can be inserted well before the branch is executed.

For short loops you just put in a static prediction before starting the loop.
I don't even know if the SPE supports dynamic branch hints (ruling out self-modifying code ;) )
 
Titanio said:
version said:
Love_In_Rio said:
The MS specs say 1 VMX unit per core. 115 GFLOPS could be achieved with 2 FP units but not with two VMX units: 3.2 GHz × (2 FLOPs × 2 FPUs + 8 FLOPs × 1 VMX) × 3 cores = 115.2 GFLOPS


only 1 FPU, no?

115 GFLOPS suggests 2, or at least some funky math to arrive at that figure. 1 FPU would be 96 GFLOPS - the other 19 GFLOPS have to come from somewhere, and a second FPU fits the bill ;)

This is all very new to me. Can I just clarify a few things?

1) The VMX and floating point units are separate and can each perform floating point operations?
2) The VMX unit can perform 8 operations per cycle and the floating point units 2 each?
3) 1 VMX and 2 floating point units per PPE = 115 GFLOPS?

As far as I know, the A64 has 3 floating point units per core. Are these also capable of 2 operations per cycle like those in the Xenon?

Where does double/single precision come in?

Thanks
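The arithmetic behind those figures can be checked directly. Note the per-unit issue rates below are the assumptions being debated in this thread (a 4-wide VMX multiply-add = 8 flops/cycle, each scalar FPU doing a fused multiply-add = 2 flops/cycle), not confirmed specs:

```python
def peak_gflops(ghz, cores, vmx_flops_per_cycle, fpu_count, fpu_flops=2):
    """Peak GFLOPS = clock (GHz) * flops-per-cycle-per-core * cores."""
    per_core = vmx_flops_per_cycle + fpu_count * fpu_flops
    return ghz * per_core * cores

# One VMX (8 flops/cycle) plus one FPU: 3.2 * (8 + 2) * 3 = 96 GFLOPS
one_fpu = peak_gflops(3.2, 3, 8, 1)

# Adding a second FPU: 3.2 * (8 + 4) * 3 = 115.2 GFLOPS -- the quoted "115"
two_fpu = peak_gflops(3.2, 3, 8, 2)
```

So the 115 figure only falls out of the math with a second 2-flop unit per core; with a single FPU the same formula tops out at 96.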
 
nAo said:
xbdestroya said:
To go back to the AI performance on Cell, I want to add also that I recently read a post by an AI researcher...
Link? 8)

Well I'm not a fan of cross-posting, as I've seen the trouble it can cause, but I'll go ahead and post it here for the sake of 'the science.'

BUT, should anyone disagree, I ask as a personal favor that, instead of flaming in this thread or this post and causing riots, you please go to THIS thread and discuss it there, so that he might be able to defend himself and make his own points, as I am certainly no AI expert.

Well, I can contribute a little here. I work as an AI researcher (yes, I really do), and am familiar with multi-agent programming and real-time AI. I have absolutely no doubt that Cell will be a great processor for a whole lot of AI applications. Why? Here's why:

* suppose you are using a pathfinding algorithm in a game (say, an RTS game) and you use a variant of the A* algorithm (as some games do currently): you could fully dedicate one (or more) core(s) to that task. The number of units that could be handled concurrently would be very high compared to games running on a P4 or AMD CPU today

* suppose you are using finite state automata for the behavior of your monsters (as most games do): you could dedicate one (or two, or three) core(s) to that task, allowing for huge and complex finite state automata, and lots and lots and lots of controlled tanks/orcs/whatever

* suppose you are very creative, and want to use more complex AI techniques such as neural networks (think Creatures, Black and White, etc.): the floating-point power of one core would be perfectly suited to that task (neural networks use floating-point computations, lots of them). The more cores used for that, the more complex the AI, or the higher the number of different, independently controlled monsters

* in the same creative domain, suppose you are using a genetic algorithm to learn in real time the patterns of actions used by the human player (say, in a fighting game such as Tekken): one core could be enough to provide power for this real-time profiling of the human's behaviour and the corresponding adaptation of the game

* suppose you want to use gradient fields to represent semantic values (such as "weakly defended area with respect to past observations") in an RTS game: these gradient fields use mostly integer calculations, but lots and lots of them: put them on a separate core, and use the results of the calculations in the others

None of these propositions are mutually exclusive, of course. If I were working in a game company as an AI developer (unfortunately I am not) I would be salivating right now. Sony is right: Cell's power will unleash an unprecedented new realism in game AI - provided that developers do not use all of the Cell cores for nice graphics effects, of course
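The pathfinding suggestion above - batching A* path queries for many units as one self-contained workload that could live on a dedicated core - can be sketched roughly like this. Plain Python; the 4-connected grid and Manhattan heuristic are illustrative assumptions, nothing Cell-specific:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid of (x, y) cells; grid[y][x] == 1 means blocked."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
    open_set = [(h(start), 0, start, None)]  # (f, g, node, parent)
    came_from = {}                           # node -> parent, doubles as closed set
    g_cost = {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:                # already expanded via a cheaper route
            continue
        came_from[node] = parent
        if node == goal:                     # walk parents back to start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and not grid[ny][nx]:
                ng = g + 1
                if ng < g_cost.get((nx, ny), float("inf")):
                    g_cost[(nx, ny)] = ng
                    heapq.heappush(open_set, (ng + h((nx, ny)), ng, (nx, ny), node))
    return None                              # goal unreachable

def plan_all(grid, requests):
    """The 'dedicated core' workload: path queries for many units, all independent."""
    return [astar(grid, s, g) for s, g in requests]
```

Each query in `plan_all` is independent of the others, which is what makes "hand the whole batch to one core" (or split it across several) a natural fit.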



PS: about the absence of out-of-order execution and branch-prediction logic on the PPU: this is no more related to AI than it is to game physics or gameplay code. And yes indeed, it is the job of the compiler to take that into account. But for computation-intensive tasks (like AI), seven high-powered cores, even without out-of-order execution, will perform very, very well.

I believe he's either Russian or Eastern European, so forgive his accent, but nonetheless hopefully this will fuel some discussion. Again though - should you disagree strongly, please go to the link I have provided above to debate - and I ask that this not get cross-posted around the web. I rather like this guy and would rather he not get mad at me for making him the object of an Internet firestorm.

And none of that was directed at you nAo - (or the majority of this board for that matter) - you just know how it is here nowadays with all the troll-visitors this site gets, and it seems whatever is posted here ends up somewhere else before too long. :)
 
pjbliverpool said:
3) 1 VMX and 2 floating point units per PPE = 115 GFLOPS

PPEs are a Cell thing, not XeCPU; let's not confuse matters...
 
DeanoC said:
pjbliverpool said:
3) 1 VMX and 2 floating point units per PPE = 115 GFLOPS

PPEs are a Cell thing, not XeCPU; let's not confuse matters...


So the cores of the X360 processor are based on the PPC970. How many VMX unit(s) per core?
 
I don't think Hannibal's closing statements should be so literally dissected. To me they were merely generalizations made to qualify statements made earlier in the article in an effort to avoid any fanboi criticism. It's a simple attempt at playing both sides of the fence.
 
nAo said:
MfA said:
For short loops you just put in a static prediction before starting the loop.
I don't even know if SPE supports dynamic branch hintes ( ruling out selfmodifying code ;) )
Given that they talk about a software-managed branch target buffer, I presume the branch hint will say the equivalent of "the branch at PC + X will likely jump to Y" rather than "the next branch will likely not be taken".

(Only X being an immediate operand ... hence dynamic.)
 
Zeross said:
bbot said:
So the cores of the X360 processor are based on PPC970.

No it is NOT. How in the hell did you arrive at this conclusion from Deano's post?
No need to get snippy. He's just wondering if the 360 processor is based on Cell (i.e. the PPE) or the PPC970.
 
Zeross said:
bbot said:
So the cores of the X360 processor are based on PPC970.

No it is NOT. How in the hell did you arrive at this conclusion from Deano's post?


I didn't. I also relied on what J. Allard said in an interview with H. Goto.
 