PS3 vs. Xenon CPU performance

archie4oz said:
A VU program that can initiate DMA requests was how one person in the know described the SPUs.

Make of that what you will.

Was that being facetious? :?: ;)

Well, Deano... I think this person was fundamentally correct, but a bit too pessimistic ;).

First, the SPUs have ~4x the number of registers and 8x the Local Storage (32 KB vs. 256 KB): still, up to this point he/she is right... this is a super VU if we stop looking here.

Second, the SPU can initiate its own DMA requests and supports GPU-like scatter/gather DMA operations. And not only that: the DMA controller attached to each SPU understands Virtual Addresses, which is a fundamental shift from the functionality embedded in the EE and its VUs (no DMA requests initiated by VU programs, and DMA transfers used only Physical Addresses... which is a bit of a pain in the ass, especially if you work with Linux and have to worry about how many virtually contiguous memory locations will actually be physically contiguous, etc.).
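To make that concrete, here's a rough sketch of what an SPU-initiated transfer looks like with the MFC intrinsics from spu_mfcio.h (buffer name and sizes are made up for illustration; the 'ea' parameter is exactly the kind of Virtual/Effective Address I mean):

Code:
#include <spu_mfcio.h>

#define TAG 3                           /* DMA tag group, 0..31 */

static char buf[16384] __attribute__((aligned(128)));  /* lives in Local Storage */

/* The SPU program itself kicks off the transfer -- no external DMA
   controller setup, unlike the EE/VU days. 'ea' is an effective
   (virtual) address, so it plays nicely with an OS like Linux.
   'size' is a multiple of 16, up to 16 KB per transfer. */
void fetch(unsigned long long ea, unsigned int size)
{
    mfc_get(buf, ea, size, TAG, 0, 0);  /* main memory -> Local Storage */
    mfc_write_tag_mask(1 << TAG);       /* select which tag(s) to wait on */
    mfc_read_tag_status_all();          /* stall until the transfer is done */
}

The scatter/gather part comes from DMA lists (mfc_getl/mfc_putl): you hand the MFC a list of (size, address) elements and it gathers all of them into Local Storage in one command.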

Last, but not least: on the VUs, Integers were really second-class citizens (a separate register file [only 16-bit Integer registers], a limited set of operations available for Integer values, etc.), while I feel that on the SPUs the situation is not as bad for Integer values: the 128x128-bit GPRs can be used to store both Integer and Floating-Point values, and there are more instructions to handle Integer values (for example, there is an Integer MADD instruction which has only 1 more cycle of latency compared to a Floating-Point MADD instruction).
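For the curious, both flavours show up as intrinsics in spu_intrinsics.h; a minimal sketch (the latency figures are the ones quoted above, and check the SDK docs for exactly which halfword of each word the integer multiply picks):

Code:
#include <spu_intrinsics.h>

/* Floating-point multiply-add: d = a*b + c across four floats. */
vector float fp_madd(vector float a, vector float b, vector float c)
{
    return spu_madd(a, b, c);           /* FP MADD */
}

/* Integer multiply-add: 16-bit halfword operands accumulated into
   32-bit words, only ~1 cycle more latency than the FP version. */
vector signed int int_madd(vector signed short a, vector signed short b,
                           vector signed int c)
{
    return spu_madd(a, b, c);           /* integer MADD (mpya) */
}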

I think this person expected much more from CELL and he/she might be the kind of guy/gal who is very difficult to please/impress.
 
Panajev2001a said:
Second, the SPU can initiate its own DMA requests and supports GPU-like scatter/gather DMA operations...
Just a minor correction: current GPUs can't perform scatter ops, AFAIK
 
nAo said:
Panajev2001a said:
Second, the SPU can initiate its own DMA requests and supports GPU-like scatter/gather DMA operations...
Just a minor correction: current GPUs can't perform scatter ops, AFAIK

Well, that does not make CELL look worse... on the contrary, it makes it look better IMHO.
 
Fafalada said:
ERP said:
To be honest I'm still not convinced that the balance is right. How many Flops do you really need to vertex shade? What percentage of an overall game is dependent on FP performance vs. Integer performance?
Who says anything about shading? I'm gonna decompress everything including textures on the SPUs :p That's why I need at least 16 dammit!

Fair enough...
When I say vertex shading I'm actually including things like tessellation and decompression.

Sure, we can use extra math power; that's not really my point. I'm more interested in how the architecture as a whole ends up balanced relative to the stuff we're trying to run on it.

I still think that a lot of what your average game code does (especially outside of graphics, and to a lesser extent physics) is walk through data structures and copy bits of data about. I generally refer to this as integer performance even though it has next to no integer math in its critical path. The SPUs just aren't going to help much with this stuff. Now, the PPC core might be more than sufficient for all of this, but I can't see the future and I don't know what will drive next-gen games.
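To show the sort of code I mean, here's a toy walk over a hypothetical game-object list: on a conventional cached core this is merely mediocre, but on an SPU every n->next is a round-trip DMA into Local Storage before you can even look at the pointer:

Code:
/* Almost no integer math, just dependent loads. Caches hide some of
   this; a Local-Storage-only core has to DMA each node in before it
   can read node->next. (Types made up for illustration.) */
struct node { int flags; struct node *next; };

int count_active(struct node *n)
{
    int count = 0;
    for (; n != NULL; n = n->next)      /* each step depends on the last load */
        if (n->flags & 1)
            count++;
    return count;
}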

How about a game like, say, GTA where the entire city is live all the time? Where you can kill someone on one side of town and it affects the business he owned on the other side of town, which in turn affects the city's economy, causing housing to degenerate, etc., etc., etc.? In a game like this, even with half-assed solutions, AI cost would be astronomical by today's standards. Where today we dedicate maybe 20% of our existing resources to AI, I can see that balance shifting dramatically as people start to expect more from games.

SPUs are not generically useful; there is a lot of code in your average game that isn't going to be SPU-friendly.

I'm still looking forward to having something to play with; I think Cell is intriguing, to say the least.
 
ERP said:
Fafalada said:
ERP said:
To be honest I'm still not convinced that the balance is right. How many Flops do you really need to vertex shade? What percentage of an overall game is dependent on FP performance vs. Integer performance?
Who says anything about shading? I'm gonna decompress everything including textures on the SPUs :p That's why I need at least 16 dammit!

Fair enough...
When I say vertex shading I'm actually including things like tessellation and decompression.

Sure, we can use extra math power; that's not really my point. I'm more interested in how the architecture as a whole ends up balanced relative to the stuff we're trying to run on it.

I still think that a lot of what your average game code does (especially outside of graphics, and to a lesser extent physics) is walk through data structures and copy bits of data about. I generally refer to this as integer performance even though it has next to no integer math in its critical path. The SPUs just aren't going to help much with this stuff. Now, the PPC core might be more than sufficient for all of this, but I can't see the future and I don't know what will drive next-gen games.

How about a game like, say, GTA where the entire city is live all the time? Where you can kill someone on one side of town and it affects the business he owned on the other side of town, which in turn affects the city's economy, causing housing to degenerate, etc., etc., etc.? In a game like this, even with half-assed solutions, AI cost would be astronomical by today's standards. Where today we dedicate maybe 20% of our existing resources to AI, I can see that balance shifting dramatically as people start to expect more from games.

SPUs are not generically useful; there is a lot of code in your average game that isn't going to be SPU-friendly.

I'm still looking forward to having something to play with; I think Cell is intriguing, to say the least.

Why do you think SPUs won't be able to process AI and logic? Because of memory access problems? Lack of bandwidth?

My impression is that games involving simulations, with multiple elements to process simultaneously, would benefit greatly from having independent processing units. Why do you think the SPUs wouldn't be able to deal with that?
 
Why do you think SPUs won't be able to process AI and logic? Because of memory access problems? Lack of bandwidth?

My impression is that games involving simulations, with multiple elements to process simultaneously, would benefit greatly from having independent processing units. Why do you think the SPUs wouldn't be able to deal with that?

AAAAAARRRRRRGGGGHHH!!!!!!

For the last time, because of data locality issues.
Small local-memory systems like the VU or the SPU are designed to work on compact datasets (or streams of independent entities). Once you start distributing data between the elements in the simulation, you have a problem.

A vertex doesn't care what the one next to it is doing; an AI entity does. If you extend this logically, the set of actions available to an entity is dictated by the state of many local things and a lesser number of distant things. In a complex system this set of behaviors and things is completely dynamic.

If every simulated entity contains all of the knowledge it uses to make a decision, then it's trivial to parallelize. However, generally you don't replicate data across all of the entities. For example, in combat I would need to know who's shooting at me and where he is. This would generally be stored with the entity shooting at me, not in the AI entity running.

Now this doesn't completely preclude using the SPUs for AI, it just means that the "knowledge gathering" part will not necessarily run well in a system like that.

In small-scale problems, you could probably do what Havok does with physics and use "simulation islands", which is just dynamically dividing the data so that all relevant data is partitioned together, but this would have to be done on the PPC.
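The island-building step itself would look something like this on the PPC side: a sketch using a bog-standard union-find over interaction pairs (all names made up):

Code:
/* Entities that (transitively) interact end up in the same island;
   each island's data can then be packed together and shipped to an
   SPU as one self-contained job. */
#define MAX_ENTITIES 4096

static int parent[MAX_ENTITIES];

static int find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
    }
    return x;
}

void build_islands(const int (*pairs)[2], int npairs, int nentities)
{
    for (int i = 0; i < nentities; i++)
        parent[i] = i;                  /* every entity starts alone */
    for (int i = 0; i < npairs; i++)
        parent[find(pairs[i][0])] = find(pairs[i][1]);
    /* find(e) now labels e's island; bucket entities by label and
       DMA each bucket across as a unit. */
}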

There are other solutions; most of them trade off duplicated data, and correctness of data, for storage locality.
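For what it's worth, the duplicate-data trade looks roughly like this (types made up): before dispatch you snapshot the bits of other entities each AI needs -- who's shooting at me, and from where -- into the entity itself, so the SPU job never chases a pointer back into main memory:

Code:
/* PPC-side "knowledge gathering": copy what each AI decision needs
   into the entity record. The duplicated fields may be a frame stale
   (the correctness trade-off), but the job becomes a compact,
   self-contained dataset an SPU can stream through. */
struct vec3 { float x, y, z; };

struct ai_entity {
    struct vec3 pos;
    int         attacker_id;   /* who's shooting at me (-1 = nobody)    */
    struct vec3 attacker_pos;  /* snapshot, duplicated from the shooter */
};

void gather_knowledge(struct ai_entity *ents, int n)
{
    for (int i = 0; i < n; i++) {
        int a = ents[i].attacker_id;
        if (a >= 0)
            ents[i].attacker_pos = ents[a].pos;  /* duplicate for locality */
    }
}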
 
ERP said:
Why do you think SPUs won't be able to process AI and logic? Because of memory access problems? Lack of bandwidth?

My impression is that games involving simulations, with multiple elements to process simultaneously, would benefit greatly from having independent processing units. Why do you think the SPUs wouldn't be able to deal with that?

AAAAAARRRRRRGGGGHHH!!!!!!

Chill out, man! I just got on board and was too lazy to read everything. Sorry. :)



For the last time, because of data locality issues.
Small local-memory systems like the VU or the SPU are designed to work on compact datasets (or streams of independent entities). Once you start distributing data between the elements in the simulation, you have a problem.

A vertex doesn't care what the one next to it is doing; an AI entity does. If you extend this logically, the set of actions available to an entity is dictated by the state of many local things and a lesser number of distant things. In a complex system this set of behaviors and things is completely dynamic.

If every simulated entity contains all of the knowledge it uses to make a decision, then it's trivial to parallelize. However, generally you don't replicate data across all of the entities. For example, in combat I would need to know who's shooting at me and where he is. This would generally be stored with the entity shooting at me, not in the AI entity running.

Now this doesn't completely preclude using the SPUs for AI, it just means that the "knowledge gathering" part will not necessarily run well in a system like that.

In small-scale problems, you could probably do what Havok does with physics and use "simulation islands", which is just dynamically dividing the data so that all relevant data is partitioned together, but this would have to be done on the PPC.

There are other solutions; most of them trade off duplicated data, and correctness of data, for storage locality.

I get what you're saying, and like you said, the main challenge here is task decomposition. This is not exactly something new when dealing with AI; a lot of people have dedicated huge amounts of time to creating models for AI in distributed computing systems.

I may be wrong, but I don't think logic problems like the ones you mentioned are the real bottleneck where game AI is concerned. Things like pattern recognition and fast Fourier transforms are what would really consume CPU processing. And those would fit in great on the SPUs, and are easily isolated.

IMO, the main bottleneck here for AI is actually the lack of memory for the huge datasets used nowadays. It's a pity, because otherwise we might have been able to see things like voice recognition, text-to-speech, and a few other cool things.
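An FFT really is the textbook fit: a small, self-contained kernel whose whole working set sits in 256 KB of Local Storage with room to spare. A minimal scalar radix-2 version in C for flavour (a real SPU build would be vectorized with the SIMD intrinsics):

Code:
#include <complex.h>
#include <math.h>

/* In-place radix-2 Cooley-Tukey FFT; n must be a power of two.
   A 4096-point single-precision transform is ~32 KB of data, which
   sits comfortably in a 256 KB Local Storage. */
void fft(float complex *x, unsigned n)
{
    /* bit-reversal permutation */
    for (unsigned i = 1, j = 0; i < n; i++) {
        unsigned bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            float complex t = x[i]; x[i] = x[j]; x[j] = t;
        }
    }
    /* butterfly passes */
    for (unsigned len = 2; len <= n; len <<= 1) {
        float complex wlen = cexpf(-2.0f * 3.14159265f * I / (float)len);
        for (unsigned i = 0; i < n; i += len) {
            float complex w = 1.0f;
            for (unsigned k = 0; k < len / 2; k++) {
                float complex u = x[i + k];
                float complex v = x[i + k + len / 2] * w;
                x[i + k]           = u + v;
                x[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}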
 
I also don't think you'd really spawn multiple threads to deal with the same AI algorithm, since you'd run into synchronization and concurrency issues.

The SPEs would probably work like dedicated worker threads, driven by threads on the main PPE core, which is responsible for orchestration and data movement.

You'd probably have 1 SPE for AI, plus multiple SPEs for tessellation, physics, sound, etc.

If you were using a concurrent language, you'd decompose the work into queues and some concurrent data structures. You'd stream spatially coherent data into queues, which the workers would consume, using a few concurrent structures for cross-communication when needed, but avoiding synchronization as much as possible.
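A sketch of the queue part -- a single-producer/single-consumer ring, so the orchestrating thread pushing work and the worker popping it never take a lock (everything here is made up for illustration; C11 atomics stand in for whatever fences the real platform wants):

Code:
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 256   /* power of two */

struct job { int type; void *data; };

/* One producer (the orchestrating thread), one consumer (a worker):
   head is only written by the consumer, tail only by the producer,
   so no locks are needed -- just acquire/release ordering. */
struct spsc_queue {
    struct job slots[QSIZE];
    _Atomic unsigned head;   /* next slot to pop  */
    _Atomic unsigned tail;   /* next slot to push */
};

bool push(struct spsc_queue *q, struct job j)
{
    unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == QSIZE)
        return false;                    /* full: caller retries later */
    q->slots[t & (QSIZE - 1)] = j;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

bool pop(struct spsc_queue *q, struct job *out)
{
    unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t)
        return false;                    /* empty */
    *out = q->slots[h & (QSIZE - 1)];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return true;
}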

Obviously, if you try to run 2 or more concurrent threads on the AI, you'll run into lots of issues, regardless of whether you have small or large memory, due to concurrent reads/writes of data structures.

I think organizing the SPEs into a heterogeneous collection would work better than trying to take one task (say, AI) and distribute it across multiple SPEs.
 