Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Alpha_Spartan said:
I'm not too sure about this and I would like some clarification.
It has a lot more cache, no? XeCPU has 128KB L1 and 1MB L2 IIRC. Cell's PPE probably has a minimum of 256KB of L2 (no idea about L1), and each SPE has 256KB, right?

I know that they work quite differently, but nevertheless, Cell has a lot more cache.
 
Yes. I'd suggest reading more of IBM's docs, particularly the information relating to performance (in the apps exhibited to date), and where that performance comes from and why. The pure processing capability is, of course, important, but it's part of a larger picture.
 
Shogmaster said:
It has a lot more cache, no? XeCPU has 128KB L1 and 1MB L2 IIRC. Cell's PPE probably has a minimum of 256KB of L2 (no idea about L1), and each SPE has 256KB, right?

I know that they work quite differently, but nevertheless, Cell has a lot more cache.

I guess it depends how you look at it. In some cases, if devs are going with the lowest common denominator (1 PPE), then the XeCPU has twice as much cache (Cell has 512 KB, I think).

However, if you spread that over 3 cores, the XeCPU will have less per core than Cell's single PPE. But I guess the obvious question in that scenario is how much of the PPE's cache is used to keep the SPEs fed, and how much it has left over for the tasks that are not suited to the SPEs.
 
Each SPE has its own local memory, and since these aren't caches, cache misses in themselves are eliminated.

The PPE has 512 KB of L2 cache to itself, whereas Xenon's cores will have to share a single 1 MB L2 cache.
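To put rough numbers on the cache comparison, here is a small Python sketch that uses only the sizes quoted in this thread (1 MB shared L2 on Xenon, 512 KB L2 on the PPE, 256 KB of local store per SPE, 7 SPEs in the PS3). The even three-way split of Xenon's L2 is an assumption made purely for illustration, since a shared cache is not really divided that way, and local store is not a cache, so treat the totals as indicative only.

```python
# Sizes in KB, taken from figures quoted in this thread (not verified specs).
XENON_L2_KB = 1024        # one L2 shared by all Xenon cores
XENON_CORES = 3
PPE_L2_KB = 512           # L2 private to Cell's PPE
SPE_LOCAL_STORE_KB = 256  # per-SPE local store (software-managed, not a cache)
PS3_SPES = 7


def xenon_l2_per_core_kb():
    """Naive even split of the shared L2 (an illustrative assumption)."""
    return XENON_L2_KB / XENON_CORES


def cell_on_chip_kb():
    """PPE L2 plus the local stores of all SPEs."""
    return PPE_L2_KB + PS3_SPES * SPE_LOCAL_STORE_KB


if __name__ == "__main__":
    print(f"Xenon L2 per core (even split): {xenon_l2_per_core_kb():.0f} KB")
    print(f"Cell PPE L2: {PPE_L2_KB} KB")
    print(f"Cell PPE L2 + SPE local stores: {cell_on_chip_kb()} KB")
```

On these numbers a single Xenon core gets roughly a third of a megabyte if all three cores are busy, which is less than the PPE's 512 KB.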

Thread execution isn't interleaved or what have you on SPEs, in that a thread has an SPE all to its lonesome.

Cross core communication should be better as the EIB is pretty fast and Cell is commonly referred to as a stream processor.

Cell has the ability to access the XDR and GDDR3 memory pools for its own needs and in assistance to the GPU, via its on-chip XDR memory controller and FlexIO interface. Both the XDR controller and FlexIO appear to be faster than Xenon's FSB.

Cell can handle 9 threads where Xenon can handle 6 (Cell in the PS3, that is).

Can't think of anything else at the moment ignoring FLOPS. Some positives can be negatives though in certain situations.
 
scooby_dooby said:
But I guess the obvious question in that scenario is how much of the PPE's cache is used to keep the SPEs fed, and how much it has left over for the tasks that are not suited to the SPEs.

Cell misconception #1: the PPE needs to be involved in SPU tasks/tasking. It doesn't. It can if you want it to, or it can stay out of the way entirely if you want it to.

scificube said:
Each SPE has its own local memory, and since these aren't caches, cache misses in themselves are eliminated.

Interesting you mention that, because there's talk on the IBM boards about doing "software" multithreading on an SPU. Apparently you can, or will be able to, segment the registers on an SPU and compile to a range, effectively allowing you to maintain multiple contexts on the SPU at a time and switch between them (it takes a cycle to switch, IIRC). It's suggested for use where you have a thread with unpredictable memory access that can be made small (the IBM guy suggests you could pack between 4 and 8 threads in this manner, if you can make them small enough, of course).
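For anyone trying to picture the register-segmenting idea, here is a toy Python sketch. It assumes the SPU's 128-entry register file is split into equal ranges, one per context, and that a context hands over control whenever it would wait on memory. The names `partition_registers`, `worker` and `run_interleaved` are made up for this illustration, and nothing here reflects how IBM's actual tools do it.

```python
from collections import deque

SPU_REGISTERS = 128  # the SPU register file has 128 entries


def partition_registers(num_contexts, total=SPU_REGISTERS):
    """Split the register file into equal, non-overlapping (first, last) ranges."""
    if num_contexts < 1 or total % num_contexts != 0:
        raise ValueError("number of contexts must evenly divide the register file")
    size = total // num_contexts
    return [(i * size, (i + 1) * size - 1) for i in range(num_contexts)]


def worker(name, reg_range, requests, log):
    """Toy thread: records one memory request per loop, then gives up the SPU."""
    for i in range(requests):
        log.append((name, reg_range, i))
        yield  # pretend we are waiting on memory; let another context run


def run_interleaved(num_contexts, requests_each):
    """Round-robin between contexts, switching at every simulated memory wait."""
    ranges = partition_registers(num_contexts)
    log = []
    ready = deque(
        worker(f"ctx{i}", r, requests_each, log) for i, r in enumerate(ranges)
    )
    while ready:
        ctx = ready.popleft()
        try:
            next(ctx)          # run the context until its next wait
            ready.append(ctx)  # still has work: back of the queue
        except StopIteration:
            pass               # context finished
    return log


if __name__ == "__main__":
    print(partition_registers(4))
    for entry in run_interleaved(4, 2):
        print(entry)
```

With 4 contexts each one gets 32 registers, and with 8 each gets 16, which gives a feel for why the threads have to be small.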
 
dukmahsik said:
Yes, it's harder to code for, which will hopefully push devs to squeeze more out of it.

This seems contradictory. Devs will be pushed to get more out of a CPU that is harder to code for than out of an easier one?
 
Alpha_Spartan said:
I'm not too sure about this and I would like some clarification.

Doesn't the Cell have a FLOPS advantage but also an integer disadvantage, because it has one main core versus the general-purpose cores on the Xbox 360?

Does a lot of FLOPS power mean that the CPU will be able to push more geometry?

Also, what is more important for making games, FLOPS power or integer power?

The geometry originates from the CPU, right?

I ask this because I know that only with DX10 will all the geometry be created directly on the GPU.

The Xbox 1 CPU is only 3 GFLOPS versus the PS2 CPU, which has 6 GFLOPS of performance, but the PS2 CPU is not better at pushing games.

Do any of you think that the PS3 would have been more powerful with, for example, a dual-core AMD CPU?

I would like to know if this FLOPS power will translate into something or will be useless.
 
seismologist said:
How do they calculate the Cell FLOPS advantage? Is that based on physical FPU units, or does it account for parallelism among SPEs?

Not sure what you mean, but of course, it includes the SPUs.

It works out as

PPU: 12 floating point ops per clock * 3.2 GHz = 38.4 GFLOPS
SPU: 8 floating point ops per clock * 3.2 GHz = 25.6 GFLOPS per SPU; 25.6 * 7 SPUs = 179.2 GFLOPS
Total: 38.4 + 179.2 = 217.6 GFLOPS
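If you want to check the arithmetic yourself, here is a minimal Python sketch of the same peak calculation. The per-clock figures and the 3.2 GHz clock are the ones quoted in this post; the function names are just for illustration. These are theoretical peaks and say nothing about sustained throughput.

```python
# Figures as quoted in this thread; theoretical peaks only.
CLOCK_GHZ = 3.2
PPU_FLOPS_PER_CLOCK = 12
SPU_FLOPS_PER_CLOCK = 8
SPU_COUNT = 7  # SPUs counted for the PS3's Cell in this thread


def peak_gflops(flops_per_clock, clock_ghz=CLOCK_GHZ, units=1):
    """Peak GFLOPS = ops per clock * clock in GHz * number of units."""
    return flops_per_clock * clock_ghz * units


def cell_peak_gflops():
    """PPU peak plus the combined peak of all SPUs."""
    ppu = peak_gflops(PPU_FLOPS_PER_CLOCK)
    spus = peak_gflops(SPU_FLOPS_PER_CLOCK, units=SPU_COUNT)
    return ppu + spus


if __name__ == "__main__":
    print(f"PPU:   {peak_gflops(PPU_FLOPS_PER_CLOCK):.1f} GFLOPS")
    print(f"SPUs:  {peak_gflops(SPU_FLOPS_PER_CLOCK, units=SPU_COUNT):.1f} GFLOPS")
    print(f"Total: {cell_peak_gflops():.1f} GFLOPS")
```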

iknowall said:
Doesn't the Cell have a FLOPS advantage but also an integer disadvantage, because it has one main core versus the general-purpose cores on the Xbox 360?

If you mean mathematical integer, I don't think that's clear at all. I don't think we've seen figures for integer ops on Cell, but the SPUs are capable of them.

iknowall said:
I ask this because I know that only with DX10 will all the geometry be created directly on the GPU.

I don't think all, certainly, or the vast majority even. You'll be able to create vertices on the GPU though.

iknowall said:
The Xbox 1 CPU is only 3 GFLOPS versus the PS2 CPU, which has 6 GFLOPS of performance, but the PS2 CPU is not better at pushing games.

They're fundamentally different chips. The PS2 CPU was in-order, and its main core wasn't anything to write home about, or so I'm told. The PS2 CPU was also doing work that Xbox's GPU did instead, so its function was a dual one really.

iknowall said:
Do any of you think that the PS3 would have been more powerful with, for example, a dual-core AMD CPU?

I don't think so, no.
 
Every floating point processing unit in the Cell added together. That's PPE + 7 SPEs' worth. Cell should be able to maintain very good efficiency in feeding these in many situations due to the local store (LS) and structured data access, so I'd expect the real-world float performance efficiency to be higher than most processors in these situations.
 
scificube said:
Cell has the ability to access the XDR and GDDR3 memory pools for its own needs and in assistance to the GPU, via its on-chip XDR memory controller and FlexIO interface. Both the XDR controller and FlexIO appear to be faster than Xenon's FSB.

I always thought, and correct me if I'm wrong, that the GPU has access to the XDR and GDDR3 RAM, while the Cell only has access to the 256 MB of XDR RAM. Again, I'm a little confused in this area.

Do they both have access to both RAM pools?
 
drpepper said:
Do they both have access to both RAM pools?

Seems so:

Kutaragi said:
CELL and RSX have close relationship and both can access the main memory and the VRAM transparently. CELL can access the VRAM just like the main memory, and RSX can use the main memory as a frame buffer. They are just separated for the main usage, and do not really have distinction.
 
iknowall said:
Doesn't the Cell have a FLOPS advantage but also an integer disadvantage, because it has one main core versus the general-purpose cores on the Xbox 360?
Depends what you mean by integer disadvantage.

Does a lot of FLOPS power mean that the CPU will be able to push more geometry?
Process geometry, yes.

Also, what is more important for making games, FLOPS power or integer power?
Depends whether you write your game using ints or floats! These days floats are more important, as 3D data is represented as such.

The geometry originates from the CPU, right?
The geometry originates in a 3D graphics package ;) The models are stored in RAM. The CPU processes this model data for animation, 3D positioning, etc.

I ask this because I know that only with DX10 will all the geometry be created directly on the GPU.
Nope, it'll still originate with the artists and the 3D graphics applications ;)

The Xbox 1 CPU is only 3 GFLOPS versus the PS2 CPU, which has 6 GFLOPS of performance, but the PS2 CPU is not better at pushing games.
The PS2 also had to do a lot of processing for the graphics which the XB CPU didn't, but there are lots of reasons why FLOPS alone aren't a comparable measure of CPU performance.

Do any of you think that the PS3 would have been more powerful with, for example, a dual-core AMD CPU?
I don't. I think Cell's a clever architecture for solving the demands made of it.

I would like to know if this FLOPS power will translate into something or will be useless.
*sigh* There are plenty of other people who seem to think Sony, IBM and Toshiba got together to design a new CPU and produced something useless. Do you honestly believe them that stupid and incompetent?
 
iknowall said:
Doesn't the Cell have a FLOPS advantage but also an integer disadvantage, because it has one main core versus the general-purpose cores on the Xbox 360?

Cell actually has superior integer performance to the XeCPU, though the majority of that is on the SPEs, and thus harder to access and utilize(?). That 'three times the general purpose' talk came from Major Nelson comparing the Cell's single PPE to the XeCPU's triple Power-core arrangement. Granted, what would you expect, with him dismissing the SPEs for the 'DSPs' they are?
 
I would think the overall need to parallelize code in general would be where the real difficulties lie.

Then again, I don't see why devs wouldn't take the easy way out. Both physics and graphics are supposed to be embarrassingly parallel tasks, so why not spread these tasks over the cores of these CPUs and then run AI and game code in a serial fashion on a separate core dedicated to just these things? I would think the equivalent of a 3.2 GHz processor dedicated solely to AI would be enough to get developers impressive results.

I don't see why both CPUs couldn't go this route, as there would seem to be enough available threads and horsepower to pull this sort of thing off.

1-2 threads for AI and game code
2-3 for snazzy new CPU assisted graphics (vertex shading/processing/lighting, post processing, procedural synthesis...AA maybe?)
2-3 for snazzy new physics sims (pure physical interactions, physics/animation, physics/graphics)
1-2 threads for 3D sound, system I/O etc.
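As a quick sanity check on that split, here is a small Python sketch that adds up the low and high ends of each range and compares them with the hardware thread counts mentioned earlier in the thread (6 for Xenon, 9 for Cell in the PS3). The job labels are shortened from the list above and the helper names are made up for this example. Software threads can of course outnumber hardware threads, so "doesn't fit" here only means the threads would have to share.

```python
# (min, max) threads per job, copied from the list above.
THREAD_PLAN = {
    "AI and game code": (1, 2),
    "CPU-assisted graphics": (2, 3),
    "physics": (2, 3),
    "sound and system I/O": (1, 2),
}

# Hardware thread counts as quoted earlier in this thread.
HARDWARE_THREADS = {"Xenon": 6, "Cell (PS3)": 9}


def plan_range(plan):
    """Return (lowest total, highest total) threads the plan asks for."""
    low = sum(lo for lo, _ in plan.values())
    high = sum(hi for _, hi in plan.values())
    return low, high


def fits(plan, hw_threads):
    """Does the low end / high end of the plan fit without sharing threads?"""
    low, high = plan_range(plan)
    return {"minimum_fits": low <= hw_threads, "maximum_fits": high <= hw_threads}


if __name__ == "__main__":
    print("Plan asks for between %d and %d threads" % plan_range(THREAD_PLAN))
    for cpu, threads in HARDWARE_THREADS.items():
        print(cpu, threads, fits(THREAD_PLAN, threads))
```

The low end of every range adds up to exactly 6 threads, and the high end to 10, so the fullest version of the plan would need a little sharing on either chip.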

The point being, one may not necessarily need to solve the problems associated with parallelizing game code and AI if one can isolate them to a few threads by themselves and then use these CPUs' raw power towards what it appears they were designed for.

The problem is then synchronization of all this activity in the game code rather than parallelizing the game code itself, and this of course includes AI, which doesn't need computational power so much as conditional operations and memory accesses. Given some notable developers' remarks, synchronization would seem the lesser evil when compared to reworking serial tasks into parallel tasks where it is not obvious how to do so.
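To make the shape of that idea concrete, here is a rough Python sketch of one frame: the AI and game logic run serially, the physics work is fanned out in chunks to a pool of workers, and the frame waits for every chunk before moving on. The toy AI rule, the function names and the chunk count are all invented for illustration. Python threads also won't give real speedups for this kind of math, so read it as a picture of the structure, not of the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def ai_step(positions):
    """Serial game/AI logic: pick a velocity per entity (toy rule)."""
    return [1.0 if p < 0 else -1.0 for p in positions]


def integrate_chunk(positions, velocities, dt):
    """Parallel-friendly physics: every entity is independent of the others."""
    return [p + v * dt for p, v in zip(positions, velocities)]


def run_frame(positions, dt, pool, chunks=4):
    """One frame: serial AI first, then chunked physics on the worker pool."""
    velocities = ai_step(positions)              # serial part, one thread
    size = max(1, -(-len(positions) // chunks))  # ceiling division
    futures = [
        pool.submit(integrate_chunk, positions[i:i + size], velocities[i:i + size], dt)
        for i in range(0, len(positions), size)
    ]
    # Synchronization point: the frame is not done until every chunk returns.
    result = []
    for future in futures:
        result.extend(future.result())
    return result


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(run_frame([-2.0, -1.0, 0.0, 1.0, 2.0], 0.5, pool))
```

Note that none of the AI code had to be rewritten to run in parallel. The only new burden is the wait at the end of the frame, which is the synchronization cost being talked about here.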

Of course this isn't ignoring that each CPU presents its own unique challenges, such as Xenon's small cache size and the relatively small size of the SPU's LS, but these are the kinds of things console developers are accustomed to dealing with. I've repeated this a few times, so if I'm wrong I beg forgiveness, but it was only last gen that console devs had OoOE processors to work with (G3 hybrid in GCN and Celeron in Xbox), so in-order (IO) execution shouldn't be something they're unfamiliar with or haven't seen in ages. Also, given how the PS2 was ...difficult to tap due to both architectural and support issues, I would think console devs would be adept at taking advantage of these CPUs. Not to mention support this time around seems far and away better all around for everyone.
 