Cell benchmarked

Lord Darkblade said:
The TRE line (grey highlighted) @30fps shows using the cell BE, which presumably is the Broadband Engine, or 4 Cell chips coupled together... ie: 4 PPC + 32 SPEs iirc.

Is that correct? If so the Cell isn't realistically rendering that data, it's probably achieving 1/4th of that or around 7.5fps...

Darkblade, I think you're confusing the specific 'Broadband Engine' referred to in the original patent with the 'Broadband Engine' of today, which just refers to the architecture overall, and which Cell is currently the only processor representing.

I know where you're coming from though - *the* 'Broadband Engine' was indeed indicated in the patent drawings to be four 'Processing Elements,' or what we've come to know as Cells. But the terrain demo was just one Cell.


@Shifty: The 50fps stat is for one Cell scaled up to 3.2 GHz from the 2.4 GHz at which the demo was conducted. The link One provided gives the raw data for the different configurations and their respective fps rates. The dual 2.4 Cell config got 75 fps.
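As a quick sanity check on those numbers, here is a minimal sketch that assumes frame rate scales linearly with clock speed and with the number of chips (an assumption for illustration, not something IBM's data states). The helper name `scale_fps` is made up for this example.

```python
def scale_fps(fps, from_mhz, to_mhz):
    """Scale a frame rate, assuming it is proportional to clock speed."""
    return fps * to_mhz / from_mhz


# 50 fps is the figure for one Cell scaled up to 3.2 GHz; undo the scaling.
single_2400 = scale_fps(50, from_mhz=3200, to_mhz=2400)
print(single_2400)      # 37.5

# Two 2.4 GHz Cells, assuming perfect scaling across chips.
print(2 * single_2400)  # 75.0
```

Under those assumptions the scaled 50 fps figure and the 75 fps dual 2.4 GHz result are consistent with each other.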
 
Titanio said:
Actually, we know nothing about the level of optimisation on the G5. How do you know they did not implement these optimisations?

I do not know what optimizations were made on the other platforms. IBM is saying nothing about them, which is at least suspicious.

Shifty's explanation was that the G5 stalls most of the time because of cache latencies. IMO that explanation does not make sense if the program has been coded properly.
So, that raytracing program is basically working with 2 bitmaps. Access order makes a massive difference on any cached CPU. For example, accessing the bitmaps in this order:
(column 0, row 0), (column 1, row 0), (column 2, row 0)... (column N, row 0), (column 0, row 1)...
performs much better than in this order
(column 0, row 0), (column 0, row 1), (column 0, row 2)... (column 0, row N), (column 1, row 0)...
in a G5, but exactly the same in a SPE if the data fits in the LS.

Yes, the result is the same. The performance can be many times worse. How tuned is the G5 version of the code? No idea, but assuming it has received half of the optimization effort of the Cell version, I fail to see how it runs 4 times slower.
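To make the two access orders concrete, here is a minimal sketch that assumes the bitmap is stored row by row in one linear buffer. It only computes which memory offsets each traversal touches; it does not measure timing, and the names (`WIDTH`, `HEIGHT`, `row_order_offsets`, `column_order_offsets`) are hypothetical, not taken from IBM's code.

```python
# Hypothetical illustration: a bitmap stored row by row in a linear buffer.
WIDTH, HEIGHT = 4, 3  # assumed tiny bitmap, for readability only


def offset(column, row, width=WIDTH):
    """Linear offset of a pixel in a row-by-row (row-major) layout."""
    return row * width + column


def row_order_offsets(width=WIDTH, height=HEIGHT):
    """(column 0, row 0), (column 1, row 0), ... then the next row."""
    return [offset(c, r, width) for r in range(height) for c in range(width)]


def column_order_offsets(width=WIDTH, height=HEIGHT):
    """(column 0, row 0), (column 0, row 1), ... then the next column."""
    return [offset(c, r, width) for c in range(width) for r in range(height)]


def strides(offsets):
    """Distance in pixels between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


print(strides(row_order_offsets()))     # every step is +1
print(strides(column_order_offsets()))  # steps of +WIDTH, with jumps back
```

With a real bitmap, a stride of `WIDTH` pixels means consecutive accesses in the second order tend to land in different cache lines, which is the effect described above for a cached CPU. In a local store with uniform access latency the order would not matter.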
 
msia2k75 said:
I guess it was a 3.2Ghz CELL.

My bad. To clarify, the 50x performance improvement was a single 3.2Ghz Cell vs a 2Ghz G5. So 35x the performance for a single 3.2Ghz Cell vs a 2.7Ghz G5 seems to make sense. The dual blade was giving 75x the performance of the G5.
 
Cell and Cell Broadband Engine seem to be used interchangeably. Perhaps it would be easier to talk about the implementation we will find in Playstation 3 by referring to that as DD2. (but I don't know that it is true that DD2 is the one that will be in the final spec of PS 3, so...I guess that could end up being equally confusing.)
 
Shifty Geezer said:
Microseconds. That's 1/1000th a millisecond and 1000* a nanosecond. At 1GHz, 1 microsecond = 1000 cycles. At 3.2 GHz, that's 3200 cycles. 20 of those is 64000 cycles. Sounds like a lot to me! Not that I imagine one to be context switching SPE's very often. That'd go against their ideal function.
So SPE's are at their best when they are working on one thread at a time? Makes sense looking at their design.
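The cycle arithmetic in the quote can be checked with a few lines. This is just a sketch of the unit conversion; the function name `cycles` is made up for the example, and the 20 microsecond figure is the one used in the quote.

```python
def cycles(duration_us, clock_ghz):
    """Clock cycles elapsed in duration_us microseconds at clock_ghz GHz."""
    # 1 GHz is 1000 cycles per microsecond; round() removes float noise.
    return round(duration_us * clock_ghz * 1000)


print(cycles(1, 1.0))   # 1000
print(cycles(1, 3.2))   # 3200
print(cycles(20, 3.2))  # 64000
```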
 
Alpha_Spartan said:
So SPE's are at their best when they are working on one thread at a time? Makes sense looking at their design.

Yes, pretty much. It's not coincidental that a good approach to the SPUs in your code will address many of the same issues that multithreading does (e.g. memory blocking - multithreading gets around this by switching to another thread. In SPU code, you want to avoid memory blocking altogether, so you make your data access predictable ;)).
 
wireframe said:
Cell and Cell Broadband Engine seem to be used interchangeably. Perhaps it would be easier to talk about the implementation we will find in Playstation 3 by referring to that as DD2. (but I don't know that it is true that DD2 is the one that will be in the final spec of PS 3, so...I guess that could end up being equally confusing.)
Cell is actually the misnomer here. Cell is the architecture, not a specific implementation. A Cell of a 1:4 PPE:SPE ratio is as much a Cell as is a 1:8. The Cell architecture has, if I remember right, one or more PPE coupled to one or more SPE's on a ring bus. Because at the moment there's only one type of Cell, the 1:8 config, that's been named as Cell. But it's really one implementation that is, AFAIK, designated the Broadband Engine as the device featuring in PS3. That is PS3 has a BE instead of the old EE, and every other 1:8 is thus a BE. DD2 is only a revision of the PPE, and by all accounts it's the only Cell out there. DD1 was quickly superseded.

The issue's not helped by the fact IBM seem to reference inconsistently too!
 
Shifty Geezer said:
Cell is actually the misnomer here. Cell is the architecture, not a specific implementation. A Cell of a 1:4 PPE:SPE ratio is as much a Cell as is a 1:8. The Cell architecture has, if I remember right, one or more PPE coupled to one or more SPE's on a ring bus. Because at the moment there's only one type of Cell, the 1:8 config, that's been named as Cell. But it's really one implementation that is, AFAIK, designated the Broadband Engine as the device featuring in PS3. That is PS3 has a BE instead of the old EE, and every other 1:8 is thus a BE. DD2 is only a revision of the PPE, and by all accounts it's the only Cell out there. DD1 was quickly superseded.

The issue's not helped by the fact IBM seem to reference inconsistently too!
So, what you are saying with all of that is that Cell Broadband Engine refers specifically to the implementation for Playstation 3?
 
one said:
"Cell Broadband Engine" is just a trademark of SCEI for the Cell microprocessor (as "Cell Processor" is already taken by someone)
http://www-128.ibm.com/developerworks/power/library/pa-cellperf/

Ok, so when talking about PS 3 we are really talking about Cell BE or DD2 (codename). I thought even IBM was using these terms interchangeably. Maybe this is due to the mass of early marketing material focusing on the PS 3 implementation.
 
wireframe said:
So, what you are saying with all of that is that Cell Broadband Engine refers specifically to the implementation for Playstation 3?

I think Shifty just means if you have a Cell chip with a Power based main PPE core and 8 SPE's on die, you have a 'Cell Broadband Engine,' and that is what will be going into PS3. But it will also be going into the Mercury Systems servers and such.

Right now we have:

"Cell Broadband Engine Architecture": referring to the architectural model described in the patent and implemented first in 'Cell.'

"Cell Broadband Engine": The Cell we all know and love, and a Sony trademark I guess.

"Broadband Engine": Used interchangeably with the above? Things are getting confusing...

"Cell": etc etc...

So a lot of overlap it seems.

DD2 is a reference to the second revision of the Cell chip, which only revised the PPE core. There will likely be future revisions of both the 'main' core and the SPE's.

I could easily envision a future 'side-project' revision by Sony or Toshiba called 'Matrix,' maybe with tweaked SPE's and utilizing a non-Power main core. And it'll be like "Matrix Broadband Engine," a subset of the "Cell Broadband Engine Architecture" model.
 
DarkRage said:
How tuned is the G5 version of the code? No idea, but assuming it has received half of the optimization effort of the Cell version, I fail to see how it runs 4 times slower.
If you doubt IBM's effort on their own PPC970, the MPEG2 decoding test may be more interesting for you as the paper mentions a test Intel conducted with their own Intel Performance Primitive library.
 
xbdestroya,

I've always understood Cell to refer to the concept, the fundamental architecture of the design. My only confusion, it seems, was if Cell Broadband Engine was the full name that they sometimes shortened to Cell in documentation. It has since been clarified (above) that Cell Broadband Engine is the name SONY has given their implementation for the Cell chip in PS 3 (and perhaps they will apply this same name to other devices that have a different PPE/SPE arrangement?)
 
wireframe said:
The Apple G5s seem to have a variable FSB that is 1/3 the core clock. link

PS. Or, depending on how you want to look at it, the G5 has a fixed FSB and Pentium 4 and the like have a variable one.

http://www.apple.com/g5processor/architecture.html

So 16GB/s? Which is 62.5% of the bandwidth available to Cell (25.6GB/s).

Makes me wonder about that "40% of cycles are waiting for memory" statistic... Obviously, there's no breakdown of the impact of latency versus bandwidth on that statistic.
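For reference, the percentage above works out as follows. This is a minimal sketch using only the two bandwidth figures quoted in this post; the constant names are made up, and it says nothing about latency.

```python
G5_FSB_GBPS = 16.0  # G5 figure quoted in this post
CELL_GBPS = 25.6    # Cell figure quoted in this post

ratio = G5_FSB_GBPS / CELL_GBPS
print(f"{ratio:.1%}")  # 62.5%
```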

Jawed
 
wireframe said:
xbdestroya,

I've always understood Cell to refer to the concept, the fundamental architecture of the design. My only confusion, it seems, was if Cell Broadband Engine was the full name that they sometimes shortened to Cell in documentation. It has since been clarified (above) that Cell Broadband Engine is the name SONY has given their implementation for the Cell chip in PS 3 (and perhaps they will apply this same name to other devices that have a different PPE/SPE arrangement?)

I think they're all using the terms interchangeably to be honest. IBM clearly uses Cell Broadband Engine fairly frequently, and Sony often uses Cell alone as well. CBEA is the technical term with which to refer to the architecture as a whole, and I think until they get a fundamentally different implementation of Cell out there, we're not really going to have all that much guidance on what term is limited to what.
 
PC-Engine likes to brag that the Cell doesn't have double precision floating point capabilities, or that they're diminished in some way, yet the article points out a test using Linpack for the DP floating point benchmark. So the Cell is capable of DP-FP work?

What is PC-Engine going on about?
 
I think it's safe to assume wherever you read Cell BE they're talking about a 1:8 configuration. Cell for the time being also means a 1:8 config, but that might change in the not too distant future as more varieties appear, in which case the term Cell will just reference the family in the same way PPC and x86 do.
 
drpepper said:
PC-Engine likes to brag that the Cell doesn't have double precision floating point capabilities, or that they're diminished in some way, yet the article points out a test using Linpack for the DP floating point benchmark. So the Cell is capable of DP-FP work?

What is PC-Engine going on about?

Cell's DP power is inferior to its SP power, that much is true. But its DP power is still quite competitive with other contemporary processors. Quite good actually.
 
one said:
If you doubt IBM's effort on their own PPC970, the MPEG2 decoding test may be more interesting for you as the paper mentions a test Intel conducted with their own Intel Performance Primitive library.

Yes, I noticed, and I have no complaint about that result. It makes sense to me (which basically means it is aligned with my expectations), though I can be wrong. Impressive, but expected. I guess the output is exactly the same, though that is not very clear from IBM's document.

But I am not that confident in those benchmarks showing 40 times faster (or 4 times with a single SPU) with no code to compare with and no information regarding optimizations on the other platform.
 