IBM unveils Cell roadmap

In performance per watt they're pretty bloody useless compared to the state of the art.

http://www.clearspeed.com/products/cs_advance/

50 GFLOPs at 25W for a 2 chip add-in board. Currently Cell is doing ~25 GFLOPs and consumes more than 25W. How much more I dunno, IBM is pretty secretive as far as I can tell - making a lot of noise about performance per watt, but not coming clean. But it doesn't really matter because Cell is currently just a placeholder for DP Cell.
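For what it's worth, the comparison works out roughly like this. Note the Cell power draw used below is an assumption purely for illustration, since all we know is "more than 25W":

```python
# Back-of-envelope performance-per-watt from the figures above.
# ClearSpeed's numbers are from their product page; the Cell power
# draw is an ASSUMPTION (IBM hasn't published one) -- 40 W is used
# here only to illustrate "more than 25 W".
clearspeed_gflops, clearspeed_watts = 50.0, 25.0
cell_gflops, cell_watts = 25.0, 40.0  # assumed power draw

clearspeed_eff = clearspeed_gflops / clearspeed_watts
cell_eff = cell_gflops / cell_watts

print(f"ClearSpeed: {clearspeed_eff:.2f} GFLOPs/W")  # 2.00
print(f"Cell (assumed 40 W): {cell_eff:.2f} GFLOPs/W")
```

Even if the real Cell draw is lower than the assumed 40W, it would have to be under ~12.5W to match ClearSpeed's ratio at 25 GFLOPs.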

Jawed
You are kidding, right? ClearSpeed is a joke/toy compared to CELL. It is far less general purpose, has less than 1/3 of the memory per core that a PS2's vector unit has, and it is severely bandwidth limited :rolleyes:
CELL would destroy that thing in the vast majority of cases in any real-world application
 
The argument is staring you in the face: HPC double-precision GFLOPs is anything but "general purpose, real world application".

Jawed
 
Thanks!
I guess it's also fair to assume that a lot of HPC applications will not be as tuned as future PS3 games, where the dependency on the PPE will be significantly reduced as predicted by DeanoC, and that HPC apps cannot be tuned to the same degree since they cannot rely on a fixed hardware configuration.
Most of them will also likely run Linux and use some legacy cluster software to help distribute the work, which will also benefit from more PPE power.
HPC could very well be much more finely tuned than a game can be.
Groups that have large computing systems are often used to creating custom code, and often have much more time to tune than a game developer.

Games may also have other software components that can burden the PPE, which may not be the case for some HPC settings, where a job can be trimmed down to just a single purpose.

It really depends on the individual customer and what they need the computation power for.

The Cell IBM will be selling for HPC will probably be more dependent on powerful PPEs than the Cell of the PS4.
I'm not sure there's any guarantee of that. There are a lot of ways to compensate for weaker PPEs, though it will still depend on the situation.

IBM may be motivated to have stronger PPEs if CELL faces stronger competition from other architectures, which will likely be asymmetric by 2010 as well.
 
I think the Roadrunner model itself is an indicator that the weak PPE can be worked around; in fact in that system to a certain extent the Opterons are serving as a sort of souped up PPE stand-in to throw out tasks to the Cell processors... and by Cell we mean SPEs.

I think Cell has a pretty decent future in HPC; there are tasks it is just plainly suited to, and its adoption rate is already surprisingly good when you figure it started from 'zero' just a year ago. I do expect the PPE to improve in some senses, because it really has been marked as a sort of weak spot in the architecture (and it would strike me as a bit lazy to leave things be), but at the same time we know that as long as the PPE drag is 'liveable,' the SPEs are really the focus here.
 
The argument is staring you in the face: HPC double-precision GFLOPs is anything but "general purpose, real world application".

Jawed

Cell beat Clearspeed hands down on performance for the next fastest supercomputer contract which IBM won recently. Without enough local store and communications bandwidth, most of the Cell's real world performance advantages evaporate.
 
Cell beat Clearspeed hands down on performance for the next fastest supercomputer contract which IBM won recently. Without enough local store and communications bandwidth, most of the Cell's real world performance advantages evaporate.

Except you mean Clearspeed in the last sentence.
 
I wonder if they might not consider putting out-of-order hardware back into the PPEs, given the extra real estate? If they are considering multiplying the area of the SPE assemblies by a factor of four while only doubling the PPEs, it might be nice to scale those up a bit.
 
The argument is staring you in the face: HPC double-precision GFLOPs is anything but "general purpose, real world application".
On B3D we argue, we discuss, we try to bring facts to the table, not marketing one liners. Now if you have something substantial to counter what I wrote that's fine, otherwise stop here.
 
They will for sure have to beef up the PPE to be able to orchestrate all the SPEs. I mean, if it is difficult to handle 8 SPEs manually now, I wonder how it will be to handle 32 SPEs and see that they all stay in sync. Even crazier would be if several SPEs have to work on the same task. Maybe it would work if you have large groups of engineers writing specific code for specific applications/problems, but that must cost a lot of money and I don't see that being too desirable in game development. Furthermore, this would also make them more suitable for maybe having in normal PCs at some point, which raises the question: does IBM have any desire whatsoever to see those things in every man's home sometime in the future, or are they only targeting supercomputers and so on?...
 
Astronomy and biology both solve the same problem?

Actually it depends; it is surprising how similar they are many times. Deconvolution, for example, is an image processing method that was developed to remove haze and background to bring clarity to images of stars captured by telescopes. That same method is now used to deconvolve images of cells captured by fluorescence microscopy...
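As a toy illustration of why the same method carries over: a minimal 1-D Richardson-Lucy deconvolution sketch in Python/NumPy. The signal and PSF here are made up for illustration; real astronomy and microscopy pipelines use measured PSFs and 2-D/3-D data.

```python
import numpy as np

def richardson_lucy(observed, psf, iterations=30):
    """Minimal 1-D Richardson-Lucy deconvolution (illustrative only)."""
    # Start from a flat estimate carrying the observed signal's mean.
    estimate = np.full_like(observed, observed.mean())
    psf_mirror = psf[::-1]
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        # Multiplicative update; guard against division by zero.
        ratio = observed / np.maximum(blurred, 1e-12)
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate

# Blur a sharp point source with a simple symmetric PSF, then recover it.
signal = np.zeros(64)
signal[32] = 1.0
psf = np.array([0.25, 0.5, 0.25])
blurred = np.convolve(signal, psf, mode="same")
restored = richardson_lucy(blurred, psf)
# The restored signal re-concentrates the blurred energy at the spike.
```

The algorithm doesn't care whether the point source is a star or a fluorophore, only that the blur can be modelled as convolution with a PSF.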
 
Furthermore, this would also make them more suitable for maybe having in normal PCs at some point, which raises the question: does IBM have any desire whatsoever to see those things in every man's home sometime in the future, or are they only targeting supercomputers and so on?...
Do you mean: Will Sony and Toshiba be shafted by IBM in the same way as Apple?

Probably not, as their relationship is different. Sony and Toshiba are probably strong enough to continue their own development if IBM goes off in a different direction. It may also be a division of the market that all parties will benefit from while still sharing some of the development costs.
 
The ring-bus would likely need some work as well. Otherwise, some of those SPEs are going to be very distant from the PPE and DMA engine(s) indeed.

Here comes the cross-bar switch ;), like some of the CELL BE designers wanted but could not implement for PLAYSTATION 3's BE (not enough time, and the cost would have been a bit high to justify the jump to a fully switched solution over the current ring bus topology).
 
Here comes the cross-bar switch ;), like some of the CELL BE designers wanted but could not implement for PLAYSTATION 3's BE (not enough time, and the cost would have been a bit high to justify the jump to a fully switched solution over the current ring bus topology).

Wouldn't it be very hard to arrange 32 SPEs & a couple of other devices around a switch without having wildly varying trace lengths between switch and component? I can only imagine it would either need "hops" for the more distant components, or you would have to clock it conservatively by looking at the slowest (most distant) link.
The first solution would likely be better if you can use that hop for another component - i.e. multiple "ring buses" into a crossbar. Or a grid-like setup, so you don't lose the modularity of the ring bus (easy to add/remove SPEs; a crossbar would have to be redesigned to support additional elements).
 
The current Cell delivers good DP FP performance on HPC applications, and that's before refactoring is used to allow the use of multi-precision libraries. Full speed DP support in 2008 is the icing on the cake, along with large memory sizes. It's a much more balanced and capable processor than you'd perhaps expect.
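The multi-precision libraries mentioned above build extra precision out of error-free float operations. A minimal sketch of the underlying trick, assuming the standard Knuth two-sum plus compensated summation (Python doubles stand in here for the hardware's fast SP registers):

```python
# "Double-single" style trick: carry the rounding error of each float
# operation explicitly, so two lower-precision values together behave
# like one higher-precision value.

def two_sum(a, b):
    """Knuth's error-free transformation: a + b == s + e exactly."""
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e

def compensated_sum(values):
    """Sum while accumulating the rounding errors separately."""
    total, comp = 0.0, 0.0
    for v in values:
        total, e = two_sum(total, v)
        comp += e  # the error term recovers lost low-order bits
    return total + comp

vals = [0.1] * 10
result = compensated_sum(vals)  # typically closer to 1.0 than sum(vals)
```

Refactoring a kernel to use transformations like this is how an SP-fast machine can still deliver respectable DP-equivalent results, at the cost of several SP operations per logical one.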
 
The current Cell delivers good DP FP performance on HPC applications, and that's before refactoring is used to allow the use of multi-precision libraries. Full speed DP support in 2008 is the icing on the cake, along with large memory sizes. It's a much more balanced and capable processor than you'd perhaps expect.

Are you saying the size of the SPEs' local store will also be increased in the enhanced CELL version coming out in 2008? Any links?
 
Are you saying the size of the SPEs' local store will also be increased in the enhanced CELL version coming out in 2008? Any links?

I meant that the blades with the enhanced DP Cell will support larger memory sizes, up to 32 GB,
whereas today they ship with 1 GB.
 
I meant that the blades with the enhanced DP Cell will support larger memory sizes, up to 32 GB,
whereas today they ship with 1 GB.

And this really is crucial.
DP weakness as compared to SP was liveable, but the memory support is a dealbreaker. Of the two, I'd point to main memory support as by far the greater weakness for scientific codes. (Bearing in mind that generalizing about scientific code is difficult.)
 