Old 25-Nov-2007, 04:00   #1
stof
Registered
 
Join Date: Nov 2006
Posts: 3
Default Does the Cell processor have a chance?

I occasionally hear hype about the Cell processor, but I wonder if it really has a chance. It seems to have fatal flaws.

The upside of the Cell is that it has 200 GFLOPS peak performance per chip. This performance number comes from each SPU running at 3.2 GHz, able to perform 4 multiplies & 4 adds simultaneously, which is 25 GFLOPS per SPU, times the 8 SPUs on the chip.
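The peak figure above is just clock rate times floating-point operations per cycle times the number of units. A minimal sketch of that arithmetic in Python, using only the numbers quoted in the post (the helper name `peak_gflops` is hypothetical, introduced here for illustration):

```python
def peak_gflops(clock_ghz, flops_per_cycle, units):
    """Theoretical peak in GFLOPS: clock (GHz) x flops per cycle per unit x unit count."""
    return clock_ghz * flops_per_cycle * units

# Figures quoted in the post: 3.2 GHz, 4 multiplies + 4 adds per cycle, 8 SPUs.
per_spu = peak_gflops(3.2, 4 + 4, 1)
per_chip = peak_gflops(3.2, 4 + 4, 8)

print(f"per SPU:  {per_spu:.1f} GFLOPS")
print(f"per chip: {per_chip:.1f} GFLOPS")
```

This gives 25.6 GFLOPS per SPU and 204.8 GFLOPS per chip, which the post rounds to 25 and 200. It is a theoretical ceiling only; it says nothing about how much of it a real workload reaches.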

I wonder if you can really get to 50% of peak performance.

A single modern Intel core running at 3 GHz can do 4 multiplies or adds at the same time, which is 12 GFLOPS. It's not that hard to get to peak performance of an x86-64 CPU. That's 100 GFLOPs too.
And you can put 8 of them in a cheap box.


A major problem with the Cell is that it uses expensive XDR memory and you can only put 2 Gbytes on a node. That is very limiting. A Cell blade is very expensive, ~$10,000. And, Cell isn't improving as fast as Intel/AMD is.

So, the Cell doesn't look that great with price/performance, it has limited memory, it has little software infrastructure, and uncertainty with its future.

Does it have a chance?
stof is offline   Reply With Quote
Old 27-Nov-2007, 21:41   #2
Vitaly Vidmirov
Member
 
Join Date: Jul 2007
Location: Russia
Posts: 96
Default

Quote:
Originally Posted by stof View Post
I wonder if you can really get to 50% of peak performance.
It is possible to get 99% of peak performance on certain tasks like matrix multiply.

Quote:
and uncertainty with its future.
So what did you expect? CELL in place of x86?
x86 is not the most popular processor in the world, anyway.
Vitaly Vidmirov is offline   Reply With Quote
Old 27-Nov-2007, 22:11   #3
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

Quote:
Originally Posted by stof View Post
I occasionally hear hype about the Cell processor, but I wonder if it really has a chance. It seems to have fatal flaws.

The upside of the Cell is that it has 200 GFLOPS peak performance per chip. This performance number comes from each SPU running at 3.2 GHz, able to perform 4 multiplies & 4 adds simultaneously, which is 25 GFLOPS per SPU, times the 8 SPUs on the chip.

I wonder if you can really get to 50% of peak performance.
This is highly dependent on workload, and for many problem types and system sizes, 50% utilization would be something any architecture would kill for.

Quote:
A single modern Intel core running at 3 GHz can do 4 multiplies or adds at the same time, which is 12 GFLOPS. It's not that hard to get to peak performance of an x86-64 CPU. That's 100 GFLOPs too.
And you can put 8 of them in a cheap box.
It's also not too hard to make x86 run below peak. Even on Linpack, the going rate is something like 80% peak, and Linpack is a standard benchmark everyone targets.
Since memory latency and bandwidth have become so important, the greater control Cell has over memory access is in many areas far superior to the current broadcast coherency schemes of x86 chips.

It should also be noted that the x86 system that can hit 100 GFLOPS does so with two chips with TDPs of 120W.
That's several times the TDP of one Cell.
Power concerns are going to be dominant from now on, as they now constrain clock speeds, system footprint, and operating costs for a system.

Quote:
A major problem with the Cell is that it uses expensive XDR memory and you can only put 2 Gbytes on a node. That is very limiting. A Cell blade is very expensive, ~$10,000. And, Cell isn't improving as fast as Intel/AMD is.
A valid point, which is why the HPC variant of Cell uses DDR2.
I'll cover the volume and price considerations at the end of this.

Quote:
So, the Cell doesn't look that great with price/performance, it has limited memory, it has little software infrastructure, and uncertainty with its future.

Does it have a chance?
Does it have a chance in what field?

The desktop? Basically none.
Future game consoles? Maybe one of them.
HPC? Probably the best chance it has for creating a niche, much in the way Blue Gene's processors have their own small space.
Other fields? Maybe something here or there, but the support isn't all that enthusiastic.

The primary reason for doubt is that Cell so far has not realized the volume that commodity x86 has attained.
Given market trends and costs, this may prove telling.
The more likely outcome is that future x86 chips are going to copy most of what makes Cell perform so well, leaving Cell with little to offer.
__________________
Dreaming of a .065 micron etch-a-sketch.
3dilettante is online now   Reply With Quote
Old 27-Nov-2007, 23:45   #4
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 4,556
Default

Quote:
Originally Posted by 3dilettante View Post
It should also be noted that the x86 system that can hit 100 GFLOPS does so with two chips with TDPs of 120W.
I thought Core2 could perform 4 dual precision but 8 single precision operations per cycle (per core that is)?
__________________
PowerVR PCX1 4MB --> Voodoo Banshee 16MB --> GeForce2 MX200 32MB --> GeForce2 Ti 64MB --> GeForce4 Ti 4200 128MB --> 9800Pro 128MB --> 8800GTS 640MB --> Radeon HD 4890 1GB --> GeForce GTX 670 DirectCU II TOP 2GB
pjbliverpool is offline   Reply With Quote
Old 27-Nov-2007, 23:55   #5
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

I was going by the DP throughput of a two-socket Yorkfield system which is roughly 100 GFLOPS, while the HPC Cell with enhanced DP throughput also tops out at ~100 DP GFLOPS.

edit:
Cell would also have double the SP throughput over DP for the HPC version.
3dilettante is online now   Reply With Quote
Old 27-Nov-2007, 23:59   #6
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 4,556
Default

Quote:
Originally Posted by 3dilettante View Post
I was going by the DP throughput of a two-socket Yorkfield system which is roughly 100 GFLOPS, while the HPC Cell with enhanced DP throughput also tops out at ~100 DP GFLOPS.
Ah, cool. Just wanted to make sure I wasn't mistaken. So the HPC Cell pretty much doubles Yorkfield's peak throughput in either SP or DP.

I wonder if we'll see a new, beefier Cell before Nehalem arrives. I expect so but it would be strange to see a single socket x86 matching or exceeding Cell in peak floating point.
pjbliverpool is offline   Reply With Quote
Old 28-Nov-2007, 00:10   #7
stof
Registered
 
Join Date: Nov 2006
Posts: 3
Default

Where can I find more information on Cell boards with DDR2? The IBM web site doesn't have any.

The Core2 can do only 4 single precision operations per cycle and 2 double precision. It can't do simultaneous multiply & add, like the SPE on the Cell. But, it's hard to keep simultaneous multiply & adds busy.
stof is offline   Reply With Quote
Old 28-Nov-2007, 01:15   #8
Carl B
Friends call me xbd
 
Join Date: Feb 2005
Posts: 6,293
Default

Quote:
Originally Posted by stof View Post
Where can I find more information on Cell boards with DDR2? The IBM web site doesn't have any.
What, are you going to buy some?

Anyway this is the thread that would probably be your best introduction to the DDR2/HPC Cell: http://forum.beyond3d.com/showthread.php?t=40661

I'll mention also that it's this version of Cell that's going to go into Roadrunner. It's not available to the 'general' public right now, but as time goes on I'm sure you'll see it pop up. As to the original point of the thread, frankly I think Cell has done very well for itself considering it's a new architecture.
__________________
Somebody set up us the bomb.
Carl B is offline   Reply With Quote
Old 28-Nov-2007, 17:22   #9
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 4,556
Default

Quote:
Originally Posted by stof View Post
Where can I find more information on Cell boards with DDR2? The IBM web site doesn't have any.

The Core2 can do only 4 single precision operations per cycle and 2 double precision. It can't do simultaneous multiply & add, like the SPE on the Cell. But, it's hard to keep simultaneous multiply & adds busy.
According to this, Core2 is capable of 8 SP operations per cycle:

http://www.behardware.com/articles/6...-duo-test.html

"Core uses two floating point calculation units, one dedicated to addition and the other to multiplication and division. Theoretical calculation capacity is 2 x87 instructions per cycle and 2 SSE 128 bit floating point instructions per cycle (that is 8 operations on 32 bit simple precision floating points, or 4 operations for double precision 64 bit floating points). Core is, in theory, two times faster for this type of instruction than Mobile, Netburst and K8."

That would result in a theoretical peak of 96 GFLOPs for the fastest single socket CPU.
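A rough sketch of where the 96 GFLOPS figure comes from, in Python. The 8 SP and 4 DP operations per cycle are taken from the quoted article; the 3.0 GHz clock and 4 cores per socket are assumptions chosen to match the 96 GFLOPS number and the earlier "3 GHz" figure in this thread, not specifications checked against any particular part. The helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores, sockets=1):
    """Theoretical peak in GFLOPS for a multi-core, multi-socket system."""
    return clock_ghz * flops_per_cycle * cores * sockets

CLOCK_GHZ = 3.0   # assumption: clock of the fastest quad-core part
CORES = 4         # assumption: quad-core, one socket

sp_single_socket = peak_gflops(CLOCK_GHZ, 8, CORES)          # 8 SP ops/cycle/core
dp_single_socket = peak_gflops(CLOCK_GHZ, 4, CORES)          # 4 DP ops/cycle/core
dp_two_socket = peak_gflops(CLOCK_GHZ, 4, CORES, sockets=2)  # two-socket system

print(f"SP, one socket:  {sp_single_socket:.0f} GFLOPS")
print(f"DP, one socket:  {dp_single_socket:.0f} GFLOPS")
print(f"DP, two sockets: {dp_two_socket:.0f} GFLOPS")
```

Under these assumptions a single socket peaks at 96 SP GFLOPS and 48 DP GFLOPS, and a two-socket system at 96 DP GFLOPS, which lines up with the "roughly 100 GFLOPS" DP figure given earlier for a two-socket Yorkfield system.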
pjbliverpool is offline   Reply With Quote
Old 28-Nov-2007, 21:50   #10
stof
Registered
 
Join Date: Nov 2006
Posts: 3
Default

I am an HPC software developer. My software is used on about $100 million of hardware. It's pretty important for new hardware to recruit HPC software developers.

I need to be careful about what I invest my time in. With the high cost of Cell boards, the limited memory, and the limited install base, I don't have confidence Cell will become mainstream for commercial HPC ($500K-$10 million clusters). I agree with the above comments that
Quote:
The more likely outcome is that future x86 chips are going to copy most of what makes Cell perform so well, leaving Cell with little to offer.
The x86 chips will probably do it at lower price and better software infrastructure.

And while I don't want to divert this discussion to Intel hardware, the above Intel information is misleading. Yes, the Intel chips can work on a SIMD multiply and add at the same time, but they take more than a clock cycle. You can submit an SSE multiply, but it takes 5 cycles to complete. One clock cycle after the submit, you can submit another SSE instruction, such as an SSE add, and they will work at the same time, but you won't get 8 flops of throughput per cycle. You can only submit one SSE instruction at a time.
stof is offline   Reply With Quote
Old 28-Nov-2007, 22:45   #11
patsu
Regular
 
Join Date: Jun 2005
Posts: 24,905
Default

stof, what kind of HPC software? Is it media related, or scientific computing?
patsu is offline   Reply With Quote
Old 28-Nov-2007, 23:22   #12
3dilettante
Senior Member
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 4,071
Default

Quote:
Originally Posted by stof View Post
And while I don't want to diverge this discussion on Intel hardware, the above Intel information is misleading. Yes, the Intel chips can work on a SIMD multiply and add at the same time, but they take more than a clock cycle. You can submit a SSE multiply but it takes 5 cycles to complete. 1 clock cycle after the submit, you can submit another SSE instruction, such as an SSE add, and they will work at the same time, but you won't get 8 flop throughput per cycle. You can only submit one SSE instruction at a time.
I read that the FP multiplier has a throughput of 1 per cycle and a latency of 4. Only 80-bit FP multiply has a throughput of less than 1 per cycle.

Core2 also has SSE units on 3 issue ports, 1 port for FADD, 1 port for FMUL, and 1 port for other ops.

The peak number would seem to hold unless you can't find any non-dependent multiplies.
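The distinction between latency and throughput can be sketched with a toy model in Python. It assumes a fully pipelined unit that accepts one new operation per cycle and takes a fixed number of cycles to produce each result (4 here, the figure quoted in this post). The function names are hypothetical, and the model ignores everything else in a real core (issue ports, loads, register pressure).

```python
def cycles_independent(n_ops, latency):
    """Cycles to finish n_ops independent operations on a fully pipelined unit.

    One operation is issued per cycle; the last result appears
    `latency` cycles after the last issue.
    """
    return 0 if n_ops == 0 else n_ops + latency - 1

def cycles_dependent_chain(n_ops, latency):
    """Cycles when each operation needs the previous result, so nothing overlaps."""
    return n_ops * latency

N_OPS = 1000
LATENCY = 4  # multiply latency as quoted in the post above

independent = cycles_independent(N_OPS, LATENCY)
dependent = cycles_dependent_chain(N_OPS, LATENCY)

print(f"independent: {independent} cycles, {N_OPS / independent:.3f} ops/cycle")
print(f"dependent:   {dependent} cycles, {N_OPS / dependent:.3f} ops/cycle")
```

With independent multiplies the unit approaches one result per cycle despite the 4-cycle latency (1003 cycles for 1000 operations). With a fully dependent chain it drops to one result every 4 cycles, which is the case where the peak number stops holding.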
3dilettante is online now   Reply With Quote
Old 29-Nov-2007, 16:26   #13
Nite_Hawk
Senior Member
 
Join Date: Feb 2002
Location: Minneapolis, MN
Posts: 1,202
Default

Quote:
Originally Posted by stof View Post
I am a HPC software developer. My software is used on about $100 million of hardware. It's pretty important for new hardware to recruit HPC software developers.

I need to be careful about what I invest my time in. With the high cost of Cell boards, the limited memory, and the limited install base, I don't have confidence Cell will become mainstream for commercial HPC ($500K-$10 million clusters). I agree with the above comments that The x86 chips will probably do it at lower price and better software infrastructure.

And while I don't want to diverge this discussion on Intel hardware, the above Intel information is misleading. Yes, the Intel chips can work on a SIMD multiply and add at the same time, but they take more than a clock cycle. You can submit a SSE multiply but it takes 5 cycles to complete. 1 clock cycle after the submit, you can submit another SSE instruction, such as an SSE add, and they will work at the same time, but you won't get 8 flop throughput per cycle. You can only submit one SSE instruction at a time.
Hi Stof,

I'm a developer at the Minnesota Supercomputing Institute. Similar feelings about Cell. I really wish they would make development hardware cheaper to attract more attention. $10k isn't that much in the grand scheme of things, but it's not exactly throwaway money either. Cell is a popular topic around here (MSI) mostly because it's neat and exotic. There are few people here that are actually doing any real work on them.

Nite_Hawk
Nite_Hawk is offline   Reply With Quote
Old 29-Nov-2007, 19:20   #14
Shifty Geezer
Grumpy Mod
 
Join Date: Dec 2004
Location: In a pretty pink padded cell
Posts: 25,988
Default

For the sake of experimentation, isn't the PS3 a suitable introduction for trying things out and gauging performance? IBM's libraries support distributed processing over networked PS3s, right? So you could get 2 or 3, try out some algorithms, and see how well you think it manages, for a grand or so. Less if you know a few PS3-owning mates who wouldn't mind lending you their PS3s to run a bit of Linux code on!
__________________
Shifty Geezer
...

Tolerance for internet moronism is exhausted. Anyone talking about people's attitudes in the Console fora, rather than games and technology, will feel my wrath. Read the FAQ to remind yourself how to behave and avoid unsightly incidents.
Shifty Geezer is offline   Reply With Quote
Old 29-Nov-2007, 20:22   #15
Mmmkay
Member
 
Join Date: Jul 2005
Posts: 627
Default

There's little interest at RAL, given its commodity-focused HPC efforts. DP performance of the PS3 is just not worth it, and the eventual HPC Cell products will be out of reach. And that's setting aside the problems with how RAL operates in terms of library and application support. In fact it's probably the latter which has more influence. Neat and exotic just isn't in the language.
Mmmkay is offline   Reply With Quote
Old 29-Nov-2007, 21:23   #16
Arwin
Now Officially a Top 10 Poster
 
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 12,879
Default

Quote:
Originally Posted by Nite_Hawk View Post
Hi Stof,

I'm a developer at the Minnesota Supercomputing Institute. Similar feelings about Cell. I really wish they would make development hardware cheaper to attract more attention. $10k isn't that much in the grand scheme of things, but it's not exactly throw away money either. Cell is a popular topic around here (MSI) mostly because it's neat and exotic. There are few people here that are actually doing any real work on them.

Nite_Hawk
As Shifty said, precisely what is making the Cell a popular chip in this area is the possibility to just buy that 399 PS3, install Linux on it and get going with the SDKs and excellent documentation. And you can even see examples out there already from people stacking several PS3s too.
Arwin is offline   Reply With Quote
Old 30-Nov-2007, 02:15   #17
seebs
Junior Member
 
Join Date: Nov 2007
Location: Minnesota
Posts: 44
Default

It's a good testbed, I think. I did a bunch of stuff on cell simulators early on, and the PS3's faster, even if it's not quite the same.

I was going to get one of the actual dev systems, but I never got so much as a call back when I tried to contact the nice folks at Mercury. Apparently, they're WAY too busy with important things to even bother to tell me that they don't want my business.
seebs is offline   Reply With Quote
Old 30-Nov-2007, 09:25   #18
Arwin
Now Officially a Top 10 Poster
 
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 12,879
Default

Quote:
Originally Posted by seebs View Post
Apparently, they're WAY too busy with important things to even bother to tell me that they don't want my business.
That's a shame. On the other hand, I guess that also partly answers the thread title.
Arwin is offline   Reply With Quote
Old 30-Nov-2007, 09:29   #19
seebs
Junior Member
 
Join Date: Nov 2007
Location: Minnesota
Posts: 44
Default

Well, to be fair, I'm just some guy. I wasn't even affiliated with a company -- I just wanted a cell blade system because I do a lot of technical writing, and I could have taken it as a deductible expense, and PROBABLY paid for it with work eventually.

But I'm just one guy, there's no company involved, so I assume they just figured there wasn't enough business there to justify the effort. It's not as though, if I wrote a lot of articles about it, I'd come back and buy fifty or a hundred more.
seebs is offline   Reply With Quote
Old 30-Nov-2007, 09:46   #20
Arwin
Now Officially a Top 10 Poster
 
Join Date: May 2006
Location: Maastricht, The Netherlands
Posts: 12,879
Default

Probably not, but if they were genuinely bored (i.e. not at 100%+ work capacity), my guess would have been that they'd have gladly sold you one, precisely because you do write articles about it. That's just speculation on my part though.
Arwin is offline   Reply With Quote
Old 30-Nov-2007, 09:58   #21
seebs
Junior Member
 
Join Date: Nov 2007
Location: Minnesota
Posts: 44
Default

It might be. One of my coworkers dealt with them in another capacity once, and apparently they tend to blow off anyone who isn't likely to directly buy a LOT of hardware. I figure there's no reason for them to check that, out of a hundred people who said "I want to write about this", one particular guy might be a moderately successful writer whose articles might get read, when most of them are just dead blogs.

Still, it's sort of a shame. I really want one of those to mess around with. What Cell programming I've done has been neat, but I'd rather have a blade with real memory than a PS3 with 6 available SPEs and barely over 200MB to play with.
seebs is offline   Reply With Quote
Old 30-Nov-2007, 14:38   #22
Nite_Hawk
Senior Member
 
Join Date: Feb 2002
Location: Minneapolis, MN
Posts: 1,202
Default

Quote:
Originally Posted by seebs View Post
It might be. One of my coworkers dealt with them in another capacity once, and apparently they tend to blow off anyone who isn't likely to directly buy a LOT of hardware. I figure there's no reason for them to check that, out of a hundred people who said "I want to write about this", one particular guy might be a moderately successful writer whose articles might get read, when most of them are just dead blogs.

Still, it's sort of a shame. I really want one of those to mess around with. What Cell programming I've done has been neat, but I'd rather have a blade with real memory than a PS3 with 6 available SPEs and barely over 200MB to play with.
That's pretty much our problem too. We do have people doing development on PS3s, but it's even more of a niche than cell development in general. At least with a cell blade we'd have a small chance of getting it in our data center and making it a general resource for MSI users. There's no chance of that with PS3s.

Nite_Hawk
Nite_Hawk is offline   Reply With Quote
Old 30-Nov-2007, 18:28   #23
patsu
Regular
 
Join Date: Jun 2005
Posts: 24,905
Default

Where are you guys located? I know of institutions with donated Cell Blades to encourage R&D activities.

EDIT: Oh... in Minnesota. Have you approached the schools for some value exchange (writing about their programs in exchange for use of the Cell and access to whoever is working on it)? I also know of an overseas location that allows companies to use their grid network and Cell blades for free (some strings attached).
patsu is offline   Reply With Quote
Old 30-Nov-2007, 18:31   #24
seebs
Junior Member
 
Join Date: Nov 2007
Location: Minnesota
Posts: 44
Default

I'm in Minnesota, just like it says in the post.

The thing is, I'm not an "institution". I'm some guy. If I got a Cell system, it'd probably be in the basement about five or ten feet from the dryer. This is not an environment conducive to sales people drooling over the future sales prospects.
seebs is offline   Reply With Quote
Old 30-Nov-2007, 20:54   #25
Vitaly Vidmirov
Member
 
Join Date: Jul 2007
Location: Russia
Posts: 96
Default

seebs
it'd probably be in the basement
A garage is probably a better place. Some great things started their lives in a garage.
Vitaly Vidmirov is offline   Reply With Quote
