High Performance Computing Potential of Cell

mckmas8808 said:
When you say very few, how many companies do you see using CELL? 1 company, 3, 5?

I would say pretty well everybody involved in supercomputing, image processing such as the film industry, fields like radar processing, medical MRI and ultrasonic imaging, and military applications in signal processing and weapons systems.

That is excluding the cut-down versions of the existing Cell that will be used in TVs, mobile phones and other devices to do things like MPEG decoding.
 
Also, double precision is absolutely vital for EyeToy-like and/or augmented reality applications like the E3 card game demo. So I'd wonder about that a bit... You get accuracy problems with fp32 amazingly fast when using these sorts of algorithms.
 
SPM said:
I would say pretty well everybody involved in supercomputing, image processing such as the film industry, fields like radar processing, medical MRI and ultrasonic imaging, and military applications in signal processing and weapons systems.

That is excluding the cut-down versions of the existing Cell that will be used in TVs, mobile phones and other devices to do things like MPEG decoding.

Good god, so you are predicting that CELL will be a hit? Was that sarcasm?
 
mckmas8808 said:
Good god, so you are predicting that CELL will be a hit? Was that sarcasm?

I don't think I am the one being sarcastic. With the huge performance advantage being offered, it is inevitable Cell will be a success in those fields. The fact that it is already being developed for blade servers for specialist fields, despite shortages caused by the PS3 soaking up virtually all production, despite its floating point operations not being 100% IEEE compliant (important for scientific computing, which relies on standard numerical libraries), and despite its double precision performance advantage being more modest, is an early indication of this.

When the IEEE compliant Cell+ double precision version comes out, it and other similar processors will dominate HPC and floating point intensive computation.
 
Graham said:
Also, double precision is absolutely vital for EyeToy-like and/or augmented reality applications like the E3 card game demo. So I'd wonder about that a bit... You get accuracy problems with fp32 amazingly fast when using these sorts of algorithms.

Actually, there are visual recognition algorithms which don't even use FP, but instead work entirely in a symbolic domain. Numerical time series data from a frame is converted into a symbol alphabet, and then traditional text modeling algorithms are used to recognize key sequences. Look up Symbolic Aggregate Approximation.

I also don't see what FP32 has to do with AR either. AR doesn't even imply visual recognition. Plenty of AR systems use positional trackers, GPS, and known GIS data to determine where objects are located, rather than trying to recognize geometry from video.
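For anyone unfamiliar with SAX, here is a minimal sketch of the idea DemoCoder describes above (purely illustrative, not code from any of the posters): the series is z-normalised, reduced with Piecewise Aggregate Approximation, and each segment mean is mapped to a letter using Gaussian breakpoints (hard-coded here for a four-letter alphabet).

```python
import numpy as np

def sax_word(series, n_segments=8, alphabet="abcd"):
    """Convert a numeric time series into a short symbolic 'word' (SAX)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)               # z-normalise
    # PAA: mean of the series over n_segments roughly equal windows
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # Breakpoints splitting a standard normal into 4 equiprobable regions
    # (valid only for a 4-letter alphabet)
    cuts = np.array([-0.6745, 0.0, 0.6745])
    return "".join(alphabet[np.searchsorted(cuts, v)] for v in paa)

# Once frames are reduced to words like this, ordinary text/sequence
# matching can be used to spot key motion patterns.
print(sax_word(np.sin(np.linspace(0, 2 * np.pi, 64))))
```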
 
What kind of design modifications for DP can we expect, compared to the current SPE shown below?

[Image: SPE block diagram (speunit9qg.jpg)]
 
SPM said:
I don't think I am the one being sarcastic. With the huge performance advantage being offered, it is inevitable Cell will be a success in those fields.

Memory limitations might be a problem in the CG industry. We're already fighting the 2GB per task barrier, even we - a small studio working on game cinematics. Large movie VFX studios won't use Cell if it can't work with more than 512MB of memory...
 
Laa-Yosh said:
Memory limitations might be a problem in the CG industry. We're already fighting the 2GB per task barrier, even we - a small studio working on game cinematics. Large movie VFX studios won't use Cell if it can't work with more than 512MB of memory...

Laa-Yosh, we are not back to the "CELL can only support 256-512 MB of RAM" claim, are we? :p
 
Laa-Yosh said:
Large movie VFX studios won't use Cell if it can't work with more than 512MB of memory...
Is this some joke or are you suggesting Cell is limited to 512 MB memory? :???:

If it's not a joke, I can assure you that the Cell can have almost any memory size you like; the data bus of the XDR memory can easily be configured to different sizes.
 
SPM said:
I would say pretty well everybody involved in supercomputing, image processing such as the film industry, fields like radar processing, medical MRI and ultrasonic imaging, and military applications in signal processing and weapons systems.
Software utilized in these fields is generally very specialized, and I would imagine moving such applications to a Cell-based architecture would be an expensive and non-trivial matter. There are also hardware aspects to take into account. Many people involved in supercomputing, for example, utilize grids that consist of idle PCs, which makes any such specialized hardware superfluous and incompatible. There are undoubtedly several markets waiting for Cell, but there are also numerous practical reasons and plentiful competition to make any progress slow and difficult.
 
zifnab said:
Software utilized in these fields is generally very specialized, and I would imagine moving such applications to a Cell-based architecture would be an expensive and non-trivial matter.

That is precisely why it is worth moving to Cell - there is no mass market advantage or case for leveraging existing code when using ix86. It is a custom program deployed in small numbers whether it is on Cell or on ix86, only Cell is an order of magnitude better in cost, space density and power consumption relative to performance.

There are also hardware aspects to take into account. Many people involved in supercomputing, for example, utilize grids that consist of idle PCs, which makes any such specialized hardware superfluous and incompatible. There are undoubtedly several markets waiting for Cell, but there are also numerous practical reasons and plentiful competition to make any progress slow and difficult.

Yes, in the past PCs were used, but the trend is to go to server blades or big SMP or NUMA boxes now to increase density. The reason is that floor space is expensive, the heat generated is a problem and has to be removed by air conditioning, the interconnecting wiring is a problem, and servicing or replacing PCs is difficult. All of these problems are solved by blade servers, and IBM, the market leader in supercomputing clusters, is planning to introduce Cell server blades for precisely this purpose. IBM is a partner with Sony and Toshiba in developing Cell. Sony wants it for games. Toshiba wants it as a media processor, e.g. for use in TV sets, set top boxes and mobile phones. IBM has no interest in games or consumer electronics. It wants Cell for supercomputing applications.
 
Laa-Yosh said:
Memory limitations might be a problem in the CG industry. We're already fighting the 2GB per task barrier, even we - a small studio working on game cinematics. Large movie VFX studios won't use Cell if it can't work with more than 512MB of memory...

The PS3 version of Cell will have a 1GB RAM limit (I think) imposed by the width of the XDR interface used in the PS3. IBM's Cell blade may not have the same limit.

In any case the film industry standard practice for years has been to use Linux clusters to do various movie special effects. The cluster nodes don't need much RAM each because they only run one compute intensive task each and return results to a central server. 256MB would be plenty for most applications. The huge data or results set in a complex supercomputing application like weather modelling would be stored on a central server, not on the supercomputing nodes, and would be made available to the nodes in the cluster via the network.

For graphics artists' workstations, the film industry would obviously use a PC graphics workstation, not a PS3, so they can plug in more RAM and use a high-end graphics card of their choice, large HDD, BluRay writer etc., although there is scope for using a Cell or an Ageia PPU on a plug-in card to accelerate certain things, e.g. ray tracing.
 
DemoCoder said:
Actually, there are visual recognition algorithms which don't even use FP, but instead work entirely in a symbolic domain. Numerical time series data from a frame is converted into a symbol alphabet, and then traditional text modeling algorithms are used to recognize key sequences. Look up Symbolic Aggregate Approximation.

I also don't see what FP32 has to do with AR either. AR doesn't even imply visual recognition. Plenty of AR systems use positional trackers, GPS, and known GIS data to determine where objects are located, rather than trying to recognize geometry from video.

Ok. I was talking about the use of Cell for the new generation EyeToy, i.e. cheap tracking, which has to be visual. This pretty much means the card based demo Sony showed, which appeared to be using simple binary patterns on the cards.
You can't expect to be using magnetic trackers, etc., in a ~$50 product :p
I was guessing the card demo was based on threshold edge/corner tracking (like other 'cheap' augmented reality tracking systems use).
It's not a problem to detect that a trackable item is visible, and I'm not saying you need DP for that, but determining its exact position and orientation accurately is the difficult bit, and whenever I've tried, lowering the floating point precision always has a spectacular impact on this accuracy. Part of this can be put down to algorithm design of course, but if they plan to have tracking accuracy like what they showed in their demo (the dragon - even though it was almost certainly fake), getting that sort of accuracy would be extremely hard with single precision (imo :).
I just found it interesting that in this case Cell had 'poor' DP performance. Something I didn't realise, which may explain the relative simplicity of the implementation they were using (I always expected Cell would be perfect for that sort of app).
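To put a rough number on that point, here is a toy illustration (my own example, not Graham's code and not a pose solver): composing many small transforms in float32 drifts visibly where float64 does not, which is the same kind of error build-up that hurts position and orientation estimates.

```python
import numpy as np

def orthogonality_drift(dtype, steps=100_000, dtheta=1e-3):
    """Compose many tiny 2x2 rotations and measure how far the result
    drifts from being a true rotation (||R^T R - I||)."""
    th = dtype(dtheta)
    step = np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]], dtype=dtype)
    R = np.eye(2, dtype=dtype)
    for _ in range(steps):
        R = R @ step
    return float(np.linalg.norm(R.T @ R - np.eye(2)))

print("float32 drift:", orthogonality_drift(np.float32))  # orders of magnitude larger
print("float64 drift:", orthogonality_drift(np.float64))  # essentially negligible
```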
 
First of all Aldo, I meant to post this Saturday, but great find.

One contribution to our Cell knowledge on the whole is that the article provided independent efficiency comparisons to other architectures on certain tasks, but even beyond that it's our first solid insight into the wattage of the 3.2GHz version of the chip, located on page 4 at ~40 watts. Not too bad! A couple of months ago I guessed low 30s and a vcore of 1.0; too low on the wattage unfortunately (though the estimates were based on some rough Realworldtech 4 GHz operation numbers), but I wonder if the vcore is indeed 1.0 or if in the end it's up or down a tenth. A vcore of 1.0 would be consistent with the ISSCC 2005 schmoo though, since the Berkeley paper lists individual 3.2 GHz SPE wattage at 3 watts each.
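A quick back-of-envelope check using only the figures quoted above (the ~3 W per SPE figure and the ~40 W total); how the remainder splits between the PPE, EIB and interfaces is just a guess here:

```python
# ~3 W per 3.2 GHz SPE (figure quoted above) across eight SPEs,
# against the ~40 W total the article gives for the whole chip.
spe_array = 8 * 3.0            # ~24 W for the SPE array
remainder = 40.0 - spe_array   # ~16 W left for PPE, EIB and memory/IO interfaces
print(spe_array, remainder)
```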
 
xbdestroya said:
but even beyond that it's our first solid insight into the wattage of the 3.2GHz version of the chip, located on page 4 at ~40 watts.
Well, AFAICS in the paper they use the cycle-accurate IBM simulator; apparently they do not have the hardware.
The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on the Cell full system simulator.
 
Good catch One, that was a sloppy miss on my part.

Well, yeah so probably a vcore of 1.0 then if it's IBM's own docs being used to estimate power draw. ;) (unless the schmoo has substantially changed along with recent chip revisions)

That's too bad though, as I really would have enjoyed some independent wattage numbers. Ah well, still a great doc, and even knowing the IBM estimates for 3.2GHz chip operation is more than we've gotten until now, even if not independently authenticated in that regard.
 
Graham said:
Ok. I was talking about the use of Cell for the new generation EyeToy, i.e. cheap tracking, which has to be visual. This pretty much means the card based demo Sony showed, which appeared to be using simple binary patterns on the cards.
You can't expect to be using magnetic trackers, etc., in a ~$50 product :p
I was guessing the card demo was based on threshold edge/corner tracking (like other 'cheap' augmented reality tracking systems use).

What are you using to do the corner tracking? Did you try a Hough transform approach?
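For reference, here is a bare-bones sketch of the kind of Hough line voting being asked about (purely illustrative, not either poster's code): peaks in the accumulator correspond to the strong straight edges of a card, and intersecting those lines gives corner estimates.

```python
import numpy as np

def hough_lines(edge_mask, n_theta=180, n_rho=256):
    """Vote edge pixels into (rho, theta) bins; accumulator peaks are the
    dominant straight edges in the image."""
    h, w = edge_mask.shape
    diag = float(np.hypot(h, w))
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rho_bins = np.linspace(-diag, diag, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edge_mask)
    for j, theta in enumerate(thetas):
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        idx = np.clip(np.digitize(rho, rho_bins) - 1, 0, n_rho - 1)
        np.add.at(acc, (idx, j), 1)      # one vote per edge pixel per angle
    return acc, rho_bins, thetas

# Usage sketch: threshold the accumulator, take the top few (rho, theta)
# peaks as marker edges, then intersect the lines to estimate card corners.
```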
 
Graham said:
I just found it interesting as in this case Cell had 'poor' DP performance.

"On average, Cell is eight times faster and at least eight times more power efficient than current Opteron and Itanium processors, despite the fact that Cell's peak double precision performance is fourteen times slower than its peak single precision performance."

I wouldn't call it poor, even between quotes. ;) It's just that it is so much better at single precision. But that in no way means it sucks at double precision.
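For what it's worth, the "fourteen times" figure is consistent with the commonly cited peak numbers for the original 3.2 GHz Cell (these are general Cell figures, not taken from the article itself):

```python
# 8 SPEs x 4-wide SIMD x 2 flops per FMA x 3.2 GHz = 204.8 GFLOPS single precision
sp_peak = 8 * 4 * 2 * 3.2
dp_peak = 14.6                       # often-quoted SPE double precision peak, GFLOPS
print(sp_peak, sp_peak / dp_peak)    # ~204.8 GFLOPS and a ratio of roughly 14
```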

Also, if I remember correctly the EyeToy demo shows a 3D model of the hand in the game, so it does more than recognise a binary pattern on the card. I think this is the technology that makes a 3D image out of a 2D one, also demonstrated earlier with the emotion recognition demo.
 
SPM said:
That is precisely why it is worth moving to Cell - there is no mass market advantage or case for leveraging existing code when using ix86. It is a custom program deployed in small numbers whether it is on Cell or on ix86, only Cell is an order of magnitude better in cost, space density and power consumption relative to performance.
The fact that they are custom programs means that it's expensive to move them to a Cell-based system. Any possible hardware price benefits are therefore likely to be negligible in comparison. Different groups of people often also share a cluster, which makes it impractical since they would all have to port their code. New hardware is of course introduced in supercomputing, so I'm not saying it's impossible, but I don't imagine supercomputing is a very dynamic market that will be easy for Cell to enter.

SPM said:
Yes, in the past PCs were used, but the trend is to go to server blades or big SMP or NUMA boxes now to increase density. The reason is that floor space is expensive, the heat generated is a problem and has to be removed by air conditioning, the interconnecting wiring is a problem, and servicing or replacing PCs is difficult.
I don't have any specific numbers to go by, so I'm admittedly speculating, but I do know several places that employ PC grids and they simply use desktops that have other purposes (but are idle most of the time - for example they might be used for training). So in this sense floor space is not an issue.

SPM said:
All of these problems are solved by blade servers, and IBM, the market leader in supercomputing clusters, is planning to introduce Cell server blades for precisely this purpose. IBM is a partner with Sony and Toshiba in developing Cell. Sony wants it for games. Toshiba wants it as a media processor, e.g. for use in TV sets, set top boxes and mobile phones. IBM has no interest in games or consumer electronics. It wants Cell for supercomputing applications.
There are lots of other systems that don't employ Cell, including IBM's own PowerPC chips, with which it will have to compete.

Arwin said:
They also sell lots of other systems based on, for example, MIPS, DSPs, PowerPC, x86, etc... I can't really see how this supports Cell's potential market success.
 