High Performance Computing Potential of Cell

Word's come that today AMD opened up their socket to third-party development and products.

Personally, I think this represents a challenge to STI's targeting of what I'll arbitrarily call the 'mid-range' of high-end computing (just take that as a catch-all to save typing); but if they wanted, it could represent an opportunity as well.

I think they should get a jump on this, but I'm not sure what the hurdles are. The major R&D has already been done, but is the Power vs. x86 divide an insurmountable obstacle? Creating a version of Cell compatible with regular DDR1/2/3 and porting it to Socket 1207 could be a big win for STI, though, if they were to do it.

There may be trepidation out there about taking the plunge on Cell-based systems themselves, simply due to unfamiliarity and expense, but at the same time I think the chip has garnered some positive mindshare in the community as a chip that rocks at what it does well. If you package all that goodness as a co-processor for Opteron-based systems, well, I think you have a winner. Opteron is on fire presently, and AMD is now talking about achieving a 30% share by year's end. Plus anything that gets the SPE programming model out there would benefit STI, and this seems like a good way to ease people into it, through Opteron-based systems.

That will help in ultimately pushing the higher-end XDR-based Cell clusters as well.

I'm sort of off on a tangent here, but sort of not.

There has been talk on these boards before of the marketplace risk Cell might face via other architectures heading it off at the pass by introducing co-processors or through sheer volume of cores, but IMO AMD actually opening Socket F (1207) up presents STI with a perfect opportunity to become that co-processor.

But again, I'm not sure about the x86/Power concerns involved in doing so. You may not even need the 'master' core in this situation; their willingness to deviate from the current revision is the only barrier I see.
 
xbdestroya said:
There has been talk on these boards before of the marketplace risk Cell might face via other architectures heading it off at the pass by introducing co-processors or through sheer volume of cores, but IMO AMD actually opening Socket F (1207) up presents STI with a perfect opportunity to become that co-processor.

Windows is ready for 64-core machines (they had one running at WinHEC just a few weeks ago); it's just the hardware and apps that have to catch up now.

I think the future for general computing is some sort of symmetric multi-core where all the cores share the same basic instruction set, but some of the cores can be specialized for a purpose through extended instruction sets or just more execution resources of a particular type.
 
Not sure if it's been mentioned before, but HPCWire makes note of work at the University of Tennessee that has developed software implementing double-precision accuracy using 32-bit FP math on Cell. Its performance is twice that of Cell's native 64-bit capability.

Recent work at the University of Tennessee applying iterative methods has demonstrated that 64-bit accuracy can be achieved at twice the performance of the normal 64-bit mode of the Cell architecture by exploiting the 32-bit SPEs. These and other examples represent an important trend in HPC this year.

http://www.hpcwire.com/hpc/709078.html

Also, IBM appears to have confirmed its intentions to leverage Cell in the supercomputer arena. From this report on the current supercomputer rankings:

IBM spokeswoman Flor Estevez noted in an e-mail that the Cell multi-core processor developed by IBM, Sony and Toshiba will be used later and will be "one of the most promising developments in high performance computing in years."
 
As Cell's FP isn't IEEE compliant, are these true 64-bit results? Presumably not, in which case will they be usable for the scientific applications where I guess they're intended?
 
aaaaa00 said:
Windows is ready for 64-core machines (they had one running at WinHEC just a few weeks ago); it's just the hardware and apps that have to catch up now.

I think the future for general computing is some sort of symmetric multi-core where all the cores share the same basic instruction set, but some of the cores can be specialized for a purpose through extended instruction sets or just more execution resources of a particular type.
I'd add AMD's Reverse HT, or some similar technology, to that too. When working with single-threaded applications (which will be prevalent for quite some time), it enables the use of all the cores by exposing them as one logical processor. This would let software programmers stick to what works best, instead of trying to extract parallelism from code that just won't work that way.
 
Shifty Geezer said:
As Cell's FP isn't IEEE compliant, are these true 64-bit results? Presumably not, in which case will they be usable for the scientific applications where I guess they're intended?

I tried to look up more detail on their work but couldn't find anything. HPCWire just mentioned in an older article that it'd be covered in an "upcoming issue", which may already have come and gone.

Maybe someone else will have more luck finding something...
 
Titanio said:
Not sure if it's been mentioned before, but HPCWire makes note of work at the University of Tennessee that has developed software implementing double-precision accuracy using 32-bit FP math on Cell. Its performance is twice that of Cell's native 64-bit capability.

What they came up with at UT is basically to split the solving of a linear system into two parts. The first part is a linear solver, which is O(n³) and done in single precision, and the second part uses double precision to iteratively refine the solution, an O(n²) operation, to get acceptable accuracy.
Since the single-precision math is the dominant part of the calculation, a large speedup of that part leads to a speedup of the whole linear solver (on some systems).
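
Here's a minimal sketch of that kind of mixed-precision iterative refinement, just to make the scheme concrete. It's my own illustration in Python/NumPy on an ordinary CPU, not the UT code and not SPE code, and the function name and parameters are made up:

Code:
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, iters=5):
    # Factor once in single precision -- this is the O(n^3) part.
    lu32 = lu_factor(A.astype(np.float32))
    x = lu_solve(lu32, b.astype(np.float32)).astype(np.float64)
    # Refine in double precision -- each step is only O(n^2).
    for _ in range(iters):
        r = b - A @ x                              # residual computed in DP
        d = lu_solve(lu32, r.astype(np.float32))   # cheap correction solve, reusing the SP factors
        x = x + d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
b = rng.standard_normal(500)
print(np.linalg.norm(A @ mixed_precision_solve(A, b) - b))   # ends up near DP round-off

On a reasonably well-conditioned system the residual ends up at double-precision round-off levels even though all the heavy lifting was done in single precision.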

Here is a link to the project site at UT:
http://icl.cs.utk.edu/iter-ref/index.html


And the HPCWire article:
http://www.hpcwire.com/hpc/692906.html

Also note that no results from Cell have been obtained yet; when the presentation was held on the 19th of May, the implementation of a linear system solver on Cell was still future work.
 
Shifty Geezer said:
As Cell's FP isn't IEEE compliant, are these true 64-bit results? Presumably not, in which case will they be usable for the scientific applications where I guess they're intended?

They will be usable, but standard program libraries, which are all based on the IEEE format, may need tweaking, which is not desirable. I think IBM also intends to make an IEEE-compliant version of Cell with faster DP for supercomputing applications.
 
Guden Oden said:
While 32-bit FP isn't, I read that Cell's 64-bit DP is...
But this scheme is using Cell's Single Precision to create Double Precision maths. Thus the errors in SP will creep into the DP results, unless solved somehow.
 
Shifty Geezer said:
But this scheme is using Cell's Single Precision to create Double Precision maths. Thus the errors in SP will creep into the DP results, unless solved somehow.
The iterative procedures used won't propagate errors, so it's enough if only the last iterations use DP.
No one is using SP to calculate DP values; I can't think of a way that would be fast enough to be preferable to just using native DP, and emulated DP would be even slower.
 
Shifty Geezer said:
But this scheme is using Cell's Single Precision to create Double Precision maths. Thus the errors in SP will creep into the DP results, unless solved somehow.

Are you sure about that?
It is using single precision to make an approximate solution, and then using iterative refinement in double precision to get a better solution.
<arm chair mode = ON>
While I'm not particularly familiar with iterative solutions for linear systems, the article claims that the final solution will have the same precision as if only double precision had been used, given that the system is not ill-conditioned. Single precision _is_ used in some of the steps of the iterative refinement, and _perhaps_ full IEEE 754 compliance matters there; I will leave that to the experts.

Otherwise it just seems like the lack of IEEE 754 compliance would at most call for better-conditioned systems, if even that.
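
For what it's worth, the usual textbook picture of iterative refinement (my summary, not something from the UT paper): each step computes the residual r = b - A*x in double precision, solves A*d = r using the single-precision factorization, and updates x = x + d. The error is multiplied each step by roughly ||I - A_sp^(-1) * A||, so the single-precision rounding mainly affects how fast you converge; the accuracy you settle at is determined by the double-precision residual, as long as the system isn't too ill-conditioned.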
</arm chair mode = OFF>
 
Npl said:
No one is using SP to calculate DP values; I can't think of a way that would be fast enough to be preferable to just using native DP, and emulated DP would be even slower.
But that's exactly what this is!
Titanio said:
Not sure if it's been mentioned before, but HPCWire makes note of work at the University of Tennessee that has developed software implementing double-precision accuracy using 32-bit FP math on Cell. Its performance is twice that of Cell's native 64-bit capability.
Using 32-bit FP to do DP is twice as fast as using native DP on Cell, according to this.

Are you sure about that?
Not at all!
 
Shifty Geezer said:
But that's exactly what this is!
Using 32-bit FP to do DP is twice as fast as using native DP on Cell, according to this.
You are reading too much into that single statement. You won't find a way to do DP operations like a multiply with SP and be anything like competitive with the native DP implementation. And even if you did, you would likely use integers rather than SP floats to emulate DP.

The paper is about specific algorithms and their implementation.
Quote from http://icl.cs.utk.edu/iter-ref/index.html:
The motivation for this paper is to exploit single precision operations whenever possible and resort to double precision at critical stages while attempting to provide the full double precision results.
 
zifnab said:
The fact that they are custom programs means that it's expensive to move them to a Cell-based system. Any possible hardware price benefits are therefore likely to be negligible in comparison. Different groups of people often also share a cluster, which makes it impractical since they would all have to port their code. New hardware is of course introduced in supercomputing, so I'm not saying it's impossible, but I don't imagine supercomputing to be a very dynamic market that'll be easy for Cell to break into.


I don't have any specific numbers to go by, so I'm admittedly speculating, but I do know several places that employ PC grids, and they simply use desktops that have other purposes (but are idle most of the time; for example, they might be used for training purposes). So in this sense floor space is not an issue.


There are lots of other systems that don't employ Cell, including IBM's own PowerPC chips, with which it will have to compete.


They also sell lots of other systems that are, for example, based on MIPS, DSPs, PowerPC, x86, etc. I can't really see how this supports Cell's potential market success.

http://www.richtigsaidfred.com/?p=99
"Sun 28 May 2006
According to HPCWire.com the Cell processor in the upcoming PS3 does indeed pack some very high performance computing capabilities. They compare it in multiple scientific applications where number crunching is all the processor does, and the results are impressive even compared to the current batch of 64 bit processors like the Athlon 64, the Itanium 2 and even the Cray X1E (that’s a supercomputer). "

Handbook for Cell Broadband Engine programming
http://www-128.ibm.com/developerworks/library/pa-nl32-downloads/
"Download an extensive programming manual for the Cell Broadband Engine™ (Cell BE) processor. Test new high-performance supercomputing math functions without the overhead of conditional branches. Try out the new platforms the Extreme Cluster Administration Toolkit supports. In the library: Learn to configure a System z9™ EC, uncover the full story on System z™ connectivity, arrange a BladeCenter® Boot from iSCSI SAN, install Eclipse on Linux® on POWER™, and discover alternative Linux distributions for POWER5™ systems."


http://www-128.ibm.com/developerworks/power/cell/docs_articles.html


this made me smile, IBM research on how to
http://www.research.ibm.com/journal/sj/451/damora.html

Seems IBM and their partners cover most of what you outline, from single fun kit to workstation, Beowulf, industrial cluster, supercomputing, Linux and so on; it all seems well covered for Cell.
 