Is there something that CELL can still do better than modern CPUs/GPUs?


The article seems to be rather old, especially considering how active this research topic is at the moment.

I don't understand the following, though: I thought that only the newest GPUs (the Fermis) have a local cache available?!
What does he mean when he talks about GPUs relying on caching?!

But it may be that this is the reason Cell is dead with respect to high-performance computing:

We ported about 10% of the code that was responsible for over 98% of the CPU time spent to the SPUs, amounting to a bit over 21,500 lines of code. The total effort — including the learning curve — took about 2 man-years of work.
 
Actually it's not mine in any way, the link is right there at the bottom. Although it's a pretty well-argued article... What's your take?
I didn't read the link when I first read your summary. Some things caught my attention.
* It's unclear when you're comparing the Cell with a CPU and when with a GPU.
* You compare internal bandwidth with external bandwidth.
* I couldn't find where the presentation says "They both manage to pack over 200 GFlops", and it's unclear what the object of this comparison is: most likely Cell vs. GPU, but which GPU? The presentation is dated 2007, and I'm pretty sure GPU throughput was already higher at that time.
* The presentation is unclear to me, and so is your post, in regard to "how many instructions you can execute":
An SPE using dual-pipe can execute 8 instructions per cycle.
I don't get it. SPEs are dual-issue, so they can execute one integer and one FP instruction at a time, if my memory serves right; and even if it doesn't, I don't get how they extrapolate 8 out of 2.

The whole thing looks a bit over-enthusiastic to me, and it's pretty dated anyway. The Cell roadmap stalled (not arguing that's a good or a bad thing), and that's the main reason why I find this kind of comparison pointless: things have evolved and the Cell didn't. It would be unfair to compare the Cell to the upcoming Sandy Bridge CPUs, today's GPUs, etc.
 
I presume this refers to an SPE performing calculations on four 32-bit values in one 128-bit operation per cycle. Not sure about the dual-issue thing; isn't that a feature of the later Cell version that was optimised for DP values?
 
That was my interpretation too, which is a misleading use of 'instructions per cycle' when dealing with SIMD (single instruction, multiple data).
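My guess at the arithmetic behind that "8 per cycle" figure: two pipelines times four 32-bit SIMD lanes. Here's a minimal sketch of the lane part using the SPU C intrinsics from the Cell SDK (spu_intrinsics.h, built with spu-gcc); the multiply-add below is one instruction operating on four floats, i.e. eight flops, not eight instructions:

```c
/* One SPU instruction, four 32-bit lanes: d[i] = a[i] * b[i] + c[i].
 * Counting the lanes (and the fused multiply+add) as separate
 * "instructions" is how 2-issue becomes "8 per cycle".
 * Requires the Cell SDK (spu-gcc); won't build on a PC compiler. */
#include <spu_intrinsics.h>
#include <stdio.h>

int main(void)
{
    vector float a = { 1.0f, 2.0f, 3.0f, 4.0f };
    vector float b = { 0.5f, 0.5f, 0.5f, 0.5f };
    vector float c = { 1.0f, 1.0f, 1.0f, 1.0f };

    /* spu_madd issues a single fused multiply-add instruction. */
    vector float d = spu_madd(a, b, c);

    printf("%.1f %.1f %.1f %.1f\n",
           spu_extract(d, 0), spu_extract(d, 1),
           spu_extract(d, 2), spu_extract(d, 3));
    return 0;
}
```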
 
*The 8 Synergistic Processor Units of the Cell/B.E. can transfer data between each other's memory via a 192GB/s bandwidth bus, while the fastest GPU (GeForce 8800 Ultra) has a bandwidth of 103.7 GB/s and all others fall well below 100GB/s. The high-end GPUs have over 300GFlops theoretical throughput, but due to the memory bus speed limitations and cache miss latency, the practical throughput falls far short of that, while the Cell/B.E. has demonstrated benchmark results (e.g. for a real-time ray-tracing application) far superior to those of the G80 GPU despite its theoretical throughput being lower than the GPU's.

FYI, the internal bus bandwidth of CPUs and GPUs is tremendous. It doesn't matter in the big picture, though, when you're limited by external bus bandwidth: Cell in the PS3 at ~22GB/sec, RSX at ~22GB/sec, and the 8800 Ultra numbers you mentioned are for the external bus. Thus the weakest link in the end is the bottleneck... "BOTTLE-NECK". "NECK".
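To make that point concrete, here's a back-of-the-envelope sketch of the standard roofline argument (my illustration, not from the posts; the figures are just the rough ones quoted in this thread): attainable throughput is capped by min(peak compute, external bandwidth x arithmetic intensity).

```c
/* Roofline back-of-the-envelope: attainable GFLOPS is capped by
 * min(peak compute, external bandwidth * flops-per-byte).
 * Figures are the rough ones quoted in this thread. */
#include <stdio.h>

static double attainable(double peak_gflops, double bw_gbs, double flops_per_byte)
{
    double bw_bound = bw_gbs * flops_per_byte;
    return bw_bound < peak_gflops ? bw_bound : peak_gflops;
}

int main(void)
{
    double intensity = 2.0; /* assume a kernel doing 2 flops per byte */

    printf("Cell @ ~22 GB/s external:  %.0f GFLOPS attainable (peak ~200)\n",
           attainable(200.0, 22.0, intensity));
    printf("8800 Ultra @ ~103.7 GB/s: %.0f GFLOPS attainable (peak ~300)\n",
           attainable(300.0, 103.7, intensity));
    return 0;
}
```

Both land far below peak, which is why the external bus, not internal bandwidth, sets the ceiling.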

Also, on the GFlops numbers: the fastest PC graphics cards have a theoretical peak of over 5000GFlops. My single stock 4890, which is old tech by now, has 1200GFlops peak performance. IIRC the 5870 should be 2700GFlops for a single GPU.

Heck, the old 2005/2006 GPU called X1900 XT has about 450GFlops peak performance, which falls in the same ballpark as the 8800GTX series, though the G80 is by far superior, GFlops not being everything.
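For reference, these theoretical peaks follow from a simple product (my note, not from the post above): ALU count, clock, and two flops per ALU per cycle for a multiply-add. With the HD 5870's commonly quoted 1600 ALUs at 850 MHz:

$$\text{peak GFLOPS} = N_{\text{ALU}} \times f\,[\mathrm{GHz}] \times 2 = 1600 \times 0.85 \times 2 = 2720$$

which matches the ~2700 figure above.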


About the G80 vs. Cell comparison: the link doesn't take you to the mentioned article. And each platform has its own renderer, so quality might differ greatly. Not unlike the article from another site (or maybe even the same site) where a weak laptop with insufficient RAM and VRAM running 3ds Max was compared to 3x PS3s with a custom renderer: the PS3s achieved acceptable framerates for a ray-tracing renderer while the laptop took tons of hours to finish. Of course, the 3ds Max rendering was tremendously superior in quality and precision, and it had to rely on HDD caching due to insufficient RAM/VRAM, so the test was obviously skewed or even faked (despite the site claiming a reputation).

In fact one Cell processor is four to five times faster at ray-tracing the Stanford Bunny than the G80. The configurations compared were:
* 2.6 GHz AMD Opteron - Saarland ray tracer
* Nvidia GeForce 8800 GTX - Saarland ray tracer
* Sony PlayStation 3 (partial 3.2 GHz Cell processor running Linux) - IBM iRT
* 3.2 GHz Cell processor - IBM iRT
* IBM QS20 blade (two 3.2 GHz Cell processors) - IBM iRT

Is this a correct extraction from the article?
 
I would guess that audio (at least audio workstations with lots of FX and virtual instruments) is still better with a Cell-like architecture.

That's because GPUs still aren't that usable for it (at least while keeping latency low, which is probably the most important part), from what many devs say.

On the other hand, it really wants math processing; memory architecture doesn't matter that much because the data is pretty much streamed anyway (e.g. an Athlon II vs. a Phenom II comes out equal).

Also it can make great use of multiple cores, because at the very least you can assign each track/instrument/FX to a different core. And the DSP-like nature of audio makes little use of branch prediction features.
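A minimal sketch of that per-track parallelism, in plain C with pthreads (my illustration, not from the post; process_fx_chain is a hypothetical stand-in for a real DSP chain):

```c
/* One worker thread per track, each running that track's FX chain
 * over a block of samples. Tracks are independent, which is why
 * the speed-up is near linear in the number of cores.
 * Build with: cc -pthread sketch.c */
#include <pthread.h>
#include <stdio.h>

#define NUM_TRACKS 4
#define BLOCK 512

typedef struct {
    int   id;
    float samples[BLOCK];
} Track;

/* Hypothetical FX chain: just a gain stage here, for illustration. */
static void process_fx_chain(Track *t)
{
    for (int i = 0; i < BLOCK; i++)
        t->samples[i] *= 0.8f;
}

static void *track_worker(void *arg)
{
    process_fx_chain((Track *)arg);
    return NULL;
}

int main(void)
{
    Track tracks[NUM_TRACKS] = {{0}};
    pthread_t tid[NUM_TRACKS];

    for (int i = 0; i < NUM_TRACKS; i++) {
        tracks[i].id = i;
        pthread_create(&tid[i], NULL, track_worker, &tracks[i]);
    }
    for (int i = 0; i < NUM_TRACKS; i++)
        pthread_join(tid[i], NULL);

    puts("all tracks processed");
    return 0;
}
```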

I wouldn't be surprised if Cell still beat a six-core Intel at those jobs.

And trust me, audio in games is primitive compared to what you can do with even lower-end real-time audio setups, let alone high-end reverbs.
 
I find the word that Andrew Richards used in the interview he and T. Sweeney gave for SemiAccurate pretty much spot-on to describe the Cell: "un-failed".
 
http://www.xbitlabs.com/news/cpu/di...rate_Cell_Chip_into_Future_Power_Roadmap.html

Cell is back in the game?
Uncanny coincidence, as I wanted to ask why IBM is staying on the sidelines (it seemed that way, with no rumors) when they dominated the current gen. There should have been rumors of IBM courting the next-gen consoles... or does IBM want to become irrelevant in consumers' homes?
In the interview J.Menon states:
"I think you'll see [Cell] integrated into our future Power road map. That's the way to think about it as opposed to a separate line - it will just get integrated into the next line of things that we do. But certainly, we are working with all of the game folks to provide our capabilities into those next-generation machines,"
So maybe it's a hint ;) IBM may currently be working with the big three. I'm not sure what he means by "integrated" though; your usual SMP + SPUs?
 
IBM is working with gaming machine vendors including Nintendo and Sony, said Jai Menon, CTO of IBM's Systems and Technology Group, during an interview Thursday. "We want to stay in the business, we intend to stay in the business," he said.

Notable here is the absence of Microsoft; could be they've already let IBM know they're going in a different direction next gen.
 
Notable here is the absence of Microsoft; could be they've already let IBM know they're going in a different direction next gen.
Indeed, but then he states "all of the game folks".
But there is a possibility that either MS moved from IBM to x86 or ARM (not that much choice), or that the deal with them is still not secured, versus Sony and Nintendo.
It could be interesting if both Sony and Nintendo got very close CPU architectures, from the publishers' point of view.
That's quite the first consistent hint we have in regard to next-gen systems.
 
Indeed, but then he states "all of the game folks".
But there is a possibility that either MS moved from IBM to x86 or ARM (not that much choice), or that the deal with them is still not secured, versus Sony and Nintendo.
It could be interesting if both Sony and Nintendo got very close CPU architectures, from the publishers' point of view.
That's quite the first consistent hint we have in regard to next-gen systems.

I could see MS going with an integrated AMD solution perhaps? Have AMD design a good combination of GPU and CPU ...
 
MS are going with a multiple Ontario CPU+GPU combination in their tablet/console combo system. :yep2:

Or, a rather less subtle hint: this isn't really the place to discuss MS's options. :p Instead, point to the IBM remarks in the next-gen tech thread. As Rangers recognised - kudos, Rangers!
 
Cell is built around very fundamental parallel computing principles. Those principles won't/can't go away, but the specific implementations may improve (e.g., a larger or more flexible Local Store).
 
You know, this brings the discussion back to its origins. If IBM haven't abandoned Cell, why not? What is the vision for Cell, and what is it offering, such that it's still a player? I get the impression the more vocal tech-heads here are rather down on Cell's implementation. People like nAo (major apologies if that's the wrong guy!) have criticised Cell and looked to conventional architectures as being more effective.

With new CPU/GPU combos on the horizon, I think now's a good time to re-evaluate what Cell brought and where the future is going, and whether Cell 2 is a good fit for a future platform. Or whether Cell 2 is going to be a fairly radical departure from Cell's design philosophy!
 
Cell is:
* Heterogeneous units in one CPU
* Simpler, faster cores for specialized tasks
* Explicit memory management to enable extremely fast memory access
* DMA (see the sketch below)
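As an illustration of what "explicit memory management + DMA" looks like in practice, here's a minimal double-buffering sketch using the MFC intrinsics from the Cell SDK's spu_mfcio.h (my sketch, with a hypothetical process() standing in for real work): the SPE streams the next chunk into Local Store while it processes the current one.

```c
/* Double-buffered DMA on an SPE: while one Local Store buffer is
 * being processed, the next chunk is already in flight from main
 * memory. Requires the Cell SDK (spu-gcc). */
#include <spu_mfcio.h>

#define CHUNK 4096

static char buf[2][CHUNK] __attribute__((aligned(128)));

/* Hypothetical per-chunk work function. */
static void process(char *data, unsigned size) { (void)data; (void)size; }

void stream_from_main_memory(unsigned long long ea, unsigned nchunks)
{
    unsigned cur = 0;

    /* Kick off the first transfer: main memory -> Local Store. */
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);

    for (unsigned i = 0; i < nchunks; i++) {
        unsigned next = cur ^ 1;

        /* Prefetch the next chunk while the current one is in flight. */
        if (i + 1 < nchunks)
            mfc_get(buf[next], ea + (i + 1) * CHUNK, CHUNK, next, 0, 0);

        /* Wait only for the current buffer's DMA tag to complete. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();

        process(buf[cur], CHUNK);
        cur = next;
    }
}
```

The explicitness is the point: latency is hidden by the programmer, not by a cache, which is exactly the trade-off this thread keeps circling around.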

I think nAo suggested that the Local Store should be bigger. The PPU should be more powerful. I can't remember anymore but I think we can have more than 1 PPU per Cell too.

EDIT: Forgot about low/efficient power consumption, which is another mantra for Cell.
 
Also, Cell's architecture brings near-linear speed-up per core added. Are there any other CPUs, on the consumer end, that can provide that while maintaining the same or better flexibility and ease of use? Then there is the power efficiency of the Cell, at an equal scale of manufacturing. Has that been matched or surpassed by a CPU yet?
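As a side note on what near-linear speed-up requires (my addition, just the standard Amdahl's law argument): if a fraction $p$ of the work parallelises across $n$ cores, the speed-up is

$$S(n) = \frac{1}{(1 - p) + p/n}$$

so near-linear scaling means $p$ is very close to 1, which is exactly what the SPE model (independent kernels, explicit DMA, no shared-cache contention) pushes you towards.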
 