Details trickle out on CELL processor...

The companies expect that a one rack Cell processor-based workstation will reach a performance of 16 teraflops or trillions of floating point calculations per second.

I want one!!11!1!!!!!!1!!1!!1
:devilish: :devilish: :devilish:
 
Cell is optimized for compute-intensive workloads and broadband rich media applications, including computer entertainment, movies and other forms of digital content. Other highlights of the Cell processor design include:

• Multi-thread, multicore architecture.

• Supports multiple operating systems at the same time.

• Substantial bus bandwidth to/from main memory, as well as companion chips.

• Flexible on-chip I/O (input/output) interface.

• Real-time resource management system for real-time applications.

• On-chip hardware in support of security system for intellectual property protection.

• Implemented in 90 nanometer (nm) silicon-on-insulator (SOI) technology. Additionally, Cell uses custom circuit design to increase overall performance, while supporting precise processor clock control to enable power savings.

What's with the multiple OS's? Virtual Os's?
 
Jaws said:
• Supports multiple operating systems at the same time.

What's with the multiple OS's? Virtual Os's?

It's called virtualisation... it's a big thing that's happening in the server world. The goal is to make optimal use of the available h/w resources.

Think VMWare, Sun Domains/Zones, HP n/vPar, and IBM's LPar (whatever it's called).
 
Any specifics given so far, apart from 4.8GHz (!) SRAM clock? Dare we even hope for 4.8GHz ALU clock as well? I myself won't believe that just yet, lest I be disappointed when the actual speed is revealed and it turns out to be lower, say, 2.4GHz. :p
 
Jaws said:
The companies expect that a one rack Cell processor-based workstation will reach a performance of 16 teraflops or trillions of floating point calculations per second.

I want one!!11!1!!!!!!1!!1!!1
:devilish: :devilish: :devilish:

Now all we need to know is how many Cells there are in the WS... and at what speed.
 
With the capability to support multiple operating systems, Cell can perform both PC/WS operating systems as well as real-time CE/Game operating systems at the same time.

Sounds like Longhorn... :p
 
Jaws said:
What's with the multiple OS's? Virtual Os's?

Maybe something like Intel's Vanderpool. Now you have a Cell workstation, you assign 80% for Linux, and 20% for another real-time OS, I assume like that.
 
Guden Oden said:
Any specifics given so far, apart from 4.8GHz (!) SRAM clock? Dare we even hope for 4.8GHz ALU clock as well? I myself won't believe that just yet, lest I be disappointed when the actual speed is revealed and it turns out to be lower, say, 2.4GHz. :p

Even if, at that speed (2.4GHz), the overall performance doesn't drop significantly, how disappointed will you be then? :D
 
Jov said:
Guden Oden said:
Any specifics given so far, apart from 4.8GHz (!) SRAM clock? Dare we even hope for 4.8GHz ALU clock as well? I myself won't believe that just yet, lest I be disappointed when the actual speed is revealed and it turns out to be lower, say, 2.4GHz. :p

Even if, at that speed (2.4GHz), the overall performance doesn't drop significantly, how disappointed will you be then? :D

Also the PE bus is 6.4GHz and the PUs are 64bit Power cores as opposed to PowerPC.
 
Jaws said:
The companies expect that a one rack Cell processor-based workstation will reach a performance of 16 teraflops or trillions of floating point calculations per second.

I want one!!11!1!!!!!!1!!1!!1
:devilish: :devilish: :devilish:

So roughly ~3 times as powerful as a BlueGene Rack (5.6 TFlops)
http://www.ipab.org/Presentation/sem04/04-02-1.pdf

Now the question is how much power one of those Cell racks uses. The BlueGene rack only uses 20.1kW per rack (2048 processors), and I would expect the Cell to be higher since it was designed more with performance in mind than power usage (which is an important balancing act for supercomputer processors).

So I'm really not sure how impressed I am; it doesn't sound like they reached 1 TFlop per chip, but then again things could be different for the PS3.
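For anyone who wants to check the arithmetic, here is a quick sketch that uses only the figures quoted in this thread (16 TFLOPS per Cell rack; 5.6 TFLOPS, 20.1kW and 2048 processors per BlueGene rack). The variable names are made up for illustration, and these are peak numbers, so treat the results as ballpark only.

```python
# Figures quoted in this thread (peak numbers, not measured results).
CELL_RACK_TFLOPS = 16.0
BG_RACK_TFLOPS = 5.6
BG_RACK_KW = 20.1
BG_RACK_PROCESSORS = 2048

# How many times faster the Cell rack would be on paper.
speedup = CELL_RACK_TFLOPS / BG_RACK_TFLOPS

# BlueGene power budget per processor and efficiency per watt.
bg_watts_per_processor = BG_RACK_KW * 1000 / BG_RACK_PROCESSORS
bg_gflops_per_watt = (BG_RACK_TFLOPS * 1000) / (BG_RACK_KW * 1000)

# Rack power at which a 16 TFLOPS Cell rack would merely match
# BlueGene's GFLOPS per watt (anything above this is less efficient).
cell_rack_kw_at_parity = (CELL_RACK_TFLOPS * 1000) / bg_gflops_per_watt / 1000

print(f"speedup: {speedup:.2f}x")
print(f"BlueGene: {bg_watts_per_processor:.1f} W/processor, "
      f"{bg_gflops_per_watt:.3f} GFLOPS/W")
print(f"Cell rack break-even power: {cell_rack_kw_at_parity:.1f} kW")
```

So "roughly ~3 times" holds up, and the Cell rack could draw close to three times the BlueGene rack's power before it fell behind on performance per watt.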
 
Let's see: SRAM running at chip speed, great, that's what SRAM is supposed to do, but where's the eDRAM to back up this little pool of SRAM??? How are you going to get 1 TFLOPS with only 1MB of SRAM and 100 GB/s external bandwidth??? Looks like real-world numbers aren't going to be too impressive, like everyone predicted. Oh and what's the die size?
 
If you want to talk sheer theoreticals: if the 4.8GHz number is true, it would take a rack of 20 of these 90nm Cell PEs to be FP equivalent to the 1024 BG/L cluster. If the 65nm Broadband Engine exists, then it would only take ~5 BEs to equal the BG/L cluster in FP.
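A rough sketch of where numbers like "20 PEs" and "~5 BEs" could come from. Nothing in the announcement gives the PE layout, so everything in `peak_gflops` below is an assumption: 8 APUs per PE, each doing a 4-wide fused multiply-add (2 flops per lane) every cycle, and 4 PEs per Broadband Engine. The 5.6 TFLOPS BG/L figure is the one quoted earlier in the thread.

```python
import math

BG_RACK_GFLOPS = 5600.0  # 5.6 TFLOPS BG/L rack figure quoted in this thread


def peak_gflops(clock_ghz, apus=8, lanes=4, flops_per_lane=2):
    """Peak GFLOPS of one PE. The defaults are ASSUMPTIONS, not announced specs."""
    return clock_ghz * apus * lanes * flops_per_lane


PES_PER_BE = 4  # ASSUMPTION: a 65nm Broadband Engine holds four PEs

pe_gflops = peak_gflops(4.8)
be_gflops = PES_PER_BE * pe_gflops

# Whole chips needed to match the BG/L rack on paper.
pes_needed = math.ceil(BG_RACK_GFLOPS / pe_gflops)
bes_needed = math.ceil(BG_RACK_GFLOPS / be_gflops)

# The same PE if the ALU clock turns out to be 2.4GHz instead.
pe_gflops_at_half_clock = peak_gflops(2.4)

print(pe_gflops, be_gflops, pes_needed, bes_needed, pe_gflops_at_half_clock)
```

Under those assumptions a PE peaks at about 307 GFLOPS, which lands close to the "20 PEs" and "~5 BEs" in the post. Halving the clock halves the peak, so the PE count would double.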

PC-Engine said:
100 GB/s external bandwidth???

Uh, that's 614GB/sec or 1.2TB/sec (if it can dual read/write) per PE. Would be neat to see a 65nm Broadband Engine that approaches 5TB/sec in bandwidth to Local Storage.
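The post doesn't say how the 614GB/sec was derived. One combination that reproduces it is the 4.8GHz SRAM clock times 128 bytes moved per cycle; that width is a guess made here purely to match the figure, as is the count of 4 PEs per Broadband Engine. A small sketch under those assumptions:

```python
CLOCK_GHZ = 4.8        # SRAM clock quoted in this thread
BYTES_PER_CYCLE = 128  # ASSUMPTION: aggregate width chosen to reproduce 614GB/sec
PES_PER_BE = 4         # ASSUMPTION: PEs in a 65nm Broadband Engine

# GHz * bytes/cycle gives GB/sec directly.
one_way_gb_s = CLOCK_GHZ * BYTES_PER_CYCLE

# "If it can dual read/write": count both directions.
two_way_gb_s = 2 * one_way_gb_s

# Aggregate Local Storage bandwidth across a whole BE, in TB/sec.
be_two_way_tb_s = PES_PER_BE * two_way_gb_s / 1000

print(f"{one_way_gb_s:.1f} GB/sec one way, {two_way_gb_s:.1f} GB/sec both ways")
print(f"{be_two_way_tb_s:.2f} TB/sec per BE")
```

With those guesses the per-BE total comes out just under 5TB/sec, which matches the figure in the post.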

And as Dr. Kelly said, “Today, we're revealing just a sampling of what we believe makes the innovative Cell processor a premiere open platform for next-generation computing and entertainment products.” It's not a product unveiling, more akin to the early discussion of the 250MHz Vector Processor at ISSCC in '99 IIRC. Maybe there will be eDRAM, maybe it'll be more like a GPU.
 
PC-Engine said:
Let's see: SRAM running at chip speed, great, that's what SRAM is supposed to do, but where's the eDRAM to back up this little pool of SRAM??? How are you going to get 1 TFLOPS with only 1MB of SRAM and 100 GB/s external bandwidth??? Looks like real-world numbers aren't going to be too impressive, like everyone predicted. Oh and what's the die size?

Stream processing I'd assume.

Stream Processors: Programmability with Efficiency

While a conventional microprocessor or DSP can benefit from the locality and parallelism exposed by a stream program, it is unable to fully realize the parallelism and locality of streaming. A conventional processor has only a few (typically fewer than four, compared with hundreds for a stream processor) arithmetic units and thus is unable to exploit much of the parallelism exposed by a stream program. A conventional processor is unable to realize much kernel locality because it has too few processor registers (typically fewer than 32, compared with thousands for a stream processor) to capture the working set of a kernel. A processor’s cache memory is unable to exploit much of the producer-consumer locality because there is little reuse of consumed data (the data is read once and discarded). Also, a cache is reactive, waiting for the data to be requested before fetching it. In contrast, data is proactively fetched into an SRF so it is ready when needed. Finally, a cache replaces data without regard to its liveness (using a least-recently used or random replacement strategy) and often discards data that is still needed. In contrast, an SRF is managed by a compiler in such a manner that only dead data (data that is no longer of interest) is replaced to make room for new data.
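To make the quoted point concrete, here is a toy sketch (not how any real stream processor or Cell is implemented). `lru_hits` models a reactive LRU cache, and `stream_kernel` is a hypothetical stand-in for an SRF that stages the next block before the current one is processed. Both function names and the block size of 64 are invented for the example.

```python
from collections import OrderedDict


def lru_hits(addresses, capacity):
    """Count hits for a reactive LRU cache holding `capacity` entries."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[addr] = True
    return hits


def stream_kernel(data, block, kernel):
    """Apply `kernel` block by block, staging the next block ahead of use."""
    out = []
    staged = data[0:block]                 # first block is fetched up front
    for start in range(0, len(data), block):
        current = staged
        staged = data[start + block:start + 2 * block]  # proactive fetch
        out.extend(kernel(x) for x in current)
    return out


stream = list(range(1000))                 # every element is read exactly once
streaming_hits = lru_hits(stream, capacity=64)
result = stream_kernel(stream, block=64, kernel=lambda x: 2 * x + 1)

print("LRU hits on a read-once stream:", streaming_hits)
print("elements processed via staged blocks:", len(result))
```

Because each element is touched only once, the cache never gets a hit no matter how large it is, while the staged version always has its next block ready. That is the producer-consumer locality argument in miniature.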


Small image predicting 1.4 Teraflops at the 65 nm node.

http://www.acmqueue.com/figures/issue011/dallyfig9.jpg
 
And as Dr. Kelly said, “Today, we're revealing just a sampling of what we believe makes the innovative Cell processor a premiere open platform for next-generation computing and entertainment products.”

What's with the "premiere open platform for next-generation computing and entertainment products"? Open platform?
 
BTW, I should add to my BlueGene comparisons that these are kinda hard processors to compare, since they are really meant for different purposes.

A single BlueGene processor is kinda like a single APU, so each board is like a single Cell chip, so it's expected there would be a lot fewer Cell chips in one of these Cell racks. But if the Cell chips are running at 4.8GHz there is sure going to be a nice amount of heat to dissipate.

It will be interesting to see some more specific numbers on the Cell chip before commenting much more.
 
PC-Engine said:
Let's see: SRAM running at chip speed, great, that's what SRAM is supposed to do, but where's the eDRAM to back up this little pool of SRAM??? How are you going to get 1 TFLOPS with only 1MB of SRAM and 100 GB/s external bandwidth??? Looks like real-world numbers aren't going to be too impressive, like everyone predicted.

Always finding a way to be pessimistic about everything. I bet if PS3 renders Shrek 2 AND Finding Nemo simultaneously in realtime with one hand and cooks thanksgiving dinner with the other, you'd still find a way to criticize Sony. :rolleyes: Maybe there won't be enough salt in the gravy?
 
I love this:
Multi-thread, multicore architecture.
Now I would like to know if this multithread support thing is just an OS thing..or something more!
 
Guden Oden said:
PC-Engine said:
Let's see: SRAM running at chip speed, great, that's what SRAM is supposed to do, but where's the eDRAM to back up this little pool of SRAM??? How are you going to get 1 TFLOPS with only 1MB of SRAM and 100 GB/s external bandwidth??? Looks like real-world numbers aren't going to be too impressive, like everyone predicted.

Always finding a way to be pessimistic about everything. I bet if PS3 renders Shrek 2 AND Finding Nemo simultaneously in realtime with one hand and cooks thanksgiving dinner with the other, you'd still find a way to criticize Sony. :rolleyes: Maybe there won't be enough salt in the gravy?

Actually, what I just wrote was exactly what many have predicted; pessimism or optimism isn't the issue. As a matter of fact, there's no mention of eDRAM at all. Maybe the BB engine using the 65nm process will include eDRAM? This TFLOPS is starting to sound like 75 million polygons/sec all over again.

Uh, that's 614GB/sec or 1.2TB/sec (if it can dual read/write) per PE.

To 1MB of SRAM yes.

I've read that PS3 will use a downgraded version of XDR which is directly related to my eDRAM prediction.
 