nondescript
Regular
This new processor from ClearSpeed seems to show the viability of a CELL-type chip.
http://www.newscientist.com/news/news.jsp?id=ns99994274
Also in Wired: http://www.wired.com/news/technology/0,1282,60791,00.html
I just discovered these articles, so I haven't done much background reading yet. But for me, this erases any doubt that a 1-Tflop CELL is indeed possible. (EDIT: This means I think a 1-Tflop CELL is possible.)
As for the price, keep in mind that this is a low-volume specialized chip made by a small company - most of the price is R&D, not production. For CELL, that will obviously not be the case.
As I said in this post, I believe that an architecture that recognizes and exploits the massive data parallelism in computer graphics will be much more cost-effective for the performance it delivers.
You can see ClearSpeed's emphasis on data-parallel processing in their previous products.
http://www.eetimes.com/semi/news/OEG20010611S0119
http://www.newscientist.com/news/news.jsp?id=ns99994274
New chip gives PCs supercomputing muscle
18:04 14 October 03
NewScientist.com news service
A computer chip that will enable personal computers to perform some calculations as fast as some supercomputers was unveiled on Tuesday.
Developed by ClearSpeed Technologies, based in California, the CS301 chip is capable of 25 gigaflops - 25 billion "floating point" calculations per second. These arithmetical calculations are also a common measure of computing power.
A desktop Pentium processor operates at a few hundred million flops, while some of the most powerful computers in the world operate at a few hundred gigaflops. Putting around 20 ClearSpeed chips into a few personal computers could potentially provide the sort of power normally only found in a supercomputer built from hundreds of parallel processors or specialised hardware.
The CS301 works as a supplementary component to a regular processor. A chipset carrying one or two of the chips can be plugged into a normal PC like a graphics card and perform intensive calculations on behalf of the machine's normal processor. The chip is also very power-efficient, consuming only three watts, and ClearSpeed is working on a version for laptop computers.
"The goal here is to enhance supercomputers at one level," says Tom Beese, CEO of ClearSpeed. "But also to deliver a power-efficiency that means you can put a few chips inside a laptop, running alongside a Pentium, and have a gigaflop laptop."
Protein modelling
The CS301 would be especially suited to arithmetically intensive scientific applications such as protein modelling or geological data analysis. Beese says the chip is fast and efficient because it has been designed almost entirely to focus on performing mathematical calculations with around 70 per cent of its surface dedicated to number crunching.
ClearSpeed plans to start selling a PC-compatible version of the microprocessor to research companies and universities within the next few months. A price has yet to be finalised but Beese says a single chip will initially cost around $16,500.
Many supercomputers are built from large arrays of off-the-shelf processors, although there is also a growing return to the use of specialised hardware. The world's fastest supercomputer, NEC's Earth Simulator, is made from specialised components. It is theoretically capable of 35 thousand gigaflops or 35 trillion floating point operations per second.
Details of the CS301 chip will be announced at the Microprocessor Forum 2003, which takes place in California this week.
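To put the article's numbers side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the peak figures quoted above (25 gigaflops and three watts per CS301, 35 thousand gigaflops for the Earth Simulator) and assumes that throughput scales linearly with chip count, which ignores interconnect, memory bandwidth and everything else that makes real systems fall short of peak. The function and variable names are mine, not ClearSpeed's.

```python
import math

# Peak figures as quoted in the New Scientist article.
CS301_GFLOPS = 25.0                 # per chip
CS301_WATTS = 3.0                   # per chip
EARTH_SIMULATOR_GFLOPS = 35_000.0   # theoretical peak

def chips_needed(target_gflops, per_chip_gflops=CS301_GFLOPS):
    """Smallest whole number of chips whose combined peak reaches the target.

    Assumes perfect linear scaling, which is an idealisation.
    """
    return math.ceil(target_gflops / per_chip_gflops)

# "Around 20 ClearSpeed chips" from the article.
twenty_chip_gflops = 20 * CS301_GFLOPS

# Peak efficiency of a single chip.
gflops_per_watt = CS301_GFLOPS / CS301_WATTS

# What a 1-Tflop (1000-gigaflop) target would take at CS301 rates.
chips_for_1_tflop = chips_needed(1000.0)
watts_for_1_tflop = chips_for_1_tflop * CS301_WATTS

# Same exercise for the Earth Simulator's theoretical peak.
chips_for_earth_simulator = chips_needed(EARTH_SIMULATOR_GFLOPS)

print(twenty_chip_gflops, round(gflops_per_watt, 2),
      chips_for_1_tflop, watts_for_1_tflop, chips_for_earth_simulator)
```

On these idealised numbers, 20 chips give 500 gigaflops of peak, and reaching 1 Tflop takes 40 chips' worth of CS301 throughput at about 120 watts. That is peak arithmetic only, not a claim about what any real CELL design would look like.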
http://www.eetimes.com/semi/news/OEG20010611S0119
ClearSpeed revises graphics engine to process packets
By Chris Edwards
EE Times
June 12, 2001 (6:30 p.m. ET)
LONDON — At the Embedded Processor Forum this week, ClearSpeed Technology Ltd. (Bristol, England) will detail how it has taken an architecture originally designed to process 3D graphics and modified it to handle network packet processing at OC-768 (40-Gbit/second) data rates.
ClearSpeed, the recently renamed PixelFusion Ltd., said its original Fuzion 150 design combined embedded DRAM with a parallel-processing single-instruction, multiple-data (SIMD) array running at 400 MHz to accelerate graphics operations. ClearSpeed's modified design will run at similar speeds, but the company said it has redesigned the array to suit common networking operations such as Layer 3 and Layer 4 packet forwarding and classification, with simultaneous support for multiple protocols.
"There are a number of innovations we have made along the way. The main one is data-dependent processing," said Simon McIntosh-Smith, architecture program manager of ClearSpeed, who is due to make a presentation Thursday (June 14) at the Network Processor Forum portion of this year's Embedded Processor Forum.
In place of the Fuzion 150's unified SIMD array is a revised array composed of four independent processors, each of which controls 64 SIMD processing elements.
The array was split up to let each processor handle packets independently, which lets one unit handle in-depth processing without holding up simpler operations on the others.
As with the Fuzion 150, each element is made up of an arithmetic logic unit and its own area of memory.
"The four processors are completely independent. Inside each of these, the processing elements run off one instruction stream," McIntosh-Smith said. "The processing elements have a path where they can pass data to each other." This is useful in string searches such as those needed to classify packets based on their contents, he said.
"The data to be searched can be spread across the processing elements and searched in parallel," he said. "The processing elements are in a linear array. Each one talks to its left and right neighbor."
But the elements do not have to communicate through their neighbors. "Processing elements can access data from where they like independent of other processing elements," McIntosh-Smith said. "They can load data from completely different places."
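The parallel string search described above can be sketched in software. The Python below is only a simulation under my own assumptions, not ClearSpeed's design: `simd_search` and its parameters are hypothetical names, each processing element (PE) is modelled as a slice of the data in its own "local memory", and the neighbour path is modelled by handing each PE the few bytes that follow its slice so that matches straddling two PEs are still found. The outer loop plays the role of the single instruction stream; the inner loop over PEs stands in for work that the hardware would do simultaneously.

```python
def simd_search(data: bytes, pattern: bytes, num_pes: int = 64) -> list[int]:
    """Return every offset in data where pattern occurs (simulated SIMD array)."""
    m = len(pattern)
    if m == 0:
        return []

    # Spread the data across the PEs: one contiguous slice per local memory.
    chunk = max(1, -(-len(data) // num_pes))  # ceiling division
    local = [data[pe * chunk:(pe + 1) * chunk] for pe in range(num_pes)]

    # Neighbour path: each PE receives the m-1 bytes that follow its slice,
    # passed leftward along the linear array from its right-hand neighbour.
    halo = [b""] * num_pes
    for pe in range(num_pes - 2, -1, -1):
        halo[pe] = (local[pe + 1] + halo[pe + 1])[:m - 1]
    buffers = [local[pe] + halo[pe] for pe in range(num_pes)]

    hits = []
    for offset in range(chunk):        # one shared instruction stream
        for pe in range(num_pes):      # conceptually all PEs at once
            starts_in_own_slice = offset < len(local[pe])
            if starts_in_own_slice and buffers[pe][offset:offset + m] == pattern:
                hits.append(pe * chunk + offset)
    return sorted(hits)


# Usage: four PEs, three bytes each.
print(simd_search(b"abcabcabxabc", b"abc", num_pes=4))  # [0, 3, 9]
```

With 64 PEs and the same 12-byte input, each PE holds at most one byte, so every match crosses PE boundaries and depends entirely on the neighbour step. The result is the same list of offsets.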
McIntosh-Smith said the on-chip interconnect and I/O engines help speed up memory-intensive operations. The ClearConnect bus supports an aggregate bandwidth of 400 Gbits/s, providing access to off-chip packet memory and a table-lookup engine.
Instead of using content-addressable memory, the table-lookup engine has 2 Mbytes of compiled SRAM and a hardware accelerator to find forwarding addresses. An I/O engine passes the data to and from the processors, and a second one is used for external direct memory access transfers.
Each processor array has "a strong element of multithreading," McIntosh-Smith said. "We use it to overlap I/O and compute cycles. It can be processing one packet while fetching the next."
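The idea of "processing one packet while fetching the next" has a familiar software analogue, sketched below in Python. This is an illustration of the overlap pattern only, not of the chip's hardware multithreading: `fetch_packet`, `process_packet` and `run_pipeline` are hypothetical stand-ins, the fetch is a plain list lookup where a real system would do slow I/O, and the "compute" is just a 16-bit byte sum.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_packet(source, index):
    """Stand-in for a slow I/O fetch (hypothetical)."""
    return source[index]

def process_packet(packet):
    """Stand-in for compute work: a 16-bit sum of the payload bytes."""
    return sum(packet) & 0xFFFF

def run_pipeline(source):
    """Process packets in order, prefetching packet i+1 while computing on i."""
    results = []
    if not source:
        return results
    with ThreadPoolExecutor(max_workers=1) as io_thread:
        pending = io_thread.submit(fetch_packet, source, 0)
        for index in range(len(source)):
            packet = pending.result()           # wait for the current fetch
            if index + 1 < len(source):
                # Start the next fetch before computing on this packet.
                pending = io_thread.submit(fetch_packet, source, index + 1)
            results.append(process_packet(packet))
    return results

print(run_pipeline([b"\x01\x02", b"\x03", b""]))  # [3, 3, 0]
```

The point of the pattern is that the fetch for the next packet is already in flight while the current one is being processed, so I/O latency is hidden behind compute whenever the two take comparable time.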
The architecture does not limit each processor to 64 SIMD processing elements. "There can be up to 256 processing elements per processor," said McIntosh-Smith.
The company has built a test chip for its architecture using a 0.15-micron process.
Chris Edwards is editor of Electronics Times, EE Times' sister publication in the United Kingdom.