Clearspeed announces CELL-like processor

zidane1strife · Oct 15, 2003

I doubt the cell chip will be 1tflop . It will be high for the time but not 1tflop.

I too doubt it will be 1Tflops...

but the fact that the techology can yeild a 10+GHz IC is what's important.

indeed...

ed

Paul · Oct 15, 2003

I too doubt it will be 1Tflops...

I know what Zidane is thinking.

MfA · Oct 15, 2003

The powerconsumption of desktop processors is also projected to be well over 100 Watt by that time, and desktop processors have a far lower average switching frequency than Cell (I think Sony could get away with a 100 Watt processor, but some significant surface area of the outer case will need to consist of cooling fins ... and I dont know wether large heatpipes are cheap).

Josiah · Oct 15, 2003

Re: ...

Vince said:
Josiah said:

The unification of shaders has to do with sharing of hardware resources, something Cell does not do.

Click to expand...

How does Cell not "share resources". You can run anything from Physics to basically any Shader program. It's the exact embodiment that, on a high-level, we'll one day see PC IC's processing.

To say that a Unified Shader is unlike what you can do with Cell is like saying that Icing doesn't work on Cake.

On a higher level the difference between how Cell works and how an ATI card works should be very obvious. On a lower level, Cell (as I understand it) is a group of processor cores, split into clusters, split into execution units. These elements do not share resources such as registers or ALUs, but each independantly work on a "cell" of data.

On a high level a unified shader model is just an implementation detail of the API. On a low level ILDPs like R500 allocate hardware resources as needed by the program, rather than feeding clusters of instructions through static function units (as Cell apparently does).

Vince said:
Josiah said:

Cell can be compared to a giant FPGA controlled by software. The chip itself does very little inherently, it's just a fast platform for the applets ("cells") to run on.

Click to expand...

Ok, now this is a very obtuse comment IMHO. So, by this very logic, the EmotionEngine (which is clearly a "father" to Cell) is also like a giant FPGA. As are the TCL-front-ends in the NV3x and R3x0 which have Vec processors arranged in a "loose" contruct. As will be the future IHV hardware with a Unified Shading Model running on them.

If you think in such abstract terms, you can also say Cell isn't that much different from a digital watch (they both use electricity) or an apple (they both are constructed with atomic particles). The key difference here is that Cell does not contain anything like a hardware graphics pipeline, the graphics pipeline is the software you feed it (if that is what you feed it, Sony intends to use this architechture in a wide range of devices). Any graphics card available today or in the next few years is the reverse (you won't find a GeforceFX or an R500 in a Walkman).

Vince said:
Josiah said:

A DirectX style GPU OTOH has a clear graphics pipeline defined by silicon. With NV3X the chip is the architechture, with Cell the software is the architechture.

Click to expand...

Ok, now this is so very wrong. DirectX is an abstraction, totally irrelevant to what's running below it, obscured by the driver. The NV3x is the shining example of my case IMHO that clearly shouts Bullshit! to your argument. And you can see the effect better "software" does to an architecture like it by looking at the latest benchmarks using their new driver, I think it's 52.xx. There was a massive, almost 50%+, improvement in preformance just by having the driver better arrange and mediate register use and data flow between DX and the underlying hardware.

I don't see your point here. 3dfx Voodoo, which was very much a fixed-function platform, also increased performance through driver revisions, and performed much better using its own API (Glide) than Direct3D. This is simply due to the fact that Direct3D is a high level API, "compiled" on the fly by the driver for the target hardware, meaning there is a huge capacity for waste.

jvd · Oct 16, 2003

Panajev2001a said:
jvd said:

Vince said:

Katsuaki Tsurushima; [url said:

http://eetimes.com/semi/news/OEG20031015S0027[/url]]The Japanese consumer electronics giant announced earlier this year that it will invest a total of $4 billion over the next three years in semiconductor-manufacturing facilities, and another $4 billion in R&D for key devices, including semiconductors, displays and batteries. The total includes investment plan for 65-nanometer process technology on 300-mm wafers, which Sony considers critical to the Cell processor it is designing jointly with IBM Corp. and Toshiba Corp.

The Cell microprocessor, expected to be the main product at a new 300-mm wafer fab to be constructed by Sony, is targeted to provide teraflops performance and consume relatively little power. The processor will be used in future versions of the company's Playstation game console, as well as in various broadband network nodes, according to Sony.

Click to expand...

Click to expand...

Dang thats alot of money if the ps3 tanks . Lets hope there are no acts of god that destroy those plants and cause the ps3 to fail .

Oh and clock speed doesn't matter for the end result . It only matters when compareing the same chips . Higher mhz is allways better . But chip to chip alot more plays into it . Look at the athlon and the p4.

I doubt the cell chip will be 1tflop . It will be high for the time but not 1tflop. I would put it around 500gflops at the most and most likely 250 gflops sustained and it will still be extremly impressive .

Click to expand...

I posted the same ind of news ( only with links to Sony's own site ) a while ago and jvd notices it now ?

This was all present in Sony's IR website for a good while and I have repeated over and over the $4 Billions invested in CELL R&D and the $4 Billions invested for Semiconductor R&D all over the course of the next three years.

no recognition for the good ol' Panajev

I like Vince's link as it puts, one more time, CELL together with PlayStation 3 even though I would think that by now there would be no doubt about that

CELL will not be only used for PlayStation 3 and that investment is needed to help Sony save money in the future and mantain their competitive edge technologically wise: if CELL and other chips produced thanks to these investments allow Sony to produce most of the ICs it now buys from third parties ( $2+ Billions worth of ICs each year ) the whole R&D investment will kinda pay for itself if you think about it.

No i noticed it then too . I just felt like being sarcastic about the only thing ruining sonys plans is an act of god. Because it really don't matter if they pumped all the money in the world into the cell chip. IT could still easily fail.

Saem · Oct 16, 2003

I can definately see Josiah's point.

Having read a few articles and lectures slides from universities about reconfigurable computing, I think his description of Cell being FPGA isn't far off. There are actually a few proposed academic designs which are reminescent of Cell, which are designed to optimise themselves to a certain task by effectively allocating it's processing elements via software for whatever tasks are thrown at it. This is where Josiah seems to hit the nail squarely on the head and where you (Vince) seem to be missing his point.

Now the NV3x and so on, are different. As Josiah said, you can change the allocation mechanism (software) in cell and have it reconfigure itself for various tasks. What josiah is saying is that in NV3x et al, this is not the case. Here you have a fixed hardware unit which looks at in comming data (instructions + information) to work on.

The other side of the argument is that Cell isn't reconfigurable since the execution resources are static in their definition. The thing to note here is that this is more granuality and sharing can be facilitated if need be.

The unified shading architecture will likely differ a fair bit from Cell. Cell will approach matters in running software to deal with the way software will run on the machine. While the graphics architectures will use hardware to run the software on the machine. Of course, there will eventually be microcode programming et al and finally the two approaches will converge.

But the level of abstraction that Josiah is talking about is definately less abstract and his points remain quite valid. I have yet to see a good argument to the contrary.

Vince · Oct 16, 2003

Saem said:
The other side of the argument is that Cell isn't reconfigurable since the execution resources are static in their definition. The thing to note here is that this is more granuality and sharing can be facilitated if need be.

That is my point, and this:

What josiah is saying is that in NV3x et al, this is not the case. Here you have a fixed hardware unit which looks at in comming data (instructions + information) to work on

Begs the question of how you can draw such a distinction when developers can already do rudimentary physics in shaders I believe and if not will be able to soon enough.

There is no fundimental micro-architectural difference between a VU, an APU, or a future VS/FS. Hell, I just read in another thread about a guy who networked like 30 workstation NV3x's and ran weather simulations on them.

The unified shading architecture will likely differ a fair bit from Cell. Cell will approach matters in running software to deal with the way software will run on the machine. While the graphics architectures will use hardware to run the software on the machine. Of course, there will eventually be microcode programming et al and finally the two approaches will converge.

In both cases there is an underlying hardware which has software running on/over it. In both cases, especially in a Unified Shading Architecture and Cell, you'll see developer input changing the allocation of resources to some extent - if this is consciously manipulated via the Cell software developer or a driver-esque automatic compiling/allocating is irrelevent. Perhaps you can draw a line in this grey, I don't personally as I see where it's going.

But the level of abstraction that Josiah is talking about is definately less abstract and his points remain quite valid. I have yet to see a good argument to the contrary.

Obviously I disagree. And I'll respond to him when I get back, but I need to run out.

Paul · Oct 16, 2003

IT could still easily fail.

It can't.

Sure it could come out underpowered, late or whatknot but Sony still buys 8 billion in IC anually, not having to do this because they can use Cell in all their products means it's a success to Sony either way.

Cell can't "fail" but sure it could be underpowered or late.

nondescript · Oct 16, 2003

It's been 12-hours since I last checked the thread...so some of this is going to be about things 20 posts ago.

To whoever posted the Micron stuff:

Yeah, that sounds very promising. Micron's starting from the memory end and building logic, while Clearspeed starting from the logic end and is embedding more memory close to the processors. Definitely, I think the ultimate solution is somewhere in between, with logic and memory units close together, and ridiculous amounts of bandwidth.

Like Hannibal (the Arstechnica guy, not the Carthaginian general) said, the bottleneck for PCs is definitely the bus speed and latency - putting memory w/ logic solves this problem. Which is why eDRAM is so important for CELL.

Cell like FPGA?

I don't pretend to know what CELL is, other than what I can scrounge from the patents and from other ppl who have, but its definitely not a FPGA.

I've used the Xilinx Spartan line of FPGAs, and all it is 5000 or so "slices" of logic and memory. Each slice has some flip-flops, and a two-tier AND-OR array for combinatory logic. There are another 200 or so I/O modules that you can assign the chip pins to. You design a circuit, the Xilinx complier maps it on to the chip, and you have your circuit. You can't reprogram the hardware on the fly, you need to restart each time you change the logic. "Field-Programmable" simply means it doesn't take special equipment to reprogram it.

FPGA don't "optimise themselves to a certain task by effectively allocating it's processing elements via software for whatever tasks are thrown at it." (Saem) They don't do anything at all unless you design it in.

An FPGA is a bunch of components. You build things with the components. Sure, you could build a CELL-like chip with an FPGA - but that's your design and not an intrinsic property of the FPGA itself.

FPGA's are used for prototyping and low-volume applications where it isn't economical to fab a ASIC.

That's very different from CELL.

MfA · Oct 16, 2003

Cell is like a memory anemic cluster based supercomputer.

nondescript · Oct 16, 2003

nondescript said:
(No justification right now for the 1.0V or the 20% savings - I'm headed out for dinner, and in a rush, get back to it later)

Getting back to it.

First, the 1.0V:

Looking and the SIA roadmap for high-performance chips, we see that the projected voltage for the 100nm node is 1.2-0.9V, for the 70nm node is 0.9-0.6V and for the 50nm node is 0.6-0.5V. The SIA doesn't expect 70nm to be attained until 2008, and 50nm until 2011 - but STI is a little ahead of the curb here. If CELL is being fabbed at 65nm, 1.0V is already a conservative estimate.

SIA Roadmap: http://public.itrs.net/files/1999_SIA_Roadmap/ORTC.pdf
Page 18 for the power projections.

Now, for the blanket 20% power savings.

With SOI and eDRAM and all that good stuff, we can expect power savings beyond the simple scaling of bulk CMOS.

http://pr.fujitsu.com/en/news/2001/02/9-1.html

The newly developed SOI chip achieves a 24% reduction in power consumption.

http://www.ssec.honeywell.com/aerospace/new/080699.html

SOI technologies offer 40 percent greater speed and 30 percent power reduction when compared to conventional bulk CMOS.

As you can see, 20% is already a conservative estimate.

So there, that's my justification.

Vince · Oct 16, 2003

Re: ...

Josiah said:
On a higher level the difference between how Cell works and how an ATI card works should be very obvious. On a lower level, Cell (as I understand it) is a group of processor cores, split into clusters, split into execution units. These elements do not share resources such as registers or ALUs, but each independantly work on a "cell" of data.

On a high level a unified shader model is just an implementation detail of the API. On a low level ILDPs like R500 allocate hardware resources as needed by the program, rather than feeding clusters of instructions through static function units (as Cell apparently does).

Ok, read what you just wrote. To me the difference between Cell running a shader program on an arbirary number of APUs is just about equivalent to a NV50 or R500 running a Shader program on a Unified Shading Architecture with an arbitrary number of constructs running it. Under both are clusters of FP logic which is being controlled per application and churning out data. In both circumstances there are finergrained constructs which are arbitrarily divided on a per task basis, which I'll touch on after the following reply.

Vince said:
If you think in such abstract terms, you can also say Cell isn't that much different from a digital watch (they both use electricity) or an apple (they both are constructed with atomic particles).

Ok, this is true but not going anywhere.

The key difference here is that Cell does not contain anything like a hardware graphics pipeline, the graphics pipeline is the software you feed it

Who ever said this? The patent outlined hardwired functionality for the "dumb" raster functions which are highly iterative (eg. Filtering, etal). It's just that in front of these pseudo-pipelines are a significant amount of FP logic that's arranged in constructs composed of 4 FPUs and 4 FXUs. As opposed to the R300's VS which has something like a 4-way SIMD unit and scalar pipe, which is almost the same, no? Perhaps closer to a VU than an APU. Correct me if I'm wrong, I think I might be because I think the R300's are more complex, like multi-tiered - regardless, the idea remains constant.

(if that is what you feed it, Sony intends to use this architechture in a wide range of devices). Any graphics card available today or in the next few years is the reverse (you won't find a GeforceFX or an R500 in a Walkman).

And application is irrelevent, but just to throw it in. You can already do some level of scientific apps "in software" as you like to say on a Graphics card. This will only get more common with time.

Vince · Oct 16, 2003

Nice reply Nondescript.

MfA said:
Cell is like a memory anemic cluster based supercomputer.

I'd trademark this phrase while you still can.

Saem · Oct 16, 2003

Vince,

Stupid issues with articulating arguments on the net.

So my take on the matter is Josiah's PoV is valid and isn't all wrong. Now that doesn't mean I think yours (Vince) is wrong. One of the points of my post was to elaborate on the grey area and then draw a line in it to define what your side and Josiah's side are.

Additionally, note that I mentioned that they're convergent paths! I don't think Josiah's arguement as I understand it excludes this possiblity. I think you're arguing your (Vince) position too strongly.

DeadmeatGA · Oct 16, 2003

....

From the ClearSpeed product pdf..

Software Development
The CS301 is programmed in C which has been extended
with the poly keyword used to identify data to be processed
on the array.
The ClearSpeed Software Development Kit includes a
C compiler, a graphical debugger and a full suite of supporting
tools and libraries.
The C compiler is based on ANSI C with simple extensions
to support the CS301 architecture.
The debugger supports all features required by professional
software developers: simple & complex breakpoints, watchpoints,
single-stepping and symbolic source-level debug.

See? PixelFusion doesn't do auto parallelization out of serial code.

Brimstone · Oct 16, 2003

nondescript said:
It's been 12-hours since I last checked the thread...so some of this is going to be about things 20 posts ago.

To whoever posted the Micron stuff:

Yeah, that sounds very promising. Micron's starting from the memory end and building logic, while Clearspeed starting from the logic end and is embedding more memory close to the processors. Definitely, I think the ultimate solution is somewhere in between, with logic and memory units close together, and ridiculous amounts of bandwidth.

Like Hannibal (the Arstechnica guy, not the Carthaginian general) said, the bottleneck for PCs is definitely the bus speed and latency - putting memory w/ logic solves this problem. Which is why eDRAM is so important for CELL.

The idea for combining DRAM and Logic has been around for awhile. It is mainly refered to a IRAM (Intelligent RAM). It looks like Micron is calling their version of it Active Memory.

One of the people thats the driving force behind IRAM is Dave Patterson. His research credits include the RISC CPU design standard and the RAID standard that many file servers utilize. Dave Patterson is a real heavy weight when it comes to research. So after RISC and RAID his third big project is the IRAM concept.

Here is a good article from Wired August of 1996.

http://iram.cs.berkeley.edu/articles/wired.85.lo.jpg

Another fun to read article on IRAM.

http://iram.cs.berkeley.edu/papers/IRAM.computer.pdf

Anyway, could the rumor that Team X-Box reported on that Microsoft has signed a contract with IBM be for an IRAM chip for the X-Box 2?

Marriage of expedience
A professor's latest innovation might change computing -- again.
By Bridgit Ekland

This article is from the June 15 and July 1, 2001, issue of Red Herring magazine.

Dave Patterson is a master at pulling off technology coups. In the early '80s, the University of California at Berkeley computer science professor invented a microprocessor design called RISC, which replaced long sets of processing instructions with smaller and faster sets. It's the engine driving many of the large servers in operation today. A short time later, he invented a more reliable -- and now ubiquitous -- data-storage technology called redundant array of inexpensive disks, which offers fast, reliable, cheap mass storage.

Now Mr. Patterson is on the verge of announcing yet another engineering feat. This time, he's going after the brains of computers with a chip design he calls intelligent random access memory (IRAM). Simply put, IRAM defies conventional computing economics by combining a microprocessor and a memory chip on a single piece of silicon.

After five years of work, the professor and his team of ten graduate students have handed a detailed design of IRAM to IBM (NYSE: IBM), which will fabricate the prototype chip. The plan is to begin testing the prototype this fall, in applications like multimedia and portable systems.

SPEED OF BYTE
Mr. Patterson's invention doesn't mark the first time engineers have tried to marry a microprocessor and a memory chip. Similar ideas, like graphics chips integrated with data-storage devices, have filtered onto the market, particularly in Sony PlayStations and set-top boxes. But these haven't approached the promise of the faster and more efficient IRAM chip. If the IRAM design takes hold in the chip industry, Mr. Patterson's invention may accelerate the market for a new generation of handheld computers that would combine wireless communications, television, speech recognition, graphics, and video games. "I believe in the post-PC era and the gadgets, cell phones, and PDAs," Mr. Patterson says.

One application Mr. Patterson has in mind is to leverage IRAM so that a handheld like the Palm can be used as a tape recorder with speech recognition and file-index capabilities. For example, the device would enable someone to locate and hear what a colleague, "John," has said about "computer privacy." The user would simply repeat those words, and the IRAM technology in the Palm would recognize the voice command and find the specific passages.

Mr. Patterson is essentially attempting to remove a thorn in the side of the microprocessor industry: the bottleneck that has long restrained processing speeds. Over the last two decades, the speed of microprocessors has increased more than 100-fold. But while memory chips, known as DRAMs, have kept pace in terms of capacity, their speed has increased only by about a factor of ten. As a result, microprocessors spend more time waiting for data and less time doing valuable computations. And as the gap between speeds grows, methods to help alleviate the problem, like memory caching, are being maxed out.

Mr. Patterson believes that any IRAM-designed microprocessor could potentially access memory 100 times faster than is currently possible. The performance in, say, a wireless device would be comparable to that of the average PC. Eventually, other chip giants will try to place microprocessors alongside DRAM on a single chip. But Mr. Patterson is impatient; he thinks it can be done now, intelligently, and with benefits far greater than others might imagine.

Mr. Patterson, a former college wrestler who bench-pressed 350 pounds on his 50th birthday, isn't one to shy away from tough odds. While the expected payoff is a faster chip, IRAM is a huge gamble, given that Intel (Nasdaq: INTC) chips dominate the market. Another hurdle: if Intel or any other company adopts the new chip, engineers would have to learn how to program it. "Almost all of these new types of media processor-kind of chips are difficult to develop software for," says Pete Glaskowsky, senior analyst for MicroDesign Resources, a market research firm. "They all say that their product is easy to develop software for. I've learned to discount that statement."

If history is any indicator, IRAM might level the playing field, just as RISC once allowed Sun Microsystems (Nasdaq: SUNW) and IBM to challenge Intel.

http://www.cs.berkeley.edu/~pattrsn/articles/RedHerring.html

This is a good article also.

http://www.coe.berkeley.edu/labnotes/0102iram.html

DeadmeatGA · Oct 16, 2003

...

Well, I got the die size data of CS301.

41 million transistors take up 72 square mm using an IBM 0.13 silicon-on-insulator process

For CELL fans, don't cheer yet because CS301 is a SIMD processor; only one instruction decoder, one control unit and one instruction cache shared among 64 FPUs, and is not heavily pipelined to support higher clock. EE3 on the other hand is a 18-way MIMD(2 PPC cores + 16 active VUs + 2 spare VUs) plus 2 MB of SRAM cache, so it will no doubt be massive in dize size and give a poor yield, plus a programming model that makes CS301 programming look like a kiddy stuff.

Sorry, Kutaragi's dream of 1 TFLOPS per chip still has to wait until 2007 or 2008 and still cost a bucketload of cash to afford....

Megadrive1988 · Oct 16, 2003

Deadmeat, do you feel that PS3 will offer the same leap over PS2, as PS2 offered over PS1?

I know you are not a fan of PSX/PS1 or PSX2/PS2, and I am not either.
Sony has not yet showed they are capable of impressive graphics quality. sure their Playstations were very powerful opon their releases
(lots of polygons for the time) oh, and they did have the first consumer device with embedded memory (Graphics Synth in PS2) but that's about it.

I am more of a fan of Lockheed Martin's Real3D, PowerVR and ArtX/ATI.

(but wait Sony fans!)
I do however have some hope that PS3 will offer much improved graphics quality and new way of rendering real-time graphics. I dont know if PS3 will reach 1TFLOP or not. It doesnt really matter if its 900 GFLOPs or 1.2 TFLOPs, PS3 will only get 100~300 sustained GFLOPs anyway. what *will* matter more, is if developers can do something new, with any given amount of flops. new way of rendering (procedural textures, dynamic lighting, sub-division, shaders etc.)

And I'm also looking forward to PSP for the simple fact that it will FINALLY bring true competition to the handheld arena, thus cooler games.

TheMightyPuck · Oct 16, 2003

I still think the bottleneck in next generation consoles is going to be developer talent (of course I guess that is always the case). Sony's biggest challenge ain't CELL. It isn't even exectuting on cost effective console hardware (over the course of its life including any die shrinks, integration of components etc.) It is providing development tools for the CELL based PS3 to make possible reasonable production times and costs for PS3 games relative to the competition. Of course in the existing Sony dominated marketplace, non AAA developers are willing to invest more time and money into developing PS2 software than they would for other platorms. If PS3 is a cheap and easy box to develop for, it will be able to leverage its massive brand advantage into continued console dominance even if its hardware doesn't quite live up to the hype.

Paul · Oct 16, 2003

PS3 will only get 100~300 sustained GFLOPs anyway.

Where do you derive such a number from? Noone has a clue of what the bandwidth of the e-DRAM on Cell will be.

Clearspeed announces CELL-like processor

zidane1strife

Paul

MfA

Josiah

jvd

Saem

Vince

Paul

nondescript

MfA

nondescript

Vince

Vince

Saem

DeadmeatGA

Brimstone

B3D Shockwave Rider

DeadmeatGA

Megadrive1988

TheMightyPuck

Paul

Similar threads