Cell

Today's highest performing CPU core can do 8 GFLOPS at 333 MHz. If clocked at 1 GHz, you get 24 GFLOPS per core, so 64 cores would get you 1.5 TFLOPS.
 
I think it's interesting that a paper coming from IBM talking about the CELL chips of 2011 uses a performance figure which is a third of the rumored 1 TFLOP for the very first CELL chip. Not to mention it's talking about 16 cores on a die, not 64.
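A quick sketch of the scaling behind both figures (assuming simple linear scaling with clock and core count, which is of course an idealisation):

# Linear scaling of the quoted 8 GFLOPS @ 333 MHz figure (idealised assumption)
base_gflops = 8.0
base_mhz = 333.0
target_mhz = 1000.0

per_core = base_gflops * target_mhz / base_mhz  # ~24 GFLOPS per core at 1 GHz
print(per_core * 64)  # ~1537 GFLOPS, i.e. the 1.5 TFLOPS figure
print(per_core * 16)  # ~384 GFLOPS, roughly a third of 1 TFLOP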

Serge
 
MfA said:
That's gonna be one huge die though.

Yeah, even stripping down the MIPS64 20Kc core and clocking it at 1 GHz yields only 4 GFLOPS for ~3M transistors. They're going to have to resort to a more vector/scientific processor approach. Which is why it irks me when people say 'General Computing' - the line between 'dedicated computing' (VS) and 'general computing' (VU) is narrowing quickly.

Their ability to hit 1 TFLOP will be totally dependent upon the process size they can use. It's difficult to imagine that performance on .10um just due to transistor limitations. Sub-.10 micron looks to be a necessity, but just how far south is a good question.
 
A better question might be how far south of .10 micron can we possibly go? We're definitely getting close to the limits of silicon transistor technology.

For example, it's fairly unrealistic to believe that chips will be able to ever reach 100GHz, and I doubt silicon processes will get below 0.01 microns (I'd say these are very conservative estimates, too...we'll probably not even get this far).

Hopefully quantum technologies will become viable before we hit these limits.
 
I think it's interesting that a paper coming from IBM talking about the CELL chips of 2011 uses a performance figure which is a third of the rumored 1 TFLOP for the very first CELL chip. Not to mention it's talking about 16 cores on a die, not 64.

What FLOPS they will achieve depends largely on what kind of processor they are going to put into that cellular arrangement. If the processor is more general purpose, then they will have lower FLOPS performance compared to a special-purpose one like, say, a Vector Unit.

Also, I thought that 64 number referred to 64 chips in a package of 20x20cm, not 64 cores in a single die.

The chip they are projecting for 2008 seems to be around 400 mm2; that's one huge chip.
 
V3, that's fair. As far as the article goes, it refers to 64 chips with 16 cores each. I definitely am not knowledgeable enough to say "1 TFLOP on a chip in 2005, without a doubt that's BS". I do think it sounds very fishy though...
 
As far as what will be used for a rasterizer in PS3.... Well, my understanding was that Sony would have EE3 (500M transistors) and Graphics Synthesizer 3 (unknown number of transistors, but likely astronomical) chips by 2005 or 2006. These chips would form the basis of the PS3.

This information, although 2-3 years old now, was from Sony/Ken K.

The EE3, according to Sony, would be a "radically new architecture". I think that sounds like CELL to me. Whereas the EE2 of 2002, which is intended for workstations, part of Phase II of Sony's long-term plans, and unreleased afaik, would be an enhanced version of the EE1 architecture.

The rasterizer in PS3, I would think, is going to be a fairly traditional rasterizer. I do think Sony will opt for a GS3 as they have said in the past. At least the rasterizing portion of GS3 will be a traditional rasterizer, meaning that I do not think that CELL will be used as a rasterizer, or that a second CELL will do the rasterizing. Either the main CELL CPU in PS3 will do the transform and lighting for a GS3 (as the EE's Vector Units do for the GS in PS2), or a second CELL will be bolted onto a GS3 rasterizer to act as the T&L/VU/Vertex Shader portion of GS3.

The rasterizing portion of GS3, or the whole GS3, will most likely be a massively parallel version of GS2 with more features and image-processing enhancements (AA, texture compression, etc.).
 
Chalnoth said:
A better question might be how far south of .10 micron can we possibly go? We're definitely getting close to the limits of silicon transistor technology.

For example, it's fairly unrealistic to believe that chips will be able to ever reach 100GHz, and I doubt silicon processes will get below 0.01 microns (I'd say these are very conservative estimates, too...we'll probably not even get this far).

Hopefully quantum technologies will become viable before we hit these limits.


Didn't IBM make a chip for communications that could do 200 GHz?
 
BTW, the Blue Gene project:
When is IBM going to deliver the machine for the medical research on gene functions? Don't they have a deadline?
 
PC-Engine said:
Today's highest performing CPU core can do 8 GFLOPS at 333 MHz. If clocked at 1 GHz, you get 24 GFLOPS per core, so 64 cores would get you 1.5 TFLOPS.

For an IEEE-compliant FPU, count about half a million transistors.

Let's be generous and say that each FPU can do a MADD each cycle. Let's say this chip runs at 1 GHz.

Then we get 5*10^5 transistors/FPU * 1500 GFLOPS / (2 GFLOPS/FPU) = 375*10^6 transistors. Not bloody likely, is it?
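Gubbi's arithmetic can be checked in a few lines; a back-of-the-envelope sketch using the same assumed figures (0.5M transistors per FPU, one MADD = 2 FLOPs per cycle, a 1 GHz clock):

# Back-of-the-envelope check of the FPU transistor budget (assumed figures from the post)
transistors_per_fpu = 0.5e6   # ~0.5M transistors per IEEE-compliant FPU
flops_per_fpu = 2 * 1e9       # one MADD (2 FLOPs) per cycle at an assumed 1 GHz
target_flops = 1.5e12         # the 1.5 TFLOPS figure from the quoted post

fpus_needed = target_flops / flops_per_fpu            # 750 FPUs
total_transistors = fpus_needed * transistors_per_fpu
print(fpus_needed, total_transistors)                 # 750.0  375000000.0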

Either SONY/IBM will be nowhere near 1TFLOPS or CELL will be full of special purpose logic (and not general purpose).

Cheers
Gubbi
 
Gubbi,

is it really that unlikely to achieve 375M transistors by 2005? I mean, R300 already has 117M at .15u. Assuming an equal transistors/area ratio, you could fit 375M into the same area as an R300 by shrinking the die to ~.08u. Intel and others are already working on .09u and .065u processes... And with the kind of parallelism we'll see in Cell, won't there be a bunch of opportunities to exploit synergetic effects? Anyway, all these estimates have been kind of generous. If they DO achieve 1 TFLOPS I'll be mighty impressed.
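The ~.08u figure follows from assuming transistor density scales with the inverse square of the feature size; a rough sketch of that arithmetic (idealised, ignoring design and process differences):

# Rough die-shrink scaling: density ~ 1 / (feature size)^2 (idealised assumption)
import math

r300_transistors = 117e6    # R300 at 0.15 um
r300_process_um = 0.15
target_transistors = 375e6

# Feature size needed to fit the target count into the same die area
target_process_um = r300_process_um * math.sqrt(r300_transistors / target_transistors)
print(round(target_process_um, 3))  # ~0.084 um, i.e. roughly the .08u mentioned above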


On an OT note, I thought this was kinda cool:
http://eet.com/at/news/OEG20020806S0030


Regards / ushac
 
Well, you have to consider that we're talking 375M logic transistors. And that's just for the FPUs; then you need the apparatus to issue instructions to these, you need to move data to/from these, etc. So we're talking close to 500M.

Also, SRAM arrays have much higher transistor density than logic (2-4 times). I'd love to know how many transistors of the R300 are in caches, FIFOs and other SRAM arrays. An uninformed guess is 50M (around 1 MByte in total).

Cheers
Gubbi
 
With forthcoming 0.09 micron tech, you can build a chip with, say, 350 million transistors. What clock rate it will reach is a completely different issue ;)

Also, this whole PS3/Cell hype is a bit pointless. There is no reason why you couldn't build a massively parallel system. The real question is how you are going to get linear/general program flow to run on it. There has been a lot of research on parallel algorithms, and they work well IN SPECIAL CASES. It is a completely different thing to get some general game engine with physics/AI etc. to run even half decently on a parallel architecture. No middleware, no compiler is gonna offer any foolproof solution to this. Sony's PS2 is a good example of what happens when you give developers a parallel system to program: you have a lot of hidden power that is very difficult to dig up and use.

Besides, the usable calculation power of a parallel system doesn't grow in a linear fashion... With 2 processors/cores you get, say, a ~1.8x speedup, with four a ~3.24x speedup, etc.
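The quoted speedups happen to match a model where about 10% of efficiency is lost per doubling of core count; a small illustrative sketch (the 0.9 factor is my assumption, picked to reproduce the numbers above, not anything from the article):

# Illustrative scaling model: ~90% efficiency retained per doubling of cores (assumed)
import math

def speedup(cores, efficiency_per_doubling=0.9):
    doublings = math.log2(cores)
    return cores * (efficiency_per_doubling ** doublings)

for n in (2, 4, 8, 16):
    print(n, round(speedup(n), 2))  # 2 -> 1.8, 4 -> 3.24, 8 -> 5.83, 16 -> 10.5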
 
I'm no expert at this, but doesn't ~50M (~43%) sound a little high considering that the GF4Ti has ~14M (~22%)? Anyway, I'd say it's questionable whether they can deliver what they promise, but not impossible.

Does anyone know what feature width they are targeting Cell at? How will the fact that they are tuning a manufacturing process explicitly for the Cell chip affect size/yield/clock, etc.? Has anyone seen any roadmaps for when the .1u-.06u category of sizes is estimated to be ready for commercial production?

BTW, phynicle, those extremely high clocked chips are usually a single transistor - a transmission amplifier or so in exotic materials which often don't lend themselves well to microprocessor manufacturing techniques.

Regards / ushac
 
phynicle said:
didn't IBM make a chip for communications that could do 200gighertz???

I think it was closer to 20GHz, based on a silicon-germanium process.

Anyway, I didn't explain the full extent of the limitation. It's essentially based on the size of the circuit. Yes, it may be possible to build a 1THz processor, but that processor would be exceptionally tiny. The barrier here is essentially just the speed of light. As soon as you attempt to produce clock speeds that get too high for the size of the chip, the chip will start radiating like crazy, and the resistance of the metal will also increase significantly.
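To put a rough number on the size argument: even at the speed of light (real on-chip signals propagate considerably slower), the distance covered in one clock period shrinks fast as the clock goes up. A quick sketch:

# Distance light travels in one clock period (an upper bound; on-chip signals are slower)
c = 3e8  # speed of light in m/s

for freq_ghz in (1, 10, 100, 1000):
    period_s = 1.0 / (freq_ghz * 1e9)
    distance_mm = c * period_s * 1000
    print(freq_ghz, "GHz ->", round(distance_mm, 2), "mm per cycle")
# At 100 GHz that's ~3 mm, and at 1 THz ~0.3 mm: far smaller than a typical die.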
 
What kind of heat will a 16+ core chip generate?

That depends on the core, the process they are using, the clock speed, etc.

Here's what that article predicted:

By 2008, however, a 400-mm2 chip would easily be able to accommodate 16 processors running at 6 GHz or higher. If designed for optimal efficiency for typical applications, each processor should take up only about 5 mm2 on the die (This area could accommodate a two-issue out-of-order processor with 32K instruction and data caches and a single-instruction-multiple-data (SIMD) multimedia unit.), leaving approximately 320 mm2 for memory and communication.
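The area split in that projection is just the arithmetic of the quoted figures; a quick check:

# Die-area budget from the article's 2008 projection
die_area_mm2 = 400
cores = 16
area_per_core_mm2 = 5

core_area = cores * area_per_core_mm2   # 80 mm^2 for the 16 processors
remaining = die_area_mm2 - core_area    # 320 mm^2 left for memory and communication
print(core_area, remaining)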

From that article, it seems that Cell is just a way to utilise the high transistor counts that they are going to achieve in the future. Instead of making a more complex processor with fancy things to get more performance (which is not efficient, according to them), they propose to use simpler processors, but many of them, to get the performance. Well, that's the vibe I am getting from that article anyway.

Are they right? What do you think?

They also have a prediction of transistor density in one of the tables in that article.
 
Are they right? What do you think?

Yes, I do.

Games have large data sets that can be broken up and then processed by different processors easily. That's just games. In the real world, when was the last time you ran a single program at once? It must have been years ago really, because Windows and a lot of other programs run on your machine at once: virus scanner, firewall, ICQ, God knows what else... and that's just on your desktop. The thing is, with so many little programs running, it seems weird to have one CPU jump around doing all these tasks rather than executing them at the same time. It'd be nice not to have my HDD accesses slow the computer to a crawl (yeah, yeah, that's more IDE's fault).

Regardless, we constantly hear that we don't need faster CPUs, and I agree: we need wider CPUs, ones that can do a lot of things at once. We're used to doing things in the foreground while things are being done in the background. Think about how one works in the kitchen; I'm sure most people don't do one task at a time, they let one task happen all by its lonesome while they do something else.

We live in a world where many things happen at the same time; it's time CPUs began to take large steps into this world.
 