IBM unveils Cell roadmap

Shinjisan

cell2007pw6.jpg


During a recent event IBM unveiled a few details of its Cell roadmap. As you can see, Cell will be manufactured at 65nm next year, and a next-gen version of the chip is expected around 2010 featuring 2 PPEs and 32 SPEs (45nm manufacturing technology).
 
2 PPEs
32 SPEs
45nm SOI
~ 1TFLOPs
2010

My guess is that in 2012 we will see the PS4 with a CELL on 32nm with ~ 2x that. I am curious if the PPEs will be more robust (OOOe? How much more cache?) and how much the LS in the SPEs will have grown? Likewise if there will be more synergy in the Synergistic Processing Units.

Glad to see IBM is fulfilling my "bold" prediction of CELL as a platform. All the growing pains now will be offset in the future by a stable platform. Devs should be able to hit PS4 running.

EDIT: Imagine the ironies of MS picking a CPU that has 3 PPEs... and SPEs.
 
But at the same time, we could hope for a clock speed increase or no? Why have so many cores beyond 2PPEs and 32SPEs at that point (unless you're expecting devs to find that many tasks for PS4)? It's tough to predict that far but... I don't know. I'd figure they could get away with their next gen prediction there and ramp up clockspeeds for individual thread performance. 32SPE's sounds pretty high as it is...

And just a thought... are these roadmaps generally on the conservative side or are they just on the edge of wishful thinking (like say... end of fiscal 2010 instead of during calendar 2010)?


edit: there's an old mini-interview here (Oct 26 2006) with Jim Kahle: http://blogs.mercurynews.com/aei/2006/10/the_playstation.html

I guess he kinda already revealed the 32SPE thing here:
DT: It seems like you finished the Cell chip designs early. The first prototypes came out in 2004 and this is 2006. Did you still need a lot of development time after that first tape out?
JK: We used that first tape out to get the initial software up and running. There were modifications we did to the chip over time. The design center is still active and participating. Our roadmap shows we are continuing down the cost reduction path. We have a 65 nanometer part. We are continuing the cost reductions. We have another vector where we are going after more performance. We have talked about enhanced double-precision chips. Architecturally we have double precision but we will fully exploit that capability from a performance point of view. That will be useful in high-performance computing and open another set of markets.

DT: That sounds like it’s not a PlayStation 3 chip?

JK: Yeah, it is a different vector. For us to extrapolate. We will push the number of special processing units. By 2010, we will shoot for a teraflop on a chip. I think it establishes there is a roadmap. We want to invest in it. For those that want to invest in the software, it shows that there is life in this architecture as we continue to move forward.

DT: Right now you’re at 200 gigaflops?

JK: We’re in the low 200s now.

DT: So that is five times faster by 2010?

JK: Four or five times faster. Yes, you basically need about 32 special processing units.


More info on roadmap here (Nov 3):

http://www.ppcnux.com/modules.php?name=News&file=article&sid=6666
 
And what do they say about the Advanced CELL in 2008? The photo is too blurry to read. More SPEs?

Here's a better look ;)

cellroadmapxk3.jpg


We had heard of the 1Tflop goal for 2010 before, but I'll say again that this seems quite conservative if it's simply a scaling from where we are now. How much different would the chip be to soak up the pure performance gains that could be made scaling what we have today (aside from being smaller/cheaper :p)?
 
cell2007pw6.jpg


During a recent event IBM unveiled a few details of its Cell roadmap. As you can see, Cell will be manufactured at 65nm next year, and a next-gen version of the chip is expected around 2010 featuring 2 PPEs and 32 SPEs (45nm manufacturing technology).

Seems the 65nm spin isn't due till 2008 though...
 
It's interesting, actually, that Register article (from January 2006) talks about various process nodes:

The trio first announced its plan to cooperate on the development of Cell and its underlying 90nm and 65nm fabrication technology back in 2001. Back then, they described the project as a five-year programme costing $400m.

That said, Sony and Toshiba already have a separate 45nm joint development programme in place. In February 2004, the companies announced they would spend $190m to reach 45nm in 2005, at the same time other chip companies, most notably Intel, were reaching 65nm. Not that there's been any public announcement of late that the pair have achieved that goal.
It just goes to show how difficult these process nodes are. This second quoted text seems almost comically optimistic, making me wonder if it's the truth. Of course "reach" is not the same as "stamping out millions" - Intel has just demonstrated 45nm, but it's almost a year before you'll be able to buy one.

32nm looks kinda unlikely for 2010. Didn't think about this in these terms before, so 45nm/1TFLOP doesn't seem particularly conservative.

I dare say GPUs should be in the region of 2TFLOPs by then.

Jawed
 
Here's a better look ;)
The enhanced Cell is the DP flavour only, it seems. Also no mention of smaller Cells. Presumably IBM's goals differ from Sony's and Toshiba's, whose roadmaps might include 1:4's and the like? Or is IBM the sole developer of new Cell breeds?

As for the 1 teraflop estimate, I guess scaling the SPE's up by a factor of 4, 2:32 would come out about 1 Teraflop if 1:8 is 250 GFlops. That might be double precision though, or with larger LS etc.
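
The scaling guess above can be checked with a few lines of arithmetic. This is only a rough sketch, assuming peak throughput scales linearly with SPE count at an unchanged clock and ignoring whatever the PPEs contribute. The 250 GFLOPS figure for the 1:8 part is the round number used in the post, and the function name is made up for illustration.

```python
def scaled_peak_gflops(base_gflops, base_spes, target_spes):
    """Estimate peak GFLOPS after changing the SPE count.

    Assumption: throughput is proportional to the number of SPEs
    (same clock, same per-SPE throughput, PPE contribution ignored).
    """
    return base_gflops * target_spes / base_spes


# 1:8 Cell taken as ~250 GFLOPS, the round figure from the post.
estimate = scaled_peak_gflops(250.0, 8, 32)
print(f"2:32 estimate: {estimate:.0f} GFLOPS")  # 2:32 estimate: 1000 GFLOPS
```

Starting from the "low 200s" quoted in the Kahle interview instead of 250 gives about 800 GFLOPS under the same assumption, which fits his "four or five times faster" remark only if clocks or per-SPE throughput also improve.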
 
I think in High Performance Computing, double-precision is a must.

The 16,000 Cells going into that supercomputer are just "dummies" as far as I can tell (obviously they can do DP, but slowly) - they'll eventually be replaced by the true DP Cell in 2008.

It'll be interesting to see the performance per watt of the 65nm Cell-DP versus Clearspeed versus a GPU. They should all be 65nm DP in 2008.

Jawed
 
IBM must be planning to enhance the PPEs significantly by that process generation.

In some cases, 8 or fewer SPEs have been shown to monopolize a single 2-wide PPE.
32 probably more powerful SPEs would be waiting on just 2 PPEs in this future version.
To keep the overall proportion of 1 PPE per 8 SPEs, each PPE would need at least twice the execution throughput to direct all the SPEs.

Alternatively, IBM could count on future code to offload more of the management work to SPEs, which are somewhat "cheaper" when there are 32 of them, or make the SPEs more flexible somehow.

The ring-bus would likely need some work as well. Otherwise, some of those SPEs are going to be very distant from the PPE and DMA engine(s) indeed.
 
I don't know if 'slow' is the right word for Cell DP performance... because it's hardly slow at all compared to other chips' DP performance! That said, the targets for the enhanced DP chips will make them a true monster.

And yeah this is just IBM's roadmap; we've seen Sony's before and I think we may have seen Toshiba's as well. Both companies will be making Cell on a bulk non-SOI process as well starting at 65nm, so there will begin to be some appreciable product differentiation within the BE architecture.

Also I'm not sure why speed bumps and increased cores are being viewed as mutually exclusive here; at 45 and 32nm, there should be ample room for ratcheting up both.
 
IBM must be planning to enhance the PPEs significantly by that process generation.

In some cases, 8 or fewer SPEs have been shown to monopolize a single 2-wide PPE.
32 probably more powerful SPEs would be waiting on just 2 PPEs in this future version.
Why would they be waiting?
What are the cases you are referring to?
I mean the SPEs are quite independent bastards, they can access I/O and memory without involving the PPE if you want them to.
 
I don't know if 'slow' is the right word for Cell DP performance... because it's hardly slow at all compared to other chips' DP performance! That said, the targets for the enhanced DP chips will make them a true monster.
In performance per watt they're pretty bloody useless compared to the state of the art.

http://www.clearspeed.com/products/cs_advance/

50 GFLOPs at 25W for a 2 chip add-in board. Currently Cell is doing ~25 GFLOPs and consumes more than 25W. How much more I dunno, IBM is pretty secretive as far as I can tell - making a lot of noise about performance per watt, but not coming clean. But it doesn't really matter because Cell is currently just a placeholder for DP Cell.

Jawed
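
For reference, the performance-per-watt comparison above works out as follows. This is a rough sketch using only the figures quoted in the post (50 GFLOPS at 25 W for the Clearspeed board, ~25 GFLOPS DP for Cell). Cell's real power draw is not given, only "more than 25W", so 25 W is plugged in as a floor and the Cell result is an upper bound on its efficiency. The function name is hypothetical.

```python
def gflops_per_watt(gflops, watts):
    """Double-precision throughput delivered per watt."""
    return gflops / watts


# Clearspeed 2-chip add-in board, figures as quoted in the post.
clearspeed = gflops_per_watt(50.0, 25.0)

# Cell at ~25 GFLOPS DP. Its power draw is only known to exceed 25 W,
# so using 25 W here gives a best case for Cell (an assumption).
cell_best_case = gflops_per_watt(25.0, 25.0)

print(clearspeed)                   # 2.0
print(cell_best_case)               # 1.0
print(clearspeed / cell_best_case)  # 2.0
```

So on these numbers the Clearspeed board is at least twice as efficient in DP, and the gap widens by however much Cell actually draws beyond 25 W.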
 
Guys, I have a dumbass question to ask, if I may?

What are the advantages of double precision in normal and game-related tasks? (I'm more interested in the game-related part TBH :) )
 
In performance per watt they're pretty bloody useless compared to the state of the art.

http://www.clearspeed.com/products/cs_advance/

50 GFLOPs at 25W for a 2 chip add-in board. Currently Cell is doing ~25 GFLOPs and consumes more than 25W.

From what I've read, the Clearspeed chips are a lot more limited: all the elements run exactly the same program, whereas SPEs can run completely different programs.

That roadmap looks pretty much for high performance DP parts only. The 65nm parts exist already for the PS3 but they're not the same.
 
Why would they be waiting?
What are the cases you are referring to?
I mean the SPEs are quite independent bastards, they can access I/O and memory without involving the PPE if you want them to.

There were a number of threads that came out some months ago detailing technical demos and some test game engines that ran on CELL.
The PPE is not magically free to do whatever it wants when the SPEs are being utilized. In high-demand scenarios, a significant portion of its time is still devoted to coordination.

The school of fish demo, if I remember correctly, devoted half of the PPE's cycles to coordinating the SPEs. If a similar proportion existed on the 32 SPE cell, the PPEs would be used up completely, with no time left over for system tasks or any other processing.

There were some other programming projects that showed that naively using the PPE as a director, and also having it pre-package and convert data for SPE consumption, would lead to CELL being PPE-bottlenecked after using only 4 SPEs. The PPE becomes a bottleneck more rapidly than some had previously thought, so they had to rebalance the workload.

Like I said, the PPE bottleneck can be reduced by devoting SPEs towards conversion and packaging, something that would be less expensive using ~4 SPEs out of 32 than 2 out of 8.
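
To put numbers on the argument above, here is a back-of-the-envelope sketch. It assumes coordination cost grows linearly with the number of SPEs each PPE directs and takes the half-recalled "half of the PPE's cycles for 8 SPEs" figure from the fish demo at face value. Real workloads will not be this tidy, and the function name is made up for illustration.

```python
def ppe_load(spes, ppes, load_per_spe):
    """Fraction of each PPE's cycles spent coordinating SPEs.

    Assumption: cost is linear in the number of SPEs per PPE.
    """
    return (spes / ppes) * load_per_spe


# Assumption from the post: ~half of one PPE coordinates 8 SPEs.
per_spe = 0.5 / 8

print(ppe_load(8, 1, per_spe))   # 0.5 -> today's 1:8 part
print(ppe_load(32, 2, per_spe))  # 1.0 -> both PPEs fully consumed

# Share of SPEs given up to conversion and packaging work.
print(2 / 8, 4 / 32)             # 0.25 0.125
```

Under these assumptions the 2:32 part leaves no PPE headroom at all, while handing packaging work to 4 of 32 SPEs costs half as large a share of the SPE pool as 2 of 8 does.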
 
Thanks!
I guess it's also fair to assume that a lot of HPC applications will not be as tuned as future PS3 games, where the dependency on the PPE will be significantly reduced, as predicted by DeanoC. The HPC apps cannot be tuned to the same degree because they cannot rely on a fixed hardware configuration.
Most of them will also likely run Linux and use some legacy cluster software to help distribute the work, which will also benefit from more PPE power.

The Cell IBM will be selling for HPC will probably be more dependent on powerful PPEs than the Cell of the PS4.

I am curious about what size the internal store of SPEs will have in the 2010 version. I wonder if they will go above 512 kB in size.
 