2 PPEs
32 SPEs
45nm SOI
~ 1TFLOPs
2010
My guess is that in 2012 we will see the PS4 with a CELL on 32nm with ~ 2x that. I am curious if the PPEs will be more robust (OOOe? How much more cache?) and how much the LS in the SPEs will have grown? Likewise if there will be more synergy in the Synergistic Processing Units.
DT: It seems like you finished the Cell chip designs early. The first prototypes came out in 2004 and this is 2006. Did you still need a lot of development time after that first tape out?
JK: We used that first tape out to get the initial software up and running. There were modifications we did to the chip over time. The design center is still active and participating. Our roadmap shows we are continuing down the cost reduction path. We have a 65 nanometer part. We are continuing the cost reductions. We have another vector where we are going after more performance. We have talked about enhanced double-precision chips. Architecturally we have double precision but we will fully exploit that capability from a performance point of view. That will be useful in high-performance computing and open another set of markets.
DT: That sounds like it’s not a PlayStation 3 chip?
JK: Yeah, it is a different vector. For us to extrapolate. We will push the number of special processing units. By 2010, we will shoot for a teraflop on a chip. I think it establishes there is a roadmap. We want to invest in it. For those that want to invest in the software, it shows that there is life in this architecture as we continue to move forward.
DT: Right now you’re at 200 gigaflops?
JK: We’re in the low 200s now.
DT: So that is five times faster by 2010?
JK: Four or five times faster. Yes, you basically need about 32 special processing units.
And what do they say about Advanced CELL in 2008? The photo is too blurry to read. More SPEs?
During a recent event, IBM unveiled a few details of the Cell roadmap. As you can see, Cell will be manufactured at 65nm next year, and a next-gen version of the chip is expected around 2010 featuring 2 PPEs and 32 SPEs (45nm manufacturing technology).
Seems the 65nm spin isn't due till 2008 though...
Jawedme said: By 2010 32nm should be viable
http://www.theregister.co.uk/2006/01...iba_32nm_cell/
which is an ~8x area improvement while 1TFLOP is only a 4x improvement. So, ahem, 1TFLOP is extremely conservative.
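A quick back-of-the-envelope check of those two ratios, as a minimal sketch. It assumes the current Cell is the 90nm, 8 SPE part, that transistor density scales ideally with the square of the feature size, and that peak FLOPs scale with SPE count; the function name is just for illustration.

```python
def ideal_area_gain(old_nm: float, new_nm: float) -> float:
    """Ideal density gain from a shrink: area scales with feature size squared."""
    return (old_nm / new_nm) ** 2

# Assumption: today's Cell is built on 90nm with 8 SPEs.
area_gain = ideal_area_gain(90, 32)   # how many more transistors fit in the same die area
spe_gain = 32 / 8                     # SPE count increase on the roadmap part

print(f"90nm -> 32nm ideal area gain: {area_gain:.2f}x")
print(f"8 -> 32 SPEs: {spe_gain:.0f}x")
```

Real shrinks never hit the ideal figure (I/O, analog and wiring don't scale as well), so treat the area number as an upper bound.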
The trio first announced its plan to cooperate on the development of Cell and its underlying 90nm and 65nm fabrication technology back in 2001. Back then, they described the project as a five-year programme costing $400m.
It just goes to show how difficult these process nodes are. This second quoted text seems almost comically optimistic, making me wonder if it's the truth. Of course "reach" is not the same as "stamping out millions" - Intel has just demonstrated 45nm, but it's almost a year before you'll be able to buy one.
That said, Sony and Toshiba already have a separate 45nm joint development programme in place. In February 2004, the companies announced they would spend $190m to reach 45nm in 2005, at the same time other chip companies, most notably Intel, were reaching 65nm. Not that there's been any public announcement of late that the pair have achieved that goal.
The enhanced Cell is the DP flavour only, it seems. Also no mention of smaller Cells. Presumably IBM's goals differ from Sony's and Toshiba's, whose roadmaps might include 1:4s and the like? Or is IBM the sole developer of new Cell breeds?
Here's a better look
IBM must be planning to enhance the PPEs significantly by that process generation.
In some cases, 8 or fewer SPEs have been shown to monopolize a single 2-wide PPE.
32 probably more powerful SPEs would be waiting on just 2 PPEs in this future version.
I don't know if 'slow' is the right word for Cell DP performance... because it's hardly slow at all compared to other chips' DP performance! That said, the targets for the enhanced DP chips will make them a true monster.
In performance per watt they're pretty bloody useless compared to the state of the art.
http://www.clearspeed.com/products/cs_advance/
50 GFLOPs at 25W for a 2 chip add-in board. Currently Cell is doing ~25 GFLOPs and consumes more than 25W.
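To put the comparison in GFLOPs per watt, here is a rough sketch using only the figures quoted above. Cell's exact power draw isn't given, only "more than 25W", so the Cell number is an upper bound; the helper name is made up for the example.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Simple efficiency metric: sustained GFLOPs divided by power draw."""
    return gflops / watts

clearspeed = gflops_per_watt(50, 25)        # 2-chip add-in board
cell_upper_bound = gflops_per_watt(25, 25)  # Cell draws MORE than 25W, so the real figure is lower

print(f"ClearSpeed board: {clearspeed:.1f} GFLOPs/W")
print(f"Cell (DP): below {cell_upper_bound:.1f} GFLOPs/W")
```

So on these numbers the ClearSpeed board is at least 2x ahead in DP per watt.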
Why would they be waiting?
What are the cases you are referring to?
I mean the SPEs are quite independent bastards, they can access I/O and memory without involving the PPE if you want them to.
Thanks! There were a number of threads some months ago detailing technical demos and some test game engines that ran on CELL.
The PPE is not magically free to do whatever it wants when the SPEs are being utilized. In high-demand scenarios, a significant portion of its time is still devoted to coordination.
The school of fish demo, if I remember correctly, devoted half of the PPE's cycles to coordinating the SPEs. If a similar proportion existed on the 32 SPE cell, the PPEs would be used up completely, with no time left over for system tasks or any other processing.
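The extrapolation works out like this, as a rough sketch. It assumes the demo ran on all 8 SPEs and that coordination cost grows linearly with the number of SPEs, neither of which is confirmed; the names are only for illustration.

```python
def ppe_load(num_spes: int, ppe_fraction_per_spe: float) -> float:
    """PPE time (in units of whole PPEs) spent coordinating num_spes SPEs."""
    return num_spes * ppe_fraction_per_spe

# Assumption: half of one PPE was spent coordinating 8 SPEs in the demo.
per_spe = 0.5 / 8

load_now = ppe_load(8, per_spe)      # fraction of today's single PPE
load_future = ppe_load(32, per_spe)  # PPEs' worth of work on the 32 SPE part

print(f"8 SPEs:  {load_now:.2f} PPE")
print(f"32 SPEs: {load_future:.2f} PPEs (the roadmap chip only has 2)")
```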
There were some other programming projects that showed that naively using the PPE as a director while also having it pre-package and convert data for SPE consumption would lead to CELL being PPE-bottlenecked after using only 4 SPEs. The PPE becomes a bottleneck more rapidly than some had previously thought, so they had to rebalance the workload.
Like I said, the PPE bottleneck can be reduced by devoting SPEs towards conversion and packaging, something that would be less expensive using ~4 SPEs out of 32 than 2 out of 8.
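The relative cost in numbers, as a small sketch using the SPE counts from the post (the ~4 and 2 are the rough figures above, not measurements; the function name is made up):

```python
def helper_overhead(helper_spes: int, total_spes: int) -> float:
    """Fraction of the SPE pool given up to conversion/packaging duty."""
    return helper_spes / total_spes

future = helper_overhead(4, 32)  # 32 SPE part
today = helper_overhead(2, 8)    # current 8 SPE Cell

print(f"32 SPE Cell: {future:.1%} of SPEs lost, {32 - 4} left for real work")
print(f"8 SPE Cell:  {today:.1%} of SPEs lost, {8 - 2} left for real work")
```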