Was Cell any good? *spawn

Not to change the focus of the topic, but I think it is a relevant tangent: one measure of an architecture is not only whether it meets immediate needs but whether it has a clear path to success in the future.

So, is the Cell architecture a good candidate for 2013 products?

Just looking at the TDP and die area projected for a 4-PPE/32-SPE Cell 2 (based on old IBM roadmaps), it would definitely appear to be on the large side, and the higher-power side, of today's processors. It would obviously have very high peak performance and a lot of cores; but how does it fit with what we have learned about vectorizing code, and how do today's algorithms fare in the face of Amdahl's law?

I am curious what sebbbi thinks, but it seems for IBM the answer was no, and it appears a lot of people prefer fewer, more robust CPU cores plus compute on the GPU over the middle ground Cell attempts to occupy. Does anyone disagree with this--and if so, why?
 
A problem with that is that the architecture would be refined in a new Cell, and we don't know how that'd pan out. A Cell 2 could incorporate a far better PPE, alternative memory systems, changes to the SPEs, and who knows what else. As long as it has a PPE and SPEs, and still executes existing Cell code in native fashion (not emulated), it'll count as a Cell processor.

I don't take IBM dropping Cell as proof-positive that Cell was a dead end. There can be economic factors where a good idea doesn't get developed; there can even be market forces that see a worse idea adopted in favour of a much better option. IBM may not like the SPE idea much as that was Toshiba's dream, and so want to drop SPEs even if they were very capable going forwards.

That's not to say Cell isn't worth dropping - only that the fact that it has been dropped doesn't equate to its intrinsic value as an architecture.
 
You said: "and still executes existing Cell code in native fashion". Isn't that one of the problems/hurdles? The roadmaps called for upping the SPE LS, IIRC, but that would break current code (and possibly be slower).
 

If I were designing an updated Cell, what would I do:

Add another 256-512 KB of local store to each SPE and make it switchable between LS and coherent cache at 64 KB or 128 KB granularity.

Double or quadruple the SIMD width to 256/512 bits.

Replace the PPE with a POWER7 core.

Enable the instruction stream to come from main memory or the LS. This includes adding a 16-64 KB instruction cache.
 
You said: "and still executes existing Cell code in native fashion". Isn't that one of the problems/hurdles? The roadmaps called for upping the SPE LS, IIRC, but that would break current code (and possibly be slower).
I don't know the particular hurdles of CPU design, but I don't see why this would be an issue any more than lengthening pipelines, changing caches, etc., hampered evolution of the x86 platform into very different processors all in the same family. I certainly wouldn't expect Cell 2 to be just several Cells on a die any more than I'd expect any other processor upgrade (x86, ARM, GPUs) to be more of exactly the same. There's nothing particular about Cell that precludes significant architectural advances.

Aaron's description is obviously much more useful than my posts as he suggests architectural changes that he knows would be viable.
 

It was noted by some of the developers posting here at B3D, those familiar with the PS3 Cell, that increasing the LS size would break straight backwards compatibility and that such a larger LS would be slower.

Hence I think that is in direct tension with "and still executes existing Cell code in native fashion". Changing the SPEs' memory is going to have an impact on compatibility and performance (that is one of the reasons the smaller LS size was chosen).

And I think that is at the core of one of the architectural differences between Cell and a modern processor line like x86. Code that ran on a Pentium D runs on a Core 2, runs on a Sandy Bridge, runs on an Ivy Bridge. You may not take advantage of all the new features (more cores, new extensions like AVX, etc.), but it is going to run, and run faster. With Cell, moving to the Cell 2 on the IBM roadmaps, increasing the LS first of all means going back and getting the code just to function; and then, if 3.2GHz is determined to be the best TDP target (keep frequency but increase core count), you may run into issues, as the larger LS is going to slow the SPE down.

Those were the sort of architectural issues pointed out around 2006 and 2007 by PS3 devs here. The mantra "just increase the LS size" wasn't as easy as "just increase the cache size", as the LS is not a cache, and SPEs don't work like, and are not programmed like, an x86 chip.

Maybe an experienced developer, or a poster with good CE knowledge, can correct what I am saying, but I am pretty sure this is what some of the more knowledgeable developers noted--and it is one of the tradeoffs of the Cell architecture's approach.

I would like to know if my memory is wrong in this case.
 
At worst, it might not perform the same if the LS gets slower. I can't see how this would be a compatibility issue, or an issue at all, as long as the total performance is similar or faster.
Similarly, the LS size was chosen because of die-size constraints.
 
Per my memory, and Shifty's, the LS size was chosen as a balance between size and latency, and not simply because of die constraints:

Shifty said:
Regards the LS, IBM chose 256 kB as the best compromise between size and latency. Larger storage means more cycles per read. High clock is also more valuable in some processing cases where parallelism isn't good.

As for the comment on compatibility, my memory is that increasing the LS size has an impact on code compatibility. As for it being "faster": a bigger LS has more latency, and there is no guarantee that in a new system the SPEs are going to be clocked above 3.2GHz. In fact, tossing in 32 of these things, plus 2-4 PPEs and more L2 (L3 too?), I am not sure why, given the TDP constraints of a console, we believe we can not only quadruple the processing cores, improve the PPEs (POWER7?), improve the SPEs, increase the LS size, add more cache, and crank up the memory frequency, but ALSO raise the SPE clock to compensate for the increased LS latency.

This is an issue for backwards compatibility, but it also adds an extra layer of work for dropping in old code. Which goes back to my original comment about looking at Cell as an architecture and its "forward looking" approach, and whether it is really no different than x86 in that respect--which it seems it is not.
 
Yes, the issue is the latency with the LS.

But - why do we care if it is 100% B/C? PS4 sure as hell is going to have a *much* different CPU architecture than PS3 did, so B/C has already been deemed to be of second-tier importance. Suffice to say, an evolved Cell would have made an emulated PS3 environment easier to achieve than will an x86 environment. I think that just goes without saying.

As for the architecture itself, the IBM roadmap would have to be tossed in the trash anyway within this context. Improved PPE - I'm on board with the Power7 idea - and more versatile memory structures could make for reasonable trade-offs towards greater approachability. In tandem with modern advances in transistor power saving (on an already low power chip) and possible throttled clocks, and I think you'd be looking pretty decent on a die size/performance/wattage basis. Two Power7 cores & 16 SPE-neo's, with additional on-die cache considerations and the requisite I/O... plenty CPU enough.

I/O concerns on the present die would have demanded a re-architected chip anyway at some point, as the non-equal shrink ratios of those components have been leading to increased "dead" space on the standard Cell die for some time now. I forget whether this was going to be addressed at 32nm.
 
Add another 256-512 KB of local store to each SPE and make it switchable between LS and coherent cache at 64 KB or 128 KB granularity.
Right, that's the big one IMHO. If Cell 2/whatever had that I'd be a lot more interested in it as an architecture. Of course that sort of change isn't free, but I'd definitely sacrifice a pile of raw compute power on Cell for more *useful* cycles.

And yeah, backwards compatibility in an architecture post-mortem thread? Who cares :)
 
I remember reading in the Cell BE documentation that LS was something that was explicitly called out as being subject to expansion in future revisions. Whether semiconductor technology would permit a larger LS to be constructed today with the same timing characteristics that the original Cell had, who knows.

I don't know how much of an advantage a larger LS would have, though. It seemed pretty clear that Cell was designed to scale primarily by increasing the number of SPEs.

All moot questions now, though.
 
It'd be nice to know. If Cell couldn't simply be upgraded with full BC, that'd be one reason why the architecture was dropped, as it couldn't be salvaged for the existing install base.

Ok, thank you for distilling what I was trying to say. I think, as an architecture, this is important because (a) the PS3 is by far the biggest Cell client, and (b) Cell was initially pitched to be in everything from servers, game consoles, TVs, and set-top boxes to home appliances. This sort of ecosystem works for x86, but would the revisions called for in Cell (e.g. changes to LS size and latency) have broken code compatibility? That could be an issue that impacts the longevity of the platform and how "good" it was for those goals.

I will leave it to others more knowledgeable to comment. I wish nAo were here to clarify, as I may be misremembering things he and others said. (As an irrelevant aside, Phil and I were discussing how we thought Cell was a good long-term move because it offered a scalable platform, and it was PS3 developers who chimed in that some of the changes needed to make Cell better would cause issues like added latency and broken code.)

It really is odd in some ways thinking of the PS4 without a Cell 2. The SPEs have such high peak throughput and are so small that it is not hard to imagine a 4-core enhanced PPE (or even 2 cores) plus 32 SPEs fitting, at 28nm, into the die area and TDP console makers are looking at. That right there is over 1 TFLOPS of CPU performance. If developers could extract performance from such a chip at a reasonable effort, you would think Sony would be all over it, as MS would have no real answer.

For whatever reason, Sony is not confident pushing out such a product. It raises eyebrows if they go with a 4-core x86 chip when a 2xPPE + 16xSPE configuration fits inside the same budgets. Surely Sony wouldn't abandon the latter if they thought it offered the huge benefits the paper specs imply.
 
It's actually the reverse - the eyebrows would be raised if after billions of dollars gambled on R&D and fab build outs, Sony went that route again. As an architecture, I think an evolution on Cell would be absolutely viable as both a processor effort in its own right and the CPU for the PS4. But, with Sony as the primary financier of the previous effort, and no stomach in the year 2012 for any losses or financial commitments beyond what's necessary, Cell is in cryo.

Instead they'll go with a "generic" architecture with ample tools support and a high level of existing developer familiarity, with a per-unit cost structure that is - maybe if not absolutely as low as an in house IC could be - advantaged for being completely predictable in its expense and ancillary costs over the life of the product. Right now for Sony and the gaming division, I can think of little else that would be at the top of their prime directives list.

Consoles in the year 2013 won't be defined by their ambition, but by their cost controls.
 
I remember reading in the Cell BE documentation that LS was something that was explicitly called out as being subject to expansion in future revisions. Whether semiconductor technology would permit a larger LS to be constructed today with the same timing characteristics that the original Cell had, who knows.
If my memory serves me correctly, the limitation was the speed of light.
So with current technology, and the shrink down from 65 nm, an increased LS shouldn't be a problem.
 
If my memory serves me correctly, the limitation was the speed of light.
So with current technology, and the shrink down from 65 nm, an increased LS shouldn't be a problem.

The relative electrical signal propagation speed goes down, not up, as you scale to smaller geometries. The wires themselves become narrower, increasing ohmic resistance, and capacitive coupling goes up because wires are placed closer together.

Cheers
 
The relative electrical signal propagation speed goes down, not up, as you scale to smaller geometries. The wires themselves become narrower, increasing ohmic resistance, and capacitive coupling goes up because wires are placed closer together.

Cheers
Well, that certainly would be a problem. :D
 
Cell was a terrible idea, but it worked out in the end. People suggesting that it's a good idea because, after 5 years, people can finally do some GPU-style algorithms on it need to look at the larger picture and put it into perspective.

The transistor cost was not worth it. If they had gone with a smaller budget on Cell and something conventional like the 360's, they would have cut the CPU transistor budget down by half. For having 2x the number of CPU transistors as the 360, the increased performance is negligible. This, along with the PS3's inability to show off its performance advantages until late into the generation, must have made for fewer PS3 sales. Designing for this power to be "locked away" is a bad idea, and Sony has learned from that with the PS4.

I wouldn't say Cell was a good idea at all; it was just very good programmers making use of what they had. I think almost every programmer would have preferred a more modern GPU with a bigger transistor budget than what Cell+RSX provided, but they learned to deal with Cell over time. At the time the PS3 launched, a theoretical chip (or chips) given the same number of CPU and GPU transistors and the same TDP could undoubtedly have been more easily programmed and performed much better in gaming (at least earlier on) than what the PS3 provided. Even today, the increased production cost of Cell probably buys very little advantage over the 360 when the same amount of development time and skill is used to make a game.

The pros of Cell came at a huge cost for Sony: all the R&D and all the extra development effort for games. Compounded by not having an obvious advantage over the 360 until recently, this really hurt the PS3.
 