The Engineers Who Created Cell

phat said:
As a systems programmer, 128kB of cache is a lot more attractive to me than 256kB or even 512kB of plain local storage.

ok, swap LS to L1 cache, that's fine, but what do you do on a cache miss? With 8-10 cores an L1 cache miss will be 50-200 cycles, and an L2 cache miss 1000+ cycles.
And how much L2 cache for 9 cores, 3-4 MB?? This design totally sux...
 
I wonder whether major Sony developers such as Square Enix or Polyphony Digital had any influence on hardware design decisions for Cell, in the same way that Carmack is heavily involved in various technical advisory boards.
 
version said:
phat said:
As a systems programmer, 128kB of cache is a lot more attractive to me than 256kB or even 512kB of plain local storage.

ok, swap LS to L1 cache, that's fine, but what do you do on a cache miss? With 8-10 cores an L1 cache miss will be 50-200 cycles, and an L2 cache miss 1000+ cycles.
And how much L2 cache for 9 cores, 3-4 MB?? This design totally sux...

The amount of cache and average miss latency need not grow proportionally with the number of cores since not all cores will be missing their caches at the same time. The worst case latency of course increases proportionally but worst case will be statistically rare. With games, you don't need to plan for worst case--if the game frames out once every 30 seconds, so what?
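
To put rough numbers on "statistically rare", here's a minimal back-of-envelope sketch. The core count and per-core miss fraction are illustrative assumptions, not measurements of any real console CPU; the point is only that, if misses are roughly independent, the everyone-stalled-at-once worst case almost never happens.

```cpp
// Back-of-envelope sketch: probability that k of n cores have an outstanding
// cache miss at the same instant, assuming independent misses with per-core
// miss fraction p. All figures are illustrative assumptions.
#include <cmath>
#include <cstdio>

// n choose k, computed incrementally to avoid factorial overflow
static double binom(int n, int k) {
    double r = 1.0;
    for (int i = 1; i <= k; ++i) r *= double(n - k + i) / i;
    return r;
}

int main() {
    const int    n = 8;    // assumed number of cores sharing the memory subsystem
    const double p = 0.10; // assumed fraction of time a core is stalled on a miss

    for (int k = 0; k <= n; ++k) {
        double prob = binom(n, k) * std::pow(p, k) * std::pow(1.0 - p, n - k);
        std::printf("P(%d cores missing simultaneously) = %.8f\n", k, prob);
    }
    // Expected concurrent misses is n*p = 0.8, and P(all 8 queued at once) is
    // about 1e-8 under these assumptions, so the worst case is vanishingly rare.
    return 0;
}
```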

The advantage with cache vs dumb local store is that it makes your game design much more robust against changes in assumptions about data size and access patterns. With a dumb local store, you have to explicitly overlay data into the local store, and planning for that can be difficult and will make it very hard for you to know whether you're making optimal use of system bandwidth. If your assumptions change, as they are bound to, you'll have to redesign. With a cache, you can more easily iteratively design, profile, optimize, and rework where necessary. This sounds like sloppy engineering, but, in practice, works much better than a front-heavy workflow.
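
To make the overlay-versus-cache point concrete, here is a rough sketch of the two styles. The dma_get/dma_wait calls and the chunk size are hypothetical placeholders for whatever transfer API a local-store platform would provide (here they just copy synchronously so the sketch compiles and runs); nothing below is the actual Cell/SPE SDK interface.

```cpp
// Sketch only: explicit local-store overlays vs cache-transparent access.
// The DMA calls are hypothetical stand-ins for a platform transfer API.
#include <algorithm>
#include <cstddef>
#include <cstring>

static void dma_get(float* local, const float* main_mem, std::size_t n, int /*tag*/) {
    std::memcpy(local, main_mem, n * sizeof(float)); // placeholder for an async transfer
}
static void dma_wait(int /*tag*/) {} // placeholder: a real API would block on the tag

constexpr std::size_t CHUNK = 4096; // assumed number of floats per local-store slice

// Local-store style: the programmer decides what is resident on-chip and when.
// Double-buffering overlaps the next transfer with work on the current chunk,
// but the chunking must be redesigned if data sizes or access patterns change.
float sum_local_store(const float* src, std::size_t count) {
    static float buf[2][CHUNK];
    float total = 0.0f;
    std::size_t cur_n = std::min(count, CHUNK);
    int cur = 0;
    dma_get(buf[cur], src, cur_n, cur);                 // prefetch the first chunk
    for (std::size_t base = 0; base < count; ) {
        std::size_t next_base = base + cur_n;
        std::size_t next_n = next_base < count ? std::min(count - next_base, CHUNK) : 0;
        if (next_n)
            dma_get(buf[cur ^ 1], src + next_base, next_n, cur ^ 1); // start next transfer
        dma_wait(cur);                                  // ensure current chunk has arrived
        for (std::size_t i = 0; i < cur_n; ++i)
            total += buf[cur][i];                       // work only on resident data
        base  = next_base;
        cur_n = next_n;
        cur  ^= 1;
    }
    return total;
}

// Cache style: just touch the data and let the hardware pull lines in on demand.
// Much easier to iterate on, though performance still depends on layout and locality.
float sum_cached(const float* src, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        total += src[i];
    return total;
}
```

The local-store version has to be re-chunked whenever the data set changes shape; the cached version can be profiled and reworked iteratively, which is exactly the robustness being argued for above.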
 
The advantage with cache vs dumb local store is that it makes your game design much more robust against changes in assumptions about data size and access patterns
Unfortunately on many existing architectures that is "only" true so long as you don't give a flying f$$$ about performance.
Current generation of consoles is a great example of how cache architectures can still be incredibly sensitive to data sizes and access patterns, and designing your application without paying close attention to those issues will literally grind the system to a halt.
Problem is compounded by the fact that optimizing for cache-coherency has no "exact" solution, you're often reduced to things like packing your data structures tighter and just "hoping" it will amount to better performance (it usually does, but the actual effects are unpredictable at best - thanks to the way cache behaves).

And fact of the matter is that the PPC cores in new consoles are not making this problem go away - it may even grow worse in some ways.
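
A generic illustration of the "packing your data structures tighter" approach (field names and sizes are made up for the example, not taken from any real engine): split the cold fields away from what a hot loop actually reads, so each fetched cache line carries more useful data.

```cpp
// Sketch: splitting hot and cold fields to reduce cache footprint.
// All structures and sizes here are illustrative assumptions.
#include <cstdint>
#include <vector>

// Naive layout: well over 64 bytes per entity, so a typical cache line holds
// less than one entity even though the update loop only needs ~24 bytes of it.
struct EntityFat {
    float         pos[3];
    float         vel[3];
    char          debug_name[64];  // cold: only read by tools
    std::uint32_t material_id;     // cold: only read at draw time
    double        spawn_time;      // cold
};

// Split layout: the per-frame loop streams through a tightly packed array of
// just the hot fields; cold data lives elsewhere, indexed the same way.
struct EntityHot {
    float pos[3];
    float vel[3];
};
struct EntityCold {
    char          debug_name[64];
    std::uint32_t material_id;
    double        spawn_time;
};

void integrate(std::vector<EntityHot>& hot, float dt) {
    for (EntityHot& e : hot) {        // sequential walk over packed hot data:
        e.pos[0] += e.vel[0] * dt;    // far fewer cache lines touched per entity
        e.pos[1] += e.vel[1] * dt;
        e.pos[2] += e.vel[2] * dt;
    }
}
```

Even then, as noted above, the actual speedup is hard to predict exactly; the packing just raises the odds that the lines you pull in are full of data you use.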
 
xbdestroya said:
And no, I completely agree with you... I see Cell as something that Sony and Toshiba brought IBM in on to assist with, but also as something that at its heart is still Sony and Toshiba's creation.
I get a very different story from reading the first page of this thread, e.g.:
+ Later the three companies had meetings to discuss the architecture of CELL. The target performance of the project was 1 TFLOPS. Toshiba proposed the Force System, which has many simple RISC cores and a main core as the controller. Jim Kahle, the POWER4 architect from IBM, proposed an architecture with just multiple identical POWER4 cores. When a Toshiba engineer said maybe the Force System doesn't need a main core, Kahle was greatly pissed off (thus the title of this chapter), as without a main core POWER has no role in the new architecture.

+ Meetings continued for several months and Yamazaki of SCE was inclined toward the IBM plan and voted for it. But Kutaragi turned it down. Eventually Yamazaki and Kahle talked about the new architecture and agreed to coalesce the Toshiba plan and the IBM plan. Finally IBM proposed the new plan where a Power core is surrounded by multiple APUs. The backer of the APU at IBM Austin was Peter Hofstee, one of the architects of the 1GHz Power processor. It was adopted as the CELL architecture.
The three came together. IBM proposed one solution, Toshiba another. One Sony guy went with IBM's idea, one didn't. Disagreements were overcome through compromise.

I think the main reason Cell is termed 'IBM's Cell chip' is because the bulk of the design and implementation work was done by IBM engineers, but I also don't know if I've heard that term. It's known to me in articles as STI's Cell.

It looks like XeCPU is more akin to what IBM wanted for Cell, so next-gen consoles offer a chance for both approaches to be evaluated.
 
It looks like XeCPU is more akin to what IBM wanted for Cell, so next-gen consoles offer a chance for both approaches to be evaluated.

Don't think so... IBM is actively marketing and supporting the CELL architecture as found in the PS3, and not their "own" version.
 
3roxor said:
It looks like XeCPU is more akin to what IBM wanted for Cell, so next-gen consoles offer a chance for both approaches to be evaluated.

Don't think so... IBM is actively marketing and supporting the CELL architecture as found in the PS3, and not their "own" version.

Obviously, it's not like they're going to downplay Cell. It's also their creation.

What will happen will be a showdown of two different approaches, both of which IBM helped create. From IBM's side, they're both good.

It's down to Sony/Toshiba and MS now to downplay their competitors and bore us to death with "proof" that their approach is the best, something that has already started and will probably carry on for a long time.
 
If you read the first post in this thread, when the groups first sat around the table, IBM suggested multiple cores and no SPUs, and Toshiba suggested all SPUs and no main cores. As such, XeCPU being all PPC cores seems similar to IBM's take on Cell when first considered. This doesn't mean they want to push XeCPU as an alternative to Cell, as they have a vested interest in the whole technology as part of a group effort, even if they still think their original design would outperform the current Cell design. IBM may even have changed their minds and believe the current Cell structure is better at what it's supposed to do than their original idea.
 
Would it be wrong to say that with the new XeCPU IBM gets paid for 3 cores instead of Cell's 1? If that's the case, it's not surprising that they'd go for the 3-core solution instead of the 1 core + SPEs one.
 
IBM may even have changed their minds and believe the current Cell structure is better at what it's supposed to do than their original idea.

This is what I think. Remember that IBM is producing the XeCPU also, so they could easily have chosen to market that solution instead. The reason they don't do that is quite telling, IMHO.
 
Fafalada said:
The advantage with cache vs dumb local store is that it makes your game design much more robust against changes in assumptions about data size and access patterns
Unfortunately on many existing architectures that is "only" true so long as you don't give a flying f$$$ about performance.
Current generation of consoles is a great example of how cache architectures can still be incredibly sensitive to data sizes and access patterns, and designing your application without paying close attention to those issues will literally grind the system to a halt.
Problem is compounded by the fact that optimizing for cache-coherency has no "exact" solution, you're often reduced to things like packing your data structures tighter and just "hoping" it will amount to better performance (it usually does, but the actual effects are unpredictable at best - thanks to the way cache behaves).

And fact of the matter is that the PPC cores in new consoles are not making this problem go away - it may even grow worse in some ways.

Programming is the art of managing memory.
At least in high-performance systems, but it has been increasingly true for general-purpose CPUs for a long time as well, if you do anything where performance is critical. And it should be bleeding obvious why, when you compare the core throughput capabilities with the memory subsystems that are supposed to feed them. Cache simply reduces the penalty for programmers who choose to ignore the last two or three decades of hardware development and stick to programming as it is still taught in schools, which is strongly tied to minimizing the cost of support, the critical parameter for clerical applications.

Software that needs to extract a high percentage of the capabilities of the hardware is a different kettle of fish entirely. It always has been. And yes, that means that you have to have a good awareness of the underlying hardware, and yes, it means that your code will have (manageable) portability issues. So what? That's the price you've always had to pay for performance, but the performance benefits from prioritizing that way are very large indeed. For scientific codes, which get ported to tons of different hardware over the course of decades, this is obviously a bit of a headache, but it is also par for the course. For games on consoles, explicitly taking the underlying memory hardware architecture into account will bring huge performance benefits as well, and without most of the porting complications that apply to scientific codes.

It really is difficult to feel much sympathy for programmers who choose to whine about having to change their coding habits to fit the underlying reality of their tools, without appreciating what those tools offer and can be made to yield. I really can't help feeling that they should stay in some other field of their craft, or change career entirely. The spectres of COBOL and PL/I programmers hover just behind the shoulders of today's PC C++ coders.
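
To make the throughput-versus-memory comparison concrete, here's a back-of-envelope sketch with assumed round numbers (roughly 200 peak single-precision GFLOPS against 25 GB/s of main-memory bandwidth; illustrative orders of magnitude, not vendor specs for any particular console).

```cpp
// Back-of-envelope: how many arithmetic ops a core must do per float loaded
// from main memory to avoid starving. Figures are assumed, not exact specs.
#include <cstdio>

int main() {
    const double peak_gflops   = 200.0; // assumed peak single-precision throughput
    const double mem_bandwidth = 25.0;  // assumed main-memory bandwidth, GB/s
    const double bytes_per_elt = 4.0;   // one 32-bit float

    double floats_per_sec = mem_bandwidth * 1e9 / bytes_per_elt; // floats streamed per second
    double flops_per_elt  = peak_gflops * 1e9 / floats_per_sec;  // ops needed per streamed float

    std::printf("~%.0f arithmetic ops per float fetched from memory\n", flops_per_elt);
    // Under these assumptions that is ~32 ops per float: code that does only a
    // handful of operations per memory access is bandwidth-bound, and cache only
    // helps if the working set is arranged to be reused while it is resident.
    return 0;
}
```

Under those assumptions, anything that doesn't reuse data on-chip is paying for memory, not compute, which is why data layout and access patterns dominate.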
 
I don't know if IBM were looking for a new processor line to market. It could be that IBM's design is better than the final Cell design, but in that case they would still tout Cell, as it's guaranteed to feature in other equipment, whereas a multicore CPU would have no more customers guaranteed, with IBM irking Sony and Toshiba by pushing a rival.

Though IMO the parties settled on the best possible solution, bringing different philosophies together and finding a happy middle ground.

London-boy : I would have thought IBM get paid for chips, not cores.
 
Shifty Geezer said:
London-boy : I would have thought IBM get paid for chips, not cores.

Obviously, but would they get paid more for a 3 PPC core chip or a 1 core one? Are the SPEs an IBM creation or a Sony/Toshiba one?

In the end, it doesn't matter; I can't wait to see how the two different approaches will handle the end result - the games.
 
I think the Cell technology is shared amongst STI, and purchasers of the chips will pay an equal fee to IBM, Sony and Toshiba. It'd be crazy to pay one company more or less based on contribution - how could you measure such a thing? The SPEs use the PPC ISA, so are they PPCs...?
 
Are the SPEs an IBM creation or a Sony/Toshiba one?

The original Toshiba/SONY embodiment was a VLIW APU. IBM suggested that it be changed to a SIMD SPE, which it was.

Correction: Toshiba's original plan was the Force System, which comprised many small/simple VLIW RISC cores with a main control core based on MIPS. IBM wanted multiple complex PPC cores with no control core. Engineers from SCE were in favor of the IBM proposal, including a former NEC supercomputer engineer. KK, however, rejected the IBM proposal because it required too many pins, which would be too expensive for their targeted budget. The former NEC engineer now working at SCE talked to IBM, and IBM agreed to a PPC control core with many smaller, simpler cores attached to it. The actual APU architecture was IBM's idea, but Toshiba wanted these small cores to be VLIW. IBM convinced Toshiba to use SIMD instead of VLIW for the smaller cores.

So in summary STI created the APU/SPEs.
 
Well, we've all read the same excerpts posted by one, probably several times each, so we should all be on the same page, but we're just not and that's ok. :cool:

I see what you're saying Shifty, but when I say Cell seems more Sony and Toshiba's brainchild, I say so because it's an architecture along the lines of what they expected to create when they went into it - whereas IBM was something of a holdout; granted, IBM had an equal vote and got some of their ideas adopted.

Also, I have to question the logic that IBM did the majority of the engineering - these excerpts made it fairly apparent that Toshiba and Sony were highly involved throughout the entire process, with their own sizable contingent of engineers on premises in Austin, and with Kutaragi having enough oversight to be able to scrap the entire design at one point.

As for what London Boy is saying when he refers to revenue from the cores, I don't think he is referring to manufacturing fees/revenues, but rather to possible licensing fees/revenues from the Power cores (Power being IBM IP after all). In which case three is better than one, depending on the cost/licensing structure.

And my comment on 'IBM's Cell' was certainly not something I was just making up - truly in the American press, if nowhere else, that is how it is depicted. Since I'm in the US, it's what I deal with for the most part. ;) Do a Google search for 'IBM's Cell' and you will find countless examples of this.
 