Some Innuendo!!

Pepto-Bismol

Newcomer
WARNING: Parental Guidance Suggested. Some Material May Not Be Suitable For Children.

Three years ago, IBM (NYSE: IBM), Sony (NYSE: SNE) and Toshiba announced a partnership aimed at developing a new processor for use in digital entertainment devices like the PlayStation. Since then, the product has seen a billion dollars in development work. Two fabs, one in Tokyo and one in Fishkill, New York, have been custom-built to make the new processor in large volumes. On May 12th, IBM announced that the first commercial workstations based on this processor would become available to game-industry developers late this year.

A lot is known about this processor, but relatively little real information has leaked. To the extent that performance information has become available, it has been characterized by numbers so high that most people simply dismissed the reports. In November of last year, for example, a senior Sony executive told an internal audience that implementations would scale from uniprocessors to 64-way groupings that would deliver in excess of two teraflops -- making it more than 10 times faster than Xeon.

Most of what we know about this machine comes from U.S. patent #6,526,491 as issued to Sony in February 2003 for a "memory protection system and method for computer architecture for broadband networks."

Here's the abstract:

  • A computer architecture and programming model for high speed processing over broadband networks are provided. The architecture employs a consistent modular structure, a common computing module and uniform software cells. The common computing module includes a control processor, a plurality of processing units, a plurality of local memories from which the processing units process programs, a direct memory access controller and a shared main memory.

    A synchronized system and method for the coordinated reading and writing of data to and from the shared main memory by the processing units also are provided. A hardware sandbox structure is provided for security against the corruption of data among the programs being processed by the processing units. The uniform software cells contain both data and applications and are structured for processing by any of the processors of the network. Each software cell is uniquely identified on the network. A system and method for creating a dedicated pipeline for processing streaming data also are provided.

The machine is widely referred to as a cell processor, but the cells involved are software, not hardware. Thus a cell is a kind of TCP packet on steroids, containing both data and instructions and linked back to the task of which it forms part via unique identifiers that facilitate results assembly just as the TCP sequence number does.
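To make the packet analogy above concrete, here is a minimal Python sketch of a software cell that carries both a program and its data, plus the identifiers used to put results back in order. The names SoftwareCell, run_cell and assemble are hypothetical and are not taken from the patent; this only illustrates the idea of reassembly by sequence number, not the actual cell format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SoftwareCell:
    # All fields are illustrative assumptions, not the patent's layout.
    task_id: str                          # which task this cell belongs to
    sequence: int                         # position of this cell within the task
    program: Callable[[List[int]], int]   # the "instructions" carried by the cell
    data: List[int]                       # the data carried alongside them


def run_cell(cell: SoftwareCell) -> Tuple[str, int, int]:
    # Any processor can run any cell, because the cell brings its own program.
    return cell.task_id, cell.sequence, cell.program(cell.data)


def assemble(results: List[Tuple[str, int, int]]) -> Dict[str, List[int]]:
    # Group results by task, then order them by sequence number,
    # much as a TCP receiver reorders segments.
    by_task: Dict[str, List[Tuple[int, int]]] = {}
    for task_id, sequence, value in results:
        by_task.setdefault(task_id, []).append((sequence, value))
    return {task: [value for _, value in sorted(parts)]
            for task, parts in by_task.items()}


cells = [SoftwareCell("job-a", i, sum, [i, i + 1]) for i in range(4)]
out_of_order = [run_cell(c) for c in reversed(cells)]  # results arrive reversed
assembled = assemble(out_of_order)
```

Even though the results come back in reverse order, assemble restores them to task order using only the identifiers carried by each cell.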

Outrageous Performance Claims

The basic processor itself appears to be a PowerPC derivative with high-speed built-in local communications, high-speed access to local memory, and up to eight attached processing units broadly akin to the Altivec short array processor used by Apple (Nasdaq: AAPL). The actual product consists of one to eight of these on a chip -- a true grid-on-a-chip approach in which a four-way assembly can, when fully populated, consist of four core CPUs, 32 attached processing units and 512 MB of local memory.

The per-cycle performance of the core CPU is undocumented but may be expected to be comparable to other PowerPC machines running at high cache hit rates. Specifications for the four or eight attached processors comprising the array are known; these are expected to turn in one floating point operation per cycle or around 32 Gigaflops for the fully populated array at a nominal 4 GHz.

That's where the apparently outrageous performance claims come from: a four-way assembly running at a planned 4 GHz offers 32 x 4 = 128 Gigaflops in potential floating-point execution. A 64-way supergrid made by stacking eight eight-way assemblies would have a total of 512 attached processors and could, therefore, break 2 teraflops if data transportation kept up with the processors.
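The arithmetic behind those figures can be checked with a short Python sketch. It uses only the article's stated assumptions (eight attached units per core, one floating-point operation per unit per cycle, a 4 GHz clock); the function and variable names are made up for illustration, and the results are theoretical peaks that ignore data transport.

```python
def peak_gflops(attached_units: int, clock_ghz: float,
                flops_per_cycle: int = 1) -> float:
    # Peak rate = units x cycles per second x operations per cycle.
    # With the clock in GHz the result is already in Gigaflops.
    return attached_units * clock_ghz * flops_per_cycle


UNITS_PER_CORE = 8   # attached processing units per core CPU (per the article)
CLOCK_GHZ = 4        # nominal clock speed (per the article)

one_core = peak_gflops(1 * UNITS_PER_CORE, CLOCK_GHZ)    # fully populated array
four_way = peak_gflops(4 * UNITS_PER_CORE, CLOCK_GHZ)    # four-way assembly
supergrid = peak_gflops(64 * UNITS_PER_CORE, CLOCK_GHZ)  # 64-way, 512 units

print(one_core, four_way, supergrid)
```

The 64-way case works out to 2,048 Gigaflops, which is where the "in excess of two teraflops" claim comes from.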

In practice, however, Apple has never succeeded in getting the bulk of its developers to make effective use of the Altivec, and Sun has had essentially no success getting people outside the military and intelligence communities to use the four-way SIMD capabilities built into its Sparc processors. Grid computing is slowly entering the commercial mainstream, but combining both local-array access with grid computing requires a significant shift in programming paradigm that will not appeal to the mainstream Wintel and IBM customer base.

Gains Outweigh the Pain

For games developers, however, the potential gains -- up to 50 times the best x86-based processor and graphics board combinations can deliver -- should outweigh the pain. Even a minor software change, the kind of thing Adobe does to take advantage of the Altivec in Photoshop, should offer significant advantages to a wider programming community and enable floating-point-intensive applications to run a full order of magnitude more quickly on this machine than on Intel's (Nasdaq: INTC) best.

An important point to bear in mind is that this processor will be inexpensive, and systems built around it even less expensive because no external graphics or network boards will be needed. Both Sony and IBM have been building fabs specifically to make this device. Volumes will be high because Sony will use up to 20 million assemblies in the PlayStation, while 10 million or more that don't quite make the quality cut will get used in its digital televisions and other products.

Very little has been publicly revealed about the operating system for this thing, but it is quite obvious what it has to be and how it has to work. Each core will have its own local Unix kernel, with most just executing cells as they arrive from the dispatch manager and one managing the traffic-coordination hardware. In all likelihood, the kernel used will prove to be both Linux-derived and Linux-compatible -- meaning that most Linux software will run out of the box on the uniprocessor configuration while software adapted for the grid environment will run unchanged on everything from the uniprocessor to configurations with hundreds or even thousands of processor assemblies.

As users of Sun's open-source grid software have found, performance losses on single processes increase as you add processors because data flow and timing control issues increase in complexity nonlinearly with system growth. Fundamentally, what happens is that the larger you make the total machine, whether on one piece of silicon or in a rack, the more cell transit time dominates execution time and the greater the performance cost imposed by the need to coordinate operations.

New Generation of Linux PCs

The patent mentions the use of no-ops (processor nulls) inserted into cells to get around timing problems associated with having components run at different speeds -- with processor coordination initially enforced by setting TTL-like time budgets for cell execution. My guess, however, is that advances in cell isolation and programming for asynchronous event handling have since obsolesced those solutions.

I expect, therefore, that when the real thing appears, it will fully support both the traditional grid format for on-chip work and an asynchronous hypergrid for multi-assembly processes on the model Thinking Machines hoped to achieve with the transputer-based hypercube in 1985 -- and that NSA is rumored to actually have built on 1989's Sparc-SIMD-based CM-5.

Either way, however, the OS for this machine is likely to offer both Linux compatibility at the low end and enormous scalability for those willing to modify their software -- which is why, as I discuss in next week's column, I expect IBM and Toshiba soon to launch a new generation of Linux PCs built around the combination of this CPU with IBM software products like Lotus Workspace for Linux.

Source: Linux Insider
 
A 64-way supergrid made by stacking eight eight-way assemblies would have a total of 512 attached processors and could, therefore, break 2 teraflops if data transportation kept up with the processors.

No shit.

But they did not mention the Broadband Engine in their writeup, as this is the planned PS3 1TFLOPS IC.

Maybe I should E-Mail the author and make clearer a few things :)

Edit: I see his little mention of it

The actual product consists of one to eight of these on a chip -- a true grid-on-a-chip approach in which a four-way assembly can, when fully populated, consist of four core CPUs, 32 attached processing units and 512 MB of local memory.
 
I like that guy.

I really hope that apart from the cell workstation being the core of the PS3 development station they will get some CGI shops to use them as workstations, and get linux and assorted open source nuts on board ... I want to see them get cheap, and AMD/Intel forced to make parallel oriented cores.
 
Assume all our wildest dreams of PS3 are fulfilled. We get a broadband engine (stupid name) with 4 PPC cores and 32 APUs at 4GHz on a single chip, plus as much as 64MB eDRAM and half a gig of XDR in one box.

Just THINK if this thing ran Linux straight out of the box... :p Connect USB mouse and keyboard to it, you have a MONSTER computer even if you don't count the APUs!

Then THINK if it's possible to daisy-chain PS3s for seamlessly increased performance... OMG! The f-boy in me is getting so excited he almost throws up. :D

Now, where reality will intersect our dream world, THAT will be the really interesting bit. We still don't KNOW the BBE will have 4/32/64MB@4GHz or even close to it. It's all pie-in-the-sky at the moment. Frankly, I'd be happy as hell even if all those numbers were cut by two (yes even though that would be "only" 1/8th of a Tflop in peak performance).
 
Assume all our wildest dreams of PS3 are fulfilled. We get a broadband engine (stupid name) with 4 PPC cores and 32 APUs at 4GHz on a single chip, plus as much as 64MB eDRAM and half a gig of XDR in one box

lol, my wildest dreams for PS3 go well beyond that. in my wildest dreams, PS3 CPU has 16 PowerPC cores, 128 APUs, at 4~5 GHz. plus 256 MB eDRAM with 1-2 TeraByte bandwidth plus 2-4 GB XDR external memory at 100-200 GB/sec bandwidth. :LOL: you DID say wildest dreams right?
 
Hahaha, Mega... Guess we need to separate between "somewhat realistic wildest dreams" and "completely totally nutso-unrealistic wildest dreams"... :LOL:
 
Guden Oden said:
Then THINK if it's possible to daisy-chain PS3s for seamlessly increased performance... OMG! The f-boy in me is getting so excited he almost throws up.

I think the author actually undersells Cell's philosophy. As I understand it, the Cell project is considerably broader and more malleable than he is giving it credit for -- a universal processing standard and "one size fits all" approach to computing that will squeeze into a menagerie of devices.

STI are attempting to design a revolutionary architecture, keeping in mind the impact technology (and geeks!!) may have on its past, present and future -- unlike 80x86 designers in the early 1970s ... ;)
 
PSP won't use Cell and Toshiba has already presented a media processor it sees as a complement to Cell ... one size never fits all.
 
no doubt Sony had more than one chipset for PlayStation2 back in the late 1990s. I think Sony might have had E&S TR`5 or E&S RealIMAGE based PS2 standing ready in case Sega used Lockheed Real3D in the Saturn2 or Dural.
 
MfA said:
PSP won't use Cell and Toshiba has already presented a media processor it sees as a complement to Cell ... one size never fits all.

Since the project is steeped in biology, I'd be inclined to say that a complement to Cell is, well, another Cell. :|

True. The processor in PSP may not have the same function as the one in PS3, but the two should have a very similar composition if for nothing else than to be able to operate efficiently within the same "organism". This, after all, is how real cells get along.
 
Yes, but the DNA shouldn't be written in Cell's arbitrary and low level ISA. JVM/CLR/LLVM/condensed-graphs ... that is where the future lies, not some ISA of the day. Otherwise we will just get stuck with another x86.
 
DeanoC said:
Sony also are working on a non-Cell processor architecture strangely enough. Guess you always have to have a back up plan just in case (note it's designed to run conventional code well...)

Sony has a really nasty habit of playing the electronic entertainment industry like Bobby Fischer plays chess: they seem to be several moves ahead of their competitors! So I wouldn't put it past those turkeys to have already begun their post-Cell research ... :rolleyes:

But as far as working on a Cellular alternative goes -- a Plan B in case the incredibly expensive Plan A doesn't work out -- I'm not so sure. Sony Corp. seems to have bet the farm on Cell.
 