Peter Hofstee (IBM) talk on Cell

BlueTsunami

Grabbed this from GAF (thanks to antipode for the info). Apparently Peter Hofstee is talking (I believe) at that London Development Conference (the one at the pub?).

H. Peter Hofstee is giving a technical presentation on Cell right now, so I thought I'd post some notes for those who are interested. Probably most of this will be stuff we already know.

-Host mentions in the introduction "in the Playstation 3, hopefully coming out this year"; Peter cracks a smile.

-agenda: power wall, memory/latency wall, multicore and specialization, DMA and microarchitecture distinctions, things that work and don't work well, things for academia to look at

-historical SPECint single-thread growth rate was 45%, but slowing dramatically
-memory wall: asynchronous loads point to non-homogeneous cores
-efficiency wall: specialized functions point to non-homogeneous cores
-power wall: reducing transistor power, limited by oxide thickness scaling, channel length and operating voltage, points to multiple cores; reducing switching per function points to non-homogeneous cores
- ceiling in terms of power - have already hit ceiling in terms of watts that can fit in a traditional computer form factor, need multi-core to progress, don't want much more than 250W in a consumer box
- motivation since 2001 was to "Support an introduction in 2005/6 - Challenge: structure innovation such that 5yr schedule can be met"
-sharing workloads across the network an important design motivation
-non-homogeneous coherent chip multiprocessor allows attack on the "Frequency Wall" - deliberately designed for 4 GHz, reduced to 3.2 GHz w/ low operating voltage because power efficiency increases greater than cubically - "also helps that we spent 400 mil in that regard" (gets laughs)
-streaming DMA attacks "memory wall"
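A rough way to see why dropping from 4 GHz to 3.2 GHz with a lower voltage pays off, using the textbook first-order model of CMOS dynamic power (an assumption here, not something quoted from the talk): power goes as activity factor α times switched capacitance C times voltage squared times frequency, and if voltage is scaled down roughly in proportion to frequency, power falls with the cube of frequency. Leakage, which this model ignores, would if anything tend to make the savings larger.

```latex
\begin{aligned}
P_{\mathrm{dyn}} &\approx \alpha C V^{2} f, \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3} \\
\frac{P(3.2\,\mathrm{GHz})}{P(4.0\,\mathrm{GHz})} &\approx \left(\frac{3.2}{4.0}\right)^{3} = 0.8^{3} = 0.512 \\
\frac{(\mathrm{perf}/\mathrm{W})_{3.2}}{(\mathrm{perf}/\mathrm{W})_{4.0}} &\approx \frac{3.2/4.0}{0.512} = \left(\frac{4.0}{3.2}\right)^{2} \approx 1.56
\end{aligned}
```

So under these assumptions, giving up 20% of the clock roughly halves dynamic power and buys about 1.56x in performance per watt.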

-potentially a collision between mainstream OS functions and streaming apps/games, managed by the hypervisor to allow real-time guarantees
-most programmers will not ("and I don't see it in practice") use the LS alias available in main memory, since load balancing and scaling are difficult; instead they refer explicitly to the LS in the SPEs

-token-based mechanism guarantees bandwidth at memory and IO chokepoints to real-time OS functions that need it
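A token scheme like the one described can be sketched in a few lines of Python. This is a generic illustration, not IBM's actual token mechanism (the notes don't give its details): each period the allocator hands every requester group a fixed share of tokens, one token buys one transfer to the managed resource, and so a real-time group keeps its share no matter how hard a bulk group pushes. The class and function names, the per-period model and the 25/75 split are all assumptions.

```python
class TokenAllocator:
    """Hypothetical token-based bandwidth allocator (illustration only).

    Each period the allocator refills a fixed budget of tokens, split
    among requester groups by share. One token allows one transfer to
    the managed resource (memory or IO).
    """

    def __init__(self, tokens_per_period, shares):
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError("shares must sum to 1")
        self.tokens_per_period = tokens_per_period
        self.shares = dict(shares)
        self.tokens = {}
        self.new_period()

    def new_period(self):
        # Refill every group's budget; unused tokens do not carry over.
        self.tokens = {g: int(self.tokens_per_period * s)
                       for g, s in self.shares.items()}

    def try_transfer(self, group):
        # Grant the transfer only if the group still holds a token.
        if self.tokens.get(group, 0) > 0:
            self.tokens[group] -= 1
            return True
        return False


def simulate_period(alloc, demands):
    """Return how many transfers each group completes in one period.

    Groups are served in the order given, so listing the bulk group
    first shows that it cannot starve the others.
    """
    alloc.new_period()
    done = {g: 0 for g in demands}
    for g, n in demands.items():
        for _ in range(n):
            if alloc.try_transfer(g):
                done[g] += 1
    return done


alloc = TokenAllocator(100, {"realtime": 0.25, "bulk": 0.75})
print(simulate_period(alloc, {"bulk": 1000, "realtime": 30}))
# {'bulk': 75, 'realtime': 25}
```

The bulk group asks for 1000 transfers but is capped at its 75 tokens, while the real-time group still gets its full 25.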

-Fundamental change for programmers - transition from demand-fetch to software-controlled prefetch - DMA lists to scatter and gather info from memory
-design "would have been flat-out impossible for a PC maker to pull off" because performance is not as important to them
-everything goes into GPRs - branches, links, compare results - because he doesn't like how, in higher-frequency processors, these specialized registers bloat into stacks
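To make "DMA lists to scatter and gather" concrete, here's a toy Python model: main memory and the SPE local store are plain byte arrays, and a DMA list is just a list of (address, size) pairs. The function names and list format are made up for illustration; real transfers go through the SPE's memory flow controller with alignment and size rules this ignores. The 256 KB local store size matches the first Cell SPEs.

```python
LOCAL_STORE_SIZE = 256 * 1024  # bytes of local store per SPE on the first Cell


def dma_list_get(main_memory, local_store, ls_offset, dma_list):
    """Gather scattered (address, size) pieces of main memory into one
    contiguous region of local store. Returns the LS offset just past
    the gathered data. Hypothetical helper, not a real Cell SDK call."""
    for ea, size in dma_list:
        if ea + size > len(main_memory) or ls_offset + size > len(local_store):
            raise ValueError("transfer out of bounds")
        local_store[ls_offset:ls_offset + size] = main_memory[ea:ea + size]
        ls_offset += size
    return ls_offset


def dma_list_put(local_store, ls_offset, main_memory, dma_list):
    """Scatter a contiguous local-store region back out to the
    (address, size) pieces named by the list. Hypothetical helper."""
    for ea, size in dma_list:
        if ea + size > len(main_memory) or ls_offset + size > len(local_store):
            raise ValueError("transfer out of bounds")
        main_memory[ea:ea + size] = local_store[ls_offset:ls_offset + size]
        ls_offset += size
    return ls_offset


# Usage: fetch three scattered 4-byte records, work on them locally, write back.
main_memory = bytearray(range(256)) * 4        # 1 KB of fake main memory
local_store = bytearray(LOCAL_STORE_SIZE)
records = [(0, 4), (512, 4), (1000, 4)]

end = dma_list_get(main_memory, local_store, 0, records)
for i in range(end):                            # the "compute" step, all in LS
    local_store[i] = (local_store[i] + 1) % 256
dma_list_put(local_store, 0, main_memory, records)

print(bytes(main_memory[1000:1004]))            # b'\xe9\xea\xeb\xec'
```

The program never demand-fetches anything: it says up front which pieces it needs, pulls them in with one list, and pushes results back the same way.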

-Q from audience: "this reminds me of the 'CDC 6600' processor with lots of little processors around the central processor". A: "the SPEs aren't really so little (compared to the central core) - remember that on compute-intensive tasks they will outperform a Pentium 4"

-fetching from main mem to local store is not so bad - similar to cache miss, but you can have many things in flight
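The point about having many fetches in flight can be sketched as a software pipeline: issue the transfer for the next block before computing on the current one, so memory latency overlaps with useful work. In this toy version a thread pool stands in for the DMA engine, and `fetch`/`process` are hypothetical placeholders; on a real SPE this would be tagged DMA transfers plus waits on the tags, which this doesn't model.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def fetch(main_memory, start, size):
    # Stand-in for an asynchronous DMA get: copy one block out of "main memory".
    return bytes(main_memory[start:start + size])


def process(block):
    # Stand-in for SPE compute on a block that is already local.
    return sum(block)


def pipelined_sum(main_memory, block_size, depth=2):
    """Sum main_memory block by block, keeping up to `depth` fetches
    outstanding so transfer latency overlaps with compute."""
    starts = iter(range(0, len(main_memory), block_size))
    total = 0
    with ThreadPoolExecutor(max_workers=depth) as dma:
        pending = deque()
        # Prime the pipeline with up to `depth` requests before computing anything.
        for start in starts:
            pending.append(dma.submit(fetch, main_memory, start, block_size))
            if len(pending) == depth:
                break
        while pending:
            block = pending.popleft().result()   # wait only for the oldest block
            nxt = next(starts, None)
            if nxt is not None:                  # refill before computing
                pending.append(dma.submit(fetch, main_memory, nxt, block_size))
            total += process(block)
    return total


data = bytes(range(256)) * 4
print(pipelined_sum(data, block_size=100, depth=4))  # 130560
```

With depth=2 this is ordinary double buffering; a larger depth is the "many things in flight" case, which on real hardware is bounded by how much local store you can spare for buffers.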

- shows Mercury development system, IBM blade

-future of Cell designs - may be a threat to "Windows" in desire for immersive, 3D interactivity in real-time with distributed, device-agnostic, collaborative apps
- "new types of applications (often real-time) made possible by a dramatic jump in performance - E.g. gesture and emotion recognition"

-thinks emotion recognition (with a camera - frown or smile) is a promising way of interacting with a computer

-Q: How are game devs doing? A: The Sony side knows more about specific game development progress...
-US export restrictions on supercomputers - "at some point we had a question whether all our employees could be allowed into the printing lab" jokes

I apologize for how scattered the points are. The information seems to be coming from someone who is currently at the conference and is relaying the info back in the form of quick points.

EDIT: I just asked and was told this is NOT the London-based conference. This is a separate event.

The original topic creator answered back with this:

antipode from GAF said:
Heh, sorry - no it's not London or Fridayton, it's a small presentation at Stanford for faculty and students. The main talk is over, he's just taking questions from professors now. Overall he seems pretty confident in the architecture but a little bewildered at all the different applications people are coming up with and what that means for whether his current design decisions were right and what Cell 2 will look like. He pointed to the "Magic Mirror" (I think that was Toshiba's?) as an application he would never have imagined was possible when he started the project. It sounds like the games are also more ambitious in terms of what he thought necessary - specifically needing DP floating point to handle the number of objects in the game world.

Update:

antipode commenting again said:
No problem. Some of the stuff is probably confusing just because of my shorthand notes, not because it's difficult to explain...

BTW, I asked him about that stuff DCharlie was saying earlier about Sony reserving some of the 7 SPEs for the OS and not for games. He said he couldn't comment specifically on the Playstation 3 but could explain why it wasn't necessary or likely - they designed the Cell with the hypervisor able to partition access to bandwidth for all the SPEs, for real-time applications, so any OS wouldn't normally lock an entire SPE considering it wouldn't need the compute power. So I'm not sure if what DCharlie said is actually true.

It also sounds like the FlexIO that's going to connect the RSX to Cell has a lot of tricks up its sleeve - they're using some of them in the IBM Blade to connect multiple Cells together.
 
BlueTsunami said:
EDIT: I just asked and was told this is NOT the London-based conference. This is a separate event.

Did sound a little coherent for someone giving a speech in a pub at 1:30am ;)
 
Mmmkay said:
Did sound a little coherent for someone giving a speech in a pub at 1:30am ;)

:LOL:

Very true. I like how he seems to be bewildered by all the applications that are being brought up or are in development for Cell.
 
Since it's Cell related...

Some excerpts...

Octopiler seeks to arm Cell programmers

Programmers grappling with the Cell chip--the processor behind the Playstation 3 and some high-definition TVs--can get a helping hand from IBM's new project: the Octopiler.

It isn't easy to write code for Cell, with its central processing core and eight accompanying special-purpose engines. Octopiler, which IBM Research plans to outline at a tutorial next month, aims to change all that. The software development tool converts a single, human-written program into several different programs that run simultaneously on Cell's various cores.

A compiler such as Octopiler is a development tool that translates a programmer's source code into machine language the chip can understand. The name Octopiler refers to the ability to control how software uses Cell's eight special-purpose engines.

Octopiler has more work to do than most compilers. For one thing, it must create instructions in a different language for the eight SPEs than for the PowerPC core. For another, it must divvy up software among the nine cores and govern how those programs communicate and share memory.

And it has to scrutinize source code for the specific "single instruction, multiple data" tasks that SPEs can perform. Those tasks economize chip operations by performing the same operation on multiple data elements in one fell swoop.

source: cnet.com
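As a rough picture of the two jobs the article assigns to the compiler (divvying work among cores and finding SIMD operations), here's a hand-written Python sketch: it splits an array across eight hypothetical workers and has each one process four elements per "vector" operation, the lane count of a 128-bit register holding 32-bit floats. This is not how Octopiler works internally (the excerpt doesn't say); the workers run one after another rather than concurrently, and all names are invented.

```python
LANES = 4   # a 128-bit SPE register holds four 32-bit floats
N_SPES = 8  # the article's count of special-purpose engines


def vec_madd(a, b, c):
    # Models one SIMD instruction: a*b + c on every lane at once.
    return tuple(x * y + z for x, y, z in zip(a, b, c))


def spe_kernel(xs, scale, offset):
    """Compute scale*x + offset over xs, LANES elements per vector op.
    Returns the results and the number of vector ops issued."""
    out, ops = [], 0
    scale_v, offset_v = (scale,) * LANES, (offset,) * LANES
    for i in range(0, len(xs), LANES):
        chunk = list(xs[i:i + LANES])
        padded = tuple(chunk + [0.0] * (LANES - len(chunk)))  # pad the tail
        result = vec_madd(padded, scale_v, offset_v)
        out.extend(result[:len(chunk)])  # drop padding lanes
        ops += 1
    return out, ops


def partition(n, workers):
    """Split range(n) into contiguous, nearly equal (start, end) slices."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run(xs, scale, offset):
    # Hand each hypothetical SPE its slice; here they run one after another.
    out, total_ops = [], 0
    for start, end in partition(len(xs), N_SPES):
        part, ops = spe_kernel(xs[start:end], scale, offset)
        out.extend(part)
        total_ops += ops
    return out, total_ops


values = [float(i) for i in range(64)]
result, vector_ops = run(values, 2.0, 1.0)
print(result[:4], vector_ops)  # [1.0, 3.0, 5.0, 7.0] 16
```

Sixty-four scalar operations become 16 vector operations. The hard part a real compiler faces, which this skips entirely, is proving that such a split is safe and arranging the memory traffic each slice needs.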
 
 
Can't two Cell chips utilize the FlexIO to communicate with each other in a 'glueless' environment? In fact, I thought that this was a 'limitation' of the current Cell iterations, where large glueless environments would be the natural direction implied by the original patent.

But certainly a plus for SMP systems.

I mean, if there's more to the FlexIO than that in terms of what he's implying though, I'm certainly interested.
 
xbdestroya said:
Can't two Cell chips utilize the FlexIO to communicate with each other in a 'glueless' environment? In fact, I thought that this was a 'limitation' of the current Cell iterations, where large glueless environments would be the natural direction implied by the original patent.

But certainly a plus for SMP systems.

I mean, if there's more to the FlexIO than that in terms of what he's implying though, I'm certainly interested.

I think the mystery to FlexIO is how it will work with bi-directional DMA management. I fully expect there to be DMA management logic on the RSX...just as there is on Cell.
 
I don't know the history of FlexIO, but was FlexIO brought about for Cell/RSX interaction or Cell/Cell interaction? Also, is FlexIO actually being put into these blade servers, or is it a form of FlexIO (just what is needed)?

I find it cool though that FlexIO is being utilized beyond the PS3 in that manner.
 
FlexIO is an inherent part of Cell's usage of XDR - or the other way around - but either way the point is that it's a Rambus technology. Now, it's hard to say how much the development of Cell affected the development of FlexIO, but I rather think that Cell was built with FlexIO in mind rather than believing Rambus did anything to tweak it themselves for STI's purposes.

The Cell blades will (should) have a full FlexIO.
 
Mythos said:
Since it's Cell related...

Some excerpts...

Octopiler seeks to arm Cell programmers

Programmers grappling with the Cell chip--the processor behind the Playstation 3 and some high-definition TVs--can get a helping hand from IBM's new project: the Octopiler.

It isn't easy to write code for Cell, with its central processing core and eight accompanying special-purpose engines. Octopiler, which IBM Research plans to outline at a tutorial next month, aims to change all that. The software development tool converts a single, human-written program into several different programs that run simultaneously on Cell's various cores.

A compiler such as Octopiler is a development tool that translates a programmer's source code into machine language the chip can understand. The name Octopiler refers to the ability to control how software uses Cell's eight special-purpose engines.

Octopiler has more work to do than most compilers. For one thing, it must create instructions in a different language for the eight SPEs than for the PowerPC core. For another, it must divvy up software among the nine cores and govern how those programs communicate and share memory.

And it has to scrutinize source code for the specific "single instruction, multiple data" tasks that SPEs can perform. Those tasks economize chip operations by performing the same operation on multiple data elements in one fell swoop.

source: cnet.com

Well that sounds a little bit too good to be true. Can it actually be that good? Developers please chime in.

Thanks.
 
xbdestroya said:
FlexIO is an inherent part of Cell's usage of XDR - or the other way around - but either way the point is that it's a Rambus technology. Now, it's hard to say how much the development of Cell affected the development of FlexIO, but I rather think that Cell was built with FlexIO in mind rather than believing Rambus did anything to tweak it themselves for STI's purposes.

The Cell blades will (should) have a full FlexIO.

Thanks for the explanation XB!
 
mckmas8808 said:
So I'm guessing people like DeanoC and other PS3 devs will be using this "Octopiler" compiler?

I'd say trusting a compiler to split up your code and toss it on different SPEs is asking for trouble. It also sounds like something that's quite a ways from being in a usable form.

Sounds like a research project and nothing more...

It's by IBM anyways -- Sony isn't even using IBM's normal cell compiler, are they? I vaguely remember Sony going with something else (much to the chagrin of some).
 