ISSCC 2005

PZ said:
Oh god, fugly ... so the software can know 18 cycles ahead of time which way a branch will go, but will not have a way to tell the hardware that?

Yeah, I am concerned that this chip will require a super compiler, OS, and scheduler to work properly without overwhelming the programmer. It seems as though they took a lot of the complexity out of the hardware in order to get speed, and shifted the complexity to the OS and compiler and ultimately back onto the general purpose PPC core (is that good?).

Compiler, certainly yes. But the OS? The OS will run on the PPE; all it has to do to make the SPUs run correctly is set up the IOMMUs (or equivalent) for them, similar to what every OS already does for PCI on hardware that supports IOMMUs.

SPU context switches are going to be super expensive, but that's all right since the entire thing is only meant to run one big honking app at a time.

Cheers
Gubbi
 
I know some of you are disappointed, but holy cow! 256GFLOPs?! If you consider what Sony did with the PS2, arguably the least powerful console, think of what they will do with possibly the most powerful HW.

hmm.. could we imagine :
PS3 ----> 3 years lifespan generation ,then PS4= "Cell apu add-on kit" on it ?
 
IF they give us a GOOD compiler, coding that beast can't be as bad as coding on PS2 VUs (even with VCL help).
 
PZ said:
Hmmm... did PS3 programming just start to suck :D

I don't think so. It just means that you don't have any branch prediction. For the type of work SPUs are likely to do, branch prediction probably doesn't do a whole lot of good anyway.

Most of the time an SPU is likely to crunch vectors with regular access/control patterns. Regular patterns -> highly predictable.

And for stuff like collision checking or traversing trees, branch prediction gives you little anyway, since the branch(es) you'll have to take in each node have zero correlation to the branches you have already taken (and branch predictors are all history-based devices).
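
To put rough numbers on the two cases above, here is a minimal sketch in Python, not anything from the Cell disclosure. It feeds a single 2-bit saturating counter (a textbook history-based predictor, picked purely for illustration) first a regular loop pattern and then a coin-flip pattern standing in for uncorrelated tree-node branches. The function name, pattern lengths, and seed are all made up for the example.

```python
import random


def two_bit_accuracy(outcomes):
    """Fraction of branches one 2-bit saturating counter predicts correctly."""
    counter = 0  # 0-1 predict "not taken", 2-3 predict "taken"
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward the real outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# Regular pattern: a 100-iteration loop (99 taken, 1 exit) run 100 times.
loop_pattern = ([True] * 99 + [False]) * 100

# Uncorrelated pattern: each branch is an independent coin flip (assumed model
# for data-dependent tree traversal; real workloads will differ).
rng = random.Random(2005)
tree_pattern = [rng.random() < 0.5 for _ in range(10_000)]

loop_accuracy = two_bit_accuracy(loop_pattern)
tree_accuracy = two_bit_accuracy(tree_pattern)
print(f"loop: {loop_accuracy:.3f}  tree: {tree_accuracy:.3f}")
```

The loop pattern is predicted almost perfectly, which also means a compiler could get it right statically, while the coin-flip pattern hovers around 50% no matter how much history the predictor keeps.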

Mind you, the predicate branching model is already used by PPC MPUs, which have 8 predicate registers. Instead of a fused test-and-branch instruction (like you have in MIPS), it's split into a test (calculate predicate) and a branch (on predicate true), like this:
Code:
foo:
..
cmpeq p0,r1,r2;      // p0 = r1==r2
..
branch p0, foo        // if(p0) goto foo

Cheers
Gubbi
 
More questions: The PPE is a single-issue MPU with VMX (Altivec). However, having a VMX unit with 1 op / cycle throughput seems overkill when the entire MPU can only issue 1 op / cycle (it has to be fed ... and data needs to be shuffled around). Any info on VMX throughput?

Also, there were patents disclosing some kind of memory overlay mechanism for the SPUs, in order to virtualize SPU scratchpad memory (but through software, AFAICR). Panajev, didn't you dig that up? Any info on that?

To be used as a general purpose CPU, there would have to be an efficient way to allocate SPU resources and switch contexts, and swapping 8 SPUs' scratchpad memory (plus all their registers) on a context switch doesn't seem like a viable way to go (>2MB of state).
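
A quick back-of-the-envelope check of that figure, as a sketch only. It assumes 256 KB of local store and 128 registers of 128 bits each per SPU, which are the numbers from the public Cell disclosures and are not stated in this post; the constant and function names are made up.

```python
KIB = 1024

NUM_SPUS = 8                    # from the post above
LOCAL_STORE_BYTES = 256 * KIB   # assumption: 256 KB scratchpad per SPU
NUM_REGISTERS = 128             # assumption: 128 registers per SPU
REGISTER_BYTES = 16             # assumption: each register is 128 bits wide


def spu_state_bytes():
    """State one SPU would have to save: scratchpad plus register file."""
    return LOCAL_STORE_BYTES + NUM_REGISTERS * REGISTER_BYTES


total_state_bytes = NUM_SPUS * spu_state_bytes()
print(f"{total_state_bytes} bytes = {total_state_bytes / (KIB * KIB):.3f} MiB")
```

Under those assumptions the register files add only 2 KB per SPU; nearly all of the state is scratchpad.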

Cheers
Gubbi
 
_phil_ said:
I know some of you are disappointed, but holy cow! 256GFLOPs?! If you consider what Sony did with the PS2, arguably the least powerful console, think of what they will do with possibly the most powerful HW.

hmm.. could we imagine :
PS3 ----> 3 years lifespan generation ,then PS4= "Cell apu add-on kit" on it ?


no, more like a new console in ~5 years with a new Cell-2 based CPU with dozens or hundreds of processors giving Tflops of processing power.
 
Maybe the OS will manage an "SPU stack" by dividing up the scratchpad, say, 64k for cache, 64k for stack. Then that 64k can be further subdivided to hold a bunch of separate stacks for many threads. A spill mechanism can spill the 128k scratchpad when limits are reached.
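
A rough sketch of that split, purely illustrative. The 64k/64k sizes come from the post; the thread count, the function name, and the equal-sized stacks are assumptions made up for the example, and nothing here reflects an actual SPU ABI.

```python
KIB = 1024


def partition_scratchpad(cache_bytes=64 * KIB, stack_bytes=64 * KIB, num_threads=8):
    """Return (cache_region, thread_stacks) as (start, end) byte offsets."""
    if num_threads <= 0 or stack_bytes % num_threads != 0:
        raise ValueError("stack area must divide evenly among the threads")
    per_thread = stack_bytes // num_threads
    cache_region = (0, cache_bytes)
    # Thread stacks are laid out back to back right after the cache region.
    thread_stacks = [
        (cache_bytes + i * per_thread, cache_bytes + (i + 1) * per_thread)
        for i in range(num_threads)
    ]
    return cache_region, thread_stacks


cache_region, thread_stacks = partition_scratchpad()
print("cache:", cache_region)
for index, region in enumerate(thread_stacks):
    print(f"thread {index} stack:", region)
```

With 8 threads each stack gets 8k, so a spill mechanism would kick in fairly quickly for anything recursive.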

If gaming on a "desktop", perhaps the game can "lock" the SPU resources so that they cannot be context switched.

Of course, new programming languages and libraries are needed to extract maximum benefit.
 
Ok, thanks for the explanation guys, but I have one more question that I haven't seen the answer to yet: Have they changed the name of the PU as well as the APU?
 
DemoCoder said:
Maybe the OS will manage an "SPU stack" by dividing up the scratchpad, say, 64k for cache, 64k for stack. Then that 64k can be further subdivided to hold a bunch of separate stacks for many threads. A spill mechanism can spill the 128k scratchpad when limits are reached.

Yes, and there's probably a small chunk of scratchpad memory allocated to running the ABI, like the stack engine (remember everything has to be explicitly fetched/put). I can see why IBM wants to pack code/data into fixed chunks (apulets), to better manage the SPUs from a system perspective. Still, context switches seem expensive.

Goddamn, that is a can of worms.

DemoCoder said:
If gaming on a "desktop", perhaps the game can "lock" the SPU resources so that they cannot be context switched.

Yes, similar to the way GPUs are used on PCs today.

Cheers
Gubbi
 
Gubbi said:
More questions: The PPE is a single-issue MPU with VMX (Altivec). However, having a VMX unit with 1 op / cycle throughput seems overkill when the entire MPU can only issue 1 op / cycle (it has to be fed ... and data needs to be shuffled around). Any info on VMX throughput?

Also, there were patents disclosing some kind of memory overlay mechanism for the SPUs, in order to virtualize SPU scratchpad memory (but through software, AFAICR). Panajev, didn't you dig that up? Any info on that?

To be used as a general purpose CPU, there would have to be an efficient way to allocate SPU resources and switch contexts, and swapping 8 SPUs' scratchpad memory (plus all their registers) on a context switch doesn't seem like a viable way to go (>2MB of state).

Cheers
Gubbi

The PU/PPE should be a dual-issue core...

Contains 64-bit Power Architecture™ with VMX that is a dual thread SMT design – views system memory as a 10-way coherent threaded machine

Single-issue but dual-threaded SMT would make little sense, efficiency-wise.
 
I was wondering about something that might be important - will the Nvidia GPU handle geometry calculations, or will this be the CPU's task, like in the PS2?
 
By the way, according to H. Goto @ Impress Watch, the Cell PPE is a core newly developed from the ground up and not derived from older cores. Only its ISA is compatible with other Power CPUs and, with VMX, compatible with the PowerPC 970.
 
Megadrive1988 said:
_phil_ said:
I know some of you are disappointed, but holy cow! 256GFLOPs?! If you consider what Sony did with the PS2, arguably the least powerful console, think of what they will do with possibly the most powerful HW.

hmm.. could we imagine :
PS3 ----> 3 years lifespan generation ,then PS4= "Cell apu add-on kit" on it ?


no, more like a new console in ~5 years with a new Cell-2 based CPU with dozens or hundreds of processors giving Tflops of processing power.

So the quest for a 1 TFLOPS CELL has been pushed back to 2009? :LOL:

BTW anybody seen Vince?
 
Laa-Yosh said:
I was wondering about something that might be important - will the Nvidia GPU handle geometry calculations, or will this be the CPU's task, like in the PS2?
100% speculation: CELL CPU will do all the geometry calculations, NVIDIA GPU will do all the pixel shading calculations (so the GPU will not have specialized vertex shading hardware..)
BUT NVIDIA GPU will be flexible enough to do even vertex shading with its pixel shading pipelines, even if it will not be as efficient (die-size-wise..) as a dedicated design!

ciao,
Marco
 
So it looks like the PS3 will have 2PEs+16APUs with 512 GFlops, now if the NV GPU could fill the rest then we can have 1TFlops. :)
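
For what it's worth, those numbers fall out of simple peak arithmetic. A minimal sketch, assuming each APU/SPU issues one 4-wide single-precision fused multiply-add per cycle (counted as 2 FLOPs per lane) at 4 GHz; the clock and issue rate are assumptions based on the ISSCC-era claims, not measurements, and the function name is made up.

```python
def peak_gflops(num_apus, simd_lanes=4, flops_per_lane=2, clock_ghz=4.0):
    """Theoretical peak: APUs x SIMD lanes x FLOPs per lane per cycle x GHz."""
    return num_apus * simd_lanes * flops_per_lane * clock_ghz


one_pe = peak_gflops(8)        # 1 PE with 8 APUs
two_pe = peak_gflops(16)       # 2 PEs with 16 APUs
gpu_share = 1000.0 - two_pe    # what the GPU would need to add for 1 TFLOPS

print(f"1 PE: {one_pe:.0f} GFLOPS, 2 PEs: {two_pe:.0f} GFLOPS, "
      f"GPU would need: {gpu_share:.0f} GFLOPS")
```

These are paper peaks only; sustained throughput depends on keeping every lane fed every cycle.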
 
Deepak said:
So it looks like the PS3 will have 2PEs+16APUs with 512 GFlops, now if the NV GPU could fill the rest then we can have 1TFlops. :)

Yes, looks like it and with some nice amounts of EDRAM. I will make this my wallpaper till E3: ;)

kaigai008.jpg


Fredi
 
McFly said:
Deepak said:
So it looks like the PS3 will have 2PEs+16APUs with 512 GFlops, now if the NV GPU could fill the rest then we can have 1TFlops. :)

Yes, looks like it and with some nice amounts of EDRAM. I will make this my wallpaper till E3: ;)

kaigai008.jpg


Fredi

2 CELL with GPU on 1 die????
 