DeanoC, do you know when you get the final PS3 hardware?

Democoder said:
There is no way a distributed CELL design will get within a sliver of that peak theoretical performance. Only extremely embarrassingly parallel problems like those run on *@Home would work. BlueGene/L is already at 65536 processors and hits 360TFlops theoretical, which means each processor yields ~5.5GFLOPs. Do you really think a CELL CPU is 30+ times faster than the PowerPC derivatives in BlueGene?

Sorry, I didn't see your reply. And of course not for all programs, barring the types of problems you stated, which are of value. I believe Cell has roots in BlueGene's initial goal of faster protein folding and other bio-science computing applications. But, more to the point, as I stated before: when you're talking about using Cell in this facility, any basis for its use relies on its economies of scale lowering the per-unit cost into a region where its sheer performance can overcome the even lower cost of a commodity x86 system.

And as for your 30X speed-up... it would depend on the task.

PC-Engine said:
Democoder said:
BlueGene/L is already at 65536 processors and hits 360TFlops theoretical, which means each processor yields ~5.5GFLOPs.
IIRC BG/L is at 131K processors now and is rated at 367 TFLOPS.

http://www.top500.org/lists/2005/11/basic

Democoder is right in the context of the argument as each IC has dual-core CMP: [131072] / [2] = 65536 ASICs
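
For anyone who wants to check the per-processor arithmetic, here is a minimal sketch using only the figures quoted in this thread (360 TFLOPS over 65536 chips, 367 TFLOPS over 131072 cores). The function name `gflops_per_unit` is made up for illustration, and these are theoretical peak numbers, not measured ones.

```python
# Figures as quoted by the posters above; nothing here is measured.
DEMOCODER_PEAK_TFLOPS = 360.0   # Democoder's theoretical peak
DEMOCODER_UNITS = 65536         # counted as dual-core chips (ASICs)
PCENGINE_PEAK_TFLOPS = 367.0    # PC-Engine's figure
PCENGINE_UNITS = 131072         # counted as individual cores


def gflops_per_unit(peak_tflops, units):
    """Theoretical peak GFLOPS per unit (hypothetical helper)."""
    return peak_tflops * 1000.0 / units


per_chip = gflops_per_unit(DEMOCODER_PEAK_TFLOPS, DEMOCODER_UNITS)
per_core = gflops_per_unit(PCENGINE_PEAK_TFLOPS, PCENGINE_UNITS)

print(f"per chip (2 cores): {per_chip:.2f} GFLOPS")  # 5.49
print(f"per core:           {per_core:.2f} GFLOPS")  # 2.80
print(f"two cores:          {2 * per_core:.2f} GFLOPS")  # 5.60
```

Both ways of counting land on roughly 5.5 GFLOPS per dual-core chip, so the two posts describe the same machine at different granularity.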
 
PC-Engine said:
IIRC BG/L is at 131K processors now and is rated at 367 TFLOPS.

http://www.top500.org/lists/2005/11/basic

Actual 280 TFLOPs on an embarrassingly parallel workload. This is only possible because there is so little communication required by the benchmark that both the cores can run the main loop. In a more realistic situation, the performance is roughly half of stated.

Aaron Spink
speaking for myself inc.
 
Vince said:
Sorry, I didn't see your reply. And of course not for all programs, barring the types of problems you stated, which are of value. I believe Cell has roots in BlueGene's initial goal of faster protein folding and other bio-science computing applications. But, more to the point, as I stated before: when you're talking about using Cell in this facility, any basis for its use relies on its economies of scale lowering the per-unit cost into a region where its sheer performance can overcome the even lower cost of a commodity x86 system.

The problem is that for the level of computation one would be running, DP will be required at a minimum. Nor does this get into the massive communication issues present. Cell as it currently stands would likely be uncompetitive with something like BlueGene on the vast majority of HPC workloads.

Aaron Spink
speaking for myself inc.
 
Since Vince brought up massive economies of scale, is it just me or does XeCPU seem more apt at supercomputing than CELL? For example CELL as seen in PS3 is capable of ~ 26 GFLOPS DP, but isn't about half of that coming from the PPE? XeCPU has 3 PPEs and it has a smaller die too so for DP it seems XeCPU would be better than CELL as a building block for a supercomputing cluster when strictly talking about economies of scale through millions of manufactured processors. In fact you may even fit a 4th PPE in there and still come out smaller than CELL.
 
PC-Engine said:
Since Vince brought up massive economies of scale, is it just me or does XeCPU seem more apt at supercomputing than CELL? For example CELL as seen in PS3 is capable of ~ 26 GFLOPS DP, but isn't about half of that coming from the PPE? XeCPU has 3 PPEs and it has a smaller die too so for DP it seems XeCPU would be better than CELL as a building block for a supercomputing cluster when strictly talking about economies of scale through millions of manufactured processors. In fact you may even fit a 4th PPE in there and still come out smaller than CELL.

Well first the CELL DP performance should be:
1 PPE * 1 FMAC * 2 FLOPS/FMAC * 3.2 Freq = 6.4 GFLOPS
7 SPU * 2 FMAC * 2 FLOPS/FMAC * 3.2 Freq * 1 Inst / 7 Cycles = 12.8 GFLOPS
CELL Total = 19.2 GFLOPS

Xenon DP performance should be:
3 PPE * 1 FMAC * 2FLOPS/FMAC * 3.2 Freq = 19.2 GFLOPS

So both CELL and Xenon have the same DP performance, though there is certainly no contest that Xenon would be easier to program and would get closer to peak performance in a DP float environment. Even Vince would have to concede that one.
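
The peak figures above can be reproduced with a short sketch. It assumes exactly the inputs given in this post (3.2 GHz, 2 FLOPs per FMAC, one DP instruction every 7 cycles on the SPU); the function `peak_dp_gflops` and its parameter names are hypothetical, chosen only for illustration.

```python
CLOCK_GHZ = 3.2        # clock used in the post
FLOPS_PER_FMAC = 2     # a fused multiply-add counts as 2 FLOPs


def peak_dp_gflops(units, fmacs_per_unit, issue_interval_cycles=1,
                   clock_ghz=CLOCK_GHZ):
    """Theoretical peak DP GFLOPS (hypothetical helper).

    issue_interval_cycles is how many cycles pass between DP
    instructions on one unit (7 for the SPU in the post above).
    """
    return (units * fmacs_per_unit * FLOPS_PER_FMAC * clock_ghz
            / issue_interval_cycles)


cell_ppe = peak_dp_gflops(units=1, fmacs_per_unit=1)
cell_spu = peak_dp_gflops(units=7, fmacs_per_unit=2, issue_interval_cycles=7)
cell_total = cell_ppe + cell_spu
xenon = peak_dp_gflops(units=3, fmacs_per_unit=1)

print(f"CELL PPE:   {cell_ppe:.1f} GFLOPS")    # 6.4
print(f"CELL SPUs:  {cell_spu:.1f} GFLOPS")    # 12.8
print(f"CELL total: {cell_total:.1f} GFLOPS")  # 19.2
print(f"Xenon:      {xenon:.1f} GFLOPS")       # 19.2
```

The unit count and issue interval are parameters, so other assumptions about either chip can be plugged in without changing the formula.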

Aaron Spink
speaking for myself inc.
 
aaronspink said:
Well first the CELL DP performance should be:
1 PPE * 1 FMAC * 2 FLOPS/FMAC * 3.2 Freq = 6.4 GFLOPS
7 SPU * 2 FMAC * 2 FLOPS/FMAC * 3.2 Freq * 1 Inst / 7 Cycles = 12.8 GFLOPS
CELL Total = 19.2 GFLOPS

Xenon DP performance should be:
3 PPE * 1 FMAC * 2FLOPS/FMAC * 3.2 Freq = 19.2 GFLOPS

So both CELL and Xenon have the same DP performance, though there is certainly no contest that Xenon would be easier to program and would get closer to peak performance in a DP float environment. Even Vince would have to concede that one.

Aaron Spink
speaking for myself inc.

DFMADD is 2 cycles on the PPE and Xenon, so halve your results.
 
The calculations don't seem to take into account that there are actually 8 SPEs in Cell. If the 7-SPE decision stays in PS3 production for better yields, that surely doesn't mean blade servers using Cell need to use this specific version of the processor.


--
Reboot -man says "yes".
 
Oh, PS3 was mentioned in the quoted text. Anyway the smaller power consumption of Cell should be taken into account regarding the blade server usage, not only DP float capability.

Also as far as I have understood it IBM is upping the SPE DP capacity of their next iteration of Cell.
 
Sct I/On said:
Also as far as I have understood it IBM is upping the SPE DP capacity of their next iteration of Cell.

Sure, but not without losing something else somewhere along the line, assuming die size, process node, power consumption and SP performance stay the same. Also, you can fit 4 PPEs in the same die size as CELL in PS3.
 
PC-Engine said:
Sure, but not without losing something else somewhere along the line, assuming die size, process node, power consumption and SP performance stay the same. Also, you can fit 4 PPEs in the same die size as CELL in PS3.

4 processors push up the cache latency, and the system will be slower than with 3 PPEs.
 
aaronspink said:
In fact, you're just flat out wrong. Each PPE or Xenon core does 1 DP MAD per cycle.

Aaron Spink
speaking for myself inc.

Do you have a reference for your assertion that it's a one-cycle DPMADD?

I haven't seen a reference for either version, mine or yours. I really want to know.
 
Why are you guys so hung up on DP performance on a CONSOLE forum, and how does this discussion have anything to do with the TITLE of this thread?
 
Edge said:
Why are you guys so hung up on DP performance on a CONSOLE forum, and how does this discussion have anything to do with the TITLE of this thread?
well, this is the 'Console Technology' sub-forum ... also, DeanoC already kinda' answered the question posed by the title of this thread..
 
Sorry to bring this topic back from the dead, but it's now coming close to mid-December. I wonder if the final kits are out now? Any inside news, anyone? ;)
 
Bad_Boy said:
Sorry to bring this topic back from the dead, but it's now coming close to mid-December. I wonder if the final kits are out now? Any inside news, anyone? ;)

No, this is a good question. We should actually ask it every week until we get a yes.
 