MrWibble said:The first rule of fight club...
Hahah!
Democoder said:There is no way a distributed CELL design will get within a sliver of that peak theoretical performance. Only embarrassingly parallel problems like those run on *@Home would work. BlueGene/L is already at 65536 processors and hits 360 TFLOPS theoretical, which means each processor yields ~5.5 GFLOPS. Do you really think a CELL CPU is 30+ times faster than the PowerPC derivatives in BlueGene?
Democoder said:BlueGene/L is already at 65536 processors and hits 360 TFLOPS theoretical, which means each processor yields ~5.5 GFLOPS.
PC-Engine said:IIRC BG/L is at 131K processors now and is rated at 367 TFLOPS.
http://www.top500.org/lists/2005/11/basic
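For anyone who wants to check the per-processor arithmetic in the two posts above, here is a minimal sketch. It only divides the quoted peak figures by the quoted processor counts. The function name is made up for illustration, and reading "131K" as 131,072 (2^17) processors is an assumption, not something stated in the thread.

```python
def per_processor_gflops(peak_tflops: float, processors: int) -> float:
    """Theoretical peak per processor, in GFLOPS (1 TFLOPS = 1000 GFLOPS)."""
    return peak_tflops * 1000.0 / processors


# Democoder's figures: 360 TFLOPS theoretical across 65536 processors.
democoder_estimate = per_processor_gflops(360.0, 65536)

# PC-Engine's figures: 367 TFLOPS across "131K" processors.
# ASSUMPTION: "131K" is taken to mean 131072 (2**17).
pc_engine_estimate = per_processor_gflops(367.0, 131072)

print(f"Democoder's numbers: {democoder_estimate:.2f} GFLOPS per processor")
print(f"PC-Engine's numbers: {pc_engine_estimate:.2f} GFLOPS per processor")
```

With the larger processor count the per-processor peak comes out at roughly half of Democoder's ~5.5 GFLOPS, since the total peak is nearly the same in both posts.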
Vince said:Sorry I didn't see your reply, and of course not for all programs -- barring the types of problems you stated, which are of value. I believe Cell has roots in BlueGene's initial goal of faster protein folding and other bio-science computing applications. But, more to the point, as I stated before, when you're talking about using Cell in this facility, any basis for its use relies on its economies of scale lowering the per-unit cost into a region in which its sheer performance can overcome the even lower cost of a commodity x86 system.
PC-Engine said:Since Vince brought up massive economies of scale, is it just me or does XeCPU seem more apt at supercomputing than CELL? For example CELL as seen in PS3 is capable of ~ 26 GFLOPS DP, but isn't about half of that coming from the PPE? XeCPU has 3 PPEs and it has a smaller die too so for DP it seems XeCPU would be better than CELL as a building block for a supercomputing cluster when strictly talking about economies of scale through millions of manufactured processors. In fact you may even fit a 4th PPE in there and still come out smaller than CELL.
aaronspink said:Well first the CELL DP performance should be:
1 PPE * 1 FMAC * 2 FLOPS/FMAC * 3.2 Freq = 6.4 GFLOPS
7 SPU * 2 FMAC * 2 FLOPS/FMAC * 3.2 Freq * 1 Inst / 7 Cycles = 12.8 GFLOPS
CELL Total = 19.2 GFLOPS
Xenon DP performance should be:
3 PPE * 1 FMAC * 2FLOPS/FMAC * 3.2 Freq = 19.2 GFLOPS
So both CELL and Xenon have the same DP performance, though there is certainly no contest that Xenon would be easier to program and would get closer to peak performance in a DP float environment. Even Vince would have to concede that one.
Aaron Spink
speaking for myself inc.
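The peak-DP breakdown in aaronspink's post can be reproduced with a short sketch like the one below. It uses only the factors given in the post (unit count, FMACs per unit, 2 FLOPS per FMAC, a 3.2 GHz clock, and one DP instruction per 7 cycles on the SPUs). The function and parameter names are hypothetical, and these are theoretical peaks, not measured throughput.

```python
def peak_dp_gflops(units: int, fmacs_per_unit: int, clock_ghz: float,
                   issue_rate: float = 1.0, flops_per_fmac: int = 2) -> float:
    """Theoretical peak double-precision GFLOPS.

    issue_rate is DP instructions issued per cycle
    (1.0 = one every cycle, 1/7 = one every seven cycles).
    """
    return units * fmacs_per_unit * flops_per_fmac * clock_ghz * issue_rate


CLOCK_GHZ = 3.2  # clock used in the post for both chips

# CELL as described in the post: 1 PPE plus 7 SPUs.
cell_ppe = peak_dp_gflops(units=1, fmacs_per_unit=1, clock_ghz=CLOCK_GHZ)
cell_spu = peak_dp_gflops(units=7, fmacs_per_unit=2, clock_ghz=CLOCK_GHZ,
                          issue_rate=1 / 7)
cell_total = cell_ppe + cell_spu

# Xenon as described in the post: 3 PPE-style cores.
xenon_total = peak_dp_gflops(units=3, fmacs_per_unit=1, clock_ghz=CLOCK_GHZ)

print(f"CELL PPE:   {cell_ppe:.1f} GFLOPS")
print(f"CELL SPUs:  {cell_spu:.1f} GFLOPS")
print(f"CELL total: {cell_total:.1f} GFLOPS")
print(f"Xenon:      {xenon_total:.1f} GFLOPS")
```

Keeping issue_rate as a separate parameter makes it easy to see how sensitive the totals are to the instruction-issue assumption, which is the factor that does most of the work in the SPU line.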
Sct I/On said:Also, as far as I have understood it, IBM is upping the SPE DP capacity in their next iteration of Cell.
PC-Engine said:Sure, but not without losing something else somewhere along the line, assuming die size, process node, power consumption and SP performance stay the same. Also, you can fit 4 PPEs in the same die size as CELL in PS3.
version said:4 processors push up the cache latency, and the system will be slower than with 3 PPEs
version said:DFMADD takes 2 cycles on the PPE and Xenon
halve your results
aaronspink said:In fact, you're just flat out wrong. Each PPE or Xenon core does 1 DP MAD per cycle.
Aaron Spink
speaking for myself inc.
Edge said:Why are you guys so hung up on DP performance on a CONSOLE forum, and how does this discussion have anything to do with the TITLE of this thread?
well, this is the 'Console Technology' sub-forum ... also, DeanoC already kinda' answered the question posed by the title of this thread..
Bad_Boy said:Sorry to bring this topic back from the dead, but it's now coming close to mid December. I wonder if the final kits are out now? Any inside news, anyone?