Jawed
That's precisely what TiTech is against. They've built a 655-node array of Opterons with 16 cores per node (10480 cores) and they've added 360 ClearSpeed boards (and plan to add more), basically one board per node.

Japanese supercomputers tend to take a hardware-orientated, inflexible approach targeted at very specific applications, while the US approach is software-orientated and more flexible. I suppose this reflects Japanese vs US technology strengths. Before the current IBM machine took the title, the world's fastest supercomputer was an exotic Japanese array processor, which was designed to do very specific supercomputing tasks, unlike the competing supercomputers, which were general-purpose machines.
25 W for two ClearSpeed CSX600s on a board, including 1 GB of memory, with each board producing a sustained 50 GFLOPS in DGEMM.

I am not sure that comparing Cell to ClearSpeed on a flops-per-watt basis is fair, since Cell has the PPE as a control processor, along with an on-chip ring bus, FlexIO and associated logic. ClearSpeed is just a DSP, and would require an external control processor and communications logic, which would consume more power. Comparing ClearSpeed with SPEs with reduced local store would be more appropriate.
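To put rough numbers on the per-watt argument, here is a minimal sketch using only the board figures quoted above (25 W, 50 GFLOPS sustained in DGEMM). The host power value is a made-up placeholder, not a measurement, and the function names are just for illustration; the point is only that charging control-processor power to the board changes the ratio.

```python
# Figures quoted in the post: one ClearSpeed board (two CSX600s plus 1 GB of memory).
BOARD_WATTS = 25.0
BOARD_DGEMM_GFLOPS = 50.0


def gflops_per_watt(gflops, watts):
    """Sustained GFLOPS delivered per watt of power drawn."""
    return gflops / watts


def gflops_per_watt_with_host(gflops, board_watts, host_watts):
    """Same ratio, but charging a share of host/control power to the board."""
    return gflops / (board_watts + host_watts)


board_only = gflops_per_watt(BOARD_DGEMM_GFLOPS, BOARD_WATTS)

# ASSUMPTION: 25 W is a hypothetical placeholder for the external control
# processor and communications logic, not a figure from the post or a datasheet.
HYPOTHETICAL_HOST_WATTS = 25.0
board_plus_host = gflops_per_watt_with_host(
    BOARD_DGEMM_GFLOPS, BOARD_WATTS, HYPOTHETICAL_HOST_WATTS
)

print(f"board only: {board_only:.1f} GFLOPS/W")
print(f"board plus hypothetical host share: {board_plus_host:.1f} GFLOPS/W")
```

With the placeholder host share set equal to the board's own draw, the per-watt figure halves, which is why the comparison depends so heavily on what gets counted.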
The Roadrunner architecture posits Cell as a co-processor, with an Opteron as host per node. As far as I can tell this is because they want to run existing x86 code on it and hand off FP work to Cell. Or use x86 as the glue to distribute data to the Cells.
For double-precision work, Cell as a "co-processor" doesn't currently seem to be very compelling; there's stuff out there that drops in more easily and provides more performance.
My point has always been that this will change, because IBM has plans for a true DP Cell in 2008.
Cell currently seems to be deployed in applications where DP isn't important and where the system is designed to run a single application.
Jawed