Today GPGPU and Xeon Phi are beating everyone of course, but GPGPU isn't nearly as flexible as the Cell was; it was a real CPU.
No. Cell is not a real CPU. Or rather, it's a real CPU, but one comparable to the high-tech wonders of the mid-80s. Being able to directly address only a few hundred kB of local pool is not some minor detail you can forget in the margin; it defines Cell. Memory access is in general more important than computation these days, and the memory architecture of Cell means that the SPEs are much less real CPUs than, say, the shader arrays in GCN, which can directly address the global pool.
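To make that concrete, here is a rough sketch of what "can't directly address the global pool" means in practice on an SPE. It's a minimal sketch, assuming the standard mfc_* intrinsics from spu_mfcio.h; the buffer name, chunk size, and the commented-out do_work() are purely illustrative:

```c
/* SPU-side C (built with spu-gcc).  Illustrative only: ls_buf, CHUNK and
 * do_work() are made up; the mfc_* calls are the standard SPU intrinsics. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384   /* 16 kB: the largest single DMA transfer */

static volatile uint8_t ls_buf[CHUNK] __attribute__((aligned(128)));

void touch_main_memory(uint64_t ea)   /* ea = effective address in main RAM */
{
    const unsigned tag = 0;

    /* Loads and stores on an SPE only see the 256 kB local store, so you
     * can't just dereference ea -- you have to DMA the data in... */
    mfc_get(ls_buf, ea, CHUNK, tag, 0, 0);

    /* ...and explicitly wait for the transfer to land. */
    mfc_write_tag_mask(1u << tag);
    mfc_read_tag_status_all();

    /* Only now is the data addressable.  A GCN shader array or any ordinary
     * CPU core would simply have issued a load against the global pool. */
    /* do_work(ls_buf, CHUNK); */
}
```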
The PPE of course is a real CPU, but it's also pathetic on its own.
But the Cell dominated the Green500 supercomputers for years. It took 5 years for someone to beat its performance per watt, and that was the BlueGene/Q prototype in 2010. Even then, the BlueGene/Q had to be downclocked significantly in order to beat the Cell.
If it was so good, then why didn't people want to buy it? IBM sold less than 2 PF of Cell supercomputers in total, and every one that got shipped was either heavily supported by IBM or sold to the US Govt before the characteristics of Cell were well known. It certainly wasn't expensive, what with IBM desperately trying to push it on everyone. If it was cheap and very efficient, then why didn't everyone jump at the chance?
The answer, of course, is that while it got good numbers on synthetic benchmarks with completely predictable memory access, on real code it's really bad. Bad per watt and just bad in general. There are a few things it's really good at, but in general, the things supercomputers are used for can rarely be split into nice small 64 kB chunks that you can process in parallel and independently. If they could, Cell would still be a very good system and every major operator would still be begging IBM to take their money.
IBM learned from their mistake, and now their top-end entry for supercomputers is the PPC A2, which ditched the thing that made Cell what it was and has proper memory access from every thread. And those, people actually want to buy.
It seems to have served its purpose perfectly, and led the way to the accelerated processing revolution years later.
I really don't think there is any way that Cell can be said to have led to an accelerated computing revolution. Various vector processors and other accelerated processing have existed since the beginning of high-end computing, and modern GPGPU has much more in common with, say, the Fujitsu Numerical Wind Tunnel than with Cell, both in architecture and in programming model.
I would have thought that gaming devs would prefer a very powerful OoO PPE combined with lots of SPEs instead of having to do GPGPU code.
A GPGPU platform with a shared memory space is vastly, vastly more comfortable to code for than one where you have to split your problem into nice small chunks and DMA them to the processing elements before they are needed. One of these just requires you to tolerate a little more latency (which is almost perfectly masked by the many threads in flight); the other requires you to orchestrate a careful dance of data. All it takes is one unpredictable read from a megabyte-sized pool to really, really ruin your day.
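For a feel of what that dance looks like, here's a sketch of the classic double-buffered streaming loop you end up writing on an SPE, next to the loop you'd write against a flat address space. Again a sketch, not anyone's production code: the mfc_* intrinsics are the real ones from spu_mfcio.h, but the buffer layout, tag assignment and the trivial "scale by two" workload are placeholders of mine.

```c
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 16384   /* 16 kB per DMA, the hardware maximum */

static volatile float buf[2][CHUNK / sizeof(float)] __attribute__((aligned(128)));

/* Flat address space (GPGPU or an ordinary CPU): the hardware hides the
 * latency for you, the whole thing is just
 *
 *     for (size_t i = 0; i < n; i++)
 *         data[i] *= 2.0f;
 */

/* Cell version: you schedule the memory system by hand. */
void scale_stream(uint64_t ea_in, uint64_t ea_out, unsigned nchunks)
{
    unsigned cur = 0;
    mfc_get(buf[0], ea_in, CHUNK, 0, 0, 0);          /* prefetch chunk 0 */

    for (unsigned c = 0; c < nchunks; c++) {
        unsigned nxt = cur ^ 1;

        if (c + 1 < nchunks) {
            /* make sure the other buffer's previous write-back has drained,
             * then start fetching the chunk after this one */
            mfc_write_tag_mask(1u << nxt);
            mfc_read_tag_status_all();
            mfc_get(buf[nxt], ea_in + (uint64_t)(c + 1) * CHUNK, CHUNK, nxt, 0, 0);
        }

        /* wait for the chunk we're about to touch */
        mfc_write_tag_mask(1u << cur);
        mfc_read_tag_status_all();

        for (unsigned i = 0; i < CHUNK / sizeof(float); i++)
            buf[cur][i] *= 2.0f;

        /* push the result back out to main memory */
        mfc_put(buf[cur], ea_out + (uint64_t)c * CHUNK, CHUNK, cur, 0, 0);
        cur = nxt;
    }

    mfc_write_tag_mask(3u);                          /* drain the last puts */
    mfc_read_tag_status_all();
}
```

And that's the easy case, a perfectly sequential stream. The moment the access pattern stops being predictable, there's no "next chunk" to prefetch and the whole scheme falls apart.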
Devs in general hate Cell. I personally hate it with a burning passion. Ever meet some old-timer PS3 devs? Buy the one with prematurely gray hair a beer and you'll get to listen to vitriol-filled horror stories about trying to fit normal processing into the programming model of the Cell. Sure, we'll use it if it's the only game in town, and we'll even grumpily admit that it's actually really good at a narrow set of tasks, but I don't think I've ever heard anyone who has actually programmed on it say they'd prefer it to anything other than having their fingers amputated. And even that would probably make a lot of us pause and think.