Even more curious, we have a performance estimate for it of 100-200 GFLOPS (whether going by the CPU equivalent or by CU performance). Cell would have been tiny, effective, and more powerful, would have allowed PS3 BC, and would even have been usable for other things. I wonder if it was considered. Did they see the value but find it too difficult/costly to adapt Cell for an AMD SoC? Or was it not even on the table, and if not, why not?
Sony might have needed to dust off the contracts and old data for Cell, if they maintained it. IBM was the last one to touch Cell, with a 45nm variant.
At this point, none of the three partners in the Cell alliance has kept up with the process node race; IBM's server offerings lag the least, with a dedicated 14nm line at GF.
Someone would need to re-implement Cell, and it might be a question of the contracts and investment needed to spin that pipeline back up, or if it would be workable to invite an outside party like AMD in to implement it.
After that, some research and design modifications would be needed to get Cell working so many nodes past where it was last looked at, and to fit it as a slave device in a very different ecosystem. Something like that did happen with the Roadrunner supercomputer, although a straight port of that would require dusting off the 90nm K8 Opteron as well...
If Sony abstracted away the incompatible system architecture (EIB, the interrupt handling of a PPC system, DSPs, different DMA engines) and the 2 incompatible ISAs, it would be purely for backwards compatibility, and so an entirely backwards-facing investment.
Sony probably would not be up to the task, and I suspect Toshiba wouldn't be.
IBM might be game if Sony paid enough, and maybe AMD could be enticed with a large sum if legal matters didn't get in the way. Both of them might feel that there would be too much re-building of an abandoned architecture, though.
A sufficient payment would be much greater than what was needed for the BC measures put into the CPU and GPU blocks, and no synergy would exist with AMD's extant product pipelines.
Not making the Cell block a BC-only feature and exposing it to the outside world might have made it an ongoing engineering effort, which is something Sony tends to avoid.
The SPUs can be deployed without a PPU. Toshiba sold such a configuration before:
https://en.wikipedia.org/wiki/SpursEngine
Lacking the PPE would hurt the backwards compatibility angle, however.
And it is so heavily modified that it works like an SPU. It does not work like a CU at all. That is what I said. An ND audio dev calls it an SPU, not a CU, for example.
I know this is a modified CU, but I am sure that when devs code for it, it probably has much more in common with an SPU, because of DMA and a serial programming model, than with a CU with a cache and many wavefronts.
Details are sparse, so I'm curious how different it is.
For example, the SPU is much narrower than a SIMD, and only having 2 wavefronts doesn't seem like it allows the wide hardware to be subdivided enough to give narrow-SIMD behavior.
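To put rough numbers on the width mismatch, here is a back-of-the-envelope sketch. The SPU figure comes from its 128-bit registers (4 fp32 lanes); 64 and 32 are the usual GCN and RDNA wavefront widths. The wavefront width of the modified CU itself is not stated anywhere in this thread, so treat both GPU figures as assumptions, and `lane_utilization` is a hypothetical helper that assumes the narrow task cannot be batched across lanes.

```python
# Lane counts for 32-bit floats. The GPU widths are the usual GCN/RDNA
# wavefront sizes, assumed here; the audio unit's real width is not known.
SPU_FP32_LANES = 128 // 32   # 128-bit SPU register holds 4 fp32 values
GCN_WAVEFRONT = 64           # GCN wavefront width (work-items)
RDNA_WAVE32 = 32             # RDNA native wave32 width (work-items)


def lane_utilization(task_width, hw_width):
    """Fraction of hardware lanes doing useful work for one instruction.

    Hypothetical helper: assumes a task that is `task_width` elements wide
    and cannot be batched with other tasks to fill the remaining lanes.
    """
    return min(task_width, hw_width) / hw_width


if __name__ == "__main__":
    for name, width in [("SPU", SPU_FP32_LANES),
                        ("GCN wave64", GCN_WAVEFRONT),
                        ("RDNA wave32", RDNA_WAVE32)]:
        print(name, lane_utilization(SPU_FP32_LANES, width))
```

Under those assumptions, code written for a 4-wide SPU fills only a small fraction of a wavefront unless it is restructured to process many independent items per instruction.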
What exactly replaces the caches, or which caches are replaced is another question mark. The GPU cache path is extremely long-latency relative to an SPU-like architecture, but what would have been put in its place?
There would still need to be memory operations out of some kind of scratchpad, although the LDS is still a long-latency pool compared to the LS in Cell.
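For reference, the SPU-style pattern being compared against (DMA a tile into a local scratchpad, compute on it, DMA the result out, with a second buffer so the next fetch can overlap the current compute) can be sketched as a toy model. This is only an illustration of the programming model, not anything known about Sony's unit: `dma_get`, `dma_put`, and `stream_double_buffered` are hypothetical names, and plain Python runs the steps sequentially rather than overlapping them.

```python
def dma_get(main_memory, offset, size):
    """Hypothetical stand-in for a DMA transfer from main memory to local store."""
    return list(main_memory[offset:offset + size])


def dma_put(main_memory, offset, tile):
    """Hypothetical stand-in for a DMA transfer from local store to main memory."""
    main_memory[offset:offset + len(tile)] = tile


def stream_double_buffered(src, dst, tile_size, kernel):
    """Process src tile by tile through two scratchpad buffers.

    On real hardware the fetch of tile i+1 would be in flight while tile i
    is being computed; here the calls simply run one after another.
    """
    n_tiles = (len(src) + tile_size - 1) // tile_size
    if n_tiles == 0:
        return
    buffers = [None, None]
    buffers[0] = dma_get(src, 0, tile_size)           # prime the first buffer
    for i in range(n_tiles):
        cur = i % 2
        if i + 1 < n_tiles:                           # start fetching the next tile
            buffers[1 - cur] = dma_get(src, (i + 1) * tile_size, tile_size)
        result = [kernel(x) for x in buffers[cur]]    # compute out of local store
        dma_put(dst, i * tile_size, result)           # export the finished tile


if __name__ == "__main__":
    samples = list(range(10))
    out = [0] * len(samples)
    stream_double_buffered(samples, out, 4, lambda x: x * 2)
    print(out)
```

The point of the structure is that the compute step only ever touches the small local buffers, which is why the latency of that local pool matters so much more than the latency of main memory.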
Other details don't rule out a variant of the GCN ISA rather than the RISC-like SPU. A GCN/RDNA programming model would reduce the complexity for developers. Sony is apparently making half of the wavefronts available to devs.
One type of GPU block that has a kind of batch or tile-based load, compute, export pattern is the RBE, which is why RBEs have historically been so effective at consuming memory bandwidth.
For years I always thought of the SPUs as a DSP-like architecture. Not sure why it's not called that.
I've seen descriptions of the SPUs as being DSPs, or DSP-like in their usage. It was pointed out that this was their primary strong point, and detractors pointed out that Cell wasn't the first heterogeneous design with a general-purpose core flanked by DSPs. The lack of success of those earlier attempts was one source of skepticism about the architecture.