Cell's dead, baby. Cell's dead. Even in its final form. *spawn*

It truly was hardware design that was well ahead of its own time, maybe too far into the future ...

It pioneered concepts like data-oriented design, which is paving the way for more scalable multi-threaded game logic. Deferred renderers ultimately became more popular as well, and deferred rendering is the dominant technique we see in AAA games today. The SPUs were a lot like the compute shaders we see on GPUs, and both have limited local memory ...

Who knew that hardware which is over 15 years old at this point would be setting programming/paradigm trends that we see in games today and likely many years from now? The architects behind the design must have been crazy visionaries ...
 
Enjoy :cool:


...and...


Rob Wyatt

"One thing to remember when it comes to the SPUs is they were made for a completely different purpose to what they were used for. When I started working on the PS3, I was in Japan, it was probably 2002 and I was splitting time between the PS3 and working on Ratchet on PS2. For the PS3 the RSX wasn't in the picture for maybe another year or so. The original PS3 design had a Sony designed GPU called the RS but it only did pixels, it was also kind of complicated as you had to schedule all the threads yourself. The SPUs were intended to feed the RS with transformed vertices, in a similar manner to how the PS2 worked, and if you look at how the SPU DMA works then processing vertices is an almost perfect use case. The intended design was you'd be able to do fantastically complex vertex processing, with programmable nodes for skeleton joints, because the SPUs were not just stream processors (like vertex processors still are). There was a device called the LDPCU and to this day I'm 100% sure how it worked, it had 1500 pages of documentation in Japanese and only Mark Cerny could read it. It was basically a gate keeper and synchronization system that would allow the SPUs to process and complete vertex batches out of order but still have the RS/GPU render them in order. We never really used it because we didn't know how, to got it to work from what Mark told us and by the nature of how simple our tests were - I'm pretty sure it would have a total nightmare. So what happened was the RS was too big, in silicon terms, to make and it wasn't really possible to optimize down to a reasonable size without significantly gutting it, if they gutted it then it wouldn't have competed with the XBox. At this point Sony were stuck between a rock and a hard place, they looked putting multiple cells in the console and software rendering (I actually wrote a prototype software renderer, in 100% hand paired asm, that would run across multiple SPUs - ultimately it was a proof of concept of what not to do), they look at stacking a bunch of PS2 style GPUs together to make a pseudo programmable blend stack. Ken Kutargi did not want to give up and go to Nvidia or AMD/ATI but in the end he had no choice, its a good job he did because how terrible would the PS3 have been if the SPUs were used for graphics and games had just the single PowerPC core?? Once the RSX showed up and it could do vertex processing the SPUs had no job. This is when the ICE team started looking at using the SPUs for other tasks, it was a massive exercise in data design. If you started from scratch you could design a system for physics, audio, AI, particles - whatever and it would be very fast because you could factor in the constraints of the SPU memory. However, if you started with existing code or cross platform code, then it was next to impossible to get the SPUs to do anything useful. Initially this resulted in huge variance in quality between first party and third party games. This was also the time frame when fewer and fewer studios were willing to write an engine from scratch and things like Unreal engine were getting very popular, it was UE3 at the time, and it ran like crap on the PS3 but ran awesome on the Xbox and PC. Ultimately, the negative developer feedback cut through the arrogance that was present at the time within Sony (and Ken himself) and the PS4 was intentionally designed to be PC like (and was done by Mark)."
 
Cell was always the best

 
Hard disagree. It's fair to say Cell wasn't good for the gaming code and paradigms of the time. If you were to take modern GPGPU-focussed concepts, or even just data-oriented game design, and map them onto Cell, you might have a very different story. When it was able to stretch its legs, Cell got unparalleled throughput. The main problem was devs couldn't, and shouldn't have had to, rearchitect everything for Cell on top of developing for other platforms. Sony perhaps imagined a scenario like the PS2, where the majority of code was focussed on it and then ported, but that didn't happen, and realistically couldn't for the cost of game development, which was increasing exponentially.

Sadly we'll never know. There isn't going to be a future retro-demo scene pushing PS3s the same way we've seen the C64, Spectrum, etc. pushed. There's no access to Cell hardware or interest in exploring it. I'm not going to argue it's a loss to the world and a travesty that other tech displaced it, but I'm not going to accept that it's a poor design for a gaming CPU. Modern game development is data-focussed, working around the data access and memory limits caused by RAM buses that haven't kept up with the increase in maths power, gaining massive improvements (10-100x) over object-oriented development concepts, and that's entirely what Cell was about 15 years ago when STI got together and said, "you know what the future limiting factor is going to be? Data throughput."

Even if the Cell wasn't good for gaming tasks at the time but would be today, it still wasn't a good CPU for gaming when the PS3 was hot. Though I doubt its design would reign today; evidently it does not. As a user posting after you has shared, it wasn't all that great of a CPU back then for gaming, or most things basically, nor is its architecture 15 years later.
We are not seeing anything of it in today's architectures and consoles, and for good reason. It was a terrible console hardware-wise.
 
I really wonder what Cell + a GeForce 8 series GPU could have done. That $500 budget could have gone to a Cell + GeForce 8 and 512 MB of DDR alongside the 256 MB of RDRAM. It could have been an amazing console if they hadn't tried to go with that weird GPU and forced Blu-ray into the console.

Or simply something other than the weak Cell for the CPU. Whatever AMD, Intel or even IBM had on offer, really.
 
Even if the Cell wasn't good for gaming tasks at the time but would be today, it still wasn't a good CPU for gaming when the PS3 was hot. Though I doubt its design would reign today; evidently it does not. As a user posting after you has shared, it wasn't all that great of a CPU back then for gaming, or most things basically, nor is its architecture 15 years later.
We are not seeing anything of it in today's architectures and consoles, and for good reason. It was a terrible console hardware-wise.
Your initial argument was that the CELL wasn't very powerful. Clearly it was extremely powerful. We have seen it in action and there is proof.

CELL not being good for gaming and CELL not being ideal for the gaming development paradigms of the time are two different things.
How its architecture fares 15 years later is irrelevant. Whatever architecture we have today, like everything else, is the result of 15 years of steady evolution and improvement.
The CELL was designed well for what was about to come, but came too early.

The earlier release of the 360 and the slow adoption of the PS3 meant that developers found a home in the 360's development ecosystem. If the PS3 had launched earlier and its market share had exploded like the PS2's, we would have seen developers taking advantage of CELL in a lot more ways, and whether it was ideal or not would have been irrelevant. It makes more sense to focus main support on the huge market leader even if it is not ideal. Just like the PS2. We were having the same arguments back when the PS2 was released: it was difficult to make games on, libraries and documentation were unfinished, it was too exotic and developers had to be super smart about how to use the architecture, some games looked worse than on the DC, analysts were predicting the elimination of many games companies because it was too costly to make games on, yada yada.

Same story at the beginning, but at the end it rocked the industry. Simply because it came early and sold like hotcakes until the end. The PS2 was, overall, probably a bigger nightmare, but it was irrelevant. When it owned 70% of the market, comparisons with a product with significantly less market share made zero financial sense. It does make a hell of a lot more sense, though, if your market share is below 50% and developers are making multiplatform games.

We aren't having this discussion today about the PS2, and we saw its architecture pushed by developers. Even though the XBOX was significantly more powerful and straightforward, the PS2 was the base platform. It didn't get Saturn'ed.
The PS3 and the 360, on the other hand, were much more similar in terms of performance and architecture, yet we are having discussions about how badly the PS3 was designed and how bad CELL was, when we do have games that prove its capabilities, because the PS3 almost got Saturn'ed. This discussion might not have taken place if the PS3 had repeated the PS2's success. All developers would have jumped in, put aside the 360's more efficient, more traditional approach, and kept discovering more ways to use the CELL's potential.
 
Disclaimer: I only had limited experience with CELL (basically some tinkering on my PS3 + Linux, no serious projects)

My problem with CELL is, basically, that it's a CPU + DSP set, with an underpowered CPU. The DSP part (those SPEs) was quite powerful at the time, in DSP terms, but also limited for more general-purpose workloads. Each SPE only has 256 KB of memory, and from my understanding an SPE can't address main memory directly (it has to go through DMA, which has some latency problems). That means SPEs are best suited for "streaming" work, i.e. workloads with highly localized memory access, e.g. audio or video encoding/decoding. Unfortunately, the CPU (PPE) is not powerful enough to feed those SPEs. This creates a lot of balancing problems, making CELL a pretty fragile architecture (that is, you need to carefully optimize for it to run very well, but if some of the workloads change, you'll have to go through the whole optimization process again).
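
That "streaming via DMA" workflow is worth making concrete. Below is a minimal, hypothetical SPU-side sketch of the usual double-buffering pattern: fetch the next chunk of main memory into local store while computing on the current one, then write results back. It assumes the IBM Cell SDK SPU toolchain (spu_mfcio.h); the use of argp/envp and names like process_chunk are illustrative, and the intrinsic signatures are from memory, so treat it as a sketch rather than copy-paste-ready code.

```cpp
// Hypothetical SPU-side kernel: double-buffered DMA streaming of float data.
// Assumptions: spu_mfcio.h from the IBM SPU toolchain; the PPU passed the
// effective address of the input array in argp and its size in bytes in envp.
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK_BYTES 16384u  // 16 KB = maximum size of a single MFC transfer

// Two local-store buffers so we can compute on one while DMA fills the other.
static volatile float buf[2][CHUNK_BYTES / sizeof(float)] __attribute__((aligned(128)));

static void process_chunk(volatile float* data, unsigned count) {
    for (unsigned i = 0; i < count; ++i)
        data[i] *= 2.0f;  // stand-in for the real SPU job
}

int main(uint64_t speid, uint64_t argp, uint64_t envp) {
    (void)speid;
    const unsigned nchunks = (unsigned)(envp / CHUNK_BYTES);
    int cur = 0;

    // Prime the pipeline: fetch chunk 0 into buffer 0 on tag 0.
    mfc_get(buf[0], argp, CHUNK_BYTES, 0, 0, 0);

    for (unsigned c = 0; c < nchunks; ++c) {
        const int next = cur ^ 1;

        // Prefetch the next chunk while we work on the current one. The fenced
        // form orders this get after any earlier put still pending on this tag,
        // so we never overwrite a buffer that is still being written back.
        if (c + 1 < nchunks)
            mfc_getf(buf[next], argp + (uint64_t)(c + 1) * CHUNK_BYTES,
                     CHUNK_BYTES, next, 0, 0);

        // Wait only for the current buffer's tag, then do the maths on it.
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();
        process_chunk(buf[cur], CHUNK_BYTES / sizeof(float));

        // Write the result back to main memory in place, on the same tag.
        mfc_put(buf[cur], argp + (uint64_t)c * CHUNK_BYTES, CHUNK_BYTES, cur, 0, 0);
        cur = next;
    }

    // Drain any outstanding writes before exiting.
    mfc_write_tag_mask(3);
    mfc_read_tag_status_all();
    return 0;
}
```

The point of the pattern is exactly the "balancing" issue above: as long as the compute per chunk is longer than the DMA latency, the SPE never stalls; change the workload shape and you have to re-tune chunk sizes and buffering all over again.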

I think if the SPEs were able to address main memory directly (read-only is probably fine, but there's also the cache coherence problem to consider), and if the CPU were more powerful (e.g. if we traded two or more SPEs for another PPE or a better PPE), CELL would probably be much better. Unfortunately, since CELL was not designed to be like that (as mentioned in a previous post, CELL was designed to cover graphics workloads), it's probably too late to change.

Anyway, we don't see anything like that being tried again in general purpose computing, and probably with good reasons.
 
Your initial argument was that the CELL wasn't very powerful. Clearly it was extremely powerful. We have seen it in action and there is proof.

CELL not being good for gaming and CELL not being ideal for the gaming development paradigms of the time are two different things.
How its architecture fares 15 years later is irrelevant. Whatever architecture we have today, like everything else, is the result of 15 years of steady evolution and improvement.
The CELL was designed well for what was about to come, but came too early.

The earlier release of the 360 and the slow adoption of the PS3 meant that developers found a home in the 360's development ecosystem. If the PS3 had launched earlier and its market share had exploded like the PS2's, we would have seen developers taking advantage of CELL in a lot more ways, and whether it was ideal or not would have been irrelevant. It makes more sense to focus main support on the huge market leader even if it is not ideal. Just like the PS2. We were having the same arguments back when the PS2 was released: it was difficult to make games on, libraries and documentation were unfinished, it was too exotic and developers had to be super smart about how to use the architecture, some games looked worse than on the DC, analysts were predicting the elimination of many games companies because it was too costly to make games on, yada yada.

Same story at the beginning, but at the end it rocked the industry. Simply because it came early and sold like hotcakes until the end. The PS2 was, overall, probably a bigger nightmare, but it was irrelevant. When it owned 70% of the market, comparisons with a product with significantly less market share made zero financial sense. It does make a hell of a lot more sense, though, if your market share is below 50% and developers are making multiplatform games.

We aren't having this discussion today about the PS2, and we saw its architecture pushed by developers. Even though the XBOX was significantly more powerful and straightforward, the PS2 was the base platform. It didn't get Saturn'ed.
The PS3 and the 360, on the other hand, were much more similar in terms of performance and architecture, yet we are having discussions about how badly the PS3 was designed and how bad CELL was, when we do have games that prove its capabilities, because the PS3 almost got Saturn'ed. This discussion might not have taken place if the PS3 had repeated the PS2's success. All developers would have jumped in, put aside the 360's more efficient, more traditional approach, and kept discovering more ways to use the CELL's potential.

So, what's that proof it was 'extremely powerful'? People with actual hands-on experience, from home users to developers, clearly have a different view. As a CPU it was very underpowered, coupled to specialized processors (DSPs) that proved to bring not much to gaming performance aside from some use cases. On top of that it was hard to code for.

The 360 launched earlier yet was the more capable system, evident in multiplatform games. We can't judge by AAA exclusives since those never made it to both consoles/were never optimized for both.

All you are proving is that hardware isn't always tied to a console's success; look at the PS2 and Switch... both the weakest but seemingly the most successful.
It's like some are completely ignoring what others who have had experience with it here (and elsewhere) have to say. The PS4 was and still is the much better machine hardware-wise.

Anyway, we don't see anything like that being tried again in general purpose computing, and probably with good reasons.

It's good that today's consoles (starting with the PS4) carry nothing over from the PS3 and the era before it. Yeah, it was interesting to talk about these exotic (yet weak) designs, though we could have had so much more graphically if they hadn't gone with crazy custom designs.
 
I think if the SPEs were able to address main memory directly (read-only is probably fine, but there's also the cache coherence problem to consider), and if the CPU were more powerful (e.g. if we traded two or more SPEs for another PPE or a better PPE), CELL would probably be much better. Unfortunately, since CELL was not designed to be like that (as mentioned in a previous post, CELL was designed to cover graphics workloads), it's probably too late to change.
CELL wasn't designed for graphics workloads. It was designed for 'stream processing', whatever workload that is, including things like video encoding/decoding. Kutaragi may have had a notion it'd be great for pure visuals, but the architecture was not designed specifically around graphics rendering.

My problem with CELL is, basically, that it's a CPU + DSP set, with an underpowered CPU. The DSP part (those SPEs) was quite powerful at the time, in DSP terms, but also limited for more general-purpose workloads. Each SPE only has 256 KB of memory, and from my understanding an SPE can't address main memory directly (it has to go through DMA, which has some latency problems). That means SPEs are best suited for "streaming" work, i.e. workloads with highly localized memory access, e.g. audio or video encoding/decoding. Unfortunately, the CPU (PPE) is not powerful enough to feed those SPEs. This creates a lot of balancing problems, making CELL a pretty fragile architecture (that is, you need to carefully optimize for it to run very well, but if some of the workloads change, you'll have to go through the whole optimization process again).
IIRC a memory manager was developed for an SPE to serve the requirements of other SPE workloads. When it comes to 'streamable workloads', modern software is structuring everything around streaming because that's how to get the best utilisation from the caches and keep the many ALUs and cores occupied, so as many jobs as possible are now being designed around parallel-processable, streamable data workloads. The best economy from any processor - ARM, x64, GPGPU - comes from lining up all your data and running through it as quickly as possible, even doing this multiple times as opposed to using conditionals and branches to only do the 'necessary' maths.
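
To make "lining up all your data and running through it" concrete, here's a minimal sketch of the array-of-structs vs struct-of-arrays idea; the Particle/ParticlesSoA names and update functions are made up for illustration, not taken from any engine:

```cpp
#include <vector>
#include <cstddef>

// "Human-friendly" layout: one heterogeneous object per particle.
struct ParticleAoS { float x, y, z, vx, vy, vz, age; bool alive; };

void update_aos(std::vector<ParticleAoS>& ps, float dt) {
    for (auto& p : ps) {
        if (!p.alive) continue;          // branch per element, cold fields dragged through cache
        p.x += p.vx * dt; p.y += p.vy * dt; p.z += p.vz * dt;
        p.age += dt;
    }
}

// "Machine-friendly" layout: one contiguous array per field; dead particles are
// swapped out of the live range so the hot loop has no branch at all.
struct ParticlesSoA {
    std::vector<float> x, y, z, vx, vy, vz, age;
    std::size_t live_count = 0;          // indices [0, live_count) are alive
};

void update_soa(ParticlesSoA& ps, float dt) {
    for (std::size_t i = 0; i < ps.live_count; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
        ps.z[i] += ps.vz[i] * dt;
        ps.age[i] += dt;                 // straight-line, vectorisable, prefetch/DMA friendly
    }
}
```

The second form is the one that maps naturally onto an SPE local store, a GPU wavefront or a modern CPU's SIMD units alike.
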
Anyway, we don't see anything like that being tried again in general purpose computing, and probably with good reasons.
Yes, but probably not because the architectural concepts are a dead end. An independent IHV can't add 'stream processors' onto an existing CPU because no-one will use them unless they are standard across all platforms. What we see instead is processors being adapted to fit the workloads Cell was designed for, notably adjusting GPU shaders to stream process. You'd have to take a trip to a parallel dimension without the same economic constraints, to a universe where all silicon swapped to 'stream processors' 15 years ago, to understand what the architecture's real potential/limits are compared to what we have as standard, including 15 years of evolution across ARM, Intel, AMD and STI 'Cell'-type processors to compare with i9 and Ryzen/Threadripper.
 
So, what's that proof it was 'extremely powerful'? People with actual hands-on experience, from home users to developers, clearly have a different view. As a CPU it was very underpowered, coupled to specialized processors (DSPs) that proved to bring not much to gaming performance aside from some use cases. On top of that it was hard to code for.

The 360 launched earlier yet was the more capable system, evident in multiplatform games. We can't judge by AAA exclusives since those never made it to both consoles/were never optimized for both.

All you are proving is that hardware isn't always tied to a console's success; look at the PS2 and Switch... both the weakest but seemingly the most successful.
It's like some are completely ignoring what others who have had experience with it here (and elsewhere) have to say. The PS4 was and still is the much better machine hardware-wise.



It's good that today's consoles (starting with the PS4) carry nothing over from the PS3 and the era before it. Yeah, it was interesting to talk about these exotic (yet weak) designs, though we could have had so much more graphically if they hadn't gone with crazy custom designs.
We aren't discussing which is the more capable system, which again is subjective and a different subject since both consoles had their own set of pros and cons. We aren't discussing whether power defines who the market leader is going to be either. Again, that's a different subject.
We aren't discussing whether Sony's overall choices for the PS3's architecture in general were the best either. We all agree that the RSX was weaker and that the CPU was a pain. So I am not sure what you are arguing.

We are only discussing the capabilities of the CELL processor and the circumstances under which it was properly or improperly utilized. In general the PS3 punched above its weight despite the tangible weaknesses compared to the 360 and the various false notions about the CELL. And it was all thanks to the CPU that the PS3 later in its life performed so close, in many cases equal, and on some rare occasions better, while it spat out AAA titles that set new standards in console gaming visuals more often than the 360 did.
 
So, what's that proof it was 'extremely powerful'?
The raw data from paper specs, performance tests and benchmarks, coupled with an understanding of the data and processing workflow of a game (or indeed any CPU-optimised workload) and how it can be parallelised and streamed. Firstly, your CPU has a peak amount of maths it can do. Then it gets bottlenecked trying to keep its ALUs busy. If you can keep them busy, the CPU with the most ALUs wins. The case you are wanting, a game 10x better on PS3 than on XB360, you will never find, so you have to understand what's actually happening with the software, data and processing to see where Cell's strengths lie and why it was attempted in the first place.
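
As a back-of-the-envelope version of that argument, using the commonly quoted Cell figures (3.2 GHz SPEs, 4-wide single-precision SIMD with fused multiply-add, six SPEs exposed to PS3 games); the 10% utilisation number is purely illustrative of "branchy" code, not a measurement:

```cpp
#include <cstdio>

int main() {
    const double clock_hz     = 3.2e9;  // SPE clock
    const double simd_width   = 4;      // 4 x 32-bit lanes per instruction
    const double flops_per_fma = 2;     // multiply + add count as two ops
    const double spes         = 6;      // SPEs available to PS3 games

    const double peak = clock_hz * simd_width * flops_per_fma * spes;  // ~153.6 GFLOPS
    std::printf("theoretical peak: %.1f GFLOPS\n", peak / 1e9);
    std::printf("at an illustrative 10%% utilisation: %.1f GFLOPS\n", peak * 0.10 / 1e9);
}
```

Peak comes out around 150 GFLOPS for the game-visible SPEs alone; the interesting number is always the utilisation factor, and that is determined almost entirely by how the data is laid out and fed.
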
It's good that today's consoles (starting with the PS4) carry nothing over from the PS3 and the era before it. Yeah, it was interesting to talk about these exotic (yet weak) designs, though we could have had so much more graphically if they hadn't gone with crazy custom designs.
We could have had better results if devs had been able to develop Cell-ideal games too and the software had actually been mapped well to it.

Basically, Cell represents a return to computing's roots, and as such will be intrinsically 'superior' at getting work done, but at the cost of software requirements. Computers work in a particular way which is completely alien to human thinking. In computing's infancy, people had to learn to think like a machine. They hand-crafted code to make the machines do things the way the machines liked, to get them to work at their peak. Over time, the exponential increase in processing performance, coupled with a need for more complex software, saw processors designed to accommodate a human way of doing things. Eventually, code development became human-led, based on being easy for developers, rather than led by the machine's requirements. But this is wasteful in both processing time and silicon budget. Now your CPU is hopping around random memory and trying to shuffle which instructions to do when, just to get the developer's code to operate effectively. The result is your chunky processor attaining 1/10th of what it's actually capable of doing. Alternatively you can write code that fits the machine, but then you have to write code the old-fashioned way, learning to speak Computerese instead of having the machine work with your human-level thinking - 'difficult to code for'.
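
If you want to feel that "hopping around random memory" cost yourself, a rough probe is to sum the same numbers stored contiguously and stored as a pointer-chased structure. Exact ratios depend on the machine and compiler, so treat it as a demonstration, not a benchmark:

```cpp
#include <chrono>
#include <cstdio>
#include <list>
#include <numeric>
#include <vector>

// Time summing any container's contents; writes the sum so the work isn't elided.
template <typename Container>
static double time_sum_ms(const Container& c, long long& out) {
    auto t0 = std::chrono::steady_clock::now();
    out = std::accumulate(c.begin(), c.end(), 0LL);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int N = 10'000'000;
    std::vector<int> vec(N, 1);
    std::list<int> lst(vec.begin(), vec.end());   // same values, scattered nodes

    long long a = 0, b = 0;
    double tv = time_sum_ms(vec, a);   // streams linearly through cache lines
    double tl = time_sum_ms(lst, b);   // each element is a dependent pointer load

    std::printf("vector: %.1f ms  list: %.1f ms  (sums %lld / %lld)\n", tv, tl, a, b);
}
```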

Many of the criticisms levelled at Cell are only of the time. It was a processor in its infancy, with tools to match. When you look at the tooling, such as that guy's video about Time To Triangle and the massive program needed on Cell, supposedly proving how complicated and 'crap' the SPE was - that simple 5-line 'printf' example is hiding a metric fuck-ton of code in the OS, firmware, tools, libraries, etc. The actual amount of code needed for the machine to produce that "Hello World" on screen is possibly millions of instructions. That's been hidden from the dev thanks to tools that have been developed over decades. If you were to compare the raw machine language needed to get the CPU to put something on screen with the machine language needed for the SPE, the difference would be a minuscule percentage more effort.

Now imagine Cell had been around for 15 years. You'd be able to go into VS and include a bunch of standard libraries that do the base work and boilerplate for you, just as happens with the x64 CPU you write for. Not only that, the Cell would have received numerous updates to improve its usability. Now you have a 128-core processor that's faster than Threadripper but far smaller, because it doesn't need all the legacy machinery designed to make 'human-centric' code run at a decent speed - all the software on Cell is 'machine-centric'. There's no reason to doubt this, as it's exactly what's happened in the GPU space. GPU workloads were designed around streamed data keeping the ALUs occupied, and the software was machine-friendly, needing educated engineers to write bespoke shaders. GPU work hasn't had decades of clumsy, branchy code that modern GPUs have to try to run quickly. As such, you can stuff multiple teraflops of ALU into a GPU's silicon budget and get it to do useful work, thanks to not hopping around randomly in memory and executing instructions in arbitrary orders. Just as GPUs are faster at data-driven workloads than CPUs (and CPUs themselves are faster at machine-centric work than human-centric work), so too is Cell in principle.

Now imagine a modern data-driven engine on PS3, maybe using something like SDFs - new concepts that could have been supported on PS3 but weren't in the developer toolset or mindset at that point. The results would have been mind-blowing and likely not possible on other platforms that lacked the raw power to execute them.
 
Unity DOTS explained:

Mike Acton explains data-driven design and Unity's future solution.

How is Unity DOTS relevant to Cell? Here's a paper on DOD, with this intro...
In 2017, Unity Technologies, known best for the Unity Real-Time Development Platform used by many game developers, hired DOD proponents Mike Acton and Andreas Fredriksson from Insomniac Games

The History of DOD

The movement toward data orientation in game development occurred after the PlayStation (PS) 3 game console was released in the mid-2000s. The game console used the Cell hardware architecture, which contains a PowerPC core and eight synergistic processing elements, of which game developers traditionally used six. This forced game developers to make the move from a single-threaded view of software development to a more parallel way of thinking about game execution to push the boundaries of performance for games on the platform. At the same time, large-scale (AAA) game development was growing in complexity, with an emphasis on more content and realistic graphics.

Within that environment, data parallelism and throughput were very important. As practices coalesced around techniques used in game development, the term DOD was created and first mentioned in an article in Game Developer magazine in 2009. [6]

In 2017, Unity Technologies, known best for the Unity Real-Time Development Platform used by many game developers, hired DOD proponents Mike Acton and Andreas Fredriksson from Insomniac Games to "democratize data-oriented programming" and to realize a philosophical vision with the tagline of "performance by default." [7] The result has been the introduction of DOTS, which is a canonical use of DOD techniques.

To date, many blogs and talks have discussed DOD since the original article, but very little has been studied in academia regarding the process. Richard Fabian, a practitioner from industry, published a book on DOD in 2018, although it existed for several years in draft form online. [8]

In 2019, a master's thesis was published by Per-Morten Straume at the Norwegian University of Science and Technology that investigated DOD. [9] Straume interviewed several game industry proponents of DOD in the thesis and concluded that, while they differed in their characterizations of DOD, the core characteristics of DOD were a focus on solving specific problems rather than generic ones, the consideration of all kinds of data, making decisions based on data, and an emphasis on performance in a wide sense.

Both Fabian and Straume discuss DOD elements without fully tying those elements together to form a design practice that can be used to create software. The overarching theme that ties all of the elements together is that software exists to input, transform, and output data.
Mike Acton and Andreas Fredriksson took the work needed for Cell at Insomniac and saw it applied to all processors. You want to map your work to stream processing, and once you start doing that it makes sense to design the hardware to be optimised for those workloads given your limited silicon budget. Which is exactly what Cell was.

Edit: The saddest part: "Mike Acton" is a B3D member who actually teamed up with Beyond3D to host the Cell development centre for discussion and best practice. You can check out his content at https://cellperformance.beyond3d.com/articles/index.html

His posts: https://forum.beyond3d.com/members/mike-acton.6747/#recent-content

There was also a Cell development subforum, back when we had real developers doing real work. 😥 So don't just take my word for it that Cell is great for solving the workloads that computers should be doing!
 
I'm not that optimistic about CELL's architecture decisions, though. The memory size of the SPE is really limiting. In many ways current GPGPU architectures do things way better than the SPE.
In many ways, if you want to make something unique, its benefits must be clear and huge. By huge I mean something like ten times better in some measure (not necessarily raw performance; it can be performance per watt or performance per dollar, or something like that). CELL does not provide enough of that to justify the huge investment in new software. I guess SONY and IBM wished that the huge market of the PS3 could kickstart it, but it didn't come to that.
By comparison, if we look at the early history of GPGPU, it's clear that because GPGPU had the potential to provide a much better benefit (again, not necessarily raw performance), people were willing to invest in a completely new and immature software infrastructure.
 
You can't compare current GPGPU designs to the 15-year-old Cell though - that's not the environment it was conceived in. When Cell was conceived, if you didn't have Cell, you had either x86, tiny, puny ARM, or GPUs that you 'hacked' to run non-graphics work on limited, fixed vertex or pixel shaders. GPGPU started by actually exploiting GPUs for 'free', without any software investment, so that's how that paradigm was able to develop naturally alongside the restrictive economics of general computing. Data was structured as textures and used architecture designed for graphics in novel ways. That's how GPGPU has been able to evolve, with the end result being a mutation of GPUs into ultra-wide stream processors. Back in 2005 when Cell was conceived, GPUs were still graphics-focussed even though work was ongoing in exploiting them for non-graphics work, and it wasn't clear what their end result would be - CUDA didn't appear until 2007, and GPU shaders were inherently too limited to be applicable to all streamable workloads. If it had been obvious, STI wouldn't have bothered with Cell, instead saying, "you know what, GPUs are going to have this covered in 5 years."

In short, I think your complaints are the result of the economics, not the architecture. It's not that Cell couldn't bring something meaningful, but that it couldn't do it in an economical, self-sustaining business between conventional CPUs being made wider and GPUs being made more versatile. We need to separate the evaluation of Cell's design and architecture from its product viability, as they are two different arguments. It's almost like considering which is the better mode of transport - an all-electric small car capable of 500 miles on one 8h charge, or a 2l diesel managing 30 mpg - in 1983. "The electric car is useless; there's nowhere for me to charge it. Liquid fuel is the only sane solution." (Note I'm not saying Cell is like a future tech, only that its value and appreciation are constrained to its functional environment. A good idea can be rendered useless and worthless by wider factors.)

The only real architectural complaint here is that the SPE memory was too small. That's a difficult one to evaluate. Is it inherently too small to do the work needed, or was it too small for the work devs were trying to get it to do back then, in the way they were trying to get it done?
 
The only real architectural complaint here is that the SPE memory was too small. That's a difficult one to evaluate. Is it inherently too small to do the work needed, or was it too small for the work devs were trying to get it to do back then, in the way they were trying to get it done?

Oh, it wasn't my intention to compare directly between CELL and GPGPU. I was merely using GPGPU as a more successful example of how a new software ecosystem might be established.

The problem with the SPE's memory size is, IMHO, not just that it's "too small," but that it's not scalable. Therefore, all applications have to optimize around this number. This is a big problem for software, because software has a notoriously long shelf life. So, if CELL had really become successful, there might have been a "CELL 2" with a larger SPE memory, but all previous applications wouldn't be able to utilize that, and future software able to utilize it would have trouble running on the older CELL. This would cause huge problems. This is what I meant by "fragile."

Early GPGPU had a similar problem (e.g. limited shared memory size), but since the GPU has access to a much larger video memory, shared memory is mostly limited to scratchpad usage and is thus easier to control. So while early GPGPU was still quite fragile in this sense, it was much less so compared to CELL. That's why I think if the SPEs were able to access main memory in a low-latency fashion, it'd be possible to make SPE memory into something more like a scratchpad, which would make CELL much less fragile in this sense.

Of course, this is less problematic for a console, which is supposed to keep the same hardware for quite some time. Back in the PS3 days, if there had been no competition, I'd say the plan to "kickstart" a new software infrastructure for CELL might have been possible. Once that happened, people would have had a better idea of how to improve things (e.g. maybe making the SPEs able to directly access main memory, or other improvements). We might have had a (much improved) CELL-like architecture instead of the GPGPU we have today.

However, it didn't happen, one reason being that there was competition, another being that the benefits of CELL were not really huge enough to justify the investment in new software. Another thing is that, after CELL, there really hasn't been much development in a similar direction for general-purpose computing. That's another telling sign.

Also, if we look at the history of GPGPU, it had a quite shaky start as well. In the first few years, many people were still skeptical about whether GPGPU was or would be useful. Many saw GPGPU as something only useful for some specialized tasks, with no real impact on more consumer-oriented workloads. Some may argue that it's still true, as most GPGPU applications are still in the scientific computing and AI realms. So, in an alternative history, it's quite possible that a CELL-like architecture would have been limited to similar scopes.

Of course, what's interesting to us is gaming workloads, and that's somewhere a CELL-like architecture might be beneficial in some ways. However, even now most game engines are not using GPGPU much for non-graphics tasks, so it's difficult to imagine how things might have looked if a CELL-like architecture had survived in some form.
 
Edit: The saddest part, "Mike Acton" is a B3D member who actually teamed up with Beyond3D to host the Cell development centre for discussion and best practice. You can check out his content at https://cellperformance.beyond3d.com/articles/index.html

His posts: https://forum.beyond3d.com/members/mike-acton.6747/#recent-content

Mike Acton also concluded that 30 fps was more commercially attractive than 60 fps on the basis of a very flawed game review survey. He also got really pwned at a CppCon.

Very OT, but I will bring this up anytime anyone mentions Mike Acton.
 