Was Cell any good? *spawn

http://www.youtube.com/watch?v=RYNTIyYIJBQ&feature=plcp

Here's what about 20 Cells were doing in 2008 on a 2005 architecture. Let's be crazy and speak theoretically: let's assume Cell's successor was 4x32, Sony stuck five of them in a PS4, and the PS4 cost a thousand dollars to make.

I don't know much about GPUs, but I'm interested in knowing whether your current 2012 thousand-dollar GPU can do that?

Or, even crazier, what about your current 2012 thousand-dollar CPU?
 
It is important to understand that one of the main reasons Kutaragi was forced out was CELL. In other words, CELL was so bad and so championed by him that he got the axe.

It wasn't just the CELL design. There were at least two more reasons:
1. Pushing a blue-laser optical format into a mass-market product without sufficient supply, adding a lot of cost and delays.
2. Insisting on producing CELL in-house. Fitting a fab with state-of-the-art 90nm equipment only to see it operate way below capacity, then having to sell it off to Toshiba, taking a huge loss on the whole ordeal (and later buying it back to produce CMOS image sensors).

Cheers
 
http://www.youtube.com/watch?v=RYNTIyYIJBQ&feature=plcp

Here's what about 20 Cells were doing in 2008 on a 2005 architecture. Let's be crazy and speak theoretically: let's assume Cell's successor was 4x32, Sony stuck five of them in a PS4, and the PS4 cost a thousand dollars to make.

Are we also supposed to be crazy and theoretically assume it doesn't melt the case?

I don't know much about GPUs, but I'm interested in knowing whether your current 2012 thousand-dollar GPU can do that?

No, it can do much better. Your configuration has ~4 TFLOPS of single-precision computing power. A current (2012) $1000 GPU configuration has ~8 TFLOPS of single-precision computing power. And the demo you linked was at best meh.
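
For reference, a rough back-of-the-envelope version of that figure, assuming the hypothetical 4x32 successor keeps the original 3.2 GHz clock and ~25.6 GFLOPS per SPE (4-wide single-precision FMA per cycle):

32 SPEs x 25.6 GFLOPS = ~820 GFLOPS per chip
5 chips x ~820 GFLOPS = ~4.1 TFLOPS single precision

versus roughly 8 TFLOPS on paper for a 2012 $1000 dual-GPU setup (e.g. two Tahiti-class cards at ~3.8 TFLOPS each).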

And more to the point, ray tracing is a dead end. What people are eventually hoping GPUs can get to is REYES, which is what almost the entire film industry uses.
 
How much CELL or PS3 programming have you done yourself, aaronspink, by the way?

I've read all the documentation as well as discussed the architecture with many who have programmed it. One doesn't need to program an architecture in order to understand its issues and shortcomings. This is true now and has been true for the past 40 years. CELL itself violates many well-known lessons from throughout the history of computer architecture. In addition, there is a host of documentation on the issues with the architecture in practice. There is a reason that all the partners involved abandoned the architecture after having poured billions into it and related products. It is just fundamentally a poor architecture with no redeeming advantages, especially given the competition available.
 
And more to the point, ray tracing is a dead end. What people are eventually hoping GPUs can get to is REYES, which is what almost the entire film industry uses.
Laa-Yosh recently revealed that things are moving towards raytracing in the global illumination (GI) business, because it provides a faster route to creating the scene and processing power now enables fast rendering. Complex scenes using other rendering methods waste too much time on tweaking particular parameters to make things look right. His studio is raytracing now.

Which is not of much value to this discussion. There's a whole epic discussion on raytracing already. The example raised here as to the value of Cell is pretty pointless without a comparable exercise on another architecture.

A lot of your complaints seem to be based on current GPU design though. Back in 2005 (and earlier, when Cell was being envisaged), branching on GPUs was a no-no. Cell provides high throughput with greater programmability. Now you seem to be saying that beefy VMX units are just as good as an SPU without the drawbacks. I haven't the knowledge or experience to have any opinion on that, but it'd be nice to hear some solid arguments with either side explaining how this can be proven. ;)
 
It is important to understand that one of the main reasons Kutaragi was forced out was CELL. In other words, CELL was so bad and so championed by him that he got the axe.

Nobody except top Sony honchos and Crazy Ken himself knows why Kutaragi left or got fired. We can speculate, that is all.
 
The homogeneous multicore would have been closer to what IBM wanted when it was working with Sony and Toshiba, with Toshiba apparently more in love with the SPU concept. This seems borne out by the SPURS product that barely made a blip anywhere.
Actually I wonder (it was a long time ago now) if IBM and Toshiba agreed more on the design than it looks.
I'm not that confident in my memory, but it tells me that both IBM and Toshiba actually wanted an SMP set-up, most likely throughput-oriented, but one wanted it PPC-based and the other MIPS-based.
I believe that it was Sony that was hell-bent on introducing vector processing units into the design.

To me (and a posteriori... it's always easy to comment a posteriori...), and looking at what STI wanted to achieve, the chip should have ended up closer to the Power EN than the Cell.
It could have been multiple low-power PPC/MIPS cores and a few accelerators (not network ones as in the Power EN).

If I look at what happened even inside IBM, I can't fail to notice that it's the chief architect in charge of Xenon who got the position to develop the Power A2 cores, which is an important project for them. So I feel like the point of view of Aaron, Timothy, and those I forget is shared inside IBM.
My belief is that the Power A2 is really close to what IBM and Toshiba would have wanted the Cell to be. Toshiba may have wanted a more media-oriented ISA for the SIMD, but as far as the overall design is concerned, I would not be that surprised if that's true.
 
Actually I wonder (it was a long time ago now) if IBM and Toshiba agreed more on the design than it looks.
I'm not that confident in my memory, but it tells me that both IBM and Toshiba actually wanted an SMP set-up, most likely throughput-oriented, but one wanted it PPC-based and the other MIPS-based.
I believe that it was Sony that was hell-bent on introducing vector processing units into the design.
The stuff I've read was very much that IBM wanted multicore PPC, like Xenon, and Toshiba wanted SPURS. I've never heard of Sony wanting any particular architecture, and it appears as though they were just the mediators bringing it all together. I've no quotes though. ;)
 
So you admit to being one of the sucky programmers incapable of handling the greatness. Do you also use C++? :p
Lol Alex :)

I find it hard to believe that with a better GPU, no developer could come up with any additional computing to do. There's a whole lot on the physics/animation side that either isn't happening this gen or only barely happens. Seems like whenever anyone declares this or that chunk of computational ordnance to be useless, someone finds a way to use it.
Right, but we're not talking about having parts A+B vs. just A. We're talking about using the area and power resources that went into the SPEs to build a better CPU and/or a better GPU. And going forward this gets a lot more murky, because when power becomes the limit, even having extra hardware is not a pure win: if you power it up to run a problem that could have been executed more efficiently elsewhere, you'll slow down the whole chip and end up behind...

I get the feeling that people who don't work on it, or never have, presume that everything possible on the Cell architecture has already been accomplished because it happens to exist as a 1 PPE + 6 SPE chip in the PS3.
I've written code on both PS3 and Cell blades... as well as lots of GPUs, AVX/SSE, Larrabee, etc. So I think I'm qualified to speak about the comparison?

GPU compute is limited to specific workloads.
But that's the biggest problem with Cell too. It only does well at workloads that GPUs are even better at. Sure, it can run stuff like LZMA, etc. (and so can GPUs), but more poorly than a standard CPU core would have.

Maybe, if you have a tightly paired APU with shared TLBs, etc.
Having a unified address space is barely useful since it's not even cached... not even for read-only data (like GPUs). You don't want to be pointer-chasing on SPUs (or GPUs) anyway, so the advantages vs. the GPU model are pretty moot.

I really want to give Cell the benefit of the doubt but as I mentioned retrospectively the memory hierarchy choices they made are just too crippling. Thus I agree with Aaron... either a CPU or a GPU is more efficient at the vast majority of interesting workloads.

And of course you can make the argument that it was interesting "when it came out", but I doubt Sony would have made a bet on something they knew had no future. Furthermore, it came out around the same time that G80 did, which is really when GPUs began to put the nails in the coffin.

The real-time ray tracing link is kind of funny, because ray tracing is one of the things I was involved in doing on Cell and GPUs in the past. Cell's performance was briefly interesting compared to 7xxx-series GPUs and their crap branching, but once G80 came out (again, around the same time as the PS3, IIRC), it was clearly outmatched. And for reference, G80 is awful at ray tracing compared to modern GPUs (massive strides were made once ray tracing on GPUs switched to software warp scheduling), so I think it's pretty fair to conclude that Cell is going to look really bad in that comparison today.
 
And of course you can make the argument that it was interesting "when it came out", but I doubt Sony would have made a bet on something they knew had no future. Furthermore, it came out around the same time that G80 did, which is really when GPUs began to put the nails in the coffin.
But its design began way before, long before GPGPU of the current complexity could have been predicted. If GPUs hadn't branched out as they did, Cell would have had a very prominent place.

I also feel there's plenty Cell can do which is being overlooked, although I could be very wrong. Despite protestations that SPUs aren't DSPs, they do excel at those workloads. I still think Cell would be the most effective audio synth processor, which is something GPGPU will never be a good fit for. Modern CPUs may be just as good, but SPUs are tiny. And whatever audio workloads Cell could excel at, it could also turn its hand to similar work in the visual space, such as various procedural content algorithms. There is talk of a DSP being wanted in the next-gen consoles to handle audio workloads. The addition of SPUs as 'extra versatile DSPs' sounds good on paper.
 
But its design began way before, long before GPGPU of the current complexity could have been predicted. If GPUs hadn't branched out as they did, Cell would have had a very prominent place.
Well, other people predicted it just fine :) I never said Sony knowingly made a poor choice; obviously they thought it would be interesting long term... but they were wrong. And really, you think they were totally oblivious to developments like G80 even though they were still developing their architecture in parallel with it? If they were unable to predict a year out, they've got more fundamental problems than just a bad call on Cell.

I still think Cell would be the most effective audio synth processor, which is something GPGPU will never be a good fit for. Modern CPUs may be just as good, but SPUs are tiny.
What part about SPUs makes them massively more suitable for DSP than GPUs? And as you mentioned, modern CPUs are pretty great at audio anyway, even the little cores (Atom, etc.), so I'm not buying the argument.
 
But its design began way before, long before GPGPU of the current complexity could have been predicted. If GPUs hadn't branched out as they did, Cell would have had a very prominent place.

If anything at all happened to cause this generation to last another 3-4 years, Cell would pull ahead. As it stands, it hasn't pulled significantly ahead in the ~7 years we're at now.

FWIW most games use SPUs for audio currently.
 
What part about SPUs makes them massively more suitable for DSP than GPUs? And as you mentioned, modern CPUs are pretty great at audio anyway, even the little cores (Atom, etc.), so I'm not buying the argument.
What do you mean by audio though? You can't run multiple softsynths on an Atom, AFAIK. Audio is a linear stream, so it benefits from potent single cores rather than lots of parallel cores for conventional synthesis techniques. A quick Google to get up to speed suggests GPGPU isn't being used for audio synthesis: there are some papers on research into spatial modelling and membrane physics, but nothing actually creating synth audio on a GPU at the moment. Cell would sit nicely between the two options, with a tiny core that can burn through any linear synth model and could also render physical simulations pretty quickly. Its memory issues aren't a problem for audio work, meaning it can run at full tilt with 7 or 8 3.2 GHz FPUs. And that was when Cell released. Back then I was eyeing it as the perfect platform for building a synth workstation, and it still looks that way to me. x86 may well be up to speed now with multiple cores and easier to program, but that's overkill for the task.
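
To make the "linear stream" point concrete, here's a minimal sketch (plain C++, nothing Cell- or middleware-specific, names made up for illustration) of the kind of recursive per-sample kernel a softsynth voice is built from. Each output sample depends on the previous one, so a single voice can't be spread across thousands of GPU threads; the natural parallelism is one voice per core/SPU, which is why a handful of fast cores fits this better than a wide GPU.

Code:
// One-pole low-pass filter over a buffer of samples. 'z' is the filter
// state carried between buffers; because sample i depends on sample i-1,
// the loop is inherently serial within a voice.
float lowpass(float* out, const float* in, int n, float a, float z)
{
    for (int i = 0; i < n; ++i) {
        z += a * (in[i] - z);   // serial dependency on the previous sample
        out[i] = z;
    }
    return z;                   // hand the state back for the next buffer
}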
 
Doesn't it have a latency advantage as well for that kind of task?
I love the kind of stuff Uncharted did with audio.
 
My opinion about Cell isn't as negative as the general consensus here.

Cell forced developers to think about their data layout and memory access patterns. Efficient usage of Cell required batch processing of data (SoA vector processing of large, access-pattern-optimized structures). This resulted in a lot of research around data-driven architectures and data-oriented design (data items flowing through a pipeline of processing steps). I don't see this as a bad thing at all; I personally see it as a very good design decision. When done right, this kind of design improves performance a lot, and at the same time improves code quality/maintainability and reduces multithreading problems (fewer synchronization errors, better threading efficiency, and automatic scaling).
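
As a rough illustration of the SoA / data-oriented idea (a generic C++ sketch with made-up names, not how any particular engine lays out its data):

Code:
#include <cstddef>
#include <vector>

// Array-of-structures: convenient, but an update that only touches
// positions still drags every field through the cache (or SPU DMA).
struct ParticleAoS { float px, py, pz, vx, vy, vz, mass, age; };

// Structure-of-arrays: each field is a contiguous stream, friendly to
// SIMD and easy to DMA into local store in fixed-size batches.
struct ParticlesSoA {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
};

// One processing step in the pipeline: a straight pass over contiguous
// data that a compiler (or hand-written SPU/VMX code) can vectorize,
// and that a job system can split into chunks.
void integrate(ParticlesSoA& p, float dt)
{
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}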

This kind of programming model is also very efficient on Xbox 360. Xbox 360 has long vector pipelines and long stalls (LHS, load-hit-store) when moving data to and from the vector pipeline. It also has 128 (x2) vector registers and no register renaming (it's a simple in-order CPU after all). In order to fully populate these long vector pipelines and utilize this huge number of registers, you need to batch process large amounts of vector data (unroll wide loops that process large batches of data). This is crucial, as around 80% (depending on how you want to calculate it) of the Xbox 360 CPU's raw number-crunching performance comes from the vector pipeline. Modern PC CPUs (such as Ivy Bridge) get a similar percentage of their peak performance from their vector pipelines (16 float ops per cycle), so this kind of processing model is very good for them as well (and even better for massive number crunchers such as Xeon Phi, PowerPC A2 and GPU compute).
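
A crude sketch of the "unroll wide loops" point (plain C++, with scalar floats standing in for VMX/AVX vectors; on Xenon each accumulator would live in one of those 128 vector registers): several independent accumulators keep a long in-order pipeline busy instead of stalling on a single dependency chain.

Code:
#include <cstddef>

// Dot product with four independent accumulators. Without register
// renaming, a single accumulator chain stalls on its previous result
// every iteration; four independent chains keep the pipeline fed.
float dot(const float* a, const float* b, std::size_t n)
{
    float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i + 0] * b[i + 0];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)              // handle the leftover tail
        acc0 += a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}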

The worst thing about Cell was that it practically forced this kind of programming model on you. It wasn't optional at all, and couldn't just be used for the code that benefited from it. Game developers were used to single-core CPUs (all previous consoles, and the PC before 2005), and suddenly you had an "external" vector processor, much like GPGPU, dedicated to (asynchronous) stream processing. If you didn't use it, the hardware was wasted. And it wasn't straightforward to use for rendering either (unlike GPUs with compute support).

Efficient GPGPU programming isn't any easier than programming Cell. However, GPU compute is a much better alternative, because it's fully optional. It's your choice how you utilize the GPU. If you want to use all GPU cycles to run pixel shaders, you can do so. If you want to spend the huge majority of your cycles on GPU compute, you can do that as well, and anything in between. However, GPUs with flexible compute capabilities weren't available at that time, and neither were general-purpose CPUs with very wide vector units. Sony wanted lots of compute performance, and Cell was pretty much the only choice.

It's true that some developers didn't use SPUs much, especially for first-generation games. However, all the major AAA developers are using them extensively now. The discussion about audio mixing (above) is also relevant, as some games use a whole Xbox 360 CPU core just for audio mixing. All relevant audio middleware can offload that task to SPUs. The GPU isn't usable for audio mixing yet because of latency issues (only one ring buffer, and no context switching/priority mechanism). Physics simulation can also take a whole Xbox 360 core (or more) in a physics-heavy game. All major physics middleware can offload that work to SPUs.

Many developers are doing graphics tasks such as animation/skinning, occlusion culling, tile-based deferred lighting, stereo reprojection and post-process antialiasing on SPUs. So SPUs also help the GPU with some tasks that are better suited to fully programmable, compute-oriented hardware. GPGPU will of course be even better for these tasks in future games, but it wasn't available 7 years ago. It's easy to criticize Cell now that we have better options (wide-vector general-purpose CPUs and compute-capable programmable GPUs), but I think it taught us a very good lesson. It forced us to adopt new programming models and techniques that will be even more useful in the future. Memory access latency (and bandwidth) keeps falling further behind compute capability, and parallel computation resources (physical cores, SMT threads, vector unit width) keep widening, so data-oriented design becomes even more critical in the future.
 