Was Cell any good? *spawn

Shifty Geezer said:
The post processing AA derived from MLAA was pioneered by Intel AFAIK. And if the GPUs were more potent, we wouldn't need post-FX AA. ;) But it is a useful addition to the developer toolset, I'll grant. Not sure that Cell really helped with that.

Oh, OK... I see. It was just a good match for the PS3 then...
 
The post processing AA derived from MLAA was pioneered by Intel AFAIK. And if the GPUs were more potent, we wouldn't need post-FX AA. ;) But it is a useful addition to the developer toolset, I'll grant. Not sure that Cell really helped with that.

It was based on a paper from 2009. Intel developed an efficient implementation of the technique for CPU. There's a poster here (embarrassed that I can't think of his name) who actually did the work to bring it to Cell. This implementation was definitely an important one, as after God of War 3 (which came relatively soon after that paper was produced) AMD made a GPU implementation, and then a whole host of research into these types of post-AA took off.

By the way, here's a great, fairly recent paper from Intel on the subject that I just came across:

http://software.intel.com/en-us/art...-moving-antialiasing-from-the-gpu-to-the-cpu/
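
For anyone who hasn't looked at how these filters actually work: the common first stage is just finding luma discontinuities in the finished frame, which later passes classify into edge shapes and blend along. A rough, purely illustrative sketch of that detection pass (this isn't Intel's, Sony's, or AMD's code; the helper names and the threshold are made up):

Code:
#include <cstdint>
#include <cmath>

// Rec. 601 luma from an 8-bit RGBA pixel (illustrative helper).
static float luma(const uint8_t* px) {
    return 0.299f * px[0] + 0.587f * px[1] + 0.114f * px[2];
}

// First pass of an MLAA-style filter: mark pixels whose luma differs from the
// neighbour to the right or below by more than a threshold. Later passes (not
// shown) classify the marked edges into shapes and blend along them.
void detect_edges(const uint8_t* rgba, uint8_t* edge_mask,
                  int width, int height, float threshold = 16.0f) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int i = y * width + x;
            const float l = luma(&rgba[4 * i]);
            uint8_t mask = 0;
            if (x + 1 < width &&
                std::fabs(l - luma(&rgba[4 * (i + 1)])) > threshold)
                mask |= 1;   // discontinuity with the pixel to the right
            if (y + 1 < height &&
                std::fabs(l - luma(&rgba[4 * (i + width)])) > threshold)
                mask |= 2;   // discontinuity with the pixel below
            edge_mask[i] = mask;
        }
    }
}

The heavy lifting (and what made the SPU and GPU implementations interesting) is in the later pattern-classification and blending passes, not shown here.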
 
Either you didn't read/understand my response properly or you don't know what you're talking about. I'll favorably assume the former...

I read it, I understood it, and I pointed out that even on a system with a much simpler tool flow and much better developer tools, one that has been around for a long time, people are still coming up with optimizations that improve the performance of a particular piece of code.

Sure but did it do everything developers wanted? No... Not by a long shot.. Ask the Splinter Cell: Conviction engine boys about that...

Can it do everything Cell can, and more easily? Yes. The amount of time that has been spent making Cell useful and optimizing for it is astronomical compared to Xenon. Orders of magnitude more, because they had to. Xenon just works out of the gate, and over the years it has worked out much better.

Take it from someone who has worked directly on PS3, attended many technical talks from the Sony/Naughty Dog guys on the work they did in these games and what it took to achieve what they were able to accomplish. You're simply quite wrong.

If their target was Xenon, they would have got it to work too.

Finally aaronspink, I think you need to learn/understand/comprehend that writing code so close to the hardware in ways that the architectural differences between Xenon & CELL really matter is nothing more than a sunk cost...

Of course it is a sunk cost; it was a dead end. It was pointed out to be a dead end before day one. The whole conception of CELL was a dead end before day one, which is why, now that the grand experiment has failed, it is completely unsupported for the future. Anything learned from CELL is useless moving forward.

And before someone chimes in with task based programming, that was alive and well before CELL and gaining momentum before CELL. In fact, CELL only adds complications to task based programming, regardless of the fact that task based programming is about the only way to make any use of CELL.
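
To make "task based programming" concrete for anyone following along: it just means splitting a frame's work into small independent jobs that workers pull from a shared queue, whether those workers are hardware threads on Xenon or SPEs on Cell. A minimal, platform-neutral sketch using standard C++ threads (purely illustrative; this is not any particular engine's scheduler):

Code:
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal job system: worker threads pull std::function jobs off a shared queue.
class JobQueue {
public:
    explicit JobQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { worker_loop(); });
    }
    ~JobQueue() {
        { std::lock_guard<std::mutex> lock(mutex_); done_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;   // queue drained: worker exits
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();   // run the job outside the lock
        }
    }
    std::queue<std::function<void()>> jobs_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<std::thread> threads_;
    bool done_ = false;
};

On an SMP design the workers are just hardware threads; on Cell the same structure applies, except each job and the data it touches also have to be DMA'd into an SPE's local store, which is where the extra complication comes from.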
 
A sunk cost means that once the knowledge of the differences between the architectures was obtained, it didn't need to be obtained again.

It means that devs don't have to constantly relearn how to get their code running on PS3, or how to architect code for it.

It doesn't mean that they "wasted" their time learning it (because it's gone), as you keep trying to portray.

As archangelmorph seems to suggest, some devs probably learned a lot from Cell with this move towards a job system.

On another note, what's weird to me is that most people refer to Cell as the processor in the PS3, when it was supposed to be both a processor and an architecture. Would Cell have been "so difficult" to program for if it had 3 PPEs and 24 SPEs, compared to Xenon's 3 cores? It just seems to me that most people judge its "difficulty", according to devs, relative to the 360.

I've read reports that the original Cell was envisioned with 72 processors (possibly 8 PPEs and 64 SPEs).
 
So according to aaronspink, CELL was a complete waste of time and money by Sony, Toshiba, and IBM and should never have seen the light of day. It was a complete and utter failure in the marketplace and gained no traction. And it was a complete waste of money for anybody out there developing for it, considering there will be no future iterations of the architecture. It was a waste of silicon space in the PS3, and a quad-core PPU CPU would have been superior in every single way. Gotcha.

According to others it wasn't a waste because it allowed developers to keep parity with the 360.

We acknowledge it has failed in the marketplace, similar to RDRAM. We also acknowledge that it helped the PS3 keep parity with the 360 by taking on graphical tasks that the RSX wasn't as capable of doing. Of course, in many ways the CPU, which means the 6 SPEs included, was superior to the tri-core PPE CPU inside the 360, but also clearly inferior in many tasks. If the PPE can do it better than an SPE can, then 3 PPEs can do it even better.
 
So according to aaronspink, CELL was a complete waste of time and money by Sony, Toshiba, and IBM and should never have seen the light of day. It was a complete and utter failure in the marketplace and gained no traction. And it was a complete waste of money for anybody out there developing for it, considering there will be no future iterations of the architecture. It was a waste of silicon space in the PS3, and a quad-core PPU CPU would have been superior in every single way. Gotcha.

According to others it wasn't a waste because it allowed developers to keep parity with the 360.
Much of the first paragraph and the last sentence are not mutually exclusive.
The latter could be considered a silver lining to the former's dark cloud.

CELL was intended for a lot more than keeping up with the 360, hence the involvement of so many parties in its design and the significant additional work put into creating it.
While it's not a fault of the processor in the PS3, Sony and Toshiba allocated a lot of resources and were planning on gearing up for a lot of silicon using CELL or derived cores, and essentially none of that came to pass.
A whole fab ping-ponged between the two and eventually wound up making imaging sensors instead.

Similarly anyone working to make use of CELL in the future found a lot of their investment wasted.
 
Oh, OK... I see. It was just a good match for the PS3 then...
In terms of the algorithm, not really... PS3 just needed it the most because there's no way you're going to run MSAA or similar on RSX in any reasonably complex game.

But that's entirely the argument here, and one that I agree with: the fact that you can do a lot of work to lessen the performance issues with PS3 by using the SPUs does not make Cell a good design. In fact, in practice it has been mostly relegated to stuff that any competent GPU should be doing, because honestly it hasn't really turned out to be good at more general stuff. So sure, it can do skinning and culling and shading, etc., but a GPU can do those even more efficiently, so the PS3 designers would have been better off spending the cost/area/power on a bigger GPU.

I maintain that not having a proper cache attached to the SPUs is a crippling problem, and that is at the heart of what limits their usability for more complex workloads. Fundamentally if you are mostly limited to predictable access patterns and simple, regular data structures then a GPU is just going to bury you in performance per mm^2/watt.

Cell was meant to be sort of half way between CPU and GPU, but in practice it ends up more like just a poor GPU (speaking of the SPUs in particular). I'm still waiting for some truly awesome algorithm that just couldn't be done efficiently on either CPU or GPU but so far it's just basic parallel programming "moved to the SPU!". At this stage I think it's fair to conclude that the memory architecture is just too crippling to do anything really interesting with.
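
To put the "predictable access patterns" point in concrete terms: on a cached CPU, walking a pointer-based structure just generates cache misses that the hardware handles, but on an SPU every hop needs an explicit, blocking DMA into local store before the next pointer is even known. A rough sketch using the Cell SDK's MFC intrinsics from spu_mfcio.h (the node layout is made up, and alignment/size rules are glossed over):

Code:
#include <spu_mfcio.h>
#include <stdint.h>

// Illustrative node living in main memory, linked by effective addresses.
struct Node {
    uint64_t next_ea;     // effective address of the next node (0 = end of list)
    float    payload[14]; // 64 bytes total, a legal DMA size
};

// Walk a linked list from an SPU: every hop is a blocking DMA, so the SPU
// stalls for the full main-memory latency at each node. A cached CPU core
// hides much of this automatically.
float sum_list(uint64_t head_ea) {
    static Node node __attribute__((aligned(128)));   // local-store buffer
    const unsigned tag = 0;
    float sum = 0.0f;
    for (uint64_t ea = head_ea; ea != 0; ea = node.next_ea) {
        mfc_get(&node, ea, sizeof(Node), tag, 0, 0);  // pull the node into LS
        mfc_write_tag_mask(1 << tag);
        mfc_read_tag_status_all();                    // can't know 'next' any sooner
        for (int i = 0; i < 14; ++i)
            sum += node.payload[i];
    }
    return sum;
}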
 
In terms of the algorithm, not really... PS3 just needed it the most because there's no way you're going to run MSAA or similar on RSX in any reasonably complex game.

But that's entirely the argument here, and one that I agree with: the fact that you can do a lot of work to lessen the performance issues with PS3 by using the SPUs does not make Cell a good design. In fact, in practice it has been mostly relegated to stuff that any competent GPU should be doing, because honestly it hasn't really turned out to be good at more general stuff. So sure, it can do skinning and culling and shading, etc., but a GPU can do those even more efficiently, so the PS3 designers would have been better off spending the cost/area/power on a bigger GPU.

I maintain that not having a proper cache attached to the SPUs is a crippling problem, and that is at the heart of what limits their usability for more complex workloads. Fundamentally if you are mostly limited to predictable access patterns and simple, regular data structures then a GPU is just going to bury you in performance per mm^2/watt.

Cell was meant to be sort of half way between CPU and GPU, but in practice it ends up more like just a poor GPU (speaking of the SPUs in particular). I'm still waiting for some truly awesome algorithm that just couldn't be done efficiently on either CPU or GPU but so far it's just basic parallel programming "moved to the SPU!". At this stage I think it's fair to conclude that the memory architecture is just too crippling to do anything really interesting with.

Yep, that is what I meant: a good fit for PS3 to relieve RSX of the burden of doing MSAA (and other stuff).

But on the other hand, it also showed (the examples you mentioned about SPUs helping out RSX) that it was in some ways a general-purpose architecture, capable of doing not only the typical CPU tasks, but also GPU tasks, right?

Isn't this somewhat of a plus? In theory you could decide to help out with graphics-related stuff, or, depending on the actual gaming situation, frame by frame, help out with physics, AI, or other fancy stuff (OK, due to the weak RSX we only see the graphics-related stuff on PS3, but...), getting you better 'usability' per mm^2/watt. Would such an adaptive approach even be possible, or desirable?
 
In terms of the algorithm, not really... PS3 just needed it the most because there's no way you're going to run MSAA or similar on RSX in any reasonably complex game.

But that's entirely the argument here, and one that I agree with: the fact that you can do a lot of work to lessen the performance issues with PS3 by using the SPUs does not make Cell a good design. In fact, in practice it has been mostly relegated to stuff that any competent GPU should be doing, because honestly it hasn't really turned out to be good at more general stuff. So sure, it can do skinning and culling and shading, etc., but a GPU can do those even more efficiently, so the PS3 designers would have been better off spending the cost/area/power on a bigger GPU.

I maintain that not having a proper cache attached to the SPUs is a crippling problem, and that is at the heart of what limits their usability for more complex workloads. Fundamentally if you are mostly limited to predictable access patterns and simple, regular data structures then a GPU is just going to bury you in performance per mm^2/watt.

Cell was meant to be sort of half way between CPU and GPU, but in practice it ends up more like just a poor GPU (speaking of the SPUs in particular). I'm still waiting for some truly awesome algorithm that just couldn't be done efficiently on either CPU or GPU but so far it's just basic parallel programming "moved to the SPU!". At this stage I think it's fair to conclude that the memory architecture is just too crippling to do anything really interesting with.

So you admit to being one of the sucky programmers incapable of handling the greatness. Do you also use C++? :p
 
I just feel like it's easy for people to badmouth Cell and label it a failure, since we'll never see an improved successor after Sony forced Kutaragi out, making the Cell architecture a one-and-done project.

I don't know much about die area and transistor count, but who knows what kind of upgraded design could have been made at 28nm or 22nm, maybe an 8 PPE / 64 SPE design or a 6 PPE / 48 SPE one.

Who knows what something like that would have been capable of, both by itself and paired with a modern AMD/NVIDIA GPU.
KK leaving his position at Sony has nothing to do with IBM not working on a Cell successor, or with the failure of the Toshiba add-on card.

My guess is that the people who would have ordered Cell's successor would have been the same people who ordered the first Cell, unless the initial undertaking of Cell was IBM's idea.
 
KK leaving his position at Sony has nothing to do with IBM not working on a Cell successor, or with the failure of the Toshiba add-on card.
 
I just feel like it's easy for people to badmouth Cell and label it a failure, since we'll never see an improved successor after Sony forced Kutaragi out, making the Cell architecture a one-and-done project.

It is important to understand that one of the main reasons Kutaragi was forced out was CELL. AKA, CELL was so bad and so championed by him that he got the axe.

I don't know much about die area and transistor count, but who knows what kind of upgraded design could have been made at 28nm or 22nm, maybe an 8 PPE / 64 SPE design or a 6 PPE / 48 SPE one.

And neither would be better than a couple AVX cores and spending the area on GPU. SPEs will always lose out to GPUs. And SPEs don't have any advantages over GPUs.

Who knows what something like that would have been capable of, both by itself and paired with a modern AMD/NVIDIA GPU.

Paired with a modern GPU, the SPEs would be useless and wasted silicon.
 
Andrew Lauritzen said:
so the PS3 designers would have been better off spending the cost/area/power on a bigger GPU.
Correct me if I'm wrong, but wasn't the plan to have a bigger GPU? And didn't the bigger GPU not happen because chip yields weren't high enough? I seem to remember 32 pixel ALUs getting cut down to 24.
aaronspink said:
Paired with a modern GPU, the SPEs would be useless and wasted silicon.
I find it hard to believe that with a better GPU, no developer could come up with any additional computing to do. There's a whole lot on the physics/animation side that either isn't happening this gen or only barely happens. Seems like whenever anyone declares this or that chunk of computational ordnance to be useless, someone finds a way to use it.

From a business standpoint, I agree, Cell was a failure. It never panned out to be the general purpose supercomputer-on-a-chip they were shooting for. The market seems pretty well dominated by x86.

On the software side, there hasn't been a significant difference in PS3 and 360 games for a long time, and what difference there is can be attributed largely to the low fill rate of the RSX. Some people say GOW and UC do some neat things that can't happen on 360, but qualitatively, I'm not seeing a huge difference w/ Gears of War or Forza 3 (haven't been many showpiece exclusives on the 360 lately). So maybe the real problem is that nVidia isn't a good partner in the console space, since this is the second time they've barely modified a PC GPU.
 
It is important to understand that one of the main reasons Kutaragi was forced out was CELL. AKA, CELL was so bad and so championed by him that he got the axe.
No, he was forced out because the PS3 underperformed and because of corporate politics. Had he been there longer, the PS3 would have achieved the same level of success. Kaz Hirai was not responsible for Uncharted, MGS4, LBP, Resistance, KZ2, inFamous, and ultimately Uncharted 2; Kaz Hirai wasn't responsible for any of that, was he? Was Kaz Hirai even involved in the formation of Worldwide Studios?
And neither would be better than a couple AVX cores and spending the area on GPU. SPEs will always lose out to GPUs. And SPEs don't have any advantages over GPUs.
And you know that how? From the tiny tidbits you've read from porters over the years, or have you actually delved into a Cell network?

Paired with a modern GPU, the SPEs would be useless and wasted silicon.

Wasted? So exclusive devs wouldn't make use of them? To find new ways to process increasing amounts of data, or even graphics? They'd just be sitting there?

I get the feeling that people who don't work on it, or never have, presume that everything possible on the Cell architecture has already been accomplished because it happens to exist as a 1 PPE / 6 SPE chip in the PS3.
 
Cell was meant to be sort of half way between CPU and GPU, but in practice it ends up more like just a poor GPU (speaking of the SPUs in particular). I'm still waiting for some truly awesome algorithm that just couldn't be done efficiently on either CPU or GPU but so far it's just basic parallel programming "moved to the SPU!". At this stage I think it's fair to conclude that the memory architecture is just too crippling to do anything really interesting with.
I'm operating on very fuzzy memory of a second-hand discussion, but I did see it discussed that for scientific computing there are certain algorithms that could have better scaling on a non-coherent architecture like Cell than they would on a coherent SMP system. However, the caveat was that this was better scaling over truly massive problem sizes, and on problems you might not be interested in unless you worked at Los Alamos.
That's not to say CPUs fell apart, they just scaled a little less and it started to matter more at the larger node counts. That was quite some time ago, however, and things are looking different these days.

One thing I wonder about Cell is that IBM purportedly wanted a homogeneous and coherent multicore, but Toshiba, and to a lesser extent Sony, wanted the non-coherent DSP-like SPEs.
How much of Cell's development was Sony and Toshiba throwing serious cash to paper over their computational shortcomings?
There's no doubting IBM's comfort in designing complex and robust cores, along with very strong interconnect and memory protocols. Xenon wasn't necessarily the best thing IBM has done, but that wasn't an example of them bringing their A game.

However, we see that the SPE seems designed to avoid all those knotty problems, like a complex core and memory pipelines that are difficult to design and validate. When the future was found to be in multicore scaling, they--and this is the media company part of STI, not the compute side--sided with a design that saw the difficulties and complexities and simply didn't show up for the fight.
 
And SPEs don't have any advantages over GPUs.
Yeah, the GPU that just learned how to fetch data from a unified memory address space.
GPU compute is limited to specific workloads.
How about a serial LZMA decompression running on GPU?
How about doing that directly from disk?
GPUs suffer from huge latency, which decreases the effectiveness of small tasks even further.

SPE is flexible. It can do the job asynchronously (or synchronously) within microseconds.
The SPE is tougher to code for than a GPU because of the lack of a SIMD-friendly compiler like ispc, but it is easier to optimize (you're either limited by memory or by computation).

Paired with a modern GPU, the SPEs would be useless and wasted silicon.
Maybe, if you have a tightly coupled APU with shared TLBs, etc.
That means a 3rd- or 4th-gen APU at least (not released yet).
 
I find it hard to believe that with a better GPU, no developer could come up with any additional computing to do. There's a whole lot on the physics/animation side that either isn't happening this gen or only barely happens. Seems like whenever anyone declares this or that chunk of computational ordnance to be useless, someone finds a way to use it.

You are either better off doing it on a modern CPU core or better off doing it on GPU cores. The SPE is in the no man's land in between. The problem is that the SPEs are basically limited to the same or less programmable functionality than a modern GPU core while being much larger and more power hungry. And they significantly lack the flexibility of a modern SIMD CPU core while not being much smaller.
 
No, he was forced out because the PS3 underperformed and because of corporate politics. Had he been there longer, the PS3 would have achieved the same level of success. Kaz Hirai was not responsible for Uncharted, MGS4, LBP, Resistance, KZ2, inFamous, and ultimately Uncharted 2; Kaz Hirai wasn't responsible for any of that, was he? Was Kaz Hirai even involved in the formation of Worldwide Studios?

He spent an insane amount of money on an architecture with no future. The PS3 underperformed because he spent an insane amount of money on an architecture with no future.

And you know that how? From the tiny tidbits you've read from porters over the years, or have you actually delved into a Cell network?

By actually studying and designing computer architectures for a living. The SPE is the equivalent of cutting off your arms and attaching them where your missing legs should be.

Wasted? So exclusive devs wouldn't make use of them? To find new ways to process increasing amounts of data, or even graphics? They'd just be sitting there?

There really isn't anything that SPEs can do that modern GPUs cannot do faster with less area and less power.

I get the feeling that people who don't work on it, or never have, presume that everything possible on the Cell architecture has already been accomplished because it happens to exist as a 1 PPE / 6 SPE chip in the PS3.

Everything possible has been done because it is a dead end architecture.
 
I'm operating on very fuzzy memory of a second-hand discussion, but I did see it discussed that for scientific computing there are certain algorithms that could have better scaling on a non-coherent architecture like Cell than they would on a coherent SMP system. However, the caveat was that this was better scaling over truly massive problem sizes, and on problems you might not be interested in unless you worked at Los Alamos.
That's not to say CPUs fell apart, they just scaled a little less and it started to matter more at the larger node counts. That was quite some time ago, however, and things are looking different these days.

There is an advantage to having scratch pads and read-current non-coherent accesses, but this isn't what the SPE does. It has a scratch pad that must be filled and emptied via coherent DMA. The reason to have a scratch pad (generally for intermediate values/results) and read-current non-coherent accesses is to reduce the coherence overhead on the system for things that don't need to be *maintained* in a coherent state. You still want the capability to do coherent transactions/message passing in order to coordinate and interact. But this isn't how the SPEs were architected.
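
For what it's worth, the canonical way that scratch pad gets used well is double buffering: DMA the next block of a regular array into one local-store buffer while computing on the other, which is exactly why the SPE rewards streaming, predictable access. A rough sketch with the SDK's MFC intrinsics (the chunk size and the process() stand-in are illustrative; it assumes the total is a multiple of CHUNK):

Code:
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK 4096   // bytes per DMA block (multiple of 16; illustrative)

static char buf[2][CHUNK] __attribute__((aligned(128)));
static unsigned checksum;   // stand-in result so process() does visible work

static void process(char* data, unsigned bytes) {
    for (unsigned i = 0; i < bytes; ++i)
        checksum += (unsigned char)data[i];
}

// Stream 'total' bytes from main memory through local store, overlapping the
// DMA for chunk i+1 with the computation on chunk i (tags 0 and 1 track the
// two buffers). Assumes total is a multiple of CHUNK.
void stream(uint64_t src_ea, unsigned total) {
    if (total == 0) return;
    unsigned offset = 0, cur = 0;
    mfc_get(buf[cur], src_ea, CHUNK, cur, 0, 0);           // prefetch the first chunk
    while (offset < total) {
        unsigned next = cur ^ 1;
        unsigned next_off = offset + CHUNK;
        if (next_off < total)                              // kick off the next transfer early
            mfc_get(buf[next], src_ea + next_off, CHUNK, next, 0, 0);
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();                         // wait only for the current chunk
        process(buf[cur], CHUNK);                          // compute while the next DMA runs
        cur = next;
        offset = next_off;
    }
}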
 