Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Titanio said:
DP is of marginal interest in a discussion of Cell as a games processor. DP on competing chips is..?

As for integer, I haven't yet seen any figures or guesses as to its performance, other than that it is of secondary importance to the SPUs. If you've seen a figure, let us know.



This doesn't really make any direct points about efficiency, I don't think? It just describes some of the architecture.

On SPUs accessing another SPU's LS - can anyone confirm the PPE's role here, if any? Can't one SPU put something on the EIB, and another pick it up?

Re. looping/branching - I wasn't aware looping was not available in SPU code. In fact I thought it was - you just don't have any branch prediction, it'll always assume that the branch is taken. There are branch hints, though, and of course, ways to avoid branching and looping - loop unrolling being one way for the latter, as he describes. Using that doesn't mean you couldn't use a loop if you wanted, though, and were confident of the behaviour of the loop. Assuming I'm not mistaken about the simple availability of loops in SPU code? I need to spend more time with that simulator ;)
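To make the unrolling idea concrete, here is a minimal sketch in plain C (ordinary portable code, not SPU intrinsics). The function names are made up for illustration, and the unrolled version assumes the element count is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one loop-closing branch per element. */
float sum_simple(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* Unrolled by four: one loop-closing branch per four elements.
   Four separate accumulators keep consecutive adds independent of
   each other. Assumes n is a multiple of 4. */
float sum_unrolled4(const float *v, size_t n)
{
    assert(n % 4 == 0);
    float t0 = 0.0f, t1 = 0.0f, t2 = 0.0f, t3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        t0 += v[i];
        t1 += v[i + 1];
        t2 += v[i + 2];
        t3 += v[i + 3];
    }
    return (t0 + t1) + (t2 + t3);
}
```

Both versions still contain a loop; unrolling only reduces how often the branch is executed. Because the unrolled version adds the floats in a different order, results can differ slightly from the simple loop for general inputs.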



That doesn't actually tell us why the SPUs don't use cache - at all. The in-order processing issues aren't unique to Cell (Xenon is in-order too, for example), and can be overcome in some (if not many) cases with an intelligent approach/more work. To take his example, he should be using E = A + B + D ;) Or if he wanted to store A+B separately, then he should just follow E = A+B+D with C = A+B. A trivial, silly example, I know, and not all dependencies would be so easily resolved at all, but still.

I'm not a CPU expert :oops:
but isn't the fact that the SPEs lack branching entirely, are in-order, and don't have cache related to efficiency and real-world speed?
So why does Gabell have regrets, and why does Carmack call Cell programming a "pain in the ass"?

Help me understand: are they crappy game programmers, or is there a problem getting reasonable speed from Cell?

Also, I read somewhere that Apple discarded Cell because of its poor general-purpose performance and went the x86 way for Mac OS.
Maybe the hype around this processor (out-of-this-world peak values, etc.) exaggerates the public perception?

Help me understand.
 
SynapticSignal said:
but isn't the fact that the SPEs lack branching entirely, are in-order, and don't have cache related to efficiency and real-world speed?

It relates to the (hardware) tools a programmer has at their disposal to achieve what they want to achieve. These choices place more responsibility on the programmer:

no cache: you have to be explicit about memory access, try to plan it in advance, and/or implement a cache in software if you prefer

in-order execution: watch for dependencies in your code

no branch prediction: either do what you need to do without branching - very possible in many cases, even for things you might consider branching heavy, and I could give examples here - or use profiling with hints, unroll loops etc.
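The "no cache" item above can be sketched in plain C. This is only an illustration under stated assumptions: `memcpy` stands in for the explicit transfers between main memory and local store, and `CHUNK_FLOATS` and `scale_in_chunks` are made-up names, not part of any Cell SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_FLOATS 1024   /* hypothetical chunk size, well below 256 KB */

/* Scale every element of a large array in "main memory" by staging it
   through a small fixed buffer, the way code without a hardware cache
   has to. memcpy stands in for the explicit transfers. */
void scale_in_chunks(float *main_mem, size_t n, float factor)
{
    static float local[CHUNK_FLOATS];   /* stand-in for local store */

    assert(main_mem != NULL || n == 0);

    for (size_t base = 0; base < n; base += CHUNK_FLOATS) {
        size_t count = n - base;
        if (count > CHUNK_FLOATS)
            count = CHUNK_FLOATS;

        /* explicit "get": bring one chunk into the local buffer */
        memcpy(local, main_mem + base, count * sizeof(float));

        for (size_t i = 0; i < count; ++i)
            local[i] *= factor;

        /* explicit "put": write the finished chunk back */
        memcpy(main_mem + base, local, count * sizeof(float));
    }
}
```

The programmer decides what is resident and when, which is the extra responsibility being described. On real hardware the copies would be asynchronous transfers, and a common refinement is to fetch the next chunk while the current one is being processed.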

What's the upside? You have 7 of these things and the PPE. They could have added OOE, they could have added branch prediction, they could have added hardware cache control, but would you have 7 of those for the same silicon/dollar? No, not near it. The things they cut are arguably things that can be offset with more programmer care/effort. You can't offset a lack of horsepower, however - ultimately that's what you come up against. I think that's what drove the design: within reason, move more responsibility from hardware to software so that you can get more hardware that actually processes your code in there. When you're going into a closed box, how much you can transfer "within reason" arguably goes up.

SynapticSignal said:
So why does Gabell have regrets, and why does Carmack call Cell programming a "pain in the ass"?

See above, they're used to the comforts of hardware control on PC CPUs.

SynapticSignal said:
Also, I read somewhere that Apple discarded Cell because of its poor general-purpose performance and went the x86 way for Mac OS.
Maybe the hype around this processor (out-of-this-world peak values, etc.) exaggerates the public perception?

Who's hyping it as a "general purpose" desktop processor? There's more at work when a company like Apple chooses a technology than just simple technical merit, too.
 
The Cell SPEs do have branching capability; they just don't have dedicated branch prediction hardware.

To quote from "Introduction to the CELL multiprocessor - by Jim Kahle, Day, Hofstee, Johns, Maeurer, Shippy" (available on the IBM website):

Synergistic processing element - "To limit hardware overhead for branch speculation, branches can be 'hinted' by the programmer or compiler. The branch hint instruction notifies the hardware of an upcoming branch address and branch target, and the hardware responds (assuming that local store slots are available) by pre-fetching at least seventeen instructions at the branch target address. A three-source bitwise select instruction can be used to further eliminate branches from the code."
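The bitwise select mentioned in the quote can be imitated in portable C. A minimal sketch, assuming 32-bit unsigned values; the function names are hypothetical and this is ordinary C, not the SPU instruction itself. Whether a given compiler turns this source into truly branch-free machine code is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: take bits from b where mask is 1, from a where mask is 0. */
uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Branching version, for comparison. */
uint32_t max_branch(uint32_t x, uint32_t y)
{
    if (x > y)
        return x;
    return y;
}

/* Branch-free version: turn the comparison into an all-ones or
   all-zeros mask, then select between the two inputs. */
uint32_t max_select(uint32_t x, uint32_t y)
{
    uint32_t mask = (uint32_t)0 - (uint32_t)(x > y);  /* 0xFFFFFFFF if x > y, else 0 */
    return select_bits(y, x, mask);
}

/* Quick check that both versions agree on a few sample inputs. */
void check_max_versions(void)
{
    const uint32_t samples[] = {0u, 1u, 7u, 0x80000000u, 0xFFFFFFFFu};
    const int count = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < count; ++i)
        for (int j = 0; j < count; ++j)
            assert(max_select(samples[i], samples[j]) ==
                   max_branch(samples[i], samples[j]));
}
```

The idea is to compute both candidate results and let a mask pick one, so there is no branch left to predict or hint.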
 
On an unrelated note, something I've always wondered; the PPE is based on a PowerPC instruction set... so what are the SPEs based on? They run their own kernel and contain completely different hardware, surely it's not the same instruction set?

Can someone clarify?
 
SynapticSignal said:
Also, I read somewhere that Apple discarded Cell because of its poor general-purpose performance and went the x86 way for Mac OS.
Maybe the hype around this processor (out-of-this-world peak values, etc.) exaggerates the public perception?

Help me understand.

I'd like to see the quote.

Having followed Apple for so many years, it is my belief that this was not their reason for passing on Cell. If you were a company thinking about transitioning AWAY from PPC, would you seriously consider the Cell processor, considering it's not out in the market? Do you really think STI will give the large volume discount Apple wants for its processors? I think not. Both answers are obviously no.

Seriously, poor general purpose performance was not an issue with Apple.
 
SynapticSignal said:
I'm not a CPU expert :oops:
but isn't the fact that the SPEs lack branching entirely, are in-order, and don't have cache related to efficiency and real-world speed?
As always it depends what you're trying to do. Lack of branch prediction and in-order processing aren't going to have any negative impact on much of what SPEs are intended to process, because the code will be written to avoid as much of that as possible. Cache doesn't add to performance efficiency over the SPEs' local storage. It moves the need to manage local stores from developers to the hardware. Both serve the same purpose: prefetching needed data from main memory and keeping it in fast local storage.
So why does Gabell have regrets, and why does Carmack call Cell programming a "pain in the ass"?
Because people don't like change. The way people are used to writing code won't work on Cell or XeCPU (something people arguing against Cell often overlook: XeCPU has some of the same requirements for programming efficiently), so they've got to get rid of their old ways of solving problems and come up with new ones. This is trickier on Cell in that you need to think of code segments that fit into a 256 KB LS and work on vectorised data.
Help me understand: are they crappy game programmers, or is there a problem getting reasonable speed from Cell?
How many times does this have to be covered? Writing for Cell is different to writing for other processors, so of course it's going to be harder for people used to a different architecture. In exactly the same way, anyone raised on Cell coding would find the limits or differences of conventional CPUs annoying. I speak English effortlessly because I was raised in England amongst English speakers, but had I been raised in Germany it is German I'd be speaking effortlessly and English I'd be having to come to grips with. Learning Cell is like learning a new language, but once it's mastered it ought not be any harder overall than any other language, save for the inherent intellectual demands of thinking in multithreading terms. Given a choice between keeping existing ways of coding and having to learn entirely new ones, it's not surprising which many would prefer. But the new ways seem to be the ways needed to get more performance and are the ways that'll be permeating all programming, so they'll have to learn nonetheless.
Also, I read somewhere that Apple discarded Cell because of its poor general-purpose performance and went the x86 way for Mac OS.
Though on the one hand no-one's saying Cell is the ideal CPU for a general purpose PC, it might be worth confirming that that was the reason Apple didn't go with Cell, as many other reasons have been given, such as Ars Technica's:
As I stated in my previous article on the switch, Apple is more concerned with scoring Intel's famous volume discounts on the Pentium (with its attendant feature-rich chipsets) and XScale lines than it is about the performance, or even the performance per Watt, of the Mac.
Honestly, taking 'one guy once said such and such and so it must be true and I believe it wholeheartedly' is a pretty daft and narrow-minded way to expand your understanding of things.
Maybe the hype around this processor (out-of-this-world peak values, etc.) exaggerates the public perception?
Or maybe the public just don't understand what peak values mean or what the hardware is designed to achieve and how it's designed to achieve that? Maybe they should all stop quoting figures and opinions read in forums and dubious technical evaluations and come to B3D and read up on the subject, instead of reading one-off statements like 'Cell SPEs have no integer performance; Apple didn't use Cell because it's a crap CPU that doesn't work but the designers just pretend it works by making up big imaginary peak figures and not releasing any real details; and Cell is a supercomputer on a chip that'll process everything at 250 GFLOPS and Intel and AMD will be bust in 5 years' and swallowing these wholeheartedly?
 
Apple computers are generally used for multimedia applications like video and music editing. I'm sure they'd have welcomed the extra streaming and floating point power that the Cell could provide.

Whatever reason they chose not to use Cell, it doesn't matter. They chose Intel, so the Cell is now their competition, and thus it's their job to down-play its capabilities.
 
Titanio said:
Who's hyping it as a "general purpose" desktop processor?

Ken Kutaragi has repeated many, many times that PS3, thanks to Cell, will be a computer of "supercomputer" class (maybe he's drunk or what?) and that a Linux OS + keyboard will be released for PS3 to use its supercomputer power.

If this is not hyping it as "general purpose", I don't know what is...
And the worst thing is that the press believes his madness.
 
Shifty Geezer said:
Honestly, taking 'one guy once said such and such and so it must be true and I believe it wholeheartedly' is a pretty daft and narrow-minded way to expand your understanding of things.

Seems that people can't avoid insults and can't keep the discussion clean.
This is very sad.
 
SynapticSignal said:
Seems that people can't avoid insults and can't keep the discussion clean.
This is very sad.
I'm not insulting you at all. I'm saying that taking one person's statement as fact is a daft and narrow-minded way to learn, which it is.

As for supercomputers, which is going way off topic, supercomputers aren't general purpose computers. Even if PS3 was capable of calculating weather patterns as well as the Met Office's latest acquisition, that wouldn't mean it was an effective general purpose processor. It's a high performance calculating machine designed to work on processing large amounts of data quickly, though with more versatility than just being FPUs. SPEs forgo several general-purpose enhancements that make developers' lives easier so that more processing units fit onto a finite amount of silicon. This potentially high throughput is ideal for the purposes for which the processor is intended to be used.

In the context of this discussion, XeCPU is similar. It has cut back some of the developer-friendly components to fit more execution units onto the chip. If you want to carry on questioning whether Cell really can achieve its peak FLOPS and whether it's all just hype or not, please use the Search function to revisit one of the many Cell threads that probably already cover your points, and if you've anything new to add, add it there. This thread is supposed to be comparing Cell design with XeCPU design based on given information. If you want to raise a matter with Cell, raise it in the context of the XeCPU too. E.g. if you want to say 'Cell won't be any good at games', explain why XeCPU will (although really the thread's title limits discussion to what advantages Cell has over XeCPU, and not vice versa).
 
SynapticSignal said:
Ken Kutaragi has repeated many, many times that PS3, thanks to Cell, will be a computer of "supercomputer" class (maybe he's drunk or what?) and that a Linux OS + keyboard will be released for PS3 to use its supercomputer power.

If this is not hyping it as "general purpose", I don't know what is...
And the worst thing is that the press believes his madness.
In your definition 'supercomputer' means 'general purpose'? :oops:
 
Seems that people can't avoid insults and can't keep the discussion clean.
This is very sad.

Shifty Geezer won't hear a bad thing said about Cell.

Writing for Cell is different to writing for other processors, so of course it's going to be harder for people used to a different architecture. In exactly the same way, anyone raised on Cell coding would find the limits or differences of conventional CPUs annoying. I speak English effortlessly because I was raised in England amongst English speakers, but had I been raised in Germany it is German I'd be speaking effortlessly and English I'd be having to come to grips with. Learning Cell is like learning a new language, but once it's mastered it ought not be any harder overall than any other language, save for the inherent intellectual demands of thinking in multithreading terms. Given a choice between keeping existing ways of coding and having to learn entirely new ones, it's not surprising which many would prefer. But the new ways seem to be the ways needed to get more performance and are the ways that'll be permeating all programming, so they'll have to learn nonetheless.

No, it's definitely a lot harder to create applications that use an array of mini processors with no caching than to create the same applications on one hefty x86 CPU that takes a lot of work out of the programmer's hands. There are ways to make Cell programming easier and programmers will cope, but the general notion that programming on a Cell-like architecture is harder than programming on a single or even multi-core x86-like architecture would be correct. If each SPU was equipped with all the features of your average CPU then it wouldn't be any harder than any other multicore solution, but they're not. They are missing certain features that require workarounds, thus making it harder to program for Cell.

Also the analogy about English and German doesn't make your argument that Cell is just different, not any harder, any stronger. Unless you can prove that Cell's programming models are as easy to implement in a range of Applications as a conventional CPU's are, I'll just go with what common sense is telling me.

That doesn't mean I don't like Cell or its design; on the contrary I think it's quite cool, but I acknowledge its pitfalls.
 
Synaptic, Cell's integer performance is right up there with its floating point; harnessing that power is the potential issue there. Indeed though, its double-precision performance suffers compared to its single precision. IBM may be addressing the SPEs' DP capabilities in a future Cell revision, as they have hinted. Not that that matters as far as PS3 is concerned though; I doubt it would see such a revision.

By the way on the whole 'supercomputers' and Kutaragi thing, supercomputers are hardly a good example of 'general processing' power.

As for Apple, I think Cell was always a very low-chance sort of thing for them. They've been working on x86 versions of their OS and apps secretly for years now it seems, and the main benefits of going Intel will be lower chip costs, lower power draw on their notebook chips, and ease of programming.
 
A bit OT, but just to get a better picture:

Wouldn't writing for Cell be more like speaking multiple languages at the same time to different people (taking into account that speaking to different people in the same language is already a lot harder...)?
 
Ragemare said:
Shifty Geezer won't hear a bad thing said about Cell.
If you say so.
No, it's definitely a lot harder to create applications that use an array of mini processors with no caching than to create the same applications on one hefty x86 CPU that takes a lot of work out of the programmer's hands.
I agree and never said otherwise. SynapticSignal was saying 'These devs say Cell is hard, so are they crap programmers or is Cell hard?' My response was 'it's hard for people to adapt.' Once they know Cell, it won't be impossibly hard, but it will of course be harder than a hefty x86 OOE core. But it will of course be hard for a hefty OOE x86 to decompress 12 HD streams on the fly or process a 16 million element FFT quickly. The cost of greater performance is greater responsibility and effort needed from the devs.
Also the analogy about English and German doesn't make your argument that Cell is just different, not any harder, any stronger. Unless you can prove that Cell's programming models are as easy to implement in a range of Applications as a conventional CPU's are, I'll just go with what common sense is telling me.
Well, no-one can prove that as it's subjective! At university I found SML easier to work with than many colleagues who were far happier with C/C++, Modula 2, and 'traditional' languages. You can't say 'C++ is easier than SML' as it depends on individuals. My point is you are happiest with whatever you're used to. Chinese is a tonal language that works on very different principles to European languages. That makes learning Chinese harder for most Europeans than learning similar western languages. Yet anyone from this side of the world who's having trouble learning Chinese would speak it without problem had they been born and raised in China. Chinese isn't a 'difficult' language, it's just sufficiently different from western languages to make it hard to adjust to when you're in the habit of speaking western languages. Likewise a different coding mentality isn't 'difficult', it's just different and needs time to learn. That's the nature of human beings and has nothing to do with technology. If those same friends of mine who had trouble learning SML had never had experience with any other languages, they would not have found it difficult to think in terms of the language.

The difficulties of designing for Cell are very conceptual in the main, it seems to me. People already know how to do things, and when those things no longer work, trying to get them to work is difficult. The difficulties associated with Cell require thinking about data structures, memory access patterns, working with 256 KB packets of code etc., which at the moment is alien, even though these things are fundamental to getting high performance. Once people have got used to thinking in a new way, it will be no more difficult to design for Cell than any other CPU, I dare say. The only obvious difference is that bad programmers won't have any part to play. As an example of bad programming, a friend at his workplace took a colleague's 15 nested IF statements and reworked them into a four-line FOR...NEXT loop. There are people out there who write atrocious code that still runs, but not efficiently. These same people could also write atrocious code on Cell and have it run inefficiently, only painfully more so. And to be honest they shouldn't be programmers. No-one who wrote books as badly or painted pictures as sloppily would make a career as an author or artist.

To clarify, it seems to me more a problem of Cell being difficult to learn rather than difficult to work with. It's like juggling. If writing for x86 is like throwing and catching one ball, writing for Cell could be like juggling 3 balls simultaneously. It takes a while to learn but with practice it becomes easy. Though it could be that writing for Cell is like trying to juggle 7 balls and very hard even for practised veterans to cope with; I've no experience of writing for Cell ;). At this point in time though, people saying 'Cell is hard to write for' haven't given themselves time to learn yet, so aren't in a position to say. The idea that OOE and automatic caches in an x86 will have it outperforming Cell because Cell is too hard to work with is bunk IMO. Hence my answer, which was weighted against SynapticSignal's arguments to show the counterpoint. Will it be too hard to get the theoretical performance out of Cell? I don't think so. Is it harder to write any old generic code and have it run at a half-decent speed on Cell than on an x86? Yes, of course. Should inefficient unoptimized code be finding its way into a console game running on a closed box system? I don't think so. Is it even possible to write an optimised cloth simulator that runs on a P4 as fast as an optimised cloth simulator does on Cell? No.

That doesn't mean I don't like Cell or its design; on the contrary I think it's quite cool, but I acknowledge its pitfalls.
As do I. It's SynapticSignal's idea that Cell is good for nothing, or at least that the lack of OOE+caches+etc. means Cell is crippled and cannot attain the high performance touted for it, that I'm responding to with the counter-arguments.
 
I think the idea that the Cell will be bad on General purpose code is nonsense. I've yet to see a single good explanation of why this should be.

The problem is "general purpose" means usable for everything. That by definition includes all sorts of different types of code, some of which Cell will be good at others it'll be bad at.

You could say the SPEs will be strong on intensive processing code and say they'll be relatively weak on control code but even this may not be entirely true - it depends on the algorithms used. It may be possible to change the algorithm used in a control function so it performs well on an SPE.

As far as the PPE is concerned it's a bit more complex, they will almost by default be better on control code as they are more of a conventional design.

The lack of out-of-order processing isn't going to give nearly as big a hit as some people seem to suggest. Removing it from an x86 processor will cripple it - but we're not talking about x86 processors, we're talking about PowerPC. x86 needs OOO for the extra registers it provides; PPC had enough registers to begin with so doesn't benefit from OOO nearly as much.

Going in-order should drop the performance *per clock* by up to 30%. PowerPC has enough default registers to keep the pipelines flowing and modern compilers are quite capable of doing any reordering necessary.

The PPEs also have relatively small branch predictors but this is the same, big branch predictors don't provide much benefit, so they didn't use them. It may mean 5% extra branches are mispredicted but that's nothing.

The SPEs don't have branch predictors because they don't need them much, SPEs can make use of branch removal techniques, there's branch hints if they are needed though.

OK, so the PPE will be weaker per clock than an x86, why do this?
Simple - it reduces power consumption by a huge amount, they had to do this in order to fit everything on the chip. But by reducing power consumption they are able to increase the clock significantly, the Cell and XCPU should both clock easily to well beyond 4GHz. A high clock should more than make up for any relative inefficiencies per clock.

Almost all of the general purpose applications I know of can in some way be accelerated by the SPEs - even things like word processors. It may only be a fraction of the actual code but it's usually that small part which does most of the work, i.e. the part which will have the biggest impact on performance.

So, I think it's something of a misnomer to say Cell will be weak on general purpose code. I am of the opinion that -with enough effort- the Cell can be very good on GP code.
 
pc999 said:
A bit OT, but just to get a better picture:

Wouldn't writing for Cell be more like speaking multiple languages at the same time to different people (taking into account that speaking to different people in the same language is already a lot harder...)?
Though I don't like employing simile in this kind of talk, Cell is an orchestra, as SCEA research suggests, rather than a solo piano or a violin trio. Admittedly it would be more laborious and costly to write a full symphony or train a full orchestra than to write a solo or train a violin trio.
 
ADEX said:
PowerPC has enough default registers to keep the pipelines flowing and modern compilers are quite capable of doing any reordering necessary.
...then you find out that modern compilers are not modern enough..
The PPEs also have relatively small branch predictors but this is the same, big branch predictors don't provide much benefit, so they didn't use them. It may mean 5% extra branches are mispredicted but that's nothing.
5% is a small number, but you can waste a LOT of cycles on that little number. That's why we have processors with very capable and big BPUs. 5% can make a difference.
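A rough back-of-the-envelope model shows how a small misprediction rate can matter. This is a sketch only: the function names are made up, and every number used with it is a hypothetical illustration, not a measurement of the PPE, the SPEs, or any x86 chip.

```c
#include <assert.h>

/* Extra cycles per instruction caused by additional mispredicted branches.
   branch_fraction: share of executed instructions that are branches
   extra_miss_rate: additional share of those branches that are mispredicted
   penalty_cycles:  cycles lost per misprediction */
double extra_cpi(double branch_fraction, double extra_miss_rate,
                 double penalty_cycles)
{
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(extra_miss_rate >= 0.0 && extra_miss_rate <= 1.0);
    return branch_fraction * extra_miss_rate * penalty_cycles;
}

/* Relative slowdown compared with a baseline cycles-per-instruction figure. */
double slowdown(double base_cpi, double added_cpi)
{
    assert(base_cpi > 0.0);
    return added_cpi / base_cpi;
}
```

With hypothetical inputs of 20% branches, 5% extra mispredictions, and a 20-cycle penalty, `extra_cpi(0.20, 0.05, 20.0)` gives about 0.2 extra cycles per instruction, which is roughly a 20% slowdown on a baseline of one cycle per instruction. The longer the pipeline, the larger the penalty term becomes.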
The SPEs don't have branch predictors because they don't need them much, SPEs can make use of branch removal techniques, there's branch hints if they are needed though.
C'mon! SPEs would need a BPU like every other processor, but they decided not to include one to save transistors, and at the same time they designed this thing around the absence of a BPU.

ciao,
Marco
 
ADEX said:
The SPEs don't have branch predictors because they don't need them much, SPEs can make use of branch removal techniques, there's branch hints if they are needed though.
I thought it was the developers who had to make use of branch removal techniques ;) By the same reasoning x86 doesn't need branch prediction because it can use branch removal techniques - there's nothing stopping developers writing reduced-branch code on x86 any more than they can on SPEs.
 