Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Vince said:
So, kinda like the Alpha 21364 ?

Considering that everyone and their puppy dog followed the lead of the 21364 in their processor designs, I would think not. K8 owes a LOT to architectural decisions made in EV7, as do most of the other processors publicly disclosed for the future. EV7 was the product that put microprocessors down the path of integrating a larger percentage of the system directly onto the microprocessor.

Aaron Spink
speaking for myself inc.
 
Vince said:
So, kinda like the Alpha 21364 ?

Alpha died because of politics and economics, not because of its architecture.

Vysez: The limit for the champagne is €150 :D (+ shipping, yes I'm cheap)

Edit: or a bottle of scotch. You can get a very nice 1978 Glenrothes for that amount.

Cheers
Gubbi
 
Last edited by a moderator:
ihamoitc2005 said:
1) There is no point in making insult to others no? Do you feel there is much to gain in knowledge or understanding from calling someone "waste of time"?

An insult generally doesn't have truth behind it. Version IS a waste of time.

2) CELL is not x86 so stop trying to fit x86 programming model onto CELL. This is perhaps youre 19,842nd post saying CELL is no good because it has different programming model than one you like. It is different from what you are used to, that does not in itself make it better or worse.

I know cell is not x86. It isn't an issue of fitting an x86 programming model onto CELL. It is an issue of fitting pretty much EVERY programming model onto CELL. x86 is simply an instruction set, it doesn't dictate the programming model and has the same programming model as Sparc, MIPS, ALPHA, PA, IA64, Power, etc.

I have less than 1K posts. I don't believe that I have ever said that CELL is no good, I have simply said that the required programming models for CELL are different and quite complex and will result in much greater programming difficulty.



3) Many have already taken very good advantage of CELL and published openly (available with simple google search my friend) so we have indication from real world examples that, despite new programming model that programmers had to learn, CELL is extremely effective and powerful.

Um, no. We've had some examples of quite simple algorithms being ported to CELL with much effort that seem to run reasonably. This is no different than TERA or any other exotic hardware. The issue is that real programs are more than algorithms, and many of the things shown on CELL are either not required, easier to do with fixed hardware, or possible to do perfectly acceptably on more mainstream hardware.

Go lookup the history of TriMedia for an example of funky hardware with some promise but overcome because of a complex programming model.

For PS3, CELL will do OK, due to a significant amount of very custom programming. But a lot of people want to make it into the second coming, which it is not. Or revolutionary, which it is not.

Aaron Spink
speaking for myself inc.
 
Gubbi said:
Alpha died because of politics and economics, not because of its architecture.

Vysez: The limit for the champagne is €150 :D (+ shipping, yes I'm cheap)

Cheers
Gubbi

Cheap? Christ, €150 for champagne will get you some really really good bubbly if you don't want some overpriced french grape juice.

Aaron Spink
speaking for myself inc.
 
aaronspink said:
This is more pipedream than reality. The architecture of the SPEs lacks many of the features one would want in order to run a true kernel, leaving you with something akin to non-preemptive multithreading with a rather limited code and data space. Very unlikely to happen in reality.

The SPU can't really re-task itself on its own; at best you can have it switch between sections of a program.

Aaron Spink
speaking for myself inc.

And yet my own code does this... it runs a small kernel on the SPU(s) which loads and unloads code to do different tasks and streams data back and forth from main memory. The existence of the PPE is not necessary except for initialisation - and if the Cell designers chose to remove it everything would still work. Right now we have a choice of two types of core to run code on - the stupidly fast array of streaming cores or the not-so-fast traditional core. Either can take on any job, the decision is not based on feature set, but on how critical and/or easy it is to code something around the SPU memory model.
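
The resident-kernel pattern MrWibble describes (a small loop that pulls job descriptors and dispatches to loaded code) can be sketched in plain C. This is a host-side illustration only: the names are hypothetical, and a real SPU kernel would fetch descriptors and data via DMA and double-buffer transfers to hide latency, none of which is modeled here.

```c
#include <assert.h>

/* Hypothetical job descriptor: on a real SPU this would arrive via DMA
 * from a list in main memory; here it is just a struct in host memory. */
typedef struct {
    int   opcode;  /* which routine to run: 0 = scale, 1 = offset */
    float arg;     /* scalar parameter for the routine            */
    float *data;   /* "main memory" buffer the job works on       */
    int   count;   /* number of elements                          */
} job_t;

static void job_scale(job_t *j)  { for (int i = 0; i < j->count; i++) j->data[i] *= j->arg; }
static void job_offset(job_t *j) { for (int i = 0; i < j->count; i++) j->data[i] += j->arg; }

/* Dispatch table: the resident kernel maps opcodes to code routines
 * (standing in for the code overlays a real SPU kernel would load). */
static void (*const dispatch[])(job_t *) = { job_scale, job_offset };

/* The kernel loop: fetch the next descriptor, run it, repeat until done. */
void spu_kernel(job_t *queue, int njobs)
{
    for (int i = 0; i < njobs; i++)
        dispatch[queue[i].opcode](&queue[i]);
}
```

Running two jobs over a three-element buffer (scale by 2, then add 1) turns {1, 2, 3} into {3, 5, 7}; on real hardware the same loop structure is what lets an SPE re-task itself without per-job PPE intervention.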

Why would a multicore processor need every individual core to bootstrap independently? Why would you *not* give initial control to a single unit and let it bring the rest up? Why in a chip where eight of the cores are optimised for a particular set of algorithms and the other is more standard would you bootstrap the specialist cores instead of the ordinary one? Why does the fact that the designers did this very logical thing in any way indicate some kind of weakness in the design of the SPU cores? What feature is it you think they lack which would stop them from being used as self-contained processors?

Do you actually have some kind of coherent point to make? Because from where I'm sitting it looks like you're blowing a lot of hot air and just trying to post everyone else into submission on technicalities I'm not sure you even understand.
 
aaronspink said:
CELL is an architectural dead end. The general faults with the architecture are such that no one else will likely go down the path that was chosen by CELL.




Aaron Spink
speaking for myself inc.


I don't understand how it's an architectural dead end when we are almost upon the adoption of IPv6. The dawn of a Pervasive Computing world is almost upon us. IPv6 and Grid Computing fit together well, and CELL seems to be designed to excel in this environment. Achieving the vision for a "CELL WORLD" won't happen overnight, because it will take time for CELL to work its way into billions of shipped consumer products like TV sets and cell phones.
 
I have less than 1K posts. I don't believe that I have ever said that CELL is no good, I have simply said that the required programming models for CELL are different and quite complex and will result in much greater programming difficulty.

This is true to a degree, but it is or will be true for all processors. The days of ever-faster single-threaded CPUs doing it all for you are over. Unfortunately nobody seems to have noticed.

The tricks you have to learn for programming Cell are the very same tricks you need to learn for every future processor. If you don't learn them you may find your code eventually slowing down with new processor releases.
 
Brimstone said:
I don't understand how it's an architectural dead end when we are almost upon the adoption of IPv6. The dawn of a Pervasive Computing world is almost upon us. IPv6 and Grid Computing fit together well, and CELL seems to be designed to excel in this environment. Achieving the vision for a "CELL WORLD" won't happen overnight, because it will take time for CELL to work its way into billions of shipped consumer products like TV sets and cell phones.
The idea of multiple cores isn't a dead end; I think Aaron is talking about this specific implementation of Cell. I'm kind of up in the air about it: it's a neat way to get around some of the limitations current hardware has (e.g. bandwidth), but I'm doubtful that the next version of Cell will be the exact same hardware, just with more cores (the vision(tm)). I'm particularly dubious about IBM championing the hardware; when I see boxes rolling out with IBM badges and IBM actually pushing them over their other hardware, then I'll be a believer. And if someone figures out how to get faster single-threaded cores going again, we'll look back on all this bother and laugh, at least until the next time things bog down.

MrWibble - can you say what kind of things you're doing on the SPEs, in general terms? I think a lot of us would be interested in what kind of real work is happening there rather than speculation about what is and isn't possible from people who haven't touched the hardware.
 
Brimstone said:
I don't understand how it's an architectural dead end when we are almost upon the adoption of IPv6. The dawn of a Pervasive Computing world is almost upon us. IPv6 and Grid Computing fit together well, and CELL seems to be designed to excel in this environment. Achieving the vision for a "CELL WORLD" won't happen overnight, because it will take time for CELL to work its way into billions of shipped consumer products like TV sets and cell phones.

As an aside to this conversation, not sure if this ever got posted in this forum or not, but here's SCE's patent for the design and implementation of what essentially amounts to a non-packet based network grid/distributed computing concept.

Patent

Not surprisingly, all processor 'examples' referred to in the patent are of the Broadband Engine variety.
 
 
Brimstone said:
I don't understand how it's an architectural dead end when we are almost upon the adoption of IPv6. The dawn of a Pervasive Computing world is almost upon us. IPv6 and Grid Computing fit together well, and CELL seems to be designed to excel in this environment. Achieving the vision for a "CELL WORLD" won't happen overnight, because it will take time for CELL to work its way into billions of shipped consumer products like TV sets and cell phones.

IPv6 and pervasive computing have nothing to do with CELL.

Grid computing is called clusters; it's been around since at least the early 90's.

CELL is designed for a gaming console, not toasters, refrigerators, or cell phones. Cell uses way too much power to work in a cell phone. ARM owns the cell phone market and this isn't likely to change.

Aaron Spink
speaking for myself inc.
 
ADEX said:
This is true to a degree but it is or will be true for all processors. The days of ever faster single threaded CPUs doing it all for you are over. Unfortunately nobody seems to have noticed.

There is some truth in what you say, but also some falsehood. There is still Amdahl's Law to consider.
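
Amdahl's Law puts a hard ceiling on the multi-core speedups being discussed: the serial fraction of a program dominates as core counts grow. A minimal sketch of the formula:

```c
#include <assert.h>
#include <math.h>

/* Amdahl's Law: if a fraction p of a program parallelizes perfectly
 * across n processors, the overall speedup is bounded by the serial
 * part (1 - p), no matter how large n becomes. */
double amdahl_speedup(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}
```

Even a program that is 90% parallelizable gets only about a 4.7x speedup from eight cores, and at most 10x from infinitely many, which is the caveat to "learn the tricks and everything scales": parallel hardware alone doesn't make all code fast.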

The tricks you have to learn for programming Cell are the very same tricks you need to learn for every future processor. If you don't learn them you may find you code eventually slowing down with new processor releases.

You have stated your thoughts incorrectly...

The tricks that you have to learn to program multi-processors are some of the very same tricks you need to learn for CELL. CELL adds additional complexities on top of the multi-processor programming models.

I'm not debating whether there will be a multi-context future; that was known back in the early 90's. The issue is whether the path taken by CELL is the best path going forward, and whether it leverages the existing knowledge base in a way that is usable or throws it all away. CELL is not revolutionary; things like it have been tried in the past and the programming model issues remain very much unsolved. In this respect it is very similar to VLIW: it keeps looking interesting but doesn't actually solve the real problems.

Aaron Spink
speaking for myself inc.
 
MrWibble said:
And yet my own code does this... it runs a small kernel on the SPU(s) which loads and unloads code to do different tasks and streams data back and forth from main memory. The existence of the PPE is not necessary except for initialisation - and if the Cell designers chose to remove it everything would still work. Right now we have a choice of two types of core to run code on - the stupidly fast array of streaming cores or the not-so-fast traditional core. Either can take on any job, the decision is not based on feature set, but on how critical and/or easy it is to code something around the SPU memory model.

The 8080 had multi-tasking added on by people. That doesn't make it suited for the work, though. What you are effectively doing is writing a program that has several code sequences that loop back to a DO list when a given code sequence is done. You don't really need a kernel to do this.

There are a significant number of jobs that the SPU cannot do. The SPUs are subordinate to the PPE and rely on the PPE for a lot of the non-computational functions.

Why would a multicore processor need every individual core to bootstrap independantly?

Flexibility.

Why would you *not* give initial control to a single unit and let it bring the rest up? Why in a chip where eight of the cores are optimised for a particular set of algorithms and the other is more standard would you bootstrap the specialist cores instead of the ordinary one?

When you have broken cores you have no other option.

Why does the fact that the designers did this very logical thing in any way indicate some kind of weakness in the design of the SPU cores?

Because the SPU cores can't do it themselves. They can't change the memory aliasing, they can't process interrupts, and they can't dynamically change what they are running in response to some external event. The SPUs are glorified SIMD pipelines. They lack a significant amount of functionality in an effort to reduce their size to the point that so many could be put on the die. The question is whether the trade-off is worthwhile or merely a stopgap.

What feature is it you think they lack which would stop them from being used as self-contained processors?

Coherence, TLBs, interrupts, memory access, non-coherent accesses, etc.

Do you actually have some kind of coherent point to make? Because from where I'm sitting it looks like you're blowing a lot of hot air and just trying to post everyone else into submission on technicalities I'm not sure you even understand.

I do believe that I've made my point. I'm fairly confident that I understand the technicalities at least as well as anyone else on this board.

Aaron Spink
speaking for myself inc.
 
aaronspink said:
I do believe that I've made my point. I'm fairly confident that I understand the technicalities at least as well as anyone else on this board.

Which is odd, considering you're just outputting a stream of FUD, some of which flies in the face of the actual real-world experience of those on this board who are programming the thing on a daily basis.

You are confusing (I feel deliberately, though it might just be stupidity) the *choice* of a hardware designer to place certain functions in the PPE instead of an SPU with design decisions made out of necessity.

Some of the things you mention aren't even absent from the current SPUs anyway.. memory access?? My SPUs can access memory just fine, thank you very much.

The remaining portion of functions limited to the PPE consist largely of one-off boot-time functions or house-keeping. Frankly the kind of stuff I don't want cluttering up my SPUs, though there would be absolutely nothing stopping them from being added in the future if someone wanted to get rid of the PPE.

I might as well suggest that the X360 CPU is crippled and not a "real CPU" because its memory interface is on a separate chip (the GPU) rather than being integrated. If it wants memory it has to ask the GPU - so the CPU is subordinate to the GPU on that system, by your (rather bizarre) definition. However this is rubbish - designers on both projects simply decided that certain functions made more sense attached to only specific cores rather than having everything connected to everything else. It was not a necessity borne out of a critical weakness, simply a logical choice that could be changed in other circumstances. If the XGPU was not present I'm sure XCPU could have a memory interface, and if the PPE was not present then SPUs could be given more responsibility.

Certainly in practice the machine can work exactly as described by Sony's slides - the SPUs can entirely self-manage without intervention. So whatever you're claiming can be disproven simply by observation - my code certainly hasn't stopped working just because you've said it isn't possible, and runtime performance is not bound by reliance on the PPE.
 
MrWibble said:
Some of the things you mention aren't even absent from the current SPUs anyway.. memory access?? My SPUs can access memory just fine, thank you very much.

If you had read any of Aaron's posting sprees you would know he meant coherent loads and stores. Memory access is through DMA only.

I think your comment that Aaron is spreading FUD is unfair. He has already stated that CELL *can* make sense in the PS3, because of its fixed architecture.

All Aaron has stated is that it will go nowhere else in the wide mass-market sense. It'll do OK in FP-bound DSP-like systems: medical visualization, radar systems, etc. But it will never replace x86 in PCs or ARM in PDAs/phones.

And this you will find very hard to argue against. By making the local store non-coherent, Sony has made the LS part of the SPU context and thereby made it very hard to virtualize; with a >256KB context, the thing will be impossible to time-slice. This doesn't matter in the PS3, where you know you have 7 SPUs at your disposal and can code accordingly. But it will make a difference when you're writing software that has to run on platforms ranging from one to many SPUs, and software that will see a reasonable lifetime (5+ years).
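
The time-slicing concern can be put in rough numbers. The sketch below assumes the whole 256 KB local store must round-trip through main memory on a context switch, and takes the commonly quoted 25.6 GB/s Cell memory bandwidth as an assumption; real costs would also include register state and draining in-flight DMA.

```c
#include <assert.h>
#include <math.h>

/* Back-of-envelope cost of time-slicing an SPE, assuming the entire
 * local store is part of the context and must be both saved and
 * restored through main memory. Bandwidth figure is an assumption
 * (the oft-quoted 25.6 GB/s XDR bandwidth, shared by all traffic). */
double spe_switch_us(double ls_bytes, double bw_bytes_per_s)
{
    return 2.0 * ls_bytes / bw_bytes_per_s * 1e6; /* save + restore, in microseconds */
}
```

That comes to roughly 20 µs per switch with the memory bus fully occupied, versus the few hundred bytes of register state a conventional core saves, which is why a >256KB context makes fine-grained time-slicing so unattractive.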

Also, changing the size of the local store is a no-no: lowering it will break existing code, and increasing it brings zero benefit to existing code (unlike caches), so you'll need a rewrite/recompile every generation. Look at VLIW to see how successful that approach is.

Side note: I do think it is Sony's influence that made CELL look like it does; the SPUs look too much like VUs, and IBM should know better :)

Cheers
Gubbi
 
Gubbi said:
Side note: I do think it is Sony influence that made CELL look like it does, the SPUs looks too much like VUs, and IBM should know better
If only... :(
Frankly I find the exact opposite to be true for me - just about everything I dislike about Cell (as well as that other console CPU) is clearly all IBM's influence/design.
 
Fafalada said:
If only... :(
Frankly I find the exact opposite to be true for me - just about everything I dislike about Cell (as well as that other console CPU) is clearly all IBM's influence/design.

You dislike the PPE+VMX?
 
Gubbi said:
By making the local store non-coherent, SONY has made the LS part of the SPU context and thereby made it virtually impossible to virtualize, with >256KB context the thing will be impossible to time-slice.

The optimal model for multi-tasking the SPUs seems to be cooperative (e.g. MrWibble's SPE task scheduler). This works well for a fixed hardware target (like a game console), but for this programming model to be of any use for more general applications, this scheduler will have to be part of the OS / standard runtime environment. I find it hard to believe that Sony offers no such thing in their SDK, but then again, I don't know.
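
The cooperative model described above can be sketched as a run-to-yield scheduler: each task does one slice of work and voluntarily returns to the scheduler, so nothing ever has to be saved behind a task's back. The names here are hypothetical, not anything from Sony's SDK.

```c
#include <assert.h>

/* Minimal cooperative scheduler sketch: each task runs one slice and
 * returns nonzero when finished. The scheduler round-robins over live
 * tasks; nothing is ever preempted, which is the property that makes
 * this model cheap on an SPE (no 256 KB context saved involuntarily). */
typedef struct task {
    int (*step)(struct task *); /* run one slice; return 1 when done */
    int state;                  /* per-task progress counter         */
    int done;
} task_t;

/* Example task: finishes after its counter reaches three. */
static int count_to_three(task_t *t)
{
    t->state++;                 /* one unit of work per slice */
    return t->state >= 3;
}

void run_cooperative(task_t *tasks, int n)
{
    int live = n;
    while (live > 0)
        for (int i = 0; i < n; i++)
            if (!tasks[i].done && (tasks[i].done = tasks[i].step(&tasks[i])))
                live--;
}
```

Because control only changes hands at points each task chooses, a task can keep its live state small at those points; that sidesteps the 256KB-context problem at the cost of requiring well-behaved tasks, which is exactly the trade-off a fixed console target can accept.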
 