Consoles & Propaganda

There is no debate that a PPU is more general purpose (i.e. it would run Word and Firefox better with less coding); the question is how much code in a game is general purpose?

You're killing me here Todd - I hope you read that thread I linked to before you posted this! ;)

Look, here's what makes the PPU 'general purpose': you don't have to change your code *at all* to run those apps on it - Cell will run Firefox in Linux, without any re-coding, because the PPU core supports the Power ISA completely. It is "general purpose" in that sense, but that doesn't make it good at running such code, and it doesn't make such code favorable. The SPEs require you to change your code - they will not run Firefox out of the box. But that doesn't mean they can't run Firefox, know what I mean?

Both Sony and MS would have loved to have a top of the line Intel or AMD for their CPU, but they could not afford the price or heat, so they both made compromises.

Who knows, maybe MS - but Sony absolutely would not have wanted such a solution. If they did, they would have pursued an OOE G5-esque solution; do you think that beyond IBM and Sony's abilities? The processor they created was the processor they wanted - what they wanted were the SPEs - the SPEs *are* Cell, and the reason anyone in any industry would be attracted to it.
 
Sony has been pimping the "Edge" tools, and AI was spoken about:

Of particular interest however was the bit Sony Computer Entertainment Worldwide (SCEWW) said about branching AI on the SPU. Branching is a common technique used in artificial intelligence where a program randomly chooses a few samples from a larger set of options, and then tests each to see which is the best. This method of AI can provide more realistic behavior, as humans often don't choose the overall best option because it simply doesn't occur to them. However, it is very inefficient to begin with, and due to the nature of the process, most developers have claimed that the SPUs would be absolutely terrible for branching.

As Sony put it however, branching is absolutely terrible for ALL processors. In their experience, they said, it is less terrible for the SPUs however. In the upcoming game Heavenly Sword, they said that moving the branching AI off of the Power Processor Unit (PPU) increased the performance of that particular process. In other words, the same branching ran better on the SPUs.

http://www.psu.com/node/8715

So whether Sony is lying or not is for the devs to judge, but maybe the SPUs are not bad - maybe the techniques for using them were simply immature last year.
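
To make the "branching is less terrible on the SPUs" point a bit more concrete, here's a minimal sketch in plain C - my own illustration, nothing to do with Edge or Heavenly Sword - of the usual trick: turn a data-dependent branch into a compare-and-select, which an in-order core with no branch predictor (like the SPE) digests far more happily.

[code]
#include <stddef.h>

/* Branchy version: pick the best-scoring option from a candidate set.
   Each unpredictable 'if' is a potential pipeline stall on an in-order
   core with no dynamic branch prediction. */
static int pick_best_branchy(const float *score, size_t n)
{
    int best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (score[i] > score[best])      /* data-dependent branch */
            best = (int)i;
    }
    return best;
}

/* Branch-reduced version (assumes n >= 1): compute a condition and select.
   A compiler can turn the ternaries into a compare + select (selb on the
   SPE) or a conditional move, rather than a branch. */
static int pick_best_select(const float *score, size_t n)
{
    int best = 0;
    float best_score = score[0];
    for (size_t i = 1; i < n; ++i) {
        int take   = score[i] > best_score;        /* 0 or 1 */
        best       = take ? (int)i   : best;       /* selectable, no jump */
        best_score = take ? score[i] : best_score;
    }
    return best;
}
[/code]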
 
There was that one slip of the tongue last year by the Assassin's Creed team that mentioned the branching in Xenon allowing "better" AI than the PS3 version... quickly recanted. ;)

Well, and no doubt their AI implementation will run better on 360, if it's an if/then branch-heavy affair. But you and I both know that's not how they have to approach AI, it's just how they happen to be doing it because it's easy for programmers to understand. Hell, the debates between members on AI implementation on Cell have been some of the craziest discussions on this board, period! ;)

EDIT: Ironically Todd shows above that Sony is not trying to mess with devs' preference for branchy AI, but simply helping them move it to the SPEs. There's a paper around here that shows how sometimes simply biting the bullet on branches on the SPE can be the best way to go; let me see if I can find it. But my point was, AI implementation isn't confined to the present ubiquitous methods.
 
You're killing me here Todd - I hope you read that thread I linked to before you posted this! ;)

Look, here's what makes the PPU 'general purpose': you don't have to change your code *at all* to run those apps on it - Cell will run Firefox in Linux, without any re-coding, because the PPU core supports the Power ISA completely. It is "general purpose" in that sense, but that doesn't make it good at running such code, and it doesn't make such code favorable. The SPEs require you to change your code - they will not run Firefox out of the box. But that doesn't mean they can't run Firefox, know what I mean?

Running FF on an SPU (or several) might be possible, but it may be slow and take 10x the development time. The SPUs excel at math, hence the huge amount of interest and work in the scientific and research communities.
 
Ok, a while ago "general purpose" was discussed at length in a thread that wondered whether a modern OOE processor might not have been a better choice for these consoles.

Read this four-page thread: http://forum.beyond3d.com/showthread.php?t=38592

And come away with a better understanding of what that term means, and why Major Nelson is just throwing up smoke when he uses it. Essentially though, it boils down to there being *no* game code run on Cell that need go unoptimized as the gen progresses. Major Nelson is doing things like implying that "general purpose" code like animation can't run on Cell... but if you change the way you do your animation, then voila! - suddenly it runs, and it runs extremely well to boot. That's why I hate the term - it misleads those that read it.
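
To give a concrete - and purely illustrative, the BoneAoS/SkeletonSoA types below are hypothetical and mine, not from any shipping engine - idea of the kind of change I mean: moving animation data from an array-of-structures layout to a structure-of-arrays layout, the classic first step in making this stuff stream well through the SPEs.

[code]
#include <stddef.h>

/* "General purpose" layout: one object per bone, attributes interleaved.
   Easy to write, but every pass over one attribute strides through all
   the others too. */
typedef struct {
    float pos[3];
    float rot[4];
    float scale;
} BoneAoS;

/* Data-oriented layout: the same information split into parallel arrays,
   so a kernel streams one attribute at a time in contiguous, DMA/cache
   friendly blocks and can vectorize across bones. */
typedef struct {
    size_t count;
    float *pos_x, *pos_y, *pos_z;        /* 'count' entries each */
    float *rot_x, *rot_y, *rot_z, *rot_w;
    float *scale;
} SkeletonSoA;

/* Example kernel: uniform-scale every bone.  Over the SoA layout this is
   one dense pass over one contiguous array - trivial to vectorize and to
   split into fixed-size chunks for a 256KB local store. */
static void apply_scale(SkeletonSoA *s, float k)
{
    for (size_t i = 0; i < s->count; ++i)
        s->scale[i] *= k;
}
[/code]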

Anyway, read the thread, it is *the* place to gain an understanding of the term, and saves me a lot of typing here. :p

Great - thanks for the link xbd. :smile:

Now how bout that question regarding xcpu v cell? Are there not circumstances where cell is outperformed by xcpu?

and

Carl B said:
The bolded part is the only part of that that's true, and as such we'll leave it at that.

Chef said:
??? You're saying that current multiplat games are not roughly equal on ps3 and xb360 and that most unbiased devs have not called the systems a wash overall?? c'mon xbd! That's ehrm ... questionable.
 
I'll let you read the rest of my posts Chef, let you edit yours up, and then I'll respond to the final product. ;)
 
The fact is the PPE is a better general purpose core than the SPEs; it is more flexible and better suited to a wider variety of code.
I wouldn't necessarily say that. I could take really bad (performance-wise), so-called "general purpose" code, and as long as it isn't doing anything that the SPE compiler doesn't support, it can be built as SPE code - and even a single SPE will, the majority of the time, outperform the PPE on that same code. Granted, I'm not talking about writing a word processor, but still. So what if it is a series of vector units? That doesn't mean it will exclusively work on vector ops, as many people would have you believe. So what if it doesn't have a cache? At a higher level, the fetching over DMA can really be treated as cache misses.

I honestly think the notion that the PPE is more flexible and better suited to various things is less due to its hardware layout specifically and more due to the role it has in the overall Cell layout as a sort of "master" core. It is easier to work with, all right, but I fail to see how that's a guaranteed victory for "general purpose."
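
To show what I mean by treating DMA fetches as cache misses, here's a minimal sketch in plain C. The dma_get below is just a memcpy standing in for the real MFC transfer (an mfc_get plus a tag wait on actual hardware), and all the names are mine - take it as an illustration of the pattern, not real SPE code.

[code]
#include <string.h>
#include <stdint.h>

#define LINE_BYTES 4096   /* size of each software-managed "line" in local store */

/* Stand-in for the real MFC DMA get; a plain memcpy so this compiles anywhere. */
static void dma_get(void *local, const void *main_mem, size_t bytes)
{
    memcpy(local, main_mem, bytes);
}

/* One-line software cache: 'tag' is the main-memory address currently
   resident (initialize it to NULL before first use). */
typedef struct {
    uint8_t     line[LINE_BYTES];
    const void *tag;
} SwCache;

/* If the requested block isn't resident, fetch it.  The "miss" costs a DMA,
   exactly like a hardware cache miss costs a memory access - the difference
   is that the programmer decides when it happens. */
static const uint8_t *sw_cache_lookup(SwCache *c, const void *main_addr)
{
    if (c->tag != main_addr) {          /* miss: pay the DMA cost */
        dma_get(c->line, main_addr, LINE_BYTES);
        c->tag = main_addr;
    }
    return c->line;                     /* hit: data already in local store */
}
[/code]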
 
EDIT: Ironically Todd shows above that Sony is not trying to mess with devs' preference for branchy AI, but simply helping them move it to the SPEs. There's a paper around here that shows how sometimes simply biting the bullet on branches on the SPE can be the best way to go; let me see if I can find it. But my point was, AI implementation isn't confined to the present ubiquitous methods.

And I go back to my previous conclusion below, which may or may not be true to a degree. I believe that on a certain level I'm pretty close to correct, and perhaps someone who works on both machines (SMM?) would know best how close this is to being true?

Perhaps now devs are using the SPEs to pick up that slack. I (of course) realize that the SPEs allow Cell to be SUPERIOR to the Xenon CPU but... they are being leveraged in many, many instances to do some of the things the 360 does in other parts of the system; essentially giving us what many feel is the ultimate result... near parity between these two systems.
 
What about apps that are larger than 256k?

When you say 'apps,' what you really mean is data chunks. Right, so I would ask you: if you're running an app on Cell, why would you not optimize the code around that LS restriction? You see, you want to take a situation where sub-par code is running alright on the PPE, stick it on the SPEs and say: look how it struggles! Why aren't we changing the code? If you change that code, it won't be struggling anymore, but it will still be getting the job done.
 
When you say 'apps,' what you really mean is data chunks. Right, so I would ask you: if you're running an app on Cell, why would you not optimize the code around that LS restriction? You see, you want to take a situation where sub-par code is running alright on the PPE, stick it on the SPEs and say: look how it struggles! Why aren't we changing the code? If you change that code, it won't be struggling anymore, but it will still be getting the job done.

You're saying these "data chunks" are never larger than 256k? Or that some scenarios would not require, or run better with, a larger pool for these "data chunks"?
 
You're saying these "data chunks" are never larger than 256k? Or that some scenarios would not require, or run better with, a larger pool for these "data chunks"?

A Cell processor with 8 SPEs, each with 512KB of local store, would outperform the Cell chip with 256KB... but what you're asking is whether the XeCPU's cache advantage cripples the SPEs - whether it is a barrier they are unable to overcome via optimization in order to keep pace with it... and what I'm saying is that no, with proper coding the SPEs will not be crippled relative to the performance the XeCPU can muster. The SPEs suffer when the data can't be broken into chunks smaller than 256KB, but that doesn't mean that they are rendered useless. The plan with optimized code is that you break those chunks down as best you can to begin with, and you manage the data yourself rather than letting the cache do it for you.
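
Here's a minimal sketch of what "manage the data yourself" looks like, in plain C with memcpy stand-ins for the real DMA transfers (so an illustration of the pattern, not actual SPE code): the data set in main memory can be arbitrarily large, but the local footprint never exceeds the chunk size.

[code]
#include <string.h>
#include <stddef.h>

#define CHUNK_BYTES (32 * 1024)   /* well under a 256KB local store */

/* Stand-ins for mfc_get/mfc_put plus tag waits; plain memcpy here so the
   sketch compiles anywhere. */
static void dma_get(void *local, const float *main_mem, size_t bytes) { memcpy(local, main_mem, bytes); }
static void dma_put(float *main_mem, const void *local, size_t bytes) { memcpy(main_mem, local, bytes); }

/* Process an arbitrarily large array from main memory with a fixed local
   buffer: fetch a chunk, work on it, write it back, repeat. */
static void scale_array(float *main_data, size_t count, float k)
{
    static float local[CHUNK_BYTES / sizeof(float)];
    size_t per_chunk = CHUNK_BYTES / sizeof(float);

    for (size_t base = 0; base < count; base += per_chunk) {
        size_t n = (count - base < per_chunk) ? (count - base) : per_chunk;

        dma_get(local, main_data + base, n * sizeof(float));
        for (size_t i = 0; i < n; ++i)       /* the actual work */
            local[i] *= k;
        dma_put(main_data + base, local, n * sizeof(float));
    }
}
[/code]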
 
A Cell processor with 8 SPEs, each with 512KB of local store, would outperform the Cell chip with 256KB... but what you're asking is whether the XeCPU's cache advantage cripples the SPEs - whether it is a barrier they are unable to overcome via optimization in order to keep pace with it... and what I'm saying is that no, with proper coding the SPEs will not be crippled relative to the performance the XeCPU can muster. The SPEs suffer when the data can't be broken into chunks smaller than 256KB, but that doesn't mean that they are rendered useless.

I see ... so a larger local store would benefit cell but not xcpu? ....or you're saying spe's are soo much faster than ppe's that it would make up for the smaller cache/local store in EVERY situation... :???:
 
I see ... so a larger local store would benefit cell but not xcpu? ....or you're saying spe's are soo much faster than ppe's that it would make up for the smaller cache/local store in EVERY situation... :???:

What I'm saying is that 256KB is a constraint Cell faces; larger would be better. What I am also saying is that the cache of the XeCPU does not compete with the LS in terms of the functions they serve - they are used differently from one another - and that if the coding is done right, the larger cache the XeCPU enjoys is not relevant to what a programmer is doing in his Cell-based code, because he is approaching the processors entirely differently. Again, Cell could stand for a larger LS, but if what we're discussing is performance of a single game-related task, and whether there are any situations in which optimizing for Cell will result in a slower implementation vs its XeCPU counterpart, then... well yeah, for most relevant operations Cell is going to win.
 
What I'm saying is that 256KB is a constraint Cell faces; larger would be better. What I am also saying is that the cache of the XeCPU does not compete with the LS in terms of the functions they serve - they are used differently from one another - and that if the coding is done right, the larger cache the XeCPU enjoys is not relevant to what a programmer is doing in his Cell-based code, because he is approaching the processors entirely differently.

Interesting. I don't buy it though.

If that were the case and the LS size had zero effect on the end product, why didn't they shoot for 128k/spe? They could have saved a significant chunk of die space!

I'm not a programmer and don't know the ins and outs but I do know enough to realize that larger fast local storage can enable certain things which would not be possible with a smaller storage pool. Otherwise they would have saved themselves the die space and produced 128k or 64k LS/spe.

Anyway since I can't refute this with fact and nobody else wants to chime in that knows better, I'll just leave it at that.

edited ... ok so like you said "most" and that is exactly what I was saying ... there are instances where xcpu can perform better than cell. I've said it I don't know how many times: Cell is better overall and I expect to see the results of this better design at some point in the ps3's life. However this chip will also have to help the inferior gpu of ps3 to compete with xb360, so this will limit this advantage.

Now about my other question(s)...
 
Doesn't the larger "cache/local store" also enable a more complex program to be run on the ppe (512k) vs the spe (256k)?
I don't know how often you run into a job where the *code* takes up a huge chunk of a 256k block since you're not going to put an entire *application* on either one all at once. The SPEs can put anything in their 256k block, and you can DMA anything including code to that local store on demand. It basically counts as a cache miss once again when you need to do so. The PPE has its 32K L1 instruction cache and it sees instruction cache misses all the same.

You might have fewer misses on the PPE than on the SPE, so if continuous feeds of data or the use of far function pointers is a common thing for your codebase, you could have some issues. And how hard that hits you performance-wise is a matter of how early you make a "prefetch" or equivalent.

If you've got a single job that needs some 300k of code (which is a hell of a lot), then you definitely need to break it up... but that's different from not being able to run all 300k of that code at all.

If you have a small block of code that works on 1 MB of data, that simply means going through it in smaller chunks at a time and fetching over the DMA while you're going through data further down the line so that you never have to wait. This is actually not that unusual since it was something people did on the PS2 VUs all the time and you only had 4k/16k on them.

You have to remember that cache/local store is just that -- a cache. It's not an absolute storage limit or a limit on how much work you can do or how much you can work on... it's a limit on how much you can do AT a given moment.
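
And here's roughly what that "fetch further down the line while you're working" pattern looks like - a double-buffering sketch in plain C, with synchronous memcpy stubs standing in for the asynchronous MFC transfers. So it's an illustration of the idea rather than real SPU code; on the hardware the get would be tagged and the wait would block on that tag.

[code]
#include <string.h>
#include <stddef.h>

#define CHUNK 16384   /* bytes per buffer; two buffers still fit easily in 256KB */

/* Stand-ins for asynchronous MFC transfers: on a real SPE, dma_get_async
   would be a tagged mfc_get and dma_wait would block on that tag.  Here
   they are synchronous so the sketch compiles anywhere. */
static void dma_get_async(void *local, const char *main_mem, size_t bytes, int tag)
{
    (void)tag;
    memcpy(local, main_mem, bytes);
}
static void dma_wait(int tag) { (void)tag; }

static void process(char *data, size_t bytes) { (void)data; (void)bytes; /* real work here */ }

/* Double buffering: while chunk i is being processed out of one buffer,
   chunk i+1 is already in flight into the other, so (on real hardware)
   the SPU rarely sits idle waiting on the DMA. */
static void stream_process(const char *main_data, size_t total)
{
    static char buf[2][CHUNK];
    size_t n_chunks = (total + CHUNK - 1) / CHUNK;

    if (n_chunks == 0) return;
    dma_get_async(buf[0], main_data, total < CHUNK ? total : CHUNK, 0);

    for (size_t i = 0; i < n_chunks; ++i) {
        int cur = (int)(i & 1);
        int nxt = cur ^ 1;

        /* kick off the next fetch before touching the current chunk */
        if (i + 1 < n_chunks) {
            size_t noff  = (i + 1) * CHUNK;
            size_t nsize = (total - noff < CHUNK) ? (total - noff) : CHUNK;
            dma_get_async(buf[nxt], main_data + noff, nsize, nxt);
        }

        dma_wait(cur);                      /* make sure chunk i has landed */
        size_t off  = i * CHUNK;
        size_t size = (total - off < CHUNK) ? (total - off) : CHUNK;
        process(buf[cur], size);
    }
}
[/code]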
 
If that were the case and the LS size had zero effect on the end product, why didn't they shoot for 128k/spe? They could have saved a significant chunk of die space!

They originally wanted to do 128KB, but they decided to up it to 256. For their HPC clients, I imagine a future Cell revision will up it again to 512, but no word as yet. I think you're still thinking of the LS size constraint as too much of a "hard" constraint though; what it really is is a penalty-inducer in the situations you're thinking of.

I'm not a programmer and don't know the ins and outs but I do know enough to realize that larger fast local storage can enable certain things which would not be possible with a smaller storage pool.

It's not about possible vs not possible, it's about efficiency and speed.
 