Consoles & Propaganda

Discussion in 'Console Gaming' started by aselto, Mar 22, 2007.

Thread Status:
Not open for further replies.
  1. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    You're killing me here Todd - I hope you read that thread I linked to before you posted this! ;)

    Look, here's what makes the PPU 'general purpose': you don't have to change your code *at all* to run those apps on it - Cell will run Firefox in Linux, without any re-coding, because the PPU core supports the Power ISA completely. It is "general purpose" in that sense, but that doesn't make it good at running such code, and it doesn't make such code favorable. The SPEs require you to change your code - they will not run Firefox out of the box. But that doesn't mean they can't run Firefox, know what I mean?

    Who knows, maybe MS - but Sony absolutely would not want such a solution. If they did, they would have pursued an OOE G5-esque solution; do you think that beyond IBM and Sony's abilities? The processor they created was the processor they wanted - what they wanted were the SPEs. The SPEs *are* Cell, and they're the reason anyone in any industry would be attracted to it.
     
  2. Todd33

    Veteran Banned

    Joined:
    Jan 22, 2007
    Messages:
    1,066
    Likes Received:
    7
    Location:
    CA
    Sony has been pimping the "Edge" tools, and AI was spoken about:

    http://www.psu.com/node/8715

    So whether Sony is lying or not is up to the devs to judge, but maybe the SPUs are not bad; maybe the techniques for using them were just immature last year.
     
  3. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Well, and no doubt their AI implementation will run better on 360 if it's an if/then branch-heavy affair. But you and I both know that's not how they have to approach AI - it's just how they happen to be doing it because it's easy for programmers to understand. Hell, the debates between members on AI implementation on Cell have been some of the craziest discussions on this board, period! ;)

    EDIT: Ironically, Todd shows above that Sony is not trying to mess with devs' preference for branchy AI, but simply helping them move it to the SPEs. There's a paper around here that shows how sometimes simply biting the bullet on branches on the SPE can be the best way to go; let me see if I can find it. But my point was, AI implementation isn't confined to the presently ubiquitous methods.
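    For illustration, here's a rough plain-C sketch of the branch-free select idea - the same trick the SPE's selb instruction does in hardware. The function names here are mine, not SPE intrinsics:

```c
/* Branch-free select: compute a mask from the condition instead of
   jumping. This is the idea behind the SPE's selb instruction. */
static inline int select_int(int cond, int a, int b) {
    int mask = -(cond != 0);          /* all ones if cond holds, else zero */
    return (a & mask) | (b & ~mask);
}

/* A toy "AI" decision with no branch: flee when health is low. */
int pick_speed(int health, int threshold, int flee_speed, int attack_speed) {
    return select_int(health < threshold, flee_speed, attack_speed);
}
```

    On the SPE you'd do this with vectors and selb, so both sides get computed and the pipeline never stalls on a mispredict; the scalar version just shows the shape of it.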
     
  4. Todd33

    Veteran Banned

    Joined:
    Jan 22, 2007
    Messages:
    1,066
    Likes Received:
    7
    Location:
    CA
    Running FF on the SPUs might be possible, but it may be slow and take 10x the development time. The SPUs excel at math, hence the huge amount of interest and work from the scientific and research communities.
     
  5. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    Great - thanks for the link xbd. :smile:

    Now how about that question regarding XCPU vs. Cell? Are there not circumstances where Cell is outperformed by XCPU?

    and

     
    #105 TheChefO, Mar 23, 2007
    Last edited by a moderator: Mar 23, 2007
  6. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    I'll let you read the rest of my posts Chef, let you edit yours up, and then I'll respond to the final product. ;)
     
  7. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    done :smile:

    At least I hope I'm not missing something ... :???:
     
  8. ShootMyMonkey

    Veteran

    Joined:
    Mar 21, 2005
    Messages:
    1,177
    Likes Received:
    72
    I wouldn't necessarily say that. I could take some really bad (performance-wise) so-called "general purpose" code, and as long as it isn't doing anything the SPE compiler doesn't support, it can be built as SPE code - and even a single SPE will, the majority of the time, outperform the PPE on that same code. Granted, I'm not talking about writing a word processor, but still. So what if it is a series of vector units? That doesn't mean it will work exclusively on vector ops, as many people would like to have you believe. So what if it doesn't have a cache? At a higher level, the fetching over DMA can really be treated as cache misses.

    I honestly think the notion that the PPE is more flexible and better suited to various things is less due to its hardware layout specifically and more due to the role it has in the overall Cell layout as a sort of "master" core. It is easier to work with, all right, but I fail to see how that's a guaranteed victory for "general purpose."
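    To make the "DMA as a cache miss" point concrete, here's a toy direct-mapped software cache in plain host C. Main memory is just an array and the "DMA" is a memcpy, so the names and sizes are made up for illustration - real SPE software caches do the fill with an mfc_get instead:

```c
#include <string.h>

#define LINE_WORDS 16   /* words per "cache line" (toy sizes) */
#define NUM_LINES   8

typedef struct {
    int tag;                 /* which line of main memory is resident; -1 = empty */
    int data[LINE_WORDS];
} CacheLine;

static CacheLine sw_cache[NUM_LINES];
static int miss_count;

void sw_cache_init(void) {
    for (int i = 0; i < NUM_LINES; i++) sw_cache[i].tag = -1;
    miss_count = 0;
}

/* Read mem[idx] through the software cache; a miss "DMAs" the whole line in. */
int sw_cache_read(const int *mem, int idx) {
    int line = idx / LINE_WORDS;
    int slot = line % NUM_LINES;
    if (sw_cache[slot].tag != line) {      /* miss: fill the line */
        memcpy(sw_cache[slot].data, mem + line * LINE_WORDS,
               LINE_WORDS * sizeof(int));
        sw_cache[slot].tag = line;
        miss_count++;
    }
    return sw_cache[slot].data[idx % LINE_WORDS];
}

/* Tiny demo: 64 sequential reads should fill exactly 64/16 = 4 lines. */
int sw_cache_demo(void) {
    int mem[64];
    for (int i = 0; i < 64; i++) mem[i] = i * 2;
    sw_cache_init();
    int sum = 0;
    for (int i = 0; i < 64; i++) sum += sw_cache_read(mem, i);
    return (miss_count == 4) ? sum : -1;
}
```

    Misses cost you a DMA, hits cost you a couple of integer ops - which is exactly the trade the PPE's hardware cache is making for you, just without the control.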
     
  9. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    Doesn't the larger "cache/local store" also enable a more complex program to be run on the PPE (512KB) vs. the SPE (256KB)?
     
  10. Tap In

    Legend

    Joined:
    Jun 5, 2005
    Messages:
    6,382
    Likes Received:
    65
    Location:
    Gravity Always Wins
    And I go back to my previous conclusion below, which may or may not be true to a degree. I believe that on a certain level I'm pretty close to correct, and perhaps someone who works on both machines (SMM?) would know best how close this is to being true?

     
  11. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    With code optimization? No, Cell will emerge victorious every time.
     
  12. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    What about apps/datasets that are larger than 256k?
     
  13. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    When you say 'apps,' what you really mean is data chunks. Right, and so I would ask you: why, if running an app on Cell, would you not optimize the code around that LS restriction? You see, you want to take a situation where sub-par code is running alright on the PPE, stick it on the SPEs, and say: look how it struggles! Why aren't we changing the code? If you change that code, it won't be struggling anymore, but it will still be getting the job done.
     
  14. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    You're saying these "data chunks" are never larger than 256k? Or that some scenarios would not require, or run better with, a larger pool for these "data chunks"?
     
  15. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    A Cell processor with 8 SPEs, each with 512KB of local store, would outperform the Cell chip with 256KB... but what you're asking is whether the XeCPU cache advantage cripples the SPEs - whether it's a barrier they're unable to overcome via optimization in order to keep pace with the XeCPU. And what I'm saying is that no, with proper coding the SPEs will not be crippled relative to the performance the XeCPU can muster. The SPEs suffer when the data can't be broken into chunks smaller than 256KB, but that doesn't mean they're rendered useless. The plan with optimized code is that you break those chunks down as best you can to begin with, and you manage the data yourself rather than letting a cache do it for you.
     
  16. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    I see ... so a larger local store would benefit Cell but not XCPU? ...or you're saying SPEs are so much faster than PPEs that it would make up for the smaller cache/local store in EVERY situation... :???:
     
  17. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    What I'm saying is that 256KB is a constraint Cell faces; larger would be better. What I am also saying is that the cache of the XeCPU does not compete with the LS in terms of the functions they serve - they are used differently from one another - and that if the coding is done right, the larger cache the XeCPU enjoys is not relevant to what a programmer is doing in his Cell-based code, because he is approaching the processors entirely differently. Again, Cell could stand a larger LS, but if what we're discussing is performance of a single game-related task, and whether there are any situations in which optimizing for Cell will result in a slower implementation vs. its XeCPU counterpart, then... well, yeah, for most relevant operations Cell is going to win.
     
  18. TheChefO

    Banned

    Joined:
    Jul 29, 2005
    Messages:
    4,656
    Likes Received:
    32
    Location:
    Tampa, FL
    Interesting. I don't buy it though.

    If that were the case and the LS size had zero effect on the end product, why didn't they shoot for 128k/SPE? They could have saved a significant chunk of die space!

    I'm not a programmer and don't know the ins and outs, but I do know enough to realize that larger fast local storage can enable certain things which would not be possible with a smaller storage pool. Otherwise they would have saved themselves the die space and produced 128k or 64k of LS per SPE.

    Anyway, since I can't refute this with facts and nobody who knows better wants to chime in, I'll just leave it at that.

    edited ... ok, so like you said "most", and that is exactly what I was saying ... there are instances where XCPU can perform better than Cell. I've said it I don't know how many times: Cell is better overall, and I expect to see the results of this better design at some point in the PS3's life. However, this chip will also have to help the inferior GPU of the PS3 compete with the XB360, so this will limit that advantage.

    Now about my other question(s)...
     
    #118 TheChefO, Mar 23, 2007
    Last edited by a moderator: Mar 23, 2007
  19. ShootMyMonkey

    Veteran

    Joined:
    Mar 21, 2005
    Messages:
    1,177
    Likes Received:
    72
    I don't know how often you run into a job where the *code* takes up a huge chunk of a 256k block, since you're not going to put an entire *application* on either one all at once. The SPEs can put anything in their 256k block, and you can DMA anything, including code, to that local store on demand. It basically counts as a cache miss once again when you need to do so. The PPE has its 32K L1 instruction cache, and it sees instruction cache misses all the same.

    You might have fewer misses on the PPE than the SPE, so if continuous feeds of data or the use of far function pointers are a common thing for your codebase, you could have some issues. How hard that hits you performance-wise is a matter of how early you make a "prefetch" or equivalent.

    If you've got a single job that needs some 300k of code (which is a hell of a lot), then you definitely need to break it up... but that's different from not being able to run all 300k of that code at all.

    If you have a small block of code that works on 1 MB of data, that simply means going through it in smaller chunks at a time, fetching data further down the line over DMA while you're working on the current chunk, so that you never have to wait. This is actually not that unusual, since people did it on the PS2 VUs all the time, and you only had 4k/16k on those.

    You have to remember that cache/local store is just that -- a cache. It's not an absolute storage limit or a limit on how much work you can do or how much you can work on... it's a limit on how much you can do AT a given moment.
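    In other words, the classic double-buffer. Something like this host-C sketch, where the async DMA is faked with a plain memcpy - the sizes and names are illustrative, not real SDK calls; in real SPE code the memcpy would be an mfc_get kicked off early, and you'd wait on its tag right before touching the buffer:

```c
#include <string.h>

#define CHUNK 64  /* elements per "DMA" transfer; real chunks are sized to the 256KB LS */

/* Double-buffered streaming sum over a big array. The memcpy stands in for
   an async DMA: we "fetch" the next chunk before processing the current one,
   so on real hardware the transfer would overlap the compute. */
long stream_sum(const int *src, int n) {
    int buf[2][CHUNK];
    long total = 0;
    int cur = 0;
    int fetched = (n < CHUNK) ? n : CHUNK;
    memcpy(buf[cur], src, fetched * sizeof(int));        /* prefetch first chunk */
    for (int base = 0; base < n; ) {
        int count = fetched;
        int next_base = base + count;
        /* kick off the "DMA" for the next chunk before touching this one */
        if (next_base < n) {
            fetched = (n - next_base < CHUNK) ? (n - next_base) : CHUNK;
            memcpy(buf[cur ^ 1], src + next_base, fetched * sizeof(int));
        }
        for (int i = 0; i < count; i++)                  /* work on current chunk */
            total += buf[cur][i];
        cur ^= 1;                                        /* swap buffers */
        base = next_base;
    }
    return total;
}

/* Demo: sum 0..99 streamed in CHUNK-sized pieces. */
long stream_sum_demo(void) {
    int src[100];
    for (int i = 0; i < 100; i++) src[i] = i;
    return stream_sum(src, 100);
}
```

    Same pattern as the VU days: as long as the compute on one buffer takes longer than the transfer into the other, the "cache misses" cost you nothing.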
     
    #119 ShootMyMonkey, Mar 23, 2007
    Last edited by a moderator: Mar 23, 2007
  20. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    They originally wanted to do 128KB, but decided to up it to 256KB. For their HPC clients I imagine a future Cell revision will up it again to 512KB, but no word as yet. I think you're still treating the LS size constraint as too much of a "hard" constraint, though; what it really is is a penalty-inducer in the situations you're thinking of.

    It's not about possible vs not possible, it's about efficiency and speed.
     