Does Cell Have Any Other Advantages Over XCPU Other Than FLOPS?

Discussion in 'Console Technology' started by Alpha_Spartan, Nov 16, 2005.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Yep. PPE has basically nothing to do with it.

    SPEs have looping. It's an 18-cycle penalty for getting a branch wrong, I think. The lack of branch prediction simply means the programmer has to do all the work, providing code that does what a branch predictor would do.
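    In practice, "doing the branch predictor's work yourself" often means removing the branch entirely and computing both sides with a select. A minimal C sketch of the idea (ordinary scalar C, not SPU intrinsic code — the function name is illustrative):

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Branchless max: the kind of hand transform SPE programmers apply
       where a mispredicted branch would otherwise cost ~18 cycles.
       The compare produces an all-ones/all-zeros mask, which selects
       between the two operands without any branch. */
    static int32_t select_max(int32_t a, int32_t b)
    {
        int32_t mask = -(int32_t)(a > b);   /* 0xFFFFFFFF if a > b, else 0 */
        return (a & mask) | (b & ~mask);
    }

    int main(void)
    {
        assert(select_max(3, 7) == 7);
        assert(select_max(-5, -9) == -5);
        assert(select_max(4, 4) == 4);
        return 0;
    }
    ```

    On the actual SPU the compare/select would map onto the vector compare and select instructions, but the principle is the same: both paths execute, and no prediction is needed.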

    Just the same as the programmer has to do all the work with the LS, rather than having the convenience of a cache (even if that convenience comes with certain restrictions).

    You could call the SPE a hardware engineer's revenge on programmers for being so lazy all these years and soaking up everything that Moore's law has so far provided...

    Jawed
     
  2. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    Unfortunately instruction scheduling isn't something that "modern" compilers have to do well, because "modern" processors do a lot of the job for them.

    Instruction scheduling on an in-order core with large instruction latencies is a HARD problem, because the ruleset is large, and it's generally done as a collection of heuristics in the compiler back end.

    IME "modern" compilers do a totally crappy job at hiding instruction latency on both X360 and PS3. In some cases I don't think they have enough information to do a good job. If you have a simple linear series of operations on a memory sequence on one of these cores, you're looking at interleaving many (10+) copies of that loop to hide the latency. A C compiler in most cases simply can't ascertain whether that unroll is a good optimisation: it doesn't know how much data might be being worked on, and it isn't allowed to "pad" the data to avoid multiple exit conditions inside the loop. That leaves two options: add hints to the language or use some sort of profile-guided optimisation.
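    The interleaving described above can be sketched with independent accumulators — each add feeds its own dependency chain, so on an in-order core with multi-cycle FP latency the adds overlap instead of stalling back-to-back. A minimal sketch (assumes n is a multiple of 4, exactly to dodge the "multiple exit conditions" problem mentioned above; names are illustrative):

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Sum with four independent accumulators: the compiler can rarely
       prove this unroll is safe/profitable on its own, so it gets
       written by hand. */
    static float sum4(const float *x, size_t n)
    {
        float s0 = 0.f, s1 = 0.f, s2 = 0.f, s3 = 0.f;
        for (size_t i = 0; i < n; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }

    int main(void)
    {
        float v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        assert(sum4(v, 8) == 36.f);
        return 0;
    }
    ```

    Note this also changes the order of the floating-point additions, which is another reason a standards-conforming C compiler can't do it for you without permission.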

    The only real reason that modern compilers are competitive with "hand coded assembly language" on processors is that for most code cache misses dominate performance, and as long as the code is similar in size and reads about the same amount of memory it will perform in a similar fashion.
     
  3. Bowie

    Newcomer

    Joined:
    Feb 10, 2002
    Messages:
    63
    Likes Received:
    0
    Can any programmers here answer my previous question on how the SPEs compare in general-purpose code to the MIPS core in the EE? I just want a frame of reference on the leap in performance from last generation to the next.
     
  4. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    About Cell I don't know, but an orchestra score/training is hell until you get it right. Thanks.
     
  5. The GameMaster

    Newcomer

    Joined:
    Feb 9, 2005
    Messages:
    109
    Likes Received:
    1
    I would just like to state for the record that "Super Computers" do not make for good "Gaming Computers".

    Super Computers are built for specific purposes and are not for everyday use.
     
  6. randycat99

    Veteran

    Joined:
    Jul 24, 2002
    Messages:
    1,772
    Likes Received:
    12
    Location:
    turn around...
    Yeah, cuz the game development tools and hardware graphics acceleration have been pretty, erm, sparse for historic supercomputers. :roll:
     
  7. flick556

    Newcomer

    Joined:
    May 4, 2003
    Messages:
    163
    Likes Received:
    4
    I can't help but think that when people say general purpose code they are talking about legacy, previously written code. From that perspective I can agree: x86 processors have dedicated hardware features and mature compilers that allow them to execute most any code.

    There are billions of lines of preexisting code, and many developers rely on libraries of preexisting code and compiled libraries as opposed to writing everything anew. I think any new processor design will be faced with this hurdle, and a large burden is placed on compilers and middleware. The reason x86 can execute most any code has as much to do with the years of compiler optimizations as it does with the hardware itself.

    Cell should receive a nice developer following through IBM, Sony, Open Source, and all the console developers. It will take time but high quality compilers and libraries for cell will emerge. I think other processor designers should be threatened by this new design, because flops do mean something.

    Companies do some shady things to make their flop ratings look higher than they really are and some architectures have really low efficiency. That does not mean that flops are no longer important, it just means many flop ratings are deceitful.
     
  8. SedentaryJourney

    Regular

    Joined:
    Mar 13, 2003
    Messages:
    478
    Likes Received:
    28
    GP code? I have the answer!

    Here it is: General purpose code=bad cell code...that is all. :p
     
  9. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Um, from all the documentation available, this isn't true. The SPUs do have limited autonomy but must be configured and set up via the PPE in order to run any code. The SPUs are not Turing complete.

    Aaron Spink
    speaking for myself inc.
     
  10. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    You are making large assumptions about programming efficiency that you can't reasonably make. The efficiency of Cell will likely be reasonably low. The programming model IS quite complex. Structured data access is a lot harder to do than to talk about.

    The primary issue is that anything you do to speed up a Cell-style processor will be equally applicable to a non-Cell-style processor, but not vice versa.

    Aaron Spink
    speaking for myself inc.
     
  11. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    There are people that think Intel and HP got together to design a new CPU and produced something useless. Do you honestly believe them that stupid and incompetent?

    There are people that think IBM, Apple, and Motorola got together to design a new CPU and produced something useless. Do you honestly believe them that stupid and incompetent?

    Just because several big companies get together doesn't mean they will produce something spectacular.

    Cell is an interesting design, but it does have some serious issues and compromises that may be detrimental. To gloss over these and assume that it will all be overcome is folly.

    Aaron Spink
    speaking for myself inc.
     
  12. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Pretty bad since all data access for an SPU has to be DMA'd in.

    The explicit memory management has both pluses and minuses. The primary minus being a complicated programming model and an inefficient use of storage space.
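    The standard answer to that complexity is double buffering: while the DMA engine fills one local-store buffer, the SPU computes on the other. A toy sketch of the pattern in plain C — memcpy stands in for a real DMA transfer (on Cell this would be MFC DMA commands plus tag waits), and CHUNK, process_chunk, and stream are illustrative names, not from any SDK:

    ```c
    #include <assert.h>
    #include <string.h>

    #define CHUNK 4

    static void process_chunk(int *buf)        /* toy kernel: double each element */
    {
        for (int i = 0; i < CHUNK; i++) buf[i] *= 2;
    }

    static void stream(int *main_mem, int nchunks)
    {
        int ls[2][CHUNK];                      /* two "local store" buffers */
        memcpy(ls[0], main_mem, sizeof ls[0]); /* prime the first transfer */
        for (int c = 0; c < nchunks; c++) {
            int cur = c & 1, nxt = cur ^ 1;
            if (c + 1 < nchunks)               /* start "DMA" of the next chunk... */
                memcpy(ls[nxt], main_mem + (c + 1) * CHUNK, sizeof ls[nxt]);
            process_chunk(ls[cur]);            /* ...while computing on this one */
            memcpy(main_mem + c * CHUNK, ls[cur], sizeof ls[cur]);
        }
    }

    int main(void)
    {
        int d[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        stream(d, 2);
        assert(d[0] == 2 && d[3] == 8 && d[7] == 16);
        return 0;
    }
    ```

    The storage-space cost Aaron mentions is visible even in this toy: two buffers are resident in "local store" to process one stream, before any shared data or code is accounted for.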

    Aaron Spink
    speaking for myself inc.
     
  13. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Which is why there have been some very catastrophic bridge failures and collapses over the years. Bridges involve some complicated physics and physical interactions with the world. You have to make sure that simple things like 20 MPH winds won't cause oscillations which result in the bridge tearing itself apart, as has been captured on film.

    You're pretty much describing DAXPY workloads, which just about any processor can churn through. The problem is that most real workloads may have sequences that will act like DAXPY workloads, but those sequences are surrounded by much more complex code.
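    For reference, DAXPY (y = a*x + y) really is as simple as a kernel gets — one multiply-add per element, perfectly predictable access, no branches beyond the loop — which is why it's a poor proxy for whole programs:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* The textbook streaming kernel: trivially vectorisable, trivially
       prefetchable.  Real workloads wrap stretches like this in far
       messier control flow and data access. */
    static void daxpy(size_t n, double a, const double *x, double *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        double x[3] = {1, 2, 3}, y[3] = {10, 10, 10};
        daxpy(3, 2.0, x, y);
        assert(y[0] == 12 && y[1] == 14 && y[2] == 16);
        return 0;
    }
    ```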

    You are assuming that the hardware will be suited for the codes which may or may not be correct. You can't just make an array inversion into not an array inversion.

    Cell has a complex programming model with no dynamic data access between the processing elements, nor direct access to data storage, requiring programmers to jump through myriad hoops just to get data into the SPU. It will also require private copies in each SPU of any data that is used by more than one SPU. It doesn't allow an efficient method of sharing data structures, nor does it allow more than one SPU to update a data structure in an efficient and programmer-friendly manner.

    Aaron Spink
    speaking for myself inc.
     
  14. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Yes, in the same way that XCPU can have infinite threads. The reality is that you get 2 general purpose threads and 7 attached media processors.

    Aaron Spink
    speaking for myself inc.
     
  15. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    I think you might be overestimating the amount of number crunching involved. You might want to take a look at some of the HPC applications out there. Even in the most physics-heavy HPC codes, there is a surprising amount of what could not be described as number crunching code.

    Aaron Spink
    speaking for myself inc.
     
  16. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Then you haven't been paying attention...

    Then by your definition, a streaming processor is useless because it processes ephemeral vapor by passing it through various processing elements to be spit into the ether.

    SPEs have NO access to external memory. All data into and out of an SPE must be moved explicitly by a DMA engine. There is no method for access to external memory from within an SPE program except by configuring a DMA descriptor and loading it into the DMA engine.

    The SPEs ARE streaming processors, being practically the very definition of a stream processor. They have severely limited integer, logical, and control flow capabilities. To describe them as general purpose mischaracterises them and does an injustice to their designed intent.

    Aaron Spink
    speaking for myself inc.
     
  17. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    I believe you need to take a couple courses in computer architecture and maybe study up on the technical papers. Until then, you might not want to try to correct other people who have a little better grasp on the technical issues.

    Aaron Spink
    speaking for myself inc.
     
  18. London Geezer

    Legend Subscriber

    Joined:
    Apr 13, 2002
    Messages:
    24,151
    Likes Received:
    10,297
    Wow Aaron, the last 9 posts were yours!! Would be nice to have a "merge" option on these boards.
     
  19. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    Actually, dual threading supports at most a 100% performance advantage. Real-world workload performance will, however, vary. But there are certain classes of operations, mostly surrounding the traversal of linked lists, which will achieve a 100% performance increase with dual threading.
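    The linked-list case is worth spelling out: traversal is one long dependency chain, so each cache miss stalls the core completely, and a second hardware thread can run its own chain in those stall cycles. A minimal illustration of the chain (every load of `n->next` must complete before the next iteration's load can even issue):

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct node { int value; struct node *next; };

    /* Pointer chasing: the address of each load comes from the previous
       load, so there is no instruction-level parallelism for the core
       to exploit on its own — which is exactly where SMT can approach
       its theoretical 100% gain by overlapping two such chains. */
    static int sum_list(const struct node *n)
    {
        int s = 0;
        while (n) {
            s += n->value;
            n = n->next;
        }
        return s;
    }

    int main(void)
    {
        struct node c = {3, NULL}, b = {2, &c}, a = {1, &b};
        assert(sum_list(&a) == 6);
        return 0;
    }
    ```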

    And Tera supported 128 threads per core. Where are they now?

    Aaron Spink
    speaking for myself inc.
     
  20. Brimstone

    Brimstone B3D Shockwave Rider
    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    1,835
    Likes Received:
    11
    Aaron, do you have an opinion/insight on the impact Transmeta's LongRun2 technology will have on a multicore design like CELL? It's unlike the single fat-core design of the Crusoe processor.
     
