RapidMind doubled Cell performance over IBM handcode

Discussion in 'CellPerformance@B3D' started by Terarrim, Jun 15, 2007.

  1. Terarrim

    Newcomer

    Joined:
    Jun 12, 2007
    Messages:
    177
    Likes Received:
    0
    After snooping around for Cell news I found this article. It was actually about Google's acquisition of a multicore software development company.

    Anyway, further down I found this to be very interesting:

    http://www.hpcwire.com/hpc/1613242.html
     
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,118
    Likes Received:
    2,860
    Location:
    Well within 3d
    Does that mean that Peakstream's compiler is that good or that the guys who hand-coded IBM's renderer weren't the best at it?
    Hand-coded does not have to equal well-optimized.

    It's not every day that we see compiled code beating hand-coded programs by that margin, so I wonder what the specifics were behind such a showdown.
     
  3. phat

    Regular

    Joined:
    Feb 13, 2002
    Messages:
    496
    Likes Received:
    3
    Location:
    Waterloo, ON Canada
    I sat beside a couple of RapidMind guys at dinner yesterday. They seem to be a bright bunch. The guy to my left was working on parallelizing raytracing on Cell and GPUs, looking for realtime feasible implementations of traditionally offline techniques.
     
  4. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    I agree! But I want to ask something...

    It seems that Cell performance is designed to be very predictable (some Berkeley paper also mentioned that they could draft an algorithm on paper and estimate its run-time performance rather accurately). This trait _should_ help compilers become more effective at scheduling and optimizing for performance. If so, for what class of problems? If not, why?
     
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,118
    Likes Received:
    2,860
    Location:
    Well within 3d
    Cell's more deterministic execution model should help compilers, but the simplified view of memory and latencies would be helpful for hand-coders as well.

    On the other hand, the more concurrent execution model can be a roadblock for a coder and compiler alike. If the compiler gets it right though, it should outdo a human who is overwhelmed by the complexity.
    I'm curious what bottleneck the compiler worked around that the human coders missed.

    It is hopefully also the case that the hand-coded and compiled versions were based on equivalent source code.

    It's also hard to judge the quality of the renderer in question if it's not compared to other software renderers.
     
  6. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    My guess is that they just found a better way to organise the work by organising the data in a clever way and splitting the tasks over the SPUs in a more efficient manner.
    There are quite a few C-extensions to the compiler that let you take advantage of the "altivec"-functions of both the PPU and SPUs as well as more SPU-unique functions such as dynamic branch prediction etc., so you can use the HW pretty efficiently without needing to get your hands dirty in assembler.

    I also guess these guys regularly study the output of the compiler to see where they would benefit from loop unrolling, inlined functions etc. My experience is that you can improve efficiency quite a bit just by doing that. There are a few rules of thumb you can apply that will really help the compiler.

    I personally think that the pieces written in assembler should be kept to a minimum, because large pieces of code written in assembler make you less inclined to make changes in that code, as it usually is very cumbersome. Perhaps that was what happened to the IBM guys in this case. Maybe they ended up with a sub-optimal solution because they had to write a lot of the code before the compilers were up to snuff and got themselves limited by that code?
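
As a rough illustration of the loop unrolling mentioned above, here is a minimal sketch in portable C++. It is not taken from the IBM or RapidMind code; the function names `dot_simple` and `dot_unrolled4` are hypothetical, and the unroll factor of four is an arbitrary choice for the example. The idea is that several independent accumulators give the compiler more freedom to schedule and vectorize the loop body.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Straightforward version: every iteration depends on the previous
// value of `sum`, so the additions form one long dependency chain.
float dot_simple(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Manually unrolled by four (hypothetical example, not the IBM code).
// The four accumulators are independent of each other, which shortens
// the dependency chain and maps naturally onto 4-wide SIMD registers.
float dot_unrolled4(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    float sum = (s0 + s1) + (s2 + s3);
    // Handle the leftover elements when n is not a multiple of four.
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Note that the unrolled version adds the products in a different order, so for general floating-point input the two results can differ in the last bits. Whether unrolling actually pays off depends on the compiler and the target, which is why checking the generated code is worthwhile.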
     
  7. one

    one Unruly Member
    Veteran

    Joined:
    Jul 26, 2004
    Messages:
    4,823
    Likes Received:
    153
    Location:
    Minato-ku, Tokyo
    If a workload is complex enough, a runtime scheduler on the PPU (or an SPU) can be cleverer than a hardcoded task launcher. Does the RapidMind framework run on the PPU, or on the SPEs like SPURS? Or is it a static compiler?
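
To make the scheduler point concrete, here is a small model in C++. It is only a sketch under stated assumptions: it does not describe how RapidMind or SPURS actually work, the names `static_makespan` and `dynamic_makespan` are hypothetical, and task costs are abstract time units. It compares a fixed up-front split of tasks with a scheduler that hands the next task to whichever worker becomes free first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Finish time when the task list is cut into fixed contiguous blocks,
// one block per worker, before any work starts (a "hardcoded launcher").
int static_makespan(const std::vector<int>& costs, std::size_t workers) {
    assert(workers > 0);
    const std::size_t n = costs.size();
    const std::size_t block = (n + workers - 1) / workers;  // rounded up
    int worst = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        int busy = 0;
        const std::size_t end = std::min(n, (w + 1) * block);
        for (std::size_t i = w * block; i < end; ++i) {
            busy += costs[i];
        }
        worst = std::max(worst, busy);
    }
    return worst;
}

// Finish time when each task, in order, goes to the worker that is
// free earliest (a simple model of a runtime scheduler or work queue).
int dynamic_makespan(const std::vector<int>& costs, std::size_t workers) {
    assert(workers > 0);
    std::vector<int> busy_until(workers, 0);
    for (int c : costs) {
        auto next_free = std::min_element(busy_until.begin(), busy_until.end());
        *next_free += c;
    }
    return *std::max_element(busy_until.begin(), busy_until.end());
}
```

With uneven task costs such as `{9, 9, 1, 1, 1, 1, 1, 1}` on two workers, the fixed split leaves one worker with both expensive tasks while the other idles, whereas the dynamic assignment balances the load. The model ignores scheduling overhead and data transfer, both of which matter on real hardware.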
     
  8. Terarrim

    Newcomer

    Joined:
    Jun 12, 2007
    Messages:
    177
    Likes Received:
    0
    Just a couple of thoughts.

    Could this type of product help in games?

    For example, the Killzone devs have mentioned the use of parallel processing of geometry.

    Could RapidMind speed up this type of task?

    If it could, could it work with Edge (given that it's accepted that the Killzone devs have had a large part to play in its development)?

    Also, looking at the post above, i.e. using RapidMind for raytracing on Cell + GPUs:

    Since Cell can "talk" to the GPU, could something like RapidMind (with modification?) allow some type of parallel processing between Cell and RSX?

    I know the RSX is a different beast from the SPEs. However, if it's speeding up the Cell by optimising code, couldn't it in essence do the same with the instructions that the Cell sends to RSX?
     
  9. Crossbar

    Veteran

    Joined:
    Feb 8, 2006
    Messages:
    1,821
    Likes Received:
    12
    Here you have some hard details of that specific render implementation.

    http://www.rapidmind.net/pdfs/RapidMindCellPorting.pdf

    Turns out that the IBM code was not really heavily tuned either. Maybe we shouldn't jump to any conclusions from just this example. The code looks neat though.
     
  10. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    I'm curious, but didn't Terrasoft announce that an evaluation version of Rapidmind would be available for YDL? Last I checked it still wasn't available, but I may have missed it. Anyone know what the story is there?
     
  11. Truespeed

    Banned

    Joined:
    Jun 10, 2007
    Messages:
    46
    Likes Received:
    0
    The PDF is impressive and their sample applications virtually sell the product by themselves. If what they're saying is true, then porting to the PS3 just became a Sunday stroll, if their code examples are any indication of the ease of distributed processing using their RapidMind product. What impressed me was how they were able to surpass the performance of C/PS3 intrinsics using C++/RapidMind without needing to use any PS3 intrinsics code whatsoever. Also, their code was not only considerably smaller, but also easier to optimize.

    Now, I'm not one to speculate, but given the recent and dramatic increase in performance and visual quality of Rainbow Six: Vegas on the PS3, one has to wonder if Ubisoft may be a new customer (RapidMind and Ubisoft are both based in Canada).
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,118
    Likes Received:
    2,860
    Location:
    Well within 3d
    I like the simplicity and ease of use argument.

    The performance comparison is playing dirty, though.
    It's not apples to apples when they omitted a pretty significant optimization from the IBM side.

    It's too bad there wasn't a hand-coded unrolled version for a full comparison.
    If the trend follows, it would only be slightly slower, though much easier to implement.
     
  13. Truespeed

    Banned

    Joined:
    Jun 10, 2007
    Messages:
    46
    Likes Received:
    0
    There's no question a hand-optimized version with unrolled loops would have slaughtered their compiler-generated result, but this gives hope for the multi-platform companies that use the 360 as their lead SKU and won't optimize for the Cell.
     
  14. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    3,376
    Likes Received:
    1,999
    Location:
    Barcelona Spain