IBM's Ashwini Nanda on Cell blades, raycasting, and more

Discussion in 'Console Technology' started by one, Aug 7, 2005.

  1. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    If it had a dual-core CPU, you would dedicate the second core to raycasting. Why on earth would you want to do JPEG encoding on it?

    The G5 workstation doesn't need jpeg encoding.
     
  2. dskneo

    Regular

    Joined:
    Jul 25, 2005
    Messages:
    816
    Likes Received:
    298
    hi :)
    Look, let's forget the benchmark scenario. The only lesson we gamers can take from that benchmark is that we could see graphics output at that speed in a 1x7 SPE configuration, because the PS3 (like any other console or PC game) deals with raw images and does not have to compress them to JPEG... We will have a frame buffer to output the raw graphics directly to the screen.

    That's why I said before that this benchmark is the same as real time minus the JPEG compression.
    When you say the JPEG is speeding up the benchmark, that's not true: the graphics are rendered at raw quality, and anything you do to reduce the quality after that raw image is rendered only takes CPU time, ergo, slows it down.
    The natural output is raw; everything else done to that image just consumes cycles.
    If the natural output were JPEG, the benchmark would not need an extra SPE to do it.

    But if you just want to look at the benchmark scenario, forgetting that we won't need the JPEG stuff, then yes, that benchmark would run slower in 1x7 because it involves JPEG compression. How much slower, we don't know... my A64 compresses JPEG in a blink.
    How useful is this information to me? Not at all... I play raw images just like every other console in the world.
     
  3. ralexand

    Regular

    Joined:
    May 22, 2005
    Messages:
    438
    Likes Received:
    0
    I still don't understand the point of the JPEG. Why not just render it to the screen and display it?
     
  4. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,709
    Likes Received:
    145
    By isolating JPEG compression plus PPE flow control (to prevent memory buffer overflow if the SPEs are too fast) on a single SPE, all the overhead is centralized. The other seven SPEs just run with minimal or no overhead. Essentially, the dedicated SPE is there to avoid paying that overhead seven times over. This is not uncommon when programming supercomputers and parallel computers.

    If there is no JPEG compression, the third stage goes away (freeing up the dedicated 8th SPE). People assumed that the SPEs in the 2nd stage will now send the rendered images to the PPE (or RSX?) directly. Assuming the PPE can keep up (in space and time), no major rework is necessary when porting to a Cell with 7 SPEs.
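The split described above can be sketched as a producer/consumer pipeline. This is a minimal illustration only, not IBM's code: Python threads stand in for the raycasting SPEs, `zlib` stands in for the JPEG encoder (the standard library has no JPEG codec), and the bounded queue plays the role of the flow control that keeps fast workers from overflowing the encoder's buffer. All names and sizes (`NUM_WORKERS`, `TILE_SIZE`, `MAX_IN_FLIGHT`, `render_tile`) are hypothetical.

```python
import queue
import threading
import zlib

NUM_WORKERS = 7        # hypothetical stand-ins for the 7 raycasting SPEs
TILE_SIZE = 64 * 64    # hypothetical bytes per rendered tile
MAX_IN_FLIGHT = 4      # bounded buffer: workers block when the encoder falls behind


def render_tile(tile_id):
    """Placeholder for raycasting one tile: returns fake raw pixel bytes."""
    return bytes((tile_id + i) % 256 for i in range(TILE_SIZE))


def worker(tiles, out):
    """Raycasting stage: render tiles until a None sentinel arrives."""
    while True:
        tile_id = tiles.get()
        if tile_id is None:
            out.put(None)            # tell the encoder this worker is done
            return
        out.put((tile_id, render_tile(tile_id)))  # blocks while `out` is full


def encoder(out, num_workers, results):
    """Single dedicated stage: all compression overhead is centralized here."""
    finished = 0
    while finished < num_workers:
        item = out.get()
        if item is None:
            finished += 1
            continue
        tile_id, raw = item
        results[tile_id] = zlib.compress(raw)   # zlib stands in for JPEG


def run_pipeline(num_tiles):
    tiles = queue.Queue()
    out = queue.Queue(maxsize=MAX_IN_FLIGHT)
    for t in range(num_tiles):
        tiles.put(t)
    for _ in range(NUM_WORKERS):
        tiles.put(None)              # one sentinel per worker
    results = {}
    workers = [threading.Thread(target=worker, args=(tiles, out))
               for _ in range(NUM_WORKERS)]
    enc = threading.Thread(target=encoder, args=(out, NUM_WORKERS, results))
    for w in workers:
        w.start()
    enc.start()
    for w in workers:
        w.join()
    enc.join()
    return results
```

In the no-JPEG case, the `encoder` stage disappears and the workers would hand raw tiles straight to whatever consumes them.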

    Since they experienced super-linear improvements with more Cells/higher clock speeds, the problem is still CPU-bound. So the developers will want to keep all 7 SPEs working as much as possible (wouldn't you?). This means that, in the worst case, the 7th SPE will carry some additional code to handle the overhead (if any!).

    The performance number should be pretty close to 50 (47 and above, assuming the 7th SPU is at least 50% utilized for whatever reason). Still respectable.
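The estimate above amounts to linear scaling with the effective number of raycasting SPEs. A minimal sketch of that arithmetic, assuming throughput is proportional to SPE count (the post's own CPU-bound argument); the function name and parameters are illustrative, and since the baseline benchmark figure isn't restated in the post, it is left as an input:

```python
def estimated_performance(baseline, spes_before=7, full_spes_after=6,
                          partial_utilization=0.5):
    """Scale a benchmark figure linearly by the number of SPEs doing raycasting.

    baseline: performance measured with `spes_before` raycasting SPEs
    full_spes_after: SPEs fully dedicated to raycasting after the port
    partial_utilization: fraction of the 7th SPE still available for raycasting
    """
    effective_spes = full_spes_after + partial_utilization
    return baseline * effective_spes / spes_before


# Fraction of the original figure kept in the worst case described above.
retained = estimated_performance(1.0)
print(f"{retained:.3f}")  # prints 0.929
```

With the 7th SPE half available, the port keeps 6.5/7 (about 93%) of whatever the 8-SPE benchmark achieved.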
     
  5. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    The Cell Blade is rendering it, and the G5 is displaying it and handling interaction, etc. The Blade and the G5 are connected over the network, so for the Blade to feed frames to the G5 quickly enough (at real-time rates), compression is required.

    You might be asking why they didn't just hook the Blade up to a monitor and have it render directly to that. But I think much of the point of this demo was also how it was being done over the network. Not everyone who needs this kind of application can have a Cell blade on their desk, or with them.
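Why compression is needed comes down to bandwidth arithmetic. A sketch with loudly hypothetical numbers (the demo's actual resolution, frame rate, and network link are not given in this thread): 720p, 24-bit color, 60 fps, over a nominal 1 Gbit/s link.

```python
def raw_stream_mbit_per_s(width, height, bytes_per_pixel, fps):
    """Bandwidth needed to send uncompressed frames, in megabits per second."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1e6


def min_compression_ratio(required_mbit, link_mbit):
    """Smallest compression ratio that lets the stream fit on the link."""
    return max(1.0, required_mbit / link_mbit)


# Hypothetical settings: 1280x720, 3 bytes per pixel, 60 fps, 1000 Mbit/s link.
need = raw_stream_mbit_per_s(1280, 720, 3, 60)
print(f"raw stream: {need:.1f} Mbit/s")
print(f"minimum compression: {min_compression_ratio(need, 1000):.2f}x")
```

At those assumed settings the raw stream needs about 1327 Mbit/s, more than the link's nominal capacity before any protocol overhead, whereas copying the same frame into a local framebuffer never touches the network.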
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    jvd,

    are you trolling?

    It is hard to take you seriously with the circletimessquare formatting.

    One last time: the JPEG compression is there to put the frame over the network. The G5 equivalent of the JPEG compression is transferring the finished frame to the video card's framebuffer, and that is it. I hope you can agree it won't need a second core to transfer a frame to a framebuffer?
     
  7. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    MfA, would you please post the link to that terrain rendering paper you mentioned?
     
  8. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Here is the citeseer link.

    You might be interested in this paper too, someone else rediscovered the same method in the context of displacement mapping (and generalized it a bit).

    Oops, the second link is dead... I'll search for the paper later.
     
  9. BlueTsunami

    BlueTsunami I laugh at you! HA HA HA!
    Veteran

    Joined:
    May 4, 2005
    Messages:
    1,708
    Likes Received:
    33
    Location:
    In a tiny box
    Ahhhhh OK, this post is a good one... I was confused about how the G5 would handle that same setup (even though it's not exactly the same, since it's being compared to a G5, it's good to know how the G5 goes about doing that). So it's pretty obvious that the G5 doesn't need a second core devoted to that to make it a fair comparison (as jvd stated).

    So is that demo supposed to show that Cell is self-sufficient in doing that task (not needing to transfer to a different device, since it can do it itself) and is much quicker than a G5 to boot?
     
  10. darkblu

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,642
    Likes Received:
    22
    thanks! it'd be great if you manage to find that second one too :cool:
     
  11. Titanio

    Legend

    Joined:
    Dec 1, 2004
    Messages:
    5,670
    Likes Received:
    51
    Hmm? The point is that Cell does need to transfer the frame to another client, since that's the setup IBM designed the app with in mind: a client-server app. In this case the G5 is the client, so it doesn't need to transfer the frame to itself over the network if it's rendering it locally.

    If you wanted to make it fair, you would have the G5 doing image encoding too ;) If the Cell Blade were the client, it could do the rendering locally too, not have to transfer the image over the network, and save that SPE for more raycasting/rendering.
     
  12. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Okay, had to host it myself.
     
  13. scificube

    Regular

    Joined:
    Feb 9, 2005
    Messages:
    836
    Likes Received:
    9
    Thanks MfA, I wanted to check that out too ;)
     
  14. TrungGap

    Regular

    Joined:
    Jun 17, 2005
    Messages:
    578
    Likes Received:
    2
    All this JPG compression stuff is a moot point. They're just trying to isolate the raycasting aspect (i.e., if you take the network bandwidth out of the equation, you get closer to the real number).

    Quaz51 brought up a more important question... so what's going on here?
     
  15. SenatorMonkey

    Newcomer

    Joined:
    May 6, 2005
    Messages:
    21
    Likes Received:
    1
    From the nature of the test, one Cell has 7 SPUs devoted to raycasting. A dual-Cell system doesn't need 2 SPUs doing JPEG compression, so it would likely have 15 SPUs doing raycasting, for a theoretical 2.14x speedup (a 114% improvement). So 108% isn't out of the question. Just highly efficient.
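The numbers above can be checked directly. A small sketch, assuming (as the post does) that raycasting throughput scales linearly with the number of SPEs assigned to it; the function names are illustrative:

```python
def theoretical_speedup(raycast_spes_one_cell, raycast_spes_two_cells):
    """Ideal speedup if raycasting throughput scales with SPE count."""
    return raycast_spes_two_cells / raycast_spes_one_cell


def scaling_efficiency(observed_improvement_pct, ideal_speedup):
    """Observed speedup as a fraction of the ideal one."""
    observed_speedup = 1 + observed_improvement_pct / 100
    return observed_speedup / ideal_speedup


ideal = theoretical_speedup(7, 15)   # one Cell: 7 raycasting SPEs; two Cells: 15
print(f"ideal speedup: {ideal:.2f}x")                              # prints 2.14x
print(f"efficiency at 108%: {scaling_efficiency(108, ideal):.1%}")  # prints 97.1%
```

An observed 108% improvement is a 2.08x speedup, roughly 97% of the 15/7 ideal.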
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.