QPACE - Quantum Chromodynamics on PowerXCell 8i

Discussion in 'CellPerformance@B3D' started by Jawed, Jun 14, 2009.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,853
    Likes Received:
    722
    Location:
    London
    Nice overview of the project:

    http://www.itwm.fhg.de/hpc/workshop/mic/Qpace_(Dirk_Pleiter_-_Desy).pdf

    More details on the hardware:

    http://www.fz-juelich.de/jsc/datapool/cell/eQPACE/pleiter-eqpace-20090209.pdf

    It's a bit dodgy that the first presentation makes its FLOPs/word case based on single-precision math, but still it all looks quite tasty.

    This article is more wordy (pages 25-27):

    http://inside.hlrs.de/pdfs/inSiDE_autumn2008.pdf

    The cooling architecture is interesting: it uses one large coldplate per backplane, i.e. one coldplate per 32 node cards. The cards are cooled solely by conducting heat into the coldplate, which is water-cooled.

    2048 node cards seems to be the limit for a single set of racks, i.e. all networked to run a single program. With each card able to deliver ~100 GFLOPs, that implies a target of ~200 TFLOPs.
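    Spelling out the arithmetic from the two paragraphs above as a back-of-envelope sketch. It takes the quoted figures (2048 node cards, ~100 GFLOPs per card, 32 cards per backplane/coldplate) at face value. The constant and function names are just illustrative.

```python
# Back-of-envelope sizing for one QPACE partition, using only the figures quoted
# in the post above; the names here are illustrative, not from the project.
NODE_CARDS = 2048          # node cards networked to run a single program
GFLOPS_PER_CARD = 100.0    # approximate per-card figure quoted above
CARDS_PER_COLDPLATE = 32   # one water-cooled coldplate per backplane of 32 cards


def aggregate_tflops(cards: int, gflops_per_card: float) -> float:
    """Aggregate throughput in TFLOPs (1 TFLOPs = 1000 GFLOPs)."""
    return cards * gflops_per_card / 1000.0


def coldplates_needed(cards: int, cards_per_plate: int) -> int:
    """Number of backplanes/coldplates, rounding up for a partially filled one."""
    return -(-cards // cards_per_plate)


if __name__ == "__main__":
    print(aggregate_tflops(NODE_CARDS, GFLOPS_PER_CARD))         # 204.8
    print(coldplates_needed(NODE_CARDS, CARDS_PER_COLDPLATE))    # 64
```

    So "~200 TFLOPs" is 204.8 TFLOPs on paper, spread over 64 coldplates.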

    Jawed
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I don't get it. Why bother making such huge investments in a product that has no public roadmap? I mean this not just with regard to this project, which seems nicely executed (direct LS-to-LS DMA with 1us latency sounds cool), but generally.

    Why should anyone bother writing for Cell when nobody knows whether it will still be around in the future? IIRC, there was talk of the 2PPE+32SPE Cell 2 being scrapped. If it wasn't, surely we would have heard something about it by now.

    BTW, I'd like someone to put together something like this based on GPUs. Here's my wish list: 8x (4870 equivalent + 4GB GDDR5) per blade, connected to an FPGA over an HT 3.1 bus, with the FPGA providing QDR IB over fiber optic.
     
  3. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    Probably because the hardware and software meet their budget and operating demands (heat, power and FLOPs). Slide 15 has a small comparison (based on hardware balance).

    Software that runs well on Cell will probably run well everywhere else, because Cell makes such stringent demands on data locality. So porting to other platforms should not be an issue. Many high performance systems use a NUMA-like data access architecture anyway.

    Besides, GPU architectures and programming models are still evolving too. A GPU is only a chip; the entire system (e.g., memory, interconnect) has to be built up around it for general-purpose high performance computing. Cell took those into consideration, as long as programmers are willing to optimize for data locality (i.e., the local store).
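    To make the local-store point concrete, here is a minimal sketch of the usual pattern: stream a large array through two small buffers, compute only on data already copied in, and fetch the next chunk into the other buffer (double buffering). It is plain Python, so the "DMA" is just a slice copy and nothing actually overlaps. The function name and chunk size are hypothetical. On a real SPE the chunk would be sized to fit the 256 KB local store alongside code and stack.

```python
# Hypothetical sketch of the SPE-style "get / compute / put" pattern with double
# buffering. Plain Python: the "DMA" transfers are ordinary slice copies and do
# not overlap with computation; only the data-movement structure is illustrated.
from typing import Callable, List


def stream_through_local_store(data: List[float], chunk: int,
                               kernel: Callable[[float], float]) -> List[float]:
    """Apply `kernel` to every element, touching main memory only in chunk-sized copies."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    out = [0.0] * len(data)
    starts = list(range(0, len(data), chunk))
    if not starts:
        return out
    buffers: List[List[float]] = [[], []]            # two "local store" buffers
    buffers[0] = data[starts[0]:starts[0] + chunk]   # "get" the first chunk
    for i, start in enumerate(starts):
        cur = i % 2
        if i + 1 < len(starts):                      # prefetch next chunk into the other buffer
            nxt = starts[i + 1]
            buffers[1 - cur] = data[nxt:nxt + chunk]
        result = [kernel(x) for x in buffers[cur]]   # compute on local data only
        out[start:start + len(result)] = result      # "put" results back to main memory
    return out
```

    For example, `stream_through_local_store([1.0, 2.0, 3.0, 4.0, 5.0], 2, lambda x: 2 * x)` returns `[2.0, 4.0, 6.0, 8.0, 10.0]`. Code written this way already has explicit, bounded working sets, which is why it tends to port well to NUMA-style machines.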

    I think they can always explore a better solution in the future. But they need answers today (or rather, yesterday, since the project started last year).
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Hmm, they are not evolving much more than Cell. The PS3 launched in late 2006, about the same time as CUDA.
     
  5. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    With Intel's work? I think it could (eventually) become more like programming a CPU. OpenCL should work equally well on Cell, as long as locality is respected.
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Dude, what's this? And how the hell is this related to what I said? Where did Intel/OpenCL come into it?
     
  7. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    Huh? I'm just saying things are changing on the GPU programming model front too. So whatever you invest in programming GPU SKUs will change, just like any other code investment in SPUs. As long as the basic principles still apply, engineers will simply use whatever fits their budget and operating demands.
     
  8. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,192
    Likes Received:
    9,094
    Location:
    Under my bridge
    Code is perpetually changing. I can't think of many industries where you need to keep learning the latest stuff to stay on top as much as software development. Anyone thinking 'I know how to do this, I'll do it this way for the rest of my working life' isn't going to get very far ;)

    Still, is the Cell roadmap active? Or is it a dead-end architecture?
     
  9. Enzyme

    Newcomer

    Joined:
    Nov 15, 2007
    Messages:
    81
    Likes Received:
    1
    Location:
    Belgium
    I believe the roadmap is still active. I don't know how much it has changed since the last time I saw it though (if it has changed at all).

    I'll meet Peter Hofstee next Tuesday. He's gonna talk about Cell and the future of heterogeneous multicore processors in mainstream applications.

    I'll be sure to ask him about the roadmap though, and I'll let you know. :wink:
     
  10. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,192
    Likes Received:
    9,094
    Location:
    Under my bridge
  11. Enzyme

    Newcomer

    Joined:
    Nov 15, 2007
    Messages:
    81
    Likes Received:
    1
    Location:
    Belgium
    Well, I got to talk to Hofstee, and he said there weren't any real plans to scale Cell up to more SPEs or more PPEs in the near future. He did say that they could if they wanted to, but it just wasn't their goal right now.
    They are mostly working on the software front, trying to make programming a lot easier than it is now.
    So hopefully we'll see something really useful coming from them soon.
     
  12. patsu

    Legend

    Joined:
    Jun 25, 2005
    Messages:
    27,614
    Likes Received:
    60
    Did he elaborate on the approach they are taking now (regarding the simplifying-programming part)?
     
  13. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I'd think they would go the OpenCL route. That seems to be their best bet. OpenCL is going to be the runtime of choice for GPUs, and a huge amount of code is going to be written as CL kernels. Porting them to run on Cell should be a simple exercise.
     
  14. Enzyme

    Newcomer

    Joined:
    Nov 15, 2007
    Messages:
    81
    Likes Received:
    1
    Location:
    Belgium
    Yeah, that's indeed one of the approaches they're taking.
    There are some other options, but I don't really know how far I can elaborate on this, because he wasn't really sure himself what he could tell and show. :)
     
  15. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    It is unfortunate that they are not talking about HW upgrades: improvements to the EIB (which originally should have been a switched network rather than a ring bus), to the SPU, to the PPU, etc., not simply more of the latter two; ISA additions; the addition of a cache (an LS1 Cache is even in the CBEA 1.0 specs). Will we ever see CBEA 2.0, or is IBM simply trying to sell current CELL chips [the DP-enhanced model] in blades, plus more and more software stacks, to keep their customers happy?

    I am glad that they keep pushing the software stack that runs on the current CELL; I just hope they are not simply optimizing its performance without a successor in mind...

    Luckily their Rochester Labs are working on newer approaches...

    http://appft.uspto.gov/netacgi/nph-...h;+Eric+Oliver"&RS=IN/"Mejdrich;+Eric+Oliver" (an example of their several VPU patents)

    Inventors: Mejdrich; Eric Oliver; (Rochester, MN) ; Muff; Adam James; (Rochester, MN) ; Tubbs; Matthew Ray; (Rochester, MN)
     
    #15 Panajev2001a, Jun 27, 2009
    Last edited by a moderator: Jun 27, 2009
  16. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    39,192
    Likes Received:
    9,094
    Location:
    Under my bridge
    I guess that, given how well networked chips scale, shrinks and sticking more chips on a mobo are an acceptable way to increase performance for now. They need a large enough market to justify hardware improvements. The better the software side, the more people will use Cell, and the more reason to push it forward. Of course, if they don't develop the hardware enough, GPGPU will overtake it. This also makes one wonder about the PS4, if there won't be a new Cell processor for it!
     
  17. Vitaly Vidmirov

    Newcomer

    Joined:
    Jul 9, 2007
    Messages:
    108
    Likes Received:
    10
    Location:
    Russia
    What's wrong with a ring bus? It has a simple topology and high B/W.
    A few extra cycles of latency? :lol: I can't imagine anyone caring about that.
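    For a rough sense of what "a few extra cycles" amounts to, here is a simplified hop-count model. On a bidirectional ring each transfer takes the shorter way round, so the worst case is half-way round. A full crossbar is one hop between any pair. The 12 stops (PPE, 8 SPEs, memory interface, two I/O interfaces) are my assumption about the EIB's layout. The model also ignores per-hop timing, arbitration, and the fact that the real EIB is built from several unidirectional rings.

```python
# Simplified hop-count model: ring vs. crossbar. Ignores per-hop cycle cost,
# arbitration and contention; the 12-stop figure below is an assumption.
from itertools import permutations
from typing import Tuple


def ring_hops(n: int, src: int, dst: int) -> int:
    """Hops between two stops on a bidirectional ring, taking the shorter way round."""
    d = (dst - src) % n
    return min(d, n - d)


def ring_hop_stats(n: int) -> Tuple[int, float]:
    """Return (worst-case hops, mean hops) over all ordered pairs of distinct stops."""
    hops = [ring_hops(n, s, t) for s, t in permutations(range(n), 2)]
    return max(hops), sum(hops) / len(hops)


if __name__ == "__main__":
    STOPS = 12  # assumed: PPE, 8 SPEs, memory interface, 2 I/O interfaces
    worst, mean = ring_hop_stats(STOPS)
    print(f"ring: worst {worst} hops, mean {mean:.2f} hops; crossbar: 1 hop")
```

    With 12 stops this gives a worst case of 6 hops and a mean of about 3.27. Whether that matters depends on the per-hop cost relative to DMA setup overhead, which this sketch does not model.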
     
