QPACE - Quantum Chromodynamics on PowerXCell 8i


Jawed
14-Jun-2009, 18:35
Nice overview of the project:

http://www.itwm.fhg.de/hpc/workshop/mic/Qpace_(Dirk_Pleiter_-_Desy).pdf

More details on the hardware:

http://www.fz-juelich.de/jsc/datapool/cell/eQPACE/pleiter-eqpace-20090209.pdf

It's a bit dodgy that the first presentation makes its FLOPs/word case based on single-precision math, but still it all looks quite tasty.

This article is more wordy (pages 25-27):

http://inside.hlrs.de/pdfs/inSiDE_autumn2008.pdf

The cooling architecture is interesting as it uses a large coldplate per backplane, i.e. one coldplate per 32 node cards. The cards are cooled solely by conduction of heat into the coldplate which is water-cooled.

2048 node cards seem to be the limit for a single set of racks, i.e. all networked to run a single program. With each card able to deliver 100 GFLOPs, that points to a target of ~200 TFLOPs.
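The ~200 TFLOPs figure is just the product of the two numbers quoted above. A quick sketch of the arithmetic, assuming every one of the 2048 cards sits on a 32-card backplane as described; the names are made up for illustration:

```python
# Figures quoted in the post; the names are illustrative only.
NODE_CARDS = 2048          # maximum cards in a single networked installation
GFLOPS_PER_CARD = 100      # per-card peak quoted in the post
CARDS_PER_BACKPLANE = 32   # one coldplate per backplane


def peak_tflops(cards=NODE_CARDS, gflops_per_card=GFLOPS_PER_CARD):
    """Aggregate peak in TFLOPs (1 TFLOP = 1000 GFLOPs)."""
    return cards * gflops_per_card / 1000.0


def backplanes(cards=NODE_CARDS, per_backplane=CARDS_PER_BACKPLANE):
    """Backplanes (and coldplates) needed, rounding up."""
    return -(-cards // per_backplane)


print(peak_tflops())  # 204.8
print(backplanes())   # 64
```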

Jawed

rpg.314
14-Jun-2009, 21:18
I don't get it. Why bother making such huge investments based on a product that has no public roadmap into the future? Not just with regard to this project, which seems a nicely executed one (direct LS to LS DMA with 1us latency sounds cool), but generally.

Why should anyone bother to write for Cell when nobody knows whether it will be around in the future? IIRC, there was talk of the 2PPE+32SPE Cell2 being scrubbed. If it wasn't, surely something would have been heard about it by now.

BTW, I'd like someone to put together something like this based on GPUs. Here's my wish list: 8x (4870 equivalent + 4GB GDDR5) per blade, connected to an FPGA over the HT3.1 bus, with the FPGA providing QDR IB over fiber optic.

patsu
14-Jun-2009, 23:03
Probably because the hardware and software meet their budget and operating demand (heat, power and FLOP-wise). Slide 15 has a little comparison (based on balance of h/w).

Software that runs well on Cell can probably run well everywhere else due to its stringent demands on data locality. So porting to other platforms should not be an issue. Many high performance systems use a NUMA-like data access architecture anyway.

Besides, GPU architecture and programming model are still evolving too. It's only a chip. The entire system (e.g., memory, interconnect) needs to be built up for general purpose high performance computing. Cell took those into consideration, as long as the programmers are willing to optimize for data locality (i.e., local store).
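To illustrate the data-locality point: code written for a local store tends to stream data through a small working buffer, and that same structure is cache-friendly elsewhere. A minimal Python sketch, assuming a 256 KB local store (a figure not given in this thread), a quarter of it reserved for data (an arbitrary split), and a list slice standing in for a DMA transfer. All names are hypothetical:

```python
LOCAL_STORE_BYTES = 256 * 1024  # assumed local store size; not stated in the thread
BYTES_PER_WORD = 8              # double precision
# Assumption: only a quarter of the local store holds data; code and stack take the rest.
CHUNK_WORDS = (LOCAL_STORE_BYTES // 4) // BYTES_PER_WORD


def chunked_sum_of_squares(data, chunk_words=CHUNK_WORDS):
    """Process `data` one local-store-sized chunk at a time."""
    total = 0.0
    for start in range(0, len(data), chunk_words):
        # The slice copy stands in for a DMA transfer into the local buffer.
        local = data[start:start + chunk_words]
        total += sum(x * x for x in local)
    return total


print(chunked_sum_of_squares([1.0, 2.0, 3.0], chunk_words=2))  # 14.0
```

The loop body only ever touches the current chunk, which is what makes the same code easy to move to a cache-based CPU or a GPU's on-chip memory.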

I think they can always explore a better solution in the future. But they need answers today (or rather, yesterday, since the project started last year).

rpg.314
15-Jun-2009, 07:38
"Besides, GPU architecture and programming model are still evolving too."

Hmm, they are not evolving much more than Cell. PS3 launched in late 2006, about the same time as CUDA.

patsu
15-Jun-2009, 07:43
With Intel's work? I think it can be more like programming a CPU (eventually). OpenCL should work equally well on Cell, as long as locality is observed.

rpg.314
15-Jun-2009, 18:07
"With Intel's work? I think it can be more like programming a CPU (eventually). OpenCL should work equally well on Cell, as long as locality is observed."

Dude, what's this? And how the hell is this related to what I said? Where did Intel/OpenCL come into it?

patsu
15-Jun-2009, 20:20
Huh? I'm just saying things are changing on the GPU programming model front also. So whatever you invested in programming GPU SKUs will change like any other code investment in SPUs. As long as the basic principles are still applicable, the engineers will simply use whatever works for their budget and operating demand.

Shifty Geezer
16-Jun-2009, 14:50
Code is perpetually changing. I can't think of many industries where you need to keep learning the latest stuff to stay on top as much as software development. Anyone thinking 'I know how to do this, I'll do it this way for the rest of my working life' isn't going to get very far ;)

Still, is the roadmap for Cell still active? Or is it a dead-end architecture?

Enzyme
16-Jun-2009, 15:25
I believe the roadmap is still active. I don't know how much it has changed since the last time I saw it though (if it has changed at all).

I'll meet Peter Hofstee next Tuesday. He's gonna talk about Cell and the future of heterogeneous multicore processors in mainstream applications.

I'll be sure to ask him about the roadmap though, and I'll let you know. :wink:

Shifty Geezer
16-Jun-2009, 15:56
Good job!

Enzyme
26-Jun-2009, 15:07
Well, I got to talk to Hofstee and he said that there weren't any real plans to scale up the Cell to more SPEs or more PPEs in the near future. He did say that they could if they wanted to though, but it just wasn't their goal right now.
They are mostly working on the software front, trying to make the programming a lot easier than it is now.
So hopefully we'll see something really useful coming from them real soon.

patsu
26-Jun-2009, 16:50
Did he elaborate on the approach they are taking now? (regarding the part about simplifying programming)

rpg.314
26-Jun-2009, 18:00
I'd think that would go the OpenCL route. That seems to be their best bet. OpenCL is going to be the runtime of choice for GPUs and there's going to be a huge amount of code written as CL kernels. Porting them to run on the Cell should be a simple exercise.

Enzyme
26-Jun-2009, 22:04
Yeah, that's indeed one of the approaches they're taking.
There are some other options, but I don't really know how far I can elaborate on this, because he wasn't really sure himself about what he could tell and show. :)

Panajev2001a
27-Jun-2009, 19:58
"Yeah, that's indeed one of the approaches they're taking.
There are some other options, but I don't really know how far I can elaborate on this, because he wasn't really sure himself about what he could tell and show. :)"

It is unfortunate that they are not talking about HW upgrades... improvements to the EIB (which originally should have been a switched network and not a ring bus), to the SPU, to the PPU, etc... not simply more of the latter two... ISA additions, addition of cache (LS1 Cache is even in the CBEA 1.0 specs... will we ever see CBEA 2.0, or is IBM simply trying to sell current CELL [DP enhanced model] chips in blades and more and more software stacks to keep their customers happy?)...

I am glad that they keep pushing the software stack that runs on the current CELL, I just hope they are not simply trying to optimize its performance without a successor in mind...

Luckily their Rochester Labs are working on newer approaches...

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PG01&s1=%22Mejdrich%3B+Eric+Oliver%22.IN.&OS=IN/%22Mejdrich;+Eric+Oliver%22&RS=IN/%22Mejdrich;+Eric+Oliver%22 (an example of their several VPU patents)

Inventors: Mejdrich; Eric Oliver; (Rochester, MN) ; Muff; Adam James; (Rochester, MN) ; Tubbs; Matthew Ray; (Rochester, MN)

Shifty Geezer
28-Jun-2009, 09:25
I guess that, because networked chips scale so well, at the moment die shrinks and sticking more chips on a mobo are an acceptable way to increase performance. They need a large enough market to justify hardware improvements. The better the software side, the more people will use Cell, and the more reason there is to push it forwards. Of course if they don't develop the hardware enough, GPGPU will overtake it. This also makes one wonder about PS4, if there won't be a new Cell processor for it!

Vitaly Vidmirov
29-Jun-2009, 13:41
"improvements to the EIB (which originally should have been a switched network and not a ring bus)"

What's wrong with a ring bus? It has a simple topology and high B/W.
A few extra cycles of latency? :lol: I can't imagine who cares about that.
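For a rough sense of what "a few extra cycles" means: on a bidirectional ring the shortest path between two stops is at most half the ring, against a single hop through an ideal switch. A small Python sketch, assuming 12 stops on the ring and counting only hops (both are assumptions, not figures from this thread); the function names are made up:

```python
def ring_hops(src, dst, stops):
    """Shortest hop count between two stops on a bidirectional ring."""
    d = abs(src - dst) % stops
    return min(d, stops - d)


def average_ring_hops(stops):
    """Mean shortest-path hop count over all distinct (src, dst) pairs."""
    pairs = [(s, d) for s in range(stops) for d in range(stops) if s != d]
    return sum(ring_hops(s, d, stops) for s, d in pairs) / len(pairs)


STOPS = 12  # assumed number of ring stops; not stated in the thread
print(max(ring_hops(0, d, STOPS) for d in range(STOPS)))  # 6
print(round(average_ring_hops(STOPS), 2))                 # 3.27
```

Under those assumptions the ring costs about three hops on average and six at worst, so whether it matters comes down to how latency-sensitive the transfers are compared with the bandwidth on offer.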