Tokyo Tech Builds First Tesla GPU Based Heterogeneous Cluster To Reach Top 500

Discussion in 'GPGPU Technology & Programming' started by Jawed, Nov 18, 2008.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    SC08—AUSTIN, TX—NOVEMBER 17, 2008—The Tokyo Institute of Technology (Tokyo Tech) today announced a collaboration with NVIDIA to use NVIDIA® Tesla™ GPUs to boost the computational horsepower of its TSUBAME supercomputer. Through the addition of 170 Tesla S1070 1U systems, the TSUBAME supercomputer now delivers nearly 170 TFLOPS of theoretical peak performance, as well as 77.48 TFLOPS of measured Linpack performance, placing it, again, amongst the top ranks in the world’s Top 500 Supercomputers.

    “Tokyo Tech is constantly investigating future computing platforms and it had become clear to us that to make the next major leap in performance, TSUBAME had to adopt GPU computing technologies,” said Satoshi Matsuoka, division director of the Global Scientific Information and Computing Center at Tokyo Tech. “In testing our key applications, the Tesla GPUs delivered speed-ups that we had never seen before, sometimes even orders of magnitude – a tremendous competitive boost for our scientists and engineers in reducing their time to solution.”

    Speaking to the ease of implementation, Matsuoka continued, “The entire upgrade was carried out in 1 week, and the TSUBAME supercomputer remained live throughout. This is an unprecedented feat in top-level supercomputing.”

    “We are honored to partner with Tokyo Tech – world famous for their supercomputing expertise and success,” said Andy Keane, general manager of the GPU Computing business at NVIDIA. “NVIDIA Tesla breaking into the Top 500 marks a milestone in supercomputing history. The massively parallel GPU is now essential for supercomputing centers worldwide.”

    The first to achieve a Top 500 ranking with an NVIDIA Tesla based GPU cluster, Tokyo Tech is one of hundreds of distinguished universities and supercomputing centers that have adopted GPU based solutions for research. Other leading centers include the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Rice University, the University of Heidelberg, the University of Maryland, the Max Planck Institute and the University of North Carolina.

    The Tesla S1070 1U GPU system is based on the NVIDIA CUDA™ parallel architecture. This architecture is accessible through an industry standard C language programming environment that allows developers and researchers to tap into the parallel architecture of the GPU more quickly and easily than any other solution shipping today.
    For more information on NVIDIA Tesla S1070, please visit: www.nvidia.com/object/tesla_s1070

    ---

    Pretty groovy, huh?

    TiT has a load of Clearspeed processors, too, lurking somewhere within Tsubame. Wonder if they have much of a lifetime left.

    http://www.clearspeed.com/newsevents/news/pressreleases/ClearSpeed_Nissho_TokyoTech.php

    Jawed
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    How long do you think it will be before nearly all of the Top 500 are GPU accelerated? I'd say 2-3 years.
     
  3. RudeCurve

    Banned

    Joined:
    Jun 1, 2008
    Messages:
    2,831
    Likes Received:
    0
    It's interesting they did not go with Clearspeed this time around. I'm thinking Nvidia probably gave them a better deal. Clearspeed does have the new CATS 700 1U systems with comparable performance to Nvidia's 1U solution.
     
  4. Rufus

    Newcomer

    Joined:
    Oct 25, 2006
    Messages:
    246
    Likes Received:
    60
    RudeCurve: hadn't heard about that system before, but it makes for a very interesting architecture comparison.

    CATS 700: 1,100GFLOPS DP (? SP), 24GB RAM, 96GB/sec bandwidth
    Tesla S1070: ~333GFLOPS DP (~4,000GFLOPS SP), 16GB RAM, 400GB/sec bandwidth

    The only thing that is equivalent between the two is the amount of RAM. Clearspeed has a huge DP FLOPS advantage, while NV has a huge bandwidth advantage and probably a large SP FLOPS advantage as well.

    Real-world numbers for these two architectures will differ enormously depending on whether the workload is compute bound or bandwidth bound.
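    The compute-bound vs bandwidth-bound distinction above can be sketched with a simple roofline-style model, plugging in the peak-FLOPS and bandwidth figures quoted in this thread. This is purely illustrative; `attainable_gflops` is an invented helper name, not a real API.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline model: delivered performance is capped either by peak
    compute or by memory bandwidth times arithmetic intensity,
    whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

# DP figures quoted above (GFLOPS, GB/s):
cats_700 = {"peak": 1100.0, "bw": 96.0}
tesla_s1070 = {"peak": 333.0, "bw": 400.0}

# A streaming kernel like DAXPY moves ~24 bytes per 2 flops:
intensity = 2 / 24
print(attainable_gflops(cats_700["peak"], cats_700["bw"], intensity))        # 8.0
print(attainable_gflops(tesla_s1070["peak"], tesla_s1070["bw"], intensity))  # ~33.3
```

    At such low arithmetic intensity both parts are bandwidth bound, and Tesla's bandwidth advantage wins despite Clearspeed's roughly 3x DP peak; a compute-dense kernel would flip the comparison.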

    Edit: also I wonder how long before AMD gets ATI onto the list.
     
    #4 Rufus, Nov 18, 2008
    Last edited by a moderator: Nov 18, 2008
  5. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,749
    Likes Received:
    127
    Location:
    Taiwan
    I don't know about the architecture of the Clearspeed processors. But GPUs have relatively small internal memory (including registers and shared memory), so many workloads are going to be more bandwidth bound than on a normal CPU. If Clearspeed suffers from a similar problem, then I'd say the bandwidth advantage is probably quite important.
     
  6. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    Ooh, Linpack on GPUs! I didn't even know they had a package for that. I wish they'd make it available, just for novelty's sake..

    I thought the lack of ECC memory was an obstacle to using GPUs in production?
     
  7. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    3,984
    Likes Received:
    34
    That'd be quite the novelty, producing incorrect results and then burning out the GPU in minutes :p

    It is an obstacle, but not an insurmountable one.

    Look at the GPU client for F@H. Frequent checkpoints with result verification are the answer here. Some performance is lost, of course.
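    The checkpoint-with-verification idea can be sketched in a few lines: run each work unit redundantly and accept a result only when the runs agree. A toy, not the actual F@H client logic; `checked` and its arguments are invented names.

```python
def checked(compute, unit, runs=2):
    """Run the same work unit several times and accept the result only
    if all runs agree -- a simplified stand-in for checkpoint-and-verify
    on hardware without ECC. Names here are illustrative only."""
    results = [compute(unit) for _ in range(runs)]
    if all(r == results[0] for r in results[1:]):
        return results[0]
    # Disagreement implies a silent error somewhere: roll back to the
    # last checkpoint and redo the unit instead of shipping a bad result.
    raise RuntimeError("verification failed; recompute from checkpoint")

print(checked(lambda x: x * x, 7))  # 49
```

    The cost is the redundant pass (half the throughput at runs=2), which is where the lost performance comes from.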
     
  8. bowman

    Newcomer

    Joined:
    Apr 24, 2008
    Messages:
    141
    Likes Received:
    0
    Really? The so-called 'Intel Burn Test' (Linpack) runs just fine on a single processor, and reports correct FLOPS figures along with correct mathematical results as long as the chip isn't overclocked beyond stability (which makes it a nice stability test). Even if it's made for clusters, shouldn't it be possible to run it on a single board as well?

    I really want to test the 8800GTX and get the real numbers now :lol: But I guess this is perhaps a 64-bit, GT200-only implementation..
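    What Linpack measures can be reproduced in miniature: solve a dense system, time it, count roughly 2/3·n³ flops, and verify the residual. A pure-Python toy under those assumptions, nothing like the optimized HPL code the Top 500 runs.

```python
import random
import time

def solve(A, b):
    """Gaussian elimination with partial pivoting (on copies of A, b).
    This O(n^3) factor-and-solve is the work that Linpack times."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

def residual(A, x, b):
    """Max-norm of A@x - b; Linpack checks a scaled residual like this
    to confirm the timed run actually produced a correct answer."""
    n = len(b)
    return max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))

random.seed(0)
n = 60
A = [[random.random() for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in A]          # exact solution is all ones
t0 = time.perf_counter()
x = solve(A, b)
elapsed = time.perf_counter() - t0
flops = 2.0 / 3.0 * n ** 3           # standard LU flop count
print(f"residual={residual(A, x, b):.1e}  ~{flops / elapsed / 1e6:.1f} MFLOPS")
```

    A tiny residual plus the flop count over elapsed time is essentially the Linpack score; on a GPU only the inner elimination loops change, not the bookkeeping.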
     
  9. ShaidarHaran

    ShaidarHaran hardware monkey
    Veteran

    Joined:
    Mar 31, 2007
    Messages:
    3,984
    Likes Received:
    34
    My point is that GPUs were not designed with such precision in mind. Where rounding errors are unacceptable in the CPU arena, they are a fact of life for GPUs.
     
  10. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,749
    Likes Received:
    127
    Location:
    Taiwan
    Whether the GPU was designed with that precision in mind is not important. What's important is whether it is designed with that precision.

    For example, NVIDIA claims that GT200 supports full IEEE 754 precision when doing double precision, and that some single-precision operations (basically add and multiply) have full precision as well. These are probably good enough for some uses, depending on what you want to do and on your algorithms.

    ECC is another problem when you are using a lot of devices in parallel, operating continuously for a long time. This is actually something NVIDIA can do to differentiate Tesla from consumer-level hardware.
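    Without ECC, one software fallback is to checksum the result buffers of redundant runs, so that silent memory corruption is at least detected (though not corrected, as hardware ECC would). A standard-library sketch under that assumption; this is not any vendor's actual mechanism.

```python
import hashlib
import struct

def checksum(values):
    """Hash the raw bytes of a buffer of doubles. Comparing checksums
    from two redundant runs detects silent corruption after the fact --
    a software stand-in for ECC, paying with recomputation instead of
    in-flight correction."""
    h = hashlib.sha256()
    for v in values:
        h.update(struct.pack("<d", v))
    return h.hexdigest()

run_a = [0.1 * i for i in range(1000)]
run_b = list(run_a)
run_b[500] += 2 ** -40                       # simulate a flipped low-order bit
print(checksum(run_a) == checksum(run_b))    # False: corruption detected
```

    Even a single-bit difference changes the digest, so the check is cheap to make and hard to fool; the expensive part is rerunning the work when it fails.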
     
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    F@H operates in a way that makes it quite different from an HPC center.
    The computation is done on a wide swath of hardware, for which:
    1) maintenance of said hardware is not handled by F@H
    2) F@H does not pay for the maintenance of said hardware
    3) there's mostly no physical plant, and lower utility costs
    4) the economic model, such as it is, is different from what a supercomputer run for a proprietary client might have

    I haven't seen figures for how much of F@H's FLOPS finally resolve into verified-result FLOPS.
    I suppose we'll see. Some workloads won't mind the error, and in some cases the lowered yield due to checkpointing might not be prohibitive.
    A university with an academic interest in GPGPU is not the primary test of the extensibility of the concept.

    A flood of crap FLOPs that happens to be free doesn't look the same once it is constrained by physical and financial limits, has to be maintained, and ceases to be free.
     