Tianhe-1A installation - 7168 NVIDIA Tesla M2050s

Discussion in 'GPGPU Technology & Programming' started by Rys, Oct 29, 2010.

  1. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    Well, let's look at some raw numbers:

    Code:
                                                Xeon X5670                     Tesla C2050
    Peak DP GFLOPS                              70.32                           515.2
    TDP (W)                                     95.0                            238.0
~Peak efficiency (DP GFLOPS/W)              0.74                            2.16
    
    Now the same numbers considering that Tianhe-1A has 14,336 Xeon X5670 processors and 7,168 Nvidia Tesla M2050s.

    Code:
                                             Xeon X5670 (x14,336)    Tesla M2050 (x7,168)
    Peak DP TFLOPS                            1008.1                  3693.0
    So basically, GPUs offer 3.66 times as much raw power as CPUs, and about 2.92 times the peak power-efficiency.

    In other words, for GPUs to contribute as much as CPUs to total throughput, their efficiency needs to be at least 1/3.66 that of CPUs. If we generously assume that to be 95%, then those little Fermis only need to reach ~26% efficiency.
    If power-efficiency is the goal, it's a bit harder to assess, because I suspect Xeons have better power management. Still, 95/2.92 ≈ 32.5, so let's say they need to reach about 40% efficiency to be as "green" as CPUs.

    Maybe those back-of-the-napkin calculations don't paint the whole picture, but that sounds doable, doesn't it?
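    For anyone who wants to play with the assumptions, here's the same back-of-the-napkin arithmetic as a small Python sketch (peak-spec numbers only; the M2050's peak matches the C2050's):

    ```python
    # Peak specs from the tables above (not measured performance).
    XEON_DP_GFLOPS, XEON_TDP_W = 70.32, 95.0     # Xeon X5670
    TESLA_DP_GFLOPS, TESLA_TDP_W = 515.2, 238.0  # Tesla C2050 (M2050 peak is the same)
    N_XEONS, N_TESLAS = 14336, 7168              # Tianhe-1A part counts

    cpu_peak_tflops = N_XEONS * XEON_DP_GFLOPS / 1000    # ~1008.1
    gpu_peak_tflops = N_TESLAS * TESLA_DP_GFLOPS / 1000  # ~3693.0

    raw_ratio = gpu_peak_tflops / cpu_peak_tflops        # ~3.66
    eff_ratio = (TESLA_DP_GFLOPS / TESLA_TDP_W) / (XEON_DP_GFLOPS / XEON_TDP_W)  # ~2.92

    # Assuming CPUs sustain 95% of peak, the utilization the GPUs need to
    # match the CPUs' absolute contribution and their GFLOPS/W:
    break_even_throughput = 0.95 / raw_ratio  # ~0.26
    break_even_green = 0.95 / eff_ratio       # ~0.32

    print(f"raw ratio {raw_ratio:.2f}, efficiency ratio {eff_ratio:.2f}")
    print(f"break-even: {break_even_throughput:.0%} (throughput), "
          f"{break_even_green:.0%} (GFLOPS/W)")
    ```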
     
  2. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    While 25% efficiency sounds doable, the reality is that for many HPC workloads/algorithms, GPUs do not reach that level of efficiency! It isn't exactly a new issue, either. Look back a couple of years and you'll see that FPGAs were just as hyped as GPUs are now, and they ended up suffering from many of the same problems, the biggest being that while they did well on some workloads, in the vast majority they were just dead weight. This becomes an even bigger issue with algorithms that deliver significant algorithmic speedups. The classic data structures were big, simple matrices, but as algorithms have improved, the data structures have become much more sparse and complex.
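    To put a rough number on the dense-vs-sparse gap, here's a flops-per-byte sketch in Python; the byte counts are my idealized assumptions, not measurements:

    ```python
    # Rough arithmetic-intensity (flops per byte) comparison, to illustrate
    # why dense "classic" kernels map to GPUs far better than the sparse
    # structures modern algorithms favour. Byte counts are idealized.

    def dense_matmul_intensity(n):
        """n x n double-precision matmul: 2*n^3 flops over 3*n^2 * 8 bytes."""
        flops = 2 * n ** 3
        bytes_moved = 3 * n * n * 8
        return flops / bytes_moved

    def csr_spmv_intensity(nnz, n):
        """CSR sparse matrix-vector product: ~2 flops per nonzero over
        ~12 bytes per nonzero (8 B value + 4 B column index) plus the
        input and output vectors."""
        flops = 2 * nnz
        bytes_moved = nnz * 12 + 2 * n * 8
        return flops / bytes_moved

    print(f"dense 4096^2 matmul: {dense_matmul_intensity(4096):.0f} flops/byte")
    print(f"CSR SpMV (5M nnz):   {csr_spmv_intensity(5 * 10**6, 10**6):.2f} flops/byte")
    ```

    Hundreds of flops per byte moved for the dense case, well under one for the sparse one, which is why the latter lives and dies by the memory system rather than the ALUs.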

    I'm not saying that GPUs are useless; I'm just saying that hyping a machine based on a Linpack result at ~<50% efficiency seems counterproductive. Architecturally, we've known for a long time how to design chips that look incredible on something like Linpack. If Linpack were the real-world goal, we'd already have exaflop machines. The reality is that on real-world workloads of scientific interest, many of the most efficient capability supers are doing extremely well to get 10% of peak, and the capacity supers do significantly less than that. And these are all machines that can push 90+% efficiency on something as simple as Linpack. If you cannot hit 90% efficiency on Linpack, you're going to have a hell of a time hitting 1% on the capability codes.

    This is all reflected in the research priorities for next-gen supers, none of which are focused on things like Linpack. In many ways, building a machine to top the Top500's Linpack list is building for a target over a decade past its expiration date.
     
  3. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    Still, many real-world workloads have been retrofitted onto GPUs just fine, regardless of how their efficiency happens to look in Linpack. As you said, Linpack itself shouldn't be a goal.

    GPUs also weren't specifically designed for tasks like derivative pricing models, image and media transformation and encoding, or mass particle interactions like galaxy simulations, yet they seem to be rather successfully making inroads into these areas against the CPU value proposition.
     
  4. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,093
    Likes Received:
    3,178
    Location:
    New York
    Yep, attempts to characterize GPUs as poor at specific tasks are rather useless, given that the folks employing them are doing so to run tasks for which they are well suited.
     
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,093
    Likes Received:
    3,178
    Location:
    New York
  6. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,802
    Likes Received:
    3,921
    Location:
    Germany
    Very interesting indeed. But there's another paragraph I'd like to highlight:
    If that's true, then maybe a year from now it won't be Nvidia that's considered the driving force behind GPUs entering the HPC space any more. :)
     
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,093
    Likes Received:
    3,178
    Location:
    New York
    It'll take more than a year, but yeah, eventually they could be squeezed out there as well. x86 owns their bones.
     
  8. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    In my field, this is the experience as well.
    We can't really do much with GPUs, and I'd really like to hear what the Tianhe is actually going to be used for.

    When it comes to computational science, there is this huge divide between those who are involved with high performance computing as a discipline in its own right, and those who try to do real work on the damn things and solve problems relevant to their respective fields. In my own field, I've seen the number of algorithms that can be profitably ported to high performance architectures dwindle. There just aren't that many embarrassingly parallel codes that do huge amounts of work on tiny amounts of data.

    And it's frustrating that the more intelligent you try to be, the more knowledge and heuristics you try to incorporate to work smarter, the harder it becomes to substantially outperform any old PC you can buy for less than $1000. To maintain my image, I have to use Big Iron for validation runs. :) (And even on those codes we're limited more by the memory subsystems than by available FLOPS.)
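    That memory-subsystem limit can be sketched with a toy roofline-style estimate; the ~515 DP GFLOPS and ~144 GB/s figures are C2050-class ballpark specs, purely for illustration:

    ```python
    # Toy roofline-style estimate: attainable throughput is capped by either
    # raw compute or memory bandwidth times arithmetic intensity.

    def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
        """Simple roofline: min(compute roof, bandwidth * intensity)."""
        return min(peak_gflops, bandwidth_gbs * flops_per_byte)

    peak, bw = 515.0, 144.0  # C2050-class DP GFLOPS and GB/s, illustrative
    for intensity in (0.1, 0.5, 4.0):
        g = attainable_gflops(peak, bw, intensity)
        print(f"{intensity:>4} flops/byte -> {g:6.1f} GFLOPS ({g / peak:.0%} of peak)")
    ```

    At the low arithmetic intensities typical of sparse or pointer-heavy codes, the ALUs sit almost entirely idle no matter how many FLOPS the spec sheet promises.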
    Seriously, what you wrote about massive computation using simple algorithms vs. smarter computation on more complex algorithms struck very close to home. For at least the last couple of decades, HPC has gone ever further down the first path, and for the life of me I can't see it as the future, bragging rights be damned. In what domain does human creativity get the most opportunities?

    The older I get, the less I get out of outrageously powerful and expensive hardware. Good thing there are younger guys out there with higher levels of testosterone to push the limits. ;) Kudos to you.
     
  9. aaronspink

    Veteran

    Joined:
    Jun 20, 2003
    Messages:
    2,641
    Likes Received:
    64
    I can dig up lots of BS from the director of the NCSA. You do realize that his primary job is to get yet more funding? That's it: "oh look, China took the top spot in Linpack, I bet I can use that to get more funding". Putting in more, harder-to-program flops isn't solving any of the real problems facing supercomputers.
     
  10. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,093
    Likes Received:
    3,178
    Location:
    New York
    I'm sure you can but from where I stand his BS > your BS :)
     
  11. madyasiwi

    Newcomer

    Joined:
    Oct 7, 2008
    Messages:
    194
    Likes Received:
    32
    Didn't GPUs eventually unify the vertex and pixel shaders for a reason?
     
  12. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,802
    Likes Received:
    3,921
    Location:
    Germany
    Yes, but I fail to get your point. The quoted statement is not about unified GPUs, but about very fast connections between CPU and GPU: so fast, in fact, that they only seem possible to realize if both reside on the same substrate or even share the same silicon.

    I guess we're talking about 100s of GB/s.
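    For a sense of what that buys, a tiny sketch comparing an illustrative ~8 GB/s PCIe 2.0 x16 link with a hypothetical ~200 GB/s on-die link (both round, assumed figures):

    ```python
    # Quick sanity check on why link bandwidth matters for CPU<->GPU offload.
    # Bandwidth figures are round, illustrative assumptions.

    def transfer_ms(bytes_moved, bandwidth_gbs):
        """Milliseconds to move a payload at the given GB/s (decimal GB)."""
        return bytes_moved / (bandwidth_gbs * 1e9) * 1e3

    payload = 512 * 1024**2  # 512 MiB of input data
    for name, bw in [("PCIe 2.0 x16 (~8 GB/s)", 8.0), ("on-die link (~200 GB/s)", 200.0)]:
        print(f"{name}: {transfer_ms(payload, bw):6.1f} ms to move 512 MiB")
    ```

    Tens of milliseconds per transfer is an eternity next to a kernel that finishes in a few; an on-die link shrinks that overhead by more than an order of magnitude, and as Alexko notes below the latency picture improves even more.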
     
  13. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    And perhaps latency is even more important in this case…
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    TBH, I just don't get people's insistence on pointing out that GPUs suck for zlib and the like, TEH REAL WORLD YOU KNOW....

    It is ironic that for all the real-world rants, over the last 3-4 years I have only seen the use of GPUs rise: from games to UI, to HD video decode (and now encode), and now HTML5. Must I point out that CPUs suck at, I dunno, rendering HTML5 canvas?

    GPUs are a different type of processor, meant for different types of things. Deal with it.
     
  15. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    What they're saying and what you're saying aren't very different things; they're just said from different points of view.

    They're only trying to provide a counterpoint to NVIDIA's bullshit claim that GPUs can speed up most workloads by 100× or whatever…
     
  16. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    There is a subtle difference.
    To me, the idea that GPUs are good only for the most trivial codes is BS.
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,093
    Likes Received:
    3,178
    Location:
    New York
    As far as I can tell nVidia always points to very specific fields where GPUs are a benefit. Where did you get "most workloads" from?
     
  18. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    That was hyperbole, but they do present things as if GPUs were the way to go for all HPC problems.

    They typically make very broad statements and try to present GPUs as suitable for all demanding workloads. I suppose it's natural; why would they tell people when not to use their products?
     
  19. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    949
    Likes Received:
    419
    I do deal with it; that's why I don't say they are a blessing for each and everything, just for what they're prepared for ...

    What I wrote was exactly not dogmatic, and my examples were there to help people understand what I mean (not everyone has a PhD and knows algorithm theory). You may also have realized that the structure of code packages like zlib is representative of the majority of code, not the minority, and that the data structures you use in the DOM are of the widespread kind (graphs, unbounded trees, circular lists), not the exotic kind. If GPUs don't at some point become able to run such types of code and handle such open structures, they will progressively hit the utility wall (<sarc>though knowing humans, they will start inventing problems they can solve fast on a GPU instead of producing fast solvers for important problems</sarc> it's not that bad yet ;) ).
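    The structural point (a linked list is a serial dependence chain, while a flat-array reduction splits into independent chunks) can be shown in a toy Python sketch; the names `Node`, `list_sum` and `chunked_array_sum` are hypothetical, for illustration only:

    ```python
    # Toy contrast between a serial dependence chain (linked list: each load
    # depends on the previous pointer) and a reduction over a flat array that
    # splits into independent chunks a wide machine could process in parallel.

    class Node:
        def __init__(self, value, nxt=None):
            self.value, self.nxt = value, nxt

    def list_sum(head):
        """Inherently sequential: node i+1's address is known only after node i."""
        total, node = 0, head
        while node is not None:
            total, node = total + node.value, node.nxt
        return total

    def chunked_array_sum(data, chunks=4):
        """Each chunk's partial sum is independent and could run in parallel."""
        size = (len(data) + chunks - 1) // chunks
        partials = [sum(data[i:i + size]) for i in range(0, len(data), size)]
        return sum(partials)

    head = None
    for v in reversed(range(10)):
        head = Node(v, head)

    print(list_sum(head), chunked_array_sum(list(range(10))))  # both print 45
    ```

    Both compute the same answer, but only the second exposes any parallelism; pointer-chasing structures like the DOM look like the first.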

    Even if you merge CPU and GPU, and the GPU is of good use in tight loops running the simplest of algorithms over huge amounts of unconditionally accessible data, if you cannot power the GPU down at run time so that it draws no power at all, the combined chip may have much lower efficiency than a pure CPU.

    At times you need a processor that can cope with very different paradigms in the same "moment" (inconvenient task switching, for example), and that is the biggest disadvantage of a GPU, which favors a strictly non-preemptive, well-defined, tuned environment.
     
  20. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    949
    Likes Received:
    419
    I think that's exactly where the discussion should take place. We have papers like this:

    Hardware acceleration vs. algorithmic acceleration: Can GPU-based processing beat complexity optimization for CT?
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.74.386

    Isn't that postulation a bit backwards, if not dangerous?
     