Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    So you think there is a product line that cannot be used?
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I am saying that the software stack that goes with that product line is a net negative.
     
  3. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    drive-by to say that although I left NV in July, I was the CUDA SW lead for Titan for a long time (over a year). B3D taught me well :) (this is the second #1 machine I was heavily involved with, I worked on Tianhe-1A as well)

    also, it should be noted that it's hard to get efficiency close to BG/Q on any x86 machine--even Jaguar was running at 75% efficiency.
     
  4. tviceman

    Newcomer

    Joined:
    Mar 6, 2012
    Messages:
    191
    Likes Received:
    0
    Why did you leave Nvidia? Where are you at now?
     
  5. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    #445 ams, Nov 14, 2012
    Last edited by a moderator: Nov 14, 2012
  6. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    No such thing as "fully optimized"...


    They have had silicon in house for at least a year...
     
  7. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    Why I left: lots of reasons, but the main one is that at this point I don't think HPC is where I want to spend the majority of my career. It was an unbelievable first job to have in terms of advancement and learning opportunities, but it was time to move on.

    Working on the Android RenderScript team at Google now. (did you know: we shipped GPU compute on Nexus 10)
     
  8. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    When running Linpack, the Titan system overall achieves ~2.1428 GFLOPS/W, but GK110 by itself achieves ~7 GFLOPS/W (per Bill Dally @ NVIDIA in his SC12 presentation). So GK110 appears to be very energy efficient (relatively speaking). With Titan, there is a 1:1 ratio of CPUs to GPUs, a very high total number of cores, a very high performance network, and other things that contribute to the overall power consumption and reduce energy efficiency relative to running GK110 by itself.

    With respect to the King Abdulaziz supercomputing system, the data has been revised/updated to include power consumption. The Linpack performance/watt for the system is quite good (relatively speaking), but the Linpack efficiency is still really low, at ~ 38.4% of the theoretical peak performance. Any thoughts on why the efficiency would be so low?
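    The system-level figure above is just Linpack Rmax divided by average power; a quick sketch using Titan's Nov 2012 Top500 numbers (17.59 PFLOPS Rmax, 8.209 MW average during the run) reproduces the oddly precise 2.1428:

    ```python
    # Back-of-the-envelope check of the ~2.1428 GFLOPS/W figure for Titan.
    # Inputs are the Nov 2012 Top500 values quoted in this thread:
    # 17.59 PFLOPS Linpack Rmax, 8.209 MW average power during the run.
    titan_rmax_gflops = 17_590_000   # 17.59 PFLOPS expressed in GFLOPS
    titan_power_w = 8_209_000        # 8.209 MW expressed in watts

    gflops_per_watt = titan_rmax_gflops / titan_power_w
    print(f"Titan system: {gflops_per_watt:.4f} GFLOPS/W")  # ~2.1428
    ```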
     
    #448 ams, Nov 16, 2012
    Last edited by a moderator: Nov 16, 2012
  9. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    I wonder where the line is drawn, to get that incredibly precise 2.1428 figure.
    There's the power used by the rack cabinets themselves (maybe 208V DC or some combination of DC voltages). Then the power used by the transformers/PSUs that generate that supply from whatever comes into the building. The UPS batteries too, and then the cooling for that big server room.
     
  10. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    The data in the Top500 list is still wrong, as already said in the SI thread. The cluster has a theoretical peak of just 674.7 TFLOP/s and runs at a Linpack efficiency of 62.4%. The predecessor of that cluster (using Cypress cards) attained a Linpack efficiency of 69.6% iirc. The difference is that they now put two much faster cards in a single node (instead of a single Cypress card) while the network basically stayed the same, so it potentially starts to limit a bit in comparison. The efficiency of the DGEMM kernels on the GPU is in both cases close to or even above 90%.
     
    #450 Gipsel, Nov 16, 2012
    Last edited by a moderator: Nov 16, 2012
  11. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    An erroneous peak performance data point would certainly explain the strangely low Linpack efficiency. That is a pretty big error for Top 500 to make, because it takes the peak system performance from 1.098 Petaflops down to 0.6747 Petaflops.
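    The arithmetic bears this out: dividing the same measured Rmax (~421.2 TFLOP/s per the published list entry, assumed here) by the two different Rpeak figures yields exactly the two efficiencies under discussion:

    ```python
    # Linpack efficiency = Rmax / Rpeak. With the erroneous Rpeak in the
    # Top500 entry, the King Abdulaziz (SANAM) cluster looks far less
    # efficient than it actually is.
    rmax = 421.2            # TFLOP/s, measured Linpack (assumed from the list)
    rpeak_listed = 1098.0   # TFLOP/s, erroneous Top500 entry
    rpeak_actual = 674.7    # TFLOP/s, corrected theoretical peak

    print(f"listed Rpeak:    {rmax / rpeak_listed:.1%}")   # ~38.4%
    print(f"corrected Rpeak: {rmax / rpeak_actual:.1%}")   # ~62.4%
    ```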
     
  12. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    The core count is also completely wrong. Top500 says there are 33600 accelerator cores and 38400 in total, while there are in fact 3360 CPU cores and 840 GPUs (on 420 S10000 cards). Looks like they should train that copy paste stuff a bit more. :lol:

    If we count CUs as cores (with nV GPUs SMx are usually counted as cores) it would be 23520 accelerator cores and 26880 in total.
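    The counts above follow directly from the card tally: 420 S10000 cards carry two Tahiti GPUs each, and Tahiti has 28 CUs:

    ```python
    # Reconstructing the SANAM core counts given in this post.
    cards = 420          # FirePro S10000 cards
    gpus = cards * 2     # two Tahiti GPUs per card -> 840 GPUs
    cus_per_gpu = 28     # Tahiti has 28 compute units
    cpu_cores = 3360     # host CPU cores

    accel_cores_as_cus = gpus * cus_per_gpu
    print(accel_cores_as_cus)              # 23520 "accelerator cores"
    print(accel_cores_as_cus + cpu_cores)  # 26880 in total
    ```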
     
  13. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Arthur Bland from OLCF/ORNL gave some details about this: the 8.209 MW figure includes all cooling, PSUs and everything (at the wall, so to say), but is the average over the whole Linpack run. Peak power was about 8.9 MW. I hope those measurements are standardized across the Top500 entries - otherwise the Green500 would be reduced to some sort of Excel joke, with green500 = sort[Rmax/Power] as the formula.
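    That "sort[Rmax/Power]" quip is essentially what the ranking does; a toy sketch (Titan's numbers from this thread, the other entries purely illustrative) makes the joke concrete:

    ```python
    # Toy Green500-style ranking: sort systems by Rmax/Power, descending.
    # Only Titan's figures come from the thread; "ExampleA"/"ExampleB"
    # are made-up entries for illustration.
    systems = {
        "Titan":    (17_590_000, 8_209_000),  # (Rmax in GFLOPS, power in W)
        "ExampleA": (1_000_000, 600_000),
        "ExampleB": (250_000, 100_000),
    }

    green = sorted(systems.items(),
                   key=lambda kv: kv[1][0] / kv[1][1],
                   reverse=True)
    for name, (rmax, power) in green:
        print(f"{name}: {rmax / power:.3f} GFLOPS/W")
    ```

    Note how the small made-up "ExampleB" outranks Titan on efficiency despite delivering a tiny fraction of its performance, which is exactly the distortion discussed later in the thread.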
     
    #453 CarstenS, Nov 17, 2012
    Last edited by a moderator: Nov 17, 2012
  14. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    According to the podcast guys at HPCWire, Tesla K20/K20X is more energy efficient than both Xeon Phi and FirePro S10000 when each is compared on its own:

    http://www.hpcwire.com/hpcwire/2012-11-16/podcast:_amd_troubles_sc12_winners_and_losers.html

    The podcast guys even hint that it would be possible to build a "trick" system to gain top honors on the Green 500 list. They suggest that the Beacon (Xeon Phi) and SANAM (FirePro S10000) supercomputing systems are propelled to the top of the Green 500 list for two reasons. 1) The ratio of accelerators to CPUs in the system is relatively high. For instance, the Beacon system has four Xeon Phis for every two Xeon CPUs per node. Since accelerators have relatively high performance/watt compared to CPUs, the Green 500 performance/watt score is significantly boosted. 2) The size and scope of the system is relatively small compared to the Top 10 supercomputing systems. For instance, the Beacon system contains only 144 Xeon Phi accelerators (compared to 18,688 Tesla K20X accelerators in Titan and 1,875 Xeon Phis in Stampede). Performance scaling tends to worsen as the number of cores in a system increases, so it is much easier to achieve high performance/watt with smaller systems.

    I do believe that the Green 500 score for Beacon (with Xeon Phi) in particular is a bit suspect. None of the other systems equipped with Xeon Phi come anywhere close to Beacon's Green 500 score. I suspect that when running Linpack on the Beacon system, the Xeon CPUs were turned off while the Xeon Phis were used exclusively for Linpack. Since Xeon Phi has x86 functionality and can essentially operate autonomously in the system, it is possible to do this. And since accelerators tend to have higher performance/watt than CPUs, that would boost the Green 500 performance/watt score substantially. But this is a bit misleading too, because anyone with a Xeon CPU + Xeon Phi system would never realistically run it with the Xeon CPUs turned off, as the Xeon Phis would be far less efficient than the Xeon CPUs when executing the serial portions of the code. On the flip side, the Titan system with K20X accelerators appears to achieve very high Green 500 performance/watt in its standard (and not yet fully optimized) configuration, while also achieving extremely high overall Top 500 performance, so that is quite an achievement.
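    Point 1) is easy to illustrate with a toy node power model: raising the accelerator:CPU ratio mechanically lifts perf/watt because accelerators deliver more FLOPS per watt. All the component numbers below are hypothetical, chosen only to show the trend:

    ```python
    # Toy node model: perf/W rises with the accelerator:CPU ratio.
    # All per-component figures are hypothetical illustrations, not
    # real K20X / Xeon Phi / Xeon specifications.
    def node_gflops_per_watt(n_accel, n_cpu=2):
        accel_gflops, accel_watts = 1000, 225   # hypothetical accelerator
        cpu_gflops, cpu_watts = 150, 115        # hypothetical server CPU
        perf = n_accel * accel_gflops + n_cpu * cpu_gflops
        power = n_accel * accel_watts + n_cpu * cpu_watts
        return perf / power

    for n in (1, 2, 4):
        print(f"{n} accelerators/node: {node_gflops_per_watt(n):.2f} GFLOPS/W")
    ```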
     
  15. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    Here is an interesting fact that highlights the momentum behind GPU-accelerated heterogeneous high performance computing: only one year ago, the total peak throughput of all Top 500 systems combined (excluding Sequoia) was less than 30 Petaflops. Within the last month alone, 30 Petaflops' worth of Tesla K20/K20X cards have already shipped:

    http://www.youtube.com/watch?v=50AzXbrvtmg

    This will be a very interesting and exciting space to monitor over the next few years...
     
  16. lanek

    Veteran

    Joined:
    Mar 7, 2012
    Messages:
    2,469
    Likes Received:
    315
    Location:
    Switzerland
    I'm not sure you can turn off the CPUs in a system with Phi cards to get that data... it's a coprocessor unit, they can't work alone (as mentioned by Anandtech, if I remember well).
     
  17. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    Actually, Xeon Phi can boot Linux and run x86 software on its own (remember that Xeon Phi consists of a few dozen relatively simple x86 CPU cores placed together on one piece of silicon). I believe that Intel has pulled the wool over people's eyes by running Linpack on the Beacon supercomputing system without using the Xeon CPUs, primarily to get to the top of the Green 500 list. The reality is that Xeon Phi is supposed to be used as a "co-processor", and no one in their right mind would use it in a supercomputing system without some high performance CPUs. Fortunately for NVIDIA, Project Denver will integrate CPU and GPU cores so that their card will be able to boot Linux too. The same presumably goes for AMD's next-gen card.
     
  18. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,976
    Likes Received:
    5,210
    http://blogs.barrons.com/techtrader...losses-wells-sees-hope-in-cost-cuts-consoles/
     
  19. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
  20. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Maybe it just denotes the board or cooling variation. I've seen an air cooled K20 card bearing the designation of K20SUC and for the K10 there is also a version named K10-RL-SUC (specifying an aircooled version with the airflow from the left to the right, howsoever that is defined). There are probably a few versions around with different cooling layouts or even coming without a heatsink (for watercooled installations).
     