Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. Wynix

    Veteran Regular

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
Seems like the first US sites are getting their samples now, though probably from board partners rather than Nvidia itself.
     
  3. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    My SMX assumption doesn't seem to hold. According to slide 16 in this slide deck, the K80 has 2.9 TF DP, 4992 CCs, and 480 GB/s memory bandwidth. These specs would imply 13 SMXs per chip and an ~870 MHz core clock.

    Wouldn't a GPU with all 15 SMXs enabled at a lower clock improve performance/W? I'm also considering the possibility that the GK210 chip physically has only 13 SMXs, although I'm not sure why they would do that.
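    For what it's worth, the arithmetic behind those implied numbers checks out. A quick sketch, assuming Kepler's usual 192 CUDA cores and 64 DP units per SMX and two chips per board:

    ```python
    # Sanity check of the K80 slide numbers: derive SMX count and core clock.
    # Assumes 192 CUDA cores and 64 DP units per SMX, 2 chips per board.
    cuda_cores_total = 4992
    chips = 2
    cores_per_smx = 192
    dp_units_per_smx = 64
    dp_flops_target = 2.9e12          # 2.9 TFLOPS double precision

    smx_per_chip = cuda_cores_total // chips // cores_per_smx
    dp_units_total = chips * smx_per_chip * dp_units_per_smx
    clock_hz = dp_flops_target / (dp_units_total * 2)   # 2 FLOPs per FMA

    print(smx_per_chip)               # 13 SMXs per chip
    print(round(clock_hz / 1e6))      # ~871 MHz
    ```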
     
  4. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    My favorite thing about GK210 is the 512 kB of register file and 128 kB of L1/shared memory per SM. That can be nice for occupancy-limited code.
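    To illustrate the occupancy point (a simplified sketch that ignores allocation granularity and the other limiters like warp and block caps): GK110 had 65,536 32-bit registers (256 kB) per SMX, so GK210's 512 kB doubles how many threads a register-hungry kernel can keep resident.

    ```python
    # How the doubled register file lifts occupancy for a register-heavy kernel.
    # Simplified: real hardware also caps resident warps, blocks, shared memory.
    REGS_GK110 = 65536    # 256 kB of 32-bit registers per SMX
    REGS_GK210 = 131072   # 512 kB of 32-bit registers per SMX

    regs_per_thread = 128  # a register-hungry kernel
    for name, regs_per_sm in [("GK110", REGS_GK110), ("GK210", REGS_GK210)]:
        threads = regs_per_sm // regs_per_thread
        print(name, threads, "resident threads =", threads // 32, "warps")
    # GK110: 512 threads (16 warps); GK210: 1024 threads (32 warps)
    ```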
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Interesting observation on Anandtech: this is the first GPU created for Tesla only. Does this mean the Tesla business is now large enough to warrant separate silicon? Remarkable.
     
  6. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    It is going to be pretty much awesome for anyone not writing 10-line kernels. I can easily predict huge LuxMark scores and good results with any renderer.
     
  7. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,910
    Likes Received:
    1,607
    http://www.guru3d.com/news-story/nvidia-tesla-k80-dual-gpu-compute-accelerator.html
     
    #1847 pharma, Nov 17, 2014
    Last edited: Nov 17, 2014
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Another aspect, or maybe a free interpretation: GK210 is a failsafe in case 16/20nm is not ready for another round of >500mm² products. This could be an(other) indication that GM200 was/is planned as a 16/20nm-only release.
     
  9. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    There were already customer samples of GM200 detected on shipping manifests, so it does not make much sense for it to be waiting on 16nm.

    What about a crazy theory that GM200 is 28nm but a gaming-oriented chip, without the new compute features and with lower total DP performance than GK210 (although higher DP/watt)? :D

    I think the fact that it is called GM200 and not GM210 is highly revealing of its nature...
     
  10. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    GF100 was also an HPC chip. Also, NV's Mike Clark called GM200 in line with other HPC chips.

    The other odd aspect of GK210 is its MIA brother GK180, which showed up on Zauba in early 2013 and has its own device ID in the CUDA DLL (so it was not just GK110B).
    Maybe there is some internal lobby at NV that still wants to push the super-scalar approach...
     
  11. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    No idea, but Pascal is the variant with the very high-bandwidth coherent interconnect and lower memory latency. It is a lot more interesting as a new product for HPC than GM200.
     
  12. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    True, but that was before nVIDIA further bifurcated compute from graphics chips. While GF104 and GF100 shared most (all?) of the feature set, GK110 brought things like Dynamic Parallelism and Hyper-Q, which GK104 never had.
     
  13. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    I would have thought so as well, but I guess there is a floor and diminishing returns as you go lower. Given the already low 562 MHz clock, perhaps there wasn't much benefit in going lower. And of course, given the dual-GPU config and 300W TDP, there could simply be a hard power limit which restricted them to 13 SMXs.

    Yeah, even I noticed that; it is quite interesting. But given that it wasn't a big change, the costs were likely minimal. From what I have read, costs at 28nm are still reasonable. It is at 20/16nm where design costs (and time) go up significantly, apart from the higher per-transistor costs at the moment.

    What I'm also curious about is how the die size has been impacted by these changes. Did they have to increase the die size?

    Nope, GM200 was planned for 28nm since at least late last year. GK210 being a failsafe makes no sense, as GM204 beats it in everything except DP. My guess is that since it was a very minimal change, the design costs and time were low enough that it was worth doing.

    Umm, why? If it was called GM100 instead of GM200, that might have been something. GM200 is just following the standard Nvidia naming convention.
     
  14. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    There was no GK100...
     
  15. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Are ~500mm² tapeouts so cheap?
    If you count GK180 as part of this "bigger cache Kepler" project, we are talking about two tapeouts and two years of work on it.
    Also, Mike Clark saw GK210 as a summer 2014 product, while GM200 is/was end of 2014/early 2015.

    It's probably a failed time-to-market project. Maybe they had too few resources because of Tegra Kepler/Denver and Maxwell.

    But GK210 is also an x10 part, while there was no GK200.
     
  16. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    There might be a clue in this presentation (link to parent page). Pages 11-25.
     
  17. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    True, but here there was a GM107 before GM204. Also note that the x10 designation may not necessarily imply compute-focused. (See GF100 to GF110, and also the GK180 part.)
    I don't know the exact costs, but from what I've read it is in the range of a few million. The architectural changes are minimal and it's largely just a new physical layout and tapeout. They could very well have done this while Maxwell was still in development. And given the high margins of the Tesla business, it seems like they can recover the investment.

    But you could be right; it could have been delayed. They probably did not assign as many resources to it as to Maxwell.

    Yes, but there was a GM107 before GM204 came out. And as I've stated above, an x10 part does not necessarily imply a compute part.
     
  18. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    GK180 was renamed to GK110B and replaced the original GK110 in all of Nvidia's product line. It has lower power consumption and a few bug fixes.

    It should have been named GK110B from the beginning to avoid all this confusion.
     
  19. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    But there is GK208 with compute capability 3.5, while GK210 is compute 3.7
    (and GK20A on a side line with compute 3.2).
     
  20. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    Good read, thanks.

    One question though. Slide 11 says "leakage goes up with powered transistor count [and] doesn't matter what the frequency is," so wouldn't the 2x part on slide 24 have more leakage than the 1x part and therefore come out worse?
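    One way those two slides can be reconciled (an illustrative toy model with made-up constants, not numbers from the deck): the wide part does pay roughly double the leakage, but running at half the clock lets it drop voltage, and dynamic power scales with f·V², so perf/W can still come out ahead.

    ```python
    # Toy power model: perf/W of a 1x part vs a 2x-wide part at half clock.
    # Illustrative constants only. Leakage scales with powered unit count;
    # dynamic power with units * f * V^2; a lower clock permits a lower V.
    def watts(units, f_ghz, volts, leak_per_unit=0.05, cdyn=1.0):
        return units * cdyn * f_ghz * volts**2 + units * leak_per_unit

    def throughput(units, f_ghz):
        return units * f_ghz

    narrow = throughput(8, 1.0) / watts(8, 1.0, 1.0)    # 8 / 8.4  ~ 0.95
    wide = throughput(16, 0.5) / watts(16, 0.5, 0.8)    # 8 / 5.92 ~ 1.35
    print(narrow, wide)  # the wide part wins despite doubled leakage
    ```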
     