Nvidia BigK GK110 Kepler Speculation Thread

Discussion in 'Architecture and Products' started by A1xLLcqAgt0qc2RyMz0y, Apr 21, 2012.

  1. Wynix

    Veteran Regular

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
  2. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
Seems like the first US sites are getting their samples now, though probably from board partners rather than Nvidia itself.
     
  3. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    My SMX assumption doesn't seem to hold. According to slide 16 in this slide deck, the K80 has 2.9 TF DP, 4992 CCs, and 480 GB/s memory bandwidth. These specs would imply 13 SMXs per chip and an ~870 MHz core clock.

    Wouldn't a GPU with all 15 SMXs enabled at a lower clock improve performance/W? I'm also considering the possibility that the GK210 chip physically has only 13 SMXs, although I'm not sure why they would do that.
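    For what it's worth, the arithmetic behind those implied numbers checks out. A quick sketch, assuming Kepler's usual 192 CUDA cores and 64 DP units per SMX and two chips per board:

    ```python
    # Sanity check of the K80 slide numbers: derive SMX count and core clock.
    # Assumes 192 CUDA cores and 64 DP units per SMX, 2 chips per board.
    cuda_cores_total = 4992
    chips = 2
    cores_per_smx = 192
    dp_units_per_smx = 64
    dp_flops_target = 2.9e12          # 2.9 TFLOPS double precision

    smx_per_chip = cuda_cores_total // chips // cores_per_smx
    dp_units_total = chips * smx_per_chip * dp_units_per_smx
    clock_hz = dp_flops_target / (dp_units_total * 2)   # 2 FLOPs per FMA

    print(smx_per_chip)               # 13 SMXs per chip
    print(round(clock_hz / 1e6))      # ~871 MHz
    ```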
     
  4. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    My favorite thing about GK210 is the 512 kB of register file and 128 kB of L1/shared memory per SM. That can be nice for occupancy-limited code.
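    To illustrate the occupancy point (a simplified sketch that ignores allocation granularity and the other limiters like warp and block caps): GK110 had 65,536 32-bit registers (256 kB) per SMX, so GK210's 512 kB doubles how many threads a register-hungry kernel can keep resident.

    ```python
    # How the doubled register file lifts occupancy for a register-heavy kernel.
    # Simplified: real hardware also caps resident warps, blocks, shared memory.
    REGS_GK110 = 65536    # 256 kB of 32-bit registers per SMX
    REGS_GK210 = 131072   # 512 kB of 32-bit registers per SMX

    regs_per_thread = 128  # a register-hungry kernel
    for name, regs_per_sm in [("GK110", REGS_GK110), ("GK210", REGS_GK210)]:
        threads = regs_per_sm // regs_per_thread
        print(name, threads, "resident threads =", threads // 32, "warps")
    # GK110: 512 threads (16 warps); GK210: 1024 threads (32 warps)
    ```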
     
  5. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Interesting observation on Anandtech: this is the first GPU created for Tesla only. Does this mean the Tesla business is now large enough to warrant separate silicon? Remarkable.
     
  6. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    It is going to be pretty much awesome for anyone not writing 10-line kernels. I can easily predict huge LuxMark scores and good results with any renderer.
     
  7. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,910
    Likes Received:
    1,607
    http://www.guru3d.com/news-story/nvidia-tesla-k80-dual-gpu-compute-accelerator.html
     
    #1847 pharma, Nov 17, 2014
    Last edited: Nov 17, 2014
  8. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    Another aspect, or maybe a free interpretation: GK210 is a failsafe in case 16/20nm is not ready for another round of >500mm² products. This could be an(other) indication that GM200 was/is planned as a 16/20nm-only release.
     
  9. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    There were already customer samples of GM200 detected on shipping manifests, so it does not make much sense for it to be waiting on 16nm.

    What about a crazy theory that GM200 is 28nm but a gaming-oriented chip, without the new compute features and with lower total DP performance than GK210 (although higher DP/watt)? :D

    I think the fact that it is called GM200 and not GM210 is highly revealing of its nature...
     
  10. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    GF100 was also an HPC chip. Also, NV's Mike Clark called GM200 in line with other HPC chips.

    The other odd aspect of GK210 is its MIA brother GK180, which showed up on Zauba in early 2013 and has its own device ID in the CUDA DLL (so it was not just GK110B).
    Maybe there is some internal lobby at NV that still wants to push the super-scalar approach...
     
  11. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    No idea, but Pascal is the variant with the very high-bandwidth coherent interconnect and lower memory latency. It is a lot more interesting as a new product for HPC than GM200.
     
  12. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    True, but that was before nVIDIA further bifurcated compute from graphics chips. While GF104 and GF100 shared most (all?) of the feature set, GK110 brought things like Dynamic Parallelism and Hyper-Q, which GK104 never had.
     
  13. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    I would have thought so as well, but I guess there is a floor and diminishing returns as you go lower. Given the already low 562 MHz clock, perhaps there wasn't much benefit in going lower. And of course, given the dual-GPU config and 300W TDP, there could simply be a hard power limit which restricted them to 13 SMXs.

    Yeah, even I noticed that; it is quite interesting. But given that it wasn't a big change, the costs were likely minimal. From what I have read, costs at 28nm are still reasonable. It is at 20/16nm where design costs (and time) go up significantly, apart from the higher per-transistor costs at the moment.

    What I'm also curious about is how the die size has been impacted by these changes. Did they have to increase the die size?

    Nope, GM200 was planned for 28nm since at least late last year. GK210 being a failsafe makes no sense, as GM204 beats it in everything except DP. My guess is that since it was a very minimal change, the design costs and time were low enough that it was worth doing.

    Umm, why? If it was called GM100 instead of GM200, that might have been something. GM200 is just following the standard Nvidia naming convention.
     
  14. Picao84

    Veteran Regular

    Joined:
    Feb 15, 2010
    Messages:
    1,532
    Likes Received:
    689
    There was no GK100...
     
  15. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Are ~500mm² tapeouts so cheap?
    If you count GK180 as part of this "bigger cache Kepler" project, we are talking about two tapeouts and two years of work on it.
    Also, Mike Clark saw GK210 as a summer 2014 product, while GM200 is/was end of 2014/early 2015.

    It's probably a failed time-to-market project. Maybe they had too few resources because of Tegra Kepler/Denver and Maxwell.

    But GK210 is also an x10 part, while there was no GK200.
     
  16. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    There might be a clue in this presentation (link to parent page). Pages 11-25.
     
  17. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    True, but here there was a GM107 before GM204. Also note that the x10 designation may not necessarily imply compute-focused. (See GF100 to GF110, and also the GK180 part.)
    I don't know the exact costs, but from what I've read it is in the range of a few million. The architectural changes are minimal and it's largely just a new physical layout and tapeout. They could very well have done this while Maxwell was still in development. And given the high margins of the Tesla business, it seems like they can recover the investment.

    But you could be right; it could have been delayed. They probably did not assign as many resources to it as to Maxwell.

    Yes, but there was a GM107 before GM204 came out. And as I've stated above, an x10 part does not necessarily imply a compute part.
     
  18. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    GK180 was renamed to GK110B and replaced the original GK110 in all of Nvidia's product line. It has lower power consumption and a few bug fixes.

    It should have been named GK110B from the beginning to avoid all this confusion.
     
  19. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    But there is GK208 with compute capability 3.5, while GK210 is compute 3.7
    (and GK20A on a side line with compute 3.2).
     
  20. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    773
    Likes Received:
    200
    Good read, thanks.

    One question though. Slide 11 says "leakage goes up with powered transistor count [and] doesn't matter what the frequency is," so wouldn't the 2x part on slide 24 have more leakage than the 1x part and therefore come out worse?
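    One way those two slides can be reconciled (an illustrative toy model with made-up constants, not numbers from the deck): the wide part does pay roughly double the leakage, but running at half the clock lets it drop voltage, and dynamic power scales with f·V², so perf/W can still come out ahead.

    ```python
    # Toy power model: perf/W of a 1x part vs a 2x-wide part at half clock.
    # Illustrative constants only. Leakage scales with powered unit count;
    # dynamic power with units * f * V^2; a lower clock permits a lower V.
    def watts(units, f_ghz, volts, leak_per_unit=0.05, cdyn=1.0):
        return units * cdyn * f_ghz * volts**2 + units * leak_per_unit

    def throughput(units, f_ghz):
        return units * f_ghz

    narrow = throughput(8, 1.0) / watts(8, 1.0, 1.0)    # 8 / 8.4  ~ 0.95
    wide = throughput(16, 0.5) / watts(16, 0.5, 0.8)    # 8 / 5.92 ~ 1.35
    print(narrow, wide)  # the wide part wins despite doubled leakage
    ```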
     