Nvidia BigK GK110 Kepler Speculation Thread

Overclocking is indicative of results from typical material, not necessarily what a product is qualified or even designed to do.

I completely agree. However, they've put out several SKUs at 6Gbps (690, 680, 670). I think it's a fair assumption that they aren't exactly pushing the envelope here and are quite comfortable at those speeds. Overclocking results further support that. What have you seen that would make you think 7Gbps on a refresh is unlikely (besides unavailability of memory at that speed of course)?
 
Overclocking is not indicative of a product refresh or actual capabilities, that is all. You can't draw any conclusions from it because it does not represent a product qual environment or product qual material.
 
Fair enough. Doesn't eliminate the possibility either. What we know is that some samples of GK104 chips are happy at 7 Gbps. Good enough for me.
 
Did the bandwidth numbers actually match the increase in memory clock? GDDR5's error detection and retry makes it resilient, but it also means the memory can limp into higher clocks that no longer yield any bandwidth increase.
And if you're pushing 6 Gbps RAM past its spec, the result also depends on the GDDR5 devices themselves. I don't think the GPU on the other side of the memory interface can take all of the credit.
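For reference, theoretical peak bandwidth is just bus width times data rate, so it's easy to see what a clean 6-to-7 Gbps bump should buy. A quick sketch, assuming GK104's 256-bit bus; if a measured copy benchmark doesn't scale like this with memory clock, that's the sign of error retries eating the gain:

```python
# Theoretical peak GDDR5 bandwidth: bus width (bits) / 8 x per-pin data rate (Gbps).
# The 256-bit bus is GK104's; the rest is plain arithmetic.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

for rate in (6.0, 7.0, 7.1):
    print(f"256-bit @ {rate} Gbps -> {peak_bandwidth_gbs(256, rate):.0f} GB/s")

# 256-bit @ 6.0 Gbps -> 192 GB/s
# 256-bit @ 7.0 Gbps -> 224 GB/s
# 256-bit @ 7.1 Gbps -> 227 GB/s
```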
 
My 3DMark11 score improves a bit when I push the memory clock from 7.0 to 7.1 GHz. The test itself doesn't seem very bandwidth-limited, though; increasing the core clock helps much more.

While overclocked results can't be extrapolated to mean those speeds are achievable in qualified situations, the abundance of OC headroom compared to past designs feels to me like an indicator that the memory controller has quite a lot of capability in reserve. The chips themselves are rated to operate at only 6 Gbps, so it's somewhat likely they reach their limits before the controller does.
 
I can't speak for GK104, as I haven't tested bandwidth while overclocked, but on the 7970 it makes a difference: the gains from 5500 to 6000 MHz and from 6000 to 7000 MHz are quite visible. In practice you need to raise the memory clock along with the core clock when overclocking, otherwise you cut into the gains past a certain core speed. (This is typical in Vantage, where you lose 1K points on the GPU score if you don't.) (All tests made at 1200 MHz core.)

But of course those tests were made with 3DMark11, which is not bandwidth-hungry, and in this bench the memory clock speed matters more than the amount of RAM available (memory usage stays really low). I can't confirm this is a general trend in games; the results will depend on the situation. (Don't expect to gain 1K points in 3DMark11, but the score does increase, so it has an impact.)

On the 7970 the GDDR5 voltage is fixed at 1602 mV. Depending on chip quality, of course (both of my cards are watercooled, so the VRAM doesn't go over ~33°C), there's no problem hitting 7200 MHz without touching the voltage. (Again, that's in a bench like 3DMark11; in a game I might need to raise the voltage for RAM stability.)
 
He said 1.5GHz, not 1.5GB :)

While we're in hair-splitting mode: I said 1.5GHz :LOL: :p

Anyway, I don't personally think they'll go for a wider bus or any extravagant memory clocks with the GK104 follow-up, and no, I don't personally expect a particularly large performance difference compared to the 680, though I'd like to be pleasantly surprised. Honestly, when was the last time a refresh chip within the same architectural family yielded, say, a >20% performance increase, unless something was inherently wrong with the preceding chip?
 
Or rather, how much it underclocks. Should do it automatically, at least when playing vsynced.

20 minutes of Diablo, 1920x1200 vsync-on.

Min: 550 MHz
Max: 1137 MHz
Avg: 872 MHz

Obviously I need to find something more interesting to play :)
 
I dunno, sounds reasonable to me if clocks are reasonably high (so 50% more memory bandwidth, ~35% higher ROP fillrate, ~70% higher ALU/tex rate). That's assuming the GPCs aren't a bottleneck (clock-adjusted, they barely increase) and that there are no other less obvious bottlenecks. Some of the compute stuff (not DP/ECC), while not area-efficient for gaming, might also marginally help. The chip is quite a beast.

At a theoretical 850 MHz, GK110 would be 150% of GK104 at its boost clock of 1058 MHz (I am talking product here, not clock for clock obviously). That would fit the probable increase in memory bandwidth (barring the larger L2) and the power headroom left between 190-ish and almost 300-ish watts.
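A quick back-of-the-envelope check of that 150% figure, assuming a full 15-SMX GK110 (2880 CUDA cores, which is speculation at this point) against GK104's 1536 cores at the 1058 MHz boost clock:

```python
# Rough shader-throughput ratio: cores x clock, nothing else.
# 2880 cores (15 SMX x 192) for GK110 is this thread's assumption;
# GK104's 1536 cores and 1058 MHz boost are the shipping 680's specs.

gk104 = 1536 * 1058   # core-MHz
gk110 = 2880 * 850

print(f"GK110 @ 850 MHz vs GK104 @ 1058 MHz: {gk110 / gk104:.0%}")
# -> 151%, roughly the 150% quoted above
```

Bandwidth and ROP rate scale differently, of course; this only checks the ALU side of the claim.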

Those Diablo clocks are more than I thought they would be. :)

To be clear: more MHz, not more throttling.
 
Interesting, so it underclocks to follow GPU usage...

I wonder if that's related to the stutter seen with v-sync enabled.

Can you do the same test with adaptive v-sync?

Taking my system as an example: without v-sync, GPU usage in BF3 is at 99%, and it drops to 44-70% on both GPUs when using v-sync at 60 fps (intended; I never go below 70 fps minimum in BF3), so my framerate stays fixed at 60 fps the whole time.
 
Oak Ridge has received initial complement of Nvidia's next-generation Kepler GPUs

Jeff Nichols, Oak Ridge National Laboratory's scientific computing chief, confirmed that the lab has received its initial complement of Nvidia's next-generation Kepler GPUs
I'd been bugging Nichols for some information on the latest developments and a few days ago he emailed me, saying, "We have received 32 Keplers and put them in our development platform. Everything is going as (or better than) expected."

Nichols said the ORNL is expected to begin receiving many more of the Kepler GPUs (1,000 of them) as early as this week.
So far, the Department of Energy has approved the purchase and installation of 14,592 GPUs in Titan, the ORNL official said.
http://blogs.knoxnews.com/munger/2012/09/the-big-computing-change-is-ta.html
 
(Note: this info came out a week ago.)

They need to start testing the supercomputer (16,000 GPUs and 18,000 AMD CPUs; not sure of the numbers, my memory is failing at this hour), so the first test samples have been sent. They'll need a lot of time just to test the interconnect (at this point those GPUs may not even be fully finalized, which isn't important).

The system won't be finished before March 2013.
 
>1 TFLOPS of measured DP at ~85% efficiency implies >1.18 TFLOPS peak DP; at a 1:3 DP:SP ratio that's >3.53 TFLOPS peak SP, which indicates >613 MHz with 15 SMs, >657 MHz with 14 SMs, or >707 MHz with 13 SMs.

Interesting that those hypothetical clocks are so low…
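For anyone who wants to check that arithmetic, here's the back-solve as a short Python sketch, assuming 192 SP cores per SMX and 2 FLOPs per core per clock (FMA):

```python
# Back out the core clock from the quoted numbers: >1 TFLOPS measured DP
# at ~85% DGEMM efficiency, a 1:3 DP:SP ratio, 192 SP cores per SMX,
# and 2 FLOPs per core per clock (fused multiply-add).

measured_dp = 1.0e12               # FLOPS, measured
peak_sp = measured_dp / 0.85 * 3   # ~3.53e12 FLOPS peak single precision

for smx in (15, 14, 13):
    cores = smx * 192
    clock_mhz = peak_sp / (cores * 2) / 1e6
    print(f"{smx} SMXs -> {clock_mhz:.0f} MHz")

# 15 SMXs -> 613 MHz
# 14 SMXs -> 657 MHz
# 13 SMXs -> 707 MHz
```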

What NVIDIA said in the Tesla Kepler Family Product Overview is that Tesla K20 (Kepler) has 3x higher performance per watt than Tesla M2090 (Fermi) based on measured DGEMM performance, where Tesla M2090 measured 410 GFLOPS double precision throughput and Tesla K20 measured > 1000 GFLOPS double precision throughput. In the Kepler Compute Architecture Whitepaper, NVIDIA also mentioned that Tesla K20 has > 80% DGEMM efficiency vs. 60-65% for Tesla M2090.

The peak double precision floating point performance for Tesla M2090 is 665 GFLOPS, which gives a DGEMM efficiency of ~ 62% (which matches the range specified in the whitepaper). In order for Tesla K20 to have ~ 3x higher performance per watt, it would need a measured DGEMM performance of ~ 1200 GFLOPS double precision throughput.

And with > 80% DGEMM efficiency, Tesla K20 should have a peak double precision floating point performance of ~ 1500 GFLOPS (i.e. ~ 1.5 TFLOPS). Since Tesla K20 has a 1-to-3 ratio of double precision to single precision execution units, the peak single precision floating point performance would be ~ 4500 GFLOPS (i.e. ~ 4.5 TFLOPS). So the core clock frequency for Tesla K20 with all 15 SMX's (with 2880 CUDA "cores") enabled would be ~ 781 MHz.
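The same chain of reasoning as numbers, with the caveat that the 3x performance-per-watt comparison assumes roughly equal board power for K20 and M2090:

```python
# Walk the post's arithmetic: M2090's DGEMM efficiency, then the implied
# K20 peak DP/SP throughput and core clock. Equal board power is assumed
# for the 3x performance-per-watt step.

m2090_measured_dp = 410.0   # GFLOPS, measured DGEMM
m2090_peak_dp = 665.0       # GFLOPS, spec
print(f"M2090 DGEMM efficiency: {m2090_measured_dp / m2090_peak_dp:.0%}")  # -> 62%

k20_measured_dp = 1200.0                 # GFLOPS, ~3 x 410 as the post rounds it
k20_peak_dp = k20_measured_dp / 0.80     # > 80% efficiency -> ~1500 GFLOPS
k20_peak_sp = k20_peak_dp * 3            # 1:3 DP:SP -> ~4500 GFLOPS

cores = 2880                             # 15 SMXs x 192
clock_mhz = k20_peak_sp * 1e9 / (cores * 2) / 1e6
print(f"Implied core clock: {clock_mhz:.0f} MHz")  # -> 781 MHz
```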
 