NVIDIA Kepler speculation thread

It's more accurate data, but it's tens of millions to a hundred million cycles out of date given the latency of the off-die control loop.
In more relatable terms, a measurement can be several frames out of date.

The comparison is between having an accurate thermometer that gives you last week's temperature and a somewhat less precise model that tells you how hot it is today.

A more advanced scheme might be a less conservative counter-based system with feedback from voltage and current measurements to tamp it down if it gets too aggressive.

Possibly better would be a version of the initial non-digital Foxton that Intel almost used; or, if AMD or Nvidia ever manage to get on-die voltage control, the loop could be sped up significantly.
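To make the "counter-based system with analog feedback" idea a bit more concrete, here's a toy sketch of such a loop. Every name and number in it is invented for illustration; it is not a description of how Nvidia, AMD, or Intel actually implement this.

// Toy sketch of a counter-based power estimator with slow feedback from
// measured voltage/current. All names and numbers are made up.
#include <cstdio>

int main()
{
    double budget_w   = 200.0;   // board power limit
    double clock_mhz  = 1000.0;
    double correction = 1.0;     // slow feedback from measured V*I

    for (int step = 0; step < 10; ++step) {
        // Fast path: estimate power from activity counters every step.
        double counter_w  = 150.0 + 10.0 * (step % 4);   // fake activity level
        double estimate_w = counter_w * correction;

        if (estimate_w > budget_w)
            clock_mhz -= 13.0;    // back off before the limit is actually hit
        else
            clock_mhz += 13.0;    // creep back up when there's headroom

        // Slow path: every few steps a (stale) V*I measurement arrives and
        // nudges the correction factor so the counter model doesn't drift.
        if (step % 4 == 3) {
            double measured_w = 160.0;   // pretend reading from the VRM
            correction = 0.9 * correction + 0.1 * (measured_w / counter_w);
        }

        printf("step %d: est %.1f W, clock %.0f MHz\n", step, estimate_w, clock_mhz);
    }
    return 0;
}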
 
Is NVidia measuring current and volts or just current and assuming volts? And how does current off-die tell you about the heating effects of that current on die?
 
Tahiti is about 70% bigger than Pitcairn - and about 30% faster @2560x1600 in real gaming benchmarks (while running @ a mere 925MHz).

Some people consider that a complete and utter fail.

Now GK110 is rumored to be about 80% bigger than GK104 - and there's an alleged slide that says it's about 40% faster than GK104 @2560x1600 in some marketing-picked benchmarks.

The same people that consider Tahiti an utter fail now proclaim that GK110 is a chip of many wonders - and predict it's about to totally humiliate AMD.

EDIT: Did I mention that Tahiti was 3 months early (compared to Pitcairn) - and GK110 will probably be 6-9 months late?

If it is 40% faster than GK104 it makes GK110 around 50-60% faster than 7970 and up to 90% faster than GTX580.

In terms of raw performance that is far from a fail. Does that raw performance come at too large a die size? Sure, but that has been the case for all of Nvidia's big-die chips. What makes the 7970 such a big fail is that a 50-60% performance delta hasn't been seen since the days of G80; ever since then, Nvidia has taken compute hits to push its Tesla line while AMD didn't, which shrank the gap all the way down to ~20% between the 5870 and GTX480. Now that AMD has taken the same compute hit as Nvidia, the 50-60% larger die will have 50-60% better performance...
 
Is that some new kind of math? For gk110 to be 90% faster than the 580 it would need to be a lot more than 40% faster than the 680. The 680 is not 66% faster than the 580 (not in general anyway).
 
Is NVidia measuring current and volts or just current and assuming volts? And how does current off-die tell you about the heating effects of that current on die?

I've not seen a reference to what measurements Nvidia uses.
Off-die measurements won't give much detail on localized thermal effects, but Nvidia's tech is described as taking thermal data into account as well.
 
It's more accurate data, but it's tens of millions to a hundred million cycles out of date given the latency of the off-die control loop.
In more relatable terms, a measurement can be several frames out of date.

The comparison is between having an accurate thermometer that gives you last week's temperature and a somewhat less precise model that tells you how hot it is today.

A more advanced scheme might be a less conservative counter-based system with feedback from voltage and current measurements to tamp it down if it gets too aggressive.

Possibly better would be a version of the initial non-digital Foxton that Intel almost used; or, if AMD or Nvidia ever manage to get on-die voltage control, the loop could be sped up significantly.
That's not necessarily important. It all depends upon how rapidly heat is dissipated. If heat leaves the die more slowly than the measurement loop runs, then it really does not matter that there is a significant delay there: any current fluctuations on time scales shorter than the thermal time constant are going to get averaged together anyway.

I'd be really, really surprised if the heat dissipation rate was fast enough for short-term current fluctuations to make a significant difference to die temperature.
 
That's not necessarily important. It all depends upon how rapidly heat is dissipated. If heat leaves the die more slowly than the measurement loop runs, then it really does not matter that there is a significant delay there: any current fluctuations on time scales shorter than the thermal time constant are going to get averaged together anyway.

I'd be really, really surprised if the heat dissipation rate was fast enough for short-term current fluctuations to make a significant difference to die temperature.
I suspect that would make things worse. If the chip is measured as being very close to the limit at step N, there's a chance that it will go over the limit for some amount of time prior to step N+1.
The slower heat moves off-chip, the faster it accumulates from transient activity spikes, which is risky if regions of the chip are already toeing the line and the time steps are long.

Part of the weakness of the long control loop is that the time periods in question are long enough to be thermally significant from the POV of the cooling solution. You can't just say everything averages together, because spikes can last longer than what would be considered transient.

All of this can be avoided by inserting a decent guard band, which Nvidia has done.
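For what it's worth, both sides of this exchange are easy to illustrate with a toy first-order thermal model: millisecond spikes disappear into the thermal mass, while a spike lasting a good fraction of the thermal time constant very much does not. All the numbers below are invented, not measured from any real card.

// Toy first-order thermal model: die temperature follows power through a
// single RC time constant. Made-up numbers, purely for illustration.
#include <cstdio>

int main()
{
    const double dt        = 0.001;   // 1 ms simulation step
    const double tau       = 0.5;     // thermal time constant, seconds
    const double r_thermal = 0.25;    // degrees C per watt, die to ambient
    const double ambient   = 40.0;

    double temp = ambient + 150.0 * r_thermal;   // steady state at 150 W

    for (int ms = 0; ms < 2000; ++ms) {
        double power = 150.0;
        if (ms % 100 == 0)           power = 300.0;   // 1 ms spike: negligible
        if (ms >= 1000 && ms < 1400) power = 300.0;   // 400 ms spike: not negligible

        double target = ambient + power * r_thermal;
        temp += (target - temp) * dt / tau;           // first-order lag

        if (ms % 200 == 0)
            printf("t=%4d ms  P=%5.0f W  T=%5.1f C\n", ms, power, temp);
    }
    return 0;
}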
 
Is that some new kind of math? For gk110 to be 90% faster than the 580 it would need to be a lot more than 40% faster than the 680. The 680 is not 66% faster than the 580 (not in general anyway).

I take it you have never heard of compounding...

7970 > 580 by 20%
680 > 7970 by 15%
GK110 > 680 by 40%

580 = 100
7970 = 120
680 = 138
GK110 = 193

Don't get me wrong, that is a best-case scenario, which is why I said "up to"; chances are it will be more like 80-85%...
 
Err, the difference between 680 & 7970 is hardly 15%, unless you're only counting low resolutions.
TPU has the largest game selection of any review site, and the 680 comes out only 7.5% faster than the 7970 even at a mere 1920x1200, while at 2560x1600 the difference shrinks to a mere 4%. Even if you took the 7.5% figure, it would already drop the GK110 number by about 13 points, to ~180% of the 580.
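Just to put the compounding from the two posts above in one place (the 580 = 100 baseline, the claimed 20% and 15% deltas, the TPU 7.5% alternative, and the rumoured 40% for GK110 are all taken from the posts, nothing else):

// Compounding the claimed deltas against a GTX 580 = 100 baseline.
#include <cstdio>

int main()
{
    const double base_580  = 100.0;
    const double tahiti_up = 1.20;    // 7970 over 580
    const double gk110_up  = 1.40;    // rumoured GK110 over GK104

    const double gk104_best = 1.15;   // 680 over 7970, optimistic figure
    const double gk104_tpu  = 1.075;  // 680 over 7970, TPU at 1920x1200

    printf("GK110, 15%% case:  %.1f\n", base_580 * tahiti_up * gk104_best * gk110_up);  // ~193
    printf("GK110, 7.5%% case: %.1f\n", base_580 * tahiti_up * gk104_tpu  * gk110_up);  // ~181
    return 0;
}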
 
If it is 40% faster than GK104 it makes GK110 around 50-60% faster than 7970 and up to 90% faster than GTX580.
(1) We're talking about performance @2560x1600 here - as that's the setting upon which that ominous slide is based (possibly because the performance delta between GTX580 and GTX780 won't be that big at lower resolutions - always assuming that the slide actually is real, of course). GK104 is about as fast as HD7970 in that scenario.

(2) So we're talking about a 40% performance advantage over Tahiti - which should end up about in line with the expected difference in die size. So we can speculate that GK110 might yield about the same perf/mm² as Tahiti. That's a great improvement for Nvidia (who always struggled in that respect) - but no humiliation for AMD.

(3) 40% better performance than Tahiti is just within reach of an upcoming HD8970 (or an HD7870 crossfire solution, for that matter) - which again brings us back to the timing problem. Even if GTX780 launches very early this summer - it's still at least 6 months behind Tahiti. By the time GTX780 is out, HD8970 won't be too far away.

I'm not saying that Gk110 won't be an impressive product. I'm just trying to point out that some people seem to apply double standards.
 
Some people consider that a complete and utter fail.

It's all about expectations. I've met women who were floored because I opened a door for them or asked if they got home ok cause their expectation is that all men are douchebags (as if opening doors changes that fact :LOL: ).

People expect nVidia to make big, power hungry compute focused chips with relatively low gaming efficiency because nVidia has made its priorities blatantly clear. This is why Kepler was a positive surprise - it broke that expectation. Pitcairn is equally impressive yet got nowhere near the same reaction because it "only" met expectations.
 
It's not like most reviews didn't test the 580 in the same benchmarks.
Skip compounding the error with an extra comparison to a dissimilar architecture and just go 580-680-780.
It's handwaving within handwaving already.
 
It's all about expectations. I've met women who were floored because I opened a door for them or asked if they got home ok cause their expectation is that all men are douchebags (as if opening doors changes that fact :LOL: ).

People expect nVidia to make big, power hungry compute focused chips with relatively low gaming efficiency because nVidia has made its priorities blatantly clear. This is why Kepler was a positive surprise - it broke that expectation. Pitcairn is equally impressive yet got nowhere near the same reaction because it "only" met expectations.
So you are saying women opening doors for women will not earn them any brownie points? Damn sexism or prejudice for that matter.
 
I just laughed as hard at this as when people were insisting that the 7970 beats the 580 by 50%, when it was, at most, 15-20%. Go go inflated numbers!
Both GTX680 and HD7970 are about 30% faster than GTX580 @2560x1600.

Some review sites say a little less (e.g. techpowerup.com @ 27%), some say a bit more (e.g. computerbase.de @ 33%). But ~30% should be a very good number to go by.

Lower resolutions are a totally different story, of course.
 
w00t warp shuffling :runaway:

[screenshot: desktop_2012_03_27_22wif3f.png]

http://forums.nvidia.com/index.php?showtopic=225312&st=40
http://developer.download.nvidia.com/compute/cuda/4_2/rc/toolkit/cudatoolkit_4.2.6_win_64.msi
 
There are swizzling ops indicated for GCN that are LDS instructions which don't actually consume any LDS, so they must be using the crossbar logic to move data between lanes.
That might be similar to, or a subset of, Nvidia's instruction.
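For anyone who hasn't poked at it yet: the shuffle ops let the lanes of a warp exchange registers directly, without a shared-memory round trip. A minimal sketch of a warp-wide sum might look like the kernel below; the kernel name and launch shape are made up for the example, and __shfl_down is the CUDA 4.x-era spelling (later toolkits prefer __shfl_down_sync with a lane mask).

// Minimal warp-level sum reduction using Kepler's shuffle intrinsics.
// Requires compute capability 3.0 (sm_30).
#include <cstdio>

__global__ void warp_sum(const float *in, float *out)
{
    float v = in[threadIdx.x];                 // one value per lane

    // Each step pulls the register from the lane 'offset' positions above,
    // halving the stride: 16, 8, 4, 2, 1. No shared memory involved.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down(v, offset);

    if (threadIdx.x == 0)                      // lane 0 ends up with the total
        *out = v;
}

int main()
{
    float h_in[32], h_out = 0.0f;
    for (int i = 0; i < 32; ++i) h_in[i] = 1.0f;   // expected sum: 32

    float *d_in, *d_out;
    cudaMalloc(&d_in,  sizeof(h_in));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);

    warp_sum<<<1, 32>>>(d_in, d_out);          // a single warp

    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}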
 
Is there something particularly taxing on LuxMark that's not present in these tests?

Please keep in mind that the LuxMark kernel is several thousand lines long while a Mandelbrot kernel is just 10-20 lines of code. The kinds of load and bandwidth requirements are simply too different to be compared.

Blender/Cycles (a CUDA path tracer) users are reporting the same kind of results shown by LuxMark: the 580 is faster than the 680. So it doesn't look like an OpenCL-specific problem.
 