NVIDIA Kepler speculation thread

Let's assume the 7970 overclocks better and is faster at maximum overclock. That still has no bearing on the validity of GPU boost as a stock feature of the 680. If I'm reading you correctly, you're saying that comparing the two cards at stock isn't fair and the only fair way to compare them is to overclock them to the max first?

I think these posts in that thread sum it up nicely.




It's just as likely that they get cards that boost higher than the ones in reviews. Variance goes both ways.

So you admit, buying a 680 is a crap shoot.
 
I think you mistakenly quoted me instead of someone else. I didn't make any mention of stock clocks. However, having posted about overclocking/boosting, I did ask whether reviewers verified what the boost frequency was in-game, since it can be higher, and boosting the GPU can have an effect on frame rates. :D

At least one did.

gpuboost.png


So you admit, buying a 680 is a crap shoot.

Show me the standard deviation of boost clocks across a large number of card samples in each game, then we can talk. Btw, question to the skeptical: Isn't Powertune subject to the same chip and TDP variances?
 
Yes, that's why I'm not complaining about my laptop drivers specifically (that is, I'm not complaining about not being able to update them). Though I admit that my experience with more recent ATI drivers may be somewhat colored by the fact that these drivers are rather old; they lack features that I had become used to with nVidia drivers for years previously.
Then I'm misunderstanding something, because you just got done saying:
Well, that's good that they've fixed that. But like I said, I have an ATI laptop right now. The main problem is that I can't use drivers newer than about three years in Windows. And those three-year-old drivers do not have support for display scaling customization, or most of the other options that I have become accustomed to with nVidia drivers.

These two statements appear quite contradictory. You say you have no complaint, but you said that you can't update drivers, sooo... Don't you have a complaint? Can you understand why a few of us are having trouble following?

I am going to be with only my laptop for a few weeks soon, though, so perhaps I'll see about taking the time to figure out how to install some more up-to-date drivers. At least that will give me a better picture of the current status.
I think that's absolutely fair, and to be perfectly clear, I'm not actually going to say that your opinion is going to improve ;) To assist you, go find the direct Akamai link for the ATI Mobility drivers; it doesn't have the piece of code that blocks the specific IHVs. For example, it works fine on my Lenovo Y460's switchable 5650 (one of the blacklisted devices).

Now, in Linux, where I spend by far the most of my time on the laptop....
You need not say more, as I already know ATI blows nuts in Linux. If the laptop you were mentioning above is a purely Linux device, it's no wonder you hate them! If it's a dual-boot Win/Linux box, then at least the Windows side will probably get better with the newer driver... Hopefully... ;)

I cannot even begin to explain why ATI/AMD sucks so bad with Linux drivers, but it's obvious that they simply don't care -- I cannot fathom any other reason why it would have sucked for this long. NV is absolutely the only vendor to use if you're using Linux of pretty much any flavor.
 
Isn't Powertune subject to the same chip and TDP variances?
If this affected benchmarks, I would have a problem with it, but it doesn't.
If I look at a benchmark of an HD 7xxx, I know the card I buy will perform the same.
If I look at a benchmark of a 680, it may not represent what I am buying.
 

Thanks for the link. So now we need to know if there's a deviation within a deviation between cards. :p But it's clear that the GPU is boosting above 1058 MHz. Nothing wrong with that, but I would like to know exactly what the card is doing when gaming. Perhaps at some future date we can establish a standard boost range, since 1058 MHz is not a fixed boost clock. Having said that, 1006 MHz + 91 MHz is a long way from the 1058 MHz that I originally thought most of those results were based on.

Now it's clear that GPU boost goes well above 1058 MHz from the 1006 MHz base, and has been reported as high as 1100 MHz. If a certain game never sees a GPU boost clock below 1058 MHz, will people see this as just overclocking? Even though it never sustains a fixed frequency, I'd like to see follow-up reviews, if possible, to see whether there is more than a 2 FPS difference between GPU boost and the stock base frequency of 1006 MHz.
 
Thanks for the link. So now we need to know if there's a deviation within a deviation between cards. :p But it's clear that the GPU is boosting above 1058 MHz. Nothing wrong with that, but I would like to know exactly what the card is doing when gaming. Perhaps at some future date we can establish a standard boost range, since 1058 MHz is not a fixed boost clock. Having said that, 1006 MHz + 91 MHz is a long way from the 1058 MHz that I originally thought most of those results were based on.

The boost clock is pretty confusing. It's not the maximum allowed clock; it's the halfway point. There are 8 boost steps of 13 MHz each, similar to Intel's discrete turbo steps. So the boost range on a stock 680 is actually fixed, with a maximum of 1110 MHz = 1006 MHz + (13 MHz x 8). I think nVidia made a mistake in marketing the 1058 MHz boost clock; they should have just done base + turbo like Intel.
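To make the arithmetic explicit, here's a minimal sketch of the bin layout as described above (the 13 MHz step and 8 bins come from this post, not from any official NVIDIA documentation):

```python
# Boost bins for a stock 680 as described above (figures from this thread,
# not an official spec): 1006 MHz base, 13 MHz bins, 8 bins above base.
BASE_MHZ = 1006
STEP_MHZ = 13
NUM_STEPS = 8

bins = [BASE_MHZ + STEP_MHZ * i for i in range(NUM_STEPS + 1)]
print(bins)        # [1006, 1019, 1032, 1045, 1058, 1071, 1084, 1097, 1110]
print(bins[-1])    # 1110 MHz ceiling = 1006 + 13 x 8
print(bins[4])     # 1058 MHz, the advertised boost clock, sits at step 4 of 8
```

Incidentally, that also lines up with the 1097 MHz readings mentioned earlier in the thread: one bin below the ceiling.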

Boost taken on its own is fine but IMO it's the combination of all the power management features that really matters. Adaptive vsync and/or frame rate limiting are good for people who want to keep their cards as quiet as possible without sacrificing performance.
 
My sample stayed at a mostly constant 1097 MHz in Alan Wake, Anno 2070, Batman AC, Battlefield 3, Bulletstorm, Civilization V, Crysis 2, F1 2011, and Metro 2033.

It went a bit lower on average in Total War Shogun 2 (1071 with FXAA and 1058 with MSAA 4x).

I'm not a big fan of non-deterministic performance. To be fair, we still have to see whether the performance difference between samples is significant, and we should not forget that while PowerTune enables deterministic performance, it is non-deterministic on the noise side ;)
 
The boost clock is pretty confusing. It's not the maximum allowed clock; it's the halfway point. There are 8 boost steps of 13 MHz each, similar to Intel's discrete turbo steps. So the boost range on a stock 680 is actually fixed, with a maximum of 1110 MHz = 1006 MHz + (13 MHz x 8). I think nVidia made a mistake in marketing the 1058 MHz boost clock; they should have just done base + turbo like Intel.

Boost taken on its own is fine but IMO it's the combination of all the power management features that really matters. Adaptive vsync and/or frame rate limiting are good for people who want to keep their cards as quiet as possible without sacrificing performance.


So Adaptive vsync keeps GPU boost in check then. I take it that a 1110 MHz GPU boost isn't always an ideal clock rate to have; for example, you don't need that at the main menu of a game. But is this all on-chip or software-driven?
 
It can be, but current implementations on HD 7000 and HD 6000 are not. Read from here, in this thread.

I have another question: why are you not using PowerTune for Turbo on top of power-capping? Most of the time, Tahiti is probably operating under its estimated TDP at 925MHz.

Couldn't you set a maximum clock of, say, 1050 MHz, with 925 MHz as the base clock? Most games would probably run somewhere around 1000 MHz, Furmark and OCCT would throttle down to lower values just as they do now, and perhaps a couple of games would dip slightly below 900 MHz, just as they do now too. In effect, this would be exactly the same as what AMD is doing on the CPU side: advertising a base clock, and offering deterministic Turbo on top of it.

The only downside I can see is that some chips might not make the cut, so yields might suffer a little bit. Is there something else I'm missing?
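For illustration, here's a toy model of that proposal; the 250 W budget, the 25 MHz step, and the linear power-vs-clock scaling are all made-up assumptions, not anything AMD has described. The point is only that the chosen clock is a pure function of the workload's binned, worst-case power estimate, so it stays deterministic across samples:

```python
# Toy model of deterministic turbo on top of a power cap. All figures are
# illustrative assumptions (not AMD's): 925 MHz base, 1050 MHz cap, 25 MHz
# steps, 250 W budget, and power assumed to scale linearly with clock.
BASE_MHZ, CAP_MHZ, STEP_MHZ = 925, 1050, 25
BUDGET_W = 250

def deterministic_turbo(estimated_power_at_base_w):
    """Highest clock whose scaled power estimate still fits in the budget."""
    clock = BASE_MHZ
    for candidate in range(BASE_MHZ, CAP_MHZ + 1, STEP_MHZ):
        if estimated_power_at_base_w * candidate / BASE_MHZ <= BUDGET_W:
            clock = candidate
    return clock

print(deterministic_turbo(180))  # light game: 1050 (runs at the cap)
print(deterministic_turbo(245))  # heavy load: 925 (stays at the base clock)
```

Chips whose worst-case power at the capped clock didn't fit the budget would be exactly the ones that "don't make the cut" in the yield concern above.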
 
I have another question: why are you not using PowerTune for Turbo on top of power-capping? Most of the time, Tahiti is probably operating under its estimated TDP at 925MHz.

I think it's safe to assume that AMD's version will be coming sooner rather than later.

So AMD offers their own "power-boost" or whatever and we're back to square one - except we don't have the same level of control over our GPUs as we used to have, and buying a card is more of a lottery.
 
I think it's safe to assume that AMD's version will be coming sooner rather than later.

So AMD offers their own "power-boost" or whatever and we're back to square one - except we don't have the same level of control over our GPUs as we used to have, and buying a card is more of a lottery.
I don't see how power-boost reduces control over a GPU. As Anandtech's review points out, there are already overclock utilities available which modify how powerboost operates:
http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/4

This seems to me to simply change how overclocking is done. It doesn't seem to reduce user control, just offer a new mechanism for user control.
 
I think (and I could be wrong) that PowerTune would already technically work for what NV is doing with GPU boost, except the result would be 'in reverse'. You could set a speed limit of, say, 1200 MHz, but then a maximum power cap of 195 W. As the application encounters a case where 1200 MHz uses too much power, the card would begin ratcheting down the clock speed.
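Something like this minimal sketch, reusing the 1200 MHz limit and 195 W cap from the example above (the 13 MHz step, the 700 MHz floor, and the polling-loop structure are just illustrative assumptions):

```python
# Sketch of 'PowerTune in reverse': start from a speed limit and ratchet the
# clock down whenever measured power exceeds the cap, back up when there is
# headroom. Step size and limits here are illustrative only.
POWER_CAP_W = 195
MAX_CLOCK_MHZ = 1200
MIN_CLOCK_MHZ = 700
STEP_MHZ = 13

def next_clock(current_mhz, measured_power_w):
    """One control step: clock down if over the cap, up if under it."""
    if measured_power_w > POWER_CAP_W:
        return max(MIN_CLOCK_MHZ, current_mhz - STEP_MHZ)
    return min(MAX_CLOCK_MHZ, current_mhz + STEP_MHZ)
```

Run that every control interval and the clock settles wherever the workload's power crosses the cap, which is the 'reverse' behaviour described above.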

As I perceive it, ATI's method was primarily used to avoid exceeding TDP in order to avoid damaging the ASIC or the VRMs (last I recall), whereas NV's method is about making sure you use up all the headroom your TDP could potentially allow for. Conceptually I like NV's method better, but I do not like how they implemented it.

Oh well. My head hurts now...
 
It can be, but current implementations on HD 7000 and HD 6000 are not. Read from here, in this thread.

Ah, so AMD chose deterministic performance by going with conservative clocks and allowing power consumption to vary. nVidia chose deterministic power consumption and allowed performance to vary. I'm not really sure why DK considers AMD's DVFS "more advanced" though. Is it just based on update frequency? Would like to understand that better.

It's interesting because with the frame rate limiter nVidia is also offering deterministic performance with the additional benefit of much lower power consumption in lightweight titles or with vsync enabled. Does Powertune do something similar and reduce clocks/voltages under light load?

framelimiter.png
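Conceptually a frame rate limiter is just sleeping away the leftover frame time, which is why it saves so much power in lightweight titles. A bare-bones sketch of the idea (not nVidia's actual implementation):

```python
import time

TARGET_FPS = 60
FRAME_BUDGET_S = 1.0 / TARGET_FPS

def limited_frame(render):
    """Render one frame, then sleep off whatever is left of the frame budget."""
    start = time.perf_counter()
    render()                      # app/driver does its per-frame work here
    leftover = FRAME_BUDGET_S - (time.perf_counter() - start)
    if leftover > 0:
        # Instead of racing ahead to hundreds of FPS, the GPU sits idle here,
        # which is where the power savings in light workloads come from.
        time.sleep(leftover)
```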
 
I think it's safe to assume that AMD's version will be coming sooner rather than later.

So AMD offers their own "power-boost" or whatever and we're back to square one - except we don't have the same level of control over our GPUs as we used to have, and buying a card is more of a lottery.
So go ahead and set the Power Limit to -30% and disable GPU Boost. Take the free 9.1% performance improvement and throw it in the toilet. Will you then be happy?

http://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_680/30.html
 
Ah, so AMD chose deterministic performance by going with conservative clocks and allowing power consumption to vary. nVidia chose deterministic power consumption and allowed performance to vary. I'm not really sure why DK considers AMD's DVFS "more advanced" though. Is it just based on update frequency? Would like to understand that better.

It's interesting because with the frame rate limiter nVidia is also offering deterministic performance with the additional benefit of much lower power consumption in lightweight titles or with vsync enabled. Does Powertune do something similar and reduce clocks/voltages under light load?

framelimiter.png

AMD's DVFS can be set to offer deterministic performance, has a much lower response time, and doesn't require off-die sensors.

I don't think AMD has any kind of framerate-limiting feature (apart from V-sync) but PowerTune can be adjusted between -50% and +50%. On the HD 7970, that means you can cap power at 125W for games that aren't too demanding.
 
AMD's DVFS can be set to offer deterministic performance, has a much lower response time, and doesn't require off-die sensors.

I was more curious about the end result. If I understand correctly, PowerTune runs off a static table of mappings between utilization and clocks. I'm finding it hard to grasp how that's "more advanced" than real-time power and temperature monitoring.

I don't think AMD has any kind of framerate-limiting feature (apart from V-sync) but PowerTune can be adjusted between -50% and +50%. On Tahiti XT, that means you can cap power at 125W for games that aren't too demanding.

Different games would need different PowerTune settings; you can't really expect people to figure out what those are and set them per game. With the nVidia approach you set the framerate target and the card does everything else. The more I think about it from an end-user perspective, what nVidia is doing makes a whole lot of sense, with a lot less hassle (and, admittedly, less transparency) for the consumer.
 
Ah, so AMD chose deterministic performance by going with conservative clocks and allowing power consumption to vary.
Determinism and "conservative clocks" have nothing to do with each other. You still have to know what your binned clock speed is for any SKU at each voltage level.

nVidia chose deterministic power consumption and allowed performance to vary. I'm not really sure why DK considers AMD's DVFS "more advanced" though. Is it just based on update frequency? Would like to understand that better.
NVIDIA's solution is based on board circuitry looking at the input current drawn and reacting. PowerTune is an algorithmic approach that calculates an inferred power for any given cycle by looking at activity counters across the chip.
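A crude illustration of the "inferred power" idea; the counter names and weights below are invented for illustration, and the real model is presumably calibrated per SKU:

```python
# Toy version of an activity-counter power model. Counter names and weights
# are invented for illustration. The idea: estimate power from digital
# activity instead of measuring it at the board inputs.
IDLE_POWER_W = 40.0
WEIGHTS_W_PER_EVENT = {
    "alu_active_cycles": 0.9e-6,
    "tex_fetches":       1.5e-6,
    "mem_transactions":  2.0e-6,
}

def inferred_power(counters):
    """Estimate chip power for one interval from weighted activity counters."""
    dynamic = sum(w * counters.get(name, 0)
                  for name, w in WEIGHTS_W_PER_EVENT.items())
    return IDLE_POWER_W + dynamic

sample = {"alu_active_cycles": 120_000_000,
          "tex_fetches": 30_000_000,
          "mem_transactions": 25_000_000}
print(inferred_power(sample))   # 243.0 W for this made-up interval
```

Because the estimate depends only on digital activity, every chip of a given SKU running the same workload infers the same power, which is where the deterministic behaviour comes from; the trade-off is that the model presumably has to be conservative enough to cover chip-to-chip leakage variation.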
 
It's not dual issue. They have had dual issue since gf104 and dual issue isn't hard to get right with a decent ISA, which they should have.

With GF104's dual-issue there was still intra-warp register dependency tracking in hardware. I was referring to the new compiler.

GCN and gk104 have the same reg file size, virtually same L1/LDS, almost same clocks, virtually same mem clocks but gk104 has 3x more compute. One of these two designs is unbalanced wrt latency hiding.

Not only that, but GK104 probably has a lot more bubbles in its ALU pipeline, especially in workloads that nVidia hasn't optimized for. With GF104 we saw lower compute performance due to a lack of ILP. With GK104 that's compounded by compiler limitations. Given those results I wonder if either dual-issue or static scheduling will make an appearance in big Kepler.

I'm not good at this game but will take a shot at a random guess for GK110:

GPC: 4
SMX per GPC: 4
Scheduler (hardware) per SMX: 4
32-wide SIMD per SMX: 4
L/S per SMX: 32
SFU per SMX: 32
TMU per SMX: 8
Register per SMX: 64k 32-bit entries
L1 per SMX: 64KB
DP throughput: 1/2 SP
Bus-width: 512-bit
ROPs: 64
L2 cache: 1MB
TDP: 250 W
Die-size: 425-475 mm^2
Clock: ~850 MHz
Gaming Perf: GK104 + 25%

I'm not expecting any huge architectural differences because it's still Kepler. My imaginary chip should net the promised 2-2.5x perf/W over the best Fermi Tesla parts.
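As a rough sanity check on that perf/W figure, here's the back-of-the-envelope math using the guessed specs above and taking Tesla M2090 (roughly 665 DP GFLOPS at 225 W) as the Fermi baseline:

```python
# Back-of-the-envelope DP perf/W check for the guessed GK110 above, against
# Tesla M2090 (~665 DP GFLOPS, 225 W). All GK110 figures are the guesses
# from this post, not real specs.
GPC, SMX_PER_GPC, SIMD_PER_SMX, LANES = 4, 4, 4, 32
CLOCK_GHZ = 0.85
TDP_W = 250

sp_lanes = GPC * SMX_PER_GPC * SIMD_PER_SMX * LANES   # 2048 ALUs
sp_gflops = sp_lanes * 2 * CLOCK_GHZ                  # FMA counts as 2 FLOPs
dp_gflops = sp_gflops / 2                             # 1/2-rate DP

ratio = (dp_gflops / TDP_W) / (665 / 225)
print(round(dp_gflops), round(ratio, 2))   # ~1741 DP GFLOPS, ~2.36x M2090 perf/W
```

So the guess lands inside the promised 2-2.5x window, at least on paper.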
 
Determinism and "conservative clocks" have nothing to do with each other. You still have to know what your binned clock speed is for any SKU at each voltage level.

If you need to guarantee that all 7970s run at 925 MHz in all games (deterministic), then of course you have to be conservative. nVidia doesn't guarantee that any 680 will run at 1110 MHz in any game on a given day of the week (non-deterministic).

NVIDIA's solution is based on board circuitry looking at the input current drawn and reacting. PowerTune is an algorithmic approach that calculates an inferred power for any given cycle by looking at activity counters across the chip.

I'm still not getting how "inferred power consumption based on chip utilization" is a whizzbang advanced approach. Sitting on the outside, it simply looks like "guessing" versus the direct power consumption readings nVidia is doing. I understand the differences; I'm just not getting the "more advanced" part....
 