NVIDIA Kepler speculation thread

The "dynamic clock control"-thingy needs serious investigations, ie which scenarios affect it (ie, is it really only load related, or is there app detection or some such involved, too)
 
The CUDA core clock domain (i.e. the "CUDA cores" themselves) will not stay locked to the core clock. It will clock itself independently, all the way up to 1411 MHz when the load is at 100%.
It was Fermi that actually introduced the fixed 2:1 clock ratio for the ALU domain. All of NVIDIA's previous architectures used a shader clock that wasn't tied to a fixed ratio of the core clock; it was exposed to the user and adjustable within a predefined range.
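To make the difference concrete (just a toy sketch -- the clock values are examples, not confirmed Kepler specs):

```python
# Toy sketch only: clock values are illustrative, not confirmed Kepler specs.

def fermi_shader_clock(core_mhz):
    """Fermi-style: ALU (shader) domain hard-locked to 2x the core clock."""
    return 2.0 * core_mhz

def independent_shader_clock(load, min_mhz=1000.0, max_mhz=1411.0):
    """Hypothetical independently clocked ALU domain that scales with load."""
    load = max(0.0, min(1.0, load))
    return min_mhz + load * (max_mhz - min_mhz)

print(fermi_shader_clock(772))         # GTX 580: 772 MHz core -> 1544 MHz shaders
print(independent_shader_clock(1.0))   # 100% load -> the rumoured 1411 MHz ceiling
```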
 

There seems to be growing agreement that we're looking at ~1500 cores plus hot clocks. We're missing something important about those cores -- that's a huge jump in count, so something must have changed, no?

Also, aside from more complex heat and power management, I'm intrigued by the 'between fxaa and msaa' hint from one of the previous stories. Was that a mistranslation, or is there some new AA mode in the offing?

-Dave
 
Alas this is going to cause quite some user confusion until folks can understand how it really works.
 
The "dynamic clock control"-thingy needs serious investigations, ie which scenarios affect it (ie, is it really only load related, or is there app detection or some such involved, too)

That, and the overclockability too.

It would've been nice if AMD allowed the TDP slider control to be adjusted by more than just 20%. 30-40% would've been nice for most cards, should one wish to over-volt the card and overclock the hell out of it without some clock throttling.

I'm really hoping that NV isn't going to make genuine overclocking even more complicated in every scenario - the way my GTX 460 1GB resets its clocks whenever I push them too high in some games (stressing, "some" games) is annoying as hell. I miss the old days when I'd just see the artifacts, without the drivers resetting the clocks.
 
It would've been nice if AMD allowed the TDP slider control to be adjusted by more than just 20%. 30-40% would've been nice for most cards, should one wish to over-volt the card and overclock the hell out of it without some clock throttling.
Overvolting won't make a difference to PowerTune, so you can already do that without changing the implied limits. Although voltage could be a variable parameter in the PowerTune calculations, it has been implemented as a constant, because PowerTune is tuned to be deterministic across the whole range of ASICs out there, so it assumes the worst case.
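To sketch what that means (rough illustration only; the coefficients and the worst-case voltage below are invented, not AMD's actual numbers): dynamic power scales roughly with C·V²·f·activity, and if the voltage term is pinned at a fixed worst-case value, the same measured activity always produces the same power estimate, no matter how a particular chip leaks or is overvolted.

```python
# Rough illustration of a deterministic power estimate with a pinned worst-case voltage.
# All coefficients and the voltage are invented, not AMD's real PowerTune numbers.

WORST_CASE_V = 1.20   # assumed constant; the real (possibly overvolted) rail voltage is ignored
C_EFF = 2.5e-9        # effective switched capacitance, arbitrary units

def estimated_power(activity, freq_hz):
    """P_dyn ~ C * V^2 * f * activity, with V fixed at the worst case."""
    return C_EFF * WORST_CASE_V ** 2 * freq_hz * activity

# Two boards at different real voltages report the same estimate for the same work,
# which is what keeps the throttling behaviour deterministic across ASICs.
print(estimated_power(activity=0.8, freq_hz=925e6))
```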
 
That techpowerup article says there are dozens of power planes... does that mean different parts of the chip will use different voltages?
 
That techpowerup article says there are dozens of power planes... does that mean different parts of the chip will use different voltages?
But not dozens of different voltages. That is clearly wrong, or just some kind of typo. Maybe they meant clock domains, or the number of individually power-gated domains. Or it could mean there are a lot of power plans (without the "e"), one for each possible combination of clocks in the different clock domains.
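If it really is "plans" in that last sense, the count multiplies up quickly even with very few domains. A quick illustration with invented step counts:

```python
# Hypothetical illustration: "power plans" as combinations of per-domain clock steps.
# Domain names and clock values are invented.
from itertools import product

clock_steps_mhz = {
    "ROPs": [500, 700, 900],
    "TMUs": [500, 700, 900],
    "ALUs": [900, 1100, 1300, 1411],
}

plans = list(product(*clock_steps_mhz.values()))
print(len(plans))   # 3 * 3 * 4 = 36 combinations -- "dozens" from just three domains
```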
 
It probably points to the granularity of this solution, so for instance not only 100 MHz steps but smaller ones. It wouldn't make sense to clock dozens of chip parts differently, would it? How many different domains could there be? ROPs, TMUs, ALUs... that's three.
 
Funny, I initially read "plans" and obviously missed the "e". Good thing they didn't go for power plants instead :LOL:

It probably points to the granularity of this solution, so for instance not only 100 MHz steps but smaller ones. It wouldn't make sense to clock dozens of chip parts differently, would it? How many different domains could there be? ROPs, TMUs, ALUs... that's three.

If there's any merit to it, it sounds more like one domain for the ROPs/TMUs and the rest of the enchilada, another for the rasterizers/tri-setup, one for the ALUs, and then possibly some others for whatever combinations are possible.
 
@Gipsel That does seem a lot more realistic.

Though it does seem like being able to trade leakage against dynamic power at sub-chip granularity should have some power benefits. But I'm saying that without any idea what the costs of making the voltage tunable at that scale are.
 
Individual partitions are power gated (and a bunch of them together can be rail gated since they share the rail). But you won't have different voltages for each partition since the number of power rails itself is not going to be very large.
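For what it's worth, a tiny hypothetical model of that distinction (partition names, rail names and voltages are all made up): many partitions, each individually gateable, but mapped onto only a few rails, so only a few distinct voltages exist at any one time.

```python
# Hypothetical model of partitions vs. power rails; all names and values invented.

rails = {"core_rail": 0.95, "alu_rail": 1.05}   # few rails -> few distinct voltages

partitions = {
    "raster":  {"rail": "core_rail", "gated_off": False},
    "rop_l2":  {"rail": "core_rail", "gated_off": False},
    "sm_0":    {"rail": "alu_rail",  "gated_off": False},
    "sm_1":    {"rail": "alu_rail",  "gated_off": True},   # idle partition is power-gated
}

for name, p in partitions.items():
    v = 0.0 if p["gated_off"] else rails[p["rail"]]
    print(f"{name}: {v:.2f} V")   # every partition sees one of only two voltages, or none
```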
 
@Ailuros - I went back to the original article, and it does indeed talk about "power plans" (whatever that is), not planes - so much for my reading comprehension skills.

@vking - thanks. I tried reading up about it a bit more, and it looks like you would need different VRMs on the PCB for each different on chip voltage. If I'm not mistaken, it looks like AMD used 2 voltage planes on some K10s (marketed as Dual Dynamic Power Management) 4-5 yrs ago. I wonder if they are doing this on their GPUs as well...
 
Alas this is going to cause quite some user confusion until folks can understand how it really works.

If there are multiple clock domains and each can fluctuate independently then I'm sure everybody will be confused, not just end users. Best of luck to reviewers trying to figure out what's happening under the hood.
 
The way it might work is (a) workload determines voltage (within min/max bounds, of course), (b) voltage determines frequency, assuming we didn't hit the thermal limit; if we do hit the thermal limit, the result is voltage/frequency throttling.

So my guess would be that for any given load, the minimum spec'ed frequency is guaranteed except when thermal throttling is necessary. Barring a power-virus situation, minimum performance is guaranteed.

Essentially this would be closed-loop overclocking (assuming my guess about how they are doing this is correct - I don't claim any real info, just a guess), and if done right it will result in a very nice performance boost over the spec.
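To put the guess in concrete terms (a minimal sketch only, assuming exactly the load-to-voltage-to-frequency loop described above; none of the limits or numbers are real Kepler figures):

```python
# Minimal sketch of the guessed closed loop. All limits, tables and numbers are invented.

V_MIN, V_MAX = 0.90, 1.10      # volts
POWER_LIMIT_W = 195.0          # stand-in for the thermal/TDP limit
BASE_FREQ_MHZ = 1000.0         # the "minimum spec'ed" frequency guaranteed below the limit

def target_voltage(load):
    """(a) workload determines voltage, within min/max bounds."""
    return V_MIN + max(0.0, min(1.0, load)) * (V_MAX - V_MIN)

def freq_for_voltage(v):
    """(b) voltage determines frequency (simple linear stand-in)."""
    return BASE_FREQ_MHZ + (v - V_MIN) / (V_MAX - V_MIN) * 411.0

def next_state(load, power_w):
    v = target_voltage(load)
    f = freq_for_voltage(v)
    if power_w > POWER_LIMIT_W:        # thermal/power limit hit: throttle below spec
        v, f = V_MIN, BASE_FREQ_MHZ * 0.9
    return v, f

print(next_state(load=0.6, power_w=150.0))   # normal case: boosts above the base clock
print(next_state(load=1.0, power_w=210.0))   # power-virus case: throttled
```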
 
It would also be good if the workload measurement involved multiple kinds of units, as bottlenecks shift over the course of rendering a frame (e.g. a deferred shading pass probably has minimal geometry performance requirements compared to, say, rendering shadow buffers). Maybe one could allocate more of the power budget to the bottlenecked bits by up-clocking those, then down-clock the stuff in light use to compensate. No clue if this is what Kepler does or how feasible it is. Since you can't tweak the voltages all over the place, I'm assuming there's a fair bit of wiggle room for clocks even at a fixed voltage.
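A rough sketch of what that reallocation could look like (purely hypothetical: the domain names, cost model and budget are all invented, and nothing here is claimed about Kepler):

```python
# Purely hypothetical sketch: shift a fixed power budget toward the busiest domains
# and derive per-domain clocks from the share each domain receives.

POWER_BUDGET_W = 180.0
WATTS_PER_MHZ = {"geometry": 0.05, "alu": 0.08, "tex": 0.04}   # invented cost model

def allocate_clocks(utilisation):
    """Give each domain a budget share proportional to its utilisation,
    then turn that share into a clock via its (invented) W/MHz cost."""
    total = sum(utilisation.values()) or 1.0
    clocks = {}
    for dom, u in utilisation.items():
        share_w = POWER_BUDGET_W * (u / total)
        clocks[dom] = share_w / WATTS_PER_MHZ[dom]
    return clocks

# Deferred shading pass: geometry mostly idle, ALUs/TMUs busy -> they get the clocks.
print(allocate_clocks({"geometry": 0.1, "alu": 0.9, "tex": 0.7}))
# Shadow-buffer pass: geometry-heavy, shading light -> budget shifts the other way.
print(allocate_clocks({"geometry": 0.8, "alu": 0.2, "tex": 0.3}))
```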
 
Psurge,

I think GPUs are already sophisticated enough to do what you are suggesting. Even if a bunch of clock domains share the same rail (and hence run at the same voltage), there is still plenty of room to play with, such as reducing the effective frequency through pulse eating, changing dividers, etc. Dynamically reconfigurable PLLs are pretty common these days as well.
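For illustration, here's the basic arithmetic behind those tricks (a toy sketch with invented numbers, nothing vendor-specific): the PLL keeps running at a fixed rate, and the effective clock comes from swallowing a fraction of its pulses and/or dividing its output.

```python
# Toy sketch of effective-frequency control without retuning the PLL. Numbers invented.

PLL_MHZ = 1411.0

def effective_clock(pulses_passed, pulses_total, divider=1):
    """Pulse eating: pass only N of every M clock pulses; optionally divide the result."""
    return PLL_MHZ * (pulses_passed / pulses_total) / divider

print(effective_clock(3, 4))       # swallow every 4th pulse -> ~1058 MHz effective
print(effective_clock(4, 4, 2))    # plain /2 divider        -> 705.5 MHz
```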
 