AMD Overdrive - Overclock each core separately

Interesting... Except I know somebody who has a Quad Core Intel with each core clocked differently already.

I don't think that's physically possible, unless this is some new unannounced thing that nobody else has yet... Intel's multiplier and VID state is common across all cores, which is unfortunate for mobile platforms, since per-core control would make power savings in the face of "semi-increased load" much better.
 
Penryn might allow this.
Intel even made up a marketing name for independently clocking one core higher than the other: Enhanced Dynamic Acceleration Technology.

I don't know about the voltage levels, but even if they are still shared, AMD can't change voltage levels independently until Socket AM3.
 
I don't think that's physically possible, unless this is some new unannounced thing that nobody else has yet... Intel's multiplier and VID state is common across all cores, which is unfortunate for mobile platforms, since per-core control would make power savings in the face of "semi-increased load" much better.

I am not completely sure how he did it (he's a CompSci PhD candidate doing something with it for his PhD), but I have run tests on it and it certainly acts like the cores are at different clocks.
 
I don't think it matters who he is or his credentials; unless he physically changed the silicon, it's not currently possible in any shipping chips. There is only one "entry point" for setting the multiplier on the entire CPU package; there is no interface for multiplier changes on a per-core basis. And somehow "engineering" a way to do it would also require "engineering" a way to asynchronously reach the currently clock-shared L2 cache at the same time.

And 3dilettante is right, and I had forgotten -- Intel has at least given the tech a name, but it still doesn't exist in production. Of course, being able to clock each core differently isn't much of a power saver if you can't volt them accordingly. And what are they doing with the shared cache architecture? Does it run full-speed full-time, or does it run at the speed of the highest-multiplier core currently active?

All of Intel's Pentium-M and later chips based on the Banias / Dothan / Conroe / Merom / etc & later tech have matching individual VIDs for each multiplier. So either they're going to run the entire chip off the VID associated with the highest in-use multiplier, or else (hopefully, and more intelligently) they'll allow each core to request a different VID based on its current individual multiplier. Of course, I wonder what that does for overclocking -- although I suppose they'll still allow "locking" of the multiplier (disabling of SpeedStep).
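Concretely, on those Banias/Dothan-era parts the multiplier and VID travel together as one request word through a single package-wide register interface, which is exactly why there's only one "entry point" for the whole chip. Here's a rough sketch of that packing; the layout (low byte VID, high byte FID) is the commonly documented one for those parts, so treat it as an assumption rather than a datasheet quote:

```python
# Sketch: decode/encode the 16-bit P-state word used by the shared
# IA32_PERF_CTL / IA32_PERF_STATUS interface on Pentium M / Core era chips.
# Assumed layout: bits 15:8 = FID (frequency multiplier), bits 7:0 = VID
# (voltage ID). One word serves the whole package, which is why the
# multiplier can't be requested per core on these parts.

def decode_pstate(word: int) -> tuple[int, int]:
    """Split a P-state word into (fid, vid)."""
    fid = (word >> 8) & 0xFF   # frequency ID: the bus multiplier
    vid = word & 0xFF          # voltage ID: index into the VRM's voltage table
    return fid, vid

def encode_pstate(fid: int, vid: int) -> int:
    """Pack (fid, vid) back into the 16-bit request word."""
    return ((fid & 0xFF) << 8) | (vid & 0xFF)

# Example: a 12x multiplier paired with a (hypothetical) VID code 0x22
word = encode_pstate(12, 0x22)
print(hex(word))               # -> 0xc22
print(decode_pstate(word))     # -> (12, 34)
```

Since the FID and VID always travel as one word, "matching individual VIDs for each multiplier" really just means the driver keeps a table of these pairs and writes whichever pair matches the requested speed.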

Later Edit: Hey, you know what occurs to me? Your CompSci friend may be forcing throttling, which Intel does allow on a per-core basis. It's not a multiplier change, but a forced level of clock-throttle. You can accomplish this through several means; I use a utility called RMClock to manage how my two laptops use voltage and multipliers. It also provides a force-throttle option, and I'm pretty sure it allows it per-core.
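That per-core throttle goes through a different, per-logical-processor register (IA32_CLOCK_MODULATION): an enable bit plus a small duty-cycle field that says how much of the time the clock gets gated. A hedged sketch of the encoding a tool like RMClock would compute; the field positions are the commonly documented ones, and the actual MSR write (which needs ring-0 access) is not shown:

```python
# Sketch: compute the value written to the per-core IA32_CLOCK_MODULATION
# MSR to force clock throttling. Assumed layout (commonly documented):
# bit 4 = enable on-demand modulation, bits 3:1 = duty-cycle code in
# 12.5% steps (valid codes 1..7; 0 is reserved).

CLOCK_MOD_ENABLE = 1 << 4

def throttle_msr_value(duty_percent: float) -> int:
    """Return the MSR value that runs the core at roughly duty_percent
    of full clock; 100 (or more) disables throttling entirely."""
    if duty_percent >= 100:
        return 0                       # modulation off: full speed
    code = round(duty_percent / 12.5)  # 3-bit duty field, 12.5% granularity
    code = max(1, min(7, code))        # clamp to the valid code range
    return CLOCK_MOD_ENABLE | (code << 1)

print(hex(throttle_msr_value(50)))     # 50% duty  -> 0x18
print(hex(throttle_msr_value(100)))    # full speed -> 0x0
```

Because this register exists per logical processor, a utility can legitimately throttle one core and not the other, which is what could make a quad-core *look* like its cores run at different clocks.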
 
I looked further into this, and it seems Intel's individual core clocking scheme for Penryn only kicks in when one core is in a C3 or deeper power state.

C3 is important because that is the power state where a processor is no longer required to keep its caches coherent.

The shared cache becomes a non-issue, since the idle core won't access it until the other core steps back down to normal clock.

I don't know what this implies for the cores on the other die on the MCM, if this is enabled in that configuration.
 
Later Edit: Hey, you know what occurs to me? Your CompSci friend may be forcing throttling, which Intel does allow on a per-core basis. It's not a multiplier change, but a forced level of clock-throttle. You can accomplish this through several means; I use a utility called RMClock to manage how my two laptops use voltage and multipliers. It also provides a force-throttle option, and I'm pretty sure it allows it per-core.
Though, this would be under-, not overclocking :p
 
I looked further into this, and it seems Intel's individual core clocking scheme for Penryn only kicks in when one core is in a C3 or deeper power state.

C3 is important because that is the power state where a processor is no longer required to keep its caches coherent.

The shared cache becomes a non-issue, since the idle core won't access it until the other core steps back down to normal clock.

I don't know what this implies for the cores on the other die on the MCM, if this is enabled in that configuration.
My geek side was hoping for far more than that, but thinking about the bigger picture, that method is a better power-saving function than the option I was envisioning.

Here's my big question then: On the surface, this kind of technology seems entirely at the mercy of a proper thread dispatch / management routine by whatever the host OS is. I can't properly envision how the CPU package would handle several cores being asleep while the OS is attempting to allocate threads to them... Obviously this requires at a minimum some awareness by the OS that you can't do that.

Back to the throttling deal, while it's obviously capable of saving power, clock-stop isn't a very process-friendly way to slow the CPU down. In my experience, you really only use that in response to a critical thermal event; you don't care about the process performance, you just want to get the CPU power consumption WAY down to keep from letting the magic smoke out. In my limited testing, performance is far better by reducing the multiplier by half (12x to 6x) versus clock-throttling by 50%. But of course, the power savings / heat reduction isn't nearly as drastic in the former...
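A crude first-order model makes the performance gap plausible: only core-bound work slows down with the core clock (memory/FSB-bound stalls take the same wall time either way), whereas a duty-cycle throttle gates everything indiscriminately. The workload split below is made up purely for illustration, not measured:

```python
# Illustrative model: why halving the multiplier (12x -> 6x) costs less
# performance than a 50% clock-throttle. Assumption: some fraction of the
# workload is core-bound (scales with clock) and the rest is memory/FSB
# stall time (does not). The 0.7 fraction is hypothetical.

def runtime_at(core_speed: float, core_bound: float = 0.7) -> float:
    """Relative runtime when the core runs at core_speed of full clock."""
    return core_bound / core_speed + (1 - core_bound)

half_mult = runtime_at(0.5)   # 6x instead of 12x: stalls don't stretch
throttled = 1 / 0.5           # 50% duty cycle stretches *everything* 2x

print(f"half multiplier: {half_mult:.1f}x runtime")  # -> 1.7x
print(f"50% throttle:    {throttled:.1f}x runtime")  # -> 2.0x
```

The same asymmetry explains the heat side: clock-gating stops nearly all switching activity during the off periods, so it pulls power down harder per unit of performance lost, which fits using it mainly for thermal emergencies.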
 
I don't think overclocking per-core would have any obvious benefits, especially when the change is multiplier-only.

But it has several obvious power and heat benefits when you underclock, which is what I assume they intend it for. Sadly the tables have turned, and AMD is now the one facing big power and heat issues -- being able to downclock a core or two when warranted will be a godsend for partial-load power consumption.

Especially when you start talking about mobile CPUs...
 
Here's my big question then: On the surface, this kind of technology seems entirely at the mercy of a proper thread dispatch / management routine by whatever the host OS is. I can't properly envision how the CPU package would handle several cores being asleep while the OS is attempting to allocate threads to them... Obviously this requires at a minimum some awareness by the OS that you can't do that.

C3 is a low-power processor state defined by ACPI, the interface that links OS control to processor power management. It doesn't look like a CPU can transition to C3 without the OS commanding it, so it would follow that a sane OS wouldn't try to dispatch threads to a CPU it just powered down.
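On Linux, that OS-side view is visible per core: the cpuidle framework exposes each CPU's known idle states under sysfs, which shows the kernel is the one driving C-state entry. A small sketch that reads that layout (path convention per the cpuidle sysfs docs; it degrades gracefully where the directory doesn't exist):

```python
# Sketch: list the idle (C-)states the OS knows for one CPU, as exposed by
# Linux's cpuidle framework under /sys/devices/system/cpu/cpuN/cpuidle/.
# Returns [] when the directory is absent (non-Linux, or cpuidle disabled).
from pathlib import Path

def list_idle_states(cpu: int = 0,
                     base: str = "/sys/devices/system/cpu") -> list[str]:
    """Return the names of cpuN's idle states (e.g. ['POLL', 'C1', 'C3']),
    or [] if the cpuidle directory doesn't exist."""
    root = Path(base) / f"cpu{cpu}" / "cpuidle"
    if not root.is_dir():
        return []
    names = []
    for state in sorted(root.glob("state*")):
        # each stateN directory carries a 'name' file with the state label
        names.append((state / "name").read_text().strip())
    return names

print(list_idle_states(0) or "no cpuidle info on this system")
```

The per-cpuN directory structure itself is the point here: idle-state control is tracked (and commanded) by the OS for each core individually, not for the package as a whole.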
 
I guess the conflict in my head is: an OS enters the S3 state for the main processor when the entire system is going to sleep. Perhaps it's more a change of thought process for me than it is for the OS? ;) Thinking of putting a CPU core into S3 state just makes me think of the entire computer going to sleep; I'm just wondering if the current CPU driver allows for this in Win2K, WinXP, WinVista...

Or Linux for that matter, although I'd think Linux would have FAR less of a problem (if there were any problems to be had)
 
S3 is a global system state, which would cover all the cores and other devices.
C3 is a CPU state, and apparently it is applied per core, or will have to be applied per core with Penryn.
 
Ahhhhhhhh, yeah, duh... I know this, but I'm stupid and wasn't thinking :(

Thanks for setting me back straight :)
 