These are not improvements enabled by changing the power management control loop on generally equivalent silicon; this is a physical design overhaul.
I did not posit a change in the DVFS design, since it was announced as being there when the Jaguar architecture was launched.
What looks not to have been planned at design time was the foundry jump to TSMC, which AMD needed to pay for. (Edit: much of the team left as well, but how much that matters versus the way it killed AMD's ability to design more than one new core is unclear to me.)
I look askance at that move as a potential reason why the full range of Jaguar's announced DVFS capability was not realized, outside of an oddly inflexible exception. It took an additional cycle, something that has happened to a lesser extent with the desktop APU refreshes as processes improved and more time elapsed for characterizing the hardware.
The guard-banding for AMD's initial offerings has a history of being conservative; see the 7970 to 7970 GHz Edition, the 2xx to 3xx series, Trinity/Richland, Kaveri/Godavari, Carrizo/Bristol Ridge, etc.
I think we have to be clear exactly what we're talking about when we talk about hardware frequency control.
I am discussing this in terms of closed-loop voltage control being something that is implemented first, which then allows more advanced control of clock speed to be implemented. Guard-banding is reduced by voltage control, and it can be reduced further with frequency control.
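As a rough illustration of why closed-loop voltage control trims guard band (every number and name below is invented for the sketch, not any vendor's figure): an open-loop design must supply worst-case margin at all times, while a closed loop pays only for the droop it actually observes plus a small sensing margin.

```python
# Toy sketch of open-loop vs closed-loop voltage margining.
# All values are hypothetical; real designs do this in hardware/firmware.

NOMINAL_V = 1.00        # voltage needed at the target frequency, in volts
OPEN_LOOP_GUARD = 0.10  # worst-case droop margin an open-loop design must always add

def closed_loop_supply(measured_droop: float, margin: float = 0.02) -> float:
    """Closed loop: supply the measured droop plus a small sensor/response margin,
    instead of the worst case at all times."""
    return NOMINAL_V + measured_droop + margin

# Open loop always pays the full guard band:
open_loop_v = NOMINAL_V + OPEN_LOOP_GUARD   # 1.10 V, unconditionally

# Closed loop pays for what it sees:
light_load_v = closed_loop_supply(0.01)     # ~1.03 V under light load
heavy_load_v = closed_loop_supply(0.08)     # ~1.10 V only under worst-case load
```

Lower average voltage at the same frequency is where the power win comes from, and the same measured signal is what a frequency-control loop can later build on.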
I wasn't sure what Qualcomm had already implemented, and so I was not sure where it would have been in the process. Given where it hopes to go, I think Qualcomm will have a use for the additional functionality.
These two features have been implemented in processors I know of:
- The core dynamically and temporarily drops the frequency to compensate for sudden current transients. This is less of a problem on lower-power devices because the transients aren't as large, and there are other solutions that involve dynamically changing the voltage instead.
This goes to my discussion about Qualcomm's higher-end goals for a server chip, and if that is being leveraged in the mobile silicon. The learning curve from that might be indicated in something odd like a physically present L3, and some other design quirks.
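The transient-compensation idea above can be sketched in a few lines (thresholds and the proportional policy are my own illustrative choices; real implementations react in hardware within nanoseconds):

```python
# Toy sketch of reactive clock reduction on a current transient.
# Numbers and the proportional policy are hypothetical.

def next_frequency(f_target_mhz: float, current_a: float, i_limit_a: float) -> float:
    """If measured current overshoots the limit, scale frequency down in
    proportion to the overshoot rather than provisioning worst-case voltage."""
    if current_a <= i_limit_a:
        return f_target_mhz
    return f_target_mhz * (i_limit_a / current_a)

print(next_frequency(3000.0, 40.0, 50.0))  # within limit: stays at 3000 MHz
print(next_frequency(3000.0, 60.0, 50.0))  # overshoot: drops to ~2500 MHz
```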
- The core estimates/measures power consumption in order to determine what frequencies it can currently support, and exposes that to the OS, e.g. as p-states.
But in these cases the OS is still setting a nominal frequency that the core will generally run at. The only exception I know of here is Skylake, where the CPU can take over scheduling the frequency (presumably under the same guidelines an OS tends to use, which is roughly speaking about minimizing idle time).
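A toy version of that second feature might look like the following (the frequency/power table and budgets are invented, not any real product's):

```python
# Hypothetical sketch: a core gating which p-states it advertises to the OS
# based on its current internal power-headroom estimate.

# (frequency in MHz, estimated power in watts at that frequency)
P_STATES = [(800, 3.0), (1600, 7.0), (2400, 14.0), (3200, 24.0)]

def supported_pstates(power_budget_w: float) -> list:
    """Return the frequencies the core would currently expose to the OS,
    given the headroom its power estimator reports."""
    return [f for f, w in P_STATES if w <= power_budget_w]

print(supported_pstates(30.0))  # full headroom: all four states exposed
print(supported_pstates(10.0))  # constrained: only 800 and 1600 MHz
```

The OS still picks a nominal frequency from whatever list it is shown; the hardware has only curated the menu.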
My interpretation is that prior to Skylake's Speed Shift, the states below the turbo range were handled by the OS power states, with the hardware sneaking in turbo bins opportunistically.
Intel added the S0ix active idle functionality prior to Skylake, which is where the hardware opportunistically takes the core down to lower power states than the OS is aware of. Of the two, the active idle path seems like it would be the larger gain, due to Intel's cores being so over-engineered for the space. That is more pressing than it would be for an architecture content with the purely mobile space, but I question whether that is the case for Qualcomm this time around.
One thing with SoCs like Puma is that they're usually run on OSes like Windows, and it's difficult to get MS to update the kernel to include power management code that's heavily tuned for any specific CPU, especially in time for the product's release day. So they may have no option but to put power modeling in the CPU even if it doesn't strictly require fast response times.
For Windows at least, the response time is measured in tens of milliseconds, whereas at the high power ranges critical events operate on timescales an order of magnitude shorter.
That leads to pessimization on the part of the software about what it thinks it can risk, and to thicker guard-banding on the part of the hardware. If Android can poll that much faster, and Kryo doesn't need to target a higher-power device class, then I agree the need isn't the same.
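The timescale mismatch above is just arithmetic, but it is worth making concrete (both intervals are illustrative assumptions, not measured values):

```python
# Back-of-envelope: how many critical power events can come and go
# inside a single OS power-management decision window.
# Both numbers below are assumptions for illustration.

os_poll_interval_s = 30e-3   # OS governor tick: tens of milliseconds
transient_event_s = 1e-3     # critical event at high power: ~an order of magnitude shorter

events_per_window = os_poll_interval_s / transient_event_s
print(events_per_window)     # dozens of events per OS decision
```

Anything the OS cannot see inside that window has to be absorbed by hardware margin, which is exactly the guard band a faster on-die loop can reclaim.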