HBRU
Regular
Thank god, some justice.
While there are likely many contributors to the power management system, a major component is that the physical and electrical behavior of the silicon is profiled at various points in the chip's operating range, to give its thermal response at various temperatures, voltages, and clocks. This profile data is looked up when a given activity counter registers an action, to get an estimate of the power consumption that resulted from the operation. Some later additions to the method include dummy ALUs or partial register file blocks that periodically perform some kind of representative activity to get a tighter approximation of what the hardware is consuming. This isn't the same as instruction count, since instructions can translate into different levels of activity or different internal operations. An instruction may or may not generate multiple cache misses, or might wake up blocks that were clock-gated or, worse, power-gated, and all of those have varying power costs.
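To make the counter-plus-profile idea concrete, here is a rough sketch of how such an estimate could be computed. Everything in it is an assumption for illustration: the event names, the table layout, and the energy figures are invented, and the real hardware tables are not public.

```python
# Hypothetical profile table: energy per event in nanojoules, keyed by
# (supply voltage in volts, event name). All numbers are made up.
ENERGY_NJ = {
    (0.9, "alu_op"): 0.10,
    (0.9, "cache_miss"): 2.00,
    (0.9, "block_wake"): 5.00,
    (1.1, "alu_op"): 0.15,
    (1.1, "cache_miss"): 3.00,
    (1.1, "block_wake"): 7.50,
}


def estimate_power_watts(counters, voltage, interval_s):
    """Estimate average power over one sampling interval.

    counters   -- dict of event name -> number of events seen in the interval
    voltage    -- operating voltage, must match a profiled point in ENERGY_NJ
    interval_s -- length of the sampling interval in seconds
    """
    # Look up the profiled energy cost of each event and sum over all events.
    energy_nj = sum(
        ENERGY_NJ[(voltage, event)] * count for event, count in counters.items()
    )
    # Convert nanojoules to joules, then divide by time to get watts.
    return energy_nj * 1e-9 / interval_s


# Two workloads with the same number of ALU operations in one microsecond,
# but one of them also misses the cache and wakes a gated block.
quiet = estimate_power_watts({"alu_op": 1000}, 0.9, 1e-6)
busy = estimate_power_watts(
    {"alu_op": 1000, "cache_miss": 10, "block_wake": 2}, 0.9, 1e-6
)
```

The point of the sketch is only that `busy` comes out higher than `quiet` even though the instruction count is the same, which is why an activity count gives a better estimate than an instruction count.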
The original motivation for this stems from the question of measuring temperature or power consumption for the silicon at small time scales and in any hot spots. The chip has thermal limits to safe operation, which traditionally required a wide safety margin for worst-case scenarios. A single thermal sensor doesn't cover the whole chip, and thermal sensors are too large to place next to every active block. Their response time, and the speed at which heat travels from a hot spot to a sensor, can be a problem, since local power spikes can push local temperatures from nominal to dangerous in millisecond or shorter time frames.
AMD's approach was to profile the silicon for how it reacted to events at different places in the performance envelope, and then use the activity count to generate an estimate. While conservative, it was based on a dynamic approximation that worked at microsecond ranges, rather than an estimate decided at product design time with very wide safety margins.
Since then, AMD may also have added more local electrical monitoring, which can produce more accurate estimates of power consumption and can tighten the estimate of how much a given region can heat up, based on how much additional power can be consumed in a given time step.
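As a back-of-the-envelope illustration of bounding how much a region can heat up in one time step, a lumped worst-case model (my own simplification, not AMD's actual method) assumes that none of the heat escapes during the step, so the temperature rise is at most power times time divided by the region's heat capacity. The heat capacity and temperatures below are invented values.

```python
def max_temp_rise_k(power_w, dt_s, heat_capacity_j_per_k):
    """Worst-case temperature rise of a region over one time step,
    assuming no heat leaves the region during that step."""
    return power_w * dt_s / heat_capacity_j_per_k


def max_extra_power_w(temp_now_c, temp_limit_c, dt_s, heat_capacity_j_per_k):
    """Largest additional power the region could draw for one time step
    without the worst-case estimate crossing the temperature limit."""
    headroom_k = temp_limit_c - temp_now_c
    return headroom_k * heat_capacity_j_per_k / dt_s


# Hypothetical hot spot: 0.002 J/K heat capacity, currently estimated at
# 85 C against a 95 C limit, evaluated over a 1 ms time step.
budget = max_extra_power_w(85.0, 95.0, 1e-3, 0.002)
rise = max_temp_rise_k(budget, 1e-3, 0.002)
```

A tighter power estimate shrinks the uncertainty in `power_w`, which is what lets the controller use more of the headroom instead of holding it back as safety margin.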
The latest boost functionality and high clocks for Zen come from pushing silicon to near the safe limits of voltage and temperature of the silicon process for controlled periods of time, which under prior methods could very quickly overrun them. Sony's method appears to piggy-back on a lot of this work, and its operating point discards much of that boost range. The physical characterization tables seem to be tuned for consistency, which means the power management hardware thinks the silicon has a given baseline set of properties, regardless of whether the silicon itself can do better.
It's possible that AMD's method is over-engineered for what Sony is doing, but it would be more work to take it out at this point.
So the question is more physical, and related to the silicon characteristics that are mapped somehow... those maps may also change over time as the console ages. Hopefully in a predictable way...
So my concern is well founded.
And now ban me again please.