Quadbitnomial
Newcomer
I would like to add that I enjoy a lot of the back and forth that happens in these threads. It would be boring without it, regardless of how constructive it is or isn't. More often than not it is constructive, in my opinion as a long-time lurker and recent member.
Unfortunately, "most of the time" and "near the top" are unintentionally vague terms. I work in data: if something is at 80-90%, you're already near the top, and anything north of 95% is at the top. "Most of the time" can really mean 70% and above. If you asked a random person what "near the top" and "most of the time" mean, I believe most would say around 80% and up for "near the top" and 70% and up for "most of the time".
But to add some more context to what Mark Cerny said about the bolded part, so that it becomes a lot less vague than it may seem:
"There's another phenomenon here, which is called 'race to idle'. Let's imagine we are running at 30Hz, and we're using 28 milliseconds out of our 33 millisecond budget, so the GPU is idle for five milliseconds. The power control logic will detect that low power is being consumed - after all, the GPU is not doing much for that five milliseconds - and conclude that the frequency should be increased. But that's a pointless bump in frequency," explains Mark Cerny.
At this point, the clocks may be faster, but the GPU has no work to do. Any frequency bump is totally pointless. "The net result is that the GPU doesn't do any more work, instead it processes its assigned work more quickly and then is idle for longer, just waiting for v-sync or the like. We use 'race to idle' to describe this pointless increase in a GPU's frequency," explains Cerny. "If you construct a variable frequency system, what you're going to see based on this phenomenon (and there's an equivalent on the CPU side) is that the frequencies are usually just pegged at the maximum! That's not meaningful, though; in order to make a meaningful statement about the GPU frequency, we need to find a location in the game where the GPU is fully utilised for 33.3 milliseconds out of a 33.3 millisecond frame.
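A quick aside of my own (not from the article) to put numbers on race to idle. This is a minimal sketch that assumes busy time scales inversely with clock speed; the 2000 MHz and 2230 MHz clocks are hypothetical values I picked for illustration, while the 28 ms and 33 ms figures come from the quote above.

```python
def frame_timing(work_ms_at_base, base_mhz, clock_mhz, budget_ms):
    """Return (busy_ms, idle_ms) for one v-synced frame.

    Assumes busy time scales inversely with clock speed, which is a
    simplification made for this illustration.
    """
    busy_ms = work_ms_at_base * base_mhz / clock_mhz
    idle_ms = max(0.0, budget_ms - busy_ms)
    return busy_ms, idle_ms


BUDGET_MS = 33.0      # 30Hz frame budget, rounded as in the quote
WORK_MS = 28.0        # busy time at the base clock, from the quote
BASE_MHZ = 2000.0     # hypothetical clock before the bump
BOOSTED_MHZ = 2230.0  # hypothetical clock after the bump

for clock in (BASE_MHZ, BOOSTED_MHZ):
    busy, idle = frame_timing(WORK_MS, BASE_MHZ, clock, BUDGET_MS)
    # The frame is still presented once per 33 ms either way.
    print(f"{clock:.0f} MHz: busy {busy:.1f} ms, idle {idle:.1f} ms")
```

Under those assumptions the higher clock only turns 5 ms of idle time into roughly 7.9 ms of idle time. The frame rate does not change, which is why a clock reading taken in that situation says very little.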
"So, when I made the statement that the GPU will spend most of its time at or near its top frequency, that is with 'race to idle' taken out of the equation - we were looking at PlayStation 5 games in situations where the whole frame was being used productively. The same is true for the CPU, based on examination of situations where it has high utilisation throughout the frame, we have concluded that the CPU will spend most of its time at its peak frequency."
Put simply, with race to idle out of the equation and both CPU and GPU fully used, the boost clock system should still see both components running near to or at peak frequency most of the time. Cerny also stresses that power consumption and clock speeds don't have a linear relationship. Dropping frequency by 10 per cent reduces power consumption by around 27 per cent. "In general, a 10 per cent power reduction is just a few per cent reduction in frequency," Cerny emphasises.
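To see where figures like these can come from, here is a rough check of my own. It assumes power scales with roughly the cube of frequency (dynamic power goes as frequency times voltage squared, with voltage rising about linearly with frequency). That cubic model is my assumption, not something Cerny states, and real silicon will deviate from it.

```python
def power_ratio(freq_ratio, exponent=3.0):
    """Relative power at a given relative frequency (assumed P ~ f**exponent)."""
    return freq_ratio ** exponent


def freq_ratio_for_power(target_power_ratio, exponent=3.0):
    """Relative frequency that yields the given relative power (inverse of the above)."""
    return target_power_ratio ** (1.0 / exponent)


# Dropping frequency by 10 per cent:
power_saved = 1.0 - power_ratio(0.90)
print(f"10% lower frequency -> about {power_saved:.1%} less power")

# Cutting power by 10 per cent:
freq_lost = 1.0 - freq_ratio_for_power(0.90)
print(f"10% lower power -> about {freq_lost:.1%} lower frequency")
```

With the cubic assumption, a 10 per cent frequency drop works out to about 27.1 per cent less power, and a 10 per cent power cut costs only about 3.5 per cent of frequency, which lines up with both of the numbers quoted above.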
Now let's look at AMD's Polaris architecture and its AVFS strategy:
Adaptive Frequency and Voltage Scaling (AVFS)
The most powerful technique deployed to manage power consumption in the Polaris architecture is AMD’s AVFS, which was first developed for the 6th-generation AMD A-Series APUs (“Carrizo”). Modern GPUs operate in an incredibly complex environment with radically different combinations of system configurations (e.g. voltage regulator quality, cooling solution), temperature, and varied and changing workload (e.g. light gaming or the latest AAA games filled with explosions and sophisticated effects). Moreover, even theoretically identical GPUs are subject to subtle variations in silicon manufacturing. Traditional design techniques are fairly pessimistic and account for all these potential differences through guardbands, which reduce the operating frequency and/or increase the voltage – sacrificing performance and increasing power consumption.
The central concept of AVFS is to avoid guardbands and instead intelligently measure the behavior of each GPU and choose better combinations of voltage and frequency (fig. 9). AVFS uses power supply monitoring circuits to measure the voltage across different parts of a Polaris GPU in real time as seen by the actual transistors. Polaris GPUs also contain small replica circuits that mimic the slowest circuits in the GPU and are continuously monitored. Together these two blocks can measure how close the GPU is to the voltage limit at a given frequency. Similarly, the GPU can dynamically measure the temperature of the silicon in order to choose the right operating point since temperature affects transistor speed and power dissipation.
When the GPU boots up, the power management unit performs boot-time calibration, which measures the voltage that is delivered to the GPU, compared to the voltage measured during the test and binning process. For example, it is fairly common for a voltage regulator to output 1.15V, but the GPU only receives 1.05V due to the system design. In the Polaris architecture, the power management unit can correct for this static difference very precisely, rather than requesting a more conservative (i.e. higher) voltage that would waste power. As a result, platform differences (e.g., higher quality voltage regulators) will translate into higher frequencies and lower power consumption.
In addition, the boot-time calibration optimizes the voltage to account for aging and reliability. Typically, as silicon ages the transistors and metal interconnects degrade and need a higher voltage to maintain stability at the same frequency. The traditional solution to this problem is to specify a voltage that is sufficiently high to guarantee reliable operation over 3-7 years under worst case conditions, which, over the life of the processor, can require as much as 6% greater power. Since the boot-time calibration uses aging-sensitive circuits, it automatically accounts for any aging and reliability issues. As a result, Polaris-based GPUs will run at a lower voltage or higher frequency throughout the lifetime of the product, delivering more performance for gaming and compute workloads.
Adaptive Clocking
Another advantage of AVFS is that it naturally handles changes induced by the workload. For example, when a complex effect such as an explosion or hair shader starts running, it will activate large portions of the GPU that suddenly draw power and cause the voltage to “droop” temporarily until the voltage regulators can respond. Conceptually, these voltage droops in a GPU or processor are similar to brownouts in a power grid (e.g. caused by millions of customers turning on their lights when they get home from work around 6pm). The power supply monitors detect the voltage droop in 1-2 cycles, and then a clock-stretching circuit temporarily decreases the frequency just enough so that all circuits will work safely during the droop. The clock stretcher responds to voltage droops greater than 2.5% and can reduce the frequency by up to 20%. These droop events are quite rare, and the average clock frequency decreases by less than 1%, with almost no impact on performance. However, the efficiency benefits are quite large. The clock-stretching circuits enable increasing the frequency of Polaris GPUs by up to 140MHz.
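Here is my own back-of-the-envelope version of that trade-off. The 20% stretch, the under-1% average loss and the 140MHz gain are from the text above; the 1126 MHz guardbanded clock and the 2% share of time spent in droop are hypothetical numbers chosen only to make the arithmetic concrete.

```python
def average_clock_mhz(nominal_mhz, droop_time_fraction, stretch_fraction):
    """Time-averaged clock when the clock is stretched during droop events."""
    return nominal_mhz * (1.0 - droop_time_fraction * stretch_fraction)


MAX_STRETCH = 0.20  # "can reduce the frequency by up to 20%"

# Largest share of time that could be spent in worst-case stretching
# while keeping the average loss under 1%.
max_droop_share = 0.01 / MAX_STRETCH

GUARDBANDED_MHZ = 1126.0                 # hypothetical clock without clock stretching
STRETCHED_MHZ = GUARDBANDED_MHZ + 140.0  # "up to 140MHz" higher with it
DROOP_SHARE = 0.02                       # hypothetical: 2% of the time in droop

avg_mhz = average_clock_mhz(STRETCHED_MHZ, DROOP_SHARE, MAX_STRETCH)
print(f"droop share allowed for <1% loss: {max_droop_share:.0%}")
print(f"average clock with stretching: {avg_mhz:.1f} MHz vs {GUARDBANDED_MHZ:.0f} MHz guardbanded")
```

With those made-up inputs the stretched part averages about 1260.9 MHz, so it gives up a few MHz during droops but stays well ahead of the guardbanded clock. Even at the full 20% stretch, droops would need to take up 5% of the time before the average loss reached 1%.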
AMD uses a similar but improved strategy for RDNA:
More Control Over GPU Power and Performance
Up until the AMD Radeon™ RX Vega and the RX 500 series GPUs, the clock speed (and associated voltage) of the GPUs was dictated by a small number of fixed, discrete DPM states. Depending on the workload, and available thermal and electrical headroom, the GPU would alternate between one of these fixed DPM states. As a result, the GPU had a lot less flexibility in finding and residing at the most optimum state since it had to be one of these valid DPM states, and nothing in between. Often, this meant leaving performance on the table if the ideal voltage-frequency (Vf) state happened to be in between two of the fixed DPM states.
In addition, for every single GPU within a SKU family (for example, reference Radeon™ RX Vega 64 GPUs), the DPM states or Vf points were identical. Given that there is always a die-to-die variance in performance even between two pieces of otherwise identical silicon, once again this meant giving up performance while catering to the lowest common denominator within the wafer population.
Starting with the AMD Radeon™ VII, and further optimized and refined with the Radeon™ RX 5700 series GPUs, AMD has implemented a much more granular ‘fine grain DPM’ mechanism vs. the fixed, discrete DPM states on previous Radeon™ RX GPUs. Instead of the small number of fixed DPM states, the Radeon™ RX 5700 series GPUs have hundreds of Vf ‘states’ between the bookends of the idle clock and the theoretical ‘Fmax’ frequency defined for each GPU SKU. This more granular and responsive approach to managing GPU Vf states is further paired with a more sophisticated Adaptive Voltage Frequency Scaling (AVFS) architecture on the Radeon™ RX 5700 series GPUs.
As a result, each AMD Radeon™ RX 5700 GPU can find and run at the most optimum frequency, tailored to the specific workload, electrical, thermal and acoustic conditions – down to the last MHz. Paired with a Vf curve that is optimized for each individual Radeon™ RX 5700 series GPU, Radeon™ Adrenalin software and the Radeon™ WattMan tool provide much more granular control over the power and performance of the GPU.
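To illustrate the difference between a handful of fixed DPM states and fine grain DPM, here is a small sketch of my own. Every number in it is hypothetical: the eight coarse clocks, the 2 MHz spacing of the fine states and the 1497 MHz "ideal" clock are invented for the example and are not AMD's actual tables. It also simplifies the selection to "pick the highest allowed clock at or below what the current conditions would permit".

```python
from bisect import bisect_right


def highest_state_at_or_below(states_mhz, ideal_mhz):
    """Pick the highest allowed clock that does not exceed the ideal clock.

    states_mhz must be sorted ascending. Falls back to the lowest state
    if the ideal clock is below all of them.
    """
    i = bisect_right(states_mhz, ideal_mhz)
    return states_mhz[max(i - 1, 0)]


# Hypothetical tables, for illustration only.
coarse_states = [852, 991, 1084, 1138, 1200, 1401, 1536, 1630]  # a few fixed DPM states
fine_states = list(range(852, 1631, 2))                         # hundreds of Vf states

ideal_mhz = 1497  # hypothetical clock the workload and thermals would allow

coarse_pick = highest_state_at_or_below(coarse_states, ideal_mhz)
fine_pick = highest_state_at_or_below(fine_states, ideal_mhz)

print(f"coarse: {coarse_pick} MHz, leaves {ideal_mhz - coarse_pick} MHz on the table")
print(f"fine:   {fine_pick} MHz, leaves {ideal_mhz - fine_pick} MHz on the table")
```

In this toy setup the coarse table has to settle for 1401 MHz and gives up 96 MHz, while the fine table lands on 1496 MHz and gives up 1 MHz. That gap is the "performance left on the table" the text describes when the ideal Vf point falls between two fixed states.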
Hope this adds more context to what Mark Cerny describes as the strategy they chose for PS5 and how it is different from previous consoles.