Intel Broadwell (Gen8)

Hey Andrew, thanks for following up on that!

Yep, it's at max turbo more often than HSW. But when trying to fully utilize all resources, i.e. the x86 cores as well, I see clocks drop very quickly - and that's also a problem in gaming when games heavily stress the CPU cores.
With Haswell, I have only limited comparable data, since I had the Gigabyte Brix Pro for just a few days and had to modify its cooling a bit to achieve stable temperatures below 80 °C under load. And that was with older drivers, of course.
 
Yep, it's at max turbo more often than HSW. But when trying to fully utilize all resources, i.e. the x86 cores as well, I see clocks drop very quickly - and that's also a problem in gaming when games heavily stress the CPU cores.
Yeah, that shouldn't be unexpected... the CPU portion can easily eat >>65W all by itself (look at HSW, for instance), so you can't expect to "max out" every part of these chips at the same time. That hasn't been possible for a few years and likely never will be again. Ultimately, everything is power-limited.

That's part of the reason why we're enthusiastic about DX12. Running even a single core at max turbo - which is common in games - is a great way to eat power unnecessarily. Games don't typically "max out" even 4 cores; they just bottleneck entirely on one thread. Some of this is a game engine architecture issue, but some of it is a graphics API problem that DX12, Vulkan, etc. address.

With Haswell, I have only limited comparable data, since I had the Gigabyte Brix Pro for just a few days and had to modify its cooling a bit to achieve stable temperatures below 80 °C under load. And that was with older drivers, of course.
In my experience, HSW GT3e didn't sit at max turbo (1.3 GHz) most of the time, even on the 65W SKU. Additionally, the Brix Pro is often thermally limited, as you note. That's not the end of the world, since as you go up the frequency curve you start to really burn power, but even if those larger-TDP parts had a wider GPU (vs. higher clock rates), it still wouldn't be a good idea to run high-frequency CPU cores/threads.
 
Yeah, that shouldn't be unexpected... the CPU portion can easily eat >>65W all by itself (look at HSW, for instance), so you can't expect to "max out" every part of these chips at the same time. That hasn't been possible for a few years and likely never will be again. Ultimately, everything is power-limited.
I'm in a bit of a hurry right now, so just a suggestion for now: for gaming, it might be a good idea to decide whether to throttle the CPU or the GPU cores based on whether a full-screen D3D application is running. If yes, throttle the CPU cores down to non-turbo values to save power for the GPU cores, which matter more in games most of the time.

Better yet: introduce a user-editable option in the driver panel, so users can choose at will which cores are the first power victims.
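For anyone who wants to experiment with this by hand on Linux, here's a minimal sketch of capping the CPU cores through the cpufreq sysfs interface (assumes a driver that exposes scaling_max_freq, which intel_pstate and acpi-cpufreq both do, and requires root):

```python
#!/usr/bin/env python3
"""Minimal sketch: cap CPU core clocks via the Linux cpufreq sysfs interface.

Assumptions: Linux with a cpufreq driver exposing scaling_max_freq,
run as root; values are in kHz.
"""
import glob

def set_cpu_max_freq(khz: int) -> None:
    # Apply the cap to every online core's cpufreq policy.
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

if __name__ == "__main__":
    # Example: pin the CPU at a 1.7 GHz base clock while gaming,
    # leaving the remaining power budget for GPU turbo.
    set_cpu_max_freq(1_700_000)
```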
 
Are you seeing it drop GPU clocks when running a workload with the CPU ~idle? I've actually seen that far less on BDW than HSW (HSW's turbo clocks were a bit higher to start with). Obviously the CPU can easily eat the entire TDP and more if asked to, but in GPU-only workloads my BDW GT3e 65W tends to stay pegged to max turbo clock much more than my HSW machines did.
If possible, can you confirm how power sharing works?
I was experimenting with my laptop, which (weirdly) responds to Intel XTU, and one behaviour I think I noticed is:
If the SoC becomes power-limited, the CPU cores run at their maximum non-turbo clock and the GPU turbos as much as possible within the power envelope.
I have an i5-4210U, and when power-limited, the CPU runs at 1700MHz while the GPU clock fluctuates depending on how much power is available.
That is, when power-limited, GPU turbo has priority over CPU turbo.

There's a separate weird behavior where Haswell's iGPU, even when there is no load, constantly runs at a mild turbo clock (600MHz instead of 200MHz). If I unplug my laptop, it drops to base clock momentarily and then goes back to 600MHz. It seems there is some reason it needs to run at a 600MHz minimum (some kind of responsiveness requirement, maybe?), and that this is set in the drivers rather than in the firmware. It would be interesting to know whether BDW graphics actually run at base clock when there is no load.
 
I'm in a bit of a hurry right now, so just a suggestion for now: for gaming, it might be a good idea to decide whether to throttle the CPU or the GPU cores based on whether a full-screen D3D application is running. If yes, throttle the CPU cores down to non-turbo values to save power for the GPU cores, which matter more in games most of the time.
I was actually able to do what you suggested – I got performance gains in a couple of benchmarks by limiting CPU clocks below base (but above minimum).
That is, for my i5-4210U:
Minimum CPU frequency: 800MHz
Base CPU frequency: 1700MHz
Turbo: 2700MHz
In one power-limited benchmark, maximum performance was achieved when I forced the CPU to run at 1400MHz. At 1700MHz, graphics didn't reach max turbo; at 800MHz, the benchmark was CPU-limited.
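(I did this with XTU on Windows; for reproducing it on Linux, a rough sketch that sweeps a cpufreq cap and times one benchmark run per step – ./run_benchmark.sh is a placeholder for the actual workload:)

```python
import glob
import subprocess
import time

def set_cpu_max_freq(khz: int) -> None:
    # Cap every core via cpufreq sysfs (root required, values in kHz).
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"):
        with open(path, "w") as f:
            f.write(str(khz))

def bench_seconds(cmd) -> float:
    # Wall-clock a single run of the workload.
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

# Sweep from the 800 MHz minimum past the 1.7 GHz base in 100 MHz steps.
for khz in range(800_000, 1_800_001, 100_000):
    set_cpu_max_freq(khz)
    print(f"{khz // 1000} MHz: {bench_seconds(['./run_benchmark.sh']):.2f} s")
```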

I would guess that at some point there will be a way of measuring actual CPU/GPU utilization (some kind of performance counters) and basing the power-priority decisions on that. Power sharing, IMHO, needs to be deterministic.
 
For gaming, it might be a good idea to decide whether to throttle the CPU or the GPU cores based on whether a full-screen D3D application is running. If yes, throttle the CPU cores down to non-turbo values to save power for the GPU cores, which matter more in games most of the time.
It's actually faaaaar more complicated than that, by necessity. But there are also far more useful signals for informing turbo decisions than "is a full-screen D3D application running" - for instance, whether the CPU or the GPU is currently the bottleneck (i.e., which one idles for a portion of the frame time).
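As a very rough illustration of that kind of signal (not what the driver actually does!): sample how busy each side is over a short window and give the headroom to whichever unit is busier. The CPU side can be read from /proc/stat on Linux; the GPU-side helper here is purely hypothetical, since real GPU busyness comes from driver performance counters (what tools like intel_gpu_top read).

```python
import time

def cpu_busy_fraction(window_s: float = 0.5) -> float:
    # Fraction of the window the CPU spent non-idle, from /proc/stat (Linux).
    def snapshot():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        # idle + iowait vs. total of the first 8 counters (guest time is
        # already folded into user/nice).
        return fields[3] + fields[4], sum(fields[:8])
    idle0, total0 = snapshot()
    time.sleep(window_s)
    idle1, total1 = snapshot()
    return 1.0 - (idle1 - idle0) / (total1 - total0)

def gpu_busy_fraction(window_s: float = 0.5) -> float:
    # Hypothetical stand-in: real GPU busyness needs driver perf counters.
    raise NotImplementedError

def turbo_priority() -> str:
    # Crude heuristic: favor whichever unit is busier during the window.
    return "favor GPU" if gpu_busy_fraction() > cpu_busy_fraction() else "favor CPU"
```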

If possible, can you confirm how power sharing works?
It's complicated and varies based on a lot of factors, some software and some hardware. Even if I understood it all (I don't) I'm not sure I could explain it in any reasonably comprehensible way :)

There's a separate weird behavior where Haswell's iGPU, even when there is no load, constantly runs at a mild turbo clock (600MHz instead of 200MHz).
So be careful with this one: sleep states typically get involved at the low end, at which point the reported frequency is not really indicative of anything. On Broadwell, depending on the tool, it will sometimes even report max turbo when the GPU is entirely idle... in reality it's in a sleep state. To really understand what's going on, you need both P-state and frequency information over some time window.
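On Linux with the i915 driver you can watch both at once: gt_cur_freq_mhz is the requested frequency and gt_act_freq_mhz is what the GPU is actually running at (it drops when RC6 sleep states kick in). A minimal sampler, assuming card0 is the Intel iGPU:

```python
import time

CARD = "/sys/class/drm/card0"  # assumption: card0 is the Intel iGPU (i915)

def read_mhz(name: str) -> int:
    with open(f"{CARD}/{name}") as f:
        return int(f.read())

# Sample the requested P-state vs. the actual frequency over a time window;
# a high "requested" with a near-zero "actual" means the GPU is asleep (RC6),
# not genuinely running at turbo.
for _ in range(20):
    requested = read_mhz("gt_cur_freq_mhz")
    actual = read_mhz("gt_act_freq_mhz")
    print(f"requested {requested:4d} MHz   actual {actual:4d} MHz")
    time.sleep(0.25)
```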

I was actually able to do what you suggested – I got performance gains in a couple of benchmarks by limiting CPU clocks below base (but above minimum).
It's ultimately a bunch of heuristics and it's impossible for them to be perfect. That said, how different was the performance in the case where you lowered the CPU turbo vs. the native setting? I'd be surprised if it was very significant, although it's not impossible.

I would guess that at some point there will be a way of measuring actual CPU/GPU utilization (some kind of performance counters) and basing the power-priority decisions on that. Power sharing, IMHO, needs to be deterministic.
As I mentioned above, this sort of thing is already an input to the power algorithm. Note that it's impossible to be fully "deterministic" as there are way too many external sources that affect things, most notably the thermal situation.
 