Google Nexus 6P

I think a lot of it is that there's also quite a bit of CPU work going on in many of the workloads that also use the GPU (browsers in particular), so balancing power across the whole SoC probably doesn't come out in favour of high GPU frequencies. The GPU gets used, but unless vendors rethink how to balance power, it's never going to work out the way we'd expect over here in GPU land.

I don't really get how browser or UI rendering comes up as a potential justification for bursty, thermally unsustainable GPU loads. Even if you're not CPU or network limited and can really benefit from running the GPU at max clock speeds, much of the GPU will still be heavily underutilized: the ALUs, triangle setup, and possibly even the TMUs. So the power draw shouldn't resemble that of a reasonably game-like benchmark running at full tilt.

Besides, isn't this kind of thing often handled by dedicated 2D accelerators?
 
I'm talking about GPU, not CPU.
Are you guys planning on reviewing the Lumia 950 XL? From my experience using both (950 XL & 6P), the Lumia seems to have the best implementation of the S810 to date in terms of thermal dissipation (don't know about throttling). It also has the best video recording capabilities, with the ability to shoot at 4K/30fps with OIS (always on) + Digital OIS (optional). The file bit-rate is 52Mbps (!) with stereo audio recorded at 42Hz. It also natively supports playback of HEVC/H.265 videos.
 
It doesn't look like we're receiving a review sample.
 
Too bad, would love to see something like dxcapsviewer on a DX12 Qualcomm device. Sorry for being off-topic! ;)
 
It doesn't look like we're receiving a review sample.

Microsoft in Europe has been handing out review samples left and right in the past weeks... PM me if you want contacts.
Too bad, would love to see something like dxcapsviewer on a DX12 Qualcomm device. Sorry for being off-topic! ;)
Unfortunately, as of right now Windows 10 Mobile only features DX11 support AFAICS on the 950/950XL.
 

Unfortunate, but it would still be interesting to know what they support under DX11. To my knowledge this is the first Adreno 4xx Windows device, and they claimed the 4xx supports feature level 11_1 with tiled resources. Wonder if that's true in practice.
 
Touring detailed locations on Apple Maps (some Flyover cities are good examples) and Google Earth can demand some bursty GPU performance.

Leads to some impressive rendering at times on an iPhone 6+ and 6s+ at 1080p 60 fps.
 
The issue is that I'm currently not aware of any SoC whose DVFS policies would even be able to respond to the super fine-grained high loads that, for example, browsers or similar use-cases produce. Most SoCs out there switch frequency on a 100ms sampling rate, and GPUs have mostly always had step-wise policies, so it's always going to take 200-300ms of continuous load to trigger the highest frequencies. At that point you'd need user-space optimizations for QoS on the GPU frequency, and AFAIK only Samsung does stuff like that, and even there they never request the highest frequencies.
Changing clocks costs a lot of power, whether it's CPU or GPU. Changing them every 10-20ms is catastrophic for power.

source: my own experiments
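
To picture the step-wise ramp-up described above, here's a minimal toy model in Python; the frequencies, thresholds and workload traces are made up purely for illustration and aren't taken from any real driver. With a 100ms sampling period and one step per sample, a sustained load takes 200-300ms before the top frequency is even requested, while a bursty browser-style load never reaches it.

# Toy model of a step-wise GPU DVFS governor with a 100 ms sampling period.
# All frequencies, thresholds and traces are illustrative placeholders.
FREQS_MHZ = [180, 305, 450, 600]   # hypothetical GPU operating points
UP_THRESHOLD = 0.90                # step up one level if busy > 90% of a sample
DOWN_THRESHOLD = 0.50              # step down one level if busy < 50%
SAMPLE_MS = 100                    # sampling period discussed above

def run(trace):
    """trace: list of per-sample GPU busy fractions (0.0-1.0)."""
    level = 0
    for i, busy in enumerate(trace):
        if busy > UP_THRESHOLD and level < len(FREQS_MHZ) - 1:
            level += 1                 # step-wise: at most one level per sample
        elif busy < DOWN_THRESHOLD and level > 0:
            level -= 1
        print(f"t={i * SAMPLE_MS:4d} ms  busy={busy:.2f}  -> {FREQS_MHZ[level]} MHz")

run([1.0] * 5)            # sustained load: top frequency only after ~200-300 ms
run([1.0, 0.1, 0.1] * 3)  # bursty UI/browser-style load: never reaches the top step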
 
Sure, I should say that's what I measured on the 5X/6P (but I seem to recall it's also true on Krait for CPUs).
The 808/810 are hardly representative chipsets and should best be forgotten.

Do you actually mean the power cost of the PLL and regulator switch, the software latency overhead affecting efficiency, or again the result of a too-high sampling rate over-scaling DVFS so that clocks end up too high and needlessly run at lower-efficiency points? If it's the latter, then I would argue it's just an issue of bad scaling logic.

HiSilicon, among others, runs their big cores on a 10ms rate on ondemand, and Samsung does 20ms with a 20ms timer slack on interactive.
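
For reference, the sampling periods being quoted here are just governor tunables exposed through cpufreq sysfs. A small sketch of where they usually live on kernels of that era; the exact paths vary per device and kernel (some expose them per policy under cpuN/cpufreq/<governor>/ instead), values are in microseconds, and reading them typically needs root on a phone.

# Print the DVFS sampling tunables discussed above, if the kernel exposes them.
# Paths below are the common global locations and may differ per device.
TUNABLES = [
    "/sys/devices/system/cpu/cpufreq/ondemand/sampling_rate",   # e.g. 10000 us on the Kirin setup mentioned
    "/sys/devices/system/cpu/cpufreq/interactive/timer_rate",   # e.g. 20000 us on the Exynos setup mentioned
    "/sys/devices/system/cpu/cpufreq/interactive/timer_slack",  # e.g. 20000 us
]

def read_tunable(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "<not exposed on this kernel>"

for path in TUNABLES:
    print(f"{path}: {read_tunable(path)}")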
 
I'm assuming PLLs, as it's certainly not any sort of software DVFS cost. The experiment is really simple: get the system into an unloaded state, switch to the userspace governor, lock a CPU benchmark (Dhrystone or something absurdly simple) onto a single core, run a script on another core that alternates between two different clocks, and measure power/perf.

Also, I honestly can't believe anyone uses ondemand at this point.
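
The frequency-flipping half of that experiment only needs the standard cpufreq sysfs interface. Here's a rough sketch of what such a script could look like; the CPU number, the two frequencies and the 10ms flip period are placeholders, the benchmark has to be pinned to the target core separately (e.g. with taskset), and power is measured externally. Needs root.

# Alternate one core between two OPPs via the userspace governor.
# CPU, FREQS_KHZ and PERIOD_S are illustrative placeholders.
import time

CPU = 4
FREQS_KHZ = [1248000, 1440000]
PERIOD_S = 0.010
BASE = f"/sys/devices/system/cpu/cpu{CPU}/cpufreq"

def write(path, value):
    with open(path, "w") as f:
        f.write(str(value))

with open(f"{BASE}/scaling_governor") as f:
    original_governor = f.read().strip()

write(f"{BASE}/scaling_governor", "userspace")   # hand clock control to userspace

i = 0
try:
    while True:
        # scaling_setspeed is only writable while the userspace governor is active
        write(f"{BASE}/scaling_setspeed", FREQS_KHZ[i % 2])
        i += 1
        time.sleep(PERIOD_S)
except KeyboardInterrupt:
    pass
finally:
    write(f"{BASE}/scaling_governor", original_governor)  # restore the previous governor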
 
Not at this point - the device is 1.5 years old by now. In the Kirin 950 they at least switched to interactive.

I know performance takes a hit from a frequency switch, but it shouldn't be quite that noticeable. IIRC the Exynos platforms' full hardware + software stack switch latency was ~140µs, of which there is some time where the CPU clock is basically paused. But then again it depends on the clock architecture, as some SoCs have much shorter hardware "dead periods" when they switch PLLs.

Did you test this with Qualcomm's stock frequencies, or did you actually try running your own? Editing the driver so that it uses, say, 960 and 961 MHz states at the same voltage would give more insight into what kind of overhead this is.
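
To put that latency figure in perspective, the arithmetic is simple; only the ~140µs estimate comes from the post above, the rest is just division.

# How much time is lost to switching if every switch costs ~140 us of
# effectively dead time? (140 us is the Exynos estimate quoted above.)
SWITCH_DEAD_TIME_US = 140

for period_ms in (10, 20):
    overhead = SWITCH_DEAD_TIME_US / (period_ms * 1000.0)
    print(f"switch every {period_ms} ms -> ~{overhead:.1%} of time spent switching")
# ~1.4% at a 10 ms period and ~0.7% at 20 ms, which is why the latency alone
# shouldn't be all that noticeable and the question is where the power cost comes from.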
 
Oh boy, Samsung and Qualcomm could really learn a thing or two from HiSilicon.
 