Qualcomm SoC & ARMv8 custom core discussions

Poor on its own:

http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/11
Anandtech said:
I still think the A57 is a tad too power hungry in this device, but as long as thermal management is able to keep the phone's temperatures in rein, which it seems that it does, there's no real disadvantage to running them at such high clocks. The question is whether efficiency is where it should be. ARM promises that we'll be seeing much improved numbers in the future as licensees get more experience with the IP, something which we're looking forward to testing.

Turns out that ended up being the best implementation on 20nm! It seems that licensees did not need more experience with the IP, just a new node. ;)
 
Poor on its own:

http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/11

Turns out that ended up being the best implementation on 20nm! It seems that licensees did not need more experience with the IP, just a new node. ;)

This still doesn't really mean anything without context. I take it that what you mean here is that both TSMC's and Samsung's 20nm nodes were actually less efficient than their 28nm nodes. But how would you determine that from the AT article? Are you comparing it to a same-revision Cortex-A57 implementation on 28nm? Because it looks like you're comparing a particular revision of the A15 to a particular revision of the A57, which is not a fair way to isolate process differences.

If you instead compare the Cortex-A15 in the Exynos 5420 (Samsung 28nm) with the one in the Exynos 5430 (Samsung 20nm) you get this: http://www.anandtech.com/show/8718/the-samsung-galaxy-note-4-exynos-review/2 That kind of paints a different picture than "poor process."
 
For the sake of discussion, here's the graph with the 5433 included. http://i.imgur.com/UqpCLYL.png

@Exophase
IP revisions are generally meaningless in regards to the power discussion. Implementation between vendors is so different that it's the least meaningful characteristic to talk about.

The A53s in the 810 (both SoC revisions) use more power than the A53s in the 808. They also fall behind Krait, because Krait tops out at ~750mW at 2.45GHz with slightly higher IPC, while the 810/808's A53s top out at around 500mW at 1.5GHz.
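
To put rough numbers on that (the 1.1x IPC ratio below is just a stand-in for "slightly higher IPC", not a measured figure):

```python
# Rough perf/W comparison at the top of each curve, using the figures
# quoted above. The IPC ratio is an assumption, not a measurement.
krait_power_w, krait_freq_ghz, ipc_ratio = 0.75, 2.45, 1.1
a53_power_w, a53_freq_ghz = 0.50, 1.5

perf_ratio = (krait_freq_ghz / a53_freq_ghz) * ipc_ratio  # ~1.8x
power_ratio = krait_power_w / a53_power_w                 # 1.5x
print(f"Krait perf vs. A53:   {perf_ratio:.2f}x")
print(f"Krait power vs. A53:  {power_ratio:.2f}x")
print(f"Krait perf/W vs. A53: {perf_ratio / power_ratio:.2f}x")
```

Under those assumptions Krait ends up ahead on perf/W at the top of both curves despite burning 50% more power, which is why the A53 implementations look so poor here.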

Actually, in regards to the X1: I do have power numbers, but only Nvidia's own. The X1 might actually end up quite a bit better than the 5433 if I ever get to measure it via my methodology.
 
IP revisions are generally meaningless in regards to the power discussion. Implementation between vendors is so different that it's the least meaningful characteristic to talk about.

ARM described several power improvements in (AFAIK) r4 of the Cortex-A15. Nvidia cited these changes as a contributing factor to the higher efficiency of Tegra K1 over Tegra 4. It's not clear to me whether all of these modifications even made it into the A57, or at least into its initial revisions.

The line between mere revisions and new core names can be fuzzy. In a lot of ways Cortex-A72 looks like a couple of major revisions ahead of the A57, and I think we can all agree the power consumption improvements appear substantial. The A17 also appears to have started out as a set of revisions to the A12.

They also fall behind Krait, because Krait tops out at ~750mW at 2.45GHz with slightly higher IPC, while the 810/808's A53s top out at around 500mW at 1.5GHz.

Which looks very, very bad for Qualcomm's implementation of the A53, which by all rights should be an intrinsically more efficient processor than Krait 400. That again leads me to wonder how good a job they even did on their 28nm A53 implementations; I don't think I've seen measurements here. What I do know is that the Xperia M4 Aqua with the S615 has very, very messed-up scheduling, which makes me wonder if they're trying to haphazardly cover for poor efficiency.

Actually, in regards to the X1: I do have power numbers, but only Nvidia's own. The X1 might actually end up quite a bit better than the 5433 if I ever get to measure it via my methodology.

Any chance we'll see something like this performed on a Jetson X1 board? Should be pretty easy. Maybe AT could get a review sample if you haven't already.
 
There are concerns about the Jetson board because it's not "power optimized", and any measurement derived from it might be misleading. I'm trying to get Josh to at least verify the Fmax power via the fuel gauge on the Pixel C, but I doubt I'll get any other X1 device in my hands.
 
This still doesn't really mean anything without context.

What do you mean by context? The phone routinely needs to throttle due to its power consumption. The 5433 (roughly the best 20nm implementation of the A57) can consume over 7W at max load! That's just bad in general (especially for phones, where throttling will happen within seconds). No comparisons to other SoCs are needed.
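
To illustrate why that kind of draw is hopeless in a phone, here's a toy first-order thermal model; R, C, and the temperature limit are invented placeholders, not measurements of any device discussed here:

```python
import math

# Toy first-order thermal model: T(t) = T_amb + P*R*(1 - exp(-t/(R*C))).
# R (degC/W), C (J/degC), and the skin-temperature limit are assumed
# values for a generic phone chassis, purely for illustration.
T_AMB, T_LIMIT = 25.0, 45.0   # ambient and skin-temperature limit, degC
R, C = 8.0, 12.0              # assumed thermal resistance and capacitance

def seconds_to_limit(power_w):
    """Time until the limit is hit, or None if steady state stays below it."""
    if T_AMB + power_w * R <= T_LIMIT:
        return None
    return -R * C * math.log(1 - (T_LIMIT - T_AMB) / (power_w * R))

for p in (2.0, 4.0, 7.0):
    t = seconds_to_limit(p)
    print(f"{p:.0f}W -> " + ("sustainable" if t is None else f"~{t:.0f}s to the limit"))
```

With these made-up constants, ~2W is sustainable indefinitely while 7W forces throttling within the first minute of load, which matches the kind of behavior being described.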

Remember the original question:
The 810 does incredibly badly compared to the 7420. Is it both a TSMC process issue and a bad implementation by Qualcomm?

Sure, you can blame Qualcomm for making the situation worse, but the reality is that even if Qualcomm had matched the best 20nm implementation of the A57, it still would have appeared "bad" compared to the 7420. Look at the graph Neb provided us: I'd say the 5433 is also "bad" compared to the 7420 (the gap between the 7420 and the 5433 is larger than the gap between the 5433 and the 810).

Ultimately I can't blame TSMC's 20nm or Qualcomm for the 810's failures. Qualcomm was never going to look good (in this market) until its SoCs moved to FinFET (or perhaps a custom core). Sure, they could have done better with the A57, but would it really have mattered? Was it ever going to be competitive against the 7420?
 
I doubt the A57 implementation or the process is to blame for the S810's misbehavior; most indications so far point at a problematic memory controller.
 
I doubt the A57 implementation or the process is to blame for the S810's misbehavior; most indications so far point at a problematic memory controller.
Uh, no. The memory controller is the least of its problems and was mostly fixed in v2.1. It's very clearly an implementation issue.
 
What do you mean by context? The phone routinely needs to throttle due to its power consumption. The 5433 (roughly the best 20nm implementation of the A57) can consume over 7W at max load! That's just bad in general (especially for phones, where throttling will happen within seconds). No comparisons to other SoCs are needed.

So none of that means that Samsung's 20nm is a "poor process." I can find 28nm phones that can use 7W at full load too. Pretty much no phone with a processor like Cortex-A15 or A57 is allowed to run all cores at max clock speed indefinitely, including on Samsung's 14nm. Of course comparisons to other SoCs are needed; you can make an SoC that uses 7+W on any process.

The point is, you're just not isolating every variable. I showed that the Exynos 5430 is a lot more power efficient than the 5420, but this doesn't seem to matter to you.

By your reasoning, the 5430, 5433, and Tegra X1 shouldn't have been made at all; they should have just been skipped in favor of 14/16nm SoCs, right? But you can't survive in this market by not releasing a product. Despite having terrible efficiency, Qualcomm's 808 and 810 SoCs still got many design wins, and I think it's obvious that they would have been in a poorer position had they opted to sit out and wait for 16nm.

Of course doing a better job would have mattered: it would have made their PR not nearly as bad, and it would have gotten them a few more wins. This isn't an all-or-nothing game where they're either the best Android SoC or nothing matters, given that not everyone buys Samsung devices and Samsung's high-end SoCs have been mostly limited to said devices.
 
Uh, no. The memory controller is the least of its problems and was mostly fixed in v2.1. It's very clearly an implementation issue.

Really? I stand corrected. Then why is the GPU throttling like there's no tomorrow? The Adreno 330 didn't throttle by such an absurd percentage.
 
So none of that means that Samsung's 20nm is a "poor process." I can find 28nm phones that can use 7W at full load too. Pretty much no phone with a processor like Cortex-A15 or A57 is allowed to run all cores at max clock speed indefinitely, including on Samsung's 14nm. Of course comparisons to other SoCs are needed; you can make an SoC that uses 7+W on any process.

http://images.anandtech.com/doci/8718/big-cluster.png
Look at the difference: 2 A57s at max clock/load consume about the same power as 4 A15s!

I didn't say Samsung had a poor process. I'm saying the A57 (regardless of who did the implementation) was underwhelming for a phone SoC on both Samsung's and TSMC's 20nm processes.

The point is, you're just not isolating every variable. I showed that the Exynos 5430 is a lot more power efficient than the 5420, but this doesn't seem to matter to you.

By your reasoning, the 5430, 5433, and Tegra X1 shouldn't have been made at all; they should have just been skipped in favor of 14/16nm SoCs, right? But you can't survive in this market by not releasing a product. Despite having terrible efficiency, Qualcomm's 808 and 810 SoCs still got many design wins, and I think it's obvious that they would have been in a poorer position had they opted to sit out and wait for 16nm.

Of course doing a better job would have mattered: it would have made their PR not nearly as bad, and it would have gotten them a few more wins. This isn't an all-or-nothing game where they're either the best Android SoC or nothing matters, given that not everyone buys Samsung devices and Samsung's high-end SoCs have been mostly limited to said devices.

Yes, you showed me how a 28nm SoC (5420) became more power efficient on 20nm (5430). I'll concede 20nm wasn't as bad for power as I implied, but that's hardly a valid comparison for this situation. Regardless, what kind of efficiency gains were you expecting? Even if they had developed an SoC 50% more power efficient than the 5433, it would still be behind the 7420. I'd be surprised if they could have achieved those gains in that time frame.

As to what they should have done, that's orthogonal to the question. I didn't say they made the wrong decision or that they should have waited for FinFET. I'm only suggesting that, in general, we shouldn't expect a 20nm A57 SoC to be competitive with a 14nm (FinFET) A57 SoC, and that IMO the A57 turned out to be pretty mediocre on 20nm.
 
Really? I stand corrected. Then why is the GPU throttling like there's no tomorrow? The Adreno 330 didn't throttle by such an absurd percentage.
Vendors trying to put high-TDP tablet SoCs into phones, the Adreno 430 having some architectural issues, and bad power on the CPU side eating away some of the power budget from the rest of the SoC.
 
I'm really struggling to even process these results. Why is the A53 perf/W on 810 even worse than on 808? Why is it only marginally better than the A57 perf/W?
The 808 and 810 A53 curves get closer in the performance range where one would expect active power to dominate. Perhaps less than perfect idling or gating on the 810 leaves a higher power floor?
 
Those curves are full-load figures; idle has no relevance in those cases. It's simply different voltage scaling with different physical characteristics.
 
Those curves are full-load figures; idle has no relevance in those cases.
I am trying to draw inferences from what was given in the graph, and this seemed somewhat ambiguous.
Without knowing exactly what goes into the W component, it didn't look inconsistent with a component of non-dynamic power consumption that becomes less dominant as the chip climbs higher in the design's clock range.
The behavior of the Exynos A53s at Samsung 14nm and 20nm, with significantly higher perf/W at the lower end of the range, seems consistent with FinFET's much better control of static and sub-threshold leakage versus planar.
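
A toy power model shows why it would look that way; every constant below is invented, and the only point is the shape of the curves:

```python
# Toy CPU power model: P = k*V^2*f (dynamic) + V*I_leak (static).
# All constants are invented for illustration. The point is that a fixed
# leakage term hurts perf/W most at the low end of the curve, which is
# where FinFET's lower leakage would help the most.
def perf_per_watt(freq_ghz, i_leak_amps):
    v = 0.8 + 0.25 * (freq_ghz - 0.4)   # assumed linear V/f scaling
    p_dynamic = 0.9 * v**2 * freq_ghz   # switching capacitance lumped into 0.9
    p_static = v * i_leak_amps
    return freq_ghz / (p_dynamic + p_static)  # "perf" proxied by frequency

for f in (0.4, 0.8, 1.2, 1.6):
    planar = perf_per_watt(f, 0.30)   # leaky, planar-ish
    finfet = perf_per_watt(f, 0.05)   # much lower leakage, FinFET-ish
    print(f"{f:.1f}GHz: {finfet / planar:.2f}x perf/W advantage for low leakage")
```

The low-leakage curve's advantage is largest at the bottom of the frequency range and shrinks toward the top, consistent with the behavior I'm describing.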

It's simply different voltage scaling with different physical characteristics.
The picture didn't indicate either way whether the measurements excluded SoC power consumption besides the A53s (edit: and whether they're active-only on top of that).
As far as judging the quality of Qualcomm's implementation goes, how many samples of each SoC go into the curves?

I'm curious if there's anything you've seen that illuminates why Samsung's curves have a gap in absolute performance between the A53 and A57 cores, whereas Qualcomm allows more overlap at the apparent price of including that less-than-ideal part of the OoO cores' curve.
 
I'll explain the graphs more in depth once I get to put them into a proper article; in any case I'm happy with the representation given the resources at hand. Yes, it tries to exclude SoC power, so it should be near CPU-only.
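
To give a rough idea of what excluding SoC power can look like, the simplest version is a baseline subtraction; this is just an illustration with made-up numbers, not my actual setup:

```python
# Generic baseline-subtraction sketch: estimate CPU-cluster power as the
# delta between a loaded and an idle measurement taken in the same
# screen/radio state. The numbers are hypothetical placeholders.
idle_w = 0.85      # device power at idle, screen on (assumed)
loaded_w = 2.35    # same state with CPU load pinned to one cluster (assumed)
cpu_estimate_w = loaded_w - idle_w
print(f"Estimated CPU-only power: {cpu_estimate_w:.2f}W")
```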

As for the big.LITTLE gap, it's actually Qualcomm that is unusual in its implementation, as it makes little sense to overlap with the low frequencies of the A57; they're seemingly the only vendor going to low frequencies on the big cluster, for some reason, while in reality those frequencies are never used. It may be that Samsung has better cluster power-gating or something, so they don't bother with a low-frequency/low-voltage idle, or their gains from doing so aren't big. MediaTek has curves similar to Samsung's. The whole argument is interesting because it's a pointer that vendors have to do their physical design so that the two curves meet as smoothly as possible, which sometimes doesn't work out.
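
The scheduling side of that tradeoff is easy to sketch: in the overlap region you'd simply pick whichever cluster delivers a given performance level for less power. The curve points below are hypothetical, not my measured data:

```python
# Hypothetical (perf index, watts) points for a little and a big cluster.
little = [(1.0, 0.10), (1.5, 0.18), (2.0, 0.30), (2.5, 0.50)]
big    = [(2.0, 0.45), (3.0, 0.70), (4.5, 1.20), (6.0, 2.10)]

def watts_at(curve, perf):
    """Linearly interpolate power at a given perf level (None if out of range)."""
    for (p0, w0), (p1, w1) in zip(curve, curve[1:]):
        if p0 <= perf <= p1:
            return w0 + (w1 - w0) * (perf - p0) / (p1 - p0)
    return None

# In the overlap region, prefer whichever cluster does the job for less power.
for perf in (2.0, 2.25, 2.5):
    wl, wb = watts_at(little, perf), watts_at(big, perf)
    print(f"perf {perf}: {'little' if wl <= wb else 'big'} ({wl:.2f}W vs {wb:.2f}W)")
```

With curves shaped like these, the little cluster wins everywhere the two overlap, which is exactly why a vendor might not bother with low big-cluster frequencies at all.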

In regards to scaling, do keep in mind that process plays a large role in that. I've heard that TSMC's processes have lower voltage ranges than Samsung's, and I did see some evidence of that at 28nm, for example.
 
As for the big.LITTLE gap, it's actually Qualcomm that is unusual in its implementation, as it makes little sense to overlap with the low frequencies of the A57; they're seemingly the only vendor going to low frequencies on the big cluster, for some reason, while in reality those frequencies are never used. It may be that Samsung has better cluster power-gating or something, so they don't bother with a low-frequency/low-voltage idle, or their gains from doing so aren't big. MediaTek has curves similar to Samsung's. The whole argument is interesting because it's a pointer that vendors have to do their physical design so that the two curves meet as smoothly as possible, which sometimes doesn't work out.

Out of curiosity, does the 808's A53 curve end sooner on the X axis than the corresponding cluster on the 810?
Is using performance as the X axis enough of a proxy for the clock/voltage points being used?
It might be coincidental, but it would be an interesting correlation if the dynamic range on one side has implications for the other. Both of Qualcomm's clusters seem to follow the same philosophy of a fuller curve, going back to the 801, rather than the more truncated range done by Samsung.
 
Out of curiosity, does the 808's A53 curve end sooner on the X axis than the corresponding cluster on the 810?

Yes, the 808's ends sooner because it's clocked at 1440MHz vs. 1555MHz on the 810. It's just coincidence that it appears to match up with the 810's curve.
It might be coincidental, but it would be an interesting correlation if the dynamic range on one side has implications for the other. Both of Qualcomm's clusters seem to follow the same philosophy of a fuller curve, going back to the 801, rather than the more truncated range done by Samsung.
Don't read too much into Qualcomm's curves. MediaTek, Samsung, and HiSilicon all have a gap in their effective curves for one reason or another. Also keep in mind these are just my own curves based on SPEC2k IPC; maybe the vendors use something else, which might tighten the curves together (int vs. FP workload differences, for example).
Is using performance as the X axis enough of a proxy for the clock/voltage points being used?
I don't understand what you're asking here...?
 
Yes, the 808's ends sooner because it's clocked at 1440MHz vs. 1555MHz on the 810. It's just coincidence that it appears to match up with the 810's curve.
I think there should be at least some familial resemblance in cluster behavior between SoCs of the same generation from the same designer.
The 808 and 810 A57 lines have a gap in max performance as well, but there could be various reasons why one has a greater range of achievable performance than the other.

Don't read too much into Qualcomm's curves. MediaTek, Samsung, and HiSilicon all have a gap in their effective curves for one reason or another.
That one vendor has a reason to allow so much overlap, or lacks everyone else's reasons for a gap, seems interesting to me.

I don't understand what you're asking here...?
I just wanted to confirm that the SPEC score at point X tracks sufficiently closely where the core's clock is set relative to its min and max.
Performance is a value that results from the contribution of multiple factors, and the reasons for changes in slope or a difference in where a curve ends on the X axis could vary.
I am curious where the cores are in their frequency range, whereas performance achieved is at least partly an architectural/software phenomenon.
 
I just wanted to confirm that the SPEC score at point X tracks sufficiently closely where the core's clock is set relative to its min and max.
Performance is a value that results from the contribution of multiple factors, and the reasons for changes in slope or a difference in where a curve ends on the X axis could vary.
I am curious where the cores are in their frequency range, whereas performance achieved is at least partly an architectural/software phenomenon.
It's basically the full frequency range; performance is mostly linear with clock, and that's what the curve represents. I don't actually go and re-run SPEC at every frequency, as that would take several hundred hours of work. The only SoC where this isn't valid is Krait, because it has an async L2 that caps at ~1.5GHz, so CPU IPC is higher at lower core frequencies than at the 2.5GHz max. For everything else it's mostly linear.
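
In other words, something like this; all the numbers are placeholders, it's just the shape of the shortcut:

```python
# Sketch of deriving a perf/W curve from a single SPEC run: measure a
# score once at one frequency, then assume performance scales linearly
# with clock. All numbers below are placeholders, not measured data.
measured_freq_mhz = 1900
measured_score = 1150                     # hypothetical SPEC2k-style score
dvfs_states_mhz = [400, 800, 1200, 1600, 1900]
power_w = [0.12, 0.30, 0.60, 1.05, 1.55]  # hypothetical per-state power

score_per_mhz = measured_score / measured_freq_mhz  # IPC proxy
for f, p in zip(dvfs_states_mhz, power_w):
    est = score_per_mhz * f
    print(f"{f:>4}MHz: est. score {est:6.1f}, {est / p:6.1f} score/W")
```

For a Krait-style async L2 capping at ~1.5GHz this linearity assumption breaks, so there you'd need actual per-frequency runs (or at least a correction) rather than this shortcut.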
 