This is conventional wisdom.
It doesn't sit well with me, however. I can't quantify the x86 architectural cost, but it seems to be greater than the first-order approximation would indicate.
To keep to the thread topic, the A9x puts its finger on the sore spot - how is it that Intel, who has the best fabs, a crystal clear architectural focus, tons of CPU design experience, and an R&D budget that is effectively limitless, still can't make a competitive x86 mobile chip? Surely, if the x86 penalty were small, all the undeniable strengths of Intel would compensate several times over?
To break out some of the assumptions:
Best fabs: In terms of high-performance digital logic processes, this is generally undisputed. However, for all its era of dominance, Intel has only recently introduced processes more strongly aligned with the needs of mixed analog/digital SoC products and low power. Most likely, few would be near Intel's ability to achieve a usable active power consumption at 3-4 GHz speeds, but it's also the case that historically the designs that needed those features did not care about idle and static power consumption that was far too high for low-power products. Getting an SoC node was not immediate, and the turnaround time for changes in processes and which variants are rolled out first is lengthy. Density-wise, it took until the most recent nodes for the foundries to make an economic decision to weaken their metal scaling at the same time that Intel decided to focus on it.
Clear architectural focus: It might be more unified compared to the fractured ARM market, but it has internal lines and politics. Atom is the low power line, but its initial success was more of a surprise and it was not given the same investment as the main lines. Its optimization and design goals were more modest until Silvermont really replumbed it. The same goes for Larrabee and the various chips up until the most recent Xeon Phi. That line did not receive significant physical design resources and, it seems, had constraints on the driver optimization resources it could have back when it had aspirations to take on GPUs. It did not receive a non-decrepit base core until the most recent one.
There was significant pushback against anything that would cannibalize the sales of the main-line cores, be it a large many-core HPC/server chip cutting into Xeon or a cheap Atom with too much performance cutting into x86 in the client PC range.
However, the lines that these specialized chips for a long time deferred to are the x86 big cores. Up until the most recent generation, we see Intel's mainline design with that vaunted 3-4 GHz fmax, increasingly wide dynamic range, high scaling potential, and very good physical design--at least for markets that enjoy orders of magnitude more power budget and margin. What we see with AVX-512 and the client/Xeon split is a sign that, for all its resources, getting a single design to cater to vector, integer, laptop, desktop, server, HPC, phone, etc. has finally hit a point on one or more exponential curves that even Intel cannot engineer around.
Also, we finally see with Phi and Silvermont examples of designs accomplished after Intel either managed to spin up the design effort or managed to fight back the entrenched interests of the mainline business that initially kept them out of some of that top-tier resourcing.
Given that many of these efforts (cores, interconnects, design targets) have lead times on the order of 3-5 years, the decision was probably made at a point that is both some time ago and frustratingly too late in terms of market changes versus lead time. However, that also means that they had to make some judgement calls on where the market was going, and that's where Apple's designs might be a case where Intel's game of catch-up led it to skate to where the puck was.
That's solely on the cores, however. In terms of SoC integration and associated IP, Intel has only recently been able to digest acquisitions for things like modem tech, which take a long time in terms of both engineering and corporate integration. In terms of all the various things besides the CPUs that mobile manufacturers tend to care a lot more about, Intel was playing catch-up. There are rumors that it is doing a lot to convince Apple to use its modem.
Then there's the software, the lag in adopting sufficiently upscale GPU resources relative to the CPU allocation, etc.
And it is not for lack of trying either: they developed a new implementation of x86 to specifically go after low power applications, improved it iteratively, and when that failed to find customers, Intel threw literally several billions in cash (contra revenue) at the manufacturers to get them to use their CPUs. Other than very effectively killing the low power x86 market for AMD, this has pretty much accomplished nothing.
Atom was initially more like a cheap x86 that was generally lowish in power. It took much longer to get a good low-power implementation. As for my earlier reference to skating to where the puck was: isn't Silvermont more competitive with the ARM A9/A15/A57 implementations we've seen?
Getting something to beat the A9 would involve Intel anticipating a core with the raw width and resources of a high-end x86, on a revamped ISA that ditched a lot of cruft that ARM had, with a significant design investment, and one that exists to service a specific niche. No 4GHz+ fmax, no high core count scaling, reduced SIMD width, limited introduction of non-relevant design features, more limited or more mobile-targeted dynamic range, etc. AMD's Excavator got measurably less embarrassing when it gave up a significant fraction of its fmax on an unimpressive architecture.
Beyond that, and possibly more relevant, those manufacturers that need to be coaxed into using an insurgent architecture, which Atom would be for mobile, are operating in a market where a CPU core is a widget that doesn't rank above a lot of other concerns. Apple is also making most of the profit in those markets. Intel is playing a game of catch-up, and the most it can aspire to be, when the endgame seems to be vertical component/device/content delivery/software/OS integration, is an uppity component provider several layers down.
Apple is basically calling the shots at Intel's layer and above. Given how much pull Apple has for things like the large-GPU Intel Core line, it can lead Intel around pretty directly.
Does x86 inject overhead? Sure. I think it was on the order of 10-15% on simpler cores. That alone would be surmountable.
However, the definition of competitive is not in Intel's control, and one of the powers that be that controls that definition and is leading vast swaths of the market by the nose is making the A9 and getting all the profits--of which a component provider would see only a tiny sliver.
The A9 doesn't need to make a lot of money by itself to justify investment; that's what the phone/software/content stuff is for.
So, in summary:
Significant design lead times (not x86-specific)
Significant catch-up in non-CPU IP (not x86-specific)
Only recent shift from a period where a single core design across multiple orders of magnitude of power/performance/features/scaling was the assumed optimum (not x86-specific)
Significant organizational inertia (not x86-specific)
Market that gives significantly less money to a component provider (not x86-specific)
"Competitive" defined by someone who has been in the driver's seat for the market (not x86-specific)
edit:
One other thing that is more speculative is whether Intel has been trying to get a retargeted core architecture. There have definitely been changes from Sandy Bridge through Skylake, but I thought I saw some rumors of various new designs that never came to fruition. There is still the risk factor that a new design does not pan out, or fails to hit a sweet spot with the physical realities of manufacturing. The Pentium 4 on 90nm, or Bulldozer in general, shows that a design's implementations can surprise designers who had to make judgement calls years before.
The latest FinFET nodes and the phone market are big changes for retargeting design priorities, but it doesn't strike me that the value-add of the various cores Intel has brought out has matched them yet.
Signs of slowed rollouts and the physical split with the latest gen seem to point to a design transition that might be overdue.