It seems rather early, but a Weibo user supposedly has Geekbench 4 scores for the rumored A12. I saw this post in a thread on the AnandTech forums, and the thread starter claims the source is reliable.
i冰宇宙 (translated) said:
A12-related: current GB4 scores are around 5200 / 13000. The number of branch prediction units has been increased, and they are currently working through a power-consumption problem with the big cores (average power is still 23% higher than expected, even on 7nm).
The A11 in the iPhone 8 Plus reaches 4216 single-core and 10186 multi-core on Geekbench 4 (as of this post), so if the claims are true, the A12 would be ~23% faster in single-core and ~28% faster in multi-core.
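For anyone who wants to check the arithmetic, here's a trivial sketch (the 5200/13000 figures are the rumoured scores, the 4216/10186 are the A11 numbers above):

```c
#include <stdio.h>

int main(void) {
    /* Rumoured A12 Geekbench 4 scores vs. the A11 (iPhone 8 Plus) scores above */
    double a12_st = 5200.0, a12_mt = 13000.0;   /* single- / multi-core, rumoured */
    double a11_st = 4216.0, a11_mt = 10186.0;   /* single- / multi-core, measured */

    printf("single-core gain: %.1f%%\n", (a12_st / a11_st - 1.0) * 100.0);  /* ~23.3% */
    printf("multi-core gain:  %.1f%%\n", (a12_mt / a11_mt - 1.0) * 100.0);  /* ~27.6% */
    return 0;
}
```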

The power consumption of the core isn't encouraging though….
 
He's not the source; he just reposts stuff. Anyway, the real source is generally reliable.

The full source states that the power is 30% higher than Apple's projections and that the thing is going to throttle a lot harder.
 
Do they have time for a respin, or is the issue more fundamental than anything a quick fix could address?

Maybe we'll see the first iPhone with a heatpipe this year... :p Apple has never bothered with any cooling measures whatsoever in the past, which is pretty crazy really. iPad Pro even sandwiches its SoC up against the display, so it can't use the aluminium case as a heatsink...
 
That's not true; recent iPhones have a lot of thermal dissipation tape and thermal paste, and the iPad also has a very large dissipation area for its SoC.

Anyway, if Apple targets the same TDP as the A11, that means 3.5 W to >5 W in FP workloads. An extra 30% on top of that would be troublesome.
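To put rough numbers on that (just a back-of-the-envelope sketch; the 3.5-5 W range is the FP figure above, and the 30% is the rumoured overshoot versus Apple's projections):

```c
#include <stdio.h>

int main(void) {
    /* FP power range quoted above for an A11-like TDP target */
    double low_w  = 3.5;
    double high_w = 5.0;
    double overshoot = 0.30;   /* rumoured power overshoot vs. projection */

    /* If those were the projections, a 30% miss would land here */
    printf("%.2f W -> %.2f W\n", low_w,  low_w  * (1.0 + overshoot));   /* 3.50 -> 4.55 W */
    printf("%.2f W -> %.2f W\n", high_w, high_w * (1.0 + overshoot));   /* 5.00 -> 6.50 W */
    return 0;
}
```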
 
There are three question marks here, though. One is whether the increase over projections is actually accurate (projected under what conditions?), another is whether the target TDP is unchanged from the A11, and a third is of course whether this chip is from a final production run or a test chip.
I kind of doubt that Apple orders a first volume batch of 50-100 million or so A12 chips only to be surprised that power draw is much higher than anticipated. It strains credibility. We'll find out just how accurate this is when the final product is out.
How long is the lead time from wafer exposure to benchmarkable product anyway?
 
Two years in a row then. Last year 10nm didn't meet their targets either.

Let's hope Samsung delivers 7LPP in time for the S9. I do worry about whether their quoted PPA targets are relative to 10LPE or 10LPP.
 
I'm actually at a loss as to how you (and Nebuchadnezzar) arrive at this narrative. Obviously, we don't have identical circuitry at the different TSMC lithographic nodes (16nm, 10nm, 7nm) to compare, but using, for instance, the table Anton Shilov compiled here, TSMC 10nm should yield a 20% performance improvement over 16nmFF+ and 7nm a 30% improvement over 16nmFF+ (equivalently, an 8% improvement over 10nm) at ISO power. Scotten Jones (though that was a year ago) pegged the performance improvement from 16nmFF to 7nm at a slightly higher 35-40%, although he says he builds that estimate on a comparison with 16nmFF rather than 16nmFF+, which, scaled, would make his numbers basically identical to Anton's. TSMC's own current numbers can be found here, and are slightly lower on the 16nm->10nm improvement and slightly higher on the 10nm->7nm performance improvement.

Now what have we seen in terms of product performance improvement? Apple has released two generations of 16nmFF processors, the A9 and A10, so let's check out the last and best of these (and presumably the most similar), the 16nmFF+ A10, which produced a single-core GB4 score of 3580. The 10nm A11 scored 4200. And the leak here claims a 7nm A12 score of 5200, on silicon that is highly unlikely (impossible) to be from a final production run. That yields a 16->10nm improvement of 17% and a 16->7nm improvement of 45%. And that while we know that the A11, for instance, incorporated new functionality that is not reflected in its GB score (but carries a cost in power/resources), which may well be the case with the A12 as well, quite apart from the other vagaries surrounding that number. You might argue that architectural advancement stands apart from the process, but that's not really true when you are operating within power constraints. A faster memory subsystem costs power, lower cache latency costs power, larger caches cost power, and so on, forcing an average drop in clock to compensate at ISO power.

I just don't see the rationale behind the assertion that TSMC has underperformed.
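To make the comparison explicit, here's a quick sketch of the numbers used above (GB4 single-core scores plus the process-only claims from Anton's table; the A12 figure is of course just the rumour):

```c
#include <stdio.h>

int main(void) {
    /* GB4 single-core scores used in the argument above */
    double a10 = 3580.0;   /* 16nmFF+ */
    double a11 = 4200.0;   /* 10nm */
    double a12 = 5200.0;   /* 7nm, rumoured */

    printf("16 -> 10nm product gain: %.0f%%\n", (a11 / a10 - 1.0) * 100.0);   /* ~17% */
    printf("16 -> 7nm  product gain: %.0f%%\n", (a12 / a10 - 1.0) * 100.0);   /* ~45% */

    /* Process-only claims: +20% (10nm) and +30% (7nm) over 16nmFF+ at ISO power */
    printf("implied 10nm -> 7nm process gain: %.0f%%\n",
           (1.30 / 1.20 - 1.0) * 100.0);                                      /* ~8% */
    return 0;
}
```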
 
Not to be disrespectful, but Anton doesn't seem to be into this kind of thing. He gets a lot of things wrong or outdated when publishing news about foundries.

That aside, TSMC first claimed that 10nm would yield 22% performance and 40% power improvements over 16nm. They later changed that to 15% performance and 35% power on their website here: http://www.tsmc.com/english/dedicatedFoundry/technology/10nm.htm

After that, when Huawei released the Kirin 970, they said 10nm only brought 20% lower power than 16nm, and Andrei even measured that when he reviewed the Huawei Mate 10. The performance/power improvements of the Kirin 970 and the MediaTek X30, both manufactured on TSMC's 10nm, were disappointing compared to what the process was initially claimed to bring.

On that same website, TSMC currently claims 7nm brings 20% better performance and 40% less power than 10nm. But let's see.
 
I wonder whether there will be HW level Meltdown/Spectre mitigations in A12, and if so, what sort of power cost that will incur.
 
Since you quote power and performance improvements in a non-lithography forum, let me clarify a bit of PR sleight of hand that has become the norm in the industry, for anyone reading who is not familiar with the practice.
"The new process X at foundry Y offers xx% better performance and yy% less power (but not at the same time)." That little clarification in parentheses is absent from the PR statements, of course.
(Also, there is a fair bit of complexity hidden behind those numbers - memory/logic/IO don't behave the same way, for instance. These are merely ballpark PR numbers. Density can be CPxMP or the size of an SRAM cell, or calculated according to a formula Bohr made up to make Intel look good. So - grains of salt recommended.)
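As a toy illustration of why those two headline numbers are two different operating points, using nothing more than the first-order dynamic power relation P ≈ C·f·V² (leakage ignored; the frequency and voltage factors below are made up purely for illustration):

```c
#include <stdio.h>

/* First-order dynamic power model: P ~ C * f * V^2 (leakage ignored) */
static double dyn_power(double c, double f, double v) {
    return c * f * v * v;
}

int main(void) {
    double c = 1.0, f0 = 1.0, v0 = 1.0;      /* normalised baseline on the old node */
    double p0 = dyn_power(c, f0, v0);

    /* Point A: spend the node gain on frequency (voltage held at baseline for simplicity) */
    double pa = dyn_power(c, f0 * 1.20, v0);
    /* Point B: hold frequency, drop voltage (illustrative 0.775x Vdd) */
    double pb = dyn_power(c, f0, v0 * 0.775);

    printf("A: +20%% performance, power at %.0f%% of baseline\n", pa / p0 * 100.0);  /* ~120% */
    printf("B: iso-performance,  power at %.0f%% of baseline\n", pb / p0 * 100.0);   /* ~60%  */
    return 0;
}
```

You get the faster chip or the cooler chip, not both at once; the marketing copy quietly quotes each figure at its own corner of the curve.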
 
I am very aware of that, and implementation plays a huge role. I am comparing similar/identical IPs ported from 16nm to 10nm. I can't speak about Apple because I don't know what they have changed, but by non-scientific metrics the A11 seems to run hotter than the A10 (that doesn't mean it is more efficient, though).
 
What about future use of SVE?
Are you sure SVE will ever be a good choice for a phone SoC? It seems nice but also expensive, and its scalability to long vectors doesn't really pay off until you actually use them (in the applications where longer vectors are optimal). How will it pay for itself on an iPhone? In its target HPC market there are use cases, but in a phone SoC power draw and die area are limited commodities. Not saying it will never happen, but an iPhone SoC may not be the place to look for early implementations, and too little time has passed since its introduction for it to have been implemented anywhere at all, to my knowledge.

That said, if the speculations about Apple using their own CPUs throughout their product matrix ever materialize, they might find a use for the variable vector length support of SVE. Maybe. It would be cool from an architectural point of view.
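For anyone curious what the variable vector length support actually looks like in code, here is a minimal vector-length-agnostic loop using the Arm C Language Extensions for SVE - purely a sketch of the programming model, not a claim that Apple will implement it:

```c
#include <arm_sve.h>
#include <stdint.h>

/* y[i] += a * x[i], written once and valid for any SVE vector length (128-2048 bits).
   Build with an SVE-enabled toolchain, e.g. -march=armv8-a+sve. */
void saxpy_sve(float a, const float *x, float *y, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {       /* svcntw(): 32-bit lanes per vector */
        svbool_t pg = svwhilelt_b32_s64(i, n);         /* predicate handles the loop tail */
        svfloat32_t vx = svld1_f32(pg, x + i);
        svfloat32_t vy = svld1_f32(pg, y + i);
        vy = svmla_n_f32_x(pg, vy, vx, a);             /* vy += vx * a */
        svst1_f32(pg, y + i, vy);
    }
}
```

The same binary runs unchanged whether the hardware implements 128-bit or 512-bit vectors, which is exactly the property that would matter if one CPU design had to span phones and bigger machines.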
 

In my view (and experience), long vector SIMD instructions are better suited for general Neural Net computations than for example a dedicated NN coprocessor ...
 
I'll defer to your experience, but seeing as image sensors are getting their own intelligence built into them, and as Apple has already decided to produce and incorporate its own dedicated "Neural Engine", it seems as though they are not addressing such use cases with beefed-up general resources.
This could change. But if so, it will have to make sense from a cost/benefit point of view, and substantially stronger general SIMD capabilities are going to carry a real cost in both die size and power draw (also in secondary effects, as you need to be able to feed the SIMD array in order not to limit its benefits further). It's a game of balance, and we lack most of the data needed to determine whether it makes sense or not.
It’s interesting technology though, no question about that.
 

The historical analogy could be the floating point coprocessor (FPU) ...
 
True, but in those days we were primarily limited by gates, not power.
For some time now, the trend has been that new processing needs are met by dedicated functional blocks or processors dealing with networking, image capture and processing, graphics, video coding and decoding, and so on, with "Neural Engines" (ugh) being yet another example of dedicated units doing as efficient a job as possible and being switched off when they are done, as opposed to beefing up a general processor and having it take care of it all. Heterogeneous processing, directed by and accessed through software layers. And it makes sense today. Of course there is bound to be a grey zone where functionality catering to a specific use case can be added to the general processor and provides enough benefit to average code that it's a good idea to implement it that way. But we haven't seen much of that for a long time.
 

So a separate FPU coprocessor would make sense now, then?
One problem with coprocessors is latency ...

See for example:
https://www.fudzilla.com/news/ai/46258-intel-xeon-aws-ai-can-read-4x-faster-than-nvidia
 