BTW I think that having two L/S ports is on the low side for a design that wide.
I agree although at least the extra registers of ARMv8 should help compared to ARMv7 or x86-64, right? Also an extra load unit might be difficult to feed from the L1 data cache - I'm not sure what trade-offs they are doing there in terms of bandwidth/banks/etc. given their huge 128KiB capacity.

Coming from a GPU background I have a maybe irrational dislike of register spilling, but in extreme cases I've wondered whether it'd be more energy efficient for the compiler to recompute certain results rather than store them in L1 and then reload them (if the data needed to compute them is being kept in registers anyway for another reason - not sure how common that is in CPU workloads to be honest, might be too rare to focus on). I don't think modern compilers do this? Anyway it probably wouldn't make much difference and it's a bit academic...

BTW, do we know what the L1 data cache line size is for Apple? ARM's cores have 64-byte cache lines, but in my mind, that's partly because some customers will use memory controllers with 64-byte granularity. Since Apple controls the entire SoC, they might (or might not) have decided that 32-byte granularity is still beneficial despite the cost in the memory controller, at which point it might make sense for the L1 data cache to also have 32-byte cache lines. With smaller cache lines prefetching also becomes slightly more important, but given Apple's performance levels it's obvious they must have good prefetching algorithms.
 
I agree although at least the extra registers of ARMv8 should help compared to ARMv7 or x86-64, right? Also an extra load unit might be difficult to feed from the L1 data cache - I'm not sure what trade-offs they are doing there in terms of bandwidth/banks/etc. given their huge 128KiB capacity.
I was not thinking about having a third load, but rather being able to issue two loads and one store per cycle; that's useful for some computing tasks that stream their data (e.g., summing two vectors). IIRC Intel can do it since Haswell and their CPUs are less wide.
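To put a rough number on the vector-sum case: each element of c[i] = a[i] + b[i] needs two loads and one store. The sketch below is a deliberately simplified throughput model, not a description of any real core. The function names and port configurations are hypothetical, and it ignores cache misses, issue width and everything else except memory port count.

```python
# Simplified memory-port throughput model for c[i] = a[i] + b[i].
# Assumption: memory ports are the only bottleneck; all names are illustrative.

def cycles_shared(loads, stores, ports):
    """Cycles per element when every port can do either a load or a store."""
    return (loads + stores) / ports

def cycles_dedicated(loads, stores, load_ports, store_ports):
    """Cycles per element when loads and stores use separate ports."""
    return max(loads / load_ports, stores / store_ports)

LOADS_PER_ELEMENT = 2   # a[i] and b[i]
STORES_PER_ELEMENT = 1  # c[i]

# Two generic load/store ports: three memory ops share two ports.
two_ls_ports = cycles_shared(LOADS_PER_ELEMENT, STORES_PER_ELEMENT, ports=2)

# Two loads plus one store per cycle.
two_loads_one_store = cycles_dedicated(
    LOADS_PER_ELEMENT, STORES_PER_ELEMENT, load_ports=2, store_ports=1
)

print(two_ls_ports, two_loads_one_store)
```

Under these assumptions the streaming loop is limited to one element every 1.5 cycles with two shared ports, versus one element per cycle with two loads and one store.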
 
9to5mac claims that the rumored iPad Pro (2018) will have an A12X SoC.

No numbers are given, but I think a reasonable guess from previous -X SoCs is the following:
  • 3 Vortex cores clocked slightly higher than in the A12
  • 6? Tempest cores
  • 128-bit memory interface
  • 8 GPU cores (2x the A12)
  • 16 Neural Engine cores (2x the A12), ~10 TOPS.
This may be a silly question, but do the numbers of Vortex and Tempest cores have to be in a fixed (1:2) ratio?

One of the more interesting rumors (Kuo, 9to5mac) is that the iPad Pro may have a USB-C port instead of the Lightning port, which allows for 4K output. This change further differentiates the iPad Pro from the iPhone and iPad (non-Pro) and seems to push it a bit closer to laptop territory.
 
The iPad Pro getting a USB-C interface would make it more port-friendly than the Surface Pro 6, which would be hilarious.

Especially considering Microsoft gimped the GPU on their high-end Core i7 offering, which now only gets a remarkably old Gen9 GT2 GPU.

Interesting times ahead.
 
AnandTech has released SPEC2006 estimates for the small CPU cores in the A11 and A12, as well as Neural Engine benchmarks.
Andrei Frumusanu said:
What did surprise me a lot was seeing just how well Apple’s small cores compare to Arm’s Cortex-A73 under SPECint. Here Apple’s small cores almost match the performance of Arm’s high-performance cores from just 2 years ago. In SPEC's integer workloads, A12 Tempest is nearly equivalent to a 2.1GHz A73.

However in the SPECfp workloads, the small cores aren’t competitive.
[…]
In recent years I’ve felt that Arm’s little core performance range has become insufficient in many workloads, and this may also be why we’re going to see a lot more three-tiered SoCs (such as the Kirin 980) in the coming future.
Is there any benefit for a future A-series SoC to have "big" cores, out-of-order "little" cores, and a third tier of in-order tiny cores?
 
AnandTech has released SPEC2006 estimates for the small CPU cores in the A11 and A12, as well as Neural Engine benchmarks.
Is there any benefit for a future A-series SoC to have "big" cores, out-of-order "little" cores, and a third tier of in-order tiny cores?
I was referring to the Android SoCs - the middle gap is now quite big. We'll see some interesting solutions in the next gen for this.
 
Andrei, how did you run SPEC on little cores?
It seems iOS 11 didn't pay attention to thread affinity, at least the last time I tried.
https://developer.apple.com/library...ef/doc/uid/TP40006635-CH1-DontLinkElementID_2

BTW, your frequency measurements are a bit off.
All precise frequencies I measured on A7,A9,A11 are divisible by 24MHz. (So 1587 or 2083 are just not possible)
This is a CNTFRQ_EL0 timebase.
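The divisibility argument is easy to check. This is only a sketch of the arithmetic, assuming the 24MHz step stated above; the helper name is made up for illustration.

```python
# Check which reported frequencies are multiples of a 24 MHz step
# (the step size is taken from the post above, not independently verified).
STEP_MHZ = 24

def nearest_multiples(freq_mhz, step=STEP_MHZ):
    """Return the multiples of `step` at or around `freq_mhz` as (lower, upper)."""
    lower = (freq_mhz // step) * step
    if lower == freq_mhz:
        return freq_mhz, freq_mhz  # already an exact multiple
    return lower, lower + step

for freq in (1587, 2083, 2064, 2304):
    exact = freq % STEP_MHZ == 0
    print(freq, "exact multiple" if exact else "not a multiple", nearest_multiples(freq))
```

With that step, 1587 falls between 1584 and 1608, and 2083 falls between 2064 and 2088, while 2064 and 2304 are exact multiples.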

The fastest way to read a timer is
isb
mrs x0, CNTPCT_EL0
ret

While 2064MHz (on Monsoon) was measured by my early freq timing code, the later versions with simultaneous measurements on N=3..6 cores got the same 2304MHz Monsoon max freq as with dual cores. But I think I need to re-check the min frequency in this situation.
I'm going to buy an iPhone XR and revisit my measurements.
 
That's just some assumption, you can program PLLs with any frequency.
In theory, but we have frequency range/steps defined by Apple.

BTW do you have a plan to dig into core microarchitecture?
I'm trying to code some uarch tests as well.
You did a good job with review, a lot of Intel fanboys were seriously butthurt :D
 
The new iPad Pros with the A12X SoC have been revealed.
Well, damn. They promise 35% higher single thread performance and over 90% better multithread over its predecessor, which in Geekbench 4 terms translates to a single thread score of 5400 and a multithread score of just under 20000. For ballpark reference, that's the performance level of Intel's Core i7-6700K.
And it has supporting computational functionality on the SoC that the x86 environment lacks.
The GPU seems to be effectively twice the performance of the previous iPad Pro, but with some twists. They made repeated references to game consoles, saying it was as fast as the XB1s but capable of feats that consoles cannot match - like portability and 120Hz display. (Although, trying to nip a pointless discussion in the bud, consoles are obviously a different market altogether.) Rather they targeted laptops, demonstrating that the iPad outsold all other laptops even aggregated by manufacturer (Apple themselves being edited out of the comparison), and claiming that the iPad Pros also outperform the overwhelming majority of laptops.
It is certainly true that it stomps all over the new MacBook Air in terms of performance. The iPad editing speed of a 3GB Photoshop file sure was impressive.

As an aside to Nebuchadnezzar, at these performance levels, it would be really neat if you extended the performance/power comparisons beyond the ARM cores and mobile GPUs, and included desktop or portable GPUs as well. Even though it's a can of worms, it would be neat to see what ballpark we are in. CPU comparisons can be done back-of-the-envelope already, but the GPU comparison is trickier. Just toss in a single x86 core and a single desktop GPU, and people have data points to extrapolate from to other products.
 
Well, damn. They promise 35% higher single thread performance and over 90% better multithread over its predecessor
Which predecessor? A10X or A12?


Nonetheless, it sounds like an impressive SoC. I wonder if it has the same 4*32bit channels. Using LPDDR4X 4266MT/s, they'd get almost 70GB/s total bandwidth. That's a lot more than a Geforce MX150 (GP108) GPU, and actually above the Xbone without the ESRAM.
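For reference, the "almost 70GB/s" figure follows from bus width times transfer rate. A quick sketch of that arithmetic, assuming the 4*32bit LPDDR4X-4266 configuration speculated above (not a confirmed spec); the function name is just for illustration.

```python
# Theoretical peak DRAM bandwidth = bytes per transfer * transfers per second.
# Inputs are the speculated configuration from the post, not confirmed specs.

def peak_bandwidth_gbs(channels, bits_per_channel, mega_transfers_per_s):
    """Peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8  # total bus width in bytes
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

a12x_guess = peak_bandwidth_gbs(channels=4, bits_per_channel=32,
                                mega_transfers_per_s=4266)
print(f"{a12x_guess:.1f} GB/s")
```

That works out to roughly 68GB/s of theoretical peak; sustained bandwidth would be lower.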
 
Which predecessor? A10X or A12?


Nonetheless, it sounds like an impressive SoC. I wonder if it has the same 4*32bit channels. Using LPDDR4X 4266MT/s, they'd get almost 70GB/s total bandwidth. That's a lot more than a Geforce MX150 (GP108) GPU, and actually above the Xbone without the ESRAM.
They compared it to the A10X. The iPad Pros still have an aluminium body, which helps with heat dissipation, as opposed to the iPhones.
 
According to Steve Troughton-Smith, the 2018 iPad Pro features 6 GB RAM for both the 11" and 12.9", but only for the highest end storage (1 TB) configuration. The other storage capacities continue to have 4 GB.

I think this is the first time an iOS device has split RAM sizes by storage capacity. I was hoping for more RAM in the new iPad Pro (6 or 8 GB and regardless of storage) given the higher-end feature set compared to previous iPad Pros and since the iPhone XS and XS Max moved up to 4 GB this year.
 
Well, damn. They promise 35% higher single thread performance and over 90% better multithread over its predecessor
Don't worry, based on the last half decade or so, next year's Intel CPU will be another ~5% quicker :yes:

Does any Intel/AMD chip running without a fan match this chip?
 