Apple A9X SoC

2.25 / 1.85 ≈ 1.216
So roughly 22% over the A9 if no other architectural improvements are present. Which there may well be: the absence of gains anywhere other than the Geekbench memory scores is a bit suggestive.
It sure seems to pack a punch.
It appears to be a straightforward higher-clocked Twister. So there aren't any surprises that we've found other than the clockspeed.

Don't ask me, ask the anandtech guy, they are his thoughts :)
Beg your pardon? Apple clearly has a full license for PowerVR Series7XT. They can go build a design with as many clusters as they'd like (as long as it's an even number). I don't know if IMG was expecting anyone to build a 10 cluster design, but it's certainly an intentional option with the scalability of the architecture.
 
It appears to be a straightforward higher-clocked Twister. So there aren't any surprises that we've found other than the clockspeed.

Beg your pardon? Apple clearly has a full license for PowerVR Series7XT. They can go build a design with as many clusters as they'd like (as long as it's an even number). I don't know if IMG was expecting anyone to build a 10 cluster design, but it's certainly an intentional option with the scalability of the architecture.

You may beg all you want; my answer was merely reminding ailuros that the suggestion of a 10 cluster design came from the article, not from me. I was not suggesting the claim was right or wrong.

I imagine (!) Apple has the capability to build GPU configurations of whatever design they want. However, given that something in excess of 35% of IMG's income comes from Apple (and probably well over 60% of the GPU division's income), I think it is at least as likely that, whatever the configuration, IMG would be more than happy, and much better placed, to design custom GPU configurations for the customer responsible for their ongoing existence. Given that Apple has been IMG's lead customer for the last several iterations of their GPU designs, I assume each generation, at least at the high end, is designed first and foremost with Apple in mind, and possibly tailored to their exact needs. For example, I can't see them designing Series 8 in isolation from Apple's needs.
 
I'm surprised no one has written a Metal kernel that occupies a single USC for some reasonable amount of time.

Take that kernel and launch increasingly larger grids.

The runtimes should reveal how many USCs are onboard... unless there are limitations on when they're made available.

I'll wait here while one of you writes the microbenchmark. :rolleyes:
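In the meantime, the inference step of that microbenchmark can at least be sketched. This is a simplified timing model in Python rather than actual Metal code (the back-to-back "wave" scheduling assumption is mine, and real GPUs may overlap work less cleanly): if each workgroup of a deliberately long-running kernel pins one USC for a fixed time T, total runtime for n workgroups is roughly ceil(n / usc_count) × T, and the grid size at which runtime first steps up reveals the USC count.

```python
import math

def simulated_runtime(n_workgroups, usc_count, t_kernel=1.0):
    """Idealized model: waves of usc_count workgroups run back to back."""
    return math.ceil(n_workgroups / usc_count) * t_kernel

def infer_usc_count(runtime_fn, max_groups=64):
    """Find the largest grid size that still completes in a single wave."""
    base = runtime_fn(1)
    n = 1
    while n < max_groups and runtime_fn(n + 1) <= base:
        n += 1
    return n

# With a hypothetical 12-USC GPU, the step shows up at 12 workgroups.
gpu = lambda n: simulated_runtime(n, usc_count=12)
print(infer_usc_count(gpu))  # -> 12
```

In practice you'd replace `simulated_runtime` with wall-clock timings of real dispatches and look for the first doubling; noise and scheduler behaviour would blur the step, hence the caveat above.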
 
iFixit is tearing down the iPad Pro.

Step 16

At long last! We've found the brains of the operation, the logic board!
  • Apple APL1021 A9X 64-bit Processor
So far, an Apple SoC model number is of the form "APL1xxx" if and only if it is made on a TSMC process (list here). I'm thinking that this A9X is made by TSMC.
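That observed pattern can be written down as a tiny heuristic. To be clear, this is purely an illustration of the correlation noted above, not an official Apple numbering scheme:

```python
# Heuristic only: "APL1xxx" parts have so far been TSMC-fabbed,
# "APL0xxx" parts Samsung-fabbed (e.g. the APL0898 variant of the A9).
def likely_foundry(model_number: str) -> str:
    if model_number.startswith("APL1"):
        return "TSMC"
    if model_number.startswith("APL0"):
        return "Samsung"
    return "unknown"

print(likely_foundry("APL1021"))  # the A9X from the teardown -> TSMC
print(likely_foundry("APL0898"))  # Samsung-fabbed A9 -> Samsung
```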
 
How do you get to 6 clusters? Take a 7400 and glue a 7200 next to it? :p

Correct me if I'm wrong, but there are 7200, 7400 and 7600 configs announced in the IP portfolio, yet no 10 cluster config: it goes from 8 clusters (7800) straight to 16 (7900). I could see a customer like Apple theoretically asking for a 10 cluster config if they wanted one. But assuming the A8X GPU was semi-custom, as per Anandtech's own past speculation, with Apple simply mirroring the GX6450, and the trend has repeated itself in the A9X, then anything other than another mirror would only make sense if all parts of the GPU apart from the ALUs/TMUs have the same unit counts between a 7600 and a 7400 to reach 10 clusters. It shouldn't be a problem, but performance wouldn't scale exactly as someone might expect.

Is it 7200, 7400/7600, 7800 or is it 7200/7400, 7600/7800 when it comes to critical 'stuff' in the front and back end? :p
 
Beg your pardon? Apple clearly has a full license for PowerVR Series7XT. They can go build a design with as many clusters as they'd like (as long as it's an even number). I don't know if IMG was expecting anyone to build a 10 cluster design, but it's certainly an intentional option with the scalability of the architecture.

Arun most likely won't be able to answer my questions/thoughts above, but since they're scaling clusters and not cores anymore, not everything necessarily gets duplicated from config to config apart from the ALUs/TMUs. For the layman/average reader here, it's a wee bit more complicated than just sticking Lego bricks next to each other. Again my question remains: why not take a 7600 and simply mirror it, rather than making it more complicated and going for 6+4, if it truly is a semi-custom design and not an IMG config after all?
 
They have only caught them in certain crypto schemes, thanks to more hardware-accelerated crypto on the A9X. Not in general INT or FPU.

Regards,
SB
Oh fer chrissakes, not that tired SHA1 argument again, when all you have to do is look a few lines up and see exactly the same magnitude advantage for x86 in AES. Link
The A9x is simply a fast processor compared to low power x86. CPU, GPU, cache hierarchy, main memory subsystem - it all looks good vs. Core M.
And the tests so far show little sign of throttling, despite running fanless in a tablet.
That Apple does this on a merchant lithographic process, vs. Intel's cutting-edge targeted combination of process/tools/product, could be seen as remarkable, but there it is.
 
ARM chips can skip the extra microcode breakdown and special case handling BS that crappy ole x86 instruction set needs to run on modern CPU cores. Does anyone have a notion of how much power is actually sunk into that stuff, roughly?
 
ARM chips can skip the extra microcode breakdown and special case handling BS that crappy ole x86 instruction set needs to run on modern CPU cores. Does anyone have a notion of how much power is actually sunk into that stuff, roughly?

I don't have numbers, but it matters a lot less than many people think, especially when you are comparing between two deep out-of-order cores (which both Core M and A9 are).
 
Perhaps. It's still a vestigial chain dragging x86 down, even if it might not be a big one, or even the biggest.

In other, non-A9X-related news, when looking at iFixit's teardown, there turns out to be no connection between the speaker drivers of the Pro-pad and its associated, much-vaunted resonance chambers. Shenanigans?
 
I'm certainly impressed with the GPU performance. It looks to be even faster than Iris Pro 6200 (at least in this power constrained environment). I wonder if the Skylake Iris Pro will be able to change that. On paper it should, given the much larger number of EUs, but if power constraints are what's holding back the 6200, then more EUs may not make much of a difference.

That's not to say I'm unimpressed with the CPU performance either. It's still a fair bit slower than the 15w Skylake in the multithreaded tests (albeit very close in the single threaded tests) but is clearly a lot faster than the 4.5w Core M. I guess the big question is what is the TDP of the A9X.

Anyone know?
 
Perhaps. It's still a vestigial chain dragging x86 down, even if it might not be a big one, or even the biggest.

The only place x86 adds significant complexity is in the decode stage. Intel and AMD mitigate this in various ways: Intel caches decoded uOps (1.5K of them), AMD marks instruction boundaries.

Partial register updates and lots of condition code updates add a bit of complexity. On the other hand, instructions with memory operands allow for larger effective ROB capacity (because each ROB/instruction entry holds two ops).

Cheers
 
I guess the big question is what is the TDP of the A9X.
It can't be an awful lot; from what I can tell the chip is entirely passively cooled. It's not even facing the outer aluminium casing... It's sandwiched between the display and the system PCB. :p
 
I'm certainly impressed with the GPU performance. It looks to be even faster than Iris Pro 6200 (at least in this power constrained environment). I wonder if the Skylake Iris Pro will be able to change that. On paper it should, given the much larger number of EUs, but if power constraints are what's holding back the 6200, then more EUs may not make much of a difference.

That's not to say I'm unimpressed with the CPU performance either. It's still a fair bit slower than the 15w Skylake in the multithreaded tests (albeit very close in the single threaded tests) but is clearly a lot faster than the 4.5w Core M. I guess the big question is what is the TDP of the A9X.

Anyone know?
I don't think Apple lists any kind of TDP for their parts. And if they did, their methods would probably differ from Intel's. Furthermore, throttling behaviour, thermal environment (Grall indicated a source for the iPad Pro above), and so on would further muddy the waters.
The SHA1 vs AES performance data also neatly demonstrates the difficulties in making high precision comparisons between architectures.
The best comparisons of the A9x vs. best in class x86 would probably be between the iPad Pro and the new retina MacBook. Same manufacturer, similar OS underpinnings, similar browser for those browser benchmarks, similar compilers and so on. Still won't give better than a ballpark idea, but I can't see a better option.
 
The benchmark results raise my confidence in some of my initial guesses for the processor configuration of the A9X. The really big clock speed bump came to pass for the CPU at least, so I figure the configuration of the GPU looks something like a 750 MHz 10 cluster GT7800+ paired with that 2.25 GHz dual core Twister.

I really doubt twelve clusters would be needed. I'm not even sure ten clusters are needed to make those benchmark results, but I don't know why it would be referred to as a GT7800+ otherwise.
 
The benchmark results raise my confidence in some of my initial guesses for the processor configuration of the A9X. The really big clock speed bump came to pass for the CPU at least, so I figure the configuration of the GPU looks something like a 750 MHz 10 cluster GT7800+ paired with that 2.25 GHz dual core Twister.

I really doubt twelve clusters would be needed. I'm not even sure ten clusters are needed to make those benchmark results, but I don't know why it would be referred to as a GT7800+ otherwise.

10C@750MHz is wayyyyy too generous as a speculation, since it gets you to 480GFLOPs FP32, while you actually "just" need 360GFLOPs FP32 to meet Apple's own numbers: https://forum.beyond3d.com/posts/1871384/

10C * 64 OPs/C = 640 OPs/clock * 0.563 GHz = 360 GFLOPs FP32 or 720 GFLOPs FP16
or
12C * 64 OPs/C = 768 OPs/clock * 0.469 GHz = 360 GFLOPs FP32 or 720 GFLOPs FP16
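The cluster/clock arithmetic above is easy to check, assuming (as the post does) 64 FP32 FLOPs per cluster per clock, doubled at FP16:

```python
# Per-cluster throughput figure taken from the post above: 64 FP32 FLOPs/clock.
def gflops_fp32(clusters, ghz):
    return clusters * 64 * ghz

print(round(gflops_fp32(10, 0.750)))  # -> 480, the "generous" 10C @ 750 MHz guess
print(round(gflops_fp32(10, 0.563)))  # -> 360, matching Apple's quoted figure
print(round(gflops_fp32(12, 0.469)))  # -> 360, the 12-cluster alternative
```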
 
The Retina MacBook 12" and the iPad Pro are an interesting pair of hardware to compare.

There are lots of similarities:
- Passively cooled aluminum body premium products (with similar dimensions and similar weight)
- Similar battery: 39.7 Wh (Mac) vs 38.5 Wh (iPad)
- Similar screen size: 12" (Mac) vs 12.9" (iPad)
- Similar screen resolution: 2304x1440 (Mac) vs 2732x2048 (iPad)
- Similar reported battery life (9-10 hours, depending on activity)
- Dual core CPUs

As the reported battery life and battery size are both very close, average system TDP must be pretty close as well. The Mac's Intel Broadwell CPU has a lower base clock (1.3 GHz) but a higher turbo clock (2.9 GHz). The iPad Pro's Twister CPU is reported to run at 2.25 GHz (no info on dynamic clocking is available). As Xcode seamlessly supports both ARM and x86, I am sure Apple has already compiled and benchmarked lots of code on both CPUs. It would be nice to see a more comprehensive benchmark comparison between these two similar systems, to get a refresh on the (high end, low power) x86 vs ARM performance situation.

I would also love to see more GPU benchmarks between these two systems. The iPad Pro would likely be the winner, since it has 2x the memory bandwidth. Intel has already announced eDRAM-equipped chips for low power dual cores, though, allowing them to catch up rapidly.
 
Although not perfect (what is?), I feel that JavaScript benchmarks between the 2015 Core M MacBook and the iPad Pro would be very telling for a workload that is routinely used on both devices, i.e. browsing. Assuming the OS X team targets performance optimisations in Safari in a similar vein to the iOS team, there are fewer variables than between Android Chrome and Safari.

Does anybody have a MacBook 12" 2015 running El Capitan to test Kraken and Octane V2? If not, I have a meeting near an Apple Store tomorrow, so I'll try to run the aforementioned on the Pro and the MacBook 2015.
 