Apple A9X SoC

Discussion in 'Mobile Devices and SoCs' started by tangey, Nov 8, 2015.

  1. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,476
    Likes Received:
    224
    Location:
    0x5FF6BC
    #21 tangey, Nov 11, 2015
    Last edited: Nov 11, 2015
  2. Ryan Smith

    Regular

    Joined:
    Mar 26, 2010
    Messages:
    623
    Likes Received:
    1,095
    Location:
    PCIe x16_1
    It appears to be a straightforward higher-clocked Twister. So there aren't any surprises that we've found other than the clock speed.

    Beg your pardon? Apple clearly has a full license for PowerVR Series7XT. They can go build a design with as many clusters as they'd like (as long as it's an even number). I don't know if IMG was expecting anyone to build a 10 cluster design, but it's certainly an intentional option with the scalability of the architecture.
     
  3. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,875
    Likes Received:
    6,042
    They have only caught them in certain crypto schemes due to more hardware accelerated crypto on the A9X. Not in general INT or FPU.

    Regards,
    SB
     
  4. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,476
    Likes Received:
    224
    Location:
    0x5FF6BC
    You may beg all you want; my answer was merely reminding Ailuros that the possibility of a 10 cluster design came from the article, not from me. I was not suggesting the claim was right or wrong.

    I imagine (!) Apple has the capability to build GPU configurations of whatever design they want. However, given that more than 35% of IMG's income comes from Apple (and probably well over 60% of the GPU division's income), I think it is at least as likely that, whatever the configuration, IMG would be more than happy, and much better placed, to design custom GPU configurations for the customer that is responsible for their ongoing existence. Given that Apple has been IMG's lead customer for the last several iterations of their GPU designs, I assume each generation, at least at the high end, is designed first and foremost with Apple in mind, and possibly tailored to their exact needs. For example, I can't see them designing Series 8 in isolation from Apple's needs.
     
    #24 tangey, Nov 11, 2015
    Last edited: Nov 11, 2015
  5. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    I'm surprised no one has written a Metal kernel that occupies a single USC for some reasonable amount of time.

    Take that kernel and launch increasingly larger grids.

    The runtimes should reveal how many USCs are onboard... unless there are limitations on when they're made available.

    I'll wait here while one of you writes the microbenchmark. :rolleyes:
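
    For what it's worth, here is a rough sketch of the analysis side of that idea, not the Metal kernel itself. It assumes an idealized scheduler in which each workgroup occupies exactly one USC and workgroups run in full waves; the timings are synthetic, and `HYPOTHETICAL_USC`, `modeled_runtime` and `infer_usc_count` are made-up names for illustration. Real hardware may well not behave this cleanly.

```python
import math

def modeled_runtime(num_groups, num_usc, t_kernel=1.0):
    """Idealized runtime: workgroups execute in waves of `num_usc` at a time."""
    return math.ceil(num_groups / num_usc) * t_kernel

def infer_usc_count(runtimes, step_factor=1.5):
    """runtimes[i] is the time for a grid of i + 1 workgroups.

    Returns the largest grid size that still finished in a single wave,
    or None if no step up in runtime was observed.
    """
    baseline = runtimes[0]
    for i, t in enumerate(runtimes):
        if t > baseline * step_factor:
            # Grid size i + 1 was the first to spill into a second wave,
            # so i workgroups fit in one wave.
            return i
    return None

# Synthetic data only: pretend the GPU has 12 USCs (an assumption, not a measurement).
HYPOTHETICAL_USC = 12
times = [modeled_runtime(n, HYPOTHETICAL_USC) for n in range(1, 33)]
inferred = infer_usc_count(times)
print("inferred USC count:", inferred)
```

    With measured timings in place of `times`, the same step detection would apply, provided the scheduler really does hand out one workgroup per USC.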
     
  6. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    778
    Likes Received:
    206
    iFixit is tearing down the iPad Pro.

    So far, an Apple SoC model number is of the form "APL1xxx" if and only if it is made on a TSMC process (list here). I'm thinking that this A9X is made by TSMC.
     
    #26 iMacmatician, Nov 12, 2015
    Last edited: Nov 12, 2015
  7. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,429
    Likes Received:
    181
    Location:
    Chania
    Correct me if I'm wrong, but the announced IP portfolio includes a 7200 and a 7600 but no 10 cluster config, since it goes from 8 clusters (7800) straight to 16 (7900). I can see a customer like Apple theoretically asking for a 10 cluster config if they wanted one. But assuming the A8X GPU was semi-custom, as per AnandTech's own past speculation, with Apple simply mirroring the GX6450, and assuming the trend has repeated itself in the A9X, then anything other than another mirror sounds like it would only make sense if all other parts of the GPU apart from ALUs/TMUs have the same unit counts between a 7600 and a 7400 to reach 10 clusters. It shouldn't be a problem, but performance wouldn't scale exactly as one might expect.

    Is it 7200, 7400/7600, 7800 or is it 7200/7400, 7600/7800 when it comes to critical 'stuff' in the front and back end? :p
     
  8. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,429
    Likes Received:
    181
    Location:
    Chania
    Arun most likely won't be able to answer my questions/thoughts above, but since they're scaling clusters and not cores anymore, not everything apart from ALUs/TMUs necessarily gets duplicated from config to config. For the layman/average reader here, it's a wee bit more complicated than just sticking Lego bricks next to each other. Again, my question remains: why not take a 7600 and simply mirror it, rather than making it more complicated and going for 6+4, if it truly is a semi-custom design and not an IMG config after all?
     
    #28 Ailuros, Nov 12, 2015
    Last edited: Nov 12, 2015
  9. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,137
    Likes Received:
    1,111
    Oh fer chrissakes, not that tired SHA1 argument again, when all you have to do is look a few lines up and see exactly the same magnitude advantage for x86 in AES. Link
    The A9X is simply a fast processor compared to low power x86. CPU, GPU, cache hierarchy, main memory subsystem - it all looks good vs. Core M.
    And the tests so far show little sign of throttling, despite running fanless in a tablet.
    That Apple does this on a merchant lithographic process, vs. Intel's cutting edge targeted combination of process/tools/product, could be seen as remarkable, but there it is.
     
    Grall likes this.
  10. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,174
    Location:
    La-la land
    ARM chips can skip the extra microcode breakdown and special case handling BS that the crappy ole x86 instruction set needs to run on modern CPU cores. Does anyone have a notion of roughly how much power is actually sunk into that stuff?
     
  11. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,794
    Likes Received:
    187
    Location:
    Taiwan
    I don't have numbers, but it matters a lot less than many people think, especially when you are comparing between two deep out-of-order cores (which both Core M and A9 are).
     
  12. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,174
    Location:
    La-la land
    Perhaps. It's still a vestigial chain dragging x86 down, even if it might not be a big one, or even the biggest.

    In other, non-A9X-related news, when looking at iFixit's teardown, there turns out to be no connection between the speaker drivers of the Pro-pad and its associated, much-vaunted resonance chambers. Shenanigans?
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,614
    Likes Received:
    771
    Location:
    Guess...
    I'm certainly impressed with the GPU performance. Looks to be even faster than Iris Pro 6200 (at least in this power constrained environment). I wonder if the Skylake Iris Pro will be able to change that. On paper it should be able to, given the much larger number of EUs, but if it's power constraints that are holding back the 6200 then more EUs may not make much of a difference.

    That's not to say I'm unimpressed with the CPU performance either. It's still a fair bit slower than the 15 W Skylake in the multithreaded tests (albeit very close in the single threaded tests) but is clearly a lot faster than the 4.5 W Core M. I guess the big question is what the TDP of the A9X is.

    Anyone know?
     
  14. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,572
    Likes Received:
    965
    The only place x86 adds significant complexity is in the decode stage. Intel and AMD mitigate this in various ways: Intel caches decoded uops (1.5K of them), AMD marks instruction boundaries.

    Partial register updates and lots of condition code updates add a bit of complexity. On the other hand, instructions with memory operands allow for larger effective ROB capacity (because each ROB/instruction entry holds two ops).

    Cheers
     
  15. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,174
    Location:
    La-la land
    It can't be an awful lot; from what I can tell the chip is entirely passively cooled. It's not even facing the outer aluminium casing... It's sandwiched between the display and the system PCB. :p
     
  16. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,137
    Likes Received:
    1,111
    I don't think Apple lists any kind of TDP for their parts. And if they did, their methods would probably differ from Intel's. Furthermore, throttling behaviour, thermal environment (Grall indicated a source for the iPad Pro above) and so on would further muddy the waters.
    The SHA1 vs AES performance data also neatly demonstrates the difficulties in making high precision comparisons between architectures.
    The best comparison of the A9X vs. best in class x86 would probably be between the iPad Pro and the new retina MacBook. Same manufacturer, similar OS underpinnings, similar browser for those browser benchmarks, similar compilers and so on. Still won't give better than a ballpark idea, but I can't see a better option.
     
    #36 Entropy, Nov 13, 2015
    Last edited: Nov 13, 2015
  17. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    18
    The benchmark results raise my confidence in some of my initial guesses for the processor configuration of the A9X. The really big clock speed bump came to pass for the CPU at least, so I figure the configuration of the GPU looks something like a 750 MHz 10 cluster GT7800+ paired with that 2.25 GHz dual core Twister.

    I really doubt twelve clusters would be needed. I'm not even sure ten clusters are needed to make those benchmark results, but I don't know why it would be referred to as a GT7800+ otherwise.
     
  18. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,429
    Likes Received:
    181
    Location:
    Chania
    10C@750MHz is wayyyyy too generous as a speculation, since it gets you to 480GFLOPs FP32, while you actually "just" need 360GFLOPs FP32 to meet Apple's own numbers: https://forum.beyond3d.com/posts/1871384/

    10C * 64 OPs/C = 640 OPs/clock * 0.563 GHz = 360 GFLOPs FP32 or 720 GFLOPs FP16
    or
    12C * 64 OPs/C = 768 OPs/clock * 0.469 GHz = 360 GFLOPs FP32 or 720 GFLOPs FP16
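
    The same arithmetic as a small sketch, in case anyone wants to plug in other cluster counts or clocks. The 64 FP32 operations per cluster per clock and the 360 GFLOPs target are the figures used in this post; the function names are just illustrative.

```python
# FP32 operations per cluster per clock, as used in the post above.
FLOPS_PER_CLUSTER_PER_CLOCK = 64

def fp32_gflops(clusters, clock_ghz):
    """Peak FP32 GFLOPs for a given cluster count and clock in GHz."""
    return clusters * FLOPS_PER_CLUSTER_PER_CLOCK * clock_ghz

def clock_for_target(clusters, target_gflops):
    """Clock in GHz needed to reach target_gflops with the given cluster count."""
    return target_gflops / (clusters * FLOPS_PER_CLUSTER_PER_CLOCK)

TARGET_GFLOPS = 360.0  # FP32 figure quoted in the post

for clusters in (10, 12):
    ghz = clock_for_target(clusters, TARGET_GFLOPS)
    print(f"{clusters} clusters need about {ghz * 1000:.1f} MHz for {TARGET_GFLOPS:.0f} GFLOPs FP32")

# The "too generous" case: 10 clusters at 750 MHz.
print(f"10 clusters @ 750 MHz: {fp32_gflops(10, 0.750):.0f} GFLOPs FP32")
```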
     
    tangey likes this.
  19. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Retina Macbook 12" vs iPad Pro is an interesting pair of hardware to compare.

    There are lots of similarities:
    - Passively cooled aluminum body premium products (with similar dimensions and similar weight)
    - Similar battery: 39.7 Wh (Mac) vs 38.5 Wh (iPad)
    - Similar screen size: 12" (Mac) vs 12.9" (iPad)
    - Similar screen resolution: 2304x1440 (Mac) vs 2732x2048 (iPad)
    - Similar reported battery life (9-10 hours, depending on activity)
    - Dual core CPUs

    As the reported battery life and the battery size are both nearly identical, average system power draw must be pretty close as well. Mac's Intel Broadwell CPU has lower base clock (1.3 GHz), but higher turbo clock (2.9 GHz). iPad Pro's Twister CPU is reported to be running at 2.25 GHz (no info on dynamic clocking is available). As Xcode seamlessly supports both ARM and x86, I am sure Apple has already compiled and benchmarked lots of code on both CPUs. It would be nice to see more comprehensive benchmark comparison between these two similar systems, to get a refresh on the (high end low power) x86 vs ARM performance situation.

    I would also love to see more GPU benchmarks between these two systems. iPad Pro would likely be the winner, since it has 2x the memory bandwidth. Intel has already announced eDRAM equipped chips for low power dual cores, allowing them to catch up rapidly.
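
    A back-of-the-envelope sketch of the battery argument above, using only the capacities and the 9-10 hour range listed in this post. Note this gives whole-device average power (display, radios and everything else included), not SoC TDP; the function name is illustrative.

```python
def average_power_watts(battery_wh, runtime_hours):
    """Average whole-device power draw implied by draining the battery over runtime_hours."""
    return battery_wh / runtime_hours

# Battery capacities (Wh) as listed in the post.
devices = {"MacBook 12 inch": 39.7, "iPad Pro": 38.5}

for name, capacity_wh in devices.items():
    low = average_power_watts(capacity_wh, 10.0)   # 10 hours of reported battery life
    high = average_power_watts(capacity_wh, 9.0)   # 9 hours of reported battery life
    print(f"{name}: roughly {low:.2f} to {high:.2f} W average")
```

    Both devices land in about the same range, which is all the comparison needs; it says nothing about how that budget is split between the SoC and the display.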
     
    homerdog and RecessionCone like this.
  20. Turbotab

    Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    214
    Likes Received:
    3
    Although not perfect (what is?), I feel that JavaScript benchmarks between the Core M MacBook 2015 and the iPad Pro will be very telling for a workload that is routinely used on both devices, i.e. browsing. Assuming that the OS X team are targeting performance optimisations in Safari in a similar vein to the iOS team, there are fewer variables than between Android Chrome and Safari.

    Does anybody have a MacBook 12" 2015 running El Capitan to test Kraken and Octane v2? If not, I have a meeting near an Apple Store tomorrow, so I'll try to run both on the Pro and the MacBook 2015.
     