Apple A9X SoC

Discussion in 'Mobile Devices and SoCs' started by tangey, Nov 8, 2015.

  1. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    Back on the Intel vs Apple topic, I would think that Intel's designs are a lot more constrained than Apple's with respect to the variety and sheer number of requirements they must meet.
    The whole memory architecture has to scale from one core to... many, for example. I also think that accommodating the huge SIMD units Intel packs in takes its toll on the design. Then there is the benchmarking environment: Intel processors are tested against a bunch of different tasks and workloads.
    Intel is not designing its cores (and uncore) for one proprietary phone and tablet; they aim at a much broader market with a single scalable IP. It sounds like a completely different effort. I'm not sure it is the best way to do it, but that is what they do.
    The Atom line is interesting, but Intel does not really put its weight behind it; they iterate quite slowly compared to the mobile manufacturers (and Apple). Here again I suspect they are setting standards for themselves that may be a little out of place; those are not server chips (though Intel does build server chips out of Atom).
    Overall, an "issue" I see with Intel's approach, and with how their chips compare to Apple's or other mobile manufacturers', is something I could sum up with a GPU-world analogy: trying to ship a competitive IEEE-compliant FP32 GPU at a time when FP16 happens at a quarter the speed and any kind of IEEE compliance is a secondary concern. Overall it impacts time to market, costs, power efficiency, etc.
     
  2. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    The L3 being a victim cache in the A9 explains why they killed it in the A9X. With the increased pixel count of the display, the GPU would be constantly flushing/thrashing a 4MB L3: zero benefit for non-zero power/die area.

    I'd expect it back in next year's iteration with the LLC being bypassable by the GPU, similar to how Intel changed the memory semantics from Crystalwell -> SkyLake w. Iris Pro.

    Cheers
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I think I missed the documentation or discussion on how the GPU's use of the L3 was measured. Is it certain it cannot already bypass the LLC?
     
  4. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    430
    Location:
    Cleveland, OH
    No, but I think the L3 cache was mainly there for GPU performance. CPU benchmarks don't suggest that iPhone 6s is much better off with it.

    That's not to say that it couldn't still be used selectively for the GPU for some memory accesses only, maybe vertex data. But it could be a big SoC design and software shift to do this.

    I actually wonder if the L3 cache could even keep up with the 51.2GB/s memory bandwidth. On A9X it'd need to push a cache line every 3 CPU cycles or so, which is really fast for an LLC on a low-power SoC. If it's actually lower bandwidth than A9X's main memory, then that'd be a pretty good reason not to include it.
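    A quick sanity check of that figure (a sketch; the 64-byte cache line and ~2.26GHz CPU clock are my assumptions, not stated above):

```python
# Back-of-the-envelope: how often must an LLC deliver a cache line
# to keep up with A9X's 51.2 GB/s main-memory bandwidth?
bandwidth = 51.2e9   # bytes/s, A9X main memory
line_size = 64       # bytes per cache line (assumed)
cpu_hz = 2.26e9      # approximate A9X CPU clock (assumed)

lines_per_s = bandwidth / line_size
cycles_per_line = cpu_hz / lines_per_s
print(f"one cache line every {cycles_per_line:.1f} CPU cycles")  # ~2.8
```

    Which lands right around the "every 3 CPU cycles or so" figure above.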
     
    Gubbi and Grall like this.
  5. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    That's what I'm guessing too. It's for increasing effective bandwidth to the GPU and lowering power.

    To be effective, the L3 has to be able to cache the various buffers the GPU uses. The devices (tablets) where the A9X is to be deployed all have very high resolution displays, so the L3 would have to be excessively big to be of any use. Since these larger devices have relaxed power constraints, killing the L3 and saving a ton of die area makes sense.
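    To put a number on "excessively big" (a sketch; resolution is the 12.9" iPad Pro panel, and the 4 bytes per pixel is my assumption):

```python
# One 32-bit colour buffer at the iPad Pro's native resolution,
# versus a hypothetical A9-style 4 MB L3.
width, height = 2732, 2048   # 12.9" iPad Pro panel
bytes_per_pixel = 4          # RGBA8 (assumed)

framebuffer_mb = width * height * bytes_per_pixel / 1e6
l3_mb = 4
print(f"framebuffer ~{framebuffer_mb:.1f} MB vs {l3_mb} MB L3")
```

    A single colour buffer alone is over five times the size of the A9's L3, before counting depth or intermediate buffers.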

    Cheers
     
  6. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Well, the Anandtech article is up, and with it some SPEC2006 data.
    Once libquantum is excluded, the other subtests show performance similar to Intel's offerings in the segment, with the wide spread in individual results that can be expected when testing different architectures and compilers. Graphics performance of the A9X at the same power level is higher than Intel's counterparts, but that has been well documented elsewhere. That Apple increased the sampling rate of its pen from 120 to 240Hz was already known. It probably didn't happen without justification, so there is likely some relevance there for VR and similar uses.

    By now, I wonder where the A9x will show up next.
     
    In no small part because the A9X is almost 50% larger than Skylake's 2+2 and uses a much more expensive and higher-performing memory subsystem (128-bit LPDDR4 3200MT/s vs. 128-bit DDR3L 1600MT/s).

    Nonetheless, it's impressive how close Intel's alien technology is to being matched in their tablet form factors.
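    The bandwidth gap behind that comparison works out as follows (a sketch; peak bandwidth = bus width in bytes x transfer rate):

```python
# Peak theoretical bandwidth for the two 128-bit memory subsystems.
bus_bytes = 128 // 8                      # 16 bytes per transfer

lpddr4_gbs = bus_bytes * 3200e6 / 1e9     # A9X: LPDDR4-3200
ddr3l_gbs = bus_bytes * 1600e6 / 1e9      # Skylake 2+2: DDR3L-1600
print(lpddr4_gbs, ddr3l_gbs)              # 51.2 vs 25.6 GB/s
```

    So the A9X has twice the peak bandwidth to work with, at matching bus widths.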
     
  8. wco81

    Legend

    Joined:
    Mar 20, 2004
    Messages:
    6,920
    Likes Received:
    630
    Location:
    West Coast
    If they can get A9X performance into an iPad Air form factor and price this year, it'll be pretty good progress.
     
  9. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    797
    Likes Received:
    223
    Spreadsheet with SPECint2006 data from AnandTech and other places:
    https://docs.google.com/spreadsheet...fDWXMpn231WXvrlx5zokj8m0Q/edit#gid=1255253279

    For some reason I expected a higher overall per-clock improvement from Typhoon to Twister. The overall score of the A9X isn't far from the scores of the similar Core Ms in the comparison if one excludes libquantum, so I could see Apple catching up in a couple of generations.

    Also, how does LLVM compare to icc in the context of SPEC performance?
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    That would be interesting, although is it catching up with the Core M or catching up with a Core successor?
    A couple generations is a long time for the A9 and its successors to thread the needle of being performant at this target range, but not becoming threatening enough to prod Intel into a Core M architecture that eschews the high-end performance and scalability features that don't help in this range.
    There are some signs that Intel is no longer as committed to the one-Core philosophy, and a generous time frame like multiple generations is enough time to see if it strays.
     
  11. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Libquantum and, to a lesser extent, hmmer are broken by icc. For the purpose of comparing architectures, those two at least should have been excluded, or ideally LLVM should have been used for both architectures.
    As far as this benchmark run goes, the CPU cores in the iPad Pro and the MacBook actually perform equivalently.
     
  12. Laurent06

    Veteran

    Joined:
    Dec 14, 2007
    Messages:
    1,091
    Likes Received:
    489
    Note that AnandTech compiled SPEC targeting 32-bit code on x86. This makes the gcc and mcf scores significantly better than they should be. And I wonder why they didn't use llvm or gcc for x86; everybody knows Intel has spent a large amount of time and effort tuning icc for SPEC, which makes such comparisons dubious.
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    The takeaway I got from the AnandTech article is that the A9X is competitive with, but overall a little slower than, Intel's one-generation-old base Core M model, and quite a bit slower than Intel's one-generation-old top-end Core M model, all passively cooled.

    Yes, they are catching up, but that's more likely due to there being more low-hanging fruit at Apple's previous performance levels than at Intel's. It doesn't mean Apple's performance will continue to grow at the same rate and ultimately overtake Intel's.
     
  14. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    As an aside, I'm aware of the outlier that is libquantum w/ ICC and agree it shouldn't be admissible in a test focused on the hardware itself, but how generalizable are the optimizations Intel has made here? Do they detect the specific code signature of SPEC, or are they sufficiently general that you really might get better performance from real code?

    I think Intel's compiler is genuinely better than a lot of others; I've read about a lot of the neat tricks it's able to pull off, like reordering loops to hoist certain variables:

    http://stackoverflow.com/questions/...-a-sorted-array-faster-than-an-unsorted-array

    In the real world, as much as iOS is a part of the A9's advantage, I have to say ICC is a part of Intel's.
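    As a toy illustration of loop-invariant hoisting, the kind of transformation referred to above (a hand-written sketch, not actual ICC output; the function names are mine):

```python
# Loop-invariant hoisting, done by hand to mimic what an optimizing
# compiler does automatically: move a computation that never changes
# out of the loop body.
def dot_scaled_naive(xs, k):
    total = 0
    for x in xs:
        total += x * (k * k + 1)    # k*k + 1 recomputed every iteration
    return total

def dot_scaled_hoisted(xs, k):
    scale = k * k + 1               # computed once, outside the loop
    total = 0
    for x in xs:
        total += x * scale
    return total
```

    Both return identical results; the hoisted form just avoids redundant work, which is the point of this class of optimization.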
     
    #94 Raqia, Jan 24, 2016
    Last edited: Jan 24, 2016
  15. wco81

    Legend

    Joined:
    Mar 20, 2004
    Messages:
    6,920
    Likes Received:
    630
    Location:
    West Coast
    I don't think Apple necessarily needs better-than-Intel performance in the iPad Pro, but it does carry a high price, so I guess it needs to be able to run applications that other mobile devices can't.

    So the demo they gave was loading a huge AutoCAD model. But drawing with the Pencil is what more people will be drawn to. Do you need a lot of processing power for drawing apps with minimal input lag? Or is it the 4 GB of RAM?
     
  16. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    No, not really. Which is why I wrote that libquantum and hmmer should probably be excluded. Mcf is an outlier too, btw, when you compare icc to other compilers.
    Look, the SPEC suite has been analyzed by some pretty sharp people. It is well known that Intel does things in their compiler that some justifiably consider cheating, although it formally stays within the limits of the benchmark rules. You can find formal studies comparing the results of Intel's compiler with what is used for producing commercial code, and any number of discussion threads on the web.

    Everyone who is dry behind their benchmarking ears knows this, and if you're not, a five-minute Google search should be enough to explain why using Intel's compiler specifically (or, really, mixing different compilers and settings at all) is a lousy idea in a case like this. Unless, of course, you're trying to make a point rather than investigate one. That is what people like me and Laurent are saying: the choices made here are remarkable. We are actively avoiding asking the question of why such a procedure was chosen.
     
  17. Raqia

    Regular

    Joined:
    Oct 31, 2003
    Messages:
    508
    Likes Received:
    18
    No, it's a genuine question. A five-minute Google search shows icc can parallelize code like libquantum across multiple threads, and that it was a matter of some controversy that the same optimizations weren't applied for non-Intel CPUs. But it wasn't clear to me whether it should be considered a cheat, since that compiler seems to have a lot of tricks up its sleeve that work in many real situations and that other compilers don't have.
     
  18. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,018
    Likes Received:
    581
    Location:
    Taiwan
    It's a cheat because it basically works only for libquantum.
    ICC is a good compiler, but the problem with ICC+SPEC is that it has too many SPEC-specific optimisations (this is not just an Intel thing, though; everyone was doing it when SPEC was still very relevant). That's why people tend to look at "difficult" benchmarks such as gcc to get a better idea of how well these CPUs perform. It's not because gcc shares performance characteristics with more applications (no benchmark can claim that), but because for most people it's very unlikely that Intel would do an application-specific optimisation for them, so comparing with gcc is more likely to give meaningful results.
     
    Laurent06 likes this.
  19. Laurent06

    Veteran

    Joined:
    Dec 14, 2007
    Messages:
    1,091
    Likes Received:
    489
    I've seen ICC perform optimizations on some code that disappeared when switching to 64-bit, even though they would have made the code faster. It was for one of the tests in AnTuTu. I don't trust ICC for anything benchmark-related; I've never seen a speedup beyond 5% on my own code (which isn't vectorizable), and I've even experienced some slowdowns and various crashes.
     
  20. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    Let me reinforce what pcchen is saying here. It is pretty much spot on.

    You can use the SPEC suite for a lot of things. Comparisons within an ISA, between ISAs, between compilers - all have their uses and issues.
    If you are not comparing compilers (versions/options) explicitly, then you should a) keep the compiler and settings identical across your sample (if possible), and b) avoid using a compiler that specifically targets your benchmark, since that alone will invalidate the transferability of your results to the general case.

    You can note that, in spite of the A9X beating the MacBook in gcc, pcchen does not draw any wide-reaching conclusions about A9X vs. x86 performance, despite arguing that gcc may be the most reliably consistent subtest; rather, he notes the limitations of benchmarking. His remarks are doubly true when comparing across architectures. Not all of us have an agenda. I was genuinely interested to see whether SPEC2006 could show more about the relative strengths and weaknesses of these processors. It's a bit of a lost opportunity.
     
    #100 Entropy, Jan 25, 2016
    Last edited: Jan 25, 2016
    Laurent06 likes this.