Apple A8 and A8X

Discussion in 'Mobile Devices and SoCs' started by ltcommander.data, Sep 9, 2014.

  1. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Chipworks has annotated their die shot. Their conclusion is a quad-core GPU (GX6450), which I agree with. The SRAM seems roughly the same size, so if they could double the density, that'd give us 8MB.

    For the CPU core, L2 cache appears to be independent now, from what I can tell. CPU block shrank pretty considerably overall, suggesting it is just tweaks to the Cyclone core.

    [IMG]
     
  2. wco81

    Legend

    Joined:
    Mar 20, 2004
    Messages:
    6,920
    Likes Received:
    630
    Location:
    West Coast
    What about the Secure Element, the enclave where they store the Apple Pay data? Could that be integrated into the A8?
     
  3. Nebuchadnezzar

    Legend

    Joined:
    Feb 10, 2002
    Messages:
    1,061
    Likes Received:
    328
    Location:
    Luxembourg
    A TrustZone controller with some kind of memory for it is minuscule.
    Not everything scales perfectly with die shrinks. Also how do you know they don't have a PCIe controller?
     
  4. ltcommander.data

    Regular

    Joined:
    Apr 4, 2010
    Messages:
    616
    Likes Received:
    15
    http://recode.net/2014/09/23/teardown-shows-apples-iphone-6-cost-at-least-200-to-build/

    IHS is claiming Samsung is still manufacturing 40% of the A8. Does it take a lot of effort to bring up a chip on a different process? If this is true, it'd be interesting for Chipworks to do a Samsung vs TSMC A8 die comparison, and maybe Anandtech could do a performance and power consumption comparison.
     
  5. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    It would take a lot of effort. Different libraries for the processes, different design rules, etc., unless they somehow consolidated them into one rule subset, potentially taking the lowest common denominator performance-wise. The biggest issue is doing full custom designs and having two different libraries and validation rules to work against.

    Anandtech says that the SRAM is still just 4MB (SRAM cell didn't shrink that much), but they do say L2 looks independent. http://www.anandtech.com/show/8562/chipworks-a8
     
    #185 anexanhume, Sep 23, 2014
    Last edited by a moderator: Sep 23, 2014
  6. loekf

    Regular

    Joined:
    Jun 29, 2003
    Messages:
    617
    Likes Received:
    65
    Location:
    Nijmegen, The Netherlands
    I think it's BS to use two foundries for the same IC. Dual fab could make sense, but AFAIK TSMC also has multiple 20/28 nm fabs in Taiwan. I don't see TSMC handing over its process technology to Samsung just to give them 40% of the volume. Designing (layout, timing checks, DRC, qualification) two ICs in two different processes doesn't make sense to me, and it proves analysts have absolutely no clue what it takes to launch a new phone/tablet each year with a new SoC, which probably takes 2 years to design and qualify (18 months is usually what it takes).

    What still puzzles me: why didn't they go for 2 GB of stacked DRAM instead of the somewhat tight 1 GB? I lost track of the DRAM die sizes and process choices, but maybe Elpida (?) couldn't fit 1 GB into a single die, and two stacked DRAM dies was a bridge too far for Apple or impossible due to I/O placement etc.
     
  7. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Agree. I won't believe it until Chipworks gets one of these devices and proves it.
     
  8. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
    I hope they will use whatever code they used to assess the caches for the A7 to see if there are other measurable improvements to the cache hierarchy.
    The die area devoted to what appears to be "other stuff" is remarkable. It's as much as the assumed CPU, GPU, cache and memory interface put together. That is one heck of a lot of unaccounted-for gates, and they are hardly sitting around doing nothing. What is being missed here?
     
  9. tangey

    Veteran

    Joined:
    Jul 28, 2006
    Messages:
    1,537
    Likes Received:
    282
    Location:
    0x5FF6BC
    Ask Nebuchadnezzar :)

    http://forum.beyond3d.com/showpost.php?p=1876094&postcount=177
     
    #189 tangey, Sep 23, 2014
    Last edited by a moderator: Sep 23, 2014
  10. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    How is that possible? Are they reading the same registers and the compiler can somehow optimise that? It should definitely be possible to write such a test that always reports exactly 1x clock rate, although making it truly compiler-independent may not be easy.
     
  11. Michael

    Newcomer

    Joined:
    Sep 10, 2014
    Messages:
    10
    Likes Received:
    7
    Hey guys,

    Now that Anandtech has confirmed that Apple decided to go with the GX6450 rather than the GX6650, can anyone speculate on why they might have made that decision? (I'm trying to teach myself more of the technical details behind it, but it's a lot to take in. Please forgive my lack of expertise here.)


    I find it odd that Apple, which has traditionally pushed its GPU technology to the limit by using the top-of-the-line parts available in its previous mobile chips (A5, A6, A7), stuck with a 4 core GPU here, especially considering how many more pixels these phones, and the 6 Plus in particular, are required to handle. I've downloaded the Epic Zen Garden demo, for example, and have experienced less than perfect performance on my 6 Plus, even though the demo uses the Metal API and is optimized for the 6 and 6 Plus.
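
    To put rough numbers on "how many more pixels", here is a small sketch in C. The resolutions in the comments are assumptions taken from the publicly listed display specs rather than from this thread: 1136x640 for the iPhone 5s, 1334x750 for the iPhone 6, a 1920x1080 panel on the 6 Plus, and a 2208x1242 internal render target on the 6 Plus before downscaling. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Total pixels in a width x height framebuffer. */
uint32_t pixel_count(uint32_t width, uint32_t height)
{
    return width * height;
}

/* How many times more pixels `target` has than `baseline`. */
double pixel_ratio(uint32_t target, uint32_t baseline)
{
    return (double)target / (double)baseline;
}

/* Assumed resolutions (public display specs, not from the thread):
 *   iPhone 5s:            1136 x  640
 *   iPhone 6:             1334 x  750
 *   iPhone 6 Plus panel:  1920 x 1080
 *   iPhone 6 Plus render: 2208 x 1242 (downscaled to the panel)
 */
```

    With those assumed figures, `pixel_ratio(pixel_count(1334, 750), pixel_count(1136, 640))` comes out at roughly 1.4, the 6 Plus panel at roughly 2.9, and the 6 Plus render target at roughly 3.8 times the 5s pixel count.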


    Kindly,

    Michael
     
  12. ams

    ams
    Regular

    Joined:
    Jul 14, 2012
    Messages:
    914
    Likes Received:
    0
    Well, this is the first time that Apple has used a 20nm TSMC fab process, so they had to be at least somewhat conservative here. The iPhone also has a tighter window to market than the iPad. And Snapdragon 805 is at best equal to A8 in GPU performance (without regard to thermal throttling) while being behind in single threaded CPU and browser performance.
     
  13. Michael

    Newcomer

    Joined:
    Sep 10, 2014
    Messages:
    10
    Likes Received:
    7
    Very good point on the use of a new fab and a brand new process; that's something I didn't even think about. That certainly makes sense, as I'm sure there are a lot of unknowns in the long run for something like that.

    The A8 absolutely appears to be a phenomenal SoC, if not the best on the market. But it strikes me as odd that Apple didn't choose to use a more capable GPU, and simply lower the clock speed when that power isn't needed.
     
  14. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    More L3 bw then?
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    My guess would be they ran out of mem bw to feed the 6 cluster GPU. The additional area is not much and it's not like Apple can't handle a bit higher BoM.
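
    A back-of-the-envelope sketch of the bandwidth argument, in C. The memory configuration used in the comments (a 64-bit LPDDR3 interface at 1600 MT/s) is an assumption and is not stated anywhere in this thread; the function names are hypothetical. It only divides peak bandwidth evenly across GPU clusters and ignores the CPU, display and other clients that share the same bus.

```c
/* Peak DRAM bandwidth in GB/s for a bus `bus_bits` wide
 * running at `mega_transfers` million transfers per second. */
double peak_bandwidth_gbs(unsigned bus_bits, double mega_transfers)
{
    return (bus_bits / 8.0) * mega_transfers / 1000.0;
}

/* Naive even split of that peak across `clusters` GPU clusters. */
double per_cluster_gbs(double total_gbs, unsigned clusters)
{
    return total_gbs / clusters;
}

/* Assumed configuration (not from the thread): 64-bit bus, 1600 MT/s.
 *   peak_bandwidth_gbs(64, 1600.0)  -> about 12.8 GB/s
 *   per_cluster_gbs(12.8, 4)        -> about 3.2 GB/s per cluster
 *   per_cluster_gbs(12.8, 6)        -> about 2.1 GB/s per cluster
 */
```

    Under that assumption, going from 4 to 6 clusters on the same memory interface would cut the peak bandwidth available per cluster by a third, which is the kind of shortfall the guess above is pointing at.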
     
  16. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    The GPU is a validated portable synthesizable IP. How is a 4 cluster part more conservative than a 6 cluster one?
     
  17. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    Maybe he meant that a 6 core is larger than 4 core? But it would be a relatively marginal increase anyway.
     
  18. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,018
    Likes Received:
    582
    Location:
    Taiwan
    My guess is that x86 CPUs do optimise for dec and inc. Since dec and inc are both only 1 byte long, some older code uses two inc instructions to add 2 (a normal add-by-2 instruction is 3 bytes long in 16 bits, and 5 bytes long in 32 bits). That's probably why some CPUs are able to do 2x clock rate in BogoMIPS.

    A better way is to make something which is almost impossible for a compiler to optimise. The simplest one is computing the Fibonacci series, i.e.

    b += a
    a += b

    I used this in my "A7 is 6-wide" test, and I looked at the compiled assembly code; it's basically the same as the C code, with some instruction reordering.
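
    A minimal sketch of what such a loop could look like in C. This is not the code from the "A7 is 6-wide" test, which is not shown in this thread; the function names and the timing helper are hypothetical. Unsigned 32-bit arithmetic is used so that overflow wraps in a well-defined way. Reading the result as a clock rate rests on the assumption that the core retires one dependent add per cycle.

```c
#include <stdint.h>
#include <time.h>

/* Runs `pairs` iterations of the two-step Fibonacci update.
 * Each add needs the result of the previous one, so the chain
 * is serial: 2 * pairs dependent adds in total. */
uint32_t fib_chain(uint64_t pairs)
{
    uint32_t a = 0, b = 1;
    for (uint64_t i = 0; i < pairs; i++) {
        b += a;
        a += b;
    }
    return a;
}

/* Hypothetical helper: dependent adds per second of CPU time.
 * Returns 0.0 if the run was too short for clock() to resolve. */
double dependent_adds_per_second(uint64_t pairs)
{
    clock_t start = clock();
    volatile uint32_t sink = fib_chain(pairs); /* keep the result live */
    clock_t end = clock();
    (void)sink;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds > 0.0 ? (2.0 * (double)pairs) / seconds : 0.0;
}
```

    Calling `dependent_adds_per_second` with a large count (say, a few hundred million pairs) gives a figure that can be compared with the nominal clock. It is still worth checking the generated assembly, as described above, since loop overhead and compiler choices can shift the number.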
     
  19. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,360
    Likes Received:
    1,377
  20. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    I'd be very interested to see a head-to-head test between the Adreno420 and the GX6450 from an independent source in GPGPU stuff.

    Other than that, Apple has a tendency to employ the biggest possible IMG GPU IP long before others; I'd be very surprised if Apple won't also be the first (and probably the only one) to use the GX6650. The question now is whether it's going to be under 20SoC or 16FinFET (H1 '15).
     