Apple A8 and A8X

Chipworks has annotated their die shot. Their conclusion is a quad-core GPU (GX6450), which I agree with. The SRAM block looks roughly the same size, so if they managed to double the density, that would give us 8MB.

For the CPU, the L2 cache appears to be independent now, from what I can tell. The CPU block shrank considerably overall, which suggests the cores are just tweaks to Cyclone.

[Image: Chipworks annotated A8 die shot]
 
What about the Secure Element, the enclave where they store the Apple Pay data? Could that be integrated into the A8?
 
A TrustZone controller with some kind of memory for it would be minuscule.
I'm sure there are many things you missed in that list, but my indirect point is that the non-GPU and non-CPU stuff is taking up an increasingly large proportion of the die.

Most of what you mentioned (obviously not the PCIe controller) would be present in some form in the A6X, for example. And yet the GPU and CPU took up about 60% of that die.
Not everything scales perfectly with die shrinks. Also, how do you know they don't have a PCIe controller?
 
http://recode.net/2014/09/23/teardown-shows-apples-iphone-6-cost-at-least-200-to-build/

The iPhone's main processor is the A8, designed by Apple. Rassweiler said the processor he saw during the teardown was manufactured by Taiwan Semiconductor Manufacturing Co., the massive chip-factory-for-hire based in Taipei. TSMC, he said, is one of the few companies with the capability to manufacture 20-nanometer chips. Apple had previously used South Korea-based Samsung as its chip manufacturer, despite the acrimonious patent litigation between them. The bad blood has caused Apple to shift some, but not all, of its production of the chip to TSMC. Rassweiler says TSMC is manufacturing about 60 percent of the chips for Apple, while Samsung is still turning out about 40 percent.
IHS is claiming Samsung is still manufacturing 40% of the A8. Does it take a lot of effort to bring up a chip on a different process? If this is true, it'd be interesting for Chipworks to do a Samsung-vs-TSMC A8 die comparison, and maybe Anandtech could do a performance and power-consumption comparison.
 

It would take a lot of effort: different cell libraries for the two processes, different design rules, and so on. Unless they somehow consolidated them into one common rule subset, potentially taking the lowest common denominator performance-wise. The biggest issue is doing full-custom designs while having two different libraries and validation rule sets to work against.

Anandtech says that the SRAM is still just 4MB (SRAM cell didn't shrink that much), but they do say L2 looks independent. http://www.anandtech.com/show/8562/chipworks-a8
 

I think it's BS to use two foundries for the same IC. Dual-fab could make sense, but AFAIK TSMC also has multiple 20/28nm fabs in Taiwan. I don't see TSMC handing over its process technology to Samsung just to give them 40% of the volume. Designing (layout, timing checks, DRC, qualification) two ICs in two different processes doesn't make sense to me, and it proves analysts have absolutely no clue what it takes to launch a new phone/tablet with a new SoC every year; a SoC probably takes two years to design and qualify (18 months is usually what it takes).

What still puzzles me: why didn't they go for 2 GB of stacked DRAM instead of the somewhat tight 1 GB? I've lost track of DRAM die sizes and process choices, but maybe Elpida (?) couldn't fit 1 GB on a single die, and two stacked DRAM dies were a bridge too far for Apple, or impossible due to I/O placement, etc.
 

Agree. I won't believe it until Chipworks gets one of these devices and proves it.
 
I hope they'll run whatever code they used to assess the A7's caches to see if there are other measurable improvements to the cache hierarchy.
The die area devoted to what appears to be "other stuff" is remarkable: it's as much as the assumed CPU, GPU, cache, and memory interface put together. That is one heck of a lot of unaccounted-for gates, and they're hardly sitting around doing nothing. What is being missed here?
 
I think this is basically what BogoMIPS does (though BogoMIPS always decrements by 1), but a lot of CPUs report BogoMIPS results at 2x (or even more) their clock rate.
How is that possible? Are they reading the same registers, and the compiler can somehow optimise that? It should definitely be possible to write such a test that always reports exactly 1x the clock rate, although making it truly compiler-independent may not be easy.
 
Hey guys,

Now that Anandtech has confirmed that Apple decided to go with the GX6450 rather than the GX6650, can anyone speculate on why they might have made that decision? (I'm trying to teach myself more of the technical details behind this kind of choice, but it's a lot to take in, so please forgive my lack of expertise here.)


I find it odd that Apple, which has traditionally pushed GPU technology to the limit by using the top-of-the-line chips available in its previous mobile chips (A5, A6, A7...), stuck with a four-cluster GPU here, especially considering how many more pixels these phones, above all the 6 Plus, have to drive. I've downloaded the Epic Zen Garden demo, for example, and have seen less-than-perfect performance on my 6 Plus, despite its use of the Metal API and its optimization for the 6 and 6 Plus.


Kindly,

Michael
 
Well, this is the first time Apple has used TSMC's 20nm process, so they had to be at least somewhat conservative. The iPhone also has a tighter window to market than the iPad. And Snapdragon 805 is at best equal to the A8 in GPU performance (without regard to thermal throttling) while being behind in single-threaded CPU and browser performance.
 
Very good point about the new fab and the brand-new process; that's something I hadn't even considered. It certainly makes sense, as I'm sure there are a lot of long-run unknowns in something like that.

The A8 absolutely appears to be a phenomenal SoC, if not the best on the market. But it still strikes me as odd that Apple didn't choose a more capable GPU and simply lower the clock speed when that power isn't needed.
 
My guess would be that they ran out of memory bandwidth to feed a six-cluster GPU. The additional area is not much, and it's not like Apple can't absorb a slightly higher BoM.
 

The GPU is validated, portable, synthesizable IP. How is a four-cluster part more conservative than a six-cluster one?
 

My guess is that x86 CPUs optimise dec and inc. Since dec and inc are each only 1 byte long, some older code uses two incs to add 2 (a normal add-by-2 instruction is 3 bytes in 16-bit code and 5 bytes in 32-bit code). That's probably why some CPUs manage 2x the clock rate in BogoMIPS.

A better way is to compute something that's almost impossible for a compiler to optimise away. The simplest is the Fibonacci series, i.e.

b += a
a += b

I used this in my "A7 is 6-wide" test; I looked at the compiled assembly, and it's basically the same as the C code, with some instruction reordering.
 
Well this is the first time that Apple has used a 20nm TSMC fab process, so they had to be at least somewhat conservative here. iPhone also has a tighter window to market than iPad too. And Snapdragon 805 is at best equal to A8 in GPU performance (without regard to thermal throttling) while being behind in single threaded CPU and browser performance.

I'd be very interested to see a head-to-head comparison between the Adreno 420 and the GX6450 in GPGPU workloads from an independent source.

Other than that, Apple has a tendency to employ the biggest available IMG GPU IP long before anyone else; I'd be very surprised if Apple isn't also the first (and probably the only one) to use the GX6650. The question now is whether it's going to be on 20SoC or 16FinFET (H1 '15).
 