ARM Cortex-A7 announced

DSC

http://www.arm.com/about/newsroom/a...-processor-ever-with-biglittle-processing.php

http://www.anandtech.com/show/4991/...dualcore-more-power-efficient-highend-devices

ARM today announced the ARM® Cortex™-A7 MPCore™ processor - the most energy-efficient application class processor ARM has ever developed, and big.LITTLE processing - a flexible approach that redefines the traditional power and performance relationship. The Cortex-A7 processor builds on the low-power leadership established by the Cortex-A8 processor that is at the heart of many of today’s most popular smartphones. A single Cortex-A7 processor delivers 5x the energy-efficiency and is one fifth the size of the Cortex-A8 processor, while providing significantly greater performance. The Cortex-A7 processor will enable a rich user experience in sub-$100 entry level smartphones and help connect the next billion people in developing markets.
 
Being targeted at newer processes, it'll fit some budget designs better than previous cores, but this all feels a bit unnecessary with so many other ARM CPU options and ways to scale them.

And yes, being compatible with the same standard as the A15 is another benefit.
 

I'd say it fits well between an A5 and A9. This is pretty much an updated A5 with the latest v7 instructions and a slightly more beefy pipeline.
 
Looks like a slightly beefed up Cortex-A5 with twice the clock target?
Would be quite a nice chip, though it's not surprising the "5 times better" claims are against the A8, which wasn't the most efficient design to begin with.
Oh, and is the A7 Kingfisher? Can't really see the "game-changer" aspect of it. I guess it would refer to the ability to switch from A7 to A15 for better power efficiency (couldn't you use A5 instead of A7 for that purpose?), but certainly what NVIDIA is doing goes in the same direction.
 
It's really closer to a Cortex-A8 than a Cortex-A5. It has some form of dual-issue which lets it reach 1.9DMIPS/MHz, and according to ARM it's actually noticeably faster than the A8 *per clock* because of the shorter pipeline/improved branch predictor/lower expected L2 latency. This slide implies it should actually be 25% faster per clock for browsing: http://images.anandtech.com/doci/4991/Screen Shot 2011-10-19 at 12.30.25 PM.png - I'm not sure I believe that's realistic but at least it's clearly much faster than an A5.

ARM is unusually cagey about architectural details - on paper if it's really dual-issue for integer there's no reason it shouldn't reach the A8's 2DMIPS/MHz, so there must be something missing, but what? They also haven't said anything about availability - is the ST-Ericsson A9600 already using this or is it a more limited implementation? We'll know soon enough I suppose...
 
Well, maybe it's closer to the A8 because it's a dual-issue design (though for some reason the A5 reaches 1.57 DMIPS/MHz too); Anand mentions the integer execution part resembles the A8.
But the A8 is a bit of an oddball in the Cortex family anyway (no doubt because it's the earliest in the family), with its non-pipelined FPU, non-MP-capable design, and not very efficient (compared to all other Cortex chips) design.
You'll also note that in the Cortex-A7 performance section on ARM's site the chip is described more in terms of differences to the A5 and/or A9, not the A8 (well, except the big graphic there, that is).
It also seems like this is a bit more of a fixed design than either the A5 or A9: L1 cache size is fixed and FPU/NEON isn't optional.
Actually it looks like it more or less makes the A5 obsolete before it's even out, at least if you look at the more capable versions with larger L1 caches and FPU/NEON? It isn't all that much larger (ARM says the A7 is 0.45mm² on 28nm, while the A5 is 0.68mm² on 40nm with NEON but only half as large L1 caches).
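To put the two area figures on the same footing, here is a rough back-of-the-envelope sketch in Python. It assumes ideal area scaling with the square of the feature size, which real processes don't achieve, and it ignores the A5's smaller L1 caches, so treat the result as a ballpark only. The function name `scale_area` is made up for illustration.

```python
def scale_area(area_mm2, from_nm, to_nm):
    """Scale a die area between nodes, ASSUMING ideal (feature size)^2 scaling."""
    return area_mm2 * (to_nm / from_nm) ** 2

# Figures quoted in the post above.
a7_at_28nm = 0.45   # mm², A7 on 28nm
a5_at_40nm = 0.68   # mm², A5 with NEON on 40nm, half the L1 cache size

# Hypothetical A5 area if it were shrunk to 28nm under ideal scaling.
a5_at_28nm = scale_area(a5_at_40nm, from_nm=40, to_nm=28)
ratio = a7_at_28nm / a5_at_28nm

print(f"A5 scaled to 28nm: {a5_at_28nm:.3f} mm²")
print(f"A7 / scaled A5:    {ratio:.2f}x")
```

Under that idealized assumption the A7 comes out somewhat larger than a shrunk A5, but part of the difference is the A7's larger L1 caches.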
 
It's really closer to a Cortex-A8 than a Cortex-A5. It has some form of dual-issue which lets it reach 1.9DMIPS/MHz, and according to ARM it's actually noticeably faster than the A8 *per clock* because of the shorter pipeline/improved branch predictor/lower expected L2 latency. This slide implies it should actually be 25% faster per clock for browsing: http://images.anandtech.com/doci/4991/Screen Shot 2011-10-19 at 12.30.25 PM.png - I'm not sure I believe that's realistic but at least it's clearly much faster than an A5.

That slide doesn't really make any sense to me. Dhrystone, as dubious as it is, tends to be the standard for ARM making their claims. It seems strange that they'd give a lower DMIPS/MHz number then claim (significantly!) higher IPC elsewhere.

Note that the A8 is actually supposed to have 8-cycle L2 latency according to the TRM, and you can see the pipeline is designed around this consideration. In practice implementations tend to be at least a few cycles higher. Whether or not A7 implementations follow the same trend remains to be seen, but I expect ARM's numbers would reflect the ideal timings for both CPUs.

Indeed the shorter pipeline will help branch prediction, but surely this wouldn't give a 25% improvement, especially not while losing some other capabilities. This is closer to the improvement you'd get going from Cortex-A8 to A9.

The one exception that could make this true is if they're including scalar FPU figures in their performance value. If so this seems pretty misleading.

One thing I'd like to know is if this core has only one 64-bit ALU for NEON like the A5, or two like the A8 and A9. If it's like the A5 then for a lot of cases it'll be less useful to use integer in NEON, assuming that the core can in fact dual-issue ALU operations.
 
Well, maybe it's closer to the A8 because it's a dual-issue design (though for some reason the A5 reaches 1.57 DMIPS/MHz too); Anand mentions the integer execution part resembles the A8.

Note that DMIPS aren't real MIPS, hence why a single-issue part can get > 1. A5 isn't ever actually dual issuing anything.
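A quick sketch of why that works out. DMIPS is just the Dhrystone iteration rate divided by 1757, the score of the VAX 11/780 (the nominal "1 MIPS" reference machine), so it says nothing about how many native instructions retire per cycle. The helper name below is made up for illustration.

```python
VAX_11_780_DHRYSTONES_PER_SEC = 1757  # reference score that defines 1 DMIPS

def cycles_per_iteration(dmips_per_mhz):
    """Clock cycles a core spends on one Dhrystone iteration at a given DMIPS/MHz."""
    iterations_per_sec_per_mhz = dmips_per_mhz * VAX_11_780_DHRYSTONES_PER_SEC
    return 1_000_000 / iterations_per_sec_per_mhz

# Cycle budget per iteration at 1.0 DMIPS/MHz vs. the A5 and A7 figures quoted in the thread.
for name, score in [("1.0 DMIPS/MHz", 1.0), ("A5 (1.57)", 1.57), ("A7 (1.9)", 1.9)]:
    print(f"{name:>14}: {cycles_per_iteration(score):6.1f} cycles per iteration")
```

Any core that gets through an iteration in fewer than about 569 cycles scores above 1 DMIPS/MHz, which a single-issue ARM core can do simply by needing fewer instructions per iteration than the VAX did.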

It also seems like this is a bit of a more fixed design than either A5 or A9, L1 cache size is fixed and FPU/NEON isn't optional.

I got the impression ARM wasn't making NEON/FPU optional with A15 either, which is good news IMO.. no more chance of something like Tegra 2 skimping out on it.
 
[EDIT: removed this part]

One thing I'd like to know is if this core has only one 64-bit ALU for NEON like the A5, or two like the A8 and A9. If it's like the A5 then for a lot of cases it'll be less useful to use integer in NEON, assuming that the core can in fact dual-issue ALU operations.
They claim it's less than half the size of A8 *on the same process* - there's no way it has two 64-bit ALUs for NEON IMO. BTW, the A5 does have a 2x32-bit FPU for NEON, right?
 
Looks like a slightly beefed up Cortex-A5 with twice the clock target?
Would be quite a nice chip, though it's not surprising the "5 times better" claims are against the A8, which wasn't the most efficient design to begin with.
Oh, and is the A7 Kingfisher? Can't really see the "game-changer" aspect of it. I guess it would refer to the ability to switch from A7 to A15 for better power efficiency (couldn't you use A5 instead of A7 for that purpose?), but certainly what NVIDIA is doing goes in the same direction.

It is pretty much an updated A5 but the big things to note are that

1. It is ISA-compatible with the A15. From software, it may even look identical save for the CPU ID.
2. Its cache system is coherent and compatible with A15, allowing relatively seamless and low-latency swapping between the two types of cores.

The latter is something Tegra doesn't do; this brings A15/A7 combos into the realm of truly heterogeneous MP.
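As a toy illustration of what swapping between the two core types could look like from a scheduler's point of view, here is a minimal Python sketch of a load-based migration policy with hysteresis. This is not ARM's actual switching logic; the thresholds and function names are made up, and it only models the decision, not the coherent cache state transfer that makes the swap cheap.

```python
def choose_cluster(current, load, up_threshold=0.85, down_threshold=0.30):
    """Pick the cluster for the next interval (hypothetical thresholds).

    Moves up to the A15 only under high load and back down to the A7 only
    under low load, so moderate loads don't cause back-and-forth switching.
    """
    if current == "A7" and load > up_threshold:
        return "A15"
    if current == "A15" and load < down_threshold:
        return "A7"
    return current

def run(loads, start="A7"):
    """Replay a sequence of load samples (0.0 to 1.0) and record the cluster used."""
    cluster = start
    trace = []
    for load in loads:
        cluster = choose_cluster(cluster, load)
        trace.append(cluster)
    return trace

print(run([0.1, 0.9, 0.5, 0.2, 0.4]))
```

Because both cores run the same ISA, the software being migrated doesn't need to know which cluster it ended up on.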
 
It's really closer to a Cortex-A8 than a Cortex-A5. It has some form of dual-issue which lets it reach 1.9DMIPS/MHz, and according to ARM it's actually noticeably faster than the A8 *per clock* because of the shorter pipeline/improved branch predictor/lower expected L2 latency. This slide implies it should actually be 25% faster per clock for browsing: http://images.anandtech.com/doci/4991/Screen Shot 2011-10-19 at 12.30.25 PM.png - I'm not sure I believe that's realistic but at least it's clearly much faster than an A5.

ARM is unusually cagey about architectural details - on paper if it's really dual-issue for integer there's no reason it shouldn't reach the A8's 2DMIPS/MHz, so there must be something missing, but what? They also haven't said anything about availability - is the ST-Ericsson A9600 already using this or is it a more limited implementation? We'll know soon enough I suppose...

The A5 had certain limited dual-issue as well. It was just a much more restricted set than the A7. Combine that with the pipeline length, static latency and subpipe configuration, and it looks more similar to an A5 than an A8.
 
[EDIT]The A5 could indeed co-issue with branches, but that's practically it, whereas the R4 could co-issue with load/stores and the VFP. In that sense you could argue the A7 is more an extension of the R4's dual-issue mechanism than the extremely limited one of the A5. And yes, I forgot to check my own Handheld CPU article about this so I got it wrong - how embarrassing :) Thanks Arcanum. I suppose you could argue the A7 is closer architecturally to the A5/R4, but in terms of performance it's much closer to the A8.[/EDIT]

BTW, I just got from someone very reliable that the A7 does have 25% higher browser performance per clock, and the reasons are indeed the shorter pipeline/better predictor/tighter L2. I personally suspect the faster VFP might play a role too (Javascript uses FP64 for math) but apparently it's not even the main factor. Pretty impressive - looks like their limited dual-issue really isn't that restricted and it's mostly irrelevant outside of data processing (e.g. Dhrystone for all intents and purposes).
 
Cortex-A5 could dual-issue branches and that's it (mostly).

Cortex-A7 can also dual-issue data-processing operations. The tightly coupled L2 and better branch prediction also improve performance over Cortex-A5.

It is correct to say that A7 is closer in design to A5 than A8.
 
BTW, I just got from someone very reliable that the A7 does have 25% higher browser performance per clock, and the reasons are indeed the shorter pipeline/better predictor/tighter L2. I personally suspect the faster VFP might play a role too (Javascript uses FP64 for math) but apparently it's not even the main factor. Pretty impressive - looks like their limited dual-issue really isn't that restricted and it's mostly irrelevant outside of data processing (e.g. Dhrystone for all intents and purposes).
Viewed from another angle, it might not be much of an issue because the A8 couldn't make that much use of dual-issue either :). Unless I'm mistaken, the core of the A8 really was nowhere close to twice as fast as good old ARM11 (at the same clock), and that's with faster L1 and L2 caches too. The relatively long pipeline probably didn't help the dual-issue in-order design either, so it's not surprising that you can beat the A8 even if you can dual-issue fewer things. The memory system performance is specifically listed as even better than the A9's in some areas (larger TLBs - at least I'd think they're also larger than on the A8). Still, faster at not even half the die size is quite a feat - or shows how bad the A8 actually was :).
I haven't actually seen a number for how much faster it is than the A5 (well, except the DMIPS number - if that's an indication, it's not THAT much faster per clock), though it looks like a winner against the A5 in any case, as it can apparently reach much higher clock frequencies and isn't much bigger (if compared against an A5 with the same features), and the faster L2 will certainly help in real-world apps too.
So it seems quite nice on its own, though the big deal is apparently that "it looks the same as a slow A15 from the outside".
There's a nice ARM blog post about this chip - though I'm not quite sure what "ability to dual-issue most common instruction pairs" really means. Branches? Movs?
 
It looks like we'll know more in less than a week: http://eetimes.com/electronics-news/4229907/ARM-A7-comes-under-the-microscope

No matter what the dual-issue limitations are, it seems clear that it does deliver better performance than the Cortex-A8 (per clock) in at least one very important real-world benchmark, and probably most others. It's certainly not as fast as the Cortex-A9 and seems to clock lower as well, but given its extremely good area and power efficiency, a dual-core A7 is nearly universally superior to a single-core A9. Because single-threaded performance matters and workloads with four threads aren't common enough, a dual-core A9 is still arguably more attractive than a quad-core A7 though.

The more I think about it, the more I suspect this is going to be a very successful CPU for ultra-low-end smartphones, more so than the Cortex-A5 (which could still see some success via the unannounced MSM7227A). You could implement 28HPL 2x1.2GHz A7s with 512KB L2 and still be smaller than the 40LP 1x800MHz A9 with 256KB L2 Android solutions from Broadcom and ST-Ericsson. In fact, I wonder if Qualcomm might be interested as well for the market, or if they'd rather use a single-core Krait.

As for heterogeneous multiprocessing, I was going to start a thread on it, but I'm too tired to finish writing it now so I'll just go to sleep and finish it tomorrow.
 
If it is faster than the A8 per clock, and allows higher clock speeds and a dual-core config, then wouldn't it be just as fast as the current Apple A5, which is a dual-core A9 @ 800MHz?
 
@iwod:
It is not faster than the A8 in general; that depends on the code being run.

If you have a dual 1.2GHz A7 with larger cache, then it could even be faster than a dual A9 at 800MHz for some workloads.
Certainly it will be much lower power and lower cost, which is nice.
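For a rough feel for the numbers, here is a small Python sketch comparing peak Dhrystone throughput of the two configurations. The 1.9 DMIPS/MHz figure for the A7 is the one quoted in this thread; the 2.5 DMIPS/MHz figure for the A9 is an assumption taken from ARM's usual marketing number, not from this thread. Dhrystone is also a dubious proxy for real workloads, so this is only a sanity check.

```python
def aggregate_dmips(cores, clock_mhz, dmips_per_mhz):
    """Peak Dhrystone throughput, assuming perfect scaling across cores."""
    return cores * clock_mhz * dmips_per_mhz

A7_DMIPS_PER_MHZ = 1.9   # quoted in the thread
A9_DMIPS_PER_MHZ = 2.5   # ASSUMPTION: ARM's commonly cited figure for Cortex-A9

dual_a7 = aggregate_dmips(cores=2, clock_mhz=1200, dmips_per_mhz=A7_DMIPS_PER_MHZ)
dual_a9 = aggregate_dmips(cores=2, clock_mhz=800, dmips_per_mhz=A9_DMIPS_PER_MHZ)

print(f"2x A7 @ 1.2GHz: {dual_a7:.0f} DMIPS")
print(f"2x A9 @ 800MHz: {dual_a9:.0f} DMIPS")
```

On this crude measure the higher clock more than makes up for the lower per-clock score, which fits the "for some workloads" caveat above.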


Anyway, I guess we will not see any A7 chips arrive before 2013.
 