I don't see anybody running > 4 GiB processes on their phones in the next couple of years.
Bingo.
And from ARM's perspective at least, this is definitely a low-end/mid-range solution. Currently, low-end Android phones have 512MB and mid-range ones have 1GB. If we (optimistically?) assume a 24-month Moore's Law cadence with slightly increasing wafer costs, you'll need to wait until 2018 before you can afford 2GB in the low-end and 4GB in the mid-range. By that point, the A12 will be outdated and you still won't need ARMv8 in those segments.
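To make the back-of-the-envelope math explicit, here's a rough sketch. It assumes affordable RAM doubles cleanly every 24 months at constant cost; the function name `months_until` is just something I made up for illustration, and the slightly increasing wafer costs are not modelled, so treat the result as a lower bound rather than a forecast.

```python
import math

def months_until(current_mb, target_mb, doubling_months=24):
    """Months for affordable capacity to grow from current_mb to target_mb,
    assuming one doubling every doubling_months (hypothetical, idealized)."""
    doublings = math.log2(target_mb / current_mb)
    return doublings * doubling_months

# Low-end: 512MB -> 2GB, mid-range: 1GB -> 4GB (figures from the post above)
low_end = months_until(512, 2048)
mid_range = months_until(1024, 4096)
print(low_end, mid_range)  # 48.0 48.0
```

Both segments need two doublings, i.e. about four years before you add any wafer cost penalty on top.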
So no need to run the phone in 64-bit mode. And when running it only in 32-bit mode, a CPU which lacks the upper bits in the datapaths is cheaper and slightly more power efficient.
Yes, although in the case of the in-order Cortex-A53, the extra registers probably help noticeably, so you'll likely end up with similar power efficiency to an equivalent ARMv7 design but at higher maximum power/performance. The relative benefit of extra registers should be significantly smaller on an OoOE core so it doesn't make sense there.
Also FWIW, it's quite obvious to me that they decided to sacrifice a tiny bit more power efficiency in A53 in order to hit higher performance targets (e.g. I was told that the A7's limited dual-issue was engineered based on extensive profiling showing it was slightly more efficient than full dual-issue, yet A53 has full dual-issue...)
and A57, which is a beefier core, will go to servers.
I think A57 should be perfectly fine for phones *if* you manufacture it on a 14nm FinFET process. It's good to remember that FinFET improves performance at low voltages, which means that if you're willing to increase costs (die size) to reduce power (average, not peak) then it should clearly help.
Maybe more importantly, they should hopefully have all the issues with big.LITTLE MP worked out by then, so you should be able to make some much more interesting hybrid designs. For example, 1xA57+4xA53 would be a very interesting sweet spot for low-end smartphones. I'm honestly not sure why you'd want a hypothetical 4xA55 instead of that (it'd have lower single-threaded performance, lower power efficiency for multi-threaded workloads, and probably similar or higher cost). I'm still not convinced by big.LITTLE's cache hierarchy though, and I still don't understand why the CCN-504 apparently has a *minimum* L3 size of 8MB, but heh...
Also as Exophase pointed out, I agree that A53 is ARM's little core in a high-end strategy, and makes very little sense on its own. In a sense it's unfortunate that ARM doesn't seem to be pushing for ARMv8 to be omnipresent as soon as possible, but then again it makes very good short/mid-term business sense not to do so.