Next-Gen iPhone & iPhone Nano Speculation

When the major wins for the A6 come from memory performance, saying "with 2 less cores too" is just wrong, as the core count has nothing to do with what pushes the A6's score up so much.

Well this is a mobile SoC, after all, so separating the cores from the rest of the system makes little sense. Higher performance memory controllers and higher speed memory cost gates/money/power as well, so the trick is to balance it right for the real world usage at hand.

Geekbench, like almost all small portable benchmarks, typically underestimates the real-world importance of the memory subsystem. It's the nature of the beast. (Also, it could be argued that the multiprocessing scores have way too much impact on the final scores if modelling typical usage is the purpose. But then again that may not be its purpose.) Even the SPEC suite has a host of thorny problems if you actually want to use it for anything but marketing, and Geekbench is a commercial toy. Use the data for entertainment purposes only.

The elephant in the room for all our discussions here is that the architecture and design of these SoCs are dictated by power draw concerns. And we don't have the power draw data! Without power draw data, at best we can achieve pleasant academic discourse; at worst we merely engage in pointless mental masturbation.
 
http://www.engadget.com/2012/09/18/apple-iphone-5-review/

Now that reviews are starting to filter out for the iPhone 5, Engadget is getting 1628 in Geekbench for the iPhone 5 vs. 634 for the iPhone 4S, confirming the previous leak. Also, 924 ms vs. 2200 ms for the iPhone 4S in SunSpider, although iOS 6 vs. iOS 5.1 differences would be a factor. Various reviews also seem to indicate that there is some battery life improvement over the iPhone 4S, although they don't seem to have hard number comparisons. In any case, it does seem like Apple's claim of a 2x performance increase with improved battery life in a smaller device does pan out.
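
Just spelling out the ratios from those numbers (my own arithmetic, nothing beyond the figures quoted above):

```c
/* Rough speedup ratios from the figures quoted above (illustrative only). */
#include <stdio.h>

int main(void)
{
    double geekbench_5 = 1628.0, geekbench_4s = 634.0;  /* points, higher is better */
    double sunspider_5 = 924.0,  sunspider_4s = 2200.0; /* ms, lower is better */

    printf("Geekbench: %.2fx faster\n", geekbench_5 / geekbench_4s); /* ~2.57x */
    printf("SunSpider: %.2fx faster\n", sunspider_4s / sunspider_5); /* ~2.38x */
    return 0;
}
```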
 
Also, 924 ms vs. 2200 ms for the iPhone 4S in SunSpider, although iOS 6 vs. iOS 5.1 differences would be a factor.

According to some tests, Safari in iOS 6 is indeed a little faster than in iOS 5 (it's about 15% faster on my iPhone 4). I've seen reports that the iPhone 4S does ~1800 ms on iOS 6, so the iPhone 5 is roughly 2x as fast.
 
For a first custom core, Apple seems to have hit it out of the park. I wonder how it compares vs Krait.

SunSpider is not a good benchmark for comparison, especially when running on different platforms :) However, Anandtech did a preview test with Krait (here) which includes SunSpider running on the stock browser, resulting in 1532 ms. That's 34% faster than a 1.2GHz dual-core A9 (Droid RAZR). That's public data we can use.

However, since the stock browser in ICS is slower than Safari in iOS 6 (after all, an 800MHz dual-core A9 is able to do ~1800 ms in iOS 6), it's safe to assume that, if one ran Krait with iOS 6, there would probably be an additional 40%~50% speedup. So it would probably end up close to ~1000 ms, running @ 1.5GHz.

If the estimate is not too far off, then this shows that Apple's A6 is indeed very quick (at least when running Javascript...).
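
For what it's worth, here's that estimate spelled out; the 40-50% browser speedup is the assumption from the paragraph above, not a measured figure:

```c
/* Sketch of the estimate above: take Krait's 1532 ms on the ICS stock browser
   and assume a hypothetical 40-50% faster JS engine (iOS 6 Safari class).
   A speedup of s means the time divides by (1 + s). */
#include <stdio.h>

int main(void)
{
    double krait_stock_ms = 1532.0;
    double low = 0.40, high = 0.50;  /* assumed browser speedup range */

    printf("Krait + faster browser: %.0f - %.0f ms\n",
           krait_stock_ms / (1.0 + high),  /* ~1021 ms */
           krait_stock_ms / (1.0 + low));  /* ~1094 ms */
    return 0;
}
```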
 
The stuff about Atom's SunSpider advantage being down to the memory subsystem is probably nothing more than a total guess on Intel's part. I'd be really surprised if they actually came to this answer by analyzing performance counters on Medfield vs. Cortex-A9.

Javascript performance is highly dependent on the web browser and compiler techniques the JIT uses. Some of it is pretty different from conventional compiler techniques, and if it relies on self-modifying code (like the deopts in Chrome do) then Cortex-A9 is at a very specific disadvantage. Maybe Apple got wise and made code invalidation fast from user land somehow.
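
Just to make "code invalidation from user land" concrete, here's a minimal sketch of the kind of thing a JIT has to do when it patches already-generated code; the names (patch_jit_code etc.) are my own illustration, not anything known about Apple's or Chrome's implementation:

```c
/* Minimal sketch of why self-modifying code (e.g. JIT deoptimization patches)
   costs more on ARM than on x86. Illustrative only. */
#include <string.h>
#include <stdint.h>

/* Overwrite part of an already-JIT-compiled buffer, then make the new bytes
   visible to the instruction stream. */
void patch_jit_code(uint8_t *code, size_t off, const uint8_t *patch, size_t len)
{
    memcpy(code + off, patch, len);

    /* On x86 the instruction cache is coherent with the data cache, so no
       explicit maintenance is needed for correctness. On ARM (e.g. Cortex-A9)
       the JIT must clean the D-cache and invalidate the I-cache over the
       patched range; GCC/Clang expose this as __builtin___clear_cache, which
       turns into a comparatively expensive cache-maintenance sequence. */
#if defined(__arm__) || defined(__aarch64__)
    __builtin___clear_cache((char *)(code + off), (char *)(code + off + len));
#endif
}
```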

I wish it would stop getting so much attention, given that Javascript programs are still several times slower than native equivalents. So it's just a competition between extremely slow and horrifically slow. It's the kind of thing where I doubt memory subsystem performance is the big gating factor, unless it's all coming from icache misses.
 
The stuff about Atom's SunSpider advantage being down to the memory subsystem is probably nothing more than a total guess on Intel's part.
Yeah, apparently the A6 @ 1GHz beats Medfield @ 2GHz! That's crazy. Definitely, pound for pound it's the best mobile CPU around.

What benchmarks do you think would be good for testing mobile CPUs, to get a more accurate representation of what they can do? After all, Krait is a much wider core than the A9, yet clock for clock it doesn't seem that much faster outside of Linpack.

What's your opinion on this? Does Qualcomm have a comparatively poor memory subsystem relative to its CPU?
 
The A9 does have its limitations wrt the number of memory transactions it can have in flight, doesn't it? While that's technically not part of the memory controller, it will result in bad performance of the memory system in benchmarks, and a better CPU should see improved numbers. Though I assume the memory controller will be tuned to the requirements of the CPU, so it's a bit chicken-and-egg anyway.
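
To put rough numbers on the in-flight-transactions point, here's a Little's-law back-of-the-envelope sketch; the latency and miss counts are illustrative assumptions, not measured A9 or A6 figures:

```c
/* Little's-law bound on sustained memory bandwidth:
   bandwidth <= outstanding_misses * line_size / latency.
   Numbers below are assumptions for illustration only. */
#include <stdio.h>

int main(void)
{
    double line_size  = 32.0;   /* bytes per L1 line (A9 uses 32B lines) */
    double latency_ns = 100.0;  /* assumed LPDDR2 round-trip latency */

    for (int misses = 2; misses <= 8; misses += 2) {
        double gbps = misses * line_size / latency_ns;  /* bytes/ns == GB/s */
        printf("%d outstanding misses -> %.2f GB/s ceiling\n", misses, gbps);
    }
    return 0;
}
```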
 
I like how everybody here is comparing apples to oranges in SunSpider with different Android browser builds. I always wonder why reviewers can't compare different processors on the same browser; it's easy - just download the latest build of Chrome from the Play Store and run SunSpider in it. With the latest JB browser builds, a 1.6-1.7GHz Cortex-A9 can easily pass the 900 ms mark ;) - http://www.gsmarena.com/motorola_motoedge-review-816p3.php
 
I like how everybody here is comparing apples to oranges in SunSpider with different Android browser builds.

Seconded.

SunSpider is more a test of the software stack than of the CPU hardware, and an almost irrelevant one at that.

Cheers
 
SunSpider is more a test of the software stack than of the CPU hardware, and an almost irrelevant one at that.

And even when hardware may give an advantage, like x86's coherent icache, it may not be something that translates into any real benefit for other languages or even other benchmarks using the same language.

Interesting dive (by a PhD candidate) into Apple's "macroscalar" trademarks and patents I stumbled across when my interest in macroscalar was renewed by the A6 piece on CNET:

http://homes.cs.washington.edu/~asampson/blog/macroscalar.html

This approach needs hints from the compiler to work well and I think A6 has established it does very well on legacy code. And if Apple was deploying all this new compiler technology there'd be hints of it in the open source compilers they use.

I actually think that the idea, precisely as presented by the patent anyway, would work very poorly for a general-purpose mobile processor, because it relies on a highly symmetric execution unit layout to extract any real kind of parallelism. Look at the higher-end low-power processors like Cortex-A15 and Bobcat - even those can only do 1 load + 1 store per cycle. In fact, until Sandy Bridge the same was true for Intel's high-end desktop processors (not counting one exception, the original Pentium). So you can see that 2 loads or 2 stores simultaneously is an expensive feature, but without it a single load or store in a loop would limit your entire parallelism to 1 iteration per cycle.

And it's not just any particular execution type that needs symmetry - these systems decode/issue 3-4 instructions per cycle but don't have that many execution units of any one type. They'd really like you to execute some memory ops in parallel with ALU, branches, etc., and as well they should, because otherwise you're only utilizing one slice of the execution resources at any time.
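
A trivial loop makes that bottleneck concrete (my own illustration, nothing taken from the patent):

```c
/* Why a single load/store port caps "symmetric" loop parallelism: each
   iteration below needs one load and one store, so a core that issues at most
   1 load + 1 store per cycle can retire at most one iteration per cycle, no
   matter how many ALUs it has or how many iterations run "in parallel". */
void scale(float *dst, const float *src, float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * k;  /* 1 load, 1 multiply, 1 store per iteration */
}
```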

I'm sure it'd also break down on any loop that's too big, nested, etc., so it's really fragile and would probably deliver worse performance than a Cortex-A9 for a lot of workloads (unless it hits really high frequencies, which isn't realistic for mobile).
 
This approach needs hints from the compiler to work well and I think A6 has established it does very well on legacy code. And if Apple was deploying all this new compiler technology there'd be hints of it in the open source compilers they use.

Given that macroscalar seems to need specific compiler tools, I don't think that's it; otherwise Geekbench wouldn't run that fast :)

I'm not suggesting this is technology employed in the A6. Just hints that Apple, fully committed to custom architectures, may be looking to commit to this type of effort in the future.
 
Read my edits, though; I just don't see this approach, as stated, being useful in pretty much any non-specialized application. I'm surprised the blog writer doesn't see the big problems with needing symmetric parallelism.
 
Read my edits, though; I just don't see this approach, as stated, being useful in pretty much any non-specialized application. I'm surprised the blog writer doesn't see the big problems with needing symmetric parallelism.

I get what you're saying. I find it hard to believe there's a solution that solves OoO-suited problems substantially better than established OoO hardware, which makes me think there's possibly some misunderstanding about what's actually going on.


Edit: switching topics, how similar is the A6 to Krait? Krait is triple-decode, and I assume the A6 is too. The A15 has a 15-stage pipeline, whereas Krait has 11; the A6 likely has 11 at most given its low clock frequency. All three have a 1MB L2 cache. All have a VFPv4 FPU. Krait and the A6 both have 2x32-bit LPDDR2 memory interfaces. It seems like people are using 800 MHz LPDDR2 with Krait, though (I can't verify this).
 