Next-Gen iPhone & iPhone Nano Speculation

Lazy8s · Nov 2, 2013

The choppiness in certain relatively basic OS animations and the slowness in typical app transitions are the kind of inexplicable things that usually point to a lack of polish/optimization, which is very uncharacteristic of Apple. (They're struggling with smoothness even after removing certain visually intuitive animations they had in the prior iOSs like showing a photo pull itself out of the gallery and set up inside an email when mailing photos.)

iOS 7 seems to be a big rewrite of their software, so it'll hopefully smooth out noticeably within a few more of these early revisions.

Grall · Nov 2, 2013

mavere said:
I've noticed that Apple's blur is fairly low quality, but the small screen of the 5s makes it easily noticeable only in certain places (really high contrast shots).

I've noticed occasional aliasing in the blur - which would sort of go against the idea of blur to begin with!

I can't say it's something that really bothers me though.

Nebuchadnezzar · Nov 3, 2013

Does anybody know if we have any info on the on-SoC IP Apple uses? I'm talking about interconnects and memory controllers.

silent_guy · Nov 3, 2013

Nebuchadnezzar said:
Does anybody know if we have any info on the on-SoC IP Apple uses? I'm talking about interconnects and memory controllers.

They have hired buildings of HW designers. They are not all being kept busy with IP integration work. I can't imagine that a company like Apple would outsource something as crucial and architecture specific as a memory controller and interconnect.

ltcommander.data · Nov 3, 2013

Nebuchadnezzar said:
Does anybody know if we have any info on the on-SoC IP Apple uses? I'm talking about interconnects and memory controllers.

Well PA Semi designed their own Power chip from the ground up including memory controllers, interconnect, and I/O, so Apple would have the expertise and IP to design their own in the A series.

PA Semi's CONEXIUM interconnect was a crossbar architecture:
http://www.realworldtech.com/pa-semi/4/

The ENVOI I/O was interesting because it can dynamically reconfigure its data lanes on boot to vary the number of attached devices using PCIe, ethernet, and eventually SATA and other protocols. This could be useful in the A7 to address the different I/O situation between the smartphone, tablet, and eventually mp3 player configurations. Other interesting things are Offload Engines for things like DMA (like the Xbox One Move Engines?) and cryptography (probably unnecessary now).
http://www.realworldtech.com/pa-semi/5/

http://www.theinquirer.net/inquirer/news/1027774/pa-semi-power-chips-are-full-of-eastern-promise

The L2 cache on the PA6T-1682M is 2MB shared among two PA6T PPC cores. The really interesting bit is that the cores are not connected to the cache itself but to the CONEXIUM crossbar. This means it is addressed in a serial fashion, but can also be used as a cache by other parts of the system. It can be an I/O or DMA cache as well as a CPU cache, but each unit needs to wait it's turn. Luckily, interconnect it is on is up to the job, and the cache can pass up to 1G addresses per second, and it is pipelined so it can have multiple things in flight at once. It also can send and receive data in parallel.

Interestingly, PA Semi designed their L2 cache to serve not only the CPU, but to act as a shared cache for anything else attached to the CONEXIUM crossbar such as the DMA Offload engine and I/O. This design philosophy would support the notion that the A7's 4MB L3 cache is not CPU exclusive, but is truly a shared cache.

Besides PA Semi IP, Apple has also acquired Anobit, which specializes in flash memory controllers, AlgoTrim for data compression, and Passif Semiconductor, which does low-power communications, so these technologies could be incorporated into A series SoCs. The A5 used Audience earSmart audio processing IP, but Apple designed their own solution starting with the A6. Apple's also been designing their own ISPs, and Anand reports the one in the A7 is called the H6.

Grall · Nov 3, 2013

Ideally you wouldn't actually want your L2 cache to function like that because it fscks up latency a lot, first putting it behind a crossbar and second, sharing it with other devices as well. In a low power, battery-powered device, you'd want your data processing to be finished quickly so your CPU can go to sleep as soon as possible. This setup would not facilitate such an operating procedure.

ltcommander.data · Nov 3, 2013

Grall said:
Ideally you wouldn't actually want your L2 cache to function like that because it fscks up latency a lot, first putting it behind a crossbar and second, sharing it with other devices as well. In a low power, battery-powered device, you'd want your data processing to be finished quickly so your CPU can go to sleep as soon as possible. This setup would not facilitate such an operating procedure.

http://www.theinquirer.net/inquirer/news/1027774/pa-semi-power-chips-are-full-of-eastern-promise

The results are quite good, the latency, load to use, is a worst case 110 clocks or 55ns at 2GHz. It only gets better from there with L1 accesses at 4 clocks, L2 at 22, open pages at 90, and a remote L1 hit takes 30. When looking at these numbers, remember that the L2 and remote L1 numbers are across the crossbar, so if you cringed when you first read that, take a deep breath and relax.

http://www.freescale.com/files/32bit/doc/fact_sheet/MPC7448FACT.pdf

Charlie seems happy about PWRficient's 22 cycle L2 cache latency, but seeing the competing late model PowerPC G4 had 11-12 cycle L2 cache latencies I can see your concerns. I'm guessing the higher latency of a shared L3 cache behind the interconnect shouldn't be an issue with the A7 since it retains a dedicated CPU L2 cache?
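For anyone who wants to sanity-check the cycle figures quoted above, here is a minimal sketch that converts them to nanoseconds. It assumes the 2 GHz clock stated in the Inquirer quote applies to every figure; the function name and the dictionary are just illustrative.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds at the given clock."""
    # One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f.
    return cycles / clock_ghz


# Cycle counts taken from the quoted Inquirer article (PA6T at 2 GHz).
PA6T_LATENCY_CYCLES = {
    "L1 hit": 4,
    "L2 hit (across crossbar)": 22,
    "remote L1 hit (across crossbar)": 30,
    "open page": 90,
    "worst case": 110,
}

CLOCK_GHZ = 2.0  # clock assumed from the quote

for name, cycles in PA6T_LATENCY_CYCLES.items():
    print(f"{name}: {cycles} clocks = {cycles_to_ns(cycles, CLOCK_GHZ):.1f} ns")
```

Keep in mind that cycle counts are only directly comparable between chips running at the same clock, so the ns figure is the safer number to compare.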

Helmore · Nov 3, 2013

L3 caches are usually higher latency anyway as a tradeoff for area, power consumption, size and bandwidth. Depends on the specific cache hierarchy of course. Intel has some pretty great L3 caches in some of their CPUs.

Grall · Nov 3, 2013

ltcommander.data said:
I'm guessing the higher latency of a shared L3 cache behind the interconnect shouldn't be an issue with the A7 since it retains a dedicated CPU L2 cache?

I think, without being a silicon integration engineer and privy to Apple's trade secrets, that that would be a reasonable assumption... How many other mobile computing SoCs have had L3/system caches previous to the A7, does anyone know?

Helmore · Nov 3, 2013

Grall said:
I think, without being a silicon integration engineer and privy to Apple's trade secrets, that that would be a reasonable assumption... How many other mobile computing SoCs have had L3/system caches previous to the A7, does anyone know?

Does Intel Haswell count?

Grall · Nov 3, 2013

Not really a 'proper' SoC because I/O isn't on the main chip. Even ULV Haswells have the southbridge on a separate die even though it's on the same substrate...

Nebuchadnezzar · Nov 3, 2013

I brought the IP topic up because the cache found on the A7 is basically exactly what the L3 buffer in ARM's CCN-504/8 is.

tangey · Nov 4, 2013

tangey said:
Interesting thing is that in the low level tests, the only ones that got a boost from the driver change are the on-screen textured tests (up by about 25%), the off-screen ones are identical to the previous driver.

What driver improvement would result in on-screen low level tests getting a 25% improvement, but off-screen tests being identical? Is there a chance that the glbenchmark site is showing old results for low level off-screen tests?

Overall, the 5s with the new drivers on the 2.7 bench, get 8% improvement off-screen, and around 5% on-screen.

got over 12% improvement in the 2.5 offscreen bench too.

Bad form to quote my own message, but I see the iPad Air has posted to glbench. It does prove that the 7.0.3 driver update brought around a 25% improvement to the on-screen low level tests, but exactly ZERO improvement to the off-screen low level tests.

Lazy8s · Nov 4, 2013

I don't remember the old results for most of the off-screen low level tests, but I believe off-screen texel fill rate, at least, was recorded higher after the driver update.

If my assumption of 8 texels/clock is correct for the A7's GPU (which I believe indicates the G64xx), the theoretical peak fill would be 3464 MTex/sec at 433 MHz, and the 5s is achieving 3459 MTex/sec so far. If that's the 5s's clock rate, its real world texel fill performance is essentially 100% effective.

Since the on-screen texel fill test for the iPad Air asks it to fill an area so much larger than 1080p (I suppose the v-sync limit of on-screen testing isn't always as much of a limitation as the smaller resolution of the 1080p off-screen buffer for processing efficiency in this particular low level test), the Air scores even higher than the 5s with a result of 3486 MTex/sec. That would put it above the limit for a 433 MHz clock under my assumption, so I'm estimating the Air's G6430 at a 467 MHz top rate, even though the early benchmark scores for the other GfxBench tests have so far ended up just below what the 5s has posted.
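The fill rate arithmetic in this post can be sketched as follows. It is only a rough check under the post's own assumption of 8 texels/clock; the function names are made up for illustration, and the scores are the ones quoted above.

```python
def peak_fill_mtex(texels_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak texel fill rate in MTex/sec."""
    return texels_per_clock * clock_mhz


def implied_min_clock_mhz(measured_mtex: float, texels_per_clock: int) -> float:
    """Lowest clock (MHz) that could produce the measured fill rate."""
    return measured_mtex / texels_per_clock


TEXELS_PER_CLOCK = 8  # assumption from the post, not a confirmed spec

# iPhone 5s: peak at an assumed 433 MHz versus the measured score.
peak_5s = peak_fill_mtex(TEXELS_PER_CLOCK, 433)
efficiency_5s = 3459 / peak_5s

# iPad Air: the measured score exceeds the 433 MHz peak, so solve for the clock.
min_clock_air = implied_min_clock_mhz(3486, TEXELS_PER_CLOCK)

print(f"5s peak at 433 MHz: {peak_5s:.0f} MTex/sec")
print(f"5s efficiency: {efficiency_5s:.2%}")
print(f"Air minimum clock: {min_clock_air:.2f} MHz")
```

Note that the measured Air result only sets a lower bound on its clock (a bit under 436 MHz if 8 texels/clock holds); the 467 MHz figure is an estimate that goes beyond what this arithmetic alone shows.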

tangey · Nov 4, 2013

Lazy8s said:
I don't remember the old results for most of the off-screen low level tests, but I believe off-screen texel fill rate, at least, was recorded higher after the driver update.

If you look at the iPhone 5s "best score" (which is for 7.0.3) and "median score", you'll see that the low level off-screen triangle scores are identical, although the corresponding on-screen scores are about 25% better with 7.0.3. At the time, I wondered whether glbench was still showing the pre-7.0.3 low level off-screen scores for both sets.

However, the iPad Air off-screen triangle scores are also identical, which does point to 7.0.3 giving a substantial on-screen low level improvement, whilst delivering zero improvement to off-screen low level results.

Lazy8s · Nov 8, 2013

As mentioned in the comments section of the Anandtech iPad Air review and in their latest podcast, the lack of a performance gain from Apple's Cyclone in 3D Mark's physics test is due to a particular memory access implementation detail in the new CPU architecture.

http://hothardware.com/m/News/When-...-and-Apple-A7-3DMark-Performance/default.aspx

While regressions in performance are not typical when moving to a new generation, this one is very minor in terms of overall performance, can be worked around with optimized code, and is part of a balance that puts Cyclone way ahead of Swift overall. Still, an interesting anecdote.

Grall · Nov 8, 2013

Who cares? What software other than benchmarks actually implements (heavy) physics calculations on a mobile device - and no, Worms and Angry Birds don't really count...

While it's always better when things get better, the lack of better in this case is pretty much a non-issue.

patsu · Nov 9, 2013

Algoriddim uses the A7 for real time audio processing:
http://www.algoriddim.com/press_rel...cessor-support-for-djay-2-and-vjay-for-iphone

wishiknew · Nov 17, 2013

Read the Anandtech mini retina review. Disappointed about screen, cpu and throttling vs the Air. Maybe next year Apple will make a no compromise mini.

wco81 · Nov 17, 2013

Well, we already know there's no touch ID and only 1 GB of RAM.

And it's a bit thicker than the previous Mini so they may try to get the thickness down.

Color gamut is important but I think photographers who care about color accuracy are going to put their pictures on a bigger screen.

I'll take more RAM before a more color-accurate screen.
