Next-Gen iPhone & iPhone Nano Speculation

So, AnandTech does their usual well-rounded review, and then they drop this nitty-gritty detail.

The A5X has a quad-channel, 128-bit memory controller, although he concludes that only the SGX543MP4 has full access to it.
 
Does that mean part of the RAM is effectively unusable by the CPU?


It just means that the memory controller is directly attached to the GPU and the CPU is second in line with only 64-bit access.

That's the opposite of how normal computers and graphics cards handle data traffic, where the CPU is first and everything else comes second.
 
That's innovative and, well... incredible! Just when we all thought the A5X was hampered by bandwidth, Apple drops this little nugget!

IMHO this marks Apple as the best SoC designer in the world... not just because of the expertise, but because they have the financial muscle and are batshit-crazy enough to pull it off. A die size bigger than a dual-core Sandy Bridge?? Nuts!

I can't wait till the A6... just imagine what they'll be able to do with, say, two Cortex-A15s @ 1.5GHz, one Cortex-A7 @ 500MHz, and a multi-core Rogue!! Whoa! ;)

His insights on Haswell are interesting though... and it also proves that all that hardware power is nothing without decent software to take advantage of it. Well done, Nvidia.
 
So, AnandTech does their usual well-rounded review, and then they drop this nitty-gritty detail.

The A5X has a quad-channel, 128-bit memory controller, although he concludes that only the SGX543MP4 has full access to it.

That is an insane review. Ended up reading almost all of it (all the LTE stuff is simply and unfortunately not relevant to me at all). Today or tomorrow I'll hopefully find one in the store to have a look at. Then it is decision time ... (though lack of funds may simplify that decision)
 
Two dedicated MCs that aren't even connected to the CPU? Why would you want to do that in an SoC? Seems like an incredible waste of potential BW when you're not doing GPU-intensive tasks.

But even if it is as they say, it's weird that CPU performance doesn't increase one bit. You'd expect at least some improvement if the GPU traffic had been off-loaded to a different MC? That would make for an interesting separate benchmark.
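
A minimal sketch of such a benchmark (illustrative Python with arbitrary array sizes; nothing here is from the review): stream arrays far larger than the caches so the timing approximates the sustained DRAM bandwidth the CPU alone can reach, then compare A5 vs. A5X.

[code]
import time
import numpy as np

# Arrays much larger than any L2 cache, so each pass really hits DRAM.
N = 64 * 1024 * 1024                     # 64M floats, ~256MB per array
src = np.ones(N, dtype=np.float32)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):                       # best-of-five to dodge noise
    t0 = time.perf_counter()
    np.copyto(dst, src)                  # full read + full write per pass
    best = min(best, time.perf_counter() - t0)

moved_gb = 2 * src.nbytes / 1e9          # bytes read plus bytes written
print(f"~{moved_gb / best:.1f} GB/s sustained copy bandwidth")
[/code]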
 
Two dedicated MCs that aren't even connected to the CPU? Why would you want to do that in an SoC? Seems like an incredible waste of potential BW when you're not doing GPU-intensive tasks.

But even if it is as they say, it's weird that CPU performance doesn't increase one bit. You'd expect at least some improvement if the GPU traffic had been off-loaded to a different MC? That would make for an interesting separate benchmark.

The GPU traffic isn't off-loaded to a different memory controller.
 
Two dedicated MCs that aren't even connected to the CPU? Why would you want to do that in an SoC? Seems like an incredible waste of potential BW when you're not doing GPU-intensive tasks.

But even if it is as they say, it's weird that CPU performance doesn't increase one bit. You'd expect at least some improvement if the GPU traffic had been off-loaded to a different MC? That would make for an interesting separate benchmark.
Anand mentioned the bottleneck in Cortex-A9 designs is the L2 cache controller. I'm not sure of the details, but he said the A15 corrects things.

As Pressure mentioned, it's not likely the GPU and CPU have different connections to the 4 memory controllers. Instead, all 4 memory controllers are attached to the GPU. Attaching the memory controllers to the GPU and having the CPU hang off the GPU seems like a very console-like design for the embedded space. I'm guessing the interface between the GPU and CPU is 64-bit, which explains why the memory bandwidth the CPU sees is unchanged.
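
If that guess is right, the back-of-envelope arithmetic works out neatly (my own numbers, not confirmed by the review): a 64-bit path at LPDDR2-800 rates caps out at exactly the plain A5's figure, while the GPU's 128-bit width gets the full doubling.

[code]
def bandwidth_gbps(bus_bits, transfers_per_sec):
    # bytes per transfer times transfer rate, reported in GB/s
    return (bus_bits / 8) * transfers_per_sec / 1e9

lpddr2_800 = 800e6                                    # transfers per second
print(f"CPU path,  64-bit: {bandwidth_gbps(64, lpddr2_800):.1f} GB/s")   # 6.4
print(f"GPU path, 128-bit: {bandwidth_gbps(128, lpddr2_800):.1f} GB/s")  # 12.8
[/code]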

As an aside, wasn't the + in the Vita's SGX543MP4+ for the GPU having its own memory controller? Of course, the CPU doesn't share the Vita GPU's memory. Does anyone know the Vita's GPU memory bandwidth for comparison?

Yeah, I did read something like that before; it's getting difficult to pick out just what is a core and what isn't :rolleyes:

Anand's review states the A5X has 12.8GB/s of bandwidth... and Sammy states the Exynos 5250 also has 12.8GB/s... with Sammy of course making both... does that mean we will see a similar design in Exynos? (Quad-channel? Or LPDDR3? :???:)
The A5X achieves 12.8GB/s of bandwidth by using LPDDR2-800 and 4 32-bit memory controllers, while the Exynos 5250 appears to reach 12.8GB/s of bandwidth using LPDDR3-1600 and 2 32-bit memory controllers. Apple presumably did it their way because it's easier to just double the existing memory controllers than to design a new one for LPDDR3. As Anand mentions, you need a sufficiently large die with a sufficiently large perimeter to have the space to put 4 memory controllers along the edge. With the Exynos 5250 being 32nm and Apple being known for their unusually large dies, it's safe to assume the Exynos 5250 will be quite a bit smaller than the A5X, meaning there likely isn't enough room to place 4 memory controllers.
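
To make the arithmetic explicit (this just restates the figures above), both parts land on the same 12.8GB/s via opposite channel/speed trade-offs:

[code]
def bandwidth_gbps(channels, bus_bits, transfers_per_sec):
    # channels times bytes per transfer times transfer rate, in GB/s
    return channels * (bus_bits / 8) * transfers_per_sec / 1e9

print(f"A5X  (4 x 32-bit LPDDR2-800):  {bandwidth_gbps(4, 32, 800e6):.1f} GB/s")
print(f"5250 (2 x 32-bit LPDDR3-1600): {bandwidth_gbps(2, 32, 1600e6):.1f} GB/s")
[/code]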
 
The L2 of the A9 hangs off the main AXI bus, which somewhat limits its bandwidth in both fetching and feeding. Moreover, the A9's LDST queues aren't exactly big, so hiding latency with multiple outstanding requests isn't going to be very easy. More bandwidth channels wouldn't necessarily help here.
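
For the latency-hiding point, Little's Law gives a rough ceiling (the line size and latency below are assumed for illustration, not measured A9 figures): achievable bandwidth is bounded by outstanding misses times line size divided by memory latency, so shallow LDST queues cap throughput no matter how wide the DRAM interface is.

[code]
line_bytes = 32            # Cortex-A9 cache line
latency = 150e-9           # assumed round-trip DRAM latency, in seconds

for outstanding in (2, 4, 8):
    ceiling = outstanding * line_bytes / latency / 1e9
    print(f"{outstanding} outstanding misses -> ~{ceiling:.2f} GB/s ceiling")
[/code]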

However, a major problem in SoC MC design has been the balance between latency -- low latency being needed by the CPU -- and efficient use of bandwidth. Most of the components, such as the GPU and fixed-function hardware, can tolerate fairly high latency but are very sensitive to bandwidth constraints. So the MCs for those generally need to do a lot of request combining in order to achieve efficient bandwidth.

Needless to say, throwing the CPU requests into this mix can be pretty detrimental. I can easily see separate, asymmetrical memory controllers being beneficial in this case, since the ones servicing the CPUs can send requests as soon as they get them, minimizing latency, while the ones servicing the GPU can do a lot of request combining and queuing, thus using bandwidth efficiently.
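
A toy model of that split (entirely my own illustration, not the A5X's actual logic): the latency-optimized controller forwards each request at once, while the bandwidth-optimized controller holds requests and merges them into address-sorted bursts.

[code]
from collections import deque

class LatencyMC:
    """CPU-facing: issue every request immediately, no queuing."""
    def submit(self, addr):
        return [(addr,)]                   # one DRAM transaction per request

class BandwidthMC:
    """GPU-facing: queue requests and combine them into sorted bursts."""
    def __init__(self, burst=4):
        self.burst, self.q = burst, deque()
    def submit(self, addr):
        self.q.append(addr)
        if len(self.q) < self.burst:
            return []                      # hold it, hoping to combine more
        batch = tuple(sorted(self.q))      # address order maximizes row hits
        self.q.clear()
        return [batch]                     # one combined DRAM burst

cpu_mc, gpu_mc = LatencyMC(), BandwidthMC()
print(cpu_mc.submit(0x1000))               # served at once: low latency
for addr in (0x2000, 0x2040, 0x2080, 0x20C0):
    burst = gpu_mc.submit(addr)            # first three calls return []
print(burst)                               # fourth returns one 4-wide burst
[/code]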
 
Anand mentioned the bottleneck in Cortex-A9 designs is the L2 cache controller. I'm not sure of the details, but he said the A15 corrects things.

As Pressure mentioned, it's not likely the GPU and CPU have different connections to the 4 memory controllers. Instead, all 4 memory controllers are attached to the GPU. Attaching the memory controllers to the GPU and having the CPU hang off the GPU seems like a very console-like design for the embedded space. I'm guessing the interface between the GPU and CPU is 64-bit, which explains why the memory bandwidth the CPU sees is unchanged.

As an aside, wasn't the + in the Vita's SGX543MP4+ for the GPU having its own memory controller? Of course, the CPU doesn't share the Vita GPU's memory. Does anyone know the Vita's GPU memory bandwidth for comparison?


The A5X achieves 12.8GB/s of bandwidth by using LPDDR2-800 and 4 32-bit memory controllers, while the Exynos 5250 appears to reach 12.8GB/s of bandwidth using LPDDR3-1600 and 2 32-bit memory controllers. Apple presumably did it their way because it's easier to just double the existing memory controllers than to design a new one for LPDDR3. As Anand mentions, you need a sufficiently large die with a sufficiently large perimeter to have the space to put 4 memory controllers along the edge. With the Exynos 5250 being 32nm and Apple being known for their unusually large dies, it's safe to assume the Exynos 5250 will be quite a bit smaller than the A5X, meaning there likely isn't enough room to place 4 memory controllers.

Cool. Well, it seems to me then that the Exynos is going to be a lot smaller... a lot more power-efficient, and a lot more advanced ;)

Really hoping Sammy sticks one @ 1.7GHz in the GS3 :cool:
 
Needless to say, throwing the CPU requests into this mix can be pretty detrimental. I can easily see separate, asymmetrical memory controllers being beneficial in this case, since the ones servicing the CPUs can send requests as soon as they get them, minimizing latency, while the ones servicing the GPU can do a lot of request combining and queuing, thus using bandwidth efficiently.
But don't both the CPU and the GPU have access to the full 1GB of memory, and therefore need to make use of all the memory controllers? Can you actually separate the memory controllers into CPU-focused and GPU-focused functions?
 
But don't both the CPU and the GPU have access to the full 1GB of memory, and therefore need to make use of all the memory controllers? Can you actually separate the memory controllers into CPU-focused and GPU-focused functions?

Sure you can. There are no ordering requirements between components. The GPU isn't coherent with the CPU, and the various CPUs enforce coherency at the L2 level.

Even today, many MC designs prioritize CPU requests -- evicting slots in a combined request and re-issuing them at the back of the queue to make room for a CPU request -- in order to satisfy that latency sensitivity. However, such logic adds extra pipelining, which adds extra latency....
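
A toy version of that eviction policy (my own sketch, with an assumed queue structure; the extra pipeline stage such logic implies isn't modeled): the CPU request goes straight to DRAM while the next-to-issue GPU slot gets bumped to the back of the line.

[code]
from collections import deque

class PriorityMC:
    def __init__(self):
        self.pending = deque()            # combined GPU bursts awaiting issue

    def gpu_request(self, addr):
        self.pending.append(("gpu", addr))

    def cpu_request(self, addr):
        if self.pending:
            victim = self.pending.popleft()   # evict the next GPU slot...
            self.pending.append(victim)       # ...re-issue it at the back
        return ("cpu", addr)                  # CPU request issues right away

mc = PriorityMC()
for addr in (0x000, 0x040, 0x080, 0x0C0):
    mc.gpu_request(addr)
print(mc.cpu_request(0xF000))    # served with minimal latency
print(list(mc.pending))          # the bumped GPU request now waits at the back
[/code]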
 
The A5X achieves 12.8GB/s of bandwidth by using LPDDR2-800 and 4 32-bit memory controllers while Exynos 5250 appears to reach 12.8GB/s of bandwidth using LPDDR3-1600 and 2 32-bit memory controllers. Apple presumably did it their way because it's easier to just double the existing memory controllers than design a new one for LPDDR3.

Doubt difficulty has anything to do with it. One factor is availability: a month ago Samsung predicted that LPDDR3 could be deployed in products "by the end of the year". There was not a snowball's chance in hell of having ramped volumes in preparation for the iPad launch; we are talking roughly a year too late.
I haven't checked device parameters yet, but all things being equal, driving the interface at twice the frequency comes at a cost in power draw. I'm sure there are people here who can quickly supply figures.
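
The usual first-order approximation (with illustrative component values, not actual LPDDR pad figures): switching power scales as P ~ a*C*V^2*f, so doubling the interface frequency at the same voltage roughly doubles I/O power, unless voltage or capacitance come down with it.

[code]
def switching_power_w(alpha, cap_f, volts, freq_hz):
    # classic dynamic-power model: activity * capacitance * V^2 * frequency
    return alpha * cap_f * volts**2 * freq_hz

slow = switching_power_w(0.5, 20e-12, 1.2, 400e6)   # assumed pad, LPDDR2-ish
fast = switching_power_w(0.5, 20e-12, 1.2, 800e6)   # same pad, doubled clock
print(f"relative power at 2x frequency: {fast / slow:.1f}x")   # -> 2.0x
[/code]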

Must say I'm both surprised and impressed by Apple's move here. I figured there were three options available to increase bandwidth but suspected they might use none of them. And they used a fourth that opens a straightforward path to yet another doubling. Marketing bragging rights didn't require this; they actually design for performance.
 
When the resolution gap was relatively small because the iPhone had moved up to its high pixel density display yet the iPad still hadn't, the two product lines could reasonably share the same SoC. Now that the iPad has moved up to its own high density display, I assume it'll get the "X" variant of each SoC going forward.

So, the next iPad will get an A6X while the iPhone will get the A6. While a small possibility exists that this year's iPhone gets the 32/28nm G64xx-based A6, it's more likely that it gets a process-shrunk A5 this year, as speculated in the Anandtech review. The high-powered, tablet-focused A6X should then debut with the 2013 iPad, followed by the phone-targeted A6 in the 2013 iPhone.
 
When the resolution gap was relatively small because the iPhone had moved up to its high pixel density display yet the iPad still hadn't, the two product lines could reasonably share the same SoC. Now that the iPad has moved up to its own high density display, I assume it'll get the "X" variant of each SoC going forward.

So, the next iPad will get an A6X while the iPhone will get the A6. While a small possibility exists that this year's iPhone gets the 32/28nm G64xx-based A6, it's more likely that it gets a process-shrunk A5 this year, as speculated in the Anandtech review. The high-powered, tablet-focused A6X should then debut with the 2013 iPad, followed by the phone-targeted A6 in the 2013 iPhone.

Interesting opinion. It does seem a very logical thing to do.
Do you think we will see an Apple brand-wide design theme? The current one is getting long in the tooth now.

EDIT: Do you think there is a possibility of the A6X using next-gen DDR4?? This site says Sammy might have some working for next year...
http://spectrum.ieee.org/semiconductors/processors/six-paths-to-longer-battery-life
 
When the resolution gap was relatively small because the iPhone had moved up to its high pixel density display yet the iPad still hadn't, the two product lines could reasonably share the same SoC. Now that the iPad has moved up to its own high density display, I assume it'll get the "X" variant of each SoC going forward.

So, the next iPad will get an A6X while the iPhone will get the A6. While a small possibility exists that this year's iPhone gets the 32/28nm G64xx-based A6, it's more likely that it gets a process-shrunk A5 this year, as speculated in the Anandtech review. The high-powered, tablet-focused A6X should then debut with the 2013 iPad, followed by the phone-targeted A6 in the 2013 iPhone.

There is a lot of logic in what you say from an engineering sense. However, from a marketing sense it poses some difficulties. Many rumours are suggesting that the next iPhone will be visually/ergonomically different, and likely with a bigger screen. But marketing will probably want to shout about new internals too. It's probably not enough for them to talk about a dual-core A9 (even with an uprated clock), which by launch (Sept/Oct?) will be far from leading edge in the tick-box stakes.
 
The 2012 iPhone, especially if they launch as late in the year as they did last year, does give them a perfect opportunity to launch with a 32/28nm G64xx-based SoC, so I definitely allow for the possibility. Also, Apple could re-emphasize the iPhone as the lead iOS device again by launching it with an A6 and then following with the higher-powered, tablet-focused A6X for next year's iPad.

I can see it both ways. The picture will become clearer in the coming months. The bigger question to me right now is whether iOS 6 will finally embrace OpenCL, especially considering Apple first proposed the spec. They're certainly taking some time with that, and iPhoto could certainly stand to benefit from more/any GPGPU.
 