Next-Gen iPhone & iPhone Nano Speculation

Discussion in 'Mobile Devices and SoCs' started by Arun, Jun 19, 2011.

  1. TheAlSpark

    TheAlSpark Moderator Moderator Legend

    Hm... on the one hand Micron does apparently have 16Gbit LPDDR3 available.

    On the other hand, the first wave of LPDDR4 may still only be 8Gbit, so if board space isn't an issue, they could do two chips there to hit 2GB.

    ...maybe?
     
  2. Alexko

    Alexko Veteran Subscriber

    Aren't mobile RAM chips meant to be stackable, package-on-package? (Honest question, I really don't know but that would make sense to me.)
     
  3. anexanhume

    anexanhume Veteran

    Yes, Apple utilizes package on package (PoP) for iPhone SoC packages. iPad DRAM is off chip though, and they use metal covers on those.
     
  4. Grall

    Grall Invisible Member Legend

    The A6X had off-chip DRAM due to the 128-bit bus, but I don't think current A7 iPads have it. I'm not sure, I haven't studied any teardowns intensely, but if it's off-chip again it might be for improved thermal dissipation, in order to clock the SoC higher than in iPhone.
     
  5. The A7 uses 800MHz/1600MT/s LPDDR3 in a dual-channel configuration.
    It has less overall bandwidth than the A6X's quad-channel 1066MT/s LPDDR2 (12.8GB/s vs. 17.06GB/s), so the iPad 4 is technically less bandwidth-limited than the iPad Air, though the more recent GPU probably compensates for that somehow (more cache?).
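
    For anyone wanting to redo the arithmetic, here is a minimal sketch of the peak-bandwidth calculation. It assumes each LPDDR2/LPDDR3 channel is 32 bits wide (so dual-channel is 64-bit and quad-channel is the 128-bit bus mentioned earlier in the thread) and counts 1 GB as 10^9 bytes. The function and variable names are made up for illustration.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


# Assumption: every LPDDR2/LPDDR3 channel is 32 bits wide.
CHANNEL_WIDTH_BITS = 32

# A7: dual-channel LPDDR3 at 1600 MT/s.
a7 = peak_bandwidth_gbs(1600, 2 * CHANNEL_WIDTH_BITS)
# A6X: quad-channel LPDDR2 at 1066 MT/s.
a6x = peak_bandwidth_gbs(1066, 4 * CHANNEL_WIDTH_BITS)

print(f"A7:  {a7:.2f} GB/s")
print(f"A6X: {a6x:.2f} GB/s")
```

    These are theoretical peaks only; sustained bandwidth depends on the memory controller and access pattern.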
     
  6. anexanhume

    anexanhume Veteran

    The Retina mini does not have it (it has PoP like the 5S). The iPad Air does have DRAM off chip and a metal cover/heat spreader. Prior to the iPad Air, I would have thought they'd do away with that and do all PoP. I have to think eventually they want to work towards Wide I/O with SiP or TSV.

    Sorry, where was that speed confirmed? I had been assuming 1333MT/s.
     
    Last edited by a moderator: Jul 17, 2014
  7. https://d3nevzfk7ii3be.cloudfront.net/igi/XR6jJPopqjgw4LfW
    http://www.micron.com/products/dram/mobile-lpdram#fullPart&306=2

    The part number of the LPDDR3 is Elpida F8164A1MD-GD-F with the GD corresponding to LPDDR3-1600. JD is LPDDR3-1866. No idea what LPDDR3-1333 would have been.
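
    As a rough illustration of reading the speed grade off a part number like the one above, here is a small lookup. It only knows the two codes mentioned in this post (GD and JD), assumes the speed code is the second hyphen-separated field, and the helper name is hypothetical.

```python
from typing import Optional

# Speed codes taken from the post above; any other code is unknown here.
SPEED_CODES = {
    "GD": "LPDDR3-1600",
    "JD": "LPDDR3-1866",
}


def speed_grade(part_number: str) -> Optional[str]:
    """Return the speed grade for a part number, or None if it is not recognised.

    Assumes the speed code is the second hyphen-separated field,
    as in 'F8164A1MD-GD-F'.
    """
    fields = part_number.split("-")
    if len(fields) < 2:
        return None
    return SPEED_CODES.get(fields[1])


print(speed_grade("F8164A1MD-GD-F"))
```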


    On the A8, is there a reason why core counts tend to be powers of two or even numbers? Are there technical advantages or is it just because a 2x increase is easier to market and is enabled by doubled transistor counts each full node? The Metal documentation uses 3 threads in their multithreading examples. There probably isn't any deeper meaning there, but it seems sensible for them to use all approaches to increase CPU performance: bump the core count to 3 cores, continued microarchitecture improvements, modest clock speed increases. Apple seems to like using transistors rather than clock speeds to improve performance since they can eat the cost and save the power so the rumours that they would jump from 1.3 GHz to 2 GHz or more seem extreme.
     
  8. anexanhume

    anexanhume Veteran

    Thanks, I was using an outdated Elpida PDF that didn't have those codes listed (or even LPDDR3).

    Well, they've been able to advance ISA/microarchitecture in each of the last 3 iPhone iterations (Cortex-A8 -> Cortex-A9 -> ARMv7s -> ARMv8-A). There's no new ISA or reference design to jump to this time around, so they'll have to stick to microarchitecture improvements to their own design. They've also claimed a 2x speedup for the last three SoC iterations, and I think they'll have a tough time getting there without increasing core count or drastically increasing clock speed. However, here we are nearing a full year later and they're still the only mobile device on the market with an ARMv8-A design, so I'm done underestimating them :grin:

    On the 3 core (or more) idea, I'm guessing they'd really want to bleed the stone on microarchitecture improvements now that they've added a big L3 cache. Defining the bus interconnects for all of that is likely a pain and uses a fair amount of die space. I'm sure they have an informed trade-off of clock speed/core count/core complexity that's driving their decisions though.
     
  9. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    I don't know what they've planned, but let's assume they have a G6630@=/>600MHz in the A8. That doesn't follow your reasoning above either, as it's 50% more area (6 clusters vs. 4 in A7) and =/>35% higher frequency. And yes, that theory makes more sense than, say, a G6430@770MHz, albeit the latter is not technically impossible at all.

    The point here is that it depends on whether N architecture is laid out for excessive frequencies or not. Rogue as an architecture is far more tolerant of them afaik compared to its predecessors; after all, unless I'm reading something wrong out of the Allwinner A80 Manhattan results, it's clocked at ~780MHz. Yes, a dual cluster G6230 is far easier to clock higher than a four or more cluster config, but it shouldn't mean that just because an IHV has a track record of N hw strategy, any change to that would be absolutely taboo.
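
    To put rough numbers on the two hypothetical configs in this post, here is a back-of-the-envelope sketch. It assumes GPU throughput scales linearly with cluster count times clock, which is an idealisation that ignores bandwidth and thermal limits. The A7 GPU clock is not stated in the thread; it is only implied here by taking 600MHz as exactly 35% higher.

```python
def throughput_scale(cluster_ratio: float, clock_ratio: float) -> float:
    """Idealised scaling: throughput ~ clusters x clock (assumption)."""
    return cluster_ratio * clock_ratio


CLUSTERS_A7 = 4          # G6430 in A7, per the post
CLUSTERS_G6630 = 6       # hypothetical A8 config, per the post
CLOCK_G6630_MHZ = 600.0  # lower bound quoted in the post
CLOCK_UPLIFT = 1.35      # "=/>35% higher frequency"

# Baseline clock implied by the two figures above (not a confirmed spec).
implied_a7_clock_mhz = CLOCK_G6630_MHZ / CLOCK_UPLIFT

# Option 1: go wider and a bit faster (G6630 @ 600MHz).
wide = throughput_scale(CLUSTERS_G6630 / CLUSTERS_A7, CLOCK_UPLIFT)

# Option 2: same width, much faster (G6430 @ 770MHz).
fast = throughput_scale(1.0, 770.0 / implied_a7_clock_mhz)

print(f"implied A7 GPU clock: ~{implied_a7_clock_mhz:.0f} MHz")
print(f"G6630 @ 600MHz: ~{wide:.2f}x")
print(f"G6430 @ 770MHz: ~{fast:.2f}x")
```

    Under those assumptions the wider config lands near the 2x mark while the high-clock config falls short of it.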
     
  10. anexanhume

    anexanhume Veteran

    After their aggressive ARMv8-A adoption, I'm half expecting a GX6450 or GX6650. I'm guessing the finer power gating is extremely attractive, and Metal can potentially compensate for the lack of big clock boosts or USC growth.
     
  11. mavere

    mavere Newcomer

    TSMC's 20nm is much more of a density story than a perf/power story. Between that and the suspected roomier 4.7" housing and the expected wafer allotment (up to 40k/month in Q3, 50+k wpm throughout Q4), I think Apple will have a lot of new transistors to spend on something.

    Maybe it'll be for largest possible PowerVR design, 3rd CPU core, larger cache, or even a fancy on-die voltage regulator to turbo to rumored clockspeeds without sacrificing battery life. Or maybe they'll just stick to the original A7 architecture and draw happy faces on the spare die area.
     
  12. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    No one has yet seen how 6XT cores perform in real time; however, from the layman's observer corner where I'm standing, I'm not all that optimistic that the claimed up to 50% over the 6430/6630 holds up as an average too.

    The only hw change they've revealed for 6XT are additional FP16 ALUs; so far the 6200 vs. 6230 & 6400 vs. 6430 (whereby on 6200/6400 you don't have any FP16 ALUs and no framebuffer compression) haven't shown any groundbreaking performance differences, so I'm obviously missing something.
     
  13. anexanhume

    anexanhume Veteran

    That's why I'm guessing it's not about peak performance improvement. Better idle power and improved memory utilization via compression techniques seem desirable in and of themselves.
     
  14. Grall

    Grall Invisible Member Legend

    How well would framebuffer compression work anyway? Even non-realtime lossless compression schemes don't get you all that far really.
     
  15. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    I don't recall what they call the feature where you can power gate 2 clusters at a time, but it's also present in the 6630, as well as framebuffer compression and what not. 6XT cores also support ASTC in hw, but that rather means more transistors dedicated to it. In terms of FP16 ALUs it's 288 SPs in the 6630 and 384 in the 6650.

    I'd say, as with all singled-out aspects, it just further contributes to efficiency like a multitude of other factors. You know marketeers would kill for even, say, a 5% performance difference :D
     
  16. silent_guy

    silent_guy Veteran Subscriber

    I can't find the link that refers to the compression, but even if it's pure lossless compression without any additional context bits, I think it could do really quite well (say, 40% reduction?) in a lot of cases. And for cases where it doesn't, it's still not a big deal: I expect the power cost of having it enabled to be less than the high cost of transferring data to DRAM.

    And you could imagine it being adaptive: when a frame doesn't compress sufficiently to be worth it, one could try it once every 100 frames or so to see if the workload has changed enough for it to be useful.
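
    The adaptive idea could be sketched roughly like this. To be clear, this is not how PowerVR does it: zlib is just a software stand-in for whatever lossless scheme the hardware uses, and the 10% threshold, the class name and the method names are all invented for the example. Only the "probe once every 100 frames" interval comes from the post.

```python
import random
import zlib


class AdaptiveFrameCompressor:
    """Compress frames while it pays off; otherwise re-probe periodically."""

    def __init__(self, min_saving: float = 0.10, probe_interval: int = 100):
        self.min_saving = min_saving          # hypothetical "worth it" threshold
        self.probe_interval = probe_interval  # frames between probes when off
        self.enabled = True
        self.frames_since_probe = 0

    def submit(self, frame: bytes) -> bytes:
        """Return the bytes that would be written to DRAM for this frame."""
        if not frame:
            return frame
        if not self.enabled:
            self.frames_since_probe += 1
            if self.frames_since_probe < self.probe_interval:
                return frame  # compression is off: write the raw frame
            self.frames_since_probe = 0  # time to probe the workload again
        packed = zlib.compress(frame)
        saving = 1.0 - len(packed) / len(frame)
        self.enabled = saving >= self.min_saving
        return packed if self.enabled else frame


# Demo data: a flat frame compresses well, pseudo-random noise does not.
rng = random.Random(0)
flat = bytes(4096)
noisy = bytes(rng.getrandbits(8) for _ in range(4096))

comp = AdaptiveFrameCompressor()
print(len(comp.submit(flat)) < len(flat), comp.enabled)
print(comp.submit(noisy) == noisy, comp.enabled)
```

    After the noisy frame switches compression off, the next 99 frames go out raw and the 100th is used as a probe to decide whether to switch it back on.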
     
  17. anexanhume

    anexanhume Veteran

    I was under the impression that the power gating has a finer granularity with the GX series. Is that not the case?

    From Anandtech preview:

    http://www.anandtech.com/show/7629/...rchitecture-available-for-immediate-licensing
     
  18. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    Why do I have the impression that it's already present on 6630?

    http://www.imgtec.com/news/detail.asp?ID=706

    Else, if you're fine with just 2 clusters to run the device GUI and what not, the other 4 clusters can be power gated (besides obviously clock gating, since you don't necessarily need to run at full frequency for the GUI either). Whether they've now gained the ability to shut off at the single-cluster level as of 6XT, no idea, but it doesn't make all that much sense to me either, since you have 1 quad TMU coupled with 2 USCs at a time.
     
  19. anexanhume

    anexanhume Veteran

    Yes, but still finer grain (individual USCs), which was my question. Though I doubt the benefit of 1 USC vs 2 with the others power gated is all that dramatic.
     
  20. Ailuros

    Ailuros Epsilon plus three Legend Subscriber

    Hmmm, well, IMO if you have 6 clusters in total and power gate 5 of them, I'd still figure that at least one quad TMU would have to be active, which makes the gain against a 2 USC + 4 TMU active scenario rather negligible, as you say. Unless of course you can turn off part of each quad TMU also, but then they wouldn't state that 2 USCs share a texture pipeline at a time.
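
    A tiny sketch of that counting argument, assuming (as stated in the thread) that two USCs share one quad TMU and that a quad TMU cannot be partially gated. The function name is made up.

```python
import math

USCS_PER_QUAD_TMU = 2  # from the thread: 2 USCs share one quad TMU


def active_quad_tmus(active_uscs: int) -> int:
    """Quad TMUs that must stay powered for a given number of active USCs.

    Assumes a quad TMU is all-or-nothing (no partial gating).
    """
    return math.ceil(active_uscs / USCS_PER_QUAD_TMU)


# Per-USC gating (1 USC left on) vs. pair gating (2 USCs left on).
per_usc = (1, active_quad_tmus(1))
per_pair = (2, active_quad_tmus(2))

print(f"per-USC gating:  {per_usc[0]} USC + {per_usc[1]} quad TMU")
print(f"per-pair gating: {per_pair[0]} USC + {per_pair[1]} quad TMU")
```

    Either way one quad TMU stays on, so the only thing finer gating saves is a single USC.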
     