Next-Gen iPhone & iPhone Nano Speculation

Anyway, what other choice does Apple have currently? TSMC has more than enough demand to deal with at the moment.
With Intel slowly taking tentative steps toward welcoming foundry customers, there is a glimmer of possibility there. I suppose that to keep Apple's volumes from impacting Intel's own production or its SoC advantage, they would probably restrict any offer to Apple to a previous-gen process, i.e. 32nm. It would probably also require Intel concluding that iOS will retain a significant market share and that Apple won't be switching to x86 for iOS anytime soon, so Intel might as well take a share of the profits by manufacturing the SoCs.

On another note, even if CPU clock speeds haven't changed, I wonder how likely it is that they still made some tweaks a la S5PC100 > Apple A4, namely a larger L2 cache. Wouldn't setup and driver overhead mean more CPU performance is useful for keeping the expanded GPU fed?

http://www.arm.com/images/CortexA9_L2C310_Page_Render_Time.jpg

Interestingly, ARM reports the speedup going from 512KB to 1MB L2 cache is larger than going from 256KB to 512KB L2 cache.

The Cortex-A9 can also be configured with single or dual 64-bit AXI interfaces. Do we know if Apple was already using dual 64-bit AXI in the A5? If not, that seems like another avenue to increase CPU performance as well as to feed the GPU and maximize bandwidth from the memory controller.
 
Here's a blog post about the SHA.

http://www.displaysearchblog.com/20...r-of-pixels-into-its-new-ipad-retina-display/

What is interesting is that one of the commenters suggests that someone now working for Apple may have the patent on the SHA design.
http://www.linkedin.com/in/johnzhong

- One of original inventors of super-high aperture TFT technology using either an organic passivation or color filter on array (COA)
Apple's John Z. Zhong's LinkedIn profile reports him as a co-inventor of SHA during his previous employment at OIS, and the patents in question may be the ones below:

http://www.patentgenius.com/inventedby/ZhongJohnZZTroyMI.html

They are assigned to LG and OIS, so Apple probably can't claim exclusivity unless Zhong has continued his work at Apple with an improved custom design.
 
Tell that to AMD/GF ;) (No, seriously... there really isn't any clear information on when they are moving to 22/20nm. AMD's roadmap so far indicates that they'll continue on 32/28nm through 2013. Intel will be close to 14nm by that point.)

How I wish AMD would put out a decent CPU. Sigh....

But there's an entire doom and gloom thread for that
 
Moreover, why would using intra-tile parallelism suffer from latency with fewer parallel front-ends if latency can be masked with fetch pipelining?

Latency is an issue only if you don't have enough arithmetic. For something as regular as dense matrix multiplication, you can hide pretty much arbitrarily long latencies as long as you have enough on-chip RAM to hold the tiles.

For more irregular workloads, arithmetic intensity is lower. So if you clock your chip twice as high while keeping the memory subsystem the same, you will need twice the arithmetic to hide the same latency, or twice the threads in flight.
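
A rough back-of-envelope sketch of that relationship; the latency and issue-rate numbers below are invented purely for illustration:

```swift
// Back-of-envelope latency hiding via Little's law:
// work in flight needed ~= memory latency (in cycles) x issue rate (ops/cycle).
// The latency and issue-rate numbers are made up for illustration only.

let memoryLatencyNs = 200.0     // assumed DRAM round-trip latency
let issueRatePerCycle = 4.0     // assumed ALU ops issued per cycle

func opsInFlightNeeded(clockMHz: Double) -> Double {
    let latencyCycles = memoryLatencyNs * clockMHz / 1000.0   // ns -> cycles at this clock
    return latencyCycles * issueRatePerCycle
}

// Doubling the clock doubles the latency measured in cycles, so you need twice
// the independent arithmetic (or twice the threads) in flight to stay busy.
print(opsInFlightNeeded(clockMHz: 250))   // 200.0
print(opsInFlightNeeded(clockMHz: 500))   // 400.0
```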
 
Latency is an issue only if you don't have enough arithmetic. For something as regular as dense matrix multiplication, you can hide pretty much arbitrarily long latencies as long as you have enough on-chip RAM to hold the tiles.

For more irregular workloads, arithmetic intensity is lower. So if you clock your chip twice as high while keeping the memory subsystem the same, you will need twice the arithmetic to hide the same latency, or twice the threads in flight.

Well, yes. My question is: how regular is the average scene in terms of data access patterns? How often do you need to go back and access vertex 0 after initially processing it in today's mobile GPUs? I know the more complicated GPGPU features require large scratchpads due to support for loops and flow control, but what's the status in an average OpenGL ES 2.0/3.0 game?
 
Both are 543MP4. The iPad's is clocked at least 50 MHz higher. The "+" customization on the Vita's is said to be a relatively minor addition to feature support.

While the iPad has good memory performance, the Vita has its own dedicated video RAM. The iPad has more total RAM, but apps are more limited in how they can use it.

There's far less abstraction in the API for the Vita and less overhead from the OS.

As for how a 543MP4 stacks up in general:
16 Vec4 all-purpose ALUs in a design yielding ~28.8 GFLOPS @ 200MHz
8 TMUs, so 1.6 Gtex/sec @ 200MHz
64 Z units, so 12.8 Gpix/sec Z/stencil @ 200MHz
Rated for ~130M+ tri/sec @ 200MHz

Performance has to be considered in the context of a TBDR, so the benefits of a tile buffer and 100% efficiency for texel rate also apply.

The Vita and the iPad both have NEON on the CPU side.
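
For what it's worth, those rates fall straight out of unit count times clock; a quick sketch (the ~9 FLOPs per ALU per cycle is just what the quoted 28.8 GFLOPS implies, not a confirmed figure):

```swift
// Arithmetic behind the quoted SGX543MP4 rates at 200 MHz.
// The ~9 FLOPs per ALU per cycle is only what the 28.8 GFLOPS figure implies;
// the rest is simply unit count x clock.

let clockHz = 200e6

let gflops      = 16.0 * 9.0 * clockHz / 1e9   // 28.8 GFLOPS (16 Vec4 ALUs)
let gtexPerSec  = 8.0  * clockHz / 1e9         // 1.6 Gtex/s  (8 TMUs)
let gpixZPerSec = 64.0 * clockHz / 1e9         // 12.8 Gpix/s Z/stencil (64 Z units)

print(gflops, gtexPerSec, gpixZPerSec)
```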

Thanks for this, appreciated.
 
Notes that it requires twice as many LEDs for the backlight, which may explain some of the increased battery capacity.
Isn't SHA supposed to let more of the backlight through? More LEDs could be used for higher peak brightness or better uniformity, but unless the LEDs are less efficient (which would be odd) I don't see why they would consume more power to generate the same brightness.
 
So, the 250 MHz assumption seems to be right on, and the pixel and triangle rates appear to have scaled roughly in proportion.

The frame rate of Egypt Offscreen fell somewhat short, yet it probably has some room to close the gap a little on subsequent test runs. I doubt it'll ever come close to a 2x gain on that particular test, though. The 1080p GLBenchmark 2.5 might indeed be a better measure.
 
So, the 250 MHz assumption seems to be right on, and the pixel and triangle rates appear to have scaled roughly in proportion.

The frame rate of Egypt Offscreen fell somewhat short, yet it probably has some room to close the gap a little on subsequent test runs. I doubt it'll ever come close to a 2x gain on that particular test, though. The 1080p GLBenchmark 2.5 might indeed be a better measure.

Ironically, it's almost 3.0x ahead in the PRO offscreen test compared to the fastest T30 in the database. I still don't consider a pre-OpenGL ES 2.0 synthetic benchmark of any particular relevance nowadays.

2.5 should be a lot better, since the bandwidth weighting at 1080p is obviously higher, as you imply, and it also seems to raise complexity significantly compared to 2.1.

Wait, I just noticed: the iPad 3 lists a display resolution of 1024*768: http://www.glbenchmark.com/phonedetails.jsp?benchmark=glpro21&D=Apple+iPad+3&testgroup=system

Is that just a software flaw, or is the iPad 3 running 3D only at 1024*768?
 
I don't think they would have moved to an MP4 if the resolution wasn't accessible in 3D. But it's a given that the non-offscreen tests are running in 1024x768, otherwise I don't think they'd get a solid 60fps like they do.

I assume the iPad 3 is capable of running in what appears to be a native 1024x768 resolution, for iPad 1 and 2 backwards compatibility (as well as 480x320 for its iPhone compatibility; not sure if it ever did add 960x640 support). It could be that multiple available resolutions are enumerated but GLBenchmark doesn't list more than the first one, or that apps have to do something to make it accessible.
 
Is that just a software flaw, or is the iPad 3 running 3D only at 1024*768?
Well, Namco said in their game demo that the iPad 3's increased graphics power allowed higher graphical detail, and that the increased resolution makes sure all that detail is conveyed to the user. So it certainly sounds like the full Retina display resolution is being used in 3D games.

I'm a little surprised those low-level 3D tests get perfect 2x scaling from the iPad 2 to the iPad 3 when they don't hit the 60fps cap. Assuming the clock speed is the same, the MP architecture looks extremely efficient. I wonder if the lower scaling in the offscreen game tests is because of poorer scaling of the GPU architecture outside of low-level theoreticals, bandwidth starvation, being CPU limited, or a combination? The Apple A5 already used 2x32-bit LPDDR2-800. Presuming Apple didn't do anything radical, they could only have gone to LPDDR2-1066 for a 33% bandwidth increase. If it is a CPU bottleneck, then it's curious why they didn't bump the CPU clock, since with them quadrupling the pixels but only doubling the GPU, you'd think they'd want to extract every bit of performance from the GPU.
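
A quick sketch of the peak-bandwidth arithmetic behind that 33% figure (peak numbers only, ignoring real-world efficiency):

```swift
// Peak bandwidth arithmetic behind the LPDDR2-800 vs LPDDR2-1066 comparison.
// Peak bandwidth = bus width in bytes x transfer rate; efficiency is ignored.

let busWidthBytes = 2.0 * 32.0 / 8.0   // 2 x 32-bit channels = 8 bytes per transfer

func peakGBps(_ megaTransfersPerSec: Double) -> Double {
    return busWidthBytes * megaTransfersPerSec / 1000.0
}

let lpddr2_800  = peakGBps(800)    // 6.4 GB/s
let lpddr2_1066 = peakGBps(1066)   // ~8.5 GB/s
print(lpddr2_1066 / lpddr2_800)    // ~1.33, i.e. the 33% increase
```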
 
Applications need to be updated to set a scale factor if they want to take advantage of the higher resolution in OpenGL ES, just as they did when the iPhone 4 was introduced. Otherwise they'll just continue to use 1024x768 as logical resolution, though standard UI elements automatically make use of the higher resolution to provide higher quality.
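
For illustration, a minimal sketch of what that opt-in looks like, assuming a CAEAGLLayer-backed view; the class name is a placeholder, and only contentScaleFactor and UIScreen.main.scale are actual UIKit pieces:

```swift
import UIKit
import QuartzCore

// Minimal sketch of the Retina opt-in for an OpenGL ES view: raising
// contentScaleFactor before the renderbuffer storage is (re)created gives a
// 2048x1536 drawable instead of the default 1024x768 logical resolution.

class GLView: UIView {
    override class var layerClass: AnyClass { return CAEAGLLayer.self }

    override init(frame: CGRect) {
        super.init(frame: frame)
        // Request a backing store at the screen's native scale (2.0 on Retina).
        contentScaleFactor = UIScreen.main.scale
        // ...then create the framebuffer/renderbuffer from the layer as usual.
    }

    required init?(coder: NSCoder) { fatalError("not used in this sketch") }
}
```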
 
Is there any reason going to a 2x resolution would require a change in CPU speed? Could they really get away with sticking with 2x 1GHz Cortex-A9 CPUs? I know the GPU does most of the heavy lifting in supporting the new screen resolution, but certainly when loading/manipulating assets, dealing with 4x the pixels puts extra work on the CPU (especially in apps like iPhoto). I'm surprised no one bothered to run SunSpider, as imperfect as that test might be.
 
Is there any reason going to a 2x resolution would require a change in CPU speed? Could they really get away with sticking with 2x 1GHz Cortex-A9 CPUs? I know the GPU does most of the heavy lifting in supporting the new screen resolution, but certainly when loading/manipulating assets, dealing with 4x the pixels puts extra work on the CPU (especially in apps like iPhoto). I'm surprised no one bothered to run SunSpider, as imperfect as that test might be.

I would have thought that iPhoto would be heavily GPU accelerated. With Core Image and stuff like that all over the place.
 
I would have thought that iPhoto would be heavily GPU accelerated. With Core Image and stuff like that all over the place.

Huh, I guess it can. Totally forgot about the low-level hardware-accelerated APIs. I've never had to use them in any of my apps.
 
In 3D games, doubling the resolution generally doesn't put more pressure on the CPU, assuming other conditions remain the same. The same can be observed on the PC.

Conditions can be different, of course. A game (or any application) designed for a higher resolution may require larger images, which take more CPU power to decode. However, in most cases this only affects loading time (assuming the memory is large enough). Geometry-based graphics (circles, lines, vector graphics, etc.) also need more CPU power to draw.

Text rendering at high resolution also takes more CPU power, but text can be cached (though this may not work very well for scripts with many characters, such as CJK ideographs).
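
A minimal sketch of that caching idea, assuming UIKit; the helper names and cache policy here are made up for illustration:

```swift
import UIKit

// Sketch of the "cache rendered text" idea: rasterize a string once at the
// device scale and reuse the image, instead of redrawing the glyphs every time.

let renderedTextCache = NSCache<NSString, UIImage>()

func cachedImage(for text: String, font: UIFont) -> UIImage {
    if let hit = renderedTextCache.object(forKey: text as NSString) {
        return hit
    }
    let attributes: [NSAttributedString.Key: Any] = [.font: font]
    let size = (text as NSString).size(withAttributes: attributes)
    let image = UIGraphicsImageRenderer(size: size).image { _ in
        (text as NSString).draw(at: .zero, withAttributes: attributes)
    }
    renderedTextCache.setObject(image, forKey: text as NSString)
    return image
}
```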
 