Next-Gen iPhone & iPhone Nano Speculation

My bad then.
The only problem I notice is that you forgot to think of frequency. What if, purely hypothetically, you have an MP4 clocked at 500MHz vs. the current MP2 clocked at 250MHz: is it still only a 2x increase, or could it coincidentally be 4x this time? Not that I expect anything like that, but the entire paragraph above is still missing one rather important detail, which is frequency, and frequency does have a tendency to increase at least slightly with smaller manufacturing processes.

The condition I mentioned assumed that Apple will use the same frequency as the MP2, or close to it anyway.

The reason for that is simple: if they go for the MP4, the power usage when all four cores are stressed will double (twice the number of cores).

You're right that it can be offset by the smaller manufacturing process, plus maybe a small increase in MHz.

But going from an MP2 @ 250MHz to an MP4 @ 500MHz is something different.

MP2 @ 250MHz => 1x power load
MP4 @ 250MHz => 2x power load
MP4 @ 500MHz => at least 4x power load (2x from the extra cores, 2x or more from the frequency)

This comparison is a bit rudimentary, but it gets the point across.

Doubling each core's frequency also increases the power load on each core. If there is one thing we all know about any CPU/GPU, it's that the higher the frequency, the more the power requirements skyrocket.
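To put some rough numbers on that, here is a back-of-the-envelope sketch. It assumes dynamic power scales roughly as cores x V^2 x f, and the ~25% voltage bump for the 500MHz case is purely my guess for illustration:

[code]
# Rough dynamic-power sketch: P ~ cores * V^2 * f (capacitance folded in).
# The voltage figures are illustrative guesses, not measured values.
def rel_power(cores, freq_mhz, volts, base=(2, 250, 1.0)):
    """Power relative to an MP2 @ 250MHz baseline."""
    b_cores, b_freq, b_volts = base
    return (cores * freq_mhz * volts ** 2) / (b_cores * b_freq * b_volts ** 2)

print(rel_power(2, 250, 1.0))   # MP2 @ 250MHz -> 1.0x
print(rel_power(4, 250, 1.0))   # MP4 @ 250MHz -> 2.0x
print(rel_power(4, 500, 1.25))  # MP4 @ 500MHz with ~25% more voltage -> ~6.3x
[/code]

If the higher clock really does need more voltage, the 500MHz case lands well above 4x, which is exactly why I doubt it.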

Now, an MP4 @ 500MHz would give a theoretical performance increase by a factor of 4 (compared to an MP2 @ 250MHz), making it in theory even faster than Rogue.

The A5 is currently made at 45nm. Let's assume the A6 is made at 28nm (20nm is still too far in the future).

From my years of experience with PC hardware, a smaller manufacturing process does not magically allow the frequency to double. And that does not take into account going from MP2 to MP4.

But, realistically? It does not sound very plausible. An MP4 @ 300 or 350MHz I can still believe, but jumping to 500MHz... Even the Sony Vita's MP4 core speed is unknown, but it was rumored that they had heat problems with it in the past, and heat in general means high power consumption.


There are some rumors flying that the iPad 3 is supposed to have a bigger battery. Maybe it's because of the increase in CPUs/GPUs, but my bet is that the Retina display is sucking in most of the power. In any smartphone/tablet the main power consumer is the screen (unless you run the CPU/GPU at full load all the time, of course).
 
The condition I mentioned assumed that Apple will use the same frequency as the MP2, or close to it anyway.

The reason for that is simple: if they go for the MP4, the power usage when all four cores are stressed will double (twice the number of cores).

You're right that it can be offset by the smaller manufacturing process, plus maybe a small increase in MHz.

But going from an MP2 @ 250MHz to an MP4 @ 500MHz is something different.

MP2 @ 250MHz => 1x power load
MP4 @ 250MHz => 2x power load
MP4 @ 500MHz => at least 4x power load (2x from the extra cores, 2x or more from the frequency)

This comparison is a bit rudimentary, but it gets the point across.

Too much detail over something that fits in a sentence. Just for the record's sake, the ULP GeForce in T20 (Tegra2 tablets) has 1 Vec4 PS ALU + 1 Vec4 VS ALU clocked at 333MHz, while the ULP GeForce in T30 (Tegra3 tablets) has 2 Vec4 PS ALUs + 1 Vec4 VS ALU clocked at 520MHz, and that on the very same TSMC 40nm manufacturing process. Now, I admittedly haven't seen any 3D power consumption measurements for either platform, but if it is somewhat higher on Tegra3, it obviously won't come just from the GPU if all A9 CPU cores are utilized at their maximum frequencies.

The A5 had been manufactured on Samsung's 45nm, and Apple's next SoC will most likely be manufactured on Samsung's 32nm. Now, I never suggested and never will suggest that doubling the amount of cores while also doubling the frequency is feasible, but even if they did it, it wouldn't increase the power consumption by 4x under full stress, exactly because a smaller manufacturing process is at play for the succeeding SoC.

Doubling each core's frequency also increases the power load on each core. If there is one thing we all know about any CPU/GPU, it's that the higher the frequency, the more the power requirements skyrocket.
That's not a rule even under the same manufacturing process. A GTX480 @ 700MHz has only 15 SMs enabled and a TDP of 250W, while its successor, the GTX580, under the very same TSMC 40G process, has all 16 SMs enabled, is clocked at 772MHz, and has a TDP of 244W. Hardware bugs like GF100's aren't all that typical, but in any case, if you overgeneralize things there is more than one trap you can step into.
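Quick numbers with just the unit counts and clocks above, as a naive throughput proxy (it ignores GF100's hardware fixes and memory clocks, so take it as a rough sketch only):

[code]
# Naive throughput proxy: SM count * core clock; TDPs from the specs above.
gtx480 = 15 * 700   # 15 SMs @ 700MHz, 250W TDP
gtx580 = 16 * 772   # 16 SMs @ 772MHz, 244W TDP

perf_ratio  = gtx580 / gtx480   # ~1.18x
power_ratio = 244 / 250         # ~0.98x
print(perf_ratio / power_ratio) # ~1.2x perf/W on the very same 40nm process
[/code]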

Smaller manufacturing processes typically have a higher tolerance for higher frequencies. Samsung has announced that they managed to increase the frequencies of their Exynos 4xxx SoCs by 50% while going from 45 to 32nm. But that's without any additional chip complexity; the more additional units get used, the lower the chances for frequency increases that high. Note that I didn't claim that an MP4 @ 500MHz is likely, just that you failed to consider frequency in your former post.

Now, an MP4 @ 500MHz would give a theoretical performance increase by a factor of 4 (compared to an MP2 @ 250MHz), making it in theory even faster than Rogue.
Not a single chance in hell would it be faster than Rogue.

SGX543 or 544MP4@500MHz =

72 GFLOPs, 4.0 GTexels/s, 332M Tris/s, DX9 (L3 for 544)

ST-Ericsson NovaThor A9600 Rogue (most likely a 4-cluster G6400) =

>210 GFLOPs, >5.2 GTexels/s, >350M Tris/s, DX11.x
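For what it's worth, the 72 GFLOPs figure above falls straight out of the unit counts (4 Vec4+1 ALUs and 2 TMUs per SGX543 core, as in the A5 numbers further down), if you assume each ALU issues a Vec4 MAD (8 FLOPs) plus one scalar op per clock; the per-clock issue assumption is mine:

[code]
# SGX543MP4 @ 500MHz; assumes a Vec4 MAD (8 FLOPs) + 1 scalar op per ALU/clock.
cores, alus_per_core, flops_per_alu = 4, 4, 9
tmus_per_core, freq_ghz = 2, 0.5

print(cores * alus_per_core * flops_per_alu * freq_ghz)  # 72.0 GFLOPs
print(cores * tmus_per_core * freq_ghz)                  # 4.0 GTexels/s
[/code]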

The A5 is currently made at 45nm. Let's assume the A6 is made at 28nm (20nm is still too far in the future).
If Apple's next SoC is manufactured at 28nm (which would most likely mean TSMC), then the step from 45 to 28nm is actually twice as big, since an entire full node (32nm) would have been jumped. In that case an MP4 @ 500MHz would be much easier than at 32nm (but still not within the realm of possibility).

From my years of experience with PC hardware, a smaller manufacturing process does not magically allow the frequency to double. And that does not take into account going from MP2 to MP4.
IHVs invest the majority of the headroom given by a new process into more units, and only a relatively small portion of it into frequency. That doesn't mean, though, that if they went for an MP4 at 32nm (at least) they couldn't also slightly increase the frequency.

Just for the record's sake: the A4 at 65nm had its single-core SGX535 clocked at 200MHz, while the dual-core SGX543 in the A5 at 45nm is clocked at 250MHz.

SGX535 =
2 Vec2 ALUs
2 TMUs
8 z/stencil

SGX543MP2 =
8 Vec4+1 ALUs
4 TMUs
32 z/stencil

Look at how much bigger the A5 is at 45nm compared to the A4 at 65nm, and yet typical power consumption hasn't changed. However, if you stress both under 3D, power consumption should be quite a bit higher on the A5.
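A quick sketch of just the shader throughput from those unit lists (same Vec4 MAD + scalar issue assumption as above, with a Vec2 MAD counted as 4 FLOPs):

[code]
# ALU throughput from the unit lists above, MAD issue assumed.
sgx535    = 2 * 4 * 0.200   # A4: 2 Vec2 ALUs @ 200MHz -> 1.6 GFLOPs
sgx543mp2 = 8 * 9 * 0.250   # A5: 8 Vec4+1 ALUs @ 250MHz -> 18.0 GFLOPs
print(sgx543mp2 / sgx535)   # ~11x the shader throughput, one node later
[/code]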

But, realistically? It does not sound very plausible. An MP4 @ 300 or 350MHz I can still believe, but jumping to 500MHz... Even the Sony Vita's MP4 core speed is unknown, but it was rumored that they had heat problems with it in the past, and heat in general means high power consumption.
The PS Vita's SGX543MP4+ is clocked at 200MHz, manufactured at Samsung's 45nm. I NEVER SAID, CLAIMED OR IMPLIED THAT IT'LL CLOCK AT 500MHz. I merely pointed out that your former reasoning was flawed for not accounting for frequencies.

There are some rumors flying that the iPad 3 is supposed to have a bigger battery. Maybe it's because of the increase in CPUs/GPUs, but my bet is that the Retina display is sucking in most of the power. In any smartphone/tablet the main power consumer is the screen (unless you run the CPU/GPU at full load all the time, of course).
That's a reasonable assumption.
 
The PS Vita's SGX543MP4+ is clocked at 200MHz, manufactured at Samsung's 45nm. I NEVER SAID, CLAIMED OR IMPLIED THAT IT'LL CLOCK AT 500MHz. I merely pointed out that your former reasoning was flawed for not accounting for frequencies.

Where has it been said that it was only 200MHz? I heard estimates closer to 400.
 
Where has it been said that it was only 200MHz? I heard estimates closer to 400.

Multiple times here in the forums. 400MHz isn't even reasonable at 45nm, but you might want to point out to those making those kinds of estimates that the architecture is more efficient than their estimates assume.
 
400 MHz is reasonable at 45nm, but as always it requires tradeoffs (assuming you meant reasonable in the ultimate sense, not in a PS Vita form factor and cost sense). There's nothing stopping 543MP4 running at that speed on anyone's 45/40nm process, ultimately.
 
The A5 had been manufactured on Samsung's 45nm, and Apple's next SoC will most likely be manufactured on Samsung's 32nm. Now, I never suggested and never will suggest that doubling the amount of cores while also doubling the frequency is feasible, but even if they did it, it wouldn't increase the power consumption by 4x under full stress, exactly because a smaller manufacturing process is at play for the succeeding SoC.
Apple's tick-tock strategy works in their favour here. They kept the same 45nm process when transitioning architectures from Cortex A8 + SGX535 in the Apple A4 to Cortex A9 + SGX543MP in the A5. Process knowledge and maturity were probably important factors in keeping power consumption in line even though the architectural complexity greatly increased. Now with the Apple A6 they can keep the architecture the same and focus on learning the new process. Adding more CPU/GPU cores is probably a little more complicated than just duplicating blocks, but the previous architecture experience should still apply.

If Apple's next SoC is manufactured at 28nm (which would most likely mean TSMC), then the step from 45 to 28nm is actually twice as big, since an entire full node (32nm) would have been jumped. In that case an MP4 @ 500MHz would be much easier than at 32nm (but still not within the realm of possibility).
Wouldn't it be 1.5 times as big? I thought 45nm, 32nm, and 22nm were full nodes and 40nm and 28nm were half nodes. It just happened that, for time-to-market reasons, TSMC cancelled their 45nm and 32nm processes and focused on their half-node derivatives. So going from 45nm to 28nm would be a 1.5-node jump.
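As a sanity check on the naming, the usual rule of thumb is that a full node is a ~0.7x linear shrink and a half node ~0.9x (rough geometry only, nothing official):

[code]
# Full node ~0.7x linear shrink (~2x density); half node ~0.9x.
print(45 * 0.7)        # ~31.5  -> the 32nm full node
print(45 * 0.7 * 0.7)  # ~22.05 -> the 22nm full node
print(32 * 0.9)        # ~28.8  -> 28nm as the half node below 32nm
# So 45nm -> 28nm is one full node plus one half node: a 1.5-node jump.
[/code]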

It'd be interesting to hear how Samsung's processes compare to TSMC's. I thought that in terms of performance characteristics Intel's processes can often be a full node ahead of TSMC's, i.e. Intel's 45nm process can perform as well as TSMC's cancelled 32nm process was targeted to in terms of voltage and drive current. Can Samsung's 32nm process match or beat the performance of TSMC's 28nm process, albeit giving up a little on density?
 
Wouldn't it be 1.5 times as big? I thought 45nm, 32nm, and 22nm were full nodes and 40nm and 28nm were half nodes. It just happened that, for time-to-market reasons, TSMC cancelled their 45nm and 32nm processes and focused on their half-node derivatives. So going from 45nm to 28nm would be a 1.5-node jump.

If memory isn't failing me again 45nm wasn't cancelled at TSMC and their 40G is their actual "45nm" process. 32nm is a full node.

It'd be interesting to hear how Samsung's processes compare to TSMC's. I thought that in terms of performance characteristics Intel's processes can often be a full node ahead of TSMC's, i.e. Intel's 45nm process can perform as well as TSMC's cancelled 32nm process was targeted to in terms of voltage and drive current. Can Samsung's 32nm process match or beat the performance of TSMC's 28nm process, albeit giving up a little on density?

I severely doubt the last. It would help to know what comes after 32nm at Samsung's fabs.
 
If memory isn't failing me again 45nm wasn't cancelled at TSMC and their 40G is their actual "45nm" process. 32nm is a full node.
http://www.beyond3d.com/content/news/608

Arun did an article to try to clarify things. If I'm reading it correctly, TSMC planned 45nm G, GS, and LP variants along with 40nm G and LP variants. 45G was cancelled, 45GS and 40G are actually the same, and the article reports 45LP was already shipping, so I guess both 45LP and 40LP went ahead, although I really don't remember hearing much about 45LP in products.

I severely doubt the last. It would help to know what comes after 32nm at Samsung's fabs.
http://www.xbitlabs.com/news/other/..._Produces_First_20nm_ARM_Based_Test_Chip.html

Looks like they're jumping from 32nm to 20nm so they'll be doing a 1.5 node transition.
 
I NEVER SAID, CLAIMED OR IMPLIED THAT IT'LL CLOCK AT 500MHz. I merely pointed out that your former reasoning was flawed for not accounting for frequencies.

I don't know why you are so defensive, Ailuros; we are just discussing the feasibility of a 500MHz MP4.

I'd like to point out that you are comparing a Rogue MP4 setup with an SGX543MP4. It's to be expected that Rogue already has an increased Vec/ALU/TMU count (on a single core!), giving it more power, but also an increased load (under the same manufacturing process). Of course in that case a Rogue MP4 would be extremely fast, but there's no way the power load would be the same as the SGX543MP4's (again, assuming the same manufacturing process). I was talking about a single Rogue core.

Anyway, I'm not going to discuss this anymore, because you seem to have taken that whole 500MHz thing a bit too personally. And this is a pointless discussion anyway.
 
I don't know why you are so defensive, Ailuros; we are just discussing the feasibility of a 500MHz MP4.

I'm a wee bit allergic to overgeneralizations, that's all; nothing personal. And no, we aren't discussing the feasibility of a 500MHz MP4, as that was merely an example; we're discussing a key point that was missing.

I'd like to point out that you are comparing a Rogue MP4 setup with an SGX543MP4.

http://www.imgtec.com/News/Release/index.asp?NewsID=666

The first PowerVR Series6 cores, the G6200 and G6400, have two and four compute clusters respectively.

multi-cluster != multi-core


It's to be expected that Rogue already has an increased Vec/ALU/TMU count (on a single core!), giving it more power, but also an increased load (under the same manufacturing process). Of course in that case a Rogue MP4 would be extremely fast, but there's no way the power load would be the same as the SGX543MP4's (again, assuming the same manufacturing process). I was talking about a single Rogue core.

G6400 is a single core with 4 compute clusters.

Anyway, I'm not going to discuss this anymore, because you seem to have taken that whole 500MHz thing a bit too personally. And this is a pointless discussion anyway.

Here again is the part of your initial post where I objected:

But when you see the 4x increase in pixel density on the screen, then technically, using the SGX543MP4 is a downgrade.

Screen resolution increase: 4x
GPU speed increase: 2x

Notice the problem...

If the frequency stays the same, then and ONLY then is the above accurate. 500MHz isn't a reasonable expectation for an MP4, as I said over and over again above. It was merely an example to show that the quote above is nonsense if you don't also consider frequency; heck, it's equally missing an important point if an MP4 is clocked at just 150 or 200MHz compared to the MP2 in the A5.
 
http://www.beyond3d.com/content/news/608

Arun did an article to try to clarify things. If I'm reading it correctly, TSMC planned 45nm G, GS, and LP variants along with 40nm G and LP variants. 45G was cancelled, 45GS and 40G are actually the same, and the article reports 45LP was already shipping, so I guess both 45LP and 40LP went ahead, although I really don't remember hearing much about 45LP in products.

Memory did fail me after all ;) Who else but NVIDIA, out of the small-form-factor SoC manufacturers, used TSMC? Tegra2 at least was on 40G, and I doubt T3 is on anything but that one too.

http://www.xbitlabs.com/news/other/..._Produces_First_20nm_ARM_Based_Test_Chip.html

Looks like they're jumping from 32nm to 20nm so they'll be doing a 1.5 node transition.

Hmmm, I just read through the article quickly, but unless I've missed something it doesn't state anything about them supposedly skipping 28nm. The timing of the article sounds reasonable for first test runs at 20nm.

http://www.samsung.com/us/business/oem-solutions/pdfs/Foundry_32-28nm_Final_0311.pdf

http://semiaccurate.com/2011/08/30/global-foundries-and-samsung-split-28nm-processes/
 
Tegra2 at least was on 40G, and I doubt T3 is on anything but that one too.
I believe both Tegra 2 and 3 are under a hybrid process, 40LPG.


Hmmm, I just read through the article quickly, but unless I've missed something it doesn't state anything about them supposedly skipping 28nm. The timing of the article sounds reasonable for first test runs at 20nm.

http://www.samsung.com/us/business/oem-solutions/pdfs/Foundry_32-28nm_Final_0311.pdf

http://semiaccurate.com/2011/08/30/global-foundries-and-samsung-split-28nm-processes/
I actually didn't remember the 28nm process, since Samsung didn't seem to do half nodes before. I suppose that means the A6 will be 32nm and the A7 could benefit from having a half node to play with.

http://www.anandtech.com/show/5467/samsung-exynos-5250-begins-sampling-mass-production-in-q2-2012

Samsung's Exynos 5250, with dual-core Cortex A15 and 4x the graphics of the Exynos 4210, is supposed to be in mass production in Q2 2012. I guess that brings us back to the 4x Cortex A9 vs. 2x Cortex A15 debate. The Exynos 5250 running its CPUs at 2GHz would seem to give it a lot of performance headroom that would be difficult to compete with using Cortex A9. The Apple A5's SGX543MP2 is 2x faster than the Exynos 4210's Mali-400MP4, so the Apple A6 will need a 2x faster GPU if Apple wants to be competitive.
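Spelling out that arithmetic, taking the 4x and 2x claims above at face value (relative numbers only):

[code]
# GPU performance relative to the Exynos 4210's Mali-400MP4 = 1.0.
mali_400mp4 = 1.0
exynos_5250 = 4.0 * mali_400mp4   # claimed 4x the 4210's graphics
apple_a5    = 2.0 * mali_400mp4   # SGX543MP2 ~2x the Mali-400MP4
print(exynos_5250 / apple_a5)     # 2.0 -> the A6 needs ~2x the A5's GPU
[/code]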
 
I believe both Tegra 2 and 3 are under a hybrid process, 40LPG.

http://www.anandtech.com/show/4144/...gra-2-review-the-first-dual-core-smartphone/3

It's a mixture of 40G and 40LP transistors for different parts of the SoC (on different voltage rails), if that's what you meant by a hybrid process.

I actually didn't remember the 28nm process, since Samsung didn't seem to do half nodes before. I suppose that means the A6 will be 32nm and the A7 could benefit from having a half node to play with.
Obviously, with Apple and their typical secrecy, it's hard to know such details up front. I'm assuming 32nm is the likeliest candidate.

http://www.anandtech.com/show/5467/samsung-exynos-5250-begins-sampling-mass-production-in-q2-2012

Samsung's Exynos 5250, with dual-core Cortex A15 and 4x the graphics of the Exynos 4210, is supposed to be in mass production in Q2 2012. I guess that brings us back to the 4x Cortex A9 vs. 2x Cortex A15 debate. The Exynos 5250 running its CPUs at 2GHz would seem to give it a lot of performance headroom that would be difficult to compete with using Cortex A9. The Apple A5's SGX543MP2 is 2x faster than the Exynos 4210's Mali-400MP4, so the Apple A6 will need a 2x faster GPU if Apple wants to be competitive.
What the Exynos 5250 will contain for a GPU is unfortunately still a question mark; past rumors suggested a Mali T604. You're right that a dual A15 at high frequencies would be more than just a bit ahead of a hypothetical quad A9, and if the Mali T604 rumors are also true (always depending on the number of cores and final frequencies), it could be a dangerous contender overall. Samsung leaving the GPU spot in the definition blank sounds suspicious. Either way, it's not going to be an easy battle.

The next Apple SoC will have to deliver a healthy GPU performance increase; how that will be achieved is still unknown, and no, it's not necessarily an MP4 at frequency N, since no one can exclude at this stage an MP3 at a higher frequency instead. It's not only the Exynos 5250 that might be a fierce competitor; there are quite a few 28nm SoCs planned for H2 2012 that are anything but humble in raw specifications.
 
I am looking forward to this Mali T604, but I doubt it's going to be as powerful as a 543MP4 @ 200MHz. I mean, maybe in terms of FLOPs, but in real gaming terms I just can't see it myself.

Well, unless Mali is clocked stupidly high or something.

EDIT: Another thing: all this talk of 'cores' has got way out of hand.

It started off meaning a full GPU/CPU 'core', then we moved on to 'unified shader cores', then that got broken down further, separating ALUs into 'cores'. So you end up with one GPU apparently having 'thousands of cores'.
ARM further confused things with 'quad-core Mali', which has nothing to do with the number of GPUs, or even unified shaders or ALUs..?

What really took the cake, though, was NVIDIA calling old pixel and vertex pipes 'cores'. Now we have IMG Tech bringing out 'compute clusters'... jesus, whatever next!?
 
I am looking forward to this Mali T604, but I doubt it's going to be as powerful as a 543MP4 @ 200MHz. I mean, maybe in terms of FLOPs, but in real gaming terms I just can't see it myself.

Well, unless Mali is clocked stupidly high or something.
Wait, T604 couldn't beat a 543MP4 at 200(!) MHz?! That's just plain wrong, period. Of course, part of the reason it's wrong is that T604 will definitely clock massively higher than 200MHz on any modern process. But even at the same clocks, and based on public information, it should give it a good run for its money on ALU-rich workloads (those not specifically optimised for PowerVR; obviously some games don't even bother with front-to-back ordering).

EDIT: Another thing: all this talk of 'cores' has got way out of hand.
No it hasn't. With the exception of NVIDIA which has been silly about it ever since G80, the handheld industry is INCREDIBLY SANE about the whole thing. For IMG and ARM, the number of cores equals the number of rasterisers. This was equal to 1 for all desktop GPUs until Fermi/Cayman. For Vivante, there's still only a single rasteriser, so the number of cores (assuming they don't lie too much) is equal to the number of instruction decoders they have. Both are perfectly reasonable uses of the 'core' terminology in my mind.

ARM further confused things with 'quad-core Mali', which has nothing to do with the number of GPUs, or even unified shaders or ALUs..?
Once again it has to do with the number of rasterisers, and therefore fully independent graphics processing blocks (even if there's a central job scheduler in T604 as per the public diagrams).

What really took the cake, though, was NVIDIA calling old pixel and vertex pipes 'cores'. Now we have IMG Tech bringing out 'compute clusters'... jesus, whatever next!?
Yeah, NV trying to do that is insane; it definitely takes the cake. I *think* Tegra is even Vec4 rather than scalar à la G80, which makes it even more silly. As for IMG's 'compute clusters': I've heard there's a good architectural reason to talk about it that way (they are not cores, and I assume they still have those?), although their PR is sadly very confusing given the complete lack of technical information. I really don't know what their marketing department was even thinking, mentioning that concept at this point without a proper explanation; simply giving a FLOPs range for the initial family would have been good enough at this point.
 
Arun: cheers for breaking that rat's nest down for me. So what you are saying is the Mali-400MP4 has 4 rasterisers, whereas the SGX543MP1 has just 1?

Or have I got that wrong?

Regardless, I still think using the consumer-friendly 'cores' terminology to describe various different things is completely misleading to the average Joe, who doesn't understand the details of such things.

For instance, I have read various accounts of folks comparing a 'dual-core' iPad 2 to the (then) upcoming Tegra 3, which apparently had, ahem, '12 cores!'. 12 beats 2, so it must be better!!
 
Arun: cheers for breaking that rat's nest down for me. So what you are saying is the Mali-400MP4 has 4 rasterisers, whereas the SGX543MP1 has just 1?
Correct; however, one company/IP's rasteriser is not necessarily as fast as another's... The triangle setup and rasteriser units could still be made massively faster per unit; the reason they scale them up is mostly that it lets customers select the number of cores they want on their own, without any expensive/long customisation work on ARM/IMG's end. And since they're TBR architectures and the pixel data is going to be kept local anyway, there's basically no penalty to doing so (unlike on desktop architectures, where it adds some minimal constraints).

Regardless, I still think using the consumer-friendly 'cores' terminology to describe various different things is completely misleading to the average Joe, who doesn't understand the details of such things.

For instance, I have read various accounts of folks comparing a 'dual-core' iPad 2 to the (then) upcoming Tegra 3, which apparently had, ahem, '12 cores!'. 12 beats 2, so it must be better!!
Generally speaking, Joe Six Pack easily comes to very flawed conclusions based on his 'technical analysis' in fields he's not qualified in. I'm not sure what it is that makes humans so eager to make their own analysis in areas they do not truly understand - perhaps it's a relic from when we lived in much smaller social units (tribes/villages) and the distribution of knowledge was much more equal. Or maybe it's something that dates back to those dark times before the Internet (gosh, that's a long time ago!) and the human species doesn't yet have the habit of checking its facts before talking (or they're just really bad at it - it's amazing how few people are really productive at using Google). Anyway, people will be misled by marketing one way or another - it's sadly just a fact of life at this point.

But in Joe Six Pack's case, maybe it's just because he just drank those six beers and they're clouding his judgement a little bit... :)
 
No it hasn't. With the exception of NVIDIA which has been silly about it ever since G80, the handheld industry is INCREDIBLY SANE about the whole thing. For IMG and ARM, the number of cores equals the number of rasterisers. This was equal to 1 for all desktop GPUs until Fermi/Cayman. For Vivante, there's still only a single rasteriser, so the number of cores (assuming they don't lie too much) is equal to the number of instruction decoders they have. Both are perfectly reasonable uses of the 'core' terminology in my mind.
If we are counting rasterizers as cores, then it makes no more and no less sense to call an ALU lane or a memory controller a core, since at that point you are just counting cores as a unit element of logic duplication.

And even then, one could accept IMG's definition of a core, since theirs are unified. Choking four fragment pipelines with a single vertex pipeline and calling it quad-core isn't exactly straight.
 