Next-Gen iPhone & iPhone Nano Speculation

Generally speaking, Joe Six Pack easily comes to very flawed conclusions based on his 'technical analysis' in fields he's not qualified in. I'm not sure what it is that makes humans so eager to make their own analysis in areas they do not truly understand - perhaps it's a relic from when we lived in much smaller social units (tribes/villages) and the distribution of knowledge was much more equal. Or maybe it's something that dates back to those dark times before the Internet (gosh, that's a long time ago!).

Ha! Yes, in ye olden days when the chief used to tell fantastical accounts of magical bracelets with 'special abilities'.
Swap the bracelet for a GPU and the chief for an Nvidia PR guy and we can see not a whole lot has changed! ;)
 
If we are counting rasterizers as cores, then it makes no more and no less sense to call an ALU lane or a memory controller a core, since at that point you are just counting cores as a unit element of logic duplication.
Maybe I'm a bit of an old-schooler, but I still think of GPUs as Graphics Processing Units, so when I talk about cores I tend to talk about 'graphics cores' :) And so I think it makes fairly good sense to define those as requiring their own dedicated triangle setup and rasterisation units. These are obviously not the same thing as 'processor cores' which I would define as requiring their own dedicated instruction stream and hardware decoder. In theory it would be possible to have multiple graphics cores and a single processor core but it would obviously be absurd.

Choking 4 fragment pipelines with a single vertex pipeline and calling it quad core isn't exactly straight.
I think it's rather accurate actually. If you're thinking about Mali-400, it is public knowledge that the binning unit can output more triangles than can be used by a single pixel core, but less than can be used by four cores (I'm not sure anything more specific than that is public). That clearly implies these cores scale the triangle and rasterisation process as well.

It's obviously true that the vertex shader and binner can be a bottleneck, but every GPU has a 'front-end' which can bottleneck the rest of the chip. It's simply somewhat more obvious in Mali's case. I don't think that disqualifies them as having four graphics cores. Now if we're talking about NVIDIA or even Vivante to a lesser extent, that's quite something else... *sigh*
 
If we are counting rasterizers as cores, then it makes no more and no less sense to call an ALU lane or a memory controller a core, since at that point you are just counting cores as a unit element of logic duplication.

And even then, one could accept IMG's definition of a core since they are unified. Choking 4 fragment pipelines with a single vertex pipeline and calling it quad core isn't exactly straight.

Yeah, I agree it's not straight. Admittedly these things are hard to market as they are rather techy; 'cores' is something simple a consumer can understand, where a higher number should usually mean better.

I think we got right into a mess when DX10 came out and unified shaders became 'cores'; this linked onto the whole 'cores' thing with CPUs that was going on around the same time.

They should have stopped it right there and called them what they were... unified shaders... and the old-style 'pipes'.
A core should mean exactly that: a full, complex processing core, not any old simple bit of logic.

Are there any trading standards in the tech industry? I'm sure at least in Britain if you label something wrong or market things with misinformation you get slapped; that's what should happen with Nvidia and Vivante :mad:

EDIT: Just to clarify how far I would go, even AMD with their BD modules... calling 1 module 'dual core' is a push; for me that's more of an extravagant SMT solution than true 'multi core' (although it makes more sense than Nvidia).

I can already see where we are going with this: Haswell will come out with some improved form of 4x HT per core, and from then on every thread becomes a 'core'... this could get pretty stupid.
 
Arun,

Apart from funky marketing definitions, wherever they come from, I don't see why we have to make it as complicated in the end. If we had a picture of die area per core and a single core takes N square mm, then an MP4 at N*4 mm2 is obviously multi-core. In the other case, where only the clusters are scaled and not the entire chip, for N mm2 per core a 4-cluster chip should obviously come in at less than N*4 mm2.
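To put that area test in rough code terms (a minimal sketch; the 8 mm2 figure, the 22 mm2 figure and the tolerance are purely illustrative assumptions, not measured data):

Code:
# Minimal sketch of the die-area test described above: a true multi-core
# design should scale roughly linearly with core count, while a
# multi-cluster design comes in well below N * single-core area because
# the front-end is shared. Numbers are illustrative assumptions only.
def looks_like_true_multicore(single_core_mm2, n_cores, measured_mm2, tolerance=0.05):
    expected = single_core_mm2 * n_cores
    return measured_mm2 >= expected * (1 - tolerance)

# Hypothetical 8 mm2 core: a 4-core part near 32 mm2 passes the test,
# a 4-"cluster" part at, say, 22 mm2 does not.
print(looks_like_true_multicore(8.0, 4, 32.0))  # True
print(looks_like_true_multicore(8.0, 4, 22.0))  # False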

If IMG's multi-cluster description of a Rogue core should turn out to mean that N clusters take N times the single-cluster die area, then and only then is the definition questionable. Since Kristof does mention redundancy with multi-core though, I doubt it's just a marketing stunt.

That is to say, to keep things fair, Mali-400 MPx isn't exactly multi-core either, rather closer to multi-cluster, and I'd dare say the same holds to a different degree (due to USCs this time) on T6xx. As long as you don't multiply the entire enchilada of a single core by the number of cores, it can get a wee bit more complicated.

NV's crap of counting ALU lanes as "cores" is complete nonsense and isn't even a case worth debating compared to the other two cases.

Finally, multi-cluster/single chip vs. mGPU is nothing new even for the desktop, and has been around for years now. I'm actually more than glad that in the small form factor space we haven't seen anything like AFR yet.
 
ARUN: Looks like you're in demand! Do you know whether the T604/Midgard series follows the same multicore = multirasterizer scenario, and if so how does that translate into unified shaders/ALUs?
 
A core should mean exactly that: a full, complex processing core, not any old simple bit of logic.
It's strictly NVIDIA which pushed that naming scheme - AMD stuck to 'stream processors' (which is only marginally better, mind you) for a very long time. Ironically, if you can wait 5+ years, cores really will be full processing units with their own schedulers for NVIDIA ;) (since they've announced Echelon will be fully MIMD - ugh, then again they might count the VLIW lanes as cores, that'd be just sad).

Are there any trading standards in the tech industry? I'm sure at least in Britain if you label something wrong or market things with misinformation you get slapped; that's what should happen with Nvidia and Vivante :mad:
Sadly I'm pretty sure the definition of 'core' in any dictionary is vague enough that the law couldn't provide much protection here. NVIDIA did make a specific technical argument about why their units were 'cores', which was the ability to do performance-free branching (and divergence management) - to which I answered that by that logic Larrabee's branch/mask implementation would allow Intel to claim one core per vector lane, and they couldn't really contradict that... which does highlight just how silly the whole thing is.

Ailuros said:
Apart from funky marketing definitions, wherever they come from, I don't see why we have to make it as complicated in the end. If we had a picture of die area per core and a single core takes N square mm, then an MP4 at N*4 mm2 is obviously multi-core. In the other case, where only the clusters are scaled and not the entire chip, for N mm2 per core a 4-cluster chip should obviously come in at less than N*4 mm2.
Is that really true for anything though? Even on a CPU the northbridge/system bus won't scale linearly. There's always a 'front-end' of some kind as I said; it's only a question of how big/important it is. And let's imagine you had a piece of hardware that offloaded what could otherwise be done in the drivers on the CPU - would the fact that it doesn't scale make a GPU any less 'multicore'? Or maybe we just shouldn't be talking about cores at all, but heh... :)

If IMG's multi-cluster description of a Rogue core should turn out to mean that N clusters take N times the single-cluster die area, then and only then is the definition questionable.
I'm pretty sure once it's public/clear what 'clusters' mean we'll all be banging our heads on the wall... :?:

I'm actually more than glad that in the small form factor space we haven't seen anything like AFR yet.
Or even SFR - nobody's processing vertexes once for every core. And I don't think we'll see any multichip stuff in embedded at all, ever. Phew! :)

french toast said:
Do you know whether the T604/Midgard series follows the same multicore = multirasterizer scenario, and if so how does that translate into unified shaders/ALUs?
I shouldn't say too much, but I think that's known to be the case, yes. As for unified shaders, T604 is publicly known to have two ALU processing pipelines per core and T658 has four ALU processing pipelines per core. It's not public how independent these pipelines are. It definitely has a very competitive amount of ALU horsepower for that level of texturing/geometry performance though :)
 
I shouldn't say too much, but I think that's known to be the case, yes. As for unified shaders, T604 is publicly known to have two ALU processing pipelines per core and T658 has four ALU processing pipelines per core. It's not public how independent these pipelines are. It definitely has a very competitive amount of ALU horsepower for that level of texturing/geometry performance though :)

Awesome! I can't wait for some Exynos 4412 goodness. Cheers, I'll leave you alone now... for a while! :smile:
 
Yeah, Mali is sticking with a few pre-defined multi-rasterizer configs, and PowerVR is still allowing independent cores to be grouped in any number (up to a limit). Series6 clusters are just describing the ALU part and how they're set in more of an array now.
 
Is that really true for anything though? Even on a CPU the northbridge/system bus won't scale linearly. There's always a 'front-end' of some kind as I said; it's only a question of how big/important it is. And let's imagine you had a piece of hardware that offloaded what could otherwise be done in the drivers on the CPU - would the fact that it doesn't scale make a GPU any less 'multicore'? Or maybe we just shouldn't be talking about cores at all, but heh... :)

You can definitely talk about cores in the case of Series5XT MPs; assuming 1 SGX543 is truly at 8mm2@65nm as claimed in the past, an MP4 at the same frequency and process will be 32mm2. Best case 32, worst case more.

I'm pretty sure once it's public/clear what 'clusters' mean we'll all be banging our heads on the wall... :?:
If there's still as much redundancy as with MPs, then it's just marketing. It'll come down to what exactly a compute cluster includes in such a case.

Series6 clusters are just describing the ALU part and how they're set in more of an array now.

One could assume that while trying to read between the lines of the material released so far. But what if, for example, TMUs should also be included in each cluster?
 
BGR claims to have some more info on the A6:
- model number S5L8945X (A5: S5L8940X, A4: S5L8930X)
- quad-core

http://www.bgr.com/2012/02/01/ipad-3-photos-show-quad-core-processor-wi-fi-and-global-lte-options/

So, quad-core Cortex-A9 it is?
I wonder if they've invested in Turbo techniques? Tegra 3 only turbos when a single core is in use and even then only 1 bin from 1.3GHz for 2-4 cores to 1.4GHz for 1 core, which isn't very impressive. A 1.5GHz quad core able to Turbo up to 2GHz when using 2 cores would be useful.
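Roughly what I mean, as an active-cores vs. max-clock table (the Tegra 3 numbers are the ones above; the second table is purely my hypothetical wish, not any announced product):

Code:
# Sketch of "turbo bins" as an active-core-count -> max clock (GHz) map.
# Tegra 3 values are the ones quoted above; the wished-for table is
# purely hypothetical, just to illustrate a more aggressive turbo scheme.
tegra3_max_ghz = {1: 1.4, 2: 1.3, 3: 1.3, 4: 1.3}
wished_for_ghz = {1: 2.0, 2: 2.0, 3: 1.5, 4: 1.5}

def max_clock(table, active_cores):
    # Look up the highest allowed clock for the given number of active cores.
    return table[active_cores]

print(max_clock(tegra3_max_ghz, 1), max_clock(tegra3_max_ghz, 4))  # 1.4 1.3
print(max_clock(wished_for_ghz, 2), max_clock(wished_for_ghz, 4))  # 2.0 1.5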
 
I wonder if they've invested in Turbo techniques? Tegra 3 only turbos when a single core is in use and even then only 1 bin from 1.3GHz for 2-4 cores to 1.4GHz for 1 core, which isn't very impressive. A 1.5GHz quad core able to Turbo up to 2GHz when using 2 cores would be useful.

The motivations for turbo techniques were mainly driven by thermal and supply current issues; neither problem exists in the mobile space (yet). The limiting factor is the design timing itself and not limitations due to power consumption and heat dissipation.
 
The motivations for turbo techniques were mainly driven by thermal and supply current issues; neither problem exists in the mobile space (yet). The limiting factor is the design timing itself and not limitations due to power consumption and heat dissipation.
So you mean there currently isn't a need to trade clock speed for increased core count, i.e. a quad-core ARM SoC can clock as high as a dual-core ARM SoC for a given thermal envelope?

http://www.dailytech.com/article.aspx?newsid=23912

There's an interesting rumour here that Apple has obtained a full ARM instruction set license and has designed their own custom ARM core for the Apple A6.
 
So you mean there currently isn't a need to trade clock speed for increased core count, i.e. a quad-core ARM SoC can clock as high as a dual-core ARM SoC for a given thermal envelope?

Yes. There currently isn't a thermal limit, so why not clock quad-core A9s as high as they will go on the process? I expect that may change with A15 designs though.

http://www.dailytech.com/article.aspx?newsid=23912

There's an interesting rumour here that Apple has obtained a full ARM instruction set license and has designed their own custom ARM core for the Apple A6.

They bought PA Semi. They were a custom CPU group, and there was quite a big recruiting effort for CPU designers at Apple a few years back. A6 does also seem like the right timing for such things to have come to fruition.
 
Guess the PA Semi acquisition was further back than I remembered... still, ~4 years seems pretty aggressive to go from start of design from scratch to a released product (in an SoC etc). They could have been working on it before the acquisition, but custom ARM cores isn't exactly a market a company like that would be itching to get into on their own, I would think...

I'm more skeptical because A6 is in line for a "tick" for Apple, and doing a new processor for a process they don't have experience with is harder work. That, and the serial number being a smaller increment than from A4 to A5 suggests a shrink.
 
Didn't a lot of the PA Semi people cash out and leave shortly after the acquisition?

Either them or the Intrinsity people, it seemed, left.

What would be the advantage of a custom core over a standard ARM core? What kind of return for the cost of developing one? Or are the savings alone from not paying the same licensing costs advantageous?
 
Didn't a lot of the PA Semi people cash out and leave shortly after the acquisition?

Either them or the Intrinsity people, it seemed, left.

What would be the advantage of a custom core over a standard ARM core? What kind of return for the cost of developing one? Or are the savings alone from not paying the same licensing costs advantageous?

You'd think licensing savings would start to be part of it, but only at the sort of huge volumes companies like Apple or Qualcomm see.

I doubt they really expect to have a big lead over the others in design quality, but I could see rolling their own as giving them a slight time to market advantage. Which would be pretty significant if true.

... this might sound ridiculous, but is it possible that Apple is even a small bit in this for the marketing? That is, the ability to say their devices are using their special core. People have been eating up A4/A5 so far without it using much in the way of custom IP for the central blocks. But people might not really care all that much more if they're told it's using an Apple CPU, so I don't really know.
 
BGR claims to have some more info on the A6:
- model number S5L8945X (A5: S5L8940X, A4: S5L8930X)
- quad-core

http://www.bgr.com/2012/02/01/ipad-3-photos-show-quad-core-processor-wi-fi-and-global-lte-options/

So, quad-core Cortex-A9 it is?

Something odd about the CPU IDs:

S5L8930X - A4
S5L8940X - A5
S5L8945X - A6

Notice the difference in the numbers? You'd kind of expect for an "A6" that they'd increase it to S5L8950X.

When they moved from the 30X (A4) -> 40X (A5), there was a new type of CPU (Cortex-A8 -> Cortex-A9).

So, probably an updated version of the A5, with double the Cortex-A9 cores on the CPU end. GPU is unknown. That makes the version numbering logical.

It's not surprising news; a new iPad without a quad core, when most high-end tablet producers are moving to quad cores, would be odd.

Looks like the boot info shows about 1GB of memory (not sure if it's virtual memory or physical memory => 244276 * 4 = 977104 and ... (the rest behind it is unreadable)). Given that it has vm_page_bootstrap before it, it looks like virtual memory.
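A quick back-of-the-envelope check on that figure (assuming 244276 is a count of 4 KB pages, which is just my reading of the vm_page_bootstrap line):

Code:
# Back-of-the-envelope check: 244276 entries of 4 KB each.
# Assumes the figure is a 4 KB page count - not confirmed from the dump.
pages = 244276
page_size_kb = 4

total_kb = pages * page_size_kb   # 977104 KB, matching the number above
total_mb = total_kb / 1024        # ~954 MB, i.e. roughly 1GB
print(total_kb, round(total_mb))  # 977104 954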

Sounds logical for the iPad 3 to have 1GB (or more) of memory.

In other words, so far, no real surprising things that can be extracted from this information.
 
Have they all got inside info on each other or something? Seeing as things take bloody years to design and sort out, how are they all correctly guessing what the other one is doing? How would Apple, for instance, know that quad would be the standard by now? Had you asked me 18 months ago I would have laughed at you...

I'm still not convinced yet... the numbering scheme points to a tick... that has been Apple's winning strategy until now. The only thing that could sway it is the extra unit sales from marketing a quad core; that would be worth breaking the cycle.

I'm 50/50; I really thought a clock speed bump to both the CPU/GPU with LPDDR3, maybe some dedicated VRAM to help cope with the increase in res... now I also think a quad would make sense for them.
 