iPad 2

According to UBM TechInsights, the A5 is quite big: 122.2 mm^2 at 45nm from Samsung, plus 512MB of 46nm LPDDR2 from Samsung or Elpida.

http://www.appleinsider.com/article...st_lpddr2_ram_costs_66_more_than_tegra_2.html
[Attached image: A5.evolution.A4vsA5.031211.jpg]
 
Sure it ain't manufactured on a 40 nm process?

Well, I guess if you want you can always assume that they are incompetent or lying.
Based on analysis performed by UBM TechInsights Lab and Process Analysis personnel, we can say that the A5 in our possession is definitely manufactured by Samsung using their 45nm process. UBM TechInsights used optical die and SEM cross-section images to analyse important features such as die edge seal, metal 1 pitch, logic and SRAM transistor gate measurements. These features were then compared to other manufacturers in our database, including other Samsung 45nm parts. The previous generation Apple A4 processor was also fabbed on Samsung’s 45nm process.
http://www.ubmtechinsights.com/reports-and-subscriptions/investigative-analysis/apple-ipad-2/
 
Bus speed doesn't say anything about bus width: I'm pretty sure it's 64-bit in the iPad, but it could very well be 32-bit in the iPad 2. nVidia seems to think 32-bit is enough for a dual-core Cortex-A9, after all (or even quad-core).

64-bit DDR to 32-bit DDR2 would be a bizarre design decision, particularly considering the increase in processing power. I'd question exactly what's hiding behind the Geekbench results before I'd speculate about something like that. Is anyone intimately familiar with the GL synthetic benchmarks Anand published who can say whether they provide any additional insight? Someone mentioned in the comments that the FOR loop tests revealed memory improvements, but since I'm not familiar with the benchmark I can't evaluate that statement.
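Quick back-of-the-envelope to show why the benchmark numbers alone can't settle the bus-width question: a narrower bus at a higher transfer rate can land at the same peak bandwidth. The transfer rates below are purely illustrative, not measured iPad figures.

# Peak theoretical bandwidth = bus width (bytes) x transfers per second.
# The transfer rates here are made up for illustration only.
def peak_bandwidth_gb_s(bus_width_bits, transfers_per_s):
    return bus_width_bits / 8 * transfers_per_s / 1e9

print(peak_bandwidth_gb_s(64, 400e6))  # hypothetical 64-bit LPDDR  @ 400 MT/s -> 3.2 GB/s
print(peak_bandwidth_gb_s(32, 800e6))  # hypothetical 32-bit LPDDR2 @ 800 MT/s -> 3.2 GB/s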

The SoC is surprisingly large - AMD Zacate with two cores + GPU is 75 mm^2 at 40nm, for instance.
 
http://www.iosnoops.com/2011/03/13/ipad-2-graphics-engine-full-benchmark-tests/

Additional benchmarks in GLBenchmark 1.1 comparing OpenGL ES 1.1 performance between the iPad and iPad 2.
"Also note that for some tests, the iPad 2 offered such a high performance gain that GLBenchmark wasn’t able to record it properly (these tests were dismissed)."
So that's where the 9x is hiding? :devilish:

By the way, what's up with the ~900MHz A5? Shouldn't it be 1GHz? It wouldn't be a problem if Apple hadn't said anything about the frequency, but Apple is advertising the A5 at 1GHz...
 
By the way, what's up with the ~900MHz A5? Shouldn't it be 1GHz? It wouldn't be a problem if Apple hadn't said anything about the frequency, but Apple is advertising the A5 at 1GHz...
It looks like the A5 can dynamically clock itself, usually between 800MHz and 1GHz in these benchmark load cases.

Is this a feature of the reference Cortex-A9 design and/or other Cortex-A9 SoCs, or is it something Apple designed themselves? I don't remember hearing about Tegra 2 dynamically clocking itself - or maybe it's just not as aggressive as Apple's implementation?
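For what it's worth, here's a toy sketch of what an on-demand style DVFS policy looks like - not Apple's actual governor, and the operating points are guesses - just to illustrate how a tool could end up reporting an average around 900MHz if the clock bounces between steps during a run.

# Toy DVFS governor: NOT Apple's implementation, frequency steps are hypothetical.
FREQ_STEPS_MHZ = [250, 500, 800, 1000]

def pick_frequency(cpu_load, up_threshold=0.8):
    # Jump straight to max when busy, otherwise settle on the lowest
    # step that keeps the estimated load under the threshold.
    if cpu_load >= up_threshold:
        return FREQ_STEPS_MHZ[-1]
    for freq in FREQ_STEPS_MHZ:
        if cpu_load * FREQ_STEPS_MHZ[-1] / freq < up_threshold:
            return freq
    return FREQ_STEPS_MHZ[-1]

print(pick_frequency(0.95))  # 1000 - pegged under heavy benchmark load
print(pick_frequency(0.30))  # 500  - idles down between frames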
 
I'm wrestling with my ignorance when it comes to these highly integrated devices.
When you widen the memory bus of a laptop-class (or bigger) CPU, you increase pin-out and PCB complexity, and of course you need wider memory devices or more sockets. In short, there is a real price in power draw and cost/complexity.
But what is the case with these stacked chips? Does it alleviate the power-draw-per-pin problem to the point where it's not a major issue? Obviously there is no PCB cost, but is the cost of extra interconnects between stacked chips negligible? I've only really seen concern about the power draw of higher clock speed interfacing between the CPU and memory for SoCs, never anything about bus width.
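To put some (made-up) numbers on the per-pin question, here's a crude dynamic-power model - roughly pins x C x V^2 x f x activity - with guessed load capacitances, just to show why a short PoP connection is so much cheaper per pin than a routed PCB trace.

# Crude I/O power model; capacitance and voltage values are rough guesses.
def io_power_mw(n_pins, c_load_pf, v_dd, f_mhz, activity=0.5):
    c = c_load_pf * 1e-12
    f = f_mhz * 1e6
    return n_pins * activity * c * v_dd**2 * f * 1e3  # in milliwatts

# Hypothetical 64 data pins toggling at 400 MHz:
print(io_power_mw(64, c_load_pf=10.0, v_dd=1.8, f_mhz=400))  # ~415 mW: long PCB trace, 1.8V mobile DDR
print(io_power_mw(64, c_load_pf=2.0,  v_dd=1.2, f_mhz=400))  # ~37 mW:  short PoP stack, 1.2V LPDDR2

At least with these guessed numbers, widening a PoP interface looks far less painful power-wise than doing the same across a PCB, though it still costs balls on the package and area on both dies.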

I'm trying to get a grip on the dynamics of different kinds of memory solutions going forward, and failing.
 
64-bit DDR to 32-bit DDR2 would be a bizarre design decision, particularly considering the increase in processing power. I'd question exactly what's hiding behind the Geekbench results before I'd speculate about something like that. Is anyone intimately familiar with the GL synthetic benchmarks Anand published who can say whether they provide any additional insight? Someone mentioned in the comments that the FOR loop tests revealed memory improvements, but since I'm not familiar with the benchmark I can't evaluate that statement.

Yeah, just throwing 32-bit out there as a "not impossible." Surely that for loop test should be exercising the GPU's ability to, well, loop. If it's stalling on memory instead, which would probably mean thrashing the texture cache, then it'd be obscuring any results of loop testing.

The SoC is surprisingly large - AMD Zacate with two cores + GPU is 75 mm^2 at 40nm, for instance.

I rather contend that TSMC's 40nm process is surprisingly compact compared to Samsung's (and Intel's, for that matter) 45nm. Tegra 2 is another good example, weighing in at only 49 mm^2.
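A naive normalisation by (feature size)^2 - ideal scaling that real processes never actually hit - puts the comparison on one axis:

# Ideal-scaling comparison only; ignores library, SRAM ratio and design differences.
dies = {
    "Apple A5 (Samsung 45nm)":    (122.2, 45),
    "AMD Zacate (TSMC 40nm)":     (75.0, 40),
    "Nvidia Tegra 2 (TSMC 40nm)": (49.0, 40),
}
for name, (area_mm2, node_nm) in dies.items():
    scaled = area_mm2 * (40 / node_nm) ** 2  # hypothetical area after an ideal shrink to 40nm
    print(f"{name}: {area_mm2} mm^2 -> ~{scaled:.0f} mm^2 at 40nm")

Even with an ideal shrink the A5 would still come out around 97 mm^2 in this crude model, so process density only accounts for part of the gap.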
 
Yes, but HOW is it done? If you want to work in parallel on the vertices (not just dumb AFR), you'd need some arbiter when generating the display lists. I'm just interested in how the "multiple GPU" scaling is done.

I'd shed some light on this but I don't have the incantation which allows me to post an attachment.
If a mod contacts me, maybe I can send along a pdf?

P
 
pyjamaslug, you don't have enough posts yet for us to even PM you. If you don't want to slap that pdf up on a free hosting site and post the link, the easiest way I can think of is for you to report your own post (click the little triangle-shaped sign below your name) and provide a link in the comment of your report. That way only the mods will see it, and one of us can reply here with it as a local attachment.

The size limit for attachments is fairly small, though: pdf 19kB, zip 97kB, rar 193kB.
 
As far as the A5 goes, it's obviously wrong:
Here we’ve labeled the key blocks; the ARM cores are in the right half of the die, with ~4.5 Mb of cache memory each.
First, it doesn't match their picture; it should read "left half", unless the picture is wrong.
Second, they assume it's made of a pair of Cortex-A9s, so 4.5 Mb of cache would mean ~512 KB of L1 per core, given that the A9 has no integrated L2. Makes no sense :)
 
They simply don't realise the L2 is shared; the numbers are obviously right once you consider that. What's a lot more dubious is the existence of a 'WiFi' block on the die, unless they just mean the SDIO I/O.
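A quick sanity check on the figures, assuming the widely reported 1 MB shared L2 on the A5:

# Reading the "~4.5 Mb each" as two halves of one shared L2 array.
reported_per_half_mbit = 4.5
total_bits = 2 * reported_per_half_mbit * 1024 * 1024   # ~9 Mb of SRAM in total
data_bits  = 1 * 1024 * 1024 * 8                         # 1 MB of cache data = 8 Mb
print(total_bits / 8 / 1024)                  # ~1152 KB of array
print((total_bits - data_bits) / total_bits)  # ~11% left over for tags/ECC - plausible
# Read as per-core L1 instead, 4.5 Mb would be ~576 KB per core, absurd for an A9
# whose L1 caches are 32 KB + 32 KB.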
 
They simply don't realise the L2 is shared; the numbers are obviously right once you consider that.
Indeed, but if they don't know / take into account such a widely known piece of information, I wonder how much confidence to put in the rest of their guesses.
 
Indeed, but if they don't know / take into account such a widely known piece of information, I wonder how much confidence to put in the rest of their guesses.

The die photo is fairly valuable in and of itself. You can see the patterns of the GPU datapath as well as GMEM. The L2 cache array is fairly obvious as well.
 
Does anybody have any clue how the iPad 2 gets 4x more flops by adding 2x cores? That benchmark is supposed to be compute bound, so memory bandwidth *should* not affect it. I find that more interesting than seeing WiFi marked on an app processor. :)
 
The 4x more flops is due to the pipelined VFP vs the non-pipelined one in the A8. It has nothing to do with dual core, or that'd be 8x :)
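Rough per-core arithmetic with assumed cycle counts (the A8's VFPLite is non-pipelined, the A9's VFP is pipelined; the exact latencies below are illustrative, not measured):

# Illustrative cycle counts only - the point is pipelined vs non-pipelined issue.
def flops(clock_hz, cycles_per_op):
    return clock_hz / cycles_per_op

a8_core = flops(1.0e9, 8)  # non-pipelined VFPLite: assume the unit is blocked ~8 cycles per FP op
a9_core = flops(1.0e9, 2)  # pipelined VFP: assume an op retired every ~2 cycles
print(a9_core / a8_core)   # ~4x per core, before the second core even enters the picture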
 