iPad 2

Yes it does, but I guess the benchmark doesn't use it for its FP computations (and anyone using NEON for FP should be extremely cautious as it's not IEEE compliant).

I doubt most algorithms care. All we're talking about is subnormal support and exception catching in the cases of divide-by-zero and operations with infinity and NaNs, none of which really concern most things outside of scientific computing.
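To make the subnormal point concrete, here is a minimal sketch that models flush-to-zero in software. It is not NEON code: the helper names (`to_f32`, `flush_to_zero`, `ieee_sub`, `ftz_sub`) are made up for illustration, and the model simply treats every subnormal single-precision input or result as zero.

```python
import math
import struct

FLT_MIN_NORMAL = 2.0 ** -126  # smallest normal single-precision magnitude

def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def flush_to_zero(x):
    """Software model of flush-to-zero: subnormal singles become signed zero."""
    x = to_f32(x)
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x

def ieee_sub(a, b):
    """Single-precision subtraction with gradual underflow (subnormals kept)."""
    return to_f32(to_f32(a) - to_f32(b))

def ftz_sub(a, b):
    """Same subtraction with inputs and result flushed to zero."""
    return flush_to_zero(flush_to_zero(a) - flush_to_zero(b))

a = 1.5 * FLT_MIN_NORMAL
b = 1.0 * FLT_MIN_NORMAL

# a and b are distinct normal numbers, but their difference is subnormal.
print(a != b, ieee_sub(a, b) != 0.0)  # gradual underflow keeps the difference
print(a != b, ftz_sub(a, b) == 0.0)   # flush-to-zero loses it
```

With gradual underflow, `a != b` implies `a - b != 0`; under flush-to-zero that guarantee goes away, which only matters to code that works this close to the bottom of the exponent range.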
 
Hrrmph. We don't divide by zero a lot in scientific computing. Seriously, I'd wager most parts of the computational science community care for IEEE compliance only in the sense that it simplifies transfer of data files between different platforms, if that. If your code is dependent on the finer points of IEEE standards for rounding et cetera, then for portability purposes, it is already broken, and probably suspect in terms of numerical stability in the first place. :)
 
The A5 is really quite big if you compare it with Tegra 2 and Tegra 3 (anyone know the figures for OMAP 4 and QSD 8x60? Though Qualcomm has an integrated baseband, so the size will not be directly comparable). The process does play a part, as TSMC 40nm vs. Samsung 45nm will be at least 19% smaller (and apparently the density of TSMC's processes in general is higher than Samsung's). I read one figure of 122 mm2 for A5, while Chipworks says it is more than twice the size of A4, which means it's at least 100 mm2. Tegra 2 is only 49 mm2 and even quad-core Tegra 3 is only 80 mm2. What exactly has Apple put in the A5 to make it so big :???:
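A rough sanity check on the process argument, assuming ideal scaling where area goes with the square of the node name. That is an idealization (real 40nm and 45nm processes need not scale this way), and the function name is made up for the sketch; the 122 mm2 and 49 mm2 figures are the ones quoted above.

```python
def ideal_area_scale(from_nm, to_nm):
    """Area ratio under ideal scaling: area goes with the square of the node."""
    return (to_nm / from_nm) ** 2

scale = ideal_area_scale(45, 40)   # Samsung 45nm -> TSMC 40nm
shrink_pct = (1 - scale) * 100     # ideal area reduction in percent
a5_at_40nm = 122 * scale           # A5 rescaled to 40nm, in mm2
ratio_vs_tegra2 = a5_at_40nm / 49  # still roughly twice Tegra 2

print(f"ideal shrink: {shrink_pct:.1f}%")
print(f"A5 at 40nm: {a5_at_40nm:.1f} mm2, {ratio_vs_tegra2:.2f}x Tegra 2")
```

Even with the full ideal shrink credited to it, the A5 would still be about twice the size of Tegra 2, so the process alone doesn't explain the gap.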
 
Maybe there is an integrated Thunderbolt controller? Apple could be holding off until the iPhone 5 by which time the complete Mac lineup should have been refreshed with Thunderbolt. It would probably need to be implemented to use the existing 30-pin connector so users can either use the traditional 30-pin to USB cable or a new 30-pin to Thunderbolt cable.

I'm thinking Apple would also beef up the video encoder to handle 1080p video recording from the back camera. Ars Technica pointed out that OmniVision's new 8MP camera actually has worse light sensitivity than Apple's existing OmniVision 5MP camera, so it would make sense for Apple to stick with the existing 5MP camera and use 1080p video recording as the differentiating feature. Plus, 720p FaceTime HD from the front camera to match the new MacBook Pros is probably also reasonable. Hopefully Apple can convince the carriers to enable VGA FaceTime over 3G and have 720p FaceTime HD over WiFi. These things would go largely unused on the iPad 2 though.
 
I doubt most algorithms care. All we're talking about is subnormal support and exception catching in the cases of divide-by-zero and operations with infinity and NaNs, none of which really concern most things outside of scientific computing.
Add to that a single rounding mode and only single precision. Does that still look good outside of scientific programming? I agree in most cases you don't care, but given how poorly people understand the limitations of FP numbers, adding some more constraints doesn't seem wise.

Anyway, my point is not really about the limitations of running FP code on NEON, it's about being forced to use NEON to get decent performance due to a poor FP unit ;)
 
I can tell you the difference of cost in implementation and it may not seem all that "poor" anymore :)

For now at least, you can't have your 2-way DP with IEEE compliance and 300mW peak power for FMA.
 
The A5 is really quite big if you compare it with Tegra 2 and Tegra 3 (anyone know the figures for OMAP 4 and QSD 8x60? Though Qualcomm has an integrated baseband, so the size will not be directly comparable). The process does play a part, as TSMC 40nm vs. Samsung 45nm will be at least 19% smaller (and apparently the density of TSMC's processes in general is higher than Samsung's). I read one figure of 122 mm2 for A5, while Chipworks says it is more than twice the size of A4, which means it's at least 100 mm2. Tegra 2 is only 49 mm2 and even quad-core Tegra 3 is only 80 mm2. What exactly has Apple put in the A5 to make it so big :???:

Looking at the die photo, the dual A9's and L2 cache take up roughly 1/4 of the chip with the 543MP2 taking up another 1/4. There seems to be a lot of other logic outside of the DRAM controllers (and there seem to be two of those).

NEON does add quite a bit of area for the A9 and I imagine a 543MP2 isn't cheap either.
 
I can tell you the difference of cost in implementation and it may not seem all that "poor" anymore :)
I could give exact figures for that cost, but I'd be fired :)

For now at least, you can't have your 2-way DP with IEEE compliance and 300mW peak power for FMA.
I'm not asking for anything, I was just explaining some benchmark results and saying what I think of running some random FP code on the NEON unit due to a poorly performing FP unit.
 
Implementing PowerVR cores at a larger size in exchange for lower power consumption and/or higher clocks/performance would probably be the choice of a lot of semis in the mobile space if available.
 
I'm not asking for anything, I was just explaining some benchmark results and saying what I think of running some random FP code on the NEON unit due to a poorly performing FP unit.

Technically, you can set the FPSCR to disable rounding and turn on flush-to-zero. This lets even VFP instructions perform like NEON instructions (without the SIMD portion, of course). You still probably won't want to do DP though, due to the latency. This isn't true on the Cortex A8 but I believe it may be for the A9. It's true for Scorpion.
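As a sketch of which FPSCR bits are involved: in the ARM architecture, FZ (flush-to-zero) is bit 24, DN (default NaN) is bit 25, and the rounding mode field sits in bits 23:22, with 00 meaning round-to-nearest. The helper below only computes the new register value. Reading "disable rounding" as "leave the rounding mode at round-to-nearest" is an assumption, as is setting DN along with FZ, and the function name is made up. On real hardware the value would be read and written with VMRS/VMSR.

```python
FPSCR_FZ = 1 << 24             # flush-to-zero enable
FPSCR_DN = 1 << 25             # default NaN enable
FPSCR_RMODE_MASK = 0b11 << 22  # rounding mode field, 00 = round-to-nearest

def fast_mode_fpscr(fpscr):
    """Return an FPSCR value with FZ and DN set and round-to-nearest selected.

    Hypothetical helper for illustration; it does not touch any hardware.
    """
    fpscr |= FPSCR_FZ | FPSCR_DN
    fpscr &= ~FPSCR_RMODE_MASK
    return fpscr & 0xFFFFFFFF

print(hex(fast_mode_fpscr(0x00000000)))  # FZ and DN set on a cleared register
print(hex(fast_mode_fpscr(0x00C00000)))  # round-toward-zero reset to nearest
```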

And like another post pointed out, you likely shouldn't rely on that kind of precision and exception handling in your code anyway.
 
Implementing PowerVR cores at a larger size in exchange for lower power consumption and/or higher clocks/performance would probably be the choice of a lot of semis in the mobile space if available.

Well, let's face it: the die is already huge even w/o the entire GPU block. Now assume they hadn't used an MP2 but a single-core SGX543, as an example. The total die area of the SoC would fall to what? Somewhere around 105 mm2?
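A back-of-the-envelope version of that estimate, using numbers quoted earlier in the thread: 122 mm2 for the whole die and roughly 1/4 of it for the 543MP2. The assumption that a single core costs exactly half of the MP2 block is mine and ignores any logic shared between the cores.

```python
A5_AREA_MM2 = 122.0      # one reported figure for the A5 die
GPU_FRACTION = 0.25      # SGX543MP2 share, eyeballed from the die photo
SINGLE_CORE_SHARE = 0.5  # assumption: one core is half of the MP2 block

gpu_area = A5_AREA_MM2 * GPU_FRACTION       # area of the MP2 block
saved = gpu_area * (1 - SINGLE_CORE_SHARE)  # area freed by dropping one core
single_core_soc = A5_AREA_MM2 - saved       # estimated SoC with one SGX543

print(f"MP2 block: {gpu_area:.1f} mm2")
print(f"SoC with a single SGX543: {single_core_soc:.1f} mm2")
```

That lands within a couple of mm2 of the 105 mm2 guess, and still well above Tegra 3's 80 mm2.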
 
The nicely clocked Mali-400MP4 gives seriously impressive performance in the Hardkernel ODROID-A in GLBenchmark 2.0.
 
I don't think it's just a clock boost at play here: http://www.glbenchmark.com/compare....only=1&D1=Hardkernel ODROID-A&D2=Apple iPad 2

For those who won't notice at first sight: the Mali-400MP4 in the Hardkernel runs at 1366*768 vs. 1024*768 on the iPad 2.
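To put the resolution difference in numbers (pixel counts only, nothing about either GPU is modeled here):

```python
odroid_pixels = 1366 * 768  # Hardkernel ODROID-A
ipad2_pixels = 1024 * 768   # Apple iPad 2

extra_pct = (odroid_pixels / ipad2_pixels - 1) * 100

print(odroid_pixels, ipad2_pixels)
print(f"ODROID-A renders {extra_pct:.1f}% more pixels per frame")
```

So the Mali-400MP4 is pushing about a third more pixels per frame in those results.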

I've said it elsewhere already: there's simply NO excuse for any IHV out there not to have the driver running at its highest possible potential when a piece of IP ships in final products, and that obviously doesn't go just for ARM.

We all know how first impressions can work for the average reader; it shouldn't then come as a surprise if any company like NVIDIA paints funky diagrams like here on page 17:

http://www.nvidia.com/content/PDF/t...ing_High-End_Graphics_to_Handheld_Devices.pdf

As competition heats up, all IHVs have to realize how critical these things are. Granted, NV is a SoC manufacturer and IMG an IP firm, so there's no direct competition between the two, and yes, such stunts are typical of NV, but good hw inevitably also needs good sw.

No wonder GalaxyS2 results have been removed from the GLBenchmark2.0 database.
 
It's noticeable in the iPad 2 results that FSAA has zero penalty on the SGX543 platform, whereas it had a 20-30% penalty on both the SGX535 in the iPhone 4 and the SGX540 in the Galaxy Player.
 
Amongst other possible advantages of the MP2, you shouldn't forget that the 535/540 cores have 8 z/stencil. Each SGX543 should be at 16 z/stencil, so for an MP2 you end up at 32 z/stencil.
 