Cortex A-15 vs Bulldozer/Piledriver

Before I get shot down in a ball of flames: I know they are aimed at different markets and, for the most part, are not designed to face off against each other. However, seeing as Piledriver is AMD's CPU architecture going forward, Cortex A-15 is ARM's 'daddy core', and both will probably end up in some kind of Windows 8 tablet/netbook, I think it's worth a look... just for the hell of it.

Bulldozer has less than stellar IPC compared with the Phenom II class (best case, on par), as it was designed with different parameters in mind: fitting more threads efficiently onto a small die area, integrating CPUs and GPUs together on-die in a 'modular' approach, and increasing clock speed, all while keeping IPC on par with K10.5 (hope I've summarised that correctly, phew!).

Cortex A15, on the other hand, is ARM's effort to build a much more powerful and complex CPU than anything it has done in the past.
Sort of like a mini K10.5 core (according to Arun's article, they have some similarities). How far behind a Piledriver core, then, would A15 be in IPC, on both outright performance and performance per watt?

-Cortex A-15 quad with 4 MB L2 cache @ 2.5 GHz...
-2 Piledriver modules with a comparable cache setup (no L3) @ 2.5 GHz...
-Both with comparable dual-channel 64-bit DDR3 setups...
-Assuming both are built on GlobalFoundries' 32nm gate-first HKMG process...
-Assuming an ultrabook form factor with a 17 W TDP (as a guide)...
-Both hypothetical systems running Windows 8 Metro.

I'm not knowledgeable enough to answer this question myself, but I'm sure there are plenty on here willing to get stuck in ;)... discuss.

------------------------------------------------------------------------------------------------------------------------------------------------------
EDIT 1: To clear up some confusion, this is about the CPUs themselves, assuming both ultrabooks run W8 Metro.

EDIT 2: Of course, Cortex A15s are designed to be used with Cortex A7s in big.LITTLE, which would change the power consumption for general use considerably in ARM's favour.

On the other hand, to be fair, others would argue that Bulldozer is designed to work alongside AMD graphics in an ultrabook form factor... so for now we leave both out of the equation and just assume default Nvidia graphics IP.

EDIT 3: Through discussion on this thread it has become clear that the Cortex A-15 vs AMD Bulldozer IPC comparison, while related, is a separate discussion in its own right from whether A-15 or BD is the better architecture for ultrabooks.

Still, it's best to use a 17 W TDP as a guide for the IPC comparison, as the x86 core can scale to over 100 W TDPs, 4.2 GHz and 16 MB of total cache; that wouldn't be anywhere near fair, and such parts would never face off against each other.
 
Bulldozer will likely blow away Cortex A-15 in single-threaded performance, though it will consume more power. AMD's competitor to ARM is Bobcat/Jaguar.
 
Bulldozer will likely blow away Cortex A-15 in single-threaded performance, though it will consume more power. AMD's competitor to ARM is Bobcat/Jaguar.

Would it, though? Besides, Bobcat is only slightly better than A9, am I right? I get the feeling it would be closer than a lot of people think...
 
In sheer computational performance, they are probably pretty close. However...
... with a comparable cache setup (no L3) @ 2.5 GHz...
Both with comparable dual-channel 64-bit DDR3 setups...
High-end memory/cache subsystems are *really* hard. Arguably much harder than making the actual cores. They are the area where AMD most trails Intel at present, and they are also the area where ARM CPUs have traditionally been at their very weakest. No ARM CPU has ever had a memory subsystem anywhere near the performance of modern x86.

The performance you get out of a Cortex A9 depends largely on how much the particular benchmark stresses the memory side. In pure register integer code it actually smokes Atom and gets very close to Bobcat, but on benchmarks that put a realistic load on memory, it falls far behind.

So, with a comparable memory subsystem I'd expect it to get close to BD at similar clocks. I just find it completely implausible that it will have that comparable memory subsystem.
 
Fair evaluation, but how would that affect real-world performance? For instance, if we assumed an ultrabook form factor (i.e. very thin) and stuck both in there, what difference would you be able to tell, and how would they both affect battery life? Granted, you wouldn't need to thrash them most of the time, but the A15, thrashed, would consume what, about 5 W under load? The Bulldozer must be hitting 15 W?
 
Would it, though? Besides, Bobcat is only slightly better than A9, am I right? I get the feeling it would be closer than a lot of people think...
I don't have answers; hopefully Win8 benchmarks will show us, but I expect A15 to be close to Bobcat performance, not Bulldozer.
 
Fair evaluation, but how would that affect real-world performance?
It will determine real-world performance. On most desktop loads the A15 will fall far, far behind BD. It'll do really well at calculating the first 40 Fibonacci numbers, though.
what difference would you be able to tell?
The BD one would be able to run loads like web browsing, Flash, productivity applications and such without noticeable slowdowns. The A15 wouldn't.

and how would they both affect battery life? Granted, you wouldn't need to thrash them most of the time, but the A15, thrashed, would consume what, about 5 W under load? The Bulldozer must be hitting 15 W?
I expect the A15 at 2.5 GHz would consume quite a bit more than 5 W. However, it would scale down to lower frequencies and power use much more gracefully.
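On the Fibonacci quip above: it's a good example of a workload whose entire working set is a couple of live integers, so it fits in registers and L1 cache and tells you nothing about the memory subsystem. A minimal sketch (Python, purely illustrative; the function name is my own):

```python
def first_fibs(n):
    """Return the first n Fibonacci numbers (0, 1, 1, 2, 3, ...).

    The live state is just two integers plus the output list, so a
    loop like this never stresses caches or DRAM -- which is why even
    a simple core with a weak memory subsystem scores well on it.
    """
    fibs = []
    a, b = 0, 1
    for _ in range(n):
        fibs.append(a)
        a, b = b, a + b
    return fibs

print(first_fibs(40)[-1])  # -> 63245986
```

Web browsing and productivity loads are the opposite case: large, pointer-heavy working sets where the memory side dominates.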
 
The A15 will have to be about 10x faster than my Nook Color's A8 @ 1.3 GHz to match a recent desktop PC on today's 'rich media' (a.k.a. ad-laden eye-candy nonsense) web pages. Flash often isn't even shown by default in these browsers; the JavaScript alone is enough to slow them down to a Pentium III-like browsing experience. I'm looking forward to a 7" tablet with one, but I can't imagine it even competing with an Athlon 64 X2.

On the other hand, considering how little power the A15 uses for what it does, it's in a class of its own. ARM alone has enabled some really cool, decently performing handheld devices.
 

The speed of typical compiled object-oriented desktop software is basically determined by how fast you can chase pointers. Relatively little time is spent doing math, and a lot of time is spent waiting for results from memory fetches, which often contain jump targets or further addresses to fetch. So a 1 GHz CPU with much better caches, prefetchers and branch predictors can generally beat the pants off a 2 GHz CPU with a more primitive memory subsystem.

This is quite visible on x86: many mobile Sandy Bridge Celerons have (non-SIMD) arithmetic performance similar to the fastest P3 CPUs out there, yet they are really, really a lot faster in practice.
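The pointer-chasing point can be made concrete with a toy microbenchmark. The sketch below (Python, function names my own; in C the timing gap is far more dramatic, since interpreter overhead masks memory latency here) encodes a linked list as a cycle of array indices. Each load in the traversal depends on the result of the previous one, so the CPU cannot overlap the misses, and a shuffled layout defeats both cache locality and simple hardware prefetchers:

```python
import random

def build_chain(n, shuffled=True):
    """Encode a singly linked list as an index array: next_idx[i] is
    the successor of node i, forming one cycle through all n nodes.
    A shuffled layout makes each hop a likely cache miss."""
    order = list(range(n))
    if shuffled:
        random.shuffle(order)
    next_idx = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        next_idx[a] = b
    return next_idx, order[0]

def chase(next_idx, start, steps):
    """Traverse the chain. Every load depends on the previous result,
    so nothing can be prefetched or reordered by the hardware."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i

next_idx, start = build_chain(1 << 16)
# One full lap around the cycle ends where it began.
assert chase(next_idx, start, 1 << 16) == start
```

Timing `chase` on a shuffled vs. sequential chain (e.g. with `timeit`) is the classic way to expose cache and prefetcher quality, which is exactly the dimension where these ARM cores trail x86.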
 
So a 1 GHz CPU with much better caches, prefetchers and branch predictors can generally beat the pants off a 2 GHz CPU with a more primitive memory subsystem.

That's a good point. Too many people focus on the data rate and width of the memory interface. What really matters for real-world applications are all the secret-sauce aspects that make efficient use of the available bandwidth: the number of operations in flight, reordering, prefetching, combining, stride detection and so on, doing all of that with as little latency as possible, and feeding it into a high-performance cache subsystem.

Even if some hypothetical ARM design catches up on the CPU side, it also needs to match Intel and AMD on the uncore side.
 
The fact is, the whole memory pipeline is the hidden "art" of any modern microprocessor architecture design. With ever-decreasing transistor costs, it's relatively easy to pack in a bunch of ALUs and SIMD units, but feeding them with the right data in a timely manner is another story. The problem for all those ARM designs is that they are meant to work in a very resource-constrained environment: on batteries. Accessing any off-chip interface consumes much more power than communicating with a device on the SoC itself, and a high-performance, efficient memory system means both much more signal load on the external bus and more frequent activation of the DRAM devices. System memory is one of the few components left out of SoC integration, and it will be some time until a viable solution is found for this. For the time being, there is simply no incentive to waste power on a state-of-the-art memory pipeline in your average smartphone CPU. But if ARM is to have any aspirations for the server/desktop/portable market, they have to build up some know-how in the area. Intel has a huge head start, and it only grows with every new architecture generation.
 
Guys, I have some benchmarks to throw into the mix; make of them what you will. The ones to look out for are the Exynos 4210, the PandaBoard ES (OMAP 4430), the Tegra 3 scores compared to the PandaBoard, and the Trim Slice containing Tegra 2. They are going up against a number of x86 offerings of different flavours. Some of the tests focus on memory and cache, and the results are somewhat closer than you would expect against the dual-core, higher-clocked Core 2 Duos.

Trim Slice (Tegra 2) & PandaBoard ES (OMAP 4430) vs x86
http://www.phoronix.com/scan.php?page=article&item=compulab_trimslice&num=4

Tegra 3 vs PandaBoard ES, used to gauge against x86
http://www.phoronix.com/scan.php?page=news_item&px=MTA3MjQ

Exynos 4210 & PandaBoard ES vs x86
http://openbenchmarking.org/result/1201051-AR-1112277AR91

Most recent PandaBoard ES vs x86, running Ubuntu 12.04
http://www.phoronix.com/scan.php?page=article&item=ubuntu_1204_armfeb&num=1

The OMAP 4430 does not fare as well against the low-end but higher-clocked x86 cores, but some ARM chips do quite well. Bear in mind the ARM chips are optimised for smartphones and clocked/cached accordingly; the hypothetical scenario above would have both architectures optimised for an ultrabook form factor, with similar clocks, memory and cache.

Bulldozer is not far off the IPC of that Core 2 Duo. All things being equal, with Cortex A15s in there, we would be looking at something very close, IMHO ;)
 
This has a lot to do with using a somewhat poor benchmarking suite (being lenient here). Ask David Kanter how he feels about the PTS thing; he has a pretty decent point.
 
How about something like this:

[image: sunspider.gif (SunSpider benchmark results)]

Check out the diminishing returns here.


[image: 44379.png (benchmark chart)]


MSM8960 is somewhere in between A15 and A9.

[image: 42762.png (benchmark chart)]


I get the impression that these CPUs are very far behind anything K10- or Conroe-like.

BTW, my Nook Color's OMAP 3621 (A8) @ 1.3 GHz is just a touch under 3000 ms.
 
Well, that is a very dramatic point, BUT I have read somewhere that those types of tests are more heavily optimised for x86, and are also run across different operating systems... so is that a fair comparison?

For the record, I expect Bulldozer to win on outright performance, but obviously the A15 to win by far on power consumption. The thing I'm trying to get at is: will ARM's best chip be a better solution than AMD's, and maybe Intel's, for anything up to an ultrabook, all things considered (including battery life)?

I think it could be quite close, a lot closer than people think, and that may mark the first time ARM can encroach into x86 territory with both sides using their best troops...
 
Well, that is a very dramatic point, BUT I have read somewhere that those types of tests are more heavily optimised for x86, and are also run across different operating systems... so is that a fair comparison?

ARM claims that the V8 JIT led to a large speedup executing JavaScript. Now, that doesn't mean V8 is optimal for ARM – they could have been so far behind that even with a big speedup there is still performance to be found – but I don't think it's right to say such tests particularly favour x86. A lot of effort is going into making JavaScript faster on ARM in general, and on Android in particular.

I don't expect miracles from A15 or A20; catching up with Merom from 2006 would be impressive, never mind Sandy Bridge. The only ARM core we know of that is chasing x86-like performance is Denver, as it is targeting the HPC market and will need to displace x86 paired with GPUs. Its closest competitor is likely to be from AMD, as HPC purchasers start evaluating AMD's x86 APUs for future multi-PFLOPS-scale systems.
 
Guys, I have some benchmarks to throw into the mix... some of the tests focus on memory and cache, and the results are somewhat closer than you would expect against the dual-core, higher-clocked Core 2 Duos... [benchmark links snipped]

Bulldozer is not far off the IPC of that Core 2 Duo. All things being equal, with Cortex A15s in there, we would be looking at something very close, IMHO ;)

They are Core Duo, not Core 2 Duo. There's quite a bit of IPC difference there. I'd also expect BD to have higher IPC than C2D; I haven't looked into it in detail, but I'd guess somewhere between Penryn and Nehalem?
 