Cortex A-15 v Bulldozer/Piledriver

I would actually contest that. I think it's more that x86 has been (much) more heavily optimized for those kinds of loads. Either way, it's irrelevant -- what's relevant is that x86 + the software stack that will be available to it will be much faster at desktop-type loads than ARM + the software stack it has will be.

The issue isn't that people overestimate BD IPC, it's that you vastly overestimate what ARM will get. As I said earlier, the next ARM will probably reach or exceed x86 IPC on pure scalar number-crunching. It's just that that isn't a realistic load for a desktop/laptop PC. For tasks that the desktop user cares about, like spell-checking and grammar-checking text while laying it out, what matters is rapidly traversing complex and large data structures. And ARM really, really falls behind there. Even if they made awesome evolutionary improvements over their last generation, they are not going to get to half of BD IPC when running at 2.5GHz.
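To make that concrete, here is a rough sketch of the kind of load I mean (Python, purely illustrative; the node count is an arbitrary choice and interpreter overhead will dominate on any CPU, but the dependent-load access pattern is the point):

```python
import random
import time

# Rough pointer-chasing sketch: N dependent loads through a randomly
# ordered cycle of indices. Purely illustrative, but this latency-bound
# traversal stresses caches and the memory subsystem far more than
# dense scalar number-crunching does.
N = 1 << 20                      # ~1M nodes; the size is an arbitrary choice
order = list(range(N))
random.shuffle(order)
nxt = [0] * N
for k in range(N):               # link the shuffled order into one big cycle
    nxt[order[k]] = order[(k + 1) % N]

start = time.perf_counter()
i = 0
for _ in range(N):
    i = nxt[i]                   # each load depends on the previous one
elapsed = time.perf_counter() - start
print(f"{N} dependent loads in {elapsed:.3f}s")
```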

And I'm not an ARM hater. I actually like the arch quite a bit, and the future ARMv8 will probably be the cleanest and best mass-produced ISA ever. It's just that I have programmed on a lot of ARM systems, and I have programmed on a lot of x86 systems, and I think your expectations are really unreasonable.

You obviously have a great deal of experience working with both architectures and I respect that; perhaps the memory subsystems will work heavily in x86's favour... but I still stand by the claim that you could make an A15 ultrabook at least comparable to a BD one.

As I have stated, I don't expect A15 to get within 30% of BD, BUT that won't matter a jot for W8 Metro performance. In that form factor, what will matter is things like thermal budgets and power budgets, which could go towards beefing up the graphics side (ironically, AMD's spin vs Intel), decreasing the battery size (and therefore weight)... or just simply MORE battery life, full stop.

This isn't a comparison that could have been made with A9s (too weak) or Ivy Bridge (too powerful), but BD is low enough down the pecking order to take a stab at, and A15s will be in that ballpark (ish), IMHO.

OK, let's look at the other side of the equation: power. Realistically, how much do you think both will use at full load? And, as another comparison, at light load? :???:
 
EDIT 1: To clear up some confusion, this is about the CPUs themselves, assuming both ultrabooks use W8 Metro.

Just talking about the CPU... ;)

If you benchmarked, for example, AES encryption in hardware, the Trinity CPU would be expected to run circles around the A15.

Have to admit, though, the A15 looks hugely impressive and a massive step forward. Can't wait to see some comparisons between it and AMD/Intel solutions on tablets.
 

Well, the question is in relation to the form factor, taking battery life into consideration...

Well, we will be able to benchmark them in W8, won't we? :smile:
 
If you benchmarked, for example, AES encryption in hardware, the Trinity CPU would be expected to run circles around the A15.
http://www.anomalousanomaly.com/docs/CheckMark Results.pdf
Clock-for-clock, ARM9 seems to be nearly as fast as Core 2 at SHA. I know I've seen benchmarks where some ARM had HW instructions for encryption and it blew away pretty much anything else. Though yes, later x86 CPUs also have HW instructions for it.
 
Anecdotal evidence:

I have a small single-threaded Flash app that manages 8 fps on my Tegra 2 based phone (Cortex A9 @ 1 GHz). The same app manages 54-56 fps on a 2.56 GHz Core 2 Duo and 75-80 fps on my 2.66 GHz i7 920, all with Flash Player 11.1.

There is significant uncertainty related to the software stack. As an example, the Core 2 Duo only manages 38 fps under Ubuntu 11.

Adobe has made significant strides in Flash performance: with Flash Player 9 I had 2-3 fps on the Tegra 2, 20-25 fps on the C2D and 40-45 fps on the i7 (both under Windows 7). ARM had the largest relative jump in performance going from version 9 to 11.

Two things can be inferred from this: 1) x86 processors are very good at running unoptimized/suboptimal code, and/or 2) PC performance gets the most attention early on.

Anyway, the bottom line is that Intel x86 CPUs perform 2.5 to 3.5 times faster per clock compared to Cortex A9. Even if Cortex A15 improves IPC by 40-50%, it will be lagging.
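The rough arithmetic behind that range (a sketch using the fps figures above; it assumes fps tracks a single core's clock and ignores turbo and memory effects):

```python
# Per-clock throughput from the Flash Player 11.1 numbers above.
tegra2 = 8 / 1.0          # fps per GHz, Cortex A9 @ 1.0 GHz
c2d    = 55 / 2.56        # Core 2 Duo @ 2.56 GHz (midpoint of 54-56 fps)
i7     = 77.5 / 2.66      # Core i7 920 @ 2.66 GHz (midpoint of 75-80 fps)

print(f"Core 2 Duo per clock: {c2d / tegra2:.1f}x A9")   # ~2.7x
print(f"i7 920 per clock:     {i7 / tegra2:.1f}x A9")    # ~3.6x
# Even with a 50% IPC gain, A15 per clock would be ~1.5x A9,
# still well short of the 2.5-3.5x range above.
print(f"Hypothetical A15 (+50% IPC) vs C2D per clock: {1.5 / (c2d / tegra2):.2f}x")
```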

Cheers
 
Anyway, the bottom line is that Intel x86 CPUs perform 2.5 to 3.5 times faster per clock compared to Cortex A9. Even if Cortex A15 improves IPC by 40-50%, it will be lagging.

Cheers
Don't forget the "good enough" factor! ;)

But once ARM dedicates an architecture to something not meant to run on tiny batteries, things could change.
 

I'm sorry, although I don't doubt for a minute that some x86 will be VASTLY more powerful than ARM, the Core 2 Duo comparison doesn't hold up in my opinion... Tegra 2 has no MPE (NEON) and terrible bandwidth, and is generally considered to be the worst A9 implementation out there...
A15 also has a vastly superior memory subsystem to A9 (an ARM weakness), and don't forget Flash is GPU-accelerated anyhow.

Also, the comparison brings desktop memory AND cache into play, which no doubt, alongside the floating-point units and software optimisations, helps tremendously over a raw perf/GHz comparison.

I have an Atom netbook, and it is simply abysmal at running ANYTHING... I also have an old Athlon XP @ 2.0GHz which is only marginally better and struggles with anything 'HD'... integer-wise, BD will not be a trillion miles off Athlon XP.

If this is any indication of real-world performance improvements over A9, then it could get interesting:
OMAP 5 series dual-core A15 @ 800MHz vs Tegra 3 quad @ 1.3GHz
http://www.slashgear.com/ti-omap-5-blows-past-quadcore-tegra-3-23215003/

The lower-clocked A15s destroy the A9s... so I don't think using Tegra 2 is going to be a fair comparison, especially since the processors you compare them with are high-TDP desktop parts.
 
I'm sorry, although I don't doubt for a minute that some x86 will be VASTLY more powerful than ARM, the Core 2 Duo comparison doesn't hold up in my opinion... Tegra 2 has no MPE (NEON) and terrible bandwidth, and is generally considered to be the worst A9 implementation out there...
A15 also has a vastly superior memory subsystem to A9 (an ARM weakness), and don't forget Flash is GPU-accelerated anyhow.

The particular Flash app is simply ActionScript crunching numbers, with no support for SIMD (unless Adobe's runtime does something clever); there is *zero* aid from the GPU.

And I don't buy the bandwidth thing. While my Core i7 has ten times more bandwidth than the Tegra 2, I can run 5 instances of the Flash app on my PC and get a compound 300 fps, 35 times that of the Tegra 2. If performance scaled with bandwidth, it would be less than half that.
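The rough numbers behind that (a sketch using the figures above):

```python
# Bandwidth vs. throughput sanity check from the figures above.
tegra2_fps   = 8
pc_compound  = 300                 # ~5 instances at ~60 fps each
bw_ratio     = 10                  # i7 has roughly 10x the memory bandwidth

speedup = pc_compound / tegra2_fps
print(f"Compound speedup: {speedup:.0f}x")          # ~37x
# If the app were bandwidth-limited, the speedup couldn't exceed the
# bandwidth ratio -- i.e. ~10x, well under half of what we actually see.
print(f"Bandwidth-limited ceiling: {bw_ratio}x")
```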

And yes, Atom sucks.

Cheers
 
And yes, Atom sucks.

Cheers

We agree on something then!:D

Still, you can't get around the cache, the bandwidth, and the higher TDP it's optimised for... I very much doubt perf/watt is going to be 35 times ARM's somehow ;)

And besides, I'm not sure all those factors would scale linearly, never mind the software... let's wait for W8 Metro tablets/ultrabooks; it's either going to put ARM in the game and get Intel worried (as node shrinkage is getting vastly more expensive, with diminishing returns as you go down) OR prove to everyone just how much more powerful x86 really is compared to ARM! :oops:

I'm going with ARM in the short run... and Intel to win overall (sorry AMD :cry:)
 
We agree on something then!:D

Still, you can't get around the cache, the bandwidth, and the higher TDP it's optimised for... I very much doubt perf/watt is going to be 35 times ARM's somehow ;)

Single-thread performance doesn't scale linearly with power. You can't compare perf/watt between the fastest processor in the world and something designed for a sub-watt power envelope.

The context of this thread is Cortex A15 vs Bulldozer. A15 will be nowhere near the performance of Bulldozer at the same frequency, but it will use a lot less power.

And besides, I'm not sure all those factors would scale linearly, never mind the software... let's wait for W8 Metro tablets/ultrabooks; it's either going to put ARM in the game and get Intel worried (as node shrinkage is getting vastly more expensive, with diminishing returns as you go down) OR prove to everyone just how much more powerful x86 really is compared to ARM! :oops:

I'm going with ARM in the short run... and Intel to win overall (sorry AMD :cry:)

Funny, I kind of like the idea of a Bobcat W8 tablet.

Cheers
 
Single-thread performance doesn't scale linearly with power. You can't compare perf/watt between the fastest processor in the world and something designed for a sub-watt power envelope.

The context of this thread is Cortex A15 vs Bulldozer. A15 will be nowhere near the performance of Bulldozer at the same frequency, but it will use a lot less power.

True, but my response was in the context of your comparison between a smartphone-optimised SoC (and a poor one at that) and two fully fledged desktop CPUs... all I'm saying is that form factors/thermal limits ARE going to bring the difference down considerably. Conversely, 4x Cortex A15 with 4MB cache and DDR3 @ 2.5GHz is going to be WAY WAY more powerful than 2.5x a Tegra 2 (using the clock-speed comparisons you used to determine the x86 performance advantage)...

The other thing you haven't taken on board is that even a Core 2 Duo has better IPC than BD anyhow...

I'm sticking with my assumption: A15 to be within 30% of BD, but 3-4x less power hungry (and cooler), which would allow either better battery life, or the same battery life but a bigger graphics budget; either way, a better bet for an ultrabook form factor. ;)
 
We'll see how Trinity turns out. I'm thinking ultrabook-alikes are the intended target because that's the current hotness in PC notebooks.
 
I have an Atom netbook, and it is simply abysmal at running ANYTHING... I also have an old Athlon XP @ 2.0GHz which is only marginally better and struggles with anything 'HD'... integer-wise, BD will not be a trillion miles off Athlon XP.

Bulldozer performs badly, but not that badly.

Assuming the Dothan Pentium M is 1.0.

Core Duo = 1.08
Core 2 = 1.3
Core 2 Penryn = 1.4

Deneb = Penryn minus a few %, so 1.3?
Bulldozer = Deneb minus a few %, so 1.2?

Overclocked Core Duos were a few % faster than Athlon X2s.

Athlon X2 = 1.05?

There's basically no IPC gain from Athlon 64 to Athlon X2. Athlon 64's were 30% faster than Athlon XPs.

That still makes Bulldozer nearly 50% faster per clock than the Athlon XP (1.2 vs 0.81). That's not a small gain in the CPU world, and it's in single thread. Older processors simply scaled up to more cores won't perform as well. Real-world applications will show greater differences with multi-core (if all the compared CPUs were multi-core), ISA and platform enhancements.

Athlon XP = 0.81

Pentium M Banias = 0.94
Banias 512 = 0.85
Atom = 0.5 (a 1.86GHz Atom was roughly equal to a 1.0GHz Pentium M Banias with 512KB L2)
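Collecting my rough per-clock estimates above into one place (a sketch; these are just the guesses listed in this post, normalized to Dothan = 1.0, not measurements):

```python
# Rough relative per-clock estimates from the post above (Dothan = 1.0).
relative_ipc = {
    "Pentium M Dothan": 1.00,
    "Pentium M Banias": 0.94,
    "Banias 512KB":     0.85,
    "Athlon XP":        0.81,
    "Athlon X2":        1.05,
    "Core Duo":         1.08,
    "Core 2":           1.30,
    "Core 2 Penryn":    1.40,
    "Deneb":            1.30,
    "Bulldozer":        1.20,
    "Atom":             0.50,
}

def per_clock_ratio(a, b):
    """How much faster per clock 'a' is estimated to be than 'b'."""
    return relative_ipc[a] / relative_ipc[b]

print(f"Bulldozer vs Athlon XP: {per_clock_ratio('Bulldozer', 'Athlon XP'):.2f}x")  # ~1.48x
print(f"Bulldozer vs Athlon X2: {per_clock_ratio('Bulldozer', 'Athlon X2'):.2f}x")  # ~1.14x
```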

Conversely, 4x Cortex A15 with 4MB cache and DDR3 @ 2.5GHz is going to be WAY WAY more powerful than 2.5x a Tegra 2 (using the clock-speed comparisons you used to determine the x86 performance advantage)...

Yes, but it's a 2.5x clock-for-clock advantage, assuming the previous poster has it right and taking the lower end of "2.5-3.5x". Then it has to be clock-normalized. Bulldozer also has 8 cores (or, if you prefer, 4 cores with big multi-thread gains). What about against a 2GHz Trinity (based on being roughly on par with 35W Llano parts, and with the ~10% clock-for-clock disadvantage of Bulldozer-based architectures), with a 17W TDP and 4 cores which include a powerful GPU?
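For the sake of argument, a rough clock-normalized sketch (every ratio below is an assumption pulled from this thread, not a measurement):

```python
# Very rough per-thread throughput estimate, clock-normalized to Cortex A9 = 1.0.
# Every ratio below is an assumption from the thread, not a measurement.
a9_per_clock      = 1.0
a15_per_clock     = 1.5 * a9_per_clock   # assume a ~50% IPC gain over A9
trinity_per_clock = 2.25 * a9_per_clock  # assume ~10% under the 2.5x low-end x86 figure

a15_thread     = a15_per_clock * 2.5     # quad A15 @ 2.5 GHz
trinity_thread = trinity_per_clock * 2.0 # Trinity @ 2.0 GHz (17W-class clocks)

print(f"Per-thread estimate, A15 vs Trinity: {a15_thread / trinity_thread:.2f}x")  # ~0.83x
```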
 
DAVID C: Nice breakdown; how did you come to those conclusions? Any supporting evidence would be great to add to the thread to get a better idea.

Well, it certainly looks like Bulldozer was designed to operate best at 17-35W; as mentioned in my starting post, it was also designed to carry the class-leading AMD graphics... BUT Cortex A15 was also designed to run with the A7, so power consumption would decrease even more, allowing for better graphics...

Based on what I have seen, assuming the same memory/cache/clocks, BD would not be 50% faster than Athlon 64 in single thread.

If you check the SYSmark 2007 comparison I used above, a lower-clocked, older Core 2 Duo with fewer threads, less L2, no L3 and likely (not sure) slower memory gets the SAME score...

Also, the comparison was to be a 2-module (4 threads rather than 8) Piledriver (so Trinity). It's hard to quantify things like graphics, as we have no performance data from, say, the IMG Tech Rogue series, or say a Kepler SMX in Tegra 4, so it's difficult to include graphics unless we make it fair and just say a default Nvidia discrete part?

The other point you make is that a Trinity 17W part is going to be clocked lower than that to fit into the TDP... however, A15s would have room to push the clock speed higher than Piledriver... that's almost a separate discussion...

What do you think the power consumption of both setups would be at 2.5GHz?
 

Hi again French Toast

Love your persistence (for now), but really your whole argument is based on a major fallacy.

CPU performance is not measured by a single score (SYSmark 2007). Every architecture has strengths and weaknesses, and you are seemingly trying to divorce a CPU execution core from its memory/cache/clocks -- what the hell are you actually trying to compare?

It is IMPOSSIBLE to compare an Athlon X2 against a Bulldozer/Piledriver within such narrow, artificial and rather stupid criteria -- e.g. Bulldozer works with DDR3, the Athlon X2 does not.

The other issue is that different benchmarks show different performance differences.

Here is a comparison between the Athlon X2 6400+ and the AMD FX-8150 (not ideal, as the clocks are not the same, but they are close -- a 12.5% difference):

http://www.anandtech.com/bench/Product/27?vs=434

Now I am going to do to you what you have been attempting to do to us (against my better judgement) and cherry-pick one of the best-case performance increases for Bulldozer using Anand's bench between the two processors, normalised for clock speed.

Excel 2007 SP1 Monte Carlo simulation (lower is better):
(BD) 14.2 - (AthX2) 65.9
Assuming a 12.5% clock-speed disparity and linear clock scaling:
(BD) 16 - (AthX2) 65.9

So, clock for clock, Bulldozer is roughly 4.1x faster than the Athlon X2.
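Spelled out (a sketch using the same numbers as above; lower time = better):

```python
# Clock-normalizing the Excel 2007 SP1 Monte Carlo result above (lower is better).
bd_time, x2_time = 14.2, 65.9
clock_disparity  = 0.125            # FX-8150 clocked ~12.5% higher than the X2 6400+

bd_normalized = bd_time * (1 + clock_disparity)   # pretend BD ran at the X2's clock
print(f"BD normalized: {bd_normalized:.1f}s")      # ~16s
print(f"Speedup: {x2_time / bd_normalized:.1f}x")  # ~4.1x
```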

Bulldozer is also approx. 1.8x as fast as a Core 2 Duo E8600 3.33GHz.

As far as we currently know, Piledriver is going to be faster still per clock.

Thanks for listening... damn stats.
 
To be fair, that was the first and ONLY time I have used Anand's bench... I actually thought I was being a smartass by just finding the one 'nugget' that made my point! ;)

The things that could narrow the gap are the fact that the Athlon X2 is running with 1/4 the L2 and no L3 (relevant because that's the comparison I used), likely crappy memory, and the fact that the BD part is turbo'd up to 4.2GHz (did you factor in a full +25% clock frequency?). AND let's not forget BD has 8 threads AND 2x the RAM, for crying out loud!!

Also, after a quick search, here is a result with that Core 2 Duo I tagged that BEATS the BD part... so I don't know why you mention that comparatively weak Core 2 Duo...
http://www.anandtech.com/bench/Product/54?vs=434

In this full comparison, the E8600 actually beats the newer, more highly spec'd BD part in some tests, with only 2 threads, DDR2, 1/2 the RAM, a lower clock speed AND only 6MB of L2 cache...

If, like I said, we levelled the playing field, also using THE SAME SOFTWARE, the Athlon XP wouldn't be a million miles off the SINGLE THREAD IPC of BD (it's hard to tell?)... because it's so old, it's hard to get a fair comparison out of a bench like that, as it will ALWAYS favour the newer processors... but it's interesting nonetheless.

Here is a comparison that shows a lower-spec'd/lower-end Athlon X2 @ 2.8GHz with 1/4 the L2, half the cores and crap memory can actually BEAT the Husky-core A6-3650 @ 2.6GHz in some tests...
http://www.anandtech.com/bench/Product/403?vs=90
It is generally considered that the IPC of BD is slightly WORSE than Husky's... so level the playing field on this test... you work out the math!! ;)
 
French Toast,

Bulldozer's lower-thread-count scores are penalized by the module architecture: http://techreport.com/articles.x/21865/2

That makes single-thread scores look artificially lower than they really are. First, I looked over my numbers: I had noted that I wasn't sure about the AMD results, but they should be more accurate now.

"Revision 1.1"


Starting from 1.4 on Penryn
Deneb = 1.25
Bulldozer = 1.2
Athlon X2 = 1.0

Notes: SYSmark isn't a single-threaded program, and it benefits a little from going beyond 2 cores. The maximum practical benefit stops at 4 cores; there's a gain going from 2 to 3 and another from 3 to 4.

- Phenom II X2 550 BE 3.1GHz vs Athlon X2 6000+ 3.0GHz = 25.5% advantage for the former, with a 3% frequency difference. I've put Deneb at 1.25 and the X2 at 1.0.

- FX-8150 3.9GHz (Turbo Core) vs Phenom II 980 BE 3.7GHz = 7% performance advantage for the latter, with the former having a 5.4% frequency advantage. But what if we can gain 10% for Bulldozer by manually assigning cores and threads? That would turn into a 2.8% advantage for Bulldozer. Deneb at 1.25 and Bulldozer at 1.2.

- Core 2 Q9650 3.0GHz vs Phenom II 940 = 14.8% advantage for the former. Penryn at 1.4 and Deneb at 1.25.

- Core 2 Duo E8400 3.0GHz vs Athlon X2 6000+ 3.0GHz = 43.6% advantage for the former. Penryn at 1.4 and Athlon X2 at 1.0.

- Core 2 Duo E8400 3.0GHz vs Core 2 Duo E6850 3.0GHz = 2.7% advantage for the former. Penryn at 1.4 and Core 2 at 1.3.
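As a rough cross-check (a sketch: it just divides each observed advantage by the frequency advantage, which is crude because the deltas also fold in multi-core and platform effects):

```python
# Backing out rough per-clock ratios from the benchmark deltas listed above.
def per_clock(observed_advantage, freq_advantage):
    return (1 + observed_advantage) / (1 + freq_advantage)

print(f"Deneb vs Athlon X2:  {per_clock(0.255, 0.03):.2f}")  # ~1.22 vs the 1.25/1.0 estimate
print(f"Penryn vs Deneb:     {per_clock(0.148, 0.0):.2f}")   # ~1.15 vs 1.4/1.25 = 1.12
print(f"Penryn vs Athlon X2: {per_clock(0.436, 0.0):.2f}")   # ~1.44 vs 1.4/1.0  = 1.40
print(f"Penryn vs Core 2:    {per_clock(0.027, 0.0):.2f}")   # ~1.03 vs 1.4/1.3  = 1.08
```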

The examples show my numbers are roughly correct. Also, determining single-thread performance is hard to do. Penryn, for example, might look better elsewhere because the focus was on speeding up media applications. Bulldozer looks worse because the modules and the shared FP unit aren't ideal for single threads. No point in discussing that in theoreticals, though.

Well, it certainly looks like Bulldozer was designed to operate best at 17-35W; as mentioned in my starting post, it was also designed to carry the class-leading AMD graphics... BUT Cortex A15 was also designed to run with the A7, so power consumption would decrease even more, allowing for better graphics...

What's the logic of comparing TDP, which is a designation of practical maximum power (and for a lot of applications both Intel and AMD chips fall significantly below TDP), with the A7, which is for very light-load use? If you increase the frequency and double the cores, the power usage will skyrocket. Didn't someone say the PS Vita's quad A9 and SGX543MP4+ had a TDP rating close to 5W? Even going to 28nm, a significantly higher CPU frequency and a new architecture would increase TDP a lot.
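For a rough feel of why it skyrockets: dynamic power scales roughly as cores x C x V^2 x f, and higher clocks usually need higher voltage. A sketch with made-up numbers:

```python
# Rough dynamic-power scaling: P ~ cores * C * V^2 * f (capacitance C held fixed).
# All figures below are illustrative guesses, not measurements of any real SoC.
def relative_power(cores, freq_ghz, volts, base=(2, 1.0, 1.0)):
    base_cores, base_f, base_v = base
    return (cores / base_cores) * (freq_ghz / base_f) * (volts / base_v) ** 2

# Doubling the cores, doubling the frequency, and needing ~20% more voltage:
print(f"~{relative_power(4, 2.0, 1.2):.1f}x the baseline dynamic power")  # ~5.8x
```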
 
DAVID C: Thanks for the breakdown. So, from your numbers, if I have read them correctly, BD is 20% faster than the Athlon X2??
If so, that does sound reasonable and pretty much confirms what I thought looking at all those benchies...

What we need to do now is draw a comparison from the XP to the Core Duo, as used in the Phoronix Test Suite, then draw some approximate numbers from the A9-to-A15 projected performance differences, factoring in the increases in cache/clock/memory we are using for this hypothetical comparison.

What's the logic of comparing TDP, which is a designation of practical maximum power (and for a lot of applications both Intel and AMD chips fall significantly below TDP), with the A7, which is for very light-load use? If you increase the frequency and double the cores, the power usage will skyrocket. Didn't someone say the PS Vita's quad A9 and SGX543MP4+ had a TDP rating close to 5W? Even going to 28nm, a significantly higher CPU frequency and a new architecture would increase TDP a lot.

Well, the point is Piledriver will not be running at 2.5GHz in a 17W TDP... it would underclock/undervolt to a lower frequency; in fact, you would be lucky to get turbo up to 2.5GHz in a 2-module Piledriver in an ultrabook form factor... though I could be wrong.

With A15, the running power consumption of the cores is significantly less, obviously, as it is designed to operate as far down as smartphones... but they can't undervolt/underclock (like Krait/BD can), so having the A7 would allow the A15s to use less power on menial tasks, increasing average battery time (as opposed to no A7).

Also, whilst we are discussing IPC between the two architectures at that set frequency, and using that ultrabook form factor as a guide, the fact is the A15s would have the ability to run at a higher clock than Piledriver, negating the projected 30% IPC difference. This would allow the A15s to at least equal and probably beat the Piledriver cores in the same TDP... if you catch my drift.

If you don't need Piledriver-matching performance, you could allocate some of that TDP to increased graphics...

Speaking of power consumption, nobody so far has touched on what the A15s would be capable of, both at load and at minimal/average use...

EDIT: forgot to add the benchmark link for the A6-3650 2.6GHz vs Athlon X2 2.8GHz:
http://www.anandtech.com/bench/Product/403?vs=90
 
With A15, the running power consumption of the cores is significantly less, obviously, as it is designed to operate as far down as smartphones... but they can't undervolt/underclock (like Krait/BD can), so having the A7 would allow the A15s to use less power on menial tasks, increasing average battery time (as opposed to no A7).
What? Of course A15 can underclock and undervolt. The problem is that it obviously cannot undervolt below the process minimum, and it's also a very big core in handheld terms, so that means high leakage.

BTW, this is certainly an interesting and important thread, although I'm not sure how much of the analysis I agree with on any side. In a general sense I can certainly see a quad-core 3GHz A15 being extremely competitive with a 2GHz Trinity (2 modules/4 cores) in every metric - it's just really hard to get 50% higher single-threaded IPC than an A15-level core (Intel is probably at least that good, but we're talking about Piledriver here).
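Put in numbers (a trivial sketch; the only input is the clock ratio, and it says nothing about which side actually achieves it):

```python
# For a 2 GHz Trinity to match a 3 GHz quad A15 per thread (core counts equal),
# its per-clock advantage has to cover the clock gap.
a15_clock, trinity_clock = 3.0, 2.0
required_ipc_advantage = a15_clock / trinity_clock - 1
print(f"Required per-clock advantage for Trinity: {required_ipc_advantage:.0%}")  # 50%
```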

Of course we'll need another generation to get 64-bit support...
 
What? Of course A15 can underclock and undervolt. The problem is that it obviously cannot undervolt below the process minimum, and it's also a very big core in handheld terms, so that means high leakage.

Well, I'm not sure of the proper workings in detail; I was just going off Krait's supposed advantage over the default ARM Cortex cores... SMP or something like that. Looks like I've got it wrong?

Yeah, I'm sure I'm right: A15 would be a really good match in that form factor... also, the talk about IPC between the two is related but really a separate discussion from which 'set-up' is best for an ultrabook/tablet, all things considered.

Another interesting discussion would be something like AMD Brazos vs Qualcomm S4... maybe Exynos 5450 vs Brazos?
 