Microsoft takes an Arm architecture license

What would they gain from designing their own ARM uArch that is worth breaking BC and taking on massive costs and engineering effort, when the alternative is to write a cheque to IBM?

BC has been and always will be a canard. It doesn't really matter in the console space.
 
You missed two: DEC and Intel.
And I'm fairly sure that Samsung has one as well.

Intel probably got rid of their license when they sold their XScale assets to Marvell - maybe outright sold the license. DEC doesn't even exist. It hasn't for a long time. I think you're confusing "has a license" with "had a license."

Why do you think Samsung has one? Their Hummingbird CPUs are just particular implementations of Cortex-A8 netlists.

That partially has to do with a difference in strategy: do >2x the work in <1/2 the time and go back to sleep, vs. burn less power for longer periods of time. So far, all the data points to hurry-up-and-sleep having better overall power efficiency.
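Whether hurry-up-and-sleep wins comes down to total energy over the whole time window, not raw speed. The Python sketch below compares two chips that finish the same task and then idle until the next one arrives; every wattage, duration, and idle figure in it is hypothetical, chosen only for illustration, and none of it is measured Atom or ARM data.

```python
def energy_over_window(active_w: float, active_s: float,
                       idle_w: float, window_s: float) -> float:
    """Joules used in a fixed window: run the task at active_w, then idle at idle_w."""
    if active_s > window_s:
        raise ValueError("task does not finish inside the window")
    return active_w * active_s + idle_w * (window_s - active_s)

# Hypothetical figures for illustration only (not measured Atom/ARM data).
IDLE_W = 0.01    # assumed idle power, same for both chips
WINDOW_S = 10.0  # assumed time until the next task arrives

slow = energy_over_window(0.5, 2.0, IDLE_W, WINDOW_S)       # ~1.08 J
fast_2w = energy_over_window(2.0, 1.0, IDLE_W, WINDOW_S)    # ~2.09 J: racing loses
fast_0_9w = energy_over_window(0.9, 1.0, IDLE_W, WINDOW_S)  # ~0.99 J: racing wins

print(f"slow={slow:.2f} J  fast@2W={fast_2w:.2f} J  fast@0.9W={fast_0_9w:.2f} J")
```

With these made-up numbers, racing to sleep only pays off when the fast chip's active power stays below roughly its speedup times the slow chip's active power (here, about 1 W for a 2x speedup).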

I suppose you think Atom has 4 times the peak performance of Cortex-A9 then. I actually expect Cortex-A9 to have superior perf/watt when paired with the same kind of external memory subsystem as an Atom bench. Intel has done nothing to improve Atom's performance, and is instead positioning it at lower clockspeeds for the handheld market.
 
BC has been and always will be a canard. It doesn't really matter in the console space.

Fair enough. I'm still in the dark about the potential benefits of designing your own uArch for a 3GHz multi-core console CPU when an existing supplier like IBM is available.

Not to mention the challenges of making a unified CPU+GPU arch, which I expect next gen consoles to have, since they have to last pretty much until 2020.
 
Intel probably got rid of their license when they sold their XScale assets to Marvell - maybe outright sold the license. DEC doesn't even exist. It hasn't for a long time. I think you're confusing "has a license" with "had a license."

AFAIK, DEC/Compaq/HP still technically have a license. It was the first architectural license ever done, fyi, and had differences from those that came after.

Intel didn't sell a license; Marvell acquired a new license from ARM. I'm pretty sure Intel still has a license, as they still sell products containing their custom ARM design to this day.

I suppose you think Atom has 4 times the peak performance of Cortex-A9 then. I actually expect Cortex-A9 to have superior perf/watt when paired with the same kind of external memory subsystem as an Atom bench. Intel has done nothing to improve Atom's performance, and is instead positioning it at lower clockspeeds for the handheld market.

For a variety of workloads ATOM has a significant performance advantage over ARM and in general has the same or better perf/watt as ARM cores.
 
You missed two: DEC and Intel.
And I'm fairly sure that Samsung has one as well.

I assumed the Intel one went to Marvell; certainly Intel signalled as strongly as they could that they were getting out of the Arm business. They may have kept a manufacturing license for existing products, or maybe they do still have the arch one.

The DEC one predates my knowledge.

Haven't seen anything official from ARM, or any product from Samsung, that would suggest they have an architectural ARM license.
 
I assumed the Intel one went to Marvell; certainly Intel signalled as strongly as they could that they were getting out of the Arm business. They may have kept a manufacturing license for existing products, or maybe they do still have the arch one.

The DEC one predates my knowledge.

Haven't seen anything official from ARM, or any product from Samsung, that would suggest they have an architectural ARM license.

I thought the Hummingbird core was supposed to be owned by Samsung, at least in part.
 
I thought the Hummingbird core was supposed to be owned by Samsung, at least in part.

I think the exact nature of that core is still a point of conversation. We know Intrinsity "did their thang" to get it to 1GHz. Whether that requires an architectural license, I don't know; it's apparently a standard A8 core on a per-cycle basis, nothing architecturally different, just speed-optimised. It's also not totally clear whether it was a joint venture by Samsung and Apple or solely a Samsung venture (resold to Apple); the fact that Apple bought Intrinsity just adds to the mix.
 
For a variety of workloads ATOM has a significant performance advantage over ARM and in general has the same or better perf/watt as ARM cores.

What is your reference? The only real comparison I've seen is that i.MX51 netbook vs. Atom ones. That's Cortex-A8, and it is likely held back by the amount of cache. I'm talking about Cortex-A9, which has a number of throughput improvements and typically more L2 cache. A test with the same number of cores and similar memory subsystem (i.e., likely 64-bit DDR2 for both) would be appropriate. I also think ARM is still being held back by inferior compilation results with GCC, so using the best-performing compiler on each might be a good idea (Intel ICC for x86 and ARM RVCT for ARM).

The perf/watt comment in particular needs substantiation. Given what I know about the pipelines of the two and their consumption per MHz, it's hard to see how you reach that conclusion.

tangey said:
I think the exact nature of that core is still a point of conversation. We know Intrinsity "did their thang" to get it to 1GHz. Whether that requires an architectural license, I don't know; it's apparently a standard A8 core on a per-cycle basis, nothing architecturally different, just speed-optimised. It's also not totally clear whether it was a joint venture by Samsung and Apple or solely a Samsung venture (resold to Apple); the fact that Apple bought Intrinsity just adds to the mix.

See Samsung's press release commentary:

Samsung said:
Intrinsity's Coretex-A8 processor-based FastCore embedded core is cycle-accurate and Boolean equivalent to the original Cortex-A8 RTL specification. While most ARM processor cores are implemented with synthesized static logic and compiled SRAMs, the Hummingbird achieves the exceptional 1GHz clock rate in Samsung's 45nm LP process technology through the use of a semi-custom design flow which strategically applies Intrinsity's proprietary Fast14 1-of-n domino logic (NDL) technology as macros in the timing-critical paths of the Cortex-A8 RTL core. NDL provides low latency conversion between domino logic and static logic which allows NDL to be seamlessly applied to a standard cell synthesized design. NDL provides gates which are 25 to 50 percent faster than static logic gates.

http://www.samsung.com/global/business/semiconductor/newsView.do?news_id=1030

Bottom line is, ARM doesn't give you the source for Cortex-A8 no matter what you buy, so you can't customize its design. You can only obtain the netlists via a manufacturing license. I would be a little surprised if Apple contributed to Hummingbird's funding, given that you can buy the core in the S5PC110, which is publicly available, although I guess it wouldn't be that different from Apple's prior involvement in ARM.
 
I think the exact nature of that core is still a point of conversation. We know Intrinsity "did their thang" to get it to 1GHz. Whether that requires an architectural license, I don't know; it's apparently a standard A8 core on a per-cycle basis, nothing architecturally different, just speed-optimised. It's also not totally clear whether it was a joint venture by Samsung and Apple or solely a Samsung venture (resold to Apple); the fact that Apple bought Intrinsity just adds to the mix.

IANAL, but what Intrinsity did would require an architectural license in my book.
 
That partially has to do with a difference in strategy: do >2x the work in <1/2 the time and go back to sleep, vs. burn less power for longer periods of time. So far, all the data points to hurry-up-and-sleep having better overall power efficiency.

That strategy, IMHO, is risky, particularly if you stall for some reason, such as on I/O.
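One way to see the stall risk: an I/O wait takes the same wall-clock time on either chip, and a core sitting at a high operating point may not drop into a low-power state while it waits. The Python sketch below models that; the compute times, stall time, and both power levels are hypothetical assumptions, not measurements of any real part.

```python
def task_energy(compute_s: float, stall_s: float,
                active_w: float, stall_w: float) -> float:
    """Joules for one task: compute at active_w, then an I/O stall at stall_w.

    The stall lasts the same wall-clock time on any chip, and the core is
    assumed to stay awake (burning stall_w) while it waits.
    """
    return compute_s * active_w + stall_s * stall_w

# Hypothetical chips: "fast" computes 2x quicker but burns more while stalled.
slow_no_stall = task_energy(2.0, 0.0, active_w=0.5, stall_w=0.2)  # 1.0 J
fast_no_stall = task_energy(1.0, 0.0, active_w=0.9, stall_w=0.6)  # 0.9 J: fast wins
slow_stalled = task_energy(2.0, 1.0, active_w=0.5, stall_w=0.2)   # 1.2 J
fast_stalled = task_energy(1.0, 1.0, active_w=0.9, stall_w=0.6)   # 1.5 J: fast loses

print(f"no stall: slow={slow_no_stall:.1f} J, fast={fast_no_stall:.1f} J")
print(f"1 s stall: slow={slow_stalled:.1f} J, fast={fast_stalled:.1f} J")
```

The speedup only shrinks the compute portion, so a long enough stall at elevated power can erase the race-to-sleep advantage.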
 
A test with the same number of cores and similar memory subsystem (ie, likely 64-bit DDR2 for both) would be appropriate
Moorestown is single-core Atom, 32-bit MC.
[Attached image: CPUperf.jpg, an Intel CPU performance comparison slide]
 
Also something beyond a comparison slide given by Intel themselves would be good, and one at more similar clock speeds. We are still talking performance/watt, right? I'm sure Cortex-A9 is going to scale well beyond 1GHz in something.

Look at the pipelines of both CPUs, issue capabilities, stalls, BTB sizes, cache arrangement, and tell me what you think would be giving Atom a substantial advantage clock for clock vs Cortex-A9 w/NEON, especially in cases where hyperthreading isn't providing much benefit. The only real advantage I see Atom having is folded load/store (and is very good at it as far as x86 goes), but the smaller register file and AGIs really make it hard to keep the in-order pipelines full. At least, that's my experience writing ASM for it.
 
Also something beyond a comparison slide given by Intel themselves would be good, and one at more similar clock speeds. We are still talking performance/watt, right? I'm sure Cortex-A9 is going to scale well beyond 1GHz in something.

That would require ARM releasing actual performance data instead of Dhrystone. As for clock speed, I'm sure some A9s will go beyond 1GHz, just as Atom goes beyond 1.5GHz, but for phones, Atom will do 1.5GHz and it is unlikely that A9 will do more than 1GHz.

Look at the pipelines of both CPUs, issue capabilities, stalls, BTB sizes, cache arrangement, and tell me what you think would be giving Atom a substantial advantage clock for clock vs Cortex-A9 w/NEON, especially in cases where hyperthreading isn't providing much benefit. The only real advantage I see Atom having is folded load/store (and is very good at it as far as x86 goes), but the smaller register file and AGIs really make it hard to keep the in-order pipelines full. At least, that's my experience writing ASM for it.

Power savings is more about implementation than pipeline architectures. Atom has the same size register file as ARM, fyi.
 
That would require ARM releasing actual performance data instead of Dhrystone. As for clock speed, I'm sure some A9s will go beyond 1GHz, just as Atom goes beyond 1.5GHz, but for phones, Atom will do 1.5GHz and it is unlikely that A9 will do more than 1GHz.

I place firm bets on Cortex-A9 consuming less at 1.5GHz than Atom, so I wonder why you think that. If it doesn't at 45nm it certainly will at 28 or whatever is next.

Power savings is more about implementation than pipeline architectures. Atom has the same size register file as ARM, fyi.

Okay, so we have lower watt/MHz and better performance/MHz, so what are you on about, idle power? Because I'm not so sure Atom wins there either.

And actually, while some Atom chips support x86-64 the Z series does not, and therefore does not have the same size register file as ARM.
 
I place firm bets on Cortex-A9 consuming less at 1.5GHz than Atom, so I wonder why you think that. If it doesn't at 45nm it certainly will at 28 or whatever is next.

I don't. And most in the industry don't either. A9 is a higher-power design, meant to make it performance-competitive.

Okay, so we have lower watt/MHz and better performance/MHz, so what are you on about, idle power? Because I'm not so sure Atom wins there either.

watt/MHz and perf/MHz are still up for debate and actually quite variable. In the mobile phone industry, the main thing that matters is how often and for how long a part is actually drawing power.

And actually, while some Atom chips support x86-64 the Z series does not, and therefore does not have the same size register file as ARM.

The current Z series chips aren't being designed into new devices. We'll have to wait until more information is available on the new integrated design.
 
I don't. And most in the industry don't either. A9 is a higher-power design, meant to make it performance-competitive.

ARM claims Cortex-A9 consumes about the same as A8 does, and there isn't data from another source to contradict this. Although it's a move to an out-of-order design, it also has a shorter pipeline than A8 (9 stages instead of 13, compared with Atom at 16), and it makes a few compromises (no more folded shifts, for instance). Do you think they did that as a bid to increase performance at the cost of power consumption too? And I suppose this increased power per core is why Cortex-A9 will likely be pushed in dual-core packages on phones (while Atom likely will not be)?

watt/MHz and perf/MHz are still up for debate and actually quite variable. In the mobile phone industry, the main thing that matters is how often and for how long a part is actually drawing power.

We were talking in this thread about server usage, not mobile phones. Idle power obviously matters, but I still don't see how ARM SoCs are lagging in that regard, given that they've been in phones all this time and Atom hasn't been...

The current Z series chips aren't being designed into new devices. We'll have to wait until more information is available on the new integrated design.

There's a reason why most Atoms are not EM64T, despite Intel having designed for it: 64-bit is currently a waste for mobile. Unfortunately, you don't get the extra registers without 64-bit mode.
 
I don't think it would have anything to do with Xbox; MS will stick with PPC to keep backward compatibility.

Not necessarily. Consider that they want Windows Phone 7 games to be cross-compatible (albeit through a graphics API) with Xbox titles as well as Windows titles. Since WP7 will primarily run on ARM, it's feasible and even logical for them to move the Xbox over to ARM. It's easy enough to run software translation for older games, as the PS3 does.
 
Not necessarily. Consider that they want Windows Phone 7 games to be cross-compatible (albeit through a graphics API) with Xbox titles as well as Windows titles. Since WP7 will primarily run on ARM, it's feasible and even logical for them to move the Xbox over to ARM. It's easy enough to run software translation for older games, as the PS3 does.

Heard of this little thing called the CLR?
 
That partially has to do with a difference in strategy: do >2x the work in <1/2 the time and go back to sleep, vs. burn less power for longer periods of time. So far, all the data points to hurry-up-and-sleep having better overall power efficiency.

This isn't true. If you use 2W (2 joules per second) while another chip uses 500mW (0.5 joules per second), you'd save no energy by running for 1 second vs. the other chip running for 2: you'd spend 2 J against its 1 J. It's a matter of total energy per task, not how fast you complete it.

The only variable that might throw that off is leakage. But leakage doesn't remain constant either. The faster your processor, the higher the leakage (exponentially so), since you'll need to run on a leakier process (which Intel does compared to TSMC's LP) and use a higher voltage (which Intel does compared to the typical ARM chip).

All in all, it's never more power efficient to use a faster processor, not even if you can shut down when idle. This is why dual core is more power efficient for parallelizable tasks: a processor that's 2x slower actually consumes far less than half the power.
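The "far less than half the power" point follows from the textbook first-order CMOS switching-power estimate, P ≈ C * V^2 * f: a lower clock usually permits a lower supply voltage, and power falls with the square of the voltage. The Python sketch below applies that formula with hypothetical voltages (1.1 V at 1 GHz, 0.9 V at 500 MHz) and a made-up capacitance, and it ignores leakage entirely, so treat the percentages as illustrative only.

```python
def dynamic_power(c_eff: float, volts: float, hz: float) -> float:
    """First-order CMOS switching power, P = C_eff * V^2 * f (activity folded into C_eff)."""
    return c_eff * volts ** 2 * hz

C_EFF = 1e-9  # hypothetical effective switched capacitance, in farads

one_fast = dynamic_power(C_EFF, 1.1, 1.0e9)  # one core at 1 GHz, assumed 1.1 V
one_slow = dynamic_power(C_EFF, 0.9, 0.5e9)  # one core at 500 MHz, assumed 0.9 V
two_slow = 2 * one_slow  # same throughput, assuming the task parallelizes perfectly

print(f"half-speed core: {one_slow / one_fast:.0%} of the power")
print(f"two half-speed cores: {two_slow / one_fast:.0%} of the power")
```

Under these assumed voltages, a half-speed core draws about a third of the power, so two of them match one full-speed core's throughput on a perfectly parallel task for roughly two thirds of its power.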

Moorestown is single-core Atom, 32-bit MC.

Like I said, the power numbers are coincidentally missing from that slide. It's easy to claim 2x the performance when you eat 4x the power. Of course, the 2W figure is just speculation, since we don't have numbers from Intel. Typical Cortex-A8 chips at 1GHz consume 500mW or less (with the exception of Hummingbird).
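To put the slide's claim in perf/watt terms, here is a small Python sketch using only the figures from this post: a 2x performance claim, the speculated 2 W for Atom, and roughly 500 mW for a typical 1GHz Cortex-A8. Because the 2 W figure is speculation, the result is equally speculative.

```python
def perf_per_watt(relative_perf: float, watts: float) -> float:
    """Performance per watt, with performance expressed relative to a baseline chip."""
    return relative_perf / watts

# Figures from the post; the Atom wattage is a guess, not an Intel number.
cortex_a8 = perf_per_watt(relative_perf=1.0, watts=0.5)  # baseline: ~500 mW at 1GHz
atom = perf_per_watt(relative_perf=2.0, watts=2.0)       # claimed 2x perf at a guessed 2 W

print(f"Atom perf/watt relative to Cortex-A8: {atom / cortex_a8:.2f}x")
```

2x the performance at 4x the power halves perf/watt; at that performance level, the faster chip would need to draw under 1 W to come out ahead.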
 