ARM adds virtualization extensions to Eagle

argor (Newcomer):
From http://arstechnica.com/business/new...zation-tech-adds-more-fuel-to-server-fire.ars
At a presentation at Stanford's Hot Chips conference on Tuesday, ARM added a few more drops to the trickle of information suggesting that the UK-based mobile and embedded processor designer is very seriously pursuing the server market. Specifically, ARM's David Brash described a new set of virtualization extensions for the ARMv7-A architecture, which will be included in the follow-on to the Cortex-A9. Brash also described an OS-managed address extension that will alleviate some of the I/O and memory pressure that comes with ARM's 4GB memory limit.
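To put the 4GB limit in perspective, here's a quick back-of-the-envelope calculation. The 40-bit figure is an assumption on my part (it matches the physical address width ARM later documented for its Large Physical Address Extension); the article itself only says the extension eases the 4GB limit.

```python
# How much physical memory each address width can reach.
# 40 bits is an assumption based on ARM's later-published LPAE;
# the presentation summarized above doesn't give the exact width.

def addressable_bytes(bits):
    """Total bytes reachable with an address of the given width."""
    return 2 ** bits

GiB = 2 ** 30

print(addressable_bytes(32) // GiB)  # classic 32-bit ARM: 4 GiB
print(addressable_bytes(40) // GiB)  # 40-bit physical addressing: 1024 GiB (1 TiB)
```

So even with 32-bit virtual addresses per process, a wider physical address lets the OS (and devices) use far more than 4GB of RAM in total.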

This upcoming architecture is codenamed "Eagle," and Brash indicated that it will be unveiled soon. The virtualization and address extensions for Eagle lend credence to what has been heard from other sources, which is that Eagle has some features that are clearly aimed at higher-end, non-mobile or embedded applications and that many ARM vendors are shopping detailed ARM-based server roadmaps to some major cloud customers.

Also relevant is the thoroughly plausible rumor from Semi Accurate which claims that Facebook will be deploying ARM servers in its Oregon datacenter. This wouldn't be at all surprising, since many datacenter customers are thinking along these lines already. We've mostly covered this trend under the label of "physicalization," but the idea is becoming so ubiquitous that it doesn't really need a special name anymore.

ARM's approach to virtualization is fairly straightforward, and will be familiar to anyone who has read up on Intel's and AMD's virtualization extensions. ARM has added a new hypervisor execution mode, along with trap and control support that will let the hypervisor easily trap a wide array of operations and privileged control instructions.
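For readers unfamiliar with the mechanism, here's a toy sketch of the trap-and-emulate pattern that a hypervisor execution mode enables. Everything here is illustrative: the names ("ttbr", the method names) are made up for the example and don't correspond to real ARM registers or instructions.

```python
# Toy model of trap-and-emulate. In real hardware, a guest OS executing a
# privileged instruction traps into hypervisor mode; the hypervisor then
# emulates the effect against per-guest shadow state instead of letting the
# guest touch the physical resource. Here the "trap" is modeled as a call.

class Hypervisor:
    """Receives traps raised when a guest executes a privileged operation."""
    def __init__(self):
        self.shadow_state = {}  # hypervisor's view of each guest's privileged state
        self.trap_log = []      # record of trapped operations

    def on_trap(self, guest_id, op, value):
        # Emulate the privileged write on shadow state, isolated per guest.
        self.trap_log.append((guest_id, op))
        self.shadow_state[(guest_id, op)] = value

class Guest:
    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def write_privileged(self, op, value):
        # In hardware this would be a trapping instruction; we model it directly.
        self.hv.on_trap(self.guest_id, op, value)

hv = Hypervisor()
g1, g2 = Guest(1, hv), Guest(2, hv)
g1.write_privileged("ttbr", 0x1000)  # each guest believes it owns the register
g2.write_privileged("ttbr", 0x2000)
print(hv.shadow_state[(1, "ttbr")] != hv.shadow_state[(2, "ttbr")])  # True: isolated
```

The hardware extensions exist precisely so that the "trap" step is cheap and comprehensive; without them, the hypervisor has to catch privileged behavior through binary translation or paravirtualization.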

Ironically, the ARM virtualization extensions may find far more widespread use in mobiles than in the datacenter. In a 2009 article on VMware's Mobile Virtualization Platform (MVP) announcement, we described the benefits to carriers of using virtualization to let customers choose an OS for their phone at the time of purchase.

In today's world, when a company like Nokia brings to market a new phone, that phone is a "Windows Mobile" phone, or an Android phone, or a Symbian phone, etc. The underlying hardware might be able to support any number of mobile OSes, but Nokia (or whoever) has to pick a single OS to tie that hardware to for the life of the product. But if Nokia has a hypervisor that can run a variety of OS options on the same hardware, then the company can easily offer a handset with more than one OS option. Indeed, the mix of OS versions can be trivially adjusted on a store-by-store basis by just shipping phones to stores and having the salesperson load the OS image onto the phone at the point of sale.

So from a carrier and mobile handset manufacturer's perspective, virtualization will increase mobile OS competition and lower the up-front risk of betting on an OS, all while giving carriers more flexibility to manage their product mix on-demand.

Given the huge and growing number of Android phones that could potentially support an alternate OS like Meego, it's easy to imagine a world in which Verizon ships a Droid 3 or 4 that can be configured in-store to run one of a number of OSes.

Tablets could benefit as well. It's entirely possible that HP might want to ship both Android and webOS on the same tablet device and let the customer choose.
 
Oh it's on now. Hopefully some of these will be useful for your non-virtualizing style emulators too.

A little sad to see that they're going the PAE route here, but I guess that also shows that Semi Accurate was wrong when they said it'd be 64-bit?
 
Doesn't look like 64-bit indeed. That certainly seems to make the other parts of Charlie's article less likely too...
 
Semi Accurate was wrong when they said it'd be 64-bit?
We really don't know right now; it's all just rumors.
Some rumors say that Eagle will be 64-bit, while others say it won't.
ARM has not yet disclosed whether Eagle will be 64-bit or not.
There's also a third option, a bit unlikely but still possible: ARM could offer Eagle in both 64-bit and 32-bit configurations. ;)
 
But what you posted doesn't sound like a rumor, it sounds like an official presentation, and it contradicts 64-bit.

64-bit is really not even close to free in terms of die space and power consumption, which is why so few Atoms (i.e., the ones intended not for netbooks but for desktops) incorporate it, despite it being part of the design.
 
64-bit is really not even close to free in terms of die space and power consumption, which is why so few Atoms (i.e., the ones intended not for netbooks but for desktops) incorporate it, despite it being part of the design.
I'd be surprised if all Atoms didn't have 64-bit support. It's just fused out, so the die space isn't changed.
 
I'd be surprised if all Atoms didn't have 64-bit support. It's just fused out, so the die space isn't changed.
Atom's 64-bit support comes at a significant performance hit, though. I haven't looked at slides or anything, but I'd wager all the execution units are 32-bit, and wider operations take several trips to complete. Of course, if die area is the primary concern, that's not such a bad model to adopt.
ARM could do the same; they already used to configure their integer multipliers that way.
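To illustrate the "several trips" idea: a 32-bit ALU can perform a 64-bit add in two passes by carrying between the halves. This is just a sketch of the general technique, not a claim about Atom's actual microarchitecture.

```python
# A 64-bit add decomposed into two 32-bit adds plus a carry, the way a
# narrow ALU might do it over two cycles. Purely illustrative.

MASK32 = 0xFFFFFFFF

def add64_via_32bit_alu(a, b):
    """64-bit modular add built from 32-bit operations."""
    lo = (a & MASK32) + (b & MASK32)                          # first trip: low halves
    carry = lo >> 32                                          # carry out of the low half
    hi = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry  # second trip: high halves
    return ((hi & MASK32) << 32) | (lo & MASK32)

print(hex(add64_via_32bit_alu(0xFFFFFFFF, 1)))  # 0x100000000: carry crosses the halves
print(add64_via_32bit_alu(2**63, 2**63))        # 0: wraps modulo 2**64, like hardware
```

The point is that the dependency on the carry is what would cost the extra cycle on such a design, which is exactly what a latency microbenchmark of dependent 64-bit adds would expose.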
 
Atom's 64-bit support comes at a significant performance hit, though. I haven't looked at slides or anything, but I'd wager all the execution units are 32-bit, and wider operations take several trips to complete. Of course, if die area is the primary concern, that's not such a bad model to adopt.
Where did you get the information that 64-bit takes a hit on Atom? According to quick measurements I did, this is completely wrong: a 32-bit add takes the same time as a 64-bit one.

Also please look here: http://www.freeweb.hu/instlatx64/GenuineIntel00106C2_DiamondvilleDC_InstLatX64.txt
Compare, for instance, the latency of a 32-bit add (index 72) versus a 64-bit add (index 73).
 
Where did you get the information that 64-bit takes a hit on Atom? According to quick measurements I did, this is completely wrong: a 32-bit add takes the same time as a 64-bit one.

Also please look here: http://www.freeweb.hu/instlatx64/GenuineIntel00106C2_DiamondvilleDC_InstLatX64.txt
Compare, for instance, the latency of a 32-bit add (index 72) versus a 64-bit add (index 73).
I've used an Intel D945GCLF2 (Atom 330) as my primary workhorse machine for a good, long while.

By the looks of this test, the results are all measured in the same 64-bit executable. It wouldn't show latencies as achieved on a 32-bit system.
Here's a 32- vs. 64-bit comparison on the same CPU, which is flawed by using different distro revisions, but still:
http://popolon.org/gblog2/atom330-benchmark
 
By the looks of this test, the results are all measured in the same 64-bit executable. It wouldn't show latencies as achieved on a 32-bit system.
So please compare the 64-bit executable with the 32-bit one. You'll see that the latency is the same.

Here's a 32- vs. 64-bit comparison on the same CPU, which is flawed by using different distro revisions, but still:
http://popolon.org/gblog2/atom330-benchmark
I'm sorry, but that comparison is mostly useless. It's known that tests that use a lot of pointers can see slowdowns on 64-bit machines compared to 32-bit ones. Also, there's no guarantee the same compiler was used.

For instance, this review doesn't show 32- vs. 64-bit Atom making that much of a difference. What I mean is that if Atom's 64-bit instructions really were implemented internally as 32-bit instructions (or using multicycle functional units), then the difference would be much bigger.
 
I'd be surprised if all Atoms didn't have 64-bit support. It's just fused out, so the die space isn't changed.

That would surprise me (particularly with so few Atoms supporting it), but I guess the power consumption cost is by far the larger of the two. It's a little confusing as well: surely 64-bit has to exist all over the core and isn't a contained part you can just remove. Maybe since x86-64 fully supports 32-bit, the ALU will have proper 32-bit-only paths that get exercised, but you'd still need 64-bit address lines and register transfer paths and whatnot, wouldn't you?

For instance, this review doesn't show 32- vs. 64-bit Atom making that much of a difference. What I mean is that if Atom's 64-bit instructions really were implemented internally as 32-bit instructions (or using multicycle functional units), then the difference would be much bigger.

But how often are 64-bit ALU operations even used in 64-bit binaries in the first place? I don't think the difference would actually be very big at all. Not that I'm doubting you, since you've shown the execution to be the same anyway (although I would expect 64-bit multiplies and divides to take longer, and of course they do).
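On the multiply point: a 64x64 → 64-bit product decomposes into several 32x32 partial products, which is one reason wide multiplies naturally take extra passes on a narrow multiplier. Again, this is a generic sketch of the decomposition, not Atom's actual implementation.

```python
# Schoolbook decomposition of a 64-bit multiply into 32x32 partial products,
# as a 32-bit-wide multiplier would have to do it. Illustrative only.

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul64_from_32bit_parts(a, b):
    """64x64 -> low 64 bits of the product, built from 32x32 multiplies."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    # Three partial products reach the low 64 bits; a_hi * b_hi only affects
    # bits 64 and above, so it's dropped for a truncating multiply.
    result = a_lo * b_lo + ((a_lo * b_hi + a_hi * b_lo) << 32)
    return result & MASK64

print(mul64_from_32bit_parts(0x123456789, 0x10) == (0x123456789 * 0x10) & MASK64)  # True
```

So even a design with identical add latency for 32- and 64-bit operands would still show longer 64-bit multiplies, which matches the measurements discussed above.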
 
That would surprise me (particularly with so few Atoms supporting it), but I guess the power consumption cost is by far the larger of the two. It's a little confusing as well: surely 64-bit has to exist all over the core and isn't a contained part you can just remove. Maybe since x86-64 fully supports 32-bit, the ALU will have proper 32-bit-only paths that get exercised, but you'd still need 64-bit address lines and register transfer paths and whatnot, wouldn't you?
I'm no implementation specialist, but I don't think that 64-bit paths where the 32 higher bits don't toggle would increase power by much. Even having registers (in the Verilog sense) that are always 0 wouldn't increase power too much. On the other hand, from an R&D and production point of view, having two designs where you do place and route "by hand" would cost you a lot.

I might be wrong, of course, and Intel could have two different designs for 32- and 64-bit Atoms, but that doesn't look likely to me.

(although I would expect 64-bit multiplies and divides to take longer, and of course they do)
Yes, muls and divs are very different beasts :smile:
 
At the very least, I think they must be doing something where physically disabling it in hardware is a big power win over having it and not using it, or else all Atoms would have 64-bit support. That hand place-and-route is so important to begin with would suggest there'd be a benefit to removing the 64-bit stuff from the non-64-bit parts, even if just by reducing critical path lengths.

It's kind of a shame, because I think Atom's design stresses the eight-register x86 limitation harder than almost any other x86 I've seen, on account of being dual-issue in-order, having long AGIs, and having just one load/store pipe (on the other hand, there's no load-use penalty, so I guess it balances out; Pentium I might not be any better). Having 16 registers instead would help a lot.
 
At the very least, I think they must be doing something where physically disabling it in hardware is a big power win over having it and not using it, or else all Atoms would have 64-bit support.
That's not a technical issue; it's only market segmentation. Intel doesn't want Atom to eat into the lower-end Core 2 parts of the market. Why do you think not all Atoms have virtualization extensions? Certainly not because of power :)

That hand place-and-route is so important to begin with would suggest there'd be a benefit to removing the 64-bit stuff from the non-64-bit parts, even if just by reducing critical path lengths.
Agreed. But I don't think the savings would be worth the effort (and don't forget the increased validation effort, which I forgot to mention as a cost concern).

Again, I'm not sure of my claim; I just haven't read anything that convinces me I'm wrong... yet :)
 
That's not a technical issue; it's only market segmentation. Intel doesn't want Atom to eat into the lower-end Core 2 parts of the market. Why do you think not all Atoms have virtualization extensions? Certainly not because of power :)

I very much doubt this is due to market segmentation, or else no Atoms would have 64-bit support (and I wouldn't be quick to assume virtualization takes no power or die space either).

If any Atoms stood a chance of threatening the lower-end Core 2/i7 parts (IMO they don't; they're too weak), it'd be precisely the ones being sold with 64-bit support: the ones with two cores that are marketed for desktop use. The N series that has been popular in netbooks, and especially the Z series that Intel would like to see in phones, now have little to no tangible market overlap with the current heavy-duty desktop chips.

If 64-bit were at all negligible power-wise, Intel would be all about using it to bolster perf/watt. Even if it didn't work out in that space, I can definitely imagine Intel touting Atom as forward-moving and versatile compared to the perpetually 32-bit ARM.

Intel hardly gets an Atom press release out without announcing how much better the next generation will be. They're constantly working on what I assume to be design aspects relevant to the core in some way, so I expect all the cost of hand routing and validation is at least somewhat under control.
 