Microsoft takes an Arm architecture license

Heard of this little thing called the CLR?

Just because it runs doesn't mean it runs equally well. If you narrow the set of architectures you optimize for, you can see significant speed improvements, particularly if you'll be using SIMD heavily.
 
The only variable that might throw that off is leakage. But leakage doesn't remain constant either. The faster your processor, the higher the leakage (exponentially so), since you'll need to run on a leakier process (which Intel does compared to TSMC's LP) and at a higher voltage (which Intel does compared to the typical ARM chip).
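
A quick back-of-envelope sketch of why that matters. The constants here are invented purely for illustration; only the scaling relations (dynamic power ~ C*V^2*f, subthreshold leakage growing roughly exponentially with voltage) are the point:

Code:
# Rough first-order power model. Constants are invented for illustration;
# only the scaling relations are the point.
import math

C_EFF = 1e-9      # effective switched capacitance (F) -- made up
I_LEAK0 = 0.01    # leakage current at nominal voltage (A) -- made up
K_LEAK = 5.0      # exponential sensitivity of leakage to voltage -- made up
V_NOM = 1.0       # nominal supply voltage (V)

def total_power(freq_hz, vdd):
    """Dynamic power ~ C*V^2*f; subthreshold leakage grows ~exp with V."""
    p_dyn = C_EFF * vdd**2 * freq_hz
    p_leak = vdd * I_LEAK0 * math.exp(K_LEAK * (vdd - V_NOM))
    return p_dyn + p_leak

# Pushing frequency usually means raising voltage too:
print(total_power(600e6, 1.0))   # ~0.61 W at the slow/low-voltage point
print(total_power(1.2e9, 1.3))   # ~2.09 W -- 2x the frequency, ~3.4x the power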

Leakage isn't frequency dependent but voltage dependent. In fact, leakage has little correlation to performance of a silicon device. And FYI, TSMC's process is higher leakage than Intel based on all the documented/reported measurements. In addition, Intel's process has a higher Ion/Ioff ratio than any of TSMC's processes, again according to published and reported data. Data also suggests that Intel's process can support lower Vmin's than TSMC's process too.

All in all, it's never more power efficient to use a faster processor. Not even if you can shut down when idle. This is why dual core for parallelizable tasks is more power efficient. A processor that's 2x slower actually consumes far less than half the power.
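
The dual-core argument falls straight out of P ~ C*V^2*f if you assume achievable frequency scales roughly linearly with voltage (a common first-order approximation, not a law):

Code:
# Why two slow cores can beat one fast core on energy, to first order.
# Assumes achievable frequency scales ~linearly with voltage (rough!).

def dyn_power(v, f):
    return v**2 * f   # normalized P ~ C*V^2*f, with C folded into the units

p_single = dyn_power(v=1.0, f=1.0)      # one core, full speed
p_dual = 2 * dyn_power(v=0.5, f=0.5)    # two cores, half speed each

print(p_single)  # 1.0
print(p_dual)    # 0.25 -- same total throughput at ~1/4 the dynamic power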

This has been proven false in many measurements in many different papers over many different years. Hurry up and sleep has proven to be a more efficient method of computation. Low TDP/Pmax is only important from a thermal and sustained power delivery standpoint and not as a metric for total energy required.
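
For what it's worth, "hurry up and sleep" is just an energy integral, and the reason it wins in measurements is the fixed platform overhead that burns power whenever the system is awake. A toy model, with invented numbers:

Code:
# Race-to-idle vs. slow-and-steady over a fixed deadline, including a
# platform overhead (DRAM, buses, screen) that burns power while awake.
# All numbers invented for illustration.

DEADLINE = 1.0     # seconds available for the task
P_PLATFORM = 0.5   # platform power while awake (W) -- made up
P_SLEEP = 0.02     # whole-platform sleep power (W) -- made up

def energy(p_cpu, t_active):
    awake = (p_cpu + P_PLATFORM) * t_active
    asleep = P_SLEEP * (DEADLINE - t_active)
    return awake + asleep

# Fast CPU: 4x the power, 4x the speed, then the whole platform sleeps.
print(energy(p_cpu=2.0, t_active=0.25))  # ~0.64 J -- hurry up and sleep
print(energy(p_cpu=0.5, t_active=1.00))  # ~1.00 J -- platform never sleeps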


Like I said. The power numbers are coincidentally missing from that slide. It's easy to claim 2x the performance when you eat 4x the power. Of course, the 2W figure is just speculation since we don't have numbers from Intel. Typical Cortex A8 chips at 1GHz consume 500mW or less (with the exception of Hummingbird).

They are both targeting the same market with roughly the same battery life. And people really need to learn that TDP != power consumed.
 
aaronspink said:
Leakage isn't frequency dependent but voltage dependent.

And pushing frequency is voltage dependent...

aaronspink said:
In fact, leakage has little correlation to performance of a silicon device.

Has plenty of correlation when you're measuring the amount of power consumed while sleeping, with the "hurry up and sleep" strategy you're promoting.

This has been proven false in many measurements in many different papers over many different years. Hurry up and sleep has proven to be a more efficient method of computation. Low TDP/Pmax is only important from a thermal and sustained power delivery standpoint and not as a metric for total energy required.

If that were true then we would have never seen frequency scaling à la SpeedStep, which is also a major facet of Atom. Intel does have lower- and lower-power sleep modes, but they also take longer and longer to come out of, which makes them scale poorly for throttling active workloads. Of course this isn't true if you're doing a batch job, but most interesting computing on consumer devices is interactive enough to prevent this, so you're really going to be doing work-sleep-work-sleep with a fairly quick duty cycle.
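
The wake-latency point is easy to put numbers on: a deep sleep state only pays off when the energy saved over the idle window exceeds the entry/exit cost. A toy break-even model (all numbers invented):

Code:
# Does a deep sleep state pay off for a given idle window?
# Break-even: energy saved while asleep must exceed the transition cost.
# All numbers invented for illustration.

P_SHALLOW = 0.10     # W, light sleep, near-instant wake
P_DEEP = 0.01        # W, deep sleep
E_TRANSITION = 0.05  # J, cost to enter and exit deep sleep -- made up

def best_state(idle_seconds):
    e_shallow = P_SHALLOW * idle_seconds
    e_deep = P_DEEP * idle_seconds + E_TRANSITION
    return "deep" if e_deep < e_shallow else "shallow"

print(best_state(0.1))  # shallow -- 100 ms gaps never amortize the deep state
print(best_state(2.0))  # deep -- long, batch-style idle periods do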

aaronspink said:
They are both targeting the same market with roughly the same battery life. And people really need to learn that TDP != power consumed.

No, but you can't say that dissipating four times as much heat isn't a reflection on power consumption, I imagine that heat is coming from somewhere...
 
Leakage isn't frequency dependent but voltage dependent. In fact, leakage has little correlation to performance of a silicon device.

I believe the reasons I listed for it being frequency dependent were "since it'll be on a leakier process and also because of higher voltage".

Leakage has very much to do with performance of a design. Granted, one manufacturer's process may simply be superior and have both higher performance *and* lower leakage. But the point you were making was that, all else being equal, a faster processor (running at a higher frequency) would consume less power for a given task than a slower processor (running at a lower frequency). This is simply false.

And FYI, TSMC's process is higher leakage than Intel based on all the documented/reported measurements. In addition, Intel's process has a higher Ion/Ioff ratio than any of TSMC's processes, again according to published and reported data. Data also suggests that Intel's process can support lower Vmin's than TSMC's process too.

Please provide references. All the data I've seen confirm your second and third point but not your first. TSMC is still using bulk whereas all of Intel's newer nodes are on metal gate. One of the things about metal gate is that leakage is a whole lot worse.

Also compare the Ioff numbers for TSMC's LP and TPG processes vs Intel's. They are nowhere near close.

This has been proven false in many measurements in many different papers over many different years. Hurry up and sleep has proven to be a more efficient method of computation. Low TDP/Pmax is only important from a thermal and sustained power delivery standpoint and not as a metric for total energy required.

Provide references and details of said comparison.

They are both targeting the same market with roughly the same battery life. And people really need to learn that TDP != power consumed.

Again, provide references. We've seen no numbers for Moorestown's power consumption while using the CPU. Saying "roughly the same battery life" is nebulous and meaningless. One can do that through more efficient software, better software/hardware coupling, better battery technology. Hell, a better PCB design or a more efficient screen dimming algorithm will impact battery life dramatically.

What are the numbers for Moorestown's CPU under full load? What gives you any reason to believe it will be 1/4 that of the current Atom, which it would need to be to be competitive with ARM offerings?
 
And pushing frequency is voltage dependent..

Has plenty of correlation when you're measuring amount of power consumed while sleeping, with the "hurry up and sleep" strategy you're promoting.

Not only that, people forget leakage is a factor even while the device is active. At 45G, for a typical LVT cell, leakage accounts for about a third of the power consumed while at 1.2V.

That's what I meant when I said leakage scaled. If you increase the voltage in order to increase frequency, your power consumption grows far faster than your frequency due to both higher leakage *and* higher dynamic power. Your 2x frequency results in ~3x the power consumed.

For reference, Scorpion at 65LP consumes 200mW (published) at 600MHz. It consumes more than 500mW at 1GHz.
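
Those two data points already show the super-linear scaling: 1.67x the frequency for more than 2.5x the power. A quick sanity check that this is consistent with the V^2*f model plus a modest voltage bump (the voltages below are guesses, not published figures):

Code:
# Sanity-check the Scorpion numbers against P ~ V^2 * f.
# The voltages below are guesses for illustration, not published figures.

def power_ratio(v1, f1, v2, f2):
    return (v2**2 * f2) / (v1**2 * f1)

# 600 MHz -> 1 GHz is 1.67x the frequency; published power goes from
# 200 mW to >500 mW, i.e. >2.5x. A ~25% voltage bump accounts for the gap:
print(power_ratio(v1=1.0, f1=600e6, v2=1.25, f2=1000e6))  # ~2.6x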

No, but you can't say that dissipating four times as much heat isn't a reflection on power consumption, I imagine that heat is coming from somewhere...

Not to mention TDP is usually lower than max power consumption anyway.
 
No, not really. Frequency can be partially voltage dependent, partially RC, partially design, etc.

Yes, yes really. Frequency is a function of voltage along with other factors. In almost all chips in the mobile space, it is practically the first knob you tweak when trying to push higher frequencies during chip bring-up. Hell, why do you think timing corners for every single microchip out there are done at the max and min voltage points?
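
The voltage-frequency dependence those corners capture is usually modeled with the alpha-power law; a sketch, with assumed (not measured) parameter values:

Code:
# Alpha-power law: achievable frequency as a function of supply voltage,
# f ~ (Vdd - Vt)^alpha / Vdd. Parameter values are assumed, not measured.

V_T = 0.35    # threshold voltage (V) -- assumed
ALPHA = 1.3   # velocity-saturation exponent -- typical modern value

def rel_frequency(vdd):
    return (vdd - V_T) ** ALPHA / vdd

for vdd in (0.8, 1.0, 1.2):
    print(vdd, rel_frequency(vdd) / rel_frequency(1.0))
# ~0.78x at 0.8 V, ~1.18x at 1.2 V: voltage is the first frequency knob.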
 
Leakage has very much to do with performance of a design. Granted one manufacturer's process may simply be superior and have both higher performance *and* lower leakage. But the point you were making was that all else being equal, a faster processor (running at a higher frequency) would consume less power for a given task than a slower processor (running at a lower frequency). This is simply false.

Leakage has nothing to do with the performance of a design. When you figure that out, you'll start to understand.


Please provide references. All the data I've seen confirm your second and third point but not your first. TSMC is still using bulk whereas all of Intel's newer nodes are on metal gate. One of the things about metal gate is that leakage is a whole lot worse.

Part of the whole point in going to hi-k/metal gate is orders of magnitude reduction in gate leakage:

Code:
High-κ gate dielectrics and metal gate electrodes are required for enabling continued equivalent gate oxide thickness scaling, and hence high performance, and for controlling gate oxide leakage for both future silicon and emerging non-silicon nanoelectronic transistors.

Also compare the Ioff numbers for TSMC's LP and TPG processes vs Intel's. They are nowhere near close.
Provide references and details of said comparison.

Start looking at IEDM papers. TSMC is behind in all aspects and always has been. Then again, why should I expect someone who thinks hi-k/metal gate causes increased leakage to understand process.

Again, provide references. We've seen no numbers for Moorestown's power consumption while using the CPU. Saying "roughly the same battery life" is nebulous and meaningless. One can do that through more efficient software, better software/hardware coupling, better battery technology. Hell, a better PCB design or a more efficient screen dimming algorithm will impact battery life dramatically.

http://www.anandtech.com/show/3696/...600-series-the-fastest-smartphone-processor/4
 
Leakage has nothing to do with the performance of a design. When you figure that out, you'll start to understand.

Yes, it does. When you figure that out, you'll start to understand. Hint: search for papers on multi-Vt cell design.

Part of the whole point in going to hi-k/metal gate is orders of magnitude reduction in gate leakage:

No, it isn't. Metal gate is all about speed. Leakage and dynamic power unfortunately suffer. Of course, speed and leakage are not entirely separate. So being able to scale speed without reducing oxide thickness (as metal gate does) allows you to continue frequency scaling without making leakage as bad as it would be if you kept making the SiO2 insulator thinner. That is not to say that processes on bulk with thick gate oxides (such as TSMC's) aren't orders of magnitude lower in gate leakage than metal gate processes; it's simply that TSMC's processes are orders of magnitude slower as well.
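
The shape of the tradeoff being argued here: gate (tunneling) leakage falls off roughly exponentially with oxide thickness, while drive strength only scales about linearly with gate capacitance. A purely qualitative sketch (constants chosen only to show the shape, not real values):

Code:
# Shape of the gate-oxide tradeoff: tunneling leakage falls off roughly
# exponentially with oxide thickness, drive strength only ~linearly with
# gate capacitance (~1/t_ox). Constants are qualitative, not real values.
import math

def gate_leakage(t_ox_nm):
    return math.exp(-4.0 * t_ox_nm)   # arbitrary decay constant

def drive_strength(t_ox_nm):
    return 1.0 / t_ox_nm

for t_ox in (2.0, 1.5, 1.0):
    print(t_ox, drive_strength(t_ox), gate_leakage(t_ox))
# Thinning SiO2 buys linear drive at exponential leakage cost -- which is
# why hi-k lets you keep a thicker physical oxide at the same capacitance.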

Start looking at IEDM papers. TSMC is behind in all aspects and always has been. Then again, why should I expect someone who thinks hi-k/metal gate causes increased leakage to understand process.

I have. TSMC is behind on either speed or leakage, not both at the same time. Their TPG process is well below Intel's in leakage but slower. Their G process has higher leakage than Intel's. But then again, why should I expect someone with a clear lack of reasoning skills to understand.


Again I will ask: where are the power numbers for the Moorestown CPU? A question you keep avoiding and side-stepping.
 
Nowhere have I seen Intel publish anything resembling peak power consumption numbers for Moorestown. There are figures for web browsing - which could mean pretty much anything in terms of CPU load - and there are figures for video playback, which we know to be heavily offloaded to fixed-function hardware (and in that regard those numbers are way behind what Apple promotes for the iPhone, which as far as I recall has a smaller battery). Intel is eager to outline its breakthroughs in diminishing idle consumption and improved scaling and gating, but they routinely shy away from real power consumption numbers that are proportional to the amount of work actually being done.

I take the benchmarks with more than a grain of salt. We don't know anything about the compiler setup, and some reports have me believing that SPECint 2000 is performing worse than typical apps on Cortex-A9. So there can definitely be an issue. The Javascript comparisons are even worse, given that I suspect one is being JITed and one is not - guess which.

Maybe it's too analytical of me, but I have a difficult time embracing benchmark results when all knowledge of the platform paints a different story, and when you look at it this way you have:

- Pipeline length - 9 stages vs 16, Atom handles branch mispredicts much more poorly
- Cache - A8 has 1-cycle latency on L1 and 8-cycle on L2; Atom has 2 and 16 cycles respectively. I can't speak for A9 (I don't expect L1 to be worse; L2 might be slightly worse by virtue of being shared between cores), but on an in-order design you entirely expect Atom's higher L2 latency to be a penalty. And A9 cores are shipping with 1MB of L2, which will give them a significant advantage in single-core applications (vs 512KB on Atom, where a healthy advantage is maintained vs 256KB on many A8 chips)
- BTB size - 512 on A8, I assume at least as large on A9, Atom has 128 and it suffers for it (see Agner Fog's reports)
- Issue rate - Both can decode two instructions per cycle, but Atom can be shortchanged by being fed more than 8 bytes per 2 instructions, which is fairly typical in x86 code, especially that not Atom optimized. ARM ISA doesn't have this problem.
- Out of order execution - the benefits here go without saying, and not having speculative execution w/register renaming is to great detriment on the register-starved Atom, especially when trying to fill 3-cycle AGI stalls (the crude CPI sketch after this list shows how some of these factors compound)
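
To put rough numbers on the pipeline, cache, and BTB items above, here's a crude in-order CPI model; every rate and penalty in it is invented for illustration, not measured:

Code:
# Crude in-order CPI model; every rate and penalty below is invented
# for illustration, not measured.

def cpi(base, br_rate, mispredict_rate, mispredict_penalty,
        l1_miss_rate, l2_latency):
    branch_stalls = br_rate * mispredict_rate * mispredict_penalty
    cache_stalls = l1_miss_rate * l2_latency  # assumes L2 hits, no overlap
    return base + branch_stalls + cache_stalls

# "A8-ish": short pipe (small mispredict penalty), 8-cycle L2
print(cpi(1.0, 0.15, 0.05, 13, 0.03, 8))   # ~1.34
# "Atom-ish": longer pipe, smaller BTB (more mispredicts), 16-cycle L2
print(cpi(1.0, 0.15, 0.08, 16, 0.03, 16))  # ~1.67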

Atom does have hyper-threading, but it's much less effective than in other, wider x86 cores, and can even hurt performance (again, see Agner Fog). I'd much rather have two cores.

One of the only benefits I see on Atom's side is, as I mentioned, fast paired memory/ALU operations (especially RMWs). But with only one load/store unit this isn't exactly a huge benefit - you get much more leverage on ARM by keeping things in registers and actually dual-issuing operations on them.

The other benefit is wider floating point SIMD. On the other hand, it can't do more than the 2 single-precision multiplies per cycle NEON can, and it can't single-cycle issue multiply + add like NEON either, so it's kind of a wash.

Everything in mind, I'd really like someone who is suggesting that Atom will tend to outperform Cortex-A9 per clock in a balanced comparison to either offer an explanation why, attribute it to some other factor (a poor memory controller seems to hamper a lot of ARM implementations, but I stand by compiler output being worse), or at least admit they don't know. Not just respond along the lines of "well of course it does, stupid."
 