Intel Silvermont (Next-gen OOE Atom)

DSC · May 6, 2013

http://www.anandtech.com/show/6936/...tecture-revealed-getting-serious-about-mobile

http://techreport.com/review/24767/the-next-atom-intel-silvermont-architecture-revealed

Homeles · May 6, 2013

Oh jeez, I've been anxiously awaiting this.

AnandTech said:
Intel is talking about a 50% improvement in IPC at the core, combine that with a 30% improvement in frequency without any power impact and you’re now at 83% better performance potentially with no power penalty.

Aren't those gains multiplicative, not additive? Should be 95% higher performance. (Later slides confirm this -- single threaded performance is double that of Saltwell)
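A quick check of that arithmetic, as a minimal sketch. The 50% and 30% figures come from the AnandTech quote; the variable names are only illustrative, and the model assumes performance is simply IPC times frequency.

```python
# Figures from the AnandTech quote above; names are illustrative.
ipc_gain = 0.50   # +50% IPC
freq_gain = 0.30  # +30% frequency

# Assumption: performance = IPC x frequency, so the two gains compound.
multiplicative = (1 + ipc_gain) * (1 + freq_gain) - 1
additive = ipc_gain + freq_gain

print(f"multiplicative: {multiplicative:.0%}")
print(f"additive:       {additive:.0%}")
```

Under that assumption the compounded gain is 95% and the naive sum is 80%, so the quoted 83% matches neither.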

AnandTech said:
On single threaded performance, you should expect a 2.4GHz Silvermont to perform like a 1.2GHz Penryn. To put it in perspective of actual systems, we’re talking about around the level of performance of an 11-inch Core 2 Duo MacBook Air from 2010. Keep in mind, I’m talking about single threaded performance here. In heavily threaded applications, a quad-core Silvermont should be able to bat even further up the Penryn line. Intel is able to do all of this with only a 2-wide machine (lower IPC, but much higher frequency thanks to 22nm).

The performance of this thing is absurd.

I've been betting on Intel to cream the hell out of ARM in this war, but I never expected Silvermont to be quite this good. Anand draws a lot of comparisons to Conroe, while cautiously throwing in a lot of disclaimers -- but the comparison is exceptionally on target.

I'd imagine that things are going to taper off from here though. There's certainly a lot of room for Atom to get wider, but the low hanging fruit have all been picked. I mean, Intel got everything. They went OoO, implemented a real IMC, improved what was already a great 22nm process for mobile, increased L2 size while simultaneously lowering latency, and implemented a real turbo boost.

I wonder what the die size of this thing is, and I cannot wait to see photos.

Okay, so Silvermont blows ARM out of the water, but what about Jaguar? I'm curious to see how it competes against AMD's analogue. Atom should be less expensive (to produce at least), but I'm interested in seeing the CPU performance deltas in particular.

liolio · May 6, 2013

Homeles said:
The performance of this thing is absurd.

I've been betting on Intel to cream the hell out of ARM in this war, but I never expected Silvermont to be quite this good. Anand draws a lot of comparisons to Conroe, while cautiously throwing in a lot of disclaimers -- but the comparison is exceptionally on target.

I'd imagine that things are going to taper off from here though. There's certainly a lot of room for Atom to get wider, but the low hanging fruit have all been picked. I mean, Intel got everything. They went OoO, implemented a real IMC, improved what was already a great 22nm process for mobile, increased L2 size while simultaneously lowering latency, and implemented a real turbo boost.

I wonder what the die size of this thing is, and I cannot wait to see photos.

Okay, so Silvermont blows ARM out of the water, but what about Jaguar? I'm curious to see how it competes against AMD's analogue. Atom should be less expensive (to produce at least), but I'm interested in seeing the CPU performance deltas in particular.

Well, on a technical basis Intel should take a lead that competitors will have a tough time reclaiming (at least until the other foundries catch up), but I would be wary about how that will translate into market share; there are many factors, including a price war.
But indeed, I never really got the gloom and doom of some financial analysts about Intel, or why Qualcomm's market capitalization now exceeds Intel's. IMHO some people in the financial world are off their freaking rockers (not that Qualcomm is a bad company by any stretch, they do great). If it comes down to it, Intel will lower its margins to keep the volume its fabs need.
On the tech side, IMHO no company can compete with Intel right now when they are at it.
Now, technical merit doesn't decide everything. Intel needs partners: foremost it needs to secure the Nexus line on Google's side, and also to support whatever MSFT product comes next (tablets and phones).
I wonder if they could go with an exclusive partnership at first, providing top-of-the-line hardware to build a really strong brand name in the sector, and then open up to other manufacturers with the follow-up chip (at 14nm). They need a stronger brand and a great partnership; "Intel Inside" doesn't cut it in the embedded realm. They need one killer product in both the phone and tablet realms. Samsung is out of the picture, and I'm iffy about HTC or Nokia getting back the traction they used to have, even with a top-of-the-line product. There are really not many options: Google, MSFT (though they don't have a phone line at the moment), or starting their own (unlikely, but it could prove not a bad idea).

Albuquerque · May 6, 2013

Waiting for an updated version of Lenovo's Tablet 2 device with this processor. w00t

tunafish · May 8, 2013

Homeles said:
Okay, so Silvermont blows ARM out of the water, but what about Jaguar?

Jaguar seems to be better at everything except L2 latency (at which it's much worse) and clock speed. All the OoOE buffers in Jaguar are much larger, and it has separate load and store units instead of a single load/store unit.

I'd expect that Jaguar should be faster in most circumstances, but that Silvermont is able to scale to a much smaller power envelope.

Wynix · May 8, 2013

tunafish said:
Jaguar seems to be better at everything except L2 latency (at which it's much worse) and clock speed. All the OoOE buffers in Jaguar are much larger, and it has separate load and store units instead of a single load/store unit.

I'd expect that Jaguar should be faster in most circumstances, but that Silvermont is able to scale to a much smaller power envelope.

This was to be expected; Jaguar is aimed at tablet and up, Silvermont is aimed at tablet and down.

liolio · May 8, 2013

Jaguar should perform better, but I have a pretty bad feeling about it now. A couple of weeks ago Intel spoke about cheap laptops with a sleek form factor that could be powered by these new Atoms and sell for as low as $249/$299. A sort of low-end ultrabook.
While Jaguar should perform better (CPU is unclear and depends on the gap in clock speed; GPU is a given), the power consumption should not be in the same ballpark. That is extremely troubling for AMD: we are talking about roughly an order of magnitude more power, which is a lot and has a big impact on costs.
I am starting to seriously question Jaguar's chances of succeeding as a tablet or netbook 2.0 processor. Intel has dealt a massive blow to AMD's forecast.

EDIT
Jaguar could be slightly better per cycle, as others have been pointing out, though that leaves clock speed out of the picture.

mczak · May 8, 2013

tunafish said:
Jaguar seems to be better at everything except L2 latency (at which it's much worse) and clock speed. All the OoOE buffers in Jaguar are much larger, and it has separate load and store units instead of a single load/store unit.

The numbers are difficult to compare as far as OoOE resources go but it doesn't look to me like they are "much" larger overall for Jaguar. Maybe slightly larger. A pity the realworldtech article only compares silvermont to saltwell, but not Jaguar.
So far I see both having integer PRF with 64 entries, though Jaguar being slightly more flexible (can use 33-44 entries available for renaming whereas silvermont is just using a fixed 32 for that). Store queues also have similar size (20 for Jaguar, 16 for Silvermont). Float PRF though seems indeed larger on Jaguar (72 entries vs. just 32).
Ok Jaguar seems to be able to have significantly more operations in flight (64 for int pipe, 44 for float pipe, both 32 for silvermont), and similarly the schedulers hold more entries (20 alu, 18 float, 12 mem vs. 8+8 alu, 8+8 float, 6 mem) while being unified (well separate for int/float/mem but not per execution unit) which should also help a bit.
So far I see advantages for Silvermont for L2 cache (while theoretically smaller for single-thread case the much better latency will make much more of an impact), and the smaller branch mispredict penalty (10 vs 14 cycles - nothing to sneeze at).
Jaguar has an advantage due to full store and load pipe (though as far as single load/store pipes go silvermont looks quite spiffy there), and there's some advantage in the simd unit because silvermont seems to retain the half-wide multiplier.
Let's compare some other hard numbers, Silvermont listed first:
L1I TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
L1D TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
L2 TLB: 128/4KB/4-way + 16/2MB/4-way entries vs. 512/4KB/4-way + 256/2MB/2-way (serving twice as many cores)
L1I Cache: 32KB/8-way vs. 32KB/2-way, both with 64B line size
L1D Cache: 24KB/6-way vs. 32KB/8-way (both can handle 16B load and store simultaneously)
L2 Cache: 1MB/16-way (13-14 cycles) vs. 2MB/16-way (25 cycles) (serving twice as many cores)

So Jaguar can better handle large pages, but overall these chips look like they were designed for similar throughput (per clock). Oh, and Silvermont has a better L1I cache, though I guess that, unlike on BD, the 2-way associativity of Jaguar doesn't hurt that much.
I can't quite judge things like branch predictors, loop buffers, etc. when comparing these two chips, which can probably make quite some difference.
But I guess you're probably right, the OoOE resources are somewhat larger in general for Jaguar so you'd think it should perform a bit better per clock (particularly on the simd side I think - on the int side I'm not sure if the better l2 cache latency, lower branch misprediction penalty (though that would depend on branch predictor quality) couldn't make up for that). In any case, performance of these chips should track much more closely overall unlike Bonnell vs. Bobcat (where you get from Bobcat annihilates Bonnell to about as fast and everything in-between depending on the code).
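Since the two chips will not run at the same clock, it can help to convert the cycle counts above into wall-clock time. This is only a sketch: the cycle counts are the ones listed in the post, while the 2.4GHz and 2.0GHz clocks are assumptions rather than confirmed product specs, and the conversion assumes the latencies are expressed in core cycles.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in core cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz

# Assumed clocks, not confirmed product specs.
SILVERMONT_GHZ = 2.4  # figure used in the AnandTech quote earlier in the thread
JAGUAR_GHZ = 2.0      # hypothetical top Jaguar clock

# Cycle counts as listed in the post (worst case of 13-14 for Silvermont L2).
latencies = {
    "L2 hit": (14, 25),
    "branch mispredict": (10, 14),
}

for name, (svm_cycles, jag_cycles) in latencies.items():
    svm_ns = cycles_to_ns(svm_cycles, SILVERMONT_GHZ)
    jag_ns = cycles_to_ns(jag_cycles, JAGUAR_GHZ)
    print(f"{name}: Silvermont {svm_ns:.1f} ns vs. Jaguar {jag_ns:.1f} ns")
```

With those assumed clocks the gap in absolute time is wider than the gap in cycles, because the chip with the lower cycle count is also the one assumed to clock higher.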

liolio · May 9, 2013

Isn't that really bad news once you take into account the difference in clock speed (and power) between Silvermont and Jaguar?

epicstruggle · May 11, 2013

Wynix said:
This was to be expected; Jaguar is aimed at tablet and up, Silvermont is aimed at tablet and down.

Not entirely accurate, they also took into account microservers.

Exophase · May 11, 2013

liolio said:
Isn't that really bad news once you take into account the difference in clock speed (and power) between Silvermont and Jaguar?

For Temash, yes. Although its GPU will probably still be faster. For the 2GHz Kabini, not as much, but at its TDP it'll only find its way into larger devices.

There's also the time gap between releases, 6 months isn't insignificant.

sebbbi · May 11, 2013

liolio said:
Isn't that really bad news once you take into account the difference in clock speed (and power) between Silvermont and Jaguar?

Intel hasn't yet revealed the SoC configurations / clocks / TDP. The only information we have so far is that one Silvermont CPU core consumes less than 1W (Real World Tech). If that is "slightly under 1W", the CPU cores of a quad-core part add up to 3-4W (and that's without a GPU). That's not far away from the 3.5-5.9W Temash figures. But as long as we don't know the clocks, we cannot make a proper TDP comparison.
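The back-of-the-envelope estimate above can be written out as a short sketch. The only sourced figure for Silvermont is "less than 1W" per core; the 0.75W lower bound is a hypothetical value picked to reproduce the 3-4W range, and the quad-core count is an assumption.

```python
CORES = 4  # assumed quad-core configuration

# Hypothetical per-core range chosen to match the "3-4W" estimate above;
# the only sourced fact is "less than 1W" per core.
PER_CORE_W = (0.75, 1.0)
TEMASH_TDP_W = (3.5, 5.9)  # whole-SoC figures quoted above, GPU included

cpu_low, cpu_high = (CORES * w for w in PER_CORE_W)
print(f"Silvermont CPU cores only: {cpu_low:.1f}-{cpu_high:.1f} W")
print(f"Temash SoC TDP:            {TEMASH_TDP_W[0]}-{TEMASH_TDP_W[1]} W")
```

Note that the two ranges are not like-for-like: one is CPU cores only, the other is a full SoC TDP, which is why the comparison stays inconclusive until Intel publishes SoC figures.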

We should assume that Intel has better performance per watt (in general purpose code), because they have finally brought their most advanced process technology to their low power parts (22nm tri-gate vs. 28nm bulk in Jaguar). I would expect them to have a slight advantage in scalar integer performance as well (as the IPC should be only slightly behind, but Intel has higher clock headroom). However, AMD should be faster in floating point and SIMD (both float and integer) processing (comparing the Silvermont details revealed in the Real World Tech article to the AMD Jaguar optimization guide and its instruction latency/throughput Excel sheet). Jaguar also supports the AVX instruction set. The VEX encoding (3-operand, non-destructive) is good for narrow (2-wide) architectures (fewer instructions needed for extra register moves and loads). Jaguar also has a really nice full-rate 4x32-bit integer SIMD multiply (with superb 2-cycle latency), and fast horizontal ops (and very fast CVT16, among other goodies). I would expect it to fare pretty well against Silvermont in multimedia processing (and games).

liolio · Jul 5, 2013

That is interesting:
http://techreport.com/news/25043/le...ail-based-atom-celeron-and-pentium-processors

Those CPUs seem to have pretty awesome power characteristics.

EDIT
There is that too, which sounds a bit crazy when one takes the clock speed into account.

Homeles · Jul 6, 2013

Well AnTuTu is supposedly heavily biased towards Silvermont, so those results aren't really useful. Interesting lineup, though.

Can't wait for die shots to surface.

DavidC · Jul 9, 2013

The Tablet oriented Bay Trail-T parts are likely even better on power characteristics than the M/D/I.

liolio said:
EDIT
There is that too, which sounds a bit crazy when one takes the clock speed into account.

I wouldn't take the clock speed seriously. It likely ran the benchmark at its full ~2GHz speed then ramped down for power savings after the benchmark was completed.

45-55k is not far from the Core i3 Sandy Bridge range. The benchmark seems to scale perfectly linearly with core count, so having 2x the cores makes up for the IPC deficiency against Core chips.

Also, note that it's a "system" benchmark.

Laurent06 · Jul 10, 2013

Some AnTuTu analysis: http://www.eetimes.com/author.asp?section_id=36&itc=eetimes_sitedefault&doc_id=1318857

One can basically dismiss AnTuTu when comparing Intel vs ARM. The benchmark seems to have been optimized for Intel starting with version 3.

Intel vs Intel, or ARM vs ARM comparisons remain interesting

mczak · Jul 10, 2013

Laurent06 said:
Some AnTuTu analysis: http://www.eetimes.com/author.asp?section_id=36&itc=eetimes_sitedefault&doc_id=1318857

One can basically dismiss AnTuTu when comparing Intel vs ARM. The benchmark seems to have been optimized for Intel starting with version 3.

I don't think your conclusion is valid. I suspect it is more likely (as this is a native benchmark) the compiler used for 2.9.3 didn't produce code very well optimized for atom - and atom being an in-order architecture this is going to be more important than for OoO chips.
I'm just speculating here of course. Also the high memory score (compared to linpack etc.) could easily be due to measuring latency vs. bandwidth for instance (or using patterns which benefit or not from prefetchers, or mostly measuring L2 or whatnot; Atom has very low L2 latency, for instance). Most of the other benchmarks (like linpack) aren't native either, so their memory scores are more likely going to reflect generic Dalvik performance rather than having much to do directly with the memory subsystem.
I am certainly not saying that AnTuTu should be used as THE benchmark when judging atom vs. arm, but as you could see in the results there are indeed some (not that many) benchmarks which also show a similar picture, so I think just dismissing it as "atom-biased" is a bit unfair. Of course intel is cherry-picking benchmarks which show them in a good light, everybody is doing that. You can't blame AnTuTu itself if some reviews use this as the only measure of performance.
FWIW I wouldn't be surprised if performance in other benchmarks scales quite differently than in AnTuTu when comparing silvermont vs. saltwell.

Exophase · Jul 11, 2013

mczak said:
I don't think your conclusion is valid. I suspect it is more likely (as this is a native benchmark) the compiler used for 2.9.3 didn't produce code very well optimized for atom - and atom being an in-order architecture this is going to be more important than for OoO chips.
I'm just speculating here of course. Also the high memory score (compared to linpack etc.) could easily be due to measuring latency vs. bandwidth for instance (or using patterns which benefit or not from prefetchers, or mostly measuring L2 or whatnot; Atom has very low L2 latency, for instance). Most of the other benchmarks (like linpack) aren't native either, so their memory scores are more likely going to reflect generic Dalvik performance rather than having much to do directly with the memory subsystem.
I am certainly not saying that AnTuTu should be used as THE benchmark when judging atom vs. arm, but as you could see in the results there are indeed some (not that many) benchmarks which also show a similar picture, so I think just dismissing it as "atom-biased" is a bit unfair. Of course intel is cherry-picking benchmarks which show them in a good light, everybody is doing that. You can't blame AnTuTu itself if some reviews use this as the only measure of performance.
FWIW I wouldn't be surprised if performance in other benchmarks scales quite differently than in AnTuTu when comparing silvermont vs. saltwell.

Then again, maybe Intel really is just cheating.

http://forums.anandtech.com/showthread.php?p=35245611#post35245611

mczak · Jul 11, 2013

Exophase said:
Then again, maybe Intel really is just cheating.

http://forums.anandtech.com/showthread.php?p=35245611#post35245611

Ok you might be right. (At least I was correct in assuming this is indeed due to compiler differences, not that the code itself was tweaked to better suit Atoms.)
While I suspected they were using a newer compiler, I didn't expect them to use ICC for the x86 build. That is indeed pretty lame, since those sorts of benchmarks are well known to be quite exploitable by compilers (and even if the compiler didn't cheat and only used legitimate optimizations, it is supposed to be a benchmark indicating hardware performance, not a benchmark for comparing compiler quality). I could understand not using vectorization on ARM, though; frankly, I've never seen it do much with gcc, at least on x86 (that could have changed recently, dunno). And I'm way too lazy to figure out whether gcc could do something with autovectorization on nbench.
Intel is definitely known for using questionable tricks in their compiler to get better results in widely used benchmarks, so that certainly wouldn't be a first. Those SPEC CPU results were always nice, claiming the new CPU got 30% faster where in reality it was just 10%, the rest being due to the new compiler used with the new CPU (but not on the old CPU) having learned one or two new tricks.
I'd definitely be more interested in AnTuTu 3.3 results using gcc for everything.
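To put the SPEC remark in numbers, here is a minimal sketch of how a reported speedup splits into a hardware part and a compiler part. The 30% and 10% figures are the illustrative ones from the post, not measured results, and the split assumes the two effects multiply.

```python
reported_speedup = 1.30  # new CPU + new compiler vs. old CPU + old compiler
hardware_speedup = 1.10  # new CPU vs. old CPU with the same compiler

# Assumption: the effects compound, so reported = hardware * compiler.
compiler_speedup = reported_speedup / hardware_speedup
compiler_gain = compiler_speedup - 1

print(f"compiler contribution: {compiler_gain:.1%}")
```

Under that assumption the compiler alone accounts for roughly 18% of the gain, more than the hardware does.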

liolio · Jul 11, 2013

Interesting. Still, is it really relevant? I mean, if you put aside the "fake" differences between Intel CPUs, it is a bit like drivers in the GPU realm: nobody really cares about knowing which GPU is better; it is more about the whole stack (optimization, drivers, GPU).

If Intel gets traction in the mobile market, why would devs use a compiler that performs worse than Intel's just because it is available on other platforms?
