Intel Silvermont (Next-gen OOE Atom)

Discussion in 'PC Industry' started by DSC, May 6, 2013.

  1. Homeles

    Newcomer

    Joined:
    May 25, 2012
    Messages:
    234
    Likes Received:
    0
    Oh jeez, I've been anxiously awaiting this.

    Aren't those gains multiplicative, not additive? Should be 95% higher performance. (Later slides confirm this -- single threaded performance is double that of Saltwell)
    The performance of this thing is absurd.
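    Here is a quick sketch of the multiplicative-vs-additive point. The quoted slide figures did not survive in this thread, so the 1.5x and 1.3x factors below are hypothetical placeholders, picked only because they multiply out to the 95% mentioned above. The helper names are also made up for illustration.

```python
from math import prod


def combined_gain(*factors: float) -> float:
    """Combined speedup of independent improvements, each given as a ratio (1.3 = +30%)."""
    return prod(factors)


def additive_gain(*factors: float) -> float:
    """The naive (incorrect) reading: just add up the percentage gains."""
    return 1 + sum(f - 1 for f in factors)


# Hypothetical split: +50% from one source, +30% from another.
print(f"multiplicative: +{combined_gain(1.5, 1.3) - 1:.0%}")  # shows +95%
print(f"additive:       +{additive_gain(1.5, 1.3) - 1:.0%}")  # shows +80%
```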

    I've been betting on Intel to cream the hell out of ARM in this war, but I never expected Silvermont to be quite this good. Anand draws a lot of comparisons to Conroe, while cautiously throwing in a lot of disclaimers -- but the comparison is exceptionally on target.

    I'd imagine that things are going to taper off from here, though. There's certainly a lot of room for Atom to get wider, but the low-hanging fruit has all been picked. I mean, Intel got everything: they went OoO, implemented a real IMC, improved what was already a great 22nm process for mobile, increased the L2 size while simultaneously lowering its latency, and implemented a real turbo boost.

    I wonder what the die size of this thing is, and I cannot wait to see photos.

    Okay, so Silvermont blows ARM out of the water, but what about Jaguar? I'm curious to see how it competes against AMD's analogue. Atom should be less expensive (to produce at least), but I'm interested in seeing the CPU performance deltas in particular.
     
    #2 Homeles, May 6, 2013
    Last edited by a moderator: May 6, 2013
  2. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Well, on a technical basis Intel should take a lead that competitors will have a tough time reclaiming (at least until other foundries catch up), but I would be wary about how it will translate into market share; there are many factors, including a price war.
    But indeed, I never really got the gloom and doom some financial analysts have about Intel, or why Qualcomm's market capitalization now exceeds Intel's. IMHO some people in the financial world are off their freaking rockers (not that Qualcomm is a bad company by any stretch; they do great). If it comes down to that, Intel will lower its margins to secure the volume its fabs need.
    On the tech side, IMHO no company can compete with Intel right now when they really put their mind to it.
    Now, technical merit doesn't decide everything. Intel needs partners: first and foremost it should secure the Nexus line on Google's side, and also support whatever MSFT product comes next (tablets and phones).
    I wonder if they could go with an exclusive partnership at first, providing top-of-the-line hardware to build a really strong brand name in the sector, and then open up to other manufacturers with the follow-up chip (@14nm). They need a stronger brand and a great partnership; "Intel Inside" doesn't cut it in the embedded realm, they need one killer product in both the phone and tablet realms. Samsung is out of the picture, and I'm iffy about HTC or Nokia getting (back) the traction they used to have, even with a top-of-the-line product. There really aren't many options: Google, MSFT (though they don't have a phone line at the moment), or they start their own (unlikely, but it could prove not a bad idea).
     
    #3 liolio, May 6, 2013
    Last edited by a moderator: May 6, 2013
  3. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,845
    Likes Received:
    329
    Location:
    35.1415,-90.056
    Waiting for an updated version of Lenovo's Tablet 2 device with this processor. w00t
     
  4. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    542
    Likes Received:
    171
    Jaguar seems to be better at everything except L2 latency (where it's much worse) and clock speed. All the OoOE buffers in Jaguar are much larger, and it has separate load and store units instead of a single combined load/store unit.

    I'd expect that Jaguar should be faster in most circumstances, but that Silvermont is able to scale to a much smaller power envelope.
     
  5. Wynix

    Veteran Regular

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
    This was to be expected: Jaguar is aimed at tablets and up, Silvermont at tablets and down.
     
  6. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Whereas Jaguar should perform better, I have a pretty bad feeling about it now: a couple of weeks ago Intel spoke about cheap laptops with a sleek form factor that could be powered by these new Atoms and sell for as low as $249/$299, sort of low-end ultrabooks.
    Jaguar should perform better (on the CPU side it is unclear and depends on the gap in clock speed; on the GPU side it is a given), but the power consumption should not be in the same ballpark. That is extremely troubling for AMD: we are speaking of roughly an order of magnitude more power, which has a big impact on costs.
    I am starting to seriously question the chances Jaguar has of succeeding as a tablet or netbook 2.0 processor; Intel has dealt a massive blow to AMD's forecasts :(

    EDIT
    Jaguar could be slightly better per cycle, as others have pointed out, though that leaves clock speed out of the picture.
     
    #7 liolio, May 8, 2013
    Last edited by a moderator: May 9, 2013
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    The numbers are difficult to compare as far as OoOE resources go, but it doesn't look to me like they are "much" larger overall for Jaguar. Maybe slightly larger. A pity the RealWorldTech article only compares Silvermont to Saltwell, not to Jaguar.
    So far I see both having an integer PRF with 64 entries, though Jaguar is slightly more flexible (it can use 33-44 of those entries for renaming, whereas Silvermont uses a fixed 32). Store queues are also of similar size (20 for Jaguar, 16 for Silvermont). The float PRF does seem larger on Jaguar, though (72 entries vs. just 32).
    OK, Jaguar seems to be able to have significantly more operations in flight (64 for the int pipe and 44 for the float pipe, vs. 32 for both on Silvermont), and similarly its schedulers hold more entries (20 ALU, 18 float, 12 mem vs. 8+8 ALU, 8+8 float, 6 mem) while being unified (separate for int/float/mem, but not per execution unit), which should also help a bit.
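    A crude way to weigh these in-flight numbers is Little's law (ops in flight ≈ sustained throughput × latency): to keep issuing while an L2 hit is outstanding, a core needs roughly issue rate × L2 latency operations in its window. The 2 ops/cycle rate is an assumption (both are narrow, 2-wide designs); the latencies and window sizes are the figures quoted in this post (taking the upper end of 13-14 cycles for Silvermont), and dependencies, memory-level parallelism and everything else are ignored.

```python
import math


def window_needed(ops_per_cycle: float, latency_cycles: float) -> int:
    """Little's law: ops that must be in flight to keep issuing across a stall."""
    return math.ceil(ops_per_cycle * latency_cycles)


# Figures from the post: L2 hit latency (cycles) and max int ops in flight.
cores = {
    "Silvermont": {"l2_latency": 14, "in_flight": 32},
    "Jaguar": {"l2_latency": 25, "in_flight": 64},
}

WIDTH = 2  # assumed sustained ops/cycle for a 2-wide core

for name, c in cores.items():
    need = window_needed(WIDTH, c["l2_latency"])
    print(f"{name}: needs ~{need} ops in flight to cover an L2 hit, has {c['in_flight']}")
```

    Under this very rough model each window is about sized to cover its own L2 latency (28 of 32 vs. 50 of 64), so Jaguar's bigger window partly goes into hiding its slower L2.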
    So far I see advantages for Silvermont in the L2 cache (while theoretically smaller for the single-thread case, its much better latency will make much more of an impact) and in the smaller branch mispredict penalty (10 vs. 14 cycles, nothing to sneeze at).
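    To put the 10 vs. 14 cycle penalty in perspective, the average cost per instruction is roughly branch fraction × mispredict rate × penalty. The 20% branch fraction and 5% mispredict rate below are hypothetical workload numbers, not measurements, and the function is just an illustrative helper.

```python
def mispredict_cpi(branch_fraction: float, mispredict_rate: float, penalty_cycles: int) -> float:
    """Average stall cycles per instruction added by branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
for name, penalty in [("Silvermont", 10), ("Jaguar", 14)]:
    print(f"{name}: +{mispredict_cpi(0.20, 0.05, penalty):.2f} cycles/instruction")
```

    That gives 0.10 vs. 0.14 extra cycles per instruction. Against an assumed ideal CPI of 0.5 for a 2-wide core, the 0.04 difference is on the order of 8%, although predictor quality on either side could shrink or widen the gap.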
    Jaguar has an advantage thanks to its separate full load and store pipes (though as single load/store pipes go, Silvermont's looks quite spiffy), and there's some advantage in the SIMD unit because Silvermont seems to retain the half-wide multiplier.
    Let's compare some other hard numbers, Silvermont listed first:
    L1I TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
    L1D TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
    L2 TLB: 128/4KB/4-way + 16/2MB/4-way entries vs. 512/4KB/4-way + 256/2MB/2-way (serving twice as many cores)
    L1I Cache: 32KB/8-way vs. 32KB/2-way, both with 64B line size
    L1D Cache: 24KB/6-way vs. 32KB/8-way (both can handle 16B load and store simultaneously)
    L2 Cache: 1MB/16-way (13-14 cycles) vs. 2MB/16-way (25 cycles) (serving twice as many cores)

    So Jaguar can handle large pages better, but overall these chips look like they were designed for similar throughput (per clock). Oh, and Silvermont has the better L1I cache, though I guess that, unlike on BD, Jaguar's 2-way associativity doesn't hurt that much.
    I can't quite judge things like branch predictors, loop buffers etc. when comparing these two chips, and those can probably make quite some difference.
    But I guess you're probably right: the OoOE resources are somewhat larger in general on Jaguar, so you'd think it should perform a bit better per clock (particularly on the SIMD side, I think; on the int side I'm not sure whether the better L2 cache latency and lower branch misprediction penalty, though the latter depends on branch predictor quality, couldn't make up for that). In any case, the performance of these chips should track much more closely overall, unlike Bonnell vs. Bobcat (where, depending on the code, results ranged from Bobcat annihilating Bonnell to the two being about as fast, with everything in between).
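    The TLB and cache figures listed in this post can be turned into reach and geometry with two one-liners: reach = Σ entries × page size, and sets = size / (ways × line size). The numbers are copied from the list; the helper names are illustrative, and the 2MB-page reach only matters if the OS actually backs memory with large pages.

```python
KB, MB = 1024, 1024 * 1024


def tlb_reach(entries):
    """Total memory mapped by a TLB; entries is a list of (count, page_size) pairs."""
    return sum(count * page for count, page in entries)


def cache_sets(size, ways, line=64):
    """Number of sets in a set-associative cache."""
    return size // (ways * line)


l2_tlb = {
    "Silvermont": [(128, 4 * KB), (16, 2 * MB)],
    "Jaguar": [(512, 4 * KB), (256, 2 * MB)],  # shared by twice as many cores
}
for name, entries in l2_tlb.items():
    print(f"{name} L2 TLB reach: {tlb_reach(entries) / MB:.1f} MB")

print(cache_sets(24 * KB, 6), cache_sets(32 * KB, 8))  # L1D: 64 64
print(cache_sets(32 * KB, 8), cache_sets(32 * KB, 2))  # L1I: 64 256
```

    This prints 32.5 MB for Silvermont and 514.0 MB for Jaguar, which is the sense in which Jaguar handles large pages better. The set counts show the two L1D geometries match (64 sets each), while Jaguar's 2-way L1I has four times as many sets.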
     
  8. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Ain't that really bad news once you take into account the difference in clock speed (and power) between Silvermont and Jaguar?
     
    #9 liolio, May 9, 2013
    Last edited by a moderator: May 9, 2013
  9. epicstruggle

    epicstruggle Passenger on Serenity
    Veteran

    Joined:
    Jul 24, 2002
    Messages:
    1,903
    Likes Received:
    45
    Location:
    Object in Space
    Not entirely accurate, they also took into account microservers.
     
  10. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    For Temash, yes, although its GPU will probably still be faster. For the 2GHz Kabini, not as much, but at its TDP it'll only find its way into larger devices.

    There's also the time gap between releases, 6 months isn't insignificant.
     
  11. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Intel hasn't yet revealed the SoC configurations / clocks / TDP. The only information we have so far is that one Silvermont CPU core consumes less than 1W (RealWorldTech). If that is "slightly under 1W", the CPU cores add up to 3-4W (and that's without a GPU). That's not far from the 3.5-5.9W Temash figures. But as long as we don't know the clocks, we cannot make a proper TDP comparison.

    We should assume that Intel has better performance per watt (in general-purpose code), because they have finally brought their most advanced process technology to their low-power parts (22nm tri-gate vs. 28nm bulk for Jaguar). I would expect them to have a slight advantage in scalar integer performance as well (the IPC should be only slightly behind, but Intel has more clock headroom). However, AMD should be faster in floating point and SIMD (both float and integer) processing (based on comparing the Silvermont details revealed in the RealWorldTech article with the AMD Jaguar optimization guide / instruction latency and throughput Excel sheet). Jaguar also supports the AVX instruction set. VEX-encoded (three-operand, non-destructive) instructions are good for narrow (2-wide) architectures, since fewer instructions are needed for extra register moves and loads. Jaguar also has a really nice full-rate 4x32-bit integer SIMD multiply (with a superb 2-cycle latency), fast horizontal ops, and very fast CVT16, among other goodies. I would expect it to fare pretty well against Silvermont in multimedia processing (and games).
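    A toy illustration of the VEX point about register moves: with legacy two-operand SSE, `op dst, src` overwrites its first source, so keeping both inputs alive costs an extra `movaps`, whereas the VEX form `vop dst, src1, src2` does not. The functions below are hypothetical, count-only models (no real compiler, register allocation or memory operands), and the scratch register is an arbitrary choice.

```python
def lower_vex(ops):
    """Three-operand (VEX) form: one instruction per operation, sources preserved."""
    return [("v" + opcode, dst, src1, src2) for dst, opcode, src1, src2 in ops]


def lower_two_operand(ops, scratch="xmm15"):
    """Destructive two-operand (legacy SSE) form: 'op dst, src' computes dst = dst op src.
    A copy is emitted whenever dst is not already src1 (a real register
    allocator could skip it when src1 is dead afterwards)."""
    out = []
    for dst, opcode, src1, src2 in ops:
        if dst == src1:
            out.append((opcode, dst, src2))
        elif dst == src2:
            # Copying src1 into dst would clobber src2, so go through a scratch register.
            out += [("movaps", scratch, src1), (opcode, scratch, src2), ("movaps", dst, scratch)]
        else:
            out += [("movaps", dst, src1), (opcode, dst, src2)]
    return out


# Two operations that both need xmm0 and xmm1 to survive.
ops = [("xmm2", "mulps", "xmm0", "xmm1"),
       ("xmm3", "addps", "xmm0", "xmm1")]
print(len(lower_vex(ops)), len(lower_two_operand(ops)))  # 2 4
```

    Twice the instruction count for the same work matters most on a 2-wide front end, which is the point being made about narrow cores.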
     
  12. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    #13 liolio, Jul 5, 2013
    Last edited by a moderator: Jul 5, 2013
  13. Homeles

    Newcomer

    Joined:
    May 25, 2012
    Messages:
    234
    Likes Received:
    0
    Well AnTuTu is supposedly heavily biased towards Silvermont, so those results aren't really useful. Interesting lineup, though.

    Can't wait for die shots to surface.
     
  14. DavidC

    Regular

    Joined:
    Sep 26, 2006
    Messages:
    347
    Likes Received:
    24
    The tablet-oriented Bay Trail-T parts are likely even better on power characteristics than the M/D/I.

    I wouldn't take the clock speed seriously. It likely ran the benchmark at its full ~2GHz speed, then ramped down for power savings after the benchmark completed.

    45-55k is not far from the Core i3 Sandy Bridge range. The benchmark seems to scale perfectly linearly with the number of cores, so having twice the cores makes up for the IPC deficit against Core chips.

    Also, note that it's a "system" benchmark.
     
  15. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    716
    Likes Received:
    33
  16. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    I don't think your conclusion is valid. I suspect it is more likely (as this is a native benchmark) that the compiler used for 2.9.3 didn't produce code very well optimized for Atom, and with Atom being an in-order architecture this matters more than for OoO chips.
    I'm just speculating here, of course. The high memory score (compared to Linpack etc.) could also easily be due to measuring latency vs. bandwidth, for instance (or using patterns that do or don't benefit from prefetchers, or mostly measuring L2 or whatnot; Atom has very low L2 latency, for instance). Most of the other benchmarks (like Linpack) aren't native either, so their memory scores are probably more likely to reflect generic Dalvik performance than to have much to do directly with the memory subsystem.
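    On the latency-vs-bandwidth point: latency tests usually chase pointers through a random single cycle, so each load depends on the previous one and prefetchers cannot guess the next address, while bandwidth tests stream sequentially. This is not AnTuTu's code (which isn't shown here), just a sketch of the usual technique using Sattolo's algorithm. In Python the interpreter overhead swamps memory latency, so only a native version of `chase` would actually measure anything.

```python
import random


def pointer_chase_chain(n, seed=0):
    """Build a random single-cycle permutation with Sattolo's algorithm.
    next_idx[i] is the slot visited after i; following it from any start
    touches every slot once before returning, in an order a stride
    prefetcher cannot predict."""
    rng = random.Random(seed)
    next_idx = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i (not <= i): this is what guarantees a single cycle
        next_idx[i], next_idx[j] = next_idx[j], next_idx[i]
    return next_idx


def chase(next_idx, steps, start=0):
    """Each lookup depends on the previous one, so a native version of this loop measures latency."""
    i = start
    for _ in range(steps):
        i = next_idx[i]
    return i
```

    A streaming loop over the same array, by contrast, is exactly what hardware prefetchers are good at, which is how two "memory" scores can rank the same chips quite differently.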
    I am certainly not saying that AnTuTu should be used as THE benchmark when judging Atom vs. ARM, but as you could see in the results, there are indeed some (not that many) benchmarks that show a similar picture, so I think just dismissing it as "Atom-biased" is a bit unfair. Of course Intel is cherry-picking benchmarks that show them in a good light; everybody does that. You can't blame AnTuTu itself if some reviews use it as the only measure of performance.
    FWIW I wouldn't be surprised if performance in other benchmarks scales quite differently than in AnTuTu when comparing Silvermont vs. Saltwell.
     
  17. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    Then again, maybe Intel really is just cheating.

    http://forums.anandtech.com/showthread.php?p=35245611#post35245611
     
  18. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    OK, you might be right :). (At least I was correct in assuming this is due to compiler differences, not to the code itself being tweaked to better suit Atom.)
    While I suspected they were using a newer compiler, I didn't expect them to use ICC for the x86 build. That is indeed pretty lame, since these sorts of benchmarks are well known to be quite exploitable by compilers (and even if the compiler didn't cheat and only used legitimate optimizations, it is supposed to be a benchmark indicating hardware performance, not one comparing compiler quality). I could understand not using vectorization on ARM, though; frankly, I've never seen it do much with GCC, at least on x86 (that could have changed recently, dunno). And I'm way too lazy to figure out whether GCC could do something with autovectorization on nbench :).
    Intel is definitely known for using questionable tricks in its compiler to get better results in widely used benchmarks, so that certainly wouldn't be a first. Those SPEC CPU results were always nice, claiming the new CPU got 30% faster when in reality it was just 10%, the rest being due to the new compiler (used with the new CPU but not with the old one) learning one or two new tricks :).
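    Since gains compound, the compiler's share in that SPEC example is a ratio rather than a 20-point difference. Using the post's illustrative 30% and 10% figures and an ad-hoc helper:

```python
def compiler_share(total_speedup: float, hw_speedup: float) -> float:
    """Speedup attributable to the compiler once the hardware-only gain is divided out."""
    return total_speedup / hw_speedup


# Illustrative figures from the post: +30% claimed, +10% from the hardware itself.
print(f"compiler contribution: +{compiler_share(1.30, 1.10) - 1:.1%}")
```

    So the compiler alone would account for a factor of about 1.18 out of the claimed 1.30.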
    I'd definitely be more interested in AnTuTu 3.3 results using GCC for everything.
     
    #19 mczak, Jul 11, 2013
    Last edited by a moderator: Jul 11, 2013
  19. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Interesting, but is it really relevant? I mean, if you put aside the "fake" differences favoring Intel CPUs, it is a bit like drivers in the GPU realm: nobody really cares which GPU is better in isolation; it is more about the whole stack (optimization, drivers, GPU).

    If Intel gets traction in the mobile market, why would devs use a compiler that performs worse than Intel's just because that compiler is also available on other platforms?
     
    #20 liolio, Jul 11, 2013
    Last edited by a moderator: Jul 11, 2013

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.