Intel Silvermont (Next-gen OOE Atom)

Discussion in 'PC Industry' started by DSC, May 6, 2013.

  1. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    715
    Likes Received:
    33
    Then why isn't every commercial app on Windows or Linux using icc?

    My personal experience with icc is mixed, to say the least: when it's faster than gcc, it's by a few percent, but most of the time the speed is very similar. Basically, if your code doesn't vectorize or doesn't look like a benchmark that icc has specifically targeted, then there's no point in using it; just use the most widely used compiler on your platform (that is, VS for Windows and gcc for Linux/Android).
     
  2. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
    Thanks for the explanation, I was ignorant on the matter :)
     
  3. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Even if everybody were using icc, it is questionable whether you would see the same kind of differences. Now maybe icc does indeed make more of a difference on Atoms than on "normal" x86 CPUs (and yes, I'd be interested in seeing results for less widely known stuff compiled with it), but the kind of optimization Intel is doing here smells like it is targeted exclusively at this benchmark. So this isn't really a fair comparison, since no one is using a special gcc that targets benchmarks.
    Also, it would help if the guys developing such benchmarks disclosed the toolchain they are using up front, as it plays such a crucial role.
     
  4. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Even better... release the source code.
     
  5. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Now that would be neat - kinda like SPEC CPU, where you could certainly produce your own numbers using your own toolchain.
    Though given that at least the CPU portion just seems to be nbench, someone could try that instead...
     
  6. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    715
    Likes Received:
    33
    There's an nbench app on the Android store ;)
     
  7. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    With published results even...
    Seems to be single-threaded though, in contrast to AnTuTu. In any case, Atoms look sort of OK there, though there are only desktop Atoms in the list, and the ARM SoCs listed aren't quite the fastest (the best I could spot was the MSM8960, a dual-core Krait at 1.5 GHz, in the Motorola MB886).
     
  8. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    715
    Likes Received:
    33
    Yes, not many results, in particular no Android x86 results :(
     
  9. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
  10. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Interestingly, heise is claiming there are hints that the chip is not a completely out-of-order design. In particular, they say the SIMD unit could be in-order: http://www.heise.de/ct/artikel/Prozessorgefluester-1921728.html - I've never heard of this and there are no other sources there, but that's quite interesting.
     
  11. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Various parts of a CPU pipeline can be in-order or out-of-order, depending on the goals of the particular architecture, and there's nothing strange about it. For instance, P4's instruction fetch and decode phase was organized as an in-order 8-stage pipeline.
     
  12. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    I was not implying this is a strange choice, just an interesting design decision (if true).
    Obviously, this would not be possible with a Core i design (as that has a unified scheduler), but since Silvermont is known to have separate schedulers, it is indeed quite possible. However, Silvermont architecture articles (like this one: http://www.realworldtech.com/silvermont/4/) certainly implied that the SIMD unit is fully OoO too.
    Though I would definitely qualify the P4 design as interesting AND strange overall :).
     
  13. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    429
    Location:
    Cleveland, OH
    Intel says it unambiguously in the software optimization guide.

    http://www.intel.com/content/www/us...-ia-32-architectures-optimization-manual.html

    I've actually never heard of a CPU that had out-of-order fetch or decode, have you? It seems like there wouldn't be value, since there aren't really dependencies between the fetch/decode of different instructions, although I guess it'd qualify as out of order if you could fetch from later blocks that are in the icache when earlier ones aren't (not sure if this is really done either).
     
    #33 Exophase, Jul 30, 2013
    Last edited by a moderator: Jul 30, 2013
  14. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Oh, I missed that. In fact, it even says so in the Silvermont microarchitecture overview too: "While floating-point and memory instructions schedule from their respective queues in program order, integer execution instructions schedule from their respective queues out of order."
    Makes me wonder, though, whether it would really have been much effort to go "full" out-of-order in the FPU cluster. Granted, it does make the RSVs simpler, but that seems to be about it. I also wonder how allocation to the RSVs works when an instruction could be scheduled on either FP RSV (deciding this was quite a sore point on K8 CPUs). If the wrong one is picked, the instruction potentially has to wait behind some instruction whose data isn't ready, even though it could be executed just fine. That could be a somewhat worse problem than it was on K8, where picking the "wrong" RSV essentially only meant picking a busier execution pipe.
    So I guess the SIMD unit is, all in all, really quite weak. Don't forget that the multiplier (as well as the divide unit) is only 2x32 bits wide, and it still has terrible non-pipelined, microcoded horizontal instructions, essentially inherited from Bonnell. It's not just the horizontal ones either: some other very nice instructions, like pmulld and pshufb, are like that too. Compared to that, Kabini's SIMD unit is in a completely different class. I'm not quite sure about A15, as the instructions are obviously different, but I suspect it's much better on paper as well.
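
The head-of-line blocking worry above can be illustrated with a toy model. This is only a sketch under loud assumptions: one issue per cycle, each instruction's operand-ready cycle is fixed in advance (no dependencies between queued instructions), and the function names `issue_in_order` and `issue_out_of_order` are made up for illustration. It is not a model of Silvermont's actual reservation stations.

```python
from collections import deque


def issue_in_order(ready_cycles):
    """Issue cycle per instruction when only the queue head may issue."""
    queue = deque(enumerate(ready_cycles))
    issued = [None] * len(ready_cycles)
    cycle = 0
    while queue:
        idx, ready = queue[0]
        if ready <= cycle:
            # Head has its operands: issue it this cycle.
            queue.popleft()
            issued[idx] = cycle
        # Otherwise everything behind the head stalls too.
        cycle += 1
    return issued


def issue_out_of_order(ready_cycles):
    """Issue cycle per instruction when the oldest *ready* entry may issue."""
    pending = list(enumerate(ready_cycles))
    issued = [None] * len(ready_cycles)
    cycle = 0
    while pending:
        for pos, (idx, ready) in enumerate(pending):
            if ready <= cycle:
                # Oldest ready instruction wins, wherever it sits.
                issued[idx] = cycle
                del pending[pos]
                break
        cycle += 1
    return issued


# Hypothetical queue: the first instruction waits 5 cycles for its data,
# the two behind it are ready immediately.
ready = [5, 0, 0]
print(issue_in_order(ready))      # [5, 6, 7]
print(issue_out_of_order(ready))  # [5, 0, 1]
```

In the in-order queue the two ready instructions sit behind the stalled one until cycle 5, which is the cost of landing in the "wrong" queue; with out-of-order selection they issue straight away.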
     
  15. Homeles

    Newcomer

    Joined:
    May 25, 2012
    Messages:
    234
    Likes Received:
    0
    I had made a guess over at the AnandTech forums that the 2W SDP Silvermont Z3770 would come within 15% of the 15W Kabini A4-5000. It appears that my already generous assumption wasn't generous enough.

    I doubt that we'll see numbers quite this high when Silvermont lands in the real world, but I think it's pretty safe to say that Silvermont is going to be a slam dunk from a performance standpoint.
     
  16. Wynix

    Veteran Regular

    Joined:
    Feb 23, 2013
    Messages:
    1,052
    Likes Received:
    57
    This cannot be correct.
     
  17. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,723
    Likes Received:
    193
    Location:
    Stateless
  18. Rurouni

    Veteran

    Joined:
    Sep 30, 2008
    Messages:
    950
    Likes Received:
    216
    It's SDP, not TDP. I'm pretty sure that on those benchmarks it would exceed its SDP.
    I do believe that the TDP would be lower than that of the AMD offering, but definitely not 2 W vs 15 W.
     
  19. cal_guy

    Newcomer

    Joined:
    Jun 27, 2008
    Messages:
    216
    Likes Received:
    2
    Cinebench is scalar floating point, so Silvermont isn't penalized for its 64-bit multiply SIMD integer/FP pipe. It would be interesting to see Silvermont in a diverse set of benchmarks. Also, next year should bring Jaguar's successor, which will bring Connected Standby and probably a good non-deterministic, Richland-type turbo to the CPU side.
     