Samsung Exynos 7420 architecture analysis @ Anandtech

Discussion in 'Mobile Devices and SoCs' started by Rys, Jun 29, 2015.

  1. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,169
    Likes Received:
    1,480
    Location:
    Beyond3D HQ
  2. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    753
    Likes Received:
    72
    Great work indeed!

    A comment: 181.mcf is getting a negative benefit on AArch64 due to larger pointer sizes, which result in more cache thrashing (the main data structure is a pointer-based graph).
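To put a rough number on the pointer-size effect described above, here is a minimal sketch. The node layout (6 pointers plus 4 32-bit integers) and the 64-byte cache line are assumptions chosen for illustration, not the actual 181.mcf structure.

```python
# Hypothetical graph node: NOT the real 181.mcf layout, just an illustration.
CACHE_LINE_BYTES = 64      # assumed cache line size
L1D_BYTES = 32 * 1024      # assumed L1 data cache size

def node_size(pointer_bytes, n_pointers=6, n_int32=4):
    """Size in bytes of a node holding n_pointers pointers and n_int32 ints.

    Padding is ignored; with these field counts none is needed on either ABI.
    """
    return n_pointers * pointer_bytes + n_int32 * 4

size_aarch32 = node_size(pointer_bytes=4)   # 32-bit pointers
size_aarch64 = node_size(pointer_bytes=8)   # 64-bit pointers

nodes_in_l1_aarch32 = L1D_BYTES // size_aarch32
nodes_in_l1_aarch64 = L1D_BYTES // size_aarch64

print(f"AArch32 node: {size_aarch32} bytes, {nodes_in_l1_aarch32} nodes fit in L1")
print(f"AArch64 node: {size_aarch64} bytes, {nodes_in_l1_aarch64} nodes fit in L1")
```

With these assumed field counts the same cache holds noticeably fewer nodes once pointers double in size, which is the mechanism behind the extra misses.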

    OTOH I'm surprised by Crafty's bad result. This one should benefit a lot from 64-bit integers as it's using 64-bit bitboards.
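As a rough illustration of why bitboards favour 64-bit registers, the sketch below compares a single 64-bit AND with the same operation split into two 32-bit halves, which is roughly what a 32-bit target has to do. The square numbering (a1 = bit 0) and the mask names are assumptions for the example, not taken from Crafty's source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def and64_native(a, b):
    """One logical operation when a 64-bit register is available."""
    return a & b & MASK64

def and64_as_two_32bit_ops(a, b):
    """Same result built from two 32-bit ANDs plus recombination."""
    lo = (a & MASK32) & (b & MASK32)
    hi = ((a >> 32) & MASK32) & ((b >> 32) & MASK32)
    return (hi << 32) | lo

# Hypothetical masks, assuming a1 = bit 0, b1 = bit 1, ..., h8 = bit 63.
RANK_2 = 0x000000000000FF00
FILE_E = 0x1010101010101010

e2_native = and64_native(RANK_2, FILE_E)
e2_split = and64_as_two_32bit_ops(RANK_2, FILE_E)
print(hex(e2_native), hex(e2_split))
```

Both paths give the same bitboard; the split version simply needs about twice the instructions for each board-wide operation.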
     
  3. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    Now this is what Anandtech is about! :)
    A very interesting and understandable investigation that not only is thorough and revealing, but also clearly delineates the level of confidence the author has about specific statements.
     
  4. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    923
    Likes Received:
    3
    Location:
    Germany
    Very nice article, but I still have one minor complaint: the perf/W table on page 7 is missing the values for the 5420 GPU. Those would make it easier to see how much the Nvidia and AMD GPUs could gain from the new process.
     
  5. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,026
    Likes Received:
    254
    Location:
    Luxembourg
    I don't have any devices with the 5420 around.
     
  6. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    753
    Likes Received:
    72
    I took a closer look at SPEC2000 results and some of them make no sense. In particular AArch64 vs AArch32 A57 speedup/slowdown for 253.perlbmk and 186.crafty are most likely wrong. How was this compiled? With what version of gcc?
     
  7. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,026
    Likes Received:
    254
    Location:
    Luxembourg
    NDK10d so GCC 4.8

    -Ofast -ffast-math -flto -march=armv8-a -ftree-vectorize -fno-jump-tables -fgcse -fgcse-lm -fgcse-sm -mtune=cortex-a57
    -funroll-all-loops -static -opt-mem-layout-trans=3 -opt-prefetch -Wall -fPIC -fPIE -pie -Ofast -ffast-math -flto -march=armv8-a -ftree-vectorize -fno-jump-tables -fgcse -fgcse-lm -fgcse-sm -mtune=cortex-a57

    I agree that the crafty scores look wrong but I haven't been able to find out why.
     
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    Thanks for the good article!
     
  9. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,280
    Likes Received:
    5,898
    The size difference between the big and LITTLE cores is really large!
    I thought it'd be around 2x larger, but a Cortex A15 cluster is actually >4x larger than a Cortex A7 cluster.
    The difference is smaller with the A53 because it's ~40% larger than the A7.

    So if we assume that a 2-core cluster is ~60% the size of a 4-core cluster (the "MP core glue" probably can't be halved), then using Samsung's 20nm an 8-core Cortex A53 would occupy 9.16mm^2 whereas a 2*A53 + 2*A57 would occupy 11.81mm^2.

    The Chinese SoC makers like HiSilicon and MediaTek have been sacrificing single-threaded performance for a measly 2.5-3mm^2...
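The area estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 4-core A53 figure is taken as half of the 8-core total quoted in the post, the 4-core A57 figure is back-derived from the quoted 11.81mm^2 rather than read from the article, and the 60% scaling is the post's own assumption.

```python
# Figures quoted in the post (mm^2, Samsung 20nm).
OCTA_A53_MM2 = 9.16        # two 4-core A53 clusters
HYBRID_MM2 = 11.81         # 2*A53 + 2*A57
DUAL_FRACTION = 0.60       # post's assumption: 2-core cluster ~60% of a 4-core one

# Implied size of one 4-core A53 cluster.
a53_quad = OCTA_A53_MM2 / 2

# Size of a 2-core A53 cluster under the 60% assumption.
a53_dual = DUAL_FRACTION * a53_quad

# Back-derived A57 sizes (not quoted in the post itself).
a57_dual = HYBRID_MM2 - a53_dual
a57_quad = a57_dual / DUAL_FRACTION

# Extra area the 2+2 big.LITTLE layout costs over eight A53 cores.
extra_area = HYBRID_MM2 - OCTA_A53_MM2

print(f"4-core A53 cluster: {a53_quad:.2f} mm^2")
print(f"implied 4-core A57 cluster: {a57_quad:.2f} mm^2")
print(f"extra area for 2*A53 + 2*A57: {extra_area:.2f} mm^2")
```

The difference comes out at about 2.65mm^2, consistent with the 2.5-3mm^2 range mentioned in the post.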
     
  10. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,169
    Likes Received:
    1,480
    Location:
    Beyond3D HQ
    Some things I jotted down while I was reading it properly just now:
    • Samsung getting to mass production and customer delivery with complex 14nm SoCs before TSMC with 16FF or 16FF+ is noteworthy.
    • Dropping Qualcomm for Galaxy S and Galaxy Note is enough to cause Qualcomm a profit warning, which says a bit about ASPs for high-end chips.
    • The T760 core layouts on 5433 and 7420 are completely different and it is definitely 1MB L2 for the GPU (I haz very die shots, much cell counting, wow)
    • I don't really like the trend of heavily overvolting (30%!) the CPU complexes. Feels like it's just to hit high marketing numbers rather than to deliver a better user experience.
    • The Cortex-M3 based live feedback loop thing is cool, and looks like the way forward for active dynamic power management.
    • I wonder how expensive the custom PMIC is, to be able to respond that quickly. I don't believe that's common.
    • For device minimum power you've got 358mW in the table, but 330mW in the text.
    • You could probably sell that undervolted kernel!
    • It's nice to have a public power figure for a high performance memory system plus DRAM (statements have always been in the ballpark but now there's something to link to).
     
    Lightman likes this.
  11. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,026
    Likes Received:
    254
    Location:
    Luxembourg
    358mW is 2nits white, 330mW is black.

    No, I can't sell kernels; anybody can just make a free copy or redistribute it under the GPL.
     
  12. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,169
    Likes Received:
    1,480
    Location:
    Beyond3D HQ
    You don't think people would pay for a prepackaged kernel they could install with an easy method on their rooted device, and restore the original if they don't like it? I would.
     
  13. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,026
    Likes Received:
    254
    Location:
    Luxembourg
    You're in the minority... doing such a thing would result in a witch-hunt for the developer and having a copy-cat doing it for free within the same day.

    Btw regarding PMICs: basically every SoC nowadays comes with its own special-purpose design. I think Samsung was one of the last to drop Maxim in favour of their own designs. The S2MPS15 in the S6 ramps up 12mV per µs.
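To give a feel for what a 12mV/µs ramp rate means in practice, here is a small sketch that converts a voltage step into a ramp time. The step sizes are hypothetical examples, and a linear ramp is assumed.

```python
SLEW_MV_PER_US = 12.0  # ramp rate quoted above for the S2MPS15

def ramp_time_us(delta_mv, slew_mv_per_us=SLEW_MV_PER_US):
    """Time in microseconds to raise the rail by delta_mv, assuming a linear ramp."""
    return delta_mv / slew_mv_per_us

# Hypothetical voltage steps between DVFS states (not from the article).
for step_mv in (60, 120, 300):
    print(f"{step_mv} mV step: {ramp_time_us(step_mv):.1f} us")
```

Under that linear-ramp assumption, even a step of a few hundred millivolts completes in a few tens of microseconds.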
     
  14. Laurent06

    Regular

    Joined:
    Dec 14, 2007
    Messages:
    753
    Likes Received:
    72
    There's no gcc 4.8 for AArch64 in r10d, it's 4.9.

    Aren't -opt-mem-layout-trans and -opt-prefetch ICC (Intel x86 compiler) specific optimization flags? I couldn't find them in any gcc document.

    If you don't force the use of 64-bit types then it might explain the bad result. Try with -DHAS_LONGLONG or -DLONG_HAS_64BITS.
     
  15. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,450
    Likes Received:
    183
    Location:
    Chania
    Kaarlisk likes this.
  16. Kaarlisk

    Regular Newcomer Subscriber

    Joined:
    Mar 22, 2010
    Messages:
    293
    Likes Received:
    49
    It's pleasantly surprising. And a very thorough and methodical article.
    I now have to bow my head to those who said there is use in 4 little cores. Even in browsers. (I never had any doubts about power efficiency in games).
     
    iMacmatician likes this.
