ARM announces new Cortex A72 core

Discussion in 'Mobile Graphics Architectures and IP' started by anexanhume, Feb 3, 2015.

  1. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,407
    Likes Received:
    4,057
    Yes, but it'll be a TRUE DOUBLE-OCTA-COREZ!!!!!11111oneone
     
  2. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Gimme a 120k score in Antutu :runaway:
     
  3. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    Ahh ok. So it seems IMG Series 7 will still be quite a bit ahead on performance then.
    Cyclone is significantly wider than A57, so even if they did go a bit wider with A72, Cyclone would still beat it. Geekbench single-core scores show a 1.4 GHz Cyclone beating a 2.1 GHz A57 (a score of 1640 vs. 1520). So even if A72 is 50% faster, clock for clock it would still fall short of a 2014 Cyclone, let alone a 2016 one.
    I don't think they gave any die size comparisons to A15 when they launched A57 either, but yeah, A72 is likely a fair bit larger than the A57 on the same process.
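The clock-for-clock argument above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes Geekbench single-core scores scale linearly with clock speed (a simplification), and the "A72" entry is the hypothetical "50% faster than A57 per clock" case from the post, not a measured result.

```python
# Scores and clocks as quoted in the post (Geekbench single core).
cyclone_score, cyclone_ghz = 1640, 1.4
a57_score, a57_ghz = 1520, 2.1


def score_per_ghz(score, ghz):
    """Normalize a benchmark score to points per GHz (assumes linear scaling)."""
    return score / ghz


cyclone_per_ghz = score_per_ghz(cyclone_score, cyclone_ghz)
a57_per_ghz = score_per_ghz(a57_score, a57_ghz)

# Hypothetical A72: 50% faster than A57 at the same clock.
a72_per_ghz = 1.5 * a57_per_ghz

print(f"Cyclone: {cyclone_per_ghz:.0f} points/GHz")
print(f"A57:     {a57_per_ghz:.0f} points/GHz")
print(f"A72 (hypothetical +50%): {a72_per_ghz:.0f} points/GHz")
print("A72 still behind Cyclone per clock:", a72_per_ghz < cyclone_per_ghz)
```

Under these assumptions the hypothetical A72 lands below Cyclone on a per-GHz basis, which is the point being made.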

    I feel that they think A53 is fine for big.LITTLE because a higher performance core won't be of much use when paired with A72. The job of the LITTLE core is basically to save power. But for the mid-range market I can easily see them introducing a new core with performance in between A53 and A57, like they did with Cortex A12.
    How do you conclude that the CCI-500 contributes 30%? As per ARM, the memory performance is up 30%, but the CPU performance won't increase anywhere near that much.
    Well, if they mean sustained performance then the figure would be OK, but the tweet does not imply that in any way at all. Even I thought they meant absolute performance and only saw the sustained performance bit on the A72 product page later.
    Baah, the use of Antutu for marketing has given us a lot of useless SoC designs. At least we have Snapdragon 808 though.
     
  4. mboeller

    Regular

    Joined:
    Feb 7, 2002
    Messages:
    922
    Likes Received:
    1
    Location:
    Germany
    But don't underestimate the T880. If it really has nearly 2x the performance, then even an MP2 setup will do quite well. For comparison, here are the benchmark results of an MT6752, which contains a T760 MP2 @ up to 700MHz:

    http://www.gizchina.com/2015/01/27/jiayu-s3-mt6752-64bit-benchmarks/

    The conclusion came from here:
    http://community.arm.com/groups/pro...ed-system-coherency--part-3--corelink-cci-500

    Therefore I think that for sustained performance the benefit is up to 30% but not for peak/benchmark performance.
     
  5. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    Huh? I don't know how ARM's marketing thinks, but it's likelier that the 80% is on a per core and per clock comparison.

    A T760MP4@700MHz is slightly faster in Manhattan than a Rogue G6230@600MHz. Note that that's 4 clusters vs. 2 clusters and a 16+% higher frequency (***edit: strike that, they're roughly even both at a bit under 9 fps in Manhattan offscreen). According to IMG's own marketing you have Series6XT ending at up to 50% faster compared to Series6 (G6230 above) and Series7XT another up to 60% faster compared to 6XT. I don't know if the latter is true, but the GX6450 in Apple A8 seems close to the claim for the 6 to 6XT difference.

    http://semiaccurate.com/2015/02/03/arm-outs-a57-successor-maya-core-cortex-a72/

    Let me be generous in ARM Mali's favour and assume that the differences between Series6 and Series7XT and between T760 and T880 are roughly the same; you still have the fact that a recent Mali needs twice as many clusters and a slightly higher frequency to beat a Rogue. One partial explanation is that despite both having the same FLOPs to TMU ratio, in Midgard you have that ultra-weird vector-ALU Gordian knot with always 1 TMU per cluster.

    Long story short: Erinyes is right as it seems.
     
    #25 Ailuros, Feb 5, 2015
    Last edited: Feb 5, 2015
  6. Zeross

    Regular

    Joined:
    Jun 3, 2002
    Messages:
    280
    Likes Received:
    11
    Location:
    France
    Interesting information from Peter Greenhalgh on the RWT forum:

     
  7. Erinyes

    Regular

    Joined:
    Mar 25, 2010
    Messages:
    647
    Likes Received:
    92
    Yes, but there would be very few workloads where the CPU is limited by memory bandwidth alone, which is why I said that in general use cases it would not be anywhere near that much. You also have to remember that available bandwidth will double with LPDDR4 anyway. So even for sustained performance, I do not think the benefit would be anywhere near 30%.
    It's not even on a per-clock basis. As we've seen with the CPU performance claims, they take the benefit of the process into account. They have assumed a clock speed of 850 MHz on 16nm. The comparison to T760 is probably a 700 MHz clock on 20nm (this is just my estimation). See the performance tab on the product page here - http://www.arm.com/products/multimedia/mali-performance-efficient-graphics/mali-t880.php

    Mali-T880 (MP16)
    Feature      Value                    Description
    Frequency    850MHz                   in 16nm (16 FinFET)
    Throughput   1700Mtri/s, 13.6Gpix/s   in 16nm (16 FinFET)


    Edit: The performance tab on the product page on T760 lists the following:-

    Mali-T760 MP16
    Feature      Value                    Description
    Frequency    650MHz                   in 28nm HPM
    Throughput   1300Mtri/s, 10.4Gpix/s   in 28nm HPM
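The two spec tables above can be normalized per clock to see how much of the quoted throughput difference comes from frequency alone. A small sketch using only the numbers listed on the product pages as quoted; the dictionary layout and function name are just illustrative.

```python
# Peak figures as quoted from ARM's product pages (MP16 configurations).
specs = {
    "Mali-T760 MP16": {"mhz": 650, "mtri_s": 1300, "gpix_s": 10.4},
    "Mali-T880 MP16": {"mhz": 850, "mtri_s": 1700, "gpix_s": 13.6},
}


def per_clock(spec):
    """Return (triangles per clock, pixels per clock) for one spec entry."""
    tri = spec["mtri_s"] / spec["mhz"]            # Mtri/s divided by MHz
    pix = spec["gpix_s"] * 1000 / spec["mhz"]     # Gpix/s -> Mpix/s, then per MHz
    return round(tri, 3), round(pix, 3)


for name, spec in specs.items():
    tri, pix = per_clock(spec)
    print(f"{name}: {tri} tri/clock, {pix} pix/clock")

clock_ratio = specs["Mali-T880 MP16"]["mhz"] / specs["Mali-T760 MP16"]["mhz"]
print(f"Clock ratio T880/T760: {clock_ratio:.2f}x")
```

Both parts come out at the same per-clock peak triangle and pixel rates, so the difference in these listed throughput numbers is the roughly 1.31x clock difference (and the process behind it).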
     
    #27 Erinyes, Feb 5, 2015
    Last edited: Feb 5, 2015
  8. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
  9. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77
    There's probably no relationship, BUT I do find it totally intriguing that the Mali-T880 is

    1. 850MHz
    2. 13.6Gpixels/s
    3. 1700 M triangles/s (1.7 G triangles/s)


    and the XB1 GPU is

    1. 850MHz
    2. 13.6Gpixels/s - CB/DB block
    3. 1.71 G primitives/s - Geometry Block

    ARM weren't kidding when they said that the "Mali-T880 delivers 'console-quality gaming'"

    :)



    [IMG]
     
  10. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    In all likelihood you're not going to see any device with 16 T880 clusters in the first place, because at that cluster count the result will consume way too much power for a mobile device ;) Besides, unless my math is wrong, there's still a severe difference in FLOPs between the two. I get 870 GFLOPs for a T880MP16@850MHz unless they've pumped up the ALUs... Also, "GPixels" for the T880 should actually read "GTexels/s", for which you're off by a factor of 3.0.
     
  11. liquidboy

    Regular Newcomer

    Joined:
    Jan 16, 2013
    Messages:
    416
    Likes Received:
    77

    :) I was only going off the ARM site

    under performance it shows "Throughput 1700Mtri/s, 13.6Gpix/s in 16nm (16 FinFET) "

    p.s. we are talking about the "throughput" which should be render targets via the CB/DB
     
  12. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    I'm as sure as I can be that the quoted GPixels are actually texture fillrates. Each cluster comes with a single TMU and therefore for MP16 = 16 TMUs * frequency; in the given case 16*850MHz = 13.6 GTexels/s. The data I provided above in post #28 are from their own website.

    You have 16 TMUs on a T880MP16 and 48 TMUs on the XB1 GPU (see pink block in the diagram you quoted); the 13.6 Gigapixels are from the 16 ROPs of the XB1 GPU. Essentially the XB1 has 3x the TMU count of the T880MP16.

    Realistically, the highest configuration we'll see from that GPU IP is in all likelihood a T880MP6 for future high-end smartphone SoCs.
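The fillrate arithmetic above (units times clock) can be written out as a short sketch. The unit counts and the 850 MHz clock are the ones quoted in this thread; the helper name is just for illustration.

```python
def fillrate_g_per_s(units, clock_mhz):
    """Peak rate in G-ops/s for `units` that each produce one result per clock."""
    return units * clock_mhz / 1000


CLOCK_MHZ = 850  # clock quoted in the thread for both parts

t880_mp16_texels = fillrate_g_per_s(16, CLOCK_MHZ)  # 16 TMUs (one per cluster)
xb1_texels = fillrate_g_per_s(48, CLOCK_MHZ)        # 48 TMUs
xb1_pixels = fillrate_g_per_s(16, CLOCK_MHZ)        # 16 ROPs

print(f"T880MP16 texel rate: {t880_mp16_texels:.1f} GTexels/s")
print(f"XB1 texel rate:      {xb1_texels:.1f} GTexels/s")
print(f"XB1 pixel rate:      {xb1_pixels:.1f} GPixels/s")
print(f"XB1 / T880MP16 texel ratio: {xb1_texels / t880_mp16_texels:.1f}x")
```

The matching "13.6" figures come from 16 units at 850 MHz in both cases, but they are TMUs on the Mali and ROPs on the XB1, which is where the factor of 3 in texel rate comes from.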
     
  13. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Wow, there's an IOMMU (that ARM calls an SMMU), and so a provision for a VM to use the GPU or other hardware blocks. That was already on previous tech (ARMv7), and it is described in this paper from 2011:
    http://www.arm.com/files/pdf/System-MMU-Whitepaper-v8.0.pdf

    This is tech barely available on desktops (with software complexity and/or licensing, Intel and NVIDIA mostly locking away the feature).
    Some of the use cases stem from the difficulty of securing Android (lack of security updates, overbearing apps). But if this works and there's software available, you could use a secure OS or two for most of your stuff, and an Android VM with disabled or firewalled networking to run some game etc.

    Today, if you have enough money (and perhaps time or a need to care), you can run a desktop PC with dual graphics cards and both Windows + Linux, and keep browsing Beyond3D and doing stuff on Linux while the Windows VM is off or rebooting to apply updates.
     
  14. fxtech

    Newcomer

    Joined:
    Apr 23, 2003
    Messages:
    77
    Likes Received:
    5
  15. fxtech

    Newcomer

    Joined:
    Apr 23, 2003
    Messages:
    77
    Likes Received:
    5
    I was expecting more reaction from people :) This core could reach (and go over) 2500 points on Geekbench when it is deployed in the Snapdragon 820 at 3 GHz with LPDDR4 on 14nm.
     
  16. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,714
    Likes Received:
    185
    Location:
    Stateless
    ARM has not given much information yet about the differences between the T7xx series and the T8xx series; for the T880 specifically they claim a beefy 1.8x the performance of the T760. It makes me wonder if ARM went for 4 ALU pipelines per core with the T880 (they did in some of their past designs).
    If I use AnandTech data, I get 320 FLOPS per cycle, or 544 FLOPS per cycle (taking in dot product throughput and more; ARM counts 17 FLOPS per ALU), for a T760 MP16. The T860 is the same as the T760, efficiency aside.
    The T880 could simply be a reworked T860 using the 14/16nm process. In that case I get 376 MFLOPS and 462 MFLOPS (accounting for the dot products). You can double that figure if the T880 were to use 4 ALUs per core; I still can't get the same number as you do.

    I don't think we are going to see an MP16 configuration either. By ARM's own admission, they were expecting 10 cores to be the highest configuration we would see for the T760 line; so far the highest-end implementation uses 8 cores. The Mali delivers either 160 FLOPS or 272 FLOPS per cycle depending on your accounting; the MP8 version included in the S6 runs @772MHz (max), that is 145 MFLOPS (or 210 MFLOPS using ARM figures).

    The PowerVR G6xxx inside either a Zenfone 2 or an iPhone 6 delivers 256 FLOPS per cycle; at max speed that is 136 MFLOPS for the Zenfone 2 and ~115 MFLOPS for the iPhone 5s (going by AnandTech's clock estimate of 450MHz).
    If I use the GFXBench results and compare the Galaxy S6 to the Zenfone, the results are in line with the FLOPS figures if you use ARM accounting, for all the results but ALU2, where both devices perform more or less the same. Actually, in the ALU2 test the PowerVR parts perform comparatively better, and in pure graphical tests the Mali does a tad better.
    From that test you can estimate that the GPU in the Apple A8 operates at ~700 MHz (that test gives an accurate estimate of 454MHz for the A7, using the Z3580 GPU's 533MHz as a reference). Now I can estimate the theoretical FLOPS throughput of the PowerVR GX6450 (in the A8) :) And I get 179 MFLOPS.
    Again, sticking to GFXBench, the results are pretty consistent with the FLOPS figures using ARM accounting.
    It makes me think of an article I read, I don't remember where, comparing the FLOPS throughput of the modern GCN GPU inside the XB1 with the FLOPS throughput of Xenos inside the 360. Obviously the former is a lot more powerful, but the article showed pretty well that on Xenos quite complicated calculations were "free" (or need multiple instructions on GCN). The situation seems to be the same when you compare Mali to other GPU architectures; it shows pretty well the difference between compute FLOPS and graphics FLOPS, if that makes sense.
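The throughput estimates above all follow the same formula: FLOPS per cycle times clock. A minimal sketch using the per-cycle figures and clocks quoted in this post; the function name and labels are illustrative. Note that FLOPS/cycle times MHz yields MFLOPS, so the values come out in the hundreds of GFLOPS once divided by 1000.

```python
def gflops(flops_per_cycle, clock_mhz):
    """Theoretical peak: FLOPS/cycle * MHz gives MFLOPS; /1000 converts to GFLOPS."""
    return flops_per_cycle * clock_mhz / 1000


estimates = {
    "PowerVR, Zenfone 2 (256 FLOPS/cycle @ 533 MHz)": gflops(256, 533),
    "PowerVR, Apple A7 (256 FLOPS/cycle @ 450 MHz)": gflops(256, 450),
    "PowerVR, Apple A8 (256 FLOPS/cycle @ ~700 MHz est.)": gflops(256, 700),
    "Mali-T760 MP8, ARM accounting (272 FLOPS/cycle @ 772 MHz)": gflops(272, 772),
}

for name, value in estimates.items():
    print(f"{name}: {value:.1f} GFLOPS")
```

The A8 clock is the post's own GFXBench-based estimate, not a published figure, so that row is only as good as that estimate.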

    Overall, whereas I agree that Malis are not fit to deliver the type of performance you get from low/mid-range PC GPUs or consoles, I realized that I've not been paying enough attention to the progress ARM has made with its GPUs; those Mali GPUs are damned good.
    ARM promised an 80% increase in performance over the T760; that estimate might account for the difference in clock speed, 650 MHz vs 850 MHz, in the reference designs (the operating frequency of the T760 in Samsung's Exynos is already higher). Even if only part of that increase is delivered, it could be enough for a hypothetical Mali-T880 MP8 powered device to beat an iPad Air 2 and compete with (if not beat) the Shield Tablet. An MP16 configuration (assuming good scaling and that the GPU is fed properly) should compete with (if not beat) a Shield TV (within a comparable TDP).
    Pretty sweet; if that is not too much of a PR twist, a company like Nintendo should actually consider an all-ARM design.
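If the 80% claim does fold in the reference clock difference, the per-clock share can be separated out. This is a sketch under that assumption only; ARM's actual test conditions are not stated in the thread, and the variable names are illustrative.

```python
claimed_speedup = 1.8              # ARM's "80% faster than T760" claim
t760_mhz, t880_mhz = 650, 850      # reference clocks from the product pages

clock_ratio = t880_mhz / t760_mhz
# Whatever is left after removing the clock ratio would have to come
# from per-clock (architectural) improvements.
per_clock_gain = claimed_speedup / clock_ratio

print(f"Clock ratio:            {clock_ratio:.2f}x")
print(f"Implied per-clock gain: {per_clock_gain:.2f}x")
```

On this reading, roughly 1.31x would come from frequency and about 1.38x from per-clock improvements.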
     
    #36 liolio, Jun 28, 2015
    Last edited: Jun 28, 2015
  17. Rys

    Rys AMD RTG
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,138
    Likes Received:
    1,337
    Location:
    Beyond3D HQ
    s/MFLOP/GFLOP/g
     
  18. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,408
    Likes Received:
    172
    Location:
    Chania
    A >5 months delay for an answer and a quite long post just to actually agree with the quoted content of a few sentences is quite an art I must admit :p
     
  19. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    428
    Location:
    Cleveland, OH
    1) This isn't actually the first Cortex-A72 Geekbench 3 score to show up; MediaTek MT8173 scores already have, and depending on what you look at, the single-threaded score is pretty similar: http://browser.primatelabs.com/geekbench3/compare/2732906?baseline=2826004 But Geekbench scores on Android are super erratic, so who knows (one reason why they're not very instructive).

    2) Snapdragon 820 isn't using Cortex-A72 but a custom Qualcomm 64-bit ARM core. We have pretty much no performance details on it.

    3) No one has announced a 3GHz Cortex-A72 and I doubt any SoC with A72s made on a remotely mobile-targeted 16/14nm process will allow such high clocks.
     
  20. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,714
    Likes Received:
    185
    Location:
    Stateless