Samsung Exynos 8890

Are those numbers for the same process? Because it's surprising to see that A72 is advertised as smaller than A57.
Why? ARM said so back when it was announced (unless you'd believe it was _all_ just marketing...). Slightly smaller (hence cheaper), slightly less power, slightly faster. That of course should make the A57 completely obsolete; apparently the A57 wasn't really that good (well, neither was the A15)...
Although I'm not sure which chip this image refers to. I'm somewhat surprised by the large non-core size decrease (the core shrinks ~10% as expected, but the non-core area by over 25%).
 
Why? ARM said so back when it was announced (unless you'd believe it was _all_ just marketing...). Slightly smaller (hence cheaper), slightly less power, slightly faster. That of course should make the A57 completely obsolete; apparently the A57 wasn't really that good (well, neither was the A15)...
Although I'm not sure which chip this image refers to. I'm somewhat surprised by the large non-core size decrease (the core shrinks ~10% as expected, but the non-core area by over 25%).

I guess I'd just missed it. Kudos to ARM, then.
 
Neither HiSilicon, Mediatek, Allwinner nor Rockchip developed SoCs with the Cortex A57, even though all of them had SoCs with the previous 32-bit "big" cores, the Cortex A15 and/or A17.
In fact, the only IHV capable of launching a Cortex A57 SoC that was successful in terms of power and thermals was Samsung.

Makes you wonder if the A57 wasn't actually a rushed, half-assed design just to get 64-bit SoCs out in 2015.
 
HiSilicon was the launch partner for the A57 and the first vendor to implement it in silicon; those A57 implementations are used in base-station and network-equipment SoCs.

There's a vast portfolio of very little talked about non-mobile SoCs from all of those companies.
 
Neither HiSilicon, Mediatek, Allwinner nor Rockchip developed SoCs with the Cortex A57, even though all of them had SoCs with the previous 32-bit "big" cores, the Cortex A15 and/or A17.
In fact, the only IHV capable of launching a Cortex A57 SoC that was successful in terms of power and thermals was Samsung.

None of those companies had A15 SoCs out in devices in anything like the timeframe Samsung did, either. Most of their lineup is lower-end stuff. For example, Samsung had the A15 in devices in fall 2012, while MediaTek's MT8135 was announced mid-2013, Allwinner's A80 in early 2014, HiSilicon's Kirin 920 in mid-2014 and Rockchip's RK3288 in mid-2014 (and the A17 isn't really a straight A15 successor; it's more of a medium-class core).

What I'm getting at is that they all lagged a while because their market positioning doesn't drive them to be as aggressive with high-end releases as Samsung or Qualcomm (although that's gradually changing). When they got around to releasing A15 SoCs they probably used newer revision A15 cores than the ones Samsung first launched with. The equivalent of waiting a year on A57 and using higher revision cores is basically using A72.

And I know nVidia's Tegra X1 isn't associated with any low-power devices, but nVidia does claim that its CPU implementation of Cortex-A57 is significantly more efficient on TSMC's 20nm process than Exynos 5433 is on Samsung's 20nm process: http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf
It would be interesting to see some third-party measurements weigh in on this. I take nVidia's claims with a lot of salt, especially when they could involve things like compiler tricks gaming SPEC scores, but it would be pretty shocking if they were so far off that the X1's CPU cores were actually significantly less efficient than Exynos 5433's.

Then again, Exynos 5433's efficiency wasn't really that great either.
 
Their 20nm process suffered quite a few setbacks. The S5 was supposed to launch with the 5430.

On that topic.. it was my impression that Exynos 5430 had very impressive efficiency with its A15 and A7 cores, while 5433 was a lot less impressive. Late or not, it showed a really nice improvement over 5420 and it's especially impressive to see how much both the process and A15 revisions turned things around since their first A15 SoC (5250).

I know your aggregate testing showed a slight overall perf/W edge for the 5433 but it's hard to really get that from the raw power tests (and I wonder if that's not more of an attribute of better scheduling and a wider dynamic range on the little cores). I was especially taken aback by how much more power A53 used than A7 at the same clock speed, but then again it was a pretty substantially improved design; A57 over A15, not as much. Do you have any information or speculation as to why the big cores on 5433 used so much more power at the same clock than the big cores on 5430? Was 5430 a more mature implementation where the designers had more time to massage it to the 20nm process since they probably started on it earlier? Was A15 more mature than A57 at this point, maybe incorporating power optimizations that weren't yet present in A57? Or does A57 really just intrinsically need that much more power at the same clock vs A15?

I like to think that nVidia's data supports the idea that the 5433 wasn't a model A57 implementation, rather than the A57 being all-around meh.
 
Do you have any information or speculation as to why the big cores on 5433 used so much more power at the same clock than the big cores on 5430? Was 5430 a more mature implementation where the designers had more time to massage it to the 20nm process since they probably started on it earlier?
Less to do with 20nm than with the fact that they had a lot of time to do the implementation and it was something like their 7th A15 SoC. I actually have to go back and do the power curves for it, since its efficiency is in the order of ridiculousness and it would be a great data point.

Over the last year or so I've learned that physical implementation trumps pretty much everything else in a SoC. The range of what vendors end up doing with the RTL is pretty big: you end up with things like the 5430 on one end of the spectrum and then stuff like the Snapdragon 810/808 on the other end. Whenever I end up getting an S820 for testing I'm planning a huge look back at the last 1-2 years.

So one thing I did was take the power curves, normalize the x-axis for SPEC2k/MHz, and then divide the y-axis by the same thing. You end up with a graph such as this: http://images.anandtech.com/doci/9518/perfw.png and when you actually add in all the other SoCs from the last few years it tells quite an interesting story. Take a guess which SoCs fall directly under the 7420 (I don't have Apple/Intel/5430 data yet).
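In rough Python terms, the transformation is something like the sketch below; the function name is made up and every number is a placeholder rather than actual measured data, it's just to show both axes being rescaled by the same SPEC2k/MHz figure:

```python
# Sketch of the normalization described above: rescale both axes of a
# measured power curve by a per-core SPEC2000/MHz figure so that cores
# with very different clocks land on one performance-vs-power plot.
# All values here are placeholders, not measurements.

def normalize_power_curve(curve_mhz_watts, spec2k_per_mhz):
    """curve_mhz_watts: list of (frequency_MHz, power_W) points.
    Returns (estimated_SPEC2000, scaled_power) points."""
    return [(f * spec2k_per_mhz, p / spec2k_per_mhz) for f, p in curve_mhz_watts]

# Hypothetical big-core power curve and an assumed SPEC2k/MHz value.
example_curve = [(800, 0.45), (1200, 0.80), (1600, 1.40), (1900, 2.10)]
print(normalize_power_curve(example_curve, spec2k_per_mhz=0.9))
```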
 
GFXbench of the Mali T880 MP4 in the Mate 8's Kirin 950 SoC, vs the T760 MP8. It appears that Samsung will have to clock the T880 MP12 surprisingly high to match S530 and X1.

However the T880 does have a decent amount of shader performance, actually beating the T760 MP8 in the ALU2 test, and the OpenGL ES drivers have the word 'dev' in them, so perhaps there is a lot of optimisation still to be done.
  • T880 MP4: Offscreen 1145 frames (19.1 fps)
  • T760 MP8: Offscreen 1119 frames (18.7 fps)

https://gfxbench.com/compare.jsp?benchmark=gfxgen&did1=27816486&os1=Android&api1=gl&hwtype1=GPU&hwname1=ARM+Mali-T880&D2=Samsung+Galaxy+S6+(SM-G920x,+SC-05G)
 
GFXbench of the Mali T880 MP4 in the Mate 8's Kirin 950 SoC, vs the T760 MP8. It appears that Samsung will have to clock the T880 MP12 surprisingly high to match S530 and X1.

However the T880 does have a decent amount of shader performance, actually beating the T760 MP8 in the ALU2 test, and the OpenGL ES drivers have the word 'dev' in them, so perhaps there is a lot of optimisation still to be done.
  • T880 MP4: Offscreen 1145 frames (19.1 fps)
  • T760 MP8: Offscreen 1119 frames (18.7 fps)

https://gfxbench.com/compare.jsp?benchmark=gfxgen&did1=27816486&os1=Android&api1=gl&hwtype1=GPU&hwname1=ARM+Mali-T880&D2=Samsung+Galaxy+S6+(SM-G920x,+SC-05G)

Working link!
https://gfxbench.com/compare.jsp?benchmark=gfxgen&did1=25103839&os1=Android&api1=gl&hwtype1=GPU&hwname1=ARM+Mali-T760+MP8+(octa+core)&D2=Huawei+Mate+8+(NXT-xxx)
 
GFXbench of the Mali T880 MP4 in the Mate 8's Kirin 950 SoC, vs the T760 MP8. It appears that Samsung will have to clock the T880 MP12 surprisingly high to match S530 and X1.

However the T880 does have a decent amount of shader performance, actually beating the T760 MP8 in the ALU2 test, and the OpenGL ES drivers have the word 'dev' in them, so perhaps there is a lot of optimisation still to be done.
  • T880 MP4: Offscreen 1145 frames (19.1 fps)
  • T760 MP8: Offscreen 1119 frames (18.7 fps)

https://gfxbench.com/compare.jsp?benchmark=gfxgen&did1=27816486&os1=Android&api1=gl&hwtype1=GPU&hwname1=ARM+Mali-T880&D2=Samsung+Galaxy+S6+(SM-G920x,+SC-05G)

I think the Kirin clocks its GPU at a whopping 900MHz. Other than that, I don't see it even beating the 760MP8 in the ALU2 test; the results are within the margin of error. Sarcasm?

Let me assume that the 880MP4 is truly clocked at 900MHz:

In Manhattan 3.0 offscreen it's at 17.4 fps. At 700MHz it would mean 13.5 fps * 3 (for 12 clusters) = 40.6 fps. Use a burst frequency of 772MHz like the T760MP8 in the 7420 and you're at 44.8 fps.

Additionally if you normalize the 880MP4 ALU2 score to 700MHz you're at a tad less than 15 fps.
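For clarity, the arithmetic above assumes (as a best case) that performance scales linearly with both GPU clock and cluster count; a small sketch of it, using only the fps figure from the GFXbench listing above:

```python
# Best-case scaling estimate: assume fps scales linearly with GPU clock
# and with cluster count. Only the 17.4 fps figure comes from the listing
# above; everything else is derived from it.
t880mp4_at_900 = 17.4                      # Manhattan 3.0 offscreen, T880 MP4 @ 900MHz
mp4_at_700 = t880mp4_at_900 * 700 / 900    # ~13.5 fps for an MP4 at 700MHz
mp12_at_700 = mp4_at_700 * 12 / 4          # ~40.6 fps for 12 clusters
mp12_at_772 = mp12_at_700 * 772 / 700      # ~44.8 fps at a 7420-style 772MHz burst
print(round(mp4_at_700, 1), round(mp12_at_700, 1), round(mp12_at_772, 1))
```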
 
I think the Kirin clocks its GPU at a whopping 900MHz. Other than that, I don't see it even beating the 760MP8 in the ALU2 test; the results are within the margin of error. Sarcasm?

Let me assume that the 880MP4 is truly clocked at 900MHz:

In Manhattan 3.0 offscreen it's at 17.4 fps. At 700MHz it would mean 13.5 fps * 3 (for 12 clusters) = 40.6 fps. Use a burst frequency of 772MHz like the T760MP8 in the 7420 and you're at 44.8 fps.

Additionally if you normalize the 880MP4 ALU2 score to 700MHz you're at a tad less than 15 fps.

Are you assuming perfect scaling? I'd love to know the power draw for the T880 MP4 in the 700 to 900MHz range; either HiSilicon has been very conservative on the GPU front, or Samsung's 14nm++ process is doing an excellent job at minimising power draw.
 
On one hand it's nice to see low-power SoCs continuously raising the bar on GPU performance, even more so for Chinese IHVs like Mediatek and HiSilicon.

On the other hand, increasing GPU performance past Adreno 330's threshold seems like such a waste of die area and power envelope.
First, because none of these later SoCs can actually maintain their maximum performance for longer than a handful of minutes (or less), making it useless for anything but two or three runs of synthetic benchmarks. Second, because Android's 3D gaming market has pretty much stagnated for the last 2 or 3 years.

Meh...
 
Are you assuming perfect scaling? I'd love to know the power draw for the T880 MP4 in the 700 to 900MHz range; either HiSilicon has been very conservative on the GPU front, or Samsung's 14nm++ process is doing an excellent job at minimising power draw.

If Samsung can clock 8 clusters at a peak frequency of 772MHz on 14FF, then I don't see much of a problem going to 900MHz (+17%) for half the clusters on the same process. Albeit only an indication, it seems to throttle "only" by ~18% in the long-term performance test compared to T-Rex onscreen: https://gfxbench.com/device.jsp?benchmark=gfxgen&os=Android&api=gl&cpu-arch=ARM&hwtype=GPU&hwname=ARM Mali-T880&did=27816486&D=Huawei Mate 8 (NXT-xxx)

It's not that the Galaxy S6 doesn't throttle either; rather the contrary, and by an even bigger degree: https://gfxbench.com/device.jsp?benchmark=gfxgen&os=Android&api=gl&cpu-arch=ARM&hwtype=GPU&hwname=ARM Mali-T760 MP8 (octa core)&did=23147698&D=Samsung Galaxy S6 Edge (SM-G925x, SC-04G, SCV31, 404SC)

I'm willing to bet that under 14FF and 2 clusters you could actually surpass the GHz mark w/o a problem.

On one hand it's nice to see low-power SoCs continuously raising the bar on GPU performance, even more so for Chinese IHVs like Mediatek and HiSilicon.

On the other hand, increasing GPU performance past Adreno 330's threshold seems like such a waste of die area and power envelope.
First, because none of these later SoCs can actually maintain their maximum performance for longer than a handful of minutes (or less), making it useless for anything but two or three runs of synthetic benchmarks. Second, because Android's 3D gaming market has pretty much stagnated for the last 2 or 3 years.

Meh...

Where is HiSilicon or MTK raising the GPU bar? The current Helio X10 employs a dual-cluster G6200@700MHz and the Kirin 950 a Mali T880MP4@900MHz. The former is about 30% slower than the latter, and the 880MP4 is performance-wise in the Apple A7/Adreno 420 region, which is about two "generations" back.

GPU performance will and should continue to rise for two primary reasons IMHO:

1. GPGPU should start raising SoC performance where higher CPU core scaling doesn't make much sense anymore.
2. Exactly as you say: because they can't maintain their peak performance, the need for more performance is there, since if you have, say, 100 GFLOPs on paper you actually end up with way less.
 
Where is HiSilicon or MTK raising the GPU bar?

I meant their own bar. Chinese SoCs are getting a lot less terrible with their integrated GPUs. Remember that by the end of 2012 Mediatek's highest-end SoC had a six-year-old PowerVR SGX 531. By the end of 2013, Qualcomm and Samsung were launching new SoCs with OpenGL ES 3.0 GPUs, compute-capable unified shaders and >100 GFLOP/s, whereas Mediatek's and HiSilicon's flagships launched with a Mali 450MP4 (separate vertex/fragment shaders, OpenGL ES 2.0, zero compute capabilities).
Late 2014 and 2015 finally saw true parity on GPU feature set, with those IHVs bringing PowerVR Series 6 and Mali Midgard solutions. 2016's chips will show performance from the Chinese vendors actually really close to Samsung's and Qualcomm's 2015 flagships.
But now even their midranges are getting decent GPUs. The MT675x have better GPUs than the Snapdragon 615/617, and the 673x have much better GPUs than the S410/412 (which has an embarrassingly ancient GPU in it).
They're raising (their own) bar at a really high pace, IMO.


The current Helio X10 employs a dual-cluster G6200@700MHz and the Kirin 950 a Mali T880MP4@900MHz. The former is about 30% slower than the latter

Don't you mean the G6200@700MHz has close to 30% of the performance of a Mali880MP4@900MHz (more like 40%)? The Helio X10 is actually 60% slower than the Kirin 950.
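To be explicit about the distinction (with placeholder numbers, not measurements): having 40% of another GPU's performance is the same thing as being 60% slower.

```python
# Placeholder numbers purely to illustrate "% of the performance" vs "% slower".
kirin_fps = 10.0
helio_fps = 4.0                        # i.e. 40% of the Kirin's performance
ratio = helio_fps / kirin_fps          # 0.4
slower_by = 1 - ratio                  # 0.6 -> 60% slower
print(f"{ratio:.0%} of the performance = {slower_by:.0%} slower")
```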


and the 880MP4 is performance-wise in the Apple A7/Adreno 420 region, which is about two "generations" back.

Except if we look at "long term performance" results, the Adreno 420 is barely any better than the Adreno 330. Which definitely leaves you thinking..



1. GPGPU should start raising SoC performance where higher CPU core scaling doesn't make much sense anymore.

Hmm... and GPGPU will start raising performance in smartphone applications such as?
Javascript seems hopelessly dependent on single-threaded performance. Photo/Video processing seems to consistently need fixed-function DSPs.
Are we going to perform physics simulations on smartphones? Run advanced video/image editing software?
Unless you're thinking of the transition to the smartphone-as-a-PC when docked to a screen+keyboard+mouse on Android, I don't see any actual need for this ever increasing GPU performance at the moment.



2. Exactly as you say: because they can't maintain their peak performance, the need for more performance is there, since if you have, say, 100 GFLOPs on paper you actually end up with way less.

But increasing peak (and not sustainable) performance is completely bonkers. As I said, it's really only useful for reviewers who don't know any better to run a certain benchmark a couple of times.

Just imagine the clusterfuck it would be if this were happening in the desktop space: nVidia launching a graphics card that could run Witcher 3 maxed out at 60 FPS at 4K, and then, when people took said card home and played Witcher 3, after 10 minutes they would only get 40 FPS because of power and heat throttling.
That would be an instant lawsuit right there. Together with all the cheating happening with overclocking for specific apps, how are Samsung and Qualcomm getting away with this?
 
But now even their midranges are getting decent GPUs. The MT675x have better GPUs than the Snapdragon 615/617, and the 673x have much better GPUs than the S410/412 (which has an embarrassingly ancient GPU in it).
I don't quite agree with that. The MT675x and MT673x (both T720MP2) can't touch the SD 615 (Adreno 405). They are, however, faster than the good old SD 410 (Adreno 306, which is slower than the Adreno 305, though the SD 412 at least fixes that).
That said, the T720MP2 results are all over the place and I'm wondering what's up with that: if you've got over a factor-of-2 performance difference for the exact same SoC, that makes me suspicious... Of course it could just be down to drivers if the old ones were crappy (Adreno 305 had similar differences, though with Adreno 306 all results tend to be MUCH closer together, but then they'd all be using more mature drivers). A "slow" MT673x is in fact well below even the SD 410.
I don't disagree, though, that they are improving and are pretty decent for midrange nowadays; the T720MP2 is OK. But they still sell tons of chips with the crappy Mali 400/450 (though a 450MP4 isn't too bad performance-wise, just with a zero feature set).
 
I meant their own bar. Chinese SoCs are getting a lot less terrible with their integrated GPUs. Remember that by the end of 2012 Mediatek's highest-end SoC had a six-year-old PowerVR SGX 531. By the end of 2013, Qualcomm and Samsung were launching new SoCs with OpenGL ES 3.0 GPUs, compute-capable unified shaders and >100 GFLOP/s, whereas Mediatek's and HiSilicon's flagships launched with a Mali 450MP4 (separate vertex/fragment shaders, OpenGL ES 2.0, zero compute capabilities).

Apple and Co. went through that trend too, just earlier. The original iPad had 2 GFLOPs from the SGX535 GPU; it wasn't any better with smartphones at first either.

Late 2014 and 2015 finally saw true parity on GPU feature set, with those IHVs bringing PowerVR Series 6 and Mali Midgard solutions. 2016's chips will show performance from the Chinese vendors actually really close to Samsung's and Qualcomm's 2015 flagships.
But now even their midranges are getting decent GPUs. The MT675x have better GPUs than the Snapdragon 615/617, and the 673x have much better GPUs than the S410/412 (which has an embarrassingly ancient GPU in it).
They're raising (their own) bar at a really high pace, IMO.

GPU performance continues to scale everywhere. Chinese smartphones still have a sizeable gap in GPU performance compared to high-end Android/iOS solutions.

Don't you mean the G6200@700MHz has close to 30% of the performance of a Mali880MP4@900MHz (more like 40%)? The Helio X10 is actually 60% slower than the Kirin 950.

The MT6795 has a G6200@700MHz, and I don't know why that specific HTC has such crappy performance, but a G6200@700MHz looks more like this:
https://gfxbench.com/device.jsp?benchmark=gfxgen&os=Android&api=gl&cpu-arch=ARM&hwtype=GPU&hwname=Imagination Technologies PowerVR Rogue G6200&did=25741695&D=Gionee GN9008

Maybe crappy drivers and a 6795M variant with quite a bit lower frequencies?

For the record, the upcoming Helio X20 (MT6797, deca-core yadda yadda...) will have a Mali T880MP4 clocked at 700MHz. If the Mate with its MP4@900MHz gets 17+ fps in Manhattan 3.0, the upcoming Mediatek X20 GPU might end up in the 13-14 fps region, which is exactly where the A7 GPU and the likes of that generation stand in performance.

That's all assuming the Kirin 950 doesn't have immature GPU drivers. Otherwise it might rather end up in the A8 GPU/Adreno 430 region and be one instead of two generations behind, for which I'll of course stand corrected. For the moment I'm just using the existing data.

Today, at the highest smartphone end, we have an Adreno 530 for Android vs. a 6-cluster GT7600 for iOS. The highest-end Rogue you'll see from IMG's IP portfolio in Chinese stuff sounds more like a dual-cluster GT7200, which will be quite a bit ahead of today's G6200s, but, also considering frequency differences, at least 2x lower than the peak smartphone GPUs. I'd estimate a GT7200@700MHz at 20-21 fps in Manhattan 3.0; the GT7600 and Adreno 530 are in the 41-49 fps range.

Except if we look at "long term performance" results, the Adreno 420 is barely any better than the Adreno 330. Which definitely leaves you thinking..

That's QCOM's own problem and their DX11 craze for Microsoft's wet dreams which didn't lead anywhere.


Hmm... and GPGPU will start raising performance in smartphone applications such as?
Javascript seems hopelessly dependent on single-threaded performance. Photo/Video processing seems to consistently need fixed-function DSPs.
Are we going to perform physics simulations on smartphones? Run advanced video/image editing software?
Unless you're thinking of the transition to the smartphone-as-a-PC when docked to a screen+keyboard+mouse on Android, I don't see any actual need for this ever increasing GPU performance at the moment.

No, let's leave SoCs as they are, because we can scale CPU core counts and/or frequencies there endlessly. Not only do SoCs need a fine balance between CPU and GPU processing power (like everywhere else), but scaling GPU performance within boundaries is a lot cheaper than with CPUs, because GPUs actually scale almost linearly with increasing cluster count (say "core" count for marketing's sake). Both ARM and IMG have endless blog entries about GPGPU, and no, not every vendor can use a crapload of dedicated processing units, nor can everyone develop them. In such a case, where you actually need a fair amount of parallel processing, the GPU is ideal and will burn way less power than any garden-variety CPU out there: http://blog.imgtec.com/powervr/measuring-gpu-compute-performance (the bottom of the page has links to even more articles).

Case example Mobileye EyeQ4 for ADAS:

http://www.prnewswire.com/news-rele...-its-first-design-win-for-2018-300045242.html

The EyeQ4® will feature four CPU cores with four hardware threads each, coupled with six cores of Mobileye's innovative and well-proven Vector Microcode Processors (VMP) that has been running in the EyeQ2 and EyeQ3 generations. The EyeQ4® will also introduce novel accelerator types – two Multithreaded Processing Cluster (MPC) cores and two Programmable Macro Array (PMA) cores. MPC is more versatile than a GPU or any other OpenCL accelerator, and with higher efficiency than any CPU. PMA sports compute density nearing that of fixed-function hardware accelerators, and unachievable in the classic DSP architecture, without sacrificing programmability. All cores are fully programmable and support different types of algorithms. Using the right core for the right task saves both computational time and energy. This is critical as the EyeQ4® is required to provide "super-computer" capabilities of more than 2.5 teraflops within a low-power (approximately 3W) automotive grade system-on-chip.

Not everyone can come up with the resources to develop hardware and algorithms like that.
But increasing peak (and not sustainable) performance is completely bonkers. As I said, it's really only useful for reviewers who don't know any better to run a certain benchmark a couple of times.

That's a topic very specific to Samsung and/or QCOM solutions. If you look carefully at which frequencies GPUs within each generation scale down to while throttling, you'll see that they're usually not too far apart between most of them. Obviously, a solution that is clocked at "just" 533MHz throttling down to 400MHz, compared to a solution clocked at 772MHz throttling down to 400MHz, means a better chance of higher sustainable performance in the first case. They're just using very high frequencies to stay competitive; that this isn't free in terms of power consumption isn't something new.
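As a rough illustration of that ratio argument (clock is only a proxy for performance, so treat this as a sketch): the same ~400MHz throttle floor is a much larger fraction of a 533MHz peak than of a 772MHz peak.

```python
# Rough sketch: the same 400MHz throttle floor as a fraction of two peak clocks.
# The clock ratio is only a proxy for sustained performance, not an exact measure.
throttle_floor_mhz = 400
for peak_mhz in (533, 772):
    fraction = throttle_floor_mhz / peak_mhz
    print(f"{peak_mhz}MHz peak -> ~{fraction:.0%} of peak at the {throttle_floor_mhz}MHz floor")
```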

However, what I said was meant for all solutions and wasn't targeted at the above. If an ISV today wanted to create a triple-A mobile game that would run decently only on something like the iPad Air 2, they wouldn't be able to squeeze all of its 270+ GFLOPs out of it, but a sizeable portion less, because it would just get too hot. If FLOPs don't help as a measure, use anything else instead to set a mark for GPU performance. It's a general problem of ULP mobile devices.

Just imagine the clusterfuck it would be if this were happening in the desktop space: nVidia launching a graphics card that could run Witcher 3 maxed out at 60 FPS at 4K, and then, when people took said card home and played Witcher 3, after 10 minutes they would only get 40 FPS because of power and heat throttling.
That would be an instant lawsuit right there. Together with all the cheating happening with overclocking for specific apps, how are Samsung and Qualcomm getting away with this?

Nice example; now imagine a mobile solution that contained the graphics card mentioned above. Chances are damn high it would actually end up exactly like that under some circumstances. Just for the record's sake, and because some of us haven't gone nuts: when many said that the GM20B GPU in the X1 cannot make it in its full form into an ultra-thin tablet, it wasn't a joke. It's clocked at 1000MHz in the Shield Android TV and at 850MHz in the Pixel C. If possible, someone should downclock the former's GPU to 850MHz and run the very same stressful 3D benchmark over a fair amount of loops. Which one do you think is likely to throttle first and more, and why?

As for the last question, ask the press why they're too mild in either case. You should also consider that the endless smartphone/tablet crowd buying Samsung and QCOM solutions isn't a bunch of gamers with the knowledge or interest to find out what's going on. If there were a big enough angry crowd opposing it, the press would also chime in, in order to gain even more hits.

But you know all the above already; why are you even asking? :p
 