Samsung Exynos 8890

Rys

Graphics @ AMD
http://www.anandtech.com/show/9781/samsung-announces-exynos-8890-with-cat1213-modem-and-custom-cpu

Samsung's own Mongoose CPU microarchitecture for the big complex (now called M1)
Cortex-A53 for the little complex
Mali-T880MP12 GPU
Custom fabric/interconnect called SCI (Samsung Coherent Interconnect), so no ARM CCI
Powerful LTE modem (@Nebuchadnezzar speculates it's off-chip and on-package, which, given the size of the rest of the big blocks, especially the GPU, I'd say is likely)

So that makes three of the big four very high performance ARM-based SoC vendors not using Cortex for their fastest CPUs (although Samsung still use it for the little complex here).
 
Still 3-wide apparently. Apple really caught the rest of the industry with their pants down it seems. When are we going to see someone other than Apple attempt a really wide core?
 
Mali-T880MP12: it seems that at least one of the Galaxy devices, probably the Note, will have a 4K display.
T880MP12 sounds huge indeed. Considering the T760-MP8 in the Exynos 7420 already had to downclock to roughly half the peak frequency for sustained use, this thing has 50% more clusters, _and_ 50% more ALUs per cluster, so I wonder what the magic was to make this work reasonably in a low TDP environment... Is T8xx really that much more power efficient?
 
T880MP12 sounds huge indeed. Considering the T760-MP8 in the Exynos 7420 already had to downclock to roughly half the peak frequency for sustained use, this thing has 50% more clusters, _and_ 50% more ALUs per cluster, so I wonder what the magic was to make this work reasonably in a low TDP environment... Is T8xx really that much more power efficient?

No doubt Samsung have improved their 14nm process during the past year, so that should account for some of the improvement. I bet that Exynos 8890 will be fabbed on their 14LPP, rather than the older LPE of the 7420.
From the TSMC vs. Samsung A9 die area comparison, it seems that Samsung have an advantage over TSMC in die area, so going for a wide and slow (MHz) GPU is less risky than it is for TSMC users.
 
Still 3-wide apparently. Apple really caught the rest of the industry with their pants down it seems. When are we going to see someone other than Apple attempt a really wide core?

It's also Samsung's first custom core.
Apple started putting the groundwork together at least as far back as 2008 with the PA Semi acquisition, and Swift was the more modest first deployment. 6-wide came about in 2014.

Samsung seems to have started a bit later, so a riff on an existing template seems like a more conservative choice that has less risk in terms of time to market, and real-world feedback for a more advanced core. This could be Samsung's version of Swift.

The PR is thick and details are light, so I do not know how to rank this versus an A72 implementation. The benefits of a custom core might be muted by Samsung's custom power management and process switch with the 7420, and thanks to ARM going back and performing more optimization and physical IP design, compared to what the prior cores left on the table.
 
I bet that Exynos 8890 will be fabbed on their 14LPP
Agree. LPP should be 14% faster than LPE, somewhat closing the gap to 16FF+.

The supposed scores reported a few months ago were 59.4 fps in Manhattan and 108.9 fps in T-Rex, which would suggest that they kept frequencies stable at 700-772MHz.
 
T880MP12 sounds huge indeed. Considering the T760-MP8 in the Exynos 7420 already had to downclock to roughly half the peak frequency for sustained use, this thing has 50% more clusters, _and_ 50% more ALUs per cluster, so I wonder what the magic was to make this work reasonably in a low TDP environment... Is T8xx really that much more power efficient?

I think it sounds huge because "12" sounds like a very high number. If you consider though that it actually has "just" 12 TMUs, then it's not "huge" at least compared to the GT7600 in the Apple A9 for instance. Frequency then is another chapter; the 760MP8 in the 7420 clocks at 700MHz with a burst frequency of 772MHz. Nebu or anyone else might correct me but in T-Rex if memory serves well it goes only up to 700MHz but it throttles down to 400MHz over N period of time.

The 7600 in A9 should be clocked at 533MHz or somewhere around that frequency either way. Now assume it throttles by ~20% in a worst case: it drops to ~425MHz. Now ask yourself why the frequencies ULP mobile GPUs usually throttle to today are not so far apart.
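The throttling comparison above can be checked with a few lines of arithmetic. This is only a sketch using the clocks quoted in this thread (533MHz for the GT7600, 700MHz dropping to 400MHz for the T760MP8); none of them are measurements, the ~20% figure is the poster's assumed worst case, and the function names are made up for illustration.

```python
# Clocks in MHz as quoted in the thread; rough figures, not measurements.

def sustained_clock(peak_mhz, throttle_fraction):
    """Clock that remains after throttling down by a fraction of peak."""
    return peak_mhz * (1.0 - throttle_fraction)

def throttle_fraction(peak_mhz, sustained_mhz):
    """Fraction of the peak clock that is lost when throttled."""
    return 1.0 - sustained_mhz / peak_mhz

# GT7600 in the A9: assumed ~20% worst-case throttle from ~533MHz.
gt7600_sustained = sustained_clock(533, 0.20)

# T760MP8 in the 7420: 700MHz in T-Rex, throttling to ~400MHz.
t760_lost = throttle_fraction(700, 400)

print(f"GT7600 sustained: ~{gt7600_sustained:.0f} MHz")
print(f"T760MP8 loses ~{t760_lost:.0%} of its peak clock")
```

Under these assumptions both GPUs end up sustaining clocks in the 400-430MHz range, even though the Mali gives up a much larger share of its peak.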

For the record's sake the Kirin 950 is stuck with "just" 4 clusters but clocks its Mali to 900MHz *cough*
 
I think it sounds huge because "12" sounds like a very high number. If you consider though that it actually has "just" 12 TMUs, then it's not "huge" at least compared to the GT7600 in the Apple A9 for instance.
The "huge" was really in relation to the 760MP8 used in the Exynos 7420 as this one already has to throttle down to ~400MHz. Now it's quite possible this is a reasonable frequency for power efficiency reasons (as a side note, even Carrizo GPU throttles to roughly this level in its 15W form), but I have some doubts going even lower would be helpful. So with 50% more clusters (which themselves have 50% more ALU capacity) you're still looking at quite a big efficiency improvement needed (somewhere - either architecture or process or more likely both) to make this really useful.
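To put a rough number on "quite a big efficiency improvement", here is a back-of-the-envelope sketch. It assumes, purely for illustration, that GPU power at a fixed clock and voltage scales linearly with total ALU count, which real silicon will not follow exactly; the cluster counts and the ~400MHz sustained clock are the figures quoted in this thread, and the function name is hypothetical.

```python
def relative_alu_capacity(cluster_ratio, alus_per_cluster_ratio):
    """Total ALU capacity relative to the older GPU at the same clock."""
    return cluster_ratio * alus_per_cluster_ratio

# T880MP12 vs T760MP8: 12 vs 8 clusters, each with 50% more ALU capacity.
capacity = relative_alu_capacity(12 / 8, 1.5)

# Assumption: power is proportional to ALU count at fixed clock/voltage.
# Sustaining the same ~400MHz in the same power budget would then need
# a perf/W improvement equal to the capacity ratio.
required_perf_per_watt_gain = capacity

# Alternatively, the clock at which the wider GPU merely matches the old
# GPU's sustained ALU throughput (no efficiency gain, no speedup).
break_even_clock_mhz = 400 / capacity

print(f"ALU capacity: {capacity:.2f}x")
print(f"Break-even clock: ~{break_even_clock_mhz:.0f} MHz")
```

With these inputs the wider GPU has 2.25x the ALU capacity, so anything it sustains above roughly 178MHz is a net throughput gain over the 7420, but holding ~400MHz would need the architecture and process together to deliver something on the order of that 2.25x.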
 
Maybe it's just a rebranded A72.
The Apple A4, A5 were also rebranded Cortex A8, A9

A4 and A5 weren't CPU names, they were SoC names.. it's not like Exynos 3xxx was a rebranding for Cortex-A8, 4xxx for A9, etc.

The details posted for Mongoose look very similar to both A72 and A57, which both look even more similar to each other based on this sort of information. There are some small differences, for example most integer SIMD operations are lower latency. But really the devil is in the details, which is how A72 has measurably better IPC than A57 despite having the same basic execution resources with the same latencies. This doesn't account for the impact of things like instruction window/scheduler size, cache sizes and latencies, prefetch performance, branch prediction, TLB sizes, reordering capability (particularly, memory disambiguation and alias prediction) and so on.
 
The mobile SoC trend for CPUs seems to go in two directions:
- 2 cores but wide decode, like 6 instructions (Apple, Qualcomm 820 (+2 low power cores))
- 4 cores but only 3-wide decode (A57, A72, M1 (+4 low power cores))
All with high clocks > 2GHz
 
The "huge" was really in relation to the 760MP8 used in the Exynos 7420 as this one already has to throttle down to ~400MHz. Now it's quite possible this is a reasonable frequency for power efficiency reasons (as a side note, even Carrizo GPU throttles to roughly this level in its 15W form), but I have some doubts going even lower would be helpful. So with 50% more clusters (which themselves have 50% more ALU capacity) you're still looking at quite a big efficiency improvement needed (somewhere - either architecture or process or more likely both) to make this really useful.

Look at it this way: IF it should throttle to the same degree/percentage as the 7420 GPU, you still have N% more usable performance compared to the former.
 
Look at it this way: IF it should throttle to the same degree/percentage as the 7420 GPU, you still have N% more usable performance compared to the former.
Keep in mind that the actual GPU power in the 7420 was only a portion of the total SoC power. Given that Samsung also changed things in their interconnect and memory controllers, this can end up going either way.
 
So as I suspected, something got lost in translation. The 30% perf 10% efficiency figure actually was 30% perf 10% lower power. That's about a 44% increase in efficiency over the 7420's A57.
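For anyone wondering where the 44% comes from: efficiency here is taken as performance per watt, so it is just the ratio of the two quoted changes. A minimal sketch, assuming both figures are relative to the 7420's A57 at the same operating point (the function name is made up for illustration):

```python
def efficiency_gain(perf_gain, power_reduction):
    """Relative perf/W improvement from a perf gain and a power reduction.

    Both arguments are fractions, e.g. 0.30 for +30% performance and
    0.10 for 10% lower power.
    """
    return (1.0 + perf_gain) / (1.0 - power_reduction) - 1.0

gain = efficiency_gain(0.30, 0.10)
print(f"Efficiency improvement: ~{gain:.0%}")
```

That is 1.30 / 0.90 ≈ 1.44, i.e. roughly 44% more performance per watt.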

Seems they were pretty clear that this is also an on-die modem.
 
So as I suspected, something got lost in translation. The 30% perf 10% efficiency figure actually was 30% perf 10% lower power. That's about a 44% increase in efficiency over the 7420's A57.

That sounds pretty close to Cortex-A72's improvements, depending on how generous you are with measuring the performance increase (they say 10-50% at same clock with lower power consumption, but the more robust benchmarks are probably closer to the 10% mark)
 
That sounds pretty close to Cortex-A72's improvements, depending on how generous you are with measuring the performance increase (they say 10-50% at same clock with lower power consumption, but the more robust benchmarks are probably closer to the 10% mark)
http://images.anandtech.com/doci/9762/P1030611.jpg

Smaller power improvements but larger performance improvements, making for larger efficiency improvements overall.

Anyway I don't really trust Samsung to give representative numbers on their SoCs (for better or worse) - the 7420 marketing numbers were for example focused on the process gains but the real gains were far greater than that.
 
Are those numbers for the same process? Because it's surprising to see that A72 is advertised as smaller than A57.
 