Samsung SoC & ARMv8 discussions

For ARM-based designs built around CCI-400, it's not just bandwidth you need to focus on. That interconnect is "deep", and since GPUs in particular tend to have reasonably hard latency requirements, making sure those requirements are met in a complex SoC can be difficult, and GPU performance can suffer. Measured bandwidth only tells you part of the story.
 
Samsung confirmed what we pretty much already knew: 14nm for the Exynos 7420, a 2.1 GHz clock for the A57s, and LPDDR4. They also claim 20 percent higher speed, 35 percent lower power consumption and a 30 percent productivity gain for 14nm compared to 20nm. Not sure what they mean by productivity, but Joshua at AT seems to think it is performance per watt. Unless my reasoning is completely wrong, wouldn't a 35% reduction in power consumption mean ~50% higher performance per watt (i.e. 1/0.65 ≈ 1.54)?

Source: AnandTech
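A quick check of that arithmetic, as a sketch. It assumes (my reading, not something Samsung states) that the speed and power figures are alternatives, i.e. 20% faster at the same power or 35% less power at the same speed, which is how foundry process claims are usually framed. The helper name `perf_per_watt_gain` is hypothetical.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Fractional change in performance per watt.

    perf_ratio:  new performance / old performance (1.0 = unchanged)
    power_ratio: new power / old power (0.65 = 35% less power)
    """
    return perf_ratio / power_ratio - 1.0


# Iso-performance reading: same speed at 35% less power.
print(f"{perf_per_watt_gain(1.0, 0.65):.1%}")

# Iso-power reading: 20% faster at the same power.
print(f"{perf_per_watt_gain(1.20, 1.0):.1%}")

# If both held at once (unlikely for a process claim), the gain would be much larger.
print(f"{perf_per_watt_gain(1.20, 0.65):.1%}")
```

This prints 53.8%, 20.0% and 84.6%. So the ~50% figure holds under the iso-performance reading, and the quoted 30% "productivity" number matches neither reading.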
1. No idea. May have to do with power consumption.
2. It's not only cache but also other stuff within the cluster, PLLs, interfaces, etc.
3. There is a base amount of power that goes into the rest of the SoC (RoS) and memory; remember, these are SoC load numbers, not merely CPU core power figures. When you take away that base amount, the scaling with threads is pretty linear on the A57 cores. I was also told that power consumption on the A15s was not linear because the cluster might be fighting for resources, so each additional thread would decrease the actual work done per thread, making each additional thread use less power than the previous one. The deltas from n to n+1 threads were 879, 708 and 637 mW. The same thing happened on the A7 cores.
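To make the non-linearity concrete, here is a small sketch that expresses each quoted A15 increment as a fraction of the first one. It assumes the three deltas are the 1→2, 2→3 and 3→4 thread steps; the names are hypothetical.

```python
# Marginal SoC power (mW) added by each extra A15 thread, as quoted above.
# Assumed order: 1->2, 2->3 and 3->4 threads.
A15_DELTAS_MW = [879, 708, 637]


def relative_to_first(deltas):
    """Each marginal increment as a fraction of the first one.

    Perfectly linear scaling (constant power per added thread) gives 1.0 everywhere.
    """
    first = deltas[0]
    return [d / first for d in deltas]


for step, ratio in enumerate(relative_to_first(A15_DELTAS_MW), start=1):
    print(f"{step}->{step + 1} threads: {ratio:.0%} of the first increment")
```

This prints 100%, 81% and 72%: each extra thread costs noticeably less power than the one before, which is what you would expect if per-thread work drops as the cluster contends for shared resources.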

Fixed the LP mention.

2. Yes, agreed that those don't scale well, but surely they wouldn't grow in size compared to 28nm? And cache should scale well, so something doesn't quite add up. The overall size of the clusters certainly should not have gone up significantly.
3. Yes, I understand the results are for the full SoC. But my point was that, relative to a single-core load, the additional power consumption for a 4-core load on the A53 seems to be much higher than on the A7. On a one-core load, SoC power is 0.271 W vs 0.213 W for A53 vs A7, i.e. a difference of 58 mW. But for 4 cores, SoC power is 0.847 W vs 0.453 W, a difference of 394 mW, or about 100 mW per core. That is almost double (about 1.7 times) the 58 mW difference for one core alone.
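Redoing that arithmetic with the quoted figures, as a sketch (the names are hypothetical). Besides the A53 minus A7 gap, it also computes the average extra power each added core costs within the same cluster. Assuming the rest-of-SoC power is the same for the 1-core and 4-core loads, that fixed base cancels out of this second number.

```python
# Full-SoC load power in watts, as quoted above (1-core and 4-core loads).
SOC_POWER_W = {
    "A53": {1: 0.271, 4: 0.847},
    "A7": {1: 0.213, 4: 0.453},
}


def gap_mw(cores: int) -> float:
    """A53 minus A7 SoC power at the given core count, in mW."""
    return (SOC_POWER_W["A53"][cores] - SOC_POWER_W["A7"][cores]) * 1000


def added_per_core_mw(cluster: str) -> float:
    """Average extra SoC power per core going from a 1-core to a 4-core load, in mW."""
    p = SOC_POWER_W[cluster]
    return (p[4] - p[1]) / 3 * 1000


one, four = gap_mw(1), gap_mw(4)
print(f"1-core gap: {one:.0f} mW")
print(f"4-core gap: {four:.0f} mW ({four / 4:.1f} mW per core)")
print(f"per-core gap vs 1-core gap: {four / 4 / one:.2f}x")
for cluster in SOC_POWER_W:
    print(f"{cluster}: +{added_per_core_mw(cluster):.0f} mW per added core")
```

This gives gaps of 58 mW and 394 mW (98.5 mW per core, 1.70x), and about 192 mW per added A53 core versus 80 mW per added A7 core.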

P.S. I have another unrelated query if you could indulge me and have the time to test. You guys did a test on encryption and storage performance on Lollipop on the Nexus 6 and we saw that performance dropped drastically if encryption was enabled. With the A57 this should be mitigated due to the encryption units. Could you possibly test this on the Note 4?

Ah yes, thanks, I forgot about that. Higher bandwidth does not seem to be helping performance all that much, even in benchmarks. The Geekbench scores for the 7420 vs the 5433 are ~15% and ~10% higher for single- and multi-core respectively. If you normalize for clocks (2.1 vs 1.9 GHz), this reduces to ~5% and 0%. The other slight surprise is that the multi-core advantage is lower than the single-core one. I would have thought it would be the opposite, due to the process advantage and presumably less throttling.
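The clock normalization, spelled out as a sketch. It assumes scores scale linearly with the big-core clock; the helper name is hypothetical.

```python
def clock_normalized_gain(score_ratio: float, clock_ratio: float) -> float:
    """Score gain left over after dividing out the clock-speed difference.

    Assumes scores scale linearly with clock, which attributes too much of the
    gain to clock speed for any subtest that is memory-bound.
    """
    return score_ratio / clock_ratio - 1.0


CLOCK_RATIO = 2.1 / 1.9  # Exynos 7420 vs 5433 big-core clocks, GHz

print(f"single-core: {clock_normalized_gain(1.15, CLOCK_RATIO):+.1%}")
print(f"multi-core:  {clock_normalized_gain(1.10, CLOCK_RATIO):+.1%}")
```

This prints +4.0% and -0.5%, which is where the ~5% and 0% figures come from.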
For ARM-based designs built around CCI-400, it's not just bandwidth you need to focus on. That interconnect is "deep", and since GPUs in particular tend to have reasonably hard latency requirements, making sure those requirements are met in a complex SoC can be difficult, and GPU performance can suffer. Measured bandwidth only tells you part of the story.

Thanks Rys, always appreciate your valuable input :) Do you see this changing with CCI-500?
 
Not sure what they mean by productivity, but Joshua at AT seems to think it is performance per watt. Unless my reasoning is completely wrong, wouldn't a 35% reduction in power consumption mean ~50% higher performance per watt (i.e. 1/0.65 ≈ 1.54)?
That was our best guess at what that "productivity" gain meant. The announcement was weird, so we'll know more by the end of this week.

3. Yes, I understand the results are for the full SoC. But my point was that, relative to a single-core load, the additional power consumption for a 4-core load on the A53 seems to be much higher than on the A7. On a one-core load, SoC power is 0.271 W vs 0.213 W for A53 vs A7, i.e. a difference of 58 mW. But for 4 cores, SoC power is 0.847 W vs 0.453 W, a difference of 394 mW, or about 100 mW per core. That is almost double (about 1.7 times) the 58 mW difference for one core alone.
As stated before, I was told by ARM that this may have been caused by decreasing work per thread as threads/cores increase, due to resource constraints in the A7/A15 clusters, which have been "resolved" in the newer architectures.
P.S. I have another unrelated query if you could indulge me and have the time to test. You guys did a test on encryption and storage performance on Lollipop on the Nexus 6 and we saw that performance dropped drastically if encryption was enabled. With the A57 this should be mitigated due to the encryption units. Could you possibly test this on the Note 4?
I haven't updated the Note 4 to Lollipop yet, and AFAIK the Samsung ROM doesn't have the encryption option.
 
Ah yes, thanks, I forgot about that. Higher bandwidth does not seem to be helping performance all that much, even in benchmarks. The Geekbench scores for the 7420 vs the 5433 are ~15% and ~10% higher for single- and multi-core respectively. If you normalize for clocks (2.1 vs 1.9 GHz), this reduces to ~5% and 0%.

"even in benchmarks" is an odd turn of phrase.
Geekbench is very much an example of a benchmark suite that is deliberately designed to separate main memory performance from the rest of the suite. This allows assessment of per-core, low-level performance for a number of code examples, and of how that scales with the number of cores. Memory performance IS tested, in a separate part of the overall benchmark. (It would have been nice if they had added some kind of latency test as well.) It is not strange that main memory bandwidth doesn't affect the other scores; it isn't really meant to.
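For reference, the usual shape of such a latency test is a pointer chase: a buffer holds a random single-cycle permutation, so each load's address depends on the previous load and prefetchers can't run ahead, and time per hop is measured as the buffer grows past each cache level. The sketch below uses hypothetical names and is not how Geekbench works. It only illustrates the access pattern: in Python the interpreter overhead per hop dwarfs cache and DRAM latency, so real measurements need C or similar on a native array.

```python
import random
import time


def sattolo_cycle(n: int, seed: int = 0) -> list:
    """Random permutation that forms a single cycle through all n slots (Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees one n-cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list, steps: int) -> int:
    """Follow the chain from slot 0; every lookup depends on the previous one."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i


def ns_per_hop(n: int, steps: int = 1_000_000) -> float:
    """Average wall-clock time per hop (dominated by interpreter overhead in Python)."""
    nxt = sattolo_cycle(n)
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) / steps * 1e9
```

In a native version you would sweep n from a few KB to well past the L2 size and look for the steps in time per hop.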
 
Do you see this changing with CCI-500?
Yep. I'd be very surprised if the interconnect's real-world performance and influence on full-SoC performance didn't get better versus the last gen.
 
Samsung are giving a talk tomorrow at ISSCC titled:
20nm High-κ Metal-Gate Heterogeneous 64b Quad-Core CPUs and Hexa-Core GPU for High-Performance and Energy-Efficient Mobile Application Processor

This could be where we hear first details of their custom GPU and/or CPU cores.
 
I take "Heterogeneous 64b Quad-Core CPUs" to mean two quad-core clusters. If it's not 5433 I'll eat a hat.
 
I take "Heterogeneous 64b Quad-Core CPUs" to mean two quad-core clusters. If it's not 5433 I'll eat a hat.
Start seasoning your hat just in case.

From what I understood of their architecture, "heterogeneous" is supposed to refer to HSA-style integration between the CPU and GPU.

Edit:

This report from back in December matches what I hear:
http://www.zdnet.co.kr/news/news_view.asp?artice_id=20141202145608

I at least know what their GPU is supposed to look like based on research papers, but they have several designs, including a ray tracer that looked promising. Not sure which one will be presented.
 
Why do a 20nm tapeout of something that works well enough to present at ISSCC, a full application processor no less, with all the baggage that entails (including a Cat6 LTE modem, according to that ZDNet article!), and then never sell it or even announce it properly?

I'll season the hat but I'll be very surprised if I have to eat it.
 
I take it from the lack of any reports that this was indeed just the Exynos 5433 then?
Probably NDA for the audience. It was already weird that only a single outlet worldwide reported on the 2013 piece. The MediaTek presentation on a "2.5GHz Octa-core" that followed is also nowhere to be seen.
 
Correct me if I'm wrong.
With the Exynos 7420's GPU, we're looking at:
~130 to 220 GFLOPS
8 TMUs, 8 ROPs @ 700-770MHz
25GB/s memory bandwidth + 1MB L2 cache
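Taking those figures at face value, peak fill rates follow directly. This is a sketch assuming one pixel per ROP and one texel per TMU per clock (an assumption, not a confirmed spec; the names are hypothetical). Dividing the 25 GB/s by the pixel rate gives a rough DRAM-bytes-per-pixel budget, ignoring the L2 and any framebuffer compression.

```python
def giga_per_s(units: int, clock_mhz: float, per_clock: int = 1) -> float:
    """Peak throughput in G-ops/s for `units` each doing `per_clock` ops per cycle."""
    return units * per_clock * clock_mhz / 1000.0


BANDWIDTH_GB_S = 25.0  # quoted memory bandwidth

for mhz in (700, 770):
    rate = giga_per_s(8, mhz)  # 8 ROPs and 8 TMUs give the same peak rate
    print(f"{mhz} MHz: {rate:.2f} Gpixel/s and Gtexel/s, "
          f"{BANDWIDTH_GB_S / rate:.2f} bytes of bandwidth per pixel")
```

That works out to 5.60 to 6.16 Gpixel/s and roughly 4.5 to 4.1 bytes per pixel. A 32-bit colour write alone could in principle use most of that budget at peak fill, before any texture, depth or CPU traffic.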
 