Samsung Exynos 5250 - production starting in Q2 2012

Nebuchadnezzar · Apr 19, 2013

Exophase said:
Good to hear they're power gating the cores now... an inability to do that on Octa would be pretty devastating.

1.3GHz for A7s is good news. Ideally you'd have a fairly smooth performance curve between the A7s and A15s; I don't think this will quite give that (800MHz A15 will probably tend to beat 1.3GHz A7) but at least the gap is closer than it'd have been with a 1.2GHz limit. I can't wait to see some power consumption numbers, I hope someone (probably Anandtech) does a very deep dive on this.

Are you yet able to ascertain anything on how the cores are currently scheduled?

They've always had independent powergating, but the point is that it's now done via CPUIdle driver instead of stinking hotplugging, i.e. much much less software overhead. This was a big issue in terms of kernel implementation on the 4412.

I'm still going over the IKS policies; here's the driver: http://paste2.org/IZC5YgOj

Exophase · Apr 19, 2013

Nebuchadnezzar said:
They've always had independent powergating, but the point is that it's now done via CPUIdle driver instead of stinking hotplugging, i.e. much much less software overhead. This was a big issue in terms of kernel implementation on the 4412.

Okay, all I remember was you said before it wasn't normally done during usual idle scenarios.

Do you have any idea if the improvements to CPUIdle work for Exynos 5250 too? I'm interested since it's starting to show up in some random Chinese devices and I'm wondering if it could be more of a realistic option for getting low volume access from Samsung, even though 5410 looks like a much superior choice.

Nebuchadnezzar said:
I'm still going over the IKS policies; here's the driver: http://paste2.org/IZC5YgOj

Thanks. I'm going to now read it fully so I stop editing out comments here ;p

Nebuchadnezzar · Apr 19, 2013

Exophase said:
Okay, all I remember was you said before it wasn't normally done during usual idle scenarios.

Correct, hotplugging is a crude neanderthalic way of achieving power collapse, it could only be used in brutal and coarse ways with big software overhead. CPUIdle core-independent collapse is much much better.

Exophase said:
Do you have any idea if the improvements to CPUIdle work for Exynos 5250 too? I'm interested since it's starting to show up in some random Chinese devices and I'm wondering if it could be more of a realistic option for getting low volume access from Samsung, even though 5410 looks like a much superior choice.

The 5250 in the Nexus 10 has a coupled power collapse CPUIdle state. It's not as versatile as a core-independent one, but still much better than hotplugging.

I'm not currently aware if these are even actual hardware limitations or a failure on the part of the power management developers.

Turbotab · Apr 19, 2013

@Nebuchadnezzar Any chance you could please create a mirror for the 9500 & 9505 sources?

Nebuchadnezzar · Apr 19, 2013

Code:

	/* Step thresholds in kHz (cpufreq units) */
	#define DOWN_STEP_OLD		1100000
	#define DOWN_STEP_NEW		600000
	#define UP_STEP_OLD		550000
	#define UP_STEP_NEW		600000
	#define STEP_LEVEL_CA7_MAX	600000
	#define STEP_LEVEL_CA15_MIN	800000

	/* Ramping up from a low frequency: clamp the target to STEP_LEVEL_CA7_MAX first */
	if (freqs[cur]->old <= UP_STEP_OLD  && target_freq > UP_STEP_NEW)
		target_freq = STEP_LEVEL_CA7_MAX;

	/* Dropping from a high frequency: stop over at an intermediate step first */
	if (freqs[cur]->old >= DOWN_STEP_OLD && target_freq < DOWN_STEP_NEW) {
		if (strcmp(policy->governor->name, "ondemand") == 0)
			target_freq = STEP_LEVEL_CA15_MIN;
		else
			target_freq = STEP_LEVEL_CA7_MAX;
	}

	/* Decide whether a cluster switch is needed */
	if (cur == CA15 && target_freq < freq_min[CA15]) {
		do_switch = 1;	/* Switch from big to LITTLE */
	} else if (cur == CA7 && user_set_eagle_count > get_num_CA15()
			&& target_freq > freq_max[CA7]) {
		do_switch = 1;	/* Switch from LITTLE to big */
		if (count > 0 && count < 4 &&
			target_freq > exynos_info[cur]->max_op_freqs[count + 1])
			later = true;
	}

This is basically the switching logic. The A7 cores are mapped to the A15 frequency at half speed, the frequency table as such means it's 1:1 A15 clocks up until 1200MHz.
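To make the half-speed mapping concrete, here is a minimal sketch in C. The function name `virt_to_real_khz` and the `enum cluster` are made up for illustration and are not taken from the Samsung driver; the only assumption is what the post describes, namely that an A7 core's real clock is twice its virtual (A15-equivalent) frequency.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from the Samsung driver. */
enum cluster { CA7, CA15 };

/*
 * Convert a virtual cpufreq value (kHz) to the real core clock (kHz).
 * Assumption from the post: A7 cores run at twice the virtual frequency,
 * A15 cores run at the virtual frequency itself.
 */
static unsigned int virt_to_real_khz(enum cluster c, unsigned int virt_khz)
{
	assert(c == CA7 || c == CA15);
	return (c == CA7) ? virt_khz * 2 : virt_khz;
}
```

So `virt_to_real_khz(CA7, 600000)` gives 1200000, i.e. the "600 virtual (1200MHz A7)" figure used in this post.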

There are two conditions for switching between the A15s and the A7s:

An A15 core needs to have been at or above 1100MHz (real) during the previous sampling period, and the new (current target) virtual frequency needs to be below 600MHz (1200MHz A7), to trigger an override of the CPUFreq logic and initiate a jump from the A15s to the A7s. However, it doesn't switch directly: ondemand is the default governor, so what it seems to do is stop over at the minimum A15 frequency (800MHz) for one sampling period and let the next sample decide what to do.

An A7 core needs to have been at or below 550 virtual (1100MHz A7) during its previous sampling period, and the new target needs to be over 600 virtual (1200MHz A7), to head towards an A15 core. Again, this is not the actual switch: it sits for another sample at 1200MHz real frequency before it lets the next sample decide.

The actual switching is simple: if the target frequency drops below the A15 minimum of 800MHz, we go to the A7s; if it goes to 650 (1300 == freq_max[CA7]) it goes to the A15s.

The whole thing is a bit shenanigans in logic, but I suppose they do it so that the cores don't do too much fine-grained switching and the above logic flattens out the frequency jumps.
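For anyone who wants to play with the numbers, the logic can be sketched as a standalone C program. This is a simplified model, not the driver itself: the function names `clamp_step` and `needs_switch`, the `big_allowed` flag (standing in for the `user_set_eagle_count > get_num_CA15()` check) and the idea of passing the cluster limits as parameters are all assumptions made for illustration. The thresholds are copied from the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Values in kHz on the virtual frequency axis, copied from the excerpt. */
#define DOWN_STEP_OLD       1100000
#define DOWN_STEP_NEW        600000
#define UP_STEP_OLD          550000
#define UP_STEP_NEW          600000
#define STEP_LEVEL_CA7_MAX   600000
#define STEP_LEVEL_CA15_MIN  800000

enum cluster { CA7, CA15 };

/*
 * Apply the two stop-over rules to a requested target frequency.
 * old_khz is the frequency of the previous sampling period.
 */
static unsigned int clamp_step(unsigned int old_khz, unsigned int target_khz,
			       const char *governor)
{
	assert(governor != NULL);

	/* Coming from a low frequency and asking for a big jump up:
	 * hold at STEP_LEVEL_CA7_MAX for one sample. */
	if (old_khz <= UP_STEP_OLD && target_khz > UP_STEP_NEW)
		target_khz = STEP_LEVEL_CA7_MAX;

	/* Coming from a high frequency and asking for a big drop:
	 * hold at an intermediate step for one sample. */
	if (old_khz >= DOWN_STEP_OLD && target_khz < DOWN_STEP_NEW) {
		if (strcmp(governor, "ondemand") == 0)
			target_khz = STEP_LEVEL_CA15_MIN;
		else
			target_khz = STEP_LEVEL_CA7_MAX;
	}
	return target_khz;
}

/*
 * Decide whether the (already clamped) target requires a cluster switch.
 * ca15_min_khz and ca7_max_khz stand in for freq_min[CA15] and
 * freq_max[CA7]; big_allowed is a hypothetical stand-in for the
 * user_set_eagle_count check.
 */
static bool needs_switch(enum cluster cur, unsigned int target_khz,
			 unsigned int ca15_min_khz, unsigned int ca7_max_khz,
			 bool big_allowed)
{
	if (cur == CA15 && target_khz < ca15_min_khz)
		return true;	/* big -> LITTLE */
	if (cur == CA7 && big_allowed && target_khz > ca7_max_khz)
		return true;	/* LITTLE -> big */
	return false;
}
```

With an A7 maximum of 600000 (virtual), a request to go from 500000 to 1400000 is first clamped to 600000 and does not trigger a switch; only if the next sample still asks for more than the A7 maximum does `needs_switch` return true. That matches the "sit for one sample, then decide" behaviour described above.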

Turbotab said:
@Nebuchadnezzar Any chance you could please create a mirror for the 9500 & 9505 sources?

Hrm; I only have 1Mbps upload on my connection and the kernel sources are both 250MB. I'll try to upload them to GitHub over the following hours. Edit: I'm going to need to merge this with the official Linux kernel before uploading...

Exophase · Apr 19, 2013

Just so we're clear, there is only one cluster active at a time right? Everything I see in the code suggests this but I want to make sure I'm not missing anything. For instance, exynos_switch migrates all the cores, and there are a lot of references to current active cluster.

Nebuchadnezzar · Apr 19, 2013

Exophase said:
Just so we're clear, there is only one cluster active at a time right? Everything I see in the code suggests this but I want to make sure I'm not missing anything. For instance, exynos_switch migrates all the cores, and there are a lot of references to current active cluster.

No, the driver has several instances and the code needs to know where it's currently at (which cluster) to decide the next move.

This is the core migration model which we've discussed ~2 months ago. You can have any combination and mix of A7/A15 of a total of 4 cores online. On top of that, they have a userspace limitation to the number of A15 cores, it's probably used when you enable power savings mode or something.
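A rough sketch of that model in C, purely to illustrate the idea: four logical CPUs, each backed by either an A7 or an A15, plus a cap on how many may sit on an A15 at once. The struct, the field `big_limit` and both function names are hypothetical; only the comparison (limit must be greater than the current A15 count) mirrors the `user_set_eagle_count > get_num_CA15()` check in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4	/* four logical CPUs, as in the post */

enum cluster { CA7, CA15 };

struct cpu_state {
	enum cluster cur[NR_CPUS_SKETCH];	/* core type backing each logical CPU */
	int big_limit;				/* userspace cap on A15 cores (hypothetical) */
};

/* Number of logical CPUs currently running on an A15. */
static int count_big(const struct cpu_state *s)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		if (s->cur[i] == CA15)
			n++;
	return n;
}

/* Try to move one logical CPU to an A15; refuse if the cap is reached. */
static bool migrate_to_big(struct cpu_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (s->cur[cpu] == CA15)
		return true;		/* already on big */
	if (s->big_limit <= count_big(s))
		return false;		/* cap reached, stay on A7 */
	s->cur[cpu] = CA15;
	return true;
}
```

With `big_limit` set to 2, two CPUs can migrate independently while the other two stay on their A7s, which is the "any mix of A7/A15" behaviour with a power-saving cap on top.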

Exophase · Apr 19, 2013

Hrm, I was wondering if it was that at first, don't know why I didn't go with that >_> I guess exynos_switch doesn't switch all CPUs after all but only the ones in the policy. This would probably be clearer if I knew what the for_each_cpu macro did.
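For reference, in the Linux kernel `for_each_cpu(cpu, mask)` iterates over every CPU whose bit is set in the given cpumask, so a loop over a policy's mask only touches the CPUs belonging to that policy. A simplified userspace stand-in, assuming a plain 32-bit mask instead of the kernel's `struct cpumask` (the macro and function names here are made up):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel macro: visit every bit set in a
 * plain unsigned mask. The kernel version works on struct cpumask.
 */
#define FOR_EACH_CPU_SKETCH(cpu, mask) \
	for ((cpu) = 0; (cpu) < 32; (cpu)++) \
		if ((mask) & (1u << (cpu)))

/* Write the CPU numbers set in mask into out[]; returns how many were written. */
static int list_cpus(unsigned int mask, int *out, int max)
{
	int cpu, n = 0;

	assert(out != NULL && max >= 0);
	FOR_EACH_CPU_SKETCH(cpu, mask) {
		if (n == max)
			break;
		out[n++] = cpu;
	}
	return n;
}
```

A mask of `0x5` visits CPUs 0 and 2 only, which is why a switch routine looping over a policy mask would not necessarily migrate every core.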

Noticed user_set_eagle_count too, curious what kind of settings there'll be for that.

Looks like Samsung calls the A15 cluster domain ARM while the A7 cluster domain is called KFC (vdd_kfc, some register names, etc). I wonder what KFC stands for. It's making me kind of hungry.

french toast · Apr 19, 2013

GLBenchmark 2.7, both unknown GS4 variants (Korean vs Canadian?), Exynos 5410 vs S600.
http://gfxbench.com/compare.jsp?D1=...337M&D3=Apple+iPad+4&D4=Apple+iPhone+5&cols=4

Edit: tracked some info down and it does seem to be the Korean version. Apparently the 5410 in this version runs faster than in other markets, 1.8GHz vs 1.6GHz. Could this be a similar scenario to the GS3, which in some markets got the Exynos 4412 Prime + LTE (the Galaxy Note 2 SoC)?

Nebuchadnezzar · Apr 19, 2013

Exophase said:
Looks like Samsung calls the A15 cluster domain ARM while the A7 cluster domain is called KFC (vdd_kfc, some register names, etc). I wonder what KFC stands for. It's making me kind of hungry.

VDD_ARM has been historically the voltage rail for the CPU, it's now the rail for the A15's. KFC stands for Kingfisher(cores) a.k.a. the A7 codename. You'll also see Eagle references, that's the A15 codename.

Edit: I have to correct myself, stupid 0 indexed frequency tables: the A7's are running at max of 1200, not 1300.

Exophase · Apr 19, 2013

Thanks. I knew Eagle was the codename for A15 but had no idea that A7 was called Kingfisher. I was actually wondering that.

Nebuchadnezzar · Apr 19, 2013

There's plenty of evidence of a global LTE enabled 5410 device, the kernel even has the usual modem drivers for it and GPIO board definitions.

Helmore · Apr 19, 2013

french toast said:
GLBenchmark 2.7, both unknown GS4 variants (Korean vs Canadian?), Exynos 5410 vs S600.
http://gfxbench.com/compare.jsp?D1=...337M&D3=Apple+iPad+4&D4=Apple+iPhone+5&cols=4

Edit: tracked some info down and it does seem to be the Korean version. Apparently the 5410 in this version runs faster than in other markets, 1.8GHz vs 1.6GHz. Could this be a similar scenario to the GS3, which in some markets got the Exynos 4412 Prime + LTE (the Galaxy Note 2 SoC)?

No, as Nebuchadnezzar has just said, max frequency for the A15 cores actually is 1800 MHz, but this is as a form of 'Turbo' implementation. Here:

Nebuchadnezzar said:
EDIT: When the IKS is active, which means at all times, then the CPU is set up to run in "turbo"-like configurations, if 1-2 A15 are active, max frequency is 1800, 3 are active, it is 1700, if all 4 big CPUs are on, 1600MHz is the maximum frequency. I'm still reading through the max index they setup there so I can't confirm this behaviour yet, but it's there in the code.

Nebuchadnezzar · Apr 19, 2013

Helmore said:
No, as Nebuchadnezzar has just said, max frequency for the A15 cores actually is 1800 MHz, but this is as a form of 'Turbo' implementation. Here:

I read a bit more of the code and need to correct myself: IF the max CPU clock is allowed at 1800, then the turbo frequencies are 1800 for 1, 1700 for 2 and 1600 for >2 active A15 cores (I misread the table indexes the first time round).

However the shipping kernel is limited at 1600MHz anyways so none of that seems to be active. I would need to check the SHV-300S's sources to see if those 1800 are indeed activated, if so, then the above turbo logic seems to be valid.
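As a sketch of how that turbo table would behave, assuming it works the way the corrected description says (1800 for one active A15, 1700 for two, 1600 for more than two). The function name and the way the overall limit is applied as a simple cap are my own assumptions, not something confirmed from the sources:

```c
#include <assert.h>

/*
 * Maximum A15 frequency in MHz for a given number of active A15 cores.
 * Assumption: the overall limit (limit_mhz) simply caps the turbo step,
 * so a 1600MHz limit makes every case return 1600.
 */
static unsigned int a15_max_mhz(int active_big, unsigned int limit_mhz)
{
	unsigned int turbo;

	assert(active_big >= 1 && active_big <= 4);
	if (active_big == 1)
		turbo = 1800;
	else if (active_big == 2)
		turbo = 1700;
	else
		turbo = 1600;

	return (turbo < limit_mhz) ? turbo : limit_mhz;
}
```

With the limit at 1800 this returns 1800, 1700 and 1600 for one, two and three or more active cores; with the shipping 1600MHz limit it returns 1600 regardless, which is the "none of that seems to be active" case.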

I'm currently uploading the sources to github, but I already merged the kernel with upstream Linux commits and the size ballooned, it'll take a while:

Counting objects: 2440900, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (374925/374925), done.
Writing objects: 15% (370587/2440900), 241.80 MiB | 102 KiB/s

To be found here whenever it finishes in an hour or two: https://github.com/AndreiLux/Perseus-UNIVERSAL5410

Edit: Holy crap, Samsung's GPU development team must be really filled with baboons, that driver looks hellish, and here I thought the Mali one was bad.

@Rys, what's the difference between "core" clocks and "hydra" clocks? Or better asked, what's Hydra?

Rys · Apr 22, 2013

Hydra's a GPU block in multicore SGX configurations. It does a bunch of different things, but primarily it's there to help make the multiple cores appear as one to the rest of the system while still having them work independently.

Ailuros · Apr 22, 2013

Nebuchadnezzar said:
@Rys, what's the difference between "core" clocks and "hydra" clocks? Or better asked, what's Hydra?

Any values you can read out for the latter clocks?

Rys · Apr 22, 2013

Hydra clock should be quite a bit lower than the SGX clock, especially in 5410's config where the SGX clock is high. In the 200MHz ballpark is normal, and it's a good idea for SGX 5XT multicore implementers to have it clocked independently, given what it does.

Rys · Apr 22, 2013

I should mention that it does depend on its connection to the outside world, number of SGX cores and some other things. Hydra clock could be higher than SGX clock as well. It's independent for a reason

Nebuchadnezzar · Apr 22, 2013

Ailuros said:
Any values you can read out for the latter clocks?

Rys said:
Hydra clock should be quite a bit lower than the SGX clock, especially in 5410's config where the SGX clock is high. In the 200MHz ballpark is normal, and it's a good idea for SGX 5XT multicore implementers to have it clocked independently, given what it does.

Rys said:
I should mention that it does depend on its connection to the outside world, number of SGX cores and some other things. Hydra clock could be higher than SGX clock as well. It's independent for a reason

https://github.com/AndreiLux/Perseu.../services4/system/exynos5410/sec_clock.c#L274

They're both running on the same clock.

Are there some performance differences in clocking the Hydra block differently? Edit: Or wasted power consumption in case performance is unaffected?

Rys · Apr 22, 2013

Performance at a given Hydra clock, much like a given GPU clock, will depend on rendering load and memory access.
