From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org --- arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity) - return; + middle_capacity = max_capacity >> SCHED_POWER_SHIFT; else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);
+Mark Brown
Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org writes:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
Could you elaborate on the scheduling problems, and on what platforms?
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Since the problem isn't really specified in detail, it's not obvious why this is a fix to the problem.
Kevin
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
On 13 May 2015 at 16:43, Kevin Hilman khilman@linaro.org wrote:
Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org writes:
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
Could you elaborate on the scheduling problems, and on what platforms?
Also check what happened on the ARM side. Any changes here would have been requested by Catalin during review FWIW.
On 05/13/2015 11:43 AM, Kevin Hilman wrote:
+Mark Brown
Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org writes:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
Could you elaborate on the scheduling problems, and on what platforms?
A quadcore A53 SoC with SCHED_FEAT(ARCH_POWER,true).
Power is determined by the call to set_power_scale(cpu, cpu_capacity(cpu) / middle_capacity);
where each cpu node capacity is calculated as: ((be32_to_cpup(rate)) >> 20) * cpu_eff->efficiency;
being the efficiency table:
static const struct cpu_efficiency table_efficiency[] = { { "arm,cortex-a57", 3891 }, { "arm,cortex-a53", 2048 }, { NULL, }, };
When parse_dt_cpu_power gets called on the mentioned SoC, the following condition is met: min_capacity==max_capacity
Under these circumstances, middle_capacity is left with its default value (value of 1) and therefore the cpu power is out of scale (>>1024).
This affects the load balancing algorithm (what was reported were four instances of the same program -memtester- running on the same core while the other cores being left unused.
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Since the problem isn't really specified in detail, it's not obvious why this is a fix to the problem.
Kevin
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
On 13 May 2015 at 17:06, Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
You should probably remove the if (min_capacity == max_capacity) condition like what is done in Arm 32bits ?
Regards, Vincent
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);
-- 2.1.4
On 05/13/2015 11:53 AM, Vincent Guittot wrote:
On 13 May 2015 at 17:06, Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
You should probably remove the if (min_capacity == max_capacity) condition like what is done in Arm 32bits ?
yes I did look into following the arm implementation, but since doing that is a larger change and I didnt know if someone (arm LT) was already working on a merge I opted for a simple scaling factor until the merge happens.
but of course I agree with you, we have to keep a single algorithm (I remember you mentioning).
On Wed, 2015-05-13 at 12:15 -0400, Jorge Ramirez-Ortiz wrote:
On 05/13/2015 11:53 AM, Vincent Guittot wrote:
On 13 May 2015 at 17:06, Jorge Ramirez-Ortiz jorge.ramirez-ortiz@linaro.org wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
You should probably remove the if (min_capacity == max_capacity) condition like what is done in Arm 32bits ?
yes I did look into following the arm implementation, but since doing that is a larger change and I didnt know if someone (arm LT) was already working on a merge I opted for a simple scaling factor until the merge happens.
This email thread on the Linaro Kernel list is the first I've seen of any of this. And in general, the ARM LT doesn't actually do any work on this code, we just seem to be the delivery path for work done by others, and we don't usually get to hear about that work until they until they want it in LSK.
On Wed, 2015-05-13 at 11:06 -0400, Jorge Ramirez-Ortiz wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
I've never really looked at this code before but I did some comparisons of arm versus arm64 versions and LSK versus mainline and my observations are:
1. There is no arm64 implementation in mainline. (Is there a reason?)
2. The arm64 implementation in LSK closely resembles the arm mainline version, except that mainline doesn't have the
if (min_capacity == max_capacity) return; else
That looks like an accidental side effect of commit 816a8de0017f ("ARM: topology: remove hwid/MPIDR dependency from cpu_capacity"). But it does meant that in mainline 'middle_capacity' is now set by parse_dt_cpu_power (to a useful value?) rather than being left as '1'.
3. The arm version in LSK looks like the ARM64 version, in the sense that 'middle_capacity' is not set, so is this something which needs addressing on arm as well?
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
On 13 May 2015 at 17:39, Jon Medhurst (Tixy) tixy@linaro.org wrote:
- There is no arm64 implementation in mainline. (Is there a reason?)
Catalin doesn't like it because he was concerned about where the relative power numbers came from and making sure that they were bang on accurate (they are derived from ARM's published Dhrystone numbers as are the ones in the arm code). Will did indicate that he'd be OK with it from the point of view of being consistent with the 32 bit implementation (which is the main reason I was pushing it upstream) but he wanted some benchmark numbers which seems sensible but I didn't get round to doing yet.
2. The arm64 implementation in LSK closely resembles the arm mainline
version, except that mainline doesn't have the
if (min_capacity == max_capacity) return; else
That looks like an accidental side effect of commit 816a8de0017f ("ARM: topology: remove hwid/MPIDR dependency from cpu_capacity"). But it does meant that in mainline 'middle_capacity' is now set by parse_dt_cpu_power (to a useful value?) rather than being left as '1'.
Thanks for digging that out.
3. The arm version in LSK looks like the ARM64 version, in the sense
that 'middle_capacity' is not set, so is this something which needs addressing on arm as well?
I'd have expected so. The code is a direct copy with updates being purely for review comments (plus drift due to updates to arch/arm that didn't get copied over).
On 05/13/2015 12:39 PM, Jon Medhurst (Tixy) wrote:
On Wed, 2015-05-13 at 11:06 -0400, Jorge Ramirez-Ortiz wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
I've never really looked at this code before but I did some comparisons of arm versus arm64 versions and LSK versus mainline and my observations are:
There is no arm64 implementation in mainline. (Is there a reason?)
The arm64 implementation in LSK closely resembles the arm mainline
version, except that mainline doesn't have the
if (min_capacity == max_capacity) return; else
That looks like an accidental side effect of commit 816a8de0017f ("ARM: topology: remove hwid/MPIDR dependency from cpu_capacity"). But it does meant that in mainline 'middle_capacity' is now set by parse_dt_cpu_power (to a useful value?) rather than being left as '1'.
- The arm version in LSK looks like the ARM64 version, in the sense
that 'middle_capacity' is not set, so is this something which needs addressing on arm as well?
I'd think so. the patch below (scaling in the case where min_capacity == max_capacity) is one way of fixing it.
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
Le 13 mai 2015 9:02 PM, "Jorge Ramirez-Ortiz" < jorge.ramirez-ortiz@linaro.org> a écrit :
On 05/13/2015 12:39 PM, Jon Medhurst (Tixy) wrote:
On Wed, 2015-05-13 at 11:06 -0400, Jorge Ramirez-Ortiz wrote:
From: Jorge Ramirez-Ortiz jro@xenomai.org
The current arm64 power scaling calculations deviate from the arm implementation causing scheduling problems on non bigLITTLE architectures (this could be due to baseline/merging issues).
I've never really looked at this code before but I did some comparisons of arm versus arm64 versions and LSK versus mainline and my observations are:
There is no arm64 implementation in mainline. (Is there a reason?)
The arm64 implementation in LSK closely resembles the arm mainline
version, except that mainline doesn't have the
if (min_capacity == max_capacity) return; else
That looks like an accidental side effect of commit 816a8de0017f ("ARM: topology: remove hwid/MPIDR dependency from cpu_capacity"). But it does meant that in mainline 'middle_capacity' is now set by parse_dt_cpu_power (to a useful value?) rather than being left as '1'.
- The arm version in LSK looks like the ARM64 version, in the sense
that 'middle_capacity' is not set, so is this something which needs addressing on arm as well?
I'd think so.
AFAICT ARM 32bits in lsk still uses the old algorithm with hwid which prevent the situation seen on arm64
Vincent
the patch below (scaling in the case where min_capacity == max_capacity)
is one
way of fixing it.
The proposed workaround uses the middle_capacity weighting factor to guarantee that all CPUs report the same power (1024) on non bigLITTLE
SoC.
Signed-off-by: Jorge Ramirez-Ortiz jro@xenomai.org
arch/arm64/kernel/topology.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c
b/arch/arm64/kernel/topology.c
index db8bb29..65837c2 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -325,7 +325,7 @@ static void __init parse_dt_cpu_power(void) * constraint explained near table_efficiency[]. */ if (min_capacity == max_capacity)
return;
else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) middle_capacity = (min_capacity + max_capacity) >> (SCHED_POWER_SHIFT+1);middle_capacity = max_capacity >> SCHED_POWER_SHIFT;
On 05/13/2015 03:40 PM, Vincent Guittot wrote:
Le 13 mai 2015 9:02 PM, "Jorge Ramirez-Ortiz" < jorge.ramirez-ortiz@linaro.org> a écrit :
On 05/13/2015 12:39 PM, Jon Medhurst (Tixy) wrote:
- The arm version in LSK looks like the ARM64 version, in the sense
that 'middle_capacity' is not set, so is this something which needs addressing on arm as well?
I'd think so.
AFAICT ARM 32bits in lsk still uses the old algorithm with hwid which prevent the situation seen on arm64
Vincent
yes you are right.
But not on mainline (LSK arm64 has the same implementation than mainline arm: so both have broken calculations on non bigLITTLE SoC). I believe the suggestion is to fix this upstream and then sync the LSK arm64.
I was going to post to linux-arm-kernel now.
linaro-kernel@lists.linaro.org