Hi Andrey,
Please PULL b.L MP V10 branch from my tree.
Updates:
-------
- Based on v3.6
- Stats:
  - Total Patches: 77 (V9 had incorrect count)
  - New Patches: 7
    - task-placement-v2: sched: Enable HMP priority filter by default
    - arm-multi_pmu_v2 updated existing patches: http://permalink.gmane.org/gmane.linux.linaro.devel/13707
    - hw-bkp-v7.1-debug-v1: new branch (1 patch)
  - Dropped Patches: 2
    - branch cpu-hotplug-get_online_cpus-v1 removed as patches are already there in rcu-hotplug-v1
  - Updated Patches:
    - per-task-load-average-v3-fixed updated with minor fixes.
      from: git://git.kernel.org/pub/scm/linux/kernel/git/pjt/sched.git
------------------------8<------------------------------------------------
The following changes since commit c6617199117105f771463be72e69017303c9fe54:
config-frag/big-LITTLE: Use device-tree to provide fast/slow CPU list for HMP (2012-10-03 16:00:21 +0530)
are available in the git repository at:
git://git.linaro.org/arm/big.LITTLE/mp.git big-LITTLE-MP-v10
for you to fetch changes up to 2027d925f44d49835beff1b4917d8fd91f8805d7:
Merge branches 'per-cpu-thread-hotplug-v3-fixed', 'task-placement-v2', 'arm-asymmetric-support-v3-v3.6-rc1', 'rcu-hotplug-v1', 'arm-multi_pmu_v2', 'scheduler-misc-v1', 'hw-bkp-v7.1-debug-v1' and 'config-fragments' into big-LITTLE-MP-v10 (2012-10-12 14:14:20 +0530)
----------------------------------------------------------------
Axel Lin (1):
      ARM: ux500: Fix build error due to missing include of asm/pmu.h in cpu-db8500.c

Ben Segall (1):
      sched: maintain per-rq runnable averages

Dietmar Eggemann (1):
      ARM: hw_breakpoint: v7.1 self-hosted debug powerdown support

Jon Hunter (1):
      ARM: PMU: Add runtime PM Support

Lorenzo Pieralisi (1):
      ARM: kernel: provide cluster to logical cpu mask mapping API

Marc Zyngier (1):
      ARM: perf: add guest vs host discrimination

Mark Rutland (1):
      ARM: perf: register cpu_notifier at driver init

Morten Rasmussen (12):
      sched: entity load-tracking load_avg_ratio
      sched: Task placement for heterogeneous systems based on task load-tracking
      sched: Forced task migration on heterogeneous systems
      sched: Introduce priority-based task migration filter
      ARM: Add HMP scheduling support for ARM architecture
      ARM: sched: Use device-tree to provide fast/slow CPU list for HMP
      ARM: sched: Setup SCHED_HMP domains
      sched: Add ftrace events for entity load-tracking
      sched: Add HMP task migration ftrace event
      sched: SCHED_HMP multi-domain task migration control
      sched: Enable HMP priority filter by default
      linaro/configs: Enable HMP priority filter by default

Paul Turner (15):
      sched: track the runnable average on a per-task entity basis
      sched: aggregate load contributed by task entities on parenting cfs_rq
      sched: maintain the load contribution of blocked entities
      sched: add an rq migration call-back to sched_class
      sched: account for blocked load waking back up
      sched: aggregate total task_group load
      sched: compute load contribution by a group entity
      sched: normalize tg load contributions against runnable time
      sched: maintain runnable averages across throttled periods
      sched: replace update_shares weight distribution with per-entity computation
      sched: refactor update_shares_cpu() -> update_blocked_avgs()
      sched: update_cfs_shares at period edge
      sched: make __update_entity_runnable_avg() fast
      sched: implement usage tracking
      sched: introduce temporary FAIR_GROUP_SCHED dependency for load-tracking

Sudeep KarkadaNagesha (11):
      ARM: pmu: remove arm_pmu_type enumeration
      ARM: perf: move irq registration into pmu implementation
      ARM: perf: allocate CPU PMU dynamically at probe time
      ARM: perf: consistently use struct perf_event in arm_pmu functions
      ARM: perf: check ARMv7 counter validity on a per-pmu basis
      ARM: perf: replace global CPU PMU pointer with per-cpu pointers
      ARM: perf: register CPU PMUs with idr types
      ARM: perf: set cpu affinity to support multiple PMUs
      ARM: perf: set cpu affinity for the irqs correctly
      ARM: perf: remove spaces in CPU PMU names
      ARM: perf: save/restore pmu registers in pm notifier

Viresh Kumar (1):
      Merge branches 'per-cpu-thread-hotplug-v3-fixed', 'task-placement-v2', 'arm-asymmetric-support-v3-v3.6-rc1', 'rcu-hotplug-v1', 'arm-multi_pmu_v2', 'scheduler-misc-v1', 'hw-bkp-v7.1-debug-v1' and 'config-fragments' into big-LITTLE-MP-v10

Will Deacon (8):
      ARM: perf: add devicetree bindings for 11MPcore, A5, A7 and A15 PMUs
      ARM: pmu: remove unused reservation mechanism
      ARM: perf: remove mysterious compiler barrier
      ARM: perf: probe devicetree in preference to current CPU
      ARM: perf: prepare for moving CPU PMU code into separate file
      ARM: perf: move CPU-specific PMU handling code into separate file
      ARM: perf: return NOTIFY_DONE from cpu notifier when no available PMU
      ARM: perf: consistently use arm_pmu->name for PMU name
On Fri, 2012-10-12 at 14:46 +0530, Viresh Kumar wrote:
Hi Andrey,
Please PULL b.L MP V10 branch from my tree.
I've been testing with the v10 branch, and with "Enable HMP priority filter by default" I get a crash in select_task_rq_fair() on vexpress A9. I'm busy investigating this at the moment...
On 12 October 2012 15:05, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Fri, 2012-10-12 at 14:46 +0530, Viresh Kumar wrote: I've been testing with the v10 branch, and with "Enable HMP priority filter by default" I get a crash in select_task_rq_fair() on vexpress A9. I'm busy investigating this at the moment...
I have tested it only on TC2 and was fine there.
-- viresh
On Fri, 2012-10-12 at 10:35 +0100, Jon Medhurst (Tixy) wrote:
On Fri, 2012-10-12 at 14:46 +0530, Viresh Kumar wrote:
Hi Andrey,
Please PULL b.L MP V10 branch from my tree.
I've been testing with the v10 branch, and with "Enable HMP priority filter by default" I get a crash in select_task_rq_fair() on vexpress A9. I'm busy investigating this at the moment...
I've now got to the bottom of the problem...
On homogeneous (non-heterogeneous) systems, arch_get_hmp_domains() still adds two HMP domains, but the second one is empty. This means that when CONFIG_SCHED_HMP_PRIO_FILTER is enabled and hmp_down_migration() unconditionally returns true, then hmp_select_slower_cpu() is called to select a cpu from this empty domain and returns the value NR_CPUS, which makes subsequent code blow up.
One question is should hmp_down_migration() return true even when it hasn't checked there are slower CPUs available?
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway). However, I wonder if there are other circumstances in which hmp_select_{slower,faster}_cpu() can fail? E.g. can tsk_cpus_allowed() ever not intersect with the domain cpus?
I.e. do we need something like...
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9dd53c..d3968a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3217,8 +3217,9 @@ static inline struct hmp_domain *hmp_faster_domain(int cpu)
 static inline unsigned int hmp_select_faster_cpu(struct task_struct *tsk,
 							int cpu)
 {
-	return cpumask_any_and(&hmp_faster_domain(cpu)->cpus,
-				tsk_cpus_allowed(tsk));
+	unsigned int new_cpu = cpumask_any_and(&hmp_faster_domain(cpu)->cpus,
+						tsk_cpus_allowed(tsk));
+	return new_cpu < NR_CPUS ? new_cpu : cpu;
 }

 /*
@@ -3228,8 +3229,9 @@ static inline unsigned int hmp_select_faster_cpu(struct task_struct *tsk,
 static inline unsigned int hmp_select_slower_cpu(struct task_struct *tsk,
 							int cpu)
 {
-	return cpumask_any_and(&hmp_slower_domain(cpu)->cpus,
-				tsk_cpus_allowed(tsk));
+	unsigned int new_cpu = cpumask_any_and(&hmp_slower_domain(cpu)->cpus,
+						tsk_cpus_allowed(tsk));
+	return new_cpu < NR_CPUS ? new_cpu : cpu;
 }

 static inline void hmp_next_up_delay(struct sched_entity *se, int cpu)
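For readers following along outside the kernel tree, the failure mode and the proposed guard can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `toy_cpumask_t`, `toy_cpumask_any_and()` and `toy_select_cpu()` are hypothetical stand-ins invented for illustration, the value of `NR_CPUS` is arbitrary, and the "empty intersection returns NR_CPUS" behaviour is taken from the description in this thread rather than from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8u	/* illustrative value only, not taken from the thread */

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint32_t toy_cpumask_t;

/*
 * Models the behaviour described in the thread: return some CPU present in
 * both masks, or NR_CPUS when the intersection is empty.
 */
static unsigned int toy_cpumask_any_and(toy_cpumask_t a, toy_cpumask_t b)
{
	toy_cpumask_t both = a & b;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (both & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Guarded selection: stay on the current CPU when no candidate exists. */
static unsigned int toy_select_cpu(toy_cpumask_t domain, toy_cpumask_t allowed,
				   unsigned int cur_cpu)
{
	unsigned int new_cpu;

	assert(cur_cpu < NR_CPUS);
	new_cpu = toy_cpumask_any_and(domain, allowed);
	return new_cpu < NR_CPUS ? new_cpu : cur_cpu;
}
```

With an empty domain mask, `toy_cpumask_any_and()` yields `NR_CPUS`, which is not a valid CPU index; the guarded wrapper turns that into "no migration" by returning the caller's current CPU.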
On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway)
Oops, my last patch included some extra junk, the one attached to this mail fixes this...
On 12 October 2012 19:03, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway)
Oops, my last patch included some extra junk, the one attached to this mail fixes this...
Tixy, do let me know if I should send another pull request with this one applied.
On 10/12/2012 05:41 PM, Viresh Kumar wrote:
On 12 October 2012 19:03, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway)
Oops, my last patch included some extra junk, the one attached to this mail fixes this...
Tixy, do let me know if I should send another pull request with this one applied.
I am going to apply Tixy's latest patch to llct anyway (so it will get into ll too). It would be easier for me to get it as part of the big.LITTLE-MP topic, though. If it won't be in big.LITTLE-MP, I will apply it to llct myself. Today's linux-linaro trees are not RCs yet, so we had better try it now rather than right before the release. If there are issues, there is one week (till Oct 18) to adjust the patch.
Thanks, Andrey
On Fri, 2012-10-12 at 19:11 +0530, Viresh Kumar wrote:
On 12 October 2012 19:03, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway)
Oops, my last patch included some extra junk, the one attached to this mail fixes this...
Tixy, do let me know if I should send another pull request with this one applied.
It will need fixing one way or the other, and it makes more sense to be part of the MP branch. I was hoping for a second opinion from someone, but if the fix looks OK to you, then yes, please add it to your branch.
I've boot tested this patch with an Android kernel running on TC2 and A9 CoreTiles.
--- Tixy
On 12 October 2012 19:30, Jon Medhurst (Tixy) tixy@linaro.org wrote:
It will need fixing one way or the other, and it makes more sense to be part of the MP branch. I was hoping for a second opinion from someone, but if the fix looks OK to you, then yes, please add it to your branch.
Ok I have applied this patch to my tree now.
@Andrey: Please pull my branch now. :)
I've boot tested this patch with an Android kernel running on TC2 and A9 CoreTiles.
I couldn't test it as I am not in the office.
-- viresh
Hi Tixy,
Thanks for the patch. I think this patch is the right way to solve this issue.
There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch.
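The missing check described above can be illustrated with a small userspace model. This is a sketch only and is not the attached patch: `toy_task`, `toy_down_migration()` and the `low_priority` flag are hypothetical names introduced here, and the real decision in hmp_down_migration() involves more state than a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint32_t toy_cpumask_t;

struct toy_task {
	toy_cpumask_t cpus_allowed;	/* CPUs the task is permitted to run on */
	bool low_priority;		/* stand-in for the priority filter's verdict */
};

/*
 * Only report that a down migration is wanted if the task may actually run
 * on at least one CPU of the slower domain. An empty slower domain fails
 * the same test, so it is covered as a side effect.
 */
static bool toy_down_migration(const struct toy_task *t,
			       toy_cpumask_t slower_domain)
{
	assert(t != NULL);
	if ((t->cpus_allowed & slower_domain) == 0)
		return false;
	return t->low_priority;
}
```

The design point is that the affinity test comes before the priority filter, so the filter can no longer answer "migrate" for a task that has nowhere to go.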
I think we should apply both.
Thanks, Morten
On Fri, Oct 12, 2012 at 02:33:40PM +0100, Jon Medhurst (Tixy) wrote:
On Fri, 2012-10-12 at 14:19 +0100, Jon Medhurst (Tixy) wrote:
The attached patch fixes the immediate problem by avoiding the empty domain (which is probably a good thing anyway)
Oops, my last patch included some extra junk, the one attached to this mail fixes this...
From 7365076675b851355d48e9b1157e223d7719e3ac Mon Sep 17 00:00:00 2001
From: Jon Medhurst tixy@linaro.org
Date: Fri, 12 Oct 2012 13:45:35 +0100
Subject: [PATCH] ARM: sched: Avoid empty 'slow' HMP domain
On homogeneous (non-heterogeneous) systems all CPUs will be declared 'fast' and the slow cpu list will be empty. In this situation we need to avoid adding an empty slow HMP domain otherwise the scheduler code will blow up when it attempts to move a task to the slow domain.
Signed-off-by: Jon Medhurst tixy@linaro.org
---
 arch/arm/kernel/topology.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 58dac7a..0b51233 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -396,10 +396,12 @@ void __init arch_get_hmp_domains(struct list_head *hmp_domains_list)
 	 * Must be ordered with respect to compute capacity.
 	 * Fastest domain at head of list.
 	 */
-	domain = (struct hmp_domain *)
-		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
-	cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
-	list_add(&domain->hmp_domains, hmp_domains_list);
+	if(!cpumask_empty(&hmp_slow_cpu_mask)) {
+		domain = (struct hmp_domain *)
+			kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
+		cpumask_copy(&domain->cpus, &hmp_slow_cpu_mask);
+		list_add(&domain->hmp_domains, hmp_domains_list);
+	}
 	domain = (struct hmp_domain *)
 		kmalloc(sizeof(struct hmp_domain), GFP_KERNEL);
 	cpumask_copy(&domain->cpus, &hmp_fast_cpu_mask);
-- 
1.7.10.4
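The effect of the patch on the resulting domain list can be sketched in userspace C. The names below (`toy_domain_list`, `toy_add_head()`, `toy_get_hmp_domains()`) are hypothetical and exist only for this illustration; a fixed array stands in for the kernel's linked list, and it is assumed, based on the "Fastest domain at head of list" comment, that the fast domain is added last so that it ends up first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DOMAINS 2

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the mask. */
typedef uint32_t toy_cpumask_t;

struct toy_domain_list {
	toy_cpumask_t cpus[TOY_MAX_DOMAINS];	/* index 0 is the head of the list */
	size_t count;
};

/* Insert at the head, so the most recently added domain comes first. */
static void toy_add_head(struct toy_domain_list *l, toy_cpumask_t cpus)
{
	size_t i;

	assert(l->count < TOY_MAX_DOMAINS);
	for (i = l->count; i > 0; i--)
		l->cpus[i] = l->cpus[i - 1];
	l->cpus[0] = cpus;
	l->count++;
}

static void toy_get_hmp_domains(struct toy_domain_list *l,
				toy_cpumask_t fast, toy_cpumask_t slow)
{
	l->count = 0;
	if (slow != 0)		/* skip the slow domain when it has no CPUs */
		toy_add_head(l, slow);
	toy_add_head(l, fast);	/* added last, so it becomes the head */
}
```

On a homogeneous system (empty slow mask) the list then contains a single domain, so there is no slower domain for the scheduler to pick from.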
On 12 October 2012 20:41, Morten Rasmussen Morten.Rasmussen@arm.com wrote:
Thanks for the patch. I think this patch is the right way to solve this issue.
There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch.
@Andrey: I have applied this too. Please PULL it when you feel Tixy & Morten don't have any more patches to send :)
-- viresh
On Fri, 2012-10-12 at 21:03 +0530, Viresh Kumar wrote:
On 12 October 2012 20:41, Morten Rasmussen Morten.Rasmussen@arm.com wrote:
Thanks for the patch. I think this patch is the right way to solve this issue.
There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch.
@Andrey: I have applied this too. Please PULL it when you feel Tixy & Morten don't have any more patches to send :)
Well, I still have one more issue which _may_ be related, but if it is I don't think I'm going to be producing any more patches before Monday anyway. The issue? ...
Android doesn't boot on RTSM A15x1-A7x1, some processes are dying for currently unknown reasons. But the same kernel with the same filesystem works on RTSM A15x4-A7x4, RTSM A15x1 and RTSM A15x4, and the same kernel works on real hardware TC2 and A9x4.
Perhaps it's not related to the MP branch, but as the system which fails is one core in each cluster it's a topology which might be exposing a bug in the code.
On Fri, 2012-10-12 at 16:11 +0100, Morten Rasmussen wrote:
Hi Tixy,
Thanks for the patch. I think this patch is the right way to solve this issue.
There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch.
The patch looks reasonable. I've just run it on TC2 and A9 with the addition of a "pr_err("$");" before the "return 1;" and can see the occasional '$' on TC2 and none on A9, as we would expect. So I guess that counts as:
Reviewed-by: Jon Medhurst tixy@linaro.org Tested-by: Jon Medhurst tixy@linaro.org
On Fri, Oct 12, 2012 at 04:33:19PM +0100, Jon Medhurst (Tixy) wrote:
On Fri, 2012-10-12 at 16:11 +0100, Morten Rasmussen wrote:
Hi Tixy,
Thanks for the patch. I think this patch is the right way to solve this issue.
There is still a problem with the priority filter in hmp_down_migration() which Viresh pointed out earlier. There is no checking of whether the task is actually allowed to run on any of the slower cpus. Solving that would actually also fix the issue that you are observing as a side effect. I have attached a patch.
The patch looks reasonable. I've just run it on TC2 and A9 with the addition of a "pr_err("$");" before the "return 1;" and can see the occasional '$' on TC2 and none on A9, as we would expect. So I guess that counts as:
Reviewed-by: Jon Medhurst tixy@linaro.org Tested-by: Jon Medhurst tixy@linaro.org
Thanks for reviewing and testing.
My comments to your patch in the previous reply would count as:
Reviewed-by: Morten Rasmussen morten.rasmussen@arm.com
I have only tested it on TC2.
Morten
-- Tixy