need_active_balance uses env->src_cpu, which is set only if there is more than one task on the run queue. We must set the src_cpu field unconditionally, otherwise the test "env->src_cpu > env->dst_cpu" will always fail when there is only one task on the run queue.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
 
 	ld_moved = 0;
 	lb_iterations = 1;
+
+	env.src_cpu = busiest->cpu;
+	env.src_rq = busiest;
+
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu = busiest->cpu;
-		env.src_rq = busiest;
 		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);
Hi,
I tested this on top of 3.8-rc7 and it made the machine (x86_64, Core i7 920) unable to boot (it fails very early, as nothing at all is displayed on screen). Nothing appears in the kernel log (checked after booting with a working kernel).
I double-checked by backing out only this patch, and that made the machine boot again.
Damien
Hi Damien,
Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine, from the kernel that boots? It's available in /proc/sys/kernel/sched_domain/cpu*/
This should not have any impact on your machine, but it looks like it has one.
Regards, Vincent
On 13 February 2013 07:18, Damien Wyart damien.wyart@gmail.com wrote:
Hi,
I tested this on top of 3.8-rc7 and it made the machine (x86_64, Core i7 920) unable to boot (it fails very early, as nothing at all is displayed on screen). Nothing appears in the kernel log (checked after booting with a working kernel).
I double-checked by backing out only this patch, and that made the machine boot again.
Damien
Hi,
Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine, from the kernel that boots? It's available in /proc/sys/kernel/sched_domain/cpu*/
I attach to this mail the sched_domain info, the contents of /proc/cpuinfo and my kernel config.
Damien,
Regarding your sched_domain config, and especially the flags field, you should not be impacted by my patch because:
- need_active_balance is the only new place that uses env->src_cpu in the load_balance function
- your machine will never test the condition "env->src_cpu > env->dst_cpu" in need_active_balance, because SD_ASYM_PACKING is not set in your config
Have you tested the patch together with other patches that could have modified the load_balance function?
Vincent
On 13 February 2013 10:21, Damien Wyart damien.wyart@gmail.com wrote:
Hi,
Thanks for the test and the feedback. Could you send me the sched_domain configuration of your machine, from the kernel that boots? It's available in /proc/sys/kernel/sched_domain/cpu*/
I attach to this mail the sched_domain info, the contents of /proc/cpuinfo and my kernel config.
-- Damien
* Vincent Guittot vincent.guittot@linaro.org [2013-02-13 13:08]:
Damien,
Regarding your sched_domain config, and especially the flags field, you should not be impacted by my patch because:
- need_active_balance is the only new place that uses env->src_cpu in the load_balance function
- your machine will never test the condition "env->src_cpu > env->dst_cpu" in need_active_balance, because SD_ASYM_PACKING is not set in your config
Have you tested the patch together with other patches that could have modified the load_balance function?
Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git%3Ba=sh...)
The patches from there modifying load_balance are:
- "sched: Update nohz rq clock before searching busiest group on load balancing"
- "sched: Update clock of nohz busiest rq before balancing"
In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.).
I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his...
Thanks for your explanations,
On 13 February 2013 15:08, Damien Wyart damien.wyart@gmail.com wrote:
- Vincent Guittot vincent.guittot@linaro.org [2013-02-13 13:08]:
Damien,
Regarding your sched_domain config, and especially the flags field, you should not be impacted by my patch because:
- need_active_balance is the only new place that uses env->src_cpu in the load_balance function
- your machine will never test the condition "env->src_cpu > env->dst_cpu" in need_active_balance, because SD_ASYM_PACKING is not set in your config
Have you tested the patch together with other patches that could have modified the load_balance function?
Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git%3Ba=sh...)
Ok, thanks for the pointer. I'm going to have a look.
The patches from there modifying load_balance are:
- "sched: Update nohz rq clock before searching busiest group on load balancing"
- "sched: Update clock of nohz busiest rq before balancing"
In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.).
I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his...
probably both
Thanks for your explanations,
Damien
On 13 February 2013 15:28, Vincent Guittot vincent.guittot@linaro.org wrote:
On 13 February 2013 15:08, Damien Wyart damien.wyart@gmail.com wrote:
- Vincent Guittot vincent.guittot@linaro.org [2013-02-13 13:08]:
Damien,
Regarding your sched_domain config, and especially the flags field, you should not be impacted by my patch because:
- need_active_balance is the only new place that uses env->src_cpu in the load_balance function
- your machine will never test the condition "env->src_cpu > env->dst_cpu" in need_active_balance, because SD_ASYM_PACKING is not set in your config
Have you tested the patch together with other patches that could have modified the load_balance function?
Yes, sorry, I should have been more precise in my initial report: your patch was not applied on top of vanilla 3.8-rc7, but a few other patches were also present. Seems the ones impacting load_balance are from Frederic's nohz work (http://git.kernel.org/?p=linux/kernel/git/frederic/linux-dynticks.git%3Ba=sh...)
Ok, thanks for the pointer. I'm going to have a look.
The patches from there modifying load_balance are:
- "sched: Update nohz rq clock before searching busiest group on load balancing"
- "sched: Update clock of nohz busiest rq before balancing"
In this test, I did not use any kernel parameter related to this patchset (full_nohz, etc.).
I am adding Frederic in Cc, not sure if the breakage is to be investigated on your side or his...
probably both
I have looked into Frederic's tree but I didn't find any reason that could explain your problem. Maybe Frederic will have some ideas. I have also tested his branch with and without my patch, and both kernels boot (on an ARM platform, without using the full_nohz feature). I faced a conflict when I applied my patch on his branch, as we insert code at the same place. You should have faced the same conflict. How have you solved it?
Vincent
Thanks for your explanations,
Damien
* Vincent Guittot vincent.guittot@linaro.org [2013-02-13 18:49]:
I have looked into Frederic's tree but I didn't find any reason that could explain your problem. Maybe Frederic will have some ideas. I have also tested his branch with and without my patch, and both kernels boot (on an ARM platform, without using the full_nohz feature). I faced a conflict when I applied my patch on his branch, as we insert code at the same place. You should have faced the same conflict. How have you solved it?
Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I also have another conflict which can be safely ignored, so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken.
How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok.
Thanks and sorry for this,
Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I also have another conflict which can be safely ignored, so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken.
How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok.
After looking a bit more, the conflict resolution seemed straightforward, so I gave it a go. The attached version booted fine, so the initial problem was purely PEBCAK... Sorry for the noise!
On 13 February 2013 21:02, Damien Wyart damien.wyart@gmail.com wrote:
Bingo, that was the problem in my setup: as the patch was applied through a script with others, I had missed the error message about the conflict (I also have another conflict which can be safely ignored, so the new one did not catch my eye)... So the patch was only half-applied, and the final code is broken.
How did you solve the conflict (I am not a scheduler expert)? I can retry running the patched kernel with your resolution, to check if everything is ok.
After looking a bit more, the conflict resolution seemed straightforward,
Yes the resolution is straightforward and your patch is ok.
so I gave it a go. The attached version booted fine, so the initial problem was purely PEBCAK... Sorry for the noise!
Now I know that my patch, on Frederic's branch, boots on an x86_64 Core i7 920 :-)
Vincent
-- Damien
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot vincent.guittot@linaro.org wrote:
need_active_balance uses env->src_cpu, which is set only if there is more than one task on the run queue. We must set the src_cpu field unconditionally, otherwise the test "env->src_cpu > env->dst_cpu" will always fail when there is only one task on the run queue.
Signed-off-by: Vincent Guittot vincent.guittot@linaro.org
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
Aside: Oh dear, it looks like in all the re-factoring here update_h_load() has escaped rq->lock?
load_balance()
	...
	update_h_load(env.src_cpu);
more_balance:
	local_irq_save(flags);
	double_rq_lock(env.dst_rq, busiest);
 	ld_moved = 0;
 	lb_iterations = 1;
+
+	env.src_cpu = busiest->cpu;
+	env.src_rq = busiest;
+
Hmm -- But shouldn't find_busiest_queue return NULL in this case?
We're admittedly racing, but we should have:
	busiest->nr_running == 1
	wl = rq->load.weight > env->imbalance
since imbalance < (wl - this_rq->load.weight / 2)
This would seem specialized to the case where we either A) race (fbq is openly racy) or B) have no capacity.
Admittedly, when we do race, in the current code this is probably a biased coin flip depending on whatever was on the stack (src_cpu is uninitialized); it's also super easy for this to later become a crash if someone tries to dereference src_rq, so we should do something.
The case this seems most important for (and something we should add an explicit comment about) is that we want this case specifically for CPU_NEWLY_IDLE: to move a single runnable task to a lower-numbered idle CPU in the SD_ASYM case (although I suspect we need to push this up to fbq as well to actually find it...).
In the !SD_ASYM case we'd probably also want to re-check busiest->nr_running in need_active_balance (or, probably a better alternative, re-arrange the checks), since this is otherwise going to potentially move a single large task needlessly to an already loaded rq in the balance-failed case.
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu = busiest->cpu;
-		env.src_rq = busiest;
 		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);
-- 1.7.9.5
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 13 February 2013 21:03, Paul Turner pjt@google.com wrote:
On Tue, Feb 12, 2013 at 5:19 AM, Vincent Guittot vincent.guittot@linaro.org wrote:
need_active_balance uses env->src_cpu, which is set only if there is more than one task on the run queue. We must set the src_cpu field unconditionally, otherwise the test "env->src_cpu > env->dst_cpu" will always fail when there is only one task on the run queue.
Signed-off-by: Vincent Guittot vincent.guittot@linaro.org
 kernel/sched/fair.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 81fa536..32938ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5044,6 +5044,10 @@ redo:
Aside: Oh dear, it looks like in all the re-factoring here update_h_load() has escaped rq->lock?
load_balance() ... update_h_load(env.src_cpu); more_balance: local_irq_save(flags); double_rq_lock(env.dst_rq, busiest);
 	ld_moved = 0;
 	lb_iterations = 1;
+
+	env.src_cpu = busiest->cpu;
+	env.src_rq = busiest;
+
Hmm -- But shouldn't find_busiest_queue return NULL in this case?
We're admittedly racing, but we should have:
	busiest->nr_running == 1
	wl = rq->load.weight > env->imbalance
since imbalance < (wl - this_rq->load.weight / 2)
check_asym_packing overwrites env->imbalance with (sds->max_load * sds->busiest->sgp->power / SCHED_POWER_SCALE), so fbq can return a queue.
This would seem specialized to the case where we either A) race (fbq is openly racy) or B) have no capacity.
Admittedly, when we do race, in the current code this is probably a biased coin flip depending on whatever was on the stack (src_cpu is uninitialized); it's also super easy for this to later become a crash if someone tries to dereference src_rq, so we should do something.
The case this seems most important for (and something we should add an explicit comment about) is that we want this case specifically for CPU_NEWLY_IDLE: to move a single runnable task to a lower-numbered idle CPU in the SD_ASYM case (although I suspect we need to push this up to fbq as well to actually find it...).
The update of imbalance should be enough
In the !SD_ASYM case we'd probably also want to re-check busiest->nr_running in need_active_balance (or, probably a better alternative, re-arrange the checks), since this is otherwise going to potentially move a single large task needlessly to an already loaded rq in the balance-failed case.
yes, that could be an interesting add-on
 	if (busiest->nr_running > 1) {
 		/*
 		 * Attempt to move tasks. If find_busiest_group has found
@@ -5052,8 +5056,6 @@ redo:
 		 * correctly treated as an imbalance.
 		 */
 		env.flags |= LBF_ALL_PINNED;
-		env.src_cpu = busiest->cpu;
-		env.src_rq = busiest;
 		env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
 
 		update_h_load(env.src_cpu);
-- 1.7.9.5