This commit:
9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
does not fully address the race condition that can occur as follows:
On one CPU, call it CPU 3, thread 1 invokes cpu_stop_queue_two_works(2, 3, ...): it queues the works for migration/2 and migration/3, releases the locks for migration/2 and migration/3, and is then preempted before it can wake either thread.
Then, on CPU 2, a kworker (call it thread 2) invokes cpu_stop_queue_two_works(1, 2, ...) and queues the works for migration/1 and migration/2. Meanwhile, on CPU 3, thread 1 resumes execution and wakes migration/2 and migration/3. As a result, after thread 2 releases the locks for migration/1 and migration/2 but before it wakes those threads, it can be preempted by migration/2.
If thread 2 is preempted by migration/2, migration/2 executes its first work item successfully, since migration/3 was already woken by CPU 3. But when it goes to execute the second work item, it disables preemption and calls multi_cpu_stop(), so CPU 2 ends up waiting forever for migration/1, which should have been woken by thread 2. That wake-up can never happen: thread 2 is a kworker affine to CPU 2, and CPU 2 is busy running migration/2 with preemption disabled, so thread 2 never gets to run.
Disable preemption after queueing the works for the stopper threads, so that queueing the works and waking the stopper threads is effectively one atomic operation.
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}

 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}
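To summarize the net effect, here is a simplified sketch of cpu_stop_queue_two_works() after this change (illustration only; the -ENOENT and -EDEADLK error paths are omitted):

	raw_spin_lock_irq(&stopper1->lock);
	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);

	/* Queue work for both stoppers while holding their locks. */
	__cpu_stop_queue_work(stopper1, work1, &wakeq);
	__cpu_stop_queue_work(stopper2, work2, &wakeq);

	/*
	 * Disable preemption before dropping the locks so that a stopper
	 * woken by another CPU cannot preempt us between the two wake-ups.
	 */
	preempt_disable();

	raw_spin_unlock(&stopper2->lock);
	raw_spin_unlock_irq(&stopper1->lock);

	/* Both stoppers are woken before preemption is allowed again. */
	wake_up_q(&wakeq);
	preempt_enable();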
Hi all,
Are there any comments about this patch?
Thanks,
Isaac Manjarres

On 2018-07-17 12:35, Isaac J. Manjarres wrote:
This commit:
9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
does not fully address the race condition that can occur as follows:
On one CPU, call it CPU 3, thread 1 invokes cpu_stop_queue_two_works(2, 3, ...): it queues the works for migration/2 and migration/3, releases the locks for migration/2 and migration/3, and is then preempted before it can wake either thread.
Then, on CPU 2, a kworker (call it thread 2) invokes cpu_stop_queue_two_works(1, 2, ...) and queues the works for migration/1 and migration/2. Meanwhile, on CPU 3, thread 1 resumes execution and wakes migration/2 and migration/3. As a result, after thread 2 releases the locks for migration/1 and migration/2 but before it wakes those threads, it can be preempted by migration/2.
If thread 2 is preempted by migration/2, migration/2 executes its first work item successfully, since migration/3 was already woken by CPU 3. But when it goes to execute the second work item, it disables preemption and calls multi_cpu_stop(), so CPU 2 ends up waiting forever for migration/1, which should have been woken by thread 2. That wake-up can never happen: thread 2 is a kworker affine to CPU 2, and CPU 2 is busy running migration/2 with preemption disabled, so thread 2 never gets to run.
Disable preemption after queueing the works for the stopper threads, so that queueing the works and waking the stopper threads is effectively one atomic operation.
Fixes: 9fb8d5dc4b64 ("stop_machine, Disable preemption when waking two stopper threads")
Co-developed-by: Prasad Sodagudi <psodagud@codeaurora.org>
Co-developed-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
---
 kernel/stop_machine.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 1ff523d..e190d1e 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -260,6 +260,15 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
+	/*
+	 * The waking up of stopper threads has to happen
+	 * in the same scheduling context as the queueing.
+	 * Otherwise, there is a possibility of one of the
+	 * above stoppers being woken up by another CPU,
+	 * and preempting us. This will cause us to not
+	 * wake up the other stopper forever.
+	 */
+	preempt_disable();
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);
@@ -271,7 +280,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	}

 	if (!err) {
-		preempt_disable();
 		wake_up_q(&wakeq);
 		preempt_enable();
 	}
Hi Sebastian,
Thanks for the response.
"I haven't look in detail at this but your new preempt_disable() makes things unbalanced for the err != 0 case."
This cannot happen. The only possible return values of this function are -ENOENT or 0.
In the case where we return -ENOENT, we go straight to "unlock," which releases the two locks being held but does not disable preemption, and since err != 0, we won't call preempt_enable().
In the case where we return 0, the works were queued successfully and preemption was disabled, so after releasing the locks we fall into the if branch and enable preemption again, which is correct.
In either case, there is no imbalance between the preempt_disable()/preempt_enable() calls.
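For reference, a stripped-down sketch of the control flow being described (not the literal code; comments mark where preemption state changes):

	err = -ENOENT;
	if (!stopper1->enabled || !stopper2->enabled)
		goto unlock;		/* preemption never disabled */
	...
	err = 0;
	/* queue both works */
	preempt_disable();		/* success path only */
unlock:
	/* release both stopper locks */

	if (!err) {			/* taken exactly when we disabled */
		wake_up_q(&wakeq);
		preempt_enable();	/* so the pair always balances */
	}
	return err;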
Thanks, Isaac Manjarres
On 2018-07-23 23:23, Sebastian Andrzej Siewior wrote:
On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
Hi all,
Hi,
Are there any comments about this patch?
I haven't looked in detail at this, but your new preempt_disable() makes things unbalanced for the err != 0 case.
Thanks, Isaac Manjarres
Sebastian
On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
Hi all,
Hi,
Are there any comments about this patch?
I haven't looked in detail at this, but your new preempt_disable() makes things unbalanced for the err != 0 case.
It doesn't, but that code is really an unreadable pile of ...
On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
Hi all,
Hi,
Are there any comments about this patch?
I haven't looked in detail at this, but your new preempt_disable() makes things unbalanced for the err != 0 case.
It doesn't, but that code is really an unreadable pile of ...
---
Subject: stop_machine: Reflow cpu_stop_queue_two_works()
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end.
Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/stop_machine.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e190d1ef3a23..34b6652e8677 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -236,13 +236,24 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
 	DEFINE_WAKE_Q(wakeq);
 	int err;
+
 retry:
+	/*
+	 * The waking up of stopper threads has to happen in the same
+	 * scheduling context as the queueing. Otherwise, there is a
+	 * possibility of one of the above stoppers being woken up by another
+	 * CPU, and preempting us. This will cause us to not wake up the other
+	 * stopper forever.
+	 */
+	preempt_disable();
 	raw_spin_lock_irq(&stopper1->lock);
 	raw_spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);

-	err = -ENOENT;
-	if (!stopper1->enabled || !stopper2->enabled)
+	if (!stopper1->enabled || !stopper2->enabled) {
+		err = -ENOENT;
 		goto unlock;
+	}
+
 	/*
 	 * Ensure that if we race with __stop_cpus() the stoppers won't get
 	 * queued up in reverse order leading to system deadlock.
@@ -253,36 +264,30 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
 	 * It can be falsely true but it is safe to spin until it is cleared,
 	 * queue_stop_cpus_work() does everything under preempt_disable().
 	 */
-	err = -EDEADLK;
-	if (unlikely(stop_cpus_in_progress))
-		goto unlock;
+	if (unlikely(stop_cpus_in_progress)) {
+		err = -EDEADLK;
+		goto unlock;
+	}

 	err = 0;
 	__cpu_stop_queue_work(stopper1, work1, &wakeq);
 	__cpu_stop_queue_work(stopper2, work2, &wakeq);
-	/*
-	 * The waking up of stopper threads has to happen
-	 * in the same scheduling context as the queueing.
-	 * Otherwise, there is a possibility of one of the
-	 * above stoppers being woken up by another CPU,
-	 * and preempting us. This will cause us to not
-	 * wake up the other stopper forever.
-	 */
-	preempt_disable();
+
 unlock:
 	raw_spin_unlock(&stopper2->lock);
 	raw_spin_unlock_irq(&stopper1->lock);

 	if (unlikely(err == -EDEADLK)) {
+		preempt_enable();
+
 		while (stop_cpus_in_progress)
 			cpu_relax();
+
 		goto retry;
 	}

-	if (!err) {
-		wake_up_q(&wakeq);
-		preempt_enable();
-	}
+	wake_up_q(&wakeq);
+	preempt_enable();

 	return err;
 }
On Mon, 30 Jul 2018, Peter Zijlstra wrote:
On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
Hi all,
Hi,
Are there any comments about this patch?
I haven't looked in detail at this, but your new preempt_disable() makes things unbalanced for the err != 0 case.
It doesn't, but that code is really an unreadable pile of ...
Subject: stop_machine: Reflow cpu_stop_queue_two_works()
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end.
Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Thanks for cleaning that up!
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
On 2018-07-30 05:41, Thomas Gleixner wrote:
On Mon, 30 Jul 2018, Peter Zijlstra wrote:
On Mon, Jul 30, 2018 at 12:20:57PM +0200, Thomas Gleixner wrote:
On Tue, 24 Jul 2018, Sebastian Andrzej Siewior wrote:
On 2018-07-23 18:13:48 [-0700], isaacm@codeaurora.org wrote:
Hi all,
Hi,
Are there any comments about this patch?
I haven't looked in detail at this, but your new preempt_disable() makes things unbalanced for the err != 0 case.
It doesn't, but that code is really an unreadable pile of ...
Subject: stop_machine: Reflow cpu_stop_queue_two_works()
The code flow in cpu_stop_queue_two_works() is a little arcane; fix this by lifting the preempt_disable() to the top to create more natural nesting wrt the spinlocks and make the wake_up_q() and preempt_enable() unconditional at the end.
Furthermore, enable preemption in the -EDEADLK case, such that we spin-wait with preemption enabled.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Hi Peter/Thomas,
How about including the change below as well? Currently, there is no way to tell whether the thread migrations have completed or not. When we observed this issue, the symptom was a workqueue lockup. It would be better to have some timeout here and trigger a BUG_ON().
There is no way to tell whether the migration threads are stuck or not.
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;

 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+
Thanks for cleaning that up!
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
On Mon, 30 Jul 2018, Sodagudi Prasad wrote:
How about including the change below as well? Currently, there is no way to tell
That would be a completely separate change.
whether the thread migrations have completed or not. When we observed this issue, the symptom was a workqueue lockup. It would be better to have some timeout here and trigger a BUG_ON().
BUG_ON() is wrong. Why kill the thing if there is at least a theoretical chance that stuff can continue half baked so you can get more info out of it? The backtrace is pretty much uninteresting.
Thanks,
tglx
On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
How about including the change below as well? Currently, there is no way to tell whether the thread migrations have completed or not. When we observed this issue, the symptom was a workqueue lockup. It would be better to have some timeout here and trigger a BUG_ON().
You'd trigger the soft-lockup or hung-task detector I think. And if not, we ought to look at making it trigger at least one of those.
There is no way to tell whether the migration threads are stuck or not.
Should be pretty obvious from the splat generated by the above, no?
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;

 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+
That's a random timeout which, if you spuriously trigger it, will take down your machine. That seems like a cure worse than the disease.
On 2018-07-30 14:07, Peter Zijlstra wrote:
On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
How about including the change below as well? Currently, there is no way to tell whether the thread migrations have completed or not. When we observed this issue, the symptom was a workqueue lockup. It would be better to have some timeout here and trigger a BUG_ON().
You'd trigger the soft-lockup or hung-task detector I think. And if not, we ought to look at making it trigger at least one of those.
There is no way to tell whether the migration threads are stuck or not.
Should be pretty obvious from the splat generated by the above, no?
Hi Peter and Thomas,
Thanks for your support. I have another question about this flow and the retry mechanism in cpu_stop_queue_two_works() that uses the global variable stop_cpus_in_progress.
This variable is used in various paths, such as task migration, setting task affinity, and CPU hotplug.
For example, in the CPU hotplug path, stop_cpus_in_progress gets set to true without any check:

takedown_cpu()
  stop_machine_cpuslocked()
    stop_cpus()
      __stop_cpus()
        queue_stop_cpus_work()  <- sets stop_cpus_in_progress to true directly
But only in the task migration path is stop_cpus_in_progress used to decide whether to retry.
I am thinking that the stop_cpus_in_progress variable could lead to race conditions when CPU hotplug and task migration happen simultaneously. Please correct me if my understanding is wrong.
-Thanks, Prasad
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -290,6 +290,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	struct cpu_stop_done done;
 	struct cpu_stop_work work1, work2;
 	struct multi_stop_data msdata;
+	int ret;

 	msdata = (struct multi_stop_data){
 		.fn = fn,
@@ -312,7 +313,10 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 	if (cpu_stop_queue_two_works(cpu1, &work1, cpu2, &work2))
 		return -ENOENT;

-	wait_for_completion(&done.completion);
+	ret = wait_for_completion_timeout(&done.completion,
+					  msecs_to_jiffies(1000));
+	if (!ret)
+		BUG_ON(1);
+
That's a random timeout which, if you spuriously trigger it, will take down your machine. That seems like a cure worse than the disease.
Hi Prasad,
On Wed, Aug 01, 2018 at 01:07:03AM -0700, Sodagudi Prasad wrote:
On 2018-07-30 14:07, Peter Zijlstra wrote:
On Mon, Jul 30, 2018 at 10:12:43AM -0700, Sodagudi Prasad wrote:
How about including the change below as well? Currently, there is no way to tell whether the thread migrations have completed or not. When we observed this issue, the symptom was a workqueue lockup. It would be better to have some timeout here and trigger a BUG_ON().
You'd trigger the soft-lockup or hung-task detector I think. And if not, we ought to look at making it trigger at least one of those.
There is no way to tell whether the migration threads are stuck or not.
Should be pretty obvious from the splat generated by the above, no?
Hi Peter and Thomas,
Thanks for your support. I have another question about this flow and the retry mechanism in cpu_stop_queue_two_works() that uses the global variable stop_cpus_in_progress.
This variable is used in various paths, such as task migration, setting task affinity, and CPU hotplug.
For example, in the CPU hotplug path, stop_cpus_in_progress gets set to true without any check:

takedown_cpu()
  stop_machine_cpuslocked()
    stop_cpus()
      __stop_cpus()
        queue_stop_cpus_work()  <- sets stop_cpus_in_progress to true directly
But only in the task migration path is stop_cpus_in_progress used to decide whether to retry.
I am thinking that the stop_cpus_in_progress variable could lead to race conditions when CPU hotplug and task migration happen simultaneously. Please correct me if my understanding is wrong.
The stop_cpus_in_progress variable is there to guard against out-of-order queueing. The stopper locks do not protect against this when cpu_stop_queue_two_works() and stop_cpus() execute in parallel.
The stop_one_cpu{_nowait}() functions are called to handle affinity changes and load balancing. Since they queue work on only one CPU, the stop_cpus_in_progress protection is not needed there.
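To illustrate that interplay, here is a rough sketch of the two sides (simplified from the kernel/stop_machine.c code under discussion; the per-work setup details are trimmed):

	/*
	 * stop_cpus() path: queue_stop_cpus_work() brackets its whole
	 * multi-CPU queueing sweep with the flag, under preempt_disable().
	 */
	preempt_disable();
	stop_cpus_in_progress = true;
	for_each_cpu(cpu, cpumask)
		cpu_stop_queue_work(cpu, &per_cpu(cpu_stopper.stop_work, cpu));
	stop_cpus_in_progress = false;
	preempt_enable();

	/*
	 * stop_two_cpus() path: cpu_stop_queue_two_works() refuses to queue
	 * while such a sweep is in flight and retries instead, so its two
	 * works can never be interleaved with the sweep in reverse order.
	 */
	if (unlikely(stop_cpus_in_progress)) {
		err = -EDEADLK;
		goto unlock;
	}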
Thanks, Pavan