The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock despite task 3 waiting for it, and this prevents the deadlock. There seems to be no better way to detect such situations. Also, in general it's not good for readers to wait so long with interrupts disabled, since read_lock may nest with other locks and delay the system.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 kernel/locking/qrwlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index c7471c3fb798..d15df85de8f5 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -32,7 +32,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock)
 	/*
 	 * Readers come here when they cannot get the lock without waiting
 	 */
-	if (unlikely(in_interrupt())) {
+	if (unlikely(irqs_disabled())) {
 		/*
 		 * Readers in interrupt context will get the lock immediately
 		 * if the writer is just waiting (not holding the lock yet),
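[Editor's note: for context, here is a sketch of the whole slowpath this hunk lives in, paraphrased from kernel/locking/qrwlock.c of this era (an approximation, not a quote). A reader that misses the early-exit branch queues on wait_lock behind any pending writer, which is exactly why task 1 above blocks once task 3 has announced itself.]

	void queued_read_lock_slowpath(struct qrwlock *lock)
	{
		if (unlikely(in_interrupt())) {	/* the patch tests irqs_disabled() instead */
			/*
			 * Readers in interrupt context get the lock as soon as
			 * no writer *holds* it, jumping ahead of queued writers.
			 */
			atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
			return;
		}
		/* Undo the reader bias taken in the fastpath. */
		atomic_sub(_QR_BIAS, &lock->cnts);

		/* Queue fairly behind all earlier writers and readers. */
		arch_spin_lock(&lock->wait_lock);
		atomic_add(_QR_BIAS, &lock->cnts);

		/* Spin until no writer holds the lock. */
		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));

		arch_spin_unlock(&lock->wait_lock);
	}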
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
On 04.04.2018 18:51, Kirill Tkhai wrote:
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
In the case of &fown->lock we may always disable IRQs in all the places where it's taken for read, i.e. use read_lock_irqsave(&fown->lock). This seems to fix the problem for this lock.
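[Editor's note: a minimal sketch of one converted read-side site, modeled on f_getown() in fs/fcntl.c; treat the exact body as an assumption.]

	pid_t f_getown(struct file *filp)
	{
		unsigned long flags;
		pid_t pid;

		/* was: read_lock(&filp->f_owner.lock) */
		read_lock_irqsave(&filp->f_owner.lock, flags);
		pid = pid_vnr(filp->f_owner.pid);
		if (filp->f_owner.pid_type == PIDTYPE_PGID)
			pid = -pid;
		read_unlock_irqrestore(&filp->f_owner.lock, flags);
		return pid;
	}

With IRQs disabled across the read section, an interrupt that ends up in kill_fasync() can no longer arrive on top of a held fown->lock on the same CPU.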
On 04/04/2018 11:55 AM, Kirill Tkhai wrote:
On 04.04.2018 18:51, Kirill Tkhai wrote:
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
In the case of &fown->lock we may always disable IRQs in all the places where it's taken for read, i.e. use read_lock_irqsave(&fown->lock). This seems to fix the problem for this lock.
One possible solution is to add a flag to send_sigio() so that it uses read_trylock(&tasklist_lock) instead of read_lock(). If the trylock fails, it returns an error and has the caller (kill_fasync) release fa->fa_lock and retry. Task 1 has three levels of nested locking, so it should be the one that retries if the innermost locking fails. A warning can be printed if the retry count gets too large.
-Longman
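[Editor's note: for concreteness, a rough sketch of that retry scheme. send_sigio_trylock() and the retry bound are hypothetical, invented for illustration; nothing like them exists in the tree.]

	#define MAX_SIGIO_RETRIES	100

	static void kill_fasync_one(struct fasync_struct *fa, int sig, int band)
	{
		unsigned long flags;
		int retries = 0;

		for (;;) {
			spin_lock_irqsave(&fa->fa_lock, flags);
			/*
			 * Hypothetical variant of send_sigio() that uses
			 * read_trylock(&tasklist_lock) internally and returns
			 * -EAGAIN instead of spinning behind a queued writer.
			 */
			if (!fa->fa_file ||
			    send_sigio_trylock(&fa->fa_file->f_owner,
					       fa->fa_fd, band) == 0) {
				spin_unlock_irqrestore(&fa->fa_lock, flags);
				return;
			}
			/* Drop fa_lock so the tasklist_lock writer can go. */
			spin_unlock_irqrestore(&fa->fa_lock, flags);
			WARN_ONCE(++retries > MAX_SIGIO_RETRIES,
				  "kill_fasync: retrying tasklist_lock too often\n");
			cpu_relax();
		}
	}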
On 04.04.2018 19:25, Waiman Long wrote:
On 04/04/2018 11:55 AM, Kirill Tkhai wrote:
On 04.04.2018 18:51, Kirill Tkhai wrote:
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
In the case of &fown->lock we may always disable IRQs in all the places where it's taken for read, i.e. use read_lock_irqsave(&fown->lock). This seems to fix the problem for this lock.
One possible solution is to add a flag to send_sigio() so that it uses read_trylock(&tasklist_lock) instead of read_lock(). If the trylock fails, it returns an error and has the caller (kill_fasync) release fa->fa_lock and retry. Task 1 has three levels of nested locking, so it should be the one that retries if the innermost locking fails. A warning can be printed if the retry count gets too large.
send_sigio() is called from several places, and the calling context is in general unknown. In the case of dnotify_handle_event(), which calls it under a spinlock, the trylock would act as a busy loop. This may block some SMP/stop-machine primitives, and I'm not sure it couldn't lead to some currently invisible deadlocks.
There is a solution I'm analysing at the moment: we can convert fasync_struct::fa_lock to a read/write lock, and then we can take it from interrupt context without problems.
Kirill
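[Editor's note: a sketch of that conversion, roughly the shape the eventual upstream change took; the details below are an approximation, not the final patch.]

	struct fasync_struct {
		rwlock_t		fa_lock;	/* was: spinlock_t */
		int			magic;
		int			fa_fd;
		struct fasync_struct	*fa_next;
		struct file		*fa_file;
		struct rcu_head		fa_rcu;
	};

	static void kill_fasync_rcu(struct fasync_struct *fa, int sig, int band)
	{
		while (fa) {
			struct fown_struct *fown;

			/*
			 * Readers don't exclude each other, so another
			 * kill_fasync() arriving from IRQ context on the same
			 * CPU can nest here without deadlocking.
			 */
			read_lock(&fa->fa_lock);
			if (fa->fa_file) {
				fown = &fa->fa_file->f_owner;
				if (!(sig == SIGURG && fown->signum == 0))
					send_sigio(fown, fa->fa_fd, band);
			}
			read_unlock(&fa->fa_lock);
			fa = rcu_dereference(fa->fa_next);
		}
	}

Only fasync_insert_entry() and fasync_remove_entry() modify the list, so they would take fa_lock for write; everything in the signal-sending path becomes a reader.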
On Wed, Apr 04, 2018 at 06:51:08PM +0300, Kirill Tkhai wrote:
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
So the scenario is:
  CPU0                               CPU1                               CPU2

  spin_lock_irqsave(&fa->fa_lock);
                                     read_lock(&tasklist_lock)
                                                                        write_lock_irq(&tasklist_lock)
  read_lock(&tasklist_lock)
                                     <IRQ>
                                       spin_lock_irqsave(&fa->fa_lock);
Right? (where the row now signifies time)
That doesn't seem to include fown->lock, you're saying it has an identical issue?
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
Depends on what you call reliable :-) Yes, you can use do_each_pid_task() with RCU, but as always you're prone to see tasks that are dead and to miss tasks that just came in.
If that is sufficient for the signal muck, dunno :/ Typically signals use sighand lock, not tasklist_lock.
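[Editor's note: for what it's worth, the RCU-based iteration in send_sigio() would look about like this. send_sigio_to_task() is the existing helper in fs/fcntl.c; the real code wraps this loop in read_lock(&tasklist_lock) instead.]

	rcu_read_lock();
	do_each_pid_task(pid, type, p) {
		/* May see exiting tasks and miss freshly attached ones. */
		send_sigio_to_task(p, fown, fd, band, group);
	} while_each_pid_task(pid, type, p);
	rcu_read_unlock();

Dropping tasklist_lock here is exactly what raises the dead-task/missed-task races discussed above.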
On 04.04.2018 19:18, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:51:08PM +0300, Kirill Tkhai wrote:
On 04.04.2018 18:35, Peter Zijlstra wrote:
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock
That re-introduces starvation scenarios. And the above looks like a proper deadlock that should be sorted by fixing the locking order.
We can move tasklist_lock out of send_sigio(), but I'm not sure it's possible for read_lock(&fown->lock).
So the scenario is:
  CPU0                               CPU1                               CPU2

  spin_lock_irqsave(&fa->fa_lock);
                                     read_lock(&tasklist_lock)
                                                                        write_lock_irq(&tasklist_lock)
  read_lock(&tasklist_lock)
                                     <IRQ>
                                       spin_lock_irqsave(&fa->fa_lock);
Right? (where the row now signifies time)
That doesn't seem to include fown->lock, you're saying it has an identical issue?
There is also a read_lock() that can be taken from interrupt context:
  CPU0                                      CPU1                                     CPU2

  f_getown()                                kill_fasync()
  read_lock(&f_own->lock)                   spin_lock_irqsave(&fa->fa_lock, flags)
  <IRQ>                                     send_sigio()                             write_lock_irq(&f_own->lock);
    kill_fasync()                             read_lock(&fown->lock)
      spin_lock_irqsave(&fa->fa_lock, flags)
To prevent the deadlock, this requires that all read acquisitions of &f_own->lock are made via read_lock_irqsave().
This may be formalized as a lockdep rule: if a spinlock nests inside a read_lock(), and both can be taken from interrupt context, then all the other read_lock() sites for that lock must always disable interrupts.
Is there another solution? Is there a reliable way to iterate do_each_pid_task() with rcu_read_lock()?
Depends on what you call reliable :-) Yes, you can use do_each_pid_task() with RCU, but as always you're prone to see tasks that are dead and to miss tasks that just came in.
If that is sufficient for the signal muck, dunno :/ Typically signals use sighand lock, not tasklist_lock.
The first thing is not a problem, but missing a task that has just come in is not acceptable. So it seems it's not reliable...
Kirill
On Wed, Apr 04, 2018 at 06:24:39PM +0300, Kirill Tkhai wrote:
The following situation leads to deadlock:
[task 1]                           [task 2]                          [task 3]
kill_fasync()                      mm_update_next_owner()            copy_process()
 spin_lock_irqsave(&fa->fa_lock)    read_lock(&tasklist_lock)         write_lock_irq(&tasklist_lock)
  send_sigio()                       <IRQ>                             ...
   read_lock(&fown->lock)             kill_fasync()                    ...
    read_lock(&tasklist_lock)          spin_lock_irqsave(&fa->fa_lock)  ...
Task 1 can't acquire the read-locked tasklist_lock, since task 3 has already expressed its wish to take the lock exclusively, and new readers queue behind the waiting writer. Task 2 holds tasklist_lock for read, but it can't take the spin lock.
The patch makes queued_read_lock_slowpath() give task 1 the same priority it would have in an interrupt handler, letting it take the lock despite task 3 waiting for it, and this prevents the deadlock. There seems to be no better way to detect such situations. Also, in general it's not good for readers to wait so long with interrupts disabled, since read_lock may nest with other locks and delay the system.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
 kernel/locking/qrwlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>