xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
of xen_qlock_kick() might be ignored if a deeper nesting level was
active right before the call of xen_poll_irq():
CPU 1:                                 CPU 2:
spin_lock(lock1)
                                       spin_lock(lock1)
                                       -> xen_qlock_wait()
                                          -> xen_clear_irq_pending()
                                          Interrupt happens
spin_unlock(lock1)
-> xen_qlock_kick(CPU 2)
spin_lock_irqsave(lock2)
                                       spin_lock_irqsave(lock2)
                                       -> xen_qlock_wait()
                                          -> xen_clear_irq_pending()
                                             clears kick for lock1
                                          -> xen_poll_irq()
spin_unlock_irq_restore(lock2)
-> xen_qlock_kick(CPU 2)
                                          wakes up
                                       spin_unlock_irq_restore(lock2)
                                       IRET resumes in xen_qlock_wait()
                                       -> xen_poll_irq()
                                       never wakes up
The solution is to disable interrupts in xen_qlock_wait() and not to
poll for the irq if xen_qlock_wait() is called in NMI context.
Cc: stable@vger.kernel.org
Cc: Waiman.Long@hp.com
Cc: peterz@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/spinlock.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index cd210a4ba7b1..e8d880e98057 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -39,29 +39,25 @@ static void xen_qlock_kick(int cpu)
  */
 static void xen_qlock_wait(u8 *byte, u8 val)
 {
+        unsigned long flags;
         int irq = __this_cpu_read(lock_kicker_irq);

         /* If kicker interrupts not initialized yet, just spin */
-        if (irq == -1)
+        if (irq == -1 || in_nmi())
                 return;

-        /* If irq pending already clear it and return. */
+        /* Guard against reentry. */
+        local_irq_save(flags);
+
+        /* If irq pending already clear it. */
         if (xen_test_irq_pending(irq)) {
                 xen_clear_irq_pending(irq);
-                return;
+        } else if (READ_ONCE(*byte) == val) {
+                /* Block until irq becomes pending (or a spurious wakeup) */
+                xen_poll_irq(irq);
         }

-        if (READ_ONCE(*byte) != val)
-                return;
-
-        /*
-         * If an interrupt happens here, it will leave the wakeup irq
-         * pending, which will cause xen_poll_irq() to return
-         * immediately.
-         */
-
-        /* Block until irq becomes pending (or perhaps a spurious wakeup) */
-        xen_poll_irq(irq);
+        local_irq_restore(flags);
 }

 static irqreturn_t dummy_handler(int irq, void *dev_id)
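For reference, the complete function with this patch applied reads as
follows (assembled directly from the hunk above, nothing added):

static void xen_qlock_wait(u8 *byte, u8 val)
{
        unsigned long flags;
        int irq = __this_cpu_read(lock_kicker_irq);

        /* If kicker interrupts not initialized yet, just spin */
        if (irq == -1 || in_nmi())
                return;

        /* Guard against reentry. */
        local_irq_save(flags);

        /* If irq pending already clear it. */
        if (xen_test_irq_pending(irq)) {
                xen_clear_irq_pending(irq);
        } else if (READ_ONCE(*byte) == val) {
                /* Block until irq becomes pending (or a spurious wakeup) */
                xen_poll_irq(irq);
        }

        local_irq_restore(flags);
}

With interrupts disabled across the pending-check and the poll, a
nested xen_qlock_wait() can no longer run between
xen_clear_irq_pending() and xen_poll_irq() and consume the kick meant
for the outer lock.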
Correcting Waiman's mail address
On 01/10/2018 09:16, Juergen Gross wrote:
> xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
> of xen_qlock_kick() might be ignored if a deeper nesting level was
> active right before the call of xen_poll_irq():
>
> CPU 1:                                 CPU 2:
> spin_lock(lock1)
>                                        spin_lock(lock1)
>                                        -> xen_qlock_wait()
>                                           -> xen_clear_irq_pending()
>                                           Interrupt happens
> spin_unlock(lock1)
> -> xen_qlock_kick(CPU 2)
> spin_lock_irqsave(lock2)
>                                        spin_lock_irqsave(lock2)
>                                        -> xen_qlock_wait()
>                                           -> xen_clear_irq_pending()
>                                              clears kick for lock1
>                                           -> xen_poll_irq()
> spin_unlock_irq_restore(lock2)
> -> xen_qlock_kick(CPU 2)
>                                           wakes up
>                                        spin_unlock_irq_restore(lock2)
>                                        IRET resumes in xen_qlock_wait()
>                                        -> xen_poll_irq()
>                                        never wakes up
>
> The solution is to disable interrupts in xen_qlock_wait() and not to
> poll for the irq if xen_qlock_wait() is called in NMI context.
>
> Cc: stable@vger.kernel.org
> Cc: longman@redhat.com
> Cc: peterz@infradead.org
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
>  arch/x86/xen/spinlock.c | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index cd210a4ba7b1..e8d880e98057 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -39,29 +39,25 @@ static void xen_qlock_kick(int cpu)
>   */
>  static void xen_qlock_wait(u8 *byte, u8 val)
>  {
> +        unsigned long flags;
>          int irq = __this_cpu_read(lock_kicker_irq);
>
>          /* If kicker interrupts not initialized yet, just spin */
> -        if (irq == -1)
> +        if (irq == -1 || in_nmi())
>                  return;
>
> -        /* If irq pending already clear it and return. */
> +        /* Guard against reentry. */
> +        local_irq_save(flags);
> +
> +        /* If irq pending already clear it. */
>          if (xen_test_irq_pending(irq)) {
>                  xen_clear_irq_pending(irq);
> -                return;
> +        } else if (READ_ONCE(*byte) == val) {
> +                /* Block until irq becomes pending (or a spurious wakeup) */
> +                xen_poll_irq(irq);
>          }
>
> -        if (READ_ONCE(*byte) != val)
> -                return;
> -
> -        /*
> -         * If an interrupt happens here, it will leave the wakeup irq
> -         * pending, which will cause xen_poll_irq() to return
> -         * immediately.
> -         */
> -
> -        /* Block until irq becomes pending (or perhaps a spurious wakeup) */
> -        xen_poll_irq(irq);
> +        local_irq_restore(flags);
>  }
>
>  static irqreturn_t dummy_handler(int irq, void *dev_id)
On 01.10.18 at 09:16, jgross@suse.com wrote:
> xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
> of xen_qlock_kick() might be ignored if a deeper nesting level was
> active right before the call of xen_poll_irq():
>
> CPU 1:                                 CPU 2:
> spin_lock(lock1)
>                                        spin_lock(lock1)
>                                        -> xen_qlock_wait()
>                                           -> xen_clear_irq_pending()
>                                           Interrupt happens
> spin_unlock(lock1)
> -> xen_qlock_kick(CPU 2)
> spin_lock_irqsave(lock2)
>                                        spin_lock_irqsave(lock2)
>                                        -> xen_qlock_wait()
>                                           -> xen_clear_irq_pending()
>                                              clears kick for lock1
>                                           -> xen_poll_irq()
> spin_unlock_irq_restore(lock2)
> -> xen_qlock_kick(CPU 2)
>                                           wakes up
>                                        spin_unlock_irq_restore(lock2)
>                                        IRET resumes in xen_qlock_wait()
>                                        -> xen_poll_irq()
>                                        never wakes up
>
> The solution is to disable interrupts in xen_qlock_wait() and not to
> poll for the irq if xen_qlock_wait() is called in NMI context.
Are precautions against NMI really worthwhile? Locks acquired both in NMI context as well as outside of it are liable to deadlock anyway, aren't they?
Jan
On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> -        /* If irq pending already clear it and return. */
> +        /* Guard against reentry. */
> +        local_irq_save(flags);
> +
> +        /* If irq pending already clear it. */
>          if (xen_test_irq_pending(irq)) {
>                  xen_clear_irq_pending(irq);
> -                return;
> +        } else if (READ_ONCE(*byte) == val) {
> +                /* Block until irq becomes pending (or a spurious wakeup) */
> +                xen_poll_irq(irq);
>          }
Does this still allow other IRQs to wake it from xen_poll_irq()?
In the case where process-context code is spinning for a lock without disabling interrupts, we *should* allow interrupts to occur still... does this?
On Wed, 10 Oct 2018, David Woodhouse wrote:
> On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> > -        /* If irq pending already clear it and return. */
> > +        /* Guard against reentry. */
> > +        local_irq_save(flags);
> > +
> > +        /* If irq pending already clear it. */
> >          if (xen_test_irq_pending(irq)) {
> >                  xen_clear_irq_pending(irq);
> > -                return;
> > +        } else if (READ_ONCE(*byte) == val) {
> > +                /* Block until irq becomes pending (or a spurious wakeup) */
> > +                xen_poll_irq(irq);
> >          }
>
> Does this still allow other IRQs to wake it from xen_poll_irq()?
>
> In the case where process-context code is spinning for a lock without
> disabling interrupts, we *should* allow interrupts to occur still...
> does this?
Yes. Look at it like idle HLT or WFI. You have to disable interrupts
before checking the condition, and then the hardware, or in this case
the hypervisor, has to bring you back when an interrupt is raised.

If that didn't work, the check would be racy: the interrupt could hit
and be handled after the check and before going into HLT/WFI/the
hypercall, and then you're parked until the next interrupt comes
along, which might be never.
Thanks,
tglx
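The disable-check-wait idiom Thomas describes, as a minimal sketch in
kernel-style C; cond_is_set() and wait_for_event() are hypothetical
stand-ins for the lock-byte test and for the HLT/WFI/poll-hypercall
primitive, not actual kernel interfaces:

        unsigned long flags;

        local_irq_save(flags);          /* 1. close the race window */
        if (!cond_is_set())             /* 2. check with interrupts off */
                wait_for_event();       /* 3. an event raised after step 1
                                         *    stays pending and makes the
                                         *    wait return immediately, so
                                         *    it cannot be lost between
                                         *    the check and the wait */
        local_irq_restore(flags);

Without step 1 the sequence would be racy: the wake-up event could be
delivered and consumed between steps 2 and 3, leaving the CPU blocked
until some unrelated interrupt arrives, which might be never.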
On Wed, 2018-10-10 at 14:30 +0200, Thomas Gleixner wrote:
> On Wed, 10 Oct 2018, David Woodhouse wrote:
> > On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> > > -        /* If irq pending already clear it and return. */
> > > +        /* Guard against reentry. */
> > > +        local_irq_save(flags);
> > > +
> > > +        /* If irq pending already clear it. */
> > >          if (xen_test_irq_pending(irq)) {
> > >                  xen_clear_irq_pending(irq);
> > > -                return;
> > > +        } else if (READ_ONCE(*byte) == val) {
> > > +                /* Block until irq becomes pending (or a spurious wakeup) */
> > > +                xen_poll_irq(irq);
> > >          }
> >
> > Does this still allow other IRQs to wake it from xen_poll_irq()?
> >
> > In the case where process-context code is spinning for a lock
> > without disabling interrupts, we *should* allow interrupts to occur
> > still... does this?
>
> Yes. Look at it like idle HLT or WFI. You have to disable interrupts
> before checking the condition, and then the hardware, or in this case
> the hypervisor, has to bring you back when an interrupt is raised.
>
> If that didn't work, the check would be racy: the interrupt could hit
> and be handled after the check and before going into HLT/WFI/the
> hypercall, and then you're parked until the next interrupt comes
> along, which might be never.
Right, but in this case we're calling into the hypervisor to poll for one *specific* IRQ. Everything you say is true for that specific IRQ.
My question is what happens to *other* IRQs. We want them, but are they masked? I'm staring at the Xen do_poll() code and haven't quite worked that out...
On Wed, 10 Oct 2018, David Woodhouse wrote:
> On Wed, 2018-10-10 at 14:30 +0200, Thomas Gleixner wrote:
> > On Wed, 10 Oct 2018, David Woodhouse wrote:
> > > On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> > > > -        /* If irq pending already clear it and return. */
> > > > +        /* Guard against reentry. */
> > > > +        local_irq_save(flags);
> > > > +
> > > > +        /* If irq pending already clear it. */
> > > >          if (xen_test_irq_pending(irq)) {
> > > >                  xen_clear_irq_pending(irq);
> > > > -                return;
> > > > +        } else if (READ_ONCE(*byte) == val) {
> > > > +                /* Block until irq becomes pending (or a spurious wakeup) */
> > > > +                xen_poll_irq(irq);
> > > >          }
> > >
> > > Does this still allow other IRQs to wake it from xen_poll_irq()?
> > >
> > > In the case where process-context code is spinning for a lock
> > > without disabling interrupts, we *should* allow interrupts to
> > > occur still... does this?
> >
> > Yes. Look at it like idle HLT or WFI. You have to disable interrupts
> > before checking the condition, and then the hardware, or in this
> > case the hypervisor, has to bring you back when an interrupt is
> > raised.
> >
> > If that didn't work, the check would be racy: the interrupt could
> > hit and be handled after the check and before going into HLT/WFI/the
> > hypercall, and then you're parked until the next interrupt comes
> > along, which might be never.
>
> Right, but in this case we're calling into the hypervisor to poll for
> one *specific* IRQ. Everything you say is true for that specific IRQ.
>
> My question is what happens to *other* IRQs. We want them, but are
> they masked? I'm staring at the Xen do_poll() code and haven't quite
> worked that out...
Ah, sorry. That of course has to come back like HLT/WFI for any interrupt, but I have no idea what the Xen HV is doing there.
Thanks,
tglx
On 10/10/2018 14:47, Thomas Gleixner wrote:
> On Wed, 10 Oct 2018, David Woodhouse wrote:
> > On Wed, 2018-10-10 at 14:30 +0200, Thomas Gleixner wrote:
> > > On Wed, 10 Oct 2018, David Woodhouse wrote:
> > > > On Mon, 2018-10-01 at 09:16 +0200, Juergen Gross wrote:
> > > > > -        /* If irq pending already clear it and return. */
> > > > > +        /* Guard against reentry. */
> > > > > +        local_irq_save(flags);
> > > > > +
> > > > > +        /* If irq pending already clear it. */
> > > > >          if (xen_test_irq_pending(irq)) {
> > > > >                  xen_clear_irq_pending(irq);
> > > > > -                return;
> > > > > +        } else if (READ_ONCE(*byte) == val) {
> > > > > +                /* Block until irq becomes pending (or a spurious wakeup) */
> > > > > +                xen_poll_irq(irq);
> > > > >          }
> > > >
> > > > Does this still allow other IRQs to wake it from xen_poll_irq()?
> > > >
> > > > In the case where process-context code is spinning for a lock
> > > > without disabling interrupts, we *should* allow interrupts to
> > > > occur still... does this?
> > >
> > > Yes. Look at it like idle HLT or WFI. You have to disable
> > > interrupts before checking the condition, and then the hardware,
> > > or in this case the hypervisor, has to bring you back when an
> > > interrupt is raised.
> > >
> > > If that didn't work, the check would be racy: the interrupt could
> > > hit and be handled after the check and before going into
> > > HLT/WFI/the hypercall, and then you're parked until the next
> > > interrupt comes along, which might be never.
> >
> > Right, but in this case we're calling into the hypervisor to poll
> > for one *specific* IRQ. Everything you say is true for that specific
> > IRQ.
> >
> > My question is what happens to *other* IRQs. We want them, but are
> > they masked? I'm staring at the Xen do_poll() code and haven't quite
> > worked that out...
>
> Ah, sorry. That of course has to come back like HLT/WFI for any
> interrupt, but I have no idea what the Xen HV is doing there.
The Xen HV is doing it right. It is blocking the vcpu in do_poll() and any interrupt will unblock it.
Juergen
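For reference, the guest side of this is the SCHEDOP_poll hypercall;
roughly how xen_poll_irq() sets it up, sketched from the Linux Xen
event-channel code (simplified, details may vary by kernel version):

        /*
         * Sketch: block on a single event channel. The hypervisor
         * parks the vCPU in do_poll() until the polled port becomes
         * pending -- or, per the discussion above, until any other
         * interrupt needs delivery to this vCPU, which also unblocks
         * it, just like HLT/WFI.
         */
        struct sched_poll poll;
        evtchn_port_t evtchn = evtchn_from_irq(irq);

        poll.nr_ports = 1;
        poll.timeout = 0;       /* 0 = no timeout, block indefinitely */
        set_xen_guest_handle(poll.ports, &evtchn);

        HYPERVISOR_sched_op(SCHEDOP_poll, &poll);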