sync_core_before_usermode() had an incorrect optimization. If we're in an IRQ, we can get to usermode without IRET -- we just have to schedule to a different task in the same mm and do SYSRET. Fortunately, there were no callers of sync_core_before_usermode() that could have had in_irq() or in_nmi() equal to true, because it's only ever called from the scheduler.
While we're at it, clarify a related comment.
Cc: stable@vger.kernel.org Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Andy Lutomirski luto@kernel.org --- arch/x86/include/asm/sync_core.h | 9 +++++---- arch/x86/mm/tlb.c | 10 ++++++++-- 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h index 0fd4a9dfb29c..ab7382f92aff 100644 --- a/arch/x86/include/asm/sync_core.h +++ b/arch/x86/include/asm/sync_core.h @@ -98,12 +98,13 @@ static inline void sync_core_before_usermode(void) /* With PTI, we unconditionally serialize before running user code. */ if (static_cpu_has(X86_FEATURE_PTI)) return; + /* - * Return from interrupt and NMI is done through iret, which is core - * serializing. + * Even if we're in an interrupt, we might reschedule before returning, + * in which case we could switch to a different thread in the same mm + * and return using SYSRET or SYSEXIT. Instead of trying to keep + * track of our need to sync the core, just sync right away. */ - if (in_irq() || in_nmi()) - return; sync_core(); }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 11666ba19b62..569ac1d57f55 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -474,8 +474,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, /* * The membarrier system call requires a full memory barrier and * core serialization before returning to user-space, after - * storing to rq->curr. Writing to CR3 provides that full - * memory barrier and core serializing instruction. + * storing to rq->curr, when changing mm. This is because + * membarrier() sends IPIs to all CPUs that are in the target mm + * to make them issue memory barriers. However, if another CPU + * switches to/from the target mm concurrently with + * membarrier(), it can cause that CPU not to receive an IPI + * when it really should issue a memory barrier. Writing to CR3 + * provides that full memory barrier and core serializing + * instruction. */ if (real_prev == next) { VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
On Thu, Dec 3, 2020 at 9:07 PM Andy Lutomirski luto@kernel.org wrote:
sync_core_before_usermode() had an incorrect optimization. If we're in an IRQ, we can get to usermode without IRET -- we just have to schedule to a different task in the same mm and do SYSRET. Fortunately, there were no callers of sync_core_before_usermode() that could have had in_irq() or in_nmi() equal to true, because it's only ever called from the scheduler.
While we're at it, clarify a related comment.
Fixes: ac1ab12a3e6e ("lockin/x86: Implement sync_core_before_usermode()")
Cc: stable@vger.kernel.org Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Andy Lutomirski luto@kernel.org
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: a493d1ca1a03b532871f1da27f8dbda2b28b04c4 Gitweb: https://git.kernel.org/tip/a493d1ca1a03b532871f1da27f8dbda2b28b04c4 Author: Andy Lutomirski luto@kernel.org AuthorDate: Thu, 03 Dec 2020 21:07:03 -08:00 Committer: Thomas Gleixner tglx@linutronix.de CommitterDate: Wed, 09 Dec 2020 09:37:42 +01:00
x86/membarrier: Get rid of a dubious optimization
sync_core_before_usermode() had an incorrect optimization. If the kernel returns from an interrupt, it can get to usermode without IRET. It just has to schedule to a different task in the same mm and do SYSRET. Fortunately, there were no callers of sync_core_before_usermode() that could have had in_irq() or in_nmi() equal to true, because it's only ever called from the scheduler.
While at it, clarify a related comment.
Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE") Signed-off-by: Andy Lutomirski luto@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/5afc7632be1422f91eaf7611aaaa1b5b8580a086.160705830...
--- arch/x86/include/asm/sync_core.h | 9 +++++---- arch/x86/mm/tlb.c | 10 ++++++++-- 2 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h index 0fd4a9d..ab7382f 100644 --- a/arch/x86/include/asm/sync_core.h +++ b/arch/x86/include/asm/sync_core.h @@ -98,12 +98,13 @@ static inline void sync_core_before_usermode(void) /* With PTI, we unconditionally serialize before running user code. */ if (static_cpu_has(X86_FEATURE_PTI)) return; + /* - * Return from interrupt and NMI is done through iret, which is core - * serializing. + * Even if we're in an interrupt, we might reschedule before returning, + * in which case we could switch to a different thread in the same mm + * and return using SYSRET or SYSEXIT. Instead of trying to keep + * track of our need to sync the core, just sync right away. */ - if (in_irq() || in_nmi()) - return; sync_core(); }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 11666ba..569ac1d 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -474,8 +474,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, /* * The membarrier system call requires a full memory barrier and * core serialization before returning to user-space, after - * storing to rq->curr. Writing to CR3 provides that full - * memory barrier and core serializing instruction. + * storing to rq->curr, when changing mm. This is because + * membarrier() sends IPIs to all CPUs that are in the target mm + * to make them issue memory barriers. However, if another CPU + * switches to/from the target mm concurrently with + * membarrier(), it can cause that CPU not to receive an IPI + * when it really should issue a memory barrier. Writing to CR3 + * provides that full memory barrier and core serializing + * instruction. */ if (real_prev == next) { VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
linux-stable-mirror@lists.linaro.org