From: Peter Zijlstra peterz@infradead.org
[ Upstream commit d86564a2f085b79ec046a5cba90188e612352806 ]
Jann reported that x86 was missing required TLB invalidates when he hit the !*batch slow path in tlb_remove_table().
This is indeed the case; RCU_TABLE_FREE does not provide TLB (cache) invalidates, the PowerPC-hash where this code originated and the Sparc-hash where this was subsequently used did not need that. ARM which later used this put an explicit TLB invalidate in their __p*_free_tlb() functions, and PowerPC-radix followed that example.
But when we hooked up x86 we failed to consider this. Fix this by (optionally) hooking tlb_remove_table() into the TLB invalidate code.
NOTE: s390 was also needing something like this and might now be able to use the generic code again.
[ Modified to be on top of Nick's cleanups, which simplified this patch now that tlb_flush_mmu_tlbonly() really only flushes the TLB - Linus ]
Fixes: 9e52fc2b50de ("x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)") Reported-by: Jann Horn jannh@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Rik van Riel riel@surriel.com Cc: Nicholas Piggin npiggin@gmail.com Cc: David Miller davem@davemloft.net Cc: Will Deacon will.deacon@arm.com Cc: Martin Schwidefsky schwidefsky@de.ibm.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: stable@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin alexander.levin@microsoft.com --- arch/Kconfig | 3 +++ arch/x86/Kconfig | 1 + mm/memory.c | 18 ++++++++++++++++++ 3 files changed, 22 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig index d1f2ed462ac8..f03b72644902 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -354,6 +354,9 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_RCU_TABLE_FREE bool
+config HAVE_RCU_TABLE_INVALIDATE + bool + config ARCH_HAVE_NMI_SAFE_CMPXCHG bool
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 6b8065d718bd..1aa4dd3b5687 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -179,6 +179,7 @@ config X86 select HAVE_PERF_REGS select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE + select HAVE_RCU_TABLE_INVALIDATE if HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION select HAVE_STACKPROTECTOR if CC_HAS_SANE_STACKPROTECTOR diff --git a/mm/memory.c b/mm/memory.c index 0e356dd923c2..42ce01fd9793 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -330,6 +330,21 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_ * See the comment near struct mmu_table_batch. */
+/* + * If we want tlb_remove_table() to imply TLB invalidates. + */ +static inline void tlb_table_invalidate(struct mmu_gather *tlb) +{ +#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE + /* + * Invalidate page-table caches used by hardware walkers. Then we still + * need to RCU-sched wait while freeing the pages because software + * walkers can still be in-flight. + */ + tlb_flush_mmu_tlbonly(tlb); +#endif +} + static void tlb_remove_table_smp_sync(void *arg) { /* Simply deliver the interrupt */ @@ -366,6 +381,7 @@ void tlb_table_flush(struct mmu_gather *tlb) struct mmu_table_batch **batch = &tlb->batch;
if (*batch) { + tlb_table_invalidate(tlb); call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu); *batch = NULL; } @@ -387,11 +403,13 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table) if (*batch == NULL) { *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN); if (*batch == NULL) { + tlb_table_invalidate(tlb); tlb_remove_table_one(table); return; } (*batch)->nr = 0; } + (*batch)->tables[(*batch)->nr++] = table; if ((*batch)->nr == MAX_TABLE_BATCH) tlb_table_flush(tlb);