On Wed, Mar 05, 2025 at 06:48:19PM +0000, Catalin Marinas wrote:
> On Tue, Mar 04, 2025 at 12:51:27AM -0800, Piotr Jaroszynski wrote:
>> Update the __flush_tlb_range_op macro not to modify its parameters, as
>> these are unexpected semantics. In practice, this fixes the call to
>> mmu_notifier_arch_invalidate_secondary_tlbs() in
>> __flush_tlb_range_nosync() to use the correct range instead of an
>> empty range with start == end. The empty range was (un)lucky in that
>> it takes the invalidate-all path, which avoids correctness issues but
>> can certainly result in suboptimal performance.
>>
>> This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call
>> invalidate_range() when invalidating TLBs"), when the call to the
>> notifiers was added to __flush_tlb_range(). It predates the addition
>> of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64:
>> tlb: Refactor the core flush algorithm of __flush_tlb_range"), which
>> made the bug hard to spot.
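To see the failure mode concretely, here is a minimal, self-contained
sketch using simplified stand-ins for the kernel macro and the notifier
call (not the real arm64 code): the macro consumes its arguments, so by
the time the caller hands start/end to the notifier, start has already
been advanced to end.

#include <stdio.h>

#define PAGE_SHIFT 12

/*
 * Simplified stand-in for the non-range path of __flush_tlb_range_op():
 * it invalidates one stride at a time and, crucially, modifies the
 * caller's "start" and "pages" arguments as it goes.
 */
#define flush_range_op(start, pages, stride)                    \
do {                                                            \
        while ((pages) > 0) {                                   \
                /* ... TLBI for the page at "start" ... */      \
                (start) += (stride);                            \
                (pages) -= (stride) >> PAGE_SHIFT;              \
        }                                                       \
} while (0)

/* Stand-in for mmu_notifier_arch_invalidate_secondary_tlbs(). */
static void notify_secondary_tlbs(unsigned long start, unsigned long end)
{
        printf("notifier range: [%#lx, %#lx)%s\n", start, end,
               start == end ? " (empty: takes the invalidate-all path)" : "");
}

int main(void)
{
        unsigned long start = 0x1000, end = 0x5000;
        unsigned long pages = (end - start) >> PAGE_SHIFT;

        flush_range_op(start, pages, 1UL << PAGE_SHIFT);

        /* Bug: the macro advanced "start" to "end", so this is empty. */
        notify_secondary_tlbs(start, end);
        return 0;
}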
> That's the problem with macros.
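The usual defensive idiom, and the one the patch below adopts, is to
copy the arguments into typeof() locals inside the do { } while (0)
block, so the loop only mutates the macro's own shadows. A simplified
sketch based on the toy macro above, not the kernel code (the __fr_*
names are just for this sketch):

#define flush_range_op(start, pages, stride)                    \
do {                                                            \
        typeof(start) __fr_start = (start);                     \
        typeof(pages) __fr_pages = (pages);                     \
                                                                \
        while (__fr_pages > 0) {                                \
                /* ... TLBI for the page at __fr_start ... */   \
                __fr_start += (stride);                         \
                __fr_pages -= (stride) >> PAGE_SHIFT;           \
        }                                                       \
        /* The caller's "start" and "pages" are left untouched. */ \
} while (0)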
Yep, that's why I missed it when adding the notifier call. Anyway:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
>
> Will, do you want to take this as a fix? It's only a performance
> regression, though you never know how the callers of the macro might
> break at some point.
>> Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
>> Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Robin Murphy <robin.murphy@arm.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: Raghavendra Rao Ananta <rananta@google.com>
>> Cc: SeongJae Park <sj@kernel.org>
>> Cc: Jason Gunthorpe <jgg@nvidia.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Nicolin Chen <nicolinc@nvidia.com>
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: iommu@lists.linux.dev
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org
>> ---
>>  arch/arm64/include/asm/tlbflush.h | 22 ++++++++++++----------
>>  1 file changed, 12 insertions(+), 10 deletions(-)
>> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
>> index bc94e036a26b..8104aee4f9a0 100644
>> --- a/arch/arm64/include/asm/tlbflush.h
>> +++ b/arch/arm64/include/asm/tlbflush.h
>> @@ -396,33 +396,35 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>>  #define __flush_tlb_range_op(op, start, pages, stride,			\
>>  				asid, tlb_level, tlbi_user, lpa2)	\
>>  do {									\
>> +	typeof(start) __flush_start = start;				\
>> +	typeof(pages) __flush_pages = pages;				\
>>  	int num = 0;							\
>>  	int scale = 3;							\
>>  	int shift = lpa2 ? 16 : PAGE_SHIFT;				\
>>  	unsigned long addr;						\
>>  									\
>> -	while (pages > 0) {						\
>> +	while (__flush_pages > 0) {					\
>>  		if (!system_supports_tlb_range() ||			\
>> -		    pages == 1 ||					\
>> -		    (lpa2 && start != ALIGN(start, SZ_64K))) {		\
>> -			addr = __TLBI_VADDR(start, asid);		\
>> +		    __flush_pages == 1 ||				\
>> +		    (lpa2 && __flush_start != ALIGN(__flush_start, SZ_64K))) { \
>> +			addr = __TLBI_VADDR(__flush_start, asid);	\
>>  			__tlbi_level(op, addr, tlb_level);		\
>>  			if (tlbi_user)					\
>>  				__tlbi_user_level(op, addr, tlb_level);	\
>> -			start += stride;				\
>> -			pages -= stride >> PAGE_SHIFT;			\
>> +			__flush_start += stride;			\
>> +			__flush_pages -= stride >> PAGE_SHIFT;		\
>>  			continue;					\
>>  		}							\
>>  									\
>> -		num = __TLBI_RANGE_NUM(pages, scale);			\
>> +		num = __TLBI_RANGE_NUM(__flush_pages, scale);		\
>>  		if (num >= 0) {						\
>> -			addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
>> +			addr = __TLBI_VADDR_RANGE(__flush_start >> shift, asid, \
>>  						scale, num, tlb_level);	\
>>  			__tlbi(r##op, addr);				\
>>  			if (tlbi_user)					\
>>  				__tlbi_user(r##op, addr);		\
>> -			start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
>> -			pages -= __TLBI_RANGE_PAGES(num, scale);	\
>> +			__flush_start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
>> +			__flush_pages -= __TLBI_RANGE_PAGES(num, scale);\
>>  		}							\
>>  		scale--;						\
>>  	}								\
>> base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
>> --
>> 2.22.1.7.gac84d6e93c.dirty
>
> --
> Catalin