Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs"), when the call to the notifiers was added to __flush_tlb_range(). The breakage predates the addition of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which made the bug hard to spot.
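To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names, not the kernel code) of how a macro that advances its 'start' argument leaves the caller notifying an empty range:

#include <stdio.h>

/* Like the pre-fix __flush_tlb_range_op(), this macro advances its
 * 'start' and 'pages' arguments, and the mutation is visible to the
 * caller after the expansion. */
#define flush_range(start, pages, stride)                 \
        do {                                              \
                while ((pages) > 0) {                     \
                        /* ...invalidate at 'start'... */ \
                        (start) += (stride);              \
                        (pages) -= 1;                     \
                }                                         \
        } while (0)

int main(void)
{
        unsigned long start = 0x1000, end = 0x4000, pages = 3;

        flush_range(start, pages, 0x1000);

        /* The caller's 'start' has been advanced all the way to 'end',
         * so the notification below covers an empty range:
         * prints "notify [0x4000, 0x4000)". */
        printf("notify [%#lx, %#lx)\n", start, end);
        return 0;
}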
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux.dev
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/tlbflush.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..8104aee4f9a0 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -396,33 +396,35 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 #define __flush_tlb_range_op(op, start, pages, stride,                 \
                                asid, tlb_level, tlbi_user, lpa2)       \
 do {                                                                   \
+       typeof(start) __flush_start = start;                            \
+       typeof(pages) __flush_pages = pages;                            \
        int num = 0;                                                    \
        int scale = 3;                                                  \
        int shift = lpa2 ? 16 : PAGE_SHIFT;                             \
        unsigned long addr;                                             \
                                                                        \
-       while (pages > 0) {                                             \
+       while (__flush_pages > 0) {                                     \
                if (!system_supports_tlb_range() ||                     \
-                   pages == 1 ||                                       \
-                   (lpa2 && start != ALIGN(start, SZ_64K))) {          \
-                       addr = __TLBI_VADDR(start, asid);               \
+                   __flush_pages == 1 ||                               \
+                   (lpa2 && __flush_start != ALIGN(__flush_start, SZ_64K))) { \
+                       addr = __TLBI_VADDR(__flush_start, asid);       \
                        __tlbi_level(op, addr, tlb_level);              \
                        if (tlbi_user)                                  \
                                __tlbi_user_level(op, addr, tlb_level); \
-                       start += stride;                                \
-                       pages -= stride >> PAGE_SHIFT;                  \
+                       __flush_start += stride;                        \
+                       __flush_pages -= stride >> PAGE_SHIFT;          \
                        continue;                                       \
                }                                                       \
                                                                        \
-               num = __TLBI_RANGE_NUM(pages, scale);                   \
+               num = __TLBI_RANGE_NUM(__flush_pages, scale);           \
                if (num >= 0) {                                         \
-                       addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
+                       addr = __TLBI_VADDR_RANGE(__flush_start >> shift, asid, \
                                                scale, num, tlb_level); \
                        __tlbi(r##op, addr);                            \
                        if (tlbi_user)                                  \
                                __tlbi_user(r##op, addr);               \
-                       start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
-                       pages -= __TLBI_RANGE_PAGES(num, scale);        \
+                       __flush_start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
+                       __flush_pages -= __TLBI_RANGE_PAGES(num, scale);\
                }                                                       \
                scale--;                                                \
        }                                                               \
base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
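For reference, the fix applies the standard macro-hygiene idiom: snapshot each parameter the loop needs to modify into a local declared with typeof (a GNU C extension), so the expansion never writes back to the caller's lvalues. A stripped-down sketch of the same idiom, with a hypothetical macro name:

/* The typeof locals take on whatever types the caller's arguments
 * have, so the macro stays generic while the caller's 'start' and
 * 'pages' are left untouched after the expansion. */
#define flush_range(start, pages, stride)                       \
        do {                                                    \
                typeof(start) __flush_start = (start);          \
                typeof(pages) __flush_pages = (pages);          \
                while (__flush_pages > 0) {                     \
                        /* ...invalidate at __flush_start... */ \
                        __flush_start += (stride);              \
                        __flush_pages -= 1;                     \
                }                                               \
        } while (0)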
On Tue, Mar 04, 2025 at 12:51:27AM -0800, Piotr Jaroszynski wrote:
Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs"), when the call to the notifiers was added to __flush_tlb_range(). The breakage predates the addition of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which made the bug hard to spot.
That's the problem with macros.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Will, do you want to take this as a fix? It's only a performance regression, though you never know when the unexpected semantics might break other callers of the macro.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux.dev
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/tlbflush.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..8104aee4f9a0 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -396,33 +396,35 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 #define __flush_tlb_range_op(op, start, pages, stride,                 \
                                asid, tlb_level, tlbi_user, lpa2)       \
 do {                                                                   \
+       typeof(start) __flush_start = start;                            \
+       typeof(pages) __flush_pages = pages;                            \
        int num = 0;                                                    \
        int scale = 3;                                                  \
        int shift = lpa2 ? 16 : PAGE_SHIFT;                             \
        unsigned long addr;                                             \
                                                                        \
-       while (pages > 0) {                                             \
+       while (__flush_pages > 0) {                                     \
                if (!system_supports_tlb_range() ||                     \
-                   pages == 1 ||                                       \
-                   (lpa2 && start != ALIGN(start, SZ_64K))) {          \
-                       addr = __TLBI_VADDR(start, asid);               \
+                   __flush_pages == 1 ||                               \
+                   (lpa2 && __flush_start != ALIGN(__flush_start, SZ_64K))) { \
+                       addr = __TLBI_VADDR(__flush_start, asid);       \
                        __tlbi_level(op, addr, tlb_level);              \
                        if (tlbi_user)                                  \
                                __tlbi_user_level(op, addr, tlb_level); \
-                       start += stride;                                \
-                       pages -= stride >> PAGE_SHIFT;                  \
+                       __flush_start += stride;                        \
+                       __flush_pages -= stride >> PAGE_SHIFT;          \
                        continue;                                       \
                }                                                       \
                                                                        \
-               num = __TLBI_RANGE_NUM(pages, scale);                   \
+               num = __TLBI_RANGE_NUM(__flush_pages, scale);           \
                if (num >= 0) {                                         \
-                       addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
+                       addr = __TLBI_VADDR_RANGE(__flush_start >> shift, asid, \
                                                scale, num, tlb_level); \
                        __tlbi(r##op, addr);                            \
                        if (tlbi_user)                                  \
                                __tlbi_user(r##op, addr);               \
-                       start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
-                       pages -= __TLBI_RANGE_PAGES(num, scale);        \
+                       __flush_start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
+                       __flush_pages -= __TLBI_RANGE_PAGES(num, scale);\
                }                                                       \
                scale--;                                                \
        }                                                               \
base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
2.22.1.7.gac84d6e93c.dirty
On Wed, Mar 05, 2025 at 06:48:19PM +0000, Catalin Marinas wrote:
On Tue, Mar 04, 2025 at 12:51:27AM -0800, Piotr Jaroszynski wrote:
Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs"), when the call to the notifiers was added to __flush_tlb_range(). The breakage predates the addition of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which made the bug hard to spot.
That's the problem with macros.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Will, do you want to take this as a fix? It's only a performance regression, though you never know when the unexpected semantics might break other callers of the macro.
Yeah, I'll pick it up but I'm travelling atm so it may have to wait until next week.
Will
On Wed, Mar 05, 2025 at 06:48:19PM +0000, Catalin Marinas wrote:
On Tue, Mar 04, 2025 at 12:51:27AM -0800, Piotr Jaroszynski wrote:
Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs"), when the call to the notifiers was added to __flush_tlb_range(). The breakage predates the addition of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which made the bug hard to spot.
That's the problem with macros.
Yep, that's why I missed it when adding the notifier call. Anyway:
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Will, do you want to take this as a fix? It's only a performance regression, though you never know when the unexpected semantics might break other callers of the macro.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Raghavendra Rao Ananta <rananta@google.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: iommu@lists.linux.dev
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/tlbflush.h | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..8104aee4f9a0 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -396,33 +396,35 @@ static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 #define __flush_tlb_range_op(op, start, pages, stride,                 \
                                asid, tlb_level, tlbi_user, lpa2)       \
 do {                                                                   \
+       typeof(start) __flush_start = start;                            \
+       typeof(pages) __flush_pages = pages;                            \
        int num = 0;                                                    \
        int scale = 3;                                                  \
        int shift = lpa2 ? 16 : PAGE_SHIFT;                             \
        unsigned long addr;                                             \
                                                                        \
-       while (pages > 0) {                                             \
+       while (__flush_pages > 0) {                                     \
                if (!system_supports_tlb_range() ||                     \
-                   pages == 1 ||                                       \
-                   (lpa2 && start != ALIGN(start, SZ_64K))) {          \
-                       addr = __TLBI_VADDR(start, asid);               \
+                   __flush_pages == 1 ||                               \
+                   (lpa2 && __flush_start != ALIGN(__flush_start, SZ_64K))) { \
+                       addr = __TLBI_VADDR(__flush_start, asid);       \
                        __tlbi_level(op, addr, tlb_level);              \
                        if (tlbi_user)                                  \
                                __tlbi_user_level(op, addr, tlb_level); \
-                       start += stride;                                \
-                       pages -= stride >> PAGE_SHIFT;                  \
+                       __flush_start += stride;                        \
+                       __flush_pages -= stride >> PAGE_SHIFT;          \
                        continue;                                       \
                }                                                       \
                                                                        \
-               num = __TLBI_RANGE_NUM(pages, scale);                   \
+               num = __TLBI_RANGE_NUM(__flush_pages, scale);           \
                if (num >= 0) {                                         \
-                       addr = __TLBI_VADDR_RANGE(start >> shift, asid, \
+                       addr = __TLBI_VADDR_RANGE(__flush_start >> shift, asid, \
                                                scale, num, tlb_level); \
                        __tlbi(r##op, addr);                            \
                        if (tlbi_user)                                  \
                                __tlbi_user(r##op, addr);               \
-                       start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
-                       pages -= __TLBI_RANGE_PAGES(num, scale);        \
+                       __flush_start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
+                       __flush_pages -= __TLBI_RANGE_PAGES(num, scale);\
                }                                                       \
                scale--;                                                \
        }                                                               \
base-commit: 99fa936e8e4f117d62f229003c9799686f74cebc
2.22.1.7.gac84d6e93c.dirty
-- Catalin
On Tue, 04 Mar 2025 00:51:27 -0800, Piotr Jaroszynski wrote:
Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
[...]
Applied to arm64 (for-next/fixes), thanks!
[1/1] Fix mmu notifiers for range-based invalidates https://git.kernel.org/arm64/c/f7edb07ad7c6
Cheers,
Hi,
On 03-04 00:51, Piotr Jaroszynski wrote:
Update the __flush_tlb_range_op macro not to modify its parameters, as doing so has unexpected semantics. In practice, this fixes the call to mmu_notifier_arch_invalidate_secondary_tlbs() in __flush_tlb_range_nosync() to use the correct range instead of an empty range with start == end. The empty range was (un)lucky in that it takes the invalidate-all path, which doesn't cause correctness issues but can certainly result in suboptimal performance.
This has been broken since commit 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs"), when the call to the notifiers was added to __flush_tlb_range(). The breakage predates the addition of the __flush_tlb_range_op() macro in commit 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range"), which made the bug hard to spot.
Fixes: 6bbd42e2df8f ("mmu_notifiers: call invalidate_range() when invalidating TLBs")
I think that strictly speaking this should be:
Fixes: 360839027a6e ("arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range")
Regards, Ivan