On 29/04/2025 15:29, David Hildenbrand wrote:
On 29.04.25 16:22, Petr Vaněk wrote:
folio_pte_batch() could overcount the number of contiguous PTEs when pte_advance_pfn() returns a zero-valued PTE and the following PTE in memory also happens to be zero. The loop doesn't break in such a case because pte_same() returns true, and the batch size is advanced by one more than it should be.
To fix this, bail out early if a non-present PTE is encountered, preventing the invalid comparison.
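For illustration only, here is a minimal userspace model of that scenario (the simplified pte_t, the max_pfn cutoff and the helper bodies are assumptions for the sketch, not the kernel implementation): once the expected PTE computed by pte_advance_pfn() collapses to zero and the next entry in the table is also zero, pte_same() keeps returning true and the batch grows one entry too far. Compiling with -DFIXED adds the proposed bail-out.

/* Userspace model of the overcount described above; not the kernel code. */
#include <stdio.h>

typedef unsigned long pte_t;            /* assumption: PTE modeled as a bare PFN */

static int pte_same(pte_t a, pte_t b)   { return a == b; }
static int pte_present(pte_t p)         { return p != 0; }

/* assumption: advancing past the last mappable pfn yields a zero PTE */
static pte_t pte_advance_pfn(pte_t p, unsigned long nr, unsigned long max_pfn)
{
	return (p + nr > max_pfn) ? 0 : p + nr;
}

int main(void)
{
	/* Two mapped pages followed by an empty (zero) entry. */
	pte_t ptep[] = { 0x100, 0x101, 0 };
	unsigned long max_pfn = 0x101, max_nr = 3, nr = 1;
	pte_t expected = pte_advance_pfn(ptep[0], 1, max_pfn);

	while (nr < max_nr) {
		pte_t pte = ptep[nr];
#ifdef FIXED
		if (!pte_present(pte))  /* the proposed early bail-out */
			break;
#endif
		if (!pte_same(pte, expected))
			break;
		expected = pte_advance_pfn(expected, 1, max_pfn);
		nr++;
	}
	/* Without the bail-out, the zero expected PTE matches the zero entry
	 * and nr reaches 3 instead of 2. */
	printf("batched %lu PTEs\n", nr);
	return 0;
}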
This issue started to appear after commit 10ebac4f95e7 ("mm/memory: optimize unmap/zap with PTE-mapped THP") and was discovered via git bisect.
Fixes: 10ebac4f95e7 ("mm/memory: optimize unmap/zap with PTE-mapped THP")
Cc: stable@vger.kernel.org
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
 mm/internal.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/mm/internal.h b/mm/internal.h
index e9695baa5922..c181fe2bac9d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -279,6 +279,8 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 			dirty = !!pte_dirty(pte);
 		pte = __pte_batch_clear_ignored(pte, flags);
 
+		if (!pte_present(pte))
+			break;
 		if (!pte_same(pte, expected_pte))
 			break;
How could pte_same() suddenly match on a present and a non-present PTE?
Something with XEN is really problematic here.
We are inside a lazy MMU region (arch_enter_lazy_mmu_mode()) at this point, which I believe XEN uses. If a PTE is written and then read back while in lazy mode, you could get a stale value.
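As a rough illustration of that hazard, here is a userspace model (an assumption for the sketch, not Xen's or the kernel's actual implementation) in which PTE writes issued inside the lazy region are queued rather than applied, so a read-back of the page table before the flush still observes the stale value:

/* Userspace model of a lazy/batched MMU mode exposing a stale PTE read. */
#include <stdio.h>

typedef unsigned long pte_t;

static pte_t page_table[4];             /* the "real" page table */

struct pending { pte_t *ptep; pte_t val; } queue[16];
static int nqueued, lazy;

static void arch_enter_lazy_mmu_mode(void) { lazy = 1; }

static void set_pte(pte_t *ptep, pte_t val)
{
	if (lazy)                       /* defer: batch for a later flush */
		queue[nqueued++] = (struct pending){ ptep, val };
	else
		*ptep = val;
}

static void arch_leave_lazy_mmu_mode(void)
{
	for (int i = 0; i < nqueued; i++)   /* apply the batched updates */
		*queue[i].ptep = queue[i].val;
	nqueued = 0;
	lazy = 0;
}

int main(void)
{
	arch_enter_lazy_mmu_mode();
	set_pte(&page_table[0], 0x100);
	/* Read-back inside the lazy region still sees the stale value (0x0). */
	printf("inside lazy region: %#lx\n", page_table[0]);
	arch_leave_lazy_mmu_mode();
	printf("after flush:        %#lx\n", page_table[0]);
	return 0;
}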
See https://lore.kernel.org/all/912c7a32-b39c-494f-a29c-4865cd92aeba@agordeev.lo... for an example bug.