On 30.06.25 15:39, Lorenzo Stoakes wrote:
On Mon, Jun 30, 2025 at 09:13:05AM +0800, Lance Yang wrote:
From: Lance Yang <lance.yang@linux.dev>
As pointed out by David[1], the batched unmap logic in try_to_unmap_one() may read past the end of a PTE table when a large folio's PTE mappings are not fully contained within a single page table.
While this scenario might be rare, an issue triggerable from userspace must be fixed regardless of its likelihood. This patch fixes the out-of-bounds access by refactoring the logic into a new helper, folio_unmap_pte_batch().
The new helper correctly calculates the safe batch size by capping the scan at both the VMA and PMD boundaries. To simplify the code, it also supports partial batching (i.e., any number of pages from 1 up to the calculated safe maximum), as there is no strong reason to special-case for fully mapped folios.
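To make the capping concrete, here is a simplified sketch of the arithmetic the helper performs (the address values below are made-up examples for illustration, not taken from any report):

	/*
	 * Illustration only (assumed 4K pages, 2M PMD ranges): a large
	 * folio mapped at addr = 0x1ff000 has a single PTE slot left
	 * before the PMD boundary at 0x200000, so max_nr must be capped
	 * at 1 even though the folio itself spans many more pages.
	 */
	end_addr = pmd_addr_end(addr, vma->vm_end);	/* min(next PMD boundary, VMA end) */
	max_nr = (end_addr - addr) >> PAGE_SHIFT;	/* PTE slots left in this table/VMA */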
[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat...
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Cc: stable@vger.kernel.org
Acked-by: Barry Song <baohua@kernel.org>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Barry Song <baohua@kernel.org>
Signed-off-by: Lance Yang <lance.yang@linux.dev>
This LGTM:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
v2 -> v3:
- Tweak changelog (per Barry and David)
- Pick AB from Barry - thanks!
- https://lore.kernel.org/linux-mm/20250627062319.84936-1-lance.yang@linux.dev
v1 -> v2:
- Update subject and changelog (per Barry)
- https://lore.kernel.org/linux-mm/20250627025214.30887-1-lance.yang@linux.dev
 mm/rmap.c | 46 ++++++++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index fb63d9256f09..1320b88fab74 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1845,23 +1845,32 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
 #endif
 }
 
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+			struct page_vma_mapped_walk *pvmw,
+			enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	unsigned long end_addr, addr = pvmw->address;
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned int max_nr;
+
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
 
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = pmd_addr_end(addr, vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
+		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			       NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			       NULL, NULL, NULL);
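Since the helper may now return any count from 1 up to that cap, the caller has to handle partial batches as well. Roughly, the shape is something like the following (a hedged sketch only; the actual caller hunk in try_to_unmap_one() is not quoted above, and the names and ordering here are simplified):

	/* Sketch: advance by however many PTEs were actually batched. */
	nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
	end_addr = address + nr_pages * PAGE_SIZE;
	flush_cache_range(vma, address, end_addr);
	pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
	/* ... lazyfree/unmap handling for the whole batch follows ... */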
I guess this will conflict with David's changes, but perhaps less so here, given this is a simpler case and mostly existing code? Anyway, let's see.
It's a fix, so this is fine to go in first. (I already rebased on top of mm/mm-new where this is in)
Acked-by: David Hildenbrand <david@redhat.com>