On 10/24/22 14:55, Mike Kravetz wrote:
On 10/22/22 19:50, Mike Kravetz wrote:
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the page tables associated with the address range. For hugetlb vmas, zap_page_range will call __unmap_hugepage_range_final. However, __unmap_hugepage_range_final assumes the passed vma is about to be removed and deletes the vma_lock to prevent pmd sharing as the vma is on the way out. In the case of madvise(MADV_DONTNEED) the vma remains, but the missing vma_lock not only makes the vma ineligible for pmd sharing, it could also lead to issues with truncation/fault races.
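For reference, the calling sequence in question is roughly:

	madvise(MADV_DONTNEED)
	  madvise_dontneed_single_vma()
	    zap_page_range()
	      unmap_single_vma()
	        __unmap_hugepage_range_final()	/* deletes the vma_lock */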
This issue was originally reported here [1] as a BUG triggered in page_try_dup_anon_rmap. Prior to the introduction of the hugetlb vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to prevent pmd sharing. Subsequent faults on this vma were confused: VM_MAYSHARE indicates a sharable vma, but since the flag was no longer set, page_mapping was not set in new pages added to the page table. This resulted in pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
Create a new routine clear_hugetlb_page_range() that can be called from madvise(MADV_DONTNEED) for hugetlb vmas. It has the same setup as zap_page_range, but does not delete the vma_lock.
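Something along these lines is what I have in mind (a sketch only -- the setup mirrors zap_page_range(), and the vma_lock is taken and released rather than freed; the actual patch has the details):

void clear_hugetlb_page_range(struct vm_area_struct *vma, unsigned long start,
				unsigned long end)
{
	struct mmu_notifier_range range;
	struct mmu_gather tlb;

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
				start, end);
	tlb_gather_mmu(&tlb, vma->vm_mm);
	update_hiwater_rss(vma->vm_mm);
	mmu_notifier_invalidate_range_start(&range);

	hugetlb_vma_lock_write(vma);
	i_mmap_lock_write(vma->vm_file->f_mapping);

	__unmap_hugepage_range(&tlb, vma, start, end, NULL, 0);

	i_mmap_unlock_write(vma->vm_file->f_mapping);
	/* unlock only -- do NOT free the vma_lock as the vma lives on */
	hugetlb_vma_unlock_write(vma);

	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb);
}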
After seeing a syzbot use after free report [2] that is also addressed by this patch, I started thinking ...
When __unmap_hugepage_range_final was created, the only time unmap_single_vma was called for hugetlb vmas was at process exit time via exit_mmap. I got into trouble when I added a call via madvise(MADV_DONTNEED), which calls zap_page_range. This patch takes care of that calling path by having madvise(MADV_DONTNEED) call a new routine, clear_hugetlb_page_range, instead of zap_page_range for hugetlb vmas. The use after free bug had me auditing code paths to make sure __unmap_hugepage_range_final was REALLY only called at process exit time. If not, and we could fault on a vma after calling __unmap_hugepage_range_final, we would be in trouble.
My thought was: what if we had __unmap_hugepage_range_final check mm->mm_users to determine whether it is being called in the process exit path? If !mm_users, we can delete the vma_lock to prevent pmd sharing, as we know the process is exiting. If not, we do not delete the lock. That seems more robust and would prevent issues if someone accidentally introduces a new code path where __unmap_hugepage_range_final (unmap_single_vma for a hugetlb vma) could be called outside process exit context.
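For context, exit_mmap() is only reached from __mmput() once the last mm_users reference is dropped; this is essentially mmput() from kernel/fork.c:

void mmput(struct mm_struct *mm)
{
	might_sleep();

	if (atomic_dec_and_test(&mm->mm_users))
		__mmput(mm);		/* eventually calls exit_mmap(mm) */
}

So a !atomic_read(&mm->mm_users) check in the unmap path should tell us whether we are tearing down the whole address space or just zapping a range of a still-live vma.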
Thoughts?
[2] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/
Sorry if this seems like I am talking to myself. Here is a proposed v3 as described above.
From 1466fd43e180ede3f6479d1dca4e7f350f86f80b Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz@oracle.com>
Date: Mon, 24 Oct 2022 15:40:05 -0700
Subject: [PATCH v3] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED
 processing
When hugetlb madvise(MADV_DONTNEED) support was added, the existing code that calls zap_page_range() to clear the page tables associated with the address range was not modified. However, for hugetlb vmas zap_page_range will call __unmap_hugepage_range_final. This routine assumes the passed hugetlb vma is about to be removed and deletes the vma_lock to prevent pmd sharing as the vma is on the way out. In the case of madvise(MADV_DONTNEED) the vma remains, but the missing vma_lock not only makes the vma ineligible for pmd sharing, it could also lead to issues with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in page_try_dup_anon_rmap. Prior to the introduction of the hugetlb vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to prevent pmd sharing. Subsequent faults on this vma were confused: VM_MAYSHARE indicates a sharable vma, but since the flag was no longer set, page_mapping was not set in new pages added to the page table. This resulted in pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
__unmap_hugepage_range_final was originally designed only to be called in the context of process exit (exit_mmap). It is now also called in the context of madvise(MADV_DONTNEED). Restructure the routine to check for !mm_users, which indicates it is being called in the process exit context. If so, delete the vma_lock. Otherwise, just unmap and leave the lock in place. Since the routine is no longer called only in process exit context, rename it to __unmap_hugetlb_page_range, dropping the 'final'.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6Ju...
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: stable@vger.kernel.org
---
v3 - Check for !mm_users in __unmap_hugepage_range_final instead of
     creating a separate function.
 include/linux/hugetlb.h |  4 ++--
 mm/hugetlb.c            | 30 ++++++++++++++++++++----------
 mm/memory.c             |  2 +-
 3 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..bc19a1f6ca10 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,7 +158,7 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
 			  struct page *ref_page, zap_flags_t zap_flags);
@@ -418,7 +418,7 @@ static inline unsigned long hugetlb_change_protection(
 	return 0;
 }
 
-static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static inline void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
 			unsigned long end, struct page *ref_page,
 			zap_flags_t zap_flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..3fe1152c3c20 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,27 +5202,37 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	tlb_flush_mmu_tlbonly(tlb);
 }
 
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+void __unmap_hugetlb_page_range(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page,
 			  zap_flags_t zap_flags)
 {
+	struct mm_struct *mm = vma->vm_mm;
+
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 
 	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
-	 * Unlock and free the vma lock before releasing i_mmap_rwsem.  When
-	 * the vma_lock is freed, this makes the vma ineligible for pmd
-	 * sharing.  And, i_mmap_rwsem is required to set up pmd sharing.
-	 * This is important as page tables for this unmapped range will
-	 * be asynchrously deleted.  If the page tables are shared, there
-	 * will be issues when accessed by someone else.
+	 * Free the vma_lock here if process exiting
 	 */
-	__hugetlb_vma_unlock_write_free(vma);
-
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	if (!atomic_read(&mm->mm_users)) {
+		/*
+		 * Unlock and free the vma lock before releasing i_mmap_rwsem.
+		 * When the vma_lock is freed, this makes the vma ineligible
+		 * for pmd sharing.  And, i_mmap_rwsem is required to set up
+		 * pmd sharing.  This is important as page tables for this
+		 * unmapped range will be asynchrously deleted.  If the page
+		 * tables are shared, there will be issues when accessed by
+		 * someone else.
+		 */
+		__hugetlb_vma_unlock_write_free(vma);
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+	} else {
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		hugetlb_vma_unlock_write(vma);
+	}
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/mm/memory.c b/mm/memory.c
index 8e72f703ed99..1de8ea504047 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1687,7 +1687,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 		if (vma->vm_file) {
 			zap_flags_t zap_flags = details ?
 			    details->zap_flags : 0;
-			__unmap_hugepage_range_final(tlb, vma, start, end,
+			__unmap_hugetlb_page_range(tlb, vma, start, end,
 						     NULL, zap_flags);
 		}
 	} else