On Thu, Oct 02, 2025 at 10:31:53AM +0800, Lance Yang wrote:
On 2025/10/2 09:46, Wei Yang wrote:
On Thu, Oct 02, 2025 at 01:38:25AM +0000, Wei Yang wrote:
We add the pmd folio to the deferred split queue (ds_queue) on the first page fault in __do_huge_pmd_anonymous_page(), so that it can be split under memory pressure. The same should apply to a pmd folio installed during a wp page fault.
Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") missed adding it to the ds_queue, which means the system may not reclaim enough memory under memory pressure even if the pmd folio is underused.
Move deferred_split_folio() into map_anon_folio_pmd() to make pmd folio installation consistent across the fault paths.
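For illustration, here is a minimal sketch of map_anon_folio_pmd() with deferred_split_folio() folded in at the end. It is reconstructed from the context lines in the diff further down this thread, not the exact patch:

/*
 * Sketch only: map_anon_folio_pmd() with the deferred-split queueing folded
 * in, so every PMD-mapped anonymous folio installation goes through one path.
 */
static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
		struct vm_area_struct *vma, unsigned long haddr)
{
	pmd_t entry;

	entry = folio_mk_pmd(folio, vma->vm_page_prot);
	entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
	folio_add_new_anon_rmap(folio, vma, haddr, RMAP_EXCLUSIVE);
	folio_add_lru_vma(folio, vma);
	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
	update_mmu_cache_pmd(vma, haddr, pmd);
	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
	count_vm_event(THP_FAULT_ALLOC);
	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
	/*
	 * Queue the folio on the deferred split queue so it can be split
	 * under memory pressure; the wp-fault path previously missed this.
	 */
	deferred_split_folio(folio, false);
}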
Since we move deferred_split_folio() into map_anon_folio_pmd(), I am wondering whether we can also consolidate the mapping steps in collapse_huge_page().
That is, use map_anon_folio_pmd() in collapse_huge_page(), but skip the statistics adjustments there.
Yeah, that's a good idea :)
We could add a simple bool is_fault parameter to map_anon_folio_pmd() to control the statistics.
The fault paths would call it with true, and the collapse paths could then call it with false.
Something like this:
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1b81680b4225..9924180a4a56 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1218,7 +1218,7 @@ static struct folio *vma_alloc_anon_folio_pmd(struct vm_area_struct *vma,
 }
 
 static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
-		struct vm_area_struct *vma, unsigned long haddr)
+		struct vm_area_struct *vma, unsigned long haddr, bool is_fault)
 {
 	pmd_t entry;
 
@@ -1228,10 +1228,15 @@ static void map_anon_folio_pmd(struct folio *folio, pmd_t *pmd,
 	folio_add_lru_vma(folio, vma);
 	set_pmd_at(vma->vm_mm, haddr, pmd, entry);
 	update_mmu_cache_pmd(vma, haddr, pmd);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
-	count_vm_event(THP_FAULT_ALLOC);
-	count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
-	count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+
+	if (is_fault) {
+		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+		count_vm_event(THP_FAULT_ALLOC);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
+		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
+	}
+
+	deferred_split_folio(folio, false);
 }
 
 static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index d0957648db19..2eddd5a60e48 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1227,17 +1227,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	__folio_mark_uptodate(folio);
 	pgtable = pmd_pgtable(_pmd);
 
-	_pmd = folio_mk_pmd(folio, vma->vm_page_prot);
-	_pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
-
 	spin_lock(pmd_ptl);
 	BUG_ON(!pmd_none(*pmd));
-	folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
-	set_pmd_at(mm, address, pmd, _pmd);
-	update_mmu_cache_pmd(vma, address, pmd);
-	deferred_split_folio(folio, false);
+	map_anon_folio_pmd(folio, pmd, vma, address, false);
 	spin_unlock(pmd_ptl);
 
 	folio = NULL;
Untested, though.
This is the same as what I had in mind.
Will prepare a patch for it.