On Fri, Oct 03, 2025 at 10:08:37AM -0400, Zi Yan wrote:
On 3 Oct 2025, at 9:49, Lance Yang wrote:
Hey Wei,
On 2025/10/2 09:38, Wei Yang wrote:
We add a pmd folio to ds_queue on the first page fault in __do_huge_pmd_anonymous_page(), so that we can split it under memory pressure. This should be the same for a pmd folio during a wp page fault.
Commit 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault") missed adding it to ds_queue, which means the system may not reclaim enough memory
IIRC, it was commit dafff3f4c850 ("mm: split underused THPs") that started unconditionally adding all new anon THPs to _deferred_list :)
in case of memory pressure even if the pmd folio is underused.
Move deferred_split_folio() into map_anon_folio_pmd() to make the pmd folio installation consistent.
Fixes: 1ced09e0331f ("mm: allocate THP on hugezeropage wp-fault")
Shouldn't this rather be the following?
Fixes: dafff3f4c850 ("mm: split underused THPs")
Yes, I agree. In this case, this patch looks more like an optimization for splitting underused THPs.
One observation on this change is that right after a zero pmd wp, the deferred split queue could be scanned and the newly added pmd folio will be split, since it is all zero except one subpage. This means we probably should allocate a base folio for zero pmd wp and map the rest to the zero page at the beginning if split underused THP is enabled, to avoid this long trip. The downside is that the user app cannot get a pmd folio even if it intends to write data into the entire folio.
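To make the "all zero except one subpage" case concrete, here is a minimal userspace sketch, not the kernel's implementation. It assumes 4 KiB base pages and a 2 MiB pmd folio. All names (toy_alloc_after_wp(), count_zero_subpages(), toy_folio_underused()) and the max_none threshold parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SUBPAGE_SIZE     4096u  /* assumption: 4 KiB base page */
#define SUBPAGES_PER_PMD 512u   /* assumption: 2 MiB pmd folio */

/* Return true if one subpage contains only zero bytes. */
static bool subpage_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < SUBPAGE_SIZE; i++)
		if (page[i] != 0)
			return false;
	return true;
}

/* Count the zero-filled subpages in a pmd-sized buffer. */
static unsigned int count_zero_subpages(const unsigned char *folio)
{
	unsigned int zero = 0;

	assert(folio != NULL);
	for (size_t i = 0; i < SUBPAGES_PER_PMD; i++)
		if (subpage_is_zero(folio + i * SUBPAGE_SIZE))
			zero++;
	return zero;
}

/*
 * Hypothetical policy: treat the folio as underused when more than
 * max_none of its subpages are still zero-filled.
 */
static bool toy_folio_underused(const unsigned char *folio,
				unsigned int max_none)
{
	return count_zero_subpages(folio) > max_none;
}

/*
 * Simulate the state right after a zero pmd wp fault: a freshly zeroed
 * pmd-sized buffer with a single byte written at write_offset.
 * The caller must free() the returned buffer.
 */
static unsigned char *toy_alloc_after_wp(size_t write_offset)
{
	unsigned char *folio = calloc(SUBPAGES_PER_PMD, SUBPAGE_SIZE);

	if (folio)
		folio[write_offset % (SUBPAGES_PER_PMD * SUBPAGE_SIZE)] = 1;
	return folio;
}
```

In this toy model, a buffer returned by toy_alloc_after_wp() has 511 of its 512 subpages still zero-filled, so any max_none below 511 classifies it as underused immediately after the fault. That is the "long trip" described above: allocate a whole pmd folio, then split it again on the next scan.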
Thanks for raising this.
IMHO, we could face a similar situation in __do_huge_pmd_anonymous_page(). If my understanding is correct, the allocated folio is zeroed and we have no idea how the user would write data to it.
Since the shrinker is active when memory is low, maybe vma_alloc_anon_folio_pmd() has already told us the current status of the memory. If it does get a pmd folio, we probably have enough memory in the system.
Usama might be able to give some insight here.