When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") restructured do_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") Signed-off-by: Zi Yan ziy@nvidia.com Reported-by: "Huang, Ying" ying.huang@intel.com Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel... Acked-by: David Hildenbrand david@redhat.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@suse.de Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org (cherry picked from commit 40b760cfd44566bca791c80e0720d70d75382b84) --- mm/memory.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index 99d15abe4a06..4d6eda1cdb6d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4407,7 +4407,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) spin_lock(vmf->ptl); if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + return 0; }
/* Get the normal PTE */ @@ -4454,21 +4454,17 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) if (migrate_misplaced_page(page, vma, target_nid)) { page_nid = target_nid; flags |= TNF_MIGRATED; - } else { - flags |= TNF_MIGRATE_FAIL; - vmf->pte = pte_offset_map(vmf->pmd, vmf->address); - spin_lock(vmf->ptl); - if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { - pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; - } - goto out_map; + task_numa_fault(last_cpupid, page_nid, 1, flags); + return 0; }
-out: - if (page_nid != NUMA_NO_NODE) - task_numa_fault(last_cpupid, page_nid, 1, flags); - return 0; + flags |= TNF_MIGRATE_FAIL; + vmf->pte = pte_offset_map(vmf->pmd, vmf->address); + spin_lock(vmf->ptl); + if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } out_map: /* * Make it present again, depending on how arch implements @@ -4482,7 +4478,10 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf) ptep_modify_prot_commit(vma, vmf->address, vmf->pte, old_pte, pte); update_mmu_cache(vma, vmf->address, vmf->pte); pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + + if (page_nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, page_nid, 1, flags); + return 0; }
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)