Hi,
An LTP test fails with the commit quoted below:
On Fri, Jun 20, 2025 at 11:33:32PM +0200, Jann Horn wrote:
From: Liu Shixin <liushixin2@huawei.com>
[ Upstream commit 59d9094df3d79443937add8700b2ef1a866b1081 ]
The folio refcount may be increased unexpectedly through try_get_folio() by callers such as split_huge_pages. In huge_pmd_unshare(), we use the refcount to check whether a pmd page table is shared. The check is incorrect if the refcount is increased by the above caller, and this can cause the page table to be leaked:
BUG: Bad page state in process sh pfn:109324
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
page_type: f2(table)
raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
page dumped because: nonzero mapcount
...
CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G B 6.13.0-rc2master+ #7
Tainted: [B]=BAD_PAGE
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
 show_stack+0x20/0x38 (C)
 dump_stack_lvl+0x80/0xf8
 dump_stack+0x18/0x28
 bad_page+0x8c/0x130
 free_page_is_bad_report+0xa4/0xb0
 free_unref_page+0x3cc/0x620
 __folio_put+0xf4/0x158
 split_huge_pages_all+0x1e0/0x3e8
 split_huge_pages_write+0x25c/0x2d8
 full_proxy_write+0x64/0xd8
 vfs_write+0xcc/0x280
 ksys_write+0x70/0x110
 __arm64_sys_write+0x24/0x38
 invoke_syscall+0x50/0x120
 el0_svc_common.constprop.0+0xc8/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x34/0x128
 el0t_64_sync_handler+0xc8/0xd0
 el0t_64_sync+0x190/0x198
The issue may be triggered by damon, offline_page, page_idle, etc., which will increase the refcount of the page table.
The page table itself will be discarded after reporting the "nonzero mapcount".
The HugeTLB page mapped by the page table is not freed, since we treat the page table as shared and a shared page table will not be unmapped.
Fix it by introducing an independent PMD page table shared count. As described by the comment, pt_index/pt_mm/pt_frag_refcount are used for s390 gmap, x86 pgds and powerpc, and pt_share_count is used for x86/arm64/riscv pmds, so we can reuse the field as pt_share_count.
This commit causes the LTP test memfd_create03 to fail on i586 with the v6.1.142 stable release; the test passed on v6.1.141. I identified the commit with git bisect.
The failure:
root@i586:~# /usr/lib/ltp/testcases/bin/memfd_create03
tst_hugepage.c:78: TINFO: 2 hugepage(s) reserved
tst_test.c:1526: TINFO: Timeout per run is 0h 00m 30s
memfd_create03.c:171: TINFO: --TESTING WRITE CALL IN HUGEPAGES--
memfd_create03.c:176: TINFO: memfd_create() succeeded
memfd_create03.c:70: TPASS: write(3, "LTP", 3) failed as expected

memfd_create03.c:171: TINFO: --TESTING PAGE SIZE OF CREATED FILE--
memfd_create03.c:176: TINFO: memfd_create() succeeded
memfd_create03.c:43: TINFO: mmap((nil), 4194304, 2, 2, 3, 0) succeeded
memfd_create03.c:92: TINFO: munmap(0xb7800000, 1024kB) failed as expected
memfd_create03.c:92: TINFO: munmap(0xb7800000, 2048kB) failed as expected
memfd_create03.c:92: TINFO: munmap(0xb7800000, 3072kB) failed as expected
memfd_create03.c:111: TPASS: munmap() fails for page sizes less than 4096kB

memfd_create03.c:171: TINFO: --TESTING HUGEPAGE ALLOCATION LIMIT--
memfd_create03.c:176: TINFO: memfd_create() succeeded
memfd_create03.c:39: TBROK: mmap((nil),0,2,2,3,0) failed: EINVAL (22)

Summary:
passed   2
failed   0
broken   1
skipped  0
warnings 0
dmesg output during the test run:
[ 16.072078] memfd_create03 (203): drop_caches: 3
[ 16.075298] mm/pgtable-generic.c:51: bad pgd 7d4000e7
The same error occurs on v5.10.239. The test does not fail on v6.12.35 or v6.15.4, even though they contain the same commit.
Thanks,
Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
Fixes: 39dde65c9940 ("[PATCH] shared page table for hugetlb page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ken Chen <kenneth.w.chen@intel.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nanyong Sun <sunnanyong@huawei.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[backport note: struct ptdesc did not exist yet, stuff it equivalently into struct page instead]
Signed-off-by: Jann Horn <jannh@google.com>
 include/linux/mm.h       |  3 +++
 include/linux/mm_types.h |  3 +++
 mm/hugetlb.c             | 16 +++++++---------
 3 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 03357c196e0b..b36dffbfbe69 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2537,6 +2537,9 @@ static inline bool pgtable_pmd_page_ctor(struct page *page)
 	if (!pmd_ptlock_init(page))
 		return false;
 	__SetPageTable(page);
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	atomic_set(&page->pt_share_count, 0);
+#endif
 	inc_lruvec_page_state(page, NR_PAGETABLE);
 	return true;
 }
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a9c1d611029d..9b64610eddcc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -160,6 +160,9 @@ struct page {
 			union {
 				struct mm_struct *pt_mm; /* x86 pgds only */
 				atomic_t pt_frag_refcount; /* powerpc */
+#ifdef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+				atomic_t pt_share_count;
+#endif
 			};
 #if ALLOC_SPLIT_PTLOCKS
 			spinlock_t *ptl;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fc5d3d665266..a3907edf2909 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7114,7 +7114,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 			spte = huge_pte_offset(svma->vm_mm, saddr,
 					       vma_mmu_pagesize(svma));
 			if (spte) {
-				get_page(virt_to_page(spte));
+				atomic_inc(&virt_to_page(spte)->pt_share_count);
 				break;
 			}
 		}
@@ -7129,7 +7129,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 				(pmd_t *)((unsigned long)spte & PAGE_MASK));
 		mm_inc_nr_pmds(mm);
 	} else {
-		put_page(virt_to_page(spte));
+		atomic_dec(&virt_to_page(spte)->pt_share_count);
 	}
 	spin_unlock(ptl);
 out:
@@ -7141,10 +7141,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 /*
  * unmap huge page backed by shared pte.
  *
- * Hugetlb pte page is ref counted at the time of mapping. If pte is shared
- * indicated by page_count > 1, unmap is achieved by clearing pud and
- * decrementing the ref count. If count == 1, the pte page is not shared.
- *
  * Called with page table lock held.
  *
  * returns: 1 successfully unmapped a shared pte page
@@ -7153,18 +7149,20 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 					unsigned long addr, pte_t *ptep)
 {
+	unsigned long sz = huge_page_size(hstate_vma(vma));
 	pgd_t *pgd = pgd_offset(mm, addr);
 	p4d_t *p4d = p4d_offset(pgd, addr);
 	pud_t *pud = pud_offset(p4d, addr);
 
 	i_mmap_assert_write_locked(vma->vm_file->f_mapping);
 	hugetlb_vma_assert_locked(vma);
-	BUG_ON(page_count(virt_to_page(ptep)) == 0);
-	if (page_count(virt_to_page(ptep)) == 1)
+	if (sz != PMD_SIZE)
 		return 0;
+	if (!atomic_read(&virt_to_page(ptep)->pt_share_count))
+		return 0;
 
 	pud_clear(pud);
-	put_page(virt_to_page(ptep));
+	atomic_dec(&virt_to_page(ptep)->pt_share_count);
 	mm_dec_nr_pmds(mm);
 	return 1;
 }