From: Jinjiang Tu tujinjiang@huawei.com
[ Upstream commit 7b7387650dcf2881fd8bb55bcf3c8bd6c9542dd7 ]
Migration may be raced with fallocating hole. remove_inode_single_folio will unmap the folio if the folio is still mapped. However, it's called without folio lock. If the folio is migrated and the mapped pte has been converted to migration entry, folio_mapped() returns false, and won't unmap it. Due to extra refcount held by remove_inode_single_folio, migration fails, restores migration entry to normal pte, and the folio is mapped again. As a result, we triggered BUG in filemap_unaccount_folio.
The log is as follows: BUG: Bad page cache in process hugetlb pfn:156c00 page: refcount:515 mapcount:0 mapping:0000000099fef6e1 index:0x0 pfn:0x156c00 head: order:9 mapcount:1 entire_mapcount:1 nr_pages_mapped:0 pincount:0 aops:hugetlbfs_aops ino:dcc dentry name(?):"my_hugepage_file" flags: 0x17ffffc00000c1(locked|waiters|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: f4(hugetlb) page dumped because: still mapped when deleted CPU: 1 UID: 0 PID: 395 Comm: hugetlb Not tainted 6.17.0-rc5-00044-g7aac71907bde-dirty #484 NONE Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x4f/0x70 filemap_unaccount_folio+0xc4/0x1c0 __filemap_remove_folio+0x38/0x1c0 filemap_remove_folio+0x41/0xd0 remove_inode_hugepages+0x142/0x250 hugetlbfs_fallocate+0x471/0x5a0 vfs_fallocate+0x149/0x380
Hold folio lock before checking if the folio is mapped to avold race with migration.
Link: https://lkml.kernel.org/r/20250912074139.3575005-1-tujinjiang@huawei.com Fixes: 4aae8d1c051e ("mm/hugetlbfs: unmap pages if page fault raced with hole punch") Signed-off-by: Jinjiang Tu tujinjiang@huawei.com Cc: David Hildenbrand david@redhat.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Muchun Song muchun.song@linux.dev Cc: Oscar Salvador osalvador@suse.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org [ folio -> page ] Signed-off-by: Sasha Levin sashal@kernel.org --- fs/hugetlbfs/inode.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index c18a47a86e8b2..504820e0e2292 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -502,13 +502,13 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
/* * If page is mapped, it was faulted in after being - * unmapped in caller. Unmap (again) now after taking - * the fault mutex. The mutex will prevent faults - * until we finish removing the page. - * - * This race can only happen in the hole punch case. - * Getting here in a truncate operation is a bug. + * unmapped in caller or hugetlb_vmdelete_list() skips + * unmapping it due to fail to grab lock. Unmap (again) + * while holding the fault mutex. The mutex will prevent + * faults until we finish removing the page. Hold page + * lock to guarantee no concurrent migration. */ + lock_page(page); if (unlikely(page_mapped(page))) { BUG_ON(truncate_op);
@@ -518,8 +518,6 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart, (index + 1) * pages_per_huge_page(h)); i_mmap_unlock_write(mapping); } - - lock_page(page); /* * We must free the huge page and remove from page * cache (remove_huge_page) BEFORE removing the