Syzbot reported [1] a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage:
  BUG: Bad page state in process syz.0.15  pfn:1137bb
  page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
  flags: 0x400000000080000(mlocked|node=0|zone=1)
  raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
  raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
  page_owner tracks the page as allocated
  page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid 3004 (syz.0.15), ts 61546 608067, free_ts 61390082085
   set_page_owner include/linux/page_owner.h:32 [inline]
   post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
   prep_new_page mm/page_alloc.c:1545 [inline]
   get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
   __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
   alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
   kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
   kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
   kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
   kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:907 [inline]
   __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  page last free pid 951 tgid 951 stack trace:
   reset_page_owner include/linux/page_owner.h:25 [inline]
   free_pages_prepare mm/page_alloc.c:1108 [inline]
   free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
   vfree+0x181/0x2e0 mm/vmalloc.c:3361
   delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
   process_one_work kernel/workqueue.c:3229 [inline]
   process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
   worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
   kthread+0x2df/0x370 kernel/kthread.c:389
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for the lower-level get_page()/free_page() APIs used, for example, by KVM, as with this reproducer.
Fix it by moving the mlocked flag clearance down to free_pages_prepare().
The bug itself is fairly old and harmless (aside from generating these warnings), so the stable backport is likely not justified.
Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: stable@vger.kernel.org
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
 mm/page_alloc.c |  9 +++++++++
 mm/swap.c       | 14 --------------
 2 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc55d39eb372..24200651ad92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 	bool skip_kasan_poison = should_skip_kasan_poison(page);
 	bool init = want_init_on_free();
 	bool compound = PageCompound(page);
+	struct folio *folio = page_folio(page);

 	VM_BUG_ON_PAGE(PageTail(page), page);

@@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
 	if (memcg_kmem_online() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);

+	if (unlikely(folio_test_mlocked(folio))) {
+		long nr_pages = folio_nr_pages(folio);
+
+		__folio_clear_mlocked(folio);
+		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
+		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
+	}
+
 	if (unlikely(PageHWPoison(page)) && !order) {
 		/* Do not let hwpoison pages hit pcplists/buddy */
 		reset_page_owner(page, order);
diff --git a/mm/swap.c b/mm/swap.c
index 835bdf324b76..7cd0f4719423 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
 		lruvec_del_folio(*lruvecp, folio);
 		__folio_clear_lru_flags(folio);
 	}
-
-	/*
-	 * In rare cases, when truncation or holepunching raced with
-	 * munlock after VM_LOCKED was cleared, Mlocked may still be
-	 * found set here. This does not indicate a problem, unless
-	 * "unevictable_pgs_cleared" appears worryingly large.
-	 */
-	if (unlikely(folio_test_mlocked(folio))) {
-		long nr_pages = folio_nr_pages(folio);
-
-		__folio_clear_mlocked(folio);
-		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
-		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
-	}
 }

 /*
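
For readers following the thread, the effect of the change can be modeled in userspace. The sketch below is a toy illustration, not kernel code: toy_page, TOY_PG_MLOCKED, toy_mlock_page() and toy_free_pages_prepare() are hypothetical names, and the two counters only stand in for NR_MLOCK and UNEVICTABLE_PGCLEARED. It assumes the only relevant difference is whether the flag is cleared in the free path itself or left to an earlier LRU-only release path that directly allocated pages never reach.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
#define TOY_PG_MLOCKED (1u << 0)

struct toy_page {
	unsigned int flags;
	long nr_pages;
};

static long toy_nr_mlock;            /* models the NR_MLOCK counter */
static long toy_unevictable_cleared; /* models UNEVICTABLE_PGCLEARED */

/* Models mlocking a directly allocated (non-LRU) page. */
static struct toy_page toy_mlock_page(void)
{
	struct toy_page page = { .flags = TOY_PG_MLOCKED, .nr_pages = 1 };

	toy_nr_mlock += page.nr_pages;
	return page;
}

/*
 * Models free_pages_prepare(). Returns true if the page passes the
 * "no stray flags at free" check, false if it would be reported as bad.
 *
 * clear_mlocked_here == false models the pre-patch behavior for a page
 * that skipped the LRU release path, so nobody cleared the flag.
 * clear_mlocked_here == true models the patched free path.
 */
static bool toy_free_pages_prepare(struct toy_page *page, bool clear_mlocked_here)
{
	assert(page->nr_pages > 0);

	if (clear_mlocked_here && (page->flags & TOY_PG_MLOCKED)) {
		page->flags &= ~TOY_PG_MLOCKED;
		toy_nr_mlock -= page->nr_pages;
		toy_unevictable_cleared += page->nr_pages;
	}

	/* Any flag still set here corresponds to "Bad page state". */
	return (page->flags & TOY_PG_MLOCKED) == 0;
}
```

Calling toy_free_pages_prepare(&page, false) on a page returned by toy_mlock_page() returns false, which corresponds to the "Bad page state" report above; passing true clears the flag, adjusts both counters and returns true.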
On 10/21/24 18:48, Roman Gushchin wrote:
> Syzbot reported [1] a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage:
> The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for the lower-level get_page()/free_page() APIs used, for example, by KVM, as with this reproducer.
Does that mean KVM is mlocking pages that are neither pagecache nor anonymous, and thus not on the LRU? How and why (and since when) is that done?
> Fix it by moving the mlocked flag clearance down to free_pages_prepare().
> The bug itself is fairly old and harmless (aside from generating these warnings), so the stable backport is likely not justified.
But since there's a Cc: stable below, it will be backported :)
> Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: stable@vger.kernel.org
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/page_alloc.c |  9 +++++++++
>  mm/swap.c       | 14 --------------
>  2 files changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bc55d39eb372..24200651ad92 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
>  	bool skip_kasan_poison = should_skip_kasan_poison(page);
>  	bool init = want_init_on_free();
>  	bool compound = PageCompound(page);
> +	struct folio *folio = page_folio(page);
>
>  	VM_BUG_ON_PAGE(PageTail(page), page);
>
> @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
>  	if (memcg_kmem_online() && PageMemcgKmem(page))
>  		__memcg_kmem_uncharge_page(page, order);
>
> +	if (unlikely(folio_test_mlocked(folio))) {
> +		long nr_pages = folio_nr_pages(folio);
> +
> +		__folio_clear_mlocked(folio);
> +		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> +		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> +	}
Why drop the useful comment?
> +
>  	if (unlikely(PageHWPoison(page)) && !order) {
>  		/* Do not let hwpoison pages hit pcplists/buddy */
>  		reset_page_owner(page, order);
> diff --git a/mm/swap.c b/mm/swap.c
> index 835bdf324b76..7cd0f4719423 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
>  		lruvec_del_folio(*lruvecp, folio);
>  		__folio_clear_lru_flags(folio);
>  	}
> -
> -	/*
> -	 * In rare cases, when truncation or holepunching raced with
> -	 * munlock after VM_LOCKED was cleared, Mlocked may still be
> -	 * found set here. This does not indicate a problem, unless
> -	 * "unevictable_pgs_cleared" appears worryingly large.
> -	 */
> -	if (unlikely(folio_test_mlocked(folio))) {
> -		long nr_pages = folio_nr_pages(folio);
> -
> -		__folio_clear_mlocked(folio);
> -		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> -		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> -	}
>  }
>
>  /*
On Mon, Oct 21, 2024 at 07:01:59PM +0200, Vlastimil Babka wrote:
> On 10/21/24 18:48, Roman Gushchin wrote:
> > Syzbot reported [1] a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage:
> > The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for the lower-level get_page()/free_page() APIs used, for example, by KVM, as with this reproducer.
> Does that mean KVM is mlocking pages that are neither pagecache nor anonymous, and thus not on the LRU? How and why (and since when) is that done?
KVM allows userspace to mmap and mlock several directly allocated pages. Please take a look at the reproducer: https://syzkaller.appspot.com/x/repro.c?x=1437939f980000
> > Fix it by moving the mlocked flag clearance down to free_pages_prepare().
> > The bug itself is fairly old and harmless (aside from generating these warnings), so the stable backport is likely not justified.
> But since there's a Cc: stable below, it will be backported :)
My bad, I changed my mind at the last minute and added Cc: stable, but forgot to drop this sentence.
> > Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> > Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: stable@vger.kernel.org
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > ---
> >  mm/page_alloc.c |  9 +++++++++
> >  mm/swap.c       | 14 --------------
> >  2 files changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bc55d39eb372..24200651ad92 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> >  	bool skip_kasan_poison = should_skip_kasan_poison(page);
> >  	bool init = want_init_on_free();
> >  	bool compound = PageCompound(page);
> > +	struct folio *folio = page_folio(page);
> >
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> >
> > @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
> >  	if (memcg_kmem_online() && PageMemcgKmem(page))
> >  		__memcg_kmem_uncharge_page(page, order);
> >
> > +	if (unlikely(folio_test_mlocked(folio))) {
> > +		long nr_pages = folio_nr_pages(folio);
> > +
> > +		__folio_clear_mlocked(folio);
> > +		zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> > +		count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> > +	}
> Why drop the useful comment?
Agreed. Sounds like I need to restore the comment, drop the no-stable-backport recommendation, and send v2.
Thank you for taking a look!