The quilt patch titled
Subject: maple_tree: remove rcu_read_lock() from mt_validate()
has been removed from the -mm tree. Its filename was
maple_tree-remove-rcu_read_lock-from-mt_validate.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)Oracle.com>
Subject: maple_tree: remove rcu_read_lock() from mt_validate()
Date: Tue, 20 Aug 2024 13:54:17 -0400
The write lock should be held when validating the tree to avoid updates
racing with checks. Holding the rcu read lock during a large tree
validation may also cause a prolonged rcu read window and "rcu_preempt
detected stalls" warnings.
Link: https://lore.kernel.org/all/0000000000001d12d4062005aea1@google.com/
Link: https://lkml.kernel.org/r/20240820175417.2782532-1-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reported-by: syzbot+036af2f0c7338a33b0cd(a)syzkaller.appspotmail.com
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
--- a/lib/maple_tree.c~maple_tree-remove-rcu_read_lock-from-mt_validate
+++ a/lib/maple_tree.c
@@ -7566,14 +7566,14 @@ static void mt_validate_nulls(struct map
* 2. The gap is correctly set in the parents
*/
void mt_validate(struct maple_tree *mt)
+ __must_hold(mas->tree->ma_lock)
{
unsigned char end;
MA_STATE(mas, mt, 0, 0);
- rcu_read_lock();
mas_start(&mas);
if (!mas_is_active(&mas))
- goto done;
+ return;
while (!mte_is_leaf(mas.node))
mas_descend(&mas);
@@ -7594,9 +7594,6 @@ void mt_validate(struct maple_tree *mt)
mas_dfs_postorder(&mas, ULONG_MAX);
}
mt_validate_nulls(mt);
-done:
- rcu_read_unlock();
-
}
EXPORT_SYMBOL_GPL(mt_validate);
_
Patches currently in -mm which might be from Liam.Howlett(a)Oracle.com are
mm-vma-correctly-position-vma_iterator-in-__split_vma.patch
mm-vma-introduce-abort_munmap_vmas.patch
mm-vma-introduce-vmi_complete_munmap_vmas.patch
mm-vma-extract-the-gathering-of-vmas-from-do_vmi_align_munmap.patch
mm-vma-introduce-vma_munmap_struct-for-use-in-munmap-operations.patch
mm-vma-change-munmap-to-use-vma_munmap_struct-for-accounting-and-surrounding-vmas.patch
mm-vma-extract-validate_mm-from-vma_complete.patch
mm-vma-inline-munmap-operation-in-mmap_region.patch
mm-vma-expand-mmap_region-munmap-call.patch
mm-vma-support-vma-==-null-in-init_vma_munmap.patch
mm-mmap-reposition-vma-iterator-in-mmap_region.patch
mm-vma-track-start-and-end-for-munmap-in-vma_munmap_struct.patch
mm-clean-up-unmap_region-argument-list.patch
mm-mmap-avoid-zeroing-vma-tree-in-mmap_region.patch
mm-change-failure-of-map_fixed-to-restoring-the-gap-on-failure.patch
mm-mmap-use-phys_pfn-in-mmap_region.patch
mm-mmap-use-vms-accounted-pages-in-mmap_region.patch
ipc-shm-mm-drop-do_vma_munmap.patch
mm-move-may_expand_vm-check-in-mmap_region.patch
mm-vma-drop-incorrect-comment-from-vms_gather_munmap_vmas.patch
mm-vmah-optimise-vma_munmap_struct.patch
The quilt patch titled
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
has been removed from the -mm tree. Its filename was
kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Petr Tesarik <ptesarik(a)suse.com>
Subject: kexec_file: fix elfcorehdr digest exclusion when CONFIG_CRASH_HOTPLUG=y
Date: Mon, 5 Aug 2024 17:07:50 +0200
Fix the condition to exclude the elfcorehdr segment from the SHA digest
calculation.
The j iterator is an index into the output sha_regions[] array, not into
the input image->segment[] array. Once it reaches
image->elfcorehdr_index, all subsequent segments are excluded. Besides,
if the purgatory segment precedes the elfcorehdr segment, the elfcorehdr
may be wrongly included in the calculation.
Link: https://lkml.kernel.org/r/20240805150750.170739-1-petr.tesarik@suse.com
Fixes: f7cc804a9fd4 ("kexec: exclude elfcorehdr from the segment digest")
Signed-off-by: Petr Tesarik <ptesarik(a)suse.com>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Hari Bathini <hbathini(a)linux.ibm.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Eric DeVolder <eric_devolder(a)yahoo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/kexec_file.c~kexec_file-fix-elfcorehdr-digest-exclusion-when-config_crash_hotplug=y
+++ a/kernel/kexec_file.c
@@ -752,7 +752,7 @@ static int kexec_calculate_store_digests
#ifdef CONFIG_CRASH_HOTPLUG
/* Exclude elfcorehdr segment to allow future changes via hotplug */
- if (j == image->elfcorehdr_index)
+ if (i == image->elfcorehdr_index)
continue;
#endif
_
Patches currently in -mm which might be from ptesarik(a)suse.com are
The quilt patch titled
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
has been removed from the -mm tree. Its filename was
mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hao Ge <gehao(a)kylinos.cn>
Subject: mm/slub: add check for s->flags in the alloc_tagging_slab_free_hook
Date: Fri, 16 Aug 2024 09:33:36 +0800
When enable CONFIG_MEMCG & CONFIG_KFENCE & CONFIG_KMEMLEAK, the following
warning always occurs,This is because the following call stack occurred:
mem_pool_alloc
kmem_cache_alloc_noprof
slab_alloc_node
kfence_alloc
Once the kfence allocation is successful,slab->obj_exts will not be empty,
because it has already been assigned a value in kfence_init_pool.
Since in the prepare_slab_obj_exts_hook function,we perform a check for
s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE),the alloc_tag_add function
will not be called as a result.Therefore,ref->ct remains NULL.
However,when we call mem_pool_free,since obj_ext is not empty, it
eventually leads to the alloc_tag_sub scenario being invoked. This is
where the warning occurs.
So we should add corresponding checks in the alloc_tagging_slab_free_hook.
For __GFP_NO_OBJ_EXT case,I didn't see the specific case where it's using
kfence,so I won't add the corresponding check in
alloc_tagging_slab_free_hook for now.
[ 3.734349] ------------[ cut here ]------------
[ 3.734807] alloc_tag was not set
[ 3.735129] WARNING: CPU: 4 PID: 40 at ./include/linux/alloc_tag.h:130 kmem_cache_free+0x444/0x574
[ 3.735866] Modules linked in: autofs4
[ 3.736211] CPU: 4 UID: 0 PID: 40 Comm: ksoftirqd/4 Tainted: G W 6.11.0-rc3-dirty #1
[ 3.736969] Tainted: [W]=WARN
[ 3.737258] Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
[ 3.737875] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3.738501] pc : kmem_cache_free+0x444/0x574
[ 3.738951] lr : kmem_cache_free+0x444/0x574
[ 3.739361] sp : ffff80008357bb60
[ 3.739693] x29: ffff80008357bb70 x28: 0000000000000000 x27: 0000000000000000
[ 3.740338] x26: ffff80008207f000 x25: ffff000b2eb2fd60 x24: ffff0000c0005700
[ 3.740982] x23: ffff8000804229e4 x22: ffff800082080000 x21: ffff800081756000
[ 3.741630] x20: fffffd7ff8253360 x19: 00000000000000a8 x18: ffffffffffffffff
[ 3.742274] x17: ffff800ab327f000 x16: ffff800083398000 x15: ffff800081756df0
[ 3.742919] x14: 0000000000000000 x13: 205d344320202020 x12: 5b5d373038343337
[ 3.743560] x11: ffff80008357b650 x10: 000000000000005d x9 : 00000000ffffffd0
[ 3.744231] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008237bad0 x6 : c0000000ffff7fff
[ 3.744907] x5 : ffff80008237ba78 x4 : ffff8000820bbad0 x3 : 0000000000000001
[ 3.745580] x2 : 68d66547c09f7800 x1 : 68d66547c09f7800 x0 : 0000000000000000
[ 3.746255] Call trace:
[ 3.746530] kmem_cache_free+0x444/0x574
[ 3.746931] mem_pool_free+0x44/0xf4
[ 3.747306] free_object_rcu+0xc8/0xdc
[ 3.747693] rcu_do_batch+0x234/0x8a4
[ 3.748075] rcu_core+0x230/0x3e4
[ 3.748424] rcu_core_si+0x14/0x1c
[ 3.748780] handle_softirqs+0x134/0x378
[ 3.749189] run_ksoftirqd+0x70/0x9c
[ 3.749560] smpboot_thread_fn+0x148/0x22c
[ 3.749978] kthread+0x10c/0x118
[ 3.750323] ret_from_fork+0x10/0x20
[ 3.750696] ---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20240816013336.17505-1-hao.ge@linux.dev
Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Hyeonggon Yoo <42.hyeyoo(a)gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/slub.c~mm-slub-add-check-for-s-flags-in-the-alloc_tagging_slab_free_hook
+++ a/mm/slub.c
@@ -2116,6 +2116,10 @@ alloc_tagging_slab_free_hook(struct kmem
if (!mem_alloc_profiling_enabled())
return;
+ /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
+ if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
+ return;
+
obj_exts = slab_obj_exts(slab);
if (!obj_exts)
return;
_
Patches currently in -mm which might be from gehao(a)kylinos.cn are
mm-cma-change-the-addition-of-totalcma_pages-in-the-cma_init_reserved_mem.patch
The quilt patch titled
Subject: nilfs2: fix state management in error path of log writing function
has been removed from the -mm tree. Its filename was
nilfs2-fix-state-management-in-error-path-of-log-writing-function.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix state management in error path of log writing function
Date: Wed, 14 Aug 2024 19:11:19 +0900
After commit a694291a6211 ("nilfs2: separate wait function from
nilfs_segctor_write") was applied, the log writing function
nilfs_segctor_do_construct() was able to issue I/O requests continuously
even if user data blocks were split into multiple logs across segments,
but two potential flaws were introduced in its error handling.
First, if nilfs_segctor_begin_construction() fails while creating the
second or subsequent logs, the log writing function returns without
calling nilfs_segctor_abort_construction(), so the writeback flag set on
pages/folios will remain uncleared. This causes page cache operations to
hang waiting for the writeback flag. For example,
truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
an inode is evicted from memory, will hang.
Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared.
As a result, if the next log write involves checkpoint creation, that's
fine, but if a partial log write is performed that does not, inodes with
NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files"
list, and their data and b-tree blocks may not be written to the device,
corrupting the block mapping.
Fix these issues by uniformly calling nilfs_segctor_abort_construction()
on failure of each step in the loop in nilfs_segctor_do_construct(),
having it clean up logs and segment usages according to progress, and
correcting the conditions for calling nilfs_redirty_inodes() to ensure
that the NILFS_I_COLLECTED flag is cleared.
Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com
Fixes: a694291a6211 ("nilfs2: separate wait function from nilfs_segctor_write")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-state-management-in-error-path-of-log-writing-function
+++ a/fs/nilfs2/segment.c
@@ -1812,6 +1812,9 @@ static void nilfs_segctor_abort_construc
nilfs_abort_logs(&logs, ret ? : err);
list_splice_tail_init(&sci->sc_segbufs, &logs);
+ if (list_empty(&logs))
+ return; /* if the first segment buffer preparation failed */
+
nilfs_cancel_segusage(&logs, nilfs->ns_sufile);
nilfs_free_incomplete_logs(&logs, nilfs);
@@ -2056,7 +2059,7 @@ static int nilfs_segctor_do_construct(st
err = nilfs_segctor_begin_construction(sci, nilfs);
if (unlikely(err))
- goto out;
+ goto failed;
/* Update time stamp */
sci->sc_seg_ctime = ktime_get_real_seconds();
@@ -2120,10 +2123,9 @@ static int nilfs_segctor_do_construct(st
return err;
failed_to_write:
- if (sci->sc_stage.flags & NILFS_CF_IFILE_STARTED)
- nilfs_redirty_inodes(&sci->sc_dirty_files);
-
failed:
+ if (mode == SC_LSEG_SR && nilfs_sc_cstage_get(sci) >= NILFS_ST_IFILE)
+ nilfs_redirty_inodes(&sci->sc_dirty_files);
if (nilfs_doing_gc())
nilfs_redirty_inodes(&sci->sc_gc_inodes);
nilfs_segctor_abort_construction(sci, nilfs, err);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-add-support-for-fs_ioc_getuuid.patch
nilfs2-add-support-for-fs_ioc_getfssysfspath.patch
nilfs2-add-support-for-fs_ioc_getfslabel.patch
nilfs2-add-support-for-fs_ioc_setfslabel.patch
nilfs2-do-not-output-warnings-when-clearing-dirty-buffers.patch
nilfs2-add-missing-argument-description-for-__nilfs_error.patch
nilfs2-add-missing-argument-descriptions-for-ioctl-related-helpers.patch
nilfs2-improve-kernel-doc-comments-for-b-tree-node-helpers.patch
nilfs2-fix-incorrect-kernel-doc-declaration-of-nilfs_palloc_req-structure.patch
nilfs2-add-missing-description-of-nilfs_btree_path-structure.patch
nilfs2-describe-the-members-of-nilfs_bmap_operations-structure.patch
nilfs2-fix-inconsistencies-in-kernel-doc-comments-in-segmenth.patch
nilfs2-fix-missing-initial-short-descriptions-of-kernel-doc-comments.patch
nilfs2-treat-missing-sufile-header-block-as-metadata-corruption.patch
nilfs2-treat-missing-cpfile-header-block-as-metadata-corruption.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-recovery.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-gc.patch
nilfs2-do-not-propagate-enoent-error-from-nilfs_sufile_mark_dirty.patch
nilfs2-use-the-bits_per_long-macro.patch
nilfs2-separate-inode-type-information-from-i_state-field.patch
nilfs2-eliminate-the-shared-counter-and-spinlock-for-i_generation.patch
nilfs2-do-not-repair-reserved-inode-bitmap-in-nilfs_new_inode.patch
nilfs2-remove-sc_timer_task.patch
nilfs2-use-kthread_create-and-kthread_stop-for-the-log-writer-thread.patch
nilfs2-refactor-nilfs_segctor_thread.patch
The quilt patch titled
Subject: nilfs2: fix missing cleanup on rollforward recovery error
has been removed from the -mm tree. Its filename was
nilfs2-fix-missing-cleanup-on-rollforward-recovery-error.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix missing cleanup on rollforward recovery error
Date: Sat, 10 Aug 2024 15:52:42 +0900
In an error injection test of a routine for mount-time recovery, KASAN
found a use-after-free bug.
It turned out that if data recovery was performed using partial logs
created by dsync writes, but an error occurred before starting the log
writer to create a recovered checkpoint, the inodes whose data had been
recovered were left in the ns_dirty_files list of the nilfs object and
were not freed.
Fix this issue by cleaning up inodes that have read the recovery data if
the recovery routine fails midway before the log writer starts.
Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com
Fixes: 0f3e1c7f23f8 ("nilfs2: recovery functions")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/recovery.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
--- a/fs/nilfs2/recovery.c~nilfs2-fix-missing-cleanup-on-rollforward-recovery-error
+++ a/fs/nilfs2/recovery.c
@@ -716,6 +716,33 @@ static void nilfs_finish_roll_forward(st
}
/**
+ * nilfs_abort_roll_forward - cleaning up after a failed rollforward recovery
+ * @nilfs: nilfs object
+ */
+static void nilfs_abort_roll_forward(struct the_nilfs *nilfs)
+{
+ struct nilfs_inode_info *ii, *n;
+ LIST_HEAD(head);
+
+ /* Abandon inodes that have read recovery data */
+ spin_lock(&nilfs->ns_inode_lock);
+ list_splice_init(&nilfs->ns_dirty_files, &head);
+ spin_unlock(&nilfs->ns_inode_lock);
+ if (list_empty(&head))
+ return;
+
+ set_nilfs_purging(nilfs);
+ list_for_each_entry_safe(ii, n, &head, i_dirty) {
+ spin_lock(&nilfs->ns_inode_lock);
+ list_del_init(&ii->i_dirty);
+ spin_unlock(&nilfs->ns_inode_lock);
+
+ iput(&ii->vfs_inode);
+ }
+ clear_nilfs_purging(nilfs);
+}
+
+/**
* nilfs_salvage_orphan_logs - salvage logs written after the latest checkpoint
* @nilfs: nilfs object
* @sb: super block instance
@@ -773,15 +800,19 @@ int nilfs_salvage_orphan_logs(struct the
if (unlikely(err)) {
nilfs_err(sb, "error %d writing segment for recovery",
err);
- goto failed;
+ goto put_root;
}
nilfs_finish_roll_forward(nilfs, ri);
}
- failed:
+put_root:
nilfs_put_root(root);
return err;
+
+failed:
+ nilfs_abort_roll_forward(nilfs);
+ goto put_root;
}
/**
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-add-support-for-fs_ioc_getuuid.patch
nilfs2-add-support-for-fs_ioc_getfssysfspath.patch
nilfs2-add-support-for-fs_ioc_getfslabel.patch
nilfs2-add-support-for-fs_ioc_setfslabel.patch
nilfs2-do-not-output-warnings-when-clearing-dirty-buffers.patch
nilfs2-add-missing-argument-description-for-__nilfs_error.patch
nilfs2-add-missing-argument-descriptions-for-ioctl-related-helpers.patch
nilfs2-improve-kernel-doc-comments-for-b-tree-node-helpers.patch
nilfs2-fix-incorrect-kernel-doc-declaration-of-nilfs_palloc_req-structure.patch
nilfs2-add-missing-description-of-nilfs_btree_path-structure.patch
nilfs2-describe-the-members-of-nilfs_bmap_operations-structure.patch
nilfs2-fix-inconsistencies-in-kernel-doc-comments-in-segmenth.patch
nilfs2-fix-missing-initial-short-descriptions-of-kernel-doc-comments.patch
nilfs2-treat-missing-sufile-header-block-as-metadata-corruption.patch
nilfs2-treat-missing-cpfile-header-block-as-metadata-corruption.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-recovery.patch
nilfs2-do-not-propagate-enoent-error-from-sufile-during-gc.patch
nilfs2-do-not-propagate-enoent-error-from-nilfs_sufile_mark_dirty.patch
nilfs2-use-the-bits_per_long-macro.patch
nilfs2-separate-inode-type-information-from-i_state-field.patch
nilfs2-eliminate-the-shared-counter-and-spinlock-for-i_generation.patch
nilfs2-do-not-repair-reserved-inode-bitmap-in-nilfs_new_inode.patch
nilfs2-remove-sc_timer_task.patch
nilfs2-use-kthread_create-and-kthread_stop-for-the-log-writer-thread.patch
nilfs2-refactor-nilfs_segctor_thread.patch
The quilt patch titled
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
has been removed from the -mm tree. Its filename was
userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: don't BUG_ON() if khugepaged yanks our page table
Date: Tue, 13 Aug 2024 22:25:22 +0200
Since khugepaged was changed to allow retracting page tables in file
mappings without holding the mmap lock, these BUG_ON()s are wrong - get
rid of them.
We could also remove the preceding "if (unlikely(...))" block, but then we
could reach pte_offset_map_lock() with transhuge pages not just for file
mappings but also for anonymous mappings - which would probably be fine
but I think is not necessarily expected.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-2-5efa61078a41@goog…
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-dont-bug_on-if-khugepaged-yanks-our-page-table
+++ a/mm/userfaultfd.c
@@ -807,9 +807,10 @@ retry:
err = -EFAULT;
break;
}
-
- BUG_ON(pmd_none(*dst_pmd));
- BUG_ON(pmd_trans_huge(*dst_pmd));
+ /*
+ * For shmem mappings, khugepaged is allowed to remove page
+ * tables under us; pte_offset_map_lock() will deal with that.
+ */
err = mfill_atomic_pte(dst_pmd, dst_vma, dst_addr,
src_addr, flags, &folio);
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The quilt patch titled
Subject: userfaultfd: fix checks for huge PMDs
has been removed from the -mm tree. Its filename was
userfaultfd-fix-checks-for-huge-pmds.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: userfaultfd: fix checks for huge PMDs
Date: Tue, 13 Aug 2024 22:25:21 +0200
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic other thread
============ ============
<zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
<pagefault creates transhuge zeropage>
__pte_alloc [no-op]
<zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@goog…
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@goog…
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-checks-for-huge-pmds
+++ a/mm/userfaultfd.c
@@ -787,21 +787,23 @@ retry:
}
dst_pmdval = pmdp_get_lockless(dst_pmd);
- /*
- * If the dst_pmd is mapped as THP don't
- * override it and just be strict.
- */
- if (unlikely(pmd_trans_huge(dst_pmdval))) {
- err = -EEXIST;
- break;
- }
if (unlikely(pmd_none(dst_pmdval)) &&
unlikely(__pte_alloc(dst_mm, dst_pmd))) {
err = -ENOMEM;
break;
}
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
+ dst_pmdval = pmdp_get_lockless(dst_pmd);
+ /*
+ * If the dst_pmd is THP don't override it and just be strict.
+ * (This includes the case where the PMD used to be THP and
+ * changed back to none after __pte_alloc().)
+ */
+ if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+ pmd_devmap(dst_pmdval))) {
+ err = -EEXIST;
+ break;
+ }
+ if (unlikely(pmd_bad(dst_pmdval))) {
err = -EFAULT;
break;
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-harmless-type-confusion-in-lock_vma_under_rcu.patch
The quilt patch titled
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
has been removed from the -mm tree. Its filename was
mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Will Deacon <will(a)kernel.org>
Subject: mm: vmalloc: ensure vmap_block is initialised before adding to queue
Date: Mon, 12 Aug 2024 18:16:06 +0100
Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.
When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised. If another CPU is
concurrently walking the xarray (e.g. via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.
This has been observed as UBSAN errors in Android:
| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
|
| Call trace:
| purge_fragmented_block+0x204/0x21c
| _vm_unmap_aliases+0x170/0x378
| vm_unmap_aliases+0x1c/0x28
| change_memory_common+0x1dc/0x26c
| set_memory_ro+0x18/0x24
| module_enable_ro+0x98/0x238
| do_init_module+0x1b0/0x310
Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.
Link: https://lkml.kernel.org/r/20240812171606.17486-1-will@kernel.org
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <will(a)kernel.org>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Zhaoyang Huang <zhaoyang.huang(a)unisoc.com>
Cc: Hailong.Liu <hailong.liu(a)oppo.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-vmalloc-ensure-vmap_block-is-initialised-before-adding-to-queue
+++ a/mm/vmalloc.c
@@ -2626,6 +2626,7 @@ static void *new_vmap_block(unsigned int
vb->dirty_max = 0;
bitmap_set(vb->used_map, 0, (1UL << order));
INIT_LIST_HEAD(&vb->free_list);
+ vb->cpu = raw_smp_processor_id();
xa = addr_to_vb_xa(va->va_start);
vb_idx = addr_to_vb_idx(va->va_start);
@@ -2642,7 +2643,6 @@ static void *new_vmap_block(unsigned int
* integrity together with list_for_each_rcu from read
* side.
*/
- vb->cpu = raw_smp_processor_id();
vbq = per_cpu_ptr(&vmap_block_queue, vb->cpu);
spin_lock(&vbq->lock);
list_add_tail_rcu(&vb->free_list, &vbq->free);
_
Patches currently in -mm which might be from will(a)kernel.org are