The quilt patch titled
Subject: Revert "userfaultfd: don't fail on unrecognized features"
has been removed from the -mm tree. Its filename was
revert-userfaultfd-dont-fail-on-unrecognized-features.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: Revert "userfaultfd: don't fail on unrecognized features"
Date: Wed, 12 Apr 2023 12:38:52 -0400
This is a proposal to revert commit 914eedcb9ba0ff53c33808.
I found this when writing a simple UFFDIO_API test to be the first unit
test in this set. Two things breaks with the commit:
- UFFDIO_API check was lost and missing. According to man page, the
kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa. This
check is needed if the api version will be extended in the future, or
user app won't be able to identify which is a new kernel.
- Feature flags checks were removed, which means UFFDIO_API with a
feature that does not exist will also succeed. According to the man
page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if
unknown features passed in.
Link: https://lore.kernel.org/r/20220722201513.1624158-1-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20230412163922.327282-2-peterx@redhat.com
Fixes: 914eedcb9ba0 ("userfaultfd: don't fail on unrecognized features")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/userfaultfd.c~revert-userfaultfd-dont-fail-on-unrecognized-features
+++ a/fs/userfaultfd.c
@@ -1955,8 +1955,10 @@ static int userfaultfd_api(struct userfa
ret = -EFAULT;
if (copy_from_user(&uffdio_api, buf, sizeof(uffdio_api)))
goto out;
- /* Ignore unsupported features (userspace built against newer kernel) */
- features = uffdio_api.features & UFFD_API_FEATURES;
+ features = uffdio_api.features;
+ ret = -EINVAL;
+ if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
+ goto err_out;
ret = -EPERM;
if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
goto err_out;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
selftests-mm-update-gitignore-with-two-missing-tests.patch
selftests-mm-dump-a-summary-in-run_vmtestssh.patch
selftests-mm-merge-utilh-into-vm_utilh.patch
selftests-mm-use-test_gen_progs-where-proper.patch
selftests-mm-link-vm_utilc-always.patch
selftests-mm-merge-default_huge_page_size-into-one.patch
selftests-mm-use-pm_-macros-in-vm_utilsh.patch
selftests-mm-reuse-pagemap_get_entry-in-vm_utilh.patch
selftests-mm-test-uffdio_zeropage-only-when-hugetlb.patch
selftests-mm-drop-test_uffdio_zeropage_eexist.patch
selftests-mm-create-uffd-common.patch
selftests-mm-split-uffd-tests-into-uffd-stress-and-uffd-unit-tests.patch
selftests-mm-uffd_register.patch
selftests-mm-uffd_open_devsys.patch
selftests-mm-uffdio_api-test.patch
selftests-mm-drop-global-mem_fd-in-uffd-tests.patch
selftests-mm-drop-global-hpage_size-in-uffd-tests.patch
selftests-mm-rename-uffd_stats-to-uffd_args.patch
selftests-mm-let-uffd_handle_page_fault-take-wp-parameter.patch
selftests-mm-allow-allocate_area-to-fail-properly.patch
selftests-mm-add-framework-for-uffd-unit-test.patch
selftests-mm-move-uffd-pagemap-test-to-unit-test.patch
selftests-mm-move-uffd-minor-test-to-unit-test.patch
selftests-mm-move-uffd-sig-events-tests-into-uffd-unit-tests.patch
selftests-mm-move-zeropage-test-into-uffd-unit-tests.patch
selftests-mm-workaround-no-way-to-detect-uffd-minor-wp.patch
selftests-mm-allow-uffd-test-to-skip-properly-with-no-privilege.patch
selftests-mm-drop-sys-dev-test-in-uffd-stress-test.patch
selftests-mm-add-shmem-private-test-to-uffd-stress.patch
selftests-mm-add-uffdio-register-ioctls-test.patch
The quilt patch titled
Subject: writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
has been removed from the -mm tree. Its filename was
writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baokun Li <libaokun1(a)huawei.com>
Subject: writeback, cgroup: fix null-ptr-deref write in bdi_split_work_to_wbs
Date: Mon, 10 Apr 2023 21:08:26 +0800
KASAN report null-ptr-deref:
==================================================================
BUG: KASAN: null-ptr-deref in bdi_split_work_to_wbs+0x5c5/0x7b0
Write of size 8 at addr 0000000000000000 by task sync/943
CPU: 5 PID: 943 Comm: sync Tainted: 6.3.0-rc5-next-20230406-dirty #461
Call Trace:
<TASK>
dump_stack_lvl+0x7f/0xc0
print_report+0x2ba/0x340
kasan_report+0xc4/0x120
kasan_check_range+0x1b7/0x2e0
__kasan_check_write+0x24/0x40
bdi_split_work_to_wbs+0x5c5/0x7b0
sync_inodes_sb+0x195/0x630
sync_inodes_one_sb+0x3a/0x50
iterate_supers+0x106/0x1b0
ksys_sync+0x98/0x160
[...]
==================================================================
The race that causes the above issue is as follows:
cpu1 cpu2
-------------------------|-------------------------
inode_switch_wbs
INIT_WORK(&isw->work, inode_switch_wbs_work_fn)
queue_rcu_work(isw_wq, &isw->work)
// queue_work async
inode_switch_wbs_work_fn
wb_put_many(old_wb, nr_switched)
percpu_ref_put_many
ref->data->release(ref)
cgwb_release
queue_work(cgwb_release_wq, &wb->release_work)
// queue_work async
&wb->release_work
cgwb_release_workfn
ksys_sync
iterate_supers
sync_inodes_one_sb
sync_inodes_sb
bdi_split_work_to_wbs
kmalloc(sizeof(*work), GFP_ATOMIC)
// alloc memory failed
percpu_ref_exit
ref->data = NULL
kfree(data)
wb_get(wb)
percpu_ref_get(&wb->refcnt)
percpu_ref_get_many(ref, 1)
atomic_long_add(nr, &ref->data->count)
atomic64_add(i, v)
// trigger null-ptr-deref
bdi_split_work_to_wbs() traverses &bdi->wb_list to split work into all
wbs. If the allocation of new work fails, the on-stack fallback will be
used and the reference count of the current wb is increased afterwards.
If cgroup writeback membership switches occur before getting the reference
count and the current wb is released as old_wd, then calling wb_get() or
wb_put() will trigger the null pointer dereference above.
This issue was introduced in v4.3-rc7 (see fix tag1). Both
sync_inodes_sb() and __writeback_inodes_sb_nr() calls to
bdi_split_work_to_wbs() can trigger this issue. For scenarios called via
sync_inodes_sb(), originally commit 7fc5854f8c6e ("writeback: synchronize
sync(2) against cgroup writeback membership switches") reduced the
possibility of the issue by adding wb_switch_rwsem, but in v5.14-rc1 (see
fix tag2) removed the "inode_io_list_del_locked(inode, old_wb)" from
inode_switch_wbs_work_fn() so that wb->state contains WB_has_dirty_io,
thus old_wb is not skipped when traversing wbs in bdi_split_work_to_wbs(),
and the issue becomes easily reproducible again.
To solve this problem, percpu_ref_exit() is called under RCU protection to
avoid race between cgwb_release_workfn() and bdi_split_work_to_wbs().
Moreover, replace wb_get() with wb_tryget() in bdi_split_work_to_wbs(),
and skip the current wb if wb_tryget() fails because the wb has already
been shutdown.
Link: https://lkml.kernel.org/r/20230410130826.1492525-1-libaokun1@huawei.com
Fixes: b817525a4a80 ("writeback: bdi_writeback iteration must not skip dying ones")
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel(a)dilger.ca>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Hou Tao <houtao1(a)huawei.com>
Cc: yangerkun <yangerkun(a)huawei.com>
Cc: Zhang Yi <yi.zhang(a)huawei.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fs-writeback.c | 17 ++++++++++-------
mm/backing-dev.c | 12 ++++++++++--
2 files changed, 20 insertions(+), 9 deletions(-)
--- a/fs/fs-writeback.c~writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs
+++ a/fs/fs-writeback.c
@@ -978,6 +978,16 @@ restart:
continue;
}
+ /*
+ * If wb_tryget fails, the wb has been shutdown, skip it.
+ *
+ * Pin @wb so that it stays on @bdi->wb_list. This allows
+ * continuing iteration from @wb after dropping and
+ * regrabbing rcu read lock.
+ */
+ if (!wb_tryget(wb))
+ continue;
+
/* alloc failed, execute synchronously using on-stack fallback */
work = &fallback_work;
*work = *base_work;
@@ -986,13 +996,6 @@ restart:
work->done = &fallback_work_done;
wb_queue_work(wb, work);
-
- /*
- * Pin @wb so that it stays on @bdi->wb_list. This allows
- * continuing iteration from @wb after dropping and
- * regrabbing rcu read lock.
- */
- wb_get(wb);
last_wb = wb;
rcu_read_unlock();
--- a/mm/backing-dev.c~writeback-cgroup-fix-null-ptr-deref-write-in-bdi_split_work_to_wbs
+++ a/mm/backing-dev.c
@@ -507,6 +507,15 @@ static LIST_HEAD(offline_cgwbs);
static void cleanup_offline_cgwbs_workfn(struct work_struct *work);
static DECLARE_WORK(cleanup_offline_cgwbs_work, cleanup_offline_cgwbs_workfn);
+static void cgwb_free_rcu(struct rcu_head *rcu_head)
+{
+ struct bdi_writeback *wb = container_of(rcu_head,
+ struct bdi_writeback, rcu);
+
+ percpu_ref_exit(&wb->refcnt);
+ kfree(wb);
+}
+
static void cgwb_release_workfn(struct work_struct *work)
{
struct bdi_writeback *wb = container_of(work, struct bdi_writeback,
@@ -529,11 +538,10 @@ static void cgwb_release_workfn(struct w
list_del(&wb->offline_node);
spin_unlock_irq(&cgwb_lock);
- percpu_ref_exit(&wb->refcnt);
wb_exit(wb);
bdi_put(bdi);
WARN_ON_ONCE(!list_empty(&wb->b_attached));
- kfree_rcu(wb, rcu);
+ call_rcu(&wb->rcu, cgwb_free_rcu);
}
static void cgwb_release(struct percpu_ref *refcnt)
_
Patches currently in -mm which might be from libaokun1(a)huawei.com are
The quilt patch titled
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
has been removed from the -mm tree. Its filename was
maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
Subject: maple_tree: fix a potential memory leak, OOB access, or other unpredictable bug
Date: Tue, 11 Apr 2023 12:10:04 +0800
In mas_alloc_nodes(), "node->node_count = 0" means to initialize the
node_count field of the new node, but the node may not be a new node. It
may be a node that existed before and node_count has a value, setting it
to 0 will cause a memory leak. At this time, mas->alloc->total will be
greater than the actual number of nodes in the linked list, which may
cause many other errors. For example, out-of-bounds access in
mas_pop_node(), and mas_pop_node() may return addresses that should not be
used. Fix it by initializing node_count only for new nodes.
Also, by the way, an if-else statement was removed to simplify the code.
Link: https://lkml.kernel.org/r/20230411041005.26205-1-zhangpeng.00@bytedance.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
--- a/lib/maple_tree.c~maple_tree-fix-a-potential-memory-leak-oob-access-or-other-unpredictable-bug
+++ a/lib/maple_tree.c
@@ -1303,26 +1303,21 @@ static inline void mas_alloc_nodes(struc
node = mas->alloc;
node->request_count = 0;
while (requested) {
- max_req = MAPLE_ALLOC_SLOTS;
- if (node->node_count) {
- unsigned int offset = node->node_count;
-
- slots = (void **)&node->slot[offset];
- max_req -= offset;
- } else {
- slots = (void **)&node->slot;
- }
-
+ max_req = MAPLE_ALLOC_SLOTS - node->node_count;
+ slots = (void **)&node->slot[node->node_count];
max_req = min(requested, max_req);
count = mt_alloc_bulk(gfp, max_req, slots);
if (!count)
goto nomem_bulk;
+ if (node->node_count == 0) {
+ node->slot[0]->node_count = 0;
+ node->slot[0]->request_count = 0;
+ }
+
node->node_count += count;
allocated += count;
node = node->slot[0];
- node->node_count = 0;
- node->request_count = 0;
requested -= count;
}
mas->alloc->total = allocated;
_
Patches currently in -mm which might be from zhangpeng.00(a)bytedance.com are
mm-kfence-improve-the-performance-of-__kfence_alloc-and-__kfence_free.patch
maple_tree-simplify-mas_wr_node_walk.patch
maple_tree-use-correct-variable-type-in-sizeof.patch
maple_tree-add-a-test-case-to-check-maple_alloc.patch
The quilt patch titled
Subject: tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
has been removed from the -mm tree. Its filename was
tools-mm-page_owner_sortc-fix-tgid-output-when-cull=tg-is-used.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Chou <steve_chou(a)pesi.com.tw>
Subject: tools/mm/page_owner_sort.c: fix TGID output when cull=tg is used
Date: Tue, 11 Apr 2023 11:49:28 +0800
When using cull option with 'tg' flag, the fprintf is using pid instead
of tgid. It should use tgid instead.
Link: https://lkml.kernel.org/r/20230411034929.2071501-1-steve_chou@pesi.com.tw
Fixes: 9c8a0a8e599f4a ("tools/vm/page_owner_sort.c: support for user-defined culling rules")
Signed-off-by: Steve Chou <steve_chou(a)pesi.com.tw>
Cc: Jiajian Ye <yejiajian2018(a)email.szu.edu.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/mm/page_owner_sort.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/mm/page_owner_sort.c~tools-mm-page_owner_sortc-fix-tgid-output-when-cull=tg-is-used
+++ a/tools/mm/page_owner_sort.c
@@ -857,7 +857,7 @@ int main(int argc, char **argv)
if (cull & CULL_PID || filter & FILTER_PID)
fprintf(fout, ", PID %d", list[i].pid);
if (cull & CULL_TGID || filter & FILTER_TGID)
- fprintf(fout, ", TGID %d", list[i].pid);
+ fprintf(fout, ", TGID %d", list[i].tgid);
if (cull & CULL_COMM || filter & FILTER_COMM)
fprintf(fout, ", task_comm_name: %s", list[i].comm);
if (cull & CULL_ALLOCATOR) {
_
Patches currently in -mm which might be from steve_chou(a)pesi.com.tw are
The quilt patch titled
Subject: mm/mempolicy: fix use-after-free of VMA iterator
has been removed from the -mm tree. Its filename was
mm-mempolicy-fix-use-after-free-of-vma-iterator.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mempolicy: fix use-after-free of VMA iterator
Date: Mon, 10 Apr 2023 11:22:05 -0400
set_mempolicy_home_node() iterates over a list of VMAs and calls
mbind_range() on each VMA, which also iterates over the singular list of
the VMA passed in and potentially splits the VMA. Since the VMA iterator
is not passed through, set_mempolicy_home_node() may now point to a stale
node in the VMA tree. This can result in a UAF as reported by syzbot.
Avoid the stale maple tree node by passing the VMA iterator through to the
underlying call to split_vma().
mbind_range() is also overly complicated, since there are two calling
functions and one already handles iterating over the VMAs. Simplify
mbind_range() to only handle merging and splitting of the VMAs.
Align the new loop in do_mbind() and existing loop in
set_mempolicy_home_node() to use the reduced mbind_range() function. This
allows for a single location of the range calculation and avoids
constantly looking up the previous VMA (since this is a loop over the
VMAs).
Link: https://lore.kernel.org/linux-mm/000000000000c93feb05f87e24ad@google.com/
Fixes: 66850be55e8e ("mm/mempolicy: use vma iterator & maple state instead of vma linked list")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Link: https://lkml.kernel.org/r/20230410152205.2294819-1-Liam.Howlett@oracle.com
Tested-by: syzbot+a7c1ec5b1d71ceaa5186(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 102 ++++++++++++++++++++++-------------------------
1 file changed, 48 insertions(+), 54 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-use-after-free-of-vma-iterator
+++ a/mm/mempolicy.c
@@ -790,61 +790,50 @@ static int vma_replace_policy(struct vm_
return err;
}
-/* Step 2: apply policy to a range and do splits. */
-static int mbind_range(struct mm_struct *mm, unsigned long start,
- unsigned long end, struct mempolicy *new_pol)
+/* Split or merge the VMA (if required) and apply the new policy */
+static int mbind_range(struct vma_iterator *vmi, struct vm_area_struct *vma,
+ struct vm_area_struct **prev, unsigned long start,
+ unsigned long end, struct mempolicy *new_pol)
{
- VMA_ITERATOR(vmi, mm, start);
- struct vm_area_struct *prev;
- struct vm_area_struct *vma;
- int err = 0;
+ struct vm_area_struct *merged;
+ unsigned long vmstart, vmend;
pgoff_t pgoff;
+ int err;
- prev = vma_prev(&vmi);
- vma = vma_find(&vmi, end);
- if (WARN_ON(!vma))
+ vmend = min(end, vma->vm_end);
+ if (start > vma->vm_start) {
+ *prev = vma;
+ vmstart = start;
+ } else {
+ vmstart = vma->vm_start;
+ }
+
+ if (mpol_equal(vma_policy(vma), new_pol))
return 0;
- if (start > vma->vm_start)
- prev = vma;
+ pgoff = vma->vm_pgoff + ((vmstart - vma->vm_start) >> PAGE_SHIFT);
+ merged = vma_merge(vmi, vma->vm_mm, *prev, vmstart, vmend, vma->vm_flags,
+ vma->anon_vma, vma->vm_file, pgoff, new_pol,
+ vma->vm_userfaultfd_ctx, anon_vma_name(vma));
+ if (merged) {
+ *prev = merged;
+ return vma_replace_policy(merged, new_pol);
+ }
- do {
- unsigned long vmstart = max(start, vma->vm_start);
- unsigned long vmend = min(end, vma->vm_end);
-
- if (mpol_equal(vma_policy(vma), new_pol))
- goto next;
-
- pgoff = vma->vm_pgoff +
- ((vmstart - vma->vm_start) >> PAGE_SHIFT);
- prev = vma_merge(&vmi, mm, prev, vmstart, vmend, vma->vm_flags,
- vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx,
- anon_vma_name(vma));
- if (prev) {
- vma = prev;
- goto replace;
- }
- if (vma->vm_start != vmstart) {
- err = split_vma(&vmi, vma, vmstart, 1);
- if (err)
- goto out;
- }
- if (vma->vm_end != vmend) {
- err = split_vma(&vmi, vma, vmend, 0);
- if (err)
- goto out;
- }
-replace:
- err = vma_replace_policy(vma, new_pol);
+ if (vma->vm_start != vmstart) {
+ err = split_vma(vmi, vma, vmstart, 1);
if (err)
- goto out;
-next:
- prev = vma;
- } for_each_vma_range(vmi, vma, end);
+ return err;
+ }
-out:
- return err;
+ if (vma->vm_end != vmend) {
+ err = split_vma(vmi, vma, vmend, 0);
+ if (err)
+ return err;
+ }
+
+ *prev = vma;
+ return vma_replace_policy(vma, new_pol);
}
/* Set the process memory policy */
@@ -1259,6 +1248,8 @@ static long do_mbind(unsigned long start
nodemask_t *nmask, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma, *prev;
+ struct vma_iterator vmi;
struct mempolicy *new;
unsigned long end;
int err;
@@ -1328,7 +1319,13 @@ static long do_mbind(unsigned long start
goto up_out;
}
- err = mbind_range(mm, start, end, new);
+ vma_iter_init(&vmi, mm, start);
+ prev = vma_prev(&vmi);
+ for_each_vma_range(vmi, vma, end) {
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
+ if (err)
+ break;
+ }
if (!err) {
int nr_failed = 0;
@@ -1489,10 +1486,8 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
unsigned long, home_node, unsigned long, flags)
{
struct mm_struct *mm = current->mm;
- struct vm_area_struct *vma;
+ struct vm_area_struct *vma, *prev;
struct mempolicy *new, *old;
- unsigned long vmstart;
- unsigned long vmend;
unsigned long end;
int err = -ENOENT;
VMA_ITERATOR(vmi, mm, start);
@@ -1521,6 +1516,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
if (end == start)
return 0;
mmap_write_lock(mm);
+ prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
/*
* If any vma in the range got policy other than MPOL_BIND
@@ -1541,9 +1537,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
}
new->home_node = home_node;
- vmstart = max(start, vma->vm_start);
- vmend = min(end, vma->vm_end);
- err = mbind_range(mm, vmstart, vmend, new);
+ err = mbind_range(&vmi, vma, &prev, start, end, new);
mpol_put(new);
if (err)
break;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
maple_tree-fix-mas_empty_area-search.patch
mm-mmap-regression-fix-for-unmapped_area_topdown.patch
The quilt patch titled
Subject: mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
has been removed from the -mm tree. Its filename was
mm-huge_memoryc-warn-with-pr_warn_ratelimited-instead-of-vm_warn_on_once_folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Subject: mm/huge_memory.c: warn with pr_warn_ratelimited instead of VM_WARN_ON_ONCE_FOLIO
Date: Thu, 6 Apr 2023 17:20:04 +0900
split_huge_page_to_list() WARNs when called for huge zero pages, which
sounds to me too harsh because it does not imply a kernel bug, but just
notifies the event to admins. On the other hand, this is considered as
critical by syzkaller and makes its testing less efficient, which seems to
me harmful.
So replace the VM_WARN_ON_ONCE_FOLIO with pr_warn_ratelimited.
Link: https://lkml.kernel.org/r/20230406082004.2185420-1-naoya.horiguchi@linux.dev
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: syzbot+07a218429c8d19b1fb25(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/lkml/000000000000a6f34a05e6efcd01@google.com/
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memoryc-warn-with-pr_warn_ratelimited-instead-of-vm_warn_on_once_folio
+++ a/mm/huge_memory.c
@@ -2665,9 +2665,10 @@ int split_huge_page_to_list(struct page
VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
is_hzp = is_huge_zero_page(&folio->page);
- VM_WARN_ON_ONCE_FOLIO(is_hzp, folio);
- if (is_hzp)
+ if (is_hzp) {
+ pr_warn_ratelimited("Called split_huge_page for huge zero page\n");
return -EBUSY;
+ }
if (folio_test_writeback(folio))
return -EBUSY;
_
Patches currently in -mm which might be from naoya.horiguchi(a)nec.com are
The quilt patch titled
Subject: mm/mprotect: fix do_mprotect_pkey() return on error
has been removed from the -mm tree. Its filename was
mm-mprotect-fix-do_mprotect_pkey-return-on-error.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: mm/mprotect: fix do_mprotect_pkey() return on error
Date: Thu, 6 Apr 2023 15:30:50 -0400
When the loop over the VMA is terminated early due to an error, the return
code could be overwritten with ENOMEM. Fix the return code by only
setting the error on early loop termination when the error is not set.
User-visible effects include: attempts to run mprotect() against a
special mapping or with a poorly-aligned hugetlb address should return
-EINVAL, but they presently return -ENOMEM. In other cases an -EACCESS
should be returned.
Link: https://lkml.kernel.org/r/20230406193050.1363476-1-Liam.Howlett@oracle.com
Fixes: 2286a6914c77 ("mm: change mprotect_fixup to vma iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mprotect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/mprotect.c~mm-mprotect-fix-do_mprotect_pkey-return-on-error
+++ a/mm/mprotect.c
@@ -838,7 +838,7 @@ static int do_mprotect_pkey(unsigned lon
}
tlb_finish_mmu(&tlb);
- if (vma_iter_end(&vmi) < end)
+ if (!error && vma_iter_end(&vmi) < end)
error = -ENOMEM;
out:
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-make-maple-state-reusable-after-mas_empty_area_rev.patch
maple_tree-fix-mas_empty_area-search.patch
mm-mmap-regression-fix-for-unmapped_area_topdown.patch
The quilt patch titled
Subject: mm/khugepaged: check again on anon uffd-wp during isolation
has been removed from the -mm tree. Its filename was
mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/khugepaged: check again on anon uffd-wp during isolation
Date: Wed, 5 Apr 2023 11:51:20 -0400
Khugepaged collapse an anonymous thp in two rounds of scans. The 2nd
round done in __collapse_huge_page_isolate() after
hpage_collapse_scan_pmd(), during which all the locks will be released
temporarily. It means the pgtable can change during this phase before 2nd
round starts.
It's logically possible some ptes got wr-protected during this phase, and
we can errornously collapse a thp without noticing some ptes are
wr-protected by userfault. e1e267c7928f wanted to avoid it but it only
did that for the 1st phase, not the 2nd phase.
Since __collapse_huge_page_isolate() happens after a round of small page
swapins, we don't need to worry on any !present ptes - if it existed
khugepaged will already bail out. So we only need to check present ptes
with uffd-wp bit set there.
This is something I found only but never had a reproducer, I thought it
was one caused a bug in Muhammad's recent pagemap new ioctl work, but it
turns out it's not the cause of that but an userspace bug. However this
seems to still be a real bug even with a very small race window, still
worth to have it fixed and copy stable.
Link: https://lkml.kernel.org/r/20230405155120.3608140-1-peterx@redhat.com
Fixes: e1e267c7928f ("khugepaged: skip collapse if uffd-wp detected")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/khugepaged.c~mm-khugepaged-check-again-on-anon-uffd-wp-during-isolation
+++ a/mm/khugepaged.c
@@ -573,6 +573,10 @@ static int __collapse_huge_page_isolate(
result = SCAN_PTE_NON_PRESENT;
goto out;
}
+ if (pte_uffd_wp(pteval)) {
+ result = SCAN_PTE_UFFD_WP;
+ goto out;
+ }
page = vm_normal_page(vma, address, pteval);
if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
selftests-mm-update-gitignore-with-two-missing-tests.patch
selftests-mm-dump-a-summary-in-run_vmtestssh.patch
selftests-mm-merge-utilh-into-vm_utilh.patch
selftests-mm-use-test_gen_progs-where-proper.patch
selftests-mm-link-vm_utilc-always.patch
selftests-mm-merge-default_huge_page_size-into-one.patch
selftests-mm-use-pm_-macros-in-vm_utilsh.patch
selftests-mm-reuse-pagemap_get_entry-in-vm_utilh.patch
selftests-mm-test-uffdio_zeropage-only-when-hugetlb.patch
selftests-mm-drop-test_uffdio_zeropage_eexist.patch
selftests-mm-create-uffd-common.patch
selftests-mm-split-uffd-tests-into-uffd-stress-and-uffd-unit-tests.patch
selftests-mm-uffd_register.patch
selftests-mm-uffd_open_devsys.patch
selftests-mm-uffdio_api-test.patch
selftests-mm-drop-global-mem_fd-in-uffd-tests.patch
selftests-mm-drop-global-hpage_size-in-uffd-tests.patch
selftests-mm-rename-uffd_stats-to-uffd_args.patch
selftests-mm-let-uffd_handle_page_fault-take-wp-parameter.patch
selftests-mm-allow-allocate_area-to-fail-properly.patch
selftests-mm-add-framework-for-uffd-unit-test.patch
selftests-mm-move-uffd-pagemap-test-to-unit-test.patch
selftests-mm-move-uffd-minor-test-to-unit-test.patch
selftests-mm-move-uffd-sig-events-tests-into-uffd-unit-tests.patch
selftests-mm-move-zeropage-test-into-uffd-unit-tests.patch
selftests-mm-workaround-no-way-to-detect-uffd-minor-wp.patch
selftests-mm-allow-uffd-test-to-skip-properly-with-no-privilege.patch
selftests-mm-drop-sys-dev-test-in-uffd-stress-test.patch
selftests-mm-add-shmem-private-test-to-uffd-stress.patch
selftests-mm-add-uffdio-register-ioctls-test.patch
The quilt patch titled
Subject: mm/userfaultfd: fix uffd-wp handling for THP migration entries
has been removed from the -mm tree. Its filename was
mm-userfaultfd-fix-uffd-wp-handling-for-thp-migration-entries.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/userfaultfd: fix uffd-wp handling for THP migration entries
Date: Wed, 5 Apr 2023 18:02:35 +0200
Looks like what we fixed for hugetlb in commit 44f86392bdd1 ("mm/hugetlb:
fix uffd-wp handling for migration entries in
hugetlb_change_protection()") similarly applies to THP.
Setting/clearing uffd-wp on THP migration entries is not implemented
properly. Further, while removing migration PMDs considers the uffd-wp
bit, inserting migration PMDs does not consider the uffd-wp bit.
We have to set/clear independently of the migration entry type in
change_huge_pmd() and properly copy the uffd-wp bit in
set_pmd_migration_entry().
Verified using a simple reproducer that triggers migration of a THP, that
the set_pmd_migration_entry() no longer loses the uffd-wp bit.
Link: https://lkml.kernel.org/r/20230405160236.587705-2-david@redhat.com
Fixes: f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
--- a/mm/huge_memory.c~mm-userfaultfd-fix-uffd-wp-handling-for-thp-migration-entries
+++ a/mm/huge_memory.c
@@ -1838,10 +1838,10 @@ int change_huge_pmd(struct mmu_gather *t
if (is_swap_pmd(*pmd)) {
swp_entry_t entry = pmd_to_swp_entry(*pmd);
struct page *page = pfn_swap_entry_to_page(entry);
+ pmd_t newpmd;
VM_BUG_ON(!is_pmd_migration_entry(*pmd));
if (is_writable_migration_entry(entry)) {
- pmd_t newpmd;
/*
* A protection check is difficult so
* just be safe and disable write
@@ -1855,8 +1855,16 @@ int change_huge_pmd(struct mmu_gather *t
newpmd = pmd_swp_mksoft_dirty(newpmd);
if (pmd_swp_uffd_wp(*pmd))
newpmd = pmd_swp_mkuffd_wp(newpmd);
- set_pmd_at(mm, addr, pmd, newpmd);
+ } else {
+ newpmd = *pmd;
}
+
+ if (uffd_wp)
+ newpmd = pmd_swp_mkuffd_wp(newpmd);
+ else if (uffd_wp_resolve)
+ newpmd = pmd_swp_clear_uffd_wp(newpmd);
+ if (!pmd_same(*pmd, newpmd))
+ set_pmd_at(mm, addr, pmd, newpmd);
goto unlock;
}
#endif
@@ -3251,6 +3259,8 @@ int set_pmd_migration_entry(struct page_
pmdswp = swp_entry_to_pmd(entry);
if (pmd_soft_dirty(pmdval))
pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+ if (pmd_uffd_wp(pmdval))
+ pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
page_remove_rmap(page, vma, true);
put_page(page);
_
Patches currently in -mm which might be from david(a)redhat.com are
m68k-mm-use-correct-bit-number-in-_page_swp_exclusive-comment.patch
mm-userfaultfd-dont-consider-uffd-wp-bit-of-writable-migration-entries.patch
selftests-mm-reuse-read_pmd_pagesize-in-cow-selftest.patch
selftests-mm-mkdirty-test-behavior-of-ptepmd_mkdirty-on-vmas-without-write-permissions.patch
sparc-mm-dont-unconditionally-set-hw-writable-bit-when-setting-pte-dirty-on-64bit.patch
mm-migrate-revert-mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64.patch
mm-huge_memory-revert-partly-revert-mm-thp-carry-over-dirty-bit-when-thp-splits-on-pmd.patch
mm-huge_memory-conditionally-call-maybe_mkwrite-and-drop-pte_wrprotect-in-__split_huge_pmd_locked.patch