Linux stable v5.10.226 suffers a lockdep warning when accessing /proc/PID/cpuset. cset_cgroup_from_root() is called without cgroup_mutex is held, which causes assertion failure.
Bisect blames 5.10.225 commit 688325078a8b ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()"). I've have not easily reproduced the problem that this change fixes, so I'm not sure if it's best to revert the fix or adapt it to meet the 5.10 locking expectations.
The lockdep complaint:
$ cat /proc/1/cpuset $ dmesg [ 198.744891] ------------[ cut here ]------------ [ 198.744918] WARNING: CPU: 4 PID: 9301 at kernel/cgroup/cgroup.c:1395 cset_cgroup_from_root+0xb2/0xd0 [ 198.744957] RIP: 0010:cset_cgroup_from_root+0xb2/0xd0 [ 198.744960] Code: 02 00 00 74 11 48 8b 09 48 39 cb 75 eb eb 19 49 83 c6 10 4c 89 f0 48 85 c0 74 0d 5b 41 5e c3 48 8b 43 60 48 85 c0 75 f3 0f 0b <0f> 0b 83 3d 69 01 ee 01 00 0f 85 78 ff ff ff eb 8b 0f 0b eb 87 66 [ 198.744962] RSP: 0018:ffffb492608a7ce8 EFLAGS: 00010046 [ 198.744977] RAX: 0000000000000000 RBX: ffffffff8f4171b8 RCX: cc949de848c33e00 [ 198.744979] RDX: 0000000000001000 RSI: ffffffff8f415450 RDI: ffff92e5417c4dc0 [ 198.744981] RBP: ffff9303467e3f00 R08: 0000000000000008 R09: ffffffff9122d568 [ 198.744983] R10: ffff92e5417c4380 R11: 0000000000000000 R12: ffff92e3d9506000 [ 198.744984] R13: 0000000000000000 R14: ffff92e443a96000 R15: ffff92e3d9506000 [ 198.744987] FS: 00007f15d94ed740(0000) GS:ffff9302bf500000(0000) knlGS:0000000000000000 [ 198.744988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 198.744990] CR2: 00007f15d94ca000 CR3: 00000002816ca003 CR4: 00000000001706e0 [ 198.744992] Call Trace: [ 198.744996] ? __warn+0xcd/0x1c0 [ 198.745000] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745008] ? report_bug+0x87/0xf0 [ 198.745015] ? handle_bug+0x42/0x80 [ 198.745017] ? exc_invalid_op+0x16/0x70 [ 198.745021] ? asm_exc_invalid_op+0x12/0x20 [ 198.745030] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745034] ? cset_cgroup_from_root+0x28/0xd0 [ 198.745038] cgroup_path_ns_locked+0x23/0x50 [ 198.745044] proc_cpuset_show+0x115/0x210 [ 198.745049] proc_single_show+0x4a/0xa0 [ 198.745056] seq_read_iter+0x14d/0x400 [ 198.745063] seq_read+0x103/0x130 [ 198.745074] vfs_read+0xea/0x320 [ 198.745078] ? do_user_addr_fault+0x25b/0x390 [ 198.745085] ? do_user_addr_fault+0x25b/0x390 [ 198.745090] ksys_read+0x70/0xe0 [ 198.745096] do_syscall_64+0x2d/0x40 [ 198.745099] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Greg Thelen wrote:
Linux stable v5.10.226 suffers a lockdep warning when accessing /proc/PID/cpuset. cset_cgroup_from_root() is called without cgroup_mutex is held, which causes assertion failure.
Bisect blames 5.10.225 commit 688325078a8b ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()"). I've have not easily reproduced the problem that this change fixes, so I'm not sure if it's best to revert the fix or adapt it to meet the 5.10 locking expectations.
The lockdep complaint:
$ cat /proc/1/cpuset $ dmesg [ 198.744891] ------------[ cut here ]------------ [ 198.744918] WARNING: CPU: 4 PID: 9301 at kernel/cgroup/cgroup.c:1395 cset_cgroup_from_root+0xb2/0xd0 [ 198.744957] RIP: 0010:cset_cgroup_from_root+0xb2/0xd0 [ 198.744960] Code: 02 00 00 74 11 48 8b 09 48 39 cb 75 eb eb 19 49 83 c6 10 4c 89 f0 48 85 c0 74 0d 5b 41 5e c3 48 8b 43 60 48 85 c0 75 f3 0f 0b <0f> 0b 83 3d 69 01 ee 01 00 0f 85 78 ff ff ff eb 8b 0f 0b eb 87 66 [ 198.744962] RSP: 0018:ffffb492608a7ce8 EFLAGS: 00010046 [ 198.744977] RAX: 0000000000000000 RBX: ffffffff8f4171b8 RCX: cc949de848c33e00 [ 198.744979] RDX: 0000000000001000 RSI: ffffffff8f415450 RDI: ffff92e5417c4dc0 [ 198.744981] RBP: ffff9303467e3f00 R08: 0000000000000008 R09: ffffffff9122d568 [ 198.744983] R10: ffff92e5417c4380 R11: 0000000000000000 R12: ffff92e3d9506000 [ 198.744984] R13: 0000000000000000 R14: ffff92e443a96000 R15: ffff92e3d9506000 [ 198.744987] FS: 00007f15d94ed740(0000) GS:ffff9302bf500000(0000) knlGS:0000000000000000 [ 198.744988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 198.744990] CR2: 00007f15d94ca000 CR3: 00000002816ca003 CR4: 00000000001706e0 [ 198.744992] Call Trace: [ 198.744996] ? __warn+0xcd/0x1c0 [ 198.745000] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745008] ? report_bug+0x87/0xf0 [ 198.745015] ? handle_bug+0x42/0x80 [ 198.745017] ? exc_invalid_op+0x16/0x70 [ 198.745021] ? asm_exc_invalid_op+0x12/0x20 [ 198.745030] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745034] ? cset_cgroup_from_root+0x28/0xd0 [ 198.745038] cgroup_path_ns_locked+0x23/0x50 [ 198.745044] proc_cpuset_show+0x115/0x210 [ 198.745049] proc_single_show+0x4a/0xa0 [ 198.745056] seq_read_iter+0x14d/0x400 [ 198.745063] seq_read+0x103/0x130 [ 198.745074] vfs_read+0xea/0x320 [ 198.745078] ? do_user_addr_fault+0x25b/0x390 [ 198.745085] ? do_user_addr_fault+0x25b/0x390 [ 198.745090] ksys_read+0x70/0xe0 [ 198.745096] do_syscall_64+0x2d/0x40 [ 198.745099] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Hello,
we've also encountered this problem. The thing is that commit 688325078a8b ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()") relies on the RCU synchronization changes introduced by commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") which wasn't backported to 5.10 as it couldn't be cleanly applied there. That commit converted access to the root_list synchronization from depending on cgroup mutex to be RCU-safe.
5.15 also has this problem, while 6.1 and later stables have the backport of this RCU-changing commit so they are not affected. As mentioned by Michal here: https://lore.kernel.org/stable/xrc6s5oyf3b5hflsffklogluuvd75h2khanrke2laes3e...
In the next email I'll send the adapted to 5.10/5.15 commit along with its upstream-fix to avoid build failure in some situations. Would be nice if you give them a try. Thanks!
From: Yafang Shao laoar.shao@gmail.com
commit d23b5c577715892c87533b13923306acc6243f93 upstream.
At present, when we perform operations on the cgroup root_list, we must hold the cgroup_mutex, which is a relatively heavyweight lock. In reality, we can make operations on this list RCU-safe, eliminating the need to hold the cgroup_mutex during traversal. Modifications to the list only occur in the cgroup root setup and destroy paths, which should be infrequent in a production environment. In contrast, traversal may occur frequently. Therefore, making it RCU-safe would be beneficial.
Signed-off-by: Yafang Shao laoar.shao@gmail.com Signed-off-by: Tejun Heo tj@kernel.org [fp: adapt to 5.10 mainly because of changes made by e210a89f5b07 ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes")] Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru --- include/linux/cgroup-defs.h | 1 + kernel/cgroup/cgroup-internal.h | 3 ++- kernel/cgroup/cgroup.c | 23 ++++++++++++++++------- 3 files changed, 19 insertions(+), 8 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index c9fafca1c30c..08912a5dd438 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -512,6 +512,7 @@ struct cgroup_root {
/* A list running through the active hierarchies */ struct list_head root_list; + struct rcu_head rcu;
/* Hierarchy-specific flags */ unsigned int flags; diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h index d8fcc139ac05..f38f56b8cc41 100644 --- a/kernel/cgroup/cgroup-internal.h +++ b/kernel/cgroup/cgroup-internal.h @@ -172,7 +172,8 @@ extern struct list_head cgroup_roots;
/* iterate across the hierarchies */ #define for_each_root(root) \ - list_for_each_entry((root), &cgroup_roots, root_list) + list_for_each_entry_rcu((root), &cgroup_roots, root_list, \ + lockdep_is_held(&cgroup_mutex))
/** * for_each_subsys - iterate all enabled cgroup subsystems diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 11400eba6124..3a0e60605cbe 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1304,7 +1304,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
void cgroup_free_root(struct cgroup_root *root) { - kfree(root); + kfree_rcu(root, rcu); }
static void cgroup_destroy_root(struct cgroup_root *root) @@ -1337,7 +1337,7 @@ static void cgroup_destroy_root(struct cgroup_root *root) spin_unlock_irq(&css_set_lock);
if (!list_empty(&root->root_list)) { - list_del(&root->root_list); + list_del_rcu(&root->root_list); cgroup_root_count--; }
@@ -1392,7 +1392,6 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset, { struct cgroup *res = NULL;
- lockdep_assert_held(&cgroup_mutex); lockdep_assert_held(&css_set_lock);
if (cset == &init_css_set) { @@ -1412,13 +1411,23 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset, } }
- BUG_ON(!res); + /* + * If cgroup_mutex is not held, the cgrp_cset_link will be freed + * before we remove the cgroup root from the root_list. Consequently, + * when accessing a cgroup root, the cset_link may have already been + * freed, resulting in a NULL res_cgroup. However, by holding the + * cgroup_mutex, we ensure that res_cgroup can't be NULL. + * If we don't hold cgroup_mutex in the caller, we must do the NULL + * check. + */ return res; }
/* * Return the cgroup for "task" from the given hierarchy. Must be - * called with cgroup_mutex and css_set_lock held. + * called with css_set_lock held to prevent task's groups from being modified. + * Must be called with either cgroup_mutex or rcu read lock to prevent the + * cgroup root from being destroyed. */ struct cgroup *task_cgroup_from_root(struct task_struct *task, struct cgroup_root *root) @@ -1950,7 +1959,7 @@ void init_cgroup_root(struct cgroup_fs_context *ctx) struct cgroup_root *root = ctx->root; struct cgroup *cgrp = &root->cgrp;
- INIT_LIST_HEAD(&root->root_list); + INIT_LIST_HEAD_RCU(&root->root_list); atomic_set(&root->nr_cgrps, 1); cgrp->root = root; init_cgroup_housekeeping(cgrp); @@ -2028,7 +2037,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask) * care of subsystems' refcounts, which are explicitly dropped in * the failure exit path. */ - list_add(&root->root_list, &cgroup_roots); + list_add_rcu(&root->root_list, &cgroup_roots); cgroup_root_count++;
/*
From: Waiman Long longman@redhat.com
commit a7fb0423c201ba12815877a0b5a68a6a1710b23a upstream.
Commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") adds a new rcu_head to the cgroup_root structure and kvfree_rcu() for freeing the cgroup_root.
The current implementation of kvfree_rcu(), however, has the limitation that the offset of the rcu_head structure within the larger data structure must be less than 4096 or the compilation will fail. See the macro definition of __is_kvfree_rcu_offset() in include/linux/rcupdate.h for more information.
By putting rcu_head below the large cgroup structure, any change to the cgroup structure that makes it larger run the risk of causing build failure under certain configurations. Commit 77070eeb8821 ("cgroup: Avoid false cacheline sharing of read mostly rstat_cpu") happens to be the last straw that breaks it. Fix this problem by moving the rcu_head structure up before the cgroup structure.
Fixes: d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Closes: https://lore.kernel.org/lkml/20231207143806.114e0a74@canb.auug.org.au/ Signed-off-by: Waiman Long longman@redhat.com Acked-by: Yafang Shao laoar.shao@gmail.com Reviewed-by: Yosry Ahmed yosryahmed@google.com Reviewed-by: Michal Koutný mkoutny@suse.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru --- include/linux/cgroup-defs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 08912a5dd438..20b2b32f825b 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -501,6 +501,10 @@ struct cgroup_root { /* Unique id for this hierarchy. */ int hierarchy_id;
+ /* A list running through the active hierarchies */ + struct list_head root_list; + struct rcu_head rcu; + /* The root cgroup. Root is destroyed on its release. */ struct cgroup cgrp;
@@ -510,10 +514,6 @@ struct cgroup_root { /* Number of cgroups in the hierarchy, used only for /proc/cgroups */ atomic_t nr_cgrps;
- /* A list running through the active hierarchies */ - struct list_head root_list; - struct rcu_head rcu; - /* Hierarchy-specific flags */ unsigned int flags;
On 2024/9/19 16:47, Fedor Pchelkin wrote:
Greg Thelen wrote:
Linux stable v5.10.226 suffers a lockdep warning when accessing /proc/PID/cpuset. cset_cgroup_from_root() is called without cgroup_mutex is held, which causes assertion failure.
Bisect blames 5.10.225 commit 688325078a8b ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()"). I've have not easily reproduced the problem that this change fixes, so I'm not sure if it's best to revert the fix or adapt it to meet the 5.10 locking expectations.
The lockdep complaint:
$ cat /proc/1/cpuset $ dmesg [ 198.744891] ------------[ cut here ]------------ [ 198.744918] WARNING: CPU: 4 PID: 9301 at kernel/cgroup/cgroup.c:1395 cset_cgroup_from_root+0xb2/0xd0 [ 198.744957] RIP: 0010:cset_cgroup_from_root+0xb2/0xd0 [ 198.744960] Code: 02 00 00 74 11 48 8b 09 48 39 cb 75 eb eb 19 49 83 c6 10 4c 89 f0 48 85 c0 74 0d 5b 41 5e c3 48 8b 43 60 48 85 c0 75 f3 0f 0b <0f> 0b 83 3d 69 01 ee 01 00 0f 85 78 ff ff ff eb 8b 0f 0b eb 87 66 [ 198.744962] RSP: 0018:ffffb492608a7ce8 EFLAGS: 00010046 [ 198.744977] RAX: 0000000000000000 RBX: ffffffff8f4171b8 RCX: cc949de848c33e00 [ 198.744979] RDX: 0000000000001000 RSI: ffffffff8f415450 RDI: ffff92e5417c4dc0 [ 198.744981] RBP: ffff9303467e3f00 R08: 0000000000000008 R09: ffffffff9122d568 [ 198.744983] R10: ffff92e5417c4380 R11: 0000000000000000 R12: ffff92e3d9506000 [ 198.744984] R13: 0000000000000000 R14: ffff92e443a96000 R15: ffff92e3d9506000 [ 198.744987] FS: 00007f15d94ed740(0000) GS:ffff9302bf500000(0000) knlGS:0000000000000000 [ 198.744988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 198.744990] CR2: 00007f15d94ca000 CR3: 00000002816ca003 CR4: 00000000001706e0 [ 198.744992] Call Trace: [ 198.744996] ? __warn+0xcd/0x1c0 [ 198.745000] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745008] ? report_bug+0x87/0xf0 [ 198.745015] ? handle_bug+0x42/0x80 [ 198.745017] ? exc_invalid_op+0x16/0x70 [ 198.745021] ? asm_exc_invalid_op+0x12/0x20 [ 198.745030] ? cset_cgroup_from_root+0xb2/0xd0 [ 198.745034] ? cset_cgroup_from_root+0x28/0xd0 [ 198.745038] cgroup_path_ns_locked+0x23/0x50 [ 198.745044] proc_cpuset_show+0x115/0x210 [ 198.745049] proc_single_show+0x4a/0xa0 [ 198.745056] seq_read_iter+0x14d/0x400 [ 198.745063] seq_read+0x103/0x130 [ 198.745074] vfs_read+0xea/0x320 [ 198.745078] ? do_user_addr_fault+0x25b/0x390 [ 198.745085] ? do_user_addr_fault+0x25b/0x390 [ 198.745090] ksys_read+0x70/0xe0 [ 198.745096] do_syscall_64+0x2d/0x40 [ 198.745099] entry_SYSCALL_64_after_hwframe+0x61/0xcb
Hello,
we've also encountered this problem. The thing is that commit 688325078a8b ("cgroup/cpuset: Prevent UAF in proc_cpuset_show()") relies on the RCU synchronization changes introduced by commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") which wasn't backported to 5.10 as it couldn't be cleanly applied there. That commit converted access to the root_list synchronization from depending on cgroup mutex to be RCU-safe.
5.15 also has this problem, while 6.1 and later stables have the backport of this RCU-changing commit so they are not affected. As mentioned by Michal here: https://lore.kernel.org/stable/xrc6s5oyf3b5hflsffklogluuvd75h2khanrke2laes3e...
Yes, I think commit d23b5c577715 ("cgroup: Make operations on the cgroup root_list RCU safe") is needed.
In the next email I'll send the adapted to 5.10/5.15 commit along with its upstream-fix to avoid build failure in some situations. Would be nice if you give them a try. Thanks!
linux-stable-mirror@lists.linaro.org