On Thu, 23 Oct 2025 at 08:29, Peng Wang <peng_wang@linux.alibaba.com> wrote:
An invalid pointer dereference was reported on an arm64 CPU; it has not been observed on x86 so far. A partial oops looks like:
Call trace:
 update_cfs_rq_h_load+0x80/0xb0
 wake_affine+0x158/0x168
 select_task_rq_fair+0x364/0x3a8
 try_to_wake_up+0x154/0x648
 wake_up_q+0x68/0xd0
 futex_wake_op+0x280/0x4c8
 do_futex+0x198/0x1c0
 __arm64_sys_futex+0x11c/0x198
Link: https://lore.kernel.org/all/20251013071820.1531295-1-CruzZhao@linux.alibaba....
We found that the task_group corresponding to the problematic se is no longer on its parent task_group's children list, which indicates that ->h_load_next pointed to a stale, invalid address. Consider the following cgroup and task hierarchy:
        A
       / \
      /   \
     B     E
    / \    |
   /   \   t2
  C     D
  |     |
  t0    t1
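To make the "set list" and "traverse" steps concrete, here is a minimal userspace model of the hierarchy above. It assumes the two-phase walk implied by the timeline: the waking CPU first clears the leaf queue's ->h_load_next and walks up, pointing each ancestor queue at the child entity on the path, then walks back down from the root A by following ->h_load_next. The structs are simplified stand-ins for the kernel's cfs_rq and sched_entity, the helpers (struct hierarchy, link_group(), build_hierarchy(), set_h_load_list(), traverse()) are hypothetical, and the load arithmetic and any caching done by the real update_cfs_rq_h_load() are left out.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel structs: only the fields needed
 * to model the ->h_load_next walk are kept. */
struct sched_entity;

struct cfs_rq {
	const char *name;
	struct sched_entity *h_load_next; /* child entity to visit on the way down */
	struct sched_entity *se;          /* group entity owning this queue (NULL for root) */
};

struct sched_entity {
	struct cfs_rq *cfs_rq;       /* queue this entity is enqueued on (parent's queue) */
	struct cfs_rq *my_q;         /* queue owned by this group entity */
	struct sched_entity *parent; /* NULL when the parent queue is the root */
};

/* Hypothetical container for the hierarchy in the diagram; tasks are not modelled. */
struct hierarchy {
	struct cfs_rq A, B, C, D, E;
	struct sched_entity seB, seC, seD, seE;
};

static void link_group(struct cfs_rq *q, struct sched_entity *se,
		       struct cfs_rq *parent_q, struct sched_entity *parent_se)
{
	q->se = se;
	se->my_q = q;
	se->cfs_rq = parent_q;
	se->parent = parent_se;
}

static void build_hierarchy(struct hierarchy *h)
{
	memset(h, 0, sizeof(*h));
	h->A.name = "A"; h->B.name = "B"; h->C.name = "C";
	h->D.name = "D"; h->E.name = "E";
	link_group(&h->B, &h->seB, &h->A, NULL);   /* B under root A */
	link_group(&h->E, &h->seE, &h->A, NULL);   /* E under root A */
	link_group(&h->C, &h->seC, &h->B, &h->seB); /* C under B */
	link_group(&h->D, &h->seD, &h->B, &h->seB); /* D under B */
}

/* "set list": clear the leaf's link, then walk up so that every ancestor
 * queue points at the child entity on the path to the leaf. */
static void set_h_load_list(struct cfs_rq *leaf)
{
	struct sched_entity *se;

	leaf->h_load_next = NULL;
	for (se = leaf->se; se; se = se->parent)
		se->cfs_rq->h_load_next = se;
}

/* "traverse": start at the root and follow ->h_load_next down,
 * recording the visited queues as e.g. "A->B->C". */
static void traverse(const struct cfs_rq *root, char *buf, size_t len)
{
	const struct cfs_rq *q = root;
	const struct sched_entity *se;
	size_t used;

	snprintf(buf, len, "%s", q->name);
	while ((se = q->h_load_next) != NULL) {
		q = se->my_q;
		used = strlen(buf);
		snprintf(buf + used, len - used, "->%s", q->name);
	}
}
```

For example, after build_hierarchy(&h) and set_h_load_list(&h.C), traverse(&h.A, buf, sizeof(buf)) yields "A->B->C". A later set_h_load_list(&h.E) rewrites only A's link, so B keeps pointing at C's entity; that leftover B->C link is what becomes stale once C is destroyed.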
The following timing sequence may trigger the problem:
CPU X               CPU Y               CPU Z
wakeup t0
set list A->B->C
traverse A->B->C
t0 exits
destroy C
                    wakeup t2
                    set list A->E
                                        wakeup t1
                                        set list A->B->D
                    traverse A->B->C
                    panic
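CPU Z sets the ->h_load_next list to A->B->D, but because of arm64's weaker memory ordering (unlike x86, it does not guarantee that other CPUs observe one CPU's stores in program order), CPU Y may observe A->B before it observes B->D. In that window, Y follows B's stale ->h_load_next, traverses A->B->C, and reaches the invalid se of the already destroyed group C.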
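The window described above can be replayed deterministically in a single thread. This sketch is not a concurrency test: instead of provoking arm64 reordering, it hard-codes which of CPU Z's stores CPU Y has observed. It assumes CPU Z issues its stores leaf first (D, then B->D, then A->B), so the reordered case makes only the final A->B store visible. The freed flag, struct store, apply(), traverse() and replay_race() are hypothetical; traverse() returns '!' where the kernel would dereference a freed entity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structs. */
struct sched_entity;

struct cfs_rq {
	char name;
	struct sched_entity *h_load_next;
};

struct sched_entity {
	struct cfs_rq *my_q; /* queue owned by this group entity */
	bool freed;          /* models the entity's memory having been released */
};

/* One store "cfs_rq->h_load_next = val" issued by CPU Z. */
struct store {
	struct cfs_rq *cfs_rq;
	struct sched_entity *val;
};

/* Make one of CPU Z's stores visible to the observing CPU (Y). */
static void apply(const struct store *s)
{
	s->cfs_rq->h_load_next = s->val;
}

/* Walk down from the root; return the last queue reached, or '!' if the
 * walk would dereference a freed entity (the panic in the timeline). */
static char traverse(struct cfs_rq *root)
{
	struct cfs_rq *q = root;
	struct sched_entity *se;

	while ((se = q->h_load_next) != NULL) {
		if (se->freed)
			return '!';
		q = se->my_q;
	}
	return q->name;
}

static char replay_race(bool reordered)
{
	struct cfs_rq A = { 'A', NULL }, B = { 'B', NULL }, C = { 'C', NULL },
		      D = { 'D', NULL }, E = { 'E', NULL };
	struct sched_entity seB = { &B, false }, seC = { &C, false },
			    seD = { &D, false }, seE = { &E, false };
	/* CPU Z: wakeup t1, set list A->B->D, in program order (leaf first). */
	struct store z[] = { { &D, NULL }, { &B, &seD }, { &A, &seB } };
	size_t i;

	/* CPU X: wakeup t0, set list A->B->C. */
	C.h_load_next = NULL;
	B.h_load_next = &seC;
	A.h_load_next = &seB;

	/* t0 exits, destroy C: B's link to C's entity is left dangling. */
	seC.freed = true;

	/* CPU Y: wakeup t2, set list A->E. */
	E.h_load_next = NULL;
	A.h_load_next = &seE;

	if (reordered)
		apply(&z[2]); /* Y sees A->B, but B->D is not yet visible */
	else
		for (i = 0; i < sizeof(z) / sizeof(z[0]); i++)
			apply(&z[i]); /* all of Z's stores visible, in order */

	/* CPU Y: traverse from the root. */
	return traverse(&A);
}
```

replay_race(false) makes all of Z's stores visible in program order and the walk ends at D; replay_race(true) exposes only A->B, so the walk follows B's stale link and hits C's freed entity.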
We can avoid the stale pointer access by clearing the parent cfs_rq's ->h_load_next in unregister_fair_sched_group() when it still points to the sched_entity being freed.
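The effect of the fix can be checked with the same kind of single-threaded replay. unregister_group() below is a hypothetical model of the added hunk: before C's entity is marked freed, it clears B's ->h_load_next if that still points at the entity. Plain loads and stores stand in for READ_ONCE()/WRITE_ONCE() because the model is single-threaded, only the queues on the racing path (A, B, C) are modelled, and replay_reordered() hard-codes the case where CPU Y sees Z's A->B store but not its B->D store.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structs. */
struct sched_entity;

struct cfs_rq {
	char name;
	struct sched_entity *h_load_next;
};

struct sched_entity {
	struct cfs_rq *my_q; /* queue owned by this group entity */
	bool freed;          /* models the entity's memory having been released */
};

/* Walk down from the root; '!' marks a would-be use of a freed entity. */
static char traverse(struct cfs_rq *root)
{
	struct cfs_rq *q = root;
	struct sched_entity *se;

	while ((se = q->h_load_next) != NULL) {
		if (se->freed)
			return '!';
		q = se->my_q;
	}
	return q->name;
}

/* Hypothetical model of the patch: clear the parent's link to se (if it
 * still points there) before se is freed. */
static void unregister_group(struct cfs_rq *parent_q, struct sched_entity *se,
			     bool clear_h_load_next)
{
	if (clear_h_load_next && parent_q->h_load_next == se)
		parent_q->h_load_next = NULL;
	se->freed = true;
}

static char replay_reordered(bool clear_h_load_next)
{
	struct cfs_rq A = { 'A', NULL }, B = { 'B', NULL }, C = { 'C', NULL };
	struct sched_entity seB = { &B, false }, seC = { &C, false };

	/* CPU X: wakeup t0, set list A->B->C. */
	C.h_load_next = NULL;
	B.h_load_next = &seC;
	A.h_load_next = &seB;

	/* t0 exits, destroy C. */
	unregister_group(&B, &seC, clear_h_load_next);

	/* CPU Z's A->B store is visible to CPU Y; its B->D store is not. */
	A.h_load_next = &seB;

	/* CPU Y: traverse from the root. */
	return traverse(&A);
}
```

With clearing enabled, CPU Y's walk stops at B instead of reaching C's freed entity; with it disabled, the replay reproduces the stale traversal.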
Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()")
Cc: stable@vger.kernel.org
Co-developed-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com>
Signed-off-by: Peng Wang <peng_wang@linux.alibaba.com>
 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cee1793e8277..a5fce15093d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13427,6 +13427,14 @@ void unregister_fair_sched_group(struct task_group *tg)
 				list_del_leaf_cfs_rq(cfs_rq);
 			}
 			remove_entity_load_avg(se);
+
+			/*
+			 * Clear parent's h_load_next if it points to the
+			 * sched_entity being freed to avoid stale pointer.
+			 */
+			struct cfs_rq *parent_cfs_rq = cfs_rq_of(se);
Move the declaration to the beginning of the if (se) { block.
+			if (READ_ONCE(parent_cfs_rq->h_load_next) == se)
+				WRITE_ONCE(parent_cfs_rq->h_load_next, NULL);
 		}
 
 		/*
-- 
2.27.0