6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrei Vagin avagin@google.com
[ Upstream commit bfafe5efa9754ebc991750da0bcca2a6694f3ed3 ]
Previously, seccomp filters were released in release_task(), which required the process to exit and its zombie to be collected. However, exited threads/processes can't trigger any seccomp events, making it more logical to release filters upon task exits.
This adjustment simplifies scenarios where a parent is tracing its child process. The parent process can now handle all events from a seccomp listening descriptor and then call wait to collect a child zombie.
seccomp_filter_release takes the siglock to avoid races with seccomp_sync_threads. There was an idea to bypass taking the lock by checking PF_EXITING, but it can be set without holding siglock if threads have SIGNAL_GROUP_EXIT. This means it can happen concurently with seccomp_filter_release.
This change also fixes another minor problem. Suppose that a group leader installs the new filter without SECCOMP_FILTER_FLAG_TSYNC, exits, and becomes a zombie. Without this change, SECCOMP_FILTER_FLAG_TSYNC from any other thread can never succeed, seccomp_can_sync_threads() will check a zombie leader and is_ancestor() will fail.
Reviewed-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Andrei Vagin avagin@google.com Link: https://lore.kernel.org/r/20240628021014.231976-3-avagin@google.com Reviewed-by: Tycho Andersen tandersen@netflix.com Signed-off-by: Kees Cook kees@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/exit.c | 3 ++- kernel/seccomp.c | 23 ++++++++++++++++++----- 2 files changed, 20 insertions(+), 6 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c index 81fcee45d630..be81342caf1b 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -277,7 +277,6 @@ void release_task(struct task_struct *p) }
write_unlock_irq(&tasklist_lock); - seccomp_filter_release(p); proc_flush_pid(thread_pid); put_pid(thread_pid); release_thread(p); @@ -834,6 +833,8 @@ void __noreturn do_exit(long code) io_uring_files_cancel(); exit_signals(tsk); /* sets PF_EXITING */
+ seccomp_filter_release(tsk); + acct_update_integrals(tsk); group_dead = atomic_dec_and_test(&tsk->signal->live); if (group_dead) { diff --git a/kernel/seccomp.c b/kernel/seccomp.c index e30b60b57614..b02337e95664 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -502,6 +502,9 @@ static inline pid_t seccomp_can_sync_threads(void) /* Skip current, since it is initiating the sync. */ if (thread == caller) continue; + /* Skip exited threads. */ + if (thread->flags & PF_EXITING) + continue;
if (thread->seccomp.mode == SECCOMP_MODE_DISABLED || (thread->seccomp.mode == SECCOMP_MODE_FILTER && @@ -563,18 +566,21 @@ static void __seccomp_filter_release(struct seccomp_filter *orig) * @tsk: task the filter should be released from. * * This function should only be called when the task is exiting as - * it detaches it from its filter tree. As such, READ_ONCE() and - * barriers are not needed here, as would normally be needed. + * it detaches it from its filter tree. PF_EXITING has to be set + * for the task. */ void seccomp_filter_release(struct task_struct *tsk) { - struct seccomp_filter *orig = tsk->seccomp.filter; + struct seccomp_filter *orig;
- /* We are effectively holding the siglock by not having any sighand. */ - WARN_ON(tsk->sighand != NULL); + if (WARN_ON((tsk->flags & PF_EXITING) == 0)) + return;
+ spin_lock_irq(&tsk->sighand->siglock); + orig = tsk->seccomp.filter; /* Detach task from its filter tree. */ tsk->seccomp.filter = NULL; + spin_unlock_irq(&tsk->sighand->siglock); __seccomp_filter_release(orig); }
@@ -602,6 +608,13 @@ static inline void seccomp_sync_threads(unsigned long flags) if (thread == caller) continue;
+ /* + * Skip exited threads. seccomp_filter_release could have + * been already called for this task. + */ + if (thread->flags & PF_EXITING) + continue; + /* Get a task reference for the new leaf node. */ get_seccomp_filter(caller);