release_task, where the seccomp's filter refcounter is released, is not called for the case when the fork/clone is terminated midway by a signal. This leaves an extra reference that prevents filter from being destroyed even after all processes using it exit leading to a BPF JIT memory leak. Dereference the refcounter in the failure path of the copy_process function.
Fixes: 3a15fb6ed92c ("seccomp: release filter after task is fully dead") Cc: Christian Brauner brauner@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Tymoshenko ovt@google.com --- kernel/fork.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/kernel/fork.c b/kernel/fork.c index 90c85b17bf69..20f7a3e91354 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1763,6 +1763,21 @@ static void copy_seccomp(struct task_struct *p) #endif }
+static void release_seccomp(struct task_struct *p) +{ +#ifdef CONFIG_SECCOMP + /* + * Must be called with sighand->lock held, which is common to + * all threads in the group. Holding cred_guard_mutex is not + * needed because this new task is not yet running and cannot + * be racing exec. + */ + assert_spin_locked(¤t->sighand->siglock); + + seccomp_filter_release(p); +#endif +} + SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr) { current->clear_child_tid = tidptr; @@ -2495,6 +2510,7 @@ static __latent_entropy struct task_struct *copy_process( return p;
bad_fork_cancel_cgroup: + release_seccomp(p); sched_core_free(p); spin_unlock(¤t->sighand->siglock); write_unlock_irq(&tasklist_lock);
On Fri, Sep 02, 2022 at 03:41:35AM +0000, Oleksandr Tymoshenko wrote:
release_task, where the seccomp's filter refcounter is released, is not called for the case when the fork/clone is terminated midway by a signal. This leaves an extra reference that prevents filter from being destroyed even after all processes using it exit leading to a BPF JIT memory leak. Dereference the refcounter in the failure path of the copy_process function.
Fixes: 3a15fb6ed92c ("seccomp: release filter after task is fully dead") Cc: Christian Brauner brauner@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Oleksandr Tymoshenko ovt@google.com
Hey Oleksandr,
Thanks for the patch! I'm really puzzled as to why we never noticed this and I'm trying to re-architect how this happend. But in any case, there's a patch in the seccomp tree that fixes this:
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?id=6d...
which is slighly different from your approach in that it moves copy_seccomp() after the point of no return. Let us know if you see any issues with this!
Christian
linux-stable-mirror@lists.linaro.org