Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL. Mutually exclusive with CLONE_SIGHAND to not disturb other thread's signal handler.
In the spirit of closer cooperation between glibc developers and kernel developers (cf. [2]) this patchset came out of a discussion on the glibc mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel support for this feature has been explicitly requested by glibc and I see no reason not to help them with this.
The child helper process on Linux posix_spawn must ensure that no signal handlers are enabled, so the signal disposition must be either SIG_DFL or SIG_IGN. However, it requires a sigprocmask to obtain the current signal mask and at least _NSIG sigaction calls to reset the signal handlers for each posix_spawn call or complex state tracking that might lead to data corruption in glibc. Adding this flags lets glibc avoid these problems.
[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html [3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html [4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html [2]: https://lwn.net/Articles/799331/ '[...] by asking for better cooperation with the C-library projects in general. They should be copied on patches containing ABI changes, for example. I noted that there are often times where C-library developers wish the kernel community had done things differently; how could those be avoided in the future? Members of the audience suggested that more glibc developers should perhaps join the linux-api list. The other suggestion was to "copy Florian on everything".' Cc: Oleg Nesterov oleg@redhat.com Cc: Florian Weimer fweimer@redhat.com Cc: libc-alpha@sourceware.org Cc: linux-api@vger.kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- /* v1 */ Link: https://lore.kernel.org/r/20191010133518.5420-1-christian.brauner@ubuntu.com
/* v2 */ Link: https://lore.kernel.org/r/20191011102537.27502-1-christian.brauner@ubuntu.co... - Florian Weimer fweimer@redhat.com: - update comment in clone3_args_valid()
/* v3 */ - "Michael Kerrisk (man-pages)" mtk.manpages@gmail.com: - s/CLONE3_CLEAR_SIGHAND/CLONE_CLEAR_SIGHAND/g --- include/uapi/linux/sched.h | 3 +++ kernel/fork.c | 16 +++++++++++----- 2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..1d500ed03c63 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -33,6 +33,9 @@ #define CLONE_NEWNET 0x40000000 /* New network namespace */ #define CLONE_IO 0x80000000 /* Clone io context */
+/* Flags for the clone3() syscall. */ +#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */ + #ifndef __ASSEMBLY__ /** * struct clone_args - arguments for the clone3 syscall diff --git a/kernel/fork.c b/kernel/fork.c index 1f6c45f6a734..aa5b5137f071 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk) spin_lock_irq(¤t->sighand->siglock); memcpy(sig->action, current->sighand->action, sizeof(sig->action)); spin_unlock_irq(¤t->sighand->siglock); + + /* Reset all signal handler not set to SIG_IGN to SIG_DFL. */ + if (clone_flags & CLONE_CLEAR_SIGHAND) + flush_signal_handlers(tsk, 0); + return 0; }
@@ -2563,11 +2568,8 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
static bool clone3_args_valid(const struct kernel_clone_args *kargs) { - /* - * All lower bits of the flag word are taken. - * Verify that no other unknown flags are passed along. - */ - if (kargs->flags & ~CLONE_LEGACY_FLAGS) + /* Verify that no unknown flags are passed along. */ + if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND)) return false;
/* @@ -2577,6 +2579,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) if (kargs->flags & (CLONE_DETACHED | CSIGNAL)) return false;
+ if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) == + (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) + return false; + if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) && kargs->exit_signal) return false;
Test that CLONE_CLEAR_SIGHAND resets signal handlers to SIG_DFL for the child process and that CLONE_CLEAR_SIGHAND and CLONE_SIGHAND are mutually exclusive.
Cc: Florian Weimer fweimer@redhat.com Cc: libc-alpha@sourceware.org Cc: linux-api@vger.kernel.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- /* v1 */ Link: https://lore.kernel.org/r/20191010133518.5420-2-christian.brauner@ubuntu.com
/* v2 */ Link: https://lore.kernel.org/r/20191011102537.27502-2-christian.brauner@ubuntu.co... - Christian Brauner christian.brauner@ubuntu.com: - remove unused variable - reuse variable in child process instead od declaring a new one - move check for mutual exclusivity of CLONE_SIGHAND and CLONE3_CLEAR_SIGHAND to top of test before setting up signal handlers - rename variables
/* v3 */ - Christian Brauner christian.brauner@ubuntu.com: - s/CLONE3_CLEAR_SIGHAND/CLONE_CLEAR_SIGHAND/g --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + tools/testing/selftests/clone3/.gitignore | 1 + tools/testing/selftests/clone3/Makefile | 7 + .../selftests/clone3/clone3_clear_sighand.c | 172 ++++++++++++++++++ 5 files changed, 182 insertions(+) create mode 100644 tools/testing/selftests/clone3/.gitignore create mode 100644 tools/testing/selftests/clone3/Makefile create mode 100644 tools/testing/selftests/clone3/clone3_clear_sighand.c
diff --git a/MAINTAINERS b/MAINTAINERS index 55199ef7fa74..582275d85607 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12828,6 +12828,7 @@ S: Maintained T: git git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git F: samples/pidfd/ F: tools/testing/selftests/pidfd/ +F: tools/testing/selftests/clone3/ K: (?i)pidfd K: (?i)clone3 K: \b(clone_args|kernel_clone_args)\b diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c3feccb99ff5..6bf7aeb47650 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -4,6 +4,7 @@ TARGETS += bpf TARGETS += breakpoints TARGETS += capabilities TARGETS += cgroup +TARGETS += clone3 TARGETS += cpufreq TARGETS += cpu-hotplug TARGETS += drivers/dma-buf diff --git a/tools/testing/selftests/clone3/.gitignore b/tools/testing/selftests/clone3/.gitignore new file mode 100644 index 000000000000..6c9f98097774 --- /dev/null +++ b/tools/testing/selftests/clone3/.gitignore @@ -0,0 +1 @@ +clone3_clear_sighand diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile new file mode 100644 index 000000000000..3ecd56ebc99d --- /dev/null +++ b/tools/testing/selftests/clone3/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0-only +CFLAGS += -g -I../../../../usr/include/ + +TEST_GEN_PROGS := clone3_clear_sighand + +include ../lib.mk + diff --git a/tools/testing/selftests/clone3/clone3_clear_sighand.c b/tools/testing/selftests/clone3/clone3_clear_sighand.c new file mode 100644 index 000000000000..0d957be1bdc5 --- /dev/null +++ b/tools/testing/selftests/clone3/clone3_clear_sighand.c @@ -0,0 +1,172 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#define _GNU_SOURCE +#include <errno.h> +#include <sched.h> +#include <signal.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> +#include <linux/sched.h> +#include <linux/types.h> +#include <sys/syscall.h> +#include <sys/wait.h> + +#include "../kselftest.h" + +#ifndef CLONE_CLEAR_SIGHAND +#define CLONE_CLEAR_SIGHAND 0x100000000ULL +#endif + +#ifndef __NR_clone3 +#define __NR_clone3 -1 +struct clone_args { + __aligned_u64 flags; + __aligned_u64 pidfd; + __aligned_u64 child_tid; + __aligned_u64 parent_tid; + __aligned_u64 exit_signal; + __aligned_u64 stack; + __aligned_u64 stack_size; + __aligned_u64 tls; +}; +#endif + +static pid_t sys_clone3(struct clone_args *args, size_t size) +{ + return syscall(__NR_clone3, args, size); +} + +static void test_clone3_supported(void) +{ + pid_t pid; + struct clone_args args = {}; + + if (__NR_clone3 < 0) + ksft_exit_skip("clone3() syscall is not supported\n"); + + /* Set to something that will always cause EINVAL. */ + args.exit_signal = -1; + pid = sys_clone3(&args, sizeof(args)); + if (!pid) + exit(EXIT_SUCCESS); + + if (pid > 0) { + wait(NULL); + ksft_exit_fail_msg( + "Managed to create child process with invalid exit_signal\n"); + } + + if (errno == ENOSYS) + ksft_exit_skip("clone3() syscall is not supported\n"); + + ksft_print_msg("clone3() syscall supported\n"); +} + +static void nop_handler(int signo) +{ +} + +static int wait_for_pid(pid_t pid) +{ + int status, ret; + +again: + ret = waitpid(pid, &status, 0); + if (ret == -1) { + if (errno == EINTR) + goto again; + + return -1; + } + + if (!WIFEXITED(status)) + return -1; + + return WEXITSTATUS(status); +} + +static void test_clone3_clear_sighand(void) +{ + int ret; + pid_t pid; + struct clone_args args = {}; + struct sigaction act; + + /* + * Check that CLONE_CLEAR_SIGHAND and CLONE_SIGHAND are mutually + * exclusive. + */ + args.flags |= CLONE_CLEAR_SIGHAND | CLONE_SIGHAND; + args.exit_signal = SIGCHLD; + pid = sys_clone3(&args, sizeof(args)); + if (pid > 0) + ksft_exit_fail_msg( + "clone3(CLONE_CLEAR_SIGHAND | CLONE_SIGHAND) succeeded\n"); + + act.sa_handler = nop_handler; + ret = sigemptyset(&act.sa_mask); + if (ret < 0) + ksft_exit_fail_msg("%s - sigemptyset() failed\n", + strerror(errno)); + + act.sa_flags = 0; + + /* Register signal handler for SIGUSR1 */ + ret = sigaction(SIGUSR1, &act, NULL); + if (ret < 0) + ksft_exit_fail_msg( + "%s - sigaction(SIGUSR1, &act, NULL) failed\n", + strerror(errno)); + + /* Register signal handler for SIGUSR2 */ + ret = sigaction(SIGUSR2, &act, NULL); + if (ret < 0) + ksft_exit_fail_msg( + "%s - sigaction(SIGUSR2, &act, NULL) failed\n", + strerror(errno)); + + /* Check that CLONE_CLEAR_SIGHAND works. */ + args.flags = CLONE_CLEAR_SIGHAND; + pid = sys_clone3(&args, sizeof(args)); + if (pid < 0) + ksft_exit_fail_msg("%s - clone3(CLONE_CLEAR_SIGHAND) failed\n", + strerror(errno)); + + if (pid == 0) { + ret = sigaction(SIGUSR1, NULL, &act); + if (ret < 0) + exit(EXIT_FAILURE); + + if (act.sa_handler != SIG_DFL) + exit(EXIT_FAILURE); + + ret = sigaction(SIGUSR2, NULL, &act); + if (ret < 0) + exit(EXIT_FAILURE); + + if (act.sa_handler != SIG_DFL) + exit(EXIT_FAILURE); + + exit(EXIT_SUCCESS); + } + + ret = wait_for_pid(pid); + if (ret) + ksft_exit_fail_msg( + "Failed to clear signal handler for child process\n"); + + ksft_test_result_pass("Cleared signal handlers for child process\n"); +} + +int main(int argc, char **argv) +{ + ksft_print_header(); + ksft_set_plan(1); + + test_clone3_supported(); + test_clone3_clear_sighand(); + + return ksft_exit_pass(); +}
On 10/14, Christian Brauner wrote:
The child helper process on Linux posix_spawn must ensure that no signal handlers are enabled, so the signal disposition must be either SIG_DFL or SIG_IGN. However, it requires a sigprocmask to obtain the current signal mask and at least _NSIG sigaction calls to reset the signal handlers for each posix_spawn call
Plus the caller has to block/unblock all signals around clone(VM|VFORK).
Can this justify the new CLONE_ flag? Honestly, I have no idea. But the patch is simple and looks technically correct to me. FWIW,
Reviewed-by: Oleg Nesterov oleg@redhat.com
On 10/21, Oleg Nesterov wrote:
On 10/14, Christian Brauner wrote:
The child helper process on Linux posix_spawn must ensure that no signal handlers are enabled, so the signal disposition must be either SIG_DFL or SIG_IGN. However, it requires a sigprocmask to obtain the current signal mask and at least _NSIG sigaction calls to reset the signal handlers for each posix_spawn call
Plus the caller has to block/unblock all signals around clone(VM|VFORK).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
just in case... I meant that posix_spawn() has to block/unblock, not its caller.
Can this justify the new CLONE_ flag? Honestly, I have no idea. But the patch is simple and looks technically correct to me. FWIW,
Reviewed-by: Oleg Nesterov oleg@redhat.com
On Mon, Oct 21, 2019 at 05:12:55PM +0200, Oleg Nesterov wrote:
On 10/21, Oleg Nesterov wrote:
On 10/14, Christian Brauner wrote:
The child helper process on Linux posix_spawn must ensure that no signal handlers are enabled, so the signal disposition must be either SIG_DFL or SIG_IGN. However, it requires a sigprocmask to obtain the current signal mask and at least _NSIG sigaction calls to reset the signal handlers for each posix_spawn call
Plus the caller has to block/unblock all signals around clone(VM|VFORK).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
just in case... I meant that posix_spawn() has to block/unblock, not its caller.
Yeah, I think that is what's currently happening.
Christian
On Mon, Oct 21, 2019 at 04:46:33PM +0200, Oleg Nesterov wrote:
On 10/14, Christian Brauner wrote:
The child helper process on Linux posix_spawn must ensure that no signal handlers are enabled, so the signal disposition must be either SIG_DFL or SIG_IGN. However, it requires a sigprocmask to obtain the current signal mask and at least _NSIG sigaction calls to reset the signal handlers for each posix_spawn call
Plus the caller has to block/unblock all signals around clone(VM|VFORK).
Can this justify the new CLONE_ flag? Honestly, I have no idea. But the patch is simple and looks technically correct to me. FWIW,
Reviewed-by: Oleg Nesterov oleg@redhat.com
The problem is not just the number of syscalls but also that the sigaction logic is already quite complicated and would need to be even more complicated without this flag. That's covered mostly in the glibc thread though. Even just the ability to avoid potentially _NSIG syscalls is enough justification especially since were not scarce on flags.
Thanks! I'll pick this up for 5.5. Christian
linux-kselftest-mirror@lists.linaro.org