From: Tycho Andersen tandersen@netflix.com
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have allocated one of these synthetic paths in bprm_init(). This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Signed-off-by: Tycho Andersen tandersen@netflix.com Suggested-by: Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl CC: Aleksa Sarai cyphar@cyphar.com Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec --- v2: * drop the flag, everyone :) * change the rendered value to f_path.dentry->d_name.name instead of argv[0], Eric v3: * fix up subject line, Eric v4: * switch to no flag, always rewrite approach, with some cleanup suggested by Kees --- fs/exec.c | 36 +++++++++++++++++++++++++++++++++++- include/linux/binfmts.h | 1 + 2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index 6c53920795c2..3b559f598c74 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1347,7 +1347,16 @@ int begin_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, SUID_DUMP_USER);
perf_event_exec(); - __set_task_comm(me, kbasename(bprm->filename), true); + + /* + * If argv0 was set, alloc_bprm() made up a path that will + * probably not be useful to admins running ps or similar. + * Let's fix it up to be something reasonable. + */ + if (bprm->argv0) + __set_task_comm(me, kbasename(bprm->argv0), true); + else + __set_task_comm(me, kbasename(bprm->filename), true);
/* An exec changes our domain. We are no longer part of the thread group */ @@ -1497,9 +1506,28 @@ static void free_bprm(struct linux_binprm *bprm) if (bprm->interp != bprm->filename) kfree(bprm->interp); kfree(bprm->fdpath); + kfree(bprm->argv0); kfree(bprm); }
+static int bprm_add_fixup_comm(struct linux_binprm *bprm, + struct user_arg_ptr argv) +{ + const char __user *p = get_user_arg_ptr(argv, 0); + + /* + * If p == NULL, let's just fall back to fdpath. + */ + if (!p) + return 0; + + bprm->argv0 = strndup_user(p, MAX_ARG_STRLEN); + if (bprm->argv0) + return 0; + + return -EFAULT; +} + static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int flags) { struct linux_binprm *bprm; @@ -1906,6 +1934,12 @@ static int do_execveat_common(int fd, struct filename *filename, goto out_ret; }
+ if (unlikely(bprm->fdpath)) { + retval = bprm_add_fixup_comm(bprm, argv); + if (retval != 0) + goto out_free; + } + retval = count(argv, MAX_ARG_STRINGS); if (retval == 0) pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n", diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index e6c00e860951..bab5121a746b 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -55,6 +55,7 @@ struct linux_binprm { of the time same as filename, but could be different for binfmt_{misc,script} */ const char *fdpath; /* generated filename for execveat */ + const char *argv0; /* argv0 from execveat */ unsigned interp_flags; int execfd; /* File descriptor of the executable */ unsigned long loader, exec;
base-commit: c1e939a21eb111a6d6067b38e8e04b8809b64c4e
From: Tycho Andersen tandersen@netflix.com
In the previous patch we've defined a couple behaviors:
1. execveat(fd, AT_EMPTY_PATH, {"foo"}, ...) should render argv[0] as /proc/pid/comm 2. execveat(fd, AT_EMPTY_PATH, {NULL}, ...) should keep the old behavior of rendering the fd as /proc/pid/comm
and just to be sure keeps working with symlinks, which was a concern in [1], I've added a test for that as well.
The test itself is a bit ugly, because the existing check_execveat_fail() helpers use a hardcoded envp and argv, and we want to "pass" things via the environment to test various argument values, but it seemed cleaner than passing one in everywhere in all the existing tests.
Output looks like:
ok 51 Check success of execveat(6, 'home/tycho/packages/...yyyyyyyyyyyyyyyyyyyy', 0)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 52 Check success of execveat(9, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 53 Check success of execveat(11, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is 9 [ 25.579272] process 'execveat' launched '/dev/fd/9' with NULL argv: empty string added ok 54 Check success of execveat(9, '', 4096)...
[1]: https://lore.kernel.org/all/20240925.152228-private.conflict.frozen.trios-Td... Signed-off-by: Tycho Andersen tandersen@netflix.com --- v4: fix up commit message, use ksft_perror() vs perror(), Shuah --- tools/testing/selftests/exec/execveat.c | 77 ++++++++++++++++++++++++- 1 file changed, 74 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c index 071e03532cba..3a05f8cbd815 100644 --- a/tools/testing/selftests/exec/execveat.c +++ b/tools/testing/selftests/exec/execveat.c @@ -23,9 +23,11 @@
#include "../kselftest.h"
-#define TESTS_EXPECTED 51 +#define TESTS_EXPECTED 54 #define TEST_NAME_LEN (PATH_MAX * 4)
+#define CHECK_COMM "CHECK_COMM" + static char longpath[2 * PATH_MAX] = ""; static char *envp[] = { "IN_TEST=yes", NULL, NULL }; static char *argv[] = { "execveat", "99", NULL }; @@ -237,12 +239,36 @@ static int check_execveat_pathmax(int root_dfd, const char *src, int is_script) return fail; }
+static int check_execveat_comm(int fd, char *argv0, char *expected) +{ + char buf[128], *old_env, *old_argv0; + int ret; + + snprintf(buf, sizeof(buf), CHECK_COMM "=%s", expected); + + old_env = envp[1]; + envp[1] = buf; + + old_argv0 = argv[0]; + argv[0] = argv0; + + ksft_print_msg("Check execveat(AT_EMPTY_PATH)'s comm is %s\n", + expected); + ret = check_execveat_invoked_rc(fd, "", AT_EMPTY_PATH, 0, 0); + + envp[1] = old_env; + argv[0] = old_argv0; + + return ret; +} + static int run_tests(void) { int fail = 0; char *fullname = realpath("execveat", NULL); char *fullname_script = realpath("script", NULL); char *fullname_symlink = concat(fullname, ".symlink"); + char fd_buf[10]; int subdir_dfd = open_or_die("subdir", O_DIRECTORY|O_RDONLY); int subdir_dfd_ephemeral = open_or_die("subdir.ephemeral", O_DIRECTORY|O_RDONLY); @@ -389,6 +415,15 @@ static int run_tests(void)
fail += check_execveat_pathmax(root_dfd, "execveat", 0); fail += check_execveat_pathmax(root_dfd, "script", 1); + + /* /proc/pid/comm gives argv[0] by default */ + fail += check_execveat_comm(fd, "sentinel", "sentinel"); + /* /proc/pid/comm gives argv[0] when invoked via link */ + fail += check_execveat_comm(fd_symlink, "sentinel", "sentinel"); + /* /proc/pid/comm gives fdno if NULL is passed */ + snprintf(fd_buf, sizeof(fd_buf), "%d", fd); + fail += check_execveat_comm(fd, NULL, fd_buf); + return fail; }
@@ -415,9 +450,13 @@ int main(int argc, char **argv) int ii; int rc; const char *verbose = getenv("VERBOSE"); + const char *check_comm = getenv(CHECK_COMM);
- if (argc >= 2) { - /* If we are invoked with an argument, don't run tests. */ + if (argc >= 2 || check_comm) { + /* + * If we are invoked with an argument, or no arguments but a + * command to check, don't run tests. + */ const char *in_test = getenv("IN_TEST");
if (verbose) { @@ -426,6 +465,38 @@ int main(int argc, char **argv) ksft_print_msg("\t[%d]='%s\n'", ii, argv[ii]); }
+ /* If the tests wanted us to check the command, do so. */ + if (check_comm) { + /* TASK_COMM_LEN == 16 */ + char buf[32]; + int fd, ret; + + fd = open("/proc/self/comm", O_RDONLY); + if (fd < 0) { + ksft_perror("open() comm failed"); + exit(1); + } + + ret = read(fd, buf, sizeof(buf)); + if (ret < 0) { + ksft_perror("read() comm failed"); + close(fd); + exit(1); + } + close(fd); + + // trim off the \n + buf[ret-1] = 0; + + if (strcmp(buf, check_comm)) { + ksft_print_msg("bad comm, got: %s expected: %s", + buf, check_comm); + exit(1); + } + + exit(0); + } + /* Check expected environment transferred. */ if (!in_test || strcmp(in_test, "yes") != 0) { ksft_print_msg("no IN_TEST=yes in env\n");
On Wed, Oct 30, 2024 at 02:37:32PM -0600, Tycho Andersen wrote:
From: Tycho Andersen tandersen@netflix.com
In the previous patch we've defined a couple behaviors:
- execveat(fd, AT_EMPTY_PATH, {"foo"}, ...) should render argv[0] as /proc/pid/comm
- execveat(fd, AT_EMPTY_PATH, {NULL}, ...) should keep the old behavior of rendering the fd as /proc/pid/comm
and just to be sure keeps working with symlinks, which was a concern in [1], I've added a test for that as well.
The test itself is a bit ugly, because the existing check_execveat_fail() helpers use a hardcoded envp and argv, and we want to "pass" things via the environment to test various argument values, but it seemed cleaner than passing one in everywhere in all the existing tests.
This test doesn't pass in my CI, running on an i.MX8MP Verdin board. This is an arm64 system and I'm running the tests on NFS.
Output looks like:
ok 51 Check success of execveat(6, 'home/tycho/packages/...yyyyyyyyyyyyyyyyyyyy', 0)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 52 Check success of execveat(9, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 53 Check success of execveat(11, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is 9 [ 25.579272] process 'execveat' launched '/dev/fd/9' with NULL argv: empty string added ok 54 Check success of execveat(9, '', 4096)...
The output when things fail is:
# # Check execveat(AT_EMPTY_PATH)'s comm is sentinel # # bad comm, got: 11 expected: sentinel# child 8257 exited with 1 neither 0 nor 0 # not ok 52 Check success of execveat(11, '', 4096)... # # Check execveat(AT_EMPTY_PATH)'s comm is sentinel # # bad comm, got: 13 expected: sentinel# child 8258 exited with 1 neither 0 nor 0 # not ok 53 Check success of execveat(13, '', 4096)...
Full log from a failing job at:
https://lava.sirena.org.uk/scheduler/job/993508
I didn't do any investigation beyond this.
On Wed, Nov 27, 2024 at 02:25:29PM +0000, Mark Brown wrote:
On Wed, Oct 30, 2024 at 02:37:32PM -0600, Tycho Andersen wrote:
From: Tycho Andersen tandersen@netflix.com
In the previous patch we've defined a couple behaviors:
- execveat(fd, AT_EMPTY_PATH, {"foo"}, ...) should render argv[0] as /proc/pid/comm
- execveat(fd, AT_EMPTY_PATH, {NULL}, ...) should keep the old behavior of rendering the fd as /proc/pid/comm
and just to be sure keeps working with symlinks, which was a concern in [1], I've added a test for that as well.
The test itself is a bit ugly, because the existing check_execveat_fail() helpers use a hardcoded envp and argv, and we want to "pass" things via the environment to test various argument values, but it seemed cleaner than passing one in everywhere in all the existing tests.
This test doesn't pass in my CI, running on an i.MX8MP Verdin board. This is an arm64 system and I'm running the tests on NFS.
Output looks like:
ok 51 Check success of execveat(6, 'home/tycho/packages/...yyyyyyyyyyyyyyyyyyyy', 0)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 52 Check success of execveat(9, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is sentinel ok 53 Check success of execveat(11, '', 4096)... # Check execveat(AT_EMPTY_PATH)'s comm is 9 [ 25.579272] process 'execveat' launched '/dev/fd/9' with NULL argv: empty string added ok 54 Check success of execveat(9, '', 4096)...
The output when things fail is:
# # Check execveat(AT_EMPTY_PATH)'s comm is sentinel # # bad comm, got: 11 expected: sentinel# child 8257 exited with 1 neither 0 nor 0 # not ok 52 Check success of execveat(11, '', 4096)... # # Check execveat(AT_EMPTY_PATH)'s comm is sentinel # # bad comm, got: 13 expected: sentinel# child 8258 exited with 1 neither 0 nor 0 # not ok 53 Check success of execveat(13, '', 4096)...
Full log from a failing job at:
https://lava.sirena.org.uk/scheduler/job/993508
I didn't do any investigation beyond this.
Strange... but this series has been rejected by Linus anyway, so probably not worth investigating further.
Tycho
On Wed, Nov 27, 2024 at 08:00:12AM -0700, Tycho Andersen wrote:
On Wed, Nov 27, 2024 at 02:25:29PM +0000, Mark Brown wrote:
This test doesn't pass in my CI, running on an i.MX8MP Verdin board. This is an arm64 system and I'm running the tests on NFS.
Strange... but this series has been rejected by Linus anyway, so probably not worth investigating further.
Ah, OK - it's still in -next and causing the overall suite to fail.
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
[...]
Applied to for-next/execve, thanks!
[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case https://git.kernel.org/kees/c/7bdc6fc85c9a [2/2] selftests/exec: add a test for execveat()'s comm https://git.kernel.org/kees/c/bd104872311a
Take care,
On Thu, Oct 31, 2024 at 03:10:37PM -0700, Kees Cook wrote:
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
[...]
Applied to for-next/execve, thanks!
[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case https://git.kernel.org/kees/c/7bdc6fc85c9a [2/2] selftests/exec: add a test for execveat()'s comm https://git.kernel.org/kees/c/bd104872311a
I tested this with systemd compiled with -Dfexece=true and it all seems to work fine. Thanks!
Zbyszek
On Sat, Nov 02, 2024 at 11:29:55AM +0000, Zbigniew Jędrzejewski-Szmek wrote:
On Thu, Oct 31, 2024 at 03:10:37PM -0700, Kees Cook wrote:
On Wed, 30 Oct 2024 14:37:31 -0600, Tycho Andersen wrote:
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
[...]
Applied to for-next/execve, thanks!
[1/2] exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case https://git.kernel.org/kees/c/7bdc6fc85c9a [2/2] selftests/exec: add a test for execveat()'s comm https://git.kernel.org/kees/c/bd104872311a
I tested this with systemd compiled with -Dfexece=true and it all seems to work fine. Thanks!
Great; thank you!
On Wed, Oct 30, 2024 at 02:37:31PM -0600, Tycho Andersen wrote:
From: Tycho Andersen tandersen@netflix.com
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have allocated one of these synthetic paths in bprm_init(). This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Signed-off-by: Tycho Andersen tandersen@netflix.com Suggested-by: Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl CC: Aleksa Sarai cyphar@cyphar.com Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
We finally went full circle back to what was originally proposed :)
Reviewed-by: Christian Brauner brauner@kernel.org
linux-kselftest-mirror@lists.linaro.org