From: Tycho Andersen tandersen@netflix.com
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have allocated one of these synthetic paths in bprm_init(). This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Signed-off-by: Tycho Andersen tandersen@netflix.com Suggested-by: Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl CC: Aleksa Sarai cyphar@cyphar.com Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec --- v2: * drop the flag, everyone :) * change the rendered value to f_path.dentry->d_name.name instead of argv[0], Eric --- fs/exec.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index dad402d55681..9520359a8dcc 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, SUID_DUMP_USER);
perf_event_exec(); - __set_task_comm(me, kbasename(bprm->filename), true); + + /* + * If fdpath was set, execveat() made up a path that will + * probably not be useful to admins running ps or similar. + * Let's fix it up to be something reasonable. + */ + if (bprm->fdpath) { + BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN); + __set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true); + } else { + __set_task_comm(me, kbasename(bprm->filename), true); + }
/* An exec changes our domain. We are no longer part of the thread group */
base-commit: baeb9a7d8b60b021d907127509c44507539c15e5
From: Tycho Andersen tandersen@netflix.com
We want to ensure that /proc/self/comm stays useful for execveat() callers.
Signed-off-by: Tycho Andersen tandersen@netflix.com --- tools/testing/selftests/exec/execveat.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/tools/testing/selftests/exec/execveat.c b/tools/testing/selftests/exec/execveat.c index 071e03532cba..091029f4ca9b 100644 --- a/tools/testing/selftests/exec/execveat.c +++ b/tools/testing/selftests/exec/execveat.c @@ -419,6 +419,9 @@ int main(int argc, char **argv) if (argc >= 2) { /* If we are invoked with an argument, don't run tests. */ const char *in_test = getenv("IN_TEST"); + /* TASK_COMM_LEN == 16 */ + char buf[32]; + int fd;
if (verbose) { ksft_print_msg("invoked with:\n"); @@ -432,6 +435,28 @@ int main(int argc, char **argv) return 1; }
+ fd = open("/proc/self/comm", O_RDONLY); + if (fd < 0) { + perror("open comm"); + return 1; + } + + if (read(fd, buf, sizeof(buf)) < 0) { + close(fd); + perror("read comm"); + return 1; + } + close(fd); + + /* + * /proc/self/comm should fail to convert to an integer, i.e. + * atoi() should return 0. + */ + if (atoi(buf) != 0) { + ksft_print_msg("bad /proc/self/comm: %s", buf); + return 1; + } + /* Use the final argument as an exit code. */ rc = atoi(argv[argc - 1]); exit(rc);
Tycho Andersen tycho@tycho.pizza writes:
From: Tycho Andersen tandersen@netflix.com
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have allocated one of these synthetic paths in bprm_init(). This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Perhaps change the subject to match the code.
Signed-off-by: Tycho Andersen tandersen@netflix.com Suggested-by: Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl CC: Aleksa Sarai cyphar@cyphar.com Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
v2: * drop the flag, everyone :) * change the rendered value to f_path.dentry->d_name.name instead of argv[0], Eric
fs/exec.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index dad402d55681..9520359a8dcc 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, SUID_DUMP_USER); perf_event_exec();
- __set_task_comm(me, kbasename(bprm->filename), true);
- /*
* If fdpath was set, execveat() made up a path that will
* probably not be useful to admins running ps or similar.
* Let's fix it up to be something reasonable.
*/
- if (bprm->fdpath) {
BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
We can just do this regardless of bprm->fdpath.
It will be a change of behavior on when executing symlinks and possibly mount points but I don't think we care. If we do then we can add make it conditional with "if (bprm->fdpath)"
At the very least using the above version unconditionally ought to flush out any bugs.
It should be 99% application invisible as all an application can see is argv0. So it is only ps and friends where the comm value is visible.
Eric
On Fri, Sep 27, 2024 at 10:45:58AM -0500, Eric W. Biederman wrote:
Tycho Andersen tycho@tycho.pizza writes:
From: Tycho Andersen tandersen@netflix.com
Zbigniew mentioned at Linux Plumber's that systemd is interested in switching to execveat() for service execution, but can't, because the contents of /proc/pid/comm are the file descriptor which was used, instead of the path to the binary. This makes the output of tools like top and ps useless, especially in a world where most fds are opened CLOEXEC so the number is truly meaningless.
Change exec path to fix up /proc/pid/comm in the case where we have allocated one of these synthetic paths in bprm_init(). This way the actual exec machinery is unchanged, but cosmetically the comm looks reasonable to admins investigating things.
Perhaps change the subject to match the code.
Signed-off-by: Tycho Andersen tandersen@netflix.com Suggested-by: Zbigniew Jędrzejewski-Szmek zbyszek@in.waw.pl CC: Aleksa Sarai cyphar@cyphar.com Link: https://github.com/uapi-group/kernel-features#set-comm-field-before-exec
v2: * drop the flag, everyone :) * change the rendered value to f_path.dentry->d_name.name instead of argv[0], Eric
fs/exec.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c index dad402d55681..9520359a8dcc 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1416,7 +1416,18 @@ int begin_new_exec(struct linux_binprm * bprm) set_dumpable(current->mm, SUID_DUMP_USER); perf_event_exec();
- __set_task_comm(me, kbasename(bprm->filename), true);
- /*
* If fdpath was set, execveat() made up a path that will
* probably not be useful to admins running ps or similar.
* Let's fix it up to be something reasonable.
*/
- if (bprm->fdpath) {
BUILD_BUG_ON(TASK_COMM_LEN > DNAME_INLINE_LEN);
__set_task_comm(me, bprm->file->f_path.dentry->d_name.name, true);
We can just do this regardless of bprm->fdpath.
It will be a change of behavior on when executing symlinks and possibly mount points but I don't think we care. If we do then we can add make it conditional with "if (bprm->fdpath)"
At the very least using the above version unconditionally ought to flush out any bugs.
I'm not super comfortable doing this regardless of bprm->fdpath; that seems like too many cases getting changed. Can we just leave it as depending on bprm->fdpath?
Also, is d_name.name always going to be set? e.g. what about memfd, etc?
linux-kselftest-mirror@lists.linaro.org