If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as the pidfd in this instance rather than having to establish a dummy fd for this purpose.
It is useful to refer to both the current thread from the userland's perspective for which we use PIDFD_SELF, and the current process from the userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to what constitutes a process - a thread from the userland perspective is a process in userland, and a userland process is a thread group (more specifically the thread group leader from the kernel perspective). We therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID. * PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does thread-specific signal handling and where the user would be hugely confused if the FDs referenced or signal processed referred to the thread group leader rather than the individual thread.
For now we only adjust pidfd_get_task() and the pidfd_send_signal() system call with specific handling for this, implementing this functionality for process_madvise(), process_mrelease() (albeit, using it here wouldn't really make sense) and pidfd_send_signal().
We defer making further changes, as this would require a significant rework of the pidfd mechanism.
The motivating case here is to support PIDFD_SELF in process_madvise(), so this suffices for immediate uses. Moving forward, this can be further expanded to other uses.
v7: * Reworked implementation according to Christian's requirements. We now only support process_madvise() and pidfd_send_signal() system calls with PIDFD_SELF as specified. * Updated tests to account for broken pidfd_open_test.c implementation. * Fixed missing includes in pidfd self tests. * Removed tests relating to functionality no longer supported. * Update guard pages test to use PIDFD_SELF.
v6: * Avoid static inline in UAPI header as suggested by Pedro. * Place PIDFD_SELF values out of range of errors and any other sentinel as suggested by Pedro. https://lore.kernel.org/linux-mm/cover.1729926229.git.lorenzo.stoakes@oracle...
v5: * Fixup self test dependencies on pidfd/pidfd.h. https://lore.kernel.org/linux-mm/cover.1729848252.git.lorenzo.stoakes@oracle...
v4: * Avoid returning an fd in the __pidfd_get_pid() function as pointed out by Christian, instead simply always pin the pid and maintain fd scope in the helper alone. * Add wrapper header file in tools/include/linux to allow for import of UAPI pidfd.h header without encountering the collision between system fcntl.h and linux/fcntl.h as discussed with Shuah and John. * Fixup tests to import the UAPI pidfd.h header working around conflicts between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports, as reported by Shuah. * Use an int for pidfd_is_self_sentinel() to avoid any dependency on stdbool.h in userland. https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracle...
v3: * Do not fput() an invalid fd as reported by kernel test bot. * Fix unintended churn from moving variable declaration. https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracle...
v2: * Fix tests as reported by Shuah. * Correct RFC version lore link. https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracle...
Non-RFC v1: * Removed RFC tag - there seems to be general consensus that this change is a good idea, but perhaps some debate to be had on implementation. It seems sensible then to move forward with the RFC flag removed. * Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases PIDFD_SELF and PIDFD_SELF_PROCESS respectively. * Updated testing accordingly. https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracle...
RFC version: https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracle...
Lorenzo Stoakes (6): pidfd: add PIDFD_SELF* sentinels to refer to own thread/process selftests/pidfd: add missing system header imcludes to pidfd tests tools: testing: separate out wait_for_pid() into helper header selftests: pidfd: add pidfd.h UAPI wrapper selftests: pidfd: add tests for PIDFD_SELF_* selftests/mm: use PIDFD_SELF in guard pages test
include/uapi/linux/pidfd.h | 24 ++++ kernel/pid.c | 24 +++- kernel/signal.c | 106 +++++++++++------- tools/include/linux/pidfd.h | 14 +++ tools/testing/selftests/cgroup/test_kill.c | 2 +- tools/testing/selftests/mm/Makefile | 4 + tools/testing/selftests/mm/guard-pages.c | 15 +-- .../pid_namespace/regression_enomem.c | 2 +- tools/testing/selftests/pidfd/Makefile | 3 +- tools/testing/selftests/pidfd/pidfd.h | 28 +---- .../selftests/pidfd/pidfd_fdinfo_test.c | 1 + tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++++ .../testing/selftests/pidfd/pidfd_open_test.c | 6 +- .../selftests/pidfd/pidfd_setns_test.c | 1 + tools/testing/selftests/pidfd/pidfd_test.c | 76 +++++++++++-- 15 files changed, 242 insertions(+), 103 deletions(-) create mode 100644 tools/include/linux/pidfd.h create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
-- 2.48.1
It is useful to be able to utilise the pidfd mechanism to reference the current thread or process (from a userland point of view - thread group leader from the kernel's point of view).
Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
For convenience and to avoid confusion from userland's perspective we alias these:
* PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what the user will want to use, as they would find it surprising if for instance fd's were unshared()'d and they wanted to invoke pidfd_getfd() and that failed.
* PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users have no concept of thread groups or what a thread group leader is, and from userland's perspective and nomenclature this is what userland considers to be a process.
We adjust pidfd_get_task() and the pidfd_send_signal() system call with specific handling for this, implementing this functionality for process_madvise(), process_mrelease() (albeit, using it here wouldn't really make sense) and pidfd_send_signal().
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- include/uapi/linux/pidfd.h | 24 +++++++++ kernel/pid.c | 24 +++++++-- kernel/signal.c | 106 ++++++++++++++++++++++--------------- 3 files changed, 107 insertions(+), 47 deletions(-)
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h index 4540f6301b8c..e0abd0b18841 100644 --- a/include/uapi/linux/pidfd.h +++ b/include/uapi/linux/pidfd.h @@ -23,6 +23,30 @@
#define PIDFD_INFO_SIZE_VER0 64 /* sizeof first published struct */
+/* + * The concept of process and threads in userland and the kernel is a confusing + * one - within the kernel every thread is a 'task' with its own individual PID, + * however from userland's point of view threads are grouped by a single PID, + * which is that of the 'thread group leader', typically the first thread + * spawned. + * + * To cut the Gideon knot, for internal kernel usage, we refer to + * PIDFD_SELF_THREAD to refer to the current thread (or task from a kernel + * perspective), and PIDFD_SELF_THREAD_GROUP to refer to the current thread + * group leader... + */ +#define PIDFD_SELF_THREAD -10000 /* Current thread. */ +#define PIDFD_SELF_THREAD_GROUP -20000 /* Current thread group leader. */ + +/* + * ...and for userland we make life simpler - PIDFD_SELF refers to the current + * thread, PIDFD_SELF_PROCESS refers to the process thread group leader. + * + * For nearly all practical uses, a user will want to use PIDFD_SELF. + */ +#define PIDFD_SELF PIDFD_SELF_THREAD +#define PIDFD_SELF_PROCESS PIDFD_SELF_THREAD_GROUP + struct pidfd_info { /* * This mask is similar to the request_mask in statx(2). diff --git a/kernel/pid.c b/kernel/pid.c index 3a10a7b6fcf8..1d2fc59d64fc 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -564,15 +564,29 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) */ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags) { - unsigned int f_flags; + unsigned int f_flags = 0; struct pid *pid; struct task_struct *task; + enum pid_type type;
- pid = pidfd_get_pid(pidfd, &f_flags); - if (IS_ERR(pid)) - return ERR_CAST(pid); + switch (pidfd) { + case PIDFD_SELF_THREAD: + type = PIDTYPE_PID; + pid = get_task_pid(current, type); + break; + case PIDFD_SELF_THREAD_GROUP: + type = PIDTYPE_TGID; + pid = get_task_pid(current, type); + break; + default: + pid = pidfd_get_pid(pidfd, &f_flags); + if (IS_ERR(pid)) + return ERR_CAST(pid); + type = PIDTYPE_TGID; + break; + }
- task = get_pid_task(pid, PIDTYPE_TGID); + task = get_pid_task(pid, type); put_pid(pid); if (!task) return ERR_PTR(-ESRCH); diff --git a/kernel/signal.c b/kernel/signal.c index a2afd54303f0..1e8f792f88de 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -4009,6 +4009,46 @@ static struct pid *pidfd_to_pid(const struct file *file) (PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \ PIDFD_SIGNAL_PROCESS_GROUP)
+static int do_pidfd_send_signal(struct pid *pid, int sig, enum pid_type type, + siginfo_t __user *info, unsigned int flags) +{ + kernel_siginfo_t kinfo; + + switch (flags) { + case PIDFD_SIGNAL_THREAD: + type = PIDTYPE_PID; + break; + case PIDFD_SIGNAL_THREAD_GROUP: + type = PIDTYPE_TGID; + break; + case PIDFD_SIGNAL_PROCESS_GROUP: + type = PIDTYPE_PGID; + break; + } + + if (info) { + int ret = copy_siginfo_from_user_any(&kinfo, info); + + if (unlikely(ret)) + return ret; + + if (unlikely(sig != kinfo.si_signo)) + return -EINVAL; + + /* Only allow sending arbitrary signals to yourself. */ + if ((task_pid(current) != pid || type > PIDTYPE_TGID) && + (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL)) + return -EPERM; + } else { + prepare_kill_siginfo(sig, &kinfo, type); + } + + if (type == PIDTYPE_PGID) + return kill_pgrp_info(sig, &kinfo, pid); + + return kill_pid_info_type(sig, &kinfo, pid, type); +} + /** * sys_pidfd_send_signal - Signal a process through a pidfd * @pidfd: file descriptor of the process @@ -4028,7 +4068,6 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig, { int ret; struct pid *pid; - kernel_siginfo_t kinfo; enum pid_type type;
/* Enforce flags be set to 0 until we add an extension. */ @@ -4040,54 +4079,37 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig, return -EINVAL;
CLASS(fd, f)(pidfd); - if (fd_empty(f)) - return -EBADF;
- /* Is this a pidfd? */ - pid = pidfd_to_pid(fd_file(f)); - if (IS_ERR(pid)) - return PTR_ERR(pid); + switch (pidfd) { + case PIDFD_SELF_THREAD: + pid = get_task_pid(current, PIDTYPE_PID); + type = PIDTYPE_PID; + break; + case PIDFD_SELF_THREAD_GROUP: + pid = get_task_pid(current, PIDTYPE_TGID); + type = PIDTYPE_TGID; + break; + default: + if (fd_empty(f)) + return -EBADF;
- if (!access_pidfd_pidns(pid)) - return -EINVAL; + /* Is this a pidfd? */ + pid = pidfd_to_pid(fd_file(f)); + if (IS_ERR(pid)) + return PTR_ERR(pid);
- switch (flags) { - case 0: + if (!access_pidfd_pidns(pid)) + return -EINVAL; /* Infer scope from the type of pidfd. */ if (fd_file(f)->f_flags & PIDFD_THREAD) type = PIDTYPE_PID; else type = PIDTYPE_TGID; break; - case PIDFD_SIGNAL_THREAD: - type = PIDTYPE_PID; - break; - case PIDFD_SIGNAL_THREAD_GROUP: - type = PIDTYPE_TGID; - break; - case PIDFD_SIGNAL_PROCESS_GROUP: - type = PIDTYPE_PGID; - break; - } - - if (info) { - ret = copy_siginfo_from_user_any(&kinfo, info); - if (unlikely(ret)) - return ret; - - if (unlikely(sig != kinfo.si_signo)) - return -EINVAL; - - /* Only allow sending arbitrary signals to yourself. */ - if ((task_pid(current) != pid || type > PIDTYPE_TGID) && - (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL)) - return -EPERM; - } else { - prepare_kill_siginfo(sig, &kinfo, type); }
- if (type == PIDTYPE_PGID) - return kill_pgrp_info(sig, &kinfo, pid); - else - return kill_pid_info_type(sig, &kinfo, pid, type); + ret = do_pidfd_send_signal(pid, sig, type, info, flags); + if (fd_empty(f)) + put_pid(pid); + return ret; } -- 2.48.1
On Thu, Jan 30, 2025 at 08:40:26PM +0000, Lorenzo Stoakes wrote:
It is useful to be able to utilise the pidfd mechanism to reference the current thread or process (from a userland point of view - thread group leader from the kernel's point of view).
Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader.
For convenience and to avoid confusion from userland's perspective we alias these:
PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what the user will want to use, as they would find it surprising if for instance fd's were unshared()'d and they wanted to invoke pidfd_getfd() and that failed.
PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users have no concept of thread groups or what a thread group leader is, and from userland's perspective and nomenclature this is what userland considers to be a process.
We adjust pidfd_get_task() and the pidfd_send_signal() system call with specific handling for this, implementing this functionality for process_madvise(), process_mrelease() (albeit, using it here wouldn't really make sense) and pidfd_send_signal().
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt shakeel.butt@linux.dev
On Thu, Jan 30, 2025 at 08:40:26PM +0000, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
include/uapi/linux/pidfd.h | 24 +++++++++ kernel/pid.c | 24 +++++++-- kernel/signal.c | 106 ++++++++++++++++++++++--------------- 3 files changed, 107 insertions(+), 47 deletions(-)
Practical idea, thanks.
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
- To cut the Gideon knot, for internal kernel usage, we refer to
A nit https://en.wikipedia.org/wiki/Gordian_Knot
(if still applicable)
Michal
On Tue, Feb 11, 2025 at 04:24:07PM +0100, Michal Koutný wrote:
On Thu, Jan 30, 2025 at 08:40:26PM +0000, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
include/uapi/linux/pidfd.h | 24 +++++++++ kernel/pid.c | 24 +++++++-- kernel/signal.c | 106 ++++++++++++++++++++++--------------- 3 files changed, 107 insertions(+), 47 deletions(-)
Practical idea, thanks.
Thanks!
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
- To cut the Gideon knot, for internal kernel usage, we refer to
A nit https://en.wikipedia.org/wiki/Gordian_Knot
(if still applicable)
MY GOD. Hahaha. How embarrassing. God knows how 'Gideon' ended up there. Apologies to all, I appear to have had a senior moment there...
Feel free to correct Christian, unless we want to leave this as an Easter Egg? :P
Michal
On Tue, Feb 11, 2025 at 03:45:20PM +0000, Lorenzo Stoakes wrote:
On Tue, Feb 11, 2025 at 04:24:07PM +0100, Michal Koutný wrote:
On Thu, Jan 30, 2025 at 08:40:26PM +0000, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
include/uapi/linux/pidfd.h | 24 +++++++++ kernel/pid.c | 24 +++++++-- kernel/signal.c | 106 ++++++++++++++++++++++--------------- 3 files changed, 107 insertions(+), 47 deletions(-)
Practical idea, thanks.
Thanks!
diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h
- To cut the Gideon knot, for internal kernel usage, we refer to
A nit https://en.wikipedia.org/wiki/Gordian_Knot
(if still applicable)
MY GOD. Hahaha. How embarrassing. God knows how 'Gideon' ended up there. Apologies to all, I appear to have had a senior moment there...
Feel free to correct Christian, unless we want to leave this as an Easter Egg? :P
Everybody knows it's the "quotidian knot" that's the most challenging.
The pidfd_fdinfo_test.c and pidfd_setns_test.c tests appear to be missing fundamental system header imports required to execute correctly. Add these.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 + tools/testing/selftests/pidfd/pidfd_setns_test.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c index f062a986e382..f718aac75068 100644 --- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c +++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c @@ -13,6 +13,7 @@ #include <syscall.h> #include <sys/wait.h> #include <sys/mman.h> +#include <sys/mount.h>
#include "pidfd.h" #include "../kselftest.h" diff --git a/tools/testing/selftests/pidfd/pidfd_setns_test.c b/tools/testing/selftests/pidfd/pidfd_setns_test.c index 222f8131283b..a55f6641e0b6 100644 --- a/tools/testing/selftests/pidfd/pidfd_setns_test.c +++ b/tools/testing/selftests/pidfd/pidfd_setns_test.c @@ -14,6 +14,7 @@ #include <sys/prctl.h> #include <sys/wait.h> #include <unistd.h> +#include <sys/ioctl.h> #include <sys/socket.h> #include <sys/stat.h> #include <linux/ioctl.h>
On Thu, Jan 30, 2025 at 08:40:27PM +0000, Lorenzo Stoakes wrote:
The pidfd_fdinfo_test.c and pidfd_setns_test.c tests appear to be missing fundamental system header imports required to execute correctly. Add these.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt shakeel.butt@linux.dev
Hello *,
On Thu, 30 Jan 2025 20:40:27 +0000, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
The pidfd_fdinfo_test.c and pidfd_setns_test.c tests appear to be missing fundamental system header imports required to execute correctly. Add these.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 + tools/testing/selftests/pidfd/pidfd_setns_test.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c index f062a986e382..f718aac75068 100644 --- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c +++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c @@ -13,6 +13,7 @@ #include <syscall.h> #include <sys/wait.h> #include <sys/mman.h> +#include <sys/mount.h>
#include "pidfd.h" #include "../kselftest.h"
Predated patch already available, see
https://lore.kernel.org/linux-kselftest/20250115105211.390370-1-ps.report@gm...
diff --git a/tools/testing/selftests/pidfd/pidfd_setns_test.c b/tools/testing/selftests/pidfd/pidfd_setns_test.c index 222f8131283b..a55f6641e0b6 100644 --- a/tools/testing/selftests/pidfd/pidfd_setns_test.c +++ b/tools/testing/selftests/pidfd/pidfd_setns_test.c @@ -14,6 +14,7 @@ #include <sys/prctl.h> #include <sys/wait.h> #include <unistd.h> +#include <sys/ioctl.h> #include <sys/socket.h> #include <sys/stat.h> #include <linux/ioctl.h>
and predated patch available, see
https://lore.kernel.org/linux-kselftest/20250115105211.390370-2-ps.report@gm...
Regards, Peter
It seems tests other than the pidfd tests use the wait_for_pid() function declared in pidfd.h.
Since we will shortly be modifying pidfd.h in a way that might clash with other tests, separate this out and update tests accordingly.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/testing/selftests/cgroup/test_kill.c | 2 +- .../pid_namespace/regression_enomem.c | 2 +- tools/testing/selftests/pidfd/pidfd.h | 26 +------------ tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++++++++++++++++ 4 files changed, 42 insertions(+), 27 deletions(-) create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
diff --git a/tools/testing/selftests/cgroup/test_kill.c b/tools/testing/selftests/cgroup/test_kill.c index 0e5bb6c7307a..2367f645fe89 100644 --- a/tools/testing/selftests/cgroup/test_kill.c +++ b/tools/testing/selftests/cgroup/test_kill.c @@ -10,7 +10,7 @@ #include <unistd.h>
#include "../kselftest.h" -#include "../pidfd/pidfd.h" +#include "../pidfd/pidfd_helpers.h" #include "cgroup_util.h"
/* diff --git a/tools/testing/selftests/pid_namespace/regression_enomem.c b/tools/testing/selftests/pid_namespace/regression_enomem.c index 7d84097ad45c..f3e6989c8069 100644 --- a/tools/testing/selftests/pid_namespace/regression_enomem.c +++ b/tools/testing/selftests/pid_namespace/regression_enomem.c @@ -12,7 +12,7 @@ #include <sys/wait.h>
#include "../kselftest_harness.h" -#include "../pidfd/pidfd.h" +#include "../pidfd/pidfd_helpers.h"
/* * Regression test for: diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h index 0b96ac4b8ce5..d02cfc5ef77b 100644 --- a/tools/testing/selftests/pidfd/pidfd.h +++ b/tools/testing/selftests/pidfd/pidfd.h @@ -17,6 +17,7 @@
#include "../kselftest.h" #include "../clone3/clone3_selftests.h" +#include "pidfd_helpers.h"
#ifndef P_PIDFD #define P_PIDFD 3 @@ -73,31 +74,6 @@ static inline int sys_waitid(int which, pid_t pid, siginfo_t *info, int options) return syscall(__NR_waitid, which, pid, info, options, NULL); }
-static inline int wait_for_pid(pid_t pid) -{ - int status, ret; - -again: - ret = waitpid(pid, &status, 0); - if (ret == -1) { - if (errno == EINTR) - goto again; - - ksft_print_msg("waitpid returned -1, errno=%d\n", errno); - return -1; - } - - if (!WIFEXITED(status)) { - ksft_print_msg( - "waitpid !WIFEXITED, WIFSIGNALED=%d, WTERMSIG=%d\n", - WIFSIGNALED(status), WTERMSIG(status)); - return -1; - } - - ret = WEXITSTATUS(status); - return ret; -} - static inline int sys_pidfd_open(pid_t pid, unsigned int flags) { return syscall(__NR_pidfd_open, pid, flags); diff --git a/tools/testing/selftests/pidfd/pidfd_helpers.h b/tools/testing/selftests/pidfd/pidfd_helpers.h new file mode 100644 index 000000000000..5637bfe888de --- /dev/null +++ b/tools/testing/selftests/pidfd/pidfd_helpers.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __PIDFD_HELPERS_H +#define __PIDFD_HELPERS_H + +#define _GNU_SOURCE +#include <errno.h> +#include <stdlib.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> +#include "../kselftest.h" + +static inline int wait_for_pid(pid_t pid) +{ + int status, ret; + +again: + ret = waitpid(pid, &status, 0); + if (ret == -1) { + if (errno == EINTR) + goto again; + + ksft_print_msg("waitpid returned -1, errno=%d\n", errno); + return -1; + } + + if (!WIFEXITED(status)) { + ksft_print_msg( + "waitpid !WIFEXITED, WIFSIGNALED=%d, WTERMSIG=%d\n", + WIFSIGNALED(status), WTERMSIG(status)); + return -1; + } + + ret = WEXITSTATUS(status); + return ret; +} + +#endif /* __PIDFD_HELPERS_H */
On Thu, Jan 30, 2025 at 08:40:28PM +0000, Lorenzo Stoakes wrote:
It seems tests other than the pidfd tests use the wait_for_pid() function declared in pidfd.h.
Since we will shortly be modifying pidfd.h in a way that might clash with other tests, separate this out and update tests accordingly.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt shakeel.butt@linux.dev
Conflicts can arise between system fcntl.h and linux/fcntl.h, imported by the linux/pidfd.h UAPI header.
Work around this by adding a wrapper for linux/pidfd.h to tools/include/ which sets the linux/fcntl.h header guard ahead of importing the pidfd.h header file.
Adjust the pidfd selftests Makefile to reference this include directory and put it at a higher precidence than any make header installed headers to ensure the wrapper is preferred.
This way we can directly import the UAPI header file without issue, use the latest system header file without having to duplicate anything.
Reviewed-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/include/linux/pidfd.h | 14 ++++++++++++++ tools/testing/selftests/pidfd/Makefile | 3 +-- 2 files changed, 15 insertions(+), 2 deletions(-) create mode 100644 tools/include/linux/pidfd.h
diff --git a/tools/include/linux/pidfd.h b/tools/include/linux/pidfd.h new file mode 100644 index 000000000000..113c8023072d --- /dev/null +++ b/tools/include/linux/pidfd.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef _TOOLS_LINUX_PIDFD_H +#define _TOOLS_LINUX_PIDFD_H + +/* + * Some systems have issues with the linux/fcntl.h import in linux/pidfd.h, so + * work around this by setting the header guard. + */ +#define _LINUX_FCNTL_H +#include "../../../include/uapi/linux/pidfd.h" +#undef _LINUX_FCNTL_H + +#endif /* _TOOLS_LINUX_PIDFD_H */ diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile index 301343a11b62..5363d5ab27a4 100644 --- a/tools/testing/selftests/pidfd/Makefile +++ b/tools/testing/selftests/pidfd/Makefile @@ -1,9 +1,8 @@ # SPDX-License-Identifier: GPL-2.0-only -CFLAGS += -g $(KHDR_INCLUDES) -pthread -Wall +CFLAGS += -g -isystem $(top_srcdir)/tools/include $(KHDR_INCLUDES) -pthread -Wall
TEST_GEN_PROGS := pidfd_test pidfd_fdinfo_test pidfd_open_test \ pidfd_poll_test pidfd_wait pidfd_getfd_test pidfd_setns_test \ pidfd_file_handle_test pidfd_bind_mount
include ../lib.mk -
Add tests to assert that PIDFD_SELF* correctly refers to the current thread and process.
We explicitly test pidfd_send_signal(), however We defer testing of mm-specific functionality which uses pidfd, namely process_madvise() and process_mrelease() to mm testing (though note the latter can not be sensibly tested as it would require the testing process to be dying).
We also correct the pidfd_open_test.c fields which refer to .request_mask whereas the UAPI header refers to .mask, which otherwise break the import of the UAPI header.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/testing/selftests/pidfd/pidfd.h | 2 + .../testing/selftests/pidfd/pidfd_open_test.c | 6 +- tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++++++++++--- 3 files changed, 69 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h index d02cfc5ef77b..cc684d872253 100644 --- a/tools/testing/selftests/pidfd/pidfd.h +++ b/tools/testing/selftests/pidfd/pidfd.h @@ -15,6 +15,8 @@ #include <sys/types.h> #include <sys/wait.h>
+#include <linux/pidfd.h> + #include "../kselftest.h" #include "../clone3/clone3_selftests.h" #include "pidfd_helpers.h" diff --git a/tools/testing/selftests/pidfd/pidfd_open_test.c b/tools/testing/selftests/pidfd/pidfd_open_test.c index ce413a221bac..9a40ccb1ff6d 100644 --- a/tools/testing/selftests/pidfd/pidfd_open_test.c +++ b/tools/testing/selftests/pidfd/pidfd_open_test.c @@ -31,7 +31,7 @@ #define PIDFD_INFO_CGROUPID (1UL << 0)
struct pidfd_info { - __u64 request_mask; + __u64 mask; __u64 cgroupid; __u32 pid; __u32 tgid; @@ -148,7 +148,7 @@ static pid_t get_pid_from_fdinfo_file(int pidfd, const char *key, size_t keylen) int main(int argc, char **argv) { struct pidfd_info info = { - .request_mask = PIDFD_INFO_CGROUPID, + .mask = PIDFD_INFO_CGROUPID, }; int pidfd = -1, ret = 1; pid_t pid; @@ -227,7 +227,7 @@ int main(int argc, char **argv) getegid(), info.sgid); goto on_error; } - if ((info.request_mask & PIDFD_INFO_CGROUPID) && info.cgroupid == 0) { + if ((info.mask & PIDFD_INFO_CGROUPID) && info.cgroupid == 0) { ksft_print_msg("cgroupid should not be 0 when PIDFD_INFO_CGROUPID is set\n"); goto on_error; } diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c index e9728e86b4f2..fcd85cad9f18 100644 --- a/tools/testing/selftests/pidfd/pidfd_test.c +++ b/tools/testing/selftests/pidfd/pidfd_test.c @@ -42,12 +42,41 @@ static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *)) #endif }
-static int signal_received; +static pthread_t signal_received;
static void set_signal_received_on_sigusr1(int sig) { if (sig == SIGUSR1) - signal_received = 1; + signal_received = pthread_self(); +} + +static int send_signal(int pidfd) +{ + int ret = 0; + + if (sys_pidfd_send_signal(pidfd, SIGUSR1, NULL, 0) < 0) { + ret = -EINVAL; + goto exit; + } + + if (signal_received != pthread_self()) { + ret = -EINVAL; + goto exit; + } + +exit: + signal_received = 0; + return ret; +} + +static void *send_signal_worker(void *arg) +{ + int pidfd = (int)(intptr_t)arg; + int ret; + + /* We forward any errors for the caller to handle. */ + ret = send_signal(pidfd); + return (void *)(intptr_t)ret; }
/* @@ -56,8 +85,11 @@ static void set_signal_received_on_sigusr1(int sig) */ static int test_pidfd_send_signal_simple_success(void) { - int pidfd, ret; + int pidfd; const char *test_name = "pidfd_send_signal send SIGUSR1"; + pthread_t thread; + void *thread_res; + int err;
if (!have_pidfd_send_signal) { ksft_test_result_skip( @@ -66,25 +98,45 @@ static int test_pidfd_send_signal_simple_success(void) return 0; }
+ signal(SIGUSR1, set_signal_received_on_sigusr1); + + /* Try sending a signal to ourselves via /proc/self. */ pidfd = open("/proc/self", O_DIRECTORY | O_CLOEXEC); if (pidfd < 0) ksft_exit_fail_msg( "%s test: Failed to open process file descriptor\n", test_name); + err = send_signal(pidfd); + if (err) + ksft_exit_fail_msg( + "%s test: Error %d on sending pidfd signal\n", + test_name, err); + close(pidfd);
- signal(SIGUSR1, set_signal_received_on_sigusr1); + /* Now try the same thing only using PIDFD_SELF_THREAD_GROUP. */ + err = send_signal(PIDFD_SELF_THREAD_GROUP); + if (err) + ksft_exit_fail_msg( + "%s test: Error %d on PIDFD_SELF_THREAD_GROUP signal\n", + test_name, err);
- ret = sys_pidfd_send_signal(pidfd, SIGUSR1, NULL, 0); - close(pidfd); - if (ret < 0) - ksft_exit_fail_msg("%s test: Failed to send signal\n", + /* + * Now try the same thing in a thread and assert thread ID is equal to + * worker thread ID. + */ + if (pthread_create(&thread, NULL, send_signal_worker, + (void *)(intptr_t)PIDFD_SELF_THREAD)) + ksft_exit_fail_msg("%s test: Failed to create thread\n", test_name); - - if (signal_received != 1) - ksft_exit_fail_msg("%s test: Failed to receive signal\n", + if (pthread_join(thread, &thread_res)) + ksft_exit_fail_msg("%s test: Failed to join thread\n", test_name); + err = (int)(intptr_t)thread_res; + if (err) + ksft_exit_fail_msg( + "%s test: Error %d on PIDFD_SELF_THREAD signal\n", + test_name, err);
- signal_received = 0; ksft_test_result_pass("%s test: Sent signal\n", test_name); return 0; }
On Thu, Jan 30, 2025 at 08:40:30PM +0000, Lorenzo Stoakes wrote:
Add tests to assert that PIDFD_SELF* correctly refers to the current thread and process.
We explicitly test pidfd_send_signal(), however We defer testing of mm-specific functionality which uses pidfd, namely process_madvise() and process_mrelease() to mm testing (though note the latter can not be sensibly tested as it would require the testing process to be dying).
We also correct the pidfd_open_test.c fields which refer to .request_mask whereas the UAPI header refers to .mask, which otherwise break the import of the UAPI header.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt shakeel.butt@linux.dev
Now we have PIDFD_SELF available for process_madvise(), make use of it in the guard pages test.
This is both more convenient and asserts that PIDFD_SELF works as expected.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/testing/selftests/mm/Makefile | 4 ++++ tools/testing/selftests/mm/guard-pages.c | 15 +++------------ 2 files changed, 7 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index 63ce39d024bb..ecc4252fa3fe 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -225,6 +225,10 @@ $(OUTPUT)/ksm_tests: LDLIBS += -lnuma
$(OUTPUT)/migration: LDLIBS += -lnuma
+# We need uapi/pdifd.h but need to work around broken linux/fcntl.h, so +# explicitly import +$(OUTPUT)/guard-pages: CFLAGS += -I $(top_srcdir)/tools/include + local_config.mk local_config.h: check_config.sh /bin/sh ./check_config.sh $(CC)
diff --git a/tools/testing/selftests/mm/guard-pages.c b/tools/testing/selftests/mm/guard-pages.c index ece37212a8a2..549653724661 100644 --- a/tools/testing/selftests/mm/guard-pages.c +++ b/tools/testing/selftests/mm/guard-pages.c @@ -18,6 +18,7 @@ #include <sys/syscall.h> #include <sys/uio.h> #include <unistd.h> +#include <linux/pidfd.h>
/* * Ignore the checkpatch warning, as per the C99 standard, section 7.14.1.1: @@ -50,11 +51,6 @@ static void handle_fatal(int c) siglongjmp(signal_jmp_buf, c); }
-static int pidfd_open(pid_t pid, unsigned int flags) -{ - return syscall(SYS_pidfd_open, pid, flags); -} - static ssize_t sys_process_madvise(int pidfd, const struct iovec *iovec, size_t n, int advice, unsigned int flags) { @@ -370,14 +366,10 @@ TEST_F(guard_pages, multi_vma) TEST_F(guard_pages, process_madvise) { const unsigned long page_size = self->page_size; - pid_t pid = getpid(); - int pidfd = pidfd_open(pid, 0); char *ptr_region, *ptr1, *ptr2, *ptr3; ssize_t count; struct iovec vec[6];
- ASSERT_NE(pidfd, -1); - /* Reserve region to map over. */ ptr_region = mmap(NULL, 100 * page_size, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0); @@ -425,7 +417,7 @@ TEST_F(guard_pages, process_madvise) ASSERT_EQ(munmap(&ptr_region[99 * page_size], page_size), 0);
/* Now guard in one step. */ - count = sys_process_madvise(pidfd, vec, 6, MADV_GUARD_INSTALL, 0); + count = sys_process_madvise(PIDFD_SELF, vec, 6, MADV_GUARD_INSTALL, 0);
/* OK we don't have permission to do this, skip. */ if (count == -1 && errno == EPERM) @@ -446,7 +438,7 @@ TEST_F(guard_pages, process_madvise) ASSERT_FALSE(try_read_write_buf(&ptr3[19 * page_size]));
/* Now do the same with unguard... */ - count = sys_process_madvise(pidfd, vec, 6, MADV_GUARD_REMOVE, 0); + count = sys_process_madvise(PIDFD_SELF, vec, 6, MADV_GUARD_REMOVE, 0);
/* ...and everything should now succeed. */
@@ -463,7 +455,6 @@ TEST_F(guard_pages, process_madvise) ASSERT_EQ(munmap(ptr1, 10 * page_size), 0); ASSERT_EQ(munmap(ptr2, 5 * page_size), 0); ASSERT_EQ(munmap(ptr3, 20 * page_size), 0); - close(pidfd); }
/* Assert that unmapping ranges does not leave guard markers behind. */
On Thu, Jan 30, 2025 at 08:40:31PM +0000, Lorenzo Stoakes wrote:
Now we have PIDFD_SELF available for process_madvise(), make use of it in the guard pages test.
This is both more convenient and asserts that PIDFD_SELF works as expected.
Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com
Reviewed-by: Shakeel Butt shakeel.butt@linux.dev
On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process thread.
The above code sequence doesn't seem at all onerous. I'm not understanding why it's worth altering the kernel to permit this little shortcut?
On Thu, Jan 30, 2025 at 02:37:54PM -0800, Andrew Morton wrote:
On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process thread.
The above code sequence doesn't seem at all onerous. I'm not understanding why it's worth altering the kernel to permit this little shortcut?
In practice it adds quite a bit of overhead for something that whatever mechanism is using the pidfd can avoid.
It was specifically intended for a real case of utilising process_madvise(), using the newly extended ability to batch _any_ madvise() operations for the current process, like:
if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... error handling ... }
vs.
pid_t pid = getpid(); int pidfd = pidfd_open(pid, PIDFD_THREAD);
if (pidfd < 0) { ... error handling ... }
if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... cleanup pidfd ... ... error handling ... }
...
... cleanup pidfd ...
So in practice, it's actually a lot more ceremony and noise. Suren has been working with this code in practice and found this to be useful.
The suggestion to embed it as PIDFD_SELF rather than to pass it as a process_madvise() flag was made on the original series where I extended its functionality.
So in practice I think it's onerous enough to justify this, plus it allows for a more fluent use of pidfd's in other cases where one is referring to the same process/thread, to the extent that I've seen people commenting on supporting it while sending series relating to pidfd.
Also Christian and others appear to support this idea.
On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Thu, Jan 30, 2025 at 02:37:54PM -0800, Andrew Morton wrote:
On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD); ... close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process thread.
The above code sequence doesn't seem at all onerous. I'm not understanding why it's worth altering the kernel to permit this little shortcut?
In practice it adds quite a bit of overhead for something that whatever mechanism is using the pidfd can avoid.
It was specifically intended for a real case of utilising process_madvise(), using the newly extended ability to batch _any_ madvise() operations for the current process, like:
if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... error handling ... }
vs.
pid_t pid = getpid(); int pidfd = pidfd_open(pid, PIDFD_THREAD); if (pidfd < 0) { ... error handling ... } if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... cleanup pidfd ... ... error handling ... } ... ... cleanup pidfd ...
So in practice, it's actually a lot more ceremony and noise. Suren has been working with this code in practice and found this to be useful.
It's also nice to add that people on the libc/allocator side should also appreciate skipping pidfd_open's reliability concerns (mostly, that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a malloc[1]). Besides the big syscall reduction and nice speedup, that is.
[1] whether this is the already case is an exercise left to the reader, but at the very least we should not add onto existing problems
On Thu, 30 Jan 2025 23:10:53 +0000 Pedro Falcato pedro.falcato@gmail.com wrote:
On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
The above code sequence doesn't seem at all onerous. I'm not understanding why it's worth altering the kernel to permit this little shortcut?
In practice it adds quite a bit of overhead for something that whatever mechanism is using the pidfd can avoid.
It was specifically intended for a real case of utilising process_madvise(), using the newly extended ability to batch _any_ madvise() operations for the current process, like:
if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... error handling ... }
vs.
pid_t pid = getpid(); int pidfd = pidfd_open(pid, PIDFD_THREAD); if (pidfd < 0) { ... error handling ... } if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... cleanup pidfd ... ... error handling ... } ... ... cleanup pidfd ...
So in practice, it's actually a lot more ceremony and noise. Suren has been working with this code in practice and found this to be useful.
It's also nice to add that people on the libc/allocator side should also appreciate skipping pidfd_open's reliability concerns (mostly, that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a malloc[1]). Besides the big syscall reduction and nice speedup, that is.
[1] whether this is the already case is an exercise left to the reader, but at the very least we should not add onto existing problems
Thanks.
Could we please get all the above spelled out much more thoroughly in the [0/n] description (aka Patch Series Sales Brochure)?
On Thu, Jan 30, 2025 at 03:32:36PM -0800, Andrew Morton wrote:
On Thu, 30 Jan 2025 23:10:53 +0000 Pedro Falcato pedro.falcato@gmail.com wrote:
On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
The above code sequence doesn't seem at all onerous. I'm not understanding why it's worth altering the kernel to permit this little shortcut?
In practice it adds quite a bit of overhead for something that whatever mechanism is using the pidfd can avoid.
It was specifically intended for a real case of utilising process_madvise(), using the newly extended ability to batch _any_ madvise() operations for the current process, like:
if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... error handling ... }
vs.
pid_t pid = getpid(); int pidfd = pidfd_open(pid, PIDFD_THREAD); if (pidfd < 0) { ... error handling ... } if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) { ... cleanup pidfd ... ... error handling ... } ... ... cleanup pidfd ...
So in practice, it's actually a lot more ceremony and noise. Suren has been working with this code in practice and found this to be useful.
It's also nice to add that people on the libc/allocator side should also appreciate skipping pidfd_open's reliability concerns (mostly, that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a malloc[1]). Besides the big syscall reduction and nice speedup, that is.
[1] whether this is the already case is an exercise left to the reader, but at the very least we should not add onto existing problems
Thanks.
Could we please get all the above spelled out much more thoroughly in the [0/n] description (aka Patch Series Sales Brochure)?
Ack, will expand if there's a respin, or Christian - perhaps you could fold the above explanation into the cover letter?
Intent is for Christian to take this in his tree (if he so wishes) to be clear!
Cheers
On Sat, Feb 01, 2025 at 12:12:46PM +0100, Christian Brauner wrote:
Intent is for Christian to take this in his tree (if he so wishes) to be clear!
If you send me an updated blurb I can fold it.
Thanks, I therefore propose replacing the cover letter blurb with the below:
----8<---- If you wish to utilise a pidfd interface to refer to the current process or thread then there is a lot of ceremony involved, looking something like:
pid_t pid = getpid(); int pidfd = pidfd_open(pid, PIDFD_THREAD);
if (pidfd < 0) { ... error handling ... }
if (process_madvise(pidfd, iovec, 8, MADV_GUARD_INSTALL, 0)) { ... cleanup pidfd ... ... error handling ... }
...
... cleanup pidfd ...
This adds unnecessary overhead + system calls, complicated error handling and in addition pidfd_open() is subject to RLIMIT_NOFILE (i.e. the limit of per-process number of open file descriptors), so the call may fail spuriously on this basis.
Rather than doing this we can make use of sentinels for this purpose which can be passed as the pidfd instead. This looks like:
if (process_madvise(PIDFD_SELF, iovec, 8, MADV_GUARD_INSTALL, 0)) { ... error handling ... }
And avoids all of the aforementioned issues. This series introduces such sentinels.
It is useful to refer to both the current thread from the userland's perspective for which we use PIDFD_SELF, and the current process from the userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to what constitutes a process - a thread from the userland perspective is a process in userland, and a userland process is a thread group (more specifically the thread group leader from the kernel perspective). We therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID. * PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does thread-specific signal handling and where the user would be hugely confused if the FDs referenced or signal processed referred to the thread group leader rather than the individual thread.
For now we only adjust pidfd_get_task() and the pidfd_send_signal() system call with specific handling for this, implementing this functionality for process_madvise(), process_mrelease() (albeit, using it here wouldn't really make sense) and pidfd_send_signal().
We defer making further changes, as this would require a significant rework of the pidfd mechanism.
The motivating case here is to support PIDFD_SELF in process_madvise(), so this suffices for immediate uses. Moving forward, this can be further expanded to other uses.
On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
[...]
Updated merge message. I've slightly rearranged pidfd_send_signal() so we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
---
Applied to the vfs-6.15.pidfs branch of the vfs/vfs.git tree. Patches in the vfs-6.15.pidfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.15.pidfs
[1/6] pidfd: add PIDFD_SELF* sentinels to refer to own thread/process https://git.kernel.org/vfs/vfs/c/e6e4ed42f8d8 [2/6] selftests/pidfd: add missing system header imcludes to pidfd tests https://git.kernel.org/vfs/vfs/c/c9f04f4a251d [3/6] tools: testing: separate out wait_for_pid() into helper header https://git.kernel.org/vfs/vfs/c/fb67fe44116e [4/6] selftests: pidfd: add pidfd.h UAPI wrapper https://git.kernel.org/vfs/vfs/c/ac331e56724d [5/6] selftests: pidfd: add tests for PIDFD_SELF_* https://git.kernel.org/vfs/vfs/c/881a3515c191 [6/6] selftests/mm: use PIDFD_SELF in guard pages test https://git.kernel.org/vfs/vfs/c/b4703f056f42
On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
[...]
Updated merge message. I've slightly rearranged pidfd_send_signal() so we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
Sounds good and thank you! Glad to get this in :)
Applied to the vfs-6.15.pidfs branch of the vfs/vfs.git tree. Patches in the vfs-6.15.pidfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.15.pidfs
[1/6] pidfd: add PIDFD_SELF* sentinels to refer to own thread/process https://git.kernel.org/vfs/vfs/c/e6e4ed42f8d8 [2/6] selftests/pidfd: add missing system header imcludes to pidfd tests https://git.kernel.org/vfs/vfs/c/c9f04f4a251d [3/6] tools: testing: separate out wait_for_pid() into helper header https://git.kernel.org/vfs/vfs/c/fb67fe44116e [4/6] selftests: pidfd: add pidfd.h UAPI wrapper https://git.kernel.org/vfs/vfs/c/ac331e56724d [5/6] selftests: pidfd: add tests for PIDFD_SELF_* https://git.kernel.org/vfs/vfs/c/881a3515c191 [6/6] selftests/mm: use PIDFD_SELF in guard pages test https://git.kernel.org/vfs/vfs/c/b4703f056f42
On Tue, Feb 4, 2025 at 2:01 AM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD); ...
[...]
Updated merge message. I've slightly rearranged pidfd_send_signal() so we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
Sounds good and thank you! Glad to get this in :)
Sorry, a bit late to the party...
We were discussing MADV_GUARD_INSTALL use with Android Bionic team and the possibility of caching pidfd_open() result for reuse when installing multiple guards, however doing that in libraries would pose issues as we can't predict the user behavior, which can fork() in between such calls. That would be an additional reason why having these sentinels is beneficial.
Applied to the vfs-6.15.pidfs branch of the vfs/vfs.git tree. Patches in the vfs-6.15.pidfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.15.pidfs
[1/6] pidfd: add PIDFD_SELF* sentinels to refer to own thread/process https://git.kernel.org/vfs/vfs/c/e6e4ed42f8d8 [2/6] selftests/pidfd: add missing system header imcludes to pidfd tests https://git.kernel.org/vfs/vfs/c/c9f04f4a251d [3/6] tools: testing: separate out wait_for_pid() into helper header https://git.kernel.org/vfs/vfs/c/fb67fe44116e [4/6] selftests: pidfd: add pidfd.h UAPI wrapper https://git.kernel.org/vfs/vfs/c/ac331e56724d [5/6] selftests: pidfd: add tests for PIDFD_SELF_* https://git.kernel.org/vfs/vfs/c/881a3515c191 [6/6] selftests/mm: use PIDFD_SELF in guard pages test https://git.kernel.org/vfs/vfs/c/b4703f056f42
On Tue, Feb 04, 2025 at 09:43:31AM -0800, Suren Baghdasaryan wrote:
On Tue, Feb 4, 2025 at 2:01 AM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
If you wish to utilise a pidfd interface to refer to the current process or thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD); ...
[...]
Updated merge message. I've slightly rearranged pidfd_send_signal() so we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
Sounds good and thank you! Glad to get this in :)
Sorry, a bit late to the party...
We were discussing MADV_GUARD_INSTALL use with Android Bionic team and the possibility of caching pidfd_open() result for reuse when installing multiple guards, however doing that in libraries would pose issues as we can't predict the user behavior, which can fork() in between such calls. That would be an additional reason why having these sentinels is beneficial.
Ok, added this to the cover letter as well.
Note that starting with v6.14 pidfs supports file handles. This works because pidfs provides each pidfd with a unique 64bit inode number that is exposed in statx(). On 64-bit the ->st_ino simply is the inode number. On 32-bit the unique identifier can be reconstructed using ->st_ino and the inode generation number which can be retrieved via the FS_IOC_GETVERSION ioctl. So the 64-bit identifier on 32-bit is reconstructed by using ->st_ino as the lower 32-bits and the 32-bit generation number as the upper 32-bits.
Also note that since the introduction of pidfs each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to pre-pidfs pidfds where each struct pid refered to the same inode.
IOW, with pidfs statx() is sufficient to compare to pidfds whether they refer to the same process. On 64-bit it's sufficient to do the usual st1->st_dev == st2->st_dev && st1->st_ino == st2->st_ino and on 32-bit you will want to also compare the generation number:
TEST_F(pidfd_bind_mount, reopen) { int pidfd; char proc_path[PATH_MAX];
sprintf(proc_path, "/proc/self/fd/%d", self->pidfd); pidfd = open(proc_path, O_RDONLY | O_NOCTTY | O_CLOEXEC); ASSERT_GE(pidfd, 0);
ASSERT_GE(fstat(self->pidfd, &self->st2), 0); ASSERT_EQ(ioctl(self->pidfd, FS_IOC_GETVERSION, &self->gen2), 0);
ASSERT_TRUE(self->st1.st_dev == self->st2.st_dev && self->st1.st_ino == self->st2.st_ino); ASSERT_TRUE(self->gen1 == self->gen2);
ASSERT_EQ(close(pidfd), 0); }
Plus, you can bind-mount them now.
In any case, this allows us to create file handles that are unique for the lifetime of the system. Please see
tools/testing/selftests/pidfd/pidfd_file_handle_test.c
for how that works. The gist is that decoding and encoding for pidfs is unprivileged and the only requirement we have is that the process the file handle resolves to must be valid in the caller's pid namespace hierarchy:
TEST_F(file_handle, file_handle_child_pidns) { int mnt_id; struct file_handle *fh; int pidfd = -EBADF; struct stat st1, st2;
fh = malloc(sizeof(struct file_handle) + MAX_HANDLE_SZ); ASSERT_NE(fh, NULL); memset(fh, 0, sizeof(struct file_handle) + MAX_HANDLE_SZ); fh->handle_bytes = MAX_HANDLE_SZ;
ASSERT_EQ(name_to_handle_at(self->child_pidfd2, "", fh, &mnt_id, AT_EMPTY_PATH), 0);
ASSERT_EQ(fstat(self->child_pidfd2, &st1), 0);
pidfd = open_by_handle_at(self->pidfd, fh, 0); ASSERT_GE(pidfd, 0);
ASSERT_EQ(fstat(pidfd, &st2), 0); ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);
ASSERT_EQ(close(pidfd), 0);
pidfd = open_by_handle_at(self->pidfd, fh, O_CLOEXEC); ASSERT_GE(pidfd, 0);
ASSERT_EQ(fstat(pidfd, &st2), 0); ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);
ASSERT_EQ(close(pidfd), 0);
pidfd = open_by_handle_at(self->pidfd, fh, O_NONBLOCK); ASSERT_GE(pidfd, 0);
ASSERT_EQ(fstat(pidfd, &st2), 0); ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);
ASSERT_EQ(close(pidfd), 0);
free(fh); }
So you don't need to keep the fd open.
Applied to the vfs-6.15.pidfs branch of the vfs/vfs.git tree. Patches in the vfs-6.15.pidfs branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase, trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git branch: vfs-6.15.pidfs
[1/6] pidfd: add PIDFD_SELF* sentinels to refer to own thread/process https://git.kernel.org/vfs/vfs/c/e6e4ed42f8d8 [2/6] selftests/pidfd: add missing system header imcludes to pidfd tests https://git.kernel.org/vfs/vfs/c/c9f04f4a251d [3/6] tools: testing: separate out wait_for_pid() into helper header https://git.kernel.org/vfs/vfs/c/fb67fe44116e [4/6] selftests: pidfd: add pidfd.h UAPI wrapper https://git.kernel.org/vfs/vfs/c/ac331e56724d [5/6] selftests: pidfd: add tests for PIDFD_SELF_* https://git.kernel.org/vfs/vfs/c/881a3515c191 [6/6] selftests/mm: use PIDFD_SELF in guard pages test https://git.kernel.org/vfs/vfs/c/b4703f056f42
linux-kselftest-mirror@lists.linaro.org