[RFC PATCH 0/3] introduce PIDFD_SELF

List overview All Threads
Download

newer

older

[PATCH] selftests: vDSO: align...

Re: [PATCH v2 1/2] exec: add a...

Lorenzo Stoakes

30 Sep 2024 30 Sep '24

9:22 a.m.

If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

This series introduces such a sentinel, PIDFD_SELF, which can be passed as the pidfd in this instance rather than having to establish a dummy fd for this purpose.

The only pidfd interface where this is particularly useful is process_madvise(), which provides the motivation for this series. However, as this is a general interface, we ensure that all pidfd interfaces can handle this correctly.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and assert as much in selftests. All other interfaces except setns() will work implicitly with this new interface, however it doesn't make sense to test waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.

In the case of setns() we explicitly disallow use of PIDFD_SELF as it doesn't make sense to obtain the namespaces of our own process, and it would require work to implement this functionality there that would be of no use.

We also do not provide the ability to utilise PIDFD_SELF in ordinary fd operations such as open() or poll(), as this would require extensive work and be of no real use.

Lorenzo Stoakes (3): pidfd: refactor pidfd_get_pid/to_pid() and de-duplicate pid lookup pidfd: add PIDFD_SELF sentinel to refer to own process selftests: pidfd: add tests for PIDFD_SELF

-- 2.46.2

Show replies by date

Lorenzo Stoakes

30 Sep 30 Sep

9:22 a.m.

New subject: [RFC PATCH 1/3] pidfd: refactor pidfd_get_pid/to_pid() and de-duplicate pid lookup

The means by which a pid is determined from a pidfd is duplicated, some callers holding a reference to the (pid)fd, and others explicitly pinning the pid.

Introduce __pidfd_get_pid() which abstracts both approaches and provide optional output parameters for file->f_flags and the fd (the latter of which, if provided, prevents the function from decrementing the fd's reference count).

We add a wrapper function pidfd_get_pid() which performs the same functionality as the original, only deferring to __pidfd_get_pid() for the heavy lifting.

Additionally, abstract the ability to open a pidfd by opening a /proc/<pid> directory (used by the pidfd_send_signal() system call), providing a pidfd_get_pid_proc() wrapper function to do so.

Doing this allows us to eliminate open-coded pidfd pid lookup and to consistently handle this in one place.

This lays the groundwork for a subsequent patch which adds a new sentinel pidfd to explicitly reference the current process (i.e. thread group leader) without the need for a pidfd.

Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- include/linux/pid.h | 42 +++++++++++++++++++++++++++++++++- kernel/pid.c | 55 +++++++++++++++++++++++++++++++-------------- kernel/signal.c | 26 +++++---------------- 3 files changed, 84 insertions(+), 39 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h index a3aad9b4074c..68b02eab7509 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -2,6 +2,7 @@ #ifndef _LINUX_PID_H #define _LINUX_PID_H

+#include <linux/file.h> #include <linux/pid_types.h> #include <linux/rculist.h> #include <linux/rcupdate.h> @@ -72,8 +73,47 @@ extern struct pid init_struct_pid;

struct file;

+ +/** + * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. + * + * @pidfd: The pidfd whose pid we want, or the fd of a /proc/<pid> file if + * @alloc_proc is also set. + * @pin_pid: If set, then the reference counter of the returned pid is + * incremented. If not set, then @fd should be provided to pin the + * pidfd. + * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead + * of a pidfd, and this will be used to determine the pid. + * @flags: Output variable, if non-NULL, then the file->f_flags of the + * pidfd will be set here. + * @fd: Output variable, if non-NULL, then the pidfd reference will + * remain elevated and the caller will need to decrement it + * themselves. + * + * Returns: If successful, the pid associated with the pidfd, otherwise an + * error. + */ +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid, + bool allow_proc, unsigned int *flags, + struct fd *fd); + +static inline struct pid *pidfd_get_pid(unsigned int pidfd, unsigned int *flags) +{ + return __pidfd_get_pid(pidfd, /* pin_pid = */ true, + /* allow_proc = */ false, + flags, /* fd = */ NULL); +} + +static inline struct pid *pidfd_to_pid_proc(unsigned int pidfd, + unsigned int *flags, + struct fd *fd) +{ + return __pidfd_get_pid(pidfd, /* pin_pid = */ false, + /* allow_proc = */ true, + flags, fd); +} + struct pid *pidfd_pid(const struct file *file); -struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags); struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags); int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret); void do_notify_pidfd(struct task_struct *task); diff --git a/kernel/pid.c b/kernel/pid.c index 2715afb77eab..26e2581210c4 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -36,6 +36,7 @@ #include <linux/pid_namespace.h> #include <linux/init_task.h> #include <linux/syscalls.h> +#include <linux/proc_fs.h> #include <linux/proc_ns.h> #include <linux/refcount.h> #include <linux/anon_inodes.h> @@ -534,22 +535,47 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) } EXPORT_SYMBOL_GPL(find_ge_pid);

-struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags) +struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid, + bool allow_proc, unsigned int *flags, + struct fd *fd) { struct fd f; struct pid *pid; + struct file *file;

f = fdget(fd); - if (!fd_file(f)) + file = fd_file(f); + if (!file) return ERR_PTR(-EBADF);

- pid = pidfd_pid(fd_file(f)); - if (!IS_ERR(pid)) { - get_pid(pid); - *flags = fd_file(f)->f_flags; + pid = pidfd_pid(file); + /* If we allow opening a pidfd via /proc/<pid>, do so. */ + if (IS_ERR(pid) && allow_proc) + pid = tgid_pidfd_to_pid(file); + + if (IS_ERR(pid)) { + fdput(f); + return pid; }

- fdput(f); + if (pin_pid) + get_pid(pid); + else + WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */ + + if (flags) + *flags = file->f_flags; + + /* + * If the user provides an fd output then it will handle decrementing + * its reference counter. + */ + if (fd) + *fd = f; + else + /* Otherwise we release it. */ + fdput(f); + return pid; }

@@ -747,23 +773,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd, unsigned int, flags) { struct pid *pid; - struct fd f; int ret;

/* flags is currently unused - make sure it's unset */ if (flags) return -EINVAL;

- f = fdget(pidfd); - if (!fd_file(f)) - return -EBADF; - - pid = pidfd_pid(fd_file(f)); + pid = pidfd_get_pid(pidfd, NULL); if (IS_ERR(pid)) - ret = PTR_ERR(pid); - else - ret = pidfd_getfd(pid, fd); + return PTR_ERR(pid);

- fdput(f); + ret = pidfd_getfd(pid, fd); + + put_pid(pid); return ret; } diff --git a/kernel/signal.c b/kernel/signal.c index 6e57036f947f..abbb1931deba 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -3894,17 +3894,6 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, return copy_siginfo_from_user(kinfo, info); }

-static struct pid *pidfd_to_pid(const struct file *file) -{ - struct pid *pid; - - pid = pidfd_pid(file); - if (!IS_ERR(pid)) - return pid; - - return tgid_pidfd_to_pid(file); -} - #define PIDFD_SEND_SIGNAL_FLAGS \ (PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \ PIDFD_SIGNAL_PROCESS_GROUP) @@ -3931,6 +3920,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig, struct pid *pid; kernel_siginfo_t kinfo; enum pid_type type; + unsigned int f_flags;

/* Enforce flags be set to 0 until we add an extension. */ if (flags & ~PIDFD_SEND_SIGNAL_FLAGS) @@ -3940,16 +3930,10 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig, if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1) return -EINVAL;

- f = fdget(pidfd); - if (!fd_file(f)) - return -EBADF; - /* Is this a pidfd? */ - pid = pidfd_to_pid(fd_file(f)); - if (IS_ERR(pid)) { - ret = PTR_ERR(pid); - goto err; - } + pid = pidfd_to_pid_proc(pidfd, &f_flags, &f); + if (IS_ERR(pid)) + return PTR_ERR(pid);

ret = -EINVAL; if (!access_pidfd_pidns(pid)) @@ -3958,7 +3942,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig, switch (flags) { case 0: /* Infer scope from the type of pidfd. */ - if (fd_file(f)->f_flags & PIDFD_THREAD) + if (f_flags & PIDFD_THREAD) type = PIDTYPE_PID; else type = PIDTYPE_TGID;

-- 2.46.2

Lorenzo Stoakes

9:22 a.m.

New subject: [RFC PATCH 2/3] pidfd: add PIDFD_SELF sentinel to refer to own process

Add a PIDFD_SELF special sentinel value which can be passed as a pidfd to indicate that the current process is to be targeted (as this is a process from the userland perspective, this means from the kernel's perspective the current thread group leader is to be targeted).

Due to the refactoring of the central __pidfd_get_pid() function we can implement this functionality centrally, providing the use of this sentinel in most functionality which utilises pidfd's.

We need to explicitly adjust kernel_waitid_prepare() to permit this (though it wouldn't really make sense to use this there, we provide the ability for consistency).

We explicitly disallow use of this in setns(), which would otherwise have required explicit custom handling, as it doesn't make sense to set the current calling thread to join the namespace of itself.

As the callers of pidfd_get_pid() expect an increased reference count on the pid we do so in this case even for the current thread leader, reducing churn and avoiding any breakage from existing logic which decrements this reference count.

In the pidfd_send_signal() system call, we can continue to fdput() the struct fd output by pidfs_to_pid_proc() even if PIDFD_SELF is specified, as this will be empty and the invocation will be a no-op.

This change implicitly provides PIDFD_SELF support in the waitid(P_PIDFS, ...), process_madvise(), process_mrelease(), pidfd_send_signal(), and pidfd_getfd() system calls.

Things such as polling a pidfs and general fd operations are not supported, this strictly provides the sentinel for APIs which explicitly accept a pidfd.

Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- include/linux/pid.h | 9 +++--- include/uapi/linux/pidfd.h | 3 ++ kernel/exit.c | 3 +- kernel/nsproxy.c | 1 + kernel/pid.c | 63 ++++++++++++++++++++++---------------- 5 files changed, 47 insertions(+), 32 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h index 68b02eab7509..e95dd7b81ae2 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -77,18 +77,19 @@ struct file; /** * __pidfd_get_pid() - Retrieve a pid associated with the specified pidfd. * - * @pidfd: The pidfd whose pid we want, or the fd of a /proc/<pid> file if - * @alloc_proc is also set. + * @pidfd: The pidfd whose pid we want, the fd of a /proc/<pid> file if + * @alloc_proc is also set, or PIDFD_SELF if the pid we require + * is the thread group leader. * @pin_pid: If set, then the reference counter of the returned pid is * incremented. If not set, then @fd should be provided to pin the * pidfd. * @allow_proc: If set, then an fd of a /proc/<pid> file can be passed instead * of a pidfd, and this will be used to determine the pid. * @flags: Output variable, if non-NULL, then the file->f_flags of the - * pidfd will be set here. + * pidfd will be set here. If PIDFD_SELF set, this is zero. * @fd: Output variable, if non-NULL, then the pidfd reference will * remain elevated and the caller will need to decrement it - * themselves. + * themselves. If PIDFD_SELF set, this is empty. * * Returns: If successful, the pid associated with the pidfd, otherwise an * error. diff --git a/include/uapi/linux/pidfd.h b/include/uapi/linux/pidfd.h index 565fc0629fff..0bff5276b51d 100644 --- a/include/uapi/linux/pidfd.h +++ b/include/uapi/linux/pidfd.h @@ -29,4 +29,7 @@ #define PIDFD_GET_USER_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 9) #define PIDFD_GET_UTS_NAMESPACE _IO(PIDFS_IOCTL_MAGIC, 10)

+#define PIDFD_SELF -200 /* Special sentinel value which can be passed as a pidfd + * to refer to the current thread group leader. */ + #endif /* _UAPI_LINUX_PIDFD_H */ diff --git a/kernel/exit.c b/kernel/exit.c index 619f0014c33b..fc10eb2b180c 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -71,6 +71,7 @@ #include <linux/user_events.h> #include <linux/uaccess.h>

+#include <uapi/linux/pidfd.h> #include <uapi/linux/wait.h>

#include <asm/unistd.h> @@ -1739,7 +1740,7 @@ int kernel_waitid_prepare(struct wait_opts *wo, int which, pid_t upid, break; case P_PIDFD: type = PIDTYPE_PID; - if (upid < 0) + if (upid < 0 && upid != PIDFD_SELF) return -EINVAL;

pid = pidfd_get_pid(upid, &f_flags); diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c index dc952c3b05af..aee86fcaf670 100644 --- a/kernel/nsproxy.c +++ b/kernel/nsproxy.c @@ -550,6 +550,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags) struct nsset nsset = {}; int err = 0;

+ /* If fd is PIDFD_SELF, implicitly fail here, as invalid. */ if (!fd_file(f)) return -EBADF;

diff --git a/kernel/pid.c b/kernel/pid.c index 26e2581210c4..9c037441afb8 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -539,23 +539,28 @@ struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid, bool allow_proc, unsigned int *flags, struct fd *fd) { - struct fd f; struct pid *pid; - struct file *file; - - f = fdget(fd); - file = fd_file(f); - if (!file) - return ERR_PTR(-EBADF); - - pid = pidfd_pid(file); - /* If we allow opening a pidfd via /proc/<pid>, do so. */ - if (IS_ERR(pid) && allow_proc) - pid = tgid_pidfd_to_pid(file); - - if (IS_ERR(pid)) { - fdput(f); - return pid; + struct fd f = {}; + struct file *file = NULL; + unsigned int f_flags = 0; + + if (pidfd == PIDFD_SELF) { + pid = *task_pid_ptr(current, PIDTYPE_TGID); + } else { + f = fdget(pidfd); + file = fd_file(f); + if (!file) + return ERR_PTR(-EBADF); + + pid = pidfd_pid(file); + /* If we allow opening a pidfd via /proc/<pid>, do so. */ + if (IS_ERR(pid) && allow_proc) + pid = tgid_pidfd_to_pid(file); + + if (IS_ERR(pid)) { + fdput(f); + return pid; + } }

if (pin_pid) @@ -563,18 +568,22 @@ struct pid *__pidfd_get_pid(unsigned int pidfd, bool pin_pid, else WARN_ON_ONCE(!fd); /* Nothing to keep pid/pidfd around? */

- if (flags) - *flags = file->f_flags; + if (file) { + f_flags = file->f_flags;

- /* - * If the user provides an fd output then it will handle decrementing - * its reference counter. - */ - if (fd) - *fd = f; - else - /* Otherwise we release it. */ - fdput(f); + /* + * If the user provides an fd output then it will handle decrementing + * its reference counter. + */ + if (fd) + *fd = f; + else + /* Otherwise we release it. */ + fdput(f); + } + + if (flags) + *flags = f_flags;

return pid; }

-- 2.46.2

Lorenzo Stoakes

9:22 a.m.

New subject: [RFC PATCH 3/3] selftests: pidfd: add tests for PIDFD_SELF

Add tests to assert that PIDFD_SELF correctly refers to the current process.

This is only practically meaningful to pidfd_send_signal() and pidfd_getfd(), but also explicitly test that we disallow this feature for setns() where it would make no sense.

We cannot reasonably wait on ourself using waitid(P_PIDFD, ...) so while in theory PIDFD_SELF would work here, we'd be left blocked if we tried it.

We defer testing of mm-specific functionality which uses pidfd, namely process_madvise() and process_mrelease() to mm testing (though note the latter can not be sensibly tested as it would require the testing process to be dying).

Signed-off-by: Lorenzo Stoakes lorenzo.stoakes@oracle.com --- tools/testing/selftests/pidfd/pidfd.h | 5 +++ .../selftests/pidfd/pidfd_getfd_test.c | 38 +++++++++++++++++++ .../selftests/pidfd/pidfd_setns_test.c | 6 +++ tools/testing/selftests/pidfd/pidfd_test.c | 13 +++++++ 4 files changed, 62 insertions(+)

diff --git a/tools/testing/selftests/pidfd/pidfd.h b/tools/testing/selftests/pidfd/pidfd.h index 88d6830ee004..099ee7178193 100644 --- a/tools/testing/selftests/pidfd/pidfd.h +++ b/tools/testing/selftests/pidfd/pidfd.h @@ -50,6 +50,11 @@ #define PIDFD_NONBLOCK O_NONBLOCK #endif

+/* System header file may not have this available. */ +#ifndef PIDFD_SELF +#define PIDFD_SELF -200 +#endif + /* * The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c * That means, when it wraps around any pid < 300 will be skipped. diff --git a/tools/testing/selftests/pidfd/pidfd_getfd_test.c b/tools/testing/selftests/pidfd/pidfd_getfd_test.c index cd51d547b751..cdf375fd61b2 100644 --- a/tools/testing/selftests/pidfd/pidfd_getfd_test.c +++ b/tools/testing/selftests/pidfd/pidfd_getfd_test.c @@ -15,6 +15,7 @@ #include <sys/prctl.h> #include <sys/wait.h> #include <unistd.h> +#include <sys/mman.h> #include <sys/socket.h> #include <linux/kcmp.h>

@@ -264,6 +265,43 @@ TEST_F(child, no_strange_EBADF) EXPECT_EQ(errno, ESRCH); }

+TEST(pidfd_self) +{ + int memfd = sys_memfd_create("test_self", 0); + long page_size = sysconf(_SC_PAGESIZE); + int newfd; + char *ptr; + + ASSERT_GE(memfd, 0); + ASSERT_EQ(ftruncate(memfd, page_size), 0); + + /* + * Map so we can assert that the duplicated fd references the same + * memory. + */ + ptr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, memfd, 0); + ASSERT_NE(ptr, MAP_FAILED); + ptr[0] = 'x'; + ASSERT_EQ(munmap(ptr, page_size), 0); + + /* Now get a duplicate of our memfd. */ + newfd = sys_pidfd_getfd(PIDFD_SELF, memfd, 0); + ASSERT_GE(newfd, 0); + ASSERT_NE(memfd, newfd); + + /* Now map duplicate fd and make sure it references the same memory. */ + ptr = mmap(NULL, page_size, PROT_READ | PROT_WRITE, + MAP_SHARED, newfd, 0); + ASSERT_NE(ptr, MAP_FAILED); + ASSERT_EQ(ptr[0], 'x'); + ASSERT_EQ(munmap(ptr, page_size), 0); + + /* Cleanup. */ + close(memfd); + close(newfd); +} + #if __NR_pidfd_getfd == -1 int main(void) { diff --git a/tools/testing/selftests/pidfd/pidfd_setns_test.c b/tools/testing/selftests/pidfd/pidfd_setns_test.c index 7c2a4349170a..8e1510744aaa 100644 --- a/tools/testing/selftests/pidfd/pidfd_setns_test.c +++ b/tools/testing/selftests/pidfd/pidfd_setns_test.c @@ -752,4 +752,10 @@ TEST(setns_einval) close(fd); }

+TEST(setns_pidfd_self_disallowed) +{ + ASSERT_EQ(setns(PIDFD_SELF, 0), -1); + EXPECT_EQ(errno, EBADF); +} + TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c index 9faa686f90e4..18802b657352 100644 --- a/tools/testing/selftests/pidfd/pidfd_test.c +++ b/tools/testing/selftests/pidfd/pidfd_test.c @@ -85,6 +85,19 @@ static int test_pidfd_send_signal_simple_success(void) test_name);

signal_received = 0; + + /* Now try the same thing only using PIDFD_SELF. */ + ret = sys_pidfd_send_signal(PIDFD_SELF, SIGUSR1, NULL, 0); + if (ret < 0) + ksft_exit_fail_msg("%s test: Failed to send PIDFD_SELF signal\n", + test_name); + + if (signal_received != 1) + ksft_exit_fail_msg("%s test: Failed to receive PIDFD_SELF signal\n", + test_name); + + signal_received = 0; + ksft_test_result_pass("%s test: Sent signal\n", test_name); return 0; }

-- 2.46.2

Florian Weimer

10:33 a.m.

* Lorenzo Stoakes:

...

If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

Thanks, Florian

Lorenzo Stoakes

10:39 a.m.

On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

...
If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...

The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

...

Thanks, Florian

Christian Brauner

12:34 p.m.

On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:

...

On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

...
If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...
The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0); (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal() pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the operation has thread-group scope or thread-scope for pidfd_send_signal() and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

T2 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

You have 3 threads in the same thread-group that all have distinct file descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect to get the fd from its file descriptor table. IOW, its reasonable to expect that T1 is interested in their very own resource, not someone else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the question is whether we should add the same confusion to the pidfd api or whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual calling thread.

My thinking is that if you have the reasonable suspicion that you're multi-threaded and that you're interested in the thread-group resource then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested in another thread. But if you're really just interested in your own resource then pidfd_open(getpid(), 0) makes no sense and you would want PIDFD_SELF.

Thoughts?

Lorenzo Stoakes

1:10 p.m.

On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:

...

On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:

...
On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

...
If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...
The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0); (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal() pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the operation has thread-group scope or thread-scope for pidfd_send_signal() and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

T2 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

You have 3 threads in the same thread-group that all have distinct file descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect to get the fd from its file descriptor table. IOW, its reasonable to expect that T1 is interested in their very own resource, not someone else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the question is whether we should add the same confusion to the pidfd api or whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual calling thread.

My thinking is that if you have the reasonable suspicion that you're multi-threaded and that you're interested in the thread-group resource then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested in another thread. But if you're really just interested in your own resource then pidfd_open(getpid(), 0) makes no sense and you would want PIDFD_SELF.

Thoughts?

I mean from my perspective, my aim is to get current->mm for process_madvise() so both work for me :) however you both raise a very good point here (sorry Florian, perhaps I was a little too dismissive as to your point, you're absolutely right).

My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0) behaviour, but you and Florian make a strong case that you'd _probably_ find this very confusing had you unshared in this fashion.

I mean in general this confusion already exists, and is for what PIDFD_THREAD was created, but I suspect ideally if you could go back you might actually do this by default Christian + let the TGL behaviour be the optional thing?

For most users this will not be an issue, but for those they'd get the same result whichever they used, but yes actually I think you're both right - PIDFD_SELF should in effect imply PIDFD_THREAD.

We can adjust the pidfd_send_signal() call to infer the correct scope (actually nicely we can do that without any change there, by having __pidfd_get_pid() set f_flags accordingly).

So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.

My question in return here then is - should we introduce PIDFD_SELF_PROCESS also (do advise if you feel this naming isn't quite right) - to provide thread group leader behaviour?

Thanks!

Aleksa Sarai

2:21 p.m.

On 2024-09-30, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:

...

On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:

...
On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:

...
On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

...
If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...
The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0); (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal() pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the operation has thread-group scope or thread-scope for pidfd_send_signal() and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

T2 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

You have 3 threads in the same thread-group that all have distinct file descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect to get the fd from its file descriptor table. IOW, its reasonable to expect that T1 is interested in their very own resource, not someone else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the question is whether we should add the same confusion to the pidfd api or whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual calling thread.

My thinking is that if you have the reasonable suspicion that you're multi-threaded and that you're interested in the thread-group resource then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested in another thread. But if you're really just interested in your own resource then pidfd_open(getpid(), 0) makes no sense and you would want PIDFD_SELF.

Thoughts?

I mean from my perspective, my aim is to get current->mm for process_madvise() so both work for me :) however you both raise a very good point here (sorry Florian, perhaps I was a little too dismissive as to your point, you're absolutely right).

My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0) behaviour, but you and Florian make a strong case that you'd _probably_ find this very confusing had you unshared in this fashion.

I mean in general this confusion already exists, and is for what PIDFD_THREAD was created, but I suspect ideally if you could go back you might actually do this by default Christian + let the TGL behaviour be the optional thing?

For most users this will not be an issue, but for those they'd get the same result whichever they used, but yes actually I think you're both right - PIDFD_SELF should in effect imply PIDFD_THREAD.

Funnily enough we ran into issues with this when running Go code in runc that did precisely this -- /proc/self gave you the wrong fd table in very specific circumstances that were annoying to debug. For languages with green-threading you can't turn off (like Go) these kinds of issues pop up surprisingly often.

...

We can adjust the pidfd_send_signal() call to infer the correct scope (actually nicely we can do that without any change there, by having __pidfd_get_pid() set f_flags accordingly).

So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.

My question in return here then is - should we introduce PIDFD_SELF_PROCESS also (do advise if you feel this naming isn't quite right) - to provide thread group leader behaviour?

Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for current's tid)? In principle I guess users might use PIDFD_SELF by accident but if we mirror the naming with /proc/{,thread-}self that might not be that big of an issue?

Just a thought.

...

Thanks!

-- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/

Lorenzo Stoakes

2:32 p.m.

On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:

...

On 2024-09-30, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:

...
On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:

...
On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:

...
On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

...
If you wish to utilise a pidfd interface to refer to the current process (from the point of view of userland - from the kernel point of view - the thread group leader), it is rather cumbersome, requiring something like:

int pidfd = pidfd_open(getpid(), 0);

...

close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a sentinel value to indicate to an interface that accepts a pidfd that we simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...
The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0); (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal() pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the operation has thread-group scope or thread-scope for pidfd_send_signal() and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

T2 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

You have 3 threads in the same thread-group that all have distinct file descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect to get the fd from its file descriptor table. IOW, its reasonable to expect that T1 is interested in their very own resource, not someone else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the question is whether we should add the same confusion to the pidfd api or whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual calling thread.

My thinking is that if you have the reasonable suspicion that you're multi-threaded and that you're interested in the thread-group resource then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested in another thread. But if you're really just interested in your own resource then pidfd_open(getpid(), 0) makes no sense and you would want PIDFD_SELF.

Thoughts?

I mean from my perspective, my aim is to get current->mm for process_madvise() so both work for me :) however you both raise a very good point here (sorry Florian, perhaps I was a little too dismissive as to your point, you're absolutely right).

My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0) behaviour, but you and Florian make a strong case that you'd _probably_ find this very confusing had you unshared in this fashion.

I mean in general this confusion already exists, and is for what PIDFD_THREAD was created, but I suspect ideally if you could go back you might actually do this by default Christian + let the TGL behaviour be the optional thing?

For most users this will not be an issue, but for those they'd get the same result whichever they used, but yes actually I think you're both right - PIDFD_SELF should in effect imply PIDFD_THREAD.

Funnily enough we ran into issues with this when running Go code in runc that did precisely this -- /proc/self gave you the wrong fd table in very specific circumstances that were annoying to debug. For languages with green-threading you can't turn off (like Go) these kinds of issues pop up surprisingly often.

Yeah, damn, useful insight that such things do happen in the wild.

...

...
We can adjust the pidfd_send_signal() call to infer the correct scope (actually nicely we can do that without any change there, by having __pidfd_get_pid() set f_flags accordingly).

So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.

My question in return here then is - should we introduce PIDFD_SELF_PROCESS also (do advise if you feel this naming isn't quite right) - to provide thread group leader behaviour?

Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for current's tid)? In principle I guess users might use PIDFD_SELF by accident but if we mirror the naming with /proc/{,thread-}self that might not be that big of an issue?

Lol, you know I wasn't even aware /proc/thread-self existed...

Yeah, that actually makes sense and is consistent, though obviously the concern is people will reflexively use PIDFD_SELF and end up with potentially confusing results.

I will obviously be doing a manpage patch for this so we can spell it out there very clearly and also in the header - so I'd actually lean towards doing this myself.

Christian, Florian? Thoughts?

Thanks!

...

Just a thought.

...
Thanks!

-- Aleksa Sarai Senior Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/

Christian Brauner

1 Oct 1 Oct

10:21 a.m.

On Mon, Sep 30, 2024 at 03:32:25PM GMT, Lorenzo Stoakes wrote:

...

On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:

...
On 2024-09-30, Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:

...
On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:

...
On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:

...
On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:

...

Lorenzo Stoakes:

> If you wish to utilise a pidfd interface to refer to the current process > (from the point of view of userland - from the kernel point of view - the > thread group leader), it is rather cumbersome, requiring something like: > > int pidfd = pidfd_open(getpid(), 0); > > ... > > close(pidfd); > > Or the equivalent call opening /proc/self. It is more convenient to use a > sentinel value to indicate to an interface that accepts a pidfd that we > simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

...
The distinction matters for pidfd_getfd if a process contains multiple threads with different file descriptor tables, and probably for pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are shared right?

Again, we are explicitly looking at process not thread from userland perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0); (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal() pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the operation has thread-group scope or thread-scope for pidfd_send_signal() and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

T2 { clone(CLONE_THREAD) unshare(CLONE_FILES) }

You have 3 threads in the same thread-group that all have distinct file descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect to get the fd from its file descriptor table. IOW, its reasonable to expect that T1 is interested in their very own resource, not someone else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the question is whether we should add the same confusion to the pidfd api or whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual calling thread.

My thinking is that if you have the reasonable suspicion that you're multi-threaded and that you're interested in the thread-group resource then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested in another thread. But if you're really just interested in your own resource then pidfd_open(getpid(), 0) makes no sense and you would want PIDFD_SELF.

Thoughts?

I mean from my perspective, my aim is to get current->mm for process_madvise() so both work for me :) however you both raise a very good point here (sorry Florian, perhaps I was a little too dismissive as to your point, you're absolutely right).

My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0) behaviour, but you and Florian make a strong case that you'd _probably_ find this very confusing had you unshared in this fashion.

I mean in general this confusion already exists, and is for what PIDFD_THREAD was created, but I suspect ideally if you could go back you might actually do this by default Christian + let the TGL behaviour be the optional thing?

For most users this will not be an issue, but for those they'd get the same result whichever they used, but yes actually I think you're both right - PIDFD_SELF should in effect imply PIDFD_THREAD.

Funnily enough we ran into issues with this when running Go code in runc that did precisely this -- /proc/self gave you the wrong fd table in very specific circumstances that were annoying to debug. For languages with green-threading you can't turn off (like Go) these kinds of issues pop up surprisingly often.

Yeah, damn, useful insight that such things do happen in the wild.

...
...
We can adjust the pidfd_send_signal() call to infer the correct scope (actually nicely we can do that without any change there, by having __pidfd_get_pid() set f_flags accordingly).

So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.

My question in return here then is - should we introduce PIDFD_SELF_PROCESS also (do advise if you feel this naming isn't quite right) - to provide thread group leader behaviour?

Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for current's tid)? In principle I guess users might use PIDFD_SELF by accident but if we mirror the naming with /proc/{,thread-}self that might not be that big of an issue?

Lol, you know I wasn't even aware /proc/thread-self existed...

Wait until you learn that /proc/$TID thread entries exist but aren't shown when you do ls -al /proc, only when you explicitly access them.

...

Yeah, that actually makes sense and is consistent, though obviously the concern is people will reflexively use PIDFD_SELF and end up with potentially confusing results.

I will obviously be doing a manpage patch for this so we can spell it out there very clearly and also in the header - so I'd actually lean towards doing this myself.

Christian, Florian? Thoughts?

I think adding both would be potentially useful. How about:

#define PIDFD_SELF -100 #define PIDFD_THREAD_GROUP -200

This will make PIDFD_SELF mean current and PIDFD_THREAD_GROUP mean current->pid_links[PIDTYPE_TGID]. I don't think we need to or should mirror procfs in any way. pidfds are intended to be usable without procfs at all.

I want to leave one comment on a bit of quirkiness in the api when we add this. I don't consider it a big deal but it should be pointed out.

It can be useful to allow PIDFD_SELF or PIDFD_THREAD_GROUP to refer to the calling thread even for pidfd_open() to avoid an additional getpid() system call:

(1) pidfd_open(PIDFD_SELF, PIDFD_THREAD) (2) pidfd_open(PIDFD_SELF, 0)

So if we allow this (Should we allow it?) then (1) is just redundant but whathever. But (2) is at least worth discussing: Should we reject (2) on the grounds of contradictory requests or allow it and document that it's equivalent to pidfd_open(getpid(), PIDFD_THREAD)? It feels like the latter would be ok.

Similar for pidfd_send_signal():

// redundant but ok: (1) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD)

// redundant but ok: (2) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_THREAD_GROUP, 0) (3) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_SELF, 0) (4) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD)

I think all of this is ok but does anyone else have a strong opinion?

Lorenzo Stoakes

2:31 p.m.

On Tue, Oct 01, 2024 at 12:21:32PM GMT, Christian Brauner wrote:

...

On Mon, Sep 30, 2024 at 03:32:25PM GMT, Lorenzo Stoakes wrote:

...
On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:

[snip]

...

...
...
Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for current's tid)? In principle I guess users might use PIDFD_SELF by accident but if we mirror the naming with /proc/{,thread-}self that might not be that big of an issue?

Lol, you know I wasn't even aware /proc/thread-self existed...

Wait until you learn that /proc/$TID thread entries exist but aren't shown when you do ls -al /proc, only when you explicitly access them.

My God... You're right, that's crazy... :)

...

...
Yeah, that actually makes sense and is consistent, though obviously the concern is people will reflexively use PIDFD_SELF and end up with potentially confusing results.

I will obviously be doing a manpage patch for this so we can spell it out there very clearly and also in the header - so I'd actually lean towards doing this myself.

Christian, Florian? Thoughts?

I think adding both would be potentially useful. How about:

#define PIDFD_SELF -100 #define PIDFD_THREAD_GROUP -200

Sure, makes sense to add both.

...

This will make PIDFD_SELF mean current and PIDFD_THREAD_GROUP mean current->pid_links[PIDTYPE_TGID]. I don't think we need to or should mirror procfs in any way. pidfds are intended to be usable without procfs at all.

Yeah, I think it's important to ensure the _default_ choice, so in this case PIDFD_SELF clearly, is one that will be least surprising.

The proc thing is sort of pleasing from an aesthetic point of view, but if you followed it you'd have to _clearly_ document PIDFD_THREAD_SELF as the default.

Happy to go along with this. PIDFD_THREAD_GROUP is also clearer as it is distinct from PIDFD_SELF (doesn't reference 'self' at all).

...

I want to leave one comment on a bit of quirkiness in the api when we add this. I don't consider it a big deal but it should be pointed out.

It can be useful to allow PIDFD_SELF or PIDFD_THREAD_GROUP to refer to the calling thread even for pidfd_open() to avoid an additional getpid() system call:

(1) pidfd_open(PIDFD_SELF, PIDFD_THREAD) (2) pidfd_open(PIDFD_SELF, 0)

Hm, this is a bit weird, as these are pid_t's and PIDFD_SELF and PIDFD_THREAD_GROUP are otherwise (pid)fd's.

Being dummy values sort of allows us to put them into service here also, but it is just weird, we pass what is usually a pidfd to receive a pidfd, only this time it's an actually concrete one?

I'm not sure I like this, even though as you say it avoids a getpid().

If we did this I'd prefer it to be a separate name, even if it has the same numeric value (I guess we also might want to check for anything that uses a negative pid_t to refer to an error or something else too).

Perhaps PID_SELF and PID_THREAD_GROUP?

...

So if we allow this (Should we allow it?) then (1) is just redundant but whathever. But (2) is at least worth discussing: Should we reject (2) on the grounds of contradictory requests or allow it and document that it's equivalent to pidfd_open(getpid(), PIDFD_THREAD)? It feels like the latter would be ok.

Similar for pidfd_send_signal():

// redundant but ok: (1) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD)

// redundant but ok: (2) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_THREAD_GROUP, 0) (3) pidfd_send_signal(PIDFD_SELF, PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_SELF, 0) (4) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD)

I think all of this is ok but does anyone else have a strong opinion?

I think it's fine to allow all 4 and we should get this behaviour by default (if we have no flags we use the f_flags as a hint, which will be set correctly).

I think people might find contradictory ones, i.e. 3 and 4, strange, but it makes sense for the flags to override the pidfd (as they would for a non-sentinel pidfd) and it makes everything consistent vs. if you were not using a sentinel value.

So yes I think that's fine.

463

days inactive

464

days old

linux-kselftest-mirror@lists.linaro.org

11 comments

participants

tags (0)

participants (4)

Aleksa Sarai
Christian Brauner
Florian Weimer
Lorenzo Stoakes