On Wed, Jul 31, 2024 at 04:51:33PM GMT, Oleg Nesterov wrote:
On 07/31, Christian Brauner wrote:
It's currently possible to create pidfds for kthreads but it is unclear what that is supposed to mean. Until we have use-cases for it and have figured out what behavior we want, block the creation of pidfds for kthreads.
Hmm... could you explain your concerns? Why do you think we should disallow pidfd_open(pid-of-kthread) ?
It basically just works now and it's not intentional - at least not on my part. You can't send signals to them, you may or may not get notified via poll when a kthread exits. If we ever want this to be useful I would like to enable it explicitly.
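For reference, this is roughly what the userspace side looks like for a regular process. It is only a minimal sketch, assuming a kernel that has pidfd_open(2) and headers that define SYS_pidfd_open. The helper names pidfd_open_raw(), pidfd_wait_exit() and demo_wait_for_child() are made up for illustration. Pointed at a kthread pid, the same open call currently succeeds, but as said above nothing guarantees that the poll side ever fires.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Raw syscall wrapper; older glibc versions ship no pidfd_open() wrapper. */
static int pidfd_open_raw(pid_t pid, unsigned int flags)
{
	return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Wait until the task behind @pidfd has exited.
 * Returns 1 if it exited, 0 on timeout, or a negative errno value.
 */
static int pidfd_wait_exit(int pidfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int n;

	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Usage example: fork a child that exits at once and wait for it via its pidfd. */
static int demo_wait_for_child(void)
{
	pid_t child = fork();
	int fd, ret;

	if (child < 0)
		return -errno;
	if (child == 0)
		_exit(0);

	fd = pidfd_open_raw(child, 0);
	ret = fd < 0 ? -errno : pidfd_wait_exit(fd, 5000);
	if (fd >= 0)
		close(fd);
	waitpid(child, NULL, 0); /* reap the zombie */
	return ret;
}
```

For an ordinary child this is well defined: the pidfd becomes readable once the process has exited. That is exactly the contract that is missing for kthreads today.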
Plus, this causes confusion in userspace. When you have qemu running with kvm support then kvm creates several kthreads (that inherit the cgroup of the calling process). If you try to kill those instances via systemctl kill or systemctl stop then pidfds for these kthreads are opened but sending a signal to them is meaningless.
(So imho this causes more confusion than it is actually helpful. If we add support for kthreads I'd also like pidfs to gain a way to identify them via statx() or fdinfo.)
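Until pidfs has such an identifier, userspace can approximate the check through procfs. The following is a rough sketch of that workaround, not something this patch adds: it reads the flags field of /proc/<pid>/stat and tests PF_KTHREAD (value taken from include/linux/sched.h). The function names stat_line_is_kthread() and pid_is_kthread() are hypothetical, and the check is inherently racy against pid reuse.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define PF_KTHREAD 0x00200000 /* "I am a kernel thread", include/linux/sched.h */

/*
 * Parse one /proc/<pid>/stat line.
 * Returns 1 for a kernel thread, 0 for a user task, -1 on parse error.
 */
static int stat_line_is_kthread(const char *line)
{
	/* comm may contain spaces and ')' itself, so search for the last ')'. */
	const char *p = strrchr(line, ')');
	unsigned int flags;

	if (!p)
		return -1;
	/* Skip state, ppid, pgrp, session, tty_nr, tpgid; the next field is flags. */
	if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %u", &flags) != 1)
		return -1;
	return (flags & PF_KTHREAD) ? 1 : 0;
}

/* Same check for a live pid; returns -1 if the stat file cannot be read. */
static int pid_is_kthread(pid_t pid)
{
	char path[64], line[1024];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = stat_line_is_kthread(line);
	fclose(f);
	return ret;
}
```

A service manager could call something like pid_is_kthread() on each pid in the cgroup and skip the ones that return 1 before trying to signal them. Having the information on the pidfd itself would avoid the separate procfs lookup.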
@@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
	if (clone_flags & CLONE_PIDFD) {
		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
/* Don't create pidfds for kernel threads for now. */
if (args->kthread) {
retval = -EINVAL;
goto bad_fork_free_pid;
Do we really need this check? Userspace can't use args->kthread != NULL, the kernel users should not use CLONE_PIDFD.
Yeah, I know. That's really just proactive so that users of, e.g., copy_process() such as vhost and so on don't start handing out pidfds for stuff without requiring changes to the helper itself.