Re: [PATCH v3 2/6] userfaultfd: add /dev/userfaultfd for fine grained access control

15 Jun 2022


      On Jun 14, 2022, at 5:55 PM, Axel Rasmussen axelrasmussen@google.com wrote:
...
On Mon, Jun 13, 2022 at 5:10 PM Nadav Amit namit@vmware.com wrote:
...
On Jun 13, 2022, at 3:38 PM, Axel Rasmussen axelrasmussen@google.com wrote:
...
On Mon, Jun 13, 2022 at 3:29 PM Peter Xu peterx@redhat.com wrote:
...
On Mon, Jun 13, 2022 at 02:55:40PM -0700, Andrew Morton wrote:
...
On Wed,  1 Jun 2022 14:09:47 -0700 Axel Rasmussen axelrasmussen@google.com wrote:
...
To achieve this, add a /dev/userfaultfd misc device. This device
provides an alternative to the userfaultfd(2) syscall for the creation
of new userfaultfds. The idea is, any userfaultfds created this way will
be able to handle kernel faults, without the caller having any special
capabilities. Access to this mechanism is instead restricted using e.g.
standard filesystem permissions.
The use of a /dev node isn't pretty.  Why can't this be done by
tweaking sys_userfaultfd() or by adding a sys_userfaultfd2()?
I think for any approach involving syscalls, we need to be able to
control access to who can call a syscall. Maybe there's another way
I'm not aware of, but I think today the only mechanism to do this is
capabilities. I proposed adding a CAP_USERFAULTFD for this purpose,
but that approach was rejected [1]. So, I'm not sure of another way
besides using a device node.
One thing that could potentially make this cleaner is, as one LWN
commenter pointed out, we could have open() on /dev/userfaultfd just
return a new userfaultfd directly, instead of this multi-step process
of open /dev/userfaultfd, NEW ioctl, then you get a userfaultfd. When
I wrote this originally it wasn't clear to me how to get that to
happen - open() doesn't directly return the result of our custom open
function pointer, as far as I can tell - but it could be investigated.
If this direction is pursued, I think that it would be better to set it as
/proc/[pid]/userfaultfd, which would allow remote monitors (processes) to
hook into userfaultfd of remote processes. I have a patch for that which
extends userfaultfd syscall, but /proc/[pid]/userfaultfd may be cleaner.
Hmm, one thing I'm unsure about -
If a process is able to control another process' memory like this,
then this seems like exactly what CAP_SYS_PTRACE is intended to deal
with, right? So I'm not sure this case is directly related to the one
I'm trying to address.
This also seems distinct to me from the existing way you'd do this,
which is to open a userfaultfd, register a shared memory region, and
then fork(). Now you can control your child's memory with userfaultfd.
But, attaching to some other, previously-unrelated process with
/proc/[pid]/userfaultfd seems like a clear case for CAP_SYS_PTRACE.
I agree about CAP_SYS_PTRACE. I just know that if the /dev approach is
taken, there would be even more pushback for userfaultfd2.
Whatever.
