I finished up some other work and got around to writing a v5 today, but I ran into a problem with /proc/[pid]/userfaultfd.
Files in /proc/[pid]/* are owned by the user/group which started the process, and they don't support being chmod'ed.
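The procfs behaviour described above can be observed from userspace. Below is a minimal sketch. It assumes a Linux procfs where, for an ordinary (dumpable) process, pid entries take their owner from the task's effective uid/gid, and where the pid-entry setattr handler refuses every mode change with EPERM. `probe_proc_file` is a hypothetical helper name, not anything from the series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: stat a procfs file, record its owner, then try to
 * re-apply its current permission bits with chmod(2).
 * Returns 0 on success (with *chmod_err set to 0 or the chmod errno),
 * or -errno if the file could not be stat'ed. */
static int probe_proc_file(const char *path, uid_t *owner, int *chmod_err)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -errno;
    *owner = st.st_uid;

    /* Expected to fail with EPERM on /proc/[pid] entries: procfs refuses
     * mode changes there, even a no-op one. */
    *chmod_err = (chmod(path, st.st_mode & 07777) == 0) ? 0 : errno;
    return 0;
}
```

For example, probing "/proc/self/status" should report the caller's effective uid as the owner and EPERM for the chmod attempt, which is why a root:root, admin-chmod'able /proc/[pid]/userfaultfd would need special casing.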
For the userfaultfd device, I think we want the following semantics:
- For UFFDs created via the device, we want to always allow handling kernel mode faults.
- For security, the device should be owned by root:root by default, so unprivileged users don't have default access to handle kernel faults.
- But the system administrator should be able to chown/chmod it, to grant access to handling kernel faults for this process more widely.
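The following sketch contrasts the two ways of obtaining a userfaultfd from userspace. It assumes the ioctl-based interface that was eventually merged in Linux 6.1 (`USERFAULTFD_IOC_NEW` issued on an open `/dev/userfaultfd`), which may differ from the exact v4/v5 interface under discussion. The helper names `uffd_from_syscall` and `uffd_from_device` are hypothetical. On the syscall path, a caller without CAP_SYS_PTRACE, with vm.unprivileged_userfaultfd=0, can only get a user-mode-only userfaultfd. On the device path, access is decided by the node's owner and mode at open(2) time.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: these values match <linux/userfaultfd.h> as merged in
 * Linux 6.1; they are spelled out so the sketch builds with older headers. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif
#ifndef USERFAULTFD_IOC_NEW
#define USERFAULTFD_IOC_NEW _IO(0xAA, 0x00)
#endif

/* Syscall path: permission is checked against the caller's capabilities
 * and the vm.unprivileged_userfaultfd sysctl. Returns an fd or -errno. */
static int uffd_from_syscall(int flags)
{
#ifdef SYS_userfaultfd
    int fd = (int)syscall(SYS_userfaultfd, flags);
    return fd < 0 ? -errno : fd;
#else
    (void)flags;
    return -ENOSYS;
#endif
}

/* Device path: whoever can open the node (per its chown/chmod'able owner
 * and mode) may create a userfaultfd that also handles kernel-mode faults.
 * Returns an fd or -errno. */
static int uffd_from_device(const char *path, int flags)
{
    int dev = open(path, O_RDWR | O_CLOEXEC);
    if (dev < 0)
        return -errno;

    int fd = ioctl(dev, USERFAULTFD_IOC_NEW, (unsigned long)flags);
    int err = errno;
    close(dev);
    return fd < 0 ? -err : fd;
}
```

With this split, granting an unprivileged user kernel-fault handling becomes an ordinary file-permission change on the device node, with no new capability required.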
It could be made to work like that but I think it would involve at least:
- Special-casing userfaultfd in proc_pid_make_inode
- Updating setattr/getattr for /proc/[pid] to meaningfully store, and then retrieve, a uid/gid different from the task's, again probably special-cased for userfaultfd, since we don't want this behavior for other files
It seems to me such a change might raise eyebrows among procfs folks. Before I spend the time to write this up, does this seem like something that would obviously be nack'ed?
On Wed, Jul 20, 2022 at 4:21 PM Nadav Amit namit@vmware.com wrote:
On Jul 20, 2022, at 4:04 PM, Axel Rasmussen axelrasmussen@google.com wrote:
On Wed, Jul 20, 2022 at 3:16 PM Schaufler, Casey casey.schaufler@intel.com wrote:
-----Original Message-----
From: Axel Rasmussen axelrasmussen@google.com
Sent: Tuesday, July 19, 2022 12:56 PM
To: Alexander Viro viro@zeniv.linux.org.uk; Andrew Morton akpm@linux-foundation.org; Dave Hansen dave.hansen@linux.intel.com; Dmitry V. Levin ldv@altlinux.org; Gleb Fotengauer-Malinovskiy glebfm@altlinux.org; Hugh Dickins hughd@google.com; Jan Kara jack@suse.cz; Jonathan Corbet corbet@lwn.net; Mel Gorman mgorman@techsingularity.net; Mike Kravetz mike.kravetz@oracle.com; Mike Rapoport rppt@kernel.org; Amit, Nadav namit@vmware.com; Peter Xu peterx@redhat.com; Shuah Khan shuah@kernel.org; Suren Baghdasaryan surenb@google.com; Vlastimil Babka vbabka@suse.cz; zhangyi yi.zhang@huawei.com
Cc: Axel Rasmussen axelrasmussen@google.com; linux-doc@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org; linux-kselftest@vger.kernel.org
Subject: [PATCH v4 0/5] userfaultfd: add /dev/userfaultfd for fine grained access control
I assume that leaving the LSM mailing list off of the CC is purely accidental. Please, please include us in the next round.
Honestly it just hadn't occurred to me, but I'm more than happy to CC it on future revisions.
This series is based on torvalds/master.
The series is split up like so:
- Patch 1 is a simple fixup which we should take in any case (even by itself).
- Patches 2-5 add the feature, configurable selftest support, and docs.
Why not ...?
- Why not /proc/[pid]/userfaultfd? The proposed use case for this is for one process to open a userfaultfd which can intercept another process's page faults. This seems to me like exactly what CAP_SYS_PTRACE is for, though, so I think this use case can simply use a syscall without the powers CAP_SYS_PTRACE grants being "too much".
- Why not use a syscall? Access to syscalls is generally controlled by capabilities. We don't have a capability which is used for userfaultfd access without also granting more / other permissions as well, and adding a new capability was rejected [1].
- It's possible an LSM could be used to control access instead. I suspect adding a brand new one just for this would be rejected,
You won't know if you don't ask.
Fair enough - I wonder if MM folks (Andrew, Peter, Nadav especially) would find that approach more palatable than /proc/[pid]/userfaultfd? Would it make sense from our perspective to propose a userfaultfd- or MM-specific LSM for controlling access to certain features?
I remember +Andrea saying Red Hat was also interested in some kind of access control mechanism like this. Would one or the other approach be more convenient for you?
To reiterate my position - I think that /proc/[pid]/userfaultfd is very natural and can be easily extended to support cross-process access of userfaultfd. The necessary access controls are simple in any case. For cross-process access, they are similar to those that are used for other /proc/[pid]/X such as pagemap.
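For comparison, the pagemap access control mentioned above can be exercised from userspace. Opening /proc/[pid]/pagemap succeeds only if the opener passes a ptrace-style read check against the target (PTRACE_MODE_READ_FSCREDS in the kernel). Each 8-byte entry then reports, among other bits, whether the page is present (bit 63). The sketch below is minimal; `read_pagemap_entry` is a hypothetical helper name. Without CAP_SYS_ADMIN, the PFN field of the entry reads as zero.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT (UINT64_C(1) << 63)   /* "page present" bit of an entry */

/* Hypothetical helper: read the 64-bit pagemap entry covering vaddr in
 * process pid. The access check happens at open(2), so a denied caller
 * sees the error there. Returns 0 on success or -errno on failure. */
static int read_pagemap_entry(pid_t pid, uintptr_t vaddr, uint64_t *entry)
{
    char path[64];
    long page = sysconf(_SC_PAGESIZE);

    snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -errno;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(vaddr / (uintptr_t)page) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    int err = errno;
    close(fd);

    if (n < 0)
        return -err;
    return n == (ssize_t)sizeof(*entry) ? 0 : -EIO;
}
```

A cross-process /proc/[pid]/userfaultfd could plausibly reuse the same open-time check, so that only a process allowed to inspect the target could obtain a userfaultfd for it.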
I have little experience with LSM and I do not know how real deployments use them. If they are easier to deploy and people prefer them over some pseudo-file, I cannot argue against them.