On Mon, Aug 19, 2019 at 8:37 PM Aleksa Sarai cyphar@cyphar.com wrote:
The most obvious syscall to add support for the new LOOKUP_* scoping flags would be openat(2). However, there are a few reasons why this is not the best course of action:
The new LOOKUP_* flags are intended to be security features, and openat(2) will silently ignore all unknown flags. This means that users would need to avoid foot-gunning themselves constantly when using this interface if it were part of openat(2). This can be fixed by having userspace libraries handle this for users[1], but should be avoided if possible.
Resolution scoping feels like a different operation to the existing O_* flags. And since openat(2) has limited flag space, it seems to be quite wasteful to clutter it with 5 flags that are all resolution-related. Arguably O_NOFOLLOW is also a resolution flag but its entire purpose is to error out if you encounter a trailing symlink -- not to scope resolution.
Other systems would be able to reimplement this syscall allowing for cross-OS standardisation rather than being hidden amongst O_* flags which may result in it not being used by all the parties that might want to use it (file servers, web servers, container runtimes, etc).
It gives us the opportunity to iterate on the O_PATH interface. In particular, the new @how->upgrade_mask field for fd re-opening is only possible because we have a clean slate without needing to re-use the ACC_MODE flag design nor the existing openat(2) @mode semantics.
To this end, we introduce the openat2(2) syscall. It provides all of the features of openat(2) through the @how->flags argument, but also also provides a new @how->resolve argument which exposes RESOLVE_* flags that map to our new LOOKUP_* flags. It also eliminates the long-standing ugliness of variadic-open(2) by embedding it in a struct.
In order to allow for userspace to lock down their usage of file descriptor re-opening, openat2(2) has the ability for users to disallow certain re-opening modes through @how->upgrade_mask. At the moment, there is no UPGRADE_NOEXEC. The open_how struct is padded to 64 bytes for future extensions (all of the reserved bits must be zeroed).
Why pad the structure when new functionality (perhaps accommodated via a larger structure) could be signaled by passing a new flag? Adding reserved fields to a structure with a size embedded in the ABI makes a lot of sense --- e.g., pthread_mutex_t can't grow. But this structure can grow, so the reservation seems needless to me.