The primary motivation for the need for this flag is container runtimes which have to interact with malicious root filesystems in the host namespaces. One of the first requirements for a container runtime to be secure against a malicious rootfs is that they correctly scope symlinks (that is, they should be scoped as though they are chroot(2)ed into the container's rootfs) and ".."-style paths[*]. The already-existing LOOKUP_NO_XDEV and LOOKUP_NO_MAGICLINKS help defend against other potential attacks in a malicious rootfs scenario.
Currently most container runtimes try to do this resolution in userspace[1], causing many potential race conditions. In addition, the "obvious" alternative (actually performing a {ch,pivot_}root(2)) requires a fork+exec (for some runtimes) which is *very* costly if necessary for every filesystem operation involving a container.
[*] At the moment, ".." and magic-link jumping are disallowed for the same reason it is disabled for LOOKUP_BENEATH -- currently it is not safe to allow it. Future patches may enable it unconditionally once we have resolved the possible races (for "..") and semantics (for magic-link jumping).
The most significant *at(2) semantic change with LOOKUP_IN_ROOT is that absolute pathnames no longer cause the dirfd to be ignored completely.
The rationale is that LOOKUP_IN_ROOT must necessarily chroot-scope symlinks with absolute paths to dirfd, and so doing it for the base path seems to be the most consistent behaviour (and also avoids foot-gunning users who want to scope paths that are absolute).
[1]: https://github.com/cyphar/filepath-securejoin
Signed-off-by: Aleksa Sarai cyphar@cyphar.com --- fs/namei.c | 5 +++++ include/linux/namei.h | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c index b80efc0ae0f3..efed62c6136e 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2274,6 +2274,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
nd->m_seq = read_seqbegin(&mount_lock);
+ /* LOOKUP_IN_ROOT treats absolute paths as being relative-to-dirfd. */ + if (flags & LOOKUP_IN_ROOT) + while (*s == '/') + s++; + /* Figure out the starting path and root (if needed). */ if (*s == '/') { error = nd_jump_root(nd); diff --git a/include/linux/namei.h b/include/linux/namei.h index 88b610ca4d83..1ace31052237 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -48,8 +48,9 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT, LAST_BIND}; #define LOOKUP_NO_MAGICLINKS 0x080000 /* No /proc/$pid/fd/ "symlink" crossing. */ #define LOOKUP_NO_SYMLINKS 0x100000 /* No symlink crossing *at all*. Implies LOOKUP_NO_MAGICLINKS. */ +#define LOOKUP_IN_ROOT 0x200000 /* Treat dirfd as %current->fs->root. */ /* LOOKUP_* flags which do scope-related checks based on the dirfd. */ -#define LOOKUP_DIRFD_SCOPE_FLAGS LOOKUP_BENEATH +#define LOOKUP_DIRFD_SCOPE_FLAGS (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
extern int path_pts(struct path *path);