From: Mattias Nissler mnissler@chromium.org
For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options, as well as to the LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations.
Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly.
Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts.
More information on the history and motivation for this patch can be found here:
https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-do...
Signed-off-by: Mattias Nissler mnissler@chromium.org Signed-off-by: Ross Zwisler zwisler@google.com Reviewed-by: Aleksa Sarai cyphar@cyphar.com --- Changes since v7 [1]: * Rebased onto v5.9-rc1. * Added selftest in second patch. * Added Aleska's Reviewed-By tag. Thank you for the review!
After this lands I will upstream changes to util-linux[2] and man-pages [3].
[1]: https://lkml.org/lkml/2020/8/11/896 [2]: https://github.com/rzwisler/util-linux/commit/7f8771acd85edb70d97921c026c55e... [3]: https://github.com/rzwisler/man-pages/commit/b8fe8079f64b5068940c0144586e580... --- fs/namei.c | 3 ++- fs/namespace.c | 2 ++ fs/proc_namespace.c | 1 + fs/statfs.c | 2 ++ include/linux/mount.h | 3 ++- include/linux/statfs.h | 1 + include/uapi/linux/mount.h | 1 + 7 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/fs/namei.c b/fs/namei.c index e99e2a9da0f7d..12d92af2e2ca7 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1626,7 +1626,8 @@ static const char *pick_link(struct nameidata *nd, struct path *link, return ERR_PTR(error); }
- if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS)) + if (unlikely(nd->flags & LOOKUP_NO_SYMLINKS) || + unlikely(nd->path.mnt->mnt_flags & MNT_NOSYMFOLLOW)) return ERR_PTR(-ELOOP);
if (!(nd->flags & LOOKUP_RCU)) { diff --git a/fs/namespace.c b/fs/namespace.c index bae0e95b3713a..6408788a649e1 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3160,6 +3160,8 @@ int path_mount(const char *dev_name, struct path *path, mnt_flags &= ~(MNT_RELATIME | MNT_NOATIME); if (flags & MS_RDONLY) mnt_flags |= MNT_READONLY; + if (flags & MS_NOSYMFOLLOW) + mnt_flags |= MNT_NOSYMFOLLOW;
/* The default atime for remount is preservation */ if ((flags & MS_REMOUNT) && diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c index 3059a9394c2d6..e59d4bb3a89e4 100644 --- a/fs/proc_namespace.c +++ b/fs/proc_namespace.c @@ -70,6 +70,7 @@ static void show_mnt_opts(struct seq_file *m, struct vfsmount *mnt) { MNT_NOATIME, ",noatime" }, { MNT_NODIRATIME, ",nodiratime" }, { MNT_RELATIME, ",relatime" }, + { MNT_NOSYMFOLLOW, ",nosymfollow" }, { 0, NULL } }; const struct proc_fs_opts *fs_infop; diff --git a/fs/statfs.c b/fs/statfs.c index 2616424012ea7..59f33752c1311 100644 --- a/fs/statfs.c +++ b/fs/statfs.c @@ -29,6 +29,8 @@ static int flags_by_mnt(int mnt_flags) flags |= ST_NODIRATIME; if (mnt_flags & MNT_RELATIME) flags |= ST_RELATIME; + if (mnt_flags & MNT_NOSYMFOLLOW) + flags |= ST_NOSYMFOLLOW; return flags; }
diff --git a/include/linux/mount.h b/include/linux/mount.h index de657bd211fa6..aaf343b38671c 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -30,6 +30,7 @@ struct fs_context; #define MNT_NODIRATIME 0x10 #define MNT_RELATIME 0x20 #define MNT_READONLY 0x40 /* does the user want this to be r/o? */ +#define MNT_NOSYMFOLLOW 0x80
#define MNT_SHRINKABLE 0x100 #define MNT_WRITE_HOLD 0x200 @@ -46,7 +47,7 @@ struct fs_context; #define MNT_SHARED_MASK (MNT_UNBINDABLE) #define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \ - | MNT_READONLY) + | MNT_READONLY | MNT_NOSYMFOLLOW) #define MNT_ATIME_MASK (MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME )
#define MNT_INTERNAL_FLAGS (MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL | \ diff --git a/include/linux/statfs.h b/include/linux/statfs.h index 9bc69edb8f188..fac4356ea1bfc 100644 --- a/include/linux/statfs.h +++ b/include/linux/statfs.h @@ -40,6 +40,7 @@ struct kstatfs { #define ST_NOATIME 0x0400 /* do not update access times */ #define ST_NODIRATIME 0x0800 /* do not update directory access times */ #define ST_RELATIME 0x1000 /* update atime relative to mtime/ctime */ +#define ST_NOSYMFOLLOW 0x2000 /* do not follow symlinks */
struct dentry; extern int vfs_get_fsid(struct dentry *dentry, __kernel_fsid_t *fsid); diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 96a0240f23fed..dd8306ea336c1 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -16,6 +16,7 @@ #define MS_REMOUNT 32 /* Alter flags of a mounted FS */ #define MS_MANDLOCK 64 /* Allow mandatory locks on an FS */ #define MS_DIRSYNC 128 /* Directory modifications are synchronous */ +#define MS_NOSYMFOLLOW 256 /* Do not follow symlinks */ #define MS_NOATIME 1024 /* Do not update access times. */ #define MS_NODIRATIME 2048 /* Do not update directory access times */ #define MS_BIND 4096
Add tests for the new 'nosymfollow' mount option. We test to make sure that symlink traversal fails with ELOOP when 'nosymfollow' is set, but that readlink(2) and realpath(3) still work as expected. We also verify that statfs(2) correctly returns ST_NOSYMFOLLOW when we are mounted with the 'nosymfollow' option.
Signed-off-by: Ross Zwisler zwisler@google.com --- tools/testing/selftests/mount/.gitignore | 1 + tools/testing/selftests/mount/Makefile | 4 +- .../selftests/mount/nosymfollow-test.c | 218 ++++++++++++++++++ .../selftests/mount/run_nosymfollow.sh | 4 + ...n_tests.sh => run_unprivileged_remount.sh} | 0 5 files changed, 225 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/mount/nosymfollow-test.c create mode 100755 tools/testing/selftests/mount/run_nosymfollow.sh rename tools/testing/selftests/mount/{run_tests.sh => run_unprivileged_remount.sh} (100%)
diff --git a/tools/testing/selftests/mount/.gitignore b/tools/testing/selftests/mount/.gitignore index 0bc64a6d4c181..17f2d84151622 100644 --- a/tools/testing/selftests/mount/.gitignore +++ b/tools/testing/selftests/mount/.gitignore @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only unprivileged-remount-test +nosymfollow-test diff --git a/tools/testing/selftests/mount/Makefile b/tools/testing/selftests/mount/Makefile index 026890744215b..2d9454841644a 100644 --- a/tools/testing/selftests/mount/Makefile +++ b/tools/testing/selftests/mount/Makefile @@ -3,7 +3,7 @@ CFLAGS = -Wall \ -O2
-TEST_PROGS := run_tests.sh -TEST_GEN_FILES := unprivileged-remount-test +TEST_PROGS := run_unprivileged_remount.sh run_nosymfollow.sh +TEST_GEN_FILES := unprivileged-remount-test nosymfollow-test
include ../lib.mk diff --git a/tools/testing/selftests/mount/nosymfollow-test.c b/tools/testing/selftests/mount/nosymfollow-test.c new file mode 100644 index 0000000000000..650d6d80a1d27 --- /dev/null +++ b/tools/testing/selftests/mount/nosymfollow-test.c @@ -0,0 +1,218 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <sched.h> +#include <stdarg.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mount.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/vfs.h> +#include <unistd.h> + +#ifndef MS_NOSYMFOLLOW +# define MS_NOSYMFOLLOW 256 /* Do not follow symlinks */ +#endif + +#ifndef ST_NOSYMFOLLOW +# define ST_NOSYMFOLLOW 0x2000 /* Do not follow symlinks */ +#endif + +#define DATA "/tmp/data" +#define LINK "/tmp/symlink" +#define TMP "/tmp" + +static void die(char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + vfprintf(stderr, fmt, ap); + va_end(ap); + exit(EXIT_FAILURE); +} + +static void vmaybe_write_file(bool enoent_ok, char *filename, char *fmt, + va_list ap) +{ + ssize_t written; + char buf[4096]; + int buf_len; + int fd; + + buf_len = vsnprintf(buf, sizeof(buf), fmt, ap); + if (buf_len < 0) + die("vsnprintf failed: %s\n", strerror(errno)); + + if (buf_len >= sizeof(buf)) + die("vsnprintf output truncated\n"); + + fd = open(filename, O_WRONLY); + if (fd < 0) { + if ((errno == ENOENT) && enoent_ok) + return; + die("open of %s failed: %s\n", filename, strerror(errno)); + } + + written = write(fd, buf, buf_len); + if (written != buf_len) { + if (written >= 0) { + die("short write to %s\n", filename); + } else { + die("write to %s failed: %s\n", + filename, strerror(errno)); + } + } + + if (close(fd) != 0) + die("close of %s failed: %s\n", filename, strerror(errno)); +} + +static void maybe_write_file(char *filename, char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + vmaybe_write_file(true, filename, fmt, ap); + va_end(ap); +} + +static void write_file(char *filename, char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + vmaybe_write_file(false, filename, fmt, ap); + va_end(ap); +} + +static void create_and_enter_ns(void) +{ + uid_t uid = getuid(); + gid_t gid = getgid(); + + if (unshare(CLONE_NEWUSER) != 0) + die("unshare(CLONE_NEWUSER) failed: %s\n", strerror(errno)); + + maybe_write_file("/proc/self/setgroups", "deny"); + write_file("/proc/self/uid_map", "0 %d 1", uid); + write_file("/proc/self/gid_map", "0 %d 1", gid); + + if (setgid(0) != 0) + die("setgid(0) failed %s\n", strerror(errno)); + if (setuid(0) != 0) + die("setuid(0) failed %s\n", strerror(errno)); + + if (unshare(CLONE_NEWNS) != 0) + die("unshare(CLONE_NEWNS) failed: %s\n", strerror(errno)); +} + +static void setup_symlink(void) +{ + int data, err; + + data = creat(DATA, O_RDWR); + if (data < 0) + die("creat failed: %s\n", strerror(errno)); + + err = symlink(DATA, LINK); + if (err < 0) + die("symlink failed: %s\n", strerror(errno)); + + if (close(data) != 0) + die("close of %s failed: %s\n", DATA, strerror(errno)); +} + +static void test_link_traversal(bool nosymfollow) +{ + int link; + + link = open(LINK, 0, O_RDWR); + if (nosymfollow) { + if ((link != -1 || errno != ELOOP)) { + die("link traversal unexpected result: %d, %s\n", + link, strerror(errno)); + } + } else { + if (link < 0) + die("link traversal failed: %s\n", strerror(errno)); + + if (close(link) != 0) + die("close of link failed: %s\n", strerror(errno)); + } +} + +static void test_readlink(void) +{ + char buf[4096]; + ssize_t ret; + + bzero(buf, sizeof(buf)); + + ret = readlink(LINK, buf, sizeof(buf)); + if (ret < 0) + die("readlink failed: %s\n", strerror(errno)); + if (strcmp(buf, DATA) != 0) + die("readlink strcmp failed: '%s' '%s'\n", buf, DATA); +} + +static void test_realpath(void) +{ + char *path = realpath(LINK, NULL); + + if (!path) + die("realpath failed: %s\n", strerror(errno)); + if (strcmp(path, DATA) != 0) + die("realpath strcmp failed\n"); + + free(path); +} + +static void test_statfs(bool nosymfollow) +{ + struct statfs buf; + int ret; + + ret = statfs(TMP, &buf); + if (ret) + die("statfs failed: %s\n", strerror(errno)); + + if (nosymfollow) { + if ((buf.f_flags & ST_NOSYMFOLLOW) == 0) + die("ST_NOSYMFOLLOW not set on %s\n", TMP); + } else { + if ((buf.f_flags & ST_NOSYMFOLLOW) != 0) + die("ST_NOSYMFOLLOW set on %s\n", TMP); + } +} + +static void run_tests(bool nosymfollow) +{ + test_link_traversal(nosymfollow); + test_readlink(); + test_realpath(); + test_statfs(nosymfollow); +} + +int main(int argc, char **argv) +{ + create_and_enter_ns(); + + if (mount("testing", TMP, "ramfs", 0, NULL) != 0) + die("mount failed: %s\n", strerror(errno)); + + setup_symlink(); + run_tests(false); + + if (mount("testing", TMP, "ramfs", MS_REMOUNT|MS_NOSYMFOLLOW, NULL) != 0) + die("remount failed: %s\n", strerror(errno)); + + run_tests(true); + + return EXIT_SUCCESS; +} diff --git a/tools/testing/selftests/mount/run_nosymfollow.sh b/tools/testing/selftests/mount/run_nosymfollow.sh new file mode 100755 index 0000000000000..5fbbf03043a2e --- /dev/null +++ b/tools/testing/selftests/mount/run_nosymfollow.sh @@ -0,0 +1,4 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +./nosymfollow-test diff --git a/tools/testing/selftests/mount/run_tests.sh b/tools/testing/selftests/mount/run_unprivileged_remount.sh similarity index 100% rename from tools/testing/selftests/mount/run_tests.sh rename to tools/testing/selftests/mount/run_unprivileged_remount.sh
O Wed, Aug 19, 2020 at 10:43:16AM -0600, Ross Zwisler wrote:
From: Mattias Nissler mnissler@chromium.org
For mounts that have the new "nosymfollow" option, don't follow symlinks when resolving paths. The new option is similar in spirit to the existing "nodev", "noexec", and "nosuid" options, as well as to the LOOKUP_NO_SYMLINKS resolve flag in the openat2(2) syscall. Various BSD variants have been supporting the "nosymfollow" mount option for a long time with equivalent implementations.
Note that symlinks may still be created on file systems mounted with the "nosymfollow" option present. readlink() remains functional, so user space code that is aware of symlinks can still choose to follow them explicitly.
Setting the "nosymfollow" mount option helps prevent privileged writers from modifying files unintentionally in case there is an unexpected link along the accessed path. The "nosymfollow" option is thus useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts.
More information on the history and motivation for this patch can be found here:
https://sites.google.com/a/chromium.org/dev/chromium-os/chromiumos-design-do...
Signed-off-by: Mattias Nissler mnissler@chromium.org Signed-off-by: Ross Zwisler zwisler@google.com Reviewed-by: Aleksa Sarai cyphar@cyphar.com
Changes since v7 [1]:
- Rebased onto v5.9-rc1.
- Added selftest in second patch.
- Added Aleska's Reviewed-By tag. Thank you for the review!
After this lands I will upstream changes to util-linux[2] and man-pages [3].
[3]: https://github.com/rzwisler/man-pages/commit/b8fe8079f64b5068940c0144586e580...
Friendly ping on this.
Al, now that the changes to fs/namei.c have landed and we're past the merge window for v5.9, what are your thoughts on this patch and the associated test?
On Wed, Aug 26, 2020 at 02:48:19PM -0600, Ross Zwisler wrote:
Al, now that the changes to fs/namei.c have landed and we're past the merge window for v5.9, what are your thoughts on this patch and the associated test?
Humm... should that be nd->path.mnt->mnt_flags or link->mnt->mnt_flags? Usually it's the same thing, but they might differ. IOW, is that about the directory we'd found it in, or is it about the link itself?
On 2020-08-27, Al Viro viro@zeniv.linux.org.uk wrote:
On Wed, Aug 26, 2020 at 02:48:19PM -0600, Ross Zwisler wrote:
Al, now that the changes to fs/namei.c have landed and we're past the merge window for v5.9, what are your thoughts on this patch and the associated test?
Humm... should that be nd->path.mnt->mnt_flags or link->mnt->mnt_flags? Usually it's the same thing, but they might differ. IOW, is that about the directory we'd found it in, or is it about the link itself?
Now that you mention it, I think link->mnt->mnt_flags makes more sense. The restriction should apply in the context of whatever filesystem contains the symlink, and that would matches FreeBSD's semantics (at least as far as I can tell from a quick look at sys/kern/vfs_lookup.c).
On Fri, Aug 28, 2020 at 01:41:39AM +1000, Aleksa Sarai wrote:
On 2020-08-27, Al Viro viro@zeniv.linux.org.uk wrote:
On Wed, Aug 26, 2020 at 02:48:19PM -0600, Ross Zwisler wrote:
Al, now that the changes to fs/namei.c have landed and we're past the merge window for v5.9, what are your thoughts on this patch and the associated test?
Humm... should that be nd->path.mnt->mnt_flags or link->mnt->mnt_flags? Usually it's the same thing, but they might differ. IOW, is that about the directory we'd found it in, or is it about the link itself?
Now that you mention it, I think link->mnt->mnt_flags makes more sense. The restriction should apply in the context of whatever filesystem contains the symlink, and that would matches FreeBSD's semantics (at least as far as I can tell from a quick look at sys/kern/vfs_lookup.c).
Yep, changing this to link->mnt->mnt_flags makes sense to me, as you're right that we care about the link itself and not the link's parent directory. Thank you for the review, and I'll send out v9 momentarily.
linux-kselftest-mirror@lists.linaro.org