Hi,
This new version of Landlock is a major revamp of the previous series [1], hence the RFC tag. The three main changes are: the replacement of eBPF with a dedicated, safe management of access rules; the replacement of seccomp(2) with a dedicated syscall; and the return of filesystem access-control (from v10).
As discussed in [2], eBPF may be too powerful and dangerous to be put in the hands of unprivileged and potentially malicious processes, especially because of side-channel attacks against access-controls or other parts of the kernel.
Thanks to this new implementation (1540 SLOC), designed from the ground up to be used by unprivileged processes, this series enables a process to sandbox itself without requiring CAP_SYS_ADMIN, only the no_new_privs constraint (like seccomp). Not relying on eBPF also improves performance, especially for stacked security policies, thanks to mergeable rulesets.
The compiled documentation is available here: https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html
This series applies on top of v5.6-rc3 and can be tested with CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK. The patch series can be found in a Git repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v14
I would really appreciate constructive comments on the design and the code.
# Landlock LSM
The goal of Landlock is to enable the restriction of ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [3], it makes it possible to create safe security sandboxes as new security layers on top of the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behavior in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict itself.
Landlock is inspired by seccomp-bpf, but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantics. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum and OpenBSD Pledge/Unveil.
# Current limitations
## Path walk
Landlock needs to use dentries to identify a file hierarchy, which is required for composable and unprivileged access-controls. This means that path resolution/walking (handled with inode_permission()) is not supported yet. This gap could be filled by a future extension, first of the LSM framework. The Landlock userspace ABI can accommodate such a change with a new option (e.g. in struct landlock_ruleset).
## UnionFS
A union filesystem super-block uses a set of upper and lower directories. An access request to a file in one of these hierarchies triggers a call to ovl_path_real(), which generates another access request according to the matching hierarchy. Because such a super-block is not aware of its current mount point, OverlayFS can't create a dedicated mnt_parent for each of the upper and lower directory mount clones. It is therefore not currently possible to track the source of such an indirect access request, and hence not possible to identify a unified OverlayFS hierarchy.
## Syscall
Because it is only tested on x86_64, the syscall is only wired up for this architecture. The whole x86 family (and probably all the others) will be supported in the next patch series.
## Memory limits
There is currently no limit on the memory usage. Any idea to leverage an existing mechanism (e.g. rlimit)?
# Changes since v13
* Revamp of the LSM: remove the need for eBPF and seccomp(2).
* Implement a full filesystem access-control.
* Take care of backward-compatibility issues, especially for these security features.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-1-mic@digikod.net/
[1] https://lore.kernel.org/lkml/20191104172146.30797-1-mic@digikod.net/
[2] https://lore.kernel.org/lkml/a6b61f33-82dc-0c1c-7a6c-1926343ef63e@digikod.ne...
[3] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler-...
Regards,
Mickaël Salaün (10):
  landlock: Add object and rule management
  landlock: Add ruleset and domain management
  landlock: Set up the security framework and manage credentials
  landlock: Add ptrace restrictions
  fs,landlock: Support filesystem access-control
  landlock: Add syscall implementation
  arch: Wire up landlock() syscall
  selftests/landlock: Add initial tests
  samples/landlock: Add a sandbox manager example
  landlock: Add user and kernel documentation
 Documentation/security/index.rst             |   1 +
 Documentation/security/landlock/index.rst    |  18 +
 Documentation/security/landlock/kernel.rst   |  44 ++
 Documentation/security/landlock/user.rst     | 233 +++++++
 MAINTAINERS                                  |  12 +
 arch/x86/entry/syscalls/syscall_64.tbl       |   1 +
 fs/super.c                                   |   2 +
 include/linux/landlock.h                     |  22 +
 include/linux/syscalls.h                     |   3 +
 include/uapi/asm-generic/unistd.h            |   4 +-
 include/uapi/linux/landlock.h                | 315 +++++++++
 samples/Kconfig                              |   7 +
 samples/Makefile                             |   1 +
 samples/landlock/.gitignore                  |   1 +
 samples/landlock/Makefile                    |  15 +
 samples/landlock/sandboxer.c                 | 226 +++++++
 security/Kconfig                             |  11 +-
 security/Makefile                            |   2 +
 security/landlock/Kconfig                    |  16 +
 security/landlock/Makefile                   |   4 +
 security/landlock/cred.c                     |  47 ++
 security/landlock/cred.h                     |  55 ++
 security/landlock/fs.c                       | 591 +++++++++++++++++
 security/landlock/fs.h                       |  42 ++
 security/landlock/object.c                   | 341 ++++++++++
 security/landlock/object.h                   | 134 ++++
 security/landlock/ptrace.c                   | 118 ++++
 security/landlock/ptrace.h                   |  14 +
 security/landlock/ruleset.c                  | 463 +++++++++++++
 security/landlock/ruleset.h                  | 106 +++
 security/landlock/setup.c                    |  38 ++
 security/landlock/setup.h                    |  20 +
 security/landlock/syscall.c                  | 470 +++++++++++++
 tools/testing/selftests/Makefile             |   1 +
 tools/testing/selftests/landlock/.gitignore  |   3 +
 tools/testing/selftests/landlock/Makefile    |  13 +
 tools/testing/selftests/landlock/config      |   4 +
 tools/testing/selftests/landlock/test.h      |  40 ++
 tools/testing/selftests/landlock/test_base.c |  80 +++
 tools/testing/selftests/landlock/test_fs.c   | 624 ++++++++++++++++++
 .../testing/selftests/landlock/test_ptrace.c | 293 ++++++++
 41 files changed, 4429 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst
 create mode 100644 include/linux/landlock.h
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 samples/landlock/.gitignore
 create mode 100644 samples/landlock/Makefile
 create mode 100644 samples/landlock/sandboxer.c
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/cred.c
 create mode 100644 security/landlock/cred.h
 create mode 100644 security/landlock/fs.c
 create mode 100644 security/landlock/fs.h
 create mode 100644 security/landlock/object.c
 create mode 100644 security/landlock/object.h
 create mode 100644 security/landlock/ptrace.c
 create mode 100644 security/landlock/ptrace.h
 create mode 100644 security/landlock/ruleset.c
 create mode 100644 security/landlock/ruleset.h
 create mode 100644 security/landlock/setup.c
 create mode 100644 security/landlock/setup.h
 create mode 100644 security/landlock/syscall.c
 create mode 100644 tools/testing/selftests/landlock/.gitignore
 create mode 100644 tools/testing/selftests/landlock/Makefile
 create mode 100644 tools/testing/selftests/landlock/config
 create mode 100644 tools/testing/selftests/landlock/test.h
 create mode 100644 tools/testing/selftests/landlock/test_base.c
 create mode 100644 tools/testing/selftests/landlock/test_fs.c
 create mode 100644 tools/testing/selftests/landlock/test_ptrace.c
A Landlock object makes it possible to identify a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox itself, we can't rely on system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects only while this identification is useful (i.e. while a security policy makes use of the object), and to free this identification data once no policy is using it. This ephemeral tagging should not and may not be written to the filesystem. We therefore need to manage the lifetime of a rule according to the lifetime of its object. To avoid a global lock, this implementation makes use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
Changes since v13:
* New dedicated implementation, removing the need for eBPF.
Previous version:
https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/
---
 MAINTAINERS                |  10 ++
 security/Kconfig           |   1 +
 security/Makefile          |   2 +
 security/landlock/Kconfig  |  15 ++
 security/landlock/Makefile |   3 +
 security/landlock/object.c | 339 +++++++++++++++++++++++++++++++++++++
 security/landlock/object.h | 134 +++++++++++++++
 7 files changed, 504 insertions(+)
 create mode 100644 security/landlock/Kconfig
 create mode 100644 security/landlock/Makefile
 create mode 100644 security/landlock/object.c
 create mode 100644 security/landlock/object.h
diff --git a/MAINTAINERS b/MAINTAINERS
index fcd79fc38928..206f85768cd9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9360,6 +9360,16 @@ F: net/core/skmsg.c
 F: net/core/sock_map.c
 F: net/ipv4/tcp_bpf.c

+LANDLOCK SECURITY MODULE
+M: Mickaël Salaün <mic@digikod.net>
+L: linux-security-module@vger.kernel.org
+W: https://landlock.io
+T: git https://github.com/landlock-lsm/linux.git
+S: Supported
+F: security/landlock/
+K: landlock
+K: LANDLOCK
+
 LANTIQ / INTEL Ethernet drivers
 M: Hauke Mehrtens <hauke@hauke-m.de>
 L: netdev@vger.kernel.org
diff --git a/security/Kconfig b/security/Kconfig
index 2a1a2d396228..9d9981394fb0 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -238,6 +238,7 @@ source "security/loadpin/Kconfig"
 source "security/yama/Kconfig"
 source "security/safesetid/Kconfig"
 source "security/lockdown/Kconfig"
+source "security/landlock/Kconfig"

 source "security/integrity/Kconfig"

diff --git a/security/Makefile b/security/Makefile
index 746438499029..2472ef96d40a 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -12,6 +12,7 @@ subdir-$(CONFIG_SECURITY_YAMA) += yama
 subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin
 subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid
 subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
+subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock

 # always enable default capabilities
 obj-y += commoncap.o
@@ -29,6 +30,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
 obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
 obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
 obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/
+obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/
 obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o

 # Object integrity file lists
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..4a321d5b3f67
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config SECURITY_LANDLOCK
+	bool "Landlock support"
+	depends on SECURITY
+	default n
+	help
+	  This selects Landlock, a safe sandboxing mechanism. It enables to
+	  restrict processes on the fly (i.e. enforce an access control policy),
+	  which can complement seccomp-bpf. The security policy is a set of access
+	  rights tied to an object, which could be a file, a socket or a process.
+
+	  See Documentation/security/landlock/ for further information.
+
+	  If you are unsure how to answer this question, answer N.
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
new file mode 100644
index 000000000000..cb6deefbf4c0
--- /dev/null
+++ b/security/landlock/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
+
+landlock-y := object.o
diff --git a/security/landlock/object.c b/security/landlock/object.c
new file mode 100644
index 000000000000..38fbbb108120
--- /dev/null
+++ b/security/landlock/object.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Object and rule management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ *
+ * Principles and constraints of the object and rule management:
+ * - Do not leak memory.
+ * - Try as much as possible to free a memory allocation as soon as it is
+ *   unused.
+ * - Do not use global lock.
+ * - Do not charge processes other than the one requesting a Landlock
+ *   operation.
+ */ + +#include <linux/bug.h> +#include <linux/compiler.h> +#include <linux/compiler_types.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/fs.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/rbtree.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/workqueue.h> + +#include "object.h" + +struct landlock_object *landlock_create_object( + const enum landlock_object_type type, void *underlying_object) +{ + struct landlock_object *object; + + if (WARN_ON_ONCE(!underlying_object)) + return NULL; + object = kzalloc(sizeof(*object), GFP_KERNEL); + if (!object) + return NULL; + refcount_set(&object->usage, 1); + refcount_set(&object->cleaners, 1); + spin_lock_init(&object->lock); + INIT_LIST_HEAD(&object->rules); + object->type = type; + WRITE_ONCE(object->underlying_object, underlying_object); + return object; +} + +struct landlock_object *landlock_get_object(struct landlock_object *object) + __acquires(object->usage) +{ + __acquire(object->usage); + /* + * If @object->usage equal 0, then it will be ignored by writers, and + * underlying_object->object may be replaced, but this is not an issue + * for release_object(). + */ + if (object && refcount_inc_not_zero(&object->usage)) { + /* + * It should not be possible to get a reference to an object if + * its underlying object is being terminated (e.g. with + * landlock_release_object()), because an object is only + * modifiable through such underlying object. This is not the + * case with landlock_get_object_cleaner(). 
+ */ + WARN_ON_ONCE(!READ_ONCE(object->underlying_object)); + return object; + } + return NULL; +} + +static struct landlock_object *get_object_cleaner( + struct landlock_object *object) + __acquires(object->cleaners) +{ + __acquire(object->cleaners); + if (object && refcount_inc_not_zero(&object->cleaners)) + return object; + return NULL; +} + +/* + * There is two cases when an object should be free and the reference to the + * underlying object should be put: + * - when the last rule tied to this object is removed, which is handled by + * landlock_put_rule() and then release_object(); + * - when the object is being terminated (e.g. no more reference to an inode), + * which is handled by landlock_put_object(). + */ +static void put_object_free(struct landlock_object *object) + __releases(object->cleaners) +{ + __release(object->cleaners); + if (!refcount_dec_and_test(&object->cleaners)) + return; + WARN_ON_ONCE(refcount_read(&object->usage)); + /* + * Ensures a safe use of @object in the RCU block from + * landlock_put_rule(). + */ + kfree_rcu(object, rcu_free); +} + +/* + * Destroys a newly created and useless object. + */ +void landlock_drop_object(struct landlock_object *object) +{ + if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage))) + return; + __acquire(object->cleaners); + put_object_free(object); +} + +/* + * Puts the underlying object (e.g. inode) if it is the first request to + * release @object, without calling landlock_put_object(). + * + * Return true if this call effectively marks @object as released, false + * otherwise. 
+ */ +static bool release_object(struct landlock_object *object) + __releases(&object->lock) +{ + void *underlying_object; + + lockdep_assert_held(&object->lock); + + underlying_object = xchg(&object->underlying_object, NULL); + spin_unlock(&object->lock); + might_sleep(); + if (!underlying_object) + return false; + + switch (object->type) { + case LANDLOCK_OBJECT_INODE: + break; + default: + WARN_ON_ONCE(1); + } + return true; +} + +static void put_object_cleaner(struct landlock_object *object) + __releases(object->cleaners) +{ + /* Let's try an early lockless check. */ + if (list_empty(&object->rules) && + READ_ONCE(object->underlying_object)) { + /* + * Puts @object if there is no rule tied to it and the + * remaining user is the underlying object. This check is + * atomic because @object->rules and @object->underlying_object + * are protected by @object->lock. + */ + spin_lock(&object->lock); + if (list_empty(&object->rules) && + READ_ONCE(object->underlying_object) && + refcount_dec_if_one(&object->usage)) { + /* + * Releases @object, in place of + * landlock_release_object(). + * + * @object is already empty, implying that all its + * previous rules are already disabled. + * + * Unbalance the @object->cleaners counter to reflect + * the underlying object release. + */ + if (!WARN_ON_ONCE(!release_object(object))) { + __acquire(object->cleaners); + put_object_free(object); + } + } else { + spin_unlock(&object->lock); + } + } + put_object_free(object); +} + +/* + * Putting an object is easy when the object is being terminated, but it is + * much more tricky when the reason is that there is no more rule tied to this + * object. Indeed, new rules could be added at the same time. 
+ */ +void landlock_put_object(struct landlock_object *object) + __releases(object->usage) +{ + struct landlock_object *object_cleaner; + + __release(object->usage); + might_sleep(); + if (!object) + return; + /* + * Guards against concurrent termination to be able to terminate + * @object if it is empty and not referenced by another rule-appender + * other than the underlying object. + */ + object_cleaner = get_object_cleaner(object); + if (WARN_ON_ONCE(!object_cleaner)) { + __release(object->cleaners); + return; + } + /* + * Decrements @object->usage and if it reach zero, also decrement + * @object->cleaners. If both reach zero, then release and free + * @object. + */ + if (refcount_dec_and_test(&object->usage)) { + struct landlock_rule *rule_walker, *rule_walker2; + + spin_lock(&object->lock); + /* + * Disables all the rules tied to @object when it is forbidden + * to add new rule but still allowed to remove them with + * landlock_put_rule(). This is crucial to be able to safely + * free a rule according to landlock_rule_is_disabled(). + */ + list_for_each_entry_safe(rule_walker, rule_walker2, + &object->rules, list) + list_del_rcu(&rule_walker->list); + + /* + * Releases @object if it is not already released (e.g. with + * landlock_release_object()). + */ + release_object(object); + /* + * Unbalances the @object->cleaners counter to reflect the + * underlying object release. + */ + __acquire(object->cleaners); + put_object_free(object); + } + put_object_cleaner(object_cleaner); +} + +void landlock_put_rule(struct landlock_object *object, + struct landlock_rule *rule) +{ + if (!rule) + return; + WARN_ON_ONCE(!object); + /* + * Guards against a concurrent @object self-destruction with + * landlock_put_object() or put_object_cleaner(). 
+ */ + rcu_read_lock(); + if (landlock_rule_is_disabled(rule)) { + rcu_read_unlock(); + if (refcount_dec_and_test(&rule->usage)) + kfree_rcu(rule, rcu_free); + return; + } + if (refcount_dec_and_test(&rule->usage)) { + struct landlock_object *safe_object; + + /* + * Now, @rule may still be enabled, or in the process of being + * untied to @object by put_object_cleaner(). However, we know + * that @object will not be freed until rcu_read_unlock() and + * until @object->cleaners reach zero. Furthermore, we may not + * be the only one willing to free a @rule linked with @object. + * If we succeed to hold @object with get_object_cleaner(), we + * know that until put_object_cleaner(), we can safely use + * @object to remove @rule. + */ + safe_object = get_object_cleaner(object); + rcu_read_unlock(); + if (!safe_object) { + __release(safe_object->cleaners); + /* + * We can safely free @rule because it is already + * removed from @object's list. + */ + WARN_ON_ONCE(!landlock_rule_is_disabled(rule)); + kfree_rcu(rule, rcu_free); + } else { + spin_lock(&safe_object->lock); + if (!landlock_rule_is_disabled(rule)) + list_del(&rule->list); + spin_unlock(&safe_object->lock); + kfree_rcu(rule, rcu_free); + put_object_cleaner(safe_object); + } + } else { + rcu_read_unlock(); + } + /* + * put_object_cleaner() might sleep, but it is only reachable if + * !landlock_rule_is_disabled(). Therefore, clean_ref() can not sleep. + */ + might_sleep(); +} + +void landlock_release_object(struct landlock_object __rcu *rcu_object) +{ + struct landlock_object *object; + + if (!rcu_object) + return; + rcu_read_lock(); + object = get_object_cleaner(rcu_dereference(rcu_object)); + rcu_read_unlock(); + if (unlikely(!object)) { + __release(object->cleaners); + return; + } + /* + * Makes sure that the underlying object never point to a freed object + * by firstly releasing the object (i.e. NULL the reference to it) to + * be sure no one could get a new reference to it while it is being + * terminated. 
Secondly, put the object globally (e.g. for the + * super-block). + * + * This can run concurrently with put_object_cleaner(), which may try + * to release @object as well. + */ + spin_lock(&object->lock); + if (release_object(object)) { + /* + * Unbalances the object to reflect the underlying object + * release. + */ + __acquire(object->usage); + landlock_put_object(object); + } + /* + * If a concurrent thread is adding a new rule, the object will be free + * at the end of this rule addition, otherwise it will be free with the + * following put_object_cleaner() or a remaining one. + */ + put_object_cleaner(object); +} diff --git a/security/landlock/object.h b/security/landlock/object.h new file mode 100644 index 000000000000..15dfc9a75a82 --- /dev/null +++ b/security/landlock/object.h @@ -0,0 +1,134 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Object and rule management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_OBJECT_H +#define _SECURITY_LANDLOCK_OBJECT_H + +#include <linux/compiler_types.h> +#include <linux/list.h> +#include <linux/poison.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/spinlock.h> + +struct landlock_access { + /* + * @self: Bitfield of allowed actions on the kernel object. They are + * relative to the object type (e.g. LANDLOCK_ACTION_FS_READ). + */ + u32 self; + /* + * @beneath: Same as @self, but for the child objects (e.g. a file in a + * directory). + */ + u32 beneath; +}; + +struct landlock_rule { + struct landlock_access access; + /* + * @list: Linked list with other rules tied to the same object, which + * enable to manage their lifetimes. This is also used to identify if + * a rule is still valid, thanks to landlock_rule_is_disabled(), which + * is important in the matching process because the original object + * address might have been recycled. 
+ */ + struct list_head list; + union { + /* + * @usage: Number of rulesets pointing to this rule. This + * field is never used by RCU readers. + */ + refcount_t usage; + struct rcu_head rcu_free; + }; +}; + +enum landlock_object_type { + LANDLOCK_OBJECT_INODE = 1, +}; + +struct landlock_object { + /* + * @usage: Main usage counter, used to tie an object to it's underlying + * object (i.e. create a lifetime) and potentially add new rules. + */ + refcount_t usage; + /* + * @cleaners: Usage counter used to free a rule from @rules (thanks to + * put_rule()). Enables to get a reference to this object until it + * really become freed. Cf. put_object(). + */ + refcount_t cleaners; + union { + /* + * The use of this struct is controlled by @usage and + * @cleaners, which makes it safe to union it with @rcu_free. + */ + struct { + /* + * @underlying_object: Used when cleaning up an object + * and to mark an object as tied to its underlying + * kernel structure. It must then be atomically read + * using READ_ONCE(). + * + * The one who clear @underlying_object must: + * 1. clear the object self-reference and + * 2. decrement @usage (and potentially free the + * object). + * + * Cf. clean_object(). + */ + void *underlying_object; + /* + * @type: Only used when cleaning up an object. + */ + enum landlock_object_type type; + spinlock_t lock; + /* + * @rules: List of struct landlock_rule linked with + * their "list" field. This list is only accessed when + * updating the list (to be able to clean up later) + * while holding @lock. 
+ */ + struct list_head rules; + }; + struct rcu_head rcu_free; + }; +}; + +void landlock_put_rule(struct landlock_object *object, + struct landlock_rule *rule); + +void landlock_release_object(struct landlock_object __rcu *rcu_object); + +struct landlock_object *landlock_create_object( + const enum landlock_object_type type, void *underlying_object); + +struct landlock_object *landlock_get_object(struct landlock_object *object) + __acquires(object->usage); + +void landlock_put_object(struct landlock_object *object) + __releases(object->usage); + +void landlock_drop_object(struct landlock_object *object); + +static inline bool landlock_rule_is_disabled( + struct landlock_rule *rule) +{ + /* + * Disabling (i.e. unlinking) a landlock_rule is a one-way operation. + * It is not possible to re-enable such a rule, then there is no need + * for smp_load_acquire(). + * + * LIST_POISON2 is set by list_del() and list_del_rcu(). + */ + return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2; +} + +#endif /* _SECURITY_LANDLOCK_OBJECT_H */
On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün mic@digikod.net wrote:
A Landlock object enables to identify a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we can't rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with this constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its object. To avoid a global lock, this implementation make use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
[...]
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..4a321d5b3f67
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_LANDLOCK
bool "Landlock support"
depends on SECURITY
default n
(I think "default n" is implicit?)
help
This selects Landlock, a safe sandboxing mechanism. It enables to
restrict processes on the fly (i.e. enforce an access control policy),
which can complement seccomp-bpf. The security policy is a set of access
rights tied to an object, which could be a file, a socket or a process.
See Documentation/security/landlock/ for further information.
If you are unsure how to answer this question, answer N.
[...]
diff --git a/security/landlock/object.c b/security/landlock/object.c
new file mode 100644
index 000000000000..38fbbb108120
--- /dev/null
+++ b/security/landlock/object.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Landlock LSM - Object and rule management
+ *
+ * Copyright © 2016-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ *
+ * Principles and constraints of the object and rule management:
+ * - Do not leak memory.
+ * - Try as much as possible to free a memory allocation as soon as it is
+ *   unused.
+ * - Do not use global lock.
+ * - Do not charge processes other than the one requesting a Landlock
+ *   operation.
+ */
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include "object.h"
+struct landlock_object *landlock_create_object(
const enum landlock_object_type type, void *underlying_object)
+{
struct landlock_object *object;
if (WARN_ON_ONCE(!underlying_object))
return NULL;
object = kzalloc(sizeof(*object), GFP_KERNEL);
if (!object)
return NULL;
refcount_set(&object->usage, 1);
refcount_set(&object->cleaners, 1);
spin_lock_init(&object->lock);
INIT_LIST_HEAD(&object->rules);
object->type = type;
WRITE_ONCE(object->underlying_object, underlying_object);
`object` is not globally visible at this point, so WRITE_ONCE() is unnecessary.
return object;
+}
+struct landlock_object *landlock_get_object(struct landlock_object *object)
__acquires(object->usage)
+{
__acquire(object->usage);
/*
* If @object->usage equal 0, then it will be ignored by writers, and
* underlying_object->object may be replaced, but this is not an issue
* for release_object().
*/
if (object && refcount_inc_not_zero(&object->usage)) {
/*
* It should not be possible to get a reference to an object if
* its underlying object is being terminated (e.g. with
* landlock_release_object()), because an object is only
* modifiable through such underlying object. This is not the
* case with landlock_get_object_cleaner().
*/
WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
return object;
}
return NULL;
+}
+static struct landlock_object *get_object_cleaner(
struct landlock_object *object)
__acquires(object->cleaners)
+{
__acquire(object->cleaners);
if (object && refcount_inc_not_zero(&object->cleaners))
return object;
return NULL;
+}
I don't get this whole "cleaners" thing. Can you give a quick description of why this is necessary, and what benefits it has over a standard refcounting+RCU scheme? I don't immediately see anything that requires this.
+/*
+ * There is two cases when an object should be free and the reference to the
+ * underlying object should be put:
+ * - when the last rule tied to this object is removed, which is handled by
+ *   landlock_put_rule() and then release_object();
+ * - when the object is being terminated (e.g. no more reference to an inode),
+ *   which is handled by landlock_put_object().
+ */
+static void put_object_free(struct landlock_object *object)
__releases(object->cleaners)
+{
__release(object->cleaners);
if (!refcount_dec_and_test(&object->cleaners))
return;
WARN_ON_ONCE(refcount_read(&object->usage));
/*
* Ensures a safe use of @object in the RCU block from
* landlock_put_rule().
*/
kfree_rcu(object, rcu_free);
+}
+/*
+ * Destroys a newly created and useless object.
+ */
+void landlock_drop_object(struct landlock_object *object)
+{
if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
return;
__acquire(object->cleaners);
put_object_free(object);
+}
+/*
- Puts the underlying object (e.g. inode) if it is the first request to
- release @object, without calling landlock_put_object().
- Returns true if this call effectively marks @object as released, false
- otherwise.
- */
+static bool release_object(struct landlock_object *object)
__releases(&object->lock)
+{
void *underlying_object;
lockdep_assert_held(&object->lock);
underlying_object = xchg(&object->underlying_object, NULL);
spin_unlock(&object->lock);
might_sleep();
if (!underlying_object)
return false;
switch (object->type) {
case LANDLOCK_OBJECT_INODE:
break;
default:
WARN_ON_ONCE(1);
}
return true;
+}
+static void put_object_cleaner(struct landlock_object *object)
__releases(object->cleaners)
+{
/* Let's try an early lockless check. */
if (list_empty(&object->rules) &&
READ_ONCE(object->underlying_object)) {
/*
* Puts @object if there is no rule tied to it and the
* remaining user is the underlying object. This check is
* atomic because @object->rules and @object->underlying_object
* are protected by @object->lock.
*/
spin_lock(&object->lock);
if (list_empty(&object->rules) &&
READ_ONCE(object->underlying_object) &&
refcount_dec_if_one(&object->usage)) {
/*
* Releases @object, in place of
* landlock_release_object().
*
* @object is already empty, implying that all its
* previous rules are already disabled.
*
* Unbalance the @object->cleaners counter to reflect
* the underlying object release.
*/
if (!WARN_ON_ONCE(!release_object(object))) {
__acquire(object->cleaners);
put_object_free(object);
}
} else {
spin_unlock(&object->lock);
}
}
put_object_free(object);
+}
+/*
- Putting an object is easy when the object is being terminated, but it is
- much more tricky when the reason is that there is no more rule tied to this
- object. Indeed, new rules could be added at the same time.
- */
+void landlock_put_object(struct landlock_object *object)
__releases(object->usage)
+{
struct landlock_object *object_cleaner;
__release(object->usage);
might_sleep();
if (!object)
return;
/*
* Guards against concurrent termination to be able to terminate
* @object if it is empty and not referenced by another rule-appender
* other than the underlying object.
*/
object_cleaner = get_object_cleaner(object);
if (WARN_ON_ONCE(!object_cleaner)) {
__release(object->cleaners);
return;
}
/*
* Decrements @object->usage and, if it reaches zero, also decrements
* @object->cleaners. If both reach zero, then releases and frees
* @object.
*/
if (refcount_dec_and_test(&object->usage)) {
struct landlock_rule *rule_walker, *rule_walker2;
spin_lock(&object->lock);
/*
* Disables all the rules tied to @object when it is forbidden
* to add new rule but still allowed to remove them with
* landlock_put_rule(). This is crucial to be able to safely
* free a rule according to landlock_rule_is_disabled().
*/
list_for_each_entry_safe(rule_walker, rule_walker2,
&object->rules, list)
list_del_rcu(&rule_walker->list);
/*
* Releases @object if it is not already released (e.g. with
* landlock_release_object()).
*/
release_object(object);
/*
* Unbalances the @object->cleaners counter to reflect the
* underlying object release.
*/
__acquire(object->cleaners);
put_object_free(object);
}
put_object_cleaner(object_cleaner);
+}
+void landlock_put_rule(struct landlock_object *object,
struct landlock_rule *rule)
+{
if (!rule)
return;
WARN_ON_ONCE(!object);
/*
* Guards against a concurrent @object self-destruction with
* landlock_put_object() or put_object_cleaner().
*/
rcu_read_lock();
if (landlock_rule_is_disabled(rule)) {
rcu_read_unlock();
if (refcount_dec_and_test(&rule->usage))
kfree_rcu(rule, rcu_free);
return;
}
if (refcount_dec_and_test(&rule->usage)) {
struct landlock_object *safe_object;
/*
* Now, @rule may still be enabled, or in the process of being
* untied to @object by put_object_cleaner(). However, we know
* that @object will not be freed until rcu_read_unlock() and
* until @object->cleaners reaches zero. Furthermore, we may not
* be the only one willing to free a @rule linked with @object.
* If we succeed in holding @object with get_object_cleaner(), we
* know that until put_object_cleaner(), we can safely use
* @object to remove @rule.
*/
safe_object = get_object_cleaner(object);
rcu_read_unlock();
if (!safe_object) {
__release(safe_object->cleaners);
/*
* We can safely free @rule because it is already
* removed from @object's list.
*/
WARN_ON_ONCE(!landlock_rule_is_disabled(rule));
kfree_rcu(rule, rcu_free);
} else {
spin_lock(&safe_object->lock);
if (!landlock_rule_is_disabled(rule))
list_del(&rule->list);
spin_unlock(&safe_object->lock);
kfree_rcu(rule, rcu_free);
put_object_cleaner(safe_object);
}
} else {
rcu_read_unlock();
}
/*
* put_object_cleaner() might sleep, but it is only reachable if
* !landlock_rule_is_disabled(). Therefore, clean_ref() can not sleep.
*/
might_sleep();
+}
+void landlock_release_object(struct landlock_object __rcu *rcu_object)
+{
struct landlock_object *object;
if (!rcu_object)
return;
rcu_read_lock();
object = get_object_cleaner(rcu_dereference(rcu_object));
This is not how RCU works. You need the rcu annotation on the access to the data structure member (or global variable) that's actually being accessed. A "struct foo __rcu *foo" argument is essentially always wrong.
+struct landlock_rule {
struct landlock_access access;
/*
* @list: Linked list with other rules tied to the same object, which
* enable to manage their lifetimes. This is also used to identify if
* a rule is still valid, thanks to landlock_rule_is_disabled(), which
* is important in the matching process because the original object
* address might have been recycled.
*/
struct list_head list;
union {
/*
* @usage: Number of rulesets pointing to this rule. This
* field is never used by RCU readers.
*/
refcount_t usage;
struct rcu_head rcu_free;
};
+};
An object that is subject to RCU but whose refcount must not be accessed from RCU context? That seems weird.
+enum landlock_object_type {
LANDLOCK_OBJECT_INODE = 1,
+};
+struct landlock_object {
/*
* @usage: Main usage counter, used to tie an object to its underlying
* object (i.e. create a lifetime) and potentially add new rules.
I can't really follow this by reading this patch on its own. As one suggestion to make things at least a bit better, how about documenting here that `usage` always reaches zero before `cleaners` does?
*/
refcount_t usage;
/*
* @cleaners: Usage counter used to free a rule from @rules (thanks to
* put_rule()). Enables getting a reference to this object until it
* really becomes freed. Cf. put_object().
Maybe add: @usage being non-zero counts as one reference to @cleaners. Once @cleaners has become zero, the object is freed after an RCU grace period.
*/
refcount_t cleaners;
union {
/*
* The use of this struct is controlled by @usage and
* @cleaners, which makes it safe to union it with @rcu_free.
*/
[...]
struct rcu_head rcu_free;
};
+};
[...]
+static inline bool landlock_rule_is_disabled(
struct landlock_rule *rule)
+{
/*
* Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
* It is not possible to re-enable such a rule, so there is no need
* for smp_load_acquire().
*
* LIST_POISON2 is set by list_del() and list_del_rcu().
*/
return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
You're not allowed to do this, the comment above list_del() states:
- Note: list_empty() on entry does not return true after this, the entry is
- in an undefined state.
If you want to be able to test whether the element is on a list afterwards, use stuff like list_del_init().
+}
On 25/02/2020 21:49, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün mic@digikod.net wrote:
A Landlock object enables identifying a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we can't rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its object. To avoid a global lock, this implementation makes use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
[...]
diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig
new file mode 100644
index 000000000000..4a321d5b3f67
--- /dev/null
+++ b/security/landlock/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config SECURITY_LANDLOCK
bool "Landlock support"
depends on SECURITY
default n
(I think "default n" is implicit?)
It seems that most (all?) Kconfig files are written like this.
help
This selects Landlock, a safe sandboxing mechanism. It enables you to
restrict processes on the fly (i.e. enforce an access control policy),
which can complement seccomp-bpf. The security policy is a set of access
rights tied to an object, which could be a file, a socket or a process.
See Documentation/security/landlock/ for further information.
If you are unsure how to answer this question, answer N.
[...]
diff --git a/security/landlock/object.c b/security/landlock/object.c
new file mode 100644
index 000000000000..38fbbb108120
--- /dev/null
+++ b/security/landlock/object.c
@@ -0,0 +1,339 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
- Landlock LSM - Object and rule management
- Copyright © 2016-2020 Mickaël Salaün mic@digikod.net
- Copyright © 2018-2020 ANSSI
- Principles and constraints of the object and rule management:
- Do not leak memory.
- Try as much as possible to free a memory allocation as soon as it is
- unused.
- Do not use a global lock.
- Do not charge processes other than the one requesting a Landlock
- operation.
- */
+#include <linux/bug.h>
+#include <linux/compiler.h>
+#include <linux/compiler_types.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/rbtree.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/workqueue.h>
+#include "object.h"
+struct landlock_object *landlock_create_object(
const enum landlock_object_type type, void *underlying_object)
+{
struct landlock_object *object;
if (WARN_ON_ONCE(!underlying_object))
return NULL;
object = kzalloc(sizeof(*object), GFP_KERNEL);
if (!object)
return NULL;
refcount_set(&object->usage, 1);
refcount_set(&object->cleaners, 1);
spin_lock_init(&object->lock);
INIT_LIST_HEAD(&object->rules);
object->type = type;
WRITE_ONCE(object->underlying_object, underlying_object);
`object` is not globally visible at this point, so WRITE_ONCE() is unnecessary.
Right. It was written like this to have a uniform use of this pointer, but I'll remove it.
return object;
+}
+struct landlock_object *landlock_get_object(struct landlock_object *object)
__acquires(object->usage)
+{
__acquire(object->usage);
/*
* If @object->usage equals 0, then it will be ignored by writers, and
* underlying_object->object may be replaced, but this is not an issue
* for release_object().
*/
if (object && refcount_inc_not_zero(&object->usage)) {
/*
* It should not be possible to get a reference to an object if
* its underlying object is being terminated (e.g. with
* landlock_release_object()), because an object is only
* modifiable through such underlying object. This is not the
* case with landlock_get_object_cleaner().
*/
WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
return object;
}
return NULL;
+}
+static struct landlock_object *get_object_cleaner(
struct landlock_object *object)
__acquires(object->cleaners)
+{
__acquire(object->cleaners);
if (object && refcount_inc_not_zero(&object->cleaners))
return object;
return NULL;
+}
I don't get this whole "cleaners" thing. Can you give a quick description of why this is necessary, and what benefits it has over a standard refcounting+RCU scheme? I don't immediately see anything that requires this.
This indeed needs more documentation here. Here is a comment I'll add to get_object_cleaner():
This enables to safely get a reference to an object to potentially free it if it is not already being freed by a concurrent thread. Indeed, the object's address may still be read and dereferenced while a concurrent thread is attempting to clean the object. Cf. &struct landlock_object->usage and &struct landlock_object->cleaners.
See below the explanation about "usage" and "cleaners".
+/*
- There are two cases when an object should be freed and the reference to the
- underlying object should be put:
- when the last rule tied to this object is removed, which is handled by
- landlock_put_rule() and then release_object();
- when the object is being terminated (e.g. no more reference to an inode),
- which is handled by landlock_put_object().
- */
+static void put_object_free(struct landlock_object *object)
__releases(object->cleaners)
+{
__release(object->cleaners);
if (!refcount_dec_and_test(&object->cleaners))
return;
WARN_ON_ONCE(refcount_read(&object->usage));
/*
* Ensures a safe use of @object in the RCU block from
* landlock_put_rule().
*/
kfree_rcu(object, rcu_free);
+}
+/*
- Destroys a newly created and useless object.
- */
+void landlock_drop_object(struct landlock_object *object)
+{
if (WARN_ON_ONCE(!refcount_dec_and_test(&object->usage)))
return;
__acquire(object->cleaners);
put_object_free(object);
+}
+/*
- Puts the underlying object (e.g. inode) if it is the first request to
- release @object, without calling landlock_put_object().
- Returns true if this call effectively marks @object as released, false
- otherwise.
- */
+static bool release_object(struct landlock_object *object)
__releases(&object->lock)
+{
void *underlying_object;
lockdep_assert_held(&object->lock);
underlying_object = xchg(&object->underlying_object, NULL);
spin_unlock(&object->lock);
might_sleep();
if (!underlying_object)
return false;
switch (object->type) {
case LANDLOCK_OBJECT_INODE:
break;
default:
WARN_ON_ONCE(1);
}
return true;
+}
+static void put_object_cleaner(struct landlock_object *object)
__releases(object->cleaners)
+{
/* Let's try an early lockless check. */
if (list_empty(&object->rules) &&
READ_ONCE(object->underlying_object)) {
/*
* Puts @object if there is no rule tied to it and the
* remaining user is the underlying object. This check is
* atomic because @object->rules and @object->underlying_object
* are protected by @object->lock.
*/
spin_lock(&object->lock);
if (list_empty(&object->rules) &&
READ_ONCE(object->underlying_object) &&
refcount_dec_if_one(&object->usage)) {
/*
* Releases @object, in place of
* landlock_release_object().
*
* @object is already empty, implying that all its
* previous rules are already disabled.
*
* Unbalance the @object->cleaners counter to reflect
* the underlying object release.
*/
if (!WARN_ON_ONCE(!release_object(object))) {
__acquire(object->cleaners);
put_object_free(object);
}
} else {
spin_unlock(&object->lock);
}
}
put_object_free(object);
+}
+/*
- Putting an object is easy when the object is being terminated, but it is
- much more tricky when the reason is that there is no more rule tied to this
- object. Indeed, new rules could be added at the same time.
- */
+void landlock_put_object(struct landlock_object *object)
__releases(object->usage)
+{
struct landlock_object *object_cleaner;
__release(object->usage);
might_sleep();
if (!object)
return;
/*
* Guards against concurrent termination to be able to terminate
* @object if it is empty and not referenced by another rule-appender
* other than the underlying object.
*/
object_cleaner = get_object_cleaner(object);
if (WARN_ON_ONCE(!object_cleaner)) {
__release(object->cleaners);
return;
}
/*
* Decrements @object->usage and, if it reaches zero, also decrements
* @object->cleaners. If both reach zero, then releases and frees
* @object.
*/
if (refcount_dec_and_test(&object->usage)) {
struct landlock_rule *rule_walker, *rule_walker2;
spin_lock(&object->lock);
/*
* Disables all the rules tied to @object when it is forbidden
* to add new rule but still allowed to remove them with
* landlock_put_rule(). This is crucial to be able to safely
* free a rule according to landlock_rule_is_disabled().
*/
list_for_each_entry_safe(rule_walker, rule_walker2,
&object->rules, list)
list_del_rcu(&rule_walker->list);
/*
* Releases @object if it is not already released (e.g. with
* landlock_release_object()).
*/
release_object(object);
/*
* Unbalances the @object->cleaners counter to reflect the
* underlying object release.
*/
__acquire(object->cleaners);
put_object_free(object);
}
put_object_cleaner(object_cleaner);
+}
+void landlock_put_rule(struct landlock_object *object,
struct landlock_rule *rule)
+{
if (!rule)
return;
WARN_ON_ONCE(!object);
/*
* Guards against a concurrent @object self-destruction with
* landlock_put_object() or put_object_cleaner().
*/
rcu_read_lock();
if (landlock_rule_is_disabled(rule)) {
rcu_read_unlock();
if (refcount_dec_and_test(&rule->usage))
kfree_rcu(rule, rcu_free);
return;
}
if (refcount_dec_and_test(&rule->usage)) {
struct landlock_object *safe_object;
/*
* Now, @rule may still be enabled, or in the process of being
* untied to @object by put_object_cleaner(). However, we know
* that @object will not be freed until rcu_read_unlock() and
* until @object->cleaners reaches zero. Furthermore, we may not
* be the only one willing to free a @rule linked with @object.
* If we succeed in holding @object with get_object_cleaner(), we
* know that until put_object_cleaner(), we can safely use
* @object to remove @rule.
*/
safe_object = get_object_cleaner(object);
rcu_read_unlock();
if (!safe_object) {
__release(safe_object->cleaners);
/*
* We can safely free @rule because it is already
* removed from @object's list.
*/
WARN_ON_ONCE(!landlock_rule_is_disabled(rule));
kfree_rcu(rule, rcu_free);
} else {
spin_lock(&safe_object->lock);
if (!landlock_rule_is_disabled(rule))
list_del(&rule->list);
spin_unlock(&safe_object->lock);
kfree_rcu(rule, rcu_free);
put_object_cleaner(safe_object);
}
} else {
rcu_read_unlock();
}
/*
* put_object_cleaner() might sleep, but it is only reachable if
* !landlock_rule_is_disabled(). Therefore, clean_ref() can not sleep.
*/
might_sleep();
+}
+void landlock_release_object(struct landlock_object __rcu *rcu_object)
+{
struct landlock_object *object;
if (!rcu_object)
return;
rcu_read_lock();
object = get_object_cleaner(rcu_dereference(rcu_object));
This is not how RCU works. You need the rcu annotation on the access to the data structure member (or global variable) that's actually being accessed. A "struct foo __rcu *foo" argument is essentially always wrong.
Absolutely! I fixed this with the following patch:
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 7f3bd4fd04bb..01a48c75f210 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -98,7 +98,9 @@ void landlock_release_inodes(struct super_block *sb)
 		if (iput_inode)
 			iput(iput_inode);

-		landlock_release_object(inode_landlock(inode)->object);
+		rcu_read_lock();
+		landlock_release_object(rcu_dereference(
+				inode_landlock(inode)->object));

 		iput_inode = inode;
 		spin_lock(&sb->s_inode_list_lock);
diff --git a/security/landlock/object.c b/security/landlock/object.c
index 2d373f224989..a0e65a78068d 100644
--- a/security/landlock/object.c
+++ b/security/landlock/object.c
@@ -300,14 +300,16 @@ void landlock_put_rule(struct landlock_object *object,
 	might_sleep();
 }

-void landlock_release_object(struct landlock_object __rcu *rcu_object)
+void landlock_release_object(struct landlock_object *rcu_object)
+	__releases(RCU)
 {
 	struct landlock_object *object;

-	if (!rcu_object)
+	if (!rcu_object) {
+		rcu_read_unlock();
 		return;
-	rcu_read_lock();
-	object = get_object_cleaner(rcu_dereference(rcu_object));
+	}
+	object = get_object_cleaner(rcu_object);
 	rcu_read_unlock();
 	if (unlikely(!object)) {
 		__release(object->cleaners);
diff --git a/security/landlock/object.h b/security/landlock/object.h
index 15dfc9a75a82..78bfb25d4bcc 100644
--- a/security/landlock/object.h
+++ b/security/landlock/object.h
@@ -12,9 +12,9 @@
 #include <linux/compiler_types.h>
 #include <linux/list.h>
 #include <linux/poison.h>
-#include <linux/rcupdate.h>
 #include <linux/refcount.h>
 #include <linux/spinlock.h>
+#include <linux/types.h>

 struct landlock_access {
@@ -105,7 +105,8 @@ struct landlock_object {
 void landlock_put_rule(struct landlock_object *object,
 		struct landlock_rule *rule);

-void landlock_release_object(struct landlock_object __rcu *rcu_object);
+void landlock_release_object(struct landlock_object *object)
+	__releases(RCU);

 struct landlock_object *landlock_create_object(
 	const enum landlock_object_type type, void *underlying_object);
+struct landlock_rule {
struct landlock_access access;
/*
* @list: Linked list with other rules tied to the same object, which
* enable to manage their lifetimes. This is also used to identify if
* a rule is still valid, thanks to landlock_rule_is_disabled(), which
* is important in the matching process because the original object
* address might have been recycled.
*/
struct list_head list;
union {
/*
* @usage: Number of rulesets pointing to this rule. This
* field is never used by RCU readers.
*/
refcount_t usage;
struct rcu_head rcu_free;
};
+};
An object that is subject to RCU but whose refcount must not be accessed from RCU context? That seems weird.
The fields "access" and "list" are read (in a RCU-read block) by ruleset.c:landlock_find_access() (cf. patch 2). The use of the "usage" counter is in landlock_insert_ruleset_rule() and landlock_put_rule(), but in these cases the rule is always owned/held by the caller. I should say something like "This field must only be used when already holding the rule."
+enum landlock_object_type {
LANDLOCK_OBJECT_INODE = 1,
+};
+struct landlock_object {
/*
* @usage: Main usage counter, used to tie an object to its underlying
* object (i.e. create a lifetime) and potentially add new rules.
I can't really follow this by reading this patch on its own. As one suggestion to make things at least a bit better, how about documenting here that `usage` always reaches zero before `cleaners` does?
What about this?
This counter is used to tie an object to its underlying object (e.g. an inode) and to modify it (e.g. add or remove a rule). If this counter reaches zero, the object must not be modified, but it may still be used from within an RCU-read block. When adding a new rule to an object with a usage counter of zero, the underlying object must be locked and its object pointer can then be replaced with a new empty object (while ignoring the disabled object which is being handled by another thread). This counter always reaches zero before @cleaners does.
*/
refcount_t usage;
/*
* @cleaners: Usage counter used to free a rule from @rules (thanks to
* put_rule()). Enables getting a reference to this object until it
* really becomes freed. Cf. put_object().
Maybe add: @usage being non-zero counts as one reference to @cleaners. Once @cleaners has become zero, the object is freed after an RCU grace period.
What about this?
This counter can only reach zero if the @usage counter already reached zero. Indeed, @usage being non-zero counts as one reference to @cleaners. Once @cleaners has become zero, the object is freed after an RCU grace period. This enables concurrent threads to safely get an object reference to terminate it if there are no more concurrent cleaners for this object. This mechanism is required to enable concurrent threads to safely dereference an object from potentially different pointers (e.g. the underlying object, or a rule tied to this object), to potentially terminate and free it (i.e. if there are no more rules tied to it, or if the underlying object is being terminated).
*/
refcount_t cleaners;
union {
/*
* The use of this struct is controlled by @usage and
* @cleaners, which makes it safe to union it with @rcu_free.
*/
[...]
struct rcu_head rcu_free;
};
+};
[...]
+static inline bool landlock_rule_is_disabled(
struct landlock_rule *rule)
+{
/*
* Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
* It is not possible to re-enable such a rule, so there is no need
* for smp_load_acquire().
*
* LIST_POISON2 is set by list_del() and list_del_rcu().
*/
return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
You're not allowed to do this, the comment above list_del() states:
- Note: list_empty() on entry does not return true after this, the entry is
- in an undefined state.
list_del() checks READ_ONCE(head->next) == head, but landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) == LIST_POISON2. The comment about LIST_POISON2 is right but may be misleading. There is no use of list_empty() with a landlock_rule->list, only landlock_object->rules. The only list_del() is in landlock_put_rule() when there is a guarantee that there is no other reference to it, hence no possible use of landlock_rule_is_disabled() with this rule. I could replace it with a call to list_del_rcu() to make it more consistent.
If you want to be able to test whether the element is on a list afterwards, use stuff like list_del_init().
There is no need to re-initialize the list but using list_del_init() and list_empty() could work too. However, there is no list_del_init_rcu() helper. Moreover, resetting the list's pointer with LIST_POISON2 might help to detect bugs.
Thanks for this review!
On Wed, Feb 26, 2020 at 4:32 PM Mickaël Salaün mic@digikod.net wrote:
On 25/02/2020 21:49, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün mic@digikod.net wrote:
A Landlock object enables identifying a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox themselves, we can't rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects while this identification is useful (i.e. when a security policy makes use of this object). But this identification data should be freed once no policy is using it. This ephemeral tagging should not and may not be written in the filesystem. We then need to manage the lifetime of a rule according to the lifetime of its object. To avoid a global lock, this implementation makes use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
[...]
+config SECURITY_LANDLOCK
bool "Landlock support"
depends on SECURITY
default n
(I think "default n" is implicit?)
It seems that most (all?) Kconfig files are written like this.
See e.g. https://lore.kernel.org/lkml/c187bb77-e804-93bd-64db-9418be58f191@infradead.org/.
[...]
return object;
+}
+struct landlock_object *landlock_get_object(struct landlock_object *object)
__acquires(object->usage)
+{
__acquire(object->usage);
/*
* If @object->usage equals 0, then it will be ignored by writers, and
* underlying_object->object may be replaced, but this is not an issue
* for release_object().
*/
if (object && refcount_inc_not_zero(&object->usage)) {
/*
* It should not be possible to get a reference to an object if
* its underlying object is being terminated (e.g. with
* landlock_release_object()), because an object is only
* modifiable through such underlying object. This is not the
* case with landlock_get_object_cleaner().
*/
WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
return object;
}
return NULL;
+}
+static struct landlock_object *get_object_cleaner(
struct landlock_object *object)
__acquires(object->cleaners)
+{
__acquire(object->cleaners);
if (object && refcount_inc_not_zero(&object->cleaners))
return object;
return NULL;
+}
I don't get this whole "cleaners" thing. Can you give a quick description of why this is necessary, and what benefits it has over a standard refcounting+RCU scheme? I don't immediately see anything that requires this.
This indeed needs more documentation here. Here is a comment I'll add to get_object_cleaner():
This enables to safely get a reference to an object to potentially free it if it is not already being freed by a concurrent thread.
"get a reference to an object to potentially free it" just sounds all wrong to me. You free an object when you're *dropping* a reference to it. Your refcounting scheme doesn't fit my mental models of how normal refcounting works at all...
[...]
+/*
- Putting an object is easy when the object is being terminated, but it is
- much more tricky when the reason is that there is no more rule tied to this
- object. Indeed, new rules could be added at the same time.
- */
+void landlock_put_object(struct landlock_object *object)
__releases(object->usage)
+{
struct landlock_object *object_cleaner;
__release(object->usage);
might_sleep();
if (!object)
return;
/*
* Guards against concurrent termination to be able to terminate
* @object if it is empty and not referenced by another rule-appender
* other than the underlying object.
*/
object_cleaner = get_object_cleaner(object);
[...]
/*
* Decrements @object->usage and, if it reaches zero, also decrements
* @object->cleaners. If both reach zero, then releases and frees
* @object.
*/
if (refcount_dec_and_test(&object->usage)) {
struct landlock_rule *rule_walker, *rule_walker2;
spin_lock(&object->lock);
/*
* Disables all the rules tied to @object when it is forbidden
* to add new rule but still allowed to remove them with
* landlock_put_rule(). This is crucial to be able to safely
* free a rule according to landlock_rule_is_disabled().
*/
list_for_each_entry_safe(rule_walker, rule_walker2,
&object->rules, list)
list_del_rcu(&rule_walker->list);
So... rules don't take references on the landlock_objects they use? Instead, the landlock_object knows which rules use it, and when the landlock_object goes away, it nukes all the rules associated with itself?
That seems terrible to me - AFAICS it means that if some random process decides to install a landlock rule that uses inode X, and then that process dies together with all its landlock rules, the inode still stays pinned in kernel memory as long as the superblock is mounted. In other words, it's a resource leak. (And if I'm not missing something in patch 5, that applies even if the inode has been unlinked?)
Can you please refactor your refcounting as follows?
- A rule takes a reference on each landlock_object it uses.
- A landlock_object takes a reference on the underlying object (just like
  now).
- The underlying object *DOES NOT* take a reference on the landlock_object
  (unlike now); the reference from the underlying object to the
  landlock_object has weak pointer semantics.
- When a landlock_object's refcount drops to zero (iow no rules use it
  anymore), it is freed.
That might also help get rid of the awkward ->cleaners thing?
/*
* Releases @object if it is not already released (e.g. with
* landlock_release_object()).
*/
release_object(object);
/*
* Unbalances the @object->cleaners counter to reflect the
* underlying object release.
*/
__acquire(object->cleaners);
put_object_free(object);
}
put_object_cleaner(object_cleaner);
+}
[...]
+static inline bool landlock_rule_is_disabled(
struct landlock_rule *rule)
+{
/*
* Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
* It is not possible to re-enable such a rule, so there is no need
* for smp_load_acquire().
*
* LIST_POISON2 is set by list_del() and list_del_rcu().
*/
return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
You're not allowed to do this, the comment above list_del() states:
- Note: list_empty() on entry does not return true after this, the entry is
- in an undefined state.
list_del() checks READ_ONCE(head->next) == head, but landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) == LIST_POISON2. The comment about LIST_POISON2 is right but may be misleading. There is no use of list_empty() with a landlock_rule->list, only landlock_object->rules. The only list_del() is in landlock_put_rule() when there is a guarantee that there is no other reference to it, hence no possible use of landlock_rule_is_disabled() with this rule. I could replace it with a call to list_del_rcu() to make it more consistent.
If you want to be able to test whether the element is on a list afterwards, use stuff like list_del_init().
There is no need to re-initialize the list but using list_del_init() and list_empty() could work too. However, there is no list_del_init_rcu() helper. Moreover, resetting the list's pointer with LIST_POISON2 might help to detect bugs.
Either way, you are currently using the list_head API in a way that goes against what the header documents. If you want to rely on list_del() bringing the object into a specific state, then you can't leave the comment above list_del() as-is that says that it puts the object in an undefined state; and this kind of check should probably be done in a helper in list.h instead of open-coding the check for LIST_POISON2.
On 26/02/2020 21:24, Jann Horn wrote:
On Wed, Feb 26, 2020 at 4:32 PM Mickaël Salaün mic@digikod.net wrote:
On 25/02/2020 21:49, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:05 PM Mickaël Salaün mic@digikod.net wrote:
A Landlock object enables the identification of a kernel object (e.g. an inode). A Landlock rule is a set of access rights allowed on an object. Rules are grouped in rulesets that may be tied to a set of processes (i.e. subjects) to enforce a scoped access-control (i.e. a domain).
Because Landlock's goal is to empower any process (especially unprivileged ones) to sandbox itself, we can't rely on a system-wide object identification such as file extended attributes. Indeed, we need innocuous, composable and modular access-controls.
The main challenge with these constraints is to identify kernel objects only while this identification is useful (i.e. while a security policy makes use of the object), and to free the identification data once no policy is using it. This ephemeral tagging should not and cannot be written to the filesystem. We therefore need to manage the lifetime of a rule according to the lifetime of its object. To avoid a global lock, this implementation makes use of RCU and counters to safely reference objects.
A following commit uses this generic object management for inodes.
[...]
+config SECURITY_LANDLOCK
bool "Landlock support"
depends on SECURITY
default n
(I think "default n" is implicit?)
It seems that most (all?) Kconfig entries are written like this.
See e.g. https://lore.kernel.org/lkml/c187bb77-e804-93bd-64db-9418be58f191@infradead.org/.
Ok, done.
[...]
return object;
+}
+struct landlock_object *landlock_get_object(struct landlock_object *object)
__acquires(object->usage)
+{
__acquire(object->usage);
/*
* If @object->usage equals 0, then it will be ignored by writers, and
* underlying_object->object may be replaced, but this is not an issue
* for release_object().
*/
if (object && refcount_inc_not_zero(&object->usage)) {
/*
* It should not be possible to get a reference to an object if
* its underlying object is being terminated (e.g. with
* landlock_release_object()), because an object is only
* modifiable through such underlying object. This is not the
* case with landlock_get_object_cleaner().
*/
WARN_ON_ONCE(!READ_ONCE(object->underlying_object));
return object;
}
return NULL;
+}
+static struct landlock_object *get_object_cleaner(
struct landlock_object *object)
__acquires(object->cleaners)
+{
__acquire(object->cleaners);
if (object && refcount_inc_not_zero(&object->cleaners))
return object;
return NULL;
+}
I don't get this whole "cleaners" thing. Can you give a quick description of why this is necessary, and what benefits it has over a standard refcounting+RCU scheme? I don't immediately see anything that requires this.
This indeed needs more documentation here. Here is a comment I'll add to get_object_cleaner():
This safely gets a reference to an object in order to potentially free it, if it is not already being freed by a concurrent thread.
"get a reference to an object to potentially free it" just sounds all wrong to me. You free an object when you're *dropping* a reference to it. Your refcounting scheme doesn't fit my mental models of how normal refcounting works at all...
Unfortunately, as I explain below, it is a bit tricky.
[...]
+/*
 * Putting an object is easy when the object is being terminated, but it is
 * much more tricky when the reason is that there are no more rules tied to this
 * object. Indeed, new rules could be added at the same time.
 */
+void landlock_put_object(struct landlock_object *object)
__releases(object->usage)
+{
struct landlock_object *object_cleaner;
__release(object->usage);
might_sleep();
if (!object)
return;
/*
* Guards against concurrent termination to be able to terminate
* @object if it is empty and not referenced by another rule-appender
* other than the underlying object.
*/
object_cleaner = get_object_cleaner(object);
[...]
/*
* Decrements @object->usage and, if it reaches zero, also decrements
* @object->cleaners. If both reach zero, then releases and frees
* @object.
*/
if (refcount_dec_and_test(&object->usage)) {
struct landlock_rule *rule_walker, *rule_walker2;
spin_lock(&object->lock);
/*
* Disables all the rules tied to @object when it is forbidden
* to add new rules but still allowed to remove them with
* landlock_put_rule(). This is crucial to be able to safely
* free a rule according to landlock_rule_is_disabled().
*/
list_for_each_entry_safe(rule_walker, rule_walker2,
&object->rules, list)
list_del_rcu(&rule_walker->list);
So... rules don't take references on the landlock_objects they use? Instead, the landlock_object knows which rules use it, and when the landlock_object goes away, it nukes all the rules associated with itself?
Right.
That seems terrible to me - AFAICS it means that if some random process decides to install a landlock rule that uses inode X, and then that process dies together with all its landlock rules, the inode still stays pinned in kernel memory as long as the superblock is mounted. In other words, it's a resource leak.
That is not correct. When there is no more process enforced by a domain/ruleset, this domain is terminated, which means that every rule linked to this domain is put away. When the usage counter of a rule reaches zero, the rule is terminated with landlock_put_rule(), which unlinks the rule from its object and cleans this object. The cleaning involves freeing the object if there is no rule tied to it anymore, thanks to put_object_cleaner().
When the underlying object is terminated, landlock_release_object() also decrements the usage counter. However, if there is a concurrent thread adding a new rule, the usage counter stays greater than zero while the new rule is being added; the counter then drops to zero at the end of this addition, which can unbalance the "cleaners" counter and finally leads to the object being freed. This design enables rules to be added without locking (if the object already exists). While this property is interesting from a performance point of view, the main reason is to avoid unnecessary locking between processes (especially from different domains).
(And if I'm not missing something in patch 5, that applies even if the inode has been unlinked?)
That is true for now, but only because I haven't yet found the right spot to call landlock_release_inode(). Indeed, unlinking a file may not terminate its inode because the file can still be open by a process, and freeing an object when the underlying object is unlinked could be a way to bypass a check on that object/inode.
Do you know the best spot to identify the last userspace reference (through the filesystem or a file descriptor) to an inode? fsnotify doesn't seem to check for that.
Can you please refactor your refcounting as follows?
- A rule takes a reference on each landlock_object it uses.
- A landlock_object takes a reference on the underlying object (just like now).
- The underlying object *DOES NOT* take a reference on the
landlock_object (unlike now); the reference from the underlying object to the landlock_object has weak pointer semantics.
We need to increment the reference counter of the underlying objects (i.e. inodes) so as not to lose the link to their Landlock object, and hence the related access-control. For instance, suppose a struct inode (e.g. a directory) is first tied to a Landlock object/access-control; then, because the inode is neither open nor used by any process, the kernel decides to free it. When a process later tries to access a file beneath this directory, there will not be any Landlock object tied to it, and the requested access might then be forbidden (whereas the initial policy allowed it).
- When a landlock_object's refcount drops to zero (iow no rules use
it anymore), it is freed.
Before the current design, I used a similar pattern, but it is not necessary thanks to the management of the underlying object's lifetime. The list_empty() check is enough and, because we need to handle concurrent termination anyway, an object usage counter for the rules seems unnecessary.
That might also help get rid of the awkward ->cleaners thing?
/*
* Releases @object if it is not already released (e.g. with
* landlock_release_object()).
*/
release_object(object);
/*
* Unbalances the @object->cleaners counter to reflect the
* underlying object release.
*/
__acquire(object->cleaners);
put_object_free(object);
}
put_object_cleaner(object_cleaner);
+}
[...]
+static inline bool landlock_rule_is_disabled(
struct landlock_rule *rule)
+{
/*
* Disabling (i.e. unlinking) a landlock_rule is a one-way operation.
* It is not possible to re-enable such a rule, hence there is no need
* for smp_load_acquire().
*
* LIST_POISON2 is set by list_del() and list_del_rcu().
*/
return !rule || READ_ONCE(rule->list.prev) == LIST_POISON2;
You're not allowed to do this, the comment above list_del() states:
- Note: list_empty() on entry does not return true after this, the entry is
- in an undefined state.
list_empty() checks READ_ONCE(head->next) == head, whereas landlock_rule_is_disabled() checks READ_ONCE(rule->list.prev) == LIST_POISON2. The comment about LIST_POISON2 is right but may be misleading. list_empty() is never used with a landlock_rule->list, only with landlock_object->rules. The only list_del() is in landlock_put_rule(), where there is a guarantee that no other reference to the rule exists, hence no possible use of landlock_rule_is_disabled() with this rule. I could replace it with a call to list_del_rcu() to make it more consistent.
If you want to be able to test whether the element is on a list afterwards, use stuff like list_del_init().
There is no need to re-initialize the list, but using list_del_init() and list_empty() could work too. However, there is no list_del_init_rcu() helper. Moreover, poisoning the list pointer with LIST_POISON2 might help to detect bugs.
Either way, you are currently using the list_head API in a way that goes against what the header documents. If you want to rely on list_del() bringing the object into a specific state, then you can't leave the comment above list_del() as-is that says that it puts the object in an undefined state; and this kind of check should probably be done in a helper in list.h instead of open-coding the check for LIST_POISON2.
In the case of Landlock, it is illegal to use or recycle a rule which was untied from its (initial) object. There is no use of list_empty(&landlock_rule->list), only landlock_rule_is_disabled(landlock_rule). The LIST_POISON2 might help to identify such misuse.
A Landlock ruleset is mainly a red-black tree with Landlock rules as nodes. This enables quick updates and lookups to match a requested access, e.g. to a file. A ruleset is usable through a dedicated file descriptor (cf. the following commit adding the syscall), which enables a process to build it by adding new rules.
A domain is a ruleset tied to a set of processes. This group of rules defines the security policy enforced on these processes and their future children. A domain can transition to a new domain which is the merge of itself with a ruleset provided by the current process. This merge is the intersection of all the constraints, which means that a process can only gain more constraints over time.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---

Changes since v13:
* New implementation, inspired by the previous inode eBPF map, but agnostic to the underlying kernel object.

Previous version: https://lore.kernel.org/lkml/20190721213116.23476-7-mic@digikod.net/
---
 MAINTAINERS                   |   1 +
 include/uapi/linux/landlock.h | 102 ++++++++
 security/landlock/Makefile    |   2 +-
 security/landlock/ruleset.c   | 460 ++++++++++++++++++++++++++++++++++
 security/landlock/ruleset.h   | 106 ++++++++
 5 files changed, 670 insertions(+), 1 deletion(-)
 create mode 100644 include/uapi/linux/landlock.h
 create mode 100644 security/landlock/ruleset.c
 create mode 100644 security/landlock/ruleset.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 206f85768cd9..937257925e65 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9366,6 +9366,7 @@ L:	linux-security-module@vger.kernel.org
 W:	https://landlock.io
 T:	git https://github.com/landlock-lsm/linux.git
 S:	Supported
+F:	include/uapi/linux/landlock.h
 F:	security/landlock/
 K:	landlock
 K:	LANDLOCK
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
new file mode 100644
index 000000000000..92760aca3645
--- /dev/null
+++ b/include/uapi/linux/landlock.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Landlock - UAPI headers
+ *
+ * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net>
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _UAPI__LINUX_LANDLOCK_H__
+#define _UAPI__LINUX_LANDLOCK_H__
+
+/**
+ * DOC: fs_access
+ *
+ * A set of actions on kernel objects may be defined by an attribute (e.g.
+ * &struct landlock_attr_path_beneath) and a bitmask of access.
+ *
+ * Filesystem flags
+ * ~~~~~~~~~~~~~~~~
+ *
+ * These flags enable to restrict a sandboxed process to a set of actions on
+ * files and directories. Files or directories opened before the sandboxing
+ * are not subject to these restrictions.
+ *
+ * - %LANDLOCK_ACCESS_FS_READ: Open or map a file with read access.
+ * - %LANDLOCK_ACCESS_FS_READDIR: List the content of a directory.
+ * - %LANDLOCK_ACCESS_FS_GETATTR: Read metadata of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_WRITE: Write to a file.
+ * - %LANDLOCK_ACCESS_FS_TRUNCATE: Truncate a file.
+ * - %LANDLOCK_ACCESS_FS_LOCK: Lock a file.
+ * - %LANDLOCK_ACCESS_FS_CHMOD: Change DAC permissions on a file or a
+ *   directory.
+ * - %LANDLOCK_ACCESS_FS_CHOWN: Change the owner of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_CHGRP: Change the group of a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_IOCTL: Send various commands to a special file, cf.
+ *   :manpage:`ioctl(2)`.
+ * - %LANDLOCK_ACCESS_FS_LINK_TO: Link a file into a directory.
+ * - %LANDLOCK_ACCESS_FS_RENAME_FROM: Rename a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_RENAME_TO: Rename a file or a directory.
+ * - %LANDLOCK_ACCESS_FS_RMDIR: Remove an empty directory.
+ * - %LANDLOCK_ACCESS_FS_UNLINK: Remove a file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_CHAR: Create a character device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_DIR: Create a directory.
+ * - %LANDLOCK_ACCESS_FS_MAKE_REG: Create a regular file.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SOCK: Create a UNIX domain socket.
+ * - %LANDLOCK_ACCESS_FS_MAKE_FIFO: Create a named pipe.
+ * - %LANDLOCK_ACCESS_FS_MAKE_BLOCK: Create a block device.
+ * - %LANDLOCK_ACCESS_FS_MAKE_SYM: Create a symbolic link.
+ * - %LANDLOCK_ACCESS_FS_EXECUTE: Execute a file.
+ * - %LANDLOCK_ACCESS_FS_CHROOT: Change the root directory of the current
+ *   process.
+ * - %LANDLOCK_ACCESS_FS_OPEN: Open a file or a directory. This flag is set
+ *   for any action (e.g. read, write, execute) requested to open a file or
+ *   directory.
+ * - %LANDLOCK_ACCESS_FS_MAP: Map a file. This flag is set for any action
+ *   (e.g. read, write, execute) requested to map a file.
+ *
+ * There is currently no restriction for directory walking, e.g.
+ * :manpage:`chdir(2)`.
+ */
+#define LANDLOCK_ACCESS_FS_READ		(1ULL << 0)
+#define LANDLOCK_ACCESS_FS_READDIR	(1ULL << 1)
+#define LANDLOCK_ACCESS_FS_GETATTR	(1ULL << 2)
+#define LANDLOCK_ACCESS_FS_WRITE	(1ULL << 3)
+#define LANDLOCK_ACCESS_FS_TRUNCATE	(1ULL << 4)
+#define LANDLOCK_ACCESS_FS_LOCK		(1ULL << 5)
+#define LANDLOCK_ACCESS_FS_CHMOD	(1ULL << 6)
+#define LANDLOCK_ACCESS_FS_CHOWN	(1ULL << 7)
+#define LANDLOCK_ACCESS_FS_CHGRP	(1ULL << 8)
+#define LANDLOCK_ACCESS_FS_IOCTL	(1ULL << 9)
+#define LANDLOCK_ACCESS_FS_LINK_TO	(1ULL << 10)
+#define LANDLOCK_ACCESS_FS_RENAME_FROM	(1ULL << 11)
+#define LANDLOCK_ACCESS_FS_RENAME_TO	(1ULL << 12)
+#define LANDLOCK_ACCESS_FS_RMDIR	(1ULL << 13)
+#define LANDLOCK_ACCESS_FS_UNLINK	(1ULL << 14)
+#define LANDLOCK_ACCESS_FS_MAKE_CHAR	(1ULL << 15)
+#define LANDLOCK_ACCESS_FS_MAKE_DIR	(1ULL << 16)
+#define LANDLOCK_ACCESS_FS_MAKE_REG	(1ULL << 17)
+#define LANDLOCK_ACCESS_FS_MAKE_SOCK	(1ULL << 18)
+#define LANDLOCK_ACCESS_FS_MAKE_FIFO	(1ULL << 19)
+#define LANDLOCK_ACCESS_FS_MAKE_BLOCK	(1ULL << 20)
+#define LANDLOCK_ACCESS_FS_MAKE_SYM	(1ULL << 21)
+#define LANDLOCK_ACCESS_FS_EXECUTE	(1ULL << 22)
+#define LANDLOCK_ACCESS_FS_CHROOT	(1ULL << 23)
+#define LANDLOCK_ACCESS_FS_OPEN		(1ULL << 24)
+#define LANDLOCK_ACCESS_FS_MAP		(1ULL << 25)
+
+/*
+ * Potential future access:
+ * - %LANDLOCK_ACCESS_FS_SETATTR
+ * - %LANDLOCK_ACCESS_FS_APPEND
+ * - %LANDLOCK_ACCESS_FS_LINK_FROM
+ * - %LANDLOCK_ACCESS_FS_MOUNT_FROM
+ * - %LANDLOCK_ACCESS_FS_MOUNT_TO
+ * - %LANDLOCK_ACCESS_FS_UNMOUNT
+ * - %LANDLOCK_ACCESS_FS_TRANSFER
+ * - %LANDLOCK_ACCESS_FS_RECEIVE
+ * - %LANDLOCK_ACCESS_FS_CHDIR
+ * - %LANDLOCK_ACCESS_FS_FCNTL
+ */
+
+#endif /* _UAPI__LINUX_LANDLOCK_H__ */
diff --git a/security/landlock/Makefile b/security/landlock/Makefile
index cb6deefbf4c0..d846eba445bb 100644
--- a/security/landlock/Makefile
+++ b/security/landlock/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := object.o +landlock-y := object.o ruleset.o diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c new file mode 100644 index 000000000000..5ec013a4188d --- /dev/null +++ b/security/landlock/ruleset.c @@ -0,0 +1,460 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Ruleset management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/bug.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/list.h> +#include <linux/rbtree.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/workqueue.h> + +#include "object.h" +#include "ruleset.h" + +static struct landlock_ruleset *create_ruleset(void) +{ + struct landlock_ruleset *ruleset; + + ruleset = kzalloc(sizeof(*ruleset), GFP_KERNEL); + if (!ruleset) + return ERR_PTR(-ENOMEM); + refcount_set(&ruleset->usage, 1); + mutex_init(&ruleset->lock); + atomic_set(&ruleset->nb_rules, 0); + ruleset->root = RB_ROOT; + return ruleset; +} + +struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask) +{ + struct landlock_ruleset *ruleset; + + /* Safely handles 32-bits conversion. */ + BUILD_BUG_ON(!__same_type(fs_access_mask, _LANDLOCK_ACCESS_FS_LAST)); + + /* Checks content. */ + if ((fs_access_mask | _LANDLOCK_ACCESS_FS_MASK) != + _LANDLOCK_ACCESS_FS_MASK) + return ERR_PTR(-EINVAL); + /* Informs about useless ruleset. */ + if (!fs_access_mask) + return ERR_PTR(-ENOMSG); + ruleset = create_ruleset(); + if (!IS_ERR(ruleset)) + ruleset->fs_access_mask = fs_access_mask; + return ruleset; +} + +/* + * The underlying kernel object must be held by the caller. 
+ */ +static struct landlock_ruleset_elem *create_ruleset_elem( + struct landlock_object *object) +{ + struct landlock_ruleset_elem *ruleset_elem; + + ruleset_elem = kzalloc(sizeof(*ruleset_elem), GFP_KERNEL); + if (!ruleset_elem) + return ERR_PTR(-ENOMEM); + RB_CLEAR_NODE(&ruleset_elem->node); + RCU_INIT_POINTER(ruleset_elem->ref.object, object); + return ruleset_elem; +} + +static struct landlock_rule *create_rule(struct landlock_object *object, + struct landlock_access *access) +{ + struct landlock_rule *new_rule; + + if (WARN_ON_ONCE(!object)) + return ERR_PTR(-EFAULT); + if (WARN_ON_ONCE(!access)) + return ERR_PTR(-EFAULT); + new_rule = kzalloc(sizeof(*new_rule), GFP_KERNEL); + if (!new_rule) + return ERR_PTR(-ENOMEM); + refcount_set(&new_rule->usage, 1); + INIT_LIST_HEAD(&new_rule->list); + new_rule->access = *access; + + spin_lock(&object->lock); + list_add_tail(&new_rule->list, &object->rules); + spin_unlock(&object->lock); + return new_rule; +} + +/* + * An inserted rule can not be removed, only disabled (cf. struct + * landlock_ruleset_elem). + * + * The underlying kernel object must be held by the caller. + * + * @rule: Allocated struct owned by this function. The caller must hold the + * underlying kernel object (e.g., with a FD). + */ +int landlock_insert_ruleset_rule(struct landlock_ruleset *ruleset, + struct landlock_object *object, struct landlock_access *access, + struct landlock_rule *rule) +{ + struct rb_node **new; + struct rb_node *parent = NULL; + struct landlock_ruleset_elem *ruleset_elem; + struct landlock_rule *new_rule; + + might_sleep(); + /* Accesses may be set when creating a new rule. 
*/ + if (rule) { + if (WARN_ON_ONCE(access)) + return -EINVAL; + } else { + if (WARN_ON_ONCE(!access)) + return -EFAULT; + } + + lockdep_assert_held(&ruleset->lock); + new = &(ruleset->root.rb_node); + while (*new) { + struct landlock_ruleset_elem *this = rb_entry(*new, + struct landlock_ruleset_elem, node); + uintptr_t this_object; + struct landlock_rule *this_rule; + struct landlock_access new_access; + + this_object = (uintptr_t)rcu_access_pointer(this->ref.object); + if (this_object != (uintptr_t)object) { + parent = *new; + if (this_object < (uintptr_t)object) + new = &((*new)->rb_right); + else + new = &((*new)->rb_left); + continue; + } + + /* Do not increment ruleset->nb_rules. */ + this_rule = rcu_dereference_protected(this->ref.rule, + lockdep_is_held(&ruleset->lock)); + /* + * Checks if it is a new object with the same address as a + * previously disabled one. There is no possible race + * condition because an object can not be disabled/deleted + * while being inserted in this tree. + */ + if (landlock_rule_is_disabled(this_rule)) { + if (rule) { + refcount_inc(&rule->usage); + new_rule = rule; + } else { + /* Replace the previous rule with a new one. */ + new_rule = create_rule(object, access); + if (IS_ERR(new_rule)) + return PTR_ERR(new_rule); + } + rcu_assign_pointer(this->ref.rule, new_rule); + landlock_put_rule(object, this_rule); + return 0; + } + + /* this_rule is potentially enabled. */ + if (refcount_read(&this_rule->usage) == 1) { + if (rule) { + /* merge rule: intersection of access rights */ + this_rule->access.self &= rule->access.self; + this_rule->access.beneath &= + rule->access.beneath; + } else { + /* extend rule: union of access rights */ + this_rule->access.self |= access->self; + this_rule->access.beneath |= access->beneath; + } + return 0; + } + + /* + * If this_rule is shared with another ruleset, then create a + * new object rule. + */ + if (rule) { + /* Merging a rule means an intersection of access. 
*/ + new_access.self = this_rule->access.self & + rule->access.self; + new_access.beneath = this_rule->access.beneath & + rule->access.beneath; + } else { + /* Extending a rule means a union of access. */ + new_access.self = this_rule->access.self | + access->self; + new_access.beneath = this_rule->access.self | + access->beneath; + } + new_rule = create_rule(object, &new_access); + if (IS_ERR(new_rule)) + return PTR_ERR(new_rule); + rcu_assign_pointer(this->ref.rule, new_rule); + landlock_put_rule(object, this_rule); + return 0; + } + + /* There is no match for @object. */ + ruleset_elem = create_ruleset_elem(object); + if (IS_ERR(ruleset_elem)) + return PTR_ERR(ruleset_elem); + if (rule) { + refcount_inc(&rule->usage); + new_rule = rule; + } else { + new_rule = create_rule(object, access); + if (IS_ERR(new_rule)) { + kfree(ruleset_elem); + return PTR_ERR(new_rule); + } + } + RCU_INIT_POINTER(ruleset_elem->ref.rule, new_rule); + /* + * Because of the missing RCU context annotation in struct rb_node, + * Sparse emits a warning when encountering rb_link_node_rcu(), but + * this function call is still safe. 
+ */ + rb_link_node_rcu(&ruleset_elem->node, parent, new); + rb_insert_color(&ruleset_elem->node, &ruleset->root); + atomic_inc(&ruleset->nb_rules); + return 0; +} + +static int merge_ruleset(struct landlock_ruleset *dst, + struct landlock_ruleset *src) +{ + struct rb_node *node; + int err = 0; + + might_sleep(); + if (!src) + return 0; + if (WARN_ON_ONCE(!dst)) + return -EFAULT; + if (WARN_ON_ONCE(!dst->hierarchy)) + return -EINVAL; + + mutex_lock(&dst->lock); + mutex_lock_nested(&src->lock, 1); + dst->fs_access_mask |= src->fs_access_mask; + for (node = rb_first(&src->root); node; node = rb_next(node)) { + struct landlock_ruleset_elem *elem = rb_entry(node, + struct landlock_ruleset_elem, node); + struct landlock_object *object = + rcu_dereference_protected(elem->ref.object, + lockdep_is_held(&src->lock)); + struct landlock_rule *rule = + rcu_dereference_protected(elem->ref.rule, + lockdep_is_held(&src->lock)); + + err = landlock_insert_ruleset_rule(dst, object, NULL, rule); + if (err) + goto out_unlock; + } + +out_unlock: + mutex_unlock(&src->lock); + mutex_unlock(&dst->lock); + return err; +} + +void landlock_get_ruleset(struct landlock_ruleset *ruleset) +{ + if (!ruleset) + return; + refcount_inc(&ruleset->usage); +} + +static void put_hierarchy(struct landlock_hierarchy *hierarchy) +{ + if (hierarchy && refcount_dec_and_test(&hierarchy->usage)) + kfree(hierarchy); +} + +static void put_ruleset(struct landlock_ruleset *ruleset) +{ + struct rb_node *orig; + + might_sleep(); + for (orig = rb_first(&ruleset->root); orig; orig = rb_next(orig)) { + struct landlock_ruleset_elem *freeme; + struct landlock_object *object; + struct landlock_rule *rule; + + freeme = rb_entry(orig, struct landlock_ruleset_elem, node); + object = rcu_dereference_protected(freeme->ref.object, + refcount_read(&ruleset->usage) == 0); + rule = rcu_dereference_protected(freeme->ref.rule, + refcount_read(&ruleset->usage) == 0); + landlock_put_rule(object, rule); + kfree_rcu(freeme, rcu_free); + 
} + put_hierarchy(ruleset->hierarchy); + kfree_rcu(ruleset, rcu_free); +} + +void landlock_put_ruleset(struct landlock_ruleset *ruleset) +{ + might_sleep(); + if (ruleset && refcount_dec_and_test(&ruleset->usage)) + put_ruleset(ruleset); +} + +static void put_ruleset_work(struct work_struct *work) +{ + struct landlock_ruleset *ruleset; + + ruleset = container_of(work, struct landlock_ruleset, work_put); + /* + * Clean up rcu_free because of previous use through union work_put. + * ruleset->rcu_free.func is already NULLed by __rcu_reclaim(). + */ + ruleset->rcu_free.next = NULL; + put_ruleset(ruleset); +} + +void landlock_put_ruleset_enqueue(struct landlock_ruleset *ruleset) +{ + if (ruleset && refcount_dec_and_test(&ruleset->usage)) { + INIT_WORK(&ruleset->work_put, put_ruleset_work); + schedule_work(&ruleset->work_put); + } +} + +static bool clean_ref(struct landlock_ref *ref) +{ + struct landlock_rule *rule; + + rule = rcu_dereference(ref->rule); + if (!rule) + return false; + if (!landlock_rule_is_disabled(rule)) + return false; + rcu_assign_pointer(ref->rule, NULL); + /* + * landlock_put_rule() will not sleep because we already checked + * !landlock_rule_is_disabled(rule). + */ + landlock_put_rule(rcu_dereference(ref->object), rule); + return true; +} + +static void clean_ruleset(struct landlock_ruleset *ruleset) +{ + struct rb_node *node; + + if (!ruleset) + return; + /* We must lock the ruleset to not have a wrong nb_rules counter. */ + mutex_lock(&ruleset->lock); + rcu_read_lock(); + for (node = rb_first(&ruleset->root); node; node = rb_next(node)) { + struct landlock_ruleset_elem *elem = rb_entry(node, + struct landlock_ruleset_elem, node); + + if (clean_ref(&elem->ref)) { + rb_erase(&elem->node, &ruleset->root); + kfree_rcu(elem, rcu_free); + atomic_dec(&ruleset->nb_rules); + } + } + rcu_read_unlock(); + mutex_unlock(&ruleset->lock); +} + +/* + * Creates a new ruleset, merged of @parent and @ruleset, or return @parent if + * @ruleset is empty. 
If @parent is empty, return a duplicate of @ruleset. + * + * @parent: Must not be modified (i.e. locked or read-only). + */ +struct landlock_ruleset *landlock_merge_ruleset( + struct landlock_ruleset *parent, + struct landlock_ruleset *ruleset) +{ + struct landlock_ruleset *new_dom; + int err; + + might_sleep(); + /* Opportunistically put disabled rules. */ + clean_ruleset(ruleset); + + if (parent && WARN_ON_ONCE(!parent->hierarchy)) + return ERR_PTR(-EINVAL); + if (!ruleset || atomic_read(&ruleset->nb_rules) == 0 || + parent == ruleset) { + landlock_get_ruleset(parent); + return parent; + } + + new_dom = create_ruleset(); + if (IS_ERR(new_dom)) + return new_dom; + new_dom->hierarchy = kzalloc(sizeof(*new_dom->hierarchy), GFP_KERNEL); + if (!new_dom->hierarchy) { + landlock_put_ruleset(new_dom); + return ERR_PTR(-ENOMEM); + } + refcount_set(&new_dom->hierarchy->usage, 1); + + if (parent) { + new_dom->hierarchy->parent = parent->hierarchy; + refcount_inc(&parent->hierarchy->usage); + err = merge_ruleset(new_dom, parent); + if (err) { + landlock_put_ruleset(new_dom); + return ERR_PTR(err); + } + } + err = merge_ruleset(new_dom, ruleset); + if (err) { + landlock_put_ruleset(new_dom); + return ERR_PTR(err); + } + return new_dom; +} + +/* + * The return pointer must only be used in a RCU-read block. 
+ */ +const struct landlock_access *landlock_find_access( + const struct landlock_ruleset *ruleset, + const struct landlock_object *object) +{ + struct rb_node *node; + + WARN_ON_ONCE(!rcu_read_lock_held()); + if (!object) + return NULL; + node = ruleset->root.rb_node; + while (node) { + struct landlock_ruleset_elem *this = rb_entry(node, + struct landlock_ruleset_elem, node); + uintptr_t this_object = + (uintptr_t)rcu_access_pointer(this->ref.object); + + if (this_object == (uintptr_t)object) { + struct landlock_rule *rule; + + rule = rcu_dereference(this->ref.rule); + if (!landlock_rule_is_disabled(rule)) + return &rule->access; + return NULL; + } + if (this_object < (uintptr_t)object) + node = node->rb_right; + else + node = node->rb_left; + } + return NULL; +} diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h new file mode 100644 index 000000000000..afc88dbb8b4b --- /dev/null +++ b/security/landlock/ruleset.h @@ -0,0 +1,106 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Ruleset management + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_RULESET_H +#define _SECURITY_LANDLOCK_RULESET_H + +#include <linux/compiler.h> +#include <linux/mutex.h> +#include <linux/poison.h> +#include <linux/rbtree.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/types.h> +#include <linux/workqueue.h> +#include <uapi/linux/landlock.h> + +#include "object.h" + +#define _LANDLOCK_ACCESS_FS_LAST LANDLOCK_ACCESS_FS_MAP +#define _LANDLOCK_ACCESS_FS_MASK ((_LANDLOCK_ACCESS_FS_LAST << 1) - 1) + +struct landlock_ref { + /* + * @object: Identify a kernel object (e.g. an inode). This is used as + * a key for a ruleset tree (cf. struct landlock_ruleset_elem). This + * pointer is set once and never modified. 
It may point to a deleted + * object and should then be dereferenced with great care, thanks to a + * call to landlock_rule_is_disabled(@rule) from inside an RCU-read + * block, cf. landlock_put_rule(). + */ + struct landlock_object __rcu *object; + /* + * @rule: Ties a rule to an object. Set once with an allocated rule, + * but can be NULLed if the rule is disabled. + */ + struct landlock_rule __rcu *rule; +}; + +/* + * Red-black tree element used in a landlock_ruleset. + */ +struct landlock_ruleset_elem { + struct landlock_ref ref; + struct rb_node node; + struct rcu_head rcu_free; +}; + +/* + * Enable hierarchy identification even when a parent domain vanishes. This is + * needed for the ptrace protection. + */ +struct landlock_hierarchy { + struct landlock_hierarchy *parent; + refcount_t usage; +}; + +/* + * Kernel representation of a ruleset. This data structure must contains + * unique entries, be updatable, and quick to match an object. + */ +struct landlock_ruleset { + /* + * @fs_access_mask: Contains the subset of filesystem actions which are + * restricted by a ruleset. This is used when merging rulesets and for + * userspace backward compatibility (i.e. future-proof). Set once and + * never changed for the lifetime of the ruleset. + */ + u32 fs_access_mask; + struct landlock_hierarchy *hierarchy; + refcount_t usage; + union { + struct rcu_head rcu_free; + struct work_struct work_put; + }; + struct mutex lock; + atomic_t nb_rules; + /* + * @root: Red-black tree containing landlock_ruleset_elem nodes. 
+ */ + struct rb_root root; +}; + +struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask); + +void landlock_get_ruleset(struct landlock_ruleset *ruleset); +void landlock_put_ruleset(struct landlock_ruleset *ruleset); +void landlock_put_ruleset_enqueue(struct landlock_ruleset *ruleset); + +int landlock_insert_ruleset_rule(struct landlock_ruleset *ruleset, + struct landlock_object *object, struct landlock_access *access, + struct landlock_rule *rule); + +struct landlock_ruleset *landlock_merge_ruleset( + struct landlock_ruleset *domain, + struct landlock_ruleset *ruleset); + +const struct landlock_access *landlock_find_access( + const struct landlock_ruleset *ruleset, + const struct landlock_object *object); + +#endif /* _SECURITY_LANDLOCK_RULESET_H */
A process's credentials point to a Landlock domain, which is implemented underneath with a ruleset. In the following commits, this domain is used to check and enforce the ptrace and filesystem security policies. A domain is inherited from a parent by its child the same way a thread inherits a seccomp policy.
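To make the inheritance model concrete, here is a minimal user-space sketch (not kernel code; the struct and function names are illustrative) of how a new credential shares its parent's domain by taking a reference instead of copying the ruleset, mirroring what hook_cred_prepare() and hook_cred_free() do below:

```c
/* Simplified user-space model of Landlock domain inheritance. */
#include <assert.h>
#include <stddef.h>

struct domain {
	int usage; /* models refcount_t on the domain's ruleset */
};

struct cred {
	struct domain *domain; /* NULL means not landlocked */
};

static void domain_get(struct domain *dom)
{
	if (dom)
		dom->usage++;
}

static void domain_put(struct domain *dom)
{
	if (dom)
		dom->usage--;
}

/* Models hook_cred_prepare(): share the domain, do not duplicate it. */
static void cred_prepare(struct cred *new, const struct cred *old)
{
	new->domain = old->domain;
	domain_get(new->domain);
}

/* Models hook_cred_free(): drop the reference taken at prepare time. */
static void cred_free(struct cred *cred)
{
	domain_put(cred->domain);
}
```

Sharing by reference is what keeps domain inheritance cheap even for deeply nested sandboxes: a fork only bumps a refcount.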
Signed-off-by: Mickaël Salaün mic@digikod.net Cc: Andy Lutomirski luto@amacapital.net Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com ---
Changes since v13: * completely get rid of the seccomp dependency * only keep credential management and LSM setup.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-4-mic@digikod.net/ --- security/Kconfig | 10 +++---- security/landlock/Makefile | 3 ++- security/landlock/cred.c | 47 ++++++++++++++++++++++++++++++++ security/landlock/cred.h | 55 ++++++++++++++++++++++++++++++++++++++ security/landlock/setup.c | 30 +++++++++++++++++++++ security/landlock/setup.h | 18 +++++++++++++ 6 files changed, 157 insertions(+), 6 deletions(-) create mode 100644 security/landlock/cred.c create mode 100644 security/landlock/cred.h create mode 100644 security/landlock/setup.c create mode 100644 security/landlock/setup.h
diff --git a/security/Kconfig b/security/Kconfig index 9d9981394fb0..76547b5c694d 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -278,11 +278,11 @@ endchoice
config LSM string "Ordered list of enabled LSMs" - default "lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK - default "lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR - default "lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO - default "lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC - default "lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor" + default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor" if DEFAULT_SECURITY_SMACK + default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo" if DEFAULT_SECURITY_APPARMOR + default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo" if DEFAULT_SECURITY_TOMOYO + default "landlock,lockdown,yama,loadpin,safesetid,integrity" if DEFAULT_SECURITY_DAC + default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor" help A comma-separated list of LSMs, in initialization order. Any LSMs left off this list will be ignored. This can be diff --git a/security/landlock/Makefile b/security/landlock/Makefile index d846eba445bb..041ea242e627 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := object.o ruleset.o +landlock-y := setup.o object.o ruleset.o \ + cred.o diff --git a/security/landlock/cred.c b/security/landlock/cred.c new file mode 100644 index 000000000000..69ef93e29a53 --- /dev/null +++ b/security/landlock/cred.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Credential hooks + * + * Copyright © 2017-2019 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2019 ANSSI + */ + +#include <linux/cred.h> +#include <linux/lsm_hooks.h> + +#include "cred.h" +#include "ruleset.h" +#include "setup.h" + +static int hook_cred_prepare(struct cred *new, const struct cred *old, + gfp_t gfp) +{ + const struct landlock_cred_security *cred_old = landlock_cred(old); + struct landlock_cred_security *cred_new = landlock_cred(new); + struct landlock_ruleset *dom_old; + + dom_old = cred_old->domain; + if (dom_old) { + landlock_get_ruleset(dom_old); + cred_new->domain = dom_old; + } else { + cred_new->domain = NULL; + } + return 0; +} + +static void hook_cred_free(struct cred *cred) +{ + landlock_put_ruleset_enqueue(landlock_cred(cred)->domain); +} + +static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(cred_prepare, hook_cred_prepare), + LSM_HOOK_INIT(cred_free, hook_cred_free), +}; + +__init void landlock_add_hooks_cred(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + LANDLOCK_NAME); +} diff --git a/security/landlock/cred.h b/security/landlock/cred.h new file mode 100644 index 000000000000..1e24682ee27e --- /dev/null +++ b/security/landlock/cred.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Credential hooks + * + * Copyright © 2019 Mickaël Salaün mic@digikod.net + * Copyright © 2019 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_CRED_H +#define _SECURITY_LANDLOCK_CRED_H + +#include <linux/cred.h> +#include <linux/init.h> +#include <linux/rcupdate.h> + +#include "ruleset.h" +#include "setup.h" + +struct 
landlock_cred_security { + struct landlock_ruleset *domain; +}; + +static inline struct landlock_cred_security *landlock_cred( + const struct cred *cred) +{ + return cred->security + landlock_blob_sizes.lbs_cred; +} + +static inline struct landlock_ruleset *landlock_get_current_domain(void) +{ + return landlock_cred(current_cred())->domain; +} + +/* + * The caller needs an RCU lock. + */ +static inline struct landlock_ruleset *landlock_get_task_domain( + struct task_struct *task) +{ + return landlock_cred(__task_cred(task))->domain; +} + +static inline bool landlocked(struct task_struct *task) +{ + bool has_dom; + + rcu_read_lock(); + has_dom = !!landlock_get_task_domain(task); + rcu_read_unlock(); + return has_dom; +} + +__init void landlock_add_hooks_cred(void); + +#endif /* _SECURITY_LANDLOCK_CRED_H */ diff --git a/security/landlock/setup.c b/security/landlock/setup.c new file mode 100644 index 000000000000..fca5fa185465 --- /dev/null +++ b/security/landlock/setup.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Security framework setup + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/init.h> +#include <linux/lsm_hooks.h> + +#include "cred.h" +#include "setup.h" + +struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { + .lbs_cred = sizeof(struct landlock_cred_security), +}; + +static int __init landlock_init(void) +{ + pr_info(LANDLOCK_NAME ": Registering hooks\n"); + landlock_add_hooks_cred(); + return 0; +} + +DEFINE_LSM(LANDLOCK_NAME) = { + .name = LANDLOCK_NAME, + .init = landlock_init, + .blobs = &landlock_blob_sizes, +}; diff --git a/security/landlock/setup.h b/security/landlock/setup.h new file mode 100644 index 000000000000..52eb8d806376 --- /dev/null +++ b/security/landlock/setup.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Security framework setup + * + * Copyright © 2016-2020 Mickaël Salaün 
mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_SETUP_H +#define _SECURITY_LANDLOCK_SETUP_H + +#include <linux/lsm_hooks.h> + +#define LANDLOCK_NAME "landlock" + +extern struct lsm_blob_sizes landlock_blob_sizes; + +#endif /* _SECURITY_LANDLOCK_SETUP_H */
Using ptrace(2) and related debug features on a target process can lead to privilege escalation. Indeed, ptrace(2) can be used by an attacker to impersonate another task and to remain undetected while performing malicious activities. Thanks to ptrace_may_access(), various parts of the kernel can check if a tracer is more privileged than a tracee.
A landlocked process has fewer privileges than a non-landlocked process and must then be subject to additional restrictions when manipulating processes. To be allowed to use ptrace(2) and related syscalls on a target process, a landlocked process must have a subset of the target process' rules (i.e. the tracee must be in a sub-domain of the tracer).
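The sub-domain check reduces to an ancestor walk over the domain hierarchy. Here is a minimal user-space sketch (illustrative names; the kernel version, domain_scope_le() below, walks struct landlock_hierarchy) of that ordering test, where a NULL parent models an unsandboxed tracer:

```c
/* Simplified user-space model of the scoped-ptrace ordering check. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hierarchy {
	struct hierarchy *parent; /* NULL for the hierarchy root */
};

/*
 * Returns true if @parent is an ancestor of (or equal to) @child,
 * i.e. the tracer's domain is a subset of the tracee's.  A NULL
 * @parent means the tracer is not landlocked and may trace anyone;
 * a NULL @child means the tracee is less confined than the tracer.
 */
static bool scope_le(const struct hierarchy *parent,
		const struct hierarchy *child)
{
	const struct hierarchy *walker;

	if (!parent)
		return true;
	if (!child)
		return false;
	for (walker = child; walker; walker = walker->parent) {
		if (walker == parent)
			return true;
	}
	return false;
}
```

The asymmetry matters: a sandboxed process may trace its own (more confined) children, but never a sibling or a less confined task.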
Signed-off-by: Mickaël Salaün mic@digikod.net Cc: Andy Lutomirski luto@amacapital.net Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com ---
Changes since v13: * Make the ptrace restriction mandatory, like in v10. * Remove the eBPF dependency.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-5-mic@digikod.net/ --- security/landlock/Makefile | 2 +- security/landlock/ptrace.c | 118 +++++++++++++++++++++++++++++++++++++ security/landlock/ptrace.h | 14 +++++ security/landlock/setup.c | 2 + 4 files changed, 135 insertions(+), 1 deletion(-) create mode 100644 security/landlock/ptrace.c create mode 100644 security/landlock/ptrace.h
diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 041ea242e627..f1d1eb72fa76 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
landlock-y := setup.o object.o ruleset.o \ - cred.o + cred.o ptrace.o diff --git a/security/landlock/ptrace.c b/security/landlock/ptrace.c new file mode 100644 index 000000000000..6c7326788c46 --- /dev/null +++ b/security/landlock/ptrace.c @@ -0,0 +1,118 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Ptrace hooks + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2020 ANSSI + */ + +#include <asm/current.h> +#include <linux/cred.h> +#include <linux/errno.h> +#include <linux/kernel.h> +#include <linux/lsm_hooks.h> +#include <linux/rcupdate.h> +#include <linux/sched.h> + +#include "cred.h" +#include "ptrace.h" +#include "ruleset.h" +#include "setup.h" + +/** + * domain_scope_le - Checks domain ordering for scoped ptrace + * + * @parent: Parent domain. + * @child: Potential child of @parent. + * + * Checks if the @parent domain is less or equal to (i.e. an ancestor, which + * means a subset of) the @child domain. + */ +static bool domain_scope_le(const struct landlock_ruleset *parent, + const struct landlock_ruleset *child) +{ + const struct landlock_hierarchy *walker; + + if (!parent) + return true; + if (!child) + return false; + for (walker = child->hierarchy; walker; walker = walker->parent) { + if (walker == parent->hierarchy) + /* @parent is in the scoped hierarchy of @child. */ + return true; + } + /* There is no relationship between @parent and @child. */ + return false; +} + +static bool task_is_scoped(struct task_struct *parent, + struct task_struct *child) +{ + bool is_scoped; + const struct landlock_ruleset *dom_parent, *dom_child; + + rcu_read_lock(); + dom_parent = landlock_get_task_domain(parent); + dom_child = landlock_get_task_domain(child); + is_scoped = domain_scope_le(dom_parent, dom_child); + rcu_read_unlock(); + return is_scoped; +} + +static int task_ptrace(struct task_struct *parent, struct task_struct *child) +{ + /* Quick return for non-landlocked tasks. 
*/ + if (!landlocked(parent)) + return 0; + if (task_is_scoped(parent, child)) + return 0; + return -EPERM; +} + +/** + * hook_ptrace_access_check - Determines whether the current process may access + * another + * + * @child: Process to be accessed. + * @mode: Mode of attachment. + * + * If the current task has Landlock rules, then the child must have at least + * the same rules. Else denied. + * + * Determines whether a process may access another, returning 0 if permission + * granted, -errno if denied. + */ +static int hook_ptrace_access_check(struct task_struct *child, + unsigned int mode) +{ + return task_ptrace(current, child); +} + +/** + * hook_ptrace_traceme - Determines whether another process may trace the + * current one + * + * @parent: Task proposed to be the tracer. + * + * If the parent has Landlock rules, then the current task must have the same + * or more rules. Else denied. + * + * Determines whether the nominated task is permitted to trace the current + * process, returning 0 if permission is granted, -errno if denied. 
+ */ +static int hook_ptrace_traceme(struct task_struct *parent) +{ + return task_ptrace(parent, current); +} + +static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(ptrace_access_check, hook_ptrace_access_check), + LSM_HOOK_INIT(ptrace_traceme, hook_ptrace_traceme), +}; + +__init void landlock_add_hooks_ptrace(void) +{ + security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks), + LANDLOCK_NAME); +} diff --git a/security/landlock/ptrace.h b/security/landlock/ptrace.h new file mode 100644 index 000000000000..6740c6a723de --- /dev/null +++ b/security/landlock/ptrace.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Landlock LSM - Ptrace hooks + * + * Copyright © 2017-2019 Mickaël Salaün mic@digikod.net + * Copyright © 2019 ANSSI + */ + +#ifndef _SECURITY_LANDLOCK_PTRACE_H +#define _SECURITY_LANDLOCK_PTRACE_H + +__init void landlock_add_hooks_ptrace(void); + +#endif /* _SECURITY_LANDLOCK_PTRACE_H */ diff --git a/security/landlock/setup.c b/security/landlock/setup.c index fca5fa185465..117afb344da6 100644 --- a/security/landlock/setup.c +++ b/security/landlock/setup.c @@ -10,6 +10,7 @@ #include <linux/lsm_hooks.h>
#include "cred.h" +#include "ptrace.h" #include "setup.h"
struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = { @@ -20,6 +21,7 @@ static int __init landlock_init(void) { pr_info(LANDLOCK_NAME ": Registering hooks\n"); landlock_add_hooks_cred(); + landlock_add_hooks_ptrace(); return 0; }
Thanks to the Landlock objects and ruleset, it is possible to identify inodes according to a process's domain. To enable an unprivileged process to express a file hierarchy, it first needs to open a directory (or a file) and pass this file descriptor to the kernel through landlock(2). When checking if a file access request is allowed, we walk from the requested dentry to the real root, following the different mount layers. The accesses to each "tagged" inode are collected and ANDed to compute the access granted to the requested file hierarchy. This makes it possible to identify many files without tagging every inode or modifying the filesystem, while still following the view and understanding the user has of the filesystem.
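The walk-and-AND semantics can be sketched as follows in a simplified user-space model (illustrative only: the real check_access_path() below walks dentries across mount layers and distinguishes "self" from "beneath" access bits, which this model collapses into one mask per ancestor):

```c
/* Simplified model of the deny-by-default path-walk access check. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ACCESS_READ  (1U << 0)
#define ACCESS_WRITE (1U << 1)

/*
 * masks[i] is the access mask tagged on the i-th ancestor, from the
 * requested file (i == 0) up to the real root; 0 means the inode is
 * untagged.  Deny by default: the walk must meet at least one tagged
 * inode, and every tagged inode met must cover the whole request.
 */
static bool walk_allows(const unsigned int *masks, size_t depth,
		unsigned int request)
{
	bool allow = false;
	size_t i;

	for (i = 0; i < depth; i++) {
		if (!masks[i])
			continue; /* untagged inode: keep walking */
		if ((masks[i] & request) != request)
			return false; /* a tagged ancestor denies */
		allow = true;
	}
	return allow;
}
```

This is why tagging a single directory is enough to cover its whole subtree: untagged inodes are transparent, and the decision comes from the tagged ancestors alone.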
Signed-off-by: Mickaël Salaün mic@digikod.net Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Andy Lutomirski luto@amacapital.net Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com ---
Changes since v11: * Add back, revamp, and make fully functional the filesystem access-control based on paths and inodes. * Remove the eBPF dependency.
Previous version: https://lore.kernel.org/lkml/20190721213116.23476-6-mic@digikod.net/ --- MAINTAINERS | 1 + fs/super.c | 2 + include/linux/landlock.h | 22 ++ security/landlock/Kconfig | 1 + security/landlock/Makefile | 2 +- security/landlock/fs.c | 591 +++++++++++++++++++++++++++++++++++++ security/landlock/fs.h | 42 +++ security/landlock/object.c | 2 + security/landlock/setup.c | 6 + security/landlock/setup.h | 2 + 10 files changed, 670 insertions(+), 1 deletion(-) create mode 100644 include/linux/landlock.h create mode 100644 security/landlock/fs.c create mode 100644 security/landlock/fs.h
diff --git a/MAINTAINERS b/MAINTAINERS index 937257925e65..0c8c2c651b96 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9366,6 +9366,7 @@ L: linux-security-module@vger.kernel.org W: https://landlock.io T: git https://github.com/landlock-lsm/linux.git S: Supported +F: include/linux/landlock.h F: include/uapi/linux/landlock.h F: security/landlock/ K: landlock diff --git a/fs/super.c b/fs/super.c index cd352530eca9..4ad6a64a1706 100644 --- a/fs/super.c +++ b/fs/super.c @@ -34,6 +34,7 @@ #include <linux/cleancache.h> #include <linux/fscrypt.h> #include <linux/fsnotify.h> +#include <linux/landlock.h> #include <linux/lockdep.h> #include <linux/user_namespace.h> #include <linux/fs_context.h> @@ -454,6 +455,7 @@ void generic_shutdown_super(struct super_block *sb) evict_inodes(sb); /* only nonzero refcount inodes can have marks */ fsnotify_sb_delete(sb); + landlock_release_inodes(sb);
if (sb->s_dio_done_wq) { destroy_workqueue(sb->s_dio_done_wq); diff --git a/include/linux/landlock.h b/include/linux/landlock.h new file mode 100644 index 000000000000..0fb16d130b0a --- /dev/null +++ b/include/linux/landlock.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Landlock LSM - public kernel headers + * + * Copyright © 2016-2019 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2019 ANSSI + */ + +#ifndef _LINUX_LANDLOCK_H +#define _LINUX_LANDLOCK_H + +#include <linux/fs.h> + +#ifdef CONFIG_SECURITY_LANDLOCK +extern void landlock_release_inodes(struct super_block *sb); +#else +static inline void landlock_release_inodes(struct super_block *sb) +{ +} +#endif + +#endif /* _LINUX_LANDLOCK_H */ diff --git a/security/landlock/Kconfig b/security/landlock/Kconfig index 4a321d5b3f67..af0593c2a9e5 100644 --- a/security/landlock/Kconfig +++ b/security/landlock/Kconfig @@ -3,6 +3,7 @@ config SECURITY_LANDLOCK bool "Landlock support" depends on SECURITY + select SECURITY_PATH default n help This selects Landlock, a safe sandboxing mechanism. It enables to diff --git a/security/landlock/Makefile b/security/landlock/Makefile index f1d1eb72fa76..92e3d80ab8ed 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
landlock-y := setup.o object.o ruleset.o \ - cred.o ptrace.o + cred.o ptrace.o fs.o diff --git a/security/landlock/fs.c b/security/landlock/fs.c new file mode 100644 index 000000000000..7f3bd4fd04bb --- /dev/null +++ b/security/landlock/fs.c @@ -0,0 +1,591 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - Filesystem management and hooks + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <linux/compiler_types.h> +#include <linux/dcache.h> +#include <linux/fs.h> +#include <linux/init.h> +#include <linux/kernel.h> +#include <linux/landlock.h> +#include <linux/lsm_hooks.h> +#include <linux/mman.h> +#include <linux/mm_types.h> +#include <linux/mount.h> +#include <linux/namei.h> +#include <linux/path.h> +#include <linux/rcupdate.h> +#include <linux/spinlock.h> +#include <linux/stat.h> +#include <linux/types.h> +#include <linux/uidgid.h> +#include <linux/workqueue.h> +#include <uapi/linux/landlock.h> + +#include "cred.h" +#include "fs.h" +#include "object.h" +#include "ruleset.h" +#include "setup.h" + +/* Underlying object management */ + +void landlock_release_inode(struct inode *inode, struct landlock_object *object) +{ + /* + * A call to landlock_put_object() or release_object() might sleep, but + * landlock_release_object() can not sleep because it is called with a + * reference to the inode. However, we can still mark this function as + * such because this should not bother landlock_release_object() + * callers (e.g. landlock_release_inodes()). + */ + might_sleep(); + /* + * We must check that no one else replaced the pinned object in the + * window between the reset of @object->underlying_object and now. 
+ */ + spin_lock(&inode->i_lock); + if (rcu_access_pointer(inode_landlock(inode)->object) == object) + rcu_assign_pointer(inode_landlock(inode)->object, NULL); + spin_unlock(&inode->i_lock); + /* + * Because we first NULL the reference to the object, calling iput() + * won't trigger a call to landlock_put_object() (via + * put_underlying_object). + */ + iput(inode); +} + +/* + * Release the inodes used in a security policy. + * + * It is much more clean to have a dedicated call in generic_shutdown_super() + * than a hacky sb_free_security hook, especially with the locked sb_lock. + * + * Cf. fsnotify_unmount_inodes() + */ +void landlock_release_inodes(struct super_block *sb) +{ + struct inode *inode, *next, *iput_inode = NULL; + + if (!landlock_initialized) + return; + + spin_lock(&sb->s_inode_list_lock); + list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) { + spin_lock(&inode->i_lock); + if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) { + spin_unlock(&inode->i_lock); + continue; + } + if (!atomic_read(&inode->i_count)) { + spin_unlock(&inode->i_lock); + continue; + } + __iget(inode); + spin_unlock(&inode->i_lock); + spin_unlock(&sb->s_inode_list_lock); + /* + * We can now actually put the previous inode, which is not + * needed anymore for the loop walk. Because this inode should + * only be referenced by Landlock for this super block, iput() + * should trigger a call to hook_inode_free_security(). + */ + if (iput_inode) + iput(iput_inode); + + landlock_release_object(inode_landlock(inode)->object); + + iput_inode = inode; + spin_lock(&sb->s_inode_list_lock); + } + spin_unlock(&sb->s_inode_list_lock); + if (iput_inode) + iput(iput_inode); +} + +/* Ruleset management */ + +static struct landlock_object *get_inode_object(struct inode *inode) + __acquires(object->usage) +{ + struct landlock_object *object, *new_object; + + /* Let's first try a lockless access. 
*/ + rcu_read_lock(); + object = landlock_get_object(rcu_dereference( + inode_landlock(inode)->object)); + rcu_read_unlock(); + if (object) + return object; + + __release(object->usage); + /* + * If there is no object tied to @inode, then create a new one (outside + * of a locked block). + */ + new_object = landlock_create_object(LANDLOCK_OBJECT_INODE, inode); + + spin_lock(&inode->i_lock); + object = landlock_get_object(rcu_dereference_protected( + inode_landlock(inode)->object, + lockdep_is_held(&inode->i_lock))); + if (unlikely(object)) { + /* + * Do not try to iput(inode) because it is not held yet. + */ + landlock_drop_object(new_object); + } else { + __release(object->usage); + object = landlock_get_object(new_object); + rcu_assign_pointer(inode_landlock(inode)->object, object); + /* + * @inode will be released by landlock_release_inodes() on its + * super-block shutdown. + */ + ihold(inode); + } + spin_unlock(&inode->i_lock); + return object; +} + +/* + * @path: Should have been checked by get_path_from_fd(). + */ +int landlock_append_fs_rule(struct landlock_ruleset *ruleset, + struct path *path, u64 access_hierarchy) +{ + int err; + struct landlock_access access; + struct landlock_object *object; + + /* + * Checks that @access_hierarchy matches the @ruleset constraints, but + * allow empty @access_hierarchy i.e., deny @ruleset->fs_access_mask . + */ + if ((ruleset->fs_access_mask | access_hierarchy) != + ruleset->fs_access_mask) + return -EINVAL; + /* Transforms relative access rights to absolute ones. 
*/ + access_hierarchy |= _LANDLOCK_ACCESS_FS_MASK & + ~ruleset->fs_access_mask; + access.self = access_hierarchy; + access.beneath = access_hierarchy; + object = get_inode_object(d_backing_inode(path->dentry)); + mutex_lock(&ruleset->lock); + err = landlock_insert_ruleset_rule(ruleset, object, &access, NULL); + mutex_unlock(&ruleset->lock); + /* + * No need to check for an error because landlock_put_object() handles + * empty object and will terminate it if necessary. + */ + landlock_put_object(object); + return err; +} + +/* Access-control management */ + +static bool check_access_path_continue( + const struct landlock_ruleset *domain, + const struct path *path, u32 access_request, + const bool check_self, bool *allow) +{ + const struct landlock_access *access; + bool next = true; + + rcu_read_lock(); + access = landlock_find_access(domain, rcu_dereference(inode_landlock( + d_backing_inode(path->dentry))->object)); + if (access) { + next = ((check_self ? access->self : access->beneath) & + access_request) == access_request; + *allow = next; + } + rcu_read_unlock(); + return next; +} + +static int check_access_path(const struct landlock_ruleset *domain, + const struct path *path, u32 access_request) +{ + bool allow = false; + struct path walker_path; + + if (WARN_ON_ONCE(!path)) + return 0; + /* An access request not handled by the domain should be allowed. */ + access_request &= domain->fs_access_mask; + if (access_request == 0) + return 0; + walker_path = *path; + path_get(&walker_path); + if (check_access_path_continue(domain, &walker_path, access_request, + true, &allow)) { + /* + * We need to walk through all the hierarchy to not miss any + * relevant restriction. This could be optimized with a future + * commit. + */ + do { + struct dentry *parent_dentry; + +jump_up: + /* + * Does not work with orphaned/private mounts like + * overlayfs layers for now (cf. ovl_path_real() and + * ovl_path_open()). 
+ */ + if (walker_path.dentry == walker_path.mnt->mnt_root) { + if (follow_up(&walker_path)) + /* Ignores hidden mount points. */ + goto jump_up; + else + /* Stops at the real root. */ + break; + } + parent_dentry = dget_parent(walker_path.dentry); + dput(walker_path.dentry); + walker_path.dentry = parent_dentry; + } while (check_access_path_continue(domain, &walker_path, + access_request, false, &allow)); + } + path_put(&walker_path); + return allow ? 0 : -EACCES; +} + +static inline int current_check_access_path(const struct path *path, + u32 access_request) +{ + struct landlock_ruleset *dom; + + dom = landlock_get_current_domain(); + if (!dom) + return 0; + return check_access_path(dom, path, access_request); +} + +/* Super-block hooks */ + +/* + * Because a Landlock security policy is defined according to the filesystem + * layout (i.e. the mount namespace), changing it may grant access to files not + * previously allowed. + * + * To make it simple, deny any filesystem layout modification by landlocked + * processes. Non-landlocked processes may still change the namespace of a + * landlocked process, but this kind of threat must be handled by a system-wide + * access-control security policy. + * + * This could be lifted in the future if Landlock can safely handle mount + * namespace updates requested by a landlocked process. Indeed, we could + * update the current domain (which is currently read-only) by taking into + * account the accesses of the source and the destination of a new mount point. + * However, it would also require to make all the child domains dynamically + * inherit these new constraints. Anyway, for backward compatibility reasons, + * a dedicated user space option would be required (e.g. as a ruleset command + * option). 
+ */ +static int hook_sb_mount(const char *dev_name, const struct path *path, + const char *type, unsigned long flags, void *data) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +static int hook_move_mount(const struct path *from_path, + const struct path *to_path) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* + * Removing a mount point may reveal a previously hidden file hierarchy, which + * may then grant access to files, which may have previously been forbidden. + */ +static int hook_sb_umount(struct vfsmount *mnt, int flags) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +static int hook_sb_remount(struct super_block *sb, void *mnt_opts) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* + * pivot_root(2), like mount(2), changes the current mount namespace. It must + * then be forbidden for a landlocked process. + * + * However, chroot(2) may be allowed because it only changes the relative root + * directory of the current process. 
+ */ +static int hook_sb_pivotroot(const struct path *old_path, + const struct path *new_path) +{ + if (!landlock_get_current_domain()) + return 0; + return -EPERM; +} + +/* Path hooks */ + +static int hook_path_link(struct dentry *old_dentry, + const struct path *new_dir, struct dentry *new_dentry) +{ + return current_check_access_path(new_dir, LANDLOCK_ACCESS_FS_LINK_TO); +} + +static int hook_path_mkdir(const struct path *dir, struct dentry *dentry, + umode_t mode) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_DIR); +} + +static inline u32 get_mode_access(umode_t mode) +{ + switch (mode & S_IFMT) { + case S_IFLNK: + return LANDLOCK_ACCESS_FS_MAKE_SYM; + case S_IFREG: + return LANDLOCK_ACCESS_FS_MAKE_REG; + case S_IFDIR: + return LANDLOCK_ACCESS_FS_MAKE_DIR; + case S_IFCHR: + return LANDLOCK_ACCESS_FS_MAKE_CHAR; + case S_IFBLK: + return LANDLOCK_ACCESS_FS_MAKE_BLOCK; + case S_IFIFO: + return LANDLOCK_ACCESS_FS_MAKE_FIFO; + case S_IFSOCK: + return LANDLOCK_ACCESS_FS_MAKE_SOCK; + default: + WARN_ON_ONCE(1); + return 0; + } +} + +static int hook_path_mknod(const struct path *dir, struct dentry *dentry, + umode_t mode, unsigned int dev) +{ + return current_check_access_path(dir, get_mode_access(mode)); +} + +static int hook_path_symlink(const struct path *dir, struct dentry *dentry, + const char *old_name) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_MAKE_SYM); +} + +static int hook_path_truncate(const struct path *path) +{ + return current_check_access_path(path, LANDLOCK_ACCESS_FS_TRUNCATE); +} + +static int hook_path_unlink(const struct path *dir, struct dentry *dentry) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_UNLINK); +} + +static int hook_path_rmdir(const struct path *dir, struct dentry *dentry) +{ + return current_check_access_path(dir, LANDLOCK_ACCESS_FS_RMDIR); +} + +static int hook_path_rename(const struct path *old_dir, + struct dentry *old_dentry, const struct path *new_dir, + struct dentry 
*new_dentry) +{ + struct landlock_ruleset *dom; + int err; + + dom = landlock_get_current_domain(); + if (!dom) + return 0; + err = check_access_path(dom, old_dir, LANDLOCK_ACCESS_FS_RENAME_FROM); + if (err) + return err; + return check_access_path(dom, new_dir, LANDLOCK_ACCESS_FS_RENAME_TO); +} + +static int hook_path_chmod(const struct path *path, umode_t mode) +{ + return current_check_access_path(path, LANDLOCK_ACCESS_FS_CHMOD); +} + +static int hook_path_chown(const struct path *path, kuid_t uid, kgid_t gid) +{ + struct landlock_ruleset *dom; + int err; + + dom = landlock_get_current_domain(); + if (!dom) + return 0; + if (uid_valid(uid)) { + err = check_access_path(dom, path, LANDLOCK_ACCESS_FS_CHOWN); + if (err) + return err; + } + if (gid_valid(gid)) { + err = check_access_path(dom, path, LANDLOCK_ACCESS_FS_CHGRP); + if (err) + return err; + } + return 0; +} + +static int hook_path_chroot(const struct path *path) +{ + return current_check_access_path(path, LANDLOCK_ACCESS_FS_CHROOT); +} + +/* Inode hooks */ + +static int hook_inode_alloc_security(struct inode *inode) +{ + inode_landlock(inode)->object = NULL; + return 0; +} + +static void hook_inode_free_security(struct inode *inode) +{ + WARN_ON_ONCE(rcu_access_pointer(inode_landlock(inode)->object)); +} + +static int hook_inode_getattr(const struct path *path) +{ + return current_check_access_path(path, LANDLOCK_ACCESS_FS_GETATTR); +} + +/* File hooks */ + +static inline u32 get_file_access(const struct file *file) +{ + u32 access = 0; + + if (file->f_mode & FMODE_READ) { + /* A directory can only be opened in read mode. */ + if (S_ISDIR(file_inode(file)->i_mode)) + access |= LANDLOCK_ACCESS_FS_READDIR; + else + access |= LANDLOCK_ACCESS_FS_READ; + } + /* + * A LANDLOCK_ACCESS_FS_APPEND could be added be we also need to check + * fcntl(2). + */ + if (file->f_mode & FMODE_WRITE) + access |= LANDLOCK_ACCESS_FS_WRITE; + /* __FMODE_EXEC is indeed part of f_flags, not f_mode. 
*/
+	if (file->f_flags & __FMODE_EXEC)
+		access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	return access;
+}
+
+static int hook_file_open(struct file *file)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	if (!file_inode(file))
+		return -ENOENT;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_OPEN | get_file_access(file));
+}
+
+static inline u32 get_mem_access(unsigned long prot, bool private)
+{
+	u32 access = LANDLOCK_ACCESS_FS_MAP;
+
+	/* Private mappings do not write to files. */
+	if (!private && (prot & PROT_WRITE))
+		access |= LANDLOCK_ACCESS_FS_WRITE;
+	if (prot & PROT_READ)
+		access |= LANDLOCK_ACCESS_FS_READ;
+	if (prot & PROT_EXEC)
+		access |= LANDLOCK_ACCESS_FS_EXECUTE;
+	return access;
+}
+
+static int hook_mmap_file(struct file *file, unsigned long reqprot,
+		unsigned long prot, unsigned long flags)
+{
+	/* @file can be null for anonymous mmap. */
+	if (!file)
+		return 0;
+	return current_check_access_path(&file->f_path,
+			get_mem_access(prot, flags & MAP_PRIVATE));
+}
+
+static int hook_file_mprotect(struct vm_area_struct *vma,
+		unsigned long reqprot, unsigned long prot)
+{
+	if (WARN_ON_ONCE(!vma))
+		return 0;
+	if (!vma->vm_file)
+		return 0;
+	return current_check_access_path(&vma->vm_file->f_path,
+			get_mem_access(prot, !(vma->vm_flags & VM_SHARED)));
+}
+
+static int hook_file_ioctl(struct file *file, unsigned int cmd,
+		unsigned long arg)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_IOCTL);
+}
+
+static int hook_file_lock(struct file *file, unsigned int cmd)
+{
+	if (WARN_ON_ONCE(!file))
+		return 0;
+	return current_check_access_path(&file->f_path,
+			LANDLOCK_ACCESS_FS_LOCK);
+}
+
+static struct security_hook_list landlock_hooks[] __lsm_ro_after_init = {
+	LSM_HOOK_INIT(sb_mount, hook_sb_mount),
+	LSM_HOOK_INIT(move_mount, hook_move_mount),
+	LSM_HOOK_INIT(sb_umount, hook_sb_umount),
+	LSM_HOOK_INIT(sb_remount, hook_sb_remount),
+	LSM_HOOK_INIT(sb_pivotroot,
hook_sb_pivotroot),
+
+	LSM_HOOK_INIT(path_link, hook_path_link),
+	LSM_HOOK_INIT(path_mkdir, hook_path_mkdir),
+	LSM_HOOK_INIT(path_mknod, hook_path_mknod),
+	LSM_HOOK_INIT(path_symlink, hook_path_symlink),
+	LSM_HOOK_INIT(path_truncate, hook_path_truncate),
+	LSM_HOOK_INIT(path_unlink, hook_path_unlink),
+	LSM_HOOK_INIT(path_rmdir, hook_path_rmdir),
+	LSM_HOOK_INIT(path_rename, hook_path_rename),
+	LSM_HOOK_INIT(path_chmod, hook_path_chmod),
+	LSM_HOOK_INIT(path_chown, hook_path_chown),
+	LSM_HOOK_INIT(path_chroot, hook_path_chroot),
+
+	LSM_HOOK_INIT(inode_alloc_security, hook_inode_alloc_security),
+	LSM_HOOK_INIT(inode_free_security, hook_inode_free_security),
+	LSM_HOOK_INIT(inode_getattr, hook_inode_getattr),
+
+	LSM_HOOK_INIT(file_open, hook_file_open),
+	LSM_HOOK_INIT(mmap_file, hook_mmap_file),
+	LSM_HOOK_INIT(file_mprotect, hook_file_mprotect),
+	LSM_HOOK_INIT(file_ioctl, hook_file_ioctl),
+	LSM_HOOK_INIT(file_lock, hook_file_lock),
+};
+
+__init void landlock_add_hooks_fs(void)
+{
+	security_add_hooks(landlock_hooks, ARRAY_SIZE(landlock_hooks),
+			LANDLOCK_NAME);
+}
diff --git a/security/landlock/fs.h b/security/landlock/fs.h
new file mode 100644
index 000000000000..5d2ed8a1d4d4
--- /dev/null
+++ b/security/landlock/fs.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Landlock LSM - Filesystem management and hooks
+ *
+ * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net
+ * Copyright © 2018-2020 ANSSI
+ */
+
+#ifndef _SECURITY_LANDLOCK_FS_H
+#define _SECURITY_LANDLOCK_FS_H
+
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/rcupdate.h>
+
+#include "ruleset.h"
+#include "setup.h"
+
+struct landlock_inode_security {
+	/*
+	 * We need an allocated object to be able to safely untie a rule from
+	 * an object (i.e. unlink then free a rule), cf. put_rule(). This
+	 * object is guarded by the underlying object's lock.
+	 */
+	struct landlock_object __rcu *object;
+};
+
+static inline struct landlock_inode_security *inode_landlock(
+		const struct inode *inode)
+{
+	return inode->i_security + landlock_blob_sizes.lbs_inode;
+}
+
+__init void landlock_add_hooks_fs(void);
+
+void landlock_release_inode(struct inode *inode,
+		struct landlock_object *object);
+
+int landlock_append_fs_rule(struct landlock_ruleset *ruleset,
+		struct path *path, u64 actions);
+
+#endif /* _SECURITY_LANDLOCK_FS_H */
diff --git a/security/landlock/object.c b/security/landlock/object.c
index 38fbbb108120..2d373f224989 100644
--- a/security/landlock/object.c
+++ b/security/landlock/object.c
@@ -29,6 +29,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include "fs.h"
 #include "object.h"
 struct landlock_object *landlock_create_object(
@@ -138,6 +139,7 @@ static bool release_object(struct landlock_object *object)
 	switch (object->type) {
 	case LANDLOCK_OBJECT_INODE:
+		landlock_release_inode(underlying_object, object);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/security/landlock/setup.c b/security/landlock/setup.c
index 117afb344da6..93ef2dbe83ae 100644
--- a/security/landlock/setup.c
+++ b/security/landlock/setup.c
@@ -10,11 +10,15 @@
 #include <linux/lsm_hooks.h>
 #include "cred.h"
+#include "fs.h"
 #include "ptrace.h"
 #include "setup.h"
+bool landlock_initialized __lsm_ro_after_init = false;
+
 struct lsm_blob_sizes landlock_blob_sizes __lsm_ro_after_init = {
 	.lbs_cred = sizeof(struct landlock_cred_security),
+	.lbs_inode = sizeof(struct landlock_inode_security),
 };
 static int __init landlock_init(void)
@@ -22,6 +26,8 @@ static int __init landlock_init(void)
 	pr_info(LANDLOCK_NAME ": Registering hooks\n");
 	landlock_add_hooks_cred();
 	landlock_add_hooks_ptrace();
+	landlock_add_hooks_fs();
+	landlock_initialized = true;
 	return 0;
 }
diff --git a/security/landlock/setup.h b/security/landlock/setup.h
index 52eb8d806376..260fd2068b95 100644
--- a/security/landlock/setup.h
+++ b/security/landlock/setup.h
@@ -13,6 +13,8 @@
#define LANDLOCK_NAME "landlock"
+extern bool landlock_initialized;
+
 extern struct lsm_blob_sizes landlock_blob_sizes;
#endif /* _SECURITY_LANDLOCK_SETUP_H */
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
+static inline u32 get_mem_access(unsigned long prot, bool private)
+{
u32 access = LANDLOCK_ACCESS_FS_MAP;
/* Private mappings do not write to files. */
if (!private && (prot & PROT_WRITE))
access |= LANDLOCK_ACCESS_FS_WRITE;
if (prot & PROT_READ)
access |= LANDLOCK_ACCESS_FS_READ;
if (prot & PROT_EXEC)
access |= LANDLOCK_ACCESS_FS_EXECUTE;
return access;
+}
When I do the following, is landlock going to detect that the mmap() is a read access, or is it incorrectly going to think that it's neither read nor write?
$ cat write-only.c
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    int fd = open("/etc/passwd", O_RDONLY);
    char *ptr = mmap(NULL, 0x1000, PROT_WRITE, MAP_PRIVATE, fd, 0);
    printf("'%.*s'\n", 4, ptr);
}
$ gcc -o write-only write-only.c -Wall
$ ./write-only
'root'
$
On 26/02/2020 21:29, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
+static inline u32 get_mem_access(unsigned long prot, bool private)
+{
u32 access = LANDLOCK_ACCESS_FS_MAP;
/* Private mappings do not write to files. */
if (!private && (prot & PROT_WRITE))
access |= LANDLOCK_ACCESS_FS_WRITE;
if (prot & PROT_READ)
access |= LANDLOCK_ACCESS_FS_READ;
if (prot & PROT_EXEC)
access |= LANDLOCK_ACCESS_FS_EXECUTE;
return access;
+}
When I do the following, is landlock going to detect that the mmap() is a read access, or is it incorrectly going to think that it's neither read nor write?
$ cat write-only.c
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void) {
    int fd = open("/etc/passwd", O_RDONLY);
    char *ptr = mmap(NULL, 0x1000, PROT_WRITE, MAP_PRIVATE, fd, 0);
    printf("'%.*s'\n", 4, ptr);
}
$ gcc -o write-only write-only.c -Wall
$ ./write-only
'root'
$
Thanks to the "if (!private && (prot & PROT_WRITE))" check, Landlock allows this private mmap (as intended) even if there is no write access to this file, but it denies the same mapping when shared (with a file opened O_RDWR). I just added a test for this to be sure.
However, I'm not sure this hook is useful for now. Indeed, the process still needs a file descriptor opened with the right accesses.
On Thu, Feb 27, 2020 at 5:50 PM Mickaël Salaün mic@digikod.net wrote:
On 26/02/2020 21:29, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
+static inline u32 get_mem_access(unsigned long prot, bool private)
+{
u32 access = LANDLOCK_ACCESS_FS_MAP;
/* Private mappings do not write to files. */
if (!private && (prot & PROT_WRITE))
access |= LANDLOCK_ACCESS_FS_WRITE;
if (prot & PROT_READ)
access |= LANDLOCK_ACCESS_FS_READ;
if (prot & PROT_EXEC)
access |= LANDLOCK_ACCESS_FS_EXECUTE;
return access;
+}
[...]
However, I'm not sure this hook is useful for now. Indeed, the process still need to have a file descriptor open with the right accesses.
Yeah, agreed.
This syscall, inspired by seccomp(2) and bpf(2), is designed to be used by unprivileged processes to sandbox themselves. It has the same usage restriction as seccomp(2): the no_new_privs check.
There are currently four commands:
* get_features: Gets the supported features (required for backward compatibility and best-effort security).
* create_ruleset: Creates a ruleset and returns its file descriptor.
* add_rule: Adds a rule (e.g. file hierarchy access) to a ruleset, identified by the dedicated file descriptor.
* enforce_ruleset: Enforces a ruleset on the current thread (similar to seccomp).
See the user and code documentation for more details.
Signed-off-by: Mickaël Salaün mic@digikod.net Cc: Andy Lutomirski luto@amacapital.net Cc: Arnd Bergmann arnd@arndb.de Cc: James Morris jmorris@namei.org Cc: Kees Cook keescook@chromium.org Cc: Serge E. Hallyn serge@hallyn.com ---
Changes since v13: * New implementation, replacing the dependency on seccomp(2) and bpf(2). --- include/linux/syscalls.h | 3 + include/uapi/linux/landlock.h | 213 +++++++++++++++ security/landlock/Makefile | 2 +- security/landlock/ruleset.c | 3 + security/landlock/syscall.c | 470 ++++++++++++++++++++++++++++++++++ 5 files changed, 690 insertions(+), 1 deletion(-) create mode 100644 security/landlock/syscall.c
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 1815065d52f3..beaadcf4ef77 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1003,6 +1003,9 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); +asmlinkage long sys_landlock(unsigned int command, unsigned int options, + size_t attr1_size, void __user *attr1_ptr, + size_t attr2_size, void __user *attr2_ptr);
/* * Architecture-specific system calls diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h index 92760aca3645..0b6d3e9f4b37 100644 --- a/include/uapi/linux/landlock.h +++ b/include/uapi/linux/landlock.h @@ -9,6 +9,219 @@ #ifndef _UAPI__LINUX_LANDLOCK_H__ #define _UAPI__LINUX_LANDLOCK_H__
+#include <linux/types.h> + +/** + * enum landlock_cmd - Landlock commands + * + * First argument of sys_landlock(). + */ +enum landlock_cmd { + /** + * @LANDLOCK_CMD_GET_FEATURES: Asks the kernel for supported Landlock + * features. The option argument must contains + * %LANDLOCK_OPT_GET_FEATURES. This commands fills the &struct + * landlock_attr_features provided as first attribute. + */ + LANDLOCK_CMD_GET_FEATURES = 1, + /** + * @LANDLOCK_CMD_CREATE_RULESET: Creates a new ruleset and return its + * file descriptor on success. The option argument must contains + * %LANDLOCK_OPT_CREATE_RULESET. The ruleset is defined by the &struct + * landlock_attr_ruleset provided as first attribute. + */ + LANDLOCK_CMD_CREATE_RULESET, + /** + * @LANDLOCK_CMD_ADD_RULE: Adds a rule to a ruleset. The option + * argument must contains %LANDLOCK_OPT_ADD_RULE_PATH_BENEATH. The + * ruleset and the rule are both defined by the &struct + * landlock_attr_path_beneath provided as first attribute. + */ + LANDLOCK_CMD_ADD_RULE, + /** + * @LANDLOCK_CMD_ENFORCE_RULESET: Enforces a ruleset on the current + * process. The option argument must contains + * %LANDLOCK_OPT_ENFORCE_RULESET. The ruleset is defined by the + * &struct landlock_attr_enforce provided as first attribute. + */ + LANDLOCK_CMD_ENFORCE_RULESET, +}; + +/** + * DOC: options_intro + * + * These options may be used as second argument of sys_landlock(). Each + * command have a dedicated set of options, represented as bitmasks. For two + * different commands, their options may overlap. Each command have at least + * one option defining the used attribute type. This also enables to always + * have a usable &struct landlock_attr_features (i.e. filled with bits). + */ + +/** + * DOC: options_get_features + * + * Options for ``LANDLOCK_CMD_GET_FEATURES`` + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * - %LANDLOCK_OPT_GET_FEATURES: the attr type is `struct + * landlock_attr_features`. 
+ */ +#define LANDLOCK_OPT_GET_FEATURES (1ULL << 0) + +/** + * DOC: options_create_ruleset + * + * Options for ``LANDLOCK_CMD_CREATE_RULESET`` + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * - %LANDLOCK_OPT_CREATE_RULESET: the attr type is `struct + * landlock_attr_ruleset`. + */ +#define LANDLOCK_OPT_CREATE_RULESET (1ULL << 0) + +/** + * DOC: options_add_rule + * + * Options for ``LANDLOCK_CMD_ADD_RULE`` + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * - %LANDLOCK_OPT_ADD_RULE_PATH_BENEATH: the attr type is `struct + * landlock_attr_path_beneath`. + */ +#define LANDLOCK_OPT_ADD_RULE_PATH_BENEATH (1ULL << 0) + +/** + * DOC: options_enforce_ruleset + * + * Options for ``LANDLOCK_CMD_ENFORCE_RULESET`` + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * - %LANDLOCK_OPT_ENFORCE_RULESET: the attr type is `struct + * landlock_attr_enforce`. + */ +#define LANDLOCK_OPT_ENFORCE_RULESET (1ULL << 0) + +/** + * struct landlock_attr_features - Receives the supported features + * + * This struct should be allocated by user space but it will be filled by the + * kernel to indicate the subset of Landlock features effectively handled by + * the running kernel. This enables backward compatibility for applications + * which are developed on a newer kernel than the one running the application. + * This helps avoid hard errors that may entirely disable the use of Landlock + * features because some of them may not be supported. Indeed, because + * Landlock is a security feature, even if the kernel doesn't support all the + * requested features, user space applications should still use the subset + * which is supported by the running kernel. Indeed, a partial security policy + * can still improve the security of the application and better protect the + * user (i.e. best-effort approach). The %LANDLOCK_CMD_GET_FEATURES command + * and &struct landlock_attr_features are future-proof because the future + * unknown fields requested by user space (i.e. 
a larger &struct + * landlock_attr_features) can still be filled with zeros. + * + * The Landlock commands will fail if an unsupported option or access is + * requested. By firstly requesting the supported options and accesses, it is + * quite easy for the developer to binary AND these returned bitmasks with the + * used options and accesses from the attribute structs (e.g. &struct + * landlock_attr_ruleset), and even infer the supported Landlock commands. + * Indeed, because each command must support at least one option, the options_* + * fields are always filled if the related commands are supported. The + * supported attributes are also discoverable thanks to the size_* fields. All + * this data enable to create applications doing their best to sandbox + * themselves regardless of the running kernel. + */ +struct landlock_attr_features { + /** + * @options_get_features: Options supported by the + * %LANDLOCK_CMD_GET_FEATURES command. Cf. `Options`_. + */ + __aligned_u64 options_get_features; + /** + * @options_create_ruleset: Options supported by the + * %LANDLOCK_CMD_CREATE_RULESET command. Cf. `Options`_. + */ + __aligned_u64 options_create_ruleset; + /** + * @options_add_rule: Options supported by the %LANDLOCK_CMD_ADD_RULE + * command. Cf. `Options`_. + */ + __aligned_u64 options_add_rule; + /** + * @options_enforce_ruleset: Options supported by the + * %LANDLOCK_CMD_ENFORCE_RULESET command. Cf. `Options`_. + */ + __aligned_u64 options_enforce_ruleset; + /** + * @access_fs: Subset of file system access supported by the running + * kernel, used in &struct landlock_attr_ruleset and &struct + * landlock_attr_path_beneath. Cf. `Filesystem flags`_. + */ + __aligned_u64 access_fs; + /** + * @size_attr_ruleset: Size of the &struct landlock_attr_ruleset as + * known by the kernel (i.e. ``sizeof(struct + * landlock_attr_ruleset)``). 
+ */ + __aligned_u64 size_attr_ruleset; + /** + * @size_attr_path_beneath: Size of the &struct + * landlock_attr_path_beneath as known by the kernel (i.e. + * ``sizeof(struct landlock_path_beneath)``). + */ + __aligned_u64 size_attr_path_beneath; +}; + +/** + * struct landlock_attr_ruleset- Defines a new ruleset + * + * Used as first attribute for the %LANDLOCK_CMD_CREATE_RULESET command and + * with the %LANDLOCK_OPT_CREATE_RULESET option. + */ +struct landlock_attr_ruleset { + /** + * @handled_access_fs: Bitmask of actions (cf. `Filesystem flags`_) + * that is handled by this ruleset and should then be forbidden if no + * rule explicitly allow them. This is needed for backward + * compatibility reasons. The user space code should check the + * effectively supported actions thanks to %LANDLOCK_CMD_GET_SUPPORTED + * and &struct landlock_attr_features, and then adjust the arguments of + * the next calls to sys_landlock() accordingly. + */ + __aligned_u64 handled_access_fs; +}; + +/** + * struct landlock_attr_path_beneath - Defines a path hierarchy + */ +struct landlock_attr_path_beneath { + /** + * @ruleset_fd: File descriptor tied to the ruleset which should be + * extended with this new access. + */ + __aligned_u64 ruleset_fd; + /** + * @parent_fd: File descriptor, open with ``O_PATH``, which identify + * the parent directory of a file hierarchy, or just a file. + */ + __aligned_u64 parent_fd; + /** + * @allowed_access: Bitmask of allowed actions for this file hierarchy + * (cf. `Filesystem flags`_). + */ + __aligned_u64 allowed_access; +}; + +/** + * struct landlock_attr_enforce - Describes the enforcement + */ +struct landlock_attr_enforce { + /** + * @ruleset_fd: File descriptor tied to the ruleset to merge with the + * current domain. 
+ */ + __aligned_u64 ruleset_fd; +}; + /** * DOC: fs_access * diff --git a/security/landlock/Makefile b/security/landlock/Makefile index 92e3d80ab8ed..4388494779ec 100644 --- a/security/landlock/Makefile +++ b/security/landlock/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_SECURITY_LANDLOCK) := landlock.o
-landlock-y := setup.o object.o ruleset.o \
+landlock-y := setup.o syscall.o object.o ruleset.o \
 	cred.o ptrace.o fs.o
diff --git a/security/landlock/ruleset.c b/security/landlock/ruleset.c
index 5ec013a4188d..fab17110804f 100644
--- a/security/landlock/ruleset.c
+++ b/security/landlock/ruleset.c
@@ -17,6 +17,7 @@
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <uapi/linux/landlock.h>
 #include "object.h"
 #include "ruleset.h"
@@ -40,6 +41,8 @@ struct landlock_ruleset *landlock_create_ruleset(u64 fs_access_mask)
 	struct landlock_ruleset *ruleset;
 	/* Safely handles 32-bits conversion. */
+	BUILD_BUG_ON(!__same_type(fs_access_mask, ((struct
+		landlock_attr_ruleset *)NULL)->handled_access_fs));
 	BUILD_BUG_ON(!__same_type(fs_access_mask, _LANDLOCK_ACCESS_FS_LAST));
/* Checks content. */ diff --git a/security/landlock/syscall.c b/security/landlock/syscall.c new file mode 100644 index 000000000000..da80e3061b5a --- /dev/null +++ b/security/landlock/syscall.c @@ -0,0 +1,470 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Landlock LSM - System call and user space interfaces + * + * Copyright © 2016-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2018-2020 ANSSI + */ + +#include <asm/current.h> +#include <linux/anon_inodes.h> +#include <linux/build_bug.h> +#include <linux/capability.h> +#include <linux/dcache.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/fs.h> +#include <linux/landlock.h> +#include <linux/limits.h> +#include <linux/path.h> +#include <linux/rcupdate.h> +#include <linux/refcount.h> +#include <linux/sched.h> +#include <linux/security.h> +#include <linux/syscalls.h> +#include <linux/types.h> +#include <linux/uaccess.h> +#include <uapi/linux/landlock.h> + +#include "cred.h" +#include "fs.h" +#include "ruleset.h" +#include "setup.h" + +/** + * copy_struct_if_any_from_user - Safe future-proof argument copying + * + * Extend copy_struct_from_user() to handle NULL @src, which allows for future + * use of @src even if it is not used right now. 
+ * + * @dst: kernel space pointer or NULL + * @ksize: size of the data pointed by @dst + * @src: user space pointer or NULL + * @usize: size of the data pointed by @src + */ +static int copy_struct_if_any_from_user(void *dst, size_t ksize, + const void __user *src, size_t usize) +{ + int ret; + + if (dst) { + if (WARN_ON_ONCE(ksize == 0)) + return -EFAULT; + } else { + if (WARN_ON_ONCE(ksize != 0)) + return -EFAULT; + } + if (!src) { + if (usize != 0) + return -EFAULT; + if (dst) + memset(dst, 0, ksize); + return 0; + } + if (usize == 0) + return -ENODATA; + if (usize > PAGE_SIZE) + return -E2BIG; + if (dst) + return copy_struct_from_user(dst, ksize, src, usize); + ret = check_zeroed_user(src, usize); + if (ret <= 0) + return ret ?: -E2BIG; + return 0; +} + +/* Features */ + +#define _LANDLOCK_OPT_GET_FEATURES_LAST LANDLOCK_OPT_GET_FEATURES +#define _LANDLOCK_OPT_GET_FEATURES_MASK ((_LANDLOCK_OPT_GET_FEATURES_LAST << 1) - 1) + +#define _LANDLOCK_OPT_CREATE_RULESET_LAST LANDLOCK_OPT_CREATE_RULESET +#define _LANDLOCK_OPT_CREATE_RULESET_MASK ((_LANDLOCK_OPT_CREATE_RULESET_LAST << 1) - 1) + +#define _LANDLOCK_OPT_ADD_RULE_LAST LANDLOCK_OPT_ADD_RULE_PATH_BENEATH +#define _LANDLOCK_OPT_ADD_RULE_MASK ((_LANDLOCK_OPT_ADD_RULE_LAST << 1) - 1) + +#define _LANDLOCK_OPT_ENFORCE_RULESET_LAST LANDLOCK_OPT_ENFORCE_RULESET +#define _LANDLOCK_OPT_ENFORCE_RULESET_MASK ((_LANDLOCK_OPT_ENFORCE_RULESET_LAST << 1) - 1) + +static int syscall_get_features(size_t attr_size, void __user *attr_ptr) +{ + size_t data_size, fill_size; + struct landlock_attr_features supported = { + .options_get_features = _LANDLOCK_OPT_GET_FEATURES_MASK, + .options_create_ruleset = _LANDLOCK_OPT_CREATE_RULESET_MASK, + .options_add_rule = _LANDLOCK_OPT_ADD_RULE_MASK, + .options_enforce_ruleset = _LANDLOCK_OPT_ENFORCE_RULESET_MASK, + .access_fs = _LANDLOCK_ACCESS_FS_MASK, + .size_attr_ruleset = sizeof(struct landlock_attr_ruleset), + .size_attr_path_beneath = sizeof(struct + landlock_attr_path_beneath), + }; + + 
if (attr_size == 0) + return -ENODATA; + if (attr_size > PAGE_SIZE) + return -E2BIG; + data_size = min(sizeof(supported), attr_size); + if (copy_to_user(attr_ptr, &supported, data_size)) + return -EFAULT; + /* Fills the rest with zeros. */ + fill_size = attr_size - data_size; + if (fill_size > 0 && clear_user(attr_ptr + data_size, fill_size)) + return -EFAULT; + return 0; +} + +/* Ruleset handling */ + +#ifdef CONFIG_PROC_FS +static void fop_ruleset_show_fdinfo(struct seq_file *m, struct file *filp) +{ + const struct landlock_ruleset *ruleset = filp->private_data; + + seq_printf(m, "handled_access_fs:\t%x\n", ruleset->fs_access_mask); + seq_printf(m, "nb_rules:\t%d\n", atomic_read(&ruleset->nb_rules)); +} +#endif + +static int fop_ruleset_release(struct inode *inode, struct file *filp) +{ + struct landlock_ruleset *ruleset = filp->private_data; + + landlock_put_ruleset(ruleset); + return 0; +} + +static ssize_t fop_dummy_read(struct file *filp, char __user *buf, size_t size, + loff_t *ppos) +{ + /* Dummy handler to enable FMODE_CAN_READ. */ + return -EINVAL; +} + +static ssize_t fop_dummy_write(struct file *filp, const char __user *buf, + size_t size, loff_t *ppos) +{ + /* Dummy handler to enable FMODE_CAN_WRITE. */ + return -EINVAL; +} + +/* + * A ruleset file descriptor enables to build a ruleset by adding (i.e. + * writing) rule after rule, without relying on the task's context. This + * reentrant design is also used in a read way to enforce the ruleset on the + * current task. + */ +static const struct file_operations ruleset_fops = { +#ifdef CONFIG_PROC_FS + .show_fdinfo = fop_ruleset_show_fdinfo, +#endif + .release = fop_ruleset_release, + .read = fop_dummy_read, + .write = fop_dummy_write, +}; + +static int syscall_create_ruleset(size_t attr_size, void __user *attr_ptr) +{ + struct landlock_attr_ruleset attr_ruleset; + struct landlock_ruleset *ruleset; + int err, ruleset_fd; + + /* Copies raw userspace struct. 
*/ + err = copy_struct_if_any_from_user(&attr_ruleset, sizeof(attr_ruleset), + attr_ptr, attr_size); + if (err) + return err; + + /* Checks arguments and transform to kernel struct. */ + ruleset = landlock_create_ruleset(attr_ruleset.handled_access_fs); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + + /* Creates anonymous FD referring to the ruleset, with safe flags. */ + ruleset_fd = anon_inode_getfd("landlock-ruleset", &ruleset_fops, + ruleset, O_RDWR | O_CLOEXEC); + if (ruleset_fd < 0) + landlock_put_ruleset(ruleset); + return ruleset_fd; +} + +/* + * Returns an owned ruleset from a FD. It is thus needed to call + * landlock_put_ruleset() on the return value. + */ +static struct landlock_ruleset *get_ruleset_from_fd(u64 fd, fmode_t mode) +{ + struct fd ruleset_f; + struct landlock_ruleset *ruleset; + int err; + + BUILD_BUG_ON(!__same_type(fd, + ((struct landlock_attr_path_beneath *)NULL)->ruleset_fd)); + BUILD_BUG_ON(!__same_type(fd, + ((struct landlock_attr_enforce *)NULL)->ruleset_fd)); + /* Checks 32-bits overflow. fdget() checks for INT_MAX/FD. */ + if (fd > U32_MAX) + return ERR_PTR(-EINVAL); + ruleset_f = fdget(fd); + if (!ruleset_f.file) + return ERR_PTR(-EBADF); + err = 0; + if (ruleset_f.file->f_op != &ruleset_fops) + err = -EBADR; + else if (!(ruleset_f.file->f_mode & mode)) + err = -EPERM; + if (!err) { + ruleset = ruleset_f.file->private_data; + landlock_get_ruleset(ruleset); + } + fdput(ruleset_f); + return err ? ERR_PTR(err) : ruleset; +} + +/* Path handling */ + +static inline bool is_user_mountable(struct dentry *dentry) +{ + /* + * Check pseudo-filesystems that will never be mountable (e.g. sockfs, + * pipefs, bdev), cf. fs/libfs.c:init_pseudo(). + */ + return d_is_positive(dentry) && + !IS_PRIVATE(dentry->d_inode) && + !(dentry->d_sb->s_flags & SB_NOUSER); +} + +/* + * @path: Must call put_path(@path) after the call if it succeeded. 
+ */ +static int get_path_from_fd(u64 fd, struct path *path) +{ + struct fd f; + int err; + + BUILD_BUG_ON(!__same_type(fd, + ((struct landlock_attr_path_beneath *)NULL)->parent_fd)); + /* Checks 32-bits overflow. fdget_raw() checks for INT_MAX/FD. */ + if (fd > U32_MAX) + return -EINVAL; + /* Handles O_PATH. */ + f = fdget_raw(fd); + if (!f.file) + return -EBADF; + /* + * Forbids to add to a ruleset a path which is forbidden to open (by + * Landlock, another LSM, DAC...). Because the file was open with + * O_PATH, the file mode doesn't have FMODE_READ nor FMODE_WRITE. + * + * WARNING: security_file_open() was only called in do_dentry_open() + * until now. The main difference now is that f_op may be NULL. This + * field doesn't seem to be dereferenced by any upstream LSM though. + */ + err = security_file_open(f.file); + if (err) + goto out_fdput; + /* + * Only allows O_PATH FD: enable to restrict ambiant (FS) accesses + * without requiring to open and risk leaking or misuing a FD. Accept + * removed, but still open directory (S_DEAD). + */ + if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt || + !is_user_mountable(f.file->f_path.dentry)) { + err = -EBADR; + goto out_fdput; + } + path->mnt = f.file->f_path.mnt; + path->dentry = f.file->f_path.dentry; + path_get(path); + +out_fdput: + fdput(f); + return err; +} + +static int syscall_add_rule_path_beneath(size_t attr_size, + void __user *attr_ptr) +{ + struct landlock_attr_path_beneath attr_path_beneath; + struct path path; + struct landlock_ruleset *ruleset; + int err; + + /* Copies raw userspace struct. */ + err = copy_struct_if_any_from_user(&attr_path_beneath, + sizeof(attr_path_beneath), attr_ptr, attr_size); + if (err) + return err; + + /* Gets the ruleset. */ + ruleset = get_ruleset_from_fd(attr_path_beneath.ruleset_fd, + FMODE_CAN_WRITE); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + + /* Checks content (fs_access_mask is upgraded to 64-bits). 
*/ + if ((attr_path_beneath.allowed_access | ruleset->fs_access_mask) != + ruleset->fs_access_mask) { + err = -EINVAL; + goto out_put_ruleset; + } + + err = get_path_from_fd(attr_path_beneath.parent_fd, &path); + if (err) + goto out_put_ruleset; + + err = landlock_append_fs_rule(ruleset, &path, + attr_path_beneath.allowed_access); + path_put(&path); + +out_put_ruleset: + landlock_put_ruleset(ruleset); + return err; +} + +/* Enforcement */ + +static int syscall_enforce_ruleset(size_t attr_size, + void __user *attr_ptr) +{ + struct landlock_ruleset *new_dom, *ruleset; + struct cred *new_cred; + struct landlock_cred_security *new_llcred; + struct landlock_attr_enforce attr_enforce; + int err; + + /* + * Enforcing a Landlock ruleset requires that the task has + * CAP_SYS_ADMIN in its namespace or be running with no_new_privs. + * This avoids scenarios where unprivileged tasks can affect the + * behavior of privileged children. These are similar checks as for + * seccomp(2), except that an -EPERM may be returned. + */ + if (!task_no_new_privs(current)) { + err = security_capable(current_cred(), current_user_ns(), + CAP_SYS_ADMIN, CAP_OPT_NOAUDIT); + if (err) + return err; + } + + /* Copies raw userspace struct. */ + err = copy_struct_if_any_from_user(&attr_enforce, sizeof(attr_enforce), + attr_ptr, attr_size); + if (err) + return err; + + /* Get the ruleset. */ + ruleset = get_ruleset_from_fd(attr_enforce.ruleset_fd, FMODE_CAN_READ); + if (IS_ERR(ruleset)) + return PTR_ERR(ruleset); + /* Informs about useless ruleset. */ + if (!atomic_read(&ruleset->nb_rules)) { + err = -ENOMSG; + goto out_put_ruleset; + } + + new_cred = prepare_creds(); + if (!new_cred) { + err = -ENOMEM; + goto out_put_ruleset; + } + new_llcred = landlock_cred(new_cred); + /* + * There is no possible race condition while copying and manipulating + * the current credentials because they are dedicated per thread. 
+ */ + new_dom = landlock_merge_ruleset(new_llcred->domain, ruleset); + if (IS_ERR(new_dom)) { + err = PTR_ERR(new_dom); + goto out_put_creds; + } + /* Replaces the old (prepared) domain. */ + landlock_put_ruleset(new_llcred->domain); + new_llcred->domain = new_dom; + + landlock_put_ruleset(ruleset); + return commit_creds(new_cred); + +out_put_creds: + abort_creds(new_cred); + +out_put_ruleset: + landlock_put_ruleset(ruleset); + return err; +} + +/** + * landlock - System call to enable a process to safely sandbox itself + * + * @command: Landlock command to perform miscellaneous, but safe, actions. Cf. + * `Commands`_. + * @options: Bitmask of options dedicated to one command. Cf. `Options`_. + * @attr1_size: First attribute size (i.e. size of the struct). + * @attr1_ptr: Pointer to the first attribute. Cf. `Attributes`_. + * @attr2_size: Unused for now. + * @attr2_ptr: Unused for now. + * + * The @command and @options arguments enable a seccomp-bpf policy to control + * the requested actions. However, it should be noted that Landlock is + * designed from the ground to enable unprivileged process to drop privileges + * and accesses in a way that can not harm other processes. This syscall and + * all its arguments should then be allowed for any process, which will then + * enable applications to strengthen the security of the whole system. + * + * @attr2_size and @attr2_ptr describe a second attribute which could be used + * in the future to compose with the first attribute (e.g. a + * landlock_attr_path_beneath with a landlock_attr_ioctl). + * + * The order of return errors begins with ENOPKG (disabled Landlock), + * EOPNOTSUPP (unknown command or option) and then EINVAL (invalid attribute). + * The other error codes may be specific to each command. 
+ */ +SYSCALL_DEFINE6(landlock, unsigned int, command, unsigned int, options, + size_t, attr1_size, void __user *, attr1_ptr, + size_t, attr2_size, void __user *, attr2_ptr) +{ + /* + * Enables user space to identify if Landlock is disabled, thanks to a + * specific error code. + */ + if (!landlock_initialized) + return -ENOPKG; + + switch ((enum landlock_cmd)command) { + case LANDLOCK_CMD_GET_FEATURES: + if (options == LANDLOCK_OPT_GET_FEATURES) { + if (attr2_size || attr2_ptr) + return -EINVAL; + return syscall_get_features(attr1_size, attr1_ptr); + } + return -EOPNOTSUPP; + case LANDLOCK_CMD_CREATE_RULESET: + if (options == LANDLOCK_OPT_CREATE_RULESET) { + if (attr2_size || attr2_ptr) + return -EINVAL; + return syscall_create_ruleset(attr1_size, attr1_ptr); + } + return -EOPNOTSUPP; + case LANDLOCK_CMD_ADD_RULE: + /* + * A future extension could add a + * LANDLOCK_OPT_ADD_RULE_PATH_RANGE. + */ + if (options == LANDLOCK_OPT_ADD_RULE_PATH_BENEATH) { + if (attr2_size || attr2_ptr) + return -EINVAL; + return syscall_add_rule_path_beneath(attr1_size, + attr1_ptr); + } + return -EOPNOTSUPP; + case LANDLOCK_CMD_ENFORCE_RULESET: + if (options == LANDLOCK_OPT_ENFORCE_RULESET) { + if (attr2_size || attr2_ptr) + return -EINVAL; + return syscall_enforce_ruleset(attr1_size, attr1_ptr); + } + return -EOPNOTSUPP; + } + return -EOPNOTSUPP; +}
On Mon, Feb 24, 2020 at 05:02:11PM +0100, Mickaël Salaün wrote:
+static int get_path_from_fd(u64 fd, struct path *path)
- /*
* Only allows O_PATH FD: enables restricting ambient (FS) accesses
* without requiring to open and risk leaking or misusing a FD. Accepts a
* removed, but still open, directory (S_DEAD).
*/
- if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt ||
^^^^^^^^^^^^^^^^^^^
Could you explain what that one had been about? The underlined subexpression is always false; was that supposed to check some condition and if so, which one?
On 17/03/2020 17:47, Al Viro wrote:
On Mon, Feb 24, 2020 at 05:02:11PM +0100, Mickaël Salaün wrote:
+static int get_path_from_fd(u64 fd, struct path *path)
- /*
* Only allows O_PATH FD: enables restricting ambient (FS) accesses
* without requiring to open and risk leaking or misusing a FD. Accepts a
* removed, but still open, directory (S_DEAD).
*/
- if (!(f.file->f_mode & FMODE_PATH) || !f.file->f_path.mnt ||
^^^^^^^^^^^^^^^^^^^
Could you explain what that one had been about? The underlined subexpression is always false; was that supposed to check some condition and if so, which one?
This was just to be sure that the next assignment "path->mnt = f.file->f_path.mnt;" always creates a valid path. If this is always true, I will remove it.
Wire up the landlock() call for x86_64 (for now).
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
---
Changes since v13: * New implementation. --- arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/uapi/asm-generic/unistd.h | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 44d510bc9b78..3e759505c8bf 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -359,6 +359,7 @@ 435 common clone3 __x64_sys_clone3/ptregs 437 common openat2 __x64_sys_openat2 438 common pidfd_getfd __x64_sys_pidfd_getfd +439 common landlock __x64_sys_landlock
# # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index 3a3201e4618e..31d5814ddb13 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -855,9 +855,11 @@ __SYSCALL(__NR_clone3, sys_clone3) __SYSCALL(__NR_openat2, sys_openat2) #define __NR_pidfd_getfd 438 __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) +#define __NR_landlock 439 +__SYSCALL(__NR_landlock, sys_landlock)
#undef __NR_syscalls -#define __NR_syscalls 439 +#define __NR_syscalls 440
/* * 32 bit systems traditionally used different
Test the landlock syscall, the ptrace hooks' semantics, and the filesystem access-control.
This is an initial batch, more tests will follow.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Shuah Khan <shuah@kernel.org>
---
Changes since v13: * Add back the filesystem tests (from v10) and extend them. * Add tests for the new syscall.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-7-mic@digikod.net/ --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/landlock/.gitignore | 3 + tools/testing/selftests/landlock/Makefile | 13 + tools/testing/selftests/landlock/config | 4 + tools/testing/selftests/landlock/test.h | 40 ++ tools/testing/selftests/landlock/test_base.c | 80 +++ tools/testing/selftests/landlock/test_fs.c | 624 ++++++++++++++++++ .../testing/selftests/landlock/test_ptrace.c | 293 ++++++++ 8 files changed, 1058 insertions(+) create mode 100644 tools/testing/selftests/landlock/.gitignore create mode 100644 tools/testing/selftests/landlock/Makefile create mode 100644 tools/testing/selftests/landlock/config create mode 100644 tools/testing/selftests/landlock/test.h create mode 100644 tools/testing/selftests/landlock/test_base.c create mode 100644 tools/testing/selftests/landlock/test_fs.c create mode 100644 tools/testing/selftests/landlock/test_ptrace.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 6ec503912bea..5183f269c244 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -24,6 +24,7 @@ TARGETS += ir TARGETS += kcmp TARGETS += kexec TARGETS += kvm +TARGETS += landlock TARGETS += lib TARGETS += livepatch TARGETS += lkdtm diff --git a/tools/testing/selftests/landlock/.gitignore b/tools/testing/selftests/landlock/.gitignore new file mode 100644 index 000000000000..4ee53c733af0 --- /dev/null +++ b/tools/testing/selftests/landlock/.gitignore @@ -0,0 +1,3 @@ +/test_base +/test_fs +/test_ptrace diff --git a/tools/testing/selftests/landlock/Makefile b/tools/testing/selftests/landlock/Makefile new file mode 100644 index 000000000000..c7e26e1251c4 --- /dev/null +++ b/tools/testing/selftests/landlock/Makefile @@ -0,0 +1,13 @@ +# SPDX-License-Identifier: GPL-2.0 + +test_src := $(wildcard test_*.c) + +TEST_GEN_PROGS := $(test_src:.c=) + +usr_include := ../../../../usr/include + +CFLAGS += -Wall -O2 -I$(usr_include) + +include ../lib.mk + +$(TEST_GEN_PROGS): ../kselftest_harness.h $(usr_include)/linux/landlock.h diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config new file mode 100644 index 000000000000..662f72c5a0df --- /dev/null +++ b/tools/testing/selftests/landlock/config @@ -0,0 +1,4 @@ +CONFIG_HEADERS_INSTALL=y +CONFIG_SECURITY_LANDLOCK=y +CONFIG_SECURITY_PATH=y +CONFIG_SECURITY=y diff --git a/tools/testing/selftests/landlock/test.h b/tools/testing/selftests/landlock/test.h new file mode 100644 index 000000000000..f9cebd8fc169 --- /dev/null +++ b/tools/testing/selftests/landlock/test.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Landlock test helpers + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2019-2020 ANSSI + */ + +#include <errno.h> +#include <sys/syscall.h> + +#include "../kselftest_harness.h" + +#ifndef landlock +static inline int 
landlock(unsigned int command, unsigned int options, + size_t attr_size, void *attr_ptr) +{ + errno = 0; + return syscall(__NR_landlock, command, options, attr_size, attr_ptr, 0, + NULL); +} +#endif + +FIXTURE(ruleset_rw) { + struct landlock_attr_ruleset attr_ruleset; + int ruleset_fd; +}; + +FIXTURE_SETUP(ruleset_rw) { + self->attr_ruleset.handled_access_fs = LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_WRITE; + self->ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, + LANDLOCK_OPT_CREATE_RULESET, + sizeof(self->attr_ruleset), &self->attr_ruleset); + ASSERT_LE(0, self->ruleset_fd); +} + +FIXTURE_TEARDOWN(ruleset_rw) { + ASSERT_EQ(0, close(self->ruleset_fd)); +} diff --git a/tools/testing/selftests/landlock/test_base.c b/tools/testing/selftests/landlock/test_base.c new file mode 100644 index 000000000000..1ac7dbead3b2 --- /dev/null +++ b/tools/testing/selftests/landlock/test_base.c @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Landlock tests - common resources + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2019-2020 ANSSI + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <linux/landlock.h> +#include <sys/prctl.h> + +#include "test.h" + +#define FDINFO_TEMPLATE "/proc/self/fdinfo/%d" +#define FDINFO_SIZE 128 + +#ifndef O_PATH +#define O_PATH 010000000 +#endif + +TEST_F(ruleset_rw, fdinfo) +{ + int fdinfo_fd, fdinfo_path_size, fdinfo_buf_size; + char fdinfo_path[sizeof(FDINFO_TEMPLATE) + 2]; + char fdinfo_buf[FDINFO_SIZE]; + + fdinfo_path_size = snprintf(fdinfo_path, sizeof(fdinfo_path), + FDINFO_TEMPLATE, self->ruleset_fd); + ASSERT_LE(fdinfo_path_size, sizeof(fdinfo_path)); + + fdinfo_fd = open(fdinfo_path, O_RDONLY | O_CLOEXEC); + ASSERT_GE(fdinfo_fd, 0); + + fdinfo_buf_size = read(fdinfo_fd, fdinfo_buf, sizeof(fdinfo_buf)); + ASSERT_LE(fdinfo_buf_size, sizeof(fdinfo_buf) - 1); + + /* + * fdinfo_buf: pos: 0 + * flags: 02000002 + * mnt_id: 13 + * handled_access_fs: 804000 + */ + 
EXPECT_EQ(0, close(fdinfo_fd)); +} + +TEST(features) +{ + struct landlock_attr_features attr_features = { + .options_get_features = ~0ULL, + .options_create_ruleset = ~0ULL, + .options_add_rule = ~0ULL, + .options_enforce_ruleset = ~0ULL, + .access_fs = ~0ULL, + .size_attr_ruleset = ~0ULL, + .size_attr_path_beneath = ~0ULL, + }; + + ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES, + LANDLOCK_OPT_CREATE_RULESET, + sizeof(attr_features), &attr_features)); + ASSERT_EQ(((LANDLOCK_OPT_GET_FEATURES << 1) - 1), + attr_features.options_get_features); + ASSERT_EQ(((LANDLOCK_OPT_CREATE_RULESET << 1) - 1), + attr_features.options_create_ruleset); + ASSERT_EQ(((LANDLOCK_OPT_ADD_RULE_PATH_BENEATH << 1) - 1), + attr_features.options_add_rule); + ASSERT_EQ(((LANDLOCK_OPT_ENFORCE_RULESET << 1) - 1), + attr_features.options_enforce_ruleset); + ASSERT_EQ(((LANDLOCK_ACCESS_FS_MAP << 1) - 1), + attr_features.access_fs); + ASSERT_EQ(sizeof(struct landlock_attr_ruleset), + attr_features.size_attr_ruleset); + ASSERT_EQ(sizeof(struct landlock_attr_path_beneath), + attr_features.size_attr_path_beneath); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/landlock/test_fs.c b/tools/testing/selftests/landlock/test_fs.c new file mode 100644 index 000000000000..627cb3a71f89 --- /dev/null +++ b/tools/testing/selftests/landlock/test_fs.c @@ -0,0 +1,624 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Landlock tests - filesystem + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2020 ANSSI + */ + +#define _GNU_SOURCE +#include <fcntl.h> +#include <linux/landlock.h> +#include <sched.h> +#include <sys/mount.h> +#include <sys/prctl.h> +#include <sys/stat.h> +#include <unistd.h> + +#include "test.h" + +#define TMP_PREFIX "tmp_" + +/* Paths (sibling number and depth) */ +const char dir_s0_d1[] = TMP_PREFIX "a0"; +const char dir_s0_d2[] = TMP_PREFIX "a0/b0"; +const char dir_s0_d3[] = TMP_PREFIX "a0/b0/c0"; +const char dir_s1_d1[] = TMP_PREFIX "a1"; +const char 
dir_s2_d1[] = TMP_PREFIX "a2"; +const char dir_s2_d2[] = TMP_PREFIX "a2/b2"; + +/* dir_s3_d1 is a tmpfs mount. */ +const char dir_s3_d1[] = TMP_PREFIX "a3"; +const char dir_s3_d2[] = TMP_PREFIX "a3/b3"; + +/* dir_s4_d2 is a tmpfs mount. */ +const char dir_s4_d1[] = TMP_PREFIX "a4"; +const char dir_s4_d2[] = TMP_PREFIX "a4/b4"; + +static void cleanup_layout1(void) +{ + rmdir(dir_s2_d2); + rmdir(dir_s2_d1); + rmdir(dir_s1_d1); + rmdir(dir_s0_d3); + rmdir(dir_s0_d2); + rmdir(dir_s0_d1); + + /* dir_s3_d2 may be bind mounted */ + umount(dir_s3_d2); + rmdir(dir_s3_d2); + umount(dir_s3_d1); + rmdir(dir_s3_d1); + + umount(dir_s4_d2); + rmdir(dir_s4_d2); + rmdir(dir_s4_d1); +} + +FIXTURE(layout1) { +}; + +FIXTURE_SETUP(layout1) +{ + cleanup_layout1(); + + /* Do not pollute the rest of the system. */ + ASSERT_NE(-1, unshare(CLONE_NEWNS)); + + ASSERT_EQ(0, mkdir(dir_s0_d1, 0600)); + ASSERT_EQ(0, mkdir(dir_s0_d2, 0600)); + ASSERT_EQ(0, mkdir(dir_s0_d3, 0600)); + ASSERT_EQ(0, mkdir(dir_s1_d1, 0600)); + ASSERT_EQ(0, mkdir(dir_s2_d1, 0600)); + ASSERT_EQ(0, mkdir(dir_s2_d2, 0600)); + + ASSERT_EQ(0, mkdir(dir_s3_d1, 0600)); + ASSERT_EQ(0, mount("tmp", dir_s3_d1, "tmpfs", 0, NULL)); + ASSERT_EQ(0, mkdir(dir_s3_d2, 0600)); + + ASSERT_EQ(0, mkdir(dir_s4_d1, 0600)); + ASSERT_EQ(0, mkdir(dir_s4_d2, 0600)); + ASSERT_EQ(0, mount("tmp", dir_s4_d2, "tmpfs", 0, NULL)); +} + +FIXTURE_TEARDOWN(layout1) +{ + /* + * cleanup_layout1() would be denied here, use TEST(cleanup) instead. 
+ */ +} + +static void test_path_rel(struct __test_metadata *_metadata, int dirfd, + const char *path, int ret) +{ + int fd; + struct stat statbuf; + + /* faccessat() cannot be restricted for now */ + ASSERT_EQ(ret, fstatat(dirfd, path, &statbuf, 0)) { + TH_LOG("fstatat path \"%s\" returned %s\n", path, + strerror(errno)); + } + if (ret) { + ASSERT_EQ(EACCES, errno); + } + fd = openat(dirfd, path, O_DIRECTORY); + if (ret) { + ASSERT_EQ(-1, fd); + ASSERT_EQ(EACCES, errno); + } else { + ASSERT_NE(-1, fd); + EXPECT_EQ(0, close(fd)); + } +} + +static void test_path(struct __test_metadata *_metadata, const char *path, + int ret) +{ + test_path_rel(_metadata, AT_FDCWD, path, ret); +} + +TEST_F(layout1, no_restriction) +{ + test_path(_metadata, dir_s0_d1, 0); + test_path(_metadata, dir_s0_d2, 0); + test_path(_metadata, dir_s0_d3, 0); + test_path(_metadata, dir_s1_d1, 0); + test_path(_metadata, dir_s2_d2, 0); +} + +TEST_F(ruleset_rw, inval) +{ + int err; + struct landlock_attr_path_beneath path_beneath = { + .allowed_access = LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_WRITE, + .parent_fd = -1, + }; + struct landlock_attr_enforce attr_enforce; + + path_beneath.ruleset_fd = self->ruleset_fd; + path_beneath.parent_fd = open(dir_s0_d2, O_PATH | O_NOFOLLOW | + O_DIRECTORY | O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); + ASSERT_EQ(0, close(path_beneath.parent_fd)); + + /* Tests without O_PATH. */ + path_beneath.parent_fd = open(dir_s0_d2, O_NOFOLLOW | O_DIRECTORY | + O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(err, -1); + ASSERT_EQ(errno, EBADR); + errno = 0; + ASSERT_EQ(0, close(path_beneath.parent_fd)); + + /* Checks unhandled access. 
*/ + path_beneath.parent_fd = open(dir_s0_d2, O_PATH | O_NOFOLLOW | + O_DIRECTORY | O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + path_beneath.allowed_access |= LANDLOCK_ACCESS_FS_EXECUTE; + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(errno, EINVAL); + errno = 0; + ASSERT_EQ(err, -1); + ASSERT_EQ(0, close(path_beneath.parent_fd)); + + err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); + + attr_enforce.ruleset_fd = self->ruleset_fd; + err = landlock(LANDLOCK_CMD_ENFORCE_RULESET, + LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce), + &attr_enforce); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); +} + +TEST_F(ruleset_rw, nsfs) +{ + struct landlock_attr_path_beneath path_beneath = { + .allowed_access = LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_WRITE, + .ruleset_fd = self->ruleset_fd, + }; + int err; + + path_beneath.parent_fd = open("/proc/self/ns/mnt", O_PATH | O_NOFOLLOW | + O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); + ASSERT_EQ(0, close(path_beneath.parent_fd)); +} + +static void add_path_beneath(struct __test_metadata *_metadata, int ruleset_fd, + __u64 allowed_access, const char *path) +{ + int err; + struct landlock_attr_path_beneath path_beneath = { + .ruleset_fd = ruleset_fd, + .allowed_access = allowed_access, + }; + + path_beneath.parent_fd = open(path, O_PATH | O_NOFOLLOW | O_DIRECTORY | + O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0) { + TH_LOG("Failed to open directory \"%s\": %s\n", path, + strerror(errno)); + } + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(err, 0) { + TH_LOG("Failed to update the ruleset with \"%s\": %s\n", + path, strerror(errno)); + } + ASSERT_EQ(errno, 
0); + ASSERT_EQ(0, close(path_beneath.parent_fd)); +} + +static int create_ruleset(struct __test_metadata *_metadata, + const char *const dirs[]) +{ + int ruleset_fd, dirs_len, i; + struct landlock_attr_features attr_features; + struct landlock_attr_ruleset attr_ruleset = { + .handled_access_fs = + LANDLOCK_ACCESS_FS_OPEN | + LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_WRITE | + LANDLOCK_ACCESS_FS_EXECUTE | + LANDLOCK_ACCESS_FS_GETATTR + }; + __u64 allowed_access = + LANDLOCK_ACCESS_FS_OPEN | + LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_GETATTR; + + ASSERT_NE(NULL, dirs) { + TH_LOG("No directory list\n"); + } + ASSERT_NE(NULL, dirs[0]) { + TH_LOG("Empty directory list\n"); + } + /* Gets the number of dir entries. */ + for (dirs_len = 0; dirs[dirs_len]; dirs_len++); + + ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES, + LANDLOCK_OPT_GET_FEATURES, + sizeof(attr_features), &attr_features)); + /* Only for test, use a binary AND for real application instead. */ + ASSERT_EQ(attr_ruleset.handled_access_fs, + attr_ruleset.handled_access_fs & + attr_features.access_fs); + ASSERT_EQ(allowed_access, allowed_access & attr_features.access_fs); + ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, + LANDLOCK_OPT_CREATE_RULESET, sizeof(attr_ruleset), + &attr_ruleset); + ASSERT_GE(ruleset_fd, 0) { + TH_LOG("Failed to create a ruleset: %s\n", strerror(errno)); + } + + for (i = 0; dirs[i]; i++) { + add_path_beneath(_metadata, ruleset_fd, allowed_access, + dirs[i]); + } + return ruleset_fd; +} + +static void enforce_ruleset(struct __test_metadata *_metadata, int ruleset_fd) +{ + struct landlock_attr_enforce attr_enforce = { + .ruleset_fd = ruleset_fd, + }; + int err; + + err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); + + err = landlock(LANDLOCK_CMD_ENFORCE_RULESET, + LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce), + &attr_enforce); + ASSERT_EQ(err, 0) { + TH_LOG("Failed to enforce ruleset: %s\n", strerror(errno)); + } + 
ASSERT_EQ(errno, 0); +} + +TEST_F(layout1, whitelist) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s0_d2, dir_s1_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + test_path(_metadata, "/", -1); + test_path(_metadata, dir_s0_d1, -1); + test_path(_metadata, dir_s0_d2, 0); + test_path(_metadata, dir_s0_d3, 0); +} + +TEST_F(layout1, unhandled_access) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s0_d2, NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Because the policy does not handle LANDLOCK_ACCESS_FS_CHROOT, + * chroot(2) should be allowed. + */ + ASSERT_EQ(0, chroot(dir_s0_d1)); + ASSERT_EQ(0, chroot(dir_s0_d2)); + ASSERT_EQ(0, chroot(dir_s0_d3)); +} + +TEST_F(layout1, ruleset_overlap) +{ + struct stat statbuf; + int open_fd; + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s1_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + /* These rules should be ORed with each other. 
*/ + add_path_beneath(_metadata, ruleset_fd, + LANDLOCK_ACCESS_FS_GETATTR, dir_s0_d2); + add_path_beneath(_metadata, ruleset_fd, + LANDLOCK_ACCESS_FS_OPEN, dir_s0_d2); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0)); + ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY)); + ASSERT_EQ(0, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY); + ASSERT_LE(0, open_fd); + EXPECT_EQ(0, close(open_fd)); + ASSERT_EQ(0, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY); + ASSERT_LE(0, open_fd); + EXPECT_EQ(0, close(open_fd)); +} + +TEST_F(layout1, inherit_superset) +{ + struct stat statbuf; + int ruleset_fd, open_fd; + + ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s1_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + add_path_beneath(_metadata, ruleset_fd, + LANDLOCK_ACCESS_FS_OPEN, dir_s0_d2); + enforce_ruleset(_metadata, ruleset_fd); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0)); + ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY)); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY); + ASSERT_NE(-1, open_fd); + ASSERT_EQ(0, close(open_fd)); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY); + ASSERT_NE(-1, open_fd); + ASSERT_EQ(0, close(open_fd)); + + /* + * Test shared rule extension: the following rules should not grant any + * new access, only remove some. Once enforced, these rules are ANDed + * with the previous ones. + */ + add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_GETATTR, + dir_s0_d2); + /* + * In ruleset_fd, dir_s0_d2 should now have the LANDLOCK_ACCESS_FS_OPEN + * and LANDLOCK_ACCESS_FS_GETATTR access rights (even if this directory + * is opened a second time). 
However, when enforcing this updated + * ruleset, the ruleset tied to the current process will still only + * have the dir_s0_d2 with LANDLOCK_ACCESS_FS_OPEN access; + * LANDLOCK_ACCESS_FS_GETATTR must not be allowed because it would be a + * privilege escalation. + */ + enforce_ruleset(_metadata, ruleset_fd); + + /* Same tests and results as above. */ + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0)); + ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY)); + + /* It is still forbidden to fstat(dir_s0_d2). */ + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY); + ASSERT_NE(-1, open_fd); + ASSERT_EQ(0, close(open_fd)); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY); + ASSERT_NE(-1, open_fd); + ASSERT_EQ(0, close(open_fd)); + + /* + * Now, dir_s0_d3 gets a new rule tied to it, only allowing + * LANDLOCK_ACCESS_FS_GETATTR. The kernel internal difference is that + * there was no rule tied to it before. + */ + add_path_beneath(_metadata, ruleset_fd, LANDLOCK_ACCESS_FS_GETATTR, + dir_s0_d3); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + /* + * Same tests and results as above, except for open(dir_s0_d3) which is + * now denied because the new rule masks the rule previously inherited + * from dir_s0_d2. + */ + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d1, &statbuf, 0)); + ASSERT_EQ(-1, openat(AT_FDCWD, dir_s0_d1, O_DIRECTORY)); + + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d2, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d2, O_DIRECTORY); + ASSERT_NE(-1, open_fd); + ASSERT_EQ(0, close(open_fd)); + + /* It is still forbidden to fstat(dir_s0_d3). */ + ASSERT_EQ(-1, fstatat(AT_FDCWD, dir_s0_d3, &statbuf, 0)); + open_fd = openat(AT_FDCWD, dir_s0_d3, O_DIRECTORY); + /* open(dir_s0_d3) is now forbidden. 
*/ + ASSERT_EQ(-1, open_fd); + ASSERT_EQ(EACCES, errno); +} + +TEST_F(layout1, extend_ruleset_with_denied_path) +{ + struct landlock_attr_path_beneath path_beneath = { + .allowed_access = LANDLOCK_ACCESS_FS_GETATTR, + }; + + path_beneath.ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s0_d2, NULL }); + ASSERT_NE(-1, path_beneath.ruleset_fd); + enforce_ruleset(_metadata, path_beneath.ruleset_fd); + + ASSERT_EQ(-1, open(dir_s0_d1, O_NOFOLLOW | O_DIRECTORY | O_CLOEXEC)); + ASSERT_EQ(EACCES, errno); + + /* + * Tests that we can't create a rule for which we are not allowed to + * open its path. + */ + path_beneath.parent_fd = open(dir_s0_d1, O_PATH | O_NOFOLLOW + | O_DIRECTORY | O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + ASSERT_EQ(-1, landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_CREATE_RULESET, + sizeof(path_beneath), &path_beneath)); + ASSERT_EQ(EACCES, errno); + ASSERT_EQ(0, close(path_beneath.parent_fd)); + EXPECT_EQ(0, close(path_beneath.ruleset_fd)); +} + +TEST_F(layout1, rule_on_mountpoint) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s0_d1, dir_s3_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + test_path(_metadata, dir_s1_d1, -1); + test_path(_metadata, dir_s0_d1, 0); + test_path(_metadata, dir_s3_d1, 0); +} + +TEST_F(layout1, rule_over_mountpoint) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s4_d1, dir_s0_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + test_path(_metadata, dir_s4_d2, 0); + test_path(_metadata, dir_s0_d1, 0); + test_path(_metadata, dir_s4_d1, 0); +} + +/* + * This test verifies that we can apply a landlock rule on the root (/), which + * might require special handling. 
+ */ +TEST_F(layout1, rule_over_root) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ "/", NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + test_path(_metadata, "/", 0); + test_path(_metadata, dir_s0_d1, 0); +} + +TEST_F(layout1, rule_inside_mount_ns) +{ + ASSERT_NE(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL)); + ASSERT_NE(-1, syscall(SYS_pivot_root, dir_s3_d1, dir_s3_d2)); + ASSERT_NE(-1, chdir("/")); + + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ "b3", NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + test_path(_metadata, "b3", 0); + test_path(_metadata, "/", -1); +} + +TEST_F(layout1, mount_and_pivot) +{ + int ruleset_fd = create_ruleset(_metadata, + (const char *const []){ dir_s3_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + enforce_ruleset(_metadata, ruleset_fd); + EXPECT_EQ(0, close(ruleset_fd)); + + ASSERT_EQ(-1, mount(NULL, "/", NULL, MS_PRIVATE | MS_REC, NULL)); + ASSERT_EQ(-1, syscall(SYS_pivot_root, dir_s3_d1, dir_s3_d2)); +} + +enum relative_access { + REL_OPEN, + REL_CHDIR, + REL_CHROOT, +}; + +static void check_access(struct __test_metadata *_metadata, + bool enforce, enum relative_access rel) +{ + int dirfd; + int ruleset_fd = -1; + + if (enforce) { + ruleset_fd = create_ruleset(_metadata, (const char *const []){ + dir_s0_d2, dir_s1_d1, NULL }); + ASSERT_NE(-1, ruleset_fd); + if (rel == REL_CHROOT) + ASSERT_NE(-1, chdir(dir_s0_d2)); + enforce_ruleset(_metadata, ruleset_fd); + } else if (rel == REL_CHROOT) + ASSERT_NE(-1, chdir(dir_s0_d2)); + switch (rel) { + case REL_OPEN: + dirfd = open(dir_s0_d2, O_DIRECTORY); + ASSERT_NE(-1, dirfd); + break; + case REL_CHDIR: + ASSERT_NE(-1, chdir(dir_s0_d2)); + dirfd = AT_FDCWD; + break; + case REL_CHROOT: + ASSERT_NE(-1, chroot(".")) { + TH_LOG("Failed to chroot: %s\n", strerror(errno)); + } + dirfd = AT_FDCWD; + 
break; + default: + ASSERT_TRUE(false); + return; + } + + test_path_rel(_metadata, dirfd, "..", (rel == REL_CHROOT) ? 0 : -1); + test_path_rel(_metadata, dirfd, ".", 0); + if (rel != REL_CHROOT) { + test_path_rel(_metadata, dirfd, "./c0", 0); + test_path_rel(_metadata, dirfd, "../../" TMP_PREFIX "a1", 0); + test_path_rel(_metadata, dirfd, "../../" TMP_PREFIX "a2", -1); + } + + if (rel == REL_OPEN) + EXPECT_EQ(0, close(dirfd)); + if (enforce) + EXPECT_EQ(0, close(ruleset_fd)); +} + +TEST_F(layout1, deny_open) +{ + check_access(_metadata, true, REL_OPEN); +} + +TEST_F(layout1, deny_chdir) +{ + check_access(_metadata, true, REL_CHDIR); +} + +TEST_F(layout1, deny_chroot) +{ + check_access(_metadata, true, REL_CHROOT); +} + +TEST(cleanup) +{ + cleanup_layout1(); +} + +TEST_HARNESS_MAIN diff --git a/tools/testing/selftests/landlock/test_ptrace.c b/tools/testing/selftests/landlock/test_ptrace.c new file mode 100644 index 000000000000..fcdb41e172d1 --- /dev/null +++ b/tools/testing/selftests/landlock/test_ptrace.c @@ -0,0 +1,293 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Landlock tests - ptrace + * + * Copyright © 2017-2020 Mickaël Salaün mic@digikod.net + * Copyright © 2019-2020 ANSSI + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <linux/landlock.h> +#include <signal.h> +#include <sys/prctl.h> +#include <sys/ptrace.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> + +#include "test.h" + +static void create_domain(struct __test_metadata *_metadata) +{ + int ruleset_fd, err; + struct landlock_attr_features attr_features; + struct landlock_attr_enforce attr_enforce; + struct landlock_attr_ruleset attr_ruleset = { + .handled_access_fs = LANDLOCK_ACCESS_FS_READ, + }; + struct landlock_attr_path_beneath path_beneath = { + .allowed_access = LANDLOCK_ACCESS_FS_READ, + }; + + ASSERT_EQ(0, landlock(LANDLOCK_CMD_GET_FEATURES, + LANDLOCK_OPT_GET_FEATURES, + sizeof(attr_features), &attr_features)); + /* Only for test, use a 
binary AND for real application instead. */ + ASSERT_EQ(attr_ruleset.handled_access_fs, + attr_ruleset.handled_access_fs & + attr_features.access_fs); + ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, + LANDLOCK_OPT_CREATE_RULESET, sizeof(attr_ruleset), + &attr_ruleset); + ASSERT_GE(ruleset_fd, 0) { + TH_LOG("Failed to create a ruleset: %s\n", strerror(errno)); + } + path_beneath.ruleset_fd = ruleset_fd; + path_beneath.parent_fd = open("/tmp", O_PATH | O_NOFOLLOW | O_DIRECTORY + | O_CLOEXEC); + ASSERT_GE(path_beneath.parent_fd, 0); + err = landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + ASSERT_EQ(err, 0); + ASSERT_EQ(errno, 0); + ASSERT_EQ(0, close(path_beneath.parent_fd)); + + err = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); + ASSERT_EQ(errno, 0); + ASSERT_EQ(err, 0); + + attr_enforce.ruleset_fd = ruleset_fd; + err = landlock(LANDLOCK_CMD_ENFORCE_RULESET, + LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce), + &attr_enforce); + ASSERT_EQ(err, 0); + ASSERT_EQ(errno, 0); + + ASSERT_EQ(0, close(ruleset_fd)); +} + +/* test PTRACE_TRACEME and PTRACE_ATTACH for parent and child */ +static void check_ptrace(struct __test_metadata *_metadata, + bool domain_both, bool domain_parent, bool domain_child) +{ + pid_t child, parent; + int status; + int pipe_child[2], pipe_parent[2]; + char buf_parent; + + parent = getpid(); + ASSERT_EQ(0, pipe(pipe_child)); + ASSERT_EQ(0, pipe(pipe_parent)); + if (domain_both) + create_domain(_metadata); + + child = fork(); + ASSERT_LE(0, child); + if (child == 0) { + char buf_child; + + EXPECT_EQ(0, close(pipe_parent[1])); + EXPECT_EQ(0, close(pipe_child[0])); + if (domain_child) + create_domain(_metadata); + + /* sync #1 */ + ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) { + TH_LOG("Failed to read() sync #1 from parent"); + } + ASSERT_EQ('.', buf_child); + + /* Tests the parent protection. */ + ASSERT_EQ(domain_child ? 
-1 : 0, + ptrace(PTRACE_ATTACH, parent, NULL, 0)); + if (domain_child) { + ASSERT_EQ(EPERM, errno); + } else { + ASSERT_EQ(parent, waitpid(parent, &status, 0)); + ASSERT_EQ(1, WIFSTOPPED(status)); + ASSERT_EQ(0, ptrace(PTRACE_DETACH, parent, NULL, 0)); + } + + /* sync #2 */ + ASSERT_EQ(1, write(pipe_child[1], ".", 1)) { + TH_LOG("Failed to write() sync #2 to parent"); + } + + /* Tests traceme. */ + ASSERT_EQ(domain_parent ? -1 : 0, ptrace(PTRACE_TRACEME)); + if (domain_parent) { + ASSERT_EQ(EPERM, errno); + } else { + ASSERT_EQ(0, raise(SIGSTOP)); + } + + /* sync #3 */ + ASSERT_EQ(1, read(pipe_parent[0], &buf_child, 1)) { + TH_LOG("Failed to read() sync #3 from parent"); + } + ASSERT_EQ('.', buf_child); + _exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE); + } + + EXPECT_EQ(0, close(pipe_child[1])); + EXPECT_EQ(0, close(pipe_parent[0])); + if (domain_parent) + create_domain(_metadata); + + /* sync #1 */ + ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) { + TH_LOG("Failed to write() sync #1 to child"); + } + + /* Tests the parent protection. */ + /* sync #2 */ + ASSERT_EQ(1, read(pipe_child[0], &buf_parent, 1)) { + TH_LOG("Failed to read() sync #2 from child"); + } + ASSERT_EQ('.', buf_parent); + + /* Tests traceme. */ + if (!domain_parent) { + ASSERT_EQ(child, waitpid(child, &status, 0)); + ASSERT_EQ(1, WIFSTOPPED(status)); + ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0)); + } + /* Tests attach. */ + ASSERT_EQ(domain_parent ? 
-1 : 0, + ptrace(PTRACE_ATTACH, child, NULL, 0)); + if (domain_parent) { + ASSERT_EQ(EPERM, errno); + } else { + ASSERT_EQ(child, waitpid(child, &status, 0)); + ASSERT_EQ(1, WIFSTOPPED(status)); + ASSERT_EQ(0, ptrace(PTRACE_DETACH, child, NULL, 0)); + } + + /* sync #3 */ + ASSERT_EQ(1, write(pipe_parent[1], ".", 1)) { + TH_LOG("Failed to write() sync #3 to child"); + } + ASSERT_EQ(child, waitpid(child, &status, 0)); + if (WIFSIGNALED(status) || WEXITSTATUS(status)) + _metadata->passed = 0; +} + +/* + * Test multiple tracing combinations between a parent process P1 and a child + * process P2. + * + * Yama's scoped ptrace is presumed disabled. If enabled, this optional + * restriction is enforced in addition to any Landlock check, which means that + * all P2 requests to trace P1 would be denied. + */ + +/* + * No domain + * + * P1-. P1 -> P2 : allow + * \ P2 -> P1 : allow + * 'P2 + */ +TEST(allow_without_domain) { + check_ptrace(_metadata, false, false, false); +} + +/* + * Child domain + * + * P1--. P1 -> P2 : allow + * \ P2 -> P1 : deny + * .'-----. + * | P2 | + * '------' + */ +TEST(allow_with_one_domain) { + check_ptrace(_metadata, false, false, true); +} + +/* + * Parent domain + * .------. + * | P1 --. P1 -> P2 : deny + * '------' \ P2 -> P1 : allow + * ' + * P2 + */ +TEST(deny_with_parent_domain) { + check_ptrace(_metadata, false, true, false); +} + +/* + * Parent + child domain (siblings) + * .------. + * | P1 ---. P1 -> P2 : deny + * '------' \ P2 -> P1 : deny + * .---'--. + * | P2 | + * '------' + */ +TEST(deny_with_sibling_domain) { + check_ptrace(_metadata, false, true, true); +} + +/* + * Same domain (inherited) + * .-------------. + * | P1----. | P1 -> P2 : allow + * | \ | P2 -> P1 : allow + * | ' | + * | P2 | + * '-------------' + */ +TEST(allow_sibling_domain) { + check_ptrace(_metadata, true, false, false); +} + +/* + * Inherited + child domain + * .-----------------. + * | P1----. | P1 -> P2 : allow + * | \ | P2 -> P1 : deny + * | .-'----. 
| + * | | P2 | | + * | '------' | + * '-----------------' + */ +TEST(allow_with_nested_domain) { + check_ptrace(_metadata, true, false, true); +} + +/* + * Inherited + parent domain + * .-----------------. + * |.------. | P1 -> P2 : deny + * || P1 ----. | P2 -> P1 : allow + * |'------' \ | + * | ' | + * | P2 | + * '-----------------' + */ +TEST(deny_with_nested_and_parent_domain) { + check_ptrace(_metadata, true, true, false); +} + +/* + * Inherited + parent and child domain (siblings) + * .-----------------. + * | .------. | P1 -> P2 : deny + * | | P1 . | P2 -> P1 : deny + * | '------'\ | + * | \ | + * | .--'---. | + * | | P2 | | + * | '------' | + * '-----------------' + */ +TEST(deny_with_forked_domain) { + check_ptrace(_metadata, true, true, true); +} + +TEST_HARNESS_MAIN
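The eight scenarios above all reduce to one invariant: a tracer may manipulate a tracee only when the tracee's Landlock domain is the same as, or nested below, the tracer's. A standalone sketch of that check, using a hypothetical layered-domain representation (not the kernel's actual data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not kernel code): a domain is the ordered list
 * of rulesets (layers) enforced on a task.  Forking inherits the
 * parent's list; enforcing a new ruleset appends one layer. */
struct domain {
	int layers[8];
	size_t nb;
};

/* The property exercised by the tests above: tracing is allowed iff
 * the tracer's domain is a prefix of the tracee's, i.e. the tracee
 * is at least as restricted as the tracer. */
static bool may_trace(const struct domain *tracer,
		      const struct domain *tracee)
{
	size_t i;

	if (tracer->nb > tracee->nb)
		return false;
	for (i = 0; i < tracer->nb; i++)
		if (tracer->layers[i] != tracee->layers[i])
			return false;
	return true;
}
```

As the test comment notes, Yama's scoped ptrace, when enabled, stacks on top of this check rather than replacing it.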
Add a basic sandbox tool to launch a command which can only access a whitelist of file hierarchies in a read-only or read-write way.
Signed-off-by: Mickaël Salaün <mic@digikod.net> Cc: Andy Lutomirski <luto@amacapital.net> Cc: James Morris <jmorris@namei.org> Cc: Kees Cook <keescook@chromium.org> Cc: Serge E. Hallyn <serge@hallyn.com> ---
Changes since v11: * Add back the filesystem sandbox manager and update it to work with the new Landlock syscall.
Previous version: https://lore.kernel.org/lkml/20190721213116.23476-9-mic@digikod.net/ --- samples/Kconfig | 7 ++ samples/Makefile | 1 + samples/landlock/.gitignore | 1 + samples/landlock/Makefile | 15 +++ samples/landlock/sandboxer.c | 226 +++++++++++++++++++++++++++++++++++ 5 files changed, 250 insertions(+) create mode 100644 samples/landlock/.gitignore create mode 100644 samples/landlock/Makefile create mode 100644 samples/landlock/sandboxer.c
diff --git a/samples/Kconfig b/samples/Kconfig index 9d236c346de5..5ec43a732b10 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -120,6 +120,13 @@ config SAMPLE_HIDRAW bool "hidraw sample" depends on HEADERS_INSTALL
+config SAMPLE_LANDLOCK + bool "Build Landlock sample code" + depends on HEADERS_INSTALL + help + Build a simple Landlock sandbox manager able to launch a process + restricted by a user-defined filesystem access-control security policy. + config SAMPLE_PIDFD bool "pidfd sample" depends on HEADERS_INSTALL diff --git a/samples/Makefile b/samples/Makefile index f8f847b4f61f..61a2bd216f53 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -11,6 +11,7 @@ obj-$(CONFIG_SAMPLE_KDB) += kdb/ obj-$(CONFIG_SAMPLE_KFIFO) += kfifo/ obj-$(CONFIG_SAMPLE_KOBJECT) += kobject/ obj-$(CONFIG_SAMPLE_KPROBES) += kprobes/ +subdir-$(CONFIG_SAMPLE_LANDLOCK) += landlock obj-$(CONFIG_SAMPLE_LIVEPATCH) += livepatch/ subdir-$(CONFIG_SAMPLE_PIDFD) += pidfd obj-$(CONFIG_SAMPLE_QMI_CLIENT) += qmi/ diff --git a/samples/landlock/.gitignore b/samples/landlock/.gitignore new file mode 100644 index 000000000000..f43668b2d318 --- /dev/null +++ b/samples/landlock/.gitignore @@ -0,0 +1 @@ +/sandboxer diff --git a/samples/landlock/Makefile b/samples/landlock/Makefile new file mode 100644 index 000000000000..9dfb571641ba --- /dev/null +++ b/samples/landlock/Makefile @@ -0,0 +1,15 @@ +# SPDX-License-Identifier: BSD-3-Clause + +hostprogs-y := sandboxer + +always := $(hostprogs-y) + +KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include + +.PHONY: all clean + +all: + $(MAKE) -C ../.. samples/landlock/ + +clean: + $(MAKE) -C ../.. M=samples/landlock/ clean diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c new file mode 100644 index 000000000000..882c12f71edb --- /dev/null +++ b/samples/landlock/sandboxer.c @@ -0,0 +1,226 @@ +// SPDX-License-Identifier: BSD-3-Clause +/* + * Simple Landlock sandbox manager able to launch a process restricted by a + * user-defined filesystem access-control security policy. 
+ * + * Copyright © 2017-2020 Mickaël Salaün <mic@digikod.net> + * Copyright © 2020 ANSSI + */ + +#define _GNU_SOURCE +#include <errno.h> +#include <fcntl.h> +#include <linux/landlock.h> +#include <linux/prctl.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/prctl.h> +#include <sys/syscall.h> +#include <unistd.h> + +#ifndef landlock + +#ifndef __NR_landlock +#define __NR_landlock 436 +#endif + +static inline int landlock(unsigned int command, unsigned int options, + size_t attr_size, void *attr_ptr) +{ + errno = 0; + return syscall(__NR_landlock, command, options, attr_size, attr_ptr, 0, + NULL); +} +#endif + +#define ENV_FS_RO_NAME "LL_FS_RO" +#define ENV_FS_RW_NAME "LL_FS_RW" +#define ENV_PATH_TOKEN ":" + +static int parse_path(char *env_path, const char ***path_list) +{ + int i, path_nb = 0; + + if (env_path) { + path_nb++; + for (i = 0; env_path[i]; i++) { + if (env_path[i] == ENV_PATH_TOKEN[0]) + path_nb++; + } + } + *path_list = malloc(path_nb * sizeof(**path_list)); + for (i = 0; i < path_nb; i++) + (*path_list)[i] = strsep(&env_path, ENV_PATH_TOKEN); + + return path_nb; +} + +static int populate_ruleset(const struct landlock_attr_features *attr_features, + const char *env_var, int ruleset_fd, __u64 allowed_access) +{ + int path_nb, i; + char *env_path_name; + const char **path_list = NULL; + struct landlock_attr_path_beneath path_beneath = { + .ruleset_fd = ruleset_fd, + .allowed_access = allowed_access, + .parent_fd = -1, + }; + + env_path_name = getenv(env_var); + if (!env_path_name) { + fprintf(stderr, "Missing environment variable %s\n", env_var); + return 1; + } + env_path_name = strdup(env_path_name); + unsetenv(env_var); + path_nb = parse_path(env_path_name, &path_list); + if (path_nb == 1 && path_list[0][0] == '\0') { + fprintf(stderr, "Missing path in %s\n", env_var); + goto err_free_name; + } + + /* follow a best-effort approach */ + path_beneath.allowed_access &= attr_features->access_fs; + for (i 
= 0; i < path_nb; i++) { + path_beneath.parent_fd = open(path_list[i], + O_PATH | O_NOFOLLOW | O_CLOEXEC); + if (path_beneath.parent_fd < 0) { + fprintf(stderr, "Failed to open \"%s\": %s\n", + path_list[i], + strerror(errno)); + goto err_free_name; + } + if (landlock(LANDLOCK_CMD_ADD_RULE, + LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath)) { + fprintf(stderr, "Failed to update the ruleset with \"%s\": %s\n", + path_list[i], strerror(errno)); + close(path_beneath.parent_fd); + goto err_free_name; + } + close(path_beneath.parent_fd); + } + free(env_path_name); + return 0; + +err_free_name: + free(env_path_name); + return 1; +} + +#define ACCESS_FS_ROUGHLY_READ ( \ + LANDLOCK_ACCESS_FS_READ | \ + LANDLOCK_ACCESS_FS_READDIR | \ + LANDLOCK_ACCESS_FS_GETATTR | \ + LANDLOCK_ACCESS_FS_EXECUTE | \ + LANDLOCK_ACCESS_FS_CHROOT) + +#define ACCESS_FS_ROUGHLY_WRITE ( \ + LANDLOCK_ACCESS_FS_WRITE | \ + LANDLOCK_ACCESS_FS_TRUNCATE | \ + LANDLOCK_ACCESS_FS_LOCK | \ + LANDLOCK_ACCESS_FS_CHMOD | \ + LANDLOCK_ACCESS_FS_CHOWN | \ + LANDLOCK_ACCESS_FS_CHGRP | \ + LANDLOCK_ACCESS_FS_IOCTL | \ + LANDLOCK_ACCESS_FS_LINK_TO | \ + LANDLOCK_ACCESS_FS_RENAME_FROM | \ + LANDLOCK_ACCESS_FS_RENAME_TO | \ + LANDLOCK_ACCESS_FS_RMDIR | \ + LANDLOCK_ACCESS_FS_UNLINK | \ + LANDLOCK_ACCESS_FS_MAKE_CHAR | \ + LANDLOCK_ACCESS_FS_MAKE_DIR | \ + LANDLOCK_ACCESS_FS_MAKE_REG | \ + LANDLOCK_ACCESS_FS_MAKE_SOCK | \ + LANDLOCK_ACCESS_FS_MAKE_FIFO | \ + LANDLOCK_ACCESS_FS_MAKE_BLOCK | \ + LANDLOCK_ACCESS_FS_MAKE_SYM) + +int main(int argc, char * const argv[], char * const *envp) +{ + char *cmd_path; + char * const *cmd_argv; + int ruleset_fd; + struct landlock_attr_features attr_features; + struct landlock_attr_ruleset ruleset = { + /* restrict all handled filesystem accesses */ + .handled_access_fs = ACCESS_FS_ROUGHLY_READ | + ACCESS_FS_ROUGHLY_WRITE, + }; + struct landlock_attr_enforce attr_enforce = {}; + + if (argc < 2) { + fprintf(stderr, "usage: %s=\"...\" %s=\"...\" 
%s <cmd> [args]...\n\n", + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]); + fprintf(stderr, "Launch a command in a restricted environment.\n\n"); + fprintf(stderr, "Environment variables containing paths, each separated by a colon:\n"); + fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n", + ENV_FS_RO_NAME); + fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n", + ENV_FS_RW_NAME); + fprintf(stderr, "\nexample:\n" + "%s=\"/bin:/lib:/usr\" " + "%s=\"/dev/pts\" " + "%s /bin/bash -i\n", + ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]); + return 1; + } + + if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES, + sizeof(attr_features), &attr_features)) { + perror("Failed to probe the Landlock supported features"); + switch (errno) { + case ENOSYS: + fprintf(stderr, "Hint: this kernel does not support Landlock.\n"); + break; + case ENOPKG: + fprintf(stderr, "Hint: Landlock is currently disabled. It can be enabled in the kernel configuration or at boot with the \"lsm=landlock\" parameter.\n"); + break; + } + return 1; + } + /* follow a best-effort approach */ + ruleset.handled_access_fs &= attr_features.access_fs; + ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, + LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), + &ruleset); + if (ruleset_fd < 0) { + perror("Failed to create a ruleset"); + return 1; + } + if (populate_ruleset(&attr_features, ENV_FS_RO_NAME, ruleset_fd, + ACCESS_FS_ROUGHLY_READ)) { + goto err_close_ruleset; + } + if (populate_ruleset(&attr_features, ENV_FS_RW_NAME, ruleset_fd, + ACCESS_FS_ROUGHLY_READ | + ACCESS_FS_ROUGHLY_WRITE)) { + goto err_close_ruleset; + } + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("Failed to restrict privileges"); + goto err_close_ruleset; + } + attr_enforce.ruleset_fd = ruleset_fd; + if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, + LANDLOCK_OPT_ENFORCE_RULESET, + sizeof(attr_enforce), &attr_enforce)) { + perror("Failed to enforce ruleset"); + goto err_close_ruleset; + } + 
close(ruleset_fd); + + cmd_path = argv[1]; + cmd_argv = argv + 1; + execve(cmd_path, cmd_argv, envp); + fprintf(stderr, "Failed to execute \"%s\"\n", cmd_path); + fprintf(stderr, "Hint: access to the binary or its shared libraries may be denied.\n"); + return 1; + +err_close_ruleset: + close(ruleset_fd); + return 1; +}
This documentation can be built with the Sphinx framework.
Another location might be more appropriate, though.
Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr> Cc: Andy Lutomirski <luto@amacapital.net> Cc: James Morris <jmorris@namei.org> Cc: Kees Cook <keescook@chromium.org> Cc: Serge E. Hallyn <serge@hallyn.com> ---
Changes since v13: * Rewrote the documentation according to the major revamp.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/ --- Documentation/security/index.rst | 1 + Documentation/security/landlock/index.rst | 18 ++ Documentation/security/landlock/kernel.rst | 44 ++++ Documentation/security/landlock/user.rst | 233 +++++++++++++++++++++ 4 files changed, 296 insertions(+) create mode 100644 Documentation/security/landlock/index.rst create mode 100644 Documentation/security/landlock/kernel.rst create mode 100644 Documentation/security/landlock/user.rst
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst index fc503dd689a7..4d213e76ddf4 100644 --- a/Documentation/security/index.rst +++ b/Documentation/security/index.rst @@ -15,3 +15,4 @@ Security Documentation self-protection siphash tpm/index + landlock/index diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst new file mode 100644 index 000000000000..dbd33b96ce60 --- /dev/null +++ b/Documentation/security/landlock/index.rst @@ -0,0 +1,18 @@ +========================================= +Landlock LSM: unprivileged access control +========================================= + +:Author: Mickaël Salaün + +The goal of Landlock is to enable to restrict ambient rights (e.g. global +filesystem access) for a set of processes. Because Landlock is a stackable +LSM, it makes possible to create safe security sandboxes as new security layers +in addition to the existing system-wide access-controls. This kind of sandbox +is expected to help mitigate the security impact of bugs or +unexpected/malicious behaviors in user-space applications. Landlock empower any +process, including unprivileged ones, to securely restrict themselves. + +.. toctree:: + + user + kernel diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst new file mode 100644 index 000000000000..b87769909029 --- /dev/null +++ b/Documentation/security/landlock/kernel.rst @@ -0,0 +1,44 @@ +============================== +Landlock: kernel documentation +============================== + +Landlock's goal is to create scoped access-control (i.e. sandboxing). To +harden a whole system, this feature should be available to any process, +including unprivileged ones. Because such process may be compromised or +backdoored (i.e. untrusted), Landlock's features must be safe to use from the +kernel and other processes point of view. Landlock's interface must therefore +expose a minimal attack surface. 
+ +Landlock is designed to be usable by unprivileged processes while following the +system security policy enforced by other access control mechanisms (e.g. DAC, +LSM). Indeed, a Landlock rule shall not interfere with other access-controls +enforced on the system, only add more restrictions. + +Any user can enforce Landlock rulesets on their processes. They are merged and +evaluated according to the inherited ones in a way that ensure that only more +constraints can be added. + + +Guiding principles for safe access controls +=========================================== + +* A Landlock rule shall be focused on access control on kernel objects instead + of syscall filtering (i.e. syscall arguments), which is the purpose of + seccomp-bpf. +* To avoid multiple kind of side-channel attacks (e.g. leak of security + policies, CPU-based attacks), Landlock rules shall not be able to + programmatically communicate with user space. +* Kernel access check shall not slow down access request from unsandboxed + processes. +* Computation related to Landlock operations (e.g. enforce a ruleset) shall + only impact the processes requesting them. + + +Landlock rulesets and domains +============================= + +A domain is a read-only ruleset tied to a set of subjects (i.e. tasks). A +domain can transition to a new one which is the intersection of the constraints +from the current and a new ruleset. The definition of a subject is implicit +for a task sandboxing itself, which makes the reasoning much easier and helps +avoid pitfalls. diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst new file mode 100644 index 000000000000..cbd7f61fca8c --- /dev/null +++ b/Documentation/security/landlock/user.rst @@ -0,0 +1,233 @@ +================================= +Landlock: userspace documentation +================================= + +Landlock rules +============== + +A Landlock rule enables to describe an action on an object. 
An object is +currently a file hierarchy, and the related filesystem actions are defined in +`Access rights`_. A set of rules are aggregated in a ruleset, which can then +restricts the thread enforcing it, and its future children. + + +Defining and enforcing a security policy +---------------------------------------- + +Before defining a security policy, an application should first probe for the +features supported by the running kernel, which is important to be compatible +with older kernels. This can be done thanks to the `landlock` syscall (cf. +:ref:`syscall`). + +.. code-block:: c + + struct landlock_attr_features attr_features; + + if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES, + sizeof(attr_features), &attr_features)) { + perror("Failed to probe the Landlock supported features"); + return 1; + } + +Then, we need to create the ruleset that will contains our rules. For this +example, the ruleset will contains rules which only allow read actions, but +write actions will be denied. The ruleset then needs to handle both of these +kind of actions. To have a backward compatibility, these actions should be +ANDed with the supported ones. + +.. 
code-block:: c + + int ruleset_fd; + struct landlock_attr_ruleset ruleset = { + .handled_access_fs = + LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_READDIR | + LANDLOCK_ACCESS_FS_EXECUTE | + LANDLOCK_ACCESS_FS_WRITE | + LANDLOCK_ACCESS_FS_TRUNCATE | + LANDLOCK_ACCESS_FS_CHMOD | + LANDLOCK_ACCESS_FS_CHOWN | + LANDLOCK_ACCESS_FS_CHGRP | + LANDLOCK_ACCESS_FS_LINK_TO | + LANDLOCK_ACCESS_FS_RENAME_FROM | + LANDLOCK_ACCESS_FS_RENAME_TO | + LANDLOCK_ACCESS_FS_RMDIR | + LANDLOCK_ACCESS_FS_UNLINK | + LANDLOCK_ACCESS_FS_MAKE_CHAR | + LANDLOCK_ACCESS_FS_MAKE_DIR | + LANDLOCK_ACCESS_FS_MAKE_REG | + LANDLOCK_ACCESS_FS_MAKE_SOCK | + LANDLOCK_ACCESS_FS_MAKE_FIFO | + LANDLOCK_ACCESS_FS_MAKE_BLOCK | + LANDLOCK_ACCESS_FS_MAKE_SYM, + }; + + ruleset.handled_access_fs &= attr_features.access_fs; + ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, + LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset); + if (ruleset_fd < 0) { + perror("Failed to create a ruleset"); + return 1; + } + +We can now add a new rule to this ruleset thanks to the returned file +descriptor referring to this ruleset. The rule will only enable to read the +file hierarchy ``/usr``. Without other rule, write actions would then be +denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the +``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file +descriptor. + +.. 
code-block:: c + + int err; + struct landlock_attr_path_beneath path_beneath = { + .ruleset_fd = ruleset_fd, + .allowed_access = + LANDLOCK_ACCESS_FS_READ | + LANDLOCK_ACCESS_FS_READDIR | + LANDLOCK_ACCESS_FS_EXECUTE, + }; + + path_beneath.allowed_access &= attr_features.access_fs; + path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC); + if (path_beneath.parent_fd < 0) { + perror("Failed to open file"); + close(ruleset_fd); + return 1; + } + err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, + sizeof(path_beneath), &path_beneath); + close(path_beneath.parent_fd); + if (err) { + perror("Failed to update ruleset"); + close(ruleset_fd); + return 1; + } + +We now have a ruleset with one rule allowing read access to ``/usr`` while +denying all accesses featured in ``attr_features.access_fs`` to everything else +on the filesystem. The next step is to restrict the current thread from +gaining more privileges (e.g. thanks to a SUID binary). + +.. code-block:: c + + if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) { + perror("Failed to restrict privileges"); + close(ruleset_fd); + return 1; + } + +The current thread is now ready to sandbox itself with the ruleset. + +.. code-block:: c + + struct landlock_attr_enforce attr_enforce = { + .ruleset_fd = ruleset_fd, + }; + + if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET, + sizeof(attr_enforce), &attr_enforce)) { + perror("Failed to enforce ruleset"); + close(ruleset_fd); + return 1; + } + close(ruleset_fd); + +If this last system call succeeds, the current thread is now restricted and +this policy will be enforced on all its subsequently created children as well. +Once a thread is landlocked, there is no way to remove its security policy, +only adding more restrictions is allowed. These threads are now in a new +Landlock domain, merge of their parent one (if any) with the new ruleset. + +A full working code can be found in `samples/landlock/sandboxer.c`_. 
+ + +Inheritance +----------- + +Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program +restrictions from its parent. This is similar to the seccomp inheritance (cf. +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's +:manpage:`credentials(7)`. For instance, one process' thread may apply +Landlock rules to itself, but they will not be automatically applied to other +sibling threads (unlike POSIX thread credential changes, cf. +:manpage:`nptl(7)`). + + +Ptrace restrictions +------------------- + +A sandboxed process has less privileges than a non-sandboxed process and must +then be subject to additional restrictions when manipulating another process. +To be allowed to use :manpage:`ptrace(2)` and related syscalls on a target +process, a sandboxed process should have a subset of the target process rules, +which means the tracee must be in a sub-domain of the tracer. + + +.. _syscall: + +The `landlock` syscall and its arguments +======================================== + +.. kernel-doc:: security/landlock/syscall.c + :functions: sys_landlock + +Commands +-------- + +.. kernel-doc:: include/uapi/linux/landlock.h + :functions: landlock_cmd + +Options +------- + +.. kernel-doc:: include/uapi/linux/landlock.h + :functions: options_intro + options_get_features options_create_ruleset + options_add_rule options_enforce_ruleset + +Attributes +---------- + +.. kernel-doc:: include/uapi/linux/landlock.h + :functions: landlock_attr_features landlock_attr_ruleset + landlock_attr_path_beneath landlock_attr_enforce + +Access rights +------------- + +.. kernel-doc:: include/uapi/linux/landlock.h + :functions: fs_access + + +Questions and answers +===================== + +What about user space sandbox managers? +--------------------------------------- + +Using user space process to enforce restrictions on kernel resources can lead +to race conditions or inconsistent evaluations (i.e. 
`Incorrect mirroring of +the OS code and state +<https://www.ndss-symposium.org/ndss2003/traps-and-pitfalls-practical-problems-system-call-interposition-based-security-tools/>`_). + +What about namespaces and containers? +------------------------------------- + +Namespaces can help create sandboxes but they are not designed for +access-control and then miss useful features for such use case (e.g. no +fine-grained restrictions). Moreover, their complexity can lead to security +issues, especially when untrusted processes can manipulate them (cf. +`Controlling access to user namespaces <https://lwn.net/Articles/673597/>`_). + + +Additional documentation +======================== + +See https://landlock.io + + +.. Links +.. _samples/landlock/sandboxer.c: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/sample... +.. _tools/testing/selftests/landlock/: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/... +.. _tools/testing/selftests/landlock/test_ptrace.c: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/tools/...
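The documentation above applies `handled_access_fs &= attr_features.access_fs` at each step; the backward-compatibility effect of that mask can be shown on its own with stand-in bit values (these are illustrative, not the real UAPI constants):

```c
#include <stdint.h>

/* Stand-in access bits, for illustration only. */
#define ACC_READ     (1ULL << 0)
#define ACC_WRITE    (1ULL << 1)
#define ACC_TRUNCATE (1ULL << 2)

/* Best-effort policy: drop any requested access the running kernel
 * does not report as supported, so a ruleset built against newer
 * headers still loads on an older kernel instead of failing. */
static uint64_t best_effort(uint64_t requested, uint64_t supported)
{
	return requested & supported;
}
```

The trade-off is that the policy silently degrades: an application that must have a given restriction should instead check the mask and refuse to run when the bit is missing.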
Hi, Here are a few corrections for you to consider.
On 2/24/20 8:02 AM, Mickaël Salaün wrote:
This documentation can be built with the Sphinx framework.
Another location might be more appropriate, though.
Signed-off-by: Mickaël Salaün <mic@digikod.net> Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr> Cc: Andy Lutomirski <luto@amacapital.net> Cc: James Morris <jmorris@namei.org> Cc: Kees Cook <keescook@chromium.org> Cc: Serge E. Hallyn <serge@hallyn.com>
Changes since v13:
- Rewrote the documentation according to the major revamp.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
Documentation/security/index.rst | 1 + Documentation/security/landlock/index.rst | 18 ++ Documentation/security/landlock/kernel.rst | 44 ++++ Documentation/security/landlock/user.rst | 233 +++++++++++++++++++++ 4 files changed, 296 insertions(+) create mode 100644 Documentation/security/landlock/index.rst create mode 100644 Documentation/security/landlock/kernel.rst create mode 100644 Documentation/security/landlock/user.rst
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst new file mode 100644 index 000000000000..dbd33b96ce60 --- /dev/null +++ b/Documentation/security/landlock/index.rst @@ -0,0 +1,18 @@ +========================================= +Landlock LSM: unprivileged access control +=========================================
+:Author: Mickaël Salaün
+The goal of Landlock is to enable to restrict ambient rights (e.g. global +filesystem access) for a set of processes. Because Landlock is a stackable +LSM, it makes possible to create safe security sandboxes as new security layers +in addition to the existing system-wide access-controls. This kind of sandbox +is expected to help mitigate the security impact of bugs or +unexpected/malicious behaviors in user-space applications. Landlock empower any
empowers
+process, including unprivileged ones, to securely restrict themselves.
+.. toctree::
- user
- kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst new file mode 100644 index 000000000000..b87769909029 --- /dev/null +++ b/Documentation/security/landlock/kernel.rst @@ -0,0 +1,44 @@ +============================== +Landlock: kernel documentation +==============================
+Landlock's goal is to create scoped access-control (i.e. sandboxing). To +harden a whole system, this feature should be available to any process, +including unprivileged ones. Because such process may be compromised or +backdoored (i.e. untrusted), Landlock's features must be safe to use from the +kernel and other processes point of view. Landlock's interface must therefore +expose a minimal attack surface.
+Landlock is designed to be usable by unprivileged processes while following the +system security policy enforced by other access control mechanisms (e.g. DAC, +LSM). Indeed, a Landlock rule shall not interfere with other access-controls +enforced on the system, only add more restrictions.
+Any user can enforce Landlock rulesets on their processes. They are merged and +evaluated according to the inherited ones in a way that ensure that only more
ensures
+constraints can be added.
+Guiding principles for safe access controls +===========================================
+* A Landlock rule shall be focused on access control on kernel objects instead
- of syscall filtering (i.e. syscall arguments), which is the purpose of
- seccomp-bpf.
+* To avoid multiple kind of side-channel attacks (e.g. leak of security
kinds
- policies, CPU-based attacks), Landlock rules shall not be able to
- programmatically communicate with user space.
+* Kernel access check shall not slow down access request from unsandboxed
- processes.
+* Computation related to Landlock operations (e.g. enforce a ruleset) shall
- only impact the processes requesting them.
+Landlock rulesets and domains +=============================
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks). A +domain can transition to a new one which is the intersection of the constraints +from the current and a new ruleset. The definition of a subject is implicit +for a task sandboxing itself, which makes the reasoning much easier and helps +avoid pitfalls. diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst new file mode 100644 index 000000000000..cbd7f61fca8c --- /dev/null +++ b/Documentation/security/landlock/user.rst @@ -0,0 +1,233 @@ +================================= +Landlock: userspace documentation +=================================
+Landlock rules +==============
+A Landlock rule enables to describe an action on an object. An object is +currently a file hierarchy, and the related filesystem actions are defined in +`Access rights`_. A set of rules are aggregated in a ruleset, which can then
is
+restricts the thread enforcing it, and its future children.
restrict
+Defining and enforcing a security policy +----------------------------------------
+Before defining a security policy, an application should first probe for the +features supported by the running kernel, which is important to be compatible +with older kernels. This can be done thanks to the `landlock` syscall (cf. +:ref:`syscall`).
+.. code-block:: c
- struct landlock_attr_features attr_features;
- if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
sizeof(attr_features), &attr_features)) {
perror("Failed to probe the Landlock supported features");
return 1;
- }
+Then, we need to create the ruleset that will contains our rules. For this
contain
+example, the ruleset will contains rules which only allow read actions, but
contain
+write actions will be denied. The ruleset then needs to handle both of these +kind of actions. To have a backward compatibility, these actions should be +ANDed with the supported ones.
+.. code-block:: c
- int ruleset_fd;
- struct landlock_attr_ruleset ruleset = {
.handled_access_fs =
LANDLOCK_ACCESS_FS_READ |
LANDLOCK_ACCESS_FS_READDIR |
LANDLOCK_ACCESS_FS_EXECUTE |
LANDLOCK_ACCESS_FS_WRITE |
LANDLOCK_ACCESS_FS_TRUNCATE |
LANDLOCK_ACCESS_FS_CHMOD |
LANDLOCK_ACCESS_FS_CHOWN |
LANDLOCK_ACCESS_FS_CHGRP |
LANDLOCK_ACCESS_FS_LINK_TO |
LANDLOCK_ACCESS_FS_RENAME_FROM |
LANDLOCK_ACCESS_FS_RENAME_TO |
LANDLOCK_ACCESS_FS_RMDIR |
LANDLOCK_ACCESS_FS_UNLINK |
LANDLOCK_ACCESS_FS_MAKE_CHAR |
LANDLOCK_ACCESS_FS_MAKE_DIR |
LANDLOCK_ACCESS_FS_MAKE_REG |
LANDLOCK_ACCESS_FS_MAKE_SOCK |
LANDLOCK_ACCESS_FS_MAKE_FIFO |
LANDLOCK_ACCESS_FS_MAKE_BLOCK |
LANDLOCK_ACCESS_FS_MAKE_SYM,
- };
- ruleset.handled_access_fs &= attr_features.access_fs;
- ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
- if (ruleset_fd < 0) {
perror("Failed to create a ruleset");
return 1;
- }
+We can now add a new rule to this ruleset thanks to the returned file +descriptor referring to this ruleset. The rule will only enable to read the +file hierarchy ``/usr``. Without other rule, write actions would then be
Without other rules, or Without another rule,
+denied by the ruleset. To add ``/usr`` to the ruleset, we open it with the +``O_PATH`` flag and fill the &struct landlock_attr_path_beneath with this file +descriptor.
+.. code-block:: c
- int err;
- struct landlock_attr_path_beneath path_beneath = {
.ruleset_fd = ruleset_fd,
.allowed_access =
LANDLOCK_ACCESS_FS_READ |
LANDLOCK_ACCESS_FS_READDIR |
LANDLOCK_ACCESS_FS_EXECUTE,
- };
- path_beneath.allowed_access &= attr_features.access_fs;
- path_beneath.parent_fd = open("/usr", O_PATH | O_CLOEXEC);
- if (path_beneath.parent_fd < 0) {
perror("Failed to open file");
close(ruleset_fd);
return 1;
- }
- err = landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
sizeof(path_beneath), &path_beneath);
- close(path_beneath.parent_fd);
- if (err) {
perror("Failed to update ruleset");
close(ruleset_fd);
return 1;
- }
+We now have a ruleset with one rule allowing read access to ``/usr`` while +denying all accesses featured in ``attr_features.access_fs`` to everything else +on the filesystem. The next step is to restrict the current thread from +gaining more privileges (e.g. thanks to a SUID binary).
+.. code-block:: c
- if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
perror("Failed to restrict privileges");
close(ruleset_fd);
return 1;
- }
+The current thread is now ready to sandbox itself with the ruleset.
+.. code-block:: c
- struct landlock_attr_enforce attr_enforce = {
.ruleset_fd = ruleset_fd,
- };
- if (landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
sizeof(attr_enforce), &attr_enforce)) {
perror("Failed to enforce ruleset");
close(ruleset_fd);
return 1;
- }
- close(ruleset_fd);
+If this last system call succeeds, the current thread is now restricted and
If this last landlock system call succeeds,
[because close() is the last system call]
+this policy will be enforced on all its subsequently created children as well. +Once a thread is landlocked, there is no way to remove its security policy,
preferably: policy;
+only adding more restrictions is allowed. These threads are now in a new +Landlock domain, merge of their parent one (if any) with the new ruleset.
+A full working code can be found in `samples/landlock/sandboxer.c`_.
Full working code
+Inheritance
+-----------
+Every new thread resulting from a :manpage:`clone(2)` inherits Landlock program +restrictions from its parent. This is similar to the seccomp inheritance (cf. +:doc:`/userspace-api/seccomp_filter`) or any other LSM dealing with task's +:manpage:`credentials(7)`. For instance, one process' thread may apply
process's
+Landlock rules to itself, but they will not be automatically applied to other +sibling threads (unlike POSIX thread credential changes, cf. +:manpage:`nptl(7)`).
[snip]
thanks for the documentation.
On 29/02/2020 18:23, Randy Dunlap wrote:
Hi, Here are a few corrections for you to consider.
On 2/24/20 8:02 AM, Mickaël Salaün wrote:
This documentation can be built with the Sphinx framework.
Another location might be more appropriate, though.
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Vincent Dagonneau <vincent.dagonneau@ssi.gouv.fr>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: James Morris <jmorris@namei.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Changes since v13:
- Rewrote the documentation according to the major revamp.
Previous version: https://lore.kernel.org/lkml/20191104172146.30797-8-mic@digikod.net/
 Documentation/security/index.rst           |   1 +
 Documentation/security/landlock/index.rst  |  18 ++
 Documentation/security/landlock/kernel.rst |  44 ++++
 Documentation/security/landlock/user.rst   | 233 +++++++++++++++++++++
 4 files changed, 296 insertions(+)
 create mode 100644 Documentation/security/landlock/index.rst
 create mode 100644 Documentation/security/landlock/kernel.rst
 create mode 100644 Documentation/security/landlock/user.rst
diff --git a/Documentation/security/landlock/index.rst b/Documentation/security/landlock/index.rst
new file mode 100644
index 000000000000..dbd33b96ce60
--- /dev/null
+++ b/Documentation/security/landlock/index.rst
@@ -0,0 +1,18 @@
+=========================================
+Landlock LSM: unprivileged access control
+=========================================
+:Author: Mickaël Salaün
+The goal of Landlock is to enable to restrict ambient rights (e.g. global +filesystem access) for a set of processes. Because Landlock is a stackable +LSM, it makes possible to create safe security sandboxes as new security layers +in addition to the existing system-wide access-controls. This kind of sandbox +is expected to help mitigate the security impact of bugs or +unexpected/malicious behaviors in user-space applications. Landlock empower any
empowers
+process, including unprivileged ones, to securely restrict themselves.
+.. toctree::
- user
- kernel
diff --git a/Documentation/security/landlock/kernel.rst b/Documentation/security/landlock/kernel.rst
new file mode 100644
index 000000000000..b87769909029
--- /dev/null
+++ b/Documentation/security/landlock/kernel.rst
@@ -0,0 +1,44 @@
+==============================
+Landlock: kernel documentation
+==============================
+Landlock's goal is to create scoped access-control (i.e. sandboxing). To +harden a whole system, this feature should be available to any process, +including unprivileged ones. Because such process may be compromised or +backdoored (i.e. untrusted), Landlock's features must be safe to use from the +kernel and other processes point of view. Landlock's interface must therefore +expose a minimal attack surface.
+Landlock is designed to be usable by unprivileged processes while following the +system security policy enforced by other access control mechanisms (e.g. DAC, +LSM). Indeed, a Landlock rule shall not interfere with other access-controls +enforced on the system, only add more restrictions.
+Any user can enforce Landlock rulesets on their processes. They are merged and +evaluated according to the inherited ones in a way that ensure that only more
ensures
+constraints can be added.
+Guiding principles for safe access controls
+===========================================
+* A Landlock rule shall be focused on access control on kernel objects instead
- of syscall filtering (i.e. syscall arguments), which is the purpose of
- seccomp-bpf.
+* To avoid multiple kind of side-channel attacks (e.g. leak of security
kinds
- policies, CPU-based attacks), Landlock rules shall not be able to
- programmatically communicate with user space.
+* Kernel access check shall not slow down access request from unsandboxed
- processes.
+* Computation related to Landlock operations (e.g. enforce a ruleset) shall
- only impact the processes requesting them.
+Landlock rulesets and domains
+=============================
+A domain is a read-only ruleset tied to a set of subjects (i.e. tasks).  A
+domain can transition to a new one which is the intersection of the constraints
+from the current and a new ruleset.  The definition of a subject is implicit
+for a task sandboxing itself, which makes the reasoning much easier and helps
+avoid pitfalls.

diff --git a/Documentation/security/landlock/user.rst b/Documentation/security/landlock/user.rst
new file mode 100644
index 000000000000..cbd7f61fca8c
--- /dev/null
+++ b/Documentation/security/landlock/user.rst
@@ -0,0 +1,233 @@
+=================================
+Landlock: userspace documentation
+=================================
+Landlock rules
+==============
+A Landlock rule enables to describe an action on an object. An object is +currently a file hierarchy, and the related filesystem actions are defined in +`Access rights`_. A set of rules are aggregated in a ruleset, which can then
is
+restricts the thread enforcing it, and its future children.
restrict
+Defining and enforcing a security policy
+----------------------------------------
+Before defining a security policy, an application should first probe for the +features supported by the running kernel, which is important to be compatible +with older kernels. This can be done thanks to the `landlock` syscall (cf. +:ref:`syscall`).
+.. code-block:: c
- struct landlock_attr_features attr_features;
- if (landlock(LANDLOCK_CMD_GET_FEATURES, LANDLOCK_OPT_GET_FEATURES,
sizeof(attr_features), &attr_features)) {
perror("Failed to probe the Landlock supported features");
return 1;
- }
[snip]
Done. Thanks for this attentive review!
On 2/24/20 8:02 AM, Mickaël Salaün wrote:
## Syscall
Because it is only tested on x86_64, the syscall is only wired up for this architecture. The whole x86 family (and probably all the others) will be supported in the next patch series.
General question for you: what is meant by "the whole x86 family will be supported"? Will 32-bit x86 be supported?
Thanks, Jay
On 25/02/2020 19:49, J Freyensee wrote:
On 2/24/20 8:02 AM, Mickaël Salaün wrote:
## Syscall
Because it is only tested on x86_64, the syscall is only wired up for this architecture. The whole x86 family (and probably all the others) will be supported in the next patch series.
General question for you: what is meant by "the whole x86 family will be supported"? Will 32-bit x86 be supported?
Yes, I was referring to x86_32, x86_64 and x32, but all architectures should be supported.
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
This new version of Landlock is a major revamp of the previous series [1], hence the RFC tag. The three main changes are the replacement of eBPF with a dedicated safe management of access rules, the replacement of the use of seccomp(2) with a dedicated syscall, and the management of filesystem access-control (back from the v10).
As discussed in [2], eBPF may be too powerful and dangerous to be put in the hand of unprivileged and potentially malicious processes, especially because of side-channel attacks against access-controls or other parts of the kernel.
Thanks to this new implementation (1540 SLOC), designed from the ground to be used by unprivileged processes, this series enables a process to sandbox itself without requiring CAP_SYS_ADMIN, but only the no_new_privs constraint (like seccomp). Not relying on eBPF also enables to improve performances, especially for stacked security policies thanks to mergeable rulesets.
The compiled documentation is available here: https://landlock.io/linux-doc/landlock-v14/security/landlock/index.html
This series can be applied on top of v5.6-rc3. This can be tested with CONFIG_SECURITY_LANDLOCK and CONFIG_SAMPLE_LANDLOCK. This patch series can be found in a Git repository here: https://github.com/landlock-lsm/linux/commits/landlock-v14 I would really appreciate constructive comments on the design and the code.
I've looked through the patchset, and I think that it would be possible to simplify it quite a bit. I have tried to do that (and compile-tested it, but not actually tried running it); here's what I came up with:
https://github.com/thejh/linux/commits/landlock-mod
The three modified patches (patches 1, 2 and 5) are marked with "[MODIFIED]" in their title. Please take a look - what do you think? Feel free to integrate my changes into your patches if you think they make sense.
Apart from simplifying the code, I also found the following issues, which I have fixed in the modified patches:
put_hierarchy() has to drop a reference on its parent. (However, this must not recurse, so we have to do it with a loop.)
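A minimal userspace sketch of such a non-recursive release loop (the struct layout and names here are hypothetical stand-ins, not the actual kernel code):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Landlock hierarchy node. */
struct hierarchy {
	struct hierarchy *parent;
	int usage; /* reference count */
};

/*
 * Drop one reference on @hierarchy; whenever a node's refcount hits
 * zero, free it and continue with its parent.  A recursive version
 * could exhaust the (kernel) stack on deep hierarchies, hence a loop.
 */
static void put_hierarchy(struct hierarchy *hierarchy)
{
	while (hierarchy && --hierarchy->usage == 0) {
		struct hierarchy *parent = hierarchy->parent;

		free(hierarchy);
		hierarchy = parent;
	}
}
```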
put_ruleset() is not in an RCU read-side critical section, so as soon as it calls kfree_rcu(), "freeme" might disappear; but "orig" is in "freeme", so when the loop tries to find the next element with rb_next(orig), that can be a UAF. rbtree_postorder_for_each_entry_safe() exists for dealing with such issues.
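The underlying rule can be illustrated with a plain binary tree in userspace (toy code, not the kernel rbtree API): release children before their parent, and never compute the next node from one that was already freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy node standing in for a rule's rb_node. */
struct node {
	struct node *left, *right;
};

static int nb_freed; /* for demonstration only */

/*
 * Postorder destruction: both subtrees are released before the parent,
 * so no freed node is ever dereferenced.  This is the guarantee that
 * rbtree_postorder_for_each_entry_safe() provides for kernel rbtrees,
 * whereas calling rb_next() on an already-freed node is a
 * use-after-free.
 */
static void free_tree_postorder(struct node *n)
{
	if (!n)
		return;
	free_tree_postorder(n->left);
	free_tree_postorder(n->right);
	free(n);
	nb_freed++;
}
```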
AFAIK the calls to rb_erase() in clean_ruleset() are not safe if someone is concurrently accessing the rbtree as an RCU reader, because concurrent rotations can prevent a lookup from succeeding. The simplest fix is probably to just make any rbtree that has been installed on a process immutable, and give up on the cleaning - arguably the memory wastage that can cause is pretty limited. (By the way, as a future optimization, we might want to turn the rbtree into a hashtable when installing it?)
The iput() in landlock_release_inode() looks unsafe - you need to guarantee that even if the deletion of a ruleset races with generic_shutdown_super(), every iput() for that superblock finishes before landlock_release_inodes() returns, even if the iput() is happening in the context of ruleset deletion. This is why fsnotify_unmount_inodes() has that wait_var_event() at the end.
Aside from those things, there is also a major correctness issue that I'm not sure how to solve properly:
Let's say a process installs a filter on itself like this:
  struct landlock_attr_ruleset ruleset = {
      .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
  };
  int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
      LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
  struct landlock_attr_path_beneath path_beneath = {
      .ruleset_fd = ruleset_fd,
      .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
      .parent_fd = open("/tmp/foobar", O_PATH),
  };
  landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
      sizeof(path_beneath), &path_beneath);
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
  landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
      sizeof(attr_enforce), &attr_enforce);
At this point, the process is not supposed to be able to write to anything outside /tmp/foobar, right? But what happens if the process does the following next?
  struct landlock_attr_ruleset ruleset = {
      .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
  };
  int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
      LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
  struct landlock_attr_path_beneath path_beneath = {
      .ruleset_fd = ruleset_fd,
      .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
      .parent_fd = open("/", O_PATH),
  };
  landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
      sizeof(path_beneath), &path_beneath);
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
  landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
      sizeof(attr_enforce), &attr_enforce);
As far as I can tell from looking at the source, after this, you will have write access to the entire filesystem again. I think the idea is that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges, not increase them, right?
I think the easy way to fix this would be to add a bitmask to each rule that says from which ruleset it originally comes, and then let check_access_path() collect these bitmasks from each rule with OR, and check at the end whether the resulting bitmask is full - if not, at least one of the rulesets did not permit the access, and it should be denied.
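A sketch of that check in plain C (hypothetical names; a real implementation would collect the bitmasks while walking up the path in check_access_path()):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Each rule matched while walking up the path carries a bitmask of the
 * rulesets (layers) it originally came from.  The access is granted
 * only if the ORed masks cover every enforced layer: otherwise at
 * least one ruleset had no rule allowing this access.
 */
static int layers_fully_covered(const uint64_t *matched_rule_layers,
				int nb_matched, uint64_t all_layers)
{
	uint64_t seen = 0;
	int i;

	for (i = 0; i < nb_matched; i++)
		seen |= matched_rule_layers[i];
	return (seen & all_layers) == all_layers;
}
```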
But maybe it would make more sense to change how the API works instead, and get rid of the concept of "merging" two rulesets together? Instead, we could make the API work like this:
- LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose ->private_data contains a pointer to the old ruleset of the process, as well as a pointer to a new empty ruleset.
- LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be permitted by the old ruleset, then adds the rule to the new ruleset.
- LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in ->private_data doesn't match the current ruleset of the process, then replaces the old ruleset with the new ruleset.
With this, the new ruleset is guaranteed to be a subset of the old ruleset because each of the new ruleset's rules is permitted by the old ruleset. (Unless the directory hierarchy rotates, but in that case the inaccuracy isn't much worse than what would've been possible through RCU path walk anyway AFAIK.)
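A toy model of that subset property (a single access bitmask stands in for a whole ruleset; all names are hypothetical). The point is only that the ADD_RULE step cannot grant anything the old ruleset denied:

```c
#include <assert.h>
#include <stdint.h>

/* Toy ruleset: one access bitmask stands in for the whole rule tree. */
struct toy_ruleset {
	uint64_t allowed;
};

/* What the ruleset FD's ->private_data would hold. */
struct toy_ruleset_fd {
	const struct toy_ruleset *old; /* process ruleset at creation time */
	struct toy_ruleset new_set;    /* starts empty */
};

/* ADD_RULE: refuse any access the old ruleset would not permit. */
static int toy_add_rule(struct toy_ruleset_fd *fd, uint64_t access)
{
	if ((access & fd->old->allowed) != access)
		return -1; /* would increase privileges */
	fd->new_set.allowed |= access;
	return 0;
}
```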
What do you think?
On 10/03/2020 00:44, Jann Horn wrote:
I've looked through the patchset, and I think that it would be possible to simplify it quite a bit. I have tried to do that (and compiled-tested it, but not actually tried running it); here's what I came up with:
https://github.com/thejh/linux/commits/landlock-mod
The three modified patches (patches 1, 2 and 5) are marked with "[MODIFIED]" in their title. Please take a look - what do you think? Feel free to integrate my changes into your patches if you think they make sense.
Regarding landlock_release_inodes(), the final wait_var_event() is indeed needed (as fsnotify does), but why do you use a READ_ONCE() for landlock_initialized?
I was reluctant to use function pointers but landlock_object_operations makes a cleaner and more generic interface to manage objects.
Your get_inode_object() is much simpler and easier to understand than the get_object() and get_cleaner(). The other main change is about the object cross-reference: you entirely removed it, which means that an object will only be freed when there are no rules using it. An object is then no longer freed when its underlying object is being terminated. We now only have to worry about the termination of the parent of an underlying object (e.g. the super-block of an inode).
However, I think you forgot to increment object->usage in create_ruleset_elem(). There is also an unused checked_mask variable in merge_ruleset().
All this removes optimizations that made the code more difficult to understand. The performance difference is negligible, and I think that the memory footprint is fine. These optimizations (and others) could be discussed later. I'm integrating most of your changes in the next patch series.
Thank you very much for this review and the code.
Apart from simplifying the code, I also found the following issues, which I have fixed in the modified patches:
put_hierarchy() has to drop a reference on its parent. (However, this must not recurse, so we have to do it with a loop.)
Right, fixed.
put_ruleset() is not in an RCU read-side critical section, so as soon as it calls kfree_rcu(), "freeme" might disappear; but "orig" is in "freeme", so when the loop tries to find the next element with rb_next(orig), that can be a UAF. rbtree_postorder_for_each_entry_safe() exists for dealing with such issues.
Good catch.
AFAIK the calls to rb_erase() in clean_ruleset() is not safe if someone is concurrently accessing the rbtree as an RCU reader, because concurrent rotations can prevent a lookup from succeeding. The simplest fix is probably to just make any rbtree that has been installed on a process immutable, and give up on the cleaning - arguably the memory wastage that can cause is pretty limited.
Yes, let's go for immutable domains.
(By the way, as a future optimization, we might want to turn the rbtree into a hashtable when installing it?)
Definitely. This was a previous (private) implementation I did for domains, but to simplify the code I reused the same type as a ruleset. A future evolution of Landlock could add back this optimization.
The iput() in landlock_release_inode() looks unsafe - you need to guarantee that even if the deletion of a ruleset races with generic_shutdown_super(), every iput() for that superblock finishes before landlock_release_inodes() returns, even if the iput() is happening in the context of ruleset deletion. This is why fsnotify_unmount_inodes() has that wait_var_event() at the end.
Right, much better with that.
Aside from those things, there is also a major correctness issue where I'm not sure how to solve it properly:
Let's say a process installs a filter on itself like this:
  struct landlock_attr_ruleset ruleset = {
      .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
  };
  int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
      LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
  struct landlock_attr_path_beneath path_beneath = {
      .ruleset_fd = ruleset_fd,
      .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
      .parent_fd = open("/tmp/foobar", O_PATH),
  };
  landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
      sizeof(path_beneath), &path_beneath);
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
  landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
      sizeof(attr_enforce), &attr_enforce);
At this point, the process is not supposed to be able to write to anything outside /tmp/foobar, right? But what happens if the process does the following next?
  struct landlock_attr_ruleset ruleset = {
      .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
  };
  int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
      LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset);
  struct landlock_attr_path_beneath path_beneath = {
      .ruleset_fd = ruleset_fd,
      .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
      .parent_fd = open("/", O_PATH),
  };
  landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
      sizeof(path_beneath), &path_beneath);
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
  landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
      sizeof(attr_enforce), &attr_enforce);
As far as I can tell from looking at the source, after this, you will have write access to the entire filesystem again. I think the idea is that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges, not increase them, right?
There is an additional check in syscall.c:get_path_from_fd(): it is forbidden to add a rule with a path which is not accessible (according to LANDLOCK_ACCESS_FS_OPEN), thanks to a call to security_file_open(), but this is definitely not perfect.
I think the easy way to fix this would be to add a bitmask to each rule that says from which ruleset it originally comes, and then let check_access_path() collect these bitmasks from each rule with OR, and check at the end whether the resulting bitmask is full - if not, at least one of the rulesets did not permit the access, and it should be denied.
But maybe it would make more sense to change how the API works instead, and get rid of the concept of "merging" two rulesets together? Instead, we could make the API work like this:
- LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose ->private_data contains a pointer to the old ruleset of the process, as well as a pointer to a new empty ruleset.
- LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be permitted by the old ruleset, then adds the rule to the new ruleset.
- LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in ->private_data doesn't match the current ruleset of the process, then replaces the old ruleset with the new ruleset.
With this, the new ruleset is guaranteed to be a subset of the old ruleset because each of the new ruleset's rules is permitted by the old ruleset. (Unless the directory hierarchy rotates, but in that case the inaccuracy isn't much worse than what would've been possible through RCU path walk anyway AFAIK.)
What do you think?
I would prefer to add the same checks you described at first (with check_access_path), but only when creating a new ruleset with merge_ruleset() (which should probably be renamed). This avoids relying on a parent ruleset/domain until the enforcement, which is the case anyway. Unfortunately, this doesn't work for some cases with bind mounts: because check_access_path() goes through one path, another (bind mounted) path could be illegitimately allowed. That makes the problem a bit more complicated. A solution may be to keep track of the hierarchy of each rule (e.g. with a layer/depth number), and only allow an access request if at least one rule of each layer allows this access. In this case we also need to correctly handle the case when rules from different layers are tied to the same object.
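For the tied-object case, a sketch of what this could look like (toy types and hypothetical names): rules from different layers on the same object keep a union of their layer bits, and an access is granted only if the walked rules cover every enforced layer.

```c
#include <assert.h>
#include <stdint.h>

/* Toy rule: the object pointer stands in for the tagged inode. */
struct toy_rule {
	const void *object;
	uint64_t layers; /* one bit per enforcement layer */
};

/* Two rules tied to the same object collapse into one, keeping both
 * sets of layer bits, so a later walk still sees each layer's approval. */
static void toy_merge_tied_rules(struct toy_rule *dst,
				 const struct toy_rule *src)
{
	assert(dst->object == src->object);
	dst->layers |= src->layers;
}

/* Grant the access only if every enforced layer allowed it. */
static int toy_access_allowed(const struct toy_rule *walked, int nb,
			      uint64_t all_layers)
{
	uint64_t seen = 0;
	int i;

	for (i = 0; i < nb; i++)
		seen |= walked[i].layers;
	return (seen & all_layers) == all_layers;
}
```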
I would like Landlock to have "pure" syscalls, in the sense that a process A (e.g. a daemon) could prepare a ruleset and send its FD to a process B, which would then be able to use it to sandbox itself. I think it makes the reasoning clearer not to have a given ruleset (FD) tied to a domain (i.e. parent ruleset) at first. Landlock should (as much as possible) return an error if a syscall argument is invalid, not according to the current access control (which is not the case currently because of the security_file_open() check). This means that these additional merge_ruleset() checks should only affect the new domain/ruleset, but should not be visible to userspace.
In a future evolution, it may be useful to add a lock/seal command to deny any additional rule enforcement. However, that may be counter-productive because it enables application developers (e.g. for a shell) to deny the use of Landlock features to their child processes. But it would be possible anyway with seccomp-bpf…
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
Regarding landlock_release_inodes(): the final wait_var_event() is indeed needed (as fsnotify does it too), but why do you use a READ_ONCE() for landlock_initialized?
Ah, good point - that READ_ONCE() should be unnecessary.
The other main change is about the object cross-reference: you entirely removed it, which means that an object will only be freed when there are no more rules using it. As a result, an object is no longer freed when its underlying object is terminated. We now only have to worry about the termination of the parent of an underlying object (e.g. the super-block of an inode).
However, I think you forgot to increment object->usage in create_ruleset_elem().
Whoops, you're right.
There is also an unused checked_mask variable in merge_ruleset().
Oh, yeah, oops.
All this removes optimizations that made the code more difficult to understand. The performance difference is negligible, and I think that the memory footprint is fine. These optimizations (and others) could be discussed later. I'm integrating most of your changes in the next patch series.
:)
Aside from those things, there is also a major correctness issue that I'm not sure how to solve properly:
Let's say a process installs a filter on itself like this:
struct landlock_attr_ruleset ruleset = {
        .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
};
int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
                          LANDLOCK_OPT_CREATE_RULESET,
                          sizeof(ruleset), &ruleset);
struct landlock_attr_path_beneath path_beneath = {
        .ruleset_fd = ruleset_fd,
        .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
        .parent_fd = open("/tmp/foobar", O_PATH),
};
landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
         sizeof(path_beneath), &path_beneath);
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct landlock_attr_enforce attr_enforce = {
        .ruleset_fd = ruleset_fd,
};
landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
         sizeof(attr_enforce), &attr_enforce);
At this point, the process is not supposed to be able to write to anything outside /tmp/foobar, right? But what happens if the process does the following next?
struct landlock_attr_ruleset ruleset = {
        .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
};
int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
                          LANDLOCK_OPT_CREATE_RULESET,
                          sizeof(ruleset), &ruleset);
struct landlock_attr_path_beneath path_beneath = {
        .ruleset_fd = ruleset_fd,
        .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
        .parent_fd = open("/", O_PATH),
};
landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
         sizeof(path_beneath), &path_beneath);
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
struct landlock_attr_enforce attr_enforce = {
        .ruleset_fd = ruleset_fd,
};
landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
         sizeof(attr_enforce), &attr_enforce);
As far as I can tell from looking at the source, after this, you will have write access to the entire filesystem again. I think the idea is that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges, not increase them, right?
There is an additional check in syscall.c:get_path_from_fd(): it is forbidden to add a rule with a path which is not accessible (according to LANDLOCK_ACCESS_FS_OPEN), thanks to a call to security_file_open(), but this is definitely not perfect.
Ah, I missed that.
I think the easy way to fix this would be to add a bitmask to each rule that says from which ruleset it originally comes, and then let check_access_path() collect these bitmasks from each rule with OR, and check at the end whether the resulting bitmask is full - if not, at least one of the rulesets did not permit the access, and it should be denied.
But maybe it would make more sense to change how the API works instead, and get rid of the concept of "merging" two rulesets together? Instead, we could make the API work like this:
- LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose ->private_data contains a pointer to the old ruleset of the process, as well as a pointer to a new empty ruleset.
- LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be permitted by the old ruleset, then adds the rule to the new ruleset.
- LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in ->private_data doesn't match the current ruleset of the process, then replaces the old ruleset with the new ruleset.
With this, the new ruleset is guaranteed to be a subset of the old ruleset because each of the new ruleset's rules is permitted by the old ruleset. (Unless the directory hierarchy rotates, but in that case the inaccuracy isn't much worse than what would've been possible through RCU path walk anyway AFAIK.)
What do you think?
I would prefer to add the same checks you described at first (with check_access_path), but only when creating a new ruleset with merge_ruleset() (which should probably be renamed). This avoids relying on a parent ruleset/domain before the enforcement, which is the case anyway. Unfortunately, this doesn't work in some cases with bind mounts: because check_access_path() goes through one path, another (bind-mounted) path could be illegitimately allowed.
Hmm... I'm not sure what you mean. At the moment, landlock doesn't allow any sandboxed process to change the mount hierarchy, right? Can you give an example where this would go wrong?
That makes the problem a bit more complicated. A solution may be to keep track of the hierarchy of each rule (e.g. with a layer/depth number), and only allow an access request if at least one rule of each layer allows this access. In this case, we also need to correctly handle the case when rules from different layers are tied to the same object.
On 17/03/2020 17:19, Jann Horn wrote:
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
On 10/03/2020 00:44, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
[...]
Hmm... I'm not sure what you mean. At the moment, landlock doesn't allow any sandboxed process to change the mount hierarchy, right? Can you give an example where this would go wrong?
Indeed, a Landlocked process must not be able to change its mount namespace layout. However, bind mounts may already exist. Let's say a process sandboxes itself to only access /a in a read-write way. Then, this process (or one of its children) adds a new restriction on /a/b to only be able to read this hierarchy. The check at insertion time would allow this because this access right is a subset of the access rights allowed on the parent directory. However, if /a/b is bind mounted somewhere else, say at /private/b, then the second enforcement just gave new access rights to that hierarchy too. This is why it seems risky to rely on a check of the legitimacy of a new access right when adding it to a ruleset or when enforcing it.
On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 17:19, Jann Horn wrote:
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
On 10/03/2020 00:44, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
[...]
Indeed, a Landlocked process must not be able to change its mount namespace layout. However, bind mounts may already exist. Let's say a process sandboxes itself to only access /a in a read-write way.
So, first policy:
/a RW
Then, this process (or one of its children) adds a new restriction on /a/b to only be able to read this hierarchy.
You mean with the second policy looking like this?
/a    RW
/a/b  R
Then the resulting policy would be:
/a    RW  policy_bitmask=0x00000003 (bits 0 and 1 set)
/a/b  R   policy_bitmask=0x00000002 (bit 1 set)

required_bits=0x00000003 (bits 0 and 1 set)
The check at insertion time would allow this because this access right is a subset of the access rights allowed on the parent directory. However, if /a/b is bind mounted somewhere else, say at /private/b, then the second enforcement just gave new access rights to that hierarchy too.
But with the solution I proposed, landlock's path walk would see something like this when accessing a file at /private/b/foo:

/private/b/foo  <no rules>                              policies seen until now: 0x00000000
/private/b      <access: R, policy_bitmask=0x00000002>  policies seen until now: 0x00000002
/private        <no rules>                              policies seen until now: 0x00000002
/               <no rules>                              policies seen until now: 0x00000002
It wouldn't encounter any rule from the first policy, so the OR of the seen policy bitmasks would be 0x00000002, which is not the required value 0x00000003, and so the access would be denied.
On 17/03/2020 20:45, Jann Horn wrote:
On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 17:19, Jann Horn wrote:
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
On 10/03/2020 00:44, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
[...]
As I understand your proposition, we need to build the required_bits when adding a rule or enforcing/merging a ruleset with a domain. The issue is that a rule only refers to a struct inode, not a struct path. For your proposition to work, we would need to walk through the file path when adding a rule to a ruleset, which means that we would depend on the current view of the process (i.e. its mount namespace) and on its Landlock domain. If the required_bits field is set when the ruleset is merged with the domain, it is no longer possible to walk through the corresponding initial file path, which makes the enforcement step too late to check for such consistency. The important point is that a ruleset/domain doesn't have a notion of file hierarchy; a ruleset is only a set of tagged inodes.
I'm not sure I got your proposition right, though. When and how would you generate the required_bits?
Here is my updated proposition: add a layer level and a depth to each rule (once enforced/merged with a domain), and a top layer level to each domain. When enforcing a ruleset (i.e. merging a ruleset into the current domain), the layer level of a new rule is the incremented top layer level. If no rule (from this domain) is tied to the same inode, then the depth of the new rule is 1. However, if a rule is already tied to the same inode and that rule's layer level is the previous top layer level, then its depth and layer level are both incremented and the rule is updated with the new access rights (boolean AND).
The policy looks like this:

domain top_layer=2
/a    RW  policy_bitmask=0x00000003  layer=1 depth=1
/a/b  R   policy_bitmask=0x00000002  layer=2 depth=1
The path walk access check walks through all inodes and starts with a layer counter equal to the top layer of the current domain. For each encountered inode tied to a rule, the access rights are checked, and an additional check ensures that the layer of the matching rule is either the same as the counter (this may be a merged ruleset containing rules pertaining to the same hierarchy, which is fine) or equal to the decremented counter (i.e. the path walk just reached the underlying layer). If the path walk encounters a rule with a layer strictly less than the counter minus one, there is a hole in the layers, which means that the ruleset hierarchy/subset does not match, and the access must be denied.
When accessing a file at /private/b/foo for a read access:

/private/b/foo  <no rules>                                                allowed_access=unknown  layer_counter=2
/private/b      <access: R, policy_bitmask=0x00000002, layer=2, depth=1>  allowed_access=allowed  layer_counter=2
/private        <no rules>                                                allowed_access=allowed  layer_counter=2
/               <no rules>                                                allowed_access=allowed  layer_counter=2
Because the layer_counter didn't reach 1, the access request is then denied.
With this proposition, there is no need to rely on a parent ruleset at first, only when enforcing/merging a ruleset with a domain. It also solves the issue of multiple inherited/nested rules on the same inode (in which case the depth just grows). Moreover, it makes it possible to safely stop the path walk as soon as we reach layer 1.
Here is a more complex example. A process sandboxes itself with a first rule:

domain top_layer=1
/a  RW  policy_bitmask=0x00000003  layer=1 depth=1
Then the sandboxed process enforces on itself this second (useless) ruleset:

/a/b  RW  policy_bitmask=0x00000003

The resulting domain is then:

domain top_layer=2
/a    RW  policy_bitmask=0x00000003  layer=1 depth=1
/a/b  RW  policy_bitmask=0x00000003  layer=2 depth=1
Then the sandboxed process enforces on itself this third ruleset (which effectively reduces its access):

/a/b  R  policy_bitmask=0x00000002

The resulting domain is then:

domain top_layer=3
/a    RW  policy_bitmask=0x00000003  layer=1 depth=1
/a/b  R   policy_bitmask=0x00000002  layer=3 depth=2
At this time, only /a/b is accessible, and only for reading. The access rights on /a are ignored (but still inherited).
Then the sandboxed process enforces on itself this fourth ruleset:

/c  R  policy_bitmask=0x00000002

The resulting domain is then:

domain top_layer=4
/a    RW  policy_bitmask=0x00000003  layer=1 depth=1
/a/b  R   policy_bitmask=0x00000002  layer=3 depth=2
/c    R   policy_bitmask=0x00000002  layer=4 depth=1
Now, every read or write access request will be denied.
Then the sandboxed process enforces on itself this fifth ruleset:

/a  R  policy_bitmask=0x00000002
Because /a is not in a contiguous underlying layer, the resulting domain is unchanged (except top_layer, which may be incremented anyway). Of course, we must check that top_layer does not overflow, in which case an error must be returned to inform user space that the ruleset can't be enforced.
On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 20:45, Jann Horn wrote:
On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 17:19, Jann Horn wrote:
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
On 10/03/2020 00:44, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
[...]
Aside from those things, there is also a major correctness issue where I'm not sure how to solve it properly:
Let's say a process installs a filter on itself like this:
struct landlock_attr_ruleset ruleset = { .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE}; int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset); struct landlock_attr_path_beneath path_beneath = { .ruleset_fd = ruleset_fd, .allowed_access = ACCESS_FS_ROUGHLY_WRITE, .parent_fd = open("/tmp/foobar", O_PATH), }; landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, sizeof(path_beneath), &path_beneath); prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd }; landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce), &attr_enforce);
At this point, the process is not supposed to be able to write to anything outside /tmp/foobar, right? But what happens if the process does the following next?
struct landlock_attr_ruleset ruleset = { .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE}; int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET, LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset), &ruleset); struct landlock_attr_path_beneath path_beneath = { .ruleset_fd = ruleset_fd, .allowed_access = ACCESS_FS_ROUGHLY_WRITE, .parent_fd = open("/", O_PATH), }; landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH, sizeof(path_beneath), &path_beneath); prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd }; landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET, sizeof(attr_enforce), &attr_enforce);
As far as I can tell from looking at the source, after this, you will have write access to the entire filesystem again. I think the idea is that LANDLOCK_CMD_ENFORCE_RULESET should only let you drop privileges, not increase them, right?
There is an additionnal check in syscall.c:get_path_from_fd(): it is forbidden to add a rule with a path which is not accessible (according to LANDLOCK_ACCESS_FS_OPEN) thanks to a call to security_file_open(), but this is definitely not perfect.
Ah, I missed that.
I think the easy way to fix this would be to add a bitmask to each rule that says from which ruleset it originally comes, and then let check_access_path() collect these bitmasks from each rule with OR, and check at the end whether the resulting bitmask is full - if not, at least one of the rulesets did not permit the access, and it should be denied.
But maybe it would make more sense to change how the API works instead, and get rid of the concept of "merging" two rulesets together? Instead, we could make the API work like this:
- LANDLOCK_CMD_CREATE_RULESET gives you a file descriptor whose
->private_data contains a pointer to the old ruleset of the process, as well as a pointer to a new empty ruleset.
- LANDLOCK_CMD_ADD_RULE fails if the specified rule would not be
permitted by the old ruleset, then adds the rule to the new ruleset
- LANDLOCK_CMD_ENFORCE_RULESET fails if the old ruleset pointer in
->private_data doesn't match the current ruleset of the process, then replaces the old ruleset with the new ruleset.
With this, the new ruleset is guaranteed to be a subset of the old ruleset because each of the new ruleset's rules is permitted by the old ruleset. (Unless the directory hierarchy rotates, but in that case the inaccuracy isn't much worse than what would've been possible through RCU path walk anyway AFAIK.)
What do you think?
I would prefer to add the same checks you described at first (with check_access_path), but only when creating a new ruleset with merge_ruleset() (which should probably be renamed). This enables not to rely on a parent ruleset/domain until the enforcement, which is the case anyway. Unfortunately this doesn't work for some cases with bind mounts. Because check_access_path() goes through one path, another (bind mounted) path could be illegitimately allowed.
Hmm... I'm not sure what you mean. At the moment, landlock doesn't allow any sandboxed process to change the mount hierarchy, right? Can you give an example where this would go wrong?
Indeed, a Landlocked process must not be able to change its mount namespace layout. However, bind mounts may already exist. Let's say a process sandboxes itself to only access /a in a read-write way.
So, first policy:
/a RW
Then, this process (or one of its children) adds a new restriction on /a/b so that this hierarchy can only be read.
You mean with the second policy looking like this?
Right.
/a   RW
/a/b R
Then the resulting policy would be:
/a   RW policy_bitmask=0x00000003 (bits 0 and 1 set)
/a/b R  policy_bitmask=0x00000002 (bit 1 set)
required_bits=0x00000003 (bits 0 and 1 set)
The check at insertion time would allow this because this access right is a subset of the access right allowed with the parent directory. However, if /a/b is bind mounted somewhere else, let's say in /private/b, then the second enforcement just gave new access rights to this hierarchy too.
But with the solution I proposed, landlock's path walk would see something like this when accessing a file at /private/b/foo:
/private/b/foo <no rules>                              policies seen until now: 0x00000000
/private/b     <access: R, policy_bitmask=0x00000002>  policies seen until now: 0x00000002
/private       <no rules>                              policies seen until now: 0x00000002
/              <no rules>                              policies seen until now: 0x00000002
It wouldn't encounter any rule from the first policy, so the OR of the seen policy bitmasks would be 0x00000002, which is not the required value 0x00000003, and so the access would be denied.
As I understand your proposition, we need to build the required_bits when adding a rule or enforcing/merging a ruleset with a domain. The issue is that a rule only refers to a struct inode, not a struct path. For your proposition to work, we would need to walk through the file path when adding a rule to a ruleset, which means that we would need to depend on the current view of the process (i.e. its mount namespace) and its Landlock domain.
I don't see why that is necessary. Why would we have to walk the file path when adding a rule?
If the required_bits field is set when the ruleset is merged with the domain, it is not possible anymore to walk through the corresponding initial file path, which makes the enforcement step too late to check for such consistency. The important point is that a ruleset/domain doesn't have a notion of file hierarchy, a ruleset is only a set of tagged inodes.
I'm not sure I got your proposition right, though. When and how would you generate the required_bits?
Using your terminology: A domain is a collection of N layers, which are assigned indices 0..N-1. For each possible access type, a domain has a bitmask containing N bits that stores which layers control that access type. (Basically a per-layer version of fs_access_mask.) To validate an access, you start by ORing together the bitmasks for the requested access types; that gives you the required_bits mask, which lists all layers that want to control the access. Then you set seen_policy_bits=0, then do the check_access_path_continue() loop while keeping track of which layers you've seen with "seen_policy_bits |= access->contributing_policies", or something like that. And in the end, you check that seen_policy_bits is a superset of required_bits - something like `(~seen_policy_bits) & required_bits == 0`.
AFAICS to create a new domain from a bunch of layers, you wouldn't have to do any path walking.
Here is my updated proposition: add a layer level and a depth to each rule (once enforced/merged with a domain), and a top layer level for a domain. When enforcing a ruleset (i.e. merging a ruleset into the current domain), the layer level of a new rule would be the incremented top layer level. If there is no rule (from this domain) tied to the same inode, then the depth of the new rule is 1. However, if there is already a rule tied to the same inode and if this rule's layer level is the previous top layer level, then the depth and the layer level are both incremented and the rule is updated with the new access rights (boolean AND).
The policy looks like this:
domain top_layer=2
/a   RW policy_bitmask=0x00000003 layer=1 depth=1
/a/b R  policy_bitmask=0x00000002 layer=2 depth=1
The path walk access check walks through all inodes and starts with a layer counter equal to the top layer of the current domain. For each encountered inode tied to a rule, the access rights are checked, and a new check ensures that the layer of the matching rule is either the same as the counter (this may be a merged ruleset containing rules pertaining to the same hierarchy, which is fine) or equal to the decremented counter (i.e. the path walk just reached the underlying layer). If the path walk encounters a rule with a layer strictly less than the counter minus one, there is a hole in the layers, which means that the ruleset hierarchy/subset does not match, and the access must be denied.
When accessing a file at /private/b/foo for a read access:
/private/b/foo <no rules>                                                 allowed_access=unknown layer_counter=2
/private/b     <access: R, policy_bitmask=0x00000002, layer=2, depth=1>   allowed_access=allowed layer_counter=2
/private       <no rules>                                                 allowed_access=allowed layer_counter=2
/              <no rules>                                                 allowed_access=allowed layer_counter=2
Because the layer_counter didn't reach 1, the access request is then denied.
With this proposition, there is no need to rely on a parent ruleset at first, only when enforcing/merging a ruleset with a domain. This also solves the issue of multiple inherited/nested rules on the same inode (in which case the depth just grows). Moreover, this makes it possible to safely stop the path walk as soon as we reach layer 1.
(FWIW, you could do the same optimization with the seen_policy_bits approach.)
I guess the difference between your proposal and mine is that in my proposal, the following would work, in effect permitting W access to /foo/bar/baz (and nothing else)?
first ruleset:  /foo W
second ruleset: /foo/bar/baz W
third ruleset:  /foo/bar W
whereas in your proposal, IIUC it wouldn't be valid for a new ruleset to whitelist a superset of what was whitelisted in a previous ruleset?
On 19/03/2020 00:33, Jann Horn wrote:
On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 20:45, Jann Horn wrote:
On Tue, Mar 17, 2020 at 6:50 PM Mickaël Salaün mic@digikod.net wrote:
On 17/03/2020 17:19, Jann Horn wrote:
On Thu, Mar 12, 2020 at 12:38 AM Mickaël Salaün mic@digikod.net wrote:
On 10/03/2020 00:44, Jann Horn wrote:
On Mon, Feb 24, 2020 at 5:03 PM Mickaël Salaün mic@digikod.net wrote:
[...]
> Aside from those things, there is also a major correctness issue where
> I'm not sure how to solve it properly:
>
> Let's say a process installs a filter on itself like this:
>
> struct landlock_attr_ruleset ruleset = {
>     .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
> };
> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>                           LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset),
>                           &ruleset);
> struct landlock_attr_path_beneath path_beneath = {
>     .ruleset_fd = ruleset_fd,
>     .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>     .parent_fd = open("/tmp/foobar", O_PATH),
> };
> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>          sizeof(path_beneath), &path_beneath);
> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>          sizeof(attr_enforce), &attr_enforce);
>
> At this point, the process is not supposed to be able to write to
> anything outside /tmp/foobar, right? But what happens if the process
> does the following next?
>
> struct landlock_attr_ruleset ruleset = {
>     .handled_access_fs = ACCESS_FS_ROUGHLY_WRITE,
> };
> int ruleset_fd = landlock(LANDLOCK_CMD_CREATE_RULESET,
>                           LANDLOCK_OPT_CREATE_RULESET, sizeof(ruleset),
>                           &ruleset);
> struct landlock_attr_path_beneath path_beneath = {
>     .ruleset_fd = ruleset_fd,
>     .allowed_access = ACCESS_FS_ROUGHLY_WRITE,
>     .parent_fd = open("/", O_PATH),
> };
> landlock(LANDLOCK_CMD_ADD_RULE, LANDLOCK_OPT_ADD_RULE_PATH_BENEATH,
>          sizeof(path_beneath), &path_beneath);
> prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> struct landlock_attr_enforce attr_enforce = { .ruleset_fd = ruleset_fd };
> landlock(LANDLOCK_CMD_ENFORCE_RULESET, LANDLOCK_OPT_ENFORCE_RULESET,
>          sizeof(attr_enforce), &attr_enforce);
>
> As far as I can tell from looking at the source, after this, you will
> have write access to the entire filesystem again.
> I think the idea is that LANDLOCK_CMD_ENFORCE_RULESET should only let
> you drop privileges, not increase them, right?
[...]
Using your terminology: A domain is a collection of N layers, which are assigned indices 0..N-1. For each possible access type, a domain has a bitmask containing N bits that stores which layers control that access type. (Basically a per-layer version of fs_access_mask.)
OK, so there is a bit for each domain, which means that you get a limit of, let's say 64 layers? Knowing that each layer can be created by a standalone application, potentially nested in a bunch of layers, this seems artificially limiting.
To validate an access, you start by ORing together the bitmasks for the requested access types; that gives you the required_bits mask, which lists all layers that want to control the access. Then you set seen_policy_bits=0, then do the check_access_path_continue() loop while keeping track of which layers you've seen with "seen_policy_bits |= access->contributing_policies", or something like that. And in the end, you check that seen_policy_bits is a superset of required_bits - something like `(~seen_policy_bits) & required_bits == 0`.
AFAICS to create a new domain from a bunch of layers, you wouldn't have to do any path walking.
Right, I misunderstood your previous email.
[...]
I guess the difference between your proposal and mine is that in my proposal, the following would work, in effect permitting W access to /foo/bar/baz (and nothing else)?
first ruleset:  /foo W
second ruleset: /foo/bar/baz W
third ruleset:  /foo/bar W
whereas in your proposal, IIUC it wouldn't be valid for a new ruleset to whitelist a superset of what was whitelisted in a previous ruleset?
This behavior seems dangerous, because a process which sandboxes itself to only access /foo/bar W can bypass the restrictions from one of its parent domains (i.e. only access /foo/bar/baz W). Indeed, each layer is (most of the time) a different and standalone security policy.
To sum up, the bitmask approach doesn't have the notion of layer ordering. It is then not possible to check that a rule comes from a domain which is the direct ancestor of a child's domain. I want each policy/layer to be really nested, in the sense that a process sandboxing itself can only add more restrictions to itself with regard to its parent domain (and the whole hierarchy). This is a similar approach to seccomp-bpf (with chained filters), except there is almost no overhead to nest several policies/layers together because they are flattened. Using the layer level and depth approach enables implementing this.
On Thu, Mar 19, 2020 at 5:58 PM Mickaël Salaün mic@digikod.net wrote:
On 19/03/2020 00:33, Jann Horn wrote:
On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün mic@digikod.net wrote:
[...]
[...]
OK, so there is a bit for each domain, which means that you get a limit of, let's say 64 layers? Knowing that each layer can be created by a standalone application, potentially nested in a bunch of layers, this seems artificially limiting.
Yes, that is a downside of my approach.
[...]
This behavior seems dangerous, because a process which sandboxes itself to only access /foo/bar W can bypass the restrictions from one of its parent domains (i.e. only access /foo/bar/baz W). Indeed, each layer is (most of the time) a different and standalone security policy.
It isn't actually bypassing the restriction: You still can't actually access files like /foo/bar/blah, because a path walk from there doesn't encounter any rules from the second ruleset.
On 19/03/2020 22:17, Jann Horn wrote:
On Thu, Mar 19, 2020 at 5:58 PM Mickaël Salaün mic@digikod.net wrote:
On 19/03/2020 00:33, Jann Horn wrote:
On Wed, Mar 18, 2020 at 1:06 PM Mickaël Salaün mic@digikod.net wrote:
[...]
[...]
It isn't actually bypassing the restriction: You still can't actually access files like /foo/bar/blah, because a path walk from there doesn't encounter any rules from the second ruleset.
Right, this use case is legitimate, e.g. first giving access to ~/Downloads and then another layer giving access to ~/ (because it doesn't know about the current restriction).
I think that neither my initial approach nor yours fits well, but I found a new one inspired by both. The first solution I gave, since implemented in the v15 [1], can manage 2^31-1 layers, but it only works when refining a security policy *knowing the parent one* (i.e. refining an access tied to an inode, not a full file hierarchy). Instead of having a layer level and a layer depth, my new implementation (for v16) uses a layer bitfield for each rule (and ruleset). We still AND access rights when merging rulesets, but instead of storing the last layer level and depth, we set the corresponding bit in the layer bitfield of the rule. This way we don't consume more memory than the v15 implementation (for at most 64 layers, which would be 64 bits * number of access types with your approach, i.e. between 1KB and 2KB), and Landlock can properly manage supersets of access rights in nested hierarchies, whatever their stacking order. However, I don't see another solution to handle more than 64 layers better than a VLA, but that could come later.
[1] https://lore.kernel.org/lkml/20200326202731.693608-6-mic@digikod.net/