While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links: 1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/ 2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/ 3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v2: - Rename create_user_ns hook to userns_create - Use user_namespace as an object opposed to a generic namespace object - s/domB_t/domA_t in commit message Changes since v1: - Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch - Add selinux: Implement create_user_ns hook patch - Change function signature of security_create_user_ns() to only take struct cred - Move security_create_user_ns() call after id mapping check in create_user_ns() - Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 ++ security/security.c | 5 ++ security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 10 files changed, 160 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
-- 2.30.2
Preventing user namespace (privileged or otherwise) creation comes in a few of forms in order of granularity:
1. /proc/sys/user/max_user_namespaces sysctl 2. OS specific patch(es) 3. CONFIG_USER_NS
To block a task based on its attributes, the LSM hook cred_prepare is a good candidate for use because it provides more granular control, and it is called before create_user_ns():
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook.
This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned.
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - Changed commit wording - Moved execution to be after id mapping check - Changed signature to only accept a const struct cred * --- include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 ++++ include/linux/security.h | 6 ++++++ kernel/user_namespace.c | 5 +++++ security/security.c | 5 +++++ 5 files changed, 21 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index eafa1d2489fd..7ff93cb8ca8d 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -223,6 +223,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) +LSM_HOOK(int, 0, userns_create, const struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 91c8146649f5..54fe534d0e01 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -799,6 +799,10 @@ * security attributes, e.g. for /proc/pid inodes. * @p contains the task_struct for the task. * @inode contains the inode structure for the inode. + * @userns_create: + * Check permission prior to creating a new user namespace. + * @cred points to prepared creds. + * Return 0 if successful, otherwise < 0 error code. * * Security hooks for Netlink messaging. * diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..a195bf33246a 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -435,6 +435,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); +int security_create_user_ns(const struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1185,6 +1186,11 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { }
+static inline int security_create_user_ns(const struct cred *cred) +{ + return 0; +} + static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 5481ba44a8d6..3f464bbda0e9 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -9,6 +9,7 @@ #include <linux/highuid.h> #include <linux/cred.h> #include <linux/securebits.h> +#include <linux/security.h> #include <linux/keyctl.h> #include <linux/key-type.h> #include <keys/user-type.h> @@ -113,6 +114,10 @@ int create_user_ns(struct cred *new) !kgid_has_mapping(parent_ns, group)) goto fail_dec;
+ ret = security_create_user_ns(new); + if (ret < 0) + goto fail_dec; + ret = -ENOMEM; ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL); if (!ns) diff --git a/security/security.c b/security/security.c index 188b8f782220..ec9b4696e86c 100644 --- a/security/security.c +++ b/security/security.c @@ -1903,6 +1903,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode) call_void_hook(task_to_inode, p, inode); }
+int security_create_user_ns(const struct cred *cred) +{ + return call_int_hook(userns_create, 0, cred); +} + int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { return call_int_hook(ipc_permission, 0, ipcp, flag);
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - None --- kernel/bpf/bpf_lsm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c index c1351df9f7ee..4593437809cc 100644 --- a/kernel/bpf/bpf_lsm.c +++ b/kernel/bpf/bpf_lsm.c @@ -250,6 +250,7 @@ BTF_ID(func, bpf_lsm_task_getsecid_obj) BTF_ID(func, bpf_lsm_task_prctl) BTF_ID(func, bpf_lsm_task_setscheduler) BTF_ID(func, bpf_lsm_task_to_inode) +BTF_ID(func, bpf_lsm_userns_create) BTF_SET_END(sleepable_lsm_hooks)
bool bpf_lsm_is_sleepable_hook(u32 btf_id)
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 4 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN. 4. The sleepable implementation loads
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- The generic deny_namespace file name is used for future namespace expansion. I didn't want to limit these files to just the create_user_ns hook. Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - Introduce this patch --- .../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 2 files changed, 127 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c new file mode 100644 index 000000000000..9e4714295008 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -0,0 +1,88 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <test_progs.h> +#include "test_deny_namespace.skel.h" +#include <sched.h> +#include "cap_helpers.h" + +#define STACK_SIZE (1024 * 1024) +static char child_stack[STACK_SIZE]; + +int clone_callback(void *arg) +{ + return 0; +} + +static int create_new_user_ns(void) +{ + int status; + pid_t cpid; + + cpid = clone(clone_callback, child_stack + STACK_SIZE, + CLONE_NEWUSER | SIGCHLD, NULL); + + if (cpid == -1) + return errno; + + if (cpid == 0) + return 0; + + waitpid(cpid, &status, 0); + if (WIFEXITED(status)) + return WEXITSTATUS(status); + + return -1; +} + +static void test_userns_create_bpf(void) +{ + __u32 cap_mask = 1ULL << CAP_SYS_ADMIN; + __u64 old_caps = 0; + + ASSERT_OK(create_new_user_ns(), "priv new user ns"); + + cap_disable_effective(cap_mask, &old_caps); + + ASSERT_EQ(create_new_user_ns(), EPERM, "unpriv new user ns"); + + if (cap_mask & old_caps) + cap_enable_effective(cap_mask, NULL); +} + +static void test_unpriv_userns_create_no_bpf(void) +{ + __u32 cap_mask = 1ULL << CAP_SYS_ADMIN; + __u64 old_caps = 0; + + cap_disable_effective(cap_mask, &old_caps); + + ASSERT_OK(create_new_user_ns(), "no-bpf unpriv new user ns"); + + if (cap_mask & old_caps) + cap_enable_effective(cap_mask, NULL); +} + +void test_deny_namespace(void) +{ + struct test_deny_namespace *skel = NULL; + int err; + + if (test__start_subtest("unpriv_userns_create_no_bpf")) + test_unpriv_userns_create_no_bpf(); + + skel = test_deny_namespace__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel load")) + goto close_prog; + + err = test_deny_namespace__attach(skel); + if (!ASSERT_OK(err, "attach")) + goto close_prog; + + if (test__start_subtest("userns_create_bpf")) + test_userns_create_bpf(); + + test_deny_namespace__detach(skel); + +close_prog: + test_deny_namespace__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c new file mode 100644 index 000000000000..9ec9dabc8372 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <errno.h> +#include <linux/capability.h> + +struct kernel_cap_struct { + __u32 cap[_LINUX_CAPABILITY_U32S_3]; +} __attribute__((preserve_access_index)); + +struct cred { + struct kernel_cap_struct cap_effective; +} __attribute__((preserve_access_index)); + +char _license[] SEC("license") = "GPL"; + +SEC("lsm/userns_create") +int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +{ + struct kernel_cap_struct caps = cred->cap_effective; + int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN); + __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN); + + if (ret) + return 0; + + ret = -EPERM; + if (caps.cap[cap_index] & cap_mask) + return 0; + + return -EPERM; +} + +SEC("lsm.s/userns_create") +int BPF_PROG(test_sleepable_userns_create, const struct cred *cred, int ret) +{ + return 0; +}
Unprivileged user namespace creation is an intended feature to enable sandboxing, however this feature is often used to as an initial step to perform a privilege escalation attack.
This patch implements a new user_namespace { create } access control permission to restrict which domains allow or deny user namespace creation. This is necessary for system administrators to quickly protect their systems while waiting for vulnerability patches to be applied.
This permission can be used in the following way:
allow domA_t domA_t : user_namespace { create };
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- Changes since v2: - Rename create_user_ns hook to userns_create - Use user_namespace as an object opposed to a generic namespace object - s/domB_t/domA_t in commit message Changes since v1: - Introduce this patch --- security/selinux/hooks.c | 9 +++++++++ security/selinux/include/classmap.h | 2 ++ 2 files changed, 11 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index beceb89f68d9..afc9da0249e7 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4227,6 +4227,14 @@ static void selinux_task_to_inode(struct task_struct *p, spin_unlock(&isec->lock); }
+static int selinux_userns_create(const struct cred *cred) +{ + u32 sid = current_sid(); + + return avc_has_perm(&selinux_state, sid, sid, SECCLASS_USER_NAMESPACE, + USER_NAMESPACE__CREATE, NULL); +} + /* Returns error only if unable to parse addresses */ static int selinux_parse_skb_ipv4(struct sk_buff *skb, struct common_audit_data *ad, u8 *proto) @@ -7117,6 +7125,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(task_movememory, selinux_task_movememory), LSM_HOOK_INIT(task_kill, selinux_task_kill), LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode), + LSM_HOOK_INIT(userns_create, selinux_userns_create),
LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission), LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid), diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index ff757ae5f253..0bff55bb9cde 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -254,6 +254,8 @@ const struct security_class_mapping secclass_map[] = { { COMMON_FILE_PERMS, NULL } }, { "io_uring", { "override_creds", "sqpoll", NULL } }, + { "user_namespace", + { "create", NULL } }, { NULL } };
On Thu, Jul 21, 2022 at 12:28:07PM -0500, Frederick Lawler wrote:
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 4 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN. 4. The sleepable implementation loads
Signed-off-by: Frederick Lawler fred@cloudflare.com
The generic deny_namespace file name is used for future namespace expansion. I didn't want to limit these files to just the create_user_ns hook. Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Introduce this patch
.../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 2 files changed, 127 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c new file mode 100644 index 000000000000..9e4714295008 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -0,0 +1,88 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <test_progs.h> +#include "test_deny_namespace.skel.h" +#include <sched.h> +#include "cap_helpers.h"
+#define STACK_SIZE (1024 * 1024)
Does the child need 1M stack space ?
+static char child_stack[STACK_SIZE];
+int clone_callback(void *arg)
static
+{
- return 0;
+}
+static int create_new_user_ns(void) +{
- int status;
- pid_t cpid;
- cpid = clone(clone_callback, child_stack + STACK_SIZE,
CLONE_NEWUSER | SIGCHLD, NULL);
- if (cpid == -1)
return errno;
- if (cpid == 0)
Not an expert in clone() call and it is not clear what 0 return value mean from the man page. Could you explain ?
return 0;
- waitpid(cpid, &status, 0);
- if (WIFEXITED(status))
return WEXITSTATUS(status);
- return -1;
+}
+static void test_userns_create_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- ASSERT_OK(create_new_user_ns(), "priv new user ns");
Does it need to enable CAP_SYS_ADMIN first ?
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_EQ(create_new_user_ns(), EPERM, "unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+static void test_unpriv_userns_create_no_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_OK(create_new_user_ns(), "no-bpf unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+void test_deny_namespace(void) +{
- struct test_deny_namespace *skel = NULL;
- int err;
- if (test__start_subtest("unpriv_userns_create_no_bpf"))
test_unpriv_userns_create_no_bpf();
- skel = test_deny_namespace__open_and_load();
- if (!ASSERT_OK_PTR(skel, "skel load"))
goto close_prog;
- err = test_deny_namespace__attach(skel);
- if (!ASSERT_OK(err, "attach"))
goto close_prog;
- if (test__start_subtest("userns_create_bpf"))
test_userns_create_bpf();
- test_deny_namespace__detach(skel);
+close_prog:
- test_deny_namespace__destroy(skel);
+} diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c new file mode 100644 index 000000000000..9ec9dabc8372 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <errno.h> +#include <linux/capability.h>
+struct kernel_cap_struct {
- __u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+struct cred {
- struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+char _license[] SEC("license") = "GPL";
+SEC("lsm/userns_create") +int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +{
- struct kernel_cap_struct caps = cred->cap_effective;
- int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
- __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
- if (ret)
return 0;
- ret = -EPERM;
- if (caps.cap[cap_index] & cap_mask)
return 0;
- return -EPERM;
+}
+SEC("lsm.s/userns_create") +int BPF_PROG(test_sleepable_userns_create, const struct cred *cred, int ret) +{
An empty program is weird. If the intention is to ensure a sleepable program can attach to userns_create, move the test logic here and remove the non-sleepable program above.
- return 0;
+}
2.30.2
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
On Thu, Jul 21, 2022 at 12:28:07PM -0500, Frederick Lawler wrote:
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 4 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN. 4. The sleepable implementation loads
Sounds good!
Signed-off-by: Frederick Lawler fred@cloudflare.com
The generic deny_namespace file name is used for future namespace expansion. I didn't want to limit these files to just the create_user_ns hook. Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Introduce this patch
.../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 2 files changed, 127 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c new file mode 100644 index 000000000000..9e4714295008 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -0,0 +1,88 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <test_progs.h> +#include "test_deny_namespace.skel.h" +#include <sched.h> +#include "cap_helpers.h"
+#define STACK_SIZE (1024 * 1024) +static char child_stack[STACK_SIZE];
+int clone_callback(void *arg) +{
- return 0;
+}
+static int create_new_user_ns(void) +{
- int status;
- pid_t cpid;
- cpid = clone(clone_callback, child_stack + STACK_SIZE,
CLONE_NEWUSER | SIGCHLD, NULL);
- if (cpid == -1)
return errno;
- if (cpid == 0)
return 0;
Martin asked about this already but fwiw, this cannot happen with clone(). The clone() function doesn't return twice. It always returns the PID of the child process or an error.
- waitpid(cpid, &status, 0);
- if (WIFEXITED(status))
return WEXITSTATUS(status);
- return -1;
+}
You can also just avoid the clone() dance and simply do sm like:
static int wait_for_pid(pid_t pid) { int status, ret;
again: ret = waitpid(pid, &status, 0); if (ret == -1) { if (errno == EINTR) goto again;
return -1; }
if (!WIFEXITED(status)) return -1;
return WEXITSTATUS(status); }
/* negative return value -> some internal error * positive return value -> userns creation failed * 0 -> userns creation succeeded */ static int create_user_ns(void) { pid_t pid;
pid = fork(); if (pid < 0) return -1;
if (pid == 0) { if (unshare(CLONE_NEWUSER)) _exit(EXIT_FAILURE); _exit(EXIT_SUCCESS); }
return wait_for_pid(pid); }
Same difference since both codepaths hit the right spot in the kernel.
+static void test_userns_create_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- ASSERT_OK(create_new_user_ns(), "priv new user ns");
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_EQ(create_new_user_ns(), EPERM, "unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+static void test_unpriv_userns_create_no_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_OK(create_new_user_ns(), "no-bpf unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+void test_deny_namespace(void) +{
- struct test_deny_namespace *skel = NULL;
- int err;
- if (test__start_subtest("unpriv_userns_create_no_bpf"))
test_unpriv_userns_create_no_bpf();
- skel = test_deny_namespace__open_and_load();
- if (!ASSERT_OK_PTR(skel, "skel load"))
goto close_prog;
- err = test_deny_namespace__attach(skel);
- if (!ASSERT_OK(err, "attach"))
goto close_prog;
- if (test__start_subtest("userns_create_bpf"))
test_userns_create_bpf();
- test_deny_namespace__detach(skel);
+close_prog:
- test_deny_namespace__destroy(skel);
+} diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c new file mode 100644 index 000000000000..9ec9dabc8372 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <errno.h> +#include <linux/capability.h>
+struct kernel_cap_struct {
- __u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+struct cred {
- struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+char _license[] SEC("license") = "GPL";
+SEC("lsm/userns_create") +int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +{
- struct kernel_cap_struct caps = cred->cap_effective;
- int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
- __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
- if (ret)
return 0;
- ret = -EPERM;
- if (caps.cap[cap_index] & cap_mask)
return 0;
- return -EPERM;
+}
Looks nice and simple. Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
On Thu, Jul 21, 2022 at 12:28:06PM -0500, Frederick Lawler wrote:
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com
Seems reasonable, Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
On Thu, Jul 21, 2022 at 12:28:05PM -0500, Frederick Lawler wrote:
Preventing user namespace (privileged or otherwise) creation comes in a few of forms in order of granularity:
1. /proc/sys/user/max_user_namespaces sysctl 2. OS specific patch(es) 3. CONFIG_USER_NS
To block a task based on its attributes, the LSM hook cred_prepare is a good candidate for use because it provides more granular control, and it is called before create_user_ns():
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook.
This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned.
Signed-off-by: Frederick Lawler fred@cloudflare.com
Nice and straightforward, Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
-- paul-moore.com
On 7/22/22 1:07 AM, Martin KaFai Lau wrote:
On Thu, Jul 21, 2022 at 12:28:07PM -0500, Frederick Lawler wrote:
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 4 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN. 4. The sleepable implementation loads
Signed-off-by: Frederick Lawler fred@cloudflare.com
The generic deny_namespace file name is used for future namespace expansion. I didn't want to limit these files to just the create_user_ns hook. Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Introduce this patch
.../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 2 files changed, 127 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c new file mode 100644 index 000000000000..9e4714295008 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -0,0 +1,88 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <test_progs.h> +#include "test_deny_namespace.skel.h" +#include <sched.h> +#include "cap_helpers.h"
+#define STACK_SIZE (1024 * 1024)
Does the child need 1M stack space ?
No, I can reduce that.
+static char child_stack[STACK_SIZE];
+int clone_callback(void *arg)
static
+{
- return 0;
+}
+static int create_new_user_ns(void) +{
- int status;
- pid_t cpid;
- cpid = clone(clone_callback, child_stack + STACK_SIZE,
CLONE_NEWUSER | SIGCHLD, NULL);
- if (cpid == -1)
return errno;
- if (cpid == 0)
Not an expert in clone() call and it is not clear what 0 return value mean from the man page. Could you explain ?
Good catch. This is using the libc clone().
return 0;
- waitpid(cpid, &status, 0);
- if (WIFEXITED(status))
return WEXITSTATUS(status);
- return -1;
+}
+static void test_userns_create_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- ASSERT_OK(create_new_user_ns(), "priv new user ns");
Does it need to enable CAP_SYS_ADMIN first ?
You're right, this should be more explicitly set. I ran tests with the vmtest.sh script supplied with sefltests/bpf which run under root. I should always set CAP_SYS_ADMIN here to be consistent.
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_EQ(create_new_user_ns(), EPERM, "unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+static void test_unpriv_userns_create_no_bpf(void) +{
- __u32 cap_mask = 1ULL << CAP_SYS_ADMIN;
- __u64 old_caps = 0;
- cap_disable_effective(cap_mask, &old_caps);
- ASSERT_OK(create_new_user_ns(), "no-bpf unpriv new user ns");
- if (cap_mask & old_caps)
cap_enable_effective(cap_mask, NULL);
+}
+void test_deny_namespace(void) +{
- struct test_deny_namespace *skel = NULL;
- int err;
- if (test__start_subtest("unpriv_userns_create_no_bpf"))
test_unpriv_userns_create_no_bpf();
- skel = test_deny_namespace__open_and_load();
- if (!ASSERT_OK_PTR(skel, "skel load"))
goto close_prog;
- err = test_deny_namespace__attach(skel);
- if (!ASSERT_OK(err, "attach"))
goto close_prog;
- if (test__start_subtest("userns_create_bpf"))
test_userns_create_bpf();
- test_deny_namespace__detach(skel);
+close_prog:
- test_deny_namespace__destroy(skel);
+} diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c new file mode 100644 index 000000000000..9ec9dabc8372 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -0,0 +1,39 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <errno.h> +#include <linux/capability.h>
+struct kernel_cap_struct {
- __u32 cap[_LINUX_CAPABILITY_U32S_3];
+} __attribute__((preserve_access_index));
+struct cred {
- struct kernel_cap_struct cap_effective;
+} __attribute__((preserve_access_index));
+char _license[] SEC("license") = "GPL";
+SEC("lsm/userns_create") +int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +{
- struct kernel_cap_struct caps = cred->cap_effective;
- int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN);
- __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN);
- if (ret)
return 0;
- ret = -EPERM;
- if (caps.cap[cap_index] & cap_mask)
return 0;
- return -EPERM;
+}
+SEC("lsm.s/userns_create") +int BPF_PROG(test_sleepable_userns_create, const struct cred *cred, int ret) +{
An empty program is weird. If the intention is to ensure a sleepable program can attach to userns_create, move the test logic here and remove the non-sleepable program above.
Sure, I can do that.
- return 0;
+}
2.30.2
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
That description is wrong. Your goal his is not to limit access to the user namespace. Your goal is to reduce the attack surface of the kernel by not allowing some processes access to a user namespace.
You have already said that you don't have concerns about the fundamentals of the user namespace, and what it enables only that it allows access to exploitable code.
Achieving the protection you seek requires talking and thinking clearly about the goal.
I have a couple of deep and fundamental problems with this approach, to limiting access to potentially exploitable code.
1) The first is that unless there is a high probability (say 90%) that at any time the only exploitable code in the kernel can only be accessed by an unprivileged user with the help of user namespaces, attackers will just route around this restriction and so it will achieve nothing in practice, while at the same time incur an extra maintenance burden.
2) The second is that there is a long standing problem with code that gets added to the kernel. Many times new kernel code because it has the potential to confuse suid root executables that code has been made root only. Over time that results in more and more code running as root to be able to make use of the useful features of the linux kernel.
One of the goals of the user namespace is to avoid more and more code migrating to running as root. To achieve that goal ordinary application developers need to be able to assume that typically user namespaces will be available on linux.
An assumption that ordinary applications like chromium make today.
Your intentions seem to be to place a capability check so that only root can use user namespaces or something of the sort. Thus breaking the general availability of user namespaces for ordinary applications on your systems.
My apologies if this has been addressed somewhere in the conversation already. I don't see these issues addressed in the descriptions of your patches.
Until these issues are firmly addressed and you are not proposing a patch that can only cause regressions in userspace applications.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links:
- https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
- https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
- https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take struct cred
- Move security_create_user_ns() call after id mapping check in create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 ++ security/security.c | 5 ++ security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 10 files changed, 160 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
-- 2.30.2
Eric
On Fri, Jul 22, 2022 at 1:05 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
That description is wrong. Your goal his is not to limit access to the user namespace. Your goal is to reduce the attack surface of the kernel by not allowing some processes access to a user namespace.
You have already said that you don't have concerns about the fundamentals of the user namespace, and what it enables only that it allows access to exploitable code.
Achieving the protection you seek requires talking and thinking clearly about the goal.
Providing a single concrete goal for a LSM hook is always going to be a challenge due to the nature of the LSM layer and the great unknown of all the different LSMs that are implemented underneath the LSM abstraction. However, we can make some very general statements such that up to this point the LSMs that have been merged into mainline generally provide some level of access control, observability, or both. While that may change in the future (the LSM layer does not attempt to restrict LSMs to just these two ideas), I think they are "good enough" goals for this discussion.
In addition to thinking about these goals, I think it also important to take a step back and think about the original motivation for the LSM and why it, and Linux itself, has proven to be popular with everything from the phone in your hand to the datacenter servers powering ... pretty much everything :) Arguably Linux has seen such profound success because of its malleability; the open nature of the kernel development process has allowed the Linux Kernel to adopt capabilities well beyond what any one dev team could produce, and as Linux continues to grow in adoption, its ability to flex into new use cases only increases. The kernel namespace concept is an excellent example of this: virtualizing core kernel ideas, such as user credentials, to provide better, safer solutions. It is my belief that the LSM layer is very much built around this same idea of abstracting and extending core kernel concepts, in this case security controls, to provide better solutions. Integrating the LSM into the kernel's namespaces is a natural fit, and one that is long overdue.
If we can't find a way to make everyone happy here, let's at least try to find a way to make everyone "okay" with adding a LSM hook to the user namespace. If you want to NACK this approach Eric, that's okay, but please provide some alternative paths forward that we can discuss.
Hi Eric,
On Fri, Jul 22, 2022 at 7:07 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
That description is wrong. Your goal his is not to limit access to the user namespace. Your goal is to reduce the attack surface of the kernel by not allowing some processes access to a user namespace.
You have already said that you don't have concerns about the fundamentals of the user namespace, and what it enables only that it allows access to exploitable code.
Achieving the protection you seek requires talking and thinking clearly about the goal.
We have valid use cases not specifically related to the attack surface, but go into the middle from bpf observability to enforcement. As we want to track namespace creation, changes, nesting and per task creds context depending on the nature of the workload.
Obvious example is nesting as we want to track namespace creations not necessarily user namespace but all to report hierarchies to dashboards, then from kubernetes namespace view, we would like some applications to setup namespaces privileged or not, but deny other apps creation of nested pidns, userns, etc, it depends on users how they setup their kubernetes namespaces and labels...
...
The second is that there is a long standing problem with code that gets added to the kernel. Many times new kernel code because it has the potential to confuse suid root executables that code has been made root only. Over time that results in more and more code running as root to be able to make use of the useful features of the linux kernel.
One of the goals of the user namespace is to avoid more and more code migrating to running as root. To achieve that goal ordinary application developers need to be able to assume that typically user namespaces will be available on linux.
An assumption that ordinary applications like chromium make today.
I don't necessarily disagree with statement 2. and in a perfect world yes. But practically as noted by Paul in his email, Linux is flexible and speaking about kubernetes world we have multiple workload per namespaces, and we would like a solution that we can support in the next months.
Also these are features that some user space may use, some may not, we will never be able to dictate to all user space applications how to do things.
From bpf side observability or bpf-lsm enforcement it allows to escalate how to respond to the task and *make lsm and bpf (bpf-lsm) have a consistent design* where they both follow the same path.
It is unfortunate that the security_task_alloc() [1] hook is _late_ and can't be used for context initialization as the credentials and even user namespace have already been created. Strictly speaking we have a context that has been already created and applied and we can't properly catch it !
There is no way to do that from user space as most bpf based tools (observability and enforcement) do not and should not mess up at the user space level with the namespace configuration of tasks (/proc...), they are external programs to the running tasks, they do not set up the environment. Having the hook before the namespaces and creds copying allows to properly track this and construct the _right_ context. From lsm and bpf-lsm this will definitely offer a better interface that is not prone to errors.
We would like an answer here or an alternative hook that is placed before the creation/setting of any namespace, credentials or creating new keyring. So we can provide bpf-based transparent solutions that work.
[1] https://elixir.bootlin.com/linux/v5.18.13/source/kernel/fork.c#L2216
My apologies if this has been addressed somewhere in the conversation already. I don't see these issues addressed in the descriptions of your patches.
Until these issues are firmly addressed and you are not proposing a patch that can only cause regressions in userspace applications.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links:
- https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
- https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
- https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take struct cred
- Move security_create_user_ns() call after id mapping check in create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 ++ security/security.c | 5 ++ security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 10 files changed, 160 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
-- 2.30.2
Eric
On Fri, Jul 22, 2022 at 6:05 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
That description is wrong. Your goal his is not to limit access to the user namespace. Your goal is to reduce the attack surface of the kernel by not allowing some processes access to a user namespace.
You have already said that you don't have concerns about the fundamentals of the user namespace, and what it enables only that it allows access to exploitable code.
Achieving the protection you seek requires talking and thinking clearly about the goal.
I have a couple of deep and fundamental problems with this approach, to limiting access to potentially exploitable code.
The first is that unless there is a high probability (say 90%) that at any time the only exploitable code in the kernel can only be accessed by an unprivileged user with the help of user namespaces, attackers will just route around this restriction and so it will achieve nothing in practice, while at the same time incur an extra maintenance burden.
The second is that there is a long standing problem with code that gets added to the kernel. Many times new kernel code because it has the potential to confuse suid root executables that code has been made root only. Over time that results in more and more code running as root to be able to make use of the useful features of the linux kernel.
One of the goals of the user namespace is to avoid more and more code migrating to running as root. To achieve that goal ordinary application developers need to be able to assume that typically user namespaces will be available on linux.
An assumption that ordinary applications like chromium make today.
Your intentions seem to be to place a capability check so that only root can use user namespaces or something of the sort. Thus breaking the general availability of user namespaces for ordinary applications on your systems.
I would like to comment here that our intention with the hook is quite the opposite: we do want to embrace user namespaces in our code and some of our workloads already depend on it. Hence we didn't agree to Debian's approach of just having a global sysctl. But there is "our code" and there is "third party" code, which might not even be open source due to various reasons. And while the path exists for that code to do something bad - we want to block it.
So in a way, I think this hook allows better adoption of user namespaces in the first place and gives distros and other system maintainers a reasonable alternative than just providing a global "kill" sysctl (which is de-facto is used by many, thus actually limiting userspace applications accessing the user namespace functionality)
My apologies if this has been addressed somewhere in the conversation already. I don't see these issues addressed in the descriptions of your patches.
Until these issues are firmly addressed and you are not proposing a patch that can only cause regressions in userspace applications.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links:
- https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
- https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
- https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take struct cred
- Move security_create_user_ns() call after id mapping check in create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 ++ security/security.c | 5 ++ security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 88 +++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 39 ++++++++ 10 files changed, 160 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
-- 2.30.2
Eric
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
-- paul-moore.com
On Mon, Aug 1, 2022 at 9:13 AM Frederick Lawler fred@cloudflare.com wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
In general it rarely hurts to make another revision, and I think you've gotten some decent feedback on this draft, especially around the BPF LSM tests; I think rebasing on Linus tree after the upcoming io_uring changes are merged would be a good idea. Although as a reminder to the BPF LSM folks - I'm looking at you KP Singh :) - I need an ACK from you guys before I merge the BPF related patches (patches {2,3}/4). For the record, I think the SELinux portion of this patchset (path 4/4) is fine.
There is the issue of Eric's NACK, but I believe the responses that followed his comment sufficiently addressed those concerns and it has now been a week with no further comment from Eric; we should continue to move forward with this.
On 8/1/2022 6:13 AM, Frederick Lawler wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
As the primary consumer of this hook is BPF I would really expect their reviewed-by before accepting this.
-- paul-moore.com
On Mon, Aug 1, 2022 at 11:25 AM Casey Schaufler casey@schaufler-ca.com wrote:
On 8/1/2022 6:13 AM, Frederick Lawler wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
As the primary consumer of this hook is BPF I would really expect their reviewed-by before accepting this.
We love all our in-tree LSMs equally. As long as there is at least one LSM which provides an implementation and has ACK'd the hook, and no other LSMs have NACK'd the hook, then I have no problem merging it. I doubt it will be necessary in this case, but if we need to tweak the hook in the future we can definitely do that; we've done this in the past when it has made sense.
As a reminder, the LSM hooks are *not* part of the "don't break userspace" promise. I know it gets a little muddy with the way the BPF LSM works, but just as we don't want to allow one LSM to impact the runtime controls on another, we don't want to allow one LSM to freeze the hooks for everyone.
On Mon, Aug 1, 2022 at 5:19 PM Paul Moore paul@paul-moore.com wrote:
On Mon, Aug 1, 2022 at 9:13 AM Frederick Lawler fred@cloudflare.com wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
In general it rarely hurts to make another revision, and I think you've gotten some decent feedback on this draft, especially around the BPF LSM tests; I think rebasing on Linus tree after the upcoming io_uring changes are merged would be a good idea. Although as a reminder to the BPF LSM folks - I'm looking at you KP Singh :) - I need an ACK from you guys before I merge the BPF related patches
Apologies, I was on vacation. I am looking at the patches now. Reviews and acks coming soon :)
- KP
(patches {2,3}/4). For the record, I think the SELinux portion of this patchset (path 4/4) is fine.
[...]
-- paul-moore.com
On Mon, Aug 1, 2022 at 6:35 PM Paul Moore paul@paul-moore.com wrote:
On Mon, Aug 1, 2022 at 11:25 AM Casey Schaufler casey@schaufler-ca.com wrote:
On 8/1/2022 6:13 AM, Frederick Lawler wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
As the primary consumer of this hook is BPF I would really expect their reviewed-by before accepting this.
We love all our in-tree LSMs equally. As long as there is at least one LSM which provides an implementation and has ACK'd the hook, and no other LSMs have NACK'd the hook, then I have no problem merging it. I doubt it will be necessary in this case, but if we need to tweak the hook in the future we can definitely do that; we've done this in the past when it has made sense.
As a reminder, the LSM hooks are *not* part of the "don't break userspace" promise. I know it gets a little muddy with the way the
That's correct. Also, with BPF LSM, we encourage users to build the application / bpf program logic to be resilient to changes in the LSM hooks.
I am happy to share how we've done it, if folks are interested.
- KP
BPF LSM works, but just as we don't want to allow one LSM to impact the runtime controls on another, we don't want to allow one LSM to freeze the hooks for everyone.
-- paul-moore.com
Paul Moore paul@paul-moore.com writes:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
It doesn't even address my issues with the last patchset.
So it has my NACK.
Eric
On Tue, Aug 2, 2022 at 5:25 PM KP Singh kpsingh@kernel.org wrote:
On Mon, Aug 1, 2022 at 5:19 PM Paul Moore paul@paul-moore.com wrote:
On Mon, Aug 1, 2022 at 9:13 AM Frederick Lawler fred@cloudflare.com wrote:
On 7/22/22 7:20 AM, Paul Moore wrote:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
I would also need an ACK from the BPF LSM folks, but they're CC'd on this patchset.
Based on last weeks comments, should I go ahead and put up v4 for 5.20-rc1 when that drops, or do I need to wait for more feedback?
In general it rarely hurts to make another revision, and I think you've gotten some decent feedback on this draft, especially around the BPF LSM tests; I think rebasing on Linus tree after the upcoming io_uring changes are merged would be a good idea.
As I was typing up my reply I realized I mistakenly mentioned the io_uring changes that Linus just merged today - oops! If you haven't figured it out already, you can disregard that comment, that's a completely different problem and a completely different set of patches :)
Although as a reminder to the BPF LSM folks - I'm looking at you KP Singh :) - I need an ACK from you guys before I merge the BPF related patches
Apologies, I was on vacation. I am looking at the patches now. Reviews and acks coming soon :)
No worries, we've still got the two weeks of the merge window before I can do anything into linux-next - thanks KP!
On 8/2/22 4:33 PM, Eric W. Biederman wrote:
Paul Moore paul@paul-moore.com writes:
On July 22, 2022 2:12:03 AM Martin KaFai Lau kafai@fb.com wrote:
On Thu, Jul 21, 2022 at 12:28:04PM -0500, Frederick Lawler wrote:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Patch 1 and 4 still need review from the lsm/security side.
This patchset is in my review queue and assuming everything checks out, I expect to merge it after the upcoming merge window closes.
It doesn't even address my issues with the last patchset.
Are you referring to [1], and with regards to [2], is the issue that the wording could be improved for both the cover letter and patch 1/4?
Ultimately, the goal of CF is to leverage and use user namespaces and block tasks whose meta information do not align with our allow list criteria. Yes, there is a higher goal of restricting our attack surface. Yes, people will find ways around security. The point is to have multiple levels of security, and this patch series allows people to add another level.
Calling this hook a regression is not true since there's no actual regression in the code. What would constitute a perceived regression is an admin imposing such a SELinux or BPF restriction within their company, but developers in that company ideally would try to work with the admin to enable user namespaces for certain use cases, or alternatively do what you don't want given current tooling: always run code as root. That's where this hook comes in: let people observe and enforce how they see fit. The average enthusiasts would see no impact.
I was requested to add _some_ test to BPF and to add a SELinux implementation. The low hanging fruit for a test to prove that the hook is capable of doing _something_ was to simply just block outright, and provide _some example_ of use. It doesn't make sense for us to write a test that outlines specifically what CF or others are doing because that would put too much emphasis on an implementation detail that doesn't matter to prove that the hook works.
Without Djalal's comment, I can't defend an observability use case that we're not currently leveraging. We have it now, so therefore I'll defend it per KP's suggestion[3] in v5.
By not responding to the email discussions, we can't accurately gauge what should or should not be in the descriptions. No one here necessarily disagrees with some of the points you made, and others have appropriately responded. As others have also wrote, you're not proposing alternatives. How do you expect us to work with that?
Please, let us know which bits and pieces ought to be included in the descriptions, and let us know what things we should call out caveats to that would satisfy your concerns.
Links: 1. https://lore.kernel.org/all/01368386-521f-230b-1d49-de19377c27d1@cloudflare.... 2. https://lore.kernel.org/all/877d45kri4.fsf@email.froward.int.ebiederm.org/#t 3. https://lore.kernel.org/all/CACYkzJ4x90DamdN4dRCn1gZuAHLqJNy4MoP=qTX+44Bqx1u... 4. https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf...
So it has my NACK.
Eric
linux-kselftest-mirror@lists.linaro.org