While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links: 1. https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/ 2. https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/ 3. https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/ V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v3: - Explicitly set CAP_SYS_ADMIN to test namespace is created given permission - Simplify BPF test to use sleepable hook only - Prefer unshare() over clone() for tests Changes since v2: - Rename create_user_ns hook to userns_create - Use user_namespace as an object opposed to a generic namespace object - s/domB_t/domA_t in commit message Changes since v1: - Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch - Add selinux: Implement create_user_ns hook patch - Change function signature of security_create_user_ns() to only take struct cred - Move security_create_user_ns() call after id mapping check in create_user_ns() - Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 + security/security.c | 5 + security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++ 10 files changed, 168 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
Preventing user namespace (privileged or otherwise) creation comes in a few of forms in order of granularity:
1. /proc/sys/user/max_user_namespaces sysctl 2. OS specific patch(es) 3. CONFIG_USER_NS
To block a task based on its attributes, the LSM hook cred_prepare is a good candidate for use because it provides more granular control, and it is called before create_user_ns():
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook.
This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned.
Signed-off-by: Frederick Lawler fred@cloudflare.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org
--- Changes since v3: - No changes Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - Changed commit wording - Moved execution to be after id mapping check - Changed signature to only accept a const struct cred * --- include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 ++++ include/linux/security.h | 6 ++++++ kernel/user_namespace.c | 5 +++++ security/security.c | 5 +++++ 5 files changed, 21 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index eafa1d2489fd..7ff93cb8ca8d 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -223,6 +223,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) +LSM_HOOK(int, 0, userns_create, const struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 91c8146649f5..54fe534d0e01 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -799,6 +799,10 @@ * security attributes, e.g. for /proc/pid inodes. * @p contains the task_struct for the task. * @inode contains the inode structure for the inode. + * @userns_create: + * Check permission prior to creating a new user namespace. + * @cred points to prepared creds. + * Return 0 if successful, otherwise < 0 error code. * * Security hooks for Netlink messaging. * diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..a195bf33246a 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -435,6 +435,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); +int security_create_user_ns(const struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1185,6 +1186,11 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { }
+static inline int security_create_user_ns(const struct cred *cred) +{ + return 0; +} + static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 5481ba44a8d6..3f464bbda0e9 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -9,6 +9,7 @@ #include <linux/highuid.h> #include <linux/cred.h> #include <linux/securebits.h> +#include <linux/security.h> #include <linux/keyctl.h> #include <linux/key-type.h> #include <keys/user-type.h> @@ -113,6 +114,10 @@ int create_user_ns(struct cred *new) !kgid_has_mapping(parent_ns, group)) goto fail_dec;
+ ret = security_create_user_ns(new); + if (ret < 0) + goto fail_dec; + ret = -ENOMEM; ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL); if (!ns) diff --git a/security/security.c b/security/security.c index 188b8f782220..ec9b4696e86c 100644 --- a/security/security.c +++ b/security/security.c @@ -1903,6 +1903,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode) call_void_hook(task_to_inode, p, inode); }
+int security_create_user_ns(const struct cred *cred) +{ + return call_int_hook(userns_create, 0, cred); +} + int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { return call_int_hook(ipc_permission, 0, ipcp, flag);
On Mon, Aug 1, 2022 at 8:02 PM Frederick Lawler fred@cloudflare.com wrote:
Preventing user namespace (privileged or otherwise) creation comes in a few of forms in order of granularity:
1. /proc/sys/user/max_user_namespaces sysctl 2. OS specific patch(es) 3. CONFIG_USER_NS
To block a task based on its attributes, the LSM hook cred_prepare is a good candidate for use because it provides more granular control, and it is called before create_user_ns():
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook.
This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned.
Signed-off-by: Frederick Lawler fred@cloudflare.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org
Reviewed-by: KP Singh kpsingh@kernel.org
This looks useful, and I would also like folks to consider the observability aspects of BPF LSM as brought up here:
https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf...
Frederick, what about adding the observability aspects to the commit description as well.
- KP
Changes since v3:
- No changes
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Changed commit wording
- Moved execution to be after id mapping check
- Changed signature to only accept a const struct cred *
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 ++++ include/linux/security.h | 6 ++++++ kernel/user_namespace.c | 5 +++++ security/security.c | 5 +++++ 5 files changed, 21 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index eafa1d2489fd..7ff93cb8ca8d 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -223,6 +223,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) +LSM_HOOK(int, 0, userns_create, const struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 91c8146649f5..54fe534d0e01 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -799,6 +799,10 @@
security attributes, e.g. for /proc/pid inodes.
@p contains the task_struct for the task.
@inode contains the inode structure for the inode.
- @userns_create:
Check permission prior to creating a new user namespace.
@cred points to prepared creds.
Return 0 if successful, otherwise < 0 error code.
- Security hooks for Netlink messaging.
diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..a195bf33246a 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -435,6 +435,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); +int security_create_user_ns(const struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1185,6 +1186,11 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { }
+static inline int security_create_user_ns(const struct cred *cred) +{
return 0;
+}
static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 5481ba44a8d6..3f464bbda0e9 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -9,6 +9,7 @@ #include <linux/highuid.h> #include <linux/cred.h> #include <linux/securebits.h> +#include <linux/security.h> #include <linux/keyctl.h> #include <linux/key-type.h> #include <keys/user-type.h> @@ -113,6 +114,10 @@ int create_user_ns(struct cred *new) !kgid_has_mapping(parent_ns, group)) goto fail_dec;
ret = security_create_user_ns(new);
if (ret < 0)
goto fail_dec;
ret = -ENOMEM; ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL); if (!ns)
diff --git a/security/security.c b/security/security.c index 188b8f782220..ec9b4696e86c 100644 --- a/security/security.c +++ b/security/security.c @@ -1903,6 +1903,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode) call_void_hook(task_to_inode, p, inode); }
+int security_create_user_ns(const struct cred *cred) +{
return call_int_hook(userns_create, 0, cred);
+}
int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { return call_int_hook(ipc_permission, 0, ipcp, flag); -- 2.30.2
On 8/2/22 4:47 PM, KP Singh wrote:
On Mon, Aug 1, 2022 at 8:02 PM Frederick Lawler fred@cloudflare.com wrote:
Preventing user namespace (privileged or otherwise) creation comes in a few of forms in order of granularity:
1. /proc/sys/user/max_user_namespaces sysctl 2. OS specific patch(es) 3. CONFIG_USER_NS
To block a task based on its attributes, the LSM hook cred_prepare is a good candidate for use because it provides more granular control, and it is called before create_user_ns():
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook.
This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned.
Signed-off-by: Frederick Lawler fred@cloudflare.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org
Reviewed-by: KP Singh kpsingh@kernel.org
This looks useful, and I would also like folks to consider the observability aspects of BPF LSM as brought up here:
https://lore.kernel.org/all/CAEiveUdPhEPAk7Y0ZXjPsD=Vb5hn453CHzS9aG-tkyRa8bf...
Frederick, what about adding the observability aspects to the commit description as well.
Agreed. I'll include that in v5.
- KP
Changes since v3:
- No changes
Changes since v2:
- Rename create_user_ns hook to userns_create
Changes since v1:
- Changed commit wording
- Moved execution to be after id mapping check
- Changed signature to only accept a const struct cred *
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 ++++ include/linux/security.h | 6 ++++++ kernel/user_namespace.c | 5 +++++ security/security.c | 5 +++++ 5 files changed, 21 insertions(+)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index eafa1d2489fd..7ff93cb8ca8d 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -223,6 +223,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) +LSM_HOOK(int, 0, userns_create, const struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 91c8146649f5..54fe534d0e01 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -799,6 +799,10 @@
security attributes, e.g. for /proc/pid inodes.
@p contains the task_struct for the task.
@inode contains the inode structure for the inode.
- @userns_create:
Check permission prior to creating a new user namespace.
@cred points to prepared creds.
Return 0 if successful, otherwise < 0 error code.
- Security hooks for Netlink messaging.
diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..a195bf33246a 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -435,6 +435,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); +int security_create_user_ns(const struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1185,6 +1186,11 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { }
+static inline int security_create_user_ns(const struct cred *cred) +{
return 0;
+}
- static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) {
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 5481ba44a8d6..3f464bbda0e9 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -9,6 +9,7 @@ #include <linux/highuid.h> #include <linux/cred.h> #include <linux/securebits.h> +#include <linux/security.h> #include <linux/keyctl.h> #include <linux/key-type.h> #include <keys/user-type.h> @@ -113,6 +114,10 @@ int create_user_ns(struct cred *new) !kgid_has_mapping(parent_ns, group)) goto fail_dec;
ret = security_create_user_ns(new);
if (ret < 0)
goto fail_dec;
ret = -ENOMEM; ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL); if (!ns)
diff --git a/security/security.c b/security/security.c index 188b8f782220..ec9b4696e86c 100644 --- a/security/security.c +++ b/security/security.c @@ -1903,6 +1903,11 @@ void security_task_to_inode(struct task_struct *p, struct inode *inode) call_void_hook(task_to_inode, p, inode); }
+int security_create_user_ns(const struct cred *cred) +{
return call_int_hook(userns_create, 0, cred);
+}
- int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { return call_int_hook(ipc_permission, 0, ipcp, flag);
-- 2.30.2
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
--- Changes since v3: - None Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - None --- kernel/bpf/bpf_lsm.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c index c1351df9f7ee..4593437809cc 100644 --- a/kernel/bpf/bpf_lsm.c +++ b/kernel/bpf/bpf_lsm.c @@ -250,6 +250,7 @@ BTF_ID(func, bpf_lsm_task_getsecid_obj) BTF_ID(func, bpf_lsm_task_prctl) BTF_ID(func, bpf_lsm_task_setscheduler) BTF_ID(func, bpf_lsm_task_to_inode) +BTF_ID(func, bpf_lsm_userns_create) BTF_SET_END(sleepable_lsm_hooks)
bool bpf_lsm_is_sleepable_hook(u32 btf_id)
On Mon, Aug 01, 2022 at 01:01:44PM -0500, Frederick Lawler wrote:
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
We can take this set through bpf-next tree if it's easier.
Or if it goes through other trees: Acked-by: Alexei Starovoitov ast@kernel.org
On Mon, Aug 1, 2022 at 7:00 PM Alexei Starovoitov alexei.starovoitov@gmail.com wrote:
On Mon, Aug 01, 2022 at 01:01:44PM -0500, Frederick Lawler wrote:
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
We can take this set through bpf-next tree if it's easier.
Thanks Alexei, but I'm currently planning to merge it into the LSM next branch once the merge window closes.
Or if it goes through other trees: Acked-by: Alexei Starovoitov ast@kernel.org
I appreciate the review/ACK, would you mind reviewing the tests too (patch 3/4)?
On Mon, Aug 1, 2022 at 8:02 PM Frederick Lawler fred@cloudflare.com wrote:
Users may want to audit calls to security_create_user_ns() and access user space memory. Also create_user_ns() runs without pagefault_disabled(). Therefore, make bpf_lsm_userns_create() sleepable for mandatory access control policies.
Signed-off-by: Frederick Lawler fred@cloudflare.com Acked-by: Christian Brauner (Microsoft) brauner@kernel.org
Acked-by: KP Singh kpsingh@kernel.org
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 3 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN.
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- The generic deny_namespace file name is used for future namespace expansion. I didn't want to limit these files to just the create_user_ns hook. Changes since v3: - Explicitly set CAP_SYS_ADMIN to test namespace is created given permission - Simplify BPF test to use sleepable hook only - Prefer unshare() over clone() for tests Changes since v2: - Rename create_user_ns hook to userns_create Changes since v1: - Introduce this patch --- .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++ 2 files changed, 135 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
diff --git a/tools/testing/selftests/bpf/prog_tests/deny_namespace.c b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c new file mode 100644 index 000000000000..1bc6241b755b --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/deny_namespace.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE +#include <test_progs.h> +#include "test_deny_namespace.skel.h" +#include <sched.h> +#include "cap_helpers.h" +#include <stdio.h> + +static int wait_for_pid(pid_t pid) +{ + int status, ret; + +again: + ret = waitpid(pid, &status, 0); + if (ret == -1) { + if (errno == EINTR) + goto again; + + return -1; + } + + if (!WIFEXITED(status)) + return -1; + + return WEXITSTATUS(status); +} + +/* negative return value -> some internal error + * positive return value -> userns creation failed + * 0 -> userns creation succeeded + */ +static int create_user_ns(void) +{ + pid_t pid; + + pid = fork(); + if (pid < 0) + return -1; + + if (pid == 0) { + if (unshare(CLONE_NEWUSER)) + _exit(EXIT_FAILURE); + _exit(EXIT_SUCCESS); + } + + return wait_for_pid(pid); +} + +static void test_userns_create_bpf(void) +{ + __u32 cap_mask = 1ULL << CAP_SYS_ADMIN; + __u64 old_caps = 0; + + cap_enable_effective(cap_mask, &old_caps); + + ASSERT_OK(create_user_ns(), "priv new user ns"); + + cap_disable_effective(cap_mask, &old_caps); + + ASSERT_EQ(create_user_ns(), EPERM, "unpriv new user ns"); + + if (cap_mask & old_caps) + cap_enable_effective(cap_mask, NULL); +} + +static void test_unpriv_userns_create_no_bpf(void) +{ + __u32 cap_mask = 1ULL << CAP_SYS_ADMIN; + __u64 old_caps = 0; + + cap_disable_effective(cap_mask, &old_caps); + + ASSERT_OK(create_user_ns(), "no-bpf unpriv new user ns"); + + if (cap_mask & old_caps) + cap_enable_effective(cap_mask, NULL); +} + +void test_deny_namespace(void) +{ + struct test_deny_namespace *skel = NULL; + int err; + + if (test__start_subtest("unpriv_userns_create_no_bpf")) + test_unpriv_userns_create_no_bpf(); + + skel = test_deny_namespace__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel load")) + goto close_prog; + + err = test_deny_namespace__attach(skel); + if (!ASSERT_OK(err, "attach")) + goto close_prog; + + if (test__start_subtest("userns_create_bpf")) + test_userns_create_bpf(); + + test_deny_namespace__detach(skel); + +close_prog: + test_deny_namespace__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_deny_namespace.c b/tools/testing/selftests/bpf/progs/test_deny_namespace.c new file mode 100644 index 000000000000..09ad5a4ebd1f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_deny_namespace.c @@ -0,0 +1,33 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <errno.h> +#include <linux/capability.h> + +struct kernel_cap_struct { + __u32 cap[_LINUX_CAPABILITY_U32S_3]; +} __attribute__((preserve_access_index)); + +struct cred { + struct kernel_cap_struct cap_effective; +} __attribute__((preserve_access_index)); + +char _license[] SEC("license") = "GPL"; + +SEC("lsm.s/userns_create") +int BPF_PROG(test_userns_create, const struct cred *cred, int ret) +{ + struct kernel_cap_struct caps = cred->cap_effective; + int cap_index = CAP_TO_INDEX(CAP_SYS_ADMIN); + __u32 cap_mask = CAP_TO_MASK(CAP_SYS_ADMIN); + + if (ret) + return 0; + + ret = -EPERM; + if (caps.cap[cap_index] & cap_mask) + return 0; + + return -EPERM; +}
On Mon, Aug 1, 2022 at 8:02 PM Frederick Lawler fred@cloudflare.com wrote:
The LSM hook userns_create was introduced to provide LSM's an opportunity to block or allow unprivileged user namespace creation. This test serves two purposes: it provides a test eBPF implementation, and tests the hook successfully blocks or allows user namespace creation.
This tests 3 cases:
1. Unattached bpf program does not block unpriv user namespace creation. 2. Attached bpf program allows user namespace creation given CAP_SYS_ADMIN privileges. 3. Attached bpf program denies user namespace creation for a user without CAP_SYS_ADMIN.
Signed-off-by: Frederick Lawler fred@cloudflare.com
Looks good to me (Also checked it on vmtest.sh)
Acked-by: KP Singh kpsingh@kernel.org
Unprivileged user namespace creation is an intended feature to enable sandboxing, however this feature is often used to as an initial step to perform a privilege escalation attack.
This patch implements a new user_namespace { create } access control permission to restrict which domains allow or deny user namespace creation. This is necessary for system administrators to quickly protect their systems while waiting for vulnerability patches to be applied.
This permission can be used in the following way:
allow domA_t domA_t : user_namespace { create };
Signed-off-by: Frederick Lawler fred@cloudflare.com
--- Changes since v3: - None Changes since v2: - Rename create_user_ns hook to userns_create - Use user_namespace as an object opposed to a generic namespace object - s/domB_t/domA_t in commit message Changes since v1: - Introduce this patch --- security/selinux/hooks.c | 9 +++++++++ security/selinux/include/classmap.h | 2 ++ 2 files changed, 11 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index beceb89f68d9..afc9da0249e7 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4227,6 +4227,14 @@ static void selinux_task_to_inode(struct task_struct *p, spin_unlock(&isec->lock); }
+static int selinux_userns_create(const struct cred *cred) +{ + u32 sid = current_sid(); + + return avc_has_perm(&selinux_state, sid, sid, SECCLASS_USER_NAMESPACE, + USER_NAMESPACE__CREATE, NULL); +} + /* Returns error only if unable to parse addresses */ static int selinux_parse_skb_ipv4(struct sk_buff *skb, struct common_audit_data *ad, u8 *proto) @@ -7117,6 +7125,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(task_movememory, selinux_task_movememory), LSM_HOOK_INIT(task_kill, selinux_task_kill), LSM_HOOK_INIT(task_to_inode, selinux_task_to_inode), + LSM_HOOK_INIT(userns_create, selinux_userns_create),
LSM_HOOK_INIT(ipc_permission, selinux_ipc_permission), LSM_HOOK_INIT(ipc_getsecid, selinux_ipc_getsecid), diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index ff757ae5f253..0bff55bb9cde 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -254,6 +254,8 @@ const struct security_class_mapping secclass_map[] = { { COMMON_FILE_PERMS, NULL } }, { "io_uring", { "override_creds", "sqpoll", NULL } }, + { "user_namespace", + { "create", NULL } }, { NULL } };
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
The calls look something like this:
cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred)
We noticed that error codes were not propagated from this hook and introduced a patch [1] to propagate those errors.
The discussion notes that security_prepare_creds() is not appropriate for MAC policies, and instead the hook is meant for LSM authors to prepare credentials for mutation. [2]
Ultimately, we concluded that a better course of action is to introduce a new security hook for LSM authors. [3]
This patch set first introduces a new security_create_user_ns() function and userns_create LSM hook, then marks the hook as sleepable in BPF.
Links:
- https://lore.kernel.org/all/20220608150942.776446-1-fred@cloudflare.com/
- https://lore.kernel.org/all/87y1xzyhub.fsf@email.froward.int.ebiederm.org/
- https://lore.kernel.org/all/9fe9cd9f-1ded-a179-8ded-5fde8960a586@cloudflare....
Past discussions: V3: https://lore.kernel.org/all/20220721172808.585539-1-fred@cloudflare.com/ V2: https://lore.kernel.org/all/20220707223228.1940249-1-fred@cloudflare.com/ V1: https://lore.kernel.org/all/20220621233939.993579-1-fred@cloudflare.com/
Changes since v3:
- Explicitly set CAP_SYS_ADMIN to test namespace is created given permission
- Simplify BPF test to use sleepable hook only
- Prefer unshare() over clone() for tests
Changes since v2:
- Rename create_user_ns hook to userns_create
- Use user_namespace as an object opposed to a generic namespace object
- s/domB_t/domA_t in commit message
Changes since v1:
- Add selftests/bpf: Add tests verifying bpf lsm create_user_ns hook patch
- Add selinux: Implement create_user_ns hook patch
- Change function signature of security_create_user_ns() to only take struct cred
- Move security_create_user_ns() call after id mapping check in create_user_ns()
- Update documentation to reflect changes
Frederick Lawler (4): security, lsm: Introduce security_create_user_ns() bpf-lsm: Make bpf_lsm_userns_create() sleepable selftests/bpf: Add tests verifying bpf lsm userns_create hook selinux: Implement userns_create hook
include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 + include/linux/security.h | 6 ++ kernel/bpf/bpf_lsm.c | 1 + kernel/user_namespace.c | 5 + security/security.c | 5 + security/selinux/hooks.c | 9 ++ security/selinux/include/classmap.h | 2 + .../selftests/bpf/prog_tests/deny_namespace.c | 102 ++++++++++++++++++ .../selftests/bpf/progs/test_deny_namespace.c | 33 ++++++ 10 files changed, 168 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/deny_namespace.c create mode 100644 tools/testing/selftests/bpf/progs/test_deny_namespace.c
Eric
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
We've heard from different users now that there are very real use cases for this LSM hook. I understand you are concerned about adding additional controls to user namespaces, but these are controls requested by real users, and the controls being requested (LSM hooks, with BPF and SELinux implementations) are configurable by the *users* at *runtime*. This patchset does not force additional restrictions on user namespaces, it provides a mechanism that *users* can leverage to add additional granularity to the access controls surrounding user namespaces.
Eric, if you have a different approach in mind to adding a LSM hook to user namespace creation I think we would all very much like to hear about it. However, if you do not have any suggestions along those lines, and simply want to NACK any effort to add a LSM hook to user namespace creation, I think we all understand your point of view and respectfully disagree. Barring any new approaches or suggestions, I think Frederick's patches look reasonable and I still plan on merging them into the LSM next branch when the merge window closes.
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
It is not a constructive approach to shoot the messenger and is not a constructive approach to blow me off every time you reply.
I have proposed that is there is a subsystem that is unduly buggy we stop it from being enabled with a user-namespaces.
Further this is a hook really should have extra-ordinary requirements, as all it can do is add additional failure modes to something that does not really fail. AKA all it can do is break-userspace.
As such I need to see a justification on why it makes sense to break-userspace.
We've heard from different users now that there are very real use cases for this LSM hook. I understand you are concerned about adding additional controls to user namespaces, but these are controls requested by real users, and the controls being requested (LSM hooks, with BPF and SELinux implementations) are configurable by the *users* at *runtime*. This patchset does not force additional restrictions on user namespaces, it provides a mechanism that *users* can leverage to add additional granularity to the access controls surrounding user namespaces.
But that is not the problem that cloudfare encountered and are trying to solve.
At least that is not what I was told when I asked early in the review cycle.
All saying that is user-configurable does is shift the blame from the kernel maintainers to the users. Shift the responsibility from people who should have enough expertise to know what is going on to people who are by definition have other concerns, so are less likely to be as well informed, and less likely to come up with good solutions.
Eric, if you have a different approach in mind to adding a LSM hook to user namespace creation I think we would all very much like to hear about it. However, if you do not have any suggestions along those lines, and simply want to NACK any effort to add a LSM hook to user namespace creation, I think we all understand your point of view and respectfully disagree. Barring any new approaches or suggestions, I think Frederick's patches look reasonable and I still plan on merging them into the LSM next branch when the merge window closes.
But it is my code you are planning to merge this into, and your are asking me to support something.
I admit I have not had time to read everything. I am sick and tired and quite frankly very tired that people are busy wanting to shoot the messenger to the fact that there are bugs in the kernel.
I am speaking up and engaging as best as I can with objections that are not hot-air.
You are very much proposing to merge code that can only cause regressions and cause me grief. At least that is all I see. I don't see anything in the change descriptions of the change that refutes that.
I don't see any interaction in fact with my concerns.
In fact your last reply was to completely blow off my request on how to address the concerns that inspired this patch and to say other people have a use too.
At this point I am happy to turn your request around and ask that you address my concerns and not blow them off. As I have seen no constructive engagement with my concerns. I think that is reasonable as by definition I will get the support issues when some LSM has some ill-thought out idea of how things should work and I get the bug report.
Eric
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
Eric
"Eric W. Biederman" ebiederm@xmission.com writes:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded.
I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here".
Frankly I would love to see an argument that semantically it ever makes sense for creating a user namespace to fail. If that argument has already been made, my apologies to the person who made as I missed it, in being sick and tired, and frustrated at being blown off, when I asked for a proper discuss of the problem at hand.
Eric
On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman ebiederm@xmission.com wrote:
"Eric W. Biederman" ebiederm@xmission.com writes:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded.
I think we are all familiar with the sendmail capabilities bug and the others like it, but using that as an excuse to block additional access controls seems very weak. The Linux Kernel is very different from when the sendmail bug hit (what was that, ~20 years ago?), with advancements in capabilities and other discretionary controls, as well as mandatory access controls which have enabled Linux to be certified through a number of third party security evaluations.
I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here".
Let me fix that for you: "There are multiple users who want to have better visibility and access control for user namespace creation."
-- paul-moore.com
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman ebiederm@xmission.com wrote:
"Eric W. Biederman" ebiederm@xmission.com writes:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded.
I think we are all familiar with the sendmail capabilities bug and the others like it, but using that as an excuse to block additional access controls seems very weak. The Linux Kernel is very different from when the sendmail bug hit (what was that, ~20 years ago?), with advancements in capabilities and other discretionary controls, as well as mandatory access controls which have enabled Linux to be certified through a number of third party security evaluations.
If you are familiar with scenarios like that then why is there not being due diligence performed to ensure that userspace won't break?
Certainly none of the paperwork you are talking about does that kind of checking and it most definitely is not happening before the code gets merged.
I am saying that performing that due diligence should be a requirement before anyone even thinks about merging a patch that adds permission checks where no existed before.
Sometimes changes to fix security bugs can get away with adding new restrictions because we know with a very very high degree of probability that the only thing that will break will be exploit code. In the rare case when real world applications are broken such changes need to be reverted or adapted. No one has even made the argument that only exploit code will be affected.
So I am sorry I am the one who has to be the one to get in the way of a broken process with semantic review, but due diligence has not been done. So I am say no way this code should be merged.
In addition to actually breaking existing userspace, I think there is a very real danger of breaking userspace, I think there is a very real danger of breaking network effects by making such a large change to the design of user namespaces.
I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here".
Let me fix that for you: "There are multiple users who want to have better visibility and access control for user namespace creation."
Visibility sure. Design a proper hook for that. All the proposed hook can do is print an audit message. It can't allocate or manage any state as there is not the corresponding hook when a user namespace is freed. So the proposed hook is not appropriate for increasing visibility.
Access control. Not a chance unless it is carefully designed and reviewed. There is a very large cost to adding access control where it has not previously existed.
I talk about that cost as people breaking my users as that is how I see it. I don't see any discussion on why I am wrong.
If we are going to add an access controls I want to see someone point out something that is actually semantically a problem. What motivates an access control?
So far the only answer I have received is people want to reduce the attack surface of the kernel. I don't possibly see how reducing the attack surface by removing user namespaces makes the probability of having an exploitable kernel bug, anything approaching zero.
So I look at the calculus. Chance of actually breaking userspace, or preventing people with a legitimate use from using user namespaces > 0%. Chance of actually preventing a determined attacker from exploiting the kernel < 1%. Amount of work to maintain, non-zero, and I really don't like it.
Lots of work to achieve nothing but breaking some of my users.
So please stop trying to redesign my subsystem and cause me headaches, unless you are going to do the due diligence necessary to do so responsibly.
Eric
On Tue, Aug 9, 2022 at 12:08 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman ebiederm@xmission.com wrote:
"Eric W. Biederman" ebiederm@xmission.com writes:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded.
I think we are all familiar with the sendmail capabilities bug and the others like it, but using that as an excuse to block additional access controls seems very weak. The Linux Kernel is very different from when the sendmail bug hit (what was that, ~20 years ago?), with advancements in capabilities and other discretionary controls, as well as mandatory access controls which have enabled Linux to be certified through a number of third party security evaluations.
If you are familiar with scenarios like that then why is there not being due diligence performed to ensure that userspace won't break?
What level of due diligence would satisfy you Eric? This feels very much like an unbounded ask that can be used to permanently block any effort to add any form of additional access control around user namespace creation. If that isn't the case, and this request is being made in good faith, please elaborate on what you believe would be sufficient analysis of this patch?
Personally, the fact that the fork()/clone() variants and the unshare() syscall all can fail and return error codes to userspace seems like a reasonable level of safety. If userspace is currently ignoring the return values of fork/clone/unshare that is a problem independent of this patchset. Even looking at the existing create_user_ns() function shows several cases where the user namespace code can forcibly error out due to access controls, memory pressure, etc. I also see that additional restrictions were put on user namespace creation after it was introduced, e.g. the chroot restriction; I'm assuming that didn't break "your" users? If you can describe the analysis you did on that change, perhaps we can do the same for this patchset and call it sufficient, yes?
I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here".
Let me fix that for you: "There are multiple users who want to have better visibility and access control for user namespace creation."
Visibility sure. Design a proper hook for that. All the proposed hook can do is print an audit message. It can't allocate or manage any state as there is not the corresponding hook when a user namespace is freed. So the proposed hook is not appropriate for increasing visibility.
Auditing very much increases visibility. Look at what the BPF LSM can do, observability is one of its primary goals.
Access control. Not a chance unless it is carefully designed and reviewed.
See the above. The relevant syscalls already have the risk of failure, if userspace is assuming fork/clone/unshare/etc. will never fail then that application is broken independent of this discussion.
Paul Moore paul@paul-moore.com writes:
What level of due diligence would satisfy you Eric?
Having a real conversation about what a change is doing and to talk about it's merits and it's pro's and cons. I can't promise I would be convinced but that is the kind of conversation it would take.
I was not trying to place an insurmountable barrier I was simply looking to see if people had been being careful and doing what is generally accepted for submitting a kernel patch. From all I can see that has completely not happened here.
If that isn't the case, and this request is being made in good faith
Again you are calling me a liar. I really don't appreciate that.
As for something already returning an error. The setuid system call also has error returns, and enforcing RLIMIT_NPROC caused sendmail to misbehave.
I bring up the past in this way only to illustrate that things can happen. That simply examining the kernel and not thinking about userspace really isn't enough.
I am also concerned about the ecosystem effects of adding random access control checks to a system call that does not perform access control checks.
As I said this patch is changing a rather fundamental design decision by adding an access control. A design decision that for the most part has worked out quite well, and has allowed applications to add sandboxing support to themselves without asking permission to anyone.
Adding an access control all of a sudden means application developers are having to ask for permission to things that are perfectly safe, and it means many parts of the kernel gets less love both in use and in maintenance.
It might be possible to convince me that design decision needs to change, or that what is being proposed is small enough it does not practically change that design decision.
Calling me a liar is not the way to change my mind. Ignoring me and pushing this through without addressing my concerns is not the way to change my mind.
I honestly I want what I asked for at the start. I want discussion of what problems are being solved so we can talk about the problem or problems and if this is even the appropriate solution to them.
Eric
On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
What level of due diligence would satisfy you Eric?
Having a real conversation about what a change is doing and to talk about it's merits and it's pro's and cons. I can't promise I would be convinced but that is the kind of conversation it would take.
Earlier today you talked about due diligence to ensure that userspace won't break and I provided my reasoning on why userspace would not break (at least not because of this change). Userspace might be blocked from creating a new user namespace due to a security policy, but that would be the expected and desired outcome, not breakage. As far as your most recent comment regarding merit and pros/cons, I believe we have had that discussion (quite a few times already); it just seems you are not satisfied with the majority's conclusion.
Personally, I'm not sure there is anything more I can do to convince you that this patchset is reasonable; I'm going to leave it to others at this point, or we can all simply agree to disagree for the moment. Just as you haven't heard a compelling argument for this patchset, I haven't heard a compelling argument against it. Barring some significant new discussion point, or opinion, I still plan on merging this into the LSM next branch when the merge window closes next week so it has time to go through a full round of linux-next testing. Assuming no unresolvable problems are found during the additional testing I plan to send it to Linus during the v6.1 merge window and I'm guessing we will get to go through this all again. It's less than ideal, but I think this is where we are at right now.
On Tue, Aug 9, 2022 at 3:40 PM Paul Moore paul@paul-moore.com wrote:
On Tue, Aug 9, 2022 at 5:41 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
What level of due diligence would satisfy you Eric?
Having a real conversation about what a change is doing and to talk about it's merits and it's pro's and cons. I can't promise I would be convinced but that is the kind of conversation it would take.
Earlier today you talked about due diligence to ensure that userspace won't break and I provided my reasoning on why userspace would not break (at least not because of this change). Userspace might be blocked from creating a new user namespace due to a security policy, but that would be the expected and desired outcome, not breakage. As far as your most recent comment regarding merit and pros/cons, I believe we have had that discussion (quite a few times already); it just seems you are not satisfied with the majority's conclusion.
Personally, I'm not sure there is anything more I can do to convince you that this patchset is reasonable; I'm going to leave it to others at this point, or we can all simply agree to disagree for the moment. Just as you haven't heard a compelling argument for this patchset, I haven't heard a compelling argument against it. Barring some significant new discussion point, or opinion, I still plan on merging this into the LSM next branch when the merge window closes next week so it has time to go through a full round of linux-next testing. Assuming no unresolvable problems are found during the additional testing I plan to send it to Linus during the v6.1 merge window and I'm guessing we will get to go through this all again. It's less than ideal, but I think this is where we are at right now.
+1
On 8/9/2022 9:07 AM, Eric W. Biederman wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 8, 2022 at 3:43 PM Eric W. Biederman ebiederm@xmission.com wrote:
"Eric W. Biederman" ebiederm@xmission.com writes:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
You just artificially constrained the problems, so that no other solution is acceptable. On that basis alone I am object to this whole approach to steam roll over me and my code.
If you want an example of what kind of harm it can cause to introduce a failure where no failure was before I invite you to look at what happened with sendmail when setuid was modified to fail, when changing the user of a process would cause RLIMIT_NPROC to be exceeded.
I think we are all familiar with the sendmail capabilities bug and the others like it, but using that as an excuse to block additional access controls seems very weak. The Linux Kernel is very different from when the sendmail bug hit (what was that, ~20 years ago?), with advancements in capabilities and other discretionary controls, as well as mandatory access controls which have enabled Linux to be certified through a number of third party security evaluations.
If you are familiar with scenarios like that then why is there not being due diligence performed to ensure that userspace won't break?
Certainly none of the paperwork you are talking about does that kind of checking and it most definitely is not happening before the code gets merged.
I am saying that performing that due diligence should be a requirement before anyone even thinks about merging a patch that adds permission checks where no existed before.
Sometimes changes to fix security bugs can get away with adding new restrictions because we know with a very very high degree of probability that the only thing that will break will be exploit code. In the rare case when real world applications are broken such changes need to be reverted or adapted. No one has even made the argument that only exploit code will be affected.
So I am sorry I am the one who has to be the one to get in the way of a broken process with semantic review, but due diligence has not been done. So I am say no way this code should be merged.
In addition to actually breaking existing userspace, I think there is a very real danger of breaking userspace, I think there is a very real danger of breaking network effects by making such a large change to the design of user namespaces.
I am not arguing that what you are proposing is that bad but unexpected failures cause real problems, and at a minimum that needs a better response than: "There is at least one user that wants a failure here".
Let me fix that for you: "There are multiple users who want to have better visibility and access control for user namespace creation."
Visibility sure. Design a proper hook for that. All the proposed hook can do is print an audit message. It can't allocate or manage any state as there is not the corresponding hook when a user namespace is freed. So the proposed hook is not appropriate for increasing visibility.
Access control. Not a chance unless it is carefully designed and reviewed. There is a very large cost to adding access control where it has not previously existed.
I talk about that cost as people breaking my users as that is how I see it. I don't see any discussion on why I am wrong.
If we are going to add an access controls I want to see someone point out something that is actually semantically a problem. What motivates an access control?
Smack has no interest in using the proposed hook because user namespaces are neither subjects nor objects. They are collections of DAC and/or privilege configuration alternatives. Or something like that. From the viewpoint of a security module that only implements old fashioned MAC there is no value in constraining user namespaces.
SELinux, on the other hand, seeks to be comprehensive well beyond controlling accesses between subjects and objects. Asking the question "should there be a control on this operation?" seems sufficient to justify adding the control to SELinux policy. This is characteristic of "Fine Grain" control.
So I'm of two minds on this. I don't need the hook, but I also understand why SELinux and BPF want it. I don't necessarily agree with their logic, but it is consistent with existing behavior. Any system that uses either of those security modules needs to be ready for unexpected denials based on any potential security concern. Keeping namespaces completely orthogonal to LSM seems doomed to failure eventually.
So far the only answer I have received is people want to reduce the attack surface of the kernel. I don't possibly see how reducing the attack surface by removing user namespaces makes the probability of having an exploitable kernel bug, anything approaching zero.
So I look at the calculus. Chance of actually breaking userspace, or preventing people with a legitimate use from using user namespaces > 0%. Chance of actually preventing a determined attacker from exploiting the kernel < 1%. Amount of work to maintain, non-zero, and I really don't like it.
Lots of work to achieve nothing but breaking some of my users.
So please stop trying to redesign my subsystem and cause me headaches, unless you are going to do the due diligence necessary to do so responsibly.
Eric
Casey Schaufler casey@schaufler-ca.com writes:
Smack has no interest in using the proposed hook because user namespaces are neither subjects nor objects. They are collections of DAC and/or privilege configuration alternatives. Or something like that. From the viewpoint of a security module that only implements old fashioned MAC there is no value in constraining user namespaces.
The goal is to simply enable things that become safe when you don't have to worry about confusing setuid executables.
SELinux, on the other hand, seeks to be comprehensive well beyond controlling accesses between subjects and objects. Asking the question "should there be a control on this operation?" seems sufficient to justify adding the control to SELinux policy. This is characteristic of "Fine Grain" control.
I see that from a theoretical standpoint. On the flip side I prefer arguments grounded in something more than what an abstract framework could appreciate. We are so far from any theoretical purity in the linux kernel that I can't see theoretical purity being enough to justify a decision like this.
So I'm of two minds on this. I don't need the hook, but I also understand why SELinux and BPF want it. I don't necessarily agree with their logic, but it is consistent with existing behavior. Any system that uses either of those security modules needs to be ready for unexpected denials based on any potential security concern. Keeping namespaces completely orthogonal to LSM seems doomed to failure eventually.
I can cross that bridge when there is a healthy conversation about such a change.
Too often I get "ouch! Creating a user namespace was used as the easiest way to exploit a security bug. Let's solve the issue by denying user namespaces." Which leads to half thought out policies made out of fear.
Which is where I think this conversation started. I haven't seen it make it's way to any healthy reasons to deny user namespaces yet.
Eric
On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
I've heard you talk about bugs being the only reason why people would want to ever block user namespaces, but I think we've all seen use cases now where it goes beyond that. However, even if it didn't, the need to build high confidence/assurance systems where big chunks of functionality can be disabled based on a security policy is a very real use case, and this patchset would help enable that. I've noticed you like to talk about these hooks being a source of "regressions", but access controls are not regressions Eric, they are tools that system builders, administrators, and users use to secure their systems.
From my perspective, I believe that addresses your feedback around "fix the bugs" and "this is a regression", which is the only thing I've noted from your responses in this thread and others, but if I'm missing something more technical please let me/us know.
You just artificially constrained the problems, so that no other solution is acceptable.
There is a real need to be able to gain both additional visibility and access control over user namespace creation, please suggest the approach(es) you would find acceptable.
On that basis alone I am object to this whole approach to steam roll over me and my code.
I saw that choice of wording in your last email and thought it a bit curious, so I did a quick git log dump on kernel/user_namespace.c and I see approximately 31 contributors to that one file. I've always thought of the open source maintainer role as more of a "steward" and less of an "owner", but that's just my opinion.
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 8, 2022 at 3:26 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Please stop, ignoring my feedback, not detailing what problem or problems you are actually trying to be solved, and threatening to merge code into files that I maintain that has the express purpose of breaking my users.
I've heard you talk about bugs being the only reason why people would want to ever block user namespaces, but I think we've all seen use cases now where it goes beyond that.
I really have not, and I don't appreciate being called a liar.
However, even if it didn't, the need to build high confidence/assurance systems where big chunks of functionality can be disabled based on a security policy is a very real use case, and this patchset would help enable that.
Details please. What does this look like. What is the overall plan for attack surface reduction in one of these systems. How does it differ from seccomp?
How does this differ from setting /proc/sys/user/max_usernamespaces to 0?
Why is it only the user namespace that needs to be modified to implement such a system?
Why is there no discussion of that in the change description.
I've noticed you like to talk about these hooks being a source of "regressions", but access controls are not regressions Eric, they are tools that system builders, administrators, and users use to secure their systems.
From my perspective, I believe that addresses your feedback around "fix the bugs" and "this is a regression", which is the only thing I've noted from your responses in this thread and others, but if I'm missing something more technical please let me/us know.
Which is a short way of saying that the using this hook for attack surface reduction without a larger plan will be ineffective. If the attack surface is not sufficiently reduced it will not achieve a prevention of exploits and the attacks will still happen and be successful.
With a change that is designed to prevent exploits not actually doing so all that is left is breaking userspace and causing maintenance problems.
Earlier I asked to confirm that was the only reason cloudfare was interested in this change. I have asked that we have an actual conversation about what is trying to be achieved.
Instead the conversation has simply been about implementation issues and not about if the code will be worth having. So far in my book the code very much does not look worth having. That is my technical judgment and I don't see anyone taking about my arguments or even really engaging in them.
Since I keep getting blown off, instead of having my concerns addressed I say this code should not go.
You just artificially constrained the problems, so that no other solution is acceptable.
There is a real need to be able to gain both additional visibility and access control over user namespace creation, please suggest the approach(es) you would find acceptable.
The suggested hook is not at all appropriate for visibility. Either the user namespace needs to have some state that can be set, or there needs to be something that is notified when the user namespace goes away. At best the hook can print an audit message. So the proposed hook is really not appropriate to add visibility to the user namespace.
For the record I don't object to adding visibility, I am just pointing out the proposed hook is not appropriate to that task.
What is the need to have an access control?
Why do you need to fundamentally change the design of user namespaces?
Those are questions I have not seen any answers to. Without actual answers of what the actual problems are I can't have a reasonable technical conversation.
On that basis alone I am object to this whole approach to steam roll over me and my code.
I saw that choice of wording in your last email and thought it a bit curious, so I did a quick git log dump on kernel/user_namespace.c and I see approximately 31 contributors to that one file. I've always thought of the open source maintainer role as more of a "steward" and less of an "owner", but that's just my opinion.
As such it is unfortunately my responsibility to say no to badly thought out proposals. Proposals that will negatively affect the people using the code I maintain.
My apologies if I have not been more elegant right now when I have been constantly sick, and tired. People getting scared of user namespaces for no real reason has been an on-going trend for a decade or so. This isn't a new issue, and it irritates me that it is still going on. I have addressed real concerns and fixed code, for many many years.
This round of the people being afraid of user namespaces, I have yet to find any real concerns.
So when I express my concerns that this is a pointless exercise and people don't address my concern. I say no.
Eric
On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Regarding the observability - can someone concisely lay out why just auditing userns creation would not suffice? Userspace could decide what to report based on whether the creating user_ns == /proc/1/ns/user...
Regarding limiting the tweaking of otherwise-privileged code by unprivileged users, i wonder whether we could instead add smarts to ns_capable(). Point being, uid mapping would still work, but we'd break the "privileged against resources you own" part of user namespaces. I would want it to default to allow, but then when a 0-day is found which requires reaching ns_capable() code, admins could easily prevent exploitation until reboot from a fixed kernel.
On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn serge@hallyn.com wrote:
On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
While creating a LSM BPF MAC policy to block user namespace creation, we used the LSM cred_prepare hook because that is the closest hook to prevent a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Regarding the observability - can someone concisely lay out why just auditing userns creation would not suffice? Userspace could decide what to report based on whether the creating user_ns == /proc/1/ns/user...
One of the selling points of the BPF LSM is that it allows for various different ways of reporting and logging beyond audit. However, even if it was limited to just audit I believe that provides some useful justification as auditing fork()/clone() isn't quite the same and could be difficult to do at scale in some configurations. I haven't personally added a BPF LSM program to the kernel so I can't speak to the details on what is possible, but I'm sure others on the To/CC line could help provide more information if that is important to you.
Regarding limiting the tweaking of otherwise-privileged code by unprivileged users, i wonder whether we could instead add smarts to ns_capable().
The existing security_capable() hook is eventually called by ns_capable():
ns_capable() ns_capable_common() security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts);
... I'm not sure what additional smarts would be useful here?
[side note: SELinux does actually distinguish between capability checks in the initial user namespace vs child namespaces]
Point being, uid mapping would still work, but we'd break the "privileged against resources you own" part of user namespaces. I would want it to default to allow, but then when a 0-day is found which requires reaching ns_capable() code, admins could easily prevent exploitation until reboot from a fixed kernel.
That assumes that everything you care about is behind a capability check, which is probably going to be correct in a lot of the cases, but I think it would be a mistake to assume that is always going to be true.
-- paul-moore.com
On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote:
On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn serge@hallyn.com wrote:
On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Frederick Lawler fred@cloudflare.com writes:
> While creating a LSM BPF MAC policy to block user namespace creation, we > used the LSM cred_prepare hook because that is the closest hook to prevent > a call to create_user_ns().
Re-nack for all of the same reasons. AKA This can only break the users of the user namespace.
Nacked-by: "Eric W. Biederman" ebiederm@xmission.com
You aren't fixing what your problem you are papering over it by denying access to the user namespace.
Nack Nack Nack.
Stop.
Go back to the drawing board.
Do not pass go.
Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Regarding the observability - can someone concisely lay out why just auditing userns creation would not suffice? Userspace could decide what to report based on whether the creating user_ns == /proc/1/ns/user...
One of the selling points of the BPF LSM is that it allows for various different ways of reporting and logging beyond audit. However, even if it was limited to just audit I believe that provides some useful justification as auditing fork()/clone() isn't quite the same and could be difficult to do at scale in some configurations. I haven't personally added a BPF LSM program to the kernel so I can't speak to the details on what is possible, but I'm sure others on the To/CC line could help provide more information if that is important to you.
Regarding limiting the tweaking of otherwise-privileged code by unprivileged users, i wonder whether we could instead add smarts to ns_capable().
The existing security_capable() hook is eventually called by ns_capable():
ns_capable() ns_capable_common() security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts);
... I'm not sure what additional smarts would be useful here?
Oh - i wasn't necessarily thinking of an LSM. I was picturing a sysctl next to unprivileged_userns_clone. But you're right, looks like an LSM could already do this. Of course, there's an issue early on in that the root user in the new namespace couldn't setuid, so the uid mapping is still limited. So this idea probably isn't worth the characters we've typed about it so far, sorry.
[side note: SELinux does actually distinguish between capability checks in the initial user namespace vs child namespaces]
Point being, uid mapping would still work, but we'd break the "privileged against resources you own" part of user namespaces. I would want it to default to allow, but then when a 0-day is found which requires reaching ns_capable() code, admins could easily prevent exploitation until reboot from a fixed kernel.
That assumes that everything you care about is behind a capability check, which is probably going to be correct in a lot of the cases, but I think it would be a mistake to assume that is always going to be true.
I might be thinking wrongly, but if it's not behind a capability check, then it seems to me it's not an exploit that can only be reached by becoming root in a user namespace, which means disabling user namespace creation by unprivileged users will not stop the attack.
On Mon, Aug 15, 2022 at 11:41 AM Serge E. Hallyn serge@hallyn.com wrote:
On Sun, Aug 14, 2022 at 10:32:51PM -0400, Paul Moore wrote:
On Sun, Aug 14, 2022 at 11:55 AM Serge E. Hallyn serge@hallyn.com wrote:
On Mon, Aug 08, 2022 at 03:16:16PM -0400, Paul Moore wrote:
On Mon, Aug 8, 2022 at 2:56 PM Eric W. Biederman ebiederm@xmission.com wrote:
Paul Moore paul@paul-moore.com writes:
On Mon, Aug 1, 2022 at 10:56 PM Eric W. Biederman ebiederm@xmission.com wrote: > Frederick Lawler fred@cloudflare.com writes: > > > While creating a LSM BPF MAC policy to block user namespace creation, we > > used the LSM cred_prepare hook because that is the closest hook to prevent > > a call to create_user_ns(). > > Re-nack for all of the same reasons. > AKA This can only break the users of the user namespace. > > Nacked-by: "Eric W. Biederman" ebiederm@xmission.com > > You aren't fixing what your problem you are papering over it by denying > access to the user namespace. > > Nack Nack Nack. > > Stop. > > Go back to the drawing board. > > Do not pass go. > > Do not collect $200.
If you want us to take your comments seriously Eric, you need to provide the list with some constructive feedback that would allow Frederick to move forward with a solution to the use case that has been proposed. You response above may be many things, but it is certainly not that.
I did provide constructive feedback. My feedback to his problem was to address the real problem of bugs in the kernel.
We've heard from several people who have use cases which require adding LSM-level access controls and observability to user namespace creation. This is the problem we are trying to solve here; if you do not like the approach proposed in this patchset please suggest another implementation that allows LSMs visibility into user namespace creation.
Regarding the observability - can someone concisely lay out why just auditing userns creation would not suffice? Userspace could decide what to report based on whether the creating user_ns == /proc/1/ns/user...
One of the selling points of the BPF LSM is that it allows for various different ways of reporting and logging beyond audit. However, even if it was limited to just audit I believe that provides some useful justification as auditing fork()/clone() isn't quite the same and could be difficult to do at scale in some configurations. I haven't personally added a BPF LSM program to the kernel so I can't speak to the details on what is possible, but I'm sure others on the To/CC line could help provide more information if that is important to you.
Regarding limiting the tweaking of otherwise-privileged code by unprivileged users, i wonder whether we could instead add smarts to ns_capable().
The existing security_capable() hook is eventually called by ns_capable():
ns_capable() ns_capable_common() security_capable(const struct cred *cred, struct user_namespace *ns, int cap, unsigned int opts);
... I'm not sure what additional smarts would be useful here?
Oh - i wasn't necessarily thinking of an LSM. I was picturing a sysctl next to unprivileged_userns_clone. But you're right, looks like an LSM could already do this. Of course, there's an issue early on in that the root user in the new namespace couldn't setuid, so the uid mapping is still limited. So this idea probably isn't worth the characters we've typed about it so far, sorry.
No harm, no foul. This thread has already reached record lows with respect to usefulness-vs-characters ratio, a few more isn't going to hurt anything ;)
[side note: SELinux does actually distinguish between capability checks in the initial user namespace vs child namespaces]
Point being, uid mapping would still work, but we'd break the "privileged against resources you own" part of user namespaces. I would want it to default to allow, but then when a 0-day is found which requires reaching ns_capable() code, admins could easily prevent exploitation until reboot from a fixed kernel.
That assumes that everything you care about is behind a capability check, which is probably going to be correct in a lot of the cases, but I think it would be a mistake to assume that is always going to be true.
I might be thinking wrongly, but if it's not behind a capability check, then it seems to me it's not an exploit that can only be reached by becoming root in a user namespace, which means disabling user namespace creation by unprivileged users will not stop the attack.
I was primarily thinking about two things: subj/obj relationships which are really not addressed with capability checks, and unrelated problems which aren't the fault of the user namespace but could be somehow made easier through some of the unique situations offered by user namespaces. There are exploits that often require chaining together multiple "things" to trigger the necessary flaw, and sometimes the most immediate way to stop such an attack is to apply additional controls to one of these intermediate steps. Frekerick's work puts the necessary infrastructure in place so we can do that with user namespaces.
linux-kselftest-mirror@lists.linaro.org