On Thu, Aug 25, 2022 at 6:42 PM Song Liu songliubraving@fb.com wrote:
On Aug 25, 2022, at 3:10 PM, Paul Moore paul@paul-moore.com wrote:
On Thu, Aug 25, 2022 at 5:58 PM Song Liu songliubraving@fb.com wrote:
...
I am new to user_namespace and security work, so please pardon me if anything below is very wrong.
IIUC, user_namespace is a tool that enables trusted userspace code to control the behavior of untrusted (or less trusted) userspace code. Failing create_user_ns() doesn't make the system more reliable. Specifically, we call create_user_ns() via two paths: fork/clone and unshare. For both paths, we need the userspace to use user_namespace, and to honor failed create_user_ns().
On the other hand, I would echo that killing the process is not practical in some use cases. Specifically, allowing the application to run in a less secure environment for a short period of time might be much better than killing it and taking down the whole service. Of course, there are other cases where security is more important, and taking down the whole service is the better choice.
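To make the trade-off above concrete, here is a minimal sketch of how a trusted launcher might honor a failed unshare(CLONE_NEWUSER) under either choice. The names userns_fail_policy, userns_decide() and enter_userns() are hypothetical and are not part of the patchset; the only real call is unshare(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy knob for illustration; not from the patchset. */
enum userns_fail_policy {
	USERNS_FAIL_CLOSED,	/* refuse to continue without a new user namespace */
	USERNS_FAIL_OPEN,	/* continue without isolation, but log the failure */
};

/*
 * Decide what to do with the outcome of unshare(CLONE_NEWUSER).
 * unshare_errno is 0 on success, or the errno value seen on failure.
 * Returns 0 if the caller may proceed, or a negative errno to abort.
 */
static int userns_decide(int unshare_errno, enum userns_fail_policy policy)
{
	assert(unshare_errno >= 0);

	if (unshare_errno == 0)
		return 0;

	fprintf(stderr, "unshare(CLONE_NEWUSER) failed: %s\n",
		strerror(unshare_errno));

	return policy == USERNS_FAIL_OPEN ? 0 : -unshare_errno;
}

/* Try to enter a new user namespace, then apply the chosen policy. */
static int enter_userns(enum userns_fail_policy policy)
{
	int err = 0;

	if (unshare(CLONE_NEWUSER) == -1)
		err = errno;

	return userns_decide(err, policy);
}
```

A service that values availability might call enter_userns(USERNS_FAIL_OPEN) and keep running; one that values isolation would pass USERNS_FAIL_CLOSED and exit on a negative return. Either way, the decision lives in userspace, which is the point being raised here.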
I guess the ultimate solution is a way to enforce using user_namespace in the kernel (if it ever makes sense...).
The LSM framework, and the BPF and SELinux LSM implementations in this patchset, provide a mechanism to do just that: kernel enforced access controls using flexible security policies which can be tailored by the distro, solution provider, or end user to meet the specific needs of their use case.
In this case, I wouldn't say the kernel is enforcing access control. (I might be wrong). There are 3 components here: kernel, LSM, and trusted userspace (whoever calls unshare).
The LSM layer and the LSMs themselves are part of the kernel; look at the changes in this patchset to see the LSM, BPF LSM, and SELinux kernel changes. Explaining how the different LSMs work is quite a bit beyond the scope of this discussion, but there is plenty of information available online that should be able to serve as an introduction, not to mention the kernel source itself. However, in very broad terms you can think of the individual LSMs as somewhat analogous to filesystem drivers, e.g. ext4, and the LSM layer itself as the VFS layer.
AFAICT, the kernel simply passes the decision made by the LSM (BPF or SELinux) to the trusted userspace. It is up to the trusted userspace to honor the return value of unshare().
With an LSM enabled and enforcing a security policy on user namespace creation, which appears to be the case of most concern, the kernel would make a decision on the namespace creation based on various factors (e.g. for SELinux this would be the calling process' security domain and the domain's permission set as determined by the configured security policy), and if the operation was rejected, an error code would be returned to userspace. It is the exact same thing as what would happen if the calling process is chrooted or doesn't have a proper UID/GID mapping. Don't forget that the create_user_ns() function already enforces a security policy and returns errors to userspace; this patchset doesn't add anything new in that regard, it just allows for a richer and more flexible security policy to be built on top of the existing constraints.
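As a rough illustration of the point that an LSM denial reaches userspace the same way the existing create_user_ns() checks do, the sketch below maps errno values from unshare(CLONE_NEWUSER) to a coarse reason. The enum and the function userns_errno_reason() are hypothetical. The EPERM, ENOSPC, EUSERS, EINVAL and ENOMEM cases follow the unshare(2) man page; treating EACCES as a policy denial is an assumption, since the errno an LSM returns is up to that LSM.

```c
#include <errno.h>

/* Hypothetical coarse classification; not a kernel or libc API. */
enum userns_fail_reason {
	USERNS_OK,
	USERNS_DENIED,		/* chroot, unmapped UID/GID, or security policy */
	USERNS_LIMIT,		/* too many (nested) user namespaces */
	USERNS_INVALID,		/* e.g. multithreaded caller or no kernel support */
	USERNS_NOMEM,
	USERNS_OTHER,
};

/* Map an errno from unshare(CLONE_NEWUSER) to a coarse reason. */
static enum userns_fail_reason userns_errno_reason(int err)
{
	switch (err) {
	case 0:
		return USERNS_OK;
	case EPERM:	/* documented: chroot or missing UID/GID mapping */
	case EACCES:	/* assumption: some LSMs report denials this way */
		return USERNS_DENIED;
	case ENOSPC:	/* namespace count or nesting limit exceeded */
	case EUSERS:	/* same limit on older kernels */
		return USERNS_LIMIT;
	case EINVAL:
		return USERNS_INVALID;
	case ENOMEM:
		return USERNS_NOMEM;
	default:
		return USERNS_OTHER;
	}
}
```

Note that from the caller's side a policy denial and a chroot denial can be indistinguishable (both may surface as EPERM), which matches the description above: the richer policy is layered on top of the existing error path.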
If the userspace simply ignores unshare failures, or does not call unshare(CLONE_NEWUSER), the kernel and LSM cannot do much about it, right?
The process is still subject to any security policies that are active and being enforced by the kernel. A malicious or misconfigured application can still be constrained by the kernel using both the kernel's legacy Discretionary Access Controls (DAC) as well as the more comprehensive Mandatory Access Controls (MAC) provided by many of the LSMs.