In [1] Michal Koutný mkoutny@suse.com reported that it does not make sense to unconditionally exempt the INIT_USER during fork and exec from RLIMIT_NPROC and then to impose a limit on that same user with is_ucounts_overlimit. So I looked into why the exeception was added.
commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits (RLIMIT_NPROC)") says:
If a setuid process swaps it's real and effective uids and then forks, the fork fails if the new realuid has more processes than the original process was limited to. This is particularly a problem if a user with a process limit (e.g. 256) runs a setuid-root program which does setuid() + fork() (e.g. lprng) while root already has more than 256 process (which is quite possible).
The root problem here is that a limit which should be a per-user limit is being implemented as a per-process limit with per-process (e.g. CAP_SYS_RESOURCE) controls. Being a per-user limit, it should be that the root-user can over-ride it, not just some process with CAP_SYS_RESOURCE.
This patch adds a test to ignore process limits if the real user is root.
With the moving of the limits from per-user to per-user-per-user_ns it is clear that testing a user_struct is no longer the proper test and the test should be a test against the ucounts struct to match the rest of the logic change that was made.
With RLIMIT_NPROC not being enforced for the global root user anywhere else should it be enforced in is_ucounts_overlimit for a user namespace created by the global root user?
Since this is limited to just the global root user, and RLIMIT_NPROC is already ignored for that user I am going to vote no.
This change does imply that it becomes possible to limit all users in a user namespace but to not enforce the rlimits on the root user or anyone with CAP_SYS_ADMIN and CAP_SYS_RESOURCE in the user namespace. It is not clear to me why any of those exceptions exist so I figure we should until this is actually a problem for someone before we relax the permission checks here.
Cc: stable@vger.kernel.org Reported-by: Michal Koutný mkoutny@suse.com [1] https://lkml.kernel.org/r/20220207121800.5079-5-mkoutny@suse.com History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com --- fs/exec.c | 2 +- kernel/fork.c | 2 +- kernel/user_namespace.c | 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c index 1e7f757cbc2c..01c8c7bae9b4 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename, */ if ((current->flags & PF_NPROC_CHECK) && is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) && - (current_user() != INIT_USER) && + (current_ucounts() != &init_ucounts) && !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) { retval = -EAGAIN; goto out_ret; diff --git a/kernel/fork.c b/kernel/fork.c index 2b6a28a86325..6f62d37f3650 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
retval = -EAGAIN; if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { - if (p->real_cred->user != INIT_USER && + if ((task_ucounts(p) != &init_ucounts) && !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) goto bad_fork_cleanup_count; } diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 6b2e3ca7ee99..f0c04073403d 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -123,6 +123,8 @@ int create_user_ns(struct cred *new) ns->ucount_max[i] = INT_MAX; } set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)); + if (new->ucounts == &init_ucounts) + set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY); set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE)); set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING)); set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));