This series is the result of looking deeper into the breakage of tools/testing/selftests/rlimits/rlimits-per-userns.c after https://lore.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com/ is applied.
The description of the original problem that led to the RLIMIT_NPROC et al. ucounts rewrite could be ambiguously interpreted as supporting either of these cases:
- a never-fork service, or
- a service that forks (RLIMIT_NPROC-1) times.
The scenario is weird anyway given the existence of the pids controller.
Realizing that scenario relies not only on tracking the number of processes per user_ns; the rewrite also newly allows root to override the limit through set*uid. The commit message didn't mention that, so it's unclear whether that was intended too.
I also noticed that RLIMIT_NPROC enforcement in fork seems subject to a TOCTOU race (check(nr_tasks), ..., nr_tasks++), so the limit is rather advisory (but that is not new with the ucounts rewrite).
This series is an RFC to discuss the relevance of the subtle changes that the RLIMIT_NPROC to ucounts rewrite introduced.

Michal Koutný (6):
  set_user: Perform RLIMIT_NPROC capability check against new user credentials
  set*uid: Check RLIMIT_PROC against new credentials
  cred: Count tasks by their real uid into RLIMIT_NPROC
  ucounts: Allow root to override RLIMIT_NPROC
  selftests: Challenge RLIMIT_NPROC in user namespaces
  selftests: Test RLIMIT_NPROC in clone-created user namespaces

 fs/exec.c                                  |   2 +-
 include/linux/cred.h                       |   2 +-
 kernel/cred.c                              |  29 ++-
 kernel/fork.c                              |   2 +-
 kernel/sys.c                               |  20 +-
 kernel/ucount.c                            |   3 +
 kernel/user_namespace.c                    |   2 +-
 .../selftests/rlimits/rlimits-per-userns.c | 233 +++++++++++++++---
 8 files changed, 229 insertions(+), 64 deletions(-)