Michal Koutný mkoutny@suse.com writes:
This series is a result of looking deeper into breakage of tools/testing/selftests/rlimits/rlimits-per-userns.c after https://lore.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com/ is applied.
The description of the original problem that lead to RLIMIT_NPROC et al. ucounts rewrite could be ambiguously interpretted as supporting either the case of:
- never-fork service or
- fork (RLIMIT_NPROC-1) times service.
The scenario is weird anyway given existence of pids controller.
The realization of that scenario relies not only on tracking number of processes per user_ns but also newly allows the root to override limit through set*uid. The commit message didn't mention that, so it's unclear if it was the intention too.
I also noticed that the RLIMIT_NPROC enforcing in fork seems subject to TOCTOU race (check(nr_tasks),...,nr_tasks++) so the limit is rather advisory (but that's not a new thing related to ucounts rewrite).
This series is RFC to discuss relevance of the subtle changes RLIMIT_NPROC to ucounts rewrite introduced.
A quick reply (because I don't have a lot of time at the moment).
I agree with the issues your first patch before this series addresses and the issues the first 3 patches address.
I have not looked at the tests.
I actually disagree with most of your fixes. Both because of intrusiveness and because of awkwardness. My basic problem with your fixes is I don't think they leave the code in a more maintainable state.
Hopefully later today I can propose some alternative fixes and we can continue the discussion.
One thing I think you misunderstood is the capability checks in set_user have always been there. There is a very good argument they are badly placed so are not exactly checking the correct credentials. Especially now.
Your patch 4/6 I don't think makes sense. It has always been the case that root without capabilities is subject to the rlimit. If you are in a user namespace you are root without capabilities.
Eric