On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
Hi Al and Christian,
The commit 12f147ddd6de ("do_change_type(): refuse to operate on unmounted/not ours mounts") introduced an ABI backward compatibility break. CRIU depends on the previous behavior, and users are now reporting criu restore failures following the kernel update. This change has been propagated to stable kernels. Is this check strictly required?
Yes.
Would it be possible to check only if the current process has CAP_SYS_ADMIN within the mount user namespace?
Not enough, both in terms of permissions *and* in terms of "thou shalt not bugger the kernel data structures - nobody's privileged enough for that".
Al,
I am still thinking in terms of "Thou shalt not break userspace"...
Seriously though, this original behavior has been in the kernel for 20 years, and it hasn't triggered any corruptions in all that time.
For a very mild example of fun to be had there:
    mount("none", "/mnt", "tmpfs", 0, "");
    chdir("/mnt");
    umount2(".", MNT_DETACH);
    mount(NULL, ".", NULL, MS_SHARED, NULL);
Repeat in a loop, watch mount group id leak. That's a trivial example of violating the assertion ("a mount that had been through umount_tree() is out of propagation graph and related data structures for good").
As for the "CAP_SYS_ADMIN within the mount user namespace" - which userns do you have in mind?