On Thu, Jul 31, 2025 at 3:53 PM Christian Brauner brauner@kernel.org wrote:
On Thu, Jul 31, 2025 at 10:40:40AM +0800, Pavel Tikhomirov wrote:
If detached mounts are our only concern, it looks like the check instead of:
if (!check_mnt(mnt)) { err = -EINVAL; goto out_unlock; }
could've been a more relaxed one:
if (mnt_detached(mnt)) { err = -EINVAL; goto out_unlock; }
bool mnt_detached(struct mount *mnt) { return !mnt->mnt_ns; }
not to allow propagation change only on detached mounts. (As umount_tree sets mnt_ns to NULL.)
Changing propagation settings on detached mounts is fine and shoud work? Changing propagation settings on unmounted mounts not so much...
Sorry, it's my confused terminology, here by "detached" mounts I mean mounts which were unmounted with umount2(MNT_DETACH), maybe I should call them "unmounted" (e.g. s/mnt_detached/mnt_unmounted/).
And yes, I understand why we need to allow changing propagation on mounts in anonymous namespace without being inside it, because one can't just enter anonymous namespace.
I don't think that we need to change anything, just sharing my thoughts that it could be more relaxed and will still protect us from propagation setting on unmounted mounts.
Also in do_mount_setattr we have a more relaxed check too:
if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt)) goto out;
Best Regards, Tikhomirov Pavel.
On Sun, Jul 27, 2025 at 5:01 AM Andrei Vagin avagin@google.com wrote:
On Sat, Jul 26, 2025 at 10:53 AM Al Viro viro@zeniv.linux.org.uk wrote:
On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:
On Thu, Jul 24, 2025 at 4:00 PM Al Viro viro@zeniv.linux.org.uk wrote:
On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote: > Hi Al and Christian, > > The commit 12f147ddd6de ("do_change_type(): refuse to operate on > unmounted/not ours mounts") introduced an ABI backward compatibility > break. CRIU depends on the previous behavior, and users are now > reporting criu restore failures following the kernel update. This change > has been propagated to stable kernels. Is this check strictly required?
Yes.
> Would it be possible to check only if the current process has > CAP_SYS_ADMIN within the mount user namespace?
Not enough, both in terms of permissions *and* in terms of "thou shalt not bugger the kernel data structures - nobody's priveleged enough for that".
Al,
I am still thinking in terms of "Thou shalt not break userspace"...
Seriously though, this original behavior has been in the kernel for 20 years, and it hasn't triggered any corruptions in all that time.
For a very mild example of fun to be had there: mount("none", "/mnt", "tmpfs", 0, ""); chdir("/mnt"); umount2(".", MNT_DETACH); mount(NULL, ".", NULL, MS_SHARED, NULL); Repeat in a loop, watch mount group id leak. That's a trivial example of violating the assertion ("a mount that had been through umount_tree() is out of propagation graph and related data structures for good").
I wasn't referring to detached mounts. CRIU modifies mounts from non-current namespaces.
As for the "CAP_SYS_ADMIN within the mount user namespace" - which userns do you have in mind?
The user namespace of the target mount: ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)