On 2019-12-29, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Sun, Dec 29, 2019 at 9:21 PM Aleksa Sarai <cyphar@cyphar.com> wrote:
>> 	if (d_is_symlink(mp->m_dentry) ||
>> 	    d_is_symlink(mnt->mnt.mnt_root))
>> 		return -EINVAL;
> So I don't hate this kind of check in general - overmounting a symlink sounds odd, but at the same time I get the feeling that the real issue is that something went wrong earlier.
> Yeah, the mount target kind of _is_ a path, but at the same time, we most definitely want to have the permission to really open the directory in question, don't we, and I don't see that we should accept an O_PATH file descriptor.
The new mount API uses O_PATH under the hood (which is a good thing, since there are some files you'd rather not actually open; FIFOs are the obvious example), so I'm not sure that's something we could really avoid.
But if we block O_PATH for mounts, this will achieve the same thing, because the only way to get a file descriptor that references a symlink is through (O_PATH | O_NOFOLLOW).
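To make the (O_PATH | O_NOFOLLOW) point concrete, here is a minimal userspace sketch, assuming Linux and glibc; the helper names are hypothetical. An ordinary open() of a symlink with O_NOFOLLOW fails with ELOOP, while adding O_PATH yields an fd whose fstat() reports the symlink inode itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: get an fd for the symlink inode itself rather than
 * its target. O_NOFOLLOW alone would fail; combined with O_PATH it works. */
int open_symlink_itself(const char *path)
{
    return open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
}

/* Hypothetical helper: does this fd refer to a symlink? fstat() is one of
 * the operations permitted on O_PATH fds (Linux 3.6+). */
bool fd_is_symlink(int fd)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return false;
    return S_ISLNK(st.st_mode);
}

/* Hypothetical helper: attempt a normal (non-O_PATH) open of the symlink
 * with O_NOFOLLOW. Returns 0 if it unexpectedly succeeds, otherwise the
 * errno from open() (ELOOP on Linux for a trailing symlink). */
int open_symlink_without_opath(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW | O_CLOEXEC);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;
}
```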
> I feel like the only valid use of "O_PATH" files is to then use them as the base for an openat() and friends (i.e. fchmodat()/execveat() etc).
See below, we use this for all sorts of dirty^Wclever tricks.
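A minimal sketch of the "base for openat() and friends" usage, assuming Linux and glibc (the helper names are hypothetical): an O_PATH directory fd serves as the dirfd of openat(), where the real permission check happens, while a direct read() on an O_PATH fd is refused with EBADF.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: hold only an O_PATH fd for `dir` and use it as the
 * base of an openat(). The O_PATH fd grants no I/O by itself; the normal
 * permission checks are done by the openat() that actually opens `name`. */
int open_under_opath_dir(const char *dir, const char *name, int flags)
{
    int base = open(dir, O_PATH | O_DIRECTORY | O_CLOEXEC);
    if (base < 0)
        return -1;
    int fd = openat(base, name, flags | O_CLOEXEC);
    int saved_errno = errno; /* close() must not clobber openat()'s errno */
    close(base);
    errno = saved_errno;
    return fd;
}

/* Hypothetical helper: returns the errno from read() on an O_PATH fd
 * (EBADF), 0 if the read somehow succeeded, or open()'s errno if the path
 * could not be opened at all. */
int read_errno_on_opath(const char *path)
{
    char c;
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return errno;
    ssize_t n = read(fd, &c, 1);
    int err = (n < 0) ? errno : 0;
    close(fd);
    return err;
}
```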
> But maybe I'm completely wrong, and people really do want O_PATH handling exactly for mounting too. It does sound a bit odd. By definition, mounting wants permissions to the mount-point, so what's the point of using O_PATH?
When you go through O_PATH, you still get a proper 'struct path', which means that for operations such as mount (or open) you will operate on the *real* underlying file.
This is part of what makes magic-links so useful (but also quite terrifying).
> For example, is the problem that when you do a proper
>
>     fd = open("somepath", O_PATH);
>
> in one process, and then another thread does
>
>     fd = open("/proc/<pid>/fd/<opathfd>", O_RDWR);
>
> then we get confused and do bad things on that *second* open? Because now the second open doesn't have O_PATH, and doesn't get marked FMODE_PATH, but the underlying file descriptor is one of those limited "is really only useful for openat() and friends".
Actually, this isn't true (for the same reason as above) -- when you do a re-open through /proc/$pid/fd/$n you get a real-as-a-heart-attack file descriptor. We make lots of use of this in container runtimes in order to do some dirty^Wfun tricks that help us harden the runtime against malicious container processes.
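A minimal sketch of the re-open trick, assuming Linux with procfs mounted at /proc and not tampered with (reopen_via_procfs is a hypothetical name): the O_PATH fd cannot be read from, but opening its /proc/self/fd/$n magic-link yields an ordinary, readable file descriptor for the same underlying file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: "upgrade" an O_PATH fd by re-opening it through its
 * /proc/self/fd magic-link. The kernel follows the magic-link to the real
 * file and performs a fresh permission check for the requested `flags`. */
int reopen_via_procfs(int opath_fd, int flags)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", opath_fd);
    return open(path, flags | O_CLOEXEC);
}
```

Usage: open a file with O_PATH, observe that read() on it fails, then call reopen_via_procfs(fd, O_RDONLY) and read from the returned descriptor.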
You might recall that when I was posting the earlier revisions of openat2(), I also included a patch for O_EMPTYPATH, which basically did a re-open of /proc/self/fd/$dfd (with precisely the same semantics) but without needing procfs. That patch was dropped before Al merged openat2(), but I am probably going to revive it for the reasons I outline below.
> I dunno. I haven't thought through the whole thing. But the oopses you quote seem like we're really doing something wrong, and it really does feel like your patch in no way _fixes_ the wrong thing we're doing, it's just hiding the symptoms.
That's fair enough.
I'll be honest: the real reason I don't want mounts over symlinks to be possible is an entirely different one. I'm working on a safe path resolution library to accompany openat2()[1], and one of the things I want to do is harden all of our uses of procfs (so that if we are running in a context where procfs has been messed with, such as having files bind-mounted over it, we can detect it and abort). The issue with symlinks is that we need to be able to operate on magic-links (such as /proc/self/fd/$n and /proc/self/exe), and if it's possible to bind-mount over those magic-links then we can't detect it at all.
openat2(RESOLVE_NO_XDEV) would block it, but it also blocks going through magic-links that change your mount (which is almost always the case). You can't trust /proc/self/mountinfo by definition, not just because of the TOCTOU race but also because you can't depend on /proc to harden against a "bad" /proc. None of the other options (such as umount2(MNT_EXPIRE)) help with magic-links, because we cannot take an O_PATH to a magic-link and follow it; O_PATHs of symlinks are completely stunted in this respect.
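A minimal sketch of the RESOLVE_NO_XDEV trade-off, assuming a Linux 5.6+ kernel. To build against older headers it declares a local struct mirroring the layout of struct open_how and falls back to syscall number 437 and RESOLVE_NO_XDEV = 0x01 if the headers lack them (assumptions to verify against your headers); openat2_no_xdev is a hypothetical name. Starting from an O_PATH fd for /proc, "self/status" stays inside procfs and opens fine, while "self/exe" is a magic-link that jumps to the executable on another mount and is rejected with EXDEV, even though nothing was bind-mounted over it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_openat2
#define SYS_openat2 437 /* assumption: openat2 syscall number */
#endif
#ifndef RESOLVE_NO_XDEV
#define RESOLVE_NO_XDEV 0x01 /* value from <linux/openat2.h> */
#endif

/* Local mirror of struct open_how (flags, mode, resolve), so this builds
 * even where <linux/openat2.h> is unavailable. */
struct open_how_v0 {
    uint64_t flags;
    uint64_t mode;
    uint64_t resolve;
};

/* Hypothetical helper: openat2() with RESOLVE_NO_XDEV, so resolution fails
 * with EXDEV if it would cross any mount (including via a magic-link).
 * Returns the new fd, or -errno on failure. */
int openat2_no_xdev(int dirfd, const char *path, int flags)
{
    struct open_how_v0 how;
    memset(&how, 0, sizeof how);
    how.flags = (uint64_t)(flags | O_CLOEXEC);
    how.resolve = RESOLVE_NO_XDEV;
    long fd = syscall(SYS_openat2, dirfd, path, &how, sizeof how);
    return fd < 0 ? -errno : (int)fd;
}
```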
If bind-mounts over symlinks are allowed (which I don't really have a problem with), it just means we'll need a few more kernel pieces to get this hardening to work: O_EMPTYPATH, and some kind of pidfd-based interface to grab the equivalent of /proc/self/exe and a few other such magic-link targets. These features would be useful outside of the problems I'm dealing with, though.
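One cheap check that procfs hardening of this kind might include, sketched here as an assumption rather than as what libpathrs actually does: call fstatfs() on the opened fd and compare f_type with PROC_SUPER_MAGIC (0x9fa0, from <linux/magic.h>). It catches a non-procfs filesystem mounted at /proc, or a non-procfs file bind-mounted over an ordinary procfs file. It cannot help with magic-links such as /proc/self/exe, though: their targets legitimately live on other filesystems, so an overmounted magic-link looks no different. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0 /* value from <linux/magic.h> */
#endif

/* Hypothetical helper: returns 1 if fd lives on a procfs instance,
 * 0 if it lives on some other filesystem, -1 if fstatfs() fails. */
int fd_is_on_procfs(int fd)
{
    struct statfs sfs;
    if (fstatfs(fd, &sfs) < 0)
        return -1;
    return sfs.f_type == PROC_SUPER_MAGIC;
}

/* Hypothetical helper: open `path` with O_PATH (fstatfs() is permitted on
 * O_PATH fds) and report which filesystem it resolved to. */
int path_is_on_procfs(const char *path)
{
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;
    int ret = fd_is_on_procfs(fd);
    close(fd);
    return ret;
}
```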
[1]: https://github.com/openSUSE/libpathrs