On Fri, Aug 05, 2022 at 10:21:21PM +0000, jeffxu@google.com wrote:
> This is v2 of the MFD_NOEXEC series. It 1) addresses comments on v1 and 2) adds a sysctl (vm.mfd_noexec) that changes the default file permissions of memfd_create to non-executable.
> Below is the cover letter for v1:
> The default file permissions on a memfd include execute bits, which means that such a memfd can be filled with an executable and passed to the exec() family of functions. This is undesirable on systems where all code is verified and all filesystems are intended to be mounted noexec, since an attacker may be able to use a memfd to load unverified code and execute it.
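The default is easy to observe: create a memfd and inspect its mode with fstat(2). This is a minimal sketch assuming Linux and glibc 2.27 or later (for the memfd_create() wrapper). The helper name memfd_default_exec_bits() is hypothetical. On a kernel where nothing overrides the default, the execute bits are expected to come back set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>   /* memfd_create() */
#include <sys/stat.h>   /* fstat(), S_IXUSR, S_IXGRP, S_IXOTH */
#include <unistd.h>     /* close() */

/* Hypothetical helper: returns the execute bits (mode & 0111) of a freshly
 * created memfd, or -1 if memfd_create() or fstat() fails. */
int memfd_default_exec_bits(void)
{
    int fd = memfd_create("probe", MFD_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    int ret = -1;
    if (fstat(fd, &st) == 0)
        ret = (int)(st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH));
    close(fd);
    return ret;
}
```

A nonzero result means the memfd could be handed to fexecve(3) as-is, which is exactly the property the cover letter is concerned about.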
I would absolutely like to see some kind of protection here. However, I'd like a more specific threat model. What are the cases where the X bit has been abused (e.g. [1])? What are the cases where the X bit is needed (e.g. [2])? With those in mind, it should be possible to draw a clear line between the two cases. (E.g., we need to avoid a confused deputy attack where an "unprivileged" user can pass an executable memfd to a "privileged" user. How those privileges are defined may matter a lot, depending on how memfds are being used. For example, can runc's use of executable memfds be distinguished from an attacker's?)
> Additionally, execution via memfd is a common way to avoid scrutiny for malicious code, since it allows a program to run without a file ever appearing on disk. This attack vector is not fully mitigated by this new flag, since the default memfd file permissions must remain executable to avoid breaking existing legitimate uses, but it should be possible to use other security mechanisms to prevent memfd_create calls without MFD_NOEXEC on systems where it is known that executable memfds are not necessary.
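One such mechanism is a seccomp filter. The sketch below makes memfd_create() fail with EPERM unless its flags contain a no-exec bit. It assumes x86-64 or little-endian AArch64 and omits the x32 syscall-number check a production filter would need. HYPOTHETICAL_MFD_NOEXEC is a placeholder, because this thread does not give the flag's value; the helper names are also hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>          /* offsetof */
#include <sys/mman.h>        /* memfd_create() */
#include <sys/prctl.h>       /* prctl() */
#include <sys/syscall.h>     /* __NR_memfd_create */
#include <sys/wait.h>        /* waitpid() */
#include <unistd.h>          /* fork(), _exit() */
#include <linux/audit.h>     /* AUDIT_ARCH_* */
#include <linux/filter.h>    /* struct sock_filter, BPF_STMT, BPF_JUMP */
#include <linux/seccomp.h>   /* struct seccomp_data, SECCOMP_RET_* */

/* Placeholder value: substitute whatever the merged uapi header defines. */
#define HYPOTHETICAL_MFD_NOEXEC 0x0008U

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__) && defined(__AARCH64EL__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "sketch only covers x86-64 and little-endian AArch64"
#endif

/* Make memfd_create() return EPERM unless the placeholder no-exec bit is set
 * in its flags (args[1]). Returns 0 on success, -1 on failure. */
int install_memfd_noexec_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* Any syscall other than memfd_create() jumps to ALLOW. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_memfd_create, 0, 3),
        /* Low 32 bits of the flags argument (little-endian layout). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[1])),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, HYPOTHETICAL_MFD_NOEXEC, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Lets an unprivileged process install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Usage: install the filter in a child (so the caller stays unfiltered) and
 * check that a plain memfd_create() is refused. Returns 1 if it was. */
int filter_blocks_plain_memfd(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return 0;
    if (pid == 0) {
        if (install_memfd_noexec_filter() != 0)
            _exit(2);
        int fd = memfd_create("probe", MFD_CLOEXEC);
        _exit(fd < 0 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Because the filter keys on a flag bit, it can only enforce the policy against callers that could have opted in, which is why the choice of default matters.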
This reminds me of dealing with non-executable stacks. There ended up being three states:
- requested to be executable (PT_GNU_STACK X)
- requested to be non-executable (PT_GNU_STACK NX)
- undefined (no PT_GNU_STACK)
The first two are clearly defined, but the third needed a lot of special handling. For a "safe by default" world, the third should be "NX", but old stuff depended on it being "X".
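The three-state decision can be modelled as a small resolution function. This is a simplified, hypothetical model, not the kernel's ELF loader code: the enum and the legacy_default_exec parameter are illustrative names. Explicit requests are honored, and only the undefined case falls back to a default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three PT_GNU_STACK states. */
enum stack_request {
    STACK_REQ_X,          /* PT_GNU_STACK present, executable */
    STACK_REQ_NX,         /* PT_GNU_STACK present, non-executable */
    STACK_REQ_UNDEFINED,  /* no PT_GNU_STACK header at all */
};

/* Only the undefined case consults the default, which is where "safe by
 * default" (NX) and old binaries (which expected X) pull in opposite
 * directions. */
bool stack_is_executable(enum stack_request req, bool legacy_default_exec)
{
    switch (req) {
    case STACK_REQ_X:
        return true;
    case STACK_REQ_NX:
        return false;
    case STACK_REQ_UNDEFINED:
    default:
        return legacy_default_exec;
    }
}
```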
Here, a bit is either present or not, so we only have a binary state. I'd much rather the default be NX (no bit set) than make every future (safe) user of memfd specify MFD_NOEXEC.
It's also easier on the filtering side to say "disallow memfd_create with MFD_EXEC", but how do we deal with older software?
If the default for memfd_create()'s exec bit is controlled by a sysctl, and the sysctl is set to "leave it executable", how does a user create an NX memfd? (I.e., setting MFD_EXEC means "exec", and not setting it also means "exec".) Are two bits needed? That seems wasteful: MFD_I_KNOW_HOW_TO_SET_EXEC | MFD_EXEC, etc...
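The ambiguity can be made concrete with two small resolution functions. Everything here is hypothetical: the HYP_MFD_EXEC and HYP_MFD_NOEXEC names echo the flags discussed in this thread but their values are placeholders, and sysctl_default_exec stands in for the proposed sysctl. With one bit, no flag combination yields NX when the sysctl default is "exec"; with two bits, an explicit request wins and setting both is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder flag values for illustration only. */
#define HYP_MFD_EXEC   0x0010U
#define HYP_MFD_NOEXEC 0x0008U

/* One-bit design: the exec flag forces exec; otherwise the sysctl decides.
 * If sysctl_default_exec is true, both branches return true. */
bool one_bit_is_exec(unsigned int flags, bool sysctl_default_exec)
{
    if (flags & HYP_MFD_EXEC)
        return true;
    return sysctl_default_exec;
}

/* Two-bit design: an explicit choice overrides the sysctl, and requesting
 * both is an error. Returns 1 (exec), 0 (NX) or -EINVAL. */
int two_bit_is_exec(unsigned int flags, bool sysctl_default_exec)
{
    bool exec = flags & HYP_MFD_EXEC;
    bool noexec = flags & HYP_MFD_NOEXEC;

    if (exec && noexec)
        return -EINVAL;
    if (exec)
        return 1;
    if (noexec)
        return 0;
    return sysctl_default_exec ? 1 : 0;
}
```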
For F_SEAL_EXEC, it seems this should imply F_SEAL_WRITE if the memfd is forced executable, to avoid WX mappings (i.e. provide W^X from the start).
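The write-seal half of this suggestion can be sketched with seals that already exist. Once F_SEAL_WRITE is applied (which fails with EBUSY while a writable shared mapping exists), write(2) fails with EPERM and new shared writable mappings are refused, so the contents can no longer change. F_SEAL_EXEC itself is only proposed in this series and is not used here; the helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fcntl(), F_ADD_SEALS, F_SEAL_* */
#include <sys/mman.h>   /* memfd_create(), mmap(), munmap() */
#include <unistd.h>     /* write(), pwrite(), close() */

/* Hypothetical helper: create a sealable memfd, fill it with `len` bytes,
 * then freeze its size and contents. Returns the fd, or -1 on failure. */
int make_write_sealed_memfd(const void *data, size_t len)
{
    int fd = memfd_create("wx-demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t)len)
        goto fail;
    if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) != 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}

/* Returns 1 if both pwrite() and a new shared writable mapping fail with
 * EPERM, i.e. the memfd can no longer be modified. */
int writes_are_refused(int fd)
{
    ssize_t n = pwrite(fd, "x", 1, 0);
    int write_refused = (n < 0 && errno == EPERM);

    void *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int map_refused = (p == MAP_FAILED && errno == EPERM);
    if (p != MAP_FAILED)
        munmap(p, 1);
    return write_refused && map_refused;
}
```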
-Kees
[1] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20mem...
[2] https://lwn.net/Articles/781013/