From: Nicolas Geoffray ngeoffray@google.com
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
The reason for this is due to the code below:
static int shmem_mmap(struct file *file, struct vm_area_struct *vma) { struct shmem_inode_info *info = SHMEM_I(file_inode(file));
if (info->seals & F_SEAL_FUTURE_WRITE) { /* * New PROT_WRITE and MAP_SHARED mmaps are not allowed when * "future write" seal active. */ if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE)) return -EPERM;
/* * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED * read-only mapping, take care to not allow mprotect to revert * protections. */ vma->vm_flags &= ~(VM_MAYWRITE); } ... }
And for the mm to know if a mapping is copy-on-write: static inline bool is_cow_mapping(vm_flags_t flags) { return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE; }
The patch fixes the issue by making the mprotect revert protection happen only for shared mappings. For private mappings, using mprotect will have no effect on the seal behavior.
Cc: kernel-team@android.com Signed-off-by: Nicolas Geoffray ngeoffray@google.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
--- Google bug: 143833776
mm/shmem.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c index 447fd575587c..6ac5e867ef13 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2214,11 +2214,14 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma) return -EPERM;
/* - * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED - * read-only mapping, take care to not allow mprotect to revert - * protections. + * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as + * MAP_SHARED and read-only, take care to not allow mprotect to + * revert protections on such mappings. Do this only for shared + * mappings. For private mappings, don't need to mask VM_MAYWRITE + * as we still want them to be COW-writable. */ - vma->vm_flags &= ~(VM_MAYWRITE); + if (vma->vm_flags & VM_SHARED) + vma->vm_flags &= ~(VM_MAYWRITE); }
file_accessed(file);
In this test, the parent and child both have writable private mappings. The test shows that without the patch in this series, the parent and child shared the same memory which is incorrect. In other words, COW needs to be triggered so any writes to child's copy stays local to the child.
Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- tools/testing/selftests/memfd/memfd_test.c | 36 ++++++++++++++++++++++ 1 file changed, 36 insertions(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index c67d32eeb668..334a7eea2004 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -290,6 +290,40 @@ static void mfd_assert_read_shared(int fd) munmap(p, mfd_def_size); }
+static void mfd_assert_fork_private_write(int fd) +{ + int *p; + pid_t pid; + + p = mmap(NULL, + mfd_def_size, + PROT_READ | PROT_WRITE, + MAP_PRIVATE, + fd, + 0); + if (p == MAP_FAILED) { + printf("mmap() failed: %m\n"); + abort(); + } + + p[0] = 22; + + pid = fork(); + if (pid == 0) { + p[0] = 33; + exit(0); + } else { + waitpid(pid, NULL, 0); + + if (p[0] != 22) { + printf("MAP_PRIVATE copy-on-write failed: %m\n"); + abort(); + } + } + + munmap(p, mfd_def_size); +} + static void mfd_assert_write(int fd) { ssize_t l; @@ -760,6 +794,8 @@ static void test_seal_future_write(void) mfd_assert_read_shared(fd2); mfd_fail_write(fd2);
+ mfd_assert_fork_private_write(fd); + munmap(p, mfd_def_size); close(fd2); close(fd);
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
That sounds fairly serious. Should this be backported into -stable kernels?
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
That sounds fairly serious. Should this be backported into -stable kernels?
Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so v5.3.x stable kernels would need a backport. I can submit a backport tomorrow unless we are Ok with stable automatically picking it up (I believe the stable folks "auto select" fixes which should detect this is a fix since I have said it is a fix in the subject line).
thanks,
- Joel
On Thu, 7 Nov 2019 21:06:14 -0500 Joel Fernandes joel@joelfernandes.org wrote:
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
That sounds fairly serious. Should this be backported into -stable kernels?
Yes, it should be.
I added
Fixes: ab3948f58ff84 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Cc: stable@vger.kernel.org
The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so v5.3.x stable kernels would need a backport. I can submit a backport tomorrow unless we are Ok with stable automatically picking it up (I believe the stable folks "auto select" fixes which should detect this is a fix since I have said it is a fix in the subject line).
The Cc:stable tag should trigger the appropriate actions, assisted by the Fixes:. I doubt if "fix" in the Subject has much effect.
On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
That sounds fairly serious. Should this be backported into -stable kernels?
Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so v5.3.x stable kernels would need a backport. I can submit a backport tomorrow unless we are Ok with stable automatically picking it up (I believe the stable folks "auto select" fixes which should detect this is a fix since I have said it is a fix in the subject line).
Never rely on "auto select" to pick up a patch for stable if you already know it should go to stable. Just mark it as such, or tell stable@vger after the fact.
thanks,
greg k-h
On Fri, Nov 08, 2019 at 07:37:15AM +0100, Greg KH wrote:
On Thu, Nov 07, 2019 at 09:06:14PM -0500, Joel Fernandes wrote:
On Thu, Nov 07, 2019 at 05:00:23PM -0800, Andrew Morton wrote:
On Thu, 7 Nov 2019 14:53:54 -0500 "Joel Fernandes (Google)" joel@joelfernandes.org wrote:
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
That sounds fairly serious. Should this be backported into -stable kernels?
Yes, it should be. The F_SEAL_FUTURE_WRITE feature was introduced in v5.1 so v5.3.x stable kernels would need a backport. I can submit a backport tomorrow unless we are Ok with stable automatically picking it up (I believe the stable folks "auto select" fixes which should detect this is a fix since I have said it is a fix in the subject line).
Never rely on "auto select" to pick up a patch for stable if you already know it should go to stable. Just mark it as such, or tell stable@vger after the fact.
Sure, agreed.
Thanks Andrew for adding the tags!
thanks,
- Joel
* Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
* read-only mapping, take care to not allow mprotect to revert
* protections.
* Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
* MAP_SHARED and read-only, take care to not allow mprotect to
* revert protections on such mappings. Do this only for shared
* mappings. For private mappings, don't need to mask VM_MAYWRITE
This adds an > 80 char line.
On Thu, Nov 07, 2019 at 10:33:08PM -0800, Christoph Hellwig wrote:
* Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED
* read-only mapping, take care to not allow mprotect to revert
* protections.
* Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as
* MAP_SHARED and read-only, take care to not allow mprotect to
* revert protections on such mappings. Do this only for shared
* mappings. For private mappings, don't need to mask VM_MAYWRITE
This adds an > 80 char line.
Oh, true. Sorry. Andrew I hate to ask you but since you took the patch already, could you just the comment for the character limit in the one you applied?
thanks,
- Joel
linux-kselftest-mirror@lists.linaro.org