From: Nicolas Geoffray ngeoffray@google.com
F_SEAL_FUTURE_WRITE has unexpected behavior when used with MAP_PRIVATE: A private mapping created after the memfd file that gets sealed with F_SEAL_FUTURE_WRITE loses the copy-on-write at fork behavior, meaning children and parent share the same memory, even though the mapping is private.
The reason for this is due to the code below:
static int shmem_mmap(struct file *file, struct vm_area_struct *vma) { struct shmem_inode_info *info = SHMEM_I(file_inode(file));
if (info->seals & F_SEAL_FUTURE_WRITE) { /* * New PROT_WRITE and MAP_SHARED mmaps are not allowed when * "future write" seal active. */ if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_WRITE)) return -EPERM;
/* * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED * read-only mapping, take care to not allow mprotect to revert * protections. */ vma->vm_flags &= ~(VM_MAYWRITE); } ... }
And for the mm to know if a mapping is copy-on-write: static inline bool is_cow_mapping(vm_flags_t flags) { return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE; }
The patch fixes the issue by making the mprotect revert protection happen only for shared mappings. For private mappings, using mprotect will have no effect on the seal behavior.
Cc: kernel-team@android.com Signed-off-by: Nicolas Geoffray ngeoffray@google.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org
--- Google bug: 143833776
mm/shmem.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c index 447fd575587c..6ac5e867ef13 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2214,11 +2214,14 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma) return -EPERM;
/* - * Since the F_SEAL_FUTURE_WRITE seals allow for a MAP_SHARED - * read-only mapping, take care to not allow mprotect to revert - * protections. + * Since an F_SEAL_FUTURE_WRITE sealed memfd can be mapped as + * MAP_SHARED and read-only, take care to not allow mprotect to + * revert protections on such mappings. Do this only for shared + * mappings. For private mappings, don't need to mask VM_MAYWRITE + * as we still want them to be COW-writable. */ - vma->vm_flags &= ~(VM_MAYWRITE); + if (vma->vm_flags & VM_SHARED) + vma->vm_flags &= ~(VM_MAYWRITE); }
file_accessed(file);