Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

22 Dec 2020


      On Mon, Dec 21, 2020 at 3:22 PM Linus Torvalds
torvalds@linux-foundation.org wrote:
...
On Mon, Dec 21, 2020 at 2:30 PM Peter Xu peterx@redhat.com wrote:
...
AFAIU mprotect() is the only one who modifies the pte using the mmap write
lock.  NUMA balancing is also using read mmap lock when changing pte
protections, while my understanding is mprotect() used write lock only because
it manipulates the address space itself (aka. vma layout) rather than modifying
the ptes, so it needs to.
So it's ok to change the pte holding only the PTE lock, if it's a
*one*way* conversion.
That doesn't break the "re-check the PTE contents" model (which
predates _all_ of the rest: NUMA, userfaultfd, everything - it's
pretty much the original model for our page table operations, and goes
back to the dark ages even before SMP and the existence of a page
table lock).
So for example, a COW will always create a different pte (not just
because the page number itself changes - you could imagine a page
getting re-used and changing back - but because it's always a RO->RW
transition).
So two COW operations cannot "undo" each other and fool us into
thinking nothing changed.
Anything that changes RW->RO - like fork(), for example - needs to
take the mmap_lock.
Ugh, this is unpleasantly complicated.  I will admit that any API that
takes an address and more-or-less-blindly marks it RO makes me quite
nervous even assuming all the relevant locks are held.  At least
userfaultfd refuses to operate on VM_SHARED VMAs, but we have another
instance of this (with mmap_sem held for write) in x86:
mark_screen_rdonly().  Dare I ask how broken this is?  We could likely
get away with deleting it entirely.
--Andy

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect