Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

21 Dec 2020


On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
...
...
On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@redhat.com> wrote:
Hi, Nadav,
On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
[...]
...
So to correct myself, I think that what I really encountered was actually
during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The
problem was that in this case the “write”-bit was removed during unprotect.
Sorry for the strange formatting to fit within 80 columns:
I assume I can ignore the race mentioned in the commit message but only refer
to this one below.  However I'm still confused.  Please see below.
...
[ Start: PTE is writable ]
cpu0				cpu1			cpu2

							[ Writable PTE
							  cached in TLB ]

Here cpu2 got writable pte in tlb.  But why?
If below is an unprotect, it means it must have been protected once by
userfaultfd, right?  If so, the previous change_protection_range() which did
the wr-protect should have done a tlb flush already before it returns (since
pages>0 - we protected one pte at least).  Then I can't see why cpu2 tlb has
stale data.
Thanks, Peter. Just as you can use mprotect() to unprotect a region which was
not protected before, you can uffd-unprotect a region that was not protected
before. It might be that the user tried to unprotect a large region, which was
partially protected and partially unprotected.
The selftest obviously blindly unprotects some regions to check for bugs.
So to your question - it was not write-protected (think about initial copy
without write-protecting).
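To make the "unprotect something that was never protected" case concrete, here is a rough userspace sketch of how such a request is built. It uses the UFFDIO_WRITEPROTECT ioctl and struct uffdio_writeprotect from <linux/userfaultfd.h>; the helper names make_wp_request() and uffd_unprotect() are made up for illustration, and the sketch assumes uffd is a userfaultfd descriptor already registered over the range with UFFDIO_REGISTER_MODE_WP.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: fill in a write-protect request.
 * protect != 0 asks for write-protection; protect == 0 asks for the
 * protection to be removed (the "resolve" direction discussed here).
 */
static struct uffdio_writeprotect make_wp_request(uint64_t start,
						  uint64_t len, int protect)
{
	struct uffdio_writeprotect req;

	memset(&req, 0, sizeof(req));
	req.range.start = start;
	req.range.len = len;
	req.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
	return req;
}

/*
 * Hypothetical helper: un-write-protect [addr, addr + len).
 * Nothing in the request says whether the range was protected before,
 * so the caller can issue it blindly, as the selftest does.
 */
static int uffd_unprotect(int uffd, void *addr, uint64_t len)
{
	struct uffdio_writeprotect req =
		make_wp_request((uint64_t)(uintptr_t)addr, len, 0);

	return ioctl(uffd, UFFDIO_WRITEPROTECT, &req);
}
```

A caller covering a large, partially protected region would pass the whole region to uffd_unprotect() in a single call.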
If that's the only case, how about we don't touch the ptes at all? Instead of
playing with preserve_write, I'm thinking something like this right before
ptep_modify_prot_start(), even for uffd_wp==true:
	if (uffd_wp && pte_uffd_wp(old_pte)) {
		WARN_ON_ONCE(pte_write(old_pte));
		continue;
	}
	if (uffd_wp_resolve && !pte_uffd_wp(old_pte))
		continue;
Then we can also avoid the heavy operations on changing ptes back and forth.
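As a sanity check of which cases the two conditions filter out, the decision can be modeled in plain userspace C. This is only a sketch: struct model_pte and model_skip_pte() are invented for illustration and are not kernel types, and the WARN_ON_ONCE() is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two PTE bits discussed in this thread. */
struct model_pte {
	bool write;	/* models pte_write() */
	bool uffd_wp;	/* models pte_uffd_wp() */
};

/*
 * Returns true when the proposed check would "continue", i.e. leave the
 * PTE untouched instead of going through ptep_modify_prot_start().
 */
static bool model_skip_pte(struct model_pte pte, bool uffd_wp,
			   bool uffd_wp_resolve)
{
	/* wr-protect request on a PTE that is already uffd-wp protected */
	if (uffd_wp && pte.uffd_wp)
		return true;
	/* unprotect request on a PTE that was never uffd-wp protected */
	if (uffd_wp_resolve && !pte.uffd_wp)
		return true;
	return false;
}
```

In this model the writable, never-protected PTE from the scenario (write = true, uffd_wp = false) is skipped on unprotect, so its write bit is never cleared.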
...
...
If I assume cpu2 doesn't have that cached tlb, then "write to old page" won't
happen either, because cpu1/cpu2 will all go through the cow path and pgtable
lock should serialize them.
...
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()
change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
				[ page-fault ]
				…
				wp_page_copy()
				 cow_user_page()
				  [ copy page ]
							[ write to old
							  page ]
				…
				 set_pte_at_notify()
[ End: cpu2 write not copied from old to new page. ]
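The interleaving above can be replayed deterministically in userspace to see where the write goes missing. This is a toy model, not kernel code: struct model_mm, MODEL_PAGE_SIZE and model_replay_lost_write() are invented names, and the stale writable TLB entry on cpu2 is modeled simply as a store that still lands in the old page.

```c
#include <string.h>

#define MODEL_PAGE_SIZE 16	/* arbitrary toy page size */

struct model_mm {
	unsigned char old_page[MODEL_PAGE_SIZE];
	unsigned char new_page[MODEL_PAGE_SIZE];
	unsigned char *mapped;	/* page the PTE currently points at */
};

/*
 * Replays the three steps in the order shown in the diagram and returns
 * the byte that is visible through the mapping at offset off afterwards.
 */
static unsigned char model_replay_lost_write(size_t off, unsigned char val)
{
	struct model_mm mm;

	memset(&mm, 0, sizeof(mm));
	mm.mapped = mm.old_page;

	/* cpu1: cow_user_page() copies the old page into the new one */
	memcpy(mm.new_page, mm.old_page, MODEL_PAGE_SIZE);

	/* cpu2: stale writable TLB entry, so the store hits the old page */
	mm.old_page[off] = val;

	/* cpu1: set_pte_at_notify() makes the new page visible */
	mm.mapped = mm.new_page;

	return mm.mapped[off];
}
```

With a non-zero val the function returns 0 in this model: the store from cpu2 went to the old page after the copy was taken, so it is not visible through the new mapping.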
Could you share how to reproduce the problem?  I would be glad to give it a
shot as well.
You can run the selftests/userfaultfd with my small patch [1]. I ran it with
the following parameters: “./userfaultfd anon 100 100”. I think that it is
more easily reproducible with “mitigations=off idle=poll” as kernel
parameters.
[1] https://lore.kernel.org/patchwork/patch/1346386/
Thanks.
...
...
...
PS: Sorry to not have read the other series of yours.  It seems to need some
chunk of time so I postponed it a bit due to other things; but I'll read at
least the fixes very soon.
Thanks again, I will post RFCv2 with some numbers soon.
I read the patch 1/3 of the series.  Would it be better to post them separately
just in case Andrew would like to pick them earlier?
Since you seem to be heavily working on uffd-wp - I do still have a few uffd-wp
fixes locally even for anonymous.  I think they're related to some corner cases
like either thp or migration entry conversions, but anyway I'll see whether I
should post them even earlier (I planned to add smap/pagemap support for
uffd-wp so maybe I can even write some test case to verify some of them).  Just
an FYI...
Thanks,
-- 
Peter Xu
