On Mon, Dec 21, 2020 at 10:31:57AM -0800, Nadav Amit wrote:
On Dec 21, 2020, at 9:27 AM, Peter Xu <peterx@redhat.com> wrote:
Hi, Nadav,
On Sun, Dec 20, 2020 at 12:06:38AM -0800, Nadav Amit wrote:
[...]
So to correct myself, I think that what I really encountered was actually during MM_CP_UFFD_WP_RESOLVE (i.e., when the protection is removed). The problem was that in this case the “write”-bit was removed during unprotect. Sorry for the strange formatting to fit within 80 columns:
I assume I can ignore the race mentioned in the commit message and refer only to the one below. However, I'm still confused. Please see below.
[ Start: PTE is writable ]
cpu0                        cpu1                  cpu2
----                        ----                  ----
                                                  [ Writable PTE
                                                    cached in TLB ]
Here cpu2 got writable pte in tlb. But why?
If the operation below is an unprotect, the range must have been write-protected by userfaultfd at some point, right? If so, the earlier change_protection_range() that did the write-protect should already have flushed the TLB before returning (since pages > 0: at least one PTE was protected). So I can't see why cpu2's TLB holds stale data.
Thanks, Peter. Just as you can use mprotect() to unprotect a region that was never protected, you can uffd-unprotect a region that was never protected. The user might have tried to unprotect a large region that was only partially protected.
The selftest, obviously, blindly unprotects some regions to check for bugs.
So, to answer your question: it was not write-protected (think of the initial copy done without write-protection).
If cpu2 doesn't have that cached TLB entry, then the "write to old page" won't happen either, because cpu1 and cpu2 will both go through the COW path and the page table lock should serialize them.
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()
 change_protection_range()
  ...
  change_pte_range()
  [ *clear* “write”-bit ]
  [ defer TLB flushes ]
                            [ page-fault ]
                            ...
                            wp_page_copy()
                             cow_user_page()
                              [ copy page ]
                                                  [ write to old
                                                    page ]
                            ...
                             set_pte_at_notify()
[ End: cpu2 write not copied from old to new page. ]
Could you share how to reproduce the problem? I would be glad to give it a shot as well.
You can run selftests/userfaultfd with my small patch [1]. I ran it with the following parameters: "./userfaultfd anon 100 100". I think it is more easily reproducible with "mitigations=off idle=poll" as kernel parameters.
Hi Linus,
Nadav Amit found memory corruption when running the userfaultfd test above. It seems to me the problem is related to commit 09854ba94c6a ("mm: do_wp_page() simplification"). Can you please take a look? Thanks.
TL;DR: it may not be safe to make copies of singly mapped (non-COW) pages when they are locked or have additional refcounts, because a concurrent clear_soft_dirty or change_pte_range may have removed pte_write but not yet flushed the TLB.