On Mon, Dec 21, 2020 at 11:55:02AM -0800, Linus Torvalds wrote:
On Mon, Dec 21, 2020 at 11:16 AM Yu Zhao yuzhao@google.com wrote:
Nadav Amit found memory corruptions when running userfaultfd test above. It seems to me the problem is related to commit 09854ba94c6a ("mm: do_wp_page() simplification"). Can you please take a look? Thanks.
TL;DR: it may not safe to make copies of singly mapped (non-COW) pages when it's locked or has additional ref count because concurrent clear_soft_dirty or change_pte_range may have removed pte_write but yet to flush tlb.
Hmm. The TLB flush shouldn't actually matter, because anything that changes the writable bit had better be serialized by the page table lock.
Well, unfortunately we have places that use optimizations like
inc_tlb_flush_pending() lock page table pte_wrprotect flush_tlb_range() dec_tlb_flush_pending()
which complicate things. And usually checking mm_tlb_flush_pending() in addition to pte_write() (while holding page table lock) would fix the similar problems. But for this one, doing so apparently isn't as straightforward or the best solution.
Yes, we often load the page table value without holding the page table lock (in order to know what we are going to do), but then before we finalize the operation, we then re-check - undet the page table lock - that the value we loaded still matches.
But I think I see what *MAY* be going on. The userfaultfd mwriteprotect_range() code takes the mm lock for _reading_. Which means that you can have
Thread A Thread B
fault starts. Sees write-protected pte, allocates memory, copies data
- userfaultfd makes the regions writable - usefaultfd case writes to the region - userfaultfd makes region non-writable
fault continues, gets the page table lock, sees that the pte is the
same, uses old copied data
But if this is what's happening, I think it's a userfaultfd bug. I think the mmap_read_lock(dst_mm) in mwriteprotect_range() needs to be a mmap_write_lock().
mprotect() does this right, it looks like userfaultfd does not. You cannot just change the writability of a page willy-nilly without the correct locking.
Maybe there are other causes, but this one stands out to me as one possible cause.
Comments?
Linus