On Dec 21, 2020, at 3:30 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Mon, Dec 21, 2020 at 2:55 PM Nadav Amit <nadav.amit@gmail.com> wrote:
>> So as an alternative solution, I can do copying under the PTL after flushing, which seems to solve the problem.
> ... Note that the "Re-validate under PTL" code in cow_user_page() is *not* the "now we are installing the copy". No, that's actually for the "uhhuh, the copy using the virtual address outside the ptl failed, now we need to do something special". ... So are we sure the COW case is so special?
> I really think this is clearly just a userfaultfd bug that we hadn't realized until now, and had possibly been hidden by timings or other random stuff before.
Thanks for the detailed explanation. I think I got the COW parts correct, but as you said, I am not at all sure that COW is so special.
It seems that some general per-page-table mechanism for detecting stale PTEs is needed, so that by default anyone who acquires the PTL is guaranteed that the PTEs in memory are coherent across all the TLBs.
But I have not yet figured out how to do so without introducing overhead, and the question is indeed whether people care about mprotect and uffd-wp performance.
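One way to picture the per-page-table mechanism described above is a generation counter kept next to each page-table lock. The C sketch below is a hypothetical userspace model, not kernel code, and is only an assumption about how such a scheme might look: `struct page_table`, `ptl_lock_coherent()`, `change_pte_deferred_flush()`, `complete_deferred_flush()` and `flush_tlb_all_cpus()` are invented names, a pthread mutex stands in for the PTL, and the "flush" merely counts calls. It is loosely similar in spirit to the kernel's per-mm `tlb_flush_pending` counter, but tracked per table.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: one "page table" guarded by its own lock (the PTL). */
struct page_table {
    pthread_mutex_t ptl;
    unsigned long pte_gen;     /* bumped whenever a PTE change needs a TLB flush */
    unsigned long flushed_gen; /* last generation known to be flushed everywhere */
    unsigned long flush_count; /* number of simulated TLB flushes issued */
};

/* Stand-in for a real cross-CPU TLB shootdown; here it only counts calls. */
static void flush_tlb_all_cpus(struct page_table *pt)
{
    pt->flush_count++;
}

/* Change a PTE (e.g. write-protect it) but leave the flush for later,
 * as batching paths may do. The table is now "stale" until flushed. */
static void change_pte_deferred_flush(struct page_table *pt)
{
    pthread_mutex_lock(&pt->ptl);
    pt->pte_gen++;
    pthread_mutex_unlock(&pt->ptl);
}

/* Take the PTL and guarantee no stale TLB entries remain for this table:
 * if a flush is still pending, do it now, before the caller looks at PTEs.
 * Returns with the PTL held. */
static void ptl_lock_coherent(struct page_table *pt)
{
    pthread_mutex_lock(&pt->ptl);
    if (pt->flushed_gen != pt->pte_gen) {
        flush_tlb_all_cpus(pt);
        pt->flushed_gen = pt->pte_gen;
    }
    /* Invariant the caller can now rely on while holding the PTL. */
    assert(pt->flushed_gen == pt->pte_gen);
}

/* The path that deferred the flush finishes its job; it skips the flush
 * if another PTL holder has already performed it. */
static void complete_deferred_flush(struct page_table *pt)
{
    ptl_lock_coherent(pt);
    pthread_mutex_unlock(&pt->ptl);
}
```

In this model a COW fault would call `ptl_lock_coherent()` instead of taking the PTL directly, and the path that deferred the flush calls `complete_deferred_flush()`; whichever runs first does the single flush. The common-case cost is one comparison per lock acquisition, though how that trade-off plays out for mprotect and uffd-wp in the real kernel is exactly the open question raised above.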