On Mon, Dec 21, 2020 at 05:30:41PM -0500, Peter Xu wrote:
On Mon, Dec 21, 2020 at 01:49:55PM -0800, Nadav Amit wrote:
BTW: In general, I think that you are right, and that changing of PTEs should not require taking mmap_lock for write. However, I am not sure cow_user_page() is not the only one that poses a problem and whether a more systematic solution is needed. If cow_user_pages() is the only problem, do you think it is possible to do the copying while holding the PTL? It works for normal-pages, but I am not sure whether special-pages pose special problems.
Anyhow, this is an enhancement that we can try later.
AFAIU mprotect() is the only one who modifies the pte using the mmap write lock. NUMA balancing is also using read mmap lock when changing pte protections,
NUMA balance doesn't clear pte_write() -- I would not call setting pte_none() a change of protection.
while my understanding is mprotect() used write lock only because it manipulates the address space itself (aka. vma layout) rather than modifying the ptes, so it needs to.
Yes, and personally, I would only take mmap lock for write when I change VMAs, not PTE protections.
At the pte level, it seems always to be the pgtable lock that serializes things.
So it's perfectly legal to me for e.g. a driver to modify ptes with the read lock of mmap_sem, unless I'm severely mistaken.. as long as the pgtable lock is taken when doing so.
If there's a driver that manipulated the ptes, changed the content of the page, recover the ptes to origin, and all these happen right after wp_page_copy() unlocked the pgtable lock but before wp_page_copy() retakes the same lock again, we may face the same issue finding that the page got copied contains corrupted data at last. While I don't know what to blame on the driver either because it seems to be exactly following the rules.
I believe changing into write lock would solve the race here because tlb flushing would be guaranteed along the way, but I'm just a bit worried it's not the best way to go..
I can't say I disagree with you but the man has made the call and I think we should just move on.