On Tue, Dec 22, 2020 at 04:01:45PM -0800, Linus Torvalds wrote:
On Tue, Dec 22, 2020 at 3:50 PM Linus Torvalds torvalds@linux-foundation.org wrote:
See zap_pte_range() for an example of doing it right, even in the presence of complexities (ie that has an example of both flushing the TLB, and doing the actual "free the pages after flush", and it does the two cases separately).
The more I look at the mprotect code, the less I like it. We seem to be much better about the TLB flushes in other places (looking at mremap, for example). The mprotect code seems to be very laissez-faire about the TLB flushing.
Does adding a TLB flush to before that
pte_unmap_unlock(pte - 1, ptl);
fix things for you?
It definitely does. But if I had to choose, I'd go with holding mmap_lock for write because 1) it's less likely to storm other CPUs by IPI and would only have performance impact on processes that use ufd, which I guess already have high tolerance for not-so-good performance, and 2) people are spearheading multiple efforts to reduce the mmap_lock contention, which hopefully would make ufd users suffer less soon.
That's not the right fix - leaving a stale TLB entry around is fine if the TLB entry is more strict wrt protections - but it might be worth testing as a "does it at least close the problem" patch.
Well, things get trick if we do this. I'm not sure if I could vouch such a fix for stable as confident as I do holding mmap_lock for write.