On Thu, Jun 2, 2022 at 6:39 AM Matthew Wilcox willy@infradead.org wrote:
On Wed, Jun 01, 2022 at 02:47:41PM -0700, Suren Baghdasaryan wrote:
Unclear why this patch fiddles with the mm_struct locking in this fashion - changelogging that would have been helpful.
Yeah, I should have clarified this in the description. Everything up to unmap_vmas() can be done under mmap_read_lock and that way oom-reaper and process_mrelease can do the unmapping in parallel with exit_mmap. That's the reason we take mmap_read_lock, unmap the vmas, mark the mm with MMF_OOM_SKIP and take the mmap_write_lock to execute free_pgtables. I think maple trees do not change that except there is no mm->mmap anymore, so the line at the end of exit_mmap where we reset mm->mmap to NULL can be removed (I show that line below).
I don't understand why we _want_ unmapping to proceed in parallel? Is it so urgent to unmap these page tables that we need two processes doing it at the same time? And doesn't that just change the contention from visible (contention on a lock) to invisible (contention on cachelines)?
It's important for process_madvise() syscall not to be blocked by a potentially lower priority task doing exit_mmap. I've seen such priority inversion happening when the dying process is running on a little core taking its time while a high-priority task is waiting in the syscall while there is no reason for them to block each other.