On 10/30/25 20:47, Lorenzo Stoakes wrote:
On Thu, Oct 30, 2025 at 07:47:34PM +0100, Vlastimil Babka wrote:
Could we use MADVISE_VMA_READ_LOCK mode (which would actually be an improvement over the current MADVISE_MMAP_READ_LOCK), together with the atomic flag setting? I think the places that could race with us to cause RMW use the vma write lock, so they would be excluded. Fork AFAICS unfortunately doesn't (for the oldmm) and it probably wouldn't make sense to start doing it. Maybe we could think of something to deal with this special case...
During discussion with Pedro off-list I realized fork takes the mmap lock for write on the old mm. So if we kept taking the mmap lock for read, then took the vma lock for read in addition (which should be cheap enough, and we'd only need it in case VM_MAYBE_GUARD is not yet set), and set the flag atomically, perhaps that would cover all non-benign races?
We take VMA write lock in dup_mmap() on each mpnt (old VMA).
Ah yes I thought it was the new one.
We take the VMA write lock (vma_start_write()) for each mpnt.
We then vm_area_dup() the mpnt to the new VMA before calling:
copy_page_range() -> vma_needs_copy()
Which is where the check is done.
So we are holding the VMA write lock at that point, and a VMA read lock should suffice, no?
Yeah, even better!
For belts + braces we could atomically read the flag in vma_needs_copy(), though note the intent is that VM_COPY_ON_FORK could cover more than one flag.
We could drop that for now and be explicit.
Great!
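
For reference, the scheme agreed above could be modelled roughly as below. This is only a userspace C11 sketch, not kernel code: struct model_vma, the MODEL_* bit values and both helper names are made up for illustration, and the locks are represented by comments only (the madvise side would hold the VMA read lock, the fork side the VMA write lock taken via vma_start_write()).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the real flag values are not given in this thread. */
#define MODEL_VM_MAYBE_GUARD (1UL << 0)
#define MODEL_VM_OTHER       (1UL << 1)

/* Hypothetical stand-in for the flags word of a VMA. */
struct model_vma {
	_Atomic unsigned long flags;
};

/*
 * madvise side. In the real code this would run under the VMA read lock,
 * so other read-lock holders may set the bit concurrently. An atomic OR
 * means no concurrent update to the word is lost.
 */
static inline void model_set_maybe_guard(struct model_vma *vma)
{
	/* Fast path: nothing to do if the bit is already set. */
	if (atomic_load_explicit(&vma->flags, memory_order_relaxed) &
	    MODEL_VM_MAYBE_GUARD)
		return;

	atomic_fetch_or_explicit(&vma->flags, MODEL_VM_MAYBE_GUARD,
				 memory_order_relaxed);
}

/*
 * fork side. In the real code this would run under the VMA write lock,
 * which already excludes read-lock holders; the atomic load is only the
 * belts + braces part.
 */
static inline bool model_needs_copy(struct model_vma *vma)
{
	return (atomic_load_explicit(&vma->flags, memory_order_relaxed) &
		MODEL_VM_MAYBE_GUARD) != 0;
}
```

The model checks the single bit explicitly rather than a combined mask, matching the "be explicit" suggestion above.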