On 18.02.25 14:05, Lorenzo Stoakes wrote:
On Tue, Feb 18, 2025 at 01:12:05PM +0100, Vlastimil Babka wrote:
On 2/13/25 19:16, Lorenzo Stoakes wrote:
The guard regions feature was initially implemented to support anonymous mappings only, excluding shmem.
This was done such as to introduce the feature carefully and incrementally and to be conservative when considering the various caveats and corner cases that are applicable to file-backed mappings but not to anonymous ones.
Now this feature has landed in 6.13, it is time to revisit this and to extend this functionality to file-backed and shmem mappings.
In order to make this maximally useful, and since one may map file-backed mappings read-only (for instance ELF images), we also remove the restriction on read-only mappings and permit the establishment of guard regions in any non-hugetlb, non-mlock()'d mapping.
Do you plan to address mlock later too? I guess somebody might find use for those. Is there some fundamental issue or just that we need to define some good semantics for corner cases? (i.e. if pages are already populated in the mlocked area, discarding them by replacing with guard pte's goes against that, so do we allow it or not?).
Yeah that's the fundamental issue with mlock, it does not interact with the zapping part of MADV_GUARD_INSTALL, and that is why we disallow it (but not so for MADV_GUARD_REMOVE, as if a VMA that contains guard regions is locked _afterwards_ there will be no zapping).
We could potentially expose an equivalent, as there are for other flags, a _LOCKED variant of the madvise() flag, like MADV_GUARD_INSTALL_LOCKED to make this explicit.
That is probably the most sensible option, if there is a need for this!
mlock is weird, because it assumes that memory will be faulted in in the whole VMA.
You'd likely have to populate + mlock the page when removing the guard. Also not sure what happens if one does an mlock()/mlockall() after already installing PTE markers.
__mm_populate() would skip whole VMAs in case populate_vma_page_range() fails. And I would assume populate_vma_page_range() fails on the first guard when it triggers a page fault.
OTOH, supporting the mlock-on-fault thingy should be easy. That's precisely where MADV_DONTNEED_LOCKED originates from:
commit 9457056ac426e5ed0671356509c8dcce69f8dee0 Author: Johannes Weiner hannes@cmpxchg.org Date: Thu Mar 24 18:14:12 2022 -0700
mm: madvise: MADV_DONTNEED_LOCKED
MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating, there are valid use cases for depopulating locked ranges as well.
Adding support for that would be indeed nice.