On Fri, Oct 25, 2024 at 11:44:56PM +0200, Vlastimil Babka wrote:
On 10/23/24 18:24, Lorenzo Stoakes wrote:
Implement a new lightweight guard page feature: regions of userland virtual memory that, when accessed, cause a fatal signal to be raised.
Currently users must establish PROT_NONE ranges to achieve this.
However this is very costly memory-wise - we need a VMA for each and every one of these regions AND they become unmergeable with surrounding VMAs.
In addition repeated mmap() calls require repeated kernel context switches and contention of the mmap lock to install these ranges, potentially also having to unmap memory if installed over existing ranges.
The lightweight guard approach eliminates the VMA cost altogether - rather than establishing a PROT_NONE VMA, it operates at the level of page table entries - establishing PTE markers such that accesses to them cause a fault followed by a SIGSEGV signal being raised.
This is achieved through the PTE marker mechanism, which we have extended to provide PTE_MARKER_GUARD, installed via the generic page walking logic, likewise extended for this purpose.
These guard ranges are established with MADV_GUARD_INSTALL. If the range in which they are installed contains any existing mappings, these will be zapped, i.e. the range freed and the memory unmapped (thus mimicking the behaviour of MADV_DONTNEED in this respect).
Any existing guard entries will be left untouched. There is therefore no nesting of guarded pages.
Guarded ranges are NOT cleared by MADV_DONTNEED nor MADV_FREE (in both instances the memory range may be reused at which point a user would expect guards to still be in place), but they are cleared via MADV_GUARD_REMOVE, process teardown or unmapping of memory ranges.
The guard property can be removed from ranges via MADV_GUARD_REMOVE. Should the ranges over which this is applied contain non-guard entries, those entries are left untouched; only guard entries are cleared.
We permit this operation on anonymous memory only, and only VMAs which are non-special, non-huge and not mlock()'d (if we permitted this we'd have to drop locked pages which would be rather counterintuitive).
Racing page faults can repeatedly interrupt an attempt to install guard pages; each interruption results in a zap, and the whole process can end up being repeated. If this happens more often than would be expected in normal operation, we drop the locks and have the operation retried, which avoids lock contention in this scenario.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Thanks!
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -423,6 +423,12 @@ extern unsigned long highest_memmap_pfn;
  */
 #define MAX_RECLAIM_RETRIES 16
 
+/*
+ * Maximum number of attempts we make to install guard pages before we give up
+ * and return -ERESTARTNOINTR to have userspace try again.
+ */
+#define MAX_MADVISE_GUARD_RETRIES 3
+
Can't we simply put this in mm/madvise.c? I didn't find any usage elsewhere.
Sure, will move if there's a respin / can send a quick fix-patch next week if otherwise settled. Just felt vaguely 'neater' here for... spurious subjective squishy-brained reasons :)