Re: [PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

19 Feb 2025

      On Wed, Feb 19, 2025 at 12:25:51AM -0800, Kalesh Singh wrote:
...
On Thu, Feb 13, 2025 at 10:18 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
...
The guard regions feature was initially implemented to support anonymous
mappings only, excluding shmem.
This was done so as to introduce the feature carefully and incrementally
and to be conservative when considering the various caveats and corner
cases that are applicable to file-backed mappings but not to anonymous
ones.
Now this feature has landed in 6.13, it is time to revisit this and to
extend this functionality to file-backed and shmem mappings.
In order to make this maximally useful, and since one may map file-backed
mappings read-only (for instance ELF images), we also remove the
restriction on read-only mappings and permit the establishment of guard
regions in any non-hugetlb, non-mlock()'d mapping.
Hi Lorenzo,
Thank you for your work on this.
You're welcome.
...
Have we thought about how guard regions are represented in /proc/*/[s]maps?
This is off-topic here but... Yes, extensively. No they do not appear
there.
I thought you had attended LPC and my talk where I mentioned this
purposefully as a drawback?
I went out of my way to advertise this limitation at the LPC talk, in the
original series, etc. so it's a little disappointing that this is being
brought up so late, but nobody else has raised objections to this issue so
I think in general it's not a limitation that matters in practice.
...
In the field, I've found that many applications read the ranges from
/proc/self/[s]maps to determine what they can access (usually related
to obfuscation techniques). If they don't know of the guard regions it
would cause them to crash; I think that we'll need similar entries to
PROT_NONE (---p) for these, and generally to maintain consistency
between the behavior and what is being said from /proc/*/[s]maps.
No, we cannot have these, sorry.
Firstly /proc/$pid/[s]maps describes VMAs. The entire purpose of this
feature is to avoid having to accumulate VMAs for regions which are not
intended to be accessible.
Secondly, there is no practical means for this to be accomplished in
/proc/$pid/maps in _any_ way - as no metadata relating to a VMA indicates
they have guard regions.
This is intentional, because setting such metadata is simply not practical
- why? Because when you try to split the VMA, how do you know which bit
gets the metadata and which doesn't? You can't without _reading page
tables_.
/proc/$pid/smaps _does_ read page tables, but we can't start pretending
VMAs exist when they don't, this would be completely inaccurate, would
break assumptions for things like mremap (which require a single VMA) and
would be unworkable.
The best that _could_ be achieved is to have a marker in /proc/$pid/smaps
saying 'hey this region has guard regions somewhere'.
But I haven't seen any demand for this and presumably this wouldn't help
your imagined program?
I don't really understand your use case though, what programs would read
/proc/$pid/maps, then... try to use /proc/$pid/mem or whatnot to arbitrarily
read regions? Such applications would be in danger of SIGBUS in any case if
they were to read invalid portions of file-backed mappings, and have no way
of knowing this, so they seem fundamentally broken as it is?
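To illustrate the VMA point, here is a minimal userspace sketch. It is not taken from the series: the helper names count_vmas() and compare_guard_and_prot_none() are made up for illustration, and the fallback value of 102 for MADV_GUARD_INSTALL assumes the upstream uapi definition. It counts the /proc/self/maps entries overlapping a three-page anonymous mapping, first after installing a guard region on the middle page and then after mprotect(PROT_NONE) on that same page:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* assumed uapi value for older headers */
#endif

/* Count /proc/self/maps entries overlapping [start, end); -1 on error. */
static int count_vmas(uintptr_t start, uintptr_t end)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char *line = NULL;
	size_t cap = 0;
	int count = 0;

	if (!f)
		return -1;
	while (getline(&line, &cap, f) != -1) {
		unsigned long lo, hi;

		if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
		    lo < end && hi > start)
			count++;
	}
	free(line);
	fclose(f);
	return count;
}

/*
 * Map three anonymous pages and record VMA counts for that range:
 *   out[0]: initially
 *   out[1]: after a guard install on the middle page
 *           (-1 if the kernel rejects MADV_GUARD_INSTALL)
 *   out[2]: after mprotect(PROT_NONE) on the middle page
 * Returns 0 on success, -1 if mmap() or mprotect() fails.
 */
static int compare_guard_and_prot_none(int out[3])
{
	long ps = sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, 3 * ps, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	uintptr_t start, end;

	if (p == MAP_FAILED)
		return -1;
	start = (uintptr_t)p;
	end = start + 3 * ps;

	out[0] = count_vmas(start, end);

	if (madvise(p + ps, ps, MADV_GUARD_INSTALL) == 0)
		out[1] = count_vmas(start, end);
	else
		out[1] = -1;

	if (mprotect(p + ps, ps, PROT_NONE) != 0) {
		munmap(p, 3 * ps);
		return -1;
	}
	out[2] = count_vmas(start, end);

	munmap(p, 3 * ps);
	return 0;
}
```

On a kernel with guard region support one would expect out[1] to stay at 1 while out[2] becomes 3, since only mprotect() splits the VMA. On older kernels the madvise() call fails and out[1] is reported as -1.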
...
-- Kalesh
...
It is permissible to permit the establishment of guard regions in read-only
mappings because the guard regions only reduce access to the mapping, and
when removed simply reinstate the existing attributes of the underlying
VMA, meaning no access violations can occur.
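As a rough userspace sketch of the read-only case (not code from the series; the function name guard_readonly_file_roundtrip(), the /tmp path template and the fallback values 102/103 for the madvise flags are assumptions for illustration), the following maps a temporary file PROT_READ, installs a guard region over its middle page, removes it again and checks that the page is readable with its original contents. The guarded page is deliberately not touched while the guard is installed, as that access would fault:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* assumed uapi value for older headers */
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103	/* assumed uapi value for older headers */
#endif

/*
 * Returns 0 if a guard region could be installed on and removed from a
 * read-only file-backed mapping, with the data readable afterwards.
 * Returns a negative errno otherwise (e.g. -EINVAL on kernels that do
 * not permit guard regions in file-backed mappings).
 */
static int guard_readonly_file_roundtrip(void)
{
	char path[] = "/tmp/guard-demo-XXXXXX";
	long ps = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	int ret = 0;
	char *buf, *map;

	if (fd < 0)
		return -errno;
	unlink(path);	/* file lives on until fd is closed */

	buf = malloc(3 * ps);
	if (!buf) {
		close(fd);
		return -ENOMEM;
	}
	memset(buf, 'x', 3 * ps);
	if (write(fd, buf, 3 * ps) != 3 * ps) {
		ret = -EIO;
		goto out;
	}

	map = mmap(NULL, 3 * ps, PROT_READ, MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED) {
		ret = -errno;
		goto out;
	}

	if (madvise(map + ps, ps, MADV_GUARD_INSTALL) != 0)
		ret = -errno;
	else if (madvise(map + ps, ps, MADV_GUARD_REMOVE) != 0)
		ret = -errno;
	else if (map[ps] != 'x')	/* read access is back after removal */
		ret = -EIO;

	munmap(map, 3 * ps);
out:
	free(buf);
	close(fd);
	return ret;
}
```

A return of 0 would indicate the round trip worked; a negative value most likely means the running kernel predates this series.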
While the change in kernel code introduced in this series is small, the
majority of the effort here is spent in extending the testing to assert
that the feature works correctly across numerous file-backed mapping
scenarios.
Every single guard region self-test performed against anonymous memory
(which is relevant and not anon-only) has now been updated to also be
performed against shmem and a mapping of a file in the working directory.
This confirms that all cases also function correctly for file-backed guard
regions.
In addition a number of other tests are added for specific file-backed
mapping scenarios.
There are a number of other concerns that one might have with regard to
guard regions, addressed below:

Readahead
~~~~~~~~~

Readahead is a process through which the page cache is populated on the
assumption that sequential reads will occur, thus amortising I/O and,
through a clever use of the PG_readahead folio flag established during
major fault and checked upon minor fault, provides for asynchronous I/O to
occur as data is processed, reducing I/O stalls as data is faulted in.

Guard regions do not alter this mechanism, which operates at the folio and
fault level, but do of course prevent the faulting of folios that would
otherwise be mapped.

In the instance of a major fault prior to a guard region, synchronous
readahead will occur including populating folios in the page cache which
the guard regions will, in the case of the mapping in question, prevent
access to.

In addition, if PG_readahead is placed in a folio that is now inaccessible,
this will prevent asynchronous readahead from occurring as it would
otherwise do.

However, there are mechanisms for heuristically resetting this within
readahead regardless, which will 'recover' correct readahead behaviour.

Readahead presumes sequential data access; the presence of a guard region
clearly indicates that, at least in the guard region, no such sequential
access will occur, as it cannot occur there.

So this should have very little impact on any real workload. The far more
important point is as to whether readahead causes incorrect or
inappropriate mapping of ranges disallowed by the presence of guard
regions - this is not the case, as readahead does not 'pre-fault' memory in
this fashion.

At any rate, any mechanism which would attempt to do so would hit the usual
page fault paths, which correctly handle PTE markers as with anonymous
mappings.

Fault-Around
~~~~~~~~~~~~

The fault-around logic, in a similar vein to readahead, attempts to improve
efficiency with regard to file-backed memory mappings, however it differs
in that it does not try to fetch folios into the page cache that are about
to be accessed, but rather pre-maps a range of folios around the faulting
address.

Guard regions making use of PTE markers makes this relatively trivial, as
this case is already handled - see filemap_map_folio_range() and
filemap_map_order0_folio() - in both instances, the solution is to simply
keep the established page table mappings and let the fault handler take
care of PTE markers, as per the comment:
    /*
     * NOTE: If there're PTE markers, we'll leave them to be
     * handled in the specific fault path, and it'll prohibit
     * the fault-around logic.
     */

This works, as establishing guard regions results in page table mappings
with PTE markers, and clearing them removes them.

Truncation
~~~~~~~~~~

File truncation will not eliminate existing guard regions, as the
truncation operation will ultimately zap the range via
unmap_mapping_range(), which specifically excludes PTE markers.

Zapping
~~~~~~~

Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(),
which specifically deals with guard entries, leaving them intact except in
instances such as process teardown or munmap() where they need to be
removed.

Reclaim
~~~~~~~

When reclaim is performed on file-backed folios, it ultimately invokes
try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte()
will ultimately abort the operation for the guard region mapping. If large,
then check_pte() will determine that this is a non-device private
entry/device-exclusive entry 'swap' PTE and thus abort the operation in
that instance.

Therefore, no odd things happen in the instance of reclaim being attempted
upon a file-backed guard region.

Hole Punching
~~~~~~~~~~~~~

This updates the page cache and ultimately invokes unmap_mapping_range(),
which explicitly leaves PTE markers in place.

Because the establishment of guard regions zapped any existing mappings to
file-backed folios, once the guard regions are removed then the
hole-punched region will be faulted in as usual and everything will behave
as expected.

Lorenzo Stoakes (4):
  mm: allow guard regions in file-backed and read-only mappings
  selftests/mm: rename guard-pages to guard-regions
  tools/selftests: expand all guard region tests to file-backed
  tools/selftests: add file/shmem-backed mapping guard region tests

 mm/madvise.c                                  |   8 +-
 tools/testing/selftests/mm/.gitignore         |   2 +-
 tools/testing/selftests/mm/Makefile           |   2 +-
 .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
 4 files changed, 821 insertions(+), 112 deletions(-)
 rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)
--
2.48.1
