Re: [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior

9 Apr 2021

      On Thu, 8 Apr 2021, Axel Rasmussen wrote:
...
On Tue, Apr 6, 2021 at 4:49 PM Peter Xu peterx@redhat.com wrote:
...
On Mon, Apr 05, 2021 at 10:19:17AM -0700, Axel Rasmussen wrote:
...
...
...
...
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
...
...
...
...

if (is_file_backed) {

        /* The shmem MAP_PRIVATE case requires checking the i_size */

        inode = dst_vma->vm_file->f_inode;

        offset = linear_page_index(dst_vma, dst_addr);

        max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

        ret = -EFAULT;

        if (unlikely(offset >= max_off))

                goto out_unlock;

Frankly I don't really know why this must be put into pgtable lock..  Since if
not required then it can be moved into UFFDIO_COPY path, as CONTINUE doesn't
need it iiuc.  Just raise it up as a pure question.
It's not clear to me either. shmem_getpage_gfp() does check this twice
kinda like we're doing, but it doesn't ever touch the PTL. What it
seems to be worried about is, what happens if a concurrent
FALLOC_FL_PUNCH_HOLE happens somewhere in the middle of whatever
manipulation we're doing? From looking at shmem_fallocate(), I think
the basic point is that truncation happens while "inode_lock(inode)"
is held, but neither shmem_mcopy_atomic_pte() or the new
mcopy_atomic_install_ptes() take that lock.
I'm a bit hesitant to just remove it, run some tests, and then declare
victory, because it seems plausible it's there to catch some
semi-hard-to-induce race. I'm not sure how to prove that *isn't*
needed, so my inclination is to just keep it?
I'll send a series addressing the feedback so far this afternoon, and
I'll leave this alone for now - at least, it doesn't seem to hurt
anything. Maybe Hugh or someone else has some more advice about it. If
so, I'm happy to remove it in a follow-up.
It takes some thinking about, but the i_size check is required to be
under the pagetable lock, for the MAP_PRIVATE UFFDIO_COPY path, where
it is inserting an anonymous page into the file-backed vma (skipping
actually inserting a page into page cache, as an ordinary fault would).
Not because of FALLOC_FL_PUNCH_HOLE (which makes no change to i_size;
and it's okay if a race fills in the hole immediately afterwards),
but because of truncation (which must remove all beyond i_size).
In the MAP_SHARED case, with a locked page inserted into page cache,
the page lock is enough to exclude concurrent truncation.  But even
in that case the second i_size check (I'm looking at 5.12-rc's
shmem_mfill_atomic_pte(), rather than recent patches which might differ)
is required: because the first i_size check was done before the page
became visible in page cache, so a concurrent truncation could miss it).
Maybe that first check is redundant, though I'm definitely for doing it;
or maybe shmem_add_to_page_cache() would be better if it made that check
itself, under xas_lock (I think the reason it does not is historical).
The second check, in the MAP_SHARED case, does not need to be under
pagetable lock - the page lock on the page cache page is enough -
but probably Andrea placed it there to resemble the anonymous case.
You might then question, how come there is no i_size check in all of
mm/memory.c, where ordinary faulting is handled.  I'll answer that
the pte_same() check, under pagetable lock in wp_page_copy(), is
where the equivalent to userfaultfd's MAP_PRIVATE UFFDIO_COPY check
is made: if the page cache page has already been truncated, that pte
will have been cleared.
Or, if the page cache page is truncated an instant after wp_page_copy()
drops page table lock, then the unmap_mapping_range(,,, even_cows = 1)
which follows truncation has to clean it up.  Er, does that mean that
the i_size check I started off insisting is required, actually is not
required?  Um, maybe, but let's just keep it and say goodnight!
Hugh

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior