Re: [PATCH 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

13 Apr 2021

      On Mon, Apr 12, 2021 at 4:17 PM Peter Xu peterx@redhat.com wrote:
...
On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote:
...
+/*

Install PTEs, to map dst_addr (within dst_vma) to page.

This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),

whether or not dst_vma is VM_SHARED. It also handles the more general

MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file

backed, or not).

Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by

shmem_mcopy_atomic_pte instead.

*/

+static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,

                             struct vm_area_struct *dst_vma,

                             unsigned long dst_addr, struct page *page,

                             bool newly_allocated, bool wp_copy)

+{

int ret;

pte_t _dst_pte, *dst_pte;

int writable;

bool vm_shared = dst_vma->vm_flags & VM_SHARED;

spinlock_t *ptl;

struct inode *inode;

pgoff_t offset, max_off;

_dst_pte = mk_pte(page, dst_vma->vm_page_prot);

writable = dst_vma->vm_flags & VM_WRITE;

/* For private, non-anon we need CoW (don't write to page cache!) */

if (!vma_is_anonymous(dst_vma) && !vm_shared)

        writable = 0;

if (writable || vma_is_anonymous(dst_vma))

        _dst_pte = pte_mkdirty(_dst_pte);

if (writable) {

        if (wp_copy)

                _dst_pte = pte_mkuffd_wp(_dst_pte);

        else

                _dst_pte = pte_mkwrite(_dst_pte);

} else if (vm_shared) {

        /*

         * Since we didn't pte_mkdirty(), mark the page dirty or it

         * could be freed from under us. We could do this

         * unconditionally, but doing it only if !writable is faster.

         */

        set_page_dirty(page);

}

dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);

if (vma_is_shmem(dst_vma)) {

        /* The shmem MAP_PRIVATE case requires checking the i_size */

When you start to use this function in the last patch it'll be needed too even
if MAP_SHARED?
How about directly state the reason of doing this ("serialize against truncate
with the PT lock") instead of commenting about "who will need it"?
...

        inode = dst_vma->vm_file->f_inode;

        offset = linear_page_index(dst_vma, dst_addr);

        max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

        ret = -EFAULT;

        if (unlikely(offset >= max_off))

                goto out_unlock;

}

[...]
...
+/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
+static int mcontinue_atomic_pte(struct mm_struct *dst_mm,

                        pmd_t *dst_pmd,

                        struct vm_area_struct *dst_vma,

                        unsigned long dst_addr,

                        bool wp_copy)

+{

struct inode *inode = file_inode(dst_vma->vm_file);

pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);

struct page *page;

int ret;

ret = shmem_getpage(inode, pgoff, &page, SGP_READ);

SGP_READ looks right, as we don't want page allocation.  However I noticed
there's very slight difference when the page was just fallocated:
    /* fallocated page? */
    if (page && !PageUptodate(page)) {
            if (sgp != SGP_READ)
                    goto clear;
            unlock_page(page);
            put_page(page);
            page = NULL;
            hindex = index;
    }

I think it won't happen for your case since the page should be uptodate already
(the other thread should check and modify the page before CONTINUE), but still
raise this up, since if the page was allocated it smells better to still
install the fallocated page (do we need to clear the page and SetUptodate)?
Sorry for the somewhat rambling thought process:
My first thought is, I don't really know what PageUptodate means for
shmem pages. If I understand correctly, normally we say PageUptodate()
if the in memory data is more recent or equivalent to the on-disk
data. But, shmem pages are entirely in memory - they are file backed
in name only, in some sense.
fallocate() does all sorts of things so the comment to me seems a bit
ambiguous, but it seems the implication is that we're worried
specifically about the case where the shmem page was recently
allocated with fallocate(mode=0)? In that case, do we use
!PageUptodate() to denote that the page has been allocated, but its
contents are undefined?
I suppose that would make sense, as the action "goto clear;" generally
memset()-s the page to zero it, and then calls SetPageUptodate().
Okay so let's say the following sequence of events happens:
1. Userspace calls fallocate(mode=0) to allocate some shmem pages.
2. Another thread, via a UFFD-registered mapping, manages to trigger a
minor fault on one such page, while we still have !PageUptodate().
(I'm not 100% sure this can happen, but let's say it can.)
3. UFFD handler thread gets the minor fault event, and for whatever
(buggy?) reason does nothing - it doesn't modify the page, it just
calls CONTINUE.
I think if we get to this point, zeroing the page, returning it, and
setting up the PTEs seems somewhat reasonable to me. I suppose
alternatively we could notice that this happened and return an error
to the caller? I'm hesitant to mess with the behavior of
shmem_getpage_gfp() to make such a thing happen though. I do think if
we're going to set up the PTEs instead of returning an error, we
definitely do need to clear and SetPageUptodate() the page first.
In conclusion, I think this behavior is correct.
...
--
Peter Xu

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem