The patch titled
Subject: IB/core: disable memory registration of filesystem-dax vmas
has been added to the -mm tree. Its filename is
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ib-core-disable-memory-registratio…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ib-core-disable-memory-registratio…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem, it is not
safe to allow RDMA to create long-standing memory registrations against
filesystem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas drivers/infiniband/core/umem.c
--- a/drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas
+++ a/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
mm-introduce-get_user_pages_longterm.patch
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
v4l2-disable-filesystem-dax-mapping-support.patch
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
The patch titled
Subject: v4l2: disable filesystem-dax mapping support
has been added to the -mm tree. Its filename is
v4l2-disable-filesystem-dax-mapping-support.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/v4l2-disable-filesystem-dax-mappin…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/v4l2-disable-filesystem-dax-mappin…
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax, which needs
the ability to revoke DMA access to a mapping at will, or otherwise allow
the kernel to wait for completion of DMA. The filesystem-dax
implementation breaks the traditional solution for truncating actively
mapped file-backed pages, since there is no page-cache page we can orphan
to sustain ongoing DMA.
If V4L2 wants to support long-lived DMA mappings, it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
_
The patch titled
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
has been added to the -mm tree. Its filename is
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-fail-get_vaddr_frames-for-files…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fail-get_vaddr_frames-for-files…
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem, it is not
safe to allow V4L2, Exynos, and other frame vector users to create
long-standing / irrevocable memory registrations against filesystem-dax vmas.
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/frame_vector.c | 6 ++++++
1 file changed, 6 insertions(+)
diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,12 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ if (vma_is_fsdax(vma)) {
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
_
The patch titled
Subject: mm: introduce get_user_pages_longterm
has been added to the -mm tree. Its filename is
mm-introduce-get_user_pages_longterm.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-introduce-get_user_pages_longte…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-introduce-get_user_pages_longte…
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to keep
an elevated page count indefinitely. This is distinct from usages like
iov_iter_get_pages where the elevated page counts are transient. The
iov_iter_get_pages cases immediately turn around and submit the pages to a
device driver which will put_page when the i/o operation completes (under
kernel control).
In the longterm case, userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
the filesystem-dax case, where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait for
pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before blocks
from a truncate/hole-punch operation are repurposed is saved for a later
patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings were
supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem, it is not
safe to allow long-standing memory registrations against filesystem-dax
vmas. Device-dax vmas do not have this problem and are explicitly
allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and V4L2).
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 14 +++++++++
include/linux/mm.h | 13 ++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (S_ISCHR(inode->i_mode))
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kzalloc(sizeof(struct vm_area_struct *) * nr_pages,
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
_
The patch titled
Subject: device-dax: implement ->split() to catch invalid munmap attempts
has been added to the -mm tree. Its filename is
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/device-dax-implement-split-to-catc…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/device-dax-implement-split-to-catc…
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: device-dax: implement ->split() to catch invalid munmap attempts
Similar to how device-dax enforces that the 'address', 'offset', and 'len'
parameters to mmap() be aligned to the device's fundamental alignment, the
same constraints apply to munmap(). Implement ->split() to fail munmap
calls that violate the alignment constraint. Otherwise, we later fail
VM_BUG_ON checks in the unmap_page_range() path with crash signatures of
the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts drivers/dax/device.c
--- a/drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts
+++ a/drivers/dax/device.c
@@ -428,9 +428,21 @@ static int dev_dax_fault(struct vm_fault
return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
}
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ struct file *filp = vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
+ struct dax_region *dax_region = dev_dax->region;
+
+ if (!IS_ALIGNED(addr, dax_region->align))
+ return -EINVAL;
+ return 0;
+}
+
static const struct vm_operations_struct dax_vm_ops = {
.fault = dev_dax_fault,
.huge_fault = dev_dax_huge_fault,
+ .split = dev_dax_split,
};
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages-v3.patch
mm-switch-to-define-pmd_write-instead-of-__have_arch_pmd_write.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths.patch
mm-replace-pud_write-with-pud_access_permitted-in-fault-gup-paths-v3.patch
mm-replace-pmd_write-with-pmd_access_permitted-in-fault-gup-paths.patch
mm-replace-pte_write-with-pte_access_permitted-in-fault-gup-paths.patch
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
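The alignment test that dev_dax_split() applies can be modelled in userspace as follows. This is a minimal sketch: MODEL_IS_ALIGNED, struct model_dax_region and model_dev_dax_split() are hypothetical names, the macro follows the same power-of-two masking idea as the kernel's IS_ALIGNED(), and -1 stands in for -EINVAL.

```c
#include <stdint.h>

/* Power-of-two alignment test; the alignment must be a power of two. */
#define MODEL_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Hypothetical region description; only the alignment matters here. */
struct model_dax_region {
	uint64_t align;  /* e.g. 2 MiB for PMD-sized mappings */
};

/* Returns 0 if a vma may be split at addr, -1 otherwise. */
static int model_dev_dax_split(const struct model_dax_region *region,
			       uint64_t addr)
{
	return MODEL_IS_ALIGNED(addr, region->align) ? 0 : -1;
}
```

With a 2 MiB region alignment, a split at 0x200000 is accepted and a split at 0x201000 is refused, so a munmap() of a sub-range that does not start and end on the device alignment fails before any page-table teardown is attempted.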
The patch titled
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
has been added to the -mm tree. Its filename is
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlbfs-introduce-split-to-vm…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlbfs-introduce-split-to-vm…
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It would
be messy to teach the munmap path about device-dax alignment constraints
in the same (hstate) way that hugetlbfs communicates this constraint.
Instead, these patches introduce a new ->split() vm operation.
This patch (of 2):
The device-dax interface has constraints similar to those of hugetlbfs in
that it requires the munmap path to unmap in huge-page-aligned units. Rather
than add more custom vma handling code in __split_vma(), introduce a new vm
operation to perform this vma-specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
diff -puN include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct include/linux/mm.h
--- a/include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/include/linux/mm.h
@@ -377,6 +377,7 @@ enum page_entry_size {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_fault *vmf);
int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
diff -puN mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff -puN mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/mmap.c
--- a/mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/mmap.c
@@ -2555,9 +2555,11 @@ int __split_vma(struct mm_struct *mm, st
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
_
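The dispatch pattern this patch adds to __split_vma() can be illustrated with a small userspace model. All names below (struct model_vma, struct model_vm_ops, model_huge_split(), model_split_vma()) are hypothetical, and -1 stands in for -EINVAL. The point of the sketch is that the generic split path no longer needs to know about any particular mapping type: it calls an optional hook and proceeds only when the hook is absent or returns 0.

```c
#include <stddef.h>

struct model_vma;

/* Hypothetical operations table with an optional split hook. */
struct model_vm_ops {
	int (*split)(struct model_vma *vma, unsigned long addr);
};

struct model_vma {
	const struct model_vm_ops *ops;  /* may be NULL */
	unsigned long huge_page_size;    /* power of two */
};

/* Refuse to split anywhere but on a huge-page boundary. */
static int model_huge_split(struct model_vma *vma, unsigned long addr)
{
	if (addr & (vma->huge_page_size - 1))
		return -1;
	return 0;
}

static const struct model_vm_ops model_huge_ops = {
	.split = model_huge_split,
};

/* Generic path: consult the hook only if the mapping provides one. */
static int model_split_vma(struct model_vma *vma, unsigned long addr)
{
	if (vma->ops && vma->ops->split) {
		int err = vma->ops->split(vma, addr);

		if (err)
			return err;
	}
	/* The real code would go on to allocate and link the new vma. */
	return 0;
}
```

A mapping with no operations table, or one whose table leaves split unset, can be split at any address, which matches the behaviour of ordinary mappings before the hook existed.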
Hi Andrew,
Here is another device-dax fix that requires touching some mm code. When
device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.
---
Dan Williams (2):
mm, hugetlbfs: introduce ->split() to vm_operations_struct
device-dax: implement ->split() to catch invalid munmap attempts
drivers/dax/device.c | 12 ++++++++++++
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
4 files changed, 26 insertions(+), 3 deletions(-)
The patch titled
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
has been added to the -mm tree. Its filename is
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-fix-an-incorrect-call-o…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-fix-an-incorrect-call-o…
------------------------------------------------------
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled. This leads to a bad state of target pages for migration.
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/migrate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page include/linux/migrate.h
--- a/include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page
+++ a/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_node
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
_
Patches currently in -mm which might be from zi.yan(a)cs.rutgers.edu are
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
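Why the test has to look at the newly allocated page rather than the source page can be shown with a toy model. The sketch below assumes that the allocation order is raised to the huge-page order only when THP migration is supported and the source page is a THP. The types and helpers (struct model_page, model_prep_huge(), model_new_page()) are hypothetical, and the check_new_page flag selects between the old and the fixed condition.

```c
#include <stdbool.h>

#define MODEL_HPAGE_ORDER 9u

/* Hypothetical page model: order MODEL_HPAGE_ORDER means a huge page. */
struct model_page {
	unsigned int order;
	bool prepped;    /* huge-page setup ran on a huge page */
	bool corrupted;  /* huge-page setup ran on a small page */
};

static bool model_is_thp(const struct model_page *page)
{
	return page->order == MODEL_HPAGE_ORDER;
}

/* Stands in for prep_transhuge_page(); only valid on huge pages. */
static void model_prep_huge(struct model_page *page)
{
	if (model_is_thp(page))
		page->prepped = true;
	else
		page->corrupted = true;
}

/* Allocate a migration target for src. */
static struct model_page model_new_page(const struct model_page *src,
					bool thp_migration,
					bool check_new_page)
{
	struct model_page new_page = { 0, false, false };

	/* A huge target is only allocated when THP migration is supported. */
	if (thp_migration && model_is_thp(src))
		new_page.order = MODEL_HPAGE_ORDER;

	/* Old condition tests src; fixed condition tests new_page. */
	if (check_new_page ? model_is_thp(&new_page) : model_is_thp(src))
		model_prep_huge(&new_page);

	return new_page;
}
```

With a huge source page and THP migration disabled, the old condition runs the huge-page setup on an order-0 target, while the fixed condition leaves the small target untouched.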