November 2017 - Linux-stable-mirror

[Linux-stable-mirror] [patch 22/28] mm: migrate: fix an incorrect call of prep_transhuge_page()

by akpm＠linux-foundation.org

From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()

In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration.

By inspecting the code, if called on a non-THP, prep_transhuge_page() will

1) change the value of the mapping of (page + 2), since it is used for THP deferred list;

2) change the lru value of (page + 1), since it is used for THP's dtor.

Both can lead to data corruption of these two pages.

Andrea said:

: Pragmatically and from the point of view of the memory_hotplug subsys, the
: effect is a kernel crash when pages are being migrated during a memory hot
: remove offline and migration target pages are found in a bad state.

This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP.

Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/migrate.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page include/linux/migrate.h
--- a/include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page
+++ a/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_node
 	new_page = __alloc_pages_nodemask(gfp_mask, order,
 				preferred_nid, nodemask);
 
-	if (new_page && PageTransHuge(page))
+	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
 
 	return new_page;
_
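The bug is a wrong-variable check: the code asked whether the source page was a THP when deciding how to initialize the freshly allocated target page. The userspace sketch below models that decision. `struct fake_page`, `alloc_target()` and `prep_thp()` are hypothetical stand-ins, not kernel APIs, and the allocator rule (a huge target only when the source is huge and THP migration is supported, otherwise an order-0 page) is an assumption about the surrounding code that the hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: just the bits this bug touches. */
struct fake_page {
	bool is_thp;		/* page is the head of a transparent huge page */
	bool thp_prepped;	/* THP-only metadata was written into it */
};

/*
 * Hypothetical allocator (assumption): the target is huge only when the
 * source is huge AND THP migration is supported; otherwise order-0.
 */
static struct fake_page *alloc_target(struct fake_page *storage,
				      bool src_is_thp, bool thp_migration)
{
	assert(storage != NULL);
	storage->is_thp = src_is_thp && thp_migration;
	storage->thp_prepped = false;
	return storage;
}

/* Stand-in for prep_transhuge_page(): writes metadata only a THP can hold. */
static void prep_thp(struct fake_page *p)
{
	p->thp_prepped = true;
}

/* Before the fix: the decision is keyed off the source page. */
struct fake_page *new_target_buggy(const struct fake_page *src,
				   struct fake_page *storage, bool thp_migration)
{
	struct fake_page *new_page = alloc_target(storage, src->is_thp, thp_migration);

	if (new_page && src->is_thp)
		prep_thp(new_page);
	return new_page;
}

/* After the fix: the decision is keyed off the page actually allocated. */
struct fake_page *new_target_fixed(const struct fake_page *src,
				   struct fake_page *storage, bool thp_migration)
{
	struct fake_page *new_page = alloc_target(storage, src->is_thp, thp_migration);

	if (new_page && new_page->is_thp)
		prep_thp(new_page);
	return new_page;
}

/* Invariant the fix restores: THP metadata only ever lands on a THP. */
bool target_consistent(const struct fake_page *p)
{
	return !p->thp_prepped || p->is_thp;
}
```

With THP migration disabled, `new_target_buggy()` writes THP metadata into an order-0 page (the analogue of corrupting page + 1 and page + 2), while `new_target_fixed()` leaves it untouched.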

[Linux-stable-mirror] [patch 17/28] mm/madvise.c: fix madvise() infinite loop under special circumstances

by akpm＠linux-foundation.org

From: chenjie <chenjie6(a)huawei.com>
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances

MADV_WILLNEED has always been a noop for DAX (formerly XIP) mappings. Unfortunately madvise_willneed() doesn't communicate this information properly to the generic madvise syscall implementation. The calling convention is quite subtle there. madvise_vma() is supposed to either return an error or update *prev; otherwise the main loop will never advance to the next vma and will keep looping forever with no way to get out of the kernel.

It seems this has been broken since introduction. Nobody has noticed because nobody seems to be using MADV_WILLNEED on these DAX mappings.

[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/madvise.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff -puN mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances mm/madvise.c
--- a/mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances
+++ a/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_a
 {
 	struct file *file = vma->vm_file;
 
+	*prev = vma;
 #ifdef CONFIG_SWAP
 	if (!file) {
-		*prev = vma;
 		force_swapin_readahead(vma, start, end);
 		return 0;
 	}
 
 	if (shmem_mapping(file->f_mapping)) {
-		*prev = vma;
 		force_shm_swapin_readahead(vma, start, end,
 					file->f_mapping);
 		return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_a
 		return 0;
 	}
 
-	*prev = vma;
 	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 	if (end > vma->vm_end)
 		end = vma->vm_end;
_
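The calling convention can be modeled in userspace to show why a handler that succeeds without touching *prev hangs the walk. In this sketch all names are hypothetical; the loop is a simplification of the madvise() main loop that keeps only the "advance via prev->next" step and assumes the range starts exactly at a vma boundary, so `prev` initially points at the preceding mapping. The `max_iters` cap exists only so the demonstration can report the hang instead of spinning; the kernel loop has no such cap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down vma: [start, end) plus a forward link. */
struct fake_vma {
	unsigned long start, end;
	struct fake_vma *next;
};

/* Per-vma handler: return 0 or an error; on 0 it must set *prev. */
typedef int (*advice_fn)(struct fake_vma *vma, struct fake_vma **prev);

/* Honors the convention, as madvise_willneed() does after the patch. */
int willneed_fixed(struct fake_vma *vma, struct fake_vma **prev)
{
	*prev = vma;
	return 0;
}

/* Models the old DAX early return: success, but *prev left untouched. */
int willneed_buggy(struct fake_vma *vma, struct fake_vma **prev)
{
	(void)vma;
	(void)prev;
	return 0;
}

/*
 * Simplified driver loop that advances only via prev->next. Returns the
 * number of vmas handled, -1 on a handler error or missing vma, or -2 if
 * max_iters passed without covering the range.
 */
int walk_range(struct fake_vma *vma, struct fake_vma *prev,
	       unsigned long end, advice_fn handle, int max_iters)
{
	int handled = 0;

	for (int i = 0; i < max_iters; i++) {
		if (!vma || handle(vma, &prev))
			return -1;
		handled++;
		if (vma->end >= end)
			return handled;		/* range fully covered */
		assert(prev != NULL);		/* sketch assumes a preceding vma */
		vma = prev->next;		/* stale prev => same vma again */
	}
	return -2;
}
```

With three adjacent vmas covering the range, `willneed_fixed` finishes after three calls, while `willneed_buggy` keeps revisiting the first vma until the cap is hit.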

[Linux-stable-mirror] [patch 16/28] exec: avoid RLIMIT_STACK races with prlimit()

by akpm＠linux-foundation.org

From: Kees Cook <keescook(a)chromium.org>
Subject: exec: avoid RLIMIT_STACK races with prlimit()

While the defense-in-depth RLIMIT_STACK limit on setuid processes was protected against races from other threads calling setrlimit(), I missed protecting it against races from external processes calling prlimit(). This adds locking around the change and makes sure that rlim_max is set too.

Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Reported-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Acked-by: Serge Hallyn <serge(a)hallyn.com>
Cc: James Morris <james.l.morris(a)oracle.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/exec.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff -puN fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit fs/exec.c
--- a/fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit
+++ a/fs/exec.c
@@ -1340,10 +1340,15 @@ void setup_new_exec(struct linux_binprm
 		 * avoid bad behavior from the prior rlimits. This has to
 		 * happen before arch_pick_mmap_layout(), which examines
 		 * RLIMIT_STACK, but after the point of no return to avoid
-		 * needing to clean up the change on failure.
+		 * races from other threads changing the limits. This also
+		 * must be protected from races with prlimit() calls.
 		 */
+		task_lock(current->group_leader);
 		if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
 			current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
+		if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM)
+			current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM;
+		task_unlock(current->group_leader);
 	}
 
 	arch_pick_mmap_layout(current->mm);
_
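The fix does two things: it takes the same lock that prlimit() takes on the target's group leader, and it clamps the hard limit as well as the soft one, so a later setrlimit() cannot raise the soft limit back above `_STK_LIM`. The sketch below models both halves in userspace. A C11 `atomic_flag` spinlock stands in for `task_lock(current->group_leader)`; `struct fake_rlimit`, `STK_LIM_MODEL` and both function names are hypothetical, and the 8 MiB value is assumed to match `_STK_LIM`.

```c
#include <assert.h>
#include <stdatomic.h>

#define STK_LIM_MODEL (8UL * 1024 * 1024)	/* assumed _STK_LIM: 8 MiB */

/* Hypothetical stand-in for one entry of signal->rlim[]. */
struct fake_rlimit {
	unsigned long cur;	/* soft limit */
	unsigned long max;	/* hard limit */
};

/* Stand-in for the group leader's task lock. */
static atomic_flag group_lock = ATOMIC_FLAG_INIT;

static void lock_group(void)
{
	while (atomic_flag_test_and_set_explicit(&group_lock, memory_order_acquire))
		;	/* spin until the holder releases */
}

static void unlock_group(void)
{
	atomic_flag_clear_explicit(&group_lock, memory_order_release);
}

/* prlimit() side: the pair is written inside the same critical section. */
void set_stack_rlimit(struct fake_rlimit *r, unsigned long cur, unsigned long max)
{
	assert(cur <= max);
	lock_group();
	r->cur = cur;
	r->max = max;
	unlock_group();
}

/* exec side: clamp BOTH limits atomically with respect to prlimit(). */
void clamp_stack_rlimit(struct fake_rlimit *r)
{
	lock_group();
	if (r->cur > STK_LIM_MODEL)
		r->cur = STK_LIM_MODEL;
	if (r->max > STK_LIM_MODEL)
		r->max = STK_LIM_MODEL;
	assert(r->cur <= r->max);	/* clamping both keeps cur <= max */
	unlock_group();
}
```

Clamping only `cur`, as before the patch, would leave `max` above 8 MiB, and an unprivileged setrlimit() could then raise the soft limit again.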

[Linux-stable-mirror] [patch 15/28] IB/core: disable memory registration of filesystem-dax vmas

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: IB/core: disable memory registration of filesystem-dax vmas

Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow RDMA to create long standing memory registrations against filesystem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 drivers/infiniband/core/umem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas drivers/infiniband/core/umem.c
--- a/drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas
+++ a/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
 	sg_list_start = umem->sg_head.sgl;
 
 	while (npages) {
-		ret = get_user_pages(cur_base,
+		ret = get_user_pages_longterm(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
 				     gup_flags, page_list, vma_list);
_
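The unchanged context shows that ib_umem_get() pins user memory in batches, each capped at `PAGE_SIZE / sizeof (struct page *)` entries, which as far as I can tell is how many page pointers fit in the one-page `page_list` buffer. With this patch, each of those calls can also fail with -EOPNOTSUPP for a filesystem-dax range. The sketch below works through the batch arithmetic, assuming a 4 KiB page and 8-byte pointers (a typical 64-bit configuration) and assuming every call pins its whole batch; the real loop advances by whatever the call returns. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Batch size for one call: min(npages, pointers that fit in one page). */
size_t batch_size(size_t npages, size_t page_size, size_t ptr_size)
{
	size_t per_page = page_size / ptr_size;

	assert(per_page > 0);
	return npages < per_page ? npages : per_page;	/* like min_t() */
}

/* Number of pinning calls the while (npages) loop would make. */
size_t count_batches(size_t npages, size_t page_size, size_t ptr_size)
{
	size_t calls = 0;

	while (npages) {
		npages -= batch_size(npages, page_size, ptr_size);
		calls++;
	}
	return calls;
}
```

For 1300 pages this gives batches of 512, 512 and 276, so three calls.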

[Linux-stable-mirror] [patch 14/28] v4l2: disable filesystem-dax mapping support

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support

V4L2 memory registrations are incompatible with filesystem-dax, which needs the ability to revoke DMA access to a mapping at will, or otherwise allow the kernel to wait for completion of DMA. The filesystem-dax implementation breaks the traditional solution of truncate of active file-backed mappings, since there is no page-cache page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long-lived DMA mappings it needs to arrange to hold a file lease or use some other mechanism so that the kernel can coordinate revoking DMA access when the filesystem needs to truncate mappings.

Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
 			     flags, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
 	return 0;
_
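The unchanged error path in videobuf_dma_init_user_locked() relies on the gup return convention: the call returns either a negative errno or the number of pages actually pinned, which may be fewer than requested. The sketch below isolates that decision logic. `struct fake_dma` and `finish_pin()` are hypothetical names and only the arithmetic mirrors the hunk. Recording the partial count in `nr_pages` presumably lets later teardown release only the pages that were pinned. With this patch, a filesystem-dax range arrives here as -EOPNOTSUPP from get_user_pages_longterm() and is passed through unchanged.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the videobuf DMA descriptor. */
struct fake_dma {
	int nr_pages;	/* pages requested, then pages actually pinned */
};

/* Mirrors the hunk: err is a negative errno or the number of pages pinned. */
int finish_pin(struct fake_dma *dma, int err)
{
	assert(err <= dma->nr_pages);	/* gup never pins more than asked */
	if (err != dma->nr_pages) {
		dma->nr_pages = (err >= 0) ? err : 0;
		return err < 0 ? err : -EINVAL;	/* short pin becomes -EINVAL */
	}
	return 0;
}
```

A short pin therefore fails with -EINVAL while keeping the partial count, and a hard error keeps its errno with the count reset to zero.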

[Linux-stable-mirror] [patch 13/28] mm: fail get_vaddr_frames() for filesystem-dax mappings

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings

Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow V4L2, Exynos, and other frame vector users to create long standing / irrevocable memory registrations against filesystem-dax vmas.

[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/frame_vector.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start
 		ret = -EFAULT;
 		goto out;
 	}
+
+	/*
+	 * While get_vaddr_frames() could be used for transient (kernel
+	 * controlled lifetime) pinning of memory pages all current
+	 * users establish long term (userspace controlled lifetime)
+	 * page pinning. Treat get_vaddr_frames() like
+	 * get_user_pages_longterm() and disallow it for filesystem-dax
+	 * mappings.
+	 */
+	if (vma_is_fsdax(vma))
+		return -EOPNOTSUPP;
+
 	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
 		vec->got_ref = true;
 		vec->is_pfns = false;
_
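One detail to check when reading this hunk: in get_vaddr_frames(), `mm->mmap_sem` is taken with down_read() before the vma lookup (outside the lines shown), and the existing -EFAULT path releases it by jumping to the common `out:` label. The new `return -EOPNOTSUPP;` leaves the function without going through that label. As far as I know, upstream later changed this path to set `ret` and `goto out` as well. The sketch below contrasts the two exit styles, using a hypothetical lock-depth counter in place of mmap_sem; `struct fake_vma`, `read_lock()`, `read_unlock()` and both function names are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static int lock_depth;	/* stand-in for "mmap_sem held for read" */

static void read_lock(void)
{
	lock_depth++;
}

static void read_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

struct fake_vma {
	bool is_fsdax;
};

/* Shape of the hunk: the fsdax check returns while the lock is held. */
int get_frames_early_return(const struct fake_vma *vma)
{
	int ret = 0;

	read_lock();
	if (!vma) {
		ret = -EFAULT;
		goto out;
	}
	if (vma->is_fsdax)
		return -EOPNOTSUPP;	/* skips read_unlock() */
	/* ... pin frames here ... */
out:
	read_unlock();
	return ret;
}

/* Every failure goes through the common unlock path. */
int get_frames_balanced(const struct fake_vma *vma)
{
	int ret = 0;

	read_lock();
	if (!vma) {
		ret = -EFAULT;
		goto out;
	}
	if (vma->is_fsdax) {
		ret = -EOPNOTSUPP;
		goto out;
	}
	/* ... pin frames here ... */
out:
	read_unlock();
	return ret;
}

int current_lock_depth(void)
{
	return lock_depth;
}
```

Both variants return -EOPNOTSUPP for a filesystem-dax vma, but only the balanced one leaves the lock depth back at zero.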

[Linux-stable-mirror] [patch 12/28] mm: introduce get_user_pages_longterm

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm

Patch series "introduce get_user_pages_longterm()", v2.

Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control).

In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for the filesystem-dax case, where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle.

Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series.

Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references.

I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode.

It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch.

This patch (of 4):

Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesystem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed.

This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2).

[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/fs.h |   14 +++++++++
 include/linux/mm.h |   13 ++++++++
 mm/gup.c           |   64 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)

diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
 	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
 }
 
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+
+	if (!vma->vm_file)
+		return false;
+	if (!vma_is_dax(vma))
+		return false;
+	inode = file_inode(vma->vm_file);
+	if (inode->i_mode == S_IFCHR)
+		return false; /* device-dax */
+	return true;
+}
+
 static inline int iocb_flags(struct file *file)
 {
 	int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
 		    unsigned int gup_flags, struct page **pages, int *locked);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+			    unsigned int gup_flags, struct page **pages,
+			    struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+		unsigned long nr_pages, unsigned int gup_flags,
+		struct page **pages, struct vm_area_struct **vmas)
+{
+	return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
 			struct page **pages);
 
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
 }
 EXPORT_SYMBOL(get_user_pages);
 
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+		unsigned int gup_flags, struct page **pages,
+		struct vm_area_struct **vmas_arg)
+{
+	struct vm_area_struct **vmas = vmas_arg;
+	struct vm_area_struct *vma_prev = NULL;
+	long rc, i;
+
+	if (!pages)
+		return -EINVAL;
+
+	if (!vmas) {
+		vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+			       GFP_KERNEL);
+		if (!vmas)
+			return -ENOMEM;
+	}
+
+	rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+	for (i = 0; i < rc; i++) {
+		struct vm_area_struct *vma = vmas[i];
+
+		if (vma == vma_prev)
+			continue;
+
+		vma_prev = vma;
+
+		if (vma_is_fsdax(vma))
+			break;
+	}
+
+	/*
+	 * Either get_user_pages() failed, or the vma validation
+	 * succeeded, in either case we don't need to put_page() before
+	 * returning.
+	 */
+	if (i >= rc)
+		goto out;
+
+	for (i = 0; i < rc; i++)
+		put_page(pages[i]);
+	rc = -EOPNOTSUPP;
+out:
+	if (vmas != vmas_arg)
+		kfree(vmas);
+	return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
 /**
  * populate_vma_page_range() - populate a range of pages in the vma.
  * @vma: target vma
_

[Linux-stable-mirror] [patch 11/28] device-dax: implement ->split() to catch invalid munmap attempts

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: device-dax: implement ->split() to catch invalid munmap attempts

Similar to how device-dax enforces that the 'address', 'offset', and 'len' parameters to mmap() be aligned to the device's fundamental alignment, the same constraints apply to munmap(). Implement ->split() to fail munmap calls that violate the alignment constraint.

Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path with crash signatures of the form:

    vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
    next (null) prev (null) mm ffff8800b61150c0
    prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
    pgoff 0 file ffff8800b638ef80 private_data (null)
    flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
    ------------[ cut here ]------------
    kernel BUG at mm/huge_memory.c:2014!
    [..]
    RIP: 0010:__split_huge_pud+0x12a/0x180
    [..]
    Call Trace:
     unmap_page_range+0x245/0xa40
     ? __vma_adjust+0x301/0x990
     unmap_vmas+0x4c/0xa0
     unmap_region+0xae/0x120
     ? __vma_rb_erase+0x11a/0x230
     do_munmap+0x276/0x410
     vm_munmap+0x6a/0xa0
     SyS_munmap+0x1d/0x30

Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 drivers/dax/device.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff -puN drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts drivers/dax/device.c
--- a/drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts
+++ a/drivers/dax/device.c
@@ -428,9 +428,21 @@ static int dev_dax_fault(struct vm_fault
 	return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
 }
 
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	struct file *filp = vma->vm_file;
+	struct dev_dax *dev_dax = filp->private_data;
+	struct dax_region *dax_region = dev_dax->region;
+
+	if (!IS_ALIGNED(addr, dax_region->align))
+		return -EINVAL;
+	return 0;
+}
+
 static const struct vm_operations_struct dax_vm_ops = {
 	.fault = dev_dax_fault,
 	.huge_fault = dev_dax_huge_fault,
+	.split = dev_dax_split,
 };
 
 static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
_
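dev_dax_split() reduces to a mask test of the split address against the region's alignment. The sketch below reproduces that check with a local macro of the same shape as the kernel's IS_ALIGNED() (`IS_ALIGNED_MODEL` and `dev_dax_split_model()` are hypothetical names), assuming, as IS_ALIGNED() requires, that the alignment is a power of two. Using the addresses from the crash report: with a 1 GiB alignment, the vma start 0x7f88c0000000 is a valid split point but its end 0x7f88c0e00000 is not; with a 2 MiB alignment both are valid.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same shape as the kernel's IS_ALIGNED(); valid only for power-of-two a. */
#define IS_ALIGNED_MODEL(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/* Model of dev_dax_split(): refuse split points that are not aligned. */
int dev_dax_split_model(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);	/* power of two */
	if (!IS_ALIGNED_MODEL(addr, align))
		return -EINVAL;
	return 0;
}
```

Because __split_vma() now consults the hook, an munmap() whose boundary fails this test returns -EINVAL to userspace instead of reaching the VM_BUG_ON in the PUD split path.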

[Linux-stable-mirror] [patch 10/28] mm, hugetlbfs: introduce ->split() to vm_operations_struct

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and fail attempts to split vmas into unaligned ranges. It would be messy to teach the munmap path about device-dax alignment constraints in the same (hstate) way that hugetlbfs communicates this constraint. Instead, these patches introduce a new ->split() vm operation.

This patch (of 2):

The device-dax interface has similar constraints to hugetlbfs in that it requires the munmap path to unmap in huge-page-aligned units. Rather than add more custom vma handling code in __split_vma(), introduce a new vm operation to perform this vma-specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 include/linux/mm.h |    1 +
 mm/hugetlb.c       |    8 ++++++++
 mm/mmap.c          |    8 +++++---
 3 files changed, 14 insertions(+), 3 deletions(-)

diff -puN include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct include/linux/mm.h
--- a/include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/include/linux/mm.h
@@ -377,6 +377,7 @@ enum page_entry_size {
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
+	int (*split)(struct vm_area_struct * area, unsigned long addr);
 	int (*mremap)(struct vm_area_struct * area);
 	int (*fault)(struct vm_fault *vmf);
 	int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
diff -puN mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
 	}
 }
 
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	if (addr & ~(huge_page_mask(hstate_vma(vma))))
+		return -EINVAL;
+	return 0;
+}
+
 /*
  * We cannot handle pagefaults against hugetlb pages at all. They cause
  * handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
 	.fault = hugetlb_vm_op_fault,
 	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
+	.split = hugetlb_vm_op_split,
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff -puN mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/mmap.c
--- a/mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/mmap.c
@@ -2555,9 +2555,11 @@ int __split_vma(struct mm_struct *mm, st
 	struct vm_area_struct *new;
 	int err;
 
-	if (is_vm_hugetlb_page(vma) && (addr &
-					~(huge_page_mask(hstate_vma(vma)))))
-		return -EINVAL;
+	if (vma->vm_ops && vma->vm_ops->split) {
+		err = vma->vm_ops->split(vma, addr);
+		if (err)
+			return err;
+	}
 
 	new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 	if (!new)
_
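The mmap.c hunk replaces a hugetlb-specific test with an optional per-mapping hook: __split_vma() asks the vma's operations table whether a split at `addr` is allowed, and mappings with no constraint simply leave `.split` NULL. A minimal userspace sketch of that dispatch follows. The struct and function names are hypothetical, and the hugetlb mask test `addr & ~huge_page_mask(...)` is modeled for an assumed 2 MiB huge page, where the mask is `~(2 MiB - 1)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct fake_vma;

struct fake_vm_ops {
	/* Optional: return 0 to allow a split at addr, or a negative errno. */
	int (*split)(struct fake_vma *vma, uint64_t addr);
};

struct fake_vma {
	const struct fake_vm_ops *vm_ops;	/* may be NULL */
	uint64_t huge_page_size;		/* used by huge mappings only */
};

/* hugetlb-style check: addr must be a multiple of the huge page size. */
static int huge_split(struct fake_vma *vma, uint64_t addr)
{
	uint64_t mask = ~(vma->huge_page_size - 1);	/* like huge_page_mask() */

	if (addr & ~mask)
		return -EINVAL;
	return 0;
}

const struct fake_vm_ops huge_ops = { .split = huge_split };
const struct fake_vm_ops plain_ops = { .split = NULL };

/* Generic path: consult the hook if the mapping provides one. */
int split_vma_model(struct fake_vma *vma, uint64_t addr)
{
	assert(vma != NULL);
	if (vma->vm_ops && vma->vm_ops->split) {
		int err = vma->vm_ops->split(vma, addr);

		if (err)
			return err;
	}
	/* ... allocate and link the new vma here ... */
	return 0;
}
```

The design keeps the generic code free of per-mapping knowledge: device-dax plugs its own alignment rule into the same slot without __split_vma() changing again.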

[Linux-stable-mirror] [patch 04/28] mm: fix device-dax pud write-faults triggered by get_user_pages()

by akpm＠linux-foundation.org

From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()

Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted(), and to date it has been unimplemented: the fallback just calls BUG().

    kernel BUG at ./include/linux/hugetlb.h:244!
    [..]
    RIP: 0010:follow_devmap_pud+0x482/0x490
    [..]
    Call Trace:
     follow_page_mask+0x28c/0x6e0
     __get_user_pages+0xe4/0x6c0
     get_user_pages_unlocked+0x130/0x1b0
     get_user_pages_fast+0x89/0xb0
     iov_iter_get_pages_alloc+0x114/0x4a0
     nfs_direct_read_schedule_iovec+0xd2/0x350
     ? nfs_start_io_direct+0x63/0x70
     nfs_file_direct_read+0x1e0/0x250
     nfs_file_read+0x90/0xc0

For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s).

[dan.j.williams(a)intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwil…
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwill…
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 arch/x86/include/asm/pgtable.h |    6 ++++++
 include/asm-generic/pgtable.h  |    8 ++++++++
 include/linux/hugetlb.h        |    8 --------
 3 files changed, 14 insertions(+), 8 deletions(-)

diff -puN arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/arch/x86/include/asm/pgtable.h
@@ -1088,6 +1088,12 @@ static inline void pmdp_set_wrprotect(st
 	clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
 }
 
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+	return pud_flags(pud) & _PAGE_RW;
+}
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
diff -puN include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/asm-generic/pgtable.h
@@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd)
 #endif /* __HAVE_ARCH_PMD_WRITE */
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#ifndef pud_write
+static inline int pud_write(pud_t pud)
+{
+	BUG();
+	return 0;
+}
+#endif /* pud_write */
+
 #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
 	(defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
 	 !defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
diff -puN include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/linux/hugetlb.h
@@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd)
 }
 #endif
 
-#ifndef pud_write
-static inline int pud_write(pud_t pud)
-{
-	BUG();
-	return 0;
-}
-#endif
-
 #define HUGETLB_ANON_FILE "anon_hugepage"
 
 enum {
_
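The patch relies on a common kernel header idiom: an architecture that implements a helper also defines a macro with the helper's own name (`#define pud_write pud_write`), and the generic header supplies its fallback only under `#ifndef pud_write`. The self-contained sketch below collapses both headers into one file to show the mechanics. `fake_pud_t`, `FAKE_PAGE_RW` and the `ARCH_HAS_PUD_WRITE` switch are hypothetical, `abort()` stands in for BUG(), and bit 1 for the RW flag is an assumption matching x86's `_PAGE_BIT_RW`. Unlike the x86 helper, which returns the raw masked flag, this model normalizes the result to 0 or 1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t val; } fake_pud_t;

#define FAKE_PAGE_RW (1ULL << 1)	/* assumed: x86 _PAGE_BIT_RW is bit 1 */

/* --- "arch header": provides pud_write() and announces it --- */
#define ARCH_HAS_PUD_WRITE 1		/* set to 0 to get the generic fallback */
#if ARCH_HAS_PUD_WRITE
#define pud_write pud_write
static inline int pud_write(fake_pud_t pud)
{
	return (pud.val & FAKE_PAGE_RW) != 0;
}
#endif

/* --- "generic header": fallback only if no arch version exists --- */
#ifndef pud_write
static inline int pud_write(fake_pud_t pud)
{
	(void)pud;
	abort();			/* stand-in for BUG() */
	return 0;
}
#endif
```

Because the self-referential macro expands to the same identifier, callers write `pud_write(pud)` either way. Only the presence of the macro decides which definition gets compiled.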
