The patch titled
Subject: v4l2: disable filesystem-dax mapping support
has been removed from the -mm tree. Its filename was
v4l2-disable-filesystem-dax-mapping-support.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax, which
needs the ability to revoke DMA access to a mapping at will, or otherwise
allow the kernel to wait for DMA to complete. The filesystem-dax
implementation breaks the traditional solution of truncating active
file-backed mappings, since there is no page-cache page we can orphan to
sustain ongoing DMA.
If v4l2 wants to support long-lived DMA mappings it needs to arrange to
hold a file lease, or use some other mechanism, so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
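For illustration only (this caller is hypothetical, not part of the
patch), a driver that pins user pages for a userspace-controlled lifetime
would switch to the new helper and propagate its error, along these lines:

	/* Hypothetical caller; get_user_pages_longterm() now returns
	 * -EOPNOTSUPP if any vma in the range is filesystem-dax. */
	static int pin_user_buffer(unsigned long uaddr, unsigned long nr_pages,
				   struct page **pages)
	{
		long pinned;

		down_read(&current->mm->mmap_sem);
		pinned = get_user_pages_longterm(uaddr & PAGE_MASK, nr_pages,
						 FOLL_WRITE, pages, NULL);
		up_read(&current->mm->mmap_sem);
		if (pinned < 0)
			return pinned;
		if (pinned != nr_pages) {
			while (pinned--)
				put_page(pages[pinned]);
			return -EINVAL;
		}
		return 0;
	}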
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
has been removed from the -mm tree. Its filename was
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow V4L2, Exynos, and other frame vector users to create
long-standing / irrevocable memory registrations against filesystem-dax
vmas.
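For illustration only (not part of the patch), a frame-vector user now
needs to handle get_vaddr_frames() failing with -EOPNOTSUPP when the
range is backed by a filesystem-dax mapping, roughly:

	/* Hypothetical caller; names are illustrative. */
	static int pin_user_frames(unsigned long start, unsigned int nr_frames,
				   struct frame_vector **out)
	{
		struct frame_vector *vec = frame_vector_create(nr_frames);
		int ret;

		if (!vec)
			return -ENOMEM;
		ret = get_vaddr_frames(start, nr_frames, FOLL_WRITE, vec);
		if (ret < 0) {
			/* -EOPNOTSUPP for filesystem-dax vmas */
			frame_vector_destroy(vec);
			return ret;
		}
		*out = vec;
		return ret;
	}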
[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/frame_vector.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ /*
+ * While get_vaddr_frames() could be used for transient (kernel
+ * controlled lifetime) pinning of memory pages all current
+ * users establish long term (userspace controlled lifetime)
+ * page pinning. Treat get_vaddr_frames() like
+ * get_user_pages_longterm() and disallow it for filesystem-dax
+ * mappings.
+ */
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: introduce get_user_pages_longterm
has been removed from the -mm tree. Its filename was
mm-introduce-get_user_pages_longterm.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to keep
an elevated page count indefinitely. This is distinct from usages like
iov_iter_get_pages where the elevated page counts are transient. The
iov_iter_get_pages cases immediately turn around and submit the pages to a
device driver which will put_page when the i/o operation completes (under
kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
the filesystem-dax case, where the filesystem is in control of the
lifetime of the block / page and needs reasonable limits on how long it
can wait for pages in a mapping to become idle.
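As a purely illustrative contrast (not part of the patch), the transient
pattern looks roughly like the sketch below; the kernel, not userspace,
decides when the references are dropped:

	/* Hypothetical sketch of a transient (kernel controlled) pin. */
	static ssize_t transient_pin(struct iov_iter *iter, struct page **pages,
				     size_t *offset)
	{
		ssize_t bytes;

		bytes = iov_iter_get_pages(iter, pages, 16 * PAGE_SIZE, 16,
					   offset);
		if (bytes < 0)
			return bytes;
		/*
		 * The pages go straight to a device driver; its completion
		 * handler calls put_page() on each of them, so the elevated
		 * refcount only lives for the duration of the i/o.
		 */
		return bytes;
	}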
Fixing filesystems to actually wait for dax pages to be idle before blocks
from a truncate/hole-punch operation are repurposed is saved for a later
patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings were
supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow long-standing memory registrations against filesystem-dax
vmas. Device-dax vmas do not have this problem and are explicitly
allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and V4L2).
[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 14 +++++++++
include/linux/mm.h | 13 ++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: device-dax: implement ->split() to catch invalid munmap attempts
has been removed from the -mm tree. Its filename was
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: device-dax: implement ->split() to catch invalid munmap attempts
Similar to how device-dax enforces that the 'address', 'offset', and 'len'
parameters to mmap() be aligned to the device's fundamental alignment, the
same constraints apply to munmap(). Implement ->split() to fail munmap
calls that violate the alignment constraint. Otherwise, we later fail
VM_BUG_ON checks in the unmap_page_range() path with crash signatures of
the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
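For illustration (a hypothetical userspace sequence, assuming a device-dax
instance at /dev/dax0.0 with a 2MB alignment), an unaligned munmap() now
fails cleanly with EINVAL instead of tripping the VM_BUG_ON:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 2UL << 20;		/* one 2MB unit */
		void *addr;
		int fd = open("/dev/dax0.0", O_RDWR);

		if (fd < 0)
			return 1;
		addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			    fd, 0);
		if (addr == MAP_FAILED)
			return 1;
		/* unmap a single page out of the 2MB mapping */
		if (munmap(addr, 4096))
			perror("munmap");	/* now: Invalid argument */
		return 0;
	}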
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts drivers/dax/device.c
--- a/drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts
+++ a/drivers/dax/device.c
@@ -428,9 +428,21 @@ static int dev_dax_fault(struct vm_fault
return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
}
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ struct file *filp = vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
+ struct dax_region *dax_region = dev_dax->region;
+
+ if (!IS_ALIGNED(addr, dax_region->align))
+ return -EINVAL;
+ return 0;
+}
+
static const struct vm_operations_struct dax_vm_ops = {
.fault = dev_dax_fault,
.huge_fault = dev_dax_huge_fault,
+ .split = dev_dax_split,
};
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
has been removed from the -mm tree. Its filename was
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It would
be messy to teach the munmap path about device-dax alignment constraints
in the same (hstate) way that hugetlbfs communicates this constraint.
Instead, these patches introduce a new ->split() vm operation.
This patch (of 2):
The device-dax interface has constraints similar to hugetlbfs in that it
requires the munmap path to unmap in huge-page-aligned units. Rather than
add more custom vma handling code in __split_vma(), introduce a new vm
operation to perform this vma-specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
diff -puN include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct include/linux/mm.h
--- a/include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/include/linux/mm.h
@@ -377,6 +377,7 @@ enum page_entry_size {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_fault *vmf);
int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
diff -puN mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff -puN mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/mmap.c
--- a/mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/mmap.c
@@ -2555,9 +2555,11 @@ int __split_vma(struct mm_struct *mm, st
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
has been removed from the -mm tree. Its filename was
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case, due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted(); to date it has been unimplemented and simply
calls BUG():
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it. Note that the
generic 'access_permitted' helper fallback is the simple _PAGE_RW check on
architectures that do not define the 'access_permitted' helper(s).
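For illustration only (the 'access_permitted' helpers are introduced by
later patches in the series, not by this one), a generic fallback in that
style could look roughly like:

	/* Sketch of a possible generic fallback built on pud_write(). */
	#ifndef pud_access_permitted
	#define pud_access_permitted(pud, write) \
		(pud_present(pud) && (!(write) || pud_write(pud)))
	#endif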
[dan.j.williams(a)intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwil…
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwill…
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/include/asm/pgtable.h | 6 ++++++
include/asm-generic/pgtable.h | 8 ++++++++
include/linux/hugetlb.h | 8 --------
3 files changed, 14 insertions(+), 8 deletions(-)
diff -puN arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/arch/x86/include/asm/pgtable.h
@@ -1088,6 +1088,12 @@ static inline void pmdp_set_wrprotect(st
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
}
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_RW;
+}
+
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*
diff -puN include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/asm-generic/pgtable.h
@@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd)
#endif /* __HAVE_ARCH_PMD_WRITE */
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#ifndef pud_write
+static inline int pud_write(pud_t pud)
+{
+ BUG();
+ return 0;
+}
+#endif /* pud_write */
+
#if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
(defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
!defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
diff -puN include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/linux/hugetlb.h
@@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd)
}
#endif
-#ifndef pud_write
-static inline int pud_write(pud_t pud)
-{
- BUG();
- return 0;
-}
-#endif
-
#define HUGETLB_ANON_FILE "anon_hugepage"
enum {
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
has been removed from the -mm tree. Its filename was
mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
If the call to __alloc_contig_migrate_range() in alloc_contig_range()
returns -EBUSY, processing continues so that test_pages_isolated() is
called, where there is a tracepoint to identify the busy pages. However,
it is possible for busy pages to become available between the calls to
these two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were not
allocated and they are leaked.
Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range() returns -EBUSY. Also, clear the return
code in this case so that it is not accidentally used or returned to the
caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2
+++ a/mm/page_alloc.c
@@ -7652,11 +7652,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
has been removed from the -mm tree. Its filename was
mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Wang Nan <wangnan0(a)huawei.com>
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs, like arm64, don't
flush the TLB when tlb->fullmm is true; see
commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
This causes TLB entries to leak.
Will clarifies his patch:
: Basically, we tag each address space with an ASID (PCID on x86) which
: is resident in the TLB. This means we can elide TLB invalidation when
: pulling down a full mm because we won't ever assign that ASID to another mm
: without doing TLB invalidation elsewhere (which actually just nukes the
: whole TLB).
:
: I think that means that we could potentially not fault on a kernel uaccess,
: because we could hit in the TLB.
There could be a window between complete_signal() sending an IPI to the
other cores and all threads sharing this mm actually being kicked off
those cores. In this window, the oom reaper may call
tlb_flush_mmu_tlbonly() to flush the TLB and then free pages. However,
due to the above problem, the TLB entries are not really flushed on
arm64, so other threads can still access these pages through stale TLB
entries. Moreover, a copy_to_user() can also write to these pages without
generating a page fault, causing use-after-free bugs.
This patch gathers each vma instead of gathering the full vm space, so
tlb->fullmm is not true. The behavior of the oom reaper becomes similar
to munmapping before do_exit, which should be safe for all archs.
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Bob Liu <liubo95(a)huawei.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff -puN mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry
+++ a/mm/oom_kill.c
@@ -550,7 +550,6 @@ static bool __oom_reap_task_mm(struct ta
*/
set_bit(MMF_UNSTABLE, &mm->flags);
- tlb_gather_mmu(&tlb, mm, 0, -1);
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -565,11 +564,13 @@ static bool __oom_reap_task_mm(struct ta
* we do not want to block exit_mmap by keeping mm ref
* count elevated without a good reason.
*/
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
NULL);
+ tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+ }
}
- tlb_finish_mmu(&tlb, 0, -1);
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
_
Patches currently in -mm which might be from wangnan0(a)huawei.com are
The patch titled
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
has been removed from the -mm tree. Its filename was
mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
drain_all_pages backs off when called from a kworker context since
0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue
context"), because the original IPI based pcp draining has been replaced
by a WQ based one and the check wanted to prevent recursion and
inter-worker dependencies. This made some sense at the time because the
system WQ was used and one worker holding the lock could be blocked while
waiting for new workers to emerge, which can be a problem under OOM
conditions.
Since then, ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer,
so we shouldn't depend on any other WQ activity to make forward progress.
Calling drain_all_pages from a worker context is therefore safe as long
as this doesn't happen from mm_percpu_wq itself, which is not the case
because all workers are required to _not_ depend on any MM locks.
Why is this a problem in the first place? ACPI-driven memory hot-remove
(acpi_device_hotplug) is executed from a worker context. We end up
calling __offline_pages to free all the pages, and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job;
otherwise we can have dangling pages on pcp lists and fail the offline
operation (__test_page_isolated_in_pageblock would see a page with a 0
ref count but without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction, so it works
as expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 4 ----
1 file changed, 4 deletions(-)
diff -puN mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context mm/page_alloc.c
--- a/mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context
+++ a/mm/page_alloc.c
@@ -2507,10 +2507,6 @@ void drain_all_pages(struct zone *zone)
if (WARN_ON_ONCE(!mm_percpu_wq))
return;
- /* Workqueues cannot recurse */
- if (current->flags & PF_WQ_WORKER)
- return;
-
/*
* Do not drain if one is already in progress unless it's specific to
* a zone. Such callers are primarily CMA and memory hotplug and need
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-drop-hotplug-lock-from-lru_add_drain_all.patch
mm-hugetlb-drop-hugepages_treat_as_movable-sysctl.patch
Hello,
(correcting the stable tree mailing list address and adding GregKH)
I have created this branch with the KAISER patches and their dependencies
for v4.9.y.
This is massive, I know. But I attempted to include all the dependencies
I saw in the mailing list discussions. The backport is done from the
tip/WIP.x86/mm branch. The list of patches includes:
a. Several patch dependencies that change x86 arch code so that the following apply.
b. Andy Lutomirski's work to refactor the x86 entry code.
c. Andy Lutomirski's work to do the x86 trampoline.
d. Dave Hansen's work to incorporate the KAISER feature on x86.
e. Several fixes/improvements on KAISER by tglx and PeterZ.
Branch is here:
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux.git/log/?h=b…
Right now, I am still validating it under different scenarios. In a first
shot with the same lseek1 [1] micro benchmark, I see the kernel being ~4x
slower:
-> Without KAISER: ~14M lseek/s.
-> With KAISER: ~3.6M lseek/s.
If anybody is interested in testing, please send feedback.
Also, if somebody else is working on a minimalist backport of the feature
to v4.9 or other stable kernels, let me know.
[1] - https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c/
--
All the best,
Eduardo Valentin