This is a note to let you know that I've just added the patch titled
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().
We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[Salvatore Bonaccorso: backport for 4.9:
- Adjust context
- Drop specific part for PUD-sized transparent hugepages. Support
for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/huge_memory.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -745,20 +745,15 @@ int vmf_insert_pfn_pmd(struct vm_area_st
EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd)
+ pmd_t *pmd, int flags)
{
pmd_t _pmd;
- /*
- * We should set the dirty bit only for FOLL_WRITE but for now
- * the dirty bit in the pmd is meaningless. And if the dirty
- * bit will become meaningful and we'll only set it with
- * FOLL_WRITE, an atomic set_bit will be required on the pmd to
- * set the young bit, instead of the current set_pmd_at.
- */
- _pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+ _pmd = pmd_mkyoung(*pmd);
+ if (flags & FOLL_WRITE)
+ _pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
- pmd, _pmd, 1))
+ pmd, _pmd, flags & FOLL_WRITE))
update_mmu_cache_pmd(vma, addr, pmd);
}
@@ -787,7 +782,7 @@ struct page *follow_devmap_pmd(struct vm
return NULL;
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd);
+ touch_pmd(vma, addr, pmd, flags);
/*
* device mapped pages can only be returned if the
@@ -1158,7 +1153,7 @@ struct page *follow_trans_huge_pmd(struc
page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd);
+ touch_pmd(vma, addr, pmd, flags);
if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
/*
* We don't mlock() pte-mapped THPs. This way we can avoid
Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are
queue-4.9/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/madvise.c: fix madvise() infinite loop under special circumstances
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
From: chenjie <chenjie6(a)huawei.com>
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -228,15 +228,14 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -251,7 +250,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
Patches currently in stable-queue which might be from chenjie6(a)huawei.com are
queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm, hugetlbfs: introduce ->split() to vm_operations_struct
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 31383c6865a578834dd953d9dbc88e6b19fe3997 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:28 -0800
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
From: Dan Williams <dan.j.williams(a)intel.com>
commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.
This patch (of 2):
The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -347,6 +347,7 @@ struct fault_env {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
int (*pmd_fault)(struct vm_area_struct *, unsigned long address,
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3135,6 +3135,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3151,6 +3158,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2538,9 +2538,11 @@ static int __split_vma(struct mm_struct
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.9/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/cma: fix alloc_contig_range ret code/potential leak
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Wed, 29 Nov 2017 16:10:01 -0800
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
From: Mike Kravetz <mike.kravetz(a)oracle.com>
commit 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a upstream.
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages. However, it is
possible for busy pages to become available between the calls to these
two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were
not allocated and they are leaked.
Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7309,11 +7309,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
Patches currently in stable-queue which might be from mike.kravetz(a)oracle.com are
queue-4.9/mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
This is a note to let you know that I've just added the patch titled
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().
We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
- Adjust context
- Drop specific part for PUD-sized transparent hugepages. Support
for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/huge_memory.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1304,17 +1304,11 @@ struct page *follow_trans_huge_pmd(struc
VM_BUG_ON_PAGE(!PageHead(page), page);
if (flags & FOLL_TOUCH) {
pmd_t _pmd;
- /*
- * We should set the dirty bit only for FOLL_WRITE but
- * for now the dirty bit in the pmd is meaningless.
- * And if the dirty bit will become meaningful and
- * we'll only set it with FOLL_WRITE, an atomic
- * set_bit will be required on the pmd to set the
- * young bit, instead of the current set_pmd_at.
- */
- _pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+ _pmd = pmd_mkyoung(*pmd);
+ if (flags & FOLL_WRITE)
+ _pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
- pmd, _pmd, 1))
+ pmd, _pmd, flags & FOLL_WRITE))
update_mmu_cache_pmd(vma, addr, pmd);
}
if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are
queue-4.4/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.4/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/madvise.c: fix madvise() infinite loop under special circumstances
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
From: chenjie <chenjie6(a)huawei.com>
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -223,15 +223,14 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -246,7 +245,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
Patches currently in stable-queue which might be from chenjie6(a)huawei.com are
queue-4.4/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
v4l2: disable filesystem-dax mapping support
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
v4l2-disable-filesystem-dax-mapping-support.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b70131de648c2b997d22f4653934438013f407a1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:43 -0800
Subject: v4l2: disable filesystem-dax mapping support
From: Dan Williams <dan.j.williams(a)intel.com>
commit b70131de648c2b997d22f4653934438013f407a1 upstream.
V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().
We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/huge_memory.c | 36 +++++++++++++-----------------------
1 file changed, 13 insertions(+), 23 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -842,20 +842,15 @@ EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
- pmd_t *pmd)
+ pmd_t *pmd, int flags)
{
pmd_t _pmd;
- /*
- * We should set the dirty bit only for FOLL_WRITE but for now
- * the dirty bit in the pmd is meaningless. And if the dirty
- * bit will become meaningful and we'll only set it with
- * FOLL_WRITE, an atomic set_bit will be required on the pmd to
- * set the young bit, instead of the current set_pmd_at.
- */
- _pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+ _pmd = pmd_mkyoung(*pmd);
+ if (flags & FOLL_WRITE)
+ _pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
- pmd, _pmd, 1))
+ pmd, _pmd, flags & FOLL_WRITE))
update_mmu_cache_pmd(vma, addr, pmd);
}
@@ -884,7 +879,7 @@ struct page *follow_devmap_pmd(struct vm
return NULL;
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd);
+ touch_pmd(vma, addr, pmd, flags);
/*
* device mapped pages can only be returned if the
@@ -995,20 +990,15 @@ out:
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
- pud_t *pud)
+ pud_t *pud, int flags)
{
pud_t _pud;
- /*
- * We should set the dirty bit only for FOLL_WRITE but for now
- * the dirty bit in the pud is meaningless. And if the dirty
- * bit will become meaningful and we'll only set it with
- * FOLL_WRITE, an atomic set_bit will be required on the pud to
- * set the young bit, instead of the current set_pud_at.
- */
- _pud = pud_mkyoung(pud_mkdirty(*pud));
+ _pud = pud_mkyoung(*pud);
+ if (flags & FOLL_WRITE)
+ _pud = pud_mkdirty(_pud);
if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK,
- pud, _pud, 1))
+ pud, _pud, flags & FOLL_WRITE))
update_mmu_cache_pud(vma, addr, pud);
}
@@ -1031,7 +1021,7 @@ struct page *follow_devmap_pud(struct vm
return NULL;
if (flags & FOLL_TOUCH)
- touch_pud(vma, addr, pud);
+ touch_pud(vma, addr, pud, flags);
/*
* device mapped pages can only be returned if the
@@ -1407,7 +1397,7 @@ struct page *follow_trans_huge_pmd(struc
page = pmd_page(*pmd);
VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
if (flags & FOLL_TOUCH)
- touch_pmd(vma, addr, pmd);
+ touch_pmd(vma, addr, pmd, flags);
if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
/*
* We don't mlock() pte-mapped THPs. This way we can avoid
Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
queue-4.14/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm: migrate: fix an incorrect call of prep_transhuge_page()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 40a899ed16486455f964e46d1af31fd4fded21c1 Mon Sep 17 00:00:00 2001
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Date: Wed, 29 Nov 2017 16:11:12 -0800
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
commit 40a899ed16486455f964e46d1af31fd4fded21c1 upstream.
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled. This leads to a bad state of target pages for migration.
By inspecting the code, if called on a non-THP, prep_transhuge_page()
will
1) change the value of the mapping of (page + 2), since it is used for
THP deferred list;
2) change the lru value of (page + 1), since it is used for THP's dtor.
Both can lead to data corruption of these two pages.
Andrea said:
"Pragmatically and from the point of view of the memory_hotplug subsys,
the effect is a kernel crash when pages are being migrated during a
memory hot remove offline and migration target pages are found in a
bad state"
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 895ec0c4942e..a2246cf670ba 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page,
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
Patches currently in stable-queue which might be from zi.yan(a)cs.rutgers.edu are
queue-4.14/mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
This is a note to let you know that I've just added the patch titled
mm, memcg: fix mem_cgroup_swapout() for THPs
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-memcg-fix-mem_cgroup_swapout-for-thps.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d08afa149acfd00871484ada6dabc3880524cd1c Mon Sep 17 00:00:00 2001
From: Shakeel Butt <shakeelb(a)google.com>
Date: Wed, 29 Nov 2017 16:11:15 -0800
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
From: Shakeel Butt <shakeelb(a)google.com>
commit d08afa149acfd00871484ada6dabc3880524cd1c upstream.
Commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout()
support THP") changed mem_cgroup_swapout() to support transparent huge
page (THP).
However the patch missed one location which should be changed for
correctly handling THPs. The resulting bug will cause the memory
cgroups whose THPs were swapped out to become zombies on deletion.
Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *pag
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
- css_put(&memcg->css);
+ css_put_many(&memcg->css, nr_entries);
}
/**
Patches currently in stable-queue which might be from shakeelb(a)google.com are
queue-4.14/mm-memcg-fix-mem_cgroup_swapout-for-thps.patch
This is a note to let you know that I've just added the patch titled
mm/madvise.c: fix madvise() infinite loop under special circumstances
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
From: chenjie <chenjie6(a)huawei.com>
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
Patches currently in stable-queue which might be from chenjie6(a)huawei.com are
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm: introduce get_user_pages_longterm
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-introduce-get_user_pages_longterm.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:35 -0800
Subject: mm: introduce get_user_pages_longterm
From: Dan Williams <dan.j.williams(a)intel.com>
commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream.
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).
[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/fs.h | 14 +++++++++++
include/linux/mm.h | 13 ++++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3175,6 +3175,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1368,6 +1368,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm, hugetlbfs: introduce ->split() to vm_operations_struct
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 31383c6865a578834dd953d9dbc88e6b19fe3997 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:28 -0800
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
From: Dan Williams <dan.j.williams(a)intel.com>
commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.
This patch (of 2):
The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -367,6 +367,7 @@ enum page_entry_size {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_fault *vmf);
int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2540,9 +2540,11 @@ int __split_vma(struct mm_struct *mm, st
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From f4f0a3d85b50a65a348e2b8635041d6b30f01deb Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Wed, 29 Nov 2017 16:11:30 -0800
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
commit f4f0a3d85b50a65a348e2b8635041d6b30f01deb upstream.
I made a mistake during converting hugetlb code to 5-level paging: in
huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset().
Otherwise it leads to crash -- NULL-pointer dereference in pud_alloc()
if p4d table is not yet allocated.
It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.
Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel…
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4625,7 +4625,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
pud = pud_alloc(mm, p4d, addr);
if (pud) {
if (sz == PUD_SIZE) {
Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
queue-4.14/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm: fix device-dax pud write-faults triggered by get_user_pages()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1501899a898dfb5477c55534bdfd734c046da06d Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:06 -0800
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
From: Dan Williams <dan.j.williams(a)intel.com>
commit 1501899a898dfb5477c55534bdfd734c046da06d upstream.
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it.
Note that the generic 'access_permitted' helper fallback is the simple
_PAGE_RW check on architectures that do not define the
'access_permitted' helper(s).
[dan.j.williams(a)intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwil…
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwill…
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/pgtable.h | 6 ++++++
include/asm-generic/pgtable.h | 8 ++++++++
include/linux/hugetlb.h | 8 --------
3 files changed, 14 insertions(+), 8 deletions(-)
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1093,6 +1093,12 @@ static inline void pmdp_set_wrprotect(st
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
}
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_RW;
+}
+
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd)
#endif /* __HAVE_ARCH_PMD_WRITE */
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#ifndef pud_write
+static inline int pud_write(pud_t pud)
+{
+ BUG();
+ return 0;
+}
+#endif /* pud_write */
+
#if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
(defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
!defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd)
}
#endif
-#ifndef pud_write
-static inline int pud_write(pud_t pud)
-{
- BUG();
- return 0;
-}
-#endif
-
#define HUGETLB_ANON_FILE "anon_hugepage"
enum {
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm: fail get_vaddr_frames() for filesystem-dax mappings
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b7f0554a56f21fb3e636a627450a9add030889be Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:39 -0800
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
From: Dan Williams <dan.j.williams(a)intel.com>
commit b7f0554a56f21fb3e636a627450a9add030889be upstream.
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesytem-dax
vmas.
[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/frame_vector.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ /*
+ * While get_vaddr_frames() could be used for transient (kernel
+ * controlled lifetime) pinning of memory pages all current
+ * users establish long term (userspace controlled lifetime)
+ * page pinning. Treat get_vaddr_frames() like
+ * get_user_pages_longterm() and disallow it for filesystem-dax
+ * mappings.
+ */
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/cma: fix alloc_contig_range ret code/potential leak
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Wed, 29 Nov 2017 16:10:01 -0800
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
From: Mike Kravetz <mike.kravetz(a)oracle.com>
commit 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a upstream.
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages. However, it is
possible for busy pages to become available between the calls to these
two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were
not allocated and they are leaked.
Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7587,11 +7587,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
Patches currently in stable-queue which might be from mike.kravetz(a)oracle.com are
queue-4.14/mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
This is a note to let you know that I've just added the patch titled
IB/core: disable memory registration of filesystem-dax vmas
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5f1d43de54164dcfb9bfa542fcc92c1e1a1b6c1d Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:47 -0800
Subject: IB/core: disable memory registration of filesystem-dax vmas
From: Dan Williams <dan.j.williams(a)intel.com>
commit 5f1d43de54164dcfb9bfa542fcc92c1e1a1b6c1d upstream.
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Doug Ledford <dledford(a)redhat.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
fs/fat/inode.c: fix sb_rdonly() change
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
fs-fat-inode.c-fix-sb_rdonly-change.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b6e8e12c0aeb5fbf1bf46c84d58cc93aedede385 Mon Sep 17 00:00:00 2001
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Date: Wed, 29 Nov 2017 16:11:19 -0800
Subject: fs/fat/inode.c: fix sb_rdonly() change
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
commit b6e8e12c0aeb5fbf1bf46c84d58cc93aedede385 upstream.
Commit bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to
sb_rdonly(sb)") converted fat_remount():new_rdonly from a bool to an
int.
However fat_remount() depends upon the compiler's conversion of a
non-zero integer into boolean `true'.
Fix it by switching `new_rdonly' back into a bool.
Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp
Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Joe Perches <joe(a)perches.com>
Cc: David Howells <dhowells(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/fat/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -779,7 +779,7 @@ static void __exit fat_destroy_inodecach
static int fat_remount(struct super_block *sb, int *flags, char *data)
{
- int new_rdonly;
+ bool new_rdonly;
struct msdos_sb_info *sbi = MSDOS_SB(sb);
*flags |= MS_NODIRATIME | (sbi->options.isvfat ? 0 : MS_NOATIME);
Patches currently in stable-queue which might be from hirofumi(a)mail.parknet.co.jp are
queue-4.14/fs-fat-inode.c-fix-sb_rdonly-change.patch
This is a note to let you know that I've just added the patch titled
exec: avoid RLIMIT_STACK races with prlimit()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
exec-avoid-rlimit_stack-races-with-prlimit.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 04e35f4495dd560db30c25efca4eecae8ec8c375 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 29 Nov 2017 16:10:51 -0800
Subject: exec: avoid RLIMIT_STACK races with prlimit()
From: Kees Cook <keescook(a)chromium.org>
commit 04e35f4495dd560db30c25efca4eecae8ec8c375 upstream.
While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.
Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Reported-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Acked-by: Serge Hallyn <serge(a)hallyn.com>
Cc: James Morris <james.l.morris(a)oracle.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1340,10 +1340,15 @@ void setup_new_exec(struct linux_binprm
* avoid bad behavior from the prior rlimits. This has to
* happen before arch_pick_mmap_layout(), which examines
* RLIMIT_STACK, but after the point of no return to avoid
- * needing to clean up the change on failure.
+ * races from other threads changing the limits. This also
+ * must be protected from races with prlimit() calls.
*/
+ task_lock(current->group_leader);
if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
+ if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM)
+ current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM;
+ task_unlock(current->group_leader);
}
arch_pick_mmap_layout(current->mm);
Patches currently in stable-queue which might be from keescook(a)chromium.org are
queue-4.14/exec-avoid-rlimit_stack-races-with-prlimit.patch
This is a note to let you know that I've just added the patch titled
device-dax: implement ->split() to catch invalid munmap attempts
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9702cffdbf2129516db679e4467db81e1cd287da Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:32 -0800
Subject: device-dax: implement ->split() to catch invalid munmap attempts
From: Dan Williams <dan.j.williams(a)intel.com>
commit 9702cffdbf2129516db679e4467db81e1cd287da upstream.
Similar to how device-dax enforces that the 'address', 'offset', and
'len' parameters to mmap() be aligned to the device's fundamental
alignment, the same constraints apply to munmap(). Implement ->split()
to fail munmap calls that violate the alignment constraint.
Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path
with crash signatures of the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/dax/device.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -427,9 +427,21 @@ static int dev_dax_fault(struct vm_fault
return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
}
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ struct file *filp = vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
+ struct dax_region *dax_region = dev_dax->region;
+
+ if (!IS_ALIGNED(addr, dax_region->align))
+ return -EINVAL;
+ return 0;
+}
+
static const struct vm_operations_struct dax_vm_ops = {
.fault = dev_dax_fault,
.huge_fault = dev_dax_huge_fault,
+ .split = dev_dax_split,
};
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are
queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
autofs: revert "autofs: take more care to not update last_used on path walk"
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
autofs-revert-autofs-take-more-care-to-not-update-last_used-on-path-walk.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 43694d4bf843ddd34519e8e9de983deefeada699 Mon Sep 17 00:00:00 2001
From: Ian Kent <raven(a)themaw.net>
Date: Wed, 29 Nov 2017 16:11:23 -0800
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
From: Ian Kent <raven(a)themaw.net>
commit 43694d4bf843ddd34519e8e9de983deefeada699 upstream.
While commit 092a53452bb7 ("autofs: take more care to not update
last_used on path walk") helped (partially) resolve a problem where
automounts were not expiring due to aggressive accesses from user space
it has a side effect for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire
policy. So the patch being reverted introduces a regression which
should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.…
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: Colin Walters <walters(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/autofs4/root.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const stru
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
- ino->last_used = jiffies;
}
+ ino->last_used = jiffies;
return status;
}
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint
*/
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+ struct autofs_info *ino;
struct dentry *new;
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
- if (new == dentry)
- dput(new);
- else {
- struct autofs_info *ino;
-
- ino = autofs4_dentry_ino(new);
- ino->last_used = jiffies;
- dput(path->dentry);
- path->dentry = new;
- }
+ ino = autofs4_dentry_ino(new);
+ ino->last_used = jiffies;
+ dput(path->dentry);
+ path->dentry = new;
}
return path->dentry;
}
Patches currently in stable-queue which might be from raven(a)themaw.net are
queue-4.14/autofs-revert-autofs-fix-at_no_automount-not-being-honored.patch
queue-4.14/autofs-revert-autofs-take-more-care-to-not-update-last_used-on-path-walk.patch
This is a note to let you know that I've just added the patch titled
autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
autofs-revert-autofs-fix-at_no_automount-not-being-honored.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 5d38f049cee1e1c4a7ac55aa79d37d01ddcc3860 Mon Sep 17 00:00:00 2001
From: Ian Kent <raven(a)themaw.net>
Date: Wed, 29 Nov 2017 16:11:26 -0800
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
From: Ian Kent <raven(a)themaw.net>
commit 5d38f049cee1e1c4a7ac55aa79d37d01ddcc3860 upstream.
Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT
flag but introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case
to no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.…
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: Neil Brown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Colin Walters <walters(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/namei.c | 15 +++------------
include/linux/fs.h | 3 ++-
2 files changed, 5 insertions(+), 13 deletions(-)
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path
* of the daemon to instantiate them before they can be used.
*/
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE |
- LOOKUP_AUTOMOUNT))) {
- /* Positive dentry that isn't meant to trigger an
- * automount, EISDIR will allow it to be used,
- * otherwise there's no mount here "now" so return
- * ENOENT.
- */
- if (path->dentry->d_inode)
- return -EISDIR;
- else
- return -ENOENT;
- }
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ path->dentry->d_inode)
+ return -EISDIR;
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3069,7 +3069,8 @@ static inline int vfs_lstat(const char _
static inline int vfs_fstatat(int dfd, const char __user *filename,
struct kstat *stat, int flags)
{
- return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+ return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+ stat, STATX_BASIC_STATS);
}
static inline int vfs_fstat(int fd, struct kstat *stat)
{
Patches currently in stable-queue which might be from raven(a)themaw.net are
queue-4.14/autofs-revert-autofs-fix-at_no_automount-not-being-honored.patch
queue-4.14/autofs-revert-autofs-take-more-care-to-not-update-last_used-on-path-walk.patch
This is a note to let you know that I've just added the patch titled
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().
We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
- Adjust context
- Drop specific part for PUD-sized transparent hugepages. Support
for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/huge_memory.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1240,17 +1240,11 @@ struct page *follow_trans_huge_pmd(struc
VM_BUG_ON_PAGE(!PageHead(page), page);
if (flags & FOLL_TOUCH) {
pmd_t _pmd;
- /*
- * We should set the dirty bit only for FOLL_WRITE but
- * for now the dirty bit in the pmd is meaningless.
- * And if the dirty bit will become meaningful and
- * we'll only set it with FOLL_WRITE, an atomic
- * set_bit will be required on the pmd to set the
- * young bit, instead of the current set_pmd_at.
- */
- _pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+ _pmd = pmd_mkyoung(*pmd);
+ if (flags & FOLL_WRITE)
+ _pmd = pmd_mkdirty(_pmd);
if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
- pmd, _pmd, 1))
+ pmd, _pmd, flags & FOLL_WRITE))
update_mmu_cache_pmd(vma, addr, pmd);
}
if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are
queue-3.18/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-3.18/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
This is a note to let you know that I've just added the patch titled
mm/madvise.c: fix madvise() infinite loop under special circumstances
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
From: chenjie <chenjie6(a)huawei.com>
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/madvise.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -221,9 +221,9 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file || mapping_cap_swap_backed(file->f_mapping)) {
- *prev = vma;
if (!file)
force_swapin_readahead(vma, start, end);
else
@@ -241,7 +241,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
Patches currently in stable-queue which might be from chenjie6(a)huawei.com are
queue-3.18/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5f1d43de54164dcfb9bfa542fcc92c1e1a1b6c1d Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:47 -0800
Subject: [PATCH] IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Doug Ledford <dledford(a)redhat.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 21e60b1e2ff4..130606c3b07c 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b70131de648c2b997d22f4653934438013f407a1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:43 -0800
Subject: [PATCH] v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 0b5c43f7e020..f412429cf5ba 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
Hi,
This patch series fixes the out-of-bound error caused by the return value
of usb_string(). It was descovered by KASAN. The Patch 1 is the V2 about
http://www.spinics.net/lists/alsa-devel/msg69487.html
Chanes in V2:
- put an explicit error bail out(by Takashi iwai)
Patch1 was founded by connecting the following product.
http://www.lg.com/uk/lg-friends/lg-AFD-1200
I found that it only check if the return value from usb_string() is always
zero while modifying OOB KASAN message. So instead of making the
modifications to OOB to V2, I sent a patch series.
I am sorry to break the mail thread.
Thanks
jaejoong
Jaejoong Kim (3):
ALSA: usb-audio: Fix out-of-bound error
ALSA: usb-audio: Fix return value check for usb_string()
ALSA: usb-audio: Add check return value for usb_string()
sound/usb/mixer.c | 41 ++++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 17 deletions(-)
--
2.7.4
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b7f0554a56f21fb3e636a627450a9add030889be Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:39 -0800
Subject: [PATCH] mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow V4L2, Exynos, and other frame vector users to create
long standing / irrevocable memory registrations against filesytem-dax
vmas.
[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 2f98df0d460e..297c7238f7d4 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
ret = -EFAULT;
goto out;
}
+
+ /*
+ * While get_vaddr_frames() could be used for transient (kernel
+ * controlled lifetime) pinning of memory pages all current
+ * users establish long term (userspace controlled lifetime)
+ * page pinning. Treat get_vaddr_frames() like
+ * get_user_pages_longterm() and disallow it for filesystem-dax
+ * mappings.
+ */
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:35 -0800
Subject: [PATCH] mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).
[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bbd92da0946e..9dc498d16cc1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_area_struct *vma)
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b3b6a7e313e9..ea818ff739cd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index 85cc822fd403..d3fb60e5bfac 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9702cffdbf2129516db679e4467db81e1cd287da Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:32 -0800
Subject: [PATCH] device-dax: implement ->split() to catch invalid munmap
attempts
Similar to how device-dax enforces that the 'address', 'offset', and
'len' parameters to mmap() be aligned to the device's fundamental
alignment, the same constraints apply to munmap(). Implement ->split()
to fail munmap calls that violate the alignment constraint.
Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path
with crash signatures of the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 6833ada237ab..7b0bf825c4e7 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -428,9 +428,21 @@ static int dev_dax_fault(struct vm_fault *vmf)
return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
}
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ struct file *filp = vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
+ struct dax_region *dax_region = dev_dax->region;
+
+ if (!IS_ALIGNED(addr, dax_region->align))
+ return -EINVAL;
+ return 0;
+}
+
static const struct vm_operations_struct dax_vm_ops = {
.fault = dev_dax_fault,
.huge_fault = dev_dax_huge_fault,
+ .split = dev_dax_split,
};
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
Hi, Greg
>On Mon, Nov 27, 2017 at 11:25:47AM +0000, Bean Huo (beanhuo) wrote:
>> Hi, all
>> Is there someone knows if exists one utilis dedicated to UFS device, rather
>than SCSI utils?
>> I have tried sg3-utils, but it is not convenient for the embedded ARM-based
>system.
>> And also it doesn't support several UFS special command.
>
>What specific UFS commands do you need to make to the device that the
>current driver does not support?
There are some UFS/vendor native commands. They are not SCSI based.
>And yes, this is a trick question as there are about 4 different major forks that
>I know of of the UFS driver in different vendor trees, all of which support
>different types of UFS commands :(
>
>> If we don't have this kind of tool for UFS, is it necessary for us to develop a
>>ufs-utils?
>
>I doubt it, what neds to happen is getting all of the functionality that lives in
>these different forks all merged upstream into the in-kernel driver. Then I bet
>all of the needed functionality you are looking for will be there.
>
Sometimes customers tend to use user space tool to do some configuration.
And especially, for example the UFS FFU.
>good luck!
>
Thanks !
>greg k-h
//Bean Huo
Occasionally the following error message can be seen in the logs of
Qualcomm devices using UFS:
EXT4-fs (sda9): Delayed block allocation failed for inode 685600 at logical offset 1086 with max blocks 3 with error 121
EXT4-fs (sda9): This should not happen!! Data will be lost
This is caused by a failing WRITE_SAME command, which per the JEDEC UFS
specification is not a supported. Set the no_write_same flag on the
ufshcd SCSI host to let the SCSI layer know this.
Fixes: 7a3e97b0dc4b ("[SCSI] ufshcd: UFS Host controller driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
---
drivers/scsi/ufs/ufshcd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 88c086f5c4e3..e5b1efd1dafd 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -6515,6 +6515,7 @@ static struct scsi_host_template ufshcd_driver_template = {
.can_queue = UFSHCD_CAN_QUEUE,
.max_host_blocked = 1,
.track_queue_depth = 1,
+ .no_write_same = 1,
};
static int ufshcd_config_vreg_load(struct device *dev, struct ufs_vreg *vreg,
--
2.15.0
This is a note to let you know that I've just added the patch titled
iio: stm32: fix adc/trigger link error
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 6d745ee8b5e81f3a33791e3c854fbbfd6f3e585e Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Wed, 6 Sep 2017 14:56:50 +0200
Subject: iio: stm32: fix adc/trigger link error
The ADC driver can trigger on either the timer or the lptim
trigger, but it only uses a Kconfig 'select' statement
to ensure that the first of the two is present. When the lptim
trigger is enabled as a loadable module, and the adc driver
is built-in, we now get a link error:
drivers/iio/adc/stm32-adc.o: In function `stm32_adc_get_trig_extsel':
stm32-adc.c:(.text+0x4e0): undefined reference to `is_stm32_lptim_trigger'
We could use a second 'select' statement and always have both
trigger drivers enabled when the adc driver is, but it seems that
the lptimer trigger was intentionally left optional, so it seems
better to keep it that way.
This adds a hack to use 'IS_REACHABLE()' rather than 'IS_ENABLED()',
which avoids the link error, but instead leads to the lptimer trigger
not being used in the broken configuration. I've added a runtime
warning for this case to help users figure out what they did wrong
if this should ever be done by accident.
Fixes: f0b638a7f6db ("iio: adc: stm32: add support for lptimer triggers")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
include/linux/iio/timer/stm32-lptim-trigger.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/include/linux/iio/timer/stm32-lptim-trigger.h b/include/linux/iio/timer/stm32-lptim-trigger.h
index 34d59bfdce2d..464458d20b16 100644
--- a/include/linux/iio/timer/stm32-lptim-trigger.h
+++ b/include/linux/iio/timer/stm32-lptim-trigger.h
@@ -16,11 +16,14 @@
#define LPTIM2_OUT "lptim2_out"
#define LPTIM3_OUT "lptim3_out"
-#if IS_ENABLED(CONFIG_IIO_STM32_LPTIMER_TRIGGER)
+#if IS_REACHABLE(CONFIG_IIO_STM32_LPTIMER_TRIGGER)
bool is_stm32_lptim_trigger(struct iio_trigger *trig);
#else
static inline bool is_stm32_lptim_trigger(struct iio_trigger *trig)
{
+#if IS_ENABLED(CONFIG_IIO_STM32_LPTIMER_TRIGGER)
+ pr_warn_once("stm32 lptim_trigger not linked in\n");
+#endif
return false;
}
#endif
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: health: max30102: Temperature should be in milli Celsius
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From ad44a9f804c1591ba2a2ec0ac8d916a515d2790c Mon Sep 17 00:00:00 2001
From: Peter Meerwald-Stadler <pmeerw(a)pmeerw.net>
Date: Fri, 27 Oct 2017 21:45:31 +0200
Subject: iio: health: max30102: Temperature should be in milli Celsius
As per ABI temperature should be in milli Celsius after scaling,
not Celsius
Note on stable cc. This driver is breaking the standard IIO
ABI. (JC)
Signed-off-by: Peter Meerwald-Stadler <pmeerw(a)pmeerw.net>
Acked-by: Matt Ranostay <matt.ranostay(a)konsulko.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/health/max30102.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/health/max30102.c b/drivers/iio/health/max30102.c
index 203ffb9cad6a..147a8c14235f 100644
--- a/drivers/iio/health/max30102.c
+++ b/drivers/iio/health/max30102.c
@@ -371,7 +371,7 @@ static int max30102_read_raw(struct iio_dev *indio_dev,
mutex_unlock(&indio_dev->mlock);
break;
case IIO_CHAN_INFO_SCALE:
- *val = 1; /* 0.0625 */
+ *val = 1000; /* 62.5 */
*val2 = 16;
ret = IIO_VAL_FRACTIONAL;
break;
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: fix kernel-doc build errors
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c175cb7cd953782bbf4e8bdf088ad61440d6dde5 Mon Sep 17 00:00:00 2001
From: Randy Dunlap <rdunlap(a)infradead.org>
Date: Sun, 29 Oct 2017 17:06:01 -0700
Subject: iio: fix kernel-doc build errors
Fix build errors in kernel-doc notation. Symbols that end in '_'
have a special meaning, but adding a '*' makes them OK.
../drivers/iio/industrialio-core.c:635: ERROR: Unknown target name: "iio_val".
../drivers/iio/industrialio-core.c:642: ERROR: Unknown target name: "iio_val".
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/industrialio-core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 9c4cfd19b739..2f0998ebeed2 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -631,7 +631,7 @@ static ssize_t __iio_format_value(char *buf, size_t len, unsigned int type,
* iio_format_value() - Formats a IIO value into its string representation
* @buf: The buffer to which the formatted value gets written
* which is assumed to be big enough (i.e. PAGE_SIZE).
- * @type: One of the IIO_VAL_... constants. This decides how the val
+ * @type: One of the IIO_VAL_* constants. This decides how the val
* and val2 parameters are formatted.
* @size: Number of IIO value entries contained in vals
* @vals: Pointer to the values, exact meaning depends on the
@@ -639,7 +639,7 @@ static ssize_t __iio_format_value(char *buf, size_t len, unsigned int type,
*
* Return: 0 by default, a negative number on failure or the
* total number of characters written for a type that belongs
- * to the IIO_VAL_... constant.
+ * to the IIO_VAL_* constant.
*/
ssize_t iio_format_value(char *buf, unsigned int type, int size, int *vals)
{
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: adc: meson-saradc: fix the bit_idx of the adc_en clock
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 7a6b0420d2fe4ce59437bd318826fe468f0d71ae Mon Sep 17 00:00:00 2001
From: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Date: Tue, 31 Oct 2017 21:01:43 +0100
Subject: iio: adc: meson-saradc: fix the bit_idx of the adc_en clock
Meson8 and Meson8b SoCs use the the SAR ADC gate clock provided by the
MESON_SAR_ADC_REG3 register within the SAR ADC register area.
According to the datasheet (and the existing MESON_SAR_ADC_REG3_CLK_EN
definition) the gate is on bit 30.
The fls() function returns the last set bit, which is "bit index + 1"
(fls(MESON_SAR_ADC_REG3_CLK_EN) returns 31). Fix this by switching to
__ffs() which returns the first set bit, which is bit 30 in our case.
This off by one error results in the ADC not being usable on devices
where the bootloader did not enable the clock.
Fixes: 3adbf3427330 ("iio: adc: add a driver for the SAR ADC found in Amlogic Meson SoCs")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/meson_saradc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/meson_saradc.c b/drivers/iio/adc/meson_saradc.c
index 9c6932ffc0af..1d25c78b74d2 100644
--- a/drivers/iio/adc/meson_saradc.c
+++ b/drivers/iio/adc/meson_saradc.c
@@ -600,7 +600,7 @@ static int meson_sar_adc_clk_init(struct iio_dev *indio_dev,
init.num_parents = 1;
priv->clk_gate.reg = base + MESON_SAR_ADC_REG3;
- priv->clk_gate.bit_idx = fls(MESON_SAR_ADC_REG3_CLK_EN);
+ priv->clk_gate.bit_idx = __ffs(MESON_SAR_ADC_REG3_CLK_EN);
priv->clk_gate.hw.init = &init;
priv->adc_clk = devm_clk_register(&indio_dev->dev, &priv->clk_gate.hw);
--
2.15.1
This is a note to let you know that I've just added the patch titled
iio: adc: cpcap: fix incorrect validation
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 81b039ec36a41a5451e1e36f05bb055eceab1dc8 Mon Sep 17 00:00:00 2001
From: Pan Bian <bianpan2016(a)163.com>
Date: Mon, 13 Nov 2017 00:01:20 +0800
Subject: iio: adc: cpcap: fix incorrect validation
Function platform_get_irq_byname() returns a negative error code on
failure, and a zero or positive number on success. However, in function
cpcap_adc_probe(), positive IRQ numbers are also taken as error cases.
Use "if (ddata->irq < 0)" instead of "if (!ddata->irq)" to validate the
return value of platform_get_irq_byname().
Signed-off-by: Pan Bian <bianpan2016(a)163.com>
Fixes: 25ec249632d50 ("iio: adc: cpcap: Add minimal support for CPCAP PMIC ADC")
Reviewed-by: Sebastian Reichel <sebastian.reichel(a)collabora.co.uk>
Acked-by: Tony Lindgren <tony(a)atomide.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
drivers/iio/adc/cpcap-adc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/cpcap-adc.c b/drivers/iio/adc/cpcap-adc.c
index 3576ec73ec23..9ad60421d360 100644
--- a/drivers/iio/adc/cpcap-adc.c
+++ b/drivers/iio/adc/cpcap-adc.c
@@ -1011,7 +1011,7 @@ static int cpcap_adc_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, indio_dev);
ddata->irq = platform_get_irq_byname(pdev, "adcdone");
- if (!ddata->irq)
+ if (ddata->irq < 0)
return -ENODEV;
error = devm_request_threaded_irq(&pdev->dev, ddata->irq, NULL,
--
2.15.1
backup_info field is only allocated for decrypt code path.
The field was not nullified when not used causing a kfree
in an error handling path to attempt to free random
addresses as uncovered in stress testing.
Fixes: 737aed947f9b ("staging: ccree: save ciphertext for CTS IV")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
---
drivers/staging/ccree/ssi_cipher.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/ccree/ssi_cipher.c b/drivers/staging/ccree/ssi_cipher.c
index 9019615..7b484f1 100644
--- a/drivers/staging/ccree/ssi_cipher.c
+++ b/drivers/staging/ccree/ssi_cipher.c
@@ -907,6 +907,7 @@ static int ssi_ablkcipher_encrypt(struct ablkcipher_request *req)
unsigned int ivsize = crypto_ablkcipher_ivsize(ablk_tfm);
req_ctx->is_giv = false;
+ req_ctx->backup_info = NULL;
return ssi_blkcipher_process(tfm, req_ctx, req->dst, req->src,
req->nbytes, req->info, ivsize,
--
2.7.4
On Tue, Sep 5, 2017 at 1:06 PM, Marcin Nowakowski
<marcin.nowakowski(a)imgtec.com> wrote:
> Change 73fbc1eba7ff added a fix to ensure that the memory range between
> PHYS_OFFSET and low memory address specified by mem= cmdline argument is
> not later processed by free_all_bootmem.
> This change was incorrect for systems where the commandline specifies
> more than 1 mem argument, as it will cause all memory between
> PHYS_OFFSET and each of the memory offsets to be marked as reserved,
> which results in parts of the RAM marked as reserved (Creator CI20's
> u-boot has a default commandline argument 'mem=256M@0x0
> mem=768M@0x30000000').
>
> Change the behaviour to ensure that only the range between PHYS_OFFSET
> and the lowest start address of the memories is marked as protected.
>
> This change also ensures that the range is marked protected even if it's
> only defined through the devicetree and not only via commandline
> arguments.
>
> Reported-by: Mathieu Malaterre <mathieu.malaterre(a)gmail.com>
> Signed-off-by: Marcin Nowakowski <marcin.nowakowski(a)imgtec.com>
> Fixes: 73fbc1eba7ff ("MIPS: fix mem=X@Y commandline processing")
> Cc: stable(a)vger.kernel.org
> ---
> arch/mips/kernel/setup.c | 19 ++++++++++++++++---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index fe39397..a1c39ec 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -374,6 +374,7 @@ static void __init bootmem_init(void)
> unsigned long reserved_end;
> unsigned long mapstart = ~0UL;
> unsigned long bootmap_size;
> + phys_addr_t ramstart = ~0UL;
> bool bootmap_valid = false;
> int i;
>
> @@ -394,6 +395,21 @@ static void __init bootmem_init(void)
> max_low_pfn = 0;
>
> /*
> + * Reserve any memory between the start of RAM and PHYS_OFFSET
> + */
> + for (i = 0; i < boot_mem_map.nr_map; i++) {
> + if (boot_mem_map.map[i].type != BOOT_MEM_RAM)
> + continue;
> +
> + ramstart = min(ramstart, boot_mem_map.map[i].addr);
> + }
> +
> + if (ramstart > PHYS_OFFSET)
> + add_memory_region(PHYS_OFFSET, ramstart - PHYS_OFFSET,
> + BOOT_MEM_RESERVED);
> +
> +
> + /*
> * Find the highest page frame number we have available.
> */
> for (i = 0; i < boot_mem_map.nr_map; i++) {
> @@ -663,9 +679,6 @@ static int __init early_parse_mem(char *p)
>
> add_memory_region(start, size, BOOT_MEM_RAM);
>
> - if (start && start > PHYS_OFFSET)
> - add_memory_region(PHYS_OFFSET, start - PHYS_OFFSET,
> - BOOT_MEM_RESERVED);
> return 0;
> }
> early_param("mem", early_parse_mem);
> --
> 2.7.4
>
>
Teested-by: Mathieu Malaterre <malat(a)debian.org>
Would be nice to have it upstream at some point...
Thanks
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit cebe84c6190d741045a322f5343f717139993c08 ]
Now when ip route flush cache and it turn out all fnhe_genid != genid.
If a redirect/pmtu icmp packet comes and the old fnhe is found and all
it's members but fnhe_genid will be updated.
Then next time when it looks up route and tries to rebind this fnhe to
the new dst, the fnhe will be flushed due to fnhe_genid != genid. It
causes this redirect/pmtu icmp packet acutally not to be applied.
This patch is to also reset fnhe_genid when updating a route cache.
Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
Acked-by: Hannes Frederic Sowa <hannes(a)stressinduktion.org>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
net/ipv4/route.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2fe9459fbe24..a04d29d70e6a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -622,9 +622,12 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
struct fnhe_hash_bucket *hash;
struct fib_nh_exception *fnhe;
struct rtable *rt;
+ u32 genid, hval;
unsigned int i;
int depth;
- u32 hval = fnhe_hashfun(daddr);
+
+ genid = fnhe_genid(dev_net(nh->nh_dev));
+ hval = fnhe_hashfun(daddr);
spin_lock_bh(&fnhe_lock);
@@ -647,6 +650,8 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
}
if (fnhe) {
+ if (fnhe->fnhe_genid != genid)
+ fnhe->fnhe_genid = genid;
if (gw)
fnhe->fnhe_gw = gw;
if (pmtu) {
@@ -671,7 +676,7 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
fnhe->fnhe_next = hash->chain;
rcu_assign_pointer(hash->chain, fnhe);
}
- fnhe->fnhe_genid = fnhe_genid(dev_net(nh->nh_dev));
+ fnhe->fnhe_genid = genid;
fnhe->fnhe_daddr = daddr;
fnhe->fnhe_gw = gw;
fnhe->fnhe_pmtu = pmtu;
--
2.11.0
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
[ Upstream commit 2dbc644ac62bbcb9ee78e84719953f611be0413d ]
For rpm-pkg and deb-pkg, a source tar file is created. All paths in
the archive must be prefixed with the base name of the tar so that
everything is contained in the directory when you extract it.
Currently, scripts/package/Makefile uses a symlink for that, and
removes it after the tar is created.
If you terminate the build during the tar creation, the symlink is
left over. Then, at the next package build, you will see a warning
like follows:
ln: '.' and 'kernel-4.14.0+/.' are the same file
It is possible to fix it by adding -n (--no-dereference) option to
the "ln" command, but a cleaner way is to use --transform option
of "tar" command. This option is GNU extension, but it should not
hurt to use it in the Linux build system.
The 'S' flag is needed to exclude symlinks from the path fixup.
Without it, symlinks in the kernel are broken.
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
scripts/package/Makefile | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/scripts/package/Makefile b/scripts/package/Makefile
index 493e226356ca..52917fb8e0c5 100644
--- a/scripts/package/Makefile
+++ b/scripts/package/Makefile
@@ -39,10 +39,9 @@ if test "$(objtree)" != "$(srctree)"; then \
false; \
fi ; \
$(srctree)/scripts/setlocalversion --save-scmversion; \
-ln -sf $(srctree) $(2); \
tar -cz $(RCS_TAR_IGNORE) -f $(2).tar.gz \
- $(addprefix $(2)/,$(TAR_CONTENT) $(3)); \
-rm -f $(2) $(objtree)/.scmversion
+ --transform 's:^:$(2)/:S' $(TAR_CONTENT) $(3); \
+rm -f $(objtree)/.scmversion
# rpm-pkg
# ---------------------------------------------------------------------------
--
2.11.0
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
[ Upstream commit 2dbc644ac62bbcb9ee78e84719953f611be0413d ]
For rpm-pkg and deb-pkg, a source tar file is created. All paths in
the archive must be prefixed with the base name of the tar so that
everything is contained in the directory when you extract it.
Currently, scripts/package/Makefile uses a symlink for that, and
removes it after the tar is created.
If you terminate the build during the tar creation, the symlink is
left over. Then, at the next package build, you will see a warning
like follows:
ln: '.' and 'kernel-4.14.0+/.' are the same file
It is possible to fix it by adding -n (--no-dereference) option to
the "ln" command, but a cleaner way is to use --transform option
of "tar" command. This option is GNU extension, but it should not
hurt to use it in the Linux build system.
The 'S' flag is needed to exclude symlinks from the path fixup.
Without it, symlinks in the kernel are broken.
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
scripts/package/Makefile | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/scripts/package/Makefile b/scripts/package/Makefile
index 71b4a8af9d4d..7badec3498b8 100644
--- a/scripts/package/Makefile
+++ b/scripts/package/Makefile
@@ -39,10 +39,9 @@ if test "$(objtree)" != "$(srctree)"; then \
false; \
fi ; \
$(srctree)/scripts/setlocalversion --save-scmversion; \
-ln -sf $(srctree) $(2); \
tar -cz $(RCS_TAR_IGNORE) -f $(2).tar.gz \
- $(addprefix $(2)/,$(TAR_CONTENT) $(3)); \
-rm -f $(2) $(objtree)/.scmversion
+ --transform 's:^:$(2)/:S' $(TAR_CONTENT) $(3); \
+rm -f $(objtree)/.scmversion
# rpm-pkg
# ---------------------------------------------------------------------------
--
2.11.0
From: Colin Ian King <colin.king(a)canonical.com>
[ Upstream commit e9990d70e8a063a7b894c5cbb99f630a0f41200d ]
The comparison of u32 nregs being less than zero is never true since
nregs is unsigned. Fix this by making nregs a signed integer.
Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver")
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Marc Zyngier <marc.zyngier(a)arm.com>
Cc: kernel-janitors(a)vger.kernel.org
Cc: Jason Cooper <jason(a)lakedaemon.net>
Link: https://lkml.kernel.org/r/20171117183553.2739-1-colin.king@canonical.com
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
drivers/irqchip/qcom-irq-combiner.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/qcom-irq-combiner.c b/drivers/irqchip/qcom-irq-combiner.c
index 6aa3ea479214..f31265937439 100644
--- a/drivers/irqchip/qcom-irq-combiner.c
+++ b/drivers/irqchip/qcom-irq-combiner.c
@@ -238,7 +238,7 @@ static int __init combiner_probe(struct platform_device *pdev)
{
struct combiner *combiner;
size_t alloc_sz;
- u32 nregs;
+ int nregs;
int err;
nregs = count_registers(pdev);
--
2.11.0
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi Greg,
Pleae pull commits for Linux 3.18 .
I've sent a review request for all commits over a week ago and all
comments were addressed.
Thanks,
Sasha
=====
The following changes since commit b42518053ffd221d79cff2df8c0257db88a71334:
Linux 3.18.85 (2017-11-30 08:35:56 +0000)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git for-greg/4.14/3.18
for you to fetch changes up to 90cbc83fe2279c1f0b5e94196c27253513405d77:
perf test attr: Fix ignored test case result (2017-11-30 17:01:13 -0500)
- ----------------------------------------------------------------
Ben Hutchings (1):
usbip: tools: Install all headers needed for libusbip development
Boshi Wang (1):
ima: fix hash algorithm initialization
Gustavo A. R. Silva (1):
EDAC, sb_edac: Fix missing break in switch
Hiromitsu Yamasaki (1):
spi: sh-msiof: Fix DMA transfer size check
Jibin Xu (1):
sysrq : fix Show Regs call trace on ARM
Lukas Wunner (1):
serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
Masami Hiramatsu (1):
kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y
Thomas Richter (1):
perf test attr: Fix ignored test case result
arch/Kconfig | 2 +-
drivers/edac/sb_edac.c | 1 +
drivers/spi/spi-sh-msiof.c | 2 +-
drivers/tty/serial/8250/8250_fintek.c | 2 +-
drivers/tty/sysrq.c | 9 +++++++--
kernel/kprobes.c | 14 ++++++++------
security/integrity/ima/ima_main.c | 4 ++++
tools/perf/tests/attr.c | 2 +-
tools/usb/usbip/Makefile.am | 3 ++-
9 files changed, 26 insertions(+), 13 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCAAGBQJaIIAGAAoJEN6mb/eXdyzcB/AP/2JM6q4DXlQNiG6gFgU1MAIR
N6uwbY9a+RRE/P2sSb0fTx+/f2JTdRhY/NK15y2up+vWxOAzBmqE/NotO0gzaMc1
a+8LkomLf9BS4I/ScRb/iAANUBwWcZNPFHuiYRNGg5y0Ru4uI2C4LNpmNXE6uhWC
ia7PjeElSbVZptmMAk1Fdb3A+eYZDwZUvBjf1Zcsjd3QYCS2psa/xQu+uvv4CsNX
1UlGnVqF6hIqrpO1JYgS1vRuGSsSdLiPVrFdVha9Npg6ORtBk6Be6y0vq7uHmvqD
x0+iFimMTadSRLPWW5u5CmXsDdhgXb3+5PPl8Bw4krJ7RX9/qWLWmyXJ3pGaZr7U
81LFws1Kt7sG+aDwCSq48rhUnbQAEardI16H9abujM3sCe3b0uboLm8PV9K7JlpZ
xiK5lo3U1ia2v1TKUeFS2KhdQtnV0YzHpfIDS9TDQk28kFFtkKDek3x8xM74Z6aa
xXtDi4qAJpa/73V9cr9DQM0UuavAk+3QDFEZras8C0dkMqi/DIEvpvZUQrvBMj3t
bCldw32ylhtg+xaKPNUO7CXoRoKmE6b2kAed0PJ9sZIXYwoS3PGZnvGUDXDNqOY1
Mad0Mhh8JDiEBtBpEuIQzSVSTRNIeOqGYS1RiCN0oPI0uY/3xWZ7NV2bj+KXAqOe
66nYqLJq2Ai7Dgx4l6Es
=eGmn
-----END PGP SIGNATURE-----
This is a note to let you know that I've just added the patch titled
mm, oom_reaper: gather each vma to prevent leaking TLB entry
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 687cb0884a714ff484d038e9190edc874edcf146 Mon Sep 17 00:00:00 2001
From: Wang Nan <wangnan0(a)huawei.com>
Date: Wed, 29 Nov 2017 16:09:58 -0800
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
From: Wang Nan <wangnan0(a)huawei.com>
commit 687cb0884a714ff484d038e9190edc874edcf146 upstream.
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs like arm64
doesn't flush TLB when tlb->fullmm is true:
commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
Which causes leaking of tlb entries.
Will clarifies his patch:
"Basically, we tag each address space with an ASID (PCID on x86) which
is resident in the TLB. This means we can elide TLB invalidation when
pulling down a full mm because we won't ever assign that ASID to
another mm without doing TLB invalidation elsewhere (which actually
just nukes the whole TLB).
I think that means that we could potentially not fault on a kernel
uaccess, because we could hit in the TLB"
There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to
flush TLB then frees pages. However, due to the above problem, the TLB
entries are not really flushed on arm64. Other threads are possible to
access these pages through TLB entries. Moreover, a copy_to_user() can
also write to these pages without generating page fault, causes
use-after-free bugs.
This patch gathers each vma instead of gathering full vm space. In this
case tlb->fullmm is not true. The behavior of oom reaper become similar
to munmapping before do_exit, which should be safe for all archs.
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Bob Liu <liubo95(a)huawei.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/oom_kill.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -532,7 +532,6 @@ static bool __oom_reap_task_mm(struct ta
*/
set_bit(MMF_UNSTABLE, &mm->flags);
- tlb_gather_mmu(&tlb, mm, 0, -1);
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -547,11 +546,13 @@ static bool __oom_reap_task_mm(struct ta
* we do not want to block exit_mmap by keeping mm ref
* count elevated without a good reason.
*/
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
NULL);
+ tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+ }
}
- tlb_finish_mmu(&tlb, 0, -1);
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
Patches currently in stable-queue which might be from wangnan0(a)huawei.com are
queue-4.14/mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
This is a note to let you know that I've just added the patch titled
mm, memory_hotplug: do not back off draining pcp free pages from kworker context
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 4b81cb2ff69c8a8e297a147d2eb4d9b5e8d7c435 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko(a)suse.com>
Date: Wed, 29 Nov 2017 16:09:54 -0800
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
From: Michal Hocko <mhocko(a)suse.com>
commit 4b81cb2ff69c8a8e297a147d2eb4d9b5e8d7c435 upstream.
drain_all_pages backs off when called from a kworker context since
commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced
by a WQ based one and the check wanted to prevent from recursion and
inter workers dependencies. This has made some sense at the time
because the system WQ has been used and one worker holding the lock
could be blocked while waiting for new workers to emerge which can be a
problem under OOM conditions.
Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into
single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a
rescuer so we shouldn't depend on any other WQ activity to make a
forward progress so calling drain_all_pages from a worker context is
safe as long as this doesn't happen from mm_percpu_wq itself which is
not the case because all workers are required to _not_ depend on any MM
locks.
Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job
otherwise we can have dangling pages on pcp lists and fail the offline
operation (__test_page_isolated_in_pageblock would see a page with 0 ref
count but without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works
as expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/page_alloc.c | 4 ----
1 file changed, 4 deletions(-)
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2487,10 +2487,6 @@ void drain_all_pages(struct zone *zone)
if (WARN_ON_ONCE(!mm_percpu_wq))
return;
- /* Workqueues cannot recurse */
- if (current->flags & PF_WQ_WORKER)
- return;
-
/*
* Do not drain if one is already in progress unless it's specific to
* a zone. Such callers are primarily CMA and memory hotplug and need
Patches currently in stable-queue which might be from mhocko(a)suse.com are
queue-4.14/mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
queue-4.14/mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi Greg,
Pleae pull commits for Linux 3.18. .
I've sent a review request for all commits over a week ago and all
comments were addressed.
Thanks,
Sasha
=====
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCAAGBQJaIH/ZAAoJEN6mb/eXdyzcNa4P/2ctdnWkCUVD+UgsEKWn8hBf
Gb2+RPaMaWf9wVF9LuKdl4JlGdrzTdXRAqmOZuaEWjGr89AOqbn84Z/Yb8NcmXef
8z/CEC2Gmb8lVLtDZdN7a8oSkV+Nt3EAGMK9qvUvUIoJAxAI12l/2jnVdwo1QKef
Q0PNH9rPrpEb4k1nlnT8xqz+Uc3qGVMa26s5jBVnheg1YX3ucdXCoOfwfgMbnujT
P0Sckb/j+hfWFx3AHHvuHuavrDpUEMSXvWeVd3mcQQUsyI+iUsjiIFVt9QY8p2Bw
qpFgWSHg6qwlFAl2/QrUhIOTJ6RfnExbjvkKPPIU/MH/hUhbzmQ+vl7o/TFgg2p0
n89tpP+TYRwMy0rA1NMYqVKN8+l+pQvJ89j2OXVJ5uJfJblvp3HbrwQu6EFIN52x
86/Su7zzy3WOp9KY/v+CksMxo1COrTOxmNSXu0Rd1I9P1F4j8rxHQj229sIuCHcs
zDgzWRT6oU5cdnw/iLcpKcH9GqrwBPDNkJYeSRrkSU5Vr/MCdZjUHFl8hBAz24tB
Tzy1km2N0HXQwfikTk8cruwHR9t+hziraN49gi1wknda5uw93wTgqFRwj8HHfxXJ
lSc5C5c4OlbkjARyAYkOlpgOBH/gLKrQWGg90cffOQIlfJ5cRgMfpI5o8WC3kobG
F99f74cLB36Gh7z84rIX
=0vft
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi Greg,
Pleae pull commits for Linux 3.18 .
I've sent a review request for all commits over a week ago and all
comments were addressed.
Thanks,
Sasha
=====
The following changes since commit c35c375efa4e2c832946a04e83155f928135e8f6:
Linux 3.18.83 (2017-11-21 09:01:08 +0100)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git for-greg-3.18
for you to fetch changes up to af0d729a7a2287ee2ad8468025529700cf182f32:
net: fec: fix multicast filtering hardware setup (2017-11-22 13:40:15 -0500)
- ----------------------------------------------------------------
Benjamin Coddington (1):
nfs: Don't take a reference on fl->fl_file for LOCK operation
Colin Ian King (1):
net: sctp: fix array overrun read on sctp_timer_tbl
David Forster (1):
vti6: fix device register to report IFLA_INFO_KIND
Jan Kara (1):
mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
Parthasarathy Bhuvaragan (1):
tipc: fix cleanup at module unload
Peter Ujfalusi (1):
ARM: OMAP1: DMA: Correct the number of logical channels
Rui Sousa (1):
net: fec: fix multicast filtering hardware setup
Trond Myklebust (1):
NFSv4: Fix client recovery when server reboots multiple times
Vlad Tsyrklevich (1):
net/appletalk: Fix kernel memory disclosure
arch/arm/mach-omap1/dma.c | 16 +++++++---------
drivers/net/appletalk/ipddp.c | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 23 +++++++++--------------
drivers/staging/lustre/lustre/llite/llite_mmap.c | 4 +---
fs/nfs/nfs4proc.c | 3 ---
fs/nfs/nfs4state.c | 1 -
include/linux/buffer_head.h | 4 +---
net/ipv6/ip6_vti.c | 2 +-
net/sctp/debug.c | 2 +-
net/tipc/server.c | 4 +---
10 files changed, 22 insertions(+), 39 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCAAGBQJaIH/kAAoJEN6mb/eXdyzc7mAP/3yIZQSqf9D4m3Ze4zUD8MAL
DP6L5qmUz79FFzXaTtrles3Shn48P1I+r1o4Gqz2y0Nf6wZu+jmeKdaeByAeM1av
G69vHv35BnuooGhnrqBE/xM2EAZQ4eybeyVUFRztC07+LxX/9+CuCM13h5A2VJ07
Q2tNXd6W4l18cgfR5AeTy+x2kkoQsWU64XtZeZkT7fax02FBme7Q+jCRcK2TJYHy
MJFh+4yTTmwv0wOGCTUU8hdIdoOIZxBs/eQ1VbxXvzzuvNmucJzNEa2sG4pFB02a
p5e9SzxH/guiUoEuYX4yFWQNO48bh+6XvPpKMo2hR209jTHh7jlcJhb+6Ei46RXs
U6hIjHDOYoGOufolRNudCBsfrJKECxQzLi//Qx69Aq2Lww8OkgVIJq3nd0/0YP83
J0MD+8B0ofncHo4ietTt98Udz2xklr+gmOJLKggLGVbbn5symAUkSbWV4164O87r
a3o7mzRNky6JI/bQVyqGHvnBxIGMWzTb2gf1bf1HWwADrabcDYDpOpsNc5u7VkNa
n5GDf+IuiXtprc242BtuyPiODc8dDctmCwoqegOaytUNJWv5lHNw/jW324DeZ/BF
g2yFet6HdsbDBjEvo5rOMHisbr+m7ckgor1lmOnKmSuu/ZtIdbkSw7P0iHPOj2Id
m6Zzb8HW7iAkjezRFJyo
=xF38
-----END PGP SIGNATURE-----
From: Daniel Jurgens <danielj(a)mellanox.com>
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.
This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: stable(a)vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
---
Changelog:
v3->v4: Unlock in error flow
v2->v3: Fix build warning
v1->v2: Fixed build errors
v0->v1: Added proper SElinux patch
---
drivers/infiniband/core/security.c | 50 +++++++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 209d057..817d554 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec)
int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
{
+ u8 i = rdma_start_port(dev);
+ bool is_ib = false;
int ret;
+ while (i <= rdma_end_port(dev) && !is_ib)
+ is_ib = rdma_protocol_ib(dev, i++);
+
+ /* If this isn't an IB device don't create the security context */
+ if (!is_ib)
+ return 0;
+
qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL);
if (!qp->qp_sec)
return -ENOMEM;
@@ -441,6 +450,10 @@ int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
{
+ /* Return if not IB */
+ if (!sec)
+ return;
+
mutex_lock(&sec->mutex);
/* Remove the QP from the lists so it won't get added to
@@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
int ret;
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is in progress this
* QP security could be marked for an error state
* transition. Wait for this to complete.
@@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec)
{
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is occurring we must
* wait until this QP security structure is processed
* in the QP to error flow before destroying it because
@@ -557,7 +578,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
{
int ret = 0;
struct ib_ports_pkeys *tmp_pps;
- struct ib_ports_pkeys *new_pps;
+ struct ib_ports_pkeys *new_pps = NULL;
struct ib_qp *real_qp = qp->real_qp;
bool special_qp = (real_qp->qp_type == IB_QPT_SMI ||
real_qp->qp_type == IB_QPT_GSI ||
@@ -565,18 +586,27 @@ int ib_security_modify_qp(struct ib_qp *qp,
bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
(qp_attr_mask & IB_QP_ALT_PATH));
+ WARN_ONCE((qp_attr_mask & IB_QP_PORT &&
+ rdma_protocol_ib(real_qp->device, qp_attr->port_num) &&
+ !real_qp->qp_sec),
+ "%s: QP security is not initialized for IB QP: %d\n",
+ __func__, real_qp->qp_num);
+
/* The port/pkey settings are maintained only for the real QP. Open
* handles on the real QP will be in the shared_qp_list. When
* enforcing security on the real QP all the shared QPs will be
* checked as well.
*/
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qp && real_qp->qp_sec) {
mutex_lock(&real_qp->qp_sec->mutex);
new_pps = get_new_pps(real_qp,
qp_attr,
qp_attr_mask);
-
+ if (!new_pps) {
+ mutex_unlock(&real_qp->qp_sec->mutex);
+ return -ENOMEM;
+ }
/* Add this QP to the lists for the new port
* and pkey settings before checking for permission
* in case there is a concurrent cache update
@@ -600,7 +630,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
qp_attr_mask,
udata);
- if (pps_change && !special_qp) {
+ if (new_pps) {
/* Clean up the lists and free the appropriate
* ports_pkeys structure.
*/
@@ -630,6 +660,9 @@ static int ib_security_pkey_access(struct ib_device *dev,
u16 pkey;
int ret;
+ if (!rdma_protocol_ib(dev, port_num))
+ return 0;
+
ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey);
if (ret)
return ret;
@@ -663,6 +696,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
{
int ret;
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return 0;
+
ret = security_ib_alloc_security(&agent->security);
if (ret)
return ret;
@@ -688,6 +724,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
{
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return;
+
security_ib_free_security(agent->security);
if (agent->lsm_nb_reg)
unregister_lsm_notifier(&agent->lsm_nb);
@@ -695,6 +734,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index)
{
+ if (!rdma_protocol_ib(map->agent.device, map->agent.port_num))
+ return 0;
+
if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
return -EACCES;
--
1.8.3.1
If scsi_eh_scmd_add() is called concurrently with
scsi_host_queue_ready() while shost->host_blocked > 0 then it can
happen that neither function wakes up the SCSI error handler. Fix
this by making every function that decreases the host_busy counter
wake up the error handler if necessary and by protecting the
host_failed checks with the SCSI host lock.
Reported-by: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Fixes: commit 746650160866 ("scsi: convert host_busy to atomic_t")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Konstantin Khorenko <khorenko(a)virtuozzo.com>
Cc: Stuart Hayes <stuart.w.hayes(a)gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: Johannes Thumshirn <jthumshirn(a)suse.de>
Cc: <stable(a)vger.kernel.org>
---
drivers/scsi/scsi_error.c | 8 +++++++-
drivers/scsi/scsi_lib.c | 39 ++++++++++++++++++++++++++++-----------
2 files changed, 35 insertions(+), 12 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 5e89049e9b4e..b22a9a23c74c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -233,19 +233,25 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
{
struct Scsi_Host *shost = scmd->device->host;
+ enum scsi_host_state shost_state;
unsigned long flags;
int ret;
WARN_ON_ONCE(!shost->ehandler);
spin_lock_irqsave(shost->host_lock, flags);
+ shost_state = shost->shost_state;
if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
WARN_ON_ONCE(ret);
}
if (shost->eh_deadline != -1 && !shost->last_reset)
shost->last_reset = jiffies;
-
+ if (shost_state != shost->shost_state) {
+ spin_unlock_irqrestore(shost->host_lock, flags);
+ synchronize_rcu();
+ spin_lock_irqsave(shost->host_lock, flags);
+ }
scsi_eh_reset(scmd);
list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
shost->host_failed++;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b6d3842b6809..7d18fb245d7d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,22 +318,39 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
cmd->cmd_len = scsi_command_size(cmd->cmnd);
}
-void scsi_device_unbusy(struct scsi_device *sdev)
+/*
+ * Decrement the host_busy counter and wake up the error handler if necessary.
+ * Avoid as follows that the error handler is not woken up if shost->host_busy
+ * == shost->host_failed: use synchronize_rcu() in scsi_eh_scmd_add() in
+ * combination with an RCU read lock in this function to ensure that this
+ * function in its entirety either finishes before scsi_eh_scmd_add()
+ * increases the host_failed counter or that it notices the shost state change
+ * made by scsi_eh_scmd_add().
+ */
+static void scsi_dec_host_busy(struct Scsi_Host *shost)
{
- struct Scsi_Host *shost = sdev->host;
- struct scsi_target *starget = scsi_target(sdev);
unsigned long flags;
+ rcu_read_lock();
atomic_dec(&shost->host_busy);
- if (starget->can_queue > 0)
- atomic_dec(&starget->target_busy);
-
- if (unlikely(scsi_host_in_recovery(shost) &&
- (shost->host_failed || shost->host_eh_scheduled))) {
+ if (unlikely(scsi_host_in_recovery(shost))) {
spin_lock_irqsave(shost->host_lock, flags);
- scsi_eh_wakeup(shost);
+ if (shost->host_failed || shost->host_eh_scheduled)
+ scsi_eh_wakeup(shost);
spin_unlock_irqrestore(shost->host_lock, flags);
}
+ rcu_read_unlock();
+}
+
+void scsi_device_unbusy(struct scsi_device *sdev)
+{
+ struct Scsi_Host *shost = sdev->host;
+ struct scsi_target *starget = scsi_target(sdev);
+
+ scsi_dec_host_busy(shost);
+
+ if (starget->can_queue > 0)
+ atomic_dec(&starget->target_busy);
atomic_dec(&sdev->device_busy);
}
@@ -1531,7 +1548,7 @@ static inline int scsi_host_queue_ready(struct request_queue *q,
list_add_tail(&sdev->starved_entry, &shost->starved_list);
spin_unlock_irq(shost->host_lock);
out_dec:
- atomic_dec(&shost->host_busy);
+ scsi_dec_host_busy(shost);
return 0;
}
@@ -2017,7 +2034,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_STS_OK;
out_dec_host_busy:
- atomic_dec(&shost->host_busy);
+ scsi_dec_host_busy(shost);
out_dec_target_busy:
if (scsi_target(sdev)->can_queue > 0)
atomic_dec(&scsi_target(sdev)->target_busy);
--
2.15.0
This is a note to let you know that I've just added the patch titled
x86/efi-bgrt: Replace early_memremap() with memremap()
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-efi-bgrt-replace-early_memremap-with-memremap.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From e2c90dd7e11e3025b46719a79fb4bb1e7a5cef9f Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt(a)codeblueprint.co.uk>
Date: Mon, 21 Dec 2015 14:12:52 +0000
Subject: x86/efi-bgrt: Replace early_memremap() with memremap()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Matt Fleming <matt(a)codeblueprint.co.uk>
commit e2c90dd7e11e3025b46719a79fb4bb1e7a5cef9f upstream.
Môshe reported the following warning triggered on his machine since
commit 50a0cb565246 ("x86/efi-bgrt: Fix kernel panic when mapping BGRT
data"),
[ 0.026936] ------------[ cut here ]------------
[ 0.026941] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:137 __early_ioremap+0x102/0x1bb()
[ 0.026941] Modules linked in:
[ 0.026944] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1 #2
[ 0.026945] Hardware name: Dell Inc. XPS 13 9343/09K8G1, BIOS A05 07/14/2015
[ 0.026946] 0000000000000000 900f03d5a116524d ffffffff81c03e60 ffffffff813a3fff
[ 0.026948] 0000000000000000 ffffffff81c03e98 ffffffff810a0852 00000000d7b76000
[ 0.026949] 0000000000000000 0000000000000001 0000000000000001 000000000000017c
[ 0.026951] Call Trace:
[ 0.026955] [<ffffffff813a3fff>] dump_stack+0x44/0x55
[ 0.026958] [<ffffffff810a0852>] warn_slowpath_common+0x82/0xc0
[ 0.026959] [<ffffffff810a099a>] warn_slowpath_null+0x1a/0x20
[ 0.026961] [<ffffffff81d8c395>] __early_ioremap+0x102/0x1bb
[ 0.026962] [<ffffffff81d8c602>] early_memremap+0x13/0x15
[ 0.026964] [<ffffffff81d78361>] efi_bgrt_init+0x162/0x1ad
[ 0.026966] [<ffffffff81d778ec>] efi_late_init+0x9/0xb
[ 0.026968] [<ffffffff81d58ff5>] start_kernel+0x46f/0x49f
[ 0.026970] [<ffffffff81d58120>] ? early_idt_handler_array+0x120/0x120
[ 0.026972] [<ffffffff81d58339>] x86_64_start_reservations+0x2a/0x2c
[ 0.026974] [<ffffffff81d58485>] x86_64_start_kernel+0x14a/0x16d
[ 0.026977] ---[ end trace f9b3812eb8e24c58 ]---
[ 0.026978] efi_bgrt: Ignoring BGRT: failed to map image memory
early_memremap() has an upper limit on the size of mapping it can
handle which is ~200KB. Clearly the BGRT image on Môshe's machine is
much larger than that.
There's actually no reason to restrict ourselves to using the early_*
version of memremap() - the ACPI BGRT driver is invoked late enough in
boot that we can use the standard version, with the benefit that the
late version allows mappings of arbitrary size.
Reported-by: Môshe van der Sterre <me(a)moshe.nl>
Tested-by: Môshe van der Sterre <me(a)moshe.nl>
Signed-off-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Link: http://lkml.kernel.org/r/1450707172-12561-1-git-send-email-matt@codebluepri…
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Ghannam, Yazen" <Yazen.Ghannam(a)amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/platform/efi/efi-bgrt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/platform/efi/efi-bgrt.c
+++ b/arch/x86/platform/efi/efi-bgrt.c
@@ -69,14 +69,14 @@ void __init efi_bgrt_init(void)
return;
}
- image = early_memremap(bgrt_tab->image_address, sizeof(bmp_header));
+ image = memremap(bgrt_tab->image_address, sizeof(bmp_header), MEMREMAP_WB);
if (!image) {
pr_err("Ignoring BGRT: failed to map image header memory\n");
return;
}
memcpy(&bmp_header, image, sizeof(bmp_header));
- early_memunmap(image, sizeof(bmp_header));
+ memunmap(image);
bgrt_image_size = bmp_header.size;
bgrt_image = kmalloc(bgrt_image_size, GFP_KERNEL | __GFP_NOWARN);
@@ -86,7 +86,7 @@ void __init efi_bgrt_init(void)
return;
}
- image = early_memremap(bgrt_tab->image_address, bmp_header.size);
+ image = memremap(bgrt_tab->image_address, bmp_header.size, MEMREMAP_WB);
if (!image) {
pr_err("Ignoring BGRT: failed to map image memory\n");
kfree(bgrt_image);
@@ -95,5 +95,5 @@ void __init efi_bgrt_init(void)
}
memcpy(bgrt_image, image, bgrt_image_size);
- early_memunmap(image, bmp_header.size);
+ memunmap(image);
}
Patches currently in stable-queue which might be from matt(a)codeblueprint.co.uk are
queue-4.4/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-numbers.patch
queue-4.4/x86-efi-hoist-page-table-switching-code-into-efi_call_virt.patch
queue-4.4/x86-efi-build-our-own-page-table-structures.patch
queue-4.4/x86-efi-bgrt-replace-early_memremap-with-memremap.patch
queue-4.4/x86-efi-bgrt-fix-kernel-panic-when-mapping-bgrt-data.patch
This is a note to let you know that I've just added the patch titled
ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-dts-omap3-logicpd-torpedo-37xx-devkit-fix-mmc1-cd-gpio.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b7ace5ed8867ca54503727988adec6b20af54eeb Mon Sep 17 00:00:00 2001
From: Adam Ford <aford173(a)gmail.com>
Date: Thu, 17 Aug 2017 06:01:28 -0500
Subject: ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
From: Adam Ford <aford173(a)gmail.com>
commit b7ace5ed8867ca54503727988adec6b20af54eeb upstream.
Fixes commit 687c27676151 ("ARM: dts: Add minimal support for LogicPD
Torpedo DM3730 devkit")
This patch corrects an issue where the cd-gpios was improperly setup
using IRQ_TYPE_LEVEL_LOW instead of GPIO_ACTIVE_LOW.
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
+++ b/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
@@ -88,7 +88,7 @@
interrupts-extended = <&intc 83 &omap3_pmx_core 0x11a>;
pinctrl-names = "default";
pinctrl-0 = <&mmc1_pins &mmc1_cd>;
- cd-gpios = <&gpio4 31 IRQ_TYPE_LEVEL_LOW>; /* gpio127 */
+ cd-gpios = <&gpio4 31 GPIO_ACTIVE_LOW>; /* gpio127 */
vmmc-supply = <&vmmc1>;
bus-width = <4>;
cap-power-off-card;
Patches currently in stable-queue which might be from aford173(a)gmail.com are
queue-4.4/arm-dts-omap3-logicpd-torpedo-37xx-devkit-fix-mmc1-cd-gpio.patch
This is a note to let you know that I've just added the patch titled
ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-dts-omap3-logicpd-torpedo-37xx-devkit-fix-mmc1-cd-gpio.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From b7ace5ed8867ca54503727988adec6b20af54eeb Mon Sep 17 00:00:00 2001
From: Adam Ford <aford173(a)gmail.com>
Date: Thu, 17 Aug 2017 06:01:28 -0500
Subject: ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
From: Adam Ford <aford173(a)gmail.com>
commit b7ace5ed8867ca54503727988adec6b20af54eeb upstream.
Fixes commit 687c27676151 ("ARM: dts: Add minimal support for LogicPD
Torpedo DM3730 devkit")
This patch corrects an issue where the cd-gpios was improperly setup
using IRQ_TYPE_LEVEL_LOW instead of GPIO_ACTIVE_LOW.
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
+++ b/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
@@ -192,7 +192,7 @@
interrupts-extended = <&intc 83 &omap3_pmx_core 0x11a>;
pinctrl-names = "default";
pinctrl-0 = <&mmc1_pins &mmc1_cd>;
- cd-gpios = <&gpio4 31 IRQ_TYPE_LEVEL_LOW>; /* gpio127 */
+ cd-gpios = <&gpio4 31 GPIO_ACTIVE_LOW>; /* gpio127 */
vmmc-supply = <&vmmc1>;
bus-width = <4>;
cap-power-off-card;
Patches currently in stable-queue which might be from aford173(a)gmail.com are
queue-4.9/arm-dts-omap3-logicpd-torpedo-37xx-devkit-fix-mmc1-cd-gpio.patch
queue-4.9/arm-dts-logicpd-torpedo-fix-camera-pin-mux.patch
This is a note to let you know that I've just added the patch titled
ARM: dts: LogicPD Torpedo: Fix camera pin mux
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
arm-dts-logicpd-torpedo-fix-camera-pin-mux.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 56322e123235370f1449c7444e311cce857d12f5 Mon Sep 17 00:00:00 2001
From: Adam Ford <aford173(a)gmail.com>
Date: Thu, 11 May 2017 12:21:19 -0500
Subject: ARM: dts: LogicPD Torpedo: Fix camera pin mux
From: Adam Ford <aford173(a)gmail.com>
commit 56322e123235370f1449c7444e311cce857d12f5 upstream.
Fix commit 05c4ffc3a266 ("ARM: dts: LogicPD Torpedo: Add MT9P031 Support")
In the previous commit, I indicated that the only testing was done by
showing the camera showed up when probing. This patch fixes an incorrect
pin muxing on cam_d0, cam_d1 and cam_d2.
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
+++ b/arch/arm/boot/dts/logicpd-torpedo-37xx-devkit.dts
@@ -249,9 +249,9 @@
OMAP3_CORE1_IOPAD(0x2110, PIN_INPUT | MUX_MODE0) /* cam_xclka.cam_xclka */
OMAP3_CORE1_IOPAD(0x2112, PIN_INPUT | MUX_MODE0) /* cam_pclk.cam_pclk */
- OMAP3_CORE1_IOPAD(0x2114, PIN_INPUT | MUX_MODE0) /* cam_d0.cam_d0 */
- OMAP3_CORE1_IOPAD(0x2116, PIN_INPUT | MUX_MODE0) /* cam_d1.cam_d1 */
- OMAP3_CORE1_IOPAD(0x2118, PIN_INPUT | MUX_MODE0) /* cam_d2.cam_d2 */
+ OMAP3_CORE1_IOPAD(0x2116, PIN_INPUT | MUX_MODE0) /* cam_d0.cam_d0 */
+ OMAP3_CORE1_IOPAD(0x2118, PIN_INPUT | MUX_MODE0) /* cam_d1.cam_d1 */
+ OMAP3_CORE1_IOPAD(0x211a, PIN_INPUT | MUX_MODE0) /* cam_d2.cam_d2 */
OMAP3_CORE1_IOPAD(0x211c, PIN_INPUT | MUX_MODE0) /* cam_d3.cam_d3 */
OMAP3_CORE1_IOPAD(0x211e, PIN_INPUT | MUX_MODE0) /* cam_d4.cam_d4 */
OMAP3_CORE1_IOPAD(0x2120, PIN_INPUT | MUX_MODE0) /* cam_d5.cam_d5 */
Patches currently in stable-queue which might be from aford173(a)gmail.com are
queue-4.9/arm-dts-omap3-logicpd-torpedo-37xx-devkit-fix-mmc1-cd-gpio.patch
queue-4.9/arm-dts-logicpd-torpedo-fix-camera-pin-mux.patch
Commit b7ace5ed8867 ("ARM: dts: omap3: logicpd-torpedo-37xx-devkit:
Fix MMC1 cd-gpio") fixes the card detect gpio detection.
Can this patch please be applied to linux-4.4.y and linux-4.9.y?
thank you
adam
This is a note to let you know that I've just added the patch titled
x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-numbers.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From edc3b9129cecd0f0857112136f5b8b1bc1d45918 Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt(a)codeblueprint.co.uk>
Date: Fri, 27 Nov 2015 21:09:31 +0000
Subject: x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
From: Matt Fleming <matt(a)codeblueprint.co.uk>
commit edc3b9129cecd0f0857112136f5b8b1bc1d45918 upstream.
The x86 pageattr code is confused about the data that is stored
in cpa->pfn, sometimes it's treated as a page frame number,
sometimes it's treated as an unshifted physical address, and in
one place it's treated as a pte.
The result of this is that the mapping functions do not map the
intended physical address.
This isn't a problem in practice because most of the addresses
we're mapping in the EFI code paths are already mapped in
'trampoline_pgd' and so the pageattr mapping functions don't
actually do anything in this case. But when we move to using a
separate page table for the EFI runtime this will be an issue.
Signed-off-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Borislav Petkov <bp(a)suse.de>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Toshi Kani <toshi.kani(a)hp.com>
Cc: linux-efi(a)vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codebluepri…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: "Ghannam, Yazen" <Yazen.Ghannam(a)amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/mm/pageattr.c | 17 ++++++-----------
arch/x86/platform/efi/efi_64.c | 16 ++++++++++------
2 files changed, 16 insertions(+), 17 deletions(-)
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -911,15 +911,10 @@ static void populate_pte(struct cpa_data
pte = pte_offset_kernel(pmd, start);
while (num_pages-- && start < end) {
-
- /* deal with the NX bit */
- if (!(pgprot_val(pgprot) & _PAGE_NX))
- cpa->pfn &= ~_PAGE_NX;
-
- set_pte(pte, pfn_pte(cpa->pfn >> PAGE_SHIFT, pgprot));
+ set_pte(pte, pfn_pte(cpa->pfn, pgprot));
start += PAGE_SIZE;
- cpa->pfn += PAGE_SIZE;
+ cpa->pfn++;
pte++;
}
}
@@ -975,11 +970,11 @@ static int populate_pmd(struct cpa_data
pmd = pmd_offset(pud, start);
- set_pmd(pmd, __pmd(cpa->pfn | _PAGE_PSE |
+ set_pmd(pmd, __pmd(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
massage_pgprot(pmd_pgprot)));
start += PMD_SIZE;
- cpa->pfn += PMD_SIZE;
+ cpa->pfn += PMD_SIZE >> PAGE_SHIFT;
cur_pages += PMD_SIZE >> PAGE_SHIFT;
}
@@ -1048,11 +1043,11 @@ static int populate_pud(struct cpa_data
* Map everything starting from the Gb boundary, possibly with 1G pages
*/
while (end - start >= PUD_SIZE) {
- set_pud(pud, __pud(cpa->pfn | _PAGE_PSE |
+ set_pud(pud, __pud(cpa->pfn << PAGE_SHIFT | _PAGE_PSE |
massage_pgprot(pud_pgprot)));
start += PUD_SIZE;
- cpa->pfn += PUD_SIZE;
+ cpa->pfn += PUD_SIZE >> PAGE_SHIFT;
cur_pages += PUD_SIZE >> PAGE_SHIFT;
pud++;
}
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -143,7 +143,7 @@ void efi_sync_low_kernel_mappings(void)
int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
{
- unsigned long text;
+ unsigned long pfn, text;
struct page *page;
unsigned npages;
pgd_t *pgd;
@@ -160,7 +160,8 @@ int __init efi_setup_page_tables(unsigne
* and ident-map those pages containing the map before calling
* phys_efi_set_virtual_address_map().
*/
- if (kernel_map_pages_in_pgd(pgd, pa_memmap, pa_memmap, num_pages, _PAGE_NX)) {
+ pfn = pa_memmap >> PAGE_SHIFT;
+ if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX)) {
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -185,8 +186,9 @@ int __init efi_setup_page_tables(unsigne
npages = (_end - _text) >> PAGE_SHIFT;
text = __pa(_text);
+ pfn = text >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, text >> PAGE_SHIFT, text, npages, 0)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, 0)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}
@@ -204,12 +206,14 @@ void __init efi_cleanup_page_tables(unsi
static void __init __map_region(efi_memory_desc_t *md, u64 va)
{
pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
- unsigned long pf = 0;
+ unsigned long flags = 0;
+ unsigned long pfn;
if (!(md->attribute & EFI_MEMORY_WB))
- pf |= _PAGE_PCD;
+ flags |= _PAGE_PCD;
- if (kernel_map_pages_in_pgd(pgd, md->phys_addr, va, md->num_pages, pf))
+ pfn = md->phys_addr >> PAGE_SHIFT;
+ if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
md->phys_addr, va);
}
Patches currently in stable-queue which might be from matt(a)codeblueprint.co.uk are
queue-4.4/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-numbers.patch
queue-4.4/x86-efi-hoist-page-table-switching-code-into-efi_call_virt.patch
queue-4.4/x86-efi-build-our-own-page-table-structures.patch
This is a note to let you know that I've just added the patch titled
x86/efi: Build our own page table structures
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
x86-efi-build-our-own-page-table-structures.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 67a9108ed4313b85a9c53406d80dc1ae3f8c3e36 Mon Sep 17 00:00:00 2001
From: Matt Fleming <matt(a)codeblueprint.co.uk>
Date: Fri, 27 Nov 2015 21:09:34 +0000
Subject: x86/efi: Build our own page table structures
From: Matt Fleming <matt(a)codeblueprint.co.uk>
commit 67a9108ed4313b85a9c53406d80dc1ae3f8c3e36 upstream.
With commit e1a58320a38d ("x86/mm: Warn on W^X mappings") all
users booting on 64-bit UEFI machines see the following warning,
------------[ cut here ]------------
WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
...
x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
...
This is caused by mapping EFI regions with RWX permissions.
There isn't much we can do to restrict the permissions for these
regions due to the way the firmware toolchains mix code and
data, but we can at least isolate these mappings so that they do
not appear in the regular kernel page tables.
In commit d2f7cbe7b26a ("x86/efi: Runtime services virtual
mapping") we started using 'trampoline_pgd' to map the EFI
regions because there was an existing identity mapping there
which we use during the SetVirtualAddressMap() call and for
broken firmware that accesses those addresses.
But 'trampoline_pgd' shares some PGD entries with
'swapper_pg_dir' and does not provide the isolation we require.
Notably the virtual address for __START_KERNEL_map and
MODULES_START are mapped by the same PGD entry so we need to be
more careful when copying changes over in
efi_sync_low_kernel_mappings().
This patch doesn't go the full mile, we still want to share some
PGD entries with 'swapper_pg_dir'. Having completely separate
page tables brings its own issues such as synchronising new
mappings after memory hotplug and module loading. Sharing also
keeps memory usage down.
Signed-off-by: Matt Fleming <matt(a)codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp(a)suse.de>
Acked-by: Borislav Petkov <bp(a)suse.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Dave Jones <davej(a)codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk(a)redhat.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya(a)intel.com>
Cc: Stephen Smalley <sds(a)tycho.nsa.gov>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Toshi Kani <toshi.kani(a)hp.com>
Cc: linux-efi(a)vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codebluepri…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: "Ghannam, Yazen" <Yazen.Ghannam(a)amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
arch/x86/include/asm/efi.h | 1
arch/x86/platform/efi/efi.c | 39 +++++-----------
arch/x86/platform/efi/efi_32.c | 5 ++
arch/x86/platform/efi/efi_64.c | 97 ++++++++++++++++++++++++++++++++++-------
4 files changed, 102 insertions(+), 40 deletions(-)
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -136,6 +136,7 @@ extern void __init efi_memory_uc(u64 add
extern void __init efi_map_region(efi_memory_desc_t *md);
extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
extern void efi_sync_low_kernel_mappings(void);
+extern int __init efi_alloc_page_tables(void);
extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
extern void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages);
extern void __init old_map_region(efi_memory_desc_t *md);
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -869,7 +869,7 @@ static void __init kexec_enter_virtual_m
* This function will switch the EFI runtime services to virtual mode.
* Essentially, we look through the EFI memmap and map every region that
* has the runtime attribute bit set in its memory descriptor into the
- * ->trampoline_pgd page table using a top-down VA allocation scheme.
+ * efi_pgd page table.
*
* The old method which used to update that memory descriptor with the
* virtual address obtained from ioremap() is still supported when the
@@ -879,8 +879,8 @@ static void __init kexec_enter_virtual_m
*
* The new method does a pagetable switch in a preemption-safe manner
* so that we're in a different address space when calling a runtime
- * function. For function arguments passing we do copy the PGDs of the
- * kernel page table into ->trampoline_pgd prior to each call.
+ * function. For function arguments passing we do copy the PUDs of the
+ * kernel page table into efi_pgd prior to each call.
*
* Specially for kexec boot, efi runtime maps in previous kernel should
* be passed in via setup_data. In that case runtime ranges will be mapped
@@ -895,6 +895,12 @@ static void __init __efi_enter_virtual_m
efi.systab = NULL;
+ if (efi_alloc_page_tables()) {
+ pr_err("Failed to allocate EFI page tables\n");
+ clear_bit(EFI_RUNTIME_SERVICES, &efi.flags);
+ return;
+ }
+
efi_merge_regions();
new_memmap = efi_map_regions(&count, &pg_shift);
if (!new_memmap) {
@@ -954,28 +960,11 @@ static void __init __efi_enter_virtual_m
efi_runtime_mkexec();
/*
- * We mapped the descriptor array into the EFI pagetable above but we're
- * not unmapping it here. Here's why:
- *
- * We're copying select PGDs from the kernel page table to the EFI page
- * table and when we do so and make changes to those PGDs like unmapping
- * stuff from them, those changes appear in the kernel page table and we
- * go boom.
- *
- * From setup_real_mode():
- *
- * ...
- * trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
- *
- * In this particular case, our allocation is in PGD 0 of the EFI page
- * table but we've copied that PGD from PGD[272] of the EFI page table:
- *
- * pgd_index(__PAGE_OFFSET = 0xffff880000000000) = 272
- *
- * where the direct memory mapping in kernel space is.
- *
- * new_memmap's VA comes from that direct mapping and thus clearing it,
- * it would get cleared in the kernel page table too.
+ * We mapped the descriptor array into the EFI pagetable above
+ * but we're not unmapping it here because if we're running in
+ * EFI mixed mode we need all of memory to be accessible when
+ * we pass parameters to the EFI runtime services in the
+ * thunking code.
*
* efi_cleanup_page_tables(__pa(new_memmap), 1 << pg_shift);
*/
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -38,6 +38,11 @@
* say 0 - 3G.
*/
+int __init efi_alloc_page_tables(void)
+{
+ return 0;
+}
+
void efi_sync_low_kernel_mappings(void) {}
void __init efi_dump_pagetable(void) {}
int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -40,6 +40,7 @@
#include <asm/fixmap.h>
#include <asm/realmode.h>
#include <asm/time.h>
+#include <asm/pgalloc.h>
/*
* We allocate runtime services regions bottom-up, starting from -4G, i.e.
@@ -121,22 +122,92 @@ void __init efi_call_phys_epilog(pgd_t *
early_code_mapping_set_exec(0);
}
+static pgd_t *efi_pgd;
+
+/*
+ * We need our own copy of the higher levels of the page tables
+ * because we want to avoid inserting EFI region mappings (EFI_VA_END
+ * to EFI_VA_START) into the standard kernel page tables. Everything
+ * else can be shared, see efi_sync_low_kernel_mappings().
+ */
+int __init efi_alloc_page_tables(void)
+{
+ pgd_t *pgd;
+ pud_t *pud;
+ gfp_t gfp_mask;
+
+ if (efi_enabled(EFI_OLD_MEMMAP))
+ return 0;
+
+ gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO;
+ efi_pgd = (pgd_t *)__get_free_page(gfp_mask);
+ if (!efi_pgd)
+ return -ENOMEM;
+
+ pgd = efi_pgd + pgd_index(EFI_VA_END);
+
+ pud = pud_alloc_one(NULL, 0);
+ if (!pud) {
+ free_page((unsigned long)efi_pgd);
+ return -ENOMEM;
+ }
+
+ pgd_populate(NULL, pgd, pud);
+
+ return 0;
+}
+
/*
* Add low kernel mappings for passing arguments to EFI functions.
*/
void efi_sync_low_kernel_mappings(void)
{
- unsigned num_pgds;
- pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
+ unsigned num_entries;
+ pgd_t *pgd_k, *pgd_efi;
+ pud_t *pud_k, *pud_efi;
if (efi_enabled(EFI_OLD_MEMMAP))
return;
- num_pgds = pgd_index(MODULES_END - 1) - pgd_index(PAGE_OFFSET);
+ /*
+ * We can share all PGD entries apart from the one entry that
+ * covers the EFI runtime mapping space.
+ *
+ * Make sure the EFI runtime region mappings are guaranteed to
+ * only span a single PGD entry and that the entry also maps
+ * other important kernel regions.
+ */
+ BUILD_BUG_ON(pgd_index(EFI_VA_END) != pgd_index(MODULES_END));
+ BUILD_BUG_ON((EFI_VA_START & PGDIR_MASK) !=
+ (EFI_VA_END & PGDIR_MASK));
+
+ pgd_efi = efi_pgd + pgd_index(PAGE_OFFSET);
+ pgd_k = pgd_offset_k(PAGE_OFFSET);
+
+ num_entries = pgd_index(EFI_VA_END) - pgd_index(PAGE_OFFSET);
+ memcpy(pgd_efi, pgd_k, sizeof(pgd_t) * num_entries);
+
+ /*
+ * We share all the PUD entries apart from those that map the
+ * EFI regions. Copy around them.
+ */
+ BUILD_BUG_ON((EFI_VA_START & ~PUD_MASK) != 0);
+ BUILD_BUG_ON((EFI_VA_END & ~PUD_MASK) != 0);
+
+ pgd_efi = efi_pgd + pgd_index(EFI_VA_END);
+ pud_efi = pud_offset(pgd_efi, 0);
+
+ pgd_k = pgd_offset_k(EFI_VA_END);
+ pud_k = pud_offset(pgd_k, 0);
+
+ num_entries = pud_index(EFI_VA_END);
+ memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
- memcpy(pgd + pgd_index(PAGE_OFFSET),
- init_mm.pgd + pgd_index(PAGE_OFFSET),
- sizeof(pgd_t) * num_pgds);
+ pud_efi = pud_offset(pgd_efi, EFI_VA_START);
+ pud_k = pud_offset(pgd_k, EFI_VA_START);
+
+ num_entries = PTRS_PER_PUD - pud_index(EFI_VA_START);
+ memcpy(pud_efi, pud_k, sizeof(pud_t) * num_entries);
}
int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
@@ -149,8 +220,8 @@ int __init efi_setup_page_tables(unsigne
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)(unsigned long)real_mode_header->trampoline_pgd;
- pgd = __va(efi_scratch.efi_pgt);
+ efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ pgd = efi_pgd;
/*
* It can happen that the physical address of new_memmap lands in memory
@@ -196,16 +267,14 @@ int __init efi_setup_page_tables(unsigne
void __init efi_cleanup_page_tables(unsigned long pa_memmap, unsigned num_pages)
{
- pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
-
- kernel_unmap_pages_in_pgd(pgd, pa_memmap, num_pages);
+ kernel_unmap_pages_in_pgd(efi_pgd, pa_memmap, num_pages);
}
static void __init __map_region(efi_memory_desc_t *md, u64 va)
{
- pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
unsigned long flags = 0;
unsigned long pfn;
+ pgd_t *pgd = efi_pgd;
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
@@ -314,9 +383,7 @@ void __init efi_runtime_mkexec(void)
void __init efi_dump_pagetable(void)
{
#ifdef CONFIG_EFI_PGT_DUMP
- pgd_t *pgd = (pgd_t *)__va(real_mode_header->trampoline_pgd);
-
- ptdump_walk_pgd_level(NULL, pgd);
+ ptdump_walk_pgd_level(NULL, efi_pgd);
#endif
}
Patches currently in stable-queue which might be from matt(a)codeblueprint.co.uk are
queue-4.4/x86-mm-pat-ensure-cpa-pfn-only-contains-page-frame-numbers.patch
queue-4.4/x86-efi-hoist-page-table-switching-code-into-efi_call_virt.patch
queue-4.4/x86-efi-build-our-own-page-table-structures.patch
Commit 56322e123235 ("ARM: dts: LogicPD Torpedo: Fix camera pin mux")
fixes some pin muxing issues which causes distortion in the camera.
Can this patch please be ported to linux-4.9.y?
adam
Hi stable/arm/Willy,
1f65c13efef69b6dc908e588f91a133641d8475c is an important commit,
because it involves evaluation of pointers from userspace. I'm running
into issues with RNDADDTOENTCNT reading bogus values, because p is
incremented twice as much as it should in this random.c block:
case RNDADDENTROPY:
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
if (get_user(ent_count, p++))
return -EFAULT;
if (ent_count < 0)
return -EINVAL;
if (get_user(size, p++))
return -EFAULT;
retval = write_pool(&input_pool, (const char __user *)p,
size);
That seems reasonable, but on aarch64, get_user is defined as:
#define get_user(x, ptr) \
({ \
might_sleep(); \
access_ok(VERIFY_READ, (ptr), sizeof(*(ptr))) ? \
__get_user((x), (ptr)) : \
((x) = 0, -EFAULT); \
})
Notice the multiple use of ptr.
I thought I had found something breathtakingly bad, until I realized
that it was already fixed in 2013 by Takahiro. It just wasn't marked
for stable.
Not sure if there's ever going to be another stable 3.10 release, but
if so, this would be an important one to backport.
Regards,
Jason
4.9.65-rt57-rc1 stable review patch.
If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
This reverts commit "fs: jbd2: pull your plug when waiting for space".
This was a duct-tape fix which shouldn't be needed since commit
"locking/rt-mutex: fix deadlock in device mapper / block-IO".
Cc: stable(a)vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
fs/jbd2/checkpoint.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 6e18a06aaabe..684996c8a3a4 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -116,8 +116,6 @@ void __jbd2_log_wait_for_space(journal_t *journal)
nblocks = jbd2_space_needed(journal);
while (jbd2_log_space_left(journal) < nblocks) {
write_unlock(&journal->j_state_lock);
- if (current->plug)
- io_schedule();
mutex_lock(&journal->j_checkpoint_mutex);
/*
--
2.13.2
4.9.65-rt57-rc1 stable review patch.
If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
The commit "memcontrol: Prevent scheduling while atomic in cgroup code"
fixed this issue:
refill_stock()
get_cpu_var()
drain_stock()
res_counter_uncharge()
res_counter_uncharge_until()
spin_lock() <== boom
But commit 3e32cb2e0a12b ("mm: memcontrol: lockless page counters") replaced
the calls to res_counter_uncharge() in drain_stock() to the lockless
function page_counter_uncharge(). There is no more spin lock there and no
more reason to have that local lock.
Cc: <stable(a)vger.kernel.org>
Reported-by: Haiyang HY1 Tan <tanhy1(a)lenovo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
[bigeasy: That upstream commit appeared in v3.19 and the patch in
question in v3.18.7-rt2 and v3.18 seems still to be maintained. So I
guess that v3.18 would need the locallocks that we are about to remove
here. I am not sure if any earlier versions have the patch
backported.
The stable tag here is because Haiyang reported (and debugged) a crash
in 4.4-RT with this patch applied (which has get_cpu_light() instead
the locallocks it gained in v4.9-RT).
https://lkml.kernel.org/r/05AA4EC5C6EC1D48BE2CDCFF3AE0B8A637F78A15@CNMAILEX…
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
mm/memcontrol.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 12b94909ba7b..c04403033aec 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1698,7 +1698,6 @@ struct memcg_stock_pcp {
#define FLUSHING_CACHED_CHARGE 0
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static DEFINE_LOCAL_IRQ_LOCK(memcg_stock_ll);
static DEFINE_MUTEX(percpu_charge_mutex);
/**
@@ -1721,7 +1720,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
if (nr_pages > CHARGE_BATCH)
return ret;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -1729,7 +1728,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
ret = true;
}
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
return ret;
}
@@ -1756,13 +1755,13 @@ static void drain_local_stock(struct work_struct *dummy)
struct memcg_stock_pcp *stock;
unsigned long flags;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
}
/*
@@ -1774,7 +1773,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
struct memcg_stock_pcp *stock;
unsigned long flags;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached != memcg) { /* reset if necessary */
@@ -1783,7 +1782,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
}
stock->nr_pages += nr_pages;
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
}
/*
--
2.13.2
4.9.65-rt57-rc1 stable review patch.
If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
This reverts commit "fs: jbd2: pull your plug when waiting for space".
This was a duct-tape fix which shouldn't be needed since commit
"locking/rt-mutex: fix deadlock in device mapper / block-IO".
Cc: stable(a)vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
fs/jbd2/checkpoint.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 6e18a06aaabe..684996c8a3a4 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -116,8 +116,6 @@ void __jbd2_log_wait_for_space(journal_t *journal)
nblocks = jbd2_space_needed(journal);
while (jbd2_log_space_left(journal) < nblocks) {
write_unlock(&journal->j_state_lock);
- if (current->plug)
- io_schedule();
mutex_lock(&journal->j_checkpoint_mutex);
/*
--
2.13.2
4.9.65-rt57-rc1 stable review patch.
If anyone has any objections, please let me know.
------------------
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
The commit "memcontrol: Prevent scheduling while atomic in cgroup code"
fixed this issue:
refill_stock()
get_cpu_var()
drain_stock()
res_counter_uncharge()
res_counter_uncharge_until()
spin_lock() <== boom
But commit 3e32cb2e0a12b ("mm: memcontrol: lockless page counters") replaced
the calls to res_counter_uncharge() in drain_stock() to the lockless
function page_counter_uncharge(). There is no more spin lock there and no
more reason to have that local lock.
Cc: <stable(a)vger.kernel.org>
Reported-by: Haiyang HY1 Tan <tanhy1(a)lenovo.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
[bigeasy: That upstream commit appeared in v3.19 and the patch in
question in v3.18.7-rt2 and v3.18 seems still to be maintained. So I
guess that v3.18 would need the locallocks that we are about to remove
here. I am not sure if any earlier versions have the patch
backported.
The stable tag here is because Haiyang reported (and debugged) a crash
in 4.4-RT with this patch applied (which has get_cpu_light() instead
the locallocks it gained in v4.9-RT).
https://lkml.kernel.org/r/05AA4EC5C6EC1D48BE2CDCFF3AE0B8A637F78A15@CNMAILEX…
]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
mm/memcontrol.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 12b94909ba7b..c04403033aec 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1698,7 +1698,6 @@ struct memcg_stock_pcp {
#define FLUSHING_CACHED_CHARGE 0
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static DEFINE_LOCAL_IRQ_LOCK(memcg_stock_ll);
static DEFINE_MUTEX(percpu_charge_mutex);
/**
@@ -1721,7 +1720,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
if (nr_pages > CHARGE_BATCH)
return ret;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
@@ -1729,7 +1728,7 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
ret = true;
}
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
return ret;
}
@@ -1756,13 +1755,13 @@ static void drain_local_stock(struct work_struct *dummy)
struct memcg_stock_pcp *stock;
unsigned long flags;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
drain_stock(stock);
clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
}
/*
@@ -1774,7 +1773,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
struct memcg_stock_pcp *stock;
unsigned long flags;
- local_lock_irqsave(memcg_stock_ll, flags);
+ local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached != memcg) { /* reset if necessary */
@@ -1783,7 +1782,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
}
stock->nr_pages += nr_pages;
- local_unlock_irqrestore(memcg_stock_ll, flags);
+ local_irq_restore(flags);
}
/*
--
2.13.2
This is a note to let you know that I've just added the patch titled
usb: xhci: fix panic in xhci_free_virt_devices_depth_first
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 80e457699a8dbdd70f2d26911e46f538645c55fc Mon Sep 17 00:00:00 2001
From: Yu Chen <chenyu56(a)huawei.com>
Date: Fri, 1 Dec 2017 13:41:20 +0200
Subject: usb: xhci: fix panic in xhci_free_virt_devices_depth_first
Check vdev->real_port 0 to avoid panic
[ 9.261347] [<ffffff800884a390>] xhci_free_virt_devices_depth_first+0x58/0x108
[ 9.261352] [<ffffff800884a814>] xhci_mem_cleanup+0x1bc/0x570
[ 9.261355] [<ffffff8008842de8>] xhci_stop+0x140/0x1c8
[ 9.261365] [<ffffff80087ed304>] usb_remove_hcd+0xfc/0x1d0
[ 9.261369] [<ffffff80088551c4>] xhci_plat_remove+0x6c/0xa8
[ 9.261377] [<ffffff80086e928c>] platform_drv_remove+0x2c/0x70
[ 9.261384] [<ffffff80086e6ea0>] __device_release_driver+0x80/0x108
[ 9.261387] [<ffffff80086e7a1c>] device_release_driver+0x2c/0x40
[ 9.261392] [<ffffff80086e5f28>] bus_remove_device+0xe0/0x120
[ 9.261396] [<ffffff80086e2e34>] device_del+0x114/0x210
[ 9.261399] [<ffffff80086e9e00>] platform_device_del+0x30/0xa0
[ 9.261403] [<ffffff8008810bdc>] dwc3_otg_work+0x204/0x488
[ 9.261407] [<ffffff80088133fc>] event_work+0x304/0x5b8
[ 9.261414] [<ffffff80080e31b0>] process_one_work+0x148/0x490
[ 9.261417] [<ffffff80080e3548>] worker_thread+0x50/0x4a0
[ 9.261421] [<ffffff80080e9ea0>] kthread+0xe8/0x100
[ 9.261427] [<ffffff8008083680>] ret_from_fork+0x10/0x50
The problem can occur if xhci_plat_remove() is called shortly after
xhci_plat_probe(). While xhci_free_virt_devices_depth_first been
called before the device has been setup and get real_port initialized.
The problem occurred on Hikey960 and was reproduced by Guenter Roeck
on Kevin with chromeos-4.4.
Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first")
Cc: Guenter Roeck <groeck(a)google.com>
Cc: <stable(a)vger.kernel.org> # v4.10+
Reviewed-by: Guenter Roeck <groeck(a)chromium.org>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Fan Ning <fanning4(a)hisilicon.com>
Signed-off-by: Li Rui <lirui39(a)hisilicon.com>
Signed-off-by: yangdi <yangdi10(a)hisilicon.com>
Signed-off-by: Yu Chen <chenyu56(a)huawei.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-mem.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index e1fba4688509..15f7d422885f 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -934,6 +934,12 @@ void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
if (!vdev)
return;
+ if (vdev->real_port == 0 ||
+ vdev->real_port > HCS_MAX_PORTS(xhci->hcs_params1)) {
+ xhci_dbg(xhci, "Bad vdev->real_port.\n");
+ goto out;
+ }
+
tt_list_head = &(xhci->rh_bw[vdev->real_port - 1].tts);
list_for_each_entry_safe(tt_info, next, tt_list_head, tt_list) {
/* is this a hub device that added a tt_info to the tts list */
@@ -947,6 +953,7 @@ void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id)
}
}
}
+out:
/* we are now at a leaf device */
xhci_debugfs_remove_slot(xhci, slot_id);
xhci_free_virt_device(xhci, slot_id);
--
2.15.1
This is a note to let you know that I've just added the patch titled
xhci: Don't show incorrect WARN message about events for empty rings
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e4ec40ec4b260efcca15089de4285a0a3411259b Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Date: Fri, 1 Dec 2017 13:41:19 +0200
Subject: xhci: Don't show incorrect WARN message about events for empty rings
xHC can generate two events for a short transfer if the short TRB and
last TRB in the TD are not the same TRB.
The driver will handle the TD after the first short event, and remove
it from its internal list. Driver then incorrectly prints a warning
for the second event:
"WARN Event TRB for slot x ep y with no TDs queued"
Fix this by not printing a warning if we get a event on a empty list
if the previous event was a short event.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-ring.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index c239c688076c..6eb87c6e4d24 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2477,12 +2477,16 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
if (list_empty(&ep_ring->td_list)) {
/*
- * A stopped endpoint may generate an extra completion
- * event if the device was suspended. Don't print
- * warnings.
+ * Don't print wanings if it's due to a stopped endpoint
+ * generating an extra completion event if the device
+ * was suspended. Or, a event for the last TRB of a
+ * short TD we already got a short event for.
+ * The short TD is already removed from the TD list.
*/
+
if (!(trb_comp_code == COMP_STOPPED ||
- trb_comp_code == COMP_STOPPED_LENGTH_INVALID)) {
+ trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
+ ep_ring->last_td_was_short)) {
xhci_warn(xhci, "WARN Event TRB for slot %d ep %d with no TDs queued?\n",
TRB_TO_SLOT_ID(le32_to_cpu(event->flags)),
ep_index);
--
2.15.1
This is a note to let you know that I've just added the patch titled
ipsec: Fix aborted xfrm policy dump crash
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipsec-fix-aborted-xfrm-policy-dump-crash.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1137b5e2529a8f5ca8ee709288ecba3e68044df2 Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Thu, 19 Oct 2017 20:51:10 +0800
Subject: ipsec: Fix aborted xfrm policy dump crash
From: Herbert Xu <herbert(a)gondor.apana.org.au>
commit 1137b5e2529a8f5ca8ee709288ecba3e68044df2 upstream.
An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.
The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.
This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.
Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/xfrm/xfrm_user.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1652,32 +1652,34 @@ static int dump_one_policy(struct xfrm_p
static int xfrm_dump_policy_done(struct netlink_callback *cb)
{
- struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *) &cb->args[1];
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
struct net *net = sock_net(cb->skb->sk);
xfrm_policy_walk_done(walk, net);
return 0;
}
+static int xfrm_dump_policy_start(struct netlink_callback *cb)
+{
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
+
+ BUILD_BUG_ON(sizeof(*walk) > sizeof(cb->args));
+
+ xfrm_policy_walk_init(walk, XFRM_POLICY_TYPE_ANY);
+ return 0;
+}
+
static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
- struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *) &cb->args[1];
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
struct xfrm_dump_info info;
- BUILD_BUG_ON(sizeof(struct xfrm_policy_walk) >
- sizeof(cb->args) - sizeof(cb->args[0]));
-
info.in_skb = cb->skb;
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
- if (!cb->args[0]) {
- cb->args[0] = 1;
- xfrm_policy_walk_init(walk, XFRM_POLICY_TYPE_ANY);
- }
-
(void) xfrm_policy_walk(net, walk, dump_one_policy, &info);
return skb->len;
@@ -2415,6 +2417,7 @@ static const struct nla_policy xfrma_spd
static const struct xfrm_link {
int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *, struct netlink_callback *);
int (*done)(struct netlink_callback *);
const struct nla_policy *nla_pol;
@@ -2428,6 +2431,7 @@ static const struct xfrm_link {
[XFRM_MSG_NEWPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_add_policy },
[XFRM_MSG_DELPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy },
[XFRM_MSG_GETPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy,
+ .start = xfrm_dump_policy_start,
.dump = xfrm_dump_policy,
.done = xfrm_dump_policy_done },
[XFRM_MSG_ALLOCSPI - XFRM_MSG_BASE] = { .doit = xfrm_alloc_userspi },
@@ -2479,6 +2483,7 @@ static int xfrm_user_rcv_msg(struct sk_b
{
struct netlink_dump_control c = {
+ .start = link->start,
.dump = link->dump,
.done = link->done,
};
Patches currently in stable-queue which might be from herbert(a)gondor.apana.org.au are
queue-4.4/ipsec-fix-aborted-xfrm-policy-dump-crash.patch
This is a note to let you know that I've just added the patch titled
netlink: add a start callback for starting a netlink dump
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
netlink-add-a-start-callback-for-starting-a-netlink-dump.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06 Mon Sep 17 00:00:00 2001
From: Tom Herbert <tom(a)herbertland.com>
Date: Tue, 15 Dec 2015 15:41:37 -0800
Subject: netlink: add a start callback for starting a netlink dump
From: Tom Herbert <tom(a)herbertland.com>
commit fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06 upstream.
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.
Signed-off-by: Tom Herbert <tom(a)herbertland.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/netlink.h | 2 ++
include/net/genetlink.h | 2 ++
net/netlink/af_netlink.c | 4 ++++
net/netlink/genetlink.c | 16 ++++++++++++++++
4 files changed, 24 insertions(+)
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -131,6 +131,7 @@ netlink_skb_clone(struct sk_buff *skb, g
struct netlink_callback {
struct sk_buff *skb;
const struct nlmsghdr *nlh;
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -153,6 +154,7 @@ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
struct netlink_dump_control {
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -114,6 +114,7 @@ static inline void genl_info_net_set(str
* @flags: flags
* @policy: attribute validation policy
* @doit: standard command callback
+ * @start: start callback for dumps
* @dumpit: callback for dumpers
* @done: completion callback for dumps
* @ops_list: operations list
@@ -122,6 +123,7 @@ struct genl_ops {
const struct nla_policy *policy;
int (*doit)(struct sk_buff *skb,
struct genl_info *info);
+ int (*start)(struct netlink_callback *cb);
int (*dumpit)(struct sk_buff *skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2203,6 +2203,7 @@ int __netlink_dump_start(struct sock *ss
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+ cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2216,6 +2217,9 @@ int __netlink_dump_start(struct sock *ss
mutex_unlock(nlk->cb_mutex);
+ if (cb->start)
+ cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -513,6 +513,20 @@ void *genlmsg_put(struct sk_buff *skb, u
}
EXPORT_SYMBOL(genlmsg_put);
+static int genl_lock_start(struct netlink_callback *cb)
+{
+ /* our ops are always const - netlink API doesn't propagate that */
+ const struct genl_ops *ops = cb->data;
+ int rc = 0;
+
+ if (ops->start) {
+ genl_lock();
+ rc = ops->start(cb);
+ genl_unlock();
+ }
+ return rc;
+}
+
static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
/* our ops are always const - netlink API doesn't propagate that */
@@ -577,6 +591,7 @@ static int genl_family_rcv_msg(struct ge
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+ .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -588,6 +603,7 @@ static int genl_family_rcv_msg(struct ge
} else {
struct netlink_dump_control c = {
.module = family->module,
+ .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
Patches currently in stable-queue which might be from tom(a)herbertland.com are
queue-4.4/netlink-add-a-start-callback-for-starting-a-netlink-dump.patch
This is a note to let you know that I've just added the patch titled
netlink: add a start callback for starting a netlink dump
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
netlink-add-a-start-callback-for-starting-a-netlink-dump.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06 Mon Sep 17 00:00:00 2001
From: Tom Herbert <tom(a)herbertland.com>
Date: Tue, 15 Dec 2015 15:41:37 -0800
Subject: netlink: add a start callback for starting a netlink dump
From: Tom Herbert <tom(a)herbertland.com>
commit fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06 upstream.
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.
Signed-off-by: Tom Herbert <tom(a)herbertland.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/netlink.h | 2 ++
include/net/genetlink.h | 2 ++
net/netlink/af_netlink.c | 4 ++++
net/netlink/genetlink.c | 16 ++++++++++++++++
4 files changed, 24 insertions(+)
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -120,6 +120,7 @@ netlink_skb_clone(struct sk_buff *skb, g
struct netlink_callback {
struct sk_buff *skb;
const struct nlmsghdr *nlh;
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff * skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
@@ -142,6 +143,7 @@ struct nlmsghdr *
__nlmsg_put(struct sk_buff *skb, u32 portid, u32 seq, int type, int len, int flags);
struct netlink_dump_control {
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *skb, struct netlink_callback *);
int (*done)(struct netlink_callback *);
void *data;
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -106,6 +106,7 @@ static inline void genl_info_net_set(str
* @flags: flags
* @policy: attribute validation policy
* @doit: standard command callback
+ * @start: start callback for dumps
* @dumpit: callback for dumpers
* @done: completion callback for dumps
* @ops_list: operations list
@@ -114,6 +115,7 @@ struct genl_ops {
const struct nla_policy *policy;
int (*doit)(struct sk_buff *skb,
struct genl_info *info);
+ int (*start)(struct netlink_callback *cb);
int (*dumpit)(struct sk_buff *skb,
struct netlink_callback *cb);
int (*done)(struct netlink_callback *cb);
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2043,6 +2043,7 @@ int __netlink_dump_start(struct sock *ss
cb = &nlk->cb;
memset(cb, 0, sizeof(*cb));
+ cb->start = control->start;
cb->dump = control->dump;
cb->done = control->done;
cb->nlh = nlh;
@@ -2056,6 +2057,9 @@ int __netlink_dump_start(struct sock *ss
mutex_unlock(nlk->cb_mutex);
+ if (cb->start)
+ cb->start(cb);
+
ret = netlink_dump(sk);
sock_put(sk);
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -507,6 +507,20 @@ void *genlmsg_put(struct sk_buff *skb, u
}
EXPORT_SYMBOL(genlmsg_put);
+static int genl_lock_start(struct netlink_callback *cb)
+{
+ /* our ops are always const - netlink API doesn't propagate that */
+ const struct genl_ops *ops = cb->data;
+ int rc = 0;
+
+ if (ops->start) {
+ genl_lock();
+ rc = ops->start(cb);
+ genl_unlock();
+ }
+ return rc;
+}
+
static int genl_lock_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
{
/* our ops are always const - netlink API doesn't propagate that */
@@ -571,6 +585,7 @@ static int genl_family_rcv_msg(struct ge
.module = family->module,
/* we have const, but the netlink API doesn't */
.data = (void *)ops,
+ .start = genl_lock_start,
.dump = genl_lock_dumpit,
.done = genl_lock_done,
};
@@ -582,6 +597,7 @@ static int genl_family_rcv_msg(struct ge
} else {
struct netlink_dump_control c = {
.module = family->module,
+ .start = ops->start,
.dump = ops->dumpit,
.done = ops->done,
};
Patches currently in stable-queue which might be from tom(a)herbertland.com are
queue-3.18/netlink-add-a-start-callback-for-starting-a-netlink-dump.patch
This is a note to let you know that I've just added the patch titled
ipsec: Fix aborted xfrm policy dump crash
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipsec-fix-aborted-xfrm-policy-dump-crash.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1137b5e2529a8f5ca8ee709288ecba3e68044df2 Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert(a)gondor.apana.org.au>
Date: Thu, 19 Oct 2017 20:51:10 +0800
Subject: ipsec: Fix aborted xfrm policy dump crash
From: Herbert Xu <herbert(a)gondor.apana.org.au>
commit 1137b5e2529a8f5ca8ee709288ecba3e68044df2 upstream.
An independent security researcher, Mohamed Ghannam, has reported
this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
program.
The xfrm_dump_policy_done function expects xfrm_dump_policy to
have been called at least once or it will crash. This can be
triggered if a dump fails because the target socket's receive
buffer is full.
This patch fixes it by using the cb->start mechanism to ensure that
the initialisation is always done regardless of the buffer situation.
Fixes: 12a169e7d8f4 ("ipsec: Put dumpers on the dump list")
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/xfrm/xfrm_user.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1625,32 +1625,34 @@ static int dump_one_policy(struct xfrm_p
static int xfrm_dump_policy_done(struct netlink_callback *cb)
{
- struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *) &cb->args[1];
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
struct net *net = sock_net(cb->skb->sk);
xfrm_policy_walk_done(walk, net);
return 0;
}
+static int xfrm_dump_policy_start(struct netlink_callback *cb)
+{
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
+
+ BUILD_BUG_ON(sizeof(*walk) > sizeof(cb->args));
+
+ xfrm_policy_walk_init(walk, XFRM_POLICY_TYPE_ANY);
+ return 0;
+}
+
static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
- struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *) &cb->args[1];
+ struct xfrm_policy_walk *walk = (struct xfrm_policy_walk *)cb->args;
struct xfrm_dump_info info;
- BUILD_BUG_ON(sizeof(struct xfrm_policy_walk) >
- sizeof(cb->args) - sizeof(cb->args[0]));
-
info.in_skb = cb->skb;
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
- if (!cb->args[0]) {
- cb->args[0] = 1;
- xfrm_policy_walk_init(walk, XFRM_POLICY_TYPE_ANY);
- }
-
(void) xfrm_policy_walk(net, walk, dump_one_policy, &info);
return skb->len;
@@ -2384,6 +2386,7 @@ static const struct nla_policy xfrma_spd
static const struct xfrm_link {
int (*doit)(struct sk_buff *, struct nlmsghdr *, struct nlattr **);
+ int (*start)(struct netlink_callback *);
int (*dump)(struct sk_buff *, struct netlink_callback *);
int (*done)(struct netlink_callback *);
const struct nla_policy *nla_pol;
@@ -2397,6 +2400,7 @@ static const struct xfrm_link {
[XFRM_MSG_NEWPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_add_policy },
[XFRM_MSG_DELPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy },
[XFRM_MSG_GETPOLICY - XFRM_MSG_BASE] = { .doit = xfrm_get_policy,
+ .start = xfrm_dump_policy_start,
.dump = xfrm_dump_policy,
.done = xfrm_dump_policy_done },
[XFRM_MSG_ALLOCSPI - XFRM_MSG_BASE] = { .doit = xfrm_alloc_userspi },
@@ -2443,6 +2447,7 @@ static int xfrm_user_rcv_msg(struct sk_b
{
struct netlink_dump_control c = {
+ .start = link->start,
.dump = link->dump,
.done = link->done,
};
Patches currently in stable-queue which might be from herbert(a)gondor.apana.org.au are
queue-3.18/ipsec-fix-aborted-xfrm-policy-dump-crash.patch
From: Martin Kelly <mkelly(a)xevo.com>
Currently, when you disconnect the device, the driver infinitely
resubmits all URBs, so you see:
Rx URB aborted (-32)
in an infinite loop.
Fix this by catching -EPIPE (what we get in urb->status when the device
disconnects) and not resubmitting.
With this patch, I can plug and unplug many times and the driver
recovers correctly.
Signed-off-by: Martin Kelly <mkelly(a)xevo.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/mcba_usb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/can/usb/mcba_usb.c b/drivers/net/can/usb/mcba_usb.c
index c4355f0a20d5..ef417dcddbf7 100644
--- a/drivers/net/can/usb/mcba_usb.c
+++ b/drivers/net/can/usb/mcba_usb.c
@@ -592,6 +592,7 @@ static void mcba_usb_read_bulk_callback(struct urb *urb)
break;
case -ENOENT:
+ case -EPIPE:
case -ESHUTDOWN:
return;
--
2.15.0
From: Stephane Grosjean <s.grosjean(a)peak-system.com>
PCI/PCIe drivers for PEAK-System CAN/CAN-FD interfaces do some access to the
PCI config during probing. In case one of these accesses fails, a POSITIVE
PCIBIOS_xxx error code is returned back. This POSITIVE error code MUST be
converted into a NEGATIVE errno for the probe() function to indicate it
failed. Using the pcibios_err_to_errno() function, we make sure that the
return code will always be negative.
Signed-off-by: Stephane Grosjean <s.grosjean(a)peak-system.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/peak_canfd/peak_pciefd_main.c | 5 ++++-
drivers/net/can/sja1000/peak_pci.c | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/peak_canfd/peak_pciefd_main.c b/drivers/net/can/peak_canfd/peak_pciefd_main.c
index b4efd711f824..788c3464a3b0 100644
--- a/drivers/net/can/peak_canfd/peak_pciefd_main.c
+++ b/drivers/net/can/peak_canfd/peak_pciefd_main.c
@@ -825,7 +825,10 @@ static int peak_pciefd_probe(struct pci_dev *pdev,
err_disable_pci:
pci_disable_device(pdev);
- return err;
+ /* pci_xxx_config_word() return positive PCIBIOS_xxx error codes while
+ * the probe() function must return a negative errno in case of failure
+ * (err is unchanged if negative) */
+ return pcibios_err_to_errno(err);
}
/* free the board structure object, as well as its resources: */
diff --git a/drivers/net/can/sja1000/peak_pci.c b/drivers/net/can/sja1000/peak_pci.c
index 131026fbc2d7..5adc95c922ee 100644
--- a/drivers/net/can/sja1000/peak_pci.c
+++ b/drivers/net/can/sja1000/peak_pci.c
@@ -717,7 +717,10 @@ static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
failure_disable_pci:
pci_disable_device(pdev);
- return err;
+ /* pci_xxx_config_word() return positive PCIBIOS_xxx error codes while
+ * the probe() function must return a negative errno in case of failure
+ * (err is unchanged if negative) */
+ return pcibios_err_to_errno(err);
}
static void peak_pci_remove(struct pci_dev *pdev)
--
2.15.0
From: Oliver Stäbler <oliver.staebler(a)bytesatwork.ch>
After commit d75b1ade567f ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget.
So we need to return budget if there are still packets to receive.
Signed-off-by: Oliver Stäbler <oliver.staebler(a)bytesatwork.ch>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/ti_hecc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c
index 4d4941469cfc..db6ea936dc3f 100644
--- a/drivers/net/can/ti_hecc.c
+++ b/drivers/net/can/ti_hecc.c
@@ -637,6 +637,9 @@ static int ti_hecc_rx_poll(struct napi_struct *napi, int quota)
mbx_mask = hecc_read(priv, HECC_CANMIM);
mbx_mask |= HECC_TX_MBOX_MASK;
hecc_write(priv, HECC_CANMIM, mbx_mask);
+ } else {
+ /* repoll is done only if whole budget is used */
+ num_pkts = quota;
}
return num_pkts;
--
2.15.0
From: Jimmy Assarsson <jimmyassarsson(a)gmail.com>
The conditon in the while-loop becomes true when actual_length is less than
2 (MSG_HEADER_LEN). In best case we end up with a former, already
dispatched msg, that got msg->len greater than actual_length. This will
result in a "Format error" error printout.
Problem seen when unplugging a Kvaser USB device connected to a vbox guest.
warning: comparison between signed and unsigned integer expressions
[-Wsign-compare]
Signed-off-by: Jimmy Assarsson <jimmyassarsson(a)gmail.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/kvaser_usb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c
index 075644591498..d87e330a20b3 100644
--- a/drivers/net/can/usb/kvaser_usb.c
+++ b/drivers/net/can/usb/kvaser_usb.c
@@ -1334,7 +1334,7 @@ static void kvaser_usb_read_bulk_callback(struct urb *urb)
goto resubmit_urb;
}
- while (pos <= urb->actual_length - MSG_HEADER_LEN) {
+ while (pos <= (int)(urb->actual_length - MSG_HEADER_LEN)) {
msg = urb->transfer_buffer + pos;
/* The Kvaser firmware can only read and write messages that
--
2.15.0
From: Johannes Berg <johannes.berg(a)intel.com>
Before deleting a time event (remain-on-channel instance), flush
the queue so that frames cannot get stuck on it. We already flush
the AUX STA queues, but a separate station is used for the P2P
Device queue.
Cc: stable(a)vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
---
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 ++
.../net/wireless/intel/iwlwifi/mvm/time-event.c | 24 ++++++++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
index 6a9a25beab3f..55ab5349dd40 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
@@ -1061,6 +1061,7 @@ struct iwl_mvm {
* @IWL_MVM_STATUS_ROC_AUX_RUNNING: AUX remain-on-channel is running
* @IWL_MVM_STATUS_D3_RECONFIG: D3 reconfiguration is being done
* @IWL_MVM_STATUS_FIRMWARE_RUNNING: firmware is running
+ * @IWL_MVM_STATUS_NEED_FLUSH_P2P: need to flush P2P bcast STA
*/
enum iwl_mvm_status {
IWL_MVM_STATUS_HW_RFKILL,
@@ -1072,6 +1073,7 @@ enum iwl_mvm_status {
IWL_MVM_STATUS_ROC_AUX_RUNNING,
IWL_MVM_STATUS_D3_RECONFIG,
IWL_MVM_STATUS_FIRMWARE_RUNNING,
+ IWL_MVM_STATUS_NEED_FLUSH_P2P,
};
/* Keep track of completed init configuration */
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
index 4d0314912e94..e25cda9fbf6c 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
@@ -132,6 +132,24 @@ void iwl_mvm_roc_done_wk(struct work_struct *wk)
* executed, and a new time event means a new command.
*/
iwl_mvm_flush_sta(mvm, &mvm->aux_sta, true, CMD_ASYNC);
+
+ /* Do the same for the P2P device queue (STA) */
+ if (test_and_clear_bit(IWL_MVM_STATUS_NEED_FLUSH_P2P, &mvm->status)) {
+ struct iwl_mvm_vif *mvmvif;
+
+ /*
+ * NB: access to this pointer would be racy, but the flush bit
+ * can only be set when we had a P2P-Device VIF, and we have a
+ * flush of this work in iwl_mvm_prepare_mac_removal() so it's
+ * not really racy.
+ */
+
+ if (!WARN_ON(!mvm->p2p_device_vif)) {
+ mvmvif = iwl_mvm_vif_from_mac80211(mvm->p2p_device_vif);
+ iwl_mvm_flush_sta(mvm, &mvmvif->bcast_sta, true,
+ CMD_ASYNC);
+ }
+ }
}
static void iwl_mvm_roc_finished(struct iwl_mvm *mvm)
@@ -855,10 +873,12 @@ void iwl_mvm_stop_roc(struct iwl_mvm *mvm)
mvmvif = iwl_mvm_vif_from_mac80211(te_data->vif);
- if (te_data->vif->type == NL80211_IFTYPE_P2P_DEVICE)
+ if (te_data->vif->type == NL80211_IFTYPE_P2P_DEVICE) {
iwl_mvm_remove_time_event(mvm, mvmvif, te_data);
- else
+ set_bit(IWL_MVM_STATUS_NEED_FLUSH_P2P, &mvm->status);
+ } else {
iwl_mvm_remove_aux_roc_te(mvm, mvmvif, te_data);
+ }
iwl_mvm_roc_finished(mvm);
}
--
2.15.0
xHC can generate two events for a short transfer if the short TRB and
last TRB in the TD are not the same TRB.
The driver will handle the TD after the first short event, and remove
it from its internal list. Driver then incorrectly prints a warning
for the second event:
"WARN Event TRB for slot x ep y with no TDs queued"
Fix this by not printing a warning if we get a event on a empty list
if the previous event was a short event.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-ring.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index c239c68..6eb87c6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2477,12 +2477,16 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
if (list_empty(&ep_ring->td_list)) {
/*
- * A stopped endpoint may generate an extra completion
- * event if the device was suspended. Don't print
- * warnings.
+ * Don't print wanings if it's due to a stopped endpoint
+ * generating an extra completion event if the device
+ * was suspended. Or, a event for the last TRB of a
+ * short TD we already got a short event for.
+ * The short TD is already removed from the TD list.
*/
+
if (!(trb_comp_code == COMP_STOPPED ||
- trb_comp_code == COMP_STOPPED_LENGTH_INVALID)) {
+ trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
+ ep_ring->last_td_was_short)) {
xhci_warn(xhci, "WARN Event TRB for slot %d ep %d with no TDs queued?\n",
TRB_TO_SLOT_ID(le32_to_cpu(event->flags)),
ep_index);
--
2.7.4
The commit de3ee99b097d ("mmc: Delete bounce buffer handling") deletes the
bounce buffer handling, but also causes the max_req_size for sdhci to be
increased, in case when max_segs == 1. This causes errors for sdhci-pci
Ricoh variant, about the swiotlb buffer to become full.
Fix the issue, by taking IO_TLB_SEGSIZE and IO_TLB_SHIFT into account when
deciding the max_req_size for sdhci.
Reported-by: Jiri Slaby <jslaby(a)suse.cz>
Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling")
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Tested-by: Jiri Slaby <jslaby(a)suse.cz>
---
drivers/mmc/host/sdhci.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 2f14334..e9290a3 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -21,6 +21,7 @@
#include <linux/dma-mapping.h>
#include <linux/slab.h>
#include <linux/scatterlist.h>
+#include <linux/swiotlb.h>
#include <linux/regulator/consumer.h>
#include <linux/pm_runtime.h>
#include <linux/of.h>
@@ -3651,22 +3652,29 @@ int sdhci_setup_host(struct sdhci_host *host)
spin_lock_init(&host->lock);
/*
+ * Maximum number of sectors in one transfer. Limited by SDMA boundary
+ * size (512KiB). Note some tuning modes impose a 4MiB limit, but this
+ * is less anyway.
+ */
+ mmc->max_req_size = 524288;
+
+ /*
* Maximum number of segments. Depends on if the hardware
* can do scatter/gather or not.
*/
- if (host->flags & SDHCI_USE_ADMA)
+ if (host->flags & SDHCI_USE_ADMA) {
mmc->max_segs = SDHCI_MAX_SEGS;
- else if (host->flags & SDHCI_USE_SDMA)
+ } else if (host->flags & SDHCI_USE_SDMA) {
mmc->max_segs = 1;
- else /* PIO */
+ if (swiotlb_max_segment()) {
+ unsigned int max_req_size = (1 << IO_TLB_SHIFT) *
+ IO_TLB_SEGSIZE;
+ mmc->max_req_size = min(mmc->max_req_size,
+ max_req_size);
+ }
+ } else { /* PIO */
mmc->max_segs = SDHCI_MAX_SEGS;
-
- /*
- * Maximum number of sectors in one transfer. Limited by SDMA boundary
- * size (512KiB). Note some tuning modes impose a 4MiB limit, but this
- * is less anyway.
- */
- mmc->max_req_size = 524288;
+ }
/*
* Maximum segment size. Could be one segment with the maximum number
--
2.7.4
BCM43341 devices soldered onto the PCB (non-removable) always (AFAICT)
use an UART connection for bluetooth. But they also advertise btsdio
support on their 3th sdio function, this causes 2 problems:
1) A non functioning BT HCI getting registered
2) Since the btsdio driver does not have suspend/resume callbacks,
mmc_sdio_pre_suspend will return -ENOSYS, causing mmc_pm_notify()
to react as if the SDIO-card is removed and since the slot is
marked as non-removable it will never get detected as inserted again.
Which results in wifi no longer working after a suspend/resume.
This commit fixes both by making btsdio ignore BCM43341 devices
when connected to a slot which is marked non-removable.
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/bluetooth/btsdio.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/bluetooth/btsdio.c b/drivers/bluetooth/btsdio.c
index c8e945d19ffe..76c1405c7242 100644
--- a/drivers/bluetooth/btsdio.c
+++ b/drivers/bluetooth/btsdio.c
@@ -31,6 +31,7 @@
#include <linux/errno.h>
#include <linux/skbuff.h>
+#include <linux/mmc/host.h>
#include <linux/mmc/sdio_ids.h>
#include <linux/mmc/sdio_func.h>
@@ -292,6 +293,15 @@ static int btsdio_probe(struct sdio_func *func,
tuple = tuple->next;
}
+ /*
+ * BCM43341 devices soldered onto the PCB (non-removable) use an
+ * uart connection for bluetooth, ignore the BT SDIO interface.
+ */
+ if (func->vendor == SDIO_VENDOR_ID_BROADCOM &&
+ func->device == SDIO_DEVICE_ID_BROADCOM_43341 &&
+ !mmc_card_is_removable(func->card->host))
+ return -ENODEV;
+
data = devm_kzalloc(&func->dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
--
2.14.3
SMSM is not symmetrical, the incoming bits from WCNSS are available at
index 6, but the outgoing host id for WCNSS is 3. Further more, upstream
references the base of APCS (in contrast to downstream), so the register
offset of 8 must be included.
Fixes: 1fb47e0a9ba4 ("arm64: dts: qcom: msm8916: Add smsm and smp2p nodes")
Cc: stable(a)vger.kernel.org
Reported-by: Ramon Fried <rfried(a)codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
---
arch/arm64/boot/dts/qcom/msm8916.dtsi | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi b/arch/arm64/boot/dts/qcom/msm8916.dtsi
index 9fb317853a5c..a8e1e3b4562c 100644
--- a/arch/arm64/boot/dts/qcom/msm8916.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi
@@ -1437,8 +1437,8 @@
#address-cells = <1>;
#size-cells = <0>;
- qcom,ipc-1 = <&apcs 0 13>;
- qcom,ipc-6 = <&apcs 0 19>;
+ qcom,ipc-1 = <&apcs 8 13>;
+ qcom,ipc-3 = <&apcs 8 19>;
apps_smsm: apps@0 {
reg = <0>;
--
2.15.0
Hi Greg,
At 10/05/2017 04:30 PM, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> x86/acpi: Restore the order of CPU IDs
>
> to the 4.9-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
Dao found a bug in Linux 4.9 LTS which shows below.
The reason of the bug is that we just backport the patch titled
x86/acpi: Restore the order of CPU IDs
but, ignored the other patches in the series[1].
IMO, the commit c962cff17dfa
("Revert "x86/acpi: Set persistent cpuid <-> nodeid mapping when
booting"")
in the series can fixed this bug. I suggest to backport it.
BTW, I read the rules in Documentation/process/stable-kernel-rules.rst,
and found that:
...
- It cannot be bigger than 100 lines, with context.
...
I guess it seems that it's the reason why it did not be pulled in
stable tree. Is it right? Can you tell more details about it. :-)
[1] https://lkml.org/lkml/2017/3/3/71
Thanks,
dou.
...
[ 3.210401] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 3.219161] IP: [<ffffffffa5e77158>] __queue_work+0x78/0x420
[ 3.225491] PGD 0 [ 3.227537]
[ 3.229205] Oops: 0000 [#1] SMP
[ 3.232707] Modules linked in:
[ 3.236124] CPU: 25 PID: 1 Comm: swapper/0 Not tainted
4.9.59-cloudflare-2017.10.3 #1
[ 3.244857] Hardware name: IBM x3630M4 -[7158OCN]-/00KF922, BIOS
-[BEE142AUS-1.71]- 07/30/2014
[ 3.254461] task: ffff8dc2b2281e00 task.stack: ffffaa348c474000
[ 3.261068] RIP: 0010:[<ffffffffa5e77158>] [<ffffffffa5e77158>]
__queue_work+0x78/0x420
[ 3.270112] RSP: 0000:ffffaa348c477cb0 EFLAGS: 00010046
[ 3.276039] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[ 3.284002] RDX: ffff8dc2b0272000 RSI: 000000007fffffff RDI:
ffff8dc2b0272000
[ 3.291965] RBP: ffffaa348c477ce8 R08: 000000000001b3a0 R09:
ffff8dc2be807840
[ 3.299929] R10: 0000000000ffff0a R11: 0000000000000003 R12:
ffff8dc2b0272000
[ 3.307894] R13: 0000000000000200 R14: ffff8dc2be8a2000 R15:
0000000000013198
[ 3.315857] FS: 0000000000000000(0000) GS:ffff8dc2bf440000(0000)
knlGS:0000000000000000
[ 3.324890] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.331301] CR2: 0000000000000000 CR3: 00000001a2c07000 CR4:
00000000001406e0
[ 3.339264] Stack:
[ 3.341505] ffff8dc2afcea000 0000001900000800 0000000000000246
0000000000000000
[ 3.349801] 0000000000000000 ffffffffa686bdbd ffff8dc2b13e39c0
ffffaa348c477d00
[ 3.358094] ffffffffa5e77519 ffff8dc2b0272000 ffffaa348c477d40
ffffffffa5e73eae
[ 3.366390] Call Trace:
[ 3.369121] [<ffffffffa5e77519>] queue_work_on+0x19/0x30
[ 3.375146] [<ffffffffa5e73eae>] call_usermodehelper_exec+0x7e/0x130
[ 3.382337] [<ffffffffa6242157>] kobject_uevent_env+0x4b7/0x510
[ 3.389033] [<ffffffffa62421bb>] kobject_uevent+0xb/0x10
[ 3.395058] [<ffffffffa62416e9>] kset_register+0x59/0x70
[ 3.401086] [<ffffffffa63268b0>] bus_register+0xd0/0x260
[ 3.407114] [<ffffffffa6bdd7bf>] ? acpi_int340x_thermal_init+0x12/0x12
[ 3.414496] [<ffffffffa6bdd7cf>] pnp_init+0x10/0x12
[ 3.420039] [<ffffffffa5e00440>] do_one_initcall+0x50/0x180
[ 3.426357] [<ffffffffa6b97077>] kernel_init_freeable+0x1a2/0x22a
[ 3.433258] [<ffffffffa655df40>] ? rest_init+0x80/0x80
[ 3.439089] [<ffffffffa655df4e>] kernel_init+0xe/0x100
[ 3.444921] [<ffffffffa6564da2>] ret_from_fork+0x22/0x30
[ 3.450945] Code: 00 00 41 f6 86 00 01 00 00 02 0f 85 ee 00 0
...
> The filename of the patch is:
> x86-acpi-restore-the-order-of-cpu-ids.patch
> and it can be found in the queue-4.9 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>>From foo@baz Thu Oct 5 10:28:31 CEST 2017
> From: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
> Date: Fri, 3 Mar 2017 16:02:25 +0800
> Subject: x86/acpi: Restore the order of CPU IDs
>
> From: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
>
>
> [ Upstream commit 2b85b3d22920db7473e5fed5719e7955c0ec323e ]
>
> The following commits:
>
> f7c28833c2 ("x86/acpi: Enable acpi to register all possible cpus at
> boot time") and 8f54969dc8 ("x86/acpi: Introduce persistent storage
> for cpuid <-> apicid mapping")
>
> ... registered all the possible CPUs at boot time via ACPI tables to
> make the mapping of cpuid <-> apicid fixed. Both enabled and disabled
> CPUs could have a logical CPU ID after boot time.
>
> But, ACPI tables are unreliable. the number amd order of Local APIC
> entries which depends on the firmware is often inconsistent with the
> physical devices. Even if they are consistent, The disabled CPUs which
> take up some logical CPU IDs will also make the order discontinuous.
>
> Revert the part of disabled CPUs registration, keep the allocation
> logic of logical CPU IDs and also keep some code location changes.
>
> Signed-off-by: Dou Liyang <douly.fnst(a)cn.fujitsu.com>
> Tested-by: Xiaolong Ye <xiaolong.ye(a)intel.com>
> Cc: rjw(a)rjwysocki.net
> Cc: linux-acpi(a)vger.kernel.org
> Cc: guzheng1(a)huawei.com
> Cc: izumi.taku(a)jp.fujitsu.com
> Cc: lenb(a)kernel.org
> Link: http://lkml.kernel.org/r/1488528147-2279-4-git-send-email-douly.fnst@cn.fuj…
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> arch/x86/kernel/acpi/boot.c | 7 ++++++-
> arch/x86/kernel/apic/apic.c | 26 +++++++-------------------
> 2 files changed, 13 insertions(+), 20 deletions(-)
>
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -176,10 +176,15 @@ static int acpi_register_lapic(int id, u
> return -EINVAL;
> }
>
> + if (!enabled) {
> + ++disabled_cpus;
> + return -EINVAL;
> + }
> +
> if (boot_cpu_physical_apicid != -1U)
> ver = boot_cpu_apic_version;
>
> - cpu = __generic_processor_info(id, ver, enabled);
> + cpu = generic_processor_info(id, ver);
> if (cpu >= 0)
> early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2070,7 +2070,7 @@ static int allocate_logical_cpuid(int ap
> return nr_logical_cpuids++;
> }
>
> -int __generic_processor_info(int apicid, int version, bool enabled)
> +int generic_processor_info(int apicid, int version)
> {
> int cpu, max = nr_cpu_ids;
> bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
> @@ -2128,11 +2128,9 @@ int __generic_processor_info(int apicid,
> if (num_processors >= nr_cpu_ids) {
> int thiscpu = max + disabled_cpus;
>
> - if (enabled) {
> - pr_warning("APIC: NR_CPUS/possible_cpus limit of %i "
> - "reached. Processor %d/0x%x ignored.\n",
> - max, thiscpu, apicid);
> - }
> + pr_warning("APIC: NR_CPUS/possible_cpus limit of %i "
> + "reached. Processor %d/0x%x ignored.\n",
> + max, thiscpu, apicid);
>
> disabled_cpus++;
> return -EINVAL;
> @@ -2184,23 +2182,13 @@ int __generic_processor_info(int apicid,
> apic->x86_32_early_logical_apicid(cpu);
> #endif
> set_cpu_possible(cpu, true);
> -
> - if (enabled) {
> - num_processors++;
> - physid_set(apicid, phys_cpu_present_map);
> - set_cpu_present(cpu, true);
> - } else {
> - disabled_cpus++;
> - }
> + physid_set(apicid, phys_cpu_present_map);
> + set_cpu_present(cpu, true);
> + num_processors++;
>
> return cpu;
> }
>
> -int generic_processor_info(int apicid, int version)
> -{
> - return __generic_processor_info(apicid, version, true);
> -}
> -
> int hard_smp_processor_id(void)
> {
> return read_apic_id();
>
>
> Patches currently in stable-queue which might be from douly.fnst(a)cn.fujitsu.com are
>
> queue-4.9/x86-acpi-restore-the-order-of-cpu-ids.patch
>
>
>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Hi Greg,
Pleae pull commits for Linux 4.4 .
I've sent a review request for all commits over a week ago and all
comments were addressed.
Thanks,
Sasha
=====
The following changes since commit 08c15ad2e6278a5fe1b209e8fcdbd2d235c48f34:
Linux 4.4.103 (2017-11-30 08:37:28 +0000)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux-stable.git for-greg/4.14/4.4
for you to fetch changes up to 35875b21e77f03b5e0ce4278579294af69da5d00:
kprobes/x86: Disable preemption in ftrace-based jprobes (2017-11-30 16:49:32 -0500)
- ----------------------------------------------------------------
Aaron Sierra (1):
serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
Alexey Khoroshilov (1):
usb: phy: tahvo: fix error handling in tahvo_usb_probe()
Andy Lutomirski (1):
selftests/x86/ldt_get: Add a few additional tests for limits
Ben Hutchings (1):
usbip: tools: Install all headers needed for libusbip development
Boshi Wang (1):
ima: fix hash algorithm initialization
Christian Borntraeger (1):
s390/pci: do not require AIS facility
Dave Hansen (1):
x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
Gustavo A. R. Silva (1):
EDAC, sb_edac: Fix missing break in switch
Hiromitsu Yamasaki (1):
spi: sh-msiof: Fix DMA transfer size check
Jibin Xu (1):
sysrq : fix Show Regs call trace on ARM
John Stultz (2):
usb: dwc2: Fix UDC state tracking
usb: dwc2: Error out of dwc2_hsotg_ep_disable() if we're in host mode
Lukas Wunner (1):
serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
Masami Hiramatsu (1):
kprobes/x86: Disable preemption in ftrace-based jprobes
Thomas Richter (1):
perf test attr: Fix ignored test case result
arch/s390/include/asm/pci_insn.h | 2 +-
arch/s390/pci/pci.c | 5 +++--
arch/s390/pci/pci_insn.c | 6 +++++-
arch/x86/include/asm/syscalls.h | 2 +-
arch/x86/kernel/kprobes/ftrace.c | 23 ++++++++++++++---------
arch/x86/kernel/ldt.c | 16 +++++++++++++---
arch/x86/um/ldt.c | 7 +++++--
drivers/edac/sb_edac.c | 1 +
drivers/spi/spi-sh-msiof.c | 2 +-
drivers/tty/serial/8250/8250_fintek.c | 2 +-
drivers/tty/serial/8250/8250_port.c | 5 ++++-
drivers/tty/sysrq.c | 9 +++++++--
drivers/usb/dwc2/gadget.c | 7 +++++++
drivers/usb/phy/phy-tahvo.c | 3 ++-
security/integrity/ima/ima_main.c | 4 ++++
tools/perf/tests/attr.c | 2 +-
tools/testing/selftests/x86/ldt_gdt.c | 17 ++++++++++++++++-
tools/usb/usbip/Makefile.am | 3 ++-
18 files changed, 88 insertions(+), 28 deletions(-)
-----BEGIN PGP SIGNATURE-----
iQIcBAEBCAAGBQJaIH/+AAoJEN6mb/eXdyzcQQkP/j3Cdl+nqMgY28ui3xmFG/7N
rmTIjsIjS9lCPHH2i53jR518WPwKwte47k25FzOZi0kwdG2Q7Ds8N0jxJhs/rquU
lrz9T41bO7wihILHTFBOca185Mzf7elXF+VATP+dtVz51JnqvpL1vtZe29Wsv9We
WTuCdjnc9Sv6DP0DbctxQToUCc+SFT/DqOlgdTQfZJmhQAcNrCxmfzKjMJ48W38V
P4FgBdZ7I1pkaNIXWbzDB8XARzesdupFjMjACgTCQ/XdhodgdPwgssNMURX8uKC8
PIpbW8syYQPGmepLv7hiDoFgPKxDPgfSVQXgFIQkfFWXSMnnulI+sj7gi2F8u1fS
MVENj75UTBKL1RgbwFxuIZLRZKDcMi/RRjAdOxiMQWY5v0PWMnCwta1/CtGKnkBT
Wf1do7Lt0ce8sstd+udztNvfQToSBi47LMC/Sdn2GcXTCM1MTUUXiMHeybYLZR1+
Kpsjpj4Fc1CqIrelJUA78KpDpCpUb5osj8tnahx4pYmBp11V3G6jGQZYLPZDUxdU
0vpzAaeeVIRg2X17VB4zuDdLzVMrB0ChImRcXIDht7wuyMG2+vqKX7IB+Mit7yEQ
md6Benpeok44SSxlXKRrVz37ZlStxGSVIG+F1DTR4dE17N/4ZUNpKx/eXP8pFtlh
QVkzov+tpzlMNXSB1cg6
=BCzy
-----END PGP SIGNATURE-----
From: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
[ Upstream commit a6d8a21596df041f36f4c2ccc260c459e3e851f1 ]
Tests under alignment subdirectory are skipped when executed on previous
generation hardware, but harness still marks them as failed.
test: test_copy_unaligned
tags: git_version:unknown
[SKIP] Test skipped on line 26
skip: test_copy_unaligned
selftests: copy_unaligned [FAIL]
The MAGIC_SKIP_RETURN_VALUE value assigned to rc variable is retained till
the program exit which causes the test to be marked as failed.
This patch resets the value before returning to the main() routine.
With this patch the test o/p is as follows:
test: test_copy_unaligned
tags: git_version:unknown
[SKIP] Test skipped on line 26
skip: test_copy_unaligned
selftests: copy_unaligned [PASS]
Signed-off-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
tools/testing/selftests/powerpc/harness.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/harness.c b/tools/testing/selftests/powerpc/harness.c
index 8ebc58a09311..d1ed7bab65a5 100644
--- a/tools/testing/selftests/powerpc/harness.c
+++ b/tools/testing/selftests/powerpc/harness.c
@@ -105,9 +105,11 @@ int test_harness(int (test_function)(void), char *name)
rc = run_test(test_function, name);
- if (rc == MAGIC_SKIP_RETURN_VALUE)
+ if (rc == MAGIC_SKIP_RETURN_VALUE) {
test_skip(name);
- else
+ /* so that skipped test is not marked as failed */
+ rc = 0;
+ } else
test_finish(name, rc);
return rc;
--
2.11.0
The patch titled
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
I made a mistake during converting hugetlb code to 5-level paging: in
huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset(). Otherwise
it leads to crash -- NULL-pointer dereference in pud_alloc() if p4d table
is not yet allocated.
It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.
Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel…
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine
+++ a/mm/hugetlb.c
@@ -4635,7 +4635,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
pud = pud_alloc(mm, p4d, addr);
if (pud) {
if (sz == PUD_SIZE) {
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The patch titled
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
has been removed from the -mm tree. Its filename was
autofs-revert-fix-at_no_automount-not-being-honored.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the
fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but
introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case to
no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.…
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: Neil Brown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Colin Walters <walters(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/namei.c | 15 +++------------
include/linux/fs.h | 3 ++-
2 files changed, 5 insertions(+), 13 deletions(-)
diff -puN fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored fs/namei.c
--- a/fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path
* of the daemon to instantiate them before they can be used.
*/
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE |
- LOOKUP_AUTOMOUNT))) {
- /* Positive dentry that isn't meant to trigger an
- * automount, EISDIR will allow it to be used,
- * otherwise there's no mount here "now" so return
- * ENOENT.
- */
- if (path->dentry->d_inode)
- return -EISDIR;
- else
- return -ENOENT;
- }
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ path->dentry->d_inode)
+ return -EISDIR;
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff -puN include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored include/linux/fs.h
--- a/include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char _
static inline int vfs_fstatat(int dfd, const char __user *filename,
struct kstat *stat, int flags)
{
- return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+ return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+ stat, STATX_BASIC_STATS);
}
static inline int vfs_fstat(int fd, struct kstat *stat)
{
_
Patches currently in -mm which might be from raven(a)themaw.net are
The patch titled
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
has been removed from the -mm tree. Its filename was
autofs-revert-take-more-care-to-not-update-last_used-on-path-walk.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
While 092a53452b ("autofs: take more care to not update last_used on path
walk") helped (partially) resolve a problem where automounts were not
expiring due to aggressive accesses from user space it has a side effect
for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire policy.
So the patch being reverted introduces a regression which should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.…
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: <stable(a)vger.kernel.org> [4.11+]
Cc: Colin Walters <walters(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/autofs4/root.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff -puN fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk fs/autofs4/root.c
--- a/fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk
+++ a/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const stru
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
- ino->last_used = jiffies;
}
+ ino->last_used = jiffies;
return status;
}
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint
*/
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+ struct autofs_info *ino;
struct dentry *new;
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
- if (new == dentry)
- dput(new);
- else {
- struct autofs_info *ino;
-
- ino = autofs4_dentry_ino(new);
- ino->last_used = jiffies;
- dput(path->dentry);
- path->dentry = new;
- }
+ ino = autofs4_dentry_ino(new);
+ ino->last_used = jiffies;
+ dput(path->dentry);
+ path->dentry = new;
}
return path->dentry;
}
_
Patches currently in -mm which might be from raven(a)themaw.net are
The patch titled
Subject: fs/fat/inode.c: fix sb_rdonly() change
has been removed from the -mm tree. Its filename was
fat-fix-sb_rdonly-change.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: fs/fat/inode.c: fix sb_rdonly() change
bc98a42c1f7d0f ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
converted fat_remount():new_rdonly from a bool to an int. However
fat_remount() depends upon the compiler's conversion of a non-zero integer
into boolean `true'.
Fix it by switching `new_rdonly' back into a bool.
Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp
Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Joe Perches <joe(a)perches.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN fs/fat/inode.c~fat-fix-sb_rdonly-change fs/fat/inode.c
--- a/fs/fat/inode.c~fat-fix-sb_rdonly-change
+++ a/fs/fat/inode.c
@@ -779,7 +779,7 @@ static void __exit fat_destroy_inodecach
static int fat_remount(struct super_block *sb, int *flags, char *data)
{
- int new_rdonly;
+ bool new_rdonly;
struct msdos_sb_info *sbi = MSDOS_SB(sb);
*flags |= SB_NODIRATIME | (sbi->options.isvfat ? 0 : SB_NOATIME);
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
The patch titled
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
has been removed from the -mm tree. Its filename was
mm-memcg-fix-mem_cgroup_swapout-for-thps.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
changed mem_cgroup_swapout() to support transparent huge page (THP).
However the patch missed one location which should be changed for
correctly handling THPs. The resulting bug will cause the memory cgroups
whose THPs were swapped out to become zombies on deletion.
Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps mm/memcontrol.c
--- a/mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps
+++ a/mm/memcontrol.c
@@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *pag
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
- css_put(&memcg->css);
+ css_put_many(&memcg->css, nr_entries);
}
/**
_
Patches currently in -mm which might be from shakeelb(a)google.com are
mm-mlock-vmscan-no-more-skipping-pagevecs.patch
vfs-remove-might_sleep-from-clear_inode.patch
The patch titled
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
has been removed from the -mm tree. Its filename was
mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled. This leads to a bad state of target pages for migration.
By inspecting the code, if called on a non-THP, prep_transhuge_page() will
1) change the value of the mapping of (page + 2), since it is used
for THP deferred list;
2) change the lru value of (page + 1), since it is used for THP's
dtor.
Both can lead to data corruption of these two pages.
Andrea said:
: Pragmatically and from the point of view of the memory_hotplug subsys, the
: effect is a kernel crash when pages are being migrated during a memory hot
: remove offline and migration target pages are found in a bad state.
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/migrate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page include/linux/migrate.h
--- a/include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page
+++ a/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_node
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
_
Patches currently in -mm which might be from zi.yan(a)cs.rutgers.edu are
The patch titled
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
has been removed from the -mm tree. Its filename was
mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: chenjie <chenjie6(a)huawei.com>
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never advance
to the next vma and it will keep looping for ever without a way to get out
of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff -puN mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances mm/madvise.c
--- a/mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances
+++ a/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
_
Patches currently in -mm which might be from chenjie6(a)huawei.com are
The patch titled
Subject: exec: avoid RLIMIT_STACK races with prlimit()
has been removed from the -mm tree. Its filename was
exec-avoid-rlimit_stack-races-with-prlimit.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Kees Cook <keescook(a)chromium.org>
Subject: exec: avoid RLIMIT_STACK races with prlimit()
While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.
Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Reported-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Acked-by: Serge Hallyn <serge(a)hallyn.com>
Cc: James Morris <james.l.morris(a)oracle.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit fs/exec.c
--- a/fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit
+++ a/fs/exec.c
@@ -1340,10 +1340,15 @@ void setup_new_exec(struct linux_binprm
* avoid bad behavior from the prior rlimits. This has to
* happen before arch_pick_mmap_layout(), which examines
* RLIMIT_STACK, but after the point of no return to avoid
- * needing to clean up the change on failure.
+ * races from other threads changing the limits. This also
+ * must be protected from races with prlimit() calls.
*/
+ task_lock(current->group_leader);
if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
+ if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM)
+ current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM;
+ task_unlock(current->group_leader);
}
arch_pick_mmap_layout(current->mm);
_
Patches currently in -mm which might be from keescook(a)chromium.org are
makefile-move-stack-protector-compiler-breakage-test-earlier.patch
makefile-move-stack-protector-availability-out-of-kconfig.patch
makefile-introduce-config_cc_stackprotector_auto.patch
The patch titled
Subject: IB/core: disable memory registration of filesystem-dax vmas
has been removed from the -mm tree. Its filename was
ib-core-disable-memory-registration-of-fileystem-dax-vmas.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow RDMA to create long standing memory registrations against
filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Doug Ledford <dledford(a)redhat.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas drivers/infiniband/core/umem.c
--- a/drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas
+++ a/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: v4l2: disable filesystem-dax mapping support
has been removed from the -mm tree. Its filename was
v4l2-disable-filesystem-dax-mapping-support.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax that needs
the ability to revoke dma access to a mapping at will, or otherwise allow
the kernel to wait for completion of DMA. The filesystem-dax
implementation breaks the traditional solution of truncate of active file
backed mappings since there is no page-cache page we can orphan to sustain
ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
has been removed from the -mm tree. Its filename was
mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow V4L2, Exynos, and other frame vector users to create long
standing / irrevocable memory registrations against filesytem-dax vmas.
[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/frame_vector.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ /*
+ * While get_vaddr_frames() could be used for transient (kernel
+ * controlled lifetime) pinning of memory pages all current
+ * users establish long term (userspace controlled lifetime)
+ * page pinning. Treat get_vaddr_frames() like
+ * get_user_pages_longterm() and disallow it for filesystem-dax
+ * mappings.
+ */
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: introduce get_user_pages_longterm
has been removed from the -mm tree. Its filename was
mm-introduce-get_user_pages_longterm.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to keep
an elevated page count indefinitely. This is distinct from usages like
iov_iter_get_pages where the elevated page counts are transient. The
iov_iter_get_pages cases immediately turn around and submit the pages to a
device driver which will put_page when the i/o operation completes (under
kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime of
the block / page and needs reasonable limits on how long it can wait for
pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before blocks
from a truncate/hole-punch operation are repurposed is saved for a later
patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings were
supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow long standing memory registrations against filesytem-dax
vmas. Device-dax vmas do not have this problem and are explicitly
allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and V4L2).
[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 14 +++++++++
include/linux/mm.h | 13 ++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: device-dax: implement ->split() to catch invalid munmap attempts
has been removed from the -mm tree. Its filename was
device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: device-dax: implement ->split() to catch invalid munmap attempts
Similar to how device-dax enforces that the 'address', 'offset', and 'len'
parameters to mmap() be aligned to the device's fundamental alignment, the
same constraints apply to munmap(). Implement ->split() to fail munmap
calls that violate the alignment constraint. Otherwise, we later fail
VM_BUG_ON checks in the unmap_page_range() path with crash signatures of
the form:
vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000
next (null) prev (null) mm ffff8800b61150c0
prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240
pgoff 0 file ffff8800b638ef80 private_data (null)
flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage)
------------[ cut here ]------------
kernel BUG at mm/huge_memory.c:2014!
[..]
RIP: 0010:__split_huge_pud+0x12a/0x180
[..]
Call Trace:
unmap_page_range+0x245/0xa40
? __vma_adjust+0x301/0x990
unmap_vmas+0x4c/0xa0
unmap_region+0xae/0x120
? __vma_rb_erase+0x11a/0x230
do_munmap+0x276/0x410
vm_munmap+0x6a/0xa0
SyS_munmap+0x1d/0x30
Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts drivers/dax/device.c
--- a/drivers/dax/device.c~device-dax-implement-split-to-catch-invalid-munmap-attempts
+++ a/drivers/dax/device.c
@@ -428,9 +428,21 @@ static int dev_dax_fault(struct vm_fault
return dev_dax_huge_fault(vmf, PE_SIZE_PTE);
}
+static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ struct file *filp = vma->vm_file;
+ struct dev_dax *dev_dax = filp->private_data;
+ struct dax_region *dax_region = dev_dax->region;
+
+ if (!IS_ALIGNED(addr, dax_region->align))
+ return -EINVAL;
+ return 0;
+}
+
static const struct vm_operations_struct dax_vm_ops = {
.fault = dev_dax_fault,
.huge_fault = dev_dax_huge_fault,
+ .split = dev_dax_split,
};
static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
has been removed from the -mm tree. Its filename was
mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It would
be messy to teach the munmap path about device-dax alignment constraints
in the same (hstate) way that hugetlbfs communicates this constraint.
Instead, these patches introduce a new ->split() vm operation.
This patch (of 2):
The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather than
add more custom vma handling code in __split_vma() introduce a new vm
operation to perform this vma specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
diff -puN include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct include/linux/mm.h
--- a/include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/include/linux/mm.h
@@ -377,6 +377,7 @@ enum page_entry_size {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_fault *vmf);
int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
diff -puN mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff -puN mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/mmap.c
--- a/mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/mmap.c
@@ -2555,9 +2555,11 @@ int __split_vma(struct mm_struct *mm, st
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
has been removed from the -mm tree. Its filename was
mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it. Note that the
generic 'access_permitted' helper fallback is the simple _PAGE_RW check on
architectures that do not define the 'access_permitted' helper(s).
[dan.j.williams(a)intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwil…
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwill…
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/include/asm/pgtable.h | 6 ++++++
include/asm-generic/pgtable.h | 8 ++++++++
include/linux/hugetlb.h | 8 --------
3 files changed, 14 insertions(+), 8 deletions(-)
diff -puN arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/arch/x86/include/asm/pgtable.h
@@ -1088,6 +1088,12 @@ static inline void pmdp_set_wrprotect(st
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
}
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_RW;
+}
+
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*
diff -puN include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/asm-generic/pgtable.h
@@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd)
#endif /* __HAVE_ARCH_PMD_WRITE */
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#ifndef pud_write
+static inline int pud_write(pud_t pud)
+{
+ BUG();
+ return 0;
+}
+#endif /* pud_write */
+
#if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
(defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
!defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
diff -puN include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/linux/hugetlb.h
@@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd)
}
#endif
-#ifndef pud_write
-static inline int pud_write(pud_t pud)
-{
- BUG();
- return 0;
-}
-#endif
-
#define HUGETLB_ANON_FILE "anon_hugepage"
enum {
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
has been removed from the -mm tree. Its filename was
mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called where
there is a tracepoint to identify the busy pages. However, it is possible
for busy pages to become available between the calls to these two
routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were not
allocated and they are leaked.
Update comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2
+++ a/mm/page_alloc.c
@@ -7652,11 +7652,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
has been removed from the -mm tree. Its filename was
mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Wang Nan <wangnan0(a)huawei.com>
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs like arm64 doesn't
flush TLB when tlb->fullmm is true:
commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
Which causes leaking of tlb entries.
Will clarifies his patch:
: Basically, we tag each address space with an ASID (PCID on x86) which
: is resident in the TLB. This means we can elide TLB invalidation when
: pulling down a full mm because we won't ever assign that ASID to another mm
: without doing TLB invalidation elsewhere (which actually just nukes the
: whole TLB).
:
: I think that means that we could potentially not fault on a kernel uaccess,
: because we could hit in the TLB.
There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to flush
TLB then frees pages. However, due to the above problem, the TLB entries
are not really flushed on arm64. Other threads are possible to access
these pages through TLB entries. Moreover, a copy_to_user() can also
write to these pages without generating page fault, causes use-after-free
bugs.
This patch gathers each vma instead of gathering full vm space. In this
case tlb->fullmm is not true. The behavior of oom reaper become similar
to munmapping before do_exit, which should be safe for all archs.
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Bob Liu <liubo95(a)huawei.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff -puN mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry
+++ a/mm/oom_kill.c
@@ -550,7 +550,6 @@ static bool __oom_reap_task_mm(struct ta
*/
set_bit(MMF_UNSTABLE, &mm->flags);
- tlb_gather_mmu(&tlb, mm, 0, -1);
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -565,11 +564,13 @@ static bool __oom_reap_task_mm(struct ta
* we do not want to block exit_mmap by keeping mm ref
* count elevated without a good reason.
*/
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
NULL);
+ tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+ }
}
- tlb_finish_mmu(&tlb, 0, -1);
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
_
Patches currently in -mm which might be from wangnan0(a)huawei.com are
The patch titled
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
has been removed from the -mm tree. Its filename was
mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
drain_all_pages backs off when called from a kworker context since
0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced by
a WQ based one and the check wanted to prevent from recursion and inter
workers dependencies. This has made some sense at the time because the
system WQ has been used and one worker holding the lock could be blocked
while waiting for new workers to emerge which can be a problem under OOM
conditions.
Since then ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer so
we shouldn't depend on any other WQ activity to make a forward progress so
calling drain_all_pages from a worker context is safe as long as this
doesn't happen from mm_percpu_wq itself which is not the case because all
workers are required to _not_ depend on any MM locks.
Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job otherwise
we can have dangling pages on pcp lists and fail the offline operation
(__test_page_isolated_in_pageblock would see a page with 0 ref. count but
without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works as
expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 4 ----
1 file changed, 4 deletions(-)
diff -puN mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context mm/page_alloc.c
--- a/mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context
+++ a/mm/page_alloc.c
@@ -2507,10 +2507,6 @@ void drain_all_pages(struct zone *zone)
if (WARN_ON_ONCE(!mm_percpu_wq))
return;
- /* Workqueues cannot recurse */
- if (current->flags & PF_WQ_WORKER)
- return;
-
/*
* Do not drain if one is already in progress unless it's specific to
* a zone. Such callers are primarily CMA and memory hotplug and need
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-drop-hotplug-lock-from-lru_add_drain_all.patch
mm-hugetlb-drop-hugepages_treat_as_movable-sysctl.patch
Hello,
(correcting stable tree mailing list address and add GregKH)
I have created this branch with the KAISER patches and dependencies to v4.9.y.
This is massive, I know. But I attempted to include all dependencies I saw
in the mailing list discussions. The backport is done from the tip/WIP.x86/mm
branch. The list of patches include:
a. Several patch dependencies that change x86 arch code so following applies.
b. Andy Lutomirski work to refactor the x86 entry code.
c. Andy Lutomirski work to do the x86 trampolim.
d. Dave Handen's work to incorporate the KAISER feature on x86.
e. Several fixes/improvements on KAISER by tglx and PeterZ.
Branch is here:
https://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux.git/log/?h=b…
Right now, I am still validating it under different scenarios. First shot on
the same lseek1 [1] micro bench, a see the Kernel being ~4x slower:
-> Without KAISER: ~14Mlseek/s.
-> With KAISER: ~3.6Mlseek/s.
If anybody is interested in testing please send feedback.
Also, if somebody else is working on a minimalist backport of the feature
to v4.9 or other stable kernels, let me know.
[1] - https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c/
--
All the best,
Eduardo Valentin
This is a note to let you know that I've just added the patch titled
usbip: fix usbip attach to find a port that matches the requested
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1ac7c8a78be85f84b019d3d2742d1a9f07255cc5 Mon Sep 17 00:00:00 2001
From: Shuah Khan <shuahkh(a)osg.samsung.com>
Date: Wed, 29 Nov 2017 15:24:22 -0700
Subject: usbip: fix usbip attach to find a port that matches the requested
speed
usbip attach fails to find a free port when the device on the first port
is a USB_SPEED_SUPER device and non-super speed device is being attached.
It keeps checking the first port and returns without a match getting stuck
in a loop.
Fix it check to find the first port with matching speed.
Reported-by: Juan Zea <juan.zea(a)qindel.com>
Signed-off-by: Shuah Khan <shuahkh(a)osg.samsung.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/usb/usbip/libsrc/vhci_driver.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/tools/usb/usbip/libsrc/vhci_driver.c b/tools/usb/usbip/libsrc/vhci_driver.c
index 5727dfb15a83..8a1cd1616de4 100644
--- a/tools/usb/usbip/libsrc/vhci_driver.c
+++ b/tools/usb/usbip/libsrc/vhci_driver.c
@@ -329,9 +329,17 @@ int usbip_vhci_refresh_device_list(void)
int usbip_vhci_get_free_port(uint32_t speed)
{
for (int i = 0; i < vhci_driver->nports; i++) {
- if (speed == USB_SPEED_SUPER &&
- vhci_driver->idev[i].hub != HUB_SPEED_SUPER)
- continue;
+
+ switch (speed) {
+ case USB_SPEED_SUPER:
+ if (vhci_driver->idev[i].hub != HUB_SPEED_SUPER)
+ continue;
+ break;
+ default:
+ if (vhci_driver->idev[i].hub != HUB_SPEED_HIGH)
+ continue;
+ break;
+ }
if (vhci_driver->idev[i].status == VDEV_ST_NULL)
return vhci_driver->idev[i].port;
--
2.15.1
This is a note to let you know that I've just added the patch titled
usbip: Fix USB device hang due to wrong enabling of scatter-gather
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 770b2edece42fa55bbe7d4cbe53347a07b8968d4 Mon Sep 17 00:00:00 2001
From: Yuyang Du <yuyang.du(a)intel.com>
Date: Thu, 30 Nov 2017 10:22:40 +0800
Subject: usbip: Fix USB device hang due to wrong enabling of scatter-gather
The previous USB3 SuperSpeed enabling patches mistakenly enabled
URB scatter-gather chaining, which is actually not supported by
the VHCI HCD. This patch fixes that.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197867
Fixes: 03cd00d538a6feb ("usbip: vhci-hcd: Set the vhci structure up to work")
Reported-by: Juan Zea <juan.zea(a)qindel.com>
Signed-off-by: Yuyang Du <yuyang.du(a)intel.com>
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Shuah Khan <shuahkh(a)osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/usbip/vhci_hcd.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 713e94170963..6b3278c4b72a 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -1098,7 +1098,6 @@ static int hcd_name_to_id(const char *name)
static int vhci_setup(struct usb_hcd *hcd)
{
struct vhci *vhci = *((void **)dev_get_platdata(hcd->self.controller));
- hcd->self.sg_tablesize = ~0;
if (usb_hcd_is_primary_hcd(hcd)) {
vhci->vhci_hcd_hs = hcd_to_vhci_hcd(hcd);
vhci->vhci_hcd_hs->vhci = vhci;
--
2.15.1
This is a note to let you know that I've just added the patch titled
usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From a3acc696085e112733d191a77b106e67a4fa110b Mon Sep 17 00:00:00 2001
From: John Keeping <john(a)metanate.com>
Date: Mon, 27 Nov 2017 18:15:40 +0000
Subject: usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT
The specification says that the Reserved1 field in OS_DESC_EXT_COMPAT
must have the value "1", but when this feature was first implemented we
rejected any non-zero values.
This was adjusted to accept all non-zero values (while now rejecting
zero) in commit 53642399aa71 ("usb: gadget: f_fs: Fix wrong check on
reserved1 of OS_DESC_EXT_COMPAT"), but that breaks any userspace
programs that worked previously by returning EINVAL when Reserved1 == 0
which was previously the only value that succeeded!
If we just set the field to "1" ourselves, both old and new userspace
programs continue to work correctly and, as a bonus, old programs are
now compliant with the specification without having to fix anything
themselves.
Fixes: 53642399aa71 ("usb: gadget: f_fs: Fix wrong check on reserved1 of OS_DESC_EXT_COMPAT")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: John Keeping <john(a)metanate.com>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/gadget/function/f_fs.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 9aa457b53e01..b6cf5ab5a0a1 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -2282,9 +2282,18 @@ static int __ffs_data_do_os_desc(enum ffs_os_desc_type type,
int i;
if (len < sizeof(*d) ||
- d->bFirstInterfaceNumber >= ffs->interfaces_count ||
- !d->Reserved1)
+ d->bFirstInterfaceNumber >= ffs->interfaces_count)
return -EINVAL;
+ if (d->Reserved1 != 1) {
+ /*
+ * According to the spec, Reserved1 must be set to 1
+ * but older kernels incorrectly rejected non-zero
+ * values. We fix it here to avoid returning EINVAL
+ * in response to values we used to accept.
+ */
+ pr_debug("usb_ext_compat_desc::Reserved1 forced to 1\n");
+ d->Reserved1 = 1;
+ }
for (i = 0; i < ARRAY_SIZE(d->Reserved2); ++i)
if (d->Reserved2[i])
return -EINVAL;
--
2.15.1
This is a note to let you know that I've just added the patch titled
usb: gadget: core: Fix ->udc_set_speed() speed handling
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From a4f0927ef588cf62bb864707261482c874352942 Mon Sep 17 00:00:00 2001
From: Roger Quadros <rogerq(a)ti.com>
Date: Tue, 31 Oct 2017 15:56:29 +0200
Subject: usb: gadget: core: Fix ->udc_set_speed() speed handling
Currently UDC core calls ->udc_set_speed() with the speed parameter
containing the maximum speed supported by the gadget function
driver. This might very well be more than that supported by the
UDC controller driver.
Select the lesser of the 2 speeds so both UDC and gadget function
driver are operating within limits.
This fixes PHY Erratic errors and 2 second enumeration delay on
TI's AM437x platforms.
Fixes: 6099eca796ae ("usb: gadget: core: introduce ->udc_set_speed() method")
Cc: <stable(a)vger.kernel.org> # v4.13+
Reported-by: Dylan Howey <Dylan.Howey(a)tennantco.com>
Signed-off-by: Roger Quadros <rogerq(a)ti.com>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/gadget/udc/core.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 61422d624ad0..93eff7dec2f5 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1069,8 +1069,12 @@ static inline void usb_gadget_udc_stop(struct usb_udc *udc)
static inline void usb_gadget_udc_set_speed(struct usb_udc *udc,
enum usb_device_speed speed)
{
- if (udc->gadget->ops->udc_set_speed)
- udc->gadget->ops->udc_set_speed(udc->gadget, speed);
+ if (udc->gadget->ops->udc_set_speed) {
+ enum usb_device_speed s;
+
+ s = min(speed, udc->gadget->max_speed);
+ udc->gadget->ops->udc_set_speed(udc->gadget, s);
+ }
}
/**
--
2.15.1
This is a note to let you know that I've just added the patch titled
usb: gadget: udc: renesas_usb3: fix number of the pipes
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From a58204ab91ad8cae4d8474aa0ba5d1fc504860c9 Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Mon, 13 Nov 2017 17:59:18 +0900
Subject: usb: gadget: udc: renesas_usb3: fix number of the pipes
This controller on R-Car Gen3 has 6 pipes that included PIPE 0 for
control actually. But, the datasheet has error in writing as it has
31 pipes. (However, the previous code defined 30 pipes wrongly...)
Anyway, this patch fixes it.
Fixes: 746bfe63bba3 ("usb: gadget: renesas_usb3: add support for Renesas USB3.0 peripheral controller")
Cc: <stable(a)vger.kernel.org> # v4.5+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Felipe Balbi <felipe.balbi(a)linux.intel.com>
---
drivers/usb/gadget/udc/renesas_usb3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c
index bc37f40baacf..6e87af248367 100644
--- a/drivers/usb/gadget/udc/renesas_usb3.c
+++ b/drivers/usb/gadget/udc/renesas_usb3.c
@@ -252,7 +252,7 @@
#define USB3_EP0_SS_MAX_PACKET_SIZE 512
#define USB3_EP0_HSFS_MAX_PACKET_SIZE 64
#define USB3_EP0_BUF_SIZE 8
-#define USB3_MAX_NUM_PIPES 30
+#define USB3_MAX_NUM_PIPES 6 /* This includes PIPE 0 */
#define USB3_WAIT_US 3
#define USB3_DMA_NUM_SETTING_AREA 4
/*
--
2.15.1
This is a note to let you know that I've just added the patch titled
USB: serial: usb_debug: add new USB device id
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 762ff4678e89a5e3f8b2237533e04d3ef2737e78 Mon Sep 17 00:00:00 2001
From: Lu Baolu <baolu.lu(a)linux.intel.com>
Date: Tue, 28 Nov 2017 12:40:59 +0800
Subject: USB: serial: usb_debug: add new USB device id
USB vendor id and product id for Linux USB Debug Target is added.
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/usb_debug.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/serial/usb_debug.c b/drivers/usb/serial/usb_debug.c
index ab5a2ac4993a..aaf4813e4971 100644
--- a/drivers/usb/serial/usb_debug.c
+++ b/drivers/usb/serial/usb_debug.c
@@ -31,12 +31,14 @@ static const struct usb_device_id id_table[] = {
};
static const struct usb_device_id dbc_id_table[] = {
+ { USB_DEVICE(0x1d6b, 0x0010) },
{ USB_DEVICE(0x1d6b, 0x0011) },
{ },
};
static const struct usb_device_id id_table_combined[] = {
{ USB_DEVICE(0x0525, 0x127a) },
+ { USB_DEVICE(0x1d6b, 0x0010) },
{ USB_DEVICE(0x1d6b, 0x0011) },
{ },
};
--
2.15.1
This is a note to let you know that I've just added the patch titled
USB: serial: option: add Quectel BG96 id
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c654b21ede93845863597de9ad774fd30db5f2ab Mon Sep 17 00:00:00 2001
From: Sebastian Sjoholm <ssjoholm(a)mac.com>
Date: Mon, 20 Nov 2017 19:29:32 +0100
Subject: USB: serial: option: add Quectel BG96 id
Quectel BG96 is an Qualcomm MDM9206 based IoT modem, supporting both
CAT-M and NB-IoT. Tested hardware is BG96 mounted on Quectel
development board (EVB). The USB id is added to option.c to allow
DIAG,GPS,AT and modem communication with the BG96.
Signed-off-by: Sebastian Sjoholm <ssjoholm(a)mac.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/option.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index aaa7d901a06d..3b3513874cfd 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -238,6 +238,7 @@ static void option_instat_callback(struct urb *urb);
/* These Quectel products use Quectel's vendor ID */
#define QUECTEL_PRODUCT_EC21 0x0121
#define QUECTEL_PRODUCT_EC25 0x0125
+#define QUECTEL_PRODUCT_BG96 0x0296
#define CMOTECH_VENDOR_ID 0x16d8
#define CMOTECH_PRODUCT_6001 0x6001
@@ -1182,6 +1183,8 @@ static const struct usb_device_id option_ids[] = {
.driver_info = (kernel_ulong_t)&net_intf4_blacklist },
{ USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC25),
.driver_info = (kernel_ulong_t)&net_intf4_blacklist },
+ { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_BG96),
+ .driver_info = (kernel_ulong_t)&net_intf4_blacklist },
{ USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) },
{ USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) },
{ USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6003),
--
2.15.1
Dear stable maintainers,
please backport commit 9968e12a291e639dd51d1218b694d440b22a917f
("platform/x86: hp-wmi: Fix tablet mode detection for convertibles") to
all active stable branches which have (a backport of) commit
f9cf3b2880cc ("platform/x86: hp-wmi: Refactor dock and tablet state
fetchers"). It fixes the internal keyboard and pointing devices of HP
laptops not working (with current libinput) while the laptop is docked.
Thanks,
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
This is a note to let you know that I've just added the patch titled
platform/x86: hp-wmi: Fix tablet mode detection for convertibles
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
platform-x86-hp-wmi-fix-tablet-mode-detection-for-convertibles.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 9968e12a291e639dd51d1218b694d440b22a917f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stefan=20Br=C3=BCns?= <stefan.bruens(a)rwth-aachen.de>
Date: Fri, 3 Nov 2017 03:01:53 +0100
Subject: platform/x86: hp-wmi: Fix tablet mode detection for convertibles
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
From: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
commit 9968e12a291e639dd51d1218b694d440b22a917f upstream.
Commit f9cf3b2880cc ("platform/x86: hp-wmi: Refactor dock and tablet
state fetchers") consolidated the methods for docking and laptop mode
detection, but omitted to apply the correct mask for the laptop mode
(it always uses the constant for docking).
Fixes: f9cf3b2880cc ("platform/x86: hp-wmi: Refactor dock and tablet state fetchers")
Signed-off-by: Stefan Brüns <stefan.bruens(a)rwth-aachen.de>
Cc: Michel Dänzer <michel(a)daenzer.net>
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/platform/x86/hp-wmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/platform/x86/hp-wmi.c
+++ b/drivers/platform/x86/hp-wmi.c
@@ -297,7 +297,7 @@ static int hp_wmi_hw_state(int mask)
if (state < 0)
return state;
- return state & 0x1;
+ return !!(state & mask);
}
static int __init hp_wmi_bios_2008_later(void)
Patches currently in stable-queue which might be from stefan.bruens(a)rwth-aachen.de are
queue-4.14/platform-x86-hp-wmi-fix-tablet-mode-detection-for-convertibles.patch
The patch
ASoC: codecs: msm8916-wcd: Fix supported formats
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
>From 51f493ae71adc2c49a317a13c38e54e1cdf46005 Mon Sep 17 00:00:00 2001
From: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Date: Thu, 30 Nov 2017 10:15:02 +0000
Subject: [PATCH] ASoC: codecs: msm8916-wcd: Fix supported formats
This codec is configurable for only 16 bit and 32 bit samples, so reflect
this in the supported formats also remove 24bit sample from supported list.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/codecs/msm8916-wcd-analog.c | 2 +-
sound/soc/codecs/msm8916-wcd-digital.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/sound/soc/codecs/msm8916-wcd-analog.c b/sound/soc/codecs/msm8916-wcd-analog.c
index 5f3c42c4f74a..066ea2f4ce7b 100644
--- a/sound/soc/codecs/msm8916-wcd-analog.c
+++ b/sound/soc/codecs/msm8916-wcd-analog.c
@@ -267,7 +267,7 @@
#define MSM8916_WCD_ANALOG_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000 |\
SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_ANALOG_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
static int btn_mask = SND_JACK_BTN_0 | SND_JACK_BTN_1 |
SND_JACK_BTN_2 | SND_JACK_BTN_3 | SND_JACK_BTN_4;
diff --git a/sound/soc/codecs/msm8916-wcd-digital.c b/sound/soc/codecs/msm8916-wcd-digital.c
index a10a724eb448..13354d6304a8 100644
--- a/sound/soc/codecs/msm8916-wcd-digital.c
+++ b/sound/soc/codecs/msm8916-wcd-digital.c
@@ -194,7 +194,7 @@
SNDRV_PCM_RATE_32000 | \
SNDRV_PCM_RATE_48000)
#define MSM8916_WCD_DIGITAL_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\
- SNDRV_PCM_FMTBIT_S24_LE)
+ SNDRV_PCM_FMTBIT_S32_LE)
struct msm8916_wcd_digital_priv {
struct clk *ahbclk, *mclk;
@@ -645,7 +645,7 @@ static int msm8916_wcd_digital_hw_params(struct snd_pcm_substream *substream,
RX_I2S_CTL_RX_I2S_MODE_MASK,
RX_I2S_CTL_RX_I2S_MODE_16);
break;
- case SNDRV_PCM_FORMAT_S24_LE:
+ case SNDRV_PCM_FORMAT_S32_LE:
snd_soc_update_bits(dai->codec, LPASS_CDC_CLK_TX_I2S_CTL,
TX_I2S_CTL_TX_I2S_MODE_MASK,
TX_I2S_CTL_TX_I2S_MODE_32);
--
2.15.0
d_prune_aliases() does not prune dentry that has non-zero reference
count. If we want trim_caps() to trim 'N' caps, the first 'N' inodes
in session->s_caps are all directories with child dentry, trim_caps()
can fail to trim anything.
Cc: stable(a)vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan(a)redhat.com>
---
fs/ceph/mds_client.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 80339e278ad8..da181acd4a61 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -1490,16 +1490,20 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
if ((used | wanted) & ~oissued & mine)
goto out; /* we need these caps */
- session->s_trim_caps--;
if (oissued) {
/* we aren't the only cap.. just remove us */
__ceph_remove_cap(cap, true);
+ session->s_trim_caps--;
} else {
+ int refs;
/* try dropping referring dentries */
spin_unlock(&ci->i_ceph_lock);
d_prune_aliases(inode);
+ refs = atomic_read(&inode->i_count);
+ if (refs == 1)
+ session->s_trim_caps--;
dout("trim_caps_cb %p cap %p pruned, count now %d\n",
- inode, cap, atomic_read(&inode->i_count));
+ inode, cap, refs);
return 0;
}
--
2.13.6
Tomas Charvat <tc(a)excello.cz> wrote:
[ CC stable, Steffen ]
> Hi Florian and David, I'm running several servers that use XFRM ipsec.
> It do work well on all kernels bellow 4.14.0.
>
> It doesnt work on 4.14.0-2. There is no any error in dmesg or in
> userspace when I do configure policies.
>
> Since there is not much info about XFRM in dmesg I have no clue, where
> to start when I want to debug this issue.
David, please consider picking up
94802151894d482e82c324edf2c658f8e6b96508
("Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find.")
for the 4.14.y stable queue.
I think its a pretty safe bet that this fixes the problem, it broke
transport mode wildcard policy lookup.
> I have seen that you have removed flow-cache that we have fixed 2 time.
> Do you have clue where to start with debug of this issue ?
If the revert doesn't help, please do a bug report to
netdev(a)vger.kernel.org and provide /proc/net/xfrm_stat content
and the list of policies/SAs.
[BUG]
fstrim on some btrfs only trims the unallocated space, not trimming any
space in existing block groups.
[CAUSE]
Before fstrim_range passed to btrfs_trim_fs(), it get truncated to
range [0, super->total_bytes).
So later btrfs_trim_fs() will only be able to trim block groups in range
[0, super->total_bytes).
While for btrfs, any bytenr aligned to sector size is valid, since btrfs use
its logical address space, there is nothing limiting the location where
we put block groups.
For btrfs with routine balance, it's quite easy to relocate all
block groups and bytenr of block groups will start beyond super->total_bytes.
In that case, btrfs will not trim existing block groups.
[FIX]
Just remove the truncation in btrfs_ioctl_fitrim(), so btrfs_trim_fs()
can get the unmodified range, which is normally set to [0, U64_MAX].
Reported-by: Chris Murphy <lists(a)colorremedies.com>
Fixes: f4c697e6406d ("btrfs: return EINVAL if start > total_bytes in fitrim ioctl")
Cc: <stable(a)vger.kernel.org> # v4.0+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
changelog:
v2:
Locate the root cause in btrfs_ioctl_fitrim(), remove the truncation
so we can still allow user to trim custom range.
---
fs/btrfs/ioctl.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fd172a93d11a..017fda31400d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -365,7 +365,6 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
struct fstrim_range range;
u64 minlen = ULLONG_MAX;
u64 num_devices = 0;
- u64 total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
int ret;
if (!capable(CAP_SYS_ADMIN))
@@ -389,11 +388,15 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
return -EOPNOTSUPP;
if (copy_from_user(&range, arg, sizeof(range)))
return -EFAULT;
- if (range.start > total_bytes ||
- range.len < fs_info->sb->s_blocksize)
+
+ /*
+ * NOTE: Don't truncate the range using super->total_bytes.
+ * Bytenr of btrfs block group is in btrfs logical address space,
+ * which can be any sector size aligned bytenr in [0, U64_MAX].
+ */
+ if (range.len < fs_info->sb->s_blocksize)
return -EINVAL;
- range.len = min(range.len, total_bytes - range.start);
range.minlen = max(range.minlen, minlen);
ret = btrfs_trim_fs(fs_info, &range);
if (ret < 0)
--
2.15.0
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
I made a mistake during converting hugetlb code to 5-level paging: in
huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset(). Otherwise
it leads to crash -- NULL-pointer dereference in pud_alloc() if p4d table
is not yet allocated.
It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.
Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel…
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine
+++ a/mm/hugetlb.c
@@ -4635,7 +4635,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
pud = pud_alloc(mm, p4d, addr);
if (pud) {
if (sz == PUD_SIZE) {
_
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the
fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but
introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case to
no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.…
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: Neil Brown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Colin Walters <walters(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/namei.c | 15 +++------------
include/linux/fs.h | 3 ++-
2 files changed, 5 insertions(+), 13 deletions(-)
diff -puN fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored fs/namei.c
--- a/fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path
* of the daemon to instantiate them before they can be used.
*/
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE |
- LOOKUP_AUTOMOUNT))) {
- /* Positive dentry that isn't meant to trigger an
- * automount, EISDIR will allow it to be used,
- * otherwise there's no mount here "now" so return
- * ENOENT.
- */
- if (path->dentry->d_inode)
- return -EISDIR;
- else
- return -ENOENT;
- }
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ path->dentry->d_inode)
+ return -EISDIR;
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff -puN include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored include/linux/fs.h
--- a/include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char _
static inline int vfs_fstatat(int dfd, const char __user *filename,
struct kstat *stat, int flags)
{
- return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+ return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+ stat, STATX_BASIC_STATS);
}
static inline int vfs_fstat(int fd, struct kstat *stat)
{
_
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
While 092a53452b ("autofs: take more care to not update last_used on path
walk") helped (partially) resolve a problem where automounts were not
expiring due to aggressive accesses from user space it has a side effect
for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire policy.
So the patch being reverted introduces a regression which should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.…
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: <stable(a)vger.kernel.org> [4.11+]
Cc: Colin Walters <walters(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/autofs4/root.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff -puN fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk fs/autofs4/root.c
--- a/fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk
+++ a/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const stru
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
- ino->last_used = jiffies;
}
+ ino->last_used = jiffies;
return status;
}
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint
*/
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+ struct autofs_info *ino;
struct dentry *new;
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
- if (new == dentry)
- dput(new);
- else {
- struct autofs_info *ino;
-
- ino = autofs4_dentry_ino(new);
- ino->last_used = jiffies;
- dput(path->dentry);
- path->dentry = new;
- }
+ ino = autofs4_dentry_ino(new);
+ ino->last_used = jiffies;
+ dput(path->dentry);
+ path->dentry = new;
}
return path->dentry;
}
_
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
changed mem_cgroup_swapout() to support transparent huge page (THP).
However the patch missed one location which should be changed for
correctly handling THPs. The resulting bug will cause the memory cgroups
whose THPs were swapped out to become zombies on deletion.
Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps mm/memcontrol.c
--- a/mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps
+++ a/mm/memcontrol.c
@@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *pag
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
- css_put(&memcg->css);
+ css_put_many(&memcg->css, nr_entries);
}
/**
_
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during
memory hotplug/hot remove prep_transhuge_page() is called incorrectly on
non-THP pages for migration, when THP is on but THP migration is not
enabled. This leads to a bad state of target pages for migration.
By inspecting the code, if called on a non-THP, prep_transhuge_page() will
1) change the value of the mapping of (page + 2), since it is used
for THP deferred list;
2) change the lru value of (page + 1), since it is used for THP's
dtor.
Both can lead to data corruption of these two pages.
Andrea said:
: Pragmatically and from the point of view of the memory_hotplug subsys, the
: effect is a kernel crash when pages are being migrated during a memory hot
: remove offline and migration target pages are found in a bad state.
This patch fixes it by only calling prep_transhuge_page() when we are
certain that the target page is THP.
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.14]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/migrate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page include/linux/migrate.h
--- a/include/linux/migrate.h~mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page
+++ a/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_node
new_page = __alloc_pages_nodemask(gfp_mask, order,
preferred_nid, nodemask);
- if (new_page && PageTransHuge(page))
+ if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
return new_page;
_
From: chenjie <chenjie6(a)huawei.com>
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never advance
to the next vma and it will keep looping for ever without a way to get out
of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff -puN mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances mm/madvise.c
--- a/mm/madvise.c~mmmadvise-bugfix-of-madvise-systemcall-infinite-loop-under-special-circumstances
+++ a/mm/madvise.c
@@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_a
{
struct file *file = vma->vm_file;
+ *prev = vma;
#ifdef CONFIG_SWAP
if (!file) {
- *prev = vma;
force_swapin_readahead(vma, start, end);
return 0;
}
if (shmem_mapping(file->f_mapping)) {
- *prev = vma;
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
@@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_a
return 0;
}
- *prev = vma;
start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
if (end > vma->vm_end)
end = vma->vm_end;
_
From: Kees Cook <keescook(a)chromium.org>
Subject: exec: avoid RLIMIT_STACK races with prlimit()
While the defense-in-depth RLIMIT_STACK limit on setuid processes was
protected against races from other threads calling setrlimit(), I missed
protecting it against races from external processes calling prlimit().
This adds locking around the change and makes sure that rlim_max is set
too.
Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast
Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Reported-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
Reported-by: Brad Spengler <spender(a)grsecurity.net>
Acked-by: Serge Hallyn <serge(a)hallyn.com>
Cc: James Morris <james.l.morris(a)oracle.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Jiri Slaby <jslaby(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit fs/exec.c
--- a/fs/exec.c~exec-avoid-rlimit_stack-races-with-prlimit
+++ a/fs/exec.c
@@ -1340,10 +1340,15 @@ void setup_new_exec(struct linux_binprm
* avoid bad behavior from the prior rlimits. This has to
* happen before arch_pick_mmap_layout(), which examines
* RLIMIT_STACK, but after the point of no return to avoid
- * needing to clean up the change on failure.
+ * races from other threads changing the limits. This also
+ * must be protected from races with prlimit() calls.
*/
+ task_lock(current->group_leader);
if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM)
current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM;
+ if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM)
+ current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM;
+ task_unlock(current->group_leader);
}
arch_pick_mmap_layout(current->mm);
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: IB/core: disable memory registration of filesystem-dax vmas
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow RDMA to create long standing memory registrations against
filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwilli…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/infiniband/core/umem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas drivers/infiniband/core/umem.c
--- a/drivers/infiniband/core/umem.c~ib-core-disable-memory-registration-of-fileystem-dax-vmas
+++ a/drivers/infiniband/core/umem.c
@@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_uc
sg_list_start = umem->sg_head.sgl;
while (npages) {
- ret = get_user_pages(cur_base,
+ ret = get_user_pages_longterm(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof (struct page *)),
gup_flags, page_list, vma_list);
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: v4l2: disable filesystem-dax mapping support
V4L2 memory registrations are incompatible with filesystem-dax that needs
the ability to revoke dma access to a mapping at will, or otherwise allow
the kernel to wait for completion of DMA. The filesystem-dax
implementation breaks the traditional solution of truncate of active file
backed mappings since there is no page-cache page we can orphan to sustain
ongoing DMA.
If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.
Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff -puN drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support drivers/media/v4l2-core/videobuf-dma-sg.c
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c~v4l2-disable-filesystem-dax-mapping-support
+++ a/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
- err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+ err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
flags, dma->pages, NULL);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
- dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+ dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+ dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
return 0;
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fail get_vaddr_frames() for filesystem-dax mappings
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow V4L2, Exynos, and other frame vector users to create long
standing / irrevocable memory registrations against filesytem-dax vmas.
[dan.j.williams(a)intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan]
Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwill…
Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/frame_vector.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff -puN mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings mm/frame_vector.c
--- a/mm/frame_vector.c~mm-fail-get_vaddr_frames-for-filesystem-dax-mappings
+++ a/mm/frame_vector.c
@@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start
ret = -EFAULT;
goto out;
}
+
+ /*
+ * While get_vaddr_frames() could be used for transient (kernel
+ * controlled lifetime) pinning of memory pages all current
+ * users establish long term (userspace controlled lifetime)
+ * page pinning. Treat get_vaddr_frames() like
+ * get_user_pages_longterm() and disallow it for filesystem-dax
+ * mappings.
+ */
+ if (vma_is_fsdax(vma))
+ return -EOPNOTSUPP;
+
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
vec->got_ref = true;
vec->is_pfns = false;
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: introduce get_user_pages_longterm
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to keep
an elevated page count indefinitely. This is distinct from usages like
iov_iter_get_pages where the elevated page counts are transient. The
iov_iter_get_pages cases immediately turn around and submit the pages to a
device driver which will put_page when the i/o operation completes (under
kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime of
the block / page and needs reasonable limits on how long it can wait for
pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before blocks
from a truncate/hole-punch operation are repurposed is saved for a later
patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings were
supported by the kernel. The behavior regression this policy change
implies is one of the reasons we maintain the "dax enabled. Warning:
EXPERIMENTAL, use at your own risk" notification when mounting a
filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is not
safe to allow long standing memory registrations against filesytem-dax
vmas. Device-dax vmas do not have this problem and are explicitly
allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and V4L2).
[akpm(a)linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fs.h | 14 +++++++++
include/linux/mm.h | 13 ++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff -puN include/linux/fs.h~mm-introduce-get_user_pages_longterm include/linux/fs.h
--- a/include/linux/fs.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/fs.h
@@ -3194,6 +3194,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
diff -puN include/linux/mm.h~mm-introduce-get_user_pages_longterm include/linux/mm.h
--- a/include/linux/mm.h~mm-introduce-get_user_pages_longterm
+++ a/include/linux/mm.h
@@ -1380,6 +1380,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
diff -puN mm/gup.c~mm-introduce-get_user_pages_longterm mm/gup.c
--- a/mm/gup.c~mm-introduce-get_user_pages_longterm
+++ a/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct
Patch series "device-dax: fix unaligned munmap handling"
When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It would
be messy to teach the munmap path about device-dax alignment constraints
in the same (hstate) way that hugetlbfs communicates this constraint.
Instead, these patches introduce a new ->split() vm operation.
This patch (of 2):
The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather than
add more custom vma handling code in __split_vma() introduce a new vm
operation to perform this vma specific check.
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 1 +
mm/hugetlb.c | 8 ++++++++
mm/mmap.c | 8 +++++---
3 files changed, 14 insertions(+), 3 deletions(-)
diff -puN include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct include/linux/mm.h
--- a/include/linux/mm.h~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/include/linux/mm.h
@@ -377,6 +377,7 @@ enum page_entry_size {
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
+ int (*split)(struct vm_area_struct * area, unsigned long addr);
int (*mremap)(struct vm_area_struct * area);
int (*fault)(struct vm_fault *vmf);
int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size);
diff -puN mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/hugetlb.c
@@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct v
}
}
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+ if (addr & ~(huge_page_mask(hstate_vma(vma))))
+ return -EINVAL;
+ return 0;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetl
.fault = hugetlb_vm_op_fault,
.open = hugetlb_vm_op_open,
.close = hugetlb_vm_op_close,
+ .split = hugetlb_vm_op_split,
};
static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
diff -puN mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct mm/mmap.c
--- a/mm/mmap.c~mm-hugetlbfs-introduce-split-to-vm_operations_struct
+++ a/mm/mmap.c
@@ -2555,9 +2555,11 @@ int __split_vma(struct mm_struct *mm, st
struct vm_area_struct *new;
int err;
- if (is_vm_hugetlb_page(vma) && (addr &
- ~(huge_page_mask(hstate_vma(vma)))))
- return -EINVAL;
+ if (vma->vm_ops && vma->vm_ops->split) {
+ err = vma->vm_ops->split(vma, addr);
+ if (err)
+ return err;
+ }
new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
if (!new)
_
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm: fix device-dax pud write-faults triggered by get_user_pages()
Currently only get_user_pages_fast() can safely handle the writable gup
case due to its use of pud_access_permitted() to check whether the pud
entry is writable. In the gup slow path pud_write() is used instead of
pud_access_permitted() and to date it has been unimplemented, just calls
BUG_ON().
kernel BUG at ./include/linux/hugetlb.h:244!
[..]
RIP: 0010:follow_devmap_pud+0x482/0x490
[..]
Call Trace:
follow_page_mask+0x28c/0x6e0
__get_user_pages+0xe4/0x6c0
get_user_pages_unlocked+0x130/0x1b0
get_user_pages_fast+0x89/0xb0
iov_iter_get_pages_alloc+0x114/0x4a0
nfs_direct_read_schedule_iovec+0xd2/0x350
? nfs_start_io_direct+0x63/0x70
nfs_file_direct_read+0x1e0/0x250
nfs_file_read+0x90/0xc0
For now this just implements a simple check for the _PAGE_RW bit similar
to pmd_write. However, this implies that the gup-slow-path check is
missing the extra checks that the gup-fast-path performs with
pud_access_permitted. Later patches will align all checks to use the
'access_permitted' helper if the architecture provides it. Note that the
generic 'access_permitted' helper fallback is the simple _PAGE_RW check on
architectures that do not define the 'access_permitted' helper(s).
[dan.j.williams(a)intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwil…
Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwill…
Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Acked-by: Thomas Gleixner <tglx(a)linutronix.de> [x86]
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/include/asm/pgtable.h | 6 ++++++
include/asm-generic/pgtable.h | 8 ++++++++
include/linux/hugetlb.h | 8 --------
3 files changed, 14 insertions(+), 8 deletions(-)
diff -puN arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages arch/x86/include/asm/pgtable.h
--- a/arch/x86/include/asm/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/arch/x86/include/asm/pgtable.h
@@ -1088,6 +1088,12 @@ static inline void pmdp_set_wrprotect(st
clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
}
+#define pud_write pud_write
+static inline int pud_write(pud_t pud)
+{
+ return pud_flags(pud) & _PAGE_RW;
+}
+
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*
diff -puN include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/asm-generic/pgtable.h
@@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd)
#endif /* __HAVE_ARCH_PMD_WRITE */
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#ifndef pud_write
+static inline int pud_write(pud_t pud)
+{
+ BUG();
+ return 0;
+}
+#endif /* pud_write */
+
#if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \
(defined(CONFIG_TRANSPARENT_HUGEPAGE) && \
!defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD))
diff -puN include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages include/linux/hugetlb.h
--- a/include/linux/hugetlb.h~mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages
+++ a/include/linux/hugetlb.h
@@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd)
}
#endif
-#ifndef pud_write
-static inline int pud_write(pud_t pud)
-{
- BUG();
- return 0;
-}
-#endif
-
#define HUGETLB_ANON_FILE "anon_hugepage"
enum {
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called where
there is a tracepoint to identify the busy pages. However, it is possible
for busy pages to become available between the calls to these two
routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were not
allocated and they are leaked.
Update comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2
+++ a/mm/page_alloc.c
@@ -7652,11 +7652,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
_
From: Wang Nan <wangnan0(a)huawei.com>
Subject: mm, oom_reaper: gather each vma to prevent leaking TLB entry
tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
space. In this case, tlb->fullmm is true. Some archs like arm64 doesn't
flush TLB when tlb->fullmm is true:
commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
Which causes leaking of tlb entries.
Will clarifies his patch:
: Basically, we tag each address space with an ASID (PCID on x86) which
: is resident in the TLB. This means we can elide TLB invalidation when
: pulling down a full mm because we won't ever assign that ASID to another mm
: without doing TLB invalidation elsewhere (which actually just nukes the
: whole TLB).
:
: I think that means that we could potentially not fault on a kernel uaccess,
: because we could hit in the TLB.
There could be a window between complete_signal() sending IPI to other
cores and all threads sharing this mm are really kicked off from cores.
In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to flush
TLB then frees pages. However, due to the above problem, the TLB entries
are not really flushed on arm64. Other threads are possible to access
these pages through TLB entries. Moreover, a copy_to_user() can also
write to these pages without generating page fault, causes use-after-free
bugs.
This patch gathers each vma instead of gathering full vm space. In this
case tlb->fullmm is not true. The behavior of oom reaper become similar
to munmapping before do_exit, which should be safe for all archs.
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Wang Nan <wangnan0(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Bob Liu <liubo95(a)huawei.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff -puN mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom_reaper-gather-each-vma-to-prevent-leaking-tlb-entry
+++ a/mm/oom_kill.c
@@ -550,7 +550,6 @@ static bool __oom_reap_task_mm(struct ta
*/
set_bit(MMF_UNSTABLE, &mm->flags);
- tlb_gather_mmu(&tlb, mm, 0, -1);
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
@@ -565,11 +564,13 @@ static bool __oom_reap_task_mm(struct ta
* we do not want to block exit_mmap by keeping mm ref
* count elevated without a good reason.
*/
- if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
+ if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) {
+ tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end,
NULL);
+ tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);
+ }
}
- tlb_finish_mmu(&tlb, 0, -1);
pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
task_pid_nr(tsk), tsk->comm,
K(get_mm_counter(mm, MM_ANONPAGES)),
_
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, memory_hotplug: do not back off draining pcp free pages from kworker context
drain_all_pages backs off when called from a kworker context since
0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue
context") because the original IPI based pcp draining has been replaced by
a WQ based one and the check wanted to prevent from recursion and inter
workers dependencies. This has made some sense at the time because the
system WQ has been used and one worker holding the lock could be blocked
while waiting for new workers to emerge which can be a problem under OOM
conditions.
Since then ce612879ddc7 ("mm: move pcp and lru-pcp draining into single
wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer so
we shouldn't depend on any other WQ activity to make a forward progress so
calling drain_all_pages from a worker context is safe as long as this
doesn't happen from mm_percpu_wq itself which is not the case because all
workers are required to _not_ depend on any MM locks.
Why is this a problem in the first place? ACPI driven memory hot-remove
(acpi_device_hotplug) is executed from the worker context. We end up
calling __offline_pages to free all the pages and that requires both
lru_add_drain_all_cpuslocked and drain_all_pages to do their job otherwise
we can have dangling pages on pcp lists and fail the offline operation
(__test_page_isolated_in_pageblock would see a page with 0 ref. count but
without PageBuddy set).
Fix the issue by removing the worker check in drain_all_pages.
lru_add_drain_all_cpuslocked doesn't have this restriction so it works as
expected.
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org
Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 4 ----
1 file changed, 4 deletions(-)
diff -puN mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context mm/page_alloc.c
--- a/mm/page_alloc.c~mm-memory_hotplug-do-not-back-off-draining-pcp-free-pages-from-kworker-context
+++ a/mm/page_alloc.c
@@ -2507,10 +2507,6 @@ void drain_all_pages(struct zone *zone)
if (WARN_ON_ONCE(!mm_percpu_wq))
return;
- /* Workqueues cannot recurse */
- if (current->flags & PF_WQ_WORKER)
- return;
-
/*
* Do not drain if one is already in progress unless it's specific to
* a zone. Such callers are primarily CMA and memory hotplug and need
_
Fix an API loophole introduced with commit 9791554b45a2 ("MIPS,prctl:
add PR_[GS]ET_FP_MODE prctl options for MIPS"), where the caller of
prctl(2) is incorrectly allowed to make a change to CP0.Status.FR or
CP0.Config5.FRE register bits even if CONFIG_MIPS_O32_FP64_SUPPORT has
not been enabled, despite that an executable requesting the mode
requested via ELF file annotation would not be allowed to run in the
first place, or for n64 and n64 ABI tasks which do not have non-default
modes defined at all. Add suitable checks to `mips_set_process_fp_mode'
and bail out if an invalid mode change has been requested for the ABI in
effect, even if the FPU hardware or emulation would otherwise allow it.
Always succeed however without taking any further action if the mode
requested is the same as one already in effect, regardless of whether
any mode change, should it be requested, would actually be allowed for
the task concerned.
Cc: stable(a)vger.kernel.org # 4.0+
Fixes: 9791554b45a2 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS")
Signed-off-by: Maciej W. Rozycki <macro(a)mips.com>
---
arch/mips/kernel/process.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
linux-mips-prctl-fp-mode-o32-fp64.diff
Index: linux-sfr-test/arch/mips/kernel/process.c
===================================================================
--- linux-sfr-test.orig/arch/mips/kernel/process.c 2017-11-25 12:40:55.868109000 +0000
+++ linux-sfr-test/arch/mips/kernel/process.c 2017-11-25 12:41:56.411578000 +0000
@@ -705,6 +705,18 @@ int mips_set_process_fp_mode(struct task
struct task_struct *t;
int max_users;
+ /* If nothing to change, return right away, successfully. */
+ if (value == mips_get_process_fp_mode(task))
+ return 0;
+
+ /* Only accept a mode change if 64-bit FP enabled for o32. */
+ if (!IS_ENABLED(CONFIG_MIPS_O32_FP64_SUPPORT))
+ return -EOPNOTSUPP;
+
+ /* And only for o32 tasks. */
+ if (IS_ENABLED(CONFIG_64BIT) && !test_thread_flag(TIF_32BIT_REGS))
+ return -EOPNOTSUPP;
+
/* Check the value is valid */
if (value & ~known_bits)
return -EOPNOTSUPP;
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Previously I was under the impression that the scanline counter
reads 0 when the pipe is off. Turns out that's not correct, and
instead the scanline counter simply stops when the pipe stops, and
it retains it's last value until the pipe starts up again, at which
point the scanline counter jumps to vblank start.
These jumps can cause the timestamp to jump backwards by one frame.
Since we use the timestamps to guesstimage also the frame counter
value on gen2, that would cause the frame counter to also jump
backwards, which leads to a massice difference from the previous value.
The end result is that flips/vblank events don't appear to complete as
they're stuck waiting for the frame counter to catch up to that massive
difference.
Fix the problem properly by actually making sure the scanline counter
has started to move before we assume that it's safe to enable vblank
processing.
v2: Less pointless duplication in the code (Chris)
Cc: stable(a)vger.kernel.org
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris(a)chris-wilson.co.uk> #v1
Fixes: b7792d8b54cc ("drm/i915: Wait for pipe to start before sampling vblank timestamps on gen2")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/intel_display.c | 51 +++++++++++++++++++++++++-----------
1 file changed, 35 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f7dc7b7fed80..7280dd699316 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -998,7 +998,8 @@ enum transcoder intel_pipe_to_cpu_transcoder(struct drm_i915_private *dev_priv,
return crtc->config->cpu_transcoder;
}
-static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe)
+static bool pipe_scanline_is_moving(struct drm_i915_private *dev_priv,
+ enum pipe pipe)
{
i915_reg_t reg = PIPEDSL(pipe);
u32 line1, line2;
@@ -1013,7 +1014,28 @@ static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe)
msleep(5);
line2 = I915_READ(reg) & line_mask;
- return line1 == line2;
+ return line1 != line2;
+}
+
+static void wait_for_pipe_scanline_moving(struct intel_crtc *crtc, bool state)
+{
+ struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+ enum pipe pipe = crtc->pipe;
+
+ /* Wait for the display line to settle/start moving */
+ if (wait_for(pipe_scanline_is_moving(dev_priv, pipe) == state, 100))
+ DRM_ERROR("pipe %c scanline %s wait timed out\n",
+ pipe_name(pipe), onoff(state));
+}
+
+static void intel_wait_for_pipe_scanline_stopped(struct intel_crtc *crtc)
+{
+ wait_for_pipe_scanline_moving(crtc, false);
+}
+
+static void intel_wait_for_pipe_scanline_moving(struct intel_crtc *crtc)
+{
+ wait_for_pipe_scanline_moving(crtc, true);
}
/*
@@ -1036,7 +1058,6 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc)
{
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
enum transcoder cpu_transcoder = crtc->config->cpu_transcoder;
- enum pipe pipe = crtc->pipe;
if (INTEL_GEN(dev_priv) >= 4) {
i915_reg_t reg = PIPECONF(cpu_transcoder);
@@ -1047,9 +1068,7 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc)
100))
WARN(1, "pipe_off wait timed out\n");
} else {
- /* Wait for the display line to settle */
- if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100))
- WARN(1, "pipe_off wait timed out\n");
+ intel_wait_for_pipe_scanline_stopped(crtc);
}
}
@@ -1862,15 +1881,14 @@ static void intel_enable_pipe(struct intel_crtc *crtc)
POSTING_READ(reg);
/*
- * Until the pipe starts DSL will read as 0, which would cause
- * an apparent vblank timestamp jump, which messes up also the
- * frame count when it's derived from the timestamps. So let's
- * wait for the pipe to start properly before we call
- * drm_crtc_vblank_on()
+ * Until the pipe starts PIPEDSL reads will return a stale value,
+ * which causes an apparent vblank timestamp jump when PIPEDSL
+ * resets to its proper value. That also messes up the frame count
+ * when it's derived from the timestamps. So let's wait for the
+ * pipe to start properly before we call drm_crtc_vblank_on()
*/
- if (dev->max_vblank_count == 0 &&
- wait_for(intel_get_crtc_scanline(crtc) != crtc->scanline_offset, 50))
- DRM_ERROR("pipe %c didn't start\n", pipe_name(pipe));
+ if (dev->max_vblank_count == 0)
+ intel_wait_for_pipe_scanline_moving(crtc);
}
/**
@@ -14728,6 +14746,8 @@ void i830_enable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
{
+ struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
+
DRM_DEBUG_KMS("disabling pipe %c due to force quirk\n",
pipe_name(pipe));
@@ -14737,8 +14757,7 @@ void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
I915_WRITE(PIPECONF(pipe), 0);
POSTING_READ(PIPECONF(pipe));
- if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100))
- DRM_ERROR("pipe %c off wait timed out\n", pipe_name(pipe));
+ intel_wait_for_pipe_scanline_stopped(crtc);
I915_WRITE(DPLL(pipe), DPLL_VGA_MODE_DIS);
POSTING_READ(DPLL(pipe));
--
2.13.6
On Wed, Nov 29, 2017 at 10:07:02AM +0000, Mark Brown wrote:
> On Wed, Nov 29, 2017 at 09:07:45AM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Nov 28, 2017 at 11:07:01PM +0530, Naresh Kamboju wrote:
>
> > > Results from Linaro’s test farm.
> > > No regressions on arm64, arm and x86_64.
>
> > Thanks for testing.
>
> > What is up with the odd email subject prefix?
>
> There's another internal list for looking at LKFT been set up and
> they've set it up as a mailman list adding subject prefixes :(
That's a pretty horrid thing to spam the public with :(
I'm going to drop the linaro.org address from my -rc announcements now
until someone learns how to properly sort their mailing lists by mail
headers, and not email subjects...
greg k-h
Some drivers like i915 start with crtc's enabled, but with deferred
fbcon setup they were no longer disabled as part of fbdev setup.
Headless units could no longer enter pc3 state because the crtc was
still enabled.
Fix this by calling restore_fbdev_mode when we would have called
it otherwise once during initial fbdev setup.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Fixes: ca91a2758fce ("drm/fb-helper: Support deferred setup")
Cc: <stable(a)vger.kernel.org> # v4.14+
Reported-by: Thomas Voegtle <tv(a)lio96.de>
---
drivers/gpu/drm/drm_fb_helper.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 07374008f146..e56166334455 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1809,6 +1809,10 @@ static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper,
if (crtc_count == 0 || sizes.fb_width == -1 || sizes.fb_height == -1) {
DRM_INFO("Cannot find any crtc or sizes\n");
+
+ /* First time: disable all crtc's.. */
+ if (!fb_helper->deferred_setup && !READ_ONCE(fb_helper->dev->master))
+ restore_fbdev_mode(fb_helper);
return -EAGAIN;
}
--
2.15.0
From: Daniel Jurgens <danielj(a)mellanox.com>
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.
This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: stable(a)vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
---
Changelog:
v2->v3: Fix build warning
v1->v2: Fixed build errors
v0->v1: Added proper SElinux patch
---
drivers/infiniband/core/security.c | 47 +++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 209d057..5bc323f 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec)
int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
{
+ u8 i = rdma_start_port(dev);
+ bool is_ib = false;
int ret;
+ while (i <= rdma_end_port(dev) && !is_ib)
+ is_ib = rdma_protocol_ib(dev, i++);
+
+ /* If this isn't an IB device don't create the security context */
+ if (!is_ib)
+ return 0;
+
qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL);
if (!qp->qp_sec)
return -ENOMEM;
@@ -441,6 +450,10 @@ int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
{
+ /* Return if not IB */
+ if (!sec)
+ return;
+
mutex_lock(&sec->mutex);
/* Remove the QP from the lists so it won't get added to
@@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
int ret;
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is in progress this
* QP security could be marked for an error state
* transition. Wait for this to complete.
@@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec)
{
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is occurring we must
* wait until this QP security structure is processed
* in the QP to error flow before destroying it because
@@ -557,7 +578,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
{
int ret = 0;
struct ib_ports_pkeys *tmp_pps;
- struct ib_ports_pkeys *new_pps;
+ struct ib_ports_pkeys *new_pps = NULL;
struct ib_qp *real_qp = qp->real_qp;
bool special_qp = (real_qp->qp_type == IB_QPT_SMI ||
real_qp->qp_type == IB_QPT_GSI ||
@@ -565,17 +586,25 @@ int ib_security_modify_qp(struct ib_qp *qp,
bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
(qp_attr_mask & IB_QP_ALT_PATH));
+ WARN_ONCE((qp_attr_mask & IB_QP_PORT &&
+ rdma_protocol_ib(real_qp->device, qp_attr->port_num) &&
+ !real_qp->qp_sec),
+ "%s: QP security is not initialized for IB QP: %d\n",
+ __func__, real_qp->qp_num);
+
/* The port/pkey settings are maintained only for the real QP. Open
* handles on the real QP will be in the shared_qp_list. When
* enforcing security on the real QP all the shared QPs will be
* checked as well.
*/
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qp && real_qp->qp_sec) {
mutex_lock(&real_qp->qp_sec->mutex);
new_pps = get_new_pps(real_qp,
qp_attr,
qp_attr_mask);
+ if (!new_pps)
+ return -ENOMEM;
/* Add this QP to the lists for the new port
* and pkey settings before checking for permission
@@ -600,7 +629,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
qp_attr_mask,
udata);
- if (pps_change && !special_qp) {
+ if (new_pps) {
/* Clean up the lists and free the appropriate
* ports_pkeys structure.
*/
@@ -630,6 +659,9 @@ static int ib_security_pkey_access(struct ib_device *dev,
u16 pkey;
int ret;
+ if (!rdma_protocol_ib(dev, port_num))
+ return 0;
+
ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey);
if (ret)
return ret;
@@ -663,6 +695,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
{
int ret;
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return 0;
+
ret = security_ib_alloc_security(&agent->security);
if (ret)
return ret;
@@ -688,6 +723,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
{
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return;
+
security_ib_free_security(agent->security);
if (agent->lsm_nb_reg)
unregister_lsm_notifier(&agent->lsm_nb);
@@ -695,6 +733,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index)
{
+ if (!rdma_protocol_ib(map->agent.device, map->agent.port_num))
+ return 0;
+
if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
return -EACCES;
--
1.8.3.1
The direction_output callback of the gpio_chip structure is supposed to
set the output direction but also to set the value of the gpio. For the
armada-37xx driver this callback acted as the gpio_set_direction callback
for the pinctrl.
This patch fixes the behavior of the direction_output callback by also
applying the value received as parameter.
Cc: stable(a)vger.kernel.org
Fixes: 5715092a458c ("pinctrl: armada-37xx: Add gpio support")
Reported-by: Alexandre Belloni <alexandre.belloni(a)free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
---
drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
index 71b944748304..c5fe7d4a9065 100644
--- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
+++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c
@@ -408,12 +408,21 @@ static int armada_37xx_gpio_direction_output(struct gpio_chip *chip,
{
struct armada_37xx_pinctrl *info = gpiochip_get_data(chip);
unsigned int reg = OUTPUT_EN;
- unsigned int mask;
+ unsigned int mask, val, ret;
armada_37xx_update_reg(®, offset);
mask = BIT(offset);
- return regmap_update_bits(info->regmap, reg, mask, mask);
+ ret = regmap_update_bits(info->regmap, reg, mask, mask);
+
+ if (ret)
+ return ret;
+
+ reg = OUTPUT_VAL;
+ val = value ? mask : 0;
+ regmap_update_bits(info->regmap, reg, mask, val);
+
+ return 0;
}
static int armada_37xx_gpio_get(struct gpio_chip *chip, unsigned int offset)
--
2.14.2
Fix child-node lookup during probe, which ended up searching the whole
device tree depth-first starting at the parent rather than just matching
on its children.
This would only cause trouble if the child node is missing while there
is an unrelated node named "backlight" elsewhere in the tree.
Fixes: eebfdc17cc6c ("backlight: Add TPS65217 WLED driver")
Cc: stable <stable(a)vger.kernel.org> # 3.7
Cc: Matthias Kaehlcke <matthias(a)kaehlcke.net>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/video/backlight/tps65217_bl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/video/backlight/tps65217_bl.c b/drivers/video/backlight/tps65217_bl.c
index 380917c86276..762e3feed097 100644
--- a/drivers/video/backlight/tps65217_bl.c
+++ b/drivers/video/backlight/tps65217_bl.c
@@ -184,11 +184,11 @@ static struct tps65217_bl_pdata *
tps65217_bl_parse_dt(struct platform_device *pdev)
{
struct tps65217 *tps = dev_get_drvdata(pdev->dev.parent);
- struct device_node *node = of_node_get(tps->dev->of_node);
+ struct device_node *node;
struct tps65217_bl_pdata *pdata, *err;
u32 val;
- node = of_find_node_by_name(node, "backlight");
+ node = of_get_child_by_name(tps->dev->of_node, "backlight");
if (!node)
return ERR_PTR(-ENODEV);
--
2.15.0
A helper purported to look up a child node based on its name was using
the wrong of-helper and ended up prematurely freeing the parent of-node
while leaking any matching node.
To make things worse, any matching node would not even necessarily be a
child node as the whole device tree was searched depth-first starting at
the parent.
Fixes: 019a7e6b7b31 ("mfd: twl4030-audio: Add DT support")
Cc: stable <stable(a)vger.kernel.org> # 3.7
Cc: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
---
drivers/mfd/twl4030-audio.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/mfd/twl4030-audio.c b/drivers/mfd/twl4030-audio.c
index da16bf45fab4..dc94ffc6321a 100644
--- a/drivers/mfd/twl4030-audio.c
+++ b/drivers/mfd/twl4030-audio.c
@@ -159,13 +159,18 @@ unsigned int twl4030_audio_get_mclk(void)
EXPORT_SYMBOL_GPL(twl4030_audio_get_mclk);
static bool twl4030_audio_has_codec(struct twl4030_audio_data *pdata,
- struct device_node *node)
+ struct device_node *parent)
{
+ struct device_node *node;
+
if (pdata && pdata->codec)
return true;
- if (of_find_node_by_name(node, "codec"))
+ node = of_get_child_by_name(parent, "codec");
+ if (node) {
+ of_node_put(node);
return true;
+ }
return false;
}
--
2.15.0
On the Tegra124 Nyan-Big chromebook the very first SPI message sent to
the EC is failing.
The Tegra SPI driver configures the SPI chip-selects to be active-high
by default (and always has for many years). The EC SPI requires an
active-low chip-select and so the Tegra chip-select is reconfigured to
be active-low when the EC SPI driver calls spi_setup(). The problem is
that if the first SPI message to the EC is sent too soon after
reconfiguring the SPI chip-select, it fails.
The EC SPI driver prevents back-to-back SPI messages being sent too
soon by keeping track of the time the last transfer was sent via the
variable 'last_transfer_ns'. To prevent the very first transfer being
sent too soon, initialise the 'last_transfer_ns' variable after calling
spi_setup() and before sending the first SPI message.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
Reviewed-by: Brian Norris <briannorris(a)chromium.org>
---
Changes since V1:
- Added stable-tag and Brian's reviewed-by.
Looks like this issue has been around for several Linux releases now
and it just depends on timing if this issue is seen or not and so there
is no specific commit this fixes. However, would be good to include for
v4.15.
drivers/mfd/cros_ec_spi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c
index c9714072e224..a14196e95e9b 100644
--- a/drivers/mfd/cros_ec_spi.c
+++ b/drivers/mfd/cros_ec_spi.c
@@ -667,6 +667,7 @@ static int cros_ec_spi_probe(struct spi_device *spi)
sizeof(struct ec_response_get_protocol_info);
ec_dev->dout_size = sizeof(struct ec_host_request);
+ ec_spi->last_transfer_ns = ktime_get_ns();
err = cros_ec_register(ec_dev);
if (err) {
--
2.7.4
The commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout()
support THP") changed mem_cgroup_swapout() to support transparent huge
page (THP). However the patch missed one location which should be
changed for correctly handling THPs. The resulting bug will cause the
memory cgroups whose THPs were swapped out to become zombies on
deletion.
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: stable(a)vger.kernel.org
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50e6906314f8..ac2ffd5e02b9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
- css_put(&memcg->css);
+ css_put_many(&memcg->css, nr_entries);
}
/**
--
2.15.0.417.g466bffb3ac-goog
This is the start of the stable review cycle for the 3.18.85 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Nov 30 10:03:41 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.85-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.85-rc1
Juergen Gross <jgross(a)suse.com>
xen: xenbus driver must not accept invalid transaction ids
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390/kbuild: enable modversions for symbols exported from asm
Richard Fitzgerald <rf(a)opensource.wolfsonmicro.com>
ASoC: wm_adsp: Don't overrun firmware file buffer when reading region data
Pan Bian <bianpan2016(a)163.com>
btrfs: return the actual error value from from btrfs_uuid_tree_iterate
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix oob access
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_queue: use raw_smp_processor_id()
Pan Bian <bianpan2016(a)163.com>
staging: iio: cdc: fix improper return value
Masashi Honma <masashi.honma(a)gmail.com>
mac80211: Suppress NEW_PEER_CANDIDATE event if no room
Masashi Honma <masashi.honma(a)gmail.com>
mac80211: Remove invalid flag operations in mesh TSF synchronization
Gabriele Mazzotta <gabriele.mzt(a)gmail.com>
ALSA: hda - Apply ALC269_FIXUP_NO_SHUTUP on HDA_FIXUP_ACT_PROBE
Daniel Vetter <daniel.vetter(a)ffwll.ch>
drm/armada: Fix compile fail
Thomas Preisner <thomas.preisner+linux(a)fau.de>
net: 3com: typhoon: typhoon_init_one: fix incorrect return values
Thomas Preisner <thomas.preisner+linux(a)fau.de>
net: 3com: typhoon: typhoon_init_one: make return values more specific
Bjorn Helgaas <bhelgaas(a)google.com>
PCI: Apply _HPX settings only to relevant devices
Santosh Shilimkar <santosh.shilimkar(a)oracle.com>
RDS: RDMA: return appropriate error on rdma map failures
Benjamin Poirier <bpoirier(a)suse.com>
e1000e: Separate signaling for link check/link up
Benjamin Poirier <bpoirier(a)suse.com>
e1000e: Fix return value test
Benjamin Poirier <bpoirier(a)suse.com>
e1000e: Fix error path in link detection
Ben Hutchings <ben.hutchings(a)codethink.co.uk>
iio: iio-trig-periodic-rtc: Free trigger resource correctly
Oliver Neukum <oneukum(a)suse.com>
USB: fix buffer overflows with parsing CDC headers
Brent Taylor <motobud(a)gmail.com>
mtd: nand: Fix writing mtdoops to nand flash.
Tuomas Tynkkynen <tuomas(a)tuxera.com>
net/9p: Switch to wait_event_killable()
Ricardo Ribalda Delgado <ricardo.ribalda(a)gmail.com>
media: v4l2-ctrl: Fix flags field on Control events
Sean Young <sean(a)mess.org>
media: rc: check for integer overflow
Michele Baldessari <michele(a)acksyn.org>
media: Don't do DMA on stack for firmware upload in the AS102 driver
Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
powerpc/signal: Properly handle return value from uprobe_deny_signal()
John David Anglin <dave.anglin(a)bell.net>
parisc: Fix validity check of pointer size argument in new CAS implementation
Brian King <brking(a)linux.vnet.ibm.com>
ixgbe: Fix skb list corruption on Power systems
Brian King <brking(a)linux.vnet.ibm.com>
fm10k: Use smp_rmb rather than read_barrier_depends
Brian King <brking(a)linux.vnet.ibm.com>
i40evf: Use smp_rmb rather than read_barrier_depends
Brian King <brking(a)linux.vnet.ibm.com>
ixgbevf: Use smp_rmb rather than read_barrier_depends
Brian King <brking(a)linux.vnet.ibm.com>
igbvf: Use smp_rmb rather than read_barrier_depends
Brian King <brking(a)linux.vnet.ibm.com>
igb: Use smp_rmb rather than read_barrier_depends
Brian King <brking(a)linux.vnet.ibm.com>
i40e: Use smp_rmb rather than read_barrier_depends
Wang YanQing <udknight(a)gmail.com>
time: Always make sure wall_to_monotonic isn't positive
Johan Hovold <johan(a)kernel.org>
NFC: fix device-allocation error return
Bart Van Assche <bart.vanassche(a)wdc.com>
IB/srpt: Do not accept invalid initiator port names
Johan Hovold <johan(a)kernel.org>
clk: ti: dra7-atl-clock: fix child-node lookups
Peter Ujfalusi <peter.ujfalusi(a)ti.com>
clk: ti: dra7-atl-clock: Fix of_node reference counting
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: SVM: obey guest PAT
Ladi Prosek <lprosek(a)redhat.com>
KVM: nVMX: set IDTR and GDTR limits when loading L1 host state
Nicholas Bellinger <nab(a)linux-iscsi.org>
iscsi-target: Fix non-immediate TMR reference leak
Tuomas Tynkkynen <tuomas(a)tuxera.com>
fs/9p: Compare qid.path in v9fs_test_inode
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Remove kernel warning at compat ioctl error paths
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add sanity checks in v2 clock parsers
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Add sanity checks to FE parser
Theodore Ts'o <tytso(a)mit.edu>
ext4: fix interaction between i_size, fallocate, and delalloc after a crash
Andrew Elble <aweits(a)rit.edu>
nfsd: deal with revoked delegations appropriately
Chuck Lever <chuck.lever(a)oracle.com>
nfs: Fix ugly referral attributes
Joshua Watt <jpewhacker(a)gmail.com>
NFS: Fix typo in nomigration mount option
Arnd Bergmann <arnd(a)arndb.de>
isofs: fix timestamps beyond 2027
Coly Li <colyli(a)suse.de>
bcache: check ca->alloc_thread initialized before wake up it
Dan Carpenter <dan.carpenter(a)oracle.com>
eCryptfs: use after free in ecryptfs_release_messaging()
Andreas Rohner <andreas.rohner(a)gmx.net>
nilfs2: fix race condition that causes file system corruption
NeilBrown <neilb(a)suse.com>
autofs: don't fail mount for transient error
Mirko Parthey <mirko.parthey(a)web.de>
MIPS: BCM47XX: Fix LED inversion for WRT54GSv1
Maciej W. Rozycki <macro(a)mips.com>
MIPS: Fix an n32 core file generation regset support regression
Hou Tao <houtao1(a)huawei.com>
dm: fix race between dm_get_from_kobject() and __dm_destroy()
Eric Biggers <ebiggers(a)google.com>
dm bufio: fix integer overflow when limiting maximum cache size
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
ALSA: hda: Add Raven PCI ID
Philip Derrin <philip(a)cog.systems>
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
Masami Hiramatsu <mhiramat(a)kernel.org>
x86/decoder: Add new TEST instruction pattern
Eric Biggers <ebiggers(a)google.com>
lib/mpi: call cond_resched() from mpi_powm() loop
Paul E. McKenney <paulmck(a)linux.vnet.ibm.com>
sched: Make resched_cpu() unconditional
WANG Cong <xiyou.wangcong(a)gmail.com>
ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER
Vasily Gorbik <gor(a)linux.vnet.ibm.com>
s390/disassembler: increase show_code buffer size
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/mm/dump.c | 4 ++--
arch/mips/bcm47xx/leds.c | 2 +-
arch/mips/kernel/ptrace.c | 17 +++++++++++++
arch/parisc/kernel/syscall.S | 6 ++---
arch/powerpc/kernel/signal.c | 2 +-
arch/s390/include/asm/asm-prototypes.h | 8 +++++++
arch/s390/kernel/dis.c | 4 ++--
arch/x86/kvm/svm.c | 7 ++++++
arch/x86/kvm/vmx.c | 2 ++
arch/x86/lib/x86-opcode-map.txt | 2 +-
drivers/clk/ti/clk-dra7-atl.c | 3 ++-
drivers/gpu/drm/armada/Makefile | 2 ++
drivers/infiniband/ulp/srpt/ib_srpt.c | 9 ++++---
drivers/md/bcache/alloc.c | 3 ++-
drivers/md/dm-bufio.c | 15 +++++-------
drivers/md/dm.c | 12 ++++++----
drivers/media/rc/ir-lirc-codec.c | 9 ++++---
drivers/media/usb/as102/as102_fw.c | 28 +++++++++++++---------
drivers/media/v4l2-core/v4l2-ctrls.c | 16 +++++++++----
drivers/mtd/nand/nand_base.c | 9 ++++---
drivers/net/ethernet/3com/typhoon.c | 25 ++++++++++---------
drivers/net/ethernet/intel/e1000e/mac.c | 11 ++++++---
drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++--
drivers/net/ethernet/intel/e1000e/phy.c | 7 +++---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/igbvf/netdev.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 +-
drivers/net/usb/cdc_ether.c | 9 ++++++-
drivers/pci/probe.c | 15 ++++++++++--
drivers/staging/iio/cdc/ad7150.c | 2 +-
.../staging/iio/trigger/iio-trig-periodic-rtc.c | 6 ++---
drivers/target/iscsi/iscsi_target.c | 8 ++++---
drivers/usb/class/cdc-acm.c | 2 +-
drivers/usb/class/cdc-wdm.c | 2 ++
drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
fs/9p/vfs_inode.c | 3 +++
fs/9p/vfs_inode_dotl.c | 3 +++
fs/autofs4/waitq.c | 15 +++++++++++-
fs/btrfs/uuid-tree.c | 4 +---
fs/ecryptfs/messaging.c | 7 +++---
fs/ext4/extents.c | 6 +++--
fs/isofs/isofs.h | 2 +-
fs/isofs/rock.h | 2 +-
fs/isofs/util.c | 2 +-
fs/nfs/nfs4proc.c | 18 +++++++-------
fs/nfs/super.c | 2 +-
fs/nfsd/nfs4state.c | 25 ++++++++++++++++++-
fs/nilfs2/segment.c | 6 +++--
kernel/sched/core.c | 3 +--
kernel/time/timekeeping.c | 13 +++++++---
lib/mpi/mpi-pow.c | 2 ++
net/9p/client.c | 3 +--
net/9p/trans_virtio.c | 13 +++++-----
net/ipv6/route.c | 6 ++++-
net/mac80211/ieee80211_i.h | 1 -
net/mac80211/mesh.c | 3 ---
net/mac80211/mesh_plink.c | 14 ++++++-----
net/mac80211/mesh_sync.c | 11 ---------
net/netfilter/nf_tables_api.c | 2 +-
net/netfilter/nft_queue.c | 2 +-
net/nfc/core.c | 2 +-
net/rds/send.c | 11 ++++++++-
sound/core/timer_compat.c | 12 +++++-----
sound/pci/hda/hda_intel.c | 3 +++
sound/pci/hda/patch_realtek.c | 2 +-
sound/soc/codecs/wm_adsp.c | 25 ++++++++++++++++++-
sound/usb/clock.c | 9 ++++---
sound/usb/mixer.c | 15 +++++++++++-
74 files changed, 350 insertions(+), 170 deletions(-)
If the call __alloc_contig_migrate_range() in alloc_contig_range
returns -EBUSY, processing continues so that test_pages_isolated()
is called where there is a tracepoint to identify the busy pages.
However, it is possible for busy pages to become available between
the calls to these two routines. In this case, the range of pages
may be allocated. Unfortunately, the original return code (ret
== -EBUSY) is still set and returned to the caller. Therefore,
the caller believes the pages were not allocated and they are leaked.
Update the return code with the value from test_pages_isolated().
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c5c57b..3605ca82fd29 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7632,10 +7632,10 @@ int alloc_contig_range(unsigned long start, unsigned long end,
}
/* Make sure the range is really isolated. */
- if (test_pages_isolated(outer_start, end, false)) {
+ ret = test_pages_isolated(outer_start, end, false);
+ if (ret) {
pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
__func__, outer_start, end);
- ret = -EBUSY;
goto done;
}
--
2.13.6
From: Eric Biggers <ebiggers(a)google.com>
When asked to encrypt or decrypt 0 bytes, both the generic and x86
implementations of Salsa20 crash in blkcipher_walk_done(), either when
doing 'kfree(walk->buffer)' or 'free_page((unsigned long)walk->page)',
because walk->buffer and walk->page have not been initialized.
The bug is that Salsa20 is calling blkcipher_walk_done() even when
nothing is in 'walk.nbytes'. But blkcipher_walk_done() is only meant to
be called when a nonzero number of bytes have been provided.
The broken code is part of an optimization that tries to make only one
call to salsa20_encrypt_bytes() to process inputs that are not evenly
divisible by 64 bytes. To fix the bug, just remove this "optimization"
and use the blkcipher_walk API the same way all the other users do.
Reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
int algfd, reqfd;
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "salsa20",
};
char key[16] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (void *)&addr, sizeof(addr));
reqfd = accept(algfd, 0, 0);
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
read(reqfd, key, sizeof(key));
}
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Fixes: eb6f13eb9f81 ("[CRYPTO] salsa20_generic: Fix multi-page processing")
Cc: <stable(a)vger.kernel.org> # v2.6.25+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
arch/x86/crypto/salsa20_glue.c | 7 -------
crypto/salsa20_generic.c | 7 -------
2 files changed, 14 deletions(-)
diff --git a/arch/x86/crypto/salsa20_glue.c b/arch/x86/crypto/salsa20_glue.c
index 399a29d067d6..cb91a64a99e7 100644
--- a/arch/x86/crypto/salsa20_glue.c
+++ b/arch/x86/crypto/salsa20_glue.c
@@ -59,13 +59,6 @@ static int encrypt(struct blkcipher_desc *desc,
salsa20_ivsetup(ctx, walk.iv);
- if (likely(walk.nbytes == nbytes))
- {
- salsa20_encrypt_bytes(ctx, walk.src.virt.addr,
- walk.dst.virt.addr, nbytes);
- return blkcipher_walk_done(desc, &walk, 0);
- }
-
while (walk.nbytes >= 64) {
salsa20_encrypt_bytes(ctx, walk.src.virt.addr,
walk.dst.virt.addr,
diff --git a/crypto/salsa20_generic.c b/crypto/salsa20_generic.c
index f550b5d94630..d7da0eea5622 100644
--- a/crypto/salsa20_generic.c
+++ b/crypto/salsa20_generic.c
@@ -188,13 +188,6 @@ static int encrypt(struct blkcipher_desc *desc,
salsa20_ivsetup(ctx, walk.iv);
- if (likely(walk.nbytes == nbytes))
- {
- salsa20_encrypt_bytes(ctx, walk.dst.virt.addr,
- walk.src.virt.addr, nbytes);
- return blkcipher_walk_done(desc, &walk, 0);
- }
-
while (walk.nbytes >= 64) {
salsa20_encrypt_bytes(ctx, walk.dst.virt.addr,
walk.src.virt.addr,
--
2.15.0
From: Eric Biggers <ebiggers(a)google.com>
Because the HMAC template didn't check that its underlying hash
algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))"
through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC
being used without having been keyed, resulting in sha3_update() being
called without sha3_init(), causing a stack buffer overflow.
This is a very old bug, but it seems to have only started causing real
problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3)
because the innermost hash's state is ->import()ed from a zeroed buffer,
and it just so happens that other hash algorithms are fine with that,
but SHA-3 is not. However, there could be arch or hardware-dependent
hash algorithms also affected; I couldn't test everything.
Fix the bug by introducing a function crypto_shash_alg_has_setkey()
which tests whether a shash algorithm is keyed. Then update the HMAC
template to require that its underlying hash algorithm is unkeyed.
Here is a reproducer:
#include <linux/if_alg.h>
#include <sys/socket.h>
int main()
{
int algfd;
struct sockaddr_alg addr = {
.salg_type = "hash",
.salg_name = "hmac(hmac(sha3-512-generic))",
};
char key[4096] = { 0 };
algfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(algfd, (const struct sockaddr *)&addr, sizeof(addr));
setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
}
Here was the KASAN report from syzbot:
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341 [inline]
BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
Write of size 4096 at addr ffff8801cca07c40 by task syzkaller076574/3044
CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
check_memory_region_inline mm/kasan/kasan.c:260 [inline]
check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
memcpy+0x37/0x50 mm/kasan/kasan.c:303
memcpy include/linux/string.h:341 [inline]
sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161
crypto_shash_update+0xcb/0x220 crypto/shash.c:109
shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151
crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
hmac_finup+0x182/0x330 crypto/hmac.c:152
crypto_shash_finup+0xc4/0x120 crypto/shash.c:165
shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172
crypto_shash_digest+0xc4/0x120 crypto/shash.c:186
hmac_setkey+0x36a/0x690 crypto/hmac.c:66
crypto_shash_setkey+0xad/0x190 crypto/shash.c:64
shash_async_setkey+0x47/0x60 crypto/shash.c:207
crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200
hash_setkey+0x40/0x90 crypto/algif_hash.c:446
alg_setkey crypto/af_alg.c:221 [inline]
alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254
SYSC_setsockopt net/socket.c:1851 [inline]
SyS_setsockopt+0x189/0x360 net/socket.c:1830
entry_SYSCALL_64_fastpath+0x1f/0x96
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
crypto/hmac.c | 6 +++++-
crypto/shash.c | 5 +++--
include/crypto/internal/hash.h | 8 ++++++++
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/crypto/hmac.c b/crypto/hmac.c
index 92871dc2a63e..e74730224f0a 100644
--- a/crypto/hmac.c
+++ b/crypto/hmac.c
@@ -195,11 +195,15 @@ static int hmac_create(struct crypto_template *tmpl, struct rtattr **tb)
salg = shash_attr_alg(tb[1], 0, 0);
if (IS_ERR(salg))
return PTR_ERR(salg);
+ alg = &salg->base;
+ /* The underlying hash algorithm must be unkeyed */
err = -EINVAL;
+ if (crypto_shash_alg_has_setkey(salg))
+ goto out_put_alg;
+
ds = salg->digestsize;
ss = salg->statesize;
- alg = &salg->base;
if (ds > alg->cra_blocksize ||
ss < alg->cra_blocksize)
goto out_put_alg;
diff --git a/crypto/shash.c b/crypto/shash.c
index 325a14da5827..e849d3ee2e27 100644
--- a/crypto/shash.c
+++ b/crypto/shash.c
@@ -25,11 +25,12 @@
static const struct crypto_type crypto_shash_type;
-static int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
- unsigned int keylen)
+int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen)
{
return -ENOSYS;
}
+EXPORT_SYMBOL_GPL(shash_no_setkey);
static int shash_setkey_unaligned(struct crypto_shash *tfm, const u8 *key,
unsigned int keylen)
diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h
index f0b44c16e88f..c2bae8da642c 100644
--- a/include/crypto/internal/hash.h
+++ b/include/crypto/internal/hash.h
@@ -82,6 +82,14 @@ int ahash_register_instance(struct crypto_template *tmpl,
struct ahash_instance *inst);
void ahash_free_instance(struct crypto_instance *inst);
+int shash_no_setkey(struct crypto_shash *tfm, const u8 *key,
+ unsigned int keylen);
+
+static inline bool crypto_shash_alg_has_setkey(struct shash_alg *alg)
+{
+ return alg->setkey != shash_no_setkey;
+}
+
int crypto_init_ahash_spawn(struct crypto_ahash_spawn *spawn,
struct hash_alg_common *alg,
struct crypto_instance *inst);
--
2.15.0
The patch titled
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
has been added to the -mm tree. Its filename is
mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-cma-fix-alloc_contig_range-ret-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-cma-fix-alloc_contig_range-ret-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called where
there is a tracepoint to identify the busy pages. However, it is possible
for busy pages to become available between the calls to these two
routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were not
allocated and they are leaked.
Update comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2
+++ a/mm/page_alloc.c
@@ -7652,11 +7652,18 @@ int alloc_contig_range(unsigned long sta
/*
* In case of -EBUSY, we'd like to know which page causes problem.
- * So, just fall through. We will check it in test_pages_isolated().
+ * So, just fall through. test_pages_isolated() has a tracepoint
+ * which will report the busy page.
+ *
+ * It is possible that busy pages could become available before
+ * the call to test_pages_isolated, and the range will actually be
+ * allocated. So, if we fall through be sure to clear ret so that
+ * -EBUSY is not accidentally used or returned to caller.
*/
ret = __alloc_contig_migrate_range(&cc, start, end);
if (ret && ret != -EBUSY)
goto done;
+ ret =0;
/*
* Pages from [start, end) are within a MAX_ORDER_NR_PAGES
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2.patch
The patch titled
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
has been removed from the -mm tree. Its filename was
mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/cma: fix alloc_contig_range ret code/potential leak
In an attempt to make contiguous allocation routines more available to
drivers, I have been experimenting with code similar to that used by
alloc_gigantic_page(). While stressing this code with many other
allocations and frees in progress, I would sometimes notice large
'leaks' of page ranges.
I traced this down to the routine alloc_contig_range() itself. In
8ef5849fa8a2 ("mm/cma: always check which page caused allocation
failure") the code was changed so that an -EBUSY returned by
__alloc_contig_migrate_range() would not immediately return to the
caller. Rather, processing continues so that test_pages_isolated() is
eventually called. This is done because test_pages_isolated() has a
tracepoint to identify the busy pages.
However, it is possible (observed in my testing) that pages which were
busy when __alloc_contig_migrate_range was called may become available
by the time test_pages_isolated is called. Further, it is possible
that the entire range can actually be allocated. Unfortunately, in
this case the return code originally set by
__alloc_contig_migrate_range (-EBUSY) is returned to the calller.
Therefore, the caller assumes the range was not allocated and the pages
are essentially leaked.
The following patch simply updates the return code based on the value
returned from test_pages_isolated.
It is unlikely that we will hit this issue today based on the limited
number of callers to alloc_contig_range. However, I have Cc'ed stable
because if we do hit this issue it has the potential to leak a large
number of pages.
If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called where
there is a tracepoint to identify the busy pages. However, it is possible
for busy pages to become available between the calls to these two
routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were not
allocated and they are leaked.
Update the return code with the value from test_pages_isolated().
Link: http://lkml.kernel.org/r/20171120193930.23428-2-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Nazarewicz <mina86(a)mina86.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff -puN mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak mm/page_alloc.c
--- a/mm/page_alloc.c~mm-cma-fix-alloc_contig_range-ret-code-potential-leak
+++ a/mm/page_alloc.c
@@ -7702,10 +7702,10 @@ int alloc_contig_range(unsigned long sta
}
/* Make sure the range is really isolated. */
- if (test_pages_isolated(outer_start, end, false)) {
+ ret = test_pages_isolated(outer_start, end, false);
+ if (ret) {
pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
__func__, outer_start, end);
- ret = -EBUSY;
goto done;
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
mm-cma-fix-alloc_contig_range-ret-code-potential-leak-v2.patch
The patch titled
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-null-pointer-derefe…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-null-pointer-derefe…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine
I made a mistake during converting hugetlb code to 5-level paging: in
huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset(). Otherwise
it leads to crash -- NULL-pointer dereference in pud_alloc() if p4d table
is not yet allocated.
It only can happen in 5-level paging mode. In 4-level paging mode
p4d_offset() always returns pgd, so we are fine.
Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel…
Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine mm/hugetlb.c
--- a/mm/hugetlb.c~mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine
+++ a/mm/hugetlb.c
@@ -4635,7 +4635,9 @@ pte_t *huge_pte_alloc(struct mm_struct *
pte_t *pte = NULL;
pgd = pgd_offset(mm, addr);
- p4d = p4d_offset(pgd, addr);
+ p4d = p4d_alloc(mm, pgd, addr);
+ if (!p4d)
+ return NULL;
pud = pud_alloc(mm, p4d, addr);
if (pud) {
if (sz == PUD_SIZE) {
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
The patch titled
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
has been added to the -mm tree. Its filename is
autofs-revert-fix-at_no_automount-not-being-honored.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/autofs-revert-fix-at_no_automount-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-revert-fix-at_no_automount-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored"
42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the
fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but
introduced a semantic change.
In order to honor AT_NO_AUTOMOUNT a semantic change was made to the
negative dentry case for stat family system calls in follow_automount().
This changed the unconditional triggering of an automount in this case to
no longer be done and an error returned instead.
This has caused more problems than I expected so reverting the change is
needed.
In a discussion with Neil Brown it was concluded that the automount(8)
daemon can implement this change without kernel modifications. So that
will be done instead and the autofs module documentation updated with a
description of the problem and what needs to be done by module users for
this specific case.
Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.…
Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Cc: Neil Brown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Colin Walters <walters(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/namei.c | 15 +++------------
include/linux/fs.h | 3 ++-
2 files changed, 5 insertions(+), 13 deletions(-)
diff -puN fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored fs/namei.c
--- a/fs/namei.c~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/fs/namei.c
@@ -1129,18 +1129,9 @@ static int follow_automount(struct path
* of the daemon to instantiate them before they can be used.
*/
if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
- LOOKUP_OPEN | LOOKUP_CREATE |
- LOOKUP_AUTOMOUNT))) {
- /* Positive dentry that isn't meant to trigger an
- * automount, EISDIR will allow it to be used,
- * otherwise there's no mount here "now" so return
- * ENOENT.
- */
- if (path->dentry->d_inode)
- return -EISDIR;
- else
- return -ENOENT;
- }
+ LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
+ path->dentry->d_inode)
+ return -EISDIR;
if (path->dentry->d_sb->s_user_ns != &init_user_ns)
return -EACCES;
diff -puN include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored include/linux/fs.h
--- a/include/linux/fs.h~autofs-revert-fix-at_no_automount-not-being-honored
+++ a/include/linux/fs.h
@@ -3088,7 +3088,8 @@ static inline int vfs_lstat(const char _
static inline int vfs_fstatat(int dfd, const char __user *filename,
struct kstat *stat, int flags)
{
- return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS);
+ return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT,
+ stat, STATX_BASIC_STATS);
}
static inline int vfs_fstat(int fd, struct kstat *stat)
{
_
Patches currently in -mm which might be from raven(a)themaw.net are
autofs-revert-take-more-care-to-not-update-last_used-on-path-walk.patch
autofs-revert-fix-at_no_automount-not-being-honored.patch
The patch titled
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
has been added to the -mm tree. Its filename is
autofs-revert-take-more-care-to-not-update-last_used-on-path-walk.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/autofs-revert-take-more-care-to-no…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/autofs-revert-take-more-care-to-no…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ian Kent <raven(a)themaw.net>
Subject: autofs: revert "autofs: take more care to not update last_used on path walk"
While 092a53452b ("autofs: take more care to not update last_used on path
walk") helped (partially) resolve a problem where automounts were not
expiring due to aggressive accesses from user space it has a side effect
for very large environments.
This change helps with the expire problem by making the expire more
aggressive but, for very large environments, that means more mount
requests from clients. When there are a lot of clients that can mean
fairly significant server load increases.
It turns out I put the last_used in this position to solve this very
problem and failed to update my own thinking of the autofs expire policy.
So the patch being reverted introduces a regression which should be fixed.
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.…
Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk")
Signed-off-by: Ian Kent <raven(a)themaw.net>
Reviewed-by: NeilBrown <neilb(a)suse.com>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: <stable(a)vger.kernel.org> [4.11+]
Cc: Colin Walters <walters(a)redhat.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ondrej Holy <oholy(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/autofs4/root.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff -puN fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk fs/autofs4/root.c
--- a/fs/autofs4/root.c~autofs-revert-take-more-care-to-not-update-last_used-on-path-walk
+++ a/fs/autofs4/root.c
@@ -281,8 +281,8 @@ static int autofs4_mount_wait(const stru
pr_debug("waiting for mount name=%pd\n", path->dentry);
status = autofs4_wait(sbi, path, NFY_MOUNT);
pr_debug("mount wait done status=%d\n", status);
- ino->last_used = jiffies;
}
+ ino->last_used = jiffies;
return status;
}
@@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint
*/
if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) {
struct dentry *parent = dentry->d_parent;
+ struct autofs_info *ino;
struct dentry *new;
new = d_lookup(parent, &dentry->d_name);
if (!new)
return NULL;
- if (new == dentry)
- dput(new);
- else {
- struct autofs_info *ino;
-
- ino = autofs4_dentry_ino(new);
- ino->last_used = jiffies;
- dput(path->dentry);
- path->dentry = new;
- }
+ ino = autofs4_dentry_ino(new);
+ ino->last_used = jiffies;
+ dput(path->dentry);
+ path->dentry = new;
}
return path->dentry;
}
_
Patches currently in -mm which might be from raven(a)themaw.net are
autofs-revert-take-more-care-to-not-update-last_used-on-path-walk.patch
autofs-revert-fix-at_no_automount-not-being-honored.patch
The patch titled
Subject: fat: Fix sb_rdonly() change
has been added to the -mm tree. Its filename is
fat-fix-sb_rdonly-change.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/fat-fix-sb_rdonly-change.patch
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/fat-fix-sb_rdonly-change.patch
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: fat: Fix sb_rdonly() change
bc98a42c1f7d0f ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
converted fat_remount():new_rdonly from a bool to an int. However
fat_remount() depends upon the compiler's conversion of a non-zero integer
into boolean `true'.
Fix it by switching `new_rdonly' back into a bool.
Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp
Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)")
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Joe Perches <joe(a)perches.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/fat/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN fs/fat/inode.c~fat-fix-sb_rdonly-change fs/fat/inode.c
--- a/fs/fat/inode.c~fat-fix-sb_rdonly-change
+++ a/fs/fat/inode.c
@@ -779,7 +779,7 @@ static void __exit fat_destroy_inodecach
static int fat_remount(struct super_block *sb, int *flags, char *data)
{
- int new_rdonly;
+ bool new_rdonly;
struct msdos_sb_info *sbi = MSDOS_SB(sb);
*flags |= SB_NODIRATIME | (sbi->options.isvfat ? 0 : SB_NOATIME);
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
fat-fix-sb_rdonly-change.patch
The patch titled
Subject: kernel/async.c: revert "async: simplify lowest_in_progress()"
has been added to the -mm tree. Its filename is
revert-async-simplify-lowest_in_progress.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/revert-async-simplify-lowest_in_pr…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/revert-async-simplify-lowest_in_pr…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Subject: kernel/async.c: revert "async: simplify lowest_in_progress()"
This reverts 92266d6ef60c2381 ("async: simplify lowest_in_progress()"),
which was simply wrong: In the case where domain is NULL, we now use the
wrong offsetof() in the list_first_entry macro, so we don't actually fetch
the ->cookie value, but rather the eight bytes located sizeof(struct
list_head) further into the struct async_entry.
On 64 bit, that's the data member, while on 32 bit, that's a u64 built
from func and data in some order.
I think the bug happens to be harmless in practice: It obviously only
affects callers which pass a NULL domain, and AFAICT the only such caller
is
async_synchronize_full() ->
async_synchronize_full_domain(NULL) ->
async_synchronize_cookie_domain(ASYNC_COOKIE_MAX, NULL)
and the ASYNC_COOKIE_MAX means that in practice we end up waiting for the
async_global_pending list to be empty - but it would break if somebody
happened to pass (void*)-1 as the data element to async_schedule, and of
course also if somebody ever does a async_synchronize_cookie_domain(,
NULL) with a "finite" cookie value.
Maybe the "harmless in practice" means this isn't -stable material. But
I'm not completely confident my quick git grep'ing is enough, and there
might be affected code in one of the earlier kernels that has since been
removed, so I'll leave the decision to the stable guys.
Link: http://lkml.kernel.org/r/20171128104938.3921-1-linux@rasmusvillemoes.dk
Fixes: 92266d6ef60c "async: simplify lowest_in_progress()"
Signed-off-by: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Adam Wallis <awallis(a)codeaurora.org>
Cc: Lai Jiangshan <laijs(a)cn.fujitsu.com>
Cc: <stable(a)vger.kernel.org> [3.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/async.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff -puN kernel/async.c~revert-async-simplify-lowest_in_progress kernel/async.c
--- a/kernel/async.c~revert-async-simplify-lowest_in_progress
+++ a/kernel/async.c
@@ -84,20 +84,24 @@ static atomic_t entry_count;
static async_cookie_t lowest_in_progress(struct async_domain *domain)
{
- struct list_head *pending;
+ struct async_entry *first = NULL;
async_cookie_t ret = ASYNC_COOKIE_MAX;
unsigned long flags;
spin_lock_irqsave(&async_lock, flags);
- if (domain)
- pending = &domain->pending;
- else
- pending = &async_global_pending;
+ if (domain) {
+ if (!list_empty(&domain->pending))
+ first = list_first_entry(&domain->pending,
+ struct async_entry, domain_list);
+ } else {
+ if (!list_empty(&async_global_pending))
+ first = list_first_entry(&async_global_pending,
+ struct async_entry, global_list);
+ }
- if (!list_empty(pending))
- ret = list_first_entry(pending, struct async_entry,
- domain_list)->cookie;
+ if (first)
+ ret = first->cookie;
spin_unlock_irqrestore(&async_lock, flags);
return ret;
_
Patches currently in -mm which might be from linux(a)rasmusvillemoes.dk are
revert-async-simplify-lowest_in_progress.patch
The patch titled
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
has been added to the -mm tree. Its filename is
mm-memcg-fix-mem_cgroup_swapout-for-thps.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memcg-fix-mem_cgroup_swapout-fo…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcg-fix-mem_cgroup_swapout-fo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm, memcg: fix mem_cgroup_swapout() for THPs
d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
changed mem_cgroup_swapout() to support transparent huge page (THP).
However the patch missed one location which should be changed for
correctly handling THPs. The resulting bug will cause the memory cgroups
whose THPs were swapped out to become zombies on deletion.
Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com
Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps mm/memcontrol.c
--- a/mm/memcontrol.c~mm-memcg-fix-mem_cgroup_swapout-for-thps
+++ a/mm/memcontrol.c
@@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *pag
memcg_check_events(memcg, page);
if (!mem_cgroup_is_root(memcg))
- css_put(&memcg->css);
+ css_put_many(&memcg->css, nr_entries);
}
/**
_
Patches currently in -mm which might be from shakeelb(a)google.com are
mm-memcg-fix-mem_cgroup_swapout-for-thps.patch
mm-mlock-vmscan-no-more-skipping-pagevecs.patch
vfs-remove-might_sleep-from-clear_inode.patch
From: Daniel Jurgens <danielj(a)mellanox.com>
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.
This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: stable(a)vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj(a)mellanox.com>
Reviewed-by: Parav Pandit <parav(a)mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat(a)chelsio.com>
Signed-off-by: Leon Romanovsky <leon(a)kernel.org>
---
Changelog:
v1->v2: Fixed build errors
v0->v1: Added proper SElinux patch
---
drivers/infiniband/core/security.c | 43 ++++++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 23278ed5be45..06c608c07b65 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec)
int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev)
{
+ u8 i = rdma_start_port(dev);
+ bool is_ib = false;
int ret;
+ while (i <= rdma_end_port(dev) && !is_ib)
+ is_ib = rdma_protocol_ib(dev, i++);
+
+ /* If this isn't an IB device don't create the security context */
+ if (!is_ib)
+ return 0;
+
qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL);
if (!qp->qp_sec)
return -ENOMEM;
@@ -441,6 +450,10 @@ EXPORT_SYMBOL(ib_create_qp_security);
void ib_destroy_qp_security_begin(struct ib_qp_security *sec)
{
+ /* Return if not IB */
+ if (!sec)
+ return;
+
mutex_lock(&sec->mutex);
/* Remove the QP from the lists so it won't get added to
@@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec)
int ret;
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is in progress this
* QP security could be marked for an error state
* transition. Wait for this to complete.
@@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec)
{
int i;
+ /* Return if not IB */
+ if (!sec)
+ return;
+
/* If a concurrent cache update is occurring we must
* wait until this QP security structure is processed
* in the QP to error flow before destroying it because
@@ -565,13 +586,19 @@ int ib_security_modify_qp(struct ib_qp *qp,
bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) ||
(qp_attr_mask & IB_QP_ALT_PATH));
+ WARN_ONCE((qp_attr_mask & IB_QP_PORT &&
+ rdma_protocol_ib(real_qp->device, qp_attr->port_num) &&
+ !real_qp->qp_sec),
+ "%s: QP security is not initialized for IB QP: %d\n",
+ __func__, real_qp->qp_num);
+
/* The port/pkey settings are maintained only for the real QP. Open
* handles on the real QP will be in the shared_qp_list. When
* enforcing security on the real QP all the shared QPs will be
* checked as well.
*/
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qp && real_qp->qp_sec) {
mutex_lock(&real_qp->qp_sec->mutex);
new_pps = get_new_pps(real_qp,
qp_attr,
@@ -600,7 +627,7 @@ int ib_security_modify_qp(struct ib_qp *qp,
qp_attr_mask,
udata);
- if (pps_change && !special_qp) {
+ if (pps_change && !special_qp && real_qp->qp_sec) {
/* Clean up the lists and free the appropriate
* ports_pkeys structure.
*/
@@ -631,6 +658,9 @@ int ib_security_pkey_access(struct ib_device *dev,
u16 pkey;
int ret;
+ if (!rdma_protocol_ib(dev, port_num))
+ return 0;
+
ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey);
if (ret)
return ret;
@@ -665,6 +695,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
{
int ret;
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return 0;
+
ret = security_ib_alloc_security(&agent->security);
if (ret)
return ret;
@@ -690,6 +723,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent,
void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
{
+ if (!rdma_protocol_ib(agent->device, agent->port_num))
+ return;
+
security_ib_free_security(agent->security);
if (agent->lsm_nb_reg)
unregister_lsm_notifier(&agent->lsm_nb);
@@ -697,6 +733,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent)
int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index)
{
+ if (!rdma_protocol_ib(map->agent.device, map->agent.port_num))
+ return 0;
+
if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed)
return -EACCES;
--
2.15.0
From: Eric Biggers <ebiggers(a)google.com>
When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring. This should require Write permission to the keyring. However,
there is actually no permission check.
This can be abused to add keys to any keyring to which only Search
permission is granted. This is because Search permission allows joining
the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.
Both negatively and positively instantiated keys can be added using this
method. Adding negative keys is trivial. Adding a positive key is a
bit trickier. It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().
Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.
We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key(). Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.
We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring. (I don't know of any users
who actually do that, though...)
Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key")
Cc: <stable(a)vger.kernel.org> # v2.6.13+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
v2: also skip permission check if default dest_keyring is NULL
security/keys/request_key.c | 46 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 9 deletions(-)
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index e8036cd0ad54..7dc741382154 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -251,11 +251,12 @@ static int construct_key(struct key *key, const void *callout_info,
* The keyring selected is returned with an extra reference upon it which the
* caller must release.
*/
-static void construct_get_dest_keyring(struct key **_dest_keyring)
+static int construct_get_dest_keyring(struct key **_dest_keyring)
{
struct request_key_auth *rka;
const struct cred *cred = current_cred();
struct key *dest_keyring = *_dest_keyring, *authkey;
+ int ret;
kenter("%p", dest_keyring);
@@ -264,6 +265,8 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
/* the caller supplied one */
key_get(dest_keyring);
} else {
+ bool do_perm_check = true;
+
/* use a default keyring; falling through the cases until we
* find one that we actually have */
switch (cred->jit_keyring) {
@@ -278,8 +281,10 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
dest_keyring =
key_get(rka->dest_keyring);
up_read(&authkey->sem);
- if (dest_keyring)
+ if (dest_keyring) {
+ do_perm_check = false;
break;
+ }
}
case KEY_REQKEY_DEFL_THREAD_KEYRING:
@@ -314,11 +319,29 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
default:
BUG();
}
+
+ /*
+ * Require Write permission on the keyring. This is essential
+ * because the default keyring may be the session keyring, and
+ * joining a keyring only requires Search permission.
+ *
+ * However, this check is skipped for the "requestor keyring" so
+ * that /sbin/request-key can itself use request_key() to add
+ * keys to the original requestor's destination keyring.
+ */
+ if (dest_keyring && do_perm_check) {
+ ret = key_permission(make_key_ref(dest_keyring, 1),
+ KEY_NEED_WRITE);
+ if (ret) {
+ key_put(dest_keyring);
+ return ret;
+ }
+ }
}
*_dest_keyring = dest_keyring;
kleave(" [dk %d]", key_serial(dest_keyring));
- return;
+ return 0;
}
/*
@@ -444,11 +467,15 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
if (ctx->index_key.type == &key_type_keyring)
return ERR_PTR(-EPERM);
- user = key_user_lookup(current_fsuid());
- if (!user)
- return ERR_PTR(-ENOMEM);
+ ret = construct_get_dest_keyring(&dest_keyring);
+ if (ret)
+ goto error;
- construct_get_dest_keyring(&dest_keyring);
+ user = key_user_lookup(current_fsuid());
+ if (!user) {
+ ret = -ENOMEM;
+ goto error_put_dest_keyring;
+ }
ret = construct_alloc_key(ctx, dest_keyring, flags, user, &key);
key_user_put(user);
@@ -463,7 +490,7 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
} else if (ret == -EINPROGRESS) {
ret = 0;
} else {
- goto couldnt_alloc_key;
+ goto error_put_dest_keyring;
}
key_put(dest_keyring);
@@ -473,8 +500,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
construction_failed:
key_negate_and_link(key, key_negative_timeout, NULL, NULL);
key_put(key);
-couldnt_alloc_key:
+error_put_dest_keyring:
key_put(dest_keyring);
+error:
kleave(" = %d", ret);
return ERR_PTR(ret);
}
--
2.15.0.417.g466bffb3ac-goog
From: Eric Biggers <ebiggers(a)google.com>
When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring. This should require Write permission to the keyring. However,
there is actually no permission check.
This can be abused to add keys to any keyring to which only Search
permission is granted. This is because Search permission allows joining
the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.
Both negatively and positively instantiated keys can be added using this
method. Adding negative keys is trivial. Adding a positive key is a
bit trickier. It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().
Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.
We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key(). Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.
We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring. (I don't know of any users
who actually do that, though...)
Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key")
Cc: <stable(a)vger.kernel.org> # v2.6.13+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
security/keys/request_key.c | 46 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 37 insertions(+), 9 deletions(-)
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index c6880af8b411..4557c1c368aa 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -251,11 +251,12 @@ static int construct_key(struct key *key, const void *callout_info,
* The keyring selected is returned with an extra reference upon it which the
* caller must release.
*/
-static void construct_get_dest_keyring(struct key **_dest_keyring)
+static int construct_get_dest_keyring(struct key **_dest_keyring)
{
struct request_key_auth *rka;
const struct cred *cred = current_cred();
struct key *dest_keyring = *_dest_keyring, *authkey;
+ int ret;
kenter("%p", dest_keyring);
@@ -264,6 +265,8 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
/* the caller supplied one */
key_get(dest_keyring);
} else {
+ bool do_perm_check = true;
+
/* use a default keyring; falling through the cases until we
* find one that we actually have */
switch (cred->jit_keyring) {
@@ -278,8 +281,10 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
dest_keyring =
key_get(rka->dest_keyring);
up_read(&authkey->sem);
- if (dest_keyring)
+ if (dest_keyring) {
+ do_perm_check = false;
break;
+ }
}
case KEY_REQKEY_DEFL_THREAD_KEYRING:
@@ -314,11 +319,29 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
default:
BUG();
}
+
+ /*
+ * Require Write permission on the keyring. This is essential
+ * because the default keyring may be the session keyring, and
+ * joining a keyring only requires Search permission.
+ *
+ * However, this check is skipped for the "requestor keyring" so
+ * that /sbin/request-key can itself use request_key() to add
+ * keys to the original requestor's destination keyring.
+ */
+ if (do_perm_check) {
+ ret = key_permission(make_key_ref(dest_keyring, 1),
+ KEY_NEED_WRITE);
+ if (ret) {
+ key_put(dest_keyring);
+ return ret;
+ }
+ }
}
*_dest_keyring = dest_keyring;
kleave(" [dk %d]", key_serial(dest_keyring));
- return;
+ return 0;
}
/*
@@ -444,11 +467,15 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
if (ctx->index_key.type == &key_type_keyring)
return ERR_PTR(-EPERM);
- user = key_user_lookup(current_fsuid());
- if (!user)
- return ERR_PTR(-ENOMEM);
+ ret = construct_get_dest_keyring(&dest_keyring);
+ if (ret)
+ goto error;
- construct_get_dest_keyring(&dest_keyring);
+ user = key_user_lookup(current_fsuid());
+ if (!user) {
+ ret = -ENOMEM;
+ goto error_put_dest_keyring;
+ }
ret = construct_alloc_key(ctx, dest_keyring, flags, user, &key);
key_user_put(user);
@@ -463,7 +490,7 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
} else if (ret == -EINPROGRESS) {
ret = 0;
} else {
- goto couldnt_alloc_key;
+ goto error_put_dest_keyring;
}
key_put(dest_keyring);
@@ -473,8 +500,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
construction_failed:
key_negate_and_link(key, key_negative_timeout, NULL, NULL);
key_put(key);
-couldnt_alloc_key:
+error_put_dest_keyring:
key_put(dest_keyring);
+error:
kleave(" = %d", ret);
return ERR_PTR(ret);
}
--
2.15.0.448.gf294e3d99a-goog
From: Eric Biggers <ebiggers(a)google.com>
->pkey_algo used to be an enum, but was changed to a string by commit
4e8ae72a75aa ("X.509: Make algo identifiers text instead of enum"). But
two comparisons were not updated. Fix them to use strcmp().
This bug broke signature verification in certain configurations,
depending on whether the string constants were deduplicated or not.
Fixes: 4e8ae72a75aa ("X.509: Make algo identifiers text instead of enum")
Cc: <stable(a)vger.kernel.org> # v4.6+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
v2: use != 0 in comparisons
crypto/asymmetric_keys/pkcs7_verify.c | 2 +-
crypto/asymmetric_keys/x509_public_key.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/crypto/asymmetric_keys/pkcs7_verify.c b/crypto/asymmetric_keys/pkcs7_verify.c
index 2d93d9eccb4d..986033e64a83 100644
--- a/crypto/asymmetric_keys/pkcs7_verify.c
+++ b/crypto/asymmetric_keys/pkcs7_verify.c
@@ -150,7 +150,7 @@ static int pkcs7_find_key(struct pkcs7_message *pkcs7,
pr_devel("Sig %u: Found cert serial match X.509[%u]\n",
sinfo->index, certix);
- if (x509->pub->pkey_algo != sinfo->sig->pkey_algo) {
+ if (strcmp(x509->pub->pkey_algo, sinfo->sig->pkey_algo) != 0) {
pr_warn("Sig %u: X.509 algo and PKCS#7 sig algo don't match\n",
sinfo->index);
continue;
diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c
index c9013582c026..3d6f124a8b34 100644
--- a/crypto/asymmetric_keys/x509_public_key.c
+++ b/crypto/asymmetric_keys/x509_public_key.c
@@ -135,7 +135,7 @@ int x509_check_for_self_signed(struct x509_certificate *cert)
}
ret = -EKEYREJECTED;
- if (cert->pub->pkey_algo != cert->sig->pkey_algo)
+ if (strcmp(cert->pub->pkey_algo, cert->sig->pkey_algo) != 0)
goto out;
ret = public_key_verify_signature(cert->pub, cert->sig);
--
2.15.0.417.g466bffb3ac-goog
This is a note to let you know that I've just added the patch titled
fpga: region: release of_parse_phandle nodes after use
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 0f5eb1545907edeea7672a9c1652c4231150ff22 Mon Sep 17 00:00:00 2001
From: Ian Abbott <abbotti(a)mev.co.uk>
Date: Wed, 15 Nov 2017 16:33:12 -0600
Subject: fpga: region: release of_parse_phandle nodes after use
Both fpga_region_get_manager() and fpga_region_get_bridges() call
of_parse_phandle(), but nothing calls of_node_put() on the returned
struct device_node pointers. Make sure to do that to stop their
reference counters getting out of whack.
Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA")
Cc: <stable(a)vger.kernel.org> # 4.10+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
Signed-off-by: Alan Tull <atull(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/fpga/of-fpga-region.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/fpga/of-fpga-region.c b/drivers/fpga/of-fpga-region.c
index c6b21194dcbc..119ff75522f1 100644
--- a/drivers/fpga/of-fpga-region.c
+++ b/drivers/fpga/of-fpga-region.c
@@ -73,6 +73,7 @@ static struct fpga_manager *of_fpga_region_get_mgr(struct device_node *np)
mgr_node = of_parse_phandle(np, "fpga-mgr", 0);
if (mgr_node) {
mgr = of_fpga_mgr_get(mgr_node);
+ of_node_put(mgr_node);
of_node_put(np);
return mgr;
}
@@ -120,10 +121,13 @@ static int of_fpga_region_get_bridges(struct fpga_region *region)
parent_br = region_np->parent;
/* If overlay has a list of bridges, use it. */
- if (of_parse_phandle(info->overlay, "fpga-bridges", 0))
+ br = of_parse_phandle(info->overlay, "fpga-bridges", 0);
+ if (br) {
+ of_node_put(br);
np = info->overlay;
- else
+ } else {
np = region_np;
+ }
for (i = 0; ; i++) {
br = of_parse_phandle(np, "fpga-bridges", i);
@@ -131,12 +135,15 @@ static int of_fpga_region_get_bridges(struct fpga_region *region)
break;
/* If parent bridge is in list, skip it. */
- if (br == parent_br)
+ if (br == parent_br) {
+ of_node_put(br);
continue;
+ }
/* If node is a bridge, get it and add to list */
ret = of_fpga_bridge_get_to_list(br, info,
®ion->bridge_list);
+ of_node_put(br);
/* If any of the bridges are in use, give up */
if (ret == -EBUSY) {
--
2.15.0
This is a note to let you know that I've just added the patch titled
firmware: vpd: Tie firmware kobject to device lifetime
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From e4b28b3c3a405b251fa25db58abe1512814a680a Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Wed, 15 Nov 2017 13:00:44 -0800
Subject: firmware: vpd: Tie firmware kobject to device lifetime
It doesn't make sense to have /sys/firmware/vpd if the device is not
instantiated, so tie its lifetime to the device.
Fixes: 049a59db34eb ("firmware: Google VPD sysfs driver")
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Tested-by: Randy Dunlap <rdunlap(a)infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/firmware/google/vpd.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/google/vpd.c b/drivers/firmware/google/vpd.c
index 84217172297b..942e358efa60 100644
--- a/drivers/firmware/google/vpd.c
+++ b/drivers/firmware/google/vpd.c
@@ -295,7 +295,17 @@ static int vpd_probe(struct platform_device *pdev)
if (ret)
return ret;
- return vpd_sections_init(entry.cbmem_addr);
+ vpd_kobj = kobject_create_and_add("vpd", firmware_kobj);
+ if (!vpd_kobj)
+ return -ENOMEM;
+
+ ret = vpd_sections_init(entry.cbmem_addr);
+ if (ret) {
+ kobject_put(vpd_kobj);
+ return ret;
+ }
+
+ return 0;
}
static int vpd_remove(struct platform_device *pdev)
@@ -303,6 +313,8 @@ static int vpd_remove(struct platform_device *pdev)
vpd_section_destroy(&ro_vpd);
vpd_section_destroy(&rw_vpd);
+ kobject_put(vpd_kobj);
+
return 0;
}
@@ -322,10 +334,6 @@ static int __init vpd_platform_init(void)
if (IS_ERR(pdev))
return PTR_ERR(pdev);
- vpd_kobj = kobject_create_and_add("vpd", firmware_kobj);
- if (!vpd_kobj)
- return -ENOMEM;
-
platform_driver_register(&vpd_driver);
return 0;
@@ -333,7 +341,6 @@ static int __init vpd_platform_init(void)
static void __exit vpd_platform_exit(void)
{
- kobject_put(vpd_kobj);
}
module_init(vpd_platform_init);
--
2.15.0
This is a note to let you know that I've just added the patch titled
firmware: vpd: Fix platform driver and device
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0631fb8b027f5968c2f5031f0b3ff7be3e4bebcc Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Wed, 15 Nov 2017 13:00:45 -0800
Subject: firmware: vpd: Fix platform driver and device
registration/unregistration
The driver exit function needs to unregister both platform device and
driver. Also, during registration, register driver first and perform
error checks.
Fixes: 049a59db34eb ("firmware: Google VPD sysfs driver")
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/firmware/google/vpd.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/google/vpd.c b/drivers/firmware/google/vpd.c
index 942e358efa60..e4b40f2b4627 100644
--- a/drivers/firmware/google/vpd.c
+++ b/drivers/firmware/google/vpd.c
@@ -326,21 +326,29 @@ static struct platform_driver vpd_driver = {
},
};
+static struct platform_device *vpd_pdev;
+
static int __init vpd_platform_init(void)
{
- struct platform_device *pdev;
+ int ret;
- pdev = platform_device_register_simple("vpd", -1, NULL, 0);
- if (IS_ERR(pdev))
- return PTR_ERR(pdev);
+ ret = platform_driver_register(&vpd_driver);
+ if (ret)
+ return ret;
- platform_driver_register(&vpd_driver);
+ vpd_pdev = platform_device_register_simple("vpd", -1, NULL, 0);
+ if (IS_ERR(vpd_pdev)) {
+ platform_driver_unregister(&vpd_driver);
+ return PTR_ERR(vpd_pdev);
+ }
return 0;
}
static void __exit vpd_platform_exit(void)
{
+ platform_device_unregister(vpd_pdev);
+ platform_driver_unregister(&vpd_driver);
}
module_init(vpd_platform_init);
--
2.15.0