From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
khugepaged is not yet able to convert PTE-mapped huge pages back to PMD
mapped. We do not collapse such pages. See check khugepaged_scan_pmd().
But if between khugepaged_scan_pmd() and __collapse_huge_page_isolate()
somebody managed to instantiate THP in the range and then split the PMD
back to PTEs we would have a problem -- VM_BUG_ON_PAGE(PageCompound(page))
will get triggered.
It's possible since we drop mmap_sem during collapse to re-take for
write.
Replace the VM_BUG_ON() with graceful collapse fail.
Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel…
Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Jerome Marchand <jmarchan(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
mm/khugepaged.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail mm/khugepaged.c
--- a/mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail
+++ a/mm/khugepaged.c
@@ -530,7 +530,12 @@ static int __collapse_huge_page_isolate(
goto out;
}
- VM_BUG_ON_PAGE(PageCompound(page), page);
+ /* TODO: teach khugepaged to collapse THP mapped with pte */
+ if (PageCompound(page)) {
+ result = SCAN_PAGE_COMPOUND;
+ goto out;
+ }
+
VM_BUG_ON_PAGE(!PageAnon(page), page);
/*
_
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: h8300: remove extraneous __BIG_ENDIAN definition
A bugfix I did earlier caused a build regression on h8300, which defines
the __BIG_ENDIAN macro in a slightly different way than the generic code:
arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined
We don't need to define it here, as the same macro is already provided by
the linux/byteorder/big_endian.h, and that version does not conflict.
While this is a v4.16 regression, my earlier patch also got backported to
the 4.14 and 4.15 stable kernels, so we need the fixup there as well.
Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
Fixes: 101110f6271c ("Kbuild: always define endianess in kconfig.h")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Yoshinori Sato <ysato(a)users.sourceforge.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
arch/h8300/include/asm/byteorder.h | 1 -
1 file changed, 1 deletion(-)
diff -puN arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition arch/h8300/include/asm/byteorder.h
--- a/arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition
+++ a/arch/h8300/include/asm/byteorder.h
@@ -2,7 +2,6 @@
#ifndef __H8300_BYTEORDER_H__
#define __H8300_BYTEORDER_H__
-#define __BIG_ENDIAN __ORDER_BIG_ENDIAN__
#include <linux/byteorder/big_endian.h>
#endif
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: check for pgoff value overflow
A vma with vm_pgoff large enough to overflow a loff_t type when converted
to a byte offset can be passed via the remap_file_pages system call. The
hugetlbfs mmap routine uses the byte offset to calculate reservations and
file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which causes
the BUG.
The previous overflow fix to this code was incomplete and did not take the
remap_file_pages system call into account.
[mike.kravetz(a)oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm(a)linux-foundation.org: include mmdebug.h]
[akpm(a)linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Nic Losby <blurbdust(a)gmail.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Yisheng Xie <xieyisheng1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <>
---
fs/hugetlbfs/inode.c | 17 ++++++++++++++---
mm/hugetlb.c | 7 +++++++
2 files changed, 21 insertions(+), 3 deletions(-)
diff -puN fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/fs/hugetlbfs/inode.c
@@ -108,6 +108,16 @@ static void huge_pagevec_release(struct
pagevec_reinit(pvec);
}
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls. This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value. The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX \
+ (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
+
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file_inode(file);
@@ -127,12 +137,13 @@ static int hugetlbfs_file_mmap(struct fi
vma->vm_ops = &hugetlb_vm_ops;
/*
- * Offset passed to mmap (before page shift) could have been
- * negative when represented as a (l)off_t.
+ * page based offset in vm_pgoff could be sufficiently large to
+ * overflow a (l)off_t when converted to byte offset.
*/
- if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+ if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
return -EINVAL;
+ /* must be huge page aligned */
if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
return -EINVAL;
diff -puN mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow mm/hugetlb.c
--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/sysfs.h>
#include <linux/slab.h>
+#include <linux/mmdebug.h>
#include <linux/sched/signal.h>
#include <linux/rmap.h>
#include <linux/string_helpers.h>
@@ -4374,6 +4375,12 @@ int hugetlb_reserve_pages(struct inode *
struct resv_map *resv_map;
long gbl_reserve;
+ /* This should never happen */
+ if (from > to) {
+ VM_WARN(1, "%s called with a negative range\n", __func__);
+ return -EINVAL;
+ }
+
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
_
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: lockdep: fix fs_reclaim warning
Dave Jones reported fs_reclaim lockdep warnings.
============================================
WARNING: possible recursive locking detected
4.15.0-rc9-backup-debug+ #1 Not tainted
--------------------------------------------
sshd/24800 is trying to acquire lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
but task is already holding lock:
(fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(fs_reclaim);
lock(fs_reclaim);
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by sshd/24800:
#0: (sk_lock-AF_INET6){+.+.}, at: [<000000001a069652>] tcp_sendmsg+0x19/0x40
#1: (fs_reclaim){+.+.}, at: [<0000000084f438c2>] fs_reclaim_acquire.part.102+0x5/0x30
stack backtrace:
CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
Call Trace:
dump_stack+0xbc/0x13f
__lock_acquire+0xa09/0x2040
lock_acquire+0x12e/0x350
fs_reclaim_acquire.part.102+0x29/0x30
kmem_cache_alloc+0x3d/0x2c0
alloc_extent_state+0xa7/0x410
__clear_extent_bit+0x3ea/0x570
try_release_extent_mapping+0x21a/0x260
__btrfs_releasepage+0xb0/0x1c0
btrfs_releasepage+0x161/0x170
try_to_release_page+0x162/0x1c0
shrink_page_list+0x1d5a/0x2fb0
shrink_inactive_list+0x451/0x940
shrink_node_memcg.constprop.88+0x4c9/0x5e0
shrink_node+0x12d/0x260
try_to_free_pages+0x418/0xaf0
__alloc_pages_slowpath+0x976/0x1790
__alloc_pages_nodemask+0x52c/0x5c0
new_slab+0x374/0x3f0
___slab_alloc.constprop.81+0x47e/0x5a0
__slab_alloc.constprop.80+0x32/0x60
__kmalloc_track_caller+0x267/0x310
__kmalloc_reserve.isra.40+0x29/0x80
__alloc_skb+0xee/0x390
sk_stream_alloc_skb+0xb8/0x340
tcp_sendmsg_locked+0x8e6/0x1d30
tcp_sendmsg+0x27/0x40
inet_sendmsg+0xd0/0x310
sock_write_iter+0x17a/0x240
__vfs_write+0x2ab/0x380
vfs_write+0xfb/0x260
SyS_write+0xb6/0x140
do_syscall_64+0x1e5/0xc05
entry_SYSCALL64_slow_path+0x25/0x25
This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep:
Rework FS_RECLAIM annotation") which replaced
lockdep_set_current_reclaim_state()/ lockdep_clear_current_reclaim_state()
in __perform_reclaim() and lockdep_trace_alloc() in slab_pre_alloc_hook()
with fs_reclaim_acquire()/ fs_reclaim_release(). Since
__kmalloc_reserve() from __alloc_skb() adds __GFP_NOMEMALLOC |
__GFP_NOWARN to gfp_mask, and all reclaim path simply propagates
__GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook() is trying
to grab the 'fake' lock again when __perform_reclaim() already grabbed the
'fake' lock.
The
/* this guy won't enter reclaim */
if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
return false;
test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock was
added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context
(__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread
won't enter reclaim regardless of __GFP_NOMEMALLOC after commit
341ce06f69abfafa ("page allocator: calculate the alloc_flags for
allocation only once") added the PF_MEMALLOC safeguard (
/* Avoid recursion of direct reclaim */
if (p->flags & PF_MEMALLOC)
goto nopage;
in __alloc_pages_slowpath()).
Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and allow
__need_fs_reclaim() to return false.
Link: http://lkml.kernel.org/r/201802280650.FJC73911.FOSOMLJVFFQtHO@I-love.SAKURA…
Fixes: d92a8cfcb37ecd13 ("locking/lockdep: Rework FS_RECLAIM annotation")
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reported-by: Dave Jones <davej(a)codemonkey.org.uk>
Tested-by: Dave Jones <davej(a)codemonkey.org.uk>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Nick Piggin <npiggin(a)gmail.com>
Cc: Ingo Molnar <mingo(a)elte.hu>
Cc: Nikolay Borisov <nborisov(a)suse.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/page_alloc.c~lockdep-fix-fs_reclaim-warning mm/page_alloc.c
--- a/mm/page_alloc.c~lockdep-fix-fs_reclaim-warning
+++ a/mm/page_alloc.c
@@ -3596,7 +3596,7 @@ static bool __need_fs_reclaim(gfp_t gfp_
return false;
/* this guy won't enter reclaim */
- if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+ if (current->flags & PF_MEMALLOC)
return false;
/* We're only interested __GFP_FS allocations for now */
_
This is a note to let you know that I've just added the patch titled
RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
rdma-vmw_pvrdma-fix-usage-of-user-response-structures-in-abi-file.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 1f5a6c47aabc4606f91ad2e6ef71a1ff1924101c Mon Sep 17 00:00:00 2001
From: Adit Ranadive <aditr(a)vmware.com>
Date: Thu, 15 Feb 2018 12:36:46 -0800
Subject: RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file
From: Adit Ranadive <aditr(a)vmware.com>
commit 1f5a6c47aabc4606f91ad2e6ef71a1ff1924101c upstream.
This ensures that we return the right structures back to userspace.
Otherwise, it looks like the reserved fields in the response structures
in userspace might have uninitialized data in them.
Fixes: 8b10ba783c9d ("RDMA/vmw_pvrdma: Add shared receive queue support")
Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
Suggested-by: Jason Gunthorpe <jgg(a)mellanox.com>
Reviewed-by: Bryan Tan <bryantan(a)vmware.com>
Reviewed-by: Aditya Sarwade <asarwade(a)vmware.com>
Reviewed-by: Jorgen Hansen <jhansen(a)vmware.com>
Signed-off-by: Adit Ranadive <aditr(a)vmware.com>
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c | 4 +++-
drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
@@ -114,6 +114,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
union pvrdma_cmd_resp rsp;
struct pvrdma_cmd_create_cq *cmd = &req.create_cq;
struct pvrdma_cmd_create_cq_resp *resp = &rsp.create_cq_resp;
+ struct pvrdma_create_cq_resp cq_resp = {0};
struct pvrdma_create_cq ucmd;
BUILD_BUG_ON(sizeof(struct pvrdma_cqe) != 64);
@@ -198,6 +199,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
cq->ibcq.cqe = resp->cqe;
cq->cq_handle = resp->cq_handle;
+ cq_resp.cqn = resp->cq_handle;
spin_lock_irqsave(&dev->cq_tbl_lock, flags);
dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
@@ -206,7 +208,7 @@ struct ib_cq *pvrdma_create_cq(struct ib
cq->uar = &(to_vucontext(context)->uar);
/* Copy udata back. */
- if (ib_copy_to_udata(udata, &cq->cq_handle, sizeof(__u32))) {
+ if (ib_copy_to_udata(udata, &cq_resp, sizeof(cq_resp))) {
dev_warn(&dev->pdev->dev,
"failed to copy back udata\n");
pvrdma_destroy_cq(&cq->ibcq);
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
@@ -444,6 +444,7 @@ struct ib_pd *pvrdma_alloc_pd(struct ib_
union pvrdma_cmd_resp rsp;
struct pvrdma_cmd_create_pd *cmd = &req.create_pd;
struct pvrdma_cmd_create_pd_resp *resp = &rsp.create_pd_resp;
+ struct pvrdma_alloc_pd_resp pd_resp = {0};
int ret;
void *ptr;
@@ -472,9 +473,10 @@ struct ib_pd *pvrdma_alloc_pd(struct ib_
pd->privileged = !context;
pd->pd_handle = resp->pd_handle;
pd->pdn = resp->pd_handle;
+ pd_resp.pdn = resp->pd_handle;
if (context) {
- if (ib_copy_to_udata(udata, &pd->pdn, sizeof(__u32))) {
+ if (ib_copy_to_udata(udata, &pd_resp, sizeof(pd_resp))) {
dev_warn(&dev->pdev->dev,
"failed to copy back protection domain\n");
pvrdma_dealloc_pd(&pd->ibpd);
Patches currently in stable-queue which might be from aditr(a)vmware.com are
queue-4.14/rdma-vmw_pvrdma-fix-usage-of-user-response-structures-in-abi-file.patch
This is a note to let you know that I've just added the patch titled
kbuild: fix linker feature test macros when cross compiling with Clang
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
kbuild-fix-linker-feature-test-macros-when-cross-compiling-with-clang.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 86a9df597cdd564d2d29c65897bcad42519e3678 Mon Sep 17 00:00:00 2001
From: Nick Desaulniers <ndesaulniers(a)google.com>
Date: Mon, 6 Nov 2017 10:47:54 -0800
Subject: kbuild: fix linker feature test macros when cross compiling with Clang
From: Nick Desaulniers <ndesaulniers(a)google.com>
commit 86a9df597cdd564d2d29c65897bcad42519e3678 upstream.
I was not seeing my linker flags getting added when using ld-option when
cross compiling with Clang. Upon investigation, this seems to be due to
a difference in how GCC vs Clang handle cross compilation.
GCC is configured at build time to support one backend, that is implicit
when compiling. Clang is explicit via the use of `-target <triple>` and
ships with all supported backends by default.
GNU Make feature test macros that compile then link will always fail
when cross compiling with Clang unless Clang's triple is passed along to
the compiler. For example:
$ clang -x c /dev/null -c -o temp.o
$ aarch64-linux-android/bin/ld -E temp.o
aarch64-linux-android/bin/ld:
unknown architecture of input file `temp.o' is incompatible with
aarch64 output
aarch64-linux-android/bin/ld:
warning: cannot find entry symbol _start; defaulting to
0000000000400078
$ echo $?
1
$ clang -target aarch64-linux-android- -x c /dev/null -c -o temp.o
$ aarch64-linux-android/bin/ld -E temp.o
aarch64-linux-android/bin/ld:
warning: cannot find entry symbol _start; defaulting to 00000000004002e4
$ echo $?
0
This causes conditional checks that invoke $(CC) without the target
triple, then $(LD) on the result, to always fail.
Suggested-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
Reviewed-by: Matthias Kaehlcke <mka(a)chromium.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Greg Hackmann <ghackmann(a)google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
scripts/Kbuild.include | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -160,12 +160,13 @@ cc-if-fullversion = $(shell [ $(cc-fullv
# cc-ldoption
# Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
cc-ldoption = $(call try-run,\
- $(CC) $(1) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2))
+ $(CC) $(1) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -nostdlib -x c /dev/null -o "$$TMP",$(1),$(2))
# ld-option
# Usage: LDFLAGS += $(call ld-option, -X)
ld-option = $(call try-run,\
- $(CC) -x c /dev/null -c -o "$$TMPO" ; $(LD) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
+ $(CC) $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -x c /dev/null -c -o "$$TMPO"; \
+ $(LD) $(LDFLAGS) $(1) "$$TMPO" -o "$$TMP",$(1),$(2))
# ar-option
# Usage: KBUILD_ARFLAGS := $(call ar-option,D)
Patches currently in stable-queue which might be from ndesaulniers(a)google.com are
queue-4.14/kbuild-fix-linker-feature-test-macros-when-cross-compiling-with-clang.patch