The patch titled
Subject: ipc/shm.c add ->pagesize function to shm_vm_ops
has been added to the -mm tree. Its filename is
ipc-shmc-add-pagesize-function-to-shm_vm_ops.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ipc-shmc-add-pagesize-function-to-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ipc-shmc-add-pagesize-function-to-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jane Chu <jane.chu(a)oracle.com>
Subject: ipc/shm.c add ->pagesize function to shm_vm_ops
05ea88608d4e13 (mm, hugetlbfs: introduce ->pagesize() to
vm_operations_struct) adds a new ->pagesize() function to hugetlb_vm_ops,
intended to cover all hugetlbfs backed files.
With System V shared memory model, if "huge page" is specified, the
"shared memory" is backed by hugetlbfs files, but the mappings initiated
via shmget/shmat have their original vm_ops overwritten with shm_vm_ops,
so we need to add a ->pagesize function to shm_vm_ops. Otherwise,
vma_kernel_pagesize() returns PAGE_SIZE given a hugetlbfs backed vma,
result in below BUG:
fs/hugetlbfs/inode.c
443 if (unlikely(page_mapped(page))) {
444 BUG_ON(truncate_op);
[ 242.268342] hugetlbfs: oracle (4592): Using mlock ulimits for SHM_HUGETLB is deprecated
[ 282.653208] ------------[ cut here ]------------
[ 282.708447] kernel BUG at fs/hugetlbfs/inode.c:444!
[ 282.818957] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 ...
[ 284.025873] CPU: 35 PID: 5583 Comm: oracle_5583_sbt Not tainted 4.14.35-1829.el7uek.x86_64 #2
[ 284.246609] task: ffff9bf0507aaf80 task.stack: ffffa9e625628000
[ 284.317455] RIP: 0010:remove_inode_hugepages+0x3db/0x3e2
....
[ 285.292389] Call Trace:
[ 285.321630] hugetlbfs_evict_inode+0x1e/0x3e
[ 285.372707] evict+0xdb/0x1af
[ 285.408185] iput+0x1a2/0x1f7
[ 285.443661] dentry_unlink_inode+0xc6/0xf0
[ 285.492661] __dentry_kill+0xd8/0x18d
[ 285.536459] dput+0x1b5/0x1ed
[ 285.571939] __fput+0x18b/0x216
[ 285.609495] ____fput+0xe/0x10
[ 285.646030] task_work_run+0x90/0xa7
[ 285.688788] exit_to_usermode_loop+0xdd/0x116
[ 285.740905] do_syscall_64+0x187/0x1ae
[ 285.785740] entry_SYSCALL_64_after_hwframe+0x150/0x0
Link: http://lkml.kernel.org/r/20180727211727.5020-1-jane.chu@oracle.com
Fixes: 05ea88608d4e13 ("mm, hugetlbfs: introduce ->pagesize() to vm_operations_struct")
Signed-off-by: Jane Chu <jane.chu(a)oracle.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
diff -puN include/linux/mm.h~ipc-shmc-add-pagesize-function-to-shm_vm_ops include/linux/mm.h
--- a/include/linux/mm.h~ipc-shmc-add-pagesize-function-to-shm_vm_ops
+++ a/include/linux/mm.h
@@ -389,6 +389,13 @@ enum page_entry_size {
* These are the virtual MM functions - opening of an area, closing and
* unmapping it (needed to keep files on disk up-to-date etc), pointer
* to the functions called when a no-page or a wp-page exception occurs.
+ *
+ * Note, when a new function is introduced to vm_operations_struct and
+ * added to hugetlb_vm_ops, please consider adding the function to
+ * shm_vm_ops. This is because under System V memory model, though
+ * mappings created via shmget/shmat with "huge page" specified are
+ * backed by hugetlbfs files, their original vm_ops are overwritten with
+ * shm_vm_ops.
*/
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
diff -puN ipc/shm.c~ipc-shmc-add-pagesize-function-to-shm_vm_ops ipc/shm.c
--- a/ipc/shm.c~ipc-shmc-add-pagesize-function-to-shm_vm_ops
+++ a/ipc/shm.c
@@ -427,6 +427,17 @@ static int shm_split(struct vm_area_stru
return 0;
}
+static unsigned long shm_pagesize(struct vm_area_struct *vma)
+{
+ struct file *file = vma->vm_file;
+ struct shm_file_data *sfd = shm_file_data(file);
+
+ if (sfd->vm_ops->pagesize)
+ return sfd->vm_ops->pagesize(vma);
+
+ return PAGE_SIZE;
+}
+
#ifdef CONFIG_NUMA
static int shm_set_policy(struct vm_area_struct *vma, struct mempolicy *new)
{
@@ -554,6 +565,7 @@ static const struct vm_operations_struct
.close = shm_close, /* callback for when the vm-area is released */
.fault = shm_fault,
.split = shm_split,
+ .pagesize = shm_pagesize,
#if defined(CONFIG_NUMA)
.set_policy = shm_set_policy,
.get_policy = shm_get_policy,
_
Patches currently in -mm which might be from jane.chu(a)oracle.com are
ipc-shmc-add-pagesize-function-to-shm_vm_ops.patch
The patch titled
Subject: kvm, mm: account shadow page tables to kmemcg
has been removed from the -mm tree. Its filename was
kvm-mm-account-shadow-page-tables-to-kmemcg.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: kvm, mm: account shadow page tables to kmemcg
The size of kvm's shadow page tables corresponds to the size of the guest
virtual machines on the system. Large VMs can spend a significant amount
of memory as shadow page tables which can not be left as system memory
overhead. So, account shadow page tables to the kmemcg.
[shakeelb(a)google.com: replace (GFP_KERNEL|__GFP_ACCOUNT) with GFP_KERNEL_ACCOUNT]
Link: http://lkml.kernel.org/r/20180629140224.205849-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180627181349.149778-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Peter Feiner <pfeiner(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu.c~kvm-mm-account-shadow-page-tables-to-kmemcg
+++ a/arch/x86/kvm/mmu.c
@@ -890,7 +890,7 @@ static int mmu_topup_memory_cache_page(s
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL);
+ page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!page)
return -ENOMEM;
cache->objects[cache->nobjs++] = page;
_
Patches currently in -mm which might be from shakeelb(a)google.com are
fs-fsnotify-account-fsnotify-metadata-to-kmemcg.patch
fs-fsnotify-account-fsnotify-metadata-to-kmemcg-fix.patch
fs-mm-account-buffer_head-to-kmemcg.patch
fs-mm-account-buffer_head-to-kmemcgpatchfix.patch
memcg-reduce-memcg-tree-traversals-for-stats-collection.patch
The patch titled
Subject: mm: introduce vma_init()
has been removed from the -mm tree. Its filename was
mm-introduce-vma_init.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm: introduce vma_init()
Not all VMAs allocated with vm_area_alloc(). Some of them allocated on
stack or in data segment.
The new helper can be use to initialize VMA properly regardless where it
was allocated.
Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 6 ++++++
kernel/fork.c | 6 ++----
2 files changed, 8 insertions(+), 4 deletions(-)
--- a/include/linux/mm.h~mm-introduce-vma_init
+++ a/include/linux/mm.h
@@ -452,6 +452,12 @@ struct vm_operations_struct {
unsigned long addr);
};
+static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
+{
+ vma->vm_mm = mm;
+ INIT_LIST_HEAD(&vma->anon_vma_chain);
+}
+
struct mmu_gather;
struct inode;
--- a/kernel/fork.c~mm-introduce-vma_init
+++ a/kernel/fork.c
@@ -312,10 +312,8 @@ struct vm_area_struct *vm_area_alloc(str
{
struct vm_area_struct *vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
- if (vma) {
- vma->vm_mm = mm;
- INIT_LIST_HEAD(&vma->anon_vma_chain);
- }
+ if (vma)
+ vma_init(vma, mm);
return vma;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-page_ext-drop-definition-of-unused-page_ext_debug_poison.patch
mm-page_ext-constify-lookup_page_ext-argument.patch
The patch titled
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
has been removed from the -mm tree. Its filename was
delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Tejun Heo <tj(a)kernel.org>
Subject: delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
While forking, if delayacct init fails due to memory shortage, it
continues expecting all delayacct users to check task->delays pointer
against NULL before dereferencing it, which all of them used to do.
c96f5471ce7d ("delayacct: Account blkio completion on the correct task"),
while updating delayacct_blkio_end() to take the target task instead of
always using %current, made the function test NULL on %current->delays and
then continue to operated on @p->delays. If %current succeeded init while
@p didn't, it leads to the following crash.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: __delayacct_blkio_end+0xc/0x40
PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
Hardware name: Quanta Leopard ORv2-DDR4/Leopard ORv2-DDR4, BIOS F06_3B12 08/17/2017
RIP: 0010:__delayacct_blkio_end+0xc/0x40
RSP: 0000:ffff881fff703bf8 EFLAGS: 00010086
RAX: ffff881f1ec8b800 RBX: ffff8804f735cd54 RCX: ffff881fff703cb0
RDX: 0000000000000002 RSI: 0000000000000003 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff881fff703cc0
R10: 0000000000001000 R11: ffff881fd3f73d00 R12: ffff8804f735c600
R13: 0000000000000000 R14: 000000000000001d R15: ffff881fff703cb0
FS: 00007f5003f7d700(0000) GS:ffff881fff700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000001f401a6006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
try_to_wake_up+0x2c0/0x600
autoremove_wake_function+0xe/0x30
__wake_up_common+0x74/0x120
wake_up_page_bit+0x9c/0xe0
mpage_end_io+0x27/0x70
blk_update_request+0x78/0x2c0
scsi_end_request+0x2c/0x1e0
scsi_io_completion+0x20b/0x5f0
blk_mq_complete_request+0xa2/0x100
ata_scsi_qc_complete+0x79/0x400
ata_qc_complete_multiple+0x86/0xd0
ahci_handle_port_interrupt+0xc9/0x5c0
ahci_handle_port_intr+0x54/0xb0
ahci_single_level_irq_intr+0x3b/0x60
__handle_irq_event_percpu+0x43/0x190
handle_irq_event_percpu+0x20/0x50
handle_irq_event+0x2a/0x50
handle_edge_irq+0x80/0x1c0
handle_irq+0xaf/0x120
do_IRQ+0x41/0xc0
common_interrupt+0xf/0xf
</IRQ>
Fix it by updating delayacct_blkio_end() check @p->delays instead.
Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.c…
Fixes: c96f5471ce7d ("delayacct: Account blkio completion on the correct task")
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Reported-by: Dave Jones <dsj(a)fb.com>
Debugged-by: Dave Jones <dsj(a)fb.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Josh Snyder <joshs(a)netflix.com>
Cc: <stable(a)vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/delayacct.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/delayacct.h~delayacct-fix-crash-in-delayacct_blkio_end-after-delayacct-init-failure
+++ a/include/linux/delayacct.h
@@ -124,7 +124,7 @@ static inline void delayacct_blkio_start
static inline void delayacct_blkio_end(struct task_struct *p)
{
- if (current->delays)
+ if (p->delays)
__delayacct_blkio_end(p);
delayacct_clear_flag(DELAYACCT_PF_BLKIO);
}
_
Patches currently in -mm which might be from tj(a)kernel.org are
This is the start of the stable review cycle for the 3.18.117 release.
There are 27 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Jul 29 10:26:38 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.117-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.117-rc1
Arnd Bergmann <arnd(a)arndb.de>
turn off -Wattribute-alias
Arnd Bergmann <arnd(a)arndb.de>
ARM: fix put_user() for gcc-8
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix RX overflow interrupt not being enabled
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix device dropping off bus on RX overrun
Anssi Hannula <anssi.hannula(a)bitwise.fi>
can: xilinx_can: fix RX loop if RXNEMP is asserted without RXOK
Jerry Zhang <zhangjerry(a)google.com>
usb: gadget: f_fs: Only return delayed status when len is 0
Bin Liu <b-liu(a)ti.com>
usb: core: handle hub C_PORT_OVER_CURRENT condition
Lubomir Rintel <lkundrak(a)v3.sk>
usb: cdc_acm: Add quirk for Castles VEGA3000
Eric Dumazet <edumazet(a)google.com>
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
Eric Dumazet <edumazet(a)google.com>
tcp: avoid collapses in tcp_prune_queue() if possible
Yuchung Cheng <ycheng(a)google.com>
tcp: do not delay ACK in DCTCP upon CE status change
Yuchung Cheng <ycheng(a)google.com>
tcp: do not cancel delay-AcK on DCTCP special ACK
Yuchung Cheng <ycheng(a)google.com>
tcp: helpers to send special DCTCP ack
Yuchung Cheng <ycheng(a)google.com>
tcp: fix dctcp delayed ACK schedule
Roopa Prabhu <roopa(a)cumulusnetworks.com>
rtnetlink: add rtnl_link_state check in rtnl_configure_link
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
Paolo Abeni <pabeni(a)redhat.com>
ip: hash fragments consistently
Stefano Brivio <sbrivio(a)redhat.com>
skbuff: Unconditionally copy pfmemalloc in __skb_clone()
Stefano Brivio <sbrivio(a)redhat.com>
net: Don't copy pfmemalloc flag in __copy_skb_header()
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ptp: fix missing break in switch
Tyler Hicks <tyhicks(a)canonical.com>
ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns
Vineet Gupta <vgupta(a)synopsys.com>
ARC: mm: allow mprotect to make stack mappings executable
Alexey Brodkin <abrodkin(a)synopsys.com>
ARC: Fix CONFIG_SWAP
Takashi Iwai <tiwai(a)suse.de>
ALSA: rawmidi: Change resized buffers atomically
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
fat: fix memory allocation failure handling of match_strdup()
Dewet Thibaut <thibaut.dewet(a)nokia.com>
x86/MCE: Remove min interval polling limitation
-------------
Diffstat:
Makefile | 5 +-
arch/arc/include/asm/page.h | 2 +-
arch/arc/include/asm/pgtable.h | 2 +-
arch/arm/include/asm/uaccess.h | 2 +-
arch/x86/kernel/cpu/mcheck/mce.c | 3 -
drivers/net/can/xilinx_can.c | 98 ++++++++++++++++------
.../net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
drivers/ptp/ptp_chardev.c | 1 +
drivers/usb/class/cdc-acm.c | 3 +
drivers/usb/core/hub.c | 8 +-
drivers/usb/gadget/function/f_fs.c | 2 +-
fs/fat/inode.c | 20 +++--
include/linux/skbuff.h | 12 +--
include/net/tcp.h | 2 +
net/core/rtnetlink.c | 9 +-
net/core/skbuff.c | 1 +
net/ipv4/ip_output.c | 2 +
net/ipv4/sysctl_net_ipv4.c | 5 +-
net/ipv4/tcp_dctcp.c | 50 ++++-------
net/ipv4/tcp_input.c | 21 ++++-
net/ipv4/tcp_output.c | 33 ++++++--
net/ipv6/ip6_output.c | 2 +
sound/core/rawmidi.c | 20 +++--
23 files changed, 198 insertions(+), 107 deletions(-)
'ac->ac_2order' is a user-controlled value used to index into
'grp->bb_counters' and based on the value at that index, 'ac->ac_found'
is written to. Clamp the value right after the bounds check to avoid a
speculative out-of-bounds read of 'grp->bb_counters'.
This also protects the access of the s_mb_offsets and s_mb_maxs arrays
inside mb_find_buddy().
These gadgets were discovered with the help of smatch:
* fs/ext4/mballoc.c:1896 ext4_mb_simple_scan_group() warn: potential
spectre issue 'grp->bb_counters' [w] (local cap)
* fs/ext4/mballoc.c:445 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_offsets' [r] (local cap)
* fs/ext4/mballoc.c:446 mb_find_buddy() warn: potential spectre issue
'EXT4_SB(e4b->bd_sb)->s_mb_maxs' [r] (local cap)
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jeremy Cline <jcline(a)redhat.com>
---
fs/ext4/mballoc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index f7ab34088162..c0866007a949 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -14,6 +14,7 @@
#include <linux/log2.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <linux/nospec.h>
#include <linux/backing-dev.h>
#include <trace/events/ext4.h>
@@ -1893,6 +1894,7 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
BUG_ON(ac->ac_2order <= 0);
for (i = ac->ac_2order; i <= sb->s_blocksize_bits + 1; i++) {
+ i = array_index_nospec(i, sb->s_blocksize_bits + 2);
if (grp->bb_counters[i] == 0)
continue;
--
2.17.1