From: Paul Burton <paul.burton(a)mips.com>
The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.
This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:
- Deadlock due to use of synchronous IPIs with interrupts disabled,
causing the CPU that's attempting to generate the backtrace output
to hang itself.
- Not succeed in generating the desired output from remote CPUs.
- Produce warnings about this from smp_call_function_many(), for
example:
[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755] 0-...!: (1 GPs behind) idle=ade/140000000000000/0 softirq=526944/526945 fqs=0
[42760.547874] 1-...!: (0 ticks this GP) idle=e4a/140000000000000/0 softirq=547885/547885 fqs=0
[42760.559869] (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fffffff0 00000007 00000006 00000000 8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 00000007 81944518 0000018a 00000032 00000000
[42760.630095] 00000000 00000030 80000000 00000000 806eca74 00000009 8017e2b8 000001a0
[42760.643169] 00000000 00000002 00000000 8e09baa4 00000008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa 00000000 00000000 00000000 8017e2b8 00000009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [<f7c76c1c>] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [<f845bd33>] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [<e56ea39f>] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [<e88bbcae>] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [<b2ada1c7>] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [<c7521934>] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [<ab9fc705>] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [<cb259b74>] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [<b3fce717>] exit_mmap+0x76/0xea
[42760.890428] [<c4c8a2f6>] mmput+0x80/0x11a
[42760.896632] [<a41a08f4>] do_exit+0x1f4/0x80c
[42760.903158] [<ee01cef6>] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...
This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if a CPU hasn't responded to a
previous backtrace IPI - ie. if it's hung - and we print a warning to
the console in this case.
I've marked this for stable branches as far back as v4.9, to which it
applies cleanly. Strictly speaking the faulty MIPS implementation can be
traced further back to commit 856839b76836 ("MIPS: Add
arch_trigger_all_cpu_backtrace() function") in v3.19, but kernel
versions v3.19 through v4.8 will require further work to backport due to
the rework performed in commit 9a01c3ed5cdb ("nmi_backtrace: add more
trigger_*_cpu_backtrace() methods").
Signed-off-by: Paul Burton <paul.burton(a)mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19597/
Cc: James Hogan <jhogan(a)kernel.org>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: linux-mips(a)linux-mips.org
Cc: stable(a)vger.kernel.org
Fixes: 856839b76836 ("MIPS: Add arch_trigger_all_cpu_backtrace() function")
Fixes: 9a01c3ed5cdb ("nmi_backtrace: add more trigger_*_cpu_backtrace() methods")
[ Huacai: backported to 4.9: Replace "call_single_data_t" with "struct call_single_data" ]
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
arch/mips/kernel/process.c | 45 ++++++++++++++++++++++++++++++---------------
1 file changed, 30 insertions(+), 15 deletions(-)
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index cb1e9c1..513a63b 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -26,6 +26,7 @@
#include <linux/kallsyms.h>
#include <linux/random.h>
#include <linux/prctl.h>
+#include <linux/nmi.h>
#include <asm/asm.h>
#include <asm/bootinfo.h>
@@ -633,28 +634,42 @@ unsigned long arch_align_stack(unsigned long sp)
return sp & ALMASK;
}
-static void arch_dump_stack(void *info)
-{
- struct pt_regs *regs;
+static DEFINE_PER_CPU(struct call_single_data, backtrace_csd);
+static struct cpumask backtrace_csd_busy;
- regs = get_irq_regs();
-
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+static void handle_backtrace(void *info)
+{
+ nmi_cpu_backtrace(get_irq_regs());
+ cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy);
}
-void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+static void raise_backtrace(cpumask_t *mask)
{
- long this_cpu = get_cpu();
+ struct call_single_data *csd;
+ int cpu;
- if (cpumask_test_cpu(this_cpu, mask) && !exclude_self)
- dump_stack();
+ for_each_cpu(cpu, mask) {
+ /*
+ * If we previously sent an IPI to the target CPU & it hasn't
+ * cleared its bit in the busy cpumask then it didn't handle
+ * our previous IPI & it's not safe for us to reuse the
+ * call_single_data_t.
+ */
+ if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) {
+ pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n",
+ cpu);
+ continue;
+ }
- smp_call_function_many(mask, arch_dump_stack, NULL, 1);
+ csd = &per_cpu(backtrace_csd, cpu);
+ csd->func = handle_backtrace;
+ smp_call_function_single_async(cpu, csd);
+ }
+}
- put_cpu();
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+ nmi_trigger_cpumask_backtrace(mask, exclude_self, raise_backtrace);
}
int mips_get_process_fp_mode(struct task_struct *task)
--
2.7.0
This is the start of the stable review cycle for the 4.14.56 release.
There are 54 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Jul 18 07:34:24 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.56-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.56-rc1
Jaegeuk Kim <jaegeuk(a)kernel.org>
f2fs: give message and set need_fsck given broken node id
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
loop: remember whether sysfs_create_group() was done
Leon Romanovsky <leonro(a)mellanox.com>
RDMA/ucm: Mark UCM interface as BROKEN
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
PM / hibernate: Fix oops at snapshot_write()
Theodore Ts'o <tytso(a)mit.edu>
loop: add recursion validation to LOOP_CHANGE_FD
Florian Westphal <fw(a)strlen.de>
netfilter: x_tables: initialise match/target check parameter struct
Eric Dumazet <edumazet(a)google.com>
netfilter: nf_queue: augment nfqa_cfg_policy
Oleg Nesterov <oleg(a)redhat.com>
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
Eric Biggers <ebiggers(a)google.com>
crypto: x86/salsa20 - remove x86 salsa20 implementations
Keith Busch <keith.busch(a)intel.com>
nvme-pci: Remap CMB SQ entries on every controller reset
Juergen Gross <jgross(a)suse.com>
xen: setup pv irq ops vector earlier
Steve Wise <swise(a)opengridcomputing.com>
iw_cxgb4: correctly enforce the max reg_mr depth
Jon Hunter <jonathanh(a)nvidia.com>
i2c: tegra: Fix NACK error handling
Michael J. Ruhl <michael.j.ruhl(a)intel.com>
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
Paul Menzel <pmenzel(a)molgen.mpg.de>
tools build: fix # escaping in .cmd files for future Make
Yandong Zhao <yandong77520(a)gmail.com>
arm64: neon: Fix function may_use_simd() return error status
Randy Dunlap <rdunlap(a)infradead.org>
kbuild: delete INSTALL_FW_PATH from kbuild documentation
Joel Fernandes (Google) <joel(a)joelfernandes.org>
tracing: Reorder display of TGID to be after PID
Michal Hocko <mhocko(a)suse.com>
mm: do not bug_on on incorrect length in __mm_populate()
Oscar Salvador <osalvador(a)suse.de>
fs, elf: make sure to page align bss in load_elf_library
Vlastimil Babka <vbabka(a)suse.cz>
fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Christian Borntraeger <borntraeger(a)de.ibm.com>
mm: do not drop unused pages when userfaultd is running
Chris Wilson <chris(a)chris-wilson.co.uk>
ALSA: hda - Handle pm failure during hotplug
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda/realtek - two more lenovo models need fixup of MIC_LOCATION
Ming Lei <ming.lei(a)redhat.com>
scsi: megaraid_sas: fix selection of reply queue
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: Create separate functions to allocate ctrl memory
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: replace is_ventura with adapter_type checks
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: replace instance->ctrl_context checks with instance->adapter_type
Shivasharan S <shivasharan.srikanteshwara(a)broadcom.com>
scsi: megaraid_sas: use adapter_type for all gen controllers
Christoph Hellwig <hch(a)lst.de>
genirq/affinity: assign vectors to all possible CPUs
Linus Torvalds <torvalds(a)linux-foundation.org>
Fix up non-directory creation in SGID directories
Christian Brauner <christian.brauner(a)ubuntu.com>
devpts: resolve devpts bind-mounts
Christian Brauner <christian.brauner(a)ubuntu.com>
devpts: hoist out check for DEVPTS_SUPER_MAGIC
Dan Carpenter <dan.carpenter(a)oracle.com>
xhci: xhci-mem: off by one in xhci_stream_id_to_ring()
Nico Sneck <snecknico(a)gmail.com>
usb: quirks: add delay quirks for Corsair Strafe
Johan Hovold <johan(a)kernel.org>
USB: serial: mos7840: fix status-register error handling
Jann Horn <jannh(a)google.com>
USB: yurex: fix out-of-bounds uaccess in read handler
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: fix modem-status error handling
Olli Salonen <olli.salonen(a)iki.fi>
USB: serial: cp210x: add another USB ID for Qivicon ZigBee stick
Dan Carpenter <dan.carpenter(a)oracle.com>
USB: serial: ch341: fix type promotion bug in ch341_control_in()
Hans de Goede <hdegoede(a)redhat.com>
ahci: Disable LPM on Lenovo 50 series laptops with a too old BIOS
Nadav Amit <namit(a)vmware.com>
vmw_balloon: fix inflation with batching
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT all bit handling
Damien Le Moal <damien.lemoal(a)wdc.com>
ata: Fix ZBC_OUT command block check
Ping-Ke Shih <pkshih(a)realtek.com>
staging: r8822be: Fix RTL8822be can't find any wireless AP
Murray McAllister <murray.mcallister(a)insomniasec.com>
staging: rtl8723bs: Prevent an underflow in rtw_check_beacon_data().
Jann Horn <jannh(a)google.com>
ibmasm: don't write out of bounds in read handler
x00270170 <xiaqing17(a)hisilicon.com>
mmc: dw_mmc: fix card threshold control configuration
Stefan Agner <stefan(a)agner.ch>
mmc: sdhci-esdhc-imx: allow 1.8V modes without 100/200MHz pinctrl states
Paul Burton <paul.burton(a)mips.com>
MIPS: Fix ioremap() RAM check
Paul Burton <paul.burton(a)mips.com>
MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
Paul Burton <paul.burton(a)mips.com>
MIPS: Call dump_stack() from show_regs()
Kai Chieh Chuang <kaichieh.chuang(a)mediatek.com>
ASoC: mediatek: preallocate pages use platform device
Sean Young <sean(a)mess.org>
media: rc: mce_kbd decoder: fix stuck keys
-------------
Diffstat:
Documentation/kbuild/kbuild.txt | 9 -
Makefile | 4 +-
arch/arm64/include/asm/simd.h | 19 +-
arch/mips/kernel/process.c | 43 +-
arch/mips/kernel/traps.c | 1 +
arch/mips/mm/ioremap.c | 37 +-
arch/x86/crypto/Makefile | 4 -
arch/x86/crypto/salsa20-i586-asm_32.S | 1114 --------------------
arch/x86/crypto/salsa20-x86_64-asm_64.S | 919 ----------------
arch/x86/crypto/salsa20_glue.c | 116 --
arch/x86/kernel/uprobes.c | 2 +-
arch/x86/xen/enlighten_pv.c | 24 +-
arch/x86/xen/irq.c | 4 +-
crypto/Kconfig | 26 -
drivers/ata/ahci.c | 59 ++
drivers/ata/libata-core.c | 3 +
drivers/ata/libata-scsi.c | 18 +-
drivers/block/loop.c | 79 +-
drivers/block/loop.h | 1 +
drivers/i2c/busses/i2c-tegra.c | 17 +-
drivers/infiniband/Kconfig | 12 +
drivers/infiniband/core/Makefile | 4 +-
drivers/infiniband/hw/cxgb4/mem.c | 2 +-
drivers/infiniband/hw/hfi1/rc.c | 2 +-
drivers/infiniband/hw/hfi1/uc.c | 4 +-
drivers/infiniband/hw/hfi1/ud.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.c | 4 +-
drivers/infiniband/hw/hfi1/verbs_txreq.h | 4 +-
drivers/media/rc/ir-mce_kbd-decoder.c | 2 +
drivers/misc/ibmasm/ibmasmfs.c | 27 +-
drivers/misc/vmw_balloon.c | 4 +-
drivers/mmc/host/dw_mmc.c | 7 +-
drivers/mmc/host/sdhci-esdhc-imx.c | 21 +-
drivers/nvme/host/pci.c | 27 +-
drivers/scsi/megaraid/megaraid_sas.h | 10 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 296 ++++--
drivers/scsi/megaraid/megaraid_sas_fp.c | 20 +-
drivers/scsi/megaraid/megaraid_sas_fusion.c | 54 +-
drivers/scsi/megaraid/megaraid_sas_fusion.h | 7 -
drivers/staging/rtl8723bs/core/rtw_ap.c | 2 +-
drivers/staging/rtlwifi/rtl8822be/hw.c | 2 +-
drivers/staging/rtlwifi/wifi.h | 1 +
drivers/usb/core/quirks.c | 4 +
drivers/usb/host/xhci-mem.c | 2 +-
drivers/usb/misc/yurex.c | 23 +-
drivers/usb/serial/ch341.c | 2 +-
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/keyspan_pda.c | 4 +-
drivers/usb/serial/mos7840.c | 3 +
fs/binfmt_elf.c | 5 +-
fs/devpts/inode.c | 48 +-
fs/f2fs/f2fs.h | 13 +-
fs/f2fs/inode.c | 13 +-
fs/f2fs/node.c | 21 +-
fs/inode.c | 6 +
fs/proc/task_mmu.c | 2 +-
include/linux/libata.h | 1 +
kernel/irq/affinity.c | 30 +-
kernel/power/user.c | 5 +
kernel/trace/trace.c | 8 +-
kernel/trace/trace_output.c | 5 +-
mm/gup.c | 2 -
mm/mmap.c | 29 +-
mm/rmap.c | 8 +-
net/bridge/netfilter/ebtables.c | 2 +
net/ipv4/netfilter/ip_tables.c | 1 +
net/ipv6/netfilter/ip6_tables.c | 1 +
net/netfilter/nfnetlink_queue.c | 3 +
sound/pci/hda/patch_hdmi.c | 19 +-
sound/pci/hda/patch_realtek.c | 6 +-
.../soc/mediatek/common/mtk-afe-platform-driver.c | 4 +-
tools/build/Build.include | 4 +-
72 files changed, 665 insertions(+), 2625 deletions(-)
SEV guest fails to update the UEFI runtime variables stored in the
flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
when SEV is active") unconditionally maps all the UEFI runtime data
as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
guest and hypervisor can access the data.
Fixes: 1379edd59673 (x86/efi: Access EFI data as encrypted ...)
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: linux-efi(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # 4.15.x
Signed-off-by: Brijesh Singh <brijesh.singh(a)amd.com>
---
arch/x86/platform/efi/efi_64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 77873ce..5f2eb32 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -417,7 +417,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
- if (sev_active())
+ if (sev_active() && md->type != EFI_MEMORY_MAPPED_IO)
flags |= _PAGE_ENC;
pfn = md->phys_addr >> PAGE_SHIFT;
--
2.7.4
The patch titled
Subject: mm: do not bug_on on incorrect length in __mm_populate()
has been removed from the -mm tree. Its filename was
mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm: do not bug_on on incorrect length in __mm_populate()
syzbot has noticed that a specially crafted library can easily hit
VM_BUG_ON in __mm_populate
localhost login: [ 81.210241] emacs (9634) used greatest stack depth: 10416 bytes left
[ 140.099935] ------------[ cut here ]------------
[ 140.101904] kernel BUG at mm/gup.c:1242!
[ 140.103572] invalid opcode: 0000 [#1] SMP
[ 140.105220] CPU: 2 PID: 9667 Comm: a.out Not tainted 4.18.0-rc3 #644
[ 140.107762] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 140.112000] RIP: 0010:__mm_populate+0x1e2/0x1f0
[ 140.113875] Code: 55 d0 65 48 33 14 25 28 00 00 00 89 d8 75 21 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 75 18 f1 ff 0f 0b e8 6e 18 f1 ff <0f> 0b 31 db eb c9 e8 93 06 e0 ff 0f 1f 00 55 48 89 e5 53 48 89 fb
[ 140.121403] RSP: 0018:ffffc90000dffd78 EFLAGS: 00010293
[ 140.123516] RAX: ffff8801366c63c0 RBX: 000000007bf81000 RCX: ffffffff813e4ee2
[ 140.126352] RDX: 0000000000000000 RSI: 0000000000007676 RDI: 000000007bf81000
[ 140.129236] RBP: ffffc90000dffdc0 R08: 0000000000000000 R09: 0000000000000000
[ 140.132110] R10: ffff880135895c80 R11: 0000000000000000 R12: 0000000000007676
[ 140.134955] R13: 0000000000008000 R14: 0000000000000000 R15: 0000000000007676
[ 140.137785] FS: 0000000000000000(0000) GS:ffff88013a680000(0063) knlGS:00000000f7db9700
[ 140.140998] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 140.143303] CR2: 00000000f7ea56e0 CR3: 0000000134674004 CR4: 00000000000606e0
[ 140.145906] Call Trace:
[ 140.146728] vm_brk_flags+0xc3/0x100
[ 140.147830] vm_brk+0x1f/0x30
[ 140.148714] load_elf_library+0x281/0x2e0
[ 140.149875] __ia32_sys_uselib+0x170/0x1e0
[ 140.151028] ? copy_overflow+0x30/0x30
[ 140.152105] ? __ia32_sys_uselib+0x170/0x1e0
[ 140.153301] do_fast_syscall_32+0xca/0x420
[ 140.154455] entry_SYSENTER_compat+0x70/0x7f
The reason is that the length of the new brk is not page aligned when we
try to populate the it. There is no reason to bug on that though.
do_brk_flags already aligns the length properly so the mapping is expanded
as it should. All we need is to tell mm_populate about it. Besides that
there is absolutely no reason to to bug_on in the first place. The worst
thing that could happen is that the last page wouldn't get populated and
that is far from putting system into an inconsistent state.
Fix the issue by moving the length sanitization code from do_brk_flags up
to vm_brk_flags. The only other caller of do_brk_flags is brk syscall
entry and it makes sure to provide the proper length so t here is no need
for sanitation and so we can use do_brk_flags without it.
Also remove the bogus BUG_ONs.
[osalvador(a)techadventures.net: fix up vm_brk_flags s@request@len@]
Link: http://lkml.kernel.org/r/20180706090217.GI32658@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: syzbot <syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com>
Tested-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 2 --
mm/mmap.c | 29 ++++++++++++-----------------
2 files changed, 12 insertions(+), 19 deletions(-)
diff -puN mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/gup.c
--- a/mm/gup.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/gup.c
@@ -1238,8 +1238,6 @@ int __mm_populate(unsigned long start, u
int locked = 0;
long ret = 0;
- VM_BUG_ON(start & ~PAGE_MASK);
- VM_BUG_ON(len != PAGE_ALIGN(len));
end = start + len;
for (nstart = start; nstart < end; nstart = nend) {
diff -puN mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate mm/mmap.c
--- a/mm/mmap.c~mm-do-not-bug_on-on-incorrect-lenght-in-__mm_populate
+++ a/mm/mmap.c
@@ -186,8 +186,8 @@ static struct vm_area_struct *remove_vma
return next;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf);
-
+static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
+ struct list_head *uf);
SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long retval;
@@ -245,7 +245,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
goto out;
/* Ok, looks good - let it rip. */
- if (do_brk(oldbrk, newbrk-oldbrk, &uf) < 0)
+ if (do_brk_flags(oldbrk, newbrk-oldbrk, 0, &uf) < 0)
goto out;
set_brk:
@@ -2929,21 +2929,14 @@ static inline void verify_mm_writelocked
* anonymous maps. eventually we may be able to do some
* brk-specific accounting here.
*/
-static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags, struct list_head *uf)
+static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long flags, struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
- unsigned long len;
struct rb_node **rb_link, *rb_parent;
pgoff_t pgoff = addr >> PAGE_SHIFT;
int error;
- len = PAGE_ALIGN(request);
- if (len < request)
- return -ENOMEM;
- if (!len)
- return 0;
-
/* Until we need other flags, refuse anything except VM_EXEC. */
if ((flags & (~VM_EXEC)) != 0)
return -EINVAL;
@@ -3015,18 +3008,20 @@ out:
return 0;
}
-static int do_brk(unsigned long addr, unsigned long len, struct list_head *uf)
-{
- return do_brk_flags(addr, len, 0, uf);
-}
-
-int vm_brk_flags(unsigned long addr, unsigned long len, unsigned long flags)
+int vm_brk_flags(unsigned long addr, unsigned long request, unsigned long flags)
{
struct mm_struct *mm = current->mm;
+ unsigned long len;
int ret;
bool populate;
LIST_HEAD(uf);
+ len = PAGE_ALIGN(request);
+ if (len < request)
+ return -ENOMEM;
+ if (!len)
+ return 0;
+
if (down_write_killable(&mm->mmap_sem))
return -EINTR;
_
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-drop-vm_bug_on-from-__get_free_pages.patch
memcg-oom-move-out_of_memory-back-to-the-charge-path.patch
mm-oom-remove-sleep-from-under-oom_lock.patch
mm-oom-document-oom_lock.patch
mm-oom-docs-describe-the-cgroup-aware-oom-killer-fix-2.patch
The patch titled
Subject: fs, elf: make sure to page align bss in load_elf_library
has been removed from the -mm tree. Its filename was
fs-elf-make-sure-to-page-align-bss-in-load_elf_library.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: fs, elf: make sure to page align bss in load_elf_library
The current code does not make sure to page align bss before calling
vm_brk(), and this can lead to a VM_BUG_ON() in __mm_populate()
due to the requested lenght not being correctly aligned.
Let us make sure to align it properly.
Kees: only applicable to CONFIG_USELIB kernels: 32-bit and configured for
libc5.
Link: http://lkml.kernel.org/r/20180705145539.9627-1-osalvador@techadventures.net
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-by: syzbot+5dcb560fe12aa5091c06(a)syzkaller.appspotmail.com
Tested-by: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_elf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff -puN fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library fs/binfmt_elf.c
--- a/fs/binfmt_elf.c~fs-elf-make-sure-to-page-align-bss-in-load_elf_library
+++ a/fs/binfmt_elf.c
@@ -1259,9 +1259,8 @@ static int load_elf_library(struct file
goto out_free_ph;
}
- len = ELF_PAGESTART(eppnt->p_filesz + eppnt->p_vaddr +
- ELF_MIN_ALIGN - 1);
- bss = eppnt->p_memsz + eppnt->p_vaddr;
+ len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
+ bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
if (bss > len) {
error = vm_brk(len, bss - len);
if (error)
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mm-memory_hotplug-make-add_memory_resource-use-__try_online_node.patch
mm-memory_hotplug-call-register_mem_sect_under_node.patch
mm-memory_hotplug-make-register_mem_sect_under_node-a-cb-of-walk_memory_range.patch
mm-memory_hotplug-drop-unnecessary-checks-from-register_mem_sect_under_node.patch
mm-sparse-make-sparse_init_one_section-void-and-remove-check.patch
The patch titled
Subject: x86/purgatory: add missing FORCE to Makefile target
has been removed from the -mm tree. Its filename was
x86-purgatory-add-missing-force-to-makefile-target.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Philipp Rudo <prudo(a)linux.ibm.com>
Subject: x86/purgatory: add missing FORCE to Makefile target
- Build the kernel without the fix
- Add some flag to the purgatories KBUILD_CFLAGS,I used
-fno-asynchronous-unwind-tables
- Re-build the kernel
When you look at makes output you see that sha256.o is not re-build in the
last step. Also readelf -S still shows the .eh_frame section for
sha256.o.
With the fix sha256.o is rebuilt in the last step.
Without FORCE make does not detect changes only made to the command line
options. So object files might not be re-built even when they should be.
Fix this by adding FORCE where it is missing.
Link: http://lkml.kernel.org/r/20180704110044.29279-2-prudo@linux.ibm.com
Fixes: df6f2801f511 ("kernel/kexec_file.c: move purgatories sha256 to common code")
Signed-off-by: Philipp Rudo <prudo(a)linux.ibm.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/purgatory/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target arch/x86/purgatory/Makefile
--- a/arch/x86/purgatory/Makefile~x86-purgatory-add-missing-force-to-makefile-target
+++ a/arch/x86/purgatory/Makefile
@@ -6,7 +6,7 @@ purgatory-y := purgatory.o stack.o setup
targets += $(purgatory-y)
PURGATORY_OBJS = $(addprefix $(obj)/,$(purgatory-y))
-$(obj)/sha256.o: $(srctree)/lib/sha256.c
+$(obj)/sha256.o: $(srctree)/lib/sha256.c FORCE
$(call if_changed_rule,cc_o_c)
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
_
Patches currently in -mm which might be from prudo(a)linux.ibm.com are
The patch titled
Subject: fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
has been removed from the -mm tree. Its filename was
mm-fix-locked-field-in-proc-pid-smaps.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: fs/proc/task_mmu.c: fix Locked field in /proc/pid/smaps*
Thomas reports:
: While looking around in /proc on my v4.14.52 system I noticed that
: all processes got a lot of "Locked" memory in /proc/*/smaps. A lot
: more memory than a regular user can usually lock with mlock().
:
: commit 493b0e9d945fa9dfe96be93ae41b4ca4b6fdb317 (v4.14-rc1) seems
: to have changed the behavior of "Locked".
:
: Before that commit the code was like this. Notice the VM_LOCKED
: check.
:
: (vma->vm_flags & VM_LOCKED) ?
: (unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
:
: After that commit Locked is now the same as Pss. This looks like a
: mistake.
:
: (unsigned long)(mss->pss >> (10 + PSS_SHIFT)));
Indeed, the commit has added mss->pss_locked with the correct value that
depends on VM_LOCKED, but forgot to actually use it. Fix it.
Link: http://lkml.kernel.org/r/ebf6c7fb-fec3-6a26-544f-710ed193c154@suse.cz
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Thomas Lindroth <thomas.lindroth(a)gmail.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps fs/proc/task_mmu.c
--- a/fs/proc/task_mmu.c~mm-fix-locked-field-in-proc-pid-smaps
+++ a/fs/proc/task_mmu.c
@@ -831,7 +831,8 @@ static int show_smap(struct seq_file *m,
SEQ_PUT_DEC(" kB\nSwap: ", mss->swap);
SEQ_PUT_DEC(" kB\nSwapPss: ",
mss->swap_pss >> PSS_SHIFT);
- SEQ_PUT_DEC(" kB\nLocked: ", mss->pss >> PSS_SHIFT);
+ SEQ_PUT_DEC(" kB\nLocked: ",
+ mss->pss_locked >> PSS_SHIFT);
seq_puts(m, " kB\n");
}
if (!rollup_mode) {
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_alloc-actually-ignore-mempolicies-for-high-priority-allocations.patch
The patch titled
Subject: mm: do not drop unused pages when userfaultd is running
has been removed from the -mm tree. Its filename was
mm-do-not-drop-unused-pages-when-userfaultd-is-running.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Christian Borntraeger <borntraeger(a)de.ibm.com>
Subject: mm: do not drop unused pages when userfaultd is running
KVM guests on s390 can notify the host of unused pages. This can result
in pte_unused callbacks to be true for KVM guest memory.
If a page is unused (checked with pte_unused) we might drop this page
instead of paging it. This can have side-effects on userfaultd, when the
page in question was already migrated:
The next access of that page will trigger a fault and a user fault instead
of faulting in a new and empty zero page. As QEMU does not expect a
userfault on an already migrated page this migration will fail.
The most straightforward solution is to ignore the pte_unused hint if a
userfault context is active for this VMA.
Link: http://lkml.kernel.org/r/20180703171854.63981-1-borntraeger@de.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Cornelia Huck <cohuck(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff -puN mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running mm/rmap.c
--- a/mm/rmap.c~mm-do-not-drop-unused-pages-when-userfaultd-is-running
+++ a/mm/rmap.c
@@ -64,6 +64,7 @@
#include <linux/backing-dev.h>
#include <linux/page_idle.h>
#include <linux/memremap.h>
+#include <linux/userfaultfd_k.h>
#include <asm/tlbflush.h>
@@ -1481,11 +1482,16 @@ static bool try_to_unmap_one(struct page
set_pte_at(mm, address, pvmw.pte, pteval);
}
- } else if (pte_unused(pteval)) {
+ } else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
/*
* The guest indicated that the page content is of no
* interest anymore. Simply discard the pte, vmscan
* will take care of the rest.
+ * A future reference will then fault in a new zero
+ * page. When userfaultfd is active, we must not drop
+ * this page though, as its main user (postcopy
+ * migration) will not expect userfaults on already
+ * copied pages.
*/
dec_mm_counter(mm, mm_counter(page));
/* We have to invalidate as we cleared the pte */
_
Patches currently in -mm which might be from borntraeger(a)de.ibm.com are