Any request queue data structure may change while a queue is frozen.
Hence make sure that blk_mq_run_hw_queues() does not access any hw
queue while a request queue is frozen.
After blk_cleanup_queue() has marked a queue as dead it is no longer
safe to access the hardware queue data structures. This patch prevents
blk_mq_run_hw_queues() from crashing when it is called while or after
blk_cleanup_queue() frees the hardware queues. This patch is a variant
of a patch posted by Hannes Reinecke ("[PATCH] block: don't call
blk_mq_run_hw_queues() for dead or dying queues"). It is similar in
nature to commit c246e80d8673 ("block: Avoid that request_fn is invoked
on a dead queue"; v3.8). An example of a crash that is fixed by this
patch:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8135a10b>] sbitmap_any_bit_set+0xb/0x30
Call Trace:
[<ffffffff81303a88>] blk_mq_run_hw_queues+0x48/0x90
[<ffffffff813053cc>] blk_mq_requeue_work+0x10c/0x120
[<ffffffff81098cb4>] process_one_work+0x154/0x410
[<ffffffff81099896>] worker_thread+0x116/0x4a0
[<ffffffff8109edb9>] kthread+0xc9/0xe0
[<ffffffff81619b05>] ret_from_fork+0x55/0x80
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: James Smart <james.smart(a)broadcom.com>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Jianchao Wang <jianchao.w.wang(a)oracle.com>
Cc: Dongli Zhang <dongli.zhang(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Fixes: a063057d7c73 ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17.
Reported-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
block/blk-mq.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3ff3d7b49969..652d0c6d5945 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1499,12 +1499,20 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
struct blk_mq_hw_ctx *hctx;
int i;
+ /*
+ * Do not run any hardware queues if the queue is frozen or if a
+ * concurrent blk_cleanup_queue() call is removing any data
+ * structures used by this function.
+ */
+ if (!percpu_ref_tryget(&q->q_usage_counter))
+ return;
queue_for_each_hw_ctx(q, hctx, i) {
if (blk_mq_hctx_stopped(hctx))
continue;
blk_mq_run_hw_queue(hctx, async);
}
+ percpu_ref_put(&q->q_usage_counter);
}
EXPORT_SYMBOL(blk_mq_run_hw_queues);
--
2.21.0.196.g041f5ea1cf98
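The guard added to blk_mq_run_hw_queues() above follows a general
"try to take a reference, do nothing if the object is already frozen"
pattern. The following is a minimal user-space sketch of that pattern
using C11 atomics. It is not the kernel's percpu_ref implementation:
struct queue_ref, struct queue, ref_tryget(), ref_put() and
run_hw_queues() are hypothetical names chosen for illustration, and the
return value of run_hw_queues() exists only to make the behaviour
observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for q->q_usage_counter; not the kernel's percpu_ref. */
struct queue_ref {
	atomic_long count;	/* active users, plus one reference held by the owner */
};

struct queue {
	struct queue_ref usage;
	int nr_hw_queues;
	int *run_count;		/* one counter per hw queue; invalid once freed */
};

/* Take a reference unless the count has already dropped to zero. */
static bool ref_tryget(struct queue_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously obtained (or the owner's initial one). */
static void ref_put(struct queue_ref *ref)
{
	long old = atomic_fetch_sub(&ref->count, 1);

	assert(old > 0);
	(void)old;
}

/* Returns how many hardware queues were run; 0 if the queue is frozen. */
static int run_hw_queues(struct queue *q)
{
	int i, ran = 0;

	/* Bail out before touching any per-hw-queue data. */
	if (!ref_tryget(&q->usage))
		return 0;
	for (i = 0; i < q->nr_hw_queues; i++) {
		q->run_count[i]++;
		ran++;
	}
	ref_put(&q->usage);
	return ran;
}
```

In this sketch, "freezing" corresponds to the owner dropping its initial
reference with ref_put() and waiting for the count to reach zero. Once
that has happened, run_hw_queues() returns without dereferencing
run_count, so the owner may free it.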
When the physmap_of_core.c code was merged into physmap-core.c the
ability to use MTD_PHYSMAP_OF with only MTD_RAM selected was lost.
Restore this by adding MTD_RAM to the dependencies of MTD_PHYSMAP.
Fixes: 642b1e8dbed7 ("mtd: maps: Merge physmap_of.c into physmap-core.c")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chris Packham <chris.packham(a)alliedtelesis.co.nz>
Reviewed-by: Hamish Martin <hamish.martin(a)alliedtelesis.co.nz>
---
drivers/mtd/maps/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/maps/Kconfig b/drivers/mtd/maps/Kconfig
index e0cf869c8544..544ed1931843 100644
--- a/drivers/mtd/maps/Kconfig
+++ b/drivers/mtd/maps/Kconfig
@@ -10,7 +10,7 @@ config MTD_COMPLEX_MAPPINGS
config MTD_PHYSMAP
tristate "Flash device in physical memory map"
- depends on MTD_CFI || MTD_JEDECPROBE || MTD_ROM || MTD_LPDDR
+ depends on MTD_CFI || MTD_JEDECPROBE || MTD_ROM || MTD_RAM || MTD_LPDDR
help
This provides a 'mapping' driver which allows the NOR Flash and
ROM driver code to communicate with chips which are mapped
--
2.21.0
This is the start of the stable review cycle for the 3.18.138 release.
There are 50 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Apr 3 16:59:36 UTC 2019.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.138-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.138-rc1
Eric Biggers <ebiggers(a)google.com>
arm64: support keyctl() system call in 32-bit mode
Kohji Okuno <okuno.kohji(a)jp.panasonic.com>
ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: Fix port resume done detection for SS ports with LPM enabled
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: Reject device ioctls from processes other than the VM's creator
Axel Lin <axel.lin(a)ingics.com>
gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
YueHaibing <yuehaibing(a)huawei.com>
fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
Wentao Wang <witallwang(a)gmail.com>
Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
Lin Yi <teroincn(a)163.com>
USB: serial: mos7720: fix mos_parport refcount imbalance on error path
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add additional NovaTech products
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
USB: serial: cp210x: add new device id
Aditya Pakki <pakki001(a)umn.edu>
serial: max310x: Fix to avoid potential NULL pointer dereference
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Don't suspend stream in unrecoverable PCM state
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Fix possible OOB access in PCM oss plugins
Finn Thain <fthain(a)telegraphics.com.au>
mac8390: Fix mmio access size probe
Xin Long <lucien.xin(a)gmail.com>
sctp: get sctphdr by offset in sctp_compute_cksum
Eric Dumazet <edumazet(a)google.com>
tcp: do not use ipv6 header for ipv4 flow
Maxime Chevallier <maxime.chevallier(a)bootlin.com>
packets: Always register packet sk in the same order
David S. Miller <davem(a)davemloft.net>
Add hlist_add_tail_rcu() (Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Eric Dumazet <edumazet(a)google.com>
net: rose: fix a possible stack overflow
Christoph Paasch <cpaasch(a)apple.com>
net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec
Bjorn Helgaas <bhelgaas(a)google.com>
mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S
Eric Dumazet <edumazet(a)google.com>
dccp: do not use ipv6 header for ipv4 flow
Johannes Berg <johannes.berg(a)intel.com>
cfg80211: size various nl80211 messages correctly
Chaotian Jing <chaotian.jing(a)mediatek.com>
mmc: mmc: fix switch timeout issue caused by jiffies precision
Ezequiel Garcia <ezequiel(a)vanguardiasur.com.ar>
arm64: kconfig: drop CONFIG_RTC_LIB dependency
Christoffer Dall <christoffer.dall(a)linaro.org>
video: fbdev: Set pixclock = 0 in goldfishfb
Winter Wang <wente.wang(a)nxp.com>
usb: gadget: configfs: add mutex lock before unregister gadget
Hannes Frederic Sowa <hannes(a)stressinduktion.org>
ipv6: fix endianness error in icmpv6_err
James Morse <james.morse(a)arm.com>
arm64: kernel: Include _AC definition in page.h
Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
arm64/kernel: fix incorrect EL0 check in inv_entry macro
Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies
Greg Hackmann <ghackmann(a)google.com>
staging: goldfish: audio: fix compiliation on arm
Rajmal Menariya <rajmal.menariya(a)spreadtrum.com>
staging: ion: Set minimum carveout heap allocation order to PAGE_SHIFT
Rom Lemarchand <romlem(a)android.com>
staging: ashmem: Add missing include
Laura Abbott <lauraa(a)codeaurora.org>
staging: ashmem: Avoid deadlock with mmap/shrink
Mark Rutland <mark.rutland(a)arm.com>
asm-generic: Fix local variable shadow in __set_fixmap_offset
Dmitry Torokhov <dtor(a)chromium.org>
android: unconditionally remove callbacks in sync_fence_free()
Arnd Bergmann <arnd(a)arndb.de>
ARM: 8458/1: bL_switcher: add GIC dependency
Yury Norov <ynorov(a)caviumnetworks.com>
arm64: fix COMPAT_SHMLBA definition for large pages
Colin Cross <ccross(a)android.com>
mmc: block: Allow more than 8 partitions per card
Marcel Holtmann <marcel(a)holtmann.org>
Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
Marcel Holtmann <marcel(a)holtmann.org>
Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
Hans Verkuil <hverkuil(a)xs4all.nl>
media: v4l2-ctrls.c/uvc: zero v4l2_event
Sergei Shtylyov <sergei.shtylyov(a)cogentembedded.com>
mmc: tmio_mmc_core: don't claim spurious interrupts
zhangyi (F) <yi.zhang(a)huawei.com>
ext4: brelse all indirect buffer in ext4_ind_remove_space()
Lukas Czerner <lczerner(a)redhat.com>
ext4: fix data corruption caused by unaligned direct AIO
Jiufei Xue <jiufei.xue(a)linux.alibaba.com>
ext4: fix NULL pointer dereference while journal is aborted
Chen Jie <chenjie6(a)huawei.com>
futex: Ensure that futex address is aligned in handle_futex_death()
Jan Kara <jack(a)suse.cz>
udf: Fix crash on IO error during truncate
-------------
Diffstat:
Documentation/virtual/kvm/api.txt | 16 +++--
Makefile | 4 +-
arch/arm/Kconfig | 6 +-
arch/arm/mach-imx/cpuidle-imx6q.c | 27 +++-----
arch/arm64/Kconfig | 5 +-
arch/arm64/include/asm/page.h | 2 +
arch/arm64/include/asm/shmparam.h | 2 +-
arch/arm64/kernel/entry.S | 2 +-
drivers/gpio/gpio-adnp.c | 6 +-
drivers/isdn/hardware/mISDN/hfcmulti.c | 3 +-
drivers/media/usb/uvc/uvc_ctrl.c | 2 +-
drivers/media/v4l2-core/v4l2-ctrls.c | 2 +-
drivers/mmc/card/block.c | 7 +--
drivers/mmc/core/mmc_ops.c | 2 +-
drivers/mmc/host/tmio_mmc_pio.c | 8 +--
drivers/net/ethernet/8390/mac8390.c | 19 +++---
drivers/s390/scsi/zfcp_erp.c | 14 +++++
drivers/s390/scsi/zfcp_ext.h | 2 +
drivers/s390/scsi/zfcp_scsi.c | 4 ++
drivers/staging/android/ashmem.c | 4 +-
drivers/staging/android/ion/ion_carveout_heap.c | 2 +-
drivers/staging/android/sync.c | 6 +-
drivers/staging/android/uapi/ashmem.h | 1 +
drivers/staging/goldfish/goldfish_audio.c | 1 +
drivers/tty/serial/kgdboc.c | 4 +-
drivers/tty/serial/max310x.c | 2 +
drivers/usb/gadget/configfs.c | 2 +
drivers/usb/host/xhci-ring.c | 9 ++-
drivers/usb/host/xhci.h | 1 +
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 4 +-
drivers/usb/serial/mos7720.c | 4 +-
drivers/video/fbdev/goldfishfb.c | 2 +-
fs/ext4/ext4_jbd2.h | 2 +-
fs/ext4/file.c | 2 +-
fs/ext4/indirect.c | 12 ++--
fs/proc/proc_sysctl.c | 3 +-
fs/udf/truncate.c | 3 +
include/asm-generic/fixmap.h | 12 ++--
include/linux/rculist.h | 36 +++++++++++
include/net/sctp/checksum.h | 2 +-
include/net/sock.h | 6 ++
kernel/futex.c | 4 ++
net/bluetooth/l2cap_core.c | 83 ++++++++++++++++---------
net/dccp/ipv6.c | 4 +-
net/ipv6/icmp.c | 2 +-
net/ipv6/tcp_ipv6.c | 8 +--
net/packet/af_packet.c | 4 +-
net/rose/rose_subr.c | 21 ++++---
net/wireless/nl80211.c | 16 ++---
sound/core/oss/pcm_oss.c | 43 ++++++-------
sound/core/pcm_native.c | 9 ++-
virt/kvm/kvm_main.c | 3 +
54 files changed, 293 insertions(+), 160 deletions(-)
This is the start of the stable review cycle for the 4.9.167 release.
There are 56 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Apr 3 17:00:20 UTC 2019.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.167-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.167-rc1
Eric Biggers <ebiggers(a)google.com>
arm64: support keyctl() system call in 32-bit mode
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "USB: core: only clean up what we allocated"
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: Fix port resume done detection for SS ports with LPM enabled
Radoslav Gerganov <rgerganov(a)vmware.com>
USB: gadget: f_hid: fix deadlock in f_hidg_write()
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: Reject device ioctls from processes other than the VM's creator
Thomas Gleixner <tglx(a)linutronix.de>
x86/smp: Enforce CONFIG_HOTPLUG_CPU when SMP=y
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix TSC slip
Yasushi Asano <yasano(a)jp.adit-jv.com>
usb: host: xhci-rcar: Add XHCI_TRUST_TX_LENGTH quirk
Fabrizio Castro <fabrizio.castro(a)bp.renesas.com>
usb: common: Consider only available nodes for dr_mode
Axel Lin <axel.lin(a)ingics.com>
gpio: adnp: Fix testing wrong value in adnp_gpio_direction_input
YueHaibing <yuehaibing(a)huawei.com>
fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links
Wentao Wang <witallwang(a)gmail.com>
Disable kgdboc failed by echo space to /sys/module/kgdboc/parameters/kgdboc
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Olicard 600
Mans Rullgard <mans(a)mansr.com>
USB: serial: option: set driver_info for SIM5218 and compatibles
Lin Yi <teroincn(a)163.com>
USB: serial: mos7720: fix mos_parport refcount imbalance on error path
George McCollister <george.mccollister(a)gmail.com>
USB: serial: ftdi_sio: add additional NovaTech products
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
USB: serial: cp210x: add new device id
Hoan Nguyen An <na-hoan(a)jinso.co.jp>
serial: sh-sci: Fix setting SCSCR_TIE while transferring data
Aditya Pakki <pakki001(a)umn.edu>
serial: max310x: Fix to avoid potential NULL pointer dereference
Malcolm Priestley <tvboxspy(a)gmail.com>
staging: vt6655: Fix interrupt race condition on device start up.
Malcolm Priestley <tvboxspy(a)gmail.com>
staging: vt6655: Remove vif check from vnt_interrupt
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: ni_mio_common: Fix divide-by-zero for DIO cmdtest
Kangjie Lu <kjlu(a)umn.edu>
tty: atmel_serial: fix a potential NULL pointer dereference
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix scsi_eh host reset with port_forced ERP for non-NPIV FCP devices
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix rport unblock if deleted SCSI devices on Scsi_Host
Martin K. Petersen <martin.petersen(a)oracle.com>
scsi: sd: Quiesce warning if device does not report optimal I/O size
Bart Van Assche <bvanassche(a)acm.org>
scsi: sd: Fix a race between closing an sd device and sd I/O
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
fs/open.c: allow opening only regular files during execve()
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Don't suspend stream in unrecoverable PCM state
Takashi Iwai <tiwai(a)suse.de>
ALSA: pcm: Fix possible OOB access in PCM oss plugins
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: seq: oss: Fix Spectre v1 vulnerability
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
ALSA: rawmidi: Fix potential Spectre v1 vulnerability
Christian Lamparter <chunkeey(a)gmail.com>
net: dsa: qca8k: remove leftover phy accessors
Olga Kornievskaia <kolga(a)netapp.com>
NFSv4.1 don't free interrupted slot on open
Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
powerpc: bpf: Fix generation of load/store DW instructions
Kohji Okuno <okuno.kohji(a)jp.panasonic.com>
ARM: imx6q: cpuidle: fix bug that CPU might not wake up at expected time
Andrea Righi <andrea.righi(a)canonical.com>
btrfs: raid56: properly unmap parity page in finish_parity_scrub()
Josef Bacik <josef(a)toxicpanda.com>
btrfs: remove WARN_ON in log_dir_items
Eric Dumazet <edumazet(a)google.com>
tun: add a missing rcu_read_unlock() in error path
Eric Dumazet <edumazet(a)google.com>
tun: properly test for IFF_UP
Finn Thain <fthain(a)telegraphics.com.au>
mac8390: Fix mmio access size probe
Xin Long <lucien.xin(a)gmail.com>
sctp: get sctphdr by offset in sctp_compute_cksum
Zhiqiang Liu <liuzhiqiang26(a)huawei.com>
vxlan: Don't call gro_cells_destroy() before device is unregistered
Eric Dumazet <edumazet(a)google.com>
tcp: do not use ipv6 header for ipv4 flow
Maxime Chevallier <maxime.chevallier(a)bootlin.com>
packets: Always register packet sk in the same order
Eric Dumazet <edumazet(a)google.com>
net: rose: fix a possible stack overflow
Christoph Paasch <cpaasch(a)apple.com>
net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec
Bjorn Helgaas <bhelgaas(a)google.com>
mISDN: hfcpci: Test both vendor & device ID for Digium HFC4S
Eric Dumazet <edumazet(a)google.com>
dccp: do not use ipv6 header for ipv4 flow
Bhadram Varka <vbhadram(a)nvidia.com>
stmmac: copy unicast mac address to MAC registers
Johannes Berg <johannes.berg(a)intel.com>
cfg80211: size various nl80211 messages correctly
Christoffer Dall <christoffer.dall(a)linaro.org>
video: fbdev: Set pixclock = 0 in goldfishfb
Marcel Holtmann <marcel(a)holtmann.org>
Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer
Marcel Holtmann <marcel(a)holtmann.org>
Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt
-------------
Diffstat:
Documentation/virtual/kvm/api.txt | 16 +++--
Makefile | 4 +-
arch/arm/mach-imx/cpuidle-imx6q.c | 27 +++----
arch/arm64/Kconfig | 4 ++
arch/powerpc/include/asm/ppc-opcode.h | 2 +
arch/powerpc/net/bpf_jit.h | 17 ++---
arch/powerpc/net/bpf_jit32.h | 4 ++
arch/powerpc/net/bpf_jit64.h | 20 ++++++
arch/powerpc/net/bpf_jit_comp64.c | 12 ++--
arch/x86/Kconfig | 8 +--
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx.c | 14 ----
arch/x86/kvm/x86.c | 12 ++++
drivers/gpio/gpio-adnp.c | 6 +-
drivers/isdn/hardware/mISDN/hfcmulti.c | 3 +-
drivers/net/dsa/qca8k.c | 18 -----
drivers/net/ethernet/8390/mac8390.c | 19 +++--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 16 ++++-
drivers/net/tun.c | 16 +++--
drivers/net/vxlan.c | 4 +-
drivers/s390/scsi/zfcp_erp.c | 17 +++++
drivers/s390/scsi/zfcp_ext.h | 2 +
drivers/s390/scsi/zfcp_scsi.c | 4 ++
drivers/scsi/sd.c | 22 ++++--
drivers/staging/comedi/comedidev.h | 2 +
drivers/staging/comedi/drivers.c | 33 +++++++--
drivers/staging/comedi/drivers/ni_mio_common.c | 10 ++-
drivers/staging/vt6655/device_main.c | 11 ++-
drivers/tty/serial/atmel_serial.c | 4 ++
drivers/tty/serial/kgdboc.c | 4 +-
drivers/tty/serial/max310x.c | 2 +
drivers/tty/serial/sh-sci.c | 12 +---
drivers/usb/common/common.c | 2 +
drivers/usb/core/config.c | 9 +--
drivers/usb/gadget/function/f_hid.c | 6 +-
drivers/usb/host/xhci-rcar.c | 1 +
drivers/usb/host/xhci-ring.c | 9 ++-
drivers/usb/host/xhci.h | 1 +
drivers/usb/serial/cp210x.c | 1 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 4 +-
drivers/usb/serial/mos7720.c | 4 +-
drivers/usb/serial/option.c | 13 ++--
drivers/video/fbdev/goldfishfb.c | 2 +-
fs/btrfs/raid56.c | 3 +-
fs/btrfs/tree-log.c | 11 ++-
fs/nfs/nfs4proc.c | 3 +-
fs/open.c | 6 ++
fs/proc/proc_sysctl.c | 3 +-
include/net/sctp/checksum.h | 2 +-
include/net/sock.h | 6 ++
kernel/cpu.c | 20 +++++-
net/bluetooth/l2cap_core.c | 83 ++++++++++++++--------
net/dccp/ipv6.c | 4 +-
net/ipv6/tcp_ipv6.c | 8 +--
net/packet/af_packet.c | 4 +-
net/rose/rose_subr.c | 21 +++---
net/wireless/nl80211.c | 16 ++---
sound/core/oss/pcm_oss.c | 43 +++++------
sound/core/pcm_native.c | 9 ++-
sound/core/rawmidi.c | 2 +
sound/core/seq/oss/seq_oss_synth.c | 7 +-
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 20 +++---
virt/kvm/kvm_main.c | 3 +
64 files changed, 422 insertions(+), 252 deletions(-)
When handling re-add, we need to ensure that rdev->mddev->pers is not
NULL, which avoids a potential NULL pointer dereference in the
subsequent add_bound_rdev() call.
Fixes: a6da4ef85cef ("md: re-add a failed disk")
Cc: Xiao Ni <xni(a)redhat.com>
Cc: NeilBrown <neilb(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yufen Yu <yuyufen(a)huawei.com>
---
drivers/md/md.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 875b29ba5926..66b6bdf9f364 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2859,8 +2859,10 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
err = 0;
}
} else if (cmd_match(buf, "re-add")) {
- if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) &&
- rdev->saved_raid_disk >= 0) {
+ if (!rdev->mddev->pers)
+ err = -EINVAL;
+ else if (test_bit(Faulty, &rdev->flags) && (rdev->raid_disk == -1) &&
+ rdev->saved_raid_disk >= 0) {
/* clear_bit is performed _after_ all the devices
* have their local Faulty bit cleared. If any writes
* happen in the meantime in the local node, they
--
2.16.2.dirty
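The ordering of the checks in the md patch above can be illustrated with
a small user-space model. This is only a sketch under stated
assumptions: the *_model types and functions are hypothetical and are
not the md driver's real structures, and the assumption that
add_bound_rdev() reaches through mddev->pers is taken from the commit
message above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models; the real md structures are far larger. */
struct personality_model {
	int (*hot_add_disk)(void);
};

struct mddev_model {
	const struct personality_model *pers;	/* NULL in the case the patch guards against */
};

struct rdev_model {
	struct mddev_model *mddev;
	bool faulty;
	int raid_disk;
	int saved_raid_disk;
};

/* Stand-in for add_bound_rdev(): dereferences pers unconditionally. */
static int add_bound_rdev_model(struct rdev_model *rdev)
{
	return rdev->mddev->pers->hot_add_disk();
}

/* Stand-in for the "re-add" branch of state_store() after the patch. */
static int re_add_model(struct rdev_model *rdev)
{
	assert(rdev && rdev->mddev);

	/* New check: refuse before anything dereferences pers. */
	if (!rdev->mddev->pers)
		return -EINVAL;
	if (rdev->faulty && rdev->raid_disk == -1 &&
	    rdev->saved_raid_disk >= 0)
		return add_bound_rdev_model(rdev);
	return -EINVAL;
}

/* Trivial callback used to model a personality that accepts the disk. */
static int hot_add_ok(void)
{
	return 0;
}
```

Without the first check, a device that is Faulty with raid_disk == -1
and a valid saved_raid_disk would reach add_bound_rdev_model() even when
pers is NULL.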
Since commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed
as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the
atomic. Stat readers only check the atomic.
Thus readers such as balance_dirty_pages() may see a nontrivial error
margin: 32 pages per cpu.
Assuming 100 cpus:
4k x86 page_size: 13 MiB error per memcg
64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions
the errors double.
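The figures above follow from nr_cpus * 32 pages * page_size. A small
sketch of that arithmetic; MEMCG_BATCH_PAGES, memcg_stat_error_bytes()
and bytes_to_mib() are names made up for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Per-cpu values stay within [-32..32], as described above. */
#define MEMCG_BATCH_PAGES 32

/* Worst-case bytes by which an atomic-only read of one counter can be off. */
static uint64_t memcg_stat_error_bytes(unsigned int nr_cpus, uint64_t page_size)
{
	assert(page_size > 0);
	return (uint64_t)nr_cpus * MEMCG_BATCH_PAGES * page_size;
}

/* Convert bytes to MiB for comparison with the figures in the text. */
static double bytes_to_mib(uint64_t bytes)
{
	return (double)bytes / (1024.0 * 1024.0);
}
```

For 100 cpus this gives 12.5 MiB with 4 KiB pages (rounded to 13 MiB
above) and 200 MiB with 64 KiB pages, per counter.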
This inaccuracy can lead to undeserved oom kills. One nasty case is
when all per-cpu counters hold positive values offsetting an atomic
negative value (i.e. per_cpu[*]=32, atomic=n_cpu*-32).
balance_dirty_pages() only consults the atomic and does not consider
throttling the next n_cpu*32 dirty pages. If the file_lru is in the
13..200 MiB range then there's absolutely no dirty throttling, which
burdens vmscan with only dirty+writeback pages thus resorting to oom
kill.
It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount of space available for file lru that matters.
If a container has memory.max-200MiB of non-reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids
oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c
/*
 * Dirty pages from all but one cpu.
 * Clean pages from the non dirtying cpu.
 * This is to stress per cpu counter imbalance.
 * On a 100 cpu machine:
 * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
 * - per memcg atomic is -99*32 pages
 * - thus the complete dirty count (sum of all counters) is 0
 * - balance_dirty_pages() only sees atomic count -99*32 pages, which
 *   it max()s to 0.
 * - So a workload can dirty 99*32 pages before balance_dirty_pages()
 *   cares.
 */
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>

static char *buf;
static int bufSize;

static void set_affinity(int cpu)
{
	cpu_set_t affinity;

	CPU_ZERO(&affinity);
	CPU_SET(cpu, &affinity);
	if (sched_setaffinity(0, sizeof(affinity), &affinity))
		err(1, "sched_setaffinity");
}

static void dirty_on(int output_fd, int cpu)
{
	int i, wrote;

	set_affinity(cpu);
	for (i = 0; i < 32; i++) {
		for (wrote = 0; wrote < bufSize; ) {
			int ret = write(output_fd, buf+wrote, bufSize-wrote);
			if (ret == -1)
				err(1, "write");
			wrote += ret;
		}
	}
}

int main(int argc, char **argv)
{
	int cpu, flush_cpu = 1, output_fd;
	const char *output;

	if (argc != 2)
		errx(1, "usage: output_file");

	output = argv[1];
	bufSize = getpagesize();
	buf = malloc(getpagesize());
	if (buf == NULL)
		errx(1, "malloc failed");

	/* O_CREAT requires an explicit mode argument. */
	output_fd = open(output, O_CREAT|O_RDWR, 0644);
	if (output_fd == -1)
		err(1, "open(%s)", output);

	for (cpu = 0; cpu < get_nprocs(); cpu++) {
		if (cpu != flush_cpu)
			dirty_on(output_fd, cpu);
	}

	set_affinity(flush_cpu);
	if (fsync(output_fd))
		err(1, "fsync(%s)", output);
	if (close(output_fd))
		err(1, "close(%s)", output);
	free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to
collect exact per memcg counters. This avoids the aforementioned oom
kills.
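The difference between the atomic-only read and the exact read can be
modelled in user space. The following single-threaded sketch is not the
kernel code: struct stat_model, NR_CPUS_MODEL and the function names are
hypothetical, and the flush rule simply mirrors the [-32..32]
description above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 100	/* assumed machine size, as in the example above */
#define BATCH_MODEL 32		/* per-cpu values stay within [-32..32] */

/* Hypothetical model of one memcg stat item. */
struct stat_model {
	long atomic;			/* flushed, globally visible part */
	long percpu[NR_CPUS_MODEL];	/* unflushed per-cpu deltas */
};

/* Add pages on one cpu; flush to the atomic when the delta leaves the range. */
static void stat_model_add(struct stat_model *s, size_t cpu, long pages)
{
	long x;

	assert(cpu < NR_CPUS_MODEL);
	x = s->percpu[cpu] + pages;
	if (x > BATCH_MODEL || x < -BATCH_MODEL) {
		s->atomic += x;
		x = 0;
	}
	s->percpu[cpu] = x;
}

/* What a reader of only the atomic sees, clamped at zero. */
static unsigned long approx_page_state(const struct stat_model *s)
{
	return s->atomic < 0 ? 0 : (unsigned long)s->atomic;
}

/* Atomic plus every per-cpu delta, clamped at zero. */
static unsigned long exact_page_state(const struct stat_model *s)
{
	long x = s->atomic;
	size_t cpu;

	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		x += s->percpu[cpu];
	return x < 0 ? 0 : (unsigned long)x;
}
```

In this model, dirtying 32 pages on each of the 100 cpus leaves the
atomic-only reader at 0 pages while the exact reader reports 3200; that
gap is what the exact read closes.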
This does not affect the overhead of memory.stat, which still reads the
single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline,
so no need for that overhead from percpu_counter. And the
percpu_counter spinlocks are more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters
in memcg oom reports. But that is saved for later.
Cc: stable(a)vger.kernel.org # v4.16+
Signed-off-by: Greg Thelen <gthelen(a)google.com>
---
Changelog since v1:
- Move memcg_exact_page_state() into memcontrol.c.
- Unconditionally gather exact (per cpu) counters in mem_cgroup_wb_stats(); it's
not called in performance sensitive paths.
- Unconditionally check for underflow regardless of CONFIG_SMP. It's just
easier this way. This isn't performance sensitive.
- Add stable tag.
include/linux/memcontrol.h | 5 ++++-
mm/memcontrol.c | 20 ++++++++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 1f3d880b7ca1..dbb6118370c1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -566,7 +566,10 @@ struct mem_cgroup *lock_page_memcg(struct page *page);
void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page_state().
+ */
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
int idx)
{
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 532e0e2a4817..81a0d3914ec9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(struct bdi_writeback *wb)
return &memcg->cgwb_domain;
}
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_page_state().
+ */
+static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx)
+{
+ long x = atomic_long_read(&memcg->stat[idx]);
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx];
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
/**
* mem_cgroup_wb_stats - retrieve writeback related stats from its memcg
* @wb: bdi_writeback in question
@@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
+ *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */
- *pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
+ *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK);
*pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) |
(1 << LRU_ACTIVE_FILE));
*pheadroom = PAGE_COUNTER_MAX;
--
2.21.0.392.gf8f6787159e-goog
The patch titled
Subject: mm: writeback: use exact memcg dirty counts
has been added to the -mm tree. Its filename is
writeback-use-exact-memcg-dirty-counts.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/writeback-use-exact-memcg-dirty-co…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/writeback-use-exact-memcg-dirty-co…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Greg Thelen <gthelen(a)google.com>
Subject: mm: writeback: use exact memcg dirty counts
Since a983b5ebee57 ("mm: memcontrol: fix excessive complexity in
memory.stat reporting") memcg dirty and writeback counters are managed as:
1) per-memcg per-cpu values in range of [-32..32]
2) per-memcg atomic counter
When a per-cpu counter cannot fit in [-32..32] it's flushed to the atomic.
Stat readers only check the atomic. Thus readers such as
balance_dirty_pages() may see a nontrivial error margin: 32 pages per cpu.
Assuming 100 cpus:
4k x86 page_size: 13 MiB error per memcg
64k ppc page_size: 200 MiB error per memcg
Considering that dirty+writeback are used together for some decisions the
errors double.
This inaccuracy can lead to undeserved oom kills. One nasty case is when
all per-cpu counters hold positive values offsetting an atomic negative
value (i.e. per_cpu[*]=32, atomic=n_cpu*-32). balance_dirty_pages() only
consults the atomic and does not consider throttling the next n_cpu*32
dirty pages. If the file_lru is in the 13..200 MiB range then there's
absolutely no dirty throttling, which burdens vmscan with only
dirty+writeback pages thus resorting to oom kill.
It could be argued that tiny containers are not supported, but it's more
subtle. It's the amount of space available for file lru that matters.
If a container has memory.max-200MiB of non-reclaimable memory, then it
will also suffer such oom kills on a 100 cpu machine.
The following test reliably ooms without this patch. This patch avoids
oom kills.
$ cat test
mount -t cgroup2 none /dev/cgroup
cd /dev/cgroup
echo +io +memory > cgroup.subtree_control
mkdir test
cd test
echo 10M > memory.max
(echo $BASHPID > cgroup.procs && exec /memcg-writeback-stress /foo)
(echo $BASHPID > cgroup.procs && exec dd if=/dev/zero of=/foo bs=2M count=100)
$ cat memcg-writeback-stress.c
/*
 * Dirty pages from all but one cpu.
 * Clean pages from the non dirtying cpu.
 * This is to stress per cpu counter imbalance.
 * On a 100 cpu machine:
 * - per memcg per cpu dirty count is 32 pages for each of 99 cpus
 * - per memcg atomic is -99*32 pages
 * - thus the complete dirty count (sum of all counters) is 0
 * - balance_dirty_pages() only sees atomic count -99*32 pages, which
 *   it max()s to 0.
 * - So a workload can dirty 99*32 pages before balance_dirty_pages()
 *   cares.
 */
#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysinfo.h>
#include <sys/types.h>
#include <unistd.h>

static char *buf;
static int bufSize;

static void set_affinity(int cpu)
{
	cpu_set_t affinity;

	CPU_ZERO(&affinity);
	CPU_SET(cpu, &affinity);
	if (sched_setaffinity(0, sizeof(affinity), &affinity))
		err(1, "sched_setaffinity");
}

static void dirty_on(int output_fd, int cpu)
{
	int i, wrote;

	set_affinity(cpu);
	for (i = 0; i < 32; i++) {
		for (wrote = 0; wrote < bufSize; ) {
			int ret = write(output_fd, buf+wrote, bufSize-wrote);
			if (ret == -1)
				err(1, "write");
			wrote += ret;
		}
	}
}

int main(int argc, char **argv)
{
	int cpu, flush_cpu = 1, output_fd;
	const char *output;

	if (argc != 2)
		errx(1, "usage: output_file");

	output = argv[1];
	bufSize = getpagesize();
	buf = malloc(getpagesize());
	if (buf == NULL)
		errx(1, "malloc failed");

	/* O_CREAT requires an explicit mode argument. */
	output_fd = open(output, O_CREAT|O_RDWR, 0644);
	if (output_fd == -1)
		err(1, "open(%s)", output);

	for (cpu = 0; cpu < get_nprocs(); cpu++) {
		if (cpu != flush_cpu)
			dirty_on(output_fd, cpu);
	}

	set_affinity(flush_cpu);
	if (fsync(output_fd))
		err(1, "fsync(%s)", output);
	if (close(output_fd))
		err(1, "close(%s)", output);
	free(buf);
}
Make balance_dirty_pages() and wb_over_bg_thresh() work harder to collect
exact per memcg counters. This avoids the aforementioned oom kills.
This does not affect the overhead of memory.stat, which still reads the
single atomic counter.
Why not use percpu_counter? memcg already handles cpus going offline, so
there is no need for that overhead from percpu_counter. The percpu_counter
spinlocks are also more heavyweight than is required.
It probably also makes sense to use exact dirty and writeback counters in
memcg oom reports. But that is saved for later.
Link: http://lkml.kernel.org/r/20190329174609.164344-1-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 5 ++++-
mm/memcontrol.c | 20 ++++++++++++++++++--
2 files changed, 22 insertions(+), 3 deletions(-)
--- a/include/linux/memcontrol.h~writeback-use-exact-memcg-dirty-counts
+++ a/include/linux/memcontrol.h
@@ -566,7 +566,10 @@ struct mem_cgroup *lock_page_memcg(struc
void __unlock_page_memcg(struct mem_cgroup *memcg);
void unlock_page_memcg(struct page *page);
-/* idx can be of type enum memcg_stat_item or node_stat_item */
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_exact_page_state().
+ */
static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
int idx)
{
--- a/mm/memcontrol.c~writeback-use-exact-memcg-dirty-counts
+++ a/mm/memcontrol.c
@@ -3882,6 +3882,22 @@ struct wb_domain *mem_cgroup_wb_domain(s
return &memcg->cgwb_domain;
}
+/*
+ * idx can be of type enum memcg_stat_item or node_stat_item.
+ * Keep in sync with memcg_page_state().
+ */
+static unsigned long memcg_exact_page_state(struct mem_cgroup *memcg, int idx)
+{
+ long x = atomic_long_read(&memcg->stat[idx]);
+ int cpu;
+
+ for_each_online_cpu(cpu)
+ x += per_cpu_ptr(memcg->stat_cpu, cpu)->count[idx];
+ if (x < 0)
+ x = 0;
+ return x;
+}
+
/**
* mem_cgroup_wb_stats - retrieve writeback related stats from its memcg
* @wb: bdi_writeback in question
@@ -3907,10 +3923,10 @@ void mem_cgroup_wb_stats(struct bdi_writ
struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
struct mem_cgroup *parent;
- *pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
+ *pdirty = memcg_exact_page_state(memcg, NR_FILE_DIRTY);
/* this should eventually include NR_UNSTABLE_NFS */
- *pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
+ *pwriteback = memcg_exact_page_state(memcg, NR_WRITEBACK);
*pfilepages = mem_cgroup_nr_lru_pages(memcg, (1 << LRU_INACTIVE_FILE) |
(1 << LRU_ACTIVE_FILE));
*pheadroom = PAGE_COUNTER_MAX;
_
Patches currently in -mm which might be from gthelen(a)google.com are
writeback-use-exact-memcg-dirty-counts.patch
The patch titled
Subject: mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
has been added to the -mm tree. Its filename is
mm-fix-modifying-of-page-protection-by-insert_pfn_pmd.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-fix-modifying-of-page-protectio…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-modifying-of-page-protectio…
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()
On some architectures, such as ppc64, set_pmd_at() cannot cope with a
situation where there is already some (different) valid entry present.
Use pmdp_set_access_flags() instead to update an entry that is already
present; it is built to deal with modifying existing PMD entries.
This is similar to commit cae85cb8add3 ("mm/memory.c: fix modifying of
page protection by insert_pfn()").
We also make a similar update to insert_pfn_pud(), even though ppc64 does
not support PUD pfn entries at present.
Without this patch we also see the following message in the kernel log:
"BUG: non-zero pgtables_bytes on freeing mm:"
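The control flow this introduces can be sketched as a small userspace
model. This is an illustration under stated assumptions, not the kernel
implementation: struct toy_pmd, enum toy_result and toy_insert_pfn() are
hypothetical names, locking is only indicated by comments, and the
VM_WRITE check done by maybe_pmd_mkwrite() is ignored. An existing entry
is only ever upgraded in place (and only for a write to the same pfn), a
fresh entry is installed otherwise, and a preallocated page table that was
not deposited is freed before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for one PMD slot; not a kernel type. */
struct toy_pmd {
	bool present;
	unsigned long pfn;
	bool writable, dirty, young;
};

enum toy_result { TOY_INSERTED, TOY_UPGRADED, TOY_LEFT_ALONE };

/*
 * pgtable models the preallocated page table. It is either deposited
 * (ownership moves to *deposited) or freed on the way out.
 */
static enum toy_result toy_insert_pfn(struct toy_pmd *slot, unsigned long pfn,
				      bool write, void *pgtable,
				      void **deposited)
{
	enum toy_result res = TOY_LEFT_ALONE;

	assert(slot && deposited);
	/* The kernel takes the PMD lock here. */
	if (slot->present) {
		/* Never overwrite a valid entry; only upgrade its flags. */
		if (write && slot->pfn == pfn) {
			slot->young = slot->dirty = slot->writable = true;
			res = TOY_UPGRADED;
		}
		goto out_unlock;
	}

	/* Empty slot: install a fresh entry. */
	slot->present = true;
	slot->pfn = pfn;
	slot->young = slot->dirty = slot->writable = write;
	if (pgtable) {
		*deposited = pgtable;
		pgtable = NULL;	/* consumed, must not be freed below */
	}
	res = TOY_INSERTED;

out_unlock:
	/* The kernel drops the PMD lock here. */
	free(pgtable);	/* free(NULL) is a no-op */
	return res;
}
```

In this model a read fault followed by a write fault on the same pfn
yields TOY_INSERTED and then TOY_UPGRADED, while a write that names a
different pfn leaves the existing entry untouched.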
Link: http://lkml.kernel.org/r/20190402115125.18803-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Chandan Rajendra <chandan(a)linux.ibm.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)
--- a/mm/huge_memory.c~mm-fix-modifying-of-page-protection-by-insert_pfn_pmd
+++ a/mm/huge_memory.c
@@ -755,6 +755,21 @@ static void insert_pfn_pmd(struct vm_are
spinlock_t *ptl;
ptl = pmd_lock(mm, pmd);
+ if (!pmd_none(*pmd)) {
+ if (write) {
+ if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
+ goto out_unlock;
+ }
+ entry = pmd_mkyoung(*pmd);
+ entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
+ if (pmdp_set_access_flags(vma, addr, pmd, entry, 1))
+ update_mmu_cache_pmd(vma, addr, pmd);
+ }
+
+ goto out_unlock;
+ }
+
entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pmd_mkdevmap(entry);
@@ -766,11 +781,16 @@ static void insert_pfn_pmd(struct vm_are
if (pgtable) {
pgtable_trans_huge_deposit(mm, pmd, pgtable);
mm_inc_nr_ptes(mm);
+ pgtable = NULL;
}
set_pmd_at(mm, addr, pmd, entry);
update_mmu_cache_pmd(vma, addr, pmd);
+
+out_unlock:
spin_unlock(ptl);
+ if (pgtable)
+ pte_free(mm, pgtable);
}
vm_fault_t vmf_insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
@@ -821,6 +841,20 @@ static void insert_pfn_pud(struct vm_are
spinlock_t *ptl;
ptl = pud_lock(mm, pud);
+ if (!pud_none(*pud)) {
+ if (write) {
+ if (pud_pfn(*pud) != pfn_t_to_pfn(pfn)) {
+ WARN_ON_ONCE(!is_huge_zero_pud(*pud));
+ goto out_unlock;
+ }
+ entry = pud_mkyoung(*pud);
+ entry = maybe_pud_mkwrite(pud_mkdirty(entry), vma);
+ if (pudp_set_access_flags(vma, addr, pud, entry, 1))
+ update_mmu_cache_pud(vma, addr, pud);
+ }
+ goto out_unlock;
+ }
+
entry = pud_mkhuge(pfn_t_pud(pfn, prot));
if (pfn_t_devmap(pfn))
entry = pud_mkdevmap(entry);
@@ -830,6 +864,8 @@ static void insert_pfn_pud(struct vm_are
}
set_pud_at(mm, addr, pud, entry);
update_mmu_cache_pud(vma, addr, pud);
+
+out_unlock:
spin_unlock(ptl);
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-fix-modifying-of-page-protection-by-insert_pfn_pmd.patch
mm-page_mkclean-vs-madv_dontneed-race.patch