I'm announcing the release of the 4.19.64 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/include/asm/compat.h | 1
arch/sh/boards/Kconfig | 14 -
block/blk-core.c | 35 +--
block/blk-mq-debugfs.c | 10
drivers/android/binder.c | 16 +
drivers/bluetooth/hci_ath.c | 3
drivers/bluetooth/hci_bcm.c | 3
drivers/bluetooth/hci_intel.c | 3
drivers/bluetooth/hci_ldisc.c | 13 +
drivers/bluetooth/hci_mrvl.c | 3
drivers/bluetooth/hci_qca.c | 3
drivers/bluetooth/hci_uart.h | 1
drivers/iommu/intel-iommu.c | 2
drivers/iommu/iova.c | 18 +
drivers/isdn/hardware/mISDN/hfcsusb.c | 3
drivers/media/radio/radio-raremono.c | 30 ++
drivers/media/usb/au0828/au0828-core.c | 12 -
drivers/media/usb/cpia2/cpia2_usb.c | 3
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2
drivers/net/wireless/ath/ath10k/usb.c | 2
drivers/pps/pps.c | 8
drivers/scsi/scsi_lib.c | 15 -
drivers/usb/dwc2/gadget.c | 41 ++-
drivers/vhost/net.c | 41 +--
drivers/vhost/scsi.c | 15 +
drivers/vhost/vhost.c | 20 +
drivers/vhost/vhost.h | 5
drivers/vhost/vsock.c | 28 +-
fs/ceph/caps.c | 7
fs/exec.c | 2
fs/nfs/client.c | 4
fs/nfs/dir.c | 295 ++++++++++++++-------------
fs/nfs/nfs4proc.c | 15 +
fs/proc/base.c | 132 ++++++------
include/linux/blkdev.h | 14 -
include/linux/iova.h | 6
include/linux/sched.h | 10
include/linux/sched/numa_balancing.h | 4
kernel/fork.c | 2
kernel/sched/fair.c | 144 +++++++++----
net/ipv4/ip_tunnel_core.c | 9
net/vmw_vsock/af_vsock.c | 38 ---
net/vmw_vsock/hyperv_transport.c | 108 +++++++--
46 files changed, 724 insertions(+), 428 deletions(-)
Andrey Konovalov (1):
media: pvrusb2: use a different format for warnings
Bart Van Assche (2):
block, scsi: Change the preempt-only flag into a counter
scsi: core: Avoid that a kernel warning appears during system resume
Benjamin Coddington (1):
NFS: Cleanup if nfs_match_client is interrupted
Dmitry Safonov (1):
iommu/vt-d: Don't queue_iova() if there is no flush queue
Fabio Estevam (1):
ath10k: Change the warning message string
Greg Kroah-Hartman (1):
Linux 4.19.64
Jann Horn (2):
sched/fair: Don't free p->numa_faults with concurrent readers
sched/fair: Use RCU accessors consistently for ->numa_group
Jason Wang (4):
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
Joerg Roedel (1):
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Linus Torvalds (2):
/proc/<pid>/cmdline: remove all the special cases
/proc/<pid>/cmdline: add back the setproctitle() special case
Luke Nowakowski-Krijger (1):
media: radio-raremono: change devm_k*alloc to k*alloc
Minas Harutyunyan (2):
usb: dwc2: Disable all EP's on disconnect
usb: dwc2: Fix disable all EP's on disconnect
Miroslav Lichvar (1):
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Oliver Neukum (1):
media: cpia2_usb: first wake up, then free in disconnect
Phong Tran (1):
ISDN: hfcsusb: checking idx of ep configuration
Sean Young (1):
media: au0828: fix null dereference in error path
Sunil Muthuswamy (2):
hv_sock: Add support for delayed close
vsock: correct removal of socket from the list
Todd Kjos (1):
binder: fix possible UAF when freeing buffer
Trond Myklebust (3):
NFS: Fix dentry revalidation on NFSv4 lookup
NFS: Refactor nfs_lookup_revalidate()
NFSv4: Fix lookup revalidate of regular files
Vladis Dronov (1):
Bluetooth: hci_uart: check for missing tty operations
Will Deacon (1):
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
Xin Long (1):
ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
Yan, Zheng (1):
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato (1):
Fix allyesconfig output.
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: don't drop front-end reference count for ->detach
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Wed Jun 19 10:24:17 2019 -0300
A bugfix introduce a link failure in configurations without CONFIG_MODULES:
In file included from drivers/media/usb/dvb-usb/pctv452e.c:20:0:
drivers/media/usb/dvb-usb/pctv452e.c: In function 'pctv452e_frontend_attach':
drivers/media/dvb-frontends/stb0899_drv.h:151:36: error: weak declaration of 'stb0899_attach' being applied to a already existing, static definition
The problem is that the !IS_REACHABLE() declaration of stb0899_attach()
is a 'static inline' definition that clashes with the weak definition.
I further observed that the bugfix was only done for one of the five users
of stb0899_attach(), the other four still have the problem. This reverts
the bugfix and instead addresses the problem by not dropping the reference
count when calling '->detach()', instead we call this function directly
in dvb_frontend_put() before dropping the kref on the front-end.
I first submitted this in early 2018, and after some discussion it
was apparently discarded. While there is a long-term plan in place,
that plan is obviously not nearing completion yet, and the current
kernel is still broken unless this patch is applied.
Link: https://patchwork.kernel.org/patch/10140175/
Link: https://patchwork.linuxtv.org/patch/54831/
Cc: Max Kellermann <max.kellermann(a)gmail.com>
Cc: Wolfgang Rohdewald <wolfgang(a)rohdewald.de>
Cc: stable(a)vger.kernel.org
Fixes: f686c14364ad ("[media] stb0899: move code to "detach" callback")
Fixes: 6cdeaed3b142 ("media: dvb_usb_pctv452e: module refcount changes were unbalanced")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/dvb-core/dvb_frontend.c | 4 +++-
drivers/media/usb/dvb-usb/pctv452e.c | 8 --------
2 files changed, 3 insertions(+), 9 deletions(-)
---
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index 209186c5cd9b..06ea30a689d7 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -152,6 +152,9 @@ static void dvb_frontend_free(struct kref *ref)
static void dvb_frontend_put(struct dvb_frontend *fe)
{
+ /* call detach before dropping the reference count */
+ if (fe->ops.detach)
+ fe->ops.detach(fe);
/*
* Check if the frontend was registered, as otherwise
* kref was not initialized yet.
@@ -3040,7 +3043,6 @@ void dvb_frontend_detach(struct dvb_frontend *fe)
dvb_frontend_invoke_release(fe, fe->ops.release_sec);
dvb_frontend_invoke_release(fe, fe->ops.tuner_ops.release);
dvb_frontend_invoke_release(fe, fe->ops.analog_ops.release);
- dvb_frontend_invoke_release(fe, fe->ops.detach);
dvb_frontend_put(fe);
}
EXPORT_SYMBOL(dvb_frontend_detach);
diff --git a/drivers/media/usb/dvb-usb/pctv452e.c b/drivers/media/usb/dvb-usb/pctv452e.c
index d6b36e4f33d2..441d878fc22c 100644
--- a/drivers/media/usb/dvb-usb/pctv452e.c
+++ b/drivers/media/usb/dvb-usb/pctv452e.c
@@ -909,14 +909,6 @@ static int pctv452e_frontend_attach(struct dvb_usb_adapter *a)
&a->dev->i2c_adap);
if (!a->fe_adap[0].fe)
return -ENODEV;
-
- /*
- * dvb_frontend will call dvb_detach for both stb0899_detach
- * and stb0899_release but we only do dvb_attach(stb0899_attach).
- * Increment the module refcount instead.
- */
- symbol_get(stb0899_attach);
-
if ((dvb_attach(lnbp22_attach, a->fe_adap[0].fe,
&a->dev->i2c_adap)) == NULL)
err("Cannot attach lnbp22\n");
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: e23008ec81ef NFSv4 reduce attribute requests for open reclaim.
The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135, v4.9.186, v4.4.186.
v5.2.5: Build OK!
v4.19.63: Failed to apply! Possible dependencies:
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
v4.14.135: Failed to apply! Possible dependencies:
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
v4.9.186: Failed to apply! Possible dependencies:
4e2fcac77390 ("NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state()")
75e8c48b9ef3 ("NFSv4: Use the nfs4_state being recovered in _nfs4_opendata_to_nfs4_state()")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
v4.4.186: Failed to apply! Possible dependencies:
1393d9612ba0 ("NFSv4: Fix a race when updating an open_stateid")
4586f6e28327 ("NFSv4.1: Add a helper function to deal with expired stateids")
4e2fcac77390 ("NFSv4: Use correct inode in _nfs4_opendata_to_nfs4_state()")
75e8c48b9ef3 ("NFSv4: Use the nfs4_state being recovered in _nfs4_opendata_to_nfs4_state()")
8a64c4ef106d ("NFSv4.1: Even if the stateid is OK, we may need to recover the open modes")
a8ce377a5db8 ("nfs: track whether server sets MAY_NOTIFY_LOCK flag")
ace9fad43aa6 ("NFSv4: Convert struct nfs4_state to use refcount_t")
b134fc4a5333 ("NFSv4: Don't test open_stateid unless it is set")
c9399f21c215 ("NFSv4: Fix OPEN / CLOSE race")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 24311f884189 NFSv4: Recovery of recalled read delegations is broken.
The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135, v4.9.186, v4.4.186.
v5.2.5: Build OK!
v4.19.63: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.14.135: Failed to apply! Possible dependencies:
07d02a67b7fa ("SUNRPC: Simplify lookup code")
12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID.")
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
79b181810285 ("SUNRPC: Convert auth creds to use refcount_t")
95cd623250ad ("SUNRPC: Clean up the AUTH cache code")
97f68c6b02e0 ("SUNRPC: add 'struct cred *' to auth_cred and rpc_cred")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
fc0664fd9bcc ("SUNRPC: remove groupinfo from struct auth_cred.")
v4.9.186: Failed to apply! Possible dependencies:
1eb5d98f16f6 ("nfs: convert to new i_version API")
35156bfff3c0 ("NFSv4: Fix the nfs_inode_set_delegation() arguments")
39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution")
7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally")
9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros")
v4.4.186: Failed to apply! Possible dependencies:
0654cc726fc6 ("NFSv4.1/pNFS: Add a helper to mark the layout as returned")
10335556c9e6 ("NFSv4.1/pNFS: pnfs_error_mark_layout_for_return() must always return layout")
13c13a6ad71f ("pNFS: Fix missing layoutreturn calls")
2454dfea0aef ("NFSv4.x/pnfs: Fix a race between layoutget and pnfs_destroy_layout")
3982a6a2d0e6 ("pnfs: keep track of the return sequence number in pnfs_layout_hdr")
4b0934baf931 ("NFSv4.1/pNFS: Fix a race in initiate_file_draining()")
506c0d68269e ("NFSv4.1/pNFS: Cleanup constify struct pnfs_layout_range arguments")
50f563ef5d41 ("NFSv4.1/pNFS: Use nfs4_stateid_copy for copying stateids")
5c97f5de2c7c ("NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return() should set the iomode")
68d264cf02b0 ("NFS42: handle layoutstats stateid error")
6d597e175012 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
71b39854a500 ("NFSv4.1/pNFS: Cleanup pnfs_mark_matching_lsegs_invalid()")
9fd4b9fc7695 ("NFSv4.x/pnfs: Fix a race between layoutget and bulk recalls")
a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
b20135d0b243 ("NFSv4.1/pNFS: Don't queue up a new commit if the layout segment is invalid")
b3dce6a2f060 ("pnfs/blocklayout: handle transient devices")
e036f46453f2 ("NFS: pnfs_mark_matching_lsegs_return() should match the layout sequence id")
e0d9243048fd ("NFSv4.1/pNFS: Don't return NFS4ERR_DELAY unnecessarily in CB_LAYOUTRECALL")
ed429d6b934d ("NFSv4.1/pNFS: Don't pass stateids by value to pnfs_send_layoutreturn()")
fc7ff36747b9 ("pNFS: If we have to delay the layout callback, mark the layout for return")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
This is the start of the stable review cycle for the 4.14.136 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.136-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.136-rc1
Yan, Zheng <zyan(a)redhat.com>
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato <ysato(a)users.sourceforge.jp>
Fix allyesconfig output.
Miroslav Lichvar <mlichvar(a)redhat.com>
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Jann Horn <jannh(a)google.com>
sched/fair: Don't free p->numa_faults with concurrent readers
Vladis Dronov <vdronov(a)redhat.com>
Bluetooth: hci_uart: check for missing tty operations
Sunil Muthuswamy <sunilmut(a)microsoft.com>
hv_sock: Add support for delayed close
Joerg Roedel <jroedel(a)suse.de>
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
Dmitry Safonov <dima(a)arista.com>
iommu/vt-d: Don't queue_iova() if there is no flush queue
Luke Nowakowski-Krijger <lnowakow(a)eng.ucsd.edu>
media: radio-raremono: change devm_k*alloc to k*alloc
Benjamin Coddington <bcodding(a)redhat.com>
NFS: Cleanup if nfs_match_client is interrupted
Andrey Konovalov <andreyknvl(a)google.com>
media: pvrusb2: use a different format for warnings
Oliver Neukum <oneukum(a)suse.com>
media: cpia2_usb: first wake up, then free in disconnect
Fabio Estevam <festevam(a)gmail.com>
ath10k: Change the warning message string
Sean Young <sean(a)mess.org>
media: au0828: fix null dereference in error path
Phong Tran <tranmanphong(a)gmail.com>
ISDN: hfcsusb: checking idx of ep configuration
Todd Kjos <tkjos(a)android.com>
binder: fix possible UAF when freeing buffer
Will Deacon <will.deacon(a)arm.com>
arm64: compat: Provide definition for COMPAT_SIGMINSTKSZ
Abhishek Sahu <absahu(a)codeaurora.org>
i2c: qup: fixed releasing dma without flush operation completion
allen yan <yanwei(a)marvell.com>
arm64: dts: marvell: Fix A37xx UART0 register size
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix lookup revalidate of regular files
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Refactor nfs_lookup_revalidate()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Fix dentry revalidation on NFSv4 lookup
Sunil Muthuswamy <sunilmut(a)microsoft.com>
vsock: correct removal of socket from the list
Stefan Hajnoczi <stefanha(a)redhat.com>
VSOCK: use TCP state constants for sk_state
-------------
Diffstat:
.../devicetree/bindings/serial/mvebu-uart.txt | 2 +-
Makefile | 4 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
arch/arm64/include/asm/compat.h | 1 +
arch/sh/boards/Kconfig | 14 +-
drivers/android/binder.c | 16 +-
drivers/bluetooth/hci_ath.c | 3 +
drivers/bluetooth/hci_bcm.c | 3 +
drivers/bluetooth/hci_intel.c | 3 +
drivers/bluetooth/hci_ldisc.c | 13 +
drivers/bluetooth/hci_mrvl.c | 3 +
drivers/bluetooth/hci_uart.h | 1 +
drivers/i2c/busses/i2c-qup.c | 2 +
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/iova.c | 18 +-
drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +
drivers/media/radio/radio-raremono.c | 30 ++-
drivers/media/usb/au0828/au0828-core.c | 12 +-
drivers/media/usb/cpia2/cpia2_usb.c | 3 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +-
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2 +-
drivers/net/wireless/ath/ath10k/usb.c | 2 +-
drivers/pps/pps.c | 8 +
fs/ceph/caps.c | 7 +-
fs/exec.c | 2 +-
fs/nfs/client.c | 4 +-
fs/nfs/dir.c | 295 +++++++++++----------
fs/nfs/nfs4proc.c | 15 +-
include/linux/iova.h | 6 +
include/linux/sched/numa_balancing.h | 4 +-
include/net/af_vsock.h | 3 -
kernel/fork.c | 2 +-
kernel/sched/fair.c | 24 +-
net/vmw_vsock/af_vsock.c | 84 +++---
net/vmw_vsock/hyperv_transport.c | 118 ++++++---
net/vmw_vsock/virtio_transport.c | 2 +-
net/vmw_vsock/virtio_transport_common.c | 22 +-
net/vmw_vsock/vmci_transport.c | 34 +--
net/vmw_vsock/vmci_transport_notify.c | 2 +-
net/vmw_vsock/vmci_transport_notify_qstate.c | 2 +-
41 files changed, 472 insertions(+), 311 deletions(-)
This is the start of the stable review cycle for the 4.14.132 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu 04 Jul 2019 07:59:45 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.132-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.132-rc1
Xin Long <lucien.xin(a)gmail.com>
tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
Will Deacon <will.deacon(a)arm.com>
futex: Update comments and docs about return values of arch futex code
Daniel Borkmann <daniel(a)iogearbox.net>
bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd
Will Deacon <will.deacon(a)arm.com>
arm64: futex: Avoid copying out uninitialised stack in failed cmpxchg()
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: ipv6: Avoid running reuseport's bpf_prog from __udp6_lib_err
Martin KaFai Lau <kafai(a)fb.com>
bpf: udp: Avoid calling reuseport's bpf_prog from udp_gro
YueHaibing <yuehaibing(a)huawei.com>
bonding: Always enable vlan tx offload
YueHaibing <yuehaibing(a)huawei.com>
team: Always enable vlan tx offload
Fei Li <lifei.shirley(a)bytedance.com>
tun: wake up waitqueues after IFF_UP is set
Xin Long <lucien.xin(a)gmail.com>
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
Xin Long <lucien.xin(a)gmail.com>
tipc: change to use register_pernet_device
Xin Long <lucien.xin(a)gmail.com>
sctp: change to hold sk after auth shkey is created successfully
Roland Hii <roland.king.guan.hii(a)intel.com>
net: stmmac: fixed new system time seconds value calculation
JingYi Hou <houjingyi647(a)gmail.com>
net: remove duplicate fetch in sock_getsockopt
Eric Dumazet <edumazet(a)google.com>
net/packet: fix memory leak in packet_set_ring()
Stephen Suryaputra <ssuryaextr(a)gmail.com>
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
Neil Horman <nhorman(a)tuxdriver.com>
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
Wang Xin <xin.wang7(a)cn.bosch.com>
eeprom: at24: fix unexpected timeout under high load
Geert Uytterhoeven <geert(a)linux-m68k.org>
cpu/speculation: Warn on unsupported mitigations= parameter
Trond Myklebust <trondmy(a)gmail.com>
NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
Thomas Gleixner <tglx(a)linutronix.de>
x86/microcode: Fix the microcode load on CPU hotplug for real
Alejandro Jimenez <alejandro.j.jimenez(a)oracle.com>
x86/speculation: Allow guests to use SSBD even if host does not
Jan Kara <jack(a)suse.cz>
scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
zhangyi (F) <yi.zhang(a)huawei.com>
dm log writes: make sure super sector log updates are written in order
Colin Ian King <colin.king(a)canonical.com>
mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
Jann Horn <jannh(a)google.com>
fs/binfmt_flat.c: make load_flat_shared_library() work
zhong jiang <zhongjiang(a)huawei.com>
mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
John Ogness <john.ogness(a)linutronix.de>
fs/proc/array.c: allow reporting eip/esp for all coredumping threads
Sasha Levin <sashal(a)kernel.org>
Revert "compiler.h: update definition of unreachable()"
Kristian Evensen <kristian.evensen(a)gmail.com>
qmi_wwan: Fix out-of-bounds read
Adeodato Simó <dato(a)net.com.org.es>
net/9p: include trans_common.h to fix missing prototype warning.
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: p9dirent_read: check network-provided name length
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: remove useless check in cm_event_handler
Dominique Martinet <dominique.martinet(a)cea.fr>
9p: acl: fix uninitialized iattr access
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/rdma: do not disconnect on down_interruptible EAGAIN
Dominique Martinet <dominique.martinet(a)cea.fr>
9p/xen: fix check for xenbus_read error in front_probe
Martin Wilck <mwilck(a)suse.com>
block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs
Christoph Hellwig <hch(a)lst.de>
block: add a lower-level bio_add_page interface
Mike Marciniszyn <mike.marciniszyn(a)intel.com>
IB/hfi1: Close PSM sdma_progress sleep window
Sasha Levin <sashal(a)kernel.org>
Revert "x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP"
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf header: Fix unchecked usage of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf help: Remove needless use of strncpy()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
-------------
Diffstat:
Documentation/robust-futexes.txt | 3 +-
Makefile | 4 +-
arch/arm64/include/asm/futex.h | 4 +-
arch/arm64/include/asm/insn.h | 8 ++
arch/arm64/kernel/insn.c | 40 +++++++
arch/arm64/net/bpf_jit.h | 4 +
arch/arm64/net/bpf_jit_comp.c | 28 +++--
arch/x86/kernel/cpu/bugs.c | 11 +-
arch/x86/kernel/cpu/microcode/core.c | 15 ++-
block/bio.c | 131 +++++++++++++++------
drivers/infiniband/hw/hfi1/user_sdma.c | 12 +-
drivers/infiniband/hw/hfi1/user_sdma.h | 1 -
drivers/md/dm-log-writes.c | 23 +++-
drivers/misc/eeprom/at24.c | 107 ++++++++++++-----
drivers/net/bonding/bond_main.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 2 +-
drivers/net/team/team.c | 2 +-
drivers/net/tun.c | 19 ++-
drivers/net/usb/qmi_wwan.c | 4 +-
drivers/scsi/vmw_pvscsi.c | 6 +-
fs/9p/acl.c | 2 +-
fs/binfmt_flat.c | 23 ++--
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 2 +-
fs/proc/array.c | 2 +-
include/asm-generic/futex.h | 8 +-
include/linux/bio.h | 9 ++
include/linux/compiler.h | 5 +-
kernel/cpu.c | 3 +
kernel/trace/trace_branch.c | 4 -
mm/mempolicy.c | 2 +-
mm/page_idle.c | 4 +-
net/9p/protocol.c | 12 +-
net/9p/trans_common.c | 1 +
net/9p/trans_rdma.c | 7 +-
net/9p/trans_xen.c | 4 +-
net/core/sock.c | 3 -
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 6 +-
net/ipv6/udp.c | 4 +-
net/packet/af_packet.c | 23 +++-
net/packet/internal.h | 1 +
net/sctp/endpointola.c | 8 +-
net/tipc/core.c | 12 +-
net/tipc/netlink_compat.c | 18 ++-
net/tipc/udp_media.c | 8 +-
tools/perf/builtin-help.c | 2 +-
tools/perf/ui/tui/helpline.c | 2 +-
tools/perf/util/header.c | 2 +-
48 files changed, 418 insertions(+), 187 deletions(-)
From: Chris Down <chris(a)chrisdown.name>
Subject: cgroup: kselftest: relax fs_spec checks
On my laptop most memcg kselftests were being skipped because it claimed
cgroup v2 hierarchy wasn't mounted, but this isn't correct. Instead, it
seems current systemd HEAD mounts it with the name "cgroup2" instead of
"cgroup":
% grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.
After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.
Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.name
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/cgroup/cgroup_util.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/tools/testing/selftests/cgroup/cgroup_util.c~cgroup-kselftest-relax-fs_spec-checks
+++ a/tools/testing/selftests/cgroup/cgroup_util.c
@@ -191,8 +191,7 @@ int cg_find_unified_root(char *root, siz
strtok(NULL, delim);
strtok(NULL, delim);
- if (strcmp(fs, "cgroup") == 0 &&
- strcmp(type, "cgroup2") == 0) {
+ if (strcmp(type, "cgroup2") == 0) {
strncpy(root, mount, len);
return 0;
}
_
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending on
the combination with other configuration options and compiler variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file as
we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is mixed
with the gcc specific -fno-conserve-stack option. According to Andrey
Ryabinin, that option is not even needed, dropping it here fixes the
stackprotector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/Makefile~ubsan-build-ubsanc-more-conservatively
+++ a/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
_
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: compaction: avoid 100% CPU usage during compaction when a task is killed
"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.
stress -m 220 --vm-bytes 1000000000 --timeout 10
Tracing indicated a pattern as follows
stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting while
making no progress as a fatal signal is pending.
It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and isolating
a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).
This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.
This problem has existed for some time but it was made worse by
cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f0700c8f ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/compaction.c~mm-compaction-avoid-100%-cpu-usage-during-compaction-when-a-task-is-killed
+++ a/mm/compaction.c
@@ -842,13 +842,15 @@ isolate_migratepages_block(struct compac
/*
* Periodically drop the lock (if held) regardless of its
- * contention, to give chance to IRQs. Abort async compaction
- * if contended.
+ * contention, to give chance to IRQs. Abort completely if
+ * a fatal signal is pending.
*/
if (!(low_pfn % SWAP_CLUSTER_MAX)
&& compact_unlock_should_abort(&pgdat->lru_lock,
- flags, &locked, cc))
- break;
+ flags, &locked, cc)) {
+ low_pfn = 0;
+ goto fatal_pending;
+ }
if (!pfn_valid_within(low_pfn))
goto isolate_fail;
@@ -1060,6 +1062,7 @@ isolate_abort:
trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
nr_scanned, nr_isolated);
+fatal_pending:
cc->total_migrate_scanned += nr_scanned;
if (nr_isolated)
count_compact_events(COMPACTISOLATED, nr_isolated);
_
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: fix reference check race between __find_get_block() and migration
buffer_migrate_page_norefs() can race with bh users in the following way:
CPU1 CPU2
buffer_migrate_page_norefs()
buffer_migrate_lock_buffers()
checks bh refs
spin_unlock(&mapping->private_lock)
__find_get_block()
spin_lock(&mapping->private_lock)
grab bh ref
spin_unlock(&mapping->private_lock)
move page do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.
This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page. Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary. The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock. Between the page lock and
private_lock, it should be impossible for other references to be acquired
and updates to happen during the migration.
A user had reported data corruption issues on a distribution kernel with a
similar page migration implementation as mainline. The data corruption
could not be reproduced with this patch applied. A small number of
migration-intensive tests were run and no performance problems were noted.
[mgorman(a)techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888ca14 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-reference-check-race-between-__find_get_block-and-migration
+++ a/mm/migrate.c
@@ -767,12 +767,12 @@ recheck_buffers:
}
bh = bh->b_this_page;
} while (bh != head);
- spin_unlock(&mapping->private_lock);
if (busy) {
if (invalidated) {
rc = -EAGAIN;
goto unlock_buffers;
}
+ spin_unlock(&mapping->private_lock);
invalidate_bh_lrus();
invalidated = true;
goto recheck_buffers;
@@ -805,6 +805,8 @@ recheck_buffers:
rc = MIGRATEPAGE_SUCCESS;
unlock_buffers:
+ if (check_refs)
+ spin_unlock(&mapping->private_lock);
bh = head;
do {
unlock_buffer(bh);
_
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
Shakeel Butt reported premature oom on kernel with "cgroup_disable=memory"
since mem_cgroup_is_root() returns false even though memcg is actually
NULL. The drop_caches is also broken.
It is because aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls
in shrink_node()") removed the !memcg check before !mem_cgroup_is_root().
And, surprisingly root memcg is allocated even though memory cgroup is
disabled by kernel boot parameter.
Add mem_cgroup_disabled() check to make reclaimer work as expected.
Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.a…
Fixes: aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Jan Hadrava <had(a)kam.mff.cuni.cz>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker
+++ a/mm/vmscan.c
@@ -699,7 +699,14 @@ static unsigned long shrink_slab(gfp_t g
unsigned long ret, freed = 0;
struct shrinker *shrinker;
- if (!mem_cgroup_is_root(memcg))
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
_
Some devices are not supposed to support on-die ECC but experience
shows that internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" returns that it is not.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable(a)vger.kernel.org
Fixes: dbc44edbf833 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 1622d3145587..8ca9fad6e6ad 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
(chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
return MICRON_ON_DIE_UNSUPPORTED;
+ /*
+ * It seems that there are devices which do not support ECC officially.
+ * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
+ * enabling the ECC feature but don't reflect that to the READ_ID table.
+ * So we have to guarantee that we disable the ECC feature directly
+ * after we did the READ_ID table command. Later we can evaluate the
+ * ECC_ENABLE support.
+ */
ret = micron_nand_on_die_ecc_setup(chip, true);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
@@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
- if (!(id[4] & MICRON_ID_ECC_ENABLED))
- return MICRON_ON_DIE_UNSUPPORTED;
-
ret = micron_nand_on_die_ecc_setup(chip, false);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
+ if (!(id[4] & MICRON_ID_ECC_ENABLED))
+ return MICRON_ON_DIE_UNSUPPORTED;
+
ret = nand_readid_op(chip, 0, id, sizeof(id));
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
--
2.20.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 6ee03a816d67..289a086e1467 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -397,6 +397,30 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided following analysis:
conntrack -e NEW -E
netlink_bind()
netlink_lock_table() -> increases "nl_table_users"
nfnetlink_bind()
# does not unlock the table as it's locked by netlink_bind()
__request_module()
call_usermodehelper_exec()
This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
won't return until modprobe process is done.
"modprobe xfrm_user":
xfrm_user_init()
register_pernet_subsys()
-> grab pernet_ops_rwsem
..
netlink_table_grab()
calls schedule() as "nl_table_users" is non-zero
so modprobe is blocked because netlink_bind() increased
nl_table_users while also holding pernet_ops_rwsem.
"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
ctnetlink_init()
register_pernet_subsys()
-> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
both modprobe processes wait on one another -- neither can make
progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 9adedba78eeac..044559c10e98e 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -495,7 +495,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
> From: Sasha Levin <sashal(a)kernel.org>
> Sent: Friday, August 2, 2019 11:08 AM
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 15becc2b56c6 PCI: hv: Add hv_pci_remove_slots() when we
> unload the driver.
>
> The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135.
>
> v5.2.5: Build OK!
> v4.19.63: Build OK!
> v4.14.135: Failed to apply! Possible dependencies:
> Unable to calculate
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
> Sasha
I'm waiting the patch to be accepted into the pci tree first.
We do not need to do anything at this moment.
Thanks,
-- Dexuan
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided following analysis:
conntrack -e NEW -E
netlink_bind()
netlink_lock_table() -> increases "nl_table_users"
nfnetlink_bind()
# does not unlock the table as it's locked by netlink_bind()
__request_module()
call_usermodehelper_exec()
This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
won't return until modprobe process is done.
"modprobe xfrm_user":
xfrm_user_init()
register_pernet_subsys()
-> grab pernet_ops_rwsem
..
netlink_table_grab()
calls schedule() as "nl_table_users" is non-zero
so modprobe is blocked because netlink_bind() increased
nl_table_users while also holding pernet_ops_rwsem.
"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
ctnetlink_init()
register_pernet_subsys()
-> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
both modprobe processes wait on one another -- neither can make
progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 733d3e4a30d85..2cee032af46d2 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -530,7 +530,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 88d02c9ba2e83fc22d37ccb1f11c62ea6fc9ae50 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Fri, 2 Aug 2019 09:03:42 -0700
Subject: usb: typec: tcpm: Ignore unsupported/unknown alternate mode requests
TCPM may receive PD messages associated with unknown or unsupported
alternate modes. If that happens, calls to typec_match_altmode()
will return NULL. The tcpm code does not currently take this into
account. This results in crashes.
Unable to handle kernel NULL pointer dereference at virtual address 000001f0
pgd = 41dad9a1
[000001f0] *pgd=00000000
Internal error: Oops: 5 [#1] THUMB2
Modules linked in: tcpci tcpm
CPU: 0 PID: 2338 Comm: kworker/u2:0 Not tainted 5.1.18-sama5-armv7-r2 #6
Hardware name: Atmel SAMA5
Workqueue: 2-0050 tcpm_pd_rx_handler [tcpm]
PC is at typec_altmode_attention+0x0/0x14
LR is at tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm]
...
[<c03fbee8>] (typec_altmode_attention) from [<bf8030fb>]
(tcpm_pd_rx_handler+0xa3b/0xda0 [tcpm])
[<bf8030fb>] (tcpm_pd_rx_handler [tcpm]) from [<c012082b>]
(process_one_work+0x123/0x2a8)
[<c012082b>] (process_one_work) from [<c0120a6d>]
(worker_thread+0xbd/0x3b0)
[<c0120a6d>] (worker_thread) from [<c012431f>] (kthread+0xcf/0xf4)
[<c012431f>] (kthread) from [<c01010f9>] (ret_from_fork+0x11/0x38)
Ignore PD messages if the associated alternate mode is not supported.
Fixes: e9576fe8e605c ("usb: typec: tcpm: Support for Alternate Modes")
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Douglas Gilbert <dgilbert(a)interlog.com>
Cc: Douglas Gilbert <dgilbert(a)interlog.com>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Tested-by: Douglas Gilbert <dgilbert(a)interlog.com>
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/r/1564761822-13984-1-git-send-email-linux@roeck-us.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 38 ++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index ab6456622120..15abe1d9958f 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1109,7 +1109,8 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
break;
case CMD_ATTENTION:
/* Attention command does not have response */
- typec_altmode_attention(adev, p[1]);
+ if (adev)
+ typec_altmode_attention(adev, p[1]);
return 0;
default:
break;
@@ -1161,20 +1162,26 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
}
break;
case CMD_ENTER_MODE:
- typec_altmode_update_active(pdev, true);
-
- if (typec_altmode_vdm(adev, p[0], &p[1], cnt)) {
- response[0] = VDO(adev->svid, 1, CMD_EXIT_MODE);
- response[0] |= VDO_OPOS(adev->mode);
- return 1;
+ if (adev && pdev) {
+ typec_altmode_update_active(pdev, true);
+
+ if (typec_altmode_vdm(adev, p[0], &p[1], cnt)) {
+ response[0] = VDO(adev->svid, 1,
+ CMD_EXIT_MODE);
+ response[0] |= VDO_OPOS(adev->mode);
+ return 1;
+ }
}
return 0;
case CMD_EXIT_MODE:
- typec_altmode_update_active(pdev, false);
+ if (adev && pdev) {
+ typec_altmode_update_active(pdev, false);
- /* Back to USB Operation */
- WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB,
- NULL));
+ /* Back to USB Operation */
+ WARN_ON(typec_altmode_notify(adev,
+ TYPEC_STATE_USB,
+ NULL));
+ }
break;
default:
break;
@@ -1184,8 +1191,10 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
switch (cmd) {
case CMD_ENTER_MODE:
/* Back to USB Operation */
- WARN_ON(typec_altmode_notify(adev, TYPEC_STATE_USB,
- NULL));
+ if (adev)
+ WARN_ON(typec_altmode_notify(adev,
+ TYPEC_STATE_USB,
+ NULL));
break;
default:
break;
@@ -1196,7 +1205,8 @@ static int tcpm_pd_svdm(struct tcpm_port *port, const __le32 *payload, int cnt,
}
/* Informing the alternate mode drivers about everything */
- typec_altmode_vdm(adev, p[0], &p[1], cnt);
+ if (adev)
+ typec_altmode_vdm(adev, p[0], &p[1], cnt);
return rlen;
}
--
2.22.0
This is a note to let you know that I've just added the patch titled
usb: host: xhci-rcar: Fix timeout in xhci_suspend()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 783bda5e41acc71f98336e1a402c180f9748e5dc Mon Sep 17 00:00:00 2001
From: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Date: Fri, 2 Aug 2019 17:33:35 +0900
Subject: usb: host: xhci-rcar: Fix timeout in xhci_suspend()
When a USB device is connected to the host controller and
the system enters suspend, the following error happens
in xhci_suspend():
xhci-hcd ee000000.usb: WARN: xHC CMD_RUN timeout
Since the firmware/internal CPU control the USBSTS.STS_HALT
and the process speed is down when the roothub port enters U3,
long delay for the handshake of STS_HALT is neeed in xhci_suspend().
So, this patch adds to set the XHCI_SLOW_SUSPEND.
Fixes: 435cc1138ec9 ("usb: host: xhci-plat: set resume_quirk() for R-Car controllers")
Cc: <stable(a)vger.kernel.org> # v4.12+
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Link: https://lore.kernel.org/r/1564734815-17964-1-git-send-email-yoshihiro.shimo…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci-rcar.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/host/xhci-rcar.c b/drivers/usb/host/xhci-rcar.c
index 671bce18782c..8616c52849c6 100644
--- a/drivers/usb/host/xhci-rcar.c
+++ b/drivers/usb/host/xhci-rcar.c
@@ -238,10 +238,15 @@ int xhci_rcar_init_quirk(struct usb_hcd *hcd)
* pointers. So, this driver clears the AC64 bit of xhci->hcc_params
* to call dma_set_coherent_mask(dev, DMA_BIT_MASK(32)) in
* xhci_gen_setup().
+ *
+ * And, since the firmware/internal CPU control the USBSTS.STS_HALT
+ * and the process speed is down when the roothub port enters U3,
+ * long delay for the handshake of STS_HALT is neeed in xhci_suspend().
*/
if (xhci_rcar_is_gen2(hcd->self.controller) ||
- xhci_rcar_is_gen3(hcd->self.controller))
- xhci->quirks |= XHCI_NO_64BIT_SUPPORT;
+ xhci_rcar_is_gen3(hcd->self.controller)) {
+ xhci->quirks |= XHCI_NO_64BIT_SUPPORT | XHCI_SLOW_SUSPEND;
+ }
if (!xhci_rcar_wait_for_pll_active(hcd))
return -ETIMEDOUT;
--
2.22.0
The callback pmu->read() can be called with interrupts enabled.
Currently, on ARM, this can cause the following callchain:
armpmu_read() -> armpmu_event_update() -> armv7pmu_read_counter()
The last function might modify the counter selector register and then
read the target counter, without taking any lock. With interrupts
enabled, a PMU interrupt could occur and modify the selector register
as well, between the selection and read of the interrupted context.
Save and restore the value of the selector register in the PMU interrupt
handler, ensuring the interrupted context is left with the correct PMU
registers selected.
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: stable(a)vger.kernel.org
---
arch/arm/kernel/perf_event_v7.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index a4fb0f8..b7be2a3 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -736,10 +736,22 @@ static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx)
return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx));
}
-static inline void armv7_pmnc_select_counter(int idx)
+static inline u32 armv7_pmsel_read(void)
+{
+ u32 pmsel;
+
+ asm volatile("mrc p15, 0, %0, c9, c12, 5" : "=&r" (pmsel));
+ return pmsel;
+}
+
+static inline void armv7_pmsel_write(u32 counter)
{
- u32 counter = ARMV7_IDX_TO_COUNTER(idx);
asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter));
+}
+
+static inline void armv7_pmnc_select_counter(int idx)
+{
+ armv7_pmsel_write(ARMV7_IDX_TO_COUNTER(idx));
isb();
}
@@ -952,8 +964,15 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
struct perf_sample_data data;
struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
struct pt_regs *regs;
+ u32 pmsel;
int idx;
+
+ /*
+ * Save pmsel in case the interrupted context was using it.
+ */
+ pmsel = armv7_pmsel_read();
+
/*
* Get and reset the IRQ flags
*/
@@ -1004,6 +1023,8 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu)
*/
irq_work_run();
+ armv7_pmsel_write(pmsel);
+
return IRQ_HANDLED;
}
--
1.9.1
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided following analysis:
conntrack -e NEW -E
netlink_bind()
netlink_lock_table() -> increases "nl_table_users"
nfnetlink_bind()
# does not unlock the table as it's locked by netlink_bind()
__request_module()
call_usermodehelper_exec()
This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
won't return until modprobe process is done.
"modprobe xfrm_user":
xfrm_user_init()
register_pernet_subsys()
-> grab pernet_ops_rwsem
..
netlink_table_grab()
calls schedule() as "nl_table_users" is non-zero
so modprobe is blocked because netlink_bind() increased
nl_table_users while also holding pernet_ops_rwsem.
"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
ctnetlink_init()
register_pernet_subsys()
-> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
both modprobe processes wait on one another -- neither can make
progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 2278d9ab723bf..9837a61cb3e3b 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -490,7 +490,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
From: Tomas Bortoli <tomasbortoli(a)gmail.com>
Uninitialized Kernel memory can leak to USB devices.
Fix by using kzalloc() instead of kmalloc() on the affected buffers.
Signed-off-by: Tomas Bortoli <tomasbortoli(a)gmail.com>
Reported-by: syzbot+d6a5a1a3657b596ef132(a)syzkaller.appspotmail.com
Fixes: f14e22435a27 ("net: can: peak_usb: Do not do dma on the stack")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/peak_usb/pcan_usb_pro.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
index 178bb7cff0c1..53cb2f72bdd0 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_pro.c
@@ -494,7 +494,7 @@ static int pcan_usb_pro_drv_loaded(struct peak_usb_device *dev, int loaded)
u8 *buffer;
int err;
- buffer = kmalloc(PCAN_USBPRO_FCT_DRVLD_REQ_LEN, GFP_KERNEL);
+ buffer = kzalloc(PCAN_USBPRO_FCT_DRVLD_REQ_LEN, GFP_KERNEL);
if (!buffer)
return -ENOMEM;
--
2.20.1
From: Tomas Bortoli <tomasbortoli(a)gmail.com>
Uninitialized Kernel memory can leak to USB devices.
Fix by using kzalloc() instead of kmalloc() on the affected buffers.
Signed-off-by: Tomas Bortoli <tomasbortoli(a)gmail.com>
Reported-by: syzbot+513e4d0985298538bf9b(a)syzkaller.appspotmail.com
Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
index 34761c3a6286..47cc1ff5b88e 100644
--- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
+++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
@@ -841,7 +841,7 @@ static int pcan_usb_fd_init(struct peak_usb_device *dev)
goto err_out;
/* allocate command buffer once for all for the interface */
- pdev->cmd_buffer_addr = kmalloc(PCAN_UFD_CMD_BUFFER_SIZE,
+ pdev->cmd_buffer_addr = kzalloc(PCAN_UFD_CMD_BUFFER_SIZE,
GFP_KERNEL);
if (!pdev->cmd_buffer_addr)
goto err_out_1;
--
2.20.1
This adds support for __copy to v4.9.y so that we can use it in
init/exit_module to avoid -Werror=missing-attributes errors on GCC 9.
Link: https://lore.kernel.org/lkml/259986242.BvXPX32bHu@devpool35/
Cc: <stable(a)vger.kernel.org>
Suggested-by: Rolf Eike Beer <eb(a)emlix.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
---
include/linux/compiler.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 80a5bc623c47..e905353f217a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -54,6 +54,22 @@ extern void __chk_io_ptr(const volatile void __iomem *);
#ifdef __KERNEL__
+/*
+ * Minimal backport of compiler_attributes.h to add support for __copy
+ * to v4.9.y so that we can use it in init/exit_module to avoid
+ * -Werror=missing-attributes errors on GCC 9.
+ */
+#ifndef __has_attribute
+# define __has_attribute(x) __GCC4_has_attribute_##x
+# define __GCC4_has_attribute___copy__ 0
+#endif
+
+#if __has_attribute(__copy__)
+# define __copy(symbol) __attribute__((__copy__(symbol)))
+#else
+# define __copy(symbol)
+#endif
+
#ifdef __GNUC__
#include <linux/compiler-gcc.h>
#endif
--
2.17.1
I decided to dig out a toy project which uses a DragonBoard 410c. This has
been "running" with kernel 4.9, which I would keep this way for unrelated
reasons. The vanilla 4.9 kernel wasn't bootable back then, but it was
buildable, which was good enough.
Upgrading the kernel to 4.9.180 caused the boot to suddenly fail:
aarch64-unknown-linux-gnueabi-ld: ./drivers/firmware/efi/libstub/lib.a(arm64-
stub.stub.o): in function `handle_kernel_image':
/tmp/e2/build/linux-4.9.139/drivers/firmware/efi/libstub/arm64-stub.c:63:
undefined reference to `__efistub__GLOBAL_OFFSET_TABLE_'
aarch64-unknown-linux-gnueabi-ld: ./drivers/firmware/efi/libstub/lib.a(arm64-
stub.stub.o): relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol
`__efistub__GLOBAL_OFFSET_TABLE_' which may bind externally can not be used
when making a shared object; recompile with -fPIC
/tmp/e2/build/linux-4.9.139/drivers/firmware/efi/libstub/arm64-stub.c:63:
(.init.text+0xc): dangerous relocation: unsupported relocation
/tmp/e2/build/linux-4.9.139/Makefile:1001: recipe for target 'vmlinux' failed
-make[1]: *** [vmlinux] Error 1
This is caused by commit 27b5ebf61818749b3568354c64a8ec2d9cd5ecca from
linux-4.9.y (which is 91ee5b21ee026c49e4e7483de69b55b8b47042be), reverting
this commit fixes the build.
This happens with vanilla binutils 2.32 and gcc 8.3.0 as well as 9.1.0. See
the attached .config for reference.
If you have questions or patches just ping me.
Greetings,
Eike
--
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055
emlix - smart embedded open source
Hi back in the December of 2016 there was commit
"1c52d859cb2d417e7216d3e56bb7fea88444cec9"
witch was followed shortly by "c198b121b1a1d7a7171770c634cd49191bac4477"
Unfortunately only the first commit was later included in long-term kernel
branches such as 4.4 and 4.9 witch left some of microcode functionality
broken on 32 bit systems
I guess it should be easily fixed by including
"c198b121b1a1d7a7171770c634cd49191bac4477" in those branches
Hi,
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.1.4, v5.0.18, v4.19.45, v4.14.121, v4.9.178, v4.4.180, v3.18.140.
v5.1.4: Build OK!
v5.0.18: Failed to apply! Possible dependencies:
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.19.45: Failed to apply! Possible dependencies:
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.14.121: Failed to apply! Possible dependencies:
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
a57d9064e4ee4 ("ceph: flush pending works before shutdown super")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.9.178: Failed to apply! Possible dependencies:
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
a57d9064e4ee4 ("ceph: flush pending works before shutdown super")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v4.4.180: Failed to apply! Possible dependencies:
13d1ad16d05ee ("libceph: move message allocation out of ceph_osdc_alloc_request()")
34b759b4a22b0 ("ceph: kill ceph_empty_snapc")
3f1af42ad0fad ("libceph: enable large, variable-sized OSD requests")
5be0389dac662 ("ceph: re-send AIO write request when getting -EOLDSNAP error")
7627151ea30bc ("libceph: define new ceph_file_layout structure")
779fe0fb8e188 ("ceph: rados pool namespace support")
922dab6134178 ("libceph, rbd: ceph_osd_linger_request, watch/notify v2")
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
ae458f5a171ba ("libceph: make r_request msg_size calculation clearer")
c41d13a31fefe ("rbd: use header_oid instead of header_name")
c8fe9b17d055f ("ceph: Asynchronous IO support")
d30291b985d18 ("libceph: variable-sized ceph_object_id")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
v3.18.140: Failed to apply! Possible dependencies:
10183a69551f7 ("ceph: check OSD caps before read/write")
28127bdd2f843 ("ceph: convert inline data to normal data before data write")
31c542a199d79 ("ceph: add inline data to pagecache")
5be0389dac662 ("ceph: re-send AIO write request when getting -EOLDSNAP error")
70db4f3629b34 ("ceph: introduce a new inode flag indicating if cached dentries are ordered")
745a8e3bccbc6 ("ceph: don't pre-allocate space for cap release messages")
7627151ea30bc ("libceph: define new ceph_file_layout structure")
779fe0fb8e188 ("ceph: rados pool namespace support")
83701246aee8f ("ceph: sync read inline data")
a1c6b8358171c ("ceph: define argument structure for handle_cap_grant")
affbc19a68f99 ("ceph: make sure syncfs flushes all cap snaps")
c8fe9b17d055f ("ceph: Asynchronous IO support")
d30291b985d18 ("libceph: variable-sized ceph_object_id")
d3383a8e37f80 ("ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)")
e3ec8d6898f71 ("ceph: send cap releases more aggressively")
e96a650a8174e ("ceph, rbd: delete unnecessary checks before two function calls")
How should we proceed with this patch?
--
Thanks,
Sasha
From: Andrea Arcangeli <aarcange(a)redhat.com>
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.
The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it. Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier. For example in Hugh's post from Jul 2017:
https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
"Not strictly relevant here, but a related note: I was very surprised
to discover, only quite recently, how handle_mm_fault() may be called
without down_read(mmap_sem) - when core dumping. That seems a
misguided optimization to me, which would also be nice to correct"
In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.
Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.
Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.
For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs. Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.
Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.
In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.
Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.
Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Jann Horn <jannh(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Acked-by: Jason Gunthorpe <jgg(a)mellanox.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[akaher(a)vmware.com: stable 4.9 backport
- handle binder_update_page_range - mhocko(a)suse.com]
Signed-off-by: Ajay Kaher <akaher(a)vmware.com>
---
drivers/android/binder.c | 6 ++++++
fs/proc/task_mmu.c | 18 ++++++++++++++++++
fs/userfaultfd.c | 9 +++++++++
include/linux/mm.h | 21 +++++++++++++++++++++
mm/mmap.c | 6 +++++-
5 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 80499f4..f05ab8f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -581,6 +581,12 @@ static int binder_update_page_range(struct binder_proc *proc, int allocate,
if (mm) {
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm)) {
+ if (allocate == 0)
+ goto free_range;
+ goto err_no_vma;
+ }
+
vma = proc->vma;
if (vma && mm != proc->vma_vm_mm) {
pr_err("%d: vma mm and task mm mismatch\n",
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5138e78..4b207b1 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1057,6 +1057,24 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
count = -EINTR;
goto out_mm;
}
+ /*
+ * Avoid to modify vma->vm_flags
+ * without locked ops while the
+ * coredump reads the vm_flags.
+ */
+ if (!mmget_still_valid(mm)) {
+ /*
+ * Silently return "count"
+ * like if get_task_mm()
+ * failed. FIXME: should this
+ * function have returned
+ * -ESRCH if get_task_mm()
+ * failed like if
+ * get_proc_task() fails?
+ */
+ up_write(&mm->mmap_sem);
+ goto out_mm;
+ }
for (vma = mm->mmap; vma; vma = vma->vm_next) {
vma->vm_flags &= ~VM_SOFTDIRTY;
vma_set_page_prot(vma);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 784d667..8bf425a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -479,6 +479,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
* taking the mmap_sem for writing.
*/
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto skip_mm;
prev = NULL;
for (vma = mm->mmap; vma; vma = vma->vm_next) {
cond_resched();
@@ -501,6 +503,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
}
+skip_mm:
up_write(&mm->mmap_sem);
mmput(mm);
wakeup:
@@ -802,6 +805,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
+
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
@@ -947,6 +953,9 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
goto out;
down_write(&mm->mmap_sem);
+ if (!mmget_still_valid(mm))
+ goto out_unlock;
+
vma = find_vma_prev(mm, start, &prev);
if (!vma)
goto out_unlock;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e3c8d40..c239984 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1189,6 +1189,27 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long address,
void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long start, unsigned long end);
+/*
+ * This has to be called after a get_task_mm()/mmget_not_zero()
+ * followed by taking the mmap_sem for writing before modifying the
+ * vmas or anything the coredump pretends not to change from under it.
+ *
+ * NOTE: find_extend_vma() called from GUP context is the only place
+ * that can modify the "mm" (notably the vm_start/end) under mmap_sem
+ * for reading and outside the context of the process, so it is also
+ * the only case that holds the mmap_sem for reading that must call
+ * this function. Generally if the mmap_sem is hold for reading
+ * there's no need of this check after get_task_mm()/mmget_not_zero().
+ *
+ * This function can be obsoleted and the check can be removed, after
+ * the coredump code will hold the mmap_sem for writing before
+ * invoking the ->core_dump methods.
+ */
+static inline bool mmget_still_valid(struct mm_struct *mm)
+{
+ return likely(!mm->core_state);
+}
+
/**
* mm_walk - callbacks for walk_page_range
* @pmd_entry: if set, called for each non-empty PMD (3rd-level) entry
diff --git a/mm/mmap.c b/mm/mmap.c
index 3f2314a..19368fb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2448,7 +2448,8 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
vma = find_vma_prev(mm, addr, &prev);
if (vma && (vma->vm_start <= addr))
return vma;
- if (!prev || expand_stack(prev, addr))
+ /* don't alter vm_end if the coredump is running */
+ if (!prev || !mmget_still_valid(mm) || expand_stack(prev, addr))
return NULL;
if (prev->vm_flags & VM_LOCKED)
populate_vma_page_range(prev, addr, prev->vm_end, NULL);
@@ -2474,6 +2475,9 @@ find_extend_vma(struct mm_struct *mm, unsigned long addr)
return vma;
if (!(vma->vm_flags & VM_GROWSDOWN))
return NULL;
+ /* don't alter vm_start if the coredump is running */
+ if (!mmget_still_valid(mm))
+ return NULL;
start = vma->vm_start;
if (expand_stack(vma, addr))
return NULL;
--
2.7.4
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cb361d8cdef69990f6b4504dc1fd9a594d983c97 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Tue, 16 Jul 2019 17:20:47 +0200
Subject: [PATCH] sched/fair: Use RCU accessors consistently for ->numa_group
The old code used RCU annotations and accessors inconsistently for
->numa_group, which can lead to use-after-frees and NULL dereferences.
Let all accesses to ->numa_group use proper RCU helpers to prevent such
issues.
Signed-off-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Fixes: 8c8a743c5087 ("sched/numa: Use {cpu, pid} to create task groups for shared faults")
Link: https://lkml.kernel.org/r/20190716152047.14424-3-jannh@google.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8dc1811487f5..9f51932bd543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1092,7 +1092,15 @@ struct task_struct {
u64 last_sum_exec_runtime;
struct callback_head numa_work;
- struct numa_group *numa_group;
+ /*
+ * This pointer is only modified for current in syscall and
+ * pagefault context (and for tasks being destroyed), so it can be read
+ * from any of the following contexts:
+ * - RCU read-side critical section
+ * - current->numa_group from everywhere
+ * - task's runqueue locked, task not running
+ */
+ struct numa_group __rcu *numa_group;
/*
* numa_faults is an array split into four regions:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6adb0e0f5feb..bc9cfeaac8bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1086,6 +1086,21 @@ struct numa_group {
unsigned long faults[0];
};
+/*
+ * For functions that can be called in multiple contexts that permit reading
+ * ->numa_group (see struct task_struct for locking rules).
+ */
+static struct numa_group *deref_task_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_check(p->numa_group, p == current ||
+ (lockdep_is_held(&task_rq(p)->lock) && !READ_ONCE(p->on_cpu)));
+}
+
+static struct numa_group *deref_curr_numa_group(struct task_struct *p)
+{
+ return rcu_dereference_protected(p->numa_group, p == current);
+}
+
static inline unsigned long group_faults_priv(struct numa_group *ng);
static inline unsigned long group_faults_shared(struct numa_group *ng);
@@ -1129,10 +1144,12 @@ static unsigned int task_scan_start(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long period = smin;
+ struct numa_group *ng;
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
@@ -1140,6 +1157,7 @@ static unsigned int task_scan_start(struct task_struct *p)
period *= shared + 1;
period /= private + shared + 1;
}
+ rcu_read_unlock();
return max(smin, period);
}
@@ -1148,13 +1166,14 @@ static unsigned int task_scan_max(struct task_struct *p)
{
unsigned long smin = task_scan_min(p);
unsigned long smax;
+ struct numa_group *ng;
/* Watch for min being lower than max due to floor calculations */
smax = sysctl_numa_balancing_scan_period_max / task_nr_scan_windows(p);
/* Scale the maximum scan period with the amount of shared memory. */
- if (p->numa_group) {
- struct numa_group *ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
unsigned long shared = group_faults_shared(ng);
unsigned long private = group_faults_priv(ng);
unsigned long period = smax;
@@ -1186,7 +1205,7 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
p->numa_work.next = &p->numa_work;
p->numa_faults = NULL;
- p->numa_group = NULL;
+ RCU_INIT_POINTER(p->numa_group, NULL);
p->last_task_numa_placement = 0;
p->last_sum_exec_runtime = 0;
@@ -1233,7 +1252,16 @@ static void account_numa_dequeue(struct rq *rq, struct task_struct *p)
pid_t task_numa_group_id(struct task_struct *p)
{
- return p->numa_group ? p->numa_group->gid : 0;
+ struct numa_group *ng;
+ pid_t gid = 0;
+
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
+ if (ng)
+ gid = ng->gid;
+ rcu_read_unlock();
+
+ return gid;
}
/*
@@ -1258,11 +1286,13 @@ static inline unsigned long task_faults(struct task_struct *p, int nid)
static inline unsigned long group_faults(struct task_struct *p, int nid)
{
- if (!p->numa_group)
+ struct numa_group *ng = deref_task_numa_group(p);
+
+ if (!ng)
return 0;
- return p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
- p->numa_group->faults[task_faults_idx(NUMA_MEM, nid, 1)];
+ return ng->faults[task_faults_idx(NUMA_MEM, nid, 0)] +
+ ng->faults[task_faults_idx(NUMA_MEM, nid, 1)];
}
static inline unsigned long group_faults_cpu(struct numa_group *group, int nid)
@@ -1400,12 +1430,13 @@ static inline unsigned long task_weight(struct task_struct *p, int nid,
static inline unsigned long group_weight(struct task_struct *p, int nid,
int dist)
{
+ struct numa_group *ng = deref_task_numa_group(p);
unsigned long faults, total_faults;
- if (!p->numa_group)
+ if (!ng)
return 0;
- total_faults = p->numa_group->total_faults;
+ total_faults = ng->total_faults;
if (!total_faults)
return 0;
@@ -1419,7 +1450,7 @@ static inline unsigned long group_weight(struct task_struct *p, int nid,
bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
int src_nid, int dst_cpu)
{
- struct numa_group *ng = p->numa_group;
+ struct numa_group *ng = deref_curr_numa_group(p);
int dst_nid = cpu_to_node(dst_cpu);
int last_cpupid, this_cpupid;
@@ -1600,13 +1631,14 @@ static bool load_too_imbalanced(long src_load, long dst_load,
static void task_numa_compare(struct task_numa_env *env,
long taskimp, long groupimp, bool maymove)
{
+ struct numa_group *cur_ng, *p_ng = deref_curr_numa_group(env->p);
struct rq *dst_rq = cpu_rq(env->dst_cpu);
+ long imp = p_ng ? groupimp : taskimp;
struct task_struct *cur;
long src_load, dst_load;
- long load;
- long imp = env->p->numa_group ? groupimp : taskimp;
- long moveimp = imp;
int dist = env->dist;
+ long moveimp = imp;
+ long load;
if (READ_ONCE(dst_rq->numa_migrate_on))
return;
@@ -1645,21 +1677,22 @@ static void task_numa_compare(struct task_numa_env *env,
* If dst and source tasks are in the same NUMA group, or not
* in any group then look only at task weights.
*/
- if (cur->numa_group == env->p->numa_group) {
+ cur_ng = rcu_dereference(cur->numa_group);
+ if (cur_ng == p_ng) {
imp = taskimp + task_weight(cur, env->src_nid, dist) -
task_weight(cur, env->dst_nid, dist);
/*
* Add some hysteresis to prevent swapping the
* tasks within a group over tiny differences.
*/
- if (cur->numa_group)
+ if (cur_ng)
imp -= imp / 16;
} else {
/*
* Compare the group weights. If a task is all by itself
* (not part of a group), use the task weight instead.
*/
- if (cur->numa_group && env->p->numa_group)
+ if (cur_ng && p_ng)
imp += group_weight(cur, env->src_nid, dist) -
group_weight(cur, env->dst_nid, dist);
else
@@ -1757,11 +1790,12 @@ static int task_numa_migrate(struct task_struct *p)
.best_imp = 0,
.best_cpu = -1,
};
+ unsigned long taskweight, groupweight;
struct sched_domain *sd;
+ long taskimp, groupimp;
+ struct numa_group *ng;
struct rq *best_rq;
- unsigned long taskweight, groupweight;
int nid, ret, dist;
- long taskimp, groupimp;
/*
* Pick the lowest SD_NUMA domain, as that would have the smallest
@@ -1807,7 +1841,8 @@ static int task_numa_migrate(struct task_struct *p)
* multiple NUMA nodes; in order to better consolidate the group,
* we need to check other locations.
*/
- if (env.best_cpu == -1 || (p->numa_group && p->numa_group->active_nodes > 1)) {
+ ng = deref_curr_numa_group(p);
+ if (env.best_cpu == -1 || (ng && ng->active_nodes > 1)) {
for_each_online_node(nid) {
if (nid == env.src_nid || nid == p->numa_preferred_nid)
continue;
@@ -1840,7 +1875,7 @@ static int task_numa_migrate(struct task_struct *p)
* A task that migrated to a second choice node will be better off
* trying for a better one later. Do not set the preferred node here.
*/
- if (p->numa_group) {
+ if (ng) {
if (env.best_cpu == -1)
nid = env.src_nid;
else
@@ -2135,6 +2170,7 @@ static void task_numa_placement(struct task_struct *p)
unsigned long total_faults;
u64 runtime, period;
spinlock_t *group_lock = NULL;
+ struct numa_group *ng;
/*
* The p->mm->numa_scan_seq field gets updated without
@@ -2152,8 +2188,9 @@ static void task_numa_placement(struct task_struct *p)
runtime = numa_get_avg_runtime(p, &period);
/* If the task is part of a group prevent parallel updates to group stats */
- if (p->numa_group) {
- group_lock = &p->numa_group->lock;
+ ng = deref_curr_numa_group(p);
+ if (ng) {
+ group_lock = &ng->lock;
spin_lock_irq(group_lock);
}
@@ -2194,7 +2231,7 @@ static void task_numa_placement(struct task_struct *p)
p->numa_faults[cpu_idx] += f_diff;
faults += p->numa_faults[mem_idx];
p->total_numa_faults += diff;
- if (p->numa_group) {
+ if (ng) {
/*
* safe because we can only change our own group
*
@@ -2202,14 +2239,14 @@ static void task_numa_placement(struct task_struct *p)
* nid and priv in a specific region because it
* is at the beginning of the numa_faults array.
*/
- p->numa_group->faults[mem_idx] += diff;
- p->numa_group->faults_cpu[mem_idx] += f_diff;
- p->numa_group->total_faults += diff;
- group_faults += p->numa_group->faults[mem_idx];
+ ng->faults[mem_idx] += diff;
+ ng->faults_cpu[mem_idx] += f_diff;
+ ng->total_faults += diff;
+ group_faults += ng->faults[mem_idx];
}
}
- if (!p->numa_group) {
+ if (!ng) {
if (faults > max_faults) {
max_faults = faults;
max_nid = nid;
@@ -2220,8 +2257,8 @@ static void task_numa_placement(struct task_struct *p)
}
}
- if (p->numa_group) {
- numa_group_count_active_nodes(p->numa_group);
+ if (ng) {
+ numa_group_count_active_nodes(ng);
spin_unlock_irq(group_lock);
max_nid = preferred_group_nid(p, max_nid);
}
@@ -2255,7 +2292,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
int cpu = cpupid_to_cpu(cpupid);
int i;
- if (unlikely(!p->numa_group)) {
+ if (unlikely(!deref_curr_numa_group(p))) {
unsigned int size = sizeof(struct numa_group) +
4*nr_node_ids*sizeof(unsigned long);
@@ -2291,7 +2328,7 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
if (!grp)
goto no_join;
- my_grp = p->numa_group;
+ my_grp = deref_curr_numa_group(p);
if (grp == my_grp)
goto no_join;
@@ -2362,7 +2399,8 @@ static void task_numa_group(struct task_struct *p, int cpupid, int flags,
*/
void task_numa_free(struct task_struct *p, bool final)
{
- struct numa_group *grp = p->numa_group;
+ /* safe: p either is current or is being freed by current */
+ struct numa_group *grp = rcu_dereference_raw(p->numa_group);
unsigned long *numa_faults = p->numa_faults;
unsigned long flags;
int i;
@@ -2442,7 +2480,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
* actively using should be counted as local. This allows the
* scan rate to slow down when a workload has settled down.
*/
- ng = p->numa_group;
+ ng = deref_curr_numa_group(p);
if (!priv && !local && ng && ng->active_nodes > 1 &&
numa_is_active_node(cpu_node, ng) &&
numa_is_active_node(mem_node, ng))
@@ -10460,18 +10498,22 @@ void show_numa_stats(struct task_struct *p, struct seq_file *m)
{
int node;
unsigned long tsf = 0, tpf = 0, gsf = 0, gpf = 0;
+ struct numa_group *ng;
+ rcu_read_lock();
+ ng = rcu_dereference(p->numa_group);
for_each_online_node(node) {
if (p->numa_faults) {
tsf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 0)];
tpf = p->numa_faults[task_faults_idx(NUMA_MEM, node, 1)];
}
- if (p->numa_group) {
- gsf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 0)],
- gpf = p->numa_group->faults[task_faults_idx(NUMA_MEM, node, 1)];
+ if (ng) {
+ gsf = ng->faults[task_faults_idx(NUMA_MEM, node, 0)],
+ gpf = ng->faults[task_faults_idx(NUMA_MEM, node, 1)];
}
print_numa_stats(m, node, tsf, tpf, gsf, gpf);
}
+ rcu_read_unlock();
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_SCHED_DEBUG */
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+,
but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
Regards,
Jack Wang 1 & 1 IONOS Cloud GmbH
Jason Wang (4):
vhost: introduce vhost_exceeds_weight()
vhost_net: fix possible infinite loop
vhost: vsock: add weight support
vhost: scsi: add weight support
drivers/vhost/net.c | 41 ++++++++++++++---------------------------
drivers/vhost/scsi.c | 15 +++++++++++----
drivers/vhost/vhost.c | 20 +++++++++++++++++++-
drivers/vhost/vhost.h | 5 ++++-
drivers/vhost/vsock.c | 28 +++++++++++++++++++++-------
5 files changed, 69 insertions(+), 40 deletions(-)
--
2.17.1
Looks like a regression got introduced into nv50_mstc_atomic_check()
that somehow didn't get found until now. If userspace changes
crtc_state->active to false but leaves the CRTC enabled, we end up
calling drm_dp_atomic_find_vcpi_slots() using the PBN calculated in
asyh->dp.pbn. However, if the display is inactive we end up calculating
a PBN of 0, which inadvertently causes us to have an allocation of 0.
>From there, if userspace then disables the CRTC afterwards we end up
accidentally attempting to free the VCPI twice:
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
RIP: 0010:drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
Call Trace:
drm_atomic_helper_check_modeset+0x3f3/0xa60 [drm_kms_helper]
? drm_atomic_check_only+0x43/0x780 [drm]
drm_atomic_helper_check+0x15/0x90 [drm_kms_helper]
nv50_disp_atomic_check+0x83/0x1d0 [nouveau]
drm_atomic_check_only+0x54d/0x780 [drm]
? drm_atomic_set_crtc_for_connector+0xec/0x100 [drm]
drm_atomic_commit+0x13/0x50 [drm]
drm_atomic_helper_set_config+0x81/0x90 [drm_kms_helper]
drm_mode_setcrtc+0x194/0x6a0 [drm]
? vprintk_emit+0x16a/0x230
? drm_ioctl+0x163/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
drm_ioctl_kernel+0xaa/0xf0 [drm]
drm_ioctl+0x208/0x390 [drm]
? drm_mode_getcrtc+0x180/0x180 [drm]
nouveau_drm_ioctl+0x63/0xb0 [nouveau]
do_vfs_ioctl+0x405/0x660
? recalc_sigpending+0x17/0x50
? _copy_from_user+0x37/0x60
ksys_ioctl+0x5e/0x90
? exit_to_usermode_loop+0x92/0xe0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x59/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
WARNING: CPU: 0 PID: 1484 at drivers/gpu/drm/drm_dp_mst_topology.c:3336
drm_dp_atomic_release_vcpi_slots+0x87/0xb0 [drm_kms_helper]
---[ end trace 4c395c0c51b1f88d ]---
[drm:drm_dp_atomic_release_vcpi_slots [drm_kms_helper]] *ERROR* no VCPI for
[MST PORT:00000000e288eb7d] found in mst state 000000008e642070
So, fix this by doing what we probably should have done from the start: only
call drm_dp_atomic_find_vcpi_slots() when crtc_state->mode_changed is set, so
that VCPI allocations remain for as long as the CRTC is enabled.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 232c9eec417a ("drm/nouveau: Use atomic VCPI helpers for MST")
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Ben Skeggs <bskeggs(a)redhat.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: David Airlie <airlied(a)redhat.com>
Cc: Jerry Zuo <Jerry.Zuo(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Juston Li <juston.li(a)intel.com>
Cc: Karol Herbst <karolherbst(a)gmail.com>
Cc: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Cc: Ilia Mirkin <imirkin(a)alum.mit.edu>
Cc: <stable(a)vger.kernel.org> # v5.1+
---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 8497768f1b41..126703816794 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -780,7 +780,7 @@ nv50_msto_atomic_check(struct drm_encoder *encoder,
drm_dp_calc_pbn_mode(crtc_state->adjusted_mode.clock,
connector->display_info.bpc * 3);
- if (drm_atomic_crtc_needs_modeset(crtc_state)) {
+ if (crtc_state->mode_changed) {
slots = drm_dp_atomic_find_vcpi_slots(state, &mstm->mgr,
mstc->port,
asyh->dp.pbn);
--
2.21.0
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: .
The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
v5.2.4: Build OK!
v5.1.21: Build OK!
v4.19.62: Build OK!
v4.14.134: Failed to apply! Possible dependencies:
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
v4.9.186: Failed to apply! Possible dependencies:
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
v4.4.186: Failed to apply! Possible dependencies:
162f812f23ba ("Bluetooth: hci_uart: Add Marvell support")
25a13e382de2 ("bluetooth: hci_qca: Replace GFP_ATOMIC with GFP_KERNEL")
395174bb07c1 ("Bluetooth: hci_uart: Add Intel/AG6xx support")
9e69130c4efc ("Bluetooth: hci_uart: Add Nokia Protocol identifier")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
These patches include few backported fixes for the 4.4 stable
tree.
I would appreciate if you could kindly consider including them in the
next release.
Ajay
---
[PATCH 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit
[PATCH 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function
[PATCH 3/8]:
Backporting of upstream commit 7aef4172c795:
mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton
[PATCH 4/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
[PATCH 5/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages
[PATCH 6/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount
[PATCH 7/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper
[PATCH 8/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get
Hi,
Sunil Muthuswamy <sunilmut(a)microsoft.com> made some important fixes for
hv_sock recently. I and Sunil think it would be great to backport the
fixes to the longterm stable kernels.
Since hv_sock was firstly introduced in v4.14, we only care about
v4.14, v4.19 and v5.2.
For linux-5.2.y (currently it's v5.2.4), only one patch is missing.
The mainline commit ID is:
d5afa82c977e ("vsock: correct removal of socket from the list")
It can be cleanly cherry-picked from the mainline.
For linux-4.19.y (currently it's v4.19.62), 3 patches are missing.
The mainline commit IDs are:
cb359b604167 ("hvsock: fix epollout hang from race condition")
a9eeb998c28d ("hv_sock: Add support for delayed close")
d5afa82c977e ("vsock: correct removal of socket from the list")
They can be cleanly cherry-picked from the mainline, in the listed order here.
Note: it looks the first commit (cb359b604167) has been queued.
For linux-4.14.y (currently it's v4.14.134), 4 patches are missing.
The mainline commit IDs are:
cb359b604167 ("hvsock: fix epollout hang from race condition")
3b4477d2dcf2 ("VSOCK: use TCP state constants for sk_state")
a9eeb998c28d ("hv_sock: Add support for delayed close")
d5afa82c977e ("vsock: correct removal of socket from the list")
The third patch (a9eeb998c28d) needs small manual adjustments, and please
use the attached backported patch for it; the other 3 patches can be cleanly
cherry-picked from the mainline, in the listed order here.
Note: it looks the first commit (cb359b604167) has been queued.
Please let me know, if any clarification is needed.
Thanks!
-- Dexuan
On Thu, 1 Aug 2019 at 07:31, Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
>
> v5.2.4: Build OK!
> v5.1.21: Build OK!
> v4.19.62: Failed to apply! Possible dependencies:
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.14.134: Failed to apply! Possible dependencies:
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.9.186: Failed to apply! Possible dependencies:
> 297ab90f15f6 ("coresight: tmc: Cleanup operation mode handling")
> 2cd541402829 ("coresight: tmc: minor fix for output log")
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> c38e505e2701 ("coresight: tmc: Get rid of mode parameter for helper routines")
> e006d89abedd ("coresight: etm4x: Add support for handling errors")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
>
> v4.4.186: Failed to apply! Possible dependencies:
> 1925a470ce69 ("coresight: etm3x: splitting struct etm_drvdata")
> 2127154d115d ("coresight: etm3x: implementing user/kernel mode tracing")
> 22fd532eaa0c ("coresight: etm3x: adding operation mode for etm_enable()")
> 27b10da8fff2 ("coresight: etb10: moving to local atomic operations")
> 41a75cdde735 ("coresight: Convert driver messages to dev_dbg")
> 52210c8745e4 ("coresight: implementing 'cpu_id()' API")
> 68a147752d04 ("coresight: etmx: Claim devices before use")
> 882d5e112491 ("coresight: etm3x: implementing perf_enable/disable() API")
> b3e94405941e ("coresight: associating path with session rather than tracer")
> c04148e708c0 ("coresight: etm3x: moving sysFS entries to dedicated file")
> c1f8e57c9e66 ("coresight: etm3x: moving etm_readl/writel to header file")
> e2a1551a881f ("coresight: etm3: Add support for handling errors")
> e827d4550aa3 ("coresight: etb10: adding operation mode for sink->enable()")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
I have another one like that - will send a separate set that applies correctly.
Thanks for the consideration,
Mathieu
>
> --
> Thanks,
> Sasha
On 01/08/2019 14:31, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.2.4, v5.1.21, v4.19.62, v4.14.134, v4.9.186, v4.4.186.
>
> v5.2.4: Build OK!
> v5.1.21: Build OK!
> v4.19.62: Build OK!
> v4.14.134: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
> v4.9.186: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
> v4.4.186: Failed to apply! Possible dependencies:
> 031e3601869c ("lib: Add generic PIO mapping method")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
Please only consider for v4.19.x, v5.1.x, and v5.2.x stable trees, same
for all patches in the patchset.
Thanks,
John
> --
> Thanks,
> Sasha
>
> .
>
Dear stable maintainers,
Please include the commit 66b20ac0a1a10769d059d6903202f53494e3d902 into the stable 5.1 and 5.2.
Commit title: nvme: fix multipath crash when ANA is deactivated
The bug that this commit fixes was seen crashing kernels or provoking an oops from 5.0 onwards.
Regards,
Marta
Sehr geehrte Damen und Herren,
nach unserem Besuch Ihrer Homepage möchten wir Ihnen ein Angebot von Produkten vorstellen, das Ihnen ermöglichen wird, den Verkauf Ihrer Produkte sowie Dienstleistungen deutlich zu erhöhen.
Die Datenbanken der Firmen sind in für Sie interessante und relevante Zielgruppen untergliedert.
Die Firmenangaben beinhalten: Name der Firma, Ansprechpartner, E-mail Adresse, Tel. + Fax-Nr., PLZ, Ort, Straße etc.
1. Gesamtpaket 2019 DE - 1,4 Mio. Firmenadressen ( 1 457 620 ) - 190 € ( bis zum 01.08.2019 )
2. Gesamtpaket 2019 DE,AT,CH - 1,7 Mio. Firmenadressen ( 1 747 921 ) - 240 € ( bis zum 01.08.2019 )
3. Schweiz 2019 ( 187 911 ) - 149 € ( bis zum 01.08.2019 )
4. Österreich 2019 ( 104 000 ) - 149 € ( bis zum 01.08.2019 )
Die Verwendungsmöglichkeiten der Datenbanken sind praktisch unbegrenzt und Sie können durch Verwendung der von uns entwickelten
Programme des personalisierten Versendens von Angeboten u.ä. mittels E-mailing bzw. Fax effektive und sichere Werbekampagnen damit durchführen.
Bitte informieren Sie sich über die weiteren Details einmal unverbindlich auf unseren Webseiten:
http://www.acdbmarketing.net/?page=catalog
Mit freundlichen Grüßen
Oliver Klemens
From: Wanpeng Li <wanpengli(a)tencent.com>
After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a
five years ago bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs
on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting
in the VMs after stress testing:
INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073)
Call Trace:
flush_tlb_mm_range+0x68/0x140
tlb_flush_mmu.part.75+0x37/0xe0
tlb_finish_mmu+0x55/0x60
zap_page_range+0x142/0x190
SyS_madvise+0x3cd/0x9c0
system_call_fastpath+0x1c/0x21
swait_active() sustains to be true before finish_swait() is called in
kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account
by kvm_vcpu_on_spin() loop greatly increases the probability condition
kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv
is enabled the yield-candidate vCPU's VMCS RVI field leaks(by
vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current
VMCS.
This patch fixes it by reverting the kvm_arch_vcpu_runnable() condition
in kvm_vcpu_on_spin() loop.
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Fixes: 98f4a1467 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop)
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed061d8..12f2c91 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,7 +2506,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
continue;
if (vcpu == me)
continue;
- if (swait_active(&vcpu->wq) && !kvm_arch_vcpu_runnable(vcpu))
+ if (swait_active(&vcpu->wq))
continue;
if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
!kvm_arch_vcpu_in_kernel(vcpu))
--
2.7.4
From: Munehisa Kamata <kamatam(a)amazon.com>
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block(a)vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br(a)gmail.com>
Cc: nbd(a)other.debian.org
Cc: stable(a)vger.kernel.org
Cc: David Woodhouse <dwmw(a)amazon.com>
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
---
I reproduced this phenomenon on the fat file system.
reproduce steps :
1.Establish a nbd connection.
2.Run two threads:one do mount and umount,anther one do clear_sock ioctl
3.Then hit the BUG_ON.
v2: Delete a link.
Signed-off-by: SunKe <sunke32(a)huawei.com>
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9bcde23..e21d2de 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
sock_shutdown(nbd);
- kill_bdev(bdev);
+ __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
--
2.7.4
The patch titled
Subject: mm/usercopy: use memory range to be accessed for wraparound check
has been added to the -mm tree. Its filename is
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-usercopy-use-memory-range-to-be…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-usercopy-use-memory-range-to-be…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Isaac J. Manjarres" <isaacm(a)codeaurora.org>
Subject: mm/usercopy: use memory range to be accessed for wraparound check
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the range
of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory address,
when trying to read 4 KB from memory that is mapped to the the last
possible page in the virtual address space, when in fact, accessing that
range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to wrap
around.
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeauror…
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Trilok Soni <tsoni(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/usercopy.c~mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check
+++ a/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(c
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
_
Patches currently in -mm which might be from isaacm(a)codeaurora.org are
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
Currently, when checking to see if accessing n bytes starting at
address "ptr" will cause a wraparound in the memory addresses,
the check in check_bogus_address() adds an extra byte, which is
incorrect, as the range of addresses that will be accessed is
[ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the
memory address, when trying to read 4 KB from memory that is
mapped to the the last possible page in the virtual address
space, when in fact, accessing that range of memory would not
cause a wraparound to occur.
Use the memory range that will actually be accessed when
considering if accessing a certain amount of bytes will cause
the memory address to wrap around.
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 2a09796..98e92486 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Hi Sasha,
On Thu, Jul 18, 2019 at 5:45 PM Sasha Levin <sashal(a)kernel.org> wrote:
>
> Hi,
>
> [This is an automated email]
Where did you get this patch from? Since stable needs patches merged
on Linus tree,
shouldn't your scripts run to try backporting only patches from there?
Thanks,
Rodrigo.
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 88a0d9606aff drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time.
>
> The bot has tested the following trees: v5.2.1.
> v5.2.1: Failed to apply! Possible dependencies:
> 231dcffc234f ("drm/i915/bios: add BDB block comments before definitions")
> 843444ed1301 ("drm/i915/bios: sort BDB block definitions using block ID")
> f87f6599c843 ("drm/i915/bios: reserve struct bdb_ prefix for BDB blocks")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
> --
> Thanks,
> Sasha
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx(a)lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Rodrigo Vivi
Blog: http://blog.vivi.eng.br
Backport commits from master that fix boot failure on some intel
machines.
I have only boot tested this in a VM. Functional testing for v4.14 is
out of my scope as patches differ only on a trivial conflict from v4.19,
where I discovered/debugged the issue. While testing v4.14 stable on
affected nodes would require porting some other [local] patches,
which is out of my timelimit for the backport.
Hopefully, that's fine.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Dmitry Safonov (1):
iommu/vt-d: Don't queue_iova() if there is no flush queue
Joerg Roedel (1):
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/iova.c | 18 ++++++++++++++----
include/linux/iova.h | 6 ++++++
3 files changed, 21 insertions(+), 5 deletions(-)
--
2.22.0
Hello Greg,
Can you please consider including the following patches in the stable
linux-4.14.y branch?
The following patch series attempts to address the issues raised by
Stan Hu around NFSv4 lookup revalidation.
The first patch in the series should, in principle suffice to address
the exact issue raised by Stan, however when looking at the
implementation of nfs4_lookup_revalidate(), it becomes clear that we're
not doing enough to revalidate the dentry itself when performing NFSv4.1
opens either.
Trond Myklebust (3):
NFS: Fix dentry revalidation on NFSv4 lookup
NFS: Refactor nfs_lookup_revalidate()
NFSv4: Fix lookup revalidate of regular files
zhangliguang (1):
NFS: Remove redundant semicolon
fs/nfs/dir.c | 295 ++++++++++++++++++++++++++++++------------------------
fs/nfs/nfs4proc.c | 15 ++-
2 files changed, 174 insertions(+), 136 deletions(-)
--
2.15.3.AMZN
From: Munehisa Kamata <kamatam(a)amazon.com>
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block(a)vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br(a)gmail.com>
Cc: nbd(a)other.debian.org
Cc: stable(a)vger.kernel.org
Cc: David Woodhouse <dwmw(a)amazon.com>
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
CR: https://code.amazon.com/reviews/CR-7629288
---
I reproduced this phenomenon on the fat file system.
reproduce steps :
1.Establish a nbd connection.
2.Run two threads:one do mount and umount,anther one do clear_sock ioctl
3.Then hit the BUG_ON.
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9bcde23..e21d2de 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
sock_shutdown(nbd);
- kill_bdev(bdev);
+ __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
--
2.7.4
From: Will Deacon <will.deacon(a)arm.com>
[ Upstream commit 24951465cbd279f60b1fdc2421b3694405bcff42 ]
arch/arm/ defines a SIGMINSTKSZ of 2k, so we should use the same value
for compat tasks.
Cc: <stable(a)vger.kernel.org> # 4.9+
Cc: Aurelien Jarno <aurelien(a)aurel32.net>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Dave Martin <Dave.Martin(a)arm.com>
Reported-by: Steve McIntyre <steve.mcintyre(a)arm.com>
Tested-by: Steve McIntyre <93sam(a)debian.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
Aurelien points out that this didn't get selected for -stable despite its
counterpart (22839869f21a ("signal: Introduce COMPAT_SIGMINSTKSZ for use
in compat_sys_sigaltstack")) being backported to 4.9. Oops.
arch/arm64/include/asm/compat.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index 1a037b94eba1..cee28a05ee98 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -159,6 +159,7 @@ static inline compat_uptr_t ptr_to_compat(void __user *uptr)
}
#define compat_user_stack_pointer() (user_stack_pointer(task_pt_regs(current)))
+#define COMPAT_MINSIGSTKSZ 2048
static inline void __user *arch_compat_alloc_user_space(long len)
{
--
2.11.0
From: Todd Kjos <tkjos(a)android.com>
commit a370003cc301d4361bae20c9ef615f89bf8d1e8a upstream
There is a race between the binder driver cleaning
up a completed transaction via binder_free_transaction()
and a user calling binder_ioctl(BC_FREE_BUFFER) to
release a buffer. It doesn't matter which is first but
they need to be protected against running concurrently
which can result in a UAF.
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Cc: stable <stable(a)vger.kernel.org> # 4.14 4.19
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5d67f5fec6c1b..2decb1a5a8e2f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1960,8 +1960,18 @@ static struct binder_thread *binder_get_txn_from_and_acq_inner(
static void binder_free_transaction(struct binder_transaction *t)
{
- if (t->buffer)
- t->buffer->transaction = NULL;
+ struct binder_proc *target_proc = t->to_proc;
+
+ if (target_proc) {
+ binder_inner_proc_lock(target_proc);
+ if (t->buffer)
+ t->buffer->transaction = NULL;
+ binder_inner_proc_unlock(target_proc);
+ }
+ /*
+ * If the transaction has no target_proc, then
+ * t->buffer->transaction has already been cleared.
+ */
kfree(t);
binder_stats_deleted(BINDER_STAT_TRANSACTION);
}
@@ -3484,10 +3494,12 @@ static int binder_thread_write(struct binder_proc *proc,
buffer->debug_id,
buffer->transaction ? "active" : "finished");
+ binder_inner_proc_lock(proc);
if (buffer->transaction) {
buffer->transaction->buffer = NULL;
buffer->transaction = NULL;
}
+ binder_inner_proc_unlock(proc);
if (buffer->async_transaction && buffer->target_node) {
struct binder_node *buf_node;
struct binder_work *w;
--
2.22.0.709.g102302147b-goog
From: allen yan <yanwei(a)marvell.com>
commit c737abc193d16e62e23e2fb585b8b7398ab380d8 upstream.
Armada-37xx UART0 registers are 0x200 bytes wide. Right next to them are
the UART1 registers that should not be declared in this node.
Update the example in DT bindings document accordingly.
Signed-off-by: allen yan <yanwei(a)marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
Cherry-picked from lede/openwrt tree
https://git.lede-project.org/?p=source.git.
Build tested for ARCH=arm64 + defconfig
Cleanly apply on 4.9.y as well but since
lede stopped supporting v4.9.y, I'm not
sure if this patch is tested on v4.9.y at all.
Documentation/devicetree/bindings/serial/mvebu-uart.txt | 2 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/serial/mvebu-uart.txt b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
index 6087defd9f93..d37fabe17bd1 100644
--- a/Documentation/devicetree/bindings/serial/mvebu-uart.txt
+++ b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
@@ -8,6 +8,6 @@ Required properties:
Example:
serial@12000 {
compatible = "marvell,armada-3700-uart";
- reg = <0x12000 0x400>;
+ reg = <0x12000 0x200>;
interrupts = <43>;
};
diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index 8c0cf7efac65..b554cdaf5e53 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -134,7 +134,7 @@
uart0: serial@12000 {
compatible = "marvell,armada-3700-uart";
- reg = <0x12000 0x400>;
+ reg = <0x12000 0x200>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
status = "disabled";
};
--
2.7.4
'commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")'
breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
will:
- use map->pages starting at vma->vm_pgoff instead of 0
- verify map->count against vma_pages()+vma->vm_pgoff instead of just
vma_pages().
In practice, this breaks using a single gntdev FD for mapping multiple
grants.
relevant strace output:
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid 857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)
details here:
https://github.com/QubesOS/qubes-issues/issues/5199
The reason is -> ( copying Marek's word from discussion)
vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.
When trying to map grant with index (aka vma->vm_pgoff) > 1,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).
Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.
Marek has tested and confirmed the same.
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Souptick Joarder <jrdr.linux(a)gmail.com>
Tested-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
---
drivers/xen/gntdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4c339c7..a446a72 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1143,7 +1143,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
goto out_put_map;
if (!use_ptemod) {
- err = vm_map_pages(vma, map->pages, map->count);
+ err = vm_map_pages_zero(vma, map->pages, map->count);
if (err)
goto out_put_map;
} else {
--
1.9.1
お問い合わせありがとうございます。
下記の内容にて送信いたしました。
■お名前
Williamecome様
■メールアドレス
stable(a)vger.kernel.org
■ご住所
■電話番号
457264
■お問い合わせ項目
■詳細内容
Find Girls in Near Me Area for Sex: http://consparceces.tk/fl38z?Z5RA5HC
このメールは カビ取り名人・rain-or-shineのHP (http://fukuoka-kawa-kenkyujyo.com) のお問い合わせフォームから送信されました
Hi
As this goes to the 4.9 series according to
https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html#q-are-all…
I'm sending it primarily to stable(a)v.k.o but Cc'ing Dave and Xin Long.
Could you please apply 99253eb750fd ("ipv6: check sk sk_type and
protocol early in ip_mroute_set/getsockopt") to the 4.9 stable series?
While 5e1859fbcc3c was done back in 3.8-rc1, 99253eb750fd from
4.11-rc1 was not backported to older stable series itself, where it is
needed as well.
Only checked if applicable without change in 4.9, but the fix should
probably go as well to the 4.4 and 3.16.
Regards,
Salvatore
A single 32-bit PSR2 training pattern field follows the sixteen element
array of PSR table entries in the VBT spec. But, we incorrectly define
this PSR2 field for each of the PSR table entries. As a result, the PSR1
training pattern duration for any panel_type != 0 will be parsed
incorrectly. Secondly, PSR2 training pattern durations for VBTs with bdb
version >= 226 will also be wrong.
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Cc: stable(a)vger.kernel.org #v5.2
Fixes: 88a0d9606aff ("drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111088
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204183
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Tested-by: François Guerraz <kubrick(a)fgv6.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190717223451.2595-1-dhinaka…
(cherry picked from commit b5ea9c9337007d6e700280c8a60b4e10d070fb53)
---
drivers/gpu/drm/i915/intel_bios.c | 2 +-
drivers/gpu/drm/i915/intel_vbt_defs.h | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
index 1dc8d03ff127..ee6fa75d65a2 100644
--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -762,7 +762,7 @@ parse_psr(struct drm_i915_private *dev_priv, const struct bdb_header *bdb)
}
if (bdb->version >= 226) {
- u32 wakeup_time = psr_table->psr2_tp2_tp3_wakeup_time;
+ u32 wakeup_time = psr->psr2_tp2_tp3_wakeup_time;
wakeup_time = (wakeup_time >> (2 * panel_type)) & 0x3;
switch (wakeup_time) {
diff --git a/drivers/gpu/drm/i915/intel_vbt_defs.h b/drivers/gpu/drm/i915/intel_vbt_defs.h
index fdbbb9a53804..796c070bbe6f 100644
--- a/drivers/gpu/drm/i915/intel_vbt_defs.h
+++ b/drivers/gpu/drm/i915/intel_vbt_defs.h
@@ -772,13 +772,13 @@ struct psr_table {
/* TP wake up time in multiple of 100 */
u16 tp1_wakeup_time;
u16 tp2_tp3_wakeup_time;
-
- /* PSR2 TP2/TP3 wakeup time for 16 panels */
- u32 psr2_tp2_tp3_wakeup_time;
} __packed;
struct bdb_psr {
struct psr_table psr_table[16];
+
+ /* PSR2 TP2/TP3 wakeup time for 16 panels */
+ u32 psr2_tp2_tp3_wakeup_time;
} __packed;
/*
--
2.17.1
The patch titled
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
has been added to the -mm tree. Its filename is
mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-use-after-free-i…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-use-after-free-i…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Miles Chen <miles.chen(a)mediatek.com>
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
This patch is sent to report an use after free in mem_cgroup_iter() after
merging commit be2657752e9e ("mm: memcg: fix use after free in
mem_cgroup_iter()").
I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e
("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged to
the trees. However, I can still observe use after free issues addressed
in the commit be2657752e9e. (on low-end devices, a few times this month)
backtrace:
css_tryget <- crash here
mem_cgroup_iter
shrink_node
shrink_zones
do_try_to_free_pages
try_to_free_pages
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
To debug, I poisoned mem_cgroup before freeing it:
static void __mem_cgroup_free(struct mem_cgroup *memcg)
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->stat);
+ /* poison memcg before freeing it */
+ memset(memcg, 0x78, sizeof(struct mem_cgroup));
kfree(memcg);
}
The coredump shows the position=0xdbbc2a00 is freed.
(gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
$13 = {position = 0xdbbc2a00, generation = 0x2efd}
0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100
0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000
0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000
0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878
0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0
0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000
0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000
0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000
0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000
0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878
0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878
0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878
In the reclaim path, try_to_free_pages() does not setup
sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ...,
shrink_node().
In mem_cgroup_iter(), root is set to root_mem_cgroup because
sc->target_mem_cgroup is NULL. It is possible to assign a memcg to
root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().
try_to_free_pages
struct scan_control sc = {...}, target_mem_cgroup is 0x0;
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup *root = sc->target_mem_cgroup;
memcg = mem_cgroup_iter(root, NULL, &reclaim);
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
My device uses memcg non-hierarchical mode. When we release a memcg:
invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
If non-hierarchical mode is used, invalidate_reclaim_iterators() never
reaches root_mem_cgroup.
static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
{
struct mem_cgroup *memcg = dead_memcg;
for (; memcg; memcg = parent_mem_cgroup(memcg)
...
}
So the use after free scenario looks like:
CPU1 CPU2
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
invalidate_reclaim_iterators(memcg);
...
__mem_cgroup_free()
kfree(memcg);
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
iter = &mz->iter[reclaim->priority];
pos = READ_ONCE(iter->position);
css_tryget(&pos->css) <- use after free
To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in
invalidate_reclaim_iterators().
Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Miles Chen <miles.chen(a)mediatek.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter
+++ a/mm/memcontrol.c
@@ -1130,26 +1130,45 @@ void mem_cgroup_iter_break(struct mem_cg
css_put(&prev->css);
}
-static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+static void __invalidate_reclaim_iterators(struct mem_cgroup *from,
+ struct mem_cgroup *dead_memcg)
{
- struct mem_cgroup *memcg = dead_memcg;
struct mem_cgroup_reclaim_iter *iter;
struct mem_cgroup_per_node *mz;
int nid;
int i;
- for (; memcg; memcg = parent_mem_cgroup(memcg)) {
- for_each_node(nid) {
- mz = mem_cgroup_nodeinfo(memcg, nid);
- for (i = 0; i <= DEF_PRIORITY; i++) {
- iter = &mz->iter[i];
- cmpxchg(&iter->position,
- dead_memcg, NULL);
- }
+ for_each_node(nid) {
+ mz = mem_cgroup_nodeinfo(from, nid);
+ for (i = 0; i <= DEF_PRIORITY; i++) {
+ iter = &mz->iter[i];
+ cmpxchg(&iter->position,
+ dead_memcg, NULL);
}
}
}
+static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+{
+ struct mem_cgroup *memcg = dead_memcg;
+ struct mem_cgroup *last;
+
+ do {
+ __invalidate_reclaim_iterators(memcg, dead_memcg);
+ last = memcg;
+ } while (memcg = parent_mem_cgroup(memcg));
+
+ /*
+ * When cgruop1 non-hierarchy mode is used,
+ * parent_mem_cgroup() does not walk all the way up to the
+ * cgroup root (root_mem_cgroup). So we have to handle
+ * dead_memcg from cgroup root separately.
+ */
+ if (last != root_mem_cgroup)
+ __invalidate_reclaim_iterators(root_mem_cgroup,
+ dead_memcg);
+}
+
/**
* mem_cgroup_scan_tasks - iterate over tasks of a memory cgroup hierarchy
* @memcg: hierarchy root
_
Patches currently in -mm which might be from miles.chen(a)mediatek.com are
mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter.patch
Although SAS3 & SAS3.5 IT HBA controllers support
64-bit DMA addressing, as per hardware design,
if DMA able range contains all 64-bits set (0xFFFFFFFF-FFFFFFFF) then
it results in a firmware fault.
e.g. SGE's start address is 0xFFFFFFFF-FFFF000 and
data length is 0x1000 bytes. when HBA tries to DMA the data
at 0xFFFFFFFF-FFFFFFFF location then HBA will
fault the firmware.
Fix:
Driver will set 63-bit DMA mask to ensure the above address
will not be used.
Cc: <stable(a)vger.kernel.org> # 5.1.20+
Signed-off-by: Suganath Prabu <suganath-prabu.subramani(a)broadcom.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
---
V1 Change: Added tag for stable tree
V2 Change: Updated patch description.
drivers/scsi/mpt3sas/mpt3sas_base.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 6846628..050c0f0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2703,6 +2703,8 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
{
u64 required_mask, coherent_mask;
struct sysinfo s;
+ /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
+ int dma_mask = (ioc->hba_mpi_version_belonged > MPI2_VERSION) ? 63 : 64;
if (ioc->is_mcpu_endpoint)
goto try_32bit;
@@ -2712,17 +2714,17 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
goto try_32bit;
if (ioc->dma_mask)
- coherent_mask = DMA_BIT_MASK(64);
+ coherent_mask = DMA_BIT_MASK(dma_mask);
else
coherent_mask = DMA_BIT_MASK(32);
- if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) ||
+ if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(dma_mask)) ||
dma_set_coherent_mask(&pdev->dev, coherent_mask))
goto try_32bit;
ioc->base_add_sg_single = &_base_add_sg_single_64;
ioc->sge_size = sizeof(Mpi2SGESimple64_t);
- ioc->dma_mask = 64;
+ ioc->dma_mask = dma_mask;
goto out;
try_32bit:
@@ -2744,7 +2746,7 @@ static int
_base_change_consistent_dma_mask(struct MPT3SAS_ADAPTER *ioc,
struct pci_dev *pdev)
{
- if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64))) {
+ if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(ioc->dma_mask))) {
if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
return -ENODEV;
}
@@ -4989,7 +4991,7 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
total_sz += sz;
} while (ioc->rdpq_array_enable && (++i < ioc->reply_queue_count));
- if (ioc->dma_mask == 64) {
+ if (ioc->dma_mask > 32) {
if (_base_change_consistent_dma_mask(ioc, ioc->pdev) != 0) {
ioc_warn(ioc, "no suitable consistent DMA mask for %s\n",
pci_name(ioc->pdev));
--
1.8.3.1
CLANG_FLAGS is initialized by the following line:
CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
..., which is run only when CROSS_COMPILE is set.
Some build targets (bindeb-pkg etc.) recurse to the top Makefile.
When you build the kernel with Clang but without CROSS_COMPILE,
the same compiler flags such as -no-integrated-as are accumulated
into CLANG_FLAGS.
If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
Kbuild will recompile everything needlessly due to the build command
change.
Fix this by correctly initializing CLANG_FLAGS.
Fixes: 238bcbc4e07f ("kbuild: consolidate Clang compiler flags")
Cc: <stable(a)vger.kernel.org> # v4.20+
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
---
Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index fa0fbe7851ea..5ee6f6889869 100644
--- a/Makefile
+++ b/Makefile
@@ -472,6 +472,7 @@ KBUILD_CFLAGS_MODULE := -DMODULE
KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds
KBUILD_LDFLAGS :=
GCC_PLUGINS_CFLAGS :=
+CLANG_FLAGS :=
export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
export CPP AR NM STRIP OBJCOPY OBJDUMP PAHOLE KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
@@ -519,7 +520,7 @@ endif
ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
ifneq ($(CROSS_COMPILE),)
-CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
+CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)
GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
--
2.17.1
A build rule fails, the .DELETE_ON_ERROR special target removes the
target, but does nothing for the .*.cmd file, which might be corrupted.
So, .*.cmd files should be included only when the corresponding targets
exist.
Commit 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
files") missed to fix up this file.
Fixes: 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd")
Cc: <stable(a)vger.kernel.org> # v5.0+
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
---
scripts/Makefile.modpost | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 6b19c1a4eae5..ad4b9829a456 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -145,10 +145,8 @@ FORCE:
# optimization, we don't need to read them if the target does not
# exist, we will rebuild anyway in that case.
-cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir $(f)).cmd))
+existing-targets := $(wildcard $(sort $(targets)))
-ifneq ($(cmd_files),)
- include $(cmd_files)
-endif
+-include $(foreach f,$(existing-targets),$(dir $(f)).$(notdir $(f)).cmd)
.PHONY: $(PHONY)
--
2.17.1
Now that -Wimplicit-fallthrough is passed to GCC by default, the
following warnings shows up:
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_format_get_bpp’:
../drivers/gpu/drm/arm/malidp_hw.c:387:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
bpp = 30;
~~~~^~~~
../drivers/gpu/drm/arm/malidp_hw.c:388:3: note: here
case DRM_FORMAT_YUV420_10BIT:
^~~~
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_se_irq’:
../drivers/gpu/drm/arm/malidp_hw.c:1311:4: warning: this statement may fall
through [-Wimplicit-fallthrough=]
drm_writeback_signal_completion(&malidp->mw_connector, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/arm/malidp_hw.c:1313:3: note: here
case MW_START:
^~~~
Rework to add a 'break;' in a case that didn't have it so that
the compiler doesn't warn about fall-through.
Cc: stable(a)vger.kernel.org # v5.2+
Fixes: b8207562abdd ("drm/arm/malidp: Specified the rotation memory requirements for AFBC YUV formats")
Acked-by: Liviu Dudau <liviu.dudau(a)arm.com>
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
drivers/gpu/drm/arm/malidp_hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/arm/malidp_hw.c b/drivers/gpu/drm/arm/malidp_hw.c
index 50af399d7f6f..380be66d4c6e 100644
--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -385,6 +385,7 @@ int malidp_format_get_bpp(u32 fmt)
switch (fmt) {
case DRM_FORMAT_VUY101010:
bpp = 30;
+ break;
case DRM_FORMAT_YUV420_10BIT:
bpp = 15;
break;
@@ -1309,7 +1310,7 @@ static irqreturn_t malidp_se_irq(int irq, void *arg)
break;
case MW_RESTART:
drm_writeback_signal_completion(&malidp->mw_connector, 0);
- /* fall through to a new start */
+ /* fall through - to a new start */
case MW_START:
/* writeback started, need to emulate one-shot mode */
hw->disable_memwrite(hwdev);
--
2.20.1
On Tue, Jul 30, 2019 at 7:52 PM Marek Marczykowski-Górecki
<marmarek(a)invisiblethingslab.com> wrote:
>
> On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote:
> > On 7/30/19 2:03 AM, Souptick Joarder wrote:
> > > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki
> > > <marmarek(a)invisiblethingslab.com> wrote:
> > >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote:
> > >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder <jrdr.linux(a)gmail.com> wrote:
> > >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki
> > >>>> <marmarek(a)invisiblethingslab.com> wrote:
> > >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> > >>>>>> Convert to use vm_map_pages() to map range of kernel
> > >>>>>> memory to user vma.
> > >>>>>>
> > >>>>>> map->count is passed to vm_map_pages() and internal API
> > >>>>>> verify map->count against count ( count = vma_pages(vma))
> > >>>>>> for page array boundary overrun condition.
> > >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
> > >>>>> will:
> > >>>>> - use map->pages starting at vma->vm_pgoff instead of 0
> > >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped
> > >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped
> > >>>> if vma->vm_pgoff > 0 (in original code) ?
> > >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
> > >> basically (ab)using this parameter for "which grant reference to map".
> > >>
> > >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value passed
> > >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate
> > >>>> option.
> > >> Yes, that should work.
> > > I prefer to use vm_map_pages_zero() to resolve both the issues. Alternatively
> > > the patch can be reverted as you suggested. Let me know you opinion and wait
> > > for feedback from others.
> > >
> > > Boris, would you like to give any feedback ?
> >
> > vm_map_pages_zero() looks good to me. Marek, does it work for you?
>
> Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the
> problem for me.
Marek, I can send a patch for the same if you are ok.
We need to cc stable as this changes are available in 5.2.4.
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
Some devices are supposed to do not support on-die ECC but experience
shows that internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" returns that it is not.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable(a)vger.kernel.org
Fixes: dbc44edbf833 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
v2:
- adapt commit message according Miquel comments
- add fixes, stable tags
- add Boris rb-tag
drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 1622d3145587..fb199ad2f1a6 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
(chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
return MICRON_ON_DIE_UNSUPPORTED;
+ /*
+ * It seems that there are devices which do not support ECC official.
+ * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
+ * enabling the ECC feature but don't reflect that to the READ_ID table.
+ * So we have to guarantee that we disable the ECC feature directly
+ * after we did the READ_ID table command. Later we can evaluate the
+ * ECC_ENABLE support.
+ */
ret = micron_nand_on_die_ecc_setup(chip, true);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
@@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
- if (!(id[4] & MICRON_ID_ECC_ENABLED))
- return MICRON_ON_DIE_UNSUPPORTED;
-
ret = micron_nand_on_die_ecc_setup(chip, false);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
+ if (!(id[4] & MICRON_ID_ECC_ENABLED))
+ return MICRON_ON_DIE_UNSUPPORTED;
+
ret = nand_readid_op(chip, 0, id, sizeof(id));
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
--
2.20.1
Synchronization is recommended before disabling the trace registers
to prevent any start or stop points being speculative at the point
of disabling the unit (section 7.3.77 of ARM IHI 0064D).
Synchronization is also recommended after programming the trace
registers to ensure all updates are committed prior to normal code
resuming (section 4.3.7 of ARM IHI 0064D).
Let's ensure these syncronization points are present in the code
and clearly commented.
Note that we could rely on the barriers in CS_LOCK and
coresight_disclaim_device_unlocked or the context switch to user
space - however coresight may be of use in the kernel.
On armv8 the mb macro is defined as dsb(sy) - Given that the etm4x is
only used on armv8 let's directly use dsb(sy) instead of mb(). This
removes some ambiguity and makes it easier to correlate the code with
the TRM.
Signed-off-by: Andrew Murray <andrew.murray(a)arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
CC: stable(a)vger.kernel.org
---
drivers/hwtracing/coresight/coresight-etm4x.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 7ad15651e069..ec9468880c71 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -188,6 +188,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
dev_err(etm_dev,
"timeout while waiting for Idle Trace Status\n");
+ /*
+ * As recommended by section 4.3.7 ("Synchronization when using the
+ * memory-mapped interface") of ARM IHI 0064D
+ */
+ dsb(sy);
+ isb();
+
done:
CS_LOCK(drvdata->base);
@@ -453,8 +460,12 @@ static void etm4_disable_hw(void *info)
/* EN, bit[0] Trace unit enable bit */
control &= ~0x1;
- /* make sure everything completes before disabling */
- mb();
+ /*
+ * Make sure everything completes before disabling, as recommended
+ * by section 7.3.77 ("TRCVICTLR, ViewInst Main Control Register,
+ * SSTATUS") of ARM IHI 0064D
+ */
+ dsb(sy);
isb();
writel_relaxed(control, drvdata->base + TRCPRGCTLR);
--
2.21.0
This is a note to let you know that I've just added the patch titled
driver core: platform: return -ENXIO for missing GpioInt
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 46c42d844211ef5902e32aa507beac0817c585e9 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Mon, 29 Jul 2019 13:49:54 -0700
Subject: driver core: platform: return -ENXIO for missing GpioInt
Commit daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in
platform_get_irq()") broke the Embedded Controller driver on most LPC
Chromebooks (i.e., most x86 Chromebooks), because cros_ec_lpc expects
platform_get_irq() to return -ENXIO for non-existent IRQs.
Unfortunately, acpi_dev_gpio_irq_get() doesn't follow this convention
and returns -ENOENT instead. So we get this error from cros_ec_lpc:
couldn't retrieve IRQ number (-2)
I see a variety of drivers that treat -ENXIO specially, so rather than
fix all of them, let's fix up the API to restore its previous behavior.
I reported this on v2 of this patch:
https://lore.kernel.org/lkml/20190220180538.GA42642@google.com/
but apparently the patch had already been merged before v3 got sent out:
https://lore.kernel.org/lkml/20190221193429.161300-1-egranata@chromium.org/
and the result is that the bug landed and remains unfixed.
I differ from the v3 patch by:
* allowing for ret==0, even though acpi_dev_gpio_irq_get() specifically
documents (and enforces) that 0 is not a valid return value (noted on
the v3 review)
* adding a small comment
Reported-by: Brian Norris <briannorris(a)chromium.org>
Reported-by: Salvatore Bellizzi <salvatore.bellizzi(a)linux.seppia.net>
Cc: Enrico Granata <egranata(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Fixes: daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in platform_get_irq()")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Acked-by: Enrico Granata <egranata(a)google.com>
Link: https://lore.kernel.org/r/20190729204954.25510-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/platform.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 506a0175a5a7..ec974ba9c0c4 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -157,8 +157,13 @@ int platform_get_irq(struct platform_device *dev, unsigned int num)
* the device will only expose one IRQ, and this fallback
* allows a common code path across either kind of resource.
*/
- if (num == 0 && has_acpi_companion(&dev->dev))
- return acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+ if (num == 0 && has_acpi_companion(&dev->dev)) {
+ int ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+
+ /* Our callers expect -ENXIO for missing IRQs. */
+ if (ret >= 0 || ret == -EPROBE_DEFER)
+ return ret;
+ }
return -ENXIO;
#endif
--
2.22.0
With recent changes in AOSP, adb is now using asynchronous I/O.
While adb works good for the most part, there have been issues with
adb root/unroot commands which cause adb hang. The issue is caused
by a request being queued twice. A series of 3 patches from
Felipe Balbi in upstream tree fixes this issue.
Felipe Balbi (3):
usb: dwc3: gadget: add dwc3_request status tracking
usb: dwc3: gadget: prevent dwc3_request from being queued twice
usb: dwc3: gadget: remove req->started flag
drivers/usb/dwc3/core.h | 11 +++++++++--
drivers/usb/dwc3/gadget.c | 9 ++++++++-
drivers/usb/dwc3/gadget.h | 4 ++--
3 files changed, 19 insertions(+), 5 deletions(-)
--
1.9.1
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Add NULL check before dereferencing config
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 1957de95d425d1c06560069dc7277a73a8b28683 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Wed, 24 Jul 2019 07:38:32 -0700
Subject: usb: typec: tcpm: Add NULL check before dereferencing config
When instantiating tcpm on an NXP OM 13588 board with NXP PTN5110,
the following crash is seen when writing into the 'preferred_role'
sysfs attribute.
Unable to handle kernel NULL pointer dereference at virtual address 00000028
pgd = f69149ad
[00000028] *pgd=00000000
Internal error: Oops: 5 [#1] THUMB2
Modules linked in: tcpci tcpm
CPU: 0 PID: 1882 Comm: bash Not tainted 5.1.18-sama5-armv7-r2 #4
Hardware name: Atmel SAMA5
PC is at tcpm_try_role+0x3a/0x4c [tcpm]
LR is at tcpm_try_role+0x15/0x4c [tcpm]
pc : [<bf8000e2>] lr : [<bf8000bd>] psr: 60030033
sp : dc1a1e88 ip : c03fb47d fp : 00000000
r10: dc216190 r9 : dc1a1f78 r8 : 00000001
r7 : df4ae044 r6 : dd032e90 r5 : dd1ce340 r4 : df4ae054
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : df4ae044
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 50c53c7d Table: 3efec059 DAC: 00000051
Process bash (pid: 1882, stack limit = 0x6a6d4aa5)
Stack: (0xdc1a1e88 to 0xdc1a2000)
1e80: dd05d808 dd1ce340 00000001 00000007 dd1ce340 c03fb4a7
1ea0: 00000007 00000007 dc216180 00000000 00000000 c01e1e03 00000000 00000000
1ec0: c0907008 dee98b40 c01e1d5d c06106c4 00000000 00000000 00000007 c0194e8b
1ee0: 0000000a 00000400 00000000 c01a97db dc22bf00 ffffe000 df4b6a00 df745900
1f00: 00000001 00000001 000000dd c01a9c2f 7aeab3be c0907008 00000000 dc22bf00
1f20: c0907008 00000000 00000000 00000000 00000000 7aeab3be 00000007 dee98b40
1f40: 005dc318 dc1a1f78 00000000 00000000 00000007 c01969f7 0000000a c01a20cb
1f60: dee98b40 c0907008 dee98b40 005dc318 00000000 c0196b9b 00000000 00000000
1f80: dee98b40 7aeab3be 00000074 005dc318 b6f3bdb0 00000004 c0101224 dc1a0000
1fa0: 00000004 c0101001 00000074 005dc318 00000001 005dc318 00000007 00000000
1fc0: 00000074 005dc318 b6f3bdb0 00000004 00000007 00000007 00000000 00000000
1fe0: 00000004 be800880 b6ed35b3 b6e5c746 60030030 00000001 00000000 00000000
[<bf8000e2>] (tcpm_try_role [tcpm]) from [<c03fb4a7>] (preferred_role_store+0x2b/0x5c)
[<c03fb4a7>] (preferred_role_store) from [<c01e1e03>] (kernfs_fop_write+0xa7/0x150)
[<c01e1e03>] (kernfs_fop_write) from [<c0194e8b>] (__vfs_write+0x1f/0x104)
[<c0194e8b>] (__vfs_write) from [<c01969f7>] (vfs_write+0x6b/0x104)
[<c01969f7>] (vfs_write) from [<c0196b9b>] (ksys_write+0x43/0x94)
[<c0196b9b>] (ksys_write) from [<c0101001>] (ret_fast_syscall+0x1/0x62)
Since commit 96232cbc6c994 ("usb: typec: tcpm: support get typec and pd
config from device properties"), the 'config' pointer in struct tcpc_dev
is optional when registering a Type-C port. Since it is optional, we have
to check if it is NULL before dereferencing it.
Reported-by: Douglas Gilbert <dgilbert(a)interlog.com>
Cc: Douglas Gilbert <dgilbert(a)interlog.com>
Fixes: 96232cbc6c994 ("usb: typec: tcpm: support get typec and pd config from device properties")
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Jun Li <jun.li(a)nxp.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/1563979112-22483-1-git-send-email-linux@roeck-us.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index fba32d84e578..77f71f602f73 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -379,7 +379,8 @@ static enum tcpm_state tcpm_default_state(struct tcpm_port *port)
return SNK_UNATTACHED;
else if (port->try_role == TYPEC_SOURCE)
return SRC_UNATTACHED;
- else if (port->tcpc->config->default_role == TYPEC_SINK)
+ else if (port->tcpc->config &&
+ port->tcpc->config->default_role == TYPEC_SINK)
return SNK_UNATTACHED;
/* Fall through to return SRC_UNATTACHED */
} else if (port->port_type == TYPEC_PORT_SNK) {
@@ -4114,7 +4115,7 @@ static int tcpm_try_role(const struct typec_capability *cap, int role)
mutex_lock(&port->lock);
if (tcpc->try_role)
ret = tcpc->try_role(tcpc, role);
- if (!ret && !tcpc->config->try_role_hw)
+ if (!ret && (!tcpc->config || !tcpc->config->try_role_hw))
port->try_role = role;
port->try_src_count = 0;
port->try_snk_count = 0;
@@ -4701,7 +4702,7 @@ static int tcpm_copy_caps(struct tcpm_port *port,
port->typec_caps.prefer_role = tcfg->default_role;
port->typec_caps.type = tcfg->type;
port->typec_caps.data = tcfg->data;
- port->self_powered = port->tcpc->config->self_powered;
+ port->self_powered = tcfg->self_powered;
return 0;
}
--
2.22.0
Even in source code of this driver there is an author's description:
/*
* Even if we have an I2C bus, we can't assume that the cable
* is disconnected if drm_probe_ddc fails. Some cables don't
* wire the DDC pins, or the I2C bus might not be working at
* all.
*/
That's true. DDC and VGA channels are independent, and therefore
we cannot decide whether the monitor is connected or not,
depending on the information from the DDC.
So the monitor should always be considered connected.
Thus there is no reason to use connector detect callback for this
driver: DRM sub-system considers monitor always connected if there
is no detect() callback registered with drm_connector_init().
How to reproduce the bug:
* setup: i.MX8QXP, LCDIF video module + gpu/drm/mxsfb driver,
adv712x VGA DAC + dumb-vga-dac driver, VGA-connector w/o DDC;
* try to use drivers chain mxsfb-drm + dumb-vga-dac;
* any DRM applications consider the monitor is not connected:
===========
$ weston-start
$ cat /var/log/weston.log
...
DRM: head 'VGA-1' found, connector 32 is disconnected.
...
$ cat /sys/devices/platform/5a180000.lcdif/drm/card0/card0-VGA-1/status
unknown
===========
Oleksandr Suvorov (1):
drm/bridge: vga-dac: Fix detect of monitor connection
drivers/gpu/drm/bridge/dumb-vga-dac.c | 18 ------------------
1 file changed, 18 deletions(-)
--
2.20.1
Hello Greg,
Can you please consider including the following patches in the stable
linux-4.14.y branch?
An NFS server accepts only a limited number of concurrent v4.1+ mounts. Once
that limit is reached, on the affected client side, mount.nfs appears to hang to
keep reissuing CREATE_SESSION calls until one of them succeeds. This is to bump
the limit, also return smaller ca_maxrequests as the limit approaches instead of
waiting till we have to fail CREATE_SESSION completely.
44d8660d3bb0("nfsd: increase DRC cache limit")
de766e570413("nfsd: give out fewer session slots as limit approaches")
c54f24e338ed("nfsd: fix performance-limiting session calculation")
Thanks,
Qian Lu
Hi,
I forgot to mark a few patches for io_uring as stable. In order
of how to apply, can you add the following commits for 5.2?
f7b76ac9d17e16e44feebb6d2749fec92bfd6dd4
c0e48f9dea9129aa11bec3ed13803bcc26e96e49
bd11b3a391e3df6fa958facbe4b3f9f4cca9bd49
36703247d5f52a679df9da51192b6950fe81689f
Thanks!
--
Jens Axboe
The constraint from the zpool use of z3fold_destroy_pool() is there are no
outstanding handles to memory (so no active allocations), but it is possible
for there to be outstanding work on either of the two wqs in the pool.
If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:
z3fold_destroy_pool()
destroy_workqueue(pool->release_wq)
destroy_workqueue(pool->compact_wq)
drain_workqueue(pool->compact_wq)
do_compact_page(zhdr)
kref_put(&zhdr->refcount)
__release_z3fold_page(zhdr, ...)
queue_work_on(pool->release_wq, &pool->work) *BOOM*
So compact_wq needs to be destroyed before release_wq.
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
mm/z3fold.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 1a029a7432ee..43de92f52961 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -818,8 +818,15 @@ static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
z3fold_unregister_migration(pool);
- destroy_workqueue(pool->release_wq);
+
+ /*
+ * We need to destroy pool->compact_wq before pool->release_wq,
+ * as any pending work on pool->compact_wq will call
+ * queue_work(pool->release_wq, &pool->work).
+ */
+
destroy_workqueue(pool->compact_wq);
+ destroy_workqueue(pool->release_wq);
kfree(pool);
}
--
2.22.0.709.g102302147b-goog
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ee006eb00a00198d21dad60696318fd443a59f88 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Thu, 20 Jun 2019 19:21:26 -0400
Subject: [PATCH] drm/amdgpu: Don't skip display settings in hwmgr_resume()
I'm not entirely sure why this is, but for some reason:
921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Breaks runtime PM resume on the Radeon PRO WX 3100 (Lexa) in one the
pre-production laptops I have. The issue manifests as the following
messages in dmesg:
[drm] UVD and UVD ENC initialized successfully.
amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vce1 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
[drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
And happens after about 6-10 runtime PM suspend/resume cycles (sometimes
sooner, if you're lucky!). Unfortunately I can't seem to pin down
precisely which part in psm_adjust_power_state_dynamic that is causing
the issue, but not skipping the display setting setup seems to fix it.
Hopefully if there is a better fix for this, this patch will spark
discussion around it.
Fixes: 921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Cc: Evan Quan <evan.quan(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Rex Zhu <Rex.Zhu(a)amd.com>
Cc: Likun Gao <Likun.Gao(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.1+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index 0b0d83c2a678..a24beaa4fb01 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -327,7 +327,7 @@ int hwmgr_resume(struct pp_hwmgr *hwmgr)
if (ret)
return ret;
- ret = psm_adjust_power_state_dynamic(hwmgr, true, NULL);
+ ret = psm_adjust_power_state_dynamic(hwmgr, false, NULL);
return ret;
}
Hi Greg,
Second attempt on this topic [1]:
"""
Upstream commmit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") was back ported in v4.4.178
(bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b
("mm, vmstat: make quiet_vmstat lighter") as well.
Since the offending patch was back ported to v4.4 stable only, the
other stable branches don't need an update (offending patch and bug
fix are already in).
"""
Though I missed a dependency as Jon noted[2]. The missing patch is
587198ba5206 ("vmstat: Remove BUG_ON from vmstat_update"). I've tested
this on a Tegra K1 one board which exposed the bug. With this should
be fine.
While at it, I looked on all relevant changes for
vmstat_updated(). These two patches are the only relevant changes
which are missing. It seems almost all changes from mainline have made
it back to v4.
Could you please queue the above patches for v4.4.y?
Thanks,
Daniel
[1] https://lore.kernel.org/stable/20190513061237.4915-1-wagi@monom.org
[2] https://lore.kernel.org/stable/f32de22f-c928-2eaa-ee3f-d2b26c184dd4@nvidia.…
Christoph Lameter (1):
vmstat: Remove BUG_ON from vmstat_update
Michal Hocko (1):
mm, vmstat: make quiet_vmstat lighter
mm/vmstat.c | 80 +++++++++++++++++++++++++++++++----------------------
1 file changed, 47 insertions(+), 33 deletions(-)
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this is thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this is thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this is thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this is thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);