From: Oleksij Rempel <o.rempel(a)pengutronix.de>
Set SOCK_RCU_FREE to let RCU to call sk_destruct() on completion.
Without this patch, we will run in to j1939_can_recv() after priv was
freed by j1939_sk_release()->j1939_sk_sock_destruct()
Fixes: 25fe97cb7620 ("can: j1939: move j1939_priv_put() into sk_destruct callback")
Link: https://lore.kernel.org/r/20210617130623.12705-1-o.rempel@pengutronix.de
Cc: linux-stable <stable(a)vger.kernel.org>
Reported-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Reported-by: syzbot+bdf710cfc41c186fdff3(a)syzkaller.appspotmail.com
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/j1939/main.c | 4 ++++
net/can/j1939/socket.c | 3 +++
2 files changed, 7 insertions(+)
diff --git a/net/can/j1939/main.c b/net/can/j1939/main.c
index da3a7a7bcff2..08c8606cfd9c 100644
--- a/net/can/j1939/main.c
+++ b/net/can/j1939/main.c
@@ -193,6 +193,10 @@ static void j1939_can_rx_unregister(struct j1939_priv *priv)
can_rx_unregister(dev_net(ndev), ndev, J1939_CAN_ID, J1939_CAN_MASK,
j1939_can_recv, priv);
+ /* The last reference of priv is dropped by the RCU deferred
+ * j1939_sk_sock_destruct() of the last socket, so we can
+ * safely drop this reference here.
+ */
j1939_priv_put(priv);
}
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index 56aa66147d5a..fce8bc8afeb7 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -398,6 +398,9 @@ static int j1939_sk_init(struct sock *sk)
atomic_set(&jsk->skb_pending, 0);
spin_lock_init(&jsk->sk_session_queue_lock);
INIT_LIST_HEAD(&jsk->sk_session_queue);
+
+ /* j1939_sk_sock_destruct() depends on SOCK_RCU_FREE flag */
+ sock_set_flag(sk, SOCK_RCU_FREE);
sk->sk_destruct = j1939_sk_sock_destruct;
sk->sk_protocol = CAN_J1939;
--
2.30.2
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
When closing the isotp socket, the potentially running hrtimers are
canceled before removing the subscription for CAN identifiers via
can_rx_unregister().
This may lead to an unintended (re)start of a hrtimer in
isotp_rcv_cf() and isotp_rcv_fc() in the case that a CAN frame is
received by isotp_rcv() while the subscription removal is processed.
However, isotp_rcv() is called under RCU protection, so after calling
can_rx_unregister, we may call synchronize_rcu in order to wait for
any RCU read-side critical sections to finish. This prevents the
reception of CAN frames after hrtimer_cancel() and therefore the
unintended (re)start of the hrtimers.
Link: https://lore.kernel.org/r/20210618173713.2296-1-socketcan@hartkopp.net
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/isotp.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index be6183f8ca11..234cc4ad179a 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1028,9 +1028,6 @@ static int isotp_release(struct socket *sock)
lock_sock(sk);
- hrtimer_cancel(&so->txtimer);
- hrtimer_cancel(&so->rxtimer);
-
/* remove current filters & unregister */
if (so->bound && (!(so->opt.flags & CAN_ISOTP_SF_BROADCAST))) {
if (so->ifindex) {
@@ -1042,10 +1039,14 @@ static int isotp_release(struct socket *sock)
SINGLE_MASK(so->rxid),
isotp_rcv, sk);
dev_put(dev);
+ synchronize_rcu();
}
}
}
+ hrtimer_cancel(&so->txtimer);
+ hrtimer_cancel(&so->rxtimer);
+
so->ifindex = 0;
so->bound = 0;
--
2.30.2
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
can_rx_register() callbacks may be called concurrently to the call to
can_rx_unregister(). The callbacks and callback data, though, are
protected by RCU and the struct sock reference count.
So the callback data is really attached to the life of sk, meaning
that it should be released on sk_destruct. However, bcm_remove_op()
calls tasklet_kill(), and RCU callbacks may be called under RCU
softirq, so that cannot be used on kernels before the introduction of
HRTIMER_MODE_SOFT.
However, bcm_rx_handler() is called under RCU protection, so after
calling can_rx_unregister(), we may call synchronize_rcu() in order to
wait for any RCU read-side critical sections to finish. That is,
bcm_rx_handler() won't be called anymore for those ops. So, we only
free them, after we do that synchronize_rcu().
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
Cc: linux-stable <stable(a)vger.kernel.org>
Reported-by: syzbot+0f7e7e5e2f4f40fa89c0(a)syzkaller.appspotmail.com
Reported-by: Norbert Slusarek <nslusarek(a)gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
net/can/bcm.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/can/bcm.c b/net/can/bcm.c
index f3e4d9528fa3..0928a39c4423 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -785,6 +785,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
bcm_rx_handler, op);
list_del(&op->list);
+ synchronize_rcu();
bcm_remove_op(op);
return 1; /* done */
}
@@ -1533,9 +1534,13 @@ static int bcm_release(struct socket *sock)
REGMASK(op->can_id),
bcm_rx_handler, op);
- bcm_remove_op(op);
}
+ synchronize_rcu();
+
+ list_for_each_entry_safe(op, next, &bo->rx_ops, list)
+ bcm_remove_op(op);
+
#if IS_ENABLED(CONFIG_PROC_FS)
/* remove procfs entry */
if (net->can.bcmproc_dir && bo->bcm_proc_read)
base-commit: dda2626b86c2c1813b7bfdd10d2fdd849611fc97
--
2.30.2
The following commit has been merged into the smp/urgent branch of tip:
Commit-ID: 64c71be97c02c3d3f24dea7c290912ad300538b9
Gitweb: https://git.kernel.org/tip/64c71be97c02c3d3f24dea7c290912ad300538b9
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Sat, 27 Mar 2021 22:01:36 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Sat, 19 Jun 2021 22:26:07 +02:00
cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov(a)redhat.com>
Reported-by: Joshua Baker <jobaker(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Alexey Klimov <aklimov(a)redhat.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
---
kernel/cpu.c | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e538518..eccc8cf 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -32,6 +32,7 @@
#include <linux/relay.h>
#include <linux/slab.h>
#include <linux/percpu-rwsem.h>
+#include <linux/cpuset.h>
#include <trace/events/power.h>
#define CREATE_TRACE_POINTS
@@ -919,6 +920,52 @@ void clear_tasks_mm_cpumask(int cpu)
rcu_read_unlock();
}
+/*
+ *
+ * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
+ * protected region.
+ *
+ * The operation is still serialized against concurrent CPU hotplug via
+ * cpu_add_remove_lock, i.e. CPU map protection. But it is _not_
+ * serialized against other hotplug related activity like adding or
+ * removing of state callbacks and state instances, which invoke either the
+ * startup or the teardown callback of the affected state.
+ *
+ * This is required for subsystems which are unfixable vs. CPU hotplug and
+ * evade lock inversion problems by scheduling work which has to be
+ * completed _before_ cpu_up()/_cpu_down() returns.
+ *
+ * Don't even think about adding anything to this for any new code or even
+ * drivers. It's only purpose is to keep existing lock order trainwrecks
+ * working.
+ *
+ * For cpu_down() there might be valid reasons to finish cleanups which are
+ * not required to be done under cpu_hotplug_lock, but that's a different
+ * story and would be not invoked via this.
+ */
+static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
+{
+ /*
+ * cpusets delegate hotplug operations to a worker to "solve" the
+ * lock order problems. Wait for the worker, but only if tasks are
+ * _not_ frozen (suspend, hibernate) as that would wait forever.
+ *
+ * The wait is required because otherwise the hotplug operation
+ * returns with inconsistent state, which could even be observed in
+ * user space when a new CPU is brought up. The CPU plug uevent
+ * would be delivered and user space reacting on it would fail to
+ * move tasks to the newly plugged CPU up to the point where the
+ * work has finished because up to that point the newly plugged CPU
+ * is not assignable in cpusets/cgroups. On unplug that's not
+ * necessarily a visible issue, but it is still inconsistent state,
+ * which is the real problem which needs to be "fixed". This can't
+ * prevent the transient state between scheduling the work and
+ * returning from waiting for it.
+ */
+ if (!tasks_frozen)
+ cpuset_wait_for_hotplug();
+}
+
/* Take this CPU down. */
static int take_cpu_down(void *_param)
{
@@ -1108,6 +1155,7 @@ out:
*/
lockup_detector_cleanup();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
@@ -1302,6 +1350,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
out:
cpus_write_unlock();
arch_smt_update();
+ cpu_up_down_serialize_trainwrecks(tasks_frozen);
return ret;
}
This is the start of the stable review cycle for the 5.10.45 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 18 Jun 2021 15:28:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.45-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.45-rc1
Zheng Yongjun <zhengyongjun3(a)huawei.com>
fib: Return the correct errno code
Zheng Yongjun <zhengyongjun3(a)huawei.com>
net: Return the correct errno code
Zheng Yongjun <zhengyongjun3(a)huawei.com>
net/x25: Return the correct errno code
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
rtnetlink: Fix missing error code in rtnl_bridge_notify()
Victor Zhao <Victor.Zhao(a)amd.com>
drm/amd/amdgpu:save psp ring wptr to avoid attack
Roman Li <roman.li(a)amd.com>
drm/amd/display: Fix potential memory leak in DMUB hw_init
Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
drm/amd/display: Fix overlay validation by considering cursors
Jiansong Chen <Jiansong.Chen(a)amd.com>
drm/amdgpu: refine amdgpu_fru_get_product_info
Bindu Ramamurthy <bindu.r(a)amd.com>
drm/amd/display: Allow bandwidth validation for 0 streams.
Josh Triplett <josh(a)joshtriplett.org>
net: ipconfig: Don't override command-line hostnames or domains
Hannes Reinecke <hare(a)suse.de>
nvme-loop: do not warn for deleted controllers during reset
Hannes Reinecke <hare(a)suse.de>
nvme-loop: check for NVME_LOOP_Q_LIVE in nvme_loop_destroy_admin_queue()
Hannes Reinecke <hare(a)suse.de>
nvme-loop: clear NVME_LOOP_Q_LIVE when nvme_loop_configure_admin_queue() fails
Hannes Reinecke <hare(a)suse.de>
nvme-loop: reset queue count to 1 in nvme_loop_destroy_io_queues()
Ewan D. Milne <emilne(a)redhat.com>
scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
Larry Finger <Larry.Finger(a)lwfinger.net>
Bluetooth: Add a new USB ID for RTL8822CE
Daniel Wagner <dwagner(a)suse.de>
scsi: qedf: Do not put host in qedf_vport_create() unconditionally
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
ethernet: myri10ge: Fix missing error code in myri10ge_probe()
Maurizio Lombardi <mlombard(a)redhat.com>
scsi: target: core: Fix warning on realtime kernels
Hillf Danton <hdanton(a)sina.com>
gfs2: Fix use-after-free in gfs2_glock_shrink_scan
Khem Raj <raj.khem(a)gmail.com>
riscv: Use -mno-relax when using lld linker
Bixuan Cui <cuibixuan(a)huawei.com>
HID: gt683r: add missing MODULE_DEVICE_TABLE
Bob Peterson <rpeterso(a)redhat.com>
gfs2: fix a deadlock on withdraw-during-mount
Andreas Gruenbacher <agruenba(a)redhat.com>
gfs2: Prevent direct-I/O write fallback errors from getting lost
Yongqiang Liu <liuyongqiang13(a)huawei.com>
ARM: OMAP2+: Fix build warning when mmc_omap is not built
Maciej Falkowski <maciej.falkowski9(a)gmail.com>
ARM: OMAP1: Fix use of possibly uninitialized irq variable
Thierry Reding <treding(a)nvidia.com>
drm/tegra: sor: Fully initialize SOR before registration
Thierry Reding <treding(a)nvidia.com>
gpu: host1x: Split up client initalization and registration
Pavel Machek (CIP) <pavel(a)denx.de>
drm/tegra: sor: Do not leak runtime PM reference
Anirudh Rayabharam <mail(a)anirudhrb.com>
HID: usbhid: fix info leak in hid_submit_ctrl
Mark Bolhuis <mark(a)bolhuis.dev>
HID: Add BUS_VIRTUAL to hid_connect logging
Ahelenia Ziemiańska <nabijaczleweli(a)nabijaczleweli.xyz>
HID: multitouch: set Stylus suffix for Stylus-application devices, too
Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
HID: quirks: Add quirk for Lenovo optical mouse
Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
HID: hid-sensor-hub: Return error for hid_set_field() failure
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
HID: hid-input: add mapping for emoji picker key
Mateusz Jończyk <mat.jonczyk(a)o2.pl>
HID: a4tech: use A4_2WHEEL_MOUSE_HACK_B8 for A4TECH NB-95
Nirenjan Krishnan <nirenjan(a)gmail.com>
HID: quirks: Set INCREMENT_USAGE_ON_DUPLICATE for Saitek X65
Dan Robertson <dan(a)dlrobertson.com>
net: ieee802154: fix null deref in parse dev addr
-------------
Diffstat:
Makefile | 4 +--
arch/arm/mach-omap1/pm.c | 10 ++++--
arch/arm/mach-omap2/board-n8x0.c | 2 +-
arch/riscv/Makefile | 9 +++++
drivers/bluetooth/btusb.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c | 42 ++++++++++++----------
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 +
drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 3 +-
drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 3 +-
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 ++++++--
.../gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 2 +-
drivers/gpu/drm/tegra/sor.c | 41 +++++++++++----------
drivers/gpu/host1x/bus.c | 30 ++++++++++++----
drivers/hid/Kconfig | 4 +--
drivers/hid/hid-a4tech.c | 2 ++
drivers/hid/hid-core.c | 3 ++
drivers/hid/hid-debug.c | 1 +
drivers/hid/hid-gt683r.c | 1 +
drivers/hid/hid-ids.h | 3 ++
drivers/hid/hid-input.c | 3 ++
drivers/hid/hid-multitouch.c | 8 ++---
drivers/hid/hid-quirks.c | 3 ++
drivers/hid/hid-sensor-hub.c | 13 ++++---
drivers/hid/usbhid/hid-core.c | 2 +-
drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 1 +
drivers/nvme/target/loop.c | 11 ++++--
drivers/scsi/qedf/qedf_main.c | 20 +++++------
drivers/scsi/scsi_devinfo.c | 1 +
drivers/target/target_core_transport.c | 4 +--
fs/gfs2/file.c | 5 ++-
fs/gfs2/glock.c | 26 +++++++++++---
include/linux/hid.h | 3 +-
include/linux/host1x.h | 30 ++++++++++++----
include/uapi/linux/input-event-codes.h | 1 +
net/compat.c | 2 +-
net/core/fib_rules.c | 2 +-
net/core/rtnetlink.c | 4 ++-
net/ieee802154/nl802154.c | 9 ++---
net/ipv4/ipconfig.c | 13 ++++---
net/x25/af_x25.c | 2 +-
40 files changed, 231 insertions(+), 109 deletions(-)