From: Michal Hocko <mhocko(a)suse.com>
[ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]
The fill_page_cache_func() function allocates couple of pages to store
kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
allocation which can fail under memory pressure. The function will,
however keep retrying even when the previous attempt has failed.
This retrying is in theory correct, but in practice the allocation is
invoked from workqueue context, which means that if the memory reclaim
gets stuck, these retries can hog the worker for quite some time.
Although the workqueues subsystem automatically adjusts concurrency, such
adjustment is not guaranteed to happen until the worker context sleeps.
And the fill_page_cache_func() function's retry loop is not guaranteed
to sleep (see the should_reclaim_retry() function).
And we have seen this function cause workqueue lockups:
kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel: in-flight: 1917:fill_page_cache_func
kernel: pending: dbs_work_handler, free_work, kfree_rcu_monitor
Originally, we thought that the root cause of this lockup was several
retries with direct reclaim, but this is not yet confirmed. Furthermore,
we have seen similar lockups without any heavy memory pressure. This
suggests that there are other factors contributing to these lockups.
However, it is not really clear that endless retries are desireable.
So let's make the fill_page_cache_func() function back off after
allocation failure.
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Frederic Weisbecker <frederic(a)kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju(a)quicinc.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/rcu/tree.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b41009a283ca..b10d6bcea77d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3393,15 +3393,16 @@ static void fill_page_cache_func(struct work_struct *work)
bnode = (struct kvfree_rcu_bulk_data *)
__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
- if (bnode) {
- raw_spin_lock_irqsave(&krcp->lock, flags);
- pushed = put_cached_bnode(krcp, bnode);
- raw_spin_unlock_irqrestore(&krcp->lock, flags);
+ if (!bnode)
+ break;
- if (!pushed) {
- free_page((unsigned long) bnode);
- break;
- }
+ raw_spin_lock_irqsave(&krcp->lock, flags);
+ pushed = put_cached_bnode(krcp, bnode);
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+ if (!pushed) {
+ free_page((unsigned long) bnode);
+ break;
}
}
--
2.35.1
Dear Sir/Ma,
A reputable pharmaceutical company from Vietnam is in need of a reliable individual or corporate entity in your state to act as their Liaison; this will not affect your current job or business operations in anyway. If interested, reply for more information.
Sincerely,
Ms. Lan Nguyen
From: Anssi Hannula <anssi.hannula(a)bitwise.fi>
can_restart() expects CMD_START_CHIP to set the error state to
ERROR_ACTIVE as it calls netif_carrier_on() immediately afterwards.
Otherwise the user may immediately trigger restart again and hit a
BUG_ON() in can_restart().
Fix kvaser_usb_leaf set_mode(CMD_START_CHIP) to set the expected state.
Cc: stable(a)vger.kernel.org
Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices")
Tested-by: Jimmy Assarsson <extja(a)kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula(a)bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja(a)kvaser.com>
Link: https://lore.kernel.org/all/20221010150829.199676-5-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
index 59c220ef3049..50f2ac8319ff 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c
@@ -1431,6 +1431,8 @@ static int kvaser_usb_leaf_set_mode(struct net_device *netdev,
err = kvaser_usb_leaf_simple_cmd_async(priv, CMD_START_CHIP);
if (err)
return err;
+
+ priv->can.state = CAN_STATE_ERROR_ACTIVE;
break;
default:
return -EOPNOTSUPP;
--
2.35.1
From: Anssi Hannula <anssi.hannula(a)bitwise.fi>
flush_comp is initialized when CMD_FLUSH_QUEUE is sent to the device and
completed when the device sends CMD_FLUSH_QUEUE_RESP.
This causes completion of uninitialized completion if the device sends
CMD_FLUSH_QUEUE_RESP before CMD_FLUSH_QUEUE is ever sent (e.g. as a
response to a flush by a previously bound driver, or a misbehaving
device).
Fix that by initializing flush_comp in kvaser_usb_init_one() like the
other completions.
This issue is only triggerable after RX URBs have been set up, i.e. the
interface has been opened at least once.
Cc: stable(a)vger.kernel.org
Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Tested-by: Jimmy Assarsson <extja(a)kvaser.com>
Signed-off-by: Anssi Hannula <anssi.hannula(a)bitwise.fi>
Signed-off-by: Jimmy Assarsson <extja(a)kvaser.com>
Link: https://lore.kernel.org/all/20221010150829.199676-3-extja@kvaser.com
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 1 +
drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
index 824cab80aa02..c2bce6773adc 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c
@@ -729,6 +729,7 @@ static int kvaser_usb_init_one(struct kvaser_usb *dev, int channel)
init_usb_anchor(&priv->tx_submitted);
init_completion(&priv->start_comp);
init_completion(&priv->stop_comp);
+ init_completion(&priv->flush_comp);
priv->can.ctrlmode_supported = 0;
priv->dev = dev;
diff --git a/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c b/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
index 6871d474dabf..7b52fda73d82 100644
--- a/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
+++ b/drivers/net/can/usb/kvaser_usb/kvaser_usb_hydra.c
@@ -1916,7 +1916,7 @@ static int kvaser_usb_hydra_flush_queue(struct kvaser_usb_net_priv *priv)
{
int err;
- init_completion(&priv->flush_comp);
+ reinit_completion(&priv->flush_comp);
err = kvaser_usb_hydra_send_simple_cmd(priv->dev, CMD_FLUSH_QUEUE,
priv->channel);
--
2.35.1
Changes in v5:
- Split series [1], keept only critical bug fixes that should go into
stable, since v4 got rejected [2].
Non-critical fixes are posted in a separate series.
[1]
https://lore.kernel.org/linux-can/20220903182344.139-1-extja@kvaser.com
[2]
https://lore.kernel.org/linux-can/20220920192708.jcvyph3ec7lscuqj@pengutron…
Anssi Hannula (4):
can: kvaser_usb_leaf: Fix overread with an invalid command
can: kvaser_usb: Fix use of uninitialized completion
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
can: kvaser_usb_leaf: Fix CAN state after restart
drivers/net/can/usb/kvaser_usb/kvaser_usb.h | 2 +
.../net/can/usb/kvaser_usb/kvaser_usb_core.c | 3 +-
.../net/can/usb/kvaser_usb/kvaser_usb_hydra.c | 2 +-
.../net/can/usb/kvaser_usb/kvaser_usb_leaf.c | 79 +++++++++++++++++++
4 files changed, 84 insertions(+), 2 deletions(-)
--
2.38.0