From: Michal Hocko <mhocko@suse.com>
[ Upstream commit 093590c16b447f53e66771c8579ae66c96f6ef61 ]
The fill_page_cache_func() function allocates a couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however, keep retrying even when the previous attempt has failed.
This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function).
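To illustrate, here is a simplified sketch of the pre-patch loop shape (the real code is in the hunk below; the loop bound, written here as pages_to_cache, is only a placeholder for the actual refill count): a NULL return from __get_free_page() is silently ignored and the next iteration simply retries the allocation, so the worker never backs off:

	/* Simplified sketch of the old fill_page_cache_func() loop. */
	for (i = 0; i < pages_to_cache; i++) {	/* placeholder bound */
		bnode = (struct kvfree_rcu_bulk_data *)
			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);

		if (bnode) {
			/* Try to push the page into krcp's page cache via
			 * put_cached_bnode(); free it and stop if the cache
			 * is already full. */
		}
		/* !bnode: no break here, so the loop keeps hammering the
		 * allocator instead of backing off. */
	}

The patch below turns the failure case into an explicit break.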
And we have seen this function cause workqueue lockups:
kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel:     in-flight: 1917:fill_page_cache_func
kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor
Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desirable.
So let's make the fill_page_cache_func() function back off after allocation failure.
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tree.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index c25ba442044a..54a3a19c4c0b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3508,15 +3508,16 @@ static void fill_page_cache_func(struct work_struct *work)
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
 
-		if (bnode) {
-			raw_spin_lock_irqsave(&krcp->lock, flags);
-			pushed = put_cached_bnode(krcp, bnode);
-			raw_spin_unlock_irqrestore(&krcp->lock, flags);
+		if (!bnode)
+			break;
 
-			if (!pushed) {
-				free_page((unsigned long) bnode);
-				break;
-			}
+		raw_spin_lock_irqsave(&krcp->lock, flags);
+		pushed = put_cached_bnode(krcp, bnode);
+		raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+		if (!pushed) {
+			free_page((unsigned long) bnode);
+			break;
 		}
 	}
 