The patch titled
Subject: mm: swap: get rid of deadloop in swapin readahead
has been added to the -mm tree. Its filename is
mm-swap-get-rid-of-deadloop-in-swapin-readahead.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-swap-get-rid-of-deadloop-in-sw…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-swap-get-rid-of-deadloop-in-sw…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Guo Ziliang <guo.ziliang(a)zte.com.cn>
Subject: mm: swap: get rid of deadloop in swapin readahead
In our testing, a deadloop task was found. Through sysrq printing, same
stack was found every time, as follows:
__swap_duplicate+0x58/0x1a0
swapcache_prepare+0x24/0x30
__read_swap_cache_async+0xac/0x220
read_swap_cache_async+0x58/0xa0
swapin_readahead+0x24c/0x628
do_swap_page+0x374/0x8a0
__handle_mm_fault+0x598/0xd60
handle_mm_fault+0x114/0x200
do_page_fault+0x148/0x4d0
do_translation_fault+0xb0/0xd4
do_mem_abort+0x50/0xb0
The reason for the deadloop is that swapcache_prepare() always returns
EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
cannot jump out of the loop. We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the
priority of the task stuck in a deadloop so that the task that clears the
SWAP_HAS_CACHE flag will run. The results show that the system returns to
normal after the priority is lowered.
In our testing, multiple real-time tasks are bound to the same core, and
the task in the deadloop is the highest priority task of the core, so the
deadloop task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async, it is an empty
function in the preemptive system and cannot achieve the purpose of
releasing the CPU. A high-priority task cannot release the CPU unless
preempted by a higher-priority task. But when this task is already the
highest priority task on this core, other tasks will not be able to be
scheduled. So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will
call set_current_state first to set the task state, so the task will be
removed from the running queue, so as to achieve the purpose of giving up
the CPU and prevent it from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)
Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
Signed-off-by: Guo Ziliang <guo.ziliang(a)zte.com.cn>
Reported-by: Zeal Robot <zealci(a)zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai(a)zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin(a)zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29(a)zte.com.cn>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roger Quadros <rogerq(a)kernel.org>
Cc: Ziliang Guo <guo.ziliang(a)zte.com.cn>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swap_state.c~mm-swap-get-rid-of-deadloop-in-swapin-readahead
+++ a/mm/swap_state.c
@@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp
* __read_swap_cache_async(), which has set SWAP_HAS_CACHE
* in swap_map, but not yet added its page to swap cache.
*/
- cond_resched();
+ schedule_timeout_uninterruptible(1);
}
/*
_
Patches currently in -mm which might be from guo.ziliang(a)zte.com.cn are
mm-swap-get-rid-of-deadloop-in-swapin-readahead.patch
Hi Jens,
Sysbot found an UAF bug in __fdget_raw [1].
The issue triggers on 5.10 stable, and doesn't trigger on mainline.
I was able to bisect it to the fixing commit:
commit fb3a1f6c745c "io-wq: have manager wait for all workers to exit"
The fix went in around 5.12 kernel and was part of a bigger series of uio fixes:
https://lore.kernel.org/all/20210304002700.374417-3-axboe@kernel.dk/
Then I found out that there is one more fix needed on top:
"io-wq: fix race in freeing 'wq' and worker access"
https://lore.kernel.org/all/20210310224358.1494503-2-axboe@kernel.dk/
I have back ported the two patches to 5.10, see patch below, but the issue still
triggers. See trace [2]
Could you have a look and see what else could be missing. Any suggestion would
be appreciated.
--
Thanks,
Tadeusz
[1] https://syzkaller.appspot.com/bug?id=54c4ddb7a0d44bd9fbdc22d19caff5f2098081…
[2] https://pastebin.linaro.org/view/raw/263a8d9f
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 3d5fc76b92d0..c39568971288 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -125,6 +125,9 @@ struct io_wq {
refcount_t refs;
struct completion done;
+ atomic_t worker_refs;
+ struct completion worker_done;
+
struct hlist_node cpuhp_node;
refcount_t use_refs;
@@ -250,6 +253,10 @@ static void io_worker_exit(struct io_worker *worker)
raw_spin_unlock_irq(&wqe->lock);
kfree_rcu(worker, rcu);
+
+ if (atomic_dec_and_test(&wqe->wq->worker_refs))
+ complete(&wqe->wq->worker_done);
+
if (refcount_dec_and_test(&wqe->wq->refs))
complete(&wqe->wq->done);
}
@@ -691,6 +698,7 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe
*wqe, int index)
return false;
refcount_set(&worker->ref, 1);
+ atomic_inc(&wq->worker_refs);
worker->nulls_node.pprev = NULL;
worker->wqe = wqe;
spin_lock_init(&worker->lock);
@@ -821,6 +829,14 @@ static int io_wq_manager(void *data)
if (current->task_works)
task_work_run();
+ rcu_read_lock();
+ for_each_node(node)
+ io_wq_for_each_worker(wq->wqes[node], io_wq_worker_wake, NULL);
+ rcu_read_unlock();
+
+ if (atomic_dec_and_test(&wq->worker_refs))
+ complete(&wq->worker_done);
+ wait_for_completion(&wq->worker_done);
out:
if (refcount_dec_and_test(&wq->refs)) {
complete(&wq->done);
@@ -1134,6 +1150,8 @@ struct io_wq *io_wq_create(unsigned bounded, struct
io_wq_data *data)
}
init_completion(&wq->done);
+ init_completion(&wq->worker_done);
+ atomic_set(&wq->worker_refs, 0);
wq->manager = kthread_create(io_wq_manager, wq, "io_wq_manager");
if (!IS_ERR(wq->manager)) {
@@ -1179,11 +1197,6 @@ static void __io_wq_destroy(struct io_wq *wq)
if (wq->manager)
kthread_stop(wq->manager);
- rcu_read_lock();
- for_each_node(node)
- io_wq_for_each_worker(wq->wqes[node], io_wq_worker_wake, NULL);
- rcu_read_unlock();
-
wait_for_completion(&wq->done);
for_each_node(node)
__acpi_node_get_property_reference() is documented to return -ENOENT if
the caller requests a property reference at an index that does not exist,
not -EINVAL which it actually does.
Fix this by returning -ENOENT consistenly, independently of whether the
property value is a plain reference or a package.
Fixes: c343bc2ce2c6 ("ACPI: properties: Align return codes of __acpi_node_get_property_reference()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
---
drivers/acpi/property.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index d0986bda29640..3fceb4681ec9f 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -685,7 +685,7 @@ int __acpi_node_get_property_reference(const struct fwnode_handle *fwnode,
*/
if (obj->type == ACPI_TYPE_LOCAL_REFERENCE) {
if (index)
- return -EINVAL;
+ return -ENOENT;
device = acpi_fetch_acpi_dev(obj->reference.handle);
if (!device)
--
2.30.2