Syzkaller reports a KASAN issue as below:
general protection fault, probably for non-canonical address 0xfbd59c0000000021: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000108-0xdead00000000010f]
CPU: 0 PID: 5083 Comm: syz-executor.2 Not tainted 6.1.134-syzkaller-00037-g855bd1d7d838 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__list_del include/linux/list.h:114 [inline]
RIP: 0010:__list_del_entry include/linux/list.h:137 [inline]
RIP: 0010:list_del include/linux/list.h:148 [inline]
RIP: 0010:p9_fd_cancelled+0xe9/0x200 net/9p/trans_fd.c:734
Call Trace:
 <TASK>
 p9_client_flush+0x351/0x440 net/9p/client.c:614
 p9_client_rpc+0xb6b/0xc70 net/9p/client.c:734
 p9_client_version net/9p/client.c:920 [inline]
 p9_client_create+0xb51/0x1240 net/9p/client.c:1027
 v9fs_session_init+0x1f0/0x18f0 fs/9p/v9fs.c:408
 v9fs_mount+0xba/0xcb0 fs/9p/vfs_super.c:126
 legacy_get_tree+0x108/0x220 fs/fs_context.c:632
 vfs_get_tree+0x8e/0x300 fs/super.c:1573
 do_new_mount fs/namespace.c:3056 [inline]
 path_mount+0x6a6/0x1e90 fs/namespace.c:3386
 do_mount fs/namespace.c:3399 [inline]
 __do_sys_mount fs/namespace.c:3607 [inline]
 __se_sys_mount fs/namespace.c:3584 [inline]
 __x64_sys_mount+0x283/0x300 fs/namespace.c:3584
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This happens because of a race condition between:
- The 9p client sending an invalid flush request and later cleaning it up;
- The 9p client in p9_read_work() cancelling all pending requests.
Thread 1                                Thread 2
...
p9_client_create()
...
p9_fd_create()
...
p9_conn_create()
...
// start Thread 2
INIT_WORK(&m->rq, p9_read_work);
                                        p9_read_work()
...
p9_client_rpc()
...
                                        ...
                                        p9_conn_cancel()
                                        ...
                                        spin_lock(&m->req_lock);
...                                     ...
p9_fd_cancelled()
...                                     ...
                                        spin_unlock(&m->req_lock);
                                        // status rewrite
                                        p9_client_cb(m->client, req, REQ_STATUS_ERROR)
                                        // first remove
                                        list_del(&req->req_list);
...
spin_lock(&m->req_lock)
...
// second remove
list_del(&req->req_list);
spin_unlock(&m->req_lock)
...
Commit 74d6a5d56629 ("9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem client where the req_list could be deleted simultaneously by both p9_read_work and p9_fd_cancelled functions, but for the case where req->status equals REQ_STATUS_RCVD.
Add an explicit check for REQ_STATUS_ERROR in p9_fd_cancelled before processing the request. Skip processing if the request is already in the error state, as it has been removed and its resources cleaned up.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: afd8d6541155 ("9P: Add cancelled() to the transport functions.")
Cc: stable@vger.kernel.org
Signed-off-by: Nalivayko Sergey <Sergey.Nalivayko@kaspersky.com>
---
 net/9p/trans_fd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a69422366a23..a6054a392a90 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -721,9 +721,9 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
 	spin_lock(&m->req_lock);
 	/* Ignore cancelled request if message has been received
-	 * before lock.
-	 */
-	if (req->status == REQ_STATUS_RCVD) {
+	 * or cancelled with error before lock.
+	 */
+	if (req->status == REQ_STATUS_RCVD || req->status == REQ_STATUS_ERROR) {
 		spin_unlock(&m->req_lock);
 		return 0;
 	}
Nalivayko Sergey wrote on Tue, Jul 15, 2025 at 06:48:15PM +0300:
> [...]
>
> Commit 74d6a5d56629 ("9p/trans_fd: Fix concurrency del of req_list in p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem client where the req_list could be deleted simultaneously by both p9_read_work and p9_fd_cancelled functions, but for the case where req->status equals REQ_STATUS_RCVD.
Sorry for the delay. Thanks for the investigation, this makes sense and deserves fixing.
> Add an explicit check for REQ_STATUS_ERROR in p9_fd_cancelled before processing the request. Skip processing if the request is already in the error state, as it has been removed and its resources cleaned up.
Looking at the other statuses: it's quite unlikely, but if another thread made it FLSHD we should also skip that -- I don't think it's possible as far as the logic goes, but if the request isn't sent yet we would have nothing to flush either. So it's probably better to invert the check and make it `if (req->status != REQ_STATUS_SENT) return;`?
client.c already checks `READ_ONCE(oldreq->status) == REQ_STATUS_SENT` before calling cancelled, but that's without the lock, so what we're really checking here is that nothing raced since that earlier test -- and such a race isn't limited to RCVD and ERROR.
If you can send a v2 with that I'll pick it up.
Thanks,
Dominique Martinet wrote on Fri, Aug 15, 2025 at 11:01:00AM +0900:
> [...]
>
> If you can send a v2 with that I'll pick it up.
Actually it's just as fast if I do it myself, if you have time please check this makes sense: https://github.com/martinetd/linux/commit/afdaa9f9ea451a935e9b7645fc7ffd93d5...
This is a fix, but I don't believe it's urgent (it can only happen with a bogus server, and while in theory we should aim to be robust against an adversarial server, I don't believe 9p is anywhere near that point), so I'll push it along with other fixes next cycle as I missed the 5.17 train.
Thanks,