On May 15, 2023, at 10:55 AM, Ding Hui <dinghui@sangfor.com.cn> wrote:
On 2023/5/15 20:01, Chuck Lever III wrote:
On May 14, 2023, at 10:13 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
After the listener svc_sock is freed, and before svc_tcp_accept() is invoked for the established child sock, there is a window in which the newsock retains a freed listener svc_sock in sk_user_data, which was cloned from the parent.
Thank you, I will apply this (after testing it). The next step is to figure out why SUNRPC is trying to accept on a dead listener. Any thoughts about that?
A child sock is cloned from the listener sock and inherits sk_data_ready and sk_user_data from its parent sock; these are svc_tcp_listen_data_ready() and the listener svc_sock, respectively. The sk_state of the child becomes ESTABLISHED once the TCP handshake completes in the protocol stack.
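The inheritance described above can be modelled in user space. This is only a sketch: `demo_sock`, `demo_clone_child()` and the other `demo_` names are hypothetical stand-ins, not the kernel's `struct sock` or its cloning code, and the model copies the whole struct merely to show that the callback and user-data pointers arrive in the child unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT kernel types. */
enum demo_state { DEMO_LISTEN, DEMO_ESTABLISHED };

struct demo_sock {
	enum demo_state state;
	void (*data_ready)(struct demo_sock *sk); /* models sk_data_ready */
	void *user_data;                          /* models sk_user_data */
};

/* Models the listener callback; it only counts invocations. */
static int demo_listen_calls;

static void demo_listen_data_ready(struct demo_sock *sk)
{
	(void)sk;
	demo_listen_calls++;
}

/*
 * Models cloning a child from the listener: the child starts as a
 * copy of the parent, so it keeps the parent's callback and
 * user-data pointer, and only its state differs.
 */
static struct demo_sock demo_clone_child(const struct demo_sock *parent)
{
	struct demo_sock child;

	assert(parent != NULL);
	child = *parent;
	child.state = DEMO_ESTABLISHED;
	return child;
}
```

In this model, data arriving on the child before it is accepted would still invoke `demo_listen_data_ready()` with the listener's `user_data`, which is the situation the ladder diagrams below walk through.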
Case 1:
   listener sock   |      child sock       |     nfsd thread
  =>sk_data_ready  |    =>sk_data_ready    |
-------------------+-----------------------+----------------------
svc_tcp_listen_data_ready
  svsk is listener svc_sock
  set_bit(XPT_CONN)
                                            svc_recv
                                              svc_tcp_accept(listener)
                                                kernel_accept
                                                  get child sock as newsock
                                                svc_setup_socket(newsock)
                                                  newsock->sk_data_ready=svc_data_ready
                                                  newsock->sk_user_data=newsvsk
                    svc_data_ready
                      svsk is newsvsk
Case 2:
   listener sock   |      child sock       |     nfsd thread
  =>sk_data_ready  |    =>sk_data_ready    |
-------------------+-----------------------+----------------------
svc_tcp_listen_data_ready
  svsk is listener svc_sock
  set_bit(XPT_CONN)
                    svc_tcp_listen_data_ready
                      svsk is listener svc_sock
                                            svc_recv
                                              svc_tcp_accept(listener)
                                                kernel_accept
                                                  get the child sock as newsock
                                                svc_setup_socket(newsock)
                                                  newsock->sk_data_ready=svc_data_ready
                                                  newsock->sk_user_data=newsvsk
                    svc_data_ready
                      svsk is newsvsk
The UAF case:
   listener sock   |      child sock       |     rpc.nfsd 0
  =>sk_data_ready  |    =>sk_data_ready    |
-------------------+-----------------------+----------------------
svc_tcp_listen_data_ready
  svsk is listener svc_sock
  set_bit(XPT_CONN)
                                            svc_xprt_destroy_all
                                              svc_xprt_free
                                                kfree listener svc_sock

                    // the child sock has not yet been accepted,
                    // so it is not managed by SUNRPC for now.
                    svc_tcp_listen_data_ready
                      svsk is listener svc_sock
                      svsk->sk_odata // UAF!
Thanks for the ladder diagrams, that helps.
In the race window, if data is received on the newsock, we will observe a use-after-free report in svc_tcp_listen_data_ready().
Reproduce by running two tasks concurrently:
- while :; do rpc.nfsd 0 ; rpc.nfsd; done
- while :; do echo "" | ncat -4 127.0.0.1 2049 ; done
I will continue attempting to reproduce, as I would like a root cause for this issue.
svc_xprt shutdown seems to be unordered. I would think that it should unregister with the portmapper, close the listener so no new connections can be established, then close the children last. That doesn't appear to be how it works, though.
But if the listener happens to be closed first, that frees the svc_sock that might be relied upon by children sockets. The shutdown ordering seems to be the crux of the problem, and pinning the svc_xprt won't help, as you pointed out.
Making the children not rely on the listener does address the crash, but doesn't fix the design. But the design problem doesn't seem to be urgent. So I'm going to file a low priority bug to document the design issue.
KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
Read of size 8 at addr ffff888139d96228 by task nc/102553
CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
 <IRQ>
 dump_stack_lvl+0x33/0x50
 print_address_description.constprop.0+0x27/0x310
 print_report+0x3e/0x70
 kasan_report+0xae/0xe0
 svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
 tcp_data_queue+0x9f4/0x20e0
 tcp_rcv_established+0x666/0x1f60
 tcp_v4_do_rcv+0x51c/0x850
 tcp_v4_rcv+0x23fc/0x2e80
 ip_protocol_deliver_rcu+0x62/0x300
 ip_local_deliver_finish+0x267/0x350
 ip_local_deliver+0x18b/0x2d0
 ip_rcv+0x2fb/0x370
 __netif_receive_skb_one_core+0x166/0x1b0
 process_backlog+0x24c/0x5e0
 __napi_poll+0xa2/0x500
 net_rx_action+0x854/0xc90
 __do_softirq+0x1bb/0x5de
 do_softirq+0xcb/0x100
 </IRQ>
 <TASK>
 ...
 </TASK>

Allocated by task 102371:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 __kasan_kmalloc+0x7b/0x90
 svc_setup_socket+0x52/0x4f0 [sunrpc]
 svc_addsock+0x20d/0x400 [sunrpc]
 __write_ports_addfd+0x209/0x390 [nfsd]
 write_ports+0x239/0x2c0 [nfsd]
 nfsctl_transaction_write+0xac/0x110 [nfsd]
 vfs_write+0x1c3/0xae0
 ksys_write+0xed/0x1c0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 102551:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_save_free_info+0x2a/0x50
 __kasan_slab_free+0x106/0x190
 __kmem_cache_free+0x133/0x270
 svc_xprt_free+0x1e2/0x350 [sunrpc]
 svc_xprt_destroy_all+0x25a/0x440 [sunrpc]
 nfsd_put+0x125/0x240 [nfsd]
 nfsd_svc+0x2cb/0x3c0 [nfsd]
 write_threads+0x1ac/0x2a0 [nfsd]
 nfsctl_transaction_write+0xac/0x110 [nfsd]
 vfs_write+0x1c3/0xae0
 ksys_write+0xed/0x1c0
 do_syscall_64+0x38/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fix the UAF by simply doing nothing in svc_tcp_listen_data_ready() if sk_state != TCP_LISTEN; this avoids dereferencing svsk for all child sockets.
Link: https://lore.kernel.org/lkml/20230507091131.23540-1-dinghui@sangfor.com.cn/
Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: stable@vger.kernel.org
---
 net/sunrpc/svcsock.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index a51c9b989d58..9aca6e1e78e4 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -825,12 +825,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 
 	trace_sk_data_ready(sk);
 
-	if (svsk) {
-		/* Refer to svc_setup_socket() for details. */
-		rmb();
-		svsk->sk_odata(sk);
-	}
-
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -839,13 +833,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	 *    when one of child sockets become ESTABLISHED.
 	 * 2) data_ready method of the child socket may be called
 	 *    when it receives data before the socket is accepted.
-	 * In case of 2, we should ignore it silently.
+	 * In case of 2, we should ignore it silently and DO NOT
+	 * dereference svsk.
 	 */
-	if (sk->sk_state == TCP_LISTEN) {
-		if (svsk) {
-			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
-			svc_xprt_enqueue(&svsk->sk_xprt);
-		}
+	if (sk->sk_state != TCP_LISTEN)
+		return;
+
+	if (svsk) {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
+		svsk->sk_odata(sk);
+		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
+		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 }
-- 2.17.1
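
For readers who want to see the effect of the state check in isolation, here is a minimal user-space sketch of the guard the patch introduces. Everything in it is an assumption made for illustration: the `guard_` types and `guard_listen_data_ready()` are hypothetical models, the `freed` flag stands in for kfree() so the model never touches released memory, and memory barriers and the real enqueue logic are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT kernel types. */
enum guard_state { GUARD_LISTEN, GUARD_ESTABLISHED };

struct guard_svsk {
	int freed;       /* set instead of calling free(), so the model stays well-defined */
	int odata_calls; /* models calls to svsk->sk_odata() */
	int enqueued;    /* models set_bit(XPT_CONN) + svc_xprt_enqueue() */
};

struct guard_sock {
	enum guard_state state;       /* models sk_state */
	struct guard_svsk *user_data; /* models sk_user_data */
};

/*
 * Models the patched callback: the state is checked first, and the
 * svsk pointer is only dereferenced for a socket that is still
 * listening. Returns 1 if svsk was used, 0 if the call was ignored.
 */
static int guard_listen_data_ready(struct guard_sock *sk)
{
	struct guard_svsk *svsk = sk->user_data;

	if (sk->state != GUARD_LISTEN)
		return 0; /* child socket: do not touch svsk */

	if (svsk) {
		assert(!svsk->freed); /* would trip on a use-after-free in this model */
		svsk->odata_calls++;
		svsk->enqueued = 1;
		return 1;
	}
	return 0;
}
```

Calling the function on a child in the ESTABLISHED state whose `user_data` points at an svsk already marked `freed` returns 0 without reaching the assertion, which corresponds to the "ignore it silently" case in the code comment.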
-- Chuck Lever
-- Thanks, -dinghui
-- Chuck Lever