[PATCH net v4 0/3] vsock/bpf: Handle races between sockmap update and connect() disconnecting

List overview All Threads
Download

newer

older

[PATCH v8 0/3] rust: kunit:...

[PATCH bpf-next] selftests/bpf:...

Michal Luczaj

17 Mar 2025 17 Mar '25

9:52 a.m.

Signal delivery during connect() may disconnect an already established socket. Problem is that such socket might have been placed in a sockmap before the connection was closed.

PATCH 1 ensures this race won't lead to an unconnected vsock staying in the sockmap. PATCH 2 selftests it.

PATCH 3 fixes a related race. Note that selftest in PATCH 2 does test this code as well, but winning this race variant may take more than 2 seconds, so I'm not advertising it.

Signed-off-by: Michal Luczaj mhal@rbox.co --- Changes in v4: - Selftest: send signal to only our own process - Link to v3: https://lore.kernel.org/r/20250316-vsock-trans-signal-race-v3-0-17a6862277c9...

Changes in v3: - Selftest: drop unnecessary variable initialization and reorder the calls - Link to v2: https://lore.kernel.org/r/20250314-vsock-trans-signal-race-v2-0-421a41f60f42...

Changes in v2: - Handle one more path of tripping the warning - Add a selftest - Collect R-b [Stefano] - Link to v1: https://lore.kernel.org/r/20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd...

--- Michal Luczaj (3): vsock/bpf: Fix EINTR connect() racing sockmap update selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update vsock/bpf: Fix bpf recvmsg() racing transport reassignment

net/vmw_vsock/af_vsock.c | 10 ++- net/vmw_vsock/vsock_bpf.c | 24 ++++-- .../selftests/bpf/prog_tests/sockmap_basic.c | 99 ++++++++++++++++++++++ 3 files changed, 124 insertions(+), 9 deletions(-) --- base-commit: da9e8efe7ee10e8425dc356a9fc593502c8e3933 change-id: 20250305-vsock-trans-signal-race-d62f7718d099

Best regards,

-- Michal Luczaj mhal@rbox.co

Show replies by date

Michal Luczaj

17 Mar 17 Mar

9:52 a.m.

New subject: [PATCH net v4 1/3] vsock/bpf: Fix EINTR connect() racing sockmap update

Signal delivery during connect() may result in a disconnect of an already TCP_ESTABLISHED socket. Problem is that such established socket might have been placed in a sockmap before the connection was closed. We end up with a SS_UNCONNECTED vsock in a sockmap. And this, combined with the ability to reassign (unconnected) vsock's transport to NULL, breaks the sockmap contract. As manifested by WARN_ON_ONCE.

connect / state = SS_CONNECTED / sock_map_update_elem if signal_pending state = SS_UNCONNECTED

connect transport = NULL vsock_bpf_recvmsg WARN_ON_ONCE(!vsk->transport)

Ensure the socket does not stay in sockmap.

WARNING: CPU: 8 PID: 1228 at net/vmw_vsock/vsock_bpf.c:97 vsock_bpf_recvmsg+0xb43/0xe00 CPU: 8 UID: 0 PID: 1228 Comm: a.out Not tainted 6.14.0-rc5+ RIP: 0010:vsock_bpf_recvmsg+0xb43/0xe00 sock_recvmsg+0x1b2/0x220 __sys_recvfrom+0x190/0x270 __x64_sys_recvfrom+0xdc/0x1b0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap") Reviewed-by: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Michal Luczaj mhal@rbox.co --- net/vmw_vsock/af_vsock.c | 10 +++++++++- net/vmw_vsock/vsock_bpf.c | 1 + 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 7e3db87ae4333cf63327ec105ca99253569bb9fe..81b1b8e9c946a646778367ab78ca180cef75ef72 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1579,7 +1579,15 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,

if (signal_pending(current)) { err = sock_intr_errno(timeout); - sk->sk_state = sk->sk_state == TCP_ESTABLISHED ? TCP_CLOSING : TCP_CLOSE; + if (sk->sk_state == TCP_ESTABLISHED) { + /* Might have raced with a sockmap update. */ + if (sk->sk_prot->unhash) + sk->sk_prot->unhash(sk); + + sk->sk_state = TCP_CLOSING; + } else { + sk->sk_state = TCP_CLOSE; + } sock->state = SS_UNCONNECTED; vsock_transport_cancel_pkt(vsk); vsock_remove_connected(vsk); diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index 07b96d56f3a577af71021b1b8132743554996c4f..c68fdaf09046b68254dac3ea70ffbe73dfa45cef 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -127,6 +127,7 @@ static void vsock_bpf_rebuild_protos(struct proto *prot, const struct proto *bas { *prot = *base; prot->close = sock_map_close; + prot->unhash = sock_map_unhash; prot->recvmsg = vsock_bpf_recvmsg; prot->sock_is_readable = sk_msg_is_readable; }

-- 2.48.1

Michal Luczaj

9:52 a.m.

New subject: [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update

Racing signal-interrupted connect() and sockmap update may result in an unconnected (and missing vsock transport) socket in a sockmap.

Test spends 2 seconds attempting to reach WARN_ON_ONCE().

connect / state = SS_CONNECTED / sock_map_update_elem if signal_pending state = SS_UNCONNECTED

connect transport = NULL vsock_bpf_recvmsg WARN_ON_ONCE(!vsk->transport)

Signed-off-by: Michal Luczaj mhal@rbox.co --- .../selftests/bpf/prog_tests/sockmap_basic.c | 99 ++++++++++++++++++++++ 1 file changed, 99 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 1e3e4392dcca0e1722c1982ecc649a80c27443b2..2f8bba27866354848f1e30b5473cedb6a85244ff 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -3,6 +3,7 @@ #include <error.h> #include <netinet/tcp.h> #include <sys/epoll.h> +#include <linux/time64.h>

#include "test_progs.h" #include "test_skmsg_load_helpers.skel.h" @@ -1042,6 +1043,102 @@ static void test_sockmap_vsock_unconnected(void) xclose(map); }

+#define CONNECT_SIGNAL_RACE_TIMEOUT 2 /* seconds */ + +static void sig_handler(int signum) +{ + /* nop */ +} + +static void connect_signal_racer_cleanup(void *map) +{ + xclose(*(int *)map); +} + +static void *connect_signal_racer(void *arg) +{ + pid_t pid; + int map; + + map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int), + sizeof(int), 1, NULL); + if (!ASSERT_OK_FD(map, "bpf_map_create")) + return NULL; + + pthread_cleanup_push(connect_signal_racer_cleanup, &map); + pid = getpid(); + + for (;;) { + int c = *(int *)arg; + int zero = 0; + + (void)bpf_map_update_elem(map, &zero, &c, BPF_ANY); + + if (kill(pid, SIGUSR1)) { + FAIL_ERRNO("kill"); + break; + } + + if ((recv(c, NULL, 0, MSG_DONTWAIT) < 0) && errno == ENODEV) { + FAIL_ERRNO("recv"); + break; + } + } + + pthread_cleanup_pop(1); + + return NULL; +} + +static void test_sockmap_vsock_connect_signal_race(void) +{ + struct sockaddr_vm addr, bad_addr; + socklen_t alen = sizeof(addr); + sighandler_t orig_handler; + pthread_t thread; + int s, c, p; + __u64 tout; + + orig_handler = signal(SIGUSR1, sig_handler); + if (!ASSERT_NEQ(orig_handler, SIG_ERR, "signal handler setup")) + return; + + s = socket_loopback(AF_VSOCK, SOCK_SEQPACKET | SOCK_NONBLOCK); + if (s < 0) + goto restore; + + if (xgetsockname(s, (struct sockaddr *)&addr, &alen)) + goto close; + + bad_addr = addr; + bad_addr.svm_cid = 0x42424242; /* non-existing */ + + if (xpthread_create(&thread, 0, connect_signal_racer, &c)) + goto close; + + tout = get_time_ns() + CONNECT_SIGNAL_RACE_TIMEOUT * NSEC_PER_SEC; + do { + c = xsocket(AF_VSOCK, SOCK_SEQPACKET, 0); + if (c < 0) + break; + + if (connect(c, (struct sockaddr *)&addr, alen) && errno == EINTR) + (void)connect(c, (struct sockaddr *)&bad_addr, alen); + + xclose(c); + p = accept(s, NULL, NULL); + if (p >= 0) + xclose(p); + } while (get_time_ns() < tout); + + ASSERT_OK(pthread_cancel(thread), "pthread_cancel"); + xpthread_join(thread, NULL); +close: + xclose(s); +restore: + ASSERT_NEQ(signal(SIGUSR1, orig_handler), SIG_ERR, "handler restore"); +} + void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -1108,4 +1205,6 @@ void test_sockmap_basic(void) test_sockmap_skb_verdict_vsock_poll(); if (test__start_subtest("sockmap vsock unconnected")) test_sockmap_vsock_unconnected(); + if (test__start_subtest("sockmap vsock connect signal race")) + test_sockmap_vsock_connect_signal_race(); }

-- 2.48.1

Stefano Garzarella

19 Mar 19 Mar

9:17 a.m.

New subject: [PATCH net v4 2/3] selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update

On Mon, Mar 17, 2025 at 10:52:24AM +0100, Michal Luczaj wrote:

...

Racing signal-interrupted connect() and sockmap update may result in an unconnected (and missing vsock transport) socket in a sockmap.

Test spends 2 seconds attempting to reach WARN_ON_ONCE().

connect / state = SS_CONNECTED / sock_map_update_elem if signal_pending state = SS_UNCONNECTED

connect transport = NULL vsock_bpf_recvmsg WARN_ON_ONCE(!vsk->transport)

Signed-off-by: Michal Luczaj mhal@rbox.co

.../selftests/bpf/prog_tests/sockmap_basic.c | 99 ++++++++++++++++++++++ 1 file changed, 99 insertions(+)

LGTM for the vsock part!

Acked-by: Stefano Garzarella sgarzare@redhat.com

...

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c index 1e3e4392dcca0e1722c1982ecc649a80c27443b2..2f8bba27866354848f1e30b5473cedb6a85244ff 100644 --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c @@ -3,6 +3,7 @@ #include <error.h> #include <netinet/tcp.h> #include <sys/epoll.h> +#include <linux/time64.h>

#include "test_progs.h" #include "test_skmsg_load_helpers.skel.h" @@ -1042,6 +1043,102 @@ static void test_sockmap_vsock_unconnected(void) xclose(map); }

+#define CONNECT_SIGNAL_RACE_TIMEOUT 2 /* seconds */

+static void sig_handler(int signum) +{

/* nop */

+}

+static void connect_signal_racer_cleanup(void *map) +{

xclose(*(int *)map);

+}

+static void *connect_signal_racer(void *arg) +{
pid_t pid;

int map;

map = bpf_map_create(BPF_MAP_TYPE_SOCKMAP, NULL, sizeof(int),
	     sizeof(int), 1, NULL);
if (!ASSERT_OK_FD(map, "bpf_map_create"))
return NULL;
pthread_cleanup_push(connect_signal_racer_cleanup, &map);

pid = getpid();

for (;;) {
int c = *(int *)arg;
int zero = 0;
(void)bpf_map_update_elem(map, &zero, &c, BPF_ANY);
if (kill(pid, SIGUSR1)) {
	FAIL_ERRNO("kill");
	break;
}
if ((recv(c, NULL, 0, MSG_DONTWAIT) < 0) && errno == ENODEV) {
	FAIL_ERRNO("recv");
	break;
}
}

pthread_cleanup_pop(1);

return NULL;
+}

+static void test_sockmap_vsock_connect_signal_race(void) +{
struct sockaddr_vm addr, bad_addr;

socklen_t alen = sizeof(addr);

sighandler_t orig_handler;

pthread_t thread;

int s, c, p;

__u64 tout;

orig_handler = signal(SIGUSR1, sig_handler);

if (!ASSERT_NEQ(orig_handler, SIG_ERR, "signal handler setup"))
return;
s = socket_loopback(AF_VSOCK, SOCK_SEQPACKET | SOCK_NONBLOCK);

if (s < 0)
goto restore;
if (xgetsockname(s, (struct sockaddr *)&addr, &alen))
goto close;
bad_addr = addr;

bad_addr.svm_cid = 0x42424242; /* non-existing */

if (xpthread_create(&thread, 0, connect_signal_racer, &c))
goto close;
tout = get_time_ns() + CONNECT_SIGNAL_RACE_TIMEOUT * NSEC_PER_SEC;

do {
c = xsocket(AF_VSOCK, SOCK_SEQPACKET, 0);
if (c < 0)
	break;
if (connect(c, (struct sockaddr *)&addr, alen) && errno == EINTR)
	(void)connect(c, (struct sockaddr *)&bad_addr, alen);
xclose(c);
p = accept(s, NULL, NULL);
if (p >= 0)
	xclose(p);
} while (get_time_ns() < tout);

ASSERT_OK(pthread_cancel(thread), "pthread_cancel");

xpthread_join(thread, NULL);
+close:

xclose(s);

+restore:

ASSERT_NEQ(signal(SIGUSR1, orig_handler), SIG_ERR, "handler restore");

+}

void test_sockmap_basic(void) { if (test__start_subtest("sockmap create_update_free")) @@ -1108,4 +1205,6 @@ void test_sockmap_basic(void) test_sockmap_skb_verdict_vsock_poll(); if (test__start_subtest("sockmap vsock unconnected")) test_sockmap_vsock_unconnected();
if (test__start_subtest("sockmap vsock connect signal race"))
test_sockmap_vsock_connect_signal_race();
}

-- 2.48.1

Michal Luczaj

17 Mar 17 Mar

9:52 a.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links state = SS_UNCONNECTED release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)

Protect recvmsg() from racing against transport reassignment. Enforce the sockmap invariant that psock implies transport: lock socket before getting psock.

WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00 CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+ RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00 sock_recvmsg+0x1b2/0x220 __sys_recvfrom+0x190/0x270 __x64_sys_recvfrom+0xdc/0x1b0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co --- net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int return err; }

-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, - size_t len, int flags, int *addr_len) +static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, + int flags, int *addr_len) { struct sk_psock *psock; struct vsock_sock *vsk; int copied;

+ /* Since signal delivery during connect() may reset the state of socket + * that's already in a sockmap, take the lock before checking on psock. + * This serializes a possible transport reassignment, protecting this + * function from running with NULL transport. + */ + lock_sock(sk); + psock = sk_psock_get(sk); - if (unlikely(!psock)) + if (unlikely(!psock)) { + release_sock(sk); return __vsock_recvmsg(sk, msg, len, flags); + }

- lock_sock(sk); vsk = vsock_sk(sk); - if (WARN_ON_ONCE(!vsk->transport)) { copied = -ENODEV; goto out; }

if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) { - release_sock(sk); sk_psock_put(sk, psock); + release_sock(sk); return __vsock_recvmsg(sk, msg, len, flags); }

@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, }

if (sk_psock_queue_empty(psock)) { - release_sock(sk); sk_psock_put(sk, psock); + release_sock(sk); return __vsock_recvmsg(sk, msg, len, flags); }

@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, }

out: - release_sock(sk); sk_psock_put(sk, psock); + release_sock(sk);

return copied; }

-- 2.48.1

Stefano Garzarella

19 Mar 19 Mar

9:34 a.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...

Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links state = SS_UNCONNECTED release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)

Protect recvmsg() from racing against transport reassignment. Enforce the sockmap invariant that psock implies transport: lock socket before getting psock.

WARNING: CPU: 9 PID: 1222 at net/vmw_vsock/vsock_bpf.c:92 vsock_bpf_recvmsg+0xb55/0xe00 CPU: 9 UID: 0 PID: 1222 Comm: a.out Not tainted 6.14.0-rc5+ RIP: 0010:vsock_bpf_recvmsg+0xb55/0xe00 sock_recvmsg+0x1b2/0x220 __sys_recvfrom+0x190/0x270 __x64_sys_recvfrom+0xdc/0x1b0 do_syscall_64+0x93/0x1b0 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: 634f1a7110b4 ("vsock: support sockmap") Signed-off-by: Michal Luczaj mhal@rbox.co

net/vmw_vsock/vsock_bpf.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index c68fdaf09046b68254dac3ea70ffbe73dfa45cef..5138195d91fb258d4bc09b48e80e13651d62863a 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -73,28 +73,35 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int return err; }

-static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
	     size_t len, int flags, int *addr_len)
+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
	     int flags, int *addr_len)

I would avoid this change, especially in a patch with the Fixes tag then to be backported.

...

{ struct sk_psock *psock; struct vsock_sock *vsk; int copied;
/* Since signal delivery during connect() may reset the state of socket
* that's already in a sockmap, take the lock before checking on psock.
* This serializes a possible transport reassignment, protecting this
* function from running with NULL transport.
*/
lock_sock(sk);

psock = sk_psock_get(sk);
if (unlikely(!psock))
if (unlikely(!psock)) {
release_sock(sk);
return __vsock_recvmsg(sk, msg, len, flags);
}
lock_sock(sk); vsk = vsock_sk(sk);

if (WARN_ON_ONCE(!vsk->transport)) { copied = -ENODEV; goto out; }

if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
release_sock(sk);
sk_psock_put(sk, psock);
release_sock(sk);

But here we release it, so can still a reset happen at this point, before calling __vsock_connectible_recvmsg(). In there anyway we handle the case where transport is null, so there's no problem, right?

The rest LTGM.

Thanks, Stefano

...

return __vsock_recvmsg(sk, msg, len, flags);
}

@@ -108,8 +115,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, }
if (sk_psock_queue_empty(psock)) {
	release_sock(sk);
sk_psock_put(sk, psock);
	release_sock(sk);
return __vsock_recvmsg(sk, msg, len, flags);
}
@@ -117,8 +124,8 @@ static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, }

out:

release_sock(sk); sk_psock_put(sk, psock);

release_sock(sk);

return copied;

}

-- 2.48.1

Michal Luczaj

7:05 p.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On 3/19/25 10:34, Stefano Garzarella wrote:

...

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...
... -static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg,
	     size_t len, int flags, int *addr_len)
+static int vsock_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
	     int flags, int *addr_len)
I would avoid this change, especially in a patch with the Fixes tag then to be backported.

I thought that since I've modified this function in so many places, doing this wouldn't hurt. But ok, I'll drop this change.

...

...
{ struct sk_psock *psock; struct vsock_sock *vsk; int copied;
/* Since signal delivery during connect() may reset the state of socket
* that's already in a sockmap, take the lock before checking on psock.
* This serializes a possible transport reassignment, protecting this
* function from running with NULL transport.
*/
lock_sock(sk);

psock = sk_psock_get(sk);
if (unlikely(!psock))
if (unlikely(!psock)) {
release_sock(sk);
return __vsock_recvmsg(sk, msg, len, flags);
}
lock_sock(sk); vsk = vsock_sk(sk);

if (WARN_ON_ONCE(!vsk->transport)) { copied = -ENODEV; goto out; }

if (vsock_has_data(sk, psock) && sk_psock_queue_empty(psock)) {
release_sock(sk);
sk_psock_put(sk, psock);
release_sock(sk);
But here we release it, so can still a reset happen at this point, before calling __vsock_connectible_recvmsg(). In there anyway we handle the case where transport is null, so there's no problem, right?

Yes, I think we're good. That function needs to gracefully handle being called without a transport, and it does.

Thanks, Michal

Cong Wang

10:18 p.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...

Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is vsock_bpf_recvmsg() called afterward?

...

state = SS_UNCONNECTED
release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)

And I am wondering why we need to WARN here since we can handle this error case correctly?

Thanks.

Michal Luczaj

20 Mar 20 Mar

12:05 p.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On 3/19/25 23:18, Cong Wang wrote:

...

On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...
Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

*THREAD 1* *THREAD 2*

...

...
connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is vsock_bpf_recvmsg() called afterward?

I'm not sure I understand the question, so I've added a header above: those are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called afterwards. It was called before sock_map_remove_links(). Note that at the time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still executing (in T2).

...

...
state = SS_UNCONNECTED
release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)
And I am wondering why we need to WARN here since we can handle this error case correctly?

The WARN and transport check are here for defensive measures, and to state a contract.

But I think I get your point. If we accept for a fact of life that BPF code should be able to handle transport disappearing - then WARN can be removed (while keeping the check) and this patch can be dropped.

My aim, instead, was to keep things consistent. By which I mean sticking to the conditions expressed in vsock_bpf_update_proto() as invariants; so that vsock with a psock is guaranteed to have transport assigned.

Cong Wang

8:54 p.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:

...

On 3/19/25 23:18, Cong Wang wrote:

...
On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...
Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

*THREAD 1* *THREAD 2*

...
...
connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is vsock_bpf_recvmsg() called afterward?

I'm not sure I understand the question, so I've added a header above: those are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called afterwards. It was called before sock_map_remove_links(). Note that at the time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still executing (in T2).

I thought the above vsock_bpf_recvmsg() on the right side completed before sock_map_remove_links(), sorry for the confusion.

...

...
...
state = SS_UNCONNECTED
release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)
And I am wondering why we need to WARN here since we can handle this error case correctly?
The WARN and transport check are here for defensive measures, and to state a contract.

But I think I get your point. If we accept for a fact of life that BPF code should be able to handle transport disappearing - then WARN can be removed (while keeping the check) and this patch can be dropped.

I am thinking whether we have more elegant way to handle this case, WARN looks not pretty.

...

My aim, instead, was to keep things consistent. By which I mean sticking to the conditions expressed in vsock_bpf_update_proto() as invariants; so that vsock with a psock is guaranteed to have transport assigned.

Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg() because for example UDP is (almost) lockless, so enforcing the sock lock for all vsock types looks not flexible and may hurt performance.

Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks for different struct proto (as we did for TCP/UDP)?

Thanks.

Michal Luczaj

10:16 p.m.

New subject: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport reassignment

On 3/20/25 21:54, Cong Wang wrote:

...

On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:

...
On 3/19/25 23:18, Cong Wang wrote:

...
On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:

...
Signal delivery during connect() may lead to a disconnect of an already established socket. That involves removing socket from any sockmap and resetting state to SS_UNCONNECTED. While it correctly restores socket's proto, a call to vsock_bpf_recvmsg() might have been already under way in another thread. If the connect()ing thread reassigns the vsock transport to NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.

*THREAD 1* *THREAD 2*

...
...
connect / state = SS_CONNECTED / sock_map_update_elem vsock_bpf_recvmsg psock = sk_psock_get() lock sk if signal_pending unhash sock_map_remove_links

So vsock's ->recvmsg() should be restored after this, right? Then how is vsock_bpf_recvmsg() called afterward?

I'm not sure I understand the question, so I've added a header above: those are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called afterwards. It was called before sock_map_remove_links(). Note that at the time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still executing (in T2).

I thought the above vsock_bpf_recvmsg() on the right side completed before sock_map_remove_links(), sorry for the confusion.

No problem, I see why you've might. Perhaps deeper indentation would make things clearer.

...

...
...
...
state = SS_UNCONNECTED
release sk

connect transport = NULL lock sk WARN_ON_ONCE(!vsk->transport)
And I am wondering why we need to WARN here since we can handle this error case correctly?
The WARN and transport check are here for defensive measures, and to state a contract.

But I think I get your point. If we accept for a fact of life that BPF code should be able to handle transport disappearing - then WARN can be removed (while keeping the check) and this patch can be dropped.
I am thinking whether we have more elegant way to handle this case, WARN looks not pretty.

Since the case should never happen, I like to think of WARN as a deliberate eyesore :)

...

...
My aim, instead, was to keep things consistent. By which I mean sticking to the conditions expressed in vsock_bpf_update_proto() as invariants; so that vsock with a psock is guaranteed to have transport assigned.

Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg() because for example UDP is (almost) lockless, so enforcing the sock lock for all vsock types looks not flexible and may hurt performance.

Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks for different struct proto (as we did for TCP/UDP)?

By UDP you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not implemented read_skb() callback, making it unsupported by BPF/sockmap.

Michael S. Tsirkin

19 Mar 19 Mar

9:21 a.m.

On Mon, Mar 17, 2025 at 10:52:22AM +0100, Michal Luczaj wrote:

...

Signal delivery during connect() may disconnect an already established socket. Problem is that such socket might have been placed in a sockmap before the connection was closed.

PATCH 1 ensures this race won't lead to an unconnected vsock staying in the sockmap. PATCH 2 selftests it.

PATCH 3 fixes a related race. Note that selftest in PATCH 2 does test this code as well, but winning this race variant may take more than 2 seconds, so I'm not advertising it.

Signed-off-by: Michal Luczaj mhal@rbox.co

vsock things:

Acked-by: Michael S. Tsirkin mst@redhat.com

...

Changes in v4:

Selftest: send signal to only our own process

Link to v3: https://lore.kernel.org/r/20250316-vsock-trans-signal-race-v3-0-17a6862277c9...

Changes in v3:

Selftest: drop unnecessary variable initialization and reorder the calls

Link to v2: https://lore.kernel.org/r/20250314-vsock-trans-signal-race-v2-0-421a41f60f42...

Changes in v2:

Handle one more path of tripping the warning

Add a selftest

Collect R-b [Stefano]

Link to v1: https://lore.kernel.org/r/20250307-vsock-trans-signal-race-v1-1-3aca3f771fbd...

Michal Luczaj (3): vsock/bpf: Fix EINTR connect() racing sockmap update selftest/bpf: Add test for AF_VSOCK connect() racing sockmap update vsock/bpf: Fix bpf recvmsg() racing transport reassignment

net/vmw_vsock/af_vsock.c | 10 ++- net/vmw_vsock/vsock_bpf.c | 24 ++++-- .../selftests/bpf/prog_tests/sockmap_basic.c | 99 ++++++++++++++++++++++ 3 files changed, 124 insertions(+), 9 deletions(-)

base-commit: da9e8efe7ee10e8425dc356a9fc593502c8e3933 change-id: 20250305-vsock-trans-signal-race-d62f7718d099

Best regards,

Michal Luczaj mhal@rbox.co

114

days inactive

117

days old

linux-kselftest-mirror@lists.linaro.org

11 comments

participants

tags (0)

participants (4)

Cong Wang
Michael S. Tsirkin
Michal Luczaj
Stefano Garzarella