This series adds namespace support to vhost-vsock and loopback. It does not add namespaces to any of the other guest transports (virtio-vsock, hyperv, or vmci).
The current revision supports two modes: local and global. Local mode is complete isolation of namespaces, while global mode is complete sharing of CIDs between namespaces (the original behavior).
The mode is set using /proc/sys/net/vsock/ns_mode.
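As an illustration of how a namespace manager might set the mode from userspace, here is a minimal sketch. The helper name vsock_set_ns_mode() is hypothetical and not part of this series; it assumes the caller already runs inside the target network namespace and only uses standard C file I/O.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the series): write a mode string to the
 * ns_mode sysctl file at 'path'. Returns 0 on success, -1 on failure.
 */
int vsock_set_ns_mode(const char *path, const char *mode)
{
	FILE *f;
	int ret = 0;

	assert(path && mode);

	/* The series only defines the "global" and "local" mode strings. */
	if (strcmp(mode, "global") != 0 && strcmp(mode, "local") != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(mode, f) == EOF)
		ret = -1;

	/* Buffered data is flushed at fclose(), so a write rejected by the
	 * kernel (for example because the mode was already written once)
	 * is reported here.
	 */
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}
```

A caller would use it as vsock_set_ns_mode("/proc/sys/net/vsock/ns_mode", "local") before creating any sockets or VMs in that namespace.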
Modes are per-netns and write-once. This allows a system to configure namespaces independently (some may share CIDs, others are completely isolated). This also supports future possible mixed use cases, where there may be namespaces in global mode spinning up VMs while there are mixed mode namespaces that provide services to the VMs, but are not allowed to allocate from the global CID pool (this mode is not implemented in this series).
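The write-once rule can be modeled in userspace with C11 atomics. This is only a sketch of the idea, not the kernel code: struct ns_state, ns_write_mode() and ns_read_mode() are made-up names, atomic_exchange() stands in for the kernel's xchg(), and the lock flag is an int rather than a 1-byte bool.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

/* Userspace stand-in for the per-namespace state (illustrative names). */
struct ns_state {
	atomic_int mode;        /* holds an enum ns_mode value */
	atomic_int mode_locked; /* 0 until the first successful write */
};

/* Returns true only for the first writer. Later writers see the flag
 * already set and leave the stored mode untouched.
 */
bool ns_write_mode(struct ns_state *ns, enum ns_mode mode)
{
	assert(ns);

	/* atomic_exchange() returns the previous value of the flag. */
	if (atomic_exchange(&ns->mode_locked, 1))
		return false;

	atomic_store(&ns->mode, mode);
	return true;
}

enum ns_mode ns_read_mode(struct ns_state *ns)
{
	assert(ns);
	return (enum ns_mode)atomic_load(&ns->mode);
}
```

A zero-initialized struct ns_state starts in global mode, matching the default described in this series. The first ns_write_mode() call wins and every later call returns false.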
If a socket or VM is created when a namespace is global but the namespace changes to local, the socket or VM will continue working normally. That is, the socket or VM assumes the mode behavior of the namespace at the time the socket/VM was created. The original mode is captured in vsock_create() and so occurs at the time of socket(2) and accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This prevents a socket/VM connection from suddenly breaking due to a namespace mode change. Any new sockets/VMs created after the mode change will adopt the new mode's behavior.
Additionally, this series adds tests for the new namespace features:

tools/testing/selftests/vsock/vmtest.sh
1..29
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_guest_local_mode_rejected
ok 5 ns_host_vsock_ns_mode_ok
ok 6 ns_host_vsock_ns_mode_write_once_ok
ok 7 ns_global_same_cid_fails
ok 8 ns_local_same_cid_ok
ok 9 ns_global_local_same_cid_ok
ok 10 ns_local_global_same_cid_ok
ok 11 ns_diff_global_host_connect_to_global_vm_ok
ok 12 ns_diff_global_host_connect_to_local_vm_fails
ok 13 ns_diff_global_vm_connect_to_global_host_ok
ok 14 ns_diff_global_vm_connect_to_local_host_fails
ok 15 ns_diff_local_host_connect_to_local_vm_fails
ok 16 ns_diff_local_vm_connect_to_local_host_fails
ok 17 ns_diff_global_to_local_loopback_local_fails
ok 18 ns_diff_local_to_global_loopback_fails
ok 19 ns_diff_local_to_local_loopback_fails
ok 20 ns_diff_global_to_global_loopback_ok
ok 21 ns_same_local_loopback_ok
ok 22 ns_same_local_host_connect_to_local_vm_ok
ok 23 ns_same_local_vm_connect_to_local_host_ok
ok 24 ns_mode_change_connection_continue_vm_ok
ok 25 ns_mode_change_connection_continue_host_ok
ok 26 ns_mode_change_connection_continue_both_ok
ok 27 ns_delete_vm_ok
ok 28 ns_delete_host_ok
ok 29 ns_delete_both_ok
SUMMARY: PASS=29 SKIP=0 FAIL=0
Dependent on series: https://lore.kernel.org/all/20251108-vsock-selftests-fixes-and-improvements-...
Thanks again for everyone's help and reviews!
Suggested-by: Sargun Dhillon sargun@sargun.me
Signed-off-by: Bobby Eshleman bobbyeshleman@gmail.com

To: Stefano Garzarella sgarzare@redhat.com
To: Shuah Khan shuah@kernel.org
To: David S. Miller davem@davemloft.net
To: Eric Dumazet edumazet@google.com
To: Jakub Kicinski kuba@kernel.org
To: Paolo Abeni pabeni@redhat.com
To: Simon Horman horms@kernel.org
To: Stefan Hajnoczi stefanha@redhat.com
To: Michael S. Tsirkin mst@redhat.com
To: Jason Wang jasowang@redhat.com
To: Xuan Zhuo xuanzhuo@linux.alibaba.com
To: Eugenio Pérez eperezma@redhat.com
To: K. Y. Srinivasan kys@microsoft.com
To: Haiyang Zhang haiyangz@microsoft.com
To: Wei Liu wei.liu@kernel.org
To: Dexuan Cui decui@microsoft.com
To: Bryan Tan bryan-bt.tan@broadcom.com
To: Vishnu Dasa vishnu.dasa@broadcom.com
To: Broadcom internal kernel review list bcm-kernel-feedback-list@broadcom.com
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: berrange@redhat.com
Cc: Sargun Dhillon sargun@sargun.me
Changes in v9:
- reorder loopback patch after patch for virtio transport common code
- remove module ordering tests patch because loopback no longer depends on pernet ops
- major simplifications in vsock_loopback
- added a new patch for blocking local mode for guests, added test case to check
- add net ref tracking to vsock_loopback patch
- Link to v8: https://lore.kernel.org/r/20251023-vsock-vmtest-v8-0-dea984d02bb0@meta.com

Changes in v8:
- Break generic cleanup/refactoring patches into standalone series, remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements-...
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com

Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com

Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger language around CID rules (don't use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com

Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com

Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context in commit message)
- lots of cleanup

Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/

---
Bobby Eshleman (14):
      vsock: a per-net vsock NS mode state
      vsock: add netns to vsock core
      vsock/virtio: add netns support to virtio transport and virtio common
      vsock/virtio: pack struct virtio_vsock_skb_cb
      vsock: add netns and netns_tracker to vsock skb cb
      vsock/loopback: add netns support
      vhost/vsock: add netns support
      vsock: reject bad VSOCK_NET_MODE_LOCAL configuration for G2H
      selftests/vsock: add namespace helpers to vmtest.sh
      selftests/vsock: prepare vm management helpers for namespaces
      selftests/vsock: add tests for proc sys vsock ns_mode
      selftests/vsock: add namespace tests for CID collisions
      selftests/vsock: add tests for host <-> vm connectivity with namespaces
      selftests/vsock: add tests for namespace deletion and mode changes

 MAINTAINERS                             |   1 +
 drivers/vhost/vsock.c                   |  48 +-
 include/linux/virtio_vsock.h            |  43 +-
 include/net/af_vsock.h                  |  57 +-
 include/net/net_namespace.h             |   4 +
 include/net/netns/vsock.h               |  17 +
 net/vmw_vsock/af_vsock.c                | 290 +++++++++-
 net/vmw_vsock/hyperv_transport.c        |   1 +
 net/vmw_vsock/virtio_transport.c        |  14 +-
 net/vmw_vsock/virtio_transport_common.c |  57 +-
 net/vmw_vsock/vsock_loopback.c          |  48 +-
 tools/testing/selftests/vsock/vmtest.sh | 931 ++++++++++++++++++++++++++++++--
 12 files changed, 1418 insertions(+), 93 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: 20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463@meta.com
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
Best regards,
From: Bobby Eshleman bobbyeshleman@meta.com
Add the per-net vsock NS mode state. This only adds the structure for holding the mode and some of the functions for setting/getting and checking the mode, but does not integrate the functionality yet.
A "net_mode" field is added to vsock_sock to store the mode of the namespace when the vsock_sock was created. In order to evaluate namespace mode rules we need to know both a) which namespace the endpoints are in, and b) what mode that namespace had when the endpoints were created. This allows us to handle the changing of modes from global to local *after* a socket has been created by remembering that the mode was global when the socket was created. If we were to use the current net's mode instead, then the lookup would fail and the socket would break.
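To make the snapshot argument concrete, here is a small userspace model. It is a sketch only: struct netns, struct endpoint, endpoint_create() and endpoints_reachable() are illustrative names, and the reachability rule mirrors the two cases described above (same namespace, or both endpoints created while their namespaces were global).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

/* Illustrative stand-ins, not kernel types. */
struct netns {
	enum ns_mode mode;       /* current mode of the namespace */
};

struct endpoint {
	const struct netns *ns;  /* namespace the endpoint lives in */
	enum ns_mode net_mode;   /* mode of 'ns' when the endpoint was created */
};

/* Snapshot the namespace mode at creation time. */
struct endpoint endpoint_create(const struct netns *ns)
{
	assert(ns != NULL);
	return (struct endpoint){ .ns = ns, .net_mode = ns->mode };
}

/* Same namespace: always reachable. Different namespaces: reachable only
 * if both endpoints were created in global mode.
 */
bool endpoints_reachable(const struct endpoint *a, const struct endpoint *b)
{
	assert(a && b);

	if (a->ns == b->ns)
		return true;

	return a->net_mode == NS_MODE_GLOBAL && b->net_mode == NS_MODE_GLOBAL;
}
```

In this model, an endpoint created while its namespace was global keeps reaching global endpoints in other namespaces even after the namespace's current mode is flipped to local, while an endpoint created after the flip does not.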
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
Changes in v9:
- use xchg(), WRITE_ONCE(), READ_ONCE() for mode and mode_locked (Stefano)
- clarify mode0/mode1 meaning in vsock_net_check_mode() comment
- remove spin lock in net->vsock (not used anymore)
- change mode from u8 to enum vsock_net_mode in vsock_net_write_mode()

Changes in v7:
- clarify vsock_net_check_mode() comments
- change to `orig_net_mode == VSOCK_NET_MODE_GLOBAL && orig_net_mode == vsk->orig_net_mode`
- remove extraneous explanation of `orig_net_mode`
- rename `written` to `mode_locked`
- rename `vsock_hdr` to `sysctl_hdr`
- change `orig_net_mode` to `net_mode`
- make vsock_net_check_mode() more generic by taking just net pointers and modes, instead of a vsock_sock ptr, for reuse by transports (e.g., vhost_vsock)

Changes in v6:
- add orig_net_mode to store mode at creation time which will be used to avoid breakage when namespace changes mode during socket/VM lifespan

Changes in v5:
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- change from net->vsock.ns_mode to net->vsock.mode
- change vsock_net_set_mode() to vsock_net_write_mode()
- vsock_net_write_mode() returns bool for write success to avoid need to use vsock_net_mode_can_set()
- remove vsock_net_mode_can_set()
---
 MAINTAINERS                 |  1 +
 include/net/af_vsock.h      | 41 +++++++++++++++++++++++++++++++++++++++++
 include/net/net_namespace.h |  4 ++++
 include/net/netns/vsock.h   | 17 +++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dc4aa37d903..15c590a571f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27098,6 +27098,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/vhost/vsock.c
 F:	include/linux/virtio_vsock.h
+F:	include/net/netns/vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport.c
 F:	net/vmw_vsock/virtio_transport_common.c
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d40e978126e3..f3c3f74355e8 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,6 +10,7 @@
 
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <net/netns/vsock.h>
 #include <net/sock.h>
 #include <uapi/linux/vm_sockets.h>
 
@@ -65,6 +66,7 @@ struct vsock_sock {
 	u32 peer_shutdown;
 	bool sent_request;
 	bool ignore_connecting_rst;
+	enum vsock_net_mode net_mode;
 
 	/* Protected by lock_sock(sk) */
 	u64 buffer_size;
@@ -256,4 +258,43 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
 {
 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
 }
+
+static inline enum vsock_net_mode vsock_net_mode(struct net *net)
+{
+	return READ_ONCE(net->vsock.mode);
+}
+
+static inline bool vsock_net_write_mode(struct net *net, enum vsock_net_mode mode)
+{
+	if (xchg(&net->vsock.mode_locked, true))
+		return false;
+
+	WRITE_ONCE(net->vsock.mode, mode);
+	return true;
+}
+
+/* Return true if two namespaces and modes pass the mode rules. Otherwise,
+ * return false.
+ *
+ * - ns0 and ns1 are the namespaces being checked.
+ * - mode0 and mode1 are the vsock namespace modes of ns0 and ns1 at the time
+ *   the vsock objects were created.
+ *
+ * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
+ */
+static inline bool vsock_net_check_mode(struct net *ns0, enum vsock_net_mode mode0,
+					struct net *ns1, enum vsock_net_mode mode1)
+{
+	/* Any vsocks within the same network namespace are always reachable,
+	 * regardless of the mode.
+	 */
+	if (net_eq(ns0, ns1))
+		return true;
+
+	/*
+	 * If the network namespaces differ, vsocks are only reachable if both
+	 * were created in VSOCK_NET_MODE_GLOBAL mode.
+	 */
+	return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
+}
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index cb664f6e3558..66d3de1d935f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,7 @@
 #include <net/netns/smc.h>
 #include <net/netns/bpf.h>
 #include <net/netns/mctp.h>
+#include <net/netns/vsock.h>
 #include <net/net_trackers.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
@@ -196,6 +197,9 @@ struct net {
 	/* Move to a better place when the config guard is removed. */
 	struct mutex rtnl_mutex;
 #endif
+#if IS_ENABLED(CONFIG_VSOCKETS)
+	struct netns_vsock vsock;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
new file mode 100644
index 000000000000..21189d7bdd4e
--- /dev/null
+++ b/include/net/netns/vsock.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_NET_NAMESPACE_VSOCK_H
+#define __NET_NET_NAMESPACE_VSOCK_H
+
+#include <linux/types.h>
+
+enum vsock_net_mode {
+	VSOCK_NET_MODE_GLOBAL,
+	VSOCK_NET_MODE_LOCAL,
+};
+
+struct netns_vsock {
+	struct ctl_table_header *sysctl_hdr;
+	enum vsock_net_mode mode;
+	bool mode_locked;
+};
+#endif /* __NET_NET_NAMESPACE_VSOCK_H */

On Tue, Nov 11, 2025 at 10:54:43PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add the per-net vsock NS mode state. This only adds the structure for holding the mode and some of the functions for setting/getting and checking the mode, but does not integrate the functionality yet.
A "net_mode" field is added to vsock_sock to store the mode of the namespace when the vsock_sock was created. In order to evaluate namespace mode rules we need to know both a) which namespace the endpoints are in, and b) what mode that namespace had when the endpoints were created. This allows us to handle the changing of modes from global to local *after* a socket has been created by remembering that the mode was global when the socket was created. If we were to use the current net's mode instead, then the lookup would fail and the socket would break.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- use xchg(), WRITE_ONCE(), READ_ONCE() for mode and mode_locked (Stefano)
- clarify mode0/mode1 meaning in vsock_net_check_mode() comment
- remove spin lock in net->vsock (not used anymore)
- change mode from u8 to enum vsock_net_mode in vsock_net_write_mode()
Changes in v7:
- clarify vsock_net_check_mode() comments
- change to `orig_net_mode == VSOCK_NET_MODE_GLOBAL && orig_net_mode == vsk->orig_net_mode`
- remove extraneous explanation of `orig_net_mode`
- rename `written` to `mode_locked`
- rename `vsock_hdr` to `sysctl_hdr`
- change `orig_net_mode` to `net_mode`
- make vsock_net_check_mode() more generic by taking just net pointers
and modes, instead of a vsock_sock ptr, for reuse by transports (e.g., vhost_vsock)
Changes in v6:
- add orig_net_mode to store mode at creation time which will be used to
avoid breakage when namespace changes mode during socket/VM lifespan
Changes in v5:
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- change from net->vsock.ns_mode to net->vsock.mode
- change vsock_net_set_mode() to vsock_net_write_mode()
- vsock_net_write_mode() returns bool for write success to avoid
need to use vsock_net_mode_can_set()
- remove vsock_net_mode_can_set()
 MAINTAINERS                 |  1 +
 include/net/af_vsock.h      | 41 +++++++++++++++++++++++++++++++++++++++++
 include/net/net_namespace.h |  4 ++++
 include/net/netns/vsock.h   | 17 +++++++++++++++++
 4 files changed, 63 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0dc4aa37d903..15c590a571f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -27098,6 +27098,7 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 F:	drivers/vhost/vsock.c
 F:	include/linux/virtio_vsock.h
+F:	include/net/netns/vsock.h
 F:	include/uapi/linux/virtio_vsock.h
 F:	net/vmw_vsock/virtio_transport.c
 F:	net/vmw_vsock/virtio_transport_common.c
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d40e978126e3..f3c3f74355e8 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -10,6 +10,7 @@
 
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
+#include <net/netns/vsock.h>
 #include <net/sock.h>
 #include <uapi/linux/vm_sockets.h>
 
@@ -65,6 +66,7 @@ struct vsock_sock {
 	u32 peer_shutdown;
 	bool sent_request;
 	bool ignore_connecting_rst;
+	enum vsock_net_mode net_mode;
 
 	/* Protected by lock_sock(sk) */
 	u64 buffer_size;
@@ -256,4 +258,43 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
 {
 	return t->msgzerocopy_allow && t->msgzerocopy_allow();
 }
+
+static inline enum vsock_net_mode vsock_net_mode(struct net *net)
+{
+	return READ_ONCE(net->vsock.mode);
+}
+
+static inline bool vsock_net_write_mode(struct net *net, enum vsock_net_mode mode)
+{
+	if (xchg(&net->vsock.mode_locked, true))

LGTM, but it seems that some architectures don't support xchg on 1 byte, e.g. see commit d66a65b7f5d2 ("scsi: elx: efct: Fix link error for _bad_cmpxchg")

So maybe we just need to change the type of mode_locked to int.

The rest LGTM.

Stefano

+		return false;
+
+	WRITE_ONCE(net->vsock.mode, mode);
+	return true;
+}
+
+/* Return true if two namespaces and modes pass the mode rules. Otherwise,
+ * return false.
+ *
+ * - ns0 and ns1 are the namespaces being checked.
+ * - mode0 and mode1 are the vsock namespace modes of ns0 and ns1 at the time
+ *   the vsock objects were created.
+ *
+ * Read more about modes in the comment header of net/vmw_vsock/af_vsock.c.
+ */
+static inline bool vsock_net_check_mode(struct net *ns0, enum vsock_net_mode mode0,
+					struct net *ns1, enum vsock_net_mode mode1)
+{
+	/* Any vsocks within the same network namespace are always reachable,
+	 * regardless of the mode.
+	 */
+	if (net_eq(ns0, ns1))
+		return true;
+
+	/*
+	 * If the network namespaces differ, vsocks are only reachable if both
+	 * were created in VSOCK_NET_MODE_GLOBAL mode.
+	 */
+	return mode0 == VSOCK_NET_MODE_GLOBAL && mode0 == mode1;
+}
 #endif /* __AF_VSOCK_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index cb664f6e3558..66d3de1d935f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -37,6 +37,7 @@
 #include <net/netns/smc.h>
 #include <net/netns/bpf.h>
 #include <net/netns/mctp.h>
+#include <net/netns/vsock.h>
 #include <net/net_trackers.h>
 #include <linux/ns_common.h>
 #include <linux/idr.h>
@@ -196,6 +197,9 @@ struct net {
 	/* Move to a better place when the config guard is removed. */
 	struct mutex rtnl_mutex;
 #endif
+#if IS_ENABLED(CONFIG_VSOCKETS)
+	struct netns_vsock vsock;
+#endif
 } __randomize_layout;
 
 #include <linux/seq_file_net.h>
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
new file mode 100644
index 000000000000..21189d7bdd4e
--- /dev/null
+++ b/include/net/netns/vsock.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_NET_NAMESPACE_VSOCK_H
+#define __NET_NET_NAMESPACE_VSOCK_H
+
+#include <linux/types.h>
+
+enum vsock_net_mode {
+	VSOCK_NET_MODE_GLOBAL,
+	VSOCK_NET_MODE_LOCAL,
+};
+
+struct netns_vsock {
+	struct ctl_table_header *sysctl_hdr;
+	enum vsock_net_mode mode;
+	bool mode_locked;
+};
+#endif /* __NET_NET_NAMESPACE_VSOCK_H */
-- 2.47.3
From: Bobby Eshleman bobbyeshleman@meta.com
Add netns logic to vsock core. Additionally, modify transport hook prototypes to be used by later transport-specific patches (e.g., *_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions (e.g., vsock_find_connected_socket()) to take into account the socket namespace and the namespace mode before considering a candidate socket a "match".
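The effect on lookups can be sketched with a simplified userspace model. This is not the kernel implementation: it uses a flat array instead of the hashed socket tables, ignores VMADDR_CID_ANY matching, and all names (struct bound_sock, mode_allows(), find_bound()) are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct netns { int id; };            /* opaque namespace identity */

struct bound_sock {
	unsigned int cid;
	unsigned int port;
	const struct netns *ns;      /* namespace of the socket */
	enum ns_mode net_mode;       /* mode captured at socket creation */
};

/* Same namespace always matches. Otherwise both sides must be global. */
static bool mode_allows(const struct netns *ns0, enum ns_mode mode0,
			const struct netns *ns1, enum ns_mode mode1)
{
	if (ns0 == ns1)
		return true;

	return mode0 == NS_MODE_GLOBAL && mode1 == NS_MODE_GLOBAL;
}

/* An address match alone is not enough: the candidate must also pass the
 * namespace/mode check against the caller's (ns, mode) pair.
 */
const struct bound_sock *find_bound(const struct bound_sock *tbl, size_t n,
				    unsigned int cid, unsigned int port,
				    const struct netns *ns, enum ns_mode mode)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cid == cid && tbl[i].port == port &&
		    mode_allows(tbl[i].ns, tbl[i].net_mode, ns, mode))
			return &tbl[i];
	}

	return NULL;
}
```

With this rule, two local namespaces can each hold a socket bound to the same CID and port, and each lookup resolves to the caller's own namespace. Passing ns = NULL together with NS_MODE_GLOBAL, as the non-namespaced wrappers in this patch do with net = NULL and VSOCK_NET_MODE_GLOBAL, matches only sockets created in global mode.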
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that accepts the "global" or "local" mode strings.
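For illustration, the string-to-mode mapping can be sketched in plain C as below. parse_ns_mode() is a hypothetical userspace function, not the sysctl handler from this patch; tolerating one trailing newline is an assumption made here so that input such as the output of `echo local` parses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

/* Map "global" or "local" (optionally followed by a single '\n') to a
 * mode. Returns false and leaves *out untouched for anything else.
 */
bool parse_ns_mode(const char *s, enum ns_mode *out)
{
	size_t len;

	assert(s && out);

	len = strlen(s);
	if (len > 0 && s[len - 1] == '\n')
		len--;

	if (len == strlen("global") && memcmp(s, "global", len) == 0) {
		*out = NS_MODE_GLOBAL;
		return true;
	}

	if (len == strlen("local") && memcmp(s, "local", len) == 0) {
		*out = NS_MODE_LOCAL;
		return true;
	}

	return false;
}
```

Comparing the length as well as the bytes rejects prefixes such as "loc" and over-long strings such as "globalx".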
Add netns functionality (initialization, passing to transports, procfs, etc...) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch.
seqpacket_allow() callbacks are modified to take a vsk so that transport implementations can inspect sock_net(sk) and vsk->net_mode when performing lookups (e.g., vhost does this in its future netns patch). Because the API change affects all transports, it seemed more appropriate to make this internal API change in the "vsock core" patch than in the "vhost" patch.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
Changes in v9:
- remove virtio_vsock_alloc_rx_skb() (Stefano)
- remove vsock_global_dummy_net, not needed as net=NULL + net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result

Changes in v7:
- hv_sock: fix hyperv build error
- explain why vhost does not use the dummy
- explain usage of __vsock_global_dummy_net
- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
- use switch-case in vsock_net_mode_string()
- avoid changing transports as much as possible
- add vsock_find_{bound,connected}_socket_net()
- rename `vsock_hdr` to `sysctl_hdr`
- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and global mode for virtio-vsock, move skb->cb zero-ing into wrapper
- explain seqpacket_allow() change
- move net setting to __vsock_create() instead of vsock_create() so that child sockets also have their net assigned upon accept()

Changes in v6:
- unregister sysctl ops in vsock_exit()
- af_vsock: clarify description of CID behavior
- af_vsock: fix buf vs buffer naming, and length checking
- af_vsock: fix length checking w/ correct ctl_table->maxlen

Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
---
 drivers/vhost/vsock.c            |   4 +-
 include/net/af_vsock.h           |   8 +-
 net/vmw_vsock/af_vsock.c         | 251 ++++++++++++++++++++++++++++++++++++---
 net/vmw_vsock/virtio_transport.c |   4 +-
 net/vmw_vsock/vsock_loopback.c   |   4 +-
 5 files changed, 247 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index ae01457ea2cd..34adf0cf9124 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void) return true; }
-static bool vhost_transport_seqpacket_allow(u32 remote_cid); +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport vhost_transport = { .transport = { @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = { .send_pkt = vhost_transport_send_pkt, };
-static bool vhost_transport_seqpacket_allow(u32 remote_cid) +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { struct vhost_vsock *vsock; bool seqpacket_allow = false; diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index f3c3f74355e8..cfd121bb5ab7 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -145,7 +145,7 @@ struct vsock_transport { int flags); int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg, size_t len); - bool (*seqpacket_allow)(u32 remote_cid); + bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid); u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */ @@ -218,6 +218,12 @@ void vsock_remove_connected(struct vsock_sock *vsk); struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr); struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, struct sockaddr_vm *dst); +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr, struct net *net, + enum vsock_net_mode net_mode); +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src, + struct sockaddr_vm *dst, + struct net *net, + enum vsock_net_mode net_mode); void vsock_remove_sock(struct vsock_sock *vsk); void vsock_for_each_connected_socket(struct vsock_transport *transport, void (*fn)(struct sock *sk)); diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 72bb6b7ed386..c0b5946bdc95 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -83,6 +83,35 @@ * TCP_ESTABLISHED - connected * TCP_CLOSING - disconnecting * TCP_LISTEN - listening + * + * - Namespaces in vsock support two different modes configured + * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global". + * Each mode defines how the namespace interacts with CIDs. + * /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured + * and locked down by a namespace manager. The default is "global". The mode + * is set per-namespace. 
+ * + * The modes affect the allocation and accessibility of CIDs as follows: + * + * - global - access and allocation are all system-wide + * - all CID allocation from global namespaces draw from the same + * system-wide pool + * - if one global namespace has already allocated some CID, another + * global namespace will not be able to allocate the same CID + * - global mode AF_VSOCK sockets can reach any VM or socket in any global + * namespace, they are not contained to only their own namespace + * - AF_VSOCK sockets in a global mode namespace cannot reach VMs or + * sockets in any local mode namespace + * - local - access and allocation are contained within the namespace + * - CID allocation draws only from a private pool local only to the + * namespace, and does not affect the CIDs available for allocation in any + * other namespace (global or local) + * - VMs in a local namespace do not collide with CIDs in any other local + * namespace or any global namespace. For example, if a VM in a local mode + * namespace is given CID 10, then CID 10 is still available for + * allocation in any other namespace, but not in the same namespace + * - AF_VSOCK sockets in a local mode namespace can connect only to VMs or + * other sockets within their own namespace. */
#include <linux/compat.h> @@ -100,6 +129,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/net.h> +#include <linux/proc_fs.h> #include <linux/poll.h> #include <linux/random.h> #include <linux/skbuff.h> @@ -111,9 +141,18 @@ #include <linux/workqueue.h> #include <net/sock.h> #include <net/af_vsock.h> +#include <net/netns/vsock.h> #include <uapi/linux/vm_sockets.h> #include <uapi/asm-generic/ioctls.h>
+#define VSOCK_NET_MODE_STR_GLOBAL "global" +#define VSOCK_NET_MODE_STR_LOCAL "local" + +/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'. + * The newline is added by proc_dostring() for read operations. + */ +#define VSOCK_NET_MODE_STR_MAX 8 + static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); static void vsock_sk_destruct(struct sock *sk); static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); @@ -235,33 +274,44 @@ static void __vsock_remove_connected(struct vsock_sock *vsk) sock_put(&vsk->sk); }
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) +static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr, + struct net *net, + enum vsock_net_mode net_mode) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) { - if (vsock_addr_equals_addr(addr, &vsk->local_addr)) - return sk_vsock(vsk); + struct sock *sk = sk_vsock(vsk); + + if (vsock_addr_equals_addr(addr, &vsk->local_addr) && + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode)) + return sk;
if (addr->svm_port == vsk->local_addr.svm_port && (vsk->local_addr.svm_cid == VMADDR_CID_ANY || - addr->svm_cid == VMADDR_CID_ANY)) - return sk_vsock(vsk); + addr->svm_cid == VMADDR_CID_ANY) && + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode)) + return sk; }
return NULL; }
-static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) +static struct sock *__vsock_find_connected_socket_net(struct sockaddr_vm *src, + struct sockaddr_vm *dst, + struct net *net, + enum vsock_net_mode net_mode) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_connected_sockets(src, dst), connected_table) { + struct sock *sk = sk_vsock(vsk); + if (vsock_addr_equals_addr(src, &vsk->remote_addr) && - dst->svm_port == vsk->local_addr.svm_port) { - return sk_vsock(vsk); + dst->svm_port == vsk->local_addr.svm_port && + vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode)) { + return sk; } }
@@ -304,12 +354,14 @@ void vsock_remove_connected(struct vsock_sock *vsk) } EXPORT_SYMBOL_GPL(vsock_remove_connected);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) +struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr, + struct net *net, + enum vsock_net_mode net_mode) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_bound_socket(addr); + sk = __vsock_find_bound_socket_net(addr, net, net_mode); if (sk) sock_hold(sk);
@@ -317,15 +369,23 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
return sk; } +EXPORT_SYMBOL_GPL(vsock_find_bound_socket_net); + +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) +{ + return vsock_find_bound_socket_net(addr, NULL, VSOCK_NET_MODE_GLOBAL); +} EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
-struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) +struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src, + struct sockaddr_vm *dst, + struct net *net, + enum vsock_net_mode net_mode) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_connected_socket(src, dst); + sk = __vsock_find_connected_socket_net(src, dst, net, net_mode); if (sk) sock_hold(sk);
@@ -333,6 +393,14 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
return sk; } +EXPORT_SYMBOL_GPL(vsock_find_connected_socket_net); + +struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, + struct sockaddr_vm *dst) +{ + return vsock_find_connected_socket_net(src, dst, + NULL, VSOCK_NET_MODE_GLOBAL); +} EXPORT_SYMBOL_GPL(vsock_find_connected_socket);
void vsock_remove_sock(struct vsock_sock *vsk) @@ -528,7 +596,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
if (sk->sk_type == SOCK_SEQPACKET) { if (!new_transport->seqpacket_allow || - !new_transport->seqpacket_allow(remote_cid)) { + !new_transport->seqpacket_allow(vsk, remote_cid)) { module_put(new_transport->module); return -ESOCKTNOSUPPORT; } @@ -676,6 +744,7 @@ static void vsock_pending_work(struct work_struct *work) static int __vsock_bind_connectible(struct vsock_sock *vsk, struct sockaddr_vm *addr) { + struct net *net = sock_net(sk_vsock(vsk)); static u32 port; struct sockaddr_vm new_addr;
@@ -695,7 +764,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
new_addr.svm_port = port++;
- if (!__vsock_find_bound_socket(&new_addr)) { + if (!__vsock_find_bound_socket_net(&new_addr, net, + vsk->net_mode)) { found = true; break; } @@ -712,7 +782,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, return -EACCES; }
- if (__vsock_find_bound_socket(&new_addr)) + if (__vsock_find_bound_socket_net(&new_addr, net, + vsk->net_mode)) return -EADDRINUSE; }
@@ -836,6 +907,8 @@ static struct sock *__vsock_create(struct net *net, vsk->buffer_max_size = VSOCK_DEFAULT_BUFFER_MAX_SIZE; }
+ vsk->net_mode = vsock_net_mode(net); + return sk; }
@@ -2636,6 +2709,141 @@ static struct miscdevice vsock_device = { .fops = &vsock_device_ops, };
+static int vsock_net_mode_string(const struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + char data[VSOCK_NET_MODE_STR_MAX] = {0}; + enum vsock_net_mode mode; + struct ctl_table tmp; + struct net *net; + int ret; + + if (!table->data || !table->maxlen || !*lenp) { + *lenp = 0; + return 0; + } + + net = current->nsproxy->net_ns; + tmp = *table; + tmp.data = data; + + if (!write) { + const char *p; + + mode = vsock_net_mode(net); + + switch (mode) { + case VSOCK_NET_MODE_GLOBAL: + p = VSOCK_NET_MODE_STR_GLOBAL; + break; + case VSOCK_NET_MODE_LOCAL: + p = VSOCK_NET_MODE_STR_LOCAL; + break; + default: + WARN_ONCE(true, "netns has invalid vsock mode"); + *lenp = 0; + return 0; + } + + strscpy(data, p, sizeof(data)); + tmp.maxlen = strlen(p); + } + + ret = proc_dostring(&tmp, write, buffer, lenp, ppos); + if (ret) + return ret; + + if (write) { + if (*lenp >= sizeof(data)) + return -EINVAL; + + if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) + mode = VSOCK_NET_MODE_GLOBAL; + else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) + mode = VSOCK_NET_MODE_LOCAL; + else + return -EINVAL; + + if (!vsock_net_write_mode(net, mode)) + return -EPERM; + } + + return 0; +} + +static struct ctl_table vsock_table[] = { + { + .procname = "ns_mode", + .data = &init_net.vsock.mode, + .maxlen = VSOCK_NET_MODE_STR_MAX, + .mode = 0644, + .proc_handler = vsock_net_mode_string + }, +}; + +static int __net_init vsock_sysctl_register(struct net *net) +{ + struct ctl_table *table; + + if (net_eq(net, &init_net)) { + table = vsock_table; + } else { + table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL); + if (!table) + goto err_alloc; + + table[0].data = &net->vsock.mode; + } + + net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table, + ARRAY_SIZE(vsock_table)); + if (!net->vsock.sysctl_hdr) + goto err_reg; + + return 0; + +err_reg: + if (!net_eq(net, &init_net)) + kfree(table); +err_alloc: + return -ENOMEM; +} + 
+static void vsock_sysctl_unregister(struct net *net) +{ + const struct ctl_table *table; + + table = net->vsock.sysctl_hdr->ctl_table_arg; + unregister_net_sysctl_table(net->vsock.sysctl_hdr); + if (!net_eq(net, &init_net)) + kfree(table); +} + +static void vsock_net_init(struct net *net) +{ + net->vsock.mode = VSOCK_NET_MODE_GLOBAL; +} + +static __net_init int vsock_sysctl_init_net(struct net *net) +{ + vsock_net_init(net); + + if (vsock_sysctl_register(net)) + return -ENOMEM; + + return 0; +} + +static __net_exit void vsock_sysctl_exit_net(struct net *net) +{ + vsock_sysctl_unregister(net); +} + +static struct pernet_operations vsock_sysctl_ops __net_initdata = { + .init = vsock_sysctl_init_net, + .exit = vsock_sysctl_exit_net, +}; + static int __init vsock_init(void) { int err = 0; @@ -2663,10 +2871,18 @@ static int __init vsock_init(void) goto err_unregister_proto; }
+ if (register_pernet_subsys(&vsock_sysctl_ops)) { + err = -ENOMEM; + goto err_unregister_sock; + } + + vsock_net_init(&init_net); vsock_bpf_build_proto();
return 0;
+err_unregister_sock: + sock_unregister(AF_VSOCK); err_unregister_proto: proto_unregister(&vsock_proto); err_deregister_misc: @@ -2680,6 +2896,7 @@ static void __exit vsock_exit(void) misc_deregister(&vsock_device); sock_unregister(AF_VSOCK); proto_unregister(&vsock_proto); + unregister_pernet_subsys(&vsock_sysctl_ops); }
const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk) diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 8c867023a2e5..f92f23be3f59 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -536,7 +536,7 @@ static bool virtio_transport_msgzerocopy_allow(void) return true; }
-static bool virtio_transport_seqpacket_allow(u32 remote_cid); +static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport virtio_transport = { .transport = { @@ -593,7 +593,7 @@ static struct virtio_transport virtio_transport = { .can_msgzerocopy = virtio_transport_can_msgzerocopy, };
-static bool virtio_transport_seqpacket_allow(u32 remote_cid) +static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { struct virtio_vsock *vsock; bool seqpacket_allow; diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index bc2ff918b315..a8f218f0c5a3 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -46,7 +46,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) return 0; }
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid); +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid); static bool vsock_loopback_msgzerocopy_allow(void) { return true; @@ -106,7 +106,7 @@ static struct virtio_transport loopback_transport = { .send_pkt = vsock_loopback_send_pkt, };
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid) +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { return true; }
On Tue, Nov 11, 2025 at 10:54:44PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add netns logic to vsock core. Additionally, modify transport hook prototypes to be used by later transport-specific patches (e.g., *_seqpacket_allow()).
Namespaces are supported primarily by changing socket lookup functions (e.g., vsock_find_connected_socket()) to take into account the socket namespace and the namespace mode before considering a candidate socket a "match".
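The matching rule can be modeled in user space. The sketch below is not the kernel code: vsock_net_check_mode() is called by this patch but its body is not part of this diff, so ns_mode_allows() is a hypothetical stand-in derived from the mode rules documented in the af_vsock.c comment (two global-mode parties can reach each other across namespaces, everything else must stay within one namespace). All type and function names here are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
enum ns_mode { NS_MODE_GLOBAL, NS_MODE_LOCAL };

struct ns { int id; };

struct model_sock {
	const struct ns *net;	/* namespace the socket was created in */
	enum ns_mode mode;	/* mode captured when the socket was created */
	unsigned int cid;
	unsigned int port;
};

/*
 * Assumed rule: sockets in the same namespace always match; across
 * namespaces a match requires both sides to be in global mode.
 */
static bool ns_mode_allows(const struct ns *sk_net, enum ns_mode sk_mode,
			   const struct ns *net, enum ns_mode mode)
{
	if (sk_net == net)
		return true;
	return sk_mode == NS_MODE_GLOBAL && mode == NS_MODE_GLOBAL;
}

/* Address match plus namespace check, mirroring the shape of the lookup. */
static const struct model_sock *
find_bound(const struct model_sock *tbl, size_t n,
	   unsigned int cid, unsigned int port,
	   const struct ns *net, enum ns_mode mode)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cid == cid && tbl[i].port == port &&
		    ns_mode_allows(tbl[i].net, tbl[i].mode, net, mode))
			return &tbl[i];
	}
	return NULL;
}
```

Under this model, passing net = NULL with global mode (as the non-_net wrappers in this patch do) can only match sockets that are themselves in global mode, since NULL never equals a socket's namespace.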
This patch also introduces the sysctl /proc/sys/net/vsock/ns_mode that accepts the "global" or "local" mode strings.
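From user space, the mode would be set by writing one of the two strings to that file. A minimal sketch, assuming a kernel that carries this series; vsock_set_ns_mode() is a hypothetical helper name, not part of the patch, and the path is taken as a parameter so the helper can be exercised against any file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Sysctl path introduced by this patch; absent on kernels without it. */
#define VSOCK_NS_MODE_PATH "/proc/sys/net/vsock/ns_mode"

/*
 * Hypothetical helper: write "global" or "local" to the given sysctl path.
 * Returns 0 on success or a negative errno value on failure.
 */
static int vsock_set_ns_mode(const char *path, const char *mode)
{
	FILE *f;
	int err = 0;

	assert(path != NULL && mode != NULL);

	/* Reject anything the handler would not accept anyway. */
	if (strcmp(mode, "global") != 0 && strcmp(mode, "local") != 0)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fputs(mode, f) == EOF)
		err = -errno;

	/* Buffered data reaches the kernel here, so check fclose() too. */
	if (fclose(f) == EOF && !err)
		err = -errno;

	return err;
}
```

A namespace manager might call vsock_set_ns_mode(VSOCK_NS_MODE_PATH, "local") once, right after creating the namespace. Because the handler in this patch returns -EPERM when vsock_net_write_mode() refuses the change, a later attempt to change the mode would be expected to come back from this helper as -EPERM.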
Add netns functionality (initialization, passing to transports, procfs, etc...) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch.
seqpacket_allow() callbacks are modified to take a vsk so that transport implementations can inspect sock_net(sk) and vsk->net_mode when performing lookups (e.g., vhost does this in its future netns patch). Because the API change affects all transports, it seemed more appropriate to make this internal API change in the "vsock core" patch than in the "vhost" patch.
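To illustrate why the extra argument is useful, here is a user-space sketch of the new callback shape. Everything in it is a hypothetical stand-in (the model_* types, the device table, and the visibility rule are assumptions for illustration); the real vhost lookup lives in a later patch that is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; not the kernel definitions. */
enum model_net_mode { MODEL_MODE_GLOBAL, MODEL_MODE_LOCAL };
struct model_net { int id; };

struct model_vsk {
	const struct model_net *net;	/* what sock_net(sk) would return */
	enum model_net_mode net_mode;	/* what vsk->net_mode would hold */
};

/* A per-guest device as a transport might track it (illustrative only). */
struct model_dev {
	const struct model_net *net;
	enum model_net_mode net_mode;
	uint32_t guest_cid;
	bool seqpacket_allow;
};

/* New-style callback type: the socket is passed alongside the CID. */
typedef bool (*seqpacket_allow_fn)(const struct model_vsk *vsk,
				   uint32_t remote_cid);

static const struct model_dev *model_devs;
static size_t model_ndevs;

/* Namespace-agnostic transport: ignores vsk, as loopback does here. */
static bool model_loopback_seqpacket_allow(const struct model_vsk *vsk,
					   uint32_t remote_cid)
{
	(void)vsk;
	(void)remote_cid;
	return true;
}

/* Namespace-aware transport: only devices visible to the socket count. */
static bool model_ns_seqpacket_allow(const struct model_vsk *vsk,
				     uint32_t remote_cid)
{
	assert(vsk != NULL);

	for (size_t i = 0; i < model_ndevs; i++) {
		const struct model_dev *d = &model_devs[i];
		bool visible = d->net == vsk->net ||
			       (d->net_mode == MODEL_MODE_GLOBAL &&
				vsk->net_mode == MODEL_MODE_GLOBAL);

		if (visible && d->guest_cid == remote_cid)
			return d->seqpacket_allow;
	}
	return false;
}
```

With only the CID available, two local-mode namespaces that both host a guest with CID 10 could not be told apart; passing the socket lets the callback pick the device that belongs to the caller's namespace.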
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Changes in v9:
- remove virtio_vsock_alloc_rx_skb() (Stefano)
- remove vsock_global_dummy_net, not needed as net=NULL +
net_mode=VSOCK_NET_MODE_GLOBAL achieves identical result
Changes in v7:
- hv_sock: fix hyperv build error
- explain why vhost does not use the dummy
- explain usage of __vsock_global_dummy_net
- explain why VSOCK_NET_MODE_STR_MAX is 8 characters
- use switch-case in vsock_net_mode_string()
- avoid changing transports as much as possible
- add vsock_find_{bound,connected}_socket_net()
- rename `vsock_hdr` to `sysctl_hdr`
- add virtio_vsock_alloc_linear_skb() wrapper for setting dummy net and
global mode for virtio-vsock, move skb->cb zero-ing into wrapper
- explain seqpacket_allow() change
- move net setting to __vsock_create() instead of vsock_create() so
that child sockets also have their net assigned upon accept()
Changes in v6:
- unregister sysctl ops in vsock_exit()
- af_vsock: clarify description of CID behavior
- af_vsock: fix buf vs buffer naming, and length checking
- af_vsock: fix length checking w/ correct ctl_table->maxlen
Changes in v5:
- vsock_global_net() -> vsock_global_dummy_net()
- update comments for new uAPI
- use /proc/sys/net/vsock/ns_mode instead of /proc/net/vsock_ns_mode
- add prototype changes so patch remains compilable
 drivers/vhost/vsock.c            |   4 +-
 include/net/af_vsock.h           |   8 +-
 net/vmw_vsock/af_vsock.c         | 251 ++++++++++++++++++++++++++++++++++++---
 net/vmw_vsock/virtio_transport.c |   4 +-
 net/vmw_vsock/vsock_loopback.c   |   4 +-
 5 files changed, 247 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index ae01457ea2cd..34adf0cf9124 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -404,7 +404,7 @@ static bool vhost_transport_msgzerocopy_allow(void) return true; }
-static bool vhost_transport_seqpacket_allow(u32 remote_cid); +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport vhost_transport = { .transport = { @@ -460,7 +460,7 @@ static struct virtio_transport vhost_transport = { .send_pkt = vhost_transport_send_pkt, };
-static bool vhost_transport_seqpacket_allow(u32 remote_cid) +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { struct vhost_vsock *vsock; bool seqpacket_allow = false; diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index f3c3f74355e8..cfd121bb5ab7 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -145,7 +145,7 @@ struct vsock_transport { int flags); int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg, size_t len);
-	bool (*seqpacket_allow)(u32 remote_cid);
+	bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid);
 	u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */
@@ -218,6 +218,12 @@ void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 					 struct sockaddr_vm *dst);
+struct sock *vsock_find_bound_socket_net(struct sockaddr_vm *addr, struct net *net,
+					 enum vsock_net_mode net_mode);
+struct sock *vsock_find_connected_socket_net(struct sockaddr_vm *src,
+					     struct sockaddr_vm *dst,
+					     struct net *net,
+					     enum vsock_net_mode net_mode);
 void vsock_remove_sock(struct vsock_sock *vsk);
 void vsock_for_each_connected_socket(struct vsock_transport *transport,
 				     void (*fn)(struct sock *sk));
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 72bb6b7ed386..c0b5946bdc95 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -83,6 +83,35 @@
  * - TCP_ESTABLISHED - connected
  * - TCP_CLOSING - disconnecting
  * - TCP_LISTEN - listening
+ *
+ * Namespaces in vsock support two different modes configured
+ * through /proc/sys/net/vsock/ns_mode. The modes are "local" and "global".
+ * Each mode defines how the namespace interacts with CIDs.
+ * /proc/sys/net/vsock/ns_mode is write-once, so that it may be configured
+ * and locked down by a namespace manager. The default is "global". The mode
+ * is set per-namespace.
+ *
+ * The modes affect the allocation and accessibility of CIDs as follows:
+ *
+ * - global - access and allocation are all system-wide
+ *   - all CID allocation from global namespaces draw from the same
+ *     system-wide pool
+ *   - if one global namespace has already allocated some CID, another
+ *     global namespace will not be able to allocate the same CID
+ *   - global mode AF_VSOCK sockets can reach any VM or socket in any global
+ *     namespace, they are not contained to only their own namespace
+ *   - AF_VSOCK sockets in a global mode namespace cannot reach VMs or
+ *     sockets in any local mode namespace
+ * - local - access and allocation are contained within the namespace
+ *   - CID allocation draws only from a private pool local only to the
+ *     namespace, and does not affect the CIDs available for allocation in any
+ *     other namespace (global or local)
+ *   - VMs in a local namespace do not collide with CIDs in any other local
+ *     namespace or any global namespace. For example, if a VM in a local mode
+ *     namespace is given CID 10, then CID 10 is still available for
+ *     allocation in any other namespace, but not in the same namespace
+ *   - AF_VSOCK sockets in a local mode namespace can connect only to VMs or
+ *     other sockets within their own namespace.
Should we also document what happens to pre-existing sockets/devices if the user changes the mode?
The rest LGTM!
Thanks, Stefano
*/
#include <linux/compat.h> @@ -100,6 +129,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/net.h> +#include <linux/proc_fs.h> #include <linux/poll.h> #include <linux/random.h> #include <linux/skbuff.h> @@ -111,9 +141,18 @@ #include <linux/workqueue.h> #include <net/sock.h> #include <net/af_vsock.h> +#include <net/netns/vsock.h> #include <uapi/linux/vm_sockets.h> #include <uapi/asm-generic/ioctls.h>
+#define VSOCK_NET_MODE_STR_GLOBAL "global"
+#define VSOCK_NET_MODE_STR_LOCAL "local"
+
+/* 6 chars for "global", 1 for null-terminator, and 1 more for '\n'.
+ * The newline is added by proc_dostring() for read operations.
+ */
+#define VSOCK_NET_MODE_STR_MAX 8
static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); static void vsock_sk_destruct(struct sock *sk); static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); @@ -235,33 +274,44 @@ static void __vsock_remove_connected(struct vsock_sock *vsk) sock_put(&vsk->sk); }
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
+static struct sock *__vsock_find_bound_socket_net(struct sockaddr_vm *addr,
+						  struct net *net,
+						  enum vsock_net_mode net_mode)
 {
 	struct vsock_sock *vsk;

 	list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) {
-		if (vsock_addr_equals_addr(addr, &vsk->local_addr))
-			return sk_vsock(vsk);
+		struct sock *sk = sk_vsock(vsk);
+
+		if (vsock_addr_equals_addr(addr, &vsk->local_addr) &&
+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
+			return sk;

 		if (addr->svm_port == vsk->local_addr.svm_port &&
 		    (vsk->local_addr.svm_cid == VMADDR_CID_ANY ||
-		     addr->svm_cid == VMADDR_CID_ANY))
-			return sk_vsock(vsk);
+		     addr->svm_cid == VMADDR_CID_ANY) &&
+		    vsock_net_check_mode(sock_net(sk), vsk->net_mode, net, net_mode))
+			return sk;
 	}

 	return NULL;
 }
-- 2.47.3
From: Bobby Eshleman <bobbyeshleman@meta.com>
Enable network namespace support in the virtio-vsock and common transport layer.
The changes include:

1. Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
   pointer for outgoing packets.
2. Add 'net' and 'net_mode' to t->send_pkt() and
   virtio_transport_recv_pkt() functions.
3. Modify callback functions to accept placeholder values (NULL and 0)
   for net and net_mode. The placeholders will be replaced when later
   patches in this series add namespace support to transports.
4. Set virtio-vsock to global mode unconditionally, instead of using
   placeholders. This is done in this patch because virtio-vsock won't
   have any additional changes to choose the net/net_mode, unlike the
   other transports, and it is no more complex than using placeholders.
5. Pass net and net_mode to virtio_transport_reset_no_sock() directly.
   This ensures that the outgoing RST packets are scoped based on the
   namespace of the receiver of the failed request.
6. Pass net and net_mode to socket lookup functions using
   vsock_find_{bound,connected}_socket_net().
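The two ways that net and net_mode reach t->send_pkt() can be sketched in user space. This is only a model under stated assumptions: the model_* names, the simplified send_pkt signature (an opcode instead of an skb), and the opcode constants are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel definitions. */
enum model_net_mode { MODEL_MODE_GLOBAL, MODEL_MODE_LOCAL };
enum model_op { MODEL_OP_REQUEST, MODEL_OP_RST };
struct model_net { int id; };

struct model_sock {
	const struct model_net *net;
	enum model_net_mode mode;
};

/* Mirrors the idea of virtio_vsock_pkt_info gaining net and net_mode. */
struct model_pkt_info {
	enum model_op op;
	const struct model_net *net;
	enum model_net_mode net_mode;
};

struct model_transport {
	int (*send_pkt)(enum model_op op, const struct model_net *net,
			enum model_net_mode net_mode);
};

/* Socket-backed path: the info struct carries the socket's net and mode. */
static int model_send_pkt_info(const struct model_transport *t,
			       const struct model_pkt_info *info)
{
	return t->send_pkt(info->op, info->net, info->net_mode);
}

static int model_connect(const struct model_transport *t,
			 const struct model_sock *sk)
{
	struct model_pkt_info info;

	assert(t != NULL && sk != NULL);
	info.op = MODEL_OP_REQUEST;
	info.net = sk->net;
	info.net_mode = sk->mode;
	return model_send_pkt_info(t, &info);
}

/* No-socket path: the caller supplies the receiver's net and mode. */
static int model_reset_no_sock(const struct model_transport *t,
			       const struct model_net *net,
			       enum model_net_mode net_mode)
{
	assert(t != NULL);
	return t->send_pkt(MODEL_OP_RST, net, net_mode);
}

/* Recording stub standing in for a transport's send_pkt(). */
static struct model_pkt_info last_sent;

static int record_send_pkt(enum model_op op, const struct model_net *net,
			   enum model_net_mode net_mode)
{
	last_sent.op = op;
	last_sent.net = net;
	last_sent.net_mode = net_mode;
	return 0;
}
```

In this model, a transport forced into global mode (item 4) corresponds to calling model_reset_no_sock() with net = NULL and MODEL_MODE_GLOBAL, while a namespace-aware receiver would pass its own namespace so that the RST stays scoped to it (item 5).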
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v9:
- include/virtio_vsock.h: send_pkt() cb takes net and net_mode
- virtio_transport reset_no_sock() takes net and net_mode
- vhost-vsock: add placeholders to recv_pkt() for compilation
- loopback: add placeholders to recv_pkt() for compilation
- remove skb->cb net/net_mode usage, pass as arguments to
  t->send_pkt() and virtio_transport_recv_pkt() functions instead.
  Note that skb->cb will still be used by loopback, but only internal to
  loopback and never passing it to virtio common.
- remove virtio_vsock_alloc_rx_skb(), it is not needed after removing
  skb->cb usage.
- pass net and net_mode to virtio_transport_reset_no_sock()

Changes in v8:
- add the virtio_vsock_alloc_rx_skb(), to be in same patch that fields
  are read (Stefano)

Changes in v7:
- add comment explaining the !vsk case in virtio_transport_alloc_skb()
---
 drivers/vhost/vsock.c                   |  6 ++--
 include/linux/virtio_vsock.h            |  8 +++--
 net/vmw_vsock/virtio_transport.c        | 10 ++++--
 net/vmw_vsock/virtio_transport_common.c | 57 ++++++++++++++++++++++++---------
 net/vmw_vsock/vsock_loopback.c          |  5 +--
 5 files changed, 62 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 34adf0cf9124..0a0e73405532 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -269,7 +269,8 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work) }
static int -vhost_transport_send_pkt(struct sk_buff *skb) +vhost_transport_send_pkt(struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode) { struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vhost_vsock *vsock; @@ -537,7 +538,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid && le64_to_cpu(hdr->dst_cid) == vhost_transport_get_local_cid()) - virtio_transport_recv_pkt(&vhost_transport, skb); + virtio_transport_recv_pkt(&vhost_transport, skb, NULL, + 0); else kfree_skb(skb);
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 0c67543a45c8..5ed6136a4ed4 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -173,6 +173,8 @@ struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; struct msghdr *msg; + struct net *net; + enum vsock_net_mode net_mode; u32 pkt_len; u16 type; u16 op; @@ -185,7 +187,8 @@ struct virtio_transport { struct vsock_transport transport;
/* Takes ownership of the packet */ - int (*send_pkt)(struct sk_buff *skb); + int (*send_pkt)(struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode);
/* Used in MSG_ZEROCOPY mode. Checks, that provided data * (number of buffers) could be transmitted with zerocopy @@ -280,7 +283,8 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk);
void virtio_transport_recv_pkt(struct virtio_transport *t, - struct sk_buff *skb); + struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode); void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index f92f23be3f59..9395fd875823 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -231,7 +231,8 @@ static int virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock, struc }
static int -virtio_transport_send_pkt(struct sk_buff *skb) +virtio_transport_send_pkt(struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode) { struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; @@ -660,7 +661,12 @@ static void virtio_transport_rx_work(struct work_struct *work) virtio_vsock_skb_put(skb, payload_len);
virtio_transport_deliver_tap_pkt(skb); - virtio_transport_recv_pkt(&virtio_transport, skb); + + /* Force virtio-transport into global mode since it + * does not yet support local-mode namespacing. + */ + virtio_transport_recv_pkt(&virtio_transport, skb, + NULL, VSOCK_NET_MODE_GLOBAL); } } while (!virtqueue_enable_cb(vq));
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index dcc8a1d5851e..f4e09cb1567c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -413,7 +413,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
virtio_transport_inc_tx_pkt(vvs, skb);
- ret = t_ops->send_pkt(skb); + ret = t_ops->send_pkt(skb, info->net, info->net_mode); if (ret < 0) break;
@@ -527,6 +527,8 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_CREDIT_UPDATE, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1067,6 +1069,8 @@ int virtio_transport_connect(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_REQUEST, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1082,6 +1086,8 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode) (mode & SEND_SHUTDOWN ? VIRTIO_VSOCK_SHUTDOWN_SEND : 0), .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1108,6 +1114,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk, .msg = msg, .pkt_len = len, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1145,6 +1153,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk, .op = VIRTIO_VSOCK_OP_RST, .reply = !!skb, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
 	/* Send RST only if the original pkt is not a RST pkt */
@@ -1156,15 +1166,27 @@ static int virtio_transport_reset(struct vsock_sock *vsk,

 /* Normally packets are associated with a socket. There may be no socket if an
  * attempt was made to connect to a socket that does not exist.
+ *
+ * net and net_mode refer to the net and mode of the receiving device (e.g.,
+ * vhost_vsock). For loopback, they refer to the sending socket net/mode. This
+ * way the RST packet is sent back to the same namespace as the bad request.
  */
 static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
-					  struct sk_buff *skb)
+					  struct sk_buff *skb, struct net *net,
+					  enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+
+		/* net or net_mode are not defined here because we pass
+		 * net and net_mode directly to t->send_pkt(), instead of
+		 * relying on virtio_transport_send_pkt_info() to pass them to
+		 * t->send_pkt(). They are not needed by
+		 * virtio_transport_alloc_skb().
+		 */
 	};
 	struct sk_buff *reply;

@@ -1183,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;

-	return t->send_pkt(reply);
+	return t->send_pkt(reply, net, net_mode);
 }
/* This function should be called with sk_lock held and SOCK_DONE set */ @@ -1465,6 +1487,8 @@ virtio_transport_send_response(struct vsock_sock *vsk, .remote_port = le32_to_cpu(hdr->src_port), .reply = true, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), + .net_mode = vsk->net_mode, };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1507,12 +1531,12 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, int ret;
if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) { - virtio_transport_reset_no_sock(t, skb); + virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode); return -EINVAL; }
if (sk_acceptq_is_full(sk)) { - virtio_transport_reset_no_sock(t, skb); + virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode); return -ENOMEM; }
@@ -1520,13 +1544,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, * Subsequent enqueues would lead to a memory leak. */ if (sk->sk_shutdown == SHUTDOWN_MASK) { - virtio_transport_reset_no_sock(t, skb); + virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode); return -ESHUTDOWN; }
child = vsock_create_connected(sk); if (!child) { - virtio_transport_reset_no_sock(t, skb); + virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode); return -ENOMEM; }
@@ -1548,7 +1572,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, */ if (ret || vchild->transport != &t->transport) { release_sock(child); - virtio_transport_reset_no_sock(t, skb); + virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode); sock_put(child); return ret; } @@ -1576,7 +1600,8 @@ static bool virtio_transport_valid_type(u16 type) * lock. */ void virtio_transport_recv_pkt(struct virtio_transport *t, - struct sk_buff *skb) + struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode) { struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct sockaddr_vm src, dst; @@ -1599,24 +1624,24 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, le32_to_cpu(hdr->fwd_cnt));
if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) { - (void)virtio_transport_reset_no_sock(t, skb); + (void)virtio_transport_reset_no_sock(t, skb, net, net_mode); goto free_pkt; }
/* The socket must be in connected or bound table * otherwise send reset back */ - sk = vsock_find_connected_socket(&src, &dst); + sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode); if (!sk) { - sk = vsock_find_bound_socket(&dst); + sk = vsock_find_bound_socket_net(&dst, net, net_mode); if (!sk) { - (void)virtio_transport_reset_no_sock(t, skb); + (void)virtio_transport_reset_no_sock(t, skb, net, net_mode); goto free_pkt; } }
if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) { - (void)virtio_transport_reset_no_sock(t, skb); + (void)virtio_transport_reset_no_sock(t, skb, net, net_mode); sock_put(sk); goto free_pkt; } @@ -1635,7 +1660,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, */ if (sock_flag(sk, SOCK_DONE) || (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) { - (void)virtio_transport_reset_no_sock(t, skb); + (void)virtio_transport_reset_no_sock(t, skb, net, net_mode); release_sock(sk); sock_put(sk); goto free_pkt; @@ -1667,7 +1692,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, kfree_skb(skb); break; default: - (void)virtio_transport_reset_no_sock(t, skb); + (void)virtio_transport_reset_no_sock(t, skb, net, net_mode); kfree_skb(skb); break; } diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index a8f218f0c5a3..d3ac056663ea 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -26,7 +26,8 @@ static u32 vsock_loopback_get_local_cid(void) return VMADDR_CID_LOCAL; }
-static int vsock_loopback_send_pkt(struct sk_buff *skb) +static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net, + enum vsock_net_mode net_mode) { struct vsock_loopback *vsock = &the_vsock_loopback; int len = skb->len; @@ -130,7 +131,7 @@ static void vsock_loopback_work(struct work_struct *work) */ virtio_transport_consume_skb_sent(skb, false); virtio_transport_deliver_tap_pkt(skb); - virtio_transport_recv_pkt(&loopback_transport, skb); + virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0); } }
On Tue, Nov 11, 2025 at 10:54:45PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Enable network namespace support in the virtio-vsock and common transport layer.
The changes include:
This list seems to have been generated by AI. I have nothing against it, but I don't think it's important to list all the things that have changed, but rather to explain why.
- Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
pointer for outgoing packets.
Why?
- Add 'net' and 'net_mode' to t->send_pkt() and
virtio_transport_recv_pkt() functions
Why?
- Modify callback functions to accept placeholder values
(NULL and 0) for net and net_mode. The placeholders will be
Why 0 ? I mean VSOCK_NET_MODE_GLOBAL is also 0, no? So I don't understand if you want to specify an invalid value (like NULL) or VSOCK_NET_MODE_GLOBAL.
replaced when later patches in this series add namespace support to transports.
4. Set virtio-vsock to global mode unconditionally, instead of using placeholders. This is done in this patch because virtio-vsock won't have any additional changes to choose the net/net_mode, unlike the other transports. Same complexity as placeholders.
5. Pass net and net_mode to virtio_transport_reset_no_sock() directly. This ensures that the outgoing RST packets are scoped based on the namespace of the receiver of the failed request.
"Receiver" is confusing IMO, see the comment on virtio_transport_reset_no_sock().
- Pass net and net_mode to socket lookup functions using
vsock_find_{bound,connected}_socket_net().
mmmm, are those functions working fine with the placeholders?
If it simplifies things, I think we can eventually merge all changes to transports that depend on virtio_transport_common into a single commit. IMO it is better to have working commits than a better split.
I mean, is this commit working (at runtime) well?
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- include/virtio_vsock.h: send_pkt() cb takes net and net_mode
- virtio_transport reset_no_sock() takes net and net_mode
- vhost-vsock: add placeholders to recv_pkt() for compilation
- loopback: add placeholders to recv_pkt() for compilation
- remove skb->cb net/net_mode usage, pass as arguments to
t->send_pkt() and virtio_transport_recv_pkt() functions instead. Note that skb->cb will still be used by loopback, but only internal to loopback and never passing it to virtio common.
- remove virtio_vsock_alloc_rx_skb(), it is not needed after removing
skb->cb usage.
- pass net and net_mode to virtio_transport_reset_no_sock()
Changes in v8:
- add the virtio_vsock_alloc_rx_skb(), to be in same patch that fields
are read (Stefano)
Changes in v7:
- add comment explaining the !vsk case in virtio_transport_alloc_skb()
drivers/vhost/vsock.c | 6 ++-- include/linux/virtio_vsock.h | 8 +++-- net/vmw_vsock/virtio_transport.c | 10 ++++-- net/vmw_vsock/virtio_transport_common.c | 57 ++++++++++++++++++++++++--------- net/vmw_vsock/vsock_loopback.c | 5 +-- 5 files changed, 62 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 34adf0cf9124..0a0e73405532 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -269,7 +269,8 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work) }
static int -vhost_transport_send_pkt(struct sk_buff *skb) +vhost_transport_send_pkt(struct sk_buff *skb, struct net *net,
enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vhost_vsock *vsock; @@ -537,7 +538,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid && le64_to_cpu(hdr->dst_cid) == vhost_transport_get_local_cid())
virtio_transport_recv_pkt(&vhost_transport, skb);
virtio_transport_recv_pkt(&vhost_transport, skb, NULL, else kfree_skb(skb);0);diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 0c67543a45c8..5ed6136a4ed4 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -173,6 +173,8 @@ struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; struct msghdr *msg;
+	struct net *net;
+	enum vsock_net_mode net_mode;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
@@ -185,7 +187,8 @@ struct virtio_transport { struct vsock_transport transport;
/* Takes ownership of the packet */
-	int (*send_pkt)(struct sk_buff *skb);
+	int (*send_pkt)(struct sk_buff *skb, struct net *net,
+			enum vsock_net_mode net_mode);

 	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
 	 * (number of buffers) could be transmitted with zerocopy
@@ -280,7 +283,8 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk);
void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb);
struct sk_buff *skb, struct net *net,enum vsock_net_mode net_mode);void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index f92f23be3f59..9395fd875823 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -231,7 +231,8 @@ static int virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock, struc }
static int -virtio_transport_send_pkt(struct sk_buff *skb) +virtio_transport_send_pkt(struct sk_buff *skb, struct net *net,
enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; @@ -660,7 +661,12 @@ static void virtio_transport_rx_work(struct work_struct *work) virtio_vsock_skb_put(skb, payload_len);
virtio_transport_deliver_tap_pkt(skb);
virtio_transport_recv_pkt(&virtio_transport, skb);
/* Force virtio-transport into global mode since it* does not yet support local-mode namespacing.*/virtio_transport_recv_pkt(&virtio_transport, skb, } } while (!virtqueue_enable_cb(vq));NULL, VSOCK_NET_MODE_GLOBAL);diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index dcc8a1d5851e..f4e09cb1567c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -413,7 +413,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
virtio_transport_inc_tx_pkt(vvs, skb);
ret = t_ops->send_pkt(skb);
if (ret < 0) break;ret = t_ops->send_pkt(skb, info->net, info->net_mode);@@ -527,6 +527,8 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_CREDIT_UPDATE, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1067,6 +1069,8 @@ int virtio_transport_connect(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_REQUEST, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1082,6 +1086,8 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode) (mode & SEND_SHUTDOWN ? VIRTIO_VSOCK_SHUTDOWN_SEND : 0), .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1108,6 +1114,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk, .msg = msg, .pkt_len = len, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1145,6 +1153,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk, .op = VIRTIO_VSOCK_OP_RST, .reply = !!skb, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
/* Send RST only if the original pkt is not a RST pkt */
@@ -1156,15 +1166,27 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
/* Normally packets are associated with a socket. There may be no socket if an
  * attempt was made to connect to a socket that does not exist.
+ *
+ * net and net_mode refer to the net and mode of the receiving device (e.g.,
+ * vhost_vsock). For loopback, they refer to the sending socket net/mode. This
+ * way the RST packet is sent back to the same namespace as the bad request.
Could this be a problem, should we split this function?
BTW, I'm a bit confused. For vhost-vsock, this is the namespace of the device, so the namespace of the guest, so also in that case the namespace of the sender, no?
Maybe sender/receiver are confusing. What do you want to highlight with this comment?
  */
 static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
-					  struct sk_buff *skb)
+					  struct sk_buff *skb, struct net *net,
+					  enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+
+		/* net or net_mode are not defined here because we pass
+		 * net and net_mode directly to t->send_pkt(), instead of
+		 * relying on virtio_transport_send_pkt_info() to pass them to
+		 * t->send_pkt(). They are not needed by
+		 * virtio_transport_alloc_skb().
+		 */
 	};
 	struct sk_buff *reply;

@@ -1183,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;

-	return t->send_pkt(reply);
+	return t->send_pkt(reply, net, net_mode);
 }
/* This function should be called with sk_lock held and SOCK_DONE set */ @@ -1465,6 +1487,8 @@ virtio_transport_send_response(struct vsock_sock *vsk, .remote_port = le32_to_cpu(hdr->src_port), .reply = true, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1507,12 +1531,12 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, int ret;
if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
virtio_transport_reset_no_sock(t, skb);
virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);return -EINVAL; }
if (sk_acceptq_is_full(sk)) {
virtio_transport_reset_no_sock(t, skb);
return -ENOMEM; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1520,13 +1544,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, * Subsequent enqueues would lead to a memory leak. */ if (sk->sk_shutdown == SHUTDOWN_MASK) {
virtio_transport_reset_no_sock(t, skb);
virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);return -ESHUTDOWN; }
child = vsock_create_connected(sk); if (!child) {
virtio_transport_reset_no_sock(t, skb);
return -ENOMEM; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1548,7 +1572,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, */ if (ret || vchild->transport != &t->transport) { release_sock(child);
virtio_transport_reset_no_sock(t, skb);
sock_put(child); return ret; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1576,7 +1600,8 @@ static bool virtio_transport_valid_type(u16 type)
- lock.
*/ void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb)
struct sk_buff *skb, struct net *net,enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct sockaddr_vm src, dst; @@ -1599,24 +1624,24 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, le32_to_cpu(hdr->fwd_cnt));
if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
(void)virtio_transport_reset_no_sock(t, skb);
(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);goto free_pkt; }
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode);
Here `net` can be null, right? Is this okay?
 	if (!sk) {
-		sk = vsock_find_bound_socket(&dst);
+		sk = vsock_find_bound_socket_net(&dst, net, net_mode);
 		if (!sk) {
-			(void)virtio_transport_reset_no_sock(t, skb);
+			(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);
 			goto free_pkt;
 		}
 	}
if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
(void)virtio_transport_reset_no_sock(t, skb);
sock_put(sk); goto free_pkt; }(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);@@ -1635,7 +1660,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, */ if (sock_flag(sk, SOCK_DONE) || (sk->sk_state != TCP_LISTEN && vsk->transport != &t->transport)) {
(void)virtio_transport_reset_no_sock(t, skb);
release_sock(sk); sock_put(sk); goto free_pkt;(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);@@ -1667,7 +1692,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, kfree_skb(skb); break; default:
(void)virtio_transport_reset_no_sock(t, skb);
kfree_skb(skb); break; }(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index a8f218f0c5a3..d3ac056663ea 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -26,7 +26,8 @@ static u32 vsock_loopback_get_local_cid(void) return VMADDR_CID_LOCAL; }
-static int vsock_loopback_send_pkt(struct sk_buff *skb) +static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
enum vsock_net_mode net_mode){ struct vsock_loopback *vsock = &the_vsock_loopback; int len = skb->len; @@ -130,7 +131,7 @@ static void vsock_loopback_work(struct work_struct *work) */ virtio_transport_consume_skb_sent(skb, false); virtio_transport_deliver_tap_pkt(skb);
virtio_transport_recv_pkt(&loopback_transport, skb);
}virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);}
-- 2.47.3
On Wed, Nov 12, 2025 at 03:18:42PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:45PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Enable network namespace support in the virtio-vsock and common transport layer.
The changes include:
This list seems to have been generated by AI. I have nothing against it, but I don't think it's important to list all the things that have changed, but rather to explain why.
Sounds good, I'll keep that in mind on why vs what. I have been experimenting with AI in my process, but sadly this list was mostly hand-rolled. I guess exhaustive listing is an over-correction for too sparse of commit messages on my part.
- Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
pointer for outgoing packets.
Why?
- Add 'net' and 'net_mode' to t->send_pkt() and
virtio_transport_recv_pkt() functions
Why?
- Modify callback functions to accept placeholder values
(NULL and 0) for net and net_mode. The placeholders will be
Why 0 ? I mean VSOCK_NET_MODE_GLOBAL is also 0, no? So I don't understand if you want to specify an invalid value (like NULL) or VSOCK_NET_MODE_GLOBAL.
replaced when later patches in this series add namespace support to transports.
4. Set virtio-vsock to global mode unconditionally, instead of using placeholders. This is done in this patch because virtio-vsock won't have any additional changes to choose the net/net_mode, unlike the other transports. Same complexity as placeholders.
5. Pass net and net_mode to virtio_transport_reset_no_sock() directly. This ensures that the outgoing RST packets are scoped based on the namespace of the receiver of the failed request.
"Receiver" is confusing IMO, see the comment on virtio_transport_reset_no_sock().
- Pass net and net_mode to socket lookup functions using
vsock_find_{bound,connected}_socket_net().
mmmm, are those functions working fine with the placeholders?
They should resolve everything to global mode, as this is what virtio-vsock does by the end of this series, but I didn't run the tests specifically on this patch.
If it simplifies things, I think we can eventually merge all changes to transports that depend on virtio_transport_common into a single commit. IMO it is better to have working commits than a better split.
That would be so much easier. Much of this patch is just me trying to find a way to keep total patch size reasonably small for review... if having them all in one commit is preferred then that makes life easier.
The answer to all of the above is that I was just trying to make the virtio_common changes in one place, but not break bisect/build by failing to update the transport-level call sites. So the placeholder values are primarily there to compile.
I mean, is this commit working (at runtime) well?
In theory it should, but I only build checked it.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- include/virtio_vsock.h: send_pkt() cb takes net and net_mode
- virtio_transport reset_no_sock() takes net and net_mode
- vhost-vsock: add placeholders to recv_pkt() for compilation
- loopback: add placeholders to recv_pkt() for compilation
- remove skb->cb net/net_mode usage, pass as arguments to
t->send_pkt() and virtio_transport_recv_pkt() functions instead. Note that skb->cb will still be used by loopback, but only internal to loopback and never passing it to virtio common.
- remove virtio_vsock_alloc_rx_skb(), it is not needed after removing
skb->cb usage.
- pass net and net_mode to virtio_transport_reset_no_sock()
Changes in v8:
- add the virtio_vsock_alloc_rx_skb(), to be in same patch that fields
are read (Stefano)
Changes in v7:
- add comment explaining the !vsk case in virtio_transport_alloc_skb()
drivers/vhost/vsock.c | 6 ++-- include/linux/virtio_vsock.h | 8 +++-- net/vmw_vsock/virtio_transport.c | 10 ++++-- net/vmw_vsock/virtio_transport_common.c | 57 ++++++++++++++++++++++++--------- net/vmw_vsock/vsock_loopback.c | 5 +-- 5 files changed, 62 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 34adf0cf9124..0a0e73405532 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -269,7 +269,8 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work) }
static int -vhost_transport_send_pkt(struct sk_buff *skb) +vhost_transport_send_pkt(struct sk_buff *skb, struct net *net,
enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vhost_vsock *vsock; @@ -537,7 +538,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid && le64_to_cpu(hdr->dst_cid) == vhost_transport_get_local_cid())
virtio_transport_recv_pkt(&vhost_transport, skb);
virtio_transport_recv_pkt(&vhost_transport, skb, NULL, else kfree_skb(skb);0);diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 0c67543a45c8..5ed6136a4ed4 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -173,6 +173,8 @@ struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; struct msghdr *msg;
+	struct net *net;
+	enum vsock_net_mode net_mode;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
@@ -185,7 +187,8 @@ struct virtio_transport { struct vsock_transport transport;
/* Takes ownership of the packet */
-	int (*send_pkt)(struct sk_buff *skb);
+	int (*send_pkt)(struct sk_buff *skb, struct net *net,
+			enum vsock_net_mode net_mode);

 	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
 	 * (number of buffers) could be transmitted with zerocopy
@@ -280,7 +283,8 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, void virtio_transport_destruct(struct vsock_sock *vsk);
void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb);
struct sk_buff *skb, struct net *net,enum vsock_net_mode net_mode);void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb); u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted); void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index f92f23be3f59..9395fd875823 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -231,7 +231,8 @@ static int virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock, struc }
static int -virtio_transport_send_pkt(struct sk_buff *skb) +virtio_transport_send_pkt(struct sk_buff *skb, struct net *net,
enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr; struct virtio_vsock *vsock; @@ -660,7 +661,12 @@ static void virtio_transport_rx_work(struct work_struct *work) virtio_vsock_skb_put(skb, payload_len);
virtio_transport_deliver_tap_pkt(skb);
virtio_transport_recv_pkt(&virtio_transport, skb);
/* Force virtio-transport into global mode since it* does not yet support local-mode namespacing.*/virtio_transport_recv_pkt(&virtio_transport, skb, } } while (!virtqueue_enable_cb(vq));NULL, VSOCK_NET_MODE_GLOBAL);diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index dcc8a1d5851e..f4e09cb1567c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -413,7 +413,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
virtio_transport_inc_tx_pkt(vvs, skb);
ret = t_ops->send_pkt(skb);
if (ret < 0) break;ret = t_ops->send_pkt(skb, info->net, info->net_mode);@@ -527,6 +527,8 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_CREDIT_UPDATE, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1067,6 +1069,8 @@ int virtio_transport_connect(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_REQUEST, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1082,6 +1086,8 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode) (mode & SEND_SHUTDOWN ? VIRTIO_VSOCK_SHUTDOWN_SEND : 0), .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1108,6 +1114,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk, .msg = msg, .pkt_len = len, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1145,6 +1153,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk, .op = VIRTIO_VSOCK_OP_RST, .reply = !!skb, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
/* Send RST only if the original pkt is not a RST pkt */
@@ -1156,15 +1166,27 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
/* Normally packets are associated with a socket. There may be no socket if an
  * attempt was made to connect to a socket that does not exist.
+ *
+ * net and net_mode refer to the net and mode of the receiving device (e.g.,
+ * vhost_vsock). For loopback, they refer to the sending socket net/mode. This
+ * way the RST packet is sent back to the same namespace as the bad request.
Could this be a problem, should we split this function?
BTW, I'm a bit confused. For vhost-vsock, this is the namespace of the device, so the namespace of the guest, so also in that case the namespace of the sender, no?
Maybe sender/receiver are confusing. What do you want to highlight with this comment?
Sounds good, I'll try to update it with clarification. The namespace passed in needs to be the namespace of whoever sent the bad message. For vhost-vsock (and probably virtio-vsock eventually) that will be the device/guest namespace. For loopback, it is just the namespace of the socket that sent the bad message.
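The rule described in the reply above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not code from the series: `struct net` is a stand-in type, and `model_origin`, `model_bad_pkt`, and `model_rst_net()` are hypothetical names invented for the example. The point it shows is that the RST is always scoped to the namespace of whoever sent the bad message, whichever transport delivered it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct net { int id; };

/* Where the bad packet came from (illustrative names, not kernel API). */
enum model_origin {
	MODEL_ORIGIN_DEVICE,	/* e.g. vhost-vsock: packet came from a guest */
	MODEL_ORIGIN_LOOPBACK,	/* packet came from a local socket */
};

struct model_bad_pkt {
	enum model_origin origin;
	struct net *device_net;	/* namespace of the device/guest, if any */
	struct net *sender_net;	/* namespace of the sending socket, if any */
};

/* Pick the namespace the RST is scoped to: always the sender's side. */
static struct net *model_rst_net(const struct model_bad_pkt *pkt)
{
	switch (pkt->origin) {
	case MODEL_ORIGIN_DEVICE:
		return pkt->device_net;
	case MODEL_ORIGIN_LOOPBACK:
		return pkt->sender_net;
	}
	return NULL;
}

/* Small self-check covering both cases. */
static void model_rst_net_examples(void)
{
	struct net guest_ns = { 1 }, sock_ns = { 2 };
	struct model_bad_pkt from_guest = { MODEL_ORIGIN_DEVICE, &guest_ns, NULL };
	struct model_bad_pkt from_loop = { MODEL_ORIGIN_LOOPBACK, NULL, &sock_ns };

	assert(model_rst_net(&from_guest) == &guest_ns);
	assert(model_rst_net(&from_loop) == &sock_ns);
}
```

In both branches the chosen namespace belongs to the sender, which is why describing it as the "receiver" namespace in the code comment reads as confusing.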
  */
 static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
-					  struct sk_buff *skb)
+					  struct sk_buff *skb, struct net *net,
+					  enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+
+		/* net or net_mode are not defined here because we pass
+		 * net and net_mode directly to t->send_pkt(), instead of
+		 * relying on virtio_transport_send_pkt_info() to pass them to
+		 * t->send_pkt(). They are not needed by
+		 * virtio_transport_alloc_skb().
+		 */
 	};
 	struct sk_buff *reply;

@@ -1183,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;

-	return t->send_pkt(reply);
+	return t->send_pkt(reply, net, net_mode);
 }
/* This function should be called with sk_lock held and SOCK_DONE set */ @@ -1465,6 +1487,8 @@ virtio_transport_send_response(struct vsock_sock *vsk, .remote_port = le32_to_cpu(hdr->src_port), .reply = true, .vsk = vsk,
.net = sock_net(sk_vsock(vsk)),.net_mode = vsk->net_mode,};
return virtio_transport_send_pkt_info(vsk, &info);
@@ -1507,12 +1531,12 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, int ret;
if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
virtio_transport_reset_no_sock(t, skb);
virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);return -EINVAL; }
if (sk_acceptq_is_full(sk)) {
virtio_transport_reset_no_sock(t, skb);
return -ENOMEM; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1520,13 +1544,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, * Subsequent enqueues would lead to a memory leak. */ if (sk->sk_shutdown == SHUTDOWN_MASK) {
virtio_transport_reset_no_sock(t, skb);
virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);return -ESHUTDOWN; }
child = vsock_create_connected(sk); if (!child) {
virtio_transport_reset_no_sock(t, skb);
return -ENOMEM; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1548,7 +1572,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, */ if (ret || vchild->transport != &t->transport) { release_sock(child);
virtio_transport_reset_no_sock(t, skb);
sock_put(child); return ret; }virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);@@ -1576,7 +1600,8 @@ static bool virtio_transport_valid_type(u16 type)
- lock.
*/ void virtio_transport_recv_pkt(struct virtio_transport *t,
struct sk_buff *skb)
struct sk_buff *skb, struct net *net,enum vsock_net_mode net_mode){ struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct sockaddr_vm src, dst; @@ -1599,24 +1624,24 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, le32_to_cpu(hdr->fwd_cnt));
if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
(void)virtio_transport_reset_no_sock(t, skb);
(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);goto free_pkt; }
 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode);
Here `net` can be null, right? Is this okay?
Yes, it can be null. net_eq() compares pointers (returns false), and then the modes evaluate w/ GLOBAL == GLOBAL.
This goes away if we combine patches though.
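To make the NULL case concrete, here is a minimal userspace sketch of the matching rule as described in this reply. It is an assumption-laden model, not the kernel implementation: `struct net` is a stand-in type, `model_net_eq()` only imitates a pointer comparison, and `model_ns_match()` is a hypothetical helper whose exact rule (same namespace, or both sides global) is inferred from the explanation above rather than taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct net { int id; };

enum vsock_net_mode {
	VSOCK_NET_MODE_GLOBAL = 0,	/* the thread notes that GLOBAL is 0 */
	VSOCK_NET_MODE_LOCAL,
};

/* Userspace model of a namespace equality check: pointer comparison. */
static bool model_net_eq(const struct net *a, const struct net *b)
{
	return a == b;
}

/* Assumed rule: match if same namespace, or if both sides are global. */
static bool model_ns_match(const struct net *pkt_net,
			   enum vsock_net_mode pkt_mode,
			   const struct net *sk_net,
			   enum vsock_net_mode sk_mode)
{
	if (model_net_eq(pkt_net, sk_net))
		return true;

	return pkt_mode == VSOCK_NET_MODE_GLOBAL &&
	       sk_mode == VSOCK_NET_MODE_GLOBAL;
}

/* Small self-check for the placeholder (NULL, 0) case. */
static void model_ns_match_examples(void)
{
	struct net ns = { 1 };

	/* NULL net never equals a real namespace, but GLOBAL == GLOBAL. */
	assert(model_ns_match(NULL, 0, &ns, VSOCK_NET_MODE_GLOBAL));
	/* A local-mode socket is not reachable with the placeholders. */
	assert(!model_ns_match(NULL, 0, &ns, VSOCK_NET_MODE_LOCAL));
}
```

Under this model the placeholder pair (NULL, 0) behaves like global mode, which is also why the literal 0 is indistinguishable from VSOCK_NET_MODE_GLOBAL, as the review points out.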
Thanks again for the review!
Best, Bobby
On Wed, Nov 12, 2025 at 08:13:50AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:18:42PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:45PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Enable network namespace support in the virtio-vsock and common transport layer.
The changes include:
This list seems to have been generated by AI. I have nothing against it, but I don't think it's important to list all the things that have changed, but rather to explain why.
Sounds good, I'll keep that in mind on why vs what. I have been experimenting with AI in my process, but sadly this list was mostly hand-rolled. I guess exhaustive listing is an over-correction for too sparse of commit messages on my part.
- Add a 'net' field to virtio_vsock_pkt_info to carry the namespace
pointer for outgoing packets.
Why?
- Add 'net' and 'net_mode' to t->send_pkt() and
virtio_transport_recv_pkt() functions
Why?
- Modify callback functions to accept placeholder values
(NULL and 0) for net and net_mode. The placeholders will be
Why 0 ? I mean VSOCK_NET_MODE_GLOBAL is also 0, no? So I don't understand if you want to specify an invalid value (like NULL) or VSOCK_NET_MODE_GLOBAL.
replaced when later patches in this series add namespace support to transports.
- Set virtio-vsock to global mode unconditionally, instead of using placeholders. This is done in this patch because virtio-vsock won't have any additional changes to choose the net/net_mode, unlike the other transports. Same complexity as placeholders.
- Pass net and net_mode to virtio_transport_reset_no_sock() directly. This ensures that the outgoing RST packets are scoped based on the namespace of the receiver of the failed request.
"Receiver" is confusing IMO, see the comment on virtio_transport_reset_no_sock().
- Pass net and net_mode to socket lookup functions using
vsock_find_{bound,connected}_socket_net().
mmmm, are those functions working fine with the placeholders?
They should resolve everything to global mode, as this is what virtio-vsock does by the end of this series, but I didn't run the tests specifically on this patch.
If it simplifies, I think we can eventually merge all changes to transports that depend on virtio_transport_common in a single commit. IMO it is better to have working commits than a better split.
That would be so much easier. Much of this patch is just me trying to find a way to keep total patch size reasonably small for review... if having them all in one commit is preferred then that makes life easier.
The answer to all of the above is that I was just trying to make the virtio_common changes in one place, but not break bisect/build by failing to update the transport-level call sites. So the placeholder values are primarily there to compile.
In theory, they should compile, but they should also properly behave.
BTW I strongly believe that having separate commits is a great thing, but we shouldn't take things to extremes and complicate our lives when things are too closely related, as in this case.
There is a clear dependency between these patches, so IMO, if the patch doesn't become huge, it's better to have everything together. (I mean between dependencies with virtio_transport_common).
What we could perhaps do is have an initial commit where you make the changes, but the behavior remains unchanged (continue to use global everywhere, as for virtio_transport.c in this patch), and then specific commits to just enable support for local/global.
Not sure if it's doable, but I'd like to remove the placeholders if possible. Let's discuss more about it if there are issues.
I mean, is this commit working (at runtime) well?
In theory it should, but I only build checked it.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- include/virtio_vsock.h: send_pkt() cb takes net and net_mode
- virtio_transport reset_no_sock() takes net and net_mode
- vhost-vsock: add placeholders to recv_pkt() for compilation
- loopback: add placeholders to recv_pkt() for compilation
- remove skb->cb net/net_mode usage, pass as arguments to
t->send_pkt() and virtio_transport_recv_pkt() functions instead. Note that skb->cb will still be used by loopback, but only internal to loopback and never passing it to virtio common.
- remove virtio_vsock_alloc_rx_skb(), it is not needed after removing
skb->cb usage.
- pass net and net_mode to virtio_transport_reset_no_sock()
Changes in v8:
- add the virtio_vsock_alloc_rx_skb(), to be in same patch that fields
are read (Stefano)
Changes in v7:
- add comment explaining the !vsk case in virtio_transport_alloc_skb()
 drivers/vhost/vsock.c                   |  6 ++--
 include/linux/virtio_vsock.h            |  8 +++--
 net/vmw_vsock/virtio_transport.c        | 10 ++++--
 net/vmw_vsock/virtio_transport_common.c | 57 ++++++++++++++++++++++++---------
 net/vmw_vsock/vsock_loopback.c          |  5 +--
 5 files changed, 62 insertions(+), 24 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 34adf0cf9124..0a0e73405532 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -269,7 +269,8 @@ static void vhost_transport_send_pkt_work(struct vhost_work *work)
 }

 static int
-vhost_transport_send_pkt(struct sk_buff *skb)
+vhost_transport_send_pkt(struct sk_buff *skb, struct net *net,
+			 enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct vhost_vsock *vsock;
@@ -537,7 +538,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid &&
 		    le64_to_cpu(hdr->dst_cid) ==
 		    vhost_transport_get_local_cid())
-			virtio_transport_recv_pkt(&vhost_transport, skb);
+			virtio_transport_recv_pkt(&vhost_transport, skb, NULL,
+						  0);
 		else
 			kfree_skb(skb);

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 0c67543a45c8..5ed6136a4ed4 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -173,6 +173,8 @@ struct virtio_vsock_pkt_info {
 	u32 remote_cid, remote_port;
 	struct vsock_sock *vsk;
 	struct msghdr *msg;
+	struct net *net;
+	enum vsock_net_mode net_mode;
 	u32 pkt_len;
 	u16 type;
 	u16 op;
@@ -185,7 +187,8 @@ struct virtio_transport {
 	struct vsock_transport transport;

 	/* Takes ownership of the packet */
-	int (*send_pkt)(struct sk_buff *skb);
+	int (*send_pkt)(struct sk_buff *skb, struct net *net,
+			enum vsock_net_mode net_mode);

 	/* Used in MSG_ZEROCOPY mode. Checks, that provided data
 	 * (number of buffers) could be transmitted with zerocopy
@@ -280,7 +283,8 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 void virtio_transport_destruct(struct vsock_sock *vsk);

 void virtio_transport_recv_pkt(struct virtio_transport *t,
-			       struct sk_buff *skb);
+			       struct sk_buff *skb, struct net *net,
+			       enum vsock_net_mode net_mode);
 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb);
 u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 wanted);
 void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit);
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index f92f23be3f59..9395fd875823 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -231,7 +231,8 @@ static int virtio_transport_send_skb_fast_path(struct virtio_vsock *vsock, struc
 }

 static int
-virtio_transport_send_pkt(struct sk_buff *skb)
+virtio_transport_send_pkt(struct sk_buff *skb, struct net *net,
+			  enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr;
 	struct virtio_vsock *vsock;
@@ -660,7 +661,12 @@ static void virtio_transport_rx_work(struct work_struct *work)
 				virtio_vsock_skb_put(skb, payload_len);

 			virtio_transport_deliver_tap_pkt(skb);
-			virtio_transport_recv_pkt(&virtio_transport, skb);
+
+			/* Force virtio-transport into global mode since it
+			 * does not yet support local-mode namespacing.
+			 */
+			virtio_transport_recv_pkt(&virtio_transport, skb,
+						  NULL, VSOCK_NET_MODE_GLOBAL);
 		}
 	} while (!virtqueue_enable_cb(vq));

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index dcc8a1d5851e..f4e09cb1567c 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -413,7 +413,7 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,

 		virtio_transport_inc_tx_pkt(vvs, skb);

-		ret = t_ops->send_pkt(skb);
+		ret = t_ops->send_pkt(skb, info->net, info->net_mode);
 		if (ret < 0)
 			break;

@@ -527,6 +527,8 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_CREDIT_UPDATE,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1067,6 +1069,8 @@ int virtio_transport_connect(struct vsock_sock *vsk)
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_REQUEST,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1082,6 +1086,8 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode)
 			 (mode & SEND_SHUTDOWN ?
 			  VIRTIO_VSOCK_SHUTDOWN_SEND : 0),
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1108,6 +1114,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk,
 		.msg = msg,
 		.pkt_len = len,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1145,6 +1153,8 @@ static int virtio_transport_reset(struct vsock_sock *vsk,
 		.op = VIRTIO_VSOCK_OP_RST,
 		.reply = !!skb,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	/* Send RST only if the original pkt is not a RST pkt */
@@ -1156,15 +1166,27 @@ static int virtio_transport_reset(struct vsock_sock *vsk,

 /* Normally packets are associated with a socket.  There may be no socket if an
  * attempt was made to connect to a socket that does not exist.
+ *
+ * net and net_mode refer to the net and mode of the receiving device (e.g.,
+ * vhost_vsock). For loopback, they refer to the sending socket net/mode. This
+ * way the RST packet is sent back to the same namespace as the bad request.
Could this be a problem, should we split this function?
BTW, I'm a bit confused. For vhost-vsock, this is the namespace of the device, so the namespace of the guest, so also in that case the namespace of the sender, no?
Maybe sender/receiver are confusing. What do you want to highlight with this comment?
Sounds good, I'll try to update it with clarification. The namespace passed in needs to be the namespace of whoever sent the bad message. For vhost-vsock (and probably virtio-vsock eventually) that will be the device/guest namespace. For loopback, it is just the namespace of the socket that sent the bad message.
Yeah, now it's clear, thanks! So, IMO the `receiving device` was a bit confusing.
  */
 static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
-					  struct sk_buff *skb)
+					  struct sk_buff *skb, struct net *net,
+					  enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct virtio_vsock_pkt_info info = {
 		.op = VIRTIO_VSOCK_OP_RST,
 		.type = le16_to_cpu(hdr->type),
 		.reply = true,
+
+		/* net or net_mode are not defined here because we pass
+		 * net and net_mode directly to t->send_pkt(), instead of
+		 * relying on virtio_transport_send_pkt_info() to pass them to
+		 * t->send_pkt(). They are not needed by
+		 * virtio_transport_alloc_skb().
+		 */
 	};
 	struct sk_buff *reply;

@@ -1183,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;

-	return t->send_pkt(reply);
+	return t->send_pkt(reply, net, net_mode);
 }

 /* This function should be called with sk_lock held and SOCK_DONE set */
@@ -1465,6 +1487,8 @@ virtio_transport_send_response(struct vsock_sock *vsk,
 		.remote_port = le32_to_cpu(hdr->src_port),
 		.reply = true,
 		.vsk = vsk,
+		.net = sock_net(sk_vsock(vsk)),
+		.net_mode = vsk->net_mode,
 	};

 	return virtio_transport_send_pkt_info(vsk, &info);
@@ -1507,12 +1531,12 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	int ret;

 	if (le16_to_cpu(hdr->op) != VIRTIO_VSOCK_OP_REQUEST) {
-		virtio_transport_reset_no_sock(t, skb);
+		virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);
 		return -EINVAL;
 	}

 	if (sk_acceptq_is_full(sk)) {
-		virtio_transport_reset_no_sock(t, skb);
+		virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);
 		return -ENOMEM;
 	}

@@ -1520,13 +1544,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	 * Subsequent enqueues would lead to a memory leak.
 	 */
 	if (sk->sk_shutdown == SHUTDOWN_MASK) {
-		virtio_transport_reset_no_sock(t, skb);
+		virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);
 		return -ESHUTDOWN;
 	}

 	child = vsock_create_connected(sk);
 	if (!child) {
-		virtio_transport_reset_no_sock(t, skb);
+		virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);
 		return -ENOMEM;
 	}

@@ -1548,7 +1572,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	 */
 	if (ret || vchild->transport != &t->transport) {
 		release_sock(child);
-		virtio_transport_reset_no_sock(t, skb);
+		virtio_transport_reset_no_sock(t, skb, sock_net(sk), vsk->net_mode);
 		sock_put(child);
 		return ret;
 	}
@@ -1576,7 +1600,8 @@ static bool virtio_transport_valid_type(u16 type)
  * lock.
  */
 void virtio_transport_recv_pkt(struct virtio_transport *t,
-			       struct sk_buff *skb)
+			       struct sk_buff *skb, struct net *net,
+			       enum vsock_net_mode net_mode)
 {
 	struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb);
 	struct sockaddr_vm src, dst;
@@ -1599,24 +1624,24 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 					le32_to_cpu(hdr->fwd_cnt));

 	if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
-		(void)virtio_transport_reset_no_sock(t, skb);
+		(void)virtio_transport_reset_no_sock(t, skb, net, net_mode);
 		goto free_pkt;
 	}

 	/* The socket must be in connected or bound table
 	 * otherwise send reset back
 	 */
-	sk = vsock_find_connected_socket(&src, &dst);
+	sk = vsock_find_connected_socket_net(&src, &dst, net, net_mode);
Here `net` can be null, right? Is this okay?
Yes, it can be null. net_eq() compares pointers (returns false), and then the modes evaluate w/ GLOBAL == GLOBAL.
This goes away if we combine patches though.
I see, thanks! Stefano
On Wed, Nov 12, 2025 at 06:39:22PM +0100, Stefano Garzarella wrote:
On Wed, Nov 12, 2025 at 08:13:50AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:18:42PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:45PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
[...]
If it simplifies, I think we can eventually merge all changes to transports that depend on virtio_transport_common in a single commit. IMO it is better to have working commits than a better split.
That would be so much easier. Much of this patch is just me trying to find a way to keep total patch size reasonably small for review... if having them all in one commit is preferred then that makes life easier.
The answer to all of the above is that I was just trying to make the virtio_common changes in one place, but not break bisect/build by failing to update the transport-level call sites. So the placeholder values are primarily there to compile.
In theory, they should compile, but they should also properly behave.
BTW I strongly believe that having separate commits is a great thing, but we shouldn't take things to extremes and complicate our lives when things are too closely related, as in this case.
There is a clear dependency between these patches, so IMO, if the patch doesn't become huge, it's better to have everything together. (I mean between dependencies with virtio_transport_common).
Sounds good, let's give the combined commit a go, I think the transport-specific pieces are small enough for it to not balloon?
What we could perhaps do is have an initial commit where you make the changes, but the behavior remains unchanged (continue to use global everywhere, as for virtio_transport.c in this patch), and then specific commits to just enable support for local/global.
Not sure if it's doable, but I'd like to remove the placeholders if possible. Let's discuss more about it if there are issues.
Sounds good, I'll come back to this thread if the combined commit approach above balloons. For the combined commit, should the change log start at "Changes in v10" with any new changes, mention combining + links to the v9 patches that were combined?
Best, Bobby
On Wed, Nov 12, 2025 at 11:32:51AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 06:39:22PM +0100, Stefano Garzarella wrote:
On Wed, Nov 12, 2025 at 08:13:50AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:18:42PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:45PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
[...]
If it simplifies, I think we can eventually merge all changes to transports that depend on virtio_transport_common in a single commit. IMO it is better to have working commits than a better split.
That would be so much easier. Much of this patch is just me trying to find a way to keep total patch size reasonably small for review... if having them all in one commit is preferred then that makes life easier.
The answer to all of the above is that I was just trying to make the virtio_common changes in one place, but not break bisect/build by failing to update the transport-level call sites. So the placeholder values are primarily there to compile.
In theory, they should compile, but they should also properly behave.
BTW I strongly believe that having separate commits is a great thing, but we shouldn't take things to extremes and complicate our lives when things are too closely related, as in this case.
There is a clear dependency between these patches, so IMO, if the patch doesn't become huge, it's better to have everything together. (I mean between dependencies with virtio_transport_common).
Sounds good, let's give the combined commit a go, I think the transport-specific pieces are small enough for it to not balloon?
Yeah, I think so.
What we could perhaps do is have an initial commit where you make the changes, but the behavior remains unchanged (continue to use global everywhere, as for virtio_transport.c in this patch), and then specific commits to just enable support for local/global.
Not sure if it's doable, but I'd like to remove the placeholders if possible. Let's discuss more about it if there are issues.
Sounds good, I'll come back to this thread if the combined commit approach above balloons. For the combined commit, should the change log start at "Changes in v10" with any new changes, mention combining + links to the v9 patches that were combined?
Yep, that would be great. Plus explaining why we decided to do that (I mean just in the changelog).
Thanks, Stefano
From: Bobby Eshleman bobbyeshleman@meta.com
Reduce holes in struct virtio_vsock_skb_cb. As this struct continues to grow, we want to keep it trimmed down so it doesn't exceed the size of skb->cb (currently 48 bytes). Eliminating the 2 byte hole provides an additional two bytes for new fields at the end of the structure. It does not shrink the total size, however.
Future work could include combining fields like reply and tap_delivered into a single bitfield, but doing so currently would not make the total struct size smaller (although it would extend the tail-end padding area by one byte).
Before this patch:
struct virtio_vsock_skb_cb {
	bool                       reply;                /*     0     1 */
	bool                       tap_delivered;        /*     1     1 */

	/* XXX 2 bytes hole, try to pack */

	u32                        offset;               /*     4     4 */

	/* size: 8, cachelines: 1, members: 3 */
	/* sum members: 6, holes: 1, sum holes: 2 */
	/* last cacheline: 8 bytes */
};
After this patch:
struct virtio_vsock_skb_cb {
	u32                        offset;               /*     0     4 */
	bool                       reply;                /*     4     1 */
	bool                       tap_delivered;        /*     5     1 */

	/* size: 8, cachelines: 1, members: 3 */
	/* padding: 2 */
	/* last cacheline: 8 bytes */
};
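The two layouts above can be checked outside the kernel with offsetof() and sizeof(). The following is a minimal user-space sketch, assuming a typical ABI where uint32_t is 4-byte aligned and bool is one byte; the struct and function names (`cb_before`, `cb_after`, `cb_before_hole`, `cb_after_tail_padding`) are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field order before the patch: the u32 must be 4-byte aligned, so the
 * compiler leaves a hole after the two bools.
 */
struct cb_before {
	bool reply;
	bool tap_delivered;
	uint32_t offset;
};

/* Field order after the patch: the unused bytes move to the tail. */
struct cb_after {
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

/* Bytes between the end of tap_delivered and the start of offset. */
size_t cb_before_hole(void)
{
	return offsetof(struct cb_before, offset) -
	       (offsetof(struct cb_before, tap_delivered) + sizeof(bool));
}

/* Bytes between the end of the last member and the end of the struct. */
size_t cb_after_tail_padding(void)
{
	return sizeof(struct cb_after) -
	       (offsetof(struct cb_after, tap_delivered) + sizeof(bool));
}
```

On such an ABI both structs occupy 8 bytes; the reordering only moves the 2 unused bytes from the middle to the end, where a later field can use them without growing the struct.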
Reviewed-by: Stefano Garzarella sgarzare@redhat.com
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 include/linux/virtio_vsock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 5ed6136a4ed4..18deb3c8dab3 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,9 +10,9 @@
 #define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))

 struct virtio_vsock_skb_cb {
+	u32 offset;
 	bool reply;
 	bool tap_delivered;
-	u32 offset;
 };

 #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb))
From: Bobby Eshleman bobbyeshleman@meta.com
Add a net pointer, netns_tracker, and net_mode to the vsock skb and helpers for getting/setting them. These fields are only used by vsock_loopback in order to avoid net-related race conditions (more info in the loopback patch).
This extends virtio_vsock_skb_cb to 32 bytes (with CONFIG_NET_DEV_REFCNT_TRACKER=y):
struct virtio_vsock_skb_cb {
	struct net *               net;                  /*     0     8 */
	netns_tracker              ns_tracker;           /*     8     8 */
	enum vsock_net_mode        net_mode;             /*    16     4 */
	u32                        offset;               /*    20     4 */
	bool                       reply;                /*    24     1 */
	bool                       tap_delivered;        /*    25     1 */

	/* size: 32, cachelines: 1, members: 6 */
	/* padding: 6 */
	/* last cacheline: 32 bytes */
};
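Since the struct has to stay within skb->cb (48 bytes, per the earlier patch in this series), the size budget can be expressed as a compile-time check. The sketch below is a user-space model, not kernel code: `netns_tracker_model` assumes the tracker is pointer-sized (as the 8-byte pahole entry suggests with CONFIG_NET_DEV_REFCNT_TRACKER=y), and `SKB_CB_SIZE`, `virtio_vsock_skb_cb_model` and `cb_spare_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKB_CB_SIZE 48	/* size of skb->cb as stated in this series */

struct net;	/* opaque in this model */

/* Assumption: the tracker is a single pointer when tracking is enabled. */
typedef struct { void *tracker; } netns_tracker_model;

enum vsock_net_mode { VSOCK_NET_MODE_GLOBAL, VSOCK_NET_MODE_LOCAL };

/* Same member order as the pahole output in the commit message. */
struct virtio_vsock_skb_cb_model {
	struct net *net;
	netns_tracker_model ns_tracker;
	enum vsock_net_mode net_mode;
	uint32_t offset;
	bool reply;
	bool tap_delivered;
};

/* Fails the build if the control block no longer fits in skb->cb. */
_Static_assert(sizeof(struct virtio_vsock_skb_cb_model) <= SKB_CB_SIZE,
	       "virtio_vsock_skb_cb must fit in skb->cb");

/* Bytes of skb->cb still unused by the control block. */
size_t cb_spare_bytes(void)
{
	return SKB_CB_SIZE - sizeof(struct virtio_vsock_skb_cb_model);
}
```

On an LP64 target this model comes out at 32 bytes, matching the pahole output, which leaves 16 bytes of skb->cb unused.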
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
Changes in v9:
- update commit message to specify usage by loopback only
- add comment in virtio_vsock_skb_cb mentioning usage by vsock_loopback
- add ns_tracker to skb->cb
- removed Stefano's Reviewed-by trailer due to ns_tracker addition (not
  sure if this is the right process thing to do)

Changes in v7:
- rename `orig_net_mode` to `net_mode`
- update commit message with a more complete explanation of changes

Changes in v5:
- some diff context change due to rebase to current net-next
---
 include/linux/virtio_vsock.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
index 18deb3c8dab3..a3ef752cdb95 100644
--- a/include/linux/virtio_vsock.h
+++ b/include/linux/virtio_vsock.h
@@ -10,6 +10,10 @@
 #define VIRTIO_VSOCK_SKB_HEADROOM (sizeof(struct virtio_vsock_hdr))

 struct virtio_vsock_skb_cb {
+	/* net, net_mode, and ns_tracker are only used by vsock_loopback. */
+	struct net *net;
+	netns_tracker ns_tracker;
+	enum vsock_net_mode net_mode;
 	u32 offset;
 	bool reply;
 	bool tap_delivered;
@@ -130,6 +134,35 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb)
 	return (size_t)(skb_end_pointer(skb) - skb->head);
 }

+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb)
+{
+	return VIRTIO_VSOCK_SKB_CB(skb)->net;
+}
+
+static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net)
+{
+	get_net_track(net, &VIRTIO_VSOCK_SKB_CB(skb)->ns_tracker, GFP_KERNEL);
+	VIRTIO_VSOCK_SKB_CB(skb)->net = net;
+}
+
+static inline void virtio_vsock_skb_clear_net(struct sk_buff *skb)
+{
+	put_net_track(VIRTIO_VSOCK_SKB_CB(skb)->net,
+		      &VIRTIO_VSOCK_SKB_CB(skb)->ns_tracker);
+	VIRTIO_VSOCK_SKB_CB(skb)->net = NULL;
+}
+
+static inline enum vsock_net_mode virtio_vsock_skb_net_mode(struct sk_buff *skb)
+{
+	return VIRTIO_VSOCK_SKB_CB(skb)->net_mode;
+}
+
+static inline void virtio_vsock_skb_set_net_mode(struct sk_buff *skb,
+						 enum vsock_net_mode net_mode)
+{
+	VIRTIO_VSOCK_SKB_CB(skb)->net_mode = net_mode;
+}
+
 /* Dimension the RX SKB so that the entire thing fits exactly into
  * a single 4KiB page. This avoids wasting memory due to alloc_skb()
  * rounding up to the next page order and also means that we
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
Changes in v9:
- remove per-netns vsock_loopback and workqueues, just re-using
  the net and net_mode in skb->cb achieved the same result in a simpler
  way. Also removed need for pernet_subsys.
- properly track net references

Changes in v7:
- drop for_each_net() init/exit, drop net_rwsem, the pernet registration
  handles this automatically and race-free
- flush workqueue before destruction, purge pkt list
- remember net_mode instead of current net mode
- keep space after INIT_WORK()
- change vsock_loopback in netns_vsock to ->priv void ptr
- rename `orig_net_mode` to `net_mode`
- remove useless comment
- protect `register_pernet_subsys()` with `net_rwsem`
- do cleanup before releasing `net_rwsem` when failure happens
- call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
- call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`

Changes in v6:
- init pernet ops for vsock_loopback module
- vsock_loopback: add space in struct to clarify lock protection
- do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net()

Changes in v5:
- add callbacks code to avoid reverse dependency
- add logic for handling vsock_loopback setup for already existing
  namespaces
---
 net/vmw_vsock/vsock_loopback.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index d3ac056663ea..e62f6c516992 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -32,6 +32,9 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;

+	virtio_vsock_skb_set_net(skb, net);
+	virtio_vsock_skb_set_net_mode(skb, net_mode);
+
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);

@@ -116,8 +119,10 @@ static void vsock_loopback_work(struct work_struct *work)
 {
 	struct vsock_loopback *vsock =
 		container_of(work, struct vsock_loopback, pkt_work);
+	enum vsock_net_mode net_mode;
 	struct sk_buff_head pkts;
 	struct sk_buff *skb;
+	struct net *net;

 	skb_queue_head_init(&pkts);

@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);
+
+		skb_get(skb);
+
+		virtio_transport_recv_pkt(&loopback_transport, skb, net,
+					  net_mode);
+
+		virtio_vsock_skb_clear_net(skb);
+		kfree_skb(skb);
 	}
 }
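The reference discipline described in the comment above (take a net reference when the packet is queued, drop it only after the receive path has finished) can be modeled in plain C. This is a single-threaded sketch that shows the ordering only; it does not reproduce the race itself. All names here (`net_model`, `pkt`, `queue`, `loopback_send`, `loopback_work`, `demo_close_before_work`) are hypothetical and the refcount is a plain int rather than a kernel refcount.

```c
#include <assert.h>
#include <stddef.h>

struct net_model { int refcnt; };	/* stand-in for struct net */

struct pkt {
	struct net_model *net;		/* pinned for the queue's lifetime */
	struct pkt *next;
};

struct queue { struct pkt *head, *tail; };

static void net_get(struct net_model *n) { n->refcnt++; }
static void net_put(struct net_model *n) { n->refcnt--; }

/* Send side: pin the namespace before the packet becomes visible to the worker. */
void loopback_send(struct queue *q, struct pkt *p, struct net_model *net)
{
	net_get(net);
	p->net = net;
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

/* Worker side: deliver each packet, then release the pinned reference. */
int loopback_work(struct queue *q, void (*recv)(struct pkt *, struct net_model *))
{
	struct pkt *p;
	int handled = 0;

	while ((p = q->head) != NULL) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
		recv(p, p->net);	/* net cannot have been released yet */
		net_put(p->net);
		p->net = NULL;
		handled++;
	}
	return handled;
}

/* Records the refcount seen by the receiver. */
static int observed_refcnt;

static void recv_pkt_model(struct pkt *p, struct net_model *net)
{
	(void)p;
	observed_refcnt = net->refcnt;
}

/* The sending socket goes away before the worker runs. */
int demo_close_before_work(void)
{
	struct net_model net = { .refcnt = 1 };	/* held by the socket */
	struct queue q = { NULL, NULL };
	struct pkt p;

	loopback_send(&q, &p, &net);	/* refcnt is now 2 */
	net_put(&net);			/* socket closes; the packet still pins net */
	loopback_work(&q, recv_pkt_model);
	assert(net.refcnt == 0);	/* last reference dropped by the worker */
	return observed_refcnt;
}
```

In this model the receiver still observes a live reference even though the socket closed first. Without the net_get() in the send path, the count would already be zero when the worker runs, which corresponds to the stale pointer in step 6 of the comment.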
On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- remove per-netns vsock_loopback and workqueues, just re-using
the net and net_mode in skb->cb achieved the same result in a simpler way. Also removed need for pernet_subsys.
- properly track net references
Changes in v7:
- drop for_each_net() init/exit, drop net_rwsem, the pernet registration
handles this automatically and race-free
- flush workqueue before destruction, purge pkt list
- remember net_mode instead of current net mode
- keep space after INIT_WORK()
- change vsock_loopback in netns_vsock to ->priv void ptr
- rename `orig_net_mode` to `net_mode`
- remove useless comment
- protect `register_pernet_subsys()` with `net_rwsem`
- do cleanup before releasing `net_rwsem` when failure happens
- call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
- call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`
Changes in v6:
- init pernet ops for vsock_loopback module
- vsock_loopback: add space in struct to clarify lock protection
- do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net()
Changes in v5:
- add callbacks code to avoid reverse dependency
- add logic for handling vsock_loopback setup for already existing
namespaces
net/vmw_vsock/vsock_loopback.c | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-)
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index d3ac056663ea..e62f6c516992 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -32,6 +32,9 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;

+	virtio_vsock_skb_set_net(skb, net);
+	virtio_vsock_skb_set_net_mode(skb, net_mode);
+
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);

@@ -116,8 +119,10 @@ static void vsock_loopback_work(struct work_struct *work)
 {
 	struct vsock_loopback *vsock =
 		container_of(work, struct vsock_loopback, pkt_work);
+	enum vsock_net_mode net_mode;
 	struct sk_buff_head pkts;
 	struct sk_buff *skb;
+	struct net *net;

 	skb_queue_head_init(&pkts);

@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);
Wait, we are adding those just for loopback (in theory used only for testing/debugging)? And only to support virtio_transport_reset_no_sock() use case?
Honestly I don't like this, do we have any alternative?
I'll also try to think something else.
Stefano
+
+		skb_get(skb);
+
+		virtio_transport_recv_pkt(&loopback_transport, skb, net,
+					  net_mode);
+
+		virtio_vsock_skb_clear_net(skb);
+		kfree_skb(skb);
 	}
 }
-- 2.47.3
On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v9:
- remove per-netns vsock_loopback and workqueues, just re-using
the net and net_mode in skb->cb achieved the same result in a simpler way. Also removed need for pernet_subsys.
- properly track net references
Changes in v7:
- drop for_each_net() init/exit, drop net_rwsem, the pernet registration
handles this automatically and race-free
- flush workqueue before destruction, purge pkt list
- remember net_mode instead of current net mode
- keep space after INIT_WORK()
- change vsock_loopback in netns_vsock to ->priv void ptr
- rename `orig_net_mode` to `net_mode`
- remove useless comment
- protect `register_pernet_subsys()` with `net_rwsem`
- do cleanup before releasing `net_rwsem` when failure happens
- call `unregister_pernet_subsys()` in `vsock_loopback_exit()`
- call `vsock_loopback_deinit_vsock()` in `vsock_loopback_exit()`
Changes in v6:
- init pernet ops for vsock_loopback module
- vsock_loopback: add space in struct to clarify lock protection
- do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net()
Changes in v5:
- add callbacks code to avoid reverse dependency
- add logic for handling vsock_loopback setup for already existing
namespaces
 net/vmw_vsock/vsock_loopback.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
index d3ac056663ea..e62f6c516992 100644
--- a/net/vmw_vsock/vsock_loopback.c
+++ b/net/vmw_vsock/vsock_loopback.c
@@ -32,6 +32,9 @@ static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;
 
+	virtio_vsock_skb_set_net(skb, net);
+	virtio_vsock_skb_set_net_mode(skb, net_mode);
+
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);
 
@@ -116,8 +119,10 @@ static void vsock_loopback_work(struct work_struct *work)
 {
 	struct vsock_loopback *vsock =
 		container_of(work, struct vsock_loopback, pkt_work);
+	enum vsock_net_mode net_mode;
 	struct sk_buff_head pkts;
 	struct sk_buff *skb;
+	struct net *net;
 
 	skb_queue_head_init(&pkts);
 
@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);

Wait, we are adding those just for loopback (in theory used only for testing/debugging)? And only to support virtio_transport_reset_no_sock() use case?
Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist for vhost-vsock because vhost_vsock holds a net reference, and it doesn't exist for non-reset_no_sock calls because after looking up the socket we transfer skb ownership to it, which holds down the skb -> sk -> net reference chain.
Honestly I don't like this, do we have any alternative?
I'll also try to think something else.
Stefano
I've been thinking about this all morning... maybe we can do something like this:
```
virtio_transport_recv_pkt(..., struct sock *reply_sk)
{
	...
}

virtio_transport_reset_no_sock(..., reply_sk)
{
	if (reply_sk)
		skb_set_owner_sk_safe(reply, reply_sk)

	t->send_pkt(reply);
}

vsock_loopback_work(...)
{
	virtio_transport_recv_pkt(..., skb, skb->sk);
}

for other transports:

	virtio_transport_recv_pkt(..., skb, NULL);
```
This way 'reply' keeps the sk and sk->net alive even after virtio_transport_recv_pkt() frees 'skb'. The net won't be released until after 'reply' is freed back on the other side, removing the race.
It makes semantic sense too... for loopback, we already know which sk the reply is going back to. For other transports, we don't because they're across the virt boundary.
WDYT?
I hate to suggest this, but another option might be to just do nothing? In order for this race to have any real effect, a loopback socket must send a pkt to a non-existent socket and immediately close(); then the namespace must be deleted, a new namespace created at the same pointer address, and finally a new socket with the same port created in that namespace, all before the reply RST reaches recv_pkt()... at which point the newly created socket would wrongfully receive the RST.
Best, Bobby
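To illustrate the reference chain discussed above (reply skb -> sk -> net), here is a minimal userspace sketch. It is not kernel code: `struct net`, `struct sock`, `struct skb` and all helpers below are hypothetical stand-ins with plain integer refcounts, assuming only what the thread states, namely that an skb owning a socket keeps the socket alive, and a live socket keeps its net alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; refcounts are plain ints. */
struct net  { int refs; };
struct sock { int refs; struct net *net; };
struct skb  { struct sock *sk; };

static void sock_put_ref(struct sock *sk)
{
	assert(sk->refs > 0);
	if (--sk->refs == 0)
		sk->net->refs--;	/* last socket ref drops the net ref */
}

/* Attach an owner: the skb now pins sk and, through it, the net. */
static void skb_set_owner(struct skb *skb, struct sock *sk)
{
	sk->refs++;
	skb->sk = sk;
}

static void skb_free(struct skb *skb)
{
	if (skb->sk)
		sock_put_ref(skb->sk);
	skb->sk = NULL;
}

/* Returns true if the net is still referenced while the reply is queued. */
static bool net_alive_while_reply_queued(bool reply_owns_sk)
{
	struct net net = { .refs = 1 };			/* held by the socket */
	struct sock sk = { .refs = 1, .net = &net };	/* held by the fd */
	struct skb request = { NULL }, reply = { NULL };
	bool alive;

	skb_set_owner(&request, &sk);	/* sender queues the request */
	if (reply_owns_sk)
		skb_set_owner(&reply, request.sk);	/* proposed ownership */
	skb_free(&request);		/* recv_pkt frees the request */
	sock_put_ref(&sk);		/* sender close()s right away */

	alive = net.refs > 0;		/* reply is still in the queue here */
	skb_free(&reply);
	return alive;
}
```

With `reply_owns_sk` set, the net stays referenced until the reply itself is freed; without it, the net reference is already gone while the reply is still queued, which is the window the race needs.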
On Wed, Nov 12, 2025 at 10:27:18AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
[...]
@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);

Wait, we are adding those just for loopback (in theory used only for testing/debugging)? And only to support virtio_transport_reset_no_sock() use case?
Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist for vhost-vsock because vhost_vsock holds a net reference, and it doesn't exist for non-reset_no_sock calls because after looking up the socket we transfer skb ownership to it, which holds down the skb -> sk -> net reference chain.
Honestly I don't like this, do we have any alternative?
I'll also try to think something else.
Stefano
I've been thinking about this all morning... maybe we can do something like this:
virtio_transport_recv_pkt(..., struct sock *reply_sk)
{
	...
}

virtio_transport_reset_no_sock(..., reply_sk)
{
	if (reply_sk)
		skb_set_owner_sk_safe(reply, reply_sk)
Interesting, but what about if we call skb_set_owner_sk_safe() in vsock_loopback.c just before calling virtio_transport_recv_pkt() for every skb?
Maybe we should refactor a bit virtio_transport_recv_pkt() e.g. moving `skb_set_owner_sk_safe()` to be sure it's called only when we are sure it's the right socket (e.g. after checking SOCK_DONE).
WDYT?
	t->send_pkt(reply);
}

vsock_loopback_work(...)
{
	virtio_transport_recv_pkt(..., skb, skb->sk);
}

for other transports:

	virtio_transport_recv_pkt(..., skb, NULL);
This way 'reply' keeps the sk and sk->net alive even after virtio_transport_recv_pkt() frees 'skb'. The net won't be released until after 'reply' is freed back on the other side, removing the race.

It makes semantic sense too... for loopback, we already know which sk the reply is going back to. For other transports, we don't because they're across the virt boundary.

WDYT?

I hate to suggest this, but another option might be to just do nothing? In order for this race to have any real effect, a loopback socket must send a pkt to a non-existent socket, immediately close(), then the namespace deleted, a new namespace created with the same pointer address, and finally a new socket with the same port created in that namespace, all before the reply RST reaches recv_pkt()... at which point the newly created socket would wrongfully receive the RST.
Yeah, let's keep this as plan B for now :-)
Thanks, Stefano
On Thu, Nov 13, 2025 at 04:24:44PM +0100, Stefano Garzarella wrote:
On Wed, Nov 12, 2025 at 10:27:18AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
[...]
@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);

Wait, we are adding those just for loopback (in theory used only for testing/debugging)? And only to support virtio_transport_reset_no_sock() use case?
Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist for vhost-vsock because vhost_vsock holds a net reference, and it doesn't exist for non-reset_no_sock calls because after looking up the socket we transfer skb ownership to it, which holds down the skb -> sk -> net reference chain.
Honestly I don't like this, do we have any alternative?
I'll also try to think something else.
Stefano
I've been thinking about this all morning... maybe we can do something like this:
virtio_transport_recv_pkt(..., struct sock *reply_sk)
{
	...
}

virtio_transport_reset_no_sock(..., reply_sk)
{
	if (reply_sk)
		skb_set_owner_sk_safe(reply, reply_sk)

Interesting, but what about if we call skb_set_owner_sk_safe() in vsock_loopback.c just before calling virtio_transport_recv_pkt() for every skb?
I think the issue with this is that at the time vsock_loopback calls virtio_transport_recv_pkt() the reply skb hasn't yet been allocated by virtio_transport_reset_no_sock() and we can't wait for it to return because the original skb may be freed by then.
We might be able to keep it all in vsock_loopback if we removed the need to use the original skb or sk by just using the net. But to do that we would need to add a netns_tracker per net somewhere. I guess that would end up in a list or hashmap in struct vsock_loopback.
Another option that does simplify a little, but unfortunately still doesn't keep everything in loopback:
@@ -1205,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;
 
-	return t->send_pkt(reply, net, net_mode);
+	return t->send_pkt(reply, net, net_mode, skb->sk);
 }
 
@@ -27,11 +27,16 @@ static u32 vsock_loopback_get_local_cid(void)
 }
 
 static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
-				   enum vsock_net_mode net_mode)
+				   enum vsock_net_mode net_mode,
+				   struct sock *rst_owner)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;
 
+	if (!skb->sk && rst_owner)
+		WARN_ONCE(!skb_set_owner_sk_safe(skb, rst_owner),
+			  "loopback socket has sk_refcnt == 0\n");
+
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);
Maybe we should refactor a bit virtio_transport_recv_pkt() e.g. moving `skb_set_owner_sk_safe()` to be sure it's called only when we are sure it's the right socket (e.g. after checking SOCK_DONE).
WDYT?
I agree, it is called a little prematurely.
Thanks, Bobby
On Thu, Nov 13, 2025 at 10:26:04AM -0800, Bobby Eshleman wrote:
On Thu, Nov 13, 2025 at 04:24:44PM +0100, Stefano Garzarella wrote:
On Wed, Nov 12, 2025 at 10:27:18AM -0800, Bobby Eshleman wrote:
On Wed, Nov 12, 2025 at 03:19:47PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:48PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
[...]
@@ -131,7 +136,41 @@ static void vsock_loopback_work(struct work_struct *work)
 		 */
 		virtio_transport_consume_skb_sent(skb, false);
 		virtio_transport_deliver_tap_pkt(skb);
-		virtio_transport_recv_pkt(&loopback_transport, skb, NULL, 0);
+		/* In the case of virtio_transport_reset_no_sock(), the skb
+		 * does not hold a reference on the socket, and so does not
+		 * transitively hold a reference on the net.
+		 *
+		 * There is an ABA race condition in this sequence:
+		 * 1. the sender sends a packet
+		 * 2. worker calls virtio_transport_recv_pkt(), using the
+		 *    sender's net
+		 * 3. virtio_transport_recv_pkt() uses t->send_pkt() passing the
+		 *    sender's net
+		 * 4. virtio_transport_recv_pkt() free's the skb, dropping the
+		 *    reference to the socket
+		 * 5. the socket closes, frees its reference to the net
+		 * 6. Finally, the worker for the second t->send_pkt() call
+		 *    processes the skb, and uses the now stale net pointer for
+		 *    socket lookups.
+		 *
+		 * To prevent this, we acquire a net reference in vsock_loopback_send_pkt()
+		 * and hold it until virtio_transport_recv_pkt() completes.
+		 *
+		 * Additionally, we must grab a reference on the skb before
+		 * calling virtio_transport_recv_pkt() to prevent it from
+		 * freeing the skb before we have a chance to release the net.
+		 */
+		net_mode = virtio_vsock_skb_net_mode(skb);
+		net = virtio_vsock_skb_net(skb);

Wait, we are adding those just for loopback (in theory used only for testing/debugging)? And only to support virtio_transport_reset_no_sock() use case?
Yes, exactly, only loopback + reset_no_sock(). The issue doesn't exist for vhost-vsock because vhost_vsock holds a net reference, and it doesn't exist for non-reset_no_sock calls because after looking up the socket we transfer skb ownership to it, which holds down the skb -> sk -> net reference chain.
Honestly I don't like this, do we have any alternative?
I'll also try to think something else.
Stefano
I've been thinking about this all morning... maybe we can do something like this:
virtio_transport_recv_pkt(..., struct sock *reply_sk)
{
	...
}

virtio_transport_reset_no_sock(..., reply_sk)
{
	if (reply_sk)
		skb_set_owner_sk_safe(reply, reply_sk)

Interesting, but what about if we call skb_set_owner_sk_safe() in vsock_loopback.c just before calling virtio_transport_recv_pkt() for every skb?
I think the issue with this is that at the time vsock_loopback calls virtio_transport_recv_pkt() the reply skb hasn't yet been allocated by virtio_transport_reset_no_sock() and we can't wait for it to return because the original skb may be freed by then.
Right!
We might be able to keep it all in vsock_loopback if we removed the need to use the original skb or sk by just using the net. But to do that we would need to add a netns_tracker per net somewhere. I guess that would end up in a list or hashmap in struct vsock_loopback.
Another option that does simplify a little, but unfortunately still doesn't keep everything in loopback:
@@ -1205,7 +1205,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 	if (!reply)
 		return -ENOMEM;
 
-	return t->send_pkt(reply, net, net_mode);
+	return t->send_pkt(reply, net, net_mode, skb->sk);
 }
 
@@ -27,11 +27,16 @@ static u32 vsock_loopback_get_local_cid(void)
 }
 
 static int vsock_loopback_send_pkt(struct sk_buff *skb, struct net *net,
-				   enum vsock_net_mode net_mode)
+				   enum vsock_net_mode net_mode,
+				   struct sock *rst_owner)
 {
 	struct vsock_loopback *vsock = &the_vsock_loopback;
 	int len = skb->len;
 
+	if (!skb->sk && rst_owner)
+		WARN_ONCE(!skb_set_owner_sk_safe(skb, rst_owner),
+			  "loopback socket has sk_refcnt == 0\n");
+
This doesn't seem too bad IMO, but at this point, why can't we do that in virtio_transport_reset_no_sock() for any kind of transport?
I mean, in any case the RST packet should be handled by the same net as the "sender", no?
At this point, can we just put the `vsk` of the sender in the `info`, so that virtio_transport_alloc_skb() already does that for us?
WDYT? Am I missing something?
 	virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb);
 	queue_work(vsock->workqueue, &vsock->pkt_work);
Maybe we should refactor a bit virtio_transport_recv_pkt() e.g. moving `skb_set_owner_sk_safe()` to be sure it's called only when we are sure it's the right socket (e.g. after checking SOCK_DONE).
WDYT?
I agree, it is called a little prematurely.
Yep, but I'll leave this out for now :-)
Thanks, Stefano
From: Bobby Eshleman bobbyeshleman@meta.com
Add the ability to isolate vhost-vsock flows using namespaces.
The VM, via the vhost_vsock struct, inherits its namespace from the process that opens the vhost-vsock device. vhost_vsock lookup functions are modified to take into account the mode (e.g., if CIDs are matching but modes don't align, then return NULL).
When namespace modes are evaluated during socket usage we always use the mode of the namespace at the time the vhost vsock device file was opened. If that namespace is later changed from "global" to "local" mode, the vsock will continue operating as if the change never happened (i.e., it is in "global" mode). This avoids breaking already established flows.
vhost_vsock now acquires a reference to the namespace.
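The mode-aware CID lookup described above can be sketched in userspace C as follows. This is only an illustration: `mode_allows()` is a hypothetical stand-in for vsock_net_check_mode(), whose real implementation is not shown in this patch. It assumes the rule stated in the series description, that two endpoints may see each other if they are in the same namespace or if both captured modes are global.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ns_mode { MODE_GLOBAL, MODE_LOCAL };	/* hypothetical names */

struct ns { int id; };

/* Hypothetical stand-in for vhost_vsock: CID plus the captured net and mode. */
struct vm {
	unsigned int cid;
	const struct ns *net;
	enum ns_mode mode;	/* mode of the namespace at open() time */
};

/* Assumed rule: same namespace always matches, otherwise both must be global. */
static bool mode_allows(const struct ns *a, enum ns_mode a_mode,
			const struct ns *b, enum ns_mode b_mode)
{
	if (a == b)
		return true;
	return a_mode == MODE_GLOBAL && b_mode == MODE_GLOBAL;
}

/* Return the first VM whose CID matches and whose mode is compatible. */
static const struct vm *vm_lookup(const struct vm *vms, size_t n,
				  unsigned int cid, const struct ns *net,
				  enum ns_mode mode)
{
	size_t i;

	assert(vms || n == 0);
	for (i = 0; i < n; i++) {
		if (vms[i].cid == 0)	/* no CID assigned yet */
			continue;
		if (vms[i].cid == cid &&
		    mode_allows(net, mode, vms[i].net, vms[i].mode))
			return &vms[i];
	}
	return NULL;
}
```

Because the mode is stored per VM rather than read from the namespace on every lookup, two VMs can hold the same CID as long as they are not mutually visible under this rule.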
Suggested-by: Sargun Dhillon sargun@sargun.me
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
Changes in v9:
- add more information about net_mode and rationale (changing modes) to
  both code comment and commit message

Changes in v7:
- remove the check_global flag of vhost_vsock_get(), that logic was both
  wrong and not necessary, reuse vsock_net_check_mode() instead
- remove 'delete me' comment

Changes in v5:
- respect pid namespaces when assigning namespace to vhost_vsock
---
 drivers/vhost/vsock.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 0a0e73405532..09f9321e4bc8 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -46,6 +46,11 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8);
 struct vhost_vsock {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[2];
+	struct net *net;
+	netns_tracker ns_tracker;
+
+	/* The ns mode at the time vhost_vsock was created */
+	enum vsock_net_mode net_mode;
 
 	/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */
 	struct hlist_node hash;
@@ -67,7 +72,8 @@ static u32 vhost_transport_get_local_cid(void)
 /* Callers that dereference the return value must hold vhost_vsock_mutex or the
  * RCU read lock.
  */
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net,
+					   enum vsock_net_mode mode)
 {
 	struct vhost_vsock *vsock;
 
@@ -78,9 +84,9 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
 		if (other_cid == 0)
 			continue;
 
-		if (other_cid == guest_cid)
+		if (other_cid == guest_cid &&
+		    vsock_net_check_mode(net, mode, vsock->net, vsock->net_mode))
 			return vsock;
-
 	}
 
 	return NULL;
@@ -279,7 +285,7 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net,
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id */
-	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid));
+	vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net, net_mode);
 	if (!vsock) {
 		rcu_read_unlock();
 		kfree_skb(skb);
@@ -306,7 +312,8 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 	rcu_read_lock();
 
 	/* Find the vhost_vsock according to guest context id */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid,
+				sock_net(sk_vsock(vsk)), vsk->net_mode);
 	if (!vsock)
 		goto out;
 
@@ -463,11 +470,12 @@ static struct virtio_transport vhost_transport = {
 
 static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid)
 {
+	struct net *net = sock_net(sk_vsock(vsk));
 	struct vhost_vsock *vsock;
 	bool seqpacket_allow = false;
 
 	rcu_read_lock();
-	vsock = vhost_vsock_get(remote_cid);
+	vsock = vhost_vsock_get(remote_cid, net, vsk->net_mode);
 
 	if (vsock)
 		seqpacket_allow = vsock->seqpacket_allow;
@@ -538,8 +546,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		if (le64_to_cpu(hdr->src_cid) == vsock->guest_cid &&
 		    le64_to_cpu(hdr->dst_cid) ==
 		    vhost_transport_get_local_cid())
-			virtio_transport_recv_pkt(&vhost_transport, skb, NULL,
-						  0);
+			virtio_transport_recv_pkt(&vhost_transport, skb,
+						  vsock->net, vsock->net_mode);
 		else
 			kfree_skb(skb);
 
@@ -654,8 +662,10 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
 
 static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
+
 	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
+	struct net *net;
 	int ret;
 
 	/* This struct is large and allocation could fail, fall back to vmalloc
@@ -671,6 +681,17 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 		goto out;
 	}
 
+	net = current->nsproxy->net_ns;
+	vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
+
+	/* Store the mode of the namespace at the time of creation. If this
+	 * namespace later changes from "global" to "local", we want this vsock
+	 * to continue operating normally and not suddenly break. For that
+	 * reason, we save the mode here and later use it when performing
+	 * socket lookups with vsock_net_check_mode() (see vhost_vsock_get()).
+	 */
+	vsock->net_mode = vsock_net_mode(net);
+
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
 
@@ -710,7 +731,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk), vsk->net_mode))
 		return;
 
 	/* If the close timeout is pending, let it expire. This avoids races
@@ -755,6 +776,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
 
 	vhost_dev_cleanup(&vsock->dev);
+	put_net_track(vsock->net, &vsock->ns_tracker);
 	kfree(vsock->dev.vqs);
 	vhost_vsock_free(vsock);
 	return 0;
@@ -781,7 +803,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
 
 	/* Refuse if CID is already in use */
 	mutex_lock(&vhost_vsock_mutex);
-	other = vhost_vsock_get(guest_cid);
+	other = vhost_vsock_get(guest_cid, vsock->net, vsock->net_mode);
 	if (other && other != vsock) {
 		mutex_unlock(&vhost_vsock_mutex);
 		return -EADDRINUSE;
From: Bobby Eshleman bobbyeshleman@meta.com
Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport is operational. Additionally, reject G2H transport registration if there already exists a namespace in local mode.
G2H sockets break in local mode because the G2H transports don't support namespacing yet. The current approach is to coerce packets coming out of G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible to coerce sockets in the same way because it cannot be deduced which transport will be used by the socket. Specifically, when bound to VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not until a packet is received and matched to the bound socket that we assign the transport. This presents a chicken-and-egg problem, because we need the namespace to lookup the socket and resolve the transport, but we need the transport to know how to use the namespace during lookup.
For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being used on systems that support G2H, even nested systems that also have H2G transports.
Local mode is blocked based on detecting the presence of G2H devices (when possible, as hyperv is special). This means that a host kernel with G2H support compiled in (or with the module loaded) will still support local mode as long as no G2H device (e.g., virtio-vsock) is detected. This enables using the same kernel in the host and in the guest, as we do in kselftest.
Systems with only namespace-aware transports (vhost-vsock, loopback) can still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as intended.
The hyperv transport must be treated specially. Other G2H transports can report the presence of a device using get_local_cid(). When a device is present it returns a valid CID; otherwise, it returns VMADDR_CID_ANY. The hyperv transport's get_local_cid(), however, always returns VMADDR_CID_ANY, even when a device is present.
For that reason, this patch adds an always_block_local_mode flag to struct vsock_transport. When set to true, VSOCK_NET_MODE_LOCAL is blocked unconditionally whenever the transport is registered, regardless of device presence. When false, LOCAL mode is only blocked when get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
The hyperv transport sets this flag to true to unconditionally block local mode. Other G2H transports (virtio-vsock, vmci-vsock) leave it false and continue using device detection via get_local_cid() to block local mode.
These restrictions can be lifted in a future patch series when G2H transports gain namespace support.
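The blocking rule described above can be sketched as a small userspace predicate. The struct and helper names here (`g2h_transport`, `local_mode_blocked()`, the two `cid_*` callbacks) are hypothetical; only the `always_block_local_mode` flag, the get_local_cid() callback and the VMADDR_CID_ANY sentinel (defined as -1U in the vsock UAPI header) come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CID_ANY UINT32_MAX	/* stands in for VMADDR_CID_ANY (-1U) */

/* Hypothetical, reduced view of a G2H transport. */
struct g2h_transport {
	bool always_block_local_mode;
	uint32_t (*get_local_cid)(void);
};

/* Example callbacks: one reports no device, one reports a guest CID. */
static uint32_t cid_no_device(void)   { return CID_ANY; }
static uint32_t cid_with_device(void) { return 42; }

/* LOCAL mode is refused if a G2H transport is registered and either the
 * flag is set or the transport reports an actual device.
 */
static bool local_mode_blocked(const struct g2h_transport *t)
{
	if (!t)
		return false;	/* no G2H transport registered */
	assert(t->get_local_cid);
	return t->always_block_local_mode ||
	       t->get_local_cid() != CID_ANY;
}
```

A virtio-style transport on a host (module loaded, no device) leaves local mode available, the same transport inside a guest blocks it, and a hyperv-style transport blocks it in both cases because its callback cannot signal device presence.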
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
---
 include/net/af_vsock.h           |  8 +++++++
 net/vmw_vsock/af_vsock.c         | 45 +++++++++++++++++++++++++++++++++++++---
 net/vmw_vsock/hyperv_transport.c |  1 +
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index cfd121bb5ab7..089c61105dda 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -108,6 +108,14 @@ struct vsock_transport_send_notify_data {
 
 struct vsock_transport {
 	struct module *module;
+	/* If true, block VSOCK_NET_MODE_LOCAL unconditionally when this G2H
+	 * transport is registered. If false, only block LOCAL mode when
+	 * get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
+	 * Hyperv sets this true because it doesn't offer a callback that
+	 * detects device presence. This only applies to G2H transports; H2G
+	 * transports are unaffected.
+	 */
+	bool always_block_local_mode;
 
 	/* Initialize/tear-down socket. */
 	int (*init)(struct vsock_sock *, struct vsock_sock *);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index c0b5946bdc95..a2da1810b802 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -91,6 +91,11 @@
  * and locked down by a namespace manager. The default is "global". The mode
  * is set per-namespace.
  *
+ * Note: LOCAL mode is only supported when using namespace-aware transports
+ * (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
+ * hyperv-vsock, vmci-vsock) is loaded, attempts to set LOCAL mode will fail
+ * with EOPNOTSUPP, as these transports do not support per-namespace isolation.
+ *
  * The modes affect the allocation and accessibility of CIDs as follows:
  *
  * - global - access and allocation are all system-wide
@@ -2757,12 +2762,30 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write,
 	if (*lenp >= sizeof(data))
 		return -EINVAL;
 
-	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
+	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) {
 		mode = VSOCK_NET_MODE_GLOBAL;
-	else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
+	} else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) {
+		/* LOCAL mode is not supported when G2H transports
+		 * (virtio-vsock, hyperv, vmci) are active, because
+		 * these transports don't support namespaces. We must
+		 * stay in GLOBAL mode to avoid bind/lookup mismatches.
+		 *
+		 * Check if G2H transport is present and either:
+		 * 1. Has always_block_local_mode set (hyperv), OR
+		 * 2. Has an actual device present (get_local_cid() != VMADDR_CID_ANY)
+		 */
+		mutex_lock(&vsock_register_mutex);
+		if (transport_g2h &&
+		    (transport_g2h->always_block_local_mode ||
+		     transport_g2h->get_local_cid() != VMADDR_CID_ANY)) {
+			mutex_unlock(&vsock_register_mutex);
+			return -EOPNOTSUPP;
+		}
+		mutex_unlock(&vsock_register_mutex);
 		mode = VSOCK_NET_MODE_LOCAL;
-	else
+	} else {
 		return -EINVAL;
+	}
 
 	if (!vsock_net_write_mode(net, mode))
 		return -EPERM;
@@ -2909,6 +2932,7 @@ int vsock_core_register(const struct vsock_transport *t, int features)
 {
 	const struct vsock_transport *t_h2g, *t_g2h, *t_dgram, *t_local;
 	int err = mutex_lock_interruptible(&vsock_register_mutex);
+	struct net *net;
 
 	if (err)
 		return err;
@@ -2931,6 +2955,21 @@ int vsock_core_register(const struct vsock_transport *t, int features)
 			err = -EBUSY;
 			goto err_busy;
 		}
+
+		/* G2H sockets break in LOCAL mode namespaces because G2H transports
+		 * don't support them yet. Block registering new G2H transports if we
+		 * already have local mode namespaces on the system.
+		 */
+		rcu_read_lock();
+		for_each_net_rcu(net) {
+			if (vsock_net_mode(net) == VSOCK_NET_MODE_LOCAL) {
+				rcu_read_unlock();
+				err = -EOPNOTSUPP;
+				goto err_busy;
+			}
+		}
+		rcu_read_unlock();
+
 		t_g2h = t;
 	}
 
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 432fcbbd14d4..ed48dd1ff19b 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -835,6 +835,7 @@ int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val)
 
 static struct vsock_transport hvs_transport = {
 	.module = THIS_MODULE,
+	.always_block_local_mode = true,
 
 	.get_local_cid = hvs_get_local_cid,
On Tue, Nov 11, 2025 at 10:54:50PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport is operational. Additionally, reject G2H transport registration if there already exists a namespace in local mode.
G2H sockets break in local mode because the G2H transports don't support namespacing yet. The current approach is to coerce packets coming out of G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible to coerce sockets in the same way because it cannot be deduced which transport will be used by the socket. Specifically, when bound to VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not until a packet is received and matched to the bound socket that we assign the transport. This presents a chicken-and-egg problem, because we need the namespace to lookup the socket and resolve the transport, but we need the transport to know how to use the namespace during lookup.
For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being used on systems that support G2H, even nested systems that also have H2G transports.
Local mode is blocked based on detecting the presence of G2H devices (when possible, as hyperv is special). This means that a host kernel with G2H support compiled in (or with the module loaded) will still support local mode, because no G2H device (e.g., virtio-vsock) is detected. This enables using the same kernel in the host and in the guest, as we do in kselftest.
Systems with only namespace-aware transports (vhost-vsock, loopback) can still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as intended.
The hyperv transport must be treated specially. Other G2H transports can report the presence of a device using get_local_cid(). When a device is present it returns a valid CID; otherwise, it returns VMADDR_CID_ANY. The hyperv transport's get_local_cid(), however, always returns VMADDR_CID_ANY, even when a device is present.
For that reason, this patch adds an always_block_local_mode flag to struct vsock_transport. When set to true, VSOCK_NET_MODE_LOCAL is blocked unconditionally whenever the transport is registered, regardless of device presence. When false, LOCAL mode is only blocked when get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
The hyperv transport sets this flag to true to unconditionally block local mode. Other G2H transports (virtio-vsock, vmci-vsock) leave it false and continue using device detection via get_local_cid() to block local mode.
These restrictions can be lifted in a future patch series when G2H transports gain namespace support.
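The blocking rule described in this commit message can be modeled outside the kernel as a small predicate. The sketch below is only an illustrative model, not kernel code: the function name local_mode_allowed and its three arguments are hypothetical, and VMADDR_CID_ANY is written as 4294967295, the unsigned 32-bit value of -1U.

```sh
# Model of the rule: LOCAL mode is refused when a G2H transport is
# registered and either always_block_local_mode is set or
# get_local_cid() reports a device (anything but VMADDR_CID_ANY).
VMADDR_CID_ANY=4294967295	# -1U as an unsigned 32-bit value

# local_mode_allowed G2H_REGISTERED ALWAYS_BLOCK LOCAL_CID
# (hypothetical helper; flags are 0 or 1)
# Returns 0 when LOCAL mode may be set, 1 when the write would be
# rejected with EOPNOTSUPP.
local_mode_allowed() {
	local g2h_registered="$1" always_block="$2" local_cid="$3"

	[ "$g2h_registered" = 1 ] || return 0	# no G2H transport at all
	[ "$always_block" = 1 ] && return 1	# e.g. hyperv
	[ "$local_cid" = "$VMADDR_CID_ANY" ]	# allowed only without a device
}
```

Under this model, a host with the virtio-vsock module loaded but no device (`local_mode_allowed 1 0 4294967295`) may still use local mode, while a guest that was assigned CID 3 (`local_mode_allowed 1 0 3`) may not.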
IMO this commit should be before supporting namespaces in any device, so we are sure we don't have commits where this can happen.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
include/net/af_vsock.h | 8 +++++++ net/vmw_vsock/af_vsock.c | 45 +++++++++++++++++++++++++++++++++++++--- net/vmw_vsock/hyperv_transport.c | 1 + 3 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index cfd121bb5ab7..089c61105dda 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -108,6 +108,14 @@ struct vsock_transport_send_notify_data {
struct vsock_transport { struct module *module;
+	/* If true, block VSOCK_NET_MODE_LOCAL unconditionally when this G2H
+	 * transport is registered. If false, only block LOCAL mode when
+	 * get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
+	 * Hyperv sets this true because it doesn't offer a callback that
+	 * detects device presence. This only applies to G2H transports; H2G
+	 * transports are unaffected.
+	 */
+	bool always_block_local_mode;
/* Initialize/tear-down socket. */ int (*init)(struct vsock_sock *, struct vsock_sock *);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index c0b5946bdc95..a2da1810b802 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -91,6 +91,11 @@
  * and locked down by a namespace manager. The default is "global". The mode
  * is set per-namespace.
  *
+ * Note: LOCAL mode is only supported when using namespace-aware transports
+ * (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
+ * hyperv-vsock, vmci-vsock) is loaded, attempts to set LOCAL mode will fail
+ * with EOPNOTSUPP, as these transports do not support per-namespace isolation.
+ *
  * The modes affect the allocation and accessibility of CIDs as follows:
  *
  * - global - access and allocation are all system-wide
@@ -2757,12 +2762,30 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write, if (*lenp >= sizeof(data)) return -EINVAL;
-	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
+	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) {
 		mode = VSOCK_NET_MODE_GLOBAL;
-	else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
+	} else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) {
+		/* LOCAL mode is not supported when G2H transports
+		 * (virtio-vsock, hyperv, vmci) are active, because
+		 * these transports don't support namespaces. We must
+		 * stay in GLOBAL mode to avoid bind/lookup mismatches.
+		 *
+		 * Check if G2H transport is present and either:
+		 * 1. Has always_block_local_mode set (hyperv), OR
+		 * 2. Has an actual device present (get_local_cid() != VMADDR_CID_ANY)
+		 */
+		mutex_lock(&vsock_register_mutex);
+		if (transport_g2h &&
+		    (transport_g2h->always_block_local_mode ||
+		     transport_g2h->get_local_cid() != VMADDR_CID_ANY)) {
This seems almost like a hack. What about adding a new function in the transports that tells us whether a device is present or not and implement it in all of them?
Or a more specific function to check if the transport can work with local mode (e.g. netns_local_aware() or something like that - I'm not great with naming xD)
+			mutex_unlock(&vsock_register_mutex);
+			return -EOPNOTSUPP;
+		}
+		mutex_unlock(&vsock_register_mutex);
What happens if the G2H transport is loaded here, just after we release the mutex?
I suspect there is a race here, which we could fix by postponing the unlock until after the vsock_net_write_mode() call.
WDYT?
 		mode = VSOCK_NET_MODE_LOCAL;
-	else
+	} else {
 		return -EINVAL;
+	}
 	if (!vsock_net_write_mode(net, mode))
 		return -EPERM;
@@ -2909,6 +2932,7 @@ int vsock_core_register(const struct vsock_transport *t, int features) { const struct vsock_transport *t_h2g, *t_g2h, *t_dgram, *t_local; int err = mutex_lock_interruptible(&vsock_register_mutex);
struct net *net;
if (err) return err;
@@ -2931,6 +2955,21 @@ int vsock_core_register(const struct vsock_transport *t, int features) err = -EBUSY; goto err_busy; }
+		/* G2H sockets break in LOCAL mode namespaces because G2H transports
+		 * don't support them yet. Block registering new G2H transports if we
+		 * already have local mode namespaces on the system.
+		 */
+		rcu_read_lock();
+		for_each_net_rcu(net) {
+			if (vsock_net_mode(net) == VSOCK_NET_MODE_LOCAL) {
+				rcu_read_unlock();
+				err = -EOPNOTSUPP;
+				goto err_busy;
+			}
+		}
+		rcu_read_unlock();
+
 		t_g2h = t;
 	}
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 432fcbbd14d4..ed48dd1ff19b 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -835,6 +835,7 @@ int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val)
static struct vsock_transport hvs_transport = { .module = THIS_MODULE,
.always_block_local_mode = true,
.get_local_cid = hvs_get_local_cid,
-- 2.47.3
On Wed, Nov 12, 2025 at 03:21:39PM +0100, Stefano Garzarella wrote:
On Tue, Nov 11, 2025 at 10:54:50PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Reject setting VSOCK_NET_MODE_LOCAL with -EOPNOTSUPP if a G2H transport is operational. Additionally, reject G2H transport registration if there already exists a namespace in local mode.
G2H sockets break in local mode because the G2H transports don't support namespacing yet. The current approach is to coerce packets coming out of G2H transports into VSOCK_NET_MODE_GLOBAL mode, but it is not possible to coerce sockets in the same way because it cannot be deduced which transport will be used by the socket. Specifically, when bound to VMADDR_CID_ANY in a nested VM (both G2H and H2G available), it is not until a packet is received and matched to the bound socket that we assign the transport. This presents a chicken-and-egg problem, because we need the namespace to lookup the socket and resolve the transport, but we need the transport to know how to use the namespace during lookup.
For that reason, this patch prevents VSOCK_NET_MODE_LOCAL from being used on systems that support G2H, even nested systems that also have H2G transports.
Local mode is blocked based on detecting the presence of G2H devices (when possible, as hyperv is special). This means that a host kernel with G2H support compiled in (or with the module loaded) will still support local mode, because no G2H device (e.g., virtio-vsock) is detected. This enables using the same kernel in the host and in the guest, as we do in kselftest.
Systems with only namespace-aware transports (vhost-vsock, loopback) can still use both VSOCK_NET_MODE_GLOBAL and VSOCK_NET_MODE_LOCAL modes as intended.
The hyperv transport must be treated specially. Other G2H transports can report the presence of a device using get_local_cid(). When a device is present it returns a valid CID; otherwise, it returns VMADDR_CID_ANY. The hyperv transport's get_local_cid(), however, always returns VMADDR_CID_ANY, even when a device is present.
For that reason, this patch adds an always_block_local_mode flag to struct vsock_transport. When set to true, VSOCK_NET_MODE_LOCAL is blocked unconditionally whenever the transport is registered, regardless of device presence. When false, LOCAL mode is only blocked when get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
The hyperv transport sets this flag to true to unconditionally block local mode. Other G2H transports (virtio-vsock, vmci-vsock) leave it false and continue using device detection via get_local_cid() to block local mode.
These restrictions can be lifted in a future patch series when G2H transports gain namespace support.
IMO this commit should be before supporting namespaces in any device, so we are sure we don't have commits where this can happen.
sgtm!
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
include/net/af_vsock.h | 8 +++++++ net/vmw_vsock/af_vsock.c | 45 +++++++++++++++++++++++++++++++++++++--- net/vmw_vsock/hyperv_transport.c | 1 + 3 files changed, 51 insertions(+), 3 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index cfd121bb5ab7..089c61105dda 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -108,6 +108,14 @@ struct vsock_transport_send_notify_data {
struct vsock_transport { struct module *module;
+	/* If true, block VSOCK_NET_MODE_LOCAL unconditionally when this G2H
+	 * transport is registered. If false, only block LOCAL mode when
+	 * get_local_cid() indicates a device is present (!= VMADDR_CID_ANY).
+	 * Hyperv sets this true because it doesn't offer a callback that
+	 * detects device presence. This only applies to G2H transports; H2G
+	 * transports are unaffected.
+	 */
+	bool always_block_local_mode;
/* Initialize/tear-down socket. */ int (*init)(struct vsock_sock *, struct vsock_sock *);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index c0b5946bdc95..a2da1810b802 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -91,6 +91,11 @@
  * and locked down by a namespace manager. The default is "global". The mode
  * is set per-namespace.
  *
+ * Note: LOCAL mode is only supported when using namespace-aware transports
+ * (vhost-vsock, loopback). If a guest-to-host transport (virtio-vsock,
+ * hyperv-vsock, vmci-vsock) is loaded, attempts to set LOCAL mode will fail
+ * with EOPNOTSUPP, as these transports do not support per-namespace isolation.
+ *
  * The modes affect the allocation and accessibility of CIDs as follows:
  *
  * - global - access and allocation are all system-wide
@@ -2757,12 +2762,30 @@ static int vsock_net_mode_string(const struct ctl_table *table, int write, if (*lenp >= sizeof(data)) return -EINVAL;
-	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data)))
+	if (!strncmp(data, VSOCK_NET_MODE_STR_GLOBAL, sizeof(data))) {
 		mode = VSOCK_NET_MODE_GLOBAL;
-	else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data)))
+	} else if (!strncmp(data, VSOCK_NET_MODE_STR_LOCAL, sizeof(data))) {
+		/* LOCAL mode is not supported when G2H transports
+		 * (virtio-vsock, hyperv, vmci) are active, because
+		 * these transports don't support namespaces. We must
+		 * stay in GLOBAL mode to avoid bind/lookup mismatches.
+		 *
+		 * Check if G2H transport is present and either:
+		 * 1. Has always_block_local_mode set (hyperv), OR
+		 * 2. Has an actual device present (get_local_cid() != VMADDR_CID_ANY)
+		 */
+		mutex_lock(&vsock_register_mutex);
+		if (transport_g2h &&
+		    (transport_g2h->always_block_local_mode ||
+		     transport_g2h->get_local_cid() != VMADDR_CID_ANY)) {
This seems almost like a hack. What about adding a new function in the transports that tells us whether a device is present or not and implement it in all of them?
Or a more specific function to check if the transport can work with local mode (e.g. netns_local_aware() or something like that - I'm not great with naming xD)
That sounds good to me, I probably prefer option 2 because I think it'll be simpler for the hyperv case.
+			mutex_unlock(&vsock_register_mutex);
+			return -EOPNOTSUPP;
+		}
+		mutex_unlock(&vsock_register_mutex);
What happens if the G2H transport is loaded here, just after we release the mutex?
I suspect there is a race here, which we could fix by postponing the unlock until after the vsock_net_write_mode() call.
WDYT?
Oh good eye, yeah I think you are right. Writing the net mode should definitely be in the critical section.
 		mode = VSOCK_NET_MODE_LOCAL;
-	else
+	} else {
 		return -EINVAL;
+	}
 	if (!vsock_net_write_mode(net, mode))
 		return -EPERM;
@@ -2909,6 +2932,7 @@ int vsock_core_register(const struct vsock_transport *t, int features) { const struct vsock_transport *t_h2g, *t_g2h, *t_dgram, *t_local; int err = mutex_lock_interruptible(&vsock_register_mutex);
struct net *net;
if (err) return err;
@@ -2931,6 +2955,21 @@ int vsock_core_register(const struct vsock_transport *t, int features) err = -EBUSY; goto err_busy; }
+		/* G2H sockets break in LOCAL mode namespaces because G2H transports
+		 * don't support them yet. Block registering new G2H transports if we
+		 * already have local mode namespaces on the system.
+		 */
+		rcu_read_lock();
+		for_each_net_rcu(net) {
+			if (vsock_net_mode(net) == VSOCK_NET_MODE_LOCAL) {
+				rcu_read_unlock();
+				err = -EOPNOTSUPP;
+				goto err_busy;
+			}
+		}
+		rcu_read_unlock();
+
 		t_g2h = t;
 	}
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 432fcbbd14d4..ed48dd1ff19b 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -835,6 +835,7 @@ int hvs_notify_set_rcvlowat(struct vsock_sock *vsk, int val)
static struct vsock_transport hvs_transport = { .module = THIS_MODULE,
.always_block_local_mode = true,
.get_local_cid = hvs_get_local_cid,
-- 2.47.3
From: Bobby Eshleman bobbyeshleman@meta.com
Add functions for initializing namespaces with the different vsock NS modes. Callers can use add_namespaces() and del_namespaces() to create namespaces global0, global1, local0, and local1.
The init_namespaces() function initializes global0, local0, etc... with their respective vsock NS mode. This function is separate so that tests that depend on this initialization can use it, while other tests that want to test the initialization interface itself can start with a clean slate by omitting this call.
Remove namespaces upon exiting the program in cleanup(). This is unlikely to be needed for a healthy run, but it is useful for tests that are manually killed mid-test. In that case, this patch prevents the subsequent test run from finding stale namespaces with already-write-once-locked vsock ns modes.
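As a rough illustration of the cleanup described above, the following sketch enumerates the four namespace names and deletes them while tolerating namespaces that do not exist. The names ns_names and del_all_namespaces are hypothetical, NS_MODES is written as a plain string rather than the bash array used in the patch so the sketch stays POSIX-compatible, and the trap line is an assumption about how such a cleanup might be wired up (the patch itself only calls del_namespaces from cleanup()).

```sh
NS_MODES="local global"

# Print the namespace names the helpers operate on, one per line:
# local0, local1, global0, global1.
ns_names() {
	local mode
	for mode in $NS_MODES; do
		printf '%s\n' "${mode}0" "${mode}1"
	done
}

# Delete every test namespace, ignoring ones that do not exist, so that a
# run killed mid-test does not leave namespaces whose ns_mode is already
# write-once-locked.
del_all_namespaces() {
	local ns
	for ns in $(ns_names); do
		ip netns del "$ns" >/dev/null 2>&1 || :
	done
}

# Possible wiring (assumption, left commented out on purpose):
# trap del_all_namespaces EXIT INT TERM
```

Deleting and re-adding a namespace is what gives the next run a namespace whose mode has not been written yet.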
This patch is in preparation for later namespace tests.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- tools/testing/selftests/vsock/vmtest.sh | 41 +++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index c7b270dd77a9..f78cc574c274 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -49,6 +49,7 @@ readonly TEST_DESCS=( )
readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback) +readonly NS_MODES=("local" "global")
VERBOSE=0
@@ -103,6 +104,45 @@ check_result() { fi }
+add_namespaces() { + # add namespaces local0, local1, global0, and global1 + for mode in "${NS_MODES[@]}"; do + ip netns add "${mode}0" 2>/dev/null + ip netns add "${mode}1" 2>/dev/null + done +} + +init_namespaces() { + for mode in "${NS_MODES[@]}"; do + ns_set_mode "${mode}0" "${mode}" + ns_set_mode "${mode}1" "${mode}" + + log_host "set ns ${mode}0 to mode ${mode}" + log_host "set ns ${mode}1 to mode ${mode}" + + # we need lo for qemu port forwarding + ip netns exec "${mode}0" ip link set dev lo up + ip netns exec "${mode}1" ip link set dev lo up + done +} + +del_namespaces() { + for mode in "${NS_MODES[@]}"; do + ip netns del "${mode}0" &>/dev/null + ip netns del "${mode}1" &>/dev/null + log_host "removed ns ${mode}0" + log_host "removed ns ${mode}1" + done +} + +ns_set_mode() { + local ns=$1 + local mode=$2 + + echo "${mode}" | ip netns exec "${ns}" \ + tee /proc/sys/net/vsock/ns_mode &>/dev/null +} + vm_ssh() { ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" return $? @@ -110,6 +150,7 @@ vm_ssh() {
cleanup() { terminate_pidfiles "${!PIDFILES[@]}" + del_namespaces }
check_args() {
From: Bobby Eshleman bobbyeshleman@meta.com
Add namespace support to vm management, ssh helpers, and vsock_test wrapper functions. This enables running VMs and test helpers in specific namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, though the caller must now pass "init_ns" explicitly.
No functional changes for existing tests. All have been updated to pass "init_ns" explicitly.
Affected functions (such as vm_start() and vm_ssh()) now wrap their commands with 'ip netns exec' when executing commands in non-init namespaces.
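The wrapping pattern can be sketched as a single helper. This is a minimal illustration, not the code from the patch: ns_run is a hypothetical name, and it forwards the command through "$@" instead of storing the prefix in a string variable as the patch does, which keeps arguments containing spaces intact.

```sh
# ns_run NS CMD [ARGS...]   (hypothetical helper)
# Run CMD directly when NS is "init_ns", otherwise run it inside the
# network namespace NS via 'ip netns exec'.
ns_run() {
	local ns="$1"
	shift

	if [ "$ns" = "init_ns" ]; then
		"$@"
	else
		ip netns exec "$ns" "$@"
	fi
}
```

With such a helper, a call like `ns_run "${ns}" ssh -q -p "${SSH_HOST_PORT}" localhost true` would behave the same for the init namespace and for a named one, apart from the namespace switch.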
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- tools/testing/selftests/vsock/vmtest.sh | 100 ++++++++++++++++++++++---------- 1 file changed, 68 insertions(+), 32 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index f78cc574c274..663be2da4e22 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -144,7 +144,18 @@ ns_set_mode() { }
vm_ssh() { - ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + local ns_exec + + if [[ "${1}" == init_ns ]]; then + ns_exec="" + else + ns_exec="ip netns exec ${1}" + fi + + shift + + ${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p "${SSH_HOST_PORT}" localhost "$@" + return $? }
@@ -267,10 +278,12 @@ terminate_pidfiles() {
vm_start() { local pidfile=$1 + local ns=$2 local logfile=/dev/null local verbose_opt="" local kernel_opt="" local qemu_opts="" + local ns_exec="" local qemu
qemu=$(command -v "${QEMU}") @@ -291,7 +304,11 @@ vm_start() { kernel_opt="${KERNEL_CHECKOUT}" fi
- vng \ + if [[ "${ns}" != "init_ns" ]]; then + ns_exec="ip netns exec ${ns}" + fi + + ${ns_exec} vng \ --run \ ${kernel_opt} \ ${verbose_opt} \ @@ -306,6 +323,7 @@ vm_start() { }
vm_wait_for_ssh() { + local ns=$1 local i
i=0 @@ -313,7 +331,8 @@ vm_wait_for_ssh() { if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then die "Timed out waiting for guest ssh" fi - if vm_ssh -- true; then + + if vm_ssh "${ns}" -- true; then break fi i=$(( i + 1 )) @@ -347,30 +366,40 @@ wait_for_listener() }
vm_wait_for_listener() { - local port=$1 + local ns=$1 + local port=$2
- vm_ssh <<EOF + vm_ssh "${ns}" <<EOF $(declare -f wait_for_listener) wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} EOF }
host_wait_for_listener() { - local port=$1 + local ns=$1 + local port=$2
- wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + if [[ "${ns}" == "init_ns" ]]; then + wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + else + ip netns exec "${ns}" bash <<-EOF + $(declare -f wait_for_listener) + wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} + EOF + fi }
vm_vsock_test() { - local host=$1 - local cid=$2 - local port=$3 + local ns=$1 + local host=$2 + local cid=$3 + local port=$4 local rc
# log output and use pipefail to respect vsock_test errors set -o pipefail if [[ "${host}" != server ]]; then - vm_ssh -- "${VSOCK_TEST}" \ + vm_ssh "${ns}" -- "${VSOCK_TEST}" \ --mode=client \ --control-host="${host}" \ --peer-cid="${cid}" \ @@ -378,7 +407,7 @@ vm_vsock_test() { 2>&1 | log_guest rc=$? else - vm_ssh -- "${VSOCK_TEST}" \ + vm_ssh "${ns}" -- "${VSOCK_TEST}" \ --mode=server \ --peer-cid="${cid}" \ --control-port="${port}" \ @@ -390,7 +419,7 @@ vm_vsock_test() { return $rc fi
- vm_wait_for_listener "${port}" + vm_wait_for_listener "${ns}" "${port}" rc=$? fi set +o pipefail @@ -399,22 +428,28 @@ vm_vsock_test() { }
host_vsock_test() { - local host=$1 - local cid=$2 - local port=$3 + local ns=$1 + local host=$2 + local cid=$3 + local port=$4 local rc
+ local cmd="${VSOCK_TEST}" + if [[ "${ns}" != "init_ns" ]]; then + cmd="ip netns exec ${ns} ${cmd}" + fi + # log output and use pipefail to respect vsock_test errors set -o pipefail if [[ "${host}" != server ]]; then - ${VSOCK_TEST} \ + ${cmd} \ --mode=client \ --peer-cid="${cid}" \ --control-host="${host}" \ --control-port="${port}" 2>&1 | log_host rc=$? else - ${VSOCK_TEST} \ + ${cmd} \ --mode=server \ --peer-cid="${cid}" \ --control-port="${port}" 2>&1 | log_host & @@ -425,7 +460,7 @@ host_vsock_test() { return $rc fi
- host_wait_for_listener "${port}" + host_wait_for_listener "${ns}" "${port}" rc=$? fi set +o pipefail @@ -469,11 +504,11 @@ log_guest() { }
test_vm_server_host_client() { - if ! vm_vsock_test "server" 2 "${TEST_GUEST_PORT}"; then + if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then return "${KSFT_FAIL}" fi
- if ! host_vsock_test "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then + if ! host_vsock_test "init_ns" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"; then return "${KSFT_FAIL}" fi
@@ -481,11 +516,11 @@ test_vm_server_host_client() { }
test_vm_client_host_server() { - if ! host_vsock_test "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then + if ! host_vsock_test "init_ns" "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}"; then return "${KSFT_FAIL}" fi
- if ! vm_vsock_test "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then + if ! vm_vsock_test "init_ns" "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}"; then return "${KSFT_FAIL}" fi
@@ -495,13 +530,14 @@ test_vm_client_host_server() { test_vm_loopback() { local port=60000 # non-forwarded local port
- vm_ssh -- modprobe vsock_loopback &> /dev/null || : + vm_ssh "init_ns" -- modprobe vsock_loopback &> /dev/null || :
- if ! vm_vsock_test "server" 1 "${port}"; then + if ! vm_vsock_test "init_ns" "server" 1 "${port}"; then return "${KSFT_FAIL}" fi
- if ! vm_vsock_test "127.0.0.1" 1 "${port}"; then + + if ! vm_vsock_test "init_ns" "127.0.0.1" 1 "${port}"; then return "${KSFT_FAIL}" fi
@@ -559,8 +595,8 @@ run_shared_vm_test() {
host_oops_cnt_before=$(dmesg | grep -c -i 'Oops') host_warn_cnt_before=$(dmesg --level=warn | grep -c -i 'vsock') - vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops') - vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock') + vm_oops_cnt_before=$(vm_ssh "init_ns" -- dmesg | grep -c -i 'Oops') + vm_warn_cnt_before=$(vm_ssh "init_ns" -- dmesg --level=warn | grep -c -i 'vsock')
name=$(echo "${1}" | awk '{ print $1 }') eval test_"${name}" @@ -578,13 +614,13 @@ run_shared_vm_test() { rc=$KSFT_FAIL fi
- vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l) + vm_oops_cnt_after=$(vm_ssh "init_ns" -- dmesg | grep -i 'Oops' | wc -l) if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then echo "FAIL: kernel oops detected on vm" | log_host rc=$KSFT_FAIL fi
- vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | grep -c -i 'vsock') + vm_warn_cnt_after=$(vm_ssh "init_ns" -- dmesg --level=warn | grep -c -i 'vsock') if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then echo "FAIL: kernel warning detected on vm" | log_host rc=$KSFT_FAIL @@ -630,8 +666,8 @@ cnt_total=0 if shared_vm_tests_requested "${ARGS[@]}"; then log_host "Booting up VM" pidfile="$(create_pidfile)" - vm_start "${pidfile}" - vm_wait_for_ssh + vm_start "${pidfile}" "init_ns" + vm_wait_for_ssh "init_ns" log_host "VM booted up"
run_shared_vm_tests "${ARGS[@]}"
On Tue, Nov 11, 2025 at 10:54:52PM -0800, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add namespace support to vm management, ssh helpers, and vsock_test wrapper functions. This enables running VMs and test helpers in specific namespaces, which is required for upcoming namespace isolation tests.
The functions still work correctly within the init ns, though the caller must now pass "init_ns" explicitly.
No functional changes for existing tests. All have been updated to pass "init_ns" explicitly.
Affected functions (such as vm_start() and vm_ssh()) now wrap their commands with 'ip netns exec' when executing commands in non-init namespaces.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
tools/testing/selftests/vsock/vmtest.sh | 100 ++++++++++++++++++++++---------- 1 file changed, 68 insertions(+), 32 deletions(-)
Reviewed-by: Stefano Garzarella sgarzare@redhat.com
-- 2.47.3
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests for the /proc/sys/net/vsock/ns_mode interface. Namely, that it accepts "global" and "local" strings and enforces a write-once policy.
Start a convention of commenting the test name over the test description. Add test name comments over test descriptions that existed before this convention.
Add a check_netns() function that checks if the test requires namespaces and if the current kernel supports namespaces. Skip tests that require namespaces if the system does not have namespace support.
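The skip rule described above can be tried outside the test script with a minimal sketch. The function name can_run_test and its second parameter are hypothetical; the path argument exists only so the check can be pointed at something other than /proc/self/ns when experimenting.

```shell
# Hypothetical stand-in for the check_netns() idea: tests named "ns_*"
# require namespace support, everything else may always run.
can_run_test() {
	tname=$1
	# Assumption: the presence of this path indicates namespace support.
	ns_path=${2:-/proc/self/ns}

	case "${tname}" in
	ns_*)
		# Namespace test: refuse to run when the path is missing.
		[ -e "${ns_path}" ] || return 1
		;;
	esac

	return 0
}
```

A caller would report a skip (KSFT_SKIP in the script) whenever can_run_test returns non-zero, instead of counting the test as a failure.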
Add a test to verify that guest VMs with an active G2H transport (virtio-vsock) cannot set namespace mode to 'local'. This validates the mutual exclusion between G2H transports and LOCAL mode.
This patch is the first to add tests that do *not* re-use the same shared VM. For that reason, it adds a run_tests() function to run these tests and filter out the shared VM tests.
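As a rough illustration of the filtering that run_tests() performs, the sketch below prints only the requested tests that do not use the shared VM. SHARED_VM_TESTS, uses_shared_vm and select_standalone_tests are hypothetical names chosen for this example; the script itself keeps the list in the USE_SHARED_VM array.

```shell
# Hypothetical copy of the shared-VM test list, as a space-separated string.
SHARED_VM_TESTS="vm_server_host_client vm_client_host_server vm_loopback ns_vm_local_mode_rejected"

# Succeed when the named test is one of the shared-VM tests.
uses_shared_vm() {
	for t in ${SHARED_VM_TESTS}; do
		if [ "${t}" = "$1" ]; then
			return 0
		fi
	done
	return 1
}

# Print, one per line, the requested tests that need their own setup.
select_standalone_tests() {
	for arg in "$@"; do
		if uses_shared_vm "${arg}"; then
			continue
		fi
		echo "${arg}"
	done
}
```

For example, `select_standalone_tests vm_loopback ns_host_vsock_ns_mode_ok` prints only the second name, because vm_loopback is handled by the shared-VM runner.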
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v9:
- add test ns_vm_local_mode_rejected to check that guests cannot use
  local mode
---
 tools/testing/selftests/vsock/vmtest.sh | 130 +++++++++++++++++++++++++++++++-
 1 file changed, 128 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 663be2da4e22..ef5f1d954f8b 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -41,14 +41,40 @@ readonly KERNEL_CMDLINE="\
 	virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \
 "
 readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log)
-readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly TEST_NAMES=(
+	vm_server_host_client
+	vm_client_host_server
+	vm_loopback
+	ns_host_vsock_ns_mode_ok
+	ns_host_vsock_ns_mode_write_once_ok
+	ns_vm_local_mode_rejected
+)
 readonly TEST_DESCS=(
+	# vm_server_host_client
 	"Run vsock_test in server mode on the VM and in client mode on the host."
+
+	# vm_client_host_server
 	"Run vsock_test in client mode on the VM and in server mode on the host."
+
+	# vm_loopback
 	"Run vsock_test using the loopback transport in the VM."
+
+	# ns_host_vsock_ns_mode_ok
+	"Check /proc/sys/net/vsock/ns_mode strings on the host."
+
+	# ns_host_vsock_ns_mode_write_once_ok
+	"Check /proc/sys/net/vsock/ns_mode is write-once on the host."
+
+	# ns_vm_local_mode_rejected
+	"Test that guest VM with G2H transport cannot set namespace mode to 'local'"
 )

-readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback)
+readonly USE_SHARED_VM=(
+	vm_server_host_client
+	vm_client_host_server
+	vm_loopback
+	ns_vm_local_mode_rejected
+)
 readonly NS_MODES=("local" "global")

 VERBOSE=0
@@ -205,6 +231,20 @@ check_deps() {
 	fi
 }

+check_netns() {
+	local tname=$1
+
+	# If the test requires NS support, check if NS support exists
+	# using /proc/self/ns
+	if [[ "${tname}" =~ ^ns_ ]] &&
+	   [[ ! -e /proc/self/ns ]]; then
+		log_host "No NS support detected for test ${tname}"
+		return 1
+	fi
+
+	return 0
+}
+
 check_vng() {
 	local tested_versions
 	local version
@@ -503,6 +543,43 @@ log_guest() {
 	LOG_PREFIX=guest log "$@"
 }

+test_ns_host_vsock_ns_mode_ok() {
+	add_namespaces
+
+	for mode in "${NS_MODES[@]}"; do
+		if ! ns_set_mode "${mode}0" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
+	done
+
+	del_namespaces
+
+	return "${KSFT_PASS}"
+}
+
+test_ns_host_vsock_ns_mode_write_once_ok() {
+	add_namespaces
+
+	for mode in "${NS_MODES[@]}"; do
+		local ns="${mode}0"
+		if ! ns_set_mode "${ns}" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
+
+		# try writing again and expect failure
+		if ns_set_mode "${ns}" "${mode}"; then
+			del_namespaces
+			return "${KSFT_FAIL}"
+		fi
+	done
+
+	del_namespaces
+
+	return "${KSFT_PASS}"
+}
+
 test_vm_server_host_client() {
 	if ! vm_vsock_test "init_ns" "server" 2 "${TEST_GUEST_PORT}"; then
 		return "${KSFT_FAIL}"
@@ -544,6 +621,26 @@ test_vm_loopback() {
 	return "${KSFT_PASS}"
 }

+test_ns_vm_local_mode_rejected() {
+	# Guest VMs have a G2H transport (virtio-vsock) active, so they
+	# should not be able to set namespace mode to 'local'.
+	# This test verifies that the sysctl write fails as expected.
+
+	# Try to set local mode in the guest's init_ns
+	if vm_ssh init_ns "echo local | tee /proc/sys/net/vsock/ns_mode &>/dev/null"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	# Verify mode is still 'global'
+	local mode
+	mode=$(vm_ssh init_ns "cat /proc/sys/net/vsock/ns_mode")
+	if [[ "${mode}" != "global" ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
 shared_vm_test() {
 	local tname

@@ -576,6 +673,11 @@ run_shared_vm_tests() {
 			continue
 		fi

+		if ! check_netns "${arg}"; then
+			check_result "${KSFT_SKIP}" "${arg}"
+			continue
+		fi
+
 		run_shared_vm_test "${arg}"
 		check_result "$?" "${arg}"
 	done
@@ -629,6 +731,28 @@ run_shared_vm_test() {
 	return "${rc}"
 }

+run_tests() {
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			continue
+		fi
+
+		if ! check_netns "${arg}"; then
+			check_result "${KSFT_SKIP}" "${arg}"
+			continue
+		fi
+
+		add_namespaces
+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		check_result $? "${name}"
+
+		del_namespaces
+	done
+}
+
 BUILD=0
 QEMU="qemu-system-$(uname -m)"

@@ -674,6 +798,8 @@ if shared_vm_tests_requested "${ARGS[@]}"; then
 	terminate_pidfiles "${pidfile}"
 fi

+run_tests "${ARGS[@]}"
+
 echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
 echo "Log: ${LOG}"
On Tue, Nov 11, 2025 at 10:54:53PM -0800, Bobby Eshleman wrote:
+run_tests() {
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			continue
+		fi
+
+		if ! check_netns "${arg}"; then
+			check_result "${KSFT_SKIP}" "${arg}"
+			continue
+		fi
+
+		add_namespaces

Some tests call this in the test function and some do not, but here we call it for all tests. I'm a bit confused.

Also, are we supposed to use this run_tests() only for namespace tests?

Thanks,
Stefano

+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		check_result $? "${name}"
+
+		del_namespaces
+	done
+}
-- 2.47.3
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to verify CID collision rules across different vsock namespace modes.
1. Two VMs with the same CID cannot start in different global namespaces
   (ns_global_same_cid_fails)
2. Two VMs with the same CID can start in different local namespaces
   (ns_local_same_cid_ok)
3. VMs with the same CID can coexist when one is in a global namespace
   and another is in a local namespace (ns_global_local_same_cid_ok and
   ns_local_global_same_cid_ok)
The tests ns_global_local_same_cid_ok and ns_local_global_same_cid_ok make sure that ordering does not matter.
The tests use a shared helper function namespaces_can_boot_same_cid() that attempts to start two VMs with identical CIDs in the specified namespaces and verifies whether VM initialization failed or succeeded.
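The collision rules listed in this message reduce to a small decision table. The sketch below encodes it; same_cid_boot_expected is a hypothetical helper, not part of vmtest.sh, and it assumes the two VMs are started in two different namespaces.

```shell
# Hypothetical helper: given the modes of two *different* namespaces,
# print whether booting a second VM with the same CID is expected to
# succeed ("ok") or fail ("fails").
same_cid_boot_expected() {
	mode0=$1
	mode1=$2

	# Global namespaces share one CID pool, so the second VM collides.
	if [ "${mode0}" = "global" ] && [ "${mode1}" = "global" ]; then
		echo "fails"
	else
		# At least one namespace is local and has its own CID space.
		echo "ok"
	fi
}
```

Each of the four tests in this patch corresponds to one row of that table, with the global/local and local/global rows checking that start order does not matter.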
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
 tools/testing/selftests/vsock/vmtest.sh | 73 +++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index ef5f1d954f8b..cc8dc280afdf 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -48,6 +48,10 @@ readonly TEST_NAMES=(
 	ns_host_vsock_ns_mode_ok
 	ns_host_vsock_ns_mode_write_once_ok
 	ns_vm_local_mode_rejected
+	ns_global_same_cid_fails
+	ns_local_same_cid_ok
+	ns_global_local_same_cid_ok
+	ns_local_global_same_cid_ok
 )
 readonly TEST_DESCS=(
 	# vm_server_host_client
@@ -67,6 +71,17 @@ readonly TEST_DESCS=(

 	# ns_vm_local_mode_rejected
 	"Test that guest VM with G2H transport cannot set namespace mode to 'local'"
+
+	# ns_global_same_cid_fails
+	"Check QEMU fails to start two VMs with same CID in two different global namespaces."
+
+	# ns_local_same_cid_ok
+	"Check QEMU successfully starts two VMs with same CID in two different local namespaces."
+
+	# ns_global_local_same_cid_ok
+	"Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
+
+	# ns_local_global_same_cid_ok
+	"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
 )

 readonly USE_SHARED_VM=(
@@ -558,6 +573,64 @@ test_ns_host_vsock_ns_mode_ok() {
 	return "${KSFT_PASS}"
 }

+namespaces_can_boot_same_cid() {
+	local ns0=$1
+	local ns1=$2
+	local pidfile1 pidfile2
+	local rc
+
+	pidfile1="$(create_pidfile)"
+	vm_start "${pidfile1}" "${ns0}"
+
+	pidfile2="$(create_pidfile)"
+	vm_start "${pidfile2}" "${ns1}"
+
+	rc=$?
+	terminate_pidfiles "${pidfile1}" "${pidfile2}"
+
+	return "${rc}"
+}
+
+test_ns_global_same_cid_fails() {
+	init_namespaces
+
+	if namespaces_can_boot_same_cid "global0" "global1"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_ns_local_global_same_cid_ok() {
+	init_namespaces
+
+	if namespaces_can_boot_same_cid "local0" "global0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_global_local_same_cid_ok() {
+	init_namespaces
+
+	if namespaces_can_boot_same_cid "global0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_local_same_cid_ok() {
+	init_namespaces
+
+	if namespaces_can_boot_same_cid "local0" "local1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
 test_ns_host_vsock_ns_mode_write_once_ok() {
 	add_namespaces

From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests to validate namespace correctness using vsock_test and socat. The vsock_test tool is used to validate expected success tests, but socat is used for expected failure tests. socat is used to ensure that connections are rejected outright instead of failing due to some other socket behavior (as tested in vsock_test). Additionally, socat is already required for tunneling TCP traffic from vsock_test. Using only one of the vsock_test tests like 'test_stream_client_close_client' would have yielded a similar result, but doing so wouldn't remove the socat dependency.
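The expected-failure tests all follow one pattern: a listener writes whatever it receives to a file, the client sends the marker string TEST, and the test passes only if the marker never arrived. A minimal sketch of that final check follows; connection_was_blocked and MARKER are hypothetical names for illustration.

```shell
# Marker string the client side sends, as in the tests in this patch.
MARKER="TEST"

# Hypothetical helper: succeed when the listener's output file does NOT
# contain exactly the marker, i.e. the connection never delivered data.
connection_was_blocked() {
	outfile=$1
	received=$(cat "${outfile}" 2>/dev/null)

	[ "${received}" != "${MARKER}" ]
}
```

An expected-success loopback test would use the inverse condition, passing only when the file holds the marker.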
Additionally, check for the dependency socat. socat needs special handling beyond just checking if it is on the path because it must be compiled with support for both vsock and unix. The function check_socat() checks that this support exists.
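The feature check can be illustrated with a small sketch that takes the version text as an argument, so it can be tried without socat installed. socat_supports is a hypothetical name; the "WITH_<FEATURE> 1" pattern is the one check_socat() matches against the output of `socat -V`.

```shell
# Hypothetical helper: succeed when the given `socat -V` text reports
# the named feature as compiled in ("WITH_<FEATURE> 1").
socat_supports() {
	version_text=$1
	feature=$2

	case "${version_text}" in
	*"WITH_${feature} 1"*)
		return 0
		;;
	esac

	return 1
}
```

On a system with socat, a call such as `socat_supports "$(socat -V)" VSOCK` would report whether vsock addresses are available.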
Add more padding to test name printf strings because the tests added in this patch would otherwise overflow.
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v9:
- consistent variable quoting
---
 tools/testing/selftests/vsock/vmtest.sh | 463 +++++++++++++++++++++++++++++++-
 1 file changed, 461 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index cc8dc280afdf..111059924287 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -7,6 +7,7 @@
 # * virtme-ng
 # * busybox-static (used by virtme-ng)
 # * qemu (used by virtme-ng)
+# * socat
 #
 # shellcheck disable=SC2317,SC2119

@@ -52,6 +53,19 @@ readonly TEST_NAMES=(
 	ns_local_same_cid_ok
 	ns_global_local_same_cid_ok
 	ns_local_global_same_cid_ok
+	ns_diff_global_host_connect_to_global_vm_ok
+	ns_diff_global_host_connect_to_local_vm_fails
+	ns_diff_global_vm_connect_to_global_host_ok
+	ns_diff_global_vm_connect_to_local_host_fails
+	ns_diff_local_host_connect_to_local_vm_fails
+	ns_diff_local_vm_connect_to_local_host_fails
+	ns_diff_global_to_local_loopback_local_fails
+	ns_diff_local_to_global_loopback_fails
+	ns_diff_local_to_local_loopback_fails
+	ns_diff_global_to_global_loopback_ok
+	ns_same_local_loopback_ok
+	ns_same_local_host_connect_to_local_vm_ok
+	ns_same_local_vm_connect_to_local_host_ok
 )
 readonly TEST_DESCS=(
 	# vm_server_host_client
@@ -82,6 +96,45 @@ readonly TEST_DESCS=(

 	# ns_local_global_same_cid_ok
 	"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
+
+	# ns_diff_global_host_connect_to_global_vm_ok
+	"Run vsock_test client in global ns with server in VM in another global ns."
+
+	# ns_diff_global_host_connect_to_local_vm_fails
+	"Run socat to test a process in a global ns fails to connect to a VM in a local ns."
+
+	# ns_diff_global_vm_connect_to_global_host_ok
+	"Run vsock_test client in VM in a global ns with server in another global ns."
+
+	# ns_diff_global_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
+
+	# ns_diff_local_host_connect_to_local_vm_fails
+	"Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
+
+	# ns_diff_local_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
+
+	# ns_diff_global_to_local_loopback_local_fails
+	"Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
+
+	# ns_diff_local_to_global_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
+
+	# ns_diff_local_to_local_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
+
+	# ns_diff_global_to_global_loopback_ok
+	"Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
+
+	# ns_same_local_loopback_ok
+	"Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
+
+	# ns_same_local_host_connect_to_local_vm_ok
+	"Run vsock_test client in a local ns with server in VM in same ns."
+
+	# ns_same_local_vm_connect_to_local_host_ok
+	"Run vsock_test client in VM in a local ns with server in same ns."
 )

 readonly USE_SHARED_VM=(
@@ -113,7 +166,7 @@ usage() {
 	for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
 		name=${TEST_NAMES[${i}]}
 		desc=${TEST_DESCS[${i}]}
-		printf "\t%-35s%-35s\n" "${name}" "${desc}"
+		printf "\t%-55s%-35s\n" "${name}" "${desc}"
 	done
 	echo

@@ -232,7 +285,7 @@ check_args() {
 }

 check_deps() {
-	for dep in vng ${QEMU} busybox pkill ssh; do
+	for dep in vng ${QEMU} busybox pkill ssh socat; do
 		if [[ ! -x $(command -v "${dep}") ]]; then
 			echo -e "skip: dependency ${dep} not found!\n"
 			exit "${KSFT_SKIP}"
@@ -283,6 +336,20 @@ check_vng() {
 	fi
 }

+check_socat() {
+	local support_string
+
+	support_string="$(socat -V)"
+
+	if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
+		die "err: socat is missing vsock support"
+	fi
+
+	if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
+		die "err: socat is missing unix support"
+	fi
+}
+
 handle_build() {
 	if [[ ! "${BUILD}" -eq 1 ]]; then
 		return
@@ -331,6 +398,14 @@ terminate_pidfiles() {
 	done
 }

+terminate_pids() {
+	local pid
+
+	for pid in "$@"; do
+		kill -SIGTERM "${pid}" &>/dev/null || :
+	done
+}
+
 vm_start() {
 	local pidfile=$1
 	local ns=$2
@@ -573,6 +648,389 @@ test_ns_host_vsock_ns_mode_ok() {
 	return "${KSFT_PASS}"
 }

+test_ns_diff_global_host_connect_to_global_vm_ok() {
+	local pids pid pidfile
+	local ns0 ns1 port
+	declare -a pids
+	local unixfile
+	ns0="global0"
+	ns1="global1"
+	port=1234
+	local rc
+
+	init_namespaces
+
+	pidfile="$(create_pidfile)"
+
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+	ip netns exec "${ns1}" \
+		socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
+			UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+	host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}"
+
+	ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+		TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
+	pids+=($!)
+
+	vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
+	vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns1}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+
+	for pid in "${pids[@]}"; do
+		if [[ "$(jobs -p)" = *"${pid}"* ]]; then
+			kill -SIGTERM "${pid}" &>/dev/null
+		fi
+	done
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ "${rc}" -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_host_connect_to_local_vm_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	init_namespaces
+
+	outfile=$(mktemp)
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns1}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+	terminate_pidfiles "${pidfile}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_vm_connect_to_global_host_ok() {
+	local ns0="global0"
+	local ns1="global1"
+	local port=12345
+	local unixfile
+	local pidfile
+	local pids
+
+	init_namespaces
+
+	declare -a pids
+
+	log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
+
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+
+	ip netns exec "${ns0}" \
+		socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+
+	ip netns exec "${ns1}" \
+		socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
+	pids+=($!)
+
+	log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
+	host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		terminate_pids "${pids[@]}"
+		rm -f "${unixfile}"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+	vm_vsock_test "${ns0}" "10.0.2.2" 2 "${port}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pids[@]}"
+	rm -f "${unixfile}"
+
+	if [[ ! $rc -eq 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+
+}
+
+test_ns_diff_global_vm_connect_to_local_host_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	init_namespaces
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+	pid=$!
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		terminate_pids "${pid}"
+		rm -f "${outfile}"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_host_connect_to_local_vm_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	init_namespaces
+
+	outfile=$(mktemp)
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns1}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns1})"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+
+	terminate_pidfiles "${pidfile}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_vm_connect_to_local_host_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+
+	init_namespaces
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+	pid=$!
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		rm -f "${outfile}"
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_wait_for_ssh "${ns0}"
+
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+__test_loopback_two_netns() {
+	local ns0=$1
+	local ns1=$2
+	local port=12345
+	local result
+	local pid
+
+	modprobe vsock_loopback &> /dev/null || :
+
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
+	pid=$!
+
+	log_host "Launching socat in ns ${ns0}"
+	echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
+	terminate_pids "${pid}"
+
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+
+	if [[ "${result}" == TEST ]]; then
+		return 0
+	fi
+
+	return 1
+}
+
+test_ns_diff_global_to_local_loopback_local_fails() {
+	init_namespaces
+
+	if ! __test_loopback_two_netns "global0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_global_loopback_fails() {
+	init_namespaces
+
+	if ! __test_loopback_two_netns "local0" "global0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_local_loopback_fails() {
+	init_namespaces
+
+	if ! __test_loopback_two_netns "local0" "local1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_to_global_loopback_ok() {
+	init_namespaces
+
+	if __test_loopback_two_netns "global0" "global1"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_loopback_ok() {
+	init_namespaces
+
+	if __test_loopback_two_netns "local0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+
+	return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_host_connect_to_local_vm_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+
+	init_namespaces
+
+	pidfile="$(create_pidfile)"
+
+	if ! vm_start "${pidfile}" "${ns}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
+test_ns_same_local_vm_connect_to_local_host_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+
+	init_namespaces
+
+	pidfile="$(create_pidfile)"
+
+	if ! vm_start "${pidfile}" "${ns}"; then
+		return "${KSFT_FAIL}"
+	fi
+
+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+
+	terminate_pidfiles "${pidfile}"
+
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+
+	return "${KSFT_PASS}"
+}
+
 namespaces_can_boot_same_cid() {
 	local ns0=$1
 	local ns1=$2
@@ -851,6 +1309,7 @@ fi
 check_args "${ARGS[@]}"
 check_deps
 check_vng
+check_socat
 handle_build

echo "1..${#ARGS[@]}"
On Tue, Nov 11, 2025 at 10:54:55PM -0800, Bobby Eshleman wrote:
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index cc8dc280afdf..111059924287 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -7,6 +7,7 @@
 # * virtme-ng
 # * busybox-static (used by virtme-ng)
 # * qemu (used by virtme-ng)
+# * socat
 #
 # shellcheck disable=SC2317,SC2119

@@ -52,6 +53,19 @@ readonly TEST_NAMES=(
 	ns_local_same_cid_ok
 	ns_global_local_same_cid_ok
 	ns_local_global_same_cid_ok
+	ns_diff_global_host_connect_to_global_vm_ok
+	ns_diff_global_host_connect_to_local_vm_fails
+	ns_diff_global_vm_connect_to_global_host_ok
+	ns_diff_global_vm_connect_to_local_host_fails
+	ns_diff_local_host_connect_to_local_vm_fails
+	ns_diff_local_vm_connect_to_local_host_fails
+	ns_diff_global_to_local_loopback_local_fails
+	ns_diff_local_to_global_loopback_fails
+	ns_diff_local_to_local_loopback_fails
+	ns_diff_global_to_global_loopback_ok
+	ns_same_local_loopback_ok
+	ns_same_local_host_connect_to_local_vm_ok
+	ns_same_local_vm_connect_to_local_host_ok
 )
 readonly TEST_DESCS=(
 	# vm_server_host_client
@@ -82,6 +96,45 @@ readonly TEST_DESCS=(

 	# ns_local_global_same_cid_ok
 	"Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
+
+	# ns_diff_global_host_connect_to_global_vm_ok
+	"Run vsock_test client in global ns with server in VM in another global ns."
+
+	# ns_diff_global_host_connect_to_local_vm_fails
+	"Run socat to test a process in a global ns fails to connect to a VM in a local ns."
+
+	# ns_diff_global_vm_connect_to_global_host_ok
+	"Run vsock_test client in VM in a global ns with server in another global ns."
+
+	# ns_diff_global_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
+
+	# ns_diff_local_host_connect_to_local_vm_fails
+	"Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
+
+	# ns_diff_local_vm_connect_to_local_host_fails
+	"Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
+
+	# ns_diff_global_to_local_loopback_local_fails
+	"Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
+
+	# ns_diff_local_to_global_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
+
+	# ns_diff_local_to_local_loopback_fails
+	"Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
+
+	# ns_diff_global_to_global_loopback_ok
+	"Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
+
+	# ns_same_local_loopback_ok
+	"Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
+
+	# ns_same_local_host_connect_to_local_vm_ok
+	"Run vsock_test client in a local ns with server in VM in same ns."
+
+	# ns_same_local_vm_connect_to_local_host_ok
+	"Run vsock_test client in VM in a local ns with server in same ns."
 )

 readonly USE_SHARED_VM=(
@@ -113,7 +166,7 @@ usage() {
 	for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
 		name=${TEST_NAMES[${i}]}
 		desc=${TEST_DESCS[${i}]}
-		printf "\t%-35s%-35s\n" "${name}" "${desc}"
+		printf "\t%-55s%-35s\n" "${name}" "${desc}"
 	done
 	echo
@@ -232,7 +285,7 @@ check_args() {
 }
 check_deps() {
-	for dep in vng ${QEMU} busybox pkill ssh; do
+	for dep in vng ${QEMU} busybox pkill ssh socat; do
 		if [[ ! -x $(command -v "${dep}") ]]; then
 			echo -e "skip: dependency ${dep} not found!\n"
 			exit "${KSFT_SKIP}"
@@ -283,6 +336,20 @@ check_vng() {
 	fi
 }
+check_socat() {
+	local support_string
+
+	support_string="$(socat -V)"
+
+	if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
+		die "err: socat is missing vsock support"
+	fi
+
+	if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
+		die "err: socat is missing unix support"
+	fi
+}
+
 handle_build() {
 	if [[ ! "${BUILD}" -eq 1 ]]; then
 		return
@@ -331,6 +398,14 @@ terminate_pidfiles() {
 	done
 }
+terminate_pids() {
+	local pid
+
+	for pid in "$@"; do
+		kill -SIGTERM "${pid}" &>/dev/null || :
+	done
+}
+
 vm_start() {
 	local pidfile=$1
 	local ns=$2
@@ -573,6 +648,389 @@ test_ns_host_vsock_ns_mode_ok() {
 	return "${KSFT_PASS}"
 }
+test_ns_diff_global_host_connect_to_global_vm_ok() {
+	local pids pid pidfile
+	local ns0 ns1 port
+	declare -a pids
+	local unixfile
+	ns0="global0"
+	ns1="global1"
+	port=1234
+	local rc
+	init_namespaces
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		return "${KSFT_FAIL}"
+	fi
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+	ip netns exec "${ns1}" \
+		socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \
+			UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+	host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}"
+	ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \
+		TCP-CONNECT:localhost:"${TEST_HOST_PORT}" &
+	pids+=($!)
+	vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}"
+	vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns1}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+	for pid in "${pids[@]}"; do
+		if [[ "$(jobs -p)" = *"${pid}"* ]]; then
+			kill -SIGTERM "${pid}" &>/dev/null
+		fi
+	done
In run_shared_vm_test() we also check for oops and warnings in both the host and the VM. Should we do the same here, in each non-shared test that boots a VM?
I mean, should we generalize run_shared_vm_test() and use it for both kinds of tests?
Stefano
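A generalized check along those lines might look like the sketch below. It is untested against vmtest.sh and only shows the shape: read_klog, klog_count and run_with_klog_check are hypothetical names, and for the guest side read_klog would have to be replaced by something that runs dmesg inside the VM (for example through vm_ssh).

```shell
# Hypothetical sketch; none of these helpers exist in vmtest.sh.
KSFT_PASS=0
KSFT_FAIL=1

# Source of kernel log text. Defaults to the host's dmesg; override this
# function to read the guest log instead.
read_klog() {
	dmesg 2>/dev/null
}

# Count case-insensitive matches of a pattern in the kernel log.
klog_count() {
	local pattern=$1

	# grep -c prints 0 and exits non-zero when nothing matches,
	# so ignore the exit status.
	read_klog | grep -c -i "${pattern}" || :
}

# Run a test function and fail it if new Oops or WARNING lines appeared
# in the kernel log while it ran.
run_with_klog_check() {
	local testfn=$1
	local oops_before warn_before oops_after warn_after
	local rc

	oops_before=$(klog_count 'Oops')
	warn_before=$(klog_count 'WARNING:')

	"${testfn}"
	rc=$?

	oops_after=$(klog_count 'Oops')
	warn_after=$(klog_count 'WARNING:')

	if [ "${oops_after}" -gt "${oops_before}" ]; then
		echo "FAIL: kernel oops detected during ${testfn}"
		rc="${KSFT_FAIL}"
	fi

	if [ "${warn_after}" -gt "${warn_before}" ]; then
		echo "FAIL: kernel warning detected during ${testfn}"
		rc="${KSFT_FAIL}"
	fi

	return "${rc}"
}
```

Each non-shared test could then be invoked as run_with_klog_check "test_${name}", which keeps the oops/warn accounting in one place for both kinds of tests.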
+	terminate_pidfiles "${pidfile}"
+	if [[ "${rc}" -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+	return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_host_connect_to_local_vm_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+	init_namespaces
+	outfile=$(mktemp)
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns1}"; then
+		log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})"
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+	terminate_pidfiles "${pidfile}"
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_vm_connect_to_global_host_ok() {
+	local ns0="global0"
+	local ns1="global1"
+	local port=12345
+	local unixfile
+	local pidfile
+	local pids
+	init_namespaces
+	declare -a pids
+	log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}"
+	unixfile=$(mktemp -u /tmp/XXXX.sock)
+	ip netns exec "${ns0}" \
+		socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" &
+	pids+=($!)
+	ip netns exec "${ns1}" \
+		socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" &
+	pids+=($!)
+	log_host "Launching ${VSOCK_TEST} in ns ${ns1}"
+	host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}"
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
+		terminate_pids "${pids[@]}"
+		rm -f "${unixfile}"
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns0}"
+	vm_vsock_test "${ns0}" "10.0.2.2" 2 "${port}"
+	rc=$?
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pids[@]}"
+	rm -f "${unixfile}"
+	if [[ ! $rc -eq 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+	return "${KSFT_PASS}"
+}
+
+test_ns_diff_global_vm_connect_to_local_host_fails() {
+	local ns0="global0"
+	local ns1="local0"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+	init_namespaces
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+	pid=$!
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
+		terminate_pids "${pid}"
+		rm -f "${outfile}"
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns0}"
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_host_connect_to_local_vm_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+	init_namespaces
+	outfile=$(mktemp)
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns1}"; then
+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns1}"
+	vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" &
+	echo TEST | ip netns exec "${ns0}" \
+		socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null
+	terminate_pidfiles "${pidfile}"
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_vm_connect_to_local_host_fails() {
+	local ns0="local0"
+	local ns1="local1"
+	local port=12345
+	local pidfile
+	local result
+	local pid
+	init_namespaces
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" &
+	pid=$!
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		log_host "failed to start vm (cid=${cid}, ns=${ns0})"
+		rm -f "${outfile}"
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns0}"
+	vm_ssh "${ns0}" -- \
+		bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pid}"
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+	if [[ "${result}" != TEST ]]; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+__test_loopback_two_netns() {
+	local ns0=$1
+	local ns1=$2
+	local port=12345
+	local result
+	local pid
+	modprobe vsock_loopback &> /dev/null || :
+	log_host "Launching socat in ns ${ns1}"
+	outfile=$(mktemp)
+	ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null &
+	pid=$!
+	log_host "Launching socat in ns ${ns0}"
+	echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null
+	terminate_pids "${pid}"
+	result=$(cat "${outfile}")
+	rm -f "${outfile}"
+	if [[ "${result}" == TEST ]]; then
+		return 0
+	fi
+	return 1
+}
+
+test_ns_diff_global_to_local_loopback_local_fails() {
+	init_namespaces
+	if ! __test_loopback_two_netns "global0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_global_loopback_fails() {
+	init_namespaces
+	if ! __test_loopback_two_netns "local0" "global0"; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_local_to_local_loopback_fails() {
+	init_namespaces
+	if ! __test_loopback_two_netns "local0" "local1"; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_diff_global_to_global_loopback_ok() {
+	init_namespaces
+	if __test_loopback_two_netns "global0" "global1"; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_loopback_ok() {
+	init_namespaces
+	if __test_loopback_two_netns "local0" "local0"; then
+		return "${KSFT_PASS}"
+	fi
+	return "${KSFT_FAIL}"
+}
+
+test_ns_same_local_host_connect_to_local_vm_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+	init_namespaces
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns}"; then
+		return "${KSFT_FAIL}"
+	fi
+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+	terminate_pidfiles "${pidfile}"
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+	return "${KSFT_PASS}"
+}
+
+test_ns_same_local_vm_connect_to_local_host_ok() {
+	local ns="local0"
+	local port=1234
+	local pidfile
+	local rc
+	init_namespaces
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns}"; then
+		return "${KSFT_FAIL}"
+	fi
+	vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}"
+	host_vsock_test "${ns}" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}"
+	rc=$?
+	terminate_pidfiles "${pidfile}"
+	if [[ $rc -ne 0 ]]; then
+		return "${KSFT_FAIL}"
+	fi
+	return "${KSFT_PASS}"
+}
+
 namespaces_can_boot_same_cid() {
 	local ns0=$1
 	local ns1=$2
@@ -851,6 +1309,7 @@ fi
 check_args "${ARGS[@]}"
 check_deps
 check_vng
+check_socat
 handle_build
 echo "1..${#ARGS[@]}"
-- 
2.47.3
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add tests that validate vsock sockets are resilient to deleting namespaces or changing namespace modes from global to local. The vsock sockets should still function normally.
The function check_ns_changes_dont_break_connection() is added to reuse the step-by-step logic of 1) set up connections, 2) do something that might break the connections, 3) check that the connections are still OK.
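As a rough illustration of that three-step shape (not the code in this patch), the sketch below uses a FIFO as a stand-in for the vsock connection and unlinking the FIFO's path as a stand-in for the disruptive step. The function names check_change_keeps_connection and unlink_path are hypothetical; the only point is that the disruptive step is passed in as a callback while setup and verification stay shared.

```shell
# Hypothetical sketch: a FIFO stands in for the vsock connection.

# Disruptive step used in the example: remove the FIFO's path while both
# ends already hold it open.
unlink_path() {
	rm -f "$1"
}

check_change_keeps_connection() {
	local perturb=$1
	local dir fifo outfile reader rc

	dir=$(mktemp -d)
	fifo="${dir}/pipe"
	outfile="${dir}/out"
	mkfifo "${fifo}"

	# 1) Set up the connection: a background reader and a writer fd.
	#    Opening the write end blocks until the reader has opened the FIFO.
	cat "${fifo}" > "${outfile}" &
	reader=$!
	exec 9> "${fifo}"

	# 2) Do something that might break the connection.
	"${perturb}" "${fifo}"

	# 3) Check that data still flows over the established connection.
	echo TEST >&9
	exec 9>&-
	wait "${reader}"

	if grep -q TEST "${outfile}"; then
		rc=0
	else
		rc=1
	fi

	rm -rf "${dir}"
	return "${rc}"
}

check_change_keeps_connection unlink_path && echo "connection survived"
```

Passing the disruptive step as an argument mirrors how the six new tests differ only in which namespace is touched and whether it is deleted or switched to local mode.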
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v9:
- more consistent shell style
- clarify -u usage comment for pipefile
---
 tools/testing/selftests/vsock/vmtest.sh | 124 ++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
index 111059924287..4caa7d47f407 100755
--- a/tools/testing/selftests/vsock/vmtest.sh
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -66,6 +66,12 @@ readonly TEST_NAMES=(
 	ns_same_local_loopback_ok
 	ns_same_local_host_connect_to_local_vm_ok
 	ns_same_local_vm_connect_to_local_host_ok
+	ns_mode_change_connection_continue_vm_ok
+	ns_mode_change_connection_continue_host_ok
+	ns_mode_change_connection_continue_both_ok
+	ns_delete_vm_ok
+	ns_delete_host_ok
+	ns_delete_both_ok
 )
 readonly TEST_DESCS=(
 	# vm_server_host_client
@@ -135,6 +141,24 @@ readonly TEST_DESCS=(
 	# ns_same_local_vm_connect_to_local_host_ok
 	"Run vsock_test client in VM in a local ns with server in same ns."
+
+	# ns_mode_change_connection_continue_vm_ok
+	"Check that changing NS mode of VM namespace from global to local after a connection is established doesn't break the connection"
+
+	# ns_mode_change_connection_continue_host_ok
+	"Check that changing NS mode of host namespace from global to local after a connection is established doesn't break the connection"
+
+	# ns_mode_change_connection_continue_both_ok
+	"Check that changing NS mode of host and VM namespaces from global to local after a connection is established doesn't break the connection"
+
+	# ns_delete_vm_ok
+	"Check that deleting the VM's namespace does not break the socket connection"
+
+	# ns_delete_host_ok
+	"Check that deleting the host's namespace does not break the socket connection"
+
+	# ns_delete_both_ok
+	"Check that deleting the VM and host's namespaces does not break the socket connection"
 )
 readonly USE_SHARED_VM=(
@@ -1172,6 +1196,106 @@ test_ns_vm_local_mode_rejected() {
 	return "${KSFT_PASS}"
 }
+check_ns_changes_dont_break_connection() {
+	local ns0="global0"
+	local ns1="global1"
+	local port=12345
+	local pidfile
+	local outfile
+	local pids=()
+	local rc=0
+
+	init_namespaces
+
+	pidfile="$(create_pidfile)"
+	if ! vm_start "${pidfile}" "${ns0}"; then
+		return "${KSFT_FAIL}"
+	fi
+	vm_wait_for_ssh "${ns0}"
+
+	outfile=$(mktemp)
+	vm_ssh "${ns0}" -- \
+		socat VSOCK-LISTEN:"${port}",fork STDOUT > "${outfile}" 2>/dev/null &
+	pids+=($!)
+
+	# wait_for_listener() does not work for vsock because vsock does not
+	# export socket state to /proc/net/. Instead, we have no choice but to
+	# sleep for some hardcoded time.
+	sleep "${WAIT_PERIOD}"
+
+	# We use a pipe here so that we can echo into the pipe instead of using
+	# socat and a unix socket file. We just need a name for the pipe (not a
+	# regular file) so use -u.
+	local pipefile=$(mktemp -u /tmp/vmtest_pipe_XXXX)
+	ip netns exec "${ns1}" \
+		socat PIPE:"${pipefile}" VSOCK-CONNECT:"${VSOCK_CID}":"${port}" &
+	pids+=($!)
+
+	timeout "${WAIT_PERIOD}" \
+		bash -c 'while [[ ! -e '"${pipefile}"' ]]; do sleep 1; done; exit 0'
+
+	if [[ $2 == "delete" ]]; then
+		if [[ "$1" == "vm" ]]; then
+			ip netns del "${ns0}"
+		elif [[ "$1" == "host" ]]; then
+			ip netns del "${ns1}"
+		elif [[ "$1" == "both" ]]; then
+			ip netns del "${ns0}"
+			ip netns del "${ns1}"
+		fi
+	elif [[ $2 == "change_mode" ]]; then
+		if [[ "$1" == "vm" ]]; then
+			ns_set_mode "${ns0}" "local"
+		elif [[ "$1" == "host" ]]; then
+			ns_set_mode "${ns1}" "local"
+		elif [[ "$1" == "both" ]]; then
+			ns_set_mode "${ns0}" "local"
+			ns_set_mode "${ns1}" "local"
+		fi
+	fi
+
+	echo "TEST" > "${pipefile}"
+
+	timeout "${WAIT_PERIOD}" \
+		bash -c 'while [[ ! -s '"${outfile}"' ]]; do sleep 1; done; exit 0'
+
+	if grep -q "TEST" "${outfile}"; then
+		rc="${KSFT_PASS}"
+	else
+		rc="${KSFT_FAIL}"
+	fi
+
+	terminate_pidfiles "${pidfile}"
+	terminate_pids "${pids[@]}"
+	rm -f "${outfile}"
+
+	return "${rc}"
+}
+
+test_ns_mode_change_connection_continue_vm_ok() {
+	check_ns_changes_dont_break_connection "vm" "change_mode"
+}
+
+test_ns_mode_change_connection_continue_host_ok() {
+	check_ns_changes_dont_break_connection "host" "change_mode"
+}
+
+test_ns_mode_change_connection_continue_both_ok() {
+	check_ns_changes_dont_break_connection "both" "change_mode"
+}
+
+test_ns_delete_vm_ok() {
+	check_ns_changes_dont_break_connection "vm" "delete"
+}
+
+test_ns_delete_host_ok() {
+	check_ns_changes_dont_break_connection "host" "delete"
+}
+
+test_ns_delete_both_ok() {
+	check_ns_changes_dont_break_connection "both" "delete"
+}
+
 shared_vm_test() {
 	local tname
linux-kselftest-mirror@lists.linaro.org