This series adds namespace support to vhost-vsock. It does not add namespaces to any of the guest transports (virtio-vsock, hyperv, or vmci).
The current revision supports only two modes: local and global. Local mode completely isolates a namespace, while global mode shares CIDs across all global-mode namespaces (the original behavior).
Future work may include a mixed mode, which I expect to be more complicated because socket lookups will need new logic and API changes to behave differently depending on whether the lookup is part of a mixed mode CID allocation, a global CID allocation, a mixed-to-global connection (allowed), or a global-to-mixed connection (not allowed).
Modes are per-netns and write-once. This allows a system to configure namespaces independently (some may share CIDs, others are completely isolated). This also supports future mixed use cases, where there may be namespaces in global mode spinning up VMs while there are mixed mode namespaces that provide services to the VMs, but are not allowed to allocate from the global CID pool.
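The mode semantics above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `ns_model` type and its helpers are hypothetical names that exist nowhere in the series, and the model ignores locking entirely.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the per-netns mode; none of these
 * names exist in the kernel.
 */
enum ns_mode { NS_MODE_GLOBAL = 1, NS_MODE_LOCAL = 2 };

struct ns_model {
	enum ns_mode mode;
	bool written;	/* set after the first successful mode write */
};

/* New namespaces start in global mode, the original behavior. */
static void ns_model_init(struct ns_model *ns)
{
	ns->mode = NS_MODE_GLOBAL;
	ns->written = false;
}

/* Write-once: the first write wins, later writes are rejected. */
static int ns_model_set_mode(struct ns_model *ns, enum ns_mode mode)
{
	if (ns->written)
		return -1;
	ns->mode = mode;
	ns->written = true;
	return 0;
}

/* Two endpoints may reach each other if they share a namespace, or if
 * both namespaces are in global mode.
 */
static bool ns_model_can_reach(const struct ns_model *a,
			       const struct ns_model *b)
{
	return a == b ||
	       (a->mode == NS_MODE_GLOBAL && b->mode == NS_MODE_GLOBAL);
}
```

In this model a local-mode namespace can reach only itself, and two distinct namespaces can reach each other only when both are global, which matches the local/global description in this cover letter.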
Thanks again for everyone's help and reviews!
Signed-off-by: Bobby Eshleman bobbyeshleman@gmail.com
To: Stefano Garzarella sgarzare@redhat.com
To: Shuah Khan shuah@kernel.org
To: David S. Miller davem@davemloft.net
To: Eric Dumazet edumazet@google.com
To: Jakub Kicinski kuba@kernel.org
To: Paolo Abeni pabeni@redhat.com
To: Simon Horman horms@kernel.org
To: Stefan Hajnoczi stefanha@redhat.com
To: Michael S. Tsirkin mst@redhat.com
To: Jason Wang jasowang@redhat.com
To: Xuan Zhuo xuanzhuo@linux.alibaba.com
To: Eugenio Pérez eperezma@redhat.com
To: K. Y. Srinivasan kys@microsoft.com
To: Haiyang Zhang haiyangz@microsoft.com
To: Wei Liu wei.liu@kernel.org
To: Dexuan Cui decui@microsoft.com
To: Bryan Tan bryan-bt.tan@broadcom.com
To: Vishnu Dasa vishnu.dasa@broadcom.com
To: Broadcom internal kernel review list bcm-kernel-feedback-list@broadcom.com
Cc: virtualization@lists.linux.dev
Cc: netdev@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: berrange@redhat.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context in commit message)
- lots of cleanup

Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (12):
      vsock: a per-net vsock NS mode state
      vsock: add net to vsock skb cb
      vsock: add netns to af_vsock core
      vsock/virtio: add netns to virtio transport common
      vhost/vsock: add netns support
      vsock/virtio: use the global netns
      hv_sock: add netns hooks
      vsock/vmci: add netns hooks
      vsock/loopback: add netns support
      selftests/vsock: improve logging in vmtest.sh
      selftests/vsock: invoke vsock_test through helpers
      selftests/vsock: add namespace tests

 MAINTAINERS                             |    1 +
 drivers/vhost/vsock.c                   |   48 +-
 include/linux/virtio_vsock.h            |   12 +
 include/net/af_vsock.h                  |   59 +-
 include/net/net_namespace.h             |    4 +
 include/net/netns/vsock.h               |   21 +
 net/vmw_vsock/af_vsock.c                |  204 +++++-
 net/vmw_vsock/hyperv_transport.c        |    2 +-
 net/vmw_vsock/virtio_transport.c        |    5 +-
 net/vmw_vsock/virtio_transport_common.c |   14 +-
 net/vmw_vsock/vmci_transport.c          |    4 +-
 net/vmw_vsock/vsock_loopback.c          |   59 +-
 tools/testing/selftests/vsock/vmtest.sh | 1088 ++++++++++++++++++++++++++-----
 13 files changed, 1330 insertions(+), 191 deletions(-)
---
base-commit: dd500e4aecf25e48e874ca7628697969df679493
change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
From: Bobby Eshleman bobbyeshleman@meta.com
Add the per-net vsock NS mode state. This only adds the structure that holds the mode and some of the definitions; it does not integrate the functionality yet.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- MAINTAINERS | 1 + include/net/af_vsock.h | 42 ++++++++++++++++++++++++++++++++++++++++++ include/net/net_namespace.h | 4 ++++ include/net/netns/vsock.h | 18 ++++++++++++++++++ 4 files changed, 65 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS index 1bc1698bc5ae..76905fc1c1d3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -26208,6 +26208,7 @@ L: netdev@vger.kernel.org S: Maintained F: drivers/vhost/vsock.c F: include/linux/virtio_vsock.h +F: include/net/netns/vsock.h F: include/uapi/linux/virtio_vsock.h F: net/vmw_vsock/virtio_transport.c F: net/vmw_vsock/virtio_transport_common.c diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index d40e978126e3..d34bf7dbc69a 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -10,6 +10,7 @@
#include <linux/kernel.h> #include <linux/workqueue.h> +#include <net/netns/vsock.h> #include <net/sock.h> #include <uapi/linux/vm_sockets.h>
@@ -256,4 +257,45 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t) { return t->msgzerocopy_allow && t->msgzerocopy_allow(); } + +static inline u8 vsock_net_mode(struct net *net) +{ + u8 ret; + + spin_lock_bh(&net->vsock.lock); + ret = net->vsock.ns_mode; + spin_unlock_bh(&net->vsock.lock); + return ret; +} + +static inline void vsock_net_set_mode(struct net *net, u8 mode) +{ + spin_lock_bh(&net->vsock.lock); + net->vsock.ns_mode = mode; + net->vsock.written = true; + spin_unlock_bh(&net->vsock.lock); +} + +/* Return true if the mode has not been written yet and may still be set. Otherwise, return false. */ +static inline bool vsock_net_mode_can_set(struct net *net) +{ + bool ret; + + spin_lock_bh(&net->vsock.lock); + ret = !net->vsock.written; + spin_unlock_bh(&net->vsock.lock); + + return ret; +} + +/* Return true if vsock net mode check passes. Otherwise, return false. + * + * Read more about modes in comment header of net/vmw_vsock/af_vsock.c. + */ +static inline bool vsock_net_check_mode(struct net *n1, struct net *n2) +{ + return net_eq(n1, n2) || + (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL && + vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL); +} #endif /* __AF_VSOCK_H__ */ diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h index 025a7574b275..005c0da4fb62 100644 --- a/include/net/net_namespace.h +++ b/include/net/net_namespace.h @@ -37,6 +37,7 @@ #include <net/netns/smc.h> #include <net/netns/bpf.h> #include <net/netns/mctp.h> +#include <net/netns/vsock.h> #include <net/net_trackers.h> #include <linux/ns_common.h> #include <linux/idr.h> @@ -196,6 +197,9 @@ struct net { /* Move to a better place when the config guard is removed. */ struct mutex rtnl_mutex; #endif +#if IS_ENABLED(CONFIG_VSOCKETS) + struct netns_vsock vsock; +#endif } __randomize_layout;
#include <linux/seq_file_net.h> diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h new file mode 100644 index 000000000000..0bad4652815c --- /dev/null +++ b/include/net/netns/vsock.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __NET_NET_NAMESPACE_VSOCK_H +#define __NET_NET_NAMESPACE_VSOCK_H + +#include <linux/types.h> + +#define VSOCK_NET_MODE_GLOBAL 1 +#define VSOCK_NET_MODE_LOCAL (1 << 1) + +struct netns_vsock { + struct ctl_table_header *vsock_hdr; + spinlock_t lock; + + /* protected by lock */ + u8 ns_mode; + bool written; +}; +#endif /* __NET_NET_NAMESPACE_VSOCK_H */
From: Bobby Eshleman bobbyeshleman@meta.com
Add a net pointer to the vsock skb and helpers for getting/setting it. This is in preparation for adding vsock NS support.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- include/linux/virtio_vsock.h | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 36fb3edfa403..93edc1e798a5 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -13,6 +13,7 @@ struct virtio_vsock_skb_cb { bool reply; bool tap_delivered; u32 offset; + struct net *net; };
#define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb)) @@ -111,6 +112,16 @@ static inline size_t virtio_vsock_skb_len(struct sk_buff *skb) return (size_t)(skb_end_pointer(skb) - skb->head); }
+static inline struct net *virtio_vsock_skb_net(struct sk_buff *skb) +{ + return VIRTIO_VSOCK_SKB_CB(skb)->net; +} + +static inline void virtio_vsock_skb_set_net(struct sk_buff *skb, struct net *net) +{ + VIRTIO_VSOCK_SKB_CB(skb)->net = net; +} + #define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE (1024 * 4) #define VIRTIO_VSOCK_MAX_BUF_SIZE 0xFFFFFFFFUL #define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE (1024 * 64)
From: Bobby Eshleman bobbyeshleman@meta.com
Add netns functionality (initialization, passing to transports, procfs, etc.) to the af_vsock socket layer. Later patches that add netns support to transports depend on this patch.
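As a rough illustration of how a namespace manager might drive the procfs file from user space, the helper below writes a mode string to a given path. The function name `vsock_ns_mode_write` is hypothetical (it is not part of this patch); the path is a parameter so the helper can be pointed at /proc/net/vsock_ns_mode inside the target namespace.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: write a mode string such
 * as "local\n" or "global\n" to a vsock_ns_mode file.
 * Returns 0 on success or a negative errno value.
 */
static int vsock_ns_mode_write(const char *path, const char *mode)
{
	size_t len = strlen(mode);
	ssize_t n;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, mode, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: treat as an I/O error */

	close(fd);
	return err;
}
```

The write handler in this patch compares only the first size - 1 bytes, so passing the mode with a trailing newline (as `echo local` would produce) is the safest input. Because the file is write-once, a second write to the same namespace is expected to fail with EPERM.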
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- include/net/af_vsock.h | 13 +++- net/vmw_vsock/af_vsock.c | 198 +++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 194 insertions(+), 17 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index d34bf7dbc69a..0c0c351394de 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -144,7 +144,7 @@ struct vsock_transport { int flags); int (*seqpacket_enqueue)(struct vsock_sock *vsk, struct msghdr *msg, size_t len); - bool (*seqpacket_allow)(u32 remote_cid); + bool (*seqpacket_allow)(struct vsock_sock *vsk, u32 remote_cid); u32 (*seqpacket_has_data)(struct vsock_sock *vsk);
/* Notification. */ @@ -214,9 +214,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected); void vsock_insert_connected(struct vsock_sock *vsk); void vsock_remove_bound(struct vsock_sock *vsk); void vsock_remove_connected(struct vsock_sock *vsk); -struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr); +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net); struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst); + struct sockaddr_vm *dst, + struct net *net); void vsock_remove_sock(struct vsock_sock *vsk); void vsock_for_each_connected_socket(struct vsock_transport *transport, void (*fn)(struct sock *sk)); @@ -258,6 +259,12 @@ static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t) return t->msgzerocopy_allow && t->msgzerocopy_allow(); }
+extern struct net __vsock_global_net; +static inline struct net *vsock_global_net(void) +{ + return &__vsock_global_net; +} + static inline u8 vsock_net_mode(struct net *net) { u8 ret; diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 218d91e6b32b..c69c2db03162 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -83,6 +83,24 @@ * TCP_ESTABLISHED - connected * TCP_CLOSING - disconnecting * TCP_LISTEN - listening + * + * - Namespaces in vsock support two different modes configured + * through /proc/net/vsock_ns_mode. The modes are "local" and "global". + * Each mode defines how the namespace interacts with CIDs. + * /proc/net/vsock_ns_mode is write-once, so that it may be configured + * by a namespace manager. The default is "global". The mode is set + * per-namespace. + * + * The modes affect the allocation and accessibility of CIDs as follows: + * - global - aka fully public + * - CID allocation draws from the public pool + * - AF_VSOCK sockets may reach any CID allocated from the public pool + * - AF_VSOCK sockets may not reach CIDs allocated from private pools + * + * - local - aka fully private + * - CID allocation draws only from the private pool, does not affect public pool + * - AF_VSOCK sockets may only reach CIDs from the private pool + * - AF_VSOCK sockets may not reach CIDs allocated from outside the pool */
#include <linux/compat.h> @@ -100,6 +118,7 @@ #include <linux/module.h> #include <linux/mutex.h> #include <linux/net.h> +#include <linux/proc_fs.h> #include <linux/poll.h> #include <linux/random.h> #include <linux/skbuff.h> @@ -111,6 +130,7 @@ #include <linux/workqueue.h> #include <net/sock.h> #include <net/af_vsock.h> +#include <net/netns/vsock.h> #include <uapi/linux/vm_sockets.h> #include <uapi/asm-generic/ioctls.h>
@@ -149,6 +169,9 @@ static const struct vsock_transport *transport_dgram; static const struct vsock_transport *transport_local; static DEFINE_MUTEX(vsock_register_mutex);
+struct net __vsock_global_net; +EXPORT_SYMBOL_GPL(__vsock_global_net); + /**** UTILS ****/
/* Each bound VSocket is stored in the bind hash table and each connected @@ -235,33 +258,42 @@ static void __vsock_remove_connected(struct vsock_sock *vsk) sock_put(&vsk->sk); }
-static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr, + struct net *net) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) { + struct sock *sk = sk_vsock(vsk); + if (vsock_addr_equals_addr(addr, &vsk->local_addr)) - return sk_vsock(vsk); + if (vsock_net_check_mode(net, sock_net(sk))) + return sk;
if (addr->svm_port == vsk->local_addr.svm_port && (vsk->local_addr.svm_cid == VMADDR_CID_ANY || - addr->svm_cid == VMADDR_CID_ANY)) - return sk_vsock(vsk); + addr->svm_cid == VMADDR_CID_ANY) && + vsock_net_check_mode(net, sock_net(sk))) + return sk; }
return NULL; }
static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) + struct sockaddr_vm *dst, + struct net *net) { struct vsock_sock *vsk;
list_for_each_entry(vsk, vsock_connected_sockets(src, dst), connected_table) { + struct sock *sk = sk_vsock(vsk); + if (vsock_addr_equals_addr(src, &vsk->remote_addr) && - dst->svm_port == vsk->local_addr.svm_port) { - return sk_vsock(vsk); + dst->svm_port == vsk->local_addr.svm_port && + vsock_net_check_mode(net, sock_net(sk))) { + return sk; } }
@@ -304,12 +336,12 @@ void vsock_remove_connected(struct vsock_sock *vsk) } EXPORT_SYMBOL_GPL(vsock_remove_connected);
-struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) +struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr, struct net *net) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_bound_socket(addr); + sk = __vsock_find_bound_socket(addr, net); if (sk) sock_hold(sk);
@@ -320,12 +352,13 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr) EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, - struct sockaddr_vm *dst) + struct sockaddr_vm *dst, + struct net *net) { struct sock *sk;
spin_lock_bh(&vsock_table_lock); - sk = __vsock_find_connected_socket(src, dst); + sk = __vsock_find_connected_socket(src, dst, net); if (sk) sock_hold(sk);
@@ -528,7 +561,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
if (sk->sk_type == SOCK_SEQPACKET) { if (!new_transport->seqpacket_allow || - !new_transport->seqpacket_allow(remote_cid)) { + !new_transport->seqpacket_allow(vsk, remote_cid)) { module_put(new_transport->module); return -ESOCKTNOSUPPORT; } @@ -678,6 +711,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, { static u32 port; struct sockaddr_vm new_addr; + struct net *net = sock_net(sk_vsock(vsk));
if (!port) port = get_random_u32_above(LAST_RESERVED_PORT); @@ -694,7 +728,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
new_addr.svm_port = port++;
- if (!__vsock_find_bound_socket(&new_addr)) { + if (!__vsock_find_bound_socket(&new_addr, net)) { found = true; break; } @@ -711,7 +745,7 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, return -EACCES; }
- if (__vsock_find_bound_socket(&new_addr)) + if (__vsock_find_bound_socket(&new_addr, net)) return -EADDRINUSE; }
@@ -2645,6 +2679,133 @@ static struct miscdevice vsock_device = { .fops = &vsock_device_ops, };
+#define VSOCK_NS_MODE_NAME_MAX 8 + +static struct ctl_table vsock_table[] = { + { + .procname = "vsock_ns_mode", + .data = &init_net.vsock.ns_mode, + .maxlen = sizeof(u8), + .mode = 0644, + .proc_handler = proc_dostring + }, +}; + +static int __net_init vsock_sysctl_register(struct net *net) +{ + struct ctl_table *table; + + if (net_eq(net, &init_net)) { + table = vsock_table; + } else { + table = kmemdup(vsock_table, sizeof(vsock_table), GFP_KERNEL); + if (!table) + goto err_alloc; + + table[0].data = &net->vsock.ns_mode; + } + + net->vsock.vsock_hdr = register_net_sysctl_sz(net, "net/vsock", table, + ARRAY_SIZE(vsock_table)); + if (!net->vsock.vsock_hdr) + goto err_reg; + + return 0; + +err_reg: + if (!net_eq(net, &init_net)) + kfree(table); +err_alloc: + return -ENOMEM; +} + +static void vsock_sysctl_unregister(struct net *net) +{ + const struct ctl_table *table; + + table = net->vsock.vsock_hdr->ctl_table_arg; + unregister_net_sysctl_table(net->vsock.vsock_hdr); + if (!net_eq(net, &init_net)) + kfree(table); +} + +#ifdef CONFIG_PROC_FS +static int vsock_proc_ns_mode_show(struct seq_file *seq, void *v) +{ + struct net *net = seq_file_single_net(seq); + const char *p = "invalid"; + + spin_lock_bh(&net->vsock.lock); + if (net->vsock.ns_mode == VSOCK_NET_MODE_GLOBAL) + p = "global"; + else if (net->vsock.ns_mode == VSOCK_NET_MODE_LOCAL) + p = "local"; + else + WARN_ONCE(1, "invalid vsock_ns_mode"); + spin_unlock_bh(&net->vsock.lock); + seq_printf(seq, "%s", p); + return 0; +} + +static int vsock_proc_ns_mode_write(struct file *file, char *buf, size_t size) +{ + struct seq_file *m = file->private_data; + struct net *net = seq_file_single_net(m); + size_t len = size - 1; + int ret = 0; + + if (!vsock_net_mode_can_set(net)) + return -EPERM; + + if (!strncmp(buf, "global", len)) + vsock_net_set_mode(net, VSOCK_NET_MODE_GLOBAL); + else if (!strncmp(buf, "local", len)) + vsock_net_set_mode(net, VSOCK_NET_MODE_LOCAL); + else + return -EINVAL; + + return ret; +} +#endif 
/* CONFIG_PROC_FS */ + +static void vsock_net_init(struct net *net) +{ + spin_lock_init(&net->vsock.lock); + net->vsock.ns_mode = VSOCK_NET_MODE_GLOBAL; +} + +static __net_init int vsock_sysctl_init_net(struct net *net) +{ + vsock_net_init(net); + + if (vsock_sysctl_register(net)) + return -ENOMEM; + +#ifdef CONFIG_PROC_FS + if (!proc_create_net_single_write("vsock_ns_mode", 0644, net->proc_net, + vsock_proc_ns_mode_show, + vsock_proc_ns_mode_write, + NULL)) + goto err_sysctl; +#endif + + return 0; + +err_sysctl: + vsock_sysctl_unregister(net); + return -ENOMEM; +} + +static __net_exit void vsock_sysctl_exit_net(struct net *net) +{ + vsock_sysctl_unregister(net); +} + +static struct pernet_operations vsock_sysctl_ops __net_initdata = { + .init = vsock_sysctl_init_net, + .exit = vsock_sysctl_exit_net, +}; + static int __init vsock_init(void) { int err = 0; @@ -2672,10 +2833,19 @@ static int __init vsock_init(void) goto err_unregister_proto; }
+ if (register_pernet_subsys(&vsock_sysctl_ops)) { + err = -ENOMEM; + goto err_unregister_sock; + } + + vsock_net_init(&init_net); + vsock_net_init(vsock_global_net()); vsock_bpf_build_proto();
return 0;
+err_unregister_sock: + sock_unregister(AF_VSOCK); err_unregister_proto: proto_unregister(&vsock_proto); err_deregister_misc:
From: Bobby Eshleman bobbyeshleman@meta.com
Add support to the virtio-vsock common code for passing around net namespace pointers (tx and rx). The series still requires vhost/virtio transport support to be added by future patches.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- include/linux/virtio_vsock.h | 1 + net/vmw_vsock/virtio_transport_common.c | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 93edc1e798a5..81355f84b76c 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -160,6 +160,7 @@ struct virtio_vsock_pkt_info { u32 remote_cid, remote_port; struct vsock_sock *vsk; struct msghdr *msg; + struct net *net; u32 pkt_len; u16 type; u16 op; diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 1b5d9896edae..310f2e92c527 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -313,6 +313,8 @@ static struct sk_buff *virtio_transport_alloc_skb(struct virtio_vsock_pkt_info * info->flags, zcopy);
+ virtio_vsock_skb_set_net(skb, info->net); + return skb; out: kfree_skb(skb); @@ -524,6 +526,7 @@ static int virtio_transport_send_credit_update(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_CREDIT_UPDATE, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1064,6 +1067,7 @@ int virtio_transport_connect(struct vsock_sock *vsk) struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_REQUEST, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1079,6 +1083,7 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode) (mode & SEND_SHUTDOWN ? VIRTIO_VSOCK_SHUTDOWN_SEND : 0), .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1105,6 +1110,7 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk, .msg = msg, .pkt_len = len, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1142,6 +1148,7 @@ static int virtio_transport_reset(struct vsock_sock *vsk, .op = VIRTIO_VSOCK_OP_RST, .reply = !!skb, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
/* Send RST only if the original pkt is not a RST pkt */ @@ -1162,6 +1169,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, .op = VIRTIO_VSOCK_OP_RST, .type = le16_to_cpu(hdr->type), .reply = true, + .net = virtio_vsock_skb_net(skb), }; struct sk_buff *reply;
@@ -1462,6 +1470,7 @@ virtio_transport_send_response(struct vsock_sock *vsk, .remote_port = le32_to_cpu(hdr->src_port), .reply = true, .vsk = vsk, + .net = sock_net(sk_vsock(vsk)), };
return virtio_transport_send_pkt_info(vsk, &info); @@ -1576,6 +1585,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, struct sk_buff *skb) { struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); + struct net *net = virtio_vsock_skb_net(skb); struct sockaddr_vm src, dst; struct vsock_sock *vsk; struct sock *sk; @@ -1603,9 +1613,9 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, /* The socket must be in connected or bound table * otherwise send reset back */ - sk = vsock_find_connected_socket(&src, &dst); + sk = vsock_find_connected_socket(&src, &dst, net); if (!sk) { - sk = vsock_find_bound_socket(&dst); + sk = vsock_find_bound_socket(&dst, net); if (!sk) { (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt;
From: Bobby Eshleman bobbyeshleman@meta.com
Add the ability to isolate vsock flows using namespaces.
The namespace for a VM is inherited from the process that opened the vhost-vsock device.
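One consequence of tying each device to the opener's namespace is that guest CID lookups become namespace-aware, so the same CID can be assigned independently inside a local-mode namespace. The following is a minimal user-space sketch of that idea, not the kernel code: `cid_ns`, `cid_dev`, and the helpers are hypothetical names, and a flat array stands in for the vhost_vsock hash table.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum cid_ns_mode { CID_NS_GLOBAL = 1, CID_NS_LOCAL = 2 };

struct cid_ns {
	enum cid_ns_mode mode;
};

struct cid_dev {
	uint32_t guest_cid;		/* 0 means no CID assigned yet */
	const struct cid_ns *ns;	/* namespace of the process that opened it */
};

/* Same namespace, or both namespaces in global mode. */
static bool cid_ns_visible(const struct cid_ns *a, const struct cid_ns *b)
{
	return a == b ||
	       (a->mode == CID_NS_GLOBAL && b->mode == CID_NS_GLOBAL);
}

/* Find a device that owns the CID and is visible from ns. */
static struct cid_dev *cid_dev_get(struct cid_dev *devs, size_t n,
				   uint32_t cid, const struct cid_ns *ns)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].guest_cid == 0)
			continue;	/* skip devices without a CID */
		if (devs[i].guest_cid == cid && cid_ns_visible(ns, devs[i].ns))
			return &devs[i];
	}
	return NULL;
}

/* Assign a CID; refuse only if another visible device already owns it. */
static int cid_dev_set(struct cid_dev *devs, size_t n,
		       struct cid_dev *dev, uint32_t cid)
{
	struct cid_dev *other = cid_dev_get(devs, n, cid, dev->ns);

	if (other && other != dev)
		return -EADDRINUSE;
	dev->guest_cid = cid;
	return 0;
}
```

In this model, two devices opened from different global-mode namespaces still collide on a CID, while a device opened from a local-mode namespace can reuse that CID without conflict.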
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- drivers/vhost/vsock.c | 48 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 802153e23073..863419533a3f 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -46,6 +46,8 @@ static DEFINE_READ_MOSTLY_HASHTABLE(vhost_vsock_hash, 8); struct vhost_vsock { struct vhost_dev dev; struct vhost_virtqueue vqs[2]; + struct net *net; + netns_tracker ns_tracker;
/* Link to global vhost_vsock_hash, writes use vhost_vsock_mutex */ struct hlist_node hash; @@ -59,6 +61,22 @@ struct vhost_vsock { bool seqpacket_allow; };
+static void vhost_vsock_net_set(struct vhost_vsock *vsock, struct net *net) +{ + if (net_eq(net, vsock_global_net())) + vsock->net = vsock_global_net(); + else + vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL); +} + +static void vhost_vsock_net_put(struct vhost_vsock *vsock) +{ + if (net_eq(vsock->net, vsock_global_net())) + return; + + put_net_track(vsock->net, &vsock->ns_tracker); +} + static u32 vhost_transport_get_local_cid(void) { return VHOST_VSOCK_DEFAULT_HOST_CID; @@ -67,7 +85,7 @@ static u32 vhost_transport_get_local_cid(void) /* Callers that dereference the return value must hold vhost_vsock_mutex or the * RCU read lock. */ -static struct vhost_vsock *vhost_vsock_get(u32 guest_cid) +static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net) { struct vhost_vsock *vsock;
@@ -78,9 +96,8 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid) if (other_cid == 0) continue;
- if (other_cid == guest_cid) + if (other_cid == guest_cid && vsock_net_check_mode(net, vsock->net)) return vsock; - }
return NULL; @@ -272,13 +289,14 @@ static int vhost_transport_send_pkt(struct sk_buff *skb) { struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); + struct net *net = virtio_vsock_skb_net(skb); struct vhost_vsock *vsock; int len = skb->len;
rcu_read_lock();
/* Find the vhost_vsock according to guest context id */ - vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid)); + vsock = vhost_vsock_get(le64_to_cpu(hdr->dst_cid), net); if (!vsock) { rcu_read_unlock(); kfree_skb(skb); @@ -305,7 +323,7 @@ vhost_transport_cancel_pkt(struct vsock_sock *vsk) rcu_read_lock();
/* Find the vhost_vsock according to guest context id */ - vsock = vhost_vsock_get(vsk->remote_addr.svm_cid); + vsock = vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk_vsock(vsk))); if (!vsock) goto out;
@@ -403,7 +421,7 @@ static bool vhost_transport_msgzerocopy_allow(void) return true; }
-static bool vhost_transport_seqpacket_allow(u32 remote_cid); +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport vhost_transport = { .transport = { @@ -459,13 +477,14 @@ static struct virtio_transport vhost_transport = { .send_pkt = vhost_transport_send_pkt, };
-static bool vhost_transport_seqpacket_allow(u32 remote_cid) +static bool vhost_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { + struct net *net = sock_net(sk_vsock(vsk)); struct vhost_vsock *vsock; bool seqpacket_allow = false;
rcu_read_lock(); - vsock = vhost_vsock_get(remote_cid); + vsock = vhost_vsock_get(remote_cid, net);
if (vsock) seqpacket_allow = vsock->seqpacket_allow; @@ -525,6 +544,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) continue; }
+ virtio_vsock_skb_set_net(skb, vsock->net); total_len += sizeof(*hdr) + skb->len;
/* Deliver to monitoring devices all received packets */ @@ -651,10 +671,16 @@ static void vhost_vsock_free(struct vhost_vsock *vsock)
static int vhost_vsock_dev_open(struct inode *inode, struct file *file) { + struct vhost_virtqueue **vqs; struct vhost_vsock *vsock; + struct net *net; int ret;
+ net = get_net_ns_by_pid(current->pid); + if (IS_ERR(net)) + return PTR_ERR(net); + /* This struct is large and allocation could fail, fall back to vmalloc * if there is no other way. */ @@ -668,6 +694,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) goto out; }
+ vhost_vsock_net_set(vsock, net); vsock->guest_cid = 0; /* no CID assigned yet */ vsock->seqpacket_allow = false;
@@ -707,7 +734,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk) */
/* If the peer is still valid, no need to reset connection */ - if (vhost_vsock_get(vsk->remote_addr.svm_cid)) + if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk))) return;
/* If the close timeout is pending, let it expire. This avoids races @@ -752,6 +779,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file) virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
vhost_dev_cleanup(&vsock->dev); + vhost_vsock_net_put(vsock); kfree(vsock->dev.vqs); vhost_vsock_free(vsock); return 0; @@ -778,7 +806,7 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, u64 guest_cid)
/* Refuse if CID is already in use */ mutex_lock(&vhost_vsock_mutex); - other = vhost_vsock_get(guest_cid); + other = vhost_vsock_get(guest_cid, vsock->net); if (other && other != vsock) { mutex_unlock(&vhost_vsock_mutex); return -EADDRINUSE;
From: Bobby Eshleman bobbyeshleman@meta.com
This changes virtio-vsock to always use the global netns dummy so that all guest vsock sockets continue to operate in global mode. The guest vsock behavior is unchanged.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- net/vmw_vsock/virtio_transport.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index f0e48e6911fc..25c1bca7b136 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -536,7 +536,7 @@ static bool virtio_transport_msgzerocopy_allow(void) return true; }
-static bool virtio_transport_seqpacket_allow(u32 remote_cid); +static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
static struct virtio_transport virtio_transport = { .transport = { @@ -593,7 +593,7 @@ static struct virtio_transport virtio_transport = { .can_msgzerocopy = virtio_transport_can_msgzerocopy, };
-static bool virtio_transport_seqpacket_allow(u32 remote_cid) +static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { struct virtio_vsock *vsock; bool seqpacket_allow; @@ -649,6 +649,7 @@ static void virtio_transport_rx_work(struct work_struct *work) }
virtio_vsock_skb_rx_put(skb); + virtio_vsock_skb_set_net(skb, vsock_global_net()); virtio_transport_deliver_tap_pkt(skb); virtio_transport_recv_pkt(&virtio_transport, skb); }
From: Bobby Eshleman bobbyeshleman@meta.com
Update hyperv for the NS-aware socket lookup API so that the NS changes do not break it. Guest vsocks always remain in the global namespace, so the behavior is unchanged.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- net/vmw_vsock/hyperv_transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 432fcbbd14d4..8862297b09a7 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -313,7 +313,7 @@ static void hvs_open_connection(struct vmbus_channel *chan) return;
hvs_addr_init(&addr, conn_from_host ? if_type : if_instance); - sk = vsock_find_bound_socket(&addr); + sk = vsock_find_bound_socket(&addr, vsock_global_net()); if (!sk) return;
From: Bobby Eshleman bobbyeshleman@meta.com
Add hooks for new internal NS calls to avoid breaking vmci. Guest vsocks remain in global mode namespaces, so behavior is unchanged.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- net/vmw_vsock/vmci_transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 7eccd6708d66..3c434ee3ca8c 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -703,9 +703,9 @@ static int vmci_transport_recv_stream_cb(void *data, struct vmci_datagram *dg) vsock_addr_init(&src, pkt->dg.src.context, pkt->src_port); vsock_addr_init(&dst, pkt->dg.dst.context, pkt->dst_port);
- sk = vsock_find_connected_socket(&src, &dst); + sk = vsock_find_connected_socket(&src, &dst, vsock_global_net()); if (!sk) { - sk = vsock_find_bound_socket(&dst); + sk = vsock_find_bound_socket(&dst, vsock_global_net()); if (!sk) { /* We could not find a socket for this specified * address. If this packet is a RST, we just drop it.
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- include/net/af_vsock.h | 4 +++ include/net/netns/vsock.h | 3 +++ net/vmw_vsock/af_vsock.c | 8 +++++- net/vmw_vsock/vsock_loopback.c | 59 +++++++++++++++++++++++++++++++++++------- 4 files changed, 63 insertions(+), 11 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 0c0c351394de..aefff6e102e7 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -305,4 +305,8 @@ static inline bool vsock_net_check_mode(struct net *n1, struct net *n2) (vsock_net_mode(n1) == VSOCK_NET_MODE_GLOBAL && vsock_net_mode(n2) == VSOCK_NET_MODE_GLOBAL); } + +int vsock_loopback_init_net(struct net *net); +void vsock_loopback_exit_net(struct net *net); + #endif /* __AF_VSOCK_H__ */ diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h index 0bad4652815c..4420346e10a8 100644 --- a/include/net/netns/vsock.h +++ b/include/net/netns/vsock.h @@ -7,6 +7,8 @@ #define VSOCK_NET_MODE_GLOBAL 1 #define VSOCK_NET_MODE_LOCAL (1 << 1)
+struct vsock_loopback; + struct netns_vsock { struct ctl_table_header *vsock_hdr; spinlock_t lock; @@ -14,5 +16,6 @@ struct netns_vsock { /* protected by lock */ u8 ns_mode; bool written; + struct vsock_loopback *loopback; }; #endif /* __NET_NET_NAMESPACE_VSOCK_H */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index c69c2db03162..5689ce7d5843 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -2778,9 +2778,12 @@ static __net_init int vsock_sysctl_init_net(struct net *net) { vsock_net_init(net);
- if (vsock_sysctl_register(net)) + if (vsock_loopback_init_net(net)) return -ENOMEM;
+ if (vsock_sysctl_register(net)) + goto err_loopback; + #ifdef CONFIG_PROC_FS if (!proc_create_net_single_write("vsock_ns_mode", 0644, net->proc_net, vsock_proc_ns_mode_show, @@ -2793,12 +2796,15 @@ static __net_init int vsock_sysctl_init_net(struct net *net)
err_sysctl: vsock_sysctl_unregister(net); +err_loopback: + vsock_loopback_exit_net(net); return -ENOMEM; }
static __net_exit void vsock_sysctl_exit_net(struct net *net) { vsock_sysctl_unregister(net); + vsock_loopback_exit_net(net); }
static struct pernet_operations vsock_sysctl_ops __net_initdata = { diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 6e78927a598e..4fc07e3a1d2b 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -28,8 +28,19 @@ static u32 vsock_loopback_get_local_cid(void)
static int vsock_loopback_send_pkt(struct sk_buff *skb) { - struct vsock_loopback *vsock = &the_vsock_loopback; + struct vsock_loopback *vsock; int len = skb->len; + struct net *net; + + if (skb->sk) + net = sock_net(skb->sk); + else + net = NULL; + + if (net && net->vsock.ns_mode == VSOCK_NET_MODE_LOCAL) + vsock = net->vsock.loopback; + else + vsock = &the_vsock_loopback;
virtio_vsock_skb_queue_tail(&vsock->pkt_queue, skb); queue_work(vsock->workqueue, &vsock->pkt_work); @@ -46,7 +57,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) return 0; }
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid); +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid); static bool vsock_loopback_msgzerocopy_allow(void) { return true; @@ -106,7 +117,7 @@ static struct virtio_transport loopback_transport = { .send_pkt = vsock_loopback_send_pkt, };
-static bool vsock_loopback_seqpacket_allow(u32 remote_cid) +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid) { return true; } @@ -134,27 +145,55 @@ static void vsock_loopback_work(struct work_struct *work) } }
-static int __init vsock_loopback_init(void) +static int vsock_loopback_init_vsock(struct vsock_loopback *vsock) { - struct vsock_loopback *vsock = &the_vsock_loopback; - int ret; - vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0); if (!vsock->workqueue) return -ENOMEM;
skb_queue_head_init(&vsock->pkt_queue); INIT_WORK(&vsock->pkt_work, vsock_loopback_work); + return 0; +} + +static void vsock_loopback_deinit_vsock(struct vsock_loopback *vsock) +{ + destroy_workqueue(vsock->workqueue); +} + +int vsock_loopback_init_net(struct net *net) +{ + net->vsock.loopback = kmalloc(GFP_KERNEL, sizeof(struct vsock_loopback)); + if (!net->vsock.loopback) + return -ENOMEM; + + return vsock_loopback_init_vsock(net->vsock.loopback); +} + +void vsock_loopback_exit_net(struct net *net) +{ + vsock_loopback_deinit_vsock(net->vsock.loopback); + kfree(net->vsock.loopback); +} + +static int __init vsock_loopback_init(void) +{ + struct vsock_loopback *vsock = &the_vsock_loopback; + int ret; + + ret = vsock_loopback_init_vsock(vsock); + if (ret < 0) + return ret;
ret = vsock_core_register(&loopback_transport.transport, VSOCK_TRANSPORT_F_LOCAL); if (ret) - goto out_wq; + goto out_deinit;
return 0;
-out_wq: - destroy_workqueue(vsock->workqueue); +out_deinit: + vsock_loopback_deinit_vsock(vsock); return ret; }
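The reachability rule stated in the commit message above can be modelled in a few lines of shell. This is only a sketch of the rule as described (global mode namespaces talk to each other, local mode namespaces only to themselves); `can_communicate` and its arguments are hypothetical names for illustration, not part of the kernel patch or of vmtest.sh.

```bash
#!/usr/bin/env bash
# Model of the loopback reachability rule; this is NOT kernel code.
# Usage: can_communicate NS_A MODE_A NS_B MODE_B
# Modes are the strings "global" or "local" (hypothetical encoding).
can_communicate() {
	local ns_a=$1 mode_a=$2 ns_b=$3 mode_b=$4

	# Sockets in the same namespace can always reach each other,
	# whatever mode that namespace is in.
	if [ "${ns_a}" = "${ns_b}" ]; then
		return 0
	fi

	# Across namespaces, both sides must be in global mode.
	[ "${mode_a}" = global ] && [ "${mode_b}" = global ]
}
```

For example, `can_communicate global0 global global1 global` succeeds, while `can_communicate local0 local local1 local` fails, which lines up with the `_ok` and `_fails` suffixes of the loopback selftests added later in the series.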
On Tue, Aug 05, 2025 at 02:49:17PM -0700, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add NS support to vsock loopback. Sockets in a global mode netns communicate with each other, regardless of namespace. Sockets in a local mode netns may only communicate with other sockets within the same namespace.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
...
diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
...
@@ -46,7 +57,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) return 0; } -static bool vsock_loopback_seqpacket_allow(u32 remote_cid); +static bool vsock_loopback_seqpacket_allow(struct vsock_sock *vsk, u32 remote_cid);
This change needs to be squashed into PATCH 3/12 ("vsock: add netns to af_vsock core") to avoid build breakage.
Likewise with the other change to vsock_loopback_seqpacket_allow below. And I think also for a number of other changes made by PATCH 3/12.
Please make sure that patches don't introduce transient build failures. It breaks bisection.
On the topic of vsock_loopback_seqpacket_allow, also:
* Please line wrap this so that the code is 80 columns wide or less, as is still preferred for Networking code.
Flagged by checkpatch.pl --max-line-length=80
* Can we move the definition of vsock_loopback_seqpacket_allow() here? The function itself is trivial. And doing so would avoid a forward declaration.
static bool vsock_loopback_msgzerocopy_allow(void) { return true;
...
+int vsock_loopback_init_net(struct net *net)
+{
+ net->vsock.loopback = kmalloc(GFP_KERNEL, sizeof(struct vsock_loopback));
+ if (!net->vsock.loopback)
+ return -ENOMEM;
+
+ return vsock_loopback_init_vsock(net->vsock.loopback);
+}
+
+void vsock_loopback_exit_net(struct net *net)
+{
+ vsock_loopback_deinit_vsock(net->vsock.loopback);
+ kfree(net->vsock.loopback);
+}
I think EXPORT_SYMBOL_GPL is needed for both vsock_loopback_exit_net and vsock_loopback_init_net for the case where CONFIG_VSOCKETS=m
Also, in Kconfig VSOCKETS_LOOPBACK depends on VSOCKETS. But this code adds a reverse dependency. As it stands it's possible to configure VSOCKETS without VSOCKETS_LOOPBACK, which will not compile.
Perhaps stub implementations of vsock_loopback_init_net and vsock_loopback_exit_net should be implemented in af_vsock.h if VSOCKETS_LOOPBACK is not enabled?
...
On Wed, Aug 06, 2025 at 08:12:39PM +0100, Simon Horman wrote:
On Tue, Aug 05, 2025 at 02:49:17PM -0700, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
...
This change needs to be squashed into PATCH 3/12 ("vsock: add netns to af_vsock core") to avoid build breakage.
Likewise with the other change to vsock_loopback_seqpacket_allow below. And I think also for a number of other changes made by PATCH 3/12.
Please make sure that patches don't introduce transient build failures. It breaks bisection.
Will do, thanks!
On the topic of vsock_loopback_seqpacket_allow, also:
Please line wrap this so that the code is 80 columns wide or less, as is still preferred for Networking code.
Flagged by checkpatch.pl --max-line-length=80
Can we move the definition of vsock_loopback_seqpacket_allow() here? The function itself is trivial. And doing so would avoid a forward declaration.
static bool vsock_loopback_msgzerocopy_allow(void) { return true;
...
+int vsock_loopback_init_net(struct net *net)
+{
+ net->vsock.loopback = kmalloc(GFP_KERNEL, sizeof(struct vsock_loopback));
+ if (!net->vsock.loopback)
+ return -ENOMEM;
+
+ return vsock_loopback_init_vsock(net->vsock.loopback);
+}
+
+void vsock_loopback_exit_net(struct net *net)
+{
+ vsock_loopback_deinit_vsock(net->vsock.loopback);
+ kfree(net->vsock.loopback);
+}
I think EXPORT_SYMBOL_GPL is needed for both vsock_loopback_exit_net and vsock_loopback_init_net for the case where CONFIG_VSOCKETS=m
Also, in Kconfig VSOCKETS_LOOPBACK depends on VSOCKETS. But this code adds a reverse dependency. As it stands it's possible to configure VSOCKETS without VSOCKETS_LOOPBACK, which will not compile.
Perhaps stub implementations of vsock_loopback_init_net and vsock_loopback_exit_net should be implemented in af_vsock.h if VSOCKETS_LOOPBACK is not enabled?
Roger that, makes sense. Thanks for the review!
Best, Bobby
From: Bobby Eshleman bobbyeshleman@meta.com
Improve logging by adding configurable log levels. Additionally, improve usability of the logging functions. Remove the test name prefix from the logging functions so that logging calls can be made deeper in the call stack without passing down the test name or setting a global. Teach the log function to accept a LOG_PREFIX variable to avoid unnecessary argument shifting.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- tools/testing/selftests/vsock/vmtest.sh | 75 ++++++++++++++++----------------- 1 file changed, 37 insertions(+), 38 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index edacebfc1632..183647a86c8a 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -51,7 +51,12 @@ readonly TEST_DESCS=( "Run vsock_test using the loopback transport in the VM." )
-VERBOSE=0 +readonly LOG_LEVEL_DEBUG=0 +readonly LOG_LEVEL_INFO=1 +readonly LOG_LEVEL_WARN=2 +readonly LOG_LEVEL_ERROR=3 + +VERBOSE="${LOG_LEVEL_WARN}"
usage() { local name @@ -196,7 +201,7 @@ vm_start() {
qemu=$(command -v "${QEMU}")
- if [[ "${VERBOSE}" -eq 1 ]]; then + if [[ ${VERBOSE} -le ${LOG_LEVEL_DEBUG} ]]; then verbose_opt="--verbose" logfile=/dev/stdout fi @@ -271,60 +276,56 @@ EOF
host_wait_for_listener() { wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" -} - -__log_stdin() { - cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' -}
-__log_args() { - echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' }
log() { - local prefix="$1" + local redirect + local prefix
- shift - local redirect= - if [[ ${VERBOSE} -eq 0 ]]; then + if [[ ${VERBOSE} -gt ${LOG_LEVEL_INFO} ]]; then redirect=/dev/null else redirect=/dev/stdout fi
+ prefix="${LOG_PREFIX:-}" + if [[ "$#" -eq 0 ]]; then - __log_stdin | tee -a "${LOG}" > ${redirect} + if [[ -n "${prefix}" ]]; then + cat | awk -v prefix="${prefix}" '{printf "%s: %s\n", prefix, $0}' + else + cat + fi else - __log_args "$@" | tee -a "${LOG}" > ${redirect} - fi + if [[ -n "${prefix}" ]]; then + echo "${prefix}: " "$@" + else + echo "$@" + fi + fi | tee -a "${LOG}" > ${redirect} }
-log_setup() { - log "setup" "$@" +log_host() { + LOG_PREFIX=host log "$@" }
-log_host() { - local testname=$1 +log_guest() { + LOG_PREFIX=guest log "$@" +}
- shift - log "test:${testname}:host" "$@" }
-log_guest() { - local testname=$1
- shift - log "test:${testname}:guest" "$@" }
test_vm_server_host_client() { - local testname="${FUNCNAME[0]#test_}"
vm_ssh -- "${VSOCK_TEST}" \ --mode=server \ --control-port="${TEST_GUEST_PORT}" \ --peer-cid=2 \ - 2>&1 | log_guest "${testname}" & + 2>&1 | log_guest &
vm_wait_for_listener "${TEST_GUEST_PORT}"
@@ -332,18 +333,17 @@ test_vm_server_host_client() { --mode=client \ --control-host=127.0.0.1 \ --peer-cid="${VSOCK_CID}" \ - --control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}" + --control-port="${TEST_HOST_PORT}" 2>&1 | log_host
return $? }
test_vm_client_host_server() { - local testname="${FUNCNAME[0]#test_}"
${VSOCK_TEST} \ --mode "server" \ --control-port "${TEST_HOST_PORT_LISTENER}" \ - --peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" & + --peer-cid "${VSOCK_CID}" 2>&1 | log_host &
host_wait_for_listener
@@ -351,19 +351,18 @@ test_vm_client_host_server() { --mode=client \ --control-host=10.0.2.2 \ --peer-cid=2 \ - --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}" + --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest
return $? }
test_vm_loopback() { - local testname="${FUNCNAME[0]#test_}" local port=60000 # non-forwarded local port
vm_ssh -- "${VSOCK_TEST}" \ --mode=server \ --control-port="${port}" \ - --peer-cid=1 2>&1 | log_guest "${testname}" & + --peer-cid=1 2>&1 | log_guest &
vm_wait_for_listener "${port}"
@@ -371,7 +370,7 @@ test_vm_loopback() { --mode=client \ --control-host="127.0.0.1" \ --control-port="${port}" \ - --peer-cid=1 2>&1 | log_guest "${testname}" + --peer-cid=1 2>&1 | log_guest
return $? } @@ -429,7 +428,7 @@ QEMU="qemu-system-$(uname -m)" while getopts :hvsq:b o do case $o in - v) VERBOSE=1;; + v) VERBOSE=$(( VERBOSE - 1 ));; b) BUILD=1;; q) QEMU=$OPTARG;; h|*) usage;; @@ -452,10 +451,10 @@ handle_build
echo "1..${#ARGS[@]}"
-log_setup "Booting up VM" +log_host "Booting up VM" vm_start vm_wait_for_ssh -log_setup "VM booted up" +log_host "VM booted up"
cnt_pass=0 cnt_fail=0
From: Bobby Eshleman bobbyeshleman@meta.com
Add helper functions vm_vsock_test() and host_vsock_test() to invoke the vsock_test binary. These encapsulate several pieces of repeated logic, such as waiting for the server to reach the listening state and enabling/disabling the bash pipefail option so that piping output into the logging functions does not hide failures.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- tools/testing/selftests/vsock/vmtest.sh | 120 ++++++++++++++++++++++++++++---- 1 file changed, 108 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index 183647a86c8a..5e36d1068f6f 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -248,6 +248,7 @@ wait_for_listener() local port=$1 local interval=$2 local max_intervals=$3 + local old_pipefail local protocol=tcp local pattern local i @@ -256,6 +257,13 @@ wait_for_listener()
# for tcp protocol additionally check the socket state [ "${protocol}" = "tcp" ] && pattern="${pattern}0A" + + # 'grep -q' exits on match, sending SIGPIPE to 'awk', which exits with + # an error, causing the if-condition to fail when pipefail is set. + # Instead, temporarily disable pipefail and restore it later. + old_pipefail=$(set -o | awk '/^pipefail[[:space:]]+(on|off)$/{print $2}') + set +o pipefail + for i in $(seq "${max_intervals}"); do if awk '{print $2" "$4}' /proc/net/"${protocol}"* | \ grep -q "${pattern}"; then @@ -263,6 +271,10 @@ wait_for_listener() fi sleep "${interval}" done + + if [[ "${old_pipefail}" == on ]]; then + set -o pipefail + fi }
vm_wait_for_listener() { @@ -314,28 +326,112 @@ log_guest() { LOG_PREFIX=guest log "$@" }
+vm_vsock_test() { + local ns=$1 + local mode=$2 + local rc + + set -o pipefail + if [[ "${mode}" == client ]]; then + local host=$3 + local cid=$4 + local port=$5 + + # log output and use pipefail to respect vsock_test errors + vm_ssh "${ns}" -- "${VSOCK_TEST}" \ + --mode=client \ + --control-host="${host}" \ + --peer-cid="${cid}" \ + --control-port="${port}" \ + 2>&1 | log_guest + rc=$? + else + local cid=$3 + local port=$4 + + # log output and use pipefail to respect vsock_test errors + vm_ssh "${ns}" -- "${VSOCK_TEST}" \ + --mode=server \ + --peer-cid="${cid}" \ + --control-port="${port}" \ + 2>&1 | log_guest & + rc=$? + + if [[ $rc -ne 0 ]]; then + set +o pipefail + return $rc + fi + + vm_wait_for_listener "${ns}" "${port}" + rc=$? + fi + set +o pipefail + + return $rc }
+host_vsock_test() { + local ns=$1 + local mode=$2 + local cmd + + if [[ "${ns}" == none ]]; then + cmd="${VSOCK_TEST}" + else + cmd="ip netns exec ${ns} ${VSOCK_TEST}" + fi + + # log output and use pipefail to respect vsock_test errors + set -o pipefail + if [[ "${mode}" == client ]]; then + local host=$3 + local cid=$4 + local port=$5 + + ${cmd} \ + --mode="${mode}" \ + --peer-cid="${cid}" \ + --control-host="${host}" \ + --control-port="${port}" 2>&1 | log_host + rc=$? + else + local cid=$3 + local port=$4 + + ${cmd} \ + --mode="${mode}" \ + --peer-cid="${cid}" \ + --control-port="${port}" 2>&1 | log_host & + rc=$? + + if [[ $rc -ne 0 ]]; then + return $rc + fi + + host_wait_for_listener "${ns}" "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + rc=$? + fi + set +o pipefail
+ return $rc }
test_vm_server_host_client() { + vm_vsock_test "none" "server" 2 "${TEST_GUEST_PORT}" + host_vsock_test "none" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}" +}
- vm_ssh -- "${VSOCK_TEST}" \ - --mode=server \ - --control-port="${TEST_GUEST_PORT}" \ - --peer-cid=2 \ - 2>&1 | log_guest & +test_vm_client_host_server() { + host_vsock_test "none" "server" "${VSOCK_CID}" "${TEST_HOST_PORT_LISTENER}" + vm_vsock_test "none" "client" "10.0.2.2" 2 "${TEST_HOST_PORT_LISTENER}" +}
- vm_wait_for_listener "${TEST_GUEST_PORT}" +test_vm_loopback() { + vm_vsock_test "none" "server" 1 "${TEST_HOST_PORT_LISTENER}" + vm_vsock_test "none" "client" "127.0.0.1" 1 "${TEST_HOST_PORT_LISTENER}" +}
- ${VSOCK_TEST} \ - --mode=client \ - --control-host=127.0.0.1 \ - --peer-cid="${VSOCK_CID}" \ - --control-port="${TEST_HOST_PORT}" 2>&1 | log_host
- return $? }
test_vm_client_host_server() {
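The two pipefail concerns in this patch pull in opposite directions, and a standalone sketch may make that easier to see. `stream_has_match` and `run_logged` are hypothetical helpers written for illustration (they are not in vmtest.sh), and the sketch checks the option with bash's `[[ -o pipefail ]]` rather than parsing `set -o` output as the patch does.

```bash
#!/usr/bin/env bash

# 'grep -q' exits on the first match, so the producer may die from
# SIGPIPE. With pipefail set, that turns a successful match into a
# failure, so pipefail is disabled around the pipeline and restored.
stream_has_match() {
	local pattern=$1
	local restore=0
	local rc

	if [[ -o pipefail ]]; then
		restore=1
		set +o pipefail
	fi

	seq 1 100000 2>/dev/null | grep -q "${pattern}"
	rc=$?

	if [[ "${restore}" -eq 1 ]]; then
		set -o pipefail
	fi

	return "${rc}"
}

# When output is piped into a logger, the pipeline status is normally
# the logger's. Enabling pipefail lets the command's failure through.
run_logged() {
	local restore=0
	local rc

	if [[ ! -o pipefail ]]; then
		restore=1
		set -o pipefail
	fi

	"$@" 2>&1 | cat > /dev/null	# 'cat' stands in for log_host/log_guest
	rc=$?

	if [[ "${restore}" -eq 1 ]]; then
		set +o pipefail
	fi

	return "${rc}"
}
```

Both helpers restore the caller's previous pipefail state, so they can be mixed freely. Note that this only covers foreground pipelines: for a pipeline started with `&`, `$?` reports whether the job was launched, and the real exit status has to be collected later with `wait`.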
From: Bobby Eshleman bobbyeshleman@meta.com
Add tests for namespace support in vsock. Use socat for basic connection failure tests and vsock_test for full functionality tests when communication is expected to succeed. vsock_test is not used for failure cases because in theory vsock_test could allow connection and some traffic flow but fail on some other case (e.g., fail on MSG_ZEROCOPY).
Tests cover all cases of clients and servers being in all variants of local ns, global ns, host process, and VM process.
Legacy tests are retained and executed in the init ns.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com --- tools/testing/selftests/vsock/vmtest.sh | 909 ++++++++++++++++++++++++++++---- 1 file changed, 804 insertions(+), 105 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index 5e36d1068f6f..72cebeebf218 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -7,6 +7,7 @@ # * virtme-ng # * busybox-static (used by virtme-ng) # * qemu (used by virtme-ng) +# * socat
readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) @@ -23,7 +24,7 @@ readonly VSOCK_CID=1234 readonly WAIT_PERIOD=3 readonly WAIT_PERIOD_MAX=60 readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) -readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) +readonly WAIT_QEMU=5
# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a # control port forwarded for vsock_test. Because virtme-ng doesn't support @@ -33,23 +34,125 @@ readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) # add the kernel cmdline options that virtme-init uses to setup the interface. readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" -readonly QEMU_OPTS="\ - -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ - -device virtio-net-pci,netdev=n0 \ - -device vhost-vsock-pci,guest-cid=${VSOCK_CID} \ - --pidfile ${QEMU_PIDFILE} \ -" readonly KERNEL_CMDLINE="\ virtme.dhcp net.ifnames=0 biosdevname=0 \ virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ " readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) -readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_NAMES=( + vm_server_host_client + vm_client_host_server + vm_loopback + host_vsock_ns_mode_ok + host_vsock_ns_mode_write_once_ok + global_same_cid_fails + local_same_cid_ok + global_local_same_cid_ok + local_global_same_cid_ok + diff_ns_global_host_connect_to_global_vm_ok + diff_ns_global_host_connect_to_local_vm_fails + diff_ns_global_vm_connect_to_global_host_ok + diff_ns_global_vm_connect_to_local_host_fails + diff_ns_local_host_connect_to_local_vm_fails + diff_ns_local_vm_connect_to_local_host_fails + diff_ns_global_to_local_loopback_local_fails + diff_ns_local_to_global_loopback_fails + diff_ns_local_to_local_loopback_fails + diff_ns_global_to_global_loopback_ok + same_ns_local_loopback_ok + same_ns_local_host_connect_to_local_vm_ok + same_ns_local_vm_connect_to_local_host_ok +) + readonly TEST_DESCS=( + # vm_server_host_client "Run vsock_test in server mode on the VM and in client mode on the host." + + # vm_client_host_server "Run vsock_test in client mode on the VM and in server mode on the host." 
+ + # vm_loopback "Run vsock_test using the loopback transport in the VM." + + # host_vsock_ns_mode_ok + "Check /proc/net/vsock_ns_mode strings on the host." + + # host_vsock_ns_mode_write_once_ok + "Check /proc/net/vsock_ns_mode is write-once on the host." + + # global_same_cid_fails + "Check QEMU fails to start two VMs with same CID in two different global namespaces." + + # local_same_cid_ok + "Check QEMU successfully starts two VMs with same CID in two different local namespaces." + + # global_local_same_cid_ok + "Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID." + + # local_global_same_cid_ok + "Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID." + + # diff_ns_global_host_connect_to_global_vm_ok + "Run vsock_test client in global ns with server in VM in another global ns." + + # diff_ns_global_host_connect_to_local_vm_fails + "Run socat to test a process in a global ns fails to connect to a VM in a local ns." + + # diff_ns_global_vm_connect_to_global_host_ok + "Run vsock_test client in VM in a global ns with server in another global ns." + + # diff_ns_global_vm_connect_to_local_host_fails + "Run socat to test a VM in a global ns fails to connect to a host process in a local ns." + + # diff_ns_local_host_connect_to_local_vm_fails + "Run socat to test a host process in a local ns fails to connect to a VM in another local ns." + + # diff_ns_local_vm_connect_to_local_host_fails + "Run socat to test a VM in a local ns fails to connect to a host process in another local ns." + + # diff_ns_global_to_local_loopback_local_fails + "Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns." + + # diff_ns_local_to_global_loopback_fails + "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns." 
+ + # diff_ns_local_to_local_loopback_fails + "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns." + + # diff_ns_global_to_global_loopback_ok + "Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns." + + # same_ns_local_loopback_ok + "Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns." + + # same_ns_local_host_connect_to_local_vm_ok + "Run vsock_test client in a local ns with server in VM in same ns." + + # same_ns_local_vm_connect_to_local_host_ok + "Run vsock_test client in VM in a local ns with server in same ns." +) + +readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback) +readonly USE_INIT_NETNS=( + global_same_cid_fails + local_same_cid_ok + global_local_same_cid_ok + local_global_same_cid_ok + diff_ns_global_host_connect_to_global_vm_ok + diff_ns_global_host_connect_to_local_vm_fails + diff_ns_global_vm_connect_to_global_host_ok + diff_ns_global_vm_connect_to_local_host_fails + diff_ns_local_host_connect_to_local_vm_fails + diff_ns_local_vm_connect_to_local_host_fails + diff_ns_global_to_local_loopback_local_fails + diff_ns_local_to_global_loopback_fails + diff_ns_local_to_local_loopback_fails + diff_ns_global_to_global_loopback_ok + same_ns_local_loopback_ok + same_ns_local_host_connect_to_local_vm_ok + same_ns_local_vm_connect_to_local_host_ok ) +readonly MODES=("local" "global")
readonly LOG_LEVEL_DEBUG=0 readonly LOG_LEVEL_INFO=1 @@ -58,6 +161,12 @@ readonly LOG_LEVEL_ERROR=3
VERBOSE="${LOG_LEVEL_WARN}"
+# Test pass/fail counters +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0 + usage() { local name local desc @@ -77,7 +186,7 @@ usage() { for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do name=${TEST_NAMES[${i}]} desc=${TEST_DESCS[${i}]} - printf "\t%-35s%-35s\n" "${name}" "${desc}" + printf "\t%-55s%-35s\n" "${name}" "${desc}" done echo
@@ -89,21 +198,87 @@ die() { exit "${KSFT_FAIL}" }
+add_namespaces() { + # add namespaces local0, local1, global0, and global1 + for mode in "${MODES[@]}"; do + ip netns add "${mode}0" 2>/dev/null + ip netns add "${mode}1" 2>/dev/null + done +} + +init_namespaces() { + for mode in "${MODES[@]}"; do + ns_set_mode "${mode}0" "${mode}" + ns_set_mode "${mode}1" "${mode}" + + log_host "set ns ${mode}0 to mode ${mode}" + log_host "set ns ${mode}1 to mode ${mode}" + + # we need lo for qemu port forwarding + ip netns exec "${mode}0" ip link set dev lo up + ip netns exec "${mode}1" ip link set dev lo up + done +} + +del_namespaces() { + for mode in "${MODES[@]}"; do + ip netns del "${mode}0" + ip netns del "${mode}1" + log_host "removed ns ${mode}0" + log_host "removed ns ${mode}1" + done &>/dev/null +} + +ns_set_mode() { + local ns=$1 + local mode=$2 + + echo "${mode}" | ip netns exec "${ns}" \ + tee /proc/net/vsock_ns_mode &>/dev/null +} + vm_ssh() { - ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@" + local ns_exec + + if [[ "${1}" == none ]]; then + local ns_exec="" + else + local ns_exec="ip netns exec ${1}" + fi + + shift + + ${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost $* + return $? }
cleanup() { - if [[ -s "${QEMU_PIDFILE}" ]]; then - pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1 - fi + del_namespaces +}
- # If failure occurred during or before qemu start up, then we need - # to clean this up ourselves. - if [[ -e "${QEMU_PIDFILE}" ]]; then - rm "${QEMU_PIDFILE}" - fi +terminate_pidfiles() { + local pidfile + + for pidfile in "$@"; do + if [[ -s "${pidfile}" ]]; then + pkill -SIGTERM -F "${pidfile}" > /dev/null 2>&1 + fi + + # If failure occurred during or before qemu start up, then we need + # to clean this up ourselves. + if [[ -e "${pidfile}" ]]; then + rm -f "${pidfile}" + fi + done +} + +terminate_pids() { + local pid + + for pid in "$@"; do + kill -SIGTERM "${pid}" &>/dev/null || : + done }
check_args() { @@ -133,7 +308,7 @@ check_args() { }
check_deps() { - for dep in vng ${QEMU} busybox pkill ssh; do + for dep in vng ${QEMU} busybox pkill ssh socat; do if [[ ! -x $(command -v "${dep}") ]]; then echo -e "skip: dependency ${dep} not found!\n" exit "${KSFT_SKIP}" @@ -170,6 +345,20 @@ check_vng() { fi }
+check_socat() { + local support_string + + support_string="$(socat -V)" + + if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then + die "err: socat is missing vsock support" + fi + + if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then + die "err: socat is missing unix support" + fi +} + handle_build() { if [[ ! "${BUILD}" -eq 1 ]]; then return @@ -194,9 +383,14 @@ handle_build() { }
vm_start() { + local cid=$1 + local ns=$2 + local pidfile=$3 local logfile=/dev/null local verbose_opt="" + local qemu_opts="" local kernel_opt="" + local ns_exec="" local qemu
qemu=$(command -v "${QEMU}") @@ -206,27 +400,37 @@ vm_start() { logfile=/dev/stdout fi
+ qemu_opts="\ + -netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \ + -device virtio-net-pci,netdev=n0 \ + ${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${cid} \ + --pidfile ${pidfile} + " + if [[ "${BUILD}" -eq 1 ]]; then kernel_opt="${KERNEL_CHECKOUT}" fi
- vng \ + if [[ "${ns}" != "none" ]]; then + ns_exec="ip netns exec ${ns}" + fi + + ${ns_exec} vng \ --run \ ${kernel_opt} \ ${verbose_opt} \ - --qemu-opts="${QEMU_OPTS}" \ + --qemu-opts="${qemu_opts}" \ --qemu="${qemu}" \ --user root \ --append "${KERNEL_CMDLINE}" \ --rw &> ${logfile} &
- if ! timeout ${WAIT_TOTAL} \ - bash -c 'while [[ ! -s '"${QEMU_PIDFILE}"' ]]; do sleep 1; done; exit 0'; then - die "failed to boot VM" - fi + timeout "${WAIT_QEMU}" \ + bash -c 'while [[ ! -s '"${pidfile}"' ]]; do sleep 1; done; exit 0' }
vm_wait_for_ssh() { + local ns=$1 local i
i=0 @@ -234,7 +438,8 @@ vm_wait_for_ssh() { if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then die "Timed out waiting for guest ssh" fi - if vm_ssh -- true; then + + if vm_ssh "${ns}" -- true; then break fi i=$(( i + 1 )) @@ -269,6 +474,7 @@ wait_for_listener() grep -q "${pattern}"; then break fi + sleep "${interval}" done
@@ -278,17 +484,29 @@ wait_for_listener() }
vm_wait_for_listener() { - local port=$1 + local ns=$1 + local port=$2 + + log "Waiting for listener on port ${port} on vm"
- vm_ssh <<EOF + vm_ssh "${ns}" <<EOF $(declare -f wait_for_listener) wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} EOF }
host_wait_for_listener() { - wait_for_listener "${TEST_HOST_PORT_LISTENER}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + local ns=$1 + local port=$2
+ if [[ "${ns}" == none ]]; then + wait_for_listener "${port}" "${WAIT_PERIOD}" "${WAIT_PERIOD_MAX}" + else + ip netns exec "${ns}" bash <<-EOF + $(declare -f wait_for_listener) + wait_for_listener ${port} ${WAIT_PERIOD} ${WAIT_PERIOD_MAX} + EOF + fi }
log() { @@ -431,47 +649,499 @@ test_vm_loopback() { vm_vsock_test "none" "client" "127.0.0.1" 1 "${TEST_HOST_PORT_LISTENER}" }
+test_host_vsock_ns_mode_ok() { + add_namespaces + + for mode in "${MODES[@]}"; do + if ! ns_set_mode "${mode}0" "${mode}"; then + del_namespaces + return "${KSFT_FAIL}" + fi + done
+ del_namespaces }
-test_vm_client_host_server() { +test_host_vsock_ns_mode_write_once_ok() { + add_namespaces
- ${VSOCK_TEST} \ - --mode "server" \ - --control-port "${TEST_HOST_PORT_LISTENER}" \ - --peer-cid "${VSOCK_CID}" 2>&1 | log_host & + for mode in "${MODES[@]}"; do + local ns="${mode}0" + if ! ns_set_mode "${ns}" "${mode}"; then + del_namespaces + return "${KSFT_FAIL}" + fi
- host_wait_for_listener + # try writing again and expect failure + if ns_set_mode "${ns}" "${mode}"; then + del_namespaces + return "${KSFT_FAIL}" + fi + done
- vm_ssh -- "${VSOCK_TEST}" \ - --mode=client \ - --control-host=10.0.2.2 \ - --peer-cid=2 \ - --control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest + del_namespaces
- return $? + return "${KSFT_PASS}" }
-test_vm_loopback() { - local port=60000 # non-forwarded local port +namespaces_can_boot_same_cid() { + local ns0=$1 + local ns1=$2 + local pidfile1 pidfile2 + local cid=20 + readonly cid + local rc
- vm_ssh -- "${VSOCK_TEST}" \ - --mode=server \ - --control-port="${port}" \ - --peer-cid=1 2>&1 | log_guest & + pidfile1=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + vm_start "${cid}" "${ns0}" "${pidfile1}"
- vm_wait_for_listener "${port}" + pidfile2=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + vm_start "${cid}" "${ns1}" "${pidfile2}"
- vm_ssh -- "${VSOCK_TEST}" \ - --mode=client \ - --control-host="127.0.0.1" \ - --control-port="${port}" \ - --peer-cid=1 2>&1 | log_guest + rc=$? + terminate_pidfiles "${pidfile1}" "${pidfile2}"
- return $? + return $rc +} + +test_global_same_cid_fails() { + if namespaces_can_boot_same_cid "global0" "global1"; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" +} + +test_local_global_same_cid_ok() { + if namespaces_can_boot_same_cid "local0" "global0"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_global_local_same_cid_ok() { + if namespaces_can_boot_same_cid "global0" "local0"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_local_same_cid_ok() { + if namespaces_can_boot_same_cid "local0" "local0"; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" +} + +test_diff_ns_global_host_connect_to_global_vm_ok() { + local pids pid pidfile + local ns0 ns1 port + declare -a pids + local unixfile + ns0="global0" + ns1="global1" + port=1234 + local rc + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + + if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then + return "${KSFT_FAIL}" + fi + + unixfile=$(mktemp -u /tmp/XXXX.sock) + ip netns exec "${ns1}" \ + socat TCP-LISTEN:"${TEST_HOST_PORT}",fork \ + UNIX-CONNECT:"${unixfile}" & + pids+=($!) + host_wait_for_listener "${ns1}" "${TEST_HOST_PORT}" + + ip netns exec "${ns0}" socat UNIX-LISTEN:"${unixfile}",fork \ + TCP-CONNECT:localhost:"${TEST_HOST_PORT}" & + pids+=($!) + + vm_vsock_test "${ns0}" "server" 2 "${TEST_GUEST_PORT}" + vm_wait_for_listener "${ns0}" "${TEST_GUEST_PORT}" + host_vsock_test "${ns1}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}" + rc=$? + + for pid in "${pids[@]}"; do + if [[ "$(jobs -p)" = *"${pid}"* ]]; then + kill -SIGTERM "${pid}" &>/dev/null + fi + done + + terminate_pidfiles "${pidfile}" + + if [[ $rc -ne 0 ]]; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" }
-run_test() { +test_diff_ns_global_host_connect_to_local_vm_fails() { + local ns0="global0" + local ns1="local0" + local port=12345 + local pidfile + local result + local pid + + outfile=$(mktemp) + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + if ! vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then + log_host "failed to start vm (cid=${VSOCK_CID}, ns=${ns0})" + return $KSFT_FAIL + fi + + vm_wait_for_ssh "${ns1}" + vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" & + echo TEST | ip netns exec "${ns0}" \ + socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null + + terminate_pidfiles "${pidfile}" + + result=$(cat "${outfile}") + rm -f "${outfile}" + + if [[ "${result}" != TEST ]]; then + return $KSFT_PASS + fi + + return $KSFT_FAIL +} + +test_diff_ns_global_vm_connect_to_global_host_ok() { + local ns0="global0" + local ns1="global1" + local port=12345 + local unixfile + local pidfile + local pids + + declare -a pids + + log_host "Setup socat bridge from ns ${ns0} to ns ${ns1} over port ${port}" + + unixfile=$(mktemp -u /tmp/XXXX.sock) + + ip netns exec "${ns0}" \ + socat TCP-LISTEN:"${port}" UNIX-CONNECT:"${unixfile}" & + pids+=($!) + + ip netns exec "${ns1}" \ + socat UNIX-LISTEN:"${unixfile}" TCP-CONNECT:127.0.0.1:"${port}" & + pids+=($!) + + log_host "Launching ${VSOCK_TEST} in ns ${ns1}" + host_vsock_test "${ns1}" "server" "${VSOCK_CID}" "${port}" + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then + log_host "failed to start vm (cid=${cid}, ns=${ns0})" + terminate_pids "${pids[@]}" + rm -f "${unixfile}" + return $KSFT_FAIL + fi + + vm_wait_for_ssh "${ns0}" + vm_vsock_test "${ns0}" "client" "10.0.2.2" 2 "${port}" + rc=$? + + terminate_pidfiles "${pidfile}" + terminate_pids "${pids[@]}" + rm -f "${unixfile}" + + if [[ ! 
$rc -eq 0 ]]; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" + +} + +test_diff_ns_global_vm_connect_to_local_host_fails() { + local ns0="global0" + local ns1="local0" + local port=12345 + local pidfile + local result + local pid + + log_host "Launching socat in ns ${ns1}" + outfile=$(mktemp) + ip netns exec "${ns1}" socat VSOCK-LISTEN:${port} STDOUT &> "${outfile}" & + pid=$! + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then + log_host "failed to start vm (cid=${cid}, ns=${ns0})" + terminate_pids "${pid}" + rm -f "${outfile}" + return $KSFT_FAIL + fi + + vm_wait_for_ssh "${ns0}" + + vm_ssh "${ns0}" -- \ + bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest + + terminate_pidfiles "${pidfile}" + terminate_pids "${pid}" + + result=$(cat "${outfile}") + rm -f "${outfile}" + + if [[ "${result}" != TEST ]]; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_diff_ns_local_host_connect_to_local_vm_fails() { + local ns0="local0" + local ns1="local1" + local port=12345 + local pidfile + local result + local pid + + outfile=$(mktemp) + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + if ! 
vm_start "${VSOCK_CID}" "${ns1}" "${pidfile}"; then + log_host "failed to start vm (cid=${cid}, ns=${ns0})" + return $KSFT_FAIL + fi + + vm_wait_for_ssh "${ns1}" + vm_ssh "${ns1}" -- socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" & + echo TEST | ip netns exec "${ns0}" \ + socat STDIN VSOCK-CONNECT:"${VSOCK_CID}":"${port}" 2>/dev/null + + terminate_pidfiles "${pidfile}" + + result=$(cat "${outfile}") + rm -f "${outfile}" + + if [[ "${result}" != TEST ]]; then + return $KSFT_PASS + fi + + return $KSFT_FAIL +} + +test_diff_ns_local_vm_connect_to_local_host_fails() { + local ns0="local0" + local ns1="local1" + local port=12345 + local pidfile + local result + local pid + + log_host "Launching socat in ns ${ns1}" + outfile=$(mktemp) + ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT &> "${outfile}" & + pid=$! + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + if ! vm_start "${VSOCK_CID}" "${ns0}" "${pidfile}"; then + log_host "failed to start vm (cid=${cid}, ns=${ns0})" + rm -f "${outfile}" + return "${KSFT_FAIL}" + fi + + vm_wait_for_ssh "${ns0}" + + vm_ssh "${ns0}" -- \ + bash -c "echo TEST | socat STDIN VSOCK-CONNECT:2:${port}" 2>&1 | log_guest + + terminate_pidfiles "${pidfile}" + terminate_pids "${pid}" + + result=$(cat "${outfile}") + rm -f "${outfile}" + + if [[ "${result}" != TEST ]]; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +__test_loopback_two_netns() { + local ns0=$1 + local ns1=$2 + local port=12345 + local result + local pid + + log_host "Launching socat in ns ${ns1}" + outfile=$(mktemp) + ip netns exec "${ns1}" socat VSOCK-LISTEN:"${port}" STDOUT > "${outfile}" 2>/dev/null & + pid=$! 
+ + log_host "Launching socat in ns ${ns0}" + echo TEST | ip netns exec "${ns0}" socat STDIN VSOCK-CONNECT:1:"${port}" 2>/dev/null + terminate_pids "${pid}" + + result=$(cat "${outfile}") + rm -f "${outfile}" + + if [[ "${result}" == TEST ]]; then + return 0 + fi + + return 1 +} + +test_diff_ns_global_to_local_loopback_local_fails() { + if ! __test_loopback_two_netns "global0" "local0"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_diff_ns_local_to_global_loopback_fails() { + if ! __test_loopback_two_netns "local0" "global0"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_diff_ns_local_to_local_loopback_fails() { + if ! __test_loopback_two_netns "local0" "local1"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_diff_ns_global_to_global_loopback_ok() { + if __test_loopback_two_netns "global0" "global1"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_same_ns_local_loopback_ok() { + if __test_loopback_two_netns "local0" "local0"; then + return "${KSFT_PASS}" + fi + + return "${KSFT_FAIL}" +} + +test_same_ns_local_host_connect_to_local_vm_ok() { + local ns="local0" + local port=1234 + local pidfile + local rc + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + + if ! vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then + return "${KSFT_FAIL}" + fi + + vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}" + host_vsock_test "${ns}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}" + rc=$? + + terminate_pidfiles "${pidfile}" + + if [[ $rc -ne 0 ]]; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" +} + +test_same_ns_local_vm_connect_to_local_host_ok() { + local ns="local0" + local port=1234 + local pidfile + local rc + + pidfile=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) + + if ! 
vm_start "${VSOCK_CID}" "${ns}" "${pidfile}"; then + return "${KSFT_FAIL}" + fi + + vm_vsock_test "${ns}" "server" 2 "${TEST_GUEST_PORT}" + host_vsock_test "${ns}" "client" "127.0.0.1" "${VSOCK_CID}" "${TEST_HOST_PORT}" + rc=$? + + terminate_pidfiles "${pidfile}" + + if [[ $rc -ne 0 ]]; then + return "${KSFT_FAIL}" + fi + + return "${KSFT_PASS}" +} + +shared_vm_test() { + local tname + + tname="${1}" + + for testname in "${USE_SHARED_VM[@]}"; do + if [[ "${tname}" == "${testname}" ]]; then + return 0 + fi + done + + return 1 +} + + +init_netns_test() { + local tname + + tname="${1}" + + for testname in "${USE_INIT_NETNS[@]}"; do + if [[ "${tname}" == "${testname}" ]]; then + return 0 + fi + done + + return 1 +} + +check_result() { + local rc num + + rc=$1 + num=$(( cnt_total + 1 )) + + if [[ ${rc} -eq $KSFT_PASS ]]; then + cnt_pass=$(( cnt_pass + 1 )) + echo "ok ${num} ${arg}" + elif [[ ${rc} -eq $KSFT_SKIP ]]; then + cnt_skip=$(( cnt_skip + 1 )) + echo "ok ${num} ${arg} # SKIP" + elif [[ ${rc} -eq $KSFT_FAIL ]]; then + cnt_fail=$(( cnt_fail + 1 )) + echo "not ok ${num} ${arg} # exit=$rc" + fi + + cnt_total=$(( cnt_total + 1 )) +} + +run_shared_vm_tests() { + local start_shared_vm pidfile local host_oops_cnt_before local host_warn_cnt_before local vm_oops_cnt_before @@ -483,40 +1153,90 @@ run_test() { local name local rc
-	host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
-	host_warn_cnt_before=$(dmesg --level=warn | wc -l)
-	vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
-	vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+	start_shared_vm=0

-	name=$(echo "${1}" | awk '{ print $1 }')
-	eval test_"${name}"
-	rc=$?
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			start_shared_vm=1
+			break
+		fi
+	done

-	host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
-	if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
-		echo "FAIL: kernel oops detected on host" | log_host "${name}"
-		rc=$KSFT_FAIL
+	pidfile=""
+	if [[ "${start_shared_vm}" == 1 ]]; then
+		pidfile=$(mktemp $PIDFILE_TEMPLATE)
+		log_host "Booting up VM"
+		vm_start "${VSOCK_CID}" "none" "${pidfile}"
+		vm_wait_for_ssh "none"
+		log_host "VM booted up"
 	fi

-	host_warn_cnt_after=$(dmesg --level=warn | wc -l)
-	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
-		echo "FAIL: kernel warning detected on host" | log_host "${name}"
-		rc=$KSFT_FAIL
-	fi
+	for arg in "${ARGS[@]}"; do
+		if ! shared_vm_test "${arg}"; then
+			continue
+		fi

-	vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
-	if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
-		echo "FAIL: kernel oops detected on vm" | log_host "${name}"
-		rc=$KSFT_FAIL
-	fi
+		host_oops_cnt_before=$(dmesg | grep -c -i 'Oops')
+		host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+		vm_oops_cnt_before=$(vm_ssh none -- dmesg | grep -c -i 'Oops')
+		vm_warn_cnt_before=$(vm_ssh none -- dmesg --level=warn | wc -l)
+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		rc=$?
+
+		host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+		if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+			echo "FAIL: kernel oops detected on host" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+		if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+			echo "FAIL: kernel warning detected on host" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi

-	vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
-	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
-		echo "FAIL: kernel warning detected on vm" | log_host "${name}"
-		rc=$KSFT_FAIL
+		vm_oops_cnt_after=$(vm_ssh none -- dmesg | grep -i 'Oops' | wc -l)
+		if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+			echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		vm_warn_cnt_after=$(vm_ssh none -- dmesg --level=warn | wc -l)
+		if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+			echo "FAIL: kernel warning detected on vm" | log_host "${name}"
+			rc=$KSFT_FAIL
+		fi
+
+		check_result "${rc}"
+	done
+
+	if [[ -n "${pidfile}" ]]; then
+		log_host "VM terminate"
+		terminate_pidfiles "${pidfile}"
 	fi
+}
+
+run_isolated_vm_tests() {
+	for arg in "${ARGS[@]}"; do
+		if shared_vm_test "${arg}"; then
+			continue
+		fi

-	return "${rc}"
+		add_namespaces
+		if init_netns_test "${arg}"; then
+			init_namespaces
+		fi
+
+		name=$(echo "${arg}" | awk '{ print $1 }')
+		log_host "Executing test_${name}"
+		eval test_"${name}"
+		check_result $?
+
+		del_namespaces
+	done
 }

 QEMU="qemu-system-$(uname -m)"
@@ -543,34 +1263,13 @@ fi
 check_args "${ARGS[@]}"
 check_deps
 check_vng
+check_socat
 handle_build

 echo "1..${#ARGS[@]}"

-log_host "Booting up VM"
-vm_start
-vm_wait_for_ssh
-log_host "VM booted up"
-
-cnt_pass=0
-cnt_fail=0
-cnt_skip=0
-cnt_total=0
-for arg in "${ARGS[@]}"; do
-	run_test "${arg}"
-	rc=$?
-	if [[ ${rc} -eq $KSFT_PASS ]]; then
-		cnt_pass=$(( cnt_pass + 1 ))
-		echo "ok ${cnt_total} ${arg}"
-	elif [[ ${rc} -eq $KSFT_SKIP ]]; then
-		cnt_skip=$(( cnt_skip + 1 ))
-		echo "ok ${cnt_total} ${arg} # SKIP"
-	elif [[ ${rc} -eq $KSFT_FAIL ]]; then
-		cnt_fail=$(( cnt_fail + 1 ))
-		echo "not ok ${cnt_total} ${arg} # exit=$rc"
-	fi
-	cnt_total=$(( cnt_total + 1 ))
-done
+run_shared_vm_tests
+run_isolated_vm_tests

 echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
 echo "Log: ${LOG}"
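For readers skimming the patch, the loopback tests above (test_diff_ns_*_loopback_* and test_same_ns_local_loopback_ok) appear to encode a small reachability matrix: traffic within one namespace works, and traffic across namespaces works only when both sides are in global mode. The following is a minimal sketch of that matrix as a standalone bash helper. The function name vsock_ns_reachable and its argument order are hypothetical and not part of this series, and the same-namespace rule for global mode is an assumption based on the cover letter describing global mode as the original behavior.

```shell
#!/usr/bin/env bash
# Hypothetical model of the reachability rules the loopback tests check.
# This is an illustration only; it does not touch the kernel or any netns.
#
# usage: vsock_ns_reachable <src_mode> <src_ns> <dst_mode> <dst_ns>
# Returns 0 if a connection is expected to succeed, 1 otherwise.
vsock_ns_reachable() {
	local src_mode=$1 src_ns=$2 dst_mode=$3 dst_ns=$4

	# Same namespace: expected to work (assumed for both modes).
	if [[ "${src_ns}" == "${dst_ns}" ]]; then
		return 0
	fi

	# Different namespaces: only global <-> global shares CIDs.
	if [[ "${src_mode}" == "global" && "${dst_mode}" == "global" ]]; then
		return 0
	fi

	# Any cross-namespace pair involving a local-mode namespace is isolated.
	return 1
}

# Print the expectation for the namespace pairs used by the loopback tests.
for pair in "global global0 local local0" \
	    "local local0 global global0" \
	    "local local0 local local1" \
	    "global global0 global global1" \
	    "local local0 local local0"; do
	# Word splitting of ${pair} into four arguments is intentional here.
	# shellcheck disable=SC2086
	if vsock_ns_reachable ${pair}; then
		echo "${pair}: expected ok"
	else
		echo "${pair}: expected fail"
	fi
done
```

A table like this can be handy when adding new test permutations, since each test name in the patch maps to one row of the loop.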
On Tue, Aug 05, 2025 at 02:49:08PM -0700, Bobby Eshleman wrote:
...
Thanks again for everyone's help and reviews!
Changes in v4:
- removed RFC tag
My bad, I didn't notice I still had the rfc tag before sending out with b4.
This is ready for review and not really an RFC. All test cases passing, etc...
-Bobby
On Tue, Aug 05, 2025 at 03:03:37PM -0700, Bobby Eshleman wrote:
On Tue, Aug 05, 2025 at 02:49:08PM -0700, Bobby Eshleman wrote:
...
Thanks again for everyone's help and reviews!
Changes in v4:
- removed RFC tag
My bad, I didn't notice I still had the rfc tag before sending out with b4.
This is ready for review and not really an RFC. All test cases passing, etc...
Ack. But net-next is currently closed for the merge-window.
So please don't post non-RFC patches for it until it reopens, around the 11th August.
On Wed, Aug 06, 2025 at 08:13:48PM +0100, Simon Horman wrote:
On Tue, Aug 05, 2025 at 03:03:37PM -0700, Bobby Eshleman wrote:
On Tue, Aug 05, 2025 at 02:49:08PM -0700, Bobby Eshleman wrote:
...
Ack. But net-next is currently closed for the merge-window.
So please don't post non-RFC patches for it until it reopens, around the 11th August.
Got it, thanks!
Hi Bobby,
On Tue, Aug 05, 2025 at 02:49:08PM -0700, Bobby Eshleman wrote:
...
Thanks again for everyone's help and reviews!
Thanks for your work!
As I mentioned to you, I'll be off for the next 2 weeks, so I'll take a look when I'm back, but feel free to send new versions if you receive enough comments on this.
Thanks, Stefano
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
Bobby Eshleman (12): vsock: a per-net vsock NS mode state vsock: add net to vsock skb cb vsock: add netns to af_vsock core vsock/virtio: add netns to virtio transport common vhost/vsock: add netns support vsock/virtio: use the global netns hv_sock: add netns hooks vsock/vmci: add netns hooks vsock/loopback: add netns support selftests/vsock: improve logging in vmtest.sh selftests/vsock: invoke vsock_test through helpers selftests/vsock: add namespace tests
MAINTAINERS | 1 + drivers/vhost/vsock.c | 48 +- include/linux/virtio_vsock.h | 12 + include/net/af_vsock.h | 59 +- include/net/net_namespace.h | 4 + include/net/netns/vsock.h | 21 + net/vmw_vsock/af_vsock.c | 204 +++++- net/vmw_vsock/hyperv_transport.c | 2 +- net/vmw_vsock/virtio_transport.c | 5 +- net/vmw_vsock/virtio_transport_common.c | 14 +- net/vmw_vsock/vmci_transport.c | 4 +- net/vmw_vsock/vsock_loopback.c | 59 +- tools/testing/selftests/vsock/vmtest.sh | 1088 ++++++++++++++++++++++++++----- 13 files changed, 1330 insertions(+), 191 deletions(-)
base-commit: dd500e4aecf25e48e874ca7628697969df679493 change-id: 20250325-vsock-vmtest-b3a21d2102c2
Best regards,
Bobby Eshleman bobbyeshleman@meta.com
On Thu, Aug 07, 2025 at 10:06:35AM +0200, Stefano Garzarella wrote:
Hi Bobby,
...
Thanks for your work!
As I mentioned to you, I'll be off for the next 2 weeks, so I'll take a look when I'm back, but feel free to send new versions if you receive enough comments on this.
Thanks, Stefano
Thanks Stefano, enjoy your time off!
Best, Bobby