On Thu, Nov 06, 2025 at 05:21:35PM +0100, Stefano Garzarella wrote:
On Thu, Oct 23, 2025 at 11:27:46AM -0700, Bobby Eshleman wrote:
From: Bobby Eshleman <bobbyeshleman@meta.com>
Add the ability to isolate vhost-vsock flows using namespaces.
The VM, via the vhost_vsock struct, inherits its namespace from the process that opens the vhost-vsock device. vhost_vsock lookup functions are modified to take into account the mode (e.g., if CIDs are matching but modes don't align, then return NULL).
vhost_vsock now acquires a reference to the namespace.
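
For readers following the lookup change, here is a rough userspace sketch of a CID lookup that also checks namespace and mode. It is not the kernel code. Every name with a demo_ prefix is hypothetical, and the two mode values and the matching rule (same namespace always matches, different namespaces match only if both sides are in the global mode) are assumptions made for illustration, not something this patch description states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode values, assumed for illustration only. */
enum demo_net_mode { DEMO_MODE_GLOBAL, DEMO_MODE_LOCAL };

/* Stand-in for a network namespace; identity is the pointer value. */
struct demo_net {
	int id;
};

/* Stand-in for the per-VM state that is looked up by CID. */
struct demo_vsock {
	uint32_t guest_cid;
	const struct demo_net *net;
	enum demo_net_mode net_mode;
};

/*
 * Assumed rule: the same namespace always matches; different
 * namespaces match only when both sides are in the global mode.
 */
static bool demo_check_mode(const struct demo_net *n1, enum demo_net_mode m1,
			    const struct demo_net *n2, enum demo_net_mode m2)
{
	if (n1 == n2)
		return true;
	return m1 == DEMO_MODE_GLOBAL && m2 == DEMO_MODE_GLOBAL;
}

/*
 * Return the entry whose CID matches and whose namespace/mode is
 * compatible with the caller, or NULL. A CID match with an
 * incompatible mode is skipped, so the caller sees NULL unless
 * another compatible entry with the same CID exists.
 */
static const struct demo_vsock *
demo_vsock_get(const struct demo_vsock *tbl, size_t n, uint32_t cid,
	       const struct demo_net *net, enum demo_net_mode mode)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].guest_cid != cid)
			continue;
		if (demo_check_mode(tbl[i].net, tbl[i].net_mode, net, mode))
			return &tbl[i];
	}
	return NULL;
}
```

The point of the sketch is only that a matching CID is no longer sufficient: the caller's namespace and mode take part in the decision.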
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
Changes in v7:
- remove the check_global flag of vhost_vsock_get(), that logic was both
wrong and not necessary, reuse vsock_net_check_mode() instead
- remove 'delete me' comment
Changes in v5:
- respect pid namespaces when assigning namespace to vhost_vsock
 drivers/vhost/vsock.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)
[...]
static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 {
 	struct vhost_virtqueue **vqs;
 	struct vhost_vsock *vsock;
+	struct net *net;
 	int ret;
/* This struct is large and allocation could fail, fall back to vmalloc
@@ -669,6 +684,14 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file)
 		goto out;
 	}
+	net = current->nsproxy->net_ns;
+	vsock->net = get_net_track(net, &vsock->ns_tracker, GFP_KERNEL);
+	/* Cache the mode of the namespace so that if that netns mode changes,
+	 * the vhost_vsock will continue to function as expected.
+	 */

I think we should document this in the commit description as well, and in both places we should also add the reason. (IIRC, it was to simplify everything and to prevent a VM from changing modes while running, which would then require tracking all its packets.)
Sounds good!
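
A tiny userspace sketch of the caching behaviour under discussion, in case it helps when wording the comment. It is not the driver code: the snap_ names and the two mode values are hypothetical, and the only thing it models is that the mode is copied once at open time and later changes to the namespace do not reach the already-open device.

```c
/* Hypothetical mode values; the thread does not name them. */
enum snap_mode { SNAP_MODE_A, SNAP_MODE_B };

/* Stand-in for a namespace whose mode can be changed at runtime. */
struct snap_net {
	enum snap_mode mode;
};

/* Stand-in for the per-VM device state. */
struct snap_dev {
	const struct snap_net *net;
	enum snap_mode net_mode;	/* snapshot taken at open time */
};

/* Record the namespace and copy its current mode exactly once. */
static void snap_dev_open(struct snap_dev *dev, const struct snap_net *net)
{
	dev->net = net;
	dev->net_mode = net->mode;
}

/* All later decisions use the snapshot, never net->mode. */
static enum snap_mode snap_dev_mode(const struct snap_dev *dev)
{
	return dev->net_mode;
}
```

With this shape, a mode change on the namespace after open only affects devices opened afterwards, which is the property the comment and commit description would need to spell out.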
+	vsock->net_mode = vsock_net_mode(net);
 	vsock->guest_cid = 0; /* no CID assigned yet */
 	vsock->seqpacket_allow = false;
@@ -708,7 +731,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */
 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(vsk->remote_addr.svm_cid, sock_net(sk), vsk->net_mode))
 		return;
/* If the close timeout is pending, let it expire. This avoids races
@@ -753,6 +776,7 @@ static int vhost_vsock_dev_release(struct inode *inode, struct file *file)
 	virtio_vsock_skb_queue_purge(&vsock->send_pkt_queue);
 	vhost_dev_cleanup(&vsock->dev);
+	put_net_track(vsock->net, &vsock->ns_tracker);
Doing this after virtio_vsock_skb_queue_purge() should ensure that all skbs have been drained, so no skb should still be in flight with this netns. Perhaps this clarifies my doubts about the skb net, but should we do something similar for loopback as well?
100% - for loopback the skb purge is done in the net exit hook, which is called just before netns destruction. Maybe it is worth commenting that context there too.
And maybe we should document that also in the virtio_vsock_skb_cb.
sgtm!
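
As a possible starting point for that documentation, here is a small userspace model of the ordering argument. It is only a sketch: the life_ names are hypothetical, and it assumes that queued packets carry a borrowed (uncounted) namespace pointer while the device holds the single counted reference. Under that assumption, draining the queue before dropping the reference guarantees no packet still points at the namespace once the count is released.

```c
#include <stddef.h>

/* Stand-in for a namespace with a plain reference count. */
struct life_net {
	int refs;
};

/* Stand-in for a queued packet; net is borrowed, not counted. */
struct life_pkt {
	struct life_pkt *next;
	struct life_net *net;
};

/* Stand-in for the device; it owns one counted reference. */
struct life_dev {
	struct life_net *net;
	struct life_pkt *queue;
};

static void life_open(struct life_dev *dev, struct life_net *net)
{
	net->refs++;
	dev->net = net;
	dev->queue = NULL;
}

/* Queue a packet that borrows the device's namespace pointer. */
static void life_queue(struct life_dev *dev, struct life_pkt *pkt)
{
	pkt->net = dev->net;
	pkt->next = dev->queue;
	dev->queue = pkt;
}

/* Drain every queued packet and clear its borrowed pointer. */
static size_t life_purge(struct life_dev *dev)
{
	size_t n = 0;

	while (dev->queue) {
		struct life_pkt *pkt = dev->queue;

		dev->queue = pkt->next;
		pkt->next = NULL;
		pkt->net = NULL;
		n++;
	}
	return n;
}

/* Purge first, then drop the reference: the order matters. */
static void life_release(struct life_dev *dev)
{
	life_purge(dev);
	dev->net->refs--;
	dev->net = NULL;
}
```

Swapping the two steps in life_release() would leave a window in which queued packets still point at a namespace whose last reference may already be gone, which is the case the comment should warn about.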
Best,
Bobby