From: Haiyang Zhang <haiyangz@microsoft.com>
The existing code moves the VF NIC to the new namespace when NETDEV_REGISTER is received on the netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When the netvsc NIC is moved back and registered to the default namespace, it automatically brings the VF NIC back to the default namespace. This causes the default_device_exit_net() >> for_each_netdev_safe loop to be unable to detect the list end, and hit a NULL ptr:
[ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 231.450246] #PF: supervisor read access in kernel mode
[ 231.450579] #PF: error_code(0x0000) - not-present page
[ 231.450916] PGD 17b8a8067 P4D 0
[ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[ 231.452692] Workqueue: netns cleanup_net
[ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[ 231.458434] Call Trace:
[ 231.458600] <TASK>
[ 231.458777] ops_undo_list+0x100/0x220
[ 231.459015] cleanup_net+0x1b8/0x300
[ 231.459285] process_one_work+0x184/0x340
To fix it, move the VF namespace switching code from the NETDEV_REGISTER event handler to netvsc_open().
Cc: stable@vger.kernel.org
Cc: cavery@redhat.com
Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
v2: verified it's applicable to net, fixed cc list.
---
 drivers/net/hyperv/netvsc_drv.c | 43 ++++++++++-----------------------
 1 file changed, 13 insertions(+), 30 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 42d98e99566e..074ecc346108 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -135,6 +135,19 @@ static int netvsc_open(struct net_device *net)
 	}
 
 	if (vf_netdev) {
+		if (!net_eq(dev_net(net), dev_net(vf_netdev))) {
+			ret = dev_change_net_namespace(vf_netdev, dev_net(net),
+						       "eth%d");
+			if (ret)
+				netdev_err(vf_netdev,
+					   "Cannot move to same ns as %s: %d\n",
+					   net->name, ret);
+			else
+				netdev_info(vf_netdev,
+					    "Moved VF to namespace with: %s\n",
+					    net->name);
+		}
+
 		/* Setting synthetic device up transparently sets
 		 * slave as up. If open fails, then slave will be
 		 * still be offline (and not used).
@@ -2772,31 +2785,6 @@ static struct hv_driver netvsc_drv = {
 	},
 };
 
-/* Set VF's namespace same as the synthetic NIC */
-static void netvsc_event_set_vf_ns(struct net_device *ndev)
-{
-	struct net_device_context *ndev_ctx = netdev_priv(ndev);
-	struct net_device *vf_netdev;
-	int ret;
-
-	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
-	if (!vf_netdev)
-		return;
-
-	if (!net_eq(dev_net(ndev), dev_net(vf_netdev))) {
-		ret = dev_change_net_namespace(vf_netdev, dev_net(ndev),
-					       "eth%d");
-		if (ret)
-			netdev_err(vf_netdev,
-				   "Cannot move to same namespace as %s: %d\n",
-				   ndev->name, ret);
-		else
-			netdev_info(vf_netdev,
-				    "Moved VF to namespace with: %s\n",
-				    ndev->name);
-	}
-}
-
 /*
  * On Hyper-V, every VF interface is matched with a corresponding
  * synthetic interface. The synthetic interface is presented first
@@ -2809,11 +2797,6 @@ static int netvsc_netdev_event(struct notifier_block *this,
 	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
 	int ret = 0;
 
-	if (event_dev->netdev_ops == &device_ops && event == NETDEV_REGISTER) {
-		netvsc_event_set_vf_ns(event_dev);
-		return NOTIFY_DONE;
-	}
-
 	ret = check_dev_is_matching_vf(event_dev);
 	if (ret != 0)
 		return NOTIFY_DONE;
On Mon, Jul 14, 2025 at 09:41:37AM -0700, Haiyang Zhang wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> The existing code moves the VF NIC to the new namespace when NETDEV_REGISTER is received on the netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When the netvsc NIC is moved back and registered to the default namespace, it automatically brings the VF NIC back to the default namespace. This causes the default_device_exit_net() >> for_each_netdev_safe loop to be unable to detect the list end, and hit a NULL ptr:
> [ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
> [ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
> [ 231.450246] #PF: supervisor read access in kernel mode
> [ 231.450579] #PF: error_code(0x0000) - not-present page
> [ 231.450916] PGD 17b8a8067 P4D 0
> [ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
> [ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
> [ 231.452692] Workqueue: netns cleanup_net
> [ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
> [ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
> [ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
> [ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
> [ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
> [ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
> [ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
> [ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
> [ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
> [ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
> [ 231.458434] Call Trace:
> [ 231.458600] <TASK>
> [ 231.458777] ops_undo_list+0x100/0x220
> [ 231.459015] cleanup_net+0x1b8/0x300
> [ 231.459285] process_one_work+0x184/0x340
> To fix it, move the VF namespace switching code from the NETDEV_REGISTER event handler to netvsc_open().
> Cc: stable@vger.kernel.org
> Cc: cavery@redhat.com
> Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
With this change do we go back to the situation that existed prior to the cited patch? Quoting the cited commit:
The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together.
Or perhaps not, because if the synthetic device is moved then, in practice, it will subsequently be reopened? (Because it is closed as part of the move to a different netns?)
I am unsure.
-----Original Message-----
From: Simon Horman <horms@kernel.org>
Sent: Tuesday, July 15, 2025 9:06 AM
To: Haiyang Zhang <haiyangz@linux.microsoft.com>
Cc: linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; Haiyang Zhang <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>; wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; andrew+netdev@lunn.ch; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; davem@davemloft.net; linux-kernel@vger.kernel.org; stable@vger.kernel.org; cavery@redhat.com
Subject: [EXTERNAL] Re: [PATCH net,v2] hv_netvsc: Switch VF namespace in netvsc_open instead
> With this change do we go back to the situation that existed prior to the cited patch? Quoting the cited commit:
> The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together.
> Or perhaps not, because if the synthetic device is moved then, in practice, it will subsequently be reopened? (Because it is closed as part of the move to a different netns?)
There are two cases:
1) The synthetic device is moved to a new namespace before the VF device is offered from PCI: netvsc_register_vf() >> dev_change_net_namespace() puts the VF into the same namespace.
2) The synthetic device is moved to a new namespace after the VF device is offered from PCI: commit 4c262801ea60 does the move in netvsc_event_set_vf_ns() >> dev_change_net_namespace(), but that causes a NULL pointer dereference during namespace deletion >> default_device_exit_net().
This patch keeps code path (1) unchanged and fixes code path (2). And yes, __dev_change_net_namespace() >> netif_close(dev) closes the device, so in the new namespace the NIC always needs to be re-opened before use.
Thanks,
- Haiyang