-----Original Message-----
From: Simon Horman <horms@kernel.org>
Sent: Tuesday, July 15, 2025 9:06 AM
To: Haiyang Zhang <haiyangz@linux.microsoft.com>
Cc: linux-hyperv@vger.kernel.org; netdev@vger.kernel.org; Haiyang Zhang <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>; wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>; andrew+netdev@lunn.ch; edumazet@google.com; kuba@kernel.org; pabeni@redhat.com; davem@davemloft.net; linux-kernel@vger.kernel.org; stable@vger.kernel.org; cavery@redhat.com
Subject: [EXTERNAL] Re: [PATCH net,v2] hv_netvsc: Switch VF namespace in netvsc_open instead

On Mon, Jul 14, 2025 at 09:41:37AM -0700, Haiyang Zhang wrote:
From: Haiyang Zhang <haiyangz@microsoft.com>

The existing code moves the VF NIC to the new namespace when NETDEV_REGISTER is received on the netvsc NIC. During deletion of the namespace, default_device_exit_batch() >> default_device_exit_net() is called. When the netvsc NIC is moved back and registered to the default namespace, it automatically brings the VF NIC back to the default namespace. This leaves the default_device_exit_net() >> for_each_netdev_safe loop unable to detect the end of the list, and it hits a NULL pointer dereference:

[ 231.449420] mana 7870:00:00.0 enP30832s1: Moved VF to namespace with: eth0
[ 231.449656] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 231.450246] #PF: supervisor read access in kernel mode
[ 231.450579] #PF: error_code(0x0000) - not-present page
[ 231.450916] PGD 17b8a8067 P4D 0
[ 231.451163] Oops: Oops: 0000 [#1] SMP NOPTI
[ 231.451450] CPU: 82 UID: 0 PID: 1394 Comm: kworker/u768:1 Not tainted 6.16.0-rc4+ #3 VOLUNTARY
[ 231.452042] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 11/21/2024
[ 231.452692] Workqueue: netns cleanup_net
[ 231.452947] RIP: 0010:default_device_exit_batch+0x16c/0x3f0
[ 231.453326] Code: c0 0c f5 b3 e8 d5 db fe ff 48 85 c0 74 15 48 c7 c2 f8 fd ca b2 be 10 00 00 00 48 8d 7d c0 e8 7b 77 25 00 49 8b 86 28 01 00 00 <48> 8b 50 10 4c 8b 2a 4c 8d 62 f0 49 83 ed 10 4c 39 e0 0f 84 d6 00
[ 231.454294] RSP: 0018:ff75fc7c9bf9fd00 EFLAGS: 00010246
[ 231.454610] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 61c8864680b583eb
[ 231.455094] RDX: ff1fa9f71462d800 RSI: ff75fc7c9bf9fd38 RDI: 0000000030766564
[ 231.455686] RBP: ff75fc7c9bf9fd78 R08: 0000000000000000 R09: 0000000000000000
[ 231.456126] R10: 0000000000000001 R11: 0000000000000004 R12: ff1fa9f70088e340
[ 231.456621] R13: ff1fa9f70088e340 R14: ffffffffb3f50c20 R15: ff1fa9f7103e6340
[ 231.457161] FS: 0000000000000000(0000) GS:ff1faa6783a08000(0000) knlGS:0000000000000000
[ 231.457707] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 231.458031] CR2: 0000000000000010 CR3: 0000000179ab2006 CR4: 0000000000b73ef0
[ 231.458434] Call Trace:
[ 231.458600] <TASK>
[ 231.458777] ops_undo_list+0x100/0x220
[ 231.459015] cleanup_net+0x1b8/0x300
[ 231.459285] process_one_work+0x184/0x340

To fix it, move the VF namespace switching code from the NETDEV_REGISTER event handler to netvsc_open().

Cc: stable@vger.kernel.org
Cc: cavery@redhat.com
Fixes: 4c262801ea60 ("hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

With this change do we go back to the situation that existed prior to the cited patch? Quoting the cited commit:

The existing code moves VF to the same namespace as the synthetic NIC during netvsc_register_vf(). But, if the synthetic device is moved to a new namespace after the VF registration, the VF won't be moved together.

Or perhaps not, because if the synthetic device is moved then, in practice, it will subsequently be reopened? (Because it is closed as part of the move to a different netns?)

There are two cases:
1) The synthetic device is moved to a new namespace before the VF device is offered from PCI: netvsc_register_vf() >> dev_change_net_namespace() puts the VF in the same namespace.
2) The synthetic device is moved to a new namespace after the VF device is offered from PCI: commit 4c262801ea60 does the move in netvsc_event_set_vf_ns >> dev_change_net_namespace(). But this causes a NULL pointer dereference during namespace deletion >> default_device_exit_net().

This patch keeps code path (1) unchanged and fixes code path (2). And yes, __dev_change_net_namespace() >> netif_close(dev), so in the new namespace the NIC always needs to be re-opened before use.
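
For anyone following along, here is a rough user-space sketch of why a "safe" list walk can miss the list end in case (2). It is not the kernel code: struct node, drain_ns() and run_scenario() are made-up names, and the list helpers only loosely imitate list_head. The assumption being modeled is that moving the netvsc entry to the default namespace also moves the VF entry, which is the entry the loop had already saved as its next element. The sketch does not reproduce the NULL pointer dereference itself; it only shows the walk ending up on the other list.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct node {
    struct node *next, *prev;
};

static void list_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_unlink(struct node *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void list_append(struct node *e, struct node *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static void list_move(struct node *e, struct node *head)
{
    list_unlink(e);
    list_append(e, head);
}

/*
 * Hypothetical model of draining a dying namespace: every device on old_ns
 * is moved to init_ns while walking with a saved "next" pointer.
 * If vf_follows is true, moving netvsc also moves vf (the side effect
 * attributed to the NETDEV_REGISTER handler in the thread above).
 * Returns true if the walk stops at the old_ns head as intended, false if
 * it ends up on the init_ns list instead.
 */
static bool drain_ns(struct node *old_ns, struct node *init_ns,
                     struct node *netvsc, struct node *vf, bool vf_follows)
{
    struct node *pos, *next;

    for (pos = old_ns->next, next = pos->next; pos != old_ns;
         pos = next, next = pos->next) {
        if (pos == init_ns)
            return false;           /* walked off the original list */
        list_move(pos, init_ns);
        if (pos == netvsc && vf_follows)
            list_move(vf, init_ns); /* saved "next" changes lists here */
    }
    return true;
}

/* Build old_ns = { netvsc, vf }, init_ns = { } and drain old_ns. */
static bool run_scenario(bool vf_follows)
{
    struct node old_ns, init_ns, netvsc, vf;

    list_init(&old_ns);
    list_init(&init_ns);
    list_append(&netvsc, &old_ns);
    list_append(&vf, &old_ns);
    return drain_ns(&old_ns, &init_ns, &netvsc, &vf, vf_follows);
}
```

In this model run_scenario(false) finishes the walk at the old list head, while run_scenario(true) never sees that head again because the saved next entry now lives on the other list.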

Thanks,
- Haiyang