If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is fully initialized, we can hit the panic below:
hv_utils: Registering HyperV Utility Driver hv_vmbus: registering driver hv_utils ... BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 RIP: 0010:hv_pkt_iter_first+0x12/0xd0 Call Trace: ... vmbus_recvpacket hv_kvp_onchannelcallback vmbus_on_event tasklet_action_common tasklet_action handle_softirqs irq_exit_rcu sysvec_hyperv_stimer0 </IRQ> <TASK> asm_sysvec_hyperv_stimer0 ... kvp_register_done hvt_op_read vfs_read ksys_read __x64_sys_read
This can happen because the KVP/VSS channel callback can be invoked even before the channel is fully opened: 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and register itself to the driver by writing a message KVP_OP_REGISTER1 to the file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and reading the file for the driver's response, which is handled by hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done().
2) the problem with kvp_register_done() is that it can cause the channel callback to be called even before the channel is fully opened, and when the channel callback is starting to run, util_probe()-> vmbus_open() may have not initialized the ringbuffer yet, so the callback can hit the panic of NULL pointer dereference.
To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in __vmbus_open(), just before the first hv_ringbuffer_init(), and then we unload and reload the driver hv_utils, and run the daemon manually within the 10 seconds.
Fix the panic by checking the channel state in the channel callback. To avoid the race condition with __vmbus_open(), we disable and enable the channel callback temporarily in __vmbus_open().
The channel callbacks of the other VMBus devices don't need to check the channel state since they can't run before the channels are fully initialized.
Note: we would also need to fix the fcopy driver code, but that has been removed in commit ec314f61e4fc ("Drivers: hv: Remove fcopy driver") in the mainline kernel since v6.10. For old 6.x LTS kernels, and the 5.x and 4.x LTS kernels, the fcopy driver needs to be fixed when the fix is backported to the stable kernel branches.
Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration") Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com --- drivers/hv/channel.c | 11 +++++++++++ drivers/hv/hv_kvp.c | 3 +++ drivers/hv/hv_snapshot.c | 3 +++ 3 files changed, 17 insertions(+)
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index fb8cd8469328..685e407a3fdf 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -657,6 +657,14 @@ static int __vmbus_open(struct vmbus_channel *newchannel, return -ENOMEM; }
+ /* + * The channel callbacks of KVP/VSS may run before __vmbus_open() + * finishes (see kvp_register_done() -> ... -> kvp_poll_wrapper()), so + * they check newchannel->state to tell the ringbuffer has been fully + * initialized or not. Disable and enable the tasklet to avoid the race. + */ + tasklet_disable(&newchannel->callback_event); + newchannel->state = CHANNEL_OPENING_STATE; newchannel->onchannel_callback = onchannelcallback; newchannel->channel_callback_context = context; @@ -750,6 +758,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel, }
newchannel->state = CHANNEL_OPENED_STATE; + tasklet_enable(&newchannel->callback_event); + kfree(open_info); return 0;
@@ -766,6 +776,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel, hv_ringbuffer_cleanup(&newchannel->inbound); vmbus_free_requestor(&newchannel->requestor); newchannel->state = CHANNEL_OPEN_STATE; + tasklet_enable(&newchannel->callback_event); return err; }
diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c index d35b60c06114..ec098067e579 100644 --- a/drivers/hv/hv_kvp.c +++ b/drivers/hv/hv_kvp.c @@ -662,6 +662,9 @@ void hv_kvp_onchannelcallback(void *context) if (kvp_transaction.state > HVUTIL_READY) return;
+ if (channel->state != CHANNEL_OPENED_STATE) + return; + if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, &recvlen, &requestid)) { pr_err_ratelimited("KVP request received. Could not read into recv buf\n"); return; diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c index 0d2184be1691..f7924c2fc62e 100644 --- a/drivers/hv/hv_snapshot.c +++ b/drivers/hv/hv_snapshot.c @@ -301,6 +301,9 @@ void hv_vss_onchannelcallback(void *context) if (vss_transaction.state > HVUTIL_READY) return;
+ if (channel->state != CHANNEL_OPENED_STATE) + return; + if (vmbus_recvpacket(channel, recv_buffer, VSS_MAX_PKT_SIZE, &recvlen, &requestid)) { pr_err_ratelimited("VSS request received. Could not read into recv buf\n"); return;