-----Original Message-----
From: Greg KH gregkh@linuxfoundation.org
Sent: Monday, November 26, 2018 11:35 AM
To: KY Srinivasan kys@microsoft.com
Cc: linux-kernel@vger.kernel.org; devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; Stephen Hemminger sthemmin@microsoft.com; Michael Kelley mikelley@microsoft.com; vkuznets vkuznets@redhat.com; Haiyang Zhang haiyangz@microsoft.com; stable@vger.kernel.org
Subject: Re: [PATCH 2/2] Drivers: hv: vmbus: offload the handling of channels to two workqueues
On Mon, Nov 26, 2018 at 02:29:57AM +0000, kys@linuxonhyperv.com wrote:
From: Dexuan Cui decui@microsoft.com
vmbus_process_offer() mustn't call channel->sc_creation_callback() directly for sub-channels. sc_creation_callback() -> vmbus_open() may never get the host's response to the OPEN_CHANNEL message (the host may rescind a channel at any time, e.g. when a NIC is hot-removed), and vmbus_onoffer_rescind() may not wake up vmbus_open() because it is blocked by a non-zero vmbus_connection.offer_in_progress. The result is a deadlock.
The above is also true for primary channels, if the related device drivers use sync probing mode by default.
Also, the handling of primary channels and sub-channels can usually depend on each other, so we should offload them to different workqueues to avoid a possible deadlock. For example, in sync-probing mode, NIC1's netvsc_subchan_work() can race with NIC2's netvsc_probe() -> rtnl_lock() and cause a deadlock: the former takes the rtnl_lock and waits for all the sub-channels to appear, but the latter can't get the rtnl_lock, and this blocks the handling of sub-channels.
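To illustrate the ordering problem, here is a minimal user-space sketch using pthreads. It is an analogy, not the kernel code in this patch. It leaves out the rtnl_lock and models only the core dependency: a primary-channel work item waiting for sub-channel work that is queued behind it. All names (run_model, primary_work, subchannel_work, NUM_SUBCHANNELS, WAIT_TIMEOUT_MS) are hypothetical, and a timeout stands in for the hang so that the program terminates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

#define NUM_SUBCHANNELS 3   /* hypothetical number of sub-channels */
#define WAIT_TIMEOUT_MS 200 /* stands in for "waits forever" */

struct model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int subchannels_seen;
    bool primary_ok;
};

/* Primary-channel work: waits until all sub-channels have appeared. */
static void primary_work(struct model *m)
{
    struct timespec deadline;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += WAIT_TIMEOUT_MS * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&m->lock);
    while (m->subchannels_seen < NUM_SUBCHANNELS && rc == 0)
        rc = pthread_cond_timedwait(&m->cond, &m->lock, &deadline);
    m->primary_ok = (m->subchannels_seen == NUM_SUBCHANNELS);
    pthread_mutex_unlock(&m->lock);
}

/* Sub-channel work: records one more sub-channel and wakes the waiter. */
static void subchannel_work(struct model *m)
{
    pthread_mutex_lock(&m->lock);
    m->subchannels_seen++;
    assert(m->subchannels_seen <= NUM_SUBCHANNELS);
    pthread_cond_broadcast(&m->cond);
    pthread_mutex_unlock(&m->lock);
}

/* One queue: items run strictly in order, so sub-channels sit behind the waiter. */
static void *shared_queue_worker(void *arg)
{
    primary_work(arg);
    for (int i = 0; i < NUM_SUBCHANNELS; i++)
        subchannel_work(arg);
    return NULL;
}

static void *primary_queue_worker(void *arg)
{
    primary_work(arg);
    return NULL;
}

static void *sub_queue_worker(void *arg)
{
    for (int i = 0; i < NUM_SUBCHANNELS; i++)
        subchannel_work(arg);
    return NULL;
}

/* Returns true if the primary work saw every sub-channel before timing out. */
bool run_model(bool separate_queues)
{
    struct model m = { .subchannels_seen = 0, .primary_ok = false };
    pthread_t a, b;

    pthread_mutex_init(&m.lock, NULL);
    pthread_cond_init(&m.cond, NULL);

    if (separate_queues) {
        pthread_create(&a, NULL, primary_queue_worker, &m);
        pthread_create(&b, NULL, sub_queue_worker, &m);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
    } else {
        pthread_create(&a, NULL, shared_queue_worker, &m);
        pthread_join(a, NULL);
    }

    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.lock);
    return m.primary_ok;
}
```

Built with -pthread, run_model(false) is expected to time out because the sub-channel items never get a chance to run while the primary item is waiting, whereas run_model(true) lets the second worker make progress. The real patch uses kernel workqueues rather than threads; the sketch is only meant to show why a single ordered queue is not enough.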
The patch can fix the multiple-NIC deadlock described above for v3.x kernels (e.g. RHEL 7.x) which don't support async-probing of devices, and v4.4, v4.9, v4.14 and v4.18 which support async-probing but don't enable async-probing for Hyper-V drivers (yet).
The patch can also fix the hang issue in sub-channel's handling described above for all versions of kernels, including v4.19 and v4.20-rc3.
So the patch should be applied to all the existing kernels.
Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
Cc: stable@vger.kernel.org
Cc: Stephen Hemminger sthemmin@microsoft.com
Cc: K. Y. Srinivasan kys@microsoft.com
Cc: Haiyang Zhang haiyangz@microsoft.com
Signed-off-by: Dexuan Cui decui@microsoft.com
Signed-off-by: K. Y. Srinivasan kys@microsoft.com
 drivers/hv/channel_mgmt.c | 188 +++++++++++++++++++++++++---------
 drivers/hv/connection.c   |  24 ++++-
 drivers/hv/hyperv_vmbus.h |   7 ++
 include/linux/hyperv.h    |   7 ++
 4 files changed, 161 insertions(+), 65 deletions(-)
As Sasha pointed out, this patch does not even apply :(
Sorry about that. These patches applied cleanly on my tree (misc-next). This series is meant to be applied on top of 0001-Drivers-hv-vmbus-Remove-the-useless-API-vmbus_get_ou.patch. That patch has been committed to the char-misc-testing branch but is not in the misc-linus branch, and that is the reason for this problem.
Regards,
K. Y