From: Michael S. Tsirkin mst@redhat.com
Sent: 26 June 2025 12:04 PM
To: Parav Pandit parav@nvidia.com
Cc: Stefan Hajnoczi stefanha@redhat.com; axboe@kernel.dk; virtualization@lists.linux.dev; linux-block@vger.kernel.org; stable@vger.kernel.org; NBU-Contact-Li Rongqing (EXTERNAL) lirongqing@baidu.com; Chaitanya Kulkarni chaitanyak@nvidia.com; xuanzhuo@linux.alibaba.com; pbonzini@redhat.com; jasowang@redhat.com; alok.a.tiwari@oracle.com; Max Gurtovoy mgurtovoy@nvidia.com; Israel Rukshin israelr@nvidia.com
Subject: Re: [PATCH v5] virtio_blk: Fix disk deletion hang on device surprise removal

On Thu, Jun 26, 2025 at 06:29:09AM +0000, Parav Pandit wrote:
Yes, however this is not at all different than hotunplug right after reset.
For hotunplug after reset, we likely need a timeout handler, because the block driver running inside the remove() callback, waiting for the IO, may not get notified by the driver core to synchronize with the ongoing remove().
Notified of what?
Notification that surprise-removal occurred.
So is the scenario that graceful remove starts, and meanwhile a surprise removal happens?
Right.
Where is it stuck then? Can you explain?
I am not sure I understood the question.
Let me try. The following scenario will hang even with the current fix:
1. Graceful removal is ongoing in the remove() callback, where del_gendisk() is deleting the disk and waiting for the requests to complete.
2. A few requests are yet to complete when surprise removal starts.
At this point, the virtio block driver will not get notified by the driver core layer, because the driver core is likely serializing the remove() triggered by user/driver unload with the device removal initiated by the PCI hotplug driver. So the vblk driver doesn't know that the device is removed, and the block layer keeps waiting for request completions that never arrive. So del_gendisk() gets stuck.
This needs some kind of timeout handling to make removal more robust.
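
To illustrate the idea only, here is a minimal userspace simulation in Python, not kernel code and not the proposed patch. FakeDisk, complete() and remove() are hypothetical names made up for this sketch. It assumes that a wait for in-flight requests which exceeds a timeout means the device is gone, so the remaining requests are failed instead of being waited on forever.

```python
import threading


class FakeDisk:
    """Hypothetical userspace stand-in for a disk with in-flight requests."""

    def __init__(self, inflight):
        self.inflight = set(inflight)   # request ids not yet completed
        self.failed = set()             # request ids failed by the timeout path
        self._lock = threading.Lock()
        self._idle = threading.Event()  # set once nothing is in flight
        if not self.inflight:
            self._idle.set()

    def complete(self, req_id):
        """Device-side completion; never arrives once the device is gone."""
        with self._lock:
            self.inflight.discard(req_id)
            if not self.inflight:
                self._idle.set()

    def remove(self, timeout_s):
        """Wait for in-flight requests, but give up after timeout_s seconds."""
        if self._idle.wait(timeout=timeout_s):
            return "drained"
        with self._lock:
            if not self.inflight:
                # Completions raced with the timeout; nothing left to fail.
                return "drained"
            # Assumption: a timeout means surprise removal, so fail the rest.
            self.failed, self.inflight = self.inflight, set()
            self._idle.set()
        return "timed_out"
```

Without the timeout argument, the wait in remove() would block forever for any request whose completion never arrives, which is the hang described above.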
Did I answer, or did I misunderstand the question?