Every time I run test srp/002 the following deadlock is triggered:
task:multipathd Call Trace: <TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 schedule_preempt_disabled+0x1c/0x30 __mutex_lock+0xb89/0x1650 mutex_lock_nested+0x1f/0x30 dm_table_set_restrictions+0x823/0xdf0 __bind+0x166/0x590 dm_swap_table+0x2a7/0x490 do_resume+0x1b1/0x610 dev_suspend+0x55/0x1a0 ctl_ioctl+0x3a5/0x7e0 dm_ctl_ioctl+0x12/0x20 __x64_sys_ioctl+0x127/0x1a0 x64_sys_call+0xe2b/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> task:(udev-worker) Call Trace: <TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 blk_mq_freeze_queue_wait+0xf2/0x140 blk_mq_freeze_queue_nomemsave+0x23/0x30 queue_ra_store+0x14e/0x290 queue_attr_store+0x23e/0x2c0 sysfs_kf_write+0xde/0x140 kernfs_fop_write_iter+0x3b2/0x630 vfs_write+0x4fd/0x1390 ksys_write+0xfd/0x230 __x64_sys_write+0x76/0xc0 x64_sys_call+0x276/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK>
This deadlock happens because blk_mq_freeze_queue_nomemsave() waits for pending requests to finish. The pending requests do never complete because the dm-multipath queue_if_no_path option is enabled and the only path in the dm-multipath configuration is being removed.
Fix this deadlock by removing the queue freezing/unfreezing code from queue_ra_store().
Freezing the request queue from inside a block layer sysfs store callback function is essential when modifying parameters that affect how bios or requests are processed, e.g. parameters that affect bio_split_to_limit(). Freezing the request queue when modifying parameters that do not affect bio nor request processing is not necessary.
Cc: Nilay Shroff nilay@linux.ibm.com Cc: stable@vger.kernel.org Fixes: b07a889e8335 ("block: move q->sysfs_lock and queue-freeze under show/store method") Signed-off-by: Bart Van Assche bvanassche@acm.org ---
Changes compared to v1: made the patch description more detailed.
block/blk-sysfs.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index b2b9b89d6967..1f63b184c6e9 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -105,7 +105,6 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count) { unsigned long ra_kb; ssize_t ret; - unsigned int memflags; struct request_queue *q = disk->queue;
ret = queue_var_store(&ra_kb, page, count); @@ -116,10 +115,8 @@ queue_ra_store(struct gendisk *disk, const char *page, size_t count) * calculated from the queue limits by queue_limits_commit_update. */ mutex_lock(&q->limits_lock); - memflags = blk_mq_freeze_queue(q); disk->bdi->ra_pages = ra_kb >> (PAGE_SHIFT - 10); mutex_unlock(&q->limits_lock); - blk_mq_unfreeze_queue(q, memflags);
return ret; }
On Thu, Jun 26, 2025 at 01:37:13PM -0700, Bart Van Assche wrote:
This deadlock happens because blk_mq_freeze_queue_nomemsave() waits for pending requests to finish. The pending requests do never complete because the dm-multipath queue_if_no_path option is enabled and the only path in the dm-multipath configuration is being removed.
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
On 6/27/25 12:17 AM, Christoph Hellwig wrote:
On Thu, Jun 26, 2025 at 01:37:13PM -0700, Bart Van Assche wrote:
This deadlock happens because blk_mq_freeze_queue_nomemsave() waits for pending requests to finish. The pending requests do never complete because the dm-multipath queue_if_no_path option is enabled and the only path in the dm-multipath configuration is being removed.
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
Hi Christoph,
There is a difference: there are Linux distros, e.g. openSUSE, that set the read_ahead_kb attribute from a udev rule. I'm not aware of any Linux distros that set any of the other attributes from a udev rule for which the queue gets frozen from the .store callback (nr_requests, nomerges, rq_affinity, io_poll, io_timeout and wbt_lat_usec).
Thanks,
Bart.
On 6/27/25 12:17 AM, Christoph Hellwig wrote:
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
Hi Christoph,
Do you perhaps want me to remove the freeze/unfreeze calls from all sysfs store callbacks from which it is safe to remove these callbacks?
Thanks,
Bart.
On Mon, Jun 30, 2025 at 03:39:18PM -0700, Bart Van Assche wrote:
On 6/27/25 12:17 AM, Christoph Hellwig wrote:
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
Hi Christoph,
Do you perhaps want me to remove the freeze/unfreeze calls from all sysfs store callbacks from which it is safe to remove these callbacks?
But don't the remaining attributes that are not safe remain susceptible to this deadlock?
On 6/30/25 5:42 PM, Keith Busch wrote:
On Mon, Jun 30, 2025 at 03:39:18PM -0700, Bart Van Assche wrote:
On 6/27/25 12:17 AM, Christoph Hellwig wrote:
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
Hi Christoph,
Do you perhaps want me to remove the freeze/unfreeze calls from all sysfs store callbacks from which it is safe to remove these callbacks?
But don't the remaining attributes that are not safe remain susceptible to this deadlock?
For the remaining sysfs attributes the deadlock can be solved by letting the blk_mq_freeze_queue() call by the sysfs store methods time out if that call takes too long or by making that call interruptible by signals like Ctrl-C. I think its better to let functions like queue_requests_store() fail if an attempt to freeze a request queue takes longer than it should rather than to trigger a kernel deadlock.
Bart.
On 7/1/25 7:20 AM, Bart Van Assche wrote:
On 6/30/25 5:42 PM, Keith Busch wrote:
On Mon, Jun 30, 2025 at 03:39:18PM -0700, Bart Van Assche wrote:
On 6/27/25 12:17 AM, Christoph Hellwig wrote:
Well, if there are queued never completed bios the freeze will obviously fail. I don't see how this freeze is special vs other freezes or other attributes that freeze.
Hi Christoph,
Do you perhaps want me to remove the freeze/unfreeze calls from all sysfs store callbacks from which it is safe to remove these callbacks?
But don't the remaining attributes that are not safe remain susceptible to this deadlock?
For the remaining sysfs attributes the deadlock can be solved by letting the blk_mq_freeze_queue() call by the sysfs store methods time out if that call takes too long or by making that call interruptible by signals like Ctrl-C. I think its better to let functions like queue_requests_store() fail if an attempt to freeze a request queue takes longer than it should rather than to trigger a kernel deadlock.
I think you're proposing to use blk_mq_freeze_queue_wait_timeout() here (which puts the caller into uninterruptible sleep) and that might be okay. However, IMO, using TASK_INTERRUPTIBLE may not be worth in case those sysfs store methods are invoked from udev rules or application. But lets wait for others to chime in and suggest.
Thanks, --Nilay
On 6/27/25 2:07 AM, Bart Van Assche wrote:
Every time I run test srp/002 the following deadlock is triggered:
task:multipathd Call Trace:
<TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 schedule_preempt_disabled+0x1c/0x30 __mutex_lock+0xb89/0x1650 mutex_lock_nested+0x1f/0x30 dm_table_set_restrictions+0x823/0xdf0 __bind+0x166/0x590 dm_swap_table+0x2a7/0x490 do_resume+0x1b1/0x610 dev_suspend+0x55/0x1a0 ctl_ioctl+0x3a5/0x7e0 dm_ctl_ioctl+0x12/0x20 __x64_sys_ioctl+0x127/0x1a0 x64_sys_call+0xe2b/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> task:(udev-worker) Call Trace: <TASK> __schedule+0x8c1/0x1bf0 schedule+0xdd/0x270 blk_mq_freeze_queue_wait+0xf2/0x140 blk_mq_freeze_queue_nomemsave+0x23/0x30 queue_ra_store+0x14e/0x290 queue_attr_store+0x23e/0x2c0 sysfs_kf_write+0xde/0x140 kernfs_fop_write_iter+0x3b2/0x630 vfs_write+0x4fd/0x1390 ksys_write+0xfd/0x230 __x64_sys_write+0x76/0xc0 x64_sys_call+0x276/0x17d0 do_syscall_64+0x96/0x3a0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK>
This deadlock happens because blk_mq_freeze_queue_nomemsave() waits for pending requests to finish. The pending requests do never complete because the dm-multipath queue_if_no_path option is enabled and the only path in the dm-multipath configuration is being removed.
Fix this deadlock by removing the queue freezing/unfreezing code from queue_ra_store().
Freezing the request queue from inside a block layer sysfs store callback function is essential when modifying parameters that affect how bios or requests are processed, e.g. parameters that affect bio_split_to_limit(). Freezing the request queue when modifying parameters that do not affect bio nor request processing is not necessary.
Cc: Nilay Shroff nilay@linux.ibm.com Cc: stable@vger.kernel.org Fixes: b07a889e8335 ("block: move q->sysfs_lock and queue-freeze under show/store method") Signed-off-by: Bart Van Assche bvanassche@acm.org
I hope we'd address other sysfs store attributes requiring queue-freeze in another patch. So with that,
Looks good to me: Reviewed-by: Nilay Shroff nilay@linux.ibm.com
linux-stable-mirror@lists.linaro.org