On 6/30/25 3:41 AM, Nilay Shroff wrote:
Looking at your earlier dmsetup command:

# dmsetup table mpatha
0 65536 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 2 8:32 1 1

In the above rule, the option queue_if_no_path seems a bit odd (unless used with a timeout). Can't we add the module parameter queue_if_no_path_timeout_secs=<N> while loading dm-multipath and thus avoid queueing I/O indefinitely when all paths of a multipath device are lost? IMO, queue_if_no_path without a timeout may make sense when we know that the paths will eventually recover and that applications should simply wait.
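For readers less familiar with the multipath table syntax, the table line quoted above can be decoded field by field. The following is only a minimal Python sketch, assuming the field order documented for the dm-multipath target (feature count and features, hardware handler count and arguments, path group count and initial group, then per group: selector, selector argument count, path count, per-path argument count, paths). The function name parse_multipath_table and the dictionary keys are hypothetical and not part of dmsetup or any other tool.

```python
def parse_multipath_table(line):
    """Split a dm-multipath table line into named fields (illustrative only)."""
    tokens = line.split()
    pos = 0

    def take(n=1):
        # Consume the next n tokens and return them as a list.
        nonlocal pos
        out = tokens[pos:pos + n]
        pos += n
        return out

    start, length, target = take(3)
    if target != "multipath":
        raise ValueError("not a multipath target line")

    # Each variable-length section is preceded by its word count.
    features = take(int(take()[0]))
    hw_handler = take(int(take()[0]))
    num_groups, initial_group = (int(x) for x in take(2))

    groups = []
    for _ in range(num_groups):
        selector = take()[0]
        selector_args = take(int(take()[0]))
        num_paths, args_per_path = (int(x) for x in take(2))
        paths = []
        for _ in range(num_paths):
            device = take()[0]
            paths.append({"device": device, "args": take(args_per_path)})
        groups.append({"selector": selector,
                       "selector_args": selector_args,
                       "paths": paths})

    return {"start": int(start), "length": int(length),
            "features": features, "hw_handler": hw_handler,
            "initial_group": initial_group, "groups": groups}


table = ("0 65536 multipath 1 queue_if_no_path 1 alua 1 1 "
         "service-time 0 1 2 8:32 1 1")
parsed = parse_multipath_table(table)
print(parsed["features"])                 # feature words of the target
print(parsed["groups"][0]["paths"])       # paths of the first path group
```

Read this way, the table describes one path group with a single path (8:32) selected by service-time, the alua hardware handler, and queue_if_no_path as the only feature.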
I refuse to modify the tests that trigger the deadlock because:

1. The deadlock is a REGRESSION. Regressions are not tolerated in the Linux kernel and should be fixed instead of arguing about whether or not the use case should be modified.

2. The test that triggers the deadlock is not new. It is almost ten years old and the deadlock reported at the start of this email thread is the first deadlock in the block layer triggered by that test.

3. queue_if_no_path is widely used to avoid I/O errors if all paths are temporarily unavailable and if it is not known how long it will take to restore a path. queue_if_no_path can e.g. be used to prevent I/O errors if a technician mistakenly pulls the wrong cable(s) in a data center.

4. Unnecessary blk_mq_freeze_queue()/blk_mq_unfreeze_queue() pairs slow down the workflows that trigger these kernel function calls. Hence, if blk_mq_freeze_queue() and blk_mq_unfreeze_queue() are called unnecessarily, the calls to these functions should be removed.
Bart.