On 2022-09-19 11:35:09 -0600, Keith Busch wrote:
Fixes: d4060d2be1132 ("nvme-pci: fix controller reset hang when racing with nvme_timeout")
I revisited that commit, and it doesn't sound correct. Specifically this part:
5) reset_work() continues to setup_io_queues() as it observes no error in init_identify(). However, the admin queue has already been quiesced in dev_disable(). Thus, any following commands would be blocked forever in blk_execute_rq().
When a timeout occurs in the CONNECTING state, the timeout handler unquiesces the queue specifically to flush out any blocked requests. Is that commit really necessary? I'd rather just revert it to save the extra per-IO checks if not.
I cannot say with certainty whether d4060d2be1132 needs to be reverted. I will need to carefully inspect the reset code path and run more experiments. Even if this commit gets reverted, we still need to add `nvme_commit_rqs` to `nvme_mq_admin_ops`.
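
To illustrate why the missing hook matters independently of the revert, here is a minimal userspace sketch of the batching behaviour as I understand it. It is not kernel code: `toy_queue`, `toy_ops`, `toy_dispatch` and the other `toy_*` names are hypothetical stand-ins, and the model assumes the doorbell is only rung for the request flagged as last in a batch, with the optional `commit_rqs` callback invoked when queueing fails part-way through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_queue {
	int sq_tail;		/* entries written to the submission queue */
	int last_doorbell;	/* tail value the "device" has been told about */
	bool ready;		/* false models a queue that rejects new commands */
};

struct toy_ops {
	/* Returns 0 on success, -1 on failure; 'last' plays the role of bd->last. */
	int (*queue_rq)(struct toy_queue *q, bool last);
	/* Optional: tell the device about entries that were queued but not kicked. */
	void (*commit_rqs)(struct toy_queue *q);
};

static void toy_ring_doorbell(struct toy_queue *q)
{
	q->last_doorbell = q->sq_tail;
}

static int toy_queue_rq(struct toy_queue *q, bool last)
{
	assert(q != NULL);
	if (!q->ready)
		return -1;		/* rejected: nothing is written */
	q->sq_tail++;
	if (last)			/* only the final request of a batch kicks */
		toy_ring_doorbell(q);
	return 0;
}

static void toy_commit_rqs(struct toy_queue *q)
{
	if (q->sq_tail != q->last_doorbell)
		toy_ring_doorbell(q);
}

/*
 * Dispatch a batch of n requests. The queue stops accepting commands just
 * before request index fail_at. Returns the number of queued entries the
 * device was never told about.
 */
static int toy_dispatch(const struct toy_ops *ops, struct toy_queue *q,
			int n, int fail_at)
{
	for (int i = 0; i < n; i++) {
		if (i == fail_at)
			q->ready = false;
		if (ops->queue_rq(q, i == n - 1) != 0) {
			/* The batch is cut short, so the "last" kick never comes. */
			if (ops->commit_rqs && i > 0)
				ops->commit_rqs(q);
			break;
		}
	}
	return q->sq_tail - q->last_doorbell;
}
```

In this model, an ops table without `commit_rqs` leaves the earlier requests of an interrupted batch sitting in the queue with no doorbell write, while an ops table that provides it flushes them.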