From: Ionut Nechita <ionut.nechita@windriver.com>
Hi Ming,
Thank you for the feedback!
You're absolutely right - blk_mq_cpuhp_lock is only acquired in the slow path (setup/cleanup operations during queue initialization/teardown), not in the fast I/O path.
Looking at my testing results more carefully:
- The queue_lock patch (PATCH 1/2) alone restores performance to 640 MB/s
- The cpuhp_lock conversion (PATCH 2/2) doesn't contribute to fixing the I/O regression
The cpuhp_lock is used in:
- blk_mq_remove_cpuhp() - queue cleanup
- blk_mq_add_hw_queues_cpuhp() - queue setup
- blk_mq_remove_hw_queues_cpuhp() - queue cleanup
These are indeed slow path operations with no contention in the I/O hot path.
I'll drop the second patch (cpuhp_lock conversion) and send v2 with only the queue_lock fix. That fix removes the actual bottleneck: the sleeping lock in blk_mq_run_hw_queue() that was causing IRQ threads to serialize and enter D-state during I/O completion.
Best regards,
Ionut