On Thu, Dec 22, 2022 at 11:21:21AM +0100, Johan Hovold wrote:
There is a lock inversion and rwsem read-lock recursion in the devfreq target callback which can lead to deadlocks.
Specifically, ufshcd_devfreq_scale() already holds a clk_scaling_lock read lock when toggling the write booster, which involves taking the dev_cmd mutex before taking another clk_scaling_lock read lock.
This can lead to a deadlock if another thread:
tries to acquire the dev_cmd and clk_scaling locks in the correct order, or
takes a clk_scaling write lock before the attempt to take the clk_scaling read lock a second time.
Fix this by dropping the clk_scaling_lock before toggling the write booster as was done before commit 0e9d4ca43ba8 ("scsi: ufs: Protect some contexts from unexpected clock scaling").
While the devfreq callbacks are already serialised, add a second serialising mutex to handle the unlikely case where a callback triggered through the devfreq sysfs interface is racing with a request to disable clock scaling through the UFS controller 'clkscale_enable' sysfs attribute. This could otherwise lead to the write booster being left disabled after having disabled clock scaling.
Also take the new mutex in ufshcd_clk_scaling_allow() to make sure that any pending write booster update has completed on return.
Note that this currently only affects Qualcomm platforms since commit 87bd05016a64 ("scsi: ufs: core: Allow host driver to disable wb toggling during clock scaling").
The lock inversion (i.e. 1 above) was reported by lockdep as:
====================================================== WARNING: possible circular locking dependency detected 6.1.0-next-20221216 #211 Not tainted
kworker/u16:2/71 is trying to acquire lock: ffff076280ba98a0 (&hba->dev_cmd.lock){+.+.}-{3:3}, at: ufshcd_query_flag+0x50/0x1c0
but task is already holding lock: ffff076280ba9cf0 (&hba->clk_scaling_lock){++++}-{3:3}, at: ufshcd_devfreq_scale+0x2b8/0x380
which lock already depends on the new lock. [ +0.011606] the existing dependency chain (in reverse order) is:
-> #1 (&hba->clk_scaling_lock){++++}-{3:3}: lock_acquire+0x68/0x90 down_read+0x58/0x80 ufshcd_exec_dev_cmd+0x70/0x2c0 ufshcd_verify_dev_init+0x68/0x170 ufshcd_probe_hba+0x398/0x1180 ufshcd_async_scan+0x30/0x320 async_run_entry_fn+0x34/0x150 process_one_work+0x288/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20
-> #0 (&hba->dev_cmd.lock){+.+.}-{3:3}: __lock_acquire+0x12a0/0x2240 lock_acquire.part.0+0xcc/0x220 lock_acquire+0x68/0x90 __mutex_lock+0x98/0x430 mutex_lock_nested+0x2c/0x40 ufshcd_query_flag+0x50/0x1c0 ufshcd_query_flag_retry+0x64/0x100 ufshcd_wb_toggle+0x5c/0x120 ufshcd_devfreq_scale+0x2c4/0x380 ufshcd_devfreq_target+0xf4/0x230 devfreq_set_target+0x84/0x2f0 devfreq_update_target+0xc4/0xf0 devfreq_monitor+0x38/0x1f0 process_one_work+0x288/0x6c0 worker_thread+0x74/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x20
other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&hba->clk_scaling_lock); lock(&hba->dev_cmd.lock); lock(&hba->clk_scaling_lock); lock(&hba->dev_cmd.lock);
*** DEADLOCK ***
Fixes: 0e9d4ca43ba8 ("scsi: ufs: Protect some contexts from unexpected clock scaling") Cc: stable@vger.kernel.org # 5.12 Cc: Can Guo quic_cang@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org
Tested-by: Andrew Halaney ahalaney@redhat.com
Thanks for taking the time to fix this, I can't get any UFS related splats to show after a brief round of testing now.
For what it is worth, the change looks good to me as well but I'll leave it to someone familiar with the UFS core to add a proper Reviewed-by tag.
Thanks, Andrew