From: Peter Wang peter.wang@mediatek.com
Three have deadlock when runtime suspend wait flush rtc work, and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
Here is deadlock backtrace kworker/0:1 D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367 ptr f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff <ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4 <ffffffee5e71e604> __schedule+0x684/0xa98 <ffffffee5e71ea60> schedule+0x48/0xc8 <ffffffee5e725f78> schedule_timeout+0x48/0x170 <ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0 <ffffffee5e71efe0> wait_for_completion+0x44/0x60 <ffffffee5d6de968> __flush_work+0x39c/0x424 <ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208 <ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28 <ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480 <ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4 <ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8 <ffffffee5df93580> __rpm_callback+0x94/0x3e0 <ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c <ffffffee5df91448> __pm_runtime_suspend+0x80/0x114 <ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c <ffffffee5df912f4> rpm_idle+0x264/0x338 <ffffffee5df90f14> __pm_runtime_idle+0x80/0x110 <ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4 <ffffffee5d6e3a40> process_one_work+0x26c/0x650 <ffffffee5d6e65c8> worker_thread+0x260/0x3d8 <ffffffee5d6edec8> kthread+0x110/0x134 <ffffffee5d616b18> ret_from_fork+0x10/0x20
Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org 6.6.x
Signed-off-by: Peter Wang peter.wang@mediatek.com --- drivers/ufs/core/ufshcd.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 46433ecf0c4d..bfcf2d468b5e 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
- /* Update RTC only when there are no requests in progress and UFSHCI is operational */ - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) + /* + * Update RTC only when + * 1. there are no requests in progress + * 2. UFSHCI is operational + * 3. pm operation is not in progress + */ + if (!ufshcd_is_ufs_dev_busy(hba) && + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && + !hba->pm_op_in_progress) ufshcd_update_rtc(hba);
if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
On Fri, 2024-07-12 at 17:43 +0800, peter.wang@mediatek.com wrote:
From: Peter Wang peter.wang@mediatek.com
Three have deadlock when runtime suspend wait flush rtc work, and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
Reviewed-by: Bean Huo beanhuo@micron.com
@@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct ufs_hba,
ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause?
Thanks, Avri
/* Update RTC only when there are no requests in progress and UFSHCI is
operational */
if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
UFSHCD_STATE_OPERATIONAL)
/*
* Update RTC only when
* 1. there are no requests in progress
* 2. UFSHCI is operational
* 3. pm operation is not in progress
*/
if (!ufshcd_is_ufs_dev_busy(hba) &&
hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
!hba->pm_op_in_progress) ufshcd_update_rtc(hba); if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
-- 2.18.0
On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
@@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause?
Avri,
we need to reschedule next time work in the below code. if return, cannot.
whatelse I missed?
kind regards, Bean
On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
@@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct ufs_hba,
ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause?
Avri,
we need to reschedule next time work in the below code. if return, cannot.
whatelse I missed?
a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? b) schedule on next __ufshcd_wl_resume?
kind regards, Bean
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote:
hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause?
Avri,
we need to reschedule next time work in the below code. if return, cannot.
whatelse I missed?
a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? b) schedule on next __ufshcd_wl_resume?
hba->pm_op_in_progress is true during __ufshcd_wl_resume(), will not schedule update work.
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content.
On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote:
@@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct
ufs_hba,
ufs_rtc_update_work);
Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause?
Avri,
we need to reschedule next time work in the below code. if return,
cannot.
whatelse I missed?
a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? b) schedule on next __ufshcd_wl_resume?
Hi Avri,
Yes, if dev is not active (RPM state is not RPM_ACTIVE), will not schedule rtc work and schedule on next __ufshcd_wl_resume.
Thanks. Peter
kind regards, Bean
On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote:
Three have deadlock when runtime suspend wait flush rtc work, and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
"Three have"? The above description is very hard to understand. Please improve it.
/* Update RTC only when there are no requests in progress and UFSHCI is operational */
- if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL)
/*
* Update RTC only when
* 1. there are no requests in progress
* 2. UFSHCI is operational
* 3. pm operation is not in progress
*/
- if (!ufshcd_is_ufs_dev_busy(hba) &&
hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
ufshcd_update_rtc(hba);!hba->pm_op_in_progress)
if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)
The above seems racy to me. I don't think there is any mechanism that prevents that hba->pm_op_in_progress is set after it has been checked and before ufshcd_update_rtc() is called. Has it been considered to add an ufshcd_rpm_get_sync_nowait() call before the hba->pm_op_in_progress check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() call?
Thanks,
Bart.
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote:
- /* Update RTC only when there are no requests in progress and UFSHCI is operational */ - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) + /* + * Update RTC only when + * 1. there are no requests in progress + * 2. UFSHCI is operational + * 3. pm operation is not in progress + */ + if (!ufshcd_is_ufs_dev_busy(hba) && + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && + !hba->pm_op_in_progress) ufshcd_update_rtc(hba); if (ufshcd_is_ufs_dev_active(hba) && hba-
dev_info.rtc_update_period)
The above seems racy to me. I don't think there is any mechanism that prevents that hba->pm_op_in_progress is set after it has been checked and before ufshcd_update_rtc() is called. Has it been considered to add an ufshcd_rpm_get_sync_nowait() call before the hba-
pm_op_in_progress
check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() call?
Thanks,
Bart.
Bart,
do you want this:
diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd- priv.h index ce36154ce963..2b74d6329b9d 100644 --- a/drivers/ufs/core/ufshcd-priv.h +++ b/drivers/ufs/core/ufshcd-priv.h @@ -311,6 +311,25 @@ static inline int ufshcd_update_ee_usr_mask(struct ufs_hba *hba, &hba->ee_drv_mask, set, clr); }
+static inline int ufshcd_rpm_get_sync_nowait(struct ufs_hba *hba) +{ + int ret = 0; + struct device *dev = &hba->ufs_device_wlun->sdev_gendev; + + pm_runtime_get_noresume(dev); + + /* Check if the device is already active */ + if (pm_runtime_active(dev)) + return 0; + + /* Attempt to resume the device without blocking */ + ret = pm_request_resume(dev); + if (ret < 0) + pm_runtime_put_noidle(dev); + + return ret; +} + static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba) { return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev); diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index bea00e069e9a..1b7fc4ce9e5c 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8209,10 +8209,8 @@ static void ufshcd_update_rtc(struct ufs_hba *hba) */ val = ts64.tv_sec - hba->dev_info.rtc_time_baseline;
- ufshcd_rpm_get_sync(hba); err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED, 0, 0, &val); - ufshcd_rpm_put_sync(hba);
if (err) dev_err(hba->dev, "%s: Failed to update rtc %d\n", __func__, err); @@ -8226,10 +8224,14 @@ static void ufshcd_rtc_work(struct work_struct *work)
hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work);
+ if (ufshcd_rpm_get_sync_nowait(hba)) + goto out; + /* Update RTC only when there are no requests in progress and UFSHCI is operational */ if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) ufshcd_update_rtc(hba); - + ufshcd_rpm_put_sync(hba); +out: if (ufshcd_is_ufs_dev_active(hba) && hba-
dev_info.rtc_update_period)
schedule_delayed_work(&hba->ufs_rtc_update_work, msecs_to_jiffies(hba-
dev_info.rtc_update_period));
(END)
or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work); to cancel_delayed_work(&hba->ufs_rtc_update_work); ??
kind regards, Bean
On Mon, 2024-07-15 at 00:37 +0200, Bean Huo wrote:
or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work); to cancel_delayed_work(&hba->ufs_rtc_update_work); ??
kind regards, Bean
Hi Bean,
Using cancel_delayed_work instead of cancel_delayed_work_sync could work, but it may lead to wasted time resume, RTC update, and suspend again. It has increased the system's power consumption.
Thanks. Peter
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote:
External email : Please do not click links or open attachments until you have verified the sender or the content. On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote:
Three have deadlock when runtime suspend wait flush rtc work, and rtc work call ufshcd_rpm_put_sync to wait runtime resume.
"Three have"? The above description is very hard to understand. Please improve it.
Hi Bart,
Sorry, will improve the description next version.
- /* Update RTC only when there are no requests in progress and
UFSHCI is operational */
-if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state ==
UFSHCD_STATE_OPERATIONAL)
- /*
- Update RTC only when
- there are no requests in progress
- UFSHCI is operational
- pm operation is not in progress
- */
+if (!ufshcd_is_ufs_dev_busy(hba) &&
- hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL &&
- !hba->pm_op_in_progress) ufshcd_update_rtc(hba);
if (ufshcd_is_ufs_dev_active(hba) && hba- dev_info.rtc_update_period)
The above seems racy to me. I don't think there is any mechanism that prevents that hba->pm_op_in_progress is set after it has been checked and before ufshcd_update_rtc() is called. Has it been considered to add an ufshcd_rpm_get_sync_nowait() call before the hba-
pm_op_in_progress
check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() call?
Thanks,
Bart.
Yes, check pm_op_in_progress still cannot guarantee the absence of race conditions. But use ufshcd_rpm_get_sync_nowait might be a bit complicated. How about use ufshcd_rpm_get_if_active? I will update next version.
Thanks. Peter
linux-stable-mirror@lists.linaro.org