Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR interrupt to the host and host starts handling SYS_ERR execution.
So by the time the device has completed reset, host starts SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
Cc: stable@vger.kernel.org Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up") Reported-by: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org ---
Changes in v2:
* Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
drivers/bus/mhi/core/pm.c | 32 ++++++++++---------------------- 1 file changed, 10 insertions(+), 22 deletions(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index fb99e3727155..3c347fe9b10d 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -1038,7 +1038,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) enum mhi_ee_type current_ee; enum dev_st_transition next_state; struct device *dev = &mhi_cntrl->mhi_dev->dev; - u32 val; int ret;
dev_info(dev, "Requested to power ON\n"); @@ -1055,10 +1054,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mutex_lock(&mhi_cntrl->pm_mutex); mhi_cntrl->pm_state = MHI_PM_DISABLE;
- ret = mhi_init_irq_setup(mhi_cntrl); - if (ret) - goto error_setup_irq; - /* Setup BHI INTVEC */ write_lock_irq(&mhi_cntrl->pm_lock); mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0); @@ -1072,7 +1067,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) dev_err(dev, "%s is not a valid EE for power on\n", TO_MHI_EXEC_STR(current_ee)); ret = -EIO; - goto error_async_power_up; + goto error_setup_irq; }
state = mhi_get_mhi_state(mhi_cntrl); @@ -1081,20 +1076,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
if (state == MHI_STATE_SYS_ERR) { mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET); - ret = wait_event_timeout(mhi_cntrl->state_event, - MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) || - mhi_read_reg_field(mhi_cntrl, - mhi_cntrl->regs, - MHICTRL, - MHICTRL_RESET_MASK, - MHICTRL_RESET_SHIFT, - &val) || - !val, - msecs_to_jiffies(mhi_cntrl->timeout_ms)); - if (!ret) { - ret = -EIO; + ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL, + MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0, + msecs_to_jiffies(mhi_cntrl->timeout_ms)); + if (ret) { dev_info(dev, "Failed to reset MHI due to syserr state\n"); - goto error_async_power_up; + goto error_setup_irq; }
/* @@ -1104,6 +1091,10 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0); }
+ ret = mhi_init_irq_setup(mhi_cntrl); + if (ret) + goto error_setup_irq; + /* Transition to next state */ next_state = MHI_IN_PBL(current_ee) ? DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY; @@ -1116,9 +1107,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
return 0;
-error_async_power_up: - mhi_deinit_free_irq(mhi_cntrl); - error_setup_irq: mhi_cntrl->pm_state = MHI_PM_DISABLE; mutex_unlock(&mhi_cntrl->pm_mutex);
On 2021-11-08 09:49 AM, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR interrupt to the host and host starts handling SYS_ERR execution.
So by the time the device has completed reset, host starts SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
Cc: stable@vger.kernel.org Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up") Reported-by: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
Changes in v2:
- Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
drivers/bus/mhi/core/pm.c | 32 ++++++++++---------------------- 1 file changed, 10 insertions(+), 22 deletions(-)
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c index fb99e3727155..3c347fe9b10d 100644 --- a/drivers/bus/mhi/core/pm.c +++ b/drivers/bus/mhi/core/pm.c @@ -1038,7 +1038,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) enum mhi_ee_type current_ee; enum dev_st_transition next_state; struct device *dev = &mhi_cntrl->mhi_dev->dev;
- u32 val;
u32 interval_us = 25000; /* poll register field every 25 milliseconds */
int ret;
dev_info(dev, "Requested to power ON\n"); @@ -1055,10 +1054,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mutex_lock(&mhi_cntrl->pm_mutex); mhi_cntrl->pm_state = MHI_PM_DISABLE;
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
goto error_setup_irq;
- /* Setup BHI INTVEC */ write_lock_irq(&mhi_cntrl->pm_lock); mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
Let's also set the BHI INTVEC up _after_ the IRQ setup is done. I believe doing this before setting up IRQs may be racy.
@@ -1072,7 +1067,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) dev_err(dev, "%s is not a valid EE for power on\n", TO_MHI_EXEC_STR(current_ee)); ret = -EIO;
goto error_async_power_up;
goto error_setup_irq;
}
state = mhi_get_mhi_state(mhi_cntrl);
@@ -1081,20 +1076,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
if (state == MHI_STATE_SYS_ERR) { mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
ret = wait_event_timeout(mhi_cntrl->state_event,
MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
mhi_read_reg_field(mhi_cntrl,
mhi_cntrl->regs,
MHICTRL,
MHICTRL_RESET_MASK,
MHICTRL_RESET_SHIFT,
&val) ||
!val,
msecs_to_jiffies(mhi_cntrl->timeout_ms));
if (!ret) {
ret = -EIO;
ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
msecs_to_jiffies(mhi_cntrl->timeout_ms));
Use "interval_us" as the last argument here since this API takes the polling interval value in microseconds as the last argument.
if (ret) { dev_info(dev, "Failed to reset MHI due to syserr state\n");
goto error_async_power_up;
goto error_setup_irq;
}
/*
@@ -1104,6 +1091,10 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0); }
- ret = mhi_init_irq_setup(mhi_cntrl);
- if (ret)
goto error_setup_irq;
- /* Transition to next state */ next_state = MHI_IN_PBL(current_ee) ? DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
@@ -1116,9 +1107,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
return 0;
-error_async_power_up:
- mhi_deinit_free_irq(mhi_cntrl);
error_setup_irq: mhi_cntrl->pm_state = MHI_PM_DISABLE; mutex_unlock(&mhi_cntrl->pm_mutex);
Thanks, Bhaumik --- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Adding same comment in v2 On 11/8/2021 9:49 AM, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR
Device is not triggering the SYS_ERR interrupt, interrupt was triggered due to MHI RESET was getting cleared by device.
interrupt to the host and host starts handling SYS_ERR execution.
"As interrupts are setup, MHI reset results in device clearing the reset and it sends incoming BHI interrupt with state still seen as SYS_ERROR instead of READY."
So by the time the device has completed reset, host starts SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
Cc: stable@vger.kernel.org Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up") Reported-by: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
[..]
On 11/8/2021 12:06 PM, Hemant Kumar wrote:
Adding same comment in v2 On 11/8/2021 9:49 AM, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR
Device is not triggering the SYS_ERR interrupt, interrupt was triggered due to MHI RESET was getting cleared by device.
Shouldn't the device state be RESET and not SYS_ERR at that point?
The device will enter SYS_ERR (and trigger an interrupt for that). Host issues MHI_RESET. Device is expected to clear SYS_ERR and enter the RESET state. Then the device clears MHI_RESET. Device can then trigger an interrupt to signal the state change (per the updated spec).
I was going to add my reviewed-by, but I'm confused by your comment.
interrupt to the host and host starts handling SYS_ERR execution.
"As interrupts are setup, MHI reset results in device clearing the reset and it sends incoming BHI interrupt with state still seen as SYS_ERROR instead of READY."
So by the time the device has completed reset, host starts SYS_ERR handling. This causes the race condition and the modem fails to boot.
Hence, register the IRQ handler only after handling the SYS_ERR check to avoid getting spurious IRQs from the device.
Cc: stable@vger.kernel.org Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up") Reported-by: Aleksander Morgado aleksander@aleksander.es Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
[..]
On 11/17/2021 9:24 AM, Jeffrey Hugo wrote:
On 11/8/2021 12:06 PM, Hemant Kumar wrote:
Adding same comment in v2 On 11/8/2021 9:49 AM, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR
Device is not triggering the SYS_ERR interrupt, interrupt was triggered due to MHI RESET was getting cleared by device.
Shouldn't the device state be RESET and not SYS_ERR at that point?
The device will enter SYS_ERR (and trigger an interrupt for that). Host issues MHI_RESET. Device is expected to clear SYS_ERR and enter the RESET state. Then the device clears MHI_RESET. Device can then trigger an interrupt to signal the state change (per the updated spec).
Dmesg log was showing first sys err was triggered by device, as part of sys error handling host was setting MHI_RESET and expecting to get BHI Intvec. When BHI intvec was triggered by device, host handled it by checking the MHI status register. MHi status was still showing SYS_ERR being set (which was supposed to get cleared after host issuing MHI RESET). Due to that host side bhi intvec threaded handler took diff path to handle sys error again. This is what we are trying to avoid as we think for some reason device is not behaving as per spec and either setting sys err again or not clearing it by the time bhi intvec (for reset clear) is handled by host.
I was going to add my reviewed-by, but I'm confused by your comment.
[..]
Thanks, Hemant
On Wed, Nov 17, 2021 at 12:20:34PM -0800, Hemant Kumar wrote:
On 11/17/2021 9:24 AM, Jeffrey Hugo wrote:
On 11/8/2021 12:06 PM, Hemant Kumar wrote:
Adding same comment in v2 On 11/8/2021 9:49 AM, Manivannan Sadhasivam wrote:
Some devices tend to trigger SYS_ERR interrupt while the host handling SYS_ERR state of the device during power up. This creates a race condition and causes a failure in booting up the device.
The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR handling in mhi_async_power_up(). Once the host detects that the device is in SYS_ERR state, it issues MHI_RESET and waits for the device to process the reset request. During this time, the device triggers SYS_ERR
Device is not triggering the SYS_ERR interrupt, interrupt was triggered due to MHI RESET was getting cleared by device.
Shouldn't the device state be RESET and not SYS_ERR at that point?
The device will enter SYS_ERR (and trigger an interrupt for that). Host issues MHI_RESET. Device is expected to clear SYS_ERR and enter the RESET state. Then the device clears MHI_RESET. Device can then trigger an interrupt to signal the state change (per the updated spec).
Dmesg log was showing first sys err was triggered by device, as part of sys error handling host was setting MHI_RESET and expecting to get BHI Intvec. When BHI intvec was triggered by device, host handled it by checking the MHI status register. MHi status was still showing SYS_ERR being set (which was supposed to get cleared after host issuing MHI RESET). Due to that host side bhi intvec threaded handler took diff path to handle sys error again. This is what we are trying to avoid as we think for some reason device is not behaving as per spec and either setting sys err again or not clearing it by the time bhi intvec (for reset clear) is handled by host.
If you look at the initial log shared by Aleksander you can see that there was no IRQ from device until the host resets the device during powerup.
[ 7.060730] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x [ 7.067906] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit] [ 7.076455] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002) [ 7.083277] mhi-pci-generic 0000:01:00.0: using shared MSI [ 7.089508] mhi mhi0: Requested to power ON [ 7.094080] mhi mhi0: Attempting power on with EE: PASS THROUGH, state: SYS ERROR [ 7.180371] mhi mhi0: local ee: INVALID_EE state: RESET device ee: PASS THROUGH state: SYS ERROR
So once the host detects that the device is in SYS_ERR state (might have happended during the warm reboot and the device went to a bad state), host resets the device.
I'm going by the powerup scenario and in that case, the commit description is valid.
Thanks, Mani
I was going to add my reviewed-by, but I'm confused by your comment.
[..]
Thanks, Hemant
-- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
linux-stable-mirror@lists.linaro.org