It seems that the firmware of the 88W8897 card sometimes ignores or misses our attempts to wake it up by writing to the firmware status register. This leads to the firmware wakeup timeout expiring and the driver resetting the card because we assume the firmware has hung up or crashed (unfortunately that's not unlikely with this card).
Turns out that most of the time the firmware actually didn't hang up, but simply "missed" our wakeup request and didn't send us an AWAKE event.
Writing to the firmware status register again after a short delay usually makes the firmware wake up as expected, so add a small retry loop to mwifiex_pm_wakeup_card() that looks at the interrupt status to check whether the card woke up.
The number of tries and timeout lengths for this were determined experimentally: The firmware usually takes about 500 us to wake up after we write to the status register. In some cases where the firmware is very busy (for example while doing a Bluetooth scan) it might even miss our requests for multiple milliseconds, which is why after 15 tries the waiting time gets increased to 10 ms. The maximum number of tries it took to wake the firmware when testing this was around 20, so a maximum number of 50 tries should give us plenty of safety margin.
A good reproducer for this issue is letting the firmware sleep and wake up in very short intervals, for example by pinging a device on the network every 0.1 seconds.
Cc: stable@vger.kernel.org
Signed-off-by: Jonas Dreßler <verdre@v0yd.nl>
---
 drivers/net/wireless/marvell/mwifiex/pcie.c | 33 +++++++++++++++++----
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index 0eff717ac5fa..7fea319e013c 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -661,11 +661,15 @@ static void mwifiex_delay_for_sleep_cookie(struct mwifiex_adapter *adapter,
 			    "max count reached while accessing sleep cookie\n");
 }
 
+#define N_WAKEUP_TRIES_SHORT_INTERVAL 15
+#define N_WAKEUP_TRIES_LONG_INTERVAL 35
+
 /* This function wakes up the card by reading fw_status register. */
 static int mwifiex_pm_wakeup_card(struct mwifiex_adapter *adapter)
 {
 	struct pcie_service_card *card = adapter->card;
 	const struct mwifiex_pcie_card_reg *reg = card->pcie.reg;
+	int n_tries = 0;
 
 	mwifiex_dbg(adapter, EVENT,
 		    "event: Wakeup device...\n");
@@ -673,12 +677,29 @@ static int mwifiex_pm_wakeup_card(struct mwifiex_adapter *adapter)
 	if (reg->sleep_cookie)
 		mwifiex_pcie_dev_wakeup_delay(adapter);
 
-	/* Accessing fw_status register will wakeup device */
-	if (mwifiex_write_reg(adapter, reg->fw_status, FIRMWARE_READY_PCIE)) {
-		mwifiex_dbg(adapter, ERROR,
-			    "Writing fw_status register failed\n");
-		return -1;
-	}
+	/* Access the fw_status register to wake up the device.
+	 * Since the 88W8897 firmware sometimes appears to ignore or miss
+	 * that wakeup request, we continue trying until we receive an
+	 * interrupt from the card.
+	 */
+	do {
+		if (mwifiex_write_reg(adapter, reg->fw_status, FIRMWARE_READY_PCIE)) {
+			mwifiex_dbg(adapter, ERROR,
+				    "Writing fw_status register failed\n");
+			return -EIO;
+		}
+
+		n_tries++;
+
+		if (n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL)
+			usleep_range(400, 700);
+		else
+			msleep(10);
+	} while (n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL + N_WAKEUP_TRIES_LONG_INTERVAL &&
+		 READ_ONCE(adapter->int_status) == 0);
+
+	mwifiex_dbg(adapter, EVENT,
+		    "event: Tried %d times until firmware woke up\n", n_tries);
 
 	if (reg->sleep_cookie) {
 		mwifiex_pcie_dev_wakeup_delay(adapter);
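To make the retry schedule of the patch concrete, here is a minimal userspace sketch of the same do/while control flow, with the sleeps replaced by bookkeeping. The fake_card type and the fake_write_fw_status() and simulate_wakeup() helpers are hypothetical stand-ins for illustration only, not driver code, and the wait figures are nominal: real usleep_range() and msleep() calls can overshoot. Reading the loop condition literally, a card that never answers appears to be written to 51 times rather than 50, because the `<=` comparison runs after n_tries has already been incremented.

```c
#include <stdbool.h>

#define N_WAKEUP_TRIES_SHORT_INTERVAL 15
#define N_WAKEUP_TRIES_LONG_INTERVAL 35

/* Hypothetical stand-in for the card: it raises its interrupt once it has
 * seen `writes_needed` wakeup writes (0 means it never wakes up).
 */
struct fake_card {
	int writes_needed;
	int writes_seen;
	bool int_status;
};

static void fake_write_fw_status(struct fake_card *card)
{
	card->writes_seen++;
	if (card->writes_needed && card->writes_seen >= card->writes_needed)
		card->int_status = true;
}

struct wakeup_result {
	int n_tries;          /* register writes issued */
	long nominal_min_us;  /* sum of the minimum sleep lengths */
	long nominal_max_us;  /* sum of the maximum sleep lengths */
};

/* Same control flow as the do/while loop in the patch; each sleep is
 * recorded instead of executed so the schedule can be inspected.
 */
static struct wakeup_result simulate_wakeup(struct fake_card *card)
{
	struct wakeup_result res = { 0, 0, 0 };

	do {
		fake_write_fw_status(card);
		res.n_tries++;

		if (res.n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL) {
			res.nominal_min_us += 400;	/* usleep_range(400, 700) */
			res.nominal_max_us += 700;
		} else {
			res.nominal_min_us += 10000;	/* msleep(10) */
			res.nominal_max_us += 10000;
		}
	} while (res.n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL + N_WAKEUP_TRIES_LONG_INTERVAL &&
		 !card->int_status);

	return res;
}
```

With this model, a card that needs 20 writes (the worst case reported in the commit message) nominally waits between 56 ms and 60.5 ms, and a card that never wakes up keeps the caller busy for roughly 366 ms to 370.5 ms before the loop gives up.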
On Tue, Sep 14, 2021 at 01:48:13PM +0200, Jonas Dreßler wrote:
...
> +	do {
> +		if (mwifiex_write_reg(adapter, reg->fw_status, FIRMWARE_READY_PCIE)) {
> +			mwifiex_dbg(adapter, ERROR,
> +				    "Writing fw_status register failed\n");
> +			return -EIO;
> +		}
> +
> +		n_tries++;
> +
> +		if (n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL)
> +			usleep_range(400, 700);
> +		else
> +			msleep(10);
> +	} while (n_tries <= N_WAKEUP_TRIES_SHORT_INTERVAL + N_WAKEUP_TRIES_LONG_INTERVAL &&
> +		 READ_ONCE(adapter->int_status) == 0);
Can't you use read_poll_timeout() twice instead of this custom approach?
> +	mwifiex_dbg(adapter, EVENT,
> +		    "event: Tried %d times until firmware woke up\n", n_tries);
On 9/22/21 1:19 PM, Andy Shevchenko wrote:
> On Tue, Sep 14, 2021 at 01:48:13PM +0200, Jonas Dreßler wrote:
...
> Can't you use read_poll_timeout() twice instead of this custom approach?
I've tried this now, but read_poll_timeout() is not ideal for our use-case. What we'd need would be read->sleep->poll->repeat instead of read->poll->sleep->repeat. With read_poll_timeout() we always end up doing one more (unnecessary) write.
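The ordering argument can be illustrated with a small simulated-time sketch. It assumes, purely for illustration, that the firmware raises its interrupt 500 us after the first wakeup write and never misses a write; the sim type and the helper names are hypothetical, and nothing here calls the real read_poll_timeout(). The first function uses the write, check, sleep order that a read_poll_timeout()-style loop follows when the write is passed as the polled operation; the second uses the write, sleep, check order of the patch's do/while loop.

```c
#include <stdbool.h>

#define WAKE_LATENCY_US 500	/* assumed firmware wake latency */
#define SLEEP_US        600	/* simulated sleep between attempts */
#define MAX_WRITES      50	/* safety bound for the simulation */

struct sim {
	long now_us;		/* simulated clock */
	long first_write_us;	/* -1 until the first write happens */
	int writes;		/* wakeup writes issued so far */
};

static void sim_init(struct sim *s)
{
	s->now_us = 0;
	s->first_write_us = -1;
	s->writes = 0;
}

static void sim_write(struct sim *s)
{
	if (s->first_write_us < 0)
		s->first_write_us = s->now_us;
	s->writes++;
}

/* The interrupt is visible once WAKE_LATENCY_US has passed since the
 * first write.
 */
static bool sim_awake(const struct sim *s)
{
	return s->first_write_us >= 0 &&
	       s->now_us - s->first_write_us >= WAKE_LATENCY_US;
}

/* write -> check -> sleep -> repeat */
static int writes_check_then_sleep(void)
{
	struct sim s;

	sim_init(&s);
	while (s.writes < MAX_WRITES) {
		sim_write(&s);
		if (sim_awake(&s))
			break;
		s.now_us += SLEEP_US;
	}
	return s.writes;
}

/* write -> sleep -> check -> repeat */
static int writes_sleep_then_check(void)
{
	struct sim s;

	sim_init(&s);
	do {
		sim_write(&s);
		s.now_us += SLEEP_US;
	} while (s.writes < MAX_WRITES && !sim_awake(&s));
	return s.writes;
}
```

Under these assumptions the first variant issues two writes and the second issues one: the check that directly follows the first write can never succeed, because the firmware has had no time to react yet.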
On Thu, Sep 30, 2021 at 08:04:00PM +0200, Jonas Dreßler wrote:
> On 9/22/21 1:19 PM, Andy Shevchenko wrote:
> > On Tue, Sep 14, 2021 at 01:48:13PM +0200, Jonas Dreßler wrote:
...
> > Can't you use read_poll_timeout() twice instead of this custom approach?
> I've tried this now, but read_poll_timeout() is not ideal for our use-case. What we'd need would be read->sleep->poll->repeat instead of read->poll->sleep->repeat. With read_poll_timeout() we always end up doing one more (unnecessary) write.
First of all, there is a parameter to get sleep beforehand. Second, what is the problem with having one write more or less? Your current code doesn't guarantee this either. It only decreases probability of such scenario. Am I wrong?
On 9/30/21 10:58 PM, Andy Shevchenko wrote:
> On Thu, Sep 30, 2021 at 08:04:00PM +0200, Jonas Dreßler wrote:
> > On 9/22/21 1:19 PM, Andy Shevchenko wrote:
> > > On Tue, Sep 14, 2021 at 01:48:13PM +0200, Jonas Dreßler wrote:
...
> > > Can't you use read_poll_timeout() twice instead of this custom approach?
> > I've tried this now, but read_poll_timeout() is not ideal for our use-case. What we'd need would be read->sleep->poll->repeat instead of read->poll->sleep->repeat. With read_poll_timeout() we always end up doing one more (unnecessary) write.
> First of all, there is a parameter to get sleep beforehand.
Sleeping beforehand will sleep before doing the first write, so that's just wasted time.
> Second, what is the problem with having one write more or less? Your current code doesn't guarantee this either. It only decreases probability of such scenario. Am I wrong?
Indeed my approach just decreases the probability and we sometimes end up writing twice to wakeup the card, but it would kinda bug me if we'd always do one write too much.
Anyway, if you still prefer the read_poll_timeout() solution I'd be alright with that of course.
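The effect of the sleep-before option discussed above can be sketched in the same simulated-time style. The function below only imitates the control flow of read_poll_timeout() (optional initial sleep, then operation, condition, timeout check, sleep) with the wakeup write as the polled operation; poll_wakeup(), poll_result and the 500 us wake latency are assumptions made for illustration, and the real macro's extra operation call after a timeout is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define WAKE_LATENCY_US 500	/* assumed firmware wake latency */

struct poll_result {
	int writes;		/* wakeup writes issued */
	long elapsed_us;	/* simulated time when the loop returned */
	bool woke;		/* true if the interrupt was observed */
};

/* Userspace imitation of a read_poll_timeout()-style loop; time is
 * simulated, nothing actually sleeps.
 */
static struct poll_result poll_wakeup(long sleep_us, long timeout_us,
				      bool sleep_before_read)
{
	struct poll_result r = { 0, 0, false };
	long first_write_us = -1;

	assert(sleep_us > 0);	/* otherwise simulated time never advances */

	/* The optional initial sleep happens once, before the first write. */
	if (sleep_before_read)
		r.elapsed_us += sleep_us;

	for (;;) {
		/* op(args): the wakeup write */
		if (first_write_us < 0)
			first_write_us = r.elapsed_us;
		r.writes++;

		/* cond: has the interrupt arrived yet? */
		r.woke = r.elapsed_us - first_write_us >= WAKE_LATENCY_US;
		if (r.woke || r.elapsed_us > timeout_us)
			break;

		r.elapsed_us += sleep_us;
	}
	return r;
}
```

Calling poll_wakeup(600, 7500, false) in this model returns after two writes at 600 us, while poll_wakeup(600, 7500, true) also needs two writes but returns at 1200 us, which matches the objection that sleeping beforehand only delays the first write.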
On Thu, Sep 30, 2021 at 11:07:09PM +0200, Jonas Dreßler wrote:
> On 9/30/21 10:58 PM, Andy Shevchenko wrote:
> > On Thu, Sep 30, 2021 at 08:04:00PM +0200, Jonas Dreßler wrote:
...
> > Second, what is the problem with having one write more or less? Your current code doesn't guarantee this either. It only decreases probability of such scenario. Am I wrong?
> Indeed my approach just decreases the probability and we sometimes end up writing twice to wakeup the card, but it would kinda bug me if we'd always do one write too much.
> Anyway, if you still prefer the read_poll_timeout() solution I'd be alright with that of course.
Yes, it will make code cleaner.
On 9/14/21 13:48, Jonas Dreßler wrote:
...
So I think I have another solution that might be a lot more elegant, how about this:
try_again:
	n_tries++;
	mwifiex_write_reg(adapter, reg->fw_status, FIRMWARE_READY_PCIE);
	if (wait_event_interruptible_timeout(adapter->card_wakeup_wait_q,
					     READ_ONCE(adapter->int_status) != 0,
					     WAKEUP_TRY_AGAIN_TIMEOUT) == 0 &&
	    n_tries < MAX_N_WAKEUP_TRIES) {
		goto try_again;
	}
and then call wake_up_interruptible() in the mwifiex_interrupt_status() interrupt handler.
This solution should make sure we always keep wakeup latency to a minimum and can still retry the register write if things didn't work.
Hi,
On Sun, Oct 3, 2021 at 2:18 AM Jonas Dreßler <verdre@v0yd.nl> wrote:
> So I think I have another solution that might be a lot more elegant, how about this:
> try_again:
> 	n_tries++;
> 	mwifiex_write_reg(adapter, reg->fw_status, FIRMWARE_READY_PCIE);
> 	if (wait_event_interruptible_timeout(adapter->card_wakeup_wait_q,
> 					     READ_ONCE(adapter->int_status) != 0,
> 					     WAKEUP_TRY_AGAIN_TIMEOUT) == 0 &&
> 	    n_tries < MAX_N_WAKEUP_TRIES) {
> 		goto try_again;
> 	}
Isn't wait_event_interruptible_timeout()'s timeout in jiffies, which is not necessarily that predictable, and also a lot more coarse-grained than we want? (As in, if HZ=100, we're looking at precision on the order of 10ms, whereas the expected wakeup latency is ~6ms.) That would be OK for well-behaved PCI cases, where we never miss a write, but it could ~double your latency for your bad systems that will need more than one run of the loop.
Also, feels like a do/while could be cleaner, but that's a lesser detail.
> and then call wake_up_interruptible() in the mwifiex_interrupt_status() interrupt handler.
> This solution should make sure we always keep wakeup latency to a minimum and can still retry the register write if things didn't work.
Brian
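The granularity concern about jiffies-based timeouts can be checked with a little arithmetic. The sketch below rounds a millisecond timeout up to whole timer ticks for a given HZ, which is how the kernel's msecs_to_jiffies() is understood to behave for ordinary values; ms_to_ticks() and ticks_to_us() are hypothetical userspace helpers, and the 6 ms figure is the latency quoted in the message above.

```c
#include <assert.h>

/* Round a millisecond timeout up to whole ticks at the given tick rate. */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999) / 1000;
}

/* Nominal length of `ticks` timer ticks in microseconds; assumes hz
 * divides 1000000 evenly, as the common HZ choices do.
 */
static unsigned long ticks_to_us(unsigned long ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * (1000000UL / hz);
}
```

For a requested 6 ms timeout this gives 1 tick (10 ms) at HZ=100, 2 ticks (8 ms) at HZ=250 and 6 ticks (6 ms) at HZ=1000, so the retry interval of a wait-queue based loop would depend on the kernel configuration.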
linux-stable-mirror@lists.linaro.org