With recent changes in AOSP, adb now uses asynchronous I/O, which
causes the following crash, usually on reboot:
[ 184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[ 184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[ 184.316034] Preemption disabled at:
[ 184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[ 184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S 4.19.43-00669-g8e4970572c43-dirty #356
[ 184.334963] Hardware name: HiKey960 (DT)
[ 184.338892] Call trace:
[ 184.341352] dump_backtrace+0x0/0x158
[ 184.345025] show_stack+0x14/0x20
[ 184.348355] dump_stack+0x80/0xa4
[ 184.351685] __schedule_bug+0x6c/0xc0
[ 184.355363] __schedule+0x64c/0x978
[ 184.358863] schedule+0x2c/0x90
[ 184.362053] dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[ 184.367210] usb_ep_dequeue+0x24/0xf8
[ 184.370884] ffs_aio_cancel+0x3c/0x80
[ 184.374561] free_ioctx_users+0x40/0x148
[ 184.378500] percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[ 184.383830] rcu_process_callbacks+0x24c/0x5d8
[ 184.388283] __do_softirq+0x13c/0x398
[ 184.391959] run_ksoftirqd+0x3c/0x48
[ 184.395549] smpboot_thread_fn+0x220/0x288
[ 184.399660] kthread+0x12c/0x130
[ 184.402901] ret_from_fork+0x10/0x1c
This happens because usb_ep_dequeue() can be called in interrupt
context (softirq in the trace above), and dwc3_gadget_ep_dequeue()
then calls wait_event_lock_irq(), which can sleep.
Upstream kernels are not affected due to the change
fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
removes the wait_event_lock_irq() code. Unfortunately that change
has a number of dependencies, which I'm submitting here.
Also, to match upstream, in this series I've reverted one
change that was backported to -stable, to replace it with the
cherry-picked upstream commit (as the dependencies are now
there).
This issue also affects 4.14, 4.9 and, I believe, 4.4 kernels;
however, I don't know how best to backport this functionality
that far back. Help from the maintainers would be very much
appreciated!
New in v2:
* Reordered the patchset to put the revert patch first, which
avoids any bisection build issues. (Thanks to Jack Pham for
the suggestion!)
Feedback and comments would be welcome!
thanks
-john
Cc: Fei Yang <fei.yang(a)intel.com>
Cc: Sam Protsenko <semen.protsenko(a)linaro.org>
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: Jack Pham <jackp(a)codeaurora.org>
Cc: linux-usb(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 4.19.y
Felipe Balbi (7):
usb: dwc3: gadget: combine unaligned and zero flags
usb: dwc3: gadget: track number of TRBs per request
usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
usb: dwc3: gadget: introduce cancelled_list
usb: dwc3: gadget: move requests to cancelled_list
usb: dwc3: gadget: remove wait_end_transfer
Jack Pham (1):
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
John Stultz (1):
Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"
drivers/usb/dwc3/core.h | 15 ++--
drivers/usb/dwc3/gadget.c | 158 +++++++++++++-------------------------
drivers/usb/dwc3/gadget.h | 15 ++++
3 files changed, 75 insertions(+), 113 deletions(-)
--
2.17.1
From: Wang Xin <xin.wang7(a)cn.bosch.com>
eeprom: at24: fix unexpected timeout under high load
Within at24_loop_until_timeout() the timestamp used for timeout checking
is recorded after the I2C transfer and usleep_range(). Under high CPU
load the execution time of either the I2C transfer or usleep_range() can
actually be larger than the timeout value. In the worst case the I2C transfer
is only tried once, because the loop exits due to the timeout
although the EEPROM is ready by then.
To fix this issue the timestamp is recorded at the beginning of each
iteration. That is, before I2C transfer and sleep. Then the timeout
is actually checked against the timestamp of the previous iteration.
This makes sure that even if the timeout is reached, there is still one
more chance to try the I2C transfer in case the EEPROM is ready.
Example:
If you have a system which combines high CPU load with repeated EEPROM
writes you will run into the following scenario.
- System makes a successful regmap_bulk_write() to EEPROM.
- System wants to perform another write to EEPROM but EEPROM is still
busy with the last write.
- Because of high CPU load the usleep_range() will sleep more than
25 ms (at24_write_timeout).
- During the overlong sleep the EEPROM finishes the previous write
operation and is ready again.
- at24_loop_until_timeout() will detect a timeout and won't try the write again.
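The scenario above can be reproduced in a small user-space sketch. This is
not driver code: the simulated millisecond clock, struct sim, sim_transfer(),
old_loop() and new_loop() are hypothetical names made up for illustration,
and the 1 ms transfer cost is an assumption. old_loop() mimics the
at24_loop_until_timeout() macro (timestamp taken after the sleep), while
new_loop() mimics the do/while loop introduced by this patch (timestamp
taken before the transfer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TIMEOUT_MS 25UL	/* stands in for at24_write_timeout */

/* Hypothetical simulation state, not part of the at24 driver. */
struct sim {
	unsigned long now;	/* simulated clock in ms */
	unsigned long ready_at;	/* time at which the EEPROM is ready again */
	unsigned long sleep_ms;	/* how long one sleep really takes under load */
	int attempts;		/* number of transfers tried so far */
};

/* One simulated I2C transfer: succeeds only if the EEPROM is ready. */
static bool sim_transfer(struct sim *s)
{
	bool ok;

	assert(s != NULL);
	s->attempts++;
	ok = s->now >= s->ready_at;
	s->now += 1;		/* assumption: a transfer costs 1 ms */
	return ok;
}

/*
 * Old behaviour: the timestamp is taken after transfer and sleep, and the
 * timeout is checked before the next transfer.
 * The real code uses time_before() on jiffies; a plain '<' is enough here
 * because the simulated clock never wraps.
 */
static bool old_loop(struct sim *s)
{
	unsigned long tout = s->now + TIMEOUT_MS;
	unsigned long op_time = 0;
	bool first = true;

	while (first || op_time < tout) {
		first = false;
		if (sim_transfer(s))
			return true;
		s->now += s->sleep_ms;	/* overlong sleep under load */
		op_time = s->now;
	}
	return false;
}

/*
 * New behaviour: the timestamp is taken before the transfer, so the check
 * uses the start time of the iteration that just finished.
 */
static bool new_loop(struct sim *s)
{
	unsigned long tout = s->now + TIMEOUT_MS;
	unsigned long stamp;

	do {
		stamp = s->now;
		if (sim_transfer(s))
			return true;
		s->now += s->sleep_ms;	/* overlong sleep under load */
	} while (stamp < tout);
	return false;
}
```

With now = 0, ready_at = 5 and sleep_ms = 30, old_loop() gives up after a
single attempt, while new_loop() succeeds on its second attempt.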
Cc: <stable(a)vger.kernel.org> # 4.19.x
Signed-off-by: Wang Xin <xin.wang7(a)cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas(a)de.bosch.com>
---
drivers/misc/eeprom/at24.c | 43 +++++++++++++++++++-------------------
1 file changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c
index 94836fcbe721..ddfcf4ade7bf 100644
--- a/drivers/misc/eeprom/at24.c
+++ b/drivers/misc/eeprom/at24.c
@@ -106,23 +106,6 @@ static unsigned int at24_write_timeout = 25;
module_param_named(write_timeout, at24_write_timeout, uint, 0);
MODULE_PARM_DESC(at24_write_timeout, "Time (in ms) to try writes (default 25)");
-/*
- * Both reads and writes fail if the previous write didn't complete yet. This
- * macro loops a few times waiting at least long enough for one entire page
- * write to work while making sure that at least one iteration is run before
- * checking the break condition.
- *
- * It takes two parameters: a variable in which the future timeout in jiffies
- * will be stored and a temporary variable holding the time of the last
- * iteration of processing the request. Both should be unsigned integers
- * holding at least 32 bits.
- */
-#define at24_loop_until_timeout(tout, op_time) \
- for (tout = jiffies + msecs_to_jiffies(at24_write_timeout), \
- op_time = 0; \
- op_time ? time_before(op_time, tout) : true; \
- usleep_range(1000, 1500), op_time = jiffies)
-
struct at24_chip_data {
/*
* these fields mirror their equivalents in
@@ -311,13 +294,22 @@ static ssize_t at24_regmap_read(struct at24_data *at24, char *buf,
/* adjust offset for mac and serial read ops */
offset += at24->offset_adj;
- at24_loop_until_timeout(timeout, read_time) {
+ timeout = jiffies + msecs_to_jiffies(at24_write_timeout);
+ do {
+ /*
+ * The timestamp shall be taken before the actual operation
+ * to avoid a premature timeout in case of high CPU load.
+ */
+ read_time = jiffies;
+
ret = regmap_bulk_read(regmap, offset, buf, count);
dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
count, offset, ret, jiffies);
if (!ret)
return count;
- }
+
+ usleep_range(1000, 1500);
+ } while (time_before(read_time, timeout));
return -ETIMEDOUT;
}
@@ -361,14 +353,23 @@ static ssize_t at24_regmap_write(struct at24_data *at24, const char *buf,
regmap = at24_client->regmap;
client = at24_client->client;
count = at24_adjust_write_count(at24, offset, count);
+ timeout = jiffies + msecs_to_jiffies(at24_write_timeout);
+
+ do {
+ /*
+ * The timestamp shall be taken before the actual operation
+ * to avoid a premature timeout in case of high CPU load.
+ */
+ write_time = jiffies;
- at24_loop_until_timeout(timeout, write_time) {
ret = regmap_bulk_write(regmap, offset, buf, count);
dev_dbg(&client->dev, "write %zu@%d --> %d (%ld)\n",
count, offset, ret, jiffies);
if (!ret)
return count;
- }
+
+ usleep_range(1000, 1500);
+ } while (time_before(write_time, timeout));
return -ETIMEDOUT;
}
--
2.17.1
Since commit ed194d136769 ("usb: core: remove local_irq_save() around
->complete() handler") the handlers rt2x00usb_interrupt_rxdone() and
rt2x00usb_interrupt_txdone() no longer run with interrupts disabled,
so they are not guaranteed to run to completion before
workqueue processing starts. Therefore, only mark entries ready for workqueue
processing after proper accounting in the DMA done queue.
Note that rt2x00usb_work_rxdone() processes all available entries, not
only those for which queue_work() was called.
This fixes a regression on an RT5370-based wifi stick in AP mode, which
suddenly stopped data transmission after some period of heavy load. Also,
stopping the hanging hostapd resulted in the error message "ieee80211
phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
Other operation modes are probably affected as well; this was just
the test case used.
Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
Cc: Stanislaw Gruszka <sgruszka(a)redhat.com>
Cc: Helmut Schaa <helmut.schaa(a)googlemail.com>
Cc: Kalle Valo <kvalo(a)codeaurora.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: linux-wireless(a)vger.kernel.org
Cc: netdev(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 4.20+
Signed-off-by: Soeren Moch <smoch(a)web.de>
---
drivers/net/wireless/ralink/rt2x00/rt2x00dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
index 1b08b01db27b..9c102a501ee6 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
@@ -263,9 +263,9 @@ EXPORT_SYMBOL_GPL(rt2x00lib_dmastart);
void rt2x00lib_dmadone(struct queue_entry *entry)
{
- set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);
clear_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags);
rt2x00queue_index_inc(entry, Q_INDEX_DMA_DONE);
+ set_bit(ENTRY_DATA_STATUS_PENDING, &entry->flags);
}
EXPORT_SYMBOL_GPL(rt2x00lib_dmadone);
--
2.17.1