With recent changes in AOSP, adb now uses asynchronous I/O, which
causes the following crash, usually on reboot:
[ 184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[ 184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[ 184.316034] Preemption disabled at:
[ 184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[ 184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S 4.19.43-00669-g8e4970572c43-dirty #356
[ 184.334963] Hardware name: HiKey960 (DT)
[ 184.338892] Call trace:
[ 184.341352] dump_backtrace+0x0/0x158
[ 184.345025] show_stack+0x14/0x20
[ 184.348355] dump_stack+0x80/0xa4
[ 184.351685] __schedule_bug+0x6c/0xc0
[ 184.355363] __schedule+0x64c/0x978
[ 184.358863] schedule+0x2c/0x90
[ 184.362053] dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[ 184.367210] usb_ep_dequeue+0x24/0xf8
[ 184.370884] ffs_aio_cancel+0x3c/0x80
[ 184.374561] free_ioctx_users+0x40/0x148
[ 184.378500] percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[ 184.383830] rcu_process_callbacks+0x24c/0x5d8
[ 184.388283] __do_softirq+0x13c/0x398
[ 184.391959] run_ksoftirqd+0x3c/0x48
[ 184.395549] smpboot_thread_fn+0x220/0x288
[ 184.399660] kthread+0x12c/0x130
[ 184.402901] ret_from_fork+0x10/0x1c
This happens as usb_ep_dequeue can be called in interrupt
context, and dwc3_gadget_ep_dequeue() then calls
wait_event_lock_irq() which can sleep.
Upstream kernels are not affected due to the change
fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
removes the wait_event_lock_irq() code. Unfortunately that change
has a number of dependencies, which I'm submitting here.
Also, to match upstream, in this series I've reverted one
change that was backported to -stable, to replace it with the
cherry-picked upstream commit (as the dependencies are now
there).
This issue also affects 4.14, 4.9 and I believe 4.4 kernels,
however I don't know how to best backport this functionality
that far back. Help from the maintainers would be very much
appreciated!
Feedback and comments would be welcome!
thanks
-john
Cc: Fei Yang <fei.yang(a)intel.com>
Cc: Sam Protsenko <semen.protsenko(a)linaro.org>
Cc: Felipe Balbi <balbi(a)kernel.org>
Cc: linux-usb(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 4.19.y
Felipe Balbi (7):
usb: dwc3: gadget: combine unaligned and zero flags
usb: dwc3: gadget: track number of TRBs per request
usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
usb: dwc3: gadget: introduce cancelled_list
usb: dwc3: gadget: move requests to cancelled_list
usb: dwc3: gadget: remove wait_end_transfer
Jack Pham (1):
usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup
John Stultz (1):
Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"
drivers/usb/dwc3/core.h | 15 ++--
drivers/usb/dwc3/gadget.c | 158 +++++++++++++-------------------------
drivers/usb/dwc3/gadget.h | 15 ++++
3 files changed, 75 insertions(+), 113 deletions(-)
--
2.17.1
The patch titled
Subject: mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
has been removed from the -mm tree. Its filename was
mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3.patch
This patch was dropped because it was folded into mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge.patch
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
- add PageHuge check in dissolve_free_huge_page() outside hugetlb_lock
- update comment on dissolve_free_huge_page() about return value
Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp…
Reported-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge-v3
+++ a/mm/hugetlb.c
@@ -1510,14 +1510,22 @@ static int free_pool_huge_page(struct hs
/*
* Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * dissolution fails because a give page is not a free hugepage, or because
- * free hugepages are fully reserved.
+ * nothing for in-use hugepages and non-hugepages.
+ * This function returns values like below:
+ *
+ * -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
+ * (allocated or reserved.)
+ * 0: successfully dissolved free hugepages or the page is not a
+ * hugepage (considered as already dissolved)
*/
int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+ /* Not to disrupt normal path by vainly holding hugetlb_lock */
+ if (!PageHuge(page))
+ return 0;
+
spin_lock(&hugetlb_lock);
if (!PageHuge(page)) {
rc = 0;
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
mm-soft-offline-return-ebusy-if-set_hwpoison_free_buddy_page-fails.patch
mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge.patch
In IEC 61883-6, 8 MIDI data streams are multiplexed into a single
MIDI conformant data channel. The index of a stream is calculated as
the value of the data block counter modulo 8.
In fireworks, the value of the data block counter in the CIP header has a
quirk with firmware versions v5.0.0, v5.7.3 and v5.8.0. This causes the ALSA
IEC 61883-1/6 packet streaming engine to fail to detect MIDI
messages.
This commit fixes the missed detection by adjusting the value of the data
block counter used for the modulo calculation.
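The arithmetic can be sketched in plain C as below. This is only an illustration of the modulo calculation, not the driver code: the function names are made up for this example, first_dbc stands in for s->ctx_data.tx.first_dbc from the diff, and the requirement that it does not exceed 8 is an assumption of the sketch.

```c
#include <assert.h>

#define MIDI_STREAMS_PER_CHANNEL 8u

/* Stream index as described by IEC 61883-6: counter value modulo 8. */
unsigned int demo_midi_port_plain(unsigned int dbc, unsigned int frame)
{
	return (dbc + frame) % MIDI_STREAMS_PER_CHANNEL;
}

/*
 * Stream index with the adjustment from this patch: the first observed
 * counter value is subtracted before the modulo. Adding 8 first keeps
 * the unsigned expression from wrapping when first_dbc > dbc + frame.
 */
unsigned int demo_midi_port_adjusted(unsigned int first_dbc,
				     unsigned int dbc, unsigned int frame)
{
	/* Assumption of this sketch: first_dbc is in the range 0..8. */
	assert(first_dbc <= MIDI_STREAMS_PER_CHANNEL);
	return (MIDI_STREAMS_PER_CHANNEL - first_dbc + dbc + frame)
		% MIDI_STREAMS_PER_CHANNEL;
}
```

With first_dbc equal to 0 both functions agree; with a non-zero first_dbc the adjusted index is shifted back by that amount.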
For maintainers, this bug has existed since commit 18f5ed365d3f ("ALSA:
fireworks/firewire-lib: add support for recent firmware quirk") in Linux
kernel v4.2. There have been many changes since that commit. This fix can be
backported to Linux kernel v4.4 or later. I tagged a base commit for the
backport for your convenience.
Fixes: df075feefbd3 ("ALSA: firewire-lib: complete AM824 data block processing layer")
Cc: <stable(a)vger.kernel.org> # v4.4+
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/amdtp-am824.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/firewire/amdtp-am824.c b/sound/firewire/amdtp-am824.c
index 7019a2143581..623d014c0e7e 100644
--- a/sound/firewire/amdtp-am824.c
+++ b/sound/firewire/amdtp-am824.c
@@ -321,7 +321,7 @@ static void read_midi_messages(struct amdtp_stream *s,
u8 *b;
for (f = 0; f < frames; f++) {
- port = (s->data_block_counter + f) % 8;
+ port = (8 - s->ctx_data.tx.first_dbc + s->data_block_counter + f) % 8;
b = (u8 *)&buffer[p->midi_position];
len = b[0] - 0x80;
--
2.20.1
commit 904d88d743b0c94092c5117955eab695df8109e8 upstream.
The syzbot reported
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xca/0x13e lib/dump_stack.c:113
print_address_description+0x67/0x231 mm/kasan/report.c:188
__kasan_report.cold+0x1a/0x32 mm/kasan/report.c:317
kasan_report+0xe/0x20 mm/kasan/common.c:614
qmi_wwan_probe+0x342/0x360 drivers/net/usb/qmi_wwan.c:1417
usb_probe_interface+0x305/0x7a0 drivers/usb/core/driver.c:361
really_probe+0x281/0x660 drivers/base/dd.c:509
driver_probe_device+0x104/0x210 drivers/base/dd.c:670
__device_attach_driver+0x1c2/0x220 drivers/base/dd.c:777
bus_for_each_drv+0x15c/0x1e0 drivers/base/bus.c:454
Caused by too many confusing indirections and casts.
id->driver_info is a pointer stored in a long. We want the
pointer here, not the address of it.
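The difference between the two casts can be shown outside the kernel with a small sketch. The struct and function names below are hypothetical stand-ins for this example, and uintptr_t is used in place of the kernel's long-sized field so the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device info. */
struct demo_info {
	unsigned long data;
};

/* Hypothetical stand-in for a device id entry: the pointer to the
 * info struct is stored in an integer field. */
struct demo_device_id {
	uintptr_t driver_info;
};

/* Buggy form: takes the address of the field itself, so the result
 * points into the id entry, not at the info struct. */
const struct demo_info *demo_info_buggy(const struct demo_device_id *id)
{
	return (const void *)&id->driver_info;
}

/* Fixed form: converts the stored integer value back to the pointer. */
const struct demo_info *demo_info_fixed(const struct demo_device_id *id)
{
	return (const void *)id->driver_info;
}
```

Dereferencing the result of the buggy form reads the bytes of the stored pointer as if they were the info struct, which is the kind of invalid access KASAN flags in the trace above.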
Thanks-to: Hillf Danton <hdanton(a)sina.com>
Reported-by: syzbot+b68605d7fadd21510de1(a)syzkaller.appspotmail.com
Cc: Kristian Evensen <kristian.evensen(a)gmail.com>
Fixes: e4bf63482c30 ("qmi_wwan: Add quirk for Quectel dynamic config")
Signed-off-by: Bjørn Mork <bjorn(a)mork.no>
[Upstream commit did not apply because I shuffled two lines in the
backport. The fixes tag for 4.14 is 3a6a5107ceb3.]
Signed-off-by: Kristian Evensen <kristian.evensen(a)gmail.com>
---
drivers/net/usb/qmi_wwan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index c2d6c501d..063daa343 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1395,14 +1395,14 @@ static int qmi_wwan_probe(struct usb_interface *intf,
return -ENODEV;
}
- info = (void *)&id->driver_info;
-
/* Several Quectel modems supports dynamic interface configuration, so
* we need to match on class/subclass/protocol. These values are
* identical for the diagnostic- and QMI-interface, but bNumEndpoints is
* different. Ignore the current interface if the number of endpoints
* equals the number for the diag interface (two).
*/
+ info = (void *)id->driver_info;
+
if (info->data & QMI_WWAN_QUIRK_QUECTEL_DYNCFG) {
if (desc->bNumEndpoints == 2)
return -ENODEV;
--
2.20.1
I'm not entirely sure why this is, but for some reason:
921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Breaks runtime PM resume on the Radeon PRO WX 3100 (Lexa) in one of the
pre-production laptops I have. The issue manifests as the following
messages in dmesg:
[drm] UVD and UVD ENC initialized successfully.
amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vce1 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
[drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
And happens after about 6-10 runtime PM suspend/resume cycles (sometimes
sooner, if you're lucky!). Unfortunately I can't seem to pin down
precisely which part of psm_adjust_power_state_dynamic() is causing
the issue, but not skipping the display setting setup seems to fix it.
Hopefully if there is a better fix for this, this patch will spark
discussion around it.
Fixes: 921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Cc: Evan Quan <evan.quan(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Rex Zhu <Rex.Zhu(a)amd.com>
Cc: Likun Gao <Likun.Gao(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.1+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
---
drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index 6cd6497c6fc2..0e1b2d930816 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -325,7 +325,7 @@ int hwmgr_resume(struct pp_hwmgr *hwmgr)
if (ret)
return ret;
- ret = psm_adjust_power_state_dynamic(hwmgr, true, NULL);
+ ret = psm_adjust_power_state_dynamic(hwmgr, false, NULL);
return ret;
}
--
2.21.0
There is a race between mca_reap(), btree_node_free() and the journal code
btree_flush_write(), which results in very rare and strange deadlocks or
panics that are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with the oldest journal pin is selected, then it is flushed to the
cache device; select-and-flush is a two-step operation. Between these two
steps, the following may happen inside the race window:
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for another btree node.
- The selected btree node was selected, flushed and released by the mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, it first
takes b->write_lock with mutex_lock(). If the race happens and the
memory of the selected btree node has been allocated to another btree node
whose write_lock is already held, a deadlock very probably
happens here. A worse case is that the memory of the selected btree node
has been released; then any reference to this btree node (e.g. b->write_lock)
will trigger a NULL pointer dereference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one over quite a long time period.
Such a race was not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, configured with 1 NVMe SSD as the cache device and an MD raid0
device assembled from 3 NVMe SSDs as the backing device, this race can be
observed roughly once every 10,000 calls to btree_flush_write(). Both
deadlock and kernel panic happened as the aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after the btree nodes are
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree nodes
without the BTREE_NODE_journal_flush flag, the race is avoided.
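The flag protocol described above can be sketched in a single-threaded userspace program. This is only an illustration of the set/skip/clear ordering under stated assumptions: all names are hypothetical, and the locking that the real patch relies on (b->write_lock and c->bucket_lock) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, modelled loosely on enum btree_flags. */
enum demo_node_flags {
	DEMO_NODE_dirty,
	DEMO_NODE_journal_flush,
};

struct demo_node {
	unsigned long flags;
	bool reaped;
};

bool demo_node_journal_flush(const struct demo_node *b)
{
	return b->flags & (1UL << DEMO_NODE_journal_flush);
}

/* Journal side, step 1: mark the node as selected for flushing. */
void demo_select_for_flush(struct demo_node *b)
{
	b->flags |= 1UL << DEMO_NODE_journal_flush;
}

/* Journal side, step 2: the flush is done, clear the mark. */
void demo_finish_flush(struct demo_node *b)
{
	b->flags &= ~(1UL << DEMO_NODE_journal_flush);
}

/* Reaper side: skip any node the journal code has marked.
 * Returns true if the node was reaped, false if it was skipped. */
bool demo_reap(struct demo_node *b)
{
	if (demo_node_journal_flush(b))
		return false;
	b->reaped = true;
	return true;
}
```

A node marked by demo_select_for_flush() cannot be reaped until demo_finish_flush() has run, which closes the window between the select and flush steps.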
One corner case should be noted, namely btree_node_free(). It might
be called in some error handling code paths. For example, the following
code snippet from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At lines 2151 and 2155, the btree nodes n2 and n1 are released without
mca_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such an error handling path,
and the selected btree node has the BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry. In this case the btree node won't
be skipped; retry until the BTREE_NODE_journal_flush bit is cleared,
then free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/btree.c | 28 +++++++++++++++++++++++++++-
drivers/md/bcache/btree.h | 2 ++
drivers/md/bcache/journal.c | 7 +++++++
3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
--
2.16.4
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns an
error pointer for dc->writeback_thread, so that a memory leak is avoided.
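The error-path pattern can be sketched in userspace as follows. This is a minimal illustration, not the bcache code: malloc() and free() stand in for the workqueue and kthread resources, and every name below is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts outstanding allocations so a leak is observable. */
int demo_live_allocations;

struct demo_dev {
	void *write_wq;	/* stands in for dc->writeback_write_wq */
	void *thread;	/* stands in for dc->writeback_thread */
};

static void *demo_alloc(int fail)
{
	void *p = fail ? NULL : malloc(1);

	if (p)
		demo_live_allocations++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		demo_live_allocations--;
	}
}

/* Returns 0 on success, -1 on failure. */
int demo_writeback_start(struct demo_dev *dc, int thread_fails)
{
	dc->write_wq = demo_alloc(0);
	if (!dc->write_wq)
		return -1;

	dc->thread = demo_alloc(thread_fails);
	if (!dc->thread) {
		/* Undo the first step, otherwise write_wq is leaked. */
		demo_free(dc->write_wq);
		dc->write_wq = NULL;
		return -1;
	}
	return 0;
}

void demo_writeback_stop(struct demo_dev *dc)
{
	demo_free(dc->thread);
	demo_free(dc->write_wq);
	dc->thread = NULL;
	dc->write_wq = NULL;
}
```

Without the demo_free() call in the failure branch, demo_live_allocations would stay at 1 after a failed start, which mirrors the leak this patch fixes.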
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/writeback.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
--
2.16.4