Hi folks, here is a series with some fixes for dummy_hcd. First of all,
the reasoning behind it.
A syzkaller report [0] shows a hung task in uevent_show. Although that was
fixed by a patch in drivers/base (a race between driver shutdown and
uevent_show), another issue remains: a problem with the emulated Realtek
wifi device [1]. While working on the fix ([1]), we noticed that it works
fine when applied to recent kernels. On v6.1.y and v6.6.y, however, it
didn't entirely solve the issue; after some debugging, this was narrowed
down to dummy_hcd transfer rates being much slower in those stable
versions.
The reason for this slowness is well described in the first 2 patches of
this backport, but those patches also introduced subtle issues, which are
fixed by the other 2 patches. Hence, I decided to backport all of
them to the 2 latest LTS kernels.
Maybe this is not a good idea. I don't see a strong con, but who is
better placed to judge the benefits vs the risks than the patch authors,
reviewers, and the USB maintainer? So, I've CCed Alan, Andrey, Greg and
Marcello here, and I thank you all in advance for reviewing this. My
apologies for bothering you with the emails; I hope this is a simple
"OK, makes sense" or "Nah, not worth it" situation =)
Cheers,
Guilherme
[0] https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5
[1] https://lore.kernel.org/r/20241101193412.1390391-1-gpiccoli@igalia.com/
Alan Stern (1):
USB: gadget: dummy-hcd: Fix "task hung" problem
Andrey Konovalov (1):
usb: gadget: dummy_hcd: execute hrtimer callback in softirq context
Marcello Sylvester Bauer (2):
usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler
usb: gadget: dummy_hcd: Set transfer interval to 1 microframe
drivers/usb/gadget/udc/dummy_hcd.c | 57 ++++++++++++++++++++----------
1 file changed, 38 insertions(+), 19 deletions(-)
--
2.46.2
Since commit 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init"),
system suspend is broken on AM62 TI platforms.
Before that commit, both the DWC3_GUSB3PIPECTL_SUSPHY and DWC3_GUSB2PHYCFG_SUSPHY
bits (henceforth called the 2 SUSPHY bits) were being set during core
initialization and even during core re-initialization after a system
suspend/resume.
These bits are required to be set for system suspend/resume to work correctly
on AM62 platforms.
Since that commit, the 2 SUSPHY bits are not set for DEVICE/OTG mode if the
gadget driver is not loaded and started.
For Host mode, the 2 SUSPHY bits are set before the first system suspend but
get cleared at system resume during core re-init and are never set again.
This patch resolves these two issues by ensuring the 2 SUSPHY bits are set
before system suspend and restored to the original state during system resume.
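The save/force/restore idea described above can be sketched in plain user-space C. This is only an illustrative simulation, not the driver code: `struct sim_dwc3`, the `sim_*` helpers and the `SIM_*` bit values are hypothetical stand-ins for the real registers and for dwc3_enable_susphy().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this simulation. */
#define SIM_USB2_SUSPHY (1u << 6)
#define SIM_USB3_SUSPHY (1u << 17)

/* Hypothetical stand-in for the controller state. */
struct sim_dwc3 {
	uint32_t gusb2phycfg;   /* simulated USB2 PHY config register */
	uint32_t gusb3pipectl;  /* simulated USB3 pipe control register */
	bool susphy_state;      /* SUSPHY state remembered across suspend */
};

/* Set or clear both simulated SUSPHY bits together. */
static void sim_enable_susphy(struct sim_dwc3 *dwc, bool enable)
{
	if (enable) {
		dwc->gusb2phycfg |= SIM_USB2_SUSPHY;
		dwc->gusb3pipectl |= SIM_USB3_SUSPHY;
	} else {
		dwc->gusb2phycfg &= ~SIM_USB2_SUSPHY;
		dwc->gusb3pipectl &= ~SIM_USB3_SUSPHY;
	}
}

/* System suspend: remember whether either bit was set, then force both on. */
static void sim_system_suspend(struct sim_dwc3 *dwc)
{
	dwc->susphy_state = (dwc->gusb2phycfg & SIM_USB2_SUSPHY) ||
			    (dwc->gusb3pipectl & SIM_USB3_SUSPHY);
	if (!dwc->susphy_state)
		sim_enable_susphy(dwc, true);
}

/* System resume: put the bits back the way they were before suspend. */
static void sim_system_resume(struct sim_dwc3 *dwc)
{
	sim_enable_susphy(dwc, dwc->susphy_state);
}
```

Starting from a state where both bits are clear, sim_system_suspend() leaves both set and sim_system_resume() clears them again, which mirrors the "set before suspend, restore on resume" behaviour the commit message describes.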
Cc: stable(a)vger.kernel.org # v6.9+
Fixes: 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init")
Link: https://lore.kernel.org/all/1519dbe7-73b6-4afc-bfe3-23f4f75d772f@kernel.org/
Signed-off-by: Roger Quadros <rogerq(a)kernel.org>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
Changes in v3:
- Fix single line comment style
- add DWC3_GUSB3PIPECTL_SUSPHY to documentation of susphy_state
- Added Acked-by tag
- Link to v2: https://lore.kernel.org/r/20241009-am62-lpm-usb-v2-1-da26c0cd2b1e@kernel.org
Changes in v2:
- Fix comment style
- Use both USB3 and USB2 SUSPHY bits to determine susphy_state during system suspend/resume.
- Restore SUSPHY bits at system resume regardless of whether they were set or cleared before system suspend.
- Link to v1: https://lore.kernel.org/r/20241001-am62-lpm-usb-v1-1-9916b71165f7@kernel.org
---
drivers/usb/dwc3/core.c | 19 +++++++++++++++++++
drivers/usb/dwc3/core.h | 3 +++
2 files changed, 22 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 9eb085f359ce..ca77f0b186c4 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -2336,6 +2336,11 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
u32 reg;
int i;
+ dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) &
+ DWC3_GUSB2PHYCFG_SUSPHY) ||
+ (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) &
+ DWC3_GUSB3PIPECTL_SUSPHY);
+
switch (dwc->current_dr_role) {
case DWC3_GCTL_PRTCAP_DEVICE:
if (pm_runtime_suspended(dwc->dev))
@@ -2387,6 +2392,15 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg)
break;
}
+ if (!PMSG_IS_AUTO(msg)) {
+ /*
+ * TI AM62 platform requires SUSPHY to be
+ * enabled for system suspend to work.
+ */
+ if (!dwc->susphy_state)
+ dwc3_enable_susphy(dwc, true);
+ }
+
return 0;
}
@@ -2454,6 +2468,11 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
break;
}
+ if (!PMSG_IS_AUTO(msg)) {
+ /* restore SUSPHY state to that before system suspend. */
+ dwc3_enable_susphy(dwc, dwc->susphy_state);
+ }
+
return 0;
}
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index c71240e8f7c7..31de4b57ae7c 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1150,6 +1150,8 @@ struct dwc3_scratchpad_array {
* @sys_wakeup: set if the device may do system wakeup.
* @wakeup_configured: set if the device is configured for remote wakeup.
* @suspended: set to track suspend event due to U3/L2.
+ * @susphy_state: state of DWC3_GUSB2PHYCFG_SUSPHY + DWC3_GUSB3PIPECTL_SUSPHY
+ * before PM suspend.
* @imod_interval: set the interrupt moderation interval in 250ns
* increments or 0 to disable.
* @max_cfg_eps: current max number of IN eps used across all USB configs.
@@ -1382,6 +1384,7 @@ struct dwc3 {
unsigned sys_wakeup:1;
unsigned wakeup_configured:1;
unsigned suspended:1;
+ unsigned susphy_state:1;
u16 imod_interval;
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240923-am62-lpm-usb-f420917bd707
Best regards,
--
Roger Quadros <rogerq(a)kernel.org>
The IT6505 bridge chip has an active-low reset line. Since it is a
"reset" and not an "enable" line, the GPIO should be asserted to
put it in reset and deasserted to bring it out of reset during
the power-on sequence.
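As a rough illustration of the logical-versus-physical distinction above, here is a minimal user-space sketch. It does not use the kernel GPIO API; `struct sim_gpio` and the `sim_*` helpers are hypothetical, and the model simply assumes that a logical value of 1 means "asserted" and that the line is declared active-low.

```c
#include <stdbool.h>

/* Hypothetical model of one GPIO line. */
struct sim_gpio {
	bool active_low;     /* line is declared active-low */
	bool physical_high;  /* electrical level currently driven on the pin */
};

/* Logical set: 1 means "asserted"; the polarity flag picks the pin level. */
static void sim_gpio_set_value(struct sim_gpio *g, int value)
{
	bool asserted = value != 0;

	g->physical_high = g->active_low ? !asserted : asserted;
}

/* An active-low reset pin holds the chip in reset while the pin is low. */
static bool sim_chip_in_reset(const struct sim_gpio *g)
{
	return !g->physical_high;
}
```

With `active_low` set, writing logical 1 drives the pin low and holds the simulated chip in reset, while writing logical 0 releases it. That is the order the corrected power-on sequence uses: assert, wait, deassert.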
The polarity was inverted when the driver was first introduced, likely
because the device family that was targeted had an inverting level
shifter on the reset line.
The MT8186 Corsola devices already have the IT6505 in their device tree,
but the whole display pipeline is actually disabled and won't be enabled
until some remaining issues are sorted out. The other known user is
the MT8183 Kukui / Jacuzzi family; their device trees currently do not
have the IT6505 included.
Fix the polarity in the driver while there are no actual users.
Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
drivers/gpu/drm/bridge/ite-it6505.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/bridge/ite-it6505.c b/drivers/gpu/drm/bridge/ite-it6505.c
index 7502a5f81557..df7ecdf0f422 100644
--- a/drivers/gpu/drm/bridge/ite-it6505.c
+++ b/drivers/gpu/drm/bridge/ite-it6505.c
@@ -2618,9 +2618,9 @@ static int it6505_poweron(struct it6505 *it6505)
/* time interval between OVDD and SYSRSTN at least be 10ms */
if (pdata->gpiod_reset) {
usleep_range(10000, 20000);
- gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
- usleep_range(1000, 2000);
gpiod_set_value_cansleep(pdata->gpiod_reset, 1);
+ usleep_range(1000, 2000);
+ gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
usleep_range(25000, 35000);
}
@@ -2651,7 +2651,7 @@ static int it6505_poweroff(struct it6505 *it6505)
disable_irq_nosync(it6505->irq);
if (pdata->gpiod_reset)
- gpiod_set_value_cansleep(pdata->gpiod_reset, 0);
+ gpiod_set_value_cansleep(pdata->gpiod_reset, 1);
if (pdata->pwr18) {
err = regulator_disable(pdata->pwr18);
@@ -3205,7 +3205,7 @@ static int it6505_init_pdata(struct it6505 *it6505)
return PTR_ERR(pdata->ovdd);
}
- pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
+ pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH);
if (IS_ERR(pdata->gpiod_reset)) {
dev_err(dev, "gpiod_reset gpio not found");
return PTR_ERR(pdata->gpiod_reset);
--
2.47.0.163.g1226f6d8fa-goog
This series fixes incorrect handling of the child node within
for_each_child_of_node() by adding the missing call to of_node_put().
This keeps the fix compatible with stable kernels that don't provide the
scoped variant of the macro, which is safer and was introduced earlier
this year.
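The kind of leak being fixed can be sketched with a small user-space model. This is not the device tree API: `struct sim_node`, `sim_get()` and `sim_put()` are hypothetical stand-ins, and the sketch assumes the iterator holds a reference on the current child that is only dropped when the loop advances.

```c
#include <stddef.h>

/* Hypothetical refcounted node. */
struct sim_node {
	int refcount;
	int wanted;   /* non-zero if this is the child the loop looks for */
};

static struct sim_node *sim_get(struct sim_node *n)
{
	n->refcount++;
	return n;
}

static void sim_put(struct sim_node *n)
{
	n->refcount--;
}

/* Early exit without a put: the reference on the matching child leaks. */
static int sim_find_leaky(struct sim_node *children, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct sim_node *child = sim_get(&children[i]);

		if (child->wanted)
			return (int)i;   /* reference never dropped */
		sim_put(child);          /* dropped when the loop advances */
	}
	return -1;
}

/* Same loop with the missing put added before the early exit. */
static int sim_find_fixed(struct sim_node *children, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct sim_node *child = sim_get(&children[i]);

		if (child->wanted) {
			sim_put(child);  /* balance the get before leaving */
			return (int)i;
		}
		sim_put(child);
	}
	return -1;
}
```

A scoped iterator removes the need for the explicit put by releasing the reference automatically when the loop variable goes out of scope, which is why the second patch in the series can drop it again.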
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
drm/mediatek: Fix child node refcount handling in early exit
drm/mediatek: Switch to for_each_child_of_node_scoped()
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---
base-commit: d61a00525464bfc5fe92c6ad713350988e492b88
change-id: 20241011-mtk_drm_drv_memleak-5e8b8e45ed1c
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
On x86 platforms running kernel v5.10.228, the perf-report command aborts with
"free(): invalid pointer" when the perf-record command is run with taken branch
stack sampling enabled. This regression can be reproduced with the following steps:
- sudo perf record -b
- sudo perf report
The root cause is that bi[i].to.ms.maps does not always point to thread->maps,
which is a buffer dynamically allocated by maps_new(). Instead, it may point to
&machine->kmaps, and in v5.10 kmaps is a member variable, not a pointer, so it
must not be freed. The original upstream commit c1149037f65b ("perf hist: Add
missing puts to hist__account_cycles") worked upstream because machine->kmaps
had been refactored to a pointer by the previous commit 1a97cee604dc ("perf
maps: Use a pointer for kmaps").
The memory leak issue, which the reverted patch intended to fix, has been solved
by commit cf96b8e45a9b ("perf session: Add missing evlist__delete when deleting
a session"). The root cause of that leak is that the evlist is not deleted on
exit in perf-report, perf-script, and perf-data. Consequently, the thread's
reference count, incremented by thread__get() in hist_entry__init(), is not
decremented in hist_entry__delete(). As a result, thread->maps is not properly
freed.
To this end,
- PATCH 1/2 reverts commit a83fc293acd5c5050a4828eced4a71d2b2fffdd3 to fix the
abort regression.
- PATCH 2/2 backports cf96b8e45a9b ("perf session: Add missing evlist__delete
when deleting a session") to fix the memory leak issue.
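The abort described above comes from dropping the last reference on an object that was never heap-allocated. Below is a minimal user-space sketch under simplified assumptions. The `sim_*` types and functions are hypothetical and are not perf's API, and the explicit address check exists only so the sketch can report the problem instead of crashing.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted maps object. */
struct sim_maps {
	int refcnt;
};

/* The machine embeds its kmaps instead of pointing to a heap object. */
struct sim_machine {
	struct sim_maps kmaps;
};

/* Heap-allocated maps, analogous to what maps_new() hands to a thread. */
static struct sim_maps *sim_maps_new(void)
{
	struct sim_maps *maps = calloc(1, sizeof(*maps));

	if (maps)
		maps->refcnt = 1;
	return maps;
}

/*
 * Drop one reference. Returns false when the last reference is dropped on
 * the embedded kmaps, the case where an unconditional free() would be
 * handed a pointer that malloc never returned.
 */
static bool sim_maps_put(struct sim_maps *maps, const struct sim_machine *machine)
{
	if (--maps->refcnt > 0)
		return true;
	if (maps == &machine->kmaps)
		return false;   /* free() here would be an invalid free */
	free(maps);
	return true;
}
```

In this model, a put on a heap-allocated object is fine, while a put that reaches zero on `&machine->kmaps` is the situation that triggers "free(): invalid pointer" when the free is unconditional.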
Riccardo Mancini (1):
perf session: Add missing evlist__delete when deleting a session
Shuai Xue (1):
Revert "perf hist: Add missing puts to hist__account_cycles"
tools/perf/util/hist.c | 10 +++-------
tools/perf/util/session.c | 5 ++++-
2 files changed, 7 insertions(+), 8 deletions(-)
--
2.39.3
Move the LNL scheduling WA to xe_device.h so it can be used in other
places without needing to keep the same comment about removal of this WA
in the future. The WA, which flushes work or workqueues, is now wrapped
in macros and can be reused wherever needed.
Cc: Badal Nilawar <badal.nilawar(a)intel.com>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
cc: <stable(a)vger.kernel.org> # v6.11+
Suggested-by: John Harrison <John.C.Harrison(a)Intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
---
drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++
drivers/gpu/drm/xe/xe_guc_ct.c | 11 +----------
2 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 4c3f0ebe78a9..f1fbfe916867 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe);
struct xe_file *xe_file_get(struct xe_file *xef);
void xe_file_put(struct xe_file *xef);
+/*
+ * Occasionally it is seen that the G2H worker starts running after a delay of more than
+ * a second even after being queued and activated by the Linux workqueue subsystem. This
+ * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
+ * Lunarlake Hybrid CPU. Issue disappears if we disable Lunarlake atom cores from BIOS
+ * and this is beyond xe kmd.
+ *
+ * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
+ */
+#define LNL_FLUSH_WORKQUEUE(wq__) \
+ flush_workqueue(wq__)
+#define LNL_FLUSH_WORK(wrk__) \
+ flush_work(wrk__)
+
#endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 1b5d8fb1033a..703b44b257a7 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
- /*
- * Occasionally it is seen that the G2H worker starts running after a delay of more than
- * a second even after being queued and activated by the Linux workqueue subsystem. This
- * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
- * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS
- * and this is beyond xe kmd.
- *
- * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
- */
if (!ret) {
- flush_work(&ct->g2h_worker);
+ LNL_FLUSH_WORK(&ct->g2h_worker);
if (g2h_fence.done) {
xe_gt_warn(gt, "G2H fence %u, action %04x, done\n",
g2h_fence.seqno, action[0]);
--
2.46.0
On 2024-11-01 15:21:24 [-0400], Sasha Levin wrote:
> commit 052382490ee4f0f6d783ddce02fe6f2d15e134b5
> Author: Wander Lairson Costa <wander(a)redhat.com>
> Date: Mon Oct 21 16:26:24 2024 -0700
>
> igb: Disable threaded IRQ for igb_msix_other
>
> [ Upstream commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f ]
>
> During testing of SR-IOV, Red Hat QE encountered an issue where the
> ip link up command intermittently fails for the igbvf interfaces when
> using the PREEMPT_RT variant. Investigation revealed that
> e1000_write_posted_mbx returns an error due to the lack of an ACK
> from e1000_poll_for_ack.
>
> The underlying issue arises from the fact that IRQs are threaded by
> default under PREEMPT_RT. While the exact hardware details are not
> available, it appears that the IRQ handled by igb_msix_other must
> be processed before e1000_poll_for_ack times out. However,
> e1000_write_posted_mbx is called with preemption disabled, leading
> to a scenario where the IRQ is serviced only after the failure of
> e1000_write_posted_mbx.
>
> To resolve this, we set IRQF_NO_THREAD for the affected interrupt,
> ensuring that the kernel handles it immediately, thereby preventing
> the aforementioned error.
Wander, please send a revert of this patch. The ISR (E1000_ICR_TS set)
may invoke igb_msg_task(), ptp_clock_event(), igb_perout(), igb_extts(),
each of which acquires sleeping locks on PREEMPT_RT. Not sure if this
improved the situation or not.
Sebastian