Hi,
After updating to 6.14.2, the ethernet adapter is almost unusable, I get
over 30% packet loss.
Bisect says it's this commit:
commit 85f6414167da39e0da30bf370f1ecda5a58c6f7b
Author: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Thu Mar 13 16:05:56 2025 +0200
e1000e: change k1 configuration on MTP and later platforms
[ Upstream commit efaaf344bc2917cbfa5997633bc18a05d3aed27f ]
Starting from Meteor Lake, the Kumeran interface between the integrated
MAC and the I219 PHY works at a different frequency. This causes sporadic
MDI errors when accessing the PHY, and in rare circumstances could lead
to packet corruption.
To overcome this, introduce minor changes to the Kumeran idle
state (K1) parameters during device initialization. Hardware reset
reverts this configuration, therefore it needs to be applied in a few
places.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Avigail Dahan <avigailx.dahan(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
drivers/net/ethernet/intel/e1000e/defines.h | 3 ++
drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +++++++++++++++++++++++++++--
drivers/net/ethernet/intel/e1000e/ich8lan.h | 4 ++
3 files changed, 82 insertions(+), 5 deletions(-)
My system is Novacustom V540TU laptop with Intel Core Ultra 5 125H. And
the e1000e driver is running in a Xen HVM (with PCI passthrough).
Interestingly, I have also another one with Intel Core Ultra 7 155H
where the issue does not happen. I don't see what is different about
network adapter there, they look identical on lspci (but there are
differences about other devices)...
I see the commit above was already backported to other stable branches
too...
#regzbot introduced: 85f6414167da39e0da30bf370f1ecda5a58c6f7b
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is
set, without completing the xhci handshake, unless the reset completes
exceptionally quickly. This behavior causes a regression on Synopsys
DWC3 USB controllers with dual-role capabilities.
Specifically, when a DWC3 controller exits host mode and removes xhci
while a reset is still in progress, and then attempts to configure its
hardware for device mode, the ongoing, incomplete reset leads to
critical register access issues. All register reads return zero, not
just within the xHCI register space (which might be expected during a
reset), but across the entire DWC3 IP block.
This patch addresses the issue by preventing xhci_reset() from being
called in xhci_resume() and bailing out early in the reinit flow when
XHCI_STATE_REMOVING is set.
Cc: stable(a)vger.kernel.org
Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper")
Suggested-by: Mathias Nyman <mathias.nyman(a)intel.com>
Signed-off-by: Roy Luo <royluo(a)google.com>
---
drivers/usb/host/xhci.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 90eb491267b5..244b12eafd95 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1084,7 +1084,10 @@ int xhci_resume(struct xhci_hcd *xhci, bool power_lost, bool is_auto_resume)
xhci_dbg(xhci, "Stop HCD\n");
xhci_halt(xhci);
xhci_zero_64b_regs(xhci);
- retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
+ if (xhci->xhc_state & XHCI_STATE_REMOVING)
+ retval = -ENODEV;
+ else
+ retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
spin_unlock_irq(&xhci->lock);
if (retval)
return retval;
--
2.49.0.1204.g71687c7c1d-goog
This is similar to commit 62b6dee1b44a ("PCI/portdrv: Prevent LS7A Bus
Master clearing on shutdown"), which prevents LS7A Bus Master clearing
on kexec.
The key point of this is to work around the LS7A defect that clearing
PCI_COMMAND_MASTER prevents MMIO requests from going downstream, and
we may need to do that even after .shutdown(), e.g., to print console
messages. And in this case we rely on .shutdown() for the downstream
devices to disable interrupts and DMA.
Only skip Bus Master clearing on bridges because endpoint devices still
need it.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ming Wang <wangming01(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
drivers/pci/pci-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 602838416e6a..8a1e32367a06 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -517,7 +517,7 @@ static void pci_device_shutdown(struct device *dev)
* If it is not a kexec reboot, firmware will hit the PCI
* devices with big hammer and stop their DMA any way.
*/
- if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
+ if (kexec_in_progress && !pci_is_bridge(pci_dev) && (pci_dev->current_state <= PCI_D3hot))
pci_clear_master(pci_dev);
}
--
2.47.1
From: Hongchen Zhang <zhanghongchen(a)loongson.cn>
When the best selected CPU is offline, work_on_cpu() will stuck forever.
This can be happen if a node is online while all its CPUs are offline
(we can use "maxcpus=1" without "nr_cpus=1" to reproduce it), Therefore,
in this case, we should call local_pci_probe() instead of work_on_cpu().
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
Signed-off-by: Hongchen Zhang <zhanghongchen(a)loongson.cn>
---
drivers/pci/pci-driver.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index c8bd71a739f7..602838416e6a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -386,7 +386,7 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
free_cpumask_var(wq_domain_mask);
}
- if (cpu < nr_cpu_ids)
+ if ((cpu < nr_cpu_ids) && cpu_online(cpu))
error = work_on_cpu(cpu, local_pci_probe, &ddi);
else
error = local_pci_probe(&ddi);
--
2.47.1
Hi Greg and Sasha,
Please find attached backports of commit d0afcfeb9e38 ("kbuild: Disable
-Wdefault-const-init-unsafe") for 6.6 and older, which is needed for tip
of tree versions of LLVM. Please let me know if there are any questions.
Cheers,
Nathan
It was reported that ideapad-laptop sometimes causes some recent (since
2024) Lenovo ThinkBook models shut down when:
- suspending/resuming
- closing/opening the lid
- (dis)connecting a charger
- reading/writing some sysfs properties, e.g., fan_mode, touchpad
- pressing down some Fn keys, e.g., Brightness Up/Down (Fn+F5/F6)
- (seldom) loading the kmod
The issue has existed since the launch day of such models, and there
have been some out-of-tree workarounds (see Link:) for the issue. One
disables some functionalities, while another one simply shortens
IDEAPAD_EC_TIMEOUT. The disabled functionalities have read_ec_data() in
their call chains, which calls schedule() between each poll.
It turns out that these models suffer from the indeterminacy of
schedule() because of their low tolerance for being polled too
frequently. Sometimes schedule() returns too soon due to the lack of
ready tasks, causing the margin between two polls to be too short.
In this case, the command is somehow aborted, and too many subsequent
polls (they poll for "nothing!") may eventually break the state machine
in the EC, resulting in a hard shutdown. This explains why shortening
IDEAPAD_EC_TIMEOUT works around the issue - it reduces the total number
of polls sent to the EC.
Even when it doesn't lead to a shutdown, frequent polls may also disturb
the ongoing operation and notably delay (+ 10-20ms) the availability of
EC response. This phenomenon is unlikely to be exclusive to the models
mentioned above, so dropping the schedule() manner should also slightly
improve the responsiveness of various models.
Fix these issues by migrating to usleep_range(150, 300). The interval is
chosen to add some margin to the minimal 50us and considering EC
responses are usually available after 150-2500us based on my test. It
should be enough to fix these issues on all models subject to the EC bug
without introducing latency on other models.
Tested on ThinkBook 14 G7+ ASP and solved both issues. No regression was
introduced in the test on a model without the EC bug (ThinkBook X IMH,
thanks Eric).
Link: https://github.com/ty2/ideapad-laptop-tb2024g6plus/commit/6c5db18c9e8109873…
Link: https://github.com/ferstar/ideapad-laptop-tb/commit/42d1e68e5009529d31bd23f…
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218771
Fixes: 6a09f21dd1e2 ("ideapad: add ACPI helpers")
Cc: stable(a)vger.kernel.org
Tested-by: Eric Long <i(a)hack3r.moe>
Signed-off-by: Rong Zhang <i(a)rong.moe>
---
drivers/platform/x86/ideapad-laptop.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
index ede483573fe0..b5e4da6a6779 100644
--- a/drivers/platform/x86/ideapad-laptop.c
+++ b/drivers/platform/x86/ideapad-laptop.c
@@ -15,6 +15,7 @@
#include <linux/bug.h>
#include <linux/cleanup.h>
#include <linux/debugfs.h>
+#include <linux/delay.h>
#include <linux/device.h>
#include <linux/dmi.h>
#include <linux/i8042.h>
@@ -267,6 +268,20 @@ static void ideapad_shared_exit(struct ideapad_private *priv)
*/
#define IDEAPAD_EC_TIMEOUT 200 /* in ms */
+/*
+ * Some models (e.g., ThinkBook since 2024) have a low tolerance for being
+ * polled too frequently. Doing so may break the state machine in the EC,
+ * resulting in a hard shutdown.
+ *
+ * It is also observed that frequent polls may disturb the ongoing operation
+ * and notably delay the availability of EC response.
+ *
+ * These values are used as the delay before the first poll and the interval
+ * between subsequent polls to solve the above issues.
+ */
+#define IDEAPAD_EC_POLL_MIN_US 150
+#define IDEAPAD_EC_POLL_MAX_US 300
+
static int eval_int(acpi_handle handle, const char *name, unsigned long *res)
{
unsigned long long result;
@@ -383,7 +398,7 @@ static int read_ec_data(acpi_handle handle, unsigned long cmd, unsigned long *da
end_jiffies = jiffies + msecs_to_jiffies(IDEAPAD_EC_TIMEOUT) + 1;
while (time_before(jiffies, end_jiffies)) {
- schedule();
+ usleep_range(IDEAPAD_EC_POLL_MIN_US, IDEAPAD_EC_POLL_MAX_US);
err = eval_vpcr(handle, 1, &val);
if (err)
@@ -414,7 +429,7 @@ static int write_ec_cmd(acpi_handle handle, unsigned long cmd, unsigned long dat
end_jiffies = jiffies + msecs_to_jiffies(IDEAPAD_EC_TIMEOUT) + 1;
while (time_before(jiffies, end_jiffies)) {
- schedule();
+ usleep_range(IDEAPAD_EC_POLL_MIN_US, IDEAPAD_EC_POLL_MAX_US);
err = eval_vpcr(handle, 1, &val);
if (err)
base-commit: a5806cd506af5a7c19bcd596e4708b5c464bfd21
--
2.49.0
Hello kernel/driver developers,
I hope, with my information it's possible to find a bug/problem in the
kernel. Otherwise I am sorry, that I disturbed you.
I only use LTS kernels, but I can narrow it down to a hand full of them,
where it works.
The PC: Manjaro Stable/Cinnamon/X11/AMD Ryzen 5 2600/Radeon HD 7790/8GB
RAM
I already asked the Manjaro community, but with no luck.
The game: Hellpoint (GOG Linux latest version, Unity3D-Engine v2021),
uses vulkan
---
I came a long road of kernels. I had many versions of 5.4, 5.10, 5.15,
6.1 and 6.6 and and the game was always unplayable, because the frames
where around 1fps (performance of PC is not the problem).
I asked the mesa and cinnamon team for help in the past, but also with
no luck.
It never worked, till on 2025-03-29 when I installed 6.12.19 for the
first time and it worked!
But it only worked with 6.12.19, 6.12.20 and 6.12.21
When I updated to 6.12.25, it was back to unplayable.
For testing I installed 6.14.4 with the same result. It doesn't work.
I also compared file /proc/config.gz of both kernels (6.12.21 <>
6.14.4), but can't seem to see drastic changes to the graphical part.
I presume it has something to do with amdgpu.
If you need more information, I would be happy to help.
Kind regards,
Marion
Use common wrappers operating directly on the struct sg_table objects to
fix incorrect use of statterlists related calls. dma_unmap_sg() function
has to be called with the number of elements originally passed to the
dma_map_sg() function, not the one returned in sgtable's nents.
CC: stable(a)vger.kernel.org
Fixes: 425902f5c8e3 ("fpga zynq: Use the scatterlist interface")
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
drivers/fpga/zynq-fpga.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/fpga/zynq-fpga.c b/drivers/fpga/zynq-fpga.c
index f7e08f7ea9ef..9bd39d1d4048 100644
--- a/drivers/fpga/zynq-fpga.c
+++ b/drivers/fpga/zynq-fpga.c
@@ -406,7 +406,7 @@ static int zynq_fpga_ops_write(struct fpga_manager *mgr, struct sg_table *sgt)
}
priv->dma_nelms =
- dma_map_sg(mgr->dev.parent, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+ dma_map_sgtable(mgr->dev.parent, sgt, DMA_TO_DEVICE);
if (priv->dma_nelms == 0) {
dev_err(&mgr->dev, "Unable to DMA map (TO_DEVICE)\n");
return -ENOMEM;
@@ -478,7 +478,7 @@ static int zynq_fpga_ops_write(struct fpga_manager *mgr, struct sg_table *sgt)
clk_disable(priv->clk);
out_free:
- dma_unmap_sg(mgr->dev.parent, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
+ dma_unmap_sgtable(mgr->dev.parent, sgt, DMA_TO_DEVICE);
return err;
}
--
2.34.1
When transitioning from USB_ROLE_DEVICE to USB_ROLE_NONE, the code
assumed that the regulator should be disabled. However, if the regulator
is marked as always-on, regulator_is_enabled() continues to return true,
leading to an incorrect attempt to disable a regulator which is not
enabled.
This can result in warnings such as:
[ 250.155624] WARNING: CPU: 1 PID: 7326 at drivers/regulator/core.c:3004
_regulator_disable+0xe4/0x1a0
[ 250.155652] unbalanced disables for VIN_SYS_5V0
To fix this, we move the regulator control logic into
tegra186_xusb_padctl_id_override() function since it's directly related
to the ID override state. The regulator is now only disabled when the role
transitions from USB_ROLE_HOST to USB_ROLE_NONE, by checking the VBUS_ID
register. This ensures that regulator enable/disable operations are
properly balanced and only occur when actually transitioning to/from host
mode.
Fixes: 49d46e3c7e59 ("phy: tegra: xusb: Add set_mode support for UTMI phy on Tegra186")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wayne Chang <waynec(a)nvidia.com>
---
drivers/phy/tegra/xusb-tegra186.c | 59 +++++++++++++++++++------------
1 file changed, 37 insertions(+), 22 deletions(-)
diff --git a/drivers/phy/tegra/xusb-tegra186.c b/drivers/phy/tegra/xusb-tegra186.c
index fae6242aa730..1b35d50821f7 100644
--- a/drivers/phy/tegra/xusb-tegra186.c
+++ b/drivers/phy/tegra/xusb-tegra186.c
@@ -774,13 +774,15 @@ static int tegra186_xusb_padctl_vbus_override(struct tegra_xusb_padctl *padctl,
}
static int tegra186_xusb_padctl_id_override(struct tegra_xusb_padctl *padctl,
- bool status)
+ struct tegra_xusb_usb2_port *port, bool status)
{
- u32 value;
+ u32 value, id_override;
+ int err = 0;
dev_dbg(padctl->dev, "%s id override\n", status ? "set" : "clear");
value = padctl_readl(padctl, USB2_VBUS_ID);
+ id_override = value & ID_OVERRIDE(~0);
if (status) {
if (value & VBUS_OVERRIDE) {
@@ -791,15 +793,35 @@ static int tegra186_xusb_padctl_id_override(struct tegra_xusb_padctl *padctl,
value = padctl_readl(padctl, USB2_VBUS_ID);
}
- value &= ~ID_OVERRIDE(~0);
- value |= ID_OVERRIDE_GROUNDED;
+ if (id_override != ID_OVERRIDE_GROUNDED) {
+ value &= ~ID_OVERRIDE(~0);
+ value |= ID_OVERRIDE_GROUNDED;
+ padctl_writel(padctl, value, USB2_VBUS_ID);
+
+ err = regulator_enable(port->supply);
+ if (err) {
+ dev_err(padctl->dev, "Failed to enable regulator: %d\n", err);
+ return err;
+ }
+ }
} else {
- value &= ~ID_OVERRIDE(~0);
- value |= ID_OVERRIDE_FLOATING;
+ if (id_override == ID_OVERRIDE_GROUNDED) {
+ /*
+ * The regulator is disabled only when the role transitions
+ * from USB_ROLE_HOST to USB_ROLE_NONE.
+ */
+ err = regulator_disable(port->supply);
+ if (err) {
+ dev_err(padctl->dev, "Failed to disable regulator: %d\n", err);
+ return err;
+ }
+
+ value &= ~ID_OVERRIDE(~0);
+ value |= ID_OVERRIDE_FLOATING;
+ padctl_writel(padctl, value, USB2_VBUS_ID);
+ }
}
- padctl_writel(padctl, value, USB2_VBUS_ID);
-
return 0;
}
@@ -818,27 +840,20 @@ static int tegra186_utmi_phy_set_mode(struct phy *phy, enum phy_mode mode,
if (mode == PHY_MODE_USB_OTG) {
if (submode == USB_ROLE_HOST) {
- tegra186_xusb_padctl_id_override(padctl, true);
-
- err = regulator_enable(port->supply);
+ err = tegra186_xusb_padctl_id_override(padctl, port, true);
+ if (err)
+ goto out;
} else if (submode == USB_ROLE_DEVICE) {
tegra186_xusb_padctl_vbus_override(padctl, true);
} else if (submode == USB_ROLE_NONE) {
- /*
- * When port is peripheral only or role transitions to
- * USB_ROLE_NONE from USB_ROLE_DEVICE, regulator is not
- * enabled.
- */
- if (regulator_is_enabled(port->supply))
- regulator_disable(port->supply);
-
- tegra186_xusb_padctl_id_override(padctl, false);
+ err = tegra186_xusb_padctl_id_override(padctl, port, false);
+ if (err)
+ goto out;
tegra186_xusb_padctl_vbus_override(padctl, false);
}
}
-
+out:
mutex_unlock(&padctl->lock);
-
return err;
}
--
2.25.1