Hello maintainers,
This series addresses a defect observed on certain hardware platforms running Linux kernel 6.1.147 with the i915 driver. The defect is in the hot plug detection (HPD) logic
and leads to unreliable or missed detection events on the affected devices.
### Background
Issue:
On Simatic IPC227E, we observed unreliable or missing hot plug detection events, while on Simatic IPC227G (an otherwise similar platform), the expected hot plug behavior was maintained.
Affected kernel:
This patch series is intended for the Linux 6.1.y stable tree only.
Most of the tests were conducted on 6.1.147 (manual/standalone kernel build, CIP/Isar context).
Root cause analysis:
I do not have access to hardware signal traces or scope data to conclusively prove the root cause at the electrical level. My understanding is based on observed driver behavior and logs.
My assumption is therefore that on IPC227G HPD IRQ storms do not occur, so the standard HPD IRQ-based detection works as expected. On IPC227E,
frequent HPD interrupts trigger the i915 driver’s storm detection logic, causing it to switch to polling mode. Polling then does not resume correctly, leading to the hotplug
issue this series addresses. IPC227E's behavior triggers this kernel edge case, likely due to slight variations in signal integrity, electrical margins, or internal component timing.
IPC227G functions as expected, possibly due to cleaner electrical signaling or more favorable timing characteristics, thus avoiding the triggering condition.
Conclusion:
This points to a hardware-software interaction where kernel code assumes cleaner signaling or wider margins than IPC227E is able to provide, exposing logic gaps not visible on more robust hardware.
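To illustrate the storm detection behavior described above, here is a minimal user-space sketch. It is not the i915 implementation: the names (`hpd_pin`, `hpd_irq_seen`, `simulate`) and the period and threshold values are assumptions chosen for the example. It only models counting interrupts inside a time window and switching to polling once a threshold is exceeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values for illustration; not taken from the i915 source. */
#define STORM_PERIOD_MS 1000 /* window in which interrupts are counted */
#define STORM_THRESHOLD 5    /* interrupts per window treated as a storm */

enum hpd_mode { HPD_IRQ, HPD_POLL };

struct hpd_pin {
    unsigned long window_start_ms;
    unsigned int count;
    enum hpd_mode mode;
};

/* Returns true when this interrupt tips the pin into polling mode. */
static bool hpd_irq_seen(struct hpd_pin *pin, unsigned long now_ms)
{
    if (pin->mode == HPD_POLL)
        return false; /* the interrupt is treated as masked in this model */

    if (now_ms - pin->window_start_ms > STORM_PERIOD_MS) {
        pin->window_start_ms = now_ms; /* start a new counting window */
        pin->count = 0;
    }

    if (++pin->count > STORM_THRESHOLD) {
        pin->mode = HPD_POLL;
        return true;
    }
    return false;
}

/* Feed one interrupt every step_ms for total_ms and report the final mode. */
static enum hpd_mode simulate(unsigned long step_ms, unsigned long total_ms)
{
    struct hpd_pin pin = { 0, 0, HPD_IRQ };
    unsigned long t;

    assert(step_ms > 0);
    for (t = 0; t < total_ms; t += step_ms)
        hpd_irq_seen(&pin, t);
    return pin.mode;
}
```

With these example values, `simulate(500, 5000)` stays in IRQ mode, while `simulate(10, 5000)` ends in polling mode. The second case corresponds to the situation assumed for IPC227E, where detection afterwards depends entirely on the poll work being rescheduled.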
### Patches
Patches 1-4:
- Partial backports of upstream commits; only the relevant logic or fixes are applied, with other code omitted due to downstream divergence.
- Merged minimally, without an exhaustive backport of all intermediate upstream changes.
Patch 5:
- Contains cherry-picked logic plus context/compatibility amendments as needed, so that the driver builds.
Together, these fixes greatly improve the reliability of hotplug detection on both devices, with no regression detected in our setups.
Thank you for your review,
Nicusor Huhulea
This patch series contains the following changes:
Dmitry Baryshkov (2):
drm/probe_helper: extract two helper functions
drm/probe-helper: enable and disable HPD on connectors
Imre Deak (2):
drm/i915: Fix HPD polling, reenabling the output poll work as needed
drm: Add an HPD poll helper to reschedule the poll work
Nicusor Huhulea (1):
drm/i915: fixes for i915 Hot Plug Detection and build/runtime issues
drivers/gpu/drm/drm_probe_helper.c | 127 ++++++++++++++-----
drivers/gpu/drm/i915/display/intel_hotplug.c | 4 +-
include/drm/drm_modeset_helper_vtables.h | 22 ++++
include/drm/drm_probe_helper.h | 1 +
4 files changed, 122 insertions(+), 32 deletions(-)
--
2.39.2
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
Fixes: 1085b8276bb4 ("tpm: Add the rest of the session HMAC API")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
---
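As an illustration of the constant-time comparison the commit message refers to, the following user-space sketch shows the general technique. `ct_memneq()` is a hypothetical stand-in written for this example; it is not the kernel's crypto_memneq() and does not reproduce its hardening against compiler optimizations beyond the use of `volatile`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for a constant-time "not equal" check.
 * Returns 0 if the first n bytes of a and b are equal, non-zero otherwise.
 * Every byte is always read, so the run time does not depend on the
 * position of the first mismatch (unlike a memcmp() that may exit early).
 */
static unsigned int ct_memneq(const void *a, const void *b, size_t n)
{
    const volatile unsigned char *pa = a;
    const volatile unsigned char *pb = b;
    unsigned char diff = 0;
    size_t i;

    assert(a && b);
    for (i = 0; i < n; i++)
        diff |= pa[i] ^ pb[i]; /* accumulate differences, no early exit */

    return diff;
}
```

The return convention is inverted relative to a `memcmp(...) == 0` check: a non-zero result means failure. That inversion is why the patch can drop the else branch and set `rc = 0` after the check.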
drivers/char/tpm/Kconfig | 1 +
drivers/char/tpm/tpm2-sessions.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig
index dddd702b2454a..f9d8a4e966867 100644
--- a/drivers/char/tpm/Kconfig
+++ b/drivers/char/tpm/Kconfig
@@ -31,10 +31,11 @@ config TCG_TPM2_HMAC
bool "Use HMAC and encrypted transactions on the TPM bus"
default X86_64
select CRYPTO_ECDH
select CRYPTO_LIB_AESCFB
select CRYPTO_LIB_SHA256
+ select CRYPTO_LIB_UTILS
help
Setting this causes us to deploy a scheme which uses request
and response HMACs in addition to encryption for
communicating with the TPM to prevent or detect bus snooping
and interposer attacks (see tpm-security.rst). Saying Y
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index bdb119453dfbe..5fbd62ee50903 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -69,10 +69,11 @@
#include <linux/unaligned.h>
#include <crypto/kpp.h>
#include <crypto/ecdh.h>
#include <crypto/hash.h>
#include <crypto/hmac.h>
+#include <crypto/utils.h>
/* maximum number of names the TPM must remember for authorization */
#define AUTH_MAX_NAMES 3
#define AES_KEY_BYTES AES_KEYSIZE_128
@@ -827,16 +828,15 @@ int tpm_buf_check_hmac_response(struct tpm_chip *chip, struct tpm_buf *buf,
sha256_update(&sctx, auth->our_nonce, sizeof(auth->our_nonce));
sha256_update(&sctx, &auth->attrs, 1);
/* we're done with the rphash, so put our idea of the hmac there */
tpm2_hmac_final(&sctx, auth->session_key, sizeof(auth->session_key)
+ auth->passphrase_len, rphash);
- if (memcmp(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE) == 0) {
- rc = 0;
- } else {
+ if (crypto_memneq(rphash, &buf->data[offset_s], SHA256_DIGEST_SIZE)) {
dev_err(&chip->dev, "TPM: HMAC check failed\n");
goto out;
}
+ rc = 0;
/* now do response decryption */
if (auth->attrs & TPM2_SA_ENCRYPT) {
/* need key and IV */
tpm2_KDFa(auth->session_key, SHA256_DIGEST_SIZE
--
2.50.1
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
In consecutive polls, the interrupt URBs report:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in an RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
Call trace:
arch_local_irq_enable+0x4/0x8
local_bh_enable+0x18/0x20
__netdev_alloc_skb+0x18c/0x1cc
rx_submit+0x68/0x1f8 [usbnet]
rx_alloc_submit+0x4c/0x74 [usbnet]
usbnet_bh+0x1d8/0x218 [usbnet]
usbnet_bh_tasklet+0x10/0x18 [usbnet]
tasklet_action_common+0xa8/0x110
tasklet_action+0x2c/0x34
handle_softirqs+0x2cc/0x3a0
__do_softirq+0x10/0x18
____do_softirq+0xc/0x14
call_on_irq_stack+0x24/0x34
do_softirq_own_stack+0x18/0x20
__irq_exit_rcu+0xa8/0xb8
irq_exit_rcu+0xc/0x30
el1_interrupt+0x34/0x48
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x38/0x48
xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd]
unlink1+0xd4/0xdc [usbcore]
usb_hcd_unlink_urb+0x70/0xb0 [usbcore]
usb_unlink_urb+0x24/0x44 [usbcore]
unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet]
__handle_link_change+0x34/0x70 [usbnet]
usbnet_deferred_kevent+0x1c0/0x320 [usbnet]
process_scheduled_works+0x2d0/0x48c
worker_thread+0x150/0x1dc
kthread+0xd8/0xe8
ret_from_fork+0x10/0x20
Get around the problem by delaying the carrier on to the scheduled work.
This needs a new flag to keep track of the necessary action.
The carrier ok check cannot be removed as it remains required for the
LINK_RESET event flow.
Fixes: 4b49f58fff00 ("usbnet: handle link change")
Cc: stable(a)vger.kernel.org
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
---
I've been testing this quite aggressively overnight, and it seems as stable
as my first approach. I'm a little concerned that the bit handling can now
race in the opposite direction (although in a much smaller window): a
carrier off can occur between test_and_clear_bit() and the carrier on
action in the handler, leaving the carrier on when it shouldn't be.
v2:
- target tree in patch description.
- Drop Ming Lei from address list as their address bounces.
- Rework solution based on feedback by Jakub (let me know if you want a
Suggested-by tag, if we're keeping this direction)
v1: https://lore.kernel.org/netdev/20250710085028.1070922-1-john.ernberg@actia.…
Tested on 6.12.20 and forward ported. Stack trace from 6.12.20.
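
For readers following the flag handoff described above, here is a simplified user-space model using C11 atomics. All `model_*` and `*_m` names are hypothetical and exist only in this example. The branch structure of __handle_link_change() and the URB unlinking are deliberately not reproduced; only the "record the request, apply it later in the worker" idea is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit number mirrors the patch; everything else is a user-space model. */
#define EVENT_LINK_CARRIER_ON 14

struct model_dev {
    atomic_ulong flags;
    bool carrier; /* stands in for the netdev carrier state */
};

static void model_init(struct model_dev *dev)
{
    assert(dev);
    atomic_init(&dev->flags, 0UL);
    dev->carrier = false;
}

static void set_bit_m(int nr, atomic_ulong *w)
{
    atomic_fetch_or(w, 1UL << nr);
}

static void clear_bit_m(int nr, atomic_ulong *w)
{
    atomic_fetch_and(w, ~(1UL << nr));
}

/* Atomically clear the bit and report whether it was set beforehand. */
static bool test_and_clear_bit_m(int nr, atomic_ulong *w)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_and(w, ~mask) & mask) != 0;
}

/* Event side: record the carrier-on request instead of applying it now. */
static void model_link_change(struct model_dev *dev, bool link, bool need_reset)
{
    if (link && !need_reset) {
        set_bit_m(EVENT_LINK_CARRIER_ON, &dev->flags);
    } else {
        clear_bit_m(EVENT_LINK_CARRIER_ON, &dev->flags);
        dev->carrier = false;
    }
}

/* Worker side: apply a pending carrier-on request, if any. */
static void model_handle_link_change(struct model_dev *dev)
{
    if (test_and_clear_bit_m(EVENT_LINK_CARRIER_ON, &dev->flags))
        dev->carrier = true;
}
```

In this model, a Connected event followed by a Disconnected event before the worker runs leaves the carrier off, because the later event clears the pending bit. The remaining window I mention above sits between test_and_clear_bit_m() returning true and the assignment to `carrier`.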
---
drivers/net/usb/usbnet.c | 11 ++++++++---
include/linux/usb/usbnet.h | 1 +
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index c04e715a4c2a..bc1d8631ffe0 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1122,6 +1122,9 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
@@ -2009,10 +2012,12 @@ EXPORT_SYMBOL(usbnet_manage_power);
void usbnet_link_change(struct usbnet *dev, bool link, bool need_reset)
{
/* update link after link is reseted */
- if (link && !need_reset)
- netif_carrier_on(dev->net);
- else
+ if (link && !need_reset) {
+ set_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
+ } else {
+ clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
netif_carrier_off(dev->net);
+ }
if (need_reset && link)
usbnet_defer_kevent(dev, EVENT_LINK_RESET);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 0b9f1e598e3a..4bc6bb01a0eb 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -76,6 +76,7 @@ struct usbnet {
# define EVENT_LINK_CHANGE 11
# define EVENT_SET_RX_MODE 12
# define EVENT_NO_IP_ALIGN 13
+# define EVENT_LINK_CARRIER_ON 14
/* This one is special, as it indicates that the device is going away
* there are cyclic dependencies between tasklet, timer and bh
* that must be broken
--
2.49.0
Hi Greg,
The below two patches are needed on linux-5.15.y and linux-6.1.y, please
help to add them to the stable tree.
b7a62611fab7 usb: chipidea: add USB PHY event
87ed257acb09 usb: phy: mxs: disconnect line when USB charger is attached
They are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git branch usb-testing
Thanks,
Xu Yang
+ stable
+ regressions
New subject
Great news.
Greg, Sasha,
Can you please pull in these 3 commits specifically to 6.6.y to fix a
regression that was reported by Morgan in 6.6.y:
commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest
performance value")
commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred
core support")
commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency
issue which limits performance")
Further details are below.
Thanks!
On 9/5/2024 16:09, Jones, Morgan wrote:
> Mario,
>
> Confirmed. Thank you for the help! Slightly different refs on my end:
>
> Remotes:
>
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (fetch)
> next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (push)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (fetch)
> origin git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git (push)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (fetch)
> superm1 https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/ (push)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (fetch)
> torvalds git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (push)
>
> Patches:
>
> git format-patch 12753d71e8c5^..12753d71e8c5
> git format-patch f3a052391822b772b4e27f2594526cf1eb103cab^..f3a052391822b772b4e27f2594526cf1eb103cab
> git format-patch bf202e654bfa57fb8cf9d93d4c6855890b70b9c4^..bf202e654bfa57fb8cf9d93d4c6855890b70b9c4
>
> Results:
>
> Linux redact 6.6.48 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980 x86_64 GNU/Linux
>
> analyzing CPU 56:
> driver: amd-pstate-epp
> CPUs which run at the same hardware frequency: 56
> CPUs which need to have their frequency coordinated by software: 56
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 400 MHz - 3.35 GHz
> available cpufreq governors: performance powersave
> current policy: frequency should be within 400 MHz and 3.35 GHz.
> The governor "performance" may decide which speed to use
> within this range.
> current CPU frequency: Unable to call hardware
> current CPU frequency: 2.09 GHz (asserted by call to kernel)
> boost state support:
> Supported: yes
> Active: yes
> AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.35 GHz.
> AMD PSTATE Nominal Performance: 152. Nominal Frequency: 2.00 GHz.
> AMD PSTATE Lowest Non-linear Performance: 115. Lowest Non-linear Frequency: 1.51 GHz.
> AMD PSTATE Lowest Performance: 31. Lowest Frequency: 400 MHz.
>
> And our builds are back to being fast with `amd_pstate=active amd_prefcore=enable amd_pstate.shared_mem=1`.
>
> Morgan
>
> -----Original Message-----
> From: Mario Limonciello <mario.limonciello(a)amd.com>
> Sent: Thursday, September 5, 2024 8:12 AM
> To: Jones, Morgan <Morgan.Jones(a)viasat.com>
> Cc: linux-pm(a)vger.kernel.org; linux-kernel(a)vger.kernel.org; David Arcari <darcari(a)redhat.com>; Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>; rafael(a)kernel.org; viresh.kumar(a)linaro.org; gautham.shenoy(a)amd.com; perry.yuan(a)amd.com; skhan(a)linuxfoundation.org; li.meng(a)amd.com; ray.huang(a)amd.com
> Subject: Re: [EXTERNAL] Re: [PATCH v2 2/2] cpufreq/amd-pstate: Fix the scaling_max_freq setting on shared memory CPPC systems
>
> Hi Morgan,
>
> Please apply these 3 commits:
>
> commit 12753d71e8c5 ("ACPI: CPPC: Add helper to get the highest performance value")
> commit ed429c686b79 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support")
> commit 3d291fe47fe1 ("cpufreq: amd-pstate: fix the highest frequency issue which limits performance")
>
> The first two should help your system, the third will prevent introducing a regression on a different one.
>
> Assuming that works we should ask @stable to pull all 3 in to fix this regression.
>
> Thanks,
>
> On 9/4/2024 08:57, Mario Limonciello wrote:
>> Morgan,
>>
>> I was referring specifically to the version that landed in Linus' tree:
>> https://git.kernel.org/torvalds/c/8164f7433264
>>
>> But yeah it's effectively the same thing. In any case, it's not the
>> solution.
>>
>> We had some internal discussion and suspect this is due to missing
>> prefcore patches in 6.6 as that feature landed in 6.9. We'll try to
>> reproduce this on a Rome system and come back with our findings and
>> suggestions on what to do.
>>
>> Thanks,
>>
>