Hello,
Please consider applying the following commits for 6.12.y:
c104c16073b7 ("Kunit to check the longest symbol length")
54338750578f ("x86/tools: Drop duplicate unlikely() definition in
insn_decoder_test.c")
They should apply cleanly.
Those two commits implement a kunit test to verify that a symbol with
KSYM_NAME_LEN of 512 can be read.
The first commit implements the test. This commit also includes
a fix for the test x86/insn_decoder_test. In the case a symbol exceeds the
symbol length limit, an error will happen:
arch/x86/tools/insn_decoder_test: error: malformed line 1152000:
tBb_+0xf2>
..which overflowed by 10 characters reading this line:
ffffffff81458193: 74 3d je
ffffffff814581d2
<_RNvXse_NtNtNtCshGpAVYOtgW1_4core4iter8adapters7flattenINtB5_13FlattenCompatINtNtB7_3map3MapNtNtNtBb_3str4iter5CharsNtB1v_17CharEscapeDefaultENtNtBb_4char13EscapeDefaultENtNtBb_3fmt5Debug3fmtBb_+0xf2>
The fix was proposed in [1] and initially mentioned at [2].
The second commit fixes a warning when building with clang because
there was a definition of unlikely from compiler.h in tools/include/linux,
which conflicted with the one in the instruction decoder selftest.
[1] https://lore.kernel.org/lkml/Y9ES4UKl%2F+DtvAVS@gmail.com/
[2] https://lore.kernel.org/lkml/320c4dba-9919-404b-8a26-a8af16be1845@app.fastm…
I will send something similar to 6.6.y and 6.1.y
Thanks!
Best regards,
Sergio
If speed_hz < AMD_SPI_MIN_HZ, amd_set_spi_freq() iterates over the
entire amd_spi_freq array without breaking out early, causing 'i' to go
beyond the array bounds.
Fix that by stopping the loop when it gets to the last entry, so the low
speed_hz value gets clamped up to AMD_SPI_MIN_HZ.
Fixes the following warning with an UBSAN kernel:
drivers/spi/spi-amd.o: error: objtool: amd_set_spi_freq() falls through to next function amd_spi_set_opcode()
Fixes: 3fe26121dc3a ("spi: amd: Configure device speed")
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Acked-by: Mark Brown <broonie(a)kernel.org>
Cc: Raju Rangoju <Raju.Rangoju(a)amd.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/r/78fef0f2434f35be9095bcc9ffa23dd8cab667b9.17428528…
Closes: https://lore.kernel.org/r/202503161828.RUk9EhWx-lkp@intel.com/
Signed-off-by: Dmitriy Privalov <d.privalov(a)omp.ru>
---
drivers/spi/spi-amd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-amd.c b/drivers/spi/spi-amd.c
index bfc3ab5f39ea..b53301e563bc 100644
--- a/drivers/spi/spi-amd.c
+++ b/drivers/spi/spi-amd.c
@@ -243,7 +243,7 @@ static int amd_set_spi_freq(struct amd_spi *amd_spi, u32 speed_hz)
if (speed_hz < AMD_SPI_MIN_HZ)
return -EINVAL;
- for (i = 0; i < ARRAY_SIZE(amd_spi_freq); i++)
+ for (i = 0; i < ARRAY_SIZE(amd_spi_freq)-1; i++)
if (speed_hz >= amd_spi_freq[i].speed_hz)
break;
--
2.34.1
Hi,
in the last week, after updating to 6.6.92, we’ve encountered a number of VMs reporting temporarily hung tasks blocking the whole system for a few minutes. They unblock by themselves and have similar tracebacks.
The IO PSIs show 100% pressure for that time, but the underlying devices are still processing read and write IO (well within their capacity). I’ve eliminated the underlying storage (Ceph) as the source of problems as I couldn’t find any latency outliers or significant queuing during that time.
I’ve seen somewhat similar reports on 6.6.88 and 6.6.77, but those might have been different outliers.
I’m attaching 3 logs - my intuition and the data so far leads me to consider this might be a kernel bug. I haven’t found a way to reproduce this, yet.
Regards,
Christian
--
Christian Theune · ct(a)flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
[ Upstream commit 6043b794c7668c19dabc4a93c75b924a19474d59 ]
During ILA address translations, the L4 checksums can be handled in
different ways. One of them, adj-transport, consist in parsing the
transport layer and updating any found checksum. This logic relies on
inet_proto_csum_replace_by_diff and produces an incorrect skb->csum when
in state CHECKSUM_COMPLETE.
This bug can be reproduced with a simple ILA to SIR mapping, assuming
packets are received with CHECKSUM_COMPLETE:
$ ip a show dev eth0
14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 3333:0:0:1::c078/64 scope global
valid_lft forever preferred_lft forever
inet6 fd00:10:244:1::c078/128 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever
$ ip ila add loc_match fd00:10:244:1 loc 3333:0:0:1 \
csum-mode adj-transport ident-type luid dev eth0
Then I hit [fd00:10:244:1::c078]:8000 with a server listening only on
[3333:0:0:1::c078]:8000. With the bug, the SYN packet is dropped with
SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed
skb->csum. The translation and drop are visible on pwru [1] traces:
IFACE TUPLE FUNC
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow
eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM)
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head
eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem
This is happening because inet_proto_csum_replace_by_diff is updating
skb->csum when it shouldn't. The L4 checksum is updated such that it
"cancels" the IPv6 address change in terms of checksum computation, so
the impact on skb->csum is null.
Note this would be different for an IPv4 packet since three fields
would be updated: the IPv4 address, the IP checksum, and the L4
checksum. Two would cancel each other and skb->csum would still need
to be updated to take the L4 checksum change into account.
This patch fixes it by passing an ipv6 flag to
inet_proto_csum_replace_by_diff, to skip the skb->csum update if we're
in the IPv6 case. Note the behavior of the only other user of
inet_proto_csum_replace_by_diff, the BPF subsystem, is left as is in
this patch and fixed in the subsequent patch.
With the fix, using the reproduction from above, I can confirm
skb->csum is not touched by inet_proto_csum_replace_by_diff and the TCP
SYN proceeds to the application after the ILA translation.
Link: https://github.com/cilium/pwru [1]
Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module")
Signed-off-by: Paul Chaignon <paul.chaignon(a)gmail.com>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://patch.msgid.link/b5539869e3550d46068504feb02d37653d939c0b.174850948…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Paul Chaignon <paul.chaignon(a)gmail.com>
---
include/net/checksum.h | 2 +-
net/core/filter.c | 2 +-
net/core/utils.c | 4 ++--
net/ipv6/ila/ila_common.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/net/checksum.h b/include/net/checksum.h
index 1338cb92c8e7..28b101f26636 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -158,7 +158,7 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
const __be32 *from, const __be32 *to,
bool pseudohdr);
void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
- __wsum diff, bool pseudohdr);
+ __wsum diff, bool pseudohdr, bool ipv6);
static __always_inline
void inet_proto_csum_replace2(__sum16 *sum, struct sk_buff *skb,
diff --git a/net/core/filter.c b/net/core/filter.c
index 99b23fd2f509..e0d978c1a4cd 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1999,7 +1999,7 @@ BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset,
if (unlikely(from != 0))
return -EINVAL;
- inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
+ inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo, false);
break;
case 2:
inet_proto_csum_replace2(ptr, skb, from, to, is_pseudo);
diff --git a/net/core/utils.c b/net/core/utils.c
index 27f4cffaae05..b8c21a859e27 100644
--- a/net/core/utils.c
+++ b/net/core/utils.c
@@ -473,11 +473,11 @@ void inet_proto_csum_replace16(__sum16 *sum, struct sk_buff *skb,
EXPORT_SYMBOL(inet_proto_csum_replace16);
void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb,
- __wsum diff, bool pseudohdr)
+ __wsum diff, bool pseudohdr, bool ipv6)
{
if (skb->ip_summed != CHECKSUM_PARTIAL) {
csum_replace_by_diff(sum, diff);
- if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr)
+ if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr && !ipv6)
skb->csum = ~csum_sub(diff, skb->csum);
} else if (pseudohdr) {
*sum = ~csum_fold(csum_add(diff, csum_unfold(*sum)));
diff --git a/net/ipv6/ila/ila_common.c b/net/ipv6/ila/ila_common.c
index 95e9146918cc..b8d43ed4689d 100644
--- a/net/ipv6/ila/ila_common.c
+++ b/net/ipv6/ila/ila_common.c
@@ -86,7 +86,7 @@ static void ila_csum_adjust_transport(struct sk_buff *skb,
diff = get_csum_diff(ip6h, p);
inet_proto_csum_replace_by_diff(&th->check, skb,
- diff, true);
+ diff, true, true);
}
break;
case NEXTHDR_UDP:
@@ -97,7 +97,7 @@ static void ila_csum_adjust_transport(struct sk_buff *skb,
if (uh->check || skb->ip_summed == CHECKSUM_PARTIAL) {
diff = get_csum_diff(ip6h, p);
inet_proto_csum_replace_by_diff(&uh->check, skb,
- diff, true);
+ diff, true, true);
if (!uh->check)
uh->check = CSUM_MANGLED_0;
}
@@ -111,7 +111,7 @@ static void ila_csum_adjust_transport(struct sk_buff *skb,
diff = get_csum_diff(ip6h, p);
inet_proto_csum_replace_by_diff(&ih->icmp6_cksum, skb,
- diff, true);
+ diff, true, true);
}
break;
}
--
2.43.0
From: Igor Pylypiv <ipylypiv(a)google.com>
[ Upstream commit e4f949ef1516c0d74745ee54a0f4882c1f6c7aea ]
pm8001_phy_control() populates the enable_completion pointer with a stack
address, sends a PHY_LINK_RESET / PHY_HARD_RESET, waits 300 ms, and
returns. The problem arises when a phy control response comes late. After
300 ms the pm8001_phy_control() function returns and the passed
enable_completion stack address is no longer valid. Late phy control
response invokes complete() on a dangling enable_completion pointer which
leads to a kernel crash.
Signed-off-by: Igor Pylypiv <ipylypiv(a)google.com>
Signed-off-by: Terrence Adams <tadamsjr(a)google.com>
Link: https://lore.kernel.org/r/20240627155924.2361370-2-tadamsjr@google.com
Acked-by: Jack Wang <jinpu.wang(a)ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Test info:
Building OS: Ubuntu 22.04.5
GCC: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
Base Tree: https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
Builds:
make defconfig; make bzImage
make allyesconfig; make bzImage
make allmodconfig; make bzImage
Boot target: Intel Basking Ridge
---
drivers/scsi/pm8001/pm8001_sas.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/pm8001/pm8001_sas.c b/drivers/scsi/pm8001/pm8001_sas.c
index a87c3d7e3e5c..f491edf73e23 100644
--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -168,7 +168,7 @@ int pm8001_phy_control(struct asd_sas_phy *sas_phy, enum phy_func func,
unsigned long flags;
pm8001_ha = sas_phy->ha->lldd_ha;
phy = &pm8001_ha->phy[phy_id];
- pm8001_ha->phy[phy_id].enable_completion = &completion;
+
switch (func) {
case PHY_FUNC_SET_LINK_RATE:
rates = funcdata;
@@ -181,6 +181,7 @@ int pm8001_phy_control(struct asd_sas_phy *sas_phy, enum phy_func func,
rates->maximum_linkrate;
}
if (pm8001_ha->phy[phy_id].phy_state == PHY_LINK_DISABLE) {
+ pm8001_ha->phy[phy_id].enable_completion = &completion;
PM8001_CHIP_DISP->phy_start_req(pm8001_ha, phy_id);
wait_for_completion(&completion);
}
@@ -189,6 +190,7 @@ int pm8001_phy_control(struct asd_sas_phy *sas_phy, enum phy_func func,
break;
case PHY_FUNC_HARD_RESET:
if (pm8001_ha->phy[phy_id].phy_state == PHY_LINK_DISABLE) {
+ pm8001_ha->phy[phy_id].enable_completion = &completion;
PM8001_CHIP_DISP->phy_start_req(pm8001_ha, phy_id);
wait_for_completion(&completion);
}
@@ -197,6 +199,7 @@ int pm8001_phy_control(struct asd_sas_phy *sas_phy, enum phy_func func,
break;
case PHY_FUNC_LINK_RESET:
if (pm8001_ha->phy[phy_id].phy_state == PHY_LINK_DISABLE) {
+ pm8001_ha->phy[phy_id].enable_completion = &completion;
PM8001_CHIP_DISP->phy_start_req(pm8001_ha, phy_id);
wait_for_completion(&completion);
}
--
2.34.1
commit de5fbbe1531f645c8b56098be8d1faf31e46f7f0 upstream
The appletbdrm driver is exclusively for Touch Bars on x86 Intel Macs.
The M1 Macs have a separate driver. So, lets avoid compiling it for
other architectures.
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
Reviewed-by: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
Link: https://lore.kernel.org/r/PN3PR01MB95970778982F28E4A3751392B8B72@PN3PR01MB9…
Signed-off-by: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
---
Sending this since https://lore.kernel.org/stable/20250617152509.019353397@linuxfoundation.org/
was also backported to 6.15
drivers/gpu/drm/tiny/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/tiny/Kconfig b/drivers/gpu/drm/tiny/Kconfig
index 95c1457d7..d66681d0e 100644
--- a/drivers/gpu/drm/tiny/Kconfig
+++ b/drivers/gpu/drm/tiny/Kconfig
@@ -3,6 +3,7 @@
config DRM_APPLETBDRM
tristate "DRM support for Apple Touch Bars"
depends on DRM && USB && MMU
+ depends on X86 || COMPILE_TEST
select DRM_GEM_SHMEM_HELPER
select DRM_KMS_HELPER
help
--
2.49.0
Jan,
I noticed that fanotify22, the FAN_FS_ERROR test has regressed in the
5.15.y stable tree.
This is because commit d3476f3dad4a ("ext4: don't set SB_RDONLY after
filesystem errors") was backported to 5.15.y and the later Fixes
commit could not be cleanly applied to 5.15.y over the new mount api
re-factoring.
I am not sure it is critical to fix this regression, because it is
mostly a regression in a test feature, but I think the backport is
pretty simple, although I could be missing something.
Please ACK if you agree that this backport should be applied to 5.15.y.
Thanks,
Amir.
Amir Goldstein (2):
ext4: make 'abort' mount option handling standard
ext4: avoid remount errors with 'abort' mount option
fs/ext4/ext4.h | 1 +
fs/ext4/super.c | 15 +++++++++------
2 files changed, 10 insertions(+), 6 deletions(-)
--
2.47.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f90fff1e152dedf52b932240ebbd670d83330eca
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025061744-precinct-rubble-45c9@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f90fff1e152dedf52b932240ebbd670d83330eca Mon Sep 17 00:00:00 2001
From: Oleg Nesterov <oleg(a)redhat.com>
Date: Fri, 13 Jun 2025 19:26:50 +0200
Subject: [PATCH] posix-cpu-timers: fix race between handle_posix_cpu_timers()
and posix_cpu_timer_del()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If an exiting non-autoreaping task has already passed exit_notify() and
calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent
or debugger right after unlock_task_sighand().
If a concurrent posix_cpu_timer_del() runs at that moment, it won't be
able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or
lock_task_sighand() will fail.
Add the tsk->exit_state check into run_posix_cpu_timers() to fix this.
This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because
exit_task_work() is called before exit_notify(). But the check still
makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail
anyway in this case.
Cc: stable(a)vger.kernel.org
Reported-by: Benoît Sevens <bsevens(a)google.com>
Fixes: 0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 50e8d04ab661..2e5b89d7d866 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1405,6 +1405,15 @@ void run_posix_cpu_timers(void)
lockdep_assert_irqs_disabled();
+ /*
+ * Ensure that release_task(tsk) can't happen while
+ * handle_posix_cpu_timers() is running. Otherwise, a concurrent
+ * posix_cpu_timer_del() may fail to lock_task_sighand(tsk) and
+ * miss timer->it.cpu.firing != 0.
+ */
+ if (tsk->exit_state)
+ return;
+
/*
* If the actual expiry is deferred to task work context and the
* work is already scheduled there is no point to do anything here.