This is the start of the stable review cycle for the 6.12.57 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 02 Nov 2025 14:00:34 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.57-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.57-rc1
Edward Cree <ecree.xilinx(a)gmail.com>
sfc: fix NULL dereferences in ef100_process_design_param()
Xiaogang Chen <xiaogang.chen(a)amd.com>
udmabuf: fix a buf size overflow issue during udmabuf creation
Aditya Kumar Singh <quic_adisi(a)quicinc.com>
wifi: ath12k: fix read pointer after free in ath12k_mac_assign_vif_to_vdev()
Kees Bakker <kees(a)ijzerbout.nl>
iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE
William Breathitt Gray <wbg(a)kernel.org>
gpio: idio-16: Define fixed direction of the GPIO lines
Ioana Ciornei <ioana.ciornei(a)nxp.com>
gpio: regmap: add the .fixed_direction_output configuration parameter
Mathieu Dubois-Briand <mathieu.dubois-briand(a)bootlin.com>
gpio: regmap: Allow to allocate regmap-irq device
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
bits: introduce fixed-type GENMASK_U*()
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
bits: add comments and newlines to #if, #else and #endif directives
Wang Liang <wangliang74(a)huawei.com>
bonding: check xdp prog when set bond mode
Hangbin Liu <liuhangbin(a)gmail.com>
bonding: return detailed error when loading native XDP fails
Alexander Wetzel <Alexander(a)wetzel-home.de>
wifi: cfg80211: Add missing lock in cfg80211_check_and_end_cac()
Chao Yu <chao(a)kernel.org>
f2fs: fix to avoid panic once fallocation fails for pinfile
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: pm: in-kernel: C-flag: handle late ADD_ADDR
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
selftests: mptcp: join: mark 'delete re-add signal' as skipped if not supported
Geliang Tang <geliang(a)kernel.org>
selftests: mptcp: disable add_addr retrans in endpoint_tests
Jonathan Corbet <corbet(a)lwn.net>
docs: kdoc: handle the obsolescensce of docutils.ErrorString()
Menglong Dong <menglong8.dong(a)gmail.com>
arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
Tejun Heo <tj(a)kernel.org>
sched_ext: Make qmap dump operation non-destructive
Filipe Manana <fdmanana(a)suse.com>
btrfs: use smp_mb__after_atomic() when forcing COW in create_pending_snapshot()
Qu Wenruo <wqu(a)suse.com>
btrfs: tree-checker: add inode extref checks
Filipe Manana <fdmanana(a)suse.com>
btrfs: abort transaction if we fail to update inode in log replay dir fixup
Filipe Manana <fdmanana(a)suse.com>
btrfs: use level argument in log tree walk callback replay_one_buffer()
Filipe Manana <fdmanana(a)suse.com>
btrfs: always drop log root tree reference in btrfs_replay_log()
Thorsten Blum <thorsten.blum(a)linux.dev>
btrfs: scrub: replace max_t()/min_t() with clamp() in scrub_throttle_dev_io()
Naohiro Aota <naohiro.aota(a)wdc.com>
btrfs: zoned: refine extent allocator hint selection
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: zoned: return error from btrfs_zone_finish_endio()
Filipe Manana <fdmanana(a)suse.com>
btrfs: abort transaction in the process_one_buffer() log tree walk callback
Filipe Manana <fdmanana(a)suse.com>
btrfs: abort transaction on specific error places when walking log tree
Chen Ridong <chenridong(a)huawei.com>
cpuset: Use new excpus for nocpu error check when enabling root partition
Avadhut Naik <avadhut.naik(a)amd.com>
EDAC/mc_sysfs: Increase legacy channel support to 16
David Kaplan <david.kaplan(a)amd.com>
x86/bugs: Fix reporting of LFENCE retpoline
David Kaplan <david.kaplan(a)amd.com>
x86/bugs: Report correct retbleed mitigation status
Jiri Olsa <jolsa(a)kernel.org>
seccomp: passthrough uprobe systemcall without filtering
Josh Poimboeuf <jpoimboe(a)kernel.org>
perf: Skip user unwind if the task is a kernel thread
Josh Poimboeuf <jpoimboe(a)kernel.org>
perf: Have get_perf_callchain() return NULL if crosstask and user are set
Steven Rostedt <rostedt(a)goodmis.org>
perf: Use current->flags & PF_KTHREAD|PF_USER_WORKER instead of current->mm == NULL
Dapeng Mi <dapeng1.mi(a)linux.intel.com>
perf/x86/intel: Add ICL_FIXED_0_ADAPTIVE bit into INTEL_FIXED_BITS_MASK
Richard Guy Briggs <rgb(a)redhat.com>
audit: record fanotify event regardless of presence of rules
Xiang Mei <xmei5(a)asu.edu>
net/sched: sch_qfq: Fix null-deref in agg_dequeue
-------------
Diffstat:
Documentation/sphinx/kernel_abi.py | 4 +-
Documentation/sphinx/kernel_feat.py | 4 +-
Documentation/sphinx/kernel_include.py | 6 ++-
Documentation/sphinx/maintainers_include.py | 4 +-
Makefile | 4 +-
arch/alpha/kernel/asm-offsets.c | 1 +
arch/arc/kernel/asm-offsets.c | 1 +
arch/arm/kernel/asm-offsets.c | 2 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/csky/kernel/asm-offsets.c | 1 +
arch/hexagon/kernel/asm-offsets.c | 1 +
arch/loongarch/kernel/asm-offsets.c | 2 +
arch/m68k/kernel/asm-offsets.c | 1 +
arch/microblaze/kernel/asm-offsets.c | 1 +
arch/mips/kernel/asm-offsets.c | 2 +
arch/nios2/kernel/asm-offsets.c | 1 +
arch/openrisc/kernel/asm-offsets.c | 1 +
arch/parisc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/riscv/kernel/asm-offsets.c | 1 +
arch/s390/kernel/asm-offsets.c | 1 +
arch/sh/kernel/asm-offsets.c | 1 +
arch/sparc/kernel/asm-offsets.c | 1 +
arch/um/kernel/asm-offsets.c | 2 +
arch/x86/events/intel/core.c | 10 ++--
arch/x86/include/asm/perf_event.h | 6 ++-
arch/x86/kernel/cpu/bugs.c | 9 ++--
arch/x86/kvm/pmu.h | 2 +-
arch/xtensa/kernel/asm-offsets.c | 1 +
drivers/dma-buf/udmabuf.c | 2 +-
drivers/edac/edac_mc_sysfs.c | 24 ++++++++++
drivers/gpio/gpio-idio-16.c | 5 ++
drivers/gpio/gpio-regmap.c | 53 ++++++++++++++++++--
drivers/iommu/intel/iommu.c | 7 +--
drivers/net/bonding/bond_main.c | 11 +++--
drivers/net/bonding/bond_options.c | 3 ++
drivers/net/ethernet/sfc/ef100_netdev.c | 6 +--
drivers/net/ethernet/sfc/ef100_nic.c | 47 ++++++++----------
drivers/net/wireless/ath/ath12k/mac.c | 6 +--
fs/btrfs/disk-io.c | 2 +-
fs/btrfs/extent-tree.c | 6 ++-
fs/btrfs/inode.c | 7 +--
fs/btrfs/scrub.c | 3 +-
fs/btrfs/transaction.c | 2 +-
fs/btrfs/tree-checker.c | 37 ++++++++++++++
fs/btrfs/tree-log.c | 64 +++++++++++++++++++------
fs/btrfs/zoned.c | 8 ++--
fs/btrfs/zoned.h | 9 ++--
fs/f2fs/file.c | 8 ++--
fs/f2fs/segment.c | 20 ++++----
include/linux/audit.h | 2 +-
include/linux/bitops.h | 1 -
include/linux/bits.h | 38 ++++++++++++++-
include/linux/gpio/regmap.h | 16 +++++++
include/net/bonding.h | 1 +
include/net/pkt_sched.h | 25 +++++++++-
kernel/cgroup/cpuset.c | 6 +--
kernel/events/callchain.c | 16 +++----
kernel/events/core.c | 7 +--
kernel/seccomp.c | 32 ++++++++++---
net/mptcp/pm_netlink.c | 6 +++
net/sched/sch_api.c | 10 ----
net/sched/sch_hfsc.c | 16 -------
net/sched/sch_qfq.c | 2 +-
net/wireless/reg.c | 4 ++
tools/sched_ext/scx_qmap.bpf.c | 18 ++++++-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 3 +-
67 files changed, 442 insertions(+), 164 deletions(-)
Hi,
This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).
Background
----------
For PCI devices on s390x an operating system receives platform specific
error events from firmware rather than through AER.Today for
passthrough/userspace devices, we don't attempt any error recovery and
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register error handling
callbacks (error_detected), and on an error trigger an eventfd to
userspace. But we need a mechanism to notify userspace
(QEMU/guest/userspace drivers) about the error event.
Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information
via VFIO_DEVICE_FEATURE ioctl and take appropriate actions such as driving
a device reset.
This is how a typical flow for passthrough devices to a VM would work:
For passthrough devices to a VM, the driver bound to the device on the host
is vfio-pci. vfio-pci driver does support the error_detected() callback
(vfio_pci_core_aer_err_detected()), and on an PCI error s390x recovery
code on the host will call the vfio-pci error_detected() callback. The
vfio-pci error_detected() callback will notify userspace/QEMU via an
eventfd, and return PCI_ERS_RESULT_CAN_RECOVER. At this point the s390x
error recovery on the host will skip any further action(see patch 6) and
let userspace drive the error recovery.
Once userspace/QEMU is notified, it then injects this error into the VM
so device drivers in the VM can take recovery actions. For example for a
passthrough NVMe device, the VM's OS NVMe driver will access the device.
At this point the VM's NVMe driver's error_detected() will drive the
recovery by returning PCI_ERS_RESULT_NEED_RESET, and the s390x error
recovery in the VM's OS will try to do a reset. Resets are privileged
operations and so the VM will need intervention from QEMU to perform the
reset. QEMU will invoke the VFIO_DEVICE_RESET ioctl to now notify the
host that the VM is requesting a reset of the device. The vfio-pci driver
on the host will then perform the reset on the device to recover it.
Thanks
Farhan
ChangeLog
---------
v4 series https://lore.kernel.org/all/20250924171628.826-1-alifm@linux.ibm.com/
v4 -> v5
- Rebase on 6.18-rc5
- Move bug fixes to the beginning of the series (patch 1 and 2). These patches
were posted as a separate fixes series
https://lore.kernel.org/all/a14936ac-47d6-461b-816f-0fd66f869b0f@linux.ibm.…
- Add matching pci_put_dev() for pci_get_slot() (patch 6).
v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@linux.ibm.com/
v3 -> v4
- Remove warn messages for each PCI capability not restored (patch 1)
- Check PCI_COMMAND and PCI_STATUS register for error value instead of device id
(patch 1)
- Fix kernel crash in patch 3
- Added reviewed by tags
- Address comments from Niklas's (patches 4, 5, 7)
- Fix compilation error non s390x system (patch 8)
- Explicitly align struct vfio_device_feature_zpci_err (patch 8)
v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@linux.ibm.com/
v2 -> v3
- Patch 1 avoids saving any config space state if the device is in error
(suggested by Alex)
- Patch 2 adds additional check only for FLR reset to try other function
reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions. Creates a new flag pci_slot to allow per function slot.
- Patch 4 fixes a bug in s390 for resource to bus address translation.
- Rebase on 6.17-rc5
v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@linux.ibm.com/
v1 - > v2
- Patches 1 and 2 adds some additional checks for FLR/PM reset to
try other function reset method (suggested by Alex).
- Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
functions.
- Patch 7 adds a new device feature for zPCI devices for the VFIO_DEVICE_FEATURE
ioctl. The ioctl is used by userspace to retriece any PCI error
information for the device (suggested by Alex).
- Patch 8 adds a reset_done() callback for the vfio-pci driver, to
restore the state of the device after a reset.
- Patch 9 removes the pcie check for triggering VFIO_PCI_ERR_IRQ_INDEX.
Farhan Ali (9):
PCI: Allow per function PCI slots
s390/pci: Add architecture specific resource/bus address translation
PCI: Avoid saving error values for config space
PCI: Add additional checks for flr reset
s390/pci: Update the logic for detecting passthrough device
s390/pci: Store PCI error information for passthrough devices
vfio-pci/zdev: Add a device feature for error information
vfio: Add a reset_done callback for vfio-pci driver
vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX
arch/s390/include/asm/pci.h | 29 ++++++++
arch/s390/pci/pci.c | 75 +++++++++++++++++++++
arch/s390/pci/pci_event.c | 107 +++++++++++++++++-------------
drivers/pci/host-bridge.c | 4 +-
drivers/pci/pci.c | 37 +++++++++--
drivers/pci/pcie/aer.c | 3 +
drivers/pci/pcie/dpc.c | 3 +
drivers/pci/pcie/ptm.c | 3 +
drivers/pci/slot.c | 25 ++++++-
drivers/pci/tph.c | 3 +
drivers/pci/vc.c | 3 +
drivers/vfio/pci/vfio_pci_core.c | 20 ++++--
drivers/vfio/pci/vfio_pci_intrs.c | 3 +-
drivers/vfio/pci/vfio_pci_priv.h | 9 +++
drivers/vfio/pci/vfio_pci_zdev.c | 45 ++++++++++++-
include/linux/pci.h | 1 +
include/uapi/linux/vfio.h | 15 +++++
17 files changed, 321 insertions(+), 64 deletions(-)
--
2.43.0
The function samsung_dsim_parse_dt() calls of_graph_get_endpoint_by_regs()
to get the endpoint device node, but fails to call of_node_put() to release
the reference when the function returns. This results in a device node
reference leak.
Fix this by adding the missing of_node_put() call before returning from
the function.
Found via static analysis and code review.
Fixes: 77169a11d4e9 ("drm/bridge: samsung-dsim: add driver support for exynos7870 DSIM bridge")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
---
drivers/gpu/drm/bridge/samsung-dsim.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c b/drivers/gpu/drm/bridge/samsung-dsim.c
index eabc4c32f6ab..1a5acd5077ad 100644
--- a/drivers/gpu/drm/bridge/samsung-dsim.c
+++ b/drivers/gpu/drm/bridge/samsung-dsim.c
@@ -2086,6 +2086,7 @@ static int samsung_dsim_parse_dt(struct samsung_dsim *dsi)
if (lane_polarities[1])
dsi->swap_dn_dp_data = true;
}
+ of_node_put(endpoint);
return 0;
}
--
2.39.5 (Apple Git-154)
The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:
* Kernel lockups when a BPF program is attached to a module function [0].
* The `bpf_selftests/module_attach` test fails consistently.
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [1].
Given the severity and the potential for other unknown side-effects,
it is safest to disable the feature entirely for now. This patch
prevents the BPF subsystem from allowing trampoline attachments to
module functions on LoongArch.
This is a temporary mitigation until the core issues in the trampoline
code for module handling can be identified and fixed.
[root@fedora bpf]# ./test_progs -a module_attach -v
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
[0]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C…
[1]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcF…
Cc: stable(a)vger.kernel.org
Fixes: f9b6b41f0cf3 (“LoongArch: BPF: Add basic bpf trampoline support”)
Closes: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C…
Acked-by: Hengqi Chen <hengqi.chen(a)gmail.com>
Signed-off-by: Vincent Li <vincent.mc.li(a)gmail.com>
---
arch/loongarch/net/bpf_jit.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index cbe53d0b7fb0..49c1d4b95404 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1624,6 +1624,9 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
/* Direct jump skips 5 NOP instructions */
else if (is_bpf_text_address((unsigned long)orig_call))
orig_call += LOONGARCH_BPF_FENTRY_NBYTES;
+ /* Module tracing not supported - causes kernel lockups */
+ else if (is_module_text_address((unsigned long)orig_call))
+ return -ENOTSUPP;
if (flags & BPF_TRAMP_F_CALL_ORIG) {
move_addr(ctx, LOONGARCH_GPR_A0, (const u64)im);
--
2.38.1
The current LoongArch BPF trampoline implementation is incompatible
with tracing functions in kernel modules. This causes several severe
and user-visible problems:
* Kernel lockups when a BPF program is attached to a module function [0].
* The `bpf_selftests/module_attach` test fails consistently.
* Critical kernel modules like WireGuard experience traffic disruption
when their functions are traced with fentry [1].
Given the severity and the potential for other unknown side-effects,
it is safest to disable the feature entirely for now. This patch
prevents the BPF subsystem from allowing trampoline attachments to
module functions on LoongArch.
This is a temporary mitigation until the core issues in the trampoline
code for module handling can be identified and fixed.
[root@fedora bpf]# ./test_progs -a module_attach -v
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_module_attach:PASS:skel_open 0 nsec
test_module_attach:PASS:set_attach_target 0 nsec
test_module_attach:PASS:set_attach_target_explicit 0 nsec
test_module_attach:PASS:skel_load 0 nsec
libbpf: prog 'handle_fentry': failed to attach: -ENOTSUPP
libbpf: prog 'handle_fentry': failed to auto-attach: -ENOTSUPP
test_module_attach:FAIL:skel_attach skeleton attach failed: -524
Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED
Successfully unloaded bpf_testmod.ko.
[0]: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C…
[1]: https://lore.kernel.org/loongarch/CAK3+h2wYcpc+OwdLDUBvg2rF9rvvyc5amfHT-KcF…
Cc: stable(a)vger.kernel.org
Fixes: f9b6b41f0cf3 (“LoongArch: BPF: Add basic bpf trampoline support”)
Closes: https://lore.kernel.org/loongarch/CAK3+h2wDmpC-hP4u4pJY8T-yfKyk4yRzpu2LMO+C…
Acked-by: Hengqi Chen <hengqi.chen(a)gmail.com>
Signed-off-by: Vincent Li <vincent.mc.li(a)gmail.com>
---
arch/loongarch/net/bpf_jit.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/loongarch/net/bpf_jit.c b/arch/loongarch/net/bpf_jit.c
index cbe53d0b7fb0..49c1d4b95404 100644
--- a/arch/loongarch/net/bpf_jit.c
+++ b/arch/loongarch/net/bpf_jit.c
@@ -1624,6 +1624,9 @@ static int __arch_prepare_bpf_trampoline(struct jit_ctx *ctx, struct bpf_tramp_i
/* Direct jump skips 5 NOP instructions */
else if (is_bpf_text_address((unsigned long)orig_call))
orig_call += LOONGARCH_BPF_FENTRY_NBYTES;
+ /* Module tracing not supported - causes kernel lockups */
+ else if (is_module_text_address((unsigned long)orig_call))
+ return -ENOTSUPP;
if (flags & BPF_TRAMP_F_CALL_ORIG) {
move_addr(ctx, LOONGARCH_GPR_A0, (const u64)im);
--
2.38.1
Hi Greg, Sasha,
On 03/11/2025 02:38, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> mptcp: drop bogus optimization in __mptcp_check_push()
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> mptcp-drop-bogus-optimization-in-__mptcp_check_push.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Can you please drop this patch from v5.15? It looks like it is causing
some issues with MP_PRIO tests. I think that's because back then, the
path is selected differently, with the use of 'msk->last_snd' which will
bypass some decisions to where to send the next data.
I will try to check if another version of this patch is needed for v5.15.
Cheers,
Matt
(I kept the patch below just in case some people from the MPTCP ML want
to react.)
> From stable+bounces-192087-greg=kroah.com(a)vger.kernel.org Mon Nov 3 05:15:58 2025
> From: Sasha Levin <sashal(a)kernel.org>
> Date: Sun, 2 Nov 2025 15:15:50 -0500
> Subject: mptcp: drop bogus optimization in __mptcp_check_push()
> To: stable(a)vger.kernel.org
> Cc: Paolo Abeni <pabeni(a)redhat.com>, Geliang Tang <geliang(a)kernel.org>, Mat Martineau <martineau(a)kernel.org>, "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>, Jakub Kicinski <kuba(a)kernel.org>, Sasha Levin <sashal(a)kernel.org>
> Message-ID: <20251102201550.3588174-1-sashal(a)kernel.org>
>
> From: Paolo Abeni <pabeni(a)redhat.com>
>
> [ Upstream commit 27b0e701d3872ba59c5b579a9e8a02ea49ad3d3b ]
>
> Accessing the transmit queue without owning the msk socket lock is
> inherently racy, hence __mptcp_check_push() could actually quit early
> even when there is pending data.
>
> That in turn could cause unexpected tx lock and timeout.
>
> Dropping the early check avoids the race, implicitly relaying on later
> tests under the relevant lock. With such change, all the other
> mptcp_send_head() call sites are now under the msk socket lock and we
> can additionally drop the now unneeded annotation on the transmit head
> pointer accesses.
>
> Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
> Reviewed-by: Geliang Tang <geliang(a)kernel.org>
> Tested-by: Geliang Tang <geliang(a)kernel.org>
> Reviewed-by: Mat Martineau <martineau(a)kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
> Link: https://patch.msgid.link/20251028-net-mptcp-send-timeout-v1-1-38ffff5a9ec8@…
> Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
> [ split upstream __subflow_push_pending modification across __mptcp_push_pending and __mptcp_subflow_push_pending ]
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> net/mptcp/protocol.c | 13 +++++--------
> net/mptcp/protocol.h | 2 +-
> 2 files changed, 6 insertions(+), 9 deletions(-)
>
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1137,7 +1137,7 @@ static void __mptcp_clean_una(struct soc
> if (WARN_ON_ONCE(!msk->recovery))
> break;
>
> - WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
> + msk->first_pending = mptcp_send_next(sk);
> }
>
> dfrag_clear(sk, dfrag);
> @@ -1674,7 +1674,7 @@ void __mptcp_push_pending(struct sock *s
>
> mptcp_update_post_push(msk, dfrag, ret);
> }
> - WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
> + msk->first_pending = mptcp_send_next(sk);
> }
>
> /* at this point we held the socket lock for the last subflow we used */
> @@ -1732,7 +1732,7 @@ static void __mptcp_subflow_push_pending
>
> mptcp_update_post_push(msk, dfrag, ret);
> }
> - WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
> + msk->first_pending = mptcp_send_next(sk);
> }
>
> out:
> @@ -1850,7 +1850,7 @@ static int mptcp_sendmsg(struct sock *sk
> get_page(dfrag->page);
> list_add_tail(&dfrag->list, &msk->rtx_queue);
> if (!msk->first_pending)
> - WRITE_ONCE(msk->first_pending, dfrag);
> + msk->first_pending = dfrag;
> }
> pr_debug("msk=%p dfrag at seq=%llu len=%u sent=%u new=%d\n", msk,
> dfrag->data_seq, dfrag->data_len, dfrag->already_sent,
> @@ -2645,7 +2645,7 @@ static void __mptcp_clear_xmit(struct so
> struct mptcp_sock *msk = mptcp_sk(sk);
> struct mptcp_data_frag *dtmp, *dfrag;
>
> - WRITE_ONCE(msk->first_pending, NULL);
> + msk->first_pending = NULL;
> list_for_each_entry_safe(dfrag, dtmp, &msk->rtx_queue, list)
> dfrag_clear(sk, dfrag);
> }
> @@ -3114,9 +3114,6 @@ void __mptcp_data_acked(struct sock *sk)
>
> void __mptcp_check_push(struct sock *sk, struct sock *ssk)
> {
> - if (!mptcp_send_head(sk))
> - return;
> -
> if (!sock_owned_by_user(sk)) {
> struct sock *xmit_ssk = mptcp_subflow_get_send(mptcp_sk(sk));
>
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -325,7 +325,7 @@ static inline struct mptcp_data_frag *mp
> {
> const struct mptcp_sock *msk = mptcp_sk(sk);
>
> - return READ_ONCE(msk->first_pending);
> + return msk->first_pending;
> }
>
> static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.