From: Kairui Song <kasong(a)tencent.com>
This reverts commit 78524b05f1a3e16a5d00cc9c6259c41a9d6003ce.
While reviewing recent leaf entry changes, I noticed that commit
78524b05f1a3 ("mm, swap: avoid redundant swap device pinning") isn't
correct. It's true that most all callers of __read_swap_cache_async are
already holding a swap entry reference, so the repeated swap device
pinning isn't needed on the same swap device, but it is possible that
VMA readahead (swap_vma_readahead()) may encounter swap entries from a
different swap device when there are multiple swap devices, and call
__read_swap_cache_async without holding a reference to that swap device.
So it is possible to cause a UAF if swapoff of device A raced with
swapin on device B, and VMA readahead tries to read swap entries from
device A. It's not easy to trigger but in theory possible to cause real
issues. And besides, that commit made swap more vulnerable to issues
like corrupted page tables.
Just revert it. __read_swap_cache_async isn't that sensitive to
performance after all, as it's mostly used for SSD/HDD swap devices with
readahead. SYNCHRONOUS_IO devices may fallback onto it for swap count >
1 entries, but very soon we will have a new helper and routine for
such devices, so they will never touch this helper or have redundant
swap device reference overhead.
Fixes: 78524b05f1a3 ("mm, swap: avoid redundant swap device pinning")
Signed-off-by: Kairui Song <kasong(a)tencent.com>
---
mm/swap_state.c | 14 ++++++--------
mm/zswap.c | 8 +-------
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3f85a1c4cfd9..0c25675de977 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -406,13 +406,17 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct mempolicy *mpol, pgoff_t ilx, bool *new_page_allocated,
bool skip_if_exists)
{
- struct swap_info_struct *si = __swap_entry_to_info(entry);
+ struct swap_info_struct *si;
struct folio *folio;
struct folio *new_folio = NULL;
struct folio *result = NULL;
void *shadow = NULL;
*new_page_allocated = false;
+ si = get_swap_device(entry);
+ if (!si)
+ return NULL;
+
for (;;) {
int err;
@@ -499,6 +503,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
put_swap_folio(new_folio, entry);
folio_unlock(new_folio);
put_and_return:
+ put_swap_device(si);
if (!(*new_page_allocated) && new_folio)
folio_put(new_folio);
return result;
@@ -518,16 +523,11 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
struct vm_area_struct *vma, unsigned long addr,
struct swap_iocb **plug)
{
- struct swap_info_struct *si;
bool page_allocated;
struct mempolicy *mpol;
pgoff_t ilx;
struct folio *folio;
- si = get_swap_device(entry);
- if (!si)
- return NULL;
-
mpol = get_vma_policy(vma, addr, 0, &ilx);
folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
&page_allocated, false);
@@ -535,8 +535,6 @@ struct folio *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
if (page_allocated)
swap_read_folio(folio, plug);
-
- put_swap_device(si);
return folio;
}
diff --git a/mm/zswap.c b/mm/zswap.c
index 5d0f8b13a958..aefe71fd160c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1005,18 +1005,12 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
struct folio *folio;
struct mempolicy *mpol;
bool folio_was_allocated;
- struct swap_info_struct *si;
int ret = 0;
/* try to allocate swap cache folio */
- si = get_swap_device(swpentry);
- if (!si)
- return -EEXIST;
-
mpol = get_task_policy(current);
folio = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
- NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
- put_swap_device(si);
+ NO_INTERLEAVE_INDEX, &folio_was_allocated, true);
if (!folio)
return -ENOMEM;
---
base-commit: 02dafa01ec9a00c3758c1c6478d82fe601f5f1ba
change-id: 20251109-revert-78524b05f1a3-04a1295bef8a
Best regards,
--
Kairui Song <kasong(a)tencent.com>
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 6.12.58 release.
> There are 565 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
[SNIP]
> Zizhi Wo <wozizhi(a)huaweicloud.com>
> tty/vt: Add missing return value for VT_RESIZE in vt_ioctl()
Locking seems to be messed up in backport of above mentioned patch.
That patch is viewable here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
Upstream uses guard() locking:
| case VT_RESIZE:
| {
| ....
| guard(console_lock)();
| ^^^^^^^^^^^^^^^^^^^^^-------this generates auto-unlock code
| ....
| ret = __vc_resize(vc_cons[i].d, cc, ll, true);
| if (ret)
| return ret;
| ^^^^^^^^^^----------this releases console lock
| ....
| break;
| }
Older stable branches use old-school locking:
| case VT_RESIZE:
| {
| ....
| console_lock();
| ....
| ret = __vc_resize(vc_cons[i].d, cc, ll, true);
| if (ret)
| return ret;
| ^^^^^^^^^^----------this does not release console lock
| ....
| console_unlock();
| break;
| }
Backporting upstream fixes that use guard() locking to older stable
branches that use old-school locking need "extra sports".
Please consider dropping or fixing above mentioned patch.
--
Jari Ruusu 4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD ACDF F073 3C80 8132 F189
The sockmap feature allows bpf syscall from userspace, or based on bpf
sockops, replacing the sk_prot of sockets during protocol stack processing
with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
subflow_syn_recv_sock()
tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
bpf_skops_established <== sockops
bpf_sock_map_update(sk) <== call bpf helper
tcp_bpf_update_proto() <== update sk_prot
'''
Consider two scenarios:
1. When the server has MPTCP enabled and the client also requests MPTCP,
the sk passed to the BPF program is a subflow sk. Since subflows only
handle partial data, replacing their sk_prot is meaningless and will
cause traffic disruption.
2. When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_ulp_fallback()
subflow_drop_ctx()
mptcp_subflow_ops_undo_override()
'''
Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
converts the subflow to plain TCP.
For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.
For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.
Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
net/mptcp/subflow.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8325890a322..af707ce0f624 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -2144,6 +2144,10 @@ void __init mptcp_subflow_init(void)
tcp_prot_override = tcp_prot;
tcp_prot_override.release_cb = tcp_release_cb_override;
tcp_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+ /* Disable sockmap processing for subflows */
+ tcp_prot_override.psock_update_sk_prot = NULL;
+#endif
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
/* In struct mptcp_subflow_request_sock, we assume the TCP request sock
@@ -2180,6 +2184,10 @@ void __init mptcp_subflow_init(void)
tcpv6_prot_override = tcpv6_prot;
tcpv6_prot_override.release_cb = tcp_release_cb_override;
tcpv6_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+ /* Disable sockmap processing for subflows */
+ tcpv6_prot_override.psock_update_sk_prot = NULL;
+#endif
#endif
mptcp_diag_subflow_init(&subflow_ulp_ops);
--
2.43.0
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
clang: error: linker command failed with exit code 1 (use -v to see invocation) in samples/seccomp/bpf-fancy (scripts/Makefile.host:116) [logspec:kbuild,kbuild.other]
---
- dashboard: https://d.kernelci.org/i/maestro:9b282409ffe9399386349927812ed439dcc91837
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 350bc296cce9fcac34ec525a838f99ac76e33550
Log excerpt:
=====================================================
.o
/usr/bin/ld: cannot find crtbeginS.o: No such file or directory
/usr/bin/ld: cannot find -lgcc: No such file or directory
/usr/bin/ld: cannot find -lgcc_s: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-17
- config: https://files.kernelci.org/kbuild-clang-17-i386-allmodconfig-69128f652fd237…
- dashboard: https://d.kernelci.org/build/maestro:69128f652fd2377ea99535c5
#kernelci issue maestro:9b282409ffe9399386349927812ed439dcc91837
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da18 ("netpoll: fix use after free") - a commit that is now
being reverted.
Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v10:
- Get rid of the create_and_enable_dynamic_target() (Simon)
- Link to v9: https://lore.kernel.org/r/20251106-netconsole_torture-v9-0-f73cd147c13c@deb…
Changes in v9:
- Reordered the config entries in tools/testing/selftests/drivers/net/bonding/config (NIPA)
- Link to v8: https://lore.kernel.org/r/20251104-netconsole_torture-v8-0-5288440e2fa0@deb…
Changes in v8:
- Sending it again, now that commit 1a8fed52f7be1 ("netdevsim: set the
carrier when the device goes up") has landed in net
- Created one namespace for TX and one for RX (Paolo)
- Used additional helpers to create and delete netdevsim (Paolo)
- Link to v7: https://lore.kernel.org/r/20251003-netconsole_torture-v7-0-aa92fcce62a9@deb…
Changes in v7:
- Rebased on top of `net`
- Link to v6: https://lore.kernel.org/r/20251002-netconsole_torture-v6-0-543bf52f6b46@deb…
Changes in v6:
- Expand the tests even more and some small fixups
- Moved the test to bonding selftests
- Link to v5: https://lore.kernel.org/r/20250918-netconsole_torture-v5-0-77e25e0a4eb6@deb…
Changes in v5:
- Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb…
Changes in v4:
- Added an additional selftest to test multiple netpoll users in
parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…
Changes in v3:
- This patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (4):
net: netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
selftest: netcons: add test for netconsole over bonded interfaces
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 1 +
.../testing/selftests/drivers/net/bonding/Makefile | 2 +
tools/testing/selftests/drivers/net/bonding/config | 4 +
.../drivers/net/bonding/netcons_over_bonding.sh | 361 +++++++++++++++++++++
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 78 ++++-
.../selftests/drivers/net/netcons_torture.sh | 130 ++++++++
7 files changed, 566 insertions(+), 17 deletions(-)
---
base-commit: 7d1988a943850c584e8e2e4bcc7a3b5275024072
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
When turbo mode is unavailable on a Skylake-X system, executing the
command:
"echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
results in an unchecked MSR access error: WRMSR to 0x199
(attempted to write 0x0000000100001300).
This issue was reproduced on an OEM (Original Equipment Manufacturer)
system and is not a common problem across all Skylake-X systems.
This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set
when turbo mode is disabled. The issue arises when intel_pstate fails to
detect that turbo mode is disabled. Here intel_pstate relies on
MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode.
However, on this system, bit 38 is not set even when turbo mode is
disabled.
According to the Intel Software Developer's Manual (SDM), the BIOS sets
this bit during platform initialization to enable or disable
opportunistic processor performance operations. Logically, this bit
should be set in such cases. However, the SDM also specifies that "OS and
applications must use CPUID leaf 06H to detect processors with
opportunistic processor performance operations enabled."
Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify
that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is
disabled.
Fixes: 4521e1a0ce17 ("cpufreq: intel_pstate: Reflect current no_turbo state correctly")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/cpufreq/intel_pstate.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f41ed0b9e610..ba9bf06f1c77 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -598,6 +598,9 @@ static bool turbo_is_disabled(void)
{
u64 misc_en;
+ if (!cpu_feature_enabled(X86_FEATURE_IDA))
+ return true;
+
rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
--
2.48.1