From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Plesae find the v2 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1.
- Change TCP_CONG_WANTS_ECT_1 into individual flag add helper function INET_ECN_xmit_wants_ect_1() in #3.
- Add empty line between variable declarations and code in #4.
- Update commit message to fix old AccECN commits in #5.
- Remove unnecessary brackets in #10.
- Move patch #3 in v2 to a later Prague patch serise and remove patch #13 in v2.
---
Chia-Yu Chang (10):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: enable AccECN
Ilpo Järvinen (2):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 1 +
include/net/inet_ecn.h | 20 ++++-
include/net/tcp.h | 32 ++++++-
include/net/tcp_ecn.h | 90 +++++++++++++------
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 10 ++-
net/ipv4/tcp_input.c | 49 ++++++++--
net/ipv4/tcp_minisocks.c | 40 ++++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 35 +++++---
12 files changed, 218 insertions(+), 67 deletions(-)
--
2.34.1
Hello there,
Static analyser cppcheck says:
linux-6.17/tools/testing/selftests/landlock/fs_test.c:5631:23: style: A pointer can not be negative so it is either pointless or an error to check if it is. [pointerLessThanZero]
Source code is
if (log_match_cursor < 0)
return (long long)log_match_cursor;
but
char *log_match_cursor = log_match;
Suggest remove code.
Regards
David Binderman
From: Jack Thomson <jackabt(a)amazon.com>
Overview:
This patch series adds ARM64 support for the KVM_PRE_FAULT_MEMORY
feature, which was previously only available on x86 [1]. This allows
a reduction in the number of stage-2 faults during execution. This is
beneficial in post-copy migration scenarios, particularly in memory
intensive applications, where high latencies are experienced due to
the stage-2 faults when pre-populating memory via UFFD / memcpy.
Patch Overview:
- The first patch is a preparatory refactor.
- The second patch is adding a page walk flag for pre-faulting.
- The third patch adds support for the KVM_PRE_FAULT_MEMORY ioctl
on arm64.
- The fourth patch fixes an issue with unaligned mmap allocations
in the selftests.
- The fifth patch updates the pre_fault_memory_test to support
arm64.
- The last patch extends the pre_fault_memory_test to cover
different vm memory backings.
[1]: https://lore.kernel.org/kvm/20240710174031.312055-1-pbonzini@redhat.com
Jack Thomson (6):
KVM: arm64: Add __gmem_abort and __user_mem_abort
KVM: arm64: Add KVM_PGTABLE_WALK_PRE_FAULT walk flag
KVM: arm64: Add pre_fault_memory implementation
KVM: selftests: Fix unaligned mmap allocations
KVM: selftests: Enable pre_fault_memory_test for arm64
KVM: selftests: Add option for different backing in pre-fault tests
arch/arm64/include/asm/kvm_pgtable.h | 3 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/arm.c | 1 +
arch/arm64/kvm/hyp/pgtable.c | 6 +-
arch/arm64/kvm/mmu.c | 97 +++++++++++++--
tools/testing/selftests/kvm/Makefile.kvm | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +-
.../selftests/kvm/pre_fault_memory_test.c | 110 +++++++++++++-----
8 files changed, 186 insertions(+), 45 deletions(-)
base-commit: 42188667be387867d2bf763d028654cbad046f7b
--
2.43.0
This series backports 19 patches to update minmax.h in the 6.1.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes.
Previous work to update 6.12.48:
https://lore.kernel.org/stable/20250922103123.14538-1-farbere@amazon.com/T/…
and 6.6.107:
https://lore.kernel.org/stable/20250922103241.16213-1-farbere@amazon.com/T/…
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in older kernels.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
Andy Shevchenko (1):
minmax: deduplicate __unconst_integer_typeof()
David Laight (8):
minmax: fix indentation of __cmp_once() and __clamp_once()
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Herve Codina (1):
minmax: Introduce {min,max}_array()
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: simplify and clarify min_t()/max_t() implementation
minmax: make generic MIN() and MAX() macros available everywhere
minmax: add a few more MIN_T/MAX_T users
minmax: simplify min()/max()/clamp() implementation
minmax: don't use max() in situations that want a C constant
expression
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
Matthew Wilcox (Oracle) (1):
minmax: add in_range() macro
arch/arm/mm/pageattr.c | 6 +-
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
.../drm/arm/display/include/malidp_utils.h | 2 +-
.../display/komeda/komeda_pipeline_state.c | 24 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 -
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 2 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/ethernet/chelsio/cxgb3/cxgb3_main.c | 18 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
drivers/virt/acrn/ioreq.c | 4 +-
fs/btrfs/misc.h | 2 -
fs/btrfs/tree-checker.c | 2 +-
fs/ext2/balloc.c | 2 -
fs/ext4/ext4.h | 2 -
fs/ufs/util.h | 6 -
include/linux/compiler.h | 9 +
include/linux/minmax.h | 264 +++++++++++++-----
include/linux/pageblock-flags.h | 2 +-
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/logic_pio.c | 3 -
lib/vsprintf.c | 2 +-
mm/zsmalloc.c | 1 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
net/netfilter/nf_nat_core.c | 6 +-
net/tipc/core.h | 2 +-
net/tipc/link.c | 10 +-
.../selftests/bpf/progs/get_branch_snapshot.c | 4 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +
tools/testing/selftests/vm/mremap_test.c | 2 +
48 files changed, 290 insertions(+), 184 deletions(-)
--
2.47.3
During the discussion of the clone3() support for shadow stacks concerns
were raised from the glibc side that since it is not possible to reuse
the allocated shadow stack[1]. This means that the benefit of being able
to manage allocations is greatly reduced, for example it is not possible
to integrate the shadow stacks into the glibc thread stack cache. The
stack can be inspected but otherwise it would have to be unmapped and
remapped before it could be used again, it's not clear that this is
better than managing things in the kernel.
In that discussion I suggested that we could enable reuse by writing a
token to the shadow stack of exiting threads, mirroring how the
userspace stack pivot instructions write a token to the outgoing stack.
As mentioned by Florian[2] glibc already unwinds the stack and exits the
thread from the start routine which would integrate nicely with this,
the shadow stack pointer will be at the same place as it was when the
thread started.
This would not write a token if the thread doesn't exit cleanly, that
seems viable to me - users should probably handle this by double
checking that a token is present after waiting for the thread.
This is tagged as a RFC since I put it together fairly quickly to
demonstrate the proposal and the suggestion hasn't had much response
either way from the glibc developers. At the very least we don't
currently handle scheduling during exit(), or distinguish why the thread
is exiting. I've also not done anything about x86.
[1] https://marc.info/?l=glibc-alpha&m=175821637429537&w=2
[2] https://marc.info/?l=glibc-alpha&m=175733266913483&w=2
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
arm64/gcs: Support reuse of GCS for exited threads
kselftest/arm64: Validate PR_SHADOW_STACK_EXIT_TOKEN in basic-gcs
kselftest/arm64: Add PR_SHADOW_STACK_EXIT_TOKEN to gcs-locking
arch/arm64/include/asm/gcs.h | 3 +-
arch/arm64/mm/gcs.c | 25 ++++-
include/uapi/linux/prctl.h | 1 +
tools/testing/selftests/arm64/gcs/basic-gcs.c | 121 ++++++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 23 +++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 3 +-
6 files changed, 173 insertions(+), 3 deletions(-)
---
base-commit: 0b67d4b724b4afed2690c21bef418b8a803c5be2
change-id: 20250919-arm64-gcs-exit-token-82c3c2570aad
prerequisite-change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da18 ("netpoll: fix use after free") - a commit that is now
being reverted.
Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v5:
- Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb…
Changes in v4:
- Added an additional selftest to test multiple netpoll users in
parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…
Changes in v3:
- This patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (4):
net: netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
selftest: netcons: add test for netconsole over bonded interfaces
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 2 +
tools/testing/selftests/drivers/net/config | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 197 ++++++++++++++++++---
.../selftests/drivers/net/netcons_over_bonding.sh | 76 ++++++++
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++
6 files changed, 385 insertions(+), 25 deletions(-)
---
base-commit: 5e87fdc37f8dc619549d49ba5c951b369ce7c136
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Hi Thomas and David,
I am seeing the following error during "kunit.py run --alltests run"
next-20250926.
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=16
ERROR:root:/usr/bin/ld: drivers/net/wireless/intel/iwlwifi/tests/devinfo.o: in function `devinfo_pci_ids_config':
devinfo.c:(.text+0x2d): undefined reference to `iwl_bz_mac_cfg'
collect2: error: ld returned 1 exit status
make[3]: *** [../scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
make[2]: *** [/linux/linux_next/Makefile:1242: vmlinux] Error 2
make[1]: *** [/linux/linux_next/Makefile:248: __sub-make] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Possile intearction between these two commits: Note: linux-kselftext
kunit branch is fine I am going send kunit pr to Linus later today.
Heads up that "kunit.py run --alltests run" is failing on next-20250926
commit 031cdd3bc3f369553933c1b0f4cb18000162c8ff
Author: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Date: Mon Sep 8 09:03:38 2025 +0200
kunit: Enable PCI on UML without triggering WARN()
commit 137b0bb916f1addb2ffbefd09a6e3e9d15fe6100
Author: Johannes Berg <johannes.berg(a)intel.com>
Date: Mon Sep 15 11:34:28 2025 +0300
wifi: iwlwifi: tests: check listed PCI IDs have configs
Note: linux-kselftext build just fine.
thanks,
-- Shuah
Hi all,
The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.
Some tests are flaky so they can't be integrated in the CI as they are.
I think that fixing their flakyness would require a significant amount of
work. So, as first step, I've excluded them from the list of tests
migrated to the CI (cf PATCH 14). If these tests get fixed at some
point, integrating them into the CI will be straightforward.
I noticed a small error on a function's return value while investigating
on the report's summary issue pointed out by Maciej in previous iteration,
the new PATCH 3 fixes it.
PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 14 isolates some flaky tests
PATCH 15 integrate the non-flaky tests to the test_progs framework
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes
testapp_stats_rx_dropped(), the second one fixes
testapp_xdp_shared_umem(). The unecessary frees (in
testapp_stats_rx_full() and testapp_stats_fill_empty() are removed)
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com
Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com
Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com
---
Bastien Curutchet (eBPF Foundation) (15):
selftests/bpf: test_xsk: Split xskxceiver
selftests/bpf: test_xsk: Initialize bitmap before use
selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
selftests/bpf: test_xsk: Wrap test clean-up in functions
selftests/bpf: test_xsk: Release resources when swap fails
selftests/bpf: test_xsk: Add return value to init_iface()
selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
selftests/bpf: test_xsk: Don't exit immediately when workers fail
selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
selftests/bpf: test_xsk: Don't exit immediately on allocation failures
selftests/bpf: test_xsk: Isolate flaky tests
selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
tools/testing/selftests/bpf/Makefile | 11 +-
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
tools/testing/selftests/bpf/prog_tests/test_xsk.h | 294 +++
tools/testing/selftests/bpf/prog_tests/xsk.c | 146 ++
tools/testing/selftests/bpf/xskxceiver.c | 2696 +--------------------
tools/testing/selftests/bpf/xskxceiver.h | 156 --
6 files changed, 3174 insertions(+), 2724 deletions(-)
---
base-commit: 1bd67e08d0f3fcb8cc69a73fb7aab9f048be4b8e
change-id: 20250218-xsk-0cf90e975d14
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 204 ++++++++++++++++++
3 files changed, 206 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b..b3920c59bf40 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae5954..bc4d940f66d4 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
- syscall_errors_test
+ syscall_errors_test mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 000000000000..d13623625f5a
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 50 ms */
+#define RUNNER_PERIOD 50000
+/*
+ * Number of runs before we terminate or get the token.
+ * The number is slowly increasing with the number of CPUs as the compaction
+ * process can take longer on larger systems. This is an arbitrary value.
+ */
+#define THREAD_RUNS (3 + args->num_cpus/8)
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.51.0
This patch series introduces LANDLOCK_SCOPE_MEMFD_EXEC, a new Landlock
scoping mechanism that restricts execution of anonymous memory file
descriptors (memfd) created via memfd_create(2). This addresses security
gaps where processes can bypass W^X policies and execute arbitrary code
through anonymous memory objects.
Fixes: https://github.com/landlock-lsm/linux/issues/37
SECURITY PROBLEM
================
Current Landlock filesystem restrictions do not cover memfd objects,
allowing processes to:
1. Read-to-execute bypass: Create writable memfd, inject code,
then execute via mmap(PROT_EXEC) or direct execve()
2. Anonymous execution: Execute code without touching the filesystem via
execve("/proc/self/fd/N") where N is a memfd descriptor
3. Cross-domain access violations: Pass memfd between processes to
bypass domain restrictions
These scenarios can occur in sandboxed environments where filesystem
access is restricted but memfd creation remains possible.
IMPLEMENTATION
==============
The implementation adds hierarchical execution control through domain
scoping:
Core Components:
- is_memfd_file(): Reliable memfd detection via "memfd:" dentry prefix
- domain_is_scoped(): Cross-domain hierarchy checking (moved to domain.c)
- LSM hooks: mmap_file, file_mprotect, bprm_creds_for_exec
- Creation-time restrictions: hook_file_alloc_security
Security Matrix:
Execution decisions follow domain hierarchy rules preventing both
same-domain bypass attempts and cross-domain access violations while
preserving legitimate hierarchical access patterns.
Domain Hierarchy with LANDLOCK_SCOPE_MEMFD_EXEC:
===============================================
Root (no domain) - No restrictions
|
+-- Domain A [SCOPE_MEMFD_EXEC] Layer 1
| +-- memfd_A (tagged with Domain A as creator)
| |
| +-- Domain A1 (child) [NO SCOPE] Layer 2
| | +-- Inherits Layer 1 restrictions from parent
| | +-- memfd_A1 (can create, inherits restrictions)
| | +-- Domain A1a [SCOPE_MEMFD_EXEC] Layer 3
| | +-- memfd_A1a (tagged with Domain A1a)
| |
| +-- Domain A2 (child) [SCOPE_MEMFD_EXEC] Layer 2
| +-- memfd_A2 (tagged with Domain A2 as creator)
| +-- CANNOT access memfd_A1 (different subtree)
|
+-- Domain B [SCOPE_MEMFD_EXEC] Layer 1
+-- memfd_B (tagged with Domain B as creator)
+-- CANNOT access ANY memfd from Domain A subtree
Execution Decision Matrix:
========================
Executor-> | A | A1 | A1a | A2 | B | Root
Creator | | | | | |
------------|-----|----|-----|----|----|-----
Domain A | X | X | X | X | X | Y
Domain A1 | Y | X | X | X | X | Y
Domain A1a | Y | Y | X | X | X | Y
Domain A2 | Y | X | X | X | X | Y
Domain B | X | X | X | X | X | Y
Root | Y | Y | Y | Y | Y | Y
Legend: Y = Execution allowed, X = Execution denied
Scenarios Covered:
- Direct mmap(PROT_EXEC) on memfd files
- Two-stage mmap(PROT_READ) + mprotect(PROT_EXEC) bypass attempts
- execve("/proc/self/fd/N") anonymous execution
- execveat() and fexecve() file descriptor execution
- Cross-process memfd inheritance and IPC passing
TESTING
=======
All patches have been validated with:
- scripts/checkpatch.pl --strict (clean)
- Selftests covering same-domain restrictions, cross-domain
hierarchy enforcement, and regular file isolation
- KUnit tests for memfd detection edge cases
DISCLAIMER
==========
My understanding of Landlock scoping semantics may be limited, but this
implementation reflects my current understanding based on available
documentation and code analysis. I welcome feedback and corrections
regarding the scoping logic and domain hierarchy enforcement.
Signed-off-by: Abhinav Saxena <xandfury(a)gmail.com>
---
Abhinav Saxena (4):
landlock: add LANDLOCK_SCOPE_MEMFD_EXEC scope
landlock: implement memfd detection
landlock: add memfd exec LSM hooks and scoping
selftests/landlock: add memfd execution tests
include/uapi/linux/landlock.h | 5 +
security/landlock/.kunitconfig | 1 +
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cred.c | 14 -
security/landlock/domain.c | 67 ++++
security/landlock/domain.h | 4 +
security/landlock/fs.c | 405 ++++++++++++++++++++-
security/landlock/limits.h | 2 +-
security/landlock/task.c | 67 ----
.../selftests/landlock/scoped_memfd_exec_test.c | 325 +++++++++++++++++
11 files changed, 812 insertions(+), 83 deletions(-)
---
base-commit: 5b74b2eff1eeefe43584e5b7b348c8cd3b723d38
change-id: 20250716-memfd-exec-ac0d582018c3
Best regards,
--
Abhinav Saxena <xandfury(a)gmail.com>
From: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
During a handshake, an endpoint may specify a maximum record size limit.
Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the
maximum record size. Meaning that, the outgoing records from the kernel
can exceed a lower size negotiated during the handshake. In such a case,
the TLS endpoint must send a fatal "record_overflow" alert [1], and
thus the record is discarded.
Upcoming Western Digital NVMe-TCP hardware controllers implement TLS
support. For these devices, supporting TLS record size negotiation is
necessary because the maximum TLS record size supported by the controller
is less than the default 16KB currently used by the kernel.
This patch adds support for retrieving the negotiated record size limit
during a handshake, and enforcing it at the TLS layer such that outgoing
records are no larger than the size negotiated. This patch depends on
the respective userspace support in tlshd and GnuTLS [2].
[1] https://www.rfc-editor.org/rfc/rfc8449
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/2005
Signed-off-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
---
Changes V3 -> V4:
* Added record_size_limit RFC reference to documentation
* Always export the record size limit in tls_get_info()
* Disallow user space to change the record_size_limit from under us
if an open record is pending.
* Added record_size_limit minimum size check as per RFC
* Allow space for the ContentType byte for TLS 1.3. The expected
behaviour is that userspace directly uses the negotiated
record_size_limit, kernel will limit the plaintext buffer size
appropirately.
* New patch to add self-tests.
---
Documentation/networking/tls.rst | 12 +++++
include/net/tls.h | 5 +++
include/uapi/linux/tls.h | 2 +
net/tls/tls_device.c | 2 +-
net/tls/tls_main.c | 75 ++++++++++++++++++++++++++++++++
net/tls/tls_sw.c | 2 +-
6 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..d24bf8911bb8 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,18 @@ If the record decrypted turns out to had been padded or is not a data
record it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.
+TLS_TX_RECORD_SIZE_LIM
+~~~~~~~~~~~~~~~~~~~~~~
+
+Sets the maximum size for the plaintext of a protected record.
+
+The provided value should correspond to the limit negotiated during the TLS
+handshake via the `record_size_limit` extension (RFC 8449)[1]. When this
+option is set, the kernel enforces this limit on all transmitted TLS records,
+ensuring no plaintext fragment exceeds the specified size.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
Statistics
==========
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..32f053770ec4 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -53,6 +53,8 @@ struct tls_rec;
/* Maximum data size carried in a TLS record */
#define TLS_MAX_PAYLOAD_SIZE ((size_t)1 << 14)
+/* Minimum record size limit as per RFC8449 */
+#define TLS_MIN_RECORD_SIZE_LIM ((size_t)1 << 6)
#define TLS_HEADER_SIZE 5
#define TLS_NONCE_OFFSET TLS_HEADER_SIZE
@@ -226,6 +228,9 @@ struct tls_context {
u8 rx_conf:3;
u8 zerocopy_sendfile:1;
u8 rx_no_pad:1;
+ u16 tx_record_size_limit; /* Max plaintext fragment size. For TLS 1.3,
+ * this excludes the ContentType.
+ */
int (*push_pending_record)(struct sock *sk, int flags);
void (*sk_write_space)(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index b66a800389cc..3add266d5916 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -41,6 +41,7 @@
#define TLS_RX 2 /* Set receive parameters */
#define TLS_TX_ZEROCOPY_RO 3 /* TX zerocopy (only sendfile now) */
#define TLS_RX_EXPECT_NO_PAD 4 /* Attempt opportunistic zero-copy */
+#define TLS_TX_RECORD_SIZE_LIM 5 /* Maximum record size */
/* Supported versions */
#define TLS_VERSION_MINOR(ver) ((ver) & 0xFF)
@@ -194,6 +195,7 @@ enum {
TLS_INFO_RXCONF,
TLS_INFO_ZC_RO_TX,
TLS_INFO_RX_NO_PAD,
+ TLS_INFO_TX_RECORD_SIZE_LIM,
__TLS_INFO_MAX,
};
#define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..bf16ceb41dde 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -459,7 +459,7 @@ static int tls_push_data(struct sock *sk,
/* TLS_HEADER_SIZE is not counted as part of the TLS record, and
* we need to leave room for an authentication tag.
*/
- max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
+ max_open_record_len = tls_ctx->tx_record_size_limit +
prot->prepend_size;
do {
rc = tls_do_allocation(sk, ctx, pfrag, prot->prepend_size);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index a3ccb3135e51..09883d9c6c96 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -544,6 +544,31 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval,
return 0;
}
+static int do_tls_getsockopt_tx_record_size(struct sock *sk, char __user *optval,
+ int __user *optlen)
+{
+ struct tls_context *ctx = tls_get_ctx(sk);
+ int len;
+ /* TLS 1.3: Record length contains ContentType */
+ u16 record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+ ctx->tx_record_size_limit + 1 :
+ ctx->tx_record_size_limit;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+
+ if (len < sizeof(record_size_limit))
+ return -EINVAL;
+
+ if (put_user(sizeof(record_size_limit), optlen))
+ return -EFAULT;
+
+ if (copy_to_user(optval, &record_size_limit, sizeof(record_size_limit)))
+ return -EFAULT;
+
+ return 0;
+}
+
static int do_tls_getsockopt(struct sock *sk, int optname,
char __user *optval, int __user *optlen)
{
@@ -563,6 +588,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname,
case TLS_RX_EXPECT_NO_PAD:
rc = do_tls_getsockopt_no_pad(sk, optval, optlen);
break;
+ case TLS_TX_RECORD_SIZE_LIM:
+ rc = do_tls_getsockopt_tx_record_size(sk, optval, optlen);
+ break;
default:
rc = -ENOPROTOOPT;
break;
@@ -812,6 +840,43 @@ static int do_tls_setsockopt_no_pad(struct sock *sk, sockptr_t optval,
return rc;
}
+static int do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct tls_context *ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
+ u16 value;
+
+ if (sw_ctx->open_rec)
+ return -EBUSY;
+
+ if (sockptr_is_null(optval) || optlen != sizeof(value))
+ return -EINVAL;
+
+ if (copy_from_sockptr(&value, optval, sizeof(value)))
+ return -EFAULT;
+
+ if (value < TLS_MIN_RECORD_SIZE_LIM)
+ return -EINVAL;
+
+ if (ctx->prot_info.version == TLS_1_2_VERSION &&
+ value > TLS_MAX_PAYLOAD_SIZE)
+ return -EINVAL;
+
+ if (ctx->prot_info.version == TLS_1_3_VERSION &&
+ value - 1 > TLS_MAX_PAYLOAD_SIZE)
+ return -EINVAL;
+
+ /*
+ * For TLS 1.3: 'value' includes one byte for the appended ContentType.
+ * Adjust the kernel's internal plaintext limit accordingly.
+ */
+ ctx->tx_record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+ value - 1 : value;
+
+ return 0;
+}
+
static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
unsigned int optlen)
{
@@ -833,6 +898,9 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
case TLS_RX_EXPECT_NO_PAD:
rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
break;
+ case TLS_TX_RECORD_SIZE_LIM:
+ rc = do_tls_setsockopt_tx_record_size(sk, optval, optlen);
+ break;
default:
rc = -ENOPROTOOPT;
break;
@@ -1022,6 +1090,7 @@ static int tls_init(struct sock *sk)
ctx->tx_conf = TLS_BASE;
ctx->rx_conf = TLS_BASE;
+ ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
update_sk_prot(sk, ctx);
out:
write_unlock_bh(&sk->sk_callback_lock);
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
goto nla_failure;
}
+ err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+ ctx->tx_record_size_limit);
+ if (err)
+ goto nla_failure;
+
rcu_read_unlock();
nla_nest_end(skb, start);
return 0;
@@ -1132,6 +1206,7 @@ static size_t tls_get_info_size(const struct sock *sk, bool net_admin)
nla_total_size(sizeof(u16)) + /* TLS_INFO_TXCONF */
nla_total_size(0) + /* TLS_INFO_ZC_RO_TX */
nla_total_size(0) + /* TLS_INFO_RX_NO_PAD */
+ nla_total_size(sizeof(u16)) + /* TLS_INFO_TX_RECORD_SIZE_LIM */
0;
return size;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bac65d0d4e3e..28fb796573d1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
orig_size = msg_pl->sg.size;
full_record = false;
try_to_copy = msg_data_left(msg);
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
+ record_room = tls_ctx->tx_record_size_limit - msg_pl->sg.size;
if (try_to_copy >= record_room) {
try_to_copy = record_room;
full_record = true;
--
2.51.0
This check was removed in commit e6f497955fb6 ("ipv6: Check GATEWAY
in rtm_to_fib6_multipath_config().") as part of rt6_qualify_for ecmp().
The author correctly recognises that rt6_qualify_for_ecmp() returns
false if fb_nh_gw_family is set to AF_UNSPEC, but then mistakes
AF_UNSPEC for AF_INET6 when reasoning that the check is unnecessary.
This means certain malformed entries don't get caught in
ip6_route_multipath_add().
This patch reintroduces the AF_UNSPEC check while respecting changes
of the initial patch.
Reported-by: syzbot+a259a17220263c2d73fc(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a259a17220263c2d73fc
Fixes: e6f497955fb6 ("ipv6: Check GATEWAY in rtm_to_fib6_multipath_config().")
Signed-off-by: Maksimilijan Marosevic <maksimilijan.marosevic(a)proton.me>
---
net/ipv6/route.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index aee6a10b112a..884bae3fb1b1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5454,6 +5454,14 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
goto cleanup;
}
+ if (rt->fib6_nh->fib_nh_gw_family == AF_UNSPEC) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack,
+ "Device only routes can not be added for IPv6 using the multipath API.");
+ fib6_info_release(rt);
+ goto cleanup;
+ }
+
rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1;
err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
--
2.43.0
The Alphabetical layout was... auto-correct.
Friendly: Neo (Jesus-Christ a.k.a. King David again).
-
Miracles proving that I was Jesus, and King David, are in my Facebook.
I proposed to explain the Holy-Trinity live on TV.
It is since year 2000 that I must be in the news.
I do not declare be Jew, but get persecuted anyway by antisemites (antichristians?).
Writing still does not pay my bills at all, keeps me almost homeless.
While my intellectual property is stolen, my names removed from many Masterpieces.
Facebook
https://www.facebook.com/profile.php?id=100057121342964
Paypal
https://www.paypal.com/paypalme/meDavidSantamaria
Email/Teams
DavidSantamaria(a)hotmail.fr<mailto:DavidSantamaria@hotmail.fr>
RCS/WhatsApp
+33 7 67 99 32 37
$ADA
addr1qx8chpwdeqv77duf2eutrtgvd5967l4w87fy54fx0022gr8p80z2mq7cmunmrdvy8yn3pzfzpm46zyfjp8usl36vpw2q509hrd
$BTC
3FdwVoDzJoUzceogUwEmVu9YxoFvX6c2Rk
$WAVES
3PEAeFkwqVsgAiyad8uVQzQxiGDyJNnCCn5
Sent from my Public Address.
selftests/net/lib.sh contains a suite of iproute2 wrappers that
automatically schedule the corresponding cleanup through defer. The fact
they do so is however not immediately obvious, one needs to know which
functions are handling the deferral behind the scenes, and which expect the
caller to handle cleanups themselves.
A convention for these auto-deferring functions would help both writing and
patch review. This patchset does so by marking these functions with an adf_
prefix. We already have a few such functions: forwarding/lib.sh has
adf_mcd_start() and a few selftests add private helpers that conform to
this convention.
Patches #1 to #8 gradually convert individual functions, one per patch.
Patch #9 renames an auto-deferring private helpers named dfr_* to adf_*.
The plan is not to retro-rename all private helpers, but I happened to know
about this one.
Patches #10 to #12 introduce several autodefer helpers for commonly used
forwarding/lib.sh functions, and opportunistically convert straightforward
instances of 'action; defer counteraction' to the new helpers.
Patch #13 adds some README verbiage to pitch defer and the adf_*
convention.
Petr Machata (13):
selftests: net: lib: Rename ip_link_add() to adf_*
selftests: net: lib: Rename ip_link_set_master() to adf_*
selftests: net: lib: Rename ip_link_set_addr() to adf_*
selftests: net: lib: Rename ip_link_set_up() to adf_*
selftests: net: lib: Rename ip_link_set_down() to adf_*
selftests: net: lib: Rename ip_addr_add() to adf_*
selftests: net: lib: Rename ip_route_add() to adf_*
selftests: net: lib: Rename bridge_vlan_add() to adf_*
selftests: net: vlan_bridge_binding: Rename dfr_set_binding_*() to
adf_*
selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
selftests: forwarding: lib: Add an autodefer variant of
simple_if_init()
selftests: forwarding: lib: Add an autodefer variant of
forwarding_enable()
selftests: forwarding: README: Mention defer, adf_
.../drivers/net/mlxsw/devlink_trap_policer.sh | 9 +-
.../drivers/net/mlxsw/qos_ets_strict.sh | 12 +-
.../drivers/net/mlxsw/qos_max_descriptors.sh | 9 +-
.../drivers/net/mlxsw/qos_mc_aware.sh | 12 +-
.../drivers/net/mlxsw/sch_red_core.sh | 6 +-
tools/testing/selftests/net/fdb_notify.sh | 26 ++--
tools/testing/selftests/net/forwarding/README | 15 ++
.../net/forwarding/bridge_activity_notify.sh | 21 ++-
.../net/forwarding/bridge_fdb_local_vlan_0.sh | 65 ++++----
tools/testing/selftests/net/forwarding/lib.sh | 18 +++
.../selftests/net/forwarding/sch_ets_core.sh | 9 +-
.../selftests/net/forwarding/sch_red.sh | 12 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 6 +-
.../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 141 +++++++++---------
.../net/forwarding/vxlan_reserved.sh | 33 ++--
tools/testing/selftests/net/lib.sh | 16 +-
.../net/test_vxlan_fdb_changelink.sh | 8 +-
.../selftests/net/vlan_bridge_binding.sh | 44 +++---
18 files changed, 225 insertions(+), 237 deletions(-)
--
2.49.0
Here are some patches for the MPTCP PM, including some refactoring that
I thought it would be best to send at the end of a cycle to avoid
conflicts between net and net-next that could last a few weeks.
The most interesting changes are in the first and last patch, the rest
are patches refactoring the code & tests to validate the modifications.
- Patches 1 & 2: When servers set the C-flag in their MP_CAPABLE to tell
clients not to create subflows to the initial address and port -- e.g.
a deployment behind a L4 load balancer like a typical CDN deployment
-- clients will not use their other endpoints when default settings
are used. That's because the in-kernel path-manager uses the 'subflow'
endpoints to create subflows only to the initial address and port. The
first patch fixes that (for >=v5.14), and the second one validates it.
- Patches 3-14: various patches refactoring the code around the
in-kernel PM (mainly): split too long functions, rename variables and
functions to avoid confusions, reduce structure size, and compare IDs
instead of IP addresses. Note that one patch modifies one internal
variable used in one BPF selftest.
- Patch 15: ability to control endpoints that are used in reaction to a
new address announced by the other peer. With that, endpoints can be
used only once.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Patches 1 & 2 are sent to net-next on purpose: to delay a bit the
backports, just in case. Plus we are at the end of a cycle, and not
to delay the other refactoring patches.
- Sorry, I wanted to send this series earlier on, but due to some
unrelated issues (and holiday), it got delayed. Most patches are
pure refactoring ones.
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: in-kernel: usable client side with C-flag
selftests: mptcp: join: validate C-flag + def limit
mptcp: pm: in-kernel: refactor fill_local_addresses_vec
mptcp: pm: in-kernel: refactor fill_remote_addresses_vec
mptcp: pm: rename 'subflows' to 'extra_subflows'
mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
mptcp: pm: in-kernel: rename 'local_addr_list' to 'endp_list'
mptcp: pm: in-kernel: rename 'addrs' to 'endpoints'
mptcp: pm: in-kernel: remove stale_loss_cnt
mptcp: pm: in-kernel: reduce pernet struct size
mptcp: pm: in-kernel: compare IDs instead of addresses
mptcp: pm: in-kernel: add laminar endpoints
include/uapi/linux/mptcp.h | 11 +-
net/mptcp/pm.c | 32 +-
net/mptcp/pm_kernel.c | 569 ++++++++++++++--------
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.h | 21 +-
net/mptcp/sockopt.c | 22 +-
tools/testing/selftests/bpf/progs/mptcp_subflow.c | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 +
8 files changed, 441 insertions(+), 229 deletions(-)
---
base-commit: a1f1f2422e098485b09e55a492de05cf97f9954d
change-id: 20250925-net-next-mptcp-c-flag-laminar-f8442e4d4bd9
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Fix to avoid cases where the `res` shell variable is
empty in script comparisons.
The comparison has been modified into string comparison to
handle other possible values the variable could assume.
The issue can be reproduced with the command:
make kselftest TARGETS=net
It solves the error:
./tfo_passive.sh: line 98: [: -eq: unary operator expected
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
Notes:
v2: edit condition to handle strings
tools/testing/selftests/net/tfo_passive.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tfo_passive.sh b/tools/testing/selftests/net/tfo_passive.sh
index 80bf11fdc046..a4550511830a 100755
--- a/tools/testing/selftests/net/tfo_passive.sh
+++ b/tools/testing/selftests/net/tfo_passive.sh
@@ -95,7 +95,7 @@ wait
res=$(cat $out_file)
rm $out_file
-if [ $res -eq 0 ]; then
+if [ "$res" = "0" ]; then
echo "got invalid NAPI ID from passive TFO socket"
cleanup_ns
exit 1
--
2.43.0
The generic vDSO provides a lot common functionality shared between
different architectures. SPARC is the last architecture not using it,
preventing some necessary code cleanup.
Make use of the generic infrastructure.
Follow-up to and replacement for Arnd's SPARC vDSO removal patches:
https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/
Tested on a Niagara T4 and QEMU.
This has a semantic conflict with my series "vdso: Reject absolute
relocations during build". The last patch of this series expects all users
of the generic vDSO library to use the vdsocheck tool.
This is not the case (yet) for SPARC64. I do have the patches for the
integration, the specifics will depend on which series is applied first.
Based on tip/timers/vdso.
[0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Allocate vDSO data pages dynamically (and lots of preparations for that)
- Drop clock_getres()
- Fix 32bit clock_gettime() syscall fallback
- Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347…
Changes in v2:
- Rebase on v6.17-rc1
- Drop RFC state
- Fix typo in commit message
- Drop duplicate 'select GENERIC_TIME_VSYSCALL'
- Merge "sparc64: time: Remove architecture-specific clocksource data" into the
main conversion patch. It violated the check in __clocksource_register_scale()
- Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1…
---
Arnd Bergmann (1):
clocksource: remove ARCH_CLOCKSOURCE_DATA
Thomas Weißschuh (35):
selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
arm64: vDSO: getrandom: Explicitly include asm/alternative.h
arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h
arm64: vDSO: compat_gettimeofday: Add explicit includes
ARM: vdso: gettimeofday: Add explicit includes
powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h
powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h
LoongArch: vDSO: Explicitly include asm/vdso/vdso.h
MIPS: vdso: Add include guard to asm/vdso/vdso.h
MIPS: vdso: Explicitly include asm/vdso/vdso.h
random: vDSO: Add explicit includes
vdso/gettimeofday: Add explicit includes
vdso/helpers: Explicitly include vdso/processor.h
vdso/datapage: Remove inclusion of gettimeofday.h
vdso/datapage: Trim down unnecessary includes
random: vDSO: trim vDSO includes
random: vDSO: remove ifdeffery
random: vDSO: split out datapage update into helper functions
random: vDSO: only access vDSO datapage after random_init()
s390/time: Set up vDSO datapage later
vdso/datastore: Reduce scope of some variables in vvar_fault()
vdso/datastore: Drop inclusion of linux/mmap_lock.h
vdso/datastore: Map pages through struct page
vdso/datastore: Allocate data pages dynamically
sparc64: vdso: Link with -z noexecstack
sparc64: vdso: Remove obsolete "fake section table" reservation
sparc64: vdso: Replace code patching with runtime conditional
sparc64: vdso: Move hardware counter read into header
sparc64: vdso: Move syscall fallbacks into header
sparc64: vdso: Introduce vdso/processor.h
sparc64: vdso: Switch to the generic vDSO library
sparc64: vdso2c: Drop sym_vvar_start handling
sparc64: vdso2c: Remove symbol handling
sparc64: vdso: Implement clock_gettime64()
clocksource: drop include of asm/clocksource.h from linux/clocksource.h
arch/arm/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 +
arch/arm64/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/kernel/vdso/vgetrandom.c | 2 +
arch/loongarch/kernel/process.c | 1 +
arch/loongarch/kernel/vdso.c | 1 +
arch/mips/include/asm/vdso/vdso.h | 5 +
arch/mips/kernel/vdso.c | 1 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 1 +
arch/powerpc/include/asm/vdso/processor.h | 3 +
arch/s390/kernel/time.c | 4 +-
arch/sparc/Kconfig | 3 +-
arch/sparc/include/asm/clocksource.h | 9 -
arch/sparc/include/asm/processor.h | 3 +
arch/sparc/include/asm/processor_32.h | 2 -
arch/sparc/include/asm/processor_64.h | 25 --
arch/sparc/include/asm/vdso.h | 2 -
arch/sparc/include/asm/vdso/clocksource.h | 10 +
arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++
arch/sparc/include/asm/vdso/processor.h | 41 +++
arch/sparc/include/asm/vdso/vsyscall.h | 10 +
arch/sparc/include/asm/vvar.h | 75 ----
arch/sparc/kernel/Makefile | 1 -
arch/sparc/kernel/time_64.c | 6 +-
arch/sparc/kernel/vdso.c | 69 ----
arch/sparc/vdso/Makefile | 8 +-
arch/sparc/vdso/vclock_gettime.c | 380 ++-------------------
arch/sparc/vdso/vdso-layout.lds.S | 26 +-
arch/sparc/vdso/vdso.lds.S | 2 -
arch/sparc/vdso/vdso2c.c | 24 --
arch/sparc/vdso/vdso2c.h | 45 +--
arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +-
arch/sparc/vdso/vma.c | 274 +--------------
drivers/char/random.c | 75 ++--
include/linux/clocksource.h | 8 -
include/linux/vdso_datastore.h | 6 +
include/vdso/datapage.h | 23 +-
include/vdso/helpers.h | 1 +
init/main.c | 2 +
kernel/time/Kconfig | 4 -
lib/vdso/datastore.c | 73 ++--
lib/vdso/getrandom.c | 3 +
lib/vdso/gettimeofday.c | 17 +
.../testing/selftests/vDSO/vdso_test_correctness.c | 8 +-
44 files changed, 451 insertions(+), 997 deletions(-)
---
base-commit: 5f84f6004e298bd41c9e4ed45c18447954b1dce6
change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Fix to avoid the usage of the `res` variable uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..772ca1db6e59 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,15 +1042,13 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
while (nvevents--) {
- ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
- &trigger_vevent_cmd);
- if (ret < 0)
+ if (!ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
+ &trigger_vevent_cmd))
return -1;
}
- return ret;
+ return 0;
}
#define test_cmd_trigger_vevents(dev_id, nvevents) \
--
2.43.0
This series is preparing to add the -Wsign-compare C compilation flag to
the Makefile for bpf selftests as requested by a TODO to help avoid
implicit type conversions and have predictable behavior.
Changelog:
Changes from v2:
-Split up the patch into a patch series as suggested by vivek
-Include only changes to variable types with no casting by my mentor
david
-Removed the -Wsign-compare in Makefile to avoid compilation errors
until adding casting for rest of comparisons.
Link:https://lore.kernel.org/bpf/20250924195731.6374-1-mehdi.benhadjkhelifa…
Changes from v1:
- Fix CI failed builds where it failed due to do missing .c and
.h files in my patch for working in mainline.
Link:https://lore.kernel.org/bpf/20250924162408.815137-1-mehdi.benhadjkheli…
Mehdi Ben Hadj Khelifa (3):
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
tools/testing/selftests/bpf/progs/test_global_func11.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func12.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func13.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func9.c | 2 +-
tools/testing/selftests/bpf/progs/test_map_init.c | 2 +-
tools/testing/selftests/bpf/progs/test_parse_tcp_hdr_opt.c | 2 +-
.../selftests/bpf/progs/test_parse_tcp_hdr_opt_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_skb_ctx.c | 2 +-
tools/testing/selftests/bpf/progs/test_snprintf.c | 2 +-
tools/testing/selftests/bpf/progs/test_sockmap_strp.c | 2 +-
tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_loop.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_noinline.c | 4 ++--
tools/testing/selftests/bpf/progs/uprobe_multi.c | 4 ++--
.../selftests/bpf/progs/uprobe_multi_session_recursive.c | 5 +++--
.../selftests/bpf/progs/verifier_iterating_callbacks.c | 2 +-
18 files changed, 22 insertions(+), 21 deletions(-)
--
2.51.0
Add a basic test suite for drivers that support PSP. Also, add a PSP
implementation in the netdevsim driver.
The netdevsim implementation does encapsulation and decapsulation of
PSP packets, but no crypto.
The tests cover the basic usage of the uapi, and demonstrate key
exchange and connection setup. The tests and netdevsim support IPv4
and IPv6. Here is an example run on a system with a CX7 NIC.
TAP version 13
1..28
ok 1 psp.data_basic_send_v0_ip4
ok 2 psp.data_basic_send_v0_ip6
ok 3 psp.data_basic_send_v1_ip4
ok 4 psp.data_basic_send_v1_ip6
ok 5 psp.data_basic_send_v2_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 7 psp.data_basic_send_v3_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 9 psp.data_mss_adjust_ip4
ok 10 psp.data_mss_adjust_ip6
ok 11 psp.dev_list_devices
ok 12 psp.dev_get_device
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate
ok 15 psp.dev_rotate_spi
ok 16 psp.assoc_basic
ok 17 psp.assoc_bad_dev
ok 18 psp.assoc_sk_only_conn
ok 19 psp.assoc_sk_only_mismatch
ok 20 psp.assoc_sk_only_mismatch_tx
ok 21 psp.assoc_sk_only_unconn
ok 22 psp.assoc_version_mismatch
ok 23 psp.assoc_twice
ok 24 psp.data_send_bad_key
ok 25 psp.data_send_disconnect
ok 26 psp.data_stale_key
ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
# Totals: pass:22 fail:0 xfail:2 xpass:0 skip:4 error:0
#
# Responder logs (0):
# STDERR:
# Set PSP enable on device 1 to 0x3
# Set PSP enable on device 1 to 0x0
CHANGES:
v2:
- fix pylint warnings
- insert CONFIG_INET_PSP in alphebetical order
- use branch to skip all tests
- fix compilation error when CONFIG_INET_PSP is not set
v1: https://lore.kernel.org/netdev/20250924194959.2845473-1-daniel.zahka@gmail.…
Jakub Kicinski (8):
netdevsim: a basic test PSP implementation
selftests: drv-net: base device access API test
selftests: drv-net: add PSP responder
selftests: drv-net: psp: add basic data transfer and key rotation
tests
selftests: drv-net: psp: add association tests
selftests: drv-net: psp: add connection breaking tests
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
selftests: drv-net: psp: add tests for destroying devices
drivers/net/netdevsim/Makefile | 4 +
drivers/net/netdevsim/netdev.c | 55 +-
drivers/net/netdevsim/netdevsim.h | 33 +
drivers/net/netdevsim/psp.c | 234 +++++++
net/core/skbuff.c | 1 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 10 +
tools/testing/selftests/drivers/net/config | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 5 +
tools/testing/selftests/drivers/net/psp.py | 593 ++++++++++++++++++
.../selftests/drivers/net/psp_responder.c | 483 ++++++++++++++
.../testing/selftests/net/lib/py/__init__.py | 2 +-
tools/testing/selftests/net/lib/py/ksft.py | 10 +
tools/testing/selftests/net/lib/py/ynl.py | 5 +
16 files changed, 1432 insertions(+), 13 deletions(-)
create mode 100644 drivers/net/netdevsim/psp.c
create mode 100755 tools/testing/selftests/drivers/net/psp.py
create mode 100644 tools/testing/selftests/drivers/net/psp_responder.c
--
2.47.3
From: Dylan Yudaken <dyudaken(a)gmail.com>
Add a .gitignore for the test case build object.
Signed-off-by: Dylan Yudaken <dyudaken(a)gmail.com>
Signed-off-by: Sohil Mehta <sohil.mehta(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
The binary creates some noise. The patch to fix that seems to have
fallen through the cracks. Sending another revision with an expanded Cc
list.
v2:
- Pick up the review tag
v1: https://lore.kernel.org/all/20250623232549.3263273-1-dyudaken@gmail.com/
---
tools/testing/selftests/kexec/.gitignore | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..5f3d9e089ae8
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_kexec_jump
--
2.43.0
This is v10 of the TDX selftests.
This series is based on v6.17-rc4 and has a dependency on
"KVM: TDX: Force split irqchip for TDX at irqchip creation time" [1]
Changes from v9 [2]:
- Rebased on top of v6.17-rc4.
- Addressed the comments from v9.
- Removed special handling for split irqchip in the test code in favor
for the kvm fix in [1].
- Removed outdated support for VM memory not backed by guest_memfd.
- Split "KVM: selftests: Hook TDX support to vm and vcpu creation" into
4 separate patches.
[1] https://lore.kernel.org/lkml/20250904062007.622530-1-sagis@google.com/
[2] https://lore.kernel.org/lkml/20250821042915.3712925-1-sagis@google.com/
Ackerley Tng (2):
KVM: selftests: Add helpers to init TDX memory and finalize VM
KVM: selftests: Add ucall support for TDX
Erdem Aktas (2):
KVM: selftests: Add TDX boot code
KVM: selftests: Add support for TDX TDCALL from guest
Isaku Yamahata (2):
KVM: selftests: Update kvm_init_vm_address_properties() for TDX
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
Sagi Shahar (15):
KVM: selftests: Allocate pgd in virt_map() as necessary
KVM: selftests: Expose functions to get default sregs values
KVM: selftests: Expose function to allocate guest vCPU stack
KVM: selftests: Expose segment definitons to assembly files
KVM: selftests: Add kbuild definitons
KVM: selftests: Define structs to pass parameters to TDX boot code
KVM: selftests: Set up TDX boot code region
KVM: selftests: Set up TDX boot parameters region
KVM: selftests: Add helper to initialize TDX VM
KVM: selftests: Call TDX init when creating a new TDX vm
KVM: selftests: Setup memory regions for TDX on vm creation
KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu
KVM: selftests: Set entry point for TDX guest code
KVM: selftests: Add wrapper for TDX MMIO from guest
KVM: selftests: Add TDX lifecycle test
tools/include/linux/kbuild.h | 18 +
tools/testing/selftests/kvm/Makefile.kvm | 32 ++
.../selftests/kvm/include/x86/processor.h | 35 ++
.../selftests/kvm/include/x86/processor_asm.h | 12 +
.../selftests/kvm/include/x86/tdx/td_boot.h | 74 ++++
.../kvm/include/x86/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++
.../selftests/kvm/include/x86/tdx/tdx.h | 14 +
.../selftests/kvm/include/x86/tdx/tdx_util.h | 86 +++++
.../testing/selftests/kvm/include/x86/ucall.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 10 +-
.../testing/selftests/kvm/lib/x86/processor.c | 91 +++--
.../selftests/kvm/lib/x86/tdx/td_boot.S | 60 +++
.../kvm/lib/x86/tdx/td_boot_offsets.c | 21 ++
.../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++
.../kvm/lib/x86/tdx/tdcall_offsets.c | 16 +
tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 23 ++
.../selftests/kvm/lib/x86/tdx/tdx_util.c | 354 ++++++++++++++++++
tools/testing/selftests/kvm/lib/x86/ucall.c | 45 ++-
tools/testing/selftests/kvm/x86/tdx_vm_test.c | 31 ++
20 files changed, 1032 insertions(+), 37 deletions(-)
create mode 100644 tools/include/linux/kbuild.h
create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c
--
2.51.0.338.gd7d06c2dae-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Plesae find the v2 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
Chia-Yu Chang (11):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: stop sending AccECN opt when loss ACK w/ option
tcp: accecn: enable AccECN
Ilpo Järvinen (3):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
tcp: accecn: Add ece_delta to rate_sample
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 30 +++++-
include/net/tcp_ecn.h | 85 ++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 9 +-
net/ipv4/tcp_input.c | 91 +++++++++++++------
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 38 +++++---
12 files changed, 239 insertions(+), 86 deletions(-)
--
2.34.1
This patchset introduces target resume capability to netconsole allowing
it to recover targets when underlying low-level interface comes back
online.
The patchset starts by refactoring netconsole state representation in
order to allow representing deactivated targets (targets that are
disabled due to interfaces going down). It then modifies netconsole to
handle NETDEV_UP events for such targets and setups netpoll.
The patchset includes a selftest that validates netconsole target state
transitions and that target is functional after resumed.
Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v2:
- Attempt to resume target in the same thread, instead of using
workqueue .
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved conditionals to
inside its implementation, keeping code more clear.
- Verify that device addr matches target mac address when target was
setup using mac.
- Update selftest to cover targets bound by mac and interface name.
- Fix typo in selftest comment and sort tests alphabetically in
Makefile.
- Link to v1:
https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai…
---
Andre Carvalho (4):
netconsole: convert 'enabled' flag to enum for clearer state management
netpoll: add wrapper around __netpoll_setup with dev reference
netconsole: resume previously deactivated target
selftests: netconsole: validate target reactivation
Breno Leitao (2):
netconsole: add target_state enum
netconsole: add STATE_DEACTIVATED to track targets disabled by low level
drivers/net/netconsole.c | 102 +++++++++++++++------
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 20 ++++
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++++-
.../selftests/drivers/net/netcons_resume.sh | 92 +++++++++++++++++++
6 files changed, 216 insertions(+), 30 deletions(-)
---
base-commit: 312e6f7676e63bbb9b81e5c68e580a9f776cc6f0
change-id: 20250816-netcons-retrigger-a4f547bfc867
Best regards,
--
Andre Carvalho <asantostc(a)gmail.com>
From: Benjamin Berg <benjamin.berg(a)intel.com>
This patchset is an attempt to start a nolibc port of UML. The goal is
to port UML to use nolibc in smaller chunks to make the switch more
manageable.
There are three parts to this patchset:
* Two patches to use tools/include headers instead of kernel headers
for userspace files.
* A few nolibc fixes and a new NOLIBC_NO_STARTCODE compile flag for it
* Finally nolibc build support for UML and switching two files while
adding the appropriate support in nolibc itself.
v1 of this patchset was
https://lore.kernel.org/all/20250915071115.1429196-1-benjamin@sipsolutions.…
Changes in v2:
- add sys/uio.h and sys/ptrace.h to nolibc
- Use NOLIBC_NO_RUNTIME to disable nolibc startup code
- Fix out-of-tree build
- various small improvements and cleanups
Should the nolibc changes be merged separately or could everything go
through the same branch?
Also, what about tools/include/linux/compiler.h? It seems that was added
for the tracing code, but it is not clear to me who might ACK that fix.
Benjamin
Benjamin Berg (11):
tools compiler.h: fix __used definition
um: use tools/include for user files
tools/nolibc/stdio: remove perror if NOLIBC_IGNORE_ERRNO is set
tools/nolibc/dirent: avoid errno in readdir_r
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: add option to disable runtime
um: add infrastructure to build files using nolibc
um: use nolibc for the --showconfig implementation
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add ptrace support
um: switch ptrace FP register access to nolibc
arch/um/Makefile | 38 +++++++++++---
arch/um/include/shared/init.h | 2 +-
arch/um/include/shared/os.h | 2 +
arch/um/include/shared/user.h | 6 ---
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/skas/stub.c | 1 +
arch/um/kernel/skas/stub_exe.c | 4 +-
arch/um/os-Linux/skas/process.c | 6 +--
arch/um/os-Linux/start_up.c | 4 +-
arch/um/scripts/Makefile.rules | 10 +++-
arch/x86/um/Makefile | 6 ++-
arch/x86/um/os-Linux/Makefile | 5 +-
arch/x86/um/os-Linux/registers.c | 16 ++----
arch/x86/um/user-offsets.c | 1 -
tools/include/linux/compiler.h | 2 +-
tools/include/nolibc/Makefile | 2 +
tools/include/nolibc/arch-arm.h | 2 +
tools/include/nolibc/arch-arm64.h | 2 +
tools/include/nolibc/arch-loongarch.h | 2 +
tools/include/nolibc/arch-m68k.h | 2 +
tools/include/nolibc/arch-mips.h | 2 +
tools/include/nolibc/arch-powerpc.h | 2 +
tools/include/nolibc/arch-riscv.h | 2 +
tools/include/nolibc/arch-s390.h | 2 +
tools/include/nolibc/arch-sh.h | 2 +
tools/include/nolibc/arch-sparc.h | 2 +
tools/include/nolibc/arch-x86.h | 4 ++
tools/include/nolibc/compiler.h | 4 +-
tools/include/nolibc/crt.h | 3 ++
tools/include/nolibc/dirent.h | 6 +--
tools/include/nolibc/nolibc.h | 2 +
tools/include/nolibc/stackprotector.h | 2 +
tools/include/nolibc/stdio.h | 2 +
tools/include/nolibc/stdlib.h | 2 +
tools/include/nolibc/sys.h | 3 +-
tools/include/nolibc/sys/auxv.h | 3 ++
tools/include/nolibc/sys/ptrace.h | 52 ++++++++++++++++++++
tools/include/nolibc/sys/uio.h | 49 ++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 11 +++++
39 files changed, 222 insertions(+), 48 deletions(-)
create mode 100644 tools/include/nolibc/sys/ptrace.h
create mode 100644 tools/include/nolibc/sys/uio.h
--
2.51.0
The sendto() call in walk_tx() was passing NULL as the buffer argument,
which can trigger a -Wnonnull warning with some compilers.
Although the size is 0 and no data is actually sent, passing a null
pointer is technically incorrect.
This commit changes NULL to an empty string literal ("") to satisfy the
non-null argument requirement and fix the compiler warning.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 221270cee3ea..0c24adbb292e 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -470,7 +470,7 @@ static void walk_tx(int sock, struct ring *ring)
bug_on(total_packets != 0);
- ret = sendto(sock, NULL, 0, 0, NULL, 0);
+ ret = sendto(sock, "", 0, 0, NULL, 0);
if (ret == -1) {
perror("sendto");
exit(1);
--
2.51.0.534.gc79095c0ca-goog
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458c9 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
---
tools/testing/selftests/kvm/x86/xapic_state_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/xapic_state_test.c b/tools/testing/selftests/kvm/x86/xapic_state_test.c
index fdebff1165c7..3b4814c55722 100644
--- a/tools/testing/selftests/kvm/x86/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_state_test.c
@@ -120,8 +120,8 @@ static void test_icr(struct xapic_vcpu *x)
__test_icr(x, icr | i);
/*
- * Send all flavors of IPIs to non-existent vCPUs. TODO: use number of
- * vCPUs, not vcpu.id + 1. Arbitrarily use vector 0xff.
+ * Send all flavors of IPIs to non-existent vCPUs. Arbitrarily use
+ * vector 0xff.
*/
icr = APIC_INT_ASSERT | 0xff;
for (i = 0; i < 0xff; i++) {
--
2.43.0
Fix to avoid the usage of the `res` variable uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..4ae0fcc4f871 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,7 +1042,7 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
+ int ret = 0;
while (nvevents--) {
ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
--
2.43.0
This is series 2a/5 of the migration to `core::ffi::CStr`[0].
20250704-core-cstr-prepare-v1-0-a91524037783(a)gmail.com.
This series depends on the prior series[0] and is intended to go through
the rust tree to reduce the number of release cycles required to
complete the work.
Subsystem maintainers: I would appreciate your `Acked-by`s so that this
can be taken through Miguel's tree (where the other series must go).
[0] https://lore.kernel.org/all/20250704-core-cstr-prepare-v1-0-a91524037783@gm…
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v3:
- Add a patch to address new code in device.rs.
- Drop incorrectly applied Acked-by tags from Danilo.
- Link to v2: https://lore.kernel.org/r/20250719-core-cstr-fanout-1-v2-0-1ab5ba189c6e@gma…
Changes in v2:
- Rebase on rust-next.
- Drop pin-init patch, which is no longer needed.
- Link to v1: https://lore.kernel.org/r/20250709-core-cstr-fanout-1-v1-0-64308e7203fc@gma…
---
Tamir Duberstein (9):
gpu: nova-core: use `kernel::{fmt,prelude::fmt!}`
rust: alloc: use `kernel::{fmt,prelude::fmt!}`
rust: block: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
rust: file: use `kernel::{fmt,prelude::fmt!}`
rust: kunit: use `kernel::{fmt,prelude::fmt!}`
rust: seq_file: use `kernel::{fmt,prelude::fmt!}`
rust: sync: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
drivers/block/rnull.rs | 2 +-
drivers/gpu/nova-core/gpu.rs | 3 +--
drivers/gpu/nova-core/regs/macros.rs | 6 +++---
rust/kernel/alloc/kbox.rs | 2 +-
rust/kernel/alloc/kvec.rs | 2 +-
rust/kernel/alloc/kvec/errors.rs | 2 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/block/mq/gen_disk.rs | 2 +-
rust/kernel/block/mq/raw_writer.rs | 3 +--
rust/kernel/device.rs | 6 +++---
rust/kernel/device/property.rs | 23 ++++++++++++-----------
rust/kernel/fs/file.rs | 5 +++--
rust/kernel/kunit.rs | 8 ++++----
rust/kernel/seq_file.rs | 6 +++---
rust/kernel/sync/arc.rs | 2 +-
scripts/rustdoc_test_gen.rs | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250709-core-cstr-fanout-1-f20611832272
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
The rtnetlink FOU selftest prints an incorrect string:
"FAIL: fou"s. Change it to the intended "FAIL: fou" by
removing a stray character in the end_test string of the test.
Signed-off-by: Alok Tiwari <alok.a.tiwari(a)oracle.com>
---
tools/testing/selftests/net/rtnetlink.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index d6c00efeb664..24bba74c77ee 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -519,7 +519,7 @@ kci_test_encap_fou()
run_cmd_fail ip -netns "$testns" fou del port 9999
run_cmd ip -netns "$testns" fou del port 7777
if [ $ret -ne 0 ]; then
- end_test "FAIL: fou"s
+ end_test "FAIL: fou"
return 1
fi
--
2.50.1
To: linux-kselftest(a)vger.kernel.org
Date: 24-09-2025
Thematic Funds Letter Of Intent
It's a pleasure to connect with you
Having been referred to your investment by my team, we would be
honored to review your available investment projects for onward
referral to my principal investors who can allocate capital for
the financing of it.
kindly advise at your convenience
Best Regards,
Respectfully,
Al Sayyid Sultan Yarub Al Busaidi
Director
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that the XFRM ESP offload is
disabled. This occurs because, in bond_newlink(), we change bond link
first and register bond device later. So the XFRM feature update in
bond_option_mode_set() is not called as the bond device is not yet
registered, leading to the offload feature not being set successfully.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: rebase to latest net, no code update
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 16 +++++++++-------
include/net/bonding.h | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 57be04f6cb11..f4f0feddd9fa 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4411,7 +4411,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 57fff2421f1b..7a9d73ec8e91 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -579,20 +579,22 @@ static int bond_newlink(struct net_device *bond_dev,
struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
struct nlattr **data = params->data;
struct nlattr **tb = params->tb;
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index e06f0d63b2c1..bd56ad976cfb 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -711,6 +711,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.50.1
Soft offlining a HugeTLB page reduces the HugeTLB page pool.
Commit 56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
introduced the following sysctl interface to control soft offline:
/proc/sys/vm/enable_soft_offline
The interface does not distinguish between page types:
0 - Soft offline is disabled
1 - Soft offline is enabled
Convert enable_soft_offline to a bitmask and support disabling soft
offline for HugeTLB pages:
Bits:
0 - Enable soft offline
1 - Disable soft offline for HugeTLB pages
Supported values:
0 - Soft offline is disabled
1 - Soft offline is enabled
3 - Soft offline is enabled (disabled for HugeTLB pages)
Existing behavior is preserved.
Update documentation and HugeTLB soft offline self tests.
Reported-by: Shawn Fan <shawn.fan(a)intel.com>
Suggested-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Kyle Meyer <kyle.meyer(a)hpe.com>
---
Tony's patch:
* https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
v1:
* https://lore.kernel.org/all/aMGkAI3zKlVsO0S2@hpe.com
v1 -> v2:
* Make the interface extensible, as suggested by David.
* Preserve existing behavior, as suggested by Jiaqi and David.
Why clear errno in self tests?
madvise() does not set errno when it's successful and errno is set by madvise()
during test_soft_offline_common(3) causing test_soft_offline_common(1) to fail:
# Test soft-offline when enabled_soft_offline=1
# Hugepagesize is 1048576kB
# enable_soft_offline => 1
# Before MADV_SOFT_OFFLINE nr_hugepages=7
# Allocated 0x80000000 bytes of hugetlb pages
# MADV_SOFT_OFFLINE 0x7fd600000000 ret=0, errno=95
# MADV_SOFT_OFFLINE should ret 0
# After MADV_SOFT_OFFLINE nr_hugepages=6
not ok 2 Test soft-offline when enabled_soft_offline=1
---
.../ABI/testing/sysfs-memory-page-offline | 3 ++
Documentation/admin-guide/sysctl/vm.rst | 28 ++++++++++++++++---
mm/memory-failure.c | 17 +++++++++--
.../selftests/mm/hugetlb-soft-offline.c | 19 ++++++++++---
4 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index 00f4e35f916f..d3f05ed6605e 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -20,6 +20,9 @@ Description:
number, or a error when the offlining failed. Reading
the file is not allowed.
+ Soft-offline can be controlled via sysctl, see:
+ Documentation/admin-guide/sysctl/vm.rst
+
What: /sys/devices/system/memory/hard_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..ace73480eb9d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -309,19 +309,39 @@ physical memory) vs performance / capacity implications in transparent and
HugeTLB cases.
For all architectures, enable_soft_offline controls whether to soft offline
-memory pages. When set to 1, kernel attempts to soft offline the pages
-whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
-the request to soft offline the pages. Its default value is 1.
+memory pages.
+
+enable_soft_offline is a bitmask:
+
+Bits::
+
+ 0 - Enable soft offline
+ 1 - Disable soft offline for HugeTLB pages
+
+Supported values::
+
+ 0 - Soft offline is disabled
+ 1 - Soft offline is enabled
+ 3 - Soft offline is enabled (disabled for HugeTLB pages)
+
+The default value is 1.
+
+If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:
+- Request to soft offline from sysfs (soft_offline_page).
+
- Request to soft offline pages from RAS Correctable Errors Collector.
-- On ARM, the request to soft offline pages from GHES driver.
+- On ARM and X86, the request to soft offline pages from GHES driver.
- On PARISC, the request to soft offline pages from Page Deallocation Table.
+Note:
+ Soft offlining a HugeTLB page reduces the HugeTLB page pool.
+
extfrag_threshold
=================
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804bf..0ad9ae11d9e8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -64,11 +64,14 @@
#include "internal.h"
#include "ras/ras_event.h"
+#define SOFT_OFFLINE_ENABLED BIT(0)
+#define SOFT_OFFLINE_SKIP_HUGETLB BIT(1)
+
static int sysctl_memory_failure_early_kill __read_mostly;
static int sysctl_memory_failure_recovery __read_mostly = 1;
-static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_enable_soft_offline __read_mostly = SOFT_OFFLINE_ENABLED;
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
@@ -150,7 +153,7 @@ static const struct ctl_table memory_failure_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_THREE,
}
};
@@ -2799,12 +2802,20 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}
- if (!sysctl_enable_soft_offline) {
+ if (!(sysctl_enable_soft_offline & SOFT_OFFLINE_ENABLED)) {
pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
put_ref_page(pfn, flags);
return -EOPNOTSUPP;
}
+ if (sysctl_enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB) {
+ if (folio_test_hugetlb(pfn_folio(pfn))) {
+ pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
+ put_ref_page(pfn, flags);
+ return -EOPNOTSUPP;
+ }
+ }
+
mutex_lock(&mf_mutex);
if (PageHWPoison(page)) {
diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
index f086f0e04756..b87c8778cadf 100644
--- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
+++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
@@ -5,6 +5,8 @@
* offlining failed with EOPNOTSUPP.
* - if enable_soft_offline = 1, a hugepage should be dissolved and
* nr_hugepages/free_hugepages should be reduced by 1.
+ * - if enable_soft_offline = 3, hugepages should stay intact and soft
+ * offlining failed with EOPNOTSUPP.
*
* Before running, make sure more than 2 hugepages of default_hugepagesz
* are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
@@ -32,6 +34,9 @@
#define EPREFIX " !!! "
+#define SOFT_OFFLINE_ENABLED (1 << 0)
+#define SOFT_OFFLINE_SKIP_HUGETLB (1 << 1)
+
static int do_soft_offline(int fd, size_t len, int expect_errno)
{
char *filemap = NULL;
@@ -56,6 +61,7 @@ static int do_soft_offline(int fd, size_t len, int expect_errno)
ksft_print_msg("Allocated %#lx bytes of hugetlb pages\n", len);
hwp_addr = filemap + len / 2;
+ errno = 0;
ret = madvise(hwp_addr, pagesize, MADV_SOFT_OFFLINE);
ksft_print_msg("MADV_SOFT_OFFLINE %p ret=%d, errno=%d\n",
hwp_addr, ret, errno);
@@ -83,7 +89,7 @@ static int set_enable_soft_offline(int value)
char cmd[256] = {0};
FILE *cmdfile = NULL;
- if (value != 0 && value != 1)
+ if (value < 0 || value > 3)
return -EINVAL;
sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
@@ -155,13 +161,17 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
static void test_soft_offline_common(int enable_soft_offline)
{
int fd;
- int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
+ int expect_errno = 0;
struct statfs file_stat;
unsigned long hugepagesize_kb = 0;
unsigned long nr_hugepages_before = 0;
unsigned long nr_hugepages_after = 0;
int ret;
+ if (!(enable_soft_offline & SOFT_OFFLINE_ENABLED) ||
+ (enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB))
+ expect_errno = EOPNOTSUPP;
+
ksft_print_msg("Test soft-offline when enabled_soft_offline=%d\n",
enable_soft_offline);
@@ -198,7 +208,7 @@ static void test_soft_offline_common(int enable_soft_offline)
// No need for the hugetlbfs file from now on.
close(fd);
- if (enable_soft_offline) {
+ if (expect_errno == 0) {
if (nr_hugepages_before != nr_hugepages_after + 1) {
ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
return;
@@ -219,8 +229,9 @@ static void test_soft_offline_common(int enable_soft_offline)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(2);
+ ksft_set_plan(3);
+ test_soft_offline_common(3);
test_soft_offline_common(1);
test_soft_offline_common(0);
--
2.51.0
With the current Makefile, if the user tries something like
make TARGETS="bpf mm"
only mm is run and bpf is skipped, which is not intentional.
`bpf` and `sched_ext` are always filtered out even when TARGETS is set
explicitly due to how SKIP_TARGETS is implemented.
This default skip exists because these tests require newer LLVM/Clang
versions that may not be available on all systems.
Fix the SKIP_TARGETS logic so that bpf and sched_ext remain
skipped when TARGETS is taken from the Makefile but are included when
the user specifies them explicitly.
Signed-off-by: I Viswanath <viswanathiyyappan(a)gmail.com>
---
make --silent summary=1 TARGETS="bpf size" kselftest
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
TAP version 13
1..1
# selftests: size: get_size
ok 1 selftests: size: get_size
make --silent summary=1 kselftest (bpf is between arm64 and breakpoints in TARGETS)
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
tools/testing/selftests/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index babed7b1c2d1..c6cedb09c372 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -145,7 +145,10 @@ endif
# User can optionally provide a TARGETS skiplist. By default we skip
# targets using BPF since it has cutting edge build time dependencies
# which require more effort to install.
-SKIP_TARGETS ?= bpf sched_ext
+ifeq ($(origin TARGETS), file)
+ SKIP_TARGETS ?= bpf sched_ext
+endif
+
ifneq ($(SKIP_TARGETS),)
TMP := $(filter-out $(SKIP_TARGETS), $(TARGETS))
override TARGETS := $(TMP)
--
2.47.3
Now that the 'flags' attribute is used, it seems interesting to add one
flag for 'server-side', a boolean value.
Here are a few patches related to the 'server-side' attribute:
- Patch 1: only announce this attribute on the server side.
- Patch 2: announce the 'server-side' flag when this is the case.
- Patch 3: deprecate the 'server-side' attribute.
- Patch 4: use the 'server-side' flag in the selftests.
- Patches 5, 6: small cleanups when working on code around.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (6):
mptcp: pm: netlink: only add server-side attr when true
mptcp: pm: netlink: announce server-side flag
mptcp: pm: netlink: deprecate server-side attribute
selftests: mptcp: pm: get server-side flag
mptcp: use _BITUL() instead of (1 << x)
mptcp: remove unused returned value of check_data_fin
Documentation/netlink/specs/mptcp_pm.yaml | 5 +++--
include/uapi/linux/mptcp.h | 11 ++++++-----
include/uapi/linux/mptcp_pm.h | 4 ++--
net/mptcp/pm_netlink.c | 9 +++++++--
net/mptcp/protocol.c | 5 +----
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 ++++++++-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 +-
7 files changed, 28 insertions(+), 17 deletions(-)
---
base-commit: 315f423be0d1ebe720d8fd4fa6bed68586b13d34
change-id: 20250916-net-next-mptcp-server-side-flag-0f002418946d
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>