This series backports 19 patches to update minmax.h in the 6.1.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes.
Previous work to update 6.12.48:
https://lore.kernel.org/stable/20250922103123.14538-1-farbere@amazon.com/T/…
and 6.6.107:
https://lore.kernel.org/stable/20250922103241.16213-1-farbere@amazon.com/T/…
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in older kernels.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
Andy Shevchenko (1):
minmax: deduplicate __unconst_integer_typeof()
David Laight (8):
minmax: fix indentation of __cmp_once() and __clamp_once()
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Herve Codina (1):
minmax: Introduce {min,max}_array()
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: simplify and clarify min_t()/max_t() implementation
minmax: make generic MIN() and MAX() macros available everywhere
minmax: add a few more MIN_T/MAX_T users
minmax: simplify min()/max()/clamp() implementation
minmax: don't use max() in situations that want a C constant
expression
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
Matthew Wilcox (Oracle) (1):
minmax: add in_range() macro
arch/arm/mm/pageattr.c | 6 +-
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
.../drm/arm/display/include/malidp_utils.h | 2 +-
.../display/komeda/komeda_pipeline_state.c | 24 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 -
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 2 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/ethernet/chelsio/cxgb3/cxgb3_main.c | 18 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
drivers/virt/acrn/ioreq.c | 4 +-
fs/btrfs/misc.h | 2 -
fs/btrfs/tree-checker.c | 2 +-
fs/ext2/balloc.c | 2 -
fs/ext4/ext4.h | 2 -
fs/ufs/util.h | 6 -
include/linux/compiler.h | 9 +
include/linux/minmax.h | 264 +++++++++++++-----
include/linux/pageblock-flags.h | 2 +-
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/logic_pio.c | 3 -
lib/vsprintf.c | 2 +-
mm/zsmalloc.c | 1 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
net/netfilter/nf_nat_core.c | 6 +-
net/tipc/core.h | 2 +-
net/tipc/link.c | 10 +-
.../selftests/bpf/progs/get_branch_snapshot.c | 4 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +
tools/testing/selftests/vm/mremap_test.c | 2 +
48 files changed, 290 insertions(+), 184 deletions(-)
--
2.47.3
During the discussion of the clone3() support for shadow stacks concerns
were raised from the glibc side that since it is not possible to reuse
the allocated shadow stack[1]. This means that the benefit of being able
to manage allocations is greatly reduced, for example it is not possible
to integrate the shadow stacks into the glibc thread stack cache. The
stack can be inspected but otherwise it would have to be unmapped and
remapped before it could be used again, it's not clear that this is
better than managing things in the kernel.
In that discussion I suggested that we could enable reuse by writing a
token to the shadow stack of exiting threads, mirroring how the
userspace stack pivot instructions write a token to the outgoing stack.
As mentioned by Florian[2] glibc already unwinds the stack and exits the
thread from the start routine which would integrate nicely with this,
the shadow stack pointer will be at the same place as it was when the
thread started.
This would not write a token if the thread doesn't exit cleanly, that
seems viable to me - users should probably handle this by double
checking that a token is present after waiting for the thread.
This is tagged as a RFC since I put it together fairly quickly to
demonstrate the proposal and the suggestion hasn't had much response
either way from the glibc developers. At the very least we don't
currently handle scheduling during exit(), or distinguish why the thread
is exiting. I've also not done anything about x86.
[1] https://marc.info/?l=glibc-alpha&m=175821637429537&w=2
[2] https://marc.info/?l=glibc-alpha&m=175733266913483&w=2
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
arm64/gcs: Support reuse of GCS for exited threads
kselftest/arm64: Validate PR_SHADOW_STACK_EXIT_TOKEN in basic-gcs
kselftest/arm64: Add PR_SHADOW_STACK_EXIT_TOKEN to gcs-locking
arch/arm64/include/asm/gcs.h | 3 +-
arch/arm64/mm/gcs.c | 25 ++++-
include/uapi/linux/prctl.h | 1 +
tools/testing/selftests/arm64/gcs/basic-gcs.c | 121 ++++++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 23 +++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 3 +-
6 files changed, 173 insertions(+), 3 deletions(-)
---
base-commit: 0b67d4b724b4afed2690c21bef418b8a803c5be2
change-id: 20250919-arm64-gcs-exit-token-82c3c2570aad
prerequisite-change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da18 ("netpoll: fix use after free") - a commit that is now
being reverted.
Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v5:
- Set CONFIG_BONDING=m in selftests/drivers/net/config.
- Link to v4: https://lore.kernel.org/r/20250917-netconsole_torture-v4-0-0a5b3b8f81ce@deb…
Changes in v4:
- Added an additional selftest to test multiple netpoll users in
parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…
Changes in v3:
- This patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (4):
net: netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
selftest: netcons: add test for netconsole over bonded interfaces
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 2 +
tools/testing/selftests/drivers/net/config | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 197 ++++++++++++++++++---
.../selftests/drivers/net/netcons_over_bonding.sh | 76 ++++++++
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++
6 files changed, 385 insertions(+), 25 deletions(-)
---
base-commit: 5e87fdc37f8dc619549d49ba5c951b369ce7c136
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Hi Thomas and David,
I am seeing the following error during "kunit.py run --alltests run"
next-20250926.
$ make all compile_commands.json scripts_gdb ARCH=um O=.kunit --jobs=16
ERROR:root:/usr/bin/ld: drivers/net/wireless/intel/iwlwifi/tests/devinfo.o: in function `devinfo_pci_ids_config':
devinfo.c:(.text+0x2d): undefined reference to `iwl_bz_mac_cfg'
collect2: error: ld returned 1 exit status
make[3]: *** [../scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
make[2]: *** [/linux/linux_next/Makefile:1242: vmlinux] Error 2
make[1]: *** [/linux/linux_next/Makefile:248: __sub-make] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Possile intearction between these two commits: Note: linux-kselftext
kunit branch is fine I am going send kunit pr to Linus later today.
Heads up that "kunit.py run --alltests run" is failing on next-20250926
commit 031cdd3bc3f369553933c1b0f4cb18000162c8ff
Author: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Date: Mon Sep 8 09:03:38 2025 +0200
kunit: Enable PCI on UML without triggering WARN()
commit 137b0bb916f1addb2ffbefd09a6e3e9d15fe6100
Author: Johannes Berg <johannes.berg(a)intel.com>
Date: Mon Sep 15 11:34:28 2025 +0300
wifi: iwlwifi: tests: check listed PCI IDs have configs
Note: linux-kselftext build just fine.
thanks,
-- Shuah
Hi all,
The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.
Some tests are flaky so they can't be integrated in the CI as they are.
I think that fixing their flakyness would require a significant amount of
work. So, as first step, I've excluded them from the list of tests
migrated to the CI (cf PATCH 14). If these tests get fixed at some
point, integrating them into the CI will be straightforward.
I noticed a small error on a function's return value while investigating
on the report's summary issue pointed out by Maciej in previous iteration,
the new PATCH 3 fixes it.
PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 14 isolates some flaky tests
PATCH 15 integrate the non-flaky tests to the test_progs framework
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes
testapp_stats_rx_dropped(), the second one fixes
testapp_xdp_shared_umem(). The unecessary frees (in
testapp_stats_rx_full() and testapp_stats_fill_empty() are removed)
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com
Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com
Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com
---
Bastien Curutchet (eBPF Foundation) (15):
selftests/bpf: test_xsk: Split xskxceiver
selftests/bpf: test_xsk: Initialize bitmap before use
selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
selftests/bpf: test_xsk: Wrap test clean-up in functions
selftests/bpf: test_xsk: Release resources when swap fails
selftests/bpf: test_xsk: Add return value to init_iface()
selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
selftests/bpf: test_xsk: Don't exit immediately when workers fail
selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
selftests/bpf: test_xsk: Don't exit immediately on allocation failures
selftests/bpf: test_xsk: Isolate flaky tests
selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
tools/testing/selftests/bpf/Makefile | 11 +-
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
tools/testing/selftests/bpf/prog_tests/test_xsk.h | 294 +++
tools/testing/selftests/bpf/prog_tests/xsk.c | 146 ++
tools/testing/selftests/bpf/xskxceiver.c | 2696 +--------------------
tools/testing/selftests/bpf/xskxceiver.h | 156 --
6 files changed, 3174 insertions(+), 2724 deletions(-)
---
base-commit: 1bd67e08d0f3fcb8cc69a73fb7aab9f048be4b8e
change-id: 20250218-xsk-0cf90e975d14
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 204 ++++++++++++++++++
3 files changed, 206 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 0fda241fa62b..b3920c59bf40 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 0d0a5fae5954..bc4d940f66d4 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -17,7 +17,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
param_test_mm_cid_benchmark param_test_mm_cid_compare_twice \
- syscall_errors_test
+ syscall_errors_test mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 000000000000..d13623625f5a
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,204 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 50 ms */
+#define RUNNER_PERIOD 50000
+/*
+ * Number of runs before we terminate or get the token.
+ * The number is slowly increasing with the number of CPUs as the compaction
+ * process can take longer on larger systems. This is an arbitrary value.
+ */
+#define THREAD_RUNS (3 + args->num_cpus/8)
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.51.0
This patch series introduces LANDLOCK_SCOPE_MEMFD_EXEC, a new Landlock
scoping mechanism that restricts execution of anonymous memory file
descriptors (memfd) created via memfd_create(2). This addresses security
gaps where processes can bypass W^X policies and execute arbitrary code
through anonymous memory objects.
Fixes: https://github.com/landlock-lsm/linux/issues/37
SECURITY PROBLEM
================
Current Landlock filesystem restrictions do not cover memfd objects,
allowing processes to:
1. Read-to-execute bypass: Create writable memfd, inject code,
then execute via mmap(PROT_EXEC) or direct execve()
2. Anonymous execution: Execute code without touching the filesystem via
execve("/proc/self/fd/N") where N is a memfd descriptor
3. Cross-domain access violations: Pass memfd between processes to
bypass domain restrictions
These scenarios can occur in sandboxed environments where filesystem
access is restricted but memfd creation remains possible.
IMPLEMENTATION
==============
The implementation adds hierarchical execution control through domain
scoping:
Core Components:
- is_memfd_file(): Reliable memfd detection via "memfd:" dentry prefix
- domain_is_scoped(): Cross-domain hierarchy checking (moved to domain.c)
- LSM hooks: mmap_file, file_mprotect, bprm_creds_for_exec
- Creation-time restrictions: hook_file_alloc_security
Security Matrix:
Execution decisions follow domain hierarchy rules preventing both
same-domain bypass attempts and cross-domain access violations while
preserving legitimate hierarchical access patterns.
Domain Hierarchy with LANDLOCK_SCOPE_MEMFD_EXEC:
===============================================
Root (no domain) - No restrictions
|
+-- Domain A [SCOPE_MEMFD_EXEC] Layer 1
| +-- memfd_A (tagged with Domain A as creator)
| |
| +-- Domain A1 (child) [NO SCOPE] Layer 2
| | +-- Inherits Layer 1 restrictions from parent
| | +-- memfd_A1 (can create, inherits restrictions)
| | +-- Domain A1a [SCOPE_MEMFD_EXEC] Layer 3
| | +-- memfd_A1a (tagged with Domain A1a)
| |
| +-- Domain A2 (child) [SCOPE_MEMFD_EXEC] Layer 2
| +-- memfd_A2 (tagged with Domain A2 as creator)
| +-- CANNOT access memfd_A1 (different subtree)
|
+-- Domain B [SCOPE_MEMFD_EXEC] Layer 1
+-- memfd_B (tagged with Domain B as creator)
+-- CANNOT access ANY memfd from Domain A subtree
Execution Decision Matrix:
========================
Executor-> | A | A1 | A1a | A2 | B | Root
Creator | | | | | |
------------|-----|----|-----|----|----|-----
Domain A | X | X | X | X | X | Y
Domain A1 | Y | X | X | X | X | Y
Domain A1a | Y | Y | X | X | X | Y
Domain A2 | Y | X | X | X | X | Y
Domain B | X | X | X | X | X | Y
Root | Y | Y | Y | Y | Y | Y
Legend: Y = Execution allowed, X = Execution denied
Scenarios Covered:
- Direct mmap(PROT_EXEC) on memfd files
- Two-stage mmap(PROT_READ) + mprotect(PROT_EXEC) bypass attempts
- execve("/proc/self/fd/N") anonymous execution
- execveat() and fexecve() file descriptor execution
- Cross-process memfd inheritance and IPC passing
TESTING
=======
All patches have been validated with:
- scripts/checkpatch.pl --strict (clean)
- Selftests covering same-domain restrictions, cross-domain
hierarchy enforcement, and regular file isolation
- KUnit tests for memfd detection edge cases
DISCLAIMER
==========
My understanding of Landlock scoping semantics may be limited, but this
implementation reflects my current understanding based on available
documentation and code analysis. I welcome feedback and corrections
regarding the scoping logic and domain hierarchy enforcement.
Signed-off-by: Abhinav Saxena <xandfury(a)gmail.com>
---
Abhinav Saxena (4):
landlock: add LANDLOCK_SCOPE_MEMFD_EXEC scope
landlock: implement memfd detection
landlock: add memfd exec LSM hooks and scoping
selftests/landlock: add memfd execution tests
include/uapi/linux/landlock.h | 5 +
security/landlock/.kunitconfig | 1 +
security/landlock/audit.c | 4 +
security/landlock/audit.h | 1 +
security/landlock/cred.c | 14 -
security/landlock/domain.c | 67 ++++
security/landlock/domain.h | 4 +
security/landlock/fs.c | 405 ++++++++++++++++++++-
security/landlock/limits.h | 2 +-
security/landlock/task.c | 67 ----
.../selftests/landlock/scoped_memfd_exec_test.c | 325 +++++++++++++++++
11 files changed, 812 insertions(+), 83 deletions(-)
---
base-commit: 5b74b2eff1eeefe43584e5b7b348c8cd3b723d38
change-id: 20250716-memfd-exec-ac0d582018c3
Best regards,
--
Abhinav Saxena <xandfury(a)gmail.com>
From: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
During a handshake, an endpoint may specify a maximum record size limit.
Currently, the kernel defaults to TLS_MAX_PAYLOAD_SIZE (16KB) for the
maximum record size. Meaning that, the outgoing records from the kernel
can exceed a lower size negotiated during the handshake. In such a case,
the TLS endpoint must send a fatal "record_overflow" alert [1], and
thus the record is discarded.
Upcoming Western Digital NVMe-TCP hardware controllers implement TLS
support. For these devices, supporting TLS record size negotiation is
necessary because the maximum TLS record size supported by the controller
is less than the default 16KB currently used by the kernel.
This patch adds support for retrieving the negotiated record size limit
during a handshake, and enforcing it at the TLS layer such that outgoing
records are no larger than the size negotiated. This patch depends on
the respective userspace support in tlshd and GnuTLS [2].
[1] https://www.rfc-editor.org/rfc/rfc8449
[2] https://gitlab.com/gnutls/gnutls/-/merge_requests/2005
Signed-off-by: Wilfred Mallawa <wilfred.mallawa(a)wdc.com>
---
Changes V3 -> V4:
* Added record_size_limit RFC reference to documentation
* Always export the record size limit in tls_get_info()
* Disallow user space to change the record_size_limit from under us
if an open record is pending.
* Added record_size_limit minimum size check as per RFC
* Allow space for the ContentType byte for TLS 1.3. The expected
behaviour is that userspace directly uses the negotiated
record_size_limit, kernel will limit the plaintext buffer size
appropirately.
* New patch to add self-tests.
---
Documentation/networking/tls.rst | 12 +++++
include/net/tls.h | 5 +++
include/uapi/linux/tls.h | 2 +
net/tls/tls_device.c | 2 +-
net/tls/tls_main.c | 75 ++++++++++++++++++++++++++++++++
net/tls/tls_sw.c | 2 +-
6 files changed, 96 insertions(+), 2 deletions(-)
diff --git a/Documentation/networking/tls.rst b/Documentation/networking/tls.rst
index 36cc7afc2527..d24bf8911bb8 100644
--- a/Documentation/networking/tls.rst
+++ b/Documentation/networking/tls.rst
@@ -280,6 +280,18 @@ If the record decrypted turns out to had been padded or is not a data
record it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.
+TLS_TX_RECORD_SIZE_LIM
+~~~~~~~~~~~~~~~~~~~~~~
+
+Sets the maximum size for the plaintext of a protected record.
+
+The provided value should correspond to the limit negotiated during the TLS
+handshake via the `record_size_limit` extension (RFC 8449)[1]. When this
+option is set, the kernel enforces this limit on all transmitted TLS records,
+ensuring no plaintext fragment exceeds the specified size.
+
+[1] https://datatracker.ietf.org/doc/html/rfc8449
+
Statistics
==========
diff --git a/include/net/tls.h b/include/net/tls.h
index 857340338b69..32f053770ec4 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -53,6 +53,8 @@ struct tls_rec;
/* Maximum data size carried in a TLS record */
#define TLS_MAX_PAYLOAD_SIZE ((size_t)1 << 14)
+/* Minimum record size limit as per RFC8449 */
+#define TLS_MIN_RECORD_SIZE_LIM ((size_t)1 << 6)
#define TLS_HEADER_SIZE 5
#define TLS_NONCE_OFFSET TLS_HEADER_SIZE
@@ -226,6 +228,9 @@ struct tls_context {
u8 rx_conf:3;
u8 zerocopy_sendfile:1;
u8 rx_no_pad:1;
+ u16 tx_record_size_limit; /* Max plaintext fragment size. For TLS 1.3,
+ * this excludes the ContentType.
+ */
int (*push_pending_record)(struct sock *sk, int flags);
void (*sk_write_space)(struct sock *sk);
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index b66a800389cc..3add266d5916 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -41,6 +41,7 @@
#define TLS_RX 2 /* Set receive parameters */
#define TLS_TX_ZEROCOPY_RO 3 /* TX zerocopy (only sendfile now) */
#define TLS_RX_EXPECT_NO_PAD 4 /* Attempt opportunistic zero-copy */
+#define TLS_TX_RECORD_SIZE_LIM 5 /* Maximum record size */
/* Supported versions */
#define TLS_VERSION_MINOR(ver) ((ver) & 0xFF)
@@ -194,6 +195,7 @@ enum {
TLS_INFO_RXCONF,
TLS_INFO_ZC_RO_TX,
TLS_INFO_RX_NO_PAD,
+ TLS_INFO_TX_RECORD_SIZE_LIM,
__TLS_INFO_MAX,
};
#define TLS_INFO_MAX (__TLS_INFO_MAX - 1)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index f672a62a9a52..bf16ceb41dde 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -459,7 +459,7 @@ static int tls_push_data(struct sock *sk,
/* TLS_HEADER_SIZE is not counted as part of the TLS record, and
* we need to leave room for an authentication tag.
*/
- max_open_record_len = TLS_MAX_PAYLOAD_SIZE +
+ max_open_record_len = tls_ctx->tx_record_size_limit +
prot->prepend_size;
do {
rc = tls_do_allocation(sk, ctx, pfrag, prot->prepend_size);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index a3ccb3135e51..09883d9c6c96 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -544,6 +544,31 @@ static int do_tls_getsockopt_no_pad(struct sock *sk, char __user *optval,
return 0;
}
+static int do_tls_getsockopt_tx_record_size(struct sock *sk, char __user *optval,
+ int __user *optlen)
+{
+ struct tls_context *ctx = tls_get_ctx(sk);
+ int len;
+ /* TLS 1.3: Record length contains ContentType */
+ u16 record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+ ctx->tx_record_size_limit + 1 :
+ ctx->tx_record_size_limit;
+
+ if (get_user(len, optlen))
+ return -EFAULT;
+
+ if (len < sizeof(record_size_limit))
+ return -EINVAL;
+
+ if (put_user(sizeof(record_size_limit), optlen))
+ return -EFAULT;
+
+ if (copy_to_user(optval, &record_size_limit, sizeof(record_size_limit)))
+ return -EFAULT;
+
+ return 0;
+}
+
static int do_tls_getsockopt(struct sock *sk, int optname,
char __user *optval, int __user *optlen)
{
@@ -563,6 +588,9 @@ static int do_tls_getsockopt(struct sock *sk, int optname,
case TLS_RX_EXPECT_NO_PAD:
rc = do_tls_getsockopt_no_pad(sk, optval, optlen);
break;
+ case TLS_TX_RECORD_SIZE_LIM:
+ rc = do_tls_getsockopt_tx_record_size(sk, optval, optlen);
+ break;
default:
rc = -ENOPROTOOPT;
break;
@@ -812,6 +840,43 @@ static int do_tls_setsockopt_no_pad(struct sock *sk, sockptr_t optval,
return rc;
}
+static int do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
+ unsigned int optlen)
+{
+ struct tls_context *ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
+ u16 value;
+
+ if (sw_ctx->open_rec)
+ return -EBUSY;
+
+ if (sockptr_is_null(optval) || optlen != sizeof(value))
+ return -EINVAL;
+
+ if (copy_from_sockptr(&value, optval, sizeof(value)))
+ return -EFAULT;
+
+ if (value < TLS_MIN_RECORD_SIZE_LIM)
+ return -EINVAL;
+
+ if (ctx->prot_info.version == TLS_1_2_VERSION &&
+ value > TLS_MAX_PAYLOAD_SIZE)
+ return -EINVAL;
+
+ if (ctx->prot_info.version == TLS_1_3_VERSION &&
+ value - 1 > TLS_MAX_PAYLOAD_SIZE)
+ return -EINVAL;
+
+ /*
+ * For TLS 1.3: 'value' includes one byte for the appended ContentType.
+ * Adjust the kernel's internal plaintext limit accordingly.
+ */
+ ctx->tx_record_size_limit = ctx->prot_info.version == TLS_1_3_VERSION ?
+ value - 1 : value;
+
+ return 0;
+}
+
static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
unsigned int optlen)
{
@@ -833,6 +898,9 @@ static int do_tls_setsockopt(struct sock *sk, int optname, sockptr_t optval,
case TLS_RX_EXPECT_NO_PAD:
rc = do_tls_setsockopt_no_pad(sk, optval, optlen);
break;
+ case TLS_TX_RECORD_SIZE_LIM:
+ rc = do_tls_setsockopt_tx_record_size(sk, optval, optlen);
+ break;
default:
rc = -ENOPROTOOPT;
break;
@@ -1022,6 +1090,7 @@ static int tls_init(struct sock *sk)
ctx->tx_conf = TLS_BASE;
ctx->rx_conf = TLS_BASE;
+ ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
update_sk_prot(sk, ctx);
out:
write_unlock_bh(&sk->sk_callback_lock);
@@ -1111,6 +1180,11 @@ static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool net_admin)
goto nla_failure;
}
+ err = nla_put_u16(skb, TLS_INFO_TX_RECORD_SIZE_LIM,
+ ctx->tx_record_size_limit);
+ if (err)
+ goto nla_failure;
+
rcu_read_unlock();
nla_nest_end(skb, start);
return 0;
@@ -1132,6 +1206,7 @@ static size_t tls_get_info_size(const struct sock *sk, bool net_admin)
nla_total_size(sizeof(u16)) + /* TLS_INFO_TXCONF */
nla_total_size(0) + /* TLS_INFO_ZC_RO_TX */
nla_total_size(0) + /* TLS_INFO_RX_NO_PAD */
+ nla_total_size(sizeof(u16)) + /* TLS_INFO_TX_RECORD_SIZE_LIM */
0;
return size;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index bac65d0d4e3e..28fb796573d1 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
orig_size = msg_pl->sg.size;
full_record = false;
try_to_copy = msg_data_left(msg);
- record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
+ record_room = tls_ctx->tx_record_size_limit - msg_pl->sg.size;
if (try_to_copy >= record_room) {
try_to_copy = record_room;
full_record = true;
--
2.51.0
This check was removed in commit e6f497955fb6 ("ipv6: Check GATEWAY
in rtm_to_fib6_multipath_config().") as part of rt6_qualify_for ecmp().
The author correctly recognises that rt6_qualify_for_ecmp() returns
false if fb_nh_gw_family is set to AF_UNSPEC, but then mistakes
AF_UNSPEC for AF_INET6 when reasoning that the check is unnecessary.
This means certain malformed entries don't get caught in
ip6_route_multipath_add().
This patch reintroduces the AF_UNSPEC check while respecting changes
of the initial patch.
Reported-by: syzbot+a259a17220263c2d73fc(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=a259a17220263c2d73fc
Fixes: e6f497955fb6 ("ipv6: Check GATEWAY in rtm_to_fib6_multipath_config().")
Signed-off-by: Maksimilijan Marosevic <maksimilijan.marosevic(a)proton.me>
---
net/ipv6/route.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index aee6a10b112a..884bae3fb1b1 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -5454,6 +5454,14 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
goto cleanup;
}
+ if (rt->fib6_nh->fib_nh_gw_family == AF_UNSPEC) {
+ err = -EINVAL;
+ NL_SET_ERR_MSG(extack,
+ "Device only routes can not be added for IPv6 using the multipath API.");
+ fib6_info_release(rt);
+ goto cleanup;
+ }
+
rt->fib6_nh->fib_nh_weight = rtnh->rtnh_hops + 1;
err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
--
2.43.0
The Alphabetical layout was... auto-correct.
Friendly: Neo (Jesus-Christ a.k.a. King David again).
-
Miracles proving that I was Jesus, and King David, are in my Facebook.
I proposed to explain the Holy-Trinity live on TV.
It is since year 2000 that I must be in the news.
I do not declare be Jew, but get persecuted anyway by antisemites (antichristians?).
Writing still does not pay my bills at all, keeps me almost homeless.
While my intellectual property is stolen, my names removed from many Masterpieces.
Facebook
https://www.facebook.com/profile.php?id=100057121342964
Paypal
https://www.paypal.com/paypalme/meDavidSantamaria
Email/Teams
DavidSantamaria(a)hotmail.fr<mailto:DavidSantamaria@hotmail.fr>
RCS/WhatsApp
+33 7 67 99 32 37
$ADA
addr1qx8chpwdeqv77duf2eutrtgvd5967l4w87fy54fx0022gr8p80z2mq7cmunmrdvy8yn3pzfzpm46zyfjp8usl36vpw2q509hrd
$BTC
3FdwVoDzJoUzceogUwEmVu9YxoFvX6c2Rk
$WAVES
3PEAeFkwqVsgAiyad8uVQzQxiGDyJNnCCn5
Sent from my Public Address.
selftests/net/lib.sh contains a suite of iproute2 wrappers that
automatically schedule the corresponding cleanup through defer. The fact
they do so is however not immediately obvious, one needs to know which
functions are handling the deferral behind the scenes, and which expect the
caller to handle cleanups themselves.
A convention for these auto-deferring functions would help both writing and
patch review. This patchset does so by marking these functions with an adf_
prefix. We already have a few such functions: forwarding/lib.sh has
adf_mcd_start() and a few selftests add private helpers that conform to
this convention.
Patches #1 to #8 gradually convert individual functions, one per patch.
Patch #9 renames an auto-deferring private helpers named dfr_* to adf_*.
The plan is not to retro-rename all private helpers, but I happened to know
about this one.
Patches #10 to #12 introduce several autodefer helpers for commonly used
forwarding/lib.sh functions, and opportunistically convert straightforward
instances of 'action; defer counteraction' to the new helpers.
Patch #13 adds some README verbiage to pitch defer and the adf_*
convention.
Petr Machata (13):
selftests: net: lib: Rename ip_link_add() to adf_*
selftests: net: lib: Rename ip_link_set_master() to adf_*
selftests: net: lib: Rename ip_link_set_addr() to adf_*
selftests: net: lib: Rename ip_link_set_up() to adf_*
selftests: net: lib: Rename ip_link_set_down() to adf_*
selftests: net: lib: Rename ip_addr_add() to adf_*
selftests: net: lib: Rename ip_route_add() to adf_*
selftests: net: lib: Rename bridge_vlan_add() to adf_*
selftests: net: vlan_bridge_binding: Rename dfr_set_binding_*() to
adf_*
selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
selftests: forwarding: lib: Add an autodefer variant of
simple_if_init()
selftests: forwarding: lib: Add an autodefer variant of
forwarding_enable()
selftests: forwarding: README: Mention defer, adf_
.../drivers/net/mlxsw/devlink_trap_policer.sh | 9 +-
.../drivers/net/mlxsw/qos_ets_strict.sh | 12 +-
.../drivers/net/mlxsw/qos_max_descriptors.sh | 9 +-
.../drivers/net/mlxsw/qos_mc_aware.sh | 12 +-
.../drivers/net/mlxsw/sch_red_core.sh | 6 +-
tools/testing/selftests/net/fdb_notify.sh | 26 ++--
tools/testing/selftests/net/forwarding/README | 15 ++
.../net/forwarding/bridge_activity_notify.sh | 21 ++-
.../net/forwarding/bridge_fdb_local_vlan_0.sh | 65 ++++----
tools/testing/selftests/net/forwarding/lib.sh | 18 +++
.../selftests/net/forwarding/sch_ets_core.sh | 9 +-
.../selftests/net/forwarding/sch_red.sh | 12 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 6 +-
.../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 141 +++++++++---------
.../net/forwarding/vxlan_reserved.sh | 33 ++--
tools/testing/selftests/net/lib.sh | 16 +-
.../net/test_vxlan_fdb_changelink.sh | 8 +-
.../selftests/net/vlan_bridge_binding.sh | 44 +++---
18 files changed, 225 insertions(+), 237 deletions(-)
--
2.49.0
Here are some patches for the MPTCP PM, including some refactoring that
I thought it would be best to send at the end of a cycle to avoid
conflicts between net and net-next that could last a few weeks.
The most interesting changes are in the first and last patch, the rest
are patches refactoring the code & tests to validate the modifications.
- Patches 1 & 2: When servers set the C-flag in their MP_CAPABLE to tell
clients not to create subflows to the initial address and port -- e.g.
a deployment behind a L4 load balancer like a typical CDN deployment
-- clients will not use their other endpoints when default settings
are used. That's because the in-kernel path-manager uses the 'subflow'
endpoints to create subflows only to the initial address and port. The
first patch fixes that (for >=v5.14), and the second one validates it.
- Patches 3-14: various patches refactoring the code around the
in-kernel PM (mainly): split too long functions, rename variables and
functions to avoid confusions, reduce structure size, and compare IDs
instead of IP addresses. Note that one patch modifies one internal
variable used in one BPF selftest.
- Patch 15: ability to control endpoints that are used in reaction to a
new address announced by the other peer. With that, endpoints can be
used only once.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Patches 1 & 2 are sent to net-next on purpose: to delay a bit the
backports, just in case. Plus we are at the end of a cycle, and not
to delay the other refactoring patches.
- Sorry, I wanted to send this series earlier on, but due to some
unrelated issues (and holiday), it got delayed. Most patches are
pure refactoring ones.
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: in-kernel: usable client side with C-flag
selftests: mptcp: join: validate C-flag + def limit
mptcp: pm: in-kernel: refactor fill_local_addresses_vec
mptcp: pm: in-kernel: refactor fill_remote_addresses_vec
mptcp: pm: rename 'subflows' to 'extra_subflows'
mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
mptcp: pm: in-kernel: rename 'local_addr_list' to 'endp_list'
mptcp: pm: in-kernel: rename 'addrs' to 'endpoints'
mptcp: pm: in-kernel: remove stale_loss_cnt
mptcp: pm: in-kernel: reduce pernet struct size
mptcp: pm: in-kernel: compare IDs instead of addresses
mptcp: pm: in-kernel: add laminar endpoints
include/uapi/linux/mptcp.h | 11 +-
net/mptcp/pm.c | 32 +-
net/mptcp/pm_kernel.c | 569 ++++++++++++++--------
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.h | 21 +-
net/mptcp/sockopt.c | 22 +-
tools/testing/selftests/bpf/progs/mptcp_subflow.c | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 +
8 files changed, 441 insertions(+), 229 deletions(-)
---
base-commit: a1f1f2422e098485b09e55a492de05cf97f9954d
change-id: 20250925-net-next-mptcp-c-flag-laminar-f8442e4d4bd9
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Fix to avoid cases where the `res` shell variable is
empty in script comparisons.
The comparison has been modified into string comparison to
handle other possible values the variable could assume.
The issue can be reproduced with the command:
make kselftest TARGETS=net
It solves the error:
./tfo_passive.sh: line 98: [: -eq: unary operator expected
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
Notes:
v2: edit condition to handle strings
tools/testing/selftests/net/tfo_passive.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tfo_passive.sh b/tools/testing/selftests/net/tfo_passive.sh
index 80bf11fdc046..a4550511830a 100755
--- a/tools/testing/selftests/net/tfo_passive.sh
+++ b/tools/testing/selftests/net/tfo_passive.sh
@@ -95,7 +95,7 @@ wait
res=$(cat $out_file)
rm $out_file
-if [ $res -eq 0 ]; then
+if [ "$res" = "0" ]; then
echo "got invalid NAPI ID from passive TFO socket"
cleanup_ns
exit 1
--
2.43.0
The generic vDSO provides a lot common functionality shared between
different architectures. SPARC is the last architecture not using it,
preventing some necessary code cleanup.
Make use of the generic infrastructure.
Follow-up to and replacement for Arnd's SPARC vDSO removal patches:
https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/
Tested on a Niagara T4 and QEMU.
This has a semantic conflict with my series "vdso: Reject absolute
relocations during build". The last patch of this series expects all users
of the generic vDSO library to use the vdsocheck tool.
This is not the case (yet) for SPARC64. I do have the patches for the
integration, the specifics will depend on which series is applied first.
Based on tip/timers/vdso.
[0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Allocate vDSO data pages dynamically (and lots of preparations for that)
- Drop clock_getres()
- Fix 32bit clock_gettime() syscall fallback
- Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347…
Changes in v2:
- Rebase on v6.17-rc1
- Drop RFC state
- Fix typo in commit message
- Drop duplicate 'select GENERIC_TIME_VSYSCALL'
- Merge "sparc64: time: Remove architecture-specific clocksource data" into the
main conversion patch. It violated the check in __clocksource_register_scale()
- Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1…
---
Arnd Bergmann (1):
clocksource: remove ARCH_CLOCKSOURCE_DATA
Thomas Weißschuh (35):
selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
arm64: vDSO: getrandom: Explicitly include asm/alternative.h
arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h
arm64: vDSO: compat_gettimeofday: Add explicit includes
ARM: vdso: gettimeofday: Add explicit includes
powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h
powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h
LoongArch: vDSO: Explicitly include asm/vdso/vdso.h
MIPS: vdso: Add include guard to asm/vdso/vdso.h
MIPS: vdso: Explicitly include asm/vdso/vdso.h
random: vDSO: Add explicit includes
vdso/gettimeofday: Add explicit includes
vdso/helpers: Explicitly include vdso/processor.h
vdso/datapage: Remove inclusion of gettimeofday.h
vdso/datapage: Trim down unnecessary includes
random: vDSO: trim vDSO includes
random: vDSO: remove ifdeffery
random: vDSO: split out datapage update into helper functions
random: vDSO: only access vDSO datapage after random_init()
s390/time: Set up vDSO datapage later
vdso/datastore: Reduce scope of some variables in vvar_fault()
vdso/datastore: Drop inclusion of linux/mmap_lock.h
vdso/datastore: Map pages through struct page
vdso/datastore: Allocate data pages dynamically
sparc64: vdso: Link with -z noexecstack
sparc64: vdso: Remove obsolete "fake section table" reservation
sparc64: vdso: Replace code patching with runtime conditional
sparc64: vdso: Move hardware counter read into header
sparc64: vdso: Move syscall fallbacks into header
sparc64: vdso: Introduce vdso/processor.h
sparc64: vdso: Switch to the generic vDSO library
sparc64: vdso2c: Drop sym_vvar_start handling
sparc64: vdso2c: Remove symbol handling
sparc64: vdso: Implement clock_gettime64()
clocksource: drop include of asm/clocksource.h from linux/clocksource.h
arch/arm/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 +
arch/arm64/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/kernel/vdso/vgetrandom.c | 2 +
arch/loongarch/kernel/process.c | 1 +
arch/loongarch/kernel/vdso.c | 1 +
arch/mips/include/asm/vdso/vdso.h | 5 +
arch/mips/kernel/vdso.c | 1 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 1 +
arch/powerpc/include/asm/vdso/processor.h | 3 +
arch/s390/kernel/time.c | 4 +-
arch/sparc/Kconfig | 3 +-
arch/sparc/include/asm/clocksource.h | 9 -
arch/sparc/include/asm/processor.h | 3 +
arch/sparc/include/asm/processor_32.h | 2 -
arch/sparc/include/asm/processor_64.h | 25 --
arch/sparc/include/asm/vdso.h | 2 -
arch/sparc/include/asm/vdso/clocksource.h | 10 +
arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++
arch/sparc/include/asm/vdso/processor.h | 41 +++
arch/sparc/include/asm/vdso/vsyscall.h | 10 +
arch/sparc/include/asm/vvar.h | 75 ----
arch/sparc/kernel/Makefile | 1 -
arch/sparc/kernel/time_64.c | 6 +-
arch/sparc/kernel/vdso.c | 69 ----
arch/sparc/vdso/Makefile | 8 +-
arch/sparc/vdso/vclock_gettime.c | 380 ++-------------------
arch/sparc/vdso/vdso-layout.lds.S | 26 +-
arch/sparc/vdso/vdso.lds.S | 2 -
arch/sparc/vdso/vdso2c.c | 24 --
arch/sparc/vdso/vdso2c.h | 45 +--
arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +-
arch/sparc/vdso/vma.c | 274 +--------------
drivers/char/random.c | 75 ++--
include/linux/clocksource.h | 8 -
include/linux/vdso_datastore.h | 6 +
include/vdso/datapage.h | 23 +-
include/vdso/helpers.h | 1 +
init/main.c | 2 +
kernel/time/Kconfig | 4 -
lib/vdso/datastore.c | 73 ++--
lib/vdso/getrandom.c | 3 +
lib/vdso/gettimeofday.c | 17 +
.../testing/selftests/vDSO/vdso_test_correctness.c | 8 +-
44 files changed, 451 insertions(+), 997 deletions(-)
---
base-commit: 5f84f6004e298bd41c9e4ed45c18447954b1dce6
change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Fix to avoid the usage of the `res` variable uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..772ca1db6e59 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,15 +1042,13 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
while (nvevents--) {
- ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
- &trigger_vevent_cmd);
- if (ret < 0)
+ if (!ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
+ &trigger_vevent_cmd))
return -1;
}
- return ret;
+ return 0;
}
#define test_cmd_trigger_vevents(dev_id, nvevents) \
--
2.43.0
This series is preparing to add the -Wsign-compare C compilation flag to
the Makefile for bpf selftests as requested by a TODO to help avoid
implicit type conversions and have predictable behavior.
Changelog:
Changes from v2:
-Split up the patch into a patch series as suggested by vivek
-Include only changes to variable types with no casting by my mentor
david
-Removed the -Wsign-compare in Makefile to avoid compilation errors
until adding casting for rest of comparisons.
Link:https://lore.kernel.org/bpf/20250924195731.6374-1-mehdi.benhadjkhelifa…
Changes from v1:
- Fix CI failed builds where it failed due to do missing .c and
.h files in my patch for working in mainline.
Link:https://lore.kernel.org/bpf/20250924162408.815137-1-mehdi.benhadjkheli…
Mehdi Ben Hadj Khelifa (3):
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
tools/testing/selftests/bpf/progs/test_global_func11.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func12.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func13.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func9.c | 2 +-
tools/testing/selftests/bpf/progs/test_map_init.c | 2 +-
tools/testing/selftests/bpf/progs/test_parse_tcp_hdr_opt.c | 2 +-
.../selftests/bpf/progs/test_parse_tcp_hdr_opt_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_skb_ctx.c | 2 +-
tools/testing/selftests/bpf/progs/test_snprintf.c | 2 +-
tools/testing/selftests/bpf/progs/test_sockmap_strp.c | 2 +-
tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_loop.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_noinline.c | 4 ++--
tools/testing/selftests/bpf/progs/uprobe_multi.c | 4 ++--
.../selftests/bpf/progs/uprobe_multi_session_recursive.c | 5 +++--
.../selftests/bpf/progs/verifier_iterating_callbacks.c | 2 +-
18 files changed, 22 insertions(+), 21 deletions(-)
--
2.51.0
Add a basic test suite for drivers that support PSP. Also, add a PSP
implementation in the netdevsim driver.
The netdevsim implementation does encapsulation and decapsulation of
PSP packets, but no crypto.
The tests cover the basic usage of the uapi, and demonstrate key
exchange and connection setup. The tests and netdevsim support IPv4
and IPv6. Here is an example run on a system with a CX7 NIC.
TAP version 13
1..28
ok 1 psp.data_basic_send_v0_ip4
ok 2 psp.data_basic_send_v0_ip6
ok 3 psp.data_basic_send_v1_ip4
ok 4 psp.data_basic_send_v1_ip6
ok 5 psp.data_basic_send_v2_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 7 psp.data_basic_send_v3_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 9 psp.data_mss_adjust_ip4
ok 10 psp.data_mss_adjust_ip6
ok 11 psp.dev_list_devices
ok 12 psp.dev_get_device
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate
ok 15 psp.dev_rotate_spi
ok 16 psp.assoc_basic
ok 17 psp.assoc_bad_dev
ok 18 psp.assoc_sk_only_conn
ok 19 psp.assoc_sk_only_mismatch
ok 20 psp.assoc_sk_only_mismatch_tx
ok 21 psp.assoc_sk_only_unconn
ok 22 psp.assoc_version_mismatch
ok 23 psp.assoc_twice
ok 24 psp.data_send_bad_key
ok 25 psp.data_send_disconnect
ok 26 psp.data_stale_key
ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
# Totals: pass:22 fail:0 xfail:2 xpass:0 skip:4 error:0
#
# Responder logs (0):
# STDERR:
# Set PSP enable on device 1 to 0x3
# Set PSP enable on device 1 to 0x0
CHANGES:
v2:
- fix pylint warnings
- insert CONFIG_INET_PSP in alphebetical order
- use branch to skip all tests
- fix compilation error when CONFIG_INET_PSP is not set
v1: https://lore.kernel.org/netdev/20250924194959.2845473-1-daniel.zahka@gmail.…
Jakub Kicinski (8):
netdevsim: a basic test PSP implementation
selftests: drv-net: base device access API test
selftests: drv-net: add PSP responder
selftests: drv-net: psp: add basic data transfer and key rotation
tests
selftests: drv-net: psp: add association tests
selftests: drv-net: psp: add connection breaking tests
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
selftests: drv-net: psp: add tests for destroying devices
drivers/net/netdevsim/Makefile | 4 +
drivers/net/netdevsim/netdev.c | 55 +-
drivers/net/netdevsim/netdevsim.h | 33 +
drivers/net/netdevsim/psp.c | 234 +++++++
net/core/skbuff.c | 1 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 10 +
tools/testing/selftests/drivers/net/config | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 5 +
tools/testing/selftests/drivers/net/psp.py | 593 ++++++++++++++++++
.../selftests/drivers/net/psp_responder.c | 483 ++++++++++++++
.../testing/selftests/net/lib/py/__init__.py | 2 +-
tools/testing/selftests/net/lib/py/ksft.py | 10 +
tools/testing/selftests/net/lib/py/ynl.py | 5 +
16 files changed, 1432 insertions(+), 13 deletions(-)
create mode 100644 drivers/net/netdevsim/psp.c
create mode 100755 tools/testing/selftests/drivers/net/psp.py
create mode 100644 tools/testing/selftests/drivers/net/psp_responder.c
--
2.47.3
From: Dylan Yudaken <dyudaken(a)gmail.com>
Add a .gitignore for the test case build object.
Signed-off-by: Dylan Yudaken <dyudaken(a)gmail.com>
Signed-off-by: Sohil Mehta <sohil.mehta(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
The binary creates some noise. The patch to fix that seems to have
fallen through the cracks. Sending another revision with an expanded Cc
list.
v2:
- Pick up the review tag
v1: https://lore.kernel.org/all/20250623232549.3263273-1-dyudaken@gmail.com/
---
tools/testing/selftests/kexec/.gitignore | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..5f3d9e089ae8
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_kexec_jump
--
2.43.0
This is v10 of the TDX selftests.
This series is based on v6.17-rc4 and has a dependency on
"KVM: TDX: Force split irqchip for TDX at irqchip creation time" [1]
Changes from v9 [2]:
- Rebased on top of v6.17-rc4.
- Addressed the comments from v9.
- Removed special handling for split irqchip in the test code in favor
for the kvm fix in [1].
- Removed outdated support for VM memory not backed by guest_memfd.
- Split "KVM: selftests: Hook TDX support to vm and vcpu creation" into
4 separate patches.
[1] https://lore.kernel.org/lkml/20250904062007.622530-1-sagis@google.com/
[2] https://lore.kernel.org/lkml/20250821042915.3712925-1-sagis@google.com/
Ackerley Tng (2):
KVM: selftests: Add helpers to init TDX memory and finalize VM
KVM: selftests: Add ucall support for TDX
Erdem Aktas (2):
KVM: selftests: Add TDX boot code
KVM: selftests: Add support for TDX TDCALL from guest
Isaku Yamahata (2):
KVM: selftests: Update kvm_init_vm_address_properties() for TDX
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
Sagi Shahar (15):
KVM: selftests: Allocate pgd in virt_map() as necessary
KVM: selftests: Expose functions to get default sregs values
KVM: selftests: Expose function to allocate guest vCPU stack
KVM: selftests: Expose segment definitons to assembly files
KVM: selftests: Add kbuild definitons
KVM: selftests: Define structs to pass parameters to TDX boot code
KVM: selftests: Set up TDX boot code region
KVM: selftests: Set up TDX boot parameters region
KVM: selftests: Add helper to initialize TDX VM
KVM: selftests: Call TDX init when creating a new TDX vm
KVM: selftests: Setup memory regions for TDX on vm creation
KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu
KVM: selftests: Set entry point for TDX guest code
KVM: selftests: Add wrapper for TDX MMIO from guest
KVM: selftests: Add TDX lifecycle test
tools/include/linux/kbuild.h | 18 +
tools/testing/selftests/kvm/Makefile.kvm | 32 ++
.../selftests/kvm/include/x86/processor.h | 35 ++
.../selftests/kvm/include/x86/processor_asm.h | 12 +
.../selftests/kvm/include/x86/tdx/td_boot.h | 74 ++++
.../kvm/include/x86/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++
.../selftests/kvm/include/x86/tdx/tdx.h | 14 +
.../selftests/kvm/include/x86/tdx/tdx_util.h | 86 +++++
.../testing/selftests/kvm/include/x86/ucall.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 10 +-
.../testing/selftests/kvm/lib/x86/processor.c | 91 +++--
.../selftests/kvm/lib/x86/tdx/td_boot.S | 60 +++
.../kvm/lib/x86/tdx/td_boot_offsets.c | 21 ++
.../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++
.../kvm/lib/x86/tdx/tdcall_offsets.c | 16 +
tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 23 ++
.../selftests/kvm/lib/x86/tdx/tdx_util.c | 354 ++++++++++++++++++
tools/testing/selftests/kvm/lib/x86/ucall.c | 45 ++-
tools/testing/selftests/kvm/x86/tdx_vm_test.c | 31 ++
20 files changed, 1032 insertions(+), 37 deletions(-)
create mode 100644 tools/include/linux/kbuild.h
create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c
--
2.51.0.338.gd7d06c2dae-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Plesae find the v2 AccECN case handling patch series, which covers
several excpetional case handling of Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ecn_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
Chia-Yu Chang (11):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: stop sending AccECN opt when loss ACK w/ option
tcp: accecn: enable AccECN
Ilpo Järvinen (3):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
tcp: accecn: Add ece_delta to rate_sample
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 30 +++++-
include/net/tcp_ecn.h | 85 ++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 9 +-
net/ipv4/tcp_input.c | 91 +++++++++++++------
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 38 +++++---
12 files changed, 239 insertions(+), 86 deletions(-)
--
2.34.1
This patchset introduces target resume capability to netconsole allowing
it to recover targets when underlying low-level interface comes back
online.
The patchset starts by refactoring netconsole state representation in
order to allow representing deactivated targets (targets that are
disabled due to interfaces going down). It then modifies netconsole to
handle NETDEV_UP events for such targets and setups netpoll.
The patchset includes a selftest that validates netconsole target state
transitions and that target is functional after resumed.
Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v2:
- Attempt to resume target in the same thread, instead of using
workqueue .
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved conditionals to
inside its implementation, keeping code more clear.
- Verify that device addr matches target mac address when target was
setup using mac.
- Update selftest to cover targets bound by mac and interface name.
- Fix typo in selftest comment and sort tests alphabetically in
Makefile.
- Link to v1:
https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai…
---
Andre Carvalho (4):
netconsole: convert 'enabled' flag to enum for clearer state management
netpoll: add wrapper around __netpoll_setup with dev reference
netconsole: resume previously deactivated target
selftests: netconsole: validate target reactivation
Breno Leitao (2):
netconsole: add target_state enum
netconsole: add STATE_DEACTIVATED to track targets disabled by low level
drivers/net/netconsole.c | 102 +++++++++++++++------
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 20 ++++
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++++-
.../selftests/drivers/net/netcons_resume.sh | 92 +++++++++++++++++++
6 files changed, 216 insertions(+), 30 deletions(-)
---
base-commit: 312e6f7676e63bbb9b81e5c68e580a9f776cc6f0
change-id: 20250816-netcons-retrigger-a4f547bfc867
Best regards,
--
Andre Carvalho <asantostc(a)gmail.com>
From: Benjamin Berg <benjamin.berg(a)intel.com>
This patchset is an attempt to start a nolibc port of UML. The goal is
to port UML to use nolibc in smaller chunks to make the switch more
manageable.
There are three parts to this patchset:
* Two patches to use tools/include headers instead of kernel headers
for userspace files.
* A few nolibc fixes and a new NOLIBC_NO_STARTCODE compile flag for it
* Finally nolibc build support for UML and switching two files while
adding the appropriate support in nolibc itself.
v1 of this patchset was
https://lore.kernel.org/all/20250915071115.1429196-1-benjamin@sipsolutions.…
Changes in v2:
- add sys/uio.h and sys/ptrace.h to nolibc
- Use NOLIBC_NO_RUNTIME to disable nolibc startup code
- Fix out-of-tree build
- various small improvements and cleanups
Should the nolibc changes be merged separately or could everything go
through the same branch?
Also, what about tools/include/linux/compiler.h? It seems that was added
for the tracing code, but it is not clear to me who might ACK that fix.
Benjamin
Benjamin Berg (11):
tools compiler.h: fix __used definition
um: use tools/include for user files
tools/nolibc/stdio: remove perror if NOLIBC_IGNORE_ERRNO is set
tools/nolibc/dirent: avoid errno in readdir_r
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: add option to disable runtime
um: add infrastructure to build files using nolibc
um: use nolibc for the --showconfig implementation
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add ptrace support
um: switch ptrace FP register access to nolibc
arch/um/Makefile | 38 +++++++++++---
arch/um/include/shared/init.h | 2 +-
arch/um/include/shared/os.h | 2 +
arch/um/include/shared/user.h | 6 ---
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/skas/stub.c | 1 +
arch/um/kernel/skas/stub_exe.c | 4 +-
arch/um/os-Linux/skas/process.c | 6 +--
arch/um/os-Linux/start_up.c | 4 +-
arch/um/scripts/Makefile.rules | 10 +++-
arch/x86/um/Makefile | 6 ++-
arch/x86/um/os-Linux/Makefile | 5 +-
arch/x86/um/os-Linux/registers.c | 16 ++----
arch/x86/um/user-offsets.c | 1 -
tools/include/linux/compiler.h | 2 +-
tools/include/nolibc/Makefile | 2 +
tools/include/nolibc/arch-arm.h | 2 +
tools/include/nolibc/arch-arm64.h | 2 +
tools/include/nolibc/arch-loongarch.h | 2 +
tools/include/nolibc/arch-m68k.h | 2 +
tools/include/nolibc/arch-mips.h | 2 +
tools/include/nolibc/arch-powerpc.h | 2 +
tools/include/nolibc/arch-riscv.h | 2 +
tools/include/nolibc/arch-s390.h | 2 +
tools/include/nolibc/arch-sh.h | 2 +
tools/include/nolibc/arch-sparc.h | 2 +
tools/include/nolibc/arch-x86.h | 4 ++
tools/include/nolibc/compiler.h | 4 +-
tools/include/nolibc/crt.h | 3 ++
tools/include/nolibc/dirent.h | 6 +--
tools/include/nolibc/nolibc.h | 2 +
tools/include/nolibc/stackprotector.h | 2 +
tools/include/nolibc/stdio.h | 2 +
tools/include/nolibc/stdlib.h | 2 +
tools/include/nolibc/sys.h | 3 +-
tools/include/nolibc/sys/auxv.h | 3 ++
tools/include/nolibc/sys/ptrace.h | 52 ++++++++++++++++++++
tools/include/nolibc/sys/uio.h | 49 ++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 11 +++++
39 files changed, 222 insertions(+), 48 deletions(-)
create mode 100644 tools/include/nolibc/sys/ptrace.h
create mode 100644 tools/include/nolibc/sys/uio.h
--
2.51.0
The sendto() call in walk_tx() was passing NULL as the buffer argument,
which can trigger a -Wnonnull warning with some compilers.
Although the size is 0 and no data is actually sent, passing a null
pointer is technically incorrect.
This commit changes NULL to an empty string literal ("") to satisfy the
non-null argument requirement and fix the compiler warning.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 221270cee3ea..0c24adbb292e 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -470,7 +470,7 @@ static void walk_tx(int sock, struct ring *ring)
bug_on(total_packets != 0);
- ret = sendto(sock, NULL, 0, 0, NULL, 0);
+ ret = sendto(sock, "", 0, 0, NULL, 0);
if (ret == -1) {
perror("sendto");
exit(1);
--
2.51.0.534.gc79095c0ca-goog
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458c9 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
---
tools/testing/selftests/kvm/x86/xapic_state_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/xapic_state_test.c b/tools/testing/selftests/kvm/x86/xapic_state_test.c
index fdebff1165c7..3b4814c55722 100644
--- a/tools/testing/selftests/kvm/x86/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_state_test.c
@@ -120,8 +120,8 @@ static void test_icr(struct xapic_vcpu *x)
__test_icr(x, icr | i);
/*
- * Send all flavors of IPIs to non-existent vCPUs. TODO: use number of
- * vCPUs, not vcpu.id + 1. Arbitrarily use vector 0xff.
+ * Send all flavors of IPIs to non-existent vCPUs. Arbitrarily use
+ * vector 0xff.
*/
icr = APIC_INT_ASSERT | 0xff;
for (i = 0; i < 0xff; i++) {
--
2.43.0
Fix to avoid the usage of the `res` variable uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..4ae0fcc4f871 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,7 +1042,7 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
+ int ret = 0;
while (nvevents--) {
ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
--
2.43.0
This is series 2a/5 of the migration to `core::ffi::CStr`[0].
20250704-core-cstr-prepare-v1-0-a91524037783(a)gmail.com.
This series depends on the prior series[0] and is intended to go through
the rust tree to reduce the number of release cycles required to
complete the work.
Subsystem maintainers: I would appreciate your `Acked-by`s so that this
can be taken through Miguel's tree (where the other series must go).
[0] https://lore.kernel.org/all/20250704-core-cstr-prepare-v1-0-a91524037783@gm…
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v3:
- Add a patch to address new code in device.rs.
- Drop incorrectly applied Acked-by tags from Danilo.
- Link to v2: https://lore.kernel.org/r/20250719-core-cstr-fanout-1-v2-0-1ab5ba189c6e@gma…
Changes in v2:
- Rebase on rust-next.
- Drop pin-init patch, which is no longer needed.
- Link to v1: https://lore.kernel.org/r/20250709-core-cstr-fanout-1-v1-0-64308e7203fc@gma…
---
Tamir Duberstein (9):
gpu: nova-core: use `kernel::{fmt,prelude::fmt!}`
rust: alloc: use `kernel::{fmt,prelude::fmt!}`
rust: block: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
rust: file: use `kernel::{fmt,prelude::fmt!}`
rust: kunit: use `kernel::{fmt,prelude::fmt!}`
rust: seq_file: use `kernel::{fmt,prelude::fmt!}`
rust: sync: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
drivers/block/rnull.rs | 2 +-
drivers/gpu/nova-core/gpu.rs | 3 +--
drivers/gpu/nova-core/regs/macros.rs | 6 +++---
rust/kernel/alloc/kbox.rs | 2 +-
rust/kernel/alloc/kvec.rs | 2 +-
rust/kernel/alloc/kvec/errors.rs | 2 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/block/mq/gen_disk.rs | 2 +-
rust/kernel/block/mq/raw_writer.rs | 3 +--
rust/kernel/device.rs | 6 +++---
rust/kernel/device/property.rs | 23 ++++++++++++-----------
rust/kernel/fs/file.rs | 5 +++--
rust/kernel/kunit.rs | 8 ++++----
rust/kernel/seq_file.rs | 6 +++---
rust/kernel/sync/arc.rs | 2 +-
scripts/rustdoc_test_gen.rs | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250709-core-cstr-fanout-1-f20611832272
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
The rtnetlink FOU selftest prints an incorrect string:
"FAIL: fou"s. Change it to the intended "FAIL: fou" by
removing a stray character in the end_test string of the test.
Signed-off-by: Alok Tiwari <alok.a.tiwari(a)oracle.com>
---
tools/testing/selftests/net/rtnetlink.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index d6c00efeb664..24bba74c77ee 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -519,7 +519,7 @@ kci_test_encap_fou()
run_cmd_fail ip -netns "$testns" fou del port 9999
run_cmd ip -netns "$testns" fou del port 7777
if [ $ret -ne 0 ]; then
- end_test "FAIL: fou"s
+ end_test "FAIL: fou"
return 1
fi
--
2.50.1
To: linux-kselftest(a)vger.kernel.org
Date: 24-09-2025
Thematic Funds Letter Of Intent
It's a pleasure to connect with you
Having been referred to your investment by my team, we would be
honored to review your available investment projects for onward
referral to my principal investors who can allocate capital for
the financing of it.
kindly advise at your convenience
Best Regards,
Respectfully,
Al Sayyid Sultan Yarub Al Busaidi
Director
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that the XFRM ESP offload is
disabled. This occurs because, in bond_newlink(), we change bond link
first and register bond device later. So the XFRM feature update in
bond_option_mode_set() is not called as the bond device is not yet
registered, leading to the offload feature not being set successfully.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: rebase to latest net, no code update
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 16 +++++++++-------
include/net/bonding.h | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 57be04f6cb11..f4f0feddd9fa 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4411,7 +4411,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 57fff2421f1b..7a9d73ec8e91 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -579,20 +579,22 @@ static int bond_newlink(struct net_device *bond_dev,
struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
struct nlattr **data = params->data;
struct nlattr **tb = params->tb;
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index e06f0d63b2c1..bd56ad976cfb 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -711,6 +711,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.50.1
Soft offlining a HugeTLB page reduces the HugeTLB page pool.
Commit 56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
introduced the following sysctl interface to control soft offline:
/proc/sys/vm/enable_soft_offline
The interface does not distinguish between page types:
0 - Soft offline is disabled
1 - Soft offline is enabled
Convert enable_soft_offline to a bitmask and support disabling soft
offline for HugeTLB pages:
Bits:
0 - Enable soft offline
1 - Disable soft offline for HugeTLB pages
Supported values:
0 - Soft offline is disabled
1 - Soft offline is enabled
3 - Soft offline is enabled (disabled for HugeTLB pages)
Existing behavior is preserved.
Update documentation and HugeTLB soft offline self tests.
Reported-by: Shawn Fan <shawn.fan(a)intel.com>
Suggested-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Kyle Meyer <kyle.meyer(a)hpe.com>
---
Tony's patch:
* https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
v1:
* https://lore.kernel.org/all/aMGkAI3zKlVsO0S2@hpe.com
v1 -> v2:
* Make the interface extensible, as suggested by David.
* Preserve existing behavior, as suggested by Jiaqi and David.
Why clear errno in self tests?
madvise() does not set errno when it's successful and errno is set by madvise()
during test_soft_offline_common(3) causing test_soft_offline_common(1) to fail:
# Test soft-offline when enabled_soft_offline=1
# Hugepagesize is 1048576kB
# enable_soft_offline => 1
# Before MADV_SOFT_OFFLINE nr_hugepages=7
# Allocated 0x80000000 bytes of hugetlb pages
# MADV_SOFT_OFFLINE 0x7fd600000000 ret=0, errno=95
# MADV_SOFT_OFFLINE should ret 0
# After MADV_SOFT_OFFLINE nr_hugepages=6
not ok 2 Test soft-offline when enabled_soft_offline=1
---
.../ABI/testing/sysfs-memory-page-offline | 3 ++
Documentation/admin-guide/sysctl/vm.rst | 28 ++++++++++++++++---
mm/memory-failure.c | 17 +++++++++--
.../selftests/mm/hugetlb-soft-offline.c | 19 ++++++++++---
4 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index 00f4e35f916f..d3f05ed6605e 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -20,6 +20,9 @@ Description:
number, or a error when the offlining failed. Reading
the file is not allowed.
+ Soft-offline can be controlled via sysctl, see:
+ Documentation/admin-guide/sysctl/vm.rst
+
What: /sys/devices/system/memory/hard_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..ace73480eb9d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -309,19 +309,39 @@ physical memory) vs performance / capacity implications in transparent and
HugeTLB cases.
For all architectures, enable_soft_offline controls whether to soft offline
-memory pages. When set to 1, kernel attempts to soft offline the pages
-whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
-the request to soft offline the pages. Its default value is 1.
+memory pages.
+
+enable_soft_offline is a bitmask:
+
+Bits::
+
+ 0 - Enable soft offline
+ 1 - Disable soft offline for HugeTLB pages
+
+Supported values::
+
+ 0 - Soft offline is disabled
+ 1 - Soft offline is enabled
+ 3 - Soft offline is enabled (disabled for HugeTLB pages)
+
+The default value is 1.
+
+If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:
+- Request to soft offline from sysfs (soft_offline_page).
+
- Request to soft offline pages from RAS Correctable Errors Collector.
-- On ARM, the request to soft offline pages from GHES driver.
+- On ARM and X86, the request to soft offline pages from GHES driver.
- On PARISC, the request to soft offline pages from Page Deallocation Table.
+Note:
+ Soft offlining a HugeTLB page reduces the HugeTLB page pool.
+
extfrag_threshold
=================
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804bf..0ad9ae11d9e8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -64,11 +64,14 @@
#include "internal.h"
#include "ras/ras_event.h"
+#define SOFT_OFFLINE_ENABLED BIT(0)
+#define SOFT_OFFLINE_SKIP_HUGETLB BIT(1)
+
static int sysctl_memory_failure_early_kill __read_mostly;
static int sysctl_memory_failure_recovery __read_mostly = 1;
-static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_enable_soft_offline __read_mostly = SOFT_OFFLINE_ENABLED;
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
@@ -150,7 +153,7 @@ static const struct ctl_table memory_failure_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_THREE,
}
};
@@ -2799,12 +2802,20 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}
- if (!sysctl_enable_soft_offline) {
+ if (!(sysctl_enable_soft_offline & SOFT_OFFLINE_ENABLED)) {
pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
put_ref_page(pfn, flags);
return -EOPNOTSUPP;
}
+ if (sysctl_enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB) {
+ if (folio_test_hugetlb(pfn_folio(pfn))) {
+ pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
+ put_ref_page(pfn, flags);
+ return -EOPNOTSUPP;
+ }
+ }
+
mutex_lock(&mf_mutex);
if (PageHWPoison(page)) {
diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
index f086f0e04756..b87c8778cadf 100644
--- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
+++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
@@ -5,6 +5,8 @@
* offlining failed with EOPNOTSUPP.
* - if enable_soft_offline = 1, a hugepage should be dissolved and
* nr_hugepages/free_hugepages should be reduced by 1.
+ * - if enable_soft_offline = 3, hugepages should stay intact and soft
+ * offlining failed with EOPNOTSUPP.
*
* Before running, make sure more than 2 hugepages of default_hugepagesz
* are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
@@ -32,6 +34,9 @@
#define EPREFIX " !!! "
+#define SOFT_OFFLINE_ENABLED (1 << 0)
+#define SOFT_OFFLINE_SKIP_HUGETLB (1 << 1)
+
static int do_soft_offline(int fd, size_t len, int expect_errno)
{
char *filemap = NULL;
@@ -56,6 +61,7 @@ static int do_soft_offline(int fd, size_t len, int expect_errno)
ksft_print_msg("Allocated %#lx bytes of hugetlb pages\n", len);
hwp_addr = filemap + len / 2;
+ errno = 0;
ret = madvise(hwp_addr, pagesize, MADV_SOFT_OFFLINE);
ksft_print_msg("MADV_SOFT_OFFLINE %p ret=%d, errno=%d\n",
hwp_addr, ret, errno);
@@ -83,7 +89,7 @@ static int set_enable_soft_offline(int value)
char cmd[256] = {0};
FILE *cmdfile = NULL;
- if (value != 0 && value != 1)
+ if (value < 0 || value > 3)
return -EINVAL;
sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
@@ -155,13 +161,17 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
static void test_soft_offline_common(int enable_soft_offline)
{
int fd;
- int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
+ int expect_errno = 0;
struct statfs file_stat;
unsigned long hugepagesize_kb = 0;
unsigned long nr_hugepages_before = 0;
unsigned long nr_hugepages_after = 0;
int ret;
+ if (!(enable_soft_offline & SOFT_OFFLINE_ENABLED) ||
+ (enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB))
+ expect_errno = EOPNOTSUPP;
+
ksft_print_msg("Test soft-offline when enabled_soft_offline=%d\n",
enable_soft_offline);
@@ -198,7 +208,7 @@ static void test_soft_offline_common(int enable_soft_offline)
// No need for the hugetlbfs file from now on.
close(fd);
- if (enable_soft_offline) {
+ if (expect_errno == 0) {
if (nr_hugepages_before != nr_hugepages_after + 1) {
ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
return;
@@ -219,8 +229,9 @@ static void test_soft_offline_common(int enable_soft_offline)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(2);
+ ksft_set_plan(3);
+ test_soft_offline_common(3);
test_soft_offline_common(1);
test_soft_offline_common(0);
--
2.51.0
With the current Makefile, if the user tries something like
make TARGETS="bpf mm"
only mm is run and bpf is skipped, which is not intentional.
`bpf` and `sched_ext` are always filtered out even when TARGETS is set
explicitly due to how SKIP_TARGETS is implemented.
This default skip exists because these tests require newer LLVM/Clang
versions that may not be available on all systems.
Fix the SKIP_TARGETS logic so that bpf and sched_ext remain
skipped when TARGETS is taken from the Makefile but are included when
the user specifies them explicitly.
Signed-off-by: I Viswanath <viswanathiyyappan(a)gmail.com>
---
make --silent summary=1 TARGETS="bpf size" kselftest
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
TAP version 13
1..1
# selftests: size: get_size
ok 1 selftests: size: get_size
make --silent summary=1 kselftest (bpf is between arm64 and breakpoints in TARGETS)
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
tools/testing/selftests/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index babed7b1c2d1..c6cedb09c372 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -145,7 +145,10 @@ endif
# User can optionally provide a TARGETS skiplist. By default we skip
# targets using BPF since it has cutting edge build time dependencies
# which require more effort to install.
-SKIP_TARGETS ?= bpf sched_ext
+ifeq ($(origin TARGETS), file)
+ SKIP_TARGETS ?= bpf sched_ext
+endif
+
ifneq ($(SKIP_TARGETS),)
TMP := $(filter-out $(SKIP_TARGETS), $(TARGETS))
override TARGETS := $(TMP)
--
2.47.3
Now that the 'flags' attribute is used, it seems interesting to add one
flag for 'server-side', a boolean value.
Here are a few patches related to the 'server-side' attribute:
- Patch 1: only announce this attribute on the server side.
- Patch 2: announce the 'server-side' flag when this is the case.
- Patch 3: deprecate the 'server-side' attribute.
- Patch 4: use the 'server-side' flag in the selftests.
- Patches 5, 6: small cleanups when working on code around.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (6):
mptcp: pm: netlink: only add server-side attr when true
mptcp: pm: netlink: announce server-side flag
mptcp: pm: netlink: deprecate server-side attribute
selftests: mptcp: pm: get server-side flag
mptcp: use _BITUL() instead of (1 << x)
mptcp: remove unused returned value of check_data_fin
Documentation/netlink/specs/mptcp_pm.yaml | 5 +++--
include/uapi/linux/mptcp.h | 11 ++++++-----
include/uapi/linux/mptcp_pm.h | 4 ++--
net/mptcp/pm_netlink.c | 9 +++++++--
net/mptcp/protocol.c | 5 +----
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 ++++++++-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 +-
7 files changed, 28 insertions(+), 17 deletions(-)
---
base-commit: 315f423be0d1ebe720d8fd4fa6bed68586b13d34
change-id: 20250916-net-next-mptcp-server-side-flag-0f002418946d
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This series backports 15 patches to update minmax.h in the 6.6.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes.
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in older kernels.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
David Laight (7):
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: simplify and clarify min_t()/max_t() implementation
minmax: add a few more MIN_T/MAX_T users
minmax: make generic MIN() and MAX() macros available everywhere
minmax: simplify min()/max()/clamp() implementation
minmax: don't use max() in situations that want a C constant
expression
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 6 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/can/usb/etas_es58x/es58x_devlink.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
fs/btrfs/tree-checker.c | 2 +-
include/linux/compiler.h | 9 +
include/linux/minmax.h | 228 +++++++++++-------
include/linux/pageblock-flags.h | 2 +-
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/vsprintf.c | 2 +-
mm/zsmalloc.c | 2 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
tools/testing/selftests/mm/mremap_test.c | 2 +
tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +
34 files changed, 202 insertions(+), 146 deletions(-)
--
2.47.3
Hi,
While staring at epoll, I noticed ep_events_available() looks wrong. I
wrote a small program to confirm, and yes it is definitely wrong.
This series adds a reproducer to kselftest, and fix the bug.
Nam Cao (2):
selftests/eventpoll: Add test for multiple waiters
eventpoll: Fix epoll_wait() report false negative
fs/eventpoll.c | 16 +------
.../filesystems/epoll/epoll_wakeup_test.c | 45 +++++++++++++++++++
2 files changed, 47 insertions(+), 14 deletions(-)
--
2.39.5
Hi everyone,
This patchset introduces a new BPF program type that allows overriding
a tracepoint probe function registered via register_trace_*.
Motivation
----------
Tracepoint probe functions registered via register_trace_* in the kernel
cannot be dynamically modified, changing a probe function requires recompiling
the kernel and rebooting. Nor can BPF programs change an existing
probe function.
Overiding tracepoint supports a way to apply patches into kernel quickly
(such as applying security ones), through predefined static tracepoints,
without waiting for upstream integration.
This patchset demonstrates the way to override probe functions by BPF program.
Overview
--------
This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type.
When this type of BPF program attaches, it overrides the target tracepoint
probe function.
And it also extends a new struct type "tracepoint_func_snapshot", which extends
the tracepoint structure. It is used to record the original probe function
registered by kernel after BPF program being attached and restore from it
after detachment.
Critical steps
--------------
1. Attach: Attach programs via the raw_tracepoint_open syscall.
2. Override:
(a) Locate the target probe by `probe_name`.
(b) Override target probe with the BPF program.
(c) Save the BPF program and target probe function into "tracepoint_func_snapshot".
3. Restore: When the BPF program is detached, automatically restore
the original probe function from earlier saved snapshot.
Future work
-----------
This patchset is intended as a first step toward supporting BPF programs
that can override tracepoint probes. The current implementation may not yet
cover all use cases or handle every corner case.
I welcome feedback and suggestions from the community, and will continue to
refine and improve the design based on comments and real-world requirements.
Thanks!
Fuyu
Fuyu Zhao (3):
bpf: Introduce BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
libbpf: Add support for BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
selftests/bpf: Add selftest for "raw_tp.o"
include/linux/bpf_types.h | 2 +
include/linux/trace_events.h | 9 +
include/linux/tracepoint-defs.h | 6 +
include/linux/tracepoint.h | 3 +
include/uapi/linux/bpf.h | 2 +
kernel/bpf/syscall.c | 35 +++-
kernel/trace/bpf_trace.c | 31 +++
kernel/tracepoint.c | 190 +++++++++++++++++-
tools/include/uapi/linux/bpf.h | 2 +
tools/lib/bpf/bpf.c | 1 +
tools/lib/bpf/bpf.h | 3 +-
tools/lib/bpf/libbpf.c | 27 ++-
tools/lib/bpf/libbpf.h | 3 +-
.../bpf/prog_tests/raw_tp_override_test_run.c | 23 +++
.../bpf/progs/test_raw_tp_override_test_run.c | 20 ++
.../selftests/bpf/test_kmods/bpf_testmod.c | 7 +
16 files changed, 352 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c
create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c
--
2.43.0
The test_kexec_jump binary is generated during 'make kselftest' but was
not ignored, leading to it appearing as untracked in `git status`.
Create a .gitignore file for selftests/kexec and add this
generated file to it.
Signed-off-by: Madhur Kumar <madhurkumar004(a)gmail.com>
---
tools/testing/selftests/kexec/.gitignore | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..6cbe9a1049f3
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1 @@
+test_kexec_jump
--
2.51.0
The tmpshmcstat file is generated with kselftest run but was not
ignored, leading to it appearing as untracked in git status.
Add it to .gitignore to silence the warning.
Signed-off-by: Madhur Kumar <madhurkumar004(a)gmail.com>
---
tools/testing/selftests/cachestat/.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/cachestat/.gitignore b/tools/testing/selftests/cachestat/.gitignore
index d6c30b43a4bb..abbb13b6e96b 100644
--- a/tools/testing/selftests/cachestat/.gitignore
+++ b/tools/testing/selftests/cachestat/.gitignore
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
test_cachestat
+tmpshmcstat
--
2.51.0
For a while now we have supported file handles for pidfds. This has
proven to be very useful.
Extend the concept to cover namespaces as well. After this patchset it
is possible to encode and decode namespace file handles using the
commong name_to_handle_at() and open_by_handle_at() apis.
Namespaces file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the namespace
based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to they are able to open it otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
Both the network namespace and the mount namespace already have an
associated cookie that isn't recycled and is fully exposed to userspace.
Move this into ns_common and use the same id space for all namespaces so
they can trivially and reliably be compared.
There's more coming based on the iterator infrastructure but the series
is large enough and focuses on file handles.
Extensive selftests included. I still have various other test-suites to
run but it holds up so far.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Christian Brauner (32):
pidfs: validate extensible ioctls
nsfs: validate extensible ioctls
block: use extensible_ioctl_valid()
ns: move to_ns_common() to ns_common.h
nsfs: add nsfs.h header
ns: uniformly initialize ns_common
mnt: use ns_common_init()
ipc: use ns_common_init()
cgroup: use ns_common_init()
pid: use ns_common_init()
time: use ns_common_init()
uts: use ns_common_init()
user: use ns_common_init()
net: use ns_common_init()
ns: remove ns_alloc_inum()
nstree: make iterator generic
mnt: support iterator
cgroup: support iterator
ipc: support iterator
net: support iterator
pid: support iterator
time: support iterator
userns: support iterator
uts: support iterator
ns: add to_<type>_ns() to respective headers
nsfs: add current_in_namespace()
nsfs: support file handles
nsfs: support exhaustive file handles
nsfs: add missing id retrieval support
tools: update nsfs.h uapi header
selftests/namespaces: add identifier selftests
selftests/namespaces: add file handle selftests
block/blk-integrity.c | 8 +-
fs/fhandle.c | 6 +
fs/internal.h | 1 +
fs/mount.h | 10 +-
fs/namespace.c | 156 +--
fs/nsfs.c | 266 +++-
fs/pidfs.c | 2 +-
include/linux/cgroup.h | 5 +
include/linux/exportfs.h | 6 +
include/linux/fs.h | 14 +
include/linux/ipc_namespace.h | 5 +
include/linux/ns_common.h | 29 +
include/linux/nsfs.h | 40 +
include/linux/nsproxy.h | 11 -
include/linux/nstree.h | 89 ++
include/linux/pid_namespace.h | 5 +
include/linux/proc_ns.h | 32 +-
include/linux/time_namespace.h | 9 +
include/linux/user_namespace.h | 5 +
include/linux/utsname.h | 5 +
include/net/net_namespace.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/nsfs.h | 12 +-
init/main.c | 2 +
ipc/msgutil.c | 1 +
ipc/namespace.c | 12 +-
ipc/shm.c | 2 +
kernel/Makefile | 2 +-
kernel/cgroup/cgroup.c | 2 +
kernel/cgroup/namespace.c | 24 +-
kernel/nstree.c | 233 ++++
kernel/pid_namespace.c | 13 +-
kernel/time/namespace.c | 23 +-
kernel/user_namespace.c | 17 +-
kernel/utsname.c | 28 +-
net/core/net_namespace.c | 59 +-
tools/include/uapi/linux/nsfs.h | 23 +-
tools/testing/selftests/namespaces/.gitignore | 2 +
tools/testing/selftests/namespaces/Makefile | 7 +
tools/testing/selftests/namespaces/config | 7 +
.../selftests/namespaces/file_handle_test.c | 1410 ++++++++++++++++++++
tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
42 files changed, 3306 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4
Some ublk selftests have strange behavior when fio is not installed.
While most tests behave correctly (run if they don't need fio, or skip
if they need fio), the following tests have different behavior:
- test_null_01, test_null_02, test_generic_01, test_generic_02, and
test_generic_12 try to run fio without checking if it exists first,
and fail on any failure of the fio command (including "fio command
not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but
doesn't fail on fio command failure. This test passes, but that pass
is misleading as the test doesn't do anything useful without fio
installed. So this test passes when it should skip.
Fix these issues by adding _have_program fio checks to the top of all of
these tests.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Also fix test_generic_01, test_generic_02, test_generic_12, which fail
on systems where bpftrace is installed but fio is not (Mohit Gupta)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_fio-v1-1-8d522539eed7@purestorage.c…
---
tools/testing/selftests/ublk/test_generic_01.sh | 4 ++++
tools/testing/selftests/ublk/test_generic_02.sh | 4 ++++
tools/testing/selftests/ublk/test_generic_12.sh | 4 ++++
tools/testing/selftests/ublk/test_null_01.sh | 4 ++++
tools/testing/selftests/ublk/test_null_02.sh | 4 ++++
tools/testing/selftests/ublk/test_stress_05.sh | 4 ++++
6 files changed, 24 insertions(+)
diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
index 9227a208ba53128e4a202298316ff77e05607595..21a31cd5491aa79ffe3ad458a0055e832c619325 100755
--- a/tools/testing/selftests/ublk/test_generic_01.sh
+++ b/tools/testing/selftests/ublk/test_generic_01.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "sequential io order"
dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_generic_02.sh b/tools/testing/selftests/ublk/test_generic_02.sh
index 3e80121e3bf5e191aa9ffe1f85e1693be4fdc2d2..12920768b1a080d37fcdff93de7a0439101de09e 100755
--- a/tools/testing/selftests/ublk/test_generic_02.sh
+++ b/tools/testing/selftests/ublk/test_generic_02.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "sequential io order for MQ"
dev_id=$(_add_ublk_dev -t null -q 2)
diff --git a/tools/testing/selftests/ublk/test_generic_12.sh b/tools/testing/selftests/ublk/test_generic_12.sh
index 7abbb00d251df9403857b1c6f53aec8bf8eab176..b4046201b4d99ef5355b845ebea2c9a3924276a5 100755
--- a/tools/testing/selftests/ublk/test_generic_12.sh
+++ b/tools/testing/selftests/ublk/test_generic_12.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "do imbalanced load, it should be balanced over I/O threads"
NTHREADS=6
diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh
index a34203f726685787da80b0e32da95e0fcb90d0b1..c2cb8f7a09fe37a9956d067fd56b28dc7ca6bd68 100755
--- a/tools/testing/selftests/ublk/test_null_01.sh
+++ b/tools/testing/selftests/ublk/test_null_01.sh
@@ -6,6 +6,10 @@
TID="null_01"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test"
dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh
index 5633ca8766554b22be252c7cb2d13de1bf923b90..8accd35beb55c149f74b23f0fb562e12cbf3e362 100755
--- a/tools/testing/selftests/ublk/test_null_02.sh
+++ b/tools/testing/selftests/ublk/test_null_02.sh
@@ -6,6 +6,10 @@
TID="null_02"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test with zero copy"
dev_id=$(_add_ublk_dev -t null -z)
diff --git a/tools/testing/selftests/ublk/test_stress_05.sh b/tools/testing/selftests/ublk/test_stress_05.sh
index 566cfd90d192ce8c1f98ca2539792d54a787b3d1..274295061042e5db3f4f0846ae63ea9b787fb2ee 100755
--- a/tools/testing/selftests/ublk/test_stress_05.sh
+++ b/tools/testing/selftests/ublk/test_stress_05.sh
@@ -5,6 +5,10 @@
TID="stress_05"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
run_io_and_remove()
{
local size=$1
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_fio-1910998b00b3
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
From: Jeff Xu <jeffxu(a)google.com>
Since Linux introduced the memfd feature, memfd have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure by default system, such as ChromeOS, (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
process created a memfd to share the content with an external process,
however the memfd is overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP in this kind.
On the other hand, executable memfd has its legit use, runc uses memfd’s
seal and executable feature to copy the contents of the binary then
execute them, for such system, we need a solution to differentiate runc's
use of executable memfds and an attacker's [3].
To address those above, this set of patches add following:
1> Let memfd_create() set X bit at creation time.
2> Let memfd to be sealed for modifying X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
X bit.For example, if a container has vm.memfd_noexec=2, then
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This make it possible to a new
LSM, which rejects or allows executable memfd based on its security policy.
Change history:
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).
v6:https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h
v5:https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c
v3:https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() to set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().
v2:https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of
memfd_create to be non-executable.
v1:https://lwn.net/Articles/890096/
[1] https://crbug.com/1305411
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/
Daniel Verkamp (2):
mm/memfd: add F_SEAL_EXEC
selftests/memfd: add tests for F_SEAL_EXEC
Jeff Xu (4):
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
mm/memfd: security hook for memfd_create
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/pid_namespace.h | 19 ++
include/linux/security.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 4 +
kernel/pid_namespace.c | 5 +
kernel/pid_sysctl.h | 59 ++++
mm/memfd.c | 61 +++-
mm/shmem.c | 6 +
security/security.c | 5 +
tools/testing/selftests/memfd/fuse_test.c | 1 +
tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
13 files changed, 510 insertions(+), 3 deletions(-)
create mode 100644 kernel/pid_sysctl.h
base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
This patch adds support for the Zalasr ISA extension, which supplies the
real load acquire/store release instructions.
The specification can be found here:
https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc
This patch seires has been tested with ltp on Qemu with Brensan's zalasr
support patch[1].
Some false positive spacing error happens during patch checking. Thus I
CCed maintainers of checkpatch.pl as well.
[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCO…
v3:
- Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
so as to ensure FENCE.TSO ordering between operations which precede the
UNLOCK+LOCK sequence and operations which follow the sequence. Thanks
to Andrea.
- Support hwprobe of Zalasr.
- Allow Zalasr extensions for Guest/VM.
v2:
- Adjust the order of Zalasr and Zalrsc in dt-bindings. Thanks to
Conor.
Xu Lu (8):
riscv: add ISA extension parsing for Zalasr
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: hwprobe: Export Zalasr extension
riscv: Introduce Zalasr instructions
riscv: Use Zalasr for smp_load_acquire/smp_store_release
riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg
operations
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add Zalasr extensions to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 5 +-
.../devicetree/bindings/riscv/extensions.yaml | 5 +
arch/riscv/include/asm/atomic.h | 6 -
arch/riscv/include/asm/barrier.h | 91 ++++++++++--
arch/riscv/include/asm/cmpxchg.h | 136 ++++++++----------
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/insn-def.h | 79 ++++++++++
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/sys_hwprobe.c | 1 +
arch/riscv/kvm/vcpu_onereg.c | 2 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
13 files changed, 242 insertions(+), 91 deletions(-)
--
2.20.1
FEAT_LSFE is optional from v9.5, it adds new instructions for atomic
memory operations with floating point values. We have no immediate use
for it in kernel, provide a hwcap so userspace can discover it and allow
the ID register field to be exposed to KVM guests.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto arm64/for-next/cpufeature, note that both patches have
build dependencies on this.
- Drop unneeded cc clobber in hwcap.
- Use STRFADD as the instruction probed in hwcap.
- Link to v3: https://lore.kernel.org/r/20250818-arm64-lsfe-v3-0-af6f4d66eb39@kernel.org
Changes in v3:
- Rebase onto v6.17-rc1.
- Link to v2: https://lore.kernel.org/r/20250703-arm64-lsfe-v2-0-eced80999cb4@kernel.org
Changes in v2:
- Fix result of vi dropping in hwcap test.
- Link to v1: https://lore.kernel.org/r/20250627-arm64-lsfe-v1-0-68351c4bf741@kernel.org
---
Mark Brown (2):
KVM: arm64: Expose FEAT_LSFE to guests
kselftest/arm64: Add lsfe to the hwcaps test
arch/arm64/kvm/sys_regs.c | 4 +++-
tools/testing/selftests/arm64/abi/hwcap.c | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+), 1 deletion(-)
---
base-commit: 220928e52cb03d223b3acad3888baf0687486d21
change-id: 20250625-arm64-lsfe-0810cf98adc2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This patch simplifies kublk's implementation of the feature list
command, fixes a bug where a feature was missing, and adds a test to
ensure that similar bugs do not happen in the future.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Add log lines to new test in failure case, to tell the user how to fix
the test, and to indicate that the failure is expected when running
an old test suite against a new kernel (Ming Lei)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_features-v1-0-52014be9cde5@purestor…
---
Uday Shankar (3):
selftests: ublk: kublk: simplify feat_map definition
selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map
selftests: ublk: add test to verify that feat_map is complete
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 32 +++++++++++++------------
tools/testing/selftests/ublk/test_generic_13.sh | 20 ++++++++++++++++
3 files changed, 38 insertions(+), 15 deletions(-)
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_features-07af4e321e5a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
There is a spelling mistake in a test message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/futex/functional/futex_numa_mpol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/futex_numa_mpol.c b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
index 722427fe90bf..3a71ab93db72 100644
--- a/tools/testing/selftests/futex/functional/futex_numa_mpol.c
+++ b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
@@ -206,7 +206,7 @@ int main(int argc, char *argv[])
ksft_print_msg("Memory back to RW\n");
test_futex(futex_ptr, 0);
- ksft_test_result_pass("futex2 memory boundarie tests passed\n");
+ ksft_test_result_pass("futex2 memory boundary tests passed\n");
/* MPOL test. Does not work as expected */
#ifdef LIBNUMA_VER_SUFFICIENT
--
2.51.0
I've removed the RFC tag from this version of the series, but the items
that I'm looking for feedback on remains the same:
- The userspace ABI, in particular:
- The vector length used for the SVE registers, access to the SVE
registers and access to ZA and (if available) ZT0 depending on
the current state of PSTATE.{SM,ZA}.
- The use of a single finalisation for both SVE and SME.
- The addition of control for enabling fine grained traps in a similar
manner to FGU but without the UNDEF, I'm not clear if this is desired
at all and at present this requires symmetric read and write traps like
FGU. That seemed like it might be desired from an implementation
point of view but we already have one case where we enable an
asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it
seems generally useful to enable asymmetrically.
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE, the main additional challenge that
SME presents is that it introduces a new vector length similar to the
SVE vector length and two new controls which change the registers seen
by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with the separately configured SME vector
length. In streaming mode implementation of the FFR register is
optional.
It is also permitted to build systems which support SME without SVE, in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system, this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME, a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE registers writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility, it seems clearer to just express directly that only one
finalisation can be done in the ABI.
Access to the floating point registers follows the architecture:
- When both SVE and SME are present:
- If PSTATE.SM == 0 the vector length used for the Z and P registers
is the SVE vector length.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
is the SME vector length.
- If only SME is present:
- If PSTATE.SM == 0 the Z and P registers are inaccessible and the
floating point state accessed via the encodings for the V registers.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
- The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1.
The VMM must understand this, in particular when loading state SVCR
should be configured before other state. It should be noted that while
the architecture refers to PSTATE.SM and PSTATE.ZA these PSTATE bits are
not preserved in SPSR_ELx, they are only accessible via SVCR.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. These are configured via the ID registers as per
usual.
Protected KVM supported, with the implementation maintaining the
existing restriction that the hypervisor will refuse to run if streaming
mode or ZA is enabled. This both simplfies the code and avoids the need
to allocate storage for host ZA and ZT0 state, there seems to be little
practical use case for supporting this and the memory usage would be
non-trivial.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest, the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v8:
- Small fixes in ABI documentation.
- Link to v7: https://lore.kernel.org/r/20250822-kvm-arm64-sme-v7-0-7a65d82b8b10@kernel.o…
Changes in v7:
- Rebase onto v6.17-rc1.
- Handle SMIDR_EL1 as a VM wide ID register and use this in feat_sme_smps().
- Expose affinity fields in SMIDR_EL1.
- Remove SMPRI_EL1 from vcpu_sysreg, the value is always 0 currently.
- Prevent userspace writes to SMPRIMAP_EL2.
- Link to v6: https://lore.kernel.org/r/20250625-kvm-arm64-sme-v6-0-114cff4ffe04@kernel.o…
Changes in v6:
- Rebase onto v6.16-rc3.
- Link to v5: https://lore.kernel.org/r/20250417-kvm-arm64-sme-v5-0-f469a2d5f574@kernel.o…
Changes in v5:
- Rebase onto v6.15-rc2.
- Add pKVM guest support.
- Always restore SVCR.
- Link to v4: https://lore.kernel.org/r/20250214-kvm-arm64-sme-v4-0-d64a681adcc2@kernel.o…
Changes in v4:
- Rebase onto v6.14-rc2 and Mark Rutland's fixes.
- Expose SME to nested guests.
- Additional cleanups and test fixes following on from the rebase.
- Flush register state on VMM PSTATE.{SM,ZA}.
- Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.o…
Changes in v3:
- Rebase onto v6.12-rc2.
- Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o…
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (29):
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state
arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time
arm64/fpsimd: Check enable bit for FA64 when saving EFI state
arm64/fpsimd: Determine maximum virtualisable SME vector length
KVM: arm64: Introduce non-UNDEF FGT control
KVM: arm64: Pay attention to FFR parameter in SVE save and load
KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Define internal features for SME
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Store vector lengths in an array
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Support SME control registers
KVM: arm64: Support TPIDR2_EL0
KVM: arm64: Support SME identification registers for guests
KVM: arm64: Support SME priority registers
KVM: arm64: Provide assembly for SME register access
KVM: arm64: Support userspace access to streaming mode Z and P registers
KVM: arm64: Flush register state on writes to SVCR.SM and SVCR.ZA
KVM: arm64: Expose SME specific state to userspace
KVM: arm64: Context switch SME state for guests
KVM: arm64: Handle SME exceptions
KVM: arm64: Expose SME to nested guests
KVM: arm64: Provide interface for configuring and enabling SME for guests
KVM: arm64: selftests: Add SME system registers to get-reg-list
KVM: arm64: selftests: Add SME to set_id_regs test
Documentation/virt/kvm/api.rst | 115 ++++++++---
arch/arm64/include/asm/fpsimd.h | 26 +++
arch/arm64/include/asm/kvm_emulate.h | 6 +
arch/arm64/include/asm/kvm_host.h | 169 ++++++++++++---
arch/arm64/include/asm/kvm_hyp.h | 5 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/cpufeature.c | 2 -
arch/arm64/kernel/fpsimd.c | 89 ++++----
arch/arm64/kvm/arm.c | 10 +
arch/arm64/kvm/config.c | 8 +-
arch/arm64/kvm/fpsimd.c | 28 ++-
arch/arm64/kvm/guest.c | 252 ++++++++++++++++++++---
arch/arm64/kvm/handle_exit.c | 14 ++
arch/arm64/kvm/hyp/fpsimd.S | 28 ++-
arch/arm64/kvm/hyp/include/hyp/switch.h | 175 ++++++++++++++--
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 110 ++++++----
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 85 ++++++--
arch/arm64/kvm/hyp/nvhe/switch.c | 4 +-
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 6 +
arch/arm64/kvm/hyp/vhe/switch.c | 17 +-
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 7 +
arch/arm64/kvm/nested.c | 3 +-
arch/arm64/kvm/reset.c | 156 ++++++++++----
arch/arm64/kvm/sys_regs.c | 141 ++++++++++++-
arch/arm64/tools/sysreg | 8 +-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/arm64/get-reg-list.c | 15 +-
tools/testing/selftests/kvm/arm64/set_id_regs.c | 27 ++-
31 files changed, 1327 insertions(+), 303 deletions(-)
---
base-commit: 062b3e4a1f880f104a8d4b90b767788786aa7b78
change-id: 20230301-kvm-arm64-sme-06a1246d3636
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Syzkaller found this, fput runs the release from a work queue so the
refcount remains elevated during abort. This is tricky so move more
handling of files into the core code.
Add a WARN_ON to catch things like this more reliably without relying on
kasn.
Update the fail_nth test to succeed on 6.17 kernels.
Jason Gunthorpe (3):
iommufd: Fix race during abort for file descriptors
iommufd: WARN if an object is aborted with an elevated refcount
iommufd/selftest: Update the fail_nth limit
drivers/iommu/iommufd/device.c | 3 +-
drivers/iommu/iommufd/eventq.c | 9 +----
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/main.c | 39 +++++++++++++++++--
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
5 files changed, 42 insertions(+), 14 deletions(-)
base-commit: 1046d40b0e78d2cd63f6183629699b629b21f877
--
2.43.0
Mshare is a developing feature proposed by Anthony Yznaga and Khalid Aziz
that enables sharing of PTEs across processes. The V3 patch set has been
posted for review:
https://lore.kernel.org/linux-mm/20250820010415.699353-1-anthony.yznaga@ora…
This patch set adds selftests to exercise and demonstrate basic
functionality of mshare.
The initial tests use open, ioctl, and mmap syscalls to establish a shared
memory mapping between two processes and verify the expected behavior.
Additional tests are included to check interoperability with swap and
Transparent Huge Pages.
Future work will extend coverage to other use cases such as integration
with KVM and more advanced scenarios.
This series is intended to be applied on top of mshare V3, which is
based on mm-new (2025-08-15).
-----------------
V1->V2:
- Based on mshare V3, which based on mm-new as of 2025-08-15
- (Fix) For test cases in basic.c, Change to use a small chunk of
memory(4k/8K for normal pages, 2M/4M for hugetlb pages), as to
ensure these tests can run on any server or device.
- (Fix) For test cases of hugetlb, swap and THP, add a tips to
configure corresponding settings.
- (Fix) Add memory to .gitignore file once it exists
- (fix) Correct the Changelog of THP test case that mshare support
THP only when user configure shmem_enabled as always
V1:
https://lore.kernel.org/all/20250825145719.29455-1-linyongting@bytedance.co…
Yongting Lin (8):
mshare: Add selftests
mshare: selftests: Adding config fragments
mshare: selftests: Add some helper functions for mshare filesystem
mshare: selftests: Add test case shared memory
mshare: selftests: Add test case ioctl unmap
mshare: selftests: Add some helper functions for configuring and
retrieving cgroup
mshare: selftests: Add test case to demostrate the swapping of mshare
memory
mshare: selftests: Add test case to demostrate that mshare partly
supports THP
tools/testing/selftests/mshare/.gitignore | 4 +
tools/testing/selftests/mshare/Makefile | 7 +
tools/testing/selftests/mshare/basic.c | 109 ++++++++++
tools/testing/selftests/mshare/config | 1 +
tools/testing/selftests/mshare/memory.c | 89 ++++++++
tools/testing/selftests/mshare/util.c | 254 ++++++++++++++++++++++
6 files changed, 464 insertions(+)
create mode 100644 tools/testing/selftests/mshare/.gitignore
create mode 100644 tools/testing/selftests/mshare/Makefile
create mode 100644 tools/testing/selftests/mshare/basic.c
create mode 100644 tools/testing/selftests/mshare/config
create mode 100644 tools/testing/selftests/mshare/memory.c
create mode 100644 tools/testing/selftests/mshare/util.c
--
2.20.1
For a while now we have supported file handles for pidfds. This has
proven to be very useful.
Extend the concept to cover namespaces as well. After this patchset it
is possible to encode and decode namespace file handles using the
commong name_to_handle_at() and open_by_handle_at() apis.
Namespaces file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the namespace
based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to they are able to open it otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
Both the network namespace and the mount namespace already have an
associated cookie that isn't recycled and is fully exposed to userspace.
Move this into ns_common and use the same id space for all namespaces so
they can trivially and reliably be compared.
There's more coming based on the iterator infrastructure but the series
is large enough and focuses on file handles.
Extensive selftests included.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- Address various review comments.
- Use a common NS_GET_ID ioctl() instead of individual ioctls.
- Link to v1: https://lore.kernel.org/20250910-work-namespace-v1-0-4dd56e7359d8@kernel.org
---
Christian Brauner (33):
pidfs: validate extensible ioctls
nsfs: drop tautological ioctl() check
nsfs: validate extensible ioctls
block: use extensible_ioctl_valid()
ns: move to_ns_common() to ns_common.h
nsfs: add nsfs.h header
ns: uniformly initialize ns_common
cgroup: use ns_common_init()
ipc: use ns_common_init()
mnt: use ns_common_init()
net: use ns_common_init()
pid: use ns_common_init()
time: use ns_common_init()
user: use ns_common_init()
uts: use ns_common_init()
ns: remove ns_alloc_inum()
nstree: make iterator generic
mnt: support ns lookup
cgroup: support ns lookup
ipc: support ns lookup
net: support ns lookup
pid: support ns lookup
time: support ns lookup
user: support ns lookup
uts: support ns lookup
ns: add to_<type>_ns() to respective headers
nsfs: add current_in_namespace()
nsfs: support file handles
nsfs: support exhaustive file handles
nsfs: add missing id retrieval support
tools: update nsfs.h uapi header
selftests/namespaces: add identifier selftests
selftests/namespaces: add file handle selftests
block/blk-integrity.c | 8 +-
fs/fhandle.c | 6 +
fs/internal.h | 1 +
fs/mount.h | 10 +-
fs/namespace.c | 156 +--
fs/nsfs.c | 201 ++-
fs/pidfs.c | 2 +-
include/linux/cgroup.h | 5 +
include/linux/exportfs.h | 6 +
include/linux/fs.h | 14 +
include/linux/ipc_namespace.h | 5 +
include/linux/ns_common.h | 29 +
include/linux/nsfs.h | 40 +
include/linux/nsproxy.h | 11 -
include/linux/nstree.h | 89 ++
include/linux/pid_namespace.h | 5 +
include/linux/proc_ns.h | 32 +-
include/linux/time_namespace.h | 9 +
include/linux/user_namespace.h | 5 +
include/linux/utsname.h | 5 +
include/net/net_namespace.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/nsfs.h | 15 +-
init/main.c | 2 +
ipc/msgutil.c | 1 +
ipc/namespace.c | 12 +-
ipc/shm.c | 2 +
kernel/Makefile | 2 +-
kernel/cgroup/cgroup.c | 2 +
kernel/cgroup/namespace.c | 24 +-
kernel/nstree.c | 233 ++++
kernel/pid_namespace.c | 13 +-
kernel/time/namespace.c | 23 +-
kernel/user_namespace.c | 17 +-
kernel/utsname.c | 28 +-
net/core/net_namespace.c | 59 +-
tools/include/uapi/linux/nsfs.h | 17 +-
tools/testing/selftests/namespaces/.gitignore | 2 +
tools/testing/selftests/namespaces/Makefile | 7 +
tools/testing/selftests/namespaces/config | 7 +
.../selftests/namespaces/file_handle_test.c | 1429 ++++++++++++++++++++
tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
42 files changed, 3257 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4
[ I think at this point everyone is OK with the ABI, and the x86
implementation has been tested so hopefully we are near to being
able to get this merged? If there are any outstanding issues let
me know and I can look at addressing them. The one possible issue
I am aware of is that the RISC-V shadow stack support was briefly
in -next but got dropped along with the general RISC-V issues during
the last merge window, rebasing for that is still in progress. I
guess ideally this could be applied on a branch and then pulled into
the RISC-V tree? ]
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if it's
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v21:
- Rebase onto https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git kernel-6.18.clone3
- Rename shadow_stack_token to shstk_token, since it's a simple rename I've
kept the acks and reviews but I dropped the tested-bys just to be safe.
- Link to v20: https://lore.kernel.org/r/20250902-clone3-shadow-stack-v20-0-4d9fff1c53e7@k…
Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@k…
Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@k…
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 76cea30ad520238160bf8f5e2f2803fcd7a08d22
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
sockmap_redir was introduced to comprehensively test the BPF redirection.
This series strives to increase the tested sockmap/sockhash code coverage;
adds support for skipping the actual redirect part, allowing to simply
SK_DROP or SK_PASS the packet.
BPF_MAP_TYPE_SOCKMAP
BPF_MAP_TYPE_SOCKHASH
x
sk_msg-to-egress
sk_msg-to-ingress
sk_skb-to-egress
sk_skb-to-ingress
x
AF_INET, SOCK_STREAM
AF_INET6, SOCK_STREAM
AF_INET, SOCK_DGRAM
AF_INET6, SOCK_DGRAM
AF_UNIX, SOCK_STREAM
AF_UNIX, SOCK_DGRAM
AF_VSOCK, SOCK_STREAM
AF_VSOCK, SOCK_SEQPACKET
x
SK_REDIRECT
SK_DROP
SK_PASS
Patch 5 ("Support no-redirect SK_DROP/SK_PASS") implements the feature.
Patches 3 ("Rename functions") and 4 ("Let test specify skel's
redirect_type") make preparatory changes.
I also took the opportunity to clean up (Patch 1, "Simplify try_recv()")
and improve a bit (Patch 2, "Fix OOB handling").
$ cd tools/testing/selftests/bpf
$ make
$ sudo ./test_progs -t sockmap_redir
...
Summary: 1/720 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (5):
selftests/bpf: sockmap_redir: Simplify try_recv()
selftests/bpf: sockmap_redir: Fix OOB handling
selftests/bpf: sockmap_redir: Rename functions
selftests/bpf: sockmap_redir: Let test specify skel's redirect_type
selftests/bpf: sockmap_redir: Support no-redirect SK_DROP/SK_PASS
.../selftests/bpf/prog_tests/sockmap_redir.c | 143 +++++++++++++++------
1 file changed, 105 insertions(+), 38 deletions(-)
---
base-commit: e8a6a9d3e8cc539d281e77b9df2439f223ec8153
change-id: 20250523-redir-test-pass-drop-2f2a5edca6e1
Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
[ based on kvm/next ]
Unmapping virtual machine guest memory from the host kernel's direct map is a
successful mitigation against Spectre-style transient execution issues: If the
kernel page tables do not contain entries pointing to guest memory, then any
attempted speculative read through the direct map will necessarily be blocked
by the MMU before any observable microarchitectural side-effects happen. This
means that Spectre-gadgets and similar cannot be used to target virtual machine
memory. Roughly 60% of speculative execution issues fall into this category [1,
Table 1].
This patch series extends guest_memfd with the ability to remove its memory
from the host kernel's direct map, to be able to attain the above protection
for KVM guests running inside guest_memfd.
Additionally, a Firecracker branch with support for these VMs can be found on
GitHub [2].
For more details, please refer to the v5 cover letter [v5]. No
substantial changes in design have taken place since.
=== Changes Since v5 ===
- Fix up error handling for set_direct_map_[in]valid_noflush() (Mike)
- Fix capability check for KVM_GUEST_MEMFD_NO_DIRECT_MAP (Mike)
- Make secretmem_aops static in mm/secretmem.c (Mike)
- Fixup some more comments in gup.c that referred to secretmem
specifically to instead point to AS_NO_DIRECT_MAP (Mike)
- New patch (PATCH 4/11) to avoid ifdeffery in kvm_gmem_free_folio() (Mike)
- vma_is_no_direct_map() -> vma_has_no_direct_map() rename (David)
- Squash some patches (David)
- Fix up const-ness of parameters to new functions in pagemap.h (Fuad)
[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://github.com/firecracker-microvm/firecracker/tree/feature/secret-hidi…
[RFCv1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFCv2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFCv3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/
[v4]: https://lore.kernel.org/kvm/20250221160728.1584559-1-roypat@amazon.co.uk/
[v5]: https://lore.kernel.org/kvm/20250828093902.2719-1-roypat@amazon.co.uk/
Elliot Berman (1):
filemap: Pass address_space mapping to ->free_folio()
Patrick Roy (10):
arch: export set_direct_map_valid_noflush to KVM module
mm: introduce AS_NO_DIRECT_MAP
KVM: guest_memfd: Add stub for kvm_arch_gmem_invalidate
KVM: guest_memfd: Add flag to remove from direct map
KVM: selftests: load elf via bounce buffer
KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd
!= -1
KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
KVM: selftests: cover GUEST_MEMFD_FLAG_NO_DIRECT_MAP in existing
selftests
KVM: selftests: Test guest execution from direct map removed gmem
Documentation/filesystems/locking.rst | 2 +-
Documentation/virt/kvm/api.rst | 5 ++
arch/arm64/include/asm/kvm_host.h | 12 ++++
arch/arm64/mm/pageattr.c | 1 +
arch/loongarch/mm/pageattr.c | 1 +
arch/riscv/mm/pageattr.c | 1 +
arch/s390/mm/pageattr.c | 1 +
arch/x86/mm/pat/set_memory.c | 1 +
fs/nfs/dir.c | 11 ++--
fs/orangefs/inode.c | 3 +-
include/linux/fs.h | 2 +-
include/linux/kvm_host.h | 9 +++
include/linux/pagemap.h | 16 +++++
include/linux/secretmem.h | 18 ------
include/uapi/linux/kvm.h | 2 +
lib/buildid.c | 4 +-
mm/filemap.c | 9 +--
mm/gup.c | 19 ++----
mm/mlock.c | 2 +-
mm/secretmem.c | 11 ++--
mm/vmscan.c | 4 +-
.../testing/selftests/kvm/guest_memfd_test.c | 2 +
.../testing/selftests/kvm/include/kvm_util.h | 37 ++++++++---
.../testing/selftests/kvm/include/test_util.h | 8 +++
tools/testing/selftests/kvm/lib/elf.c | 8 +--
tools/testing/selftests/kvm/lib/io.c | 23 +++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 61 +++++++++++--------
tools/testing/selftests/kvm/lib/test_util.c | 8 +++
tools/testing/selftests/kvm/lib/x86/sev.c | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 1 +
.../selftests/kvm/set_memory_region_test.c | 50 +++++++++++++--
.../kvm/x86/private_mem_conversions_test.c | 7 ++-
virt/kvm/guest_memfd.c | 56 ++++++++++++++---
virt/kvm/kvm_main.c | 5 ++
34 files changed, 288 insertions(+), 113 deletions(-)
base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
--
2.50.1
Add a README file for RISC-V specific kernel selftests under
tools/testing/selftests/riscv/. This mirrors the existing README
for arm64, providing clear guidance on how the tests are architecture
specific and skipped on non-riscv systems. It also includes
standard make commands for building, running and installing the
tests, along with a reference to general kselftest documentation.
Signed-off-by: Bala-Vignesh-Reddy <reddybalavignesh9979(a)gmail.com>
---
tools/testing/selftests/riscv/README | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
create mode 100644 tools/testing/selftests/riscv/README
diff --git a/tools/testing/selftests/riscv/README b/tools/testing/selftests/riscv/README
new file mode 100644
index 000000000000..443da395da68
--- /dev/null
+++ b/tools/testing/selftests/riscv/README
@@ -0,0 +1,24 @@
+KSelfTest RISC-V
+================
+
+- These tests are riscv specific and so not built or run but just skipped
+ completely when env-variable ARCH is found to be different than 'riscv'.
+
+- Holding true the above, RISC-V KSFT tests can be run within the
+ KSelfTest framework using standard Linux top-level-makefile targets:
+
+ $ make TARGETS=riscv kselftest-clean
+ $ make TARGETS=riscv kselftest
+
+ or
+
+ $ make -C tools/testing/selftests TARGETS=riscv \
+ INSTALL_PATH=<your-installation-path> install
+
+ or, alternatively, only specific riscv/ subtargets can be picked:
+
+ $ make -C tools/testing/selftests TARGETS=riscv RISCV_SUBTARGETS="mm vector" \
+ INSTALL_PATH=<your-installation-path> install
+
+ Further details on building and running KSFT can be found in:
+ Documentation/dev-tools/kselftest.rst
--
2.43.0
Currently the UAPI headers are always installed into the source directory.
When building out-of-tree this doesn't work, as the include path will be
wrong and it dirties the source tree, leading to complains by kbuild.
Make sure the 'headers' target installs the UAPI headers in the correctly.
The real target directory can come from multiple places. To handle them all
extract the target directory from KHDR_INCLUDES.
Reported-by: Jason Gunthorpe <jgg(a)nvidia.com>
Closes: https://lore.kernel.org/lkml/20250917153209.GA2023406@nvidia.com/
Fixes: 1a59f5d31569 ("selftests: Add headers target")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
tools/testing/selftests/lib.mk | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 5303900339292e618dee4fd7ff8a7c2fa3209a68..a448fae57831d86098806adaff53f6f1a747febb 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -228,7 +228,10 @@ $(OUTPUT)/%:%.S
$(LINK.S) $^ $(LDLIBS) -o $@
endif
+# Extract the expected header directory
+khdr_output := $(patsubst %/usr/include,%,$(filter %/usr/include,$(KHDR_INCLUDES)))
+
headers:
- $(Q)$(MAKE) -C $(top_srcdir) headers
+ $(Q)$(MAKE) -f $(top_srcdir)/Makefile -C $(khdr_output) headers
.PHONY: run_tests all clean install emit_tests gen_mods_dir clean_mods_dir headers
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250918-kselftest-uapi-out-of-tree-98d50f59040c
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Currently it is not possible to disable streaming mode via ptrace on SME
only systems, the interface for doing this is to write via NT_ARM_SVE but
such writes will be rejected on a system without SVE support. Enable this
functionality by allowing userspace to write SVE_PT_REGS_FPSIMD format data
via NT_ARM_SVE with the vector length set to 0 on SME only systems. Such
writes currently error since we require that a vector length is specified
which should minimise the risk that existing software is relying on current
behaviour.
Reads are not supported since I am not aware of any use case for this and
there is some risk that an existing userspace application may be confused if
it reads NT_ARM_SVE on a system without SVE. Existing kernels will return
FPSIMD formatted register state from NT_ARM_SVE if full SVE state is not
stored, for example if the task has not used SVE. Returning a vector length
of 0 would create a risk that software could try to do things like allocate
space for register state with zero sizes, while returning a vector length of
128 bits would look like SVE is supported. It seems safer to just not make
the changes to add read support.
It remains possible for userspace to detect a SME only system via the ptrace
interface only since reads of NT_ARM_SSVE and NT_ARM_ZA will suceed while
reads of NT_ARM_SVE will fail. Read/write access to the FPSIMD registers in
non-streaming mode is available via REGSET_FPR.
The aim is is to make a minimally invasive change, no operation that would
previously have succeeded will be affected, and we use a previously defined
interface in new circumstances rather than define completely new ABI.
The series starts with some enhancements to sve-ptrace to cover some
further corners of existing behaviours in order to reduce the risk of
inadvertent changes, implements the proposed new ABI, then extends both
sve-ptrace and fp-ptrace to exercise it.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (5):
kselftest/arm64: Verify that we reject out of bounds VLs in sve-ptrace
kselftest/arm64: Check that unsupported regsets fail in sve-ptrace
arm64/sme: Support disabling streaming mode via ptrace on SME only systems
kselftst/arm64: Test NT_ARM_SVE FPSIMD format writes on non-SVE systems
kselftest/arm64: Cover disabling streaming mode without SVE in fp-ptrace
Documentation/arch/arm64/sve.rst | 5 +
arch/arm64/kernel/ptrace.c | 40 ++++++--
tools/testing/selftests/arm64/fp/fp-ptrace.c | 5 +-
tools/testing/selftests/arm64/fp/sve-ptrace.c | 139 +++++++++++++++++++++++++-
4 files changed, 177 insertions(+), 12 deletions(-)
---
base-commit: 768361ab16ce943ef3577cea204dc81aa4a47517
change-id: 20250717-arm64-sme-ptrace-sme-only-1fb850600ea0
prerequisite-change-id: 20250808-arm64-fp-trace-macro-02ede083da51
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
The pre-existing kselftest for TPM2 is derived works of my earlier Python
based rudimentary TPM2 stack called 'tpm2-scripts'.
In order to get more coverage and more mainintainable and extensible test
suite I'd like to eventually rewrite the tests with bash and tpm2sh, which
is a TPM2 cli written with Rust and based on my new TPM2 stack [1] [2].
Given linux-rust work, would it be acceptable to require cargo to install
a runner for kselftest? I'm finishing off now 0.11 version of the tool,
which will take some time (versions before that are honestly quite bad,
don't try them) but after that this would be something I'd like to
put together.
NOTE: while tpm2-protocol itself is Apache/MIT, tpm2sh is GPL3 licensed
command-line program (for what it is worth).
[1] https://github.com/puavo-org/tpm2sh
[2] https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/tpm2-protocol.git/ab…
BR, Jarkko
kunit_kcalloc() may fail. parse_filter_attr_test() uses the returned
pointer without checking it, and then writes to parsed_filters[j],
which can lead to a NULL pointer dereference under low-memory
conditions.
Use KUNIT_ASSERT_NOT_ERR_OR_NULL() to abort the test on allocation
failure, per KUnit guidance for unsafe-to-continue cases.
Fixes: 1c9fd080dffe ("kunit: fix uninitialized variables bug in attributes filtering")
Fixes: 76066f93f1df ("kunit: add tests for filtering attributes")
Signed-off-by: Guangshuo Li <lgs201920130244(a)gmail.com>
---
lib/kunit/executor_test.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/kunit/executor_test.c b/lib/kunit/executor_test.c
index f0090c2729cd..084636ad8a72 100644
--- a/lib/kunit/executor_test.c
+++ b/lib/kunit/executor_test.c
@@ -127,6 +127,10 @@ static void parse_filter_attr_test(struct kunit *test)
parsed_filters = kunit_kcalloc(test, filter_count, sizeof(*parsed_filters),
GFP_KERNEL);
+
+ /* Abort test if allocation failed. */
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, parsed_filters);
+
for (j = 0; j < filter_count; j++) {
parsed_filters[j] = kunit_next_attr_filter(&filter, &err);
KUNIT_ASSERT_EQ_MSG(test, err, 0, "failed to parse filter from '%s'", filters);
--
2.43.0
basic-gcs has it's own make rule to handle the special compiler
invocation to build against nolibc. This rule does not respect the
$(CFLAGS) passed by the Makefile from the parent directory.
However these $(CFLAGS) set up the include path to include the UAPI
headers from the current kernel.
Due to this the asm/hwcap.h header is used from the toolchain instead of
the UAPI and the definition of HWCAP_GCS is not found.
Restructure the rule for basic-gcs to respect the $(CFLAGS).
Also drop those options which are already provided by $(CFLAGS).
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Closes: https://lore.kernel.org/lkml/CA+G9fYv77X+kKz2YT6xw7=9UrrotTbQ6fgNac7oohOg8B…
Fixes: a985fe638344 ("kselftest/arm64/gcs: Use nolibc's getauxval()")
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/arm64/gcs/Makefile | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/arm64/gcs/Makefile b/tools/testing/selftests/arm64/gcs/Makefile
index d2f3497a9103fc12ebc90c7f4e33ab9b846c6c8a..1fbbf0ca1f0291d00882920eb2d1efbf99882ec1 100644
--- a/tools/testing/selftests/arm64/gcs/Makefile
+++ b/tools/testing/selftests/arm64/gcs/Makefile
@@ -14,11 +14,11 @@ LDLIBS+=-lpthread
include ../../lib.mk
$(OUTPUT)/basic-gcs: basic-gcs.c
- $(CC) -g -fno-asynchronous-unwind-tables -fno-ident -s -Os -nostdlib \
- -static -include ../../../../include/nolibc/nolibc.h \
+ $(CC) $(CFLAGS) -fno-asynchronous-unwind-tables -fno-ident -s -nostdlib -nostdinc \
+ -static -I../../../../include/nolibc -include ../../../../include/nolibc/nolibc.h \
-I../../../../../usr/include \
-std=gnu99 -I../.. -g \
- -ffreestanding -Wall $^ -o $@ -lgcc
+ -ffreestanding $^ -o $@ -lgcc
$(OUTPUT)/gcs-stress-thread: gcs-stress-thread.S
$(CC) -nostdlib $^ -o $@
---
base-commit: 14a41628c470f4aa069075cdcf6ec0138b6cf1da
change-id: 20250916-arm64-gcs-nolibc-7d1f03a2a3cf
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
From: Yicong Yang <yangyicong(a)hisilicon.com>
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
Add support for Armv8.7 FEAT_{LS64, LS64_V}:
- Add identifying and enabling in the cpufeature list
- Expose the support of these features to userspace through HWCAP3
and cpuinfo
- Add related hwcap test
- Handle the trap of unsupported memory (normal/uncacheable) access in a VM
A real scenario for this feature is that the userspace driver can make use of
this to implement direct WQE (workqueue entry) - a mechanism to fill WQE
directly into the hardware.
Picked Marc's 2 patches form [1] for handling the LS64 trap in a VM on emulated
MMIO and the introduce of KVM_EXIT_ARM_LDST64B.
[1] https://lore.kernel.org/linux-arm-kernel/20240815125959.2097734-1-maz@kerne…
Tested with updated hwcap test:
[root@localhost tmp]# dmesg | grep "All CPU(s) started"
[ 14.789859] CPU: All CPU(s) started at EL2
[root@localhost tmp]# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64_V
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
root@localhost:/mnt# dmesg | grep "All CPU(s) started"
[ 0.281152] CPU: All CPU(s) started at EL1
root@localhost:/mnt# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
Change since v3:
- Inject DABT fault for LS64 fault on unsupported memory but with valid memslot
Link: https://lore.kernel.org/linux-arm-kernel/20250626080906.64230-1-yangyicong@…
Change since v2:
- Handle the LS64 fault to userspace and allow userspace to inject LS64 fault
- Reorder the patches to make KVM handling prior to feature support
Link: https://lore.kernel.org/linux-arm-kernel/20250331094320.35226-1-yangyicong@…
Change since v1:
- Drop the support for LS64_ACCDATA
- handle the DABT of unsupported memory type after checking the memory attributes
Link: https://lore.kernel.org/linux-arm-kernel/20241202135504.14252-1-yangyicong@…
Marc Zyngier (2):
KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
Yicong Yang (5):
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported
memory
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
arm64: Add support for FEAT_{LS64, LS64_V}
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
kselftest/arm64: Add HWCAP test for FEAT_{LS64, LS64_V}
Documentation/arch/arm64/booting.rst | 12 +++
Documentation/arch/arm64/elf_hwcaps.rst | 6 ++
Documentation/virt/kvm/api.rst | 43 +++++++++--
arch/arm64/include/asm/el2_setup.h | 12 ++-
arch/arm64/include/asm/esr.h | 8 ++
arch/arm64/include/asm/hwcap.h | 2 +
arch/arm64/include/asm/kvm_emulate.h | 7 ++
arch/arm64/include/uapi/asm/hwcap.h | 2 +
arch/arm64/kernel/cpufeature.c | 51 +++++++++++++
arch/arm64/kernel/cpuinfo.c | 2 +
arch/arm64/kvm/inject_fault.c | 29 ++++++++
arch/arm64/kvm/mmio.c | 27 ++++++-
arch/arm64/kvm/mmu.c | 14 +++-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/kvm.h | 3 +-
tools/testing/selftests/arm64/abi/hwcap.c | 90 +++++++++++++++++++++++
16 files changed, 299 insertions(+), 11 deletions(-)
--
2.24.0
Hi all,
This is a second version of a series I sent some time ago, it continues
the work of migrating the script tests into prog_tests.
The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.
Some tests are flaky so they can't be integrated in the CI as they are.
I think that fixing their flakyness would require a significant amount of
work. So, as first step, I've excluded them from the list of tests
migrated to the CI (see PATCH 13). If these tests get fixed at some
point, integrating them into the CI will be straightforward.
PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 5 fix small issues in the current test
PATCH 7 to 12 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 13 isolates some flaky tests
PATCH 14 integrate the non-flaky tests to the test_progs framework
Maciej, I've fixed the bug you found in the initial series. I've
looked for any hardware able to run test_xsk.sh in my office, but I
couldn't find one ... So here again, only the veth part has been tested,
sorry about that.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com
Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: test_xsk: Split xskxceiver
selftests/bpf: test_xsk: Initialize bitmap before use
selftests/bpf: test_xsk: Fix memory leaks
selftests/bpf: test_xsk: Wrap test clean-up in functions
selftests/bpf: test_xsk: Release resources when swap fails
selftests/bpf: test_xsk: Add return value to init_iface()
selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
selftests/bpf: test_xsk: Don't exit immediately when workers fail
selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
selftests/bpf: test_xsk: Don't exit immediately on allocation failures
selftests/bpf: test_xsk: Move exit_with_error to xskxceiver.c
selftests/bpf: test_xsk: Isolate flaky tests
selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
tools/testing/selftests/bpf/Makefile | 11 +-
tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2604 ++++++++++++++++++++
tools/testing/selftests/bpf/prog_tests/test_xsk.h | 294 +++
tools/testing/selftests/bpf/prog_tests/xsk.c | 146 ++
tools/testing/selftests/bpf/xskxceiver.c | 2685 +--------------------
tools/testing/selftests/bpf/xskxceiver.h | 156 --
6 files changed, 3170 insertions(+), 2726 deletions(-)
---
base-commit: e4e08c130231eb8071153ab5f056874d8f70430b
change-id: 20250218-xsk-0cf90e975d14
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v19 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28, and it
will be RFC9768.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v19 (16-Sep-2025)
- Repost remaining 10 patches in this series, as the first 4 patches are applied (Jakub Kicinski <kuba(a)kernel.org>)
v18 (11-Sep-2025)
- Reorder tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to avoid adding fields in the middle of tcp_info (Eric Dumazet <edumazet(a)google.com>)
v17 (8-Sep-2025)
- Change tcp_ecn_mode_max from 5 to 2 to disable AccECN enablement before the whole AccECN feature been accpeted
v16 (6-Sep-2025)
- Use TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_ECN in comments of tcp_ecn_send_syn() (Eric Dumazet <edumazet(a)google.com>)
- Add tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to make tcp_info be multiple of 64 bits in patch #12
v15 (14-Aug-205)
- Update pahole results in commit messages
- Accurate ECN will become RFC9768
v14 (22-Jul-2025)
- Add missing const for struct tcp_sock of tcp_accecn_option_beacon_check() of #11 (Simon Horman <horms(a)kernel.org>)
v13 (18-Jul-2025)
- Implement tcp_accecn_extract_syn_ect() and tcp_accecn_reflector_flags() with static array lookup of patch #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Fix typos in comments of #6 and remove patch #7 of v12 about simulatenous connect (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_ACCECN_E1B_INIT_OFFSET, TCP_ACCECN_E0B_INIT_OFFSET, and TCP_ACCECN_CEB_INIT_OFFSET from patch #7 to #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use static array lookup in tcp_accecn_optfield_to_ecnfield() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return false when WARN_ON_ONCE() is true in tcp_accecn_process_option() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Make synack_ecn_bytes as static const array and use const u32 pointer in tcp_options_write() of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use ALIGN() and ALIGN_DOWN() in tcp_options_fit_accecn() to pad TCP AccECN option to dword of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return TCP_ACCECN_OPT_FAIL_SEEN if WARN_ON_ONCE() is true in tcp_accecn_option_init() of #12 (Paolo Abeni <pabeni(a)redhat.com>)
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (3):
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (7):
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 12 +
include/linux/tcp.h | 28 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 33 ++
include/net/tcp_ecn.h | 554 +++++++++++++++++-
include/uapi/linux/tcp.h | 9 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 318 +++++++++-
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 239 +++++++-
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1278 insertions(+), 76 deletions(-)
--
2.34.1
From: Lance Yang <lance.yang(a)linux.dev>
The madv_populate and soft-dirty kselftests currently fail on systems where
CONFIG_MEM_SOFT_DIRTY is disabled.
Introduce a new helper softdirty_supported() into vm_util.c/h to ensure
tests are properly skipped when the feature is not enabled.
Acked-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
---
v2 -> v3:
- Optimize softdirty_supported() by directly assigning check_vmflag()
result (per David)
- Pick AB from David - thanks!
- https://lore.kernel.org/lkml/20250917122750.36608-1-lance.yang@linux.dev
v1 -> v2:
- Rename softdirty_is_supported() to softdirty_supported() (per David)
- Drop aarch64 specific handling (per David)
- https://lore.kernel.org/lkml/20250917055913.49759-1-lance.yang@linux.dev
tools/testing/selftests/mm/madv_populate.c | 21 ++-------------------
tools/testing/selftests/mm/soft-dirty.c | 5 ++++-
tools/testing/selftests/mm/vm_util.c | 17 +++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index b6fabd5c27ed..d8d11bc67ddc 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -264,23 +264,6 @@ static void test_softdirty(void)
munmap(addr, SIZE);
}
-static int system_has_softdirty(void)
-{
- /*
- * There is no way to check if the kernel supports soft-dirty, other
- * than by writing to a page and seeing if the bit was set. But the
- * tests are intended to check that the bit gets set when it should, so
- * doing that check would turn a potentially legitimate fail into a
- * skip. Fortunately, we know for sure that arm64 does not support
- * soft-dirty. So for now, let's just use the arch as a corse guide.
- */
-#if defined(__aarch64__)
- return 0;
-#else
- return 1;
-#endif
-}
-
int main(int argc, char **argv)
{
int nr_tests = 16;
@@ -288,7 +271,7 @@ int main(int argc, char **argv)
pagesize = getpagesize();
- if (system_has_softdirty())
+ if (softdirty_supported())
nr_tests += 5;
ksft_print_header();
@@ -300,7 +283,7 @@ int main(int argc, char **argv)
test_holes();
test_populate_read();
test_populate_write();
- if (system_has_softdirty())
+ if (softdirty_supported())
test_softdirty();
err = ksft_get_fail_cnt();
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 8a3f2b4b2186..4ee4db3750c1 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -200,8 +200,11 @@ int main(int argc, char **argv)
int pagesize;
ksft_print_header();
- ksft_set_plan(15);
+ if (!softdirty_supported())
+ ksft_exit_skip("soft-dirty is not support\n");
+
+ ksft_set_plan(15);
pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
if (pagemap_fd < 0)
ksft_exit_fail_msg("Failed to open %s\n", PAGEMAP_FILE_PATH);
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 56e9bd541edd..e33cda301dad 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -449,6 +449,23 @@ bool check_vmflag_pfnmap(void *addr)
return check_vmflag(addr, "pf");
}
+bool softdirty_supported(void)
+{
+ char *addr;
+ bool supported = false;
+ const size_t pagesize = getpagesize();
+
+ /* New mappings are expected to be marked with VM_SOFTDIRTY (sd). */
+ addr = mmap(0, pagesize, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (!addr)
+ ksft_exit_fail_msg("mmap failed\n");
+
+ supported = check_vmflag(addr, "sd");
+ munmap(addr, pagesize);
+ return supported;
+}
+
/*
* Open an fd at /proc/$pid/maps and configure procmap_out ready for
* PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 07c4acfd84b6..26c30fdc0241 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -104,6 +104,7 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address);
int close_procmap(struct procmap_fd *procmap);
int write_sysfs(const char *file_path, unsigned long val);
int read_sysfs(const char *file_path, unsigned long *val);
+bool softdirty_supported(void);
static inline int open_self_procmap(struct procmap_fd *procmap_out)
{
--
2.49.0
This patch simplifies kublk's implementation of the feature list
command, fixes a bug where a feature was missing, and adds a test to
ensure that similar bugs do not happen in the future.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Uday Shankar (3):
selftests: ublk: kublk: simplify feat_map definition
selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map
selftests: ublk: add test to verify that feat_map is complete
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 32 +++++++++++++------------
tools/testing/selftests/ublk/test_generic_13.sh | 16 +++++++++++++
3 files changed, 34 insertions(+), 15 deletions(-)
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_features-07af4e321e5a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder(a)us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv(a)jvosburgh.net>
Tested-by: David Wilder <wilder(a)us.ibm.com>
Acked-by: Jay Vosburgh <jv(a)jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v4: rebase to latest net
v3: no update
v2: split the patch into 2 parts, the kernel change and test update (Jay Vosburgh)
---
drivers/net/bonding/bond_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8832bc9f107b..57be04f6cb11 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3356,7 +3356,6 @@ static void bond_ns_send_all(struct bonding *bond, struct slave *slave)
/* Find out through which dev should the packet go */
memset(&fl6, 0, sizeof(struct flowi6));
fl6.daddr = targets[i];
- fl6.flowi6_oif = bond->dev->ifindex;
dst = ip6_route_output(dev_net(bond->dev), NULL, &fl6);
if (dst->error) {
--
2.50.1
This patch series refactors all futex selftests to use
kselftest_harness.h instead of futex's logging.h, as discussed here [1].
This allows to remove a lot of boilerplate code and to simplify some
parts of the test logic, mainly when the test needs to exit early. The
result of this is more than 500 lines removed from
tools/testing/selftests/futex/. Also, this enables new tests to use
kselftest.h features like ASSERT_s and such.
There are some caveats around this refactor:
- logging.h had verbosity levels, while kselftest_harness.h doesn't. I
created a new print function called ksft_print_dbg_msg() that prints
the message if the user uses the -d flag, so now there's an
equivalent of this feature.
- futex_requeue_pi test accepted command line arguments to be used as
test parameters (e.g. ./futex_requeue_pi -b -l -t 500000). This
doesn't work with kselftest_harness.h because there's no
straightforward way to send command line arguments to the test.
I used FIXTURE_VARIANT() to achieve the same result, but now the
parameters live inside of the test file, instead of on
functional/run.sh. This increased a little bit the number of test
cases for futex_requeue_pi, from 22 to 24.
Although there are a lot of patches, they follow a simple pattern:
- The usage() function, options parsing, kseftest setup
(ksft_print_header(), ksft_set_plan(), ksft_print_cnts()), are all
gone
- info() is replaced with ksft_print_dbg_msg()
- Exit on error paths are replaced with ksft_exit_fail_msg()
- fail/pass replaced with their ksft_ equivalents.
I have compared the results of run.sh before and after this patchset and
didn't find any regression from the test results.
Thanks,
André
[1] https://lore.kernel.org/lkml/87ecv6p364.ffs@tglx/
---
Changes in v3:
- Rebased on top of tip/locking/futex
- Link to v2: https://lore.kernel.org/r/20250720-tonyk-robust_test_cleanup-v2-0-1f9bcb5b7…
Changes in v2:
- Rebased on top of tip/master
- Dropped priv_hash global test variant now that this feature was
dropped
- Added include <stdbool.h> in the first patch
- Link to v1: https://lore.kernel.org/r/20250704-tonyk-robust_test_cleanup-v1-0-c0ff4f24c…
---
André Almeida (15):
selftests: kselftest: Create ksft_print_dbg_msg()
selftests/futex: Refactor futex_requeue_pi with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_mismatched_ops with kselftest_harness.h
selftests/futex: Refactor futex_requeue_pi_signal_restart with kselftest_harness.h
selftests/futex: Refactor futex_wait_timeout with kselftest_harness.h
selftests/futex: Refactor futex_wait_wouldblock with kselftest_harness.h
selftests/futex: Refactor futex_wait_unitialized_heap with kselftest_harness.h
selftests/futex: Refactor futex_wait_private_mapped_file with kselftest_harness.h
selftests/futex: Refactor futex_wait with kselftest_harness.h
selftests/futex: Refactor futex_requeue with kselftest_harness.h
selftests/futex: Refactor futex_waitv with kselftest_harness.h
selftests/futex: Refactor futex_priv_hash with kselftest_harness.h
selftests/futex: Refactor futex_numa_mpol with kselftest_harness.h
selftests/futex: Drop logging.h include from futex_numa
selftests/futex: Remove logging.h file
tools/testing/selftests/futex/functional/Makefile | 3 +-
.../selftests/futex/functional/futex_numa.c | 3 +-
.../selftests/futex/functional/futex_numa_mpol.c | 39 +--
.../selftests/futex/functional/futex_priv_hash.c | 48 +---
.../selftests/futex/functional/futex_requeue.c | 76 ++----
.../selftests/futex/functional/futex_requeue_pi.c | 261 ++++++++++-----------
.../functional/futex_requeue_pi_mismatched_ops.c | 86 ++-----
.../functional/futex_requeue_pi_signal_restart.c | 129 +++-------
.../selftests/futex/functional/futex_wait.c | 103 +++-----
.../functional/futex_wait_private_mapped_file.c | 83 ++-----
.../futex/functional/futex_wait_timeout.c | 139 +++++------
.../functional/futex_wait_uninitialized_heap.c | 76 ++----
.../futex/functional/futex_wait_wouldblock.c | 75 ++----
.../selftests/futex/functional/futex_waitv.c | 98 ++++----
tools/testing/selftests/futex/functional/run.sh | 61 +----
tools/testing/selftests/futex/include/logging.h | 148 ------------
tools/testing/selftests/kselftest.h | 14 ++
tools/testing/selftests/kselftest_harness.h | 13 +-
18 files changed, 456 insertions(+), 999 deletions(-)
---
base-commit: ed323aeda5e09fa1ab95946673939c8c425c329c
change-id: 20250703-tonyk-robust_test_cleanup-d1f3406365d9
Best regards,
--
André Almeida <andrealmeid(a)igalia.com>
Fix a memory leak in netpoll and introduce netconsole selftests that
expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent
users (netconsole + bonding), which simulates the scenario from test[1]
that originally demonstrated the issue allegedly fixed by commit
efa95b01da18 ("netpoll: fix use after free") - a commit that is now
being reverted.
Sending this to "net" branch because this is a fix, and the selftest
might help with the backports validation.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.14048… [1]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v4:
- Added an additional selftest to test multiple netpoll users in
parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@deb…
Changes in v3:
- This patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (4):
net: netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
selftest: netcons: add test for netconsole over bonded interfaces
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 2 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 197 ++++++++++++++++++---
.../selftests/drivers/net/netcons_over_bonding.sh | 76 ++++++++
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++
5 files changed, 384 insertions(+), 25 deletions(-)
---
base-commit: 5e87fdc37f8dc619549d49ba5c951b369ce7c136
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
For testing the functionality of the vDSO, it is necessary to build
userspace programs for multiple different architectures.
It is additional work to acquire matching userspace cross-compilers with
full C libraries and then building root images out of those.
The kernel tree already contains nolibc, a small, header-only C library.
By using it, it is possible to build userspace programs without any
additional dependencies.
For example the kernel.org crosstools or multi-target clang can be used
to build test programs for a multitude of architectures.
While nolibc is very limited, it is enough for many selftests.
With some minor adjustments it is possible to make parse_vdso.c
compatible with nolibc.
As an example, vdso_standalone_test_x86 is now built from the same C
code as the regular vdso_test_gettimeofday, while still being completely
standalone.
Also drop the dependency of parse_vdso.c on the elf.h header from libc and only
use the one from the kernel's UAPI.
While this series is useful on its own now, it will also integrate with the
kunit UAPI framework currently under development:
https://lore.kernel.org/lkml/20250217-kunit-kselftests-v1-0-42b4524c3b0a@li…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Provide a limits.h header in nolibc
- Pick up Reviewed-by tags from Kees
- Link to v1: https://lore.kernel.org/r/20250203-parse_vdso-nolibc-v1-0-9cb6268d77be@linu…
---
Thomas Weißschuh (16):
MAINTAINERS: Add vDSO selftests
elf, uapi: Add definition for STN_UNDEF
elf, uapi: Add definition for DT_GNU_HASH
elf, uapi: Add definitions for VER_FLG_BASE and VER_FLG_WEAK
elf, uapi: Add type ElfXX_Versym
elf, uapi: Add types ElfXX_Verdef and ElfXX_Veraux
tools/include: Add uapi/linux/elf.h
selftests: Add headers target
tools/nolibc: add limits.h shim header
selftests: vDSO: vdso_standalone_test_x86: Use vdso_init_form_sysinfo_ehdr
selftests: vDSO: parse_vdso: Drop vdso_init_from_auxv()
selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers
selftests: vDSO: parse_vdso: Test __SIZEOF_LONG__ instead of ULONG_MAX
selftests: vDSO: vdso_test_gettimeofday: Clean up includes
selftests: vDSO: vdso_test_gettimeofday: Make compatible with nolibc
selftests: vDSO: vdso_standalone_test_x86: Switch to nolibc
MAINTAINERS | 1 +
include/uapi/linux/elf.h | 38 ++
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/limits.h | 7 +
tools/include/uapi/linux/elf.h | 524 +++++++++++++++++++++
tools/testing/selftests/lib.mk | 5 +-
tools/testing/selftests/vDSO/Makefile | 11 +-
tools/testing/selftests/vDSO/parse_vdso.c | 19 +-
tools/testing/selftests/vDSO/parse_vdso.h | 1 -
.../selftests/vDSO/vdso_standalone_test_x86.c | 143 +-----
.../selftests/vDSO/vdso_test_gettimeofday.c | 4 +-
11 files changed, 590 insertions(+), 164 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20241017-parse_vdso-nolibc-e069baa7ff48
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
From: Lance Yang <lance.yang(a)linux.dev>
The madv_populate and soft-dirty kselftests currently fail on systems where
CONFIG_MEM_SOFT_DIRTY is disabled.
Introduce a new helper softdirty_supported() into vm_util.c/h to ensure
tests are properly skipped when the feature is not enabled.
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
---
v1 -> v2:
- Rename softdirty_is_supported() to softdirty_supported() (per David)
- Drop aarch64 specific handling (per David)
- https://lore.kernel.org/lkml/20250917055913.49759-1-lance.yang@linux.dev
tools/testing/selftests/mm/madv_populate.c | 21 ++-------------------
tools/testing/selftests/mm/soft-dirty.c | 5 ++++-
tools/testing/selftests/mm/vm_util.c | 19 +++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index b6fabd5c27ed..d8d11bc67ddc 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -264,23 +264,6 @@ static void test_softdirty(void)
munmap(addr, SIZE);
}
-static int system_has_softdirty(void)
-{
- /*
- * There is no way to check if the kernel supports soft-dirty, other
- * than by writing to a page and seeing if the bit was set. But the
- * tests are intended to check that the bit gets set when it should, so
- * doing that check would turn a potentially legitimate fail into a
- * skip. Fortunately, we know for sure that arm64 does not support
- * soft-dirty. So for now, let's just use the arch as a corse guide.
- */
-#if defined(__aarch64__)
- return 0;
-#else
- return 1;
-#endif
-}
-
int main(int argc, char **argv)
{
int nr_tests = 16;
@@ -288,7 +271,7 @@ int main(int argc, char **argv)
pagesize = getpagesize();
- if (system_has_softdirty())
+ if (softdirty_supported())
nr_tests += 5;
ksft_print_header();
@@ -300,7 +283,7 @@ int main(int argc, char **argv)
test_holes();
test_populate_read();
test_populate_write();
- if (system_has_softdirty())
+ if (softdirty_supported())
test_softdirty();
err = ksft_get_fail_cnt();
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 8a3f2b4b2186..4ee4db3750c1 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -200,8 +200,11 @@ int main(int argc, char **argv)
int pagesize;
ksft_print_header();
- ksft_set_plan(15);
+ if (!softdirty_supported())
+ ksft_exit_skip("soft-dirty is not support\n");
+
+ ksft_set_plan(15);
pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
if (pagemap_fd < 0)
ksft_exit_fail_msg("Failed to open %s\n", PAGEMAP_FILE_PATH);
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 56e9bd541edd..ac41d10454a5 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -449,6 +449,25 @@ bool check_vmflag_pfnmap(void *addr)
return check_vmflag(addr, "pf");
}
+bool softdirty_supported(void)
+{
+ char *addr;
+ bool supported = false;
+ const size_t pagesize = getpagesize();
+
+ /* New mappings are expected to be marked with VM_SOFTDIRTY (sd). */
+ addr = mmap(0, pagesize, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (!addr)
+ ksft_exit_fail_msg("mmap failed\n");
+
+ if (check_vmflag(addr, "sd"))
+ supported = true;
+
+ munmap(addr, pagesize);
+ return supported;
+}
+
/*
* Open an fd at /proc/$pid/maps and configure procmap_out ready for
* PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 07c4acfd84b6..26c30fdc0241 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -104,6 +104,7 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address);
int close_procmap(struct procmap_fd *procmap);
int write_sysfs(const char *file_path, unsigned long val);
int read_sysfs(const char *file_path, unsigned long *val);
+bool softdirty_supported(void);
static inline int open_self_procmap(struct procmap_fd *procmap_out)
{
--
2.49.0
From: Lance Yang <lance.yang(a)linux.dev>
The madv_populate and soft-dirty kselftests currently fail on systems where
CONFIG_MEM_SOFT_DIRTY is disabled.
Introduce a new helper softdirty_is_supported() into vm_util.c/h to ensure
tests are properly skipped when the feature is not enabled.
Suggested-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
---
tools/testing/selftests/mm/madv_populate.c | 21 ++--------------
tools/testing/selftests/mm/soft-dirty.c | 5 +++-
tools/testing/selftests/mm/vm_util.c | 28 ++++++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 35 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index b6fabd5c27ed..43dac7783004 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -264,23 +264,6 @@ static void test_softdirty(void)
munmap(addr, SIZE);
}
-static int system_has_softdirty(void)
-{
- /*
- * There is no way to check if the kernel supports soft-dirty, other
- * than by writing to a page and seeing if the bit was set. But the
- * tests are intended to check that the bit gets set when it should, so
- * doing that check would turn a potentially legitimate fail into a
- * skip. Fortunately, we know for sure that arm64 does not support
- * soft-dirty. So for now, let's just use the arch as a corse guide.
- */
-#if defined(__aarch64__)
- return 0;
-#else
- return 1;
-#endif
-}
-
int main(int argc, char **argv)
{
int nr_tests = 16;
@@ -288,7 +271,7 @@ int main(int argc, char **argv)
pagesize = getpagesize();
- if (system_has_softdirty())
+ if (softdirty_is_supported())
nr_tests += 5;
ksft_print_header();
@@ -300,7 +283,7 @@ int main(int argc, char **argv)
test_holes();
test_populate_read();
test_populate_write();
- if (system_has_softdirty())
+ if (softdirty_is_supported())
test_softdirty();
err = ksft_get_fail_cnt();
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 8a3f2b4b2186..98e42d2ac32a 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -200,8 +200,11 @@ int main(int argc, char **argv)
int pagesize;
ksft_print_header();
- ksft_set_plan(15);
+ if (!softdirty_is_supported())
+ ksft_exit_skip("soft-dirty is not support\n");
+
+ ksft_set_plan(15);
pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
if (pagemap_fd < 0)
ksft_exit_fail_msg("Failed to open %s\n", PAGEMAP_FILE_PATH);
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 56e9bd541edd..3173335df775 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -449,6 +449,34 @@ bool check_vmflag_pfnmap(void *addr)
return check_vmflag(addr, "pf");
}
+bool softdirty_is_supported(void)
+{
+ char *addr;
+ int ret = 0;
+ size_t pagesize;
+
+ /* We know for sure that arm64 does not support soft-dirty. */
+#if defined(__aarch64__)
+ return ret;
+#endif
+ pagesize = getpagesize();
+ /*
+ * __mmap_complete() always sets VM_SOFTDIRTY for new VMAs, so we
+ * just mmap a small region and check its VmFlags in /proc/self/smaps
+ * for the "sd" flag.
+ */
+ addr = mmap(0, pagesize, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (!addr)
+ ksft_exit_fail_msg("mmap failed\n");
+
+ if (check_vmflag(addr, "sd"))
+ ret = 1;
+
+ munmap(addr, pagesize);
+ return ret;
+}
+
/*
* Open an fd at /proc/$pid/maps and configure procmap_out ready for
* PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 07c4acfd84b6..87ad8e0d92c0 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -104,6 +104,7 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address);
int close_procmap(struct procmap_fd *procmap);
int write_sysfs(const char *file_path, unsigned long val);
int read_sysfs(const char *file_path, unsigned long *val);
+bool softdirty_is_supported(void);
static inline int open_self_procmap(struct procmap_fd *procmap_out)
{
--
2.49.0
Arnd sent the v1 of the series in July, and it was bogus. So with a
little help from claude-sonnet I built up the missing ioctls tests and
tried to figure out a way to apply Arnd's logic without breaking the
existing ioctls.
The end result is in patch 3/3, which makes use of subfunctions to keep
the main ioctl code path clean.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v3:
- dropped the co-developed-by tag and put a blurb instead
- change the attribution of patch 3/3 to me as requested by Arnd.
- Link to v2: https://lore.kernel.org/r/20250826-b4-hidraw-ioctls-v2-0-c7726b236719@kerne…
changes in v2:
- add new hidraw ioctls tests
- refactor Arnd's patch to keep the existing error path logic
- link to v1: https://lore.kernel.org/linux-input/20250711072847.2836962-1-arnd@kernel.or…
---
Benjamin Tissoires (3):
selftests/hid: hidraw: add more coverage for hidraw ioctls
selftests/hid: hidraw: forge wrong ioctls and tests them
HID: hidraw: tighten ioctl command parsing
drivers/hid/hidraw.c | 224 ++++++++-------
include/uapi/linux/hidraw.h | 2 +
tools/testing/selftests/hid/hid_common.h | 6 +
tools/testing/selftests/hid/hidraw.c | 473 +++++++++++++++++++++++++++++++
4 files changed, 603 insertions(+), 102 deletions(-)
---
base-commit: 02d6eeedbc36d4b309d5518778071a749ef79c4e
change-id: 20250825-b4-hidraw-ioctls-66f34297032a
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
This commit is a rewrite almost from scratch of vmtest.sh.
By relying on virtme-ng, we get rid of boot2container, reducing the
total bootup time (and network requirements). That means that we are
relying on the programs being installed on the host, but that shouldn't
be an issue. The generation of the kconfig is also now handled by
virtme-ng, so that's one less thing to worry.
I used tools/testing/selftests/vsock/vmtest.sh as a base and modified it
to look mostly like my previous script:
- removed the custom ssh handling
- make use of vng for compiling, which allows to bring remote
compilation (and potentially remote compilation on a remote container)
- change the verbosity logic by having 2 levels:
- first one shows the tests outputs
- second level also shows the VM logs
- instead of only running the compiled kernel when it is built, if we
are in the kernel tree, use the kernel artifacts there (and complain
if they are not built)
- adapted the tests list to match the HID subsystem tests
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
I have switched my workflow to make use of virtme-ng for a few months.
Now it's time to automate the manual commands I've been running in
vmtest.sh.
---
tools/testing/selftests/hid/vmtest.sh | 668 +++++++++++++++++++++-------------
1 file changed, 423 insertions(+), 245 deletions(-)
diff --git a/tools/testing/selftests/hid/vmtest.sh b/tools/testing/selftests/hid/vmtest.sh
index db534e9099a8a4684346eed0067d397ffa6f80cf..ecbd57f775a044b4d076b4800ca0068f9533056c 100755
--- a/tools/testing/selftests/hid/vmtest.sh
+++ b/tools/testing/selftests/hid/vmtest.sh
@@ -1,296 +1,474 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Red Hat
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+# * virtme-ng
+# * busybox-static (used by virtme-ng)
+# * qemu (used by virtme-ng)
+
+readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../)
+
+source "${SCRIPT_DIR}"/../kselftest/ktap_helpers.sh
+
+readonly HID_BPF_TEST="${SCRIPT_DIR}"/hid_bpf
+readonly HIDRAW_TEST="${SCRIPT_DIR}"/hidraw
+readonly HID_BPF_PROGS="${KERNEL_CHECKOUT}/drivers/hid/bpf/progs"
+readonly SSH_GUEST_PORT=22
+readonly WAIT_PERIOD=3
+readonly WAIT_PERIOD_MAX=60
+readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX ))
+readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_hid_vmtest_XXXX.pid)
+
+readonly QEMU_OPTS="\
+ --pidfile ${QEMU_PIDFILE} \
+"
+readonly KERNEL_CMDLINE=""
+readonly LOG=$(mktemp /tmp/hid_vmtest_XXXX.log)
+readonly TEST_NAMES=(vm_hid_bpf vm_hidraw vm_pytest)
+readonly TEST_DESCS=(
+ "Run hid_bpf tests in the VM."
+ "Run hidraw tests in the VM."
+ "Run the hid-tools test-suite in the VM."
+)
+
+VERBOSE=0
+SHELL_MODE=0
+BUILD_HOST=""
+BUILD_HOST_PODMAN_CONTAINER_NAME=""
+
+usage() {
+ local name
+ local desc
+ local i
+
+ echo
+ echo "$0 [OPTIONS] [TEST]... [-- tests-args]"
+ echo "If no TEST argument is given, all tests will be run."
+ echo
+ echo "Options"
+ echo " -b: build the kernel from the current source tree and use it for guest VMs"
+ echo " -H: hostname for remote build host (used with -b)"
+ echo " -p: podman container name for remote build host (used with -b)"
+ echo " Example: -H beefyserver -p vng"
+ echo " -q: set the path to or name of qemu binary"
+ echo " -s: start a shell in the VM instead of running tests"
+ echo " -v: more verbose output (can be repeated multiple times)"
+ echo
+ echo "Available tests"
+
+ for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do
+ name=${TEST_NAMES[${i}]}
+ desc=${TEST_DESCS[${i}]}
+ printf "\t%-35s%-35s\n" "${name}" "${desc}"
+ done
+ echo
-set -u
-set -e
-
-# This script currently only works for x86_64
-ARCH="$(uname -m)"
-case "${ARCH}" in
-x86_64)
- QEMU_BINARY=qemu-system-x86_64
- BZIMAGE="arch/x86/boot/bzImage"
- ;;
-*)
- echo "Unsupported architecture"
exit 1
- ;;
-esac
-SCRIPT_DIR="$(dirname $(realpath $0))"
-OUTPUT_DIR="$SCRIPT_DIR/results"
-KCONFIG_REL_PATHS=("${SCRIPT_DIR}/config" "${SCRIPT_DIR}/config.common" "${SCRIPT_DIR}/config.${ARCH}")
-B2C_URL="https://gitlab.freedesktop.org/gfx-ci/boot2container/-/raw/main/vm2c.py"
-NUM_COMPILE_JOBS="$(nproc)"
-LOG_FILE_BASE="$(date +"hid_selftests.%Y-%m-%d_%H-%M-%S")"
-LOG_FILE="${LOG_FILE_BASE}.log"
-EXIT_STATUS_FILE="${LOG_FILE_BASE}.exit_status"
-CONTAINER_IMAGE="registry.freedesktop.org/bentiss/hid/fedora/39:2023-11-22.1"
-
-TARGETS="${TARGETS:=$(basename ${SCRIPT_DIR})}"
-DEFAULT_COMMAND="pip3 install hid-tools; make -C tools/testing/selftests TARGETS=${TARGETS} run_tests"
-
-usage()
-{
- cat <<EOF
-Usage: $0 [-j N] [-s] [-b] [-d <output_dir>] -- [<command>]
-
-<command> is the command you would normally run when you are in
-the source kernel direcory. e.g:
-
- $0 -- ./tools/testing/selftests/hid/hid_bpf
-
-If no command is specified and a debug shell (-s) is not requested,
-"${DEFAULT_COMMAND}" will be run by default.
-
-If you build your kernel using KBUILD_OUTPUT= or O= options, these
-can be passed as environment variables to the script:
-
- O=<kernel_build_path> $0 -- ./tools/testing/selftests/hid/hid_bpf
-
-or
-
- KBUILD_OUTPUT=<kernel_build_path> $0 -- ./tools/testing/selftests/hid/hid_bpf
-
-Options:
-
- -u) Update the boot2container script to a newer version.
- -d) Update the output directory (default: ${OUTPUT_DIR})
- -b) Run only the build steps for the kernel and the selftests
- -j) Number of jobs for compilation, similar to -j in make
- (default: ${NUM_COMPILE_JOBS})
- -s) Instead of powering off the VM, start an interactive
- shell. If <command> is specified, the shell runs after
- the command finishes executing
-EOF
}
-download()
-{
- local file="$1"
+die() {
+ echo "$*" >&2
+ exit "${KSFT_FAIL}"
+}
- echo "Downloading $file..." >&2
- curl -Lsf "$file" -o "${@:2}"
+vm_ssh() {
+ # vng --ssh-client keeps shouting "Warning: Permanently added 'virtme-ng%22'
+ # (ED25519) to the list of known hosts.",
+ # So replace the command with what's actually called and add the "-q" option
+ stdbuf -oL ssh -q \
+ -F ${HOME}/.cache/virtme-ng/.ssh/virtme-ng-ssh.conf \
+ -l root virtme-ng%${SSH_GUEST_PORT} \
+ "$@"
+ return $?
}
-recompile_kernel()
-{
- local kernel_checkout="$1"
- local make_command="$2"
+cleanup() {
+ if [[ -s "${QEMU_PIDFILE}" ]]; then
+ pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
+ fi
- cd "${kernel_checkout}"
+ # If failure occurred during or before qemu start up, then we need
+ # to clean this up ourselves.
+ if [[ -e "${QEMU_PIDFILE}" ]]; then
+ rm "${QEMU_PIDFILE}"
+ fi
+}
+
+check_args() {
+ local found
- ${make_command} olddefconfig
- ${make_command} headers
- ${make_command}
+ for arg in "$@"; do
+ found=0
+ for name in "${TEST_NAMES[@]}"; do
+ if [[ "${name}" = "${arg}" ]]; then
+ found=1
+ break
+ fi
+ done
+
+ if [[ "${found}" -eq 0 ]]; then
+ echo "${arg} is not an available test" >&2
+ usage
+ fi
+ done
+
+ for arg in "$@"; do
+ if ! command -v > /dev/null "test_${arg}"; then
+ echo "Test ${arg} not found" >&2
+ usage
+ fi
+ done
+}
+
+check_deps() {
+ for dep in vng ${QEMU} busybox pkill ssh pytest; do
+ if [[ ! -x $(command -v "${dep}") ]]; then
+ echo -e "skip: dependency ${dep} not found!\n"
+ exit "${KSFT_SKIP}"
+ fi
+ done
+
+ if [[ ! -x $(command -v "${HID_BPF_TEST}") ]]; then
+ printf "skip: %s not found!" "${HID_BPF_TEST}"
+ printf " Please build the kselftest hid_bpf target.\n"
+ exit "${KSFT_SKIP}"
+ fi
+
+ if [[ ! -x $(command -v "${HIDRAW_TEST}") ]]; then
+ printf "skip: %s not found!" "${HIDRAW_TEST}"
+ printf " Please build the kselftest hidraw target.\n"
+ exit "${KSFT_SKIP}"
+ fi
}
-update_selftests()
-{
- local kernel_checkout="$1"
- local selftests_dir="${kernel_checkout}/tools/testing/selftests/hid"
+check_vng() {
+ local tested_versions
+ local version
+ local ok
- cd "${selftests_dir}"
- ${make_command}
+ tested_versions=("1.36" "1.37")
+ version="$(vng --version)"
+
+ ok=0
+ for tv in "${tested_versions[@]}"; do
+ if [[ "${version}" == *"${tv}"* ]]; then
+ ok=1
+ break
+ fi
+ done
+
+ if [[ ! "${ok}" -eq 1 ]]; then
+ printf "warning: vng version '%s' has not been tested and may " "${version}" >&2
+ printf "not function properly.\n\tThe following versions have been tested: " >&2
+ echo "${tested_versions[@]}" >&2
+ fi
}
-run_vm()
-{
- local run_dir="$1"
- local b2c="$2"
- local kernel_bzimage="$3"
- local command="$4"
- local post_command=""
-
- cd "${run_dir}"
-
- if ! which "${QEMU_BINARY}" &> /dev/null; then
- cat <<EOF
-Could not find ${QEMU_BINARY}
-Please install qemu or set the QEMU_BINARY environment variable.
-EOF
+handle_build() {
+ if [[ ! "${BUILD}" -eq 1 ]]; then
+ return
+ fi
+
+ if [[ ! -d "${KERNEL_CHECKOUT}" ]]; then
+ echo "-b requires vmtest.sh called from the kernel source tree" >&2
exit 1
fi
- # alpine (used in post-container requires the PATH to have /bin
- export PATH=$PATH:/bin
+ pushd "${KERNEL_CHECKOUT}" &>/dev/null
- if [[ "${debug_shell}" != "yes" ]]
- then
- touch ${OUTPUT_DIR}/${LOG_FILE}
- command="mount bpffs -t bpf /sys/fs/bpf/; set -o pipefail ; ${command} 2>&1 | tee ${OUTPUT_DIR}/${LOG_FILE}"
- post_command="cat ${OUTPUT_DIR}/${LOG_FILE}"
- else
- command="mount bpffs -t bpf /sys/fs/bpf/; ${command}"
+ if ! vng --kconfig --config "${SCRIPT_DIR}"/config; then
+ die "failed to generate .config for kernel source tree (${KERNEL_CHECKOUT})"
fi
- set +e
- $b2c --command "${command}" \
- --kernel ${kernel_bzimage} \
- --workdir ${OUTPUT_DIR} \
- --image ${CONTAINER_IMAGE}
+ local vng_args=("-v" "--config" "${SCRIPT_DIR}/config" "--build")
- echo $? > ${OUTPUT_DIR}/${EXIT_STATUS_FILE}
+ if [[ -n "${BUILD_HOST}" ]]; then
+ vng_args+=("--build-host" "${BUILD_HOST}")
+ fi
- set -e
+ if [[ -n "${BUILD_HOST_PODMAN_CONTAINER_NAME}" ]]; then
+ vng_args+=("--build-host-exec-prefix" \
+ "podman exec -ti ${BUILD_HOST_PODMAN_CONTAINER_NAME}")
+ fi
- ${post_command}
-}
+ if ! vng "${vng_args[@]}"; then
+ die "failed to build kernel from source tree (${KERNEL_CHECKOUT})"
+ fi
-is_rel_path()
-{
- local path="$1"
+ if ! make -j$(nproc) -C "${HID_BPF_PROGS}"; then
+ die "failed to build HID bpf objects from source tree (${HID_BPF_PROGS})"
+ fi
- [[ ${path:0:1} != "/" ]]
+ if ! make -j$(nproc) -C "${SCRIPT_DIR}"; then
+ die "failed to build HID selftests from source tree (${SCRIPT_DIR})"
+ fi
+
+ popd &>/dev/null
}
-do_update_kconfig()
-{
- local kernel_checkout="$1"
- local kconfig_file="$2"
+vm_start() {
+ local logfile=/dev/null
+ local verbose_opt=""
+ local kernel_opt=""
+ local qemu
- rm -f "$kconfig_file" 2> /dev/null
+ qemu=$(command -v "${QEMU}")
- for config in "${KCONFIG_REL_PATHS[@]}"; do
- local kconfig_src="${config}"
- cat "$kconfig_src" >> "$kconfig_file"
- done
-}
+ if [[ "${VERBOSE}" -eq 2 ]]; then
+ verbose_opt="--verbose"
+ logfile=/dev/stdout
+ fi
-update_kconfig()
-{
- local kernel_checkout="$1"
- local kconfig_file="$2"
-
- if [[ -f "${kconfig_file}" ]]; then
- local local_modified="$(stat -c %Y "${kconfig_file}")"
-
- for config in "${KCONFIG_REL_PATHS[@]}"; do
- local kconfig_src="${config}"
- local src_modified="$(stat -c %Y "${kconfig_src}")"
- # Only update the config if it has been updated after the
- # previously cached config was created. This avoids
- # unnecessarily compiling the kernel and selftests.
- if [[ "${src_modified}" -gt "${local_modified}" ]]; then
- do_update_kconfig "$kernel_checkout" "$kconfig_file"
- # Once we have found one outdated configuration
- # there is no need to check other ones.
- break
- fi
- done
- else
- do_update_kconfig "$kernel_checkout" "$kconfig_file"
+ # If we are running from within the kernel source tree, use the kernel source tree
+ # as the kernel to boot, otherwise use the currently running kernel.
+ if [[ "$(realpath "$(pwd)")" == "${KERNEL_CHECKOUT}"* ]]; then
+ kernel_opt="${KERNEL_CHECKOUT}"
fi
-}
-main()
-{
- local script_dir="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
- local kernel_checkout=$(realpath "${script_dir}"/../../../../)
- # By default the script searches for the kernel in the checkout directory but
- # it also obeys environment variables O= and KBUILD_OUTPUT=
- local kernel_bzimage="${kernel_checkout}/${BZIMAGE}"
- local command="${DEFAULT_COMMAND}"
- local update_b2c="no"
- local debug_shell="no"
- local build_only="no"
-
- while getopts ':hsud:j:b' opt; do
- case ${opt} in
- u)
- update_b2c="yes"
- ;;
- d)
- OUTPUT_DIR="$OPTARG"
- ;;
- j)
- NUM_COMPILE_JOBS="$OPTARG"
- ;;
- s)
- command="/bin/sh"
- debug_shell="yes"
- ;;
- b)
- build_only="yes"
- ;;
- h)
- usage
- exit 0
- ;;
- \? )
- echo "Invalid Option: -$OPTARG"
- usage
- exit 1
- ;;
- : )
- echo "Invalid Option: -$OPTARG requires an argument"
- usage
- exit 1
- ;;
- esac
- done
- shift $((OPTIND -1))
-
- # trap 'catch "$?"' EXIT
- if [[ "${build_only}" == "no" && "${debug_shell}" == "no" ]]; then
- if [[ $# -eq 0 ]]; then
- echo "No command specified, will run ${DEFAULT_COMMAND} in the vm"
- else
- command="$@"
-
- if [[ "${command}" == "/bin/bash" || "${command}" == "bash" ]]
- then
- debug_shell="yes"
- fi
+ vng \
+ --run \
+ ${kernel_opt} \
+ ${verbose_opt} \
+ --qemu-opts="${QEMU_OPTS}" \
+ --qemu="${qemu}" \
+ --user root \
+ --append "${KERNEL_CMDLINE}" \
+ --ssh "${SSH_GUEST_PORT}" \
+ --rw &> ${logfile} &
+
+ local vng_pid=$!
+ local elapsed=0
+
+ while [[ ! -s "${QEMU_PIDFILE}" ]]; do
+ if ! kill -0 "${vng_pid}" 2>/dev/null; then
+ echo "vng process (PID ${vng_pid}) exited early, check logs for details" >&2
+ die "failed to boot VM"
fi
- fi
- local kconfig_file="${OUTPUT_DIR}/latest.config"
- local make_command="make -j ${NUM_COMPILE_JOBS} KCONFIG_CONFIG=${kconfig_file}"
+ if [[ ${elapsed} -ge ${WAIT_TOTAL} ]]; then
+ echo "Timed out after ${WAIT_TOTAL} seconds waiting for VM to boot" >&2
+ die "failed to boot VM"
+ fi
- # Figure out where the kernel is being built.
- # O takes precedence over KBUILD_OUTPUT.
- if [[ "${O:=""}" != "" ]]; then
- if is_rel_path "${O}"; then
- O="$(realpath "${PWD}/${O}")"
+ sleep 1
+ elapsed=$((elapsed + 1))
+ done
+}
+
+vm_wait_for_ssh() {
+ local i
+
+ i=0
+ while true; do
+ if [[ ${i} -gt ${WAIT_PERIOD_MAX} ]]; then
+ die "Timed out waiting for guest ssh"
fi
- kernel_bzimage="${O}/${BZIMAGE}"
- make_command="${make_command} O=${O}"
- elif [[ "${KBUILD_OUTPUT:=""}" != "" ]]; then
- if is_rel_path "${KBUILD_OUTPUT}"; then
- KBUILD_OUTPUT="$(realpath "${PWD}/${KBUILD_OUTPUT}")"
+ if vm_ssh -- true; then
+ break
fi
- kernel_bzimage="${KBUILD_OUTPUT}/${BZIMAGE}"
- make_command="${make_command} KBUILD_OUTPUT=${KBUILD_OUTPUT}"
+ i=$(( i + 1 ))
+ sleep ${WAIT_PERIOD}
+ done
+}
+
+vm_mount_bpffs() {
+ vm_ssh -- mount bpffs -t bpf /sys/fs/bpf
+}
+
+__log_stdin() {
+ stdbuf -oL awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0; fflush() }'
+}
+
+__log_args() {
+ echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }'
+}
+
+log() {
+ local verbose="$1"
+ shift
+
+ local prefix="$1"
+
+ shift
+ local redirect=
+ if [[ ${verbose} -le 0 ]]; then
+ redirect=/dev/null
+ else
+ redirect=/dev/stdout
+ fi
+
+ if [[ "$#" -eq 0 ]]; then
+ __log_stdin | tee -a "${LOG}" > ${redirect}
+ else
+ __log_args "$@" | tee -a "${LOG}" > ${redirect}
fi
+}
- local b2c="${OUTPUT_DIR}/vm2c.py"
+log_setup() {
+ log $((VERBOSE-1)) "setup" "$@"
+}
- echo "Output directory: ${OUTPUT_DIR}"
+log_host() {
+ local testname=$1
- mkdir -p "${OUTPUT_DIR}"
- update_kconfig "${kernel_checkout}" "${kconfig_file}"
+ shift
+ log $((VERBOSE-1)) "test:${testname}:host" "$@"
+}
- recompile_kernel "${kernel_checkout}" "${make_command}"
- update_selftests "${kernel_checkout}" "${make_command}"
+log_guest() {
+ local testname=$1
- if [[ "${build_only}" == "no" ]]; then
- if [[ "${update_b2c}" == "no" && ! -f "${b2c}" ]]; then
- echo "vm2c script not found in ${b2c}"
- update_b2c="yes"
- fi
+ shift
+ log ${VERBOSE} "# test:${testname}" "$@"
+}
- if [[ "${update_b2c}" == "yes" ]]; then
- download $B2C_URL $b2c
- chmod +x $b2c
- fi
+test_vm_hid_bpf() {
+ local testname="${FUNCNAME[0]#test_}"
- run_vm "${kernel_checkout}" $b2c "${kernel_bzimage}" "${command}"
- if [[ "${debug_shell}" != "yes" ]]; then
- echo "Logs saved in ${OUTPUT_DIR}/${LOG_FILE}"
- fi
+ vm_ssh -- "${HID_BPF_TEST}" \
+ 2>&1 | log_guest "${testname}"
+
+ return ${PIPESTATUS[0]}
+}
+
+test_vm_hidraw() {
+ local testname="${FUNCNAME[0]#test_}"
+
+ vm_ssh -- "${HIDRAW_TEST}" \
+ 2>&1 | log_guest "${testname}"
+
+ return ${PIPESTATUS[0]}
+}
+
+test_vm_pytest() {
+ local testname="${FUNCNAME[0]#test_}"
- exit $(cat ${OUTPUT_DIR}/${EXIT_STATUS_FILE})
+ shift
+
+ vm_ssh -- pytest ${SCRIPT_DIR}/tests --color=yes "$@" \
+ 2>&1 | log_guest "${testname}"
+
+ return ${PIPESTATUS[0]}
+}
+
+run_test() {
+ local vm_oops_cnt_before
+ local vm_warn_cnt_before
+ local vm_oops_cnt_after
+ local vm_warn_cnt_after
+ local name
+ local rc
+
+ vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -c -i 'Oops')
+ vm_error_cnt_before=$(vm_ssh -- dmesg --level=err | wc -l)
+
+ name=$(echo "${1}" | awk '{ print $1 }')
+ eval test_"${name}" "$@"
+ rc=$?
+
+ vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+ if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+ echo "FAIL: kernel oops detected on vm" | log_host "${name}"
+ rc=$KSFT_FAIL
+ fi
+
+ vm_error_cnt_after=$(vm_ssh -- dmesg --level=err | wc -l)
+ if [[ ${vm_error_cnt_after} -gt ${vm_error_cnt_before} ]]; then
+ echo "FAIL: kernel error detected on vm" | log_host "${name}"
+ vm_ssh -- dmesg --level=err | log_host "${name}"
+ rc=$KSFT_FAIL
fi
+
+ return "${rc}"
}
-main "$@"
+QEMU="qemu-system-$(uname -m)"
+
+while getopts :hvsbq:H:p: o
+do
+ case $o in
+ v) VERBOSE=$((VERBOSE+1));;
+ s) SHELL_MODE=1;;
+ b) BUILD=1;;
+ q) QEMU=$OPTARG;;
+ H) BUILD_HOST=$OPTARG;;
+ p) BUILD_HOST_PODMAN_CONTAINER_NAME=$OPTARG;;
+ h|*) usage;;
+ esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+PARAMS=""
+
+if [[ ${#} -eq 0 ]]; then
+ ARGS=("${TEST_NAMES[@]}")
+else
+ ARGS=()
+ COUNT=0
+ for arg in $@; do
+ COUNT=$((COUNT+1))
+ if [[ x"$arg" == x"--" ]]; then
+ break
+ fi
+ ARGS+=($arg)
+ done
+ shift $COUNT
+ PARAMS="$@"
+fi
+
+if [[ "${SHELL_MODE}" -eq 0 ]]; then
+ check_args "${ARGS[@]}"
+ echo "1..${#ARGS[@]}"
+fi
+check_deps
+check_vng
+handle_build
+
+log_setup "Booting up VM"
+vm_start
+vm_wait_for_ssh
+vm_mount_bpffs
+log_setup "VM booted up"
+
+if [[ "${SHELL_MODE}" -eq 1 ]]; then
+ log_setup "Starting interactive shell in VM"
+ echo "Starting shell in VM. Use 'exit' to quit and shutdown the VM."
+ CURRENT_DIR="$(pwd)"
+ vm_ssh -t -- "cd '${CURRENT_DIR}' && exec bash -l"
+ exit "$KSFT_PASS"
+fi
+
+cnt_pass=0
+cnt_fail=0
+cnt_skip=0
+cnt_total=0
+for arg in "${ARGS[@]}"; do
+ run_test "${arg}" "${PARAMS}"
+ rc=$?
+ if [[ ${rc} -eq $KSFT_PASS ]]; then
+ cnt_pass=$(( cnt_pass + 1 ))
+ echo "ok ${cnt_total} ${arg}"
+ elif [[ ${rc} -eq $KSFT_SKIP ]]; then
+ cnt_skip=$(( cnt_skip + 1 ))
+ echo "ok ${cnt_total} ${arg} # SKIP"
+ elif [[ ${rc} -eq $KSFT_FAIL ]]; then
+ cnt_fail=$(( cnt_fail + 1 ))
+ echo "not ok ${cnt_total} ${arg} # exit=$rc"
+ fi
+ cnt_total=$(( cnt_total + 1 ))
+done
+
+echo "SUMMARY: PASS=${cnt_pass} SKIP=${cnt_skip} FAIL=${cnt_fail}"
+echo "Log: ${LOG}"
+
+if [ $((cnt_pass + cnt_skip)) -eq ${cnt_total} ]; then
+ exit "$KSFT_PASS"
+else
+ exit "$KSFT_FAIL"
+fi
---
base-commit: b80a75cf6999fb79971b41eaec7af2bb4b514714
change-id: 20250818-virtme-ng-f73db7e61235
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
From: Lance Yang <lance.yang(a)linux.dev>
The madv_populate and soft-dirty kselftests currently fail on systems where
CONFIG_MEM_SOFT_DIRTY is disabled.
Introduce a new helper softdirty_is_supported() into vm_util.c/h to ensure
tests are properly skipped when the feature is not enabled.
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
---
tools/testing/selftests/mm/madv_populate.c | 21 ++--------------
tools/testing/selftests/mm/soft-dirty.c | 5 +++-
tools/testing/selftests/mm/vm_util.c | 28 ++++++++++++++++++++++
tools/testing/selftests/mm/vm_util.h | 1 +
4 files changed, 35 insertions(+), 20 deletions(-)
diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c
index b6fabd5c27ed..43dac7783004 100644
--- a/tools/testing/selftests/mm/madv_populate.c
+++ b/tools/testing/selftests/mm/madv_populate.c
@@ -264,23 +264,6 @@ static void test_softdirty(void)
munmap(addr, SIZE);
}
-static int system_has_softdirty(void)
-{
- /*
- * There is no way to check if the kernel supports soft-dirty, other
- * than by writing to a page and seeing if the bit was set. But the
- * tests are intended to check that the bit gets set when it should, so
- * doing that check would turn a potentially legitimate fail into a
- * skip. Fortunately, we know for sure that arm64 does not support
- * soft-dirty. So for now, let's just use the arch as a corse guide.
- */
-#if defined(__aarch64__)
- return 0;
-#else
- return 1;
-#endif
-}
-
int main(int argc, char **argv)
{
int nr_tests = 16;
@@ -288,7 +271,7 @@ int main(int argc, char **argv)
pagesize = getpagesize();
- if (system_has_softdirty())
+ if (softdirty_is_supported())
nr_tests += 5;
ksft_print_header();
@@ -300,7 +283,7 @@ int main(int argc, char **argv)
test_holes();
test_populate_read();
test_populate_write();
- if (system_has_softdirty())
+ if (softdirty_is_supported())
test_softdirty();
err = ksft_get_fail_cnt();
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 8a3f2b4b2186..98e42d2ac32a 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -200,8 +200,11 @@ int main(int argc, char **argv)
int pagesize;
ksft_print_header();
- ksft_set_plan(15);
+ if (!softdirty_is_supported())
+ ksft_exit_skip("soft-dirty is not support\n");
+
+ ksft_set_plan(15);
pagemap_fd = open(PAGEMAP_FILE_PATH, O_RDONLY);
if (pagemap_fd < 0)
ksft_exit_fail_msg("Failed to open %s\n", PAGEMAP_FILE_PATH);
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 56e9bd541edd..3173335df775 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -449,6 +449,34 @@ bool check_vmflag_pfnmap(void *addr)
return check_vmflag(addr, "pf");
}
+bool softdirty_is_supported(void)
+{
+ char *addr;
+ int ret = 0;
+ size_t pagesize;
+
+ /* We know for sure that arm64 does not support soft-dirty. */
+#if defined(__aarch64__)
+ return ret;
+#endif
+ pagesize = getpagesize();
+ /*
+ * __mmap_complete() always sets VM_SOFTDIRTY for new VMAs, so we
+ * just mmap a small region and check its VmFlags in /proc/self/smaps
+ * for the "sd" flag.
+ */
+ addr = mmap(0, pagesize, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
+ if (!addr)
+ ksft_exit_fail_msg("mmap failed\n");
+
+ if (check_vmflag(addr, "sd"))
+ ret = 1;
+
+ munmap(addr, pagesize);
+ return ret;
+}
+
/*
* Open an fd at /proc/$pid/maps and configure procmap_out ready for
* PROCMAP_QUERY query. Returns 0 on success, or an error code otherwise.
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 07c4acfd84b6..87ad8e0d92c0 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -104,6 +104,7 @@ bool find_vma_procmap(struct procmap_fd *procmap, void *address);
int close_procmap(struct procmap_fd *procmap);
int write_sysfs(const char *file_path, unsigned long val);
int read_sysfs(const char *file_path, unsigned long *val);
+bool softdirty_is_supported(void);
static inline int open_self_procmap(struct procmap_fd *procmap_out)
{
--
2.49.0
These kfuncs are removed in commit 2f9838e25790
("selftests/bpf: Cleanup bpf qdisc selftests"), but they are still
referenced by multiple tests. Otherwise, we will get the following errors.
```
progs/bpf_qdisc_fail__incompl_ops.c:13:2: error: call to undeclared function 'bpf_qdisc_skb_drop'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
13 | bpf_qdisc_skb_drop(skb, to_free);
| ^
1 error generated.
progs/bpf_qdisc_fifo.c:38:3: error: call to undeclared function 'bpf_qdisc_skb_drop'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
38 | bpf_qdisc_skb_drop(skb, to_free);
| ^
progs/bpf_qdisc_fq.c:280:11: error: call to undeclared function 'bpf_skb_get_hash'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
280 | hash = bpf_skb_get_hash(skb) & q.orphan_mask;
| ^
progs/bpf_qdisc_fq.c:287:11: error: call to undeclared function 'bpf_skb_get_hash'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
287 | hash = bpf_skb_get_hash(skb) & q.orphan_mask;
| ^
progs/bpf_qdisc_fq.c:375:3: error: call to undeclared function 'bpf_qdisc_skb_drop'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
375 | bpf_qdisc_skb_drop(skb, to_free);
| ^
progs/bpf_qdisc_fifo.c:71:2: error: call to undeclared function 'bpf_qdisc_bstats_update'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
71 | bpf_qdisc_bstats_update(sch, skb);
| ^
progs/bpf_qdisc_fifo.c:106:4: error: call to undeclared function 'bpf_kfree_skb'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
106 | bpf_kfree_skb(skb);
| ^
3 errors generated.
progs/bpf_qdisc_fq.c:614:3: error: call to undeclared function 'bpf_qdisc_bstats_update'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
614 | bpf_qdisc_bstats_update(sch, skb);
| ^
progs/bpf_qdisc_fq.c:619:3: error: call to undeclared function 'bpf_qdisc_watchdog_schedule'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
619 | bpf_qdisc_watchdog_schedule(sch, cb_ctx.expire, q.timer_slack);
| ^
5 errors generated.
```
Fixes: 2f9838e25790 ("selftests/bpf: Cleanup bpf qdisc selftests")
Signed-off-by: Xing Guo <higuoxing(a)gmail.com>
---
tools/testing/selftests/bpf/progs/bpf_qdisc_common.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
index 3754f581b328..7e7f2fe04f22 100644
--- a/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
+++ b/tools/testing/selftests/bpf/progs/bpf_qdisc_common.h
@@ -14,6 +14,12 @@
struct bpf_sk_buff_ptr;
+u32 bpf_skb_get_hash(struct sk_buff *p) __ksym;
+void bpf_kfree_skb(struct sk_buff *p) __ksym;
+void bpf_qdisc_skb_drop(struct sk_buff *p, struct bpf_sk_buff_ptr *to_free) __ksym;
+void bpf_qdisc_watchdog_schedule(struct Qdisc *sch, u64 expire, u64 delta_ns) __ksym;
+void bpf_qdisc_bstats_update(struct Qdisc *sch, const struct sk_buff *skb) __ksym;
+
static struct qdisc_skb_cb *qdisc_skb_cb(const struct sk_buff *skb)
{
return (struct qdisc_skb_cb *)skb->cb;
--
2.51.0
Some ublk selftests have strange behavior when fio is not installed.
While most tests behave correctly (run if they don't need fio, or skip
if they need fio), the following tests have different behavior:
- test_null_01 and test_null_02 try to run fio without checking if it
exists first, and fail on any failure of the fio command (including
"fio command not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but
doesn't fail on fio command failure. This test passes, but that pass
is misleading as the test doesn't do anything useful without fio
installed. So this test passes when it should skip.
Fix these issues by adding _have_program fio checks to the top of all
three of these tests.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
tools/testing/selftests/ublk/test_null_01.sh | 4 ++++
tools/testing/selftests/ublk/test_null_02.sh | 4 ++++
tools/testing/selftests/ublk/test_stress_05.sh | 4 ++++
3 files changed, 12 insertions(+)
diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh
index a34203f726685787da80b0e32da95e0fcb90d0b1..c2cb8f7a09fe37a9956d067fd56b28dc7ca6bd68 100755
--- a/tools/testing/selftests/ublk/test_null_01.sh
+++ b/tools/testing/selftests/ublk/test_null_01.sh
@@ -6,6 +6,10 @@
TID="null_01"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test"
dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh
index 5633ca8766554b22be252c7cb2d13de1bf923b90..8accd35beb55c149f74b23f0fb562e12cbf3e362 100755
--- a/tools/testing/selftests/ublk/test_null_02.sh
+++ b/tools/testing/selftests/ublk/test_null_02.sh
@@ -6,6 +6,10 @@
TID="null_02"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test with zero copy"
dev_id=$(_add_ublk_dev -t null -z)
diff --git a/tools/testing/selftests/ublk/test_stress_05.sh b/tools/testing/selftests/ublk/test_stress_05.sh
index 566cfd90d192ce8c1f98ca2539792d54a787b3d1..274295061042e5db3f4f0846ae63ea9b787fb2ee 100755
--- a/tools/testing/selftests/ublk/test_stress_05.sh
+++ b/tools/testing/selftests/ublk/test_stress_05.sh
@@ -5,6 +5,10 @@
TID="stress_05"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
run_io_and_remove()
{
local size=$1
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_fio-1910998b00b3
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
Check if watchdog device supports WDIOF_KEEPALIVEPING option before
entering keep_alive() ping test loop. Fix watchdog-test silently looping
if ioctl based ping is not supported by the device. Exit from test in
such case instead of stucking in loop executing failing keep_alive()
Fixes: d89d08ffd2c5 ("selftests: watchdog: Fix ioctl SET* error paths to take oneshot exit path")
Signed-off-by: Akhilesh Patil <akhilesh(a)ee.iitb.ac.in>
---
Testing:
# wdt_test_1 -f /dev/watchdog0 -i
watchdog_info:
identity: m41t93 rtc Watchdog
firmware_version: 0
Support/Status: Set timeout (in seconds)
Support/Status: Watchdog triggers a management or other external alarm not a reboot
# wdt_test_1 -f /dev/watchdog0 -d -t 5 -p 2 -e
Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
WDIOC_KEEPALIVE not supported by this device
without this change
# wdt_test_2 -f /dev/watchdog0 -d -t 5 -p 2 -e
Watchdog card disabled.
Watchdog timeout set to 5 seconds.
Watchdog ping rate set to 2 seconds.
Watchdog card enabled.
Watchdog Ticking Away!
^C
(Where test stuck here forver silently)
---
tools/testing/selftests/watchdog/watchdog-test.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
index a1f506ba5578..4f09c5db0c7f 100644
--- a/tools/testing/selftests/watchdog/watchdog-test.c
+++ b/tools/testing/selftests/watchdog/watchdog-test.c
@@ -332,6 +332,12 @@ int main(int argc, char *argv[])
if (oneshot)
goto end;
+ /* Check if WDIOF_KEEPALIVEPING is supported */
+ if (!(info.options & WDIOF_KEEPALIVEPING)) {
+ printf("WDIOC_KEEPALIVE not supported by this device\n");
+ goto end;
+ }
+
printf("Watchdog Ticking Away!\n");
/*
--
2.34.1
FEAT_LSFE is optional from v9.5, it adds new instructions for atomic
memory operations with floating point values. We have no immediate use
for it in kernel, provide a hwcap so userspace can discover it and allow
the ID register field to be exposed to KVM guests.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.17-rc1.
- Link to v2: https://lore.kernel.org/r/20250703-arm64-lsfe-v2-0-eced80999cb4@kernel.org
Changes in v2:
- Fix result of vi dropping in hwcap test.
- Link to v1: https://lore.kernel.org/r/20250627-arm64-lsfe-v1-0-68351c4bf741@kernel.org
---
Mark Brown (3):
arm64/hwcap: Add hwcap for FEAT_LSFE
KVM: arm64: Expose FEAT_LSFE to guests
kselftest/arm64: Add lsfe to the hwcaps test
Documentation/arch/arm64/elf_hwcaps.rst | 4 ++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/kernel/cpufeature.c | 2 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kvm/sys_regs.c | 4 +++-
tools/testing/selftests/arm64/abi/hwcap.c | 21 +++++++++++++++++++++
7 files changed, 33 insertions(+), 1 deletion(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250625-arm64-lsfe-0810cf98adc2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is series 2b/5 of the migration to `core::ffi::CStr`[0].
20250704-core-cstr-prepare-v1-0-a91524037783(a)gmail.com.
This series depends on the prior series[0] and is intended to go through
the rust tree to reduce the number of release cycles required to
complete the work.
Subsystem maintainers: I would appreciate your `Acked-by`s so that this
can be taken through Miguel's tree (where the other series must go).
[0] https://lore.kernel.org/all/20250704-core-cstr-prepare-v1-0-a91524037783@gm…
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v3:
- Add a patch to deal with new code in acpi.
- Drop incorrectly applied Acked-by tags from Danilo.
- Link to v2: https://lore.kernel.org/r/20250719-core-cstr-fanout-1-v2-0-e1cb53f6d233@gma…
Changes in v2:
- Update patch title (was nova-core, now drm/panic).
- Link to v1: https://lore.kernel.org/r/20250709-core-cstr-fanout-1-v1-0-fd793b3e58a2@gma…
---
Tamir Duberstein (11):
drm/panic: use `core::ffi::CStr` method names
rust: auxiliary: use `core::ffi::CStr` method names
rust: configfs: use `core::ffi::CStr` method names
rust: cpufreq: use `core::ffi::CStr` method names
rust: drm: use `core::ffi::CStr` method names
rust: firmware: use `core::ffi::CStr` method names
rust: kunit: use `core::ffi::CStr` method names
rust: miscdevice: use `core::ffi::CStr` method names
rust: net: use `core::ffi::CStr` method names
rust: of: use `core::ffi::CStr` method names
rust: acpi: use `core::ffi::CStr` method names
drivers/gpu/drm/drm_panic_qr.rs | 2 +-
rust/kernel/acpi.rs | 7 ++-----
rust/kernel/auxiliary.rs | 4 ++--
rust/kernel/configfs.rs | 4 ++--
rust/kernel/cpufreq.rs | 2 +-
rust/kernel/drm/device.rs | 4 ++--
rust/kernel/firmware.rs | 2 +-
rust/kernel/kunit.rs | 6 +++---
rust/kernel/miscdevice.rs | 2 +-
rust/kernel/net/phy.rs | 2 +-
rust/kernel/of.rs | 2 +-
samples/rust/rust_configfs.rs | 2 +-
12 files changed, 18 insertions(+), 21 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250709-core-cstr-fanout-1-f20611832272
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder(a)us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv(a)jvosburgh.net>
Tested-by: David Wilder <wilder(a)us.ibm.com>
Acked-by: Jay Vosburgh <jv(a)jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v3: no update
v2: split the patch into 2 parts, the kernel change and test update (Jay Vosburgh)
---
drivers/net/bonding/bond_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..30cf97f4e814 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3355,7 +3355,6 @@ static void bond_ns_send_all(struct bonding *bond, struct slave *slave)
/* Find out through which dev should the packet go */
memset(&fl6, 0, sizeof(struct flowi6));
fl6.daddr = targets[i];
- fl6.flowi6_oif = bond->dev->ifindex;
dst = ip6_route_output(dev_net(bond->dev), NULL, &fl6);
if (dst->error) {
--
2.50.1
Checkpatch.pl expects at least 4 lines of help text.
Extend the help text to make checkpatch.pl happy.
Fixes: 031cdd3bc3f3 ("kunit: Enable PCI on UML without triggering WARN()")
Suggested-by: Shuah Khan <skhan(a)linuxfoundation.org>
Link: https://lore.kernel.org/lkml/3dc95227-2be9-48a0-bdea-3f283d9b2a38@linuxfoun…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Feel free to fold this into the original commit.
---
lib/kunit/Kconfig | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index 1823539e96da30e165fa8d395ccbd3f6754c836e..7a6af361d2fc6276b9667be8c694b0c80e33c1e8 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -112,5 +112,9 @@ config KUNIT_UML_PCI
select UML_PCI
help
Enables the PCI subsystem on UML for use by KUnit tests.
+ Some KUnit tests require the PCI core which is not enabled by
+ default on UML.
+
+ If unsure, say N.
endif # KUNIT
---
base-commit: f20e264262f1e6a6e5302249e37da355d844b52b
change-id: 20250916-kunit-pci-kconfig-357264bb45f4
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The `kunit_test` proc macro only checks for the `test` attribute
immediately preceding a `fn`. If the function is disabled via a `cfg`,
the generated code would result in a compile error referencing a
non-existent function [1].
This collects attributes and specifically cherry-picks `cfg` attributes
to be duplicated inside KUnit wrapper functions such that a test function
disabled via `cfg` compiles and is ignored correctly.
Link: https://lore.kernel.org/rust-for-linux/CANiq72==48=69hYiDo1321pCzgn_n1_jg=e… [1]
Closes: https://github.com/Rust-for-Linux/linux/issues/1185
Suggested-by: Miguel Ojeda <ojeda(a)kernel.org>
Signed-off-by: Kaibo Ma <ent3rm4n(a)gmail.com>
---
rust/kernel/kunit.rs | 7 +++++++
rust/macros/kunit.rs | 46 ++++++++++++++++++++++++++++++++------------
2 files changed, 41 insertions(+), 12 deletions(-)
diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs
index 41efd8759..32640dfc9 100644
--- a/rust/kernel/kunit.rs
+++ b/rust/kernel/kunit.rs
@@ -357,4 +357,11 @@ fn rust_test_kunit_example_test() {
fn rust_test_kunit_in_kunit_test() {
assert!(in_kunit_test());
}
+
+ #[test]
+ #[cfg(not(all()))]
+ fn rust_test_kunit_always_disabled_test() {
+ // This test should never run because of the `cfg`.
+ assert!(false);
+ }
}
diff --git a/rust/macros/kunit.rs b/rust/macros/kunit.rs
index 81d18149a..850a321e5 100644
--- a/rust/macros/kunit.rs
+++ b/rust/macros/kunit.rs
@@ -5,6 +5,7 @@
//! Copyright (c) 2023 José Expósito <jose.exposito89(a)gmail.com>
use proc_macro::{Delimiter, Group, TokenStream, TokenTree};
+use std::collections::HashMap;
use std::fmt::Write;
pub(crate) fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream {
@@ -41,20 +42,32 @@ pub(crate) fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream {
// Get the functions set as tests. Search for `[test]` -> `fn`.
let mut body_it = body.stream().into_iter();
let mut tests = Vec::new();
+ let mut attributes: HashMap<String, TokenStream> = HashMap::new();
while let Some(token) = body_it.next() {
match token {
- TokenTree::Group(ident) if ident.to_string() == "[test]" => match body_it.next() {
- Some(TokenTree::Ident(ident)) if ident.to_string() == "fn" => {
- let test_name = match body_it.next() {
- Some(TokenTree::Ident(ident)) => ident.to_string(),
- _ => continue,
- };
- tests.push(test_name);
+ TokenTree::Punct(ref p) if p.as_char() == '#' => match body_it.next() {
+ Some(TokenTree::Group(g)) if g.delimiter() == Delimiter::Bracket => {
+ if let Some(TokenTree::Ident(name)) = g.stream().into_iter().next() {
+ // Collect attributes because we need to find which are tests. We also
+ // need to copy `cfg` attributes so tests can be conditionally enabled.
+ attributes
+ .entry(name.to_string())
+ .or_default()
+ .extend([token, TokenTree::Group(g)]);
+ }
+ continue;
}
- _ => continue,
+ _ => (),
},
+ TokenTree::Ident(i) if i.to_string() == "fn" && attributes.contains_key("test") => {
+ if let Some(TokenTree::Ident(test_name)) = body_it.next() {
+ tests.push((test_name, attributes.remove("cfg").unwrap_or_default()))
+ }
+ }
+
_ => (),
}
+ attributes.clear();
}
// Add `#[cfg(CONFIG_KUNIT="y")]` before the module declaration.
@@ -100,11 +113,20 @@ pub(crate) fn kunit_tests(attr: TokenStream, ts: TokenStream) -> TokenStream {
let mut test_cases = "".to_owned();
let mut assert_macros = "".to_owned();
let path = crate::helpers::file();
- for test in &tests {
+ let num_tests = tests.len();
+ for (test, cfg_attr) in tests {
let kunit_wrapper_fn_name = format!("kunit_rust_wrapper_{test}");
- // An extra `use` is used here to reduce the length of the message.
+ // Append any `cfg` attributes the user might have written on their tests so we don't
+ // attempt to call them when they are `cfg`'d out. An extra `use` is used here to reduce
+ // the length of the assert message.
let kunit_wrapper = format!(
- "unsafe extern \"C\" fn {kunit_wrapper_fn_name}(_test: *mut ::kernel::bindings::kunit) {{ use ::kernel::kunit::is_test_result_ok; assert!(is_test_result_ok({test}())); }}",
+ r#"unsafe extern "C" fn {kunit_wrapper_fn_name}(_test: *mut ::kernel::bindings::kunit)
+ {{
+ {cfg_attr} {{
+ use ::kernel::kunit::is_test_result_ok;
+ assert!(is_test_result_ok({test}()));
+ }}
+ }}"#,
);
writeln!(kunit_macros, "{kunit_wrapper}").unwrap();
writeln!(
@@ -139,7 +161,7 @@ macro_rules! assert_eq {{
writeln!(
kunit_macros,
"static mut TEST_CASES: [::kernel::bindings::kunit_case; {}] = [\n{test_cases} ::kernel::kunit::kunit_case_null(),\n];",
- tests.len() + 1
+ num_tests + 1
)
.unwrap();
--
2.50.1
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that the XFRM ESP offload is
disabled. This occurs because, in bond_newlink(), we change bond link
first and register bond device later. So the XFRM feature update in
bond_option_mode_set() is not called as the bond device is not yet
registered, leading to the offload feature not being set successfully.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 16 +++++++++-------
include/net/bonding.h | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..2182b34226ca 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4411,7 +4411,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 57fff2421f1b..7a9d73ec8e91 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -579,20 +579,22 @@ static int bond_newlink(struct net_device *bond_dev,
struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
struct nlattr **data = params->data;
struct nlattr **tb = params->tb;
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index e06f0d63b2c1..bd56ad976cfb 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -711,6 +711,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.50.1
RX devmem sometimes fails on NIPA:
https://netdev-3.bots.linux.dev/vmksft-fbnic-qemu-dbg/results/294402/7-devm…
Both RSS and flow steering are properly installed, but the wait_port_listen
fails. Try to remove sleep(1) to see if the cause of the failure is
spending too much time during RX setup. I don't see a good reason to
have sleep in the first place. If there needs to be a delay between
installing the rules and receiving the traffic, let's add it to the
callers (devmem.py) instead.
Signed-off-by: Stanislav Fomichev <sdf(a)fomichev.me>
---
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index c0a22938bed2..3288ed04ce08 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -872,8 +872,6 @@ static int do_server(struct memory_buffer *mem)
goto err_reset_rss;
}
- sleep(1);
-
if (bind_rx_queue(ifindex, mem->fd, create_queues(), num_queues, &ys)) {
pr_err("Failed to bind");
goto err_reset_flow_steering;
--
2.51.0
Here are some small unrelated cleanups collected when working on some
fixes recently.
- Patches 1 & 2: close file descriptors in exit paths in the selftests.
- Patch 3: fix a wrong type (int i/o u32) when parsing netlink message.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (2):
selftests: mptcp: close server file descriptors
selftests: mptcp: close server IPC descriptors
Matthieu Baerts (NGI0) (1):
mptcp: pm: netlink: fix if-idx type
net/mptcp/pm_netlink.c | 2 +-
tools/testing/selftests/net/mptcp/mptcp_inq.c | 9 +++++++--
tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 9 +++++++--
3 files changed, 15 insertions(+), 5 deletions(-)
---
base-commit: dc2f650f7e6857bf384069c1a56b2937a1ee370d
change-id: 20250912-net-next-mptcp-minor-fixes-6-18-a10e141ae3e7
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This series should fix the recent instabilities seen by MPTCP and NIPA
CIs where the 'mptcp_connect.sh' tests fail regularly when running the
'disconnect' subtests with "plain" TCP sockets, e.g.
# INFO: disconnect
# 63 ns1 MPTCP -> ns1 (10.0.1.1:20001 ) MPTCP (duration 996ms) [ OK ]
# 64 ns1 MPTCP -> ns1 (10.0.1.1:20002 ) TCP (duration 851ms) [ OK ]
# 65 ns1 TCP -> ns1 (10.0.1.1:20003 ) MPTCP Unexpected revents: POLLERR/POLLNVAL(19)
# (duration 896ms) [FAIL] file received by server does not match (in, out):
# -rw-r--r-- 1 root root 11112852 Aug 19 09:16 /tmp/tmp.hlJe5DoMoq.disconnect
# Trailing bytes are:
# /{ga 6@=#.8:-rw------- 1 root root 10085368 Aug 19 09:16 /tmp/tmp.blClunilxx
# Trailing bytes are:
# /{ga 6@=#.8:66 ns1 MPTCP -> ns1 (dead:beef:1::1:20004) MPTCP (duration 987ms) [ OK ]
# 67 ns1 MPTCP -> ns1 (dead:beef:1::1:20005) TCP (duration 911ms) [ OK ]
# 68 ns1 TCP -> ns1 (dead:beef:1::1:20006) MPTCP (duration 980ms) [ OK ]
# [FAIL] Tests of the full disconnection have failed
These issues started to be visible after some behavioural changes in
TCP, where too quick re-connections after a shutdown() can now be more
easily rejected. Patch 3 modifies the selftests to wait, but this
resolution revealed an issue in MPTCP which is fixed by patch 1 (a fix
for v5.9 kernel).
Patches 2 and 4 improve some errors reported by the selftests, and patch
5 helps with the debugging of such issues.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Note: The last two patches are not strictly fixes, but they are useful
in case similar issues happen again. That's why they have been added
here in this series for -net. If that's an issue, please drop them, and
I can re-send them later on.
---
Matthieu Baerts (NGI0) (5):
mptcp: propagate shutdown to subflows when possible
selftests: mptcp: connect: catch IO errors on listen side
selftests: mptcp: avoid spurious errors on TCP disconnect
selftests: mptcp: print trailing bytes with od
selftests: mptcp: connect: print pcap prefix
net/mptcp/protocol.c | 16 ++++++++++++++++
tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 ++++++-----
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 6 +++++-
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 2 +-
4 files changed, 28 insertions(+), 7 deletions(-)
---
base-commit: 2690cb089502b80b905f2abdafd1bf2d54e1abef
change-id: 20250912-net-mptcp-fix-sft-connect-f095ad7a6e36
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
During the connection establishment, a peer can tell the other one that
it cannot establish new subflows to the initial IP address and port by
setting the 'C' flag [1]. Doing so makes sense when the sender is behind
a strict NAT, operating behind a legacy Layer 4 load balancer, or using
anycast IP address for example.
When this 'C' flag is set, the path-managers must then not try to
establish new subflows to the other peer's initial IP address and port.
The in-kernel PM has access to this info, but the userspace PM didn't,
not letting the userspace daemon able to respect the RFC8684.
Here are a few fixes related to this 'C' flag (aka 'deny-join-id0'):
- Patch 1: add remote_deny_join_id0 info on passive connections. A fix
for v5.14.
- Patch 2: let the userspace PM daemon know about the deny_join_id0
attribute, so when set, it can avoid creating new subflows to the
initial IP address and port. A fix for v5.19.
- Patch 3: a validation for the previous commit.
- Patch 4: record the deny_join_id0 info when TFO is used. A fix for
v6.2.
- Patch 5: not related to deny-join-id0, but it fixes errors messages in
the sockopt selftests, not to create confusions. A fix for v6.5.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
selftests: mptcp: sockopt: fix error messages
Matthieu Baerts (NGI0) (4):
mptcp: set remote_deny_join_id0 on SYN recv
mptcp: pm: nl: announce deny-join-id0 flag
selftests: mptcp: userspace pm: validate deny-join-id0 flag
mptcp: tfo: record 'deny join id0' info
Documentation/netlink/specs/mptcp_pm.yaml | 4 ++--
include/uapi/linux/mptcp.h | 2 ++
include/uapi/linux/mptcp_pm.h | 4 ++--
net/mptcp/options.c | 6 +++---
net/mptcp/pm_netlink.c | 7 +++++++
net/mptcp/subflow.c | 4 ++++
tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 16 ++++++++++------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 7 +++++++
tools/testing/selftests/net/mptcp/userspace_pm.sh | 14 +++++++++++---
9 files changed, 48 insertions(+), 16 deletions(-)
---
base-commit: 2690cb089502b80b905f2abdafd1bf2d54e1abef
change-id: 20250912-net-mptcp-pm-uspace-deny_join_id0-b6111e4e7e69
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
[ based on kvm/next ]
Implement guest_memfd allocation and population via the write syscall.
This is useful in non-CoCo use cases where the host can access guest
memory. Even though the same can also be achieved via userspace mapping
and memcpying from userspace, write provides a more performant option
because it does not need to set page tables and it does not cause a page
fault for every page like memcpy would. Note that memcpy cannot be
accelerated via MADV_POPULATE_WRITE as it is not supported by
guest_memfd and relies on GUP.
Populating 512MiB of guest_memfd on a x86 machine:
- via memcpy: 436 ms
- via write: 202 ms (-54%)
v5:
- Replace the call to the unexported filemap_remove_folio with
zeroing the bytes that could not be copied
- Fix checkpatch findings
v4:
- https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
- Switch from implementing the write callback to write_iter
- Remove conditional compilation
v3:
- https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
- David/Mike D: Only compile support for the write syscall if
CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.
v2:
- https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
- Switch from an ioctl to the write syscall to implement population
v1:
- https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com
Nikita Kalyazin (2):
KVM: guest_memfd: add generic population via write
KVM: selftests: update guest_memfd write tests
.../testing/selftests/kvm/guest_memfd_test.c | 86 +++++++++++++++++--
virt/kvm/guest_memfd.c | 62 ++++++++++++-
2 files changed, 141 insertions(+), 7 deletions(-)
base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
--
2.50.1
After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.
Correct this by additionally testing for the bond being empty before
randomizing the MAC.
Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
drivers/net/bonding/bond_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..8832bc9f107b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2132,6 +2132,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
memcpy(ss.__data, bond_dev->dev_addr, bond_dev->addr_len);
} else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW &&
BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+ bond_has_slaves(bond) &&
memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) {
/* Set slave to random address to avoid duplicate mac
* address in later fail over.
--
2.50.1
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v18 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28, and it
will be RFC9768.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v18 (11-Sep-2025)
- Reorder tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to avoid adding fields in the middle of tcp_info (Eric Dumazet <edumazet(a)google.com>)
v17 (8-Sep-2025)
- Change tcp_ecn_mode_max from 5 to 2 to disable AccECN enablement before the whole AccECN feature been accpeted
v16 (6-Sep-2025)
- Use TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_ECN in comments of tcp_ecn_send_syn() (Eric Dumazet <edumazet(a)google.com>)
- Add tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to make tcp_info be multiple of 64 bits in patch #12
v15 (14-Aug-205)
- Update pahole results in commit messages
- Accurate ECN will become RFC9768
v14 (22-Jul-2025)
- Add missing const for struct tcp_sock of tcp_accecn_option_beacon_check() of #11 (Simon Horman <horms(a)kernel.org>)
v13 (18-Jul-2025)
- Implement tcp_accecn_extract_syn_ect() and tcp_accecn_reflector_flags() with static array lookup of patch #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Fix typos in comments of #6 and remove patch #7 of v12 about simulatenous connect (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_ACCECN_E1B_INIT_OFFSET, TCP_ACCECN_E0B_INIT_OFFSET, and TCP_ACCECN_CEB_INIT_OFFSET from patch #7 to #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use static array lookup in tcp_accecn_optfield_to_ecnfield() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return false when WARN_ON_ONCE() is true in tcp_accecn_process_option() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Make synack_ecn_bytes as static const array and use const u32 pointer in tcp_options_write() of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use ALIGN() and ALIGN_DOWN() in tcp_options_fit_accecn() to pad TCP AccECN option to dword of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return TCP_ACCECN_OPT_FAIL_SEEN if WARN_ON_ONCE() is true in tcp_accecn_option_init() of #12 (Paolo Abeni <pabeni(a)redhat.com>)
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (5):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 12 +
include/linux/tcp.h | 32 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 642 ++++++++++++++++++
include/uapi/linux/tcp.h | 9 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 353 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 294 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1406 insertions(+), 184 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
For systems having CONFIG_NR_CPUS set to > 1024 in kernel config
the selftest fails as arena_spin_lock_irqsave() returns EOPNOTSUPP.
(eg - incase of powerpc default value for CONFIG_NR_CPUS is 8192)
The selftest is skipped incase bpf program returns EOPNOTSUPP,
with a descriptive message logged.
Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com>
Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
Changes since v2:
* Separated arena_spin_lock selftest fix patch from the arena
patchset as it has to go via bpf-next tree.
* For EOPNOTSUPP set test_skip to 3, to differentiate it from
scenarios when run conditions are not met as suggested by Hari.
* Tweaked message displayed on SKIP to remove display of online
cpus.
v2:https://lore.kernel.org/all/20250829165135.1273071-1-skb99@linux.ibm.com/
Changes since v1:
Addressed comments from Alexei:
* Removed skel->rodata->nr_cpus = get_nprocs() and its usage to get
currently online cpus(as it needs to be updated from userspace).
v1:https://lore.kernel.org/all/20250805062747.3479221-1-skb99@linux.ibm.com/
---
.../selftests/bpf/prog_tests/arena_spin_lock.c | 13 +++++++++++++
tools/testing/selftests/bpf/progs/arena_spin_lock.c | 5 ++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/arena_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/arena_spin_lock.c
index 0223fce4db2b..693fd86fbde6 100644
--- a/tools/testing/selftests/bpf/prog_tests/arena_spin_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/arena_spin_lock.c
@@ -40,8 +40,13 @@ static void *spin_lock_thread(void *arg)
err = bpf_prog_test_run_opts(prog_fd, &topts);
ASSERT_OK(err, "test_run err");
+
+ if (topts.retval == -EOPNOTSUPP)
+ goto end;
+
ASSERT_EQ((int)topts.retval, 0, "test_run retval");
+end:
pthread_exit(arg);
}
@@ -63,6 +68,7 @@ static void test_arena_spin_lock_size(int size)
skel = arena_spin_lock__open_and_load();
if (!ASSERT_OK_PTR(skel, "arena_spin_lock__open_and_load"))
return;
+
if (skel->data->test_skip == 2) {
test__skip();
goto end;
@@ -86,6 +92,13 @@ static void test_arena_spin_lock_size(int size)
goto end_barrier;
}
+ if (skel->data->test_skip == 3) {
+ printf("%s:SKIP: CONFIG_NR_CPUS exceed the maximum supported by arena spinlock\n",
+ __func__);
+ test__skip();
+ goto end_barrier;
+ }
+
ASSERT_EQ(skel->bss->counter, repeat * nthreads, "check counter value");
end_barrier:
diff --git a/tools/testing/selftests/bpf/progs/arena_spin_lock.c b/tools/testing/selftests/bpf/progs/arena_spin_lock.c
index c4500c37f85e..086b57a426cf 100644
--- a/tools/testing/selftests/bpf/progs/arena_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/arena_spin_lock.c
@@ -37,8 +37,11 @@ int prog(void *ctx)
#if defined(ENABLE_ATOMICS_TESTS) && defined(__BPF_FEATURE_ADDR_SPACE_CAST)
unsigned long flags;
- if ((ret = arena_spin_lock_irqsave(&lock, flags)))
+ if ((ret = arena_spin_lock_irqsave(&lock, flags))) {
+ if (ret == -EOPNOTSUPP)
+ test_skip = 3;
return ret;
+ }
if (counter != limit)
counter++;
bpf_repeat(cs_count);
--
2.43.5
Various KUnit tests require PCI infrastructure to work. All normal
platforms enable PCI by default, but UML does not. Enabling PCI from
.kunitconfig files is problematic as it would not be portable. So in
commit 6fc3a8636a7b ("kunit: tool: Enable virtio/PCI by default on UML")
PCI was enabled by way of CONFIG_UML_PCI_OVER_VIRTIO=y. However
CONFIG_UML_PCI_OVER_VIRTIO requires additional configuration of
CONFIG_UML_PCI_OVER_VIRTIO_DEVICE_ID or will otherwise trigger a WARN() in
virtio_pcidev_init(). However there is no one correct value for
UML_PCI_OVER_VIRTIO_DEVICE_ID which could be used by default.
This warning is confusing when debugging test failures.
On the other hand, the functionality of CONFIG_UML_PCI_OVER_VIRTIO is not
used at all, given that it is completely non-functional as indicated by
the WARN() in question. Instead it is only used as a way to enable
CONFIG_UML_PCI which itself is not directly configurable.
Instead of going through CONFIG_UML_PCI_OVER_VIRTIO, introduce a custom
configuration option which enables CONFIG_UML_PCI without triggering
warnings or building dead code.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Reviewed-by: Johannes Berg <johannes(a)sipsolutions.net>
---
Changes in v2:
- Rebase onto v6.17-rc1
- Pick up review from Johannes
- Link to v1: https://lore.kernel.org/r/20250627-kunit-uml-pci-v1-1-a622fa445e58@linutron…
---
lib/kunit/Kconfig | 7 +++++++
tools/testing/kunit/configs/arch_uml.config | 5 ++---
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig
index c10ede4b1d2201d5f8cddeb71cc5096e21be9b6a..1823539e96da30e165fa8d395ccbd3f6754c836e 100644
--- a/lib/kunit/Kconfig
+++ b/lib/kunit/Kconfig
@@ -106,4 +106,11 @@ config KUNIT_DEFAULT_TIMEOUT
If unsure, the default timeout of 300 seconds is suitable for most
cases.
+config KUNIT_UML_PCI
+ bool "KUnit UML PCI Support"
+ depends on UML
+ select UML_PCI
+ help
+ Enables the PCI subsystem on UML for use by KUnit tests.
+
endif # KUNIT
diff --git a/tools/testing/kunit/configs/arch_uml.config b/tools/testing/kunit/configs/arch_uml.config
index 54ad8972681a2cc724e6122b19407188910b9025..28edf816aa70e6f408d9486efff8898df79ee090 100644
--- a/tools/testing/kunit/configs/arch_uml.config
+++ b/tools/testing/kunit/configs/arch_uml.config
@@ -1,8 +1,7 @@
# Config options which are added to UML builds by default
-# Enable virtio/pci, as a lot of tests require it.
-CONFIG_VIRTIO_UML=y
-CONFIG_UML_PCI_OVER_VIRTIO=y
+# Enable pci, as a lot of tests require it.
+CONFIG_KUNIT_UML_PCI=y
# Enable FORTIFY_SOURCE for wider checking.
CONFIG_FORTIFY_SOURCE=y
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250626-kunit-uml-pci-a2b687553746
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
This series is trimmed down version of previous more generic series[1].
In this new series, only -wunreachable-code flag is being added and dead
code is being removed from generated warnings.
[1] https://lore.kernel.org/all/20250822082145.4145617-1-usama.anjum@collabora.…
Muhammad Usama Anjum (2):
selftests/mm: Add -Wunreachable-code and fix warnings
selftests/mm: protection_keys: Fix dead code
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hmm-tests.c | 5 ++---
tools/testing/selftests/mm/pkey_sighandler_tests.c | 2 +-
tools/testing/selftests/mm/protection_keys.c | 4 +---
tools/testing/selftests/mm/split_huge_page_test.c | 2 +-
5 files changed, 6 insertions(+), 8 deletions(-)
--
2.47.3
[ I think at this point everyone is OK with the ABI, and the x86
implementation has been tested so hopefully we are near to being
able to get this merged? If there are any outstanding issues let
me know and I can look at addressing them. The one possible issue
I am aware of is that the RISC-V shadow stack support was briefly
in -next but got dropped along with the general RISC-V issues during
the last merge window, rebasing for that is still in progress. I
guess ideally this could be applied on a branch and then pulled into
the RISC-V tree? ]
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if it's
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@k…
Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@k…
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs mount is
auto-selected based on the mounting process's active pidns, and the
pidns itself is basically hidden once the mount has been constructed).
/* pidns mount option for procfs */
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns of a
procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes creates
a procfs mount from an empty pidns so that user namespaced containers
can be nested (without this, the nested containers would fail to
mount procfs). But this requires forking off a helper process because
you cannot just one-shot this using mount(2).
* Container runtimes in general need to fork into a container before
configuring its mounts, which can lead to security issues in the case
of shared-pidns containers (a privileged process in the pidns can
interact with your container runtime process). While
SUID_DUMP_DISABLE and user namespaces make this less of an issue, the
strict need for this due to a minor uAPI wart is kind of unfortunate.
Things would be much easier if there was a way for userspace to just
specify the pidns they want. Patch 1 implements a new "pidns" argument
which can be set using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
The initial security model I have in this RFC is to be as conservative
as possible and just mirror the security model for setns(2) -- which
means that you can only set pidns=... to pid namespaces that your
current pid namespace is a direct ancestor of and you have CAP_SYS_ADMIN
privileges over the pid namespace. This fulfils the requirements of
container runtimes, but I suspect that this may be too strict for some
usecases.
The pidns argument is not displayed in mountinfo -- it's not clear to me
what value it would make sense to show (maybe we could just use ns_dname
to provide an identifier for the namespace, but this number would be
fairly useless to userspace). I'm open to suggestions. Note that
PROCFS_GET_PID_NAMESPACE (see below) does at least let userspace get
information about this outside of mountinfo.
Note that you cannot change the pidns of an already-created procfs
instance. The primary reason is that allowing this to be changed would
require RCU-protecting proc_pid_ns(sb) and thus auditing all of
fs/proc/* and some of the users in fs/* to make sure they wouldn't UAF
the pid namespace. Since creating procfs instances is very cheap, it
seems unnecessary to overcomplicate this upfront. Trying to reconfigure
procfs this way errors out with -EBUSY.
/* ioctl(PROCFS_GET_PID_NAMESPACE) */
In addition, being able to figure out what pid namespace is being used
by a procfs mount is quite useful when you have an administrative
process (such as a container runtime) which wants to figure out the
correct way of mapping PIDs between its own namespace and the namespace
for procfs (using NS_GET_{PID,TGID}_{IN,FROM}_PIDNS). There are
alternative ways to do this, but they all rely on ancillary information
that third-party libraries and tools do not necessarily have access to.
To make this easier, add a new ioctl (PROCFS_GET_PID_NAMESPACE) which
can be used to get a reference to the pidns that a procfs is using.
Rather than copying the (fairly strict) security model for setns(2),
apply a slightly looser model to better match what userspace can already
do:
* Make the ioctl only valid on the root (meaning that a process without
access to the procfs root -- such as only having an fd to a procfs
file or some open_tree(2)-like subset -- cannot use this API). This
means that the process already has some level of access to the
/proc/$pid directories.
* If the calling process is in an ancestor pidns, then they can already
create pidfd for processes inside the pidns, which is morally
equivalent to a pidns file descriptor according to setns(2). So it
seems reasonable to just allow it in this case. (The justification
for this model was suggested by Christian.)
* If the process has access to /proc/1/ns/pid already (i.e. has
ptrace-read access to the pidns pid1), then this ioctl is equivalent
to just opening a handle to it that way.
Ideally we would check for ptrace-read access against all processes
in the pidns (which is very likely to be true for at least one
process, as SUID_DUMP_DISABLE is cleared on exec(2) and is rarely set
by most programs), but this would obviously not scale.
I'm open to suggestions for whether we need to make this stricter (or
possibly allow more cases).
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v4:
- Remove unneeded EXPORT_SYMBOL_GPL. [Christian Brauner]
- Return -EOPNOTSUPP for new APIs for CONFIG_PID_NS=n rather than
pretending they don't exist entirely. [Christian Brauner]
- PROCFS_IOCTL_MAGIC conflicts with XSDFEC_MAGIC, so we need to allocate
subvalues more carefully (switch to _IO(PROCFS_IOCTL_MAGIC, 32)).
- Add some more selftests for PROCFS_GET_PID_NAMESPACE.
- Reword argument for PROCFS_GET_PID_NAMESPACE security model based on
Christian's suggestion, and remove CAP_SYS_ADMIN edge-case (in most
cases, such a process would also have ptrace-read credentials over the
pidns pid1).
- v3: <https://lore.kernel.org/r/20250724-procfs-pidns-api-v3-0-4c685c910923@cypha…>
Changes in v3:
- Disallow changing pidns for existing procfs instances, as we'd
probably have to RCU-protect everything that touches the pinned pidns
reference.
- Improve tests with slightly nicer ASSERT_ERRNO* macros.
- v2: <https://lore.kernel.org/r/20250723-procfs-pidns-api-v2-0-621e7edd8e40@cypha…>
Changes in v2:
- #ifdef CONFIG_PID_NS
- Improve cover letter wording to make it clear we're talking about two
separate features with different permission models. [Andy Lutomirski]
- Fix build warnings in pidns_is_ancestor() patch. [kernel test robot]
- v1: <https://lore.kernel.org/r/20250721-procfs-pidns-api-v1-0-5cd9007e512d@cypha…>
---
Aleksa Sarai (4):
pidns: move is-ancestor logic to helper
procfs: add "pidns" mount option
procfs: add PROCFS_GET_PID_NAMESPACE ioctl
selftests/proc: add tests for new pidns APIs
Documentation/filesystems/proc.rst | 12 ++
fs/proc/root.c | 166 +++++++++++++++-
include/linux/pid_namespace.h | 9 +
include/uapi/linux/fs.h | 4 +
kernel/pid_namespace.c | 22 ++-
tools/testing/selftests/proc/.gitignore | 1 +
tools/testing/selftests/proc/Makefile | 1 +
tools/testing/selftests/proc/proc-pidns.c | 315 ++++++++++++++++++++++++++++++
8 files changed, 514 insertions(+), 16 deletions(-)
---
base-commit: 66639db858112bf6b0f76677f7517643d586e575
change-id: 20250717-procfs-pidns-api-8ed1583431f0
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
From: Benjamin Berg <benjamin.berg(a)intel.com>
For a while now, we have discussed that it may be better to avoid using
libc inside UML as it may be interfering in unexpected ways with kernel
functionality. A major point of concern is that there is no guarantee
that the libc is not using any address space that may conflict with
kernel addresses.
This patchset is an attempt to start a nolibc port of UML. The goal is
to port UML to use nolibc in smaller chunks to make the switch more
manageable.
There are three parts to this patchset:
* Two patches to use tools/include headers instead of kernel headers
for userspace files.
* A few nolibc fixes and a new NOLIBC_NO_STARTCODE compile flag for it
* Finally nolibc build support for UML and switching two files
The first two parts could be merged independently. The last step to use
nolibc inside UML obviously depends on the first two.
Benjamin
Benjamin Berg (9):
tools compiler.h: fix __used definition
um: use tools/include for user files
tools/nolibc/stdio: remove perror if NOLIBC_IGNORE_ERRNO is set
tools/nolibc/dirent: avoid errno in readdir_r
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: add option to disable startup code
um: add infrastructure to build files using nolibc
um: use nolibc for the --showconfig implementation
um: switch ptrace FP register access to nolibc
arch/um/Makefile | 32 ++++++++++++++++---
.../um/include/shared/generated/asm-offsets.h | 1 +
.../include/shared/generated/user_constants.h | 1 +
arch/um/include/shared/init.h | 2 +-
arch/um/include/shared/os.h | 2 ++
arch/um/include/shared/user.h | 5 ---
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/skas/stub.c | 1 +
arch/um/kernel/skas/stub_exe.c | 4 +--
arch/um/os-Linux/skas/process.c | 6 ++--
arch/um/os-Linux/start_up.c | 4 +--
arch/um/scripts/Makefile.rules | 10 ++++--
arch/x86/um/Makefile | 6 ++--
arch/x86/um/os-Linux/Makefile | 5 ++-
arch/x86/um/os-Linux/registers.c | 22 +++++--------
arch/x86/um/user-offsets.c | 1 -
tools/include/linux/compiler.h | 2 +-
tools/include/nolibc/arch-arm.h | 2 ++
tools/include/nolibc/arch-arm64.h | 2 ++
tools/include/nolibc/arch-loongarch.h | 2 ++
tools/include/nolibc/arch-m68k.h | 2 ++
tools/include/nolibc/arch-mips.h | 2 ++
tools/include/nolibc/arch-powerpc.h | 2 ++
tools/include/nolibc/arch-riscv.h | 2 ++
tools/include/nolibc/arch-s390.h | 2 ++
tools/include/nolibc/arch-sh.h | 2 ++
tools/include/nolibc/arch-sparc.h | 2 ++
tools/include/nolibc/arch-x86.h | 4 +++
tools/include/nolibc/compiler.h | 4 +--
tools/include/nolibc/crt.h | 3 ++
tools/include/nolibc/dirent.h | 6 ++--
tools/include/nolibc/stackprotector.h | 2 ++
tools/include/nolibc/stdio.h | 2 ++
tools/include/nolibc/stdlib.h | 2 ++
tools/include/nolibc/sys.h | 3 +-
tools/include/nolibc/sys/auxv.h | 3 ++
36 files changed, 108 insertions(+), 47 deletions(-)
create mode 120000 arch/um/include/shared/generated/asm-offsets.h
create mode 120000 arch/um/include/shared/generated/user_constants.h
--
2.51.0
On Sun, Sep 14, 2025 at 6:24 AM Chris Mason <clm(a)meta.com> wrote:
>
> On Fri, 8 Aug 2025 08:28:49 -0700 Suren Baghdasaryan <surenb(a)google.com> wrote:
>
> > Utilize per-vma locks to stabilize vma after lookup without taking
> > mmap_lock during PROCMAP_QUERY ioctl execution. If vma lock is
> > contended, we fall back to mmap_lock but take it only momentarily
> > to lock the vma and release the mmap_lock. In a very unlikely case
> > of vm_refcnt overflow, this fall back path will fail and ioctl is
> > done under mmap_lock protection.
> >
> > This change is designed to reduce mmap_lock contention and prevent
> > PROCMAP_QUERY ioctl calls from blocking address space updates.
> >
> > Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
> > Acked-by: SeongJae Park <sj(a)kernel.org>
> > ---
> > fs/proc/task_mmu.c | 103 +++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 85 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index c0968d293b61..e64cf40ce9c4 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -132,6 +132,12 @@ static void release_task_mempolicy(struct proc_maps_private *priv)
>
> [ ... ]
>
> > +static struct vm_area_struct *query_vma_find_by_addr(struct proc_maps_locking_ctx *lock_ctx,
> > + unsigned long addr)
> > +{
> > + struct mm_struct *mm = lock_ctx->mm;
> > + struct vm_area_struct *vma;
> > + struct vma_iterator vmi;
> > +
> > + if (lock_ctx->mmap_locked)
> > + return find_vma(mm, addr);
> > +
> > + /* Unlock previously locked VMA and find the next one under RCU */
> > + unlock_ctx_vma(lock_ctx);
> > + rcu_read_lock();
> > + vma_iter_init(&vmi, mm, addr);
> > + vma = lock_next_vma(mm, &vmi, addr);
> > + rcu_read_unlock();
> > +
> > + if (!vma)
> > + return NULL;
> > +
> > + if (!IS_ERR(vma)) {
> > + lock_ctx->locked_vma = vma;
> > + return vma;
> > + }
> > +
> > + if (PTR_ERR(vma) == -EAGAIN) {
> > + /* Fallback to mmap_lock on vma->vm_refcnt overflow */
> > + mmap_read_lock(mm);
>
> I know it's just a (very rare) fallback, but should we be using
> mmap_read_lock_killable() for consistency? I can see this impacting oom
> kills or other times we really want to be able to get rid of procs.
That's a good idea. From a quick look it seems safe to fail with
-EINTR here, which will propagate all the way to do_procmap_query().
Do you want to post a fixup patch?
Thanks,
Suren.
>
> -chris
Two patches here, first fixes the issue where tunnel core doesn't
actually extract DF bit from the outer IP header, even though both
OVS and TC flower allow matching on it. More details in the commit
message.
The second is a selftest for openvswitch that reproduces the issue,
but also just adds some basic coverage for the tunnel metadata
extraction and related openvswitch uAPI.
Version 2:
* Added missing tun_dst NULL check.
* Added Reviewed-by from Aaron for the selftest.
Version 1:
https://lore.kernel.org/netdev/20250905133105.3940420-1-i.maximets@ovn.org/
Ilya Maximets (2):
net: dst_metadata: fix IP_DF bit not extracted from tunnel headers
selftests: openvswitch: add a simple test for tunnel metadata
include/net/dst_metadata.h | 11 ++-
.../selftests/net/openvswitch/openvswitch.sh | 88 +++++++++++++++++--
2 files changed, 90 insertions(+), 9 deletions(-)
--
2.50.1
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for managing GCS for KVM guests.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v16:
- Rebase onto v6.17-rc3.
- Also expose the feature to nested guests.
- Implement emulation of EXLOCK when returning from nested guests.
- Rename enter_exception_gcs() to compute_exlock().
- Move all ID_AA64PFR1_EL1 handling to the final kernel patch.
- Drop unneeded forwarding of GCS exceptions.
- Commit and cover message updates.
- Link to v15: https://lore.kernel.org/r/20250820-arm64-gcs-v15-0-5e334da18b84@kernel.org
Changes in v15:
- Rebase onto v6.17-rc1.
- Link to v14: https://lore.kernel.org/r/20241005-arm64-gcs-v14-0-59060cd6092b@kernel.org
Changes in v14:
- Rebase onto arm64/for-next/gcs which includes all the non-KVM support.
- Manage the fine grained traps for GCS instructions.
- Manage PSTATE.EXLOCK when delivering exceptions to KVM guests.
- Link to v13: https://lore.kernel.org/r/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing
bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change ensure that it is
initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org
Changes in v12:
- Clarify and simplify the signal handling code so we work with the
register state.
- When checking for write aborts to shadow stack pages ensure the fault
is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org
Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow
stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of
if we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove rdundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org
Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org
Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (6):
arm64/gcs: Ensure FGTs for EL1 GCS instructions are disabled
KVM: arm64: Manage GCS access and registers for guests
KVM: arm64: Set PSTATE.EXLOCK when entering an exception
KVM: arm64: Validate GCS exception lock when emulating ERET
KVM: arm64: Allow GCS to be enabled for guests
KVM: selftests: arm64: Add GCS registers to get-reg-list
arch/arm64/include/asm/el2_setup.h | 5 +++
arch/arm64/include/asm/kvm_emulate.h | 3 ++
arch/arm64/include/asm/kvm_host.h | 14 +++++++++
arch/arm64/include/asm/vncr_mapping.h | 2 ++
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kvm/emulate-nested.c | 40 +++++++++++++++++++++++-
arch/arm64/kvm/hyp/exception.c | 37 ++++++++++++++++++++++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 31 ++++++++++++++++++
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 10 ++++++
arch/arm64/kvm/nested.c | 7 +++--
arch/arm64/kvm/sys_regs.c | 32 +++++++++++++++++--
tools/testing/selftests/kvm/arm64/get-reg-list.c | 12 +++++++
12 files changed, 188 insertions(+), 6 deletions(-)
---
base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The futex_numa_mpol test requires libnuma, which is not available on
all platforms. When the test is not built, the run.sh script fails
because it unconditionally tries to execute the test binary.
Check for the futex_numa_mpol executable before running it. If the
binary is not present, print a skip message and continue.
This allows the test suite to run successfully on platforms that do
not have libnuma and therefore do not build the futex_numa_mpol
test.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/futex/functional/run.sh | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/run.sh b/tools/testing/selftests/futex/functional/run.sh
index 81739849f299..f3e43eb806bf 100755
--- a/tools/testing/selftests/futex/functional/run.sh
+++ b/tools/testing/selftests/futex/functional/run.sh
@@ -88,4 +88,8 @@ echo
./futex_priv_hash -g $COLOR
echo
-./futex_numa_mpol $COLOR
+if [ -x ./futex_numa_mpol ]; then
+ ./futex_numa_mpol $COLOR
+else
+ echo "SKIP: futex_numa_mpol (not built)"
+fi
--
2.51.0.355.g5224444f11-goog
Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
Since HugeTLB pages are preallocated, reducing the available HugeTLB
page pool can cause allocation failures.
/proc/sys/vm/enable_soft_offline provides a sysctl interface to
disable/enable soft offline:
0 - Soft offline is disabled.
1 - Soft offline is enabled.
The current sysctl interface does not distinguish between HugeTLB pages
and other page types.
Disable soft offline for HugeTLB pages by default (1) and extend the
sysctl interface to preserve existing behavior (2):
0 - Soft offline is disabled.
1 - Soft offline is enabled (excluding HugeTLB pages).
2 - Soft offline is enabled (including HugeTLB pages).
Update documentation for the sysctl interface, reference the sysctl
interface in the sysfs ABI documentation, and update HugeTLB soft
offline selftests.
Reported-by: Shawn Fan <shawn.fan(a)intel.com>
Suggested-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Kyle Meyer <kyle.meyer(a)hpe.com>
---
Tony's original patch disabled soft offline for HugeTLB pages when
a correctable memory error reported via GHES (with "error threshold
exceeded" set) happened to be on a HugeTLB page:
https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
This patch disables soft offline for HugeTLB pages by default
(not just from GHES).
---
.../ABI/testing/sysfs-memory-page-offline | 6 ++++
Documentation/admin-guide/sysctl/vm.rst | 18 ++++++++---
mm/memory-failure.c | 21 ++++++++++--
.../selftests/mm/hugetlb-soft-offline.c | 32 +++++++++++++------
4 files changed, 60 insertions(+), 17 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index 00f4e35f916f..befb89ae39ec 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -20,6 +20,12 @@ Description:
number, or a error when the offlining failed. Reading
the file is not allowed.
+ Soft-offline can be disabled/enabled via sysctl:
+ /proc/sys/vm/enable_soft_offline
+
+ For details, see:
+ Documentation/admin-guide/sysctl/vm.rst
+
What: /sys/devices/system/memory/hard_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..ae56372bd604 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -309,19 +309,29 @@ physical memory) vs performance / capacity implications in transparent and
HugeTLB cases.
For all architectures, enable_soft_offline controls whether to soft offline
-memory pages. When set to 1, kernel attempts to soft offline the pages
-whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
-the request to soft offline the pages. Its default value is 1.
+memory pages:
+
+- 0: Soft offline is disabled.
+- 1: Soft offline is enabled (excluding HugeTLB pages).
+- 2: Soft offline is enabled (including HugeTLB pages).
+
+The default is 1.
+
+If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:
+- Request to soft offline from sysfs (soft_offline_page).
+
- Request to soft offline pages from RAS Correctable Errors Collector.
-- On ARM, the request to soft offline pages from GHES driver.
+- On ARM and X86, the request to soft offline pages from GHES driver.
- On PARISC, the request to soft offline pages from Page Deallocation Table.
+Note: Soft offlining a HugeTLB page reduces the HugeTLB page pool.
+
extfrag_threshold
=================
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804bf..cb59a99b48c5 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -64,11 +64,18 @@
#include "internal.h"
#include "ras/ras_event.h"
+enum soft_offline {
+ SOFT_OFFLINE_DISABLED = 0,
+ SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
+ SOFT_OFFLINE_ENABLED
+};
+
static int sysctl_memory_failure_early_kill __read_mostly;
static int sysctl_memory_failure_recovery __read_mostly = 1;
-static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_enable_soft_offline __read_mostly =
+ SOFT_OFFLINE_ENABLED_SKIP_HUGETLB;
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
@@ -150,7 +157,7 @@ static const struct ctl_table memory_failure_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_TWO,
}
};
@@ -2799,12 +2806,20 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}
- if (!sysctl_enable_soft_offline) {
+ if (sysctl_enable_soft_offline == SOFT_OFFLINE_DISABLED) {
pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
put_ref_page(pfn, flags);
return -EOPNOTSUPP;
}
+ if (sysctl_enable_soft_offline == SOFT_OFFLINE_ENABLED_SKIP_HUGETLB) {
+ if (folio_test_hugetlb(pfn_folio(pfn))) {
+ pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
+ put_ref_page(pfn, flags);
+ return -EOPNOTSUPP;
+ }
+ }
+
mutex_lock(&mf_mutex);
if (PageHWPoison(page)) {
diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
index f086f0e04756..7e2873cd0a6d 100644
--- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
+++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
@@ -1,10 +1,15 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Test soft offline behavior for HugeTLB pages:
- * - if enable_soft_offline = 0, hugepages should stay intact and soft
- * offlining failed with EOPNOTSUPP.
- * - if enable_soft_offline = 1, a hugepage should be dissolved and
- * nr_hugepages/free_hugepages should be reduced by 1.
+ *
+ * - if enable_soft_offline = 0 (SOFT_OFFLINE_DISABLED), HugeTLB pages
+ * should stay intact and soft offlining failed with EOPNOTSUPP.
+ *
+ * - if enable_soft_offline = 1 (SOFT_OFFLINE_ENABLED_SKIP_HUGETLB), HugeTLB pages
+ * should stay intact and soft offlining failed with EOPNOTSUPP.
+ *
+ * - if enable_soft_offline = 2 (SOFT_OFFLINE_ENABLED), a HugeTLB page should be
+ * dissolved and nr_hugepages/free_hugepages should be reduced by 1.
*
* Before running, make sure more than 2 hugepages of default_hugepagesz
* are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
@@ -32,6 +37,12 @@
#define EPREFIX " !!! "
+enum soft_offline {
+ SOFT_OFFLINE_DISABLED = 0,
+ SOFT_OFFLINE_ENABLED_SKIP_HUGETLB,
+ SOFT_OFFLINE_ENABLED
+};
+
static int do_soft_offline(int fd, size_t len, int expect_errno)
{
char *filemap = NULL;
@@ -83,7 +94,7 @@ static int set_enable_soft_offline(int value)
char cmd[256] = {0};
FILE *cmdfile = NULL;
- if (value != 0 && value != 1)
+ if (value < SOFT_OFFLINE_DISABLED || value > SOFT_OFFLINE_ENABLED)
return -EINVAL;
sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
@@ -155,7 +166,7 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
static void test_soft_offline_common(int enable_soft_offline)
{
int fd;
- int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
+ int expect_errno = (enable_soft_offline == SOFT_OFFLINE_ENABLED) ? 0 : EOPNOTSUPP;
struct statfs file_stat;
unsigned long hugepagesize_kb = 0;
unsigned long nr_hugepages_before = 0;
@@ -198,7 +209,7 @@ static void test_soft_offline_common(int enable_soft_offline)
// No need for the hugetlbfs file from now on.
close(fd);
- if (enable_soft_offline) {
+ if (enable_soft_offline == SOFT_OFFLINE_ENABLED) {
if (nr_hugepages_before != nr_hugepages_after + 1) {
ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
return;
@@ -219,10 +230,11 @@ static void test_soft_offline_common(int enable_soft_offline)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(2);
+ ksft_set_plan(3);
- test_soft_offline_common(1);
- test_soft_offline_common(0);
+ test_soft_offline_common(SOFT_OFFLINE_ENABLED);
+ test_soft_offline_common(SOFT_OFFLINE_ENABLED_SKIP_HUGETLB);
+ test_soft_offline_common(SOFT_OFFLINE_DISABLED);
ksft_finished();
}
--
2.51.0
This macro gets used in different tests. Add it to kselftest.h
which is central location and tests use this header. Then use this new
macro.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/kselftest.h | 4 ++++
tools/testing/selftests/mm/protection_keys.c | 2 +-
tools/testing/selftests/net/ovpn/ovpn-cli.c | 3 ++-
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 661d31c4b558c..274480e3573ab 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -92,6 +92,10 @@
#endif
#define __printf(a, b) __attribute__((format(printf, a, b)))
+#ifndef __always_unused
+#define __always_unused __attribute__((__unused__))
+#endif
+
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index 6281d4c61b50e..2085982dba696 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -1302,7 +1302,7 @@ static void test_mprotect_with_pkey_0(int *ptr, u16 pkey)
static void test_ptrace_of_child(int *ptr, u16 pkey)
{
- __attribute__((__unused__)) int peek_result;
+ __always_unused int peek_result;
pid_t child_pid;
void *ignored = 0;
long ret;
diff --git a/tools/testing/selftests/net/ovpn/ovpn-cli.c b/tools/testing/selftests/net/ovpn/ovpn-cli.c
index 9201f2905f2ce..688a5fa6fdacd 100644
--- a/tools/testing/selftests/net/ovpn/ovpn-cli.c
+++ b/tools/testing/selftests/net/ovpn/ovpn-cli.c
@@ -32,9 +32,10 @@
#include <sys/socket.h>
+#include "../../kselftest.h"
+
/* defines to make checkpatch happy */
#define strscpy strncpy
-#define __always_unused __attribute__((__unused__))
/* libnl < 3.5.0 does not set the NLA_F_NESTED on its own, therefore we
* have to explicitly do it to prevent the kernel from failing upon
--
2.47.3
The three patches fix the va_high_addr_switch.sh test failure on x86_64.
Patch 1 fixes the hugepage setup issue that nr_hugepages is reset too
early in run_vmtests.sh and break the later va_high_addr_switch testing.
Patch 2 adds hugepage setup in va_high_addr_switch test, so that it can
still work if vm_runtests.sh changes the hugepage setup someday.
Patch 3 fixes the test failure caused by the hint addr align method change
in hugetlb_get_unmapped_area().
Changes in v3:
- patch 1 adds the Acked-by from David
- patch 3 changes the mmap hint addr to hugepage aligned from page aligned
Changes in v2:
- patch 1 renames nr_hugepgs_origin to orig_nr_hugepgs
- add a patch 2 to setup hugeapges in va_high_addr_switch test
Chunyu Hu (3):
selftests/mm: fix hugepages cleanup too early
selftests/mm: alloc hugepages in va_high_addr_switch test
selftests/mm: fix va_high_addr_switch.sh failure on x86_64
tools/testing/selftests/mm/run_vmtests.sh | 9 ++++-
.../selftests/mm/va_high_addr_switch.c | 4 +-
.../selftests/mm/va_high_addr_switch.sh | 37 +++++++++++++++++++
3 files changed, 46 insertions(+), 4 deletions(-)
--
2.49.0
Hi Linux-kselftest,
Please provide a quote for your products:
Include:
1.Pricing (per unit)
2.Delivery cost & timeline
3.Quote expiry date
Deadline: September
Thanks!
Kamal Prasad
Albinayah Trading
Some high-level virtual drivers need to compute features from their
lower devices, but each currently has its own implementation and may
miss some feature computations. This patch set introduces a common function
to compute features for such devices.
Currently, bonding, team, and bridge have been updated to use the new
helper.
v3:
a) fix hw_enc_features asign order (Sabrina Dubroca)
b) set virtual dev feature defination in netdev_features.h (Jakub Kicinski)
c) remove unneeded err in team_del_slave (Stanislav Fomichev)
d) remove NETIF_F_HW_ESP test as it needs to be test with GSO pkts (Sabrina Dubroca)
v2:
a) remove hard_header_len setting. I will set needed_headroom for bond/team
in a separate patch as bridge has it's own ways. (Ido Schimmel)
b) Add test file to Makefile, set RET=0 to a proper location. (Ido Schimmel)
Hangbin Liu (5):
net: add a common function to compute features from lowers devices
bonding: use common function to compute the features
team: use common function to compute the features
net: bridge: use common function to compute the features
selftests/net: add offload checking test for virtual interface
drivers/net/bonding/bond_main.c | 99 +-------------
drivers/net/team/team_core.c | 78 +----------
include/linux/netdev_features.h | 18 +++
include/linux/netdevice.h | 1 +
net/bridge/br_if.c | 22 +---
net/core/dev.c | 76 +++++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/vdev_offload.sh | 137 ++++++++++++++++++++
9 files changed, 246 insertions(+), 187 deletions(-)
create mode 100755 tools/testing/selftests/net/vdev_offload.sh
--
2.50.1
This patch series introduces support for the PROBE_MEM32,
bpf_addr_space_cast and PROBE_ATOMIC instructions in the powerpc BPF JIT,
facilitating the implementation of BPF arena and arena atomics.
All selftests related to bpf_arena, bpf_arena_atomic(except
load_acquire/store_release) enablement are passing:
# ./test_progs -t arena_list
#5/1 arena_list/arena_list_1:OK
#5/2 arena_list/arena_list_1000:OK
#5 arena_list:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
# ./test_progs -t arena_htab
#4/1 arena_htab/arena_htab_llvm:OK
#4/2 arena_htab/arena_htab_asm:OK
#4 arena_htab:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
# ./test_progs -t verifier_arena
#464/1 verifier_arena/basic_alloc1:OK
#464/2 verifier_arena/basic_alloc2:OK
#464/3 verifier_arena/basic_alloc3:OK
#464/4 verifier_arena/iter_maps1:OK
#464/5 verifier_arena/iter_maps2:OK
#464/6 verifier_arena/iter_maps3:OK
#464 verifier_arena:OK
#465/1 verifier_arena_large/big_alloc1:OK
#465/2 verifier_arena_large/big_alloc2:OK
#465 verifier_arena_large:OK
Summary: 2/8 PASSED, 0 SKIPPED, 0 FAILED
# ./test_progs -t arena_atomics
#3/1 arena_atomics/add:OK
#3/2 arena_atomics/sub:OK
#3/3 arena_atomics/and:OK
#3/4 arena_atomics/or:OK
#3/5 arena_atomics/xor:OK
#3/6 arena_atomics/cmpxchg:OK
#3/7 arena_atomics/xchg:OK
#3/8 arena_atomics/uaf:OK
#3/9 arena_atomics/load_acquire:SKIP
#3/10 arena_atomics/store_release:SKIP
#3 arena_atomics:OK (SKIP: 2/10)
Summary: 1/8 PASSED, 2 SKIPPED, 0 FAILED
Changes since v2:
* Dropped arena_spin_lock selftest fix patch from the patchset as it has
to go via bpf-next while these changes will go via powerpc tree.
v2:https://lore.kernel.org/all/20250829165135.1273071-1-skb99@linux.ibm.com/
Changes since v1:
Addressed comments from Chris:
* Squashed introduction of bpf_jit_emit_probe_mem_store() and its usage in
one patch.
* Defined and used PPC_RAW_RLDICL_DOT to avoid the CMPDI.
* Removed conditional statement for fixup[0] = PPC_RAW_LI(dst_reg, 0);
* Indicated this change is limited to powerpc64 in subject.
Addressed comments from Alexei:
* Removed skel->rodata->nr_cpus = get_nprocs() and its usage to get
currently online cpus(as it needs to be updated from userspace).
Addressed comments from Hari:
* Updated the bpf jit stack layout and associated macros to accommodate
new NVR.
v1:https://lore.kernel.org/all/20250805062747.3479221-1-skb99@linux.ibm.com/
Saket Kumar Bhaskar (4):
powerpc64/bpf: Implement PROBE_MEM32 pseudo instructions
powerpc64/bpf: Implement bpf_addr_space_cast instruction
powerpc64/bpf: Introduce bpf_jit_emit_atomic_ops() to emit atomic
instructions
powerpc64/bpf: Implement PROBE_ATOMIC instructions
arch/powerpc/include/asm/ppc-opcode.h | 1 +
arch/powerpc/net/bpf_jit.h | 6 +-
arch/powerpc/net/bpf_jit_comp.c | 32 +-
arch/powerpc/net/bpf_jit_comp32.c | 2 +-
arch/powerpc/net/bpf_jit_comp64.c | 401 +++++++++++++++++++-------
5 files changed, 330 insertions(+), 112 deletions(-)
--
2.43.5
Some high-level virtual drivers need to compute features from their
lower devices, but each currently has its own implementation and may
miss some feature computations. This patch set introduces a common function
to compute features for such devices.
Currently, bonding, team, and bridge have been updated to use the new
helper.
Hangbin Liu (5):
net: add a common function to compute features from lowers devices
bonding: use common function to compute the features
team: use common function to compute the features
net: bridge: use common function to compute the features
selftests/net: add offload checking test for virtual interface
drivers/net/bonding/bond_main.c | 99 +----------
drivers/net/team/team_core.c | 73 +-------
include/linux/netdevice.h | 19 +++
net/bridge/br_if.c | 22 +--
net/core/dev.c | 79 +++++++++
tools/testing/selftests/net/config | 2 +
tools/testing/selftests/net/vdev_offload.sh | 174 ++++++++++++++++++++
7 files changed, 285 insertions(+), 183 deletions(-)
create mode 100755 tools/testing/selftests/net/vdev_offload.sh
--
2.50.1
Add an operation, SECCOMP_CLONE_FILTER, that can copy the seccomp
filters from another process to the current process.
Changes from v1 to v2:
* Fixed locking issues. Thanks Al, Alexei, and Kees :)
* Allow filters to be cloned if CAP_SYS_ADMIN or no new privs
is set
* I initially had only CAP_SYS_ADMIN, but I can't think of a
way no new privs is harmful here, so I added it. Thanks, Kees
* Switch to passing in pidfd directly rather than a pointer to a
pidfd
* This more closely aligns with other pidfd syscalls
* Fixed warning in the sample code reported by the test robot
* Various cleanups and improvements in the selftest
Note that I left in the restriction that the target process
has no seccomp filters already loaded. I could see this
limitation being removed in a later patchset, but there are
requests for this feature at present.
Finally, I re-ran the performance numbers and updated the patch
with the latest numbers. The locking changes significantly sped
up the clone operation, and it's now ~1900x faster than the
current method.
Tom Hromatka (1):
seccomp: Add SECCOMP_CLONE_FILTER operation
.../userspace-api/seccomp_filter.rst | 10 ++
include/uapi/linux/seccomp.h | 1 +
kernel/seccomp.c | 48 ++++++
samples/seccomp/.gitignore | 1 +
samples/seccomp/Makefile | 2 +-
samples/seccomp/clone-filter.c | 150 ++++++++++++++++++
tools/include/uapi/linux/seccomp.h | 1 +
tools/testing/selftests/seccomp/seccomp_bpf.c | 114 +++++++++++++
8 files changed, 326 insertions(+), 1 deletion(-)
create mode 100644 samples/seccomp/clone-filter.c
--
2.47.3
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v17 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28, and it
will be RFC9768.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v17 (8-Sep-2025)
- Change tcp_ecn_mode_max from 5 to 2 to disable AccECN enablement before the whole AccECN feature been accpeted
v16 (6-Sep-2025)
- Use TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_ECN in comments of tcp_ecn_send_syn() (Eric Dumazet <edumazet(a)google.com>)
- Add tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to make tcp_info be multiple of 64 bits in patch #12
v15 (14-Aug-205)
- Update pahole results in commit messages
- Accurate ECN will become RFC9768
v14 (22-Jul-2025)
- Add missing const for struct tcp_sock of tcp_accecn_option_beacon_check() of #11 (Simon Horman <horms(a)kernel.org>)
v13 (18-Jul-2025)
- Implement tcp_accecn_extract_syn_ect() and tcp_accecn_reflector_flags() with static array lookup of patch #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Fix typos in comments of #6 and remove patch #7 of v12 about simulatenous connect (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_ACCECN_E1B_INIT_OFFSET, TCP_ACCECN_E0B_INIT_OFFSET, and TCP_ACCECN_CEB_INIT_OFFSET from patch #7 to #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use static array lookup in tcp_accecn_optfield_to_ecnfield() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return false when WARN_ON_ONCE() is true in tcp_accecn_process_option() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Make synack_ecn_bytes as static const array and use const u32 pointer in tcp_options_write() of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use ALIGN() and ALIGN_DOWN() in tcp_options_fit_accecn() to pad TCP AccECN option to dword of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return TCP_ACCECN_OPT_FAIL_SEEN if WARN_ON_ONCE() is true in tcp_accecn_option_init() of #12 (Paolo Abeni <pabeni(a)redhat.com>)
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (5):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 12 +
include/linux/tcp.h | 32 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 642 ++++++++++++++++++
include/uapi/linux/tcp.h | 9 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 353 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 294 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1406 insertions(+), 184 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
There are currently no kernel tests that verify setting and getting
options of the team driver.
In the future, options may be added that implicitly change other
options, which will make it useful to have tests like these that show
nothing breaks. There will be a follow up patch to this that adds new
"rx_enabled" and "tx_enabled" options, which will implicitly affect the
"enabled" option value and vice versa.
The tests use teamnl to first set options to specific values and then
gets them to compare to the set values.
Signed-off-by: Marc Harvey <marcharvey(a)google.com>
---
Changes in v3:
- Applied minor style changes based on v2 feedback.
- Link to v2: https://lore.kernel.org/netdev/20250904015424.1228665-1-marcharvey@google.c…
Changes in v2:
- Fixed shellcheck failures.
- Fixed test failing in vng by adding a config option to enable the
team driver's active backup mode.
- Link to v1: https://lore.kernel.org/netdev/20250902235504.4190036-1-marcharvey@google.c…
.../selftests/drivers/net/team/Makefile | 6 +-
.../testing/selftests/drivers/net/team/config | 1 +
.../selftests/drivers/net/team/options.sh | 188 ++++++++++++++++++
3 files changed, 193 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/team/options.sh
diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile
index eaf6938f100e..89d854c7e674 100644
--- a/tools/testing/selftests/drivers/net/team/Makefile
+++ b/tools/testing/selftests/drivers/net/team/Makefile
@@ -1,11 +1,13 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for net selftests
-TEST_PROGS := dev_addr_lists.sh propagation.sh
+TEST_PROGS := dev_addr_lists.sh propagation.sh options.sh
TEST_INCLUDES := \
../bonding/lag_lib.sh \
../../../net/forwarding/lib.sh \
- ../../../net/lib.sh
+ ../../../net/lib.sh \
+ ../../../net/in_netns.sh \
+ ../../../net/lib/sh/defer.sh
include ../../../lib.mk
diff --git a/tools/testing/selftests/drivers/net/team/config b/tools/testing/selftests/drivers/net/team/config
index 636b3525b679..558e1d0cf565 100644
--- a/tools/testing/selftests/drivers/net/team/config
+++ b/tools/testing/selftests/drivers/net/team/config
@@ -3,4 +3,5 @@ CONFIG_IPV6=y
CONFIG_MACVLAN=y
CONFIG_NETDEVSIM=m
CONFIG_NET_TEAM=y
+CONFIG_NET_TEAM_MODE_ACTIVEBACKUP=y
CONFIG_NET_TEAM_MODE_LOADBALANCE=y
diff --git a/tools/testing/selftests/drivers/net/team/options.sh b/tools/testing/selftests/drivers/net/team/options.sh
new file mode 100755
index 000000000000..44888f32b513
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/team/options.sh
@@ -0,0 +1,188 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# These tests verify basic set and get functionality of the team
+# driver options over netlink.
+
+# Run in private netns.
+test_dir="$(dirname "$0")"
+if [[ $# -eq 0 ]]; then
+ "${test_dir}"/../../../net/in_netns.sh "$0" __subprocess
+ exit $?
+fi
+
+ALL_TESTS="
+ team_test_options
+"
+
+source "${test_dir}/../../../net/lib.sh"
+
+TEAM_PORT="team0"
+MEMBER_PORT="dummy0"
+
+setup()
+{
+ ip link add name "${MEMBER_PORT}" type dummy
+ ip link add name "${TEAM_PORT}" type team
+}
+
+get_and_check_value()
+{
+ local option_name="$1"
+ local expected_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ if ! value_from_get=$(teamnl "${TEAM_PORT}" getoption "${option_name}" \
+ "${port_flag}"); then
+ echo "Could not get option '${option_name}'" >&2
+ return 1
+ fi
+
+ if [[ "${value_from_get}" != "${expected_value}" ]]; then
+ echo "Incorrect value for option '${option_name}'" >&2
+ echo "get (${value_from_get}) != set (${expected_value})" >&2
+ return 1
+ fi
+}
+
+set_and_check_get()
+{
+ local option_name="$1"
+ local option_value="$2"
+ local port_flag="$3"
+
+ local value_from_get
+
+ if ! teamnl "${TEAM_PORT}" setoption "${option_name}" \
+ "${option_value}" "${port_flag}"; then
+ echo "'setoption ${option_name} ${option_value}' failed" >&2
+ return 1
+ fi
+
+ get_and_check_value "${option_name}" "${option_value}" "${port_flag}"
+ return $?
+}
+
+# Get a "port flag" to pass to the `teamnl` command.
+# E.g. $1="dummy0" -> "port=dummy0",
+# $1="" -> ""
+get_port_flag()
+{
+ local port_name="$1"
+
+ if [[ -n "${port_name}" ]]; then
+ echo "--port=${port_name}"
+ fi
+}
+
+attach_port_if_specified()
+{
+ local port_name="$1"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" master "${TEAM_PORT}"
+ return $?
+ fi
+}
+
+detach_port_if_specified()
+{
+ local port_name="$1"
+
+ if [[ -n "${port_name}" ]]; then
+ ip link set dev "${port_name}" nomaster
+ return $?
+ fi
+}
+
+# Test that an option's get value matches its set value.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` for whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# value_1 - The first value to try setting.
+# value_2 - The second value to try setting.
+# port_name - The (optional) name of the attached port.
+team_test_option()
+{
+ local option_name="$1"
+ local value_1="$2"
+ local value_2="$3"
+ local possible_values="$2 $3 $2"
+ local port_name="$4"
+ local port_flag
+
+ RET=0
+
+ echo "Setting '${option_name}' to '${value_1}' and '${value_2}'"
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Set and get both possible values.
+ for value in ${possible_values}; do
+ set_and_check_get "${option_name}" "${value}" "${port_flag}"
+ check_err $? "Failed to set '${option_name}' to '${value}'"
+ done
+
+ detach_port_if_specified "${port_name}"
+ check_err $? "Couldn't detach ${port_name} from its master"
+
+ log_test "Set + Get '${option_name}' test"
+}
+
+# Test that getting a non-existant option fails.
+# Globals:
+# RET - Used by testing infra like `check_err`.
+# EXIT_STATUS - Used by `log_test` for whole script exit value.
+# Arguments:
+# option_name - The name of the option.
+# port_name - The (optional) name of the attached port.
+team_test_get_option_fails()
+{
+ local option_name="$1"
+ local port_name="$2"
+ local port_flag
+
+ RET=0
+
+ attach_port_if_specified "${port_name}"
+ check_err $? "Couldn't attach ${port_name} to master"
+ port_flag=$(get_port_flag "${port_name}")
+
+ # Just confirm that getting the value fails.
+ teamnl "${TEAM_PORT}" getoption "${option_name}" "${port_flag}"
+ check_fail $? "Shouldn't be able to get option '${option_name}'"
+
+ detach_port_if_specified "${port_name}"
+
+ log_test "Get '${option_name}' fails"
+}
+
+team_test_options()
+{
+ # Wrong option name behavior.
+ team_test_get_option_fails fake_option1
+ team_test_get_option_fails fake_option2 "${MEMBER_PORT}"
+
+ # Correct set and get behavior.
+ team_test_option mode activebackup loadbalance
+ team_test_option notify_peers_count 0 5
+ team_test_option notify_peers_interval 0 5
+ team_test_option mcast_rejoin_count 0 5
+ team_test_option mcast_rejoin_interval 0 5
+ team_test_option enabled true false "${MEMBER_PORT}"
+ team_test_option user_linkup true false "${MEMBER_PORT}"
+ team_test_option user_linkup_enabled true false "${MEMBER_PORT}"
+ team_test_option priority 10 20 "${MEMBER_PORT}"
+ team_test_option queue_id 0 1 "${MEMBER_PORT}"
+}
+
+require_command teamnl
+setup
+tests_run
+exit "${EXIT_STATUS}"
--
2.51.0.384.g4c02a37b29-goog
From: Zhou Yuhang <zhouyuhang(a)kylinos.cn>
On x86_64, the size of struct flock is 32 bytes,
and the layout of this structure may be as follows:
+------------+ offset 0
| l_type | 2 bytes
+------------+ offset 2
| l_whence | 2 bytes
+------------+ offset 4
| padding | 4 bytes
+------------+ offset 8
| l_start | 8 bytes
+------------+ offset 16
| l_len | 8 bytes
+------------+ offset 24
| l_pid | 4 bytes
+------------+ offset 28
| padding | 4 bytes
+------------+ offset 32
Flock fl and fl2 are not initialized after definition.
The padding bytes in the structure may contain random values,
which could cause memcmp() to return a non-zero value,
potentially leading to test failure. The output is as follows:
# [INFO] opened fds 3 4
# [SUCCESS] set OFD read lock on first fd
# [SUCCESS] read and write locks conflicted
# [SUCCESS] F_UNLCK test returns: locked, type 0 pid -1 len 3
# [FAIL] F_UNLCK test returns: locked, type 0 pid -1 len 3
Initialize them to zero to solve this problem.
Signed-off-by: Zhou Yuhang <zhouyuhang(a)kylinos.cn>
---
changes in v2:
- Add a description of the struct flock layout to the commit message.
---
tools/testing/selftests/filelock/ofdlocks.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/filelock/ofdlocks.c b/tools/testing/selftests/filelock/ofdlocks.c
index a55b79810ab2..84e25505bebb 100644
--- a/tools/testing/selftests/filelock/ofdlocks.c
+++ b/tools/testing/selftests/filelock/ofdlocks.c
@@ -36,6 +36,8 @@ int main(void)
{
int rc;
struct flock fl, fl2;
+ memset(&fl, 0, sizeof(fl));
+ memset(&fl2, 0, sizeof(fl2));
int fd = open("/tmp/aa", O_RDWR | O_CREAT | O_EXCL, 0600);
int fd2 = open("/tmp/aa", O_RDONLY);
--
2.33.0
MADV_COLLAPSE is part of linux/mman.h and needs to be included
for this selftest for glibc compatibility. It is also included
in other tests that use MADV_COLLAPSE.
Fixes: d9c7ff4dae62 ("selftests: prctl: introduce tests for disabling THPs completely")
Reported-by: Mark Brown <broonie(a)kernel.org>
Closes: https://lore.kernel.org/all/c8249725-e91d-4c51-b9bb-40305e61e20d@sirena.org…
Signed-off-by: Usama Arif <usamaarif642(a)gmail.com>
---
tools/testing/selftests/mm/prctl_thp_disable.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/prctl_thp_disable.c b/tools/testing/selftests/mm/prctl_thp_disable.c
index feb711dca3a1d..84b4a4b345af5 100644
--- a/tools/testing/selftests/mm/prctl_thp_disable.c
+++ b/tools/testing/selftests/mm/prctl_thp_disable.c
@@ -9,6 +9,7 @@
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
+#include <linux/mman.h>
#include <sys/prctl.h>
#include <sys/wait.h>
--
2.47.3
Fix a memory leak issue on netpoll and create a netconsole test that exposes
the problem, when run with kmemleak enabled.
This is a merge of two patches I've sent individually and are merged on
the same patchset[1][2].
Link: https://lore.kernel.org/all/20250904-netconsole_torture-v2-0-5775ed5dc366@d… [1]
Link: https://lore.kernel.org/all/20250902165426.6d6cd172@kernel.org/ [2]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v3:
- this patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (3):
netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++--
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++++++++++
4 files changed, 152 insertions(+), 13 deletions(-)
---
base-commit: d69eb204c255c35abd9e8cb621484e8074c75eaa
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>
We recently missed detecting an issue during early testing because
the default (!all) tests would not trigger it and even when running
"all" tests it only would happen sometimes because of races.
So let's allow for an easy way to specify "GUP all pages in a single
call", extend the test matrix and extend our default (!all) tests.
By GUP'ing all pages in a single call, with the default size of 128MiB
we'll cover multiple leaf page tables / PMDs on architectures with sane
THP sizes.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
tools/testing/selftests/mm/gup_test.c | 2 ++
tools/testing/selftests/mm/run_vmtests.sh | 8 +++++---
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c
index bdeaac67ff9aa..8900b840c17a7 100644
--- a/tools/testing/selftests/mm/gup_test.c
+++ b/tools/testing/selftests/mm/gup_test.c
@@ -139,6 +139,8 @@ int main(int argc, char **argv)
break;
case 'n':
nr_pages = atoi(optarg);
+ if (nr_pages < 0)
+ nr_pages = size / psize();
break;
case 't':
thp = 1;
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 9e88cc25b9df2..6240e579b3ba5 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -138,7 +138,7 @@ run_gup_matrix() {
# -n: How many pages to fetch together? 512 is special
# because it's default thp size (or 2M on x86), 123 to
# just test partial gup when hit a huge in whatever form
- for num in "-n 1" "-n 512" "-n 123"; do
+ for num in "-n 1" "-n 512" "-n 123" "-n -1"; do
CATEGORY="gup_test" run_test ./gup_test \
$huge $test_cmd $write $share $num
done
@@ -313,9 +313,11 @@ if $RUN_ALL; then
run_gup_matrix
else
# get_user_pages_fast() benchmark
- CATEGORY="gup_test" run_test ./gup_test -u
+ CATEGORY="gup_test" run_test ./gup_test -u -n 1
+ CATEGORY="gup_test" run_test ./gup_test -u -n -1
# pin_user_pages_fast() benchmark
- CATEGORY="gup_test" run_test ./gup_test -a
+ CATEGORY="gup_test" run_test ./gup_test -a -n 1
+ CATEGORY="gup_test" run_test ./gup_test -a -n -1
fi
# Dump pages 0, 19, and 4096, using pin_user_pages:
CATEGORY="gup_test" run_test ./gup_test -ct -F 0x1 0 19 0x1000
--
2.50.1
This series fixes issues in devlink_rate_tc_bw.py selftest that made
its checks unreliable and its documentation inconsistent with the
actual configuration.
V2:
- Dropped the patch that relaxed the total bandwidth check. Jakub
suggested addressing the instability with interval-based measurement
and by migrating to load.py. That will be handled in a follow-up.
- Link to V1: https://lore.kernel.org/netdev/20250831080641.1828455-1-cjubran@nvidia.com/
Thanks
Carolina Jubran (2):
selftests: drv-net: Fix and clarify TC bandwidth split in
devlink_rate_tc_bw.py
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
.../drivers/net/hw/devlink_rate_tc_bw.py | 100 ++++++++----------
1 file changed, 43 insertions(+), 57 deletions(-)
--
2.38.1
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v21:
- Align to bar size, try to fix Niklas reported problem.
- Rebase to v6.16-rc5
- Link to v20: https://lore.kernel.org/r/20250709-ep-msi-v20-0-43d56f9bd54a@nxp.com
Changes in v20:
- remove set epf of_node's patch and only support one epf now.
- move imx6's patch to first
- detail change see each patches' change log
- Link to v19: https://lore.kernel.org/r/20250609-ep-msi-v19-0-77362eaa48fa@nxp.com
Changes in v19:
- irq part already in v6.16-rc1, only missed pcie/dts part
- rebase to v6.16-rc1
- update commit message for patch IMMUTABLE check.
- Link to v18: https://lore.kernel.org/r/20250414-ep-msi-v18-0-f69b49917464@nxp.com
Changes in v18:
- pci-ep.yaml: sort property order, fix maxvalue to 0x7ffff for msi-map-mask and
iommu-map-mask
- Link to v17: https://lore.kernel.org/r/20250407-ep-msi-v17-0-633ab45a31d0@nxp.com
Changes in v17:
- move document part to pci-ep.yaml
- Link to v16: https://lore.kernel.org/r/20250404-ep-msi-v16-0-d4919d68c0d0@nxp.com
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (9):
PCI: imx6: Add helper function imx_pcie_add_lut_by_rid()
PCI: imx6: Add LUT configuration for MSI/IOMMU in Endpoint mode
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
arm64: dts: imx95: Add msi-map for pci-ep device
Documentation/PCI/endpoint/pci-test-howto.rst | 14 +++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/misc/pci_endpoint_test.c | 85 ++++++++++++-
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Kconfig | 8 ++
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 136 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 98 +++++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 36 ++++++
include/linux/pci-ep-msi.h | 28 +++++
include/linux/pci-epf.h | 18 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 +++++
13 files changed, 470 insertions(+), 9 deletions(-)
---
base-commit: d7b8f8e20813f0179d8ef519541a3527e7661d3a
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
--
Frank Li <Frank.Li(a)nxp.com>
Here are various unrelated fixes:
- Patch 1: Fix a wrong attribute type in the MPTCP Netlink specs. A fix
for v6.7.
- Patch 2: Avoid mentioning a deprecated MPTCP sysctl knob in the doc. A
fix for v6.15.
- Patch 3: Handle new warnings from ShellCheck v0.11.0. This prevents
some warnings reported by some CIs. If it is not a good material for
'net', please drop it and I can resend it later, targeting 'net-next'.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
netlink: specs: mptcp: fix if-idx attribute type
doc: mptcp: net.mptcp.pm_type is deprecated
selftests: mptcp: shellcheck: support v0.11.0
Documentation/netlink/specs/mptcp_pm.yaml | 2 +-
Documentation/networking/mptcp.rst | 8 ++++----
tools/testing/selftests/net/mptcp/diag.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 2 +-
tools/testing/selftests/net/mptcp/pm_netlink.sh | 5 +++--
tools/testing/selftests/net/mptcp/simult_flows.sh | 2 +-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 +-
9 files changed, 14 insertions(+), 13 deletions(-)
---
base-commit: e2a10daba84968f6b5777d150985fd7d6abc9c84
change-id: 20250908-net-mptcp-misc-fixes-6-17-rc5-7550f5f90b66
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
During my testing, I found that guest debugging with 'DR6.BD' does not
work in instruction emulation, as the current code only considers the
guest's DR7. Upon reviewing the code, I also observed that the checks
for the userspace guest debugging feature and the guest's own debugging
feature are repeated in different places during instruction
emulation, but the overall logic is the same. If guest debugging
is enabled, it needs to exit to userspace; otherwise, a #DB
exception needs to be injected into the guest. Therefore, as
suggested by Jiangshan Lai, some cleanup has been done for #DB
handling in instruction emulation in this patchset. A new
function named 'kvm_inject_emulated_db()' is introduced to
consolidate all the checking logic. Moreover, I hope we can make
the #DB interception path use the same function as well.
Additionally, when I looked into the single-step #DB handling in
instruction emulation, I noticed that the interrupt shadow is toggled,
but it is not considered in the single-step #DB injection. This
oversight causes VM entry to fail on VMX (due to pending debug
exceptions checking) or breaks the 'MOV SS' suppressed #DB. For the
latter, I have kept the behavior for now in my patchset, as I need some
suggestions.
Hou Wenlong (7):
KVM: x86: Set guest DR6 by kvm_queue_exception_p() in instruction
emulation
KVM: x86: Check guest debug in DR access instruction emulation
KVM: x86: Only check effective code breakpoint in emulation
KVM: x86: Consolidate KVM_GUESTDBG_SINGLESTEP check into the
kvm_inject_emulated_db()
KVM: VMX: Set 'BS' bit in pending debug exceptions during instruction
emulation
KVM: selftests: Verify guest debug DR7.GD checking during instruction
emulation
KVM: selftests: Verify 'BS' bit checking in pending debug exception
during VM entry
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/emulate.c | 14 +--
arch/x86/kvm/kvm_emulate.h | 7 +-
arch/x86/kvm/vmx/main.c | 9 ++
arch/x86/kvm/vmx/vmx.c | 14 ++-
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 109 +++++++++++-------
arch/x86/kvm/x86.h | 7 ++
.../selftests/kvm/include/x86/processor.h | 3 +-
tools/testing/selftests/kvm/x86/debug_regs.c | 64 +++++++++-
11 files changed, 167 insertions(+), 63 deletions(-)
base-commit: ecbcc2461839e848970468b44db32282e5059925
--
2.31.1
Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.
Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.
Reported-by: David Wilder <wilder(a)us.ibm.com>
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh <jv(a)jvosburgh.net>
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: split the patch into 2 parts, the kernel change and test update (Jay Vosburgh)
---
drivers/net/bonding/bond_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..30cf97f4e814 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3355,7 +3355,6 @@ static void bond_ns_send_all(struct bonding *bond, struct slave *slave)
/* Find out through which dev should the packet go */
memset(&fl6, 0, sizeof(struct flowi6));
fl6.daddr = targets[i];
- fl6.flowi6_oif = bond->dev->ifindex;
dst = ip6_route_output(dev_net(bond->dev), NULL, &fl6);
if (dst->error) {
--
2.50.1
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba(a)kernel.org>:
On Sun, 07 Sep 2025 17:32:41 +0200 you wrote:
> Currently, the MPTCP ADD_ADDR notifications are retransmitted after a
> fixed timeout controlled by the net.mptcp.add_addr_timeout sysctl knob,
> if the corresponding "echo" packets are not received before. This can be
> too slow (or too quick), especially with a too cautious default value
> set to 2 minutes.
>
> - Patch 1: make ADD_ADDR retransmission timeout adaptive, using the
> TCP's retransmission timeout. The corresponding sysctl knob is now
> used as a maximum value.
>
> [...]
Here is the summary with links:
- [net-next,1/3] mptcp: make ADD_ADDR retransmission timeout adaptive
https://git.kernel.org/netdev/net-next/c/30549eebc4d8
- [net-next,2/3] selftests: mptcp: join: tolerate more ADD_ADDR
https://git.kernel.org/netdev/net-next/c/63c31d42cf6f
- [net-next,3/3] selftests: mptcp: join: allow more time to send ADD_ADDR
https://git.kernel.org/netdev/net-next/c/e2cda6343bfe
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Commit 5c3bf6cba791 ("bonding: assign random address if device address is
same as bond") fixed an issue where, after releasing the first slave and
re-adding it to the bond with fail_over_mac=follow, both the active and
backup slaves could end up with duplicate MAC addresses. To avoid this,
the new slave was assigned a random address.
However, if this happens when adding the very first slave, the bond’s
hardware address is set to match the slave’s. Later, during the
fail_over_mac=follow check, the slave’s MAC is randomized because it
naturally matches the bond, which is incorrect.
The issue is normally hidden since the first slave usually becomes the
active one, which restores the bond's MAC address. However, if another
slave is selected as the initial active interface, the issue becomes visible.
Fix this by assigning a random address only when slaves already exist in
the bond.
Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren <qren(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
drivers/net/bonding/bond_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 257333c88710..8832bc9f107b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2132,6 +2132,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
memcpy(ss.__data, bond_dev->dev_addr, bond_dev->addr_len);
} else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW &&
BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+ bond_has_slaves(bond) &&
memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) {
/* Set slave to random address to avoid duplicate mac
* address in later fail over.
--
2.50.1
The pmtu test takes nearly an hour when run on a debug kernel
(10min on a normal kernel, so the debug slow down is quite significant).
NIPA tries to ensure all results are delivered by a certain deadline
so this prevents it from retrying the test in case of a flake.
Looks like one of the slowest operations in the test is calling out
to ./openvswitch/ovs-dpctl.py to remove potential leftover OvS interfaces.
Check whether the interfaces exist in the first place in sysfs,
since it can be done directly in bash it is very fast.
This should save us around 20-30% of the test runtime.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
tools/testing/selftests/net/pmtu.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 88e914c4eef9..a3323c21f001 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -1089,10 +1089,11 @@ cleanup() {
cleanup_all_ns
- ip link del veth_A-C 2>/dev/null
- ip link del veth_A-R1 2>/dev/null
- cleanup_del_ovs_internal
- cleanup_del_ovs_vswitchd
+ [ -e "/sys/class/net/veth_A-C" ] && ip link del veth_A-C
+ [ -e "/sys/class/net/veth_A-R1" ] && ip link del veth_A-R1
+ [ -e "/sys/class/net/ovs_br0" ] && cleanup_del_ovs_internal
+ [ -e "/sys/class/net/ovs_br0" ] && cleanup_del_ovs_vswitchd
+
rm -f "$tmpoutfile"
}
--
2.51.0
Hi all,
This series updates the drv-net XDP program used by the new xdp.py selftest
to use the bpf_dynptr APIs for packet access.
The selftest itself is unchanged.
The original program accessed packet headers directly via
ctx->data/data_end, implicitly assuming headers are always in the linear
region. That assumption is incorrect for multi-buffer XDP and does not
hold across all drivers. For example, mlx5 with striding RQ can leave the
linear area empty, causing the multi-buffer cases to fail.
Switching to bpf_xdp_load/store_bytes would work but always incurs copies.
Instead, this series adopts bpf_dynptr, which provides safe,
verifier-checked access across both linear and fragmented areas while
avoiding copies.
Amery Hung has also proposed a series [1] that addresses the same issues in
the program, but through the use of bpf_xdp_pull_data. My series is not
intended as a replacement for that work, but rather as an exploration of
another viable solution, each of which may be preferable under different
circumstances.
In cases where the program does not return XDP_PASS, I believe dynptr has
an advantage since it avoids an extra copy. Conversely, when the program
returns XDP_PASS, bpf_xdp_pull_data may be preferable, as the copy will
be performed in any case during skb creation.
It may make sense to split the work into two separate programs, allowing us
to test both solutions independently. Alternatively, we can consider a
combined approach, where the more fitting solution is applied for each use
case. I welcome feedback on which direction would be most useful.
[1] https://lore.kernel.org/all/20250905173352.3759457-1-ameryhung@gmail.com/
Thanks!
Nimrod
Nimrod Oren (5):
selftests: drv-net: Test XDP_TX with bpf_dynptr
selftests: drv-net: Test XDP tail adjustment with bpf_dynptr
selftests: drv-net: Test XDP head adjustment with bpf_dynptr
selftests: drv-net: Adjust XDP header data with bpf_dynptr
selftests: drv-net: Check XDP header data with bpf_dynptr
.../selftests/net/lib/xdp_native.bpf.c | 219 ++++++++----------
1 file changed, 96 insertions(+), 123 deletions(-)
--
2.45.0
This series fixes issues in devlink_rate_tc_bw.py selftest that made
its checks unreliable and its documentation inconsistent with the
actual configuration.
Thanks
Carolina Jubran (3):
selftests: drv-net: Fix and clarify TC bandwidth split in
devlink_rate_tc_bw.py
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Relax total BW check in devlink_rate_tc_bw.py
.../drivers/net/hw/devlink_rate_tc_bw.py | 102 ++++++++----------
1 file changed, 44 insertions(+), 58 deletions(-)
--
2.38.1
The loop in bench_sockmap_prog_destroy() has two issues:
1. Using 'sizeof(ctx.fds)' as the loop bound results in the number of
bytes, not the number of file descriptors, causing the loop to iterate
far more times than intended.
2. The condition 'ctx.fds[0] > 0' incorrectly checks only the first fd for
all iterations, potentially leaving file descriptors unclosed. Change
it to 'ctx.fds[i] > 0' to check each fd properly.
These fixes ensure correct cleanup of all file descriptors when the
benchmark exits.
Signed-off-by: Jiayuan Chen <jiayuan.chen(a)linux.dev>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Closes: https://lore.kernel.org/bpf/aLqfWuRR9R_KTe5e@stanley.mountain/
---
tools/testing/selftests/bpf/benchs/bench_sockmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_sockmap.c b/tools/testing/selftests/bpf/benchs/bench_sockmap.c
index 8ebf563a67a2..cfc072aa7fff 100644
--- a/tools/testing/selftests/bpf/benchs/bench_sockmap.c
+++ b/tools/testing/selftests/bpf/benchs/bench_sockmap.c
@@ -10,6 +10,7 @@
#include <argp.h>
#include "bench.h"
#include "bench_sockmap_prog.skel.h"
+#include "bpf_util.h"
#define FILE_SIZE (128 * 1024)
#define DATA_REPEAT_SIZE 10
@@ -124,8 +125,8 @@ static void bench_sockmap_prog_destroy(void)
{
int i;
- for (i = 0; i < sizeof(ctx.fds); i++) {
- if (ctx.fds[0] > 0)
+ for (i = 0; i < ARRAY_SIZE(ctx.fds); i++) {
+ if (ctx.fds[i] > 0)
close(ctx.fds[i]);
}
--
2.43.0
Two patches here, first fixes the issue where tunnel core doesn't
actually extract DF bit from the outer IP header, even though both
OVS and TC flower allow matching on it. More details in the commit
message.
The second is a selftest for openvswitch that reproduces the issue,
but also just adds some basic coverage for the tunnel metadata
extraction and related openvswitch uAPI.
Ilya Maximets (2):
net: dst_metadata: fix IP_DF bit not extracted from tunnel headers
selftests: openvswitch: add a simple test for tunnel metadata
include/net/dst_metadata.h | 11 ++-
.../selftests/net/openvswitch/openvswitch.sh | 88 +++++++++++++++++--
2 files changed, 90 insertions(+), 9 deletions(-)
--
2.50.1
This patchset introduces a new per-port bonding option: `ad_actor_port_prio`.
It allows users to configure the actor's port priority, which can then be used
by the bonding driver for aggregator selection based on port priority.
This provides finer control over LACP aggregator choice, especially in setups
with multiple eligible aggregators over 2 switches.
v5:
a) rename 'prio' to 'actor_port_prio' in bond_ad_select_tbl (Jay Vosburgh)
b) update document description
v4:
a) fix actor_port_prio minimal value (Jay Vosburgh)
b) fix ad_agg_selection_test comment order (Paolo Abeni)
c) restruct selftest, reduce duplication (Paolo Abeni)
v3:
a) add comments when init slave port_priority (Jonas Gorski)
b) rename ad_lacp_port_prio to lacp_port_prio (Jay Vosburgh)
v2:
a) set default bond option value for port priority (Nikolay Aleksandrov)
b) fix __agg_ports_priority coding style (Nikolay Aleksandrov)
c) fix shellcheck warns
Hangbin Liu (3):
bonding: add support for per-port LACP actor priority
bonding: support aggregator selection based on port priority
selftests: bonding: add test for LACP actor port priority
Documentation/networking/bonding.rst | 25 +++-
drivers/net/bonding/bond_3ad.c | 31 +++++
drivers/net/bonding/bond_netlink.c | 16 +++
drivers/net/bonding/bond_options.c | 45 +++++++-
include/net/bond_3ad.h | 2 +
include/net/bond_options.h | 1 +
include/uapi/linux/if_link.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_lacp_prio.sh | 108 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 24 ----
tools/testing/selftests/net/lib.sh | 24 ++++
11 files changed, 247 insertions(+), 33 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
--
2.50.1
The three patches fix the va_high_addr_switch.sh test failure on x86_64.
Patch 1 fixes the hugepage setup issue that nr_hugepages is reset too
early in run_vmtests.sh and break the later va_high_addr_switch testing.
Patch 2 adds hugepage setup in va_high_addr_switch test, so that it can
still work if vm_runtests.sh changes the hugepage setup someday.
Patch 3 fixes the test failure caused by the hint addr align method change
in hugetlb_get_unmapped_area().
Changes in v2:
- patch 1 renames nr_hugepgs_origin to orig_nr_hugepgs
- add a patch 2 to setup hugeapges in va_high_addr_switch test
Chunyu Hu (3):
selftests/mm: fix hugepages cleanup too early
selftests/mm: alloc hugepages in va_high_addr_switch test
selftests/mm: fix va_high_addr_switch.sh failure on x86_64
tools/testing/selftests/mm/run_vmtests.sh | 9 ++++-
.../selftests/mm/va_high_addr_switch.c | 4 +-
.../selftests/mm/va_high_addr_switch.sh | 37 +++++++++++++++++++
3 files changed, 46 insertions(+), 4 deletions(-)
--
2.49.0
This patchset ensures that the number of hugepages is correctly set in the
system so that the uffd-stress test does not fail due to the racy nature of
the test. Patch 1 changes the hugepage constraint in the run_vmtests.sh
script, whereas patch 2 changes the constraint in the test itself.
---
Based on 6.17-rc5.
Dev Jain (2):
selftests/mm/uffd-stress: make test operate on less hugetlb memory
selftests/mm/uffd-stress: stricten constraint on free hugepages needed
before the test
tools/testing/selftests/mm/run_vmtests.sh | 10 +++++++---
tools/testing/selftests/mm/uffd-stress.c | 17 +++++++++++------
2 files changed, 18 insertions(+), 9 deletions(-)
--
2.30.2
This patchset ensures that the number of hugepages is correctly set in the
system so that the uffd-stress test does not fail due to the racy nature of
the test. Patch 1 corrects the hugepage constraint in the run_vmtests.sh
script, whereas patch 2 corrects the constraint in the test itself.
Dev Jain (2):
selftests/mm/uffd-stress: Make test operate on less hugetlb memory
selftests/mm/uffd-stress: Stricten constraint on free hugepages before
the test
tools/testing/selftests/mm/run_vmtests.sh | 2 +-
tools/testing/selftests/mm/uffd-stress.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.30.2
Some high-level virtual drivers need to compute features from their
lower devices, but each currently has its own implementation and may
miss some feature computations. This patch set introduces a common function
to compute features for such devices.
Currently, bonding, team, and bridge have been updated to use the new
helper.
v2:
a) remove hard_header_len setting. I will set needed_headroom for bond/team
in a separate patch as bridge has it's own ways. (Ido Schimmel)
b) Add test file to Makefile, set RET=0 to a proper location. (Ido Schimmel)
Hangbin Liu (5):
net: add a common function to compute features from lowers devices
bonding: use common function to compute the features
team: use common function to compute the features
net: bridge: use common function to compute the features
selftests/net: add offload checking test for virtual interface
drivers/net/bonding/bond_main.c | 99 +----------
drivers/net/team/team_core.c | 73 +-------
include/linux/netdevice.h | 19 +++
net/bridge/br_if.c | 22 +--
net/core/dev.c | 76 +++++++++
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 2 +
tools/testing/selftests/net/vdev_offload.sh | 176 ++++++++++++++++++++
8 files changed, 285 insertions(+), 183 deletions(-)
create mode 100755 tools/testing/selftests/net/vdev_offload.sh
--
2.50.1
From: Fred Griffoul <griffoul(a)casper.infradead.org>
This patch series addresses performance issues in nested VMX when
handling unmanaged guest memory. Unmanaged guest memory refers to memory
not directly mapped by the kernel (no struct page), such as memory
passed with the mem= parameter or guest_memfd for non-Confidential
Computing (CoCo) VMs.
Current Problem:
During nested VMX operations, the system frequently accesses specific
guest pages during L2 VM entry/exit cycles. The current workflow:
1. kvm_vcpu_map() invokes memremap() for unmanaged memory.
2. The system either directly accesses mapped memory via nested VMX or
passes it to the L2 guest through vmcs02.
3. kvm_vcpu_unmap() invokes memunmap()
This repeated map/unmap cycle creates significant performance overhead
due to expensive remapping operations.
Solution approach:
Our solution replaces kvm_host_map with gfn_to_pfn_cache in nested VMX.
It addresses two distinct types of guest pages.
First, we handle the L1 MSR bitmap page, which requires read-only access
for folding L1 and L0 MSR bitmap. We implement this conversion to
gfn_to_pfn_cache in patch 1.
Second, we tackle system pages, including APIC access, virtual APIC, and
posted interrupt descriptor pages. These pages are more complex as
they're accessed by both nested VMX code _and_ passed to the L2 guest in
vmcs02 fields. This requires to restore and complete the
"guest-uses-pfn" support in pfncache through patches 2 and 3, followed
by implementing kvm_host_map replacement with caches in patch 4.
Testing:
Patch 5 introduces a new selftest to verify cache invalidation and
memslot update functionality.
The changes are available in a git repository at:
git://git.infradead.org/users/griffoul/linux.git tags/nvmx-gpc-v1
Suggested-by: dwmw(a)amazon.co.uk
Fred Griffoul (5):
KVM: nVMX: Implement cache for L1 MSR bitmap
KVM: pfncache: Restore guest-uses-pfn support
KVM: x86: Add nested state validation for pfncache support
KVM: nVMX: Implement cache for L1 APIC pages
KVM: selftests: Add nested VMX APIC cache invalidation test
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/nested.c | 213 +++++++++---
arch/x86/kvm/vmx/vmx.h | 10 +-
arch/x86/kvm/x86.c | 14 +-
include/linux/kvm_host.h | 34 +-
include/linux/kvm_types.h | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/x86/vmx_apic_update_test.c | 302 ++++++++++++++++++
virt/kvm/kvm_main.c | 3 +-
virt/kvm/kvm_mm.h | 6 +-
virt/kvm/pfncache.c | 43 ++-
11 files changed, 575 insertions(+), 53 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/vmx_apic_update_test.c
--
2.51.0
Replace the hardcoded 0xff in test_icr() with the actual number of vcpus
created for the vm. This address the existing TODO and keeps the test
correct if it is ever run with multiple vcpus.
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
---
tools/testing/selftests/kvm/x86/xapic_state_test.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86/xapic_state_test.c b/tools/testing/selftests/kvm/x86/xapic_state_test.c
index fdebff1165c7..4af36682503e 100644
--- a/tools/testing/selftests/kvm/x86/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_state_test.c
@@ -56,6 +56,17 @@ static void x2apic_guest_code(void)
} while (1);
}
+static unsigned int vm_nr_vcpus(struct kvm_vm *vm)
+{
+ struct kvm_vcpu *vcpu;
+ unsigned int count = 0;
+
+ list_for_each_entry(vcpu, &vm->vcpus, list)
+ count++;
+
+ return count;
+}
+
static void ____test_icr(struct xapic_vcpu *x, uint64_t val)
{
struct kvm_vcpu *vcpu = x->vcpu;
@@ -124,7 +135,7 @@ static void test_icr(struct xapic_vcpu *x)
* vCPUs, not vcpu.id + 1. Arbitrarily use vector 0xff.
*/
icr = APIC_INT_ASSERT | 0xff;
- for (i = 0; i < 0xff; i++) {
+ for (i = 0; i < vm_nr_vcpus(vcpu->vm); i++) {
if (i == vcpu->id)
continue;
for (j = 0; j < 8; j++)
--
2.43.0
Recent changes to make netlink socket memory accounting must
have broken the implicit assumption of the netlink-dump test
that we can fit exactly 64 dumps into the socket. Handle the
failure mode properly, and increase the dump count to 80
to make sure we still run into the error condition if
the default buffer size increases in the future.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
tools/testing/selftests/net/netlink-dumps.c | 43 ++++++++++++++++-----
1 file changed, 33 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/netlink-dumps.c b/tools/testing/selftests/net/netlink-dumps.c
index 07423f256f96..7618ebe528a4 100644
--- a/tools/testing/selftests/net/netlink-dumps.c
+++ b/tools/testing/selftests/net/netlink-dumps.c
@@ -31,9 +31,18 @@ struct ext_ack {
const char *str;
};
-/* 0: no done, 1: done found, 2: extack found, -1: error */
-static int nl_get_extack(char *buf, size_t n, struct ext_ack *ea)
+enum get_ea_ret {
+ ERROR = -1,
+ NO_CTRL = 0,
+ FOUND_DONE,
+ FOUND_ERR,
+ FOUND_EXTACK,
+};
+
+static enum get_ea_ret
+nl_get_extack(char *buf, size_t n, struct ext_ack *ea)
{
+ enum get_ea_ret ret = NO_CTRL;
const struct nlmsghdr *nlh;
const struct nlattr *attr;
ssize_t rem;
@@ -41,15 +50,19 @@ static int nl_get_extack(char *buf, size_t n, struct ext_ack *ea)
for (rem = n; rem > 0; NLMSG_NEXT(nlh, rem)) {
nlh = (struct nlmsghdr *)&buf[n - rem];
if (!NLMSG_OK(nlh, rem))
- return -1;
+ return ERROR;
- if (nlh->nlmsg_type != NLMSG_DONE)
+ if (nlh->nlmsg_type == NLMSG_ERROR)
+ ret = FOUND_ERR;
+ else if (nlh->nlmsg_type == NLMSG_DONE)
+ ret = FOUND_DONE;
+ else
continue;
ea->err = -*(int *)NLMSG_DATA(nlh);
if (!(nlh->nlmsg_flags & NLM_F_ACK_TLVS))
- return 1;
+ return ret;
ynl_attr_for_each(attr, nlh, sizeof(int)) {
switch (ynl_attr_type(attr)) {
@@ -68,10 +81,10 @@ static int nl_get_extack(char *buf, size_t n, struct ext_ack *ea)
}
}
- return 2;
+ return FOUND_EXTACK;
}
- return 0;
+ return ret;
}
static const struct {
@@ -99,9 +112,9 @@ static const struct {
TEST(dump_extack)
{
int netlink_sock;
+ int i, cnt, ret;
char buf[8192];
int one = 1;
- int i, cnt;
ssize_t n;
netlink_sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
@@ -118,7 +131,7 @@ TEST(dump_extack)
ASSERT_EQ(n, 0);
/* Dump so many times we fill up the buffer */
- cnt = 64;
+ cnt = 80;
for (i = 0; i < cnt; i++) {
n = send(netlink_sock, &dump_neigh_bad,
sizeof(dump_neigh_bad), 0);
@@ -140,10 +153,20 @@ TEST(dump_extack)
}
ASSERT_GE(n, (ssize_t)sizeof(struct nlmsghdr));
- EXPECT_EQ(nl_get_extack(buf, n, &ea), 2);
+ ret = nl_get_extack(buf, n, &ea);
+ /* Once we fill the buffer we'll see one ENOBUFS followed
+ * by a number of EBUSYs. Then the last recv() will finally
+ * trigger and complete the dump.
+ */
+ if (ret == FOUND_ERR && (ea.err == ENOBUFS || ea.err == EBUSY))
+ continue;
+ EXPECT_EQ(ret, FOUND_EXTACK);
+ EXPECT_EQ(ea.err, EINVAL);
EXPECT_EQ(ea.attr_offs,
sizeof(struct nlmsghdr) + sizeof(struct ndmsg));
}
+ /* Make sure last message was a full DONE+extack */
+ EXPECT_EQ(ret, FOUND_EXTACK);
}
static const struct {
--
2.51.0
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for managing GCS for KVM guests, it also
includes a fix for S1PIE which has also been sent separately as this
feature is a dependency for GCS. It is based on:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/gcs
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v15:
- Rebase onto v6.17-rc1.
- Link to v14: https://lore.kernel.org/r/20241005-arm64-gcs-v14-0-59060cd6092b@kernel.org
Changes in v14:
- Rebase onto arm64/for-next/gcs which includes all the non-KVM support.
- Manage the fine grained traps for GCS instructions.
- Manage PSTATE.EXLOCK when delivering exceptions to KVM guests.
- Link to v13: https://lore.kernel.org/r/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org
Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing
bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change ensure that it is
initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org
Changes in v12:
- Clarify and simplify the signal handling code so we work with the
register state.
- When checking for write aborts to shadow stack pages ensure the fault
is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org
Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow
stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of
if we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove rdundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org
Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org
Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (6):
arm64/gcs: Ensure FGTs for EL1 GCS instructions are disabled
KVM: arm64: Manage GCS access and registers for guests
KVM: arm64: Forward GCS exceptions to nested guests
KVM: arm64: Set PSTATE.EXLOCK when entering an exception
KVM: arm64: Allow GCS to be enabled for guests
KVM: selftests: arm64: Add GCS registers to get-reg-list
arch/arm64/include/asm/el2_setup.h | 4 +++
arch/arm64/include/asm/kvm_emulate.h | 3 ++
arch/arm64/include/asm/kvm_host.h | 14 +++++++++
arch/arm64/include/asm/vncr_mapping.h | 2 ++
arch/arm64/include/uapi/asm/ptrace.h | 1 +
arch/arm64/kvm/handle_exit.c | 14 +++++++--
arch/arm64/kvm/hyp/exception.c | 37 ++++++++++++++++++++++++
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 31 ++++++++++++++++++++
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 10 +++++++
arch/arm64/kvm/sys_regs.c | 32 ++++++++++++++++++--
tools/testing/selftests/kvm/arm64/get-reg-list.c | 12 ++++++++
11 files changed, 155 insertions(+), 5 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
+lists
Please keep discussions on-list unless there's something that can't/shouldn't be
posted publicly, e.g. for confidentiality or security reasons.
On Tue, Sep 02, 2025, Faruqui, Aqib wrote:
> I suppose a fix for blindly using PAGE_SIZE in subsequent macros:
>
> #ifdef PAGE_SIZE
> #undef PAGE_SIZE
> #endif
> #define PAGE_SIZE (1ULL << PAGE_SHIFT)
>
> Is no better and is instead blindly suppressing the compiler's redefinition warning.
>
> I'm having trouble finding what causes the conflict, any advice here?
Maybe try a newer compiler? E.g. gcc-14.2 will spit out the exact location of the
previous definition.
In file included from include/x86/svm_util.h:13,
from include/x86/sev.h:15,
from lib/x86/sev.c:5:
include/x86/processor.h:373:9: error: "PAGE_SIZE" redefined [-Werror]
373 | #define PAGE_SIZE (1ULL << PAGE_SHIFT)
| ^~~~~~~~~
include/x86/processor.h:370:9: note: this is the location of the previous definition
370 | #define PAGE_SIZE BIT(12)
| ^~~~~~~~~
Fix to use the return value of the function 'chdir("/")' and check if the
return is either 0 (ok) or 1 (not ok, so the test stops).
The patch fies the solves the following errors:
mount-notify_test.c:468:17: warning: ignoring return value of ‘chdir’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
468 | chdir("/");
mount-notify_test_ns.c:489:17: warning: ignoring return value of
‘chdir’ declared with attribute ‘warn_unused_result’ [-Wunused-
result]
489 | chdir("/");
To reproduce the issue, use the command:
make kselftest TARGET=filesystems/statmount
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
---
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
.../selftests/filesystems/mount-notify/mount-notify_test_ns.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 5a3b0ace1a88..a7f899599d52 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -458,7 +458,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
index d91946e69591..dc9eb3087a1a 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
@@ -486,7 +486,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
--
2.43.0
Fix to use the return value of the function 'chdir("/")' and check if the
return is either 0 (ok) or 1 (not ok, so the test stops).
The patch fies the solves the following errors:
mount-notify_test.c:468:17: warning: ignoring return value of ‘chdir’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
468 | chdir("/");
mount-notify_test_ns.c:489:17: warning: ignoring return value of
‘chdir’ declared with attribute ‘warn_unused_result’ [-Wunused-
result]
489 | chdir("/");
To reproduce the issue, use the command:
make kselftest TARGET=filesystems/statmount
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
.../selftests/filesystems/mount-notify/mount-notify_test_ns.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 5a3b0ace1a88..a7f899599d52 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -458,7 +458,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
index d91946e69591..dc9eb3087a1a 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
@@ -486,7 +486,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
--
2.43.0
From: Feng Yang <yangfeng(a)kylinos.cn>
The error message printed here only uses the previous err value,
which results in it being printed as 0.
When bpf_map__attach_struct_ops encounters an error,
it uses libbpf_err_ptr(err) to set errno = -err and returns NULL.
Therefore, Using -errno can fix this issue.
Fix before:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=0
Fix after:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=-9
Signed-off-by: Feng Yang <yangfeng(a)kylinos.cn>
---
Changes in v3:
- Use -errno here directly, thanks: Andrii Nakryiko.
- Link to v2: https://lore.kernel.org/all/20250829014125.198653-1-yangfeng59949@163.com/
---
Changes in v2:
- Use libbpf_get_error, thanks: Alexei Starovoitov.
- Link to v1: https://lore.kernel.org/all/20250828081507.1380218-1-yangfeng59949@163.com/
---
tools/testing/selftests/bpf/test_loader.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index 78423cf89e01..33d59c093a27 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -1083,7 +1083,7 @@ void run_subtest(struct test_loader *tester,
link = bpf_map__attach_struct_ops(map);
if (!link) {
PRINT_FAIL("bpf_map__attach_struct_ops failed for map %s: err=%d\n",
- bpf_map__name(map), err);
+ bpf_map__name(map), -errno);
goto tobj_cleanup;
}
links[links_cnt++] = link;
--
2.25.1
From: Vivek Yadav <vivekyadav1207731111(a)gmail.com>
Hi all,
This small series makes cosmetic style cleanups in the arm64 kselftests
to improve readability and suppress checkpatch warnings. These changes
are purely cosmetic and do not affect functionality.
Changes in this series:
* Suppress unnecessary checkpatch warning in a comment
* Add parentheses around sizeof for clarity
* Remove redundant blank line
---
Vivek Yadav (3):
kselftest/arm64: Remove extra blank line
kselftest/arm64: Supress warning and improve readability
kselftest/arm64: Add parentheses around sizeof for clarity
tools/testing/selftests/arm64/abi/hwcap.c | 1 -
tools/testing/selftests/arm64/bti/assembler.h | 1 -
tools/testing/selftests/arm64/fp/fp-ptrace.c | 1 -
tools/testing/selftests/arm64/fp/fp-stress.c | 4 ++--
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
tools/testing/selftests/arm64/fp/vec-syscfg.c | 1 -
tools/testing/selftests/arm64/fp/zt-ptrace.c | 1 -
tools/testing/selftests/arm64/gcs/gcs-locking.c | 1 -
8 files changed, 3 insertions(+), 9 deletions(-)
--
2.25.1
Two small cleanups.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
kselftest/arm64/gcs: Correctly check return value when disabling GCS
kselftest/arm64/gcs: Use nolibc's getauxval()
tools/testing/selftests/arm64/gcs/basic-gcs.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250821-nolibc-gcs-fixes-11cf7585bb74
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Fix to use the return value of the function 'chdir("/")' and check if the
return is either 0 (ok) or 1 (not ok, so the test stops).
The patch fies the solves the following errors:
mount-notify_test.c:468:17: warning: ignoring return value of ‘chdir’
declared with attribute ‘warn_unused_result’ [-Wunused-result]
468 | chdir("/");
mount-notify_test_ns.c:489:17: warning: ignoring return value of
‘chdir’ declared with attribute ‘warn_unused_result’ [-Wunused-
result]
489 | chdir("/");
To reproduce the issue, use the command:
make kselftest TARGET=filesystems/statmount
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
.../selftests/filesystems/mount-notify/mount-notify_test.c | 2 +-
.../selftests/filesystems/mount-notify/mount-notify_test_ns.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
index 5a3b0ace1a88..a7f899599d52 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test.c
@@ -458,7 +458,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
diff --git a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
index d91946e69591..dc9eb3087a1a 100644
--- a/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
+++ b/tools/testing/selftests/filesystems/mount-notify/mount-notify_test_ns.c
@@ -486,7 +486,7 @@ TEST_F(fanotify, rmdir)
ASSERT_GE(ret, 0);
if (ret == 0) {
- chdir("/");
+ ASSERT_EQ(0, chdir("/"));
unshare(CLONE_NEWNS);
mount("", "/", NULL, MS_REC|MS_PRIVATE, NULL);
umount2("/a", MNT_DETACH);
--
2.43.0
This series adds ONE_REG interface for SBI FWFT extension implemented
by KVM RISC-V. This was missed out in accepted SBI FWFT patches for
KVM RISC-V.
These patches can also be found in the riscv_kvm_fwft_one_reg_v3 branch
at: https://github.com/avpatel/linux.git
Changes since v2:
- Re-based on latest KVM RISC-V queue
- Improved FWFT ONE_REG interface to allow enabling/disabling each
FWFT feature from KVM userspace
Changes since v1:
- Dropped have_state in PATCH4 as suggested by Drew
- Added Drew's Reviewed-by in appropriate patches
Anup Patel (6):
RISC-V: KVM: Set initial value of hedeleg in kvm_arch_vcpu_create()
RISC-V: KVM: Introduce feature specific reset for SBI FWFT
RISC-V: KVM: Introduce optional ONE_REG callbacks for SBI extensions
RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementation
RISC-V: KVM: Implement ONE_REG interface for SBI FWFT state
KVM: riscv: selftests: Add SBI FWFT to get-reg-list test
arch/riscv/include/asm/kvm_vcpu_sbi.h | 22 +-
arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 15 ++
arch/riscv/kvm/vcpu.c | 3 +-
arch/riscv/kvm/vcpu_onereg.c | 60 +----
arch/riscv/kvm/vcpu_sbi.c | 172 +++++++++++--
arch/riscv/kvm/vcpu_sbi_fwft.c | 227 ++++++++++++++++--
arch/riscv/kvm/vcpu_sbi_sta.c | 63 +++--
.../selftests/kvm/riscv/get-reg-list.c | 32 +++
9 files changed, 467 insertions(+), 128 deletions(-)
--
2.43.0
When many ADD_ADDR need to be sent, it can take some time to send each
of them, and create new subflows. Some CIs seem to occasionally have
issues with these tests, especially with "debug" kernels.
Two subtests will now run for a slightly longer time: the last two where
3 or more ADD_ADDR are sent during the test.
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index e9e11a9e60fd5374c8a98c3b7159ccbca8053030..b41cebfa1f921ce9ea6a88a908bf6d5e6027b367 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2268,7 +2268,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 3 3 3
chk_add_nr 3 3
fi
@@ -2280,7 +2281,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.14.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
join_syn_tx=3 \
chk_join_nr 1 1 1
chk_add_nr 3 3
--
2.51.0
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.
Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.
Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.
Reviewed-by: Geliang Tang <geliang(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..e9e11a9e60fd5374c8a98c3b7159ccbca8053030 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
tables="${ip6tables}"
fi
+ # set a maximum, to avoid too long timeout with exponential backoff
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
@@ -1669,7 +1670,6 @@ chk_add_nr()
local tx=""
local rx=""
local count
- local timeout
if [[ $ns_invert = "invert" ]]; then
ns_tx=$ns2
@@ -1678,15 +1678,13 @@ chk_add_nr()
rx=" server"
fi
- timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr rx${rx}"
count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_nr" ] &&
+ { [ "$add_nr" -eq 0 ] || [ "$count" -lt "$add_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] expected $add_nr"
else
print_ok
@@ -1774,18 +1772,15 @@ chk_add_tx_nr()
{
local add_tx_nr=$1
local echo_tx_nr=$2
- local timeout
local count
- timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr tx"
count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_tx_nr" ] &&
+ { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
else
print_ok
--
2.51.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.
Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.
This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.
Suggested-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch(a)openai.com>
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
v2: no changes.
---
Documentation/networking/mptcp-sysctl.rst | 8 +++++---
net/mptcp/pm.c | 28 ++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 1683c139821e3ba6d9eaa3c59330a523d29f1164..1eb6af26b4a7acdedd575a126c576210a78f0d4d 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -8,9 +8,11 @@ MPTCP Sysfs variables
===============================
add_addr_timeout - INTEGER (seconds)
- Set the timeout after which an ADD_ADDR control message will be
- resent to an MPTCP peer that has not acknowledged a previous
- ADD_ADDR message.
+ Set the maximum value of timeout after which an ADD_ADDR control message
+ will be resent to an MPTCP peer that has not acknowledged a previous
+ ADD_ADDR message. A dynamically estimated retransmission timeout based
+ on the estimated connection round-trip-time is used if this value is
+ lower than the maximum one.
Do not retransmit if set to 0.
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 136a380602cae872b76560649c924330e5f42533..204e1f61212e2be77a8476f024b59be67d04b80a 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -268,6 +268,27 @@ int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
return -EINVAL;
}
+static unsigned int mptcp_adjust_add_addr_timeout(struct mptcp_sock *msk)
+{
+ const struct net *net = sock_net((struct sock *)msk);
+ unsigned int rto = mptcp_get_add_addr_timeout(net);
+ struct mptcp_subflow_context *subflow;
+ unsigned int max = 0;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ struct inet_connection_sock *icsk = inet_csk(ssk);
+
+ if (icsk->icsk_rto > max)
+ max = icsk->icsk_rto;
+ }
+
+ if (max && max < rto)
+ rto = max;
+
+ return rto;
+}
+
static void mptcp_pm_add_timer(struct timer_list *timer)
{
struct mptcp_pm_add_entry *entry = timer_container_of(entry, timer,
@@ -292,7 +313,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
goto out;
}
- timeout = mptcp_get_add_addr_timeout(sock_net(sk));
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (!timeout)
goto out;
@@ -307,7 +328,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
if (entry->retrans_times < ADD_ADDR_RETRANS_MAX)
sk_reset_timer(sk, timer,
- jiffies + timeout);
+ jiffies + (timeout << entry->retrans_times));
spin_unlock_bh(&msk->pm.lock);
@@ -348,7 +369,6 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *add_entry = NULL;
struct sock *sk = (struct sock *)msk;
- struct net *net = sock_net(sk);
unsigned int timeout;
lockdep_assert_held(&msk->pm.lock);
@@ -374,7 +394,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
reset_timer:
- timeout = mptcp_get_add_addr_timeout(net);
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (timeout)
sk_reset_timer(sk, &add_entry->add_timer, jiffies + timeout);
--
2.51.0
Changes in v2:
- Optimized the logic in descriptions. (Song Liu)
- Created a new header file to declare kfuncs for future extensions included by other files. (Christian Loehle)
- Fixed some logical issues in the code. (Christian Loehle)
Reference:
[1] https://lore.kernel.org/bpf/20250829101137.9507-1-yikai.lin@vivo.com/
Summary
----------
Hi, everyone,
This patch set introduces an extensible cpuidle governor framework
using BPF struct_ops, enabling dynamic implementation of idle-state selection policies
via BPF programs.
Motivation
----------
As is well-known, CPUs support multiple idle states (e.g., C0, C1, C2, ...),
where deeper states reduce power consumption, but results in longer wakeup latency,
potentially affecting performance.
Existing generic cpuidle governors operate effectively in common scenarios
but exhibit suboptimal behavior in specific Android phone's use cases.
Our testing reveals that during low-utilization scenarios
(e.g., screen-off background tasks like music playback with CPU utilization <10%),
the C0 state occupies ~50% of idle time, causing significant energy inefficiency.
Reducing C0 to ≤20% could yield ≥5% power savings on mobile phones.
To address this, we expect:
1.Dynamic governor switching to power-saved policies for low cpu utilization scenarios (e.g., screen-off mode)
2.Dynamic switching to alternate governors for high-performance scenarios (e.g., gaming)
OverView
----------
The BPF cpuidle ext governor registers at postcore_initcall()
but remains disabled by default due to its low priority "rating" with value "1".
Activation requires adjust higer "rating" than other governors within BPF.
Core Components:
1.**struct cpuidle_gov_ext_ops** – BPF-overridable operations:
- ops.enable()/ops.disable(): enable or disable callback
- ops.select(): cpu Idle-state selection logic
- ops.set_stop_tick(): Scheduler tick management after state selection
- ops.reflect(): feedback info about previous idle state.
- ops.init()/ops.deinit(): Initialization or cleanup.
2.**Critical kfuncs for kernel state access**:
- bpf_cpuidle_ext_gov_update_rating():
Activate ext governor by raising rating must be called from "ops.init()"
- bpf_cpuidle_ext_gov_latency_req(): get idle-state latency constraints
- bpf_tick_nohz_get_sleep_length(): get CPU sleep duration in tickless mode
Future work
----------
1. Scenario detection: Identifying low-utilization states (e.g., screen-off + background music)
2. Policy optimization: Optimizing state-selection algorithms for specific scenarios
Is it related to sched_ext?
---------------------------
The cpuidle framework is as follows.
----------------------------------------------------------
Scheduler Core
----------------------------------------------------------
|
v
----------------------------------------------------------
| FAIR Class | EXT Class | IDLE Class |
----------------------------------------------------------
| | | |
| | | v
| | | ------------------------
| | | enter_cpu_idle()
| | | ------------------------
| | | |
| | | v
| | | ------------------------------
| | | | CPUIDLE Governor |
| | | ------------------------------
| | | | | |
| | | v v v
| | |-----------------------------------
| | | default | | other | | BPF ext |
| | | Governor | | Governor | | Governor | <<===Here is the feature we add.
| | |-----------------------------------
| | | | | |
| | | v v v
| | |-------------------------------------
| | | select idle state
| | |-------------------------------------
Whereas cpuidle is invoked after switching to idle class when no tasks are present in the scheduling RQ.
They are not directly related, so implementing kfuncs or other extensions through sched_ext is not feasible.
Lin Yikai (2):
cpuidle: Implement BPF extensible cpuidle governor class
selftests/bpf: Add selftests for cpuidle_gov_ext
drivers/cpuidle/Kconfig | 12 +
drivers/cpuidle/governors/Makefile | 1 +
drivers/cpuidle/governors/ext.c | 537 ++++++++++++++++++
.../bpf/prog_tests/test_cpuidle_gov_ext.c | 28 +
.../selftests/bpf/progs/cpuidle_common.h | 13 +
.../selftests/bpf/progs/cpuidle_gov_ext.c | 200 +++++++
6 files changed, 791 insertions(+)
create mode 100644 drivers/cpuidle/governors/ext.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cpuidle_gov_ext.c
create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_common.h
create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_gov_ext.c
--
2.43.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v16 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28, and it
will be RFC9768.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v16 (6-Sep-2025)
- Use TCP_ECN_IN_ACCECN_OUT_ACCECN, TCP_ECN_IN_ECN_OUT_ECN, and TCP_ECN_IN_ACCECN_OUT_ECN in comments of tcp_ecn_send_syn() (Eric Dumazet <edumazet(a)google.com>)
- Add tcpi_accecn_fail_mode and tcpi_accecn_opt_seen to make tcp_info be multiple of 64 bits in patch #12
v15 (14-Aug-205)
- Update pahole results in commit messages
- Accurate ECN will become RFC9768
v14 (22-Jul-2025)
- Add missing const for struct tcp_sock of tcp_accecn_option_beacon_check() of #11 (Simon Horman <horms(a)kernel.org>)
v13 (18-Jul-2025)
- Implement tcp_accecn_extract_syn_ect() and tcp_accecn_reflector_flags() with static array lookup of patch #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Fix typos in comments of #6 and remove patch #7 of v12 about simulatenous connect (Paolo Abeni <pabeni(a)redhat.com>)
- Move TCP_ACCECN_E1B_INIT_OFFSET, TCP_ACCECN_E0B_INIT_OFFSET, and TCP_ACCECN_CEB_INIT_OFFSET from patch #7 to #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use static array lookup in tcp_accecn_optfield_to_ecnfield() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return false when WARN_ON_ONCE() is true in tcp_accecn_process_option() of patch #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Make synack_ecn_bytes as static const array and use const u32 pointer in tcp_options_write() of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Use ALIGN() and ALIGN_DOWN() in tcp_options_fit_accecn() to pad TCP AccECN option to dword of #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Return TCP_ACCECN_OPT_FAIL_SEEN if WARN_ON_ONCE() is true in tcp_accecn_option_init() of #12 (Paolo Abeni <pabeni(a)redhat.com>)
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (5):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 12 +
include/linux/tcp.h | 32 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 642 ++++++++++++++++++
include/uapi/linux/tcp.h | 9 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 30 +-
net/ipv4/tcp_input.c | 353 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 294 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1406 insertions(+), 184 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
devmem test fails on NIPA. Most likely we get skb(s) with readable
frags (why?) but the failure manifests as an OOM. The OOM happens
because ncdevmem spams the following message:
recvmsg ret=-1
recvmsg: Bad address
As of today, ncdevmem can't deal with various reasons of EFAULT:
- falling back to regular recvmsg for non-devmem skbs
- increasing ctrl_data size (can't happen with ncdevmem's large buffer)
Exit (cleanly) with error when recvmsg returns EFAULT. This should at
least cause the test to cleanup its state.
Signed-off-by: Stanislav Fomichev <sdf(a)fomichev.me>
---
tools/testing/selftests/drivers/net/hw/ncdevmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
index 8dc9511d046f..c0a22938bed2 100644
--- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
+++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
@@ -945,6 +945,10 @@ static int do_server(struct memory_buffer *mem)
continue;
if (ret < 0) {
perror("recvmsg");
+ if (errno == EFAULT) {
+ pr_err("received EFAULT, won't recover");
+ goto err_close_client;
+ }
continue;
}
if (ret == 0) {
--
2.51.0
Create a netconsole test that puts a lot of pressure on the netconsole
list manipulation. Do it by creating dynamic targets and deleting
targets while messages are being sent. Also put interface down while the
In order to do it, refactor create_dynamic_target(), so it can be used to
create random targets in the torture test.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (2):
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++--
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++++++++++
3 files changed, 147 insertions(+), 11 deletions(-)
---
base-commit: 2fd4161d0d2547650d9559d57fc67b4e0a26a9e3
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>