selftests/net/lib.sh contains a suite of iproute2 wrappers that
automatically schedule the corresponding cleanup through defer. That they
do so is not immediately obvious, however: one needs to know which
functions handle the deferral behind the scenes, and which expect the
caller to handle cleanups themselves.
A convention for these auto-deferring functions would help both writing and
patch review. This patchset introduces one by marking these functions with
an adf_ prefix. We already have a few such functions: forwarding/lib.sh has
adf_mcd_start() and a few selftests add private helpers that conform to
this convention.
Patches #1 to #8 gradually convert individual functions, one per patch.
Patch #9 renames auto-deferring private helpers named dfr_* to adf_*.
The plan is not to retro-rename all private helpers, but I happened to know
about these.
Patches #10 to #12 introduce several autodefer helpers for commonly used
forwarding/lib.sh functions, and opportunistically convert straightforward
instances of 'action; defer counteraction' to the new helpers.
Patch #13 adds some README verbiage to pitch defer and the adf_*
convention.
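To make the convention concrete, here is a minimal sketch of the
'action; defer counteraction' pattern and of an adf_-style wrapper around
it. It is not the code from selftests/net/lib.sh: demo_defer,
demo_defer_run, demo_ip and adf_demo_link_add are hypothetical stand-ins,
and demo_ip only logs the command so that the sketch runs unprivileged.

```sh
# Hypothetical, simplified stand-ins for the defer machinery; the real
# helpers live in selftests/net/lib.sh and are more elaborate.

DEMO_DEFERS=""

# Queue a cleanup command. New entries are prepended so that cleanups run
# in reverse order of registration.
demo_defer()
{
	DEMO_DEFERS="$*
$DEMO_DEFERS"
}

# Run all queued cleanups, newest first, then empty the queue.
demo_defer_run()
{
	eval "$DEMO_DEFERS"
	DEMO_DEFERS=""
}

# Stand-in for iproute2: only logs the command (assumption for the sketch).
demo_ip()
{
	echo "ip $*"
}

# Auto-deferring wrapper: the adf_ prefix tells the reader that the
# matching cleanup is scheduled here, not by the caller.
adf_demo_link_add()
{
	demo_ip link add name "$@" && demo_defer demo_ip link del dev "$1"
}

adf_demo_link_add br0 type bridge
adf_demo_link_add dummy0 type dummy
demo_defer_run
```

With such a wrapper the caller writes one line instead of two, and the
prefix makes it visible at the call site that no explicit cleanup is
needed.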
Petr Machata (13):
selftests: net: lib: Rename ip_link_add() to adf_*
selftests: net: lib: Rename ip_link_set_master() to adf_*
selftests: net: lib: Rename ip_link_set_addr() to adf_*
selftests: net: lib: Rename ip_link_set_up() to adf_*
selftests: net: lib: Rename ip_link_set_down() to adf_*
selftests: net: lib: Rename ip_addr_add() to adf_*
selftests: net: lib: Rename ip_route_add() to adf_*
selftests: net: lib: Rename bridge_vlan_add() to adf_*
selftests: net: vlan_bridge_binding: Rename dfr_set_binding_*() to
adf_*
selftests: forwarding: lib: Add an autodefer variant of vrf_prepare()
selftests: forwarding: lib: Add an autodefer variant of
simple_if_init()
selftests: forwarding: lib: Add an autodefer variant of
forwarding_enable()
selftests: forwarding: README: Mention defer, adf_
.../drivers/net/mlxsw/devlink_trap_policer.sh | 9 +-
.../drivers/net/mlxsw/qos_ets_strict.sh | 12 +-
.../drivers/net/mlxsw/qos_max_descriptors.sh | 9 +-
.../drivers/net/mlxsw/qos_mc_aware.sh | 12 +-
.../drivers/net/mlxsw/sch_red_core.sh | 6 +-
tools/testing/selftests/net/fdb_notify.sh | 26 ++--
tools/testing/selftests/net/forwarding/README | 15 ++
.../net/forwarding/bridge_activity_notify.sh | 21 ++-
.../net/forwarding/bridge_fdb_local_vlan_0.sh | 65 ++++----
tools/testing/selftests/net/forwarding/lib.sh | 18 +++
.../selftests/net/forwarding/sch_ets_core.sh | 9 +-
.../selftests/net/forwarding/sch_red.sh | 12 +-
.../selftests/net/forwarding/sch_tbf_core.sh | 6 +-
.../net/forwarding/vxlan_bridge_1q_mc_ul.sh | 141 +++++++++---------
.../net/forwarding/vxlan_reserved.sh | 33 ++--
tools/testing/selftests/net/lib.sh | 16 +-
.../net/test_vxlan_fdb_changelink.sh | 8 +-
.../selftests/net/vlan_bridge_binding.sh | 44 +++---
18 files changed, 225 insertions(+), 237 deletions(-)
--
2.49.0
Here are some patches for the MPTCP PM, including some refactoring that
I thought it would be best to send at the end of a cycle to avoid
conflicts between net and net-next that could last a few weeks.
The most interesting changes are in the first and last patch, the rest
are patches refactoring the code & tests to validate the modifications.
- Patches 1 & 2: When servers set the C-flag in their MP_CAPABLE to tell
clients not to create subflows to the initial address and port -- e.g.
a deployment behind an L4 load balancer like a typical CDN deployment
-- clients will not use their other endpoints when default settings
are used. That's because the in-kernel path-manager uses the 'subflow'
endpoints to create subflows only to the initial address and port. The
first patch fixes that (for >=v5.14), and the second one validates it.
- Patches 3-14: various patches refactoring the code around the
in-kernel PM (mainly): split overly long functions, rename variables and
functions to avoid confusion, reduce structure size, and compare IDs
instead of IP addresses. Note that one patch modifies one internal
variable used in one BPF selftest.
- Patch 15: ability to control endpoints that are used in reaction to a
new address announced by the other peer. With that, endpoints can be
used only once.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Patches 1 & 2 are sent to net-next on purpose: to delay the backports a
bit, just in case. Also, we are at the end of a cycle, and sending them
here avoids delaying the other refactoring patches.
- Sorry, I wanted to send this series earlier on, but due to some
unrelated issues (and holiday), it got delayed. Most patches are
pure refactoring ones.
---
Matthieu Baerts (NGI0) (15):
mptcp: pm: in-kernel: usable client side with C-flag
selftests: mptcp: join: validate C-flag + def limit
mptcp: pm: in-kernel: refactor fill_local_addresses_vec
mptcp: pm: in-kernel: refactor fill_remote_addresses_vec
mptcp: pm: rename 'subflows' to 'extra_subflows'
mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
mptcp: pm: in-kernel: rename 'local_addr_list' to 'endp_list'
mptcp: pm: in-kernel: rename 'addrs' to 'endpoints'
mptcp: pm: in-kernel: remove stale_loss_cnt
mptcp: pm: in-kernel: reduce pernet struct size
mptcp: pm: in-kernel: compare IDs instead of addresses
mptcp: pm: in-kernel: add laminar endpoints
include/uapi/linux/mptcp.h | 11 +-
net/mptcp/pm.c | 32 +-
net/mptcp/pm_kernel.c | 569 ++++++++++++++--------
net/mptcp/pm_userspace.c | 2 +-
net/mptcp/protocol.h | 21 +-
net/mptcp/sockopt.c | 22 +-
tools/testing/selftests/bpf/progs/mptcp_subflow.c | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 +
8 files changed, 441 insertions(+), 229 deletions(-)
---
base-commit: a1f1f2422e098485b09e55a492de05cf97f9954d
change-id: 20250925-net-next-mptcp-c-flag-laminar-f8442e4d4bd9
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Fix to avoid cases where the `res` shell variable is
empty in script comparisons.
The comparison has been changed to a string comparison to
handle other possible values the variable could assume.
The issue can be reproduced with the command:
make kselftest TARGETS=net
It solves the error:
./tfo_passive.sh: line 98: [: -eq: unary operator expected
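The difference between the two forms can be sketched as follows. The
helper names old_check_status and new_check are hypothetical and exist
only for this illustration; tfo_passive.sh performs the check inline.

```sh
# Hypothetical helpers for illustration only.

# Old form: unquoted numeric test. With an empty value the shell evaluates
# "[ -eq 0 ]", which is a usage error rather than a false comparison.
old_check_status()
{
	res=$1
	status=0
	[ $res -eq 0 ] 2>/dev/null || status=$?
	echo "$status"
}

# New form: quoted string comparison, well defined for any value.
new_check()
{
	res=$1
	if [ "$res" = "0" ]; then
		echo "invalid"
	else
		echo "ok"
	fi
}

old_check_status ""    # error status (greater than 1), not a plain false
new_check ""           # prints "ok"
new_check 0            # prints "invalid"
new_check 1234         # prints "ok"
```

Note that with the string comparison an empty value is simply treated as
"not 0", so it takes the same branch as a valid NAPI ID.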
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
Notes:
v2: edit condition to handle strings
tools/testing/selftests/net/tfo_passive.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/tfo_passive.sh b/tools/testing/selftests/net/tfo_passive.sh
index 80bf11fdc046..a4550511830a 100755
--- a/tools/testing/selftests/net/tfo_passive.sh
+++ b/tools/testing/selftests/net/tfo_passive.sh
@@ -95,7 +95,7 @@ wait
res=$(cat $out_file)
rm $out_file
-if [ $res -eq 0 ]; then
+if [ "$res" = "0" ]; then
echo "got invalid NAPI ID from passive TFO socket"
cleanup_ns
exit 1
--
2.43.0
The generic vDSO provides a lot of common functionality shared between
different architectures. SPARC is the last architecture not using it,
preventing some necessary code cleanup.
Make use of the generic infrastructure.
Follow-up to and replacement for Arnd's SPARC vDSO removal patches:
https://lore.kernel.org/lkml/20250707144726.4008707-1-arnd@kernel.org/
Tested on a Niagara T4 and QEMU.
This has a semantic conflict with my series "vdso: Reject absolute
relocations during build" [0]. The last patch of that series expects all
users of the generic vDSO library to use the vdsocheck tool.
This is not the case (yet) for SPARC64. I do have the patches for the
integration, the specifics will depend on which series is applied first.
Based on tip/timers/vdso.
[0] https://lore.kernel.org/lkml/20250812-vdso-absolute-reloc-v4-0-61a8b615e5ec…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v3:
- Allocate vDSO data pages dynamically (and lots of preparations for that)
- Drop clock_getres()
- Fix 32bit clock_gettime() syscall fallback
- Link to v2: https://lore.kernel.org/r/20250815-vdso-sparc64-generic-2-v2-0-b5ff80672347…
Changes in v2:
- Rebase on v6.17-rc1
- Drop RFC state
- Fix typo in commit message
- Drop duplicate 'select GENERIC_TIME_VSYSCALL'
- Merge "sparc64: time: Remove architecture-specific clocksource data" into the
main conversion patch. It violated the check in __clocksource_register_scale()
- Link to v1: https://lore.kernel.org/r/20250724-vdso-sparc64-generic-2-v1-0-e376a3bd24d1…
---
Arnd Bergmann (1):
clocksource: remove ARCH_CLOCKSOURCE_DATA
Thomas Weißschuh (35):
selftests: vDSO: vdso_test_correctness: Handle different tv_usec types
arm64: vDSO: getrandom: Explicitly include asm/alternative.h
arm64: vDSO: gettimeofday: Explicitly include vdso/clocksource.h
arm64: vDSO: compat_gettimeofday: Add explicit includes
ARM: vdso: gettimeofday: Add explicit includes
powerpc/vdso/gettimeofday: Explicitly include vdso/time32.h
powerpc/vdso: Explicitly include asm/cputable.h and asm/feature-fixups.h
LoongArch: vDSO: Explicitly include asm/vdso/vdso.h
MIPS: vdso: Add include guard to asm/vdso/vdso.h
MIPS: vdso: Explicitly include asm/vdso/vdso.h
random: vDSO: Add explicit includes
vdso/gettimeofday: Add explicit includes
vdso/helpers: Explicitly include vdso/processor.h
vdso/datapage: Remove inclusion of gettimeofday.h
vdso/datapage: Trim down unnecessary includes
random: vDSO: trim vDSO includes
random: vDSO: remove ifdeffery
random: vDSO: split out datapage update into helper functions
random: vDSO: only access vDSO datapage after random_init()
s390/time: Set up vDSO datapage later
vdso/datastore: Reduce scope of some variables in vvar_fault()
vdso/datastore: Drop inclusion of linux/mmap_lock.h
vdso/datastore: Map pages through struct page
vdso/datastore: Allocate data pages dynamically
sparc64: vdso: Link with -z noexecstack
sparc64: vdso: Remove obsolete "fake section table" reservation
sparc64: vdso: Replace code patching with runtime conditional
sparc64: vdso: Move hardware counter read into header
sparc64: vdso: Move syscall fallbacks into header
sparc64: vdso: Introduce vdso/processor.h
sparc64: vdso: Switch to the generic vDSO library
sparc64: vdso2c: Drop sym_vvar_start handling
sparc64: vdso2c: Remove symbol handling
sparc64: vdso: Implement clock_gettime64()
clocksource: drop include of asm/clocksource.h from linux/clocksource.h
arch/arm/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/include/asm/vdso/compat_gettimeofday.h | 3 +
arch/arm64/include/asm/vdso/gettimeofday.h | 2 +
arch/arm64/kernel/vdso/vgetrandom.c | 2 +
arch/loongarch/kernel/process.c | 1 +
arch/loongarch/kernel/vdso.c | 1 +
arch/mips/include/asm/vdso/vdso.h | 5 +
arch/mips/kernel/vdso.c | 1 +
arch/powerpc/include/asm/vdso/gettimeofday.h | 1 +
arch/powerpc/include/asm/vdso/processor.h | 3 +
arch/s390/kernel/time.c | 4 +-
arch/sparc/Kconfig | 3 +-
arch/sparc/include/asm/clocksource.h | 9 -
arch/sparc/include/asm/processor.h | 3 +
arch/sparc/include/asm/processor_32.h | 2 -
arch/sparc/include/asm/processor_64.h | 25 --
arch/sparc/include/asm/vdso.h | 2 -
arch/sparc/include/asm/vdso/clocksource.h | 10 +
arch/sparc/include/asm/vdso/gettimeofday.h | 184 ++++++++++
arch/sparc/include/asm/vdso/processor.h | 41 +++
arch/sparc/include/asm/vdso/vsyscall.h | 10 +
arch/sparc/include/asm/vvar.h | 75 ----
arch/sparc/kernel/Makefile | 1 -
arch/sparc/kernel/time_64.c | 6 +-
arch/sparc/kernel/vdso.c | 69 ----
arch/sparc/vdso/Makefile | 8 +-
arch/sparc/vdso/vclock_gettime.c | 380 ++-------------------
arch/sparc/vdso/vdso-layout.lds.S | 26 +-
arch/sparc/vdso/vdso.lds.S | 2 -
arch/sparc/vdso/vdso2c.c | 24 --
arch/sparc/vdso/vdso2c.h | 45 +--
arch/sparc/vdso/vdso32/vdso32.lds.S | 4 +-
arch/sparc/vdso/vma.c | 274 +--------------
drivers/char/random.c | 75 ++--
include/linux/clocksource.h | 8 -
include/linux/vdso_datastore.h | 6 +
include/vdso/datapage.h | 23 +-
include/vdso/helpers.h | 1 +
init/main.c | 2 +
kernel/time/Kconfig | 4 -
lib/vdso/datastore.c | 73 ++--
lib/vdso/getrandom.c | 3 +
lib/vdso/gettimeofday.c | 17 +
.../testing/selftests/vDSO/vdso_test_correctness.c | 8 +-
44 files changed, 451 insertions(+), 997 deletions(-)
---
base-commit: 5f84f6004e298bd41c9e4ed45c18447954b1dce6
change-id: 20250722-vdso-sparc64-generic-2-25f2e058e92c
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Fix to avoid the usage of the `ret` variable uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced, building the tests, with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..772ca1db6e59 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,15 +1042,13 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
while (nvevents--) {
- ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
- &trigger_vevent_cmd);
- if (ret < 0)
+ if (ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
+ &trigger_vevent_cmd) < 0)
return -1;
}
- return ret;
+ return 0;
}
#define test_cmd_trigger_vevents(dev_id, nvevents) \
--
2.43.0
This series prepares for adding the -Wsign-compare C compilation flag to
the Makefile for bpf selftests, as requested by a TODO, to help avoid
implicit type conversions and get predictable behavior.
Changelog:
Changes from v2:
- Split up the patch into a patch series, as suggested by Vivek.
- Include only changes to variable types with no casting, as suggested by
my mentor David.
- Removed -Wsign-compare from the Makefile to avoid compilation errors
until casting is added for the rest of the comparisons.
Link:https://lore.kernel.org/bpf/20250924195731.6374-1-mehdi.benhadjkhelifa…
Changes from v1:
- Fix failed CI builds caused by .c and .h files missing from my patch,
which was based on mainline.
Link:https://lore.kernel.org/bpf/20250924162408.815137-1-mehdi.benhadjkheli…
Mehdi Ben Hadj Khelifa (3):
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
selftests/bpf: Prepare to add -Wsign-compare for bpf tests
tools/testing/selftests/bpf/progs/test_global_func11.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func12.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func13.c | 2 +-
tools/testing/selftests/bpf/progs/test_global_func9.c | 2 +-
tools/testing/selftests/bpf/progs/test_map_init.c | 2 +-
tools/testing/selftests/bpf/progs/test_parse_tcp_hdr_opt.c | 2 +-
.../selftests/bpf/progs/test_parse_tcp_hdr_opt_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_skb_ctx.c | 2 +-
tools/testing/selftests/bpf/progs/test_snprintf.c | 2 +-
tools/testing/selftests/bpf/progs/test_sockmap_strp.c | 2 +-
tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_loop.c | 2 +-
tools/testing/selftests/bpf/progs/test_xdp_noinline.c | 4 ++--
tools/testing/selftests/bpf/progs/uprobe_multi.c | 4 ++--
.../selftests/bpf/progs/uprobe_multi_session_recursive.c | 5 +++--
.../selftests/bpf/progs/verifier_iterating_callbacks.c | 2 +-
18 files changed, 22 insertions(+), 21 deletions(-)
--
2.51.0
Add a basic test suite for drivers that support PSP. Also, add a PSP
implementation in the netdevsim driver.
The netdevsim implementation does encapsulation and decapsulation of
PSP packets, but no crypto.
The tests cover the basic usage of the uapi, and demonstrate key
exchange and connection setup. The tests and netdevsim support IPv4
and IPv6. Here is an example run on a system with a CX7 NIC.
TAP version 13
1..28
ok 1 psp.data_basic_send_v0_ip4
ok 2 psp.data_basic_send_v0_ip6
ok 3 psp.data_basic_send_v1_ip4
ok 4 psp.data_basic_send_v1_ip6
ok 5 psp.data_basic_send_v2_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
ok 7 psp.data_basic_send_v3_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
ok 9 psp.data_mss_adjust_ip4
ok 10 psp.data_mss_adjust_ip6
ok 11 psp.dev_list_devices
ok 12 psp.dev_get_device
ok 13 psp.dev_get_device_bad
ok 14 psp.dev_rotate
ok 15 psp.dev_rotate_spi
ok 16 psp.assoc_basic
ok 17 psp.assoc_bad_dev
ok 18 psp.assoc_sk_only_conn
ok 19 psp.assoc_sk_only_mismatch
ok 20 psp.assoc_sk_only_mismatch_tx
ok 21 psp.assoc_sk_only_unconn
ok 22 psp.assoc_version_mismatch
ok 23 psp.assoc_twice
ok 24 psp.data_send_bad_key
ok 25 psp.data_send_disconnect
ok 26 psp.data_stale_key
ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
# Totals: pass:22 fail:0 xfail:2 xpass:0 skip:4 error:0
#
# Responder logs (0):
# STDERR:
# Set PSP enable on device 1 to 0x3
# Set PSP enable on device 1 to 0x0
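As a sanity check on the totals line, the result lines above can be
tallied with a few lines of shell. This is only a sketch: tap_totals is a
hypothetical helper, not part of the selftest framework, and it assumes
that skipped and expected-failure tests are reported as "ok" lines
carrying a "# SKIP" or "# XFAIL" directive, as in the run above.

```sh
# Hypothetical helper: tally KTAP result lines read from stdin.
tap_totals()
{
	awk '
	/^ok / {
		if ($0 ~ /# SKIP/)
			skip++
		else if ($0 ~ /# XFAIL/)
			xfail++
		else
			pass++
	}
	/^not ok / { fail++ }
	END {
		printf "pass:%d fail:%d xfail:%d skip:%d\n", pass, fail, xfail, skip
	}
	'
}

# Three representative lines taken from the run above.
printf '%s\n' \
	'ok 1 psp.data_basic_send_v0_ip4' \
	"ok 5 psp.data_basic_send_v2_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')" \
	'ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim' |
	tap_totals
```

Applied to the full run, the same counting gives 22 passes, 4 skips and
2 expected failures, which adds up to the 28 planned tests.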
CHANGES:
v2:
- fix pylint warnings
- insert CONFIG_INET_PSP in alphabetical order
- use branch to skip all tests
- fix compilation error when CONFIG_INET_PSP is not set
v1: https://lore.kernel.org/netdev/20250924194959.2845473-1-daniel.zahka@gmail.…
Jakub Kicinski (8):
netdevsim: a basic test PSP implementation
selftests: drv-net: base device access API test
selftests: drv-net: add PSP responder
selftests: drv-net: psp: add basic data transfer and key rotation
tests
selftests: drv-net: psp: add association tests
selftests: drv-net: psp: add connection breaking tests
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
selftests: drv-net: psp: add tests for destroying devices
drivers/net/netdevsim/Makefile | 4 +
drivers/net/netdevsim/netdev.c | 55 +-
drivers/net/netdevsim/netdevsim.h | 33 +
drivers/net/netdevsim/psp.c | 234 +++++++
net/core/skbuff.c | 1 +
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 10 +
tools/testing/selftests/drivers/net/config | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 5 +
tools/testing/selftests/drivers/net/psp.py | 593 ++++++++++++++++++
.../selftests/drivers/net/psp_responder.c | 483 ++++++++++++++
.../testing/selftests/net/lib/py/__init__.py | 2 +-
tools/testing/selftests/net/lib/py/ksft.py | 10 +
tools/testing/selftests/net/lib/py/ynl.py | 5 +
16 files changed, 1432 insertions(+), 13 deletions(-)
create mode 100644 drivers/net/netdevsim/psp.c
create mode 100755 tools/testing/selftests/drivers/net/psp.py
create mode 100644 tools/testing/selftests/drivers/net/psp_responder.c
--
2.47.3
From: Dylan Yudaken <dyudaken(a)gmail.com>
Add a .gitignore for the test case build object.
Signed-off-by: Dylan Yudaken <dyudaken(a)gmail.com>
Signed-off-by: Sohil Mehta <sohil.mehta(a)intel.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
The binary creates some noise. The patch to fix that seems to have
fallen through the cracks. Sending another revision with an expanded Cc
list.
v2:
- Pick up the review tag
v1: https://lore.kernel.org/all/20250623232549.3263273-1-dyudaken@gmail.com/
---
tools/testing/selftests/kexec/.gitignore | 2 ++
1 file changed, 2 insertions(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..5f3d9e089ae8
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_kexec_jump
--
2.43.0
This is v10 of the TDX selftests.
This series is based on v6.17-rc4 and has a dependency on
"KVM: TDX: Force split irqchip for TDX at irqchip creation time" [1]
Changes from v9 [2]:
- Rebased on top of v6.17-rc4.
- Addressed the comments from v9.
- Removed special handling for split irqchip in the test code in favor
of the KVM fix in [1].
- Removed outdated support for VM memory not backed by guest_memfd.
- Split "KVM: selftests: Hook TDX support to vm and vcpu creation" into
4 separate patches.
[1] https://lore.kernel.org/lkml/20250904062007.622530-1-sagis@google.com/
[2] https://lore.kernel.org/lkml/20250821042915.3712925-1-sagis@google.com/
Ackerley Tng (2):
KVM: selftests: Add helpers to init TDX memory and finalize VM
KVM: selftests: Add ucall support for TDX
Erdem Aktas (2):
KVM: selftests: Add TDX boot code
KVM: selftests: Add support for TDX TDCALL from guest
Isaku Yamahata (2):
KVM: selftests: Update kvm_init_vm_address_properties() for TDX
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
Sagi Shahar (15):
KVM: selftests: Allocate pgd in virt_map() as necessary
KVM: selftests: Expose functions to get default sregs values
KVM: selftests: Expose function to allocate guest vCPU stack
KVM: selftests: Expose segment definitons to assembly files
KVM: selftests: Add kbuild definitons
KVM: selftests: Define structs to pass parameters to TDX boot code
KVM: selftests: Set up TDX boot code region
KVM: selftests: Set up TDX boot parameters region
KVM: selftests: Add helper to initialize TDX VM
KVM: selftests: Call TDX init when creating a new TDX vm
KVM: selftests: Setup memory regions for TDX on vm creation
KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu
KVM: selftests: Set entry point for TDX guest code
KVM: selftests: Add wrapper for TDX MMIO from guest
KVM: selftests: Add TDX lifecycle test
tools/include/linux/kbuild.h | 18 +
tools/testing/selftests/kvm/Makefile.kvm | 32 ++
.../selftests/kvm/include/x86/processor.h | 35 ++
.../selftests/kvm/include/x86/processor_asm.h | 12 +
.../selftests/kvm/include/x86/tdx/td_boot.h | 74 ++++
.../kvm/include/x86/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86/tdx/tdcall.h | 34 ++
.../selftests/kvm/include/x86/tdx/tdx.h | 14 +
.../selftests/kvm/include/x86/tdx/tdx_util.h | 86 +++++
.../testing/selftests/kvm/include/x86/ucall.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 10 +-
.../testing/selftests/kvm/lib/x86/processor.c | 91 +++--
.../selftests/kvm/lib/x86/tdx/td_boot.S | 60 +++
.../kvm/lib/x86/tdx/td_boot_offsets.c | 21 ++
.../selftests/kvm/lib/x86/tdx/tdcall.S | 93 +++++
.../kvm/lib/x86/tdx/tdcall_offsets.c | 16 +
tools/testing/selftests/kvm/lib/x86/tdx/tdx.c | 23 ++
.../selftests/kvm/lib/x86/tdx/tdx_util.c | 354 ++++++++++++++++++
tools/testing/selftests/kvm/lib/x86/ucall.c | 45 ++-
tools/testing/selftests/kvm/x86/tdx_vm_test.c | 31 ++
20 files changed, 1032 insertions(+), 37 deletions(-)
create mode 100644 tools/include/linux/kbuild.h
create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c
--
2.51.0.338.gd7d06c2dae-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v2 AccECN case handling patch series, which covers the
handling of several exceptional cases in the Accurate ECN spec (RFC9768),
adds new identifiers to be used by CC modules, adds ece_delta into
rate_sample, and keeps the ACE counter for computation, etc.
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best regards,
Chia-Yu
---
Chia-Yu Chang (11):
tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
tcp: disable RFC3168 fallback identifier for CC modules
tcp: accecn: handle unexpected AccECN negotiation feedback
tcp: accecn: retransmit downgraded SYN in AccECN negotiation
tcp: move increment of num_retrans
tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN
SYN/ACK
tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
tcp: accecn: fallback outgoing half link to non-AccECN
tcp: accecn: verify ACE counter in 1st ACK after AccECN negotiation
tcp: accecn: stop sending AccECN opt when loss ACK w/ option
tcp: accecn: enable AccECN
Ilpo Järvinen (3):
tcp: try to avoid safer when ACKs are thinned
gro: flushing when CWR is set negatively affects AccECN
tcp: accecn: Add ece_delta to rate_sample
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 4 +-
include/net/inet_ecn.h | 20 +++-
include/net/tcp.h | 30 +++++-
include/net/tcp_ecn.h | 85 ++++++++++++-----
net/ipv4/sysctl_net_ipv4.c | 2 +-
net/ipv4/tcp.c | 2 +
net/ipv4/tcp_cong.c | 9 +-
net/ipv4/tcp_input.c | 91 +++++++++++++------
net/ipv4/tcp_minisocks.c | 40 +++++---
net/ipv4/tcp_offload.c | 3 +-
net/ipv4/tcp_output.c | 38 +++++---
12 files changed, 239 insertions(+), 86 deletions(-)
--
2.34.1
This patchset introduces target resume capability to netconsole, allowing
it to recover targets when the underlying low-level interface comes back
online.
The patchset starts by refactoring netconsole state representation in
order to allow representing deactivated targets (targets that are
disabled due to interfaces going down). It then modifies netconsole to
handle NETDEV_UP events for such targets and set up netpoll.
The patchset includes a selftest that validates netconsole target state
transitions and that the target is functional after being resumed.
Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v2:
- Attempt to resume target in the same thread, instead of using a
workqueue.
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved conditionals
inside its implementation, keeping the code clearer.
- Verify that device addr matches target MAC address when target was
set up using MAC.
- Update selftest to cover targets bound by MAC and interface name.
- Fix typo in selftest comment and sort tests alphabetically in
Makefile.
- Link to v1:
https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai…
---
Andre Carvalho (4):
netconsole: convert 'enabled' flag to enum for clearer state management
netpoll: add wrapper around __netpoll_setup with dev reference
netconsole: resume previously deactivated target
selftests: netconsole: validate target reactivation
Breno Leitao (2):
netconsole: add target_state enum
netconsole: add STATE_DEACTIVATED to track targets disabled by low level
drivers/net/netconsole.c | 102 +++++++++++++++------
include/linux/netpoll.h | 1 +
net/core/netpoll.c | 20 ++++
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++++-
.../selftests/drivers/net/netcons_resume.sh | 92 +++++++++++++++++++
6 files changed, 216 insertions(+), 30 deletions(-)
---
base-commit: 312e6f7676e63bbb9b81e5c68e580a9f776cc6f0
change-id: 20250816-netcons-retrigger-a4f547bfc867
Best regards,
--
Andre Carvalho <asantostc(a)gmail.com>
From: Benjamin Berg <benjamin.berg(a)intel.com>
This patchset is an attempt to start a nolibc port of UML. The goal is
to port UML to use nolibc in smaller chunks to make the switch more
manageable.
There are three parts to this patchset:
* Two patches to use tools/include headers instead of kernel headers
for userspace files.
* A few nolibc fixes and a new NOLIBC_NO_RUNTIME compile flag for it.
* Finally, nolibc build support for UML and switching two files while
adding the appropriate support in nolibc itself.
v1 of this patchset was
https://lore.kernel.org/all/20250915071115.1429196-1-benjamin@sipsolutions.…
Changes in v2:
- add sys/uio.h and sys/ptrace.h to nolibc
- Use NOLIBC_NO_RUNTIME to disable nolibc startup code
- Fix out-of-tree build
- various small improvements and cleanups
Should the nolibc changes be merged separately or could everything go
through the same branch?
Also, what about tools/include/linux/compiler.h? It seems that was added
for the tracing code, but it is not clear to me who might ACK that fix.
Benjamin
Benjamin Berg (11):
tools compiler.h: fix __used definition
um: use tools/include for user files
tools/nolibc/stdio: remove perror if NOLIBC_IGNORE_ERRNO is set
tools/nolibc/dirent: avoid errno in readdir_r
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: add option to disable runtime
um: add infrastructure to build files using nolibc
um: use nolibc for the --showconfig implementation
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add ptrace support
um: switch ptrace FP register access to nolibc
arch/um/Makefile | 38 +++++++++++---
arch/um/include/shared/init.h | 2 +-
arch/um/include/shared/os.h | 2 +
arch/um/include/shared/user.h | 6 ---
arch/um/kernel/Makefile | 2 +-
arch/um/kernel/skas/stub.c | 1 +
arch/um/kernel/skas/stub_exe.c | 4 +-
arch/um/os-Linux/skas/process.c | 6 +--
arch/um/os-Linux/start_up.c | 4 +-
arch/um/scripts/Makefile.rules | 10 +++-
arch/x86/um/Makefile | 6 ++-
arch/x86/um/os-Linux/Makefile | 5 +-
arch/x86/um/os-Linux/registers.c | 16 ++----
arch/x86/um/user-offsets.c | 1 -
tools/include/linux/compiler.h | 2 +-
tools/include/nolibc/Makefile | 2 +
tools/include/nolibc/arch-arm.h | 2 +
tools/include/nolibc/arch-arm64.h | 2 +
tools/include/nolibc/arch-loongarch.h | 2 +
tools/include/nolibc/arch-m68k.h | 2 +
tools/include/nolibc/arch-mips.h | 2 +
tools/include/nolibc/arch-powerpc.h | 2 +
tools/include/nolibc/arch-riscv.h | 2 +
tools/include/nolibc/arch-s390.h | 2 +
tools/include/nolibc/arch-sh.h | 2 +
tools/include/nolibc/arch-sparc.h | 2 +
tools/include/nolibc/arch-x86.h | 4 ++
tools/include/nolibc/compiler.h | 4 +-
tools/include/nolibc/crt.h | 3 ++
tools/include/nolibc/dirent.h | 6 +--
tools/include/nolibc/nolibc.h | 2 +
tools/include/nolibc/stackprotector.h | 2 +
tools/include/nolibc/stdio.h | 2 +
tools/include/nolibc/stdlib.h | 2 +
tools/include/nolibc/sys.h | 3 +-
tools/include/nolibc/sys/auxv.h | 3 ++
tools/include/nolibc/sys/ptrace.h | 52 ++++++++++++++++++++
tools/include/nolibc/sys/uio.h | 49 ++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 11 +++++
39 files changed, 222 insertions(+), 48 deletions(-)
create mode 100644 tools/include/nolibc/sys/ptrace.h
create mode 100644 tools/include/nolibc/sys/uio.h
--
2.51.0
The sendto() call in walk_tx() was passing NULL as the buffer argument,
which can trigger a -Wnonnull warning with some compilers.
Although the size is 0 and no data is actually sent, passing a null
pointer is technically incorrect.
This commit changes NULL to an empty string literal ("") to satisfy the
non-null argument requirement and fix the compiler warning.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/net/psock_tpacket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/psock_tpacket.c b/tools/testing/selftests/net/psock_tpacket.c
index 221270cee3ea..0c24adbb292e 100644
--- a/tools/testing/selftests/net/psock_tpacket.c
+++ b/tools/testing/selftests/net/psock_tpacket.c
@@ -470,7 +470,7 @@ static void walk_tx(int sock, struct ring *ring)
bug_on(total_packets != 0);
- ret = sendto(sock, NULL, 0, 0, NULL, 0);
+ ret = sendto(sock, "", 0, 0, NULL, 0);
if (ret == -1) {
perror("sendto");
exit(1);
--
2.51.0.534.gc79095c0ca-goog
The TODO about using the number of vCPUs instead of vcpu.id + 1
was already addressed by commit 376bc1b458c9 ("KVM: selftests: Don't
assume vcpu->id is '0' in xAPIC state test"). The comment is now
stale and can be removed.
Signed-off-by: Sukrut Heroorkar <hsukrut3(a)gmail.com>
---
tools/testing/selftests/kvm/x86/xapic_state_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/xapic_state_test.c b/tools/testing/selftests/kvm/x86/xapic_state_test.c
index fdebff1165c7..3b4814c55722 100644
--- a/tools/testing/selftests/kvm/x86/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86/xapic_state_test.c
@@ -120,8 +120,8 @@ static void test_icr(struct xapic_vcpu *x)
__test_icr(x, icr | i);
/*
- * Send all flavors of IPIs to non-existent vCPUs. TODO: use number of
- * vCPUs, not vcpu.id + 1. Arbitrarily use vector 0xff.
+ * Send all flavors of IPIs to non-existent vCPUs. Arbitrarily use
+ * vector 0xff.
*/
icr = APIC_INT_ASSERT | 0xff;
for (i = 0; i < 0xff; i++) {
--
2.43.0
Initialize the `ret` variable to avoid using it uninitialized in the
following macro expansions.
It solves the following warning:
In function ‘iommufd_viommu_vdevice_alloc’,
inlined from ‘wrapper_iommufd_viommu_vdevice_alloc’ at
iommufd.c:2889:1:
../kselftest_harness.h:760:12: warning: ‘ret’ may be used uninitialized
[-Wmaybe-uninitialized]
760 | if (!(__exp _t __seen)) { \
| ^
../kselftest_harness.h:513:9: note: in expansion of macro ‘__EXPECT’
513 | __EXPECT(expected, #expected, seen, #seen, ==, 1)
| ^~~~~~~~
iommufd_utils.h:1057:9: note: in expansion of macro ‘ASSERT_EQ’
1057 | ASSERT_EQ(0, _test_cmd_trigger_vevents(self->fd, dev_id,
nvevents))
| ^~~~~~~~~
iommufd.c:2924:17: note: in expansion of macro
‘test_cmd_trigger_vevents’
2924 | test_cmd_trigger_vevents(dev_id, 3);
| ^~~~~~~~~~~~~~~~~~~~~~~~
The issue can be reproduced by building the tests with the command:
make -C tools/testing/selftests TARGETS=iommu
Signed-off-by: Alessandro Zanni <alessandro.zanni87(a)gmail.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 3c3e08b8c90e..4ae0fcc4f871 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -1042,7 +1042,7 @@ static int _test_cmd_trigger_vevents(int fd, __u32 dev_id, __u32 nvevents)
.dev_id = dev_id,
},
};
- int ret;
+ int ret = 0;
while (nvevents--) {
ret = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_TRIGGER_VEVENT),
--
2.43.0
This is series 2a/5 of the migration to `core::ffi::CStr`[0].
20250704-core-cstr-prepare-v1-0-a91524037783(a)gmail.com.
This series depends on the prior series[0] and is intended to go through
the rust tree to reduce the number of release cycles required to
complete the work.
Subsystem maintainers: I would appreciate your `Acked-by`s so that this
can be taken through Miguel's tree (where the other series must go).
[0] https://lore.kernel.org/all/20250704-core-cstr-prepare-v1-0-a91524037783@gm…
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v3:
- Add a patch to address new code in device.rs.
- Drop incorrectly applied Acked-by tags from Danilo.
- Link to v2: https://lore.kernel.org/r/20250719-core-cstr-fanout-1-v2-0-1ab5ba189c6e@gma…
Changes in v2:
- Rebase on rust-next.
- Drop pin-init patch, which is no longer needed.
- Link to v1: https://lore.kernel.org/r/20250709-core-cstr-fanout-1-v1-0-64308e7203fc@gma…
---
Tamir Duberstein (9):
gpu: nova-core: use `kernel::{fmt,prelude::fmt!}`
rust: alloc: use `kernel::{fmt,prelude::fmt!}`
rust: block: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
rust: file: use `kernel::{fmt,prelude::fmt!}`
rust: kunit: use `kernel::{fmt,prelude::fmt!}`
rust: seq_file: use `kernel::{fmt,prelude::fmt!}`
rust: sync: use `kernel::{fmt,prelude::fmt!}`
rust: device: use `kernel::{fmt,prelude::fmt!}`
drivers/block/rnull.rs | 2 +-
drivers/gpu/nova-core/gpu.rs | 3 +--
drivers/gpu/nova-core/regs/macros.rs | 6 +++---
rust/kernel/alloc/kbox.rs | 2 +-
rust/kernel/alloc/kvec.rs | 2 +-
rust/kernel/alloc/kvec/errors.rs | 2 +-
rust/kernel/block/mq.rs | 2 +-
rust/kernel/block/mq/gen_disk.rs | 2 +-
rust/kernel/block/mq/raw_writer.rs | 3 +--
rust/kernel/device.rs | 6 +++---
rust/kernel/device/property.rs | 23 ++++++++++++-----------
rust/kernel/fs/file.rs | 5 +++--
rust/kernel/kunit.rs | 8 ++++----
rust/kernel/seq_file.rs | 6 +++---
rust/kernel/sync/arc.rs | 2 +-
scripts/rustdoc_test_gen.rs | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250709-core-cstr-fanout-1-f20611832272
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
The rtnetlink FOU selftest prints an incorrect string:
"FAIL: fou"s. Change it to the intended "FAIL: fou" by
removing a stray character in the end_test string of the test.
Signed-off-by: Alok Tiwari <alok.a.tiwari(a)oracle.com>
---
tools/testing/selftests/net/rtnetlink.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index d6c00efeb664..24bba74c77ee 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -519,7 +519,7 @@ kci_test_encap_fou()
run_cmd_fail ip -netns "$testns" fou del port 9999
run_cmd ip -netns "$testns" fou del port 7777
if [ $ret -ne 0 ]; then
- end_test "FAIL: fou"s
+ end_test "FAIL: fou"
return 1
fi
--
2.50.1
The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using a command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that XFRM ESP offload is
disabled. This occurs because bond_newlink() changes the bond link
parameters first and registers the bond device afterwards. So the XFRM
feature update in bond_option_mode_set() is not called, as the bond device
is not yet registered, and the offload feature is never set.
To resolve this issue, we can modify the code order in bond_newlink() to
ensure that the bond device is registered first before changing the bond
link parameters. This change will allow the XFRM ESP offload feature to be
correctly enabled.
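The ordering problem described above can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the bonding driver: the `BondModel` class and both `newlink_*` helpers are hypothetical, and the only behaviour modelled is the one named in the text, namely that the feature update on a mode change is skipped while the device is unregistered.

```python
class BondModel:
    """Toy stand-in for a bond device (hypothetical, for illustration only)."""

    def __init__(self):
        self.registered = False
        self.mode = "balance-rr"
        self.esp_offload = False

    def register(self):
        self.registered = True

    def set_mode(self, mode):
        self.mode = mode
        # Modelled after the description: the feature update is skipped
        # while the device is not yet registered.
        if self.registered:
            self.esp_offload = (mode == "active-backup")


def newlink_old_order(mode):
    """Change link parameters first, register afterwards (old order)."""
    bond = BondModel()
    bond.set_mode(mode)
    bond.register()
    return bond


def newlink_new_order(mode):
    """Register first, then change link parameters (order after the fix)."""
    bond = BondModel()
    bond.register()
    bond.set_mode(mode)
    return bond
```

In this model, only the second ordering leaves `esp_offload` set for the active-backup mode, which matches the symptom reported by `ethtool -k`.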
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: rebase to latest net, no code update
---
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/bonding/bond_netlink.c | 16 +++++++++-------
include/net/bonding.h | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 57be04f6cb11..f4f0feddd9fa 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4411,7 +4411,7 @@ void bond_work_init_all(struct bonding *bond)
INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
}
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
{
cancel_delayed_work_sync(&bond->mii_work);
cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 57fff2421f1b..7a9d73ec8e91 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -579,20 +579,22 @@ static int bond_newlink(struct net_device *bond_dev,
struct rtnl_newlink_params *params,
struct netlink_ext_ack *extack)
{
+ struct bonding *bond = netdev_priv(bond_dev);
struct nlattr **data = params->data;
struct nlattr **tb = params->tb;
int err;
- err = bond_changelink(bond_dev, tb, data, extack);
- if (err < 0)
+ err = register_netdevice(bond_dev);
+ if (err)
return err;
- err = register_netdevice(bond_dev);
- if (!err) {
- struct bonding *bond = netdev_priv(bond_dev);
+ netif_carrier_off(bond_dev);
+ bond_work_init_all(bond);
- netif_carrier_off(bond_dev);
- bond_work_init_all(bond);
+ err = bond_changelink(bond_dev, tb, data, extack);
+ if (err) {
+ bond_work_cancel_all(bond);
+ unregister_netdevice(bond_dev);
}
return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index e06f0d63b2c1..bd56ad976cfb 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -711,6 +711,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
#ifdef CONFIG_PROC_FS
void bond_create_proc_entry(struct bonding *bond);
--
2.50.1
Soft offlining a HugeTLB page reduces the HugeTLB page pool.
Commit 56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
introduced the following sysctl interface to control soft offline:
/proc/sys/vm/enable_soft_offline
The interface does not distinguish between page types:
0 - Soft offline is disabled
1 - Soft offline is enabled
Convert enable_soft_offline to a bitmask and support disabling soft
offline for HugeTLB pages:
Bits:
0 - Enable soft offline
1 - Disable soft offline for HugeTLB pages
Supported values:
0 - Soft offline is disabled
1 - Soft offline is enabled
3 - Soft offline is enabled (disabled for HugeTLB pages)
Existing behavior is preserved.
Update documentation and HugeTLB soft offline self tests.
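The bitmask semantics above can be sketched as a small decision function. This is an illustrative Python model, not the kernel code: the function name `soft_offline_errno` and its `is_hugetlb` parameter are hypothetical, while the two bit names mirror the macros this patch adds to mm/memory-failure.c.

```python
import errno

SOFT_OFFLINE_ENABLED = 1 << 0       # bit 0: enable soft offline
SOFT_OFFLINE_SKIP_HUGETLB = 1 << 1  # bit 1: disable soft offline for HugeTLB


def soft_offline_errno(sysctl_value, is_hugetlb):
    """Return 0 if the request would proceed, else the errno it fails with."""
    if not sysctl_value & SOFT_OFFLINE_ENABLED:
        return errno.EOPNOTSUPP
    if sysctl_value & SOFT_OFFLINE_SKIP_HUGETLB and is_hugetlb:
        return errno.EOPNOTSUPP
    return 0
```

Under this model, the value 2 (not listed among the supported values) behaves like 0, because bit 0 is clear.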
Reported-by: Shawn Fan <shawn.fan(a)intel.com>
Suggested-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Kyle Meyer <kyle.meyer(a)hpe.com>
---
Tony's patch:
* https://lore.kernel.org/all/20250904155720.22149-1-tony.luck@intel.com
v1:
* https://lore.kernel.org/all/aMGkAI3zKlVsO0S2@hpe.com
v1 -> v2:
* Make the interface extensible, as suggested by David.
* Preserve existing behavior, as suggested by Jiaqi and David.
Why clear errno in self tests?
madvise() does not reset errno on success, so the errno set by the failing
madvise() call during test_soft_offline_common(3) is still visible in
test_soft_offline_common(1), causing it to fail:
# Test soft-offline when enabled_soft_offline=1
# Hugepagesize is 1048576kB
# enable_soft_offline => 1
# Before MADV_SOFT_OFFLINE nr_hugepages=7
# Allocated 0x80000000 bytes of hugetlb pages
# MADV_SOFT_OFFLINE 0x7fd600000000 ret=0, errno=95
# MADV_SOFT_OFFLINE should ret 0
# After MADV_SOFT_OFFLINE nr_hugepages=6
not ok 2 Test soft-offline when enabled_soft_offline=1
---
.../ABI/testing/sysfs-memory-page-offline | 3 ++
Documentation/admin-guide/sysctl/vm.rst | 28 ++++++++++++++++---
mm/memory-failure.c | 17 +++++++++--
.../selftests/mm/hugetlb-soft-offline.c | 19 ++++++++++---
4 files changed, 56 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-memory-page-offline b/Documentation/ABI/testing/sysfs-memory-page-offline
index 00f4e35f916f..d3f05ed6605e 100644
--- a/Documentation/ABI/testing/sysfs-memory-page-offline
+++ b/Documentation/ABI/testing/sysfs-memory-page-offline
@@ -20,6 +20,9 @@ Description:
number, or a error when the offlining failed. Reading
the file is not allowed.
+ Soft-offline can be controlled via sysctl, see:
+ Documentation/admin-guide/sysctl/vm.rst
+
What: /sys/devices/system/memory/hard_offline_page
Date: Sep 2009
KernelVersion: 2.6.33
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4d71211fdad8..ace73480eb9d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -309,19 +309,39 @@ physical memory) vs performance / capacity implications in transparent and
HugeTLB cases.
For all architectures, enable_soft_offline controls whether to soft offline
-memory pages. When set to 1, kernel attempts to soft offline the pages
-whenever it thinks needed. When set to 0, kernel returns EOPNOTSUPP to
-the request to soft offline the pages. Its default value is 1.
+memory pages.
+
+enable_soft_offline is a bitmask:
+
+Bits::
+
+ 0 - Enable soft offline
+ 1 - Disable soft offline for HugeTLB pages
+
+Supported values::
+
+ 0 - Soft offline is disabled
+ 1 - Soft offline is enabled
+ 3 - Soft offline is enabled (disabled for HugeTLB pages)
+
+The default value is 1.
+
+If soft offline is disabled for the requested page type, EOPNOTSUPP is returned.
It is worth mentioning that after setting enable_soft_offline to 0, the
following requests to soft offline pages will not be performed:
+- Request to soft offline from sysfs (soft_offline_page).
+
- Request to soft offline pages from RAS Correctable Errors Collector.
-- On ARM, the request to soft offline pages from GHES driver.
+- On ARM and X86, the request to soft offline pages from GHES driver.
- On PARISC, the request to soft offline pages from Page Deallocation Table.
+Note:
+ Soft offlining a HugeTLB page reduces the HugeTLB page pool.
+
extfrag_threshold
=================
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index fc30ca4804bf..0ad9ae11d9e8 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -64,11 +64,14 @@
#include "internal.h"
#include "ras/ras_event.h"
+#define SOFT_OFFLINE_ENABLED BIT(0)
+#define SOFT_OFFLINE_SKIP_HUGETLB BIT(1)
+
static int sysctl_memory_failure_early_kill __read_mostly;
static int sysctl_memory_failure_recovery __read_mostly = 1;
-static int sysctl_enable_soft_offline __read_mostly = 1;
+static int sysctl_enable_soft_offline __read_mostly = SOFT_OFFLINE_ENABLED;
atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
@@ -150,7 +153,7 @@ static const struct ctl_table memory_failure_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
+ .extra2 = SYSCTL_THREE,
}
};
@@ -2799,12 +2802,20 @@ int soft_offline_page(unsigned long pfn, int flags)
return -EIO;
}
- if (!sysctl_enable_soft_offline) {
+ if (!(sysctl_enable_soft_offline & SOFT_OFFLINE_ENABLED)) {
pr_info_once("disabled by /proc/sys/vm/enable_soft_offline\n");
put_ref_page(pfn, flags);
return -EOPNOTSUPP;
}
+ if (sysctl_enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB) {
+ if (folio_test_hugetlb(pfn_folio(pfn))) {
+ pr_info_once("disabled for HugeTLB pages by /proc/sys/vm/enable_soft_offline\n");
+ put_ref_page(pfn, flags);
+ return -EOPNOTSUPP;
+ }
+ }
+
mutex_lock(&mf_mutex);
if (PageHWPoison(page)) {
diff --git a/tools/testing/selftests/mm/hugetlb-soft-offline.c b/tools/testing/selftests/mm/hugetlb-soft-offline.c
index f086f0e04756..b87c8778cadf 100644
--- a/tools/testing/selftests/mm/hugetlb-soft-offline.c
+++ b/tools/testing/selftests/mm/hugetlb-soft-offline.c
@@ -5,6 +5,8 @@
* offlining failed with EOPNOTSUPP.
* - if enable_soft_offline = 1, a hugepage should be dissolved and
* nr_hugepages/free_hugepages should be reduced by 1.
+ * - if enable_soft_offline = 3, hugepages should stay intact and soft
+ * offlining failed with EOPNOTSUPP.
*
* Before running, make sure more than 2 hugepages of default_hugepagesz
* are allocated. For example, if /proc/meminfo/Hugepagesize is 2048kB:
@@ -32,6 +34,9 @@
#define EPREFIX " !!! "
+#define SOFT_OFFLINE_ENABLED (1 << 0)
+#define SOFT_OFFLINE_SKIP_HUGETLB (1 << 1)
+
static int do_soft_offline(int fd, size_t len, int expect_errno)
{
char *filemap = NULL;
@@ -56,6 +61,7 @@ static int do_soft_offline(int fd, size_t len, int expect_errno)
ksft_print_msg("Allocated %#lx bytes of hugetlb pages\n", len);
hwp_addr = filemap + len / 2;
+ errno = 0;
ret = madvise(hwp_addr, pagesize, MADV_SOFT_OFFLINE);
ksft_print_msg("MADV_SOFT_OFFLINE %p ret=%d, errno=%d\n",
hwp_addr, ret, errno);
@@ -83,7 +89,7 @@ static int set_enable_soft_offline(int value)
char cmd[256] = {0};
FILE *cmdfile = NULL;
- if (value != 0 && value != 1)
+ if (value < 0 || value > 3)
return -EINVAL;
sprintf(cmd, "echo %d > /proc/sys/vm/enable_soft_offline", value);
@@ -155,13 +161,17 @@ static int create_hugetlbfs_file(struct statfs *file_stat)
static void test_soft_offline_common(int enable_soft_offline)
{
int fd;
- int expect_errno = enable_soft_offline ? 0 : EOPNOTSUPP;
+ int expect_errno = 0;
struct statfs file_stat;
unsigned long hugepagesize_kb = 0;
unsigned long nr_hugepages_before = 0;
unsigned long nr_hugepages_after = 0;
int ret;
+ if (!(enable_soft_offline & SOFT_OFFLINE_ENABLED) ||
+ (enable_soft_offline & SOFT_OFFLINE_SKIP_HUGETLB))
+ expect_errno = EOPNOTSUPP;
+
ksft_print_msg("Test soft-offline when enabled_soft_offline=%d\n",
enable_soft_offline);
@@ -198,7 +208,7 @@ static void test_soft_offline_common(int enable_soft_offline)
// No need for the hugetlbfs file from now on.
close(fd);
- if (enable_soft_offline) {
+ if (expect_errno == 0) {
if (nr_hugepages_before != nr_hugepages_after + 1) {
ksft_test_result_fail("MADV_SOFT_OFFLINE should reduced 1 hugepage\n");
return;
@@ -219,8 +229,9 @@ static void test_soft_offline_common(int enable_soft_offline)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(2);
+ ksft_set_plan(3);
+ test_soft_offline_common(3);
test_soft_offline_common(1);
test_soft_offline_common(0);
--
2.51.0
With the current Makefile, if the user tries something like
make TARGETS="bpf mm"
only mm is run and bpf is skipped, which is not intentional.
`bpf` and `sched_ext` are always filtered out even when TARGETS is set
explicitly due to how SKIP_TARGETS is implemented.
This default skip exists because these tests require newer LLVM/Clang
versions that may not be available on all systems.
Fix the SKIP_TARGETS logic so that bpf and sched_ext remain
skipped when TARGETS is taken from the Makefile but are included when
the user specifies them explicitly.
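The intended selection logic can be modelled outside of make. The following is a hedged Python sketch, not the Makefile itself: `effective_targets` and its parameters are hypothetical names, `targets_origin` stands for the value of make's `$(origin TARGETS)`, and `skip_targets=None` stands for SKIP_TARGETS being undefined (the case in which `?=` assigns the default).

```python
DEFAULT_SKIP = ["bpf", "sched_ext"]


def effective_targets(targets, targets_origin, skip_targets=None):
    """Model of the fixed SKIP_TARGETS handling.

    targets_origin: e.g. "file", "command line" or "environment".
    skip_targets:   None when the user did not define SKIP_TARGETS.
    """
    # The default skiplist only applies when TARGETS comes from the Makefile
    # and the user has not provided a skiplist.
    if skip_targets is None and targets_origin == "file":
        skip_targets = DEFAULT_SKIP
    skip = set(skip_targets or [])
    # Equivalent of $(filter-out $(SKIP_TARGETS), $(TARGETS)).
    return [t for t in targets if t not in skip]
```

With this model, `effective_targets(["bpf", "mm"], "command line")` keeps both targets, while the Makefile-provided list still loses bpf and sched_ext.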
Signed-off-by: I Viswanath <viswanathiyyappan(a)gmail.com>
---
make --silent summary=1 TARGETS="bpf size" kselftest
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
Auto-detecting system features:
... llvm: [ OFF ]
Makefile:127: tools/build/Makefile.feature: No such file or directory
make[4]: *** No rule to make target 'tools/build/Makefile.feature'. Stop.
make[3]: *** [Makefile:344: /home/user/kernel-dev/linux-next/tools/testing/selftests/bpf/tools/sbin/bpftool] Error 2
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/bpf'
TAP version 13
1..1
# selftests: size: get_size
ok 1 selftests: size: get_size
make --silent summary=1 kselftest (bpf is between arm64 and breakpoints in TARGETS)
make[3]: Nothing to be done for 'all'.
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/alsa'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/amd-pstate'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/arm64'
make[3]: Entering directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/home/user/kernel-dev/linux-next/tools/testing/selftests/breakpoints'
make[3]: Nothing to be done for 'all'.
tools/testing/selftests/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index babed7b1c2d1..c6cedb09c372 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -145,7 +145,10 @@ endif
# User can optionally provide a TARGETS skiplist. By default we skip
# targets using BPF since it has cutting edge build time dependencies
# which require more effort to install.
-SKIP_TARGETS ?= bpf sched_ext
+ifeq ($(origin TARGETS), file)
+ SKIP_TARGETS ?= bpf sched_ext
+endif
+
ifneq ($(SKIP_TARGETS),)
TMP := $(filter-out $(SKIP_TARGETS), $(TARGETS))
override TARGETS := $(TMP)
--
2.47.3
Now that the 'flags' attribute is used, it seems interesting to add one
flag for 'server-side', a boolean value.
Here are a few patches related to the 'server-side' attribute:
- Patch 1: only announce this attribute on the server side.
- Patch 2: announce the 'server-side' flag when this is the case.
- Patch 3: deprecate the 'server-side' attribute.
- Patch 4: use the 'server-side' flag in the selftests.
- Patches 5, 6: small cleanups made while working on the surrounding code.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (6):
mptcp: pm: netlink: only add server-side attr when true
mptcp: pm: netlink: announce server-side flag
mptcp: pm: netlink: deprecate server-side attribute
selftests: mptcp: pm: get server-side flag
mptcp: use _BITUL() instead of (1 << x)
mptcp: remove unused returned value of check_data_fin
Documentation/netlink/specs/mptcp_pm.yaml | 5 +++--
include/uapi/linux/mptcp.h | 11 ++++++-----
include/uapi/linux/mptcp_pm.h | 4 ++--
net/mptcp/pm_netlink.c | 9 +++++++--
net/mptcp/protocol.c | 5 +----
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 ++++++++-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 +-
7 files changed, 28 insertions(+), 17 deletions(-)
---
base-commit: 315f423be0d1ebe720d8fd4fa6bed68586b13d34
change-id: 20250916-net-next-mptcp-server-side-flag-0f002418946d
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This series backports 15 patches to update minmax.h in the 6.6.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes.
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in older kernels.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
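The difference between the two type checks can be pictured with a deliberately simplified model. This is a Python sketch under stated assumptions, not the macro logic from minmax.h: `CType`, `strict_ok` and `same_signedness_ok` are hypothetical names, and the real macros handle further cases that this model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CType:
    """Minimal description of a C integer type (hypothetical helper)."""
    name: str
    signed: bool


def strict_ok(a, b):
    """Older rule, simplified: both arguments must have the same type."""
    return a == b


def same_signedness_ok(a, b):
    """Relaxed rule, simplified: types may differ if signedness matches."""
    return a.signed == b.signed


U32 = CType("unsigned int", False)
U64 = CType("unsigned long", False)
S32 = CType("int", True)
```

In this model a min() over `U32` and `U64` arguments is rejected by the strict rule but accepted by the relaxed one, which is the situation backported patches run into on older branches.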
David Laight (7):
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: simplify and clarify min_t()/max_t() implementation
minmax: add a few more MIN_T/MAX_T users
minmax: make generic MIN() and MAX() macros available everywhere
minmax: simplify min()/max()/clamp() implementation
minmax: don't use max() in situations that want a C constant
expression
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c | 2 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 6 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/can/usb/etas_es58x/es58x_devlink.c | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
fs/btrfs/tree-checker.c | 2 +-
include/linux/compiler.h | 9 +
include/linux/minmax.h | 228 +++++++++++-------
include/linux/pageblock-flags.h | 2 +-
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/vsprintf.c | 2 +-
mm/zsmalloc.c | 2 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
tools/testing/selftests/mm/mremap_test.c | 2 +
tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +
34 files changed, 202 insertions(+), 146 deletions(-)
--
2.47.3
Hi,
While staring at epoll, I noticed ep_events_available() looks wrong. I
wrote a small program to confirm, and yes it is definitely wrong.
This series adds a reproducer to kselftest and fixes the bug.
Nam Cao (2):
selftests/eventpoll: Add test for multiple waiters
eventpoll: Fix epoll_wait() report false negative
fs/eventpoll.c | 16 +------
.../filesystems/epoll/epoll_wakeup_test.c | 45 +++++++++++++++++++
2 files changed, 47 insertions(+), 14 deletions(-)
--
2.39.5
Hi everyone,
This patchset introduces a new BPF program type that allows overriding
a tracepoint probe function registered via register_trace_*.
Motivation
----------
Tracepoint probe functions registered via register_trace_* in the kernel
cannot be dynamically modified: changing a probe function requires recompiling
the kernel and rebooting. Nor can BPF programs change an existing
probe function.
Overriding tracepoint probes provides a way to apply patches (such as
security fixes) to the kernel quickly, through predefined static tracepoints,
without waiting for upstream integration.
This patchset demonstrates how to override probe functions with a BPF program.
Overview
--------
This patchset adds BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE program type.
When this type of BPF program attaches, it overrides the target tracepoint
probe function.
It also adds a new struct type, "tracepoint_func_snapshot", which extends
the tracepoint structure. It records the original probe function registered
by the kernel when the BPF program is attached, so that the original can be
restored after detachment.
Critical steps
--------------
1. Attach: Attach programs via the raw_tracepoint_open syscall.
2. Override:
(a) Locate the target probe by `probe_name`.
(b) Override target probe with the BPF program.
(c) Save the BPF program and target probe function into "tracepoint_func_snapshot".
3. Restore: When the BPF program is detached, automatically restore
the original probe function from the previously saved snapshot.
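The attach/override/restore steps above can be sketched as a toy model. This is illustrative Python, not the kernel implementation: the `Tracepoint` class and its methods are hypothetical, and the `snapshots` dictionary only loosely plays the role the text assigns to "tracepoint_func_snapshot".

```python
class Tracepoint:
    """Toy tracepoint holding named probe functions (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.probes = {}      # probe_name -> currently active callable
        self.snapshots = {}   # probe_name -> original callable

    def register(self, probe_name, func):
        self.probes[probe_name] = func

    def attach_override(self, probe_name, bpf_prog):
        # (a) locate the target probe by name
        original = self.probes[probe_name]
        # (c) save the original so it can be restored later
        self.snapshots[probe_name] = original
        # (b) override the target probe with the BPF program
        self.probes[probe_name] = bpf_prog

    def detach_override(self, probe_name):
        # Restore the original probe from the saved snapshot.
        self.probes[probe_name] = self.snapshots.pop(probe_name)

    def fire(self, *args):
        return [func(*args) for func in self.probes.values()]
```

The sketch keeps one snapshot per probe name; how the actual patchset handles repeated or concurrent attachments is not described in the cover letter and is not modelled here.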
Future work
-----------
This patchset is intended as a first step toward supporting BPF programs
that can override tracepoint probes. The current implementation may not yet
cover all use cases or handle every corner case.
I welcome feedback and suggestions from the community, and will continue to
refine and improve the design based on comments and real-world requirements.
Thanks!
Fuyu
Fuyu Zhao (3):
bpf: Introduce BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
libbpf: Add support for BPF_PROG_TYPE_RAW_TRACEPOINT_OVERRIDE
selftests/bpf: Add selftest for "raw_tp.o"
include/linux/bpf_types.h | 2 +
include/linux/trace_events.h | 9 +
include/linux/tracepoint-defs.h | 6 +
include/linux/tracepoint.h | 3 +
include/uapi/linux/bpf.h | 2 +
kernel/bpf/syscall.c | 35 +++-
kernel/trace/bpf_trace.c | 31 +++
kernel/tracepoint.c | 190 +++++++++++++++++-
tools/include/uapi/linux/bpf.h | 2 +
tools/lib/bpf/bpf.c | 1 +
tools/lib/bpf/bpf.h | 3 +-
tools/lib/bpf/libbpf.c | 27 ++-
tools/lib/bpf/libbpf.h | 3 +-
.../bpf/prog_tests/raw_tp_override_test_run.c | 23 +++
.../bpf/progs/test_raw_tp_override_test_run.c | 20 ++
.../selftests/bpf/test_kmods/bpf_testmod.c | 7 +
16 files changed, 352 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_override_test_run.c
create mode 100644 tools/testing/selftests/bpf/progs/test_raw_tp_override_test_run.c
--
2.43.0
The test_kexec_jump binary is generated during 'make kselftest' but was
not ignored, leading to it appearing as untracked in `git status`.
Create a .gitignore file for selftests/kexec and add this
generated file to it.
Signed-off-by: Madhur Kumar <madhurkumar004(a)gmail.com>
---
tools/testing/selftests/kexec/.gitignore | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..6cbe9a1049f3
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1 @@
+test_kexec_jump
--
2.51.0
The tmpshmcstat file is generated during a kselftest run but was not
ignored, leading to it appearing as untracked in git status.
Add it to .gitignore to silence the warning.
Signed-off-by: Madhur Kumar <madhurkumar004(a)gmail.com>
---
tools/testing/selftests/cachestat/.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/cachestat/.gitignore b/tools/testing/selftests/cachestat/.gitignore
index d6c30b43a4bb..abbb13b6e96b 100644
--- a/tools/testing/selftests/cachestat/.gitignore
+++ b/tools/testing/selftests/cachestat/.gitignore
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
test_cachestat
+tmpshmcstat
--
2.51.0
Some ublk selftests have strange behavior when fio is not installed.
While most tests behave correctly (run if they don't need fio, or skip
if they need fio), the following tests have different behavior:
- test_null_01, test_null_02, test_generic_01, test_generic_02, and
test_generic_12 try to run fio without checking if it exists first,
and fail on any failure of the fio command (including "fio command
not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but
doesn't fail on fio command failure. This test passes, but that pass
is misleading as the test doesn't do anything useful without fio
installed. So this test passes when it should skip.
Fix these issues by adding _have_program fio checks to the top of all of
these tests.
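The skip-before-run pattern can be sketched independently of the shell helpers. This is a Python illustration, not the ublk test code: `precheck` is a hypothetical function standing in for the chained `_have_program` checks, and the value 4 for the skip code is an assumption based on the usual kselftest skip exit status.

```python
import shutil

UBLK_SKIP_CODE = 4  # assumption: conventional kselftest "skip" exit status


def precheck(required, which=shutil.which):
    """Return None if every required program is found, else the skip code.

    `which` is injectable so the check can be exercised without touching
    the real PATH.
    """
    for prog in required:
        if which(prog) is None:
            return UBLK_SKIP_CODE
    return None
```

A test that needs both bpftrace and fio would call `precheck(["bpftrace", "fio"])` before doing any setup and exit with the returned code when it is not None, so a missing fio yields a skip rather than a failure or a misleading pass.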
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Also fix test_generic_01, test_generic_02, test_generic_12, which fail
on systems where bpftrace is installed but fio is not (Mohit Gupta)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_fio-v1-1-8d522539eed7@purestorage.c…
---
tools/testing/selftests/ublk/test_generic_01.sh | 4 ++++
tools/testing/selftests/ublk/test_generic_02.sh | 4 ++++
tools/testing/selftests/ublk/test_generic_12.sh | 4 ++++
tools/testing/selftests/ublk/test_null_01.sh | 4 ++++
tools/testing/selftests/ublk/test_null_02.sh | 4 ++++
tools/testing/selftests/ublk/test_stress_05.sh | 4 ++++
6 files changed, 24 insertions(+)
diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
index 9227a208ba53128e4a202298316ff77e05607595..21a31cd5491aa79ffe3ad458a0055e832c619325 100755
--- a/tools/testing/selftests/ublk/test_generic_01.sh
+++ b/tools/testing/selftests/ublk/test_generic_01.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "sequential io order"
dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_generic_02.sh b/tools/testing/selftests/ublk/test_generic_02.sh
index 3e80121e3bf5e191aa9ffe1f85e1693be4fdc2d2..12920768b1a080d37fcdff93de7a0439101de09e 100755
--- a/tools/testing/selftests/ublk/test_generic_02.sh
+++ b/tools/testing/selftests/ublk/test_generic_02.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "sequential io order for MQ"
dev_id=$(_add_ublk_dev -t null -q 2)
diff --git a/tools/testing/selftests/ublk/test_generic_12.sh b/tools/testing/selftests/ublk/test_generic_12.sh
index 7abbb00d251df9403857b1c6f53aec8bf8eab176..b4046201b4d99ef5355b845ebea2c9a3924276a5 100755
--- a/tools/testing/selftests/ublk/test_generic_12.sh
+++ b/tools/testing/selftests/ublk/test_generic_12.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
exit "$UBLK_SKIP_CODE"
fi
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "do imbalanced load, it should be balanced over I/O threads"
NTHREADS=6
diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh
index a34203f726685787da80b0e32da95e0fcb90d0b1..c2cb8f7a09fe37a9956d067fd56b28dc7ca6bd68 100755
--- a/tools/testing/selftests/ublk/test_null_01.sh
+++ b/tools/testing/selftests/ublk/test_null_01.sh
@@ -6,6 +6,10 @@
TID="null_01"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test"
dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh
index 5633ca8766554b22be252c7cb2d13de1bf923b90..8accd35beb55c149f74b23f0fb562e12cbf3e362 100755
--- a/tools/testing/selftests/ublk/test_null_02.sh
+++ b/tools/testing/selftests/ublk/test_null_02.sh
@@ -6,6 +6,10 @@
TID="null_02"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
_prep_test "null" "basic IO test with zero copy"
dev_id=$(_add_ublk_dev -t null -z)
diff --git a/tools/testing/selftests/ublk/test_stress_05.sh b/tools/testing/selftests/ublk/test_stress_05.sh
index 566cfd90d192ce8c1f98ca2539792d54a787b3d1..274295061042e5db3f4f0846ae63ea9b787fb2ee 100755
--- a/tools/testing/selftests/ublk/test_stress_05.sh
+++ b/tools/testing/selftests/ublk/test_stress_05.sh
@@ -5,6 +5,10 @@
TID="stress_05"
ERR_CODE=0
+if ! _have_program fio; then
+ exit "$UBLK_SKIP_CODE"
+fi
+
run_io_and_remove()
{
local size=$1
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_fio-1910998b00b3
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
From: Jeff Xu <jeffxu(a)google.com>
Since Linux introduced the memfd feature, memfd have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure-by-default system such as ChromeOS (where all
executables should come from the rootfs, which is protected by Verified
boot), this executable nature of memfd opens a door for NoExec bypass
and enables a "confused deputy attack". E.g., in VRP bug [1], the cros_vm
process created a memfd to share content with an external process,
but the memfd was overwritten and used to execute arbitrary code
and escalate to root. [2] lists more VRP bugs of this kind.
On the other hand, executable memfds have legitimate uses: runc uses
memfd's seal and executable features to copy the contents of a binary and
then execute it. For such systems, we need a way to differentiate runc's
use of executable memfds from an attacker's [3].
To address the above, this set of patches adds the following:
1> Let memfd_create() set X bit at creation time.
2> Let memfd be sealed against modifying the X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
X bit. For example, if a container has vm.memfd_noexec=2, then
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This makes it possible for a new
LSM to reject or allow executable memfds based on its security policy.
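As a rough userspace illustration of items 1> and 2>, the sketch below
creates a memfd with MFD_NOEXEC_SEAL and checks that the file has no
execute bits and carries F_SEAL_EXEC. The helper name
memfd_noexec_seal_check() is made up for this example, and the fallback
flag values are assumed to match the uapi headers as later merged in
mainline; they are only used when the installed headers lack them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match mainline uapi). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: create a non-executable memfd and verify it.
 * Returns 1 if the file has no X bits and F_SEAL_EXEC is set,
 * 0 if either property is missing, and -errno if a syscall failed
 * (for example -EINVAL on kernels that do not know the flag).
 */
int memfd_noexec_seal_check(const char *name)
{
	struct stat st;
	int fd, seals, err;

	assert(name != NULL);

	fd = memfd_create(name, MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0)
		return -errno;

	if (fstat(fd, &st) < 0)
		goto fail;

	seals = fcntl(fd, F_GET_SEALS);
	if (seals < 0)
		goto fail;

	close(fd);
	return (!(st.st_mode & 0111) && (seals & F_SEAL_EXEC)) ? 1 : 0;

fail:
	err = errno;	/* save errno before close() can clobber it */
	close(fd);
	return -err;
}
```

A caller would treat a negative return as "flag not supported here" and
fall back to a plain memfd_create() if policy allows it.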
Change history:
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).
v6:https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h
v5:https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c
v3:https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().
v2:https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of
memfd_create to be non-executable.
v1:https://lwn.net/Articles/890096/
[1] https://crbug.com/1305411
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/
Daniel Verkamp (2):
mm/memfd: add F_SEAL_EXEC
selftests/memfd: add tests for F_SEAL_EXEC
Jeff Xu (4):
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
mm/memfd: security hook for memfd_create
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/pid_namespace.h | 19 ++
include/linux/security.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 4 +
kernel/pid_namespace.c | 5 +
kernel/pid_sysctl.h | 59 ++++
mm/memfd.c | 61 +++-
mm/shmem.c | 6 +
security/security.c | 5 +
tools/testing/selftests/memfd/fuse_test.c | 1 +
tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
13 files changed, 510 insertions(+), 3 deletions(-)
create mode 100644 kernel/pid_sysctl.h
base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
This patch adds support for the Zalasr ISA extension, which supplies the
real load acquire/store release instructions.
The specification can be found here:
https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc
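For readers less familiar with acquire/release ordering, here is a small
userspace analogue using C11 atomics. It is not kernel code and makes no
claim about how a compiler lowers these operations on RISC-V; struct
mailbox and both helpers are invented for illustration. The pattern is
the one smp_store_release()/smp_load_acquire() are used for: publish data
with a release store, observe it with an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mailbox {
	int payload;		/* plain data, published via the flag below */
	atomic_bool ready;	/* publication flag */
};

/* Producer: write the payload first, then publish with a store-release. */
void mailbox_publish(struct mailbox *mb, int value)
{
	assert(mb != NULL);
	mb->payload = value;
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/*
 * Consumer: a load-acquire that sees ready == true is also guaranteed
 * to see the payload written before the matching store-release.
 */
bool mailbox_try_consume(struct mailbox *mb, int *out)
{
	assert(mb != NULL && out != NULL);
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return false;
	*out = mb->payload;
	return true;
}
```

In a real program the two helpers would run on different threads; the
ordering guarantee is what makes the plain payload access safe.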
This patch series has been tested with ltp on Qemu with Brensan's zalasr
support patch[1].
Some false-positive spacing errors are reported during patch checking, so
I CCed the maintainers of checkpatch.pl as well.
[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCO…
v3:
- Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
so as to ensure FENCE.TSO ordering between operations which precede the
UNLOCK+LOCK sequence and operations which follow the sequence. Thanks
to Andrea.
- Support hwprobe of Zalasr.
- Allow Zalasr extensions for Guest/VM.
v2:
- Adjust the order of Zalasr and Zalrsc in dt-bindings. Thanks to
Conor.
Xu Lu (8):
riscv: add ISA extension parsing for Zalasr
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: hwprobe: Export Zalasr extension
riscv: Introduce Zalasr instructions
riscv: Use Zalasr for smp_load_acquire/smp_store_release
riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg
operations
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add Zalasr extensions to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 5 +-
.../devicetree/bindings/riscv/extensions.yaml | 5 +
arch/riscv/include/asm/atomic.h | 6 -
arch/riscv/include/asm/barrier.h | 91 ++++++++++--
arch/riscv/include/asm/cmpxchg.h | 136 ++++++++----------
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/insn-def.h | 79 ++++++++++
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/sys_hwprobe.c | 1 +
arch/riscv/kvm/vcpu_onereg.c | 2 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
13 files changed, 242 insertions(+), 91 deletions(-)
--
2.20.1
FEAT_LSFE is optional from v9.5; it adds new instructions for atomic
memory operations with floating point values. We have no immediate use
for it in the kernel, so provide a hwcap so userspace can discover it and
allow the ID register field to be exposed to KVM guests.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto arm64/for-next/cpufeature, note that both patches have
build dependencies on this.
- Drop unneeded cc clobber in hwcap.
- Use STRFADD as the instruction probed in hwcap.
- Link to v3: https://lore.kernel.org/r/20250818-arm64-lsfe-v3-0-af6f4d66eb39@kernel.org
Changes in v3:
- Rebase onto v6.17-rc1.
- Link to v2: https://lore.kernel.org/r/20250703-arm64-lsfe-v2-0-eced80999cb4@kernel.org
Changes in v2:
- Fix result of vi dropping in hwcap test.
- Link to v1: https://lore.kernel.org/r/20250627-arm64-lsfe-v1-0-68351c4bf741@kernel.org
---
Mark Brown (2):
KVM: arm64: Expose FEAT_LSFE to guests
kselftest/arm64: Add lsfe to the hwcaps test
arch/arm64/kvm/sys_regs.c | 4 +++-
tools/testing/selftests/arm64/abi/hwcap.c | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+), 1 deletion(-)
---
base-commit: 220928e52cb03d223b3acad3888baf0687486d21
change-id: 20250625-arm64-lsfe-0810cf98adc2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This patch simplifies kublk's implementation of the feature list
command, fixes a bug where a feature was missing, and adds a test to
ensure that similar bugs do not happen in the future.
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Add log lines to new test in failure case, to tell the user how to fix
the test, and to indicate that the failure is expected when running
an old test suite against a new kernel (Ming Lei)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_features-v1-0-52014be9cde5@purestor…
---
Uday Shankar (3):
selftests: ublk: kublk: simplify feat_map definition
selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map
selftests: ublk: add test to verify that feat_map is complete
tools/testing/selftests/ublk/Makefile | 1 +
tools/testing/selftests/ublk/kublk.c | 32 +++++++++++++------------
tools/testing/selftests/ublk/test_generic_13.sh | 20 ++++++++++++++++
3 files changed, 38 insertions(+), 15 deletions(-)
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_features-07af4e321e5a
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
There is a spelling mistake in a test message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/futex/functional/futex_numa_mpol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/futex_numa_mpol.c b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
index 722427fe90bf..3a71ab93db72 100644
--- a/tools/testing/selftests/futex/functional/futex_numa_mpol.c
+++ b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
@@ -206,7 +206,7 @@ int main(int argc, char *argv[])
ksft_print_msg("Memory back to RW\n");
test_futex(futex_ptr, 0);
- ksft_test_result_pass("futex2 memory boundarie tests passed\n");
+ ksft_test_result_pass("futex2 memory boundary tests passed\n");
/* MPOL test. Does not work as expected */
#ifdef LIBNUMA_VER_SUFFICIENT
--
2.51.0
Syzkaller found this: fput runs the release from a work queue, so the
refcount remains elevated during abort. This is tricky, so move more
handling of files into the core code.
Add a WARN_ON to catch things like this more reliably without relying on
KASAN.
Update the fail_nth test to succeed on 6.17 kernels.
Jason Gunthorpe (3):
iommufd: Fix race during abort for file descriptors
iommufd: WARN if an object is aborted with an elevated refcount
iommufd/selftest: Update the fail_nth limit
drivers/iommu/iommufd/device.c | 3 +-
drivers/iommu/iommufd/eventq.c | 9 +----
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/main.c | 39 +++++++++++++++++--
.../selftests/iommu/iommufd_fail_nth.c | 2 +-
5 files changed, 42 insertions(+), 14 deletions(-)
base-commit: 1046d40b0e78d2cd63f6183629699b629b21f877
--
2.43.0
Mshare is a developing feature proposed by Anthony Yznaga and Khalid Aziz
that enables sharing of PTEs across processes. The V3 patch set has been
posted for review:
https://lore.kernel.org/linux-mm/20250820010415.699353-1-anthony.yznaga@ora…
This patch set adds selftests to exercise and demonstrate basic
functionality of mshare.
The initial tests use open, ioctl, and mmap syscalls to establish a shared
memory mapping between two processes and verify the expected behavior.
Additional tests are included to check interoperability with swap and
Transparent Huge Pages.
Future work will extend coverage to other use cases such as integration
with KVM and more advanced scenarios.
This series is intended to be applied on top of mshare V3, which is
based on mm-new (2025-08-15).
-----------------
V1->V2:
- Based on mshare V3, which is based on mm-new as of 2025-08-15
- (Fix) For test cases in basic.c, use a small chunk of
memory (4K/8K for normal pages, 2M/4M for hugetlb pages) to
ensure these tests can run on any server or device.
- (Fix) For the hugetlb, swap and THP test cases, add a tip on how to
configure the corresponding settings.
- (Fix) Add memory to the .gitignore file once it exists
- (Fix) Correct the changelog of the THP test case: mshare supports
THP only when the user configures shmem_enabled as always
V1:
https://lore.kernel.org/all/20250825145719.29455-1-linyongting@bytedance.co…
Yongting Lin (8):
mshare: Add selftests
mshare: selftests: Adding config fragments
mshare: selftests: Add some helper functions for mshare filesystem
mshare: selftests: Add test case shared memory
mshare: selftests: Add test case ioctl unmap
mshare: selftests: Add some helper functions for configuring and
retrieving cgroup
mshare: selftests: Add test case to demostrate the swapping of mshare
memory
mshare: selftests: Add test case to demostrate that mshare partly
supports THP
tools/testing/selftests/mshare/.gitignore | 4 +
tools/testing/selftests/mshare/Makefile | 7 +
tools/testing/selftests/mshare/basic.c | 109 ++++++++++
tools/testing/selftests/mshare/config | 1 +
tools/testing/selftests/mshare/memory.c | 89 ++++++++
tools/testing/selftests/mshare/util.c | 254 ++++++++++++++++++++++
6 files changed, 464 insertions(+)
create mode 100644 tools/testing/selftests/mshare/.gitignore
create mode 100644 tools/testing/selftests/mshare/Makefile
create mode 100644 tools/testing/selftests/mshare/basic.c
create mode 100644 tools/testing/selftests/mshare/config
create mode 100644 tools/testing/selftests/mshare/memory.c
create mode 100644 tools/testing/selftests/mshare/util.c
--
2.20.1
For a while now we have supported file handles for pidfds. This has
proven to be very useful.
Extend the concept to cover namespaces as well. After this patchset it
is possible to encode and decode namespace file handles using the
common name_to_handle_at() and open_by_handle_at() APIs.
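A minimal sketch of the encoding half, assuming a kernel with this series
applied: open the caller's network namespace and ask for a file handle
via name_to_handle_at() with AT_EMPTY_PATH. The helper name
encode_netns_handle() is made up for this example. On kernels without
nsfs file handle support the call is expected to fail, and the helper
simply reports the errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: encode a file handle for the current network
 * namespace. Returns the handle size in bytes on success, or -errno
 * on failure (e.g. when the kernel cannot export nsfs handles).
 */
int encode_netns_handle(void)
{
	struct file_handle *fh;
	int ns_fd, mount_id, ret;

	ns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
	if (ns_fd < 0)
		return -errno;

	/* The handle payload follows the header; reserve the maximum size. */
	fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
	if (!fh) {
		close(ns_fd);
		return -ENOMEM;
	}
	fh->handle_bytes = MAX_HANDLE_SZ;

	if (name_to_handle_at(ns_fd, "", fh, &mount_id, AT_EMPTY_PATH) < 0) {
		ret = -errno;
	} else {
		assert(fh->handle_bytes > 0);
		ret = (int)fh->handle_bytes;
	}

	free(fh);
	close(ns_fd);
	return ret;
}
```

Decoding with open_by_handle_at() is left out here, since the details of
how the mount fd argument is supplied for nsfs are specific to this
series.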
Namespace file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the namespace
based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to, they are able to open it; otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
Both the network namespace and the mount namespace already have an
associated cookie that isn't recycled and is fully exposed to userspace.
Move this into ns_common and use the same id space for all namespaces so
they can trivially and reliably be compared.
There's more coming based on the iterator infrastructure but the series
is large enough and focuses on file handles.
Extensive selftests included.
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- Address various review comments.
- Use a common NS_GET_ID ioctl() instead of individual ioctls.
- Link to v1: https://lore.kernel.org/20250910-work-namespace-v1-0-4dd56e7359d8@kernel.org
---
Christian Brauner (33):
pidfs: validate extensible ioctls
nsfs: drop tautological ioctl() check
nsfs: validate extensible ioctls
block: use extensible_ioctl_valid()
ns: move to_ns_common() to ns_common.h
nsfs: add nsfs.h header
ns: uniformly initialize ns_common
cgroup: use ns_common_init()
ipc: use ns_common_init()
mnt: use ns_common_init()
net: use ns_common_init()
pid: use ns_common_init()
time: use ns_common_init()
user: use ns_common_init()
uts: use ns_common_init()
ns: remove ns_alloc_inum()
nstree: make iterator generic
mnt: support ns lookup
cgroup: support ns lookup
ipc: support ns lookup
net: support ns lookup
pid: support ns lookup
time: support ns lookup
user: support ns lookup
uts: support ns lookup
ns: add to_<type>_ns() to respective headers
nsfs: add current_in_namespace()
nsfs: support file handles
nsfs: support exhaustive file handles
nsfs: add missing id retrieval support
tools: update nsfs.h uapi header
selftests/namespaces: add identifier selftests
selftests/namespaces: add file handle selftests
block/blk-integrity.c | 8 +-
fs/fhandle.c | 6 +
fs/internal.h | 1 +
fs/mount.h | 10 +-
fs/namespace.c | 156 +--
fs/nsfs.c | 201 ++-
fs/pidfs.c | 2 +-
include/linux/cgroup.h | 5 +
include/linux/exportfs.h | 6 +
include/linux/fs.h | 14 +
include/linux/ipc_namespace.h | 5 +
include/linux/ns_common.h | 29 +
include/linux/nsfs.h | 40 +
include/linux/nsproxy.h | 11 -
include/linux/nstree.h | 89 ++
include/linux/pid_namespace.h | 5 +
include/linux/proc_ns.h | 32 +-
include/linux/time_namespace.h | 9 +
include/linux/user_namespace.h | 5 +
include/linux/utsname.h | 5 +
include/net/net_namespace.h | 6 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/nsfs.h | 15 +-
init/main.c | 2 +
ipc/msgutil.c | 1 +
ipc/namespace.c | 12 +-
ipc/shm.c | 2 +
kernel/Makefile | 2 +-
kernel/cgroup/cgroup.c | 2 +
kernel/cgroup/namespace.c | 24 +-
kernel/nstree.c | 233 ++++
kernel/pid_namespace.c | 13 +-
kernel/time/namespace.c | 23 +-
kernel/user_namespace.c | 17 +-
kernel/utsname.c | 28 +-
net/core/net_namespace.c | 59 +-
tools/include/uapi/linux/nsfs.h | 17 +-
tools/testing/selftests/namespaces/.gitignore | 2 +
tools/testing/selftests/namespaces/Makefile | 7 +
tools/testing/selftests/namespaces/config | 7 +
.../selftests/namespaces/file_handle_test.c | 1429 ++++++++++++++++++++
tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
42 files changed, 3257 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4
[ I think at this point everyone is OK with the ABI, and the x86
implementation has been tested so hopefully we are near to being
able to get this merged? If there are any outstanding issues let
me know and I can look at addressing them. The one possible issue
I am aware of is that the RISC-V shadow stack support was briefly
in -next but got dropped along with the general RISC-V issues during
the last merge window, rebasing for that is still in progress. I
guess ideally this could be applied on a branch and then pulled into
the RISC-V tree? ]
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature, but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively); I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions, which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads; instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer; this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if its
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.
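The probing approach described above could look roughly like the sketch
below. Several things here are assumptions rather than facts from this
cover letter: the struct layout is a local copy with the new field
appended last and named shstk_token (the name used in v21), the helper
name probe_clone3_shstk() is invented, and treating EINVAL as "field
understood, value rejected" is a guess about the error code. Only the
E2BIG case, where an older kernel rejects a larger structure with
non-zero trailing data, is existing clone3() behaviour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435	/* syscall number on x86_64, arm64 and most others */
#endif

/* Local copy of the clone3() arguments; the last field is hypothetical. */
struct clone_args_shstk {
	uint64_t flags, pidfd, child_tid, parent_tid;
	uint64_t exit_signal, stack, stack_size, tls;
	uint64_t set_tid, set_tid_size, cgroup;
	uint64_t shstk_token;	/* ASSUMPTION: name and position per this series */
};
static_assert(sizeof(struct clone_args_shstk) == 96, "unexpected layout");

/*
 * Returns 1 if the kernel appears to parse the extra field (assumed to
 * show up as EINVAL for a misaligned value), 0 if it rejects the larger
 * structure with E2BIG, and a negative errno if the probe is inconclusive.
 */
int probe_clone3_shstk(void)
{
	struct clone_args_shstk args;
	long pid;

	memset(&args, 0, sizeof(args));
	args.exit_signal = SIGCHLD;
	args.shstk_token = 1;	/* deliberately misaligned */

	pid = syscall(SYS_clone3, &args, sizeof(args));
	if (pid == 0)
		_exit(0);	/* unexpected success: we are the child */
	if (pid > 0) {
		waitpid((pid_t)pid, NULL, 0);	/* reap the unexpected child */
		return -EPROTO;
	}
	if (errno == E2BIG)
		return 0;
	if (errno == EINVAL)
		return 1;
	return -errno;	/* e.g. ENOSYS, or EPERM under a seccomp filter */
}
```

No child should ever be created on the expected paths; the pid == 0 and
pid > 0 branches are there only so that a surprising success does not
leak a process.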
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v21:
- Rebase onto https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git kernel-6.18.clone3
- Rename shadow_stack_token to shstk_token, since it's a simple rename I've
kept the acks and reviews but I dropped the tested-bys just to be safe.
- Link to v20: https://lore.kernel.org/r/20250902-clone3-shadow-stack-v20-0-4d9fff1c53e7@k…
Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone()
from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@k…
Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@k…
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 76cea30ad520238160bf8f5e2f2803fcd7a08d22
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>