September 2025 - Linux-kselftest-mirror

[PATCH v2] selftests: ublk: fix behavior when fio is not installed

by Uday Shankar

Some ublk selftests have strange behavior when fio is not installed. While most tests behave correctly (run if they don't need fio, or skip if they need fio), the following tests have different behavior:

- test_null_01, test_null_02, test_generic_01, test_generic_02, and test_generic_12 try to run fio without checking if it exists first, and fail on any failure of the fio command (including "fio command not found"). So these tests fail when they should skip.
- test_stress_05 runs fio without checking if it exists first, but doesn't fail on fio command failure. This test passes, but that pass is misleading as the test doesn't do anything useful without fio installed. So this test passes when it should skip.

Fix these issues by adding _have_program fio checks to the top of all of these tests.

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Also fix test_generic_01, test_generic_02, test_generic_12, which fail on systems where bpftrace is installed but fio is not (Mohit Gupta)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_fio-v1-1-8d522539eed7@purestorage.c…
---
 tools/testing/selftests/ublk/test_generic_01.sh | 4 ++++
 tools/testing/selftests/ublk/test_generic_02.sh | 4 ++++
 tools/testing/selftests/ublk/test_generic_12.sh | 4 ++++
 tools/testing/selftests/ublk/test_null_01.sh | 4 ++++
 tools/testing/selftests/ublk/test_null_02.sh | 4 ++++
 tools/testing/selftests/ublk/test_stress_05.sh | 4 ++++
 6 files changed, 24 insertions(+)

diff --git a/tools/testing/selftests/ublk/test_generic_01.sh b/tools/testing/selftests/ublk/test_generic_01.sh
index 9227a208ba53128e4a202298316ff77e05607595..21a31cd5491aa79ffe3ad458a0055e832c619325 100755
--- a/tools/testing/selftests/ublk/test_generic_01.sh
+++ b/tools/testing/selftests/ublk/test_generic_01.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
 	exit "$UBLK_SKIP_CODE"
 fi
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "null" "sequential io order"
 
 dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_generic_02.sh b/tools/testing/selftests/ublk/test_generic_02.sh
index 3e80121e3bf5e191aa9ffe1f85e1693be4fdc2d2..12920768b1a080d37fcdff93de7a0439101de09e 100755
--- a/tools/testing/selftests/ublk/test_generic_02.sh
+++ b/tools/testing/selftests/ublk/test_generic_02.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
 	exit "$UBLK_SKIP_CODE"
 fi
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "null" "sequential io order for MQ"
 
 dev_id=$(_add_ublk_dev -t null -q 2)
diff --git a/tools/testing/selftests/ublk/test_generic_12.sh b/tools/testing/selftests/ublk/test_generic_12.sh
index 7abbb00d251df9403857b1c6f53aec8bf8eab176..b4046201b4d99ef5355b845ebea2c9a3924276a5 100755
--- a/tools/testing/selftests/ublk/test_generic_12.sh
+++ b/tools/testing/selftests/ublk/test_generic_12.sh
@@ -10,6 +10,10 @@ if ! _have_program bpftrace; then
 	exit "$UBLK_SKIP_CODE"
 fi
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "null" "do imbalanced load, it should be balanced over I/O threads"
 
 NTHREADS=6
diff --git a/tools/testing/selftests/ublk/test_null_01.sh b/tools/testing/selftests/ublk/test_null_01.sh
index a34203f726685787da80b0e32da95e0fcb90d0b1..c2cb8f7a09fe37a9956d067fd56b28dc7ca6bd68 100755
--- a/tools/testing/selftests/ublk/test_null_01.sh
+++ b/tools/testing/selftests/ublk/test_null_01.sh
@@ -6,6 +6,10 @@
 TID="null_01"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "null" "basic IO test"
 
 dev_id=$(_add_ublk_dev -t null)
diff --git a/tools/testing/selftests/ublk/test_null_02.sh b/tools/testing/selftests/ublk/test_null_02.sh
index 5633ca8766554b22be252c7cb2d13de1bf923b90..8accd35beb55c149f74b23f0fb562e12cbf3e362 100755
--- a/tools/testing/selftests/ublk/test_null_02.sh
+++ b/tools/testing/selftests/ublk/test_null_02.sh
@@ -6,6 +6,10 @@
 TID="null_02"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 _prep_test "null" "basic IO test with zero copy"
 
 dev_id=$(_add_ublk_dev -t null -z)
diff --git a/tools/testing/selftests/ublk/test_stress_05.sh b/tools/testing/selftests/ublk/test_stress_05.sh
index 566cfd90d192ce8c1f98ca2539792d54a787b3d1..274295061042e5db3f4f0846ae63ea9b787fb2ee 100755
--- a/tools/testing/selftests/ublk/test_stress_05.sh
+++ b/tools/testing/selftests/ublk/test_stress_05.sh
@@ -5,6 +5,10 @@
 TID="stress_05"
 ERR_CODE=0
 
+if ! _have_program fio; then
+	exit "$UBLK_SKIP_CODE"
+fi
+
 run_io_and_remove()
 {
 	local size=$1
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_fio-1910998b00b3

Best regards,
-- 
Uday Shankar <ushankar(a)purestorage.com>
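The guard added by the patch can be illustrated with a small standalone sketch. This is not the ublk helper itself: `have_program`, `require_program`, and `SKIP_CODE` are hypothetical names chosen for illustration, and the value 4 is assumed here because it is the kselftest skip status (KSFT_SKIP).

```shell
#!/bin/sh
# Sketch of a "skip when a tool is missing" guard.
# have_program, require_program and SKIP_CODE are illustrative names,
# not the helpers from the ublk test_common.sh.

SKIP_CODE=4	# assumption: kselftest's "skip" exit status (KSFT_SKIP)

have_program() {
	# command -v succeeds only if $1 resolves to something runnable
	command -v "$1" >/dev/null 2>&1
}

require_program() {
	# Return SKIP_CODE (instead of a failure) when the tool is absent,
	# so the caller can skip rather than fail or falsely pass.
	if ! have_program "$1"; then
		echo "SKIP: $1 not installed"
		return "$SKIP_CODE"
	fi
	return 0
}
```

A test script would call `require_program fio || exit $?` before doing any work, which gives both failure modes described above (spurious fail, misleading pass) the same outcome: a skip.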

2 months, 1 week


[PATCH v7 0/6] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)google.com>

Since Linux introduced the memfd feature, memfds have always had their execute bit set, and the memfd_create() syscall doesn't allow setting it differently.

However, in a secure-by-default system such as ChromeOS (where all executables should come from the rootfs, which is protected by Verified boot), this executable nature of memfd opens a door for NoExec bypass and enables a “confused deputy attack”. E.g., in VRP bug [1], the cros_vm process created a memfd to share the content with an external process; however, the memfd was overwritten and used for executing arbitrary code and root escalation. [2] lists more VRP bugs of this kind.

On the other hand, executable memfd has its legitimate uses: runc uses memfd's seal and executable feature to copy the contents of the binary and then execute them. For such a system, we need a solution to differentiate runc's use of executable memfds from an attacker's [3].

To address the above, this set of patches adds the following:
1> Let memfd_create() set the X bit at creation time.
2> Let memfd be sealed against modifying the X bit.
3> A new pid namespace sysctl, vm.memfd_noexec, to control the behavior of the X bit. For example, if a container has vm.memfd_noexec=2, then memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This makes it possible for a new LSM to reject or allow executable memfds based on its security policy.

Change history:

v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount (security.c).

v6: https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h

v5: https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c

v3: https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() set the X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().

v2: https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- Address comments in V1.
- Add sysctl (vm.mfd_noexec) to set the default file permissions of memfd_create to be non-executable.

v1: https://lwn.net/Articles/890096/

[1] https://crbug.com/1305411
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/

Daniel Verkamp (2):
  mm/memfd: add F_SEAL_EXEC
  selftests/memfd: add tests for F_SEAL_EXEC

Jeff Xu (4):
  mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
  selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
  mm/memfd: security hook for memfd_create

 include/linux/lsm_hook_defs.h | 1 +
 include/linux/lsm_hooks.h | 4 +
 include/linux/pid_namespace.h | 19 ++
 include/linux/security.h | 6 +
 include/uapi/linux/fcntl.h | 1 +
 include/uapi/linux/memfd.h | 4 +
 kernel/pid_namespace.c | 5 +
 kernel/pid_sysctl.h | 59 ++++
 mm/memfd.c | 61 +++-
 mm/shmem.c | 6 +
 security/security.c | 5 +
 tools/testing/selftests/memfd/fuse_test.c | 1 +
 tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
 13 files changed, 510 insertions(+), 3 deletions(-)
 create mode 100644 kernel/pid_sysctl.h

base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
-- 
2.39.0.rc1.256.g54fd8350bd-goog
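To make the vm.memfd_noexec knob concrete, here is a hedged sketch that reads the sysctl and prints what each level means. Only the meaning of 2 is stated in the cover letter; the meanings of 0 and 1 are an assumption based on my understanding of the sysctl as later documented, and `memfd_noexec_policy` plus its short labels are hypothetical names invented for this example.

```shell
#!/bin/sh
# Sketch: interpret the vm.memfd_noexec sysctl described above.
# memfd_noexec_policy and its labels are illustrative, not kernel names.

memfd_noexec_policy() {
	case "$1" in
	# assumption: plain memfd_create() behaves as if MFD_EXEC were set
	0) echo "default-exec" ;;
	# assumption: plain memfd_create() behaves as if MFD_NOEXEC_SEAL were set
	1) echo "default-noexec-seal" ;;
	# from the cover letter: memfd_create() without MFD_NOEXEC_SEAL is rejected
	2) echo "require-noexec-seal" ;;
	*) echo "unknown" ;;
	esac
}

sysctl_file=/proc/sys/vm/memfd_noexec
if [ -r "$sysctl_file" ]; then
	value=$(cat "$sysctl_file")
	echo "vm.memfd_noexec=$value ($(memfd_noexec_policy "$value"))"
else
	# Kernels without this series do not expose the file at all.
	echo "vm.memfd_noexec is not available on this kernel"
fi
```

Because the sysctl is per pid namespace, the value printed inside a container may differ from the one on the host.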

2 months, 1 week


[PATCH v3 0/8] riscv: Add Zalasr ISA extension support

by Xu Lu

This patch adds support for the Zalasr ISA extension, which supplies the real load acquire/store release instructions.

The specification can be found here:
https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc

This patch series has been tested with ltp on Qemu with Brensan's zalasr support patch[1].

Some false-positive spacing errors are reported during patch checking, so I have CCed the maintainers of checkpatch.pl as well.

[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCO…

v3:
- Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations so as to ensure FENCE.TSO ordering between operations which precede the UNLOCK+LOCK sequence and operations which follow the sequence. Thanks to Andrea.
- Support hwprobe of Zalasr.
- Allow Zalasr extensions for Guest/VM.

v2:
- Adjust the order of Zalasr and Zalrsc in dt-bindings. Thanks to Conor.

Xu Lu (8):
  riscv: add ISA extension parsing for Zalasr
  dt-bindings: riscv: Add Zalasr ISA extension description
  riscv: hwprobe: Export Zalasr extension
  riscv: Introduce Zalasr instructions
  riscv: Use Zalasr for smp_load_acquire/smp_store_release
  riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
  RISC-V: KVM: Allow Zalasr extensions for Guest/VM
  KVM: riscv: selftests: Add Zalasr extensions to get-reg-list test

 Documentation/arch/riscv/hwprobe.rst | 5 +-
 .../devicetree/bindings/riscv/extensions.yaml | 5 +
 arch/riscv/include/asm/atomic.h | 6 -
 arch/riscv/include/asm/barrier.h | 91 ++++++++++--
 arch/riscv/include/asm/cmpxchg.h | 136 ++++++++----------
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/include/asm/insn-def.h | 79 ++++++++++
 arch/riscv/include/uapi/asm/hwprobe.h | 1 +
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 arch/riscv/kernel/sys_hwprobe.c | 1 +
 arch/riscv/kvm/vcpu_onereg.c | 2 +
 .../selftests/kvm/riscv/get-reg-list.c | 4 +
 13 files changed, 242 insertions(+), 91 deletions(-)

-- 
2.20.1
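As a rough userspace-side illustration of the ISA extension parsing mentioned above, the sketch below checks an underscore-separated RISC-V ISA string for a multi-letter extension. It assumes that a kernel carrying this series would list `zalasr` in the `isa` line of /proc/cpuinfo; that is an assumption, not something the cover letter states, and `isa_has_ext` is a hypothetical helper name. The series itself exposes the extension through hwprobe, which this sketch does not use.

```shell
#!/bin/sh
# Sketch: look for a multi-letter extension in a RISC-V ISA string such as
# "rv64imafdc_zicsr_zifencei_zalasr". isa_has_ext is an illustrative name.

isa_has_ext() {
	# $1 = ISA string, $2 = lowercase multi-letter extension name.
	# Wrapping both sides in underscores forces a whole-token match, so
	# "zalrsc" is never mistaken for "zalasr".
	case "_${1}_" in
	*"_${2}_"*) return 0 ;;
	*) return 1 ;;
	esac
}

# Assumption: the first "isa" line of /proc/cpuinfo describes this hart.
isa=$(sed -n 's/^isa[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo 2>/dev/null | head -n 1)
if [ -n "$isa" ] && isa_has_ext "$isa" zalasr; then
	echo "zalasr listed in ISA string"
else
	echo "zalasr not listed (or not a RISC-V system)"
fi
```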

2 months, 1 week


[PATCH v4 0/2] arm64: Support FEAT_LSFE (Large System Float Extension)

by Mark Brown

FEAT_LSFE is optional from v9.5, it adds new instructions for atomic memory operations with floating point values. We have no immediate use for it in kernel, provide a hwcap so userspace can discover it and allow the ID register field to be exposed to KVM guests.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Rebase onto arm64/for-next/cpufeature, note that both patches have build dependencies on this.
- Drop unneeded cc clobber in hwcap.
- Use STRFADD as the instruction probed in hwcap.
- Link to v3: https://lore.kernel.org/r/20250818-arm64-lsfe-v3-0-af6f4d66eb39@kernel.org

Changes in v3:
- Rebase onto v6.17-rc1.
- Link to v2: https://lore.kernel.org/r/20250703-arm64-lsfe-v2-0-eced80999cb4@kernel.org

Changes in v2:
- Fix result of vi dropping in hwcap test.
- Link to v1: https://lore.kernel.org/r/20250627-arm64-lsfe-v1-0-68351c4bf741@kernel.org

---
Mark Brown (2):
  KVM: arm64: Expose FEAT_LSFE to guests
  kselftest/arm64: Add lsfe to the hwcaps test

 arch/arm64/kvm/sys_regs.c | 4 +++-
 tools/testing/selftests/arm64/abi/hwcap.c | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)
---
base-commit: 220928e52cb03d223b3acad3888baf0687486d21
change-id: 20250625-arm64-lsfe-0810cf98adc2

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>
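One way userspace might check for the new hwcap without writing C is to look at the `Features` line of /proc/cpuinfo. The sketch below assumes the hwcap is reported there under the name `lsfe`, which is inferred from the selftest patch title rather than confirmed by the cover letter; `cpuinfo_has_feature` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test whether a named hwcap appears on a cpuinfo "Features" line.
# cpuinfo_has_feature is an illustrative helper; "lsfe" is an assumed name.

cpuinfo_has_feature() {
	# $1 = feature name, $2 = cpuinfo-style file (defaults to /proc/cpuinfo)
	feature=$1
	file=${2:-/proc/cpuinfo}
	# Keep only the Features lines, then match the name as a whole word.
	grep -E '^Features[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$feature"
}

if cpuinfo_has_feature lsfe; then
	echo "lsfe hwcap reported"
else
	echo "lsfe hwcap not reported (or not an arm64 system)"
fi
```

Programs that need a reliable answer would normally read the auxiliary vector instead; parsing cpuinfo is only convenient for scripts and quick checks.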

2 months, 1 week


[PATCH v2 0/3] selftests: ublk: kublk: fix feature list

by Uday Shankar

This patch simplifies kublk's implementation of the feature list command, fixes a bug where a feature was missing, and adds a test to ensure that similar bugs do not happen in the future.

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v2:
- Add log lines to new test in failure case, to tell the user how to fix the test, and to indicate that the failure is expected when running an old test suite against a new kernel (Ming Lei)
- Link to v1: https://lore.kernel.org/r/20250916-ublk_features-v1-0-52014be9cde5@purestor…

---
Uday Shankar (3):
  selftests: ublk: kublk: simplify feat_map definition
  selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map
  selftests: ublk: add test to verify that feat_map is complete

 tools/testing/selftests/ublk/Makefile | 1 +
 tools/testing/selftests/ublk/kublk.c | 32 +++++++++++++------------
 tools/testing/selftests/ublk/test_generic_13.sh | 20 ++++++++++++++++
 3 files changed, 38 insertions(+), 15 deletions(-)
---
base-commit: da7b97ba0d219a14a83e9cc93f98b53939f12944
change-id: 20250916-ublk_features-07af4e321e5a

Best regards,
-- 
Uday Shankar <ushankar(a)purestorage.com>

2 months, 1 week


[PATCH][next] selftest/futex: Fix spelling mistake "boundarie" -> "boundary"

by Colin Ian King

There is a spelling mistake in a test message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
 tools/testing/selftests/futex/functional/futex_numa_mpol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/futex/functional/futex_numa_mpol.c b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
index 722427fe90bf..3a71ab93db72 100644
--- a/tools/testing/selftests/futex/functional/futex_numa_mpol.c
+++ b/tools/testing/selftests/futex/functional/futex_numa_mpol.c
@@ -206,7 +206,7 @@ int main(int argc, char *argv[])
 	ksft_print_msg("Memory back to RW\n");
 	test_futex(futex_ptr, 0);
 
-	ksft_test_result_pass("futex2 memory boundarie tests passed\n");
+	ksft_test_result_pass("futex2 memory boundary tests passed\n");
 
 	/* MPOL test. Does not work as expected */
 #ifdef LIBNUMA_VER_SUFFICIENT
-- 
2.51.0

2 months, 1 week


[PATCH 0/3] Fix a race with fput during eventq abort

by Jason Gunthorpe

Syzkaller found this: fput runs the release from a work queue, so the refcount remains elevated during abort. This is tricky, so move more handling of files into the core code.

Add a WARN_ON to catch things like this more reliably without relying on kasan.

Update the fail_nth test to succeed on 6.17 kernels.

Jason Gunthorpe (3):
  iommufd: Fix race during abort for file descriptors
  iommufd: WARN if an object is aborted with an elevated refcount
  iommufd/selftest: Update the fail_nth limit

 drivers/iommu/iommufd/device.c | 3 +-
 drivers/iommu/iommufd/eventq.c | 9 +----
 drivers/iommu/iommufd/iommufd_private.h | 3 +-
 drivers/iommu/iommufd/main.c | 39 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c | 2 +-
 5 files changed, 42 insertions(+), 14 deletions(-)

base-commit: 1046d40b0e78d2cd63f6183629699b629b21f877
-- 
2.43.0

2 months, 1 week


[PATCH V2 0/8] Add selftests for mshare

by Yongting Lin

Mshare is a developing feature proposed by Anthony Yznaga and Khalid Aziz that enables sharing of PTEs across processes. The V3 patch set has been posted for review:

https://lore.kernel.org/linux-mm/20250820010415.699353-1-anthony.yznaga@ora…

This patch set adds selftests to exercise and demonstrate basic functionality of mshare.

The initial tests use open, ioctl, and mmap syscalls to establish a shared memory mapping between two processes and verify the expected behavior.

Additional tests are included to check interoperability with swap and Transparent Huge Pages.

Future work will extend coverage to other use cases such as integration with KVM and more advanced scenarios.

This series is intended to be applied on top of mshare V3, which is based on mm-new (2025-08-15).

-----------------

V1->V2:
- Based on mshare V3, which is based on mm-new as of 2025-08-15
- (Fix) For test cases in basic.c, change to use a small chunk of memory (4K/8K for normal pages, 2M/4M for hugetlb pages), to ensure these tests can run on any server or device.
- (Fix) For test cases of hugetlb, swap and THP, add tips on configuring the corresponding settings.
- (Fix) Add memory to .gitignore file once it exists
- (Fix) Correct the changelog of the THP test case: mshare supports THP only when the user configures shmem_enabled as always

V1: https://lore.kernel.org/all/20250825145719.29455-1-linyongting@bytedance.co…

Yongting Lin (8):
  mshare: Add selftests
  mshare: selftests: Adding config fragments
  mshare: selftests: Add some helper functions for mshare filesystem
  mshare: selftests: Add test case shared memory
  mshare: selftests: Add test case ioctl unmap
  mshare: selftests: Add some helper functions for configuring and retrieving cgroup
  mshare: selftests: Add test case to demostrate the swapping of mshare memory
  mshare: selftests: Add test case to demostrate that mshare partly supports THP

 tools/testing/selftests/mshare/.gitignore | 4 +
 tools/testing/selftests/mshare/Makefile | 7 +
 tools/testing/selftests/mshare/basic.c | 109 ++++++++++
 tools/testing/selftests/mshare/config | 1 +
 tools/testing/selftests/mshare/memory.c | 89 ++++++++
 tools/testing/selftests/mshare/util.c | 254 ++++++++++++++++++++++
 6 files changed, 464 insertions(+)
 create mode 100644 tools/testing/selftests/mshare/.gitignore
 create mode 100644 tools/testing/selftests/mshare/Makefile
 create mode 100644 tools/testing/selftests/mshare/basic.c
 create mode 100644 tools/testing/selftests/mshare/config
 create mode 100644 tools/testing/selftests/mshare/memory.c
 create mode 100644 tools/testing/selftests/mshare/util.c

-- 
2.20.1

2 months, 2 weeks


[PATCH v2 00/33] ns: support file handles

by Christian Brauner

For a while now we have supported file handles for pidfds. This has proven to be very useful.

Extend the concept to cover namespaces as well. After this patchset it is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis.

Namespace file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already.

It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them.

Permission checking is kept simple. If the caller is located in the namespace the file handle refers to, they are able to open it; otherwise they must hold privilege over the owning namespace of the relevant namespace.

Both the network namespace and the mount namespace already have an associated cookie that isn't recycled and is fully exposed to userspace. Move this into ns_common and use the same id space for all namespaces so they can trivially and reliably be compared.

There's more coming based on the iterator infrastructure but the series is large enough and focuses on file handles.

Extensive selftests included.

Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- Address various review comments.
- Use a common NS_GET_ID ioctl() instead of individual ioctls.
- Link to v1: https://lore.kernel.org/20250910-work-namespace-v1-0-4dd56e7359d8@kernel.org

---
Christian Brauner (33):
  pidfs: validate extensible ioctls
  nsfs: drop tautological ioctl() check
  nsfs: validate extensible ioctls
  block: use extensible_ioctl_valid()
  ns: move to_ns_common() to ns_common.h
  nsfs: add nsfs.h header
  ns: uniformly initialize ns_common
  cgroup: use ns_common_init()
  ipc: use ns_common_init()
  mnt: use ns_common_init()
  net: use ns_common_init()
  pid: use ns_common_init()
  time: use ns_common_init()
  user: use ns_common_init()
  uts: use ns_common_init()
  ns: remove ns_alloc_inum()
  nstree: make iterator generic
  mnt: support ns lookup
  cgroup: support ns lookup
  ipc: support ns lookup
  net: support ns lookup
  pid: support ns lookup
  time: support ns lookup
  user: support ns lookup
  uts: support ns lookup
  ns: add to_<type>_ns() to respective headers
  nsfs: add current_in_namespace()
  nsfs: support file handles
  nsfs: support exhaustive file handles
  nsfs: add missing id retrieval support
  tools: update nsfs.h uapi header
  selftests/namespaces: add identifier selftests
  selftests/namespaces: add file handle selftests

 block/blk-integrity.c | 8 +-
 fs/fhandle.c | 6 +
 fs/internal.h | 1 +
 fs/mount.h | 10 +-
 fs/namespace.c | 156 +--
 fs/nsfs.c | 201 ++-
 fs/pidfs.c | 2 +-
 include/linux/cgroup.h | 5 +
 include/linux/exportfs.h | 6 +
 include/linux/fs.h | 14 +
 include/linux/ipc_namespace.h | 5 +
 include/linux/ns_common.h | 29 +
 include/linux/nsfs.h | 40 +
 include/linux/nsproxy.h | 11 -
 include/linux/nstree.h | 89 ++
 include/linux/pid_namespace.h | 5 +
 include/linux/proc_ns.h | 32 +-
 include/linux/time_namespace.h | 9 +
 include/linux/user_namespace.h | 5 +
 include/linux/utsname.h | 5 +
 include/net/net_namespace.h | 6 +
 include/uapi/linux/fcntl.h | 1 +
 include/uapi/linux/nsfs.h | 15 +-
 init/main.c | 2 +
 ipc/msgutil.c | 1 +
 ipc/namespace.c | 12 +-
 ipc/shm.c | 2 +
 kernel/Makefile | 2 +-
 kernel/cgroup/cgroup.c | 2 +
 kernel/cgroup/namespace.c | 24 +-
 kernel/nstree.c | 233 ++++
 kernel/pid_namespace.c | 13 +-
 kernel/time/namespace.c | 23 +-
 kernel/user_namespace.c | 17 +-
 kernel/utsname.c | 28 +-
 net/core/net_namespace.c | 59 +-
 tools/include/uapi/linux/nsfs.h | 17 +-
 tools/testing/selftests/namespaces/.gitignore | 2 +
 tools/testing/selftests/namespaces/Makefile | 7 +
 tools/testing/selftests/namespaces/config | 7 +
 .../selftests/namespaces/file_handle_test.c | 1429 ++++++++++++++++++++
 tools/testing/selftests/namespaces/nsid_test.c | 986 ++++++++++++++
 42 files changed, 3257 insertions(+), 270 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250905-work-namespace-c68826dda0d4

2 months, 2 weeks


[PATCH v21 0/8] fork: Support shadow stacks in clone3()

by Mark Brown

[ I think at this point everyone is OK with the ABI, and the x86 implementation has been tested so hopefully we are near to being able to get this merged? If there are any outstanding issues let me know and I can look at addressing them. The one possible issue I am aware of is that the RISC-V shadow stack support was briefly in -next but got dropped along with the general RISC-V issues during the last merge window, rebasing for that is still in progress. I guess ideally this could be applied on a branch and then pulled into the RISC-V tree? ]

The kernel has recently added support for shadow stacks, currently x86 only using their CET feature but both arm64 and RISC-V have equivalent features (GCS and Zicfiss respectively), I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions which is not generally writeable by userspace and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and makes it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process.

Our API for shadow stacks does not currently offer userspace any flexibility for managing the allocation of shadow stacks for newly created threads, instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal, in the vast majority of cases the shadow stack will be over allocated and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented.

Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process, keeping the current implicit allocation behaviour if one is not specified either with clone3() or through the use of clone(). The user must provide a shadow stack pointer, this must point to memory mapped for use as a shadow stack by map_shadow_stack() with an architecture specified shadow stack token at the top of the stack.

Yuri Khrustalev has raised questions from the libc side regarding discoverability of extended clone3() structure sizes[2], this seems like a general issue with clone3(). There was a suggestion to add a hwcap on arm64 which isn't ideal but is doable there, though architecture specific mechanisms would also be needed for x86 (and RISC-V if its support gets merged before this does). The idea has, however, had strong pushback from the architecture maintainers and it is possible to detect support for this in clone3() by attempting a call with a misaligned shadow stack pointer specified so no hwcap has been added.

[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v21:
- Rebase onto https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git kernel-6.18.clone3
- Rename shadow_stack_token to shstk_token, since it's a simple rename I've kept the acks and reviews but I dropped the tested-bys just to be safe.
- Link to v20: https://lore.kernel.org/r/20250902-clone3-shadow-stack-v20-0-4d9fff1c53e7@k…

Changes in v20:
- Comment fixes and clarifications in x86 arch_shstk_validate_clone() from Rick Edgecombe.
- Spelling fix in documentation.
- Link to v19: https://lore.kernel.org/r/20250819-clone3-shadow-stack-v19-0-bc957075479b@k…

Changes in v19:
- Rebase onto v6.17-rc1.
- Link to v18: https://lore.kernel.org/r/20250702-clone3-shadow-stack-v18-0-7965d2b694db@k…

Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…

Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…

Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…

Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…

Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…

Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…

Changes in v12:
- Add the regular prctl() to the userspace API document since arm64 support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…

Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…

Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…

Changes in v9:
- Pull token validation earlier and report problems with an error return to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…

Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…

Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…

Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…

Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…

Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…

Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…

Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…

---
Mark Brown (8):
  arm64/gcs: Return a success value from gcs_alloc_thread_stack()
  Documentation: userspace-api: Add shadow stack API documentation
  selftests: Provide helper header for shadow stack testing
  fork: Add shadow stack support to clone3()
  selftests/clone3: Remove redundant flushes of output streams
  selftests/clone3: Factor more of main loop into test_clone3()
  selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
  selftests/clone3: Test shadow stack support

 Documentation/userspace-api/index.rst | 1 +
 Documentation/userspace-api/shadow_stack.rst | 44 +++++
 arch/arm64/include/asm/gcs.h | 8 +-
 arch/arm64/kernel/process.c | 8 +-
 arch/arm64/mm/gcs.c | 55 +++++-
 arch/x86/include/asm/shstk.h | 11 +-
 arch/x86/kernel/process.c | 2 +-
 arch/x86/kernel/shstk.c | 53 ++++-
 include/asm-generic/cacheflush.h | 11 ++
 include/linux/sched/task.h | 17 ++
 include/uapi/linux/sched.h | 9 +-
 kernel/fork.c | 93 +++++++--
 tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
 tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
 tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
 15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 76cea30ad520238160bf8f5e2f2803fcd7a08d22
change-id: 20231019-clone3-shadow-stack-15d40d2bf536

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

2 months, 2 weeks

