August 2023 - Linux-kselftest-mirror

[PATCH v4 00/36] arm64/gcs: Provide support for GCS in userspace

by Mark Brown

The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer can not be directly written to by userspace. When a BL is executed the value stored in LR is also pushed onto the GCS, and when a RET is executed the top of the GCS is popped and compared to LR with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages, a data abort is generated if they are not.

The combination of hardware enforcement and lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks.

This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2, this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it, it is expected that this will be done very early in application execution by the dynamic linker or other startup code.

x86 has an equivalent feature called shadow stacks, this series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible. As there has been extensive discussion with the wider community around the ABI for shadow stacks I have as far as practical kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence.

The main divergence I am conscious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable, we do anticipate that some systems will wish to disable GCS at runtime but are not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable, since only x86 and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V zisslpcfi feature which I initially adopted fairly directly but following review feedback has been revised quite a bit.

There is an open issue with support for CRIU, on x86 this required the ability to set the GCS mode via ptrace. This series supports configuring mode bits other than enable/disable via ptrace but it needs to be confirmed if this is sufficient.

There are a few bits where I'm not convinced about where I've placed things, in particular the GCS write operation is in the GCS header not in uaccess.h, I wasn't sure what was clearest there and am probably too close to the code to have a clear opinion. The reporting of GCS in /proc/PID/smaps is also a bit awkward.

The series depends on the x86 shadow stack support:

   https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…

I've rebased this onto v6.5-rc4 but not included it in the series in order to avoid confusion with Rick's work and cut down the size of the series, you can see the branch at:

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs

[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org

---
Mark Brown (36):
      prctl: arch-agnostic prctl for shadow stack
      arm64: Document boot requirements for Guarded Control Stacks
      arm64/gcs: Document the ABI for Guarded Control Stacks
      arm64/sysreg: Add new system registers for GCS
      arm64/sysreg: Add definitions for architected GCS caps
      arm64/gcs: Add manual encodings of GCS instructions
      arm64/gcs: Provide copy_to_user_gcs()
      arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
      arm64/mm: Allocate PIE slots for EL0 guarded control stack
      mm: Define VM_SHADOW_STACK for arm64 when we support GCS
      arm64/mm: Map pages for guarded control stack
      KVM: arm64: Manage GCS registers for guests
      arm64/gcs: Allow GCS usage at EL0 and EL1
      arm64/idreg: Add overrride for GCS
      arm64/hwcap: Add hwcap for GCS
      arm64/traps: Handle GCS exceptions
      arm64/mm: Handle GCS data aborts
      arm64/gcs: Context switch GCS state for EL0
      arm64/gcs: Allocate a new GCS for threads with GCS enabled
      arm64/gcs: Implement shadow stack prctl() interface
      arm64/mm: Implement map_shadow_stack()
      arm64/signal: Set up and restore the GCS context for signal handlers
      arm64/signal: Expose GCS state in signal frames
      arm64/ptrace: Expose GCS via ptrace and core files
      arm64: Add Kconfig for Guarded Control Stack (GCS)
      kselftest/arm64: Verify the GCS hwcap
      kselftest/arm64: Add GCS as a detected feature in the signal tests
      kselftest/arm64: Add framework support for GCS to signal handling tests
      kselftest/arm64: Allow signals tests to specify an expected si_code
      kselftest/arm64: Always run signals tests with GCS enabled
      kselftest/arm64: Add very basic GCS test program
      kselftest/arm64: Add a GCS test program built with the system libc
      kselftest/arm64: Add test coverage for GCS mode locking
      selftests/arm64: Add GCS signal tests
      kselftest/arm64: Add a GCS stress test
      kselftest/arm64: Enable GCS for the FP stress tests

 Documentation/admin-guide/kernel-parameters.txt | 3 +
 Documentation/arch/arm64/booting.rst | 22 +
 Documentation/arch/arm64/elf_hwcaps.rst | 3 +
 Documentation/arch/arm64/gcs.rst | 228 +++++++++
 Documentation/arch/arm64/index.rst | 1 +
 Documentation/filesystems/proc.rst | 2 +-
 arch/arm64/Kconfig | 19 +
 arch/arm64/include/asm/cpufeature.h | 6 +
 arch/arm64/include/asm/el2_setup.h | 17 +
 arch/arm64/include/asm/esr.h | 28 +-
 arch/arm64/include/asm/exception.h | 2 +
 arch/arm64/include/asm/gcs.h | 106 ++++
 arch/arm64/include/asm/hwcap.h | 1 +
 arch/arm64/include/asm/kvm_arm.h | 4 +-
 arch/arm64/include/asm/kvm_host.h | 12 +
 arch/arm64/include/asm/pgtable-prot.h | 14 +-
 arch/arm64/include/asm/processor.h | 7 +
 arch/arm64/include/asm/sysreg.h | 20 +
 arch/arm64/include/asm/uaccess.h | 42 ++
 arch/arm64/include/uapi/asm/hwcap.h | 1 +
 arch/arm64/include/uapi/asm/ptrace.h | 8 +
 arch/arm64/include/uapi/asm/sigcontext.h | 9 +
 arch/arm64/kernel/cpufeature.c | 19 +
 arch/arm64/kernel/cpuinfo.c | 1 +
 arch/arm64/kernel/entry-common.c | 23 +
 arch/arm64/kernel/idreg-override.c | 2 +
 arch/arm64/kernel/process.c | 85 ++++
 arch/arm64/kernel/ptrace.c | 59 +++
 arch/arm64/kernel/signal.c | 237 ++++++++-
 arch/arm64/kernel/traps.c | 11 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
 arch/arm64/kvm/sys_regs.c | 22 +
 arch/arm64/mm/Makefile | 1 +
 arch/arm64/mm/fault.c | 78 ++-
 arch/arm64/mm/gcs.c | 234 +++++++++
 arch/arm64/mm/mmap.c | 12 +-
 arch/arm64/tools/cpucaps | 1 +
 arch/arm64/tools/sysreg | 55 +++
 fs/proc/task_mmu.c | 3 +
 include/linux/mm.h | 16 +-
 include/linux/syscalls.h | 1 +
 include/uapi/asm-generic/unistd.h | 5 +-
 include/uapi/linux/elf.h | 1 +
 include/uapi/linux/prctl.h | 22 +
 kernel/sys.c | 30 ++
 kernel/sys_ni.c | 1 +
 tools/testing/selftests/arm64/Makefile | 2 +-
 tools/testing/selftests/arm64/abi/hwcap.c | 19 +
 tools/testing/selftests/arm64/fp/assembler.h | 15 +
 tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
 tools/testing/selftests/arm64/fp/sve-test.S | 2 +
 tools/testing/selftests/arm64/fp/za-test.S | 2 +
 tools/testing/selftests/arm64/fp/zt-test.S | 2 +
 tools/testing/selftests/arm64/gcs/.gitignore | 5 +
 tools/testing/selftests/arm64/gcs/Makefile | 24 +
 tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
 tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++
 .../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-util.h | 87 ++++
 tools/testing/selftests/arm64/gcs/libc-gcs.c | 500 +++++++++++++++++++
 tools/testing/selftests/arm64/signal/.gitignore | 1 +
 .../testing/selftests/arm64/signal/test_signals.c | 17 +-
 .../testing/selftests/arm64/signal/test_signals.h | 6 +
 .../selftests/arm64/signal/test_signals_utils.c | 32 +-
 .../selftests/arm64/signal/test_signals_utils.h | 39 ++
 .../arm64/signal/testcases/gcs_exception_fault.c | 59 +++
 .../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
 .../arm64/signal/testcases/gcs_write_fault.c | 67 +++
 .../selftests/arm64/signal/testcases/testcases.c | 7 +
 .../selftests/arm64/signal/testcases/testcases.h | 1 +
 72 files changed, 3823 insertions(+), 34 deletions(-)
---
base-commit: ed0e1456f04be7a93c9a186e8e13aed78b555617
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,
-- 
Mark Brown <broonie(a)kernel.org>

2 years, 1 month


[PATCH v6] selftests/clone3: Fix broken test under !CONFIG_TIME_NS

by Tiezhu Yang

When executing the following command to test clone3 under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3

we can see the following error info:

  # [7538] Trying clone3() with flags 0x80 (size 0)
  # Invalid argument - Failed to create new process
  # [7538] clone3() with flags says: -22 expected 0
  not ok 18 [7538] Result (-22) is different than expected (0)
  ...
  # Totals: pass:18 fail:1 xfail:0 xpass:0 skip:0 error:0

This is because if CONFIG_TIME_NS is not set, but the flag CLONE_NEWTIME (0x80) is used to clone a time namespace, it will return -EINVAL in copy_time_ns().

If the kernel does not support CONFIG_TIME_NS, /proc/self/ns/time will not exist, and then we should skip the clone3() test with CLONE_NEWTIME.

With this patch under !CONFIG_TIME_NS:

  # make headers && cd tools/testing/selftests/clone3 && make && ./clone3
  ...
  # Time namespaces are not supported
  ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME
  ...
  # Totals: pass:18 fail:0 xfail:0 xpass:0 skip:1 error:0

Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Suggested-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
---

v6: Rebase on 6.5-rc1 and update the commit message

 tools/testing/selftests/clone3/clone3.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e60cf4d..1c61e3c 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -196,7 +196,12 @@ int main(int argc, char *argv[])
 			CLONE3_ARGS_NO_TEST);
 
 	/* Do a clone3() in a new time namespace */
-	test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+	if (access("/proc/self/ns/time", F_OK) == 0) {
+		test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+	} else {
+		ksft_print_msg("Time namespaces are not supported\n");
+		ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n");
+	}
 
 	/* Do a clone3() with exit signal (SIGCHLD) in flags */
 	test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
-- 
2.1.0

2 years, 1 month


[PATCH v2 0/2] Modify vDSO selftests

by Tiezhu Yang

v2: Rebase on 6.5-rc1 and update the commit message

Tiezhu Yang (2):
  selftests/vDSO: Add support for LoongArch
  selftests/vDSO: Get version and name for all archs

 tools/testing/selftests/vDSO/vdso_config.h | 6 ++++-
 tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++--------
 .../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++----------------
 3 files changed, 18 insertions(+), 30 deletions(-)

-- 
2.1.0

2 years, 1 month


[RFC PATCH v2 00/20] context_tracking,x86: Defer some IPIs until a user->kernel transition

by Valentin Schneider

Context
=======

We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a pure-userspace application get regularly interrupted by IPIs sent from housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs leading to various on_each_cpu() calls, e.g.:

  64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
    smp_call_function_many_cond+0x1
    smp_call_function+0x39
    on_each_cpu+0x2a
    flush_tlb_kernel_range+0x7b
    __purge_vmap_area_lazy+0x70
    _vm_unmap_aliases.part.42+0xdf
    change_page_attr_set_clr+0x16a
    set_memory_ro+0x26
    bpf_int_jit_compile+0x2f9
    bpf_prog_select_runtime+0xc6
    bpf_prepare_filter+0x523
    sk_attach_filter+0x13
    sock_setsockopt+0x92c
    __sys_setsockopt+0x16a
    __x64_sys_setsockopt+0x20
    do_syscall_64+0x87
    entry_SYSCALL_64_after_hwframe+0x65

The heart of this series is the thought that while we cannot remove NOHZ_FULL CPUs from the list of CPUs targeted by these IPIs, they may not have to execute the callbacks immediately. Anything that only affects kernelspace can wait until the next user->kernel transition, providing it can be executed "early enough" in the entry code.

The original implementation is from Peter [1]. Nicolas then added kernel TLB invalidation deferral to that [2], and I picked it up from there.

Deferral approach
=================

Storing each and every callback, like a secondary call_single_queue, turned out to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in userspace for as long as possible - no signal of any form would be sent when deferring an IPI. This means that any form of queuing for deferred callbacks would end up as a convoluted memory leak.

Deferred IPIs must thus be coalesced, which this series achieves by assigning IPIs a "type" and having a mapping of IPI type to callback, leveraged upon kernel entry.

What about IPIs whose callbacks take a parameter, you may ask?

Peter suggested during OSPM23 [3] that since on_each_cpu() targets housekeeping CPUs *and* isolated CPUs, isolated CPUs can access either global or housekeeping-CPU-local state to "reconstruct" the data that would have been sent via the IPI.

This series does not affect any IPI callback that requires an argument, but the approach would remain the same (one coalescable callback executed on kernel entry).

Kernel entry vs execution of the deferred operation
===================================================

There is a non-zero length of code that is executed upon kernel entry before the deferred operation can be itself executed (i.e. before we start getting into context_tracking.c proper).

This means one must take extra care about what can happen in the early entry code, and that <bad things> cannot happen. For instance, we really don't want to hit instructions that have been modified by a remote text_poke() while we're on our way to execute a deferred sync_core().

Patches
=======

o Patches 1-9 have been submitted separately and are included for the sake of testing
o Patches 10-14 focus on having objtool detect problematic static key usage in early entry
o Patch 15 adds the infrastructure for IPI deferral.
o Patches 16-17 add some RCU testing infrastructure
o Patch 18 adds text_poke() IPI deferral.
o Patches 19-20 add vunmap() flush_tlb_kernel_range() IPI deferral

  These ones I'm a lot less confident about, mostly due to lacking instrumentation/verification.

  The actual deferred callback is also incomplete as it's not properly noinstr:

    vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x19: call to native_write_cr4() leaves .noinstr.text section

  and it doesn't support PARAVIRT - it's going to need a pv_ops.mmu entry, but I have *no idea* what a sane implementation would be for Xen so I haven't touched that yet.

Patches are also available at:

  https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v2

Testing
=======

Note: this is a different machine than used for v1, because that machine decided to act difficult.

Xeon E5-2699 system with SMToff, NOHZ_FULL, isolated CPUs. RHEL9 userspace.

Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is:

  $ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
                     -e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \
                     -e "ipi_send_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \
                     rteval --onlyload --loads-cpulist=$HK_CPUS \
                     --hackbench-runlowmem=True --duration=$DURATION

This only records IPIs sent to isolated CPUs, so any event there is interference (with a bit of fuzz at the start/end of the workload when spawning the processes). All tests were done with a duration of 30 minutes.

v6.5-rc1 (+ cpumask filtering patches):

  # This is the actual IPI count
  $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
      338 callback=generic_smp_call_function_single_interrupt+0x0

  # These are the different CSD's that caused IPIs
  $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
     9207 func=do_flush_tlb_all
     1116 func=do_sync_core
       62 func=do_kernel_range_flush
        3 func=nohz_full_kick_func

v6.5-rc1 + patches:

  # This is the actual IPI count
  $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr
        2 callback=generic_smp_call_function_single_interrupt+0x0

  # These are the different CSD's that caused IPIs
  $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr
        2 func=nohz_full_kick_func

The incriminating IPIs are all gone, but note that on the machine I used to test v1 there were still some do_flush_tlb_all() IPIs caused by pcpu_balance_workfn(), since only vmalloc is affected by the deferral mechanism.

Acknowledgements
================

Special thanks to:
o Clark Williams for listening to my ramblings about this and throwing ideas my way
o Josh Poimboeuf for his guidance regarding objtool and hinting at the .data..ro_after_init section.

Links
=====

[1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/
[2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip
[3]: https://youtu.be/0vjE6fjoVVE

Revisions
=========

RFCv1 -> RFCv2
++++++++++++++

o Rebased onto v6.5-rc1

o Updated the trace filter patches (Steven)

o Fixed __ro_after_init keys used in modules (Peter)
o Dropped the extra context_tracking atomic, squashed the new bits in the existing .state field (Peter, Frederic)

o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an rcutorture case for a low-size counter (Paul)

  The new TREE11 case with a 2-bit dynticks counter seems to pass when run against this series.

o Fixed flush_tlb_kernel_range_deferrable() definition

Peter Zijlstra (1):
  jump_label,module: Don't alloc static_key_mod for __ro_after_init keys

Valentin Schneider (19):
  tracing/filters: Dynamically allocate filter_pred.regex
  tracing/filters: Enable filtering a cpumask field by another cpumask
  tracing/filters: Enable filtering a scalar field by a cpumask
  tracing/filters: Enable filtering the CPU common field by a cpumask
  tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU
  tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU
  tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU
  tracing/filters: Further optimise scalar vs cpumask comparison
  tracing/filters: Document cpumask filtering
  objtool: Flesh out warning related to pv_ops[] calls
  objtool: Warn about non __ro_after_init static key usage in .noinstr
  context_tracking: Make context_tracking_key __ro_after_init
  x86/kvm: Make kvm_async_pf_enabled __ro_after_init
  context-tracking: Introduce work deferral infrastructure
  rcu: Make RCU dynticks counter size configurable
  rcutorture: Add a test config to torture test low RCU_DYNTICKS width
  context_tracking,x86: Defer kernel text patching IPIs
  context_tracking,x86: Add infrastructure to defer kernel TLBI
  x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs

 Documentation/trace/events.rst | 14 +
 arch/Kconfig | 9 +
 arch/x86/Kconfig | 1 +
 arch/x86/include/asm/context_tracking_work.h | 20 ++
 arch/x86/include/asm/text-patching.h | 1 +
 arch/x86/include/asm/tlbflush.h | 2 +
 arch/x86/kernel/alternative.c | 24 +-
 arch/x86/kernel/kprobes/core.c | 4 +-
 arch/x86/kernel/kprobes/opt.c | 4 +-
 arch/x86/kernel/kvm.c | 2 +-
 arch/x86/kernel/module.c | 2 +-
 arch/x86/mm/tlb.c | 40 ++-
 include/asm-generic/sections.h | 5 +
 include/linux/context_tracking.h | 26 ++
 include/linux/context_tracking_state.h | 65 +++-
 include/linux/context_tracking_work.h | 28 ++
 include/linux/jump_label.h | 1 +
 include/linux/trace_events.h | 1 +
 init/main.c | 1 +
 kernel/context_tracking.c | 53 ++-
 kernel/jump_label.c | 49 +++
 kernel/rcu/Kconfig | 33 ++
 kernel/time/Kconfig | 5 +
 kernel/trace/trace_events_filter.c | 302 ++++++++++++++++--
 mm/vmalloc.c | 19 +-
 tools/objtool/check.c | 22 +-
 tools/objtool/include/objtool/check.h | 1 +
 tools/objtool/include/objtool/special.h | 2 +
 tools/objtool/special.c | 3 +
 .../selftests/rcutorture/configs/rcu/TREE11 | 19 ++
 .../rcutorture/configs/rcu/TREE11.boot | 1 +
 31 files changed, 695 insertions(+), 64 deletions(-)
 create mode 100644 arch/x86/include/asm/context_tracking_work.h
 create mode 100644 include/linux/context_tracking_work.h
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot

-- 
2.31.1

2 years, 1 month


[PATCH 1/6] selftests: capabilities: remove duplicate unneeded defines

by Muhammad Usama Anjum

These duplicate defines should automatically be picked up from kernel headers. Use KHDR_INCLUDES to add kernel header files.

Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
 tools/testing/selftests/capabilities/Makefile | 2 +-
 tools/testing/selftests/capabilities/test_execve.c | 8 --------
 tools/testing/selftests/capabilities/validate_cap.c | 8 --------
 3 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/tools/testing/selftests/capabilities/Makefile b/tools/testing/selftests/capabilities/Makefile
index 6e9d98d457d5b..411ac098308f1 100644
--- a/tools/testing/selftests/capabilities/Makefile
+++ b/tools/testing/selftests/capabilities/Makefile
@@ -2,7 +2,7 @@
 TEST_GEN_FILES := validate_cap
 TEST_GEN_PROGS := test_execve
 
-CFLAGS += -O2 -g -std=gnu99 -Wall
+CFLAGS += -O2 -g -std=gnu99 -Wall $(KHDR_INCLUDES)
 LDLIBS += -lcap-ng -lrt -ldl
 
 include ../lib.mk
diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c
index df0ef02b40367..e3a352b020a79 100644
--- a/tools/testing/selftests/capabilities/test_execve.c
+++ b/tools/testing/selftests/capabilities/test_execve.c
@@ -20,14 +20,6 @@
 
 #include "../kselftest.h"
 
-#ifndef PR_CAP_AMBIENT
-#define PR_CAP_AMBIENT 47
-# define PR_CAP_AMBIENT_IS_SET 1
-# define PR_CAP_AMBIENT_RAISE 2
-# define PR_CAP_AMBIENT_LOWER 3
-# define PR_CAP_AMBIENT_CLEAR_ALL 4
-#endif
-
 static int nerrs;
 static pid_t mpid;	/* main() pid is used to avoid duplicate test counts */
 
diff --git a/tools/testing/selftests/capabilities/validate_cap.c b/tools/testing/selftests/capabilities/validate_cap.c
index cdfc94268fe6e..60b4e7b716a75 100644
--- a/tools/testing/selftests/capabilities/validate_cap.c
+++ b/tools/testing/selftests/capabilities/validate_cap.c
@@ -9,14 +9,6 @@
 
 #include "../kselftest.h"
 
-#ifndef PR_CAP_AMBIENT
-#define PR_CAP_AMBIENT 47
-# define PR_CAP_AMBIENT_IS_SET 1
-# define PR_CAP_AMBIENT_RAISE 2
-# define PR_CAP_AMBIENT_LOWER 3
-# define PR_CAP_AMBIENT_CLEAR_ALL 4
-#endif
-
 #if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 19)
 # define HAVE_GETAUXVAL
 #endif
-- 
2.39.2

2 years, 1 month


[PATCH v2] selftests/lkdtm: Disable CONFIG_UBSAN_TRAP in test config

by Ricardo Cañuelo

The lkdtm selftest config fragment enables CONFIG_UBSAN_TRAP to make the ARRAY_BOUNDS test kill the calling process when an out-of-bound access is detected by UBSAN. However, after this [1] commit, UBSAN is triggered under many new scenarios that weren't detected before, such as in struct definitions with fixed-size trailing arrays used as flexible arrays. As a result, CONFIG_UBSAN_TRAP=y has become a very aggressive option to enable except for specific situations.

`make kselftest-merge` applies CONFIG_UBSAN_TRAP=y to the kernel config for all selftests, which makes many of them fail because of system hangs during boot.

This change removes the config option from the lkdtm kselftest and configures the ARRAY_BOUNDS test to look for UBSAN reports rather than relying on the calling process being killed.

[1] commit 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC")

Signed-off-by: Ricardo Cañuelo <ricardo.canuelo(a)collabora.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
---
Changelog:

v2:
- Configure the ARRAY_BOUNDS lkdtm test to match UBSAN reports instead of disabling the test

 tools/testing/selftests/lkdtm/config | 1 -
 tools/testing/selftests/lkdtm/tests.txt | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/lkdtm/config b/tools/testing/selftests/lkdtm/config
index 5d52f64dfb43..7afe05e8c4d7 100644
--- a/tools/testing/selftests/lkdtm/config
+++ b/tools/testing/selftests/lkdtm/config
@@ -9,7 +9,6 @@ CONFIG_INIT_ON_FREE_DEFAULT_ON=y
 CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
 CONFIG_UBSAN=y
 CONFIG_UBSAN_BOUNDS=y
-CONFIG_UBSAN_TRAP=y
 CONFIG_STACKPROTECTOR_STRONG=y
 CONFIG_SLUB_DEBUG=y
 CONFIG_SLUB_DEBUG_ON=y
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 607b8d7e3ea3..2f3a1b96da6e 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -7,7 +7,7 @@ EXCEPTION
 #EXHAUST_STACK Corrupts memory on failure
 #CORRUPT_STACK Crashes entire system on success
 #CORRUPT_STACK_STRONG Crashes entire system on success
-ARRAY_BOUNDS
+ARRAY_BOUNDS call trace:|UBSAN: array-index-out-of-bounds
 CORRUPT_LIST_ADD list_add corruption
 CORRUPT_LIST_DEL list_del corruption
 STACK_GUARD_PAGE_LEADING
-- 
2.25.1

2 years, 2 months


[PATCH v4 0/8] add UFFDIO_POISON to simulate memory poisoning with UFFD

by Axel Rasmussen

This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4 for a detailed description of the feature.

The series is based on Linus master (partial 6.5 merge window), and structured like this:

- Patches 1-3 are preparation / refactoring
- Patches 4-6 implement and advertise the new feature
- Patches 7-8 implement a unit test for the new feature

Changelog:

v3 -> v4:
  - [Peter] Rename PTE_MARKER_ERROR and helpers to PTE_MARKER_POISONED.
  - [Peter] Switch from calloc to memset for initializing some state in the selftest.

v2 -> v3:
  - Rebase onto current Linus master.
  - Don't overwrite existing PTE markers for non-hugetlb UFFDIO_POISON. Before, non-hugetlb would override them, but hugetlb would not. I don't think there's a use case where we *want* to override a UFFD_WP marker for example, so take the more conservative behavior for all kinds of memory.
  - [Peter] Drop hugetlb mfill atomic refactoring, since it isn't needed for this series (we don't touch that code directly anyway).
  - [Peter] Switch to re-using PTE_MARKER_SWAPIN_ERROR instead of defining new PTE_MARKER_UFFD_POISON.
  - [Peter] Extract start / len range overflow check into existing validate_range helper; this fixes the style issue of unnecessary braces in the UFFDIO_POISON implementation, because this code is just deleted.
  - [Peter] Extract file size check out into a new helper.
  - [Peter] Defer actually "enabling" the new feature until the last commit in the series; combine this with adding the documentation. As a consequence, move the selftest commits after this one.
  - [Randy] Fix typo in documentation.

v1 -> v2:
  - [Peter] Return VM_FAULT_HWPOISON not VM_FAULT_SIGBUS, to yield the correct behavior for KVM (guest MCE).
  - [Peter] Rename UFFDIO_SIGBUS to UFFDIO_POISON.
  - [Peter] Implement hugetlbfs support for UFFDIO_POISON.

Axel Rasmussen (8):
  mm: make PTE_MARKER_SWAPIN_ERROR more general
  mm: userfaultfd: check for start + len overflow in validate_range
  mm: userfaultfd: extract file size check out into a helper
  mm: userfaultfd: add new UFFDIO_POISON ioctl
  mm: userfaultfd: support UFFDIO_POISON for hugetlbfs
  mm: userfaultfd: document and enable new UFFDIO_POISON feature
  selftests/mm: refactor uffd_poll_thread to allow custom fault handlers
  selftests/mm: add uffd unit test for UFFDIO_POISON

 Documentation/admin-guide/mm/userfaultfd.rst | 15 +++
 fs/userfaultfd.c | 73 ++++++++++--
 include/linux/mm_inline.h | 19 +++
 include/linux/swapops.h | 15 ++-
 include/linux/userfaultfd_k.h | 4 +
 include/uapi/linux/userfaultfd.h | 25 +++-
 mm/hugetlb.c | 51 ++++++--
 mm/madvise.c | 2 +-
 mm/memory.c | 15 ++-
 mm/mprotect.c | 4 +-
 mm/shmem.c | 4 +-
 mm/swapfile.c | 2 +-
 mm/userfaultfd.c | 83 ++++++++++---
 tools/testing/selftests/mm/uffd-common.c | 5 +-
 tools/testing/selftests/mm/uffd-common.h | 3 +
 tools/testing/selftests/mm/uffd-stress.c | 8 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 117 +++++++++++++++++++
 17 files changed, 379 insertions(+), 66 deletions(-)

-- 
2.41.0.255.g8b1d071c50-goog

2 years, 2 months


[PATCH v2] KVM: selftests: Add tests - invalid inputs for KVM_CREATE_GUEST_MEMFD

by Ackerley Tng

Test that invalid inputs for KVM_CREATE_GUEST_MEMFD, such as a
non-page-aligned size and invalid flags, are rejected by the
KVM_CREATE_GUEST_MEMFD ioctl with EINVAL.

Signed-off-by: Ackerley Tng <ackerleytng(a)google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 49 +++++++++++++++++++
 .../selftests/kvm/include/kvm_util_base.h     | 11 ++++-
 2 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index eb93c608a7e0..4d2b110ab0d6 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -8,6 +8,7 @@
 #define _GNU_SOURCE
 #include "test_util.h"
 #include "kvm_util_base.h"
+#include <linux/bitmap.h>
 #include <linux/falloc.h>
 #include <sys/mman.h>
 #include <sys/types.h>
@@ -90,6 +91,52 @@ static void test_fallocate(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
 }
 
+static void test_create_guest_memfd_invalid(struct kvm_vm *vm)
+{
+	uint64_t valid_flags = 0;
+	size_t page_size = getpagesize();
+	uint64_t flag;
+	size_t size;
+	int fd;
+
+	for (size = 1; size < page_size; size++) {
+		fd = __vm_create_guest_memfd(vm, size, 0);
+		TEST_ASSERT(fd == -1 && errno == EINVAL,
+			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
+			    size);
+	}
+
+	if (thp_configured()) {
+		for (size = page_size * 2; size < get_trans_hugepagesz(); size += page_size) {
+			fd = __vm_create_guest_memfd(vm, size, KVM_GUEST_MEMFD_ALLOW_HUGEPAGE);
+			TEST_ASSERT(fd == -1 && errno == EINVAL,
+				    "guest_memfd() with non-hugepage-aligned page size '0x%lx' should fail with EINVAL",
+				    size);
+		}
+
+		valid_flags = KVM_GUEST_MEMFD_ALLOW_HUGEPAGE;
+	}
+
+	for (flag = 1; flag; flag <<= 1) {
+		uint64_t bit;
+
+		if (flag & valid_flags)
+			continue;
+
+		fd = __vm_create_guest_memfd(vm, page_size, flag);
+		TEST_ASSERT(fd == -1 && errno == EINVAL,
+			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+			    flag);
+
+		for_each_set_bit(bit, &valid_flags, 64) {
+			fd = __vm_create_guest_memfd(vm, page_size, flag | BIT_ULL(bit));
+			TEST_ASSERT(fd == -1 && errno == EINVAL,
+				    "guest_memfd() with flags '0x%llx' should fail with EINVAL",
+				    flag | BIT_ULL(bit));
+		}
+	}
+}
+
 int main(int argc, char *argv[])
 {
@@ -103,6 +150,8 @@ int main(int argc, char *argv[])
 	vm = vm_create_barebones();
 
+	test_create_guest_memfd_invalid(vm);
+
 	fd = vm_create_guest_memfd(vm, total_size, 0);
 
 	test_file_read_write(fd);
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 39b38c75b99c..8bdfadd72349 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -474,7 +474,8 @@ static inline uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
 }
 
 void vm_create_irqchip(struct kvm_vm *vm);
-static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
+
+static inline int __vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
 					uint64_t flags)
 {
 	struct kvm_create_guest_memfd gmem = {
@@ -482,7 +483,13 @@ static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
 		.flags = flags,
 	};
 
-	int fd = __vm_ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);
+	return __vm_ioctl(vm, KVM_CREATE_GUEST_MEMFD, &gmem);
+}
+
+static inline int vm_create_guest_memfd(struct kvm_vm *vm, uint64_t size,
+					uint64_t flags)
+{
+	int fd = __vm_create_guest_memfd(vm, size, flags);
 
 	TEST_ASSERT(fd >= 0, KVM_IOCTL_ERROR(KVM_CREATE_GUEST_MEMFD, fd));
 	return fd;
-- 
2.42.0.rc1.204.g551eb34607-goog
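For readers who want to see the rules this test exercises in isolation, here is a minimal userspace sketch. It is not the kernel's implementation; it only models the behaviour the test asserts (unknown flag bits rejected, size must be page-aligned, and hugepage-aligned when the hugepage flag is set). All names prefixed with sketch_ or SKETCH_ are hypothetical, and SKETCH_ALLOW_HUGEPAGE is just a stand-in bit, not the real value of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in bit; the real flag value lives in the KVM UAPI headers. */
#define SKETCH_ALLOW_HUGEPAGE (UINT64_C(1) << 0)

/*
 * Model of the argument checks the selftest expects.
 * Returns 0 when (size, flags) would be accepted and -EINVAL otherwise.
 * page_size and huge_size must be powers of two.
 */
int sketch_check_guest_memfd_args(uint64_t size, uint64_t flags,
				  uint64_t valid_flags,
				  uint64_t page_size, uint64_t huge_size)
{
	assert(page_size && !(page_size & (page_size - 1)));
	assert(huge_size && !(huge_size & (huge_size - 1)));

	/* Any bit outside the supported set is rejected. */
	if (flags & ~valid_flags)
		return -EINVAL;

	/* The size must be a multiple of the base page size. */
	if (size & (page_size - 1))
		return -EINVAL;

	/* With the hugepage flag, the size must be hugepage-aligned too. */
	if ((flags & SKETCH_ALLOW_HUGEPAGE) && (size & (huge_size - 1)))
		return -EINVAL;

	return 0;
}

/*
 * Mirror of the test's `for (flag = 1; flag; flag <<= 1)` loop: walk all
 * 64 single-bit flags, skip the valid ones, and count the rejections.
 */
int sketch_count_rejected_single_flags(uint64_t valid_flags,
				       uint64_t page_size, uint64_t huge_size)
{
	int rejected = 0;

	for (uint64_t flag = 1; flag; flag <<= 1) {
		if (flag & valid_flags)
			continue;
		if (sketch_check_guest_memfd_args(page_size, flag, valid_flags,
						  page_size, huge_size) == -EINVAL)
			rejected++;
	}
	return rejected;
}
```

The loop ends on its own because shifting the top bit of a uint64_t left by one yields 0, which is the same termination condition the selftest relies on.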


[PATCH v4 00/10] tracing: introducing eventfs

by Ajay Kaher

The event tracing infrastructure contains a large number of files and
directories (internally, inodes and dentries), which end up consuming
megabytes of memory. There can be multiple instances of event tracing,
each of which requires more memory.

Instead of creating inodes/dentries up front, eventfs keeps only the
metadata and skips their creation. eventfs creates inodes/dentries on
demand, only for the files/directories that are actually required, and
deletes them once they are no longer required while preserving the
metadata.

Tracing events took ~9MB; with this approach they take ~4.5MB for ~10K
files/directories.

v3:
Patch 3,4,5,7,9: removed all the eventfs_rwsem code and replaced it with
                 an srcu lock for the readers, and a mutex to synchronize
                 the writers of the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into
         event_inode.c as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
         dangerous to have unchecked recursion in the kernel (we do have
         a fixed size stack).
         have the free use srcu callbacks. After the srcu grace periods
         are done, it adds the eventfs_file onto a llist (lockless link
         list) and wakes up a work queue. Then the work queue does the
         freeing (this needs to be done in task/workqueue context, as
         srcu callbacks are done in softirq context).
Patch 6: renamed:
         eventfs_create_file() -> create_file()
         eventfs_create_dir() -> create_dir()

v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
          As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
          helper function to add files or directories to eventfs
          fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
          used eventfs_prepare_ef() to add files
          fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
          fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
          rebased because of v3 01/10
Patch 10: moved from v1 9/9

v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
         protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
         removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
         fix kprobe test in eventfs_root_lookup()
         protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
         called eventfs_remove_events_dir() instead of tracefs_remove()
         from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case

Ajay Kaher (9):
  tracefs: Rename some tracefs function
  eventfs: Implement eventfs dir creation functions
  eventfs: Implement eventfs file add functions
  eventfs: Implement eventfs file, directory remove function
  eventfs: Implement functions to create eventfs files and directories
  eventfs: Implement eventfs lookup, read, open functions
  eventfs: Implement tracefs_inode_cache
  eventfs: Move tracing/events to eventfs
  test: ftrace: Fix kprobe test for eventfs

Steven Rostedt (Google) (1):
  tracing: Require all trace events to have a TRACE_SYSTEM

 fs/tracefs/Makefile                           |   1 +
 fs/tracefs/event_inode.c                      | 711 ++++++++++++++++++
 fs/tracefs/inode.c                            | 124 ++-
 fs/tracefs/internal.h                         |  25 +
 include/linux/trace_events.h                  |   1 +
 include/linux/tracefs.h                       |  32 +
 kernel/trace/trace.h                          |   2 +-
 kernel/trace/trace_events.c                   |  78 +-
 .../ftrace/test.d/kprobe/kprobe_args_char.tc  |   4 +-
 .../test.d/kprobe/kprobe_args_string.tc       |   4 +-
 10 files changed, 930 insertions(+), 52 deletions(-)
 create mode 100644 fs/tracefs/event_inode.c
 create mode 100644 fs/tracefs/internal.h

-- 
2.39.0
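The core idea of the cover letter (keep cheap metadata permanently, create the expensive object only when it is looked up, and drop it again when unused) can be illustrated with a small userspace sketch. This is not the eventfs code: the sketch_ names are hypothetical, struct sketch_node merely stands in for an inode/dentry pair, and the locking, reference counting and SRCU-deferred freeing described in the changelog are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heavyweight object such as an inode/dentry pair. */
struct sketch_node {
	char payload[256];
};

/* Lightweight metadata that always exists for a file or directory. */
struct sketch_entry {
	const char *name;
	unsigned int mode;
	struct sketch_node *node;	/* NULL until the entry is first looked up */
};

/* Create the heavyweight object on first lookup; later lookups reuse it. */
struct sketch_node *sketch_lookup(struct sketch_entry *e)
{
	assert(e != NULL);
	if (!e->node)
		e->node = calloc(1, sizeof(*e->node));
	return e->node;
}

/* Drop the heavyweight object once it is no longer needed; metadata stays. */
void sketch_release(struct sketch_entry *e)
{
	assert(e != NULL);
	free(e->node);
	e->node = NULL;
}
```

With this shape, an entry that is never looked up costs only the size of struct sketch_entry, which is the kind of saving the ~9MB versus ~4.5MB figures above refer to.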


[PATCH] selftests/ftrace: Test toplevel-enable for instance

by Zheng Yejian

'available_events' is actually not required by
'test.d/event/toplevel-enable.tc', and its existence is already tested in
'test.d/00basic/basic4.tc'. So the 'available_events' requirement can be
dropped, and the 'instance' flag can then be added so that
'test.d/event/toplevel-enable.tc' is also run for an instance.

The test results are shown below:
  # ./ftracetest test.d/event/toplevel-enable.tc
  === Ftrace unit tests ===
  [1] event tracing - enable/disable with top level files	[PASS]
  [2] (instance) event tracing - enable/disable with top level files	[PASS]

  # of passed:  2
  # of failed:  0
  # of unresolved:  0
  # of untested:  0
  # of unsupported:  0
  # of xfailed:  0
  # of undefined(test bug):  0

Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
---
 tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
index 93c10ea42a68..8b8e1aea985b 100644
--- a/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
+++ b/tools/testing/selftests/ftrace/test.d/event/toplevel-enable.tc
@@ -1,7 +1,8 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 # description: event tracing - enable/disable with top level files
-# requires: available_events set_event events/enable
+# requires: set_event events/enable
+# flags: instance
 
 do_reset() {
     echo > set_event
-- 
2.25.1

