February 2024 - Linux-kselftest-mirror

[PATCH] selftests/seccomp: Pin benchmark to single CPU

by Kees Cook

The seccomp benchmark test (for validating the benefit of bitmaps) can
be sensitive to scheduling speed, so pin the process to a single CPU,
which appears to significantly improve reliability.

Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202402061002.3a8722fd-oliver.sang@intel.com
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Will Drewry <wad(a)chromium.org>
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
 .../selftests/seccomp/seccomp_benchmark.c | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_benchmark.c b/tools/testing/selftests/seccomp/seccomp_benchmark.c
index 5b5c9d558dee..d0b733e708cc 100644
--- a/tools/testing/selftests/seccomp/seccomp_benchmark.c
+++ b/tools/testing/selftests/seccomp/seccomp_benchmark.c
@@ -4,7 +4,9 @@
  */
 #define _GNU_SOURCE
 #include <assert.h>
+#include <err.h>
 #include <limits.h>
+#include <sched.h>
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdio.h>
@@ -119,6 +121,29 @@ long compare(const char *name_one, const char *name_eval, const char *name_two,
 	return good ? 0 : 1;
 }
 
+/* Pin to a single CPU so the benchmark won't bounce around the system. */
+void affinity(void)
+{
+	long cpu;
+	ulong ncores = sysconf(_SC_NPROCESSORS_CONF);
+	cpu_set_t *setp = CPU_ALLOC(ncores);
+	ulong setsz = CPU_ALLOC_SIZE(ncores);
+
+	/* Set from highest CPU down. */
+	for (cpu = ncores - 1; cpu >= 0; cpu--) {
+		CPU_ZERO_S(setsz, setp);
+		CPU_SET_S(cpu, setsz, setp);
+		if (sched_setaffinity(getpid(), setsz, setp) == -1)
+			continue;
+		printf("Pinned to CPU %lu of %lu\n", cpu + 1, ncores);
+		goto out;
+	}
+	fprintf(stderr, "Could not set CPU affinity -- calibration may not work well");
+
+out:
+	CPU_FREE(setp);
+}
+
 int main(int argc, char *argv[])
 {
 	struct sock_filter bitmap_filter[] = {
@@ -153,6 +178,8 @@ int main(int argc, char *argv[])
 	system("grep -H . /proc/sys/net/core/bpf_jit_enable");
 	system("grep -H . /proc/sys/net/core/bpf_jit_harden");
 
+	affinity();
+
 	if (argc > 1)
 		samples = strtoull(argv[1], NULL, 0);
 	else
-- 
2.34.1
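
For readers who want to try the pinning approach outside the benchmark, here is a minimal standalone sketch of the same idea. It is not the patch's code: the helper name pin_to_one_cpu() and its return convention are assumptions made for illustration, and it passes pid 0 (the calling thread) where the patch passes getpid().

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try to pin the caller to one CPU, trying the highest-numbered CPU first.
 * Returns the chosen CPU index, or -1 if no CPU could be selected.
 * pin_to_one_cpu() is a hypothetical helper, not part of the patch.
 */
static long pin_to_one_cpu(void)
{
	long ncores = sysconf(_SC_NPROCESSORS_CONF);
	long cpu, pinned = -1;
	cpu_set_t *setp;
	size_t setsz;

	if (ncores < 1)
		return -1;

	/* Dynamically sized set, so systems with many CPUs are handled. */
	setp = CPU_ALLOC(ncores);
	if (!setp)
		return -1;
	setsz = CPU_ALLOC_SIZE(ncores);

	for (cpu = ncores - 1; cpu >= 0; cpu--) {
		CPU_ZERO_S(setsz, setp);
		CPU_SET_S(cpu, setsz, setp);
		/* pid 0 means "the calling thread". */
		if (sched_setaffinity(0, setsz, setp) == 0) {
			pinned = cpu;
			break;
		}
	}

	CPU_FREE(setp);
	return pinned;
}
```

Returning the CPU index lets a caller log it or fall back to an unpinned run when -1 comes back, which mirrors the patch's choice to warn rather than abort.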

1 year, 10 months


[PATCH net-next] selftests/net: ignore timing errors in so_txtime if KSFT_MACHINE_SLOW

by Willem de Bruijn

From: Willem de Bruijn <willemb(a)google.com>

This test is time sensitive. It may fail on virtual machines and for
debug builds.

Continue to run in these environments to get code coverage. But
optionally suppress failure for timing errors (only). This is
controlled with environment variable KSFT_MACHINE_SLOW.

The test continues to return 0 (KSFT_PASS), rather than KSFT_XFAIL
as previously discussed. Because making so_txtime.c return that and
then making so_txtime.sh capture runs that pass that vs KSFT_FAIL
and pass it on added a bunch of (fragile bash) boilerplate, while the
result is interpreted the same as KSFT_PASS anyway.

Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
 tools/testing/selftests/net/so_txtime.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/so_txtime.c b/tools/testing/selftests/net/so_txtime.c
index 2672ac0b6d1f..8457b7ccbc09 100644
--- a/tools/testing/selftests/net/so_txtime.c
+++ b/tools/testing/selftests/net/so_txtime.c
@@ -134,8 +134,11 @@ static void do_recv_one(int fdr, struct timed_send *ts)
 	if (rbuf[0] != ts->data)
 		error(1, 0, "payload mismatch. expected %c", ts->data);
 
-	if (llabs(tstop - texpect) > cfg_variance_us)
-		error(1, 0, "exceeds variance (%d us)", cfg_variance_us);
+	if (llabs(tstop - texpect) > cfg_variance_us) {
+		fprintf(stderr, "exceeds variance (%d us)\n", cfg_variance_us);
+		if (!getenv("KSFT_MACHINE_SLOW"))
+			exit(1);
+	}
 }
 
 static void do_recv_verify_empty(int fdr)
-- 
2.43.0.429.g432eaa2c6b-goog
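
The decision the patch makes can be isolated into a small predicate, which may help when reasoning about which runs still fail. This is only a sketch: timing_error_is_fatal() and its parameters are hypothetical names, not something so_txtime.c defines.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Report a timing violation and tell the caller whether it should fail.
 * timing_error_is_fatal() is a hypothetical helper, not in so_txtime.c.
 */
static bool timing_error_is_fatal(long long delta_us, long long variance_us)
{
	if (llabs(delta_us) <= variance_us)
		return false;	/* within tolerance: nothing to report */

	/* The violation is always logged, even when it is tolerated. */
	fprintf(stderr, "exceeds variance (%lld us)\n", variance_us);

	/* Any value, even an empty string, counts as "set". */
	return getenv("KSFT_MACHINE_SLOW") == NULL;
}
```

Only timing errors go through this path; other failures, such as the payload mismatch check in the patch context, still terminate the test regardless of the environment variable.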

1 year, 10 months


[PATCH] selftests/net: Amend per-netns counter checks

by Dmitry Safonov

Selftests here check not only that connect()/accept() for
TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish
connections, but also counters: those are per-AO-key, per-socket and
per-netns.

The counters are checked on the server's side, as the server listener
has TCP-AO/TCP-MD5/no keys for different peers. All tests run in
the same namespaces with the same veth pair, created in test_init().

After close() in both client and server, the sides go through
the regular FIN/ACK + FIN/ACK sequence, which goes in the background.
If the selftest has already started a new testing scenario, read
per-netns counters - it may fail in the end iff it doesn't expect
the TCPAOGood per-netns counters go up during the test.

Let's just kill both TCP-AO sides - that will avoid any asynchronous
background TCP-AO segments going to either sides.

Reported-by: Jakub Kicinski <kuba(a)kernel.org>
Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u
Fixes: 6f0c472a6815 ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests")
Signed-off-by: Dmitry Safonov <dima(a)arista.com>
---
 tools/testing/selftests/net/tcp_ao/unsigned-md5.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_ao/unsigned-md5.c b/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
index c5b568cd7d90..6b59a652159f 100644
--- a/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
+++ b/tools/testing/selftests/net/tcp_ao/unsigned-md5.c
@@ -110,9 +110,9 @@ static void try_accept(const char *tst_name, unsigned int port,
 	test_tcp_ao_counters_cmp(tst_name, &ao_cnt1, &ao_cnt2, cnt_expected);
 
 out:
-	synchronize_threads(); /* close() */
+	synchronize_threads(); /* test_kill_sk() */
 	if (sk > 0)
-		close(sk);
+		test_kill_sk(sk);
 }
 
 static void server_add_routes(void)
@@ -302,10 +302,10 @@ static void try_connect(const char *tst_name, unsigned int port,
 	test_ok("%s: connected", tst_name);
 
 out:
-	synchronize_threads(); /* close() */
+	synchronize_threads(); /* test_kill_sk() */
 	/* _test_connect_socket() cleans up on failure */
 	if (ret > 0)
-		close(sk);
+		test_kill_sk(sk);
 }
 
 #define PREINSTALL_MD5_FIRST	BIT(0)
@@ -486,10 +486,10 @@ static void try_to_add(const char *tst_name, unsigned int port,
 	}
 
 out:
-	synchronize_threads(); /* close() */
+	synchronize_threads(); /* test_kill_sk() */
 	/* _test_connect_socket() cleans up on failure */
 	if (ret > 0)
-		close(sk);
+		test_kill_sk(sk);
 }
 
 static void client_add_ip(union tcp_addr *client, const char *ip)

---
base-commit: 021533194476035883300d60fbb3136426ac8ea5
change-id: 20240202-unsigned-md5-netns-counters-35134409362a

Best regards,
-- 
Dmitry Safonov <dima(a)arista.com>

1 year, 10 months


[PATCH v4 0/5] selftests/resctrl: Add non-contiguous CBMs in Intel CAT selftest

by Maciej Wieczor-Retman

Non-contiguous CBM support for Intel CAT has been merged into the kernel
with Commit 0e3cd31f6e90 ("x86/resctrl: Enable non-contiguous CBMs in
Intel CAT") but there is no selftest that would validate if this feature
works correctly. The selftest needs to verify if writing non-contiguous
CBMs to the schemata file behaves as expected in comparison to the
information about non-contiguous CBMs support.

The patch series is based on a rework of resctrl selftests that's
currently in review [1]. The patch also implements a similar
functionality presented in the bash script included in the cover letter
of the original non-contiguous CBMs in Intel CAT series [3].

Changelog v4:
- Changes to error failure return values in non-contiguous test.
- Some minor text refactoring without functional changes.

Changelog v3:
- Rebase onto v4 of Ilpo's series [1].
- Split old patch 3/4 into two parts. One doing refactoring and one
  adding a new function.
- Some changes to all the patches after Reinette's review.

Changelog v2:
- Rebase onto v4 of Ilpo's series [2].
- Add two patches that prepare helpers for the new test.
- Move Ilpo's patch that adds test grouping to this series.
- Apply Ilpo's suggestion to the patch that adds a new test.

[1] https://lore.kernel.org/all/20231215150515.36983-1-ilpo.jarvinen@linux.inte…
[2] https://lore.kernel.org/all/20231211121826.14392-1-ilpo.jarvinen@linux.inte…
[3] https://lore.kernel.org/all/cover.1696934091.git.maciej.wieczor-retman@inte…

Older versions of this series:
[v1] https://lore.kernel.org/all/20231109112847.432687-1-maciej.wieczor-retman@i…
[v2] https://lore.kernel.org/all/cover.1702392177.git.maciej.wieczor-retman@inte…

Ilpo Järvinen (1):
  selftests/resctrl: Add test groups and name L3 CAT test L3_CAT

Maciej Wieczor-Retman (4):
  selftests/resctrl: Add helpers for the non-contiguous test
  selftests/resctrl: Split validate_resctrl_feature_request()
  selftests/resctrl: Add resource_info_file_exists()
  selftests/resctrl: Add non-contiguous CBMs CAT test

 tools/testing/selftests/resctrl/cat_test.c    | 84 ++++++++++++++++-
 tools/testing/selftests/resctrl/cmt_test.c    |  2 +-
 tools/testing/selftests/resctrl/mba_test.c    |  2 +-
 tools/testing/selftests/resctrl/mbm_test.c    |  6 +-
 tools/testing/selftests/resctrl/resctrl.h     | 10 +-
 .../testing/selftests/resctrl/resctrl_tests.c | 18 +++-
 tools/testing/selftests/resctrl/resctrlfs.c   | 94 ++++++++++++++++---
 7 files changed, 192 insertions(+), 24 deletions(-)

-- 
2.43.0
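
To make the term concrete: a capacity bitmask (CBM) is contiguous when its set bits form a single unbroken run. The sketch below shows one way to test that property in C. The helper name cbm_is_contiguous() and the choice to reject an empty mask are assumptions made for illustration; the function is not taken from the series.

```c
#include <stdbool.h>

/*
 * Return true if the set bits of mask form one unbroken run.
 * cbm_is_contiguous() is a hypothetical helper, not taken from the series.
 */
static bool cbm_is_contiguous(unsigned long mask)
{
	unsigned long lowest, filled;

	/* Assumption: an empty mask has no run and is treated as invalid. */
	if (mask == 0)
		return false;

	lowest = mask & -mask;	/* isolate the lowest set bit */
	filled = mask + lowest;	/* the carry ripples through the lowest run */

	/* Any bit of the original mask that survives belongs to a second run. */
	return (filled & mask) == 0;
}
```

For example, 0xf0 is contiguous while 0xf0f is not, so a test in the spirit of this series would expect a write of 0xf0f to the schemata file to succeed only on hardware that reports non-contiguous CBM support.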

1 year, 10 months


[broonie-misc:kselftest-seccomp-benchmark-ktap] [kselftest/seccomp] 626fa92237: kernel-selftests.seccomp.seccomp_benchmark.fail

by kernel test robot

Hello,

kernel test robot noticed "kernel-selftests.seccomp.seccomp_benchmark.fail" on:

commit: 626fa9223749db85f03678573dd49ba2c7b6cd8b ("kselftest/seccomp: Report each expectation we assert as a KTAP test")
https://git.kernel.org/cgit/linux/kernel/git/broonie/misc.git kselftest-seccomp-benchmark-ktap

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: group-s

compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang(a)intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202402061002.3a8722fd-oliver.sang@intel.com

# timeout set to 120
# selftests: seccomp: seccomp_benchmark
# TAP version 13
# 1..7
# # Running on:
# # Linux lkp-csl-d01 6.8.0-rc1-00003-g626fa9223749 #1 SMP PREEMPT_DYNAMIC Wed Jan 31 08:33:40 CST 2024 x86_64 GNU/Linux
# # Current BPF sysctl settings:
# # /proc/sys/net/core/bpf_jit_enable:1
# # /proc/sys/net/core/bpf_jit_harden:0
# # Calibrating sample size for 15 seconds worth of syscalls ...
# # Benchmarking 36800370 syscalls...
# # 15.443201110 - 1.042366576 = 14400834534 (14.4s)
# # getpid native: 391 ns
# # 31.586833659 - 15.443583738 = 16143249921 (16.1s)
# # getpid RET_ALLOW 1 filter (bitmap): 438 ns
# # 47.494976754 - 31.587621280 = 15907355474 (15.9s)
# # getpid RET_ALLOW 2 filters (bitmap): 432 ns
# # 66.262898246 - 47.495560365 = 18767337881 (18.8s)
# # getpid RET_ALLOW 3 filters (full): 509 ns
# # 86.089613909 - 66.263287445 = 19826326464 (19.8s)
# # getpid RET_ALLOW 4 filters (full): 538 ns
# # Estimated total seccomp overhead for 1 bitmapped filter: 47 ns
# # Estimated total seccomp overhead for 2 bitmapped filters: 41 ns
# # Estimated total seccomp overhead for 3 full filters: 118 ns
# # Estimated total seccomp overhead for 4 full filters: 147 ns
# # Estimated seccomp entry overhead: 53 ns
# # Estimated seccomp per-filter overhead (last 2 diff): 29 ns
# # Estimated seccomp per-filter overhead (filters / 4): 23 ns
# # Expectations:
# # 	native ≤ 1 bitmap (391 ≤ 438): ✔️
# ok 1 native ≤ 1 bitmap
# # 	native ≤ 1 filter (391 ≤ 509): ✔️
# ok 2 native ≤ 1 filter
# # 	per-filter (last 2 diff) ≈ per-filter (filters / 4) (29 ≈ 23): ❌
# not ok 3 per-filter (last 2 diff) ≈ per-filter (filters / 4)
# # 	1 bitmapped ≈ 2 bitmapped (47 ≈ 41): ❌
# not ok 4 1 bitmapped ≈ 2 bitmapped
# # Skipping constant action bitmap expectations: they appear unsupported.
# ok 5 # SKIP entry ≈ 1 bitmapped
# ok 6 # SKIP entry ≈ 2 bitmapped
# ok 7 # SKIP native + entry + (per filter * 4) ≈ 4 filters total
# # Saw unexpected benchmark result. Try running again with more samples?
# # Totals: pass:2 fail:2 xfail:0 xpass:0 skip:3 error:0
not ok 2 selftests: seccomp: seccomp_benchmark # exit=1

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240206/202402061002.3a8722fd-oliv…

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
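
The estimates in the log can be re-derived from the raw timings, which makes it easier to see why expectations 3 and 4 failed. The sketch below reproduces the reported figures. The formulas for the total overheads and the "last 2 diff" value follow directly from the log labels; the formulas for entry overhead and "filters / 4" are inferred from the reported numbers and are assumptions, not quotations of seccomp_benchmark.c. All names are hypothetical.

```c
#include <stdint.h>

/* Mean cost of one syscall: elapsed nanoseconds divided by sample count. */
static int64_t ns_per_call(int64_t elapsed_ns, int64_t samples)
{
	return elapsed_ns / samples;
}

struct overheads {
	int64_t bitmap1, bitmap2;	/* total overhead, 1 and 2 bitmapped filters */
	int64_t filter3, filter4;	/* total overhead, 3 and 4 full filters */
	int64_t entry;			/* assumed: bitmap1 - (bitmap2 - bitmap1) */
	int64_t per_filter_last2;	/* filter4 - filter3 */
	int64_t per_filter_div4;	/* assumed: (filter4 - entry) / 4 */
};

/* Inputs are the per-call costs in ns printed as "getpid ...: N ns". */
static struct overheads derive(int64_t native, int64_t b1, int64_t b2,
			       int64_t f3, int64_t f4)
{
	struct overheads o;

	/* Total overhead is the measured cost minus the native cost. */
	o.bitmap1 = b1 - native;
	o.bitmap2 = b2 - native;
	o.filter3 = f3 - native;
	o.filter4 = f4 - native;

	/* Signed math matters: bitmap2 - bitmap1 is negative in this run. */
	o.entry = o.bitmap1 - (o.bitmap2 - o.bitmap1);
	o.per_filter_last2 = o.filter4 - o.filter3;
	o.per_filter_div4 = (o.filter4 - o.entry) / 4;

	return o;
}
```

With the logged inputs (391, 438, 432, 509, 538) this yields 47, 41, 118 and 147 ns of total overhead, 53 ns of entry overhead, and per-filter estimates of 29 and 23 ns. The two bitmapped totals differ by 6 ns and the two per-filter estimates differ by 6 ns, which is the spread the benchmark's "≈" checks rejected.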

1 year, 10 months


[PATCH v8 0/4] Introduce mseal

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)chromium.org>

This patchset proposes a new mseal() syscall for the Linux kernel.

In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.

Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.

Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.

Two system calls are involved in sealing the map: mmap() and mseal().

The new mseal() is a syscall on 64 bit CPU, with the following
signature:

    int mseal(void *addr, size_t len, unsigned long flags)

    addr/len: memory range.
    flags: reserved.

mseal() blocks following operations for the given memory range.

1> Unmapping, moving to another location, and shrinking the size,
   via munmap() and mremap(), can leave an empty space, therefore can
   be replaced with a VMA with a new set of attributes.

2> Moving or expanding a different VMA into the current location,
   via mremap().

3> Modifying a VMA via mmap(MAP_FIXED).

4> Size expansion, via mremap(), does not appear to pose any specific
   risks to sealed VMAs. It is included anyway because the use case is
   unclear. In any case, users can rely on merging to expand a sealed
   VMA.

5> mprotect() and pkey_mprotect().

6> Some destructive madvise() behaviors (e.g. MADV_DONTNEED) for
   anonymous memory, when users don't have write permission to the
   memory. Those behaviors can alter region contents by discarding
   pages, effectively a memset(0) for anonymous memory.

In addition: mmap() has two related changes.

The PROT_SEAL bit in prot field of mmap(). When present, it marks
the map sealed since creation.

The MAP_SEALABLE bit in the flags field of mmap(). When present, it
marks the map as sealable. A map created without MAP_SEALABLE will
not support sealing, i.e. mseal() will fail.

Applications that don't care about sealing will expect their
behavior unchanged. For those that need sealing support, opt-in
by adding MAP_SEALABLE in mmap().

The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.

Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable, the lifetime of those mappings
are tied to the lifetime of the process.

Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings are not tied to the lifetime of the process, therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory. For example, with madvise(DONTNEED).

However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.

Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.

In closing, I would like to formally acknowledge the valuable
contributions received during the RFC process, which were instrumental
in shaping this patch:

Jann Horn: raising awareness and providing valuable insights on the
  destructive madvise operations.
Liam R. Howlett: perf optimization.
Linus Torvalds: assisting in defining system call signature and scope.
Pedro Falcato: suggesting sealing in the mmap().
Theo de Raadt: sharing the experiences and insight gained from
  implementing mimmutable() in OpenBSD.

Change history:
===============
V8:
- perf optimization in mmap. (Liam R. Howlett)
- add one testcase (test_seal_zero_address)
- Update mseal.rst to add note for MAP_SEALABLE.

V7:
- fix index.rst (Randy Dunlap)
- fix arm build (Randy Dunlap)
- return EPERM for blocked operations (Theo de Raadt)
https://lore.kernel.org/linux-mm/20240122152905.2220849-2-jeffxu@chromium.o…

V6:
- Drop RFC from subject, Given Linus's general approval.
- Adjust syscall number for mseal (main Jan.11/2024)
- Code style fix (Matthew Wilcox)
- selftest: use ksft macros (Muhammad Usama Anjum)
- Document fix. (Randy Dunlap)
https://lore.kernel.org/all/20240111234142.2944934-1-jeffxu@chromium.org/

V5:
- fix build issue in mseal-Wire-up-mseal-syscall
  (Suggested by Linus Torvalds, and Greg KH)
- updates on selftest.
https://lore.kernel.org/lkml/20240109154547.1839886-1-jeffxu@chromium.org/#r

V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
https://lore.kernel.org/all/20240104185138.169307-1-jeffxu@chromium.org/

V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
  MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
  both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
  destructive operations of madvise. (Suggested by Jann Horn and
  Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…

v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/

v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/

----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/

Jeff Xu (4):
  mseal: Wire up mseal syscall
  mseal: add mseal syscall
  selftest mm/mseal memory sealing
  mseal:add documentation

 Documentation/userspace-api/index.rst       |    1 +
 Documentation/userspace-api/mseal.rst       |  215 ++
 arch/alpha/kernel/syscalls/syscall.tbl      |    1 +
 arch/arm/tools/syscall.tbl                  |    1 +
 arch/arm64/include/asm/unistd.h             |    2 +-
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/m68k/kernel/syscalls/syscall.tbl       |    1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |    1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1 +
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1 +
 arch/parisc/kernel/syscalls/syscall.tbl     |    1 +
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1 +
 arch/s390/kernel/syscalls/syscall.tbl       |    1 +
 arch/sh/kernel/syscalls/syscall.tbl         |    1 +
 arch/sparc/kernel/syscalls/syscall.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_32.tbl      |    1 +
 arch/x86/entry/syscalls/syscall_64.tbl      |    1 +
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1 +
 include/linux/syscalls.h                    |    1 +
 include/uapi/asm-generic/mman-common.h      |    8 +
 include/uapi/asm-generic/unistd.h           |    5 +-
 kernel/sys_ni.c                             |    1 +
 mm/Makefile                                 |    4 +
 mm/internal.h                               |   48 +
 mm/madvise.c                                |   12 +
 mm/mmap.c                                   |   35 +-
 mm/mprotect.c                               |   10 +
 mm/mremap.c                                 |   31 +
 mm/mseal.c                                  |  343 ++++
 tools/testing/selftests/mm/.gitignore       |    1 +
 tools/testing/selftests/mm/Makefile         |    1 +
 tools/testing/selftests/mm/mseal_test.c     | 2024 +++++++++++++++++++
 33 files changed, 2756 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/userspace-api/mseal.rst
 create mode 100644 mm/mseal.c
 create mode 100644 tools/testing/selftests/mm/mseal_test.c

-- 
2.43.0.429.g432eaa2c6b-goog
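
The cover letter's remark that MADV_DONTNEED is "effectively a memset(0) for anonymous memory" can be observed from userspace without any of the new interfaces. The following is a small sketch using only long-standing Linux calls; the helper name dontneed_zeroes_anon_page() is hypothetical, and the behaviour shown applies to private anonymous mappings on Linux.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if MADV_DONTNEED zeroed a private anonymous page,
 * 0 if the old contents survived, -1 if setup failed.
 * dontneed_zeroes_anon_page() is a hypothetical helper for illustration.
 */
static int dontneed_zeroes_anon_page(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int zeroed;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Fill the page with a recognizable pattern. */
	memset(p, 0xAA, len);

	/* Discard the page; the mapping itself stays in place. */
	if (madvise(p, len, MADV_DONTNEED) != 0) {
		munmap(p, len);
		return -1;
	}

	/* The next access faults in a fresh zero-filled page. */
	zeroed = (p[0] == 0 && p[len - 1] == 0);

	munmap(p, len);
	return zeroed;
}
```

Because no write to the page is involved, page protections alone do not stop this kind of content change, which is the gap item 6 of the blocked-operations list is meant to close for sealed mappings.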

1 year, 10 months


[PATCH v2 2/2] selftests/mm: run_vmtests.sh: add hugetlb_madv_vs_map

by Breno Leitao

hugetlb_madv_vs_map selftest was not part of the mm test-suite since we
didn't have a fix for the problem it found.

Now that the problem is already fixed (see previous commit), let's
enable this selftest in the default test-suite.

Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
 tools/testing/selftests/mm/run_vmtests.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 246d53a5d7f2..50e2094ed761 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -253,6 +253,7 @@ nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
 # For this test, we need one and just one huge page
 echo 1 > /proc/sys/vm/nr_hugepages
 CATEGORY="hugetlb" run_test ./hugetlb_fault_after_madv
+CATEGORY="hugetlb" run_test ./hugetlb_madv_vs_map
 # Restore the previous number of huge pages, since further tests rely on it
 echo "$nr_hugepages_tmp" > /proc/sys/vm/nr_hugepages
 
-- 
2.34.1

1 year, 10 months


[PATCH bpf-next v4 0/3] Annotate kfuncs in .BTF_ids section

by Daniel Xu

=== Description ===

This is a bpf-treewide change that annotates all kfuncs as such inside
.BTF_ids. This annotation eventually allows us to automatically generate
kfunc prototypes from bpftool.

We store this metadata inside a yet-unused flags field inside struct
btf_id_set8 (thanks Kumar!). pahole will be taught where to look.

More details about the full chain of events are available in commit 3's
description.

The accompanying pahole and bpftool changes can be viewed
here on these "frozen" branches [0][1].

[0]: https://github.com/danobi/pahole/tree/kfunc_btf-v3-mailed
[1]: https://github.com/danobi/linux/tree/kfunc_bpftool-mailed

=== Changelog ===

Changes from v3:
* Rebase to bpf-next and add missing annotation on new kfunc

Changes from v2:
* Only WARN() for vmlinux kfuncs

Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS

Daniel Xu (3):
  bpf: btf: Support flags for BTF_SET8 sets
  bpf: btf: Add BTF_KFUNCS_START/END macro pair
  bpf: treewide: Annotate BPF kfuncs in BTF

 Documentation/bpf/kfuncs.rst                  |  8 +++----
 drivers/hid/bpf/hid_bpf_dispatch.c            |  8 +++----
 fs/verity/measure.c                           |  4 ++--
 include/linux/btf_ids.h                       | 21 +++++++++++++++----
 kernel/bpf/btf.c                              |  8 +++++++
 kernel/bpf/cpumask.c                          |  4 ++--
 kernel/bpf/helpers.c                          |  8 +++----
 kernel/bpf/map_iter.c                         |  4 ++--
 kernel/cgroup/rstat.c                         |  4 ++--
 kernel/trace/bpf_trace.c                      |  8 +++----
 net/bpf/test_run.c                            |  8 +++----
 net/core/filter.c                             | 20 +++++++++---------
 net/core/xdp.c                                |  4 ++--
 net/ipv4/bpf_tcp_ca.c                         |  4 ++--
 net/ipv4/fou_bpf.c                            |  4 ++--
 net/ipv4/tcp_bbr.c                            |  4 ++--
 net/ipv4/tcp_cubic.c                          |  4 ++--
 net/ipv4/tcp_dctcp.c                          |  4 ++--
 net/netfilter/nf_conntrack_bpf.c              |  4 ++--
 net/netfilter/nf_nat_bpf.c                    |  4 ++--
 net/xfrm/xfrm_interface_bpf.c                 |  4 ++--
 net/xfrm/xfrm_state_bpf.c                     |  4 ++--
 .../selftests/bpf/bpf_testmod/bpf_testmod.c   |  8 +++----
 23 files changed, 87 insertions(+), 66 deletions(-)

-- 
2.42.1

1 year, 10 months


[PATCH v14 6/6] ring-buffer/selftest: Add ring-buffer mapping test

by Vincent Donnefort

This test maps a ring-buffer and validate the meta-page after reset and
after emitting few events.

Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Vincent Donnefort <vdonnefort(a)google.com>

diff --git a/tools/testing/selftests/ring-buffer/Makefile b/tools/testing/selftests/ring-buffer/Makefile
new file mode 100644
index 000000000000..627c5fa6d1ab
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0
+CFLAGS += -Wl,-no-as-needed -Wall
+CFLAGS += $(KHDR_INCLUDES)
+CFLAGS += -D_GNU_SOURCE
+
+TEST_GEN_PROGS = map_test
+
+include ../lib.mk
diff --git a/tools/testing/selftests/ring-buffer/config b/tools/testing/selftests/ring-buffer/config
new file mode 100644
index 000000000000..d936f8f00e78
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/config
@@ -0,0 +1,2 @@
+CONFIG_FTRACE=y
+CONFIG_TRACER_SNAPSHOT=y
diff --git a/tools/testing/selftests/ring-buffer/map_test.c b/tools/testing/selftests/ring-buffer/map_test.c
new file mode 100644
index 000000000000..56c44b29d998
--- /dev/null
+++ b/tools/testing/selftests/ring-buffer/map_test.c
@@ -0,0 +1,273 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Ring-buffer memory mapping tests
+ *
+ * Copyright (c) 2024 Vincent Donnefort <vdonnefort(a)google.com>
+ */
+#include <fcntl.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <linux/trace_mmap.h>
+
+#include <sys/mman.h>
+#include <sys/ioctl.h>
+
+#include "../user_events/user_events_selftests.h" /* share tracefs setup */
+#include "../kselftest_harness.h"
+
+#define TRACEFS_ROOT "/sys/kernel/tracing"
+
+static int __tracefs_write(const char *path, const char *value)
+{
+	int fd, ret;
+
+	fd = open(path, O_WRONLY | O_TRUNC);
+	if (fd < 0)
+		return fd;
+
+	ret = write(fd, value, strlen(value));
+
+	close(fd);
+
+	return ret == -1 ? -errno : 0;
+}
+
+static int __tracefs_write_int(const char *path, int value)
+{
+	char *str;
+	int ret;
+
+	if (asprintf(&str, "%d", value) < 0)
+		return -1;
+
+	ret = __tracefs_write(path, str);
+
+	free(str);
+
+	return ret;
+}
+
+#define tracefs_write_int(path, value) \
+	ASSERT_EQ(__tracefs_write_int((path), (value)), 0)
+
+#define tracefs_write(path, value) \
+	ASSERT_EQ(__tracefs_write((path), (value)), 0)
+
+static int tracefs_reset(void)
+{
+	if (__tracefs_write_int(TRACEFS_ROOT"/tracing_on", 0))
+		return -1;
+	if (__tracefs_write(TRACEFS_ROOT"/trace", ""))
+		return -1;
+	if (__tracefs_write(TRACEFS_ROOT"/set_event", ""))
+		return -1;
+	if (__tracefs_write(TRACEFS_ROOT"/current_tracer", "nop"))
+		return -1;
+
+	return 0;
+}
+
+struct tracefs_cpu_map_desc {
+	struct trace_buffer_meta	*meta;
+	void				*data;
+	int				cpu_fd;
+};
+
+int tracefs_cpu_map(struct tracefs_cpu_map_desc *desc, int cpu)
+{
+	unsigned long meta_len, data_len;
+	int page_size = getpagesize();
+	char *cpu_path;
+	void *map;
+
+	if (asprintf(&cpu_path,
+		     TRACEFS_ROOT"/per_cpu/cpu%d/trace_pipe_raw",
+		     cpu) < 0)
+		return -ENOMEM;
+
+	desc->cpu_fd = open(cpu_path, O_RDONLY | O_NONBLOCK);
+	free(cpu_path);
+	if (desc->cpu_fd < 0)
+		return -ENODEV;
+
+	map = mmap(NULL, page_size, PROT_READ, MAP_SHARED, desc->cpu_fd, 0);
+	if (map == MAP_FAILED)
+		return -errno;
+
+	desc->meta = (struct trace_buffer_meta *)map;
+
+	meta_len = desc->meta->meta_page_size;
+	data_len = desc->meta->subbuf_size * desc->meta->nr_subbufs;
+
+	map = mmap(NULL, data_len, PROT_READ, MAP_SHARED, desc->cpu_fd, meta_len);
+	if (map == MAP_FAILED) {
+		munmap(desc->meta, desc->meta->meta_page_size);
+		return -EINVAL;
+	}
+
+	desc->data = map;
+
+	return 0;
+}
+
+void tracefs_cpu_unmap(struct tracefs_cpu_map_desc *desc)
+{
+	munmap(desc->data, desc->meta->subbuf_size * desc->meta->nr_subbufs);
+	munmap(desc->meta, desc->meta->meta_page_size);
+	close(desc->cpu_fd);
+}
+
+FIXTURE(map) {
+	struct tracefs_cpu_map_desc	map_desc;
+	bool				umount;
+};
+
+FIXTURE_VARIANT(map) {
+	int	subbuf_size;
+};
+
+FIXTURE_VARIANT_ADD(map, subbuf_size_4k) {
+	.subbuf_size = 4,
+};
+
+FIXTURE_VARIANT_ADD(map, subbuf_size_8k) {
+	.subbuf_size = 8,
+};
+
+FIXTURE_SETUP(map)
+{
+	int cpu = sched_getcpu();
+	cpu_set_t cpu_mask;
+	bool fail, umount;
+	char *message;
+
+	if (!tracefs_enabled(&message, &fail, &umount)) {
+		if (fail) {
+			TH_LOG("Tracefs setup failed: %s", message);
+			ASSERT_FALSE(fail);
+		}
+		SKIP(return, "Skipping: %s", message);
+	}
+
+	self->umount = umount;
+
+	ASSERT_GE(cpu, 0);
+
+	ASSERT_EQ(tracefs_reset(), 0);
+
+	tracefs_write_int(TRACEFS_ROOT"/buffer_subbuf_size_kb", variant->subbuf_size);
+
+	ASSERT_EQ(tracefs_cpu_map(&self->map_desc, cpu), 0);
+
+	/*
+	 * Ensure generated events will be found on this very same ring-buffer.
+	 */
+	CPU_ZERO(&cpu_mask);
+	CPU_SET(cpu, &cpu_mask);
+	ASSERT_EQ(sched_setaffinity(0, sizeof(cpu_mask), &cpu_mask), 0);
+}
+
+FIXTURE_TEARDOWN(map)
+{
+	tracefs_reset();
+
+	if (self->umount)
+		tracefs_unmount();
+
+	tracefs_cpu_unmap(&self->map_desc);
+}
+
+TEST_F(map, meta_page_check)
+{
+	struct tracefs_cpu_map_desc *desc = &self->map_desc;
+	int cnt = 0;
+
+	ASSERT_EQ(desc->meta->entries, 0);
+	ASSERT_EQ(desc->meta->overrun, 0);
+	ASSERT_EQ(desc->meta->read, 0);
+
+	ASSERT_EQ(desc->meta->reader.id, 0);
+	ASSERT_EQ(desc->meta->reader.read, 0);
+
+	ASSERT_EQ(ioctl(desc->cpu_fd, TRACE_MMAP_IOCTL_GET_READER), 0);
+	ASSERT_EQ(desc->meta->reader.id, 0);
+
+	tracefs_write_int(TRACEFS_ROOT"/tracing_on", 1);
+	for (int i = 0; i < 16; i++)
+		tracefs_write_int(TRACEFS_ROOT"/trace_marker", i);
+again:
+	ASSERT_EQ(ioctl(desc->cpu_fd, TRACE_MMAP_IOCTL_GET_READER), 0);
+
+	ASSERT_EQ(desc->meta->entries, 16);
+	ASSERT_EQ(desc->meta->overrun, 0);
+	ASSERT_EQ(desc->meta->read, 16);
+
+	ASSERT_EQ(desc->meta->reader.id, 1);
+
+	if (!(cnt++))
+		goto again;
+}
+
+FIXTURE(snapshot) {
+	bool	umount;
+};
+
+FIXTURE_SETUP(snapshot)
+{
+	bool fail, umount;
+	struct stat sb;
+	char *message;
+
+	if (stat(TRACEFS_ROOT"/snapshot", &sb))
+		SKIP(return, "Skipping: %s", "snapshot not available");
+
+	if (!tracefs_enabled(&message, &fail, &umount)) {
+		if (fail) {
+			TH_LOG("Tracefs setup failed: %s", message);
+			ASSERT_FALSE(fail);
+		}
+		SKIP(return, "Skipping: %s", message);
+	}
+
+	self->umount = umount;
+}
+
+FIXTURE_TEARDOWN(snapshot)
+{
+	__tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+			"!snapshot");
+	tracefs_reset();
+
+	if (self->umount)
+		tracefs_unmount();
+}
+
+TEST_F(snapshot, excludes_map)
+{
+	struct tracefs_cpu_map_desc map_desc;
+	int cpu = sched_getcpu();
+
+	ASSERT_GE(cpu, 0);
+	tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+		      "snapshot");
+	ASSERT_EQ(tracefs_cpu_map(&map_desc, cpu), -EBUSY);
+}
+
+TEST_F(snapshot, excluded_by_map)
+{
+	struct tracefs_cpu_map_desc map_desc;
+	int cpu = sched_getcpu();
+
+	ASSERT_EQ(tracefs_cpu_map(&map_desc, cpu), 0);
+
+	ASSERT_EQ(__tracefs_write(TRACEFS_ROOT"/events/sched/sched_switch/trigger",
+				  "snapshot"), -EBUSY);
+	ASSERT_EQ(__tracefs_write(TRACEFS_ROOT"/snapshot",
+				  "1"), -EBUSY);
+}
+
+TEST_HARNESS_MAIN
-- 
2.43.0.594.gd9cf4e227d-goog

1 year, 10 months


[PATCH v3] KVM: selftests: Fix the dirty_log_test semaphore imbalance

by Shaoqin Huang

When running dirty_log_test on some aarch64 machines, it sometimes triggers this assertion:

==== Test Assertion Failure ====
  dirty_log_test.c:384: dirty_ring_vcpu_ring_full
  pid=14854 tid=14854 errno=22 - Invalid argument
     1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
     2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
     3   (inlined by) run_test at dirty_log_test.c:802
     4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
     5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
     6  0x0000ffff9be173c7: ?? ??:0
     7  0x0000ffff9be1749f: ?? ??:0
     8  0x000000000040206f: _start at ??:?
  Didn't continue vcpu even without ring full

The dirty_log_test fails when running the dirty-ring test because sem_vcpu_cont and sem_vcpu_stop are non-zero when dirty_ring_collect_dirty_pages() executes. When those two sem_t variables are non-zero, the dirty_ring_wait_vcpu() call at the beginning of dirty_ring_collect_dirty_pages() does not wait for the vcpu to stop, but continues on to the following code. In that case, suppose dirty_ring_vcpu_ring_full is true before the vcpu stops, and dirty_ring_collect_dirty_pages() has passed its check of dirty_ring_vcpu_ring_full but has not yet executed its check of continued_vcpu. If the vcpu then stops and sets dirty_ring_vcpu_ring_full to false, dirty_ring_collect_dirty_pages() triggers the ASSERT.

Why can sem_vcpu_cont and sem_vcpu_stop be non-zero? Because dirty_ring_before_vcpu_join() executes sem_post(&sem_vcpu_cont) at the end of each dirty-ring test. This can cause two cases:

1. sem_vcpu_cont is non-zero. When we set host_quit to true, the vcpu_worker sees host_quit is true right away and quits. The log_mode_before_vcpu_join() function then sets sem_vcpu_cont to 1, and since the vcpu_worker has already quit, nothing consumes it.

2. sem_vcpu_stop is non-zero. When we set host_quit to true, the vcpu_worker has already entered the guest state. The next time it exits from the guest state, it sets sem_vcpu_stop to 1 and then sees host_quit, so nothing consumes sem_vcpu_stop.

As more and more dirty-ring tests run, sem_vcpu_cont and sem_vcpu_stop can grow larger and larger, so many code paths no longer wait on the sem_t. This eventually causes the problem.

To fix this, wait a while before setting host_quit to true, which gives the vcpu time to enter the guest state, so it will exit again. Then we can wait for the vcpu to exit and let it continue again, at which point the vcpu sees host_quit. Thus sem_vcpu_cont and sem_vcpu_stop are both zero when the test finishes.

Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com>
---
v2->v3:
  - Rebase to v6.8-rc2.
  - Use TEST_ASSERT().

v1->v2:
  - Fix the real logic bug, not just fresh the context.

v1: https://lore.kernel.org/all/20231116093536.22256-1-shahuang@redhat.com/
v2: https://lore.kernel.org/all/20231117052210.26396-1-shahuang@redhat.com/

 tools/testing/selftests/kvm/dirty_log_test.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 6cbecf499767..dd2d8be390a5 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -417,7 +417,8 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 
 static void dirty_ring_before_vcpu_join(void)
 {
-	/* Kick another round of vcpu just to make sure it will quit */
+	/* Wait vcpu exit, and let it continue to see the host_quit. */
+	dirty_ring_wait_vcpu();
 	sem_post(&sem_vcpu_cont);
 }
 
@@ -719,6 +720,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vm *vm;
 	unsigned long *bmap;
 	uint32_t ring_buf_idx = 0;
+	int sem_val;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -726,6 +728,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		return;
 	}
 
+	sem_getvalue(&sem_vcpu_stop, &sem_val);
+	assert(sem_val == 0);
+	sem_getvalue(&sem_vcpu_cont, &sem_val);
+	assert(sem_val == 0);
+
 	/*
 	 * We reserve page table for 2 times of extra dirty mem which
 	 * will definitely cover the original (1G+) test range. Here
@@ -825,6 +832,13 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		sync_global_to_guest(vm, iteration);
 	}
 
+	/*
+	 *
+	 * Before we set the host_quit, let the vcpu has time to run, to make
+	 * sure we consume the sem_vcpu_stop and the vcpu consume the
+	 * sem_vcpu_cont, to keep the semaphore balance.
+	 */
+	usleep(p->interval * 1000);
 	/* Tell the vcpu thread to quit */
 	host_quit = true;
 	log_mode_before_vcpu_join();

base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
-- 
2.40.1
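
The imbalance described in the commit message can be reproduced in isolation. The following is a minimal single-threaded sketch, not code from dirty_log_test.c: the helper names sem_count(), leaked_posts() and wait_returns_immediately() are hypothetical and exist only for illustration. It shows that every sem_post() without a matching wait leaves a token behind, and that a later wait on such a semaphore returns at once instead of blocking.

```c
#include <assert.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical helper: read the current count of a semaphore. */
static int sem_count(sem_t *sem)
{
	int val = 0;

	sem_getvalue(sem, &val);
	return val;
}

/*
 * Simulate `rounds` test runs in which the host posts a "continue"
 * token after the worker has already quit, so nothing consumes it.
 * Returns the number of tokens left over on the semaphore.
 */
static int leaked_posts(int rounds)
{
	sem_t cont;
	int i, leftover, rc;

	rc = sem_init(&cont, 0, 0);
	assert(rc == 0);
	(void)rc;

	for (i = 0; i < rounds; i++)
		sem_post(&cont);	/* no matching sem_wait() */

	leftover = sem_count(&cont);
	sem_destroy(&cont);
	return leftover;
}

/*
 * Report whether a wait on a semaphore carrying `stale` leftover
 * tokens would return immediately rather than block.
 */
static bool wait_returns_immediately(int stale)
{
	sem_t stop;
	bool immediate;
	int i, rc;

	rc = sem_init(&stop, 0, 0);
	assert(rc == 0);
	(void)rc;

	for (i = 0; i < stale; i++)
		sem_post(&stop);

	/* sem_trywait() succeeds only if a token is already available. */
	immediate = (sem_trywait(&stop) == 0);
	sem_destroy(&stop);
	return immediate;
}
```

Calling leaked_posts() with a larger round count gives a proportionally larger leftover, which mirrors the "larger and larger" growth the message describes. The sem_getvalue() checks that the patch adds at the start of run_test() are the same kind of probe as sem_count() here.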

1 year, 10 months

