November 2024 - Linux-kselftest-mirror

[PATCH v7 00/32] riscv control-flow integrity for usermode

by Deepak Gupta

Basics and overview =================== Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory. To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with - `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions More information an details can be found at extensions github repo [1]. Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack. x86 already supports shadow stack for user mode and arm64 support for GCS in usermode [7] is in -next. Kernel awareness for user control flow integrity ================================================ This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series. Enabling: In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s). clone/fork: On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call. map_shadow_stack: x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well. signal management: If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers. In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive) config and compilation: Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime. To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- $ git clone git@github.com:deepak0414/qemu.git -b zicfilp_zicfiss_ratified_master_july11 $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) Branch where user cfi enabling patches are maintained https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1 In case you're building your own rootfs using toolchain, please make sure you pick following patch to ensure that vDSO compiled with lpad and shadow stack. "arch/riscv: compile vdso with landing pad" Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true vDSO related Opens (in the flux) ================================= I am listing these opens for laying out plan and what to expect in future patch sets. And of course for the sake of discussion. Shadow stack and landing pad enabling in vDSO ---------------------------------------------- vDSO must have shadow stack and landing pad support compiled in for task to have shadow stack and landing pad support. This patch series doesn't enable that (yet). Enabling shadow stack support in vDSO should be straight forward (intend to do that in next versions of patch set). Enabling landing pad support in vDSO requires some collaboration with toolchain folks to follow a single label scheme for all object binaries. This is necessary to ensure that all indirect call-sites are setting correct label and target landing pads are decorated with same label scheme. How many vDSOs --------------- Shadow stack instructions are carved out of zimop (may be operations) and if CPU doesn't implement zimop, they're illegal instructions. Kernel could be running on a CPU which may or may not implement zimop. And thus kernel will have to carry 2 different vDSOs and expose the appropriate one depending on whether CPU implements zimop or not. References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ [7] - https://lore.kernel.org/all/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.or… --- changelog --------- v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. - Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Clément Léger (1): riscv: Add Firmware Feature SBI extensions definitions Deepak Gupta (25): mm: helper `is_shadow_stack_vma` to check shadow stack vma dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv mm: manufacture shadow stack pte riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs riscv mmu: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic shadow stack prctls riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: enable kernel access to shadow stack memory via FWFT sbi call riscv: kernel command line option to opt out of user cfi riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Mark Brown (2): mm: Introduce ARCH_HAS_USER_SHADOW_STACK prctl: arch-agnostic prctl for shadow stack Samuel Holland (3): riscv: Enable cbo.zero only when all harts support Zicboz riscv: Add support for per-thread envcfg CSR values riscv: Call riscv_user_isa_enable() only on the boot hart Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 176 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 20 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/cpufeature.h | 15 +- arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 24 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 3 + arch/riscv/include/asm/sbi.h | 27 ++ arch/riscv/include/asm/switch_to.h | 8 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 89 ++++ arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 22 + arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 8 + arch/riscv/kernel/cpufeature.c | 13 +- arch/riscv/kernel/entry.S | 31 +- arch/riscv/kernel/head.S | 12 + arch/riscv/kernel/process.c | 26 +- arch/riscv/kernel/ptrace.c | 83 ++++ arch/riscv/kernel/signal.c | 140 +++++- arch/riscv/kernel/smpboot.c | 2 - arch/riscv/kernel/suspend.c | 4 +- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 42 ++ arch/riscv/kernel/usercfi.c | 526 +++++++++++++++++++++ arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 17 + arch/x86/Kconfig | 1 + fs/proc/task_mmu.c | 2 +- include/linux/cpu.h | 4 + include/linux/mm.h | 5 +- include/uapi/asm-generic/mman.h | 4 + include/uapi/linux/elf.h | 1 + include/uapi/linux/prctl.h | 48 ++ kernel/sys.c | 60 +++ mm/Kconfig | 6 + mm/gup.c | 2 +- mm/mmap.c | 2 +- mm/vma.h | 10 +- tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 10 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 84 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 78 +++ tools/testing/selftests/riscv/cfi/shadowstack.c | 373 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 37 ++ 57 files changed, 2191 insertions(+), 43 deletions(-) --- base-commit: 7d9923ee3960bdbfaa7f3a4e0ac2364e770c46ff change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

6 months, 3 weeks

2
33
0 0

[PATCH v2 0/2] Modify the watchdog selftest for execution in non-interactive environments

by Laura Nao

This series is a follow-up to v1[1], aimed at making the watchdog selftest more suitable for CI environments. Currently, in non-interactive setups, the watchdog kselftest can only run with oneshot parameters, preventing the testing of the WDIOC_KEEPALIVE ioctl since the ping loop is only interrupted by SIGINT. The first patch adds a new -c option to limit the number of watchdog pings, allowing the test to be optionally finite. The second patch updates the test output to conform to KTAP. The default behavior remains unchanged: without the -c option, the keep_alive() loop continues indefinitely until interrupted by SIGINT. [1] https://lore.kernel.org/all/20240506111359.224579-1-laura.nao@collabora.com/ Changes in v2: - The keep_alive() loop remains infinite by default - Introduced keep_alive_res variable to track the WDIOC_KEEPALIVE ioctl return code for user reporting Laura Nao (2): selftests/watchdog: add -c option to limit the ping loop selftests/watchdog: convert the test output to KTAP format .../selftests/watchdog/watchdog-test.c | 169 +++++++++++------- 1 file changed, 103 insertions(+), 66 deletions(-) -- 2.30.2

6 months, 3 weeks

2
5
0 0

[PATCH v2] selftests/ftrace: adjust offset for kprobe syntax error test

by Hari Bathini

In 'NOFENTRY_ARGS' test case for syntax check, any offset X of `vfs_read+X` except function entry offset (0) fits the criterion, even if that offset is not at instruction boundary, as the parser comes before probing. But with "ENDBR64" instruction on x86, offset 4 is treated as function entry. So, X can't be 4 as well. Thus, 8 was used as offset for the test case. On 64-bit powerpc though, any offset <= 16 can be considered function entry depending on build configuration (see arch_kprobe_on_func_entry() for implementation details). So, use `vfs_read+20` to accommodate that scenario too. Suggested-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Hari Bathini <hbathini(a)linux.ibm.com> --- Changes in v2: * Use 20 as offset for all arches. .../selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc index a16c6a6f6055..8f1c58f0c239 100644 --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc @@ -111,7 +111,7 @@ check_error 'p vfs_read $arg* ^$arg*' # DOUBLE_ARGS if !grep -q 'kernel return probes support:' README; then check_error 'r vfs_read ^$arg*' # NOFENTRY_ARGS fi -check_error 'p vfs_read+8 ^$arg*' # NOFENTRY_ARGS +check_error 'p vfs_read+20 ^$arg*' # NOFENTRY_ARGS check_error 'p vfs_read ^hoge' # NO_BTFARG check_error 'p kfree ^$arg10' # NO_BTFARG (exceed the number of parameters) check_error 'r kfree ^$retval' # NO_RETVAL -- 2.47.0

6 months, 3 weeks

3
5
0 0

[PATCH v4] memfd: `MFD_NOEXEC_SEAL` should not imply `MFD_ALLOW_SEALING`

by Barnabás Pőcze

`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC` to prevent further modifications to the executable bits as per the comment in the uapi header file: not executable and sealed to prevent changing to executable However, commit 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") that introduced this feature made it so that `MFD_NOEXEC_SEAL` unsets `F_SEAL_SEAL`, essentially acting as a superset of `MFD_ALLOW_SEALING`. Nothing implies that it should be so, and indeed up until the second version of the of the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`, `F_SEAL_SEAL` was not removed, however, it was changed in the third revision of the patchset[1] without a clear explanation. This behaviour is surprising for application developers, there is no documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional effect of `MFD_ALLOW_SEALING`. Additionally, combined with `vm.memfd_noexec=2` it has the effect of making all memfds initially sealable. So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested, thereby returning to the pre-Linux 6.3 behaviour of only allowing sealing when `MFD_ALLOW_SEALING` is specified. Now, this is technically a uapi break. However, the damage is expected to be minimal. To trigger user visible change, a program has to do the following steps: - create memfd: - with `MFD_NOEXEC_SEAL`, - without `MFD_ALLOW_SEALING`; - try to add seals / check the seals. But that seems unlikely to happen intentionally since this change essentially reverts the kernel's behaviour to that of Linux <6.3, so if a program worked correctly on those older kernels, it will likely work correctly after this change. I have used Debian Code Search and GitHub to try to find potential breakages, and I could only find a single one. dbus-broker's memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING` behaviour, and tries to work around it[2]. This workaround will break. Luckily, this only affects the test suite, it does not affect the normal operations of dbus-broker. There is a PR with a fix[3]. I also carried out a smoke test by building a kernel with this change and booting an Arch Linux system into GNOME and Plasma sessions. There was also a previous attempt to address this peculiarity by introducing a new flag[4]. [0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/ [1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/ [2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a… [3]: https://github.com/bus1/dbus-broker/pull/366 [4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/ Cc: stable(a)vger.kernel.org Signed-off-by: Barnabás Pőcze <pobrn(a)protonmail.com> --- * v3: https://lore.kernel.org/linux-mm/20240611231409.3899809-1-jeffxu@chromium.o… * v2: https://lore.kernel.org/linux-mm/20240524033933.135049-1-jeffxu@google.com/ * v1: https://lore.kernel.org/linux-mm/20240513191544.94754-1-pobrn@protonmail.co… This fourth version returns to removing the inconsistency as opposed to documenting its existence, with the same code change as v1 but with a somewhat extended commit message. This is sent because I believe it is worth at least a try; it can be easily reverted if bigger application breakages are discovered than initially imagined. --- mm/memfd.c | 9 ++++----- tools/testing/selftests/memfd/memfd_test.c | 2 +- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/mm/memfd.c b/mm/memfd.c index 7d8d3ab3fa37..8b7f6afee21d 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create, inode->i_mode &= ~0111; file_seals = memfd_file_seals_ptr(file); - if (file_seals) { - *file_seals &= ~F_SEAL_SEAL; + if (file_seals) *file_seals |= F_SEAL_EXEC; - } - } else if (flags & MFD_ALLOW_SEALING) { - /* MFD_EXEC and MFD_ALLOW_SEALING are set */ + } + + if (flags & MFD_ALLOW_SEALING) { file_seals = memfd_file_seals_ptr(file); if (file_seals) *file_seals &= ~F_SEAL_SEAL; diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c index 95af2d78fd31..7b78329f65b6 100644 --- a/tools/testing/selftests/memfd/memfd_test.c +++ b/tools/testing/selftests/memfd/memfd_test.c @@ -1151,7 +1151,7 @@ static void test_noexec_seal(void) mfd_def_size, MFD_CLOEXEC | MFD_NOEXEC_SEAL); mfd_assert_mode(fd, 0666); - mfd_assert_has_seals(fd, F_SEAL_EXEC); + mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC); mfd_fail_chmod(fd, 0777); close(fd); } -- 2.45.2

6 months, 3 weeks

4
5
0 0

[PATCH] selftests/timens: Add fclose(proc) to prevent memory leaks

by liujing

If fopen succeeds, the fscanf function is called to read the data. Regardless of whether fscanf is successful, you need to run fclose(proc) to prevent memory leaks. Signed-off-by: liujing <liujing(a)cmss.chinamobile.com> diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c index 1833ca97eb24..e47844a73c31 100644 --- a/tools/testing/selftests/timens/procfs.c +++ b/tools/testing/selftests/timens/procfs.c @@ -79,9 +79,11 @@ static int read_proc_uptime(struct timespec *uptime) if (fscanf(proc, "%lu.%02lu", &up_sec, &up_nsec) != 2) { if (errno) { pr_perror("fscanf"); + fclose(proc); return -errno; } pr_err("failed to parse /proc/uptime"); + fclose(proc); return -1; } fclose(proc); -- 2.27.0

6 months, 4 weeks

2
1
0 0

[PATCH] selftests: prctl: Fix resource leaks

by liujing

After using the fopen function successfully, you need to call fclose to close the file before finally returning. Signed-off-by: liujing <liujing(a)cmss.chinamobile.com> diff --git a/tools/testing/selftests/prctl/set-process-name.c b/tools/testing/selftests/prctl/set-process-name.c index 562f707ba771..7be7afff0cd2 100644 --- a/tools/testing/selftests/prctl/set-process-name.c +++ b/tools/testing/selftests/prctl/set-process-name.c @@ -66,9 +66,12 @@ int check_name(void) return -EIO; fscanf(fptr, "%s", output); - if (ferror(fptr)) + if (ferror(fptr)) { + fclose(fptr); return -EIO; + } + fclose(fptr); int res = prctl(PR_GET_NAME, name, NULL, NULL, NULL); if (res < 0) -- 2.27.0

6 months, 4 weeks

2
1
0 0

[PATCH net-next 0/4] netconsole: Add support for CPU population

by Breno Leitao

The current implementation of netconsole sends all log messages in parallel, which can lead to an intermixed and interleaved output on the receiving side. This makes it challenging to demultiplex the messages and attribute them to their originating CPUs. As a result, users and developers often struggle to effectively analyze and debug the parallel log output received through netconsole. Example of a message got from produciton hosts: ------------[ cut here ]------------ ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0 refcount_t: addition on 0; use-after-free. WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0 Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E) This series of patches introduces a new feature to the netconsole subsystem that allows the automatic population of the CPU number in the userdata field for each log message. This enhancement provides several benefits: * Improved demultiplexing of parallel log output: When multiple CPUs are sending messages concurrently, the added CPU number in the userdata makes it easier to differentiate and attribute the messages to their originating CPUs. * Better visibility into message sources: The CPU number information gives users and developers more insight into which specific CPU a particular log message came from, which can be valuable for debugging and analysis. The changes in this series are as follows: Patch "Ensure dynamic_netconsole_mutex is held during userdata update" Add a lockdep assert to make sure dynamic_netconsole_mutex is held when calling update_userdata(). Patch "netconsole: Add option to auto-populate CPU number in userdata" Adds a new option to enable automatic CPU number population in the netconsole userdata Provides a new "populate_cpu_nr" sysfs attribute to control this feature Patch "netconsole: selftest: test CPU number auto-population" Expands the existing netconsole selftest to verify the CPU number auto-population functionality Ensures the received netconsole messages contain the expected "cpu=" entry in the userdata Patch "netconsole: docs: Add documentation for CPU number auto-population" Updates the netconsole documentation to explain the new CPU number auto-population feature Provides instructions on how to enable and use the feature I believe these changes will be a valuable addition to the netconsole subsystem, enhancing its usefulness for kernel developers and users. Signed-off-by: Breno Leitao <leitao(a)debian.org> --- Breno Leitao (4): netconsole: Ensure dynamic_netconsole_mutex is held during userdata update netconsole: Add option to auto-populate CPU number in userdata netconsole: docs: Add documentation for CPU number auto-population netconsole: selftest: Validate CPU number auto-population in userdata Documentation/networking/netconsole.rst | 44 +++++++++++++++ drivers/net/netconsole.c | 63 ++++++++++++++++++++++ .../testing/selftests/drivers/net/netcons_basic.sh | 18 +++++++ 3 files changed, 125 insertions(+) --- base-commit: a58f00ed24b849d449f7134fd5d86f07090fe2f5 change-id: 20241108-netcon_cpu-ce3917e88f4b Best regards, -- Breno Leitao <leitao(a)debian.org>

6 months, 4 weeks

2
8
0 0

[PATCH] selftests: lsm: Refactor `flags_overset_lsm_set_self_attr` test

by Amit Vadhavana

- Remove unnecessary `tctx` variable, use `ctx` directly. - Simplified code with no functional changes. Signed-off-by: Amit Vadhavana <av2082000(a)gmail.com> --- tools/testing/selftests/lsm/lsm_set_self_attr_test.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/lsm/lsm_set_self_attr_test.c b/tools/testing/selftests/lsm/lsm_set_self_attr_test.c index 66dec47e3ca3..732e89fe99c0 100644 --- a/tools/testing/selftests/lsm/lsm_set_self_attr_test.c +++ b/tools/testing/selftests/lsm/lsm_set_self_attr_test.c @@ -56,16 +56,15 @@ TEST(flags_zero_lsm_set_self_attr) TEST(flags_overset_lsm_set_self_attr) { const long page_size = sysconf(_SC_PAGESIZE); - char *ctx = calloc(page_size, 1); + struct lsm_ctx *ctx = calloc(page_size, 1); __u32 size = page_size; - struct lsm_ctx *tctx = (struct lsm_ctx *)ctx; ASSERT_NE(NULL, ctx); if (attr_lsm_count()) { - ASSERT_LE(1, lsm_get_self_attr(LSM_ATTR_CURRENT, tctx, &size, + ASSERT_LE(1, lsm_get_self_attr(LSM_ATTR_CURRENT, ctx, &size, 0)); } - ASSERT_EQ(-1, lsm_set_self_attr(LSM_ATTR_CURRENT | LSM_ATTR_PREV, tctx, + ASSERT_EQ(-1, lsm_set_self_attr(LSM_ATTR_CURRENT | LSM_ATTR_PREV, ctx, size, 0)); free(ctx); -- 2.25.1

6 months, 4 weeks

5
8
0 0

[PATCH v2 net-next] wireguard: allowedips: Add WGALLOWEDIP_F_REMOVE_ME flag

by Jordan Rife

With the current API the only way to remove an allowed IP is to completely rebuild the allowed IPs set for a peer using WGPEER_F_REPLACE_ALLOWEDIPS. In other words, if my current configuration is such that a peer has allowed IP IPs 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this. [192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed IPs removed for peer [192.168.0.3] <-- Step 2: Allowed IPs added back for peer This is true even if the allowed IP list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of IPs is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for IPs a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed IPs for a peer with wg. /usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host While this may not be of particular concern for environments where peers and allowed IPs are mostly static, Cilium manages peers and allowed IPs in a dynamic environment where peers (i.e. Kubernetes nodes) and allowed IPs (i.e. Pods running on those nodes) can frequently change. Cilium must continually keep its WireGuard device's configuration in sync with its cluster state leading to unnecessary churn and packet drops. This patch introduces a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from a WireGuard device's configuration allows a user to remove an IP from a peer's set of allowed IPs. This has two benefits. First, it allows systems such as Cilium to avoid introducing connectivity blips while reconfiguring a WireGuard device. Second, it allows us to more efficiently keep the device's configuration in sync with the cluster state, as we no longer need to do frequent rebuilds of the allowed IPs list for each peer. Instead, the device's configuration can be incrementally updated. This patch also bumps WG_GENL_VERSION which can be used by clients to detect whether or not their system supports the WGALLOWEDIP_F_REMOVE_ME flag. ======= Changes ======= v1->v2 ------ * Fixed some Sparse warnings Signed-off-by: Jordan Rife <jrife(a)google.com> --- drivers/net/wireguard/allowedips.c | 103 ++++++++++---- drivers/net/wireguard/allowedips.h | 4 + drivers/net/wireguard/netlink.c | 45 +++++-- drivers/net/wireguard/selftest/allowedips.c | 30 +++++ include/uapi/linux/wireguard.h | 11 +- tools/testing/selftests/wireguard/Makefile | 18 +++ tools/testing/selftests/wireguard/netns.sh | 38 ++++++ tools/testing/selftests/wireguard/remove-ip.c | 126 ++++++++++++++++++ 8 files changed, 333 insertions(+), 42 deletions(-) create mode 100644 tools/testing/selftests/wireguard/Makefile create mode 100644 tools/testing/selftests/wireguard/remove-ip.c diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c index 4b8528206cc8a..ff52259dd8d81 100644 --- a/drivers/net/wireguard/allowedips.c +++ b/drivers/net/wireguard/allowedips.c @@ -249,6 +249,56 @@ static int add(struct allowedips_node __rcu **trie, u8 bits, const u8 *key, return 0; } +static void _remove(struct allowedips_node *node, struct mutex *lock) +{ + struct allowedips_node *child, **parent_bit, *parent; + bool free_parent; + + list_del_init(&node->peer_list); + RCU_INIT_POINTER(node->peer, NULL); + if (node->bit[0] && node->bit[1]) + return; + child = rcu_dereference_protected(node->bit[!rcu_access_pointer(node->bit[0])], + lockdep_is_held(lock)); + if (child) + child->parent_bit_packed = node->parent_bit_packed; + parent_bit = (struct allowedips_node **)(node->parent_bit_packed & ~3UL); + *parent_bit = child; + parent = (void *)parent_bit - + offsetof(struct allowedips_node, bit[node->parent_bit_packed & 1]); + free_parent = !rcu_access_pointer(node->bit[0]) && + !rcu_access_pointer(node->bit[1]) && + (node->parent_bit_packed & 3) <= 1 && + !rcu_access_pointer(parent->peer); + if (free_parent) + child = rcu_dereference_protected(parent->bit[!(node->parent_bit_packed & 1)], + lockdep_is_held(lock)); + call_rcu(&node->rcu, node_free_rcu); + if (!free_parent) + return; + if (child) + child->parent_bit_packed = parent->parent_bit_packed; + *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child; + call_rcu(&parent->rcu, node_free_rcu); +} + +static int remove(struct allowedips_node __rcu **trie, u8 bits, const u8 *key, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + struct allowedips_node *node; + + if (unlikely(cidr > bits || !peer)) + return -EINVAL; + if (!rcu_access_pointer(*trie) || + !node_placement(*trie, key, cidr, bits, &node, lock) || + peer != rcu_access_pointer(node->peer)) + return 0; + + _remove(node, lock); + + return 0; +} + void wg_allowedips_init(struct allowedips *table) { table->root4 = table->root6 = NULL; @@ -300,43 +350,38 @@ int wg_allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, return add(&table->root6, 128, key, cidr, peer, lock); } +int wg_allowedips_remove_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + /* Aligned so it can be passed to fls */ + u8 key[4] __aligned(__alignof(u32)); + + ++table->seq; + swap_endian(key, (const u8 *)ip, 32); + return remove(&table->root4, 32, key, cidr, peer, lock); +} + +int wg_allowedips_remove_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock) +{ + /* Aligned so it can be passed to fls64 */ + u8 key[16] __aligned(__alignof(u64)); + + ++table->seq; + swap_endian(key, (const u8 *)ip, 128); + return remove(&table->root6, 128, key, cidr, peer, lock); +} + void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock) { - struct allowedips_node *node, *child, **parent_bit, *parent, *tmp; - bool free_parent; + struct allowedips_node *node, *tmp; if (list_empty(&peer->allowedips_list)) return; ++table->seq; list_for_each_entry_safe(node, tmp, &peer->allowedips_list, peer_list) { - list_del_init(&node->peer_list); - RCU_INIT_POINTER(node->peer, NULL); - if (node->bit[0] && node->bit[1]) - continue; - child = rcu_dereference_protected(node->bit[!rcu_access_pointer(node->bit[0])], - lockdep_is_held(lock)); - if (child) - child->parent_bit_packed = node->parent_bit_packed; - parent_bit = (struct allowedips_node **)(node->parent_bit_packed & ~3UL); - *parent_bit = child; - parent = (void *)parent_bit - - offsetof(struct allowedips_node, bit[node->parent_bit_packed & 1]); - free_parent = !rcu_access_pointer(node->bit[0]) && - !rcu_access_pointer(node->bit[1]) && - (node->parent_bit_packed & 3) <= 1 && - !rcu_access_pointer(parent->peer); - if (free_parent) - child = rcu_dereference_protected( - parent->bit[!(node->parent_bit_packed & 1)], - lockdep_is_held(lock)); - call_rcu(&node->rcu, node_free_rcu); - if (!free_parent) - continue; - if (child) - child->parent_bit_packed = parent->parent_bit_packed; - *(struct allowedips_node **)(parent->parent_bit_packed & ~3UL) = child; - call_rcu(&parent->rcu, node_free_rcu); + _remove(node, lock); } } diff --git a/drivers/net/wireguard/allowedips.h b/drivers/net/wireguard/allowedips.h index 2346c797eb4d8..931958cb6e100 100644 --- a/drivers/net/wireguard/allowedips.h +++ b/drivers/net/wireguard/allowedips.h @@ -38,6 +38,10 @@ int wg_allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip, u8 cidr, struct wg_peer *peer, struct mutex *lock); int wg_allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, u8 cidr, struct wg_peer *peer, struct mutex *lock); +int wg_allowedips_remove_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock); +int wg_allowedips_remove_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wg_peer *peer, struct mutex *lock); void wg_allowedips_remove_by_peer(struct allowedips *table, struct wg_peer *peer, struct mutex *lock); /* The ip input pointer should be __aligned(__alignof(u64))) */ diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c index f7055180ba4aa..5f2a8553ab43d 100644 --- a/drivers/net/wireguard/netlink.c +++ b/drivers/net/wireguard/netlink.c @@ -46,7 +46,8 @@ static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = { static const struct nla_policy allowedip_policy[WGALLOWEDIP_A_MAX + 1] = { [WGALLOWEDIP_A_FAMILY] = { .type = NLA_U16 }, [WGALLOWEDIP_A_IPADDR] = NLA_POLICY_MIN_LEN(sizeof(struct in_addr)), - [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 } + [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 }, + [WGALLOWEDIP_A_FLAGS] = { .type = NLA_U32 } }; static struct wg_device *lookup_interface(struct nlattr **attrs, @@ -329,6 +330,7 @@ static int set_port(struct wg_device *wg, u16 port) static int set_allowedip(struct wg_peer *peer, struct nlattr **attrs) { int ret = -EINVAL; + u32 flags = 0; u16 family; u8 cidr; @@ -337,19 +339,38 @@ static int set_allowedip(struct wg_peer *peer, struct nlattr **attrs) return ret; family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]); cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]); + if (attrs[WGALLOWEDIP_A_FLAGS]) + flags = nla_get_u32(attrs[WGALLOWEDIP_A_FLAGS]); if (family == AF_INET && cidr <= 32 && - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) - ret = wg_allowedips_insert_v4( - &peer->device->peer_allowedips, - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, - &peer->device->device_update_lock); - else if (family == AF_INET6 && cidr <= 128 && - nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) - ret = wg_allowedips_insert_v6( - &peer->device->peer_allowedips, - nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, - &peer->device->device_update_lock); + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) { + if (flags & WGALLOWEDIP_F_REMOVE_ME) + ret = wg_allowedips_remove_v4(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + else + ret = wg_allowedips_insert_v4(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + } else if (family == AF_INET6 && cidr <= 128 && + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) { + if (flags & WGALLOWEDIP_F_REMOVE_ME) + ret = wg_allowedips_remove_v6(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + else + ret = wg_allowedips_insert_v6(&peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), + cidr, + peer, + &peer->device->device_update_lock); + } return ret; } diff --git a/drivers/net/wireguard/selftest/allowedips.c b/drivers/net/wireguard/selftest/allowedips.c index 3d1f64ff2e122..9f6458a889e96 100644 --- a/drivers/net/wireguard/selftest/allowedips.c +++ b/drivers/net/wireguard/selftest/allowedips.c @@ -461,6 +461,10 @@ static __init struct wg_peer *init_peer(void) wg_allowedips_insert_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \ cidr, mem, &mutex) +#define remove(version, mem, ipa, ipb, ipc, ipd, cidr) \ + wg_allowedips_remove_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \ + cidr, mem, &mutex) + #define maybe_fail() do { \ ++i; \ if (!_s) { \ @@ -586,6 +590,32 @@ bool __init wg_allowedips_selftest(void) test_negative(4, a, 192, 0, 0, 0); test_negative(4, a, 255, 0, 0, 0); + insert(4, a, 1, 0, 0, 0, 32); + insert(4, a, 192, 0, 0, 0, 24); + insert(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128); + insert(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test(4, a, 1, 0, 0, 0); + test(4, a, 192, 0, 0, 1); + test(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + test(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + /* Must be an exact match to remove */ + remove(4, a, 192, 0, 0, 0, 32); + test(4, a, 192, 0, 0, 1); + remove(4, a, 192, 0, 0, 0, 24); + test_negative(4, a, 192, 0, 0, 1); + remove(4, a, 1, 0, 0, 0, 32); + test_negative(4, a, 1, 0, 0, 0); + /* Must be an exact match to remove */ + remove(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 96); + test(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + remove(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128); + test_negative(6, a, 0x24446801, 0x40e40800, 0xdeaebeef, 0xdefbeef); + /* Must match the peer to remove */ + remove(6, b, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + remove(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + test_negative(6, a, 0x24446800, 0xf0e40800, 0xeeaebeef, 0x10101010); + wg_allowedips_free(&t, &mutex); wg_allowedips_init(&t); insert(4, a, 192, 168, 0, 0, 16); diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h index ae88be14c9478..e219194cb9f5a 100644 --- a/include/uapi/linux/wireguard.h +++ b/include/uapi/linux/wireguard.h @@ -101,6 +101,10 @@ * WGALLOWEDIP_A_FAMILY: NLA_U16 * WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * WGALLOWEDIP_A_FLAGS: NLA_U32, WGALLOWEDIP_F_REMOVE_ME if + * the specified IP should be removed, + * otherwise this IP will be added if + * it is not already present. * 0: NLA_NESTED * ... * 0: NLA_NESTED @@ -132,7 +136,7 @@ #define _WG_UAPI_WIREGUARD_H #define WG_GENL_NAME "wireguard" -#define WG_GENL_VERSION 1 +#define WG_GENL_VERSION 2 #define WG_KEY_LEN 32 @@ -184,11 +188,16 @@ enum wgpeer_attribute { }; #define WGPEER_A_MAX (__WGPEER_A_LAST - 1) +enum wgallowedip_flag { + WGALLOWEDIP_F_REMOVE_ME = 1U << 0, + __WGALLOWEDIP_F_ALL = WGALLOWEDIP_F_REMOVE_ME +}; enum wgallowedip_attribute { WGALLOWEDIP_A_UNSPEC, WGALLOWEDIP_A_FAMILY, WGALLOWEDIP_A_IPADDR, WGALLOWEDIP_A_CIDR_MASK, + WGALLOWEDIP_A_FLAGS, __WGALLOWEDIP_A_LAST }; #define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1) diff --git a/tools/testing/selftests/wireguard/Makefile b/tools/testing/selftests/wireguard/Makefile new file mode 100644 index 0000000000000..4f4db54f89cb3 --- /dev/null +++ b/tools/testing/selftests/wireguard/Makefile @@ -0,0 +1,18 @@ +# SPDX-License-Identifier: GPL-2.0 +# +# Note: To build this you must install libnl-3 and libnl-genl-3 development +# packages. +remove-ip: + gcc -I/usr/include/libnl3 \ + -I../../../../usr/include \ + remove-ip.c \ + -o remove-ip \ + -lnl-genl-3 \ + -lnl-3 + +.PHONY: all +all: remove-ip + +.PHONY: clean +clean: + rm remove-ip diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh index 405ff262ca93d..70058d6ebbe85 100755 --- a/tools/testing/selftests/wireguard/netns.sh +++ b/tools/testing/selftests/wireguard/netns.sh @@ -28,6 +28,7 @@ exec 3>&1 export LANG=C export WG_HIDE_KEYS=never NPROC=( /sys/devices/system/cpu/cpu+([0-9]) ); NPROC=${#NPROC[@]} +SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) netns0="wg-test-$$-0" netns1="wg-test-$$-1" netns2="wg-test-$$-2" @@ -610,6 +611,43 @@ n0 wg set wg0 peer "$pub2" allowed-ips "$allowedips" } < <(n0 wg show wg0 allowed-ips) ip0 link del wg0 +# Test IP removal +allowedips=( ) +for i in {1..197}; do + allowedips+=( 192.168.0.$i ) + allowedips+=( abcd::$i ) +done +saved_ifs="$IFS" +IFS=, +allowedips="${allowedips[*]}" +IFS="$saved_ifs" +ip0 link add wg0 type wireguard +n0 wg set wg0 peer "$pub1" allowed-ips "$allowedips" +pub1_hex=$(echo "$pub1" | base64 -d | xxd -p -c 50) +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.1 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.20 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 4 192.168.0.100 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::1 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::20 +n0 $SCRIPT_DIR/remove-ip wg0 "$pub1_hex" 6 abcd::100 +n0 wg show wg0 allowed-ips +{ + read -r pub allowedips + [[ $pub == "$pub1" ]] + i=0 + for ip in $allowedips; do + [[ "$ip" != "192.168.0.1" ]] + [[ "$ip" != "192.168.0.20" ]] + [[ "$ip" != "192.168.0.100" ]] + [[ "$ip" != "abcd::1" ]] + [[ "$ip" != "abcd::20" ]] + [[ "$ip" != "abcd::100" ]] + ((++i)) + done + ((i == 388)) +} < <(n0 wg show wg0 allowed-ips) +ip0 link del wg0 + ! n0 wg show doesnotexist || false ip0 link add wg0 type wireguard diff --git a/tools/testing/selftests/wireguard/remove-ip.c b/tools/testing/selftests/wireguard/remove-ip.c new file mode 100644 index 0000000000000..242f922d99b56 --- /dev/null +++ b/tools/testing/selftests/wireguard/remove-ip.c @@ -0,0 +1,126 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/wireguard.h> +#include <sys/socket.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <netlink/socket.h> +#include <netlink/netlink.h> +#include <netlink/genl/ctrl.h> +#include <netlink/genl/genl.h> +#include <netlink/genl/family.h> + +#define CURVE25519_KEY_SIZE 32 + +const char *usage = "Usage: remove-ip INTERFACE_NAME PEER_PUBLIC_KEY_HEX IP_VERSION IP"; + +char h2b(char c) +{ + if ('0' <= c && c <= '9') + return c - '0'; + else if ('a' <= c && c <= 'f') + return 10 + (c - 'a'); + + return -1; +} + +int parse_key(const char *raw, unsigned char key[CURVE25519_KEY_SIZE]) +{ + int ret = 0; + int i; + + for (i = 0; i < CURVE25519_KEY_SIZE; i++) { + char h, l; + + h = h2b(raw[0]); + if (h < 0) + return -1; + + l = h2b(raw[1]); + if (l < 0) + return -1; + + key[i] = (h << 4) | l; + raw += 2; + } + + return 0; +} + +int main(int argc, char **argv) +{ + unsigned char addr[sizeof(struct in6_addr)]; + unsigned char pub_key[CURVE25519_KEY_SIZE]; + struct nl_sock *sock; + struct nl_msg *msg; + int addr_len; + int family; + int cidr; + int af; + + if (argc < 5) { + printf("Not enough arguments.\n\n%s\n", usage); + return -1; + } + + if (parse_key(argv[2], pub_key)) { + printf("Could not parse public key\n"); + return -1; + } + + switch (argv[3][0]) { + case '4': + af = AF_INET; + addr_len = sizeof(struct in_addr); + cidr = 32; + break; + case '6': + af = AF_INET6; + addr_len = sizeof(struct in6_addr); + cidr = 128; + break; + default: + printf("Invalid IP version\n"); + return -1; + } + + if (inet_pton(af, argv[4], &addr) <= 0) { + printf("Could not parse IP address\n"); + return -1; + } + + sock = nl_socket_alloc(); + genl_connect(sock); + family = genl_ctrl_resolve(sock, WG_GENL_NAME); + msg = nlmsg_alloc(); + genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, family, 0, NLM_F_ECHO, + WG_CMD_SET_DEVICE, WG_GENL_VERSION); + nla_put_string(msg, WGDEVICE_A_IFNAME, argv[1]); + + struct nlattr *peers = nla_nest_start(msg, WGDEVICE_A_PEERS); + struct nlattr *peer0 = nla_nest_start(msg, 0); + + nla_put(msg, WGPEER_A_PUBLIC_KEY, CURVE25519_KEY_SIZE, pub_key); + + struct nlattr *allowed_ips = nla_nest_start(msg, WGPEER_A_ALLOWEDIPS); + struct nlattr *allowed_ip0 = nla_nest_start(msg, 0); + + nla_put_u16(msg, WGALLOWEDIP_A_FAMILY, af); + nla_put(msg, WGALLOWEDIP_A_IPADDR, addr_len, &addr); + nla_put_u8(msg, WGALLOWEDIP_A_CIDR_MASK, cidr); + nla_put_u32(msg, WGALLOWEDIP_A_FLAGS, WGALLOWEDIP_F_REMOVE_ME); + nla_nest_end(msg, allowed_ip0); + nla_nest_end(msg, allowed_ips); + nla_nest_end(msg, peer0); + nla_nest_end(msg, peers); + + int err = nl_send_sync(sock, msg); + + if (err < 0) { + char message[256]; + + nl_perror(err, message); + printf("An error occurred: %d - %s\n", err, message); + } + + return err; +} -- 2.46.0.598.g6f2099f65c-goog

6 months, 4 weeks

3
7
0 0

[PATCH v5 0/2] kselftest: tmpfs: Add ksft macros and skip if no root

by Shivam Chaudhary

This version 5 patch series replace direct error handling methods with ksft macros, which provide better reporting.Currently, when the tmpfs test runs, it does not display any output if it passes,and if it fails (particularly when not run as root),it simply exits without any warning or message. This series of patch adds: 1. Add 'ksft_print_header()' and 'ksft_set_plan()' to structure test outputs more effectively. 2. Error if not run as root. 3. Replace direct error handling with 'ksft_test_result_*', 'ksft_exit_fail_msg' macros for better reporting. v4->v5: - Remove unnecessary pass messages. - Remove unnecessary use of KSFT_SKIP. - Add appropriate use of ksft_exit_fail_msg. v4 1/2: https://lore.kernel.org/all/20241105202639.1977356-2-cvam0000@gmail.com/ v4 2/2: https://lore.kernel.org/all/20241105202639.1977356-3-cvam0000@gmail.com/ v3->v4: - Start a patchset - Split patch into smaller patches to make it easy to review. Patch1 Replace 'ksft_test_result_skip' with 'KSFT_SKIP' during root run check. Patch2 Replace 'ksft_test_result_fail' with 'KSFT_SKIP' where fail does not make sense, or failure could be due to not unsupported APIs with appropriate warnings. v3: https://lore.kernel.org/all/20241028185756.111832-1-cvam0000@gmail.com/ v2->v3: - Remove extra ksft_set_plan() - Remove function for unshare() - Fix the comment style v2: https://lore.kernel.org/all/20241026191621.2860376-1-cvam0000@gmail.com/ v1->v2: - Make the commit message more clear. v1: https://lore.kernel.org/all/20241024200228.1075840-1-cvam0000@gmail.com/T/#u thanks Shivam Shivam Chaudhary (2): selftests: tmpfs: Add Test-fail if not run as root selftests: tmpfs: Add kselftest support to tmpfs .../selftests/tmpfs/bug-link-o-tmpfile.c | 60 ++++++++++++------- 1 file changed, 37 insertions(+), 23 deletions(-) -- 2.45.2

6 months, 4 weeks

2
5
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror November 2024