June 2025 - Linux-kselftest-mirror

[PATCH bpf-next v5 0/3] Allow mmap of /sys/kernel/btf/vmlinux

by Lorenz Bauer

I'd like to cut down the memory usage of parsing vmlinux BTF in ebpf-go. With some upcoming changes the library is sitting at 5MiB for a parse. Most of that memory is simply copying the BTF blob into user space. By allowing vmlinux BTF to be mmapped read-only into user space I can cut memory usage by about 75%. Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com> --- Changes in v5: - Fix error return of btf_parse_raw_mmap (Andrii) - Link to v4: https://lore.kernel.org/r/20250510-vmlinux-mmap-v4-0-69e424b2a672@isovalent… Changes in v4: - Go back to remap_pfn_range for aarch64 compat - Dropped btf_new_no_copy (Andrii) - Fixed nits in selftests (Andrii) - Clearer error handling in the mmap handler (Andrii) - Fixed build on s390 - Link to v3: https://lore.kernel.org/r/20250505-vmlinux-mmap-v3-0-5d53afa060e8@isovalent… Changes in v3: - Remove slightly confusing calculation of trailing (Alexei) - Use vm_insert_page (Alexei) - Simplified libbpf code - Link to v2: https://lore.kernel.org/r/20250502-vmlinux-mmap-v2-0-95c271434519@isovalent… Changes in v2: - Use btf__new in selftest - Avoid vm_iomap_memory in btf_vmlinux_mmap - Add VM_DONTDUMP - Add support to libbpf - Link to v1: https://lore.kernel.org/r/20250501-vmlinux-mmap-v1-0-aa2724572598@isovalent… --- Lorenz Bauer (3): btf: allow mmap of vmlinux btf selftests: bpf: add a test for mmapable vmlinux BTF libbpf: Use mmap to parse vmlinux BTF from sysfs include/asm-generic/vmlinux.lds.h | 3 +- kernel/bpf/sysfs_btf.c | 32 ++++++++ tools/lib/bpf/btf.c | 89 +++++++++++++++++----- tools/testing/selftests/bpf/prog_tests/btf_sysfs.c | 81 ++++++++++++++++++++ 4 files changed, 186 insertions(+), 19 deletions(-) --- base-commit: 7220eabff8cb4af3b93cd021aa853b9f5df2923f change-id: 20250501-vmlinux-mmap-2ec5563c3ef1 Best regards, -- Lorenz Bauer <lmb(a)isovalent.com>

2 months, 2 weeks

6
11
0 0

[PATCH net-next v12 00/10] tun: Introduce virtio-net hashing feature

by Akihiko Odaki

NOTE: I'm leaving Daynix Computing Ltd., for which I worked on this patch series, by the end of this month. While net-next is closed, this is the last chance for me to send another version so let me send the local changes now. Please contact Yuri Benditovich, who is CCed on this email, for anything about this series. virtio-net have two usage of hashes: one is RSS and another is hash reporting. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS. Another approach is to use eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF. Introduce the code to compute hashes to the kernel in order to overcome thse challenges. An alternative solution is to extend the eBPF steering program so that it will be able to report to the userspace, but it is based on context rewrites, which is in feature freeze. We can adopt kfuncs, but they will not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM and vhost_net). The patches for QEMU to use this new feature was submitted as RFC and is available at: https://patchew.org/QEMU/20250530-hash-v5-0-343d7d7a8200@daynix.com/ This work was presented at LPC 2024: https://lpc.events/event/18/contributions/1963/ V1 -> V2: Changed to introduce a new BPF program type. Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com> --- Changes in v12: - Updated tools/testing/selftests/net/config. - Split TUNSETVNETHASH. - Link to v11: https://lore.kernel.org/r/20250317-rss-v11-0-4cacca92f31f@daynix.com Changes in v11: - Added the missing code to free vnet_hash in patch "tap: Introduce virtio-net hash feature". - Link to v10: https://lore.kernel.org/r/20250313-rss-v10-0-3185d73a9af0@daynix.com Changes in v10: - Split common code and TUN/TAP-specific code into separate patches. - Reverted a spurious style change in patch "tun: Introduce virtio-net hash feature". - Added a comment explaining disable_ipv6 in tests. - Used AF_PACKET for patch "selftest: tun: Add tests for virtio-net hashing". I also added the usage of FIXTURE_VARIANT() as the testing function now needs access to more variant-specific variables. - Corrected the message of patch "selftest: tun: Add tests for virtio-net hashing"; it mentioned validation of configuration but it is not scope of this patch. - Expanded the description of patch "selftest: tun: Add tests for virtio-net hashing". - Added patch "tun: Allow steering eBPF program to fall back". - Changed to handle TUNGETVNETHASHCAP before taking the rtnl lock. - Removed redundant tests for tun_vnet_ioctl(). - Added patch "selftest: tap: Add tests for virtio-net ioctls". - Added a design explanation of ioctls for extensibility and migration. - Removed a few branches in patch "vhost/net: Support VIRTIO_NET_F_HASH_REPORT". - Link to v9: https://lore.kernel.org/r/20250307-rss-v9-0-df76624025eb@daynix.com Changes in v9: - Added a missing return statement in patch "tun: Introduce virtio-net hash feature". - Link to v8: https://lore.kernel.org/r/20250306-rss-v8-0-7ab4f56ff423@daynix.com Changes in v8: - Disabled IPv6 to eliminate noises in tests. - Added a branch in tap to avoid unnecessary dissection when hash reporting is disabled. - Removed unnecessary rtnl_lock(). - Extracted code to handle new ioctls into separate functions to avoid adding extra NULL checks to the code handling other ioctls. - Introduced variable named "fd" to __tun_chr_ioctl(). - s/-/=/g in a patch message to avoid confusing Git. - Link to v7: https://lore.kernel.org/r/20250228-rss-v7-0-844205cbbdd6@daynix.com Changes in v7: - Ensured to set hash_report to VIRTIO_NET_HASH_REPORT_NONE for VHOST_NET_F_VIRTIO_NET_HDR. - s/4/sizeof(u32)/ in patch "virtio_net: Add functions for hashing". - Added tap_skb_cb type. - Rebased. - Link to v6: https://lore.kernel.org/r/20250109-rss-v6-0-b1c90ad708f6@daynix.com Changes in v6: - Extracted changes to fill vnet header holes into another series. - Squashed patches "skbuff: Introduce SKB_EXT_TUN_VNET_HASH", "tun: Introduce virtio-net hash reporting feature", and "tun: Introduce virtio-net RSS" into patch "tun: Introduce virtio-net hash feature". - Dropped the RFC tag. - Link to v5: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com Changes in v5: - Fixed a compilation error with CONFIG_TUN_VNET_CROSS_LE. - Optimized the calculation of the hash value according to: https://git.dpdk.org/dpdk/commit/?id=3fb1ea032bd6ff8317af5dac9af901f1f324ca… - Added patch "tun: Unify vnet implementation". - Dropped patch "tap: Pad virtio header with zero". - Added patch "selftest: tun: Test vnet ioctls without device". - Reworked selftests to skip for older kernels. - Documented the case when the underlying device is deleted and packets have queue_mapping set by TC. - Reordered test harness arguments. - Added code to handle fragmented packets. - Link to v4: https://lore.kernel.org/r/20240924-rss-v4-0-84e932ec0e6c@daynix.com Changes in v4: - Moved tun_vnet_hash_ext to if_tun.h. - Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc(). - Replaced htons() with cpu_to_be16(). - Changed virtio_net_hash_rss() to return void. - Reordered variable declarations in virtio_net_hash_rss(). - Removed virtio_net_hdr_v1_hash_from_skb(). - Updated messages of "tap: Pad virtio header with zero" and "tun: Pad virtio header with zero". - Fixed vnet_hash allocation size. - Ensured to free vnet_hash when destructing tun_struct. - Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com Changes in v3: - Reverted back to add ioctl. - Split patch "tun: Introduce virtio-net hashing feature" into "tun: Introduce virtio-net hash reporting feature" and "tun: Introduce virtio-net RSS". - Changed to reuse hash values computed for automq instead of performing RSS hashing when hash reporting is requested but RSS is not. - Extracted relevant data from struct tun_struct to keep it minimal. - Added kernel-doc. - Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF. - Initialized num_buffers with 1. - Added a test case for unclassified packets. - Fixed error handling in tests. - Changed tests to verify that the queue index will not overflow. - Rebased. - Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com --- Akihiko Odaki (10): virtio_net: Add functions for hashing net: flow_dissector: Export flow_keys_dissector_symmetric tun: Allow steering eBPF program to fall back tun: Add common virtio-net hash feature code tun: Introduce virtio-net hash feature tap: Introduce virtio-net hash feature selftest: tun: Test vnet ioctls without device selftest: tun: Add tests for virtio-net hashing selftest: tap: Add tests for virtio-net ioctls vhost/net: Support VIRTIO_NET_F_HASH_REPORT Documentation/networking/tuntap.rst | 7 + drivers/net/Kconfig | 1 + drivers/net/ipvlan/ipvtap.c | 2 +- drivers/net/macvtap.c | 2 +- drivers/net/tap.c | 80 +++++- drivers/net/tun.c | 92 +++++-- drivers/net/tun_vnet.h | 165 +++++++++++- drivers/vhost/net.c | 68 ++--- include/linux/if_tap.h | 4 +- include/linux/skbuff.h | 3 + include/linux/virtio_net.h | 188 ++++++++++++++ include/net/flow_dissector.h | 1 + include/uapi/linux/if_tun.h | 80 ++++++ net/core/flow_dissector.c | 3 +- net/core/skbuff.c | 4 + tools/testing/selftests/net/Makefile | 2 +- tools/testing/selftests/net/config | 1 + tools/testing/selftests/net/tap.c | 131 +++++++++- tools/testing/selftests/net/tun.c | 485 ++++++++++++++++++++++++++++++++++- 19 files changed, 1234 insertions(+), 85 deletions(-) --- base-commit: 5cb8274d66c611b7889565c418a8158517810f9b change-id: 20240403-rss-e737d89efa77 Best regards, -- Akihiko Odaki <akihiko.odaki(a)daynix.com>

2 months, 2 weeks

5
38
0 0

[PATCH v17 00/27] riscv control-flow integrity for usermode

by Deepak Gupta

Basics and overview =================== Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffer from memory corruption issues which can be utilized by attackers to bend control flow of the program to eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls or return sites which rely on obtaining return address from stack memory. To mitigate such attacks, risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad` else cpu will raise software check exception (a new cpu exception cause code on riscv). Similarly for return flow, risc-v extension zicfiss extends architecture with - `sspush` instruction to push return address on a shadow stack - `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack) - `sspopchk` to raise software check exception if comparision above was a mismatch - Protection mechanism using which shadow stack is not writeable via regular store instructions More information an details can be found at extensions github repo [1]. Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel CET [3] and branch target identification (BTI) [4] on arm. Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack. x86 and arm64 support for user mode shadow stack is already in mainline. Kernel awareness for user control flow integrity ================================================ This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series. Enabling: In order to maintain compatibility and not break anything in user mode, kernel doesn't enable control flow integrity cpu extensions on binary by default. Instead exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (loader) to enumerate if all objects in its address space are compiled with shadow stack and landing pad support and accordingly enable the feature. Additionally if a subsequent `dlopen` happens on a library, user mode can take a decision again to disable the feature (if incoming library is not compiled with support) OR terminate the task (if user mode policy is strict to have all objects in address space to be compiled with control flow integirty cpu feature). prctl to enable shadow stack results in allocating shadow stack from virtual memory and activating for user address space. x86 and arm64 are also following same direction due to similar reason(s). clone/fork: On clone and fork, cfi state for task is inherited by child. Shadow stack is part of virtual memory and is a writeable memory from kernel perspective (writeable via a restricted set of instructions aka shadow stack instructions) Thus kernel changes ensure that this memory is converted into read-only when fork/clone happens and COWed when fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, kernel will automatically allocate a shadow stack for that clone call. map_shadow_stack: x86 introduced `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space. It is useful to allocate shadow for different contexts managed by a single thread (green threads or contexts) risc-v implements this system call as well. signal management: If shadow stack is enabled for a task, kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though resume context is prepared by kernel, it is in user space memory and is subject to memory corruption and corruption bugs can be utilized by attacker in this race window to perform arbitrary sigreturn and eventually bypass cfi mechanism. Another issue is how to ensure that cfi related state on sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers. In order to mitigate control-flow hijacting, kernel prepares a token and place it on shadow stack before signal delivery and places address of token in sigcontext structure. During sigreturn, kernel obtains address of token from sigcontext struture, reads token from shadow stack and validates it and only then allow sigreturn to succeed. Compatiblity issue is solved by adopting dynamic sigcontext management introduced for vector extension. This series re-factor the code little bit to allow future sigcontext management easy (as proposed by Andy Chiu from SiFive) config and compilation: Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This optin is presented only if toolchain has shadow stack and landing pad support. And is on purpose guarded by toolchain support. Reason being that eventually vDSO also needs to be compiled in with shadow stack and landing pad support. vDSO compile patches are not included as of now because landing pad labeling scheme is yet to settle for usermode runtime. To get more information on kernel interactions with respect to zicfilp and zicfiss, patch series adds documentation for `zicfilp` and `zicfiss` in following: Documentation/arch/riscv/zicfiss.rst Documentation/arch/riscv/zicfilp.rst How to test this series ======================= Toolchain --------- $ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev $ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static" $ make -j$(nproc) Qemu ---- Get the lastest qemu $ cd qemu $ mkdir build $ cd build $ ../configure --target-list=riscv64-softmmu $ make -j$(nproc) Opensbi ------- $ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi $ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic Linux ----- Running defconfig is fine. CFI is enabled by default if the toolchain supports it. $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig $ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) In case you're building your own rootfs using toolchain, please make sure you pick following patch to ensure that vDSO compiled with lpad and shadow stack. "arch/riscv: compile vdso with landing pad" Branch where above patch can be picked https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1 Running ------- Modify your qemu command to have: -bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin -cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true vDSO related Opens (in the flux) ================================= I am listing these opens for laying out plan and what to expect in future patch sets. And of course for the sake of discussion. Shadow stack and landing pad enabling in vDSO ---------------------------------------------- vDSO must have shadow stack and landing pad support compiled in for task to have shadow stack and landing pad support. This patch series doesn't enable that (yet). Enabling shadow stack support in vDSO should be straight forward (intend to do that in next versions of patch set). Enabling landing pad support in vDSO requires some collaboration with toolchain folks to follow a single label scheme for all object binaries. This is necessary to ensure that all indirect call-sites are setting correct label and target landing pads are decorated with same label scheme. How many vDSOs --------------- Shadow stack instructions are carved out of zimop (may be operations) and if CPU doesn't implement zimop, they're illegal instructions. Kernel could be running on a CPU which may or may not implement zimop. And thus kernel will have to carry 2 different vDSOs and expose the appropriate one depending on whether CPU implements zimop or not. References ========== [1] - https://github.com/riscv/riscv-cfi [2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c… [3] - https://lwn.net/Articles/889475/ [4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific… [5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i… [6] - https://lwn.net/Articles/940403/ To: Thomas Gleixner <tglx(a)linutronix.de> To: Ingo Molnar <mingo(a)redhat.com> To: Borislav Petkov <bp(a)alien8.de> To: Dave Hansen <dave.hansen(a)linux.intel.com> To: x86(a)kernel.org To: H. Peter Anvin <hpa(a)zytor.com> To: Andrew Morton <akpm(a)linux-foundation.org> To: Liam R. Howlett <Liam.Howlett(a)oracle.com> To: Vlastimil Babka <vbabka(a)suse.cz> To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> To: Paul Walmsley <paul.walmsley(a)sifive.com> To: Palmer Dabbelt <palmer(a)dabbelt.com> To: Albert Ou <aou(a)eecs.berkeley.edu> To: Conor Dooley <conor(a)kernel.org> To: Rob Herring <robh(a)kernel.org> To: Krzysztof Kozlowski <krzk+dt(a)kernel.org> To: Arnd Bergmann <arnd(a)arndb.de> To: Christian Brauner <brauner(a)kernel.org> To: Peter Zijlstra <peterz(a)infradead.org> To: Oleg Nesterov <oleg(a)redhat.com> To: Eric Biederman <ebiederm(a)xmission.com> To: Kees Cook <kees(a)kernel.org> To: Jonathan Corbet <corbet(a)lwn.net> To: Shuah Khan <shuah(a)kernel.org> To: Jann Horn <jannh(a)google.com> To: Conor Dooley <conor+dt(a)kernel.org> To: Miguel Ojeda <ojeda(a)kernel.org> To: Alex Gaynor <alex.gaynor(a)gmail.com> To: Boqun Feng <boqun.feng(a)gmail.com> To: Gary Guo <gary(a)garyguo.net> To: Björn Roy Baron <bjorn3_gh(a)protonmail.com> To: Benno Lossin <benno.lossin(a)proton.me> To: Andreas Hindborg <a.hindborg(a)kernel.org> To: Alice Ryhl <aliceryhl(a)google.com> To: Trevor Gross <tmgross(a)umich.edu> Cc: linux-kernel(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-riscv(a)lists.infradead.org Cc: devicetree(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-doc(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: alistair.francis(a)wdc.com Cc: richard.henderson(a)linaro.org Cc: jim.shu(a)sifive.com Cc: andybnac(a)gmail.com Cc: kito.cheng(a)sifive.com Cc: charlie(a)rivosinc.com Cc: atishp(a)rivosinc.com Cc: evan(a)rivosinc.com Cc: cleger(a)rivosinc.com Cc: alexghiti(a)rivosinc.com Cc: samitolvanen(a)google.com Cc: broonie(a)kernel.org Cc: rick.p.edgecombe(a)intel.com Cc: rust-for-linux(a)vger.kernel.org changelog --------- v17: - fixed warnings due to empty macros in usercfi.h (reported by alexg) - fixed prefixes in commit titles reported by alexg - took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling" https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/ v16: - If FWFT is not implemented or returns error for shadow stack activation, then no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this. - If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by Charlie Jenkins. - Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to keep it off till we have more hardware availibility with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow - Includes the fix if "!RV64 and !SBI" then definitions for FWFT in asm-offsets.c error. v15: - Toolchain has been updated to include `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag. - selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI. - Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that. v14: - rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants. - Took Radim's suggestions on bitfields. - Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline. v13: - cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely() - uses nops(count) to create nop slide - RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it - changed ternaries to simply use implicit casting to convert to bool. - kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt. - ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest. - cosmetic and grammatical changes to documentation. v12: - It seems like I had accidently squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again. - set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li. - Some minor clean up in kselftests as suggested by Zong Li. v11: - patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to to `lpad 0`. v10: - dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree. - Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config. - Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects. v9: - rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion") - dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs) - dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs) v8: - rebased on palmer/for-next - dropped samuel holland's `envcfg` context switch patches. they are in parlmer/for-next v7: - Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv" Instead using `deactivate_mm` flow to clean up. see here for more context https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.… - Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive. - Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE" - Lock interfaces for shadow stack and indirect branch tracking expect arg == 0 Any future evolution of this interface should accordingly define how arg should be setup. - `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`. - Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv… v6: - Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info` - fixed unaligned newline escapes in kselftest - cleaned up messages in kselftest and included test output in commit message - fixed a bug in clone path reported by Zong Li - fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code) v5: - rebased on v6.12-rc1 - Fixed schema related issues in device tree file - Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index) - added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack. - Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected. - Adopted context header based signal handling as proposed by Andy Chiu - Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…) - Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv… (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches) v4: - rebased on 6.11-rc6 - envcfg: Converged with Samuel Holland's patches for envcfg management on per- thread basis. - vma_is_shadow_stack is renamed to is_vma_shadow_stack - picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch - signal context: using extended context management to maintain compatibility. - fixed `-Wmissing-prototypes` compiler warnings for prctl functions - Documentation fixes and amending typos. - Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/ v3: - envcfg logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series. - dt-bindings As suggested, split into separate commit. fixed the messaging that spec is in public review - arch_is_shadow_stack change arch_is_shadow_stack changed to vma_is_shadow_stack - hwprobe zicfiss / zicfilp if present will get enumerated in hwprobe - selftests As suggested, added object and binary filenames to .gitignore Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit. - Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/ v2: - Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel. - Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri… Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri… Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri… Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (25): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 21 + arch/riscv/Makefile | 5 +- arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 2 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 33 +- arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 51 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso/Makefile | 6 + arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 54 files changed, 2369 insertions(+), 29 deletions(-) --- base-commit: 4181f8ad7a1061efed0219951d608d4988302af7 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug

2 months, 2 weeks

2
34
0 0

[RFC Patch 0/2] selftests/mm: assert rmap behave as expected

by Wei Yang

As David suggested, currently we don't have a high level test case to verify the behavior of rmap. This patch set introduce the verification on rmap by migration. Patch 1 is a preparation to move ksm related operation into vm_util. Patch 2 is the new test case. Currently it covers following four scenarios: * anonymous page * shmem page * pagecache page * ksm page Wei Yang (2): selftests/mm: put general ksm operation into vm_util selftests/mm: assert rmap behave as expected MAINTAINERS | 1 + tools/testing/selftests/mm/.gitignore | 1 + tools/testing/selftests/mm/Makefile | 3 + .../selftests/mm/ksm_functional_tests.c | 76 +-- tools/testing/selftests/mm/rmap.c | 466 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + tools/testing/selftests/mm/vm_util.c | 71 +++ tools/testing/selftests/mm/vm_util.h | 7 + 8 files changed, 563 insertions(+), 66 deletions(-) create mode 100644 tools/testing/selftests/mm/rmap.c -- 2.34.1

2 months, 2 weeks

2
8
0 0

[PATCH v4 00/15] kunit: Introduce UAPI testing framework

by Thomas Weißschuh

Currently testing of userspace and in-kernel API use two different frameworks. kselftests for the userspace ones and Kunit for the in-kernel ones. Besides their different scopes, both have different strengths and limitations: Kunit: * Tests are normal kernel code. * They use the regular kernel toolchain. * They can be packaged and distributed as modules conveniently. Kselftests: * Tests are normal userspace code * They need a userspace toolchain. A kernel cross toolchain is likely not enough. * A fair amout of userland is required to run the tests, which means a full distro or handcrafted rootfs. * There is no way to conveniently package and run kselftests with a given kernel image. * The kselftests makefiles are not as powerful as regular kbuild. For example they are missing proper header dependency tracking or more complex compiler option modifications. Therefore kunit is much easier to run against different kernel configurations and architectures. This series aims to combine kselftests and kunit, avoiding both their limitations. It works by compiling the userspace kselftests as part of the regular kernel build, embedding them into the kunit kernel or module and executing them from there. If the kernel toolchain is not fit to produce userspace because of a missing libc, the kernel's own nolibc can be used instead. The structured TAP output from the kselftest is integrated into the kunit KTAP output transparently, the kunit parser can parse the combined logs together. Further room for improvements: * Call each test in its completely dedicated namespace * Handle additional test files besides the test executable through archives. CPIO, cramfs, etc. * Compatibility with kselftest_harness.h (in progress) * Expose the blobs in debugfs * Provide some convience wrappers around compat userprogs * Figure out a migration path/coexistence solution for kunit UAPI and tools/testing/selftests/ Output from the kunit example testcase, note the output of "example_uapi_tests". $ ./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit example ... Running tests with: $ .kunit/linux kunit.filter_glob=example kunit.enable=1 mem=1G console=tty kunit_shutdown=halt [11:53:53] ================== example (10 subtests) =================== [11:53:53] [PASSED] example_simple_test [11:53:53] [SKIPPED] example_skip_test [11:53:53] [SKIPPED] example_mark_skipped_test [11:53:53] [PASSED] example_all_expect_macros_test [11:53:53] [PASSED] example_static_stub_test [11:53:53] [PASSED] example_static_stub_using_fn_ptr_test [11:53:53] [PASSED] example_priv_test [11:53:53] =================== example_params_test =================== [11:53:53] [SKIPPED] example value 3 [11:53:53] [PASSED] example value 2 [11:53:53] [PASSED] example value 1 [11:53:53] [SKIPPED] example value 0 [11:53:53] =============== [PASSED] example_params_test =============== [11:53:53] [PASSED] example_slow_test [11:53:53] ======================= (4 subtests) ======================= [11:53:53] [PASSED] procfs [11:53:53] [PASSED] userspace test 2 [11:53:53] [SKIPPED] userspace test 3: some reason [11:53:53] [PASSED] userspace test 4 [11:53:53] ================ [PASSED] example_uapi_test ================ [11:53:53] ===================== [PASSED] example ===================== [11:53:53] ============================================================ [11:53:53] Testing complete. Ran 16 tests: passed: 11, skipped: 5 [11:53:53] Elapsed time: 67.543s total, 1.823s configuring, 65.655s building, 0.058s running Based on v6.16-rc1. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Changes in v4: - Move Kconfig.nolibc from tools/ to init/ - Drop generic userprogs nolibc integration - Drop generic blob framework - Pick up review tags from David - Extend new kunit TAP parser tests - Add MAINTAINERS entry - Allow CONFIG_KUNIT_UAPI=m - Split /proc validation into dedicated UAPI test - Trim recipient list a bit - Use KUNIT_FAIL_AND_ABORT() over KUNIT_FAIL() - Link to v3: https://lore.kernel.org/r/20250611-kunit-kselftests-v3-0-55e3d148cbc6@linut… Changes in v3: - Reintroduce CONFIG_CC_CAN_LINK_STATIC - Enable CONFIG_ARCH_HAS_NOLIBC for m68k and SPARC - Properly handle 'clean' target for userprogs - Use ramfs over tmpfs to reduce dependencies - Inherit userprogs byte order and ABI from kernel - Drop now unnecessary "#ifndef NOLIBC" - Pick up review tags - Drop usage of __private in blob.h, sparse complains and it is not really necessary - Fix execution on loongarch when using clang - Drop userprogs libgcc handling, it was ugly and is not yet necessary - Link to v2: https://lore.kernel.org/r/20250407-kunit-kselftests-v2-0-454114e287fd@linut… Changes in v2: - Rebase onto v6.15-rc1 - Add documentation and kernel docs - Resolve invalid kconfig breakages - Drop already applied patch "kbuild: implement CONFIG_HEADERS_INSTALL for Usermode Linux" - Drop userprogs CONFIG_WERROR integration, it doesn't need to be part of this series - Replace patch prefix "kconfig" with "kbuild" - Rename kunit_uapi_run_executable() to kunit_uapi_run_kselftest() - Generate private, conflict-free symbols in the blob framework - Handle kselftest exit codes - Handle SIGABRT - Forward output also to kunit debugfs log - Install a fd=0 stdin filedescriptor - Link to v1: https://lore.kernel.org/r/20250217-kunit-kselftests-v1-0-42b4524c3b0a@linut… --- Thomas Weißschuh (15): kbuild: userprogs: avoid duplication of flags inherited from kernel kbuild: userprogs: also inherit byte order and ABI from kernel kbuild: doc: add label for userprogs section init: re-add CONFIG_CC_CAN_LINK_STATIC init: add nolibc build support fs,fork,exit: export symbols necessary for KUnit UAPI support kunit: tool: Add test for nested test result reporting kunit: tool: Don't overwrite test status based on subtest counts kunit: tool: Parse skipped tests from kselftest.h kunit: Always descend into kunit directory during build kunit: qemu_configs: loongarch: Enable LSX/LSAX kunit: Introduce UAPI testing framework kunit: uapi: Add example for UAPI tests kunit: uapi: Introduce preinit executable kunit: uapi: Validate usability of /proc Documentation/dev-tools/kunit/api/index.rst | 5 + Documentation/dev-tools/kunit/api/uapi.rst | 14 + Documentation/kbuild/makefiles.rst | 2 + MAINTAINERS | 11 + Makefile | 7 +- fs/exec.c | 2 + fs/file.c | 1 + fs/filesystems.c | 2 + fs/fs_struct.c | 1 + fs/pipe.c | 2 + include/kunit/uapi.h | 77 ++++++ init/Kconfig | 7 + init/Kconfig.nolibc | 15 + init/Makefile.nolibc | 13 + kernel/exit.c | 3 + kernel/fork.c | 2 + lib/Makefile | 4 - lib/kunit/Kconfig | 14 + lib/kunit/Makefile | 27 +- lib/kunit/kunit-example-test.c | 15 + lib/kunit/kunit-example-uapi.c | 22 ++ lib/kunit/kunit-test-uapi.c | 51 ++++ lib/kunit/kunit-test.c | 23 +- lib/kunit/kunit-uapi.c | 305 +++++++++++++++++++++ lib/kunit/uapi-preinit.c | 63 +++++ tools/testing/kunit/kunit_parser.py | 13 +- tools/testing/kunit/kunit_tool_test.py | 11 + tools/testing/kunit/qemu_configs/loongarch.py | 2 + .../test_is_test_passed-failure-nested.log | 10 + .../test_data/test_is_test_passed-kselftest.log | 3 +- 30 files changed, 714 insertions(+), 13 deletions(-) --- base-commit: 9d5898b413d17510b2a41664a42390a2c79f8bf4 change-id: 20241015-kunit-kselftests-56273bc40442 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 months, 2 weeks

7
37
0 0

[PATCH 0/3] module: make structure definitions always visible

by Thomas Weißschuh

Code using IS_ENABLED(CONFIG_MODULES) as a C expression may need access to the module structure definitions to compile. Make sure these structure definitions are always visible. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- Thomas Weißschuh (3): module: move 'struct module_use' to internal.h module: make structure definitions always visible kunit: test: Drop CONFIG_MODULE ifdeffery include/linux/module.h | 30 ++++++++++++------------------ kernel/module/internal.h | 7 +++++++ lib/kunit/test.c | 8 -------- 3 files changed, 19 insertions(+), 26 deletions(-) --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250611-kunit-ifdef-modules-0fefd13ae153 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 months, 2 weeks

4
15
0 0

[PATCH] kunit: Enable PCI on UML without triggering WARN()

by Thomas Weißschuh

Various KUnit tests require PCI infrastructure to work. All normal platforms enable PCI by default, but UML does not. Enabling PCI from .kunitconfig files is problematic as it would not be portable. So in commit 6fc3a8636a7b ("kunit: tool: Enable virtio/PCI by default on UML") PCI was enabled by way of CONFIG_UML_PCI_OVER_VIRTIO=y. However CONFIG_UML_PCI_OVER_VIRTIO requires additional configuration of CONFIG_UML_PCI_OVER_VIRTIO_DEVICE_ID or will otherwise trigger a WARN() in virtio_pcidev_init(). However there is no one correct value for UML_PCI_OVER_VIRTIO_DEVICE_ID which could be used by default. This warning is confusing when debugging test failures. On the other hand, the functionality of CONFIG_UML_PCI_OVER_VIRTIO is not used at all, given that it is completely non-functional as indicated by the WARN() in question. Instead it is only used as a way to enable CONFIG_UML_PCI which itself is not directly configurable. Instead of going through CONFIG_UML_PCI_OVER_VIRTIO, introduce a custom configuration option which enables CONFIG_UML_PCI without triggering warnings or building dead code. Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> --- lib/kunit/Kconfig | 7 +++++++ tools/testing/kunit/configs/arch_uml.config | 5 ++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/lib/kunit/Kconfig b/lib/kunit/Kconfig index a97897edd9642f3e5df7fdd9dee26ee5cf00d6a4..c8ca155521b2455a221ddbec3f6fc55662c83475 100644 --- a/lib/kunit/Kconfig +++ b/lib/kunit/Kconfig @@ -93,4 +93,11 @@ config KUNIT_AUTORUN_ENABLED In most cases this should be left as Y. Only if additional opt-in behavior is needed should this be set to N. +config KUNIT_UML_PCI + bool "KUnit UML PCI Support" + depends on UML + select UML_PCI + help + Enables the PCI subsystem on UML for use by KUnit tests. + endif # KUNIT diff --git a/tools/testing/kunit/configs/arch_uml.config b/tools/testing/kunit/configs/arch_uml.config index 54ad8972681a2cc724e6122b19407188910b9025..28edf816aa70e6f408d9486efff8898df79ee090 100644 --- a/tools/testing/kunit/configs/arch_uml.config +++ b/tools/testing/kunit/configs/arch_uml.config @@ -1,8 +1,7 @@ # Config options which are added to UML builds by default -# Enable virtio/pci, as a lot of tests require it. -CONFIG_VIRTIO_UML=y -CONFIG_UML_PCI_OVER_VIRTIO=y +# Enable pci, as a lot of tests require it. +CONFIG_KUNIT_UML_PCI=y # Enable FORTIFY_SOURCE for wider checking. CONFIG_FORTIFY_SOURCE=y --- base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494 change-id: 20250626-kunit-uml-pci-a2b687553746 Best regards, -- Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

2 months, 2 weeks

2
1
0 0

[PATCH v7 00/28] iommufd: Add vIOMMU infrastructure (Part-4 HW QUEUE)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extended the HWPT-based design for more advanced virtualization feature. HW QUEUE introduced by this series as a part of the vIOMMU infrastructure represents a HW accelerated queue/buffer for VM to use exclusively, e.g. - NVIDIA's Virtual Command Queue - AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer each of which allows its IOMMU HW to directly access a queue memory owned by a guest VM and allows a guest OS to control the HW queue direclty, to avoid VM Exit overheads to improve the performance. Introduce IOMMUFD_OBJ_HW_QUEUE and its pairing IOMMUFD_CMD_HW_QUEUE_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, and etc. Meanwhile, a guest-owned queue needs the guest kernel to control the queue by reading/writing its consumer and producer indexes, via MMIO acceses to the hardware MMIO registers. Introduce an mmap infrastructure for iommufd to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The mmap info (offset/ length) used by an mmap syscall must be pre-allocated and returned to the user space via an output driver-data during an IOMMUFD_CMD_HW_QUEUE_ALLOC call. Thus, it requires a driver-specific user data support in the vIOMMU allocation flow. As a real-world use case, this series implements a HW QUEUE support in the tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it is also the Tegra CMDQV series Part-2 (user-space support), reworked from Previous RFCv1: https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/ This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to the standard SMMUv3 operating in the nested translation mode trapping CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions of invalidation time were measured by various DMA unmap tests running in a guest OS. // Unmap latencies from "dma_map_benchmark -g @granule -t @threads", // by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq" @granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0 4KB 1 35.7 us 5.3 us 16KB 1 41.8 us 6.8 us 64KB 1 68.9 us 9.9 us 128KB 1 109.0 us 12.6 us 256KB 1 187.1 us 18.0 us 4KB 2 96.9 us 6.8 us 16KB 2 97.8 us 7.5 us 64KB 2 151.5 us 10.7 us 128KB 2 257.8 us 12.7 us 256KB 2 443.0 us 17.9 us This is on Github: https://github.com/nicolinc/iommufd/commits/iommufd_hw_queue-v7 Paring QEMU branch for testing: https://github.com/nicolinc/qemu/commits/wip/for_iommufd_hw_queue-v7 Changelog v7 * Rebased on Jason's for-next tree (iommufd_hw_queue-prep series) * Add Reviewed-by from Baolu, Jason, Pranjal * Update kdocs and notes * [iommu] Replace "u32" with "enum iommu_hw_info_type" * [iommufd] Rename vdev->id to vdev->virt_id * [iommufd] Replace macros with inline helpers * [iommufd] Report unmapped_bytes in error path * [iommufd] Add iommufd_access_is_internal helper * [iommufd] Do not drop ops->unmap check for mdevs * [iommufd] Store physical addresses in immap structure * [iommufd] Reorder access and hw_queue object allocations * [iommufd] Scan for an internal access before any unmap call * [iommufd] Drop unused ictx pointer in struct iommufd_hw_queue * [iommufd] Use kcalloc to avoid failure due to memory fragmentation * [tegra] Use "else" * [tegra] Lock destroy() using lvcmdq_mutex v6 https://lore.kernel.org/all/cover.1749884998.git.nicolinc@nvidia.com/ * Rebase on iommufd_hw_queue-prep-v2 * Add Reviewed-by from Kevin and Jason * [iommufd] Update kdocs and notes * [iommufd] Drop redundant pages[i] check * [iommufd] Allow nesting_parent_iova to be 0 * [iommufd] Add iommufd_hw_queue_alloc_phys() * [iommufd] Revise iommufd_viommu_alloc/destroy_mmap APIs * [iommufd] Move destroy ops to vdevice/hw_queue structures * [iommufd] Add union in hw_info struct to share out_data_type field * [iommufd] Replace iopt_pin/unpin_pages() with internal access APIs * [iommufd] Replace vdevice_alloc with vdevice_size and vdevice_init * [iommufd] Replace hw_queue_alloc with get_hw_queue_size/hw_queue_init * [iommufd] Replace IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA with init_phys * [smmu] Drop arm_smmu_domain_ipa_to_pa * [smmu] Update arm_smmu_impl_ops changes for vsmmu_init * [tegra] Add a vdev_to_vsid macro * [tegra] Add lvcmdq_mutex to protect multi queues * [tegra] Drop duplicated kcalloc for vintf->lvcmdqs (memory leak) v5 https://lore.kernel.org/all/cover.1747537752.git.nicolinc@nvidia.com/ * Rebase on v6.15-rc6 * Add Reviewed-by from Jason and Kevin * Correct typos in kdoc and update commit logs * [iommufd] Add a cosmetic fix * [iommufd] Drop unused num_pfns * [iommufd] Drop unnecessary check * [iommufd] Reorder patch sequence * [iommufd] Use io_remap_pfn_range() * [iommufd] Use success oriented flow * [iommufd] Fix max_npages calculation * [iommufd] Add more selftest coverage * [iommufd] Drop redundant static_assert * [iommufd] Fix mmap pfn range validation * [iommufd] Reject unmap on pinned iovas * [iommufd] Drop redundant vm_flags_set() * [iommufd] Drop iommufd_struct_destroy() * [iommufd] Drop redundant queue iova test * [iommufd] Use "mmio_addr" and "mmio_pfn" * [iommufd] Rename to "nesting_parent_iova" * [iommufd] Make iopt_pin_pages call option * [iommufd] Add ictx comparison in depend() * [iommufd] Add iommufd_object_alloc_ucmd() * [iommufd] Move kcalloc() after validations * [iommufd] Replace ictx setting with WARN_ON * [iommufd] Make hw_info's type bidirectional * [smmu] Add supported_vsmmu_type in impl_ops * [smmu] Drop impl report in smmu vendor struct * [tegra] Add IOMMU_HW_INFO_TYPE_TEGRA241_CMDQV * [tegra] Replace "number of VINTFs" with a note * [tegra] Drop the redundant lvcmdq pointer setting * [tegra] Flag IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA * [tegra] Use "vintf_alloc_vsid" for vdevice_alloc op v4 https://lore.kernel.org/all/cover.1746757630.git.nicolinc@nvidia.com/ * Rebase on v6.15-rc5 * Add Reviewed-by from Vasant * Rename "vQUEUE" to "HW QUEUE" * Use "offset" and "length" for all mmap-related variables * [iommufd] Use u64 for guest PA * [iommufd] Fix typo in uAPI doc * [iommufd] Rename immap_id to offset * [iommufd] Drop the partial-size mmap support * [iommufd] Do not replace WARN_ON with WARN_ON_ONCE * [iommufd] Use "u64 base_addr" for queue base address * [iommufd] Use u64 base_pfn/num_pfns for immap structure * [iommufd] Correct the size passed in to mtree_alloc_range() * [iommufd] Add IOMMUFD_VIOMMU_FLAG_HW_QUEUE_READS_PA to viommu_ops v3 https://lore.kernel.org/all/cover.1746139811.git.nicolinc@nvidia.com/ * Add Reviewed-by from Baolu, Pranjal, and Alok * Revise kdocs, uAPI docs, and commit logs * Rename "vCMDQ" back to "vQUEUE" for AMD cases * [tegra] Add tegra241_vcmdq_hw_flush_timeout() * [tegra] Rename vsmmu_alloc to alloc_vintf_user * [tegra] Use writel for SID replacement registers * [tegra] Move mmap removal call to vsmmu_destroy op * [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user() * [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED() * [iommufd] Add an object-type "owner" to immap structure * [iommufd] Drop the ictx input in the new for-driver APIs * [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle * [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers * [iommufd] Rename iommufd_ctx_alloc/free_mmap to _iommufd_alloc/destroy_mmap v2 https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/ * Add Reviewed-by from Jason * [smmu] Fix vsmmu initial value * [smmu] Support impl for hw_info * [tegra] Rename "slot" to "vsid" * [tegra] Update kdocs and commit logs * [tegra] Map/unmap LVCMDQ dynamically * [tegra] Refcount the previous LVCMDQ * [tegra] Return -EEXIST if LVCMDQ exists * [tegra] Simplify VINTF cleanup routine * [tegra] Use vmid and s2_domain in vsmmu * [tegra] Rename "mmap_pgoff" to "immap_id" * [tegra] Add more addr and length validation * [iommufd] Add more narrative to mmap's kdoc * [iommufd] Add iommufd_struct_depend/undepend() * [iommufd] Rename vcmdq_free op to vcmdq_destroy * [iommufd] Fix bug in iommu_copy_struct_to_user() * [iommufd] Drop is_io from iommufd_ctx_alloc_mmap() * [iommufd] Test the queue memory for its contiguity * [iommufd] Return -ENXIO if address or length fails * [iommufd] Do not change @min_last in mock_viommu_alloc() * [iommufd] Generalize TEGRA241_VCMDQ data in core structure * [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC * [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping v1 https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/ Thanks Nicolin Nicolin Chen (28): iommufd: Report unmapped bytes in the error path of iopt_unmap_iova_range iommufd/viommu: Explicitly define vdev->virt_id iommu: Use enum iommu_hw_info_type for type in hw_info op iommu: Add iommu_copy_struct_to_user helper iommu: Pass in a driver-level user data structure to viommu_init op iommufd/viommu: Allow driver-specific user data for a vIOMMU object iommufd/selftest: Support user_data in mock_viommu_alloc iommufd/selftest: Add coverage for viommu data iommufd/access: Add internal APIs for HW queue to use iommufd/access: Bypass access->ops->unmap for internal use iommufd/viommu: Add driver-defined vDEVICE support iommufd/viommu: Introduce IOMMUFD_OBJ_HW_QUEUE and its related struct iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl iommufd/driver: Add iommufd_hw_queue_depend/undepend() helpers iommufd/selftest: Add coverage for IOMMUFD_CMD_HW_QUEUE_ALLOC iommufd: Add mmap interface iommufd/selftest: Add coverage for the new mmap interface Documentation: userspace-api: iommufd: Update HW QUEUE iommu: Allow an input type in hw_info op iommufd: Allow an input data_type via iommu_hw_info iommufd/selftest: Update hw_info coverage for an input data_type iommu/arm-smmu-v3-iommufd: Add vsmmu_size/type and vsmmu_init impl ops iommu/arm-smmu-v3-iommufd: Add hw_info to impl_ops iommu/tegra241-cmdqv: Use request_threaded_irq iommu/tegra241-cmdqv: Simplify deinit flow in tegra241_cmdqv_remove_vintf() iommu/tegra241-cmdqv: Do not statically map LVCMDQs iommu/tegra241-cmdqv: Add user-space use support iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 22 +- drivers/iommu/iommufd/iommufd_private.h | 50 +- drivers/iommu/iommufd/iommufd_test.h | 20 + include/linux/iommu.h | 50 +- include/linux/iommufd.h | 160 ++++++ include/uapi/linux/iommufd.h | 145 +++++- tools/testing/selftests/iommu/iommufd_utils.h | 89 +++- .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 28 +- .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 484 +++++++++++++++++- drivers/iommu/intel/iommu.c | 7 +- drivers/iommu/iommufd/device.c | 90 +++- drivers/iommu/iommufd/driver.c | 81 ++- drivers/iommu/iommufd/io_pagetable.c | 17 +- drivers/iommu/iommufd/main.c | 69 +++ drivers/iommu/iommufd/selftest.c | 153 +++++- drivers/iommu/iommufd/viommu.c | 208 +++++++- tools/testing/selftests/iommu/iommufd.c | 143 +++++- .../selftests/iommu/iommufd_fail_nth.c | 15 +- Documentation/userspace-api/iommufd.rst | 12 + 19 files changed, 1736 insertions(+), 107 deletions(-) -- 2.43.0

2 months, 3 weeks

4
66
0 0

[PATCH v2 00/14] stackleak: Support Clang stack depth tracking

by Kees Cook

v2: - rename stackleak to kstack_erase (mingo) - address __init vs inline with KCOV changes v1: https://lore.kernel.org/lkml/20250507180852.work.231-kees@kernel.org/ RFC: https://lore.kernel.org/lkml/20250502185834.work.560-kees@kernel.org/ Hi, As part of looking at what GCC plugins could be replaced with Clang implementations, this series uses the recently landed stack depth tracking callback in Clang[1] to implement the stackleak feature. Since the Clang feature is now landed, I'm moving this out of RFC to a v1. Since this touches a lot of arch-specific Makefiles, I tried to trim the CC list down to just mailing lists in those cases, otherwise the CC was giant. Thanks! -Kees [1] https://clang.llvm.org/docs/SanitizerCoverage.html#tracing-stack-depth Kees Cook (14): stackleak: Rename STACKLEAK to KSTACK_ERASE stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS x86: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches powerpc: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatches loongarch: Handle KCOV __init vs inline mismatches init.h: Disable sanitizer coverage for __init and __head kstack_erase: Support Clang stack depth tracking configs/hardening: Enable CONFIG_KSTACK_ERASE configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON arch/Kconfig | 4 +- arch/arm/Kconfig | 2 +- arch/arm64/Kconfig | 2 +- arch/riscv/Kconfig | 2 +- arch/s390/Kconfig | 2 +- arch/x86/Kconfig | 2 +- security/Kconfig.hardening | 45 +++++++++------- Makefile | 1 + arch/arm/boot/compressed/Makefile | 2 +- arch/arm/vdso/Makefile | 2 +- arch/arm64/kernel/pi/Makefile | 2 +- arch/arm64/kernel/vdso/Makefile | 3 +- arch/arm64/kvm/hyp/nvhe/Makefile | 2 +- arch/riscv/kernel/pi/Makefile | 2 +- arch/riscv/purgatory/Makefile | 2 +- arch/sparc/vdso/Makefile | 3 +- arch/x86/entry/vdso/Makefile | 3 +- arch/x86/purgatory/Makefile | 2 +- drivers/firmware/efi/libstub/Makefile | 8 +-- drivers/misc/lkdtm/Makefile | 2 +- kernel/Makefile | 10 ++-- lib/Makefile | 2 +- scripts/Makefile.gcc-plugins | 16 +----- scripts/Makefile.kstack_erase | 21 ++++++++ scripts/gcc-plugins/stackleak_plugin.c | 52 +++++++++---------- Documentation/admin-guide/sysctl/kernel.rst | 4 +- Documentation/arch/x86/x86_64/mm.rst | 2 +- Documentation/security/self-protection.rst | 2 +- .../zh_CN/security/self-protection.rst | 2 +- arch/arm64/include/asm/acpi.h | 2 +- arch/loongarch/include/asm/smp.h | 2 +- arch/mips/include/asm/time.h | 2 +- arch/s390/hypfs/hypfs.h | 2 +- arch/s390/hypfs/hypfs_diag.h | 2 +- arch/x86/entry/calling.h | 4 +- arch/x86/include/asm/acpi.h | 4 +- arch/x86/include/asm/init.h | 2 +- arch/x86/include/asm/realmode.h | 2 +- include/linux/acpi.h | 4 +- include/linux/bootconfig.h | 2 +- include/linux/efi.h | 2 +- include/linux/init.h | 4 +- include/linux/{stackleak.h => kstack_erase.h} | 20 +++---- include/linux/memblock.h | 2 +- include/linux/mfd/dbx500-prcmu.h | 2 +- include/linux/sched.h | 4 +- arch/arm/kernel/entry-common.S | 2 +- arch/arm64/kernel/entry.S | 2 +- arch/riscv/kernel/entry.S | 2 +- arch/s390/kernel/entry.S | 2 +- arch/arm/mm/cache-feroceon-l2.c | 2 +- arch/arm/mm/cache-tauros2.c | 2 +- arch/loongarch/kernel/time.c | 2 +- arch/loongarch/mm/ioremap.c | 4 +- arch/powerpc/mm/book3s64/hash_utils.c | 2 +- arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +- arch/s390/mm/init.c | 2 +- arch/x86/kernel/kvm.c | 2 +- drivers/clocksource/timer-orion.c | 2 +- .../lkdtm/{stackleak.c => kstack_erase.c} | 26 +++++----- drivers/platform/x86/thinkpad_acpi.c | 4 +- drivers/soc/ti/pm33xx.c | 2 +- fs/proc/base.c | 6 +-- kernel/fork.c | 2 +- kernel/{stackleak.c => kstack_erase.c} | 22 ++++---- tools/objtool/check.c | 4 +- tools/testing/selftests/lkdtm/config | 2 +- MAINTAINERS | 6 ++- kernel/configs/hardening.config | 6 +++ 69 files changed, 203 insertions(+), 171 deletions(-) create mode 100644 scripts/Makefile.kstack_erase rename include/linux/{stackleak.h => kstack_erase.h} (81%) rename drivers/misc/lkdtm/{stackleak.c => kstack_erase.c} (89%) rename kernel/{stackleak.c => kstack_erase.c} (87%) -- 2.34.1

2 months, 3 weeks

8
28
0 0

[PATCH net] selftests: drv-net: rss_ctx: Add short delay between per-context traffic checks

by Nimrod Oren

A few packets may still be sent and received during the termination of the iperf processes. These late packets cause failures when they arrive on queues expected to be empty. Add a one second delay between repeated _send_traffic_check() calls in rss_ctx tests to ensure such packets are processed before the next traffic checks are performed. Example failure observed: Check failed 2 != 0 traffic on inactive queues (context 1): [0, 0, 1, 1, 386385, 397196, 0, 0, 0, 0, ...] Check failed 4 != 0 traffic on inactive queues (context 2): [0, 0, 0, 0, 2, 2, 247152, 253013, 0, 0, ...] Check failed 2 != 0 traffic on inactive queues (context 3): [0, 0, 0, 0, 0, 0, 1, 1, 282434, 283070, ...] Note: While the `noise` parameter could be used to tolerate these late packets, it would be inappropriate here. `noise` tolerates far more traffic than acceptable in this case, risking false positives. Inactive queues are supposed to see zero traffic. Fixes: 847aa551fa78 ("selftests: drv-net: rss_ctx: factor out send traffic and check") Reviewed-by: Gal Pressman <gal(a)nvidia.com> Reviewed-by: Carolina Jubran <cjubran(a)nvidia.com> Signed-off-by: Nimrod Oren <noren(a)nvidia.com> --- tools/testing/selftests/drivers/net/hw/rss_ctx.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py index 7bb552f8b182..19be69227693 100755 --- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py +++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py @@ -4,6 +4,7 @@ import datetime import random import re +import time from lib.py import ksft_run, ksft_pr, ksft_exit from lib.py import ksft_eq, ksft_ne, ksft_ge, ksft_in, ksft_lt, ksft_true, ksft_raises from lib.py import NetDrvEpEnv @@ -492,6 +493,7 @@ def test_rss_context(cfg, ctx_cnt=1, create_with_cfg=None): { 'target': (2+i*2, 3+i*2), 'noise': (0, 1), 'empty': list(range(2, 2+i*2)) + list(range(4+i*2, 2+2*ctx_cnt)) }) + time.sleep(1) if requested_ctx_cnt != ctx_cnt: raise KsftSkipEx(f"Tested only {ctx_cnt} contexts, wanted {requested_ctx_cnt}") @@ -559,6 +561,7 @@ def test_rss_context_out_of_order(cfg, ctx_cnt=4): } _send_traffic_check(cfg, ports[i], f"context {i}", expected) + time.sleep(1) # Use queues 0 and 1 for normal traffic ethtool(f"-X {cfg.ifname} equal 2") -- 2.37.1

2 months, 3 weeks

2
3
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror June 2025