October 2025 - Linux-kselftest-mirror

[PATCH v5 net-next 00/14] AccECN protocol case handling series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the v5 AccECN case handling patch series, which covers the handling of several exceptional cases of the Accurate ECN spec (RFC9768), adds new identifiers to be used by CC modules, adds ecn_delta into rate_sample, and keeps the ACE counter for computation, etc.

This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

Best regards,
Chia-Yu

---
v5:
- Move previous #11 in v4 into a later patch after discussion with the RFC author.
- Add #3 to update the comments for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN. (Parav Pandit <parav(a)nvidia.com>)
- Add gro self-test for TCP CWR flag in #4. (Eric Dumazet <edumazet(a)google.com>)
- Add Fixes: tag to #7. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message of #8 and the if condition check. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #13. (Paolo Abeni <pabeni(a)redhat.com>)

v4:
- Add previous #13 in v2 back after discussion with the RFC author.
- Add TCP_ACCECN_OPTION_PERSIST to tcp_ecn_option sysctl to ignore AccECN fallback policy on sending AccECN option.

v3:
- Add additional min() check if pkts_acked_ewma is not initialized in #1. (Paolo Abeni <pabeni(a)redhat.com>)
- Change TCP_CONG_WANTS_ECT_1 into an individual flag and add helper function INET_ECN_xmit_wants_ect_1() in #3. (Paolo Abeni <pabeni(a)redhat.com>)
- Add empty line between variable declarations and code in #4. (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message to fix old AccECN commits in #5. (Paolo Abeni <pabeni(a)redhat.com>)
- Remove unnecessary brackets in #10. (Paolo Abeni <pabeni(a)redhat.com>)
- Move patch #3 in v2 to a later Prague patch series and remove patch #13 in v2. (Paolo Abeni <pabeni(a)redhat.com>)

---
Chia-Yu Chang (12):
  net: update commnets for SKB_GSO_TCP_ECN and SKB_GSO_TCP_ACCECN
  selftests/net: gro: add self-test for TCP CWR flag
  tcp: L4S ECT(1) identifier and NEEDS_ACCECN for CC modules
  tcp: disable RFC3168 fallback identifier for CC modules
  tcp: accecn: handle unexpected AccECN negotiation feedback
  tcp: accecn: retransmit downgraded SYN in AccECN negotiation
  tcp: move increment of num_retrans
  tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK
  tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion
  tcp: accecn: fallback outgoing half link to non-AccECN
  tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST
  tcp: accecn: enable AccECN

Ilpo Järvinen (2):
  tcp: try to avoid safer when ACKs are thinned
  gro: flushing when CWR is set negatively affects AccECN

 Documentation/networking/ip-sysctl.rst        |  4 +-
 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/skbuff.h                        | 13 ++-
 include/linux/tcp.h                           |  4 +-
 include/net/inet_ecn.h                        | 20 +++-
 include/net/tcp.h                             | 32 ++++++-
 include/net/tcp_ecn.h                         | 92 ++++++++++++++-----
 net/ipv4/sysctl_net_ipv4.c                    |  4 +-
 net/ipv4/tcp.c                                |  2 +
 net/ipv4/tcp_cong.c                           | 10 +-
 net/ipv4/tcp_input.c                          | 37 +++++++-
 net/ipv4/tcp_minisocks.c                      | 40 +++++---
 net/ipv4/tcp_offload.c                        |  3 +-
 net/ipv4/tcp_output.c                         | 42 ++++++---
 tools/testing/selftests/net/gro.c             | 80 +++++++++++-----
 15 files changed, 294 insertions(+), 90 deletions(-)

--
2.34.1


[PATCH net-next v14 0/7] bonding: Extend arp_ip_target format to allow for a list of vlan tags.

by David Wilder

The current implementation of the arp monitor builds a list of vlan-tags by following the chain of net_devices above the bond. See bond_verify_device_path(). Unfortunately, with some configurations, this is not possible. One example is when an ovs switch is configured above the bond.

This change extends the "arp_ip_target" parameter format to allow for a list of vlan tags to be included for each arp target. This new list of tags is optional and may be omitted to preserve the current format and process of discovering vlans.

The new format for arp_ip_target is:
arp_ip_target ipv4-address[vlan-tag\...],...

For example:
arp_ip_target 10.0.0.1[10/20]
arp_ip_target 10.0.0.1[] (used to disable vlan discovery)

Changes since V13
Thanks for the help Paolo:
- Changed first argument of bond_option_arp_ip_target_add() to a const.
- Changed first argument of bond_arp_target_to_string() to a const.
- Added compile-time check of the size argument to bond_arp_target_to_string(): BUILD_BUG_ON(size != BOND_OPTION_STRING_MAX_SIZE);
- In bond_arp_send_all() I changed the condition for both the allocation and the free calls to be the same to improve the clarity of the code.
- Removed extra tab in bond_fill_info().
- Updated bond_get_size() to reflect the increased payload for the arp_ip_target option.
- Corrected indentation and alignment in bond-arp-ip-target.sh.

Changes since V12
Fixed uninitialized variable in bond_option_arp_ip_targets_set() (patch 4) causing a CI failure.

Changes since V11
No Change.

Changes since V10
Thanks Paolo:
- 1/7 Changed the layout of struct bond_arp_target to reduce size of the struct.
- 3/7 Fixed format 'size-num' -> 'size - num'
- 7/7 Updated selftest (bond-arp-ip-target.sh). Removed sleep 10 in check_failure_count(). Added call to tc to verify arp probes are reaching the target interface. Then I verify that the Link Failure counts are not increasing over "time". Arp probes are sent every 100ms, and two missed probes will trigger a Link failure. A one second wait between checking counts should be more than sufficient. This speeds up the execution of the test.

Thanks Nikolay:
- 4/7 In bond_option_arp_ip_targets_clear() I changed the definition of empty_target to empty_target = {}.
- bond_validate_tags() now verifies input is a multiple of sizeof(struct bond_vlan_tag). Updated VID validity check to use (!tags->vlan_id || tags->vlan_id >= VLAN_VID_MASK) as suggested.
- In bond_option_arp_ip_targets_set() removed the redundant length check of target.target_ip.
- Added kfree(target.tags) when bond_option_arp_ip_target_add() results in an error.
- Removed the caching of struct bond_vlan_tag returned by bond_verify_device_path(). Nikolay pointed out that caching tags prevented the detection of VLAN configuration changes. Added a kfree(tags) for tags allocated in bond_verify_device_path().

Jay, Nikolay and I had a discussion regarding locking when adding, deleting or changing vlan tags. Jay pointed out that because user supplied tags are stashed in the bond configuration and can only be changed via user space, this can be done safely in an RCU manner, as netlink always operates with RTNL held. If user space provided tags and then replumbs things, it'll be on user space to update the tags in a safe manner.

I was concerned about changing options on a configured bond. I found that attempting to change a bond's configuration (using "ip set") will abort the attempt to make a change if the bond's state is "UP" or has slaves configured. Therefore the configuration and operational sides of a bond are separated. I agree with Jay that the existing locking scheme is sufficient.

Changes since V9
Fix kdoc build error.

Changes since V8:
Moved the #define BOND_MAX_VLAN_TAGS from patch 6 to patch 3. Thanks Simon for catching the bisection break.

Changes since V7:
These changes should eliminate the CI failures I have been seeing.
1) Patch 2, changed type of bond_opt_value.extra_len to size_t.
2) Patch 4, added bond_validate_tags() to validate the array of bond_vlan_tag provided by the user.

Changes since V6:
1) I made a number of changes to fix the failure seen in the kernel CI. I am still unable to reproduce this failure; hopefully I have fixed it. These changes are in patch #4, in functions bond_option_arp_ip_targets_clear() and bond_option_arp_ip_targets_set().

Changes since V5:
Only the last 2 patches have changed since V5.
1) Fixed sparse warning in bond_fill_info().
2) Also in bond_fill_info() I resolved data.addr being uninitialized when the if condition is not met. Thank you Simon for catching this. Note: The change is different than what I shared earlier.
3) Fixed shellcheck warnings in test script: Blocked source warning, ignored specific unassigned references and exported ALL_TESTS to resolve a reference warning.

Changes since V4:
1) Dropped changes to proc and sysfs APIs to bonding. These APIs do not need to be updated to support new functionality. Netlink and iproute2 have been updated to do the right thing, but the other APIs are more or less frozen in the past.
2) Jakub reported a warning triggered in bond_info_seq_show() during testing. I was unable to reproduce this warning or identify it with code inspection. However, all my changes to bond_info_seq_show() have been dropped as unnecessary (see above). Hopefully this will resolve the issue.
3) Selftest script has been updated based on the results of shellcheck. Two unresolved references that are not possible to resolve are all that remain.
4) A patch was added updating bond_fill_info() to support the "ip -d show <bond-device>" command. The inclusion of a list of vlan tags is optional. The new logic preserves both forward and backward compatibility with the kernel and iproute2 versions.

Changes since V3:
1) Moved the parsing of the extended arp_ip_target out of the kernel and into userspace (ip command). A separate patch to iproute2 to follow shortly.
2) Split up the patch set to make review easier.

Please see iproute changes in a separate posting.

Thank you for your time and reviews.

Signed-off-by: David Wilder <wilder(a)us.ibm.com>

David Wilder (7):
  bonding: Adding struct bond_arp_target
  bonding: Adding extra_len field to struct bond_opt_value.
  bonding: arp_ip_target helpers.
  bonding: Processing extended arp_ip_target from user space.
  bonding: Update to bond_arp_send_all() to use supplied vlan tags
  bonding: Update for extended arp_ip_target format.
  bonding: Selftest and documentation for the arp_ip_target parameter.

 Documentation/networking/bonding.rst          |  11 +
 drivers/net/bonding/bond_main.c               |  48 +++--
 drivers/net/bonding/bond_netlink.c            |  39 +++-
 drivers/net/bonding/bond_options.c            | 146 ++++++++++---
 drivers/net/bonding/bond_procfs.c             |   4 +-
 drivers/net/bonding/bond_sysfs.c              |   4 +-
 include/net/bond_options.h                    |  29 ++-
 include/net/bonding.h                         |  67 +++++-
 .../selftests/drivers/net/bonding/Makefile    |   1 +
 .../drivers/net/bonding/bond-arp-ip-target.sh | 204 ++++++++++++++++++
 10 files changed, 474 insertions(+), 79 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond-arp-ip-target.sh

--
2.50.1
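
The extended arp_ip_target format described in the cover letter can be illustrated with a small userspace parser. This is only a sketch, not the iproute2 or kernel code: the names sketch_target, sketch_parse_target and SKETCH_MAX_TAGS are hypothetical, the "/" separator is taken from the 10.0.0.1[10/20] example, and the VID range check mirrors the (!vlan_id || vlan_id >= VLAN_VID_MASK) test mentioned in the changelog.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_TAGS 5       /* hypothetical limit, not taken from the series */
#define SKETCH_VID_MASK 0x0fff  /* same value as the kernel's VLAN_VID_MASK */

struct sketch_target {
	struct in_addr addr;
	bool has_tag_list;      /* true if "[...]" was present, even if empty */
	size_t ntags;
	uint16_t vid[SKETCH_MAX_TAGS];
};

/* Parse "a.b.c.d" or "a.b.c.d[v1/v2/...]"; returns 0 or a negative errno. */
static int sketch_parse_target(const char *s, struct sketch_target *t)
{
	char ip[INET_ADDRSTRLEN];
	const char *open = strchr(s, '[');
	size_t iplen = open ? (size_t)(open - s) : strlen(s);
	const char *p;

	memset(t, 0, sizeof(*t));
	if (iplen == 0 || iplen >= sizeof(ip))
		return -EINVAL;
	memcpy(ip, s, iplen);
	ip[iplen] = '\0';
	if (inet_pton(AF_INET, ip, &t->addr) != 1)
		return -EINVAL;
	if (!open)
		return 0;       /* no list: VLAN discovery stays as before */

	t->has_tag_list = true;
	p = open + 1;
	if (*p == ']')          /* "[]": explicit empty list */
		return p[1] == '\0' ? 0 : -EINVAL;

	for (;;) {
		char *end;
		unsigned long vid;

		if (*p < '0' || *p > '9')
			return -EINVAL;
		vid = strtoul(p, &end, 10);
		/* Reject VID 0 and anything at or above the mask value. */
		if (!vid || vid >= SKETCH_VID_MASK)
			return -EINVAL;
		if (t->ntags == SKETCH_MAX_TAGS)
			return -E2BIG;
		t->vid[t->ntags++] = (uint16_t)vid;
		if (*end == ']')
			return end[1] == '\0' ? 0 : -EINVAL;
		if (*end != '/')
			return -EINVAL;
		p = end + 1;
	}
}
```

Calling sketch_parse_target("10.0.0.1[10/20]", &t) fills two tags, while "10.0.0.1[]" sets has_tag_list with zero tags, which is how the sketch distinguishes "disable discovery" from "no list given".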


[PATCH v22 00/28] riscv control-flow integrity for usermode

by Deepak Gupta via B4 Relay

v22: fixes a build error caused by -march=zicfiss being accepted by gcc-13 and above without the compiler actually doing any codegen or recognizing instructions for zicfiss. The change in v22 adds a dependency on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then is CONFIG_RISCV_USER_CFI visible in menuconfig.

v21: fixed build errors.

Basics and overview
===================

Software with larger attack surfaces (e.g. network facing apps like databases, browsers or apps relying on browser runtimes) suffers from memory corruption issues which can be utilized by attackers to bend the control flow of the program and eventually gain control (by making their payload executable). Attackers are able to perform such attacks by leveraging call-sites which rely on indirect calls, or return sites which rely on obtaining the return address from stack memory.

To mitigate such attacks, the risc-v extension zicfilp enforces that all indirect calls must land on a landing pad instruction `lpad`, else the cpu will raise a software check exception (a new cpu exception cause code on riscv). Similarly for return flow, the risc-v extension zicfiss extends the architecture with

- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if the comparison above was a mismatch
- Protection mechanism using which shadow stack is not writeable via regular store instructions

More information and details can be found at the extensions github repo [1].

The equivalent to landing pad (zicfilp) on x86 is the `ENDBRANCH` instruction in Intel CET [3], and branch target identification (BTI) [4] on arm. Similarly, x86's Intel CET has shadow stack [5] and arm64 has guarded control stack (GCS) [6], which are very similar to risc-v's zicfiss shadow stack.

x86 and arm64 support for user mode shadow stack is already in mainline.
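
The `sspush` / `sspopchk` behaviour listed above can be modelled in plain C. This is a software illustration only, under the assumption that a failed check can be represented by a false return value; the real instructions operate on protected memory and raise a software check exception. All names (model_shadow_stack, model_sspush, model_sspopchk, MODEL_SS_DEPTH) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SS_DEPTH 64       /* arbitrary depth chosen for the model */

struct model_shadow_stack {
	uintptr_t slot[MODEL_SS_DEPTH];
	size_t top;             /* number of valid entries */
};

/* Models sspush: record the return address at call time. */
static bool model_sspush(struct model_shadow_stack *ss, uintptr_t ra)
{
	if (ss->top == MODEL_SS_DEPTH)
		return false;   /* model-only overflow handling */
	ss->slot[ss->top++] = ra;
	return true;
}

/*
 * Models sspopchk: pop the saved address and compare it with the return
 * address read from the regular stack. A false result stands in for the
 * software check exception raised by hardware on a mismatch.
 */
static bool model_sspopchk(struct model_shadow_stack *ss, uintptr_t ra_from_stack)
{
	if (ss->top == 0)
		return false;
	return ss->slot[--ss->top] == ra_from_stack;
}
```

In this model, a return address that was overwritten on the regular stack no longer matches the shadow copy, so model_sspopchk() reports the mismatch instead of letting the return proceed.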
Kernel awareness for user control flow integrity
================================================

This series picks up Samuel Holland's envcfg changes [2] as well. So if those are being applied independently, they should be removed from this series.

Enabling:

In order to maintain compatibility and not break anything in user mode, the kernel doesn't enable control flow integrity cpu extensions on a binary by default. Instead it exposes a prctl interface to enable, disable and lock the shadow stack or landing pad feature for a task. This allows userspace (the loader) to enumerate whether all objects in its address space are compiled with shadow stack and landing pad support and enable the feature accordingly. Additionally, if a subsequent `dlopen` happens on a library, user mode can decide again to disable the feature (if the incoming library is not compiled with support) OR terminate the task (if user mode policy strictly requires all objects in the address space to be compiled with the control flow integrity cpu feature). The prctl to enable shadow stack results in allocating a shadow stack from virtual memory and activating it for the user address space. x86 and arm64 are also following the same direction for similar reason(s).

clone/fork:

On clone and fork, the cfi state for the task is inherited by the child. Shadow stack is part of virtual memory and is writeable memory from the kernel's perspective (writeable via a restricted set of instructions, aka shadow stack instructions). Thus kernel changes ensure that this memory is converted to read-only when fork/clone happens and COWed when a fault is taken due to sspush, sspopchk or ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled, the kernel will automatically allocate a shadow stack for that clone call.

map_shadow_stack:

x86 introduced the `map_shadow_stack` system call to allow user space to explicitly map shadow stack memory in its address space.
It is useful for allocating shadow stacks for different contexts managed by a single thread (green threads or contexts). risc-v implements this system call as well.

signal management:

If shadow stack is enabled for a task, the kernel performs an asynchronous control flow diversion to deliver the signal and eventually expects userspace to issue sigreturn so that original execution can be resumed. Even though the resume context is prepared by the kernel, it is in user space memory and is subject to memory corruption, and corruption bugs can be utilized by an attacker in this race window to perform an arbitrary sigreturn and eventually bypass the cfi mechanism. Another issue is how to ensure that cfi related state in the sigcontext area is not trampled by legacy apps or apps compiled with old kernel headers.

In order to mitigate control-flow hijacking, the kernel prepares a token, places it on the shadow stack before signal delivery, and places the address of the token in the sigcontext structure. During sigreturn, the kernel obtains the address of the token from the sigcontext structure, reads the token from the shadow stack, validates it, and only then allows sigreturn to succeed. The compatibility issue is solved by adopting the dynamic sigcontext management introduced for the vector extension. This series refactors the code a little bit to make future sigcontext management easy (as proposed by Andy Chiu from SiFive).

config and compilation:

Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this config option picks the kernel support for user control flow integrity. This option is presented only if the toolchain has shadow stack and landing pad support, and is on purpose guarded by toolchain support. The reason is that eventually the vDSO also needs to be compiled with shadow stack and landing pad support. vDSO compile patches are not included as of now because the landing pad labeling scheme is yet to settle for usermode runtime.
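
As a rough illustration of the prctl interface mentioned under "Enabling", the sketch below only queries the shadow stack status of the calling task. It assumes the arch-agnostic PR_GET_SHADOW_STACK_STATUS prctl and PR_SHADOW_STACK_ENABLE bit from mainline include/uapi/linux/prctl.h; the fallback values are assumptions for older headers, and sketch_shadow_stack_enabled() is a hypothetical helper name. Enabling is deliberately not shown, since it has to happen in a context that never returns through a frame created before the shadow stack existed.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Fallback definitions for older userspace headers. The values are
 * assumed to match mainline include/uapi/linux/prctl.h.
 */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Returns 1 if shadow stack is enabled for the calling task, 0 if it is
 * disabled, or a negative errno if the kernel or architecture does not
 * support the query.
 */
static int sketch_shadow_stack_enabled(void)
{
	unsigned long status = 0;

	if (prctl(PR_GET_SHADOW_STACK_STATUS, (unsigned long)&status,
		  0UL, 0UL, 0UL) != 0)
		return -errno;
	return (status & PR_SHADOW_STACK_ENABLE) ? 1 : 0;
}
```

On a kernel or architecture without support, the call is expected to fail and the helper returns a negative errno rather than 0, so callers can tell "disabled" apart from "not available".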
To get more information on kernel interactions with respect to zicfilp and zicfiss, the patch series adds documentation for `zicfilp` and `zicfiss` in the following:

Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst

How to test this series
=======================

Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)

Qemu
----
Get the latest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)

Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic

Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain supports it.

$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)

Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true

References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/

To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H.
Peter Anvin <hpa(a)zytor.com> To: Andrew Morton <akpm(a)linux-foundation.org> To: Liam R. Howlett <Liam.Howlett(a)oracle.com> To: Vlastimil Babka <vbabka(a)suse.cz> To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> To: Paul Walmsley <paul.walmsley(a)sifive.com> To: Palmer Dabbelt <palmer(a)dabbelt.com> To: Albert Ou <aou(a)eecs.berkeley.edu> To: Conor Dooley <conor(a)kernel.org> To: Rob Herring <robh(a)kernel.org> To: Krzysztof Kozlowski <krzk+dt(a)kernel.org> To: Arnd Bergmann <arnd(a)arndb.de> To: Christian Brauner <brauner(a)kernel.org> To: Peter Zijlstra <peterz(a)infradead.org> To: Oleg Nesterov <oleg(a)redhat.com> To: Eric Biederman <ebiederm(a)xmission.com> To: Kees Cook <kees(a)kernel.org> To: Jonathan Corbet <corbet(a)lwn.net> To: Shuah Khan <shuah(a)kernel.org> To: Jann Horn <jannh(a)google.com> To: Conor Dooley <conor+dt(a)kernel.org> To: Miguel Ojeda <ojeda(a)kernel.org> To: Alex Gaynor <alex.gaynor(a)gmail.com> To: Boqun Feng <boqun.feng(a)gmail.com> To: Gary Guo <gary(a)garyguo.net> To: Björn Roy Baron <bjorn3_gh(a)protonmail.com> To: Benno Lossin <benno.lossin(a)proton.me> To: Andreas Hindborg <a.hindborg(a)kernel.org> To: Alice Ryhl <aliceryhl(a)google.com> To: Trevor Gross <tmgross(a)umich.edu> Cc: linux-kernel(a)vger.kernel.org Cc: linux-fsdevel(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-riscv(a)lists.infradead.org Cc: devicetree(a)vger.kernel.org Cc: linux-arch(a)vger.kernel.org Cc: linux-doc(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: alistair.francis(a)wdc.com Cc: richard.henderson(a)linaro.org Cc: jim.shu(a)sifive.com Cc: andybnac(a)gmail.com Cc: kito.cheng(a)sifive.com Cc: charlie(a)rivosinc.com Cc: atishp(a)rivosinc.com Cc: evan(a)rivosinc.com Cc: cleger(a)rivosinc.com Cc: alexghiti(a)rivosinc.com Cc: samitolvanen(a)google.com Cc: broonie(a)kernel.org Cc: rick.p.edgecombe(a)intel.com Cc: rust-for-linux(a)vger.kernel.org changelog --------- v22: - CONFIG_RISCV_USER_CFI was by default "n". 
With dual vdso support it is default "y" (if toolchain supports it).
  Fixing build error due to "-march=zicfiss" being only partially handled in gcc-13: gcc-13 recognizes the flag but does not actually do any codegen or recognize instructions for zicfiss. The change in v22 adds a dependency on the `-fcf-protection=full` compiler flag to ensure that the toolchain has support; only then will CONFIG_RISCV_USER_CFI be visible in menuconfig.
- picked up tags and some cosmetic changes in commit message for dual vdso patch.

v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h.
  Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h.
  vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI is selected.

v20:
- rebased on v6.18-rc1.
- Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected two vDSOs are compiled (one for hardware prior to RVA23 and one for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu implements zimop, else exposes existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"

v19:
- riscv_nousercfi was `int`. changed it to unsigned long. Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done before `riscv_v_context_nesting_end` on trap exit path. If kernel shadow stack were enabled this would result in kernel operating on user shadow stack and panic (as I found in my testing of kcfi patch series). So fixed that.

v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files. added that. Additional asm file for vdso needed the elf marker flag. toolchain should complain if `-fcf-protection=full` and marker is missing for object generated from asm file. Asked toolchain folks to fix this. Although no reason to gate the merge on that.
- Split up compile options for march and fcf-protection in vdso Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu.
  Added `arch/riscv/configs/hardening.config` fragment which selects CONFIG_RISCV_USER_CFI

v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with "riscv/traps: Introduce software check exception and uprobe handling"
  https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/

v16:
- If FWFT is not implemented or returns error for shadow stack activation, then no_usercfi is set to disable shadow stack. Although this should be picked up by extension validation and activation. Fixed this bug for zicfilp and zicfiss both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to keep it off till we have more hardware availability with RVA23 profile and zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in asm-offsets.c error.

v15:
- Toolchain has been updated to include `-fcf-protection` flag. This exists for x86 as well. Updated kernel patches to compile vDSO and selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.

v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches.
  Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation is not disturbed with respect to member fields of thread_info in single cacheline.

v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently. updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace kselftest.
- cosmetic and grammatical changes to documentation.

v12:
- It seems like I had accidentally squashed arch agnostic indirect branch tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.

v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to `lpad 0`.

v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch is not that interesting to this patch series for risc-v. There are instances in arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure we add single vdso object with cfi enabled. But a vdso object with scheme of zero labeled landing pad is least common denominator and should work with all objects of zero labeled as well as function-signature labeled objects.

v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)

v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches. they are in palmer/for-next

v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv". Instead using `deactivate_mm` flow to clean up. see here for more context
  https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch "riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0. Any future evolution of this interface should accordingly define how arg should be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper `is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…

v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in `thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected (this was introduced due to re-factoring signal context management code)

v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst (style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base of shadow stack.
- Fixed warnings on definitions added in usercfi.h when CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using FWFT (https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
  (Note: I had an issue in my workflow due to which version number wasn't picked up correctly while sending out patches)

v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/

v3:
- envcfg
  logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been picked on per task basis, even though CPU didn't implement it. Fixed in this series.
- dt-bindings
  As suggested, split into separate commit. fixed the messaging that spec is in public review
- arch_is_shadow_stack change
  arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
  zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
  As suggested, added object and binary filenames to .gitignore. Selftest binary anyways need to be compiled with cfi enabled compiler which will make sure that landing pad and shadow stack are enabled. Thus removed separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/

v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime - This patch series introduces arch agnostic `prctls` to enable shadow stack and indirect branch tracking. And implements them on riscv. --- Changes in v22: - Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri… Changes in v21: - Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri… Changes in v20: - Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri… Changes in v19: - Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri… Changes in v18: - Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri… Changes in v17: - Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri… Changes in v16: - Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri… Changes in v15: - changelog posted just below cover letter - Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri… Changes in v14: - changelog posted just below cover letter - Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri… Changes in v13: - changelog posted just below cover letter - Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri… Changes in v12: - changelog posted just below cover letter - Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri… Changes in v11: - changelog posted just below cover letter - Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri… --- Andy Chiu (1): riscv: signal: abstract header saving for setup_sigcontext Deepak Gupta (26): mm: VM_SHADOW_STACK definition for riscv dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml) riscv: zicfiss / zicfilp enumeration riscv: zicfiss / 
zicfilp extension csr and bit definitions riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE riscv/mm: manufacture shadow stack pte riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs riscv/mm: write protect and shadow stack riscv/mm: Implement map_shadow_stack() syscall riscv/shstk: If needed allocate a new shadow stack on clone riscv: Implements arch agnostic shadow stack prctls prctl: arch-agnostic prctl for indirect branch tracking riscv: Implements arch agnostic indirect branch tracking prctls riscv/traps: Introduce software check exception and uprobe handling riscv/signal: save and restore of shadow stack for signal riscv/kernel: update __show_regs to print shadow stack register riscv/ptrace: riscv cfi status and state via ptrace and in core files riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe riscv: kernel command line option to opt out of user cfi riscv: enable kernel access to shadow stack memory via FWFT sbi call arch/riscv: dual vdso creation logic and select vdso based on hw riscv: create a config for shadow stack and landing pad instr support riscv: Documentation for landing pad / indirect branch tracking riscv: Documentation for shadow stack on riscv kselftest/riscv: kselftest for user mode cfi Jim Shu (1): arch/riscv: compile vdso with landing pad and shadow stack note Documentation/admin-guide/kernel-parameters.txt | 8 + Documentation/arch/riscv/index.rst | 2 + Documentation/arch/riscv/zicfilp.rst | 115 +++++ Documentation/arch/riscv/zicfiss.rst | 179 +++++++ .../devicetree/bindings/riscv/extensions.yaml | 14 + arch/riscv/Kconfig | 22 + arch/riscv/Makefile | 8 +- arch/riscv/configs/hardening.config | 4 + arch/riscv/include/asm/asm-prototypes.h | 1 + arch/riscv/include/asm/assembler.h | 44 ++ arch/riscv/include/asm/cpufeature.h | 12 + arch/riscv/include/asm/csr.h | 16 + arch/riscv/include/asm/entry-common.h | 2 + arch/riscv/include/asm/hwcap.h | 2 + 
arch/riscv/include/asm/mman.h | 26 + arch/riscv/include/asm/mmu_context.h | 7 + arch/riscv/include/asm/pgtable.h | 30 +- arch/riscv/include/asm/processor.h | 1 + arch/riscv/include/asm/thread_info.h | 3 + arch/riscv/include/asm/usercfi.h | 95 ++++ arch/riscv/include/asm/vdso.h | 13 +- arch/riscv/include/asm/vector.h | 3 + arch/riscv/include/uapi/asm/hwprobe.h | 2 + arch/riscv/include/uapi/asm/ptrace.h | 34 ++ arch/riscv/include/uapi/asm/sigcontext.h | 1 + arch/riscv/kernel/Makefile | 2 + arch/riscv/kernel/asm-offsets.c | 10 + arch/riscv/kernel/cpufeature.c | 27 + arch/riscv/kernel/entry.S | 38 ++ arch/riscv/kernel/head.S | 27 + arch/riscv/kernel/process.c | 27 +- arch/riscv/kernel/ptrace.c | 95 ++++ arch/riscv/kernel/signal.c | 148 +++++- arch/riscv/kernel/sys_hwprobe.c | 2 + arch/riscv/kernel/sys_riscv.c | 10 + arch/riscv/kernel/traps.c | 54 ++ arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++ arch/riscv/kernel/vdso.c | 7 + arch/riscv/kernel/vdso/Makefile | 40 +- arch/riscv/kernel/vdso/flush_icache.S | 4 + arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +- arch/riscv/kernel/vdso/getcpu.S | 4 + arch/riscv/kernel/vdso/note.S | 3 + arch/riscv/kernel/vdso/rt_sigreturn.S | 4 + arch/riscv/kernel/vdso/sys_hwprobe.S | 4 + arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +- arch/riscv/kernel/vdso_cfi/Makefile | 25 + arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 + arch/riscv/mm/init.c | 2 +- arch/riscv/mm/pgtable.c | 16 + include/linux/cpu.h | 4 + include/linux/mm.h | 7 + include/uapi/linux/elf.h | 2 + include/uapi/linux/prctl.h | 27 + kernel/sys.c | 30 ++ tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/cfi/.gitignore | 3 + tools/testing/selftests/riscv/cfi/Makefile | 16 + tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++ tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 173 +++++++ tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++ tools/testing/selftests/riscv/cfi/shadowstack.h | 27 + 62 files changed, 2475 
insertions(+), 41 deletions(-) --- base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787 change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2 -- - debug
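
The cover letter above mentions arch-agnostic prctls for enabling shadow stacks. As a minimal sketch of how user space might query the current state, assuming the upstream `PR_GET_SHADOW_STACK_STATUS` interface from `linux/prctl.h`: the fallback constant values below are assumptions for older headers, and `get_shadow_stack_status()` and `report_shadow_stack()` are hypothetical helper names, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older uapi headers; values assumed from upstream linux/prctl.h. */
#ifndef PR_GET_SHADOW_STACK_STATUS
#define PR_GET_SHADOW_STACK_STATUS 74
#endif
#ifndef PR_SHADOW_STACK_ENABLE
#define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/*
 * Ask the kernel for the calling thread's shadow stack status.
 * Returns 0 and fills *status on success, or a negative errno value
 * (for example -EINVAL when the kernel or CPU lacks support).
 */
static int get_shadow_stack_status(unsigned long *status)
{
	assert(status != NULL);

	if (prctl(PR_GET_SHADOW_STACK_STATUS, (unsigned long)status, 0, 0, 0) == -1)
		return -errno;
	return 0;
}

/* Print a one-line summary; hypothetical helper for illustration only. */
static void report_shadow_stack(void)
{
	unsigned long status = 0;
	int ret = get_shadow_stack_status(&status);

	if (ret)
		printf("shadow stack status unavailable (errno %d)\n", -ret);
	else
		printf("shadow stack %s\n",
		       (status & PR_SHADOW_STACK_ENABLE) ? "enabled" : "disabled");
}
```

On a system without user shadow stack support the query is expected to fail, so callers should treat an error as "not available" rather than as fatal.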

1 month, 2 weeks

7
44

[PATCH] selftest: net: fix variable sized type not at the end of struct warnings

by Ankit Khushwaha

Some network selftests define variable-sized types that are not at the end of a struct, causing -Wgnu-variable-sized-type-not-at-end warnings.

warning:
timestamping.c:285:18: warning: field 'cm' with variable sized type 'struct cmsghdr' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
  285 |                 struct cmsghdr cm;
      |                                ^
ipsec.c:835:5: warning: field 'u' with variable sized type 'union (unnamed union at ipsec.c:831:3)' not at the end of a struct or class is a GNU extension [-Wgnu-variable-sized-type-not-at-end]
  835 |                 } u;
      |                   ^

This patch moves these fields to the end of the struct to fix these warnings.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
 tools/testing/selftests/net/ipsec.c        | 2 +-
 tools/testing/selftests/net/timestamping.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
index 0ccf484b1d9d..36083c8f884f 100644
--- a/tools/testing/selftests/net/ipsec.c
+++ b/tools/testing/selftests/net/ipsec.c
@@ -828,12 +828,12 @@ static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
 		struct xfrm_desc *desc)
 {
 	struct {
+		char buf[XFRM_ALGO_KEY_BUF_SIZE];
 		union {
 			struct xfrm_algo alg;
 			struct xfrm_algo_aead aead;
 			struct xfrm_algo_auth auth;
 		} u;
-		char buf[XFRM_ALGO_KEY_BUF_SIZE];
 	} alg = {};
 	size_t alen, elen, clen, aelen;
 	unsigned short type;
diff --git a/tools/testing/selftests/net/timestamping.c b/tools/testing/selftests/net/timestamping.c
index 044bc0e9ed81..ad2be2143698 100644
--- a/tools/testing/selftests/net/timestamping.c
+++ b/tools/testing/selftests/net/timestamping.c
@@ -282,8 +282,8 @@ static void recvpacket(int sock, int recvmsg_flags,
 	struct iovec entry;
 	struct sockaddr_in from_addr;
 	struct {
-		struct cmsghdr cm;
 		char control[512];
+		struct cmsghdr cm;
 	} control;
 	int res;

--
2.51.0
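
For control-message buffers in particular, the cmsg(3) manual page suggests a different layout: wrap the byte array in a union with a `struct cmsghdr` member so the buffer is suitably aligned, without placing the variable-sized header type in the middle of a struct. The following is a minimal sketch of that idiom, not what the patch above does; `union ctrl_buf` and `ctrl_buf_put_int()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Control buffer aligned for struct cmsghdr. The union keeps the
 * variable-sized header type out of the middle of a struct while
 * still guaranteeing suitable alignment for the byte array.
 */
union ctrl_buf {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/*
 * Point msg at the control buffer and store one int payload in it.
 * Returns the first control header, or NULL if the buffer is too small.
 */
static struct cmsghdr *ctrl_buf_put_int(struct msghdr *msg, union ctrl_buf *c,
					int level, int type, int value)
{
	struct cmsghdr *cm;

	assert(msg != NULL && c != NULL);

	memset(c, 0, sizeof(*c));
	msg->msg_control = c->buf;
	msg->msg_controllen = sizeof(c->buf);

	cm = CMSG_FIRSTHDR(msg);
	if (!cm)
		return NULL;

	cm->cmsg_level = level;
	cm->cmsg_type = type;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &value, sizeof(value));
	return cm;
}
```

With this layout the header is always accessed through `CMSG_FIRSTHDR()`, so the position of a named `cm` field inside the buffer no longer matters.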

1 month, 2 weeks

2
3

[PATCH] selftests/tracing: Run sample events to clear page cache events

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

The tracing selftest "event-filter-function.tc" was failing because it first runs the "sample_events" function that triggers the kmem_cache_free event and it looks at what function was used during a call to "ls". But the first time it calls this, it could trigger events that are used to pull pages into the page cache. The rest of the test uses the function it finds during that call to see if it will be called in subsequent "sample_events" calls. But if there's no need to pull pages into the page cache, it will not trigger that function and the test will fail.

Call the "sample_events" twice to trigger all the page cache work before it calls it to find a function to use in subsequent checks.

Cc: stable(a)vger.kernel.org
Fixes: eb50d0f250e96 ("selftests/ftrace: Choose target function for filter test from samples")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
 .../selftests/ftrace/test.d/filter/event-filter-function.tc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index c62165fabd0c..003f612f57b0 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -20,6 +20,10 @@ sample_events() {
 echo 0 > tracing_on
 echo 0 > events/enable

+# Clear functions caused by page cache; run sample_events twice
+sample_events
+sample_events
+
 echo "Get the most frequently calling function"
 echo > trace
 sample_events
--
2.51.0

1 month, 2 weeks

3
3

[PATCH net-next 0/3] Add YNL test framework and library improvements

by Hangbin Liu

This series adds functionality to the YNL tools and introduces a YNL selftest framework.

Changes include:
- Add MAC address parsing support in YNL library
- Fix rt-rule spec consistency with other rt-* families
- Add selftests covering CLI and ethtool functionality

The tests provide usage examples and regression testing for YNL tools.

Hangbin Liu (3):
  tools: ynl: Add MAC address parsing support
  netlink: specs: update rt-rule src/dst attribute types to support IPv4 addresses
  selftests: net: add YNL test framework

 Documentation/netlink/specs/rt-rule.yaml   |   6 +-
 tools/net/ynl/pyynl/lib/ynl.py             |   9 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/net/ynl/Makefile   |  18 ++
 tools/testing/selftests/net/ynl/cli.sh     | 234 +++++++++++++++++++++
 tools/testing/selftests/net/ynl/config     |   6 +
 tools/testing/selftests/net/ynl/ethtool.sh | 188 +++++++++++++++++
 tools/testing/selftests/net/ynl/settings   |   1 +
 8 files changed, 461 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/net/ynl/Makefile
 create mode 100755 tools/testing/selftests/net/ynl/cli.sh
 create mode 100644 tools/testing/selftests/net/ynl/config
 create mode 100755 tools/testing/selftests/net/ynl/ethtool.sh
 create mode 100644 tools/testing/selftests/net/ynl/settings

--
2.50.1

1 month, 2 weeks

3
22

[PATCH v6 0/2] KVM: guest_memfd: use write for population

by Kalyazin, Nikita

[ based on kvm/next ]

Implement guest_memfd population via the write syscall. This is useful in non-CoCo use cases where the host can access guest memory. Even though the same can also be achieved via userspace mapping and memcpying from userspace, write provides a more performant option because it does not need to set up page tables and it does not cause a page fault for every page like memcpy would. Note that memcpy cannot be accelerated via MADV_POPULATE_WRITE as it is not supported by guest_memfd and relies on GUP.

Populating 512MiB of guest_memfd on an x86 machine:
- via memcpy: 436 ms
- via write: 202 ms (-54%)

The write syscall support is conditional on kvm_gmem_supports_mmap. When in-place shared/private conversion is supported, write should only be allowed on shared pages.

v6:
- Make write support conditional on mmap support instead of relying on the up-to-date flag to decide whether writing to a page is allowed
- James: Remove dependencies on folio_test_large
- James: Remove page alignment restriction
- James: Formatting fixes

v5:
- https://lore.kernel.org/kvm/20250902111951.58315-1-kalyazin@amazon.com/
- Replace the call to the unexported filemap_remove_folio with zeroing the bytes that could not be copied
- Fix checkpatch findings

v4:
- https://lore.kernel.org/kvm/20250828153049.3922-1-kalyazin@amazon.com
- Switch from implementing the write callback to write_iter
- Remove conditional compilation

v3:
- https://lore.kernel.org/kvm/20250303130838.28812-1-kalyazin@amazon.com
- David/Mike D: Only compile support for the write syscall if CONFIG_KVM_GMEM_SHARED_MEM (now gone) is enabled.

v2:
- https://lore.kernel.org/kvm/20241129123929.64790-1-kalyazin@amazon.com
- Switch from an ioctl to the write syscall to implement population

v1:
- https://lore.kernel.org/kvm/20241024095429.54052-1-kalyazin@amazon.com

Nikita Kalyazin (2):
  KVM: guest_memfd: add generic population via write
  KVM: selftests: update guest_memfd write tests

 .../testing/selftests/kvm/guest_memfd_test.c | 51 ++++++++++++++++---
 virt/kvm/guest_memfd.c                       | 49 ++++++++++++++++++
 2 files changed, 94 insertions(+), 6 deletions(-)

base-commit: 6b36119b94d0b2bb8cea9d512017efafd461d6ac
--
2.50.1
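
From the caller's side, populating through write(2) comes down to an ordinary write loop. A minimal sketch, assuming `fd` is a guest_memfd that accepts write(2) as proposed in this series; `write_all()` is a hypothetical helper name, and the loop itself works on any writable descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write exactly len bytes from buf to fd, retrying on short writes
 * and EINTR. Returns 0 on success or a negative errno value.
 * The fd is assumed to be a guest_memfd that accepts write(2);
 * nothing in this loop is specific to guest_memfd.
 */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* no progress; avoid spinning forever */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Unlike the mmap plus memcpy path, this needs no mapping in the caller, which matches the cover letter's point about avoiding page table setup and per-page faults.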

1 month, 3 weeks

4
8

[PATCH net-next v8 00/14] vsock: add namespace support to vhost-vsock

by Bobby Eshleman

This series adds namespace support to vhost-vsock and loopback. It does not add namespaces to any of the other guest transports (virtio-vsock, hyperv, or vmci). The current revision supports two modes: local and global. Local mode is complete isolation of namespaces, while global mode is complete sharing between namespaces of CIDs (the original behavior). The mode is set using /proc/sys/net/vsock/ns_mode. Modes are per-netns and write-once. This allows a system to configure namespaces independently (some may share CIDs, others are completely isolated). This also supports future possible mixed use cases, where there may be namespaces in global mode spinning up VMs while there are mixed mode namespaces that provide services to the VMs, but are not allowed to allocate from the global CID pool (this mode not implemented in this series). If a socket or VM is created when a namespace is global but the namespace changes to local, the socket or VM will continue working normally. That is, the socket or VM assumes the mode behavior of the namespace at the time the socket/VM was created. The original mode is captured in vsock_create() and so occurs at the time of socket(2) and accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This prevents a socket/VM connection from suddenly breaking due to a namespace mode change. Any new sockets/VMs created after the mode change will adopt the new mode's behavior. 
Additionally, added tests for the new namespace features:

tools/testing/selftests/vsock/vmtest.sh
1..30
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_ns_mode_write_once_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_mode_change_connection_continue_vm_ok
ok 24 ns_mode_change_connection_continue_host_ok
ok 25 ns_mode_change_connection_continue_both_ok
ok 26 ns_delete_vm_ok
ok 27 ns_delete_host_ok
ok 28 ns_delete_both_ok
ok 29 ns_loopback_global_global_late_module_load_ok
ok 30 ns_loopback_local_local_late_module_load_fails
SUMMARY: PASS=30 SKIP=0 FAIL=0

Dependent on series:
https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…

Thanks again for everyone's help and reviews!

Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>

To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Stefan Hajnoczi <stefanha(a)redhat.com>
To: Michael S. Tsirkin <mst(a)redhat.com>
To: Jason Wang <jasowang(a)redhat.com>
To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
To: Eugenio Pérez <eperezma(a)redhat.com>
To: K. Y. Srinivasan <kys(a)microsoft.com>
To: Haiyang Zhang <haiyangz(a)microsoft.com>
To: Wei Liu <wei.liu(a)kernel.org>
To: Dexuan Cui <decui(a)microsoft.com>
To: Bryan Tan <bryan-bt.tan(a)broadcom.com>
To: Vishnu Dasa <vishnu.dasa(a)broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: linux-hyperv(a)vger.kernel.org
Cc: berrange(a)redhat.com

Changes in v8:
- Break generic cleanup/refactoring patches into standalone series, remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com

Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com

Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger language around CID rules (don't use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com

Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com

Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context in commit message)
- lots of cleanup

Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2: https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com

Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1: https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com

Changes in v1:
- added 'netns' module param to vsock.ko to enable the network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/

---
Bobby Eshleman (14):
  vsock: a per-net vsock NS mode state
  vsock/virtio: pack struct virtio_vsock_skb_cb
  vsock: add netns to vsock skb cb
  vsock: add netns to vsock core
  vsock/loopback: add netns support
  vsock/virtio: add netns to virtio transport common
  vhost/vsock: add netns support
  selftests/vsock: add namespace helpers to vmtest.sh
  selftests/vsock: prepare vm management helpers for namespaces
  selftests/vsock: add tests for proc sys vsock ns_mode
  selftests/vsock: add namespace tests for CID collisions
  selftests/vsock: add tests for host <-> vm connectivity with namespaces
  selftests/vsock: add tests for namespace deletion and mode changes
  selftests/vsock: add tests for module loading order

 MAINTAINERS                             |    1 +
 drivers/vhost/vsock.c                   |   48 +-
 include/linux/virtio_vsock.h            |   47 +-
 include/net/af_vsock.h                  |   70 ++-
 include/net/net_namespace.h             |    4 +
 include/net/netns/vsock.h               |   22 +
 net/vmw_vsock/af_vsock.c                |  264 +++++++-
 net/vmw_vsock/virtio_transport.c        |    7 +-
 net/vmw_vsock/virtio_transport_common.c |   21 +-
 net/vmw_vsock/vsock_loopback.c          |   89 ++-
 tools/testing/selftests/vsock/vmtest.sh | 1044 ++++++++++++++++++++++++++++++-
 11 files changed, 1532 insertions(+), 85 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463(a)meta.com>
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df

Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
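
A small user-space sketch of inspecting the per-netns mode described in this cover letter. It assumes the sysctl file holds the mode as a text string ("global" or "local"), which the cover letter implies but does not spell out; `vsock_read_ns_mode()` and `vsock_mode_is_local()` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Path taken from the cover letter; only present on kernels carrying this series. */
#define VSOCK_NS_MODE_PATH "/proc/sys/net/vsock/ns_mode"

/*
 * Read the namespace mode string from path into buf, stripping the
 * trailing newline. The textual format ("global" or "local") is an
 * assumption based on the cover letter.
 * Returns 0 on success or a negative errno value.
 */
static int vsock_read_ns_mode(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path != NULL && buf != NULL && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* True when the mode string means full isolation between namespaces. */
static int vsock_mode_is_local(const char *mode)
{
	return strcmp(mode, "local") == 0;
}
```

A caller would pass `VSOCK_NS_MODE_PATH` as the path. Because the mode is described as write-once per namespace, reading it before creating sockets or VMs is the safer order of operations.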

1 month, 3 weeks

2
35

[PATCH v2] selftests: harness: Support KCOV.

by Kuniyuki Iwashima

While writing a selftest with kselftest_harness.h, I often want to check which paths are actually exercised.

Let's support generating KCOV coverage data.

We can specify the output directory via the KCOV_OUTPUT environment variable, and the number of instructions to collect via the KCOV_SLOTS environment variable.

  # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \
    ./tools/testing/selftests/net/af_unix/scm_inq

Both variables can also be specified as the make variable.

  # make -C tools/testing/selftests/ \
    KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \
    kselftest_override_timeout=60 TARGETS=net/af_unix run_tests

The coverage data can be simply decoded with addr2line:

  $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix
  net/unix/af_unix.c:1056
  net/unix/af_unix.c:3138
  net/unix/af_unix.c:3834
  net/unix/af_unix.c:3838
  net/unix/af_unix.c:311 (discriminator 2)
  ...

or more nicely with a script embedded in vock [0]:

  $ cat kcov/* | sort | uniq > local.log
  $ python3 ~/kernel/tools/vock/report.py \
    --kernel-src ./ --vmlinux ./vmlinux \
    --mode local --local-log local.log --filter unix
  ...
  ------------------------------- Coverage Report --------------------------------
  📄 net/unix/af_unix.c (276 lines)
  ...
  942 | static int unix_setsockopt(struct socket *sock, int level, int optname,
  943 |                            sockptr_t optval, unsigned int optlen)
  944 | {
  ...
961 | switch (optname) { 962 | case SO_INQ: 963 > if (sk->sk_type != SOCK_STREAM) 964 | return -EINVAL; 965 | 966 > if (val > 1 || val < 0) 967 | return -EINVAL; 968 | 969 > WRITE_ONCE(u->recvmsg_inq, val); 970 | break; Link: https://github.com/kzall0c/vock/blob/f3d97de9954f9df758c0ab287ca7e24e654288… #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu(a)google.com> --- v2: Support TEST() v1: https://lore.kernel.org/linux-kselftest/20251017084022.3721950-1-kuniyu@goo… --- Documentation/dev-tools/kselftest.rst | 41 ++++++ tools/testing/selftests/Makefile | 14 ++- tools/testing/selftests/kselftest_harness.h | 133 +++++++++++++++++++- 3 files changed, 178 insertions(+), 10 deletions(-) diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst index 18c2da67fae4..5c2b92ac4a30 100644 --- a/Documentation/dev-tools/kselftest.rst +++ b/Documentation/dev-tools/kselftest.rst @@ -200,6 +200,47 @@ You can look at the TAP output to see if you ran into the timeout. Test runners which know a test must run under a specific time can then optionally treat these timeouts then as fatal. +KCOV for selftests +================== + +Selftests built with `kselftest_harness.h` natively support generating +KCOV coverage data. See :doc:`KCOV: code coverage for fuzzing </dev-tools/kcov>` +for prerequisites. + +You can specify the output directory with the `KCOV_OUTPUT` environment +variable. Additionally, you can specify the number of instructions to +collect with the `KCOV_SLOTS` environment variable :: + + # KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 2)) \ + ./tools/testing/selftests/net/af_unix/scm_inq + +In the output directory, a coverage file is generated for each test +case in the selftest :: + + $ ls kcov/ + scm_inq.dgram.basic scm_inq.seqpacket.basic scm_inq.stream.basic + +The default value of `KCOV_SLOTS` is `4096`, and `KCOV_SLOTS` multiplied +by `sizeof(unsigned long)` must be multiple of `4096`, so the smallest +value is `512`. 
+ +Both `KCOV_OUTPUT` and `KCOV_SLOTS` can be specified as the variables +on the `make` command line :: + + # make -C tools/testing/selftests/ \ + kselftest_override_timeout=60 \ + KCOV_OUTPUT=$PWD/kcov KCOV_SLOTS=$((4096 * 4)) \ + TARGETS=net/af_unix run_tests + +The collected data can be decoded with `addr2line` :: + + $ cat kcov/* | sort | uniq | addr2line -e vmlinux | grep unix + net/unix/af_unix.c:1056 + net/unix/af_unix.c:3138 + net/unix/af_unix.c:3834 + net/unix/af_unix.c:3838 + ... + Packaging selftests =================== diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c46ebdb9b8ef..40e70fb1a347 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -218,12 +218,14 @@ all: done; exit $$ret; run_tests: all - @for TARGET in $(TARGETS); do \ - BUILD_TARGET=$$BUILD/$$TARGET; \ - $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests \ - SRC_PATH=$(shell readlink -e $$(pwd)) \ - OBJ_PATH=$(BUILD) \ - O=$(abs_objtree); \ + @for TARGET in $(TARGETS); do \ + BUILD_TARGET=$$BUILD/$$TARGET; \ + $(MAKE) OUTPUT=$$BUILD_TARGET \ + KCOV_OUTPUT=$(abspath $(KCOV_OUTPUT)) \ + -C $$TARGET run_tests \ + SRC_PATH=$(shell readlink -e $$(pwd)) \ + OBJ_PATH=$(BUILD) \ + O=$(abs_objtree); \ done; hotplug: diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index 3f66e862e83e..5b7a01722981 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -56,6 +56,8 @@ #include <asm/types.h> #include <ctype.h> #include <errno.h> +#include <fcntl.h> +#include <linux/kcov.h> #include <linux/unistd.h> #include <poll.h> #include <stdbool.h> @@ -63,7 +65,9 @@ #include <stdio.h> #include <stdlib.h> #include <string.h> +#include <sys/ioctl.h> #include <sys/mman.h> +#include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> @@ -175,9 +179,12 @@ static void test_name(struct __test_metadata 
*_metadata); \ static void wrapper_##test_name( \ struct __test_metadata *_metadata, \ - struct __fixture_variant_metadata __attribute__((unused)) *variant) \ + struct __fixture_variant_metadata __attribute__((unused)) *variant, \ + char *test_full_name) \ { \ + enable_kcov(_metadata); \ test_name(_metadata); \ + disable_kcov(_metadata, test_full_name); \ } \ static struct __test_metadata _##test_name##_object = \ { .name = #test_name, \ @@ -401,7 +408,8 @@ const FIXTURE_VARIANT(fixture_name) *variant); \ static void wrapper_##fixture_name##_##test_name( \ struct __test_metadata *_metadata, \ - struct __fixture_variant_metadata *variant) \ + struct __fixture_variant_metadata *variant, \ + char *test_full_name) \ { \ /* fixture data is alloced, setup, and torn down per call. */ \ FIXTURE_DATA(fixture_name) self_private, *self = NULL; \ @@ -430,7 +438,9 @@ if (_metadata->exit_code) \ _exit(0); \ *_metadata->no_teardown = false; \ + enable_kcov(_metadata); \ fixture_name##_##test_name(_metadata, self, variant->data); \ + disable_kcov(_metadata, test_full_name); \ _metadata->teardown_fn(false, _metadata, self, variant->data); \ _exit(0); \ } else if (child < 0 || child != waitpid(child, &status, 0)) { \ @@ -470,6 +480,8 @@ object->teardown_fn = &wrapper_##fixture_name##_##test_name##_teardown; \ object->termsig = signal; \ object->timeout = tmout; \ + object->kcov_fd = -1; \ + object->kcov_slots = -1; \ _##fixture_name##_##test_name##_object = object; \ __register_test(object); \ } \ @@ -908,7 +920,8 @@ __register_fixture_variant(struct __fixture_metadata *f, struct __test_metadata { const char *name; void (*fn)(struct __test_metadata *, - struct __fixture_variant_metadata *); + struct __fixture_variant_metadata *, + char *test_name); pid_t pid; /* pid of test when being run */ struct __fixture_metadata *fixture; void (*teardown_fn)(bool in_parent, struct __test_metadata *_metadata, @@ -923,6 +936,10 @@ struct __test_metadata { const void *variant; struct 
__test_results *results; struct __test_metadata *prev, *next; + int kcov_fd; + int kcov_slots; + char *kcov_dir; + unsigned long *kcov_mem; }; static inline bool __test_passed(struct __test_metadata *metadata) @@ -1185,6 +1202,114 @@ static bool test_enabled(int argc, char **argv, return !has_positive; } +#define KCOV_SLOTS 4096 + +static void enable_kcov(struct __test_metadata *t) +{ + char *slots; + int err; + + t->kcov_dir = getenv("KCOV_OUTPUT"); + if (!t->kcov_dir || *t->kcov_dir == '\0') + return; + + slots = getenv("KCOV_SLOTS"); + if (slots && *slots != '\0') + sscanf(slots, "%d", &t->kcov_slots); + if (t->kcov_slots <= 0) + t->kcov_slots = KCOV_SLOTS; + + t->kcov_fd = open("/sys/kernel/debug/kcov", O_RDWR); + if (t->kcov_fd < 0) { + ksft_print_msg("ERROR OPENING KCOV FD\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_INIT_TRACE, t->kcov_slots); + if (err) { + ksft_print_msg("ERROR INITIALISING KCOV\n"); + goto err; + } + + t->kcov_mem = mmap(NULL, sizeof(unsigned long) * t->kcov_slots, + PROT_READ | PROT_WRITE, MAP_SHARED, t->kcov_fd, 0); + if ((void *)t->kcov_mem == MAP_FAILED) { + ksft_print_msg("ERROR ALLOCATING MEMORY FOR KCOV\n"); + goto err; + } + + err = ioctl(t->kcov_fd, KCOV_ENABLE, KCOV_TRACE_PC); + if (err) { + ksft_print_msg("ERROR ENABLING KCOV\n"); + goto err; + } + + __atomic_store_n(&t->kcov_mem[0], 0, __ATOMIC_RELAXED); + return; +err: + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); +} + +static void disable_kcov(struct __test_metadata *t, char *test_name) +{ + int slots, err, dir, fd, i; + + if (t->kcov_fd == -1) + return; + + slots = __atomic_load_n(&t->kcov_mem[0], __ATOMIC_RELAXED); + if (slots == t->kcov_slots - 1) + ksft_print_msg("Set KCOV_SLOTS to a value greater than %d\n", t->kcov_slots); + + err = ioctl(t->kcov_fd, KCOV_DISABLE, 0); + if (err) { + ksft_print_msg("ERROR DISABLING KCOV\n"); + goto out; + } + + err = mkdir(t->kcov_dir, 0755); + if (err == -1 && errno != EEXIST) { + ksft_print_msg("ERROR CREATING '%s'\n", 
t->kcov_dir); + goto out; + } + err = 0; + + dir = open(t->kcov_dir, O_DIRECTORY); + if (dir < 0) { + ksft_print_msg("ERROR OPENING %s\n", t->kcov_dir); + err = dir; + goto out; + } + + fd = openat(dir, test_name, O_RDWR | O_CREAT | O_TRUNC); + + close(dir); + + if (fd == -1) { + ksft_print_msg("ERROR CREATING '%s' at '%s'\n", test_name, t->kcov_dir); + err = fd; + goto out; + } + + for (i = 0; i < slots; i++) { + char buf[64]; + int size; + + size = snprintf(buf, 64, "0x%lx\n", t->kcov_mem[i + 1]); + write(fd, buf, size); + } + +out: + munmap(t->kcov_mem, sizeof(t->kcov_mem[0]) * t->kcov_slots); + close(t->kcov_fd); + + if (err) { + t->exit_code = KSFT_FAIL; + _exit(KSFT_FAIL); + } +} + static void __run_test(struct __fixture_metadata *f, struct __fixture_variant_metadata *variant, struct __test_metadata *t) @@ -1216,7 +1341,7 @@ static void __run_test(struct __fixture_metadata *f, t->exit_code = KSFT_FAIL; } else if (child == 0) { setpgrp(); - t->fn(t, variant); + t->fn(t, variant, test_name); _exit(t->exit_code); } else { t->pid = child; -- 2.51.1.838.g19442a804e-goog
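
Two details of the patch above can be sketched in isolation: the sizing rule for `KCOV_SLOTS` stated in its documentation hunk, and the KCOV_TRACE_PC buffer layout, where word 0 holds the entry count and the PCs follow. This is a minimal sketch under those assumptions, not the harness code; `kcov_slots_valid()` and `kcov_dump()` are hypothetical helper names, and the 4096-byte page size is the one quoted in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Check the sizing rule quoted in the patch documentation: the buffer
 * size in bytes (slots * word_size) must be a multiple of the
 * 4096-byte page size assumed there.
 */
static int kcov_slots_valid(unsigned long slots, size_t word_size)
{
	assert(word_size > 0);
	return slots > 0 && (slots * word_size) % 4096 == 0;
}

/*
 * Write the recorded PCs as "0x..." lines. In KCOV_TRACE_PC mode,
 * mem[0] holds the number of entries and mem[1..] hold the PCs.
 * nslots is the size of mem in words and is used to clamp a count
 * that would otherwise run past the end of the buffer.
 * Returns the number of PCs written.
 */
static unsigned long kcov_dump(FILE *out, unsigned long *mem,
			       unsigned long nslots)
{
	unsigned long n, i;

	assert(out != NULL && mem != NULL && nslots > 0);

	n = __atomic_load_n(&mem[0], __ATOMIC_RELAXED);
	if (n > nslots - 1)
		n = nslots - 1;

	for (i = 0; i < n; i++)
		fprintf(out, "0x%lx\n", mem[i + 1]);

	return n;
}
```

With 8-byte words, 512 slots is the smallest valid size, which agrees with the value given in the documentation hunk.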


[PATCH] vfio: selftests: Store libvfio build outputs in $(OUTPUT)/libvfio

by David Matlack

Store the tools/testing/selftests/vfio/lib outputs (e.g. object files) in $(OUTPUT)/libvfio rather than in $(OUTPUT)/lib. This is in preparation for building the VFIO selftests library into the KVM selftests (see Link below).

Specifically this will avoid name conflicts between tools/testing/selftests/{vfio,kvm}/lib and also avoid leaving behind empty directories under tools/testing/selftests/kvm after a make clean.

Link: https://lore.kernel.org/kvm/20250912222525.2515416-2-dmatlack@google.com/
Signed-off-by: David Matlack <dmatlack(a)google.com>
---
Note: This patch applies on top of vfio/next.

https://github.com/awilliam/linux-vfio/tree/next

 tools/testing/selftests/vfio/lib/libvfio.mk | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/vfio/lib/libvfio.mk b/tools/testing/selftests/vfio/lib/libvfio.mk
index 5d11c3a89a28..3c0cdac30cb6 100644
--- a/tools/testing/selftests/vfio/lib/libvfio.mk
+++ b/tools/testing/selftests/vfio/lib/libvfio.mk
@@ -1,24 +1,26 @@
 include $(top_srcdir)/scripts/subarch.include
 ARCH ?= $(SUBARCH)
 
-VFIO_DIR := $(selfdir)/vfio
+LIBVFIO_SRCDIR := $(selfdir)/vfio/lib
 
-LIBVFIO_C := lib/vfio_pci_device.c
-LIBVFIO_C += lib/vfio_pci_driver.c
+LIBVFIO_C := vfio_pci_device.c
+LIBVFIO_C += vfio_pci_driver.c
 
 ifeq ($(ARCH:x86_64=x86),x86)
-LIBVFIO_C += lib/drivers/ioat/ioat.c
-LIBVFIO_C += lib/drivers/dsa/dsa.c
+LIBVFIO_C += drivers/ioat/ioat.c
+LIBVFIO_C += drivers/dsa/dsa.c
 endif
 
-LIBVFIO_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBVFIO_C))
+LIBVFIO_OUTPUT := $(OUTPUT)/libvfio
+
+LIBVFIO_O := $(patsubst %.c, $(LIBVFIO_OUTPUT)/%.o, $(LIBVFIO_C))
 
 LIBVFIO_O_DIRS := $(shell dirname $(LIBVFIO_O) | uniq)
 $(shell mkdir -p $(LIBVFIO_O_DIRS))
 
-CFLAGS += -I$(VFIO_DIR)/lib/include
+CFLAGS += -I$(LIBVFIO_SRCDIR)/include
 
-$(LIBVFIO_O): $(OUTPUT)/%.o : $(VFIO_DIR)/%.c
+$(LIBVFIO_O): $(LIBVFIO_OUTPUT)/%.o : $(LIBVFIO_SRCDIR)/%.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-EXTRA_CLEAN += $(LIBVFIO_O)
+EXTRA_CLEAN += $(LIBVFIO_OUTPUT)

base-commit: acb59a4bb8ed34e738a4c3463127bf3f6b5e11a9
-- 
2.51.0.534.gc79095c0ca-goog

