- Linux-kselftest-mirror - lists.linaro.org

[PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)

by Nicolin Chen

The vIOMMU object is designed to represent a slice of an IOMMU HW for its virtualization features shared with or passed to user space (a VM mostly) in a way of HW acceleration. This extends the HWPT-based design for more advanced virtualization features.

A vCMDQ introduced by this series as a part of the vIOMMU infrastructure represents a HW supported queue/buffer for VM to use exclusively, e.g.
 - NVIDIA's virtual command queue
 - AMD vIOMMU's command buffer
either of which is an IOMMU HW feature to directly load and execute cache invalidation commands issued by a guest kernel, to shoot down TLB entries that HW cached for guest-owned stage-1 page table entries. This is a big improvement since there is no VM Exit during an invalidation, compared to the traditional invalidation pathway of trapping a guest-owned invalidation queue and forwarding those commands/requests to the host kernel, which will eventually fill a HW-owned queue to execute those commands.

Thus, a vCMDQ object, as an initial use case, is all about a guest-owned HW command queue that VMM can allocate/configure depending on the request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its allocator IOMMUFD_CMD_VCMDQ_ALLOC allowing VMM to forward the IOMMU-specific queue info, such as queue base address, size, etc.

Meanwhile, a guest-owned command queue needs the kernel (a command queue driver) to control the queue by reading/writing its consumer and producer indexes, which means the command queue HW allows the guest kernel to get direct R/W access to those registers. Introduce an mmap infrastructure to the iommufd core so as to support passing through a piece of MMIO region from the host physical address space to the guest physical address space. The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during IOMMUFD_CMD_VCMDQ_ALLOC and returned to user space as output driver data by the same ioctl. So, this requires driver-specific user data support in a vIOMMU object.

As a real-world use case, this series implements vCMDQ support in the tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In other words, this is also the Tegra CMDQV series Part-2 (user-space support), reworked from the previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/

This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v1

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v1

Thanks
Nicolin

Nicolin Chen (16):
  iommu: Pass in a driver-level user data structure to viommu_alloc op
  iommufd/viommu: Allow driver-specific user data for a vIOMMU object
  iommu: Add iommu_copy_struct_to_user helper
  iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc
  iommufd/selftest: Support user_data in mock_viommu_alloc
  iommufd/selftest: Add covearge for viommu data
  iommufd/viommu: Add driver-allocated vDEVICE support
  iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct
  iommufd/viommmu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl
  iommufd: Add mmap interface
  iommufd/selftest: Add coverage for the new mmap interface
  Documentation: userspace-api: iommufd: Update vCMDQ
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/arm-smmu-v3: Add vsmmu_alloc impl op
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 24 +-
 drivers/iommu/iommufd/iommufd_private.h | 20 +-
 drivers/iommu/iommufd/iommufd_test.h | 17 +
 include/linux/iommu.h | 43 ++-
 include/linux/iommufd.h | 93 +++++
 include/uapi/linux/iommufd.h | 87 +++++
 tools/testing/selftests/iommu/iommufd_utils.h | 21 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 26 +-
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 349 +++++++++++++++++-
 drivers/iommu/iommufd/driver.c | 54 +++
 drivers/iommu/iommufd/main.c | 54 ++-
 drivers/iommu/iommufd/selftest.c | 58 ++-
 drivers/iommu/iommufd/viommu.c | 78 +++-
 tools/testing/selftests/iommu/iommufd.c | 34 +-
 .../selftests/iommu/iommufd_fail_nth.c | 5 +-
 Documentation/userspace-api/iommufd.rst | 11 +
 16 files changed, 912 insertions(+), 62 deletions(-)

-- 
2.43.0

8 months, 3 weeks


[PATCH v5 00/13] riscv: add SBI FWFT misaligned exception delegation support

by Clément Léger

The SBI Firmware Feature extension allows the S-mode to request some specific features (either hardware or software) to be enabled. This series uses this extension to request misaligned access exception delegation to S-mode in order to let the kernel handle it. It also adds support for the KVM FWFT SBI extension based on the misaligned access handling infrastructure.

FWFT SBI extension is part of the SBI V3.0 specifications [1]. It can be tested using the qemu provided at [2] which contains the series from [3]. Upstream kvm-unit-tests can be used inside kvm to test the correct delegation of misaligned exceptions. Upstream OpenSBI can be used.

Note: Since SBI V3.0 is not yet ratified, FWFT extension API is split between interface only and implementation, allowing to pick only the interface, which does not have hard dependencies on SBI.

The tests can be run using the kselftest from series [4].

$ qemu-system-riscv64 \
	-cpu rv64,trap-misaligned-access=true,v=true \
	-M virt \
	-m 1024M \
	-bios fw_dynamic.bin \
	-kernel Image
 ...

 # ./misaligned
 TAP version 13
 1..23
 # Starting 23 tests from 1 test cases.
 #  RUN           global.gp_load_lh ...
 #            OK  global.gp_load_lh
 ok 1 global.gp_load_lh
 #  RUN           global.gp_load_lhu ...
 #            OK  global.gp_load_lhu
 ok 2 global.gp_load_lhu
 #  RUN           global.gp_load_lw ...
 #            OK  global.gp_load_lw
 ok 3 global.gp_load_lw
 #  RUN           global.gp_load_lwu ...
 #            OK  global.gp_load_lwu
 ok 4 global.gp_load_lwu
 #  RUN           global.gp_load_ld ...
 #            OK  global.gp_load_ld
 ok 5 global.gp_load_ld
 #  RUN           global.gp_load_c_lw ...
 #            OK  global.gp_load_c_lw
 ok 6 global.gp_load_c_lw
 #  RUN           global.gp_load_c_ld ...
 #            OK  global.gp_load_c_ld
 ok 7 global.gp_load_c_ld
 #  RUN           global.gp_load_c_ldsp ...
 #            OK  global.gp_load_c_ldsp
 ok 8 global.gp_load_c_ldsp
 #  RUN           global.gp_load_sh ...
 #            OK  global.gp_load_sh
 ok 9 global.gp_load_sh
 #  RUN           global.gp_load_sw ...
 #            OK  global.gp_load_sw
 ok 10 global.gp_load_sw
 #  RUN           global.gp_load_sd ...
 #            OK  global.gp_load_sd
 ok 11 global.gp_load_sd
 #  RUN           global.gp_load_c_sw ...
 #            OK  global.gp_load_c_sw
 ok 12 global.gp_load_c_sw
 #  RUN           global.gp_load_c_sd ...
 #            OK  global.gp_load_c_sd
 ok 13 global.gp_load_c_sd
 #  RUN           global.gp_load_c_sdsp ...
 #            OK  global.gp_load_c_sdsp
 ok 14 global.gp_load_c_sdsp
 #  RUN           global.fpu_load_flw ...
 #            OK  global.fpu_load_flw
 ok 15 global.fpu_load_flw
 #  RUN           global.fpu_load_fld ...
 #            OK  global.fpu_load_fld
 ok 16 global.fpu_load_fld
 #  RUN           global.fpu_load_c_fld ...
 #            OK  global.fpu_load_c_fld
 ok 17 global.fpu_load_c_fld
 #  RUN           global.fpu_load_c_fldsp ...
 #            OK  global.fpu_load_c_fldsp
 ok 18 global.fpu_load_c_fldsp
 #  RUN           global.fpu_store_fsw ...
 #            OK  global.fpu_store_fsw
 ok 19 global.fpu_store_fsw
 #  RUN           global.fpu_store_fsd ...
 #            OK  global.fpu_store_fsd
 ok 20 global.fpu_store_fsd
 #  RUN           global.fpu_store_c_fsd ...
 #            OK  global.fpu_store_c_fsd
 ok 21 global.fpu_store_c_fsd
 #  RUN           global.fpu_store_c_fsdsp ...
 #            OK  global.fpu_store_c_fsdsp
 ok 22 global.fpu_store_c_fsdsp
 #  RUN           global.gen_sigbus ...
 [12797.988647] misaligned[618]: unhandled signal 7 code 0x1 at 0x0000000000014dc0 in misaligned[4dc0,10000+76000]
 [12797.988990] CPU: 0 UID: 0 PID: 618 Comm: misaligned Not tainted 6.13.0-rc6-00008-g4ec4468967c9-dirty #51
 [12797.989169] Hardware name: riscv-virtio,qemu (DT)
 [12797.989264] epc : 0000000000014dc0 ra : 0000000000014d00 sp : 00007fffe165d100
 [12797.989407]  gp : 000000000008f6e8 tp : 0000000000095760 t0 : 0000000000000008
 [12797.989544]  t1 : 00000000000965d8 t2 : 000000000008e830 s0 : 00007fffe165d160
 [12797.989692]  s1 : 000000000000001a a0 : 0000000000000000 a1 : 0000000000000002
 [12797.989831]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : ffffffffdeadbeef
 [12797.989964]  a5 : 000000000008ef61 a6 : 626769735f6e0000 a7 : fffffffffffff000
 [12797.990094]  s2 : 0000000000000001 s3 : 00007fffe165d838 s4 : 00007fffe165d848
 [12797.990238]  s5 : 000000000000001a s6 : 0000000000010442 s7 : 0000000000010200
 [12797.990391]  s8 : 000000000000003a s9 : 0000000000094508 s10: 0000000000000000
 [12797.990526]  s11: 0000555567460668 t3 : 00007fffe165d070 t4 : 00000000000965d0
 [12797.990656]  t5 : fefefefefefefeff t6 : 0000000000000073
 [12797.990756] status: 0000000200004020 badaddr: 000000000008ef61 cause: 0000000000000006
 [12797.990911] Code: 8793 8791 3423 fcf4 3783 fc84 c737 dead 0713 eef7 (c398) 0001
 #            OK  global.gen_sigbus
 ok 23 global.gen_sigbus
 # PASSED: 23 / 23 tests passed.
 # Totals: pass:23 fail:0 xfail:0 xpass:0 skip:0 error:0

With kvm-tools:

 # lkvm run -k sbi.flat -m 128
  Info: # lkvm run -k sbi.flat -m 128 -c 1 --name guest-97
  Info: Removed ghost socket file "/root/.lkvm//guest-97.sock".

 ##########################################################################
 #    kvm-unit-tests
 ##########################################################################

 ... [test messages elided]
 PASS: sbi: fwft: FWFT extension probing no error
 PASS: sbi: fwft: get/set reserved feature 0x6 error == SBI_ERR_DENIED
 PASS: sbi: fwft: get/set reserved feature 0x3fffffff error == SBI_ERR_DENIED
 PASS: sbi: fwft: get/set reserved feature 0x80000000 error == SBI_ERR_DENIED
 PASS: sbi: fwft: get/set reserved feature 0xbfffffff error == SBI_ERR_DENIED
 PASS: sbi: fwft: misaligned_deleg: Get misaligned deleg feature no error
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature invalid value error
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 0
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value no error
 PASS: sbi: fwft: misaligned_deleg: Set misaligned deleg feature value 1
 PASS: sbi: fwft: misaligned_deleg: Verify misaligned load exception trap in supervisor
 SUMMARY: 50 tests, 2 unexpected failures, 12 skipped

This series is available at [5].

Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/vv3.0-rc2/… [1]
Link: https://github.com/rivosinc/qemu/tree/dev/cleger/misaligned [2]
Link: https://lore.kernel.org/all/20241211211933.198792-3-fkonrad@amd.com/T/ [3]
Link: https://lore.kernel.org/linux-riscv/20250414123543.1615478-1-cleger@rivosin… [4]
Link: https://github.com/rivosinc/linux/tree/dev/cleger/fwft [5]
---

V5:
 - Return ERANGE as mapping for SBI_ERR_BAD_RANGE
 - Removed unused sbi_fwft_get()
 - Fix kernel for sbi_fwft_local_set_cpumask()
 - Fix indentation for sbi_fwft_local_set()
 - Remove spurious space in kvm_sbi_fwft_ops.
 - Rebased on origin/master
 - Remove fixes commits and sent them as a separate series [4]

V4:
 - Check SBI version 3.0 instead of 2.0 for FWFT presence
 - Use long for kvm_sbi_fwft operation return value
 - Init KVM sbi extension even if default_disabled
 - Remove revert_on_fail parameter for sbi_fwft_feature_set().
 - Fix comments for sbi_fwft_set/get()
 - Only handle local features (there are no globals yet in the spec)
 - Add new SBI errors to sbi_err_map_linux_errno()

V3:
 - Added comment about kvm sbi fwft supported/set/get callback requirements
 - Move struct kvm_sbi_fwft_feature in kvm_sbi_fwft.c
 - Add a FWFT interface

V2:
 - Added Kselftest for misaligned testing
 - Added get_user() usage instead of __get_user()
 - Reenable interrupt when possible in misaligned access handling
 - Document that riscv supports unaligned-traps
 - Fix KVM extension state when an init function is present
 - Rework SBI misaligned accesses trap delegation code
 - Added support for CPU hotplugging
 - Added KVM SBI reset callback
 - Added reset for KVM SBI FWFT lock
 - Return SBI_ERR_DENIED_LOCKED when LOCK flag is set

Clément Léger (13):
  riscv: sbi: add Firmware Feature (FWFT) SBI extensions definitions
  riscv: sbi: add new SBI error mappings
  riscv: sbi: add FWFT extension interface
  riscv: sbi: add SBI FWFT extension calls
  riscv: misaligned: request misaligned exception from SBI
  riscv: misaligned: use on_each_cpu() for scalar misaligned access probing
  riscv: misaligned: use correct CONFIG_ ifdef for misaligned_access_speed
  riscv: misaligned: move emulated access uniformity check in a function
  riscv: misaligned: add a function to check misalign trap delegability
  RISC-V: KVM: add SBI extension init()/deinit() functions
  RISC-V: KVM: add SBI extension reset callback
  RISC-V: KVM: add support for FWFT SBI extension
  RISC-V: KVM: add support for SBI_FWFT_MISALIGNED_DELEG

 arch/riscv/include/asm/cpufeature.h | 8 +-
 arch/riscv/include/asm/kvm_host.h | 5 +-
 arch/riscv/include/asm/kvm_vcpu_sbi.h | 12 +
 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h | 29 +++
 arch/riscv/include/asm/sbi.h | 60 +++++
 arch/riscv/include/uapi/asm/kvm.h | 1 +
 arch/riscv/kernel/sbi.c | 75 ++++++
 arch/riscv/kernel/traps_misaligned.c | 110 ++++++++-
 arch/riscv/kernel/unaligned_access_speed.c | 8 +-
 arch/riscv/kvm/Makefile | 1 +
 arch/riscv/kvm/vcpu.c | 7 +-
 arch/riscv/kvm/vcpu_sbi.c | 54 +++++
 arch/riscv/kvm/vcpu_sbi_fwft.c | 252 +++++++++++++++++++++
 arch/riscv/kvm/vcpu_sbi_sta.c | 3 +-
 14 files changed, 610 insertions(+), 15 deletions(-)
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_sbi_fwft.h
 create mode 100644 arch/riscv/kvm/vcpu_sbi_fwft.c

-- 
2.49.0

8 months, 3 weeks


[PATCH 0/3] tools/nolibc: make all headers usable directly

by Thomas Weißschuh

Make sure that any nolibc header can be included in any order. Even if nolibc.h was not pre-included already.

This conflicts indirectly with "tools/nolibc: various new functions" [0]. I'll resolve those conflicts when applying.

[0] https://lore.kernel.org/lkml/20250423-nolibc-misc-v1-0-a925bf40297b@linutro…

Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (3):
      tools/nolibc: add target to check header usability
      tools/nolibc: include nolibc.h early from all header files
      selftests/nolibc: always run nolibc header check

 tools/include/nolibc/Makefile | 9 +++++++++
 tools/include/nolibc/ctype.h | 6 +++---
 tools/include/nolibc/dirent.h | 6 +++---
 tools/include/nolibc/elf.h | 6 +++---
 tools/include/nolibc/errno.h | 6 +++---
 tools/include/nolibc/fcntl.h | 6 +++---
 tools/include/nolibc/getopt.h | 6 +++---
 tools/include/nolibc/signal.h | 6 +++---
 tools/include/nolibc/stdio.h | 6 +++---
 tools/include/nolibc/stdlib.h | 6 +++---
 tools/include/nolibc/string.h | 7 +++----
 tools/include/nolibc/sys.h | 6 +++---
 tools/include/nolibc/sys/auxv.h | 6 +++---
 tools/include/nolibc/sys/mman.h | 6 +++---
 tools/include/nolibc/sys/stat.h | 7 +++----
 tools/include/nolibc/sys/syscall.h | 6 +++---
 tools/include/nolibc/sys/time.h | 6 +++---
 tools/include/nolibc/sys/wait.h | 7 +++----
 tools/include/nolibc/time.h | 6 +++---
 tools/include/nolibc/types.h | 6 +++---
 tools/include/nolibc/unistd.h | 6 +++---
 tools/testing/selftests/nolibc/Makefile | 2 +-
 22 files changed, 70 insertions(+), 64 deletions(-)
---
base-commit: e90ce42e81381665dbcedc5fa12e74759ee89639
change-id: 20250423-nolibc-header-check-8c9d21850d3f

Best regards,
-- 
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>

8 months, 3 weeks


[PATCH 0/2] selftests: ublk: misc fixes

by Uday Shankar

Fix a couple of small issues in the ublk selftests.

Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Uday Shankar (2):
      selftests: ublk: kublk: build with -Werror
      selftests: ublk: common: fix _get_disk_dev_t for pre-9.0 coreutils

 tools/testing/selftests/ublk/Makefile | 2 +-
 tools/testing/selftests/ublk/test_common.sh | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
---
base-commit: d2ce053979d1d302fb009f6e1538a0776f177e1a
change-id: 20250423-ublk_selftests-3b2e200b1fa4

Best regards,
-- 
Uday Shankar <ushankar(a)purestorage.com>

8 months, 3 weeks


[PATCH net-next v10 0/9] Device memory TCP TX

by Mina Almasry

v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…

Addressed comments following conversations with Pavel, Stan, and Harshitha. Thank you guys for the reviews again. Overall minor changes:

Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).

v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…

Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed during the rebase as the bindings have been updated to use priv->bindings recently (thanks Stan!)

v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…

Only address minor comments on V7

Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully initialized (Stan).

v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===

Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going out on. (Jakub)
- Provide detailed inspection of callsites of __skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===

v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review!

Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works (David).

v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===

v5 has no major changes; it clears up the relatively minor issues pointed out in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4.

Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).

v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===

v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs.

Additionally other feedback from v3 is addressed.

Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).

v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===

Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes.

RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======

RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any.

Major changes:
--------------

- much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that.

- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is a good improvement as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining.

- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested.

- Fixed issue where dmabuf unmapping was happening in invalid context (Stan).

====================================================================

The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1].

Full outline on usage of the TX path is detailed in the documentation included with this series.

Test example is available via the kselftest included in the series as well.

The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation.

Patch Overview:
---------------

1. Documentation & tests to give high level overview of the feature being added.

1. Add netmem refcounting needed for the TX path.

2. Devmem TX netlink API.

3. Devmem TX net stack implementation.

4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep.

5. Add devmem TX documentation.

6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support.

7. Guard netmem_tx against being enabled against drivers that don't support it.

8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py.

Testing:
--------

Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augmented with client functionality to test TX path.

* Test Setup:

Kernel: net-next with this RFC and memory provider API cherry-picked locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running.

Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit.

[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…

Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>

Mina Almasry (8):
  netmem: add niov->type attribute to distinguish different net_iov types
  net: add get_netmem/put_netmem support
  net: devmem: Implement TX path
  net: add devmem TCP TX documentation
  net: enable driver support for netmem TX
  gve: add netmem TX support to GVE DQO-RDA mode
  net: check for driver support in netmem TX
  selftests: ncdevmem: Implement devmem TCP TX

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

 Documentation/netlink/specs/netdev.yaml | 12 +
 Documentation/networking/devmem.rst | 150 ++++++++-
 .../networking/net_cachelines/net_device.rst | 1 +
 Documentation/networking/netdev-features.rst | 5 +
 Documentation/networking/netmem.rst | 23 +-
 drivers/net/ethernet/google/gve/gve_main.c | 4 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
 include/linux/netdevice.h | 2 +
 include/linux/skbuff.h | 17 +-
 include/linux/skbuff_ref.h | 4 +-
 include/net/netmem.h | 34 +-
 include/net/sock.h | 1 +
 include/uapi/linux/netdev.h | 1 +
 io_uring/zcrx.c | 3 +-
 net/core/datagram.c | 48 ++-
 net/core/dev.c | 34 +-
 net/core/devmem.c | 139 ++++++--
 net/core/devmem.h | 83 ++++-
 net/core/netdev-genl-gen.c | 13 +
 net/core/netdev-genl-gen.h | 1 +
 net/core/netdev-genl.c | 80 ++++-
 net/core/skbuff.c | 48 ++-
 net/core/sock.c | 6 +
 net/ipv4/ip_output.c | 3 +-
 net/ipv4/tcp.c | 50 ++-
 net/ipv6/ip6_output.c | 3 +-
 net/vmw_vsock/virtio_transport_common.c | 5 +-
 tools/include/uapi/linux/netdev.h | 1 +
 .../selftests/drivers/net/hw/devmem.py | 26 +-
 .../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
 30 files changed, 1015 insertions(+), 90 deletions(-)

base-commit: 21b01cb8e88ea200a834a2c114b5dc6aa378ac56
-- 
2.49.0.805.g082f7c87e0-goog
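The RFC v2 changelog above says the iov_base of each user iovec is interpreted as an offset into the bound dma-buf rather than as a pointer. A minimal sketch of what filling such an iovec array might look like follows. The helper name fill_dmabuf_iov and its parameters are hypothetical; the rest of the sendmsg() setup (binding the dma-buf, any control message that identifies it, the MSG_ZEROCOPY flag) is described in the series' documentation and is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, for illustration only: describe 'count' consecutive
 * chunks of a bound dma-buf. Following the cover letter, iov_base carries
 * a byte offset into the dma-buf, not a user-space address.
 */
static void fill_dmabuf_iov(struct iovec *iov, size_t count,
			    size_t start_off, size_t chunk_len)
{
	for (size_t i = 0; i < count; i++) {
		/* Offset of chunk i, stored where a pointer would normally go. */
		iov[i].iov_base = (void *)(uintptr_t)(start_off + i * chunk_len);
		iov[i].iov_len = chunk_len;
	}
}
```

Since every entry is only an (offset, length) pair, several entries can describe different regions of the same dma-buf in a single sendmsg() call, which is the multi-iov case the changelog mentions.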

8 months, 3 weeks


[PATCH] selftests/mm: compaction_test: Support platform with huge mount of memory

by Feng Tang

When running mm selftest to verify mm patches, 'compaction_test' case failed on an x86 server with 1TB memory. And the root cause is that it has more free memory than what the test supports.

The test case tries to allocate 100000 huge pages, which is about 200 GB for that x86 server, and when it succeeds, it expects the result to be larger than 1/3 of 80% of the free memory in the system. This logic only works for platforms with 750 GB ( 200 / (1/3) / 80% ) or less free memory, and may raise a false alarm for others.

Fix it by changing the fixed page number to a self-adjustable number according to the real amount of free memory.

Fixes: bd67d5c15cc19 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
---
 tools/testing/selftests/mm/compaction_test.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 2c3a0eb6b22d..9bc4591c7b16 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
 	int compaction_index = 0;
 	char nr_hugepages[20] = {0};
 	char init_nr_hugepages[24] = {0};
+	char target_nr_hugepages[24] = {0};
+	int slen;
 
 	snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
 		 "%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
 		goto out;
 	}
 
-	/* Request a large number of huge pages. The Kernel will allocate
-	   as much as it can */
-	if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
-		ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
-			       strerror(errno));
+	/*
+	 * Request huge pages for about half of the free memory. The Kernel
+	 * will allocate as much as it can, and we expect it will get at least 1/3
+	 */
+	nr_hugepages_ul = mem_free / hugepage_size / 2;
+	snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+		 "%lu", nr_hugepages_ul);
+
+	slen = strlen(target_nr_hugepages);
+	if (write(fd, target_nr_hugepages, slen) != slen) {
+		ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+			       nr_hugepages_ul, strerror(errno));
 		goto close_fd;
 	}
 
-- 
2.43.5
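The 750 GB limit quoted in the description can be checked with a few lines of arithmetic. The sketch below assumes 2 MB (2048 kB) huge pages, which is what "100000 huge pages is about 200 GB" implies; the helper names are hypothetical and only the formulas come from the patch description.

```c
/*
 * New request size from the patch: about half of the free memory,
 * expressed in huge pages. Both arguments are in kB.
 */
static unsigned long new_target_pages(unsigned long mem_free_kb,
				      unsigned long hugepage_kb)
{
	return mem_free_kb / hugepage_kb / 2;
}

/*
 * Largest amount of free memory (kB) for which a fully satisfied request
 * of 'pages' huge pages still reaches 1/3 of 80% of free memory:
 *   pages * hugepage >= free * 0.8 / 3   =>   free <= pages * hugepage * 15 / 4
 */
static unsigned long max_free_kb_for_pass(unsigned long pages,
					  unsigned long hugepage_kb)
{
	return pages * hugepage_kb / 4 * 15;
}
```

With the old fixed request, max_free_kb_for_pass(100000, 2048) gives 768000000 kB, roughly the 750 GB in the description. With the new request, half of free memory always exceeds 1/3 of 80% of it, so a fully satisfied request passes regardless of machine size.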

8 months, 3 weeks


[PATCH] selftests/pidfd: align stack to fix SP alignment exception

by Shuai Xue

The pidfd_test fails on the ARM64 platform with the following error:

    Bail out! pidfd_poll check for premature notification on child thread exec test: Failed

When exception-trace is enabled, the kernel logs the details:

    #echo 1 > /proc/sys/debug/exception-trace
    #dmesg | tail -n 20
    [48628.713023] pidfd_test[1082142]: unhandled exception: SP Alignment, ESR 0x000000009a000000, SP/PC alignment exception in pidfd_test[400000+4000]
    [48628.713049] CPU: 21 PID: 1082142 Comm: pidfd_test Kdump: loaded Tainted: G W E 6.6.71-3_rc1.al8.aarch64 #1
    [48628.713051] Hardware name: AlibabaCloud AliServer-Xuanwu2.0AM-1UC1P-5B/AS1111MG1, BIOS 1.2.M1.AL.P.157.00 07/29/2023
    [48628.713053] pstate: 60001800 (nZCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=-c)
    [48628.713055] pc : 0000000000402100
    [48628.713056] lr : 0000ffff98288f9c
    [48628.713056] sp : 0000ffffde49daa8
    [48628.713057] x29: 0000000000000000 x28: 0000000000000000 x27: 0000000000000000
    [48628.713060] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
    [48628.713062] x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000400e80
    [48628.713065] x20: 0000000000000000 x19: 0000000000402650 x18: 0000000000000000
    [48628.713067] x17: 00000000004200d8 x16: 0000ffff98288f40 x15: 0000ffffde49b92c
    [48628.713070] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    [48628.713072] x11: 0000000000001011 x10: 0000000000402100 x9 : 0000000000000010
    [48628.713074] x8 : 00000000000000dc x7 : 3861616239346564 x6 : 000000000000000a
    [48628.713077] x5 : 0000ffffde49daa8 x4 : 000000000000000a x3 : 0000ffffde49daa8
    [48628.713079] x2 : 0000ffffde49dadc x1 : 0000ffffde49daa8 x0 : 0000000000000000

According to ARM ARM D1.3.10.2 SP alignment checking:

> When the SP is used as the base address of a calculation, regardless of
> any offset applied by the instruction, if bits [3:0] of the SP are not
> 0b0000, there is a misaligned SP.

To fix it, align the stack with 16 bytes.

Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
---
 tools/testing/selftests/pidfd/pidfd_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
index c081ae91313a..ec161a7c3ff9 100644
--- a/tools/testing/selftests/pidfd/pidfd_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_test.c
@@ -33,7 +33,7 @@ static bool have_pidfd_send_signal;
 static pid_t pidfd_clone(int flags, int *pidfd, int (*fn)(void *))
 {
 	size_t stack_size = 1024;
-	char *stack[1024] = { 0 };
+	char *stack[1024] __attribute__((aligned(16))) = {0};
 
 #ifdef __ia64__
 	return __clone2(fn, stack, stack_size, flags | SIGCHLD, NULL, pidfd);
-- 
2.39.3
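The rule quoted from the ARM ARM (bits [3:0] of SP must be zero) can be expressed as a small check. The sketch below is illustrative and not part of the patch: is_stack_aligned, stack_top and child_stack are hypothetical names, and the buffer uses char elements for simplicity, whereas the patched test declares an array of char pointers. It relies on the GCC/Clang aligned attribute, as the patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* AArch64 requires SP to be 16-byte aligned when it is used as a base address. */
#define STACK_ALIGN 16

/* Returns 1 if bits [3:0] of the address are zero, 0 otherwise. */
static int is_stack_aligned(const void *ptr)
{
	return ((uintptr_t)ptr & (STACK_ALIGN - 1)) == 0;
}

/* Child stack buffer with the same alignment attribute the patch adds. */
static char child_stack[1024] __attribute__((aligned(STACK_ALIGN)));

/*
 * Stacks grow downwards on AArch64, so the address handed to clone() is
 * normally the top of the buffer. Round it down so that it stays aligned
 * even if 'size' is not a multiple of STACK_ALIGN.
 */
static void *stack_top(void *base, size_t size)
{
	uintptr_t top = (uintptr_t)base + size;

	return (void *)(top & ~(uintptr_t)(STACK_ALIGN - 1));
}
```

Without the attribute, the compiler only guarantees the natural alignment of the element type, so whether the buffer happens to land on a 16-byte boundary depends on the surrounding stack layout.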

8 months, 3 weeks


[PATCH net-next v3 0/3] Fix netdevim to correctly mark NAPI IDs

by Joe Damato

Greetings:

Welcome to v3.

This series fixes netdevsim to correctly set the NAPI ID on the skb. This is helpful for writing tests around features that use SO_INCOMING_NAPI_ID.

In addition to the netdevsim fix in patch 1, patches 2 & 3 do some self test refactoring and add a test for NAPI IDs. The test itself (patch 3) introduces a C helper because apparently python doesn't have socket.SO_INCOMING_NAPI_ID.

Thanks,
Joe

v3:
  - Dropped patch 3 from v2 as it is no longer necessary.
  - Patch 3 from this series (which was patch 4 in the v2)
    - Sorted .gitignore alphabetically
    - added cfg.remote_deploy so the test supports real remote machines
    - Dropped the NetNSEnter as it is unnecessary
    - Fixed a string interpolation issue that Paolo hit with his Python version

v2: https://lore.kernel.org/netdev/20250417013301.39228-1-jdamato@fastly.com/
  - No longer an RFC
  - Minor whitespace change in patch 1 (no functional change).
  - Patches 2-4 new in v2

rfcv1: https://lore.kernel.org/netdev/20250329000030.39543-1-jdamato@fastly.com/

Joe Damato (3):
  netdevsim: Mark NAPI ID on skb in nsim_rcv
  selftests: drv-net: Factor out ksft C helpers
  selftests: drv-net: Test that NAPI ID is non-zero

 drivers/net/netdevsim/netdev.c | 2 +
 .../testing/selftests/drivers/net/.gitignore | 1 +
 tools/testing/selftests/drivers/net/Makefile | 6 +-
 tools/testing/selftests/drivers/net/ksft.h | 56 +++++++++++++
 .../testing/selftests/drivers/net/napi_id.py | 24 ++++++
 .../selftests/drivers/net/napi_id_helper.c | 83 +++++++++++++++++++
 .../selftests/drivers/net/xdp_helper.c | 49 +----------
 7 files changed, 173 insertions(+), 48 deletions(-)
 create mode 100644 tools/testing/selftests/drivers/net/ksft.h
 create mode 100755 tools/testing/selftests/drivers/net/napi_id.py
 create mode 100644 tools/testing/selftests/drivers/net/napi_id_helper.c

base-commit: 22ab6b9467c1822291a1175a0eb825b7ec057ef9
-- 
2.43.0
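For readers unfamiliar with SO_INCOMING_NAPI_ID, the following is a minimal sketch of how a C helper might read it from a socket. It is not the napi_id_helper.c from the series; get_incoming_napi_id is a hypothetical name, and the fallback define uses the asm-generic value (a few architectures number socket options differently).

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
/* Value from asm-generic/socket.h; an assumption on other architectures. */
#define SO_INCOMING_NAPI_ID 56
#endif

/*
 * Hypothetical helper: fetch the NAPI ID recorded for the last packet
 * received on 'fd'. Returns 0 on success and stores the ID in *napi_id,
 * or returns -1 with errno set (e.g. when the kernel lacks busy-poll support).
 */
static int get_incoming_napi_id(int fd, unsigned int *napi_id)
{
	socklen_t len = sizeof(*napi_id);

	return getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len);
}
```

A test along the lines of the series' last patch would call such a helper on a socket that has received traffic through netdevsim and check that the reported ID is non-zero.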

8 months, 3 weeks

[PATCH net 0/2] mptcp: pm: Defer freeing userspace pm entries

by Matthieu Baerts (NGI0)

Here are two unrelated fixes for MPTCP:

- Patch 1: free userspace PM entry with RCU helpers. A fix for v6.14.

- Patch 2: avoid a warning when running diag.sh selftest. A fix for v6.15-rc1.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Geliang Tang (1):
      selftests: mptcp: diag: use mptcp_lib_get_info_value

Mat Martineau (1):
      mptcp: pm: Defer freeing of MPTCP userspace path manager entries

 net/mptcp/pm_userspace.c                  | 6 +++++-
 tools/testing/selftests/net/mptcp/diag.sh | 5 ++---
 2 files changed, 7 insertions(+), 4 deletions(-)
---
base-commit: 750d0ac001e85b754404178ee8ce01cbc76a03be
change-id: 20250421-net-mptcp-pm-defer-freeing-f9cd01b70043

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

8 months, 3 weeks

[PATCH bpf-next v1 0/2] bpf: Fix panic in bpf_get_local_storage

by Jiayuan Chen

The selftest I provided can reproduce a panic:

  ./test_progs -a cgroup_storage_update

When we attach a program to a cgroup and prog->aux->cgroup_storage exists, which means the cgroup_storage map is used in the program, we allocate storage with bpf_cgroup_storages_alloc() and assign it to pl->storage. At the end, pl->storage is assigned to cgrp->bpf.effective[atype]->cgroup_storage by xxx_effective_progs().

But when we attach a program that does not use the cgroup_storage map (prog->aux->cgroup_storage is empty), the cgroup_storage in struct bpf_prog_array_item is empty. If we then use BPF_LINK_UPDATE to replace the old program with a new one that does use the cgroup_storage map, the cgroup_storage is never initialized.

This causes a panic when accessing storage in bpf_get_local_storage.

Jiayuan Chen (2):
  bpf: Create cgroup storage if needed when updating link
  selftests/bpf: Add link update test for cgroup_storage

 kernel/bpf/cgroup.c                           | 24 +++++++---
 .../selftests/bpf/prog_tests/cgroup_storage.c | 45 +++++++++++++++++++
 .../selftests/bpf/progs/cgroup_storage.c      |  6 +++
 3 files changed, 70 insertions(+), 5 deletions(-)

-- 
2.47.1

8 months, 3 weeks

[PATCH bpf-next v4 0/2] bpf: Allow access to const void pointer arguments in tracing programs

by KaFai Wan

If a tracing program tries to access an argument that is a pointer to const void, the argument has an UNKNOWN type and the verifier fails to load the program.

Use is_void_or_int_ptr to check whether the type is a void or int pointer. Add a selftest to check it.

---
KaFai Wan (2):
  bpf: Allow access to const void pointer arguments in tracing programs
  selftests/bpf: Add test to access const void pointer argument in tracing program

 kernel/bpf/btf.c                                  | 13 +++----------
 net/bpf/test_run.c                                |  8 +++++++-
 .../selftests/bpf/progs/verifier_btf_ctx_access.c | 12 ++++++++++++
 3 files changed, 22 insertions(+), 11 deletions(-)

Changelog:
v3->v4: Addressed comments from Alexei Starovoitov
- change SOB to match From email address
- add Acked-by from jirka
Details in here:
https://lore.kernel.org/all/20250417151548.1276279-1-kafai.wan@hotmail.com/

v2->v3: Addressed comments from jirka
- remove duplicate checks for void pointer
Details in here:
https://lore.kernel.org/bpf/20250416161756.1079178-1-kafai.wan@hotmail.com/

v1->v2: Addressed comments from jirka
- use btf_type_is_void to check if type is void
- merge is_void_ptr and is_int_ptr to is_void_or_int_ptr
- fix selftests
Details in here:
https://lore.kernel.org/all/20250412170626.3638516-1-kafai.wan@hotmail.com/

-- 
2.43.0

8 months, 3 weeks

[PATCH] selftests/bpf: Fix null pointer check in skb_pkt_end.c

by Prabhav Kumar Vaish

Ensure that 'tcp' is checked for NULL before dereferencing. This resolves a potential null pointer dereference warning reported by static analysis.

Signed-off-by: Prabhav Kumar Vaish <pvkumar5749404(a)gmail.com>
---
 tools/testing/selftests/bpf/progs/skb_pkt_end.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/skb_pkt_end.c b/tools/testing/selftests/bpf/progs/skb_pkt_end.c
index 3bb4451524a1..db33ff2839f7 100644
--- a/tools/testing/selftests/bpf/progs/skb_pkt_end.c
+++ b/tools/testing/selftests/bpf/progs/skb_pkt_end.c
@@ -45,10 +45,10 @@ int main_prog(struct __sk_buff *skb)
 		goto out;
 
 	tcp = (void*)(ip + 1);
-	if (tcp->dest != 0)
-		goto out;
 	if (!tcp)
 		goto out;
+	if (tcp->dest != 0)
+		goto out;
 
 	urg_ptr = tcp->urg_ptr;
 
-- 
2.34.1

8 months, 3 weeks

[PATCH 0/4] Replace CONFIG_DMABUF_SYSFS_STATS with BPF

by T.J. Mercier

Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs, which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees.

I prototyped a change from per-buffer to per-exporter statistics with an RCU-protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still *additional* overhead on top of the debugfs_list and doesn't give us per-buffer info.

This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. While the kernel must have CONFIG_DEBUG_FS for the dmabuf_iter to be available, debugfs does not need to be mounted. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface, which avoids the lack of ABI stability with debugfs (even if it were mounted).

As this is a replacement for our use of CONFIG_DMABUF_SYSFS_STATS, the last patch is an RFC for removing it from the kernel. Please see my suggestion there regarding the timeline for that.

[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com/

T.J. Mercier (4):
  dma-buf: Rename and expose debugfs symbols
  bpf: Add dmabuf iterator
  selftests/bpf: Add test for dmabuf_iter
  RFC: dma-buf: Remove DMA-BUF statistics

 .../ABI/testing/sysfs-kernel-dmabuf-buffers   |  24 ---
 Documentation/driver-api/dma-buf.rst          |   5 -
 drivers/dma-buf/Kconfig                       |  15 --
 drivers/dma-buf/Makefile                      |   1 -
 drivers/dma-buf/dma-buf-sysfs-stats.c         | 202 ------------------
 drivers/dma-buf/dma-buf-sysfs-stats.h         |  35 ---
 drivers/dma-buf/dma-buf.c                     |  40 +---
 include/linux/btf_ids.h                       |   1 +
 include/linux/dma-buf.h                       |   6 +
 kernel/bpf/Makefile                           |   3 +
 kernel/bpf/dmabuf_iter.c                      | 130 +++++++++++
 tools/testing/selftests/bpf/config            |   1 +
 .../selftests/bpf/prog_tests/dmabuf_iter.c    | 116 ++++++++++
 .../testing/selftests/bpf/progs/dmabuf_iter.c |  31 +++
 14 files changed, 299 insertions(+), 311 deletions(-)
 delete mode 100644 Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
 delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.c
 delete mode 100644 drivers/dma-buf/dma-buf-sysfs-stats.h
 create mode 100644 kernel/bpf/dmabuf_iter.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c

-- 
2.49.0.604.gff1f9ca942-goog

8 months, 3 weeks

Re: [PATCH v7 1/2] selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on

by Michal Koutný

On Tue, Apr 22, 2025 at 07:58:56PM -0400, Waiman Long <llong(a)redhat.com> wrote:
> Am I correct to assume that the purpose of 1d09069f5313f ("selftests:
> memcg: expect no low events in unprotected sibling") is to force a
> failure in the test_memcg_low test to force a change in the current
> behavior? Or was it the case that it didn't fail when you submit your
> patch?

Yes, the failure had been intended to mark an unexpected mode of reclaim (there's still a reproducer somewhere in the references). However, I learnt that:
a) it ain't easy to fix,
b) the only occurrence of the troublesome behavior was in the test and it was never reported by users in real life.

Since then, I've come to prefer the variant where that particular check is indefinite.

HTH,
Michal

8 months, 3 weeks

[PATCH net-next v9 0/9] Device memory TCP TX

by Mina Almasry

v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…

Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed during the rebase as the bindings have been updated to use priv->bindings recently (thanks Stan!)

v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…

Only address minor comments on V7

Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully initialized (Stan).

v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===

Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going out on. (Jakub)
- Provide detailed inspection of callsites of __skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)

v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===

v6 has no major changes. Addressed a few issues from Paolo and David, and collected Acks from Stan. Thank you everyone for the review!

Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label). (Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the same one which implements the TX path so we don't run into future errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works (David).

v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===

v5 has no major changes; it clears up the relatively minor issues pointed out in v4, and rebases the series on top of net-next to resolve the conflict with a patch that raced to the tree. It also collects the review tags from v4.

Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).

v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===

v4 mainly addresses the critical driver support issue surfaced in v3 by Paolo and Stan. Drivers aiming to support netmem_tx should make sure not to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs may come from dma-bufs.

Additionally other feedback from v3 is addressed.

Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).

v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===

Address minor comments from RFCv2 and fix a few build warnings and ynl-regen issues. No major changes.

RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======

RFC v2 addresses much of the feedback from RFC v1. I plan on sending something close to this as net-next reopens, sending it slightly early to get feedback if any.

Major changes:
--------------

- much improved UAPI as suggested by Stan. We now interpret the iov_base of the passed in iov from userspace as the offset into the dmabuf to send from. This removes the need to set iov.iov_base = NULL which may be confusing to users, and enables us to send multiple iovs in the same sendmsg() call. ncdevmem and the docs show a sample use of that.

- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think this is a good improvement as it was confusing to keep track of 2 iterators for the same sendmsg, and mistracking both iterators caused a couple of bugs reported in the last iteration that are now resolved with this streamlining.

- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested, and sending multiple iovs in the same sendmsg() is tested.

- Fixed issue where dmabuf unmapping was happening in invalid context (Stan).

====================================================================

The TX path had been dropped from the Device Memory TCP patch series post RFCv1 [1], to make that series slightly easier to review. This series rebases the implementation of the TX path on top of the net_iov/netmem framework agreed upon and merged. The motivation for the feature is thoroughly described in the docs & cover letter of the original proposal, so I don't repeat the lengthy descriptions here, but they are available in [1].

Full outline on usage of the TX path is detailed in the documentation included with this series.

Test example is available via the kselftest included in the series as well.

The series is relatively small, as the TX path for this feature largely piggybacks on the existing MSG_ZEROCOPY implementation.

Patch Overview:
---------------

1. Documentation & tests to give high level overview of the feature being added.

1. Add netmem refcounting needed for the TX path.

2. Devmem TX netlink API.

3. Devmem TX net stack implementation.

4. Make dma-buf unbinding scheduled work to handle TX cases where it gets freed from contexts where we can't sleep.

5. Add devmem TX documentation.

6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver feature flag, and docs to enable drivers to declare netmem_tx support.

7. Guard netmem_tx against being enabled against drivers that don't support it.

8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to devmem.py.

Testing:
--------

Testing is very similar to devmem TCP RX path. The ncdevmem test used for the RX path is now augmented with client functionality to test TX path.

* Test Setup:

Kernel: net-next with this RFC and memory provider API cherry-picked locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Performance results are not included with this version, unfortunately. I'm having issues running the dma-buf exporter driver against the upstream kernel on my test setup. The issues are specific to that dma-buf exporter and do not affect this patch series. I plan to follow up this series with perf fixes if the tests point to issues once they're up and running.

Special thanks to Stan who took a stab at rebasing the TX implementation on top of the netmem/net_iov framework merged. Parts of his proposal [2] that are reused as-is are forked off into their own patches to give full credit.

[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…

Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>

Mina Almasry (8):
  netmem: add niov->type attribute to distinguish different net_iov types
  net: add get_netmem/put_netmem support
  net: devmem: Implement TX path
  net: add devmem TCP TX documentation
  net: enable driver support for netmem TX
  gve: add netmem TX support to GVE DQO-RDA mode
  net: check for driver support in netmem TX
  selftests: ncdevmem: Implement devmem TCP TX

Stanislav Fomichev (1):
  net: devmem: TCP tx netlink api

 Documentation/netlink/specs/netdev.yaml       |  12 +
 Documentation/networking/devmem.rst           | 150 ++++++++-
 .../networking/net_cachelines/net_device.rst  |   1 +
 Documentation/networking/netdev-features.rst  |   5 +
 Documentation/networking/netmem.rst           |  23 +-
 drivers/net/ethernet/google/gve/gve_main.c    |   4 +
 drivers/net/ethernet/google/gve/gve_tx_dqo.c  |   8 +-
 include/linux/netdevice.h                     |   2 +
 include/linux/skbuff.h                        |  17 +-
 include/linux/skbuff_ref.h                    |   4 +-
 include/net/netmem.h                          |  34 +-
 include/net/sock.h                            |   1 +
 include/uapi/linux/netdev.h                   |   1 +
 io_uring/zcrx.c                               |   1 +
 net/core/datagram.c                           |  48 ++-
 net/core/dev.c                                |  34 +-
 net/core/devmem.c                             | 139 ++++++--
 net/core/devmem.h                             |  83 ++++-
 net/core/netdev-genl-gen.c                    |  13 +
 net/core/netdev-genl-gen.h                    |   1 +
 net/core/netdev-genl.c                        |  75 ++++-
 net/core/skbuff.c                             |  48 ++-
 net/core/sock.c                               |   6 +
 net/ipv4/ip_output.c                          |   3 +-
 net/ipv4/tcp.c                                |  50 ++-
 net/ipv6/ip6_output.c                         |   3 +-
 net/vmw_vsock/virtio_transport_common.c       |   5 +-
 tools/include/uapi/linux/netdev.h             |   1 +
 .../selftests/drivers/net/hw/devmem.py        |  26 +-
 .../selftests/drivers/net/hw/ncdevmem.c       | 300 +++++++++++++++++-
 30 files changed, 1009 insertions(+), 89 deletions(-)

base-commit: 240ce924d2718b8f6f622f2a9a9c219b9da736e8
-- 
2.49.0.805.g082f7c87e0-goog
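The UAPI described in the RFC v2 notes of this cover letter, where each iov_base is interpreted as an offset into the bound dmabuf and several iovs can go out in one sendmsg() call, can be sketched as follows. This is an illustrative sketch rather than the series' ncdevmem code: struct devmem_tx_msg and devmem_tx_msg_init are hypothetical names, passing the dmabuf binding ID as a 32-bit SCM_DEVMEM_DMABUF control message is an assumption based on the series' documentation, and the fallback value 79 is assumed from recent uapi headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SCM_DEVMEM_DMABUF
/* Assumed fallback; recent uapi headers define it as SO_DEVMEM_DMABUF (79). */
#define SCM_DEVMEM_DMABUF 79
#endif

/* Hypothetical container keeping the msghdr and its buffers together. */
struct devmem_tx_msg {
	struct msghdr msg;
	struct iovec iov[2];
	union {
		char buf[CMSG_SPACE(sizeof(uint32_t))];
		struct cmsghdr align; /* forces cmsghdr alignment */
	} ctrl;
};

/*
 * Fill in a message that sends two ranges of one dmabuf. The iov_base
 * fields carry byte offsets into the dmabuf, not user-space pointers.
 */
static void devmem_tx_msg_init(struct devmem_tx_msg *m, uint32_t dmabuf_id,
			       size_t off0, size_t len0,
			       size_t off1, size_t len1)
{
	struct cmsghdr *cmsg;

	memset(m, 0, sizeof(*m));

	m->iov[0].iov_base = (void *)(uintptr_t)off0;
	m->iov[0].iov_len = len0;
	m->iov[1].iov_base = (void *)(uintptr_t)off1;
	m->iov[1].iov_len = len1;

	m->msg.msg_iov = m->iov;
	m->msg.msg_iovlen = 2;
	m->msg.msg_control = m->ctrl.buf;
	m->msg.msg_controllen = sizeof(m->ctrl.buf);

	/* Assumption: the binding ID travels in a SOL_SOCKET control message. */
	cmsg = CMSG_FIRSTHDR(&m->msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_DEVMEM_DMABUF;
	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
	memcpy(CMSG_DATA(cmsg), &dmabuf_id, sizeof(dmabuf_id));
}
```

Since the cover letter says the TX path piggybacks on MSG_ZEROCOPY, a caller would presumably pass the prepared message to sendmsg(fd, &m.msg, MSG_ZEROCOPY) on a socket whose device has a TX dmabuf binding. The structure must not be copied after initialization because msg points into its own members.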

8 months, 3 weeks

[PATCH net-next v2] selftests/vsock: add initial vmtest.sh for vsock

by Bobby Eshleman

This commit introduces a new vmtest.sh runner for vsock.

It uses virtme-ng/qemu to run tests in a VM. The tests validate G2H, H2G, and loopback. The testing tools from tools/testing/vsock/ are reused. Currently, only vsock_test is used.

VMCI and hyperv support is automatically built, though not used.

Only tested on x86.

To run:

  $ tools/testing/selftests/vsock/vmtest.sh

Future work can include vsock_diag_test.

Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
---
Changes in v2:
- add kernel oops and warnings checker
- change testname variable to use FUNCNAME
- fix spacing in test_vm_server_host_client
- add -s skip build option to vmtest.sh
- add test_vm_loopback
- pass port to vm_wait_for_listener
- fix indentation in vmtest.sh
- add vmci and hyperv to config
- changed whitespace from tabs to spaces in help string
- Link to v1: https://lore.kernel.org/r/20250410-vsock-vmtest-v1-1-f35a81dab98c@gmail.com
---
 MAINTAINERS                                |   1 +
 tools/testing/selftests/vsock/.gitignore   |   1 +
 tools/testing/selftests/vsock/config.vsock |  10 +
 tools/testing/selftests/vsock/vmtest.sh    | 306 +++++++++++++++++++++++++++++
 4 files changed, 318 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c3fce441672349f7850c57d788bc1a29b203fba5..f214cf7c4fb59ec67885ee6c81daa44e17c80f5f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25323,6 +25323,7 @@ F:	include/uapi/linux/vm_sockets.h
 F:	include/uapi/linux/vm_sockets_diag.h
 F:	include/uapi/linux/vsockmon.h
 F:	net/vmw_vsock/
+F:	tools/testing/selftests/vsock/
 F:	tools/testing/vsock/
 
 VMALLOC
diff --git a/tools/testing/selftests/vsock/.gitignore b/tools/testing/selftests/vsock/.gitignore
new file mode 100644
index 0000000000000000000000000000000000000000..1950aa8ac68c0831c12c1aaa429da45bbe41e60f
--- /dev/null
+++ b/tools/testing/selftests/vsock/.gitignore
@@ -0,0 +1 @@
+vsock_selftests.log
diff --git a/tools/testing/selftests/vsock/config.vsock b/tools/testing/selftests/vsock/config.vsock
new file mode 100644
index 0000000000000000000000000000000000000000..9e0fb2270e6a2fc0beb5f0d9f0bc37158d0a9d23
--- /dev/null
+++ b/tools/testing/selftests/vsock/config.vsock
@@ -0,0 +1,10 @@
+CONFIG_VSOCKETS=y
+CONFIG_VSOCKETS_DIAG=y
+CONFIG_VSOCKETS_LOOPBACK=y
+CONFIG_VMWARE_VMCI_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS=y
+CONFIG_VIRTIO_VSOCKETS_COMMON=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_VMWARE_VMCI=y
+CONFIG_VHOST_VSOCK=y
+CONFIG_HYPERV=y
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh
new file mode 100755
index 0000000000000000000000000000000000000000..61dfcc06223fa7a30cb575cb3f2d01121b3ed3ce
--- /dev/null
+++ b/tools/testing/selftests/vsock/vmtest.sh
@@ -0,0 +1,306 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2025 Meta Platforms, Inc. and affiliates
+#
+# Dependencies:
+#   * virtme-ng
+#   * busybox-static (used by virtme-ng)
+#   * qemu (used by virtme-ng)
+
+SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)"
+KERNEL_CHECKOUT=$(realpath ${SCRIPT_DIR}/../../../..)
+PLATFORM=${PLATFORM:-$(uname -m)}
+
+if [[ -z "${QEMU:-}" ]]; then
+	QEMU=$(which qemu-system-${PLATFORM})
+fi
+
+SKIP_BUILD=0
+
+VSOCK_TEST=${KERNEL_CHECKOUT}/tools/testing/vsock/vsock_test
+
+TEST_GUEST_PORT=51000
+TEST_HOST_PORT=50000
+TEST_HOST_PORT_LISTENER=50001
+SSH_GUEST_PORT=22
+SSH_HOST_PORT=2222
+VSOCK_CID=1234
+
+QEMU_PIDFILE=/tmp/qemu.pid
+
+# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a
+# control port forwarded for vsock_test. Because virtme-ng doesn't support
+# adding an additional port to forward to the device created from "--ssh" and
+# virtme-init mistakenly sets identical IPs to the ssh device and additional
+# devices, we instead opt out of using --ssh, add the device manually, and also
+# add the kernel cmdline options that virtme-init uses to setup the interface.
+QEMU_OPTS=""
+QEMU_OPTS="${QEMU_OPTS} -netdev user,id=n0,hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS},hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}"
+QEMU_OPTS="${QEMU_OPTS} -device virtio-net-pci,netdev=n0"
+QEMU_OPTS="${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${VSOCK_CID}"
+QEMU_OPTS="${QEMU_OPTS} --pidfile ${QEMU_PIDFILE}"
+KERNEL_CMDLINE="virtme.dhcp net.ifnames=0 biosdevname=0 virtme.ssh virtme_ssh_user=$USER"
+
+LOG=${SCRIPT_DIR}/vsock_selftests.log
+
+#	Name				Description
+tests="
+	vm_server_host_client		Run vsock_test in server mode on the VM and in client mode on the host.
+	vm_client_host_server		Run vsock_test in client mode on the VM and in server mode on the host.
+	vm_loopback			Run vsock_test using the loopback transport in the VM.
+"
+
+usage() {
+	echo
+	echo "$0 [OPTIONS]"
+	echo
+	echo "Options"
+	echo "  -v: verbose output"
+	echo "  -s: skip build"
+	echo
+	echo "Available tests${tests}"
+	exit 1
+}
+
+die() {
+	echo "$*" >&2
+	exit 1
+}
+
+vm_ssh() {
+	ssh -q -o UserKnownHostsFile=/dev/null -p 2222 localhost $*
+	return $?
+}
+
+cleanup() {
+	if [[ -f "${QEMU_PIDFILE}" ]]; then
+		pkill -9 -F ${QEMU_PIDFILE} 2>&1 >/dev/null
+	fi
+}
+
+build() {
+	log_setup "Building kernel and tests"
+
+	pushd ${KERNEL_CHECKOUT} >/dev/null
+	vng \
+		--kconfig \
+		--config ${KERNEL_CHECKOUT}/tools/testing/selftests/vsock/config.vsock
+	make -j$(nproc)
+	make -C ${KERNEL_CHECKOUT}/tools/testing/vsock
+	popd >/dev/null
+	echo
+}
+
+vm_setup() {
+	local VNG_OPTS=""
+	if [[ "${VERBOSE}" = 1 ]]; then
+		VNG_OPTS="--verbose"
+	fi
+	vng \
+		$VNG_OPTS \
+		--run ~/local/linux \
+		--qemu /bin/qemu-system-x86_64 \
+		--qemu-opts="${QEMU_OPTS}" \
+		--user root \
+		--append "${KERNEL_CMDLINE}" \
+		--rw 2>&1 >/dev/null &
+}
+
+vm_wait_for_ssh() {
+	i=0
+	while [[ true ]]; do
+		if (( i > 20 )); then
+			die "Timed out waiting for guest ssh"
+		fi
+		vm_ssh -- true
+		if [[ $? -eq 0 ]]; then
+			break
+		fi
+		i=$(( i + 1 ))
+		sleep 5
+	done
+}
+
+wait_for_listener() {
+	local PORT=$1
+	local i=0
+	while ! ss -ltn | grep -q ":${PORT}"; do
+		if (( i > 30 )); then
+			die "Timed out waiting for listener on port ${PORT}"
+		fi
+		sleep 3
+		i=$(( i + 1 ))
+	done
+}
+
+vm_wait_for_listener() {
+	local port=$1
+	vm_ssh -- "$(declare -f wait_for_listener); wait_for_listener ${port}"
+}
+
+host_wait_for_listener() {
+	wait_for_listener ${TEST_HOST_PORT_LISTENER}
+}
+
+log() {
+	local prefix="$1"
+	shift
+
+	if [[ "$#" -eq 0 ]]; then
+		cat | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	else
+		echo "$*" | awk '{ printf "%s:\t%s\n","'"${prefix}"'", $0 }' | tee -a ${LOG}
+	fi
+}
+
+log_setup() {
+	log "setup" "$@"
+}
+
+log_host() {
+	testname=$1
+	shift
+	log "test:${testname}:host" "$@"
+}
+
+log_guest() {
+	testname=$1
+	shift
+	log "test:${testname}:guest" "$@"
+}
+
+test_vm_server_host_client() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=server \
+		--control-port="${TEST_GUEST_PORT}" \
+		--peer-cid=2 \
+		2>&1 | log_guest "${testname}" &
+
+	vm_wait_for_listener ${TEST_GUEST_PORT}
+
+	${VSOCK_TEST} \
+		--mode=client \
+		--control-host=127.0.0.1 \
+		--peer-cid="${VSOCK_CID}" \
+		--control-port="${TEST_HOST_PORT}" 2>&1 | log_host "${testname}"
+
+	rc=$?
+}
+
+test_vm_client_host_server() {
+	local testname="${FUNCNAME[0]#test_}"
+
+	${VSOCK_TEST} \
+		--mode "server" \
+		--control-port "${TEST_HOST_PORT_LISTENER}" \
+		--peer-cid "${VSOCK_CID}" 2>&1 | log_host "${testname}" &
+
+	host_wait_for_listener
+
+	vm_ssh -- "${VSOCK_TEST}" \
+		--mode=client \
+		--control-host=10.0.2.2 \
+		--peer-cid=2 \
+		--control-port="${TEST_HOST_PORT_LISTENER}" 2>&1 | log_guest "${testname}"
+
+	rc=$?
+}
+
+test_vm_loopback() {
+	local testname="${FUNCNAME[0]#test_}"
+	local port=60000 # non-forwarded local port
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=server \
+		--control-port="${port}" \
+		--peer-cid="${VSOCK_CID}" &
+
+	vm_wait_for_listener ${port}
+
+	vm_ssh -- ${VSOCK_TEST} \
+		--mode=client \
+		--control-host="127.0.0.1" \
+		--control-port="${port}" \
+		--peer-cid="${VSOCK_CID}"
+
+	rc=$?
+}
+
+run_test() {
+	unset IFS
+	local host_oops_cnt_before=$(dmesg | grep -i 'Oops' | wc -l)
+	local host_warn_cnt_before=$(dmesg --level=warn | wc -l)
+	local vm_oops_cnt_before=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+	local vm_warn_cnt_before=$(vm_ssh -- dmesg --level=warn | wc -l)
+
+	name=$(echo "${1}" | awk '{ print $1 }')
+	eval test_"${name}"
+
+	local host_oops_cnt_after=$(dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${host_oops_cnt_after} -gt ${host_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	local host_warn_cnt_after=$(dmesg --level=warn | wc -l)
+	if [[ ${host_warn_cnt_after} -gt ${host_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on host" | log_host ${name}
+		rc=1
+	fi
+
+	local vm_oops_cnt_after=$(vm_ssh -- dmesg | grep -i 'Oops' | wc -l)
+	if [[ ${vm_oops_cnt_after} -gt ${vm_oops_cnt_before} ]]; then
+		echo "${name}: kernel oops detected on vm" | log_host ${name}
+		rc=1
+	fi
+
+	local vm_warn_cnt_after=$(vm_ssh -- dmesg --level=warn | wc -l)
+	if [[ ${vm_warn_cnt_after} -gt ${vm_warn_cnt_before} ]]; then
+		echo "${name}: kernel warning detected on vm" | log_host ${name}
+		rc=1
+	fi
+}
+
+while getopts :hvs o
+do
+	case $o in
+	v) VERBOSE=1;;
+	s) SKIP_BUILD=1;;
+	h|*) usage;;
+	esac
+done
+shift $((OPTIND-1))
+
+trap cleanup EXIT
+
+> ${LOG}
+if (( SKIP_BUILD != 1 )); then
+	build
+fi
+log_setup "Booting up VM"
+vm_setup
+vm_wait_for_ssh
+log_setup "VM booted up"
+
+IFS="
+"
+cnt=0
+for t in ${tests}; do
+	rc=0
+	run_test "${t}"
+	if [[ ${rc} != 0 ]]; then
+		cnt=$(( cnt + 1 ))
+	fi
+done
+
+if [[ ${cnt} = 0 ]]; then
+	echo OK
+else
+	echo FAILED: ${cnt}
+fi
+echo "Log: ${LOG}"
+exit ${cnt}

---
base-commit: cc04ed502457412960d215b9cd55f0d966fda255
change-id: 20250325-vsock-vmtest-b3a21d2102c2

Best regards,
-- 
Bobby Eshleman <bobbyeshleman(a)gmail.com>

8 months, 3 weeks

[PATCH][next] selftests/perf_events: Fix spelling mistake "sycnhronize" -> "synchronize"

by Colin Ian King

There is a spelling mistake in an error message. Fix it.

Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
 tools/testing/selftests/perf_events/watermark_signal.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/perf_events/watermark_signal.c b/tools/testing/selftests/perf_events/watermark_signal.c
index 49dc1e831174..e03fe1b9bba2 100644
--- a/tools/testing/selftests/perf_events/watermark_signal.c
+++ b/tools/testing/selftests/perf_events/watermark_signal.c
@@ -75,7 +75,7 @@ TEST(watermark_signal)
 	if (waitpid(child, &child_status, WSTOPPED) != child ||
 	    !(WIFSTOPPED(child_status) && WSTOPSIG(child_status) == SIGSTOP)) {
 		fprintf(stderr,
-			"failed to sycnhronize with child errno=%d status=%x\n",
+			"failed to synchronize with child errno=%d status=%x\n",
 			errno,
 			child_status);
 		goto cleanup;
-- 
2.49.0

8 months, 3 weeks

[PATCH v6 0/5] userfaultfd move option

by Suren Baghdasaryan

This patch series introduces the UFFDIO_MOVE feature to userfaultfd, which has long been implemented and maintained by Andrea in his local tree [1], but was not upstreamed due to lack of use cases where this approach would be better than allocating a new page and copying the contents. Previous upstreaming attempts can be found at [6] and [7].

UFFDIO_COPY performs ~20% better than UFFDIO_MOVE when the application needs pages to be allocated [2]. However, with UFFDIO_MOVE, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in the userspace, we avoid the need to release (via madvise) the pages back to the kernel [3].

We see over 40% reduction (on a Google pixel 6 device) in the compacting thread’s completion time by using UFFDIO_MOVE vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the usecase are explained in [3].

Furthermore, UFFDIO_MOVE enables moving swapped-out pages without touching them within the same vma. Today, this can only be done with mremap, which forces splitting the vma.

TODOs for follow-up improvements:
- cross-mm support. Known differences from single-mm and missing pieces:
  - memcg recharging (might need to isolate pages in the process)
  - mm counters
  - cross-mm deposit table moves
  - cross-mm test
  - document the address space where src and dest reside in struct uffdio_move
- TLB flush batching. Will require extensive changes to PTL locking in move_pages_pte(). OTOH that might let us reuse parts of mremap code.

Changes since v5 [10]:
- added logic to split large folios in move_pages_pte(), per David Hildenbrand
- added check for PAE before split_huge_pmd() to avoid the split if the move operation can't be done
- replaced calls to default_huge_page_size() with read_pmd_pagesize() in uffd_move_pmd test, per David Hildenbrand
- fixed the condition in uffd_move_test_common() checking if area alignment is needed

Changes since v4 [9]:
- added Acked-by in patch 1, per Peter Xu
- added description for ctx, mm and mode parameters of move_pages(), per kernel test robot
- added Reviewed-by's, per Peter Xu and Axel Rasmussen
- removed unused operations in uffd_test_case_ops
- refactored uffd-unit-test changes to avoid using global variables and handle pmd moves without page size overrides, per Peter Xu

Changes since v3 [8]:
- changed retry path in folio_lock_anon_vma_read() to unlock and then relock RCU, per Peter Xu
- removed cross-mm support from initial patchset, per David Hildenbrand
- replaced BUG_ONs with VM_WARN_ON or WARN_ON_ONCE, per David Hildenbrand
- added missing cache flushing, per Lokesh Gidra and Peter Xu
- updated manpage text in the patch description, per Peter Xu
- renamed internal functions from "remap" to "move", per Peter Xu
- added mmap_changing check after taking mmap_lock, per Peter Xu
- changed uffd context check to ensure dst_mm is registered onto uffd we are operating on, Peter Xu and David Hildenbrand
- changed to non-maybe variants of maybe*_mkwrite(), per David Hildenbrand
- fixed warning for CONFIG_TRANSPARENT_HUGEPAGE=n, per kernel test robot
- comments cleanup, per David Hildenbrand and Peter Xu
- checks for VM_IO,VM_PFNMAP,VM_HUGETLB,..., per David Hildenbrand
- prevent moving pinned pages, per Peter Xu
- changed uffd tests to call move uffd_test_ctx_clear() at the end of the test run instead of in the beginning of the next run
- added support for testcase-specific ops
- added test for moving PMD-aligned blocks

Changes since v2 [5]:
- renamed UFFDIO_REMAP to UFFDIO_MOVE, per David Hildenbrand
- rebase over mm-unstable to use folio_move_anon_rmap(), per David Hildenbrand
- added text for manpage explaining DONTFORK and KSM requirements for this feature, per David Hildenbrand
- check for anon_vma changes in the fast path of folio_lock_anon_vma_read, per Peter Xu
- updated the title and description of the first patch, per David Hildenbrand
- updating comments in folio_lock_anon_vma_read() explaining the need for anon_vma checks, per David Hildenbrand
- changed all mapcount checks to PageAnonExclusive, per Jann Horn and David Hildenbrand
- changed counters in remap_swap_pte() from MM_ANONPAGES to MM_SWAPENTS, per Jann Horn
- added a check for PTE change after folio is locked in remap_pages_pte(), per Jann Horn
- added handling of PMD migration entries and bailout when pmd_devmap(), per Jann Horn
- added checks to ensure both src and dst VMAs are writable, per Peter Xu
- added UFFD_FEATURE_MOVE, per Peter Xu
- removed obsolete comments, per Peter Xu
- renamed remap_anon_pte to remap_present_pte, per Peter Xu
- added a comment for folio_get_anon_vma() explaining the need for anon_vma checks, per Peter Xu
- changed error handling in remap_pages() to make it more clear, per Peter Xu
- changed EFAULT to EAGAIN to retry when a hugepage appears or disappears from under us, per Peter Xu
- added links to previous upstreaming attempts, per David Hildenbrand

Changes since v1 [4]:
- add mmget_not_zero in userfaultfd_remap, per Jann Horn
- removed extern from function definitions, per Matthew Wilcox
- converted to folios in remap_pages_huge_pmd, per Matthew Wilcox
- use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand
- handle pgtable transfers between MMs, per Jann Horn
- ignore concurrent A/D pte bit changes, per Jann Horn
- split functions into smaller units, per David Hildenbrand
- test for folio_test_large in remap_anon_pte, per Matthew Wilcox
- use pte_swp_exclusive for swapcount check, per David Hildenbrand
- eliminated use of mmu_notifier_invalidate_range_start_nonblock, per Jann Horn
- simplified THP alignment checks, per Jann Horn
- refactored the loop inside remap_pages, per Jann Horn
- additional clarifying comments, per Jann Horn

Main changes since Andrea's last version [1]:
- Trivial translations from page to folio, mmap_sem to mmap_lock
- Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its possible failure
- Move pte mapping into remap_pages_pte to allow for retries when source page or anon_vma is contended. Since pte_offset_map_nolock() starts an RCU read section, we can't block anymore after mapping a pte, so we have to unmap the ptes, do the locking and retry.
- Add and use anon_vma_trylock_write() to avoid blocking while in RCU read section.
- Accommodate changes in mmu_notifier_range_init() API, switch to mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in RCU read section.
- Open-code now removed __swp_swapcount()
- Replace pmd_read_atomic() with pmdp_get_lockless()
- Add new selftest for UFFDIO_MOVE

[1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc…
[2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha…
[3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj…
[4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/
[5] https://lore.kernel.org/all/20230923013148.1390521-1-surenb@google.com/
[6] https://lore.kernel.org/all/1425575884-2574-21-git-send-email-aarcange@redh…
[7] https://lore.kernel.org/all/cover.1547251023.git.blake.caldwell@colorado.ed…
[8] https://lore.kernel.org/all/20231009064230.2952396-1-surenb@google.com/
[9] https://lore.kernel.org/all/20231028003819.652322-1-surenb@google.com/
[10] https://lore.kernel.org/all/20231121171643.3719880-1-surenb@google.com/

Andrea Arcangeli (2):
  mm/rmap: support move to different root anon_vma in folio_move_anon_rmap()
  userfaultfd: UFFDIO_MOVE uABI

Suren Baghdasaryan (3):
  selftests/mm: call uffd_test_ctx_clear at the end of the test
  selftests/mm: add uffd_test_case_ops to allow test case-specific operations
  selftests/mm: add UFFDIO_MOVE ioctl test

 Documentation/admin-guide/mm/userfaultfd.rst |   3 +
 fs/userfaultfd.c                             |  72 +++
 include/linux/rmap.h                         |   5 +
 include/linux/userfaultfd_k.h                |  11 +
 include/uapi/linux/userfaultfd.h             |  29 +-
 mm/huge_memory.c                             | 122 ++++
 mm/khugepaged.c                              |   3 +
 mm/rmap.c                                    |  30 +
 mm/userfaultfd.c                             | 614 +++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.c     |  39 +-
 tools/testing/selftests/mm/uffd-common.h     |   9 +
 tools/testing/selftests/mm/uffd-stress.c     |   5 +-
 tools/testing/selftests/mm/uffd-unit-tests.c | 192 ++++++
 13 files changed, 1130 insertions(+), 4 deletions(-)

-- 
2.43.0.rc2.451.g8631bc7472-goog


[PATCH] selftests: pid_namespace: Fix pid_max build with missing mount header

by Niko Nikolov

Add #include <sys/mount.h> to pid_max.c to fix build errors caused by implicit declarations of mount() and umount2(), and undefined symbols MS_PRIVATE, MS_REC, and MNT_DETACH.

Signed-off-by: Niko Nikolov <nikolay.niko.nikolov(a)gmail.com>
---
 tools/testing/selftests/pid_namespace/pid_max.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/pid_namespace/pid_max.c b/tools/testing/selftests/pid_namespace/pid_max.c
index 51c414faabb0..972bedc475f1 100644
--- a/tools/testing/selftests/pid_namespace/pid_max.c
+++ b/tools/testing/selftests/pid_namespace/pid_max.c
@@ -11,6 +11,7 @@
 #include <string.h>
 #include <syscall.h>
 #include <sys/wait.h>
+#include <sys/mount.h>
 
 #include "../kselftest_harness.h"
 #include "../pidfd/pidfd.h"
--
2.49.0


[PATCH bpf-next v3 0/2] bpf: Allow access to const void pointer arguments in tracing programs

by KaFai Wan

If we try to access an argument that is a pointer to const void, it is treated as an UNKNOWN type and the verifier fails to load the program.

Use is_void_or_int_ptr to check if type is void or int pointer. Add a selftest to check it.

---
KaFai Wan (2):
  bpf: Allow access to const void pointer arguments in tracing programs
  selftests/bpf: Add test to access const void pointer argument in tracing program

 kernel/bpf/btf.c                                    | 13 +++----------
 net/bpf/test_run.c                                  |  8 +++++++-
 .../selftests/bpf/progs/verifier_btf_ctx_access.c   | 12 ++++++++++++
 3 files changed, 22 insertions(+), 11 deletions(-)

Changelog:
v2->v3: Addressed comments from jirka
- remove duplicate checks for void pointer
Details in here:
https://lore.kernel.org/bpf/20250416161756.1079178-1-kafai.wan@hotmail.com/

v1->v2: Addressed comments from jirka
- use btf_type_is_void to check if type is void
- merge is_void_ptr and is_int_ptr to is_void_or_int_ptr
- fix selftests
Details in here:
https://lore.kernel.org/all/20250412170626.3638516-1-kafai.wan@hotmail.com/

--
2.43.0
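The check described above can be pictured with a toy model. The sketch below is an illustration only: the `toy_*` types and functions are hypothetical stand-ins, not the kernel's BTF structures or the actual is_void_or_int_ptr() implementation. It assumes the essential idea is to strip modifiers such as const before asking whether a pointer targets void or an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for BTF type kinds; not the kernel's BTF representation. */
enum toy_kind { TOY_VOID, TOY_INT, TOY_STRUCT, TOY_PTR, TOY_CONST };

struct toy_type {
	enum toy_kind kind;
	const struct toy_type *target; /* pointee or modified type, else NULL */
};

/* Strip modifiers such as const to reach the underlying type. */
static const struct toy_type *toy_skip_modifiers(const struct toy_type *t)
{
	while (t && t->kind == TOY_CONST)
		t = t->target;
	return t;
}

/* Hypothetical analogue of the check named in the cover letter. */
static bool toy_is_void_or_int_ptr(const struct toy_type *t)
{
	t = toy_skip_modifiers(t);
	if (!t || t->kind != TOY_PTR)
		return false;
	/* Look through const on the pointee, so "const void *" qualifies. */
	t = toy_skip_modifiers(t->target);
	return t && (t->kind == TOY_VOID || t->kind == TOY_INT);
}
```

In this model a `const void *` argument is a TOY_PTR whose target is TOY_CONST wrapping TOY_VOID, which the check accepts, while a pointer to a struct is rejected.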


[PATCH] lib: Ensure prime numbers tests are included in KUnit test runs

by Mark Brown

When the select of PRIME_NUMBERS was removed from its KUnit test Kconfig nothing was added to the KUnit configs, meaning that when run via the KUnit runner the tests are neither built nor run. Add PRIME_NUMBERS to all_tests.config so they are enabled when the KUnit runner builds the kernel.

Fixes: 3f2925174f8b ("lib/prime_numbers: KUnit test should not select PRIME_NUMBERS")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
 tools/testing/kunit/configs/all_tests.config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index cdd9782f9646..7bb885b0c32d 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -43,6 +43,8 @@ CONFIG_REGMAP_BUILD=y
 
 CONFIG_AUDIT=y
 
+CONFIG_PRIME_NUMBERS=y
+
 CONFIG_SECURITY=y
 CONFIG_SECURITY_APPARMOR=y
 CONFIG_SECURITY_LANDLOCK=y

---
base-commit: 9c32cda43eb78f78c73aee4aa344b777714e259b
change-id: 20250422-lib-fix-prime-numbers-kunit-323659c2cfe2

Best regards,
--
Mark Brown <broonie(a)kernel.org>


[PATCH v2] tracing: selftests: Add testing a user string to filters

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

Running the following commands was broken:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

And would produce nothing when it should have produced something like:

  ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)

Add a test to check this case so that it will be caught if it breaks again.

Link: https://lore.kernel.org/linux-trace-kernel/20250417183003.505835fb@gandalf.…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/20250417223323.3edb4f6c@batman.local.home

- Use $TMPDIR instead of $TESTDIR as test file (Masami Hiramatsu)

 .../test.d/filter/event-filter-function.tc | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 118247b8dd84..c62165fabd0c 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -80,6 +80,26 @@ if [ $misscnt -gt 0 ]; then
     exit_fail
 fi
 
+# Check strings too
+if [ -f events/syscalls/sys_enter_openat/filter ]; then
+    DIRNAME=`basename $TMPDIR`
+    echo "filename.ustring ~ \"*$DIRNAME*\"" > events/syscalls/sys_enter_openat/filter
+    echo 1 > events/syscalls/sys_enter_openat/enable
+    echo 1 > tracing_on
+    ls /bin/sh
+    nocnt=`grep openat trace | wc -l`
+    ls $TMPDIR
+    echo 0 > tracing_on
+    hitcnt=`grep openat trace | wc -l`;
+    echo 0 > events/syscalls/sys_enter_openat/enable
+    if [ $nocnt -gt 0 ]; then
+        exit_fail
+    fi
+    if [ $hitcnt -eq 0 ]; then
+        exit_fail
+    fi
+fi
+
 reset_events_filter
 
 exit 0
--
2.47.2
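To make the expected hit and miss cases of the test concrete, here is a small user-space sketch of glob matching on a file name. It is an approximation under stated assumptions: fnmatch(3) stands in for the kernel's own matcher behind the `~` filter operator, and `glob_hit()` is a hypothetical helper name, not something from the patch.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/*
 * User-space illustration only: fnmatch(3) stands in for the kernel's own
 * glob matcher used by the "~" filter operator. With flags == 0, '*' also
 * matches '/' characters, so "/proc*" covers everything below /proc.
 */
static bool glob_hit(const char *pattern, const char *filename)
{
	return fnmatch(pattern, filename, 0) == 0;
}
```

With this helper, `glob_hit("/proc*", "/proc/1192/maps")` is true while `glob_hit("/proc*", "/bin/sh")` is false, mirroring the test's expectation that opening a path under the filtered directory is traced and `ls /bin/sh` is not.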


[PATCH v4 net-next 00/15] AccECN protocol patch series

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find v4:

v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)

v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)

v2 (18-Mar-2025)
- Add one missing patch from previous AccECN protocol preparation patch series to this patch series

The full patch series can be found in
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/

The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28

Best regards,
Chia-Yu

Chia-Yu Chang (1):
  tcp: accecn: AccECN option failure handling

Ilpo Järvinen (14):
  tcp: reorganize SYN ECN code
  tcp: fast path functions later
  tcp: AccECN core
  tcp: accecn: AccECN negotiation
  tcp: accecn: add AccECN rx byte counters
  tcp: accecn: AccECN needs to know delivered bytes
  tcp: allow embedding leftover into option padding
  tcp: sack option handling improvements
  tcp: accecn: AccECN option
  tcp: accecn: AccECN option send control
  tcp: accecn: AccECN option ceb/cep heuristic
  tcp: accecn: AccECN ACE field multi-wrap heuristic
  tcp: accecn: try to fit AccECN option with SACK
  tcp: try to avoid safer when ACKs are thinned

 include/linux/tcp.h        |  27 +-
 include/net/netns/ipv4.h   |   2 +
 include/net/tcp.h          | 198 +++++++++++--
 include/uapi/linux/tcp.h   |   7 +
 net/ipv4/syncookies.c      |   3 +
 net/ipv4/sysctl_net_ipv4.c |  19 ++
 net/ipv4/tcp.c             |  26 +-
 net/ipv4/tcp_input.c       | 591 +++++++++++++++++++++++++++++++++++--
 net/ipv4/tcp_ipv4.c        |   5 +-
 net/ipv4/tcp_minisocks.c   |  92 +++++-
 net/ipv4/tcp_output.c      | 302 +++++++++++++++++--
 net/ipv6/syncookies.c      |   1 +
 net/ipv6/tcp_ipv6.c        |   1 +
 13 files changed, 1178 insertions(+), 96 deletions(-)

--
2.34.1


[PATCH v11 net-next 0/5] DUALPI2 patch

by chia-yu.chang＠nokia-bell-labs.com

From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>

Hello,

Please find the reposted DualPI2 patch v11.

v11 (15-Apr-2025)
- Replace hrtimer_init with hrtimer_setup in sch_dualpi2.c

v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>)

v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
  In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L-queue only when the L-queue length was greater than or equal to 2 packets. This causes larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.

  Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                             avg       median         # data pts
      Ping (ms) ICMP   :    11.55      11.70 ms           350
      TCP upload avg   :    18.96        N/A Mbits/s      350
      TCP upload sum   :    18.96        N/A Mbits/s      350
    New version (v9):
                             avg       median         # data pts
      Ping (ms) ICMP   :    10.81      10.70 ms           350
      TCP upload avg   :    18.91        N/A Mbits/s      350
      TCP upload sum   :    18.91        N/A Mbits/s      350

  Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                             avg       median         # data pts
      Ping (ms) ICMP   :    12.61      12.80 ms           350
      TCP upload avg   :     9.48        N/A Mbits/s      350
      TCP upload sum   :     9.48        N/A Mbits/s      350
    New version (v9):
                             avg       median         # data pts
      Ping (ms) ICMP   :    11.06      10.80 ms           350
      TCP upload avg   :     9.43        N/A Mbits/s      350
      TCP upload sum   :     9.43        N/A Mbits/s      350

  Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
    Old versions:
                             avg       median         # data pts
      Ping (ms) ICMP   :    40.86      37.45 ms           350
      TCP upload avg   :     0.88        N/A Mbits/s      350
      TCP upload sum   :     0.88        N/A Mbits/s      350
      TCP upload::1    :     0.88       0.97 Mbits/s      350
    New version (v9):
                             avg       median         # data pts
      Ping (ms) ICMP   :    11.07      10.40 ms           350
      TCP upload avg   :     0.55        N/A Mbits/s      350
      TCP upload sum   :     0.55        N/A Mbits/s      350
      TCP upload::1    :     0.55       0.59 Mbits/s      350

v8 (11-Mar-2025)
- Fix warning messages in v7

v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>)

v6 (04-Mar-2025)
- Add modprobe for dualpi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>)
- Update test cases in dualpi2.json
- Update commit message

v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:

  Unshaped 1gigE with 4 download streams test:
  - Summary of tcp_4down run 'MQ + FQ_CODEL':
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.19       1.34 ms           349
      TCP download avg   :   235.42        N/A Mbits/s      349
      TCP download sum   :   941.68        N/A Mbits/s      349
      TCP download::1    :   235.19     235.39 Mbits/s      349
      TCP download::2    :   235.03     235.35 Mbits/s      349
      TCP download::3    :   236.89     235.44 Mbits/s      349
      TCP download::4    :   234.57     235.19 Mbits/s      349
  - Summary of tcp_4down run 'MQ + FQ_PIE'
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.21       1.37 ms           350
      TCP download avg   :   235.42        N/A Mbits/s      350
      TCP download sum   :   941.61        N/A Mbits/s      350
      TCP download::1    :   232.54     233.13 Mbits/s      350
      TCP download::2    :   232.52     232.80 Mbits/s      350
      TCP download::3    :   233.14     233.78 Mbits/s      350
      TCP download::4    :   243.41     241.48 Mbits/s      350
  - Summary of tcp_4down run 'MQ + DUALPI2'
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.19       1.34 ms           349
      TCP download avg   :   235.42        N/A Mbits/s      349
      TCP download sum   :   941.68        N/A Mbits/s      349
      TCP download::1    :   235.19     235.39 Mbits/s      349
      TCP download::2    :   235.03     235.35 Mbits/s      349
      TCP download::3    :   236.89     235.44 Mbits/s      349
      TCP download::4    :   234.57     235.19 Mbits/s      349

  Unshaped 1gigE with 128 download streams test:
  - Summary of tcp_128down run 'MQ + FQ_CODEL':
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.88       1.86 ms           350
      TCP download avg   :     7.39        N/A Mbits/s      350
      TCP download sum   :   946.47        N/A Mbits/s      350
  - Summary of tcp_128down run 'MQ + FQ_PIE':
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.88       1.86 ms           350
      TCP download avg   :     7.39        N/A Mbits/s      350
      TCP download sum   :   946.47        N/A Mbits/s      350
  - Summary of tcp_128down run 'MQ + DUALPI2':
                               avg       median         # data pts
      Ping (ms) ICMP     :     1.88       1.86 ms           350
      TCP download avg   :     7.39        N/A Mbits/s      350
      TCP download sum   :   946.47        N/A Mbits/s      350

  Unshaped 10gigE with 4 download streams test:
  - Summary of tcp_4down run 'MQ + FQ_CODEL':
                               avg       median         # data pts
      Ping (ms) ICMP     :     0.22       0.23 ms           350
      TCP download avg   :  2354.08        N/A Mbits/s      350
      TCP download sum   :  9416.31        N/A Mbits/s      350
      TCP download::1    :  2353.65    2352.81 Mbits/s      350
      TCP download::2    :  2354.54    2354.21 Mbits/s      350
      TCP download::3    :  2353.56    2353.78 Mbits/s      350
      TCP download::4    :  2354.56    2354.45 Mbits/s      350
  - Summary of tcp_4down run 'MQ + FQ_PIE':
                               avg       median         # data pts
      Ping (ms) ICMP     :     0.20       0.19 ms           350
      TCP download avg   :  2354.76        N/A Mbits/s      350
      TCP download sum   :  9419.04        N/A Mbits/s      350
      TCP download::1    :  2354.77    2353.89 Mbits/s      350
      TCP download::2    :  2353.41    2354.29 Mbits/s      350
      TCP download::3    :  2356.18    2354.19 Mbits/s      350
      TCP download::4    :  2354.68    2353.15 Mbits/s      350
  - Summary of tcp_4down run 'MQ + DUALPI2':
                               avg       median         # data pts
      Ping (ms) ICMP     :     0.24       0.24 ms           350
      TCP download avg   :  2354.11        N/A Mbits/s      350
      TCP download sum   :  9416.43        N/A Mbits/s      350
      TCP download::1    :  2354.75    2353.93 Mbits/s      350
      TCP download::2    :  2353.15    2353.75 Mbits/s      350
      TCP download::3    :  2353.49    2353.72 Mbits/s      350
      TCP download::4    :  2355.04    2353.73 Mbits/s      350

  Unshaped 10gigE with 128 download streams test:
  - Summary of tcp_128down run 'MQ + FQ_CODEL':
                               avg       median         # data pts
      Ping (ms) ICMP     :     7.57       8.69 ms           350
      TCP download avg   :    73.97        N/A Mbits/s      350
      TCP download sum   :  9467.82        N/A Mbits/s      350
  - Summary of tcp_128down run 'MQ + FQ_PIE':
                               avg       median         # data pts
      Ping (ms) ICMP     :     7.82       8.91 ms           350
      TCP download avg   :    73.97        N/A Mbits/s      350
      TCP download sum   :  9468.42        N/A Mbits/s      350
  - Summary of tcp_128down run 'MQ + DUALPI2':
                               avg       median         # data pts
      Ping (ms) ICMP     :     6.87       7.93 ms           350
      TCP download avg   :    73.95        N/A Mbits/s      350
      TCP download sum   :  9465.87        N/A Mbits/s      350

  From the results shown above, we see small differences between combinations.

- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>)
- Add memlimit in dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>)
- Update license identifier (Dave Taht <dave.taht(a)gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>)
- Add descriptions of packet counter statistics and reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c

v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Fix line length warning

v3 (19-Oct-2024)
- Fix compilation error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)

v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Fix line length warning

For more details of DualPI2, please refer to IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).

Best regards,
Chia-Yu

Chia-Yu Chang (4):
  Documentation: netlink: specs: tc: Add DualPI2 specification
  selftests/tc-testing: Add selftests for qdisc DualPI2
  sched: Struct definition and parsing of dualpi2 qdisc
  sched: Dump configuration and statistics of dualpi2 qdisc

Koen De Schepper (1):
  sched: Add enqueue/dequeue of dualpi2 qdisc

 Documentation/netlink/specs/tc.yaml           |  144 +++
 include/net/dropreason-core.h                 |    6 +
 include/uapi/linux/pkt_sched.h                |   39 +
 net/sched/Kconfig                             |   12 +
 net/sched/Makefile                            |    1 +
 net/sched/sch_dualpi2.c                       | 1091 +++++++++++++++++
 tools/testing/selftests/tc-testing/config     |    1 +
 .../tc-testing/tc-tests/qdiscs/dualpi2.json   |  149 +++
 tools/testing/selftests/tc-testing/tdc.sh     |    1 +
 9 files changed, 1444 insertions(+)
 create mode 100644 net/sched/sch_dualpi2.c
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json

--
2.34.1
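The v9 note about min_qlen_step can be read as a two-part condition, sketched below. This is a hedged illustration rather than code from sch_dualpi2.c: the struct, the field names, the nanosecond sojourn-time unit and the use of `>=` for the threshold comparison are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameter bundle; field names are illustrative only. */
struct step_cfg {
	uint64_t step_thresh_ns; /* assumed: sojourn-time threshold for step marking */
	uint32_t min_qlen_step;  /* minimum L-queue length (packets) before the step applies */
};

/* Return true when a packet in the L-queue should get a step-threshold mark. */
static bool step_should_mark(const struct step_cfg *cfg,
			     uint64_t sojourn_ns, uint32_t l_qlen_pkts)
{
	/* Below the minimum queue length the step threshold is not applied. */
	if (l_qlen_pkts < cfg->min_qlen_step)
		return false;
	return sojourn_ns >= cfg->step_thresh_ns;
}
```

Under these assumptions, a single queued packet that has waited longer than the threshold is left unmarked when min_qlen_step is 2 (the old fixed value) and is marked when it is 0 (the new default), which is consistent with the lower ping times reported for v9 at low rates.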


[PATCH v2] lib: PRIME_NUMBERS_KUNIT_TEST should not select PRIME_NUMBERS

by Geert Uytterhoeven

Enabling a (modular) test should not silently enable additional kernel functionality, as that may increase the attack vector of a product.

Fix this by making PRIME_NUMBERS_KUNIT_TEST depend on PRIME_NUMBERS instead of selecting it.

After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build modules for all appropriate tests for one's system, without pulling in extra unwanted functionality, while still allowing a tester to manually enable PRIME_NUMBERS and this test suite on a system where PRIME_NUMBERS is not enabled by default. Resurrect CONFIG_PRIME_NUMBERS=m in tools/testing/selftests/lib/config for the latter use case.

Fixes: 313b38a6ecb46db4 ("lib/prime_numbers: convert self-test to KUnit")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Acked-by: Tamir Duberstein <tamird(a)gmail.com>
---
v2:
  - Add Acked-by,
  - Resurrect CONFIG_PRIME_NUMBERS=m in tools/testing/selftests/lib/config.
---
 lib/Kconfig.debug                  | 2 +-
 tools/testing/selftests/lib/config | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4060a89866626c0a..51722f5d041970aa 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -3326,7 +3326,7 @@ config GCD_KUNIT_TEST
 config PRIME_NUMBERS_KUNIT_TEST
 	tristate "Prime number generator test" if !KUNIT_ALL_TESTS
 	depends on KUNIT
-	select PRIME_NUMBERS
+	depends on PRIME_NUMBERS
 	default KUNIT_ALL_TESTS
 	help
 	  This option enables the KUnit test suite for the {is,next}_prime_number
diff --git a/tools/testing/selftests/lib/config b/tools/testing/selftests/lib/config
index 81a1f64a22e860a6..377b3699ff312933 100644
--- a/tools/testing/selftests/lib/config
+++ b/tools/testing/selftests/lib/config
@@ -1,2 +1,3 @@
 CONFIG_TEST_BITMAP=m
+CONFIG_PRIME_NUMBERS=m
 CONFIG_TEST_BITOPS=m
--
2.43.0


[PATCH v9 0/6] rust: reduce `as` casts, enable related lints

by Tamir Duberstein

This started with a patch that enabled `clippy::ptr_as_ptr`. Benno Lossin suggested I also look into `clippy::ptr_cast_constness` and I discovered `clippy::as_ptr_cast_mut`. This series now enables all 3 lints. It also enables `clippy::as_underscore` which ensures other pointer casts weren't missed.

As a later addition, `clippy::cast_lossless` and `clippy::ref_as_ptr` are also enabled.

This series depends on "rust: retain pointer mut-ness in `container_of!`"[1].

Link: https://lore.kernel.org/all/20250409-container-of-mutness-v1-1-64f472b94534… [1]
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v9:
- Replace ref-to-ptr coercion using `let` bindings with `core::ptr::from_{ref,mut}`. (Boqun Feng).
- Link to v8: https://lore.kernel.org/r/20250409-ptr-as-ptr-v8-0-3738061534ef@gmail.com

Changes in v8:
- Use coercion to go ref -> ptr.
- rustfmt.
- Rebase on v6.15-rc1.
- Extract first commit to its own series as it is shared with other series.
- Link to v7: https://lore.kernel.org/r/20250325-ptr-as-ptr-v7-0-87ab452147b9@gmail.com

Changes in v7:
- Add patch to enable `clippy::ref_as_ptr`.
- Link to v6: https://lore.kernel.org/r/20250324-ptr-as-ptr-v6-0-49d1b7fd4290@gmail.com

Changes in v6:
- Drop strict provenance patch.
- Fix URLs in doc comments.
- Add patch to enable `clippy::cast_lossless`.
- Rebase on rust-next.
- Link to v5: https://lore.kernel.org/r/20250317-ptr-as-ptr-v5-0-5b5f21fa230a@gmail.com

Changes in v5:
- Use `pointer::addr` in OF. (Boqun Feng)
- Add documentation on stubs. (Benno Lossin)
- Mark stubs `#[inline]`.
- Pick up Alice's RB on a shared commit from https://lore.kernel.org/all/Z9f-3Aj3_FWBZRrm@google.com/.
- Link to v4: https://lore.kernel.org/r/20250315-ptr-as-ptr-v4-0-b2d72c14dc26@gmail.com

Changes in v4:
- Add missing SoB. (Benno Lossin)
- Use `without_provenance_mut` in alloc. (Boqun Feng)
- Limit strict provenance lints to the `kernel` crate to avoid complex logic in the build system. This can be revisited on MSRV >= 1.84.0.
- Rebase on rust-next.
- Link to v3: https://lore.kernel.org/r/20250314-ptr-as-ptr-v3-0-e7ba61048f4a@gmail.com

Changes in v3:
- Fixed clippy warning in rust/kernel/firmware.rs. (kernel test robot)
  Link: https://lore.kernel.org/all/202503120332.YTCpFEvv-lkp@intel.com/
- s/as u64/as bindings::phys_addr_t/g. (Benno Lossin)
- Use strict provenance APIs and enable lints. (Benno Lossin)
- Link to v2: https://lore.kernel.org/r/20250309-ptr-as-ptr-v2-0-25d60ad922b7@gmail.com

Changes in v2:
- Fixed typo in first commit message.
- Added additional patches, converted to series.
- Link to v1: https://lore.kernel.org/r/20250307-ptr-as-ptr-v1-1-582d06514c98@gmail.com

---
Tamir Duberstein (6):
  rust: enable `clippy::ptr_as_ptr` lint
  rust: enable `clippy::ptr_cast_constness` lint
  rust: enable `clippy::as_ptr_cast_mut` lint
  rust: enable `clippy::as_underscore` lint
  rust: enable `clippy::cast_lossless` lint
  rust: enable `clippy::ref_as_ptr` lint

 Makefile                               |  6 ++++++
 drivers/gpu/drm/drm_panic_qr.rs        |  2 +-
 rust/bindings/lib.rs                   |  3 +++
 rust/kernel/alloc/allocator_test.rs    |  2 +-
 rust/kernel/alloc/kvec.rs              |  4 ++--
 rust/kernel/block/mq/operations.rs     |  2 +-
 rust/kernel/block/mq/request.rs        |  6 +++---
 rust/kernel/device.rs                  |  4 ++--
 rust/kernel/device_id.rs               |  4 ++--
 rust/kernel/devres.rs                  | 19 ++++++++++---------
 rust/kernel/dma.rs                     |  6 +++---
 rust/kernel/error.rs                   |  2 +-
 rust/kernel/firmware.rs                |  3 ++-
 rust/kernel/fs/file.rs                 |  2 +-
 rust/kernel/io.rs                      | 18 +++++++++---------
 rust/kernel/kunit.rs                   | 11 +++++++----
 rust/kernel/list/impl_list_item_mod.rs |  2 +-
 rust/kernel/miscdevice.rs              |  2 +-
 rust/kernel/net/phy.rs                 |  4 ++--
 rust/kernel/of.rs                      |  6 +++---
 rust/kernel/pci.rs                     | 11 +++++++----
 rust/kernel/platform.rs                |  4 +++-
 rust/kernel/print.rs                   |  6 +++---
 rust/kernel/seq_file.rs                |  2 +-
 rust/kernel/str.rs                     | 14 +++++++-------
 rust/kernel/sync/poll.rs               |  2 +-
 rust/kernel/time/hrtimer/pin.rs        |  2 +-
 rust/kernel/time/hrtimer/pin_mut.rs    |  2 +-
 rust/kernel/uaccess.rs                 |  4 ++--
 rust/kernel/workqueue.rs               | 12 ++++++------
 rust/uapi/lib.rs                       |  3 +++
 31 files changed, 96 insertions(+), 74 deletions(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250307-ptr-as-ptr-21b1867fc4d4
prerequisite-change-id: 20250409-container-of-mutness-b153dab4388d:v1
prerequisite-patch-id: 53d5889db599267f87642bb0ae3063c29bc24863

Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>


[PATCH v7 0/2] memcg: Fix test_memcg_min/low test failures

by Waiman Long

v7:
- Skip the vmscan change with the mem_cgroup_usage() check for now, as it is currently redundant.

v6:
- The memcg_test_low failure is indeed due to the memory_recursiveprot mount option which is enabled by default in systemd cgroup v2 setting. So adopt Michal's suggestion to adjust the low event checking according to whether memory_recursiveprot is enabled or not.

v5:
- Use mem_cgroup_usage() in patch 1 as originally suggested by Johannes.

The test_memcontrol selftest consistently fails its test_memcg_low sub-test (with memory_recursiveprot enabled) and sporadically fails its test_memcg_min sub-test. This patchset fixes the test_memcg_min and test_memcg_low failures by adjusting the test_memcontrol selftest.

Waiman Long (2):
  selftests: memcg: Allow low event with no memory.low and memory_recursiveprot on
  selftests: memcg: Increase error tolerance of child memory.current check in test_memcg_protection()

 .../selftests/cgroup/test_memcontrol.c | 20 ++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--
2.49.0
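For readers unfamiliar with what "error tolerance" means in the second patch title, the sketch below shows one way a percentage-based closeness check can work. It is modeled on the values_close() helper used by the cgroup selftests, but the exact formula and the name `within_tolerance()` should be treated as assumptions, and the numbers in the note that follows are illustrative rather than the values used by test_memcg_protection().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * True when a and b differ by no more than roughly err percent of their sum.
 * Integer division truncates (a + b) / 100 before the multiplication.
 */
static bool within_tolerance(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}
```

With this formula, 100 and 105 are considered close at a 10 percent tolerance, 100 and 200 are not, and raising the tolerance to 40 percent makes the latter pair pass. Increasing `err` therefore trades precision for fewer sporadic failures.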


[PATCH 0/4] mm: permit guard regions for file-backed/shmem mappings

by Lorenzo Stoakes

The guard regions feature was initially implemented to support anonymous mappings only, excluding shmem.

This was done so as to introduce the feature carefully and incrementally and to be conservative when considering the various caveats and corner cases that are applicable to file-backed mappings but not to anonymous ones.

Now that this feature has landed in 6.13, it is time to revisit this and to extend this functionality to file-backed and shmem mappings.

In order to make this maximally useful, and since one may map file-backed mappings read-only (for instance ELF images), we also remove the restriction on read-only mappings and permit the establishment of guard regions in any non-hugetlb, non-mlock()'d mapping.

It is permissible to allow the establishment of guard regions in read-only mappings because the guard regions only reduce access to the mapping, and when removed simply reinstate the existing attributes of the underlying VMA, meaning no access violations can occur.

While the change in kernel code introduced in this series is small, the majority of the effort here is spent in extending the testing to assert that the feature works correctly across numerous file-backed mapping scenarios.

Every single guard region self-test performed against anonymous memory (which is relevant and not anon-only) has now been updated to also be performed against shmem and a mapping of a file in the working directory.

This confirms that all cases also function correctly for file-backed guard regions.

In addition a number of other tests are added for specific file-backed mapping scenarios.

There are a number of other concerns that one might have with regard to guard regions, addressed below:

Readahead
~~~~~~~~~

Readahead is a process through which the page cache is populated on the assumption that sequential reads will occur, thus amortising I/O and, through a clever use of the PG_readahead folio flag established during major fault and checked upon minor fault, provides for asynchronous I/O to occur as data is processed, reducing I/O stalls as data is faulted in.

Guard regions do not alter this mechanism, which operates at the folio and fault level, but do of course prevent the faulting of folios that would otherwise be mapped.

In the instance of a major fault prior to a guard region, synchronous readahead will occur including populating folios in the page cache which the guard regions will, in the case of the mapping in question, prevent access to.

In addition, if PG_readahead is placed in a folio that is now inaccessible, this will prevent asynchronous readahead from occurring as it would otherwise do.

However, there are mechanisms for heuristically resetting this within readahead regardless, which will 'recover' correct readahead behaviour.

Readahead presumes sequential data access; the presence of a guard region clearly indicates that, at least in the guard region, no such sequential access will occur, as it cannot occur there.

So this should have very little impact on any real workload. The far more important point is as to whether readahead causes incorrect or inappropriate mapping of ranges disallowed by the presence of guard regions - this is not the case, as readahead does not 'pre-fault' memory in this fashion.

At any rate, any mechanism which would attempt to do so would hit the usual page fault paths, which correctly handle PTE markers as with anonymous mappings.

Fault-Around
~~~~~~~~~~~~

The fault-around logic, in a similar vein to readahead, attempts to improve efficiency with regard to file-backed memory mappings, however it differs in that it does not try to fetch folios into the page cache that are about to be accessed, but rather pre-maps a range of folios around the faulting address.

Guard regions making use of PTE markers makes this relatively trivial, as this case is already handled - see filemap_map_folio_range() and filemap_map_order0_folio() - in both instances, the solution is to simply keep the established page table mappings and let the fault handler take care of PTE markers, as per the comment:

	/*
	 * NOTE: If there're PTE markers, we'll leave them to be
	 * handled in the specific fault path, and it'll prohibit
	 * the fault-around logic.
	 */

This works, as establishing guard regions results in page table mappings with PTE markers, and clearing them removes them.

Truncation
~~~~~~~~~~

File truncation will not eliminate existing guard regions, as the truncation operation will ultimately zap the range via unmap_mapping_range(), which specifically excludes PTE markers.

Zapping
~~~~~~~

Zapping is, as with anonymous mappings, handled by zap_nonpresent_ptes(), which specifically deals with guard entries, leaving them intact except in instances such as process teardown or munmap() where they need to be removed.

Reclaim
~~~~~~~

When reclaim is performed on file-backed folios, it ultimately invokes try_to_unmap_one() via the rmap. If the folio is non-large, then map_pte() will ultimately abort the operation for the guard region mapping. If large, then check_pte() will determine that this is a non-device private entry/device-exclusive entry 'swap' PTE and thus abort the operation in that instance.

Therefore, no odd things happen in the instance of reclaim being attempted upon a file-backed guard region.

Hole Punching
~~~~~~~~~~~~~

This updates the page cache and ultimately invokes unmap_mapping_range(), which explicitly leaves PTE markers in place.

Because the establishment of guard regions zapped any existing mappings to file-backed folios, once the guard regions are removed then the hole-punched region will be faulted in as usual and everything will behave as expected.

Lorenzo Stoakes (4):
  mm: allow guard regions in file-backed and read-only mappings
  selftests/mm: rename guard-pages to guard-regions
  tools/selftests: expand all guard region tests to file-backed
  tools/selftests: add file/shmem-backed mapping guard region tests

 mm/madvise.c                                  |   8 +-
 tools/testing/selftests/mm/.gitignore         |   2 +-
 tools/testing/selftests/mm/Makefile           |   2 +-
 .../mm/{guard-pages.c => guard-regions.c}     | 921 ++++++++++++++++--
 4 files changed, 821 insertions(+), 112 deletions(-)
 rename tools/testing/selftests/mm/{guard-pages.c => guard-regions.c} (58%)

--
2.48.1
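For orientation, a minimal user-space sketch of installing and removing a guard region follows. It is not taken from the series or its selftests: the helper names `guard_one_page()` and `unguard_one_page()` are hypothetical, and the fallback definitions assume the MADV_GUARD_INSTALL and MADV_GUARD_REMOVE values from the Linux 6.13 uapi headers. On kernels without guard region support, or for mapping types a given kernel does not accept, madvise() is expected to fail and the helpers return a negative errno.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Values from the Linux 6.13 uapi headers; older libc headers may lack them. */
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif
#ifndef MADV_GUARD_REMOVE
#define MADV_GUARD_REMOVE 103
#endif

/* Apply one guard-related madvise() call to a single page of a mapping. */
static int guard_advise_page(void *map_base, size_t page, int advice)
{
	size_t psz = (size_t)sysconf(_SC_PAGESIZE);
	char *addr = (char *)map_base + page * psz;

	if (madvise(addr, psz, advice) != 0)
		return -errno;
	return 0;
}

/* Install a one-page guard region; returns 0 on success or -errno. */
static int guard_one_page(void *map_base, size_t page)
{
	return guard_advise_page(map_base, page, MADV_GUARD_INSTALL);
}

/* Remove a one-page guard region; returns 0 on success or -errno. */
static int unguard_one_page(void *map_base, size_t page)
{
	return guard_advise_page(map_base, page, MADV_GUARD_REMOVE);
}
```

With the series applied, `map_base` could be a read-only, file-backed mapping (for example an `mmap()` of a file with PROT_READ and MAP_PRIVATE). Touching a guarded page is then expected to raise a fatal signal, and removing the guard restores the mapping's original attributes.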

8 months, 3 weeks


Re: [PATCH bpf-next v2 05/11] bpf, arm64, powerpc: Add bpf_jit_bypass_spec_v1/v4()

by Luis Gerhorst

kernel test robot <lkp(a)intel.com> writes:

> All warnings (new ones prefixed by >>):
>
>>> kernel/bpf/core.c:3037:13: warning: no previous prototype for 'bpf_jit_bypass_spec_v1' [-Wmissing-prototypes]
>     3037 | bool __weak bpf_jit_bypass_spec_v1(void)
>          |             ^~~~~~~~~~~~~~~~~~~~~~
>>> kernel/bpf/core.c:3042:13: warning: no previous prototype for 'bpf_jit_bypass_spec_v4' [-Wmissing-prototypes]
>     3042 | bool __weak bpf_jit_bypass_spec_v4(void)
>          |             ^~~~~~~~~~~~~~~~~~~~~~

That's because the prototypes in include/linux/bpf.h were inside the #ifdef CONFIG_BPF_SYSCALL block. I fixed this for v3 by moving the prototypes out of the ifdef.

8 months, 3 weeks


[PATCH bpf-next v2 0/9] selftests/bpf: Test sockmap/sockhash redirection

by Michal Luczaj

The idea behind this series is to comprehensively test the BPF redirection:

BPF_MAP_TYPE_SOCKMAP,
BPF_MAP_TYPE_SOCKHASH
	x
sk_msg-to-egress,
sk_msg-to-ingress,
sk_skb-to-egress,
sk_skb-to-ingress
	x
AF_INET, SOCK_STREAM,
AF_INET6, SOCK_STREAM,
AF_INET, SOCK_DGRAM,
AF_INET6, SOCK_DGRAM,
AF_UNIX, SOCK_STREAM,
AF_UNIX, SOCK_DGRAM,
AF_VSOCK, SOCK_STREAM,
AF_VSOCK, SOCK_SEQPACKET

New module is introduced, sockmap_redir: all supported and unsupported redirect combinations are tested for success and failure respectively. Code is pretty much stolen/adapted from Jakub Sitnicki's sockmap_redir_matrix.c [1].

Usage:
$ cd tools/testing/selftests/bpf
$ make
$ sudo ./test_progs -t sockmap_redir
...
Summary: 1/576 PASSED, 0 SKIPPED, 0 FAILED

[1]: https://github.com/jsitnicki/sockmap-redir-matrix/blob/main/sockmap_redir_m…

Changes in v2:
- Verify that the unsupported redirect combos do fail [Jakub]
- Dedup tests in sockmap_listen
- Cosmetic changes and code reordering
- Link to v1: https://lore.kernel.org/bpf/42939687-20f9-4a45-b7c2-342a0e11a014@rbox.co/

Suggested-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (9):
      selftests/bpf: Support af_unix SOCK_DGRAM socket pair creation
      selftests/bpf: Add socket_kind_to_str() to socket_helpers
      selftests/bpf: Add u32()/u64() to sockmap_helpers
      selftests/bpf: Allow setting BPF_F_INGRESS in prog_msg_verdict()
      selftests/bpf: Add selftest for sockmap/hashmap redirection
      selftests/bpf: sockmap_listen cleanup: Drop af_vsock redir tests
      selftests/bpf: sockmap_listen cleanup: Drop af_unix redir tests
      selftests/bpf: sockmap_listen cleanup: Drop af_inet SOCK_DGRAM redir tests
      docs/bpf: sockmap: Add a missing comma

 Documentation/bpf/map_sockmap.rst                  |   2 +-
 .../selftests/bpf/prog_tests/socket_helpers.h      |  84 +++-
 .../selftests/bpf/prog_tests/sockmap_helpers.h     |  25 +-
 .../selftests/bpf/prog_tests/sockmap_listen.c      | 459 +-------------------
 .../selftests/bpf/prog_tests/sockmap_redir.c       | 461 +++++++++++++++++++++
 .../selftests/bpf/progs/test_sockmap_listen.c      |   6 +-
 6 files changed, 558 insertions(+), 479 deletions(-)
---
base-commit: a27a97f713947b20ba91b23a3ef77fa92d74171b
change-id: 20240922-selftests-sockmap-redir-5d839396c75e

Best regards,
-- 
Michal Luczaj <mhal(a)rbox.co>

8 months, 3 weeks


[PATCH v9 0/5] KVM: selftests: Add LoongArch support

by Bibo Mao

---
Changes in v9:
1. Add vm mode VM_MODE_P47V47_16K, LoongArch VM uses this mode by default, rather than VM_MODE_P36V47_16K.
2. Refresh some spelling issues in changelog.

Changes in v8:
1. Porting patch based on the latest version.
2. For macro PC_OFFSET_EXREGS, offsetof() method is used for C header file, still hardcoded definition for assembly language.

Changes in v7:
1. Refine code to add LoongArch support in test case set_memory_region_test.

Changes in v6:
1. Refresh the patch based on latest kernel 6.8-rc1, add LoongArch support about testcase set_memory_region_test.
2. Add hardware_disable_test test case.
3. Drop modification about macro DEFAULT_GUEST_TEST_MEM, it is a problem of LoongArch binutils, this issue is raised to LoongArch binutils owners.

Changes in v5:
1. In LoongArch kvm self tests, the DEFAULT_GUEST_TEST_MEM could be 0x130000000, it is different from the default value in memstress.h. So we move the definition of DEFAULT_GUEST_TEST_MEM into LoongArch ucall.h, and add 'ifndef' condition for DEFAULT_GUEST_TEST_MEM in memstress.h.

Changes in v4:
1. Remove the based-on flag, as the LoongArch KVM patch series have been accepted by Linux kernel, so this can be applied directly in kernel.

Changes in v3:
1. Improve implementation of LoongArch VM page walk.
2. Add exception handler for LoongArch.
3. Add dirty_log_test, dirty_log_perf_test, guest_print_test test cases for LoongArch.
4. Add __ASSEMBLER__ macro to distinguish asm file and c file.
5. Move ucall_arch_do_ucall to the header file and make it as static inline to avoid function calls.
6. Change the DEFAULT_GUEST_TEST_MEM base addr for LoongArch.

Changes in v2:
1. We should use ".balign 4096" to align the assembly code with 4K in exception.S instead of "align 12".
2. LoongArch only supports 3 or 4 levels page tables, so we remove the handlers for 2-levels page table.
3. Remove the DEFAULT_LOONGARCH_GUEST_STACK_VADDR_MIN and use the common DEFAULT_GUEST_STACK_VADDR_MIN to allocate stack memory in guest.
4. Reorganize the test cases supported by LoongArch.
5. Fix some code comments.
6. Add kvm_binary_stats_test test case into LoongArch KVM selftests.
---
Bibo Mao (5):
  KVM: selftests: Add VM_MODE_P47V47_16K vm mode
  KVM: selftests: Add KVM selftests header files for LoongArch
  KVM: selftests: Add core KVM selftests support for LoongArch
  KVM: selftests: Add ucall test support for LoongArch
  KVM: selftests: Add test cases for LoongArch

 tools/testing/selftests/kvm/Makefile          |   2 +-
 tools/testing/selftests/kvm/Makefile.kvm      |  18 +
 .../testing/selftests/kvm/include/kvm_util.h  |   6 +
 .../kvm/include/loongarch/kvm_util_arch.h     |   7 +
 .../kvm/include/loongarch/processor.h         | 138 +++++++
 .../selftests/kvm/include/loongarch/ucall.h   |  20 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   3 +
 .../selftests/kvm/lib/loongarch/exception.S   |  59 +++
 .../selftests/kvm/lib/loongarch/processor.c   | 347 ++++++++++++++++++
 .../selftests/kvm/lib/loongarch/ucall.c       |  38 ++
 .../selftests/kvm/set_memory_region_test.c    |   2 +-
 11 files changed, 638 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/kvm_util_arch.h
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/processor.h
 create mode 100644 tools/testing/selftests/kvm/include/loongarch/ucall.h
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/exception.S
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/loongarch/ucall.c

base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
-- 
2.39.3

8 months, 3 weeks


[PATCH 12/14] torture: Add testing of RCU's Rust bindings to torture.sh

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org>

This commit adds a --do-rcu-rust parameter to torture.sh, which invokes a rust_doctests_kernel kunit run. Note that kunit wants a clean source tree, so this runs "make mrproper", which might come as a surprise to some users. Should there be a --mrproper parameter to torture.sh to make the user explicitly ask for it?

Co-developed-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 .../selftests/rcutorture/bin/torture.sh       | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh
index 475f758f6216..e03fdaca89b3 100755
--- a/tools/testing/selftests/rcutorture/bin/torture.sh
+++ b/tools/testing/selftests/rcutorture/bin/torture.sh
@@ -59,6 +59,7 @@ do_clocksourcewd=yes
 do_rt=yes
 do_rcutasksflavors=yes
 do_srcu_lockdep=yes
+do_rcu_rust=no
 
 # doyesno - Helper function for yes/no arguments
 function doyesno () {
@@ -89,6 +90,7 @@ usage () {
 	echo " --do-rcutorture / --do-no-rcutorture / --no-rcutorture"
 	echo " --do-refscale / --do-no-refscale / --no-refscale"
 	echo " --do-rt / --do-no-rt / --no-rt"
+	echo " --do-rcu-rust / --do-no-rcu-rust / --no-rcu-rust"
 	echo " --do-scftorture / --do-no-scftorture / --no-scftorture"
 	echo " --do-srcu-lockdep / --do-no-srcu-lockdep / --no-srcu-lockdep"
 	echo " --duration [ <minutes> | <hours>h | <days>d ]"
@@ -191,6 +193,9 @@ do
 	--do-rt|--do-no-rt|--no-rt)
 		do_rt=`doyesno "$1" --do-rt`
 		;;
+	--do-rcu-rust|--do-no-rcu-rust|--no-rcu-rust)
+		do_rcu_rust=`doyesno "$1" --do-rcu-rust`
+		;;
 	--do-scftorture|--do-no-scftorture|--no-scftorture)
 		do_scftorture=`doyesno "$1" --do-scftorture`
 		;;
@@ -485,6 +490,46 @@ then
 	torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_FULL=y CONFIG_RCU_NOCB_CPU=y" --trust-make
 fi
 
+if test "$do_rcu_rust" = "yes"
+then
+	echo " --- do-rcu-rust:" Start `date` | tee -a $T/log
+	rrdir="tools/testing/selftests/rcutorture/res/$ds/results-rcu-rust"
+	mkdir -p "$rrdir"
+	echo " --- make LLVM=1 rustavailable " | tee -a $rrdir/log > $rrdir/rustavailable.out
+	make LLVM=1 rustavailable > $T/rustavailable.out 2>&1
+	retcode=$?
+	echo $retcode > $rrdir/rustavailable.exitcode
+	cat $T/rustavailable.out | tee -a $rrdir/log >> $rrdir/rustavailable.out 2>&1
+	buildphase=rustavailable
+	if test "$retcode" -eq 0
+	then
+		echo " --- Running 'make mrproper' in order to run kunit." | tee -a $rrdir/log > $rrdir/mrproper.out
+		make mrproper > $rrdir/mrproper.out 2>&1
+		retcode=$?
+		echo $retcode > $rrdir/mrproper.exitcode
+		buildphase=mrproper
+	fi
+	if test "$retcode" -eq 0
+	then
+		echo " --- Running rust_doctests_kernel." | tee -a $rrdir/log > $rrdir/rust_doctests_kernel.out
+		./tools/testing/kunit/kunit.py run --make_options LLVM=1 --make_options CLIPPY=1 --arch arm64 --kconfig_add CONFIG_SMP=y --kconfig_add CONFIG_WERROR=y --kconfig_add CONFIG_RUST=y rust_doctests_kernel >> $rrdir/rust_doctests_kernel.out 2>&1
+		# @@@ Remove "--arch arm64" in order to test on native architecture?
+		# @@@ Analyze $rrdir/rust_doctests_kernel.out contents?
+		retcode=$?
+		echo $retcode > $rrdir/rust_doctests_kernel.exitcode
+		buildphase=rust_doctests_kernel
+	fi
+	if test "$retcode" -eq 0
+	then
+		echo "rcu-rust($retcode)" $rrdir >> $T/successes
+		echo Success >> $rrdir/log
+	else
+		echo "rcu-rust($retcode)" $rrdir >> $T/failures
+		echo " --- rcu-rust Test summary:" >> $rrdir/log
+		echo " --- Summary: Exit code $retcode from $buildphase, see $rrdir/$buildphase.out" >> $rrdir/log
+	fi
+fi
+
 if test "$do_srcu_lockdep" = "yes"
 then
 	echo " --- do-srcu-lockdep:" Start `date` | tee -a $T/log
-- 
2.43.0

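The staged flow in the last hunk of this patch (each phase runs only if the previous one exited with 0, and the name of the last phase attempted is kept for the summary line) can be tried in isolation. The following is a minimal sketch under stated assumptions: `run_phases` and its "name:command" argument convention are hypothetical and not part of torture.sh, and the commands are harmless stand-ins for the real make and kunit invocations.

```shell
#!/bin/sh
# Hypothetical helper, not part of torture.sh: run "name:command" phases in
# order, stop at the first failure, and report which phase ended the run.
run_phases () {
	retcode=0
	buildphase=none
	for spec in "$@"
	do
		if test "$retcode" -ne 0
		then
			break
		fi
		buildphase=${spec%%:*}	# text before the first colon
		cmd=${spec#*:}		# text after the first colon
		sh -c "$cmd" > /dev/null 2>&1
		retcode=$?
	done
	echo "$buildphase($retcode)"
}

# All stand-in phases succeed, so the last phase is the one reported.
run_phases "rustavailable:true" "mrproper:true" "rust_doctests_kernel:true"
# The second phase fails, so the third is never attempted.
run_phases "rustavailable:true" "mrproper:exit 3" "rust_doctests_kernel:true"
```

As in the patch, the reported phase name tells the reader which output file to inspect after a failure.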
8 months, 3 weeks


[PATCH v2 0/7] tools/nolibc: fix some undefined behaviour and enable UBSAN

by Thomas Weißschuh

Fix some issues uncovered by UBSAN and enable UBSAN for nolibc-test to avoid regressions.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Introduce and use __nolibc_aligned_as()
- Reduce size of fixes to i{64,}toa_r()
- Link to v1: https://lore.kernel.org/r/20250416-nolibc-ubsan-v1-0-c4704bb23da7@weissschu…

---
Thomas Weißschuh (7):
      tools/nolibc: add __nolibc_has_feature()
      tools/nolibc: add __nolibc_aligned() and __nolibc_aligned_as()
      tools/nolibc: disable function sanitizer for _start_c()
      tools/nolibc: properly align dirent buffer
      tools/nolibc: fix integer overflow in i{64,}toa_r() and
      selftests/nolibc: disable ubsan for smash_stack()
      selftests/nolibc: enable UBSAN if available

 tools/include/nolibc/compiler.h              | 9 +++++++++
 tools/include/nolibc/crt.h                   | 5 +++++
 tools/include/nolibc/dirent.h                | 3 ++-
 tools/include/nolibc/stdlib.h                | 4 ++--
 tools/testing/selftests/nolibc/Makefile      | 3 ++-
 tools/testing/selftests/nolibc/nolibc-test.c | 1 +
 6 files changed, 21 insertions(+), 4 deletions(-)
---
base-commit: 7c73c10b906778384843b9d3ac6c2224727bbf5c
change-id: 20250416-nolibc-ubsan-028401698654

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>

8 months, 3 weeks


[PATCH 0/6] tools/nolibc: fix some undefined behaviour and enable UBSAN

by Thomas Weißschuh

Fix some issues uncovered by UBSAN and enable UBSAN for nolibc-test to avoid regressions.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
      tools/nolibc: add __nolibc_has_feature()
      tools/nolibc: disable function sanitizer for _start_c()
      tools/nolibc: properly align dirent buffer
      tools/nolibc: fix integer overflow in i{64,}toa_r() and
      selftests/nolibc: disable ubsan for smash_stack()
      selftests/nolibc: enable UBSAN if available

 tools/include/nolibc/compiler.h              |  6 ++++++
 tools/include/nolibc/crt.h                   |  5 +++++
 tools/include/nolibc/dirent.h                |  1 +
 tools/include/nolibc/stdlib.h                | 24 ++++++++----------------
 tools/testing/selftests/nolibc/Makefile      |  3 ++-
 tools/testing/selftests/nolibc/nolibc-test.c |  1 +
 6 files changed, 23 insertions(+), 17 deletions(-)
---
base-commit: 7c73c10b906778384843b9d3ac6c2224727bbf5c
change-id: 20250416-nolibc-ubsan-028401698654

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>

8 months, 3 weeks


[PATCH v2 1/2] time/timekeeping: Fix possible inconsistencies in _COARSE clockids

by John Stultz

Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment

	tk->tkr_mono.xtime_nsec -= offset;

which is made to compensate for the unaccumulated cycles in offset when the mult value is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies.

However, the _COARSE clockids don't use the mult*offset value in their calculations, so this subtraction can cause the _COARSE clock ids to jump back a bit.

Now, by design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after we accumulate approx mult*interval_cycles into xtime_nsec. The accumulated (mult*interval_cycles) will be larger than the (mult_adj*offset) value subtracted from xtime_nsec, and both operations are done together under the tk_core.lock, so the net change to xtime_nsec should always be positive.

However, do_adjtimex() calls into timekeeping_advance() as well, since we want to apply the ntp freq adjustment immediately. In this case, we don't return early when the offset is smaller than interval_cycles, so we don't end up accumulating any time into xtime_nsec. But we do go on to call timekeeping_adjust(), which modifies the mult value, and subtracts from xtime_nsec to correct for the new mult value.

Here because we did not accumulate anything, we have a window where the _COARSE clockids that don't utilize the mult*offset value, can see an inconsistency.

So to fix this, rework the timekeeping_advance() logic a bit so that when we are called from do_adjtimex(), we call timekeeping_forward(), to first accumulate the sub-interval time into xtime_nsec. Then with no unaccumulated cycles in offset, we can do the mult adjustment without worry of the subtraction having an impact.

Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: Anna-Maria Behnsen <anna-maria(a)linutronix.de>
Cc: Frederic Weisbecker <frederic(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Miroslav Lichvar <mlichvar(a)redhat.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kernel-team(a)android.com
Cc: Lei Chen <lei.chen(a)smartx.com>
Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE")
Reported-by: Lei Chen <lei.chen(a)smartx.com>
Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
Diagnosed-by: Thomas Gleixner <tglx(a)linutronix.de>
Additional-fixes-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: John Stultz <jstultz(a)google.com>
---
v2: Include fixes from Thomas, dropping the unnecessary clock_set
    setting, and instead clearing ntp_error, along with some other
    minor tweaks.
---
 kernel/time/timekeeping.c | 94 ++++++++++++++++++++++++++++-----------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1e67d076f1955..929846b8b45ab 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -682,20 +682,19 @@ static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int act
 }
 
 /**
- * timekeeping_forward_now - update clock to the current time
+ * timekeeping_forward - update clock to given cycle now value
  * @tk:		Pointer to the timekeeper to update
+ * @cycle_now:  Current clocksource read value
  *
  * Forward the current clock to update its state since the last call to
  * update_wall_time(). This is useful before significant clock changes,
  * as it avoids having to deal with this time offset explicitly.
  */
-static void timekeeping_forward_now(struct timekeeper *tk)
+static void timekeeping_forward(struct timekeeper *tk, u64 cycle_now)
 {
-	u64 cycle_now, delta;
+	u64 delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
+				      tk->tkr_mono.clock->max_raw_delta);
 
-	cycle_now = tk_clock_read(&tk->tkr_mono);
-	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask,
-				  tk->tkr_mono.clock->max_raw_delta);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last = cycle_now;
 
@@ -710,6 +709,21 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	}
 }
 
+/**
+ * timekeeping_forward_now - update clock to the current time
+ * @tk:		Pointer to the timekeeper to update
+ *
+ * Forward the current clock to update its state since the last call to
+ * update_wall_time(). This is useful before significant clock changes,
+ * as it avoids having to deal with this time offset explicitly.
+ */
+static void timekeeping_forward_now(struct timekeeper *tk)
+{
+	u64 cycle_now = tk_clock_read(&tk->tkr_mono);
+
+	timekeeping_forward(tk, cycle_now);
+}
+
 /**
  * ktime_get_real_ts64 - Returns the time of day in a timespec64.
  * @ts:		pointer to the timespec to be set
@@ -2151,6 +2165,54 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 	return offset;
 }
 
+static u64 timekeeping_accumulate(struct timekeeper *tk, u64 offset,
+				  enum timekeeping_adv_mode mode,
+				  unsigned int *clock_set)
+{
+	int shift = 0, maxshift;
+
+	/*
+	 * TK_ADV_FREQ indicates that adjtimex(2) directly set the
+	 * frequency or the tick length.
+	 *
+	 * Accumulate the offset, so that the new multiplier starts from
+	 * now. This is required as otherwise for offsets, which are
+	 * smaller than tk::cycle_interval, timekeeping_adjust() could set
+	 * xtime_nsec backwards, which subsequently causes time going
+	 * backwards in the coarse time getters. But even for the case
+	 * where offset is greater than tk::cycle_interval the periodic
+	 * accumulation does not have much value.
+	 *
+	 * Also reset tk::ntp_error as it does not make sense to keep the
+	 * old accumulated error around in this case.
+	 */
+	if (mode == TK_ADV_FREQ) {
+		timekeeping_forward(tk, tk->tkr_mono.cycle_last + offset);
+		tk->ntp_error = 0;
+		return 0;
+	}
+
+	/*
+	 * With NO_HZ we may have to accumulate many cycle_intervals
+	 * (think "ticks") worth of time at once. To do this efficiently,
+	 * we calculate the largest doubling multiple of cycle_intervals
+	 * that is smaller than the offset. We then accumulate that
+	 * chunk in one go, and then try to consume the next smaller
+	 * doubled multiple.
+	 */
+	shift = ilog2(offset) - ilog2(tk->cycle_interval);
+	shift = max(0, shift);
+	/* Bound shift to one less than what overflows tick_length */
+	maxshift = (64 - (ilog2(ntp_tick_length()) + 1)) - 1;
+	shift = min(shift, maxshift);
+	while (offset >= tk->cycle_interval) {
+		offset = logarithmic_accumulation(tk, offset, shift, clock_set);
+		if (offset < tk->cycle_interval << shift)
+			shift--;
+	}
+	return offset;
+}
+
 /*
  * timekeeping_advance - Updates the timekeeper to the current time and
  * current NTP tick length
@@ -2160,7 +2222,6 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 	struct timekeeper *tk = &tk_core.shadow_timekeeper;
 	struct timekeeper *real_tk = &tk_core.timekeeper;
 	unsigned int clock_set = 0;
-	int shift = 0, maxshift;
 	u64 offset;
 
 	guard(raw_spinlock_irqsave)(&tk_core.lock);
@@ -2177,24 +2238,7 @@ static bool timekeeping_advance(enum timekeeping_adv_mode mode)
 	if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK)
 		return false;
 
-	/*
-	 * With NO_HZ we may have to accumulate many cycle_intervals
-	 * (think "ticks") worth of time at once. To do this efficiently,
-	 * we calculate the largest doubling multiple of cycle_intervals
-	 * that is smaller than the offset. We then accumulate that
-	 * chunk in one go, and then try to consume the next smaller
-	 * doubled multiple.
-	 */
-	shift = ilog2(offset) - ilog2(tk->cycle_interval);
-	shift = max(0, shift);
-	/* Bound shift to one less than what overflows tick_length */
-	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
-	shift = min(shift, maxshift);
-	while (offset >= tk->cycle_interval) {
-		offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
-		if (offset < tk->cycle_interval<<shift)
-			shift--;
-	}
+	offset = timekeeping_accumulate(tk, offset, mode, &clock_set);
 
 	/* Adjust the multiplier to correct NTP error */
 	timekeeping_adjust(tk, offset);
-- 
2.49.0.395.g12beb8f557-goog
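The arithmetic behind the inconsistency described in this commit message can be followed with small integers. What follows is a toy model only, not kernel code: it ignores the clocksource shift, uses invented values, and the names `fine` and `coarse` are hypothetical labels for the non-_COARSE and _COARSE views of the clock.

```shell
#!/bin/sh
# Toy model of the commit message above, not kernel code. The clock shift
# is ignored and every value is invented for illustration.
xtime_nsec=1000000	# time accumulated so far
mult=100		# current multiplier
offset=50		# cycles not yet accumulated
mult_adj=1		# forward adjustment applied to mult

fine ()   { echo $((xtime_nsec + mult * offset)); }	# non-_COARSE view
coarse () { echo "$xtime_nsec"; }			# _COARSE view

fine_before=$(fine)
coarse_before=$(coarse)

# Old adjtimex path: mult is adjusted while cycles are still pending.
mult=$((mult + mult_adj))
xtime_nsec=$((xtime_nsec - mult_adj * offset))
old_coarse=$(coarse)
echo "old: fine $fine_before -> $(fine), coarse $coarse_before -> $old_coarse"

# Reworked path: accumulate the pending cycles first, then adjust mult.
xtime_nsec=1000000; mult=100; offset=50
xtime_nsec=$((xtime_nsec + mult * offset)); offset=0
mult=$((mult + mult_adj))
xtime_nsec=$((xtime_nsec - mult_adj * offset))
new_coarse=$(coarse)
echo "new: fine $fine_before -> $(fine), coarse $coarse_before -> $new_coarse"
```

In this model the fine-grained reading is unchanged by either path, while the coarse reading steps backwards only on the old path, where the subtraction is applied without any preceding accumulation.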

8 months, 4 weeks


[GIT PULL] kunit fixes update for Linux 6.15-rc3

by Shuah Khan

Hi Linus,

Please pull the following kunit fixes update for Linux 6.15-rc3.

Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 8ffd015db85fea3e15a77027fda6c02ced4d2444:

  Linux 6.15-rc2 (2025-04-13 11:54:49 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-kunit-fixes-6.15-rc3

for you to fetch changes up to b26c1a85f3fc3cc749380ff94199377fc2d0c203:

  kunit: qemu_configs: SH: Respect kunit cmdline (2025-04-14 10:08:01 -0600)

----------------------------------------------------------------
linux_kselftest-kunit-fixes-6.15-rc3

Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline.

----------------------------------------------------------------
Thomas Weißschuh (1):
      kunit: qemu_configs: SH: Respect kunit cmdline

 tools/testing/kunit/qemu_configs/sh.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------

8 months, 4 weeks


[GIT PULL] Kselftest fixes update for Linux 6.15-rc3

by Shuah Khan

Hi Linus,

Please pull the following kselftest fixes update for Linux 6.15-rc3.

Fixes dynevent_limitations.tc test failure on dash by detecting and handling bash and dash differences in evaluating \\.

diff is attached.

thanks,
-- Shuah

----------------------------------------------------------------
The following changes since commit 8ffd015db85fea3e15a77027fda6c02ced4d2444:

  Linux 6.15-rc2 (2025-04-13 11:54:49 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.15-rc3

for you to fetch changes up to 07be53cfa81afe94b14fb4bfee8243f2e0125d5e:

  selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc (2025-04-16 12:47:41 -0600)

----------------------------------------------------------------
linux_kselftest-fixes-6.15-rc3

Fixes dynevent_limitations.tc test failure on dash by detecting and handling bash and dash differences in evaluating \\.

----------------------------------------------------------------
Steven Rostedt (1):
      selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc

 .../ftrace/test.d/dynevent/dynevent_limitations.tc | 23 +++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)
----------------------------------------------------------------

8 months, 4 weeks


[PATCH 12/12] torture: Add testing of RCU's Rust bindings to torture.sh

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org>

This commit adds a --do-rcu-rust parameter to torture.sh, which invokes a rust_doctests_kernel kunit run. Note that kunit wants a clean source tree, so this runs "make mrproper", which might come as a surprise to some users. Should there be a --mrproper parameter to torture.sh to make the user explicitly ask for it?

Co-developed-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 .../selftests/rcutorture/bin/torture.sh       | 45 +++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh
index 5ccd60a563be..6d1a84f3f631 100755
--- a/tools/testing/selftests/rcutorture/bin/torture.sh
+++ b/tools/testing/selftests/rcutorture/bin/torture.sh
@@ -59,6 +59,7 @@ do_clocksourcewd=yes
 do_rt=yes
 do_rcutasksflavors=yes
 do_srcu_lockdep=yes
+do_rcu_rust=no
 
 # doyesno - Helper function for yes/no arguments
 function doyesno () {
@@ -89,6 +90,7 @@ usage () {
 	echo " --do-rcutorture / --do-no-rcutorture / --no-rcutorture"
 	echo " --do-refscale / --do-no-refscale / --no-refscale"
 	echo " --do-rt / --do-no-rt / --no-rt"
+	echo " --do-rcu-rust / --do-no-rcu-rust / --no-rcu-rust"
 	echo " --do-scftorture / --do-no-scftorture / --no-scftorture"
 	echo " --do-srcu-lockdep / --do-no-srcu-lockdep / --no-srcu-lockdep"
 	echo " --duration [ <minutes> | <hours>h | <days>d ]"
@@ -191,6 +193,9 @@ do
 	--do-rt|--do-no-rt|--no-rt)
 		do_rt=`doyesno "$1" --do-rt`
 		;;
+	--do-rcu-rust|--do-no-rcu-rust|--no-rcu-rust)
+		do_rcu_rust=`doyesno "$1" --do-rcu-rust`
+		;;
 	--do-scftorture|--do-no-scftorture|--no-scftorture)
 		do_scftorture=`doyesno "$1" --do-scftorture`
 		;;
@@ -485,6 +490,46 @@ then
 	torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_FULL=y" --trust-make
 fi
 
+if test "$do_rcu_rust" = "yes"
+then
+	echo " --- do-rcu-rust:" Start `date` | tee -a $T/log
+	rrdir="tools/testing/selftests/rcutorture/res/$ds/results-rcu-rust"
+	mkdir -p "$rrdir"
+	echo " --- make LLVM=1 rustavailable " | tee -a $rrdir/log > $rrdir/rustavailable.out
+	make LLVM=1 rustavailable > $T/rustavailable.out 2>&1
+	retcode=$?
+	echo $retcode > $rrdir/rustavailable.exitcode
+	cat $T/rustavailable.out | tee -a $rrdir/log >> $rrdir/rustavailable.out 2>&1
+	buildphase=rustavailable
+	if test "$retcode" -eq 0
+	then
+		echo " --- Running 'make mrproper' in order to run kunit." | tee -a $rrdir/log > $rrdir/mrproper.out
+		make mrproper > $rrdir/mrproper.out 2>&1
+		retcode=$?
+		echo $retcode > $rrdir/mrproper.exitcode
+		buildphase=mrproper
+	fi
+	if test "$retcode" -eq 0
+	then
+		echo " --- Running rust_doctests_kernel." | tee -a $rrdir/log > $rrdir/rust_doctests_kernel.out
+		./tools/testing/kunit/kunit.py run --make_options LLVM=1 --make_options CLIPPY=1 --arch arm64 --kconfig_add CONFIG_SMP=y --kconfig_add CONFIG_WERROR=y --kconfig_add CONFIG_RUST=y rust_doctests_kernel >> $rrdir/rust_doctests_kernel.out 2>&1
+		# @@@ Remove "--arch arm64" in order to test on native architecture?
+		# @@@ Analyze $rrdir/rust_doctests_kernel.out contents?
+		retcode=$?
+		echo $retcode > $rrdir/rust_doctests_kernel.exitcode
+		buildphase=rust_doctests_kernel
+	fi
+	if test "$retcode" -eq 0
+	then
+		echo "rcu-rust($retcode)" $rrdir >> $T/successes
+		echo Success >> $rrdir/log
+	else
+		echo "rcu-rust($retcode)" $rrdir >> $T/failures
+		echo " --- rcu-rust Test summary:" >> $rrdir/log
+		echo " --- Summary: Exit code $retcode from $buildphase, see $rrdir/$buildphase.out" >> $rrdir/log
+	fi
+fi
+
 if test "$do_srcu_lockdep" = "yes"
 then
 	echo " --- do-srcu-lockdep:" Start `date` | tee -a $T/log
-- 
2.43.0

8 months, 4 weeks


[PATCH 11/12] torture: Add --do-{,no-}normal to torture.sh

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org>

Right now, torture.sh runs normal runs unconditionally, which can be slow and thus annoying when you only want to test --kcsan or --kasan runs. This commit therefore adds a --do-normal argument so that "--kcsan --do-no-kasan --do-no-normal" runs only KCSAN runs. Note that specifying "--do-no-kasan --do-no-kcsan --do-no-normal" gets normal runs, so you should not try to use this as a synonym for --do-none.

Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 .../selftests/rcutorture/bin/torture.sh       | 30 +++++++++++++++++--
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh
index d53ee1e0ffc7..5ccd60a563be 100755
--- a/tools/testing/selftests/rcutorture/bin/torture.sh
+++ b/tools/testing/selftests/rcutorture/bin/torture.sh
@@ -51,6 +51,8 @@ do_scftorture=yes
 do_rcuscale=yes
 do_refscale=yes
 do_kvfree=yes
+do_normal=yes
+explicit_normal=no
 do_kasan=yes
 do_kcsan=no
 do_clocksourcewd=yes
@@ -128,6 +130,8 @@ do
 		do_refscale=yes
 		do_rt=yes
 		do_kvfree=yes
+		do_normal=yes
+		explicit_normal=no
 		do_kasan=yes
 		do_kcsan=yes
 		do_clocksourcewd=yes
@@ -161,11 +165,17 @@ do
 		do_refscale=no
 		do_rt=no
 		do_kvfree=no
+		do_normal=no
+		explicit_normal=no
 		do_kasan=no
 		do_kcsan=no
 		do_clocksourcewd=no
 		do_srcu_lockdep=no
 		;;
+	--do-normal|--do-no-normal|--no-normal)
+		do_normal=`doyesno "$1" --do-normal`
+		explicit_normal=yes
+		;;
 	--do-rcuscale|--do-no-rcuscale|--no-rcuscale)
 		do_rcuscale=`doyesno "$1" --do-rcuscale`
 		;;
@@ -242,6 +252,17 @@ trap 'rm -rf $T' 0 2
 echo " --- " $scriptname $args | tee -a $T/log
 echo " --- Results directory: " $ds | tee -a $T/log
 
+if test "$do_normal" = "no" && test "$do_kasan" = "no" && test "$do_kcsan" = "no"
+then
+	# Match old scripts so that "--do-none --do-rcutorture" does
+	# normal rcutorture testing, but no KASAN or KCSAN testing.
+	if test $explicit_normal = yes
+	then
+		echo " --- Everything disabled, so explicit --do-normal overridden" | tee -a $T/log
+	fi
+	do_normal=yes
+fi
+
 # Calculate rcutorture defaults and apportion time
 if test -z "$configs_rcutorture"
 then
@@ -332,9 +353,12 @@ function torture_set {
 	local kcsan_kmake_tag=
 	local flavor=$1
 	shift
-	curflavor=$flavor
-	torture_one "$@"
-	mv $T/last-resdir $T/last-resdir-nodebug || :
+	if test "$do_normal" = "yes"
+	then
+		curflavor=$flavor
+		torture_one "$@"
+		mv $T/last-resdir $T/last-resdir-nodebug || :
+	fi
 	if test "$do_kasan" = "yes"
 	then
 		curflavor=${flavor}-kasan
-- 
2.43.0
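The override rule described in this commit message (normal runs are re-enabled whenever normal, KASAN and KCSAN runs are all disabled) can be exercised on its own. This is a minimal sketch: `resolve_do_normal` is a hypothetical helper that only mirrors the check added by the patch, takes the four flag values as positional arguments, and writes the notice to stderr instead of the torture.sh log.

```shell
#!/bin/sh
# Hypothetical helper mirroring the check added by the patch above.
# Arguments: do_normal do_kasan do_kcsan explicit_normal ("yes" or "no").
# Prints the effective do_normal value on stdout.
resolve_do_normal () {
	do_normal=$1
	do_kasan=$2
	do_kcsan=$3
	explicit_normal=$4
	if test "$do_normal" = "no" && test "$do_kasan" = "no" && test "$do_kcsan" = "no"
	then
		if test "$explicit_normal" = yes
		then
			echo " --- Everything disabled, so explicit --do-normal overridden" >&2
		fi
		do_normal=yes
	fi
	echo "$do_normal"
}

# Only KCSAN requested: normal runs stay disabled.
resolve_do_normal no no yes yes
# Everything disabled: normal runs are switched back on.
resolve_do_normal no no no yes
```

The second call illustrates why the commit message warns against treating "--do-no-kasan --do-no-kcsan --do-no-normal" as a synonym for --do-none.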

8 months, 4 weeks


[PATCH 04/12] rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org>

The torture.sh --do-rt command-line parameter is intended to mimic -rt kernels. Now that CONFIG_PREEMPT_RT is upstream, this commit makes this mimicking more precise.

Note that testing of RCU priority boosting is disabled in favor of forward-progress testing of RCU callbacks. If it turns out to be possible to make kernels built with CONFIG_PREEMPT_RT=y tolerate testing of both, both will be enabled.

[ paulmck: Apply Sebastian Siewior feedback. ]

Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 tools/testing/selftests/rcutorture/bin/torture.sh | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh
index 0447c4a00cc4..d53ee1e0ffc7 100755
--- a/tools/testing/selftests/rcutorture/bin/torture.sh
+++ b/tools/testing/selftests/rcutorture/bin/torture.sh
@@ -448,13 +448,17 @@ fi
 
 if test "$do_rt" = "yes"
 then
-	# With all post-boot grace periods forced to normal.
-	torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_normal=1"
-	torture_set "rcurttorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make
+	# In both runs, disable testing of RCU priority boosting because
+	# -rt doesn't like its interaction with testing of callback
+	# flooding.
+
+	# With all post-boot grace periods forced to normal (default for PREEMPT_RT).
+	torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutorture.test_boost=0 rcutorture.preempt_duration=0"
+	torture_set "rcurttorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_IDLE=y" --trust-make
 
 	# With all post-boot grace periods forced to expedited.
-	torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_expedited=1"
-	torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make
+	torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutorture.test_boost=0 rcupdate.rcu_normal_after_boot=0 rcupdate.rcu_expedited=1 rcutorture.preempt_duration=0"
+	torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_FULL=y" --trust-make
 fi
 
 if test "$do_srcu_lockdep" = "yes"
-- 
2.43.0

8 months, 4 weeks


[PATCH 02/12] rcutorture: Make srcu_lockdep.sh check reader-conflict handling

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> Mixing different flavors of RCU readers is forbidden, for example, you should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same srcu_struct structure. There are checks for this, but these checks are not tested on a regular basis. This commit therefore adds such tests to srcu_lockdep.sh. Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- .../selftests/rcutorture/bin/srcu_lockdep.sh | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh index b94f6d3445c6..208be7d09a61 100755 --- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh +++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh @@ -79,6 +79,37 @@ do done done +# Test lockdep-enabled testing of mixed SRCU readers. +for val in 0x1 0xf +do + err= + tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.reader_flavor=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1 + ret=$? + mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val" + if ! 
grep -q '^CONFIG_PROVE_LOCKING=y' .config + then + echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario" + nerrs=$((nerrs+1)) + err=1 + fi + if test "$val" -eq 0xf && test "$ret" -eq 0 + then + err=1 + echo -n Unexpected success for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + fi + if test "$val" -eq 0x1 && test "$ret" -ne 0 + then + err=1 + echo -n Unexpected failure for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + fi + if test -n "$err" + then + grep "rcu_torture_init_srcu_lockdep: test_srcu_lockdep = " "$RCUTORTURE/res/$ds/$val/SRCU-P/console.log" | sed -e 's/^.*rcu_torture_init_srcu_lockdep://' >> "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + cat "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + nerrs=$((nerrs+1)) + fi +done + # Set up exit code. if test "$nerrs" -ne 0 then -- 2.43.0
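The pass/fail rule in the loop above can be sketched as a small standalone function. This is only an illustration: `check_flavor_result` is a hypothetical helper that does not appear in the patch, and it compares the flavor value as a string, because whether `test -eq` accepts hexadecimal operands varies between shells. That comparison style is this sketch's choice, not the patch's.

```sh
#!/bin/sh
# Hypothetical helper, not part of srcu_lockdep.sh.
# $1: rcutorture.reader_flavor value that was passed to the run
# $2: exit status of the kvm.sh run
# Prints a verdict and returns nonzero when the outcome is unexpected.
check_flavor_result () {
	val=$1
	ret=$2
	# 0xf mixes reader flavors, so the run is expected to fail.
	if test "$val" = 0xf && test "$ret" -eq 0
	then
		echo "Unexpected success for $val"
		return 1
	fi
	# 0x1 uses a single reader flavor, so the run is expected to pass.
	if test "$val" = 0x1 && test "$ret" -ne 0
	then
		echo "Unexpected failure for $val"
		return 1
	fi
	echo "ok"
	return 0
}
```

For example, `check_flavor_result 0xf 0` reports an unexpected success, which mirrors the case where the kernel's reader-conflict checks failed to fire.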

8 months, 4 weeks


[PATCH 01/12] rcutorture: Make srcu_lockdep.sh check kernel Kconfig

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P scenario to build its kernel with lockdep enabled. Of course, this dependency might not be obvious to someone rebalancing SRCU scenarios. This commit therefore adds code to srcu_lockdep.sh that verifies that the .config file has lockdep enabled. Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- .../testing/selftests/rcutorture/bin/srcu_lockdep.sh | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh index 2db12c5cad9c..b94f6d3445c6 100755 --- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh +++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh @@ -39,8 +39,9 @@ do shift done -err= nerrs=0 + +# Test lockdep's handling of deadlocks. for d in 0 1 do for t in 0 1 2 @@ -52,6 +53,12 @@ do tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.test_srcu_lockdep=$val rcutorture.reader_flavor=0x2" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1 ret=$? mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val" + if ! grep -q '^CONFIG_PROVE_LOCKING=y' .config + then + echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario" + nerrs=$((nerrs+1)) + err=1 + fi if test "$d" -ne 0 && test "$ret" -eq 0 then err=1 @@ -71,6 +78,8 @@ do done done done + +# Set up exit code. if test "$nerrs" -ne 0 then exit 1 -- 2.43.0

8 months, 4 weeks


[PATCH 14/14] rcutorture: Fix issue with re-using old images on ARM64

by Joel Fernandes

On ARM64, when running with --configs '36*SRCU-P', I noticed that only 1
instance instead of 36 was starting. Fix it by also checking for Image
files, since ARM does not seem to have bzImage. With this I see all 36
instances running at the same time in the batch.

Tested-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com>
---
 tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index ad79784e552d..957800c9ffba 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -73,7 +73,7 @@ config_override_param "$config_dir/CFcommon.$(uname -m)" KcList \
 cp $T/KcList $resdir/ConfigFragment
 
 base_resdir=`echo $resdir | sed -e 's/\.[0-9]\+$//'`
-if test "$base_resdir" != "$resdir" && test -f $base_resdir/bzImage && test -f $base_resdir/vmlinux
+if test "$base_resdir" != "$resdir" && (test -f $base_resdir/bzImage || test -f $base_resdir/Image) && test -f $base_resdir/vmlinux
 then
 	# Rerunning previous test, so use that test's kernel.
 	QEMU="`identify_qemu $base_resdir/vmlinux`"
-- 
2.43.0
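The image check that this patch relaxes can be read in isolation as follows. This is a minimal sketch, assuming a hypothetical helper named `have_prebuilt_kernel` that is not part of kvm-test-1-run.sh; it only shows the "vmlinux plus either bzImage or Image" condition.

```sh
#!/bin/sh
# Hypothetical helper, not part of kvm-test-1-run.sh.
# Succeeds if directory $1 holds a kernel build that can be reused:
# a vmlinux plus either an x86-style bzImage or an Image file.
have_prebuilt_kernel () {
	dir=$1
	test -f "$dir/vmlinux" || return 1
	test -f "$dir/bzImage" || test -f "$dir/Image"
}
```

Grouping the two image tests behind a single `||` keeps the vmlinux requirement unconditional, which matches the parenthesized form used in the patch.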

8 months, 4 weeks


[PATCH 11/14] torture: Add --do-{,no-}normal to torture.sh

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> Right now, torture.sh runs normal runs unconditionally, which can be slow and thus annoying when you only want to test --kcsan or --kasan runs. This commit therefore adds a --do-normal argument so that "--kcsan --do-no-kasan --do-no-normal" runs only KCSAN runs. Note that specifying "--do-no-kasan --do-no-kcsan --do-no-normal" gets normal runs, so you should not try to use this as a synonym for --do-none. Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- .../selftests/rcutorture/bin/torture.sh | 30 +++++++++++++++++-- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh index b64b356f55ff..475f758f6216 100755 --- a/tools/testing/selftests/rcutorture/bin/torture.sh +++ b/tools/testing/selftests/rcutorture/bin/torture.sh @@ -51,6 +51,8 @@ do_scftorture=yes do_rcuscale=yes do_refscale=yes do_kvfree=yes +do_normal=yes +explicit_normal=no do_kasan=yes do_kcsan=no do_clocksourcewd=yes @@ -128,6 +130,8 @@ do do_refscale=yes do_rt=yes do_kvfree=yes + do_normal=yes + explicit_normal=no do_kasan=yes do_kcsan=yes do_clocksourcewd=yes @@ -161,11 +165,17 @@ do do_refscale=no do_rt=no do_kvfree=no + do_normal=no + explicit_normal=no do_kasan=no do_kcsan=no do_clocksourcewd=no do_srcu_lockdep=no ;; + --do-normal|--do-no-normal|--no-normal) + do_normal=`doyesno "$1" --do-normal` + explicit_normal=yes + ;; --do-rcuscale|--do-no-rcuscale|--no-rcuscale) do_rcuscale=`doyesno "$1" --do-rcuscale` ;; @@ -242,6 +252,17 @@ trap 'rm -rf $T' 0 2 echo " --- " $scriptname $args | tee -a $T/log echo " --- Results directory: " $ds | tee -a $T/log +if test "$do_normal" = "no" && test "$do_kasan" = "no" && test "$do_kcsan" = "no" +then + # Match old scripts so that "--do-none --do-rcutorture" does + # normal rcutorture testing, but no KASAN or KCSAN testing. 
+ if test $explicit_normal = yes + then + echo " --- Everything disabled, so explicit --do-normal overridden" | tee -a $T/log + fi + do_normal=yes +fi + # Calculate rcutorture defaults and apportion time if test -z "$configs_rcutorture" then @@ -332,9 +353,12 @@ function torture_set { local kcsan_kmake_tag= local flavor=$1 shift - curflavor=$flavor - torture_one "$@" - mv $T/last-resdir $T/last-resdir-nodebug || : + if test "$do_normal" = "yes" + then + curflavor=$flavor + torture_one "$@" + mv $T/last-resdir $T/last-resdir-nodebug || : + fi if test "$do_kasan" = "yes" then curflavor=${flavor}-kasan -- 2.43.0
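The override described in the commit message (disabling normal, KASAN, and KCSAN runs together still yields normal runs) can be sketched as a pure function. `resolve_do_normal` is a hypothetical name used only for illustration; torture.sh performs this check inline and also logs a message when an explicit --do-no-normal is overridden.

```sh
#!/bin/sh
# Hypothetical helper, not part of torture.sh.
# $1, $2, $3: requested do_normal, do_kasan, do_kcsan ("yes" or "no").
# Prints the effective do_normal value.
resolve_do_normal () {
	normal=$1
	kasan=$2
	kcsan=$3
	# If every kind of run is disabled, fall back to normal runs so
	# that the script still tests something.
	if test "$normal" = no && test "$kasan" = no && test "$kcsan" = no
	then
		normal=yes
	fi
	echo "$normal"
}
```

With this rule, `resolve_do_normal no no yes` prints `no` (KCSAN-only testing), while `resolve_do_normal no no no` prints `yes`.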

8 months, 4 weeks


[PATCH 04/14] rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> The torture.sh --do-rt command-line parameter is intended to mimic -rt kernels. Now that CONFIG_PREEMPT_RT is upstream, this commit makes this mimicking more precise. Note that testing of RCU priority boosting is disabled in favor of forward-progress testing of RCU callbacks. If it turns out to be possible to make kernels built with CONFIG_PREEMPT_RT=y to tolerate testing of both, both will be enabled. [ paulmck: Apply Sebastian Siewior feedback. ] Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- tools/testing/selftests/rcutorture/bin/torture.sh | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/rcutorture/bin/torture.sh b/tools/testing/selftests/rcutorture/bin/torture.sh index 0447c4a00cc4..b64b356f55ff 100755 --- a/tools/testing/selftests/rcutorture/bin/torture.sh +++ b/tools/testing/selftests/rcutorture/bin/torture.sh @@ -448,13 +448,17 @@ fi if test "$do_rt" = "yes" then - # With all post-boot grace periods forced to normal. - torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_normal=1" - torture_set "rcurttorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make + # In both runs, disable testing of RCU priority boosting because + # -rt doesn't like its interaction with testing of callback + # flooding. + + # With all post-boot grace periods forced to normal (default for PREEMPT_RT). 
+ torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutorture.test_boost=0 rcutorture.preempt_duration=0" + torture_set "rcurttorture" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_IDLE=y CONFIG_RCU_NOCB_CPU=y" --trust-make # With all post-boot grace periods forced to expedited. - torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcupdate.rcu_expedited=1" - torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --trust-make + torture_bootargs="rcupdate.rcu_cpu_stall_suppress_at_boot=1 torture.disable_onoff_at_boot rcupdate.rcu_task_stall_timeout=30000 rcutorture.test_boost=0 rcupdate.rcu_normal_after_boot=0 rcupdate.rcu_expedited=1 rcutorture.preempt_duration=0" + torture_set "rcurttorture-exp" tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration "$duration_rcutorture" --configs "TREE03" --kconfig "CONFIG_PREEMPT_RT=y CONFIG_EXPERT=y CONFIG_HZ_PERIODIC=n CONFIG_NO_HZ_FULL=y CONFIG_RCU_NOCB_CPU=y" --trust-make fi if test "$do_srcu_lockdep" = "yes" -- 2.43.0

8 months, 4 weeks


[PATCH 02/14] rcutorture: Make srcu_lockdep.sh check reader-conflict handling

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> Mixing different flavors of RCU readers is forbidden, for example, you should not use srcu_read_lock() and srcu_read_lock_nmisafe() on the same srcu_struct structure. There are checks for this, but these checks are not tested on a regular basis. This commit therefore adds such tests to srcu_lockdep.sh. Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- .../selftests/rcutorture/bin/srcu_lockdep.sh | 31 +++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh index b94f6d3445c6..208be7d09a61 100755 --- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh +++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh @@ -79,6 +79,37 @@ do done done +# Test lockdep-enabled testing of mixed SRCU readers. +for val in 0x1 0xf +do + err= + tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.reader_flavor=$val" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1 + ret=$? + mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val" + if ! 
grep -q '^CONFIG_PROVE_LOCKING=y' .config + then + echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario" + nerrs=$((nerrs+1)) + err=1 + fi + if test "$val" -eq 0xf && test "$ret" -eq 0 + then + err=1 + echo -n Unexpected success for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + fi + if test "$val" -eq 0x1 && test "$ret" -ne 0 + then + err=1 + echo -n Unexpected failure for > "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + fi + if test -n "$err" + then + grep "rcu_torture_init_srcu_lockdep: test_srcu_lockdep = " "$RCUTORTURE/res/$ds/$val/SRCU-P/console.log" | sed -e 's/^.*rcu_torture_init_srcu_lockdep://' >> "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + cat "$RCUTORTURE/res/$ds/$val/kvm.sh.err" + nerrs=$((nerrs+1)) + fi +done + # Set up exit code. if test "$nerrs" -ne 0 then -- 2.43.0

8 months, 4 weeks


[PATCH 01/14] rcutorture: Make srcu_lockdep.sh check kernel Kconfig

by Joel Fernandes

From: "Paul E. McKenney" <paulmck(a)kernel.org> The srcu_lockdep.sh currently blindly trusts the rcutorture SRCU-P scenario to build its kernel with lockdep enabled. Of course, this dependency might not be obvious to someone rebalancing SRCU scenarios. This commit therefore adds code to srcu_lockdep.sh that verifies that the .config file has lockdep enabled. Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org> Signed-off-by: Joel Fernandes <joelagnelf(a)nvidia.com> --- .../testing/selftests/rcutorture/bin/srcu_lockdep.sh | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh index 2db12c5cad9c..b94f6d3445c6 100755 --- a/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh +++ b/tools/testing/selftests/rcutorture/bin/srcu_lockdep.sh @@ -39,8 +39,9 @@ do shift done -err= nerrs=0 + +# Test lockdep's handling of deadlocks. for d in 0 1 do for t in 0 1 2 @@ -52,6 +53,12 @@ do tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 5s --configs "SRCU-P" --kconfig "CONFIG_FORCE_NEED_SRCU_NMI_SAFE=y" --bootargs "rcutorture.test_srcu_lockdep=$val rcutorture.reader_flavor=0x2" --trust-make --datestamp "$ds/$val" > "$T/kvm.sh.out" 2>&1 ret=$? mv "$T/kvm.sh.out" "$RCUTORTURE/res/$ds/$val" + if ! grep -q '^CONFIG_PROVE_LOCKING=y' .config + then + echo "rcu_torture_init_srcu_lockdep:Error: CONFIG_PROVE_LOCKING disabled in rcutorture SRCU-P scenario" + nerrs=$((nerrs+1)) + err=1 + fi if test "$d" -ne 0 && test "$ret" -eq 0 then err=1 @@ -71,6 +78,8 @@ do done done done + +# Set up exit code. if test "$nerrs" -ne 0 then exit 1 -- 2.43.0

8 months, 4 weeks


[PATCH] tracing: selftests: Add testing a user string to filters

by Steven Rostedt

From: Steven Rostedt <rostedt(a)goodmis.org>

Running the following commands was broken:

  # cd /sys/kernel/tracing
  # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter
  # echo 1 > events/syscalls/sys_enter_openat/enable
  # ls /proc/$$/maps
  # cat trace

And would produce nothing when it should have produced something like:

  ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0)

Add a test to check this case so that it will be caught if it breaks
again.

Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Shuah, I'm Cc'ing you on this for your information, but I'll take it
through my tree as it will be attached with the fix, as it will fail
without it.

 .../test.d/filter/event-filter-function.tc | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
index 118247b8dd84..ab449a2cea8c 100644
--- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
+++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc
@@ -80,6 +80,25 @@ if [ $misscnt -gt 0 ]; then
     exit_fail
 fi
 
+# Check strings too
+if [ -f events/syscalls/sys_enter_openat/filter ]; then
+    echo "filename.ustring ~ \"*test.d*\"" > events/syscalls/sys_enter_openat/filter
+    echo 1 > events/syscalls/sys_enter_openat/enable
+    echo 1 > tracing_on
+    ls /bin/sh
+    nocnt=`grep openat trace | wc -l`
+    ls $TEST_DIR
+    echo 0 > tracing_on
+    hitcnt=`grep openat trace | wc -l`;
+    echo 0 > events/syscalls/sys_enter_openat/enable
+    if [ $nocnt -gt 0 ]; then
+        exit_fail
+    fi
+    if [ $hitcnt -eq 0 ]; then
+        exit_fail
+    fi
+fi
+
 reset_events_filter
 
 exit 0
-- 
2.47.2

8 months, 4 weeks


[PATCH v6 05/18] rust: kunit: refactor to use `&raw [const|mut]`

by Antonio Hickey

Replacing all occurrences of `addr_of!(place)` and `addr_of_mut!(place)` with `&raw const place` and `&raw mut place` respectively. This will allow us to reduce macro complexity, and improve consistency with existing reference syntax as `&raw const`, `&raw mut` are similar to `&`, `&mut` making it fit more naturally with other existing code. Suggested-by: Benno Lossin <benno.lossin(a)proton.me> Link: https://github.com/Rust-for-Linux/linux/issues/1148 Signed-off-by: Antonio Hickey <contact(a)antoniohickey.com> --- rust/kernel/kunit.rs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/rust/kernel/kunit.rs b/rust/kernel/kunit.rs index 1604fb6a5b1b..9f8165b15a37 100644 --- a/rust/kernel/kunit.rs +++ b/rust/kernel/kunit.rs @@ -130,9 +130,9 @@ unsafe impl Sync for UnaryAssert {} unsafe { $crate::bindings::__kunit_do_failed_assertion( kunit_test, - core::ptr::addr_of!(LOCATION.0), + &raw const LOCATION.0, $crate::bindings::kunit_assert_type_KUNIT_ASSERTION, - core::ptr::addr_of!(ASSERTION.0.assert), + &raw const ASSERTION.0.assert, Some($crate::bindings::kunit_unary_assert_format), core::ptr::null(), ); @@ -261,7 +261,7 @@ macro_rules! kunit_unsafe_test_suite { // (as documented) must be valid for the lifetime of // the suite (i.e., static). test_cases: unsafe { - ::core::ptr::addr_of_mut!($test_cases) + (&raw mut $test_cases) .cast::<::kernel::bindings::kunit_case>() }, suite_init: None, @@ -283,7 +283,7 @@ macro_rules! kunit_unsafe_test_suite { #[cfg_attr(not(target_os = "macos"), link_section = ".kunit_test_suites")] static mut KUNIT_TEST_SUITE_ENTRY: *const ::kernel::bindings::kunit_suite = // SAFETY: `KUNIT_TEST_SUITE` is static. - unsafe { ::core::ptr::addr_of_mut!(KUNIT_TEST_SUITE) }; + unsafe { &raw mut KUNIT_TEST_SUITE }; }; }; } -- 2.48.1

8 months, 4 weeks


[PATCH v4 0/4] mm: introduce THP deferred setting

by Nico Pache

This series is a follow-up to [1], which adds mTHP support to khugepaged.
mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
configs to make sense. Without it, the global="defer" and mTHP="inherit"
case is "undefined" behavior.

We've seen cases where customers switching from RHEL7 to RHEL8 see a
significant increase in the memory footprint for the same workloads.

Through our investigations we found that a large contributing factor to
the increase in RSS was an increase in THP usage.

For workloads like MySQL, or when using allocators like jemalloc, it is
often recommended to set /sys/kernel/mm/transparent_hugepage/enabled to
never. This is in part due to performance degradations and increased
memory waste.

This series introduces enabled=defer. This setting acts as a middle
ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
page fault handler will act normally, making a hugepage if possible. If
the allocation is not MADV_HUGEPAGE, then the page fault handler will
default to the base size allocation. The caveat is that khugepaged can
still operate on pages that are not MADV_HUGEPAGE.

This allows for three things. First, applications specifically designed
to use hugepages will get them. Second, applications that don't use
hugepages can still benefit from them without aggressively inserting
THPs at every possible chance. This curbs the memory waste and defers
the use of hugepages to khugepaged, which can then scan memory for
regions eligible for collapse. Third, there is the added benefit for
those who want THPs but experience higher-latency page faults (PFs): you
can now get base page performance at the PF handler and hugepage
performance for those mappings after they collapse.

Admins may want to lower max_ptes_none; otherwise, khugepaged may
aggressively collapse single allocations into hugepages.

TESTING:
- Built for x86_64, aarch64, ppc64le, and s390x
- selftests mm
- In [1] I provided a script [2] that has multiple access patterns
- lots of general use.
- redis testing. This test was my original case for the defer mode. What
  I was able to prove was that THP=always leads to increased max_latency
  cases, which is why it is recommended to disable THPs for redis
  servers. However, with 'defer' we don't have the max_latency spikes and
  can still get the system to utilize THPs. I further tested this with
  the mTHP defer setting and found that redis (and probably other
  jemalloc users) can utilize THPs via defer (+mTHP defer) without a
  large latency penalty and with some potential gains. I uploaded some
  mmtest results here [3], which compare:
      stock+thp=never
      stock+(m)thp=always
      khugepaged-mthp + defer (max_ptes_none=64)

  The results show that (m)THPs can cause some throughput regression in
  some cases, but also have gains in other cases. The mTHP+defer results
  have more gains and fewer losses than the (m)THP=always case.

V4 Changes:
- Minor Documentation fixes
- rebased the dependent series [1] onto mm-unstable
  commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")

V3 Changes:
- moved some Documentation to the other series and merged the remaining
  Documentation updates into one

V2 Changes:
- rebase changes on top of the mTHP khugepaged support series
- Fix selftests parsing issue
- add mTHP defer option
- add mTHP defer Documentation

[1] - https://lore.kernel.org/lkml/20250417000238.74567-1-npache@redhat.com/
[2] - https://gitlab.com/npache/khugepaged_mthp_test
[3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.h…

Nico Pache (4):
  mm: defer THP insertion to khugepaged
  mm: document (m)THP defer usage
  khugepaged: add defer option to mTHP options
  selftests: mm: add defer to thp setting parser

 Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
 include/linux/huge_mm.h                    | 18 +++++-
 mm/huge_memory.c                           | 69 +++++++++++++++++++---
 mm/khugepaged.c                            | 10 ++--
 tools/testing/selftests/mm/thp_settings.c  |  1 +
 tools/testing/selftests/mm/thp_settings.h  |  1 +
 6 files changed, 107 insertions(+), 23 deletions(-)

-- 
2.48.1
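To see which THP mode is active, one can parse the sysfs `enabled` file, which marks the selected value with square brackets (for example `always [madvise] never`). The helper below is a minimal sketch with a hypothetical name, `thp_selected_mode`; it assumes only the bracket convention and makes no claim about where a `defer` entry would appear in the list once this series is applied.

```sh
#!/bin/sh
# Hypothetical helper, not part of the series or its selftests.
# $1: contents of a THP "enabled" file, e.g. "always [madvise] never".
# Prints the word inside the square brackets, or nothing if none is found.
thp_selected_mode () {
	printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

On a live system this could be called as `thp_selected_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"`. Taking the string as an argument keeps the helper usable on machines that lack the sysfs file.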

8 months, 4 weeks

